Programming and software engineering are not new disciplines. Although software engineering has undergone shifts in philosophy, the fundamental mechanism used for defining the logic within a system is still the same as it was decades ago. Advances in languages and constructs have made the development of software easier and have reduced development time. Each successive generation of programming languages has achieved this simplification by introducing further abstractions away from the complexities of generating the machine-specific instructions required to actually run an executable within or across operating systems. These advances are still occurring today, and we are now able to develop more complex programs more rapidly than at any time in the past. This rapid progression has empowered the scientific developer, as modern experiment-driven biological science requires the rapid development of algorithms and systems. Biology, in all its forms, is fundamentally an observational and experimental science. Whether it be ecology, neuroscience, clinical studies, or molecular biology, the high volumes of large, semantically rich biological data sets require a high level of software development. This means that the software must be developed to a high standard and in a minimal amount of time. Therefore, to meet the demands of developing software to support research, the scientific developer must know about the latest tools and techniques. This chapter introduces some of these tools and techniques, in particular those that will help in the development of data-intensive applications.
Keywords: Life Science, Unify Modeling Language, Application Server, Business Logic, Class Library
This work was supported by Grant Number P50GMO76547 from the National Institute of General Medical Sciences. The content is solely the responsibility of the author and does not necessarily represent the official views of the NIGMS or the NIH.
AOP Aspect Oriented Programming is used to easily apply cross-cutting functionality (e.g., logging) to programs. A programmer typically defines a method as having a particular aspect, and a separate framework is responsible for ensuring that the correct behavior occurs (e.g., when, how and where the code injection occurs). This cross-cutting functionality can be injected into method calls at a variety of times in an object's life cycle.
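As a minimal sketch of the AOP idea described above, the following Python example uses a decorator to stand in for the framework that injects behavior around a method call. The names `logged`, `call_log`, and `gc_content` are hypothetical and chosen only for illustration; a real AOP framework would typically offer richer control over when and where the injection occurs.

```python
import functools

# Hypothetical in-memory sink standing in for a real logging framework.
call_log = []


def logged(func):
    """Aspect: record entry to and exit from any decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the wrapped call returns or raises.
            call_log.append(f"exit {func.__name__}")
    return wrapper


@logged
def gc_content(sequence):
    """Business logic only: fraction of G and C bases in a sequence."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

The function body contains no logging code; the cross-cutting behavior is attached by marking the function with the aspect, which is the separation the entry describes.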
CORBA The Common Object Request Broker Architecture supports interoperability between distributed processes (applications). Central to the architecture is an ORB (object request broker), which both marshals data and controls compartmentalization (to allow for invocation on specific remote threads, etc.) of the different processes. The specification was defined by the OMG, and ORBs are available for most platforms.
EJB Enterprise Java Beans provides a means for writing application servers in Java. A container manages a number of enterprise beans, provides access to common functionality, and imposes control over the beans (e.g., life cycle, resource management, transactions and security). The EJB specification has evolved considerably since its first release and is now a feature-rich framework that can be used to easily develop complex server-side functionality.
IDL The Interface Definition Language formalizes the remote interfaces that can be accessed through CORBA. IDL has evolved considerably, with the advent of pass-by-value and components (facets). A WSDL serves the same type of purpose for Web Services.
JCR The Java Content Repository is a specific Java Standard (JSR-170) for defining the interface to a content repository. A content repository is a flexible system that is typically customized for a specific usage; when customized, it is referred to as a Content Management System (CMS).
JSR Java Specification Requests is the process through which community standards are achieved for Java. The requests are diverse and have led to a number of useful reference implementations.
LSID The Life Science Identifier standard provides a concrete definition and implementation of a URN. The LSID specification outlines how the URN is resolved to two locations (the data and the metadata) through the use of “an authority.” In this way, the authority acts as a registry. The documents that are retrieved are returned as objects and an associated RDF data file which encodes the metadata. The standard also encompasses many aspects of using URNs and includes specifications for associated services (e.g., assignment).
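To make the role of the authority in the LSID entry above more concrete, the sketch below splits an LSID into its parts, assuming the usual `urn:lsid:authority:namespace:object[:revision]` layout. The `Lsid` type and `parse_lsid` function are hypothetical names, and the example covers only parsing; resolving the data and metadata through the authority is not attempted here.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID; a resolver would contact `authority` first."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = urn.split(":")
    # Expect the two-part prefix plus three or four identifying components.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {urn!r}")
    return Lsid(*parts[2:])
```

For example, `parse_lsid("urn:lsid:example.org:genes:BRCA2")` yields the authority `example.org`, which is where a client would begin resolution (the identifier itself is made up for illustration).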
Maven Maven is a build and artifact management tool available from Apache. Its primary use is for Java.
MIDL The Microsoft Interface Definition Language serves a similar purpose to IDL but is generally based on specifying the remote procedure call interface which is used between COM components.
OASIS The Organization for the Advancement of Structured Information Standards is a standards body made up of members from a large number of organizations. It has been particularly effective in driving forward standards for Web Service extensions.
OMG The Object Management Group is an open, not-for-profit standardization body. The OMG has produced a number of horizontal (e.g., Trader service, Naming service, Event Service) and vertical (see LSR) standards for use with CORBA.
ORM Object-Relational Mapping provides a means to map objects onto relational databases, and to map relational databases into objects. A number of ORM solutions are available, with Hibernate being the most prevalent.
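The two directions of mapping mentioned in the ORM entry above can be sketched by hand with Python's standard `sqlite3` module. This is an illustration of the idea only, not how Hibernate or any other ORM product works internally; the `Gene` class, the `gene` table, and the helper functions are all hypothetical.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class Gene:
    """Plain object whose fields correspond to columns of the gene table."""
    symbol: str
    chromosome: str
    length: int


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene ("
        "symbol TEXT PRIMARY KEY, chromosome TEXT, length INTEGER)"
    )


def save(conn: sqlite3.Connection, gene: Gene) -> None:
    """Object to row: the field order matches the column order."""
    conn.execute("INSERT OR REPLACE INTO gene VALUES (?, ?, ?)", astuple(gene))


def load(conn: sqlite3.Connection, symbol: str) -> Optional[Gene]:
    """Row to object: rebuild a Gene from the selected columns."""
    row = conn.execute(
        "SELECT symbol, chromosome, length FROM gene WHERE symbol = ?",
        (symbol,),
    ).fetchone()
    return Gene(*row) if row else None
```

An ORM framework generates this kind of mapping code from metadata, so that the developer works with objects such as `Gene` rather than with SQL statements.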
OWL The Web Ontology Language is an RDF description of an underlying data resource. The ontology describes the data items produced through a web service as well as the relationships between them.
POJO A Plain Old Java Object is one that uses “separation of concerns,” so that only business logic (and not, for example, server logic) is implemented. Any required dependencies and services are injected after the code has been written.
QL In the EJB 2.0 standard, a query language was introduced; this was originally intended to standardize the “finder” logic in the now obsolete EJB Homes.
REST Representational State Transfer (REST) can be considered an alternative to SOAP, although it is considerably easier to implement. REST uses pre-existing technologies as the basis for the protocol (e.g., “the web is the platform”). There exists some confusion about what constitutes a RESTful service, as opposed to just an HTTP-encoded request for an XML document. True REST is based upon verb/noun/type calls, where you apply an operation (verb, e.g., POST, GET, PUT and DELETE) to a URI (noun) with a certain view (type).
RUP The Rational Unified Process was a software development process that was popularized through the release of Rational Rose and associated tools. It centered on UML and provided a means to gather use cases, match them to features, and track feature development and defects. Its popularity has decreased significantly over the last decade.
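The verb/noun/type pattern in the REST entry above can be sketched without any network code as a small dispatcher over an in-memory store. Everything named here (`handle`, `render`, `store`, and the example URIs) is hypothetical, POST is left out for brevity, and a real service would sit behind an HTTP server rather than being called directly.

```python
import json

# Hypothetical in-memory store: URI (noun) -> resource held as a dict.
store = {}


def render(resource, media_type):
    """Return the requested view (type) of a resource, or None if unsupported."""
    if media_type == "application/json":
        return json.dumps(resource, sort_keys=True)
    if media_type == "text/plain":
        return ", ".join(f"{key}={value}" for key, value in sorted(resource.items()))
    return None


def handle(verb, uri, media_type="application/json", body=None):
    """Apply a verb to a noun and return (status code, rendered view)."""
    if verb == "PUT":
        if body is None:
            return 400, ""
        store[uri] = dict(body)  # create or replace the resource at this URI
    elif verb == "DELETE":
        return (204, "") if store.pop(uri, None) is not None else (404, "")
    elif verb != "GET":
        return 405, ""  # verb not supported in this sketch
    if uri not in store:
        return 404, ""
    view = render(store[uri], media_type)
    return (200, view) if view is not None else (406, "")
```

The same noun, for instance `/genes/BRCA2`, can be stored with PUT, read back as either `application/json` or `text/plain` with GET, and removed with DELETE, which is the uniform interface that distinguishes this style from an arbitrary HTTP request for a document.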
SOAP SOAP is a protocol for making requests on remote services to return structured data. It is designed to use any high-level protocol that supports the sending of information and is primarily used with HTTP. Much like CORBA, interoperability is the big draw of SOAP, and (unlike CORBA) SOAP has the advantage of being simple to develop and test. The original stateless nature of SOAP limited its usage; however, with the advent of WS-RF (and other standards) SOAP is maturing into a general-purpose object protocol.
UML The Unified Modeling Language formalizes visual representations for most aspects of software design. This formalization encompasses use cases, class structure, state transitions, sequences of method calls, and deployment scenarios.
WS-* The WS-* are a series of specifications for adding functionality to SOAP. These extensions provide new functionality such as security, messaging, binary object attachment and state. These extensions generally involve the addition of information to the SOAP message (within the envelope). State information can be maintained between SOAP calls through the use of resource frameworks (e.g., WS-RF).
XP Extreme Programming was largely a reaction to the “over specification” that was proposed through the RUP. XP advocated a number of approaches designed to ensure that more responsive (Agile), well-written code could be developed.
- Booch G (1996) Object Solutions: Managing the Object-Oriented Project
- LSR. (cited: Available from: http://www.omg.org/lsr)
- caBIO. (cited: Available from: http://cabio.nci.nih.gov/)
- Senger M, Rice P, Oinn T (2003) SOAPLab – a unified Sesame door to analysis tools. In: UK e-Science All Hands Meeting, Nottingham, UK
- LexGrid. (cited: Available from http://informatics.mayo.edu/LexGrid/)
- Goble C (2005) Putting semantics into e-science and grids in proceedings E-science. In: 1st IEEE international conference on e-science and grid technologies, Melbourne, Australia
- Cao J et al (2003) GridFlow: Workflow management for grid computing. In: 3rd international symposium on cluster computing and the grid. IEEE
- Amin K et al (2004) GridAnt: A client-controllable grid workflow system. In: 7th international conference on system sciences. IEEE: Hawaii
- Scaffidi C, Shaw M, Myers B (2005) Estimating the numbers of end users and end user programmers. In: Proceedings of 2005 IEEE symposium on visual languages and human-centric computing, Dallas, Texas
- Batory D (2004) Feature-oriented programming and the AHEAD tool suite. In: international conference on software engineering, Edinburgh, UK
- Stajich et al (2002) The bioperl toolkit: Perl modules for the life sciences. Genome Res (12):1611–1618
- Vambenepe WT, Thompson C, Talwar V, Rafaeli S, Murray B, Milojicic D, Iyer S, Farkas K, Arlitt M (2005) Dealing with scale and adaptation of global web services management. In: IEEE International Conference on Web Services (KWS 2005), ISBN 0-7695-2409-5