Bioinformatics pp 403-440

Programming Languages

  • John Boyle
Chapter

Abstract

Programming and software engineering are not new disciplines. Although software engineering has undergone shifts in philosophy, the fundamental mechanism used for defining the logic within a system is still the same as it was decades ago. Advances in languages and constructs have made the development of software easier and have reduced development time. Each successive generation of programming languages has obtained this simplification by introducing further abstractions away from the complexities of generating the machine-specific instructions required to actually run an executable within or across operating systems. These advances are still occurring today, and we are now able to develop more complex programs more rapidly than at any time in the past. This rapid progression has empowered the scientific developer, as modern experimentally driven biological science requires the rapid development of algorithms and systems. Biology, in all its forms, is fundamentally an observational and experimental science. Whether it be ecology, neuroscience, clinical studies, or molecular biology, the high volumes of semantically rich, large biological data sets require a high level of software development. This means that the software must be developed to a high standard and in a minimal amount of time. Therefore, to meet the demands of developing software to support research, the scientific developer must know about the latest tools and techniques. This chapter introduces some of these tools and techniques, in particular those that will help in the development of data-intensive applications.

Keywords

Life Science, Unify Modeling Language, Application Server, Business Logic, Class Library
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

This work was supported by Grant Number P50GM076547 from the National Institute of General Medical Sciences. The content is solely the responsibility of the author and does not necessarily represent the official views of the NIGMS or the NIH.

Glossary

ADT Abstract data types pre-date the adoption of object-oriented programming. They provided a means to reuse storage and retrieval structures, and are similar to “generics” (e.g., lists, tables, queues).

AOP Aspect Oriented Programming is used to easily apply cross-cutting functionality (e.g., logging) to programs. A programmer typically defines a method as having a particular aspect, and a separate framework is responsible for ensuring that the correct behavior occurs (e.g., when, how and where the code injection occurs). This cross-cutting functionality can be injected into method calls at a variety of times in an object's life cycle.
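As a rough illustration of the AOP entry, a Python decorator can play the role of a logging aspect. This is only a sketch and not a full AOP framework: the weaving happens when the function is defined, and the names `logged`, `call_log` and `gc_content` are hypothetical.

```python
import functools

call_log = []  # hypothetical in-memory log, used only for this illustration


def logged(func):
    """Aspect: record entry to and exit from the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__} args={args}")
        result = func(*args, **kwargs)
        call_log.append(f"exit {func.__name__} result={result!r}")
        return result
    return wrapper


@logged
def gc_content(sequence):
    """Business logic only: fraction of G and C bases in a DNA string."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)
```

The function body contains no logging code; the cross-cutting behavior is attached by the `@logged` line alone, which is the separation the entry describes.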

BPEL The Business Process Execution Language is a specification designed to support the high-level orchestration of web services. The heart of the BPEL specification is the scripting language which defines how services and the data produced by them are linked together. This specification is rich enough to allow for most workflows and defines both how method invocations and data are linked and how web services should be coordinated (e.g., concurrency, choices, sequential operations). The specification also defines extensions to the WSDL which can be used to specify links between services.

CORBA The Common Object Request Broker Architecture supports interoperability between distributed processes (applications). Central to the architecture is an ORB (object request broker) which both marshals data and controls compartmentalization (to allow for invocation on specific remote threads, etc.) of the different processes. The specification was defined by the OMG, and ORBs are available for most platforms.

DCOM Distributed COM provides a means to make distributed calls between COM (Component Object Model) objects. Thread compartmentalization and marshalling (using a low-level binary wire format) are handled automatically. For application developers, this has largely been superseded by .NET Remoting.

EJB Enterprise Java Beans provides a means for writing application servers in Java. A container manages a number of enterprise beans, provides access to common functionality, and imposes control over the beans (e.g., life cycle, resource management, transactions and security). The EJB specification has evolved considerably since its first release and is now a feature-rich framework which can be used to easily develop complex server-side functionality.

I3C The I3C was a short-lived, commercially led organization established to standardize aspects of life science informatics. The organization was led by Oracle, Sun and IBM. The I3C did promote the use of LSIDs, which have been adopted by the OMG.

IDL The Interface Definition Language formalizes the remote interfaces that can be accessed through CORBA. IDL has evolved considerably, with the advent of pass-by-value and components (facets). A WSDL serves the same type of purpose for Web Services.

IIOP The Internet Inter-ORB Protocol is the means through which Object Request Brokers (ORBs) communicate. This allows for discovery, life cycle and compartmentalization of object requests.

JCR The Java Content Repository is a specific Java standard (JSR-170) for defining the interface to a content repository. A content repository is a flexible system that is typically customized for a specific usage; when customized, it is referred to as a Content Management System (CMS).

JNDI The Java Naming and Directory Interface is the specification for the directory and naming system used within Java. The underlying implementation can use a variety of systems (e.g., RMI Registry) and provides a means to discover and query resources.

JSR A Java Specification Request is the mechanism through which community standards are achieved for Java. The requests are diverse and have led to a number of useful reference implementations.

LINQ This is a .NET project that extends the platform to allow for general resource querying from within code. Resources that are queried can then be accessed as objects within the framework.

LSID The Life Science Identifier standard provides a concrete definition and implementation of a URN. The LSID specification outlines how the URN is resolved to two locations (the data and the metadata) through the use of “an authority.” In this way, the authority acts as a registry. The documents that are retrieved are returned as objects and an associated RDF data file which encodes the metadata. The standard also encompasses many aspects of using URNs and includes specifications for associated services (e.g., assignment).
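To make the LSID entry more concrete, the sketch below splits an identifier into its parts. It assumes the commonly documented layout `urn:lsid:authority:namespace:object[:revision]`, which is not spelled out in this glossary; `parse_lsid`, the `LSID` tuple and the `example.org` authority are hypothetical, and no resolution against an authority is attempted.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str                   # who resolves the identifier
    namespace: str                   # collection within that authority
    object_id: str                   # the item itself
    revision: Optional[str] = None   # optional version of the item


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])
```

For example, `parse_lsid("urn:lsid:example.org:genes:geneA:1")` yields the authority `example.org`, which a resolver would then ask for the data and metadata locations.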

LSR The Life Science Research group of the OMG defines standards in the “vertical” life science domain. The body has defined and adopted a number of standards. These standards cover a wide range of areas (including “sequence” and “literature”).

Maven Maven is a build and artifact management tool available from Apache. Its primary use is for Java.

MDA A Model Driven Architecture is one where the model underlying the system is defined in a language independent way, and the corresponding services/classes are automatically pushed out from that model. Typically, the model is defined in UML, and then XMI is used to automatically generate stubs/skeletons which can be used to provide implementations of the model.

MIDL The Microsoft Interface Definition Language serves a similar purpose to IDL but is generally based on specifying the remote procedure call interface which is used between COM components.

MVC The model-view-controller pattern is commonly used in both web application frameworks and GUI frameworks. Commands are managed by the controller, which directs changes to an underlying model, and (multiple) views provide representations of the model.
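The division of labor in the MVC entry can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a framework: all class names and the `load` command are hypothetical, and the views simply store a string where a real view would draw to a screen or page.

```python
class SequenceModel:
    """Model: holds the data and notifies attached views when it changes."""

    def __init__(self):
        self.sequence = ""
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def set_sequence(self, sequence):
        self.sequence = sequence
        for view in self._views:
            view.refresh(self)


class LengthView:
    """View: represents the model as its length."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, model):
        self.rendered = f"length={len(model.sequence)}"


class FastaView:
    """View: represents the same model as a FASTA-style record."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, model):
        self.rendered = f">seq\n{model.sequence}"


class Controller:
    """Controller: turns a user command into a change on the model."""

    def __init__(self, model):
        self.model = model

    def handle(self, command, argument):
        if command != "load":
            raise ValueError(f"unknown command: {command}")
        self.model.set_sequence(argument.strip().upper())
```

Both views are updated from one controller call, and neither view nor controller needs to know how many other views exist.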

OASIS The Organization for the Advancement of Structured Information Standards is a standards body made up of members from a large number of organizations. It has been particularly effective in driving forward standards for Web Service extensions.

ODBC/OLEDB Open Database Connectivity is a definition of the interface presented by a DBMS. The ODBC specification is well established and bridges with other technologies (including JDBC). OLEDB is an extension to ODBC that offers richer functionality.

OMG The Object Management Group is an open, not-for-profit standardization body. The OMG has produced a number of horizontal (e.g., Trader service, Naming service, Event service) and vertical (see LSR) standards for use with CORBA.

OMT The Object Modeling Technique is a predecessor to UML and provides a formal representation of the design of software.

ORM Object-Relational Mapping provides a means to map objects onto relational databases, and to map relational databases into objects. A number of ORM solutions are available, with Hibernate being the most prevalent.
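The two directions of mapping in the ORM entry can be shown with a hand-written mapper over an in-memory SQLite database. This is a deliberately small sketch, not how Hibernate or any other ORM product works internally; the `Gene` class, the `gene` table and the `GeneMapper` name are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Gene:
    symbol: str
    chromosome: str
    length: int


class GeneMapper:
    """Maps Gene objects to rows of a 'gene' table and back again."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS gene "
            "(symbol TEXT PRIMARY KEY, chromosome TEXT, length INTEGER)"
        )

    def save(self, gene):
        # Object -> row: the dataclass fields line up with the columns.
        self.connection.execute(
            "INSERT OR REPLACE INTO gene VALUES (?, ?, ?)", astuple(gene)
        )

    def find(self, symbol):
        # Row -> object: rebuild a Gene from the selected columns.
        row = self.connection.execute(
            "SELECT symbol, chromosome, length FROM gene WHERE symbol = ?",
            (symbol,),
        ).fetchone()
        return Gene(*row) if row else None
```

Calling code works only with `Gene` objects, for instance `GeneMapper(sqlite3.connect(":memory:")).save(Gene("geneA", "chr1", 1200))`; a full ORM generates this kind of mapping from metadata rather than having it written by hand.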

OSGi OSGi is a standards organization which provides a framework for building applications. The framework provides both a means for components within an application to be discovered and an updating mechanism.

OWL The Web Ontology Language is an RDF description of an underlying data resource. The ontology describes the data items produced through a web service as well as the relationships between them.

P(M)BV Pass (or Marshal) By Value in distributed systems allows for objects to be moved between nodes, rather than using remote references.
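The difference between passing a value and passing a reference can be imitated within a single process. In the sketch below, JSON stands in for whatever wire format a real distributed system would use, and the `Alignment` class and the `marshal`/`unmarshal` helpers are hypothetical.

```python
import json


class Alignment:
    def __init__(self, query, score):
        self.query = query
        self.score = score


def marshal(alignment):
    """Flatten the object's state to bytes that could be sent to another node."""
    state = {"query": alignment.query, "score": alignment.score}
    return json.dumps(state).encode("utf-8")


def unmarshal(payload):
    """Rebuild an independent copy of the object on the receiving side."""
    return Alignment(**json.loads(payload.decode("utf-8")))
```

An object rebuilt with `unmarshal(marshal(a))` can be changed without affecting `a`, whereas a second name bound to `a` itself behaves like a remote reference: every change is visible to both sides.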

POJO A Plain Old Java Object is one that uses “separation of concerns,” so that only business logic (and not, for example, server logic) is implemented. Any required dependencies and services are injected after the code has been written.

PURL A Persistent URL is one that points to a resolution service, which ensures that the underlying resource can always be located with the PURL.

QL In the EJB 2.0 standard, a query language was introduced; it was originally intended to standardize the “finder” logic in the now obsolete EJB Homes.

RDBMS A relational database management system is the environment in which relational database instances exist. An RDBMS provides a unified framework which can be used to control the physical (tablespaces), conceptual (logical schemas), and external (views) levels of databases.

REST Representational State Transfer (REST) can be considered an alternative to SOAP, although it is considerably easier to implement. REST uses pre-existing technologies as the basis for the protocol (e.g., “the web is the platform”). There is some confusion about what constitutes a RESTful service, as opposed to just an HTTP-encoded request for an XML document. True REST is based upon verb/noun/type calls, where you apply an operation (verb, e.g., POST, GET, PUT and DELETE) to a URI (noun) with a certain view (type).
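The verb/noun/type triple in the REST entry maps directly onto the parts of an HTTP request. The sketch below only builds the request object with the Python standard library and does not send it; the base URL, the `build_request` helper and the resource name are hypothetical, and expressing the "type" through the `Accept` header is one common convention rather than something the entry prescribes.

```python
import urllib.request

BASE_URL = "http://example.org/genes"  # hypothetical service root
VERBS = {"GET", "POST", "PUT", "DELETE"}


def build_request(verb, noun, media_type, body=None):
    """Combine a verb (HTTP method), noun (URI) and type (media type)."""
    if verb not in VERBS:
        raise ValueError(f"unsupported verb: {verb}")
    return urllib.request.Request(
        url=f"{BASE_URL}/{noun}",        # noun: the resource being addressed
        data=body,                       # optional payload for POST/PUT
        method=verb,                     # verb: the operation to apply
        headers={"Accept": media_type},  # type: the view of the resource wanted
    )


request = build_request("GET", "geneA", "application/xml")
```

The same noun can be combined with a different verb (`DELETE`) or a different type (`application/json`) without inventing a new endpoint, which is what separates this style from a plain HTTP call that returns XML.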

RMI Remote Method Invocation is a Java-to-Java solution for communication between distributed Java threads/applications. RMI uses a number of abstraction layers (remote reference layer/RRL and transport layer). This has a number of advantages, including the fact that different underlying protocols can be used to actually provide the communication (e.g., IIOP). Marshalling is done through serialization, leasing is available, and distributed GC is supported. RMI is a convenient, but not interoperable, protocol.

RUP The Rational Unified Process was a software development process that was popularized through the release of Rational Rose and associated tools. It centered on UML and provided a means to gather use cases, match them to features, and track feature development and defects. Its popularity has decreased significantly over the last decade.

SOA A Service Oriented Architecture is one which consists of loosely coupled federated services. There is typically little linkage between these services, and they are generally discovered dynamically using a registry system or similar. SOAs have grown in popularity within many enterprises, as they provide a practical and convenient way for disparate groups to share information/processes.

SOAP SOAP is a protocol for making requests on remote services to return structured data. It is designed to use any high-level protocol that supports the sending of information and is primarily used with HTTP. Much like CORBA, interoperability is the big draw of SOAP, and (unlike CORBA) SOAP has the advantage of being simple to develop and test. The original stateless nature of SOAP limited its usage; however, with the advent of WS-RF (and other standards) SOAP is maturing into a general-purpose object protocol.

SPARQL The SPARQL Protocol and RDF Query Language is designed to allow for the querying and retrieval of documents across multiple unstructured data stores. The power of the system is that the distributed RDF documents (or other data stores) remain unchanged, but queries can be run across them, and so it fits well with a “bottom-up” approach. Such a unified approach to accessing information is required to make the semantic web (Web 3.0) a reality, and some implementations already exist.

UML The Unified Modeling Language formalizes visual representations for most aspects of software design. This formalization encompasses use cases, class structure, state transitions, sequences of method calls, and deployment scenarios.

URN A Uniform Resource Name is a type of URI (Uniform Resource Identifier). It is the logical counterpart to a URL, in that it provides the name of a resource rather than the exact location of a resource. A number of URN implementations are available, including LSIDs.

WS-* The WS-* are a series of specifications for adding functionality to SOAP. These extensions provide new functionality such as security, messaging, binary object attachment and state. These extensions generally involve the addition of information to the SOAP message (within the envelope). State information can be maintained between SOAP calls through the use of resource frameworks (e.g., WS-RF).

WSDL The Web Service Description Language provides a means to specify the interface exposed by a SOAP Web Service. The WSDL document can be automatically retrieved, and tools can be used to generate convenience classes for specific languages, so that no XML parsing code needs to be written by the developer. When writing a WSDL, a number of standards (e.g., WS-I) are available to ensure interoperability, typically through the use of profiles with literal/document “styles.”

XP Extreme Programming was largely a reaction to the “over specification” that was proposed through the RUP. XP advocated a number of approaches that were designed to ensure that more responsive (Agile), well-written code could be developed.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. The Institute for Systems Biology, Seattle, USA