JAMI: a Java library for molecular interactions and data interoperability
A number of different molecular interaction data download formats now exist, designed to allow access to these valuable data by diverse user groups. These formats include the PSI-XML and MITAB standard interchange formats developed by the Molecular Interaction workgroup of the HUPO-PSI, in addition to other, use-specific downloads produced by other resources. The onus is currently on the user to ensure that a piece of software is capable of reading and writing all necessary versions of each format. This problem may increase as data providers strive to meet ever more sophisticated user demands and data types.
A collaboration between EMBL-EBI and the University of Cambridge has produced JAMI, a single library to unify standard molecular interaction data formats such as PSI-MI XML and PSI-MITAB. The free, open-source JAMI library enables the development of molecular interaction computational tools and pipelines without the need to produce different versions of software to read different versions of the data formats.
Software and tools developed on top of the JAMI framework are able to integrate and support both PSI-MI XML and PSI-MITAB. The use of JAMI avoids the requirement to chain conversions between formats in order to reach a desired output format and prevents code and unit test duplication as the code becomes more modular. JAMI’s model interfaces are abstracted from the underlying format, hiding the complexity and requirements of each data format from developers using JAMI as a library.
Keywords: Molecular interactions; Protein-protein interaction; Protein complexes; Data standards; HUPO-PSI; PSI-MI
HUPO: Human Proteomics Organization
IMEx Consortium: International Molecular Exchange Consortium
JAMI: Java Molecular Interaction framework
PSI: Proteomics Standards Initiative
Molecular interaction data are crucial to the study and understanding of the molecular biology of a cell. These data are large and complex, but the creation of a standardised data interchange format (PSI-MI XML) allowed easier access, enabling users to merge data from disparate resources and encouraging the development of tools and software that facilitated network visualisation and analysis. Version 1.0 of the format only allowed a relatively simple description of protein interactions but, as the data grew, limitations of the original format were identified, and an updated version, PSI-MI XML 2.5, was released in 2007. It allows the description of interactions between molecules other than proteins, and enables the detailed capture of both experimental context and the constructs used in each assay. This version of the interchange format is still widely used to capture experimental data, but the need to describe more abstract concepts has recently resulted in the release of PSI-MI XML 3.0. PSI-MI XML 3.0 allows the capture of details of cooperative or allosteric binding sites, the composition of protein complexes taken from multiple publications, and more complex data types such as dynamic interaction networks that change with time or with concentration of agonist. A simpler tab-delimited representation of molecular interaction data has also been available since 2007, but this has also grown in complexity in response to user requests, and MITAB 2.5, 2.6 and 2.7 are now all available. Additionally, at the 2017 HUPO-PSI workshop, the Molecular Interaction workgroup decided that the newly developed MI-JSON will be its recommended protocol for serving interaction data to web pages and visualisation tools.
PSI-MI XML, MITAB and MI-JSON are all capable of holding the same data, in differing degrees of detail, and are all annotated using a single shared controlled vocabulary, but exist to serve different user groups. The XML format is largely used by software developers and database managers, MITAB by biologists interested in a simple binary representation, and MI-JSON for visual representation. Updating any data format necessitates changes to many dependent systems. A broad range of software, including curation, editing, export, visualisation, validation and analysis packages, uses the PSI-MI formats to access and manipulate the data and consequently needs to be updated with every format update. Format updates add complexity to existing software packages, as the programs need to be extended to utilise the new version whilst continuing to support existing, widely used versions. This software and these standards are used by a diverse group of organisations with different levels of resources, ranging from PhD students in small research groups to data pipeline specialists in pharmaceutical or bioinformatics companies. Some groups may end up using legacy standards and software for many years simply because they do not possess the skills, time, or budget to update their software.
Supporting such diverse needs is time and resource intensive, yet securing funding for software maintenance is challenging. Each new data format is useful and must be maintained, but each update generates a new library, with duplicated code, requiring parallel testing and generating its own bugs. In summary, while new formats meet genuine need, they also result in an expensive cascade of changes to software and tools.
The JAMI (Java Molecular Interaction framework) library was developed, using an object-orientated approach, to address these concerns. JAMI can import, inter-convert and re-export molecular interaction data in a variety of formats and versions. The software has been designed so that modules to read and write new format types can easily be written and added to the library, thus providing a single change-resilient software component to handle all molecular interaction data. It is generally intended that the JAMI library will be used within a Java application, rather than being made available as an API, but users could develop a programmatic interface using the JAMI framework, if required.
Given the change-resilient remit of the JAMI framework, it was necessary to ensure that JAMI can handle multiple use cases. It needs to concurrently support legacy data models, contemporary data models, and any changes required in the future as interaction data become ever more sophisticated. For this reason, the data model was deliberately kept flexible enough to expand, with all classes being interfaces with a default implementation. Implementations may be added, edited or removed if necessary over time. The main entities in the data model include Complex, Interaction, Entity, Participant and Publication, each an interface with a default implementation and format-specific overridden behaviours. For example, PSI-XML 2.5 allowed experiment descriptions to contain either a cross-reference to a Publication object or a directly embedded list of attributes such as author and journal, whereas in XML 3.0 it is possible to associate both of these data members with an experiment. Since the Publication and XML export classes are only interfaces, exporting the two different types of Publication can be handled by the same software, with implementation classes reconciling the two XML versions.
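The interface-plus-default-implementation pattern described above can be illustrated with a minimal sketch. It is not JAMI's actual code: the names `DefaultPublication`, `ExperimentWriter`, `Xml25StyleWriter` and `Xml30StyleWriter`, the placeholder values, and the simplified text output (plain strings rather than real PSI-MI XML) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Small driver showing that the same model object feeds both writers.
final class PublicationSketch {
    public static void main(String[] args) {
        // Placeholder values, not a real publication.
        Publication publication = new DefaultPublication(
                "0000000", "Example Journal", Arrays.asList("Author A", "Author B"));
        List<ExperimentWriter> writers =
                Arrays.asList(new Xml25StyleWriter(), new Xml30StyleWriter());
        for (ExperimentWriter writer : writers) {
            System.out.println(writer.describe(publication));
        }
    }
}

// Hypothetical model interface: callers depend on this, never on a file format.
interface Publication {
    String getPubmedId();      // cross-reference; may be null
    String getJournal();       // inline attribute; may be null
    List<String> getAuthors(); // inline attribute; may be empty
}

// Default implementation, used when no format-specific behaviour is needed.
class DefaultPublication implements Publication {
    private final String pubmedId;
    private final String journal;
    private final List<String> authors;

    DefaultPublication(String pubmedId, String journal, List<String> authors) {
        this.pubmedId = pubmedId;
        this.journal = journal;
        this.authors = Collections.unmodifiableList(new ArrayList<>(authors));
    }

    @Override public String getPubmedId() { return pubmedId; }
    @Override public String getJournal() { return journal; }
    @Override public List<String> getAuthors() { return authors; }
}

// Hypothetical export interface with one implementation per format version.
interface ExperimentWriter {
    String describe(Publication publication);
}

// 2.5-style behaviour: a cross-reference OR inline attributes, never both.
class Xml25StyleWriter implements ExperimentWriter {
    @Override
    public String describe(Publication p) {
        if (p.getPubmedId() != null) {
            return "xref=" + p.getPubmedId();
        }
        return "journal=" + p.getJournal()
                + " authors=" + String.join(",", p.getAuthors());
    }
}

// 3.0-style behaviour: cross-reference and inline attributes together.
class Xml30StyleWriter implements ExperimentWriter {
    @Override
    public String describe(Publication p) {
        return "xref=" + p.getPubmedId()
                + " journal=" + p.getJournal()
                + " authors=" + String.join(",", p.getAuthors());
    }
}
```

Because calling code only sees `Publication` and `ExperimentWriter`, supporting a further format version in this sketch would mean adding one more writer implementation rather than changing the model or its callers.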
When included as a library in bioinformatics software, JAMI hides the complexity of supporting multiple data formats. It facilitates data import, integration and analysis, simplifying software development by offering a single API. JAMI also eases the creation of new interchange formats, such as JSON-LD or RDF. Additional formats can be added once to JAMI and are then supported in multiple software packages with little effort. Similarly, JAMI prevents code duplication: software packages that draw on JAMI share the same parsing code, so less effort is put into the development of multiple XML/MITAB parsing modules.
JAMI-Core and JAMI-Commons
JAMI-Core forms the foundation of JAMI, comprising standalone modules for each data input/output type that JAMI is capable of handling, alongside appropriate listeners and factories to instantiate JAMI’s classes. For software using JAMI internally, JAMI-Commons serves as the default entry point, functioning as a thin code-loading helper to import the XML and MITAB input/output modules, along with all relevant code dependencies. Dependencies are limited where possible to prevent potentially disruptive updates from third parties.
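The combination of per-format modules, factories and listeners might be pictured with the following sketch. It is an illustration only and does not reproduce JAMI's classes: `ReaderFactory`, `InteractionReader`, `ParseListener` and the toy two-column `TabReader` are hypothetical names, and the toy reader is far simpler than a real MITAB parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical factory: each format module registers itself under a name,
// so callers ask for a format instead of instantiating parsers directly.
final class ReaderFactory {
    private final Map<String, Supplier<InteractionReader>> modules = new HashMap<>();

    void register(String format, Supplier<InteractionReader> module) {
        modules.put(format, module);
    }

    InteractionReader create(String format) {
        Supplier<InteractionReader> module = modules.get(format);
        if (module == null) {
            throw new IllegalArgumentException("No module registered for format: " + format);
        }
        return module.get();
    }

    public static void main(String[] args) {
        ReaderFactory factory = new ReaderFactory();
        factory.register("tab", () -> new TabReader(
                interaction -> System.out.println("parsed " + interaction)));
        factory.create("tab").read("P1\tP2\nP3\tP4\n");
    }
}

// Hypothetical reader interface shared by all format modules.
interface InteractionReader {
    List<String> read(String rawText);
}

// Hypothetical listener notified each time an interaction is parsed.
interface ParseListener {
    void onInteractionParsed(String interaction);
}

// Toy module: one binary interaction per line, two tab-separated identifiers.
final class TabReader implements InteractionReader {
    private final ParseListener listener;

    TabReader(ParseListener listener) {
        this.listener = listener;
    }

    @Override
    public List<String> read(String rawText) {
        List<String> interactions = new ArrayList<>();
        for (String line : rawText.split("\n")) {
            String[] columns = line.split("\t");
            if (columns.length < 2) {
                continue; // skip blank or malformed lines
            }
            String interaction = columns[0] + "-" + columns[1];
            interactions.add(interaction);
            listener.onInteractionParsed(interaction);
        }
        return interactions;
    }
}
```

In this arrangement a new format is supported by registering one more module with the factory; code that consumes `InteractionReader` does not change.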
JAMI-Core is also the home of code that handles spoke and matrix expansion. Spoke or matrix expansion converts interactions with more than two participants into multiple binary interactions, which is frequently needed because many tools only operate on binary interactions.
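The two expansion strategies can be sketched as follows. This is a minimal illustration rather than JAMI's implementation: participants are plain strings, the class name `ExpansionSketch` is hypothetical, and the spoke model here simply assumes that the first participant in the list is the bait.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExpansionSketch {

    // Spoke expansion: pair the bait (assumed to be the first participant)
    // with every other participant, giving n - 1 binary interactions.
    static List<String[]> spoke(List<String> participants) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < participants.size(); i++) {
            pairs.add(new String[] {participants.get(0), participants.get(i)});
        }
        return pairs;
    }

    // Matrix expansion: pair every participant with every other participant,
    // giving n * (n - 1) / 2 binary interactions.
    static List<String[]> matrix(List<String> participants) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < participants.size(); i++) {
            for (int j = i + 1; j < participants.size(); j++) {
                pairs.add(new String[] {participants.get(i), participants.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> complex = Arrays.asList("A", "B", "C", "D");
        System.out.println(spoke(complex).size());  // prints 3
        System.out.println(matrix(complex).size()); // prints 6
    }
}
```

For the four-participant example, spoke expansion yields A-B, A-C and A-D, while matrix expansion additionally yields B-C, B-D and C-D.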
Data input/output types
Input and output formats currently provided by the JAMI library
PSI-XML (2.5 and 3.0)
JAMI’s enricher package allows sparsely annotated interactors to be enriched with additional data accessed from external sources. An example of this, shown in blue at the top of Fig. 1, is fetching additional data from UniProtKB to enrich protein interactors and from ChEBI to enrich small molecules.
The enrichment package communicates with external web services via a suite of modular web service-specific fetcher packages within the JAMI-Bridges package. Many of these sources are biomedical ontologies and a module (JAMI-OLS) has been developed to import data via the Ontology Look-up Service (OLS), thus giving the user access to the 204 ontologies (December 2017) available through the OLS API. Separating the fetcher/bridges packages from the enricher provides an abstraction layer that ensures external changes, such as adding or removing enrichment sources, cannot unintentionally affect the entire software architecture.
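The separation between enricher and fetcher could look something like the sketch below. All names (`Interactor`, `InteractorFetcher`, `InMemoryFetcher`, `MinimalEnricher`) and the accession and annotation values are hypothetical, the in-memory map stands in for a real web service client, and the policy of only filling empty fields is an assumption made for this example rather than a description of JAMI's enrichment rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical enricher: depends only on the fetcher interface, so sources
// can be added, swapped or removed without touching this class.
final class MinimalEnricher {
    private final InteractorFetcher fetcher;

    MinimalEnricher(InteractorFetcher fetcher) {
        this.fetcher = fetcher;
    }

    // Assumed policy: fill in missing fields, never overwrite existing ones.
    void enrich(Interactor target) {
        fetcher.fetchByAccession(target.accession).ifPresent(fetched -> {
            if (target.fullName == null) {
                target.fullName = fetched.fullName;
            }
            if (target.organism == null) {
                target.organism = fetched.organism;
            }
        });
    }

    public static void main(String[] args) {
        InMemoryFetcher source = new InMemoryFetcher();
        source.add(new Interactor("ACC1", "Example protein", "Example organism"));

        Interactor sparse = new Interactor("ACC1", null, null);
        new MinimalEnricher(source).enrich(sparse);
        System.out.println(sparse.fullName + " / " + sparse.organism);
    }
}

// Minimal mutable interactor; real models carry far more annotation.
final class Interactor {
    final String accession;
    String fullName;
    String organism;

    Interactor(String accession, String fullName, String organism) {
        this.accession = accession;
        this.fullName = fullName;
        this.organism = organism;
    }
}

// Hypothetical bridge interface: one implementation per external source.
interface InteractorFetcher {
    Optional<Interactor> fetchByAccession(String accession);
}

// Stand-in for a web service client, backed by a map instead of HTTP calls.
final class InMemoryFetcher implements InteractorFetcher {
    private final Map<String, Interactor> records = new HashMap<>();

    void add(Interactor record) {
        records.put(record.accession, record);
    }

    @Override
    public Optional<Interactor> fetchByAccession(String accession) {
        return Optional.ofNullable(records.get(accession));
    }
}
```

Keeping the fetcher behind an interface also makes the enricher straightforward to test offline, since a stub source can replace the remote service.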
The Proteomics Standards Initiative (PSI) has developed and actively promotes the use of open standard data formats and has a proven track record in developing robust, pluggable programming interfaces to address the issue of data being made available in a range of formats.
The JAMI library has been created to address this problem in the field of molecular interactions and has already been adopted by a number of resources which process, import, export and/or visualise interaction data. We describe below a number of implementations and use these to exemplify the utility and flexibility of the JAMI framework.
The biological data warehouse InterMine provides another concrete example of how JAMI can be used to input and output molecular interaction data. InterMine is organism-agnostic and open source, designed to consolidate discrete data sources with varied data formats into a single database. Data can be accessed via a web application interface or more directly via APIs, with clients available in multiple languages. Efficient and maintainable data import and export is therefore of particular significance to InterMine.
As detailed above, the specialist ComplexViewer visualisation software requires a custom MI-JSON data format for input. Once again, InterMine uses the JAMI library to perform this task, by querying the database for data related to the interaction and transforming it to match the JAMI data model. JAMI is then able to export an MI-JSON file to ComplexViewer.
Chord interaction diagram
HUPO PSI-MI semantic validator
Molecular interaction files can be rapidly curated, curated to MIMIx specifications, or contain the detailed information captured by members of the IMEx Consortium but, in all cases, they need to be semantically valid to enable data exchange and merger. The PSI-MI semantic validator (http://www.ebi.ac.uk/intact/validator) not only checks the XML syntax of a submitted file but also enforces rules regarding the use of ontology classes or CV terms, by checking that the terms exist in the resource and that they are used in the correct location of a document. Previously, the validator was only able to validate PSI-PAR and PSI-XML 2.5, but using JAMI it can now also validate MITAB 2.5, 2.6 and 2.7 and PSI-XML 3.0.
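The two semantic checks described above (does the term exist, and is it used in an allowed location) can be sketched in a few lines. This is not the validator's actual rule engine: the class `CvRuleSketch`, the term identifiers such as `TERM:0001`, the location names and the message wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CvRuleSketch {
    // Known terms mapped to the document locations where each may appear.
    private final Map<String, Set<String>> allowedLocations = new HashMap<>();

    void allow(String termId, String location) {
        allowedLocations.computeIfAbsent(termId, id -> new HashSet<>()).add(location);
    }

    // Each usage is a {termId, location} pair; returns one message per violation.
    List<String> validate(List<String[]> usages) {
        List<String> messages = new ArrayList<>();
        for (String[] usage : usages) {
            String termId = usage[0];
            String location = usage[1];
            Set<String> locations = allowedLocations.get(termId);
            if (locations == null) {
                messages.add("unknown term " + termId);
            } else if (!locations.contains(location)) {
                messages.add("term " + termId + " not allowed in " + location);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        CvRuleSketch rules = new CvRuleSketch();
        rules.allow("TERM:0001", "interactionDetectionMethod"); // hypothetical rule

        List<String[]> usages = Arrays.asList(
                new String[] {"TERM:0001", "interactionDetectionMethod"}, // valid
                new String[] {"TERM:0001", "interactorType"},             // wrong place
                new String[] {"TERM:9999", "interactorType"});            // not in CV
        for (String message : rules.validate(usages)) {
            System.out.println(message);
        }
    }
}
```

A file that is syntactically valid can still fail these checks, which is why semantic validation is run in addition to schema validation.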
Agile protein Interactomes DataServer (APID)
APID (Agile Protein Interactomes DataServer) is a database that provides a comprehensive collection of protein interactomes for more than 400 organisms based on the integration of known, experimentally validated protein-protein physical interactions from several primary databases, e.g. BIND, BioGRID, DIP, HPRD, IntAct and MINT. Construction of the interactomes is done with a methodological approach to report quality levels and coverage over the proteomes for each organism included. The APID algorithm uses a protocol based on JAMI to process all of the PSI-XML formatted data and then uses the JAMI-generated interaction objects in all the workflows which have been subsequently implemented by this resource. It also takes advantage of the ability of JAMI to expand complexes when multiple interactions are detected.
The current implementation of JAMI provides the molecular interactions community with a powerful library to enable the development and long-term maintenance of third-party tools. It enables the PSI-MI formats to be updated and refreshed in response to new data types and resources, and is capable of reading and writing all existing versions of the PSI-MI XML and MITAB formats and methodologies without obsoleting existing tools and resources. The PSI formats, however, are not the only mechanisms available for exchange of molecular interaction data, and we intend to use the robust architecture of JAMI to provide additional readers and writers for XGMML, BioPAX and RDF. Additionally, we will use the JAMI framework to improve users’ ability to merge data from different resources, improving the existing MImerge software, and use the ability to generate a standardised MI-JSON file to improve front-end technologies, in particular data visualisation.
JAMI is proving its value as a framework to remove the need for redundant software development and testing with every release of a new or updated molecular interaction data standard. We intend to continue its development, extend its functionality and make it applicable to a wider set of use cases. The further development of JAMI is an open source project coordinated through the ‘MICommunity’ GitHub organisation (https://github.com/MICommunity) with documentation available at https://github.com/MICommunity/psi-jami/tree/master/docs and also code examples at https://github.com/MICommunity/psi-jami/tree/master/jami-examples. Please join us if you are interested in its future development.
Availability and requirements
Operating system(s): Platform independent.
Programming language: Java.
Any restrictions to use by non-academics: None.
MS, MK, AS, JS, JH and YY were funded by BBSRC MIDAS grant (BB/L024179/1); this grant provided the funds for the design of JAMI and its implementation by the IntAct, Complex Portal and InterMine data resources. CC and JR were funded by the Wellcome Trust [103139, 063412, 203149] for the design of the ComplexViewer.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study. All software is available from https://github.com/MICommunity/psi-jami
MS(D), ND-T, MK, AS, JS, JH and YY designed the JAMI library and implemented it in the IntAct and InterMine resources; DA-L, JDLR, AC, CC and JR updated and designed tools to use the new library; SO and BM provided use cases and example files. JS and SO drafted the manuscript with input from all authors; YY designed the figures. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.