A basis for a visual language for describing, archiving and analyzing functional models of complex biological systems
We propose that a computerized, internet-based graphical description language for systems biology will be essential for describing, archiving and analyzing complex problems of biological function in health and disease.
We outline here a conceptual basis for designing such a language and describe BioD, a prototype language that we have used to explore the utility and feasibility of this approach to functional biology. Using example models, we demonstrate that a rather limited lexicon of icons and arrows suffices to describe complex cell-biological systems as discrete models that can be posted and linked on the internet.
Given available computer and internet technology, BioD may be implemented as an extensible, multidisciplinary language that can be used to archive functional systems knowledge and be extended to support both qualitative and quantitative functional analysis.
Keywords: Icon; Ketogenic Diet; System Biology Markup Language; Visual Language; Functional Entity
Standard graphical representations of complex systems have been developed for many disciplines in order to communicate, archive and analyze systems knowledge. Electronic circuit diagrams and architectural plans, for instance, can be created, read and analyzed by knowledgeable persons, yet there is no such common graphical language for describing functional systems in biology. A variety of representations are used in print or online to archive knowledge in particular domains such as metabolic pathways [1,2], gene networks, signaling networks [4,5], and molecular interactions [6,7], yet are not sufficiently standardized to represent cross-disciplinary systems - the interactions of gene expression and metabolism at the inter- and intra-molecular levels, for instance. To describe such multidisciplinary biological problems at multiple levels of abstraction (for example, from intramolecular to disease phenotype), biologists customarily resort to informal cartoon diagrams which, although expressive, are often ambiguous and must be annotated to be interpreted properly.
Here we suggest that a standardized visual biological description language would provide more readable and less ambiguous communication and, with computational implementation, provide a basis for distributed searchable archives of functional (as opposed to structural) knowledge, and serve as a 'computer-aided design' (CAD) language for simulating and analyzing biological systems. To explore these possibilities, we are developing a prototype biological description language, BioD, as a platform to test its conceptual basis, explore its utility and identify key issues surrounding its implementation.
Results and discussion
Functional properties: a basis for a biological description language
If the goal were simply to standardize the graphics of cartoon diagrams, one could collect and standardize sets of icons and arrows for use as 'clip art' in general-purpose graphics software or as templates in diagramming software such as Visio™ (Microsoft™; for Windows™) or TopDown™ (Kaetron™ Software; for MacOS™ and Windows™). To build a sophisticated software editor for a visual language and, especially, to design the language to support computer archiving and analysis, one must, however, establish deeper foundations for creating, organizing and using graphic elements.
As a starting point, consider the common metaphor of biological systems as 'circuits'. Indeed, viewing metabolic pathways as electrical circuits is particularly apt as, in each case, a limited set of entity classes (metabolites versus resistors, capacitors and inductors) are represented by icons that are linked by lines (or arrows) representing interactions between iconified entities. The benefit of such pathway and circuit diagrams is, of course, that they help us infer and track how a change in a 'state property' (for example, metabolite concentration versus electrical voltage) of one icon acts via an 'action property' (metabolic flux rate versus electrical current) on the state properties of linked icons. The problem for biologists is that metabolite concentrations and flux rates, while absolutely fundamental to many biological systems, are not the only functional properties that need to be considered.
Just as architectural plans must display and distinguish several interacting circuit systems with different properties (such as power, water, drainage, ventilation, telephone, security, and digital cable), diagrams of multidisciplinary biological systems must distinguish multiple functions (metabolic pathways, enzyme kinetics, ion fluxes, fluid flow) characterized by different properties (concentration, catalytic activity, voltage, pressure, respectively). Thus, the key challenge for describing, archiving and analyzing complex biological systems is not simply to collect expressive icons and arrows but to establish a system of expressive icons and arrows that consistently and unambiguously represents the functional entities and their specific functional properties. Does an arrow increase (or decrease) the concentration or the catalytic activity of an enzyme icon? Can we consistently represent the separate effects of transmembrane ion flow on both ion concentration and transmembrane voltage?
Functional interactions between building-block icons, whether aggregated as compound icons or not, are represented by 'action' arrows (Figure 1, upper right panel) that are designed to minimize the functional ambiguity of informal cartoon diagrams. It is required, for instance, that arrows consistently distinguish: which (if there is more than one) of an icon's functional properties are affected either by context (as in cartoons) or by appearance (for example, double-weight arrows change concentration properties, by chemical reactions or transport, whereas single-weight arrows activate or inhibit activity properties); and how the property is affected (for example, filled heads mean 'increase', open heads mean 'decrease'). As for the lexicon of icons, the lexicon of action arrows in Figure 1 could be extended to include other forms to represent, say, thermal fluxes (which change temperature properties) and fluid flows (which change compartmental volume and pressure properties).
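The arrow conventions described above can be illustrated with a minimal Python sketch. It assumes only the two examples given in the text (line weight selects the affected property, head style selects the direction of change); the class and attribute names are hypothetical and are not part of BioD itself, and other property types such as occupancy are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Weight(Enum):
    SINGLE = "single"   # acts on an activity property (activate/inhibit)
    DOUBLE = "double"   # acts on a concentration property (reaction/transport)


class Head(Enum):
    FILLED = "filled"   # 'increase'
    OPEN = "open"       # 'decrease'


@dataclass(frozen=True)
class ActionArrow:
    """Hypothetical record for one action arrow between two icons."""
    source: str
    target: str
    weight: Weight
    head: Head

    def affected_property(self) -> str:
        # Which functional property of the target icon is changed.
        return "concentration" if self.weight is Weight.DOUBLE else "activity"

    def sign(self) -> int:
        # +1 for 'increase', -1 for 'decrease'.
        return 1 if self.head is Head.FILLED else -1

    def describe(self) -> str:
        verb = "increases" if self.sign() > 0 else "decreases"
        return f"{self.source} {verb} the {self.affected_property()} of {self.target}"


inhibit = ActionArrow("p15 site", "kinase site", Weight.SINGLE, Head.OPEN)
print(inhibit.describe())
```

Because the property and the direction are derived from the arrow's appearance alone, the same arrow cannot be read two ways, which is the ambiguity the conventions are meant to remove.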
In the Cdk4 kinase model of Figure 2 we use single-weight arrows to represent how the activity of the kinase site is inhibited by occupancy of the p15 binding site and is activated by occupancy of the phosphorylation site. A double-weight, solid-headed 'phosphorylate' arrow represents the transfer of phosphate groups by the kinase site of Cdk4 to increase occupancy of the retinoblastoma protein (RB) phosphorylation site. A phosphatase molecular site (a square with open-style P) represents how an unspecified phosphatase activity reduces the occupancy of the RB phosphorylation site. A dashed, double-weight 'occupy' arrow represents the binding of p15 to its binding site on Cdk4 and the resulting increase in the occupancy of that site. In this manner the diagram can represent both intra- and inter-molecular site-site interactions while distinguishing activation/inhibition (as by conformational changes) from chemical reactions between species.
Consistent coding of qualitative effects as 'increase' or 'decrease' arrows allows qualitative 'what if' experiments where an initiating 'event' - an increase or decrease of a functional property of one icon, say - can be tracked through a model (or a network of models, see below) as a chain of events. Such event chains can be displayed in 'event diagrams' to represent some events (but not necessarily all events) as chains of 'event' icons (each representing a change of a specific functional property) linked by 'cause' arrows. For instance, the event diagram in Figure 2b displays the consequences (as event icons) of increasing the concentration of p15 in the model in Figure 2a. In complex systems, such consistent qualitative representations can reveal previously unappreciated causal links - redundant parallel pathways or feedback loops - in models built solely from pairwise interactions of many functional parts.
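A qualitative 'what if' experiment of this kind can be sketched as sign propagation over a directed graph. The Python sketch below is an illustration under stated assumptions, not the BioD implementation: the property names are our shorthand for the Cdk4 interactions described in the text, the figure itself is not reproduced, and a property reached by two paths simply keeps the first change that arrives (conflicting influences are not resolved).

```python
from collections import deque

# (source property, target property, sign): +1 = 'increase', -1 = 'decrease'.
# Names are illustrative shorthand for the Cdk4 model described in the text.
ARROWS = [
    ("p15.concentration", "Cdk4.p15_site.occupancy", +1),
    ("Cdk4.p15_site.occupancy", "Cdk4.kinase_site.activity", -1),
    ("Cdk4.phospho_site.occupancy", "Cdk4.kinase_site.activity", +1),
    ("Cdk4.kinase_site.activity", "RB.phospho_site.occupancy", +1),
    ("phosphatase.activity", "RB.phospho_site.occupancy", -1),
]


def event_chain(arrows, start, direction):
    """Propagate an initiating event breadth-first.

    Returns a list of (cause, effect, change) tuples, where change is
    +1 (increase) or -1 (decrease) of the effect property.
    """
    outgoing = {}
    for src, dst, sign in arrows:
        outgoing.setdefault(src, []).append((dst, sign))

    change_of = {start: direction}
    events = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dst, sign in outgoing.get(node, []):
            if dst in change_of:
                continue  # already reached; conflicts are not resolved here
            change_of[dst] = change_of[node] * sign
            events.append((node, dst, change_of[dst]))
            queue.append(dst)
    return events


events = event_chain(ARROWS, "p15.concentration", +1)
for cause, effect, change in events:
    print(f"{cause} -> {effect}: {'increase' if change > 0 else 'decrease'}")
```

Each returned tuple corresponds to one 'event' icon joined to its predecessor by a 'cause' arrow, so the list can be read as a linear event diagram.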
As we will discuss below, formalizing a functional descriptive language for biology can offer additional benefits for exploring and understanding the function of complex biological systems if the functional structures and processes can be adequately and unambiguously described.
Extending BioD to other domains
A major criterion for the utility of any technical language is whether it can fully express concepts within a domain of knowledge; that is, is the language 'complete'? For a domain such as biology which is almost limitless in terms of physical (not to mention psychic, social and evolutionary) phenomena, it is unlikely that any single language will suffice for all purposes - nor could such a language be proved complete by some inductive process. Rather, in our approach to designing and developing BioD, we have begun with basic cell-biological models as test beds for BioD's conceptual framework (that is, of functional properties, icons and arrows), its lexicon and its icon-arrow linking rules (tantamount to a 'graphical grammar'). We have iteratively tested and modified the language to describe an expanding array of biological functions that now includes online models of: the cell-cycle G1/S checkpoint machinery as it may be controlled by ion channel activation; and the metabolic, signaling and neuronal pathways by which the ketogenic diet may affect seizure susceptibility. In this early phase of design we expect modifications in basic as well as superficial features (as for any software design process), but we also expect that the basic precepts of the functional-property approach can be extended to other phenomenological domains such as fluid flow, heat exchange, neural circuitry, and so forth. To demonstrate the progress of language development within the cell-biological domain, we will describe and discuss the G1/S checkpoint control model.
Extending the lexicon to describe G1/S checkpoint machinery
Extensions for compartmental transport, gene expression and membrane biophysics
The example models in Figures 3,4,5,6 demonstrate that a functional description language can be extended by defining new functional properties, associating the properties with existing or new icons, and using the new icons to represent a wide variety of biological phenomena. In creating the BioD lexicon and drawing conventions we have opted for simple icons and arrows (more elaborate and expressive graphics can readily be imagined) so that diagrams can be easily drawn by hand or with a variety of general purpose computer graphics applications. Creating and editing models is greatly facilitated, however, by applications such as TopDown™ (as used here) or Visio™ that provide 'templates' of reusable icons and arrows and that also maintain icon-icon arrow links as icons are moved about in the field of the diagram. Even at this stage of development, BioD diagrams created by these general graphic and diagramming tools can be readily imported to available web page design tools to create, edit, link and post models on the internet (see sample BioD model sites [8,9]).
Design considerations for a BioD editor
To fully realize the advantages of a standardized visual language, however, custom editors will be required for creating new icon and arrow objects, for maintaining icon and arrow lexicons, and for creating and editing models. The key first step in developing editors (as well as archiving and analysis software, see below) is to develop standardized, platform-independent computer representations of core BioD language elements (functional properties, icons, arrows and models) in object-oriented programming (OOP) terms so that editors and browsers can be developed on a number of platforms.
In OOP parlance, for instance, BioD icons and arrow objects would 'know' how to draw themselves, 'inherit' functional and graphical properties from 'parent' icons and arrows, and know how to link to each other. The tasks of a BioD editor, as we envisage, would be to create icons and arrows, organize them into lexicons and palettes, and allow users to draw and link them into models. A particularly attractive, but considerably more sophisticated, feature would allow models to be built and formatted automatically according to pairwise functional interactions as available in metabolic pathway, gene network and protein interaction databases.
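The OOP behaviour described above (objects that draw themselves, inherit properties from parent icons, and link to each other) might look roughly like the following Python sketch. All class names, shapes and property assignments are hypothetical choices made for illustration; the text does not specify a class hierarchy.

```python
class Icon:
    """Hypothetical base class for a BioD icon."""
    functional_properties = ("concentration",)
    shape = "circle"

    def __init__(self, name):
        self.name = name
        self.links = []  # outgoing (action, target icon) pairs

    def draw(self):
        # A real editor would render graphics; here we return a text stand-in.
        return f"<{self.shape}:{self.name}>"

    def link_to(self, other, action):
        self.links.append((action, other))


class MolecularSite(Icon):
    """Child icon: overrides the graphical and functional defaults."""
    functional_properties = ("occupancy",)
    shape = "square"


class EnzymeSite(MolecularSite):
    """Grandchild icon: inherits the square shape, adds an activity property."""
    functional_properties = MolecularSite.functional_properties + ("activity",)


kinase = EnzymeSite("Cdk4 kinase site")
rb_site = MolecularSite("RB phosphorylation site")
kinase.link_to(rb_site, "phosphorylate")
print(kinase.draw(), "->", rb_site.draw())
```

An editor built on such classes could populate its palettes by enumerating subclasses, so that adding a new icon type to the lexicon requires only a new class definition.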
Model-model links for archiving and searching system models
A standardized biological description language that is formalized in terms of computational and not just graphical objects offers two very attractive benefits. First, it would be possible to build models of very large systems by linking individual submodels into distributed model networks on the internet. Second, it would be possible to treat such model networks as searchable archives of functional knowledge that can be queried for specific functional relationships.
The first possibility can be illustrated by linking models (see Figures 3,4,5) into networks using only two kinds of model-model links: 'action' and 'identity' links. Action links allow a functional property of an icon in one model to act on a functional property of an icon or action in another model, as if in a single model. For instance, an action link (iconified as a wall plug) allows the calcium-dependent activation of calmodulin in Figure 4 to inhibit the export of cyclin D1 from the nucleus in Figure 3. Identity links, on the other hand, allow models to share a common icon and its properties. For instance, an identity link (iconified as an = sign) establishes that the cyclin D1 concentration in Figure 2 represents the same cyclin D1 concentration as in the nucleus in Figure 3. Additional examples of model-model links can be seen in a network of models created to explain how the ketogenic diet might affect seizure activity.
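The two link types can be sketched in Python as operations that combine separate arrow lists into one network. This is a minimal sketch under assumptions: the model names (`fig2`, `fig3`, `fig4`), the property names and the arrows inside each model are placeholders invented for illustration, since the figures are not reproduced here; only the two cross-model links follow the examples in the text.

```python
def merge_models(models, action_links, identity_links):
    """Combine per-model arrows into one list over 'model:property' names.

    models:         {model name: [(source, target, sign), ...]}
    action_links:   [(qualified source, qualified target, sign), ...]
    identity_links: [(qualified name, qualified name), ...] naming one shared icon
    """
    alias = {}

    def canon(name):
        # Follow alias pointers to the representative name.
        while name in alias:
            name = alias[name]
        return name

    for a, b in identity_links:
        root_a, root_b = canon(a), canon(b)
        if root_a != root_b:
            alias[root_b] = root_a  # b now stands for the same property as a

    merged = []
    for model, arrows in models.items():
        for src, dst, sign in arrows:
            merged.append((canon(f"{model}:{src}"), canon(f"{model}:{dst}"), sign))
    for src, dst, sign in action_links:
        merged.append((canon(src), canon(dst), sign))
    return merged


# Placeholder submodels; the internal arrows are illustrative only.
models = {
    "fig2": [("cyclinD1.concentration", "Cdk4.cyclin_site.occupancy", +1)],
    "fig3": [("cyclinD1_export.rate", "nuclear_cyclinD1.concentration", -1)],
    "fig4": [("calcium.concentration", "calmodulin.activity", +1)],
}
# Action link: calmodulin activation (Figure 4) inhibits cyclin D1 export (Figure 3).
action_links = [("fig4:calmodulin.activity", "fig3:cyclinD1_export.rate", -1)]
# Identity link: cyclin D1 in Figure 2 is the nuclear cyclin D1 of Figure 3.
identity_links = [("fig3:nuclear_cyclinD1.concentration", "fig2:cyclinD1.concentration")]

network = merge_models(models, action_links, identity_links)
for arrow in network:
    print(arrow)
```

After merging, a change in calcium concentration in one submodel can be followed through to the cyclin D1 site in another, as if the three diagrams were a single model.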
The second possibility is to establish online models as searchable archives of functional, as opposed to structural, relationships. Just as modern bioinformatics provides immense databases and search tools for molecular structures and polymer sequences, we envisage that with suitable computational implementation, distributed online models may be queried not only for local pairwise interactions (such as site-site interactions) but also for long-distance chains of events (for example 'Find models where a potassium channel blocker affects cell-cycle progression'), a capability not available in any other approach to biological systems. Implementing this capability would depend, first, on encoding and storing BioD language elements and models as objects in XML (Extensible Markup Language). This is a standard language for describing and exchanging information on the internet and is rapidly beginning to fulfill its promise as a means of representing biological knowledge such as polymer sequences, cellular and anatomical structures and mathematical models of biological systems. Second, to link model elements into distributed networks on the internet, XML-encoded BioD objects and models can be linked using XLink (XML Linking Language) and queried for patterns of functional interactions using query languages such as Lore (Lightweight Object Repository) that have been developed for distributed, semistructured data.
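To make the idea of an XML-encoded, queryable model concrete, the following Python sketch serializes arrows to XML with the standard-library `xml.etree.ElementTree` module and answers a chain-of-events query by graph reachability. The element and attribute names (`model`, `arrow`, `source`, `target`, `effect`) are assumptions made for this example and do not represent a published BioD schema; XLink and Lore are not used, and the intermediate step and effect values in the sample data are placeholders rather than biological claims.

```python
import xml.etree.ElementTree as ET


def model_to_xml(name, arrows):
    """Serialize (source, target, effect) triples as a small XML document."""
    root = ET.Element("model", {"name": name})
    for source, target, effect in arrows:
        ET.SubElement(root, "arrow",
                      {"source": source, "target": target, "effect": effect})
    return ET.tostring(root, encoding="unicode")


def affects(xml_text, start, goal):
    """Return True if some chain of arrows leads from start to goal."""
    root = ET.fromstring(xml_text)
    outgoing = {}
    for arrow in root.findall("arrow"):
        outgoing.setdefault(arrow.get("source"), []).append(arrow.get("target"))

    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Placeholder chain: node names and effects are illustrative only.
doc = model_to_xml("demo", [
    ("K_channel_blocker", "K_channel", "decrease"),
    ("K_channel", "intermediate_step", "increase"),
    ("intermediate_step", "cell_cycle_progression", "increase"),
])
print(doc)
print(affects(doc, "K_channel_blocker", "cell_cycle_progression"))
```

A distributed archive would run the same kind of traversal across documents joined by model-model links rather than within a single file.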
Beyond description: model simulation and analysis
Visual languages in other fields [24,25] have been implemented as formal computer languages so that the diagrams are not only comprehensible to humans but can be analyzed by computers. This includes CAD systems for electronic circuit design and visual languages for computer programming (for example ProLog) as well as mathematical modeling software that translates diagrams into mathematical equations for quantitative analysis. Indeed, visual languages suitable for mathematical modeling in biology (for example Stella™, SAAM II™ [27,28] and KineCyte™ [29,30]) have contributed to the establishment of quantitative modeling as a 'gold standard' and critical element of biological analysis.
Despite the successes and value of quantitative analytical methods, it must be recognized that progress in the knowledge of biological systems proceeds largely on qualitative information and reasoning ('A binds to B.' 'If C increases then D decreases.') rather than quantitative analysis. This is due in large part to the huge range of experimental conditions and phenomena that are encountered and must be considered, and the great time, effort and expense required to generate consistent sets of quantitative system properties (concentrations, reaction rates and binding constants, for instance) many of which are experimentally inaccessible. In the face of such difficulties, biology, perhaps more than other disciplines, consists of a vast amount of qualitative information that forms the basis for speculation, reasoning and decision-making at all levels from the basic science laboratory to the clinic. To exploit this reservoir of qualitative biological knowledge, a variety of qualitative and hybrid qualitative/quantitative simulation and analysis tools that have been developed in the artificial intelligence community [31,32,33,34] have been adapted to the analysis of biological problems [35,36,37,38].
The advantage of hybrid qualitative/quantitative methods with BioD is that they allow simulation and analysis of the behavior of complex systems at the prevailing level of systems knowledge; BioD models may be brainstormed in descriptive terms early in the discovery process and then upgraded in stages as qualitative information and then quantitative data become available. Model development and testing can, therefore, become an integral part of early discovery and hypothesis testing all within the context of a single graphical description, rather than developing models simply as validations of well understood phenomena. In effect, BioD models would fill the descriptive and analytical gap between informal cartoons and fully quantitative mathematical models.
To accomplish this, BioD object data structures would have to be extended to first represent functional properties and then provide action arrows with logical rules or mathematical functions that determine how the states of linked icons affect each other. Within a BioD editor, such rules and functions would be selectable in preprogrammed form from context-sensitive menus of appropriate and specific action arrows (a feature in KineCyte™ simulation software; RainTown). Encoding BioD data structures in XML according to the Systems Biology Markup Language (SBML), a recently proposed 'open standard' for exchanging mathematical models of biological systems, would allow BioD models to be exported and quantitatively simulated in a variety of analytical software platforms.
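One way to picture an action arrow that carries first a logical rule and later a mathematical function is the Python sketch below. It is an illustration under assumptions: the class name, the sign-based qualitative rule, and the Michaelis-Menten rate law with its parameter values are all chosen for the example and are not taken from BioD, KineCyte™ or SBML.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuleArrow:
    """Hypothetical action arrow with an optional quantitative function."""
    source: str
    target: str
    sign: int                                        # +1 increase, -1 decrease
    rate: Optional[Callable[[float], float]] = None  # attached when data exist

    def qualitative(self, source_change: int) -> int:
        # Logical rule: the target changes in the arrow's direction,
        # flipped if the source itself decreases.
        return self.sign * source_change

    def quantitative(self, source_value: float) -> float:
        if self.rate is None:
            raise ValueError("no quantitative function attached yet")
        return self.rate(source_value)


def michaelis_menten(vmax: float, km: float) -> Callable[[float], float]:
    # Standard rate law v = vmax * s / (km + s); parameters are illustrative.
    return lambda s: vmax * s / (km + s)


# Stage 1: qualitative only.
arrow = RuleArrow("substrate.concentration", "product.concentration", +1)
print(arrow.qualitative(-1))   # source decreases, so target decreases

# Stage 2: the same arrow upgraded with a rate function.
arrow.rate = michaelis_menten(vmax=2.0, km=1.0)
print(arrow.quantitative(1.0))
```

The diagram element stays the same across both stages; only the rule attached to it changes, which is what allows a model to be upgraded without being redrawn.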
Dissociating the specification of functional properties and values from the graphical description of the model means that models can be created on a purely speculative basis, and then, as system knowledge matures, qualitative values and action rules can be selected (or specified) to support qualitative reasoning and hypothesis testing. As quantitative values and functions are warranted, they can then be used to constrain the qualitative predictions of the model.
We propose a conceptual basis for an extensible biological description language that is designed to be implemented as a visual computer language for describing, archiving and analyzing complex, multidisciplinary biological systems. We view this suggestion not as radical but as a natural, perhaps inevitable, extension to biomedicine of the visual computer language paradigm [24,25,40] that has been successfully applied in other technical disciplines. The major challenge is that biology offers perhaps some of the most complex and functionally diverse systems known.
With example models here and online, we have shown that a limited, but extensible, lexicon of icons and arrows is capable of describing complex systems that involve a range of cell-biological disciplines from gene regulation to the biophysics of ion transport. The value of such standardization is increased conciseness, readability and clarity for communication of knowledge about complex systems. Beyond simple graphic standardization, computer implementation of a graphic language for biology is not only feasible with currently available computer and internet technology, but could yield substantial benefits to biomedical education, investigation and industry. For instance, developing a common language and building accessible common models are critical elements in the 'blackboard' approach [41,42] for multidisciplinary collaborations on complex problems. We suggest that a standard language for building and posting cross-disciplinary functional models on the internet will be a necessary stage in the evolution of bioinformatics, as there is a growing need to express and understand whole-systems behavior and the complex systems that underlie disease. We further suggest that additional extensions to a standard graphical language can serve as a user interface for a variety of qualitative and quantitative simulation and analysis methods used in other disciplines.
The prototype language we present here is designed primarily for describing cell-biological systems as discrete networks of interacting functional objects at multiple conceptual levels. As such, it may prove cumbersome for describing massively parallel systems (such as the coupled expression of thousands of genes) or continuum systems (such as the wave propagation of cytoplasmic calcium). Even in these modeling domains, however, a standardized visual computational interface would allow modelers to describe discrete parts ('finite elements') of systems and/or to embed the systems in larger BioD models that both control and depend on the massive or continuum systems models.
How might BioD evolve and be extended from a core set of elements as described above to a more general biological description language? In our approach, we have followed the model of recent developments of internet protocols, operating systems and computer languages by envisioning that if a core computational framework can be established, it can be extended and implemented for a variety of methods, languages and operating systems. Thus we envision that, given the development of one (or more) editor applications, users could invent BioD language elements (functional properties, icons, arrows and models) as needed to describe specific domains. As domain lexicons and model-model networks expand, with inevitable conflicts and duplications, we would suggest more global standardization be negotiated according to 'open-source' mechanisms of distributed, cooperative development used for developing the Linux operating system and now being applied to SBML development.
By whatever mechanism, language development raises other key issues. How can one maintain the delicate balance of expressiveness and readability - can the language concisely describe all systems of interest while still being understandable without extensive training? When does the ability to derive more expressive icons begin to diminish readability as viewers are required to recognize and interpret subtle differences between similar icons? Even if successfully developed for selected domains and purposes, inevitably there will be edges to any language for describing all domains and for all analytical purposes - hence the continued appearance of new computer languages such as Java.
We have proposed that a generalized biological descriptive language akin to schematic diagramming languages in other disciplines is a necessary step in the evolution of functional bioinformatics. Toward this, we have developed a prototype language, BioD, that we have used to test the value of this approach and to explore the feasibility of computational implementation using object-oriented programming methods in the context of Internet-based communications. We suggest that a sufficiently formalized descriptive language built on the BioD concept of 'functional properties' can anchor a computational framework capable of supporting the archiving of extended, web-linked model networks and model analysis using hybrid qualitative reasoning and quantitative simulations.
We thank Jim Roberts, Dan Gottschling and Barbara Trask for helpful discussion during the preparation of this manuscript.
- 2.From sequence to function. An introduction to the KEGG project. [http://kegg.genome.ad.jp/kegg/kegg2.html]
- 5.Transpath home page. [http://220.127.116.11/index.html]
- 7.MODULES in extracellular proteins. [http://www.bork.embl-heidelberg.de/Modules/extra.html]
- 8.BioD model of G1/S checkpoint machinery. [http://www.rainbio.com/BioD_G1-S.html]
- 9.BioD model of the ketogenic diet and epilepsy. [http://depts.washington.edu/perc/infoscientists/ketogenicdiet/KD_Events.html]
- 15.Ptashne M: A Genetic Switch. Cambridge, MA: Cell Press; 1992, 20-28.
- 17.Extensible Markup Language (XML) 1.0 (Second Edition). [http://www.w3.org/TR/REC-xml]
- 20.The Physiome Markup Languages. [http://www.physiome.org.nz/]
- 21.Systems Biology Markup Language [SBML]. [http://www.cds.caltech.edu/erato/index.html]
- 22.XML Linking Language (XLink) Version 1.0. [http://www.w3.org/TR/xlink/]
- 31.Bonarini A, Maniezzo V: Integrating qualitative and quantitative modeling. Int J Expert Systems Res Appl. 1991, 4: 51-70.
- 32.Fishwick PA, Zeigler BP: Creating Qualitative and Combined Models with Discrete Events. Los Alamitos, CA: IEEE/Computer Society Press; 1991.
- 34.Kuipers BJ, Shults B: Reasoning in Logic About Continuous Systems. San Francisco, CA: Morgan Kaufman Publ.; 1994.
- 38.Thieffry D, Thomas R: Qualitative analysis of gene networks. Pac Symp Biocomput. 1998, 77-88.
- 39.KineCyte™ biological systems simulator. [http://www.rainbio.com/KineCyte.html]
- 40.Glasgow J, Narayanan NH, Chandrasekaran B: Diagrammatic Reasoning: Cognitive and Computational Perspectives. Cambridge, MA: The AAAI Press; 1995.
- 41.Corkill DD: Blackboard systems. AI Expert. 1991, 6: 40-47.
- 42.Buschmann F, Meunier R, Rohnert H, Sommerlad P, Stal M: Pattern-Oriented Software Architecture; A System of Patterns. New York: John Wiley and Sons; 1996.
- 43.DiBona C, Ockman S, Stone M: Open Sources: Voices from the Open Source Revolution. Sebastopol, CA: O'Reilly & Associates, Inc; 1999.
- 48.Wang S, Melkoumian Z, Woodfork KA, Cather C, Davidson AG, Wonderlin WF, Strobl JS: Evidence for an early G1 ionic event necessary for cell cycle progression and survival in the MCF-7 human breast carcinoma cell line. J Cell Physiol. 1998, 176: 456-464. 10.1002/(SICI)1097-4652(199809)176:3<456::AID-JCP2>3.3.CO;2-Z.