MausDB: An open source application for phenotype data and mouse colony management in large-scale mouse phenotyping projects
Large-scale, comprehensive and standardized high-throughput mouse phenotyping has been established as a tool of functional genome research by the German Mouse Clinic and others. In all these projects, vast amounts of data are continuously generated and need to be stored, prepared for data-mining procedures and eventually made publicly available. Thus, central storage and integrated management of mouse phenotype data, genotype data, metadata and linked external data are highly important. Requirements most likely depend on the individual mouse housing unit or project, creating a demand for either very specific individual database solutions or very flexible solutions that can be easily adapted to local demands. Not every group has the resources and/or the know-how to develop software for this purpose. A database application has been developed for the German Mouse Clinic in order to meet all requirements mentioned above.
We present MausDB, the German Mouse Clinic web-based database application that integrates standard mouse colony management, phenotyping workflow scheduling features and mouse phenotyping result data management. It links mouse phenotype data with genotype data, metadata and external data such as public web databases, which is a prerequisite for comprehensive data analysis and mining. We describe how this can be achieved with a lean and user-friendly system built on open standards.
MausDB is suited for large-scale, high-throughput phenotyping facilities but can also be used exclusively for mouse colony management within smaller units or projects. The system is successfully used as the primary mouse and data management tool of the German Mouse Clinic and other mouse facilities. We offer MausDB to the scientific community as open source software to provide a system for storage of data from functional genomics projects in a well-structured, easily accessible form.
Keywords: Screening Module, Work List, Central Data Management, Mutant Mouse Line, Database Level
The concept of standardized, high-throughput and comprehensive screening of mice has proven to be successful for identifying new phenotypes in mutant mouse lines by the German Mouse Clinic (GMC) [1, 2, 3, 4, 5, 6, 7] and others [8, 9].
In the GMC, experts from various fields of mouse behavior, physiology, morphology, metabolism and pathology work side-by-side in one building in 14 individual modules (allergy, behavior, cardiovascular system, clinical chemistry, dysmorphology, energy metabolism, eye development and vision, immunology, lung function, molecular phenotyping, neurology, nociception, pathology and steroid metabolism) in close collaboration with clinicians and veterinarians.
Mouse mutants and their littermate controls pass through the different modules of the GMC in multi-parallel phenotyping pipelines following a standardized workflow. In the course of the high-throughput primary screen, up to 320 parameters per mouse line are measured, and these findings may be supplemented by results from secondary and tertiary screening assays. In addition, individual modules may conduct independent projects and/or more intensive phenotyping procedures not included in the primary screen.
As a consequence, data integration is a major issue in the GMC, and appropriate bioinformatics support as well as well-defined data structures and processes are required. Data should preferably be stored in a central database to ease the identification of genotype-specific phenotypes or correlations between parameters measured in different modules and to perform cross-line comparisons. Central data management is crucial for integration of measured phenotype data with metadata (e.g. standard operating procedures (SOPs), experimental and housing conditions, etc.) and external data (e.g. linking of mouse genotype data with public databases). As an example for the integration of local data with external data, locally defined gene loci can be cross-linked to external information by attaching URLs pointing to public resources such as MGI or Ensembl. This feature reduces redundant information retrieval on the user side, facilitates discussion of phenotyping results and can be additionally used to cross-link databases for data mining purposes. Thus, downstream data analysis and data mining tools can access a central data resource rather than multiple distributed spreadsheet files. Central data management also facilitates quality control, data curation and backup as well as data exchange, e.g. within the cross-European phenotyping effort EUMODIC (European Mouse Disease Clinic).
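The cross-linking of locally defined gene loci to external resources can be illustrated with a minimal Python sketch. It is not taken from MausDB (which is written in Perl); the `URL_TEMPLATES` table, the `external_links` function and the `example.org` hosts are placeholders assumed for illustration, and a real installation would store the actual resource URLs chosen by the facility.

```python
from urllib.parse import quote

# Hypothetical URL templates. The example.org hosts are placeholders,
# not the real addresses of MGI or Ensembl.
URL_TEMPLATES = {
    "MGI": "https://example.org/mgi/{accession}",
    "Ensembl": "https://example.org/ensembl/{accession}",
}


def external_links(locus_accessions):
    """Build one URL per external resource for a locally defined gene locus.

    locus_accessions: dict mapping resource name -> accession in that resource.
    Resources without a known template are skipped.
    """
    links = {}
    for resource, accession in locus_accessions.items():
        template = URL_TEMPLATES.get(resource)
        if template is not None:
            # Percent-encode the accession so characters such as ':' are URL-safe.
            links[resource] = template.format(accession=quote(accession, safe=""))
    return links
```

Storing only the accession per locus and deriving the URL from a template keeps the local data non-redundant: if an external resource changes its address scheme, a single template has to be updated.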
In addition to the scientific and phenotyping data-related aspects, an integrated mouse information and management system must also support mouse husbandry and mouse house management. In the GMC, mouse lines from all over the world are imported for primary screen phenotyping and bred for secondary or tertiary phenotyping or for individual research projects. In order to centrally manage shared resources such as rooms, racks, cages and personnel, all animals need to be managed and tracked by the same system.
Common to all mice in the GMC and other mouse facilities at the Helmholtz Zentrum München is the need for documentation of all aspects of a mouse and its life, including sex, genotype, date of birth, origin (import or weaning), date and reason of death, kind of genetic modification and use in experiments that are subject to authorization. Some of these data have to be reported to local authorities on a regular basis.
Several mouse database systems have been developed and published in the course of other large phenotype screening projects in recent years [11, 12, 13, 14], and a couple of additional mouse database systems are commercially available. Despite the existence of these high-quality systems, we opted to develop a system for the needs of the GMC rather than to adapt third-party products to our requirements or to adapt our requirements to the features of third-party products. Therefore, we developed MausDB as a tool that meets all demands of the GMC mentioned above.
MausDB is set up as a typical LAMP system. In this context, the acronym describes the combined use of Linux as the operating system, Apache as the web server, MySQL as the relational database management system and Perl as the scripting language.
Since ease of installation and administration were major issues when setting up MausDB on our servers, we decided to use the Ubuntu Linux distribution (version 6.06 LTS). In our hands, the whole system, including all necessary packages for MausDB and MausDB-specific program files and databases, can be installed on a blank computer starting from an Ubuntu Linux 6.06 LTS live CD in less than 1 hour.
The hardware requirements of MausDB on the server side are moderate. Although our production server for the GMC (60+ total users, ~15 concurrent users) is a dual processor system (Intel Xeon, 3.06 GHz) with 4 GB RAM, MausDB also runs smoothly on a simple desktop computer with a single 2 GHz CPU and 1 GB RAM with the same number of users.
Results and Discussion
MausDB is a web-based application fully built on free standards (Linux operating system, Apache Web server, MySQL database, Perl as programming language). Non-redundant storage of data in a central database ensures integrity and consistency of data. Using a central database with an adequate backup strategy and administration also improves sustainability of scientific data and helps prevent data loss. Multiple users can simultaneously access the database via a web browser from their individual client computers no matter which operating system they use.
Although MausDB was primarily developed for the needs of the GMC, it has also turned out to be a valuable tool for other mouse facilities at the Helmholtz Zentrum München due to its flexible and general-purpose design.
As of January 2008, data of around 90,000 mice from four large mouse facilities at the Helmholtz Zentrum München – German Research Center for Environmental Health, including the GMC, are managed using MausDB.
Our objectives during development of MausDB were primarily to meet the functional requirements described above, but acceptance of the new system by its prospective users was also of prime importance. Usefulness and usability are the main issues with respect to user acceptance, especially in a quite heterogeneous environment. Usability is closely linked with convenience and ease of use, so we put much effort into development of a user-friendly interface.
Ease of use
Intuitive use helps to reduce errors that are produced by user interaction, and ease of use also helps minimize the effort for user training. We applied user interaction concepts that almost everyone is familiar with from other World Wide Web contexts. For example, we implemented a mouse "cart", which can be used to first collect a set of mice and then apply a common procedure (e.g. mating, genotyping, culling or moving to another cage) to the selected mice; as most Internet shops use a virtual "shopping cart", no specific training is needed to instruct users how to do the same thing with mice.
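The "collect first, then apply one procedure" idea behind the cart can be sketched in a few lines of Python. This is an illustration only, not MausDB's Perl implementation; the `Mouse` and `Cart` classes, the `move_to` helper and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Mouse:
    mouse_id: str
    cage: str


@dataclass
class Cart:
    """Collects mouse IDs so that one procedure can be applied to all of them."""
    mouse_ids: list = field(default_factory=list)

    def add(self, mouse_id):
        # Keep each mouse only once, in the order it was added.
        if mouse_id not in self.mouse_ids:
            self.mouse_ids.append(mouse_id)

    def apply(self, colony, procedure):
        """Run `procedure` on every mouse in the cart; return the number processed."""
        for mouse_id in self.mouse_ids:
            procedure(colony[mouse_id])
        return len(self.mouse_ids)


def move_to(cage):
    """Return a procedure that moves a mouse to the given cage."""
    def procedure(mouse):
        mouse.cage = cage
    return procedure
```

A user would add mice to the cart one by one and then call, for example, `cart.apply(colony, move_to("B-7"))`, where `colony` maps mouse IDs to `Mouse` objects. Other procedures named in the text, such as genotyping or culling, would be further functions with the same one-argument shape.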
Since we identified abbreviations and cryptic language as a major barrier to usability, we use clear and non-ambiguous English language in the user interface and avoid the use of abbreviations as much as possible.
Flexibility with only a few strict rules
The GMC has a strict workflow for mice subjected to primary screening. On the other hand, many mice are imported or bred for secondary screening research projects by the individual scientists from different screening modules. This is reflected by a large number of – sometimes mutually contradictory – user requirements for handling even standard operations such as mating, weaning or mouse movement.
To cope with all these specifications, we implemented only a few basic rules. Strict rules are not necessary in all cases: there is no need, for example, to strictly prevent mice with the same ear marks from being in the same cage, as there might be additional attributes that help to distinguish mice. In the same example, it also makes no sense to apply strict rules on the database level when the physical movement has already been performed.
Thus, MausDB follows the convention to only generate a warning in such error-prone situations and let the user decide whether to ignore the warning or not. Therefore, MausDB users are more in charge of the correctness of their input than users of other systems that may apply stricter or more complex rules. On the other hand, this flexibility provides the opportunity to use MausDB in a quite heterogeneous environment without the need to define and administer project-specific rules. In addition, the complexity of the system can be kept very low, as every rule might create new dependencies.
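The warn-instead-of-refuse convention can be illustrated with the ear mark example from the text. The following Python sketch is an assumption about how such a check might look, not MausDB's actual code; the function names and the `confirm` callback, which stands in for the user's decision in the web interface, are hypothetical.

```python
from collections import Counter


def ear_mark_warnings(cage_mice):
    """Return one warning per ear mark shared by several mice in a cage.

    cage_mice: list of (mouse_id, ear_mark) tuples.
    """
    counts = Counter(ear_mark for _, ear_mark in cage_mice)
    return [
        f"ear mark {mark!r} is used by {n} mice in this cage"
        for mark, n in sorted(counts.items())
        if n > 1
    ]


def move_into_cage(cage_mice, mouse, confirm):
    """Add `mouse` to the cage; on warnings, ask `confirm` instead of refusing.

    confirm: callable that receives the list of warnings and returns True
    if the user chooses to proceed anyway.
    """
    candidate = cage_mice + [mouse]
    warnings = ear_mark_warnings(candidate)
    if warnings and not confirm(warnings):
        return cage_mice, warnings   # user cancelled, cage is unchanged
    return candidate, warnings
```

The check itself never blocks the operation; the only place a move can be stopped is the user's answer to the warning, which mirrors the responsibility shift described above.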
To minimize the need for intervention by database or system administrators, corrections of false entries that need to be done regularly (e.g. update sex) or have little or no side effects (e.g. update ear marks) can be done by the users on their own without having to contact an administrator.
Some tools (e.g. check database integrity, database statistics) and frequently needed administrative task dialogs (e.g. adding new users, setting up new rooms and racks, defining new mouse lines) are integrated into the MausDB web user interface but are restricted to users with administrative privileges. No SQL experience is needed for this kind of daily routine administration.
In the current version of MausDB, some complex or infrequent operations require inserting or altering data on the database level, where basic SQL experience is necessary.
Customizing the user interface or adding new features is straightforward but requires advanced Perl and SQL skills.
MausDB features and capabilities
Phenotyping workflow management
In the GMC, every screening module offers the measurement of different parameters, which are grouped within standardized assays or so-called parameter sets. For example, the neurology module screens mice following a modified SHIRPA (SmithKline Beecham Pharmaceuticals; Harwell, MRC Mouse Genome Centre and Mammalian Genetics Unit; Imperial College School of Medicine at St. Mary's; Royal London Hospital, Phenotype Assessment) protocol that includes 23 individual parameters. The parameters and assays have to be defined on the database level using SQL commands, as there is currently no graphical user interface for this purpose. Defining new parameters or assays does not require modification of the source code; everything can be easily configured on the database level.
Mutant mouse lines subjected to primary screening enter the GMC in general at the age of 5 weeks and pass the different screening modules in a strictly defined order, the so-called primary workflow.
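What "defining parameters and assays on the database level" might look like is sketched below. The schema is hypothetical (MausDB's real table and column names may differ), the parameter names are invented, and SQLite is used through Python's standard `sqlite3` module only so that the example is self-contained; MausDB itself runs on MySQL.

```python
import sqlite3

# Hypothetical schema, assumed for illustration only.
SCHEMA = """
CREATE TABLE parameter_set (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE parameter (
    id               INTEGER PRIMARY KEY,
    parameter_set_id INTEGER NOT NULL REFERENCES parameter_set(id),
    name             TEXT NOT NULL,
    data_type        TEXT NOT NULL
        CHECK (data_type IN ('float', 'integer', 'text', 'boolean')),
    column_position  INTEGER NOT NULL
);
"""


def define_assay(conn, set_name, parameters):
    """Insert one parameter set and its parameters; return the new set's id.

    parameters: list of (name, data_type) tuples in spreadsheet column order.
    """
    with conn:  # commits on success, rolls back if an insert fails
        cur = conn.execute(
            "INSERT INTO parameter_set (name) VALUES (?)", (set_name,)
        )
        set_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO parameter "
            "(parameter_set_id, name, data_type, column_position) "
            "VALUES (?, ?, ?, ?)",
            [
                (set_id, name, dtype, pos)
                for pos, (name, dtype) in enumerate(parameters, start=1)
            ],
        )
    return set_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
set_id = define_assay(
    conn, "grip strength", [("forelimb force", "float"), ("trials", "integer")]
)
```

Because an assay is just rows in two tables, adding a new one requires no change to application code, which is the property the text emphasizes.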
Phenotype data management
In general, spreadsheet files are produced directly by, for example, a blood analyzer or grip strength meter. However, for specific needs, spreadsheet files can be generated manually by the screeners or are generated via export from module-specific databases. Uploading of phenotyping results is straightforward and works by simply uploading the appropriate spreadsheet file via the web interface. This approach is quite universal and can be used by almost any institution by configuring the settings on the database level, without changing the source code.
During the uploading procedure, the full path and file name of the spreadsheet file as well as the sheet name containing the results are requested interactively. The result sheets have to have a standardized, assay-specific matrix format: the results from one mouse are arranged in one row, with the columns representing mouse ID, date of measurement and the different phenotyping parameters of the assay. The uploading procedure includes checking of data type (float, integer, text, Boolean) and plausibility checking of parametric results, mouse IDs and dates (to some extent using regular expressions). The column header names and the column position used in the result file are compared with expected values stored in the database for each assay. Undefined, additional columns are ignored. A color-coded warning is displayed for every spreadsheet field with a missing value. Critical errors such as invalid or missing mouse ID and date, missing or displaced columns or wrong data type abort the uploading procedure. Bounds and ranges for plausibility checks can be defined for every parameter in the database, and these additional checks can be plugged easily into the uploading procedure. In the last step of the uploading procedure, a final visual inspection of the result matrix has to be performed by the user before the results are inserted into the database.
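The checks performed during upload can be summarized in a short Python sketch. It is an illustration under several assumptions, not MausDB's Perl code: the eight-digit mouse ID pattern, the ISO date format, the header names "mouse ID" and "date", the treatment of out-of-range values as warnings rather than errors, and the simplification that additional columns only appear after the expected ones are all choices made for this example.

```python
import re
from datetime import datetime

MOUSE_ID_RE = re.compile(r"^\d{8}$")  # assumed ID format: exactly eight digits

# Converters for the four data types named in the text.
CASTS = {
    "float": float,
    "integer": int,
    "text": str,
    "boolean": lambda v: {"true": True, "false": False}[v.strip().lower()],
}


def validate_sheet(header, rows, assay):
    """Check an uploaded result matrix against an assay definition.

    assay: list of (column name, data type, (low, high) or None) in column order.
    Returns (errors, warnings); any error means the upload must be aborted.
    """
    errors, warnings = [], []
    expected = ["mouse ID", "date"] + [name for name, _, _ in assay]
    if header[:len(expected)] != expected:  # trailing extra columns are ignored
        return [f"missing or displaced columns, expected {expected}"], warnings
    for line, row in enumerate(rows, start=1):
        # Pad short rows so that missing cells are reported, not a crash.
        row = list(row) + [""] * (len(expected) - len(row))
        if not MOUSE_ID_RE.match(row[0]):
            errors.append(f"row {line}: invalid or missing mouse ID")
        try:
            datetime.strptime(row[1], "%Y-%m-%d")
        except ValueError:
            errors.append(f"row {line}: invalid or missing date")
        for (name, dtype, bounds), value in zip(assay, row[2:]):
            if value == "":
                warnings.append(f"row {line}: missing value for {name}")
                continue
            try:
                parsed = CASTS[dtype](value)
            except (ValueError, KeyError):
                errors.append(f"row {line}: {name} is not of type {dtype}")
                continue
            if bounds and not bounds[0] <= parsed <= bounds[1]:
                warnings.append(f"row {line}: {name}={parsed} outside {bounds}")
    return errors, warnings
```

Separating errors from warnings matches the two outcomes described above: warnings are shown to the user for the final visual inspection, whereas a non-empty error list stops the upload before anything is written to the database.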
In addition to the uploading of pre-defined parametric data, any file (for example, spreadsheet files, image files or expression chip analysis files) can be uploaded and permanently attached to a mouse or a group of mice.
MausDB does not currently use any ontologies to store phenotype data, but this will be a feature of future versions. In addition, the use of controlled vocabularies for the collection of phenotype data will be implemented.
Mouse management and husbandry
Standard animal management tasks are probably very similar in most mouse facilities. In MausDB every mouse has its own, unique ID. In terms of quality and good practice, this property of MausDB is essential for its use in the GMC.
Grouping of mice using the "cart"
Regardless of where they are actually located, mice can be grouped by virtually putting them in the so-called "cart". Carts are attached to the browser session, allowing temporary grouping of mice, but they can also be stored permanently for public or private use and reloaded later on. This feature of the cart system is very useful in the course of the primary screening workflow: mouse cohorts stay in the GMC for 14 weeks, during which they are sequentially moved to 11 independent screening modules. During this time, the mice may be put into other cages and examined in different assays, but they always stay grouped together in their original "cart".
Search & find functions
Searches can be restricted to mice in the session cart. Thus, by combining the use of search & find functions and the cart, complex search operations can be performed.
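How restricting a search to the cart allows complex queries to be built step by step can be shown with a small Python sketch. The `search` function and the attribute names (`line`, `sex`) are hypothetical and only illustrate the principle; they do not reflect MausDB's actual search interface.

```python
def search(mice, cart=None, **criteria):
    """Return sorted IDs of mice matching all criteria, optionally within `cart`.

    mice: dict mapping mouse ID -> dict of attributes.
    cart: optional collection of mouse IDs that limits the search.
    """
    if cart is None:
        candidates = mice
    else:
        candidates = {m: mice[m] for m in cart if m in mice}
    return sorted(
        mouse_id
        for mouse_id, attrs in candidates.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )
```

A first call such as `search(mice, line="line A")` would fill the cart, and a second call `search(mice, cart=cart, sex="f")` would then narrow the result to females of that line, so each step stays a simple single-criterion search.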
For each mouse, MausDB can manage multiple mutant alleles and their respective genotypes, which can be assigned either individually or for a selection of mice via the cart.
MausDB is designed to cope with thousands to tens of thousands of concurrently living mice in large mouse facilities. As an integrated system, it can be used for managing mouse breeding and phenotype data as well as scheduling screening workflow in such phenotyping centers.
Although MausDB is designed for rather large projects, it can still be used for small-scale mouse stock breeding with only a few racks. Using the cart and the phenotyping order management tools, MausDB can be used in fully managed units, where a central management team coordinates tasks to be performed by technicians and animal keepers, though these management tools might need further improvement. On the other hand, MausDB can also be used in decentralized mouse facilities, where different independent groups operate on their own without being directed by a central management team.
Benefits of MausDB
MausDB is freely available open source software and thereby can help to reduce costs. Download, use and adaptation or further development of MausDB is not only allowed, but encouraged. From our experience, MausDB also helps to reduce the amount of time spent with mouse colony and data management because information is centrally stored and accessible for concurrent read and write access by many users.
Projects sharing mouse space in a central facility can profit from sharing hardware (computers and cage card printers) and personnel trained in using a common mouse colony management system.
In comparison to distributed spreadsheet files or paper-based laboratory journals, the use of MausDB helps to improve overall data quality, as changes are made to a central database and are checked for plausibility.
Storage of structured data in a central relational database is also a prerequisite for integrating specific phenotyping data with data from public databases. As a consequence, the application of data mining methods to phenotyping data is significantly facilitated.
Planned future developments
We intend to implement new features for the documentation of treatments on the level of individual mice, such as exposure to environmental challenges or medication. In addition, integration of tools for basic statistical analysis, data visualization and data mining is planned. Integration of ontologies and controlled vocabularies for the collection of phenotype data will also be implemented in future versions of MausDB.
We have developed an integrated phenotyping workflow, data and mouse management system named MausDB that can be used by mouse facilities ranging from large-scale, high-throughput phenotype screening facilities to small mouse stock breeding units. MausDB centrally stores and integrates phenotype data with mouse husbandry data (e.g. line, genotype) and other metadata on the level of individual mice, allowing access by data analysis and data mining tools. The MausDB web interface is very intuitive and user-friendly, which reduces the need for user training to a minimum. Due to its lean and open design, it can be easily installed and adapted for custom purposes. We offer MausDB to the scientific community as open source software under the terms of the GNU General Public License (GPL).
Availability and requirements
Project name: MausDB
Project home page: http://www.helmholtz-muenchen.de/ieg/ (section "downloads")
Operating system: platform-independent
Programming language: Perl
License: GNU GPL
Any restrictions for use by non-academics: none
This work was funded by grant 01GR0430 from the NGFN (Nationales Genomforschungsnetz). We thank Lore Becker, Birgit Rathkolb, Reinhard Seeliger and all other MausDB users for helpful discussions in the planning phase and during development of the system. Thanks to Walter Pargent for helpful discussions about data management and for sharing the experiences he had with MouseNet.
- 1. The German Mouse Clinic [http://www.mouseclinic.de]
- 2. Gailus-Durner V, Fuchs H, Becker L, Bolle I, Brielmeier M, Calzada-Wack J, Elvert R, Ehrhardt N, Dalke C, Franz TJ, Grundner-Culemann E, Hammelbacher S, Holter SM, Holzlwimmer G, Horsch M, Javaheri A, Kalaydjiev SV, Klempt M, Kling E, Kunder S, Lengger C, Lisse T, Mijalski T, Naton B, Pedersen V, Prehn C, Przemeck G, Racz I, Reinhard C, Reitmeir P, Schneider I, Schrewe A, Steinkamp R, Zybill C, Adamski J, Beckers J, Behrendt H, Favor J, Graw J, Heldmaier G, Hofler H, Ivandic B, Katus H, Kirchhof P, Klingenspor M, Klopstock T, Lengeling A, Muller W, Ohl F, Ollert M, Quintanilla-Martinez L, Schmidt J, Schulz H, Wolf E, Wurst W, Zimmer A, Busch DH, de Angelis MH: Introducing the German Mouse Clinic: open access platform for standardized phenotyping. Nat Methods 2005, 2(6):403–404. 10.1038/nmeth0605-403
- 3. Schneider I, Tirsch WS, Faus-Kessler T, Becker L, Kling E, Busse RL, Bender A, Feddersen B, Tritschler J, Fuchs H, Gailus-Durner V, Englmeier KH, de Angelis MH, Klopstock T: Systematic, standardized and comprehensive neurological phenotyping of inbred mice strains in the German Mouse Clinic. J Neurosci Methods 2006, 157(1):82–90. 10.1016/j.jneumeth.2006.04.002
- 4. Meyer CW, Elvert R, Scherag A, Ehrhardt N, Gailus-Durner V, Fuchs H, Schafer H, Hrabe de Angelis M, Heldmaier G, Klingenspor M: Power matters in closing the phenotyping gap. Naturwissenschaften 2007.
- 5. Barrantes Idel B, Montero-Pedrazuela A, Guadano-Ferraz A, Obregon MJ, Martinez de Mena R, Gailus-Durner V, Fuchs H, Franz TJ, Kalaydjiev S, Klempt M, Holter S, Rathkolb B, Reinhard C, Morreale de Escobar G, Bernal J, Busch DH, Wurst W, Wolf E, Schulz H, Shtrom S, Greiner E, Hrabe de Angelis M, Westphal H, Niehrs C: Generation and characterization of dickkopf3 mutant mice. Mol Cell Biol 2006, 26(6):2317–2326. 10.1128/MCB.26.6.2317-2326.2006
- 6. Pasche B, Kalaydjiev S, Franz TJ, Kremmer E, Gailus-Durner V, Fuchs H, Hrabe de Angelis M, Lengeling A, Busch DH: Sex-dependent susceptibility to Listeria monocytogenes infection is mediated by differential interleukin-10 production. Infect Immun 2005, 73(9):5952–5960. 10.1128/IAI.73.9.5952-5960.2005
- 7. Vauti F, Goller T, Beine R, Becker L, Klopstock T, Holter SM, Wurst W, Fuchs H, Gailus-Durner V, de Angelis MH, Arnold HH: The mouse Trm1-like gene is expressed in neural tissues and plays a role in motor coordination and exploratory behaviour. Gene 2007, 389(2):174–185. 10.1016/j.gene.2006.11.004
- 10. The European Mouse Disease Clinic [http://www.eumodic.org]
- 12. Masuya H, Nakai Y, Motegi H, Niinaya N, Kida Y, Kaneko Y, Aritake H, Suzuki N, Ishii J, Koorikawa K, Suzuki T, Inoue M, Kobayashi K, Toki H, Wada Y, Kaneda H, Ishijima J, Takahashi KR, Minowa O, Noda T, Wakana S, Gondo Y, Shiroishi T: Development and implementation of a database system to manage a large-scale mouse ENU-mutagenesis program. Mamm Genome 2004, 15(5):404–411. 10.1007/s00335-004-2265-8
- 14. Strivens MA, Selley RL, Greenaway SJ, Hewitt M, Liu X, Battershill K, McCormack SL, Pickford KA, Vizor L, Nolan PM, Hunter AJ, Peters J, Brown SD: Informatics for mutagenesis: the design of mutabase--a distributed data recording system for animal husbandry, mutagenesis, and phenotypic analysis. Mamm Genome 2000, 11(7):577–583. 10.1007/s003350010110
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.