The Arrowsmith Project: 2005 Status Report

Smalheiser, Neil R.

doi:10.1007/11563983_5

Neil R. Smalheiser

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3735)

Abstract

In the 1980s, Don Swanson proposed the concept of “undiscovered public knowledge,” and published several examples in which two disparate literatures (i.e., sets of articles having no papers in common, no authors in common, and few cross-citations) nevertheless held complementary pieces of knowledge that, when brought together, made compelling and testable predictions about potential therapies for human disorders. In the 1990s, Don and I published more predictions together and created a computer-assisted search strategy (“Arrowsmith”). At first, the so-called one-node search was emphasized, in which one begins with a single literature (e.g., that dealing with a disease) and searches for a second unknown literature having complementary knowledge (e.g., that dealing with potential therapies). However, we soon realized that the two-node search is better aligned to the information practices of most biomedical investigators: in this case, the user chooses two literatures and then seeks to identify meaningful links between them. Could typical biomedical investigators learn to carry out Arrowsmith analyses? Would they find routine occasions for using such a sophisticated tool? Would they uncover significant links that affect their experiments? Four years ago, we initiated a project to answer these questions, working with several neuroscience field testers. Initially we expected that investigators would spend several days learning how to carry out searches, and would spend several days analyzing each search. Instead, we completely re-designed the user interface, the back-end databases, and the methods of processing linking terms, so that investigators could use Arrowsmith without any tutorial at all, and requiring only minutes to carry out a search. The Arrowsmith Project now hosts a suite of free, public tools. It has launched new research spanning medical informatics, genomics and social informatics, and has, indeed, assisted investigators in formulating new experiments, with direct impact on basic science and neurological diseases.

Author information

Authors and Affiliations

UIC Psychiatric Institute, University of Illinois-Chicago, MC912, 1601 W. Taylor Street, Chicago, IL, 60612, USA
Neil R. Smalheiser

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Hiroshi Motoda
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer

Cite this paper

Smalheiser, N.R. (2005). The Arrowsmith Project: 2005 Status Report. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_5

DOI: https://doi.org/10.1007/11563983_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science (R0)
