Abstract
In the 1980s, Don Swanson proposed the concept of “undiscovered public knowledge,” and published several examples in which two disparate literatures (i.e., sets of articles having no papers in common, no authors in common, and few cross-citations) nevertheless held complementary pieces of knowledge that, when brought together, made compelling and testable predictions about potential therapies for human disorders. In the 1990s, Don and I published more predictions together and created a computer-assisted search strategy (“Arrowsmith”). At first, the so-called one-node search was emphasized, in which one begins with a single literature (e.g., that dealing with a disease) and searches for a second, unknown literature having complementary knowledge (e.g., that dealing with potential therapies). However, we soon realized that the two-node search is better aligned with the information practices of most biomedical investigators: in this case, the user chooses two literatures and then seeks to identify meaningful links between them. Could typical biomedical investigators learn to carry out Arrowsmith analyses? Would they find routine occasions for using such a sophisticated tool? Would they uncover significant links that affect their experiments? Four years ago, we initiated a project to answer these questions, working with several neuroscience field testers. Initially, we expected that investigators would spend several days learning how to carry out searches, and would spend several days analyzing each search. Instead, we completely re-designed the user interface, the back-end databases, and the methods of processing linking terms, so that investigators could use Arrowsmith without any tutorial at all and could carry out a search in only minutes. The Arrowsmith Project now hosts a suite of free, public tools. It has launched new research spanning medical informatics, genomics and social informatics, and has, indeed, assisted investigators in formulating new experiments, with direct impact on basic science and neurological diseases.
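The two-node idea described above (choose two literatures, then look for links between them) can be illustrated with a very small sketch. This is not the Arrowsmith implementation: it assumes each literature is just a list of article titles, treats any non-stopword shared by both sets of titles as a candidate link, and uses a tiny stopword list. The function names, the stopword list, and the toy titles are all hypothetical and were invented for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def title_terms(titles):
    """Return the set of lowercase non-stopword words found in any title."""
    terms = set()
    for title in titles:
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOPWORDS:
                terms.add(word)
    return terms


def two_node_links(literature_a, literature_c):
    """Return, in sorted order, the terms that occur in titles of both literatures."""
    return sorted(title_terms(literature_a) & title_terms(literature_c))


# Invented toy titles, not real MEDLINE records.
literature_a = [
    "Spreading depression in migraine",
    "Platelet aggregation in migraine patients",
]
literature_c = [
    "Magnesium inhibits spreading depression",
    "Effect of magnesium on platelet aggregation",
]

links = two_node_links(literature_a, literature_c)
print(links)
```

In this toy run the shared terms are the candidate links an investigator would then inspect by hand. A real tool would need far more than set intersection, for example phrase handling and ranking or filtering of the linking terms.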
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smalheiser, N.R. (2005). The Arrowsmith Project: 2005 Status Report. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_5
DOI: https://doi.org/10.1007/11563983_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science, Computer Science (R0)