MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing

Data Mining for Systems Biology

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1807))

Abstract

The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing poses two challenges: one on the MeSH side and one on the citation side. On the MeSH side, each citation is annotated with only 12 (on average) of the roughly 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. Using a “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
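The citation-side limitation can be illustrated with a minimal sketch. It is not taken from MTI, MeSHLabeler, or DeepMeSH: the tokenizer, the function name `bag_of_words`, and the two example sentences are assumptions chosen only to show that a bag-of-words representation discards word order, so texts with different meanings can receive identical representations.

```python
from collections import Counter
import re


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Two made-up sentences with the same words in a different order.
a = "aspirin treatment caused bleeding"
b = "bleeding caused aspirin treatment"

bow_a = bag_of_words(a)
bow_b = bag_of_words(b)

# The raw strings differ, but their bag-of-words representations do not,
# because only token counts are kept and word order is lost.
strings_differ = a != b
representations_match = bow_a == bow_b
print(strings_differ, representations_match)
```

A representation that also encodes context, such as the deep semantic representation the abstract attributes to DeepMeSH, is meant to separate cases like these; the sketch only demonstrates the problem, not that solution.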

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61572139), MEXT KAKENHI #16H02868, and FiDiPro by Tekes.

Author information

Corresponding author

Correspondence to Shanfeng Zhu.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Peng, S., Mamitsuka, H., Zhu, S. (2018). MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_15

  • DOI: https://doi.org/10.1007/978-1-4939-8561-6_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8560-9

  • Online ISBN: 978-1-4939-8561-6

  • eBook Packages: Springer Protocols
