Prior Data for Non-target Identification

  • Boris L. Milman


This chapter covers the prior information needed to set up and test identification hypotheses. By type, this information is divided into meaning data and statistical data. Knowledge of the origin, properties, and use of chemical compounds is essential for proposing and rejecting candidate compounds for identification. Prior information about the samples analyzed is important for gathering full evidence of the trueness of an identification result. The plausibility of qualitative analytical results is also taken into account to confirm analysts' conclusions. Much of this data is extracted from the chemical databases outlined in this chapter. These data sources are also used to calculate statistical rates of occurrence and co-occurrence of chemical compounds in the literature. The occurrence rate is a direct measure of the abundance of a chemical compound and of the related likelihood that it is present in the samples to be analyzed. Rare compounds are filtered out by means of this rate and excluded from further consideration for identification purposes. Most known compounds are rare, as the respective statistical data show. Facts and rates of the co-occurrence of chemical compounds in the literature make it possible to predict a priori a group of compounds present together in the same analyzed samples. Different methods of estimating these rates are described, and examples of their use for identification are given.
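As a rough illustration of the two rates mentioned above, the following minimal Python sketch counts occurrence and co-occurrence over a toy set of literature records and filters out rare candidates. It is only a sketch under stated assumptions: the rate is taken here as a raw count of records mentioning a compound, which is just one possible estimate and not necessarily any of the methods described in the chapter. The function names, the threshold `min_count`, and the data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def occurrence_counts(records):
    """Count the records (e.g. abstracts) that mention each compound."""
    counts = Counter()
    for compounds in records:
        counts.update(set(compounds))  # one count per record, not per mention
    return counts


def cooccurrence_counts(records):
    """Count the records that mention both members of each compound pair."""
    pairs = Counter()
    for compounds in records:
        # Sort so each unordered pair has a single key, e.g. (a, b) with a < b.
        for a, b in combinations(sorted(set(compounds)), 2):
            pairs[(a, b)] += 1
    return pairs


def filter_candidates(candidates, counts, min_count):
    """Keep only candidates mentioned in at least min_count records."""
    return [c for c in candidates if counts[c] >= min_count]


# Toy data, invented for illustration only (not taken from any database).
records = [
    {"benzene", "toluene", "naphthalene"},
    {"benzene", "toluene"},
    {"naphthalene", "anthracene"},
    {"benzene"},
]

occ = occurrence_counts(records)
co = cooccurrence_counts(records)

# Hypothetical threshold: drop candidates found in fewer than 2 records.
kept = filter_candidates(["benzene", "anthracene", "azulene"], occ, min_count=2)
print(kept)
print(co[("benzene", "toluene")])
```

In this toy setting, a compound absent from every record simply gets a count of zero and is filtered out like any other rare candidate. Pairs with high co-occurrence counts would be the ones suggested as likely to appear together in the same sample.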


Keywords: Chemical Compound; Identification Point; Occurrence Rate; Prior Data; Abundant Compound



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. D.I. Mendeleyev Inst. for Metrology (VNIIM) and Cent. for Ecol. Saf. of Russ. Acad. of Sciences, St. Petersburg, Russia
