From Linking Text to Linking Crimes: Information Retrieval, But Not As You Know It

  • Fabio Crestani
Part of The Information Retrieval Series book series (INRE, volume 22)


Information retrieval techniques have long been used to identify links between textual items for the automatic construction of hypertexts and electronic books, where sought information can be accessed by browsing. While research in this area has been steadily decreasing in recent years, some of the techniques developed in that context are proving very valuable in a number of new application areas. In this paper we present an approach to the automatic linking of textual items that is used to prioritise criminal suspects in a police investigation. A free-text description of an unsolved crime is compared to previous offence descriptions where the offender is known. By linking the descriptions, inferences about likely suspects can be made. Language modeling is adapted to produce a Bayesian model which assigns a probability to each suspect. An empirical study showed that linking free-text descriptions of burglaries enables prioritisation of offenders. The model presented in this paper could easily be extended to take account of additional crime and suspect linking data, such as the geographical location of crimes or suspect social networks. This would enable large networks of investigative information, automatically constructed from police archives, to be browsed.
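The abstract does not give the model's details, so the following is only a minimal sketch of the general idea, not the chapter's actual method. It assumes a unigram language model per known offender, Jelinek-Mercer smoothing against the whole collection, and a uniform prior over suspects; the function names, the smoothing weight `lam`, and the sample descriptions are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stopword removal)."""
    return text.lower().split()


def rank_suspects(unsolved, solved, lam=0.5):
    """Return {offender: posterior probability} for an unsolved crime description.

    solved: list of (offender, description) pairs from offences with a known offender.
    lam:    hypothetical Jelinek-Mercer weight on the offender's own model, 0 <= lam < 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")

    # Pool each offender's known offence descriptions into one bag of words.
    per_offender = {}
    for offender, description in solved:
        per_offender.setdefault(offender, Counter()).update(tokenize(description))
    if not per_offender:
        return {}

    # Collection-wide term counts, used as the smoothing background model.
    collection = Counter()
    for counts in per_offender.values():
        collection.update(counts)
    collection_total = sum(collection.values())

    # Terms never seen in any solved crime carry no evidence, so drop them.
    query = [t for t in tokenize(unsolved) if t in collection]

    # Log-likelihood of the unsolved description under each offender's model.
    log_scores = {}
    for offender, counts in per_offender.items():
        total = sum(counts.values())
        log_p = 0.0
        for t in query:
            own = counts[t] / total if total else 0.0
            background = collection[t] / collection_total
            log_p += math.log(lam * own + (1.0 - lam) * background)
        log_scores[offender] = log_p  # a uniform prior only adds a constant

    # Bayes' rule: normalise the likelihoods into a posterior over suspects.
    top = max(log_scores.values())
    weights = {o: math.exp(s - top) for o, s in log_scores.items()}
    norm = sum(weights.values())
    return {o: w / norm for o, w in weights.items()}


# Hypothetical data, for illustration only.
solved_crimes = [
    ("offender_A", "forced rear window took jewellery"),
    ("offender_A", "forced rear window took cash"),
    ("offender_B", "smashed front door took television"),
]
posterior = rank_suspects("rear window forced jewellery taken", solved_crimes)
ranking = sorted(posterior, key=posterior.get, reverse=True)
```

Sorting suspects by posterior gives the prioritised list. A non-uniform prior (for example, one reflecting geographical proximity, which the abstract names as a possible extension) could be added to each log score before normalisation.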


text mining; language modeling; crime suspect prioritisation





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Fabio Crestani, Faculty of Informatics, University of Lugano (USI), Via G. Buffi 13, Switzerland
