Information extraction framework to build legislation network

  • Original Research
  • Journal: Artificial Intelligence and Law

Abstract

This paper presents an information extraction process for building a dynamic legislation network from legal documents. Unlike supervised learning approaches, which require additional calculations, the idea here is to apply information extraction methodologies that identify distinct expressions in legal text in order to extract network information. The study highlights the importance of data accuracy in network analysis and improves approximate string matching techniques to produce reliable network datasets with more than 98% precision and recall. The applications and the complexity of the resulting dynamic legislation network are also discussed and critically examined.
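As a rough illustration of what approximate string matching means in this setting, the following Python sketch links a misspelled mention of an Act to a list of known titles using plain Levenshtein edit distance (Levenshtein 1966). It is not the improved technique the paper describes: the function names, the 0.2 tolerance and the short title list are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def best_match(mention, titles, max_ratio=0.2):
    """Return the closest known title, or None if even the best one is too far.

    max_ratio is a hypothetical tolerance: the allowed edit distance as a
    fraction of the longer of the two strings.
    """
    norm = mention.casefold().strip()
    dist, title = min((levenshtein(norm, t.casefold()), t) for t in titles)
    return title if dist <= max_ratio * max(len(norm), len(title)) else None


# Illustrative title list only; a real run would use the full legislation index.
KNOWN_TITLES = [
    "Resource Management Act 1991",
    "Crimes Act 1961",
    "Income Tax Act 2007",
]

print(best_match("Resouce Managment Act 1991", KNOWN_TITLES))
print(best_match("Fisheries Act 1996", KNOWN_TITLES))
```

Rejecting mentions whose best candidate is still far away matters as much as finding close ones, since a wrongly matched title adds a false edge to the network.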

Notes

  1. Shepard’s Citations include a judicial history of cases and statutes.

  2. For more details about MetaLex please refer to Boer et al. (2010).

  3. To estimate this error rate, a cluster sampling method is used to randomly choose ten sets of 30 entities. The samples are checked manually and the rate of incorrectly matched entities is recorded.

  4. Time periods: before 1800, 1800–1850, 1850–1900, 1900–1950, 1950–2000, 2000–2018.

  5. To find the frequent words, the Textalyzer Python module is used. Frequent prepositions, conjunctions and articles are excluded from the analysis.

  6. Based on their connectivity (total degree).
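The sampling procedure in note 3 could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clusters are formed as consecutive blocks of 30 entities, the `is_incorrect` callback stands in for the manual check, and the population is synthetic.

```python
import random


def estimate_error_rate(entities, is_incorrect, n_sets=10, set_size=30, seed=0):
    """Estimate the share of incorrectly matched entities from sampled clusters.

    Returns the mean error rate over the sampled sets and the per-set rates.
    """
    rng = random.Random(seed)
    # Partition the population into consecutive, full clusters of set_size entities.
    clusters = [entities[i:i + set_size]
                for i in range(0, len(entities) - set_size + 1, set_size)]
    if len(clusters) < n_sets:
        raise ValueError("not enough entities for the requested number of sets")
    # Choose the clusters to inspect at random, without replacement.
    chosen = rng.sample(clusters, n_sets)
    # is_incorrect is a placeholder for the manual check of each sampled entity.
    per_set = [sum(1 for e in cluster if is_incorrect(e)) / set_size
               for cluster in chosen]
    return sum(per_set) / n_sets, per_set


# Synthetic population: 3000 entity ids, every 50th one marked as mismatched.
population = list(range(3000))
rate, per_set = estimate_error_rate(population, lambda e: e % 50 == 0)
print(f"estimated error rate: {rate:.3f}")
```

Keeping the per-set rates alongside the mean makes it possible to see how much the estimate varies between sampled sets.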

References

  • Albert R, Jeong H, Barabási A-L (2000) Error and attack tolerance of complex networks. Nature 406(6794):378

  • Andersen PM, Hayes PJ, Huettner AK, Schmandt LM, Nirenburg IB, Weinstein SP (1992) Automatic extraction of facts from press releases to generate news stories. In: Proceedings of the third conference on applied natural language processing. Association for Computational Linguistics, pp 170–177

  • Boer A, Hoekstra R, De Maat E, Vitali F, Palmirani M, Ratai B (2010) Metalex (open xml interchange format for legal and legislative resources). Management Center, Akon

  • Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136

  • Butts CT (2003) Network inference, error, and informant (in) accuracy: a Bayesian approach. Soc Netw 25(2):103–140

  • Canisius S, Sporleder C (2007) Bootstrapping information extraction from field books. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL)

  • Carlson A, Schafer C (2008) Bootstrapping information extraction from semi-structured web pages. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Berlin, pp 195–210

  • Casteigts A, Flocchini P, Quattrociocchi W, Santoro N (2012) Time-varying graphs and dynamic networks. Int J Parallel Emergent Distrib Syst 27(5):387–408

  • Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! long live rule-based information extraction systems! In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 827–832

  • Cohen KB, Demner-Fushman D (2014) Biomedical natural language processing, vol 11. John Benjamins Publishing Company, Amsterdam

  • Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string metrics for matching names and records. In: KDD workshop on data cleaning and object consolidation, vol 3, pp 73–78

  • Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176

  • De Maat E, Winkels R, van Engers T (2006) Automated detection of reference structures in law. Frontiers in artificial intelligence and applications. IOS Press, Amsterdam, p 41

  • EUR-Lex (2020) Access to European Union law. https://eur-lex.europa.eu/homepage.html. Accessed 10 Sept 2017

  • Fowler JH, Johnson TR, Spriggs JF, Jeon S, Wahlbeck PJ (2007) Network analysis and the law: measuring the legal importance of precedents at the US supreme court. Polit Anal 15(3):324–346

  • Freitag D (2000) Machine learning for information extraction in informal domains. Mach Learn 39(2–3):169–202

  • Gultemen D, van Engers T (2013) Graph-based linking and visualization for legislation documents (glvd). In: Network analysis in law workshop, at ICAIL 2013: XIV international conference on AI and law, NAiL2013 ICAIL, Rome, Italy, 14 June

  • Hafner CD (1978) An information retrieval system based on a computer model of legal knowledge. UMI Research Press, Ann Arbor, MI

  • Hall PAV, Dowling GR (1980) Approximate string matching. ACM Comput Surv (CSUR) 12(4):381–402

  • Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th conference on computational linguistics, vol 2. Association for Computational Linguistics, pp 539–545

  • Humphries MD, Gurney K (2008) Network small-world-ness: a quantitative method for determining canonical network equivalence. PLoS ONE 3(4):e0002051

  • Jurafsky D, Martin JH (2014) Speech and language processing, vol 3. Pearson, London

  • Kartoun U (2017) Text nailing: an efficient human-in-the-loop text-processing method. Interactions 24(6):44–49

  • Koniaris M, Anagnostopoulos I, Vassiliou Y (2017) Network analysis in the legal domain: a complex model for European Union legal sources. J Complex Netw 6(2):243–268

  • Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A (2013) Overview of the chemical compound and drug name recognition (chemdner) task. In: BioCreative challenge evaluation workshop, vol 2, p 2

  • Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10:707–710

  • McCallum A (2005) Information extraction: distilling structured data from unstructured text. Queue 3(9):4

  • Mendelson E (2008) Abbyy finereader professional 9.0. PC Magazine

  • Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv (CSUR) 33(1):31–88

  • New Zealand Legal Information Institute (2020) Free access to legal information in New Zealand. http://www.nzlii.org. Accessed 31 Oct 2018

  • New Zealand Parliamentary Counsel Office (2020) The authoritative source of New Zealand legislation. http://www.legislation.govt.nz. Accessed 31 Oct 2018

  • Niu Q, Zeng A, Fan Y, Di Z (2015) Robustness of centrality measures against network manipulation. Physica A 438:124–131

  • Pasula H, Marthi B, Milch B, Russell SJ, Shpitser I (2003) Identity uncertainty and citation matching. In: Advances in neural information processing systems, pp 1425–1432

  • Philips L (1990) Hanging on the metaphone. Comput Lang 7(12):39–43

  • Sakhaee N (2018) Leginet New Zealand, first outcome of the new information extraction framework proposed to build legislation network. https://doi.org/10.7910/dvn/ib3qsf. Published 21 Sept 2018

  • Sakhaee N, Wilson M, Hendy S, Zakeri G (2017) Network analysis of New Zealand legislation. NZ Law J 10:332–337

  • Sakhaee N, Wilson MC, Zakeri G (2016) New Zealand legislation network. In: Legal knowledge and information systems: JURIX 2016: the twenty-ninth annual conference, vol 294. IOS Press, p 199

  • Tabak BM, Takami M, Rocha JMC, Cajueiro DO, Souza SRS (2014) Directed clustering coefficient as a measure of systemic risk in complex banking networks. Phys A Stat Mech Appl 394:211–216

  • Tin CT, Jeffrey LC, Mark DT, Kenneth GY, Rachel E (2009) Information extraction from legal documents. In: 2009 eighth international symposium on natural language processing

  • Trier OD, Jain AK, Taxt T et al (1996) Feature extraction methods for character recognition-a survey. Pattern Recognit 29(4):641–662

  • Ukkonen E (1992) Approximate string-matching with q-grams and maximal matches. Theor Comput Sci 92(1):191–211

  • Watts DJ (2004) Small worlds: the dynamics of networks between order and randomness, vol 9. Princeton University Press, Princeton

  • Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440

  • Winkler WE (1999) The state of record linkage and current research problems. Statistical Research Division, US Census Bureau, Suitland

  • Zhang P, Koppaka L (2007) Semantics-based legal citation network. In: Proceedings of the 11th international conference on artificial intelligence and law. ACM, pp 123–130

  • Zhang Y, Patrick J (2005) Paraphrase identification by text canonicalization. In: Proceedings of the Australasian language technology workshop, pp 160–166

Author information

Corresponding author

Correspondence to Neda Sakhaee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sakhaee, N., Wilson, M.C. Information extraction framework to build legislation network. Artif Intell Law 29, 35–58 (2021). https://doi.org/10.1007/s10506-020-09263-3

  • DOI: https://doi.org/10.1007/s10506-020-09263-3
