This paper concerns an information extraction process for building a dynamic legislation network from legal documents. Unlike supervised learning approaches, which require additional calculations, the idea here is to apply information extraction methodologies that identify distinct expressions in legal text in order to extract network information. The study highlights the importance of data accuracy in network analysis and improves approximate string matching techniques to produce reliable network datasets with more than 98% precision and recall. The applications and the complexity of the created dynamic legislation network are also discussed and challenged.
Shepard’s Citations include a judicial history of cases and statutes.
For more details about MetaLex please refer to Boer et al. (2010).
To estimate this error rate, a cluster sampling method is used to randomly choose ten sets of 30 entities. The samples are then checked manually to observe the rate of incorrectly matched entities.
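The sampling step above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes that a "cluster" is a block of 30 consecutive entities, and the names `draw_clusters`, `estimate_error_rate` and `is_incorrect` are hypothetical. The `is_incorrect` callback stands in for the manual check.

```python
import random
import statistics

def draw_clusters(entities, n_clusters=10, cluster_size=30, seed=0):
    """Pick n_clusters random blocks of cluster_size consecutive entities.

    Assumption: a cluster is a block of consecutive entities; the source
    does not say how clusters were formed.
    """
    # Keep only full blocks so every cluster has the same size.
    blocks = [entities[i:i + cluster_size]
              for i in range(0, len(entities) - cluster_size + 1, cluster_size)]
    rng = random.Random(seed)
    return rng.sample(blocks, n_clusters)

def estimate_error_rate(clusters, is_incorrect):
    """Return (mean error rate, standard error across clusters).

    is_incorrect is a hypothetical stand-in for the manual check: it
    returns True when an entity was matched incorrectly.
    """
    rates = [sum(1 for e in c if is_incorrect(e)) / len(c) for c in clusters]
    mean = statistics.mean(rates)
    # Standard error of the mean of the per-cluster rates.
    se = statistics.stdev(rates) / len(rates) ** 0.5 if len(rates) > 1 else 0.0
    return mean, se
```

Averaging per-cluster rates, rather than pooling all 300 entities, makes it possible to report how much the rate varies between clusters.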
Time periods: before 1800, 1800–1850, 1850–1900, 1900–1950, 1950–2000, 2000–2018.
To find the frequent words, the Textalyzer Python module is used. Frequent prepositions, conjunctions and articles are excluded from the analysis.
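A word count of this kind can be approximated with the Python standard library alone. The sketch below does not use Textalyzer, whose API is not described here; it substitutes `collections.Counter`, and both the `EXCLUDED` word list and the function name `frequent_words` are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative (not exhaustive) set of articles, conjunctions and
# prepositions to exclude; the list actually used is not given in the source.
EXCLUDED = {"the", "a", "an", "and", "or", "but", "of", "in", "on",
            "to", "for", "by", "with", "at", "from"}

def frequent_words(texts, top_n=5):
    """Count lower-cased alphabetic words across texts, skipping EXCLUDED."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in EXCLUDED:
                counts[word] += 1
    # most_common sorts by descending count.
    return counts.most_common(top_n)
```

For example, calling `frequent_words(["Income Tax Act", "Land Transfer Act"], top_n=1)` counts "act" twice and every other word once.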
Based on their connectivity (total degree).
Albert R, Jeong H, Barabási A-L (2000) Error and attack tolerance of complex networks. Nature 406(6794):378
Andersen PM, Hayes PJ, Huettner AK, Schmandt LM, Nirenburg IB, Weinstein SP (1992) Automatic extraction of facts from press releases to generate news stories. In: Proceedings of the third conference on applied natural language processing. Association for Computational Linguistics, pp 170–177
Boer A, Hoekstra R, De Maat E, Vitali F, Palmirani M, Ratai B (2010) MetaLex (open XML interchange format for legal and legislative resources). Management Center, Akon
Borgatti SP, Carley KM, Krackhardt D (2006) On the robustness of centrality measures under conditions of imperfect data. Soc Netw 28(2):124–136
Butts CT (2003) Network inference, error, and informant (in) accuracy: a Bayesian approach. Soc Netw 25(2):103–140
Canisius S, Sporleder C (2007) Bootstrapping information extraction from field books. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL)
Carlson A, Schafer C (2008) Bootstrapping information extraction from semi-structured web pages. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Berlin, pp 195–210
Casteigts A, Flocchini P, Quattrociocchi W, Santoro N (2012) Time-varying graphs and dynamic networks. Int J Parallel Emergent Distrib Syst 27(5):387–408
Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! long live rule-based information extraction systems! In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 827–832
Cohen KB, Demner-Fushman D (2014) Biomedical natural language processing, vol 11. John Benjamins Publishing Company, Amsterdam
Cohen W, Ravikumar P, Fienberg S (2003) A comparison of string metrics for matching names and records. In: KDD workshop on data cleaning and object consolidation, vol 3, pp 73–78
Damerau FJ (1964) A technique for computer detection and correction of spelling errors. Commun ACM 7(3):171–176
De Maat E, Winkels R, van Engers T (2006) Automated detection of reference structures in law. Frontiers in artificial intelligence and applications. IOS Press, Amsterdam, p 41
EUR-Lex (2020) Access to European Union law. https://eur-lex.europa.eu/homepage.html. Accessed 10 Sept 2017
Fowler JH, Johnson TR, Spriggs JF, Jeon S, Wahlbeck PJ (2007) Network analysis and the law: measuring the legal importance of precedents at the US supreme court. Polit Anal 15(3):324–346
Freitag D (2000) Machine learning for information extraction in informal domains. Mach Learn 39(2–3):169–202
Gultemen D, van Engers T (2013) Graph-based linking and visualization for legislation documents (glvd). In: Network analysis in law workshop, at ICAIL 2013: XIV international conference on AI and law, NAiL2013 ICAIL, Rome, Italy, 14 June
Hafner CD (1978) An information retrieval system based on a computer model of legal knowledge. UMI Research Press, Ann Arbor, MI
Hall PAV, Dowling GR (1980) Approximate string matching. ACM Comput Surv (CSUR) 12(4):381–402
Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th conference on computational linguistics, vol 2. Association for Computational Linguistics, pp 539–545
Humphries MD, Gurney K (2008) Network small-world-ness: a quantitative method for determining canonical network equivalence. PLoS ONE 3(4):e0002051
Jurafsky D, Martin JH (2014) Speech and language processing, vol 3. Pearson, London
Kartoun U (2017) Text nailing: an efficient human-in-the-loop text-processing method. Interactions 24(6):44–49
Koniaris M, Anagnostopoulos I, Vassiliou Y (2017) Network analysis in the legal domain: a complex model for European Union legal sources. J Complex Netw 6(2):243–268
Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A (2013) Overview of the chemical compound and drug name recognition (chemdner) task. In: BioCreative challenge evaluation workshop, vol 2, p 2
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10:707–710
McCallum A (2005) Information extraction: distilling structured data from unstructured text. Queue 3(9):4
Mendelson E (2008) Abbyy finereader professional 9.0. PC Magazine
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv (CSUR) 33(1):31–88
New Zealand Legal Information Institute (2020) Free access to legal information in New Zealand. http://www.nzlii.org. Accessed 31 Oct 2018
New Zealand Parliamentary Counsel Office (2020) The authoritative source of New Zealand legislation. http://www.legislation.govt.nz. Accessed 31 Oct 2018
Niu Q, Zeng A, Fan Y, Di Z (2015) Robustness of centrality measures against network manipulation. Physica A 438:124–131
Pasula H, Marthi B, Milch B, Russell SJ, Shpitser I (2003) Identity uncertainty and citation matching. In: Advances in neural information processing systems, pp 1425–1432
Philips L (1990) Hanging on the metaphone. Comput Lang 7(12):39–43
Sakhaee N (2018) Leginet New Zealand, first outcome of the new information extraction framework proposed to build legislation network. https://doi.org/10.7910/dvn/ib3qsf. Published 21 Sept 2018
Sakhaee N, Wilson M, Hendy S, Zakeri G (2017) Network analysis of New Zealand legislation. NZ Law J 10:332–337
Sakhaee N, Wilson MC, Zakeri G (2016) New Zealand legislation network. In: Legal knowledge and information systems: JURIX 2016: the twenty-ninth annual conference, vol 294. IOS Press, p 199
Tabak BM, Takami M, Rocha JMC, Cajueiro DO, Souza SRS (2014) Directed clustering coefficient as a measure of systemic risk in complex banking networks. Phys A Stat Mech Appl 394:211–216
Tin CT, Jeffrey LC, Mark DT, Kenneth GY, Rachel E (2009) Information extraction from legal documents. In: 2009 eighth international symposium on natural language processing
Trier OD, Jain AK, Taxt T et al (1996) Feature extraction methods for character recognition-a survey. Pattern Recognit 29(4):641–662
Ukkonen E (1992) Approximate string-matching with q-grams and maximal matches. Theor Comput Sci 92(1):191–211
Watts DJ (2004) Small worlds: the dynamics of networks between order and randomness, vol 9. Princeton University Press, Princeton
Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440
Winkler WE (1999) The state of record linkage and current research problems. Statistical Research Division, US Census Bureau, Suitland
Zhang P, Koppaka L (2007) Semantics-based legal citation network. In: Proceedings of the 11th international conference on artificial intelligence and law. ACM, pp 123–130
Zhang Y, Patrick J (2005) Paraphrase identification by text canonicalization. In: Proceedings of the Australasian language technology workshop, pp 160–166
Cite this article
Sakhaee, N., Wilson, M.C. Information extraction framework to build legislation network. Artif Intell Law 29, 35–58 (2021). https://doi.org/10.1007/s10506-020-09263-3
- Optical character recognition
- Information extraction
- Named entity recognition
- Relation extraction
- Approximate string matching
- Legislation network