Abstract
Legal documents require high precision and accuracy in language use, leaving no room for grammatical and spelling errors. To address this need, this paper proposes a novel application of the pre-trained BERT language model to legal document proofreading. The model is trained to detect and correct grammatical and spelling errors in legal texts. On a dataset of annotated legal documents, our experiments show that the BERT-based model significantly outperforms state-of-the-art proofreading models in precision, recall, and F1 score, indicating its potential as a tool for preparing and revising legal documents. Applying such deep learning techniques could substantially improve the accuracy and efficiency of legal document proofreading.
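For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how precision, recall, and F1 are commonly computed for a proofreading system. It assumes edit-level matching, where a proposed correction counts as correct only if it exactly matches a reference correction; the evaluation protocol actually used in the paper is not shown on this page, and the function name `prf1` and the example edits are hypothetical.

```python
def prf1(predicted_edits, gold_edits):
    """Return (precision, recall, F1) for proposed vs. reference corrections.

    Assumption: each edit is a hashable tuple and must match a reference
    edit exactly to count as a true positive.
    """
    predicted, gold = set(predicted_edits), set(gold_edits)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical edits in the form (token position, original, correction).
gold = {(3, "defendent", "defendant"), (7, "have", "has"), (12, "there", "their")}
pred = {(3, "defendent", "defendant"), (7, "have", "has"), (9, "shall", "will")}

precision, recall, f1 = prf1(pred, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example two of the three proposed edits match the reference, and two of the three reference edits are found, so all three scores are equal. In legal proofreading a spurious "correction" can change the meaning of a clause, which is one reason precision is usually reported separately rather than folded into a single score.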
Notes
- 12. Since the competition phase for testing is now closed, we cannot make a direct comparison.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Liu, J., Luo, X. (2024). A BERT-Based Model for Legal Document Proofreading. In: Shi, Z., Torresen, J., Yang, S. (eds) Intelligent Information Processing XII. IIP 2024. IFIP Advances in Information and Communication Technology, vol 703. Springer, Cham. https://doi.org/10.1007/978-3-031-57808-3_14
DOI: https://doi.org/10.1007/978-3-031-57808-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57807-6
Online ISBN: 978-3-031-57808-3
eBook Packages: Computer Science (R0)