Abstract
This paper proposes a collaborative multi-agent system that splits documents into semantically coherent text chunks and labels them according to a given segmentation structure. Diverse linear text segmentation methods can be incorporated into the system by introducing new agents, which makes it possible to combine complementary approaches: domain-specific, supervised, and unsupervised. The system must be supplied with a representative set of previously segmented documents from the target corpus, which are used both to train the supervised agents and to evaluate every agent within the system, as in ensemble methods. The accuracy of each agent determines its weight in a subsequent aggregation phase, in which a common solution is agreed on. The proposed approach showed promising results when segmenting documents from a juridical corpus.
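The accuracy-weighted aggregation described above can be illustrated with a minimal sketch. The abstract does not specify the aggregation rule, so this example assumes a simple weighted plurality vote taken independently for each text unit; the agent names, the label set, and the helper functions `agent_accuracy` and `aggregate` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def agent_accuracy(predicted, reference):
    """Fraction of text units whose predicted label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def aggregate(proposals, weights):
    """Weighted plurality vote, taken independently for each text unit.

    Assumption: this voting rule is illustrative only; the paper may
    aggregate the agents' proposals differently.
    """
    n_units = len(next(iter(proposals.values())))
    consensus = []
    for i in range(n_units):
        scores = defaultdict(float)
        for agent, labels in proposals.items():
            scores[labels[i]] += weights[agent]
        consensus.append(max(scores, key=scores.get))
    return consensus


# Evaluation phase: score each (hypothetical) agent on a previously
# segmented document, one label per text unit.
reference = ["facts", "facts", "reasoning", "reasoning", "decision"]
validation = {
    "rule_based": ["facts", "facts", "reasoning", "reasoning", "decision"],
    "supervised": ["facts", "facts", "facts", "reasoning", "decision"],
    "unsupervised": ["facts", "reasoning", "reasoning", "decision", "decision"],
}
weights = {name: agent_accuracy(pred, reference) for name, pred in validation.items()}

# Aggregation phase: combine the agents' proposals for a new document.
new_document = {
    "rule_based": ["facts", "reasoning", "reasoning", "decision"],
    "supervised": ["facts", "facts", "reasoning", "decision"],
    "unsupervised": ["facts", "facts", "reasoning", "reasoning"],
}
consensus = aggregate(new_document, weights)
print(weights)
print(consensus)
```

In this toy setup the most accurate agent is outvoted on the second text unit because the two weaker agents agree and their combined weight is larger, which is the kind of trade-off a weighted scheme is meant to expose.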
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Perotto, F.S. (2023). Collaborative Multi-agent System for Automatic Linear Text Segmentation. In: Aydoğan, R., Criado, N., Lang, J., Sanchez-Anguix, V., Serramia, M. (eds) PRIMA 2022: Principles and Practice of Multi-Agent Systems. PRIMA 2022. Lecture Notes in Computer Science, vol 13753. Springer, Cham. https://doi.org/10.1007/978-3-031-21203-1_35
DOI: https://doi.org/10.1007/978-3-031-21203-1_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21202-4
Online ISBN: 978-3-031-21203-1
eBook Packages: Computer Science, Computer Science (R0)