Abstract
This paper extends a paper selected from JSAI2019. Automatically extracting business content from financial reports is an important problem in the financial domain. Segment names and their explanations are especially important content to extract. However, no method for extracting this information from financial reports has been established. In this study, we aim to develop a practical solution for extracting it. To this end, we built a manually annotated dataset for the task of extracting each company's segment names and their explanations from financial reports, and then developed a recurrent neural network model for the task. Trained on the manually annotated dataset, our method outperformed the baseline methods at extracting each company's segment names and their explanations from annual financial reports. In addition, we experimentally demonstrated that our method remains applicable to this task even when only a small training dataset is available. This is the first work to apply a machine learning method to the task of extracting segment names and their explanations. The insights from this work should be valuable to industry.
Acknowledgment
This work was supported in part by JSPS KAKENHI Grant Number JP17J04768.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ito, T., Sakaji, H., Izumi, K. (2020). Segment Information Extraction from Financial Annual Reports Using Neural Network. In: Ohsawa, Y., et al. Advances in Artificial Intelligence. JSAI 2019. Advances in Intelligent Systems and Computing, vol 1128. Springer, Cham. https://doi.org/10.1007/978-3-030-39878-1_20
DOI: https://doi.org/10.1007/978-3-030-39878-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39877-4
Online ISBN: 978-3-030-39878-1
eBook Packages: Intelligent Technologies and Robotics (R0)