Mining the Temporal Structure of Thought from Text
Thinking is a self-organized dynamical process and, as such, interesting to characterize. However, direct, real-time access to thought at the semantic level is still very limited. The best that can be done is to look at spoken or written expression. The question we address in this research is the following: Is there a characteristic pitch of thought? To begin answering this complex question, we look at text documents from several large corpora at the sentence level (i.e., using sentences as the units of meaning) and consider each document to be the result of a random process in semantic space. Given a large corpus of multi-sentence documents, we build a lexical association network representing associations between words in the corpus. This network is used to induce a semantic similarity metric between sentences, and each document is segmented into multi-sentence semantically coherent blocks (SCBs) with occasional connecting text between the blocks. Based on this segmentation, the process of document generation is modeled as a sticky Markov chain at the sentence level. We show that most documents in all the corpora are sequences of blocks whose mean length is very consistent across corpora, at 6.4 sentences. This consistency suggests that a value of 6-7 sentences may be the typical mean length for single coherent thoughts in texts. We also describe several ways of visualizing the semantic structure of documents in space and time.
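As a rough illustration of how a sticky Markov chain relates to block length, the following minimal sketch assumes the simplest possible case: after each sentence, the process stays in the current block with a fixed probability and otherwise starts a new block. Under that assumption block lengths are geometrically distributed with mean 1/(1 - p_stay). The function names and the single-parameter model are hypothetical and are not taken from the paper, whose actual model may differ.

```python
import random


def stay_probability(mean_block_length):
    """Stay probability whose geometric block lengths have the given mean.

    Assumes a single fixed stay probability (an illustrative simplification).
    """
    if mean_block_length < 1:
        raise ValueError("mean block length must be at least 1 sentence")
    return 1.0 - 1.0 / mean_block_length


def sample_block_lengths(p_stay, n_blocks, seed=0):
    """Sample block lengths (in sentences) from the simplified sticky chain."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_blocks):
        length = 1  # every block contains at least one sentence
        while rng.random() < p_stay:
            length += 1  # the chain "sticks" to the current block
        lengths.append(length)
    return lengths


def run_lengths(labels):
    """Lengths of maximal runs of identical consecutive block labels."""
    lengths = []
    previous = object()  # sentinel that equals no real label
    for label in labels:
        if lengths and label == previous:
            lengths[-1] += 1
        else:
            lengths.append(1)
        previous = label
    return lengths


if __name__ == "__main__":
    p = stay_probability(6.4)  # 6.4 is the mean block length reported above
    sampled = sample_block_lengths(p, n_blocks=20000)
    print("stay probability:", p)
    print("sampled mean block length:", sum(sampled) / len(sampled))
```

Given per-sentence block labels from any segmenter, `run_lengths` recovers the block lengths, so an empirical mean can be compared with the value implied by a chosen stay probability.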
Keywords: Semantic dynamics, Text analysis, Text segmentation
This work was supported in part by National Science Foundation INSPIRE grant BCS-1247971 to Ali Minai.