Abstract
The paper describes an approach to the automatic text classification problem based on J.S. Mill’s ideas of inductive logic. The technique follows the general principles (but not the technical details) of the JSM method of automatic hypothesis generation. The proposed method uses an induction procedure to form generic positive and generic negative objects (represented as vectors) and an analogy procedure to classify new documents. With an optimal selection of text preprocessing options, the suggested approach achieves higher precision than the other text classification methods that were implemented.
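The induction/analogy scheme summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. Each document is a binary term vector stored as the frozenset of its terms. The "generic objects" are approximated by classic JSM-style hypotheses: the non-empty sets of terms shared by a pair of same-class training documents, kept only when no document of the opposite class contains them. The names `induce` and `classify`, together with the toy data, are hypothetical. Because the author explicitly departs from the technical details of the JSM method, this code shows only the general principle, not the paper's procedure.

```python
from itertools import combinations

# Hypothetical representation: a document is a binary term vector,
# stored as the frozenset of terms whose component is 1.
Doc = frozenset


def induce(examples: list[Doc], counterexamples: list[Doc]) -> set[Doc]:
    """Simplified JSM-style induction (an assumption, not the paper's procedure).

    Each hypothesis is the non-empty set of terms shared by a pair of
    same-class examples, kept only if no counterexample contains it.
    """
    hypotheses: set[Doc] = set()
    for a, b in combinations(examples, 2):
        common = a & b
        if common and not any(common <= c for c in counterexamples):
            hypotheses.add(common)
    return hypotheses


def classify(doc: Doc, pos_hyps: set[Doc], neg_hyps: set[Doc]) -> str:
    """Analogy step: a document inherits the class of the hypotheses it contains."""
    has_pos = any(h <= doc for h in pos_hyps)
    has_neg = any(h <= doc for h in neg_hyps)
    if has_pos and not has_neg:
        return "+"
    if has_neg and not has_pos:
        return "-"
    return "?"  # both kinds of hypothesis apply, or neither does


# Toy training data (hypothetical).
positives = [Doc({"ball", "goal", "match"}), Doc({"goal", "match", "team"}),
             Doc({"team", "coach", "goal"})]
negatives = [Doc({"market", "stock", "price"}), Doc({"stock", "price", "bank"}),
             Doc({"match", "price"})]

pos_hyps = induce(positives, negatives)  # {goal, match}, {goal}, {goal, team}
neg_hyps = induce(negatives, positives)  # {stock, price}, {price}

print(classify(Doc({"goal", "referee"}), pos_hyps, neg_hyps))          # +
print(classify(Doc({"stock", "price", "index"}), pos_hyps, neg_hyps))  # -
print(classify(Doc({"goal", "price"}), pos_hyps, neg_hyps))            # ?
```

Limiting induction to pairwise intersections is a simplification, since the full JSM similarity operation considers intersections over arbitrary subsets of examples. JSM also separates a contradictory outcome (both kinds of hypothesis apply) from an undetermined one (neither applies). The sketch merges both into "?".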
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lyfenko, N. (2016). Automatic Document Classification Based on J.S. Mill’s Ideas. In: Abraham, A., Kovalev, S., Tarassov, V., Snášel, V. (eds) Proceedings of the First International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’16). Advances in Intelligent Systems and Computing, vol 450. Springer, Cham. https://doi.org/10.1007/978-3-319-33609-1_19
DOI: https://doi.org/10.1007/978-3-319-33609-1_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33608-4
Online ISBN: 978-3-319-33609-1
eBook Packages: Engineering, Engineering (R0)