A flexible text analyzer based on ontologies: an application for detecting discriminatory language
Language can be a tool for marginalizing certain groups, because it may reflect a negative mentality caused by mental barriers or historical delays. To prevent the misuse of language, several agents have carried out campaigns against discriminatory language, criticizing the use of certain terms and phrases. However, there is an important gap in detecting discriminatory text in documents, because language is very flexible and usually contains hidden features or relations. Furthermore, adapting the text-analysis approaches and methodologies proposed in the literature is complex, because these proposals are too rigid to be adapted to purposes other than those for which they were intended. The main novelty of the proposed methodology is the use of ontologies to implement the rules applied by the developed text analyzer. This provides great flexibility for developing text analyzers and exploits the ability of ontologies to infer knowledge. A set of rules for detecting discriminatory language relating to gender and to people with disabilities is also presented, to show how the functionality of the text analyzer can be extended to different areas of discriminatory text.
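The idea of keeping analysis rules in an ontology rather than in the analyzer's code can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical toy ontology held in plain Python dictionaries instead of OWL/RDF, and the names `SUBCLASS_OF`, `LEXICON`, `RULES`, `superclasses` and `analyze`, as well as the example terms and messages, are invented for illustration. Subclass inference lets a rule attached to a general concept apply to every more specific concept beneath it.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: concept -> direct superclasses (subClassOf-like).
SUBCLASS_OF: Dict[str, Set[str]] = {
    "GenericMasculineNoun": {"GenderBiasedTerm"},
    "GenderBiasedTerm": {"DiscriminatoryTerm"},
    "DisablistTerm": {"DiscriminatoryTerm"},
}

# Hypothetical lexicon: surface form -> concept the word is an instance of.
LEXICON: Dict[str, str] = {
    "chairman": "GenericMasculineNoun",
    "handicapped": "DisablistTerm",
}

# Hypothetical rules: stored as data attached to concepts, not as analyzer code.
RULES: Dict[str, str] = {
    "GenderBiasedTerm": "consider a gender-neutral alternative",
    "DisablistTerm": "consider person-first language",
}


def superclasses(concept: str) -> List[str]:
    """Return the concept plus all transitively inferred superclasses."""
    result: List[str] = []
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.append(current)
        stack.extend(sorted(SUBCLASS_OF.get(current, set())))
    return result


def analyze(text: str) -> List[Tuple[str, int, str]]:
    """Flag words whose concept, or any inferred superclass, carries a rule.

    Each finding is (word, character offset, rule message).
    """
    findings: List[Tuple[str, int, str]] = []
    for match in re.finditer(r"[A-Za-z]+", text):
        concept = LEXICON.get(match.group().lower())
        if concept is None:
            continue
        for cls in superclasses(concept):
            if cls in RULES:
                findings.append((match.group(), match.start(), RULES[cls]))
    return findings


if __name__ == "__main__":
    for finding in analyze("The chairman met handicapped visitors."):
        print(finding)
```

In this sketch, extending the analyzer to a new area of discriminatory text means adding concepts, lexical entries and rules to the data structures, while `analyze` stays unchanged; for example, a rule attached to `DiscriminatoryTerm` would be inherited by both the gender and the disability concepts.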
Keywords: Text analyzer, Document text model, Methodology, Ontology, Discriminatory language
This contribution has been supported by the Andalusian Institute of Women, Junta de Andalucía, Spain (Grant No. UNIVER09/2009/23/00).