Abstract
In this paper we propose a novel machine learning application for a funny-story-sharing website: automatic moderation of newly submitted posts based on their content and metadata. The task is challenging because machines have a limited ability to understand jokes and because each post is quite short. We collect all posts on the website with a web crawler and then extract their features using natural language processing (NLP) tools. Finally, we use a regression model based on approximate nearest neighbor (ANN) search to predict the number of votes a given post will receive, which serves as a measure of its quality. Hashing techniques are used to address the curse of dimensionality and for their fast query speed and low storage cost. Experiments show that our system achieves satisfactory performance with various hashing methods.
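The abstract describes predicting a post's vote count by regression over approximate nearest neighbors found through hashing. The sketch below illustrates that idea under our own assumptions and is not the authors' implementation. It uses plain random-hyperplane locality-sensitive hashing as a stand-in for the hashing methods the paper evaluates. The feature vectors, the names `make_hyperplanes`, `hash_code`, `hamming` and `HashingKNNRegressor`, and the parameters `n_bits` and `k` are all hypothetical. Prediction ranks every stored binary code by Hamming distance to the query's code and averages the vote counts of the `k` closest posts.

```python
import random
from statistics import mean


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes through the origin (hypothetical LSH stand-in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, planes):
    """Encode x as an integer: bit i is 1 iff x lies on the non-negative side of plane i."""
    code = 0
    for i, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


class HashingKNNRegressor:
    """Hypothetical k-nearest-neighbor regressor over binary codes (Hamming ranking)."""

    def __init__(self, n_bits=16, k=5, seed=0):
        self.n_bits = n_bits
        self.k = k
        self.seed = seed

    def fit(self, X, y):
        # Keep the projections, the short codes and the vote counts;
        # the raw feature vectors are not stored.
        self.planes = make_hyperplanes(len(X[0]), self.n_bits, self.seed)
        self.codes = [hash_code(x, self.planes) for x in X]
        self.targets = list(y)
        return self

    def predict_one(self, x):
        # Rank stored posts by Hamming distance to the query's code and
        # average the vote counts of the k closest ones.
        q = hash_code(x, self.planes)
        ranked = sorted(range(len(self.codes)), key=lambda i: hamming(q, self.codes[i]))
        return mean([self.targets[i] for i in ranked[: self.k]])


if __name__ == "__main__":
    # Toy data: two groups of posts pointing in opposite directions.
    X = [[1.0, 2.0, 0.5]] * 5 + [[-1.0, -2.0, -0.5]] * 5
    y = [100, 102, 98, 101, 99, 10, 12, 8, 11, 9]
    model = HashingKNNRegressor(n_bits=16, k=5).fit(X, y)
    print(model.predict_one([2.0, 4.0, 1.0]))  # 100: the mean vote count of the first group
```

Random hyperplanes preserve only the direction of a vector, so this encoding approximates cosine similarity; learned hashing methods instead fit the projections to the data. Scanning all codes is still linear in the number of posts, but each comparison is a single XOR and bit count on a short integer, which illustrates the fast queries and low storage cost that motivate hashing.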
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, P., Guo, M. (2013). An Automatical Moderating System for FML Using Hashing Regression. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_13
DOI: https://doi.org/10.1007/978-3-642-53917-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)