An Automatical Moderating System for FML Using Hashing Regression

Zhang, Peichao; Guo, Minyi

doi:10.1007/978-3-642-53917-6_13


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications


Abstract

In this paper we propose a novel machine learning application for a funny-story sharing website: the automatic moderation of newly submitted posts based on their content and metadata. The task is challenging because machines have a limited ability to understand jokes and because each post is quite short. We collect all the posts on the website with a web crawler and then extract features from them using natural language processing (NLP) tools. Finally, we use a regression model based on approximate nearest neighbor (ANN) search to predict the number of votes a given post will receive, which serves as a measure of its quality. Hashing techniques are used to address the curse of dimensionality, and also for their fast query speed and low storage cost. Experiments show that our system achieves satisfactory performance with various hashing methods.
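The abstract gives no implementation details, so the following is only a rough illustration of the general idea of hashing-based nearest-neighbor regression, not the authors' method. It assumes each post has already been turned into a numeric feature vector (the NLP feature extraction is not shown), and it uses random-hyperplane (sign random projection) LSH as a stand-in for whichever hashing methods the paper evaluates. The predicted vote count is the mean vote count of the k training posts closest in Hamming distance. The names `HashingKNNRegressor`, `n_bits`, `k`, and the toy data are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical sketch: random-hyperplane LSH codes + k-nearest-neighbour
# regression in Hamming space. Not the paper's implementation.


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x: Sequence[float], planes: List[List[float]]) -> int:
    """Bit i is 1 when x lies on the non-negative side of hyperplane i."""
    code = 0
    for i, w in enumerate(planes):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


class HashingKNNRegressor:
    """Predict a post's vote count as the mean votes of its k nearest
    training posts, where 'nearest' means smallest Hamming distance."""

    def __init__(self, n_bits: int = 16, k: int = 5, seed: int = 0) -> None:
        self.n_bits = n_bits
        self.k = k
        self.seed = seed

    def fit(self, X: Sequence[Sequence[float]], votes: Sequence[float]) -> "HashingKNNRegressor":
        self.planes = make_hyperplanes(len(X[0]), self.n_bits, self.seed)
        # Each training post is kept only as a compact binary code plus its vote count.
        self.codes = [hash_code(x, self.planes) for x in X]
        self.votes = list(votes)
        return self

    def predict_one(self, x: Sequence[float]) -> float:
        q = hash_code(x, self.planes)
        # Linear scan over Hamming distances for brevity; sorted() is stable,
        # so ties keep training order.
        order = sorted(range(len(self.codes)), key=lambda i: hamming(q, self.codes[i]))
        nearest = order[: self.k]
        return sum(self.votes[i] for i in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy feature vectors (hypothetical): two groups pointing in opposite directions.
    X = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [0.5, 1.0, 0.25],
         [-1.0, -2.0, -0.5], [-2.0, -4.0, -1.0], [-0.25, -0.5, -0.125]]
    votes = [90, 100, 110, 5, 10, 15]
    model = HashingKNNRegressor(n_bits=16, k=3, seed=42).fit(X, votes)
    print(model.predict_one([4.0, 8.0, 2.0]))  # 100.0
```

The linear scan keeps the sketch short; a system aiming for fast queries would instead probe hash-table buckets (possibly across several tables), as in the LSH literature. Because random hyperplanes pass through the origin, a code is unchanged when a feature vector is scaled by a positive constant, so this particular scheme approximates angular (cosine) similarity rather than Euclidean distance.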



Author information

Authors and Affiliations

Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai, China
Peichao Zhang & Minyi Guo


Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, Edmonton, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang


About this paper

Cite this paper

Zhang, P., Guo, M. (2013). An Automatical Moderating System for FML Using Hashing Regression. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_13


DOI: https://doi.org/10.1007/978-3-642-53917-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)
