An Automatical Moderating System for FML Using Hashing Regression

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)


Abstract

In this paper we propose a novel machine learning application for a funny-story-sharing website: automatic moderation of newly submitted posts based on their content and metadata. The task is challenging because machines have limited ability to understand jokes and because each post is quite short. We collect all posts from the website with a web crawler and extract features from them using natural language processing (NLP) tools. We then use a regression model based on approximate nearest neighbor (ANN) search to predict the number of votes a given post will receive, which serves as a measure of its quality. Hashing techniques are used to address the curse of dimensionality and for their fast query speed and low storage cost. Experiments show that our system achieves satisfactory performance with various hashing methods.
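The idea of regression over hashed nearest neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or the hashing methods, so this example assumes random-hyperplane hashing as a stand-in, and all function names (`make_hyperplanes`, `hash_code`, `predict_votes`) and the toy data are hypothetical. Each post's feature vector is mapped to a short binary code, neighbors are ranked by Hamming distance between codes, and the predicted vote count is the mean over the k closest training posts.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hashing scheme, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, planes):
    """Binary code: one bit per hyperplane, set when x lies on its positive side."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) > 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(u != v for u, v in zip(a, b))


def predict_votes(query, features, votes, planes, k=3):
    """Average the vote counts of the k training posts closest in Hamming space."""
    q = hash_code(query, planes)
    codes = [hash_code(x, planes) for x in features]
    # sorted() is stable, so ties in Hamming distance keep training order.
    ranked = sorted(range(len(features)), key=lambda i: hamming(q, codes[i]))
    nearest = ranked[:k]
    return sum(votes[i] for i in nearest) / len(nearest)


# Toy data: 4-dimensional feature vectors with hypothetical vote counts.
features = [
    [1.0, 0.2, 0.0, 0.5],
    [0.9, 0.1, 0.1, 0.4],
    [-1.0, 0.8, 0.3, -0.2],
    [-0.8, 0.9, 0.2, -0.1],
]
votes = [120.0, 100.0, 15.0, 20.0]
planes = make_hyperplanes(dim=4, n_bits=8)
estimate = predict_votes([0.95, 0.15, 0.05, 0.45], features, votes, planes, k=2)
```

In a real system the codes for the training posts would be computed once and stored, which is where the low storage cost and fast query speed mentioned in the abstract come from; this sketch recomputes them per query only to stay short.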

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, P., Guo, M. (2013). An Automatical Moderating System for FML Using Hashing Regression. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_13

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)
