Research of LSH and Outliers Detection

  • Conference paper
Information Computing and Applications (ICICA 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 307)

Abstract

In data mining, a sample drawn from a database can stand in for the original database, which reduces the time spent repeatedly scanning it. Many algorithms have therefore been proposed for sampling a data set so that the sample reflects the original database more faithfully. Starting from a randomly selected set of records, these algorithms select, delete, or swap noisy transaction records so that more meaningful rules can be extracted. We observe that a sample data set is composed of clusters of transaction data, and that the records in each cluster are similar in some of their attributes. Outliers should therefore be removed cluster by cluster rather than with respect to the entire sample. To cope with the curse of dimensionality in high-dimensional data, we study LSH (Locality-Sensitive Hashing) as the clustering technique: using multiple combined hash functions, highly similar transaction records have a higher chance of being gathered into the same cluster, while dissimilar records are less likely to collide.
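
The abstract does not say which hash family or outlier criterion the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions: transactions are non-empty sets of non-negative integer item ids, MinHash with random linear hash functions is used as the LSH family, records whose full signatures collide form a cluster, and a record is flagged as an outlier when its mean Jaccard similarity to the rest of its own cluster is low. All function names, the prime modulus, and the threshold are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Assumption: every item id is a non-negative integer smaller than PRIME.
PRIME = 2_147_483_647  # 2**31 - 1


def make_hash_funcs(k, seed=0):
    """Draw k random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, hash_funcs):
    """For each hash function, keep the smallest hash value over the items."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_funcs)


def lsh_clusters(transactions, hash_funcs):
    """Group transaction indices whose MinHash signatures collide."""
    buckets = defaultdict(list)
    for idx, items in enumerate(transactions):
        buckets[minhash_signature(items, hash_funcs)].append(idx)
    return list(buckets.values())


def jaccard(a, b):
    """Jaccard similarity of two non-empty item sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def cluster_outliers(transactions, cluster, min_mean_sim=0.5):
    """Flag members whose mean similarity to the rest of their cluster is low.

    The threshold is an illustrative assumption, not a value from the paper.
    Clusters with fewer than three members are left untouched.
    """
    if len(cluster) < 3:
        return []
    flagged = []
    for i in cluster:
        sims = [jaccard(transactions[i], transactions[j]) for j in cluster if j != i]
        if sum(sims) / len(sims) < min_mean_sim:
            flagged.append(i)
    return flagged
```

In this sketch, two transactions with Jaccard similarity J collide on all k hash functions with probability of roughly J to the power k, so a larger k gives tighter clusters. Each cluster returned by `lsh_clusters` can then be passed to `cluster_outliers` separately, which is the per-cluster treatment the abstract argues for.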

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Yy., Zeng, R., Li, Mz., Li, F. (2012). Research of LSH and Outliers Detection. In: Liu, C., Wang, L., Yang, A. (eds) Information Computing and Applications. ICICA 2012. Communications in Computer and Information Science, vol 307. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34038-3_11

  • DOI: https://doi.org/10.1007/978-3-642-34038-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34037-6

  • Online ISBN: 978-3-642-34038-3

  • eBook Packages: Computer Science, Computer Science (R0)
