Extraction Method of Micro-Blog New Login Word Based on Improved Position-Word Probability

Zhu, Hongze; Zhang, Shunxiang

doi:10.1007/978-3-319-67071-3_45

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 580)

Included in the following conference series:

International Conference on Applications and Techniques in Cyber Security and Intelligence

Abstract

Traditional methods for discovering micro-blog new login words have difficulty extracting compound words effectively. To address this problem, this paper proposes an extraction method for micro-blog new login words based on an improved Position-Word Probability (PWP) and the N-increment algorithm. First, all micro-blogs on a single topic within a given time period are combined into one long micro-blog text, which is then preprocessed. Then, during the query process of the N-increment algorithm, the improved position-word probability is used to judge the direction in which frequent strings should be extended. Finally, the set of frequent strings is pruned to reduce redundant strings. The experimental results show that the proposed algorithm can effectively extract the compound words among micro-blog new login words.
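
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the improved PWP formula nor the exact N-increment procedure, so everything below is an assumption made for illustration. The function names (`frequent_strings`, `position_probabilities`, `prune`), the thresholds `min_count` and `max_len`, and the definition of position-word probability as the share of a character's lexicon occurrences that fall at the start or end of a word are all hypothetical. The input is assumed to be already preprocessed into punctuation-free segments.

```python
from collections import Counter


def frequent_strings(segments, min_count=2, max_len=6):
    """Grow frequent strings one character at a time (n-gram increment)."""
    # Level 1: single characters that occur at least min_count times.
    counts = Counter(ch for seg in segments for ch in seg)
    level = {s for s, c in counts.items() if c >= min_count}
    frequent = {}
    n = 1
    while level and n < max_len:
        n += 1
        grams = Counter()
        for seg in segments:
            for i in range(len(seg) - n + 1):
                gram = seg[i:i + n]
                # Only count strings whose shorter parts were already frequent.
                if gram[:-1] in level and gram[1:] in level:
                    grams[gram] += 1
        level = {s for s, c in grams.items() if c >= min_count}
        frequent.update((s, grams[s]) for s in level)
    return frequent


def position_probabilities(lexicon):
    """Assumed PWP: per character, how often it starts or ends a lexicon word."""
    total, begin, end = Counter(), Counter(), Counter()
    for word in lexicon:
        total.update(word)      # every occurrence of every character
        begin[word[0]] += 1     # occurrences in word-initial position
        end[word[-1]] += 1      # occurrences in word-final position
    p_begin = {ch: begin[ch] / total[ch] for ch in total}
    p_end = {ch: end[ch] / total[ch] for ch in total}
    return p_begin, p_end


def prune(frequent):
    """Drop a string when a longer frequent string contains it with the same count."""
    return {
        s: c for s, c in frequent.items()
        if not any(s != t and s in t and c == frequent[t] for t in frequent)
    }


candidates = prune(frequent_strings(["abcd", "abcd", "xabcdy"]))
p_begin, p_end = position_probabilities(["ab", "abc", "b"])
```

In this toy run every shorter frequent string (`ab`, `abc`, and so on) occurs exactly as often as `abcd`, so pruning keeps only the longest one. One way the probabilities could guide extension, again as an assumption, is to stop extending a candidate to the right once its last character has a high `p_end` value and to stop extending to the left once its first character has a high `p_begin` value.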

Acknowledgements

This research work was supported in part by the Natural Science Foundation of Anhui Province Universities (No. KJ2015A111), in part by the Opening Project of Shanghai Key Laboratory of Integrate Administration Technologies for Information Security (Grant No. AGK2013002), and in part by the National Science Foundation of China (Grant No. 61300202).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, 232001, China
Hongze Zhu & Shunxiang Zhang

Corresponding author

Correspondence to Shunxiang Zhang.

Editor information

Editors and Affiliations

Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, Victoria, Australia
Jemal Abawajy
Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, Texas, USA
Kim-Kwang Raymond Choo
School of Computing and Mathematics, Charles Sturt University, Albury, New South Wales, Australia
Rafiqul Islam

About this paper

Cite this paper

Zhu, H., Zhang, S. (2018). Extraction Method of Micro-Blog New Login Word Based on Improved Position-Word Probability. In: Abawajy, J., Choo, KK., Islam, R. (eds) International Conference on Applications and Techniques in Cyber Security and Intelligence. ATCI 2017. Advances in Intelligent Systems and Computing, vol 580. Springer, Cham. https://doi.org/10.1007/978-3-319-67071-3_45

DOI: https://doi.org/10.1007/978-3-319-67071-3_45
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67070-6
Online ISBN: 978-3-319-67071-3
eBook Packages: Engineering, Engineering (R0)