Abstract
In this chapter, we discuss the problem of discovering when works on a social media site are related to one another by artistic appropriation, particularly parody. The goal of this work is to extract concrete link information from texts that express derivative relationships between works, authors, and topics. In the domain of music video parodies, the approach applies generally to titles, lyrics, musical style, and content features, but the emphasis in this work is on descriptive text, comments, and quantitative features of songs. We first derive a classification task for discovering the “Web of Parody.” We then describe how to generate song/parody candidates, collect user annotations, and apply machine learning approaches comprising feature analysis, construction, and selection for this classification task. Finally, we report results from applying this framework to data collected from YouTube and explore how the basic classification task relates to the general problem of reconstructing the web of parody and other networks of influence. This points toward further empirical study of how social media collections can statistically reflect derivative relationships and what can be understood about the propagation of concepts across texts that are deemed interrelated.
Acknowledgements
We thank the anonymous reviewers for helpful comments, and Hui Wang and Niall Rooney for the survey of kernel methods for clustering and classification of text documents in Section “Machine Learning Task: Classification”.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Weese, J.L., Hsu, W.H., Murphy, J.C., Knight, K.B. (2017). Parody Detection: An Annotation, Feature Construction, and Classification Approach to the Web of Parody. In: Hai-Jew, S. (eds) Data Analytics in Digital Humanities. Multimedia Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-54499-1_3
DOI: https://doi.org/10.1007/978-3-319-54499-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54498-4
Online ISBN: 978-3-319-54499-1
eBook Packages: Computer Science, Computer Science (R0)