Analyzing the capabilities of crowdsourcing services for text summarization

Abstract

This paper presents a detailed analysis of the use of crowdsourcing services for the Text Summarization task in the context of the tourist domain. In particular, our aim is to retrieve relevant information about a place or an object pictured in an image in order to provide a short summary that will be of great help to a tourist. To tackle this task, we proposed a broad set of experiments using crowdsourcing services that could serve as a reference for others who also want to rely on crowdsourcing. From the analysis carried out through our experimental setup and the results obtained, we conclude that, although crowdsourcing services were not well suited to simply gathering gold-standard summaries (as shown by the results of experiments 1, 2 and 4), the encouraging results of the third and sixth experiments lead us to strongly believe that they can be successfully employed for finding some of the patterns of behaviour humans exhibit when generating summaries, and for validating and checking other tasks. Furthermore, this analysis serves as a guideline for the types of experiments that might or might not work when using crowdsourcing in the context of text summarization.

Notes

  1. http://www.mturk.com.

  2. Crowdflower (http://crowdflower.com/) is a crowdsourcing service built on top of AMT which allows non-US citizens to run tasks on AMT.

  3. Human Intelligence Tasks.

  4. Since AMT has been employed more extensively than Crowdflower, in this section we mainly focus on the former platform.

  5. Text Analysis Conference: http://www.nist.gov/tac/.

  6. http://crowdflower.com/.

  7. We established a length of 200 words for all generated summaries.

  8. In the manual process, we identified the full sentence in which the selected part occurred and took the entire sentence as the answer (an illustrative sketch of this step follows these notes).

  9. Our previous experiments have shown that the workers’ country does not have an impact on the quality of the results.

  10. For summaries which did not contain this information, we appended a sentence providing the country information.
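
The following is a minimal sketch of the sentence-expansion step described in note 8: given a partial answer selection and the HIT documents (one sentence per line), it returns the full sentence containing the selection. The function and variable names are illustrative assumptions, not taken from the paper.

```python
def expand_to_full_sentence(selection, documents):
    """Return the full sentence containing a worker's partial answer selection.

    `documents` is a list of documents, each represented as a list of sentences
    (one sentence per line, as in the HIT). Names are illustrative assumptions.
    """
    needle = selection.strip()
    for document in documents:
        for sentence in document:
            if needle in sentence:
                return sentence
    # If no containing sentence is found, keep the original selection unchanged.
    return selection
```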


Acknowledgments

This work was supported by the EU-funded TRIPOD project (IST-FP6-045335) and by the Spanish Government through the FPU program and the projects TIN2009-14659-C03-01, TSI 020312-2009-44, and TIN2009-13391-C04-01; and by Conselleria d’Educació–Generalitat Valenciana (grant no. PROMETEO/2009/119 and grant no. ACOMP/2010/286).

Author information

Correspondence to Elena Lloret.

Appendix: HIT for Experiment 5 (Generating Informative Summaries about Tourist Places using Quality Control Mechanisms)

In this appendix, we provide the complete HIT for the experiment concerning the selection of five pieces of information or sentences that best answer different questions a person would like to know about a specific place. Together with the instructions, we provide an example of how to perform the task, and we also warn the annotators about the acceptance requirements that must be met for the task to be considered correctly done.

Information Selection: Instructions

You will be given the name of a geographical place, a picture of it, and 10 documents related to the place. Imagine you are a tourist and have 5 questions about the place for which you would like to know the answers (i.e., what information you would ask for). Please select your answers from the given documents. Each line of a document contains one sentence. Your answer selections can contain an entire sentence or a part of it.

Finally, you will be given two questions. Please answer them. The answers might not come from the given documents.
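
For readers who want to reproduce a similar setup, the following is a minimal sketch of how one HIT unit and the expected worker response could be represented. The field names and the placeholder image URL are our own illustrative assumptions; the paper does not prescribe a particular input format.

```python
# Hypothetical representation of one HIT unit; field names and the image URL
# are illustrative assumptions, not taken from the paper.
hit_unit = {
    "place_name": "Ararat",
    "image_url": "http://example.org/ararat.jpg",  # placeholder
    "documents": [  # 10 documents, each stored as a list of sentences (one per line)
        [
            "Ararat is a stratovolcano, formed of lava flows and pyroclastic ejecta, with no volcanic crater.",
            "Mount Ararat is located in the Eastern Anatolia Region of Turkey.",
        ],
        # ... nine further documents
    ],
}

# Expected worker response: five question/answer pairs whose answers are drawn
# from the documents, plus the answers to the two final questions.
worker_response = {
    "qa_pairs": [
        {
            "question": "What is Ararat?",
            "answer": "Ararat is a stratovolcano, formed of lava flows and pyroclastic ejecta, with no volcanic crater.",
        },
        # ... four further question/answer pairs
    ],
    "final_answers": ["...", "..."],  # answers to the two additional questions
}
```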

Example

Given Place Name: Ararat (Fig. 7)

Fig. 7: Example of the image shown (Mount Ararat)

Your possible questions might look like:

  1. What is Ararat?

  2. Where is it located?

  3. What is its height?

  4. How many peaks does it have?

  5. When was the last eruption?

Your possible answer selections might look like:

  1. Ararat is a stratovolcano, formed of lava flows and pyroclastic ejecta, with no volcanic crater.

  2. Mount Ararat is located in the Eastern Anatolia Region of Turkey.

  3. It has an elevation of 5,137 m (16,854 ft).

  4. It has two peaks.

  5. It is not known when the last eruption of Ararat occurred.

Acceptance Requirement

  A. We added trap sentences to the documents, so your work should be genuine. If a trap sentence is selected as an answer, the work will be rejected.

  B. You should avoid redundant information when selecting the sentences. For instance, if you select two sentences which contain the same information about the place, your work will be rejected.

  C. You have to go through all documents. Otherwise your work will be rejected.
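
To illustrate how the acceptance requirements above could be enforced automatically, here is a minimal validation sketch. It assumes that selections have already been expanded to full sentences and that the platform records how many documents a worker opened; the overlap threshold and all names are our own assumptions, not details taken from the paper.

```python
def validate_submission(selected_sentences, trap_sentences, documents_opened,
                        total_documents, redundancy_threshold=0.8):
    """Check requirements A-C for one submission (illustrative sketch only)."""
    # (A) Reject any submission that selects a trap sentence.
    if any(sentence in trap_sentences for sentence in selected_sentences):
        return False

    # (B) Reject near-duplicate selections, approximated here by word overlap.
    def overlap(a, b):
        words_a, words_b = set(a.lower().split()), set(b.lower().split())
        return len(words_a & words_b) / max(1, min(len(words_a), len(words_b)))

    for i in range(len(selected_sentences)):
        for j in range(i + 1, len(selected_sentences)):
            if overlap(selected_sentences[i], selected_sentences[j]) >= redundancy_threshold:
                return False

    # (C) Reject workers who did not go through all the documents.
    return documents_opened >= total_documents
```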

Cite this article

Lloret, E., Plaza, L. & Aker, A. Analyzing the capabilities of crowdsourcing services for text summarization. Lang Resources & Evaluation 47, 337–369 (2013). https://doi.org/10.1007/s10579-012-9198-8
