Abstract
The words.hk project is the first attempt to build a Cantonese-to-Cantonese dictionary using a lean start-up model (Ries 2011) combined with crowdsourcing strategies. The goal is to produce a comprehensive dictionary written for Cantonese and in Cantonese. Existing resources are often (1) not available electronically, (2) out of date, or (3) too Anglo- or Sino-centric. Building large data sets from these existing resources requires a lot of editing and ‘data-janitorial’ work, which a large group of less-experienced people can do far better than a handful of experts, so crowdsourcing strategies are particularly appropriate. We started with a small team of editors and software developers in 2014. In less than three years, we grew into an organisation with over 400 volunteers and gathered over 42,000 entries, of which more than 36,000 had been edited with Written Cantonese descriptions, examples, and translations as of June 2017. Given the nature of the project and its member composition (a language with no authority to fall back on, and most members with no formal training in linguistics or lexicography), we adhere to two simple principles in order to keep the dictionary growing without introducing major issues in the core data: ‘usage over etymology’ and ‘decision problem avoidance’. I will discuss how these principles have shaped the architecture of the project and the editing workflow, as well as the technological difficulties that we face.
Notes
- 1.
A Cantonese-to-English dictionary is being compiled at the same time.
- 2.
These nicely rephrased questions were perhaps brought up because some think (1) there is no use; (2) online projects should never be trusted; and (3) how dare you write a Cantonese dictionary?
- 3.
I refer readers to Li (2011) and Snow (2004), respectively, for the status of Written Cantonese in its early stage and subsequent development until the early 2000s. The recent development of Written Cantonese as a fully functional written variety is discussed at length in a submitted manuscript by the author titled ‘The Transformation of “Cantonese in Written Materials” into “Written Cantonese”’.
- 4.
Cantonese Wikipedia (n.d.). In Wikipedia. Retrieved June 25, 2017, from https://en.wikipedia.org/wiki/Cantonese_Wikipedia
- 5.
- 6.
Native speakers usually look at character-based collections, such as A Chinese Syllabary Pronounced According to the Dialect of Canton (Wong 1941) or online references like Chinese Character Database: With Word-formations Phonologically Disambiguated According to the Cantonese Dialect (accessible from http://humanum.arts.cuhk.edu.hk/Lexis/lexi-can/), to look up the Cantonese pronunciation of a character. Most speakers do not consult a dictionary to check the meaning of unknown Cantonese words.
- 7.
Prominent ones include A Chinese Dictionary in the Cantonese Dialect (Eitel 1877), The Student’s Cantonese-English Dictionary (Meyer and Wempe 1935), Cantonese Speaker’s Dictionary (Cowles 1965), Cantonese Dictionary: Cantonese-English, English-Cantonese (Huang 1970), A Practical Cantonese-English Dictionary (Lau 1977), and Tōhō Kantongo Jiten (Chishima 2005).
- 8.
Hong Kong Cantonese is the de facto standard, but we also recruited editors from other Cantonese-speaking regions.
- 9.
See http://beta.words.hk/base/hoifong/ for the full licence text.
- 10.
Copyrights in Hong Kong are generally retained for 50 years after the author’s death or 50 years after publication.
- 11.
Our “Non-Commercial Open Data Licence” (see Footnote #8) can be readily adapted to any project with valuable content. In addition, the Creative Commons family of licences (http://creativecommons.org) is also a popular choice for open licensing of data.
- 12.
Which contains words in Mandarin, most of which are shared between Mandarin and Cantonese; it also contains local words that are used by Cantonese speakers only.
- 13.
A software development practice ‘in which requirements and solutions evolve through collaboration between self-organising, cross-functional teams’ (Wikipedia).
- 14.
The practice of separating a development task into clear-cut phases; the process flows downwards through these phases.
- 15.
The character was originally proposed by Kong (1933) to be the character for ‘a bit’, but it was listed without any justification. The character is pronounced as zit3 according to correspondence rules between Middle Chinese and Cantonese pronunciation and had never been listed in any Cantonese materials except for that single occurrence, before the character was rediscovered and popularised by columnist Pang Chi-Ming.
- 16.
That is, when the speaker is forced to guess the pronunciation of a word or required to write out certain spoken words with which they are unfamiliar.
- 17.
Although Cantonese orthography and pronunciation are not officially regulated or standardised by any institutional body, elements that are shared between Cantonese and Standard Written Chinese are standardised and well documented.
- 18.
Cantonese Wikipedia used to have a set of standard characters, but they are no longer in use. Since there is a large community of Written Cantonese users, as well as publications using Written Cantonese, a socially emergent standardisation is taking place, and a rigid top-down standardisation seems unnecessary.
References
Bauer, R. (1988). Written Cantonese of Hong Kong. Cahiers de Linguistique-Asie Orientale, 17(2), 245–293.
Caau2. n.d. Retrieved June 25, 2017, from Words.hk http://beta.words.hk/zidin/%E7%82%92.
Cantonese Wikipedia. n.d. Retrieved June 25, 2017, from Wikipedia https://en.wikipedia.org/wiki/Cantonese_Wikipedia.
Chin, A. C.-O. (2018). Initiatives of digital humanities in Cantonese studies: A corpus of mid-20th century Hong Kong Cantonese. In K.-K. Tam (Ed.), Digital humanities and new ways of teaching. Singapore: Springer.
Chishima, E. (2005). Tōhō Kantongo Jiten [Tōhō Cantonese dictionary]. Tōkyō: Tōhō Shoten.
Cowles, R. (1965). Cantonese speaker’s dictionary. Hong Kong: Hong Kong University Press.
Eitel, E. (1877). A Chinese dictionary in the Cantonese dialect. London: Trübner and Co. 57 & 59, Ludgate Hill and Hong Kong: Lane, Crawford & Co.
Ferguson, C. (1959). Diglossia. Word, 15, 325–340.
Howe, J. (2006). The rise of crowdsourcing. Wired Magazine. Retrieved from http://sistemas-humano-computacionais.wikidot.com/local--files/capitulo:redessociais/Howe_The_Rise_of_Crowdsourcing.pdf.
Huang, P. (1970). Cantonese dictionary: Cantonese-English, English-Cantonese. New Haven: Yale University Press.
Hutton, C., & Bolton, K. (2005). A dictionary of Cantonese slang: The language of Hong Kong movies, street gangs and city life. Honolulu: University of Hawaii Press.
Kong, Z. N. (1933). Guangdong Suyu Kao [Study on common sayings in Cantonese]. Guangzhou: Nanfang Fulunshe.
Lau, S. (1977). A practical Cantonese-English dictionary. Hong Kong: Hong Kong Government Printer.
Leung, M., & Law, S. (2002). HKCAC: The Hong Kong Cantonese adult language corpus. International Journal of Corpus Linguistics, 6(2), 305–326.
Li, Y. M. F. (2011). Qingmo Minchu de Yueyu Shuxie [Cantonese writing in late Qing and early Republic of China]. Hong Kong: Joint Publishing (HK).
Luke, K., & Wong, M. (2015). The Hong Kong Cantonese corpus: Design and uses. In B. K. Tsou & O. Y. Kwong (Eds.), JCL monograph series no. 25: Linguistic corpus and corpus linguistics in the Chinese context (pp. 312–333). Hong Kong: The Chinese University Press.
Meyer, B., & Wempe, T. (1935). The student’s Cantonese-English dictionary. Unknown: St. Louis Industrial School Printing Press.
Qu, D. J. (1678). Guangdong Xinyu [New words about Guangdong]. (n.p.)
Rieder, B., & Röhle, T. (2012). Digital methods: Five challenges. In D. Berry (Ed.), Understanding digital humanities (pp. 67–84). London: Palgrave Macmillan.
Ries, E. (2011). The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business.
Snow, D. B. (2004). Cantonese as written language: The growth of a written Chinese vernacular. Hong Kong: Hong Kong University Press.
Snow, D. B. (2008). Cantonese as written standard? Journal of Asian Pacific Communication, 18(2), 190–208.
Tang, S. W. (2015). Yueyu Yufa Jiangyi [Lectures on Cantonese grammar]. Hong Kong: Commercial Press.
Wong, S. L. (1941). Yueyin Yunhui [A Chinese syllabary pronounced according to the dialect of Canton]. Hong Kong.
Zhao, Z. Y. (1821). Yue’ou [Cantonese folklore].
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Lau, Cm. (2019). Building Cantonese Dictionaries Using Crowdsourcing Strategies: The words.hk Project. In: Tso, A.Wb. (eds) Digital Humanities and New Ways of Teaching. Digital Culture and Humanities, vol 1. Springer, Singapore. https://doi.org/10.1007/978-981-13-1277-9_6
DOI: https://doi.org/10.1007/978-981-13-1277-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1276-2
Online ISBN: 978-981-13-1277-9
eBook Packages: Social Sciences, Social Sciences (R0)