Building Cantonese Dictionaries Using Crowdsourcing Strategies: The words.hk Project
The words.hk project is the first attempt to build a Cantonese-to-Cantonese dictionary using a lean start-up model (see Ries, 2011) combined with crowdsourcing strategies. The goal is to produce a comprehensive dictionary written for Cantonese and in Cantonese. Existing resources are often (1) not available electronically, (2) out of date, or (3) too Anglo- or Sino-centric. Building large data sets from these existing resources requires a great deal of editing and ‘data-janitorial’ work, which a large group of less-experienced people can do far better than a handful of experts; crowdsourcing strategies are therefore particularly appropriate in these cases. We started with a small team of editors and software developers in 2014. In less than three years, we grew into an organisation with over 400 volunteers and gathered over 42,000 entries, of which more than 36,000 had been edited with Written Cantonese descriptions, examples, and translations as of June 2017. Given the nature of the project and the composition of its membership (a language with no authority to fall back on, and most members without formal training in linguistics or lexicography), we adhere to two simple principles to keep the dictionary growing without introducing major issues in the core data: ‘usage over etymology’ and ‘decision problem avoidance’. I will discuss how these principles have shaped the architecture of the project and the editing workflow, as well as other technological difficulties that we face.
Keywords: Cantonese; Dictionary compilation; Crowdsourcing; Usage over etymology; Decision problem avoidance; Open data
References
- Caau2. n.d. Retrieved June 25, 2017, from Words.hk http://beta.words.hk/zidin/%E7%82%92.
- Cantonese Wikipedia. n.d. Retrieved June 25, 2017, from Wikipedia https://en.wikipedia.org/wiki/Cantonese_Wikipedia.
- Chin, A. C.-O. (2018). Initiatives of digital humanities in Cantonese studies: A corpus of mid-20th century Hong Kong Cantonese. In K.-K. Tam (Ed.), Digital humanities and new ways of teaching. Singapore: Springer.
- Chishima, E. (2005). Tōhō Kantongo Jiten [Tōhō Cantonese dictionary]. Tōkyō: Tōhō Shoten.
- Cowles, R. (1965). Cantonese speaker’s dictionary. Hong Kong: Hong Kong University Press.
- Eitel, E. (1877). A Chinese dictionary in the Cantonese dialect. London: Trübner and Co. 57 & 59, Ludgate Hill and Hong Kong: Lane, Crawford & Co.
- Howe, J. (2006). The rise of crowdsourcing. Wired Magazine. Retrieved from http://sistemas-humano-computacionais.wikidot.com/local--files/capitulo:redessociais/Howe_The_Rise_of_Crowdsourcing.pdf
- Huang, P. (1970). Cantonese dictionary: Cantonese-English, English-Cantonese. New Haven: Yale University Press.
- Hutton, C., & Bolton, K. (2005). A dictionary of Cantonese slang: The language of Hong Kong movies, street gangs and city life. Honolulu: University of Hawaii Press.
- Kong, Z. N. (1933). Guangdong Suyu Kao [Study on common sayings in Cantonese]. Guangzhou: Nanfang Fulunshe.
- Lau, S. (1977). A practical Cantonese-English dictionary. Hong Kong: Hong Kong Government Printer.
- Li, Y. M. F. (2011). Qingmo Minchu de Yueyu Shuxie [Cantonese writing in late Qing and early Republic of China]. Hong Kong: Joint Publishing (HK).
- Luke, K., & Wong, M. (2015). The Hong Kong Cantonese corpus: Design and uses. In B. K. Tsou & O. Y. Kwong (Eds.), JCL monograph series no. 25: Linguistic corpus and corpus linguistics in the Chinese context (pp. 312–333). Hong Kong: The Chinese University Press.
- Meyer, B., & Wempe, T. (1935). The student’s Cantonese-English dictionary. Unknown: St. Louis Industrial School Printing Press.
- Qu, D. J. (1678). Guangdong Xinyu [New words about Guangdong]. (n.p.)
- Rieder, B., & Röhle, T. (2012). Digital methods: Five challenges. In D. Berry (Ed.), Understanding digital humanities (pp. 67–84). London: Palgrave Macmillan.
- Ries, E. (2011). The lean startup: How today’s entrepreneurs use continuous innovation to create radically successful businesses. New York: Crown Business.
- Snow, D. B. (2004). Cantonese as written language: The growth of a written Chinese vernacular. Hong Kong: Hong Kong University Press.
- Tang, S. W. (2015). Yueyu Yufa Jiangyi [Lectures on Cantonese grammar]. Hong Kong: Commercial Press.
- Wong, S. L. (1941). Yueyin Yunhui [A Chinese syllabary pronounced according to the dialect of Canton]. Hong Kong.
- Zhao, Z. Y. (1821). Yue’ou [Cantonese folklore].