An algorithm for local geoparsing of microtext


Abstract

The location of the author of a social media message is not invariably the same as the location that the author writes about in the message. In applications that mine these messages for information, such as tracking news or political events or responding to disasters, it is the geographic content of the message rather than the location of the author that is important. To this end, we present a method to geo-parse the short, informal messages known as microtext. Our preliminary investigation showed that many microtext messages contain place references that are abbreviated, misspelled, or highly localized, and that these references are missed by standard geo-parsers. Our geo-parser is built to find such references. It uses Natural Language Processing methods to identify references to streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations, and it combines heuristics, open-source Named Entity Recognition software, and machine learning techniques. Our primary data consisted of Twitter messages sent immediately following the February 2011 earthquake in Christchurch, New Zealand. On this sample of Twitter messages, the algorithm identified locations with an F statistic of 0.85 for streets, 0.86 for buildings, 0.96 for toponyms, and 0.88 for place abbreviations, with a combined average F of 0.90 for identifying places. The same data run through a geo-parsing standard, Yahoo! Placemaker, yielded an F statistic of zero for streets and buildings (because Placemaker is designed to find neither streets nor buildings), and an F of 0.67 for toponyms.


Notes

  1. These statistics date to February 2012, from http://www.ebizmba.com/articles/social-networking-websites

  2. These are found at the following web addresses as of February 7, 2012: Yahoo Placemaker at http://developer.yahoo.com/geo/placemaker/, Metacarta geoparser at http://www.metacarta.com/products-platform-queryparser.htm; Drupal at http://geoparser.andrewl.net/, and the Unlock system at http://unlock.edina.ac.uk/texts/introduction.

  3. http://thenextweb.com/socialmedia/2010/04/14/twitter-announces-annotations-add-metadata-tweet-starting-quarter-2/

  4. Our data consists of about 300,000 tweets (1 out of every 1,000 of roughly 300,000,000 tweets per hour) sampled from one hour of tweets dated immediately after the earthquake. Takahashi, Abe, and Igata, “Can Twitter be an alternative of real-world sensors,” LNCS 6763, 2011, found that 0.6 % of tweets had GPS coordinates.

  5. Twitter users developed their own indexing practice of using a “#” symbol, called a hashtag, to label tweets on a topic.

  6. We would like to add time as representative of distance, since at present we miss the radius around San Bruno in a tweet such as “about an hr and a half from San Bruno”.

  7. Illinois co-reference package: http://cogcomp.cs.illinois.edu/page/software_view/18; BART at http://www.bart-coref.org/

  8. We use the dictionary that loads with every Linux operating system as a dictionary of the English language. We use a dictionary of abbreviations common to Twitter called the Twittonary, which we were granted permission to use in research. We refer also to some minor word lists, such as the buildings list from Wikipedia, and a list of saints’ names (to distinguish saints from streets) from http://www.catholic.org/saints/stindex.php

  9. http://developer.gauner.org/jspellcorrect

  10. http://en.wikipedia.org/wiki/list_of_building_types

  11. U.S. airports are found in tweets, but they do not make good training data because U.S. airport abbreviations are forced into a 3-letter mold and are not supposed to repeat around the country, so many do not follow customary abbreviation rules. For example, LAX stands for the Los Angeles, California, airport, and EWR represents the Newark, New Jersey, airport. We therefore avoided this sort of abbreviation for training the classifier.

  12. http://www.catholic.org/saints/stindex.php

  13. The part-of-speech tagger for Twitter by Noah Smith et al. is at http://www.ark.cs.cmu.edu/TweetNLP/

  14. “Consensus decision-making” in Wikipedia, Retrieved July 24, 2012, from http://en.wikipedia.org/wiki/Consensus_decision-making

  15. Kilem Gwet (2002). Kappa statistic is not satisfactory for assessing the extent of agreement between raters. Retrieved July 15, 2012 from http://www.agreestat.com/research_papers/kappa_statistic_is_not_satisfactory.pdf; Julius Sim, Chris Wright (2005). The kappa statistic in reliability studies: use, interpretation and sample size requirements. Phys Ther 85(3):257–68.

  16. Official New Zealand gazetteer of place names, at http://www.linz.govt.nz/placenames/find-names/nz-gazetteer-official-names as of January 31, 2012.

  17. Write to gelern@cs.cmu.edu for use of the geo-tagged 2011 earthquake tweets from Christchurch, New Zealand, or the geo-tagged 2011 fire tweets from Austin, Texas.

  18. We reported results of testing the second version of the algorithm at the XSEDE’12 high-performance computing conference in Chicago, Illinois, USA, in July 2012.

  19. http://www.ark.cs.cmu.edu/TweetNLP/

  20. These have been replaced in the next version of the algorithm, which will be presented at the XSEDE’12 conference in July 2012.

  21. List of Saints’ Names: http://www.catholic.org/saints/stindex.php

References

  1. Adriani M, Paramita ML (2007) Identifying location in Indonesian documents for geographic information retrieval. GIR’07, November 9, 2007, Lisbon, Portugal, pp 19–23

  2. Ammar W, Darwish K, El Kahki, A, Hafez, K (2011) ICE-TEA: in-context expansion and translation of English abbreviations. In Gelbukh A (ed) CICLing 2011, Part II, LNCS 6609, pp 41–54

  3. Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating Twitter users. CIKM’10, October 26–30, 2010, Toronto, Ontario, Canada, pp 759–768

  4. Dannélls D (2006) Automatic acronym recognition. Eleventh Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), April 3–7, Trento, Italy, pp 167–170

  5. Eisenstein J, O’Connor B, Smith NA, Xing E (2010) A latent variable model for geographic lexical variation. In Proceedings of EMNLP, pp 1277–1287

  6. Gelernter J, Mushegian N (2011) Geo-parsing messages from microtext. Transactions in GIS 15(6):753–773


  7. Hecht B, Hong L, Suh B, Chi EH (2011) Tweets from Justin Bieber’s Heart: the dynamics of the “location” field in user profiles, CHI 2011, May 7–12, 2011, Vancouver, BC, Canada, pp 237–246

  8. Hill E, Fry ZP, Boyd H, Sridhara G, Novikova Y, Pollock L, Vijay-Shanker K (2008) AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools. MSR,’08, May 10–11, 2008, Leipzig, Germany, pp 79–88

  9. Ireson N, Ciravegna F (2010) Toponym resolution in social media. In: Patel-Schneider PF et al. (eds) ISWC 2010, Part I, LNCS 6496, pp 370–385

  10. Jung JJ (2011) Towards named entity recognition method for microtexts in online social networks: a case study of Twitter. 2011 International Conference on Advances in Social Network Analysis and Mining (ASONAM), pp 563–564

  11. Khanal N, Kehoe A, Kumar A, MacDonald A, Mueller M, Plaisant C, Ruecker S, Sinclair S Monk Tutorial: Metadata offers new knowledge. Retrieved January 31, 2012 from http://gautam.lis.illinois.edu/monkmiddleware/public/analytics/decisiontree.html

  12. Kinsella S, Murdock V, O’Hare N (2011) “I’m eating a sandwich in Glasgow”: modelling locations with tweets. SMUC’11, October 28, 2011, Glasgow, Scotland, pp 61–68

  13. Leveling J, Hartrumpf S (2008) On metonymy recognition for geographic IR. Int J Geogr Inf Sci 22(3), http://www.geo.uzh.ch/~rsp/gir06/papers/individual/leveling.pdf, accessed 12 January 2012

  14. Lieberman MD, Samet H (2011) Multifaceted toponym recognition for streaming news. SIGIR’11. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, July 2011, pp 843–852

  15. Lieberman MD, Samet H, Sankaranarayanan J (2010) Geotagging with local lexicons to build indexes for textually-specified spatial data. IEEE 26th International Conference on Data Engineering (ICDE), pp 201–212

  16. Liu J, Chen J, Liu T, Huang Y (2011) Expansion finding for given acronyms using conditional random fields. In: Wang H, et al. (eds) WAIM 2011, LNCS 6897, pp 191–200

  17. Liu X, Zhang S, Wei F, Zhou M (2011) Recognizing named entities in tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland Oregon, June 19–24, pp 359–367

  18. Liu Y, Piyawongwisal P, Handa S, Yu L, Xu Y, Samuel A (2011) Going beyond citizen data collection with mapster: a mobile+cloud real-time citizen science experiment. Seventh IEEE international conference on e-science workshops, pp 1–6

  19. Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Processing and visualizing the data in tweets. SIGMOD Record 40(4):21–27


  20. McInnes BT, Pedersen T, Liu Y, Pakhomov SV, Melton GB (2011) Using second-order vectors in a knowledge-based method for acronym disambiguation. Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp 145–153

  21. Moschitti A, Chu-Carroll J, Patwardhan S, Fan J, Riccardi G (2011) Using syntactic and semantic structural kernels for classifying definition questions in jeopardy! Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27–31, 2011, pp 712–724

  22. Nadeau D, Turney PD (2005) A supervised learning approach to acronym identification. In: Kégl B, Lapalme G (eds) AI 2005, LNAI 3501, pp 319–329

  23. Okazaki M, Matsuo Y (2009) Semantic Twitter: analyzing tweets for real-time event notification. In: Breslin JG et al. (eds) BlogTalk 2008/2009, LNCS 6045. Proceedings of the 2008/2009 international conference on social software. Springer, Heidelberg, 2010 pp 63–74

  24. Okazaki N, Ananiadou S (2006) A term recognition approach to acronym recognition. Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pp 643–650

  25. Okazaki N, Ananiadou S, Tsujii J (2008) A discriminative alignment model for abbreviation recognition. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp 657–664

  26. Paradesi S (2011) Geotagging tweets using their content. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, May 18–20, 2011, Florida, USA, pp 355–356

  27. Park Y, Byrd RJ (2001) Hybrid text mining for finding abbreviations and their definitions. Association for Computational Linguistics http://aclweb.org/anthology/W/W01/W01-0516.pdf, Retrieved January 3, 2012

  28. Pennell D, Liu Y (2011) Toward text message normalization: modeling abbreviation generation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May, 2011, pp 5364–5367

  29. Ponte J, Croft WB (1998) A language modeling approach to information retrieval. In Proceedings of SIGIR, pp 275–281

  30. Ritter A, Clark S, Mausam, Etzioni O (2011) Named entity recognition in tweets: an experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp 1524–1534

  31. Roche M, Prince V (2007) AcroDef: a quality measure for discriminating expansions of ambiguous acronyms. In: Kokinov B et al. (eds) Context 2007, LNAI 4635, pp 411–424

  32. Starbird K, Palen L, Hughes A, Vieweg S (2010) Chatter on the red: what hazards threat reveals about the social life of microblogged information. CSCW 2010, February 6–10, 2010, Savannah, Georgia, USA, pp 241–250

  33. Taghva K, Vyas L (2011) Acronym expansion via Hidden Markov Models. 21st International Conference on Systems Engineering, 16–18 August 2011, pp 120–125

  34. Takahashi K, Pramudiono I, Kitsuregawa M (2005) Geo-word centric association rule mining. Proceedings of the Sixth International Conference on Mobile Data Management (MDM 2005), Ayia Napa, Cyprus, pp 273–280

  35. Tanasescu V, Domingue J (2008) A differential notion of place for local search. LocWeb 2008, April 22, 2008, Beijing, China, pp 9–15

  36. Vanopstal K, Desmet B, Hoste V (2010) Towards a learning approach for abbreviation detection and resolution. LREC 2010, May 19–21, 2010, Valletta, Malta, pp 1043–1049

  37. Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In: Proceedings of the 2010 Annual Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia: pp 1079–1088

  38. Watanabe K, Ochi M, Okabe M, Onai R (2011) Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs. CIKM’11, October 24–28, 2011, Glasgow, Scotland, UK, pp 2541–2544

  39. Wing BP, Baldridge J (2011) Simple supervised document geolocation with geodesic grids. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19–24, 2011, pp 955–964


Acknowledgements

We are grateful for the support of Jaime Carbonell, Director of the Language Technologies Institute. Our data was collected from an archive maintained by Brendan O’Connor at Carnegie Mellon University. Nikolai Mushegian, Corinne Meloni, Josh Swanson, Niharika Ray, Andrew Minton, Marielle Saums, and Christa Hester were among the tweet annotators who helped us establish the ground truth used to score the results of our algorithm. Our open-source resources were supplemented by data from Abbreviations.com and from the online Twitter dictionary, Twittonary. Finally, we appreciate our discussions with Cong Lu, a doctoral candidate in statistics, regarding intra-coder reliability.

Author information


Correspondence to Judith Gelernter.

Appendices

Appendix 1 Geoparsing for location abbreviations

1.1 Preparation steps

External resource preparation

Download and make available for processing:

  • Twittonary

  • Dictionary (supplemented with standard technical abbreviations such as tv and iphone)

  • Excerpt from National Geospatial-Intelligence Agency gazetteer for the domain (here, for New Zealand)

  • List of geographical features

  • List of building terms

  • List of prepositions

  • Location indicators for direction (N, S, E, W, NE, NW, north, northeast, northwest, etc.)

  • Location indicators for distance (mi, mile, km, kilometer, etc.)

Data preparation

  1. Normalize the tweet: tokenize (word by word), remove articles (a, an, the), and remove punctuation from the beginning and end of a word and at the end of a sentence (see the sketch after this list).

  2. Part-of-speech processing for tweets using third-party, open-source software (footnote 19).

  3. Find the date and time of tweet creation.
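A minimal sketch of the normalization in step 1 follows. It assumes plain whitespace tokenization; the function name and regular expression are ours, not the authors' code.

```python
import re

ARTICLES = {"a", "an", "the"}

def normalize_tweet(text):
    """Tokenize a tweet word by word, drop articles, and strip punctuation
    from the beginning and end of each token."""
    cleaned = []
    for tok in text.split():                         # word-by-word tokenization
        tok = re.sub(r"^\W+|\W+$", "", tok)          # strip leading/trailing punctuation
        if tok and tok.lower() not in ARTICLES:      # remove the articles a, an, the
            cleaned.append(tok)
    return cleaned

# normalize_tweet("The cathedral on Colombo St. has collapsed!")
# -> ['cathedral', 'on', 'Colombo', 'St', 'has', 'collapsed']
```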

1.2 Processing steps

Identify place abbreviations and acronyms using heuristics that specify how to match external resources to data. Consider a text word to be a place abbreviation if it:

  • has between 2 and 6 characters, is not in the Dictionary or the Twittonary, and is preceded by a preposition, direction, or distance term;

  • matches a confirmed place abbreviation (that is, an abbreviation that satisfies the heuristic above); or

  • matches an abbreviation known to be a place, either from the location abbreviations in Abbreviations.com or from an earlier pass over the tweets.

But skip the word as an abbreviation if it is preceded by # or @. These heuristics are sketched below.
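The sketch below illustrates the heuristics above, assuming the word lists from the preparation steps have been loaded as Python sets. The function and variable names are ours, not the authors' implementation, and the preposition list is only a subset.

```python
PREPOSITIONS = {"in", "at", "near", "from", "to", "by", "of"}       # illustrative subset
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west",
              "northeast", "northwest", "southeast", "southwest"}
DISTANCES = {"mi", "mile", "miles", "km", "kilometer", "kilometers"}

def is_place_abbrev_candidate(word, prev_word, dictionary, twittonary,
                              known_place_abbrevs):
    """Decide whether `word` may be a place abbreviation, following the
    heuristics above. `dictionary` and `twittonary` are sets of known
    words; `known_place_abbrevs` holds abbreviations confirmed on an
    earlier pass or drawn from Abbreviations.com location entries."""
    if word.startswith(("#", "@")):                   # skip hashtags and @mentions
        return False
    w = word.lower()
    if w in known_place_abbrevs:                      # already confirmed as a place
        return True
    prev = (prev_word or "").lower().rstrip(".")
    cue_before = prev in PREPOSITIONS or prev in DIRECTIONS or prev in DISTANCES
    unknown_short_word = (2 <= len(w) <= 6
                          and w not in dictionary
                          and w not in twittonary)
    return unknown_short_word and cue_before

# Example: in "huge damage in chch tonight", the token "chch" follows the
# preposition "in", has 4 characters, and is in neither word list, so it
# is kept as a candidate place abbreviation.
```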

Identify candidate disambiguation words

Data for candidate disambiguation words and phrases are drawn from tweets posted 1 to 5 days before the date of the tweet with the corresponding abbreviation. The disambiguation words and phrases are selected according to the following heuristics (see the sketch after this list):

  • first word begins with the same letter as the abbreviation

  • if an abbreviation has n letters, take candidate multi-word phrases of length n-2 words, n-1 words, and n words, as well as n words plus up to 2 stop words

  • no verbs

  • no hashtags or @mentions

  • no colon, semi-colon, question mark or exclamation point within the phrase
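A hedged sketch of these selection heuristics appears below. It assumes the earlier tweet has already been tokenized and part-of-speech tagged (the CMU Twitter tagger of footnote 13 marks verbs with 'V'); the stop-word list and function name are illustrative, not the paper's.

```python
STOP_WORDS = {"of", "the", "and", "on", "in", "at"}    # illustrative stop-word list

def candidate_expansions(abbrev, tagged_tweet, max_stop_words=2):
    """Collect candidate expansion phrases for `abbrev` from an earlier
    tweet, following the heuristics above. `tagged_tweet` is a list of
    (token, pos) pairs."""
    n = max(len(abbrev.replace(".", "")), 1)           # letters in the abbreviation
    tokens = [tok for tok, _ in tagged_tweet]
    tags = [pos for _, pos in tagged_tweet]
    candidates = []
    for start, tok in enumerate(tokens):
        if not tok or tok[0].lower() != abbrev[0].lower():
            continue                                   # first word must share the first letter
        for length in range(max(n - 2, 1), n + max_stop_words + 1):
            words = tokens[start:start + length]
            if len(words) < length:
                break                                  # ran off the end of the tweet
            if length > n and sum(w.lower() in STOP_WORDS for w in words) < length - n:
                continue                               # beyond n words, allow only stop words
            if any(t == "V" for t in tags[start:start + length]):
                continue                               # no verbs
            if any(w.startswith(("#", "@")) for w in words):
                continue                               # no hashtags or @mentions
            if any(c in w for w in words for c in ":;?!"):
                continue                               # no clause punctuation within the phrase
            candidates.append(" ".join(words))
    return candidates
```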

Attributes of abbreviations and acronyms and their expansions

Based on inspection of the data, we arrived at the following attributes that characterize a match between a location abbreviation or acronym and a candidate expansion word or phrase. We defined these attributes as:

[Figure e: the attribute definitions]

Create a decision tree model based on attributes

Decision tree algorithms classify unseen data based on data they have previously learned. Given a set of attributes and how these attributes match to the correct classes (and mis-match to other classes), the algorithm will discover how unseen data are classified based on their attributes. The algorithm maximizes the information gain at each decision tree branch test, and makes a model based on the training data. Tests are decided upon during the training phase on the basis of entropy, or the measure of disorder of the data. The model can be considered as a branching set of decisions, or tests. Each test decision branches into a subtree until it reaches a leaf end node. Unseen data is sent through the tree by undergoing a series of binary tests until it reaches a leaf.

Weka’s J48 is an implementation of C4.5, the popular decision-tree algorithm developed by J. Ross Quinlan. We select the J48 option to “prune”, or simplify, results. Pruning operates on the decision tree while it is being induced: it compresses a parent node and its child nodes into a single node whenever a split would yield a child leaf representing fewer than a minimum number of examples from the data set. Pruning can be used as a tool to correct for potential overfitting, and so an unpruned tree might perform slightly better than a pruned one [11].

Our tree (Appendix 4) has many attributes, and therefore many nodes. The root node must split the data effectively, and the best split is the one that provides the most information gain. Each split attempts to pare down a set of instances (the actual data) until all have the same classification. At each node an instance is tested for a particular attribute value, and the data is then routed accordingly.

Given our training data comprised of abbreviations and acronyms and their correct and incorrect disambiguation words and phrases, the machine learning algorithm uses statistics to assign weights to the different features according to their importance. Our decision tree model with 52 leaves appears in Appendix 4.

Because our model is based on our training data as well as these attributes, another model made with different data would obtain different results. We obtained 87.9 % accuracy with this model, with a corresponding kappa of 0.748, indicating that the model performs much better than chance (which would produce a kappa of zero).
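The paper trains the model with Weka's J48 (a Java implementation of C4.5). As a rough analogue of the same training step, the sketch below uses scikit-learn's DecisionTreeClassifier, which implements CART rather than C4.5 but exposes the same ideas of entropy-based splits and a minimum-leaf-size constraint. The feature names and toy values are invented for illustration, not the paper's attribute set.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training matrix: each row describes one (abbreviation, candidate
# expansion) pair by numeric match attributes; the attributes here are
# invented placeholders.
X_train = [
    # [first_letters_match, abbrev_to_phrase_length_ratio, all_initials_match, stop_word_count]
    [1, 0.33, 1, 0],
    [1, 0.50, 0, 1],
    [0, 0.20, 0, 0],
    [1, 0.40, 1, 2],
    [0, 0.10, 0, 1],
    [1, 0.25, 1, 0],
]
y_train = [1, 1, 0, 1, 0, 1]   # 1 = correct expansion, 0 = incorrect

# criterion="entropy" maximizes information gain, as described above;
# min_samples_leaf plays a role similar to J48's pruning threshold, in
# that a split producing a leaf with too few examples is not made.
clf = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, random_state=0)
clf.fit(X_train, y_train)

unseen = [[1, 0.45, 1, 1]]                 # attributes of an unseen pair
print(clf.predict(unseen))                 # class: correct (1) or not (0)
print(clf.predict_proba(unseen))           # confidence, used later for ranking
```

With a realistic training set, accuracy and kappa comparable to the figures reported above could be estimated by cross-validation.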

Features that are significant in correctly classifying the abbreviation or acronym with its disambiguation expression are shown by their location near the root, and also their recurrence in the tree (possibly with different values in different branches). These are:

[Figure f: the significant attributes]

Rank abbreviation – expansion pairs

Run each abbreviation, along with its file of potential matches, through the decision tree. Weka outputs a ranking along with true or false and a corresponding error prediction (1 indicates 100 % confidence that the abbreviation corresponds to the matched expansion). We wrote a short script that sorts the true values according to error level. We take the top five disambiguation phrases for the sake of error analysis, to see whether the correct disambiguation appears near the top even if it was not selected first.
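A sketch of this ranking step follows, under the assumptions of the classifier sketch above. Predicted class probability stands in for Weka's confidence output, and the sorting script mentioned in the text is not public, so the helper below is only an analogue; `featurize` is a hypothetical stand-in for the attribute extraction.

```python
def rank_expansions(clf, abbrev, candidates, featurize, top_k=5):
    """Score each (abbreviation, candidate expansion) pair with the trained
    tree and keep the top-k candidates classified as true, sorted by
    confidence. `featurize` maps a pair to the attribute vector used in
    training."""
    scored = []
    for cand in candidates:
        features = [featurize(abbrev, cand)]
        predicted_true = clf.predict(features)[0] == 1
        confidence = clf.predict_proba(features)[0][1]   # P(correct); assumes classes_ == [0, 1]
        if predicted_true:
            scored.append((confidence, cand))
    scored.sort(reverse=True)
    return scored[:top_k]
```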

Appendix 2 Geoparsing streets, buildings and toponyms

User Input (theoretically; data was pre-collected for this study)

  • city and country of the data set, and possibly also nearby countries

  • that country’s abbreviation and the abbreviation of nearby countries

    • Ex. For our data set on the Christchurch, New Zealand earthquake, the user enters:

      • Christchurch (Chch)

      • New Zealand (NZ), and

      • Australia (Aus)

External resources:

  • Gazetteer excerpt for region of inquiry

  • List of common words that are also place names to filter the gazetteer (list was manually generated)

  • Enhanced building list (http://en.wikipedia.org/wiki/list_of_building_types) minus a few ambiguous words such as “wall” and “place”

Data (tweet) preparation

  1. Save the original tweets.

  2. Make a copy of the tweets for NLP preparation (see the sketch after this list).

  3. In the copy, remove the hashtag that was used to retrieve the data set (and recurs repeatedly throughout), and tokenize.

  4. Remove @mentions and replace them with XXX (this preserves the original word count).

  5. Remove tweets in the copy set that contain unicode characters.

  6. Run the copy of the tweets through a spell-check algorithm. Retain words that are mis-spelled as well as those that are corrected, in case the spell check alters what is actually correct.
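A minimal sketch of the preparation steps above. The function name is ours, and the hashtag value is only an example; the spell checker of footnote 9 is an external Java tool and is not reproduced here.

```python
import re

def prepare_tweets(tweets, retrieval_hashtag="#eqnz"):
    """Prepare a working copy of the tweet set: keep the originals, drop the
    retrieval hashtag, mask @mentions with XXX to preserve word counts, and
    skip tweets containing unicode characters."""
    originals = list(tweets)                            # step 1: keep the original tweets
    prepared = []
    for tweet in originals:                             # step 2: work on a copy
        if any(ord(ch) > 127 for ch in tweet):
            continue                                    # step 5: drop tweets with unicode characters
        copy = tweet.replace(retrieval_hashtag, "")     # step 3: remove the retrieval hashtag
        copy = re.sub(r"@\w+", "XXX", copy)             # step 4: mask @mentions
        prepared.append(copy.split())                   # step 3: tokenize
    return originals, prepared

# Step 6 (spell checking) would follow, keeping both the original and the
# corrected form of each token.
```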

Data (tweet) processing

  1. Run the original tweets through OpenCalais:

     a. Retain locations (city, country, continent, etc.)

     b. Retain natural features (mountain, etc.)

     c. Retain facilities

     d. Ignore all other entities found by OpenCalais

  2. Find toponyms:

     a. Make all data lower case

     b. Match against our own gazetteer

     c. Do not include partial matches (example: do not take Eiffel if it needs Tower)

     d. Do not include toponyms in @mentions

     e. Allow matches when a space is missing between the two words (newzealand)

  3. Find buildings:

     a. Look at each word in the tweet individually (tokenize)

     b. Match against the building list to find additional facilities

     c. If a building word is found, take the two words before it and capture the string as output. Also, hard-code pairs such as “X and Y buildings” and “X or Y buildings”, and multi-word building names such as “W X and Y Z buildings” (footnote 20)

     d. To filter non-specific buildings, do not take those preceded by the article “a”, the possessive pronouns “I, my, mine, our, his, hers, yours, theirs”, the relative pronouns “which, what”, or the demonstrative pronouns “this, that”

     e. Do not identify as a building if the words are found across a punctuation mark: period, comma, semi-colon, brackets, parentheses

     f. Do not identify a word as a building, even if it matches an entity named on the buildings list, if it is the first word of a tweet

     g. Do not identify as a building if the building phrase contains the placeholder “XXX”

  4. Find streets within the tweets (a sketch follows this list):

     a. Street identification words: st, street, ln, lane, dr, drive, boulevard, blvd, road, rd, avenue, ave, pl, way, wy

     b. Check for an Arabic numeral two to three spaces before the street identification word. If a number is found, mine everything from the number to the street identification word. If there is no number, take only the word immediately before the street identification word. Take at most 3 words before the street word (including a number) plus the street indicator

     c. To filter the possibility of “st” meaning saint instead of street, we use a list of saints’ names (footnote 21). A match with the list indicates that the phrase is not a street. Future versions of the algorithm should rely also on word order.

     d. Do not identify as a street if the street phrase is found across punctuation (period, comma, semi-colon, brackets, parentheses). Example: “14 East. Street that is a dead end” should not be mined as 14 East Street.

  5. Output: tweet — location
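The sketch below illustrates the street heuristics of step 4: street identifier words, a leading house number, the saints filter, and the punctuation barrier. The function name and regular expression are ours, and since the paper does not specify exactly how the saints' list is matched, the check on the word after “St” is an assumption.

```python
import re

STREET_WORDS = {"st", "street", "ln", "lane", "dr", "drive", "boulevard", "blvd",
                "road", "rd", "avenue", "ave", "pl", "way", "wy"}

def find_streets(tokens, saints=frozenset()):
    """Extract street phrases from a tokenized tweet following step 4 above.
    `saints` is a set of saints' names used to filter 'St' = Saint."""
    streets = []
    for i, tok in enumerate(tokens):
        word = tok.lower().rstrip(".")
        if word not in STREET_WORDS:
            continue
        nxt = tokens[i + 1].rstrip(".,").capitalize() if i + 1 < len(tokens) else ""
        if word == "st" and nxt in saints:
            continue                          # "St Albans" etc. is a saint, not a street
        window = tokens[max(i - 3, 0):i]      # up to three words before the street word
        number_idx = next((j for j, w in enumerate(window) if w.isdigit()), None)
        if number_idx is not None:
            phrase = window[number_idx:] + [tok]   # mine from the house number onward
        else:
            phrase = window[-1:] + [tok]           # otherwise just the preceding word
        if any(re.search(r"[.,;()\[\]]", w) for w in phrase[:-1]):
            continue                          # never build a street across punctuation
        streets.append(" ".join(phrase))
    return streets

# find_streets("Huge cracks along 120 Colombo St near the square".split())
# -> ['120 Colombo St']
```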

Appendix 3 Samples of training data for the abbreviation and acronym classifier

Actual abbreviations and acronyms for location

  • Park: Park Avenue
  • Lex: Lexington Ave.
  • Sau Ar: Saudi Arabia
  • Zimb: Zimbabwe
  • Pac: Pacific
  • Papua: Papua New Guinea
  • N.A.: North America
  • Dom Rep: Dominican Republic
  • Mainz: Mainz am Rhein
  • SG: St. Gaullen
  • SG: St. Gaul
  • Qnborough: Queenborough
  • Pborough: Peterborough
  • L.I. sound: Long Island Sound
  • Hunt: Huntington
  • N Bay Shore: North Bay Shore
  • Jersey Shore: New Jersey Shore
  • Ronk: Ronkonkoma
  • Rio: Rio de Janeiro
  • Kab: Kabambare
  • Kago: Kagoshima
  • T&T: Trinidad and Tobago
  • TT: Trinidad and Tobago
  • U.A.E.: United Arab Emirates
  • W. Sam: Western Samoa
  • Sol. Is.: Solomon Islands
  • S.L.: Sierra Leone
  • PNG: Papua New Guinea
  • SOS: Southend on Sea
  • EXM: Exmouth Gulf Airport
  • H.H.: Head of the Harbor
  • MTA: Metropolitan Transport Authority
  • FI: Fire Island
  • HH: Hoek van Holland
  • Xmas Island: Christmas Island
  • Congo: Democratic Republic of Congo
  • D.R.C.: Democratic Republic of Congo
  • TdF: Tierra del Fuego
  • SP: Sao Paulo

False Abbreviations & Acronyms

  • Park: Parking lot
  • Lex: Last expression
  • Sau Ar: Sad argument
  • Zimb: Zoo in my basement
  • Pac: Plenty in cabinet
  • N.A.: not available
  • SG: Southern Georgia
  • SG: so geography
  • SG: several grains
  • L.I. sound: lighter sound
  • Hunt: Hunting
  • N Bay Shore: not by the shore
  • Ronk: Rings on new keys
  • Ronk: Rubber on knives
  • Ronk: Recalling our kids
  • Rio: Ring in an oval
  • Rio: Rinse in oil
  • Kab: kneel and bend
  • Kab: knots and bends
  • Kago: kick and go under
  • T&T: Trains and transportation
  • TT: tractor trailer
  • U.A.E.: Under All Empires
  • U.A.E.: Under application employees
  • Sol. Is.: Sole Ice
  • S.L.: southern languages
  • PNG: please not again
  • SOS: signs of success
  • EXM: expected money
  • FI: Finally I
  • HH: Hello harry
  • D.R.C.: Daily Ritual Cleaning
  • DRC: Dr. Classic
  • DRC: Drab Rubber Chicken
  • TdF: to do Friday
  • TdF: Trumpet for December Festival
  • SP: sudden park
  • SP: spark
  • SP: salt and pepper
  • SP: sensational paper

Appendix 4 Classifier model for abbreviations and acronyms based on the full training set

Cite this article

Gelernter, J., Balaji, S. An algorithm for local geoparsing of microtext. GeoInformatica 17, 635–667 (2013). https://doi.org/10.1007/s10707-012-0173-8