An algorithm for local geoparsing of microtext


Abstract

The location of the author of a social media message is not invariably the same as the location that the author writes about in the message. In applications that mine these messages for information, such as tracking news or political events or responding to disasters, it is the geographic content of the message rather than the location of the author that is important. To this end, we present a method to geo-parse the short, informal messages known as microtext. Our preliminary investigation showed that many microtext messages contain place references that are abbreviated, misspelled, or highly localized, and that these references are missed by standard geo-parsers. Our geo-parser is built to find such references. It uses Natural Language Processing methods to identify references to streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations, and it combines heuristics, open-source Named Entity Recognition software, and machine learning techniques. Our primary data consisted of Twitter messages sent immediately following the February 2011 earthquake in Christchurch, New Zealand. On this sample of Twitter messages, the algorithm identified locations with an F statistic of 0.85 for streets, 0.86 for buildings, 0.96 for toponyms, and 0.88 for place abbreviations, with a combined average F of 0.90 for identifying places. The same data run through a geo-parsing standard, Yahoo! Placemaker, yielded an F statistic of zero for streets and buildings (because Placemaker is designed to find neither streets nor buildings), and an F of 0.67 for toponyms.


Notes

  1. These statistics date to February 2012, from http://www.ebizmba.com/articles/social-networking-websites

  2. These are found at the following web addresses as of February 7, 2012: Yahoo Placemaker at http://developer.yahoo.com/geo/placemaker/, Metacarta geoparser at http://www.metacarta.com/products-platform-queryparser.htm; Drupal at http://geoparser.andrewl.net/, and the Unlock system at http://unlock.edina.ac.uk/texts/introduction.

  3. http://thenextweb.com/socialmedia/2010/04/14/twitter-announces-annotations-add-metadata-tweet-starting-quarter-2/

  4. Our data consists of about 300,000 tweets (1 out of every 1,000 of roughly 300,000,000 tweets per hour) sampled from one hour of tweets dated immediately after the earthquake. Takahashi, Abe, and Igata, “Can Twitter be an alternative of real-world sensors,” LNCS 6763, 2011, found that 0.6 % of tweets had GPS coordinates.

  5. Twitter users developed their own indexing practice of using a “#” symbol, called a hashtag, to label tweets on a topic.

  6. We would like to add time as representative of distance, since at present we miss the radius around San Bruno in a tweet such as “about an hr and a half from San Bruno”.

  7. Illinois co-reference package: http://cogcomp.cs.illinois.edu/page/software_view/18; BART at http://www.bart-coref.org/

  8. We use the dictionary that loads with every Linux operating system as a dictionary of the English language. We use a dictionary of abbreviations common to Twitter called the Twittonary, which we were granted permission to use in research. We refer also to some minor word lists, such as the buildings list from Wikipedia, and a list of saints’ names (to distinguish saints from streets) from http://www.catholic.org/saints/stindex.php

  9. http://developer.gauner.org/jspellcorrect

  10. http://en.wikipedia.org/wiki/list_of_building_types

  11. U.S. airports are found in tweets, but they do not make good training data because U.S. airport abbreviations are forced into a 3-letter mold and are not supposed to repeat around the country, so many do not follow customary abbreviation rules. For example, LAX stands for the Los Angeles, California, airport, and EWR represents the Newark, New Jersey, airport. We therefore avoided this sort of abbreviation for training the classifier.

  12. http://www.catholic.org/saints/stindex.php

  13. The part-of-speech tagger for Twitter by Noah Smith et al. is at http://www.ark.cs.cmu.edu/TweetNLP/

  14. “Consensus decision-making” in Wikipedia, Retrieved July 24, 2012, from http://en.wikipedia.org/wiki/Consensus_decision-making

  15. Kilem Gwet (2002). Kappa statistic is not satisfactory for assessing the extent of agreement between raters. Retrieved July 15, 2012 from http://www.agreestat.com/research_papers/kappa_statistic_is_not_satisfactory.pdf; Julius Sim, Chris Wright (2005). The kappa statistic in reliability studies: use, interpretation and sample size requirements. Phys Ther 85(3):257–68.

  16. Official New Zealand gazetteer of place names, at http://www.linz.govt.nz/placenames/find-names/nz-gazetteer-official-names as of January 31, 2012.

  17. Write to gelern@cs.cmu.edu for use of the geo-tagged 2011 earthquake tweets from Christchurch, New Zealand, or the geo-tagged 2011 fire tweets from Austin, Texas.

  18. We reported results of testing the second version of the algorithm at the XSEDE’12 high-performance computing conference in Chicago, Illinois, USA, in July 2012.

  19. http://www.ark.cs.cmu.edu/TweetNLP/

  20. These have been replaced in the next version of the algorithm, which will be presented at the XSEDE’12 conference in July 2012.

  21. List of Saints’ Names: http://www.catholic.org/saints/stindex.php

References

  1. Adriani M, Paramita ML (2007) Identifying location in Indonesian documents for geographic information retrieval. GIR’07, November 9, 2007, Lisbon, Portugal, pp 19–23

  2. Ammar W, Darwish K, El Kahki, A, Hafez, K (2011) ICE-TEA: in-context expansion and translation of English abbreviations. In Gelbukh A (ed) CICLing 2011, Part II, LNCS 6609, pp 41–54

  3. Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating Twitter users. CIKM’10, October 26–30, 2010, Toronto, Ontario, Canada, pp 759–768

  4. Dannélls D (2006) Automatic acronym recognition. Eleventh Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), April 3–7, Trento, Italy, pp 167–170

  5. Eisenstein J, O’Connor B, Smith NA, Xing E (2010) A latent variable model for geographic lexical variation. In Proceedings of EMNLP, pp 1277–1287

  6. Gelernter J, Mushegian N (2011) Geo-parsing messages from microtext. Transactions in GIS 15(6):753–773


  7. Hecht B, Hong L, Suh B, Chi EH (2011) Tweets from Justin Bieber’s Heart: the dynamics of the “location” field in user profiles, CHI 2011, May 7–12, 2011, Vancouver, BC, Canada, pp 237–246

  8. Hill E, Fry ZP, Boyd H, Sridhara G, Novikova Y, Pollock L, Vijay-Shanker K (2008) AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools. MSR,’08, May 10–11, 2008, Leipzig, Germany, pp 79–88

  9. Ireson N, Ciravegna F (2010) Toponym resolution in social media. In: Patel-Schneider PF et al. (eds) ISWC 2010, Part I, LNCS 6496, pp 370–385

  10. Jung JJ (2011) Towards named entity recognition method for microtexts in online social networks: a case study of Twitter. 2011 International Conference on Advances in Social Network Analysis and Mining (ASONAM), pp 563–564

  11. Khanal N, Kehoe A, Kumar A, MacDonald A, Mueller M, Plaisant C, Ruecker S, Sinclair S Monk Tutorial: Metadata offers new knowledge. Retrieved January 31, 2012 from http://gautam.lis.illinois.edu/monkmiddleware/public/analytics/decisiontree.html

  12. Kinsella S, Murdock V, O’Hare N (2011) “I’m eating a sandwich in Glasgow”: modelling locations with tweets. SMUC’11, October 28, 2011, Glasgow, Scotland, pp 61–68

  13. Leveling J, Hartrumpf S (2008) On metonymy recognition for geographic IR. Int J Geogr Inf Sci 22(3), http://www.geo.uzh.ch/~rsp/gir06/papers/individual/leveling.pdf, accessed 12 January 2012

  14. Lieberman MD, Samet H (2011) Multifaceted toponym recognition for streaming news. SIGIR’11. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, July 2011, pp 843–852

  15. Lieberman MD, Samet H, Sankaranarayanan J (2010) Geotagging with local lexicons to build indexes for textually-specified spatial data. IEEE 26th International Conference on Data Engineering (ICDE), pp 201–212

  16. Liu J, Chen J, Liu T, Huang Y (2011) Expansion finding for given acronyms using conditional random fields. In: Wang H, et al. (eds) WAIM 2011, LNCS 6897, pp 191–200

  17. Liu X, Zhang S, Wei F, Zhou M (2011) Recognizing named entities in tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland Oregon, June 19–24, pp 359–367

  18. Liu Y, Piyawongwisal P, Handa S, Yu L, Xu Y, Samuel A (2011) Going beyond citizen data collection with mapster: a mobile+cloud real-time citizen science experiment. Seventh IEEE international conference on e-science workshops, pp 1–6

  19. Marcus A, Bernstein MS, Badar O, Karger DR, Madden S, Miller RC (2011) Processing and visualizing the data in tweets. SIGMOD Record 40(4):21–27


  20. McInnes BT, Pedersen T, Liu Y, Pakhomov SV, Melton GB (2011) Using second-order vectors in a knowledge-based method for acronym disambiguation. Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp 145–153

  21. Moschitti A, Chu-Carroll J, Patwardhan S, Fan J, Riccardi G (2011) Using syntactic and semantic structural kernels for classifying definition questions in jeopardy! Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27–31, 2011, pp 712–724

  22. Nadeau D, Turney PD (2005) A supervised learning approach to acronym identification. In: Kégl B, Lapalme G (eds) AI 2005, LNAI 3501, pp 319–329

  23. Okazaki M, Matsuo Y (2009) Semantic Twitter: analyzing tweets for real-time event notification. In: Breslin JG et al. (eds) BlogTalk 2008/2009, LNCS 6045. Proceedings of the 2008/2009 international conference on social software. Springer, Heidelberg, 2010 pp 63–74

  24. Okazaki N, Ananiadou S (2006) A term recognition approach to acronym recognition. Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pp 643–650

  25. Okazaki N, Ananiadou S, Tsujii J (2008) A discriminative alignment model for abbreviation recognition. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp 657–664

  26. Paradesi S (2011) Geotagging tweets using their content. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, May 18–20, 2011, Florida, USA, pp 355–356

  27. Park Y, Byrd RJ (2001) Hybrid text mining for finding abbreviations and their definitions. Association for Computational Linguistics http://aclweb.org/anthology/W/W01/W01-0516.pdf, Retrieved January 3, 2012

  28. Pennell D, Liu Y (2011) Toward text message normalization: modeling abbreviation generation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May, 2011, pp 5364–5367

  29. Ponte J, Croft WB (1998) A language modeling approach to information retrieval. In Proceedings of SIGIR, pp 275–281

  30. Ritter A, Clark S, Mausam, Etzioni O (2011) Named entity recognition in tweets: an experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp 1524–1534

  31. Roche M, Prince V (2007) AcroDef: a quality measure for discriminating expansions of ambiguous acronyms. In: Kokinov B et al. (eds) Context 2007, LNAI 4635, pp 411–424

  32. Starbird K, Palen L, Hughes A, Vieweg S (2010) Chatter on the red: what hazards threat reveals about the social life of microblogged information. CSCW 2010, February 6–10, 2010, Savannah, Georgia, USA, pp 241–250

  33. Taghva K, Vyas L (2011) Acronym expansion via Hidden Markov Models. 21st International Conference on Systems Engineering, 16–18 August 2011, pp 120–125

  34. Takahashi K, Pramudiono I, Kitsuregawa M (2005) Geo-word centric association rule mining. Proceedings of the Sixth International Conference on Mobile Data Management (MDM 2005), Ayia Napa, Cyprus, pp 273–280

  35. Tanasescu V, Domingue J (2008) A differential notion of place for local search. LocWeb 2008, April 22, 2008, Beijing, China, pp 9–15

  36. Vanopstal K, Desmet B, Hoste V (2010) Towards a learning approach for abbreviation detection and resolution. LREC 2010, May 19–21, 2010, Valletta, Malta, pp 1043–1049

  37. Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In: Proceedings of the 2010 Annual Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia: pp 1079–1088

  38. Watanabe K, Ochi M, Okabe M, Onai R (2011) Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs. CIKM’11, October 24–28, 2011, Glasgow, Scotland, UK, pp 2541–2544

  39. Wing BP, Baldridge J (2011) Simple supervised document geolocation with geodesic grids. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19–24, 2011, pp 955–964


Acknowledgements

We are grateful for the support of Jaime Carbonell, Director of the Language Technologies Institute. Our data was collected from an archive maintained by Brendan O’Connor at Carnegie Mellon University. Nikolai Mushegian, Corinne Meloni, Josh Swanson, Niharika Ray, Andrew Minton, Marielle Saums, and Christa Hester were among the tweet annotators who helped us establish the ground truth used to score the results of our algorithm. Our open-source resources were supplemented by data from Abbreviations.com and from the online Twitter dictionary, Twittonary. Finally, we appreciate our discussions with Cong Lu, a doctoral candidate in statistics, regarding intra-coder reliability.

Author information


Correspondence to Judith Gelernter.

Appendices

Appendix 1 Geoparsing for location abbreviations

1.1 Preparation steps

External resource preparation

Download and make available for processing:

  • Twittonary

  • Dictionary (supplemented with standard technical abbreviations such as tv and iphone)

  • Excerpt from National Geospatial-Intelligence Agency gazetteer for the domain (here, for New Zealand)

  • List of geographical features

  • List of building terms

  • List of prepositions

  • Location indicators for direction (N, S, E, W, NE, NW, north, northeast, northwest, etc.)

  • Location indicators for distance (mi, mile, km, kilometer, etc.)

Data preparation

  1. Normalize the tweet: tokenize (word by word), remove articles (a, an, the), and remove punctuation from the beginning and end of a word and at the end of a sentence (see the sketch after this list).

  2. Part-of-speech processing for tweets using third-party, open-source software (footnote 19).

  3. Find the date and time of tweet creation.
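A minimal sketch of the normalization in step 1 follows. It assumes plain whitespace tokenization; the function name and regular expression are ours, not the authors' code.

```python
import re

ARTICLES = {"a", "an", "the"}

def normalize_tweet(text):
    """Tokenize a tweet word by word, drop articles, and strip punctuation
    from the beginning and end of each token."""
    cleaned = []
    for tok in text.split():                         # word-by-word tokenization
        tok = re.sub(r"^\W+|\W+$", "", tok)          # strip leading/trailing punctuation
        if tok and tok.lower() not in ARTICLES:      # remove the articles a, an, the
            cleaned.append(tok)
    return cleaned

# normalize_tweet("The cathedral on Colombo St. has collapsed!")
# -> ['cathedral', 'on', 'Colombo', 'St', 'has', 'collapsed']
```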

1.2 Processing steps

Identify place abbreviations and acronyms using heuristics that specify how to match external resources to data. Consider a text word to be a place abbreviation if it:

  • has between 2 and 6 characters, is not in the Dictionary or the Twittonary, and is preceded by a preposition, direction, or distance term;

  • matches a confirmed place abbreviation (that is, an abbreviation that satisfies the heuristic above); or

  • matches an abbreviation known to be a place, either from the location abbreviations in Abbreviations.com or from an earlier pass over the tweets.

But skip the word as an abbreviation if it is preceded by # or @. These heuristics are sketched below.
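The sketch below illustrates the heuristics above, assuming the word lists from the preparation steps have been loaded as Python sets. The function and variable names are ours, not the authors' implementation, and the preposition list is only a subset.

```python
PREPOSITIONS = {"in", "at", "near", "from", "to", "by", "of"}       # illustrative subset
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west",
              "northeast", "northwest", "southeast", "southwest"}
DISTANCES = {"mi", "mile", "miles", "km", "kilometer", "kilometers"}

def is_place_abbrev_candidate(word, prev_word, dictionary, twittonary,
                              known_place_abbrevs):
    """Decide whether `word` may be a place abbreviation, following the
    heuristics above. `dictionary` and `twittonary` are sets of known
    words; `known_place_abbrevs` holds abbreviations confirmed on an
    earlier pass or drawn from Abbreviations.com location entries."""
    if word.startswith(("#", "@")):                   # skip hashtags and @mentions
        return False
    w = word.lower()
    if w in known_place_abbrevs:                      # already confirmed as a place
        return True
    prev = (prev_word or "").lower().rstrip(".")
    cue_before = prev in PREPOSITIONS or prev in DIRECTIONS or prev in DISTANCES
    unknown_short_word = (2 <= len(w) <= 6
                          and w not in dictionary
                          and w not in twittonary)
    return unknown_short_word and cue_before

# Example: in "huge damage in chch tonight", the token "chch" follows the
# preposition "in", has 4 characters, and is in neither word list, so it
# is kept as a candidate place abbreviation.
```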

Identify candidate disambiguation words

Data for candidate disambiguation words and phrases are drawn from tweets posted 1 to 5 days before the date of the tweet with the corresponding abbreviation. The disambiguation words and phrases are selected according to the following heuristics (see the sketch after this list):

  • first word begins with the same letter as the abbreviation

  • if an abbreviation has n letters, take candidate multi-word phrases of length n-2 words, n-1 words, and n words, as well as n words plus up to 2 stop words

  • no verbs

  • no hashtags or @mentions

  • no colon, semi-colon, question mark or exclamation point within the phrase
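A hedged sketch of these selection heuristics appears below. It assumes the earlier tweet has already been tokenized and part-of-speech tagged (the CMU Twitter tagger of footnote 13 marks verbs with 'V'); the stop-word list and function name are illustrative, not the paper's.

```python
STOP_WORDS = {"of", "the", "and", "on", "in", "at"}    # illustrative stop-word list

def candidate_expansions(abbrev, tagged_tweet, max_stop_words=2):
    """Collect candidate expansion phrases for `abbrev` from an earlier
    tweet, following the heuristics above. `tagged_tweet` is a list of
    (token, pos) pairs."""
    n = max(len(abbrev.replace(".", "")), 1)           # letters in the abbreviation
    tokens = [tok for tok, _ in tagged_tweet]
    tags = [pos for _, pos in tagged_tweet]
    candidates = []
    for start, tok in enumerate(tokens):
        if not tok or tok[0].lower() != abbrev[0].lower():
            continue                                   # first word must share the first letter
        for length in range(max(n - 2, 1), n + max_stop_words + 1):
            words = tokens[start:start + length]
            if len(words) < length:
                break                                  # ran off the end of the tweet
            if length > n and sum(w.lower() in STOP_WORDS for w in words) < length - n:
                continue                               # beyond n words, allow only stop words
            if any(t == "V" for t in tags[start:start + length]):
                continue                               # no verbs
            if any(w.startswith(("#", "@")) for w in words):
                continue                               # no hashtags or @mentions
            if any(c in w for w in words for c in ":;?!"):
                continue                               # no clause punctuation within the phrase
            candidates.append(" ".join(words))
    return candidates
```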

Attributes of abbreviations and acronyms and their expansions

Based on inspection of the data, we arrived at the following attributes that characterize a match between a location abbreviation or acronym and a candidate expansion word or phrase. We defined these attributes as:

[Figure e: the attribute definitions]

Create a decision tree model based on attributes

Decision tree algorithms classify unseen data based on data they have previously learned. Given a set of attributes and how these attributes match to the correct classes (and mis-match to other classes), the algorithm will discover how unseen data are classified based on their attributes. The algorithm maximizes the information gain at each decision tree branch test, and makes a model based on the training data. Tests are decided upon during the training phase on the basis of entropy, or the measure of disorder of the data. The model can be considered as a branching set of decisions, or tests. Each test decision branches into a subtree until it reaches a leaf end node. Unseen data is sent through the tree by undergoing a series of binary tests until it reaches a leaf.

Weka’s J48 is an implementation of C4.5, the popular decision-tree algorithm developed by J. Ross Quinlan. We select the J48 option to “prune”, or simplify, results. Pruning operates on the decision tree while it is being induced: it compresses a parent node and its child nodes into a single node whenever a split would yield a child leaf representing fewer than a minimum number of examples from the data set. Pruning can be used as a tool to correct for potential overfitting, and so an unpruned tree might perform slightly better than a pruned one [11].

Our tree (Appendix 4) has many attributes, and therefore many nodes. The root node must split the data effectively, and the best split is the one that provides the most information gain. Each split attempts to pare down a set of instances (the actual data) until all have the same classification. At each node an instance is tested for a particular attribute value, and the data is then routed accordingly.

Given our training data comprised of abbreviations and acronyms and their correct and incorrect disambiguation words and phrases, the machine learning algorithm uses statistics to assign weights to the different features according to their importance. Our decision tree model with 52 leaves appears in Appendix 4.

Because our model is based on our training data as well as these attributes, another model made with different data would obtain different results. We obtained 87.9 % accuracy with this model, with a corresponding kappa of 0.748, indicating that the model performs much better than chance (which would produce a kappa of zero).
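The paper trains the model with Weka's J48 (a Java implementation of C4.5). As a rough analogue of the same training step, the sketch below uses scikit-learn's DecisionTreeClassifier, which implements CART rather than C4.5 but exposes the same ideas of entropy-based splits and a minimum-leaf-size constraint. The feature names and toy values are invented for illustration, not the paper's attribute set.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training matrix: each row describes one (abbreviation, candidate
# expansion) pair by numeric match attributes; the attributes here are
# invented placeholders.
X_train = [
    # [first_letters_match, abbrev_to_phrase_length_ratio, all_initials_match, stop_word_count]
    [1, 0.33, 1, 0],
    [1, 0.50, 0, 1],
    [0, 0.20, 0, 0],
    [1, 0.40, 1, 2],
    [0, 0.10, 0, 1],
    [1, 0.25, 1, 0],
]
y_train = [1, 1, 0, 1, 0, 1]   # 1 = correct expansion, 0 = incorrect

# criterion="entropy" maximizes information gain, as described above;
# min_samples_leaf plays a role similar to J48's pruning threshold, in
# that a split producing a leaf with too few examples is not made.
clf = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, random_state=0)
clf.fit(X_train, y_train)

unseen = [[1, 0.45, 1, 1]]                 # attributes of an unseen pair
print(clf.predict(unseen))                 # class: correct (1) or not (0)
print(clf.predict_proba(unseen))           # confidence, used later for ranking
```

With a realistic training set, accuracy and kappa comparable to the figures reported above could be estimated by cross-validation.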

Features that are significant in correctly classifying the abbreviation or acronym with its disambiguation expression are shown by their location near the root, and also their recurrence in the tree (possibly with different values in different branches). These are:

[Figure f: the significant attributes]

Rank abbreviation – expansion pairs

Run each abbreviation, along with its file of potential matches, through the decision tree. Weka outputs a ranking along with true or false and a corresponding error prediction (1 indicates 100 % confidence that the abbreviation corresponds to the matched expansion). We wrote a short script that sorts the true values according to error level. We take the top five disambiguation phrases for the sake of error analysis, to see whether the correct disambiguation appears near the top even if it was not selected first.
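A sketch of this ranking step follows, under the assumptions of the classifier sketch above. Predicted class probability stands in for Weka's confidence output, and the sorting script mentioned in the text is not public, so the helper below is only an analogue; `featurize` is a hypothetical stand-in for the attribute extraction.

```python
def rank_expansions(clf, abbrev, candidates, featurize, top_k=5):
    """Score each (abbreviation, candidate expansion) pair with the trained
    tree and keep the top-k candidates classified as true, sorted by
    confidence. `featurize` maps a pair to the attribute vector used in
    training."""
    scored = []
    for cand in candidates:
        features = [featurize(abbrev, cand)]
        predicted_true = clf.predict(features)[0] == 1
        confidence = clf.predict_proba(features)[0][1]   # P(correct); assumes classes_ == [0, 1]
        if predicted_true:
            scored.append((confidence, cand))
    scored.sort(reverse=True)
    return scored[:top_k]
```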

Appendix 2 Geoparsing streets, buildings and toponyms

User Input (theoretically; data was pre-collected for this study)

  • city and country of the data set, and possibly also nearby countries

  • that country’s abbreviation and the abbreviation of nearby countries

    • Ex. For our data set on the Christchurch, New Zealand earthquake, the user enters:

      • Christchurch (Chch)

      • New Zealand (NZ), and

      • Australia (Aus)

External resources:

  • Gazetteer excerpt for region of inquiry

  • List of common words that are also place names to filter the gazetteer (list was manually generated)

  • Enhanced building list (http://en.wikipedia.org/wiki/list_of_building_types) minus a few ambiguous words such as “wall” and “place”

Data (tweet) preparation

  1. Save the original tweets.

  2. Make a copy of the tweets for NLP preparation (see the sketch after this list).

  3. In the copy, remove the hashtag that was used to retrieve the data set (and recurs repeatedly throughout), and tokenize.

  4. Remove @mentions and replace them with XXX (this preserves the original word count).

  5. Remove tweets in the copy set that contain unicode characters.

  6. Run the copy of the tweets through a spell-check algorithm. Retain words that are mis-spelled as well as those that are corrected, in case the spell check alters what is actually correct.
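A minimal sketch of the preparation steps above. The function name is ours, and the hashtag value is only an example; the spell checker of footnote 9 is an external Java tool and is not reproduced here.

```python
import re

def prepare_tweets(tweets, retrieval_hashtag="#eqnz"):
    """Prepare a working copy of the tweet set: keep the originals, drop the
    retrieval hashtag, mask @mentions with XXX to preserve word counts, and
    skip tweets containing unicode characters."""
    originals = list(tweets)                            # step 1: keep the original tweets
    prepared = []
    for tweet in originals:                             # step 2: work on a copy
        if any(ord(ch) > 127 for ch in tweet):
            continue                                    # step 5: drop tweets with unicode characters
        copy = tweet.replace(retrieval_hashtag, "")     # step 3: remove the retrieval hashtag
        copy = re.sub(r"@\w+", "XXX", copy)             # step 4: mask @mentions
        prepared.append(copy.split())                   # step 3: tokenize
    return originals, prepared

# Step 6 (spell checking) would follow, keeping both the original and the
# corrected form of each token.
```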

Data (tweet) processing

  1. Run the original tweets through OpenCalais:

     a. Retain locations (city, country, continent, etc.)

     b. Retain natural features (mountain, etc.)

     c. Retain facilities

     d. Ignore all other entities found by OpenCalais

  2. Find toponyms:

     a. Make all data lower case

     b. Match against our own gazetteer

     c. Do not include partial matches (example: do not take Eiffel if it needs Tower)

     d. Do not include toponyms in @mentions

     e. Allow matches when a space is missing between the two words (newzealand)

  3. Find buildings:

     a. Look at each word in the tweet individually (tokenize)

     b. Match against the building list to find additional facilities

     c. If a building word is found, take the two words before it and capture the string as output. Also, hard-code pairs such as “X and Y buildings” and “X or Y buildings”, and multi-word building names such as “W X and Y Z buildings” (footnote 20)

     d. To filter non-specific buildings, do not take those preceded by the article “a”, the possessive pronouns “I, my, mine, our, his, hers, yours, theirs”, the relative pronouns “which, what”, or the demonstrative pronouns “this, that”

     e. Do not identify as a building if the words are found across a punctuation mark: period, comma, semi-colon, brackets, parentheses

     f. Do not identify a word as a building, even if it matches an entity named on the buildings list, if it is the first word of a tweet

     g. Do not identify as a building if the building phrase contains the placeholder “XXX”

  4. Find streets within the tweets (a sketch follows this list):

     a. Street identification words: st, street, ln, lane, dr, drive, boulevard, blvd, road, rd, avenue, ave, pl, way, wy

     b. Check for an Arabic numeral two to three spaces before the street identification word. If a number is found, mine everything from the number to the street identification word. If there is no number, take only the word immediately before the street identification word. Take at most 3 words before the street word (including a number) plus the street indicator

     c. To filter the possibility of “st” meaning saint instead of street, we use a list of saints’ names (footnote 21). A match with the list indicates that the phrase is not a street. Future versions of the algorithm should rely also on word order.

     d. Do not identify as a street if the street phrase is found across punctuation (period, comma, semi-colon, brackets, parentheses). Example: “14 East. Street that is a dead end” should not be mined as 14 East Street.

  5. Output: tweet — location
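The sketch below illustrates the street heuristics of step 4: street identifier words, a leading house number, the saints filter, and the punctuation barrier. The function name and regular expression are ours, and since the paper does not specify exactly how the saints' list is matched, the check on the word after “St” is an assumption.

```python
import re

STREET_WORDS = {"st", "street", "ln", "lane", "dr", "drive", "boulevard", "blvd",
                "road", "rd", "avenue", "ave", "pl", "way", "wy"}

def find_streets(tokens, saints=frozenset()):
    """Extract street phrases from a tokenized tweet following step 4 above.
    `saints` is a set of saints' names used to filter 'St' = Saint."""
    streets = []
    for i, tok in enumerate(tokens):
        word = tok.lower().rstrip(".")
        if word not in STREET_WORDS:
            continue
        nxt = tokens[i + 1].rstrip(".,").capitalize() if i + 1 < len(tokens) else ""
        if word == "st" and nxt in saints:
            continue                          # "St Albans" etc. is a saint, not a street
        window = tokens[max(i - 3, 0):i]      # up to three words before the street word
        number_idx = next((j for j, w in enumerate(window) if w.isdigit()), None)
        if number_idx is not None:
            phrase = window[number_idx:] + [tok]   # mine from the house number onward
        else:
            phrase = window[-1:] + [tok]           # otherwise just the preceding word
        if any(re.search(r"[.,;()\[\]]", w) for w in phrase[:-1]):
            continue                          # never build a street across punctuation
        streets.append(" ".join(phrase))
    return streets

# find_streets("Huge cracks along 120 Colombo St near the square".split())
# -> ['120 Colombo St']
```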

Appendix 3 Samples of training data for the abbreviation and acronym classifier

Actual abbreviations and acronyms for location

  • Park: Park Avenue
  • Lex: Lexington Ave.
  • Sau Ar: Saudi Arabia
  • Zimb: Zimbabwe
  • Pac: Pacific
  • Papua: Papua New Guinea
  • N.A.: North America
  • Dom Rep: Dominican Republic
  • Mainz: Mainz am Rhein
  • SG: St. Gaullen
  • SG: St. Gaul
  • Qnborough: Queenborough
  • Pborough: Peterborough
  • L.I. sound: Long Island Sound
  • Hunt: Huntington
  • N Bay Shore: North Bay Shore
  • Jersey Shore: New Jersey Shore
  • Ronk: Ronkonkoma
  • Rio: Rio de Janeiro
  • Kab: Kabambare
  • Kago: Kagoshima
  • T&T: Trinidad and Tobago
  • TT: Trinidad and Tobago
  • U.A.E.: United Arab Emirates
  • W. Sam: Western Samoa
  • Sol. Is.: Solomon Islands
  • S.L.: Sierra Leone
  • PNG: Papua New Guinea
  • SOS: Southend on Sea
  • EXM: Exmouth Gulf Airport
  • H.H.: Head of the Harbor
  • MTA: Metropolitan Transport Authority
  • FI: Fire Island
  • HH: Hoek van Holland
  • Xmas Island: Christmas Island
  • Congo: Democratic Republic of Congo
  • D.R.C.: Democratic Republic of Congo
  • TdF: Tierra del Fuego
  • SP: Sao Paulo

False Abbreviations & Acronyms

  • Park: Parking lot
  • Lex: Last expression
  • Sau Ar: Sad argument
  • Zimb: Zoo in my basement
  • Pac: Plenty in cabinet
  • N.A.: not available
  • SG: Southern Georgia
  • SG: so geography
  • SG: several grains
  • L.I. sound: lighter sound
  • Hunt: Hunting
  • N Bay Shore: not by the shore
  • Ronk: Rings on new keys
  • Ronk: Rubber on knives
  • Ronk: Recalling our kids
  • Rio: Ring in an oval
  • Rio: Rinse in oil
  • Kab: kneel and bend
  • Kab: knots and bends
  • Kago: kick and go under
  • T&T: Trains and transportation
  • TT: tractor trailer
  • U.A.E.: Under All Empires
  • U.A.E.: Under application employees
  • Sol. Is.: Sole Ice
  • S.L.: southern languages
  • PNG: please not again
  • SOS: signs of success
  • EXM: expected money
  • FI: Finally I
  • HH: Hello harry
  • D.R.C.: Daily Ritual Cleaning
  • DRC: Dr. Classic
  • DRC: Drab Rubber Chicken
  • TdF: to do Friday
  • TdF: Trumpet for December Festival
  • SP: sudden park
  • SP: spark
  • SP: salt and pepper
  • SP: sensational paper

Appendix 4 Classifier model for abbreviations and acronyms based on the full training set

Cite this article

Gelernter, J., Balaji, S. An algorithm for local geoparsing of microtext. GeoInformatica 17, 635–667 (2013). https://doi.org/10.1007/s10707-012-0173-8