Twitter location inference methods are developed with the purpose of increasing the percentage of geotagged tweets by inferring locations on a non-geotagged dataset. For validation of proposed approaches, these location inference methods are developed on a fully geotagged dataset on which the attached Global Navigation Satellite System coordinates are used as ground truth data. Whilst a substantial number of location inference methods have been developed to date, questions arise pertaining to the generalizability of the developed location inference models on a non-geotagged dataset. This paper proposes a high precision location inference method for inferring tweets' point of origin based on location mentions within the tweet text. We investigate the influence of data selection by comparing the model performance on two datasets. For the first dataset, we use a proportionate sample of tweet sources of a geotagged dataset. For the second dataset, we use a modelled distribution of tweet sources following a non-geotagged dataset. Our results showed that the distribution of tweet sources influences the performance of location inference models. Using the first dataset we outperformed state-of-the-art location extraction models by inferring 61.9%, 86.1% and 92.1% of the extracted locations within 1 km, 10 km and 50 km radius values, respectively. However, using the second dataset our precision values dropped to 45.3%, 73.1% and 81.0% for the same radius values.

Citation: Serere HN, Resch B, Havas CR (2023) Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection. PLoS ONE 18(3):

Editor: Pierluigi Vellucci, Roma Tre University: Universita degli Studi Roma Tre, ITALY

Received: Octo… Accepted: Febru… Published: March 15, 2023

Copyright: © 2023 Serere et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Serere, Helen Ngonidzashe, 2022, "Replication Data for: Analysing the performance of a location inference method on various Twitter source distribution", Harvard Dataverse, V1.

Funding: Initials of the authors who received each award: BR. Grant number: 878652. Funding agency: Austrian Research Promotion Agency (FFG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Twitter is one of the most popular social media platforms. With over 500 million tweets generated every day, the platform contains a huge amount of data voluntarily generated across a diverse range of topics. Researchers have capitalised on the large, near real-time and diverse nature of Twitter data to address fields such as urban planning, emergency response, damage assessment, movement patterns and other related fields. Through this body of research, Twitter data has proven to be a reliable and valuable data source, as evidenced by the results of the respective studies.

One major drawback to the use of Twitter data is the rather small percentage of geotagged posts. Of the millions of tweets generated, researchers have found the percentage of geotagged posts to range between 0.35% and 3% of the total. These rather low percentages of geotagged tweets limit the sample size used for spatial analysis, thereby impacting the representativeness of the results. To alleviate the challenges brought by the low percentage of geotagged posts, researchers have developed various methods to infer the tweets' location. Depending on the desired precision, researchers have inferred either the Twitter user's place of residence, the message location, or the user's position at the time of sending the tweet, also known as the tweet's point of origin.

Location inference methods are often developed and tested on a geotagged dataset where the attached latitude and longitude pairs are used as ground truth data. The current location inference methods are able to infer at most 83% of the tweets' points of origin within a 50 km radius of the ground truth position. Whilst there appears to be progress in accurately inferring Twitter locations, questions still arise as to whether the developed location inference models can be used on a non-geotagged dataset, which is the ultimate goal of location inference methods. Questions of location inference model transferability essentially emanate from the differences in tweet source distribution between geotagged and non-geotagged datasets, among other factors.
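The precision figures reported for the 1 km, 10 km and 50 km radius values are shares of inferred locations that fall within a given distance of the ground truth coordinates. A minimal sketch of such an evaluation is given below. It assumes great-circle (haversine) distance on a spherical Earth, which the text does not specify, and the function names `haversine_km` and `share_within_radius` are hypothetical, not taken from the study.

```python
import math

# Mean Earth radius in kilometres (spherical approximation; an assumption).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within_radius(inferred, truth, radii_km=(1, 10, 50)):
    """Fraction of inferred points lying within each radius of the ground truth.

    `inferred` and `truth` are equal-length sequences of (lat, lon) pairs,
    where truth[i] is the geotag attached to the tweet behind inferred[i].
    """
    if len(inferred) != len(truth):
        raise ValueError("inferred and truth must have the same length")
    if not inferred:
        raise ValueError("at least one point is required")
    distances = [haversine_km(p[0], p[1], t[0], t[1])
                 for p, t in zip(inferred, truth)]
    return {r: sum(d <= r for d in distances) / len(distances)
            for r in radii_km}


if __name__ == "__main__":
    # Illustrative coordinates only; not data from the study.
    inferred = [(47.80, 13.04), (47.85, 13.04), (47.80, 13.04)]
    truth = [(47.80, 13.04), (47.80, 13.04), (48.21, 16.37)]
    print(share_within_radius(inferred, truth))
```

Running the same function on samples drawn with different tweet source distributions would allow the kind of side-by-side comparison the two datasets are used for.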