Responding to Natural Disasters on Twitter Social Media: A Comparative Analysis of User Behavior and Geospatial Information Content in Indonesia and the United States

Disasters are events that threaten and disrupt human life and livelihoods. They are caused by natural, unnatural, and human factors and result in casualties, environmental damage, loss, and psychological consequences. Natural disasters prompt Twitter users to upload information about conditions in the affected areas, as pictures or as text only from the disaster site. This study analyzes the behavior of Twitter users in relation to education levels in Indonesia and the United States. It provides an overview of the content shared on Twitter and a solution for extracting geospatial content from Twitter. The processed Twitter data show that there is no difference in the relationship between Twitter users and education levels in Indonesia and the United States, and that there is more geospatial information content in the United States than in Indonesia.


Introduction
Disasters are events that threaten and disrupt human life and livelihoods. They are caused by natural, unnatural, and human factors and result in casualties, environmental damage, loss, and psychological consequences [1]. According to Indonesia Law No 24 Year 2007, "the National Disaster Management Agency (BNPB) divides disasters into three types: natural, unnatural, and social disasters. Natural disasters are a series of events caused by natural phenomena such as hurricanes, floods, earthquakes, tsunamis, volcanic eruptions, droughts and landslides. Unnatural disasters are a series of events caused by unnatural factors, such as failed technology, failed modernization, epidemics, and disease outbreaks. Social disaster is a series of events caused by human factors which include social conflict between community groups and terror." In many recent emergencies around the world, social media has been one of the most effective tools for emergency management and disaster relief [2]. Due to its ease of use and immediacy, Twitter, one of the social media platforms, is well placed to provide emergency management information, allowing users to share information quickly without missing specific topics [3].
The occurrence of natural disasters encourages Twitter users to post updates about conditions in the affected areas, as pictures or as text only from the place of the disaster [4]. This information triggers many responses from other users, who form their attitudes and decide whether to help disaster victims based on it [5].
JITeCS Volume 8, Number 1, April 2023, pp 11-20, p-ISSN: 2540-9433; e-ISSN: 2540-9824
The earthquake and tsunami in Palu, Central Sulawesi, Indonesia on September 28, 2018 killed more than 2000 people [6]. Many Twitter users posted tweets about the condition of the community in the earthquake-affected areas, prayers for the victims, and assistance such as food, drink, clothing, and shelter. Tweets are short messages of up to 280 characters that can incorporate well-defined geographic information provided by GPS or manual verification [7].
The study by Lansley & Longley (2016), entitled London Twitter Thematic Geography, uses Twitter data as an alternative to public behavioral surveys, a source that could provide useful information to planners, marketers, and researchers. Its classification identified 20 distinct topic groups, each with its own meaning, representing keywords from tweets, descriptions of informal activities, and conversations between users. The motivation for that study was to use classification to show how the nature of content posted on Twitter changes with location and user characteristics.
Negara, Andryani and Saksono (2016), in a study of geospatial information from Twitter, extracted and analyzed geospatial Twitter data on a developing public issue and developed a method for obtaining geospatial data in a software prototype. The extraction and analysis process is carried out in four stages: data retrieval (tracking), storage, analysis, and visualization. That research is exploratory and focuses on developing technology for extracting and analyzing Twitter geospatial data.
Previous studies focus only on developed countries. For developing countries, there is still little research analyzing the content of tweets related to geospatial information. Geospatial data on Twitter can provide spatial information, namely the location where public perception of an issue originates, which various parties can use to produce more useful information through the Twitter Data Analytics process [11].
The researcher proposes to analyze the behavior of Twitter users in relation to the level of education in a developed and a developing country. The developed country is the United States and the developing country is Indonesia. Because the level of education differs between the two, the researcher wants to know how the behavior of Twitter users relates to it. This research provides an overview of the content shared on Twitter and a solution for extracting geospatial content from Twitter.
Based on the problem identification above, the problem formulations in this study are as follows:
1. How frequently do users respond on Twitter before and after natural disasters, in relation to education levels in Indonesia and the United States?
2. What are the characteristics of the geospatial information content shared on Twitter at the time of the earthquake in Indonesia and the United States?
The limitations of the problems in this study are as follows:
1. For data collection, only a few hashtags and keywords that were trending on Twitter in Indonesia and the United States are used.
2. For the data sample, the period covered is September 23, 2018 to October 3, 2018 for natural disasters in Indonesia and July 1 to 11, 2019 for natural disasters in the United States.
The null hypothesis in this study is: there is no difference in the frequency with which users respond before and after natural disasters shared on Twitter with respect to the level of education in Indonesia and the United States. The alternative hypothesis is: there is a difference in the frequency with which users respond before and after natural disasters shared on Twitter with respect to the level of education in Indonesia and the United States.

Case Studies and Research Datasets
The case studies in this research are Twitter users in Indonesia and the United States during a natural disaster. The case taken in Indonesia is the earthquake and tsunami in Palu City, Central Sulawesi on September 28, 2018. The case taken in the United States is the earthquake in California on July 4, 2019.
The dataset in this study consists of tweets posted by users. These tweets are limited to a few words or hashtags taken from search engines. The tweet data taken from Twitter are classified according to whether a tweet contains geospatial or non-geospatial content.
The geospatial dataset consists of tweets containing geospatial information such as the location of earthquake points, earthquake-affected areas, magnitude, maps, and geospatial images or videos.
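The geospatial versus non-geospatial split described above could be approximated with a simple keyword rule. The sketch below is illustrative only: the term list, the magnitude pattern, and the function name is_geospatial are assumptions made for this example and are not the classification procedure used in the study.

```python
import re

# Hypothetical cue words (English and Indonesian); the study does not
# publish its term list, so these are placeholders for illustration.
GEO_TERMS = ("epicenter", "episenter", "magnitude", "magnitudo",
             "map", "peta", "km", "location", "lokasi")

# Matches magnitude mentions such as "M 7.4", "M7.1" or "magnitude 6.4".
MAGNITUDE = re.compile(
    r"(?<![\w'])(?:m|mag|magnitude|magnitudo)\s*\d+(?:[.,]\d+)?\b",
    re.IGNORECASE,
)


def is_geospatial(text: str) -> bool:
    """Return True if the tweet text contains a magnitude mention
    or at least one of the assumed geospatial cue words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(MAGNITUDE.search(text)) or any(t in words for t in GEO_TERMS)
```

A rule of this kind cannot detect maps, images, or videos attached to a tweet, so it would only cover the text part of the definition.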

Research Method
This research method is structured so that the research process can be carried out in a systematic and planned manner. The method is divided into several stages, as shown in Figure 1.

Literature Review
The first stage in this research is a literature study, drawing on references such as books, scientific journals, and articles. The literature study focuses on the problem domain to be resolved, namely information on the characteristics of Twitter users regarding natural disasters in the United States and Indonesia. In this process the researcher gathers the information on the problem domain that serves as the basis for the next stages of the research.

Hypothesis Analysis and Testing
Data retrieval is done using the Twitterscraper software. The tweets retrieved are tweets about natural disasters in Indonesia and the United States, based on the keywords "Gempa", "Gempa Palu", "Disaster", "Earthquake", and "California Earthquake". The collection covers 23 September 2018 to 3 October 2018 for natural disasters in Indonesia and 1 to 11 July 2019 for natural disasters in the United States.
Twitter data needs to be cleaned to ensure that text mining identifies valid and representative patterns of user emotions. The following tweets are deleted:
a) A tweet of fewer than 3 words.
b) A tweet from a user who has posted the same message more than once, because the account may be fake.
c) A tweet from users such as television media, newspapers, and scientists.
d) A tweet that contains a link.
e) A tweet that contains non-Latin characters.
Hypothesis testing in this study uses RStudio. The result is assessed by comparing the significance level (p-value) reported by RStudio with the Type I error rate (alpha). If the p-value is less than alpha, the null hypothesis is rejected. If the p-value is greater than alpha, the null hypothesis is not rejected.
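The deletion rules (a) to (e) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the cleaning script used in the study: the record fields user and text, the excluded_users set standing in for media and scientist accounts, and the choice to treat code points above U+024F as non-Latin are all assumptions made for illustration.

```python
import re
from collections import Counter

# Rule (d): anything that looks like a link.
URL = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def has_non_latin(text):
    # Assumption: "Latin" means code points up to U+024F
    # (Basic Latin through Latin Extended-B).
    return any(ord(ch) > 0x024F for ch in text)


def clean_tweets(tweets, excluded_users=frozenset()):
    """tweets: list of dicts with hypothetical keys 'user' and 'text'.
    Returns the tweets that survive rules (a) to (e)."""
    # Rule (b): users who posted an identical message more than once.
    counts = Counter((t["user"], t["text"]) for t in tweets)
    repeaters = {user for (user, _), n in counts.items() if n > 1}

    kept = []
    for t in tweets:
        text = t["text"]
        if len(text.split()) < 3:          # (a) fewer than 3 words
            continue
        if t["user"] in repeaters:         # (b) repeated messages
            continue
        if t["user"] in excluded_users:    # (c) media, newspapers, scientists
            continue
        if URL.search(text):               # (d) contains a link
            continue
        if has_non_latin(text):            # (e) non-Latin characters
            continue
        kept.append(t)
    return kept
```

Under this reading of rule (b), every tweet from a repeating account is dropped, not only the duplicated messages; the source text allows either interpretation.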

Data Processing
Data processing in this study starts from collecting tweets posted in response to the earthquakes, from September 23, 2018 to October 3, 2018 for Palu City, Central Sulawesi, Indonesia and from July 1 to 11, 2019 for California, United States of America. The model consists of three main stages: data collection, data processing, and earthquake location mapping.
At the data collection stage, all tweets containing the keywords "Earthquake Palu" and "California Earthquake" were collected from Twitter for the specified period. The collected tweets are cleaned, grouped into several categories according to keywords, and then grouped by time within each keyword category. This grouping is used mainly to find out what Twitter users are posting and when they post. Tweets containing information about the location and time of the earthquake, based on reports from residents around the earthquake scene, were also used to represent the points where the earthquake occurred.
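The grouping by keyword category and then by time can be sketched as a count of tweets per keyword per day. The field names timestamp and text, the ISO 8601 timestamp format, and the function name tweets_per_day are assumptions for this example; the actual scraper output may be structured differently.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(tweets, keywords):
    """Count tweets per (keyword, day).

    tweets: list of dicts with hypothetical keys 'timestamp'
            (ISO 8601 string) and 'text'.
    keywords: keyword phrases, matched case-insensitively as substrings.
    """
    counts = Counter()
    for t in tweets:
        day = datetime.fromisoformat(t["timestamp"]).date().isoformat()
        text = t["text"].lower()
        for kw in keywords:
            if kw.lower() in text:
                counts[(kw, day)] += 1
    return counts
```

Daily totals of this kind are what a per-day table of tweet counts, and the identification of the peak day, would be built from.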
Data collection in this study uses Python with tweepy and twitterscraper to get tweets from Twitter. The parameters to be set for data collection are the keywords, start date, end date, and output. The results are saved as a JSON file consisting of columns and rows. The rows in the file are the tweets posted by Twitter users in one day for the scraped keywords. The keywords are "Gempa" (in the Indonesian language) and "Earthquake."
At the data processing stage, the tweets gathered at the data collection stage are cleaned, grouped by category, and grouped by time. The collected tweets are cleaned by deleting words with fewer than 3 characters, words with more than 16 characters, URLs, and non-Latin characters. The data are then cleaned further by deleting tweets with fewer than 3 words, tweets from users who posted the same message more than once, and tweets from users such as television media, newspapers, and scientists.
Next, the tweet data processed in the previous stage are grouped by category. An analysis is then carried out of the frequency of Twitter users in relation to the level of education in Indonesia and the United States, with the frequency results measured against the level of education in each country. The results of this analysis answer the first problem formulation and the hypothesis of this study.
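The word-level cleaning described above can be sketched as a token filter. This is a minimal sketch: splitting on whitespace, treating code points above U+024F as non-Latin, and the function name clean_words are assumptions, since the source does not specify how words are tokenized.

```python
import re

# A token is treated as a URL if it starts with a scheme or "www.".
URL_PREFIX = re.compile(r"^(?:https?://|www\.)", re.IGNORECASE)


def is_latin(word):
    # Assumption: Latin script means code points up to U+024F.
    return all(ord(ch) <= 0x024F for ch in word)


def clean_words(text):
    """Return the whitespace-separated words of a tweet that pass
    the word-level rules: no URLs, no non-Latin characters, and a
    length between 3 and 16 characters inclusive."""
    kept = []
    for word in text.split():
        if URL_PREFIX.match(word):
            continue
        if not is_latin(word):
            continue
        if len(word) < 3 or len(word) > 16:
            continue
        kept.append(word)
    return kept
```

Punctuation attached to a word counts toward its length in this sketch; stripping punctuation first would be a reasonable variation.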

Results and Discussion
The scraping of Twitter found a total of 16,475 tweets for natural disasters in Indonesia from 23 September 2018 to 3 October 2018 and 5,870 tweets for natural disasters in the United States from 1 to 11 July 2019. Tweets from the United States are shown in Table 1 and tweets from Indonesia in Table 2. At the peak of the earthquake in Indonesia on September 28, 2018, there were 4,375 tweets, while at the peak of the earthquake in the United States on July 6, 2019, there were 407 tweets. These counts are for tweets after cleaning.
The analysis of tweet data in Indonesia and the United States clarifies the comparison between the two countries, although it is limited to the models, methods, and data used. The discussion follows the results of the tweet analysis at each stage of the model being developed, and gives a deeper explanation of the comparison of users between Indonesia and the United States based on the phases in the model and the characteristics of each country.

Word Analysis
After the tweet data collection phase, the next phase is analyzing the words of the collected tweets. This word analysis provides an overview of Twitter users in Indonesia and the United States. However, the data used in this word analysis were taken only from the day of the earthquake peak, namely July 6, 2019 in the United States and September 28, 2018 in Indonesia.
In Table 3, there are 524 occurrences of the word "earthquake" from the United States and 4740 occurrences of the word "gempa" from Indonesia. This shows that Twitter users in Indonesia posted the word "gempa" more often than users in the United States posted the word "earthquake".
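A word count of the kind reported in Table 3 can be produced with a frequency counter. The sketch below is illustrative: the function name word_counts and the rule that a word is a run of lowercase ASCII letters are assumptions, and the sample tweets in the usage lines are made up for the example rather than taken from the dataset.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count how often each word occurs across a list of tweet texts.
    Words are lowercased runs of the letters a to z."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


# Made-up sample tweets, for illustration only.
sample = ["Earthquake in California", "Another earthquake, stay safe"]
print(word_counts(sample)["earthquake"])
```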

Analysis of the Frequency of Twitter Social Media Users on Education Levels in Indonesia and the United States
The histogram in Figure 2 shows that the tweet data and education levels in Indonesia and the United States are not normally distributed. Therefore, the test applied to these data is the Spearman correlation test, run in RStudio.

Figure 2 Data Distribution
To calculate the correlation value, we used R. We calculated the correlation between tweet data and education level for each country, using cor.test(x, y, method = "spearman") as implemented in R.
The general form of the call is
cor.test(DataTwitter, DataLevelofEducation, method = "spearman")
The input for Indonesia is as follows
> cor.test(indo$Tweet, indo$Pendidikan, method = "spearman")
The output obtained for Indonesia is given in the listing headed "Spearman's rank correlation rho".
In this study, a correlation test was carried out in which the dependent variable was the level of education and the independent variable was the number of tweets in Indonesia and the United States. The Spearman correlation test between Twitter users and the level of education gave a rho value of 0.009 for Indonesia and a rho value of 0.227 for the United States. From this output, we conclude that there is no significant relationship between education level and Twitter users in Indonesia and the United States.
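For readers working in Python rather than R, the rho estimate of a Spearman test can be reproduced as the Pearson correlation of the ranks of the two variables. The sketch below uses only the standard library and assigns average ranks to ties; the function names are chosen for this example. It computes the correlation coefficient only and does not compute a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient does not depend on the data being normally distributed, which is why a rank-based test suits the distributions described above.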

Figure 3 Scatterplot
The scatterplots of Twitter users in Indonesia and the United States against the level of education are shown in Figure 3. They show that the data points lie far from the fitted line. This means that the relationship between the educational level variable and Twitter users is very weak.

Conclusion
For Twitter users in Indonesia and the United States, the frequency with which users respond before and after natural disasters shared on Twitter shows no difference with respect to the level of education. Twitter users in the United States share more geospatial information content than Twitter users in Indonesia.
This study uses only Twitter for data collection. Further research could draw on other social media, such as Facebook, Instagram, or Tumblr, as sources of information about natural disasters. The data collection method can also influence the results of the analysis, so other methods may prove more accurate.

Figure 1 Research Methodology

Spearman's rank correlation rho
data: indo$Tweet and indo$Pendidikan
S = 218, p-value = 0.9892
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.009090909
The input for the United States is as follows
> cor.test(ustweet$Tweet, ustweet$Pendidikan, method = "spearman")
The output obtained for the United States is as follows
Spearman's rank correlation rho
data: ustweet$Tweet and ustweet$Pendidikan
S = 170, p-value = 0.5031
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.2272727
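The reported S statistics and rho estimates can be cross-checked against each other using the relation rho = 1 - 6S / (n(n^2 - 1)). The sketch below assumes n = 11 observations per country, one for each day of the 11-day sampling windows; this sample size is inferred, not stated in the output. It also assumes a significance level of alpha = 0.05 for the decision rule, which the source does not specify.

```python
def rho_from_s(s, n):
    """Recover Spearman's rho from the S statistic and sample size n,
    using rho = 1 - 6*S / (n * (n**2 - 1))."""
    return 1 - 6 * s / (n * (n ** 2 - 1))


ALPHA = 0.05   # assumed significance level; not stated in the source
N_DAYS = 11    # assumed sample size: one observation per day


def decision(p_value, alpha=ALPHA):
    """Apply the decision rule: reject H0 only if p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# S and p-value pairs as reported in the R output.
for label, s, p in [("Indonesia", 218, 0.9892),
                    ("United States", 170, 0.5031)]:
    print(label, round(rho_from_s(s, N_DAYS), 7), decision(p))
```

With n = 11 the formula returns values that agree with the reported estimates of 0.009090909 and 0.2272727, which supports the inferred sample size. Both p-values are well above the assumed alpha, consistent with the conclusion that the null hypothesis is not rejected.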

Table 1. Number of Tweets from the United States