Optical Character Recognition Mobile App for Address Matching in Integrated Social Welfare Data Verification Process

The Ministry of Social Affairs of the Republic of Indonesia maintains Integrated Social Welfare Data, called Data Terpadu Kesejahteraan Sosial (DTKS), and uses it as the basis for distributing Social Fund Assistance, or Bantuan Sosial (BANSOS). In practice, however, many BANSOS recipients were not impoverished and did not qualify as targets of the program. One reason is that weaknesses in the system leave room for data manipulation during the verification and validation processes. A system improvement is therefore needed to minimize the possibility of data manipulation. This study proposes a digital verification system that uses Optical Character Recognition (OCR) and reverse geocoding to ensure that registrants provide their own citizen ID card and their own house address, both of which must meet the qualifications. The developed mobile app uses these technologies to match the address extracted from the citizen ID card against the address obtained through reverse geocoding. In the application trial, the system achieved a success rate of 95.7%.


Introduction
The state manages the impoverished and homeless in Indonesia in accordance with the 1945 Constitution of the Republic of Indonesia by fulfilling their fundamental needs for human well-being. The management of the impoverished is regulated by Law Number 13 of 2011. This law states that the impoverished can be managed through community institutional empowerment; capacity enhancement of the poor to develop basic skills and the ability to do business; security and social protection to ensure a sense of security for the poor; partnerships and cooperation between stakeholders; and/or coordination between ministries/agencies and local governments.
To manage the impoverished, the Ministry of Social Affairs of the Republic of Indonesia operates the Social Fund Assistance program (BANSOS). Through this program, the government provides assistance in the form of money or goods to recipients. BANSOS is distributed selectively and not continuously [1]. Recipients are selected through a series of data collection, verification, and validation procedures determined by the Ministry of Social Affairs of the Republic of Indonesia [2].
Data on the impoverished are collected by the institution responsible for government affairs in the field of statistics. Social welfare personnel in each sub-district or village then verify and validate the recorded data. The results of the verification and validation are reported to the regent or mayor, then submitted to the governor and forwarded to the minister [2]. The verified and validated data must be integrated and technology-based, under the responsibility of the minister. Relevant ministries or agencies use this integrated data as the basis for distributing BANSOS. The integrated data owned by the Ministry of Social Affairs is called Integrated Social Welfare Data (DTKS) [3].
In practice, however, the situation in the field differs from what is intended. Many BANSOS recipients were identified as people who should not have been targeted by the program. The Minister of Social Affairs found several such cases. In one, a registered BANSOS recipient actually owned a large house. In another, a village head entered his own name as a BANSOS recipient [4].
The Social Affairs Department needs to strictly ensure that DTKS registrants are the right targets of the BANSOS program. This assurance can be provided during the verification and validation processes. This study developed an application that assists the social affairs department in the verification stage of DTKS registrant data. The Ministry of Social Affairs already has an application called the Social Welfare System Next Generation (SIKS-NG) for managing the Integrated Social Welfare Data (DTKS). However, DTKS verification and validation using this application are still not optimal [5]. Authorized officers input the required data through SIKS-NG, but verification and validation are still performed manually, which leaves room for data manipulation.
The existing DTKS verification system requires technological enhancements to reduce the possibility of data manipulation. This study develops an application that uses Optical Character Recognition (OCR) and reverse geocoding to support the DTKS data input verification process. It matches the address extracted from the Indonesian citizen ID card (Kartu Tanda Penduduk, KTP) with the address obtained through reverse geocoding at the time the KTP photo is taken. The OCR component helps the social affairs officer input the registrant's personal information automatically from a photo of the registrant's KTP. The reverse geocoding component retrieves the address of the device's location when the KTP photo is taken during the household visit. The address extracted from the KTP is then compared with the address retrieved through reverse geocoding. This address matching is needed to ensure that registrants provide their own KTP and their own house address, both of which must meet the qualifications. If the address extracted from the KTP matches the address from reverse geocoding, the registrant information is considered a valid data input. The verification process then proceeds to the next stage, which is outside the scope of this paper.
This application was developed for mobile devices running Android OS. A mobile platform was chosen for two reasons: portability and the need for a camera. The application is intended to be used during household visits in the DTKS data input verification process, and it needs a camera on the device to capture images for OCR. A mobile device is therefore the most convenient option. The results of this development are expected to help the Ministry of Social Affairs and related parties reduce the likelihood that non-targeted BANSOS recipients are registered. The remainder of this paper is organized as follows. Section 2 reviews work related to this study. Section 3 describes the research methodology. Section 4 discusses the results. The last section concludes the paper.

Related Works
This research uses OCR and reverse geocoding to perform address matching for the DTKS data input verification process. Several previous studies are related to this topic. One of them performed verification and validation for two kinds of BANSOS, namely the Family Hope Program (PKH) and Non-Cash Food Support (BPNT), to check registrant qualifications. The verification process in that study used the Social Affair Geographic Information System (SAGIS), an application developed by the Ministry of Social Affairs of the Republic of Indonesia to collect registrant data [6]. This system has goals similar to those of the mobile application developed in this study. The difference is that its data input and verification are performed manually, whereas in this study the DTKS data input verification process is automated using OCR and reverse geocoding.
Another study implemented reverse geocoding for site visit documentation [7]. In that study, the application obtains an address through reverse geocoding and attaches it to images taken during the site visit, which ensures that the images were actually taken at the site location. This is similar to a feature of the application developed in this study. The DTKS data input verification application ensures that the KTP image is taken at the same location as the KTP owner's home. If the addresses match, the data is considered verified and proceeds to the next verification step.
Another study, which used the Ministry of Social Affairs' Integrated Social Welfare Data as its case study, also developed a mobile application for validating field survey data [8]. That application covered the overall DTKS data input verification process and consisted of two major parts: an input system and a recommendation system. The mobile application developed in this study enhances the input system of that earlier work for the DTKS data input verification process. The earlier study did not use Artificial Intelligence (AI) for verification, whereas the application developed in this study involves OCR in the DTKS data input verification process.

Research Method
The main objective of this research is to develop an OCR and reverse geocoding mobile app that helps the Ministry of Social Affairs with DTKS data input verification. The first stage of this study was a literature review. The literature reviewed relates to DTKS, especially the DTKS registration and verification procedures. At this stage, information was also collected on OCR and reverse geocoding libraries and their use in application development.
The next stage is defining the system overview and its limitations. The application performs address matching between the address extracted from the KTP and the address obtained from reverse geocoding. The result of address matching is used to verify that the owner of the registered KTP data is the owner of the house visited during the household visit. Application development in this study is limited to the Android operating system. This stage also defines the user of the application: an officer from the department of social affairs who conducts household visits during the verification process. The overall system flowchart is shown in Figure 1.
Developing the mobile application required a system requirement analysis and a system design. This stage was conducted through an interview with the local Social Welfare Department, from which the functional and non-functional requirements were defined. Functional requirements describe the activities the system performs in general, including the processes and information that must exist in and be generated by the system [9]. The functional requirements defined for the DTKS data input verification application are:
- registering accounts
- logging in
- adding verification logs for registrant data
- taking KTP images
- extracting KTP data
- viewing KTP extraction results
- reverse geocoding
- matching the address from the KTP extraction results with the address from the reverse geocoding results
- saving the address matching results

The non-functional requirements are usability and compatibility. The system design consists of activity diagrams, sequence diagrams, a class diagram, and a test design. The activity diagrams represent the flow of activities between actors, the system, and the database. The sequence diagrams for KTP extraction and reverse geocoding represent the flow of interaction between the classes involved in each task. The class diagram defines the classes involved in the system. The test design covers testing the success of system processes, black box testing, System Usability Scale (SUS) testing, and compatibility testing.
The mobile application was developed using Kotlin 1.5.31 as the programming language and Android Studio 4.2.1 as the IDE. The database is Firebase Realtime Database, a cloud-hosted database developed by Google [10]. The database implementation involves four entities: users, KTP extraction results, reverse geocoding results, and address matching results.
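The following Kotlin data classes are one possible way to model the four entities. They are a rough sketch: the class names, field names, and node paths are hypothetical, since the study does not publish its schema. Every property is a `var` with a default value, for two reasons:
- The Firebase Realtime Database Android SDK maps snapshots back to objects through a no-argument constructor and setters or fields. Kotlin generates the no-argument constructor when all constructor parameters have defaults.
- A Boolean named `matched` avoids the `is`-prefix getter naming mismatch.

```kotlin
// Hypothetical models for the four database entities; field names are assumptions.
// All properties are vars with defaults so Firebase can instantiate the class with a
// no-argument constructor and then populate it through the generated setters.
data class User(
    var uid: String = "",
    var name: String = "",
    var email: String = ""
)

data class KtpExtractionResult(
    var userId: String = "",
    var rawText: String = "",   // full OCR output of the KTP image
    var address: String = ""    // address field taken from rawText
)

data class ReverseGeocodingResult(
    var userId: String = "",
    var latitude: Double = 0.0,
    var longitude: Double = 0.0,
    var address: String = ""
)

data class AddressMatchingResult(
    var userId: String = "",
    var ktpAddress: String = "",
    var geocodedAddress: String = "",
    var matched: Boolean = false
)

// Hypothetical top-level nodes of the database's JSON tree, one per entity.
enum class DbNode(val path: String) {
    USERS("users"),
    KTP_EXTRACTION_RESULTS("ktp_extraction_results"),
    REVERSE_GEOCODING_RESULTS("reverse_geocoding_results"),
    ADDRESS_MATCHING_RESULTS("address_matching_results")
}
```

On a device, a result could then be written with a call such as `FirebaseDatabase.getInstance().getReference(DbNode.KTP_EXTRACTION_RESULTS.path).push().setValue(result)`. Here `push()` generates a unique child key for each new record.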
The DTKS data input verification application uses OCR and reverse geocoding. The OCR component was developed using the Text Recognition library from Google Vision [11]. This library extracts text from images [15], in this study KTP images. The KTP extraction process begins by obtaining the image bitmap. The Text Recognition library provides a builder that creates the recognizer object. Once the recognizer is built, the application builds a frame object that contains the image to be extracted. The recognizer detects the text blocks in the frame, and their text is assembled using a string builder. The extraction result is then saved to the database and passed to the KTP extraction result page of the application. The OCR workflow in this application is shown in Figure 2.
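The string-builder step can be illustrated without Android dependencies. In the Google Mobile Vision API, the on-device calls are roughly:
- `TextRecognizer.Builder(context).build()` creates the recognizer.
- `Frame.Builder().setBitmap(bitmap).build()` wraps the image in a frame.
- `recognizer.detect(frame)` returns a `SparseArray<TextBlock>`.

The sketch below replaces those calls with a plain `OcrBlock` stand-in so that it runs on the JVM. The `Alamat` (address) lookup assumes that the OCR output keeps the label and its value on one line, which real KTP scans do not always guarantee. `OcrBlock`, `assembleText`, and `extractKtpAddress` are hypothetical names.

```kotlin
// Stand-in for the text blocks an on-device OCR engine returns; in Mobile Vision each
// TextBlock exposes its text through getValue(). This class only mimics that shape.
data class OcrBlock(val value: String)

// Joins the detected blocks into one string, one block per line, with a StringBuilder.
fun assembleText(blocks: List<OcrBlock>): String {
    val sb = StringBuilder()
    for (block in blocks) {
        if (sb.isNotEmpty()) sb.append('\n')
        sb.append(block.value)
    }
    return sb.toString()
}

// Returns the value of the "Alamat" field, assuming a line like "Alamat : JL. MAWAR NO. 12".
fun extractKtpAddress(ocrText: String): String? {
    val label = Regex("""^\s*Alamat\s*:?\s*(.+)$""", RegexOption.IGNORE_CASE)
    for (line in ocrText.lines()) {
        val match = label.find(line) ?: continue
        return match.groupValues[1].trim()
    }
    return null // no address line recognized
}
```

Returning `null` when no address line is found lets the caller decide whether to ask the officer to retake the KTP photo.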
The reverse geocoding process was developed using the LocationManager and Geocoder classes of the Android framework. The LocationManager class fetches the latitude and longitude of the device [12]. The application then passes the latitude and longitude to the Geocoder class, which transforms them into an address [13]. The address obtained through reverse geocoding is saved to the database and passed to the reverse geocoding result page. The reverse geocoding workflow in this application is shown in Figure 3. Verification is performed by matching the address from the KTP with the address from the reverse geocoding process. The application retrieves the address information from the KTP extraction result and stores it in the KTP address variable. From the reverse geocoding result, the application retrieves the street name and house number and stores them in the reverse geocoding address variable. These two variables are then matched against each other, and the result is shown on the verification result page of the application.
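The step that builds the reverse geocoding address variable can be sketched as follows. On Android, `Geocoder(context, locale).getFromLocation(latitude, longitude, 1)` returns a list of `android.location.Address` objects. Their `getThoroughfare()` and `getSubThoroughfare()` methods give the street name and house number. This overload is deprecated from API level 33 in favor of a listener-based variant. The `GeocodedAddress` class below is a hypothetical stand-in for those two fields, so the logic runs off-device.

```kotlin
// Hypothetical stand-in for the two android.location.Address fields the text uses.
// Either value can be null when the map data has nothing for the location.
data class GeocodedAddress(val thoroughfare: String?, val subThoroughfare: String?)

// Builds the "reverse geocoding address variable": street name followed by house number,
// skipping missing or blank parts.
fun reverseGeocodingAddress(address: GeocodedAddress): String =
    listOfNotNull(address.thoroughfare, address.subThoroughfare)
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .joinToString(" ")
```

Dropping missing parts instead of failing matters here, because the house number is often absent or inaccurate in the map data.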

Result and Discussion
The OCR and reverse geocoding mobile application developed in this study was evaluated with several types of tests. The first test concerns the success of the system's processes. The KTP extraction process is tested by comparing the address printed on the KTP with the address extracted from the KTP image. If the address shown on the KTP extraction result page has more than 85% similarity to the address written on the KTP, the process is considered successful. An example of the KTP extraction process is shown in Figure 4. The reverse geocoding process is tested by comparing the address of the current location with the address from reverse geocoding. If the reverse geocoding result has more than 85% similarity to the current location address, the process is considered successful. An example of the reverse geocoding process is shown in Figure 5. Address matching is tested by comparing the address from KTP extraction with the address from reverse geocoding. If the two have more than 85% similarity, the result page displays "Match"; otherwise, it displays "Doesn't Match!". This process is considered successful if the address matching result page shows the correct result. An example of the address matching result page is shown in Figure 6. The KTP extraction, reverse geocoding, and address matching processes were examined in 10 trials, and the results of this verification process testing are shown in Table 1. Table 1 has four result columns:
- The testing code column identifies each test.
- The address in KTP extraction result column shows the address displayed on the KTP extraction result page.
- The address in reverse geocoding result column shows the address displayed on the reverse geocoding result page.
- The address matching result column shows the result of comparing the address from the KTP extraction result with the address from reverse geocoding.

A test is considered successful if three conditions all hold: the KTP extraction result has more than 85% similarity to the address written on the KTP, the reverse geocoding result has more than 85% similarity to the current location address, and the address matching result page shows the correct result.
System validation testing was performed using black box testing on each function defined in the requirements analysis. All nine functionalities of the application produced output consistent with the results defined in the use case scenarios.
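The text does not state how the 85% similarity is computed. As one possible interpretation, the sketch below uses normalized Levenshtein similarity: 1 minus the edit distance divided by the length of the longer string. Before comparing, it lowercases both strings, drops punctuation, and expands the abbreviation "jl" to "jalan". The metric, the normalization rules, and all function names are assumptions. The sketch also encodes the three criteria for a successful trial described above.

```kotlin
// Assumed normalization: lowercase, turn punctuation into spaces, collapse whitespace,
// and expand the common abbreviation "jl" to "jalan".
fun normalizeAddress(s: String): String =
    s.lowercase()
        .replace(Regex("[^a-z0-9 ]"), " ")
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .joinToString(" ") { if (it == "jl") "jalan" else it }

// Dynamic-programming edit distance using two rolling rows.
fun levenshtein(a: String, b: String): Int {
    var prev = IntArray(b.length + 1) { it }
    var curr = IntArray(b.length + 1)
    for (i in 1..a.length) {
        curr[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        }
        val tmp = prev
        prev = curr
        curr = tmp
    }
    return prev[b.length]
}

// Similarity in [0, 1]; 1.0 means identical after normalization.
fun similarity(a: String, b: String): Double {
    val x = normalizeAddress(a)
    val y = normalizeAddress(b)
    val longest = maxOf(x.length, y.length)
    if (longest == 0) return 1.0
    return 1.0 - levenshtein(x, y).toDouble() / longest
}

val SIMILARITY_THRESHOLD = 0.85 // "more than 85% similarity"

fun matchLabel(ktpAddress: String, geocodedAddress: String): String =
    if (similarity(ktpAddress, geocodedAddress) > SIMILARITY_THRESHOLD) "Match" else "Doesn't Match!"

// A trial succeeds only when all three criteria from the text hold.
fun trialSucceeded(
    printedKtpAddress: String,
    extractedKtpAddress: String,
    currentLocationAddress: String,
    geocodedAddress: String,
    addressesReallyMatch: Boolean
): Boolean {
    val extractionOk = similarity(printedKtpAddress, extractedKtpAddress) > SIMILARITY_THRESHOLD
    val geocodingOk = similarity(currentLocationAddress, geocodedAddress) > SIMILARITY_THRESHOLD
    val matchingOk = (matchLabel(extractedKtpAddress, geocodedAddress) == "Match") == addressesReallyMatch
    return extractionOk && geocodingOk && matchingOk
}
```

For example, `matchLabel("JL. MAWAR NO. 12", "Jalan Mawar No 12")` yields "Match", because both strings normalize to the same text. "Jalan Mawar" and "Jalan Melati" fall well below the threshold.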
The SUS questionnaire was used to measure the usability of the system. SUS testing consists of asking respondents 10 SUS questions [14]. In this study, five respondents filled out the usability questionnaire: one Social Welfare Department officer, two UI/UX experts, and two heads of neighborhood associations. Of the five respondents, only three gave valid responses. The other two were considered invalid because their answers to some questions were inconsistent with their own answers to other questions. The final SUS score of the application is 92.5. The SUS results are shown in Table 3.
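For reference, standard SUS scoring converts the ten answers, each on a 1 to 5 scale, into a score from 0 to 100:
- Odd-numbered (positively worded) items contribute the answer minus 1.
- Even-numbered (negatively worded) items contribute 5 minus the answer.
- The sum, from 0 to 40, is multiplied by 2.5.

The sketch below implements that rule. Combining respondents by averaging their scores is a common convention but an assumption here, since the text reports only the final score of 92.5. The answers in the test are illustrative, not the study's data.

```kotlin
// Standard SUS scoring for one respondent's ten answers on a 1-5 scale.
fun susScore(answers: List<Int>): Double {
    require(answers.size == 10) { "SUS needs exactly 10 answers" }
    require(answers.all { it in 1..5 }) { "answers must be on a 1-5 scale" }
    var sum = 0
    answers.forEachIndexed { index, answer ->
        // index 0 is item 1 (odd), index 1 is item 2 (even), and so on.
        sum += if (index % 2 == 0) answer - 1 else 5 - answer
    }
    return sum * 2.5
}

// Assumed aggregation: mean of the individual scores of the valid respondents.
fun averageSus(respondents: List<List<Int>>): Double =
    respondents.map { susScore(it) }.average()
```

An all-favorable response sheet (5 on odd items, 1 on even items) scores 100, and a sheet of all 3s scores 50.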
The last test is compatibility testing, which is conducted by running the application on devices whose operating system version differs from that of the development device. The Android versions used in this test are Android 10.0, 11.0, and 11.0, and the application runs well on each of these versions. In summary, the DTKS data input verification application was evaluated with process testing, system validation testing, usability testing, and compatibility testing, through which its strengths and limitations can be identified.
KTP extraction is performed by capturing an image of the KTP, and the system successfully extracted text from the captured images. The system also offers to retake the image if the captured KTP image is unclear. A limitation of this function is that the user must crop the KTP image so that it shows only the owner's personal data, without the owner's photo. Extraction works best when the image is focused on the personal data. The TextRecognizer from the Google Vision library is used to extract text from the KTP image. Overall, it performs extraction well, but it sometimes misrecognizes characters. For example, the letter I on the KTP is recognized as the letter E, and the digit 0 is recognized as the letter O.
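One possible mitigation for the digit/letter confusion, not described in the study, is to post-process fields that must be numeric. An example is the 16-digit NIK (population identity number) printed on the KTP. The lookalike table below is an illustrative assumption, and `fixNumericField` and `looksLikeNik` are hypothetical helpers. Confusions inside alphabetic text, such as I read as E, cannot be repaired this way.

```kotlin
// Illustrative table of letters OCR may return in place of digits.
// Only 'O' -> '0' is reported in the text; the other pairs are assumptions.
val digitLookalikes = mapOf(
    'O' to '0', 'o' to '0',
    'I' to '1', 'l' to '1',
    'S' to '5', 'B' to '8'
)

// Applies the table to a field known to contain only digits, dropping stray spaces.
fun fixNumericField(raw: String): String =
    raw.filterNot { it.isWhitespace() }
        .map { digitLookalikes[it] ?: it }
        .joinToString("")

// Checks only the form of a NIK: exactly 16 ASCII digits.
fun looksLikeNik(value: String): Boolean =
    value.length == 16 && value.all { it in '0'..'9' }
```

Running the correction only on fields whose format is known avoids corrupting names or street names that legitimately contain the letter O.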
The reverse geocoding function also runs well, but its results depend on Google Maps data. The location recognized by the GPS system sometimes differs from the location as known by local residents, which limits the accuracy of house numbers. Address matching is therefore performed by comparing the address from the ID card with the reverse geocoding address without the house number.
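The workaround of ignoring the house number can be sketched as follows. A hypothetical `stripHouseNumber` removes a house-number part such as "NO. 12" from the KTP address, and both sides are then normalized before comparison. The regular expression and the "jl" to "jalan" expansion are assumptions about typical KTP address formatting. Exact equality is used here for brevity, whereas the application applies its similarity threshold.

```kotlin
// Removes a house-number part such as "NO. 12", "No 12A" or "NO.12" (assumed format).
fun stripHouseNumber(address: String): String =
    address.replace(Regex("""\bno\.?\s*\d+[a-z]?\b""", RegexOption.IGNORE_CASE), " ")
        .replace(Regex("""\s+"""), " ")
        .trim()

// Street-only normalization: drop the house number, lowercase, remove punctuation,
// and expand "jl" to "jalan".
fun normalizeStreet(s: String): String =
    stripHouseNumber(s).lowercase()
        .replace(Regex("[^a-z0-9 ]"), " ")
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .joinToString(" ") { if (it == "jl") "jalan" else it }

// Compares the KTP address with the geocoded street name, ignoring house numbers.
fun streetsMatch(ktpAddress: String, geocodedStreet: String): Boolean =
    normalizeStreet(ktpAddress) == normalizeStreet(geocodedStreet)
```

With this rule, "JL. MAWAR NO. 12" on the KTP matches a geocoded street of "Jalan Mawar" even when the map data places the device at a different house number.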

Conclusion
The OCR and reverse geocoding mobile application in this study was developed to help the Ministry of Social Affairs with DTKS data input verification. OCR is used to extract the personal data of DTKS registrants from the KTP and input it into the application automatically. Reverse geocoding transforms the device's current latitude and longitude into an address. The address from the KTP extraction result is then compared with the address from reverse geocoding. If they match, the system verifies that the owner of the registered KTP data is the owner of the house visited during the household visit. These technological enhancements can reduce the possibility of data manipulation in the DTKS data input verification system. The overall success rate of the system processes in this application is 95.7%.

Figure 6. Address matching result page when (a) the addresses do not match and (b) the addresses match

Table 1. Application testing results