Business Prospects Prediction for Waqf Lands Using Naïve Bayes And Apriori Algorithm

Waqf is the donation of one's own property for charity and the general welfare under sharia. Effective waqf empowerment, from an economic perspective, shifts the use of waqf from consumptive to productive. Land is one form of waqf, and waqf lands are strategic assets for productive waqf empowerment. This research aims to build a classifier that predicts whether waqf lands are 'productive' or 'not productive' assets for business prospects. The classification used Naïve Bayes with attributes summarised from administrative data of waqf lands. A modified Apriori algorithm was proposed as a new method to improve classification accuracy. A threshold value, defined as a mean value from the Naïve Bayes classification process, was used to select classification results with deviating posterior values; results below the threshold were reclassified using the Apriori algorithm. The proposed method improves prediction accuracy compared with using the Naïve Bayes classifier alone.

Keywords: Apriori algorithm, business prospects, classification, Naïve Bayes, waqf lands.


Introduction
In many studies and in the literature, data mining (DM) and machine learning (ML) techniques have penetrated all areas, in both social science and computer science. DM/ML techniques have been widely used to help overcome the challenges of the global socio-economic development of society [1]. Especially in developing countries, strategic use of DM/ML can help increase knowledge for decision-making, systems, and policies for socio-economic development and increase effectiveness in avoiding financial crises [2]–[4].
One of the efforts currently being carried out by the Ministry of Religion is to empower strategic waqf lands to build the socio-economic life of the people [5]. Waqf can be defined as donating property ownership, such as land, which has economic potential and benefits for public welfare based on statutory regulations and sharia (Islamic religious rules) [6]. Waqf in the form of land is one of the most prominent types in Indonesia, with 421,289 locations of varying land area totalling up to 55,634.15 hectares [7], spread throughout Indonesia. Among them are strategic lands with potential for economic development.
The Ministry of Religion has an operational system in which the registration of waqf property, especially land, is permanently recorded and documented in a waqf commitment (ikrar wakaf in Bahasa Indonesia), as a database processed to produce waqf land reports containing the total number and area of lands. However, it is still difficult to predict which waqf lands are productive assets with business prospects from these reports.

Related Work
A comparative study of waqf land for commercial use in Indonesia [13] concluded that waqf can serve commercial use as long as it supports public needs or the primary necessities of living. Other research on waqf land management categorized waqf lands into the right investment category based on strategic location in the Selangor region of Malaysia [14], [15]. The research identified the waqf land status and the utility of its investment models, such as mosques, graves, schools, charity foundations, and others; mosques accounted for the highest percentage of all uses. The waqf lands were analyzed based on their location utility to identify appropriate land-utility changes where improper use was found. The results were followed by matching the waqf land categories with the proper investment model and looking for investors. The resulting allocation of waqf land assets helped investors select types of funding projects for waqf land development.
Using a Bayesian network with the Apriori algorithm for classification problems has been explored in several studies, where the association rules were represented with a Bayesian network [16], [17]. As in Vedula and Thatavarti (2011), a phase was added by Xiao et al. (2016) in which the association rules were reused in the final phase. Association rules defined using the Apriori algorithm were represented with a Bayesian network [17]: association-rule mining was first conducted to select valuable rules as knowledge, the rules were then sent to the Bayesian network for probability calculation, and finally the association rules were reused with the probability calculation result for decision-making.
For classification problems, Naïve Bayes and the Apriori algorithm have also been combined in several studies [18]–[21]. A combination of Naïve Bayes and the Apriori algorithm [20] was used to identify the context of text documents: the documents were first classified using Naïve Bayes, and then document context identification was performed using the Apriori algorithm. The method of D'Angelo et al. combines Naïve Bayes and the Apriori algorithm to achieve human-like decision making: the Apriori algorithm extracts the patterns, and Naïve Bayes makes the final decision based on the user's trust probability [18].
The Apriori algorithm and Naïve Bayes were also used for text categorization [19]: the Apriori algorithm defined word sets with occurrence frequencies, and the word sets were then sent to Naïve Bayes for probability calculation. Similarly to [19], the Apriori algorithm was used in [21] to define frequent itemsets that Naïve Bayes then used to calculate classification probabilities. Naïve Bayes and the Apriori algorithm were used by [22] to build a classifier that detects spam or ham in SMS. The idea was based on Naïve Bayes treating every word as an independent attribute, the words being independent of each other in the case of spam or ham detection. The Apriori algorithm identified high-frequency words in two separate databases, ham and spam, and the classifier then ran by considering the frequency values of words in the ham and spam databases.
Another classifier built with Naïve Bayes and the Apriori algorithm was developed in [23] to classify documents into predefined categories based on their contents: the Apriori algorithm identified the keywords of the data, and Naïve Bayes then measured the class probability. The authors of [24] used Naïve Bayes and the Apriori algorithm to build a text classifier: the Apriori algorithm derived a word set, and Naïve Bayes calculated the probability of the word set. A similar text classifier using Naïve Bayes and the Apriori algorithm was also developed [25], where Naïve Bayes built the text classifier and the Apriori algorithm defined rules that create a content profile.
Before this work, the authors [8] proposed a waqf land designation classification to distinguish productive from not productive assets for business development. That classification used only one algorithm, Naïve Bayes: the crucial attributes of the waqf asset data were analyzed and then used to distinguish productive assets from not productive ones. The experimental results showed that the method achieved an accuracy of 91%. In this study, a modified Apriori algorithm is added as a new method to improve prediction accuracy in determining productive waqf land for business prospects. Bayes' theorem combined with association rules is expected to yield prediction accuracy higher than in the previous experiment. Optimal predictions can assist in planning and empowering waqf land for the community's socio-economic development.

Proposed Method
Waqf assets are vacant land, agricultural or plantation lands, or land with buildings. In the long term, waqf lands can be empowered as productive or not productive assets. Productive asset allocations include schools, shopping centres, health centres, gas stations, agriculture, plantations, finance/cooperatives, and workshops. Not productive asset allocations include mosques, mushallah (prayer rooms), cemeteries, and vacant lands. This paper proposes classifying waqf lands' business prospects into productive or not productive asset allocations using Naïve Bayes combined with the Apriori algorithm. We set a threshold value to divide the Naïve Bayes classification results into assumed belief and confusion: if the difference between the productive and not productive posterior values fell under the threshold, the result was reclassified using the Apriori algorithm. The research method is carried out in three stages: data collection and preprocessing, classification, and evaluation.

a. Data Collection and Pre-Processing
Data were collected from two religious affairs offices in the Pedurungan and East Semarang sub-districts of Semarang city, Central Java, Indonesia. The sample contained 159 waqf lands, including vacant land, farmland, and land with buildings, whose origin and designation were identified and matched against current conditions. Predictions were built based on land value, so the farm and building contents of the waqf assets were ruled out and the assets were treated as lands. Independent attributes were selected by identifying the critical factors in the potential mapping of productive waqf assets: legality, location, land size, population, type of Waqif, number of Nadzir, and Nadzir education [26]. The independent attributes are described in Table 1.

b. Classification Using Naïve Bayes
The Bayes theorem is a prediction technique based on simple probabilistics, using a fundamental statistical approach to pattern recognition. The approach quantifies the trade-offs between various classification decisions using the probabilities and consequences generated by those decisions [8], [27]. One application of the Bayes theorem in classification is Naïve Bayes, which works under the simplifying assumption that attribute values are conditionally independent. For two events, for instance A and B, the Bayes theorem is formulated as follows:

P(A|B) = P(B|A) P(A) / P(B)

Considering the validity of the total probability principle, the theorem can be developed as follows:

P(A|B) = P(B|A) P(A) / Σ_i P(B|A_i) P(A_i)

The classification process must determine which class is appropriate for the analyzed sample, so the Bayes theorem can be adjusted as follows:

P(C | F1, ..., Fn) = P(C) P(F1, ..., Fn | C) / P(F1, ..., Fn)

Variable C represents a class, while variables F1 ... Fn represent the sample characteristics needed for classification. The probability of a sample with those characteristics belonging to class C (the posterior) is the probability of class C (the prior), multiplied by the probability of the characteristics appearing in class C (the likelihood), divided by the probability of the characteristics appearing globally (the evidence). Under the conditional-independence assumption, the formula above simplifies to:

P(C | F1, ..., Fn) ∝ P(C) ∏_{i=1}^{n} P(F_i | C)

The current allocation of the waqf asset data was used to label the sample data classes as productive or not productive. Table 2 shows examples of class labels of waqf assets in the dataset, with A1-A7 standing for the independent attributes, followed by the current allocation and the class label, where 'P' stands for 'Productive' and 'NP' for 'Not Productive'. The class with the largest posterior is selected as the prediction result.
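As a concrete illustration, the simplified rule above can be sketched as a minimal Naïve Bayes scorer. The toy attributes, records, and labels below are invented for illustration only and are not the paper's dataset.

```python
from collections import defaultdict

def train_naive_bayes(rows, labels):
    """Estimate priors P(C) and likelihood counts P(Fi = v | C) from labeled rows."""
    n = len(rows)
    priors = {}
    counts = defaultdict(lambda: defaultdict(int))  # (class, attr index) -> value -> count
    class_totals = defaultdict(int)
    for row, c in zip(rows, labels):
        class_totals[c] += 1
        for i, v in enumerate(row):
            counts[(c, i)][v] += 1
    for c, total in class_totals.items():
        priors[c] = total / n
    return priors, counts, class_totals

def posterior_scores(sample, priors, counts, class_totals):
    """Score each class by P(C) * prod_i P(Fi | C); the largest score wins."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(sample):
            score *= counts[(c, i)][v] / class_totals[c]
        scores[c] = score
    return scores

# Toy example with two attributes (legality, location):
rows = [("certified", "main road"), ("certified", "housing"),
        ("not certified", "housing"), ("certified", "main road")]
labels = ["P", "P", "NP", "P"]
model = train_naive_bayes(rows, labels)
scores = posterior_scores(("certified", "main road"), *model)
prediction = max(scores, key=scores.get)  # class with the largest posterior
```

Note that this sketch uses raw frequency estimates with no smoothing, so an unseen attribute value zeroes out a class score.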
A classifier of waqf lands as productive and not productive assets was built using Naïve Bayes and the Apriori algorithm. Naïve Bayes was used to determine decisions under the assumption that object attributes are independent. A threshold value, calculated as the average of the largest posterior among false predictions and the smallest posterior among true predictions, was defined as a classification parameter for Naïve Bayes. The Apriori algorithm was used when the classification by Naïve Bayes resulted in a deviation of the posterior value below the threshold. The Apriori algorithm was expected to improve classification accuracy by adding a filter to reclassify data formerly predicted as not productive assets by Naïve Bayes. Fig. 2 shows the classification model using the Naïve Bayes and Apriori algorithms.
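The threshold computation described above can be sketched as follows; the helper function and the posterior values used in the example are hypothetical, made up purely for illustration.

```python
def classification_threshold(posteriors, is_correct):
    """Mean of the largest posterior among wrong predictions and the
    smallest posterior among correct ones, per the described rule."""
    wrong = [p for p, ok in zip(posteriors, is_correct) if not ok]
    right = [p for p, ok in zip(posteriors, is_correct) if ok]
    return (max(wrong) + min(right)) / 2

# Illustrative posteriors: one wrong prediction at 0.004, correct ones at 0.001 and 0.02
threshold = classification_threshold([0.004, 0.001, 0.02], [False, True, True])
# Results whose posterior deviation falls below `threshold` go to the Apriori step
```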
For the classification experiment using Naïve Bayes, the dataset was randomly divided into 80% training data and 20% testing data; this split was chosen to make the best use of the sample data. The dataset contained 159 lands, so there were 127 lands for training and 32 for testing. The training data included 70 lands identified as productive assets and 57 as not productive; the testing data contained 20 lands identified as productive assets and 12 as not productive.
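A random 80/20 split such as the one described can be sketched as below; the seed and helper name are arbitrary choices, not from the paper.

```python
import random

def split_dataset(records, train_frac=0.8, seed=0):
    """Randomly split records into training and testing portions (80/20 here)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(range(159))
# 159 records give 127 training and 32 testing lands, matching the counts in the text
```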
The first training phase calculated the probability of each independent attribute using the 127 lands identified as either productive or not productive assets. The prior value for productive assets, P(productive), was 0.55, and for not productive assets, P(not productive), was 0.45. The probabilities of the characteristics appearing in each class (the likelihoods) are listed in Table 3.
The posterior can then be calculated using the formula P(A) ∏_{i=1}^{q} P(B_i | A). Table 4 shows some of the classification results for the training data, Table 5 shows the results for the testing data with productive and not productive classes, and Table 6 shows an evaluation of the classification results by Naïve Bayes.

c. Improving Classification Using Apriori Algorithm
Apriori is an algorithm that searches for frequent itemsets using the association-rule technique [28]. Agrawal and Srikant proposed the algorithm to determine frequent itemsets in boolean association rules. The algorithm controls the growth of candidate itemsets from the frequent-itemset results with support-based pruning to eliminate uninteresting itemsets [29], [30]. It can also be described as finding all rules that satisfy the minimum support and minimum confidence requirements. There are two main processes to configure the candidate itemsets [31]: joining, in which each item is combined with other items until no more combinations can be formed, and pruning, in which the joined results are pruned using the determined minimum support. The support value of an itemset X is:

support(X) = (number of transactions containing X) / (total number of transactions)

while the confidence value of a rule X ⇒ Y is:

confidence(X ⇒ Y) = support(X ∪ Y) / support(X)

In this phase, the classification process using Naïve Bayes was extended by applying rules defined with the Apriori algorithm. A threshold value was determined as the mean of the largest posterior value among the false classifications and the smallest posterior value among the correct classifications, calculated after conducting the classification experiment with Naïve Bayes. Naïve Bayes classifications with a posterior-value deviation below the threshold were reclassified using an Apriori algorithm modified in how it defines rules.
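The support and confidence formulas can be sketched directly over sets of items; the toy transactions below are illustrative values, not the paper's data.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """confidence(X => Y) = support(X union Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy transactions over land-attribute items (illustrative values only):
tx = [{"Certified", "productive"},
      {"Certified", "not productive"},
      {"Not certified", "not productive"},
      {"Certified", "productive"}]
s = support(tx, {"Certified"})                     # 3 of 4 transactions
c = confidence(tx, {"productive"}, {"Certified"})  # both productive lands are certified
```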
The largest posterior value among the false classification results was 0.0039 and the smallest posterior value among the correct classification results was 0.0015, so the threshold value was (0.0039 + 0.0015)/2 ≈ 0.0028. In the training data, 8 lands fell below the threshold value, of which 4 were wrong predictions; in the testing data, 4 lands fell below the threshold, of which 2 were wrong predictions.
The rules defined by the Apriori algorithm were expected to increase prediction accuracy by keeping the correct predictions among the 8 below-threshold results and correcting the wrong ones. Itemsets were determined from the values of the seven independent attributes (A1-A7) used by Naïve Bayes, and the prediction results by Naïve Bayes were added as an attribute denoted A8. The items collected from the seven independent attributes' values, plus the two class items productive and not productive, used to define the association rules were as follows: items = {Certified, Not certified, Near the main road, Near public place, In the housing area, Very large, Large, Medium, Small, Very densely, Densely, Less densely, Not densely, Individual, Group, Legal entity, Nadzir <=3, Nadzir >3, High education, Low education, productive, not productive}.
Transaction data were collected from the 159 records in the dataset. The rules were defined based on 8-itemsets representing the seven independent attribute values plus one attribute value for the prediction result by Naïve Bayes.
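Building an 8-item transaction from one record can be sketched as below; the field names and the sample record are hypothetical, echoing the attribute values listed in the text.

```python
ATTRS = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]

def record_to_transaction(record):
    """Turn one land record (A1-A7 plus the Naïve Bayes prediction A8)
    into an 8-item transaction for rule mining."""
    return {record[a] for a in ATTRS}

# Hypothetical record using attribute values from the item list above:
rec = {"A1": "Certified", "A2": "In the housing area", "A3": "Medium",
       "A4": "Densely", "A5": "Group", "A6": "Nadzir >3",
       "A7": "Low education", "A8": "not productive"}
tx = record_to_transaction(rec)
```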
With the minimum support set to 1, all items were frequent (F1) as the result of the 1-itemsets; Table 7 shows these results. The next step searched for 2-itemsets, using the itemsets of attribute A8 (productive and not productive) as the key for finding the confidence values of 2-itemsets over attributes A1-A7. Table 8 shows the 2-itemsets defined from attributes A1-A7 using the itemsets of attribute A8 as the key.
The confidence values of the 2-itemsets were used to measure a record's information when reclassifying the Naïve Bayes results with a posterior-value deviation below the threshold: the 2-itemset confidences were summed to find their mean value, and the mean was then tested against a minimum confidence value to decide whether the Naïve Bayes prediction was right or wrong. The support and confidence values of the 2-itemsets were calculated separately for each attribute; Table 9 shows examples of the 2-itemset calculations. Rules were defined based on the mean confidence of the 2-itemsets over attributes A1-A7, with the minimum confidence set at 70%.

The algorithm works by testing the prediction by Naïve Bayes. For example, given the following information: A1: Certified; A2: In the housing area; A3: Medium; A4: Densely; A5: Group; A6: Nadzir >3; A7: Low education. This record was predicted as not productive by Naïve Bayes. The proposed Apriori algorithm tested this prediction by finding the mean confidence of attributes A1-A7 using the Naïve Bayes prediction result as the key. Table 10 shows the calculation of the confidence mean value, which was 51.1%. Since the mean value was lower than the minimum confidence of 70%, the Apriori algorithm judged the Naïve Bayes prediction to be wrong and reclassified it. The rules were implemented to reclassify all Naïve Bayes classification results with a posterior-value deviation below the threshold. Table 11 shows the prediction results of the Apriori algorithm on the training and testing data, with columns No., Id, Naïve Bayes prediction, confidence mean value, T/F (true or false), and reclassification by Apriori; asterisks mark the wrong predictions by Naïve Bayes or Apriori.
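The decision rule for a below-threshold record can be sketched as follows. Note the flip-to-the-other-class behaviour is our reading of the reclassification step, assumed for illustration rather than taken verbatim from the paper.

```python
def reclassify(mean_confidence, nb_prediction, min_confidence=0.70):
    """If the mean 2-itemset confidence supporting the Naïve Bayes
    prediction is below the minimum confidence, flip the class
    (assumed behaviour); otherwise keep the Naïve Bayes prediction."""
    if mean_confidence >= min_confidence:
        return nb_prediction
    return "productive" if nb_prediction == "not productive" else "not productive"

# The worked example: mean confidence 51.1% < 70%, NB predicted "not productive"
result = reclassify(0.511, "not productive")
```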
Based on the reclassification results for the training data, the rules defined by the Apriori algorithm kept all the right predictions by Naïve Bayes and corrected three of the four wrong predictions, the lands with Id. 036, 039, and 041. The other wrong prediction, Id. 032 with the productive class, was judged as accurate by Apriori. For the testing data, the Apriori algorithm also kept all the right predictions and corrected one of the two wrong predictions, the land with Id. 118. The accuracy of the reclassification using the Apriori algorithm was measured with a confusion matrix. Based on the total classification results by Naïve Bayes, as reclassified by the modified Apriori algorithm, the classification of the training data achieved an accuracy of 98% and the testing data an accuracy of 93% (Table 12).

Conclusion
The proposed method of implementing the Naïve Bayes and Apriori algorithms to build a classifier can improve on the accuracy of classification by Naïve Bayes alone. The Apriori algorithm's task was to reclassify the Naïve Bayes classifications assumed to be confusion based on a threshold value. By identifying frequent items of land attribute values together with items of the Naïve Bayes prediction results, the modified Apriori algorithm can keep the right predictions and correct the wrong predictions by Naïve Bayes. The accuracy in training increased from 91% using Naïve Bayes alone to 98% with the Apriori algorithm, and from 84% to 93% in testing. For future work, it would be interesting to extend this method into a decision support system for allocating types of business prospects in waqf land empowerment.