Comparison of Regression, Support Vector Regression (SVR), and SVR-Particle Swarm Optimization (PSO) for Rainfall Forecasting

Abstract. Rainfall is one of the factors that influence climate change in an area and is very difficult to predict, yet rainfall information is very important for the community. Forecasting can be done by modeling existing historical data with mathematical computation. Support Vector Regression (SVR) is one method that can predict non-linear rainfall data using a regression function. Choosing the right SVR parameters is needed to produce forecasts with high accuracy. Particle Swarm Optimization (PSO) is one method that can optimize the SVR parameters so that the resulting model forecasts with high accuracy. Forecasting performance on rainfall data from the Poncokusumo region is evaluated with the Root Mean Square Error (RMSE). Rainfall is predicted using Regression, SVR, and SVR-PSO with Linear, Gaussian RBF, and ANOVA RBF Kernels. The RMSE values obtained are 56.098 for Regression, 88.426 for SVR, 7.998 for SVR-PSO with the Linear Kernel, 27.172 for SVR-PSO with the Gaussian RBF Kernel, and 2.193 for SVR-PSO with the ANOVA RBF Kernel. Based on this research, the ANOVA RBF Kernel is the best of the three Kernels for SVR-PSO rainfall forecasting, because it gives the smallest RMSE value.


Introduction
Indonesia is an archipelago with a diverse climate that is often affected by the El-Nino and La-Nina phenomena [1]. Because of these phenomena, Indonesia often experiences extreme rain intensity, with above-normal rainfall that can cause floods and landslides. Rainfall is one element of the weather and is a meteorological process that is quite difficult to predict [2].
Information about rainfall is very important, for example in agriculture, where rainfall can determine which plants are suitable for planting under certain rainfall conditions [3]. Potato plants are among the plants affected by the level of rainfall. Erratic rainfall is an obstacle that must be faced because it has a negative impact on potato productivity [4]. Because rainfall is such an important factor, researchers have developed rainfall prediction methods with a higher degree of accuracy than earlier predictions [5].
In forecasting, various statistical and artificial intelligence methods and models have been used, and some combine the two. One study forecast rainfall using Multiple Linear Regression (MLP) and several classical methods such as SVM and ANN [6]. The MLP method gave better results than the other methods in predicting rainfall, with a Mean Absolute Error (MAE) value of 0.0833.
Another study predicted rainfall using Bayesian Regression, Support Vector Regression, and Wavelet Regression; in that study, the smallest RMSE value obtained with the SVR method was 108.71 [7]. A further study used the Long Short-Term Memory (LSTM) and LSTM-PSO methods to predict rainfall. Particle Swarm Optimization (PSO) was used to optimize the parameters of the LSTM method, and the results show that LSTM-PSO is better than the classical LSTM method: the RMSE value obtained by LSTM-PSO was 0.149, while that of the classical LSTM was 0.166 [8].
SVR is the application of Support Vector Machine (SVM) to regression, an approach that has been widely applied to forecasting problems. SVR builds a hyperplane in a high-dimensional space and uses Kernel functions to fit linear or nonlinear data. SVR can overcome overfitting, so it produces good performance [9]. Several Kernels can be used with the SVR method, including Linear, Gaussian, Polynomial, and many others. SVR can also be combined with an optimization method such as Particle Swarm Optimization (PSO), which can improve forecasting accuracy [10].
PSO is an algorithm developed by Kennedy and Eberhart (1995) and is an evolutionary computation technique [11]. It is often used to solve optimization problems and is still actively developed [12]. PSO searches using a population of particles, analogous to the individuals in a genetic algorithm, and each particle represents a candidate solution to the problem [13]. For example, [14] applied PSO to optimize the parameters of the SVR method to predict the stock value of Tata Steel.
The studies discussed above show that Regression and SVR methods are good enough to make predictions. This study therefore compares the methods on the same data to find which one forecasts better. The methods used in this research are Regression, SVR, and SVR-PSO, and the study is expected to reveal the strengths and weaknesses of each method in rainfall forecasting.

Methodology
The methodology in this research follows the sequence of steps in the forecasting process. Each step is based on forecasting theory and addresses the problems that arise in the forecasting to be carried out. Fig. 1 shows the steps of the forecasting process used in this study.

Data Collection
Data collection is used to obtain information about rainfall data that will be used in this study. The data used is the result of observations from the Meteorology, Climatology and Geophysics Agency for Climatology, Karangploso Malang Station. The following is the data specification used in this study.
• Rainfall data comes from the Poncokusumo area for 2000-2020.
• Rainfall data is recorded in ten-day periods.
• Rainfall data is measured in millimeters per ten-day period.

Method Selection
The forecasting process requires a method that can forecast well. Forecasting methods fall into two groups: quantitative and qualitative. Quantitative methods are used in this study because the data are historical rainfall data. SVR is one of the quantitative methods chosen to test rainfall forecasting in this study.

Support Vector Regression (SVR)
SVR is the application of Support Vector Machine (SVM) to regression, an approach that has been widely applied to forecasting problems [15]. SVR builds a hyperplane in a high-dimensional space and uses Kernel functions to fit linear or nonlinear data, as shown in equation 1. SVR can overcome overfitting, so it produces good performance [9].
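$$f(x) = w^{T}\varphi(x) + b \tag{1}$$
Where:
$f(x)$ = predictive value
$w$ = weight
$\varphi(x)$ = input space feature (the mapping of $x$ into the feature space)
$x$ = data
$b$ = bias value (the bias is also represented by $\lambda$, lambda)
To obtain a decision function, the coefficients $w$ and $b$ must be estimated from the data. First, the ε-insensitive loss function $L_{\varepsilon}(x, y, f(x))$ is defined, as shown in equation 2.
$$L_{\varepsilon}(x, y, f(x)) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \tag{2}$$
Where:
$f(x)$ = predictive value
$y$ = actual value
$x$ = data
$\varepsilon$ = loss value (the deviation tolerated while the function is kept flat)
The derivation of SVR follows the principle of structural risk minimization rooted in VC dimension theory. Introducing the slack variables $\xi$ and $\xi^{*}$ handles constraints that would otherwise make the convex optimization problem infeasible, as shown in equation 3.
$$\min_{w, b, \xi, \xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \left(\xi_i + \xi_i^{*}\right) \quad \text{subject to} \quad \begin{cases} y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i \\ w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i, \; \xi_i^{*} \ge 0 \end{cases} \tag{3}$$
Where:
$w$ = weight
$x \in \Re$ = input vector
$b$ = bias value (the bias is also represented by $\lambda$, lambda)
$\varepsilon$ = loss value
$\varphi(x)$ = input space feature
$C$ = complexity value
$i$ = index of the data value
The formulation of the objective function in equation 3 is very much in accordance with the principle of structural risk minimization. The first term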
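The regression function and the ε-insensitive loss described in this section can be illustrated with a minimal Python sketch. The function names `svr_predict` and `epsilon_insensitive_loss` and the sample numbers are hypothetical and chosen only for illustration; this is not the implementation used in the study.

```python
def svr_predict(w, phi_x, b):
    """Regression function f(x) = w . phi(x) + b for one mapped sample."""
    return sum(w_k * phi_k for w_k, phi_k in zip(w, phi_x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Illustrative numbers only (not taken from the rainfall data set).
prediction = svr_predict(w=[1.0, 2.0], phi_x=[3.0, 4.0], b=0.5)
inside_tube = epsilon_insensitive_loss(10.0, 10.5, epsilon=1.0)
outside_tube = epsilon_insensitive_loss(12.0, 15.0, epsilon=1.0)
print(prediction, inside_tube, outside_tube)
```

An error smaller than ε costs nothing, which is why the choice of ε directly affects how closely the fitted function follows the training data.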

Particle Swarm Optimization (PSO)
PSO is an algorithm developed by Kennedy and Eberhart (1995) and is an evolutionary computation technique. It is often used to solve optimization problems and is still actively developed [12]. PSO searches using a population of particles, analogous to the individuals in a genetic algorithm, as shown in equation 9; each particle represents a candidate solution to the problem.
$$V_i(t) = v_{i1}(t), v_{i2}(t), \ldots, v_{iN}(t) \tag{10}$$
Where:
$t$ = iteration value, 1 to $n$
$N$ = size of the space dimensions
$V$ = particle speed
$i$ = particle index
In each iteration, every particle approaches the member of the herd that has the best position. Individuals in a herd learn from experience in finding the best position [16]. Each particle has the speed shown in equation 11.
$$V_i(t) = V_i(t-1) + c_1 r_1 \left(X_i^{L} - X_i(t-1)\right) + c_2 r_2 \left(X^{G} - X_i(t-1)\right) \tag{11}$$
$$X_i(t) = V_i(t) + X_i(t-1) \tag{12}$$
Where:
$X$ = particle position
$V$ = particle speed
$i$ = particle index
$t$ = iteration value, 1 to $n$
In equation 11, $X_i^{L} = x_{i1}^{L}, x_{i2}^{L}, \ldots, x_{iN}^{L}$ is the best position the particle has ever passed (local best), and $X^{G} = x_{1}^{G}, x_{2}^{G}, \ldots, x_{N}^{G}$ is the best position the whole herd has ever passed (global best). $c_1$ is the learning rate for the best position found by the individual, whereas $c_2$ is the learning rate for the best position found through the relationships between individuals. $r_1$ and $r_2$ are random values that can be initialized between 0 and 1 [12].
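The speed and position updates described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses the basic update without an inertia weight, draws one pair of random values per particle, and the function name `pso_step` and the default learning rates are hypothetical rather than taken from the study.

```python
import random


def pso_step(positions, velocities, local_best, global_best,
             c1=1.0, c2=1.0, rng=random):
    """One PSO iteration: update each particle's speed, then its position."""
    new_positions, new_velocities = [], []
    for x, v, p_best in zip(positions, velocities, local_best):
        r1, r2 = rng.random(), rng.random()  # random values in [0, 1)
        # Pull toward the particle's own best and the herd's best position.
        v_new = [
            v_d + c1 * r1 * (pb_d - x_d) + c2 * r2 * (gb_d - x_d)
            for x_d, v_d, pb_d, gb_d in zip(x, v, p_best, global_best)
        ]
        # Position update: X_i(t) = V_i(t) + X_i(t - 1).
        x_new = [x_d + v_d for x_d, v_d in zip(x, v_new)]
        new_velocities.append(v_new)
        new_positions.append(x_new)
    return new_positions, new_velocities


# Illustrative call with two particles in a two-dimensional search space.
pos, vel = pso_step(
    positions=[[0.0, 0.0], [1.0, 1.0]],
    velocities=[[0.0, 0.0], [0.0, 0.0]],
    local_best=[[0.5, 0.5], [1.0, 1.0]],
    global_best=[1.0, 1.0],
)
```

A particle that already sits on both its local best and the global best with zero speed does not move, which is the expected fixed point of the update.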

Calculation of the SVR-PSO Method
The calculation with the SVR method uses three Kernels: the Linear Kernel, the Gaussian RBF Kernel, and the ANOVA RBF Kernel. The Kernel is the most important component of the SVR method. Of the three, the Kernel that gives the best forecasting result is selected. Formulations and calculations are carried out according to the theory. Fig. 2 shows the flow of the calculation process using the SVR-PSO method for rainfall forecasting.
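The three Kernels named above can be sketched as follows. The text does not give their formulas, so the forms below are commonly used definitions and should be read as assumptions; in particular, the ANOVA RBF Kernel is written here as the sum of per-dimension RBF terms raised to a power, and the parameter names `sigma`, `gamma`, and `degree` are hypothetical.

```python
import math


def linear_kernel(x, y):
    """Linear Kernel: the dot product of the two vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF Kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def anova_rbf_kernel(x, y, gamma=1.0, degree=2):
    """ANOVA RBF Kernel (assumed form): (sum_k exp(-gamma * (x_k - y_k)^2))^degree."""
    total = sum(math.exp(-gamma * (a - b) ** 2) for a, b in zip(x, y))
    return total ** degree


# Illustrative vectors only.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), gaussian_rbf_kernel(u, v), anova_rbf_kernel(u, v))
```

Because each Kernel exposes different parameters, the dimensions searched by PSO depend on which Kernel is selected.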

Test SVR-PSO Method
In this study, a prototype software system was implemented that forecasts rainfall using the SVR-PSO method, so that the method can be tested on the implemented system. The following are the results of the tests carried out on the SVR-PSO method.

SVR Parameter Limit Test
The SVR parameter limit test is intended to bound the particle dimensions during the search so that the training process yields an optimal combination of SVR parameters. The parameters whose limits are tested are gamma (γ), lambda (λ), complexity (c), and epsilon (ε). The tests are done in the SVR training process using ten-day rainfall data for each month from January 2000 to January 2020.

Complexity Parameter Limits Test (c)
The value ranges tested for parameter C are 1-10, 1-100, 1-1000, 10-100, 10-1000, and 100-1000. Each test is done five times. Fig. 5 shows the results of testing the limits of the Complexity parameter (c) in graphical form.

Test the Number of SVR Iterations
To find the right number of iterations for calculations with the SVR method, the number of iterations must be tested. The following SVR parameter limits are used in the iteration test:
a. Parameter limits γ: 0.001-0.01
b. Parameter limits λ: 0.01-1
c. Parameter limits C: 1-1000
d. Parameter limits ε: 0.000001-0.00001
Fig. 7 shows the results of testing the number of SVR iterations in graphical form.
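The parameter limits listed above define the search space for the PSO particles. A minimal sketch of how such limits could be applied is shown below, assuming each particle is a mapping from parameter name to value; the names `PARAM_BOUNDS`, `init_particle`, and `clamp_particle` are hypothetical and not part of the study's system.

```python
import random

# Limits for gamma, lambda, C, and epsilon, taken from the iteration test above.
PARAM_BOUNDS = {
    "gamma": (0.001, 0.01),
    "lambda": (0.01, 1.0),
    "C": (1.0, 1000.0),
    "epsilon": (0.000001, 0.00001),
}


def init_particle(bounds, rng=random):
    """Draw a starting position uniformly inside the limits of each parameter."""
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}


def clamp_particle(particle, bounds):
    """Pull any parameter that left its limits back to the nearest bound."""
    return {
        name: min(max(value, bounds[name][0]), bounds[name][1])
        for name, value in particle.items()
    }


particle = init_particle(PARAM_BOUNDS, random.Random(0))
moved = {"gamma": 5.0, "lambda": -1.0, "C": 500.0, "epsilon": 0.0}
print(clamp_particle(moved, PARAM_BOUNDS))
```

Clamping after each position update is one simple way to keep particles inside the tested limits; other boundary-handling strategies exist.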

Test PSO Particles
The number of particles in the PSO method is tested to find out how many particles are needed to optimize the SVR parameters and obtain optimal results. Particle counts of 5, 50, 100, 200, 250, 400, 550, 700, 850, and 1000 are tested in the SVR training process using ten-day rainfall data for each month from January 2000 to December 2018. Each trial was conducted five times. Fig. 8 shows the results of testing the number of PSO particles in graphical form.

Test the Number of PSO Iterations
The number of iterations in the PSO method is tested to find out how many iterations are needed to optimize the SVR parameters and obtain optimal results. Iteration counts of 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 are tested in the SVR training process using ten-day rainfall data for each month from January 2000 to December 2018. Each trial was conducted five times. Fig. 9 shows the results of testing the number of PSO iterations in graphical form.

Rainfall Forecasting Trials
The forecasting trials in this study use ten-day data, with three rainfall values per month, from January 2000 to January 2020. The trials cover 7 methods: multiple Linear Regression, SVR (Linear), SVR (Gaussian RBF), SVR (ANOVA RBF), SVR-PSO (Linear), SVR-PSO (Gaussian RBF), and SVR-PSO (ANOVA RBF). The forecasting results are given in Table 1. Across the 7 methods, the lowest RMSE value was obtained by the SVR-PSO method (ANOVA RBF), at 2.193, using forecasting data from the years 2000-2020. The data used for forecasting are the ten-day January values from 2000 to 2017, with 2018, 2019, and 2020 selected as test data. Figure 10 shows, in graphical form, the results of forecasting the ten-day January rainfall with forecasting targets in 2018, 2019, and 2020.
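The RMSE used to rank the methods compares the actual values with the forecast values. A minimal sketch of the calculation follows; the function name `rmse` and the sample numbers are illustrative only and are not values from Table 1.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative ten-day rainfall values in millimeters (not from the study).
actual_mm = [120.0, 95.0, 140.0]
forecast_mm = [118.0, 99.0, 137.0]
print(rmse(actual_mm, forecast_mm))
```

A smaller RMSE means the forecasts lie closer to the observed rainfall, which is the basis for comparing the 7 methods.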

Conclusion
Comparing the 3 methods for rainfall forecasting, it can be concluded that SVR-PSO gives better results than the other methods, based on a performance evaluation that refers to the RMSE value. The RMSE value is obtained by comparing the actual data with the forecast data; the smaller the RMSE value, the better the forecasting result. In the tests carried out using the SVR-PSO method with the Linear Kernel, Gaussian RBF Kernel, and ANOVA RBF Kernel, the RMSE value differs for each Kernel. In forecasting the ten-day rainfall in January 2019, the RMSE values obtained were 7.998 with the Linear Kernel, 27.172 with the Gaussian RBF Kernel, and 2.193 with the ANOVA RBF Kernel.