Website Visitors Forecasting using Recurrent Neural Network Method

The number of visitors and the content accessed by users reflect the performance of a website. This study uses two data series from a journal website, new visitors and first time visitors, covering 2018 to 2019 at monthly intervals. Forecasting is needed to estimate how many users will visit a website. This study applies the Long Short Term Memory (LSTM) method, a development of the Recurrent Neural Network (RNN) method. LSTM has the advantage of an architecture that remembers and forgets output to be processed back into the input. In addition, LSTM can keep the errors that occur during Backpropagation from growing. This study compares two methods, LSTM and Backpropagation. The mean square error (MSE) of the LSTM method is 0.0184 on the first time visits data and 0.0521 on the new visitor data. The Backpropagation results are 0.1542 on the first time visitor data and 0.1424 on the new visitor data. The computational experiments show that LSTM produces better results, in terms of MSE, than the Backpropagation Neural Network method.


Introduction
In this modern era, internet usage is growing rapidly, and users increasingly rely on the internet to search for information. Every time an internet user visits a website, they leave a trace on it. These traces can be collected and used for various purposes, such as tracking user behavior, recommending products to users on their next visit, and optimizing the usefulness of the website [4].
Forecasting is an approach commonly used to help humans make decisions. The purpose of this research is to use forecasting to determine user behavior when visiting a website. One factor motivating this objective is that information on the number of visitors to a website is needed [8].
A journal website is a useful tool for the career advancement requirements of lecturers, researchers, teachers, engineers, and other functional positions. Journal managers face several problems when they have not implemented online scientific journal management [11]. One component of journal accreditation is the number of visitors to the journal website, and this number is one of the inputs for journal accreditation [11]. Therefore, forecasting is needed to estimate how many users will visit and to judge whether promotion is necessary to increase the number of visitors.
One frequently visited website is the official site of Universitas Negeri Malang, a provider of information for its academic community [15]. Forecasting is an attempt to predict a future situation by examining the past; forecasting problems usually use time-series data [10].
This study focuses on forecasting journal website visitors using two visitor data series, and compares two methods, LSTM and Backpropagation. The contribution of this study is to identify the better forecasting method as measured by MSE.

Related Works
Many Neural Network methods have been used in forecasting research, as the following examples show. Alfiyatin [2] and Oktanisa [12] discuss inflation forecasting in Indonesia: the former uses ELM optimized with the PSO method, the latter SVR optimized with the GA method. Alfiyatin's results show that ELM and PSO-optimized ELM perform almost the same, with an error difference of 0.0000019. Oktanisa's results show that GA-optimized SVR outperforms plain SVR, but requires a longer process. Meilia [9] discusses forecasting electricity consumption in Indonesia using ELM optimized with the GA method; the results show that GA can be used to optimize the weights of the ELM.
Sari [14] compared the Backpropagation method with the Sugeno FIS method for forecasting in Indonesia, and found that Backpropagation performed better than Sugeno FIS, with an RMSE of 0.204. Haviluddin [6] showed that the accuracy of a network traffic prediction model can be optimized by applying GA to a multi-layer perceptron; convergence and premature-convergence problems can be handled by combining backpropagation with the GA operators. The results showed that the model's predictive performance was superior to a traditional multi-layer perceptron, and that when forecasting network traffic with a combination of Backpropagation and GA, the determination of the optimal Neural Network must be considered. Haviluddin and Dengen [5] showed that daily network traffic can be predicted, using the SARIMA, NARX, and BPNN models to test time-series prediction performance. The results showed that all the models can be used for nonlinear time-series modeling and complex prediction tasks. Measured by MSE, the Backpropagation results had better predictive accuracy than the SARIMA and NARX models, with a SARIMA error of 0.064190, a NARX error of 0.006717, and a BPNN error of 0.009424 [5].
In these previous studies, multi-layer perceptron methods were widely used for forecasting. Another Neural Network algorithm that can be used for forecasting is the Recurrent Neural Network (RNN). The RNN method is well suited to this research because it can be trained on time-series data [13]. In Berradi's research [3], the number of features is reduced using the PCA method, and the RNN method is then used to predict the Total Maroc share price on the Casablanca exchange. The MSE obtained by RNN with PCA is smaller than the MSE obtained by RNN without PCA: 0.00596 with PCA versus 0.011835 without [3].
Based on this previous research, the RNN method has obtained quite good results on forecasting problems, so this study uses the RNN method to forecast website visitors. An RNN can learn dependencies between sequential or time-series inputs, and this ability has made the method popular and widely used. It is expected to produce good forecasting results in this study as well.

Backpropagation
In this study, the Backpropagation method is used as a comparison method, because it is widely applied in many fields. The Backpropagation learning algorithm minimizes the error rate by adjusting the weights based on the difference between the output and the desired target. Backpropagation is a multilayer network, a development of the single-layer network [14]. The Backpropagation architecture can be seen in Fig. 1.

Perform the feedforward calculations
In the first calculation, the variable zinj is the signal entering the hidden layer. It is computed as the hidden-layer bias v0j plus the sum over i = 1 to n of the input xi (the input neurons) multiplied by the hidden-layer weight vij. Each hidden unit (zj, j = 1, 2, 3, ..., p) sums its weighted input signals with Equation 1.
In the second calculation, the variable zj is the result of the hidden-layer activation function, and zinj is the signal entering the hidden layer. The binary sigmoid activation function is used to calculate the output signal of a hidden unit with Equation 2.
In the third calculation, the variable yink is the input signal to the output layer, w0k is the bias weight of the output layer, and zj and wjk are the hidden-layer activation result and the hidden-layer weight, respectively. Each output unit (yk, k = 1, ..., m) sums its weighted input signals, including the bias, with Equation 3.
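The three feedforward steps above can be sketched in NumPy. Since the equations themselves are not reproduced in this text, the sketch follows the standard single-hidden-layer formulation, and all dimensions and weight values are hypothetical:

```python
import numpy as np

def sigmoid(x):
    """Binary sigmoid activation used for the hidden and output units."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 4 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
x = rng.random(4)            # input neurons x_i
v = rng.random((4, 3))       # input-to-hidden weights v_ij
v0 = rng.random(3)           # hidden-layer biases v_0j
w = rng.random((3, 1))       # hidden-to-output weights w_jk
w0 = rng.random(1)           # output-layer bias w_0k

z_in = v0 + x @ v            # Equation 1: z_in_j = v_0j + sum_i x_i v_ij
z = sigmoid(z_in)            # Equation 2: z_j = f(z_in_j)
y_in = w0 + z @ w            # Equation 3: y_in_k = w_0k + sum_j z_j w_jk
y = sigmoid(y_in)            # output of each unit y_k
```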

Perform the backpropagation calculations
In the first calculation, the variable δk is the output-layer correction factor, tk is the target data, and yk is the training output. Based on the error in each output unit (yk, k = 1, 2, ..., n), the factor δ of that unit is calculated with Equation 5.
In the second calculation, the variable ∆wjk is the change factor of the output-layer weight. The variables α and δk are the learning rate and the output-layer correction factor, respectively, and zj is the result of the hidden-layer activation function. The change factor ∆wjk, which will change the weight wjk, is calculated with Equation 6.
In the third calculation, the variable ∆w0k is the change factor of the output-layer bias. The variables α and δk are the learning rate and the output-layer correction factor, respectively. The change factor ∆w0k, which will change the bias w0k, is calculated with Equation 7.
In the fourth calculation, the variable δinj is the hidden-unit weight delta, δk is the output-layer correction factor, and wjk is the weight from the hidden layer to the output layer. The delta weight of the hidden units is calculated with Equation 8.
In the fifth calculation, the variable δj is the hidden-unit error correction factor, δinj is the hidden weight delta, and zj is the hidden-layer activation result. The hidden-unit error correction factor is calculated with Equation 9.

δj = δinj · zj (1 − zj)   (9)

In the sixth calculation, the variable ∆vij is the correction of the hidden-layer weights, α is the learning rate, δj is the hidden-unit error correction factor, and xi is the input value. The correction of the hidden-layer weights is calculated with Equation 10.
In the seventh calculation, the variable ∆v0j is the correction of the hidden-layer bias, α is the learning rate, and δj is the hidden-unit error correction factor. Equation 11 is used to calculate the hidden-layer bias correction.

Calculate the new weights and biases
In the first calculation, the variables vij(new) and vij(old) are the new and old weights from the input layer to the hidden layer, respectively, and ∆vij is the correction of the hidden-layer weights. Equation 12 is used to calculate the new weight from the input layer to the hidden layer.
In the second calculation, the variables v0j(new) and v0j(old) are the new and old biases from the input layer to the hidden layer, respectively, and ∆v0j is the correction of the hidden-layer bias. Equation 13 is used to calculate the new bias from the input layer to the hidden layer.
In the third calculation, the variables wjk(new) and wjk(old) are the new and old weights from the hidden layer to the output layer, respectively, and ∆wjk is the weight correction for the output layer. Equation 14 is used to calculate the new weight from the hidden layer to the output layer.
In the fourth calculation, the variables w0k(new) and w0k(old) are the new and old biases from the hidden layer to the output layer, respectively, and ∆w0k is the correction of the output-layer bias. Equation 15 is used to calculate the new bias from the hidden layer to the output layer.
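The whole sequence above, from the feedforward pass through the weight and bias updates of Equations 12 to 15, can be sketched as one Backpropagation training step. This is a minimal sketch assuming the standard sigmoid-network formulation (the paper's equations are referenced but not reproduced here), and all shapes, values, and the learning rate are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, v, v0, w, w0, alpha=0.1):
    """One Backpropagation step following Equations 1-15 of the text."""
    # Feedforward (Equations 1-4).
    z = sigmoid(v0 + x @ v)
    y = sigmoid(w0 + z @ w)
    # Output-layer correction factor (Equation 5): delta_k = (t_k - y_k) f'(y_in_k).
    delta_k = (t - y) * y * (1.0 - y)
    # Change factors for the output-layer weights and bias (Equations 6-7).
    dW = alpha * np.outer(z, delta_k)
    dW0 = alpha * delta_k
    # Hidden weight delta and hidden-unit correction factor (Equations 8-9).
    delta_in = delta_k @ w.T
    delta_j = delta_in * z * (1.0 - z)
    # Corrections for the hidden-layer weights and bias (Equations 10-11).
    dV = alpha * np.outer(x, delta_j)
    dV0 = alpha * delta_j
    # New weights and biases: new = old + correction (Equations 12-15).
    return v + dV, v0 + dV0, w + dW, w0 + dW0
```

Repeating this step over the training data drives the output toward the target, which is the error-minimization behavior the text describes.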

Recurrent Neural Network (RNN)
A Recurrent Neural Network is a Neural Network well suited to forecasting with time-series data. In an RNN, the neuron values of the previous hidden layer are reused as input data: at its core (called a cell) a loop occurs, meaning the output of the cell becomes its input again [1]. The Recurrent Neural Network architecture can be seen in Fig. 2. Figure 2 shows the input, recurrent hidden, and output layers of the RNN. The N input units form the vector sequence through time t, xt = (x1, x2, ..., xN), and the recurrent hidden layer is directly connected to the input layer, where the M hidden-layer units are ht = (h1, h2, ..., hM). In the RNN method, the output process refers to the previous computation for each element in sequence; the RNN has a memory that contains previously recorded information [13].
The RNN training process is very similar to training an ordinary neural network, using the Backpropagation algorithm with a slight twist: because the parameters are shared equally at each time step of the network, the gradient for each output depends not only on the calculation of the current time step but also on the previous time steps [13]. The training process for the Recurrent Network is divided into three parts, namely: 1. forward propagation, which involves the hidden-state calculation and the activation functions; 2. backward propagation, to find the gradient value based on the loss-function value from the forward propagation process;
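The forward-propagation part, with its hidden-state calculation reusing the previous hidden layer, can be sketched as follows. This assumes a simple Elman-style cell with a tanh activation, since the paper's exact architecture is only shown in Fig. 2; all names and shapes are hypothetical:

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Unroll an RNN over a sequence: h_t = tanh(Wx x_t + Wh h_{t-1} + b).

    xs : sequence of input vectors x_1 ... x_T
    Wx : input-to-hidden weights, Wh : hidden-to-hidden (recurrent) weights
    """
    h = np.zeros(Wh.shape[0])          # initial hidden state
    hidden_states = []
    for x_t in xs:
        # The previous hidden state h feeds back in as input (the "loop").
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        hidden_states.append(h)
    return np.array(hidden_states)
```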

Long Short Term Memory (LSTM)
In this research, LSTM is the proposed method, because LSTM is an improved form of the RNN method. Long Short Term Memory (LSTM) networks are an evolution of the RNN architecture, first introduced by Hochreiter & Schmidhuber (1997). Up to the time this research was conducted, many researchers have continued to develop the LSTM architecture in various fields such as speech recognition and forecasting [1].
LSTM uses memory cells and gate units to manage the memory at each input [8]. There are four activation-function processes applied to each input to a neuron, hereinafter referred to as gate units: the forget gate, input gate, cell gate, and output gate [7].

LSTM Training
At the forget gate, the information in each input datum is processed, and the data to be stored in or discarded from the memory cell are selected. The output uses the sigmoid activation function: a value of 1 means the data is stored, and a value of 0 means the data is discarded.
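The forget gate and the other three gate units can be sketched as one LSTM cell step. This follows the common LSTM formulation rather than the paper's exact equations, and all names and shapes are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step with forget, input, cell, and output gates."""
    # One stacked affine transform of [h_prev, x_t] for all four gates.
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[:H])            # forget gate: ~1 keeps, ~0 discards memory
    i = sigmoid(z[H:2 * H])       # input gate: how much new information enters
    g = np.tanh(z[2 * H:3 * H])   # cell gate: candidate memory content
    o = sigmoid(z[3 * H:])        # output gate: how much of the cell is exposed
    c = f * c_prev + i * g        # updated memory cell
    h = o * np.tanh(c)            # new hidden state
    return h, c
```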

Experiments and Results
This study uses UM journal website visitor data from two journals. The first is the journal at journal2.um.ac.id/index.php/keds and the second is the journal at journal.um.ac.id/index.php/jptpp. The first journal has 4 input variables: pageview, session, visitor, and new visitor. The second journal has 4 input variables: page loads, unique visits, first time visits, and return visits. The data are recorded at monthly intervals.
This study uses the first time visits variable. Of the data used, 80% is used for training and 20% for testing.
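The 80/20 split can be sketched as follows, together with min-max scaling to [0, 1], which the reported MSE values below 1 suggest was applied. The monthly visit counts here are invented placeholders, not the actual journal data:

```python
import numpy as np

# Hypothetical monthly first-time-visit counts for 24 months (2018-2019).
visits = np.array([120, 135, 150, 160, 158, 170, 180, 175,
                   190, 200, 210, 220, 215, 230, 240, 250,
                   245, 260, 270, 280, 275, 290, 300, 310], dtype=float)

# Min-max scaling so that all values lie in [0, 1].
scaled = (visits - visits.min()) / (visits.max() - visits.min())

# First 80% of the series for training, the remaining 20% for testing.
split = int(0.8 * len(scaled))
train, test = scaled[:split], scaled[split:]
```

Splitting a time series by position (rather than shuffling) keeps the test set strictly after the training period, which matches the forecasting setting.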
The data in this study were collected from the journal website of Universitas Negeri Malang, and the result of the experiment is an estimate of website visitors. The visitor data cover 2018 to 2019. Based on these data, the training of the two methods is tested with the following parameters: the performance of the LSTM and Backpropagation methods is tested for 100 epochs, and training is repeated 10 times because the initial weight values of both methods are random. With 100 epochs, the MSE results of the LSTM and Backpropagation methods over the 10 training runs can be seen in Table 1. On the first time visits data, the LSTM method obtained 0.0131 in the 1st run, while Backpropagation obtained 0.0129 in the 6th run, so Backpropagation is better in that single run; however, the LSTM method is better than Backpropagation in terms of the average over the 10 runs.
On the new visitor data, the LSTM method obtained 0.0501 in the 5th run, while Backpropagation obtained 0.0828 in the 7th run, showing that LSTM is better than Backpropagation. These results show that the MSE of LSTM is better than that of Backpropagation.
The LSTM result for the first time visits data is 0.0184 and for the new visitor data 0.0521, while Backpropagation obtains 0.1542 for the first time visits data and 0.1424 for the new visitor data. The LSTM method is therefore more accurate than Backpropagation.
The computation time is 139 ms for LSTM and 150 ms for Backpropagation on the first time visits data, and 158 ms for LSTM and 189 ms for Backpropagation on the new visitor data. In terms of computation, the LSTM method is more efficient than Backpropagation.
From the comparison of the LSTM and Backpropagation methods on the first time visits and new visitor data, the average MSE of the LSTM method is better than that of the Backpropagation method on both datasets, and the LSTM method is also faster in computation time. Therefore, the LSTM method is better than the Backpropagation method; the LSTM training process uses two activation functions, while Backpropagation uses only one.

Conclusions
This paper proposed forecasting journal website visitors using the LSTM and Backpropagation methods, testing each with 10 training runs of 100 epochs.
From the test results on the new visitor and first time visitor data from 2018 to 2019, at monthly intervals, the MSE of the LSTM method is 0.0184 for the first time visitor data and 0.0521 for the new visitor data, while for the Backpropagation method the MSE is 0.1542 for the first time visits data and 0.1424 for the new visitor data. It can be concluded that the LSTM method is better than the Backpropagation method. These results also suggest that the amount of data used is still small, which causes less than optimal prediction results; future work is expected to use more data.