PREDICTION OF STEM WEIGHT IN SELECTED ALFALFA VARIETIES BY ARTIFICIAL NEURAL NETWORKS, MULTIVARIATE ADAPTIVE REGRESSION SPLINES AND MULTIPLE REGRESSION ANALYSIS

In this study, artificial neural networks (ANNs), the multivariate adaptive regression splines (MARS) algorithm and multiple regression analysis (MLR) were used to predict plant stem weight. Stem length, stem diameter, number of lateral branches, lateral branch length, leaf number, stipule length and distance between stipules were selected as input variables for these methods. A total of 150 plants were examined: fifty plants from each of the Gea, Bilensoy and Basbag alfalfa cultivars were analyzed separately. In the ANN method, 70% of the data were allocated for training, 20% for validation and 10% for testing. The ANN training data were also used for the MARS algorithm and MLR. To determine which model predicted better, the coefficient of determination (R²) and mean square error (MSE) of the models were compared. The correlation coefficients (r) of ANN, MARS and MLR in stem weight estimation were 0.801, 0.999 and 0.753, respectively, for the Gea variety; 0.864, 0.997 and 0.711 for the Bilensoy variety; and 0.781, 0.998 and 0.561 for the Basbag variety. For the same models, R² was 0.642, 0.998 and 0.567, respectively, for the Gea variety; 0.746, 0.994 and 0.505 for the Bilensoy variety; and 0.610, 0.997 and 0.315 for the Basbag variety. MSE values were 0.023, 0.008 and 2.498, respectively, for the Gea variety; 0.113, 0.014 and 1.409 for the Bilensoy variety; and 0.151, 0.017 and 4.641 for the Basbag variety. According to these criteria, the MARS algorithm provided more accurate predictions than ANNs and MLR. The ranking of the algorithms by prediction performance for stem weight in alfalfa plants was MARS > ANN > MLR.


INTRODUCTION
Alfalfa (Medicago sativa L.) is an important perennial forage plant of the legume family (Fabaceae) with a deep and strong root system (Davis, 1978). It is native to Asia, Iran, Turkmenistan and the surrounding regions (Bolton, 1962). There are about 60 alfalfa species (Lesins and Gillies, 1972). Alfalfa grows naturally over a wide area extending from South and Central Europe to the Near East and Japan (FAO, 2013). It is successfully grown in cold regions of Turkey such as Central Anatolia and Eastern Anatolia, as well as in the southern regions of the country (Saglamtimur et al., 1990). Because of its strong root system (which can reach depths of 4 m to 9 m), alfalfa is a drought-resistant plant (Volaire, 2008). Besides being easy to produce, low in cost and a rich source of crude protein for animal husbandry, alfalfa is also highly digestible (Radovic et al., 2009). In addition, alfalfa is very rich in minerals and vitamins (Altinok and Karakaya, 2002). Alfalfa varieties also show a wide range of genetic variation in yield potential and in adaptation to different environmental conditions (Hill et al., 1988). Alfalfa is adaptable to different climatic and soil conditions and can be grown in almost all regions of Turkey (Erisen, 2005). Since alfalfa plants are cross-pollinated, alfalfa varieties generally consist of many parent plants and have a broad genetic base (Şengül and Sagsoz, 2004).
Alfalfa breeding programs have focused on forage yield and quality, resistance to biotic and abiotic stressors, and fall dormancy. Conventional breeding is typically based on simple phenotypic selection, in which each plant must be phenotyped, and on pedigree-based methods such as BLUP (Piepho et al., 2008) which attempt to predict individual breeding values based on pedigree.
Stem weight is an indicator of forage production in alfalfa (Matthew et al., 1996). The authors obtained lower and higher weights per stem (0.27 and 0.45 g) at 30 and 50 d, respectively.
One of the best ways to increase yield profitability is to identify production-related problems at an early stage. ANNs are among the most accurate and widely used methods for data mining and forecasting (Rad, 2018). The main strength of this approach is that ANNs use ordinary patterns for simulations and do not require complex mathematical calculations (Razavi et al., 2003). One of the most important advantages of this method is that it can model nonlinear relationships between input and output information. In traditional modeling methods, on the other hand, modeling of this information involves many errors (Zhang et al., 2012). In research on cultivation and on yield and production management forecasting, the ANN method has provided more accurate results than traditional methods (Solaimany-Aminabad et al., 2013). Jiang et al. (2004) reported that estimating winter wheat harvest with the ANN method gave very realistic results. Uno et al. (2005) estimated corn harvest data with different statistical methods using a large number of parameters and stated that the ANN method gave the best results. Higgins et al. (2010) reached the same conclusion in a classification based on repeatedly growing trees.
The literature review yielded no studies that predicted stem weight from the morphological characteristics of the alfalfa plant. To our knowledge, this study is the first to do so, which increases its importance.
The purpose of this study is to predict stem weight from various plant properties in different alfalfa varieties using the MARS algorithm, ANNs and MLR.

MATERIALS AND METHODS
Plant characteristics and measurements used in this research were taken from a research project established in 2019 at the Bingöl-Genç Vocational School Application and Research Area in Turkey to determine the yield and quality of some alfalfa genotypes. Fifty plant samples from each of the Gea, Bilensoy-80 and Basbag alfalfa varieties were taken randomly from the trial area. The average length of the growing season is 150 days (June to October), and the average annual precipitation is 110 mm. The research site has no identifiable water tables. Alfalfa (Medicago sativa) was planted on 25 June 2019. The plot was irrigated uniformly at a rate of 10 mm per week until the end of the 2019 irrigation season (5 October) to ensure uniform germination. The statistics of the various plant properties of the Gea, Bilensoy and Başbağ varieties of alfalfa are presented in Table 1. Stem weight is the output variable; the other variables are input variables. Fifty plants were available for each variety. Efforts were made to achieve the best possible result with the training and testing performed with ANNs. The data were obtained by measuring the samples' stem weight, stem length, stem diameter, number of lateral branches, length of lateral branches, leaf number, stipule length, and the distance between stipules.
Before training any network, the general practice is to first divide the data into three subsets: training, validation, and testing. The "dividerand" function of MATLAB was used to divide the data randomly. Of the 50 observations for each variety, 70% (35 observations) were used for training, 20% (10 observations) for validation, and the remaining 10% (5 observations) for testing. Training, validation and testing were done separately for each plant variety. The number of observations in each group was kept balanced with the utmost care. Furthermore, the validation data should cover the whole range of the data.
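The random 70/20/10 partition described above can be illustrated with a short Python sketch. It is not the MATLAB "dividerand" implementation; the function name divide_random, the seed and the rounding rule are assumptions made only for illustration.

```python
import random

def divide_random(n_samples, train_ratio=0.70, val_ratio=0.20, seed=0):
    """Randomly partition sample indices into training, validation and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(n_samples * train_ratio)
    n_val = round(n_samples * val_ratio)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains goes to testing
    return train, val, test

# 50 plants per variety, split separately for each variety
train_idx, val_idx, test_idx = divide_random(50)
print(len(train_idx), len(val_idx), len(test_idx))  # 35 10 5
```

With only five test observations per variety, test-set statistics are sensitive to which plants happen to be drawn, so the seed would normally be reported alongside the results.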
Artificial intelligence systems such as ANNs are capable of estimating the output of a complex system from several influential input parameters. ANNs are composed of interconnected neurons that are arranged in three different layers (Nowruzi and Ghassemi, 2016): input layer, hidden layers, and output layer (Figure 1).

Figure 1. Two-layer feed forward network structure
In this study the activation function used to calculate the output was the Hyperbolic Tangent function (Spiegel et al., 2009).
$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
For prediction with ANNs, a two-layer feed-forward neural network with hidden neurons and output neurons is well suited to multidimensional mapping problems, given consistent data and a sufficient number of neurons in its hidden layer. The network was trained with the Levenberg-Marquardt backpropagation algorithm (trainlm). A single hidden layer was used for this network. The number of neurons in the hidden layer was 25. The number of output neurons was determined by the number of elements in the target vector. The network was manually configured for better performance.
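As an illustration of the network structure described above, the following Python sketch evaluates one forward pass through a network with 7 inputs, 25 hyperbolic tangent hidden neurons and one output neuron. The random weights, the scaled input values and the linear output neuron are assumptions for illustration only; the fitted weights of the study are not reproduced here, and Levenberg-Marquardt training is not shown.

```python
import math
import random

def tanh(x):
    """Hyperbolic tangent written out from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output neuron (assumed)."""
    hidden = [tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

rng = random.Random(1)
n_inputs, n_hidden = 7, 25  # seven plant traits, 25 hidden neurons
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

sample = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1]  # hypothetical scaled inputs
prediction = forward(sample, w_hidden, b_hidden, w_out, b_out)
print(prediction)
```

A network of this size has 7 x 25 + 25 + 25 + 1 = 226 adjustable parameters, which is worth keeping in mind when only 35 training observations are available per variety.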
MARS (multivariate adaptive regression splines), a nonlinear and nonparametric regression method, was first introduced by Friedman (1991) as a flexible procedure that models the relationships between inputs and outputs with fewer variables.
In the training set, three-fold cross-validation was used as the resampling technique to select the best predicting MARS model, with degree = 1:4 and nprune = 5:40 as the number of selected terms. To maximize the predictive accuracy of the MARS algorithm, the penalty was set to -1. The number of basis functions used in the MARS algorithm was 38, 33, and 34 for the Gea, Bilensoy and Basbag varieties, respectively. The maximum degree of interaction, an important tuning parameter, was 3.
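MARS builds its model from hinge basis functions of the form h(x - c) = max(0, x - c) and from products of such functions with other terms. The Python sketch below shows how a single two-way interaction term of the form h(SL - 60) * SD would be evaluated. The knot at 60, the coefficient and the input values are used purely as illustrative numbers, and the function names are hypothetical.

```python
def hinge(value):
    """Hinge basis function: zero below the knot, linear above it."""
    return max(0.0, value)

def interaction_term(stem_length, stem_diameter, knot=60.0, coefficient=-0.478032):
    """Contribution of one interaction term, coefficient * h(SL - knot) * SD."""
    return coefficient * hinge(stem_length - knot) * stem_diameter

# Below the knot the term is switched off; above it, it grows with both traits.
print(interaction_term(50.0, 0.3))  # stem shorter than the knot: no contribution
print(interaction_term(65.0, 2.0))  # stem 5 units above the knot
```

A full MARS prediction is the intercept plus the sum of all such terms, so a model with 38 basis functions is a sum of 38 contributions of this kind.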
The fit of the entire MARS model in the study was evaluated using statistics such as the coefficient of determination (R²), standard deviation ratio (SD ratio), root mean square error (RMSE), mean square error (MSE), and mean absolute percent error (MAPE) (Zaborski et al., 2019).
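The goodness-of-fit statistics listed above can be computed as in the following Python sketch. It uses commonly cited definitions (for example, SD ratio as the standard deviation of the residuals divided by the standard deviation of the observed values); whether the study used exactly these formulas is an assumption, and the observed and predicted values are invented for illustration.

```python
import math
import statistics

def fit_statistics(observed, predicted):
    """Return MSE, RMSE, R2, MAPE (%) and SD ratio for paired observations."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in residuals)
    mse = sse / n
    mean_obs = statistics.fmean(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - sse / ss_total,
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(residuals, observed)),
        "SD_ratio": statistics.stdev(residuals) / statistics.stdev(observed),
    }

# Hypothetical stem weights (g) and model predictions
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
stats = fit_statistics(observed, predicted)
print(stats)
```

Because RMSE is the square root of MSE, reported pairs of these values can be cross-checked against each other, as can r and R² for a single model.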
Stem weight was estimated from the measured characteristics of the alfalfa plant with the ANN method in MATLAB R2016a (MathWorks, Inc., 2016) and in R (R Core Team, 2019).

RESULTS
Training the network produced a graph showing the error values of the training, validation and test sets at each iteration. As can be seen in the graph, training ran for 24 iterations, and the best performance was obtained at the 18th iteration (Figure 2).
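The gap of six iterations between the last iteration (24) and the best one (18) is consistent with validation-based early stopping, in which training halts once the validation error has failed to improve for a fixed number of consecutive iterations. The Python sketch below illustrates that rule. The limit of six failures is an assumption inferred from the reported iteration counts, the error curve is synthetic, and the function name is hypothetical.

```python
def early_stop(validation_errors, max_fail=6):
    """Return (best_iteration, stop_iteration) as positions in validation_errors."""
    best_error = float("inf")
    best_iter = 0
    fails = 0
    for i, err in enumerate(validation_errors):
        if err < best_error:
            best_error, best_iter, fails = err, i, 0  # new best: reset the counter
        else:
            fails += 1                                # no improvement this iteration
            if fails >= max_fail:
                return best_iter, i
    return best_iter, len(validation_errors) - 1

# Synthetic curve: error falls until iteration 18, then rises
errors = [1.0 - 0.05 * i for i in range(19)] + [0.2 + 0.01 * k for k in range(1, 10)]
print(early_stop(errors))  # (18, 24)
```

Under this rule the weights from the best validation iteration, not the last one, are the ones retained for prediction.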

Figure 3. Regression graphics related to training, validation and test sets
The MARS algorithm was also used to estimate stem weight (SW) in the Gea variety. The basis functions, coefficients and significance values of the MARS algorithm are presented in the Supplementary Material (Table S2).
According to the results presented in Table S2, the model created with the MARS algorithm had 38 basis functions. According to these results, the basis functions and variables that had the greatest positive effect on stem weight (SW) prediction in the Gea variety of alfalfa are summarized as follows.
The variables that contribute to SW are listed in order of importance (Figure 4).
The ANN model results for the stem weight prediction of the Bilensoy variety of alfalfa are summarized as follows. The change in the error values of the training, validation and test sets at each iteration of network training is shown graphically in Figure 5. As seen in the graph, training ran for 122 iterations, and the best performance was obtained at the 116th iteration (Figure 5).

Figure 5. Performance of artificial neural network
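The correlation values of the data used for training, validation and testing are given in Figure 6. The most important value is that for the test set, and this value was 0.85. The basis functions, coefficients and significance values of the MARS algorithm for the Bilensoy variety are presented in the Supplementary Material (Table S3).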
According to the results presented in Table S3, the model created with the MARS algorithm had 33 basis functions. According to these results, the basis functions and variables that had the greatest positive effect on stem weight (SW) prediction in the Bilensoy variety of alfalfa are summarized as follows.
The variables contributing to SW are listed in order of importance (Figure 7).

Figure 7. Most important variables for Bilensoy variety of alfalfa
To determine the suitability of the model, the goodness-of-fit statistics were obtained as r = 0.997, RMSE = 0.118, MSE = 0.014, R² = 0.994, MAPE = 3.517 and SD ratio = 0.076.
The ANN model results for the stem weight prediction of the Basbag variety of alfalfa are summarized as follows. The change in the error values of the training, validation and test sets at each iteration of network training is shown graphically in Figure 8. As seen in the graph, training ran for 16 iterations, and the best performance was obtained at the 10th iteration (Figure 8).

Figure 8. Performance of artificial neural network
The correlation values of the data used for training, validation and testing are given in Figure 9.

Figure 9. Regression graphics related to training, validation and test sets
The MARS algorithm was used to estimate stem weight (SW) in the Basbag alfalfa variety. The main functions, the coefficients, and the significance value of the MARS algorithm are presented in Supplementary Material (Table S4). The variables contributing to SW are listed in order of importance for the Basbag variety ( Figure 10).

Figure 10. Most important variables for Basbag variety of alfalfa
To determine the suitability of the model, the goodness-of-fit statistics were obtained as r = 0.998, RMSE = 0.131, MSE = 0.017, R² = 0.997, MAPE = 2.44 and SD ratio = 0.055. According to these results, the MARS algorithm produced a very good estimate, and the model fits the data very well.
For a regression model to show a good fit, its standard deviation ratio should be below 0.40, and for a very good fit it should be below 0.10 (Grzesiak and Zaborski, 2012; Eyduran et al., 2019).
The results of the regression analysis of the plant properties of the Gea, Bilensoy and Basbag alfalfa varieties that affect plant stem weight (SW) are presented in Table 8. In the multiple regression analysis for the Gea, Bilensoy and Basbag varieties, only the stem length (SL) variable significantly affected plant stem weight (SW), and the effects of the other variables were not significant. In the regression analysis of the Gea, Bilensoy and Basbag alfalfa varieties, the coefficients of determination (R²) were 0.567, 0.505 and 0.315, respectively, and the mean square errors (MSE) were 2.498, 1.409 and 4.641, respectively. Therefore, the models created with the MARS algorithm were more suitable and are preferred.

DISCUSSION
In recent years, various environmental and hydrological research studies have been carried out using the MARS method (Yang et al., 2003). Simulation of pesticide transport tendencies in soils (Leathwick et al., 2005) and estimation of fish distribution (Bera et al., 2006) are among the studies that used the MARS method. In another study, the MARS model and ANN algorithms were compared, and the MARS model gave better results (Zhang and Goh, 2016). In this study, when ANN, MLR and the MARS algorithm were compared for estimating alfalfa stem weight from plant characteristics, the MARS algorithm predicted stem weight best according to the goodness-of-fit criteria. This finding agrees with that study in terms of the model performance of the statistical methods used. In another study, the MARS algorithm gave better results than the ANN method in runoff forecasting in micro basins (Adamowski et al., 2012). Consistent with these studies, the MARS method also yielded better results in our study. Benabderrahim et al. (2009) investigated the diversity of different lucerne (Medicago sativa L.) populations in South Tunisia. The authors found alfalfa stem length in the 49.4-83.6 range and stem diameter in the 0.15-0.40 range in different populations. While the stem length was similar to the results of this study, the stem diameter differed from them. This difference may be due to environment, genotype, and growing conditions. In another study in Tunisia, the average plant height decreased from 11.08 cm in control plants to 10.66 cm in treated plants, whereas the average number of leaves showed no significant increase (from 6.16 to 6.47). Regarding the genotype by salinity interaction, two-way ANOVA showed no significant effects on plant development and growth traits (Benabderrahim et al., 2020).
In alfalfa grown in the Erzurum region, the average number of leaves was 692.6 in normal soil and 54 in saline-alkaline soil. Alfalfa plant height was 42.4 cm in normal soil and 24.3 cm in saline-alkaline soil. While the number of leaves was higher than the results obtained in this study, the plant height was slightly lower. The number of leaves and plant height differed depending on the soil type, the environment, and genotype differences (Tan et al., 2002).
Since the alfalfa genotypes and growing conditions in this study were different, the results differed from the results obtained in other studies.

CONCLUSION
When ANNs, regression analysis and the MARS algorithm were applied to stem weight prediction in alfalfa, the MARS algorithm predicted stem weight more precisely and with fewer errors. Among the performance criteria of the MARS, ANN and MLR methods, the highest correlation coefficient and the lowest MSE values were obtained with the MARS algorithm. Therefore, it can be suggested that the MARS algorithm provides more reliable and consistent results than the ANN method and MLR. Future studies measuring various properties of plants with the MARS algorithm are expected to demonstrate the contributions of this algorithm to the field of agriculture.