Introduction

Background

Recent advances in computational and engineering domains have provided reliable responses to various problems in modern human life1,2,3,4,5. Intelligent tools, sophisticated simulation packages, and soft computing approaches are evident examples of these advances6,7,8,9. In the field of energy, engineers have successfully employed these tools and methodologies to improve the sustainable development of renewable energy systems10,11,12,13. Solar energy has emerged as an outstanding renewable source owing to benefits such as environmental friendliness, universality, high capacity, and inexhaustible supply14,15. Scholars have attempted to evaluate the solar energy production pattern by forecasting related parameters such as solar irradiance (IS). However, because many parameters are involved, evaluating these factors requires reliable approaches that can handle non-linear calculations16,17. Machine Learning (ML) models have proved to be an effective way of handling such non-linear calculations18,19,20. In forecasting tasks, ML methods can capture complicated mathematical relationships and provide accurate solutions. Artificial Neural Network (ANN)21,22, Support Vector Machine (SVM)23,24, decision trees25,26, and neuro-fuzzy tools27,28 are among the most popular ML algorithms used for prediction in solar energy calculations.

Literature review

ML algorithms provide fast, inexpensive, and reliable solutions, which has motivated experts to use them in forecasting tasks29,30,31,32,33. Kim, Seong and Choi34 used ANN models to forecast the energy consumption of an actual air handling unit and obtained appropriate results. Bhatt and Gandhi35 used two different statistical and ANN models to forecast the energy consumption in wind power plants, and the error of the feed-forward neural network was around 9.85%. Yin, Jia, Wu, Dai and Tang36 used a feed-forward ANN model to forecast energy demand, with a mean relative error of 1.58%. Malvoni, De Giorgi and Congedo37 used an SVM method to forecast Photovoltaic (PV) power data. This method was also utilized in Ref. for forecasting wind energy production in Estonia, and appropriate results were obtained compared to other traditional methods like Behavior-Driven Development (BDD).

Optimization-oriented efforts account for a large share of studies in the engineering literature38,39,40. In particular, many scholars have suggested metaheuristic algorithms for optimization purposes41,42. These algorithms can also serve to optimize traditional ML methods like ANN and the Adaptive Neuro-Fuzzy Inference System (ANFIS)43, and they have been applied to renewable energy analysis in many previous studies44,45, for example on solar power46 and wind energy47,48. Computational problems such as local minima can be mitigated by using metaheuristic-based hybrids29. In the following, previous works that use metaheuristic algorithms to optimize ML methods for energy forecasting are briefly summarized. Moayedi and Mosavi49 used an innovative metaheuristic approach, Electromagnetic Field Optimization (EFO), to optimize a neural network and showed that the EFO-supervised neural network can appropriately mine the dataset by nonlinearly tuning the network elements. Abedinia, Amjady and Ghadimi50 used an engine built around a metaheuristic optimizer, namely shark smell optimization, to optimize an ANN. They reported that this hybrid method performed better than conventional predictors, with normalized Root Mean Square Errors (RMSEs) about 27% lower than those of ANN and other traditional methods. Galván, Valls, Cervantes and Aler51 used a multi-objective Particle Swarm Optimization (PSO) method to enhance the ANN and observed that the PSO-optimized model gave outstanding results compared to the traditional ANN. Tran, Luong and Chou52 introduced a new model, the Evolutionary Neural Machine Inference Model (ENMIM), which combines Least Squares Support Vector Regression (LSSVR) and the Radial Basis Function Neural Network (RBFNN) with Symbiotic Organism Search (SOS) for obtaining optimized tuning parameters. This approach proved to be a promising alternative for energy management tasks.
Halabi, Mekhilef and Hossain53 demonstrated that the algorithm introduced in54 can be coupled with an ANFIS system for IS prediction. Louzazni, Khouya, Amechnoue, Gandelli, Mussetta and Crăciunescu55 demonstrated the competency of the firefly algorithm in evaluating solar energy harvesting parameters and observed that it was very reliable for solar energy forecasting. Bechouat, Younsi, Sedraoui, Soufi, Yousfi, Tabet and Touafek56 also considered the PSO and Genetic Algorithm (GA) for this purpose. Zhou, Moayedi and Foong57 studied the limitations of neural computing approaches, for example local minima, and suggested a metaheuristic method, Teaching–Learning-Based Optimization (TLBO), for enhancing a Multi-Layer Perceptron Neural Network (MLPNN). They observed that the TLBO method reduced the prediction error by 19.89% compared to the ANN approach. Vaisakh and Jayabarathi58 used a hybrid approach combining the deer hunting optimization algorithm with grey wolf optimization to adjust the structure of ANNs used for solar energy calculations. Their results reflected a notable improvement attained by the tested optimizer. Abedinia, Amjady and Ghadimi59 used a neural network enhanced by a metaheuristic algorithm as a hybrid method for forecasting tasks in solar energy harvesting, and appropriate results were obtained. Abdalla, Rezk and Ahmed60 successfully used Wind-Driven Optimization (WDO) to track the elements of photovoltaic systems and showed that it gave better results than many traditional optimization techniques such as cuckoo search.

Motivation, novelty, and objective

The above literature shows the necessity of using modern tools and techniques to cope with intricate engineering problems61,62,63,64,65. In this sense, different ML models have contributed greatly to renewable energy research, particularly for solar energy (SE)-related predictions. Metaheuristic algorithms, in turn, have been recommended for the optimal development of ML models. Based on the previous literature, combining metaheuristic optimizers with ML models such as ANN helps to avoid computational drawbacks and is therefore becoming a research hotspot66. However, a knowledge gap emerges because these studies mostly focus on earlier metaheuristic methods such as PSO and GA67,68, while the metaheuristic family keeps being extended by new members. This gap calls for evaluating the capability of newer hybrid models to improve SE-related predictions. Hence, in this research, a novel metaheuristic technique named Equilibrium Optimizer (EO) is employed within a Feed-Forward ANN (FFANN) framework to analyze the meteorological and temporal conditions and predict the IS. The EO algorithm is responsible for tuning the FFANN’s weights and biases, which connect the IS to the environmental conditions. Moreover, to comparatively validate its performance, the EO is evaluated against three benchmark optimizers, namely WDO, Optics Inspired Optimization (OIO), and Social Spider Algorithm (SOSA), as well as three algorithms (EFO, Shuffled Complex Evolution (SCE), and Shuffled Frog Leaping Algorithm (SFLA)) used in an earlier study by Moayedi and Mosavi49. Accuracy is assessed using different criteria to rank the models and identify the most competent one. Since the used models have not been applied to this problem before, the findings of this study can assist solar energy experts in selecting appropriate predictive models.
For more convenience, a mathematical formula is also extracted from the EO-FFANN model to eliminate the need for computer-aided implementations in predicting the IS. Moreover, a well-known statistical technique called Principal Component Analysis (PCA) is applied to identify the most contributive meteorological and temporal parameters, and therefore, to optimize the dimension of the problem.

To sum up, the main strengths and novelties of this study can be highlighted as follows:

  • Evaluating the applicability of ensemble learning theory for predicting the IS as a crucial parameter of renewable (solar) energy,

  • Employing the EO metaheuristic algorithm to create a novel FFANN-based model whose absence is considered a gap of knowledge in the literature on IS prediction,

  • Introducing the optimal configurations (i.e., population size and No. of iterations) for the used models,

  • Exposing various environmental and temporal conditions as key parameters in the IS prediction and determining the principal dataset components using the PCA method which has not been performed in the previous literature. In addition to optimizing the problem dimension, the results of the PCA can be regarded for suggesting real-world measures (attributing to the key parameters) to maximize solar energy production.

  • Conducting a comparative assessment by evaluating six other metaheuristic algorithms in this study (i.e., WDO, OIO, and SOSA) and previous literature (i.e., EFO, SCE, and SFLA). It makes this study a suitable benchmark for future applications of hybrid models and appropriate model selection by energy experts,

  • Developing a monolithic explicit formula from the proposed EO-FFANN model to be used as a convenient method for predicting the IS.

Overall, the achievements of this research can contribute to the body of knowledge (from both data and methodology perspectives) on solar energy modeling. The optimization ideas applied in this study can help to reduce the complexity (i.e., computational cost) of proper IS prediction.

In the following, the study continues by introducing the used materials and methods in Sect. 2, presenting the results and discussion in Sect. 3, followed by providing conclusions in Sect. 4.

Materials and methods

Dataset and splitting

From previous studies, it is evident that the amount of received IS is a function of various meteorological conditions69,70. In this work, this amount is represented by the Global Horizontal Irradiance (GIH), measured for Yemen. Along with the GIH, the records of five meteorological factors, namely Air Temperature (T), Relative Humidity (H), Surface Pressure (P), Wind Direction (WD), and Wind Speed (WS), are downloaded from the Solcast community (https://solcast.com/). All measurements are hourly and cover one year (2021-05-31 to 2022-06-01). Figure 1 shows the time series of the T, H, P, WD, WS, and GIH.

Figure 1

Time-series of the GIH and meteorological parameters.

In addition to these five parameters, three temporal inputs, namely Month (m), Day (d), and Hour (h) are also considered influential parameters. When this dataset is exposed to the considered ML models, the influential parameters (i.e., m, d, h, T, H, P, WD, and WS) play the role of inputs, while the GIH is the target of the system. Therefore, the used models explore the relationship between the temporal and meteorological parameters to understand and predict the hourly GIH. Table 1 gives the results of the statistical analysis performed on the used dataset.

Table 1 Correlation factors showing the proportionality of the parameters.

As per Table 1, a total of 8803 records exist in the dataset. These records are split into two subsets to create the training and testing sets. The training set provides the training material for the models, and the testing set examines their generalizability. Based on previous works, an 80:20 ratio is applied to split the dataset, meaning that 7042 records exist in the training set and 1761 records exist in the testing set.
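The record counts of the 80:20 split can be reproduced with a short sketch. The text does not state whether the split was random or chronological, so the shuffled split, the function name `split_indices`, and the seed below are assumptions made only for illustration.

```python
import numpy as np

def split_indices(n_records, train_fraction=0.8, seed=0):
    """Return shuffled train/test index arrays (random split is an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    n_train = int(n_records * train_fraction)  # floor of 8803 * 0.8
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(8803)
print(len(train_idx), len(test_idx))  # 7042 1761
```

Taking the floor of 8803 × 0.8 = 7042.4 gives the 7042 training records reported above, which leaves 1761 records for testing.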

Applied algorithms

Overview of EO

As the name implies, the EO is an optimization technique that mimics specific laws of physics to obtain an optimum solution71. It is a capable metaheuristic algorithm for dealing with problems with different levels of complexity. The search units of the EO are called particles each of which receives an initial concentration value as in Eq. (1):

$${C}_{j}= LB + r\times (UB- LB),$$
(1)

where \(r\) is a random value in [0, 1]. Moreover, \(LB\) and \(UB\) are the lower and upper bounds of the space.

Similar to other population-based optimizers, the quality of each particle is reflected by a fitness value. The particles are sorted by fitness, and the algorithm selects the four best ones. A fifth particle, representing the mean of these four, is also considered; together, these five candidates form the equilibrium pool.

The exponential term (F) of the algorithm is defined by Eqs. (2), (3), (4).

$$F = {CP}_{1} sign\left(r-0.5\right)({e}^{-\beta t}-1),$$
(2)
$$t= {(1- {R}_{iter})}^{({CP}_{2} \times {R}_{iter})},$$
(3)
$${R}_{iter}=\frac{iter}{T},$$
(4)

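in which \(\beta\) stands for the turnover rate, \({CP}_{1}\) and \({CP}_{2}\) are controlling parameters for the exploration and exploitation phases, respectively, \(iter\) is the current iteration, and \(T\) in Eq. (4) is the maximum number of iterations (not to be confused with the air temperature T).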

Assuming \({G}_{CP}\) and \(GP\) as a controlling parameter and the generation probability, respectively, generation rate is calculated by Eqs. (5) and (6).

$${R}_{G} = {G}_{0} {G}_{CP} \left({C}_{eq}- \beta C\right) F,$$
(5)
$${G}_{CP}= \left\{\begin{array}{c}0.5 {r}_{1} \,\,\,\,\,\,\,\,{r}_{2}\ge GP \\ 0 \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,otherwise\end{array}\right.,$$
(6)

where \({C}_{eq}\) is the equilibrium pool, and \({r}_{1}\) and \({r}_{2}\) are random numbers in [0, 1].

Based on the above calculations, the solution is updated as in Eq. (7)72:

$${C}_{j} = {C}_{eq}+\left({C}_{j}-{C}_{eq}\right) \times F+\left(1- F\right)\frac{{R}_{G}}{\beta V},$$
(7)

where \(V\) is the volume, which is taken as unity.
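Equations (1) to (7) can be put together in a compact sketch. This is an illustrative reading of the equations above and not the authors' implementation. Several choices are assumptions: \({G}_{0}\) is not defined in the text and is set to 1, \(\beta\) is drawn randomly per dimension from [0.05, 1] to avoid division by zero, the default values of `cp1`, `cp2`, and `gp` are placeholders, the "best" particles are those with the lowest cost (minimization), and the pool keeps the best four candidates found so far.

```python
import numpy as np

def eo_minimize(cost, dim, lb, ub, n_particles=30, max_iter=200,
                cp1=2.0, cp2=1.0, gp=0.5, volume=1.0, seed=0):
    """Minimal EO-style loop following Eqs. (1)-(7); defaults are assumed values."""
    rng = np.random.default_rng(seed)
    # Eq. (1): random initial concentrations inside [lb, ub]
    C = lb + rng.random((n_particles, dim)) * (ub - lb)
    pool, pool_cost = None, None
    for it in range(max_iter):
        costs = np.array([cost(c) for c in C])
        # Keep the four best candidates seen so far (assumed elitist memory)
        if pool is None:
            cand, cand_cost = C, costs
        else:
            cand = np.vstack([C, pool[:4]])
            cand_cost = np.concatenate([costs, pool_cost])
        best = np.argsort(cand_cost)[:4]
        best4, pool_cost = cand[best], cand_cost[best]
        # Equilibrium pool: four best particles plus their mean
        pool = np.vstack([best4, best4.mean(axis=0)])
        r_iter = it / max_iter                     # Eq. (4)
        t = (1.0 - r_iter) ** (cp2 * r_iter)       # Eq. (3)
        for j in range(n_particles):
            c_eq = pool[rng.integers(5)]           # random pool member
            beta = rng.uniform(0.05, 1.0, dim)     # turnover rate (assumed range)
            r = rng.random(dim)
            F = cp1 * np.sign(r - 0.5) * (np.exp(-beta * t) - 1.0)  # Eq. (2)
            r1, r2 = rng.random(), rng.random()
            g_cp = 0.5 * r1 if r2 >= gp else 0.0   # Eq. (6)
            R_G = g_cp * (c_eq - beta * C[j]) * F  # Eq. (5) with G_0 = 1
            # Eq. (7): concentration update
            C[j] = c_eq + (C[j] - c_eq) * F + (1.0 - F) * R_G / (beta * volume)
        C = np.clip(C, lb, ub)                     # keep particles in bounds
    return pool[0], pool_cost[0]

# Usage on a toy sphere function (not the FFANN problem of this study)
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_cost = eo_minimize(sphere, dim=3, lb=-5.0, ub=5.0)
print(best_x, best_cost)
```

For the FFANN problem, `cost` would be the training RMSE as a function of the 61 weights and biases, and `dim` would be 61.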

Comparative algorithms

Wind-driven optimization was first introduced by Bayraktar, Komurcu and Werner73 in 2010 for electromagnetics applications. The WDO relies on the air parcel's movement in hyper-dimensional space. These movements are supposed to be affected by four natural forces of Coriolis force, gravitational force, frictional force, and pressure gradient force. Also, by taking into consideration the ideal gas equation, the position (as well as the velocity) of the air parcels is updated to find the best responses. Scholars like Moayedi, Bui and Ngo74 and Bayraktar75 have successfully used the WDO for optimizing the neural parameters.

As a physics-based scheme, the OIO was suggested by Kashan76 in 2014. It is inspired by the laws of optics and works with a group of artificial light points. After a fixed number of individuals is randomly generated, the initial position of the light points is determined. Each point is then placed in front of an artificial mirror, and its image is created in the search space at a certain distance from the main axis. The position of the image is then updated to become a new solution. This process continues until a stopping criterion is satisfied77.

The SOSA, introduced by James and Li78 in 2015, takes its idea from the food-seeking behavior of social spiders. In this method, the solution space is considered a hyper-dimensional spider web on which the agents (i.e., spiders) can move. It is assumed that the agents interact regularly with each other and that every position on the web corresponds to a possible solution79. Each spider is characterized by its position and fitness value. The agents possess a memory, and the vibrations on the web obey three basic properties: (i) all possible vibration intensities are positive, (ii) larger fitness values mean more intense vibrations, and (iii) as the best solution is approached, the vibration does not increase excessively.

Mathematical details pertaining to the above algorithms can be found in the literature (like the WDO60,80, OIO81,82, and SOSA83,84).

Evaluation method

Statistical indices are normally used for evaluating the accuracy of ML models. In this work, RMSE along with Mean Absolute Error (MAE) is used to indicate the prediction error. Given \({GI}_{H {i}_{real}}\) and \({GI}_{H {i}_{predict}}\) as the real and predicted GIHs, respectively, Eqs. (8) and (9) formulate the RMSE and MAE as follows:

$$RMSE=\sqrt{\frac{1}{S}\sum_{i=1}^{S}{({GI}_{{Hi}_{real} }-{GI}_{{Hi}_{predict} })}^{2}},$$
(8)
$$MAE=\frac{1}{S}\sum_{i=1}^{S}\left|{GI}_{{Hi}_{real} }-{GI}_{{Hi}_{predict} }\right|,$$
(9)

where S stands for the size of the set.

Moreover, a so-called correlation indicator “Pearson Correlation Coefficient (R)” is designated as per Eq. (10) to reflect the agreement between reality and prediction.

$$R=\frac{\sum_{i=1}^{S}({GI}_{{Hi}_{predict} }-\overline{{GI }_{{H}_{predict} }})({GI}_{{Hi}_{real} }-\overline{{GI }_{{H}_{real} }})}{\sqrt{\sum_{i=1}^{S}{({GI}_{{Hi}_{predict} }-\overline{{GI }_{{H}_{predict} }})}^{2}}\sqrt{\sum_{i=1}^{S}{({GI}_{{Hi}_{real} }-\overline{{GI }_{{H}_{real} }})}^{2}}}\times 100,$$
(10)
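Equations (8) to (10) translate directly into a few lines of NumPy. The sketch below is only an illustration; the function names are hypothetical, and the factor of 100 in the last function follows Eq. (10) as printed.

```python
import numpy as np

def rmse(y_real, y_pred):
    """Root mean square error, Eq. (8)."""
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_real - y_pred) ** 2)))

def mae(y_real, y_pred):
    """Mean absolute error, Eq. (9)."""
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_real - y_pred)))

def pearson_r_percent(y_real, y_pred):
    """Pearson correlation coefficient in percent, Eq. (10)."""
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    dr = y_real - y_real.mean()
    dp = y_pred - y_pred.mean()
    return float(np.sum(dp * dr) / (np.sqrt(np.sum(dp ** 2)) * np.sqrt(np.sum(dr ** 2))) * 100)

# Toy values, not data from this study
y_real = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
print(rmse(y_real, y_pred), mae(y_real, y_pred), pearson_r_percent(y_real, y_pred))
```

A constant offset between prediction and reality, as in the toy values, leaves R at its maximum while RMSE and MAE are non-zero, which is why the error and correlation indices are reported together.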

Results and discussion

Hybridization of algorithms

When the FFANN is hybridized with a metaheuristic algorithm, the basic idea is to optimize its weights and biases to establish the best relationship between the target and input parameters. In this research, the FFANN is optimized by the EO algorithm, as well as OIO, WDO, and SOSA. The metaheuristic algorithms are able to find the solution in an iterative process.

The used FFANN is represented by an MLPNN (8,6,1) model, i.e., a three-layered neural network with 8 input neurons in the first layer, 6 hidden neurons in the middle layer, and 1 output neuron in the last layer. The activation functions in the middle and last layers are Tansig and Purelin, respectively. This configuration is obtained after an extensive trial-and-error effort. The topology of the used FFANN is embedded in Fig. 2. According to this figure, the network has a total of 61 weights and biases (48 input-to-hidden weights, 6 hidden biases, 6 hidden-to-output weights, and 1 output bias), which are optimized by the metaheuristic algorithm. Only the training dataset is used in this process. First, the mathematical equation of the FFANN is extracted and treated as the problem function. Next, a metaheuristic algorithm is run to tune the FFANN equation (i.e., its weights and biases) so that the training RMSE is minimized over 1000 iterations. In each iteration, 61 new variables construct the FFANN, and the training RMSE is calculated. Note that each of the EO, OIO, WDO, and SOSA algorithms was implemented with different population sizes (varying from 50 to 700), and the best population sizes were found to be 400, 200, 100, and 200, respectively.
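The way a metaheuristic sees the FFANN as a 61-variable problem function can be sketched as follows. The ordering of the weights inside the flat vector and the names `unpack`, `predict`, and `training_rmse` are assumptions for illustration. Tansig is mathematically identical to the hyperbolic tangent, and Purelin is the identity.

```python
import numpy as np

N_IN, N_HID, N_OUT = 8, 6, 1
# 48 input weights + 6 hidden biases + 6 output weights + 1 output bias = 61
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

def unpack(theta):
    """Split a flat 61-element vector into layer parameters (assumed ordering)."""
    theta = np.asarray(theta, dtype=float)
    W1 = theta[:48].reshape(N_HID, N_IN)     # input -> hidden weights
    b1 = theta[48:54]                        # hidden biases
    W2 = theta[54:60].reshape(N_OUT, N_HID)  # hidden -> output weights
    b2 = theta[60:61]                        # output bias
    return W1, b1, W2, b2

def predict(theta, X):
    """Forward pass: Tansig (tanh) hidden layer, Purelin (linear) output."""
    W1, b1, W2, b2 = unpack(theta)
    hidden = np.tanh(X @ W1.T + b1)
    return (hidden @ W2.T + b2).ravel()

def training_rmse(theta, X, y):
    """Objective minimized by the metaheuristic: RMSE on the training set."""
    return float(np.sqrt(np.mean((y - predict(theta, X)) ** 2)))

# Toy data standing in for the training set (not the study's records)
rng = np.random.default_rng(0)
X_demo, y_demo = rng.random((10, N_IN)), rng.random(10)
print(N_PARAMS, training_rmse(rng.uniform(-1, 1, N_PARAMS), X_demo, y_demo))
```

Any of the four optimizers can then be applied to `training_rmse` with the training inputs and targets held fixed, so that each candidate solution is one 61-element vector.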

Figure 2

Optimization process of the hybrid models.

Optimization results

Figure 3 shows the optimization process for the used algorithms (with the mentioned best population sizes) over 1000 iterations. Comparing the curves, it is evident that the EO has reached a higher-quality solution, as it attains the minimum RMSE. While the solutions of the WDO and OIO are very close, the SOSA has found a solution with considerably higher error. Note that this process was carried out using the training set only, because the testing set should be kept away from the models at this stage. In the next two sections, the training and testing results are assessed using the accuracy criteria.

Figure 3

Optimization curves of the OIO-FFANN, WDO-FFANN, SOSA-FFANN, and EO-FFANN algorithms.

Training accuracy

Figure 3 forms part of the training results, since the final RMSEs of the optimization curves are the training RMSEs. In the order OIO-FFANN, WDO-FFANN, SOSA-FFANN, and EO-FFANN, the training RMSEs were 161.22, 152.16, 230.61, and 142.38 W/m².

Figure 4

Training errors of (a) OIO-FFANN, (b) WDO-FFANN, (c) SOSA-FFANN, and (d) EO-FFANN.

Figure 4 illustrates the error values for the 7042 records in the training set. Each value in this figure indicates the difference between \({GI}_{H {i}_{real}}\) and \({GI}_{H {i}_{predict}}\). Hence, the closer the value is to zero, the higher the accuracy. At a glance, it can be seen that the results of the EO are better positioned around the Y = 0 line. Quantitatively speaking, the training MAEs were 127.52, 119.33, 189.56, and 110.09 W/m².

The calculated values of the RMSE and MAE indicated an acceptable level of error for all used models. As for the R index, the values were 0.89, 0.90, 0.76, and 0.91 which demonstrate a significant level of agreement between the reality and prediction results of all four models. However, again, the superiority of the EO algorithm is obvious in terms of the R, too. It was the only model that achieved a correlation larger than 90%.

Testing accuracy

This section shows the performance of the OIO-FFANN, WDO-FFANN, SOSA-FFANN, and EO-FFANN when they are subjected to the 1761 records in the testing set. This process demonstrates the power of the trained models in dealing with unseen environmental conditions for estimating hourly GIH.

From the obtained RMSEs of 161.63, 151.57, 230.16, and 141.61 W/m², it is inferred that the testing results enjoy a satisfying level of accuracy. Figure 5 illustrates the statistics of the testing errors. In these histograms, the higher the frequency of near-zero errors, the better the accuracy. As is seen, the distribution is almost normal for all models, which indicates the high quality of the testing results. Besides, the MAEs of 127.52, 118.72, 188.20, and 108.07 W/m² indicate a low level of average errors.

Figure 5

Testing histogram of errors of (a) OIO-FFANN, (b) WDO-FFANN, (c) SOSA-FFANN, and (d) EO-FFANN.

Figure 6 shows the correlation diagrams of the testing set. The values on the horizontal and vertical axes represent the \({GI}_{H {i}_{real}}\) and \({GI}_{H {i}_{predict}}\), respectively. The ideal prediction happens when all points are positioned on the line x = y and the R-value is 1. As per Fig. 6, all four models have predicted well, and their calculated R-values were 0.89, 0.90, 0.77, and 0.91. Similar to the training stage, EO-FFANN is the only algorithm with a correlation above 90%.

Figure 6

Correlation diagrams for testing set of (a) OIO-FFANN, (b) WDO-FFANN, (c) SOSA-FFANN, and (d) EO-FFANN.

Accuracy comparison

It was in general shown that the EO-based model outperformed the benchmarks in both training and testing stages. In this section, the models are subjected to a more detailed comparison to rank them. For this purpose, Taylor diagrams are generated and presented in Fig. 7. These figures can simultaneously show the correlation (Correlation Coefficient) and error (RMSD = RMSE). As is seen, in both training and testing sets, the same pattern is obtained, and it means there is no discrepancy between the training and testing qualities. The EO-FFANN is distinguished by the lowest error and highest correlation, followed by WDO-FFANN and OIO-FFANN. As for SOSA-FFANN, this model has a considerable weakness in its performance in comparison with three other models. As per Fig. 7, the point of the SOSA-FFANN is separated from the others.

Figure 7

Taylor diagrams for (a) training and (b) testing sets.

PCA importance analysis

In this section, an importance assessment is applied to the used dataset. The results of such efforts can be of great importance for the proper selection of input factors from the statistical point of view. The PCA technique85 is used to determine the most contributive factors for the GIH prediction. Figure 8 shows the obtained scree plot, according to which, four components have an eigenvalue larger than 1. These four components are considered as principal components, and based on Table 2, cumulatively account for about 75% of the variance in the dataset.
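The eigenvalue-greater-than-1 rule behind the scree plot can be sketched with NumPy. This is only an illustration of the criterion: it assumes that the PCA is performed on the correlation matrix of the eight inputs (the usual setting for this rule), and the function name `kaiser_components` and the synthetic data are hypothetical.

```python
import numpy as np

def kaiser_components(X):
    """Eigenvalues of the correlation matrix, cumulative variance, and
    the number of components with eigenvalue > 1."""
    corr = np.corrcoef(X, rowvar=False)           # variables are in columns
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # sorted in descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, cumulative, n_keep

# Synthetic stand-in for the (records x 8 inputs) matrix, not the study's data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
eigvals, cumulative, n_keep = kaiser_components(X_demo)
print(n_keep, cumulative[n_keep - 1] if n_keep else 0.0)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means that a component explains more variance than any single standardized input.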

Figure 8

Scree plot of the PCA analysis.

Table 2 Total variance explained.

Table 3 shows the results of the Varimax rotation method. In each of the four components, the factors with a loading above +0.70 or below −0.70 are selected. As is seen, Component 1 reflects T and H, Component 2 reflects P and WD, Component 3 reflects d, and Component 4 reflects m. Hence, it can be deduced that h and WS can be discarded to reduce the dimension of the dataset.

Table 3 Rotated component matrix from Varimax method.

A monolithic formula

To eliminate the need for computer-based programs, this section provides a mathematical expression derived from the proposed EO-FFANN model for predicting the GIH. The EO-FFANN is chosen because it achieved the highest accuracy in the previous assessments. Although the formula is a single explicit relationship, it is evaluated in two steps, and the GIH is obtained in the second step.

Referring to the FFANN topology in Fig. 2, this equation is constructed from 61 weights and biases. The general inputs of this equation are m, d, h, T, H, P, WD, and WS that feed Eq. (11). With the help of Table 4, the outcomes of this equation are \({N}_{i}\) (i = 1, 2, …, 6) that feed Eq. (12) for calculating the GIH. In other words, Eq. (11) and Table 4 together express the process between the input and hidden layers of the FFANN, while Eq. (12) expresses the process between the hidden and output layers (see Fig. 2).

$${N}_{i}=Tansig({W}_{i1} \times m + {W}_{i2} \times d + {W}_{i3} \times h + {W}_{i4} \times T + {W}_{i5} \times H + {W}_{i6} \times P + {W}_{i7} \times WD + {W}_{i8} \times WS+ {b}_{i}),$$
(11)
$$\begin{aligned} GI_{H} = & \, 0.4187 \, \times \, N_{1} + \, 0.5094 \, \times \, N_{2} - \, 0.4479 \, \times \, N_{3} + \, 0.3594 \, \times \, N_{4} \\ & + \, 0.3102 \, \times \, N_{5} - \, 0.6748 \, \times \, N_{6} - \, 0.7620, \\ \end{aligned}$$
(12)
Table 4 Optimized internal parameters of the FFANN.
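The two-step evaluation of Eqs. (11) and (12) can be sketched as a small function. The output-layer coefficients below are copied from Eq. (12). The hidden-layer weights \({W}_{ij}\) and biases \({b}_{i}\) belong to Table 4, whose values are not reproduced in this text, so the all-zero arrays in the usage lines are placeholders and not the optimized parameters. The function name `gih_formula` and the sample input values are hypothetical, and any input or output scaling the authors may have applied is not stated and is therefore not included.

```python
import numpy as np

# Output-layer coefficients and bias copied from Eq. (12)
W_OUT = np.array([0.4187, 0.5094, -0.4479, 0.3594, 0.3102, -0.6748])
B_OUT = -0.7620

def gih_formula(inputs, W_hidden, b_hidden):
    """Evaluate Eq. (11) then Eq. (12).

    inputs   : [m, d, h, T, H, P, WD, WS]
    W_hidden : (6, 8) array of W_ij from Table 4
    b_hidden : (6,) array of b_i from Table 4
    """
    x = np.asarray(inputs, dtype=float)
    N = np.tanh(np.asarray(W_hidden) @ x + np.asarray(b_hidden))  # Eq. (11), Tansig = tanh
    return float(W_OUT @ N + B_OUT)                               # Eq. (12)

# Placeholder hidden-layer parameters (NOT the Table 4 values)
W_demo = np.zeros((6, 8))
b_demo = np.zeros(6)
print(gih_formula([6, 15, 12, 30.0, 40.0, 900.0, 180.0, 3.0], W_demo, b_demo))  # -0.762
```

With the all-zero placeholders every \({N}_{i}\) is zero, so the output reduces to the bias term of Eq. (12); substituting the Table 4 values yields the actual EO-FFANN prediction.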

Strength, limitations, and future guidelines

This study presented novel applications of metaheuristic-empowered ML models for predicting IS. A valid dataset with various meteorological and temporal factors was used for this purpose. The models were tuned in terms of their hyper-parameters, such as the FFANN topology and the population size of the metaheuristic algorithms, so the used models can be regarded as well configured for this problem. The level of accuracy obtained in this study supports the applicability of the applied models; however, a comparison showed that the EO-FFANN holds greater promise. This model also improves on previous studies. For instance, Moayedi and Mosavi49 used the EFO algorithm, along with the SCE and SFLA, to optimize a similar FFANN. These models reached R-values (non-percentage) of 0.82132, 0.78046, and 0.75212, respectively, which are lower than the R-values of the EO-FFANN in this work.

Presenting a simplified formula is another outcome of this study, which enables users to predict the GIH without the need for computer-aided facilities. Furthermore, given the trial-and-error efforts performed in different stages, this solution was selected carefully from numerous candidates.

In Sect. 3.6, the PCA model was applied to the dataset and its results highlighted the T, H, P, WD, d, and m as the most contributive input factors. As is known, reducing the dataset inputs from 8 to 6 results in lightening the computational burden due to the reduction in the problem dimension86. Considering this idea is highly recommended for future efforts towards improving the solution for the GIH prediction.

However, this study has some limitations, too. The used dataset includes records from 2021-05-31 to 2022-06-01. Hence, updating it with more recent data (e.g., late 2022 and early 2023) could be of great interest to future efforts and may enhance the generalizability of the suggested models to new climate conditions. As far as the models are concerned, although the applied metaheuristic algorithms are among the recent members of this family, more algorithms have been developed lately. Comparing the results of the EO with the most recent metaheuristic algorithms would help in updating the solutions and possibly increasing the accuracy of GIH prediction.

Conclusions

The importance of analyzing environmental conditions is evident in forecasting renewable energy potential. This work was dedicated to optimizing solar energy simulation using state-of-the-art ML and feature selection strategies. An FFANN was trained using different metaheuristic algorithms for predicting solar irradiance from meteorological and temporal parameters (humidity, temperature, pressure, wind speed and direction, month, day, and hour). Assessing the prediction results revealed that the EO performs more accurately than the three optimization algorithms evaluated in this study (OIO, WDO, and SOSA), as well as three optimization algorithms (EFO, SCE, and SFLA) from the earlier literature. Therefore, the mathematical representation of the EO-FFANN was presented in the form of a predictive formula for practical GIH predictions. Moreover, the PCA method identified T, H, P, WD, d, and m as the input factors that are most essential in forecasting solar irradiance. This part of the results can be considered in practice for enhancing the generation of solar energy. In conclusion, the findings of this study demonstrated the efficiency of the PCA and metaheuristic techniques for optimizing the performance of ML models. Finally, some ideas were presented for future work to address the limitations of the study, most notably updating the used dataset and predictive models.