Issue 
EPJ Photovolt.
Volume 15, 2024
Special Issue on ‘EU PVSEC 2024: State of the Art and Developments in Photovoltaics’, edited by Robert Kenny and Gabriele Eder



Article Number  33  
Number of page(s)  12  
DOI  https://doi.org/10.1051/epjpv/2024028  
Published online  21 October 2024 
Original Article
Uncertainty-aware estimation of inverter field efficiency using Bayesian neural networks in solar photovoltaic plants
^{1}
GreenPowerMonitor a DNV company, Gran Via de les Corts Catalanes 130, Barcelona, Spain
^{2}
DNV Denmark, Tuborg Parkvej 8, Hellerup, Denmark
^{*} email: gerardo.guerra@dnv.com
Received: 7 June 2024
Accepted: 2 September 2024
Published online: 21 October 2024
Solar inverters are one of the most important components in a Photovoltaic plant. Their main function is to convert the DC power produced by the solar modules into AC power that can be injected into the grid. Although inverter efficiency has reached exceptionally high values, thanks to recent technological advancements, it is typically measured at dedicated laboratories under strict testing conditions, which makes its validation after deployment extremely challenging, both from a logistic and financial point of view. This paper presents a methodology for the calculation of inverter field efficiency based on Bayesian neural networks. The goal of the neural network is to model inverter efficiency and its variance as a function of the inverter's operational state. Results show that an optimised Bayesian neural network can effectively model inverter efficiency with small reconstruction errors and negligible bias. Furthermore, the model has been proven useful to replicate the calculation of the European efficiency along with a full uncertainty characterisation.
Key words: Photovoltaic generation / inverter efficiency / Bayesian neural networks / machine learning / uncertainty
© G. Guerra et al., Published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
The expansion of solar photovoltaic (PV) capacity has experienced remarkable growth, from a modest 1 GW per year in 2004 to an anticipated 250 GW by 2022, with projections suggesting a rise to 500 GW annually by 2040. This surge reflects not only the increasing installations but also the expected integration of energy storage solutions with new PV systems. By 2050, solar and solar + storage capacities are forecast to reach a combined total of 15.3 TW, marking significant advancements in renewable energy capacity. Despite solar's dominance in installed capacity, expected to represent 54% by mid-century, it will contribute 39% to global on-grid electricity production, limited by lower efficiency rates compared to other renewable sources like wind and hydropower [1].
Economically, solar PV has become the most cost-effective electricity source in many regions by the early 2020s, driven by a rapid decline in the levelised cost of energy (LCOE), which is anticipated to fall further to about 21 USD/MWh by 2050. This cost reduction is attributed to the decrease in unit investment costs and the learning rate for solar module costs, which is projected to slow from 26% to 17% by 2050. Furthermore, operational expenses are expected to maintain a steady decline due to enhanced data monitoring and maintenance practices, reinforcing solar PV's position as a highly competitive energy source despite regional variability in solar irradiation [1].
Regionally, by 2050, Greater China and North America are poised to lead in solar electricity production, with notable increases also in the Indian Subcontinent and the Middle East and North Africa region. These areas will experience a significant rise in their solar electricity shares, driven by favourable solar conditions and supportive policies. However, the shift towards solar dominance will vary by region due to differences in resource availability and policy frameworks, with Europe likely to see solar overtake fossil fuels by 2030 due to aggressive decarbonisation efforts [1].
Solar inverters are highly complex and versatile pieces of equipment and one of the most important components in a PV plant. Their main function is to convert the DC power (P_{DC}) produced by the PV modules into AC power (P_{AC}) that can be injected into the grid. Additionally, modern inverters can also provide a wide range of auxiliary and smart services such as real power control, power quality enhancement, low voltage ride-through, reactive power compensation, frequency regulation, and thermal management [2]. Despite the complexity brought about by housing so many different functions, inverter efficiency has reached exceptionally high values, thanks to recent technological advancements, with manufacturers reporting efficiencies of 98% or higher for state-of-the-art products [3].
Independent validation of inverter efficiency has been the subject of multiple works in the past. In [4] the authors studied the effect of reactive power generation on inverter losses for 3 inverters operating at different active and reactive power setpoints. Measurements from experiments were used to develop an empirical and a loss-based model for inverter efficiency with accurate results. The experiments were conducted at a laboratory and no records of ambient or inverter temperature are mentioned. In [5] a fleet of 355 inverters was analysed in order to compare the on-field inverter efficiency with the California Energy Commission (CEC) efficiency curves reported by the manufacturers. The authors found that field efficiencies were on average 0.9% lower than the manufacturers' curves. Efficiency curves were constructed using accurate meter data, but no consideration for temperature, AC voltage or reactive power was taken. The authors of [6] developed an efficiency model for an SMA Sunny Boy 3000TL using monitored 5-minute data. The presented model uses a double exponential analytical function whose only input is the inverter's power output. In [7] the authors analysed the field efficiency of a test PV power plant and compared it to the manufacturer's reported European efficiency. Calculated efficiency showed differences of up to 4% with respect to the data sheet. The authors briefly discussed the involved uncertainty; however, the discussion was limited to the accuracy of the power measurement equipment. The study presented in [8] involved the efficiency measurement of multiple microinverters under laboratory conditions. No comparison to manufacturers' values was performed and the study's ultimate goal was to develop an actual energy yield model for multiple microinverter and PV module combinations. Reference [9] presented an empirical model of inverter efficiency. The authors developed an analytical function to model P_{AC} as a function of P_{DC}, whose default parameters can be obtained from manufacturers' specification sheets, and the model's accuracy can be further improved using laboratory or field measurements. The authors of [10] introduced an onsite methodology aimed at estimating inverter efficiency as a function of input DC voltage. Data was acquired with dedicated high accuracy equipment at 200 ms resolution. The 5 analysed inverters presented efficiency values greater than those reported by the manufacturer. Finally, [11] presented a methodology for the automated evaluation of multi-input inverters under multiple conditions in a laboratory setting. Results from the experiments were used to fit the analytical efficiency model presented in [9]. All the above-mentioned works attempt to model inverter efficiency by different means: some have developed empirical models that include one or two input variables, while others rely on interpolation curves. Unfortunately, none of these models appears to include other external variables that could impact inverter efficiency (e.g. ambient temperature), and none attempts to model the uncertainty involved in its estimations beyond the calculation of standard deviations.
This paper presents a methodology for the calculation of inverter field efficiency based on Bayesian Neural Networks (BNNs). The goal of the BNN is to model inverter efficiency based on P_{AC}, ambient temperature (T_{AMB}), AC voltage (V_{AC}), and reactive power (Q_{AC}); these signals are collected from the plant's monitoring system and represent the controlled test variables for efficiency measurement under laboratory conditions. After the BNN has been trained, it is fed with constant T_{AMB}, V_{AC} and Q_{AC} values, while varying P_{AC} to replicate the testing conditions for the calculation of the European efficiency (η_{EURO}). The BNN can be trained using data from individual inverters, which allows each inverter in a PV plant to be modelled separately (i.e., single device-level), or by combining the data from all the inverters in the plant. The latter approach results in a BNN that models the average behaviour of the entire inverter fleet (i.e., fleet-level).
Machine learning (ML) has been successfully applied to different problems for solar energy (e.g. irradiance forecast, condition monitoring, performance prediction [12]); furthermore, the field continues to advance at an incredible pace with new algorithms and techniques appearing frequently. This work makes use of a combination of well-known ML techniques as a way to model inverter efficiency. The proposed methodology improves on previous work in three ways: it introduces a flexible framework that does not rely on static analytical models that may not generalise well, it takes into account additional variables that may affect efficiency calculation, and it produces an estimation of efficiency uncertainty for a more holistic analysis.
The next sections of this paper are structured as follows: Section 2 details the methodology developed to train a BNN that can be used to estimate inverter efficiency. Section 3 presents a complete case study with obtained results, while Section 4 introduces a discussion about the methodology. Finally, Section 5 summarises the work presented in the paper with conclusions and future work.
2 Methodology
2.1 Basic definitions
Efficiency is typically defined as the ratio of the useful work performed by a machine or in a process to the total energy expended or heat taken in. For an inverter in a solar PV plant, this definition translates to the ratio of P_{AC} injected into the grid to P_{DC} received from the PV modules, see (1). Inverter efficiency is typically measured at laboratories under predefined testing conditions for P_{AC}, T_{AMB}, V_{AC} and Q_{AC}. Furthermore, equipment used for measurements must meet a minimum level of accuracy for results to be accepted [13,14]. It is a well-known fact that inverter efficiency varies with output power, so it becomes necessary to perform multiple measurements at various P_{AC} levels in order to fully characterise it. However, it is not practical to report a list of efficiencies, especially if the user wishes to compare multiple inverters. There exist two main conventions for summarising inverter efficiency as a weighted average of multiple efficiency measurements, the European efficiency and the CEC efficiency [15]. For this work, it was decided to replicate the European efficiency given that its testing conditions are equivalent to the inverter's on-field operation, that is, it uses a PV array simulator to emulate the behaviour of the PV modules connected to the inverter, unlike the CEC procedure where the DC voltage is fixed at three different values and the DC current is varied to achieve the desired power levels. η_{EURO} is calculated according to (2) and the corresponding testing conditions are listed in Table 1.
Although laboratory measurements of η_{EURO} require the use of highly accurate instruments, they are not perfect and there is an uncertainty associated with each measurement. Reference [13] defines a tolerance for η_{EURO} (3), which is the minimum acceptable value for a guaranteed efficiency measurement when independently tested. This tolerance value can be used as a lower threshold to decide whether the calculated inverter efficiency is acceptable or not.
$$\eta =\frac{{P}_{AC}}{{P}_{DC}}\cdot 100\left[\%\right]$$(1)
$$\begin{array}{ll}{\eta}_{EURO}\hfill & =0.03\cdot {\eta}_{5\%}+0.06\cdot {\eta}_{10\%}\hfill \\ \hfill & +0.13\cdot {\eta}_{20\%}+0.10\cdot {\eta}_{30\%}\hfill \\ \hfill & +0.48\cdot {\eta}_{50\%}+0.20\cdot {\eta}_{100\%}\hfill \end{array}$$(2)
where η_{x%} represents the measured efficiency when the inverter's output power is equal to x% of the inverter's rated power (${P}_{A{C}_{RATED}}$).
$${\eta}_{TOL}=\eta -0.2\cdot \left(1-\frac{\eta}{100}\right)\cdot \eta \left[\%\right].$$(3)
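As a worked illustration of (1)–(3), the following minimal Python sketch computes a European efficiency and its tolerance. The efficiency values in `measured` are hypothetical and chosen only for the example; the tolerance function assumes the reading of (3) in which both omitted operators are minus signs.

```python
# Weights from Eq. (2): load level (fraction of rated AC power) -> weight.
EURO_WEIGHTS = {0.05: 0.03, 0.10: 0.06, 0.20: 0.13, 0.30: 0.10, 0.50: 0.48, 1.00: 0.20}


def efficiency(p_ac, p_dc):
    """Eq. (1): conversion efficiency in percent."""
    return p_ac / p_dc * 100.0


def european_efficiency(eta_by_level):
    """Eq. (2): weighted average of the efficiencies [%] at the six load levels."""
    return sum(w * eta_by_level[level] for level, w in EURO_WEIGHTS.items())


def tolerance(eta):
    """Eq. (3): minimum acceptable value for a guaranteed efficiency [%]."""
    return eta - 0.2 * (1.0 - eta / 100.0) * eta


# Hypothetical efficiencies [%] at each load level (not taken from the paper).
measured = {0.05: 94.0, 0.10: 96.0, 0.20: 97.5, 0.30: 98.0, 0.50: 98.2, 1.00: 97.8}
eta_euro = european_efficiency(measured)
eta_tol = tolerance(eta_euro)
```

Because the 50% load level carries a weight of 0.48, the result is dominated by the mid-range efficiency, and the tolerance band narrows as the efficiency approaches 100%.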
European efficiency test conditions.
2.2 Data requirements
The developed methodology was conceived to be as independent of metadata as possible: model training requires only operational data signals related to P_{AC}, T_{AMB}, V_{AC}, Q_{AC} and P_{DC}; no other specific information related to the plant's location or technology is needed.
P_{AC} and P_{DC} are used as inputs of (1), P_{AC} being the variable that has the greatest influence on inverter efficiency. Additionally, with the inclusion of T_{AMB} as input, the model can discover an internal representation of the inverter's operational temperature, another important driver of efficiency which is mostly a function of P_{AC} and T_{AMB}. Reference [4] demonstrated the impact that Q_{AC} can have on inverter efficiency; however, when no control strategy is applied, Q_{AC} is greatly influenced by V_{AC} at the inverter's point of connection and thus, can indirectly affect its efficiency. Finally, the goal of this work is to reproduce the laboratory conditions set for the measurement of the European efficiency. For this reason, signals and model inputs were chosen to match those variables that are controlled during such measurements, see Table 1.
The only necessary metadata for the calculation of η_{EURO} are ${P}_{A{C}_{RATED}}$ and the inverter's nominal AC voltage. Inverter rated values could be approximated using historical values of P_{AC} and V_{AC}; however, these could lead to an increment in uncertainty, so it is advisable to work with the real values.
2.3 Data cleaning and scaling
Data collected on site are not free of errors; therefore, a procedure for data cleaning must be implemented. The objective of this step is to identify those points that do not conform to the inverter's statistically normal behaviour. After said points have been identified, they are removed from the dataset. The procedure detects anomalies in the dataset by measuring the reconstruction error of the analysed data signals with respect to a vector quantisation model; it also applies classic data cleaning techniques that target out-of-range data, missing data (e.g., nulls), and duplicated time stamps. Note that inverter efficiency is not measured but calculated. The use of independent measurements of P_{AC} and P_{DC} for efficiency calculation means that (1) may produce erroneous or anomalous values (e.g., efficiencies over 100%). For this reason, a proper data cleaning procedure is essential for this particular application.
The main steps of the data cleaning procedure are:
Remove missing data.
Remove duplicated time stamps.
Remove out-of-range data.
Remove efficiencies over 100%.
Remove P_{AC} values under 2.5% of ${P}_{A{C}_{RATED}}$.
Remove reconstruction-based anomalies (P_{AC}, T_{AMB}, Q_{AC}, V_{AC}, η).
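The rule-based steps of the list above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the field names, and the `limits` dictionary are hypothetical, the first of any duplicated time stamps is kept, and the final reconstruction-based step (which needs a fitted vector quantisation model) is not covered.

```python
import math

SIGNALS = ("p_ac", "p_dc", "t_amb", "v_ac", "q_ac")  # hypothetical field names


def clean(records, p_ac_rated, limits):
    """Apply the rule-based cleaning steps to a list of dict records.

    `limits` maps each signal name to an assumed (low, high) valid range.
    """
    seen, kept = set(), []
    for rec in records:
        values = [rec.get(name) for name in SIGNALS]
        # Remove missing data (None or NaN).
        if any(v is None or math.isnan(v) for v in values):
            continue
        # Remove duplicated time stamps (assumption: keep the first occurrence).
        if rec["timestamp"] in seen:
            continue
        seen.add(rec["timestamp"])
        # Remove out-of-range data.
        if any(not (limits[n][0] <= rec[n] <= limits[n][1]) for n in SIGNALS):
            continue
        # Remove P_AC values under 2.5% of rated power; guard against P_DC <= 0.
        if rec["p_ac"] < 0.025 * p_ac_rated or rec["p_dc"] <= 0.0:
            continue
        # Remove efficiencies over 100%.
        eta = rec["p_ac"] / rec["p_dc"] * 100.0
        if eta > 100.0:
            continue
        kept.append({**rec, "eta": eta})
    return kept
```

The low-power filter is applied before the efficiency is computed so that the division in (1) is never evaluated on near-zero power readings.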
Finally, data signals are rescaled to have a zero mean and unity variance, except P_{AC} which is scaled by applying a logarithmic transformation:
$${P}_{AC}^{\prime}=\ln \left(\frac{{P}_{AC}}{{P}_{A{C}_{RATED}}}\right)$$(4)
$${T}_{AMB}^{\prime}=\frac{{T}_{AMB}-{\mu}_{{T}_{AMB}}}{{\sigma}_{{T}_{AMB}}}$$(5)
$${V}_{AC}^{\prime}=\frac{{V}_{AC}-{\mu}_{{V}_{AC}}}{{\sigma}_{{V}_{AC}}}$$(6)
$${Q}_{AC}^{\prime}=\frac{{Q}_{AC}-{\mu}_{{Q}_{AC}}}{{\sigma}_{{Q}_{AC}}}$$(7)
$${\eta}^{\prime}=\frac{\eta -{\mu}_{\eta}}{{\sigma}_{\eta}}$$(8)
where µ and σ represent the mean and standard deviation, respectively.
The logarithmic transformation used in (4) was chosen to facilitate the modelling process. Early experiments showed that rescaling to zero mean and unity variance meant that much larger networks than those shown in this paper were required to obtain similar performance. This is mainly due to the highly nonlinear relationship between P_{AC} and η, especially for low P_{AC} values.
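The scaling in (4)–(8) and its inverse can be illustrated with a short sketch. The helper names and sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def fit_standard_scaler(values):
    """Return (mean, standard deviation) of a signal, as needed by Eqs. (5)-(8)."""
    return statistics.fmean(values), statistics.pstdev(values)


def standardise(values, mu, sigma):
    """Rescale a signal to zero mean and unit variance."""
    return [(v - mu) / sigma for v in values]


def destandardise(scaled, mu, sigma):
    """Convert a standardised signal back to its original units."""
    return [s * sigma + mu for s in scaled]


def scale_p_ac(p_ac, p_ac_rated):
    """Eq. (4): natural logarithm of the per-unit AC power (requires p_ac > 0)."""
    return [math.log(p / p_ac_rated) for p in p_ac]


def descale_p_ac(scaled, p_ac_rated):
    """Inverse of Eq. (4)."""
    return [math.exp(s) * p_ac_rated for s in scaled]


# Hypothetical samples: ambient temperature in degrees Celsius, AC power in kW.
t_amb = [5.0, 15.0, 25.0, 35.0]
mu, sigma = fit_standard_scaler(t_amb)
t_scaled = standardise(t_amb, mu, sigma)
p_scaled = scale_p_ac([50.0, 500.0, 1000.0], p_ac_rated=1000.0)
```

The logarithm stretches the low-power region, where efficiency changes fastest, and maps rated power to zero.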
2.4 Model structure and training
Modelling inverter efficiency using field data as input presents two main challenges. The first challenge is related to providing an accurate prediction of inverter efficiency using a set of signals as model input, while the second challenge refers to accounting for the uncertainty introduced by the inherently low accuracy of field measurements. Inverter power sensors have an accuracy of about 3–4%, whereas the sensors used in a laboratory for the purpose of efficiency measurements have an accuracy of less than 0.5% [13,14].
In order to overcome these two challenges, a model that can simultaneously provide both an accurate reconstruction and a measure for uncertainty is required, for example Bayesian Neural Networks. BNNs are a special type of neural network that is trained using a Bayesian approach and introduce stochastic components into the network. By representing the weights and biases of the network as probability distributions, BNNs allow for uncertainty to be modelled into the network's output. During training, BNNs use Bayesian inference to update the distribution over the weights and biases based on the observed data [16–20]. Additionally, BNNs can act as ensemble models thanks to the model stochasticity introduced by employing stochastic weights [16]. Every evaluation of the BNN will result in a different output value, so by performing multiple evaluations of the same dataset and averaging the resulting predictions it is possible to obtain the ensemble prediction with its respective uncertainty.
The model structure chosen for this work, inspired by the model presented in [21], is a BNN with one hidden layer that uses Softplus (Soft+) as activation function, see Figure 1. The model output consists not only of a prediction for inverter efficiency, but also of an estimation of efficiency variance. A linear transformation is applied to the output neuron related to inverter efficiency, represented by the (Lin) function placed at its output, while a rectified linear unit (ReLU) activation function is applied to the efficiency variance neuron to guarantee that the output is never negative. ReLU is a function that only allows positive or zero outputs, with all negative values saturated to zero. By including efficiency variance as a model output, variance is estimated for every set of input data and thus, an assumption of homoscedasticity is avoided. The combination of a BNN and variance estimation allows for a complete characterisation of efficiency uncertainty. The BNN's epistemic uncertainty (σ_{ep}) is evaluated through the standard deviation of efficiency predictions, whereas aleatoric uncertainty (σ_{al}) is measured through the efficiency variance [20,21], see Algorithm 1.
1: Define number of evaluations (N) for inference
2: for n ← 1 to N do
3: $\hat{\eta}_{n},{\sigma}_{n}^{2}$ = BNN(P_{AC}, T_{AMB}, V_{AC}, Q_{AC})
4: end for
5: Calculate:
$$\hat{\eta}=\frac{1}{N}\sum_{n=1}^{N}\hat{\eta}_{n}$$(9)
$${\sigma}_{ep}=\sqrt{\frac{1}{N-1}\sum_{n=1}^{N}{\left(\hat{\eta}_{n}-\hat{\eta}\right)}^{2}}$$(10)
$${\sigma}_{al}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}{\sigma}_{n}^{2}}$$(11)
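Algorithm 1 can be sketched in a few lines of Python. The function `toy_bnn` is a hypothetical stand-in for a stochastic forward pass of a trained BNN (it is not the TorchBNN model used in this work); it simply returns a noisy efficiency and a fixed variance so that the aggregation in (9)–(11) can be demonstrated.

```python
import math
import random
import statistics


def toy_bnn(p_ac, t_amb, v_ac, q_ac, rng):
    """Hypothetical stochastic model: returns (efficiency [%], variance)."""
    eta_hat = 98.0 + rng.gauss(0.0, 0.05)  # mimics weights re-sampled on every call
    variance = 0.04                        # predicted (aleatoric) variance
    return eta_hat, variance


def predict_with_uncertainty(model, inputs, n_eval, rng):
    """Evaluate the model n_eval times and aggregate as in Eqs. (9)-(11)."""
    etas, variances = [], []
    for _ in range(n_eval):
        eta_n, var_n = model(*inputs, rng)
        etas.append(eta_n)
        variances.append(var_n)
    eta_hat = statistics.fmean(etas)                    # Eq. (9)
    sigma_ep = statistics.stdev(etas)                   # Eq. (10), N - 1 denominator
    sigma_al = math.sqrt(statistics.fmean(variances))   # Eq. (11)
    return eta_hat, sigma_ep, sigma_al


rng = random.Random(0)
eta_hat, sigma_ep, sigma_al = predict_with_uncertainty(
    toy_bnn, (500.0, 25.0, 400.0, 0.0), n_eval=500, rng=rng
)
```

The spread of the repeated predictions gives the epistemic term, while the averaged variance output gives the aleatoric term; the two are kept separate so they can be reported individually.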
Model training is conducted in PyTorch [22] and the BNN implementation in this work uses the TorchBNN package that models network weights and biases as Gaussian distributions [23]. For every dataset two different BNNs with identical weights are initialised. The first BNN is trained by assuming that $\hat{\eta}$ and ${\sigma}_{\eta}^{2}$ form a Gaussian distribution and minimising (12) [21,23], while the second BNN (BNN_{t}) is updated by applying an Exponentially Weighted Moving Average (EWMA) to the weights of the first BNN at the end of every optimisation step, see equation (13). EWMA has been utilised in the field of Deep Reinforcement Learning to help prevent overestimation of future rewards and stabilise training [24,25]; additionally, in [26] EWMA was found to help stabilise the loss function during training and improve reproducibility of results. After model training has been completed, BNN_{t} is used for the calculation of η̂_{EURO} and its uncertainty. One final aspect of the model is the layers labelled as “Data Rescaling” and “Data Descaling” in Figure 1. These two layers are added post-training and their goal is to apply the required transformations to rescale the data according to (4)–(8) and to convert them back to their original units.
$$loss=-\log p\left(\eta \mid \hat{\eta},{\sigma}_{\eta}^{2}\right)+lr\cdot KLD$$(12)
where log p is the log-likelihood of a Gaussian distribution and KLD, the Kullback-Leibler divergence, is a measure of the similarity between the resulting probability distribution of the network's weights and biases and their priors [27].
$$BN{N}_{t}=\alpha \cdot BNN+\left(1-\alpha \right)\cdot BN{N}_{t}.$$(13)
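The two training ingredients, the loss in (12) and the EWMA update in (13), can be illustrated with plain Python. This sketch is not the TorchBNN API: the likelihood term is taken as a negative Gaussian log-likelihood (an assumption consistent with minimising the loss), the KLD value is passed in as a number, and network weights are represented by a hypothetical dictionary of floats.

```python
import math


def gaussian_nll(eta, eta_hat, variance):
    """Negative log-likelihood of eta under a Gaussian N(eta_hat, variance)."""
    return 0.5 * (math.log(2.0 * math.pi * variance) + (eta - eta_hat) ** 2 / variance)


def bnn_loss(eta, eta_hat, variance, kld, lr):
    """Eq. (12) for one sample: likelihood term plus weighted KL divergence."""
    return gaussian_nll(eta, eta_hat, variance) + lr * kld


def ewma_update(target, online, alpha):
    """Eq. (13): blend the trained (online) weights into the averaged network."""
    return {name: alpha * online[name] + (1.0 - alpha) * target[name] for name in target}


# Hypothetical weights of the averaged network (BNN_t) and the trained network (BNN).
target = {"w": 0.0, "b": 1.0}
online = {"w": 1.0, "b": 3.0}
target = ewma_update(target, online, alpha=0.1)
```

A small α makes the averaged network change slowly, which smooths out step-to-step fluctuations of the trained network.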
The proposed model and structure are considered appropriate for this application for two main reasons. First, neural networks are universal function approximators [28] and thus, are capable of modelling any arbitrary function. Second, uncertainty in a modelling process can be assumed to be Gaussian distributed [29]; this assumption is also valid for both σ_{ep} and σ_{al}.
Fig. 1 Model structure. 
2.5 Optimisation
The use of BNNs for modelling allows for great flexibility in terms of model structure, definition of prior distribution, and optimisation loss function, among others. Unfortunately, this flexibility also means that it is fairly easy to end up with a suboptimal model that may not present the required balance between reconstruction of the target variable and uncertainty. In order to avoid this drawback, an optimisation of the model's hyperparameters is required.
This work uses Optuna [30], an automatic hyperparameter optimisation software framework for ML, and the Tree-structured Parzen Estimator algorithm (TPE) [31] for the optimisation of model structure and regularisation term, see Table 2. The optimisation procedure seeks to discover the parameters that minimise reconstruction error and network size, while maximising prediction uncertainty. The complete process is guided by the minimisation of the objective function defined in (14).
$$\begin{array}{ll}obj\hfill & =mse\left(\eta ,\hat{\eta}\right)+mebe\left(\eta ,\hat{\eta}\right)+skw\left(\eta ,\hat{\eta}\right)\hfill \\ \hfill & -{\sigma}_{ep}+loss+\frac{H}{2000}\hfill \end{array}$$(14)
where mse is the mean square error, mebe is the median bias error, skw is the skewness, σ_{ep} is the model's epistemic uncertainty, and loss refers to the model's loss at the end of training, see (12).
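A minimal sketch of how the objective in (14) might be evaluated for one trial is given below. Several points are assumptions rather than statements from the text: the error is taken as prediction minus target, skewness is computed on that error, the epistemic term is subtracted because the optimisation seeks to maximise it, and H is interpreted as the hidden layer size.

```python
import statistics


def skewness(values):
    """Population skewness; returns 0.0 for a constant sequence."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(v - mu) ** 2 for v in values])
    m3 = statistics.fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5 if m2 > 0.0 else 0.0


def objective(eta, eta_hat, sigma_ep, loss, hidden_size):
    """Eq. (14), with H assumed to be the number of hidden neurons."""
    errors = [pred - true for true, pred in zip(eta, eta_hat)]
    mse = statistics.fmean([e * e for e in errors])   # mean square error
    mebe = statistics.median(errors)                  # median bias error
    skw = skewness(errors)                            # skewness of the errors
    return mse + mebe + skw - sigma_ep + loss + hidden_size / 2000.0


# Hypothetical trial with perfect reconstruction.
value = objective([97.0, 98.0, 98.5], [97.0, 98.0, 98.5],
                  sigma_ep=0.1, loss=0.5, hidden_size=200)
```

In an Optuna study, a function of this kind would be the value returned at the end of each trial; the H/2000 term penalises large networks.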
From a practical point of view, TPE optimises a set of discrete variables within a predefined range and evaluates the objective function at the end of each trial. Each trial represents a model trained using a combination of parameters selected by the algorithm. Table 3 presents the range of values used for the parameters to be optimised, as well as the algorithm and parameters used for training each BNN.
Optimisation parameters.
Algorithm parameters.
2.6 Fleet modelling
The main goal of this work is to provide plant owners and operators with a tool that can be used to evaluate the efficiency of an inverter type at one site or across multiple ones. For this reason it is necessary to define the proper procedure to conduct a rigorous evaluation of a fleet of inverters.
The main steps for fleet modelling are:
Collect site data.
Clean inverter data.
Choose base inverter.
Optimise model for base inverter.
Train fleet-level model.
Train single device-level models.
Model optimisation is performed only with data from the base inverter in order to minimise the total amount of time required by the process. Furthermore, the optimised model will be used as a starting point for training of any subsequent models to further reduce total training time as fewer epochs are required to achieve the same performance. In this work, new models based on the optimised model are trained for only 100 epochs.
During fleet modelling two types of models can be trained. The fleet-level model is trained by combining the data from all inverters in the plant, which results in a BNN that models the average behaviour of the entire inverter fleet. On the other hand, single device-level models are trained using data from individual inverters. The two types of model serve different purposes: single device models can help operators discover over/underperforming inverters, whereas the fleet model can offer a consolidated view of a specific inverter type or plant.
2.7 Efficiency calculation
After model training has been completed, BNN_{t} can be used to calculate the European efficiency at either fleet or single device-level. For this purpose the model is fed with constant T_{AMB}, V_{AC}, and Q_{AC} values, while varying P_{AC} to replicate the testing conditions for the calculation of η_{EURO}, see (2). For each P_{AC} value a mean efficiency and uncertainty are calculated and further resampled following a Gaussian distribution to obtain a distribution of European efficiency predictions (η̂_{EURO}); efficiency uncertainty (σ_{EURO}) is calculated as the distribution's standard deviation. A pseudocode for the complete process is presented in Algorithm 2. In this work, the number of evaluations for uncertainty estimation has been set to 500.
1: Define P_{AC} values (P_{lvl}) and weights for evaluation (W)
2: Define T_{AMB}, Q_{AC}, and V_{AC} for test conditions
3: Define number of evaluations (N) for inference
4: for P_{AC} in P_{lvl} do
5: for n ← 1 to N do
6: $\hat{\eta}_{{P}_{A{C}_{n}}},{\sigma}_{{P}_{A{C}_{n}}}^{2}$ = BNN_{t}(P_{AC}, T_{AMB}, V_{AC}, Q_{AC})
7: end for
8: Calculate:
$$\hat{\eta}_{{P}_{AC}}=\frac{1}{N}\sum_{n=1}^{N}\hat{\eta}_{{P}_{A{C}_{n}}}$$(15)
$${\sigma}_{e{p}_{{P}_{AC}}}^{2}=\frac{1}{N-1}\sum_{n=1}^{N}{\left(\hat{\eta}_{{P}_{A{C}_{n}}}-\hat{\eta}_{{P}_{AC}}\right)}^{2}$$(16)
$${\sigma}_{a{l}_{{P}_{AC}}}^{2}=\frac{1}{N}\sum_{n=1}^{N}{\sigma}_{{P}_{A{C}_{n}}}^{2}$$(17)
$${\sigma}_{{P}_{AC}}=\sqrt{{\sigma}_{e{p}_{{P}_{AC}}}^{2}+{\sigma}_{a{l}_{{P}_{AC}}}^{2}}$$(18)
9: end for
10: for n ← 1 to N do
11:
$${\eta}_{n}=\sum_{{P}_{AC}}^{{P}_{lvl}}{W}_{{P}_{AC}}\cdot \left(\hat{\eta}_{{P}_{AC}}+\mathcal{N}\left(0,1\right)\cdot {\sigma}_{{P}_{AC}}\right)$$(19)
12: end for
13: Calculate:
$$\hat{\eta}_{EURO}=\sum_{{P}_{AC}}^{{P}_{lvl}}{W}_{{P}_{AC}}\cdot \hat{\eta}_{{P}_{AC}}$$(20)
$${\eta}_{nv}=\frac{1}{N}\sum_{n=1}^{N}{\eta}_{n}$$(21)
$${\sigma}_{EURO}=\sqrt{\frac{1}{N-1}\sum_{n=1}^{N}{\left({\eta}_{n}-{\eta}_{nv}\right)}^{2}}$$(22)
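Algorithm 2 can be sketched as below. The model `toy_bnn_t` is a hypothetical stand-in for the trained BNN_{t}, the test-condition values are placeholders, and the Gaussian draw in (19) is assumed to be independent for every load level, which the pseudocode does not state explicitly.

```python
import math
import random
import statistics

# Load levels (fraction of rated power) and weights from Eq. (2).
EURO_WEIGHTS = {0.05: 0.03, 0.10: 0.06, 0.20: 0.13, 0.30: 0.10, 0.50: 0.48, 1.00: 0.20}


def toy_bnn_t(p_ac, t_amb, v_ac, q_ac, rng):
    """Hypothetical stochastic model: returns (efficiency [%], variance)."""
    return 97.0 + rng.gauss(0.0, 0.05), 0.04


def european_efficiency_with_uncertainty(model, p_ac_rated, t_amb, v_ac, q_ac, n_eval, rng):
    mean, sigma = {}, {}
    for level in EURO_WEIGHTS:
        outputs = [model(level * p_ac_rated, t_amb, v_ac, q_ac, rng) for _ in range(n_eval)]
        etas = [eta for eta, _ in outputs]
        variances = [var for _, var in outputs]
        mean[level] = statistics.fmean(etas)         # Eq. (15)
        var_ep = statistics.variance(etas)           # Eq. (16), N - 1 denominator
        var_al = statistics.fmean(variances)         # Eq. (17)
        sigma[level] = math.sqrt(var_ep + var_al)    # Eq. (18)
    # Eq. (19): resample each level and combine with the European weights.
    samples = [
        sum(w * (mean[lvl] + rng.gauss(0.0, 1.0) * sigma[lvl]) for lvl, w in EURO_WEIGHTS.items())
        for _ in range(n_eval)
    ]
    eta_euro = sum(w * mean[lvl] for lvl, w in EURO_WEIGHTS.items())  # Eq. (20)
    sigma_euro = statistics.stdev(samples)                            # Eqs. (21)-(22)
    return eta_euro, sigma_euro, samples


rng = random.Random(0)
eta_euro, sigma_euro, samples = european_efficiency_with_uncertainty(
    toy_bnn_t, p_ac_rated=1000.0, t_amb=25.0, v_ac=400.0, q_ac=0.0, n_eval=500, rng=rng
)
```

Because the levels are resampled independently in this sketch, the resulting σ_{EURO} is smaller than the per-level uncertainty; a fully correlated draw would give a larger value.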
3 Case study
This section introduces a case study for testing and validation of the proposed model and methodology. The study has been performed on a PV plant with 43 inverters of the same type and data was taken from an internal database. Table 4 presents some relevant information related to the plant under analysis, as well as the characteristics of the installed inverters and the time period covered by the study.
Plant metadata.
3.1 Optimisation results
Following the procedure listed in Section 2.6, data for one inverter was used to optimise the model parameters according to Section 2.5. Although different criteria could be used for base inverter selection (e.g., inverter with the largest clean dataset), for this case study, the base inverter was selected at random. The results from the optimisation procedure are presented in Table 5.
Figure 2 shows the evolution of Optuna's objective function for every trial in the optimisation process. The figure also presents the objective function value for the “Best trial”, which represents the best solution found after any given number of trials. It can be seen that, in general, the objective function values display a very stable behaviour. Moreover, “Best trial” values present a long period with no changes, after a rapid improvement at the beginning of the process. Finally, the best value was reached at trial number 61; however, the difference with respect to the previous best value does not represent a major improvement. No other changes were seen until the end of the optimisation process, so it is accepted that the “Best trial” represents a near-optimal solution that is adequate for this study.
Optimal parameters.
Fig. 2 Optimisation history. 
3.2 Fleet-level efficiency
The optimised model presented in the previous section was used as the basis for the training of a new fleet-level model. The data from all 43 inverters were collected into a single dataset with approximately 1.6 million data points; a 70/30 split was performed on the dataset to create independent training and testing datasets, respectively.
All results and plots presented in the subsequent subsections were obtained for the testing dataset. Performance metrics for the training dataset present no meaningful differences to those displayed here and are omitted for the sake of brevity.
Figure 3a presents a 2D histogram of η vs $\hat{\eta}$, as well as a green line which represents the identity function. The 2D histogram shows that the densest regions are concentrated along the identity function. Additionally, Figure 3b presents a histogram of errors obtained from the evaluation of the fleet-level model. The error distribution is centred around zero and shows no significant skewness. The behaviour shown in Figure 3 is representative of a well-trained model with no systematic bias and small reconstruction errors.
Table 6 presents the Root Mean Square Error (RMSE), Mean Bias Error (MBE), σ_{ep}, and σ_{al} obtained at the end of model training. These results show that despite the combination of multiple inverters, the model is capable of generating efficiency predictions with small reconstruction errors and negligible bias. Another important aspect is the values found for σ_{ep} and σ_{al}, with σ_{al} presenting a much higher value than σ_{ep}. Two conclusions can be drawn from this fact. First, the model is confident about its predictions, hence the low σ_{ep} value, and second, most of the uncertainty stems from the data, as reflected in the higher σ_{al}. These two conclusions are not surprising considering that σ_{ep} can be minimised by collecting enough data and that data noise will increase when data from multiple inverters, with potentially different behaviours, are combined.
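For reference, the two reconstruction metrics reported in Table 6 can be computed as in the sketch below. The sign convention for the bias (prediction minus measurement) is an assumption, as the text does not define it.

```python
import math


def rmse(eta, eta_hat):
    """Root mean square error between measured and predicted efficiency."""
    return math.sqrt(sum((pred - true) ** 2 for true, pred in zip(eta, eta_hat)) / len(eta))


def mbe(eta, eta_hat):
    """Mean bias error; positive values mean the model over-predicts (assumed convention)."""
    return sum(pred - true for true, pred in zip(eta, eta_hat)) / len(eta)
```

A model with a small RMSE but a large MBE would be systematically shifted, which is why both metrics are reported together.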
The notable performance achieved by the model, as reflected in Figure 3 and Table 6, indicates that the model can be used for the calculation of η̂_{EURO}. As described in Algorithm 2, the first step involves calculating the mean efficiency and uncertainty for the different P_{AC} levels defined in (2), see Table 7. Figure 4a depicts a graphical representation of the values in Table 7. The figure also includes a 2D histogram of P_{AC} vs η for the complete dataset, together with two dashed lines that represent η_{EURO} and η_{TOL} for the inverter type installed at the PV plant, see Table 4.
The values in Table 7 show that the maximum efficiency is reached at 30% of P_{AC_{RATED}}, while the minimum uncertainty is found at 50% of P_{AC_{RATED}}. As expected, the highest uncertainty is reached at 5% of P_{AC_{RATED}}. Furthermore, the distributions depicted in Figure 4a show that the mean efficiency values (the blue squares) fall within the boundaries of the cloud formed by the 2D histogram of P_{AC} vs η, and that the efficiency uncertainty matches the spread shown by the histogram, which is additional evidence of the model's quality. Finally, it can be noted that all mean efficiency values fall under the dashed lines representing η_{EURO} and η_{TOL}, a presage of the results discussed next.
The mean efficiency values in Table 7 were used to calculate η̂_{EURO} using (2), while the sampled efficiency values for the different P_{AC} levels (Figure 4a) were combined to obtain a distribution of η̂_{EURO} values and to calculate σ_{EURO}, see Figure 4b. For this case study, η̂_{EURO} and σ_{EURO} reached values of 97.6% and 0.24%, respectively. It can be seen that the fleet-level η̂_{EURO} is lower than both η_{EURO} and η_{TOL}. In fact, only a small number of samples in the η̂_{EURO} distribution reach values above η_{TOL}.
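The combination of per-level samples into a distribution of η̂_{EURO} can be sketched as follows. The weights are those of the standard European efficiency definition. Everything else is an assumption made for illustration: the per-level means and standard deviations are placeholders rather than the values in Table 7, the samples are drawn independently from normal distributions instead of from the BNN itself, and the function names are hypothetical.

```python
import random
from statistics import mean, stdev

# Standard European efficiency weights, keyed by percentage of rated AC power
EURO_WEIGHTS = {5: 0.03, 10: 0.06, 20: 0.13, 30: 0.10, 50: 0.48, 100: 0.20}


def euro_efficiency(eta_by_level):
    """Weighted sum of the efficiencies at the six standard power levels."""
    return sum(w * eta_by_level[level] for level, w in EURO_WEIGHTS.items())


def euro_distribution(mean_by_level, sigma_by_level, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the mean and spread of the European efficiency.

    Assumes independent Gaussian samples per power level (a simplification).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        draw = {
            level: rng.gauss(mean_by_level[level], sigma_by_level[level])
            for level in EURO_WEIGHTS
        }
        samples.append(euro_efficiency(draw))
    return mean(samples), stdev(samples)


# Placeholder inputs in percent (NOT the values of Table 7)
means = {5: 94.0, 10: 96.0, 20: 97.5, 30: 98.0, 50: 97.9, 100: 97.6}
sigmas = {5: 1.0, 10: 0.6, 20: 0.4, 30: 0.35, 50: 0.3, 100: 0.4}

eta_euro_hat, sigma_euro = euro_distribution(means, sigmas)
print(f"eta_EURO = {eta_euro_hat:.2f} %, sigma_EURO = {sigma_euro:.2f} %")
```

Because the 50% level carries almost half of the total weight, its uncertainty dominates σ_{EURO} in this sketch, even though the 5% level has the largest individual spread.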
Fig. 3 Training results − Fleet modelling.
Table 6 Fleet model training.
Table 7 Fleet efficiency [%].
Fig. 4 Fleet efficiency.
3.3 Single device-level efficiency
As in the previous section, the optimised model was again used to initialise the training of new models for the remaining inverters in the fleet. In total, 42 additional models were trained. Table 8 presents a summary of model training and calculated efficiencies, while Figure 5 depicts η̂_{EURO} together with its uncertainty for every inverter, as well as η_{EURO} and η_{TOL}.
Performance metrics (RMSE, MBE, σ_{ep}, and σ_{al}) in Table 8 show lower values than those presented in Table 6. This behaviour is fully aligned with the expected performance of a model trained on a single data source compared to one trained by combining multiple sources. By focusing on data from a single inverter, data noise is reduced and inverter behaviour is limited to one operational mode, which results in smaller reconstruction errors and reduced uncertainty, both epistemic and aleatoric. Furthermore, the almost negligible standard deviations show that model performance is stable across the different inverters.
The average η̂_{EURO} presented in Table 8 matches the one found by the fleet-level model. This finding corroborates the thesis that the fleet-level model is representative of the average behaviour of the entire fleet. The individual η̂_{EURO} values show that only 2 inverters present efficiencies above η_{TOL}; this number increases to 5 if the uncertainty's upper level (i.e., η̂_{EURO} + σ_{EURO}) is taken into account. In general, most η̂_{EURO} values seem to be located close to the fleet mean; however, a few inverters display particularly low efficiency values that may deserve special attention.
Table 8 Device efficiency.
Fig. 5 Device efficiency. 
3.4 Efficiency over time
So far, the case study has only involved analysing data from one year; however, inverter behaviour changes over time and, therefore, inverter efficiency may also be affected by the inverter's age. In this section, the optimised model was used to train new models that illustrate the evolution of the fleet-level η̂_{EURO} as a function of time. Inverter data was collected starting from 2020/01/01, and multiple models were trained, each using one year's worth of data. The results of the study are presented in Table 9, which shows results from seven models. These models were trained using a “skipping” window approach for data selection; that is, the date in Table 9 corresponds to the last day of the training period, which includes the entire previous year up to that date. To ensure consistency, data for each time window were cleaned and processed independently. Figure 6 displays the data from Table 9 as a time series, where the shaded area represents the efficiency's uncertainty.
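The window construction described above can be sketched as follows. The start of data collection (2020/01/01), the one-year window length, and the number of models (seven) come from the text. The six-month stride between consecutive windows and the helper names are assumptions made for illustration; the actual end dates are those listed in Table 9.

```python
from datetime import date, timedelta


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def training_windows(first_day, n_models, stride_months):
    """Return (start, end) pairs of one-year windows; end is the last day."""
    windows = []
    for k in range(n_models):
        start = add_months(first_day, k * stride_months)
        end = add_months(start, 12) - timedelta(days=1)
        windows.append((start, end))
    return windows


def select(records, window):
    """Keep (day, value) records whose day falls inside the window."""
    start, end = window
    return [(day, value) for day, value in records if start <= day <= end]


# stride_months=6 is a hypothetical choice, not taken from the paper
windows = training_windows(date(2020, 1, 1), n_models=7, stride_months=6)
for start, end in windows:
    print(f"train on {start} .. {end}")
```

Each window would then be cleaned and processed independently before a new model is trained on it, as stated above.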
Results show that inverter efficiency is constant over time, except for the first year, when η̂_{EURO} presented slightly lower values. Through conversations with the plant operator, it was discovered that most inverters in the fleet underwent replacement of internal components during this period of time, which may have affected the inverters' efficiency.
All efficiency values presented in this section are below the inverter's reported η_{EURO}, and only 2 device-level η̂_{EURO} values are greater than η_{TOL}. When the efficiencies' upper level is considered, 2 inverters present efficiencies greater than η_{EURO} and 5 greater than η_{TOL}. At this point, one could consider that the model may underestimate inverter efficiency, but the fact that at least 2 inverters present acceptable η̂_{EURO} values and that all trained models show low reconstruction errors indicates that the calculated η̂_{EURO} values are representative of the collected operational data. Additionally, loss of calibration in inverter sensors could also affect efficiency calculations; however, it seems highly unlikely that all inverter sensors would have lost calibration in such a manner that it would result in lower efficiency values for all inverters. In fact, these values are aligned with the findings of [5,7], where calculated inverter efficiencies were found to be persistently lower than those reported by the manufacturers. However, care must be taken not to generalise beyond the scope of the present case study, as actual results will vary from inverter to inverter and conclusions must be drawn on a case-by-case basis.
Table 9 Efficiency over time.
Fig. 6 Efficiency over time. 
4 Discussion
Modelling uncertainty through Machine Learning is no easy task. There is a myriad of models, tools, and techniques that can be applied for this purpose, each with its own strengths and limitations [20]. Model selection for this work (i.e., BNN) was based on ease of implementation, flexibility, and the robustness of the Bayesian framework. However, this does not mean that no other option could yield similar or even better results. The choice of model, activation function, or prior distribution may have an effect on the uncertainty estimation, especially if the testing conditions needed for the calculation of the European efficiency require extrapolation with respect to the training dataset. One example of the difficulty of training a BNN, and of how parameter selection can affect the results, is the choice of the regularisation term in (12). This term is meant to balance the weight that the KLD has in the BNN optimisation; if the KLD is too prominent in the training loss, training will result in a BNN whose weights match the prior distribution perfectly but which has no reconstruction capabilities. On the other hand, if the regularisation term is equal to zero, the KLD will play no role in the optimisation process and the resulting BNN will be equivalent to a neural network of the same structure that is unsuitable for uncertainty evaluation. Fortunately, the smooth integration among PyTorch, TorchBNN, and Optuna allows for the development of an optimised, fit-for-purpose model whose results can be used with confidence.
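The role of the regularisation term can be illustrated with a minimal sketch of a KLD-weighted training loss. Equation (12) is not reproduced here; the form below (a Gaussian negative log-likelihood plus a weighted sum of KL divergences between Gaussian weight posteriors and a shared Gaussian prior) is a generic assumption, and the names `kl_weight`, `posterior`, and `prior` are hypothetical. The sketch uses plain Python rather than the PyTorch and TorchBNN calls of the actual implementation.

```python
import math


def gaussian_nll(y, mu, var):
    """Negative log-likelihood of y under a normal with mean mu, variance var."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)


def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL divergence KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2))."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )


def training_loss(targets, pred_means, pred_vars, posterior, prior, kl_weight):
    """Reconstruction term plus weighted KLD (assumed generic form).

    posterior: list of (mean, std) pairs, one per network weight.
    prior: (mean, std) pair shared by all weights.
    """
    nll = sum(
        gaussian_nll(y, m, v) for y, m, v in zip(targets, pred_means, pred_vars)
    ) / len(targets)
    kld = sum(kl_gaussians(mq, sq, *prior) for mq, sq in posterior)
    return nll + kl_weight * kld


# Toy values, purely illustrative
targets, means, variances = [97.5, 98.0], [97.4, 98.1], [0.04, 0.04]
posterior, prior = [(0.8, 0.1), (-0.3, 0.2)], (0.0, 1.0)
for kl_weight in (0.0, 0.01, 1.0):
    print(kl_weight, training_loss(targets, means, variances, posterior, prior, kl_weight))
```

With `kl_weight = 0.0` the loss reduces to the reconstruction term alone, while a large `kl_weight` makes the distance to the prior dominate, which corresponds to the two extremes described above.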
The low σ_{ep} values are an indication of the model's certainty regarding its own predictions. Despite these low values, there are aspects of the inverter's operation that have not been included in the methodology and that could affect efficiency modelling. Clipping, curtailment, and self-consumption have not yet been considered, and their effect on inverter efficiency is unknown. For the presented case study, clipping and curtailment did not represent a major obstacle, since the analysed PV plant does not have a high DC to AC ratio and curtailment is only seldom applied; therefore, they did not have a significant impact on model training. The presence of these two conditions in the training dataset could be minimised by implementing an irradiance-based cleaning procedure that removes data points that do not match the expected relationship between irradiance and P_{AC}, as well as those operational points where high irradiance values cause clipping. However, as heavy clipping (caused by trackers, high DC/AC ratios, and bifacial modules) and curtailment become more prominent, it will become necessary to develop a new methodology that not only identifies these two conditions, but also considers them as model inputs. Furthermore, a new definition of inverter efficiency may be necessary, one that accurately reflects variations in efficiency at different power outputs, as well as during clipping and curtailment. The development of this new definition would be akin to the creation of the CEC efficiency, that is, a new definition that is better suited to the inverter's actual operational conditions.
The effect of self-consumption on inverter efficiency represents a greater challenge, as its inclusion or omission depends on the “position” of the inverter's internal load with respect to the P_{AC} sensor. According to the diagrams in [13,14], P_{AC} is measured without the inclusion of self-consumption, that is, the internal load is placed downstream from the sensor. However, the location of both sensor and load is fixed for any real inverter, so there could potentially be a configuration where the internal load is placed upstream from the P_{AC} sensor and has an undesired effect on the efficiency calculation. Regarding the case study, the inverter's self-consumption load does not appear to have been inadvertently included in the efficiency calculation; however, only the inverter data sheet was available and no internal diagram was included. Moreover, it is known from the same data sheet that the maximum inverter self-consumption is 8.1 kW, which is equal to 0.3% of P_{AC_{RATED}}. If this value were simply added to the fleet-level efficiency, a best-case scenario, the resulting value would still be below η_{TOL}, so the conclusions presented in Section 3.2 remain valid.
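The arithmetic behind this best-case adjustment can be written out using only the figures quoted in the text. The rated power below is implied by the two data sheet numbers and is approximate, since the 0.3% figure is rounded; the fleet-level η̂_{EURO} of 97.6% is the value reported in Section 3.2.

```latex
\begin{align}
P_{AC_{RATED}} &\approx \frac{8.1\ \text{kW}}{0.003} = 2700\ \text{kW}, \\
\hat{\eta}_{EURO} + 0.3\,\% &= 97.6\,\% + 0.3\,\% = 97.9\,\%.
\end{align}
```

According to the text, this adjusted value of 97.9% remains below η_{TOL} as listed in Table 4.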
Another important aspect to consider is the time resolution of the monitored data. Lower sampling frequencies, where data is averaged over a longer sampling period, may lead to an underestimation of inverter efficiency. Due to the nonlinearity of the efficiency versus power curve, the inclusion of low power values in the calculation of the sampling period mean could result in lower values that are not fully representative of the real efficiency. This is an inherent aspect of the way data is monitored in modern PV plants, and one that can only be mitigated by having access to higher-frequency data. Fortunately, the proposed methodology is agnostic to this parameter and can therefore accommodate a shorter sampling period without the need to introduce any changes.
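The averaging effect can be illustrated with a small numerical sketch. It assumes a hypothetical quadratic loss model with arbitrary coefficients, not fitted to any inverter, and compares the efficiency computed from period-averaged powers with the efficiency that the same curve gives at the averaged power level. All names are illustrative.

```python
def dc_power(p_ac, c0=0.005, c1=0.01, c2=0.02):
    """DC input for a given AC output, both per unit of rated AC power.

    Losses follow a hypothetical model: constant + linear + quadratic terms.
    """
    return p_ac + c0 + c1 * p_ac + c2 * p_ac ** 2


def efficiency(p_ac):
    """Instantaneous efficiency at a given AC power level."""
    return p_ac / dc_power(p_ac)


def averaged_efficiency(p_ac_samples):
    """Efficiency computed from the period means of AC and DC power."""
    n = len(p_ac_samples)
    mean_ac = sum(p_ac_samples) / n
    mean_dc = sum(dc_power(p) for p in p_ac_samples) / n
    return mean_ac, mean_ac / mean_dc


# One averaging period that mixes a low and a high power level
mean_ac, eta_avg = averaged_efficiency([0.05, 0.55])
print(f"mean P_AC = {mean_ac:.2f} p.u.")
print(f"efficiency from averaged data: {100 * eta_avg:.2f} %")
print(f"efficiency at that power level: {100 * efficiency(mean_ac):.2f} %")
```

Because the assumed losses are convex in power, the averaged losses exceed the losses at the averaged power, so the efficiency attributed to the mean power level is lower than the value on the curve. When the power is constant within the period, both values coincide.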
As previously mentioned, inverter measurements have very low accuracy. For this reason, inverter data has not typically been used in sensitive applications; estimating inverter efficiency is one such application, where low-quality data may lead to erroneous conclusions (note the small difference between η_{EURO} and η_{TOL} in Table 4). Nevertheless, Machine Learning models, such as neural networks and Bayesian neural networks, are capable of discovering the true function under the noise, and their predictions can be used in place of the original noisy data. Furthermore, the inclusion of uncertainty analysis provides greater confidence, as it avoids the oversimplification of drawing conclusions from a single value when a probabilistic analysis would be more appropriate. In spite of this reasoning, some may reject the possibility of basing their conclusions on inverter data. In this case, it is important to consider what the other available options are. It is certainly possible to have an inverter sent to a laboratory for efficiency measurement, but this would involve decommissioning the inverter, sending it to the laboratory, having the measurements taken, sending it back, and having it recommissioned. During the time the inverter is away, the operator would have to provide a temporary replacement or face a reduction in energy production. Additionally, only one inverter could be tested at any given point, due to operational and logistic limitations. Therefore, this approach is neither reasonable nor economically feasible, especially for large inverters, and is thus out of the question. Another option would be simply not to act, that is, the plant's owners and operators would act on the assumption that their inverters are operating as intended by the manufacturer.
Unfortunately, experience has shown that components do not always function as per their design, which is why on-field validation of inverter behaviour is a practice that plant owners and operators cannot afford to forgo. Even if the considered validation approach may face some limitations, it is always advisable to do something rather than nothing.
5 Conclusions
This paper has presented a methodology for the estimation of inverter efficiency using only field data. The methodology is based on a Bayesian neural network capable of predicting inverter efficiency given a set of input signals (P_{AC}, T_{AMB}, V_{AC}, and Q_{AC}). The model is also capable of providing a full characterisation of efficiency uncertainty through the prediction of efficiency variance and variational inference.
The core concept behind this work is that, should enough field data be available, it is possible to construct a data-driven model that provides an accurate representation of a physical component for almost any operational condition. Subsequently, this model can be used to replicate the component's behaviour under a set of desired conditions, whose exact combination may not have been previously seen by the component, in order to infer any performance index that may be of interest to the user.
Results show that an optimised model is capable of providing a prediction of inverter efficiency with small reconstruction errors and negligible bias. The proven performance makes the developed model adequate for replicating the European efficiency as reported by the inverter manufacturer. Calculation of European efficiency requires only knowledge of the inverter's rated power output and nominal AC voltage.
The presented case study has also shown that the proposed methodology can be used to estimate both fleet-level and single device-level efficiencies. Fleet-level efficiency can help to provide an overview of the average efficiency for a specific inverter type on a single PV plant or across multiple sites, whereas single device-level efficiencies are useful to analyse the individual behaviour of each inverter in a fleet.
Future work will be aimed at evaluating a larger number of PV plants and aggregating efficiency values according to parameters such as inverter type, manufacturer, operator, site conditions, etc. Furthermore, additional effort is required to assess the impact of clipping, curtailment, and self-consumption, as well as other factors, on the modelling of inverter efficiency.
Funding
This research received no external funding.
Conflicts of interest
The authors have nothing to disclose.
Data availability statement
Data associated with this article cannot be disclosed due to legal reasons.
Author contribution statement
All the authors were involved in the preparation of the manuscript. All the authors have read and approved the final manuscript.
References
1. DNV, Energy transition outlook. 2023. https://www.dnv.com/energytransitionoutlook/download
2. M. Morey, N. Gupta, M.M. Garg, A. Kumar, A comprehensive review of grid-connected solar photovoltaic system: architecture, control, and ancillary services, Renew. Energy Focus 45, 307 (2023)
3. Fraunhofer-ISE, Photovoltaics report. 2023. https://www.ise.fraunhofer.de/content/dam/ise/de/documents/publications/studies/PhotovoltaicsReport.pdf. Accessed: 2024-04-30
4. R. Grab, F. Hans, M.I.R. Flores, H. Schmidt, S. Rogalla, B. Engel, Modeling of photovoltaic inverter losses for reactive power provision, IEEE Access 10, 108506 (2022)
5. K. Passow, L. Ngan, A. Panchula, Self-reported field efficiency of utility-scale inverters, in 2014 IEEE 40th Photovoltaic Specialist Conference (PVSC) (2014), pp. 1963–1968
6. L. Kaci, D.A.H. Arab, R. Zirmi, S. Semaoui, S. Boulahchiche, Solar inverter performance prediction, in 2020 6th International Symposium on New and Renewable Energy (SIENR) (2021), pp. 1–5
7. N. Allet, F. Baumgartner, J. Sutterlueti, S. Sellner, M. Pezzotti, Inverter performance under field conditions, in 27th EU PVSEC (2012)
8. S. Krauter, J. Bendfeld, Microinverter pv systems: new efficiency rankings and formula for energy yield assessment for any pv panel size at different microinverter types, in 8th World Conference on Photovoltaic Energy Conversion (2022)
9. D.L. King, S. Gonzalez, G.M. Galbraith, W.E. Boyson, Performance model for grid-connected photovoltaic inverters, tech. rep., Sandia National Laboratories, 2007. Accessed: 2024-04-30
10. S. Suarez, V. Daniel, G.A. Navas, J. Vilela, I. Fernandez, S. Rodríguez-Conde, Central inverter testing under real outdoor conditions. a controllable analysis under non-controllable conditions using statistics. a real case study, in 40th EU PVSEC (2023)
11. C. Hansen, J. Johnson, R. Darbali-Zamora, N.S. Gurule, S. Gonzalez, M. Theristis, Modeling inverters with multiple inputs: Test procedure for measuring efficiency, in 8th World Conference on Photovoltaic Energy Conversion (2022)
12. E. Engel, N. Engel, A review on machine learning applications for solar plants, Sensors 22, 23 (2022)
13. I.E. Commission, Photovoltaic systems − power conditioners − procedure for measuring efficiency, standard, International Electrotechnical Commission (1999)
14. I.E. Commission, Maximum power point tracking efficiency of grid connected photovoltaic inverters, standard, International Electrotechnical Commission (2020)
15. PVSYST, European or cec efficiency. 2023. https://www.pvsyst.com/help/inverter_euroeff.htm. Accessed: 2024-04-30
16. Z.H. Zhou, Ensemble Methods: Foundations and Algorithms, 1st ed. (Chapman and Hall/CRC, 2012)
17. E. Goan, C. Fookes, Bayesian Neural Networks: An Introduction and Survey (Springer International Publishing, 2020), pp. 45–87
18. D.M. Titterington, Bayesian methods for neural networks and related models, Stat. Sci. 19, 128 (2004)
19. J. Lampinen, A. Vehtari, Bayesian approach for neural networks - review and case studies, Neural Netw. 14, 257 (2001)
20. V. Nemani, L. Biggio, X. Huan, Z. Hu, O. Fink, A. Tran, Y. Wang, X. Zhang, C. Hu, Uncertainty quantification in machine learning for engineering design and health prognostics: a tutorial, Mech. Syst. Signal Process. 205, 110796 (2023)
21. B. Harnist, S. Pulkkinen, T. Mäkinen, Deuce v1.0: a neural network for probabilistic precipitation nowcasting with aleatoric and epistemic uncertainties, Geosci. Model Dev. 17, 3839 (2024)
22. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E.Z. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style, high-performance deep learning library, in NeurIPS, edited by H.M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E.B. Fox, R. Garnett (2019), pp. 8024–8035
23. S. Lee, H. Kim, J. Lee, Graddiv: adversarial robustness of randomized neural networks via gradient diversity regularization, IEEE Trans. Pattern Anal. Mach. Intell. 45, 2645 (2022)
24. T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, arXiv:1509.02971 (2015). https://doi.org/10.48550/arXiv.1509.02971
25. H. van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double q-learning, arXiv:1509.06461 (2015). https://doi.org/10.48550/arXiv.1509.06461
26. G. Guerra, P. Mercade-Ruiz, G. Anamiati, L. Landberg, Long-term pv system modelling and degradation using neural networks, EPJ Photovolt. 14, 30 (2023)
27. S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22, 79 (1951)
28. G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2, 303 (1989)
29. A.D. Kiureghian, O. Ditlevsen, Aleatory or epistemic? does it matter? Struct. Safety 31, 105 (2009)
30. T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: a next-generation hyperparameter optimization framework, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
31. J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyperparameter optimization, in Advances in Neural Information Processing Systems, edited by J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, K. Weinberger (Curran Associates, Inc., 2011), Vol. 24
Cite this article as: Gerardo Guerra, Pau Mercadé Ruiz, Gaetana Anamiati, Lars Landberg, Uncertainty-aware estimation of inverter field efficiency using Bayesian neural networks in solar photovoltaic plants, EPJ Photovoltaics 15, 33 (2024)