| Issue | EPJ Photovolt., Volume 16, 2025. Special Issue on 'EU PVSEC 2025: State of the Art and Developments in Photovoltaics', edited by Robert Kenny and Carlos del Cañizo |
|---|---|
| Article Number | 32 |
| Number of page(s) | 16 |
| DOI | https://doi.org/10.1051/epjpv/2025021 |
| Published online | 10 December 2025 |
Original Article
An interpretable AI framework for clear-sky detection in photovoltaics monitoring
1 North China Electric Power University, School of Electrical and Electronic Engineering, Baoding, P.R. China
2 École Polytechnique Fédérale de Lausanne (EPFL), Institute of Electrical and Micro Engineering (IEM), Photovoltaics and Thin-Film Electronics Laboratory, CH-2002 Neuchâtel, Switzerland
3 CSEM, Sustainable Energy Centre, CH-2002 Neuchâtel, Switzerland
4 3S Swiss Solar Solutions AG, CH-3645 Gwatt (Thun), Switzerland
* e-mail: hugo.quest@epfl.ch
Received: 24 July 2025
Accepted: 1 November 2025
Published online: 10 December 2025
Accurate clear-sky detection (CSD) is essential for reliable data analysis and performance assessments in photovoltaic (PV) systems. However, many advanced machine learning (ML) models function as “black boxes”, limiting their interpretability and trustworthiness. This study presents an interpretable Artificial Intelligence (AI) framework that combines high predictive performance with deep insight into model decision-making. Using a hand-labelled dataset from a fixed-tilt PV system in Golden, Colorado, USA, with 1 min plane-of-array (POA) measurements of global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI), a Categorical Boosting (CatBoost) classifier is developed for CSD. The model is iteratively refined through a closed-loop diagnostic process guided by SHapley Additive exPlanations (SHAP). Misclassified instances are analysed using dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP) and clustering, revealing distinct, physically-grounded failure modes such as “cloud enhancement”, where reflected or scattered sunlight temporarily increases irradiance, and “hazy but stable conditions”, where thin atmospheric haze slightly attenuates sunlight without introducing variability. Insights from this analysis inform targeted feature engineering, yielding a refined model with high classification performance quantified by an F1-score of 97.3%, along with substantially reduced false positive (1.99%) and false negative (7.0%) rates, reflecting both overall accuracy and balanced sensitivity to clear-sky and non-clear-sky periods. This interpretable framework improves the reliability of clear-sky filtering for downstream PV applications, including fault detection and diagnosis (FDD) and long-term performance loss rate (PLR) estimation, and provides a transferable methodology for developing trustworthy AI models in energy systems.
Key words: Clear-sky detection (CSD) / artificial intelligence (AI) / photovoltaics (PV) / SHAP (SHapley Additive exPlanations) / model interpretability / misclassification analysis
© B. Li et al., Published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
1.1 Clear-sky detection in photovoltaics − A brief review
The global photovoltaics (PV) market is experiencing unprecedented growth, with over 600 GW of new capacity commissioned in 2024, bringing the total cumulative capacity to more than 2.2 TW [1]. This rapid expansion highlights the critical importance of PV systems in the global energy mix. Consequently, ensuring optimal energy yield, operational reliability, and overall bankability from these solar assets requires robust and accurate monitoring and analysis methodologies [2,3]. A key requirement for many PV and solar resource assessment tasks is reliable characterization of sky conditions, typically done through Clear-Sky Detection (CSD) [4–8]. CSD methodologies aim to accurately distinguish periods of cloudless sky from those affected by cloud cover or other atmospheric obscuration, providing an important reference for various applications [9]. These include detailed solar resource assessment, the accurate calculation of PV system performance metrics such as the Performance Ratio (PR) [10,11], effective fault diagnosis (FD) [12–14], and dependable long-term degradation analysis, for instance, in determining the Performance Loss Rate (PLR) [15–17]. Therefore, the accuracy and trustworthiness of the employed CSD methods directly and significantly impact the quality and reliability of these downstream analyses.
A variety of CSD methodologies have been proposed to identify cloudless periods from solar irradiance time series. Traditional approaches often utilize ground-based irradiance measurements and their relationship to theoretical clear-sky irradiance values derived from physics-based models, such as the Ineichen or Bird models [18–20], which themselves have been extensively evaluated. Prominent examples of such CSD algorithms include the widely recognized Reno and Hansen method [21], which analyses statistical properties of Global Horizontal Irradiance (GHI) within moving windows relative to a clear-sky curve. Aiming for global applicability, the BRIGHT-SUN model was developed [5], integrating clear-sky irradiance optimization, a tri-component (GHI, DNI, DHI) analysis building on the Reno method, and cascading duration filters. Comprehensive reviews, such as that by Sun et al. [22], have systematically evaluated numerous CSD techniques, examining 95 direct normal irradiance (DNIcs) and 88 diffuse horizontal irradiance (DIFcs) models across multiple climate zones, and highlighting the general preference for multi-criteria approaches as well as the challenges in achieving consistent performance globally. More recently, machine learning (ML) techniques have been increasingly applied to CSD. Liu et al. [7] developed a Random Forest (RF)-based CSD model and validated it against multiple existing methods at a polluted suburban site using 1 min irradiance measurements and sky imager data, demonstrating improved detection accuracy, particularly for longer time-averaged data intervals. Jordan & Hansen [6] further adapted the Reno method using multiple regression to optimize parameters for different data averaging intervals of plane-of-array (POA) irradiance, specifically targeting PV degradation analysis. Adding to this trend, Lusi et al. [23] introduced a probabilistic deep learning model using convolutional neural networks (CNNs) on all-sky imagery, which not only classifies the sky state but also quantifies the uncertainty of its predictions. While these ML models often achieve higher accuracies, their decision-making processes can be opaque.
1.2 Challenges in clear-sky detection methods
Despite their importance in PV analysis, existing CSD methods face persistent, yet distinct challenges. Conventional physics-based methods, based on empirical thresholds, are often confounded by the inherent stochasticity of atmospheric conditions [8,24,25]. These conditions include transient cloud phenomena, high aerosol loading, and complex radiative scattering at low solar elevations, which collectively compromise the generalizability of methods across various climatic regions [5]. In contrast, while ML models can achieve superior predictive accuracy [6,7,23], their intrinsic “black-box” nature introduces a different set of challenges centered on operational transparency.
Clear-sky detection algorithms differ substantially depending on the type of reference data used. Some methods rely on sky images, capturing visual cloud cover patterns [26,27], whereas others use irradiance measurements, typically global horizontal irradiance (GHI) [28], or, less commonly, the full set of irradiance components including direct normal irradiance (DNI) [29] and diffuse horizontal irradiance (DHI) [30]. The choice of reference data affects both the definition of “clear-sky” and the applicability of the algorithm across different PV systems and climatic conditions.
Temporal criteria also vary between algorithms. Point-to-point approaches classify each measurement independently, providing high temporal resolution but potentially higher sensitivity to short-lived fluctuations [31]. In contrast, methods using broader time-span metrics, such as rolling 5–10 min standard deviations, daily means, or multi-hour variability, capture temporal stability and reduce susceptibility to transient clouds or minor irradiance deviations, at the expense of finer resolution [32–36]. These differences influence both false positive and false negative rates in CSD, particularly under challenging atmospheric conditions.
Supervised ML approaches, in particular, rely on labelled datasets that define which periods correspond to clear-sky conditions − resources that are still relatively scarce in the PV domain. Jordan & Hansen [6] address this gap by providing a well-structured, hand-labelled CSD dataset, which is used in this work to train and evaluate a new model (Fig. 1). It is important to note that the definition of "clear-sky" is inherently context-dependent. For example, a period considered clear-sky based on GHI may not correspond to clear conditions in terms of DNI or sky imagery. The Jordan & Hansen dataset reflects manual labels that follow specific application criteria, which may not generalize to all PV systems or measurement types. Acknowledging these ambiguities is critical, as they influence both model training and the interpretation of CSD results.

The opacity of ML models discussed above is particularly problematic in PV monitoring, where understanding and trusting predictions is crucial [37,38]. Operators and analysts are left unable to understand, question, or trust the model's outputs, particularly in the face of an erroneous prediction. Without a clear framework for interpreting why a model arrives at a specific conclusion − especially a faulty one − the development of truly robust and verifiable PV monitoring systems is fundamentally constrained.
Fig. 1 Example of the hand-labelled clear-sky dataset. The plot displays 1 min time series of Global Horizontal Irradiance (GHI) and Diffuse Horizontal Irradiance (DHI) to visually illustrate the difference between clear periods (smooth GHI, low DHI) and cloudy conditions. The source dataset, from a fixed-tilt PV system in Golden, CO, USA [6], also provides the Direct Normal Irradiance (DNI) and Plane-of-Array (POA) irradiance used in this study. The clear-sky periods, highlighted in green, were manually labelled based on comparisons to a clear-sky model and stable temporal variability.
1.3 Proposed solution − Explainable AI
To address both the accuracy and interpretability challenges in CSD, Explainable Artificial Intelligence (XAI) techniques are leveraged. XAI provides tools to make ML models transparent and understandable, helping operators trust predictions and identify systematic biases. In particular, SHAP (SHapley Additive exPlanations) quantifies the contribution of each input feature to individual predictions, enabling a detailed understanding of why a model classifies a period as clear-sky or not [39,40]. SHAP values can also be used as a new feature space for unsupervised exploration, for example through clustering or visualization techniques such as UMAP (Uniform Manifold Approximation and Projection) [41–43]. These methods enable both global and local interpretability, bridging the gap between complex ML models and domain experts. Given the inherent ambiguities in the definition of clear-sky (Sect. 1.2), explainability is critical. For example, periods labelled as clear-sky based on GHI may still exhibit slight attenuation in DNI or subtle cloud effects not captured by the dataset. By analyzing feature contributions and error patterns, our framework identifies these nuances, allowing targeted refinements and increasing model robustness.
For mission-critical systems like PV monitoring, however, the ultimate goal is not just interpretability but achieving genuine trustworthiness, a principle increasingly formalized by bodies such as the European Commission [44–46]. This requires a method that prioritises both robustness and transparency [47]. In this context, deep, diagnostic transparency serves as the primary mechanism for achieving robustness. However, the development and application of such holistic, trustworthy approaches remain largely unexplored in the PV domain, especially for the task of CSD. To address this gap, this paper presents an interpretable AI framework for advanced CSD in PV. The primary scientific and methodological contributions are as follows:
- Development of a high-performance CatBoost-based CSD classifier and a novel diagnostic analysis of its failure modes. Going beyond standard performance metrics, SHAP and clustering techniques are applied specifically to the model's misclassifications. This reveals the underlying decision mechanisms and systematic patterns of error, providing a deep, evidence-based understanding of the model's weaknesses.
- Implementation of a transparent, closed-loop framework that operationalises the principles of Trustworthy AI. Diagnostic insights from failure analysis are systematically integrated back into the model development process, enabling an iterative diagnose-and-refine cycle that enhances robustness and accountability.
- Assessment of the framework's downstream impact in PV assessments, demonstrating improved reliability in fault diagnosis and greater accuracy in long-term performance analysis.
1.4 Algorithms and evaluation metrics
To ensure both high predictive accuracy and interpretability, the proposed framework integrates several ML techniques and evaluation metrics commonly used in data-driven modelling. CatBoost − a gradient boosting algorithm optimized for tabular data − is employed as the core classifier [48]. Gradient boosting builds a series of decision trees sequentially, where each new tree corrects errors made by previous ones [49]. CatBoost extends this approach by efficiently handling categorical variables and reducing overfitting, making it particularly suitable for complex PV datasets that include multiple irradiance components and environmental factors [50,51].
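The sequential error-correction principle behind gradient boosting can be illustrated with a from-scratch sketch: decision stumps fitted to the residuals left by the running ensemble on a toy 1-D regression. This is an illustration of the general idea only, not CatBoost's actual implementation (which uses ordered boosting on oblivious trees and log-loss gradients):

```python
# Minimal gradient-boosting sketch: each stump fits the residuals
# left by the ensemble built so far (toy illustration, not CatBoost).

def fit_stump(x, residuals):
    """Find the threshold split that best reduces squared error."""
    best = None
    for t in sorted(set(x))[1:]:
        left = [r for xi, r in zip(x, residuals) if xi < t]
        right = [r for xi, r in zip(x, residuals) if xi >= t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi < t else rm

def boost(x, y, n_trees=20, lr=0.3):
    """Sequentially add stumps, each trained on the current residuals."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
        stumps.append(stump)
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # step function: "cloudy" vs "clear" toy target
model = boost(x, y)
mse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
```

After 20 rounds the ensemble reproduces the step function almost exactly; CatBoost applies the same residual-correction principle with log-loss gradients and regularized trees.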
To interpret the model's predictions, SHAP values are used. SHAP quantifies the contribution of each input feature to a given prediction by attributing “credit” to each feature based on its impact [39]. For instance, it can reveal how variations in GHI, DNI, or DHI influence the model's classification of a period as clear-sky. Beyond individual predictions, SHAP values can be used in combination with clustering and UMAP to visualize patterns of feature importance across the dataset, helping identify systematic biases or recurring misclassifications. UMAP reduces the high-dimensional SHAP feature space into two or three dimensions, facilitating intuitive visual exploration of clusters corresponding to different atmospheric conditions [52,53].
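The game-theoretic definition underlying SHAP can be made concrete with an exact, brute-force Shapley computation for a toy three-feature model. This sketches only the definition; in practice, TreeSHAP computes these values efficiently for tree ensembles such as CatBoost:

```python
from itertools import permutations

# Exact Shapley values: average each feature's marginal contribution
# over all feature orderings (feasible here with only three features).

def shapley(value_fn, features):
    phi = {f: 0.0 for f in features}
    perms = list(permutations(features))
    for order in perms:
        coalition = set()
        for f in order:
            before = value_fn(coalition)
            coalition.add(f)
            phi[f] += value_fn(coalition) - before
    return {f: phi[f] / len(perms) for f in features}

# Hypothetical additive toy model: the output is the sum of fixed
# per-feature contributions (GHI 0.5, DNI 0.3, DHI 0.1).
contrib = {"GHI": 0.5, "DNI": 0.3, "DHI": 0.1}
value = lambda coalition: sum(contrib[f] for f in coalition)

phi = shapley(value, ["GHI", "DNI", "DHI"])
```

For this additive toy model each feature's Shapley value equals its standalone contribution, and the values sum to the full-model output, which is the local-accuracy property that makes SHAP attributions consistent.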
2 Methodology: interpretable and iterative CSD framework
The defined approach is designed not only to achieve high accuracy, but also to remain interpretable for PV and solar engineering applications. The methodology follows a two-stage refinement cycle, in which systematic analysis of model errors provides guidance for targeted feature improvements (Fig. 2). The iterative process is structured as follows:
1. Baseline model creation: An initial Baseline CSD Model is trained using the foundational feature set (Sect. 2.1) and the standardized training protocol (Sect. 2.2). This model serves as the starting point for the analysis.
2. First refinement (Advanced model): The Baseline Model's False Positive (FP) errors are diagnostically analyzed using SHAP and clustering techniques (Sect. 2.3). Insights from this analysis guide a first round of incremental feature engineering to create the Advanced CSD Model (Sect. 2.4).
3. Second refinement (Refined model): The False Negative (FN) errors produced by the Advanced Model are then analyzed using a SHAP-based comparative method. This second diagnostic step informs a final round of feature engineering to produce the superior Refined CSD Model (Sect. 2.4).
By explicitly combining interpretability with iterative model refinement, the framework addresses both the black-box nature of ML and the ambiguities of the clear-sky concept. This approach enables operators to understand why a model succeeds or fails under different conditions, enhancing trust and facilitating practical application in PV monitoring systems.
Fig. 2 Interpretable CSD development framework. Conceptual overview of the iterative approach combining classifier training, SHAP-based misclassification analysis, and informed feature engineering. The method follows a closed-loop cycle: building a model, diagnosing its failure modes, and using the resulting insights to refine the model in successive iterations.
2.1 Data and foundational feature set
High-quality, labelled clear-sky datasets are rare, yet they are essential for developing and evaluating models that aim to detect or reconstruct true clear-sky conditions. In this study, the hand-labelled dataset provided by Jordan & Hansen [6] offers trusted ground-truth labels for supervised model training. The dataset was collected from a fixed-tilt PV system located in Golden, Colorado, USA (39.75°N, 105.22°W; elevation 1,790 m). Measurements were recorded at a 1 min resolution and include plane-of-array (POA) irradiance alongside Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), and Diffuse Horizontal Irradiance (DHI), acquired with research-grade pyranometers and pyrheliometers [5,8,21]. The measurement site lies in a temperate semi-arid climate, characterized by hot, dry summers and cold, snowy winters. Occasional periods of high aerosol loading and variable cloud cover influence the temporal patterns and variability of the irradiance time series, affecting the distribution of both clear-sky and non-clear-sky periods. This climatological context provides important background for interpreting the dataset and understanding the environmental conditions under which the model was developed and validated [6]. The dataset spans multiple days and months, capturing diverse solar geometries, including variations in solar azimuth and zenith angles, which are important for interpreting irradiance dynamics.
Manual clear-sky labels were assigned based on a combination of comparison with theoretical clear-sky predictions from the Ineichen model and expert visual inspection [7,54]. Clear-sky periods were defined as intervals with minimal deviation from modelled irradiance values and stable temporal variability [7,54]. While these criteria provide reliable ground-truth labels, it is important to note that definitions of clear-sky conditions may differ between irradiance components (GHI vs DNI) and may not generalize to all PV systems or climatic contexts [55]. The dataset exhibits a pronounced class imbalance, with non-clear-sky instances approximately 6.75 times more frequent than clear-sky instances. To address this, class weights inversely proportional to class frequencies were applied during model training.
The foundational feature set for the baseline model was carefully engineered to capture the temporal, physical, and geometrical signatures of sky conditions. The features are derived by comparing measured irradiance values against theoretical clear-sky predictions calculated using the Ineichen clear-sky model implemented in the pvlib Python library [56]. This allows for the extraction of relative and absolute deviations, which are indicative of cloud presence, optical variability, and solar geometry effects. Table 1 summarises the key model features, which together provide a robust and interpretable representation of sky conditions for clear-sky classification. To characterise irradiance variability, rolling standard deviations over short windows (e.g., 5 min or 10 min spans) were computed for each irradiance component:

$$ S_X = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}, \qquad X \in \{\mathrm{GHI},\ \mathrm{DNI},\ \mathrm{DHI}\} \tag{1} $$

where N is the number of 1 min samples in the rolling window and X̄ is the window mean.
These temporal features, denoted Sghi, Sdni, and Sdhi, provide critical insight into high-frequency fluctuations caused by transient clouds or aerosols. Several clearness and ratio indices were included to assess sky conditions relative to theoretical expectations:

$$ k_t = \frac{\mathrm{GHI}}{\mathrm{GHI}_{cs}}, \qquad R_d = \frac{\mathrm{DNI}}{\mathrm{DHI}} \tag{2} $$

The clearness index kt, defined as the ratio of measured GHI to modelled clear-sky GHI, indicates atmospheric transparency, with values near 1 suggesting clear-sky conditions and lower values indicating attenuation from clouds or haze. The irradiance component ratio Rd = DNI/DHI serves as a proxy for direct-to-diffuse dominance, where clear skies typically yield high Rd values due to strong DNI and weak DHI.
Solar geometry and temporal context were incorporated to encode deterministic patterns in solar irradiance driven by Earth–Sun positioning. This includes the solar azimuth angle (Az) and zenith angle (θz), calendar month (Mo) to capture seasonal trends, and cyclical encodings of time using physically grounded parameters such as the hour angle, rather than local clock time, to avoid introducing non-radiative noise related to time zone or geographical differences:

$$ t_{\sin} = \sin(\omega), \qquad t_{\cos} = \cos(\omega) \tag{3} $$

where ω denotes the solar hour angle.
These allow the model to learn diurnal and annual periodicities without introducing discontinuities at cycle boundaries.
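The foundational features above can be sketched in plain Python (illustrative variable names and toy values; in the actual pipeline, the modelled clear-sky GHI would come from pvlib's Ineichen implementation and the rolling statistics from a time-series library):

```python
import math

def rolling_std(values, window):
    """Sample standard deviation over a trailing window, as in Eq. (1)."""
    out = []
    for i in range(len(values)):
        w = values[max(0, i - window + 1): i + 1]
        m = sum(w) / len(w)
        var = sum((v - m) ** 2 for v in w) / max(len(w) - 1, 1)
        out.append(math.sqrt(var))
    return out

def clearness_index(ghi_measured, ghi_clearsky):
    """k_t: measured GHI relative to modelled clear-sky GHI (Eq. (2))."""
    return ghi_measured / ghi_clearsky

def component_ratio(dni, dhi):
    """R_d: direct-to-diffuse dominance; high under clear skies."""
    return dni / dhi

def cyclical_encoding(hour_angle_deg):
    """Sine/cosine encoding of the solar hour angle: no cycle-boundary jump."""
    rad = math.radians(hour_angle_deg)
    return math.sin(rad), math.cos(rad)

# Toy 1 min GHI trace with a transient cloud dip at the fifth sample:
s_ghi = rolling_std([800, 805, 798, 802, 400, 810], window=5)
kt = clearness_index(950.0, 1000.0)   # ~0.95: near clear-sky
rd = component_ratio(900.0, 100.0)    # 9.0: strongly direct-dominated
enc = cyclical_encoding(0.0)          # solar noon (hour angle 0)
```

The cloud dip sharply raises the rolling standard deviation, which is exactly the high-frequency signature Sghi is meant to capture.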
Table 1. Glossary of key impactful features.
2.2 CSD model architecture and training protocol
To ensure a fair comparison between model versions, a single, consistent training protocol was applied to the baseline, advanced, and refined models. The classification employed a CatBoost classifier, a gradient-boosting algorithm on decision trees. Key model hyperparameters, such as 1000 iterations, a learning rate of 0.05, and a tree depth of 5, were established for the training process. The model was trained to minimize logarithmic loss (Log-loss), which evaluates the accuracy of predicted probabilities for classification tasks [57]. To assess model performance and guide early stopping, the F1-score is used as the primary metric, which balances precision and recall and is particularly suitable for datasets with imbalanced classes, such as clear-sky versus non-clear-sky periods [58]. This combination ensures that the model not only predicts probabilities accurately but also maintains high classification reliability for both classes.
The labelled dataset was partitioned into an 80% segment for model development and a 20% holdout set for final, unbiased testing. The development data was further split into training (64% of total) and validation (16% of total) sets. The validation set was used to implement an early stopping mechanism with a patience of 150 rounds, preventing model overfitting. To address the pronounced class imbalance, where non-clear-sky instances are approximately 6.75 times more prevalent than clear-sky cases, class weights inversely proportional to the class frequencies were automatically computed and applied by CatBoost. In practice, the minority class (clear-sky) is assigned a higher weight during training, so that misclassifications of clear-sky periods have a proportionally stronger influence on the optimization process, promoting balanced performance across both classes.
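The inverse-frequency weighting can be sketched as follows (plain Python; CatBoost's built-in balanced class weighting achieves the same effect internally):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Class weights inversely proportional to class frequencies,
    normalised so the majority class has weight 1.0."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {cls: max_count / n for cls, n in counts.items()}

# Imbalance similar to the dataset: non-clear-sky ~6.75x more frequent.
labels = ["non_clear"] * 675 + ["clear"] * 100
weights = inverse_frequency_weights(labels)
# weights["clear"] == 6.75, weights["non_clear"] == 1.0
```

With these weights, each misclassified clear-sky sample contributes 6.75 times as much to the loss as a misclassified non-clear-sky sample, offsetting the imbalance.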
All features described in Section 2.1 were normalized using z-score standardization (zero mean and unit variance), ensuring that features with larger absolute ranges do not dominate the gradient updates during training. Cyclical temporal features (e.g., hour of day, day of year) were transformed using sine and cosine functions to preserve their periodic nature while maintaining a bounded range between −1 and 1. This combination of standardization and cyclical encoding ensures that all input features contribute appropriately to the model training process.

The final model performance was assessed on the unseen 20% holdout set using a range of metrics, with particular emphasis on F1-score to balance the trade-off between false positives (FP) and false negatives (FN). In the context of CSD, a FP refers to a time point that the model incorrectly classifies as clear-sky when the actual sky condition is not clear (e.g., due to thin clouds or unstable irradiance conditions). These errors can result in incorrect assumptions about optimal PV performance or mask faults during diagnostic analysis. Conversely, a FN occurs when the model fails to recognize a genuinely clear-sky period, leading to missed opportunities for accurate baseline estimation or calibration. Minimizing both types of errors is therefore crucial for downstream applications such as performance ratio analysis, fault detection, and system benchmarking in PV applications.
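The z-score standardization step can be sketched in a few lines (plain Python; a production pipeline would typically use a library scaler fitted on the training split only):

```python
import math

def zscore(values):
    """Z-score standardisation: subtract the mean, divide by the std."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

# A feature with a large absolute range (e.g., DNI in W/m^2):
z = zscore([100.0, 200.0, 300.0, 400.0])
mean_z = sum(z) / len(z)
var_z = sum(v ** 2 for v in z) / len(z)
```

After standardization the feature has zero mean and unit variance, so its gradient contribution is on the same scale as every other feature.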
2.3 Diagnostic failure analysis via explainable AI
The framework's core is a sequential diagnostic analysis of model failure types. Instead of only measuring predictive accuracy, this work focuses on understanding why the model fails by transforming prediction errors into an interpretable feature space for root cause analysis. This approach enables the identification of systematic weaknesses and offers concrete guidance for iterative model refinement. To this end, SHAP is used to interpret model behavior. SHAP is a model-agnostic, game-theoretic approach that quantifies the contribution of each input feature to individual predictions. It is particularly well suited to CatBoost, as it leverages fast tree-based explanations that are both consistent and locally accurate. In this study, model performance was evaluated using the following standard metrics summarized in Table 2, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
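The standard metrics follow the usual confusion-matrix definitions, sketched below with toy counts (these are illustrative numbers, not the paper's actual confusion matrix):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # = 1 - false negative rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),           # false positive rate
        "fnr": fn / (fn + tp),           # false negative rate
    }

# Toy counts for illustration:
m = classification_metrics(tp=93, tn=880, fp=18, fn=7)
```

Note the complementarity used throughout the paper: the FN rate and recall always sum to one, so a 7% FN rate corresponds to 93% recall.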
False Positive (FP) analysis through SHAP clustering. To improve upon the baseline model, the first step in the failure analysis targeted its FP predictions: instances where the model erroneously classified non-clear-sky conditions as clear. These are particularly harmful in downstream applications such as PV performance benchmarking or FD. The diagnostic process for FPs involved three main stages. First, all misclassified FP instances from the baseline model were isolated, and SHAP values were computed for this specific subset to understand the local feature contributions driving these misclassifications. This step provided an interpretable, high-dimensional signature for each FP event. Second, Uniform Manifold Approximation and Projection (UMAP) was applied to reduce the dimensionality of the SHAP value matrix while preserving its local structure. This transformation facilitated visual exploration and allowed for the application of K-means clustering in the reduced space. Each cluster represents a distinct failure mode: a group of FP errors that share similar explanatory patterns, such as low irradiance variability or elevated diffuse fractions that resemble clear-sky signatures. Third, the resulting clusters were examined and annotated with physically meaningful labels. For instance, one prominent cluster corresponded to conditions of cloud enhancement, where reflected or scattered radiation creates spuriously high irradiance levels. Another cluster included transition periods near sunrise or sunset, where low solar angles lead to ambiguous irradiance dynamics. Assigning semantic meaning to these groups transforms opaque model failures into actionable knowledge for subsequent model refinement.
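The clustering stage can be illustrated with a from-scratch K-means on a toy 2-D embedding standing in for the UMAP-reduced SHAP space (the real pipeline would use the umap-learn and scikit-learn libraries; this sketch shows only the grouping logic, with deterministic initialisation as a simplifying assumption):

```python
def kmeans(points, k, iters=20):
    """Plain K-means on 2-D points: assign to nearest centroid, recompute."""
    # Deterministic initialisation for the sketch: spread over the list.
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        centroids = [(sum(p[0] for p in c) / len(c),
                      sum(p[1] for p in c) / len(c)) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Two synthetic failure-mode groups standing in for SHAP signatures,
# e.g. "cloud enhancement" vs "sunrise/sunset transitions".
mode_a = [(0.1 * i, 5.0 + 0.1 * i) for i in range(10)]
mode_b = [(4.0 + 0.1 * i, -3.0 + 0.1 * i) for i in range(10)]
centroids, clusters = kmeans(mode_a + mode_b, k=2)
sizes = sorted(len(c) for c in clusters)  # one cluster per failure mode
```

Each recovered cluster is then inspected and given a physical label, exactly as in the third stage described above.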
False Negative (FN) analysis through SHAP comparison. Building upon the insights gained from FP analysis, the next diagnostic step addressed FNs, instances where the model failed to recognize actual clear-sky conditions. While FNs are generally less detrimental to system operation than FPs, they still undermine the reliability of baseline estimations and can obscure optimal performance intervals. Unlike the FP analysis, FN errors were not clustered, as they were found to be more dispersed and less structurally grouped. Instead, SHAP values were computed for all FN instances in the advanced model and compared against those from correctly classified clear-sky examples (True Positives, or TPs). This differential SHAP analysis helped identify systematic biases in feature attribution. For example, FN errors were often associated with sky conditions that exhibit stable but slightly attenuated irradiance levels − typical of hazy but otherwise cloudless skies. In such cases, the model often overemphasized the relative clearness index kt and underweighted temporal stability metrics like Sdni or Sghi, leading to conservative classifications.
This analysis provided valuable insights for refining the model. In particular, it highlighted the need for better representation of low-variability, intermediate-irradiance scenarios in the training data, and suggested the addition of more nuanced features to capture subtle haze effects or aerosol-driven attenuation. Taken together, this two-step diagnostic process forms the basis of an interpretable, data-driven refinement cycle. Combining SHAP-based explanations with clustering and class comparison techniques allows one to move beyond black-box classification and towards a transparent model with direct relevance for real-world PV monitoring and decision-making.
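The differential SHAP comparison can be sketched as a per-feature mean attribution gap between the FN and TP groups (the attribution rows below are toy values mirroring the qualitative finding; real inputs would be SHAP matrices from the advanced model):

```python
def mean_attribution_gap(group_a, group_b, features):
    """Mean attribution per feature in each group, and the gap A - B."""
    def means(group):
        return {f: sum(row[f] for row in group) / len(group) for f in features}
    ma, mb = means(group_a), means(group_b)
    return {f: ma[f] - mb[f] for f in features}

features = ["kt", "S_dni", "S_ghi"]
# Toy pattern mirroring the finding: for FNs, a slightly attenuated kt
# pushes strongly against "clear-sky", while the stability metrics that
# should compensate are underweighted relative to TPs.
fn_rows = [{"kt": -0.8, "S_dni": 0.1, "S_ghi": 0.1},
           {"kt": -0.6, "S_dni": 0.2, "S_ghi": 0.1}]
tp_rows = [{"kt": 0.4, "S_dni": 0.5, "S_ghi": 0.4},
           {"kt": 0.6, "S_dni": 0.5, "S_ghi": 0.6}]
gap = mean_attribution_gap(fn_rows, tp_rows, features)
# gap["kt"] is strongly negative: kt dominates the FN misclassifications.
```

A large negative gap on kt combined with smaller gaps on the stability metrics is the signature of the "hazy but stable" failure mode described above.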
Table 2. Glossary of performance evaluation metrics used in this study. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
2.4 Closed-loop refinement cycle
The diagnostic insights obtained through SHAP-based failure analysis are directly leveraged to improve model performance through a closed, human-in-the-loop refinement cycle. Rather than relying solely on automated hyperparameter tuning or feature selection, this approach combines data-driven explanations with expert knowledge to iteratively enhance the model's robustness and trustworthiness. The process is guided by a fundamental principle: each model weakness should inform a specific and interpretable intervention in the feature space. Refinements are applied in two stages, each grounded in the failure modes identified previously. To capture the specific failure patterns identified by SHAP, the following engineered features are defined:
- CVghi,30: Coefficient of variation of GHI over a 30 min window, quantifying the short-term stability of irradiance.
- Δkd,z: Deviation of the diffuse fraction from its zenith-specific median, allowing contextualisation of diffuse irradiance.
- A match score between the measured and theoretical direct-to-diffuse ratio, used to evaluate low-light conditions such as dawn or dusk.
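Two of these engineered features can be sketched in plain Python (the 10° zenith bin width and the toy values are illustrative assumptions, not the paper's exact choices):

```python
import math

def coefficient_of_variation(values):
    """CV_ghi,30: relative dispersion of GHI over a 30 min window."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(var) / m

def diffuse_fraction_deviation(kd, zenith_deg, median_by_bin, bin_width=10):
    """Delta k_d,z: deviation of the diffuse fraction from the median
    observed in the same zenith-angle bin (bin width is an assumption)."""
    bin_idx = int(zenith_deg // bin_width)
    return kd - median_by_bin[bin_idx]

# 30 samples of stable GHI vs the same trace with one transient spike,
# mimicking a cloud-enhancement event:
stable = [800.0 + 2.0 * math.sin(i / 5.0) for i in range(30)]
spiky = stable[:15] + [1100.0] + stable[16:]
cv_stable = coefficient_of_variation(stable)
cv_spiky = coefficient_of_variation(spiky)

medians = {4: 0.12}  # hypothetical median diffuse fraction, zenith 40-50 deg
dkd = diffuse_fraction_deviation(kd=0.18, zenith_deg=45.0, median_by_bin=medians)
```

The spiky trace yields a markedly higher CV, allowing the model to separate a brief enhancement from a genuinely stable clear period, while Δkd,z flags a diffuse fraction that is anomalously high for its solar geometry.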
The first stage of refinement addresses the major FP failure clusters identified in the baseline model, as described in Section 2.3. These clusters revealed systematic misclassifications in physically meaningful regimes such as cloud enhancement, where transient irradiance spikes mimic clear-sky conditions. In response, new features were engineered to capture the longer-term temporal context of irradiance trends. For instance, one such feature quantifies the consistency of direct normal irradiance over a 30 min window, helping to distinguish brief enhancements from sustained clear conditions. This targeted, explanation-driven improvement of the feature set defines the transition from the baseline model to the advanced CSD model.
In the second stage of refinement, attention shifts to the FN cases that persisted in the advanced model. As detailed in Section 2.3, many of these errors were linked to subtle conditions like hazy but cloudless skies, where irradiance values fall slightly below clear-sky expectations despite high temporal stability. To mitigate these issues, additional features were designed to better characterize such intermediate regimes. For example, refined clearness indices and low-frequency trend metrics were added to improve sensitivity to stable but attenuated irradiance conditions. This second round of feature engineering gives rise to the refined CSD model.
Each iteration of the model − the baseline, advanced, and refined versions − was trained using the identical protocol outlined in Section 2.2. This ensured that performance differences arose solely from modifications in the feature space, not from changes in model complexity or training procedures. As a result, the gains observed through the closed-loop refinement cycle can be directly attributed to the diagnostic insights and targeted interventions, thereby validating the effectiveness of the explainability-driven development approach.
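The fixed-protocol comparison can be sketched as follows. Note one substitution: the paper trains a CatBoost classifier, but to keep this sketch dependency-light it uses scikit-learn's GradientBoostingClassifier as a stand-in. The point is only the methodology: every model version shares one frozen training configuration, so any score difference is attributable to the feature set alone.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# One fixed training protocol shared by every model version (baseline,
# advanced, refined), so score differences can only come from features.
PROTOCOL = dict(n_estimators=200, learning_rate=0.1, random_state=42)

def train_and_score(X, y, feature_idx):
    """Train on a column subset under the fixed protocol; return test F1.

    feature_idx selects the feature set of a given model version, e.g.
    baseline columns vs. baseline + engineered columns.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, feature_idx], y, test_size=0.3, random_state=0, stratify=y)
    clf = GradientBoostingClassifier(**PROTOCOL).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))
```

Calling `train_and_score` with progressively richer `feature_idx` lists mirrors the baseline → advanced → refined progression described above.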
3 Results and discussion
3.1 Iterative improvement and final model performance
In this study, model performance is evaluated using multiple standard metrics (Tab. 3). The workflow started with the Baseline CSD Model, a CatBoost classifier trained on the foundational feature set. In the first stage, this model's FP errors were diagnostically analyzed using SHAP clustering. This guided a round of targeted feature engineering to create the Advanced CSD Model. For the second refinement, a comparative SHAP analysis was performed on the FN cases of the Advanced CSD Model, which informed the final feature enhancements that produced the superior Refined CSD Model.
The culmination of the refinement process is the final model's performance on the unseen test set, detailed in the confusion matrix (Fig. 3). The model achieves an accuracy of 97.3%, a precision of 88.6%, and a recall of 93.0%. This balance is critical for PV applications; high precision ensures that periods identified as ‘clear-sky’ are reliable references for PLR calculations, while the low FP rate (1.99%) minimizes the risk of polluting clear-sky datasets used for fault diagnosis. When benchmarked against other state-of-the-art methods (Tab. 4), the Refined CSD Model demonstrates superior overall accuracy and a more effective balance between error types, validating the ability of the interpretable framework to produce highly robust and trustworthy results.
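The reported rates are tied together by simple identities (e.g. the FN rate equals 1 − recall). A small pure-Python helper makes this bookkeeping explicit; the confusion-matrix counts in the example are invented purely to illustrate the arithmetic and are not the paper's actual test-set counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # false positive rate
        "fnr": fn / (fn + tp),  # false negative rate = 1 - recall
    }

# Hypothetical counts, chosen only to illustrate the arithmetic:
m = classification_metrics(tp=93, fp=12, tn=588, fn=7)
```

With these counts the helper returns a recall of 0.93 and an FN rate of 0.07, making the complementarity of the two reported rates explicit.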
Iterative performance improvement across model versions.
Fig. 3 Test set confusion matrix for the Refined CSD model. Performance of the final model evaluated on the held-out test set. The figure shows true and predicted label counts for both non-clear and clear-sky classes, along with key classification metrics: accuracy (97.3%), precision (88.6%), and recall (93.0%). This balance of high precision and recall highlights the model's robustness in identifying clear-sky conditions while limiting false positives.
Performance comparison of the final refined CSD model against benchmark methods from Jordan & Hansen (2023).
3.2 Misclassification analysis of model failures with SHAP
To illustrate the diagnostic process that led to the final model improvements, a deep interpretability analysis was performed on the intermediate Advanced CSD Model. This analysis goes beyond overall performance metrics to investigate the underlying causes of model failures, providing a clear path for targeted refinement.
A global SHAP analysis was first conducted to assess the overall interpretability and physical plausibility of the model's decision-making process (Fig. 4). This analysis confirmed that the model appropriately identifies the key physical factors that govern solar irradiance under varying atmospheric conditions. In particular, features that quantify irradiance stability (Sghi), irradiance component ratios (Rd), and solar geometry parameters such as the azimuth angle (Az) consistently emerged as the most influential predictors of clear-sky conditions (Figs. 4a–4b). These results indicate that the model does not rely on spurious correlations but instead aligns with known physical principles governing solar radiation.

Beyond global interpretability, the SHAP analysis provides more granular diagnostic insight when examining the model's misclassifications. By comparing the SHAP value distributions for FPs and FNs, the origins of the errors can be traced to distinct physical phenomena (Figs. 4c–4d). FP errors, in which non-clear-sky conditions are erroneously classified as clear, are predominantly associated with unusually high values of Rd and other irradiance features. This pattern reflects cloud enhancement effects, where scattered or thin clouds can artificially elevate direct irradiance levels, mimicking clear-sky conditions and misleading the model. In contrast, FN errors, where true clear-sky instances are missed, typically occur under moderately attenuated irradiance conditions. These situations correspond to hazy but cloud-free skies, where aerosols or atmospheric turbidity reduce overall irradiance without the presence of clouds. Overall, the SHAP-based error analysis not only identifies which features drive correct predictions but also differentiates the physical mechanisms behind specific error types, providing a targeted roadmap for further model refinement.
By explicitly linking misclassification patterns to interpretable environmental factors, this approach enhances trust in the model and informs potential adjustments in feature engineering or threshold selection.
To further dissect the complex FP errors, a clustering approach was applied to their SHAP values using UMAP and K-means. This partitioned the FPs into distinct groups with unique physical signatures (Figs. 5a-5b). The analysis of the cluster profiles and feature deviations (Figs. 5c-5e) identifies the dominant failure mode as Cloud Enhancement [25,59], where transient irradiance spikes are misinterpreted. Other smaller but significant patterns, such as errors during dawn/dusk transitions (characterized by a low direct-to-diffuse ratio) and cases with potential sensor anomalies, were also clearly isolated. By identifying, characterizing, and quantifying these systematic error patterns, an evidence-based guide for model improvement was established. It was this detailed diagnosis that directly informed the targeted feature engineering − such as adding longer-term stability metrics − that produced the superior performance of the Refined CSD Model.
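The clustering step can be sketched as follows. One substitution: UMAP (from the third-party umap-learn package, as used in the paper) is replaced here with PCA to keep the example dependency-light; the pipeline shape (embed the per-error SHAP matrix into 2-D, then cluster the embedding) is the same.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_error_explanations(shap_values, n_clusters=3, seed=0):
    """Embed an (n_errors, n_features) SHAP matrix into 2-D, then cluster.

    The paper uses UMAP for the embedding; PCA is a stand-in here so the
    sketch needs only scikit-learn. Returns the 2-D embedding and one
    cluster label per misclassified sample.
    """
    embedding = PCA(n_components=2, random_state=seed).fit_transform(shap_values)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embedding)
    return embedding, labels
```

Each resulting cluster can then be profiled (mean feature values, deviations from the global FP average) to attach a physical interpretation such as "cloud enhancement" or "dawn/dusk transition".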
Fig. 4 SHAP analysis of the Baseline CSD Model. (a) Global feature importance, ranked by mean absolute SHAP value. (b) Global SHAP summary plot for all test instances. (c) and (d) respectively compare the feature attributions for False Negative (FN) and False Positive (FP) misclassifications. The analysis reveals distinct failure modes: FPs are primarily driven by high values of features like Rd and kt, characteristic of cloud enhancement, whereas FNs are associated with moderately attenuated irradiance, highlighting the model's difficulty with ambiguous sky conditions.
Fig. 5 Comprehensive analysis of False Positive (FP) clusters from the Advanced CSD Model, derived from clustering misclassification SHAP values. (a) Distribution of FPs in the feature space of Clearness Index (kt) and Direct to Diffuse Ratio (Rd), with reference lines for cloud enhancement and low-ratio conditions. (b) UMAP projection of the FP SHAP values, which demonstrates a clear separation of the errors into distinct clusters. (c) Parallel coordinate plots showing the unique, normalized feature profile for each identified cluster. (d) The relative size of each FP cluster, identifying 'Cloud Enhancement' as the dominant failure mode. (e) Heatmap quantifying the mean percentage deviation of key features for each cluster from the global FP average, providing a quantitative basis for the patterns seen in (c).
3.3 From diagnostic insights to model refinement
The final step of the proposed methodology closes the loop by translating the diagnostic insights into specific feature engineering, which directly validates the interpretable framework. The targeted feature engineering, driven directly by SHAP-based diagnostics, is the key mechanism behind the performance gains of the Refined CSD Model. Table 5 summarizes this insight-to-feature workflow, detailing how specific failure patterns were identified through SHAP and addressed with corresponding, purpose-built features. For example, the cloud enhancement pattern, which dominated the FP clusters (Fig. 5d), was addressed by introducing long-term stability metrics like CVghi,30. Similarly, the diagnosis of FNs in hazy conditions (Fig. 4c) led to the development of context-aware features like Δkd,z, which allows the model to learn that a higher diffuse component is physically expected at high zenith angles. This process validates the method's effectiveness in creating more robust and physically-aware models.
Mapping of SHAP-identified failure patterns to engineered feature solutions. New abbreviations are defined as follows: CVghi,30 is the coefficient of variation for GHI over a 30 min window; Δkd,z is the deviation of the diffuse fraction from its zenith-specific median; and the third engineered feature is the match score between the measured and theoretical direct-to-diffuse ratio.
3.4 Application for PV fault diagnosis and performance assessment
The value of a highly accurate CSD model lies in its ability to improve the reliability of downstream PV analytics. One of the most common applications is “clear-sky filtering”, where system performance data is analyzed exclusively during clear-sky periods to provide a stable baseline for detecting off-nominal behavior [6,60]. Figure 6 provides a practical example of this approach for fault diagnosis. In the figure, the Refined CSD Model accurately identifies the clear-sky periods, during which the system's voltage is expected to follow a smooth, predictable curve relative to the irradiance. However, the plot reveals two instances of sharp, anomalous voltage drops that do not correspond to any change in GHI. These events are classic signatures of a system-level fault; in this case, temporary partial shading with bypass diode activation [14].
The reliability of such a diagnostic system is critically dependent on the quality of the CSD input. A less precise model with a higher FP rate would incorrectly include periods of intermittent cloud cover in the analysis, potentially triggering false alarms by misinterpreting cloud-induced variability as system faults. Conversely, a model with a low recall could fail to identify enough clear-sky data to form a reliable performance baseline. The high precision and recall of the Refined CSD Model are therefore essential for building robust fault detection systems that minimize both false alarms and missed detections. This same principle extends to long-term performance assessment, where accurate clear-sky filtering is required to calculate degradation metrics like PLR with low uncertainty [61], enhancing confidence in the financial and technical assessment of PV assets.
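A minimal sketch of this filtering logic is shown below. The column names, the 10% voltage-drop threshold, and the 2% irradiance-stability threshold are illustrative assumptions, not values from the paper.

```python
import pandas as pd

def flag_voltage_anomalies(df, drop_frac=0.10):
    """Flag minutes where voltage drops sharply while GHI stays stable.

    Expects a 1-min DataFrame with a boolean 'clear_sky' column (the CSD
    model output) plus 'voltage' and 'ghi' columns. Names and thresholds
    are illustrative. Only clear-sky minutes are screened, so cloud-driven
    variability cannot trigger a false alarm.
    """
    clear = df[df["clear_sky"]]
    v_drop = clear["voltage"].pct_change(fill_method=None) < -drop_frac
    ghi_stable = clear["ghi"].pct_change(fill_method=None).abs() < 0.02
    return clear.index[v_drop & ghi_stable]
```

In the scenario of Figure 6, the two bypass-diode events would appear as flagged timestamps, while cloudy-period variability is excluded by the clear-sky mask before any threshold is applied.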
Fig. 6 Example of clear-sky filtering for PV fault diagnosis. The GHI (yellow) is plotted alongside a PV system's voltage (grey). The output of the Refined CSD Model is shown as the highlighted clear-sky period (green). During two clear-sky intervals, sharp, anomalous dips in voltage are observed which are not correlated with irradiance, indicating a system fault due to partial shading with bypass diode activation.
3.5 Limitations and outlook
A primary limitation of this work is its reliance on a single, high-quality labelled dataset for training and evaluation, namely the CSD dataset introduced by Jordan & Hansen [6]. While this dataset is invaluable for method development, it is geographically specific, and models trained exclusively on this data may exhibit limited generalizability when applied to sites with different atmospheric conditions, such as higher humidity or elevated aerosol content [62]. This limitation is not unique to our study but reflects a broader challenge in CSD, where the lack of globally distributed, high-quality labelled datasets constrains the development of universally applicable models [7].
In addition, a practical consideration concerns the availability of irradiance measurements. In most real-world PV installations, plane-of-array (POA) irradiance is typically measured and GHI is occasionally available, but DHI is rarely recorded and DNI is generally limited to research-grade monitoring sites. Consequently, the direct-to-diffuse ratio Rd, which was shown to be a highly informative feature, cannot be computed in many operational contexts. Although the proposed CSD model can be trained and operated without Rd, with only a modest reduction in accuracy, the inclusion of this feature limits the direct applicability of the current trained model. Alternatively, GHI, DHI, and DNI are available from satellite-based irradiance products, which provide global coverage. While satellite data may introduce additional uncertainty compared to ground-based reference measurements, they nonetheless enable the proposed method to be applied broadly beyond research installations. This demonstrates both the framework's flexibility and its potential for practical use, while making explicit the trade-off between interpretability, accuracy, and data availability.
Nevertheless, the interpretable AI framework proposed here remains inherently designed to mitigate this generalizability challenge. By emphasizing transparency and physics-informed feature design, the model allows researchers to identify which features are context-specific (e.g., those depending on DHI or DNI) versus those capturing more fundamental sky condition signatures. This transparency supports targeted adaptation strategies, such as estimating missing irradiance components from available measurements, rather than relying solely on retraining with new local datasets. In this sense, explainability not only enhances trust, but also provides a pathway to scalability across diverse operational contexts.
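As one example of such an adaptation, missing DHI and DNI can be estimated from GHI alone using the well-known Erbs decomposition correlation (pvlib provides an equivalent in `pvlib.irradiance.erbs`). The sketch below is a standard implementation of that correlation, not part of the paper's model.

```python
import numpy as np

def erbs_decomposition(ghi, extra_ghi, zenith_deg):
    """Estimate DHI and DNI from GHI via the Erbs (1982) correlation.

    ghi: measured global horizontal irradiance [W/m^2]
    extra_ghi: extraterrestrial horizontal irradiance [W/m^2]
    zenith_deg: solar zenith angle [degrees]
    """
    kt = np.clip(ghi / extra_ghi, 0.0, 1.0)  # clearness index
    # Piecewise diffuse fraction kd = DHI / GHI
    kd = np.where(
        kt <= 0.22,
        1.0 - 0.09 * kt,
        np.where(
            kt <= 0.80,
            0.9511 - 0.1604 * kt + 4.388 * kt**2
            - 16.638 * kt**3 + 12.336 * kt**4,
            0.165,
        ),
    )
    dhi = kd * ghi
    dni = (ghi - dhi) / np.cos(np.radians(zenith_deg))
    return dhi, dni
```

Estimated components carry extra uncertainty, but they allow Rd-style features to be approximated at sites where only GHI is measured.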
Beyond model development, future research should focus on integrating high-accuracy CSD models into broader PV monitoring and analytics pipelines for applications such as operations and maintenance (O&M) decision support or automated alert systems. Furthermore, the framework could be extended beyond binary classification to identify specific weather phenomena affecting PV performance, such as fog, snow, or heavy aerosol events. Such extensions would enable cross-domain applications in nowcasting and short-term forecasting, where accurate recognition of sky conditions is critical for managing the variability of solar generation [63].
4 Conclusion
This paper presents an interpretable AI framework for clear-sky detection that addresses the “black-box” challenge in ML-based PV monitoring. By integrating a SHAP-based diagnostic workflow into model development, the approach achieves both high classification performance and interpretability, enabling identification of the underlying causes of model failures. The two-stage refinement, guided by SHAP clustering and targeted feature engineering, progressively improved the model, resulting in performance surpassing established benchmarks.
A key contribution of this work is the methodology for deconstructing model errors into physically meaningful patterns, such as confusion during cloud enhancement events versus conservativeness in hazy conditions. This approach enhances downstream PV analytics by reducing false alarms and lowering the uncertainty of PLR calculations. More broadly, it provides a blueprint for developing trustworthy AI in the energy sector, where transparency and correctable failures increase confidence and accountability in automated decision-making.
While the proposed model relies on all three irradiance components (GHI, DNI, and DHI), which are not always measured on-site, these inputs can be obtained from widely accessible satellite-based irradiance products, thereby extending the model's applicability beyond research-grade installations. Future work should validate the generalizability of the CSD model across diverse climatic regions and extend its capabilities to additional weather phenomena, such as fog or snow, to further support PV operations and forecasting. The diagnostic framework could also be applied to other PV tasks, including soiling detection and fault classification, where understanding model failure is critical. Finally, the identified error patterns could inform a real-time confidence metric, enabling the system to flag ambiguous sky conditions where predictions may be less certain.
Acronyms and symbols
AI: Artificial Intelligence
CatBoost: Categorical Boosting
CNN: Convolutional Neural Network
CSD: Clear-Sky Detection
DHI: Diffuse Horizontal Irradiance
DNI: Direct Normal Irradiance
FDD: Fault Detection and Diagnosis
FN: False Negative
FP: False Positive
GHI: Global Horizontal Irradiance
ML: Machine Learning
O&M: Operations and Maintenance
PLR: Performance Loss Rate
POA: Plane-of-Array
PV: Photovoltaic
SHAP: SHapley Additive exPlanations
UMAP: Uniform Manifold Approximation and Projection
XAI: Explainable Artificial Intelligence
Funding
Swiss Federal Office of Energy (SFOE), Project ASSURed PV2, Contract SI/502904-01.
Conflicts of interest
There are no conflicts of interest to declare.
Data availability statement
The Python code for the developed CSD model is shared in a GitHub repository archived in Zenodo: https://doi.org/10.5281/zenodo.16033987.
CRediT Author Statement
Bohan Li: Conceptualisation, Methodology, Investigation, Data Curation, Formal Analysis, Visualisation, Writing − Original Draft, Writing − Review & Editing. Hugo Quest: Conceptualisation, Methodology, Data Curation, Visualisation, Writing − Original Draft, Writing − Review & Editing, Supervision. Antonin Faes: Writing − Review & Editing, Supervision. Alessandro Virtuani: Writing − Review & Editing, Supervision. Christophe Ballif: Writing − Review & Editing, Supervision.
References
- G. Masson, A.V. Rechem, M.D. l'Epine, A. Jäger-Waldau, Snapshot of global PV markets 2025, Tech. Rep. (IEA-PVPS, Apr. 2025)
- S.R. Madeti, S. Singh, Monitoring system for photovoltaic plants: a review, Renew. Sustain. Energy Rev. 67, 1180 (2017)
- M.M. Rahman, J. Selvaraj, N.A. Rahim, M. Hasanuzzaman, Global modern monitoring systems for PV based power generation: a review, Renew. Sustain. Energy Rev. 82, 4142 (2018)
- F. Antonanzas-Torres, R. Urraca, J. Polo, O. Perpiñán-Lamigueiro, R. Escobar, Clear sky solar irradiance models: a review of seventy models, Renew. Sustain. Energy Rev. 107, 374 (2019)
- J.M. Bright et al., Bright-Sun: a globally applicable 1-min irradiance clear-sky detection model, Renew. Sustain. Energy Rev. 121, 109706 (2020)
- D.C. Jordan, C. Hansen, Clear-sky detection for PV degradation analysis using multiple regression, Renew. Energy 209, 393 (2023)
- M. Liu, X. Xia, D. Fu, J. Zhang, Development and validation of machine-learning clear-sky detection method using 1-min irradiance data and sky imagers at a polluted suburban site, Xianghe, Remote Sens. 13, 3763 (2021)
- C.A. Gueymard, J.M. Bright, D. Lingfors, A. Habte, M. Sengupta, A posteriori clear-sky identification methods in solar irradiance time series: review and preliminary validation using sky imagers, Renew. Sustain. Energy Rev. 109, 412 (2019)
- B. Hartmann, Comparing various solar irradiance categorization methods − a critique on robustness, Renew. Energy 154, 661 (2020)
- K. Emery, R. Smith, Requirements for a standard test to rate the durability of photovoltaic (PV) modules at system voltage, in Monitoring System Performance (2011)
- H. Walker, J. Desai, D. Heimiller, Performance of photovoltaic systems recorded by the open Solar Performance and Reliability Clearinghouse (oSPARC), Tech. Rep. NREL/TP-5C00-75162 (Feb. 2020)
- A. Triki-Lahiani, A. Bennani-Ben Abdelghani, I. Slama-Belkhodja, Fault detection and monitoring systems for photovoltaic installations: a review, Renew. Sustain. Energy Rev. 82, 2680 (2018)
- Y.-Y. Hong, R.A. Pula, Methods of photovoltaic fault detection and classification: a review, Energy Rep. 8, 5898 (2022)
- H. Quest, C. Ballif, A. Virtuani, Intrinsic performance loss rate: decoupling reversible and irreversible losses for an improved assessment of photovoltaic system performance, Prog. Photovolt.: Res. Appl. 32, 774 (2024)
- R.H. French et al., Assessment of performance loss rate of PV power systems, Tech. Rep. IEA-PVPS T13-22:2021 (Apr. 2021)
- L. Karttunen et al., Comparing methods for the long-term performance assessment of bifacial photovoltaic modules in Nordic conditions, Renew. Energy 219, 119473 (2023)
- M.G. Deceglie et al., Perspective: performance loss rate in photovoltaic systems, Sol. RRL 7, 2300196 (2023)
- R.E. Bird, R.L. Hulstrom, Simplified clear sky model for direct and diffuse insolation on horizontal surfaces, Tech. Rep. SERI/TR 642 (Solar Energy Research Inst. (SERI), Golden, CO, USA, Feb. 1981)
- M. Grigiante, F. Mottes, D. Zardi, M. de Franceschi, Experimental solar radiation measurements and their effectiveness in setting up a real-sky irradiance model, Renew. Energy 36, 1 (2011)
- P. Ineichen, Validation of models that estimate the clear sky global and beam solar irradiance, Sol. Energy 132, 332 (2016)
- M.J. Reno, C.W. Hansen, Identification of periods of clear sky irradiance in time series of GHI measurements, Renew. Energy 90, 520 (2016)
- X. Sun et al., Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance models using principal component analysis, Renew. Sustain. Energy Rev. 135, 110087 (2021)
- A.R. Lusi, P.F. Orte, E. Wolfram, J.I. Orlando, Cloud classification through machine learning and global horizontal irradiance data analysis, Q. J. R. Meteorol. Soc. 150, 5435 (2024)
- A.R. Starke, L.F.L. Lemos, J. Boland, J.M. Cardemil, S. Colle, Resolution of the cloud enhancement problem for one-minute diffuse radiation prediction, Renew. Energy 125, 472 (2018)
- A. Castillejo-Cuberos, R. Escobar, Detection and characterization of cloud enhancement events for solar irradiance using a model-independent, statistically-driven approach, Sol. Energy 209, 547 (2020)
- D. González-Fernández et al., A neural network to retrieve cloud cover from all-sky cameras: a case of study over Antarctica, Q. J. R. Meteorol. Soc. 150, 4631 (2024)
- J. Song, Z. Yan, Y. Niu, L. Zou, X. Lin, Cloud detection method based on clear sky background under multiple weather conditions, Sol. Energy 255, 1 (2023)
- E. Scolari, F. Sossan, M. Haure-Touzé, M. Paolone, Local estimation of the global horizontal irradiance using an all-sky camera, Sol. Energy 173, 1225 (2018)
- R. Chauvin, J. Nou, J. Eynard, S. Thil, S. Grieu, A new approach to the real-time assessment and intraday forecasting of clear-sky direct normal irradiance, Sol. Energy 167, 35 (2018)
- E.F. Abreu, P. Canhoto, M.J. Costa, Development of a clear-sky model to determine circumsolar irradiance using widely available solar radiation data, Sol. Energy 205, 88 (2020)
- X. Wang, D. Pi, X. Zhang, H. Liu, C. Guo, Variational transformer-based anomaly detection approach for multivariate time series, Measurement 191, 110791 (2022)
- H. Hewamalage, K. Ackermann, C. Bergmeir, Forecast evaluation for data scientists: common pitfalls and best practices, Data Min. Knowl. Discov. 37, 788 (2023)
- G.M. Lohmann, A.H. Monahan, D. Heinemann, Local short-term variability in solar irradiance, Atmos. Chem. Phys. 16, 6365 (2016)
- W. Mol, C. van Heerwaarden, Mechanisms of surface solar irradiance variability under broken clouds, EGUsphere [preprint] (2024), https://doi.org/10.5194/egusphere-2024-2396
- K.T.N. Ihsan, H. Takenaka, A. Higuchi, A.D. Sakti, K. Wikantika, Solar irradiance variability around Asia Pacific: spatial and temporal perspective for active use of solar energy, Sol. Energy 276, 112678 (2024)
- A.D. Seleznyov, S.K. Solanki, N.A. Krivova, Modelling solar irradiance variability on time scales from minutes to months, Astron. Astrophys. 532, A108 (2011)
- R. Machlev et al., Explainable Artificial Intelligence (XAI) techniques for energy and power systems: review, challenges and opportunities, Energy AI 9, 100169 (2022)
- V. Hassija et al., Interpreting black-box models: a review on explainable artificial intelligence, Cogn. Comput. 16, 45 (2024)
- S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, Adv. Neural Inf. Process. Syst. 30, 4765 (2017)
- S.M. Lundberg et al., From local explanations to global understanding with explainable AI for trees, Nat. Mach. Intell. 2, 56 (2020)
- A. Cooper, O. Doyle, A. Bourke, Supervised clustering for subgroup discovery: an application to COVID-19 symptomatology, in Communications in Computer and Information Science, edited by M. Kamp et al. (Springer International Publishing, Cham, 2021), p. 408
- J. Cohen, X. Huan, J. Ni, Shapley-based explainable AI for clustering applications in fault diagnosis and prognosis, J. Intell. Manuf. 35, 4071 (2024)
- A. Brandsæter, I.K. Glad, Shapley values for cluster importance, Data Min. Knowl. Discov. 38, 2633 (2024)
- N.A. Smuha, The EU approach to ethics guidelines for trustworthy artificial intelligence, Comput. Law Rev. Int. 20, 97 (2019)
- European Commission, Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain union legislative acts (2021)
- European Parliament, Regulation (EU) 2024/1689 of the European Parliament and of the Council − Artificial Intelligence Act (AI Act) (June 2024)
- B. Li et al., Trustworthy AI: from principles to practices, ACM Comput. Surv. 55, 1 (2023)
- L. Prokhorenkova, G. Gusev, A. Vorobev, A.V. Dorogush, A. Gulin, CatBoost: unbiased boosting with categorical features, in Advances in Neural Information Processing Systems 31 (NeurIPS 2018, Montréal, Canada)
- A.V. Dorogush, V. Ershov, A. Gulin, CatBoost: gradient boosting with categorical features support, arXiv preprint arXiv:1810.11363 (2018)
- H.N. Nguyen, Q.T. Tran, C.T. Ngo, D.D. Nguyen, V.Q. Tran, Solar energy prediction through machine learning models: a comparative analysis of regressor algorithms, PLoS ONE 20, e0315955 (2025)
- R.A. Rajagukguk, H. Lee, Application of explainable machine learning for estimating direct and diffuse components of solar irradiance, Sci. Rep. 15, 7402 (2025)
- W.E. Marcílio Jr, D.M. Eler, Explaining dimensionality reduction results using Shapley values, Exp. Syst. Appl. 178, 115020 (2021)
- P.V. Matrenin, V.V. Gamaley, A.I. Khalyasmaa, A.I. Stepanova, Solar irradiance forecasting with natural language processing of cloud observations and interpretation of results with modified Shapley Additive Explanations, Algorithms 17, 150 (2024)
- G. Lopez, F.J. Batlles, J. Tovar-Pescador, A new simple parameterization of daily clear-sky global solar radiation including horizon effects, Energy Convers. Manag. 48, 226 (2007)
- B. Mabasa, M.D. Lysko, H. Tazvinga, N. Zwane, S.J. Moloi, The performance assessment of six global horizontal irradiance clear sky models in six climatological regions in South Africa, Energies 14, 2583 (2021)
- W.F. Holmgren, C.W. Hansen, M.A. Mikofski, pvlib python: a python package for modeling solar energy systems, J. Open Source Softw. 3, 884 (2018)
- Z. John Lu, The elements of statistical learning: data mining, inference, and prediction, J. R. Stat. Soc. Ser. A Stat. Soc. 173, 693 (2010)
- P. Christen, D.J. Hand, N. Kirielle, A review of the F-measure: its history, properties, criticism, and alternatives, ACM Comput. Surv. 56, 1 (2023)
- K. Lappalainen, J. Kleissl, Analysis of the cloud enhancement phenomenon and its effects on photovoltaic generators based on cloud speed sensor measurements, J. Renew. Sustain. Energy 12, 043502 (2020)
- H. Quest, C. Ballif, A. Virtuani, Multi-annual year-on-year: minimizing the uncertainty in photovoltaic system performance loss rates, Prog. Photovolt.: Res. Appl. 33, 411 (2025)
- S. Lindig, M. Theristis, D. Moser, Best practices for photovoltaic performance loss rate calculations, Prog. Energy 4, 022003 (2022)
- Q. Paletta, Y. Nie, Y.-M. Saint-Drenan, B. Le Saux, Improving cross-site generalizability of vision-based solar forecasting models with physics-informed transfer learning, Energy Convers. Manag. 309, 118398 (2024)
- J. Simeunović, B. Schubnel, P.-J. Alet, R.E. Carrillo, P. Frossard, Interpretable temporal-spatial graph attention network for multi-site PV power forecasting, Appl. Energy 327, 120127 (2022)
Cite this article as: Bohan Li, Alessandro Virtuani, Christophe Ballif, Antonin Faes, Hugo Quest, An Interpretable AI framework for clear-sky detection in photovoltaics monitoring, EPJ Photovoltaics 16, 32 (2025), https://doi.org/10.1051/epjpv/2025021
All Tables
Glossary of performance evaluation metrics used in this study. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.