Detection of microcracks and dark spots in monocrystalline PERC cells using photoluminescence imaging and YOLO-based CNN with spatial pyramid pooling

Two common defects encountered during the manufacturing of crystalline silicon solar cells are the microcrack and the dark spot or dark region. The microcrack in particular is a major threat to module performance since it is responsible for most PV failures and other types of damage in the field. The dark region, in which one cell or part of a cell appears darker under UV illumination, is mainly responsible for reduced PV efficiency and eventual loss of performance. Therefore, one key challenge for solar cell manufacturers is to remove defective cells from further processing. Recently, a few researchers have investigated deep learning as an alternative approach for defect detection in solar cell manufacturing, and the results are quite encouraging. This paper evaluates convolutional neural networks based on the heavy-weighted You Only Look Once version 4 (YOLOv4) and the tiny version of this algorithm, referred to here as Tiny-YOLOv4. Experimental results suggest that the multi-class YOLOv4 is the most accurate model, with mean average precision (mAP) and prediction time averaging 98.8% and 62.9 ms respectively. Meanwhile, an improved Tiny-YOLOv4 with a Spatial Pyramid Pooling (SPP) scheme resulted in an mAP of 91.0% and a runtime of 28.2 ms. Although the tiny-weighted YOLOv4 is slightly less accurate than its heavy-weighted counterpart, it is about 2.2 times faster than the latter.


Introduction
Solar energy has gained increasing attention as a source of renewable energy in recent years as the price of oil has increased and environmental concerns have grown. However, the efficiency of a PV system is reduced by solar cell defects that occur during production or operation. Among the many types of defects, the microcrack and the dark region are high on the list of worries for the PV industry. On average, about 5-10% of fully completed solar cells coming out of a production line contain these defects, in particular microcracks. Eliminating these defects will help increase the efficiency of PV generation and the lifetime of solar cells. However, detecting them can be challenging because they vary in shape and size. Furthermore, defects located below the surface of a solar cell may be invisible in an image captured by a standard CCD camera. Hence, specialized imaging systems have to be deployed to inspect PV cells for defects. To date, various imaging technologies such as electroluminescence (EL) [1,2], infrared (IR) [3] and photoluminescence (PL) [4] have been developed for solar cell and wafer inspection. A study that reviewed existing and emerging imaging technologies for solar wafer or cell inspection concluded that PL remains popular among machine vision manufacturers due to its speed and cost advantages [5]. Interested readers who wish to learn more about these technologies are referred to that publication. On the software side, several researchers have developed techniques for inspecting solar cells and PV modules automatically. These approaches can be divided into two categories: (i) image processing techniques and (ii) deep learning methods. This study investigates a deep learning approach based on the YOLOv4 algorithm for training a model capable of detecting microcracks and dark regions in solar cell images captured using a PL system.
Both the heavy-weighted and light-weighted YOLOv4 networks are used in the investigation, from which an improved Tiny-YOLOv4 is proposed.

Related works

Defect detection based on image processing approach
Traditional image processing approaches have been widely used to identify defects in solar cells and modules, such as microcracks and finger interruptions. Most of these techniques are based on the color, shape, and texture of defects. An algorithm based on anisotropic diffusion filtering was proposed in [6] to detect small visible cracks on the surface of solar wafers. The image captured with a standard CCD camera and light source reveals microcracks as regions with low grey levels and strong gradients. Diffusion preserves the crystal grain background while smoothing out the suspected defect region. By subtracting the diffused image from the original image, the location of the crack can be determined precisely. Meanwhile, [7] proposed an anisotropic diffusion filter and image segmentation method for identifying microcracks in the presence of noise. A method that uses spectral clustering to group the features from the regions of interest was proposed in [8]; it achieves high efficiency when detecting finger interruptions in polycrystalline cells. Chen et al. [9] proposed a detection method that can precisely locate cell cracking in EL images using steerable evidence filtering. Although these studies are very promising, most authors considered only microcracks or finger interruptions. To address this drawback, a method based on Fourier image reconstruction has been proposed in [10] that successfully detects breaks, finger interruptions, and microcracks. Even though this method has proven effective, the algorithm is very time consuming and requires a high-end workstation in order to deliver a real-time solution.

Defect detection based on deep learning approach
A few researchers have successfully applied deep learning approaches to inspect PV cells in industrial set-ups. Both polycrystalline and monocrystalline cells have been considered. For instance, two automatic defect detection techniques, one based on an improved VGG-19 CNN and the other on the SVM algorithm, have been proposed [11]. Rounding the continuous probability prediction to the nearest of the four original classes enabled these authors to compare the predictions of the algorithm with the ground truth labels directly. The results indicate that the CNN is more accurate than the SVM method. In another study, Li et al. [12] proposed a highly accurate diagnostic model for defect detection of PV modules in a large-scale solar farm using unmanned aerial vehicles and a CNN. The results are very encouraging, with accuracy reaching more than 90% in a controlled environment. Similarly, [13] proposed an automatic defect classification method based on a CNN model that achieved a higher classification accuracy than existing methods. Although the aforementioned studies achieved competitive accuracies, the methods and procedures are very computationally intensive, making these techniques highly dependent on expensive hardware. Addressing this problem led to the development of a lightweight CNN architecture which is referred to in the literature as the Faster-RCNN [14]. Results indicate that this architecture yielded satisfactory results with few calculations. A further improvement utilizing the genetic algorithm for feature extraction has been reported following the publication of the GA-Faster-RCNN technique [15]. The improved technique can identify cell defects and mark their locations automatically. In another study, Zhang et al. [16] designed a detection algorithm that combines results from Faster R-CNN and R-FCN in order to improve detection precision and position accuracy.
The authors reported a mean average precision (mAP) of 85.7%, which is low by industrial standards. A much more complex deep learning architecture has also been developed by combining three different CNN technologies: (i) the Faster-RCNN, (ii) the EfficientNet, and (iii) the autoencoder [17]. Such a combination enabled the authors to construct an end-to-end deep learning pipeline that detects, localizes, and segments cell-level anomalies from entire photovoltaic modules. This new architecture has also improved the sensitivity of Faster-RCNN when detecting defects in solar cells. In this case the autoencoder not only allowed the anomalies in PV modules to be segmented, but also facilitated defect detection. In another work, a CNN based on a generative adversarial network has been evaluated for detecting anomalies in solar cells [18]. Although tested on different datasets, such a network performs less satisfactorily, with recall, precision and specificity averaging 79%, 73% and 73% respectively. More recently, an empirical digital twin which fuses measurement data and expert knowledge has been utilized for detecting certain types of defects in solar wafers [19]. Though promising, this method requires IV measurements, which are difficult to implement in high-speed production settings. From this brief literature review, there are currently three main issues associated with the application of deep learning to defect detection in solar cells or PV modules: (i) low detection accuracy, (ii) slow detection speed, and (iii) the lack of a standard for defect detection. Although existing methods have significantly reduced inspection time, they are incapable of processing images in real time. In environments with constrained hardware resources and stringent real-time requirements, their application remains difficult.
In terms of business and economy, high over-rejection and under-rejection rates, which result from poor precision and recall, would impact the industry negatively. Hence, additional research is required.
boxes and the probability of a class being within these boxes. A single convolutional network predicts multiple bounding boxes simultaneously, requiring only a single look at an image to determine which objects are present and where they are. YOLOv4, introduced in 2020, is an enhanced version of the original YOLO architecture. Since then, YOLOv4 has become very popular in the AI community because of its accuracy and detection precision. More importantly, this algorithm, in particular Tiny-YOLOv4, is architecturally less complicated than other types of convolutional neural networks, making it suitable for high-speed applications such as PV production. The backbone, network training, activation function, and loss function of YOLOv4 are optimized, resulting in faster operation without compromising the accuracy and sensitivity of the algorithm [21]. To train and extract image features, YOLOv4 uses CSPDarknet53 as the main backbone engine. The latter is an open-source neural network which is freely available online. Meanwhile, the Path Aggregation Network or PANet [22] is used as the feature extractor, while object detection is achieved by means of the standard YOLOv3 algorithm [20].
Tiny-YOLO, on the other hand, is a derivative of YOLO that contains fewer convolution layers in the backbone [22]. Hence Tiny-YOLO is essentially YOLO with a reduced size. For this reason, Tiny-YOLO can perform real-time detection on devices with low computing power and has a faster detection speed while maintaining the overall network accuracy. This architecture makes use of the CSPDarknet53-Tiny backbone network and the FPN network for enhanced feature extraction. For prediction, this algorithm employs filters of sizes 13 × 13 and 26 × 26 in the first and second layers respectively. Meanwhile, the LeakyReLU function is used to activate the entire network. Prior to detection, each input image is scaled to either 416 × 416 or 608 × 608 pixels. Both the CBL and Resblock modules serve as the network's backbone. The former comprises the convolution layers, the batch normalization block and the LeakyReLU activation function, while the latter consists of a residual network whose structure is similar to CSPNet [22]. The algorithm, specifically the feature extraction network, employs a pyramid structure to improve the network's feature fusion and detection precision. Like other deep learning frameworks, YOLOv4 or Tiny-YOLOv4 needs to be fine-tuned in order to ensure acceptable performance in terms of accuracy and speed. For this reason, a number of YOLOv4 models are trained in this study using both heavy-weighted and light-weighted models. Further improvement of the algorithm is achieved by incorporating Spatial Pyramid Pooling (SPP) in the neck part of the network's backbone. The details are discussed in the following sections.
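
As an illustration of the LeakyReLU activation mentioned above, and of the Mish function against which it is compared during model training, a minimal scalar sketch is given below. The negative slope of 0.1 is an assumption (it is the value commonly used in Darknet-style networks); the exact settings used in this study are not stated.

```python
import math


def leaky_relu(x, slope=0.1):
    """LeakyReLU: identity for x >= 0, a small linear slope for x < 0.

    The slope of 0.1 is an assumed default, not a value taken from the paper.
    """
    return x if x >= 0 else slope * x


def softplus(x):
    """Numerically stable evaluation of log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish: x * tanh(softplus(x)), a smooth alternative to LeakyReLU."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, leaky_relu(value), mish(value))
```

Both functions pass positive inputs almost unchanged; they differ mainly in how negative inputs are attenuated, with Mish being smooth around zero.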

Methodology
The overall methodology for designing and training the models is summarized in Figure 1. The process starts with image acquisition, followed by image annotation and augmentation. Model training constitutes the next stage. This is immediately followed by fine-tuning and evaluating each model using validation and test samples respectively. Based on the results from the previous step, a few models are selected for further improvement, after which the best model is selected for deployment. The latter constitutes the final step.

Image annotation
All images captured in the datasets have been inspected and verified by expert human inspectors. This helps ensure correct labeling and minimizes uncertainty in the data. The software tool LabelImg is used to label all defects in the solar cell images for training, resulting in a text file for each image containing the defect's id and the coordinates of the bounding boxes. In the case of Dataset 3, this tool is used to manually label all defective samples into two classes, 0 and 1, corresponding to the microcrack and dark region classes respectively. In this dataset the sizes of microcracks and dark regions range from 2 mm to 110 mm and from 6 mm² to 1309 mm² respectively. Meanwhile, the labeling is performed separately for Dataset 1 and Dataset 2, and the class id is set to 0 as all samples in each of these datasets belong to one class only.

Image augmentation
Increasing the diversity of samples can help improve identification accuracy. To increase the richness of the experimental data and the model's generalizability, data augmentation is used to increase the number of samples in the datasets. Two popular augmentation techniques are used for this purpose: (i) flipping and (ii) rotating. Flipping is performed both horizontally and vertically, while rotations involve angles of 90° and 270°. Each image in the training sets is processed using these augmentation techniques, and the annotation text file for each augmented image is produced simultaneously. Altogether 1755 additional images are produced after augmentation, resulting in a total of 2106 images for Dataset 3. Similarly, an additional 800 microcrack images and 1100 dark region images are created, yielding a total of 960 and 1320 samples in Dataset 1 and Dataset 2 respectively. The validation set is taken as 15% of each training set, yielding a total of 144, 198 and 316 samples in Dataset 1, Dataset 2 and Dataset 3 respectively.
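
The way a bounding-box annotation follows its image through flipping and rotation can be sketched as follows. This is a minimal illustration and not the authors' tooling: it assumes that the labels are stored in the normalized YOLO text format (class id, x centre, y centre, width, height, all relative to the image size) and that rotations are clockwise; the function names are hypothetical.

```python
def hflip(box):
    """Horizontal flip: mirror the x centre about the vertical axis."""
    c, x, y, w, h = box
    return (c, 1.0 - x, y, w, h)


def vflip(box):
    """Vertical flip: mirror the y centre about the horizontal axis."""
    c, x, y, w, h = box
    return (c, x, 1.0 - y, w, h)


def rot90_cw(box):
    """Rotate 90 degrees clockwise (assumed direction); width and height swap."""
    c, x, y, w, h = box
    return (c, 1.0 - y, x, h, w)


def rot270_cw(box):
    """Rotate 270 degrees clockwise (same as 90 degrees counter-clockwise)."""
    c, x, y, w, h = box
    return (c, y, 1.0 - x, h, w)


def augment_labels(boxes):
    """Return the transformed label list for each augmentation."""
    transforms = {"hflip": hflip, "vflip": vflip,
                  "rot90": rot90_cw, "rot270": rot270_cw}
    return {name: [fn(b) for b in boxes] for name, fn in transforms.items()}


if __name__ == "__main__":
    # Hypothetical label: class 0 (microcrack) near the top-left of the cell.
    print(augment_labels([(0, 0.2, 0.3, 0.1, 0.4)]))
```

Because the coordinates are normalized, the same functions apply to any square input size, including the 416 × 416 and 608 × 608 images used here.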

Model training
All deep learning models discussed in this paper are implemented on an Intel i5-4460 personal computer with 20 GB of memory, operating at a clock speed of 3.2 GHz. This PC is equipped with an 8 GB NVIDIA GeForce GTX 1070 GPU and runs the Windows 10 operating system. Altogether fourteen YOLOv4-based deep learning models are designed and trained with different configurations and parameters. Model 1 and Model 2 are based on the heavy-weighted YOLOv4, and the remaining models (Model 3-Model 14) are based on the Tiny-YOLOv4 algorithm. The parameters considered in the design are the size of the input image, the number of subdivisions, the number of detection layers and the activation function. In the case of the activation function, the popular LeakyReLU is tested alternately with the Mish function while keeping other parameters unchanged. Table 1 summarizes the important configurations and parameter settings for all models. The architecture of the heavy-weighted YOLOv4 with an input size of 608 × 608 is shown in Figure 3. This architecture forms the basis for designing Model 1 and Model 2, which differ in the number of subdivisions and the size of the input image. Model 1 is designed with 32 subdivisions and an input size of 416 × 416. Model 2 is slightly larger, having 64 subdivisions and an input size of 608 × 608. The smaller image size and number of subdivisions in Model 1 allow the GPU to handle more images at a time compared to Model 2. Meanwhile, Figure 4 shows the architecture of Tiny-YOLOv4 with an input size of 416 × 416, its detection layers and the LeakyReLU activation function. This architecture forms the basis of Model 3-Model 14. The tiny models with different parameter settings and types of activation functions are shown in Table 1. Compared to the heavy-weighted YOLO, the light-weighted YOLO enables training using far fewer subdivisions, hence allowing the GPU to handle more images simultaneously during training.
In this way the runtime is improved drastically as evident from results discussed in the following section. All models discussed here are evaluated using the same samples in the datasets, from which the best model is selected for further improvement.
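
The relation between subdivisions and the number of images the GPU handles at once can be illustrated with a short sketch. It assumes the Darknet convention in which each training batch is split into `subdivisions` equal mini-batches that are processed one at a time; the batch size of 64 is an assumption, since the batch size used in this study is not stated, and the divisibility check is a choice of this sketch.

```python
def images_per_pass(batch, subdivisions):
    """Number of images processed together on the GPU in one forward pass.

    Assumes the Darknet convention: mini-batch = batch / subdivisions.
    """
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError("batch must be a positive multiple of subdivisions")
    return batch // subdivisions


if __name__ == "__main__":
    assumed_batch = 64  # assumed value, not taken from the paper
    # Subdivision counts quoted in the text for Model 1 and Model 2.
    for name, subdivisions in (("Model 1", 32), ("Model 2", 64)):
        print(name, images_per_pass(assumed_batch, subdivisions))
```

Under this assumption Model 1 would process two images per pass and Model 2 one, which is consistent with the statement that fewer subdivisions let the GPU handle more images simultaneously.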

Model evaluation
The performance of the algorithm is assessed using a popular indicator, the mean average precision or mAP. This indicator is computed by first forming the precision-recall (PR) curve, and then calculating the average precision (AP) for each class by means of integration. The mAP is simply the average of AP over all classes. Mathematically:

$$AP = \int_0^1 p(r)\,dr$$

and,

$$mAP = \frac{1}{N_c}\sum_{n=1}^{N_c} AP_n$$

where $p(r)$ is the precision expressed as a function of the recall $r$, $AP_n$ is the AP for the nth class, and $N_c$ is the total number of classes. In this study $N_c$ is set to 1 and 2 for single and multi-class models respectively.
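
The AP and mAP computation described above can be sketched numerically as follows. The integration rule is an assumption: the sketch uses the common all-point interpolation, in which the precision curve is first replaced by its monotonically decreasing envelope and the area is then summed as rectangles. The exact rule used in this study is not stated, and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve for one class.

    recalls must be sorted in ascending order. The precision envelope
    (all-point interpolation) is an assumed convention.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values (N_c = number of classes)."""
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical PR points for two classes (microcrack, dark region).
    ap_crack = average_precision([0.5, 1.0], [1.0, 0.5])
    ap_dark = average_precision([0.5, 1.0], [1.0, 1.0])
    print(ap_crack, ap_dark, mean_average_precision([ap_crack, ap_dark]))
```

For a single-class model the list passed to `mean_average_precision` has one entry, so mAP and AP coincide.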

Results and discussions
Each model discussed in the previous section is first evaluated using validation samples and the mAP value is calculated. The process is repeated for test samples, after which the mAP is averaged. Results comparing the performance of each model are plotted graphically in Figures 5-7. The heavy-weighted YOLOv4 in Model 2 is the best multi-class performer, resulting in an average mAP of 98.8%. This value is significantly higher than that of Tiny-YOLOv4 in Model 14, which produced an average mAP of 90.1%. In terms of speed, the original YOLOv4 is the slowest, with training and prediction times averaging approximately 20 h and 63 ms respectively. This model is also very computationally intensive since it requires 244 MB of memory space in order to work efficiently. In contrast, Tiny-YOLOv4 is significantly faster, with training and prediction times averaging 3.5 h and 26.8 ms respectively. This algorithm also outperforms the original YOLOv4 in terms of hardware resources, requiring only 22 MB of memory. Table 2 summarizes the important features of the best models, comparing the original and tiny YOLOv4 algorithms.
Close examination of Table 2 reveals a few important points. If a heavy-weighted multi-class model is formed by combining two single class models, i.e. Model 2 microcrack + Model 1 dark region, then the resulting network would have an accuracy of 96.9% mAP, a prediction time of 107 ms and a memory size of 488 MB. Clearly the new model is not as good as Model 2 multi-class, since the latter has mAP, prediction time and memory size averaging 98.8%, 26.3 ms, and 244 MB respectively. Hence a heavy-weighted model trained to perform multi-class detection performs much better than a multi-class model formed by combining two heavy-weighted single class models.
In contrast, the light-weighted multi-class model formed by combining two tiny models, i.e. Model 7 microcrack + Model 7 dark region, would result in mAP, prediction time and memory size averaging 91.0%, 52.7 ms and 44.8 MB respectively. In terms of mAP, this combined tiny model clearly performs slightly better than the same model trained to perform multi-class detection, i.e. Model 14. Nevertheless, the prediction time and memory size of the combined model are significantly higher than those of Model 14. Table 3 summarizes the important findings comparing the original and combined models for both heavy-weighted and tiny YOLOv4. Referring to this table, the tiny original and combined models evidently perform slightly below their heavy-weighted counterparts. Nevertheless, in terms of prediction time they are at least twice as fast as the heavy-weighted YOLOv4, particularly the original multi-class model.
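
The bookkeeping behind a combined model can be sketched as follows. The combination rule is an assumption inferred from the reported figures (for example, 488 MB is twice 244 MB): the per-class APs are averaged, while prediction times and memory sizes add up because both networks must be loaded and run on every image. The class name, function name and example values are hypothetical placeholders, not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    ap: float          # average precision of a single-class model (0..1)
    time_ms: float     # prediction time per image in milliseconds
    memory_mb: float   # memory footprint in megabytes


def combine_single_class(models):
    """Estimate the figures of a multi-class system built from single-class models.

    Assumed rule: mAP is the mean of the APs; time and memory are summed.
    """
    n = len(models)
    return ModelStats(
        ap=sum(m.ap for m in models) / n,
        time_ms=sum(m.time_ms for m in models),
        memory_mb=sum(m.memory_mb for m in models),
    )


if __name__ == "__main__":
    # Placeholder values chosen only to illustrate the arithmetic.
    crack = ModelStats(ap=0.90, time_ms=10.0, memory_mb=100.0)
    dark = ModelStats(ap=0.80, time_ms=20.0, memory_mb=100.0)
    print(combine_single_class([crack, dark]))
```

Under this rule a combined system can never be faster or smaller than a single network trained for both classes with the same architecture, which matches the trend reported for both the heavy-weighted and tiny models.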
Like other manufactured products, PV production is usually automated with throughput reaching 3600 samples per second in most cases. Therefore, prediction time is a very important factor to consider. In terms of speed, it is extremely difficult for the heavy-weighted YOLOv4 to meet this requirement, as the above results suggest. In contrast, the light-weighted Tiny-YOLOv4 offers a much more practical solution due to its superiority in speed. In terms of accuracy, however, this network is slightly inferior to its heavy-weighted counterpart. Hence, a study has been initiated, aiming to improve the performance of Tiny-YOLOv4. This study is based on the assumption that small defects occupy fewer pixels, and hence take up a smaller area in the receptive field. If these features can be amplified then they can be extracted and used for further processing. The SPP is a good candidate for this task since it is capable of expanding the receptive field, thus enabling multi-level features to be extracted more efficiently [23]. Furthermore, the SPP concatenates the outputs of the max pooling, leading to further enhancement of the detailed expression of small defects. Consequently, the recognition ability of the algorithm could be improved. In testing this hypothesis, a single class model (Model 7) is modified by adding an SPP which is composed primarily of 13 × 13 filter kernels. The kernel size matches the input image, which has been resized to 416 × 416 pixels. The resulting model is shown in Figure 8. The same method is used to modify a multi-class model (Model 14), and the result is shown in Figure 9. In this case the kernel size is fixed to 19 × 19 in order to match the resized image of 608 × 608 pixels.
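
The SPP operation described above can be sketched in plain Python. This is an illustration under stated assumptions and not the authors' network definition: each pooling branch is a stride-1 max pooling that preserves the feature-map size, the branch outputs are concatenated with the input along the channel axis, the kernel sizes (5, 9, 13) are those of the standard YOLOv4 SPP block, and the total downsampling stride of 32 relates the input sizes to the 13 × 13 and 19 × 19 kernels quoted in the text. All function names are hypothetical.

```python
def grid_size(input_size, stride=32):
    """Side length of the coarsest feature map (stride 32 is an assumption)."""
    return input_size // stride


def max_pool_same(fmap, k):
    """Stride-1 max pooling over a k x k window; output size equals input size."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Clip the window at the borders instead of padding explicitly.
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input channels with one pooled copy per kernel size."""
    out = list(channels)
    for k in kernel_sizes:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


if __name__ == "__main__":
    n = grid_size(416)
    # One channel with a single strong response at the centre (a small defect).
    fmap = [[0.0] * n for _ in range(n)]
    fmap[n // 2][n // 2] = 1.0
    pooled = spp([fmap])
    print(len(pooled), [sum(map(sum, c)) for c in pooled])
```

In this toy example a single-pixel response spreads over progressively larger neighbourhoods in the pooled branches, which is the sense in which the SPP enlarges the receptive field for small defects.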
The modified architectures are again evaluated using images in Dataset 1, Dataset 2 and Dataset 3, and the results are shown in Table 4.
Referring to Table 4, it can be seen that the modified network, in particular the multi-class model, has registered a slight improvement, with mAP averaging 95.7% and 88.1% for validation and test samples respectively. The same trend is not repeated for single class models. In terms of mAP there is no significant difference between the original and modified models, in particular for the microcrack model. In fact, the modified model for dark region has registered a slight reduction in mAP. This indicates the difficulty of detecting dark regions due to the complexity of such a defect. In summary, both the original YOLOv4 and the modified Tiny-YOLOv4 are sufficiently accurate models for inspecting microcrack defects only. In contrast, the original YOLOv4 is preferred over Tiny-YOLOv4 or its modified version for inspecting dark region defects. In the case of multi-class inspection, the modified Tiny-YOLOv4 is the preferred choice for online inspection due to its speed advantage. In this case the original YOLOv4 can be used for sampling where precision and accuracy are of utmost importance.

Conclusion
This paper investigates solar cell defect detection using a deep learning approach based on the YOLOv4 framework. Various models with different configurations and parameter settings are trained for detecting microcrack and dark region defects. Overall, the original heavy-weighted YOLOv4 is the best algorithm for both single and multi-class solutions, with mAP ranging from 94% to 100%. However, this algorithm is very computationally intensive since it requires at least 42 ms to inspect one sample. It also requires high-end computers since the algorithm needs 244 MB of memory space in order to function reliably. In comparison, a modified version of Tiny-YOLOv4 resulted in an accuracy of approximately 92%. Even though this algorithm has registered a slightly lower mAP, it is at least twice as fast as the original heavy-weighted YOLOv4. This speed is more competitive than that of the methods published in [12,13]. In terms of accuracy, the performance of the proposed model is also comparable to, if not better than, that of [12,13]. In conclusion, the modified algorithm is suitable for rapid online inspection, while its heavy-weighted counterpart is more suitable for offline applications where hardware resources are abundant and speed is important but not a decisive factor. Moreover, the algorithm can also be applied to EL inspection systems since the images produced by this technology are optically similar to those generated by a PL system.

Author contribution statement
Amran Binomairah was responsible for implementing most of the tasks of this research project. Azizi Abdullah provided input on machine learning, while Bee Ee Khoo contributed to image processing and artificial intelligence. Zeinab Mahdavipour was responsible for preparing test samples and Teow Wee Teo was involved in designing the image capturing hardware. Nor Shahirah Mohd Noor contributed to data analysis, and Mohd Zaid Abdullah is the owner of the research and the main person responsible for this paper.