Every cell needs a beautiful image: on-the-fly contacting measurements for high-throughput production

The future of the energy transition will lead to a terawatt-scale photovoltaic market, which can be served cost-effectively primarily by means of high-throughput production of solar cells. In addition to production, characterization must be adapted to the shortest cycle times. Therefore, we present an innovative approach to detect defects in solar cells using on-the-fly electroluminescence (EL) measurements. When a solar cell passes a standard current–voltage (I–V) unit, the cell is stopped, contacted, measured, released, and afterwards accelerated again. In contrast, contacting and measuring the sample on-the-fly saves a lot of time. Yet, the resulting images are blurred due to the high-speed motion. For the development of such an on-the-fly contact measurement tool, a deblurring method is developed in this work. Our deep-learning-based deblurring model makes it possible to present a clean EL image of the solar cell to the human operator and allows for proper defect detection, reaching a correlation coefficient of 0.84.


Introduction
High-throughput solar cell production is crucial for an efficient transition to PV installations on a terawatt (TW) scale. The current production and measuring technologies only enable solar cell fabrication on a gigawatt scale. A key enabler for multi-GW PV factories could be an increase in throughput by a factor of three compared to state-of-the-art production capacities while maintaining or even reducing the costs of production technology. However, such an increase in throughput requires changing the current end-of-line characterization methods.
In this work, we address the challenges of electroluminescence imaging systems on a high throughput level. Electroluminescence measurements allow for localized defect detections with algorithms such as [1,2]. With the current state-of-the-art inline characterization, a throughput of >4500 samples per hour can be achieved [2]. To increase the throughput, the contacting/decontacting station and measurement setup need to be adapted. These changes pose some challenges to imaging techniques which we address in this contribution.
In the first part of our work, we present a demonstrator for an on-the-fly contact measurement system for EL imaging which is key for high-throughput characterization. This setup enables a fast EL measurement by contacting and measuring the solar cell on-the-fly, which saves measurement time. Yet, these fast measurements result in blurred images due to necessary exposure times which are required to have a reasonable signal-to-noise ratio.
In the second part of our work, we utilize convolutional neural networks (CNN) to tackle the motion blur and develop a model that learns to remove the motion blur from the images. Our model learns to generate a clean EL image from which we can detect defects using defect detection algorithms. CNNs have proven successful in many areas of photovoltaics. Kunze et al. [3] use CNNs to develop an empirical digital twin for quality inspection of solar cells. In [4], Demant et al. investigate the use of CNNs for defect assessment of multicrystalline silicon wafers using photoluminescence images. Deitsch et al. [5] investigate CNNs and support vector machines (SVM) for automatic defect classification. Deep convolutional neural networks are also widely used in various network architectures to deblur images [6][7][8]. Although these algorithms do fairly well at deblurring images, they were trained on benchmark datasets such as [9][10][11], which cover different image motifs and different causes of blur. In this study, we work with a type of motion blur caused by capturing an EL image of a moving solar cell. By training a domain-specific network, which is specialized on solar cell defects, more accurate network predictions are expected.
The network optimization procedure has been designed in such a way that the training works without paired data measured in on-the-fly and stop mode, respectively. Firstly, the knowledge of the motion blur allows us to develop a synthetic dataset to train our model based on the estimated motion blur and noise value. In addition, a real dataset consisting of images measured in on-the-fly and stop mode, respectively, was developed, which we use to train and test our models. Typically, models are trained in a supervised manner by comparing network predictions with the ground truth data. This is possible when matching input and output data exist. However, the pixel-wise alignment of blurred and still EL images is difficult due to the intense motion blur. Therefore, generative adversarial networks (GANs) [12] have been evaluated for unsupervised model optimization, since they are not only robust to image translations between predicted and reference data but can also be trained entirely without paired data. Also, GANs are widely recognized for the high realism of the generated images and have gained attention in other photovoltaic works such as fault detection in photovoltaic panels [13], synthetic data generation of photovoltaic panels under suboptimal conditions [14] and image denoising [15].

Demonstrator for high throughput EL characterization
To examine end-of-line characterization of solar cells on-the-fly, which is required for a significant throughput increase, we built a demonstrator system which performs measurements on continuously moving samples. The presented measurement unit is designed to develop measurement procedures for a sorter in which the solar cells are measured at a belt speed of 1.9 m/s without stopping for the measurement. Figure 1 shows the setup of the demonstrator and the contacting station. The front view of the demonstrator is shown in Figure 1a. The cells are contacted via a conventional contacting station, highlighted by the orange box, which travels on a linear axis. Electrical signals and current supply are transferred to the contacting station via spring-loaded sliding contacts (marked in yellow) which are mounted at the contacting station and fixed at the setup. We use a 4 MPixel Si-CMOS camera (marked in green), binned to 1 MPixel, for the contacted EL measurements. The camera is a near-infrared (NIR) enhanced model. An objective with high NIR transmission is used to optimize the signal. Close-up images of the contacting station and of the sliding contacts at the setup are shown in Figures 1b and 1c, respectively. When the station enters the field of measurement, a sensor triggers the measurement. The motion blurs the images by 1.9 mm per ms of exposure. Depending on the setup and the device under test, this may result in significant blurring if EL images with exposure times greater than 2 ms are required. The system can perform I–V curve, EL and thermography measurements.
During the measurement, the contacting unit continuously moves forward, which results in challenges and opportunities regarding the measurement times, speeds and field of illumination. If we consider a high-speed sorter system with a throughput of 10,000 cells/hour and cells with an edge length of approximately 210 mm, the pitch distance between adjacent cells will be in the range of 40 cm or more. Reaching this throughput requires a travel speed greater than 1.1 m/s. One cell needs to be measured before the next cell reaches the electrical contacting section, assuming idle time between the measurements. This results in a maximum time of 283 ms left for the measurements, enough for a 40 ms flash, a dark I–V measurement of 40 ms, 60 ms for thermography and a short EL image, for which the image acquisition time will be discussed in more detail below. During an individual measurement, the cell moves on, e.g., 4.45 cm during a 40 ms flash. This requires a moderately enlarged homogeneous field of illumination. For imaging techniques, the main challenge is that the motion blurs the images, which needs to be accounted for in image analysis. We will show below how such motion blur can be removed by appropriate algorithms. However, the usable image acquisition times remain limited by the blur and depend on the exact travel speed of a real sorter system.
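The timing argument above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text (throughput, pitch, flash duration and the 283 ms budget); the variable names are illustrative, and the 283 ms value is taken as given rather than derived.

```python
# Timing budget of an on-the-fly sorter, using the figures quoted in the text.
throughput_per_hour = 10_000     # cells per hour
pitch_m = 0.40                   # distance between adjacent cells
flash_s = 0.040                  # duration of the I-V flash
budget_ms = 283                  # measurement time quoted in the text (not derived here)
fixed_ms = 40 + 40 + 60          # flash + dark I-V + thermography

cycle_time_s = 3600 / throughput_per_hour        # time slot per cell
belt_speed_m_s = pitch_m / cycle_time_s          # speed needed to keep the pitch
flash_travel_cm = belt_speed_m_s * flash_s * 100 # distance travelled during the flash
remaining_ms = budget_ms - fixed_ms              # time left for the EL image

print(f"cycle time          : {cycle_time_s * 1e3:.0f} ms")
print(f"belt speed          : {belt_speed_m_s:.2f} m/s")
print(f"travel during flash : {flash_travel_cm:.2f} cm")
print(f"left for EL imaging : {remaining_ms} ms")
```

The computed belt speed of about 1.1 m/s and the travel of roughly 4.4 cm during the flash agree with the values stated above to within rounding.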
Commercial measurement systems like the line scanning photoluminescence imaging systems [16] are used for defect classification during production without the need of contacting the solar cells and achieve a throughput of 8000 samples per hour. In contrast to these systems, our demonstrator does not use a high speed line-camera and does not require a special illumination. This in turn reduces the equipment costs.
Special challenges exist for electroluminescence and thermography measurements taken at the moving cell, because these images are smeared due to motion blur. Therefore, we develop a data-driven model that can detect and remove the motion blur to produce unblurred EL measurements.

Deep-learning-based motion deblurring
The schematic of our model architecture is shown in Figure 2. Our model consists of a generator G that learns to deblur the noisy images and produce clean EL images.
We use a U-Net [17] as the generator in our model. A blurred input image $\mathrm{Im}_{\mathrm{blurred}}$ is given to the generator G, which learns to produce a corresponding clean image $\mathrm{Im}_{\mathrm{deblurred}}$. The generator is optimized using a combined loss consisting of an adversarial loss for unsupervised learning and a mean absolute error ($\ell_1$) loss for supervised learning. Mathematically, the objective of our model is expressed as
$$\mathcal{L}(G, D) = \lambda_1 \, \mathcal{L}_{\mathrm{adv}}(G, D) + \lambda_2 \, \mathcal{L}_{\ell_1}(G) \tag{1}$$
where $\lambda_1$ and $\lambda_2$ are constants weighting the losses. The most widely used unsupervised loss function is the adversarial loss [12], which is implemented in this work. For the adversarial loss, a second network, the discriminator D, is used. The discriminator follows the architecture of the PatchGAN discriminator [18]. The discriminator tries to distinguish the generated data from the real data. If the discriminator cannot tell the generated image apart from the real data, the generator is successful at generating realistic deblurred images. The generator forwards the deblurred EL image to the discriminator, which decides whether the deblurred EL image looks as real as still EL images. The discriminator is trained on the deblurred images and the real images, i.e., still EL images. For $y \in D_{\mathrm{still}}$ and $x \in D_{\mathrm{blurred}}$,
$$\mathcal{L}_{\mathrm{adv}}(G, D) = \mathbb{E}_{y}\left[\log D(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(G(x))\right)\right] \tag{2}$$
During training, the discriminator is not only trained using the last generated image but also considers the history of its training. This concept was first applied by [19], in which the discriminator is trained considering the last 50 generated images. The discriminator not only helps in optimizing the generator network but also improves the realism of the reconstructed images. Additionally, the generator is optimized by computing an $\ell_1$ loss between the deblurred image and the respective still EL image $\mathrm{Im}_{\mathrm{still}}$. An $\ell_1$ loss can only be computed when paired data exist. On-the-fly contacted EL images and still EL images are measured with different EL systems and thus are not pixel-wise aligned.
Therefore, we use pairs of synthetically blurred (Sect. 4.1) and respective still EL images of solar cells for training our model. In testing mode, the generator network is evaluated on how well it can predict deblurred images from new data, i.e., data it has not been trained on, which is shown in Section 6 and discussed in Section 7. For $x \in D_{\mathrm{blurred}}$ and $z \in D_{\mathrm{still}}$, the $\ell_1$ loss is defined as:
$$\mathcal{L}_{\ell_1}(G) = \mathbb{E}_{x,z}\left[\lVert z - G(x) \rVert_1\right] \tag{3}$$
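The history mechanism for the discriminator can be illustrated with a small buffer. This is a sketch only: the text states that the last 50 generated images are considered, whereas the class name `HistoryBuffer` and the policy of swapping in an older image with a probability of 0.5 are assumptions modelled on common implementations of the idea in [19], not a description of the authors' code.

```python
import random


class HistoryBuffer:
    """Keeps up to `capacity` previously generated images (hypothetical helper)."""

    def __init__(self, capacity=50, seed=None):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def query(self, image):
        """Store `image` and return the sample shown to the discriminator."""
        if len(self.items) < self.capacity:
            # Buffer not full yet: keep the new image and use it directly.
            self.items.append(image)
            return image
        if self.rng.random() < 0.5:
            # Assumed policy: return an older image and store the new one instead.
            index = self.rng.randrange(self.capacity)
            older = self.items[index]
            self.items[index] = image
            return older
        # Otherwise use the new image and leave the buffer unchanged.
        return image


# Usage: strings stand in for generated image tensors.
buffer = HistoryBuffer(capacity=50, seed=0)
shown = [buffer.query(f"generated_{i}") for i in range(200)]
print(len(buffer.items), "images kept in the history")
```

In a training loop, the value returned by `query` would replace the freshly generated image in the discriminator update, so that the discriminator also sees earlier generator outputs.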

Experimental
To train and test our neural networks, we created two different types of datasets: synthetic data (Sect. 4.1) and real data (Sect. 4.2). The synthetic dataset is used for supervised learning, while the real data can be used for both supervised and unsupervised learning as well as for network evaluation.

Synthetic data
Every entry of the synthetic dataset consists of a blurred EL image as the input image and the respective still EL image as the label. This synthetically blurred EL image is created by convolving the still EL image with an averaging kernel. Mathematically, the blurred image can be formulated as
$$\mathrm{Im}_{\mathrm{blurred}} = \mathrm{Im}_{\mathrm{still}} * b + n \tag{4}$$
where $\mathrm{Im}_{\mathrm{blurred}}$ is the blurred image, $\mathrm{Im}_{\mathrm{still}}$ is the still image, $b$ is the kernel vector which describes the motion blur and is convolved with the still image, and $n$ is the additive noise.
The motion kernel contains a vector of ones which corresponds to an integration of the image data in motion direction.
The synthetic motion blur length is calculated as the product of the exposure time (in ms) and the belt speed (in m/s), which yields a length in mm. The size of the blur kernel is obtained by dividing the blur length by the pixel size. Additional noise is added to the blurred image, corresponding to the camera noise in a measured blurred image. In our case, we have a belt speed of 1.9 m/s, an exposure time of 7 ms and a pixel size of 160 µm/px, resulting in a blur length of 13.3 mm and a kernel size of 83 pixels.
The synthetic dataset consists of 1480 solar cells, including 735 silicon heterojunction (SHJ) cells and 745 passivated emitter and rear cells (PERC) [20]. All the solar cells were measured using a standard EL measuring device in stop mode to create still EL images. These images are then convolved with a kernel to create synthetic blurred data.
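The construction of a synthetically blurred image can be sketched as follows. The kernel size follows the calculation given above. The remaining details are assumptions made for illustration: the motion is taken to run along the image rows, the kernel of ones is normalized by its length, image borders are handled by repeating the edge pixel, and the noise is Gaussian with a hypothetical standard deviation `noise_sigma`. Plain Python lists are used to keep the example self-contained.

```python
import random


def blur_kernel_size(exposure_ms, belt_speed_m_s, pixel_size_um):
    """Blur length in pixels: (exposure time * belt speed) / pixel size."""
    blur_length_mm = exposure_ms * belt_speed_m_s      # ms * m/s = mm
    return round(blur_length_mm * 1000 / pixel_size_um)


def motion_blur(image, kernel_size, noise_sigma=0.0, rng=None):
    """Average each row over `kernel_size` pixels and add Gaussian noise.

    `image` is a list of rows; motion along the rows is an assumption.
    """
    rng = rng or random.Random(0)
    half = kernel_size // 2
    blurred = []
    for row in image:
        width = len(row)
        out_row = []
        for x in range(width):
            total = 0.0
            for k in range(-half, kernel_size - half):
                # Clamp the index so that the edge pixel is repeated at the border.
                total += row[min(max(x + k, 0), width - 1)]
            out_row.append(total / kernel_size + rng.gauss(0.0, noise_sigma))
        blurred.append(out_row)
    return blurred


# Kernel size for the parameters stated in the text.
print(blur_kernel_size(exposure_ms=7, belt_speed_m_s=1.9, pixel_size_um=160))

# Tiny example: a single bright pixel is smeared over five pixels.
tiny = [[0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]]
print(motion_blur(tiny, kernel_size=5)[0])
```

For full-size images an array library with a dedicated convolution routine would be the natural choice; the nested loops here only serve to make each step explicit.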

Real data: On-the-fly contacted EL measurements
For on-the-fly contacted EL measurements, solar cells with representative defects were selected and measured at the demonstrator. The exposure time, the camera gain and the current applied during the EL measurement were varied and adjusted to obtain a visible signal in the measurements. For our dataset, we applied an exposure time of 7 ms and a current of 9 A to avoid saturation of the signal. The camera gain was set to 1 to minimize the camera noise. Every cell was measured using the same applied current and camera gain. We used 98 SHJ solar cells to create this dataset. All SHJ solar cells measured on-the-fly have a $V_{\mathrm{oc}}$ value within 730-735 mV. Figure 3 shows a comparison of a still EL image in (a), a synthetically blurred EL image in (b) and a measured EL image in (c) of the same solar cell. The first image shows a still EL image measured at a standard EL measuring device. The second image shows the respective synthetically blurred EL image following the steps described in Section 4.1. The third image shows the EL image of the solar cell taken on-the-fly at our demonstrator with an exposure time of 7 ms. The intensities of the synthetic and measured EL images differ because of different parameter values during the measurement. Additionally, the contact bars look different in the still and the on-the-fly measured images, because the two setups use different contact bars. The real dataset, comprising the measured and the still EL images, can be used as a supervised dataset when aligned correctly. An additional step would be to mask out the contact bars in the images to be able to compute a supervised error between the predicted and the still EL image. Since precise data alignment is challenging due to the strong motion blur, unsupervised learning without pixel-wise aligned measurement data is a promising alternative investigated within this work.
In our case, we have used the real data as supervised data for validation and testing purposes only.

Training details
Our model was implemented in the PyTorch [21] framework and trained using NVIDIA GeForce RTX 2080 Ti GPUs. We have used two model training strategies with respect to the image input: full-image training and patch training.
For full-image training, we resized the EL images from 1024 px × 1024 px to 512 px × 512 px. For patch training, instead of training the model on the whole EL image, randomly cropped image patches of size 512 px × 512 px from the original EL images were used to train our model. Before feeding the inputs to the network, the images were normalized as expressed in equation (5),
$$x_{\mathrm{norm}} = \frac{x - \mu}{\sigma} \tag{5}$$
where $x$ is the image to be normalized, $\mu$ is the image mean and $\sigma$ is the image standard deviation. Binary cross entropy was used for the adversarial loss. To avoid mode collapse, label smoothing [22] was applied, where we used the labels 0.1 and 0.9 instead of the hard labels 0 and 1. The learning rate applied for both generator and discriminator was 0.0002 without any decay. For the GAN training, the generator and the discriminator were each updated once per iteration. For model optimization, we use Adam [23], an optimization algorithm based on stochastic gradient descent.
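The normalization and the smoothed labels can be illustrated in a few lines. This is a minimal sketch under stated assumptions: the population standard deviation is used for the normalization, the function names are illustrative, and the discriminator outputs are treated as single probabilities rather than PatchGAN score maps. In a PyTorch training loop the same idea would be expressed with tensor operations.

```python
import math
from statistics import fmean, pstdev

REAL_LABEL, FAKE_LABEL = 0.9, 0.1   # smoothed labels instead of 1 and 0


def standardize(pixels):
    """Subtract the image mean and divide by the image standard deviation."""
    mu = fmean(pixels)
    sigma = pstdev(pixels)           # assumption: population standard deviation
    return [(p - mu) / sigma for p in pixels]


def bce(prediction, target, eps=1e-7):
    """Binary cross entropy for one discriminator output in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)   # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Adversarial loss of the discriminator with smoothed labels."""
    return bce(d_real, REAL_LABEL) + bce(d_fake, FAKE_LABEL)


# Usage with made-up pixel values and discriminator outputs.
print(standardize([0.2, 0.4, 0.6, 0.8]))
print(discriminator_loss(d_real=0.8, d_fake=0.3))
```

With smoothed labels the loss is smallest when the discriminator output equals the label, so outputs that saturate towards 0 or 1 are penalized rather than rewarded.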

Results
With the implemented demonstrator for the on-the-fly measurements, we can capture EL images at a belt speed of 1.9 m/s, which corresponds to a throughput of 12,000 samples per hour. Due to the necessity of exposure times in the range of 5-7 ms, the resulting images are affected by motion blur. Our deblurring model was successfully trained on the synthetic dataset. We conducted two different experiments with the same data and the same model architecture. In the first experiment, we trained our model with the synthetic images by inputting them as whole images. In the second experiment, we trained our model on smaller random patches cropped from the original images. Additionally, we processed our data with a classical Fourier-based deblurring approach using Wiener filtering [24], also called the Wiener deconvolution method. Figure 4 shows the resulting images of our trained models tested on a never-seen-before on-the-fly contacted EL image along with results obtained by applying the Wiener filter for comparison. Three different samples are shown here, each with a different type of defect: dark spots with surface scratch (Fig. 4a), crack (Fig. 4b) and finger interruptions (Fig. 4c). The first column shows the on-the-fly measured EL image given as input. The next three columns show the results from our trained models and the Wiener filter, respectively. The last column shows the still image of the same solar cell captured by a standard EL measuring device in stop mode. The contact bars look different in the output and target images due to the different systems used for the measurements. Overall, both trained models can reconstruct the EL images from the blurred EL images with significant artifacts remaining in the generated EL images. The models are able to detect the edges correctly and to reconstruct the relevant defects.
Comparing the generated images with the still EL image (reference), we can see that the full-image-trained model reconstructs the defect and spots darker and slightly broader than the patch-trained model in Figure 4a. In Figure 4b, however, the patch-trained model reconstructs a darker defect. The results obtained by applying the Wiener filter are quite noisy. The classical Wiener deconvolution method is able to detect the defects in the blurred images but is unable to recreate them as well as our models. In Figure 4c, both our trained models have overestimated the finger interruptions, whereas the Wiener filter could not reconstruct any finger interruption.
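For readers unfamiliar with the classical baseline, the principle of Wiener deconvolution [24] can be sketched on a single image row. The example below is a simplified illustration and not the configuration used for Figure 4: it assumes a known box kernel, circular boundary conditions, a noise-free blurred signal and a constant regularization term `k` in place of the noise-to-signal power ratio. A naive discrete Fourier transform is written out to keep the example self-contained.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), sufficient for short rows."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def circular_box_blur(signal, length):
    """Circular convolution with a normalized box kernel of `length` pixels."""
    n = len(signal)
    return [sum(signal[(i - j) % n] for j in range(length)) / length
            for i in range(n)]


def wiener_deconvolve(blurred, length, k=1e-8):
    """Estimate the sharp row as conj(H) / (|H|^2 + k) * G in Fourier space."""
    n = len(blurred)
    kernel = [1.0 / length if i < length else 0.0 for i in range(n)]
    H = dft(kernel)      # transfer function of the blur
    G = dft(blurred)     # spectrum of the blurred row
    F = [h.conjugate() / (abs(h) ** 2 + k) * g for h, g in zip(H, G)]
    return [value.real for value in dft(F, inverse=True)]


# Usage: one row with a broad bright region and a narrow feature.
row = [0.0] * 64
row[20:30] = [1.0] * 10
row[40] = 0.2
blurred_row = circular_box_blur(row, length=5)
restored_row = wiener_deconvolve(blurred_row, length=5)
print(max(abs(a - b) for a, b in zip(row, restored_row)))
```

With measurement noise, frequencies at which the blur kernel has a small response are amplified strongly unless `k` is increased, which is one way to understand why Wiener-filtered images of noisy measurements tend to look noisy.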
For image quality assessment, several similarity measures, such as peak-signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) [25] are used.
To assess the quality of our model predictions, we use two measures, SSIM and Pearson correlation coefficient (PCC) [26], to measure the similarity between the reconstructed EL images and the still EL images.
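SSIM typically takes values between 0 and 1. The closer it is to 1, the higher the similarity, and vice versa. Let $\mu_x$ and $\mu_y$ be the means of the reference image $x$ and the deblurred image $y$, $\sigma_x^2$ and $\sigma_y^2$ the respective variances and $\sigma_{xy}$ the covariance of $x$ and $y$. Then
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{6}$$
where $c_1$ and $c_2$ are small constants that stabilize the division [25]. The PCC takes values between −1 and 1. A PCC of 1 indicates a perfect positive correlation, whereas a PCC of −1 indicates a perfect negative correlation.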
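Both measures can be computed from the same image statistics, as the following sketch shows. It is a simplification: SSIM is evaluated once over the whole region using global statistics, whereas common implementations average the index over local windows, and the constants $c_1 = (0.01\,L)^2$ and $c_2 = (0.03\,L)^2$ with data range $L$ follow the usual choice from [25]. Regions are passed as flat lists of pixel values, and the function names are illustrative.

```python
import math
from statistics import fmean


def _moments(x, y):
    """Means, variances and covariance of two equally sized pixel lists."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cxy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, vx, vy, cxy


def pcc(x, y):
    """Pearson correlation coefficient: covariance over product of std devs."""
    _, _, vx, vy, cxy = _moments(x, y)
    return cxy / math.sqrt(vx * vy)


def global_ssim(x, y, data_range=1.0):
    """SSIM from global statistics (single window, a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my, vx, vy, cxy = _moments(x, y)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


# Usage with made-up pixel values of a reference and a reconstructed region.
reference = [0.10, 0.40, 0.35, 0.80, 0.55, 0.20]
reconstructed = [0.15, 0.38, 0.30, 0.70, 0.60, 0.25]
print(pcc(reference, reconstructed), global_ssim(reference, reconstructed))
```

Because the PCC is invariant to a linear rescaling of the intensities while SSIM is not, the two measures can rank reconstructions differently when the measurement systems produce different intensity distributions.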
Due to the differences in the measuring equipment, the contact lines are of different types and thus look different, as seen in the results above. To compute a one-to-one similarity measure between the reconstructed outputs and the reference images, the metrics SSIM and PCC are measured on a smaller region of the EL images instead of the whole image. This excludes the contact lines from the similarity computation. We selected five images to measure the similarity. Two examples are shown in Figure 5. Here, we show two cropped regions where the similarities based on PCC and SSIM are measured. The first row shows the cropped regions of the reference images and the second row shows the corresponding cropped regions of the model-generated images. Figure 5a shows the quantitative results of the full-image-trained model. The measured PCC and SSIM do not vary between the two regions. The quantitative results of the patch-trained model are shown in Figure 5b. The measured PCC varies slightly between the two regions, whereas the SSIM values are approximately 0.70 for both regions.

Discussion
We have shown that our models can deblur the on-the-fly contacted EL images with some additional generated artifacts. From the results in Section 6, we can visually see that our model works well with on-the-fly contacted data. With the synthetic dataset, a successful transfer to real data is possible. Our model is able to detect the edges of the sample accurately and reconstruct the defects. Considering that the measurements in the synthetic dataset and the real dataset were recorded using different measurement devices, our model is able to successfully transfer the knowledge from one domain to the other.
When comparing our results to Wiener-filtered EL images, the reconstructed images of the classical method are very noisy. This can be attributed to the Wiener filter reconstructing the noise that is added to the blurred image by the high camera gain. In contrast, our machine learning approach is able to avoid the reconstruction of noise patterns.
To quantify our results, we use the similarity measures SSIM and PCC. According to the results in Figure 5, the full-image-trained model and the patch-trained model do not differ based on the SSIM score. The SSIM metric considers the intensity distribution of images. Both predicted outputs have similar intensity values, and therefore the score does not vary between the two models. The computed SSIM score lies between 0.64 and 0.70, which indicates a moderate degree of similarity. This may be caused by differences in the intensity distribution arising from the different measuring systems for on-the-fly and stop mode. Considering the PCC, the patch-trained model shows a higher correlation between the predicted and the reference images than the full-image-trained model. This is because the patch-trained model can accurately predict fine details such as finger structures, as seen in Figure 5b, whereas the full-image-trained model can only predict the details at a coarse level. Overall, the visual quality of the reconstructed images seems very good given the strong motion blur and the noise due to the camera gain. During cell classification using deblurred EL images, solar cells with scratches, dark spots and cracks may be correctly classified. However, finger interruptions may be overestimated, so that the affected solar cells are falsely classified when using the deblurred EL images. In future work, the reconstruction of existing defects should be improved and, in particular, artifacts that suggest nonexistent defects should be reduced to avoid the false classification of healthy samples.

Conclusion
A measuring system for on-the-fly contacted measurements has been set up as a demonstrator which is capable of performing on-the-fly EL measurements with an integration time of 7 ms. To achieve a reasonable signal-to-noise ratio within the exposure time of 7 ms, for which deblurring can be performed, our tests are restricted to high-efficiency cell concepts such as heterojunction (SHJ) and TOPCon solar cells. Using synthetic and measured data, we developed a data-driven deep learning model to sharpen EL measurements taken on the moving sample (Sect. 2). In contrast to generic deconvolution methods, our data-driven approach allows a domain-specific and thus realistic reconstruction of EL images despite the strong noise and motion blur. We have successfully trained our model on synthetically generated data and tested it on EL test images measured at the on-the-fly contact measurement unit. Once successfully trained and tested, our model allows for a quick deblurring prediction in about 3 ms on a consumer GPU. Thus, the model can be deployed in production without affecting the measurement time. The EL images reveal clear defect structures, e.g., finger interruptions and dark spots. By integrating our deblurring model into the contacted on-the-fly measurement, we will be able to perform fast EL measurements and quickly preprocess the images for defect evaluation, and thus increase the throughput.
This work was funded by the German Federal Ministry for Economic Affairs and Climate Action within the project "NextTec" (03EE1001A). This work was partially conducted at halm Elektronik GmbH and Fraunhofer Institute for Solar Energy Systems. We thank everyone who extended their support during this work.