A self-consistent hybrid model connects empirical and optical models for fast, non-destructive inline characterization of thin, porous silicon layers

. Epitaxially-grown wafers on top of sintered porous silicon are a material-ef ﬁ cient wafer production process, that is now being launched into mass production. This production process makes the material-expensive sawing procedure obsolete since the wafer can be easily detached from its seed substrate. With high-throughput inline production processes, fast and reliable evaluation processes are crucial. The quality of the porous layers plays an important role regarding a successful detachment. Therefore, we present a fast and non-destructive investigation algorithm of thin, porous silicon layers. We predict the layer parameters directly from inline re ﬂ ectance data by using a convolutional neural network (CNN), which is inspired by a comprehensive optical modelling approach from literature. There, a numerical ﬁ tting approach on re ﬂ ection curves calculated with a physical model is performed. By adding the physical model to the CNN, we create a hybrid model, that not only predicts layer parameters, but also recalculates re ﬂ ection curves. This allows a consistency check for a self-supervised network optimization. Evaluation on experimental data shows a high similarity with Scanning Electron Microscopy (SEM) measurements. Since parallel computation is possible with the CNN, 30.000 samples can be evaluated in roughly 100ms.


Introduction
This work focuses on the inspection of sintered porous silicon as part of the epitaxial wafer production, which will play a major role in the future of renewable energy production. Common mass production of silicon wafers depends on the sawing of silicon blocks, which leads to a material loss of up to 40% [1]. With increasing numbers of solar energy systems, research needs to not only focus on creating higher efficient solar cells, but also to make production cheaper and more resource efficient. With epitaxially-grown wafer technology, a resource-efficient technique is newly launched into mass production [2]. In this process wafers are grown from chlorosilane gas on top of a seed substrate. The wafer is then detached and the seed substrate can be reused [3]. Purification processes of silicon and sawing procedures are hence obsolete. To optimize this process and for quality inspection during high throughput production, new characterization techniques specialized for the inline processes and the materials must be developed.
One important step here is the etching of porous layers on top of a seed substrate visualized in Figure 1. These layers are highly relevant for a successful wafer production and therefore a fast and reliable inline characterization of these structures is important. The different layer properties are designed to form a smooth surface, as basis for the epitaxial growth, and a detachment layer for easy mechanical detachment. The absolute parameter values and uniformity of the porous layers influence the quality of the epitaxial wafer as well as a successful detachment. Hence reliable information regarding the porous layers are valuable for the manufacturer as it can be used as a quality measure for the inline etching process.
However, the existing characterization is not sufficient for fast full-scale wafer analysis and therefore not for inline process control. Commonly, the investigation of such thin layers is done via Scanning Electron Microscopy (SEM) which has three disadvantages. First it is destructive, second it is costly and time-consuming to measure the thicknesses at multiple points and third porosities can't be measured. A non-destructive analysis of such thin films can be performed, using reflection measurements [4][5][6]. To estimate the layer parameters, an optical physical model is fitted by least-squares to the measurement results. Since this needs a numerical optimization procedure per measured reflection spectra, it is a time-consuming procedure and not suitable for high throughput production.
Our approach provides a fast and non-destructive investigation method, that can be used for full scale inline investigation for a two-layer stack. Since we are working with a neural network architecture, parallel computing allows the evaluation of 30.000 spectra in roughly 100ms. Modern measurement techniques e.g., hyperspectral imaging, can measure reflectance spectra across the full wafer area, providing thousands of spectra at once. With our model the evaluation of this data is possible in an inline setup. This allows analysis of the quality and uniformity of the etching process during production.
Our work is based on the physical model described in [7], that uses spectroscopic reflectance data and a stepwise optimization procedure to predict the same layer parameters. In the mentioned work, the physical model is modified such that it can be fitted optimally to actual inline measurement data. For this aim, an additional parameter is introduced, the relative standard deviation of the layer thickness, which depicts the thickness distribution due to overlying signals in big measurement spots. In this work, we speed up the prediction process by using a CNN and provide an algorithm to evaluate even thousands of spectra in parallel, which can e.g., be measured by hyperspectral imaging systems.
Applying neural networks have shown promising results in predicting thicknesses for single and multilayer stacks whereby our approach differs from similar approaches in three things.
References [8][9][10][11] use ellipsometry data for the accurate prediction of the layer thickness. However, these models require multiple input values from ellipsometry data, which is a time-consuming measurement technique and not suitable for mapping procedures. As a first innovation, our approach requires reflectance values only and is therefore suitable for fast inline evaluation.
Second, we predict multiple layer parameters and a value to handle measurement uncertainties. Since the investigated samples have two porous silicon layers, two thickness and two porosity values are estimated.
Third, by adding the physical model on top of our classic model, we reverse the layer parameter prediction and use the so calculated reflection curves for a consistency check of our model. The model is derivable and can therefore be used for additional network optimization. A similar approach improves the layer parameter prediction for ellipsometry data as shown in [9]. Here a second neural network takes the role of our physical model to predict ellipsometry values from layer parameters. Thus, a cyclic optimization procedure can be used to train the network. The second network, however, must be trained as well and brings its own uncertainties into the training procedure. We, on the other hand, use a well-established physical model and therefore reliable information resources. The consistency information can be used to identify uncertain regions of the parameter space. This information is a trigger for a refinement of the model within uncertain layer parameters.

Method
Our goal is a fast and reliable algorithm to predict the thicknesses and porosities for two layers of porous silicon from inline-measured spectroscopic reflectance spectra. To achieve this, we combine a convolutional neural network with a physical model. The neural network allows fast evaluation of multiple measurements due to parallel computing. The physical model is used to recalculate reflection curves with predicted layer parameters and therefore makes consistency checks possible. We trained and evaluated the following three network variations: classic model, reflection model and hybrid model. All three are trained with a synthetic dataset, consisting of pairs of reflectance spectra R ∈ ℝ 100 and layer parameters lp ∈ ℝ 5 and evaluated on synthetic as well as experimental data. Figure 2 shows the structure of our simple model, which predicts layer parameters lp predicted ∈ ℝ 5 using reflectance spectra as input information. Since the input information has only one spatial dimension, the network uses 1D convolution in each block. The kernel size 1 Â 7 of the initial convolution is chosen slightly bigger than for the rest of the network to capture neighbouring information, such as local minima or maxima. Followed by this, there are four convolutional blocks with maximum pooling layers with kernels size 1 Â 2 in between. Every block consists of two convolutional layers with a kernel size of 1 Â 3, each preceded by batchnormalisation [12] and rectified linear unit activation. We predict all layer parameters at once and therefore use an average pooling layer followed by a linear layer for the output. Our model is implemented in pytorch [13] and can be rebuild by using the structure of Figure 2.

Classic model
The optimization of this model is done using a regression loss. Specifically, by calculating the L1-loss between actual layer parameters and the ones predicted by the network, displayed in equation (1) Regression À Loss lp; lp predicted À Á ¼ 1 5 We scale the actual parameters in a way that a difference of 0.1 in the L1-loss corresponds to a difference of 100 nm for the thicknesses, 10% for the porosities and 0.01 for the thickness uncertainties.

Reflection model
For our reflection model, we add a physical model to the classic model. The physical model is taken from [7] and calculates reflectance spectra R predicted with given layer parameters. We changed the implementation of the physical model to pytorch [13] to make parallel computing of multiple reflection curves and backpropagation possible. The predicted layer parameters from the classic model are now used to recalculate reflectance curves. Since this is the reverse operation of the classic model, we are now able to check our model for consistency. Therefore, we calculate the L1-Loss of the input reflection curve with the one calculated with the predicted layer parameters shown in equation (2) Consistency À Loss R; R predicted À Á ¼ 1 100 This value is now used to optimize our model for the prediction of the layer parameters. During initial training runs we faced the problem, that the model output range of the layer parameter values did not match the range of the actual layer parameters, since also negative values occurred. Therefore, we added an additional loss displayed in equation (3) Range À Loss lp predicted À Á ¼ 1 5 that forces the model to predict only positive values.

Hybrid model
The hybrid model has the same structure as the reflection model À a combination of a neural network and a physical model À but we use both, the consistency equation (2) and the regression loss equation (1), for the optimization. In Figure 3, the schematic information flow during the training procedure is shown. Since the training with the two losses behaves quite differently, we add a scaling factor and a weighting parameter u ∈ [0, 1]. The weighting parameter is set to 0 at the beginning of the training procedure and increases linearly, so that its value reaches 1 after 500 epochs. This result in equation (4) Hybrid À Loss lp; r; lp predicted ; r predicted À Á as the loss function to optimize the hybrid model.

Physical model
The reflection and the hybrid model consist not only of a machine learning part, but also of a physical model, which is taken from [7]. In this work, a numerical fitting approach is used to determine the layer thicknesses and porosities of two porous layers on top of a thick silicon wafer from reflectance spectra. Our physical model simulates the optical properties of this stack by using the transfer-matrix method (TMM). This method requires refractive indices and thicknesses of all layers involved to calculate the reflection and transmission coefficients for the whole stack. For bulk silicon these coefficients are well known. For porous silicon they can be determined via effective medium models such as the Bruggeman approximation [14,15]. This formular is integrated and hence, the thicknesses and porosities for each optical stack are the effective inputs for our physical model. There are some simplifications in this approach worth mentioning. For simplicity the light incident is considered perpendicular to the surface and due to the thickness of the bulk silicon, backside reflection is ignored. Additionally, light scattering effects are also not considered. It cannot be excluded that bulk or interface scattering effects influence the results of our work. However, as stated in [7], the simulated reflectance curves show still a good overall agreement with the measured ones. Also, the SEM measured thicknesses correlate well with the results from the fitting approach (see Fig. 9). Therefore, the influence on the results of the rest of our work is expected to be small.
The interface between two layers, as can be seen in Figure 5, is not smooth and due to a measurement spot size of roughly 0.78 cm 2 this can influence the measured reflectance spectra. According to [7] the physical model is tuned by introducing a new parameter, the relative standard deviation (RSD) of the thickness to tackle this uncertainty. A variation of multiple thicknesses for the porous layers are used to calculate reflection curves and the results are then weighted by the probability function of a standard distribution to determine the actual output. The deviation of this distribution is given by the RSD.

Synthetic data
All three models are trained with a synthetic dataset, created with the physical model from [7]. The samples are designed to have two porous layers on top of each other. The top one is thicker but has smaller porosity than the layer underneath. We calculate reflection values at 100 positions in a wavelength range of 450-1000 nm. The parameter ranges are chosen as follows: The thickness of the first layer ranges from 650 to 1500 nm and of the second layer from 100 to 550 nm, both with 50 nm steps in between. The porosity for the first layer ranges from 20% to 40% and for the second layer from 40% to 65%, with 5%-steps in both cases. The relative standard deviation values are picked from (0, 0.01, 0.015, 0.02, 0.025, 0.03). By using all possible combinations, we get 36000 pairs of layer parameters and reflectance curves. We split this dataset in 80% data for training, 10% for validation and 10% for testing. The validation set is used during training to check the model performance on an independent dataset. The evaluation shown in the result section is done on the test dataset.

Training details
We use the Adam optimizer [16] with weight-decay 1e À5 to optimize our models. As initial learning rates, we chose 1e À2 for the classic model, 1e À4 for the reflection model. The learning rate is reduced while training with the ReduceLrOnPlateau-scheduler implemented in pytorch [13] by a factor of 0.75 when the loss is not decreasing fast enough on the validation set. We split the training-set in batches of size 2048 and train at maximum 1000 epochs, with early stopping, when there are no more significant changes in the performance on the validation set.

Synthetic data augmentation
To make our model robust against possible measurement uncertainties and to prevent our model to just memorize the data, we apply data augmentation. An example is shown in Figure 4. We add uniformly distributed random noise with m = 0 and s = 0.04 as a vector with the same length as the reflectance spectra to the input. Additionally,  we shift the input signal randomly by up to four positions which corresponds to a wavelength difference of around 20 nm. The now not defined values are filled with zeros. For example, a shift of three positions at the beginning means, we take the last 97 reflection values as the first 97 ones and add three zeros at the end to match again the input size of our CNN. This does not affect the training quality, since zeros are added at the edge of samples as well to perform convolution without loss in spatial dimension.

Experimental data
Since we aim to provide an algorithm for fast inline evaluation of porous silicon, we verify the performance of our models not only on synthetic data, but also on experimental samples. We use the same samples and measurements as in [7], but focus on three wafers with a two layer structure with similar specifications as in our trainings data. The samples were electrochemically etched on 9-11 mV-cm p-type Cz-Si Wafers of 500 mm thickness to create the two-porous-layer structure at the front side of the wafer. The top layer is designed to have a lower porosity than the one in-between the top and the bulk silicon.
The inline spectrometer is a Zeiss MCS722 combined with an OFR 104 Flare Head with 4°viewing angle integrated in an automated measurement system. It determines the total reflectance (diffuse + specular) by using an Ulbricht sphere with hemispherical white light illumination and a spot size of roughly 0.78 cm 2 . The provided wavelengths range from 380 to 1100 nm. Since there are some uncertainties in the outer wavelength regions, we only focus on the values measured at the range of 450-1000 nm. The measurements are taken at a line at the centre of the wafer, while the wafer is moving on a conveyor belt. With one spectrum measured approximately every 10 ms, there are multiple reflection curves per wafer.
Further, we compare the model prediction of the layer thicknesses with SEM measurements, like also done in [7]. Different spots on the wafer were characterized by capturing three images at slight position variations with a Zeiss Auriga 60 SEM. The different layers are visually easy to distinguish, as can be seen in Figure 5 and therefore the thicknesses can be measured with the evaluation tool of the microscope. We take the mean of the three values as a reference measure for the quality of our network prediction.
Since we use the same data as [7], we also directly compare the performance of our model with the numerical fitting approach and receive further feedback.

Model evaluation on synthetic data
All three models were trained and validated with the same dataset, so that the performance of the models can be compared based on the same test data. In Figure 6, we show the predicted layer parameters subtracted from the real ones for each model and each parameter separately. Since we use 10% of the synthetic data for testing, 3600 samples are evaluated and displayed in this graph. Within the boxes lay 50% of the datapoints. The boxplot whiskers are the 2.5% and the 97.5% percentile, so that 190 examples are displayed as outliers. If we investigate the real layer parameters for some of these outliers for the hybrid and the classic model e.g., an absolute difference in layer 1 thickness prediction higher than 50 nm, all have the highest possible relative standard deviation 0.03. The models seem to struggle with the big changes in the reflectance spectra caused by this high relative standard deviation value. We see an overall reduction in the error deviation when we compare the results of the hybrid with the one from the classic model. Regarding accuracy, the thickness prediction for the first layer and the prediction of the RSD value seem to benefit most from the consistency check. For the other values an improvement in the error deviation is also visible, however these values tend to be underestimated, especially by the hybrid model.
The mean relative errors (MRE) for both model predictions can be found in Table 1. The MRE decreases in most cases for the hybrid model in comparison to the classic model. Only the porosity prediction of layer 2 gets slightly worse. However, the number and the maximum absolute difference value decrease with the hybrid model prediction compared to the classic model. To train the model not only with the regression loss, but also add the consistency check, seems to pay off in overall prediction accuracy when working with synthetic data.
The reflection model trained without regression loss is predicting values in a reasonable range, but not close to the actual label parameters. Even with multiple changes in training details, such as different optimizer and loss functions, e.g., Mean Squared Error, different initial learning rates and batch sizes and even different CNNstructures, no significant optimization in the resulting prediction was achievable. Also, more parameter variations Fig. 5. SEM image of a silicon substrate with two porous layers. Layer 1 is the top layer with lower porosity and Layer 2 the one below that. The two layers as well as the surface and bulk material are visible distinguishable, which allows thickness evaluation based on these images.
A. Wörnhör et al.: EPJ Photovoltaics 14, 4 (2023) in the synthetic dataset or changes of the wavelength range did not help. We compare the input reflection curve with the one calculated with the predicted layer parameters. As shown in figure, the model is not able to recreate the reflection curve. The information of the consistency loss is not sufficient to train our model properly, although we penalize negative layer parameters predicted by the CNN model.

Discussion of loss contributions
We investigate the relation between the consistency loss and the regression loss to get a better understanding, on how both losses could influence the training. In Figure 7, the two losses are plotted against each other using the results of the classical model. The green rectangle marks an area with a low consistency but relatively high regression loss values. The high prediction error of the layer parameters indicates that the consistency loss alone provides not enough information to train the network. Since there are other ways of determining the differences of two curves, another consistency loss than the L1-diffence of both curves could lead to better results. We performed tests with other loss functions for the comparison of reflectance curves but were not successful so far. On the other hand, the orange rectangle marks an area with high consistency and medium regression loss values. This substantiates our observation, that the training can profit from combining the regression loss with the consistency loss.  6. Boxplot of the difference between the model prediction and the actual layer parameters for all three models (different colours). Since the performance of the reflection model is significant worse than for the other two, the results are displayed separately.  The consistency loss is added to optimize our model also with respect to the recreated reflexion curves. We investigate how the different models perform in recreating the reflectance spectra. In Figure 8, we display two exemplary reflection curves. The left example is chosen with consistency and regression loss values from the orange region in Figure 7 and the right one from the green area. The left picture shows an improvement in the consistency of the recreated reflection curve when the layer parameters predicted with the hybrid model are used. On the other hand, we do not see such improvement for the example from the green area. Since the consistency loss is already low for the classic model prediction, it adds only bit information to the optimization of the hybrid model. This results in only slight changes of the predicted reflection curves. However, by comparing the results of the layer parameter prediction, the hybrid model shows an improvement, compared to the classic model.

Model evaluation on experimental samples
The classic and the hybrid model are also evaluated on experimental data, due to the good performance on the synthetic data.
We use the same samples and measurements as in [7] for the evaluation of our models on experimental data. Like that, we can directly compare the performance of the CNN-Models with the fitting approach. We focus on the inline measurement system, referred to as Inline spectrometer 2 by [7]. Since positioning of the SEM measurement spots, as well as the reflection measurement spots is uncertain, we evaluate four neighbouring reflection curves, measured around the centre of the wafer. The results shown in Figure 9 are the mean values of the four predicted layer parameters. Since we train our models only with reflection values in a range from 450 to 1100 nm, we filter the measured reflection values accordingly, before using them as input for our models.
The prediction for both layer thicknesses correlate well with the values measured by SEM, which means both models are capable to predict layer thicknesses within a reasonable range. The mean relative deviation of the classic model is 2% for layer 1 and 11% for layer 2. For the hybrid model we have 3% for layer 1 and 14% for layer 2. The prediction worked especially well for sample 2 and 3. However, the thicknesses for Sample 1 show errors up to 90 nm for the hybrid model prediction and up to 65 nm for the classic model prediction.

Discussion
For experimental data it is noticeable that the predictions of layer thickness are slightly more accurate for the classic model than for the hybrid model. We investigated the contrary on synthetic data, where the hybrid model outperformed the classic model in predicting correct thicknesses. It is possible, that the additional consistency loss may cause the hybrid model to adapt the specifications of synthetic reflectance data. Even though we applied data augmentation, it seems not enough to simulate real measurement uncertainties. Therefore, more focus must  A. Wörnhör et al.: EPJ Photovoltaics 14, 4 (2023) be set on the domain change from synthetic data to real measurements. Either by performing further data augmentation or by adding a self-supervised online optimization of the model, using only the consistency check.
The results for sample 2 and sample 3 match the SEM measurements well, when comparing the predicted parameters for both models. However, sample 1 shows some variations. In Figure 10, the recalculated reflection curves with both model predictions and the actual measurements are shown for the four neighbouring measurement spots considered for evaluation. One of the spots shows a completely different shaped reflection curve. This results in a drastic variation of the layer thickness prediction for both models. For the classic model the prediction range for Layer 1 is about 400 nm and around 200 nm for the hybrid model. This could indicate a change in the layer structure which causes an overlay of multiple reflectance spectra and results in this measurement artefact. This thesis is supported by comparing the centre values measured with SEM with four additional ones taken of the same wafer. For these points the Layer 1 thickness varies between 750 and 850 nm and only for the centre point values around 990 nm is measured.
Since we used the same samples and measurements as [7], we can directly compare our predicted layer parameters with the ones shown in Figure 10 of the mentioned reference. They reported a mean relative deviation of 5% in case of layer 1 and 11% in case of layer 2, which matches our results. The same uncertainty in the prediction of the thicknesses of sample 1 are visible. Which is a further indication, that the issue does not lie in the evaluation process, but within the layer structure of the wafer, which makes correct evaluation difficult.
An underestimation of layer 2 thickness and porosity is visible in Figure 6 and similar observations can be seen in the thickness prediction of layer 2 on experimental data in Figure 9. This could be an indication, that the information content of the reflectance spectra is not sufficient for an accurate prediction of both layer parameters. However, we see only slight variance in the error values for the synthetic data in Figure 6 and therefore expect the main cause lies in the difference of the absolute values of layer 2 and layer 1. An improvement of the models could be achieved by using adequate scaling of the input values. But so far, we did not further investigate this option.

Evaluation speed
An important advantage of both of our models is the fast evaluation of multiple samples. On a Nvidia RTX 2080 TI, we can evaluate 30.000 spectra in roughly 0.1 s, which allows fast inline evaluation, even with large quantities of measurement data. This duration does not include data loading and preprocessing. Since the evaluation speed results mainly from the parallel computing of multiple samples, the speed does not increase linearly with the number of samples. If the data size exceeds, the GPU limits of 30.000 spectra, the evaluation must be split on multiple runs and thus the time adds up. The investigated inline measurement systems of [7] take up to 56 spectra at the centre line per wafer, which means an evaluation time of roughly 1 ms. As stated in [7] the evaluation of the same amount of spectra would take approximately 1 min with their approach. For full spatial quality inspection of the porous layers, the evaluation of hyperspectral measurements can be used. These systems can measure up to millions of spectra per wafer, hence rely on fast evaluation techniques. With our models, the evaluation of a hyperspectral measurement with a resolution of 1024 Â 1024 can be done in roughly 3.5 s in comparison to 290 h when the numerical fitting from [7] is used.

Conclusion
We presented two models, the classic and the hybrid model, which are both able to fulfil the task of fast and reliable layer parameter prediction from inline reflection measurements. On synthetic data, the hybrid model outperforms the classic model in predicting the correct values for the layer parameters. The reconstructed reflection curves also show the improvement of the hybrid model, achieved by adding the consistency check to the training procedure. So far, a training procedure solely with the consistency loss was not effective.
To verify the good results achieved on the synthetic data, we also evaluated experimental data with inline measurements. Both model predictions of the layer thicknesses correlate well with actual SEM measurements taken at the same region. Since porosity values cannot be measured directly, the shape similarity of the reconstructed reflection curve gives an indication of the overall performance. Except for one irregular measurement, the reconstruction of the reflection curve works also well with both model predictions. The uncertainty value of the consistency loss given by the hybrid model can be used to trigger a refinement of the model for a continuous online learning process. Since the classic model already performed well on the three samples, it is still open whether we would observe an improvement with the hybrid model for samples with bad predictions based on the classic model, when evaluating a bigger dataset.
The good results on these three samples together with the comparison of the fitting results from [7] indicate, that both models perform well on real measurements. This makes them suitable for fast inline analysis since they calculate multiple layer parameters at once. Therefore, in time evaluation of multiple reflection spectra is possible, even with full spatial resolution measurement systems, like hyperspectral measurement.