Exploring the non-linearity in empirical modelling of a steel system using statistical and neural network models
School of Materials Science & Engineering, Bengal Engineering & Science University, Shibpur, Howrah -711103, India
The relationship between the physical properties of a metal and its chemistry, together with the various rolling parameters in operation, is often highly complex in nature. Non-linear regression models play a very important role in modelling the underlying mechanism, provided it is known. Artificial neural networks provide a wide class of general-purpose, flexible non-linear regression models. The most commonly used neural networks, called multi-layered perceptrons, can vary the complexity of the model from a simple parametric model to a highly flexible nonparametric model. In this work, an industry-based data set is used for learning and for optimizing the neural network architecture using some well-known algorithms for prediction under neural-net systems. The outcome of the analysis is compared with the results achieved through empirical statistical modelling, both in terms of prediction error level and against the knowledge of materials science.
Keywords: Artificial neural network; Steel; Regression models; Learning; Backpropagation algorithm
1. Introduction
An artificial neural network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements, known as neurons, working in unison to solve specific problems. An ANN is configured for a specific problem, such as pattern recognition, data classification or prediction, through a learning process. Learning in biological systems involves adjustment of the synaptic connections that exist between the neurons. All neural networks have a set of processing units that receive inputs from the outside world, referred to appropriately as the 'input units' or 'input nodes'. A network also has one or more layers of 'hidden' processing units that receive inputs only from other processing units. The set of processing units that represents the final result of the neural network computation is designated as the 'output units'. The process which the ANN uses for learning has much in common with statistical estimation: neural networks are essentially function approximation tools which learn the relationship between independent and dependent variables, much like regression analysis and other traditional approaches.
Concrete applications of neural networks to real problems started at the end of the 1980s, when the backpropagation algorithm was introduced for the computation of the network parameters. In the last decade many statisticians have investigated the properties of neural networks, and it appears that there exists a considerable overlap between statistical and neural network modelling. A neural network can be seen as a general way to parameterize data through arbitrary non-linear functions from the space of explanatory (causal, factor) variables to the space of explained (response, dependent) variables. The most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than non-linear regression and discriminant analysis (Sarle [14]). Along with some relevant statistical techniques, ANN provides better treatment of some common problems of modelling and inference (Cheng and Titterington [1]). There are parallel terminologies in statistics and neural networks, such as (variables, features), (estimation, learning), (estimates, weights), (interpolation, generalization), (observations, patterns) and many others (Sarle [14]). Some neural network models are similar or almost identical to statistical techniques. For example, feed-forward nets with no hidden layers are generalized linear models, probabilistic neural nets are identical to kernel discriminant analysis, Kohonen learning for adaptive vector quantization is very similar to K-means cluster analysis, and so on. The principal difference is that neural networks make no assumptions about the statistical distribution or properties of the data, and therefore tend to be more useful in business and practical situations.
Neural networks can also be applied very effectively and efficiently in areas where developing a physical model is very difficult. Several attempts have already been made in semiconductor processing, but those were mostly confined to the plasma etching process of the wafers (Himmel and May [5], Kim and May [9]). Jeong et al. [6] used neural networks with a genetic algorithm for efficient optimization of process parameters in shadow mask manufacturing.
This paper includes a comparative study of the statistical process of prediction and that of a neural network for a highly complicated steel system. The data used in this context were generated in the metallurgical laboratory. Dual phase (DP) steel, one of the prospective members of the family of HSLA steels, has been considered for this study. This steel possesses a composite microstructure comprising hard martensite particles dispersed in a soft ferrite matrix. C, Mn, Si, Cu, Ti and B contents, cold deformation, cooling rate, aging time and aging temperature have been taken as input parameters, whereas hardness is designated as the output variable.
The prediction tools, namely linear regression, stepwise regression, the generalized linear model (GLM) and projection pursuit regression (PPR), along with the backpropagation algorithm used by the ANN for learning and the generalized regression neural network (GRNN), are discussed in section 2. The implementation and results are given in section 3, followed by a comparison of the results. A brief discussion of the results and the conclusions are given in sections 4 and 5, respectively.
2. Materials and methods
2.1 Linear regression analysis
Regression analysis is a statistical technique for investigating and modelling the relationship between variables. The linear regression model can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $y$ is the response, $x_1, \ldots, x_k$ are the regressor variables, $\beta_0, \ldots, \beta_k$ are the unknown regression coefficients and $\varepsilon$ is the random error term. The linear regression model is based on the basic assumption that the errors are random and independently normally distributed with mean zero and constant variance $\sigma^2$.

One of the important parameters for regression analysis is the residual mean square error (MSE); the lower the value of MSE, the higher is the adequacy of the model. Besides MSE, two major parameters by which we can assess the overall adequacy of the model are the coefficient of determination $R^2$ and the adjusted $R^2$.
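As an illustration of the above, a minimal sketch of fitting such a model by ordinary least squares (in Python, with synthetic data standing in for the steel data set):

```python
import numpy as np

# Synthetic stand-in data: n observations, k regressors.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=50)

# Append a column of ones so beta[0] plays the role of the intercept.
X1 = np.column_stack([np.ones(len(X)), X])

# Least-squares estimates of the coefficients, then the residual MSE.
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta
mse = resid @ resid / (len(y) - X1.shape[1])   # SSE / (n - p)
print(beta, mse)
```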
2.2 Stepwise regression
Evaluating all possible regressions can be computationally burdensome, so various methods have been developed for evaluating only a small number of subset regression models by either adding or deleting regressors one at a time. These methods are generally referred to as stepwise-type procedures: forward selection, backward elimination and stepwise regression. The last is a popular combination of the first two. It is a process in which, at each step, all regressors entered into the model previously are reassessed via their partial F- (or, equivalently, t-) statistics, so that a regressor added at an earlier stage may be removed later if it has become redundant; a sketch of the procedure is given below.
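A minimal sketch of this stepwise logic, assuming statsmodels and illustrative entry/stay thresholds of 0.05 and 0.10 (the paper does not state its thresholds):

```python
import numpy as np
import statsmodels.api as sm

def stepwise(X, y, enter=0.05, stay=0.10):
    """Stepwise regression sketch: forward entry by smallest partial
    p-value, then backward removal of any regressor whose p-value has
    risen above the stay threshold (thresholds are illustrative)."""
    remaining = list(range(X.shape[1]))
    selected = []
    improved = True
    while improved and remaining:
        improved = False
        # Forward step: the candidate with the smallest p-value enters.
        trials = [(sm.OLS(y, sm.add_constant(X[:, selected + [c]])).fit()
                     .pvalues[-1], c) for c in remaining]
        p, best = min(trials)
        if p < enter:
            selected.append(best)
            remaining.remove(best)
            improved = True
        # Backward step: reassess all regressors already in the model.
        while len(selected) > 1:
            fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
            pv = np.asarray(fit.pvalues)[1:]      # skip the intercept
            worst = int(np.argmax(pv))
            if pv[worst] <= stay:
                break
            remaining.append(selected.pop(worst))
    return selected
```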
2.3 Generalised linear model (GLM)
A generalized linear model provides a way to estimate a function (called the link function) of the mean response as a linear function of the values of some set of predictors. This is written as

$$g(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

where $\mu = E(y)$ is the mean response and $g(\cdot)$ is the link function.

A very fundamental idea is that there are two components to a GLM: the response distribution (also called the error distribution) and the link function. Another concept underlying the GLM is the exponential family of distributions, which includes the Normal, Poisson, Binomial, Exponential and Gamma distributions as members. Since the normal-error linear model is just a special case of the GLM, the GLM can therefore be thought of as a unifying approach to many aspects of empirical modelling and data analysis.

The estimates of the regression parameters in a GLM are maximum likelihood estimates, produced by iteratively reweighted least squares (IRLS) (Myers et al. [11]).
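The following sketch fits a GLM by IRLS via statsmodels; the Gaussian family with its default identity link reproduces the ordinary linear model, and other exponential-family choices are one argument away (the data here are synthetic):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data; with a Gaussian family and identity link the GLM
# reduces to the ordinary linear model, as noted in the text.
rng = np.random.default_rng(1)
X = sm.add_constant(rng.uniform(size=(100, 4)))
y = X @ np.array([0.5, 1.0, -0.3, 0.2, 0.8]) + rng.normal(scale=0.05, size=100)

# statsmodels fits GLMs by iteratively reweighted least squares (IRLS).
result = sm.GLM(y, X, family=sm.families.Gaussian()).fit()
print(result.params)
print(result.deviance / result.df_resid)   # deviance-based error estimate
```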
2.4 Projection pursuit regression (PPR)
Let $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$, denote the observations, where $\mathbf{x}_i \in \mathbb{R}^p$ is the vector of explanatory variables and $y_i$ the response. To find a model, projection pursuit regression seeks direction vectors $\boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_M$ and smooth univariate (ridge) functions $f_1, \ldots, f_M$ such that the predictions

$$\hat{y} = \bar{y} + \sum_{m=1}^{M} \beta_m f_m(\boldsymbol{\alpha}_m^{T}\mathbf{x})$$

provide a 'good' model for the data $(\mathbf{x}_i, y_i)$.

More formally, the model is

$$y = \bar{y} + \sum_{m=1}^{M} \beta_m f_m(\boldsymbol{\alpha}_m^{T}\mathbf{x}) + \varepsilon$$

where the $\boldsymbol{\alpha}_m$ are unit-length projection directions and the $f_m$ are unspecified functions estimated by smoothing the responses against the projected data.

The true model parameters minimize the expected squared residual

$$E\Big[y - \bar{y} - \sum_{m=1}^{M} \beta_m f_m(\boldsymbol{\alpha}_m^{T}\mathbf{x})\Big]^2$$

over all possible values of $\boldsymbol{\alpha}_m$, $\beta_m$ and $f_m$ (Friedman and Stuetzle [3]).

In order to determine a 'good' model for PPR, the value of $M$, the number of ridge terms, is chosen by monitoring the fraction of variance left unexplained,

$$1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.$$

A plot of this criterion against $M$ indicates the point beyond which adding further terms yields no appreciable improvement.
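Since PPR is less commonly packaged than the other methods, the following is a simplified single-term sketch, with a cubic polynomial standing in for the nonparametric smoother of Friedman and Stuetzle (an assumption made purely for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def fit_ppr_term(X, y, degree=3):
    """One ridge term of PPR (illustrative): find a unit direction alpha
    and a smooth univariate f (here a cubic polynomial standing in for a
    nonparametric smoother) minimizing the squared residuals."""
    n, p = X.shape

    def loss(a):
        a = a / (np.linalg.norm(a) + 1e-12)   # keep the direction unit length
        z = X @ a                              # projection of the data
        coef = np.polyfit(z, y, degree)        # fit f to the projected data
        r = y - np.polyval(coef, z)
        return r @ r

    # Restart from each coordinate axis and keep the best solution.
    best = min((minimize(loss, a0, method="Nelder-Mead") for a0 in np.eye(p)),
               key=lambda res: res.fun)
    alpha = best.x / np.linalg.norm(best.x)
    coef = np.polyfit(X @ alpha, y, degree)
    return alpha, coef, best.fun

# Usage: alpha, coef, sse = fit_ppr_term(X, y)
#        yhat = np.polyval(coef, X @ alpha)
```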
2.5 Human brain and artificial neural network
In the human brain, a typical neuron collects signals from other neurons through a host of fine structures called 'dendrites' (figure 1). It then sends spikes of electrical activity through an 'axon', which splits into thousands of branches. At the end of each branch, a 'synapse' (link) converts this activity into excitation of the connected neurons.
Figure 1. Human nervous system.
The magnitude of the signal received by a neuron depends on the efficiency of the synaptic transmission. A neuron will fire (i.e. send an output impulse) if its net excitation exceeds its inhibition by a critical amount (threshold). Firing is followed by a refractory period, during which the neuron stays inactive.
2.5.1 Backpropagation
Backpropagation was created by generalizing the Widrow–Hoff learning rule to multiple-layer networks and non-linear differentiable transfer functions. Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way as defined by the user. Standard backpropagation is a gradient descent algorithm in which the network weights are moved along the negative of the gradient of the performance function. The term backpropagation refers to the manner in which the gradient is computed for non-linear multilayer networks. There are a number of variations on the basic algorithm that are based on other standard optimization techniques, of which the scaled conjugate gradient and Levenberg–Marquardt algorithms are the most effective and the most widely appreciated.
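A minimal sketch of plain gradient-descent backpropagation for one hidden layer with a tanh transfer function (not the SCG or LM variants discussed next, just the basic algorithm):

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Minimal gradient-descent backpropagation: one tanh hidden layer
    and a linear output unit (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1));          b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - y                      # dE/dout for E = 0.5*sum(err^2)
        # Backward pass: propagate the error gradient layer by layer.
        dW2 = h.T @ err
        db2 = err.sum(axis=0)
        dh = (err @ W2.T) * (1.0 - h**2)   # tanh'(x) = 1 - tanh(x)^2
        dW1 = X.T @ dh
        db1 = dh.sum(axis=0)
        # Move the weights along the negative gradient.
        for w, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            w -= lr * g / len(X)
    return W1, b1, W2, b2
```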
2.5.1.1 Scaled conjugate gradient (SCG)
The basic backpropagation algorithm adjusts the weights in the steepest descent direction (negative of the gradient). In the conjugate gradient algorithms a search is performed along conjugate directions, which produces generally faster convergence than steepest descent directions. In the case of quadratic functions, exact answers are obtainable without calculating second-order derivatives.
Given a symmetric matrix $\mathbf{Q}$, two vectors $\mathbf{d}_1$ and $\mathbf{d}_2$ are said to be $\mathbf{Q}$-conjugate (or $\mathbf{Q}$-orthogonal) if

$$\mathbf{d}_1^{T}\mathbf{Q}\,\mathbf{d}_2 = 0.$$

Starting from the steepest descent direction, each subsequent search direction is taken as a combination of the new negative gradient and the previous search direction,

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k,$$

so that successive directions remain conjugate. Commonly used stopping criteria are a sufficiently small gradient norm $\lVert\mathbf{g}_k\rVert$ or a negligible reduction of the performance function between successive iterations.
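A quick numerical illustration of Q-conjugacy, constructing a second direction conjugate to a first by a Gram–Schmidt-like step (the matrix values are arbitrary examples):

```python
import numpy as np

# Two directions are Q-conjugate when d1' Q d2 = 0. A quick check on a
# small symmetric positive-definite Q.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
d1 = np.array([1.0, 0.0])

# Construct d2 Q-conjugate to d1 by removing the Q-projection of v on d1.
v = np.array([0.0, 1.0])
d2 = v - (d1 @ Q @ v) / (d1 @ Q @ d1) * d1
print(d1 @ Q @ d2)   # ~0, i.e. the directions are Q-conjugate
```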
2.5.1.2 Levenberg–Marquardt (LM)
The Levenberg–Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares (as is typical in training feedforward networks), the Hessian matrix can be approximated as $\mathbf{H} = \mathbf{J}^{T}\mathbf{J}$ and the gradient can be computed as $\mathbf{g} = \mathbf{J}^{T}\mathbf{e}$, where $\mathbf{J}$ is the Jacobian matrix containing the first derivatives of the network errors with respect to the weights and biases, and $\mathbf{e}$ is the vector of network errors. The weight update is then

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \big[\mathbf{J}^{T}\mathbf{J} + \mu\mathbf{I}\big]^{-1}\mathbf{J}^{T}\mathbf{e},$$

where the scalar $\mu$ is decreased after each successful step and increased only when a tentative step would increase the performance function.
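A sketch of a single LM weight update under these formulas; the jacobian and residuals callables are placeholders for whatever model is being trained:

```python
import numpy as np

def lm_step(x, jacobian, residuals, mu):
    """One Levenberg-Marquardt step: approximate the Hessian by J'J and
    move by -(J'J + mu*I)^-1 J'e (sketch; jacobian and residuals are
    user-supplied callables for the model being fitted)."""
    J = jacobian(x)                       # n_residuals x n_params
    e = residuals(x)
    H = J.T @ J                           # Gauss-Newton Hessian approximation
    g = J.T @ e                           # gradient of 0.5 * ||e||^2
    step = np.linalg.solve(H + mu * np.eye(len(x)), g)
    return x - step
```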
The network architecture most commonly used with the backpropagation algorithm is the multilayer feed-forward network (Rumelhart et al. [13]); an example with a single hidden layer is shown in figure 2.
Figure 2. Feed forward architecture (single hidden layer).
Here every input node is connected to every neuron of the hidden layer, and every hidden neuron to the output node, each connection carrying an adjustable weight.
2.5.2 Generalized regression neural network
A generalized regression neural network (GRNN) is often used for function approximation. Its architecture contains a hidden layer of radial basis neurons and a special linear layer as the output layer (figure 3).
The input to the radial basis transfer function is the vector distance between its weight vector and the input vector; the function output is largest when this distance is zero and decays as the distance grows, so each radial basis neuron responds most strongly to input patterns close to the training pattern it stores. The linear output layer then forms a weighted average of the stored target values, the weights being these radial basis outputs (Specht [16]).
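A compact sketch of GRNN prediction along these lines, i.e. a Gaussian-kernel-weighted average of the stored training targets:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, spread):
    """GRNN prediction in the spirit of Specht [16]: a kernel-weighted
    average of the stored targets with Gaussian radial basis weights."""
    # Euclidean distance from each new point to every training pattern.
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    w = np.exp(-(d ** 2) / (2.0 * spread ** 2))
    # Weighted average of training targets (the special linear layer).
    return (w @ y_train) / w.sum(axis=1)
```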
3. Implementation and results
3.1 Data background
Development of high-strength steels combining adequate ductility, formability and fracture toughness has drawn the interest of several metallurgists. The demand for such steels comes mainly from sectors like automobile, defence and naval applications. Dual phase (DP) steel provides the option of achieving a judicious balance among the desired mechanical properties. The microstructural constitution of DP steel offers the opportunity of configuring it in a flexible manner by varying the volume fraction, morphology and distribution of the constituent phases.
In this regard, copper (Cu) is known to strengthen steel through solid solution in ferrite as well as through precipitation hardening; hence a Ti-B microalloyed steel has been alloyed with Cu to utilize the individual effects of Ti, B and Cu as well as their synergistic combined effect. The purpose is to improve the hardenability of austenite, which in turn is expected to give rise to a DP microstructure depending on the finish rolling condition and the cooling rate.
The alloy chemistry (C, Mn, Si, Cu, Ti and B), cold deformation, cooling rate, aging time and aging temperature have been taken as input parameters, whereas hardness is designated as the output variable. The data used for the present exercise have been generated in the laboratories. The chemical analyses were done in an atomic spectrometer, rolling was carried out in a laboratory-scale two-high rolling mill, and the hardness testing was carried out in a Vickers hardness testing machine.
The ranges of the variables used in the present work are listed in table 1. Each variable is normalized within the range of 0 to 1 for ANN modelling by the operation given below and used in the same form for other statistical techniques as well.
$$x_N = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original value of the variable, $x_{\min}$ and $x_{\max}$ are its minimum and maximum values, and $x_N$ is the normalized value.
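In code, the normalization is one line per variable:

```python
import numpy as np

def normalize(x):
    """Min-max normalization of one variable to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())
```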
Table 1. The minimum and maximum limits of the parameters.
3.2 Results
The results of modelling as found using different statistical techniques along with neural network models are presented below.
3.2.1 Linear model
The simple linear regression model described in section 2.1 was fitted to the data. The ANOVA table for the model is given in table 2, along with its MSE and multiple-$R^2$ values.
Table 2. The ANOVA table for simple linear model.
3.2.2 Stepwise regression
The best-fitted second-order linear model was found through stepwise regression analysis. The ANOVA table for the model is shown in table 3, along with its MSE and multiple-$R^2$ values.
Table 3. The ANOVA table for the stepwise regression model.
3.2.3 Generalized linear model
A generalized linear model considering second-order polynomial terms was fitted to the data. The MSE for this model was found to be 0.00362.
3.2.4 Projection pursuit regression
The value of $M$, the number of ridge terms, was chosen from the plot of the unexplained-variance criterion described in section 2.4, and the prediction error of the resulting PPR model is included in the comparison of table 4.
3.2.5 Backpropagation
The ANN used in the present case is a supervised multilayered feed-forward network, trained with standard gradient-descent backpropagation algorithms. The input variables, namely six compositional and four process variables, are defined as input nodes, and the single property variable (hardness) is the output node. A total of 242 observations were considered for the analysis, of which training and test data were divided randomly in a 70:30 ratio.

Similar operations are repeated for varying numbers of hidden layers and of nodes within the layers in order to find a suitable network architecture; a sketch of the workflow is given below. In the process of learning, the error of the calculated or predicted output relative to the actual output is backpropagated to adjust all the weights and bias values, using both the SCG and LM algorithms. Regarding the choice of transfer function, previous workers have reported that, owing to its attainable flexibility, the hyperbolic tangent function is most suitable for modelling metallurgical problems.
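As a rough illustration of this workflow, the sketch below uses scikit-learn with the 70:30 split and tanh activation; note that scikit-learn provides neither the SCG nor the LM solver, so a generic gradient-based solver stands in, and the layer sizes are simply the LM-optimized architecture quoted later:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# X (242 x 10 normalized inputs) and y (hardness) assumed already loaded.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative stand-in: no SCG/LM solver is available here, so a
# gradient-based solver is used; the tanh activation matches the paper.
net = MLPRegressor(hidden_layer_sizes=(20, 16, 11, 17), activation="tanh",
                   solver="adam", max_iter=5000, random_state=0)
net.fit(X_tr, y_tr)
print(mean_squared_error(y_te, net.predict(X_te)))
```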
From figure 4, a decision regarding the number of hidden layers and hidden units can be taken. It shows that the training error is minimum when five hidden layers are used; however, the test error is minimum for a six-layered network. There is, though, little deviation between the minimum test errors estimated for the five- and six-layered networks.
Figure 3. GRNN architecture.
Figure 4. Training and test error vs. hidden nodes using SCG.
In the case of the LM algorithm, it is very clear from figure 5 that the minimum values of both training and test errors are obtained when four hidden layers are used. The LM algorithm also produced much better predictions during training than SCG, but even so it is not straightforward to choose LM as the better-optimized learning algorithm. It was further observed that training through LM takes a much smaller number of epochs to reach the goal; however, computational time is also very important, and LM takes much more time per epoch than SCG because of its more complicated algorithm. Figure 6 shows the scatter diagram of predicted versus measured values, under the normalized condition, for the networks trained with SCG and LM.
Figure 5. Training and test error vs. hidden nodes using LM.
It is clear from figure 6 that both algorithms have a reasonably good capability to train the network for this specific problem, but the network trained by SCG has slightly better prediction ability. Finally, the two optimized architectures were selected as (10-35-28-21-14-8-1) and (10-20-16-11-17-1) when the SCG and LM algorithms, respectively, were used for training.
Figure 6. Predicted vs. measured values after training with SCG and LM.
The individual effects of the different alloying elements and processing parameters, as predicted by the trained ANN with the optimized architecture, are shown in figure 7. The training was done using the SCG algorithm. During prediction, when one parameter is varied the others are fixed at the mean values of the ranges of the variables used for training.
Figure 7. Individual effect of different variables on the hardness of the steel as learnt by the ANN (learning with SCG algorithm).
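The 'vary one, fix the rest at their means' procedure behind figure 7 can be sketched as follows (net being any trained model with a predict method, such as the one above):

```python
import numpy as np

def sweep(net, X, var_index, n_points=50):
    """Sensitivity sweep: vary one input over its observed range while
    holding all other inputs at their mean values, then predict."""
    grid = np.tile(X.mean(axis=0), (n_points, 1))
    grid[:, var_index] = np.linspace(X[:, var_index].min(),
                                     X[:, var_index].max(), n_points)
    return grid[:, var_index], net.predict(grid)
```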
Figure 8. Training error and test error versus spread.
Table 4. Comparative study of prediction errors.
3.2.6 Generalized regression neural network
It has already been discussed that in the GRNN the spread of the radial basis function is the most important criterion of optimization. As the spread changes, the test and training errors change significantly; in this study the behaviour shown in figure 8 was observed. The training error increases as the spread increases, but the test error behaves oppositely, initially falling very rapidly, stabilizing and then starting to increase. The optimized network was chosen with a spread of 0.11, for which the training and test errors were 0.0012 and 0.0396, respectively. A sketch of this spread scan is given below.
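A sketch of such a spread scan, reusing the grnn_predict function and the split data from the earlier sketches:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Scan candidate spreads, tracking training and test MSE.
spreads = np.linspace(0.01, 0.5, 50)
results = [(s,
            mse(grnn_predict(X_tr, y_tr, X_tr, s), y_tr),
            mse(grnn_predict(X_tr, y_tr, X_te, s), y_te))
           for s in spreads]

# Pick the spread with the smallest test error (0.11 in the paper).
best = min(results, key=lambda t: t[2])
print(best)
```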
4. Discussion
The previous section shows the approaches adopted by different statistical tools and neural network models to the function approximation problem, and their results for the given data set. The comparative study reveals that the neural network with the backpropagation algorithm (specifically SCG) exhibits the best prediction capability. The comparison of the techniques is based on the most commonly used characteristic, the mean squared error (MSE); other means of comparison could be the mean absolute percentage error (MAPE), the root mean square deviation (RMSD), etc.
It is very clear from the overall discussion that in industry, where fitting a physical model is very difficult, ANN can be applied efficiently with significant prediction capability. ANN does not make any assumption about the distribution and the properties of the data and therefore tends to be more useful in practical scenarios, not only in the manufacturing industry but also in the non-manufacturing business sector.
Although one aspect of a satisfactory model is reaching a sufficiently low prediction error level, another aspect that may interest researchers is validation of the model against prior knowledge of the system being modelled. In this case the effects of the different parameters on the final output could be analysed from the statistical as well as the ANN models. The linear regression model indicates that carbon has a negative effect on the hardness of the final product, which contradicts the prior understanding of any kind of steel. Similarly, both aging time and aging temperature show negative effects on the hardness, which is also difficult to accept for copper-added steel, since copper precipitates in the steel during aging and thereby increases its hardness.
From the above discussion it is quite evident that the ANN is superior to the statistical methods in reflecting prior knowledge of the system in its predictions, at least for the highly complicated steel system used here. Although the comparisons were made on only one system, it appears that ANN can accommodate the complexity and non-linearity of a system better than the regression models, keeping in mind the peculiarity of the system used in this work. Hence, from the above experience, it may be expected that ANN models will produce better results than the statistical methods when used on other systems as well.
5. Conclusion
Among the statistical methods, the projection pursuit regression model is found to have the best capability for predicting the property of the steel modelled in the present case. Among the neural networks, the network using the SCG algorithm for error optimization was found to be superior to the other most common algorithm (LM). It was also found that the neural network with the backpropagation algorithm exhibits much better prediction capability than the generalized regression neural network. As the present work was carried out on a highly complex steel system, it can be concluded, in general, that the neural network is able to explain such systems better than the statistical methods, and that the ANN models can also be validated in a superior manner from the system knowledge point of view.
References
1 Cheng, B and Titterington, DM. 1994. Neural networks: a review from a statistical perspective. Statist. Sci., 9(1): 2–54.
2 Datta, S and Banerjee, MK. 2004. Optimizing parameters of supervised learning techniques (ANN) for precise mapping input output relationship in TMCP steels. Scand. J. Metall., 34: 1–6.
3 Friedman, JH and Stuetzle, W. 1981. Projection pursuit regression. J. Am. Stat. Assoc., 76(376): 817–823.
4 Gorni, AA. 1997. The application of neural networks in the modeling of plate rolling process. JOM-e, 49(4). Available at http://www.tms.org/pubs/journals/JOM/9704/Gorni/
5 Himmel, CD and May, GS. 1993. Advantages of plasma etch modeling using neural networks over statistical techniques. IEEE Trans. Semiconductor Manuf., 6(2): 103–111.
6 Jeong, B, Lee, J and Cho, H. 2005. Efficient optimization of process parameters in shadow mask manufacturing using NNPLS and genetic algorithm. Int. J. Prod. Res., 43(15): 3209–3230.
7 Jiao, Y, Pei, ZJ, Lei, S, Lee, ES and Fisher, GR. 2005. Fuzzy adaptive networks in machining process modelling: dimensional error prediction for turning operations. Int. J. Prod. Res., 43(14): 2931–2948.
8 Juang, J-G and Chio, J-Z. 2005. Fuzzy modelling control for aircraft automatic landing system. Int. J. Syst. Sci., 36(2): 77–87.
9 Kim, B and May, GS. 1994. An optimal neural network process model for plasma etching. IEEE Trans. Semiconductor Manufact., 7(1): 12–21.
10 Montgomery, DC, Peck, EA and Vining, GG. 2001. Introduction to Linear Regression Analysis, New York: John Wiley & Sons, Inc.
11 Myers, RH, Montgomery, DC and Vining, GG. 2002. Generalized Linear Models, New York: John Wiley & Sons, Inc.
12 Pacella, M, Semeraro, Q and Anglani, A. 2004. Adaptive resonance theory-based neural algorithms for manufacturing process quality control. Int. J. Prod. Res., 42(21): 4581–4607.
13 Rumelhart, DE, Hinton, GE and Williams, RJ. 1986. Learning representations by back-propagating errors. Nature, 323: 533–536.
14 Sarle, WS. 1994. Neural networks and statistical models. Proceedings of the Nineteenth Annual SAS Users Group International Conference.
15 Singh, SB, Bhadeshia, HKDH, MacKay, DJC, Carey, H and Martin, I. 1998. Neural network analysis of steel plate processing. Ironmaking and Steelmaking, 25(5): 355–365.
16 Specht, DF. 1991. A general regression neural network. IEEE Trans. Neural Netw., 2(6): 568–576.
By Prasun Das and Shubhabrata Datta