# Chapter 2: Random Process Variation in Deep-Submicron CMOS

One of the most notable features of nanometer-scale CMOS technology is the increasing magnitude of variability in the key parameters that affect the performance of integrated circuits [1]. Although scaling has made extrinsic variability harder to control, the most profound reason for the future increase in parameter variability is that the technology is approaching the regime of fundamental randomness in the behavior of silicon structures, where device operation must be described as a stochastic process. Electrical noise due to the trapping and de-trapping of electrons in lattice defects may result in large current fluctuations, and these may differ for each device within a circuit. At this scale, a single dopant atom may change device characteristics, leading to large device-to-device variations [2]. As the device gate length approaches the correlation length of the oxide-silicon interface, intrinsic threshold-voltage fluctuations induced by local oxide-thickness variation will become significant [3]. Finally, line-edge roughness, i.e., the random variation of the gate length along the width of the channel, will also contribute to the overall gate-length variability [4]. Since the placement of dopant atoms introduced into the silicon crystal is random, the final number and location of the atoms in the channel of each transistor are random variables. Because the threshold voltage of a transistor is determined by the number and placement of these dopant atoms, it exhibits considerable variation [3]. This in turn leads to variation in the transistors' circuit-level properties, such as delay and power [5].

Timing uncertainty is traditionally predicted through corner-based analysis, which performs static timing analysis (STA) at multiple corners to obtain the extreme-case results. At each corner, the process parameters are set to extreme points in the multidimensional parameter space. As a consequence, the worst-case delay from corner-based timing analysis is overly pessimistic, since it is unlikely that all process parameters take extreme values at the same time. Additionally, the number of process corners grows exponentially as the number of varying process parameters increases.
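Both effects can be seen with a small calculation. The sketch below assumes a hypothetical linear delay model d = d0 + Σ a_i x_i with independent, standardized Gaussian parameters x_i, and treats a corner as every x_i set to ±kσ. The function name `corner_vs_statistical` and all numbers are illustrative assumptions, not values from this chapter.

```python
import math
from itertools import product

def corner_vs_statistical(d0, sens, k=3.0):
    """Compare the worst corner with a k-sigma statistical delay (hypothetical linear model).

    Assumed model: d = d0 + sum_i a_i * x_i with independent x_i ~ N(0, 1);
    a corner sets every x_i to +k or -k, giving 2**len(sens) corners.
    """
    corners = [d0 + sum(a * s * k for a, s in zip(sens, signs))
               for signs in product((-1.0, 1.0), repeat=len(sens))]
    worst_corner = max(corners)
    # Under the same model d ~ N(d0, sum a_i^2), so its k-sigma point is:
    statistical = d0 + k * math.sqrt(sum(a * a for a in sens))
    return len(corners), worst_corner, statistical

n_corners, worst, stat = corner_vs_statistical(100.0, [2.0, 2.0, 2.0, 2.0])
print(n_corners, worst, stat)   # 16 corners, worst corner 124.0 ps, 3-sigma delay 112.0 ps
```

Under these assumptions the worst corner grows linearly with the number of parameters (k Σ|a_i|), whereas the statistical k-sigma delay grows only with the root-sum-square k √(Σ a_i²), so the pessimism widens as more parameters vary.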

Recently, statistical STA (SSTA) has been proposed as a potential alternative that accounts for process variations in timing verification. In contrast to deterministic STA, SSTA represents gate delays and interconnect delays as probability distributions and provides the distribution (or statistical moments) of each timing value rather than a deterministic quantity. When modeling process-induced delay variations, the sample space is the set of all manufactured dies. The device parameters take different values across this sample space, so the critical path and its delay change from one die to the next. The delay of the circuit is therefore also a random variable, and the first task of statistical timing analysis is to compute the characteristics of this random variable, i.e., its probability density function or its cumulative distribution function (CDF).

A. Zjajo, Stochastic Process Variation in Deep-Submicron CMOS, Springer Series in Advanced Microelectronics 48, DOI: 10.1007/978-94-007-7781-1_2, © Springer Science+Business Media Dordrecht 2014
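As a minimal illustration of the die-level sample space, the Monte Carlo sketch below draws many hypothetical dies for a two-path circuit whose paths share a die-to-die component and also vary independently. It is a brute-force reference under stated assumptions (Gaussian parameters, made-up sensitivities in ps), not the analytical SSTA method developed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dies = 100_000                                   # sample space: manufactured dies

# Hypothetical two-path circuit (delays in ps): a die-to-die component g is shared
# by both paths, and each path also has its own independent within-die component.
g = rng.standard_normal(n_dies)
path_a = 100.0 + 4.0 * g + 3.0 * rng.standard_normal(n_dies)
path_b = 98.0 + 4.0 * g + 5.0 * rng.standard_normal(n_dies)

circuit_delay = np.maximum(path_a, path_b)         # the critical path is decided per die
share_b_critical = float(np.mean(path_b > path_a))

def empirical_cdf(samples, t):
    """Fraction of dies whose circuit delay does not exceed t."""
    return float(np.mean(samples <= t))
```

Although `path_a` has the larger mean, `path_b` is the critical path on roughly a third of the dies, and the mean circuit delay exceeds the mean of either path. This is why the circuit delay has to be treated as a random variable rather than as the delay of one fixed path.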

Alternatively, only specific statistical characteristics of the distribution, such as its mean and standard deviation, can be computed. Note that the cumulative distribution function and the probability density function can be derived from one another through integration and differentiation. Given the CDF of the circuit delay and the required performance constraint, the anticipated yield can be read directly from the CDF. Conversely, given the CDF of the circuit delay and the required yield, the maximum frequency at which the set of yielding chips can be operated can be found.
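Both uses of the CDF can be sketched with Python's `statistics.NormalDist`, assuming (purely for illustration) that the circuit delay is Gaussian. An SSTA-derived CDF need not be Gaussian, and the helper names `timing_yield` and `max_frequency` are hypothetical.

```python
from statistics import NormalDist

def timing_yield(delay_dist, t_clk):
    """Yield = P(delay <= clock period), i.e. the delay CDF evaluated at t_clk."""
    return delay_dist.cdf(t_clk)

def max_frequency(delay_dist, target_yield):
    """Highest clock frequency at which a fraction target_yield of dies meets timing."""
    t_min = delay_dist.inv_cdf(target_yield)   # smallest period met by target_yield of dies
    return 1.0 / t_min

delay = NormalDist(mu=1.0, sigma=0.05)         # hypothetical circuit delay in ns
print(timing_yield(delay, 1.1))                # about 0.977 (1.1 ns is mu + 2 sigma)
print(max_frequency(delay, 0.99))              # about 0.896 GHz
```

The two functions are inverses of each other in the sense described above: one maps a period to a yield through the CDF, the other maps a yield to a period through the inverse CDF.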

In addition to finding the delay of the circuit, it is also key to achieve operational robustness against process variability, which typically comes at the expense of higher energy consumption and larger area [6]. Technology scaling, circuit topologies, and architecture trends have all aligned to specifically target low-power trade-offs through the use of fine-grained parallelism [7], near-threshold design [8], and VDD scaling and body biasing [9]. Similarly, a cross-layer optimization strategy for variation resilience has been devised, spanning from the lowest level of process and device engineering to the upper level of system architecture. Simultaneous optimization of circuit yield and energy with respect to key parameters (the supply voltage VDD and the supply-to-threshold voltage ratio VDD/VT) is part of a system-wide strategy in which the critical parameters that minimize energy (e.g., VDD/VT) provide control mechanisms (e.g., adaptive voltage scaling) to the runtime system. Yield-constrained energy optimization, as an active design strategy to counteract process variation in sub-threshold or near-threshold operation, calls for a statistical design paradigm that overcomes the limitations of deterministic optimization schemes.
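To make yield-constrained energy optimization concrete, the toy sketch below picks the VDD that minimizes energy per operation when the clock period must be met by a target fraction of dies. Everything in it is an assumption for illustration: an alpha-power-law delay (not accurate near or below threshold), a Gaussian VT, normalized dynamic and leakage energy terms, and a brute-force grid search. It is not the sequential framework of Sect. 2.4.

```python
from statistics import NormalDist

def gate_delay(vdd, vt, alpha=1.3, k_d=1.0):
    """Hypothetical alpha-power-law delay (normalized units); only meaningful for vdd > vt."""
    return k_d * vdd / (vdd - vt) ** alpha

def yield_constrained_period(vdd, target_yield, vt_mean=0.3, vt_sigma=0.03):
    """Clock period met by a fraction target_yield of dies.

    Delay rises monotonically with VT, so the target_yield quantile of the delay
    is the delay evaluated at the target_yield quantile of VT.
    """
    vt_q = NormalDist(vt_mean, vt_sigma).inv_cdf(target_yield)
    return gate_delay(vdd, vt_q) if vdd > vt_q else float("inf")

def energy_per_op(vdd, target_yield, c_eff=1.0, i_leak=0.1):
    """Normalized dynamic energy c_eff*VDD^2 plus leakage energy over one clock period."""
    return c_eff * vdd ** 2 + i_leak * vdd * yield_constrained_period(vdd, target_yield)

def optimal_vdd(target_yield, v_min=0.35, v_max=1.2, step=0.001):
    """Brute-force grid search for the energy-minimizing supply voltage."""
    n = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n + 1)]
    return min(grid, key=lambda v: energy_per_op(v, target_yield))

for y in (0.9, 0.999):
    print(y, round(optimal_vdd(y), 3))
```

In this toy model, lowering VDD cuts dynamic energy but stretches the yield-constrained period and hence the leakage energy, which produces an interior optimum. Raising the yield target moves the VT quantile up and pushes the energy-optimal VDD higher.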

In this chapter, the circuits are described as a set of stochastic differential equations and Gaussian closure approximations are introduced to obtain a closed form of moment equations and compute the variational waveform for statistical delay calculation. For high accuracy in the case of large process variations, the statistical solver divides the process variation space into several sub-spaces and performs the statistical timing analysis in each sub-space. Additionally, a yield constrained sequential energy minimization framework applied to multivariable optimization is described.
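The moment-equation idea can be illustrated on the simplest case: a hypothetical RC node driven toward v_in with additive white noise, dv = (v_in - v)/τ dt + σ dW. Because this SDE is linear, its mean and variance equations close exactly. Gaussian closure is what extends the same construction to nonlinear device equations, by expressing the higher-order moments through the mean and variance. The sketch below uses made-up parameters and compares the integrated moment equations against an Euler-Maruyama Monte Carlo reference.

```python
import numpy as np

def moment_equations(tau, v_in, sigma, t_end, dt):
    """Forward-Euler integration of dm/dt = (v_in - m)/tau and dP/dt = -2P/tau + sigma^2."""
    m, p = 0.0, 0.0                                   # deterministic initial state v(0) = 0
    for _ in range(int(round(t_end / dt))):
        m, p = m + dt * (v_in - m) / tau, p + dt * (-2.0 * p / tau + sigma ** 2)
    return m, p

def monte_carlo(tau, v_in, sigma, t_end, dt, n_paths=20_000, seed=0):
    """Euler-Maruyama sample paths of the same SDE; returns the sample mean and variance."""
    rng = np.random.default_rng(seed)
    v = np.zeros(n_paths)
    for _ in range(int(round(t_end / dt))):
        v += dt * (v_in - v) / tau + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return float(v.mean()), float(v.var())

m, p = moment_equations(tau=1.0, v_in=1.0, sigma=0.2, t_end=5.0, dt=0.01)
m_mc, p_mc = monte_carlo(tau=1.0, v_in=1.0, sigma=0.2, t_end=5.0, dt=0.01)
```

The two deterministic ODEs replace thousands of sample paths. That cost saving is the point of solving for the moments directly.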

The chapter is organized as follows: Sect. 2.1 focuses on process variations modeled as a wide-sense stationary process, and Sect. 2.2 discusses the solution of a system of stochastic differential equations for such a process. In Sect. 2.3, statistical delay calculation and complexity reduction techniques are described. In Sect. 2.4, a yield-constrained sequential energy minimization framework is discussed. Experimental results are presented in Sect. 2.5. Finally, Sect. 2.6 provides a summary and the main conclusions.

## 2.1 Modeling Process Variability

The availability of large data sets of process parameters obtained through parameter extraction allows the study and modeling of the variation and correlation between process parameters, which is of crucial importance for obtaining realistic values of the modeled circuit unknowns. Typical procedures determine parameters sequentially and neglect the interactions between them; as a result, the fit of the model to measured data may be less than optimum. In addition, the parameters are obtained as they relate to a specific device and, consequently, they correspond to different device sizes. The extraction procedures are also generally specialized to a particular model, and considerable work is required to change or improve these models.

For complicated IC models, parameter extraction can be formulated as an optimization problem. Using direct parameter extraction techniques instead of optimization allows end-of-line compact model parameter determination: the model equations are split into functionally independent parts, and all parameters are solved using straightforward algebra, without iterative procedures or least-squares fitting. With the constant downscaling of the supply voltage, the moderate-inversion region becomes increasingly important, and an accurate description of this region is thus essential. Threshold-voltage-based models, such as BSIM and MOS 9, use approximate expressions for the drain-source channel current IDS in the weak-inversion region (i.e., subthreshold) and in the strong-inversion region (i.e., well above threshold). These approximate equations are tied together by a mathematical smoothing function, which yields a description of IDS in the moderate-inversion region (i.e., around threshold) that is neither physical nor accurate. The major advantage of surface-potential-based models (the surface potential being defined as the electrostatic potential at the gate-oxide/substrate interface with respect to the neutral bulk) over threshold-voltage-based models is that they do not rely on the regional approach: the I–V and C–V characteristics in all operating regions are expressed and evaluated using one set of unified formulas. In the surface-potential-based model, the channel current IDS is split into a drift component (Idrift) and a diffusion component (Idiff), both of which are functions of the gate bias VGB and of the surface potential at the source (ψs0) and drain (ψsL) sides. In this way, IDS can be accurately described by one equation for all operating regions (i.e., weak, moderate, and strong inversion).

Numerical progress has also removed a major concern in surface-potential modeling, namely solving for the surface potential itself: it can be obtained either in closed form (with limited accuracy) or iteratively, as in our use of a second-order Newton iterative method to improve the computational efficiency of MOS Model 11.
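As a rough sketch of this numerical step, the code below solves the textbook charge-sheet (bulk-referenced, holes neglected) implicit equation VGB - VFB - ψs = γ √(ψs + φt exp((ψs - 2φF - V)/φt)) for ψs, where V is the channel potential at the source or drain end. It uses a plain Newton iteration safeguarded by bisection, not the second-order Newton scheme of MOS Model 11. The function name and all parameter values are assumptions, and accumulation (VGB ≤ VFB) is not handled.

```python
import math

def surface_potential(v_gb, v_fb, gamma, phi_f, v_ch=0.0, phi_t=0.02585,
                      tol=1e-12, max_iter=100):
    """Solve v_gb - v_fb - psi = gamma*sqrt(psi + phi_t*exp((psi - 2*phi_f - v_ch)/phi_t)).

    Textbook charge-sheet form, depletion/inversion only (hypothetical helper).
    Safeguarded Newton: a Newton step is kept only if it stays inside the bracket.
    """
    def h(psi):
        x = psi + phi_t * math.exp((psi - 2.0 * phi_f - v_ch) / phi_t)
        return v_gb - v_fb - psi - gamma * math.sqrt(x)

    def dh(psi):
        e = math.exp((psi - 2.0 * phi_f - v_ch) / phi_t)
        return -1.0 - gamma * (1.0 + e) / (2.0 * math.sqrt(psi + phi_t * e))

    lo, hi = 0.0, v_gb - v_fb          # h(lo) > 0 > h(hi) brackets the unique root
    if hi <= 0.0 or h(lo) <= 0.0:
        raise ValueError("accumulation (v_gb <= v_fb) is not handled by this sketch")
    psi = 0.5 * (lo + hi)
    for _ in range(max_iter):
        f = h(psi)
        if abs(f) < tol:
            break
        if f > 0.0:                    # h decreases with psi, so the root lies above psi
            lo = psi
        else:
            hi = psi
        step = psi - f / dh(psi)
        psi = step if lo < step < hi else 0.5 * (lo + hi)   # fall back to bisection
    return psi

psi_s0 = surface_potential(v_gb=1.0, v_fb=-0.9, gamma=0.5, phi_f=0.4)            # source end
psi_sL = surface_potential(v_gb=1.0, v_fb=-0.9, gamma=0.5, phi_f=0.4, v_ch=0.5)  # drain end
```

Because the residual is strictly decreasing in ψs, the bracket always contains the single root. The Newton step provides fast convergence, and bisection guards against overshoot in the exponential strong-inversion region.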

The fundamental notion for the study of spatial statistics is that of a stochastic (random) process, defined as a collection of random variables on a set of temporal or spatial locations. Generally, a second-order stationary (wide-sense stationary, WSS) process model is employed, although stricter criteria of stationarity are possible. This model implies that the mean is constant and that the covariance depends only on the separation between any two points; in a second-order stationary process, only the first and second moments of the process remain invariant. The covariance and correlation functions capture how the co-dependence of random variables at different locations changes with the separation distance. These functions are unambiguously defined only for stationary processes. For example, the random process describing the behavior of the transistor length L is stationary only if there is no systematic spatial variation of the mean of L. If the process is not stationary, the correlation function is not a reliable measure of co-dependence and correlation. Once the systematic wafer-level and field-level dependencies are removed, thereby making the process stationary, the true correlation is found to be negligibly small. From a statistical modeling perspective, systematic variations affect all transistors in a given circuit equally. Thus, systematic parametric variations can be represented by a deviation in the parameter mean of every transistor in the circuit.
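The claim that apparent correlation is dominated by systematic dependencies can be checked on synthetic data. The sketch below assumes a hypothetical linear wafer-level gradient in L plus independent random variation. It measures the correlation between neighboring sites, then removes a least-squares linear trend (`np.polyfit`) and measures again. The data and the neighbor-pair metric are illustrative only.

```python
import numpy as np

def neighbor_correlation(values):
    """Pearson correlation between each site and its right-hand neighbor."""
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 2000))          # hypothetical site positions (normalized)
trend = 45.0 + 5.0 * x                             # hypothetical systematic gradient in L (nm)
L = trend + rng.normal(0.0, 0.1, x.size)           # plus independent random variation (nm)

raw_corr = neighbor_correlation(L)                 # inflated by the shared gradient
slope, intercept = np.polyfit(x, L, 1)             # estimate the systematic trend
residual = L - (slope * x + intercept)             # detrended, approximately stationary
detrended_corr = neighbor_correlation(residual)    # close to zero
```

With the gradient present, neighboring sites look almost perfectly correlated. After detrending, the residual correlation is at the level of sampling noise, mirroring the observation above.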

We model the manufactured value of each parameter $p \in \{p_1, \ldots, p_m\}$ of transistor $i$ as a random variable

$$p_i = \mu_{p,i} + \sigma_p(d_i)\, p(d_i, \theta) \tag{2.1}$$

where $\mu_{p,i}$ and $\sigma_p(d_i)$ are the mean value and standard deviation of the parameter $p_i$, respectively, $p(d_i, \theta)$ is the stochastic process corresponding to parameter $p$, $d_i$ denotes the location of transistor $i$ on the die with respect to a point of origin, and $\theta$ is the die on which the transistor lies. This reference point can be located, for example, in the lower-left corner of the die or at its center. A random process can be represented as a series expansion in uncorrelated random variables, involving a complete set of deterministic functions with corresponding random coefficients.

A commonly used series is the spectral expansion [10], in which the random coefficients are uncorrelated only if the random process is assumed stationary and its length is infinite or periodic. The Karhunen-Loève expansion [11] has generated interest because of its bi-orthogonal property: both the deterministic basis functions and the corresponding random coefficients are orthogonal [12], the orthogonal deterministic basis functions and their magnitudes being, respectively, the eigenfunctions and eigenvalues of the covariance function. Assuming that $p(d_i, \theta)$ is a zero-mean Gaussian process and using the Karhunen-Loève expansion, $p_i$ can be written in truncated form (for practical implementation) with a finite number of terms $M$ as

$$p_i = \mu_{p,i} + \sigma_p(d_i) \sum_{n=1}^{M} \sqrt{\lambda_{p,n}}\, \delta_{p,n}(\theta)\, f_{p,n}(d_i) \tag{2.2}$$

where $\{\delta_{p,n}(\theta)\}$ is a vector of zero-mean uncorrelated Gaussian random variables, and $f_{p,n}(d_i)$ and $\lambda_{p,n}$ are the eigenfunctions and eigenvalues of the covariance function $R_p(d_1, d_2)$ (Fig. 2.1) of $p(d_i, \theta)$. The covariance is controlled through a distance-based weight term, the measurement correction factor, the correlation parameter $\rho$, and the process correction factors $\gamma_x$ and $\gamma_y$.
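A discrete version of the truncated expansion in Eq. (2.2) can be sketched with NumPy by eigendecomposing the covariance matrix of a grid of transistor sites, with the eigenvectors standing in for the eigenfunctions sampled at the sites. The exponential correlation kernel, the constant mean and standard deviation, and the names `kl_basis` and `sample_die` are assumptions for illustration; the chapter's own covariance model is not reproduced here.

```python
import numpy as np

def kl_basis(points, corr_length):
    """Discrete Karhunen-Loeve basis: eigen-pairs of the correlation matrix of the sites.

    Assumed kernel: exp(-|d1 - d2| / corr_length), a stand-in for the chapter's model.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    R = np.exp(-dist / corr_length)
    lam, f = np.linalg.eigh(R)                  # ascending eigenvalues, orthonormal columns
    lam, f = lam[::-1], f[:, ::-1]              # dominant modes first
    return R, np.clip(lam, 0.0, None), f        # clip round-off below zero

def sample_die(mu, sigma, lam, f, M, rng):
    """One die: p_i = mu + sigma * sum_{n=1..M} sqrt(lam_n) * delta_n * f_n(d_i)."""
    delta = rng.standard_normal(M)              # zero-mean, uncorrelated Gaussian coefficients
    return mu + sigma * (f[:, :M] @ (np.sqrt(lam[:M]) * delta))

xs, ys = np.meshgrid(np.linspace(0.0, 1.0, 10), np.linspace(0.0, 1.0, 10))
sites = np.column_stack([xs.ravel(), ys.ravel()])        # hypothetical 10 x 10 grid of transistors
R, lam, f = kl_basis(sites, corr_length=0.5)
rng = np.random.default_rng(0)
vt_map = sample_die(0.35, 0.02, lam, f, M=20, rng=rng)   # hypothetical VT values (V) on one die
```

Keeping all eigen-pairs (M equal to the number of sites) reproduces the covariance matrix exactly. Truncating to the leading M modes keeps most of the variance with far fewer random variables.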

Without loss of generality, consider two transistors with given threshold voltages. In our approach, their threshold voltages are modeled as stochastic processes over the spatial domain of a die, which makes the parameters of any two transistors on the die two different, correlated random variables. The value of M is governed by the accuracy of the eigen-pairs in representing the covariance function rather than by the number of random variables. Unlike previous approaches, which model the covariance of process parameters due to the random effect as a piecewise linear model [13] or through modified Bessel functions of the second kind [14], here the covariance is represented as a linearly decreasing exponential function

where the weight term is distance-based, $\gamma$ is the measurement correction factor for the two transistors located at Euclidean coordinates $(x_1, y_1)$ and $(x_2, y_2)$, respectively, and $\gamma_x$ and $\gamma_y$ are process correction factors that depend on the process maturity. For instance, in Fig. 2.1a, a process correction factor $\gamma_{x,y} = 0.001$ corresponds to a very mature process, while $\gamma_{x,y} = 1$ indicates a process in its ramp-up phase. The correlation parameter $\rho$, which reflects the spatial scale of clustering and is defined in $[-\alpha, \alpha]$, regulates the decay rate of the correlation function with respect to the distance between the two transistors located at $d_1 = (x_1, y_1)$ and $d_2 = (x_2, y_2)$.

Physically, a lower $\alpha/\rho$ implies a highly correlated process; hence, fewer random variables are needed to represent the random process and, correspondingly, fewer terms are needed in the Karhunen-Loève expansion.
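The effect of correlation on truncation can be checked numerically. The sketch below assumes a hypothetical 1-D exponential correlation exp(-|x1 - x2|/ℓ) on a uniform grid, where ℓ is an illustrative correlation length and not the chapter's parameter ρ. It counts how many leading Karhunen-Loève terms are needed to capture 95% of the variance.

```python
import numpy as np

def terms_for_variance(corr_length, n_points=200, fraction=0.95):
    """Smallest M whose leading KL eigenvalues capture `fraction` of the total variance.

    Assumption: 1-D grid on [0, 1] with correlation exp(-|x1 - x2| / corr_length).
    """
    x = np.linspace(0.0, 1.0, n_points)
    R = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]          # eigenvalues, largest first
    captured = np.cumsum(lam) / lam.sum()               # cumulative variance fraction
    return int(np.searchsorted(captured, fraction) + 1)

for ell in (0.05, 0.2, 1.0):
    print(ell, terms_for_variance(ell))
```

Under this assumption, longer correlation lengths (a more highly correlated process) need markedly fewer terms, consistent with the physical argument in the paragraph above.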