# Analysis and Calibration for Wideband Times-2 Interleaved Current-Steering DACs

Daniel Beauchamp, *Member, IEEE*, and Keith M. Chugg, *Fellow, IEEE*

Abstract—This work presents analysis and calibration of interleaving and data timing errors that are encountered in modern times-2 interleaved digital-to-analog converters (DACs) with a current-steering (CS) architecture. Such errors corrupt the DAC output spectrum with spectral images that require calibration. We develop an analytical model for the interleaving and data timing errors that we understand are most significant and propose a calibration algorithm that treats all of them. Extensive simulations of the algorithm are made possible by leveraging the speed and accuracy of the analytical model. The algorithm is demonstrated on a commercially-developed 10-bit times-2 interleaved CS-DAC, operating at 40GS/s in 14nm CMOS.

Index Terms—DAC, SFDR, current-steering, time-interleaved, calibration, wideband.

## I. INTRODUCTION

Direct radio frequency (RF) sampling paves the way for several interesting applications, including 5G/6G cellular communication, electronic warfare, and automotive radar. Typically, these applications require data converters with sample rates in the mmWave regime. Designing such converters at the full rate is challenging, so the typical approach is to time interleave lower speed sub-converters. Without calibration, nonidealities in the sub-converters and mismatch between them corrupt the Nyquist band with undesired spectral content. This impedes the ability to use the full Nyquist band, which is critical for systems that rely on, for example, software-defined or cognitive radio.

The CS-DAC is regarded as the de facto solution for high-speed applications [1]. Using CS-DACs in time-interleaved architectures has been demonstrated in order to achieve even higher sample rates [2]–[6]. While various CS-DAC errors have been studied, e.g., current source mismatch [7], timing mismatch [8], and finite output impedance [9], there are two classes of significant errors where research on analysis and calibration is sparse. The first class is *interleaving errors*, which are caused by, for example, gain mismatch between the sub-DACs and nonidealities associated with the clocks that toggle between them (e.g., duty cycle errors and timing skew). The second class is *data timing errors*, which are caused by nonidealities associated with the clocks that are part of the input data serializer (e.g., duty cycle and phase errors). This work is focused on the analysis and calibration of interleaving and data timing errors.

Manuscript received 21 April 2022; revised 29 June 2022; accepted 20 July 2022. This work was supported in part by the National Science Foundation under Grant CCF-1763747 and Grant ECCS-1643004 and in part by Jariet Technologies Inc. This article was recommended by Associate Editor S. Gupta. (*Corresponding author: Daniel Beauchamp.*)

Daniel Beauchamp is with the Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA, and also with Jariet Technologies, Inc., Redondo Beach, CA 90277 USA (e-mail: dbeaucha@usc.edu).

Keith M. Chugg is with the Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2022.3193659.

Digital Object Identifier 10.1109/TCSI.2022.3193659

The discussion in this work is limited to times-2 interleaving. While research on time-interleaved digital-to-analog converters (TIDACs) with higher interleaving factors exists [6], it is not very common, since interleaving by 2 typically provides enough timing margin for the settling of the sub-DACs [3]. However, in modern times-2 interleaved DACs, incomplete settling is possible due to the reduced timing margin at higher sample rates. Hence, we consider finite settling in the analysis and, under this framework, show how data timing errors create spurs.

Interleaving errors have been discussed in prior works [3], [5], [10], [11]. For example, calibration algorithms for gain and clock duty cycle are presented in [3] and [5]. In [3], the DAC is excited by a single tone near Nyquist and the resulting interleaving spur is driven to zero via duty cycle control. In [5], simulated annealing is applied with a single tone at the desired calibration frequency. In this work, we show that calibration with single-tone inputs can result in narrowband solutions, i.e., where the calibration is effective near the calibration frequency but not over the full Nyquist band. This phenomenon, which we refer to as *narrowband locking*, is not observable when these errors are analyzed in isolation from each other, as in [3] and [10]; however, it becomes clearer when they are analyzed in a coupled framework, as in [12]. Using the coupled analysis, we prove that calibration with two tones leads to wideband solutions, i.e., those that are effective across the full Nyquist band. Moreover, we propose a two-tone calibration algorithm based on simulated annealing and show that it is effective even when additional nonidealities are present (e.g., clock timing skew and data timing errors).

We also develop an analytical model that accounts for the interleaving and data timing errors that we understand are most significant. The analytical model is then validated against behavioral simulations and proven to be extremely accurate with a run time that is four orders of magnitude faster. We then propose a calibration algorithm that treats all of the errors considered in this paper. Extensive simulations of the algorithm are made possible by leveraging the speed and accuracy of the analytical model. The algorithm is then demonstrated on a modern, commercially-developed transceiver chip that contains a times-2 interleaved DAC, operating at 40GS/s in 14nm CMOS. The transceiver also contains a high-speed analog-to-digital converter (ADC), an embedded MCU, and programmable control over various impairments, which is consistent with the trend toward system-on-chip (SoC) implementations. We show how these can be leveraged as calibration support for the DAC, which is more practical than previous methods that rely on off-chip measurements of the DAC output with a spectrum analyzer [5] and manual tuning [10].

Fig. 1. *M*-bit times-2 interleaved CS-DAC with a sample rate of $f_s$.

1549-8328 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

The rest of the paper is organized as follows. Section II provides background on times-2 interleaved CS-DACs. Section III provides a summary of the significant interleaving and data timing errors. Section IV focuses on the coupled analysis of gain and duty cycle errors. Section V focuses on the analysis of data timing errors. Section VI presents an analytical model that accounts for all of the errors considered in this paper. Section VII presents the proposed calibration algorithm, along with simulations and experimental results. Finally, we conclude the paper in Section VIII.

## II. BACKGROUND

A block diagram of an *M*-bit times-2 interleaved DAC with a sample rate of  $f_s = 1/T_s$  and input sequence x[k] is illustrated in Fig. 1, where  $T_s$  is the sample period. A half-rate clock (C2 clock<sup>1</sup>) is passed to a serializer and analog multiplexer (AMUX). The serializer multiplexes low-speed, parallel data lanes into high-speed even and odd lanes at  $f_s/2$ , and the AMUX is used to toggle the sub-DACs between the output node and a dump node. A common serializer with *K* inputs per bit slice<sup>2</sup> is shown in Fig. 2(a), and the nominal timing of clock and data signals down to the C4 level is shown in Fig. 2(b). Note that the C4 clocks (i.e., C4I, C4Q) are offset by 90°, or delayed by  $T_s$  relative to each other. The high-speed data lanes, D2E and D2O, are also delayed by  $T_s$  relative to each other since they are aligned with C4I and C4Q, respectively.

<sup>1</sup>We use the notation CN and DN to denote clock and data signals at $f_s/N$, respectively.

<sup>2</sup>Examples with K = 4 and K = 8 are found in [2] and [4], respectively.

Fig. 2. (a) Serializer block diagram. (b) Clock and data timing down to the C4 level.

A circuit schematic of a TIDAC with a CS architecture is shown in Fig. 3(a). Note that the AMUX uses opposite phases of the C2 clock to toggle the sub-DACs between the output node ($V_p$, $V_n$) and the dump node, $V_{dump}$. Moreover, current is drawn from these nodes via current cells that are composed of binary-weighted current sources and complementary data switches. Fig. 3(b) shows the ideal timing of the C2 clock and sub-DAC data, where there is a timing margin of $T_s/2$ to avoid clock and data edge overlap. Ideally, the contribution of each sub-DAC to the output node is

$$y_{\text{even}}(t) = \left(x(t) \cdot \sum_{k=-\infty}^{\infty} \delta(t - 2k T_s)\right) * h_{\text{even}}(t) \quad (1a)$$

$$y_{\text{odd}}(t) = \left(x(t) \cdot \sum_{k=-\infty}^{\infty} \delta(t - (2k+1)T_s)\right) * h_{\text{odd}}(t) \quad (1b)$$

where x(t) is the analog target signal,  $h_{\text{even}}(t) = h_{\text{odd}}(t) = \text{rect}(t/T_s)$  is the hold pulse, rect(t) = 1 if  $|t| \le 1/2$  and 0 if |t| > 1/2, and \* denotes convolution. The ideal TIDAC output spectrum is derived by summing the Fourier transforms of (1a) and (1b), which is

$$\begin{aligned}
Y_{\text{ideal}}(f) &= \frac{1}{2} \operatorname{sinc}(fT_s) \sum_{k=-\infty}^{\infty} X(f - kf_s/2) + \frac{1}{2} \operatorname{sinc}(fT_s) \sum_{k=-\infty}^{\infty} X(f - kf_s/2) (-1)^k && (2a) \\
&= \operatorname{sinc}(fT_s) \sum_{k=-\infty}^{\infty} X(f - kf_s) && (2b)
\end{aligned}$$

TABLE I. INTERLEAVING AND DATA TIMING ERRORS IN TIMES-2 INTERLEAVED CS-DACS

| Nonideality                | Spectral Image Locations | Spur Locations (1st Nyquist Zone)                       | Analysis          | Parameter                      |
|----------------------------|--------------------------|---------------------------------------------------------|-------------------|--------------------------------|
| Gain error                 | $(2k+1) f_s/2$           | $f_s/2 - f_0$                                           | Section IV        | $\epsilon_g$                   |
| C2 clock duty cycle error  | $(2k+1) f_s/2$           | $f_s/2 - f_0$                                           | Section IV        | $\alpha$                       |
| C2 clock skew              | $(2k+1) f_s/2$           | $f_s/2 - f_0$                                           | Presented in [10] | $\Delta \theta_{\rm skew}$     |
| C4 clock duty cycle errors | $(4k+1) f_s/4$           | $f_s/4 - f_0, f_s/4 + f_0$ if $f_0 \in (0, f_s/4)$      | Section V         | $\beta_I, \beta_Q$             |
| C4 clock duty cycle errors | $(4k+3) f_s/4$           | $3f_s/4 - f_0, f_0 - f_s/4$ if $f_0 \in (f_s/4, f_s/2)$ | Section V         | $\beta_I, \beta_Q$             |
| C4 clock duty cycle errors | $(2k+1) f_s/2$           | $f_s/2 - f_0$                                           | Section V         | $\beta_I, \beta_Q$             |
| C4 clock phase errors      | $(2k+1) f_s/2$           | $f_s/2 - f_0$                                           | Section V         | $\Delta \phi_I, \Delta \phi_Q$ |



Fig. 3. (a) Circuit schematic of a times-2 interleaved CS-DAC. (b) Ideal timing of the C2 clock and a bit slice of each sub-DAC.

where $\operatorname{sinc}(f) := \sin(\pi f)/(\pi f)$. Note from (2a) that spectral images at odd multiples of $f_s/2$ cancel each other out, resulting in a spectrum identical to that of a single ideal DAC with a sample rate of $f_s$.
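The cancellation in (2a) and (2b) can be checked with a small discrete-time sketch. The oversampling factor `OSR`, the test samples, and the helper `sub_dac` below are hypothetical choices made for illustration. The hold intervals are taken as half-open and are not centered on the sampling instants, which only shifts the output in time.

```python
# Minimal sketch of ideal times-2 interleaving (assumed values, not from the paper).
OSR = 8  # assumed number of simulation points per sample period T_s
x = [0.3, -0.7, 1.0, 0.2, -0.4, 0.9, -1.0, 0.5]  # arbitrary test samples x[k]
n_total = len(x) * OSR


def sub_dac(samples, parity):
    """Hold samples of one parity (0 = even, 1 = odd) for T_s; contribute 0 otherwise."""
    y = [0.0] * n_total
    for k, value in enumerate(samples):
        if k % 2 == parity:
            for n in range(k * OSR, (k + 1) * OSR):  # half-open hold interval
                y[n] = value
    return y


y_even = sub_dac(x, 0)
y_odd = sub_dac(x, 1)
y_tidac = [a + b for a, b in zip(y_even, y_odd)]    # sum at the output node
y_single = [x[n // OSR] for n in range(n_total)]    # single DAC holding at f_s

# Frequency-domain view of (2a): the image at k*f_s/2 is weighted by (1 + (-1)^k)/2.
weights = {k: 0.5 * (1 + (1 if k % 2 == 0 else -1)) for k in range(-3, 4)}

print(y_tidac == y_single)
print(weights)
```

Under these assumptions, the summed sub-DAC outputs coincide with a single zero-order hold at $f_s$, and the image weights at odd multiples of $f_s/2$ are zero.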

## III. OVERVIEW OF ERRORS

The perfect cancellation of undesired spectral images only occurs for an ideal TIDAC. In practice, there are various errors that cause imperfect image cancellation. The errors that we consider are summarized in Table I and depicted in Figs. 4 and 5. From our experience with times-2 interleaved CS-DACs, these are the most significant errors associated with interleaving and data timing. Under this framework, we show that the TIDAC output spectrum is, in general, described by

$$\begin{aligned}
Y(f) = {} & K_0(f) \sum_{k=-\infty}^{\infty} X(f - kf_s) + K_{f_s/2}(f) \sum_{k=-\infty}^{\infty} X(f - (2k+1)f_s/2) \\
& + K_{f_s/4}(f) \sum_{k=-\infty}^{\infty} X(f - (4k+1)f_s/4) + K_{3f_s/4}(f) \sum_{k=-\infty}^{\infty} X(f - (4k+3)f_s/4)
\end{aligned} \quad (3)$$

Ideally, $K_0(f) = \operatorname{sinc}(fT_s)$ and the rest of the coefficients are zero, as shown in (2b). With errors present, the coefficients $K_{f_s/2}(f)$, $K_{f_s/4}(f)$ and $K_{3f_s/4}(f)$ may be nonzero, which results in spurs when the DAC is excited by a tone. This can degrade performance metrics, such as spurious-free dynamic range (SFDR), thus motivating calibration. We collectively refer to the errors in Fig. 4 as interleaving errors. As mentioned, these include gain mismatch between the sub-DACs and nonidealities on the clock that toggles between them (i.e., the C2 clock). Figs. 4(a) and (b) illustrate how gain and C2 clock duty cycle errors distort the sub-DAC hold pulses. In general, this creates images at odd multiples of $f_s/2$ (i.e., $K_{f_s/2}(f) \neq 0$). While the analysis of these errors has been studied in isolation [3], [10], their coupled analysis has received little attention aside from the appendix in [12]. In Section IV, we show how the coupled analysis uncovers critical information that helps design a robust calibration algorithm. Another error that results in $K_{f_s/2}(f) \neq 0$ is C2 clock skew, i.e., where the C2 clock phases do not have a perfect 180° offset, as shown in Fig. 4(c). The analysis of C2 clock skew is presented in [10], so it is excluded here. However, we do include this effect in our analytical model (Section VI) and in the simulations of our calibration algorithm (Section VII).

Fig. 4. Interleaving errors. (a) Gain error. (b) C2 clock duty cycle error. (c) C2 clock skew.

Fig. 5. Data timing errors. (a) C4 clock duty cycle errors. (b) C4 clock phase errors.

We collectively refer to the errors in Fig. 5 as data timing errors. These include nonidealities associated with the serializer clocks, i.e., C4I and C4Q in Fig. 2(b), as these clocks are aligned with the sub-DAC data. The serializer in Fig. 2(a) may also rely on lower rate clocks (C8, C16, etc.), but in contrast to the C4 clocks, these lower rate clocks are not aligned with the high-speed data lanes, which makes their nonidealities less critical. The analysis in Section V shows how C4 clock duty cycle and phase errors create spurs when finite settling of the sub-DACs is considered. In general, such errors can result in $K_{f_s/2}(f) \neq 0$, $K_{f_s/4}(f) \neq 0$, and $K_{3f_s/4}(f) \neq 0$.

The AMUX that toggles between the sub-DACs is an important part of the TIDAC, so its nonidealities are worth discussing. Such nonidealities can result in either interleaving spurs or nonlinearity. Interleaving spurs are caused by dynamic mismatch in the AMUX [3], e.g., C2 clock duty cycle error and C2 clock skew. As mentioned, we include these effects in the analysis and calibration since they are within the scope of this paper. To isolate the interleaving effects (as in [10]), we assume that the toggling between the sub-DACs is a linear operation, which allows the TIDAC output to be modeled as the sum of the individual sub-DAC contributions. The linearity of the AMUX is determined by the circuit design and, specifically, the choice of device for the switches outlined in Fig. 3(a). While circuit design is beyond the scope of this paper, it is worth mentioning that the choice of these devices has been discussed in detail [3], [10], particularly regarding linearity tradeoffs with bandwidth and output swing.

Clock jitter is another effect known to distort the DAC output spectrum [13]. Modern phase-locked loops (PLLs) are capable of synthesizing clocks (i.e., the C2 clock) with very low integrated jitter (e.g., in the tens of femtoseconds-rms range [14]), and fine tuning of the PLL loop parameters (charge pump current, loop filter resistor, etc.) can reduce jitter even further [15]. At these low levels, clock jitter effects are not significant in terms of spectral imperfections (cf. [16, eq. (9.16)]) and are therefore not considered further.

To simplify the presentation, we analyze the effect of various errors in isolation and then present an analytical model that includes all of them. Specifically, the analysis of errors in this work is organized as follows. Section IV focuses only on the coupled analysis of gain and C2 clock duty cycle errors, i.e., with all other nonidealities equal to zero. Similarly, Section V focuses only on data timing errors. Finally, we develop an analytical model in Section VI that accounts for all of the errors in Table I.

## IV. GAIN AND C2 CLOCK DUTY CYCLE ERRORS

Gain and C2 clock duty cycle errors are discussed in [3] and [10] and SFDR expressions are derived for each in isolation of all other parameters. The work in [17] extends the analysis of gain errors to general times-N interleaved DACs, also in an isolated environment. In this section, we analyze gain and duty cycle errors in a coupled framework, which is motivated by the fact that they can cancel each other out, as mentioned in the appendix of [12]. We show that this can cause narrowband locking, i.e., where the calibration locks onto a single frequency solution with non-ideal parameters. Moreover, we propose a calibration signal that is immune to this effect.

### A. Coupled Analysis

In this section, we only consider errors in gain and C2 clock duty cycle. Specifically, we analyze model (1) with hold pulses defined as

$$h_{\text{even}}(t) = \operatorname{rect}\left(\frac{t}{T_s(1+2\alpha)}\right) \quad (4a)$$

$$h_{\text{odd}}(t) = (1 + \epsilon_g) \operatorname{rect}\left(\frac{t}{T_s(1 - 2\alpha)}\right) \quad (4b)$$

where  $\epsilon_g$  and  $\alpha$  are the gain and duty cycle errors, respectively. Under this framework, it can be shown that the output spectrum is of the form in (3) with coefficients

$$K_0(f) = \frac{f_s}{2} \left( H_{\text{even}}(f) + H_{\text{odd}}(f) \right) \quad (5a)$$

$$K_{f_s/2}(f) = \frac{f_s}{2} \left( H_{\text{even}}(f) - H_{\text{odd}}(f) \right) \quad (5b)$$

where

$$H_{\text{even}}(f) = T_s(1+2\alpha)\operatorname{sinc}(fT_s(1+2\alpha)) \quad (6a)$$

$$H_{\text{odd}}(f) = (1 + \epsilon_g) T_s (1 - 2\alpha) \operatorname{sinc}(f T_s (1 - 2\alpha)) \quad (6b)$$

In general, since  $K_{f_s/2}(f) \neq 0$ , we observe from (3) that the output spectrum contains images at odd multiples of  $f_s/2$ . Hence, if the DAC is excited by a tone at  $f_0$ , then the image at  $f_s/2$  creates an *interleaving spur* at  $f_s/2 - f_0$ , and the resulting SFDR is

$$\text{SFDR} = 20 \log_{10} \left| \frac{K_0(f_0)}{K_{f_s/2}(f_s/2 - f_0)} \right| \quad (7)$$

By inspection of (5b) and (6), we have $K_{f_s/2}(f) \equiv 0$ if $\epsilon_g = \alpha = 0$, i.e., the images vanish for *all* frequencies when both gain and C2 clock duty cycle errors are zero. However, if either $\alpha \neq 0$ or $\epsilon_g \neq 0$, then this is no longer the case. For example, suppose the DAC is excited by a tone at $f_0$, creating an interleaving spur at $f_s/2 - f_0$. This spur is zero when $K_{f_s/2}(f_s/2 - f_0) = 0$, and solving this amounts to writing $\epsilon_g$ in terms of $\alpha$ as

$$\begin{aligned}
\epsilon_g(\alpha; \nu_0) &= \frac{(1+2\alpha)\operatorname{sinc}\left((1/2-\nu_0)\left(1+2\alpha\right)\right)}{(1-2\alpha)\operatorname{sinc}\left((1/2-\nu_0)\left(1-2\alpha\right)\right)} - 1 && (8) \\
&\approx \frac{4\pi\alpha(\frac{1}{2}-\nu_0)}{\tan\left(\pi(\frac{1}{2}-\nu_0)\right)} && (9)
\end{aligned}$$

where<sup>3</sup> $\nu_0 = f_0/f_s$, and a small $\alpha$ approximation has been used. In Fig. 6, we plot (9) for various frequencies $\nu_0 \in [0, 0.5]$. Note that each frequency admits a distinct family of solutions $(\alpha, \epsilon_g(\alpha; \nu_0))$ that satisfies $K_{f_s/2}(f_s/2 - f_0) = 0$, i.e., no interleaving spur. Furthermore, the solutions are linear near the origin with slopes as indicated in Fig. 6. The magnitude of the steepest slope is 4, which is found by taking the limit of (9) as $\nu_0$ approaches 0.5.

<sup>3</sup>When convenient, we use normalized frequency $\nu = f/f_s$.
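As a numerical illustration of (8) and (9), the following sketch evaluates the exact spur-cancelling gain error and its small-$\alpha$ approximation. The function names and the value $\alpha = 0.005$ are assumptions chosen for the example.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), as defined after (2b)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def eps_g_exact(alpha, nu0):
    """Gain error that nulls the interleaving spur, per (8)."""
    u = 0.5 - nu0
    num = (1 + 2 * alpha) * sinc(u * (1 + 2 * alpha))
    den = (1 - 2 * alpha) * sinc(u * (1 - 2 * alpha))
    return num / den - 1


def eps_g_approx(alpha, nu0):
    """Small-alpha approximation, per (9)."""
    u = 0.5 - nu0
    if u == 0:
        return 4 * alpha  # limiting value as nu0 -> 0.5
    return 4 * math.pi * alpha * u / math.tan(math.pi * u)


alpha = 0.005  # assumed duty cycle error for illustration
for nu0 in (0.1, 0.25, 0.4, 0.4999):
    exact = eps_g_exact(alpha, nu0)
    approx = eps_g_approx(alpha, nu0)
    print(f"nu0={nu0}: exact={exact:.6f}, approx={approx:.6f}, slope={approx / alpha:.4f}")

slope_near_nyquist = eps_g_approx(alpha, 0.4999) / alpha
```

The printed slope $\epsilon_g/\alpha$ grows with $\nu_0$ and approaches 4 near Nyquist, in line with the limit stated above.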



Fig. 6. Solutions  $(\alpha, \epsilon_g)$  that result in no interleaving spur.



Fig. 7. Calibration by gain error cancellation ( $\epsilon_g = -0.018$ ).

### B. Calibration by Gain Error Cancellation

The cancellation of the interleaving spur with nonzero parameters is important to consider for calibration. For example, for the calibration approach in [3], the TIDAC is excited by a single tone at $f_{cal} = f_s/2$ so that the interleaving spur appears at DC $(f_s/2 - f_{cal})$. A low-speed auxiliary ADC is then used in tandem with on-chip duty cycle control to drive this spur to zero, cancelling out the gain error. We refer to this calibration approach as gain error cancellation. Fig. 7 illustrates the SFDR (from (7)) over the Nyquist band after applying gain error cancellation with different calibration frequencies. Note that, in all cases, the SFDR monotonically decreases away from the calibration frequency. This is due to narrowband locking, since the calibration algorithm locks onto a single frequency solution, $(\alpha, \epsilon_g) \neq (0, 0)$. We say that gain error cancellation exhibits deterministic narrowband locking, since such locking occurs every time the algorithm is run for $\epsilon_g \neq 0$. In Section VII, we explore another single-tone calibration algorithm that exhibits random narrowband locking, since it occurs randomly as opposed to every time. Narrowband locking is problematic for wideband systems where the calibration should hold over the full Nyquist band. In Appendix A, we prove that calibration with two tones promotes convergence to the wideband solution, $(\alpha, \epsilon_g) = (0, 0)$. Furthermore, an algorithm that uses two tones as the input in calibration mode is proposed in Section VII.
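Narrowband locking can be reproduced with a short sketch built on (5) to (7). The bisection search below is a stand-in chosen for this example and is not the hardware loop of [3]. The search bracket and the calibration frequency $\nu_{cal} = 0.25$ are assumptions. Frequencies are normalized with $T_s = 1$.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def hold_spectra(nu, alpha, eps_g):
    """H_even and H_odd from (6), with T_s = 1 so that f = nu."""
    h_even = (1 + 2 * alpha) * sinc(nu * (1 + 2 * alpha))
    h_odd = (1 + eps_g) * (1 - 2 * alpha) * sinc(nu * (1 - 2 * alpha))
    return h_even, h_odd


def spur_coeff(nu, alpha, eps_g):
    """K_{fs/2} from (5b), evaluated at normalized frequency nu."""
    h_even, h_odd = hold_spectra(nu, alpha, eps_g)
    return 0.5 * (h_even - h_odd)


def sfdr_db(nu0, alpha, eps_g):
    """SFDR from (7) for a tone at nu0; the spur sits at 1/2 - nu0."""
    h_even, h_odd = hold_spectra(nu0, alpha, eps_g)
    k0 = 0.5 * (h_even + h_odd)
    k_half = spur_coeff(0.5 - nu0, alpha, eps_g)
    return math.inf if k_half == 0 else 20 * math.log10(abs(k0 / k_half))


def cancel_gain_error(eps_g, nu_cal, lo=-0.1, hi=0.1, iters=80):
    """Bisection for the duty cycle error alpha that nulls the spur at nu_cal (assumed bracket)."""
    f_lo = spur_coeff(0.5 - nu_cal, lo, eps_g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = spur_coeff(0.5 - nu_cal, mid, eps_g)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eps_g = -0.018   # gain error used in Fig. 7
nu_cal = 0.25    # assumed calibration frequency
alpha_locked = cancel_gain_error(eps_g, nu_cal)
print(f"locked alpha = {alpha_locked:.5f}")
for nu0 in (0.05, 0.15, 0.24, 0.26, 0.35, 0.45):
    print(f"nu0={nu0}: SFDR = {sfdr_db(nu0, alpha_locked, eps_g):.1f} dB")
```

In this sketch the search settles on a nonzero $\alpha$, and the SFDR is highest next to $\nu_{cal}$ and falls off toward DC and Nyquist.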

## V. DATA TIMING ERRORS

The sub-DAC data timing is aligned with the C4 clocks, as illustrated in Fig. 2(b). Therefore, duty cycle and phase errors on these clocks affect the data timing. Again, we refer to these as data timing errors and show that they can create spurs when finite settling is considered. In this section, to isolate the effect of data timing errors, we assume that there are no interleaving effects, i.e., we assume an ideal C2 clock with no sub-DAC gain error. For clarity of the exposition, we explain the key concepts using the even sub-DAC and leave most of the detailed derivations to Appendix B.

Fig. 8. (a) Single bit slice of the even sub-DAC. (b) C2 clock. (c) Output of the even sub-DAC. (d) Settling error for the even sub-DAC, $e_{even}(t)$.

### A. Finite Settling

First, we focus only on finite settling in the absence of all other nonidealities. In Fig. 8(a), we depict a scenario where the even sub-DAC undergoes a data transition, focusing on a single bit slice.<sup>4</sup> The shaded regions define the times during which the sub-DAC is connected to the dump node. Note that there is a timing margin of  $T_s/2$  for the sub-DAC to settle to its desired value on this node. After this settling period, the C2 clock, shown in Fig. 8(b), routes the sub-DAC to the output node. Ideally, the sub-DAC output changes instantaneously at each switching instant, as shown by the green waveform in Fig. 8(c). In practice, the sub-DAC current sources change gradually according to a time constant,  $\tau$ , determined by the load [18]. Under this framework, the settling error<sup>5</sup> at the output node is

$$e_{\text{even}}(t) = \left(\Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - 2k T_s)\right) * p(t) \qquad (10)$$

where

$$\Delta x(t) = x(t - 2T_s) - x(t) \quad (11)$$

$$p(t) = e^{-(t+T_s)/\tau} h(t) \quad (12)$$

with hold pulse $h(t) = \operatorname{rect}(t/T_s)$. We refer to (12) as the *settling pulse*, i.e., the hold pulse weighted by a decaying exponential. An expression similar to (10) may be derived for the odd sub-DAC, $e_{odd}(t)$, and the total settling error at the output node is then $e(t) = e_{even}(t) + e_{odd}(t)$. By taking the Fourier transform of e(t), it can be shown that this results in an output spectrum (3) where all coefficients except for $K_0(f)$ are zero. Hence, this does not result in any in-band spurs for sinusoidal inputs. However, it does result in a mild output power reduction, e.g., a 1dB reduction (excluding the sinc attenuation) for $f_0 = 0.26 f_s$ and $\tau/T_s = 0.3$.

<sup>4</sup>We assume that there is no timing skew between the bit slices, which means that they all fire simultaneously. Calibration for this impairment is covered in [8].

<sup>5</sup>In the analysis, we assume that the settling errors are negligibly small just prior to each data transition, i.e., $e^{-2T_s/\tau} \approx 0$, which implies $\tau \ll 2T_s$.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS

Fig. 9. Data timing errors for the even sub-DAC. (a) C4 clock duty cycle error. (b) C4 clock phase error.
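The quoted power reduction can be checked numerically. The sketch below assumes the $K_0(f)$ expression in (13a) with all data timing errors set to zero, so that $c_+(0,0) = 2$ for both sub-DACs, together with $M(f)$ from (15) and $P(f)$ from (16). The function names and the normalization $T_s = 1$ are choices made for this example.

```python
import cmath
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def m_term(f, ts):
    """M(f) from (15)."""
    return 2 * math.sin(2 * math.pi * f * ts) * cmath.exp(-1j * (2 * math.pi * f * ts + math.pi / 2))


def p_term(f, ts, tau):
    """P(f) from (16): Fourier transform of the settling pulse in (12)."""
    s = 1 / tau + 2j * math.pi * f
    return 2 * math.exp(-ts / tau) / s * cmath.sinh(0.5 * ts * s)


def k0_finite_settling(f, ts, tau):
    """K_0(f) from (13a) with zero timing errors: (f_s/4) * (2 + 2) = f_s."""
    fs = 1 / ts
    return sinc(f / fs) + fs * m_term(f, ts) * p_term(f, ts, tau)


ts = 1.0          # normalized sample period (assumption)
tau = 0.3 * ts    # tau / T_s = 0.3
f0 = 0.26 / ts    # f_0 = 0.26 f_s

k0 = k0_finite_settling(f0, ts, tau)
reduction_db = 20 * math.log10(abs(k0) / sinc(f0 * ts))  # sinc attenuation excluded
print(f"power change due to finite settling: {reduction_db:.2f} dB")
```

Under these assumptions the result is close to the 1dB reduction stated in the text.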

### B. Finite Settling With Data Timing Errors

While finite settling alone does not create spurs, it does when combined with data timing errors (as we prove in Appendix B). For example, referring to Fig. 9(a), C4 clock duty cycle errors $\beta_I$ cause samples x[4k] to fire early by $2\beta_I T_s$ and samples x[4k+2] to fire late by the same quantity. Similarly, referring to Fig. 9(b), C4 clock phase errors $\Delta\phi_I$ delay samples x[2k] by $\frac{2}{\pi}\Delta\phi_I T_s$. We also assign these errors to the odd sub-DAC, i.e., $\beta_Q$ and $\Delta\phi_Q$ for C4Q in Fig. 2(b). In Appendix B, we show that duty cycle errors $\beta_I$, $\beta_Q$ and phase errors $\Delta\phi_I$, $\Delta\phi_Q$ result in an output spectrum (3) with the following coefficients

$$K_{0}(f) = \operatorname{sinc}(f/f_{s}) + \frac{f_{s}}{4} \left( c_{+}(\beta_{I}, \Delta \phi_{I}) + c_{+}(\beta_{Q}, \Delta \phi_{Q}) \right) M(f) P(f) \quad (13a)$$

$$K_{f_s/2}(f) = \frac{f_s}{4} \left( c_+(\beta_I, \Delta \phi_I) - c_+(\beta_Q, \Delta \phi_Q) \right) M(f) P(f) \quad (13b)$$

$$K_{f_s/4}(f) = \frac{f_s}{4} \left( -c_-(\beta_I, \Delta \phi_I) + j c_-(\beta_Q, \Delta \phi_Q) \right) M(f + f_s/4) P(f) \quad (13c)$$

$$K_{3f_s/4}(f) = \frac{f_s}{4} \left( -c_-(\beta_I, \Delta \phi_I) - j c_-(\beta_Q, \Delta \phi_Q) \right) M(f + f_s/4) P(f) \quad (13d)$$

where

$$c_{+}(u,v) = 2\cosh\left(\frac{2T_{s}}{\tau}u\right)e^{\frac{2T_{s}}{\pi\tau}v} \quad (14a)$$

$$c_{-}(u,v) = 2\sinh\left(\frac{2T_s}{\tau}u\right)e^{\frac{2T_s}{\pi\tau}v} \quad (14b)$$

and

$$M(f) = 2\sin(2\pi f T_s) e^{-j(2\pi f T_s + \pi/2)} \quad (15)$$

$$P(f) = \frac{2 e^{-\frac{T_s}{\tau}}}{1/\tau + j2\pi f} \sinh\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)\right) \quad (16)$$

Note that the output spectrum contains images at odd multiples of  $f_s/2$  when  $c_+(\beta_I, \Delta \phi_I) \neq c_+(\beta_Q, \Delta \phi_Q)$ , as implied by (13b). This occurs when  $\beta_I \neq \beta_Q$  (duty cycle mismatch) and/or  $\Delta \phi_I \neq \Delta \phi_Q$  (I/Q imbalance). Also note that there are images at odd multiples of  $f_s/4$ , as implied by (13c) and (13d). These are nonzero when  $\beta_I \neq 0$  and/or  $\beta_Q \neq 0$ , i.e., duty cycle error on one or both C4 clocks.

These undesired spectral images create spurs when the DAC is excited by a tone at $f_0$, as summarized by the last two rows in Table I. The coefficients in (13) can be used to compute the power ratio of these spurs relative to an input tone at $f_0$. For example, if $f_0 \in (0, f_s/4)$, then $I_{f_s/4-f_0}(f_0) = 20 \log_{10} \left| \frac{K_{f_s/4}(f_s/4-f_0)}{K_0(f_0)} \right|$ for the spur at $f_s/4 - f_0$, and similar quantities may be derived for the other spurs. In Fig. 10(a), we compare $I_{f_s/4-f_0}(f_0)$ with that obtained from a behavioral simulation of a 10-bit TIDAC with only C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$). Note that the simulation and theory are closely matched over frequency with different $\tau/T_s$ ratios, except for the extreme cases. Specifically, when $\tau/T_s = 0.05$ (very small), the spur power is below the quantization noise floor, and quantization effects are ignored in the analysis since they do not provide additional insight. When $\tau/T_s = 1.1$ (very large), the assumption in the analysis that $\tau \ll 2T_s$ is less accurate, resulting in an approximation error of 2.1dB. In Fig. 10(b), we observe qualitatively similar results for the $f_s/2 - f_0$ spur where only I/Q imbalance is considered ($\Delta \phi_I \neq \Delta \phi_Q$). Fig. 10(c) illustrates the output PSD from a behavioral simulation that includes both C4 clock duty cycle errors and I/Q imbalance.

Lastly, there are two observations from Figs. 10(a) and (b) that are worth mentioning. First, the spur power increases with $\tau/T_s$; intuitively, this is because the magnitude of the settling errors also increases with $\tau/T_s$. Second, the spur power peaks near $f_s/4$ and is smallest near DC and Nyquist, i.e., the spurs are shaped by $\sin(2\pi\nu)$, and it can be shown that this is caused by the sine term in (15).
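Both observations can be explored with a sketch that evaluates $I_{f_s/4-f_0}(f_0)$ directly from (13) to (16). The function names, the normalization $T_s = 1$, and the sampled values of $\tau/T_s$ and $\nu_0$ are assumptions made for illustration. Quantization is not modeled.

```python
import cmath
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def c_plus(u, v, ts, tau):
    """c_+(u, v) from (14a)."""
    return 2 * math.cosh(2 * ts / tau * u) * math.exp(2 * ts / (math.pi * tau) * v)


def c_minus(u, v, ts, tau):
    """c_-(u, v) from (14b)."""
    return 2 * math.sinh(2 * ts / tau * u) * math.exp(2 * ts / (math.pi * tau) * v)


def m_term(f, ts):
    """M(f) from (15)."""
    return 2 * math.sin(2 * math.pi * f * ts) * cmath.exp(-1j * (2 * math.pi * f * ts + math.pi / 2))


def p_term(f, ts, tau):
    """P(f) from (16)."""
    s = 1 / tau + 2j * math.pi * f
    return 2 * math.exp(-ts / tau) / s * cmath.sinh(0.5 * ts * s)


def quarter_spur_db(nu0, tau_over_ts, beta_i=0.02, beta_q=0.02, dphi_i=0.0, dphi_q=0.0):
    """Spur at f_s/4 - f_0 relative to the tone at f_0, for f_0 in (0, f_s/4)."""
    ts, fs = 1.0, 1.0
    tau = tau_over_ts * ts
    f0 = nu0 * fs
    f_spur = fs / 4 - f0
    cm_i = c_minus(beta_i, dphi_i, ts, tau)
    cm_q = c_minus(beta_q, dphi_q, ts, tau)
    cp_sum = c_plus(beta_i, dphi_i, ts, tau) + c_plus(beta_q, dphi_q, ts, tau)
    k_quarter = fs / 4 * (-cm_i + 1j * cm_q) * m_term(f_spur + fs / 4, ts) * p_term(f_spur, ts, tau)  # (13c)
    k0 = sinc(f0 / fs) + fs / 4 * cp_sum * m_term(f0, ts) * p_term(f0, ts, tau)                       # (13a)
    return 20 * math.log10(abs(k_quarter) / abs(k0))


for ratio in (0.1, 0.3, 0.5):
    levels = ", ".join(f"{quarter_spur_db(nu0, ratio):.1f}" for nu0 in (0.02, 0.1, 0.2))
    print(f"tau/Ts={ratio}: spur (dBc) at nu0=0.02, 0.1, 0.2 -> {levels}")
```

With $\beta_I = \beta_Q = 0.02$, the computed spur level rises with $\tau/T_s$ and with $\nu_0$ as the tone moves from DC toward $f_s/4$, which is the trend described above.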

## VI. ANALYTICAL MODEL

In this section, we develop an analytical model that captures all of the nonidealities in Table I. We first state the model and then highlight its speed and accuracy by comparing it to Matlab-based behavioral simulations.

### A. Signal Flow Diagram and Output Spectrum

Fig. 10. Behavioral simulations vs. theory. (a) C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$). (b) C4 clock phase errors ($\Delta \phi_I = -\pi/180$, $\Delta \phi_Q = \pi/180$). (c) Output PSD from a simulation that includes both C4 clock duty cycle errors ($\beta_I = \beta_Q = 0.02$) and I/Q imbalance ($\Delta \phi_I = -\pi/180$, $\Delta \phi_Q = \pi/180$), $\tau/T_s = 0.2$.

Fig. 11. Signal flow diagram of the analytical model for interleaving and data timing errors.

The signal flow diagram in Fig. 11 captures all of the nonidealities in Table I. As developed in Section IV, gain and C2 clock duty cycle errors ($\epsilon_g$ and $\alpha$) are accounted for in the hold pulses that are defined in (4). In addition, we have included a parameter that accounts for C2 clock skew, $\Delta t_{skew} = \frac{\Delta \theta_{skew}}{\pi} T_s$. The details of the analysis of data timing errors are contained in Appendix B and summarized in the lower portion of Fig. 11, i.e., they are embedded in the constants $c_0, c_1, c_2, c_3$ (referring to (29) and (31)). Moreover, the settling pulses are defined as

$$p_{\text{even}}(t) = e^{-(t+T_s)/\tau} \operatorname{rect}\left(\frac{t}{T_s(1+2\alpha)}\right) \quad (17a)$$

$$p_{\text{odd}}(t) = (1 + \epsilon_g) e^{-(t + T_s)/\tau} \operatorname{rect}\left(\frac{t - \Delta t_{\text{skew}}}{T_s(1 - 2\alpha)}\right) \quad (17b)$$

where gain and C2 clock errors are now included. Using Fig. 11, it can be shown that the output spectrum is of the form (3) with coefficients

$$\begin{aligned}
K_{0}(f) = {} & \frac{f_{s}}{2} \left( H_{\text{even}}(f) + H_{\text{odd}}(f) e^{-j2 f T_{s} \Delta \theta_{\text{skew}}} \right) \\
& + \frac{f_{s}}{4} M(f) \left[ c_{+}(\beta_{I}, \Delta \phi_{I}) P_{\text{even}}(f) + c_{+}(\beta_{Q}, \Delta \phi_{Q}) P_{\text{odd}}(f) \right]
\end{aligned} \quad (18a)$$

$$\begin{aligned}
K_{f_s/2}(f) = {} & \frac{f_s}{2} \left( H_{\text{even}}(f) - H_{\text{odd}}(f) e^{-j2 f T_s \Delta \theta_{\text{skew}}} \right) \\
& + \frac{f_s}{4} M(f) \left[ c_+(\beta_I, \Delta \phi_I) P_{\text{even}}(f) - c_+(\beta_Q, \Delta \phi_Q) P_{\text{odd}}(f) \right]
\end{aligned} \quad (18b)$$

where $c_+(u, v)$ and $c_-(u, v)$ are defined in (14), and M(f) is defined in (15). The Fourier transforms of the hold pulses, $H_A(f)$ and $H_B(f)$, are given by (6). Finally, it can be shown that the Fourier transforms of the settling pulses defined in (17) are

$$P_{\text{even}}(f) = \frac{2 \exp\left(-\frac{T_s}{\tau}\right)}{1/\tau + j2\pi f} \sinh\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)(1+2\alpha)\right) \tag{19a}$$

$$P_{\text{odd}}(f) = (1+\epsilon_g) \frac{2 \exp\left(-\frac{T_s}{\tau} - \frac{\Delta\theta_{\text{skew}}}{\pi}\left(\frac{T_s}{\tau} + j2\pi f T_s\right)\right)}{1/\tau + j2\pi f} \sinh\left(\frac{T_s}{2}\left(\frac{1}{\tau} + j2\pi f\right)(1-2\alpha)\right) \tag{19b}$$
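As a sanity check, the closed forms in (19) can be compared against direct numerical integration of the settling pulses in (17). The following is a minimal sketch; the function names, the normalization $T_s = 1$, and the example parameter values are assumptions chosen for illustration and are not taken from the authors' code.

```python
import numpy as np

def p_even_ft(f, Ts, tau, alpha):
    """Closed-form Fourier transform of the even settling pulse, as in (19a)."""
    a = 1.0 / tau + 2j * np.pi * f
    return 2.0 * np.exp(-Ts / tau) / a * np.sinh(0.5 * Ts * a * (1 + 2 * alpha))

def p_odd_ft(f, Ts, tau, alpha, eps_g, dtheta_skew):
    """Closed-form Fourier transform of the odd settling pulse, as in (19b)."""
    a = 1.0 / tau + 2j * np.pi * f
    shift = np.exp(-Ts / tau - dtheta_skew / np.pi * Ts * a)
    return (1 + eps_g) * 2.0 * shift / a * np.sinh(0.5 * Ts * a * (1 - 2 * alpha))

def numeric_ft(f, center, width, Ts, tau, gain=1.0, n=200001):
    """Trapezoidal Fourier integral of gain*exp(-(t+Ts)/tau) over the rect support."""
    t = np.linspace(center - width / 2, center + width / 2, n)
    y = gain * np.exp(-(t + Ts) / tau) * np.exp(-2j * np.pi * f * t)
    dt = t[1] - t[0]
    return (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1]) * dt

# Illustrative values only (Ts normalized to 1, f expressed in units of fs)
Ts, tau, alpha, eps_g, dtheta, f = 1.0, 0.3, 0.002, 0.008, 0.017, 0.4
err_even = abs(p_even_ft(f, Ts, tau, alpha)
               - numeric_ft(f, 0.0, Ts * (1 + 2 * alpha), Ts, tau))
err_odd = abs(p_odd_ft(f, Ts, tau, alpha, eps_g, dtheta)
              - numeric_ft(f, dtheta / np.pi * Ts, Ts * (1 - 2 * alpha), Ts, tau,
                           gain=1 + eps_g))
print(err_even, err_odd)
```

Both differences should be limited only by the accuracy of the numerical integration.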

#### B. Accuracy, Speed, and Utility

We now evaluate the analytical model by comparing it to behavioral simulations of a 10-bit times-2 interleaved CS-DAC. The simulations use an oversampling ratio (OSR) of 8192, i.e., one sample period, $T_s$, is represented by 8192 samples. The large OSR is needed to capture timing-related errors that are a small fraction of $T_s$, which applies to all errors in Table I except gain errors. Furthermore, finite settling of the sub-DACs is modeled using first-order Butterworth lowpass filters ( $\tau/T_s = 0.3$ ).

Table II lists the parameters that are common to the analytical model and behavioral simulations. We assume that they are normally distributed with zero mean and variance $\sigma^2$. Nominally, we ensure that a $3\sigma$ error results in a wideband SFDR of 35dB for the behavioral simulations, e.g., according to Table II, a gain error of 2.4% guarantees


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS



Fig. 12. (a) Histogram of  $R = C_{\text{model}}/C_{\text{simulation}}$  in dB based on 500 randomly generated test vectors. (b) Power and phase accuracy of the analytical model for the spurs generated by a tone at  $f_1 = 0.4 f_s$  (relative to behavioral simulations).

 $\min_{f \in (0, f_s/2)}$  SFDR(f) = 35dB when all other parameters are zero. To this end, the values of  $\sigma$  are found one at a time via heuristic tuning of each parameter. Also note that  $\Delta \phi_I = 0$  is fixed, since varying  $\Delta \phi_Q$  is sufficient to model I/Q imbalance (as inferred from Fig. 5(b)). We use the distributions in Table II to generate 500 random test vectors (of dimension 6), where each will serve as a common input to the analytical model and behavioral simulation for a comparison. Specifically, for each test vector, we compute

$$C = 10 \log_{10} \left( \frac{\text{power sum of the spurious tones}}{\text{power sum of the input tones}} \right) \quad (20)$$

for a two-tone excitation with frequencies $(f_0, f_1) = (0.05 f_s, 0.4 f_s)$. For the behavioral simulations, (20) is derived from the DAC output PSD. The PSD bins that correspond to the spurious tones can be derived from the spur locations in Table I. For the analytical model, we compute (20) using (18), e.g., the power of the interleaving spur at $f_s/2 - f_1$ is $\frac{1}{2} K_{f_s/2}^2 (f_s/2 - f_1)$. Fig. 12(a) illustrates a histogram of $R = C_{\text{model}} / C_{\text{simulation}}$ in dB, which is the ratio of (20) from the analytical model to that obtained via simulation. Note that the mean of $[R]_{dB}$ is approximately 0dB with a low spread, which highlights the accuracy of the analytical model for (20). In addition, the computations of (20) took 13.3 hours for the behavioral simulations on a modern workstation, while the analytical model took only 0.83 sec ($5.7 \times 10^4$ times faster). The analytical model is fast since computations of (20) simply require the use of (18). In contrast, the behavioral simulations require oversampling, filtering, and PSD computations using Fast Fourier Transforms (FFTs). Fig. 12(b) highlights the power and phase accuracy of the analytical model for the spurs generated by the tone at $f_1 = 0.4 f_s$. Only 2 out of the 500 test vectors had parameters that were too small for the simulation to accurately resolve all spurs (without more oversampling), so these were discarded from the histograms in Fig. 12(b). Although omitted for brevity, we observed similar results for the tone at $f_0 = 0.05 f_s$. The minor differences observed in Fig. 12 may be explained by quantization and FFT windowing effects that are present in the simulations but not in the analysis.
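To make the simulation-side computation of (20) concrete, the following sketch evaluates the cost from a sampled record using an FFT. It assumes coherent sampling (every tone and spur falls exactly on an FFT bin) and a rectangular window. The helper names, the record length, and the single injected spur are illustrative assumptions; in practice the spur bins would be taken from the spur locations in Table I.

```python
import numpy as np

def bin_power(spectrum, k, n):
    """Power of a real sinusoid that falls exactly on one-sided FFT bin k."""
    return 2.0 * np.abs(spectrum[k]) ** 2 / n ** 2

def cost_db(samples, tone_bins, spur_bins):
    """Cost as in (20): spur power sum over input tone power sum, in dB."""
    n = len(samples)
    spectrum = np.fft.rfft(samples)
    spur = sum(bin_power(spectrum, k, n) for k in spur_bins)
    tone = sum(bin_power(spectrum, k, n) for k in tone_bins)
    return 10.0 * np.log10(spur / tone)

# Illustrative record: two tones near 0.05*fs and 0.4*fs plus one small spur
n = 8192
t = np.arange(n)
k0, k1 = 410, 3277        # bins closest to 0.05*fs and 0.4*fs
k_spur = n // 2 - k1      # interleaving spur at fs/2 - f1
y = (0.5 * np.cos(2 * np.pi * k0 * t / n)
     + 0.5 * np.cos(2 * np.pi * k1 * t / n)
     + 1e-3 * np.cos(2 * np.pi * k_spur * t / n))
c = cost_db(y, [k0, k1], [k_spur])
print(round(float(c), 2))
```

For this synthetic record the spur power is $10^{-6}/2$ and the tone power sum is 0.25, so the cost is $10\log_{10}(2 \times 10^{-6}) \approx -57$ dB.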

Since the analytical model is highly accurate in all regions of practical interest, it may be used in place of behavioral

TABLE II PARAMETER DISTRIBUTIONS

| Parameter, distributed as $\mathcal{N}(0, \sigma^2)$ | $\sigma$ | Description                                                     |
|------------------------------------------------------|----------|-----------------------------------------------------------------|
| $\epsilon_g$                                         | 0.008    | Gain error                                                      |
| $\alpha$                                             | 0.002    | C2 clock duty cycle error                                       |
| $\Delta \theta_{\text{skew}}$                        | 0.017    | C2 clock skew                                                   |
| $\beta_I$                                            | 0.009    | C4 clock duty cycle error (even sub-DAC)                        |
| $\beta_Q$                                            | 0.009    | C4 clock duty cycle error (odd sub-DAC)                         |
| $\Delta \phi_Q$                                      | 0.037    | C4 clock phase error (odd sub-DAC), $\Delta \phi_I = 0$ (fixed) |

simulations in many cases. For example, it could be used to explore the circuit design space by mapping design tolerances to spectral impairments and making design trade-offs across circuit components. The analysis is especially useful when extensive exploration or experimentation is desired, since it is much faster than simulation. For example, in Section VII, we consider calibration algorithms and run extensive experiments using the analytical expressions; these experiments would be infeasible to conduct via behavioral simulations.

#### VII. CALIBRATION ALGORITHM

In this section, we propose a calibration algorithm that suppresses the spurs in Table I. The algorithm is first described and then simulated using the analytical model from Section VI. After discussing the simulated results, we demonstrate its efficacy on a real, commercially-developed 10-bit TIDAC operating at 40 GS/s in 14nm CMOS.

#### A. Algorithm Description

We seek to design a *wideband* calibration algorithm, i.e., one that suppresses the spurs in Table I for frequencies over the Nyquist band. Hence, during calibration mode, we excite the DAC with a *two-tone* signal to avoid narrowband locking, which is motivated by the results in Section IV and Appendix A. In many modern transceivers, we have control over the parameters in Table I. For example, gain control has been demonstrated by adjusting the bias voltage in the DAC current cells [19]. The data timing may be adjusted using phase rotators, as shown in Fig. 2(a) and demonstrated in [20]. The authors in [10] adjust duty cycle and clock skew for a times-2 interleaved DAC using circuits similar to those



Fig. 13. Wideband SFDR results (a) Two-tone calibration with tones at $\nu_{cal,0} = 0.05$ and $\nu_{cal,1} = 0.4$. (b) Single-tone calibration with a tone at $\nu_{cal} = 0.24$.

in [21]. Therefore, we propose to solve the following integer programming problem

$$s^* = \operatorname{argmin} C(s) \tag{21}$$

with cost function as in (20)

$$C(s) = 10 \log_{10} \left( \frac{\text{power sum of the spurious tones}}{\text{power sum of the input tones}} \right) (22)$$

where s is a vector of integers that map to the parameter control settings. Although there are seven controllable parameters in Table I ( $\epsilon_g$, $\alpha$, $\Delta \theta_{\text{skew}}$, $\beta_I$, $\beta_Q$, $\Delta \phi_I$, $\Delta \phi_Q$ ), it is sufficient to control only six of them. Specifically, we can drop either $\Delta \phi_I$ or $\Delta \phi_Q$, since only one of these is needed to correct the I/Q imbalance (as inferred from Fig. 5(b)). Hence, $s \in S \subset \mathbb{R}^6$ comprises the parameters in Table II, where S is the parameter search space. Note that it is generally infeasible to solve (21) via a brute-force search over S. For example, if 5-bit control is used for each parameter, then S comprises $(2^5)^6 = 2^{30}$ possible vectors. Instead, we propose to use simulated annealing [22], since it promotes convergence to the global optimum for problems with a large, discrete search space. For the simulations that follow, we assume that the parameters are distributed as in Table II (each with a control range of $\pm 3\sigma$).
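The search over integer control codes can be sketched as follows. This is a generic simulated annealing loop, not the authors' implementation: the neighbor move (one code changed by one step), the geometric cooling schedule, the temperature values, and the quadratic `toy_cost_db` stand-in for (22) are all assumptions made for illustration.

```python
import math
import random

def simulated_annealing(cost, n_params=6, bits=5, iters=2000,
                        t0=5.0, t_end=0.01, seed=0):
    """Minimize cost(s) over integer vectors s with entries in [0, 2**bits - 1]."""
    rng = random.Random(seed)
    hi = 2 ** bits - 1
    s = [rng.randint(0, hi) for _ in range(n_params)]
    c = cost(s)
    best_s, best_c = list(s), c
    for k in range(iters):
        # Geometric cooling from t0 down to t_end (assumed schedule)
        temp = t0 * (t_end / t0) ** (k / max(iters - 1, 1))
        # Neighbor move: step one randomly chosen control code by +/-1, clipped
        cand = list(s)
        i = rng.randrange(n_params)
        cand[i] = min(hi, max(0, cand[i] + rng.choice((-1, 1))))
        c_new = cost(cand)
        # Always accept improvements; accept degradations with Boltzmann probability
        if c_new <= c or rng.random() < math.exp(-(c_new - c) / temp):
            s, c = cand, c_new
            if c < best_c:
                best_s, best_c = list(s), c
    return best_s, best_c

def toy_cost_db(s, center=16):
    """Hypothetical stand-in for (22): smallest when every code sits at mid-scale."""
    return -60.0 + 0.5 * sum((x - center) ** 2 for x in s)

best_s, best_c = simulated_annealing(toy_cost_db)
print(best_s, best_c)
```

In a calibration setting, `cost` would instead apply the control codes and evaluate (22), either from the analytical model via (18) or from a measured output spectrum.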

#### B. Simulations

We now simulate the proposed two-tone simulated annealing algorithm for finding (21). For comparison, we also apply simulated annealing with a single tone as in [5]. Simulating *K* iterations of the simulated annealing algorithm requires *K* computations of (22), and we show that large values of *K* are required when more control bits are used, e.g., K >1000 when 5-bit control is used for each parameter. Hence, we run the simulations with the analytical model (where the computations of (22) are done using (18)), since it is accurate and runs much faster than the behavioral simulations (as we described in Section VI). Furthermore, we leverage its speed to simulate the algorithm over hundreds of randomized initial parameters, which we draw from the distributions in Table II.

In Fig. 13(a), we show simulated results for the two-tone calibrated SFDR vs. iterations for different control resolutions (which are assumed to be identical for each parameter). Each



Fig. 14. Overlapped parameter trajectories for 20 runs of simulated annealing (a) Two-tone calibration. (b) Single-tone calibration.

data point represents the minimum (worst-case) SFDR over the Nyquist band (averaged over 100 independent runs of simulated annealing) and the calibration aims to maximize this quantity. Without calibration, the SFDR is approximately 40dB. When calibrated with enough iterations, the SFDR is increased significantly, e.g., by over 25dB for 5-bit control. Note that the number of iterations required for convergence increases with the number of control bits. For example, the algorithm converges after roughly 1000 iterations in the 5-bit case. In contrast, the 7-bit case requires roughly 3000 iterations since it has a larger parameter search space. It is worth mentioning that producing Fig. 13(a) took 16 hours using the analytical model on a modern workstation. With the behavioral simulations described in Section VI, this corresponds to an infeasible run time of 104 years.



Fig. 15. Narrowband SFDR results (a) Two-tone calibration with tones at $\nu_{cal,0} = 0.05$ and $\nu_{cal,1} = 0.4$. (b) Single-tone calibration with a tone at $\nu_{cal} = 0.24$.

Fig. 13(b) shows SFDR results for single-tone simulated annealing, where there is only a modest improvement relative to the uncalibrated result (independent of the control resolution). This is a consequence of narrowband locking, as evident from Fig. 14(b), which overlays parameter trajectories for 20 runs of single-tone simulated annealing. Specifically, for several of these runs, parameters $\alpha$, $\epsilon_{g}$, $\Delta\phi_{Q}$, and $\Delta \theta_{\text{skew}}$ converge to nonzero values. While these solutions result in a high SFDR at the calibration frequency, as shown in Fig. 15(b), they perform poorly over the full Nyquist band. Single-tone simulated annealing exhibits random narrowband locking, since locking onto narrowband solutions is not guaranteed for every run of the algorithm (e.g., see the results outlined in green for the histogram in Fig. 13(b)). Such locking was not observed for the two-tone case, as evident in Fig. 14(a) where the parameters converge near zero for every run of the algorithm.

The narrowband locking behavior shown in Fig. 14(b) is caused by a phenomenon similar to that discussed in Section IV. Specifically, the single-tone gain error cancellation approach considered in Section IV was proven to have deterministic narrowband locking due to coupling between the gain and C2 clock duty cycle errors. If single-tone simulated annealing were used to calibrate just gain and C2 clock duty cycle errors, it would suffer from this effect, but since both of these parameters are adjusted, this narrowband locking would not be observed for every initialization (i.e., random narrowband locking). The single-tone simulated annealing algorithm in this section calibrates more than just gain and C2 clock duty cycle errors and may be prone to other parameter coupling effects. In fact, the results in Fig. 14(b) suggest that coupling exists between four parameters: $\alpha$, $\epsilon_g$, $\Delta \phi_Q$, $\Delta \theta_{\text{skew}}$. Specifically, we observe narrowband locking with non-ideal settings for these four parameters. In contrast, the results in Fig. 14(a) suggest that using two-tone calibration eliminates narrowband locking. To investigate this further, we ran a numerical grid search over the parameter search space, S, and found that cost function (22) had a unique global minimum at the origin for the two-tone case (where all parameters in



Fig. 16. Modern transceiver used to demonstrate the simulated annealing calibration algorithm.

Table II are equal to zero). In contrast, a similar search for the single-tone case had several global minima away from the origin.

The choice of frequencies for the calibration tones is important to consider. We simulated the algorithm with calibration tones of the form $(\nu_{cal,0}, \nu_{cal,1}) = (0.25 - \Delta\nu/2 - \delta,\ 0.25 + \Delta\nu/2)$, where $\Delta\nu \in (0, 0.5)$ is the tone spacing and $\delta > 0$ is a small offset to ensure that the calibration tones do not overlap with the targeted spurs in Table I. An SFDR penalty was observed for calibration tones that were too far apart or too close together (e.g., a 10dB penalty if $\Delta\nu = 0.1$ or $\Delta\nu = 0.49$). If the calibration tones are too far apart (e.g., $\Delta\nu > 0.49$), then $\nu_{cal,0} \approx 0$ and $\nu_{cal,1} \approx 0.5$, which is undesired since the C4 image spurs are shaped by $\sin(2\pi\nu)$, as mentioned in Section V. Hence, these targeted spurs would only negligibly affect (22) during calibration. On the other hand, calibration tones that are too close together (e.g., $\Delta\nu < 0.1$) can result in narrowband locking.
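The tone placement rule above is simple to evaluate numerically. The sketch below computes the tone pair for a given spacing and offset, together with the $|\sin(2\pi\nu)|$ shaping that weights the C4 image spurs. The function names and the example values are assumptions for illustration; the pair (0.05, 0.4) used in the simulations corresponds to $\Delta\nu = 0.3$ and $\delta = 0.05$ under this parameterization.

```python
import math

def calibration_tones(delta_nu, delta):
    """Normalized tone pair (nu0, nu1) = (0.25 - delta_nu/2 - delta, 0.25 + delta_nu/2)."""
    if not 0.0 < delta_nu < 0.5:
        raise ValueError("tone spacing must lie in (0, 0.5)")
    nu0 = 0.25 - delta_nu / 2 - delta
    nu1 = 0.25 + delta_nu / 2
    if nu0 <= 0.0:
        raise ValueError("offset too large: lower tone falls at or below DC")
    return nu0, nu1

def c4_shaping(nu):
    """Magnitude of the sin(2*pi*nu) shaping applied to the C4 image spurs."""
    return abs(math.sin(2 * math.pi * nu))

nu0, nu1 = calibration_tones(0.3, 0.05)
print(nu0, nu1, c4_shaping(nu0), c4_shaping(nu1))

# Widely spaced tones sit near 0 and 0.5, where the shaping is close to zero
wide0, wide1 = calibration_tones(0.495, 0.001)
print(c4_shaping(wide0), c4_shaping(wide1))
```

The second call illustrates why widely spaced tones contribute little C4 spur power to the cost.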

One practical implementation of simulated annealing is to utilize a high-speed ADC together with FFTs to compute (22), and we demonstrate this in Section VII.C. In contrast, gain error cancellation (from Section IV) with a single tone near Nyquist requires only a low-speed ADC (to measure the $f_s/2-f_{cal}$ spur); however, it exhibits deterministic narrowband locking and does not account for the C4 image spurs.



Fig. 17. Two-tone and single-tone simulated annealing calibration (a) Maximum of the C2 and C4 image spurs in dBc (from Table I). Losses from the test board, cables, and balun have been de-embedded from the measurements. Raw output spectrum comparison (using a Keysight N9040B UXA Signal Analyzer) with fundamental tones at (b)  $f_0 = 8.54$ GHz. (c)  $f_0 = 16.08$ GHz.

#### C. Measured Results

We now demonstrate two-tone and single-tone simulated annealing calibration on a 10-bit times-2 interleaved CS-DAC in 14nm CMOS from Jariet Technologies Inc.<sup>6</sup> The DAC is part of a modern transceiver chip with a high-speed 10-bit ADC that we use in the algorithm for DAC output measurements. Moreover, we have control over all of the parameters in Table II, except for C2 clock skew, so this is excluded during calibration.

Fig. 16 illustrates a block diagram of the transceiver. A C2 clock at 20GHz is synthesized from a PLL that uses a reference clock (REFCLK) supplied by a signal generator. This allows both the DAC and ADC to operate at an aggregate sample rate of 40GS/s. The ADC can sample the DAC output via an on-chip measurement path, which allows the ADC to capture the full Nyquist band of the DAC. The Field Programmable Gate Array (FPGA) serves as a bridge between the PC and Serial Peripheral Interface (SPI), allowing the control registers that are part of the DAC and ADC to be read and modified. For testing purposes, we wrote the algorithm in Python to run on the PC. In practice, one would implement it on an embedded MCU. To compute (22), we sample the DAC output using the ADC and then compute an FFT of size N = 8192. The FFT is the dominant computation task of the calibration algorithm. While computing 500 FFTs of size N = 8192 on an embedded MCU requires only a few seconds, utilizing Goertzel filters [23] at the measurement frequencies can further speed up this operation.
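For reference, a Goertzel filter evaluates the power in a single DFT bin with one multiply-accumulate recursion per sample, which is attractive when only a handful of tone and spur bins are needed. The sketch below is the generic textbook recursion, not necessarily the variant described in [23] or used on the chip; the test signal and bin indices are illustrative assumptions.

```python
import math
import numpy as np

def goertzel_power(x, k):
    """Squared magnitude of DFT bin k of the sequence x via the Goertzel recursion."""
    n = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for sample in x:
        s0 = float(sample) + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# Illustrative check against a full FFT: strong tone at bin 100, weak tone at bin 300
n = 1024
t = np.arange(n)
x = np.cos(2 * np.pi * 100 * t / n) + 0.01 * np.cos(2 * np.pi * 300 * t / n)
ref = np.abs(np.fft.rfft(x)[300]) ** 2
rel_err = abs(goertzel_power(x, 300) - ref) / ref
print(rel_err)
```

Evaluating (22) this way costs on the order of N operations per measured bin rather than N log N for the full spectrum.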

Fig. 17(a) illustrates the maximum of the spurs in Table I at frequencies over the Nyquist band (in dBc). The uncalibrated results are based on nominal settings for each parameter. The calibrated results show the best and worst of 10 simulated annealing runs using single-tone and two-tone calibration signals. Evaluation of a particular run is determined by the wideband performance of the parameters returned after calibration. To this end, our metric is the maximum of the quantity in Fig. 17(a) over the Nyquist band. For example, comparing the purple and green curves, we consider the green one to be better since this metric is roughly -48dBc and -55dBc, respectively. We run the algorithm multiple times to explore the solution space more deeply. Specifically, since simulated annealing is a randomized algorithm, it often converges to a local minimum instead of the global minimum. However, such local minima are often close enough to the global minimum to be considered feasible solutions for the intended application. For example, even the worst case two-tone solution in Fig. 17(a) still guarantees that the targeted spurs will be below -48dBc over the Nyquist band. For the single-tone case, the calibration signal was a tone at $f_{cal} = 9.48$ GHz. An additional tone was included for the two-tone case, i.e., $f_{cal,0} = 9.48$ GHz and $f_{cal,1} = 18.8$ GHz. Note that for single-tone calibration (worst case), the performance starts to degrade for frequencies greater than the calibration frequency (9.48GHz), which is indicative of narrowband locking. Although the two-tone calibration signal also includes 9.48GHz, such locking is not observed due to the presence of the additional tone at $f_{cal,1}$ during calibration, resulting in a 15dB improvement near the Nyquist frequency. Fig. 17(b) compares single-tone and two-tone calibration by means of the output spectrum at $f_0 = 8.54$ GHz.<sup>7</sup> Note that the targeted spurs (from Table I) are suppressed significantly

<sup>6</sup>Our motivation is to demonstrate the calibration algorithm on a modern TIDAC that exhibits the errors analyzed in this work. Comparing the specific DAC used to state-of-the-art circuit research is beyond the scope of the paper.

<sup>7</sup>Note that the upward slope within the noise floor is an artifact of noise density variation in the spectrum analyzer.

in both cases. A similar comparison is shown in Fig. 17(c) for a tone at  $f_0 = 16.08$ GHz. We suspect that the second and third harmonics (HD2, HD3) are caused by errors commonly encountered in CS-DACs [24]–[26]. The literature involving the calibration of such errors is rich, and the work in [24] provides an overview of several calibration techniques.

One practical consideration is the resolution of the ADC that is used to measure the targeted spurs. Specifically, if the ADC resolution is too low, the targeted spurs will be masked by quantization noise. For high-resolution ADCs (e.g., 10 bits), the processing gain from the FFT (i.e., $10 \log_{10}(\frac{M}{2})$ dB for an M-point FFT) helps resolve spurs below the quantization noise level [27]. Conservatively, a signal-to-noise ratio (SNR) of at least 20 dB for each of the targeted FFT bins should be enough to neglect quantization effects. Accordingly, we have $P_{\text{spur}} + 6.02\, B_{\text{ADC}} + 1.76 + 10 \log_{10}(M/2) \ge 20$, where $P_{\text{spur}}$ is the targeted spur power and $B_{\text{ADC}}$ is the ADC resolution. For example, with a 10-bit ADC and an 8192-point FFT (providing 36 dB processing gain), the effects of quantization are negligible for spurs above -78dBFS. In addition, if other ADC nonidealities also produce spurs at the same locations as the targeted spurs, then it is important that they are kept sufficiently low in order to ensure an adequate SNR. For example, timing mismatch between sub-ADCs in a time-interleaved ADC will also produce a C2 image spur [28]. Fortunately, calibration schemes for these types of nonidealities are well-documented in the literature [29]. While ADC calibration is beyond the scope of this paper, it is important to note that the ADC in Fig. 16 was calibrated prior to calibrating the DAC.
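The SNR budget above can be evaluated directly. The following sketch rearranges the inequality to give the lowest spur power (in dBFS) that still meets the margin; the function name and argument names are assumptions made for illustration.

```python
import math

def min_resolvable_spur_dbfs(adc_bits, fft_size, snr_margin_db=20.0):
    """Lowest P_spur satisfying P_spur + 6.02*B + 1.76 + 10*log10(M/2) >= margin."""
    processing_gain_db = 10.0 * math.log10(fft_size / 2)
    return snr_margin_db - (6.02 * adc_bits + 1.76 + processing_gain_db)

floor_dbfs = min_resolvable_spur_dbfs(adc_bits=10, fft_size=8192)
print(round(floor_dbfs, 1))
```

With a 10-bit ADC and an 8192-point FFT this evaluates to approximately -78 dBFS, matching the example in the text.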

#### VIII. CONCLUSION

This paper presented analysis and calibration of interleaving and data timing errors that are encountered in modern times-2 interleaved CS-DACs. First, we derived key insights by analyzing the effects of these errors in isolation. For example, the coupled analysis of gain and duty cycle errors uncovered the drawback of calibration via gain error cancellation (i.e., deterministic narrowband locking), which motivated the use of two tones in calibration mode. Moreover, we showed how data timing errors create spurs when finite settling of the sub-DACs is considered. We then developed an analytical model that includes all of the errors considered in this paper and highlighted its speed and accuracy relative to behavioral simulations. The speed and accuracy of the analytical model were leveraged to run extensive simulations of the proposed two-tone calibration algorithm. The simulation results showed that the use of two tones in calibration mode avoids narrowband locking, resulting in solutions that are effective over the full Nyquist band. This is an improvement over the previous single-tone approaches (e.g., gain error cancellation and single-tone simulated annealing), as these are prone to narrowband locking. The efficacy of the algorithm was then demonstrated experimentally on a 10-bit times-2 interleaved CS-DAC, operating at 40GS/s in 14nm CMOS.

Finally, it is worth noting that our proposed approach is classified as foreground calibration, i.e., it does not consider environmental variations. A useful direction for future work would be to quantify the sensitivity of the parameters in Table I to variations in, for example, temperature and supply voltage. Moreover, developing background calibration algorithms that remedy such variations would be another fruitful research opportunity.

#### APPENDIX A TWO-TONE CALIBRATION ANALYSIS

In this appendix, we motivate the use of two tones in calibration mode where only gain and C2 clock duty cycle errors are considered ( $\epsilon_g$ and $\alpha$ ). Suppose the TIDAC is excited by a two-tone calibration signal of the form $x(t) = \frac{1}{2}\cos(2\pi f_{cal,0} t) + \frac{1}{2}\cos(2\pi f_{cal,1} t)$, $f_{cal,0} \neq f_{cal,1}$. This creates one interleaving spur at $f_s/2 - f_{cal,0}$ and another at $f_s/2 - f_{cal,1}$. Both of these spurs are zero if and only if $(\alpha, \epsilon_g) = (0, 0)$, and we prove this as follows. First, recall that the interleaving spur at $f_s/2 - f_{cal,0}$ vanishes when $\epsilon_g = \epsilon_g(\alpha; \nu_{cal,0})$, i.e., for gain errors as in (8). Similarly, the interleaving spur at $f_s/2 - f_{cal,1}$ vanishes when $\epsilon_g = \epsilon_g(\alpha; \nu_{cal,1})$. Hence, both interleaving spurs vanish simultaneously when $\epsilon_g(\alpha; \nu_{cal,0}) = \epsilon_g(\alpha; \nu_{cal,1})$, i.e.,

$$\frac{\sin\left(\pi\left(1/2 - \nu_{cal,0}\right)\left(1 + 2\alpha\right)\right)}{\sin\left(\pi\left(1/2 - \nu_{cal,0}\right)\left(1 - 2\alpha\right)\right)} = \frac{\sin\left(\pi\left(1/2 - \nu_{cal,1}\right)\left(1 + 2\alpha\right)\right)}{\sin\left(\pi\left(1/2 - \nu_{cal,1}\right)\left(1 - 2\alpha\right)\right)} \quad (23)$$

Note that  $\alpha = 0$  is a solution to (23). Moreover, it is in fact the unique solution in the interval of interest  $\alpha \in [-0.5, 0.5]$ . To prove this, we show that (23) cannot hold for  $\alpha \in \mathcal{A}^- \cup \mathcal{A}^+$ , where  $\mathcal{A}^- = [-0.5, 0)$  and  $\mathcal{A}^+ = (0, 0.5]$ . To this end, we show that the following function

$$f(w) = \frac{\sin(\pi w (1+2\alpha))}{\sin(\pi w (1-2\alpha))}$$
(24)

is one-to-one over the domain $w \in I = [0, 0.5]$ for $\alpha \neq 0$. A property of such a function is that $f(w_0) = f(w_1)$ implies $w_0 = w_1$. Since (23) has the form $f(w_0) = f(w_1)$ with $w_i = 1/2 - \nu_{\text{cal},i}$, and $\nu_{\text{cal},0} \neq \nu_{\text{cal},1}$ is assumed, (23) cannot hold when f(w) is one-to-one. It can be shown that the sign of df/dw is determined by the following quantity

$$G(w) = 2\alpha \, \sin(2\pi \, w) - \sin(4\pi \, w\alpha) \tag{25}$$
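For completeness, the step from (24) to (25) can be sketched with the quotient rule and the angle-sum identities. The shorthand A and B is introduced here only for this derivation:

$$\begin{aligned}
A &= \pi w (1+2\alpha), \qquad B = \pi w (1-2\alpha), \\
\frac{df}{dw} &= \frac{\pi(1+2\alpha)\cos A \sin B - \pi(1-2\alpha)\sin A \cos B}{\sin^2 B} \\
&= \frac{\pi\left[2\alpha \sin(A+B) - \sin(A-B)\right]}{\sin^2 B} \\
&= \frac{\pi\left[2\alpha \sin(2\pi w) - \sin(4\pi w \alpha)\right]}{\sin^2\left(\pi w (1-2\alpha)\right)} = \frac{\pi\, G(w)}{\sin^2\left(\pi w (1-2\alpha)\right)}
\end{aligned}$$

Since the denominator is nonnegative, the sign of df/dw follows that of G(w) wherever the derivative is defined.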

Note that (25) is a continuous function over I for any $\alpha \in \mathcal{A}^- \cup \mathcal{A}^+$. It is straightforward to verify that there are no critical points in the open interval I' = (0, 0.5) by setting dG/dw equal to zero and solving for w. By the extreme value theorem, this means that the absolute extrema must lie at the endpoints of I, i.e., w = 0 and w = 0.5. Accordingly, we have G(0) = 0 and $G(0.5) = -\sin(2\pi\alpha)$. Therefore, if $\alpha \in \mathcal{A}^+$ then $G(w) \leq 0 \ \forall w \in I$. Since the sign of df/dw is determined by G(w), this also implies $df/dw \le 0 \ \forall w \in I$. Thus, we conclude that (24) is one-to-one over I for any $\alpha \in \mathcal{A}^+$. Analogously, we also conclude that f(w) is one-to-one over I for any $\alpha \in \mathcal{A}^-$. Hence, (24) is one-to-one over I for any $\alpha \in \mathcal{A}^- \cup \mathcal{A}^+$. Consequently, $\alpha = 0$ is the unique solution to (23) in the interval of interest $\alpha \in [-0.5, 0.5]$. Furthermore, substituting $\alpha = 0$ into (8) yields $\epsilon_g = 0$. Thus, calibration based on two distinct tones promotes convergence to the wideband solution $(\alpha, \epsilon_g) = (0, 0)$, i.e., it avoids narrowband locking at just a single frequency.

#### APPENDIX B ANALYSIS OF DATA TIMING ERRORS

In this appendix, we derive the output spectrum coefficients in (3) where only data timing errors are considered. Specifically, referring to Table I, we only consider $\beta_I$, $\beta_Q$ (C4 clock duty cycle errors) and $\Delta \phi_I$, $\Delta \phi_Q$ (C4 clock phase errors). Fig. 9 illustrates how these errors affect the data timing. First, recall that the analysis with ideal data timing resulted in (10) for the even sub-DAC. If we include C4 clock errors in the analysis (i.e., $\beta_I$ and $\Delta \phi_I$), then this becomes

$$e_{\text{even}}(t) = \left(\Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - 4k T_s)\right) * p_0(t) + \left(\Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k + 2) T_s)\right) * p_2(t) \quad (26)$$

where

$$p_0(t) = e^{-\frac{1}{\tau}(t+T_s + 2\beta_I T_s - \frac{2}{\pi}\Delta\phi_I T_s)} h(t)$$
(27a)

$$p_2(t) = e^{-\frac{1}{\tau}(t+T_s - 2\beta_I T_s - \frac{2}{\pi}\Delta\phi_I T_s)} h(t)$$
(27b)

 $\Delta x(t)$  is defined in (11), and  $h(t) = \operatorname{rect}(t/T_s)$ . We observe that (26) has two separate terms, unlike in (10). This is a consequence of duty cycle error  $\beta_I$ , since samples x[4k] fire early by  $2\beta_I T_s$  and samples x[4k+2] fire late by the same quantity, as shown in Fig. 9(a). For convenience, we rewrite (26) as

$$e_{\text{even}}(t) = \left(c_0 \,\Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - 4k \, T_s)\right) * p(t) + \left(c_2 \,\Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k + 2) \, T_s)\right) * p(t)$$
(28)

where

$$c_0 = e^{-\frac{1}{\tau}(2\beta_I T_s - \frac{2}{\pi}\Delta\phi_I T_s)}$$
(29a)

$$c_2 = e^{-\frac{1}{\tau}(-2\beta_I T_s - \frac{2}{\pi}\Delta\phi_I T_s)}$$
(29b)

and p(t) is defined in (12), which is the settling pulse with ideal data timing. Note that  $\beta_I$  and  $\Delta \phi_I$  are now embedded in the constants in (29). Analogously, for the odd sub-DAC we have

$$e_{\text{odd}}(t) = \left(c_1 \Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+1)T_s)\right) * p(t) + \left(c_3 \Delta x(t) \sum_{k=-\infty}^{\infty} \delta(t - (4k+3)T_s)\right) * p(t) \quad (30)$$

where

$$c_1 = e^{-\frac{1}{\tau}(2\beta_Q T_s - \frac{2}{\pi}\Delta\phi_Q T_s)} \tag{31a}$$

$$c_3 = e^{-\frac{1}{\tau}(-2\beta_Q T_s - \frac{2}{\pi}\Delta\phi_Q T_s)}$$
(31b)

Proceeding with the Fourier transform of (28) yields

$$E_{\text{even}}(f) = \frac{f_s}{4} P(f) \sum_{k=-\infty}^{\infty} \Delta X(f - kf_s/4) \left( c_0 + (-1)^k c_2 \right)$$
(32)

where  $\Delta X(f)$  and P(f) are the Fourier transforms of (11) and (12), respectively. Further simplifying by separating the even and odd terms yields

$$E_{\text{even}}(f) = \frac{f_s}{4} c_+(\beta_I, \Delta \phi_I) P(f) \sum_{k=-\infty}^{\infty} \Delta X(f - kf_s/2) - \frac{f_s}{4} c_-(\beta_I, \Delta \phi_I) P(f) \sum_{k=-\infty}^{\infty} \Delta X(f - (2k+1)f_s/4)$$
(33)

where $c_+(u, v)$ and $c_-(u, v)$ are defined in (14). Furthermore, note that the Fourier transform of (11) is $\Delta X(f) = M(f) X(f)$, where M(f) and P(f) are defined in (15) and (16), respectively. Substituting this into (33) yields

$$E_{\text{even}}(f) = \frac{f_s}{4} c_+(\beta_I, \Delta \phi_I) P(f) M(f) \sum_{k=-\infty}^{\infty} X(f - kf_s/2) - \frac{f_s}{4} c_-(\beta_I, \Delta \phi_I) P(f) M(f + f_s/4) \times \sum_{k=-\infty}^{\infty} X(f - (2k+1)f_s/4)$$
(34)

where we have used the fact that $M(f - kf_s/2) = M(f)$ and $M(f - (2k + 1)f_s/4) = M(f + f_s/4)$. Similarly, the settling error spectrum for the odd sub-DAC may be derived by taking the Fourier transform of (30), which is

$$\begin{aligned} E_{\text{odd}}(f) ={}& \frac{f_s}{4} c_+(\beta_Q, \Delta \phi_Q) P(f) M(f) \sum_{k=-\infty}^{\infty} X(f - kf_s/2) (-1)^k \\ &+ j \frac{f_s}{4} c_-(\beta_Q, \Delta \phi_Q) P(f) M(f + f_s/4) \sum_{k=-\infty}^{\infty} X(f - (2k+1) f_s/4) (-1)^k \end{aligned} \tag{35}$$

where  $j = \sqrt{-1}$  is the imaginary unit. Finally, the output spectrum, Y(f), is the sum of (34), (35), and (2b), which is of the form (3) with coefficients as in (13).
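The constants in (29) and (31) are simple to evaluate numerically. The sketch below computes them for one sub-DAC and forms their sum and difference, which are the combinations that appear when the even and odd terms of (32) are separated. Identifying these with $c_+$ and $c_-$ of (14) is an assumption inferred from comparing (32) with (33), since (14) is not reproduced here; the function names and example values are likewise illustrative.

```python
import math

def timing_constants(beta, dphi, tau_over_ts):
    """Constants as in (29)/(31) for one sub-DAC, with T_s normalized to 1."""
    c_first = math.exp(-(2 * beta - 2 * dphi / math.pi) / tau_over_ts)    # c0 or c1
    c_second = math.exp(-(-2 * beta - 2 * dphi / math.pi) / tau_over_ts)  # c2 or c3
    return c_first, c_second

def sum_and_difference(beta, dphi, tau_over_ts):
    """Sum and difference of the two constants (assumed to match c_+ and c_- of (14))."""
    c_first, c_second = timing_constants(beta, dphi, tau_over_ts)
    return c_first + c_second, c_second - c_first

ideal = sum_and_difference(0.0, 0.0, 0.3)
with_duty_error = sum_and_difference(0.02, 0.0, 0.3)
print(ideal, with_duty_error)
```

With ideal data timing both constants equal one, so the difference term (which multiplies the spurs offset by odd multiples of $f_s/4$) vanishes; a nonzero duty cycle error makes it nonzero.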

#### ACKNOWLEDGMENT

The authors would like to thank Jariet Technologies, Inc., for providing the equipment that made this research possible. They would also like to thank the anonymous reviewers whose feedback substantially improved this article.


#### REFERENCES

- [1] B. Razavi, "The current-steering DAC [a circuit for all seasons]," *IEEE Solid-State Circuits Mag.*, vol. 10, no. 1, pp. 11–15, Winter 2018.
- [2] S.-N. Kim, W.-C. Kim, M.-J. Seo, and S.-T. Ryu, "A 65-nm CMOS 6-bit 20 GS/s time-interleaved DAC with full-binary sub-DACs," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 65, no. 9, pp. 1154–1158, Sep. 2018.
- [3] E. Olieman, A. J. Annema, and B. Nauta, "An interleaved full Nyquist high-speed DAC technique," *IEEE J. Solid-State Circuits*, vol. 50, no. 3, pp. 704–713, Mar. 2015.
- [4] H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, J. Leibrich, and W. Rosenkranz, "An 8-bit 100-GS/s distributed DAC in 28-nm CMOS for optical communications," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 4, pp. 1211–1218, Apr. 2015.
- [5] D. Beauchamp and K. M. Chugg, "Machine learning based image calibration for a twofold time-interleaved high speed DAC," in *Proc. IEEE 62nd Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Aug. 2019, pp. 908–912.
- [6] W.-C. Kim, D.-S. Jo, Y.-J. Roh, Y.-D. Kim, and S.-T. Ryu, "A 6b 28 GS/s four-channel time-interleaved current-steering DAC with background clock phase calibration," in *Proc. Symp. VLSI Circuits*, Jun. 2019, pp. C138–C139.
- [7] D. J. Stoops, J. Kuo, P. J. Hurst, B. C. Levy, and S. H. Lewis, "Digital background calibration of a split current-steering DAC," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 8, pp. 2854–2864, Aug. 2019.
- [8] S. Su and M. S.-W. Chen, "A 12-bit 2 GS/s dual-rate hybrid DAC with pulse-error pre-distortion and in-band noise cancellation achieving > 74 dBc SFDR and < -80 dBc IM3 up to 1 GHz in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 2963–2978, Dec. 2016.
- [9] C.-H. Lin *et al.*, "A 12 bit 2.9 GS/s DAC with IM3 ≪ −60 dBc beyond 1 GHz in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3285–3293, Dec. 2009.
- [10] P. Caragiulo, O. E. Mattia, A. Arbabian, and B. Murmann, "A 2× time-interleaved 28-GS/s 8-bit 0.03-mm<sup>2</sup> switched-capacitor DAC in 16-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 56, no. 8, pp. 2335–2346, Aug. 2021.
- [11] L. Zhou *et al.*, "A 30 Gsps 6bit DAC in SiGe BiCMOS technology," in *Proc. IEEE Int. Conf. Electron., Circuits Syst. (ICECS)*, Dec. 2016, pp. 37–40.
- [12] E. Olieman, "Time-interleaved high-speed D/A converters," Ph.D. dissertation, Univ. Twente, Enschede, The Netherlands, Mar. 2016. [Online]. Available: https://www.utwente.nl/en/eemcs/icd/
- [13] L. Angrisani and M. D'Arco, "Modeling timing jitter effects in digital-to-analog converters," *IEEE Trans. Instrum. Meas.*, vol. 58, no. 2, pp. 330–336, Feb. 2009.
- [14] X. Geng, Y. Tian, Y. Xiao, Z. Ye, Q. Xie, and Z. Wang, "A 25.8 GHz integer-N PLL with time-amplifying phase-frequency detector achieving 60 fsrms jitter, -252.8 dB FoMJ, and robust lock acquisition performance," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 388–390.
- [15] M. Mansuri, A. Hadiashar, and C.-K. K. Yang, "Methodology for on-chip adaptive jitter minimization in phase-locked loops," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 870–878, Nov. 2003.
- [16] S. Pavan, R. Schreier, and G. C. Temes, *Understanding Delta-Sigma Data Converters*. Hoboken, NJ, USA: Wiley, 2017.
- [17] S. Balasubramanian *et al.*, "Systematic analysis of interleaved digital-to-analog converters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 58, no. 12, pp. 882–886, Dec. 2011.
- [18] T. Chen and G. G. E. Gielen, "The analysis and improvement of a current-steering DACs dynamic SFDR-I: The cell-dependent delay differences," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 53, no. 1, pp. 3–15, Jan. 2006.
- [19] Y. Cong and R. L. Geiger, "A 1.5-V 14-bit 100-MS/s self-calibrated DAC," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2051–2060, Dec. 2003.

- [20] J. Savoj, A. Abbasfar, A. Amirkhany, M. Jeeradit, and B. W. Garlepp, "A 12-GS/s phase-calibrated CMOS digital-to-analog converter for backplane communications," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1207–1216, May 2008.
- [21] J. Kim *et al.*, "A 112 Gb/s PAM-4 56 Gb/s NRZ reconfigurable transmitter with three-tap FFE in 10-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 29–42, Jan. 2019.
- [22] L. M. Rios and N. V. Sahinidis, "Derivative-free optimization: A review of algorithms and comparison of software implementations," *J. Global Optim.*, vol. 56, no. 3, pp. 1247–1293, 2013.
- [23] E. Jacobsen and R. Lyons, "The sliding DFT," *IEEE Signal Process. Mag.*, vol. 20, no. 2, pp. 74–80, Mar. 2003.
- [24] S. McDonnell, V. J. Patel, L. Duncan, B. Dupaix, and W. Khalil, "Compensation and calibration techniques for current-steering DACs," *IEEE Circuits Syst. Mag.*, vol. 17, no. 2, pp. 4–26, May 2017.
- [25] C. Su and R. L. Geiger, "Dynamic calibration of current-steering DAC," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2006, p. 120.
- [26] Y. Tang *et al.*, "A 14 bit 200 MS/s DAC with SFDR >78 dBc, IM3 < -83 dBc and NSD <-163 dBm/Hz across the whole Nyquist band enabled by dynamic-mismatch mapping," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1371–1381, Jun. 2011.
- [27] W. Kester, "Taking the mystery out of the infamous formula, 'SNR = 6.02N + 1.76 dB,' and why you should care," Analog Devices Tutorial, Norwood, MA, USA, Tech. Rep. MT-001 Rev. A, 2009, vol. 10, no. 8.
- [28] M. El Chammas and B. Murmann, *Background Calibration of Time-Interleaved Data Converters*. New York, NY, USA: Springer, 2011.
- [29] M. Bagheri, F. Schembari, H. Zare-Hoseini, R. B. Staszewski, and A. Nathan, "Interchannel mismatch calibration techniques for time-interleaved SAR ADCs," *IEEE Open J. Circuits Syst.*, vol. 2, pp. 420–433, 2021.



**Daniel Beauchamp** (Member, IEEE) received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Toronto in 2012 and 2014, respectively. He is currently pursuing the Ph.D. degree with the Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California. He is an RF Systems Engineer with Jariet Technologies Inc., where he is focused on the design of calibration algorithms for high-speed data converters.



**Keith M. Chugg** (Fellow, IEEE) received the B.S. degree (Hons.) in engineering from the Harvey Mudd College in 1989 and the Ph.D. degree in electrical engineering from the University of Southern California (USC) in 1995. Since 1996, he has been on the faculty of the Ming Hsieh Department of Electrical and Computer Engineering, USC, where he is currently a Professor. He is also the Co-Founder of TrellisWare Technologies, Inc., where he serves as a Chief Scientist. His research interests are in machine learning, signal processing, digital communications, and associated efficient implementations. He was a recipient of the ASEE Frederick Emmons Terman Award and the National Academy of Inventors Fellow.
