# A 6-bit 600-MS/s 5.3-mW Asynchronous ADC in $0.13-\mu m$ CMOS

Shuo-Wei Michael Chen, Student Member, IEEE, and Robert W. Brodersen, Fellow, IEEE

Abstract: An asynchronous analog-to-digital converter (ADC) based on successive approximation is used to provide high-speed (600-MS/s), medium-resolution (6-bit) conversion. A high input bandwidth (>4 GHz) was achieved, which allows the ADC to be used in RF subsampling applications. By using asynchronous processing techniques, the ADC avoids clocks faster than the sample rate. It also speeds up a nonbinary successive approximation algorithm that uses a series nonbinary capacitive ladder with digital radix calibration. The sample rate of 600 MS/s was achieved by time-interleaving two single ADCs, which were fabricated in a $0.13-\mu m$ standard digital CMOS process. The ADC achieves a peak SNDR of 34 dB while occupying an active area of only $0.12\ \mathrm{mm}^2$ and consuming $5.3\ \mathrm{mW}$.

Index Terms: Analog-to-digital conversion, analog integrated circuits, asynchronous logic circuits, calibration, capacitive ladder, comparators, high-speed integrated circuits, impulse radio, non-binary successive approximation, ultra-wideband (UWB).

## I. INTRODUCTION

Trends in many communication systems, such as ultra-wideband (UWB), cognitive, and software-defined radio, call for ever wider signal bandwidths, increased flexibility, and system integration. These systems must also meet cost targets through lower power consumption and smaller area. Their architectures typically demand medium-resolution (~6-bit), high-speed (~GHz) analog-to-digital converters (ADCs), such as those needed for the 802.15 UWB standard. If the ADC is designed with sufficient input bandwidth, it can subsample the wideband RF signal directly. This yields a radio solution with even lower implementation cost [1].
Conventionally, a flash-type converter [2]–[5] is chosen when the sample rate is high, since it can perform a conversion in a single clock cycle. However, this comes at a cost. Area and power depend exponentially on resolution, and offset variations among the parallel paths require pre-amplifiers or extra calibration [6]. A successive approximation (SA) architecture, on the other hand, has only a logarithmic dependence on resolution. It takes multiple clock cycles to execute the conversion algorithm [7], however, so more time-interleaving is needed to reach a given conversion speed.

Manuscript received March 4, 2006; revised August 11, 2006. This work was supported by the IEEE, by the Army Research Office, North Carolina under Award No. 065861, and by industrial members of the Berkeley Wireless Research Center. The authors are with the Berkeley Wireless Research Center, University of California, Berkeley, CA 94704 USA (e-mail: swchen@eecs.berkeley.edu; shuowei@gmail.com). Digital Object Identifier 10.1109/JSSC.2006.884231

This work explores architectural strategies and circuit techniques to optimize the power efficiency and area of a high-speed ADC. An asynchronous ADC architecture is proposed to speed up the power-efficient SA algorithm, using a dynamic comparator and digital logic to facilitate asynchronous processing. To meet the high-speed and high-bandwidth requirements of this ADC, a series capacitive ladder network is used to reduce the effective capacitance. Because the ladder capacitors are pushed toward their minimum practical size, random errors become a concern. A post-processing digital calibration scheme is therefore used to compensate for random errors introduced during manufacturing. Finally, the asynchronous logic is optimized for speed by using dynamic digital logic. This paper describes the prototype design of a 6-bit 600-MS/s asynchronous ADC [8] that consumes a total power of 5.3 mW.
Section II reviews the power efficiency of conventional ADC topologies and describes the concept and architecture of the proposed asynchronous approach. Section III provides the implementation details of the asynchronous ADC, and Section IV presents measured results. Finally, Section V discusses technology scaling and potential applications of the proposed architecture and concludes the paper.

## II. ADC ARCHITECTURE

#### A. Power Efficiency of Conventional ADC Architectures

Fig. 1 shows the three commonly used Nyquist ADC topologies: flash, pipeline, and successive-approximation-register (SAR) ADCs. A first-order estimate of the power and conversion speed of these conventional topologies is made to identify the best starting point for further improvement in power efficiency.

Traditionally, flash ADCs are favored for high-speed N-bit converters because they use $2^N-1$ comparators to compare the input with all quantization levels in parallel within one clock cycle. The decoding circuits also dissipate extra power, since they suppress sparkle and metastability errors and perform thermometer-to-binary code conversion. The total power consumption of a flash ADC therefore scales roughly as $2^N$. In Fig. 1, the conversion speed of the flash ADC is normalized to one, since a full conversion completes within one sample clock cycle.

One approach to breaking the exponential dependence of the number of comparators on the number of bits is the pipeline ADC. Instead of a fully parallel comparison, it divides the process into several comparison stages, whose number is proportional to the number of bits. The total number of required comparators is therefore greatly reduced: only N comparators are required for a 1-bit-per-stage, N-bit pipeline ADC.

Fig. 1. Conventional architectures for Nyquist ADCs.

Fig. 2. Synchronous conversion for SAR ADCs.

However, because both the analog and digital signal paths are pipelined,
inter-stage residue amplification is needed. It consumes considerable power and limits high-speed operation. Open-loop residue amplification can be used [9], but it requires an extra calibration loop, increasing overall complexity and power consumption. Therefore, the normalized power consumption of a pipeline ADC exceeds N, while its normalized speed is below 1.

For low conversion speeds, an SAR approach is often used. Like the pipeline ADC, it divides a full conversion into several comparison stages, except that the algorithm is executed sequentially rather than in parallel as in the pipeline case. An N-bit SAR converter uses only one comparator and N clock cycles to complete a full conversion. Thus, its total power consumption is normalized to approximately one, while its speed is now 1/N. Since the ratio of power to speed represents the energy consumed per conversion sample, SAR converters clearly have a power-efficiency advantage over the other approaches. The power-efficiency gap between SAR and flash topologies grows exponentially with the number of bits N. An SAR converter is therefore a promising starting point for the most power-efficient solution. However, the sequential operation of the SA algorithm has traditionally limited its speed. The following section therefore uses an architecture based on asynchronous processing to achieve high-speed operation with a normalized power/speed ratio $\ll N$.

#### B. Asynchronous Processing

The conventional implementation of the SA algorithm, such as an SAR converter, relies on a synchronous clock to divide time into a signal tracking phase and a conversion phase. The conversion phase progresses from the MSB to the LSB, as shown in Fig. 2. For an N-bit converter with a conversion rate of $F_s$, a synchronous approach requires a clock running at no less than $(N+1) \cdot F_s$.
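The first-order comparison of Section II-A and the synchronous clock requirement $(N+1) \cdot F_s$ can be tabulated in a few lines. This is a sketch under stated assumptions. The normalized power and speed values follow the argument around Fig. 1. The pipeline entry uses only the lower bound (power = N, speed = 1), because the text states only that its power exceeds N and its speed is below 1. The function names are hypothetical.

```python
def first_order_metrics(n_bits):
    """Normalized (power, speed) per topology, following the Fig. 1 argument."""
    return {
        "flash": (2.0 ** n_bits, 1.0),      # ~2^N comparators, one cycle per sample
        "pipeline": (float(n_bits), 1.0),   # lower bound only: power > N, speed < 1
        "sar": (1.0, 1.0 / n_bits),         # one comparator, N cycles per sample
    }


def energy_per_sample(power, speed):
    """Power/speed ratio, proportional to the energy per conversion."""
    return power / speed


def sync_sar_clock_hz(n_bits, fs_hz):
    """Internal clock a synchronous SAR needs: (N + 1) * F_s (tracking + N bit cycles)."""
    return (n_bits + 1) * fs_hz


for name, (power, speed) in first_order_metrics(6).items():
    print(f"{name:8s} power/speed = {energy_per_sample(power, speed):g}")
print(f"6-bit, 300-MS/s synchronous SAR clock: {sync_sar_clock_hz(6, 300e6) / 1e9:g} GHz")
```

For N = 6, the flash ratio is 64 and the SAR ratio is 6. The pipeline's actual ratio lies above its lower bound of 6. The last line reproduces the 2.1-GHz internal clock quoted for a 300-MS/s 6-bit SAR.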
Since an SAR converter is traditionally used in the low-conversion-rate regime, clock generation is less of an issue there. For a high-speed converter, however, generating this high-speed internal clock is a significant overhead. For example, a 300-MS/s 6-bit SAR would require a 2.1-GHz clock. Synthesizing such a high-frequency clock and distributing it would likely consume more power than the ADC itself. From a speed perspective, every clock cycle must accommodate the worst-case comparison time. This is the maximum DAC settling time plus the comparator resolving time for the minimum resolvable input level. In addition, every clock cycle requires margin for clock jitter, which either slows down the conversion or imposes a stringent jitter requirement on the clock generator. The power and speed limitations of a synchronous SA design therefore come largely from the high-speed internal clock.

Using asynchronous processing for the internal comparisons removes the need for such a clock and substantially improves power efficiency compared with a synchronous design. At the top level, a global clock running at the sample rate is still used for uniform sampling, since most digital baseband processing remains synchronous. The concept of asynchronous processing is to trigger the internal comparisons from MSB to LSB like dominoes. As shown in Fig. 3, whenever the current comparison is complete, a ready signal is generated to trigger the following comparison.

Fig. 3. Asynchronous processing concept.

The voltage difference $(V_{\rm res})$ between the input signal and the reference level determines the comparator resolving time.
For example, a typical regenerative latch has the following tradeoff between input voltage $(V_{\rm res})$ and resolving time $(T_{\rm cmp})$ [10]:

$$T_{\rm cmp} = \frac{\tau}{A_o - 1} \cdot \ln \frac{V_{\rm FS}}{V_{\rm res}} = K \cdot \ln \frac{V_{\rm FS}}{V_{\rm res}} \tag{1}$$

where $A_o$ is the small-signal gain of the internal inverting amplifier, $\tau$ is the time constant at the latch outputs, and $V_{\rm FS}$ is the full logic swing level. The tradeoff between resolving time and input voltage depends on the comparator topology. Nevertheless, this simple regenerative latch model provides intuition for how asynchronous processing improves the conversion speed. For an *N*-bit converter, the total resolving times of the asynchronous and synchronous designs can be expressed as

$$T_{\text{async}} = \sum_{i=0}^{N-1} K \cdot \ln \frac{V_{\text{FS}}}{V_{\text{res}}[i]}, \qquad T_{\text{sync}} = N \cdot K \cdot \ln \frac{V_{\text{FS}}}{V_{\text{min}}} \tag{2}$$

where $V_{\text{res}}[i]$ denotes the comparator input voltage at the ith stage (Fig. 3), and $V_{\min}$ is usually set by the LSB level. The asynchronous conversion benefits from the faster comparison cycles. Because of the successive approximation algorithm, only one of the $V_{\rm res}[i]$, $\forall i \in [0,N-1]$, will fall within $\pm 1/2$ LSB. The time saved by $T_{\rm async}$ relative to $T_{\rm sync}$ depends on the number of bits and on the profile of $V_{\rm res}[i]$, which in turn depends on the input voltage level. In the extreme case, a 1-bit converter does not benefit from asynchronous processing, since its only comparison cycle is always limited by the worst-case resolving time. As the number of bits increases, $V_{\rm res}[i]$ spreads over the full-scale range and thus creates time savings. Intuitively, the wider the range of $V_{\rm res}$, the faster the conversion.
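The resolving-time model of (1) and the sums in (2) can be evaluated directly. The sketch below rests on several assumptions. It models a binary SA search on an input in $[0, V_{\rm FS}]$ and sets $K = 1$ and $V_{\rm FS} = 1$, since only the ratio matters. It also floors $|V_{\rm res}|$ at $V_{\rm LSB}/2$, as in (5). The function names are hypothetical. For N = 6, a full-swing input gives the best-case ratio of 1/2 in (3). An input of $V_{\rm FS}/3$ gives about 0.71, the value of (5) at N = 6.

```python
import math


def residues(v_in, n_bits, v_fs=1.0):
    """Comparator inputs V_res[i] of a binary SA search, for v_in in [0, v_fs]."""
    res, r = [], v_in - v_fs / 2              # first comparison against mid-scale
    for i in range(n_bits):
        res.append(r)
        step = v_fs / 2 ** (i + 2)            # next reference step: V_FS/4, V_FS/8, ...
        r = r - step if r >= 0 else r + step  # residue after moving the DAC toward v_in
    return res


def t_async(v_in, n_bits, k=1.0, v_fs=1.0):
    """Eq. (2), asynchronous case, with |V_res| floored at V_LSB/2 as in (5)."""
    v_min = v_fs / 2 ** (n_bits + 1)
    return sum(k * math.log(v_fs / max(abs(r), v_min))
               for r in residues(v_in, n_bits, v_fs))


def t_sync(n_bits, k=1.0):
    """Eq. (2), synchronous case: every cycle budgets for V_min = V_LSB/2."""
    return n_bits * k * math.log(2 ** (n_bits + 1))


N = 6
best = t_async(1.0, N) / t_sync(N)       # full-swing input
worst = t_async(1 / 3, N) / t_sync(N)    # alternating-polarity input, V_FS/3
print(f"best {best:.3f}, worst {worst:.3f}")
```

Sweeping `v_in` over the full range keeps the ratio between 1/2 and 1. This matches the bounds derived analytically below.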
Numerical analysis of (2) shows that the best case occurs when the input signal is at full swing, i.e., $V_{\rm res}$ reaches $\pm 1/2\,V_{\rm FS}$. When $V_{\rm res}$ alternates its polarity in consecutive comparison cycles, the conversion time is longest, as shown in Fig. 4. To explore the theoretical performance bound of asynchronous processing, the ratio $T_{\rm async}/T_{\rm sync}$ is derived for both the best and worst cases as a function of the number of bits. In the best case, the $V_{\rm res}[i]$ are simply $V_{\rm FS}/2, V_{\rm FS}/4, V_{\rm FS}/8, \ldots, V_{\rm FS}/2^N$, assuming a binary successive approximation algorithm. Defining $V_{\rm LSB} = V_{\rm FS}/2^N$ and taking $V_{\min} = V_{\rm LSB}/2$, the minimum value of $T_{\rm async}/T_{\rm sync}$ can be expressed as

$$\left.\frac{T_{\text{async}}}{T_{\text{sync}}}\right|_{\text{min}} = \frac{\ln \frac{V_{\text{FS}}}{V_{\text{LSB}}} + \ln \frac{V_{\text{FS}}}{2V_{\text{LSB}}} + \ln \frac{V_{\text{FS}}}{4V_{\text{LSB}}} + \dots + \ln \frac{V_{\text{FS}}}{2^{N-1}V_{\text{LSB}}}}{N \cdot \ln \frac{V_{\text{FS}}}{V_{\text{LSB}}/2}} = \frac{N^2 \ln 2 - \frac{N}{2}(N-1)\ln 2}{N(N+1)\ln 2} = \frac{1}{2}. \tag{3}$$

Fig. 4. Best (solid line) and worst (dashed line) case of $V_{\rm res}$ profile.

Fig. 5. Simplified block diagrams of the ADC architecture.

In the worst case, the input voltage that produces comparison results of alternating polarity is easiest to see by increasing the number of bits from the 2-bit case. Assuming $V_{\rm res}$ begins on the positive side:
$$\begin{aligned}
\text{2-bit case} &\Rightarrow V_{\text{res}}[0] - \frac{V_{\text{FS}}}{4} < 0 \\
\text{3-bit case} &\Rightarrow V_{\text{res}}[0] - \frac{V_{\text{FS}}}{4} + \frac{V_{\text{FS}}}{8} > 0 \\
\text{4-bit case} &\Rightarrow V_{\text{res}}[0] - \frac{V_{\text{FS}}}{4} + \frac{V_{\text{FS}}}{8} - \frac{V_{\text{FS}}}{16} < 0 \\
&\;\;\vdots \\
&\Rightarrow \frac{V_{\text{FS}}}{8}\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots\right) < V_{\text{res}}[0] < \frac{V_{\text{FS}}}{4}\left(1 - \frac{1}{4} - \frac{1}{16} - \cdots\right) \\
&\Rightarrow V_{\text{res}}[0] \rightarrow \frac{1}{6} V_{\text{FS}} \quad \text{as the number of bits increases.}
\end{aligned} \tag{4}$$

From (4), the worst-case conversion time occurs when $V_{\rm in}$ is $V_{\rm FS}/3$ or $2V_{\rm FS}/3$, regardless of the number of bits. The maximum value of $T_{\rm async}/T_{\rm sync}$ is therefore given by (5). Note that the ratio $T_{\rm async}/T_{\rm sync}$ in (5) approaches 1/2 as N increases. In other words, the lower and upper bounds in (3) and (5) mean that asynchronous processing reduces the total resolving time by at most a factor of two compared with the synchronous case. Moreover, the conversion time saved over a synchronous approach increases with ADC resolution.

#### C. Architecture

There are several possible architectures that incorporate the asynchronous processing concept. This first prototype uses only one comparator with a charge-redistribution network, which gives a low-complexity implementation similar to an SAR converter. Because all internal comparisons use the same comparator, its offset needs no special attention in the analog domain: the resulting global offset can be subtracted in the digital domain. However, the overall conversion is slowed down because the comparator must be reset after each comparison cycle.
$$\left.\frac{T_{\text{async}}}{T_{\text{sync}}}\right|_{\text{max}} = \frac{\ln \frac{V_{\text{FS}}}{\frac{1}{6} V_{\text{FS}}} + \ln \frac{V_{\text{FS}}}{\frac{1}{12} V_{\text{FS}}} + \dots + \ln \frac{V_{\text{FS}}}{\max\left(\frac{1}{3 \cdot 2^N} V_{\text{FS}},\ V_{\text{LSB}}/2\right)}}{N \cdot \ln \frac{V_{\text{FS}}}{V_{\text{LSB}}/2}} = \frac{(N-1)\ln 3 + \ln 2 + \frac{N}{2}(N+1)\ln 2}{N(N+1)\ln 2} \tag{5}$$

The charge-redistribution capacitor network samples the input signal and serves as a digital-to-analog converter (DAC) for creating and subtracting reference voltages. Besides asynchronous processing, time interleaving [11] is used to increase the conversion rate beyond what a single ADC can achieve. Because the power and area overheads grow with the number of parallel converters, a single asynchronous ADC should be optimized for high speed and small silicon area. In this prototype (Fig. 5), two ADCs are time interleaved to double the sample rate of an individual ADC. The two-phase (0° and 180°) clocks are generated by on-chip inversion and used as sampling clocks and reset signals. The high input bandwidth (>4 GHz) achieved by the individual converter would actually allow additional time interleaving.

There are two critical delay paths in this architecture: the signal path and the timing path. In the signal path, each internal comparison result is stored in an SR latch, which acts as a buffering stage for the temporary bit caches.

Fig. 6. Dynamic comparator schematic.

In the asynchronous timing path, a ready-signal generator detects the comparator outputs and flags the completion of each comparison cycle. This ready signal drives a sequencer. The sequencer provides multiple-phase clocks for the switching logic and for the temporary bit caches that store the internal comparison results. A separate pulse generator creates a reset phase for the comparator to avoid any memory effect from previous comparisons.
Note that the ready-signal generator, pulse generator, and sequencer are dedicated digital logic functions for asynchronous conversion, and they occupy only a small portion of the silicon area. Finally, the bit streams at the outputs of the bit caches run at a high throughput, which makes real-time streaming off chip difficult. Therefore, a 1K-depth on-chip SRAM stores the converted data, which are later read out off chip at a much slower rate. The SRAMs are integrated solely for testing purposes and occupy most of the die area.

## III. CIRCUIT IMPLEMENTATION DETAILS

#### A. Dynamic Comparator and Ready Signal

The comparator design requires special consideration because it must also generate a data-ready signal. As shown in Fig. 6, a dynamic comparator composed of a pre-amplifier and a regenerative latch [12] is used. The complementary comparator outputs ($Q_p$ and $Q_n$) are connected to the positive supply during the reset phase. During comparison, one of them is pulled down to the negative supply. Digital logic that distinguishes state "11" (reset phase) from state "10" or "01" (data ready) therefore appears sufficient for generating the ready signal. However, asynchronous processing has one potential issue: the comparator can take a long time to resolve when its input voltage is small. Waiting for it to resolve fully would slow down or even halt the entire conversion. In fact, whenever the input signal is less than 1 LSB, the decision does not affect the converter accuracy. In that case, the ready-signal generator should still set the flag, and the decision is simply taken from the previous value stored in the SR latch.
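The ready-detection rule can be captured in a small behavioral model. The model includes the skewed-threshold NAND gate explained in the following paragraph. The 1.2-V supply matches Section IV. The threshold values, the sagging output levels (0.75 V and 0.70 V), and the output polarity are hypothetical. The SR-latch caching is reduced to "keep the previous bit." With a mid-supply threshold, the small-input case never signals ready, so the conversion would stall. With the skewed threshold, ready fires and the cached decision is reused.

```python
VDD = 1.2          # supply voltage (V) of the 1.2-V process
VTH_MID = 0.6      # hypothetical NAND switching threshold at mid-supply
VTH_SKEWED = 0.9   # hypothetical threshold skewed above the sagging common-mode level


def ready(qp, qn, vth):
    """NAND of the comparator outputs, each read as logic 1 when above vth."""
    return not (qp > vth and qn > vth)


def cached_bit(qp, qn, vth, previous):
    """Decision stored once ready fires; an unresolved output pair keeps the previous bit."""
    hp, hn = qp > vth, qn > vth
    if hp != hn:
        return 1 if hp else 0      # assumed polarity: Q_p high means "1"
    return previous                # no clean decision (e.g., both read low): reuse cache


# Reset, a resolved decision, and a small input where both outputs sag together.
for qp, qn in [(VDD, VDD), (VDD, 0.0), (0.75, 0.70)]:
    print(qp, qn, ready(qp, qn, VTH_MID), ready(qp, qn, VTH_SKEWED))
```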
When the comparator input is sufficiently small, both outputs ($Q_p$ and $Q_n$) drop together to a lower level. The key to generating the ready signal is therefore a complementary CMOS NAND gate whose input threshold is skewed above this level. When $Q_p$ and $Q_n$ both drop below the threshold, the NAND gate treats the result as data ready, and the conversion continues. Reset switches in both the pre-amplifier and latch stages reduce the comparator recovery time during the reset phase. Input offset cancellation is also used in the pre-amplifier stage, although, as mentioned earlier, it is not critical in this ADC architecture. Current mirrors between the two stages help to reduce charge kickback [13]. The kickback comes from the logic-level swing of the latch and couples onto the input capacitors preceding the pre-amplifier. This is especially important because the input capacitor network is pushed to its minimum possible value, as described shortly.

#### B. Nonbinary Successive Approximation Review

Instead of a binary successive approximation scheme, this ADC adopts redundancy that tolerates dynamic decision errors, allowing faster conversion [14]. In other words, the overlapped search range compensates for wrong decisions made in earlier stages, as long as the errors are within the tolerance range. This removes the requirement that the DAC settle to within 1 LSB and thus reduces the settling time. The equivalent radix is less than 2 and is computed as

$$\text{Radix} = 2^{N_{\rm bit}/(N_{\rm bit} + N_{\rm rdn})} \tag{6}$$

where $N_{\rm bit}$ is the target resolution in bits and $N_{\rm rdn}$ is the number of redundant bits. In this prototype, one redundant bit is added to the 6-bit target resolution, so the equivalent radix is $2^{6/7} \approx 1.81$. This reduces the DAC settling time by about 50%, at the cost of only about 15% more conversion time for the extra comparison cycle.

Fig. 7. Conventional implementation of radix creation.
(a) Geometrically scaled capacitor array. (b) Unitary capacitor array.

Fig. 8. Series nonbinary capacitive ladder. (a) Ideal case without parasitic capacitance. (b) Inclusion of parasitic capacitance.

There are two conventional implementation approaches, shown in Fig. 7. The geometrically scaled capacitor array uses a parallel bank of capacitors ratioed as $1, \alpha, \alpha^2, \dots, \alpha^{N-1}$, as shown in Fig. 7(a). Its advantage is the low complexity of the switching logic, since only one capacitor is switched in each comparison cycle. The propagation delay and power consumption of the switching logic are therefore expected to be lower. However, in this prototype the ratio $\alpha$ is noninteger, which significantly increases layout complexity and matching difficulty [15] for a full array. Alternatively, a unitary capacitor array [Fig. 7(b)] avoids this noninteger matching issue, with the nonbinary code words stored in a digital ROM. However, the propagation delay through a digital ROM is much larger than in the previous case because of its greater logic depth. Moreover, the total input capacitance of both schemes is on the order of $2^N$ times the unit capacitance. Matching and parasitic considerations set the unit capacitance at tens of femtofarads. Even for a 6-bit case, the total capacitance can therefore be on the order of picofarads. This increases power consumption and makes a high input bandwidth difficult to maintain.

#### C. Series Nonbinary Capacitive Ladder

Another approach was therefore taken to create an arbitrary radix, in effect an analog ROM. It uses a ladder structure of nonbinary capacitors, which significantly reduces the input capacitance and relaxes the matching and layout requirements. As shown in Fig. 8(a), capacitors of three different sizes, ratioed $1:\alpha:\beta$, are used to build the ladder.
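The redundancy argument behind (6) can be checked with a behavioral model, independent of whether the radix comes from the arrays of Fig. 7 or the ladder introduced above. The sketch below makes these assumptions: a signed input in [-1, 1], geometric weights scaled so that the smallest equals 1/64 (half of a 6-bit LSB over that range), and an ideal comparator except for one forced wrong decision. The `tolerance` expression is the standard overlap condition, derived here for illustration rather than taken from the paper. All names are hypothetical.

```python
N_BIT, N_RDN = 6, 1
M = N_BIT + N_RDN                      # number of comparison cycles
RADIX = 2 ** (N_BIT / M)               # eq. (6): about 1.81

W_LAST = 1 / 2 ** N_BIT                # assumed scale: smallest weight = 1/64
NONBINARY = [W_LAST * RADIX ** (M - 1 - i) for i in range(M)]
BINARY = [1 / 2 ** (i + 1) for i in range(N_BIT)]


def sa_convert(v, weights, flip_step=None):
    """Signed SA search returning sum(d_i * w_i); flip_step forces one wrong decision."""
    est = 0.0
    for i, w in enumerate(weights):
        d = 1 if v >= est else -1
        if i == flip_step:
            d = -d                     # emulate a dynamic error (e.g., unsettled DAC)
        est += d * w
    return est


def tolerance(weights, i):
    """Largest |residue| at step i for which a wrong decision is still recovered."""
    return sum(weights[i + 1:]) + weights[-1] - weights[i]


v = 0.05
err_nonbinary = abs(sa_convert(v, NONBINARY, flip_step=0) - v)  # stays within 1/64
err_binary = abs(sa_convert(v, BINARY, flip_step=0) - v)        # does not recover
print(f"radix {RADIX:.3f}: nonbinary error {err_nonbinary:.4f}, binary error {err_binary:.4f}")
```

With a wrong MSB decision at $v = 0.05$, the binary search ends about 0.066 away from the input. The radix-1.81 search still lands within 1/64, because its first-step tolerance (about 0.12) exceeds $|v|$. The price is the seventh comparison.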
The design goal of the ladder is to make the equivalent capacitance at every internal node identical, i.e., $\beta \cdot C_u$. Charge redistribution from one section to the adjacent one then always sees the same capacitive divider between $\alpha \cdot C_u$ and $\beta \cdot C_u$, and this division ratio determines the radix of the SA algorithm. Based on these observations, the design equations of the ladder are

$$\begin{cases} \beta = 1 + \alpha || \beta \\ \text{Radix} = 1 + (\beta/\alpha) \end{cases} \tag{7}$$

where the operator || is defined as $x||y = xy/(x+y)$. Because the capacitors are connected in series, the equivalent capacitance decreases, which reduces both the DAC settling time and the total input capacitance. The traditional tradeoff between matching and total input capacitance is removed, since the reduction does not depend on shrinking the unit capacitance. The total input capacitance of the proposed ladder no longer depends on the number of ADC bits and is given by

$$C_{\rm in} = [1 + 2 \cdot (\alpha || \beta)] \cdot C_u. \tag{8}$$

Fig. 9. LMS calibration loop. (a) Calibration with ramp input. (b) Calibration with sine wave input.

One potential issue with this ladder structure is its vulnerability to parasitic capacitance from interconnects or from the capacitors themselves. This is especially true when the capacitors are low-cost metal-oxide-metal (MOM) finger capacitors, available in a standard digital CMOS process, rather than higher quality metal-insulator-metal (MIM) capacitors. In the layout, MOM capacitors of three different sizes with almost identical finger lengths are instantiated in a three-row array, with dummy capacitors placed on each side for matching. The extra capacitance introduced at the floating nodes can change the effective radix if the parasitic capacitance is not negligible compared with the ladder capacitors.
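A small numerical sketch of the ladder equations, assuming the 6-bit, one-redundant-bit radix of (6), is shown below. The closed-form solution of (7) is algebra derived here by substituting the radix relation into the node equation; it is not quoted from the paper. The sketch then evaluates (8) and applies the parasitic correction (10), given further on, to check that it preserves the radix defined by (9). It also compares the result against a Fig. 7(a)-style array of seven capacitors scaled by the radix. The parasitic ratios $p_1 = 0.10$ and $p_2 = 0.05$ are hypothetical.

```python
def par(x, y):
    """Series combination x || y = xy / (x + y), as defined after (7)."""
    return x * y / (x + y)


def ladder_ratios(radix):
    """Closed-form solution of (7), in units of C_u.

    Substituting beta = (radix - 1) * alpha into beta = 1 + alpha || beta gives
    alpha = radix / (radix - 1)**2 and beta = radix / (radix - 1)."""
    return radix / (radix - 1) ** 2, radix / (radix - 1)


def input_cap(alpha, beta):
    """Total ladder input capacitance of (8), in units of C_u."""
    return 1 + 2 * par(alpha, beta)


def parasitic_corrected(alpha, beta, p1, p2):
    """Modified ratios of (10) for parasitics p1*C_u and p2*C_u."""
    return (1 + p1) * alpha, (1 + p1) * beta - p2


radix = 2 ** (6 / 7)                          # eq. (6): N_bit = 6, N_rdn = 1
alpha, beta = ladder_ratios(radix)
c_ladder = input_cap(alpha, beta)             # does not depend on the number of bits
c_array = sum(radix ** i for i in range(7))   # Fig. 7(a)-style array: 1 .. radix**6

p1, p2 = 0.10, 0.05                           # hypothetical parasitic ratios
alpha_mod, beta_mod = parasitic_corrected(alpha, beta, p1, p2)
radix_mod = 1 + (beta_mod + p2) / alpha_mod   # (9): Radix = 1 + beta'/alpha, beta' = beta + p2
print(f"alpha={alpha:.3f} beta={beta:.3f} C_in={c_ladder:.2f} C_u vs array {c_array:.1f} C_u")
```

For a radix of $2^{6/7}$, this gives $\alpha \approx 2.75$, $\beta \approx 2.23$, and $C_{\rm in} \approx 3.46\,C_u$, compared with roughly $78\,C_u$ for the geometrically scaled array.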
In this prototype, the unit capacitance is set to the minimum possible value, and MOM capacitors have nonnegligible fringing capacitance. This calls for new design equations that include the parasitic capacitances $p_1 \cdot C_u$ and $p_2 \cdot C_u$ denoted in Fig. 8(b):

$$\begin{cases} \beta = 1 + \alpha || \beta' + p_1 \\ \text{Radix} = 1 + (\beta'/\alpha) \\ \beta' = \beta + p_2. \end{cases} \tag{9}$$

By solving (7) and (9), one can show that the new ratios ($\alpha_{\rm mod}$ and $\beta_{\rm mod}$) are related to the original ones ($\alpha_{\rm org}$ and $\beta_{\rm org}$) as follows:

$$\begin{cases} \alpha_{\text{mod}} = (1 + p_1) \cdot \alpha_{\text{org}} \\ \beta_{\text{mod}} = (1 + p_1)\beta_{\text{org}} - p_2. \end{cases} \tag{10}$$

In this design, the parasitic capacitances were estimated with a standard capacitor model and conventional EDA extraction tools, which is accurate enough at this level of ADC resolution. The proposed ladder is used in a passive bottom-plate sampling network. The input bandwidth exceeds 4 GHz with a relatively small sampling switch, thanks to the small total input capacitance of 90 fF. Note that the input signal generator has a 50-$\Omega$ source impedance.

#### D. Digital Calibration Scheme

Although the systematic error is accounted for in the modified design equations, the ADC remains vulnerable to random errors such as capacitor mismatch and parasitic variation. These random errors can change the effective radix from MSB to LSB and thus degrade the linearity of the ADC. Much like the gain error of a residue amplifier in a pipeline ADC, this requires the digital combining weights to be corrected by estimating the actual gain [16]. In this prototype, a foreground digital calibration scheme was developed to correct the combining weights; it is currently implemented off-chip.
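The ramp-based foreground calibration of Fig. 9(a), whose steps are described in the next paragraph, can be sketched behaviorally. The sketch converts a known ramp and reconstructs the reference by a best linear fit of the output computed with the initial weights. It then adapts the combining weights with LMS, treating the ±1 decisions as the inputs of an adaptive FIR filter. Every numerical choice here is an assumption: the signed SA model and weight scale, the mismatch values, the step size `mu`, the epoch count, and the shuffled sample order. This is not the authors' implementation.

```python
import random

M = 7
RADIX = 2 ** (6 / M)                                      # eq. (6), N_bit = 6, N_rdn = 1
W_NOM = [RADIX ** (M - 1 - i) / 64 for i in range(M)]    # initial weights (assumed scale)
MISMATCH = [0.04, -0.03, 0.02, -0.02, 0.03, -0.01, 0.0]  # hypothetical random errors
W_TRUE = [w * (1 + e) for w, e in zip(W_NOM, MISMATCH)]  # weights the DAC actually applies


def sa_bits(v, dac_weights):
    """Signed SA decisions (+1/-1) produced by a DAC with the given weights."""
    est, bits = 0.0, []
    for w in dac_weights:
        d = 1 if v >= est else -1
        bits.append(d)
        est += d * w
    return bits


def dot(w, bits):
    return sum(wi * bi for wi, bi in zip(w, bits))


def line_fit(x, y):
    """Least-squares straight line y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def rms_nonlinearity(x, y):
    """RMS deviation of y from its own best-fit line."""
    slope, icpt = line_fit(x, y)
    return (sum((b - slope * a - icpt) ** 2 for a, b in zip(x, y)) / len(x)) ** 0.5


def lms_calibrate(codes, ref, w_init, mu=0.01, epochs=40, seed=0):
    """LMS adaptation of the combining weights (FIR taps driven by the bits)."""
    w, order, rng = list(w_init), list(range(len(codes))), random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)                   # random presentation order (an assumption)
        for k in order:
            err = ref[k] - dot(w, codes[k])
            w = [wi + mu * err * bi for wi, bi in zip(w, codes[k])]
    return w


t = list(range(500))
ramp = [-0.95 + 1.9 * k / 499 for k in t]               # known full-swing ramp
codes = [sa_bits(v, W_TRUE) for v in ramp]
y_init = [dot(W_NOM, b) for b in codes]                 # output with initial weights
slope, icpt = line_fit(t, y_init)
ref = [slope * k + icpt for k in t]                     # best linear fit = reference
w_cal = lms_calibrate(codes, ref, W_NOM)
rms_before = rms_nonlinearity(t, y_init)
rms_after = rms_nonlinearity(t, [dot(w_cal, b) for b in codes])
print(f"RMS nonlinearity: {rms_before:.4f} before, {rms_after:.4f} after")
```

The reference is the best fit to the initially combined output, so the calibrated weights can differ from `W_TRUE` by a common gain factor. Such a gain error does not affect linearity, which is why the comparison measures deviation from a best-fit line.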
The calibration approach was to inject a known input signal into the ADC and use the converted outputs, combined with initial weights, to reconstruct the input signal. With the reconstructed signal as the reference for an LMS loop, the combining weights can be adapted to their actual values. Alternatively, the combining weights can be calculated directly through matrix operations using the orthogonality principle. The LMS loop is used instead because its lower algorithmic complexity enables potential on-chip integration. As shown in Fig. 9(a), a ramp signal that spans the full-swing range is injected as the known signal. The reference signal is reconstructed through a best linear curve fit of the ADC outputs, computed with the initial guess of the combining weights. Next, the same ADC output code words are fed into an adaptive finite impulse response (FIR) filter whose taps converge to the actual combining weights. Simulation results showed that the quantization error improves after calibration using several hundred samples. Alternatively, a sine wave of a known frequency and full-swing amplitude can serve as the prior information, as illustrated in Fig. 9(b). A fast Fourier transform (FFT) processor reconstructs the sine wave by extracting its amplitude, phase, and offset at the fundamental frequency. A sine wave has one advantage over a ramp: a highly linear sine wave is potentially easier to generate on chip. The digital calibration could then become an on-chip self-calibration scheme and be extended to higher ADC resolution.

#### E. Variable Duty-Cycled Clock

The global sampling clock has two requirements. First, a variable duty cycle is needed to adjust the time allocation between the tracking and conversion phases for testing purposes. Second, low clock jitter is necessary to sample RF signals directly.
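A standard first-order bound shows why direct RF sampling makes jitter critical. The bound is not from this paper. For a full-scale sine at frequency $f_{\rm in}$ and RMS clock jitter $\sigma_j$, the jitter-limited SNR is $-20\log_{10}(2\pi f_{\rm in}\sigma_j)$. Inverting it gives the largest tolerable jitter for a target SNR. The sketch assumes jitter is the only noise source and uses the ideal 6-bit quantization SNR as the target.

```python
import math


def jitter_limited_snr_db(f_in_hz, sigma_j_s):
    """SNR limit from RMS aperture jitter for a full-scale sine input."""
    return -20 * math.log10(2 * math.pi * f_in_hz * sigma_j_s)


def max_rms_jitter_s(f_in_hz, target_snr_db):
    """Largest RMS jitter whose jitter-limited SNR still meets target_snr_db."""
    return 10 ** (-target_snr_db / 20) / (2 * math.pi * f_in_hz)


SNR_6BIT_DB = 6.02 * 6 + 1.76     # ideal 6-bit quantization SNR, about 37.9 dB

for f_in in (1e9, 2e9, 4e9):
    sigma = max_rms_jitter_s(f_in, SNR_6BIT_DB)
    print(f"f_in = {f_in / 1e9:.0f} GHz -> RMS jitter <= {sigma * 1e12:.2f} ps")
```

This gives roughly 2 ps at 1 GHz and about 0.5 ps at 4 GHz, the same picosecond order of magnitude as the requirement stated next.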
A worst-case analysis shows that the RMS jitter should be on the order of picoseconds to support subsampling. The clock generation is illustrated in Fig. 10. Two sinusoidal waves are generated off-chip with a tunable phase skew between them. They are regenerated on chip and combined with an AND gate, so the phase skew determines the duty cycle of the clock. The 180° phase-shifted clock is obtained by simply inverting the two sinusoidal waves and passing them through the same combination logic.

Fig. 10. Variable duty-cycled clock generation.

Special attention was paid at both the logic and layout levels to ensure an exact 180° phase shift, since any phase imbalance causes extra distortion or requires additional calibration. The clock jitter is minimized by careful layout, a dedicated power domain, and a clean clock source with extra bandpass filtering. In addition, the sampling-clock edges must be fast enough to reduce jitter, which costs extra power in the large buffers. The jitter due to the intrinsic noise of the logic gates was analyzed and simulated to ensure that it is well below the specification.

#### F. High-Speed Digital Logic

The speed of the asynchronous and switching logic is also critical to the conversion rate. Therefore, all digital logic in the critical path is implemented as custom dynamic logic, sized using the method of logical effort [17], and carefully laid out. The dynamic logic uses a weak keeper transistor to compensate for charge leakage and improve noise immunity. Moreover, the dynamic registers are designed for minimal clock loading, since they are driven by the asynchronous logic. Note that the pulse duration is adjusted by a variable MOS resistor operated deep in the triode region. This allows the tradeoff between conversion speed and dynamic error to be explored.
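The method of logical effort [17] sizes a path so that every stage bears the same effort. A minimal sketch for a simple chain without branching follows. The gate mix, the electrical effort of 20, and the static-CMOS logical-effort values are illustrative assumptions. The paper's custom dynamic gates have different, typically lower, logical efforts, and their values are not given.

```python
def logical_effort_path(g, p, h_path):
    """Minimum-delay sizing of a gate chain by the method of logical effort.

    g: logical effort per stage; p: parasitic delay per stage (units of tau);
    h_path: electrical effort C_load / C_in of the whole path (no branching assumed).
    Returns (stage effort f, path delay D in tau, input capacitance of each stage
    normalized to the path input capacitance)."""
    n = len(g)
    path_effort = h_path
    for gi in g:
        path_effort *= gi                 # F = G * H (branching effort B = 1)
    f = path_effort ** (1.0 / n)          # equal stage effort minimizes delay
    delay = n * f + sum(p)
    caps, c_out = [], float(h_path)       # work backward from the load
    for gi in reversed(g):
        c_in = gi * c_out / f             # C_in = g * C_out / f for each stage
        caps.append(c_in)
        c_out = c_in
    return f, delay, caps[::-1]


# Hypothetical path: NAND2 -> NOR2 -> inverter driving 20x its input capacitance,
# with textbook static-CMOS values g = 4/3, 5/3, 1 and p = 2, 2, 1.
f, delay, caps = logical_effort_path([4 / 3, 5 / 3, 1.0], [2.0, 2.0, 1.0], h_path=20.0)
print(f"stage effort {f:.2f}, delay {delay:.2f} tau, input caps {[round(c, 2) for c in caps]}")
```

The first entry of `caps` comes back as 1, which confirms that the backward sizing is consistent with the chosen path effort.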
The less critical digital blocks, such as the bit caches and the SRAM controller, are built from foundry-provided standard cells to save design time. Nevertheless, the timing constraint between the ADC and the test SRAM is still tight, which requires careful design of the interface circuitry.

## IV. MEASURED RESULTS

The prototype ADC was fabricated in a 1.2-V $0.13-\mu m$ six-metal one-poly digital CMOS process. Chip-on-board packaging was used on two PCB versions to measure ADC performance below and above the Nyquist frequency; the latter investigates subsampling. A microphotograph is shown in Fig. 11. The total chip measures $1.7 \times 1.4\ \mathrm{mm}^2$, while each ADC occupies only $250 \times 240\ \mu\mathrm{m}^2$, which reduces the overhead of time interleaving.

Fig. 11. Die micrograph.

Fig. 12. DNL and INL before and after combining weights calibration.

The static performance is characterized through differential nonlinearity (DNL) and integral nonlinearity (INL) measurements. As shown in Fig. 12, DNL and INL improve from over 1 LSB to within half an LSB after the combining-weight calibration described in Section III-D. This corresponds to a 2-dB improvement in signal-to-noise-and-distortion ratio (SNDR), which implies that random errors are not significant at the 6-bit level. The dynamic performance measurements [Fig. 13(a)] show that the effective number of bits (ENOB) of a single ADC decreases from 5.3 bits at 300 MS/s to 3.7 bits at 500 MS/s. This demonstrates the straightforward tradeoff between ENOB and conversion rate that is inherent in the proposed architecture. In Fig. 13(b), the dynamic performance is further explored with RF inputs above Nyquist, ranging from 3 to 5 GHz. The SNDR remains above 30 dB even for input frequencies above 4 GHz. Fig. 14 shows the performance of time-interleaving two of the ADCs to achieve a 600-MS/s sampling rate at twice the power and area.
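A short sketch shows why per-channel offsets matter in a two-way time-interleaved ADC. With 0° and 180° clocks, even samples come from one ADC and odd samples from the other. An offset difference between the two channels then appears as a tone at $f_s/2$, and subtracting each channel's offset removes it, which is what the offset correction in the next paragraph does. The paper does not say how the offsets were estimated. Taking each channel's mean over a coherently sampled record is an assumption here, as are the signal and offset values.

```python
import math


def interleaved_samples(n, cycles, amp, offsets):
    """Coherently sampled sine; even samples from ADC 0, odd from ADC 1 (0/180 deg clocks)."""
    return [amp * math.sin(2 * math.pi * cycles * k / n) + offsets[k % 2] for k in range(n)]


def nyquist_tone(x):
    """Amplitude of the DFT bin at f_s/2, |sum x[k] * (-1)**k| / N."""
    return abs(sum(v if k % 2 == 0 else -v for k, v in enumerate(x))) / len(x)


def remove_channel_offsets(x):
    """Subtract each channel's mean, used here as an (assumed) offset estimate."""
    means = [sum(x[c::2]) / len(x[c::2]) for c in (0, 1)]
    return [v - means[k % 2] for k, v in enumerate(x)]


N, CYCLES = 1024, 7                 # odd cycle count in a power-of-two record
x = interleaved_samples(N, CYCLES, amp=0.9, offsets=(0.02, -0.01))
spur_before = nyquist_tone(x)       # about |0.02 - (-0.01)| / 2 = 0.015
spur_after = nyquist_tone(remove_channel_offsets(x))
print(f"f_s/2 tone: {spur_before:.4f} before, {spur_after:.1e} after")
```

Coherent sampling makes the sine average to zero within each channel, so the channel means isolate the offsets exactly. With real data, noise and non-coherent records would leave a small residual tone.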
Off-chip digital subtraction of each ADC's offset removes spurious tones, improving the SNDR by 0.7 dB. There is little SNDR reduction at low input frequencies, but as the input frequency rises above 300 MHz, the clock skew between the two paths reduces the SNDR by several decibels. To confirm this, the clock skew is extracted with a Hilbert transformer and then compensated by digital interpolation. The results (Fig. 15) show that the spurious-free dynamic range (SFDR) improves by 11 dB and the SNDR by 2 dB. The total power consumption, excluding the SRAM and I/O pads, is 5.3 mW, of which the analog, digital, and clock sections consume 1.2 mW, 3.2 mW, and 0.9 mW, respectively. The performance of the chip is summarized in Table I.

Fig. 13. Measured SNDR versus $f_s$ and $f_{in}$ for a single ADC (a) below and (b) above the Nyquist frequency.

Fig. 14. (a) Measured SNDR versus $f_s$ and $f_{in}$ for the time-interleaved ADC and (b) its FFT spectrum measured at 159-MHz input.

Fig. 15. FFT spectrum before and after clock skew calibration.

#### V. CONCLUSION

Because a medium-resolution ADC is not limited by $kT/C$ noise, it generally benefits from technology scaling. This also applies to the proposed asynchronous ADC architecture, since most of its circuits are open-loop and digitally operated, limited only by capacitor matching accuracy. A first-order power and speed analysis of technology scaling is carried out assuming constant-field scaling, i.e., transistor dimensions and supply voltage both scale down by 1/S. The conversion time is dominated by the signal tracking time, the comparator speed, and the digital propagation delay. The capacitor array is assumed fixed in value to preserve its matching, and the on-resistance of the MOS switch is deliberately scaled down by keeping W fixed. Therefore, the tracking time constant $(R_{\rm on}C_s)$, the comparator delay ($\propto 1/f_T$ of a transistor), and the digital gate delay $(CV/I)$ all scale down by 1/S when velocity saturation is included.
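Why fixing W scales the switch on-resistance down can be seen from the long-channel triode approximation. This is a simplifying assumption: mobility $\mu$ is taken as constant, and velocity saturation is ignored for the small drain-source voltage across a closed switch. Under constant-field scaling, $L \to L/S$, $t_{ox} \to t_{ox}/S$ (so $C_{ox} \to S\,C_{ox}$), and $V_{ov} \to V_{ov}/S$, while $W$ and $C_s$ are held fixed:

$$R_{\rm on} \approx \frac{1}{\mu C_{ox} (W/L) V_{ov}} \;\longrightarrow\; \frac{1}{\mu\,(S C_{ox})\,(S W/L)\,(V_{ov}/S)} = \frac{R_{\rm on}}{S}, \qquad R_{\rm on} C_s \;\longrightarrow\; \frac{R_{\rm on} C_s}{S}.$$

If W were scaled along with L instead, W/L would stay constant, the factors of S would cancel, and $R_{\rm on}$ (and hence the tracking time constant with fixed $C_s$) would not improve.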
From a power perspective, if the overdrive voltage and W/L are assumed fixed, the analog power scales down with the supply voltage, i.e., by 1/S. The digital switching power $(fCV^2)$ scales down by $1/S^2$ [18], while that of the clock network scales down less because of the relatively large sampling switch. Table II summarizes these scaling trends, which predict that the figure of merit (FOM) defined in (11) improves by at least a factor of $S^2$, making the architecture even more attractive as technology scales.

$$\mathrm{FOM} = \frac{\mathrm{Power}}{2^{\mathrm{ENOB}} \cdot f_s} \tag{11}$$

Fig. 16 shows the typical use of ADC topologies in terms of resolution and sampling rate. Traditionally, flash-type ADCs, including subranging and folding converters, dominate the high-speed, medium-resolution regime, at the cost of excess power consumption. A successive approximation (SAR) converter avoids that extra power; however, it is normally used for medium to high resolution and is limited to speeds ranging from kHz to tens of MHz.

TABLE I
PERFORMANCE SUMMARY (25°C)

| Parameter | Value |
|---|---|
| Technology | 0.13-μm 6M1P digital CMOS |
| Package | Chip on board |
| Resolution | 6 bit |
| Sampling rate | 300–500 MS/s for a single ADC (600 MS/s–1 GS/s time-interleaved) |
| Supply voltage | 1.2 V |
| Input 3-dB bandwidth | >4 GHz |
| Peak SNDR | 34 dB at 600 MS/s |
| FOM | 0.22 pJ/conversion step |
| Power: analog | 1.2 mW |
| Power: digital | 3.2 mW |
| Power: clock | 0.9 mW |
| Power: total | 5.3 mW |

TABLE II
TECHNOLOGY SCALING ON THE PROPOSED ADC ARCHITECTURE

| Quantity | Dominant term | Scaling |
|---|---|---|
| $T_{track}$ | $RC$ | $1/S$ |
| $T_{comp}$ | $1/f_T$ | $1/S$ |
| $T_{dig}$ | $RC$ | $1/S$ |
| $P_{analog}$ | $IV$ | $1/S$ |
| $P_{clk}$ | $fCV^2$ | between $1/S$ and $1/S^2$ |
| $P_{dig}$ | $fCV^2$ | $1/S^2$ |
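Equation (11) can be checked against Table I. The sketch below converts SNDR to ENOB with the usual full-scale-sine relation $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$ (a standard convention; the paper does not state which conversion it used). With the 5.3-mW total power, 34-dB peak SNDR, and 600-MS/s rate, it gives roughly 0.22 pJ per conversion step. The last helper encodes the first-order scaling argument under the stated assumptions: the sample rate grows by S, power shrinks by a factor between S and $S^2$, and ENOB is unchanged.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine: ENOB = (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def fom_joules(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """FOM of (11): Power / (2**ENOB * fs), i.e., energy per conversion step in joules."""
    return power_w / (2.0 ** enob_from_sndr(sndr_db) * fs_hz)


def fom_scale_factor(s: float, power_exponent: float) -> float:
    """First-order FOM multiplier after scaling by 1/S (assumption: fs grows by S and
    power shrinks as S**-power_exponent, with power_exponent between 1 and 2)."""
    return s ** (-power_exponent) / s


if __name__ == "__main__":
    enob = enob_from_sndr(34.0)
    fom = fom_joules(5.3e-3, 34.0, 600e6)
    print(f"ENOB = {enob:.2f} bits, FOM = {fom * 1e12:.3f} pJ/conversion step")
    for p in (1.0, 2.0):
        print(f"S = 2, power ~ 1/S^{p:.0f}: FOM x {fom_scale_factor(2.0, p):.3f}")
```

For S = 2 the helper gives FOM factors of 1/4 and 1/8, i.e., an improvement of between $S^2$ and $S^3$, which matches the "at least $S^2$" trend summarized in Table II.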
By using asynchronous processing and scaled technology, the speed of the SA algorithm implementation has been raised to hundreds of MHz, with the potential for further improvement in the future. The proposed architecture can easily be extended to higher resolution before being limited by $kT/C$ noise. In fact, the FOM improves as the resolution increases, since the conversion speed decreases only in proportion to the number of bits (roughly one comparison cycle per bit), while the number of quantization levels $(2^{\rm ENOB})$ grows exponentially. Finally, the greater than 4-GHz input bandwidth makes it possible to time-interleave up to a sampling rate of close to 8 GHz for a Nyquist ADC.

In summary, an asynchronous ADC architecture has been demonstrated that achieves high power efficiency for a high-speed, medium-resolution converter. While the asynchronous processing concept can be incorporated into other ADC topologies, the first prototype has been realized with an SA architecture and achieves an FOM of 0.22 pJ/conversion step in a $0.13-\mu m$ CMOS process. The FOM of the proposed architecture is anticipated to improve further with continued technology scaling.

Fig. 16. Future role of the SA architecture.

#### ACKNOWLEDGMENT

The authors would like to thank Taiyo Yuden and Colby Instruments for donating parts, and Y. Chiu, L. Tee, B. Tsang, and J. Vanderhaegen for providing comments.

#### REFERENCES

- [1] M. S. W. Chen and R. W. Brodersen, "A subsampling UWB radio architecture by analytic signaling," in *Proc. ICASSP*, May 2004, vol. 4, pp. 533–536.
- [2] P. Scholtens and M. Vertregt, "A 6 bit 1.6 GS/s flash ADC in 0.18 μm CMOS using averaging termination," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2002, vol. 1, pp. 168–457.
- [3] X. Jiang, Z. Wang, and F. Chang, "A 2 GS/s 6 b ADC in 0.18 μm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2003, vol. 1, pp. 322–497.
- [4] R. Taft et al., "A 1.8-V 1.6-GSample/s 8-b self-calibrating folding ADC with 7.26 ENOB at Nyquist frequency," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, vol. 1, pp. 252–256.
- [5] P. Figueiredo et al., "A 90nm CMOS 1.2V 6b 1GS/s two-step subranging ADC," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 568–569.
- [6] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16 pJ/conversion-step 2.5 mW 1.25 GS/s 4b ADC in a 90 nm digital CMOS process," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 566–567.
- [7] D. Draxelmayr, "A 6b 600 MHz 10 mW ADC array in digital 90 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 264–527.
- [9] B. Murmann and B. Boser, "A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2040–2050, Dec. 2003.
- [10] H. J. M. Veendrick, "The behavior of flip-flops used as synchronizers and prediction of their failure rate," *IEEE J. Solid-State Circuits*, vol. SC-15, no. 2, pp. 169–176, Apr. 1980.
- [11] W. Black and D. Hodges, "Time interleaved converter arrays," *IEEE J. Solid-State Circuits*, vol. SC-15, no. 6, pp. 1022–1029, Dec. 1980.
- [12] G. Yin, F. Eynde, and W. Sansen, "A high-speed CMOS comparator with 8-b resolution," *IEEE J. Solid-State Circuits*, vol. 27, no. 2, pp. 208–211, Feb. 1992.
- [13] K. Bult and A. Buchwald, "An embedded 240-mW 10-b 50 MS/s CMOS ADC in 1-mm$^2$," *IEEE J. Solid-State Circuits*, vol. 32, no. 12, pp. 1887–1895, Dec. 1997.
- [14] F. Kuttner, "A 1.2 V 10b 20 MSample/s nonbinary successive approximation ADC in 0.13 μm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2002, vol. 1, pp. 176–177.
- [15] A. Hastings, *The Art of Analog Layout*. Englewood Cliffs, NJ: Prentice-Hall, 2001.
- [16] A. N. Karanicolas, H. Lee, and K. L. Bacrania, "A 15-b 1-Msample/s digitally self-calibrated pipeline ADC," *IEEE J. Solid-State Circuits*, vol. 28, no. 12, pp. 1207–1215, Dec. 1993.
- [17] I. Sutherland, R. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*. San Francisco, CA: Morgan Kaufmann, 1999.
- [18] J. M. Rabaey et al., *Digital Integrated Circuits: A Design Perspective*. Upper Saddle River, NJ: Pearson Education, 2003.

**Shuo-Wei Michael Chen** received the B.S. degree from National Taiwan University, Taipei, Taiwan, R.O.C., in 1998 and the M.S. degree from the University of California, Berkeley (UC Berkeley), in 2002, both in electrical engineering. He is currently working toward the Ph.D. degree at UC Berkeley, where he is a member of the Berkeley Wireless Research Center. His current research interests include low-power and high-speed mixed-signal circuits, ultra-wideband system design, and digital baseband and digital ASIC implementation.

Mr. Chen received an honorable mention in the Asian Pacific Mathematics Olympiad in 1994. He was the recipient of the UC Regents' Fellowship at Berkeley in 2000 and the Analog Devices Outstanding Student Award for recognition in IC design in 2006.

**Robert W. Brodersen** (M'76–SM'81–F'82) received the Ph.D. degree from the Massachusetts Institute of Technology, Cambridge, in 1972. He was then with the Central Research Laboratory at Texas Instruments for three years. Following that, he joined the Electrical Engineering and Computer Science faculty of the University of California at Berkeley, where he is now the John Whinnery Chair Professor and Co-Scientific Director of the Berkeley Wireless Research Center. His research is focused on low-power design and wireless communications and on the CAD tools necessary to support these activities.

Prof. Brodersen has won best paper awards for a number of journal and conference papers in the areas of integrated circuit design, CAD, and communications, including the W. G. Baker Award in 1979. In 1983, he was co-recipient of the IEEE Morris Liebmann Award. He received Technical Achievement Awards from the IEEE Circuits and Systems Society in 1986 and from the Signal Processing Society in 1991. In 1988, he was elected a member of the National Academy of Engineering.
In 1996, he received the IEEE Solid-State Circuits Society Award, and in 1999 he received an honorary doctorate from the University of Lund in Sweden. In 2000, he received a Millennium Award from the Circuits and Systems Society and the Golden Jubilee Award from the IEEE. In 2001, he received the Lewis Winner Award for outstanding paper at the IEEE International Solid-State Circuits Conference, and in 2003 he received an award for being one of the top ten contributors over the 50 years of that conference.

## FURTHER READING

The following are some recent examples of asynchronous ADC activity on the web.

- 6 bit Asynchronous December 2006
- Asynchronous ADC In CAD
- Mentor Graphics Asynchronous Data Processing System
- ASYNCHRONOUS PARALLEL RESISTORLESS ADC
- Flash Asynchronous Analog-to-Digital Converter
- Novel Asynchronous ADC Architecture
- LEVEL BASED SAMPLING FOR ENERGY CONSERVATION IN LARGE NETWORKS
- A Level-Crossing Flash Asynchronous Analog-to-Digital Converter
- Weight functions for signal reconstruction based on level crossings
- Adaptive Rate Filtering Technique Based on the Level Crossing Sampling
- Adaptive Level-Crossing Sampling Based DSP Systems
- A 0.8 V Asynchronous ADC for Energy Constrained Sensing Applications
- Spline-based signal reconstruction algorithm from multiple level crossing samples
- A New Class of Asynchronous Analog-to-Digital Converters
- Effects of time quantization and noise in level crossing sampling stabilization

Here is some more background information on analog-to-digital converters.

- A 1-GS/s 6-bit 6.7-mW ADC
- A Study of Folding and Interpolating ADC
- Folding\_ADCs\_Tutorials
- high speed ADC design
- Investigation of a Parallel Resistorless ADC

Here are some patents on the subject.
- 4,291,299 Analog to digital converter using timed
- 4,352,999 Zero crossing comparators with threshold
- 4,544,914 Asynchronously controllable successive approximation
- 4,558,348 Digital video signal processing system using
- 5,001,364 Threshold crossing detector
- 5,315,284 Asynchronous digital threshold detector
- 5,945,934 Tracking analog to digital converter
- 6,020,840 Method and apparatus for representing waveform
- 6,492,929 Analogue to digital converter and method
- 6,501,412 Analog to digital converter including a quantizers
- 6,667,707 Analog to digital converter with asynchronous ability
- 6,720,901 Interpolation circuit having a conversion
- 6,850,180 SelfTimed ADC
- 6,965,338 Cascade A D converter
- 7,133,791 Two mean level crossing time interval

11.19.10\_1.20PM dsauersanjose@aol.com Don Sauer