# Design Considerations for a Sub-25µW PLL with Multi-Phase Output and 1-450MHz Tuning Range

Parikha Mehrotra, Baibhab Chatterjee, Shovan Maity, Shreyas Sen

School of ECE, Purdue University, West Lafayette, IN 47906, USA. {pmehrot, bchatte, maity, shreyas}@purdue.edu

Abstract-In this paper, we present the design considerations for a sub-25µW phase-locked loop (PLL) with a wide tuning range and multi-phase outputs, which makes it suitable for applications that involve clock-and-data-recovery with variable data rates, such as broadband body-area-networks. Several architectures for the voltage-controlled-oscillator (VCO) are analyzed for power and performance, and the considerations for keeping the VCO's low-dropout-regulator (LDO) within the loop and outside the loop are discussed. Power consumption is minimized by keeping the LDO outside the loop, which exempts the error-amplifier (EA) from the bandwidth constraints posed by the PLL. Conforming to the analysis, the PLL is designed and simulated in a standard 65nm CMOS process, and the results show that energy-efficiencies as low as 70fJ/cycle can be achieved with a tuning range of 1-450MHz along with multi-phase outputs with RMS timing jitter of 11.4ps (frequency offset < 100ppm) from a 31-stage split-tuned ring oscillator VCO.

Keywords— low-power, PLL, supply-regulated VCO, split-tuned VCO, multi-phase, variable data-rate, CDR, enhanced PSNR, LDO

# I. INTRODUCTION

## A. Background and Motivation

As the Internet-of-Things (IoT) moves towards larger, high-bandwidth sensor and body-area-networks (BAN) [1], the need for low-power clock generation is becoming increasingly apparent. Notably, broadband-BAN (BB-BAN) using human body communication allows sub-10pJ/bit information transfer for data rates ranging from ~1Mbps to ~100Mbps [1]-[6]. Using digital-friendly architectures, the scaling in data rates could directly be translated into scaling in power consumption, so that these receivers can be simultaneously agile and energy-efficient. However, clock-and-data-recovery (CDR) with variable data rates remains one of the primary challenges for such low-power broadband systems, which require generation of a stable multi-phase clock similar to wireline systems [7]-[8] for ensuring that the sampling is performed at the location of maximum eye opening. This paper addresses this problem and presents the analysis and design of a sub-25µW, 31-phase clock generation circuit with a tuning range of 1-450MHz.

## B. Related Work

Tunable phase-locked-loops (PLL) that generate clock signals within the locking range of the PLL have previously been demonstrated [9]-[10]. However, the tuning ranges exhibited are often small and may not be applicable to high-bandwidth BB-BANs. The multi-phase requirement for CDR operation necessitates a phase-interpolator (PI) which should support a large range of frequencies, typically from a few hundreds of kHz to ~100MHz. However, PI design often depends on making the rise/fall time a significant portion of the total time period [7], and hence becomes challenging for scaled process technologies and lower frequencies. On the other hand, voltage controlled ring oscillators (VCO) generate multiple phases with low power consumption due to technology scaling, while supporting a wide range of frequencies. In order to reduce the inherent phase noise of the ring-VCO, or in other words, to meet the requirement of a stable frequency generation, the ring-VCO can either be injection locked [11], or integrated in a PLL [9]-[10] as illustrated in Fig. 1. The techniques of injection locking, in spite of showing promise in terms of power consumption, suffer from asymmetric phase generation [12]-[13], which is detrimental to the CDR operation. While techniques to alleviate the issue of phase asymmetry by multi-phase injection have been proposed [14], they require replication of VCO circuitry, thus increasing the power consumption. Moreover, the delay of the inverters at non-injection points becomes a weak function of the injected frequency, making it an unsuitable solution for symmetric phase-interpolation. As a result, in this paper, we have chosen a ring-VCO based scalable PLL architecture with wide tuning range and analyzed the considerations for a sub-25µW design.

Fig. 1. Choices for generating scalable multi-phase clock for broadband BAN

This paper is organized as follows: Section II presents the VCO design choices for achieving high current efficiency, while Section III presents the system-level PLL simulation results for sub-25µW performance. Section IV compares the current work with previously reported literature. Finally, in Section V, we conclude this paper by summarizing our contributions.

# II. DESIGN CONSIDERATIONS AND CHOICES FOR VCO

## A. System Specifications

Power and phase noise (PN) are two major performance metrics in the design of ring-oscillator based PLLs. Since the state-of-the-art BAN application [1] exhibits ~10pJ/bit energy efficiency, the allowable energy for the PLL is set at 1pJ/cycle (which would translate to 1pJ/bit for single-data-rate BB-BAN and 0.5pJ/bit for dual-data-rate BB-BAN). This limits the power consumption to 1-450µW in the frequency range of 1-450MHz.
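The mapping from the 1pJ/cycle energy budget to the 1-450µW power budget is a unit conversion, since 1pJ/cycle at 1MHz equals 1µW. A minimal Python sketch of that arithmetic is given below; the function names are illustrative and are not part of the original design flow.

```python
def power_budget_uw(energy_pj_per_cycle: float, freq_mhz: float) -> float:
    """Power in µW for a given energy per cycle (pJ) and clock frequency (MHz).

    pJ/cycle * MHz = 1e-12 J * 1e6 1/s = 1e-6 W, so the product is already in µW.
    """
    return energy_pj_per_cycle * freq_mhz


def energy_per_bit_pj(energy_pj_per_cycle: float, bits_per_cycle: int) -> float:
    """Energy per bit when each clock cycle samples `bits_per_cycle` bits."""
    return energy_pj_per_cycle / bits_per_cycle


if __name__ == "__main__":
    for f_mhz in (1.0, 450.0):
        print(f"{f_mhz:6.1f} MHz -> {power_budget_uw(1.0, f_mhz):6.1f} µW budget")
    print("single-data-rate:", energy_per_bit_pj(1.0, 1), "pJ/bit")
    print("dual-data-rate:  ", energy_per_bit_pj(1.0, 2), "pJ/bit")
```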

For reliable operation of the PLL in conjunction with a CDR allowing up to 1000 ppm frequency offset [15], a supply voltage variation of 250µV is found to be tolerable from simulations across process corners and temperatures in inverter-based ring-VCOs which employ minimum sized transistors in 65nm technology. Assuming that the supply line has 50-80 mV noise in the worst case, this translates to a power supply noise rejection (PSNR) better than -50dB in the frequencies of interest, which can be achieved using a power-optimized LDO. For the PLL to be employed in a CDR for BB-BAN, multiple phases need to be generated. Since PIs suffer from the issues discussed earlier, we have used a 31-stage ring-VCO, tapping a phase from each stage to enable finer phase control and, consequently, proper sampling even with < 0.05UI eye opening in the CDR.

Fig. 2. Design choices for the 31-stage VCO and their Performance Comparison in terms of frequency (F), current (I) and current efficiency (F/I).
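Both numerical claims in this subsection can be reproduced with a short calculation. The Python sketch below first computes the rejection needed to reduce 50-80 mV of supply noise to the tolerable 250µV, and then lists the tap phases of an idealized 31-stage inverter ring. The ring model is an assumption made for illustration: all stage delays are taken to be identical, and one UI is taken to equal one clock period.

```python
import math
from fractions import Fraction
from typing import List


def required_psnr_db(tolerable_ripple_v: float, supply_noise_v: float) -> float:
    """Rejection (in dB, negative) needed to shrink supply noise to the tolerable ripple."""
    return 20.0 * math.log10(tolerable_ripple_v / supply_noise_v)


def tap_phases(n_stages: int) -> List[Fraction]:
    """Sorted tap phases (as fractions of a period) of an ideal odd-length inverter ring.

    Each stage inverts (half a period) and adds a delay of 1/(2N) of a period,
    so tap k sits at k*(N+1)/(2N) modulo one period.
    """
    if n_stages % 2 == 0:
        raise ValueError("a single-ended inverter ring needs an odd number of stages")
    step = Fraction(n_stages + 1, 2 * n_stages)
    return sorted((k * step) % 1 for k in range(n_stages))


if __name__ == "__main__":
    for noise_v in (50e-3, 80e-3):
        print(f"{noise_v * 1e3:.0f} mV noise -> {required_psnr_db(250e-6, noise_v):.1f} dB")
    phases = tap_phases(31)
    spacing = phases[1] - phases[0]
    print(f"{len(phases)} taps, spacing = {spacing} of a period ({float(spacing):.4f} UI)")
```

Under these assumptions, the 80 mV end of the noise range gives the roughly -50dB requirement quoted above, and the 31 taps are uniformly spaced by 1/31 of a period, which is finer than the 0.05UI eye opening.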

As mentioned in the previous section, the PLL is required to operate with a wide frequency range to be able to support CDRs with variable data rates (1-450Mbps). Thus, the bandwidth (BW) of the PLL must scale with data rate. In both PLL architectures (to be discussed subsequently), BW is limited by the pole frequency of the loop filter, and is easily adjustable by using a transistor in triode region as a variable resistor.
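To give a feel for the resistance range such a triode-region device would have to cover, the sketch below models the loop filter as a single RC pole, $f_p = 1/(2\pi RC)$. This is a simplification, and the 10 pF capacitor is a hypothetical value chosen only for illustration; the paper does not state the loop-filter component values. The 50 kHz-20 MHz bandwidth range is the one quoted in Section III.

```python
import math


def pole_frequency_hz(r_ohm: float, c_farad: float) -> float:
    """Pole frequency of a first-order RC section."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def resistance_for_pole(f_hz: float, c_farad: float) -> float:
    """Resistance that places the RC pole at f_hz for a fixed capacitor."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)


if __name__ == "__main__":
    c_filter = 10e-12  # hypothetical loop-filter capacitance (assumption, not from the paper)
    for bw_hz in (50e3, 20e6):
        r = resistance_for_pole(bw_hz, c_filter)
        print(f"BW = {bw_hz / 1e6:6.2f} MHz -> R = {r / 1e3:8.2f} kOhm")
```

Whatever capacitance is actually used, the resistance must span the same 400X ratio as the bandwidth range.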

## B. Architecture Exploration

The VCO is one of the most power intensive and noise sensitive components in the PLL. For this reason, we focus on choosing a VCO topology which minimizes both power consumption and phase noise while maximizing operating frequency range. We consider three topologies: a supply-regulated VCO (SR-VCO), a current-starved VCO (CS-VCO) and an RC-delayed VCO (RC-VCO), for which the architectures and their respective performance (frequency, current and their ratio) are shown in Fig. 2(a)-(f).

We first consider the case where frequency tunability of the ring-VCO is achieved using supply regulation (SR-VCO). Here, the regulated control voltage from the charge pump feeds the ring oscillator supply and achieves a PN performance of -96dBc/Hz @ 1MHz offset for an output frequency of 300MHz, consuming as low as 0.026pJ/cycle with a frequency to current ratio (F/I) ranging from 30-80MHz/µA over 0.4-1V supply voltage.

In the second scenario, the charge pump's control voltage starves the currents in the VCO (CS-VCO) to adjust the frequency. We utilize the aforementioned supply regulation technique (using a digitally controlled reference generator as shown in Fig. 4(c)) as a second degree of freedom to achieve coarse control of the CS-VCO output frequency, hence the name split-tuned. Interestingly, there is no benefit in phase noise performance for a 31-stage CS-VCO, as the current-starving transistors go out of saturation, unlike in a 3-stage CS-VCO. The corresponding F/I ratios are shown in Fig. 2(f), while the PN response (-91dBc/Hz @ 1MHz offset) is presented in Fig. 3. The CS-VCO consumes 0.1pJ/cycle at an (F/I) ratio of 15-30MHz/µA, which is 3-4X worse compared to SR-VCO. The CS-VCO employs additional current-starving transistors, which add to the parasitic capacitance and degrade the (F/I) ratio.

Lastly, in the RC-VCO, frequency tunability is achieved by varying the RC delay cell's resistance (controlling the gate voltage of a MOSFET in triode region) and by regulating the supply for coarse frequency control (as in CS-VCO). The delay cell at each inverter stage is modulated with the change in control voltage from the charge pump. The RC-VCO supports a slightly lower frequency range compared to SR-VCO due to the additional load of the delay cell at each stage of the ring oscillator. The current consumed by SR-VCO and RC-VCO is the same, as no extra current flows through the delay cell. The PN performance of RC-VCO is -97dBc/Hz @ 1MHz offset at 0.034pJ/cycle with an (F/I) ratio which is 1.3X worse compared to SR-VCO.

From Fig. 2(f) it is evident that SR-VCO has the maximum current efficiency (F/I) and is able to achieve a 1-540MHz frequency range with a maximum current of 22µA, compared to CS-VCO, which consumes 52µA at 500MHz. RC-VCO, on the other hand, combines the advantages of a split-tuned architecture (coarse control from the supply and fine control from the PLL, which allows the LDO loop to be decoupled from the main PLL loop) with high current efficiency and good phase noise immunity.
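The energy-per-cycle figures follow from $E = V \cdot I / F$. As a rough cross-check, the Python sketch below evaluates this for the CS-VCO operating point quoted above (52µA at 500MHz) and compares the per-cycle energies reported for the three topologies. The 1V supply is an assumption taken from the upper end of the stated 0.4-1V range; the paper does not state the supply at that operating point.

```python
def energy_per_cycle_pj(supply_v: float, current_ua: float, freq_mhz: float) -> float:
    """Energy per cycle in pJ: V * µA / MHz = 1e-6 W / 1e6 Hz = 1e-12 J."""
    return supply_v * current_ua / freq_mhz


def relative_energy(quoted_pj: dict, baseline: str) -> dict:
    """Energy of each topology normalized to the baseline topology."""
    base = quoted_pj[baseline]
    return {name: value / base for name, value in quoted_pj.items()}


# Energy-per-cycle values as quoted in the text (pJ/cycle).
QUOTED_PJ = {"SR-VCO": 0.026, "CS-VCO": 0.1, "RC-VCO": 0.034}

if __name__ == "__main__":
    # Assumed 1 V supply for the CS-VCO at its 52 µA / 500 MHz operating point.
    print(f"CS-VCO @ 1 V: {energy_per_cycle_pj(1.0, 52.0, 500.0):.3f} pJ/cycle")
    for name, ratio in relative_energy(QUOTED_PJ, "SR-VCO").items():
        print(f"{name}: {ratio:.2f}X the SR-VCO energy per cycle")
```

With the assumed 1V supply the CS-VCO point evaluates to about 0.1pJ/cycle, and the quoted energies give ratios of about 3.8X (CS-VCO) and 1.3X (RC-VCO) relative to SR-VCO.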



Fig. 3. Phase noise of the three VCOs, along with their current consumption



Fig. 4. Choices of Scalable (1MHz to ~450 MHz) PLL Architecture: (a) SR-VCO based (LDO in loop), (b) RC-VCO based (LDO outside the loop). The phase-frequency detector (PFD) and charge-pump (CP) architectures are similar to [9]. (c) Bias/reference generator (50 mV resolution in the range 0-1V) (d) Variable BW Loop Filter (BW is controlled by the MOSFET in triode) (e) pole locations for architecture I (f) pole locations for architecture II showing that the LDO poles ($P_4$ and $P_5$) do not pose any additional constraint on the design of the PLL.

Each single-ended ring oscillator topology can be replaced with differential pairs to leverage lower sensitivity to substrate and supply noise. However, the phase noise performance for the same power consumption and frequency would degrade compared to a single-ended 31-stage ring oscillator [16], as the PN would be a linear function of the number of stages, unlike the single-ended case. Typically, PN is inversely proportional to the power consumption and increases quadratically with the oscillation frequency. In a scalable VCO, power consumption increases with frequency through a control voltage, and hence the overall PN remains almost constant across frequencies, which in our case is around -95dBc/Hz @ 1MHz offset, checked at 50, 100 and 300MHz oscillation frequencies, as shown in Fig. 3.

# III. PLL DESIGN AND CLOSED LOOP PERFORMANCE

## A. LDO design considerations in PLL based on SR-VCO

We consider two possible design choices for a variable frequency range (1-450MHz), low-power, low-noise PLL. In the first design, shown in Fig. 4(a), an SR-VCO based PLL is used for its benefits highlighted in Section II. An LDO is used in conjunction with the SR-VCO for improving power supply rejection, and the control voltage from the charge pump is provided to the LDO input. The LDO is part of the PLL, which may compromise the stability as two additional poles ($P_4$ and $P_5$) are introduced in the loop. Keeping this in mind, $P_4$ and $P_5$ are kept much larger than the PLL BW so that the PLL dynamics remain unaffected by the addition of the LDO. $P_4$ is contributed by the error amplifier (EA), while $P_5$ results from the decoupling capacitor at the LDO output. A trade-off exists between power and PSNR-peaking performance for the LDO, depending on the relative position of $P_4$ and $P_5$ [9],[17]-[18].

The PLL BW (given by the loop filter) is kept between 50 kHz-20 MHz for 1-450 MHz PLL operation, and hence the LDO poles need to be designed at > 200 MHz in the worst case. Keeping $P_5$ dominant as compared to $P_4$ is advantageous in terms of PSNR (peaking) performance but is extremely power intensive [9]. Thus, the EA pole is chosen to be dominant at 200 MHz. Accounting for an observed 70X variation of $P_5$ with load current, $P_5$ needs to be designed to be 700X away from $P_4$ (at 140 GHz) for maximum operating frequency, so that even at the minimum frequency of operation, $P_5$ is still 10X away from $P_4$ for maintaining stability. However, this 700X separation degrades the PSNR by more than 40 dB through peaking. In order to resolve both the issues of power consumption and insufficient PSNR, we need to employ EA slices (or have adjustable biasing to control BW in the EA) to scale $P_4$ from 0.5-200MHz with PLL BW. The worst case power consumption in the EA (for 200 MHz BW) is > 40µW, which makes the overall PLL power worse than 70µW.
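The 700X and 140 GHz figures follow from stacking the required stability margin on top of the load-dependent variation of $P_5$. The Python sketch below restates that arithmetic with the numbers quoted in the text; the function and parameter names are illustrative only.

```python
def required_output_pole_hz(ea_pole_hz: float, min_separation: float,
                            load_variation: float) -> float:
    """Output pole needed at maximum load so the margin survives at minimum load.

    The output pole drops by `load_variation` when the load current falls, so it
    must start `min_separation * load_variation` above the (fixed) EA pole.
    """
    return ea_pole_hz * min_separation * load_variation


if __name__ == "__main__":
    p4 = 200e6         # EA pole (Hz), chosen dominant
    margin = 10.0      # minimum P5/P4 separation for stability
    variation = 70.0   # observed P5 variation with load current
    p5_max = required_output_pole_hz(p4, margin, variation)
    print(f"P5 at maximum load: {p5_max / 1e9:.0f} GHz ({p5_max / p4:.0f}X above P4)")
    print(f"P5 at minimum load: {p5_max / variation / 1e9:.0f} GHz "
          f"({p5_max / variation / p4:.0f}X above P4)")
```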

## B. PLL based on split-tuned RC-VCO

Alternatively, the LDO can be decoupled from the PLL loop using a split-tuned RC-VCO architecture, as shown in Fig. 4(b). This allows the LDO bandwidth to be independent of the PLL bandwidth and hence enables $P_4$ to be set at lower frequencies, ensuring lower power consumption. The PSNR for the low-BW LDO varies from -58dB to -52dB for 10nA to 22µA current drawn by the RC-VCO to achieve a frequency range of 1-450MHz, as shown in Fig. 5. $P_5$ depends on the load current and the decoupling capacitor at the LDO output, and varies from 1.1MHz to 70MHz for a high density MOSCAP of 30pF occupying $50\times50~\mu m^2$ area.

$P_4$ is designed to be at 110kHz (when the PLL operating frequency is 1MHz, so that $P_5$ is 10X away), consuming 114nW power. Thus, this architecture achieves 350X improvement in EA power, as compared to $P_4$ being at 200MHz (SR-VCO based PLL), which consumed 40µW. For operation at 450MHz, $P_5$ shifts to 70MHz due to variation in load current, degrading the PSNR by 25dB through peaking (Fig. 5).



Fig. 5. PSNR of the LDO in RC-VCO based PLL, with maximum and minimum load current (corresponding to 450MHz/1MHz PLL output) with power scaling in error amplifier through adjustable biasing in the EA.



Fig. 6. Time-domain settling and offset of the closed loop at 300 MHz

However, by adjusting the EA bias (increasing power consumption from 114nW to 498nW), $P_4$ can be moved from 110kHz to 2.3MHz, thus improving the PSNR by 20dB while ensuring stability. The bias voltage is generated using the circuit shown in Fig. 4(c), ensuring that all transistors in the EA remain in saturation for the applied range of bias voltages. This technique helps in optimizing the loop bandwidths while maintaining stability over a large tuning range by controlling the reference voltage of the LDO separately.
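The pole separations and EA power ratios for the out-of-loop LDO can be tabulated directly from the values quoted in this subsection. The Python sketch below does only that arithmetic; it does not model the PSNR peaking itself, and the helper names are illustrative.

```python
def separation(p5_hz: float, p4_hz: float) -> float:
    """Ratio of the output pole P5 to the EA pole P4."""
    return p5_hz / p4_hz


def power_ratio(reference_w: float, actual_w: float) -> float:
    """How many times lower the actual power is than the reference power."""
    return reference_w / actual_w


if __name__ == "__main__":
    p5_min_load, p5_max_load = 1.1e6, 70e6   # P5 at 1 MHz and 450 MHz PLL output
    p4_low_bias, p4_high_bias = 110e3, 2.3e6  # EA pole before and after bias adjustment
    ea_in_loop_w = 40e-6                      # EA power with P4 at 200 MHz (SR-VCO PLL)

    print(f"1 MHz,   P4=110 kHz: P5/P4 = {separation(p5_min_load, p4_low_bias):.1f}")
    print(f"450 MHz, P4=110 kHz: P5/P4 = {separation(p5_max_load, p4_low_bias):.1f}")
    print(f"450 MHz, P4=2.3 MHz: P5/P4 = {separation(p5_max_load, p4_high_bias):.1f}")
    print(f"EA power saving at 114 nW: {power_ratio(ea_in_loop_w, 114e-9):.0f}X")
    print(f"EA power saving at 498 nW: {power_ratio(ea_in_loop_w, 498e-9):.0f}X")
```

Moving $P_4$ to 2.3MHz brings the separation at 450MHz from roughly 640X back to roughly 30X, while the EA still consumes about 80X less than the in-loop design.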

# IV. RESULTS AND DISCUSSION

A split-tuned PLL with RC-VCO was designed in a 65nm process using minimum sized transistors. The closed loop PLL response is shown in Fig. 6 with an 80MHz, 200mV<sub>pp</sub> noise at the supply (80MHz is the frequency of worst PSNR from Fig. 5), where the frequency settles to within 100ppm of the reference in 600µs by properly setting the coarse control voltage. A frequency offset of 66ppm is observed, which is easily tolerable by a state-of-the-art CDR [15]. The equivalent RMS timing jitter is 11.4ps (3420ppm). Due to accumulated fluctuations in phase, the ppm timing jitter is higher than the ppm fluctuation in the frequency.
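The ppm figures are the jitter and the frequency offset normalized to the clock period and clock frequency at 300MHz. A short Python sketch of the conversion, using the values reported above, is given below; the function names are illustrative.

```python
def jitter_ppm(jitter_ps: float, freq_mhz: float) -> float:
    """RMS jitter as a fraction of the clock period, in ppm.

    ps * MHz = 1e-12 s * 1e6 1/s = 1e-6, so the product is already in ppm.
    """
    return jitter_ps * freq_mhz


def offset_hz(offset_ppm: float, freq_mhz: float) -> float:
    """Absolute frequency offset in Hz for a ppm offset at a given frequency."""
    return offset_ppm * freq_mhz  # ppm * MHz = 1e-6 * 1e6 Hz = Hz


if __name__ == "__main__":
    print(f"11.4 ps at 300 MHz = {jitter_ppm(11.4, 300.0):.0f} ppm of the period")
    print(f"66 ppm at 300 MHz  = {offset_hz(66.0, 300.0) / 1e3:.1f} kHz offset")
    print("within the 1000 ppm CDR tolerance:", 66.0 < 1000.0)
```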

Table I shows the power consumed by each circuit component at a nominal frequency of 300MHz, with the RC-VCO consuming 62% of the total power of 21.12µW. Table II compares the performance of the proposed work with state-of-the-art literature, while Fig. 7 compares the energy-efficiencies with other reported works. The 33X better energy-efficiency warrants a practical implementation, which would be a future extension of this work. Our work achieves a better pJ/cycle number because (1) no additional spur-cancelation circuit (as needed for narrowband applications such as [22]) is required, since the PLL output for a BB-BAN will not be subjected to mixing and its non-idealities, (2) no active circuit for adaptive BW control (as no real-time self-adjustment is needed for BB-BAN) and no integral/proportional charge pump as in [10], [20] and [21] is employed, (3) the VCO and LDO, which are two of the most power-intensive components in a PLL, are optimized for a low ($f/f_T$) application, and (4) no replica-feedback (as in [9]) and no additional noise cancellation circuit (as in [19]) is needed, as the LDO provides -50dB PSNR for the frequency of operation. The wide tuning range is achieved because of the scalable VCO and adjustable loop BW.

Fig. 7. Energy-efficiency (in pJ/cycle) of this work and previous literature

TABLE I. POWER CONSUMPTION BY PLL COMPONENTS @ 300MHZ (IN µW)

| RC-VCO | Divide By N | PFD   | CP    | EA  | Total |
|--------|-------------|-------|-------|-----|-------|
| 13.17  | 5.917       | 1.437 | 0.098 | 0.5 | 21.12 |
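The Table I breakdown can be cross-checked against the headline numbers with a few lines of Python. The sketch below sums the component powers, computes the RC-VCO share, and converts the total to energy per cycle at 300MHz; the dictionary simply restates Table I.

```python
# Component power at 300 MHz, in µW, as listed in Table I.
POWER_UW = {
    "RC-VCO": 13.17,
    "Divide By N": 5.917,
    "PFD": 1.437,
    "CP": 0.098,
    "EA": 0.5,
}


def total_power_uw(breakdown: dict) -> float:
    """Sum of all component powers in µW."""
    return sum(breakdown.values())


def share_percent(breakdown: dict, component: str) -> float:
    """Share of the total power consumed by one component, in percent."""
    return 100.0 * breakdown[component] / total_power_uw(breakdown)


def energy_fj_per_cycle(power_uw: float, freq_mhz: float) -> float:
    """Energy per cycle in fJ: µW / MHz = pJ, times 1000 for fJ."""
    return 1000.0 * power_uw / freq_mhz


if __name__ == "__main__":
    total = total_power_uw(POWER_UW)
    print(f"total power  = {total:.2f} µW")
    print(f"RC-VCO share = {share_percent(POWER_UW, 'RC-VCO'):.0f} %")
    print(f"energy/cycle = {energy_fj_per_cycle(total, 300.0):.1f} fJ at 300 MHz")
```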

# V. CONCLUSION

By analyzing the three VCO architectures for their power and noise performance, we have shown that the RC-VCO is current-efficient and allows decoupling of the LDO from the main PLL loop, thus reducing power consumption with a tolerable LDO PSNR performance of -50dB across variable load currents. The closed loop RC-VCO PLL design achieves wide tunability in the range 1-450MHz with an energy efficiency of 70fJ/cycle (which would correspond to 35fJ/bit for broadband dual-data-rate BAN) at 300MHz and an RMS jitter/cycle of 11.4ps (3420ppm).

# VI. ACKNOWLEDGEMENTS

This work was supported in part by the National Science Foundation CRII Award under Grant CNS 1657455, in part by the National Science Foundation Career Award under Grant 1944602, and in part by the Air Force Office of Scientific Research YIP Award under Grant FA9550-17-1-0450.

TABLE II. PERFORMANCE COMPARISON OF THE PROPOSED WORK WITH STATE-OF-THE-ART LITERATURE

| Specification           | JSSC'09<br>[9]*     | ISSCC'14<br>[19]*   | ISSCC'16<br>[20]*       | JSSC'17<br>[21]*         | TVLSI'18<br>[22]*   | ISSCC'19<br>[23]* | This Work**           |
|-------------------------|---------------------|---------------------|-------------------------|--------------------------|---------------------|-------------------|-----------------------|
| Technology (nm)         | 180                 | 22                  | 14                      | 65                       | 65                  | 65                | 65                    |
| Supply Voltage (V)      | 1.8                 | 1.2                 | 0.6-0.95                | 0.6-1.2                  | 1-1.5               | 1                 | 1                     |
| Operating Freq. (Hz)    | 500M-2.5G           | 25M-1.6G            | 150M-5G                 | 400M-2.6G                | 1G                  | 24.5-29.5G        | 1M-450M               |
| Nominal Freq. (GHz)     | 1.5                 | 1.6                 | 0.8-4                   | 0.4-2.6                  | 1                   | 0.103             | 0.3                   |
| Power (mW)#             | 3.9                 | 3.1                 | 0.43-2.56               | 0.16-2.38                | 0.32                | 10.2              | 0.022                 |
| Energy Eff. (pJ/cycle)# | 2.6                 | 1.92                | 0.52-0.64               | 0.4-0.9                  | 0.32                | 99.02             | 0.070                 |
| RMS Jitter (ps)#        | 1.9<sup>+</sup>     | 5.83<sup>+</sup>    | 5.77-1.26<sup>+</sup>   | 33.5-3.71<sup>+</sup>    | 3.2<sup>+</sup>     | 0.071             | 11.4<sup>++</sup>     |
| RMS Jitter/Cycle (ppm)  | 2850<sup>+</sup>    | 9328<sup>+</sup>    | 4616-5040<sup>+</sup>   | 13400-9646<sup>+</sup>   | 3200<sup>+</sup>    | 7.313             | 3420<sup>++</sup>     |
| Tunability              | 5X                  | 64X                 | 33X                     | 6.5X                     | 1.5X                | 1.2X              | 450X                  |
| FOM (dB)                | -220.3<sup>+</sup>  | -219.7<sup>+</sup>  | -226.8 to -             | -217.5 to -              | -234.8<sup>+</sup>  | -252.9            | -235.4<sup>++</sup>   |

\*: measured; \*\*: simulated; <sup>+</sup>: jitter with white noise; <sup>++</sup>: jitter with injected noise of 200mV<sub>pp</sub> at the frequency of worst PSNR (80MHz); #: at the nominal frequency. $FOM = 10\log_{10} \left[ \left( \frac{\text{RMS Jitter}}{1\,\text{s}} \right)^2 \left( \frac{\text{Power}}{1\,\text{mW}} \right) \right]$ (lower value is better).
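The FOM definition in the table footnote is straightforward to evaluate. As an illustration, the Python sketch below applies it to three Table II columns whose jitter and power are single values rather than ranges; the function name is an assumption made for this example.

```python
import math


def jitter_power_fom_db(rms_jitter_s: float, power_mw: float) -> float:
    """Jitter-power FOM in dB: 10*log10((jitter / 1 s)^2 * (power / 1 mW))."""
    return 10.0 * math.log10(rms_jitter_s ** 2 * power_mw)


# (RMS jitter in seconds, power in mW) taken from Table II.
ENTRIES = {
    "TVLSI'18 [22]": (3.2e-12, 0.32),
    "ISSCC'19 [23]": (0.071e-12, 10.2),
    "This work": (11.4e-12, 0.022),
}

if __name__ == "__main__":
    for name, (jitter_s, power_mw) in ENTRIES.items():
        print(f"{name:14s} FOM = {jitter_power_fom_db(jitter_s, power_mw):7.1f} dB")
```

For these three columns the computed values agree with the tabulated FOM to within about 0.1 dB.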

# REFERENCES

- [1] B. Chatterjee *et al.*, "Context-Aware Intelligence in Resource-Constrained IoT Nodes: Opportunities and Challenges," *IEEE Design & Test (D&T)*, 2019.
- [2] S. Maity et al., "BodyWire: A 6.3-pJ/b 30-Mb/s -30-dB SIR-Tolerant Broadband Interference-Robust Human Body Communication Transceiver Using Time Domain Interference Rejection," *IEEE J. Solid-State Circuits (JSSC)*, 2019.
- [3] H. Cho et al., "21.1 A 79pJ/b 80Mb/s full-duplex transceiver and a 42.5 µW 100kb/s super-regenerative transceiver for body channel communication," *International Solid-State Circuits Conference (ISSCC)*, 2015.
- [4] W. Saadeh et al., "A 1.1-mW Ground Effect-Resilient Body-Coupled Communication Transceiver With Pseudo OFDM for Head and Body Area Network," *IEEE J. Solid-State Circuits (JSSC)*, 2017.
- [5] S. Sen, "SocialHBC: Social Networking and Secure Authentication using Interference-Robust Human Body Communication," *Scientific Reports*, 9, Article 4160, 2019.
- [6] D. Das et al., "Enabling Covert Body-Area Network using Electro-Quasistatic Human Body Communication," IEEE/ACM Design, Automation and Test in Europe, 2017.
- [7] M. Mansuri et al., "A Scalable 0.128-1 Tb/s, 0.8-2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS," IEEE J. Solid-State Circuits (JSSC), 2013.
- [8] T. Musah et al., "A 4–32 Gb/s Bidirectional Link With 3-Tap FFE/6-Tap DFE and Collaborative CDR in 22 nm CMOS," IEEE J. Solid-State Circuits (JSSC), 2014.
- [9] A. Arakali, S. Gondi, and P. Hanumolu, "Low-Power Supply-Regulation Techniques for Ring Oscillators in Phase-Locked Loops Using a Split-Tuned Architecture," *IEEE J. Solid-State Circuits (JSSC)*, 2009.
- [10] S. Sidiropoulos et al., "Adaptive Bandwidth DLLs and PLLs Using Regulated Supply CMOS Buffers," International Symposium on VLSI Circuits (VLSIC), 2000.
- [11] B. Razavi, "A study of injection locking and pulling in oscillators," *IEEE J. Solid-State Circuits (JSSC)*, 2004.
- [12] Y. Huang and S. Liu, "A 2.4-GHz Subharmonically Injection-Locked PLL With Self-Calibrated Injection Timing," *IEEE J. Solid-State Circuits* (JSSC), 2013.
- [13] J. Pandey and B. P. Otis, "A Sub-100µW MICS/ISM Band Transmitter Based on Injection-Locking and Frequency Multiplication," *IEEE J. Solid-State Circuits (JSSC)*, 2011.

- [14] J. Chien and L. Lu, "Analysis and Design of Wideband Injection-Locked Ring Oscillators With Multiple-Input Injection," *IEEE J. Solid-State Circuits (JSSC)*, 2007.
- [15] H. Yamaguchi et al., "A 5Gb/s transceiver with an ADC-based feedforward CDR and CMA adaptive equalizer in 65nm CMOS," International Solid-State Circuits Conference (ISSCC), 2010.
- [16] A. Hajimiri, S. Limotyrakis and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE J. Solid-State Circuits (JSSC)*, 1999.
- [17] E. Alon et al., "Replica compensated linear regulators for supply-regulated phase-locked loops," IEEE J. Solid-State Circuits (JSSC), 2006.
- [18] R. J. Milliken, J. Silva-Martinez and E. Sanchez-Sinencio, "Full On-Chip CMOS Low-Dropout Voltage Regulator," IEEE Transactions on Circuits and Systems (TCAS I), 2007.
- [19] J. Liu et al., "A 0.012mm<sup>2</sup> 3.1mW bang-bang digital fractional-N PLL with a power-supply-noise cancellation technique and a walking-one-phase-selection fractional frequency divider," *International Solid-State Circuits Conference (ISSCC)*, 2014.
- [20] K. J. Shen et al., "A 0.17-to-3.5mW 0.15-to-5GHz SoC PLL with 15dB built-in supply noise rejection and self-bandwidth control in 14nm CMOS," International Solid-State Circuits Conference (ISSCC), 2016.
- [21] J. Zhu et al., "A 0.0021mm<sup>2</sup> 1.82 mW 2.2 GHz PLL Using Time-Based Integral Control in 65 nm CMOS," *IEEE J. Solid-State Circuits (JSSC)*, 2017.
- [22] P. Agarwal et al., "Zero-Power Feed-Forward Spur Cancelation for Supply-Regulated CMOS Ring PLLs," IEEE Trans. Very Large Scale Integr. Syst. (TVLSI), 2018.
- [23] Z. Yang et al., "A 25.4-to-29.5 GHz 10.2 mW Isolated Sub-sampling PLL Achieving -252.9 dB Jitter-power FoM and -63dBc Reference Spur," International Solid-State Circuits Conference (ISSCC), 2019.