# 2012 VLSI Circuits Short Course Program Honolulu I

# **Designing in Advanced CMOS Technologies**

Tuesday, June 12 8:30 a.m. – 5:30 p.m.

Organizers/Chairs: Andreia Cathelin, STMicroelectronics

Masato Motomura, Hokkaido University

8:30 a.m. Introduction

Andreia Cathelin, STMicroelectronics

8:45 a.m. Bulk CMOS Scaling to the End of the Roadmap

Tsu-Jae King Liu, UC Berkeley

9:45 a.m. Technology Boosters for LP Design Platforms in 28/20nm

Thomas Skotnicki, STMicroelectronics

10:45 a.m. Break

11:00 a.m. Challenges and Solutions Paths in Scaling SRAM

Fatih Hamzaoglu, Intel

12:00 p.m. Lunch

1:00 p.m. The Mixed-Signal Design Challenges in the Advanced Technology Nodes

Fu-Lung Hsueh, TSMC

2:00 p.m. Power-aware Design in 28nm Generation and Beyond: Facts, Myths, and Misunderstandings

Youngsoo Shin, KAIST

3:00 p.m. Break

3:15 p.m. Advanced CAD Methodologies for Custom Design at Advanced Process Nodes

David White, Cadence R&D

4:15 p.m. Round Table Discussion

All Speakers

# 2012 VLSI Circuits Short Course Program Honolulu II

# Ultra Low Power SoC Design for Future Mobile Systems

Tuesday, June 12 8:30 a.m. – 5:00 p.m.

Organizers/Chairs: Alice Wang, MediaTek

Koichi Nose, Renesas Electronics Corp.

8:30 a.m. Introduction

Alice Wang, MediaTek

8:45 a.m. Vision of Future Mobile Systems

Jan Rabaey, UC Berkeley

9:45 a.m. Low-Power Logic Design Technologies

Masaya Sumita, Panasonic

10:45 a.m. Break

11:00 a.m. Memory Architecture and Systems for Mobile Systems

Kyomin Sohn

12:00 p.m. Lunch

1:00 p.m. Wireless Communication

Gangadhar Burra, Texas Instruments

2:00 p.m. Interconnect and Wireline Communication

Jared Zerbe, Rambus

3:00 p.m. Break

3:15 p.m. Case-Study on Low-Power Mobile System

John Redmond, Broadcom

4:15 p.m. Round Table Discussion

All Speakers

# JOINT TECHNOLOGY/CIRCUITS RUMP SESSION

Tuesday, June 12 8:00 p.m. – 10:00 p.m.

Organizers:

Circuits: N. Lu, Etron; M. Bauer, Micron

Technology: T. Skotnicki, STMicroelectronics; K. Miyashita, Toshiba

RJ1: Scaling Challenges Beyond 1x nm DRAM and NAND Flash

Moderator: N. Lu, Etron

R. Shrivastava, SanDisk

The combined revenues of DRAM and NAND Flash approached \$54 billion in 2010 and are expected to keep growing in the coming years. Emerging silicon and package technologies will further drive lower cost and new applications. The difficulty of scaling, of developing new technologies, and of investing in new factories is increasing at about the same rate as worldwide memory bit growth. At the same time, the industry is becoming aware that we are closing in on physical and electrical scaling limitations. As we close in on these limits, the use of new materials, manufacturing processes, and circuit designs will become unavoidable. To compound the problem, fierce competition is forcing shorter development times. Our industry needs to address these issues openly in order to continue developing better and lower-cost memories for the decade to come. These challenges are huge, and the whole industry faces them. We have assembled a representative group of industry experts for this Joint Rump Session and will ask them to discuss the top issues from the perspective of their own areas of expertise. The floor will be open to question the panelists' views or to challenge them to consider issues that the audience would like to raise.

#### Panelists:

S. Aritome, Hynix

G. Atwood, Micron

G. Bronner, Rambus

H. Hazama, Toshiba

H-K Kang, Samsung

C.Y. Lu, Macronix

K. Takeuchi, University of Tokyo

# Session 1 – TAPA I Plenary Session

Wednesday, June 13, 8:05 a.m.

Chairpersons: V. De, Intel Corp.

H. Kabuo, Panasonic Corp.

8:05 a.m. Welcome, Opening Remarks, and Awards

A. Amerasekera, Texas Instruments

M. Nagata, Kobe University

#### 1.1 - 8:35 a.m.

The Evolution of Next Generation Data Center Networks for High Capacity Computing, Nicholas Ilyadis, Broadcom Corporation

#### 1.2 - 9:20 a.m.

Technology Innovations for Smart Cities, Akira Maeda, Hitachi, Ltd.

# Session 2 – TAPA I Phase Locked Loops and Oscillators

Wednesday, June 13, 10:05 a.m.

Chairpersons: B. Nauta, University of Twente

S. Cho, KAIST

#### 2.1 - 10:25 a.m.

Components for Generating and Phase Locking 390-GHz Signal in 45-nm CMOS, D. Shim#, D. Koukis, D. Arenas, D. Tanner, E. Seok\*, J. Brewer, K. O\*\*, #University of Florida and Seoul National University of Science and Technology, \*Texas Instruments, \*\*University of Texas at Dallas

Components for generating and phase locking a 390-GHz signal are demonstrated using low-leakage transistors in 45-nm CMOS. An integrated chain of circuits is demonstrated, composed of a 195-GHz oscillator with a frequency-doubled output at ~390 GHz followed by two cascaded divide-by-two injection-locked frequency dividers with an output frequency of ~49 GHz. The peak power radiated at ~390 GHz by an on-chip antenna is ~2 µW. The oscillator and frequency dividers consume 21 and 6 mW, respectively.

#### 2.2 - 10:50 a.m.

A 160-GHz Receiver-Based Phase-Locked Loop in 65 nm CMOS Technology, W.-Z. Chen, T.-Y. Lu, Y.-T. Wang, J.-T. Jian, Y.-H. Yang, G.-W. Huang\*, W.-D. Liu\*, C.-H. Hsiao\*, S.-Y. Lin\*, J.Y. Liao\*, National Chiao Tung University, \*National Device Laboratory

A 160-GHz receiver-based PLL with a tuning range from 156.4 GHz to 159.2 GHz is presented. The sub-THz 1/9 prescaler is replaced by a 3rd-harmonic mixer incorporating a frequency tripler for frequency down-conversion. Frequency acquisition is assisted by a received signal strength indicator (RSSI) for automatic frequency sweeping and fast locking. The frequency locking time is less than 3 µs. Fabricated in 65 nm CMOS technology, the chip occupies 0.92 mm². The chip draws 24 mW from a 1.2 V supply.

#### 2.3 - 11:15 a.m.

A 32.4 ppm/°C 3.2-1.6V Self-chopped Relaxation Oscillator with Adaptive Supply Generation, K.-J. Hsiao, MediaTek Inc.

A self-chopped relaxation oscillator with adaptive supply generation provides a stable output clock against variations in temperature and supply voltage. The frequency drift is less than ±0.1% for supply voltages from 1.6 to 3.2 V and ±0.1% over a temperature range from -20 to 100°C; the self-chopped technique reduces this drift by 83%. The relaxation oscillator is implemented in a 60-nm CMOS technology with an active area of 0.048 mm². It consumes 2.8 µA from a 1.6-V supply.

#### 2.4 - 11:40 a.m.

A 280nW, 100kHz, 1-Cycle Start-up Time, On-chip CMOS Relaxation Oscillator Employing a Feedforward Period Control Scheme, T. Tokairin, K. Nose, K. Takeda, K. Noguchi, T. Maeda, K. Kawai, M. Mizuno, Renesas Electronics Corporation

A sub-microwatt, 1-cycle start-up CMOS relaxation oscillator has been developed with a feedforward period control scheme and a digitally-controlled boost charging technique. The oscillator is implemented in 90nm CMOS, and we have successfully demonstrated 100kHz clock generation with ±1% accuracy and an extremely low power consumption of 280nW.

# Session 3 – Honolulu Suite Analog Devices

Wednesday, June 13, 10:25 a.m.

Chairpersons: J. Paramesh, Carnegie Mellon

M. Ikeda, University of Tokyo

#### 3.1 - 10:25 a.m.

**Circuit Techniques to Overcome Class-D Audio Amplifier Limitations in Mobile Devices,** X. Jiang, J. Song, M. Wang, J. Chen, S.K. Arunachalam, T. Brooks, Broadcom Corporation

An auxiliary loop with ramping circuits suppresses pop-and-click noise to 1 mV for an amplifier with 4V achievable output voltage. This is the first reported analog technique that can simultaneously suppress the Class-D pop-and-click due to PWM start/stop and due to the amplifier offset. Switching edge rate control enables the system to meet the EN55022 Class-B standard with a 15 dB margin, which is 14 dB better than state-of-the-art designs. An enhanced scheme detects short-circuit conditions without relying on over-limit current events. This is the first reported method to detect a short-circuit condition with "zero" input signal and before any over-limit current event happens. The paper presents an idea to derive a "clock" signal (based on the control signal that connects the speaker to the battery) to strobe the high-tracking-bandwidth comparator. The reported methods enable full adoption of Class-D technology in mobile communication devices.

#### 3.2 - 10:50 a.m.

A 5.2mW, 0.0016% THD up to 20kHz, Ground-Referenced Audio Decoder with PSRR-enhanced Class-AB 16Ω Headphone Amplifiers, S.-H. Wen, C.-C. Yang, MediaTek Inc.

A low-power ground-referenced audio decoder with PSRR-enhanced class-AB headphone amplifiers achieves <0.0016% THD across the whole audio band despite the supply ripple from a negative charge pump. Realized in 40nm CMOS, the fully-integrated stereo decoder achieves 91dB SNDR and 100dB dynamic range while driving a 16Ω headphone load, and consumes 5.2mW from a 1.8V power supply. The core area is only 0.093mm²/channel.

#### 3.3 - 11:15 a.m.

A Sub-1V 3.9μW Bandgap Reference with a 3σ Inaccuracy of ±0.34% from -50°C to +150°C using Piecewise-Linear-Current Curvature Compensation, S. Sano, Y. Takahashi, M. Horiguchi, M. Ota, Renesas Electronics Corporation

A sub-1V 3.9µW bandgap reference (BGR) achieves a small voltage variation of ±0.34% and low temperature drift (<1mV) over a wide temperature range (-50°C to +150°C) and a wide voltage range (+0.9V to +5.5V) by using a low-power current-mode BGR core and a piecewise-linear curvature compensation system. The BGR occupies 0.1mm² in 0.13µm CMOS technology with a triple-well structure.

#### 3.4 - 11:40 a.m.

**A 1.2V 8.3nJ Energy-Efficient CMOS Humidity Sensor for RFID Applications,** Z. Tan, Y. Chae, R. Daamen\*, A. Humbert\*, Y. Ponomarev\*, M. Pertijs\*, Delft University of Technology, \*NXP Semiconductors

A fully-integrated CMOS humidity sensor for an RFID sensor platform has been realized in 0.16µm CMOS technology. It consists of a top-metal finger capacitor, covered by a humidity-sensitive polyimide layer, and an energy-efficient inverter-based capacitance-to-digital converter (CDC). Measurements show that the CDC performs a 12.5-bit conversion in 0.8ms while consuming only 8.6µA from a 1.2V supply. Together with the co-integrated humidity sensor, this translates into a resolution of 0.05% RH in the range of 30% RH to 90% RH, at an energy consumption of only 8.3nJ per measurement, which is more than an order of magnitude less than the state of the art.
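The energy figure quoted in this abstract can be cross-checked from the other reported numbers. The sketch below is only a consistency check, assuming the energy per measurement is simply supply current times supply voltage times conversion time; the function name is illustrative and not taken from the paper.

```python
def energy_per_conversion_j(current_a, supply_v, conversion_time_s):
    """Energy drawn from the supply during one conversion, in joules.

    Assumes constant current draw over the whole conversion (an assumption,
    not something the abstract states).
    """
    return current_a * supply_v * conversion_time_s


# Values reported in the abstract: 8.6 uA, 1.2 V, 0.8 ms per conversion.
energy_j = energy_per_conversion_j(8.6e-6, 1.2, 0.8e-3)
print(f"Energy per measurement: {energy_j * 1e9:.2f} nJ")
```

Under these assumptions the result agrees with the quoted 8.3nJ to within rounding.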

# Session 4 – TAPA I A/D Converters

Wednesday, June 13, 1:30 p.m.

Chairpersons: T.C. Carusone, University of Toronto

M. Ito, Renesas Electronics Corp.

#### 4.1 - 1:30 p.m.

A 6b 3GS/s 11mW Fully Dynamic Flash ADC in 40nm CMOS with Reduced Number of Comparators, Y.-S. Shu, MediaTek Inc.

A 6b 3GS/s fully dynamic flash ADC is fabricated in 40nm CMOS and occupies 0.021mm2. Dynamic comparators with digitally controlled built-in offset are realized with imbalanced tails. Half of the comparators are substituted with simple SR latches. The ADC achieves SNDRs of 36.2dB and 33.1dB at DC and Nyquist, respectively, while consuming 11mW from a 1.1V supply.

#### 4.2 - 1:55 p.m.

**An Event-Driven, Alias-Free ADC with Signal-Dependent Resolution,** C. Weltin-Wu, Y. Tsividis, Columbia University

A clockless 8b ADC in 130nm CMOS uses a time-varying comparison window to dynamically vary resolution, and input-dependent dynamic bias, to maintain SNDR while saving power. Alias-free operation with SNDR in the range of 47-54dB, which partly exceeds the theoretical limit of 8b conventional converters, is achieved over a 20kHz bandwidth with 3-9µW power from a 0.8V supply.
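The "theoretical limit" mentioned here is not defined in the abstract. A minimal sketch, assuming it refers to the textbook signal-to-quantization-noise ratio of an ideal uniform N-bit quantizer driven by a full-scale sine wave (roughly 6.02·N + 1.76 dB), shows where an 8b converter would sit relative to the reported 47-54dB range:

```python
import math


def ideal_sqnr_db(bits):
    """SQNR of an ideal uniform quantizer with a full-scale sine input.

    Equivalent to the familiar 6.02*N + 1.76 dB rule of thumb.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


limit_8b_db = ideal_sqnr_db(8)
print(f"Ideal 8b SQNR: {limit_8b_db:.2f} dB")

# Reported SNDR range from the abstract.
reported_min_db, reported_max_db = 47.0, 54.0
print("Upper end exceeds ideal 8b limit:", reported_max_db > limit_8b_db)
```

Under this assumption the ideal figure is just under 50dB, so only the upper part of the reported range lies above it, which matches the wording "partly exceeds".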

#### 4.3 - 2:20 p.m.

A 10-Bit 1-GHz 33-mW CMOS ADC, B.D. Sahoo, B. Razavi, University of California, Los Angeles

A pipelined ADC digitally calibrates capacitor mismatches in its 4-bit first stage and the gain error in the first 5 stages. Using a one-stage op amp with a gain of 10 and realized in 65-nm CMOS technology, the ADC digitizes a 490-MHz input with an SNDR of 52.4 dB, achieving an FOM of 0.097pJ/conversion-step.

#### 4.4 - 2:45 p.m.

**A 61.5dB SNDR Pipelined ADC Using Simple Highly-Scalable Ring Amplifiers,** B. Hershberg, S. Weaver, K. Sobue\*, S. Takeuchi\*, K. Hamashita\*, U.-K. Moon, Oregon State University, \*Asahi Kasei Microdevices

A ring amplifier based pipelined ADC is presented that uses simple cells constructed from small inverters and capacitors to perform amplification. The basic ring amplifier structure is characterized and demonstrated to be highly scalable, power efficient, and compression-immune (inherent rail-to-rail output swing). The prototype 10.5-bit ADC, fabricated in 0.18µm CMOS technology, achieves 61.5dB SNDR at a 30MHz sampling rate and consumes 2.6mW, resulting in a FoM of 90fJ/conversion-step.
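The figures of merit quoted for papers 4.3 and 4.4 are not defined in the abstracts. The sketch below assumes the commonly used Walden form, FoM = P / (2^ENOB · fs) with ENOB = (SNDR − 1.76) / 6.02, and takes the sampling rate as fs; both choices are assumptions made for illustration.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_j(power_w, sndr_db, sample_rate_hz):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 ** enob(sndr_db) * sample_rate_hz)


# Paper 4.3: 33 mW, 52.4 dB SNDR, 1 GS/s (taking "1-GHz" as the sampling rate).
fom_43 = walden_fom_j(33e-3, 52.4, 1e9)
# Paper 4.4: 2.6 mW, 61.5 dB SNDR, 30 MS/s.
fom_44 = walden_fom_j(2.6e-3, 61.5, 30e6)

print(f"4.3: {fom_43 * 1e15:.1f} fJ/conversion-step")
print(f"4.4: {fom_44 * 1e15:.1f} fJ/conversion-step")
```

With these assumptions both results land close to the quoted 0.097pJ and 90fJ per conversion step, which suggests, without confirming, that this is the definition the authors used.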

# Session 5 – Honolulu Suite Ultra Low Power Radios

Wednesday, June 13, 1:30 p.m.

Chairpersons: G. Van der Plas, IMEC

K. Agawa, Toshiba Corp.

#### 5.1 - 1:30 p.m.

A 440pJ/bit 1Mb/s 2.4GHz Multi-Channel FBAR-based TX and an Integrated Pulse-shaping PA, A. Paidimarri, P. Nadeau, P. Mercier, A. Chandrakasan, Massachusetts Institute of Technology

A 2.4GHz TX in 65nm CMOS defines three channels using three high-Q FBARs and supports OOK, BPSK and MSK. The oscillators have –132dBc/Hz phase noise at 1MHz offset, and are multiplexed to an efficient resonant buffer. Optimized for low output power ~ –10dBm, a fully-integrated PA implements 7.5dB dynamic output power range using a dynamic impedance transformation network, and is used for amplitude pulse-shaping. Peak PA efficiency is 44.4% and peak TX efficiency is 33%. The entire TX consumes 440pJ/bit at 1Mb/s.

#### 5.2 - 1:55 p.m.

An 8-PPM, 45 pJ/bit UWB Transmitter with Reduced Number of PA Elements, V. Majidzadeh, A. Schmid, Y. Leblebici, J. Rabaey\*, EPFL, \*University of California, Berkeley

An impulse radio ultra wideband (IR-UWB) transmitter (Tx) using finite impulse response synthesis of the raised-cosine pulse is presented. A symmetric pulse combining technique is proposed to reduce the number of power amplifier elements by half. A novel all-digital delay locked loop (AD-DLL) serves as an 8-ary pulse position modulator (PPM) for aggressive duty-cycling of the Tx. The chip is fabricated in 90nm CMOS technology and consumes 540µW from a 1V power supply, resulting in 45 pJ/bit energy efficiency with -26 dBm of output power.

#### 5.3 - 2:20 p.m.

An All 0.5V, 1Mbps, 315MHz OOK Transceiver with 38-µW Carrier-Frequency-Free Intermittent Sampling Receiver and 52-µW Class-F Transmitter in 40-nm CMOS, A. Saito, K. Honda\*, Y. Zheng\*, S. Iguchi\*, K. Watanabe, T. Sakurai\*, M. Takamiya\*, STARC, \*University of Tokyo

An all 0.5V, 1Mbps, 315MHz OOK transceiver in 40-nm CMOS for a body area network is developed. Both a 38-pJ/bit carrier-frequency-free intermittent sampling receiver with -55dBm sensitivity and a 52-pJ/bit class-F transmitter with -21dBm output power achieve the lowest energy among published transceivers for wireless sensor networks.
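Papers 5.1 to 5.3 quote both power and energy-per-bit figures. A short sketch, assuming energy per bit is simply average power divided by data rate, relates the two. The power derived for 5.1 and the data rate derived for 5.2 are implied by this relation and are not stated in the abstracts.

```python
def energy_per_bit_j(power_w, bit_rate_bps):
    """Energy per bit in joules, assuming average power over a steady bit stream."""
    return power_w / bit_rate_bps


# Paper 5.3: 38 uW receiver and 52 uW transmitter, both at 1 Mbps.
rx_53 = energy_per_bit_j(38e-6, 1e6)
tx_53 = energy_per_bit_j(52e-6, 1e6)

# Paper 5.1: 440 pJ/bit at 1 Mb/s implies this average TX power (derived).
tx_power_51_w = 440e-12 * 1e6

# Paper 5.2: 540 uW at 45 pJ/bit implies this data rate (derived).
implied_rate_52_bps = 540e-6 / 45e-12

print(f"5.3 RX: {rx_53 * 1e12:.0f} pJ/bit, TX: {tx_53 * 1e12:.0f} pJ/bit")
print(f"5.1 implied TX power: {tx_power_51_w * 1e6:.0f} uW")
print(f"5.2 implied data rate: {implied_rate_52_bps / 1e6:.0f} Mb/s")
```

The 5.3 results reproduce the 38-pJ/bit and 52-pJ/bit figures in the abstract; the other two values are only what the simple power-over-rate relation would imply.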

#### 5.4 - 2:45 p.m.

A 2.4GHz Hybrid PPF Based BFSK Receiver with ±180ppm Frequency Offset Tolerance for Wireless Sensor Networks, R. Ni, K. Mayaram, T. Fiez, Oregon State University

A low-power 2.4GHz 1Mb/s hybrid polyphase filter (PPF) based BFSK receiver with +/-180ppm frequency offset tolerance (FOT) and 40dB adjacent channel rejection (ACR) at a modulation index (MI) of 2 is presented for medium-rate wireless sensor networks (WSNs). High FOT at low MI is achieved by a frequency-to-energy conversion architecture using PPFs without any frequency correction. The proposed hybrid topology of the PPF provides an improved ACR at reduced power. The prototype receiver fabricated in a 0.13um CMOS process, including RF and analog front-ends, consumes 1.95mW from a 1V supply with -84dBm sensitivity. Excellent FOT and ACR are achieved for ultra-low baseband power with the proposed receiver. This relaxes the frequency accuracy requirement and improves the radio co-existence in the presence of interferers, reducing the power and cost of WSN links.

# CIRCUITS SESSION 6 – TAPA I

# Technology/Circuits Joint Focus Session - Emerging Nonvolatile Memory

Wednesday, June 13, 3:25 p.m.

Chairpersons: J. DeBrosse, IBM

S. Yamakawa, Sony Corp.

#### 6.1 - 3:25 p.m.

A 0.13μm 8Mb Logic Based Cu<sub>x</sub>Si<sub>y</sub>O Resistive Memory with Self-Adaptive Yield Enhancement and Operation Power Reduction, X.Y. Xue, W.X. Jian, J.G. Yang, F.J. Xiao, G. Chen, X.L. Xu, Y.F. Xie, Y.Y. Lin, R. Huang\*, Q.T. Zhou\*, J.G. Wu\*, Fudan University, \*Semiconductor Manufacturing International Corp.

A 0.13µm 8Mb Cu<sub>x</sub>Si<sub>y</sub>O resistive memory test macro with a 20F² cell size is developed on a logic process for the first time. Smart, adaptive write-assist and read-assist circuits are proposed and verified to fix the yield and power consumption issues caused by large variations in write speed and high-temperature resistance. SAWM (self-adaptive write mode) helps to enlarge the Roff/Ron window from 8X to 24X at room temperature. The reset bit yield is improved from 61.5% to 100%, and the large power consumption after a successful set is eliminated. SARM (self-adaptive read mode) improves the read bit yield from 98% to 100% at 125°C. The typical access time of the on-pitch voltage-sensing sense amplifier (SA) is 21ns, and high-bandwidth throughput is supported.

#### 6.2 - 3:50 p.m.

A 3.14 um<sup>2</sup> 4T-2MTJ-Cell Fully Parallel TCAM Based on Nonvolatile Logic-in-Memory Architecture, S. Matsunaga, S. Miura\*, H. Honjou\*, K. Kinoshita, S. Ikeda, T. Endoh, H. Ohno, T. Hanyu, Tohoku University, \*NEC Corporation

A four-MOS-transistor/two-MTJ-device (4T-2MTJ) cell circuit is proposed and fabricated for a standby-power-free, high-density, fully parallel nonvolatile TCAM. By optimally merging a nonvolatile storage function and a comparison logic function into a TCAM cell circuit with a nonvolatile logic-in-memory structure, the transistor count required in the cell circuit is minimized. As a result, the cell size becomes 3.14µm² in 90-nm CMOS and 100-nm MTJ technologies, an area reduction of 60% and 86% compared with a 12T-SRAM-based and a 16T-SRAM-based TCAM cell circuit, respectively.

#### 6.3 - 4:15 p.m.

1Mb 4T-2MTJ Nonvolatile STT-RAM for Embedded Memories Using 32b Fine-Grained Power Gating Technique with 1.0ns/200ps Wake-up/Power-off Times, T. Ohsawa, H. Koike, S. Miura\*, H. Honjo\*, K. Tokutome\*, S. Ikeda, T. Hanyu, H. Ohno, T. Endoh, Tohoku University, \*NEC Corporation

A 1Mb nonvolatile STT-RAM using the 4T-2MTJ cell is designed and fabricated using 90nm CMOS and MTJ processes. 32 cells along a word line (WL) are simultaneously power-gated with quick wake-up/power-off times of 1.0ns/200ps, respectively, to reduce operation power and to eliminate standby power of the chip. The cell is experimentally shown to retain data with a static noise margin (SNM) of 0.32V under Vdd=1V. The 1Mb chip with a 2.19µm² cell operates successfully with an array access time of 8ns and a read power of 10.7mW at a 10ns cycle. The macro size of the 1Mb STT-RAM is predicted to become smaller than that of a 1Mb 6T-SRAM in 45nm and beyond.

#### T-6.4 - 4:40 p.m.

A Simple New Write Scheme for Low Latency Operation of Phase Change Memory, Y.-Y. Lin, Y.-C. Chen, F.-M. Lee, M. BrightSky\*, H.-L. Lung, C. Lam\*, Macronix International Co., Ltd., \*IBM T.J. Watson Research Center

The behavior of resistance drift after RESET operation for phase change memory is investigated. We propose, for the first time, an effective way to accelerate the drift so that the program/read latency may better match that for DRAM for SCM (storage class memory) application. By simply applying an extra annealing pulse after RESET we can quickly anneal out many defects (that are responsible for the drift) and provide a drift-free period that enlarges the read window. A physical model is proposed to understand the defect annealing phenomenon, which predicts the resistance drift behavior well.

#### T-6.5 - 5:05 p.m.

Analysis of Random Telegraph Noise and Low Frequency Noise Properties in 3-D Stacked NAND Flash Memory with Tube-Type Poly-Si Channel Structure, M.-K. Jeong, S.-M. Joe, C.-S. Seo, K.-R. Han\*, E. Choi\*, S.-K. Park\*, J.-H. Lee, Seoul National University, \*Hynix Semiconductor Inc.

Random Telegraph Noise (RTN) and low frequency noise (LFN) properties were investigated for the first time in 3-D stacked NAND flash memory with a tube-type poly-Si channel structure. The 3-D stacked NAND flash memory showed a noise power density of bit-line (BL) current (IBL) about 10 times higher than that of 32 nm NAND flash memory. The behavior of ΔIBL was investigated with control-gate bias (VCG), BL bias (VBL) and pass bias (Vpass). As temperature (T) increases, capture and emission times become shorter. To understand the poly-Si channel, planar poly-Si thin film transistors (TFT) with different grain sizes were prepared and analyzed in terms of noise, subthreshold swing (SS), and T.

# Session 7 – Honolulu Suite High Data Rate Wireless and Imaging

Wednesday, June 13, 3:25 p.m.

Chairpersons: A. Cathelin, STMicroelectronics

C.M. Hung, MStar Semiconductor, Inc.

#### 7.1 - 3:25 p.m.

**A 260 GHz Fully Integrated CMOS Transceiver for Wireless Chip-to-Chip Communication,** J.-D. Park, S. Kang, S. Thyagarajan, E. Alon, A. Niknejad, University of California, Berkeley

A fully integrated 260GHz OOK transceiver is demonstrated in 65nm CMOS. Communication at 10Gb/s has been verified over a range of 40 mm. The Tx/Rx dual on-chip antenna array is implemented with half-width leaky wave antennas. Each Tx consists of a quadrupler driven by a class-D<sup>-1</sup> PA with a distributed OOK modulator, and outputs +5 dBm of EIRP. The Rx uses a double balanced mixer to down-convert to a V-band IF signal that is amplified with a wideband IF driver and demodulated on-chip.

#### 7.2 - 3:50 p.m.

**135 GHz 98 mW 10 Gbps ASK Transmitter and Receiver Chipset in 40 nm CMOS,** N. Ono, M. Motoyoshi\*, K. Takano\*, K. Katayama\*, R. Fujimoto\*\*, M. Fujishima\*, Semiconductor Technology Academic Research Center, \*Hiroshima University, \*\*Toshiba Corp.

An ASK transmitter and receiver chipset using 40 nm CMOS technology for wireless communication systems is described, in which a maximum data rate of 10 Gbps and power consumption of 98.4 mW are obtained with a carrier frequency of 135 GHz. A simple circuit and a modulation method to reduce power consumption are selected for the chipset. To realize multi-gigabit wireless communication, the receiver is designed with consideration of the group delay optimization.

#### 7.3 - 4:15 p.m.

A 21.5mW 10+Gb/s mm-Wave Phased-Array Transmitter in 65nm CMOS, L. Kong, E. Alon, University of California, Berkeley

This paper presents a 65nm mm-wave transmitter efficiently supporting QPSK modulation and phased array functionality with a proposed oscillator modulation technique. The design delivers an average output power of 1mW at 10Gb/s and 0.8mW at 14Gb/s while consuming 21.5mA DC current from a 1V supply. At 10Gb/s, an overall transmitter efficiency of 4.65% is achieved, representing ~1.8X improvement over prior art.
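The efficiency quoted in this abstract can be reproduced from the reported output power and supply figures. The sketch below assumes "overall transmitter efficiency" means RF output power divided by DC power (supply current times supply voltage); that definition is an assumption, since the abstract does not spell it out.

```python
def dc_power_w(current_a, supply_v):
    """DC power drawn from the supply."""
    return current_a * supply_v


def tx_efficiency(output_power_w, current_a, supply_v):
    """Ratio of RF output power to DC input power (assumed definition)."""
    return output_power_w / dc_power_w(current_a, supply_v)


# Values from the abstract: 1 mW average output at 10 Gb/s, 21.5 mA from 1 V.
eff_10g = tx_efficiency(1e-3, 21.5e-3, 1.0)
print(f"DC power: {dc_power_w(21.5e-3, 1.0) * 1e3:.1f} mW")
print(f"Efficiency at 10 Gb/s: {eff_10g * 100:.2f} %")
```

The DC power matches the 21.5mW in the paper title, and the ratio matches the quoted 4.65%.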

#### 7.4 - 4:40 p.m.

A UWB IR Timed-Array Radar Using Time-Shifted Direct-Sampling Architecture, C.-M. Lai, K.-W. Tan, L.-Y. Yu, Y.-J. Chen, J.-W. Huang, S.-C. Lai, F.-H. Chung, C.-F. Yen, J.-M. Wu, P.-C. Huang, K.-J. Chang, S.-Y. Huang, T.-S. Chu, National Tsing Hua University

A UWB impulse radio (IR) timed-array radar using a time-shifted direct-sampling architecture is presented. The transmitter array can generate and send a variety of 10GS/s pulses towards targets. The receiver array samples the reflected signal directly in the RF domain by time-interleaved sampling with an equivalent sampling rate of 20 GS/s. The radar system can determine time of arrival (TOA) and direction of arrival (DOA) through time-shifted sampling edges which are generated by on-chip digital-to-time converters (DTC). The proposed architecture has range and azimuth resolutions of 0.75 cm and 3 degrees, respectively. This prototype is implemented in a 0.18µm CMOS technology.
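The abstract does not say how the 0.75 cm range resolution is obtained. One plausible reading, offered here only as a consistency check, is that it corresponds to one equivalent sampling interval of round-trip delay, that is, c / (2 · fs) with fs = 20 GS/s:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_step_m(equivalent_sample_rate_hz):
    """One-way distance covered by one sample interval of round-trip delay."""
    sample_interval_s = 1.0 / equivalent_sample_rate_hz
    return SPEED_OF_LIGHT_M_PER_S * sample_interval_s / 2.0


# Equivalent sampling rate reported in the abstract.
step_m = range_step_m(20e9)
print(f"Range step: {step_m * 100:.2f} cm")
```

Under that assumption the result agrees with the quoted figure, though the paper itself may define resolution differently.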

#### 7.5 - 5:05 p.m.

A 94GHz mm-Wave to Baseband Pulsed-Radar for Imaging and Gesture Recognition, A. Arbabian, S. Kang\*, S. Callender\*, J.-C. Chien\*, B. Afshar\*, A. Niknejad\*, Stanford University, \*University of California, Berkeley

An integrated phase-coherent and pixel-scalable pulsed-radar transceiver with on-chip tapered loop antennas generates programmable pulses down to 36ps using an integrated 94GHz carrier, frequency synthesized and locked to an external reference. A DLL controls the TX pulse position with 2.28ps resolution, which allows the chip to function as a unit element in a timed-array. The receiver also features a >1.5THz GBW DA as the frontend amplifier, wideband quadrature mixers, and a 26GHz quadrature baseband. Phase coherency allows for ~375µm single-target position resolution by interferometry.

# CIRCUITS SESSION 8 – TAPA I Technology/Circuit Joint Focus Session - Advanced SRAM

Thursday, June 14, 8:05 a.m.

Chairpersons: G. Lehman, Infineon Technologies AG

H. Yamauchi, Fukuoka Institute of Technology

#### 8.1 - 8:05 a.m.

A 0.41μA Standby Leakage 32Kb Embedded SRAM with Low-Voltage Resume-Standby Utilizing All Digital Current Comparator in 28nm HKMG CMOS, N. Maeda, S. Komatsu, M. Morimoto, Y. Shimazaki, Renesas Electronics Corp.

A high-performance, low-leakage embedded SRAM for mobile phones is proposed. The proposed SRAM has a low-voltage resume-standby mode to reduce the standby leakage. An all-digital current comparator is also proposed to choose a suitable standby mode. A test chip was fabricated using 28 nm HKMG CMOS technology. The proposed 32 Kb SRAM has a standby leakage of 0.41 µA, which is half that of the conventional design, with 420 ps access.

#### 8.2 - 8:30 a.m.

A 13.8pJ/Access/Mbit SRAM with Charge Collector Circuits for Effective Use of Non-Selected Bit Line Charges, S. Moriwaki, Y. Yamamoto, A. Kawasumi, T. Suzuki\*, S. Miyano, T. Sakurai\*\*, H. Shinohara, Semiconductor Technology Academic Research Center, \*Panasonic Corp., \*\*University of Tokyo

A 1Mb SRAM with charge collector circuits for effective use of non-selected bit line charges has been fabricated in 40nm technology. These circuits reduce two major sources of wasted power in low-voltage SRAM: excess bit line swing due to random variation, and bit line swing of non-selected columns. An energy of 13.8pJ/Access/Mbit, the lowest among previous works, has been achieved.

#### 8.3 - 8:55 a.m.

A SRAM Cell Array with Adaptive Leakage Reduction Scheme for Data Retention in 28nm High-K Metal-Gate CMOS, P. Hsu, Y. Tang, D. Tao, M.-C. Huang, M.-J. Wang, C. Wu, Q. Li, TSMC

A 1Mbit SRAM macro with an adaptive leakage current reduction scheme is implemented in 28nm high-k metal gate CMOS technology. The proposed scheme includes a current limiter that limits cell array leakage current at various process-voltage-temperature (PVT) corners. The leakage current is reduced by more than 60% at fast process corners by increasing the virtual ground voltage (Vvgnd) while maintaining sufficient data retention margin. At low VDD or slow process corners, Vvgnd is lowered to maintain data integrity in the bitcell.

#### 8.4 - 9:20 a.m.

A 28nm High-k Metal-Gate SRAM with Asynchronous Cross-Couple Read Assist (AC2RA) Circuitry Achieving 3X Reduction on Speed Variation for Single Ended Arrays, R. Lee, J.-P. Yang, C.-E. Huang, C.-C. Chiu, W.-S. Kao, H.-C. Cheng, H.-J. Liao, J. Chang, TSMC

An Asynchronous Cross-Couple Read Assist (AC2RA) circuit scheme is introduced for single-ended sensing to minimize speed variation in a 28nm HKMG process. It reduces SRAM array speed variation by 63.3%, which is adequate to cover 6σ variation. Access time is also improved by faster sensing.

# Session 9 – Honolulu Suite Medical Electronics

Thursday, June 14, 8:05 a.m.

Chairpersons: J. Gealow, MediaTek Wireless, Inc.

C.-Y. Lee, National Chiao Tung University

#### 9.1 - 8:05 a.m.

A 0.6V 2.9μW Mixed-Signal Front-End for ECG Monitoring, M. Yip, J.L. Bohorquez\*, A.P. Chandrakasan, Massachusetts Institute of Technology, \*Convergence Medical Devices

This paper presents a mixed-signal ECG front-end that uses aggressive voltage scaling to maximize power-efficiency and facilitate integration with low-voltage DSPs. 50/60Hz interference is canceled using mixed-signal feedback, enabling low-voltage operation by reducing dynamic range requirements. Analog circuits are optimized for ultra-low-voltage, and a SAR ADC with a dual-DAC architecture eliminates the need for a power-hungry ADC buffer. Oversampling and ΔΣ-modulation leveraging near-VT digital processing are used to achieve ultra-low-power operation without sacrificing noise performance and dynamic range. The fully-integrated front-end is implemented in a 0.18µm CMOS process and consumes 2.9µW from 0.6V.

#### 9.2 - 8:30 a.m.

**A 700μW 8-Channel EEG/Contact-impedance Acquisition System for Dry-electrodes,** S. Mitra, J. Xu, A. Matsumoto\*, K.A.A. Makinwa\*\*, C. Van Hoof, R.F. Yazicioglu, imec, \*Panasonic Corporation, \*\*Delft University of Technology

A 700µW 8-channel active-electrode (AE) based EEG monitoring system is presented. The complete system consists of 9 AEs and a back-end analog signal processor. It is capable of continuously recording EEG signals and electrode-tissue contact impedance (ETI). The EEG channels have 1.2GΩ input impedance, 1.75µVrms noise (0.5-100Hz), 84dB CMRR, and can reject ±250mV of electrode offset, while consuming less than 87µW (including ETI). The system facilitates ambulatory use and patient comfort, while delivering high quality EEG signals.

#### 9.3 - 8:55 a.m.

A Wirelessly Powered Log-based Closed-loop Deep Brain Stimulation SoC with Two-way Wireless Telemetry for Treatment of Neurological Disorders, H.-G. Rhew, J. Jeong, J. Fredenburg, S. Dodani, P. Patil, M. Flynn, University of Michigan

A log-based closed-loop Deep Brain Stimulation system detects and processes low-frequency brain field signals to optimize stimulation parameters. The fully self-contained single-chip system incorporates LNAs, a log-ADC, digital log-filters, a log-DSP with a PI-controller, current stimulators, a two-way wireless transceiver, a clock generator, and an RF energy harvester. The 2×2mm² 180nm CMOS prototype consumes 468µW for recording and processing neural signals, stimulation, and two-way wireless communication.

#### 9.4 - 9:20 a.m.

**A Fully-Integrated 10.5μW Miniaturized (0.125mm²) Wireless Neural Sensor,** D. Yeager, W. Biederman, N. Narevsky, E. Alon, J. Rabaey, University of California, Berkeley

A wirelessly powered 0.125mm² 65nm CMOS IC for BMI applications integrates four 1.5µW amplifiers (6.5µVrms input-referred noise for a 10kHz bandwidth) with power conditioning and communication circuitry. The multi-node backscatter FDMA communication scheme frequency-locks to a wireless interrogator. The full system, verified wirelessly with MATLAB-generated neural data, consumes 10.5µW and operates at 1mm in air with 50mW transmit power.

# Session 10 – TAPA I Wireless Connectivity and Software Defined Radios

Thursday, June 14, 10:00 a.m.

Chairpersons: B. Ginsberg, Texas Instruments

H. Ishikuro, Keio University

#### 10.1 - 10:00 a.m.

A -70dBm-Sensitivity 522Mbps 0.19nJ/bit-TX 0.43nJ/bit-RX Transceiver for TransferJet<sup>™</sup> SoC in 65nm CMOS, D. Miyashita, K. Agawa, H. Kajihara, K. Sami, M. Iwanaga, Y. Ogasawara, T. Ito, D. Kurose, N. Koide, T. Hashimoto, H. Sakurai, T. Yamaji, T. Kurihara, K. Sato, I. Seto, H. Yoshid, R. Fujimoto, Y. Unikawa, Toshiba Corp.

TransferJet<sup>™</sup> is an emerging high-speed close-proximity wireless communication standard, which enables data transfer of up to 522Mbps within a range of a few centimeters. We have developed a fully integrated TransferJet SoC with a 4.48-GHz operating frequency and a 560-MHz bandwidth (BW) using 65nm CMOS technology. Baseband filtering techniques for both the transmitter (TX) and the receiver (RX) are proposed to obtain a sensitivity of -70dBm with low power consumption. The SoC achieves an energy per bit of 0.19nJ/bit and 0.43nJ/bit for the TX and the RX, respectively. We have also built the world's smallest module prototype using the SoC, which is suitable for small mobile devices.

#### 10.2 - 10:25 a.m.

A 2.4GHz WLAN Transceiver with Fully-integrated Highly-linear 1.8V 28.4dBm PA, 34dBm T/R Switch, 240MS/s DAC, 320MS/s ADC, and DPLL in 32nm SoC CMOS, Y. Tan, J. Duster, C.-t. Fu, E. Alpman, A. Balankutty, C.C. Lee, A. Ravi, S. Pellerano, K. Chandrashekar, H. S. Kim, B. Carlton, S. Suzuki, M. Shafi, Y. Palaskas, H. Lakdawala, Intel Corporation

A 2.4GHz WLAN transceiver is presented with a fully-integrated highly-linear 28.4dBm PA, 34dBm T/R switch, 240MS/s DAC and 320MS/s ADC (high OSR for relaxed filtering), DPLL and fractional LOG, in 32nm CMOS. For 802.11g 54Mbps, without linearization the TX delivers 19.8dBm at 12.5% efficiency (PA 21.6dBm/19.7% PAE) for -25dB EVM and mask-compliant 22.8dBm/18.5%, while the RX achieves 4.8dB NF, -69dBm sensitivity, and -8dBm IIP3.

#### 10.3 - 10:50 a.m.

**A +30.5 dBm CMOS Doherty Power Amplifier with Reliability Enhancement Technique,** K. Onizuka, S. Saigusa, S. Otaka, Toshiba Corporation

A watt-level, fully integrated 1:1 Doherty power amplifier for the 2.4 GHz band is demonstrated in 65 nm CMOS. Both a high peak output power of +30.5 dBm and a high PAE of 23% at 6 dB power back-off are achieved by the proposed compact output network. A newly introduced reliability enhancement technique for the sub-PA also prolongs time to failure by up to 75%. The PA satisfies the IEEE 802.11b and 11g spectrum masks at output power levels of 25.5 and 21.5 dBm, respectively, from a supply voltage of 3.3 V.
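The "watt-level" description follows from the usual dBm definition (0 dBm = 1 mW). The short sketch below converts the quoted +30.5 dBm peak and the 6 dB back-off point to watts; the function names are illustrative and are not taken from the paper.

```python
import math


def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm is defined as 1 mW)."""
    return 10 ** (p_dbm / 10) / 1000.0


def watts_to_dbm(p_w: float) -> float:
    """Convert a power level in watts to dBm."""
    return 10 * math.log10(p_w * 1000.0)


PEAK_DBM = 30.5    # peak output power quoted in the abstract
BACKOFF_DB = 6.0   # back-off at which the 23% PAE is quoted

peak_w = dbm_to_watts(PEAK_DBM)
backoff_w = dbm_to_watts(PEAK_DBM - BACKOFF_DB)
print(f"peak output:   {peak_w:.2f} W")
print(f"6 dB back-off: {backoff_w:.2f} W")
```

Under this definition the peak output is about 1.12 W and the 6 dB back-off point is about 0.28 W.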

### 10.4 - 11:15 a.m.

A Harmonic-Rejecting CMOS LNA for Broadband Radios, J.W. Park, B. Razavi, University of California, Los Angeles

A feedback LNA employs programmable notch filtering so as to suppress by 20 dB blockers at LO harmonics from 300 MHz to 10 GHz. Fabricated in 65-nm technology, the LNA exhibits a noise figure of less than 3 dB from 300 MHz to 4 GHz while consuming 8.6 mW from a 1.2-V supply.

## 10.5 - 11:40 a.m.

**A 13.5mA Sub-2.5dB NF Multi-Band Receiver,** M. Mikhemar, A. Mirzaei, A. Hadji-Abdolhamid, J. Chiu, H. Darabi, Broadcom Corporation

An ultra low-power multi-band receiver covering any frequency band in the range 0.7-2.5GHz is fabricated in 40nm CMOS and occupies a total area of 1.5mm2. The receiver achieves an NF of 2.4dB, with -2dBm IIP3 and a peak SNR of 35dB, while consuming 13.5mA from the battery, a power reduction of more than three times compared to prior art.

# Session 11 – Successive Approximation A/D Converters

Thursday, June 14, 10:00 a.m.

Chairpersons: M. Flynn, University of Michigan

S. Dosho, Panasonic Corp.

#### 11.1 - 10:00 a.m.

A 2.8GS/s 44.6mW Time-Interleaved ADC Achieving 50.9dB SNDR and 3dB Effective Resolution Bandwidth of 1.5GHz in 65nm CMOS, D. Stepanovic, B. Nikolic, University of California, Berkeley

This paper presents a power- and area-efficient 24-way time-interleaved SAR ADC designed in 65nm CMOS. At 2.8GS/s sampling rate the ADC consumes 44.6mW of power from a 1.2V supply while achieving peak SNDR of 50.9dB and retaining SNDR higher than 48.2dB across the entire first Nyquist zone.
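As a rough illustration of what 24-way interleaving implies, the sketch below computes the rate each sub-ADC would need to sustain and shows a round-robin sample assignment. The round-robin ordering is an assumption made for illustration; the abstract does not describe how the lanes are sequenced.

```python
def per_lane_rate(aggregate_rate_hz: float, n_lanes: int) -> float:
    """Sampling rate of each sub-ADC in an ideal N-way interleaved array."""
    return aggregate_rate_hz / n_lanes


def lane_for_sample(sample_index: int, n_lanes: int) -> int:
    """Assumed round-robin assignment: sample n is converted by lane n mod N."""
    return sample_index % n_lanes


N_LANES = 24            # interleaving factor from the abstract
AGGREGATE_RATE = 2.8e9  # 2.8 GS/s from the abstract

print(f"per-lane rate: {per_lane_rate(AGGREGATE_RATE, N_LANES) / 1e6:.1f} MS/s")
print("lanes for samples 22..25:", [lane_for_sample(n, N_LANES) for n in range(22, 26)])
```

Each SAR slice therefore only needs to run at roughly 117 MS/s, which is the usual motivation for interleaving many moderate-speed SAR converters.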

#### 11.2 - 10:25 a.m.

A 3.8mW 8b 1GS/s 2b/cycle Interleaving SAR ADC with Compact DAC Structure, C.-H. Chan, Y. Zhu, S.-W. Sin, S.-P. U, R. Martins\*, University of Macau, \*TU of Lisbon

An 8b 1GS/s ADC is presented that interleaves two 2b/cycle SARs. To enhance speed and save power, the prototype utilizes segmentation switching and custom-designed DAC array with high density in a low parasitic layout structure. It operates at 1GS/s from 1V supply without interleaving calibration and consumes 3.8mW of power, exhibiting a FoM of 24fJ/conversion step. The ADC occupies an active area of 0.013mm^2 in 65nm CMOS including on-chip offset calibration.

## 11.3 - 10:50 a.m.

A 4.5-mW 8-b 750-MS/s 2-b/Step Asynchronous Subranged SAR ADC in 28-nm CMOS Technology, Y.-C. Lien, MediaTek

An 8-b 2-b/step asynchronous subranged SAR ADC is presented. It incorporates a subranging technique to obtain fast reference settling for MSB conversion. The capacitive interpolation reduces the number of NMOS switches and lowers the matching requirement of the resistive DAC. The proposed timing scheme avoids the need for a specific external clock duty cycle to define the sampling period, as required in a conventional asynchronous SAR ADC. Operating at 750 MS/s, this ADC consumes 4.5 mW from a 1-V supply and achieves an ENOB of 7.2 and a FOM of 41 fJ/conversion-step. It is fabricated in 28-nm CMOS technology and occupies an active area of only 40 µm × 100 µm.
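The quoted figure of merit is consistent with the commonly used definition FOM = P / (2^ENOB · fs). The abstract does not state which definition was used, so the sketch below is an assumption that happens to reproduce the quoted number.

```python
def fom_per_conversion_step(power_w: float, fs_hz: float, enob_bits: float) -> float:
    """Energy per conversion step in joules: P / (2**ENOB * fs)."""
    return power_w / (2 ** enob_bits * fs_hz)


# Values quoted in the abstract: 4.5 mW, 750 MS/s, ENOB of 7.2 bits.
fom = fom_per_conversion_step(4.5e-3, 750e6, 7.2)
print(f"{fom * 1e15:.1f} fJ/conversion-step")
```

The result is about 40.8 fJ/conversion-step, which rounds to the 41 fJ/conversion-step stated above.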

## 11.4 - 11:15 a.m.

A 34fJ 10b 500 MS/s Partial-Interleaving Pipelined SAR ADC, Y. Zhu, C.-H. Chan, S.-W. Sin, S.-P. U, R. Martins\*, University of Macau, \*TU of Lisbon

A 10b 500MS/s ADC is presented that shares a full-speed SAR at front-end and interleaves the pipelined residue amplification with shared opamp and 2nd-stage SAR ADCs, which achieves high speed, low power and compact area. The prototype ADC in 65nm CMOS achieves a mean SNDR of 55.4dB with 8.2mW power dissipation at 1.2V. The active die area including the offset calibrations is 0.046mm^2.

### 11.5 - 11:40 a.m.

A 3.2fJ/c.-s. 0.35V 10b 100KS/s SAR ADC in 90nm CMOS, H.-Y. Tai, H.-W. Chen, H.-S. Chen, National Taiwan University

A low-voltage energy-efficient SAR ADC is presented in this paper with four techniques. Arbitrary weight capacitor array tolerates errors to reduce conversion time. To operate under low voltage, DAC common mode level shift and leakage reduction sample switch with a charge pump are proposed. Differential control logic is used to save its digital power. The prototype ADC consumes 170nW at 100KS/s from a 0.35V supply. It achieves an SNDR of 56.3dB at Nyquist rate and its FOM is 3.2fJ/c.-s.
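When an abstract reports SNDR rather than ENOB, the effective number of bits is usually derived as ENOB = (SNDR − 1.76) / 6.02. The sketch below applies that conversion to the numbers quoted here; it assumes the FOM is the power divided by 2^ENOB times the sampling rate, which the abstract does not spell out.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine input with the given SNDR."""
    return (sndr_db - 1.76) / 6.02


def fom_from_sndr(power_w: float, fs_hz: float, sndr_db: float) -> float:
    """Energy per conversion step in joules, using ENOB derived from SNDR."""
    return power_w / (2 ** enob_from_sndr(sndr_db) * fs_hz)


# Values quoted in the abstract: 170 nW, 100 kS/s, 56.3 dB SNDR at Nyquist.
enob = enob_from_sndr(56.3)
fom = fom_from_sndr(170e-9, 100e3, 56.3)
print(f"ENOB = {enob:.2f} bits")
print(f"FOM  = {fom * 1e15:.1f} fJ/c.-s.")
```

This gives an ENOB of about 9.06 bits and about 3.2 fJ/c.-s., in line with the stated figure.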

# Rainbow Room Technology and Circuits Joint Luncheon Talk

Thursday, June 14, 12:00 p.m.

Nano Satellites, CubeSats, and the Next Space Generation, James W. Cutler, University of Michigan

Enabled by advancements in VLSI and related technology, spacecraft are becoming smaller and more capable and flying to extremely interesting locations. Spacecraft provide global communication, global geolocation, and explore the farthest reaches of our solar system and beyond. Humanity has landed rovers on Mars, returned space probes after they have landed on asteroids, and discovered liquid oceans on far off moons. Space, though, is an extremely challenging environment. From a system perspective, spacecraft need to be small and light to enable low-cost launches, and proper power management, conversion, and generation are fundamental mission enablers.

# Session 12 – TAPA I Technology/Circuits Joint Focus Session - Design Enablement in Scaled CMOS

Thursday, June 14, 1:30 p.m.

Chairpersons: K. Wilcox, AMD

K. Nose, Renesas Electronics Corp.

#### 12.1 - 1:30 p.m.

**A 22nm Dynamically Adaptive Clock Distribution for Voltage Droop Tolerance,** K. Bowman, C. Tokunaga, T. Karnik, V. De, J. Tschanz, Intel

An all-digital dynamically adaptive clock distribution mitigates the impact of high-frequency supply voltage (Vcc) droops on microprocessor performance and energy efficiency. Silicon measurements from a test chip in a 22nm tri-gate technology demonstrate simultaneous throughput gains and energy reductions ranging from 14% and 3% at 1.0V to 31% and 15% at 0.6V, respectively, for a 10% Vcc droop.

# 12.2 - 1:55 p.m.

**Voltage Droop Reduction Using Throttling Controlled by Timing Margin Feedback,** M. Floyd, A. Drake\*, R. Berry, H. Chase, R. Willaman, J. Pena, IBM Systems and Technology Group, \*IBM Austin Research Lab

An active processor throttling control loop was enabled in the shipping POWER7™ based P775 supercomputer to mitigate voltage droop. Critical path measurement circuits built into the POWER7 processor chips are used to dynamically measure and react to loss of timing margin. This technique was used to save power without dropping frequency and to only engage if a worst-case droop event occurred in the system. As a result, worst-case workload-induced voltage droop events are reduced by around 50% compared to the system operating without the control loop. The reduction in operating voltage afforded by this technique translates to significant yield improvement, reduced failure rates (around 60% FIT reduction), and improved power efficiency (32W per processor chip, which translates into more than \$600 per node per year, which is well more than \$250,000 per year in a proposed 512 node installation).

# 12.3 - 2:20 p.m.

An On-Die All-Digital Delay Measurement Circuit with 250fs Accuracy, M. Mansuri, B. Casper, F. O'Mahony, Intel Corporation

This paper demonstrates an in-situ delay measurement circuit which precisely characterizes key clocking circuits such as full phase rotation interpolators. This on-die all-digital circuit produces a digital output value proportional to the relative delay between two clocks, normalized to the clock period. The circuit requires no calibration for process, voltage, and temperature (PVT) variation and measures the delay with 250fs absolute accuracy and a repeatability of 10fs-rms.

### 12.4 - 2:45 p.m.

A 47% Access Time Reduction with a Worst-Case Timing-Generation Scheme Utilizing a Statistical Method for Ultra Low Voltage SRAMs, A. Kawasumi, Y. Takeyama, O. Hirabayashi, K. Kushida, F. Tachibana, Y. Niki, S. Sasaki, T. Yabe, Toshiba

A variation-tolerant sense amplifier timing generator which utilizes a statistical method is proposed. The circuit monitors all the bitline delays and generates the worst-case timing from the delay distribution. The timing-generation circuits have been implemented in 28nm and 40nm SRAMs. A 47% access time improvement at 0.5V has been confirmed in measured results.

# Session 13 – Honolulu Suite High Performance Transceivers

Thursday, June 14, 1:30 p.m.

Chairpersons: J. Zerbe, Rambus

J. Terada, NTT Microsystem Integration Labs

# 13.1 - 1:30 p.m.

A 3.1mW/Gbps 30Gbps Quarter-Rate Triple-Speculation 15-tap SC-DFE RX Data Path in 32nm CMOS, T. Toifl, M. Ruegg\*, R. Inti\*\*, C. Menolfi, M. Brändli, M. Kossel, P. Buchmann, P.A. Francese, T. Morf, IBM Research GmbH, \*Miromico, \*\*Oregon State University

This paper describes a low-power implementation of a receiver data path, consisting of the RX termination with ESD, continuous-time linear equalizer (CTLE), and a 15-tap decision feedback equalizer (DFE) running at quarter rate. While the first 3 DFE taps are implemented by speculation, the latter 12 taps use a switched-cap (SC-DFE) approach. The circuit was produced in 32nm SOI-CMOS, and was measured to receive 30Gb/s PRBS31 data at <10^-12 BER over a 36dB loss channel with an energy efficiency of 3.1mW/Gbps.
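An efficiency quoted in mW/Gbps is numerically the same as energy per bit in pJ/bit, and multiplying it by the data rate gives the implied power. The sketch below applies that arithmetic to the quoted figures; the resulting total is derived here and is not a number stated in the abstract.

```python
def link_power_mw(efficiency_mw_per_gbps: float, data_rate_gbps: float) -> float:
    """Power implied by an efficiency figure at a given data rate."""
    return efficiency_mw_per_gbps * data_rate_gbps


def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ: 1 mW / 1 Gbps = 1e-3 J/s / 1e9 bit/s = 1 pJ/bit."""
    return power_mw / data_rate_gbps


power = link_power_mw(3.1, 30.0)  # 3.1 mW/Gbps at 30 Gbps, from the abstract
print(f"implied data-path power: {power:.0f} mW")
print(f"energy per bit: {energy_per_bit_pj(power, 30.0):.1f} pJ/bit")
```

At 30 Gbps the quoted efficiency corresponds to roughly 93 mW for the data path.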

# 13.2 - 1:55 p.m.

A Wide Common-Mode Fully-Adaptive Multi-Standard 12.5Gb/s Backplane Transceiver in 28nm CMOS, J. Savoj, K. Hsieh, P. Upadhyaya, F.-T. An, A. Bekele, S. Chen, X. Jiang, K.W. Lai, C.F. Poon, A. Sewani, D. Turker, K. Venna, D. Wu, B. Xu, E. Alon\*, K. Chang, Xilinx, Inc., \*University of California, Berkeley

This paper describes the design of a fully-adaptive backplane transceiver embedded in a state-of-the-art, low-leakage, 28nm CMOS FPGA. The wide common-mode receive AFE utilizes a three-stage CTLE to provide selective frequency boost for long-tail ISI cancellation. A 5-tap speculative DFE removes the immediate post-cursor ISI. Both the CTLE and the DFE are fully adaptive using a sign-sign LMS algorithm. A novel clocking technique uses wideband LC and ring oscillators for reliable clocking over 0.6-12.5Gb/s operation. The transmitter utilizes a 3-tap FIR and provides flexibility for supply- and ground-referenced operation. The transceiver achieves BER < 10^-15 over a 33dB-loss backplane at 12.5Gb/s and over multiple channels with 10G-KR characteristics at 10.3125Gb/s.
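Sign-sign LMS updates each equalizer tap using only the sign of the error and the sign of the corresponding past decision, which keeps the adaptation hardware simple. The following is a minimal behavioral sketch for a 5-tap DFE on a noise-free channel. The post-cursor values, step size, and function names are hypothetical and chosen only for illustration; the paper's actual adaptation loop is not described in the abstract.

```python
import random


def sgn(x: float) -> float:
    """Two-level sign function (zero is treated as positive)."""
    return 1.0 if x >= 0 else -1.0


def adapt_dfe(post_cursors, n_symbols=20000, mu=1e-3, seed=0):
    """Adapt DFE taps with sign-sign LMS on a noise-free channel.

    post_cursors: assumed post-cursor ISI relative to a unit main cursor.
    Returns the adapted tap weights, which should approach post_cursors.
    """
    rng = random.Random(seed)
    n_taps = len(post_cursors)
    taps = [0.0] * n_taps
    tx_hist = [1.0] * n_taps  # previously transmitted symbols, newest first
    rx_hist = [1.0] * n_taps  # previous receiver decisions, newest first
    for _ in range(n_symbols):
        x = rng.choice((-1.0, 1.0))
        # Channel: unit main cursor plus post-cursor ISI from earlier symbols.
        received = x + sum(h * s for h, s in zip(post_cursors, tx_hist))
        # DFE: subtract the ISI estimated from past decisions.
        y = received - sum(w * d for w, d in zip(taps, rx_hist))
        decision = sgn(y)
        err = y - decision  # error against the ideal +/-1 level
        # Sign-sign LMS: only the signs of the error and the data are used.
        for k in range(n_taps):
            taps[k] += mu * sgn(err) * sgn(rx_hist[k])
        tx_hist = [x] + tx_hist[:-1]
        rx_hist = [decision] + rx_hist[:-1]
    return taps


channel = [0.30, 0.20, -0.15, 0.10, 0.05]  # hypothetical post-cursor ISI
for h, w in zip(channel, adapt_dfe(channel)):
    print(f"channel {h:+.2f}  adapted tap {w:+.3f}")
```

Because the update magnitude is fixed at `mu`, the taps do not settle exactly; they dither around the channel values by a few step sizes, which is the usual trade-off of the sign-sign variant.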

### 13.3 - 2:20 p.m.

A 25-Gb/s 2.2-W Optical Transceiver Using an Analog FE Tolerant to Power Supply Noise and Redundant Data Format Conversion in 65-nm CMOS, T. Takemoto, H. Yamashita, T. Kamimura, F. Yuki, N. Masuda, H. Toyoda, N. Chujo, K. Kogo, Y. Lee, S. Tsuji, S. Nishimura, Hitachi, Ltd.

A one-chip optical transceiver was developed for backplane transmission inside ICT systems by integrating an analog FE with data format conversion (DFC) in 65-nm CMOS. 10 × 6.25-Gb/s electrical signals were converted into 4 × 25-Gb/s optical signals with 25% redundancy to improve resilience against the possible failure of laser diodes (LD). A TIA with a noise canceller and fully differential LD driver (LDD) with two-tap de-emphasis were proposed for achieving tolerance to power supply noise. The noise canceller suppressed power-supply variations by 98% compared to our previous TIA. Moreover, the integrated redundant DFC improved transceiver reliability without relying on redundant network topologies at the system level. Total power consumption at full channel operation was only 2.2 W, including 236 and 831 mW for the TIA and LDD with power efficiencies of 2.4 and 8.3 mW/Gb/s, respectively.

# 13.4 - 2:45 p.m.

**A 100+ meter 12Gb/s/Lane Copper Cable Link Based on Clock-Forwarding,** T. Ali, W.H. Park, P. Mulage, E.-H. Chen, R. Ho\*, C.-K.K. Yang, UCLA, \*Oracle Labs

Active and passive copper cables for data rates exceeding 10Gb/s are typically limited to less than 20m. Optical fiber, on the other hand, offers superior performance for run lengths greater than 100m, but is costly and has large power requirements exceeding 1W per link. Although a 100m copper link has been demonstrated for 10GBASE-T, it utilizes complex symbols at a lower symbol rate and dissipates large power for DSP. In this paper we propose a 12Gbps/lane active cable link that extends copper cables beyond 100 meters using low-power, small-area repeaters that are powered through the cable and can potentially be embedded in it. An FIR filtering technique and a low-jitter configurable PLL/MDLL enable the delivery of a low-jitter forwarded clock along the entire cable. Total jitter at the end of the cable is 4.4ps RMS. The link at each repeater occupies 0.095mm2 of area in a 65nm technology and dissipates 48mW.

# Session 14 – TAPA II Technology/Circuits Focus Session - Embedded Memory

Thursday, June 14, 3:25 p.m.

Chairpersons: L. Cheng, Oracle

M. Yamaoka, Hitachi America, Ltd.

## 14.1 - 3:25 p.m.

**Isolated Preset Architecture for a 32nm SOI embedded DRAM Macro,** J. Barth, D. Plass, A. Vehabovic, R. Joshi\*, R. Kanj\*, S. Burns, T. Weaver, IBM Systems and Technology Group, \*IBM Research

The Isolated Preset Architecture (IPA) improves retention characteristics by implementing a weak read '1' isolation scheme, allowing a lower stored '1' level to be sensed. The architecture also reduces sub-array area by 15% and bit-line activation power by 2x compared to the previous design, without impacting performance. The architecture was implemented in IBM's 32nm High-K/Metal SOI embedded DRAM technology. Hardware results confirm 1.8ns random cycle and 2x improved retention characteristic with optimized analog reference tuning.

## 14.2 - 3:50 p.m.

A 260mV L-shaped 7T SRAM with Bit-Line (BL) Swing Expansion Schemes Based on Boosted BL, Asymmetric-V<sub>TH</sub> Read-Port, and Offset Cell VDD Biasing Techniques, M.-P. Chen, L.-F. Chen, M.-F. Chang, S.-M. Yang, Y.-J. Kuo, J.-J. Wu, M.-S. Ho\*\*, H.-Y. Su\*, Y.-H. Chu\*, W.-C. Wu\*, T.-Y. Yang\*, H. Yamauchi^, National Tsing Hua University, \*ICL, ITRI, \*\*National Chung Hsing University, ^Fukuoka Institute of Technology

This work proposes bit-line (BL) swing expansion schemes (BL-EXPD) that, to the best of our knowledge, minimize the product of SRAM cell area (A) and minimum operating voltage (VDDmin). The key enablers for minimizing A×VDDmin are the L-shaped 7T cell (L7T) and BL-EXPD. L7T features (1) an area-efficient compact cell layout, (2) a read-disturb-free decoupled 1T read port (RP), and (3) a half-select-disturb-free write-back scheme. BL-EXPD enables a 9x larger read-BL (RBL) swing at the 6σ point than that of our previously proposed Z8T and allows single-BL sensing for cell area saving. A fabricated 65nm 256-row BL 32Kb L7T SRAM achieves a 260mV VDDmin. As a result, its A×VDDmin is ~50% lower than for Z8T and the conventional 8T SRAM cells.

## 14.3 - 4:15 p.m.

**A 1.6-mm2 38-mW 1.5-Gb/s LDPC Decoder Enabled by Refresh-Free Embedded DRAM,** Y.S. Park, D. Blaauw, D. Sylvester, Z. Zhang, University of Michigan

Memory dominates the power consumption of high-throughput LDPC decoders. A 700 MHz refresh-free embedded DRAM (eDRAM) is designed as a low-power memory to retain data for the required access window. 32 1-kb eDRAM arrays are integrated in a 1.6 mm2, 65nm LDPC decoder suitable for IEEE 802.11ad. The LDPC decoder consumes 38 mW for a 1.5 Gb/s throughput at 90 MHz and 10 decoding iterations, and it achieves up to 9 Gb/s at 540 MHz.
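The two throughput figures are consistent with throughput scaling linearly with clock frequency at a fixed iteration count. The sketch below checks that arithmetic; linear scaling is an assumption made here, since the abstract does not say how the 9 Gb/s operating point was configured.

```python
def scaled_throughput_gbps(ref_gbps: float, ref_clock_mhz: float, clock_mhz: float) -> float:
    """Throughput at a new clock, assuming the bits decoded per cycle stay constant."""
    return ref_gbps * clock_mhz / ref_clock_mhz


def bits_per_cycle(throughput_gbps: float, clock_mhz: float) -> float:
    """Average decoded bits per clock cycle."""
    return throughput_gbps * 1e9 / (clock_mhz * 1e6)


# Reference point from the abstract: 1.5 Gb/s at 90 MHz.
print(f"at 540 MHz: {scaled_throughput_gbps(1.5, 90.0, 540.0):.1f} Gb/s")
print(f"bits per cycle: {bits_per_cycle(1.5, 90.0):.1f}")
```

Scaling the 90 MHz point by 540/90 gives 9 Gb/s, matching the peak throughput quoted above.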

# 14.4 - 4:40 p.m.

1Gsearch/sec Ternary Content Addressable Memory Compiler with Silicon-Aware Early-Predict Late-Correct Single-Ended Sensing, I. Arsovski, T. Hebig, D. Dobson, R. Wistort, IBM Systems and Technology Group

This paper describes a Ternary Content Addressable Memory (TCAM) that uses a novel Early-Predict Late-Correct (EPLC) search scheme to achieve the highest published TCAM search throughput of 1 billion searches/sec, while using power-efficient two-phase search sensing that consumes only 0.76W on a 2048x640 TCAM.

A Ternary Content Addressable Memory (TCAM) uses a two-phase search operation where early prediction on its pre-search results prematurely activates the subsequent main-search operation, which is later interrupted only if the final pre-search results contradict the early prediction. This early main-search activation improves performance by 30%, while the low probability of a late-correct has a negligible power impact. This Early-Predict Late-Correct (EPLC) sensing enables a high-performance TCAM compiler implemented in a 32nm High-K Metal Gate SOI process to achieve 1Gsearch/sec throughput on a 2048x640bit TCAM instance while consuming only 0.76W. Embedded Deep-Trench (DT) capacitance for power supply noise mitigation adds 5% overhead for a total TCAM area of 1.56mm2.
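For readers unfamiliar with TCAMs, the sketch below is a purely functional model of a ternary match split into a pre-search on a few bits followed by a main search on the remaining bits. It does not model the EPLC timing or sensing circuits, and the entry format, function names, and `pre_mask` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask); a 0 bit in care_mask is a "don't care".
Entry = Tuple[int, int]


def ternary_match(key: int, value: int, care_mask: int) -> bool:
    """True when key equals value on every bit position the mask cares about."""
    return ((key ^ value) & care_mask) == 0


def two_phase_search(entries: List[Entry], key: int, pre_mask: int) -> Optional[int]:
    """Return the index of the first matching entry, or None if nothing matches.

    Phase 1 (pre-search) compares only the bits selected by pre_mask.
    Phase 2 (main search) compares the remaining bits, and only for
    entries that survived phase 1.
    """
    for index, (value, care) in enumerate(entries):
        if not ternary_match(key, value, care & pre_mask):
            continue  # pre-search miss: the main search is skipped
        if ternary_match(key, value, care & ~pre_mask):
            return index
    return None


table = [
    (0b10100000, 0b11110000),  # matches 1010xxxx
    (0b10101100, 0b11111111),  # exact match only
    (0b00000000, 0b00000000),  # matches anything
]
print(two_phase_search(table, 0b10101100, pre_mask=0b00001111))
print(two_phase_search(table, 0b10111100, pre_mask=0b00001111))
```

In hardware all entries are compared in parallel; the loop here only stands in for that parallel comparison with priority given to the lowest index.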

#### 14.5 - 5:05 p.m.

A 2.8GHz 128-entry x 152b 3-Read/2-Write Multi-Precision Floating-Point Register File and Shuffler in 32nm CMOS, S. Hsu, A. Agarwal, M. Anders, H. Kaul, S. Mathew, F. Sheikh, R. Krishnamurthy, S. Borkar, Intel Corporation

A 128-entry x 152b 3-read/2-write ported multi-precision floating-point register file/shuffler with measured 2.8GHz operation is fabricated in 1.05V, 32nm CMOS. Single-precision (24b-mantissa), 2-way 12b or 4-way 6b reduced mantissa precision modes, certainty tracking bits, mode-dependent gating, area-efficient windowing using 1R/1W cells, and ultra-low-voltage read/write circuits enable 350mV-1.2V wide dynamic voltage range with measured peak energy-efficiency of 751GOPS/W at 400mV, 4-way 6b-mode (22.3x higher than 1.05V single-precision mode) and 19% area reduction over single-precision 3R/2W implementations.

# Session 15 – Honolulu Suite Analog Sensor Interfaces

Thursday, June 14, 3:25 p.m.

Chairpersons: J. Lloyd, Analog Devices

J. Lee, National Taiwan University

### 15.1 - 3:25 p.m.

High-resolution Sensing Sheet for Structural-health Monitoring via Scalable Interfacing of Flexible Electronics with High-performance ICs, Y. Hu, W. Rieutort-Louis, J. Sanz-Robinson, K. Song, J.C. Sturm, S. Wagner, N. Verma, Princeton University

Early-stage damage detection for buildings and bridges requires continuously sensing and assessing strain over large surfaces, yet with centimeter-scale resolution. To achieve this, we present a sensing sheet that combines high-performance ICs with flexible electronics, allowing bonding to such surfaces. The flexible electronics integrates thin-film strain gauges and amorphous-silicon control circuits, patterned on a polyimide sheet that can potentially span large areas. Non-contact links couple digital and analog signals to the ICs, allowing many ICs to be introduced via low-cost sheet lamination for energy-efficient readout and computation over a large number of sensors. Communication between distributed ICs is achieved by transceivers that exploit low-loss interconnects patterned on the polyimide sheet; the transceivers self-calibrate to the interconnect impedance to maximize transmit SNR. The system achieves multi-channel strain readout with sensitivity of 18 uStrainRMS at an energy per measurement of 270nJ, while the communication energy is 12.8pJ/3.3pJ per bit (Tx/Rx) over 7.5m.

# 15.2 - 3:50 p.m.

Nanostructured CMOS Wireless Ultra-Wideband Label-free DNA Analysis SoC, H.M. Jafari, L. Soleymani\*, K. Abdelhalim, E. Sargent, S. Kelley, R. Genov, University of Toronto, \*McMaster University

A 0.13-micron CMOS fully integrated 48-channel UWB label-free DNA analysis SoC is demonstrated in prostate cancer screening. The 3mm × 3mm die includes 578 nanostructured DNA sensors, 48 pH sensors, and 48 temperature sensors and reuses key circuits for cyclic voltammetry, amperometry and temperature regulation.

## 15.3 - 4:15 p.m.

A Fully Integrated Hepatitis B Virus (HBV) DNA Detection SoC based on Monolithic Polysilicon Nanowire CMOS Process, C.-W. Huang, Y.-J. Huang, P.-W. Yen, H.-T. Hsueh, C.-Y. Lin\*, M.-C. Chen\*, C.-H. Ho\*, F.-L. Yang\*, H.-H. Tsai\*\*, H.-H. Liao\*\*, Y.-Z. Juang\*\*, C.-K. Wang, C.-T. Lin, S.-S. Lu, National Taiwan University, \*National Nano Device Laboratories, \*\*National Applied Research Laboratories

A polysilicon nanowire (poly-Si NW) based biosensor is integrated with wireless acquisition circuits in a standard CMOS SoC for the first time. To improve detection quality, a chopper DDA-based analog front-end featuring low noise, high CMRR, and a rail-to-rail input range is implemented. An additional temperature sensor is included to compensate for the temperature drift of the biosensor. The results indicate that the detection limit is as low as 10fM. The capability to distinguish DNAs with a one-base-pair mismatch is also demonstrated.

#### 15.4 - 4:40 p.m.

**A Fully-Electronic Charge-Based DNA Sequencing CMOS Biochip,** A. Manickam, R. Singh, N. Wood, B. Li, A. Ellington, A. Hassibi, University of Texas at Austin

A 90x90 fully-electronic biosensor array for charge-based DNA sequence-by-synthesis is implemented in a 0.18µm standard CMOS process. Each 16µm × 16µm pixel consists of an integrated charge-sensing electrode connected to embedded circuitry capable of detecting DNA polymerization and simultaneously measuring the electrode-electrolyte interface capacitance. The detection dynamic range of this sensor is +90dB while consuming 4 mW from a 3.3V supply when operating at 8.1s/frame.

# 15.5 - 5:05 p.m.

An 88dB SNR, 30μm Pixel Pitch Infra-Red Image Sensor with a 2-Step 16 bit A/D Conversion, A. Peizerat, J.-P. Rostaing, N. Zitouni, N. Baier, F. Guellec, R. Jalby, M. Tchagaspanian, CEA-LETI, Minatec

A new readout IC (ROIC) with a 2 step A/D conversion for cooled infrared image sensors is presented in this paper. The sensor operates at a 50Hz frame rate in an Integrate-While-Read snapshot mode. The 16 bit ADC resolution preserves the excellent detector SNR at full well ( $^3$ Ge-). The ROIC, featuring a 320x256 array with 30µm pixel pitch, has been designed in a standard 0.18µm CMOS technology. The IC has been hybridized (indium bump bonding) to a LWIR (Long Wave Infra Red) detector fabricated using our in-house HgCdTe process. The first measurement results of the detector assembly validate both the 2-step ADC concept and its circuit implementation. This work sets a new state-of-the-art SNR of 88dB.

# CIRCUITS RUMP SESSION Thursday, June 14 8:00 p.m. – 10:00 p.m.

Organizers: J. Zerbe, Rambus

K. Agawa, Toshiba

R1: Is VLSI Innovation Dead?

Moderator: J. Zerbe, Rambus

Since the 90's the drop-off in venture-capitalist funded semiconductor startups has been noticeably precipitous. Between the burst of the internet bubble and the economic slowdown, IC companies seem like they are taking a back seat. Headlines touting innovative companies are now dominated by web software or server/OEMs, with chip companies noticeably absent. Memory has matured, processors have matured, even networking and performance graphics have matured. Attend any conference with a grizzled IC veteran and you may hear the standard refrain "it's all been done before". The question is: is VLSI semiconductor innovation fine, dead, dying, or does it just need some kind of kick-start?

### Panelists:

M. Horowitz, Stanford

H. Morimura, NTT

S. Kawahito, Shizuoka Univ.

G. Shahidi, IBM

S. Kosonocky, AMD

I. Young, Intel

H. Lee, MIT

# R2: Will the Future Have More Analog or Digital Processing?

Organizers: B. Ginsburg, Texas Instruments

M. Takamiya, University of Tokyo

Moderator: B. Ginsburg, Texas Instruments

Since the early days of DSP, traditional analog functionality has been increasingly replaced by digital circuits, due to added flexibility, robustness, and the promise of smaller area and lower power operation. The extent of digital has progressed commensurate with the ability to efficiently digitize signals. ADC energy efficiency has improved by more than 500x over the last decade, such that it can be more efficient to generate bits than actually process them in the digital domain. Given new technology nodes do not exhibit the same digital energy scaling as experienced in the past, will real energy-constrained systems become increasingly analog in their partitioning? Is the push towards digital replacing analog finished, or is the overall trend irreversible?

#### Panelists:

E. Alon, Univ. of California, Berkeley

A. Momtaz, Broadcom

M. Ikeda, University of Tokyo

K. Nakamura, Analog Devices

T. Miki, Renesas

J. Savoj, Xilinx

# Session 16 – TAPA I Circuits Special Focus Session - Flash Memory

Chairpersons: M. Bauer, Micron Tech.

H. Hwang, Samsung Electronics Co., Ltd.

## 16.1 - 8:05 a.m.

A Logic-Compatible Embedded Flash Memory Featuring a Multi-Story High Voltage Switch and a Selective Refresh Scheme, S.-H. Song, K.C. Chun, C.H. Kim, University of Minnesota

A logic-compatible embedded flash memory that uses no special devices other than standard core and IO transistors is demonstrated in a low-power standard logic process having a 5nm tunnel oxide. An overstress-free high voltage switch expands the cell VTH window by >170% while a 5T embedded flash memory cell with a selective row refresh scheme is employed for improved endurance.

## 16.2 - 8:30 a.m.

A New 3-bit Programming Algorithm using SLC-to-TLC Migration for 8MB/s High Performance TLC NAND Flash Memory, S.-h. Shin, D.-K. Shim, J.-Y. Jeong, O.-S. Kwon, S.-Y. Yoon, M.-H. Choi, T.-Y. Kim, H.-W. Park, H.-J. Yoon, Y.-S. Song, Y.-H. Choi, S.-W. Shim, Y.-L. Ahn, K.-T. Park, J.-M. Han, K.-H. Kyung, Y.-H. Jun, Samsung Electronics

We have developed a new 3-bit programming algorithm for high-performance TLC (triple-level-cell, 3-bit/cell) NAND flash memories for the 20nm node and beyond. By using the proposed 3-bit algorithm based on reprogramming with SLC-to-TLC migration, performance and BER are improved by 50% and 68%, respectively, compared to the conventional method. The proposed algorithm is successfully implemented in a 21nm 64Gb TLC NAND flash product that provides 8MB/s write and 400MB/s read throughputs.

#### 16.3 - 8:55 a.m.

x11 Performance Increase, x6.9 Endurance Enhancement, 93% Energy Reduction of 3D TSV-Integrated Hybrid ReRAM/MLC NAND SSDs by Data Fragmentation Suppression, H. Fujii, K. Miyaji, K. Johguchi, K. Higuchi, C. Sun, K. Takeuchi, University of Tokyo

A 3D through-silicon-via (TSV)-integrated hybrid ReRAM/multi-level-cell (MLC) NAND solid-state drive (SSD) architecture is proposed for PC, server and smart phone applications. A NAND-like interface and a sector-access overwrite policy are proposed for the ReRAM. Furthermore, three intelligent data management algorithms (anti-fragmentation, most-recently-used and reconsidered-as-a-fragmentation algorithms) are proposed. The proposed algorithms suppress data fragmentation and excess usage of the MLC NAND by storing hot data in the ReRAM. As a result, an 11 times performance increase, a 6.9 times endurance enhancement and a 93% write energy reduction are achieved compared with the conventional MLC NAND SSD. Both ReRAM write and read latency should be less than 3µs to obtain these improvements. The required endurance for the ReRAM is 1e5. 3D TSV interconnects reduce the energy consumption by 68%.

#### 16.4 - 9:20 a.m.

Adaptive Multi-Pulse Program Scheme Based on Tunneling Speed Classification for Next Generation Multi-Bit/Cell NAND FLASH, Y.S. Cho, I.H. Park, S.Y. Yoon, N.H. Lee, S.H. Joo, K.-W. Song, K. Choi, J.M. Han, K.H. Kyung, Y.-H. Jun, Samsung Electronics Co., Ltd.

As device technology scales down, the Vth's of flash cells show a wide distribution due to process variations such as random dopant fluctuation. Since the widening of the Vth distribution directly degrades NAND flash performance, it is becoming more challenging to build a high-performance flash memory. This paper presents a novel program scheme, called Adaptive Multi-pulse Program (AMP), which targets scaled multi-bit/cell NAND flash devices. In the AMP scheme, memory cells are divided into several groups based on their program speed. A suitable program voltage is applied to each group, so that cells with different program speeds reach their target levels at the same time. Our experimental results show that AMP achieves a ~20% improvement in program performance in a 3-bit/cell architecture in 21nm CMOS technology.

# Session 17 – TAPA II Low Power Receivers and Jitter Reduction

Friday, June 15, 8:05 a.m.

Chairpersons: K. Chang, Xilinx

C. Yoo, Hanyang University

#### 17.1 - 8:05 a.m.

A 25-Gb/s 5-mW CMOS CDR/Deserializer, J.W. Jung, B. Razavi, University of California, Los Angeles

A half-rate clock and data recovery circuit and a deserializer employ charge-steering logic to reduce the power consumption. Realized in 65-nm technology, the overall circuit draws 5 mW from a 1-V supply, producing a clock with an rms jitter of 1.5 ps and a jitter tolerance of 0.5 UIpp at 5 MHz.

## 17.2 - 8:30 a.m.

4×12 Gb/s 0.96 pJ/b/lane Analog-IIR Crosstalk Cancellation and Signal Reutilization Receiver for Single-Ended I/Os in 65 nm CMOS, T. Oh, R. Harjani, University of Minnesota

A crosstalk cancellation and signal reutilization (XTCR) algorithm implemented with analog-IIR networks dramatically improves signal integrity across 4 closely-spaced single-ended PCB traces. The prototype XTCR design implemented in 65 nm CMOS improves the measured average horizontal and vertical eye openings of the 4 channels by 37.5% and 26.4% at 10^-8 BER, while consuming only 0.96 pJ/b/lane.

#### 17.3 - 8:55 a.m.

A Clock Jitter Reduction Circuit Using Gated Phase Blending Between Self-Delayed Clock Edges, K. Niitsu, N. Harigai, D. Hirabayashi, D. Oki, M. Sakurai, O. Kobayashi\*, T.J. Yamaguchi, H. Kobayashi, Gunma University, \*STARC

A clock jitter reduction circuit is presented that exploits phase blending between uncorrelated clock edges that are self-delayed by multiples of the clock cycle, nT. By blending non-correlated clock edges, the output clock edges approach the ideal timing and, thus, timing jitter can be reduced by a factor of the square root of two per stage. There are three technical challenges in realizing this: 1) generating non-correlated clock edges, 2) phase averaging with a small time offset from the ideal center position, and 3) minimizing the deviation of the nT-delay from the ideal nT. The proposed circuit overcomes each of these by exploiting an nT-delay, gated phase blending, and self-calibrated nT-delay elements, respectively. Measurement results with a 180-nm CMOS prototype chip demonstrated an approximately four-fold reduction in timing jitter, from 30.2ps to 8.8ps, in a 500-MHz clock by cascading four stages of the proposed circuit.
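The square-root-of-two figure is the standard result for averaging two uncorrelated timing errors of equal variance. The sketch below computes the ideal reduction for a cascade and checks one blending stage with a small Monte Carlo run. It assumes Gaussian, fully uncorrelated jitter, which is an idealization and not a model of the fabricated circuit.

```python
import math
import random
import statistics


def ideal_jitter_after_stages(sigma_in_ps: float, n_stages: int) -> float:
    """RMS jitter after n ideal blending stages, each averaging two uncorrelated edges."""
    return sigma_in_ps / math.sqrt(2) ** n_stages


def simulate_one_blend(sigma_in_ps: float, n_edges: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of RMS jitter after averaging two independent edges."""
    rng = random.Random(seed)
    blended = [
        (rng.gauss(0.0, sigma_in_ps) + rng.gauss(0.0, sigma_in_ps)) / 2
        for _ in range(n_edges)
    ]
    return statistics.pstdev(blended)


SIGMA_IN_PS = 30.2  # input jitter quoted in the abstract
print(f"ideal, 1 stage:  {ideal_jitter_after_stages(SIGMA_IN_PS, 1):.2f} ps")
print(f"ideal, 4 stages: {ideal_jitter_after_stages(SIGMA_IN_PS, 4):.2f} ps")
print(f"simulated, 1 stage: {simulate_one_blend(SIGMA_IN_PS):.2f} ps")
```

Four ideal stages would give 30.2 ps / 4 ≈ 7.55 ps, so the measured 8.8 ps sits somewhat above the ideal limit, as would be expected if the blended edges are not perfectly uncorrelated or the delays are not exactly nT.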

# 17.4 - 9:20 a.m.

A 1.22mW/Gb/s 9.6Gb/s Data Jitter Mixing Forwarded-Clock Receiver Robust against Power Noise with 1.92ns Latency Mismatch between Data and Clock in 65nm CMOS, S.-H. Chung, L.-S. Kim, KAIST

This paper presents a data jitter mixing forwarded-clock receiver which is robust against power-supply-induced jitter (PSIJ) and tolerates a 1.92ns latency mismatch between data and clock. The forwarded-clock architecture has a tradeoff between the number of clock channels and the achievable data rate due to the lack of jitter correlation between data and clock. Moreover, PSIJ due to a long clock distribution network and an injection-locked oscillator reduces the jitter correlation further. The proposed receiver eases this tradeoff, and also restores the jitter correlation reduced by PSIJ. The test chip achieves 9.6Gb/s with 1.22mW/Gb/s and occupies only 0.017mm<sup>2</sup> in 65nm CMOS.
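The efficiency figure can be turned into an absolute power number with a one-line conversion, since 1 mW/Gb/s equals 1 pJ/b. The helper below is a minimal sketch with a hypothetical name; it only restates the arithmetic implied by the quoted figures.

```python
def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Return link power in mW.

    1 pJ/b at 1 Gb/s is 1e-12 J * 1e9 1/s = 1e-3 W, so the product of
    pJ/b (equivalently mW/Gb/s) and Gb/s is already in mW.
    """
    return energy_pj_per_bit * data_rate_gbps


if __name__ == "__main__":
    # Figures from the abstract: 1.22 mW/Gb/s at 9.6 Gb/s.
    print(f"receiver power: {link_power_mw(1.22, 9.6):.2f} mW")
```

With the abstract's numbers this works out to about 11.7mW for the receiver.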

# Session 18 – TAPA I SoC and Signal Processors

Friday, June 15, 10:00 a.m.

Chairpersons: E. Yeo, Marvell Semiconductors

M. Motomura, Hokkaido University

## 18.1 - 10:00 a.m.

A Low Power Many-Core SoC with Two 32-Core Clusters Connected by Tree Based NoC for Multimedia Applications, H. Xu, J. Tanabe, H. Usui, S. Hosoda, T. Sano, K. Yamamoto, T. Kodaka, N. Nonogaki, N. Ozaki, T. Miyamori, Toshiba Corporation

A low-power many-core SoC for multimedia applications is implemented in 40nm CMOS technology. Within a 209.3mm<sup>2</sup> die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. Processor cores in the cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). High scalability and low power consumption are achieved with parallelized firmware for multimedia applications, such as H.264 1080p 30fps decoding under 500mW and super-resolution 4K2K 15fps image processing under 800mW.

## 18.2 - 10:25 a.m.

A 69mW 140-meter/60fps and 60-meter/300fps Intelligent Vision SoC for Versatile Automotive Applications, Y.-M. Tsai, T.-J. Yang, C.-C. Tsai, K.-Y. Huang, L.-G. Chen, National Taiwan University

A machine-learning-based intelligent vision SoC implemented on a 9.3mm<sup>2</sup> die in a 40nm CMOS process is presented. The architecture realizes an active distance of 140 meters at 60fps and 60 meters at 300fps at Quad-VGA (1280×960) resolution while maintaining a detection rate above 90% for versatile automotive applications. The system supports tracking and prediction of 64 objects. The proposed knowledge-based tracking processor improves power efficiency by 1.62× and increases the frame rate by at least 1.79×. The chip achieves 354.2fps/W and 3.01TOPS/W power efficiency with 69mW average power consumption.

## 18.3 - 10:50 a.m.

A 4320p 60fps H.264/AVC Intra-Frame Encoder Chip with 1.41Gbins/s CABAC, D. Zhou, G. He, W. Fei, Z. Chen, J. Zhou, S. Goto, Waseda University

An H.264/AVC intra-frame video encoder is implemented in 65nm CMOS. With an efficient intra prediction design, its maximum throughput reaches 1991Mpixels/s for 7680×4320p 60fps video, 9.4× to 32× faster than previous designs. The encoder also incorporates a 1.41Gbins/s CABAC architecture that has been enhanced by 31%. Moreover, low energy consumption is achieved by the high parallelism and hardware efficiency of this design. 1080p30 encoding dissipates only 2mW at 0.8V and 9MHz.
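The quoted throughput follows directly from the frame size and frame rate. The sketch below reproduces that arithmetic and also estimates the pixels processed per clock cycle in the 1080p30 operating point, assuming 1080p means 1920×1080; the function names are illustrative.

```python
def pixel_rate_mpixels(width: int, height: int, fps: int) -> float:
    """Pixel throughput in Mpixels/s for a given frame size and frame rate."""
    return width * height * fps / 1e6


def pixels_per_cycle(width: int, height: int, fps: int, clock_mhz: float) -> float:
    """Average pixels processed per clock cycle at the given clock frequency."""
    return pixel_rate_mpixels(width, height, fps) / clock_mhz


if __name__ == "__main__":
    # 4320p60 operating point from the abstract.
    print(f"4320p60: {pixel_rate_mpixels(7680, 4320, 60):.3f} Mpixels/s")
    # 1080p30 at 9 MHz; 1920x1080 is an assumed frame size for '1080p'.
    print(f"1080p30 at 9 MHz: {pixels_per_cycle(1920, 1080, 30, 9.0):.2f} pixels/cycle")
```

The first figure rounds to the 1991Mpixels/s stated in the abstract; the second suggests roughly seven pixels per cycle on average at the low-power operating point.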

## 18.4 - 11:15 a.m.

A Sub-100µW Multi-Functional Cardiac Signal Processor for Mobile Healthcare Applications, S.-Y. Hsu, Y. Ho, Y. Tseng, T.-Y. Lin, P.-Y. Chang, J.-W. Lee, J.-H. Hsiao, S.-M. Chuang, T.-Z. Yang\*, P.-C. Liu, T.-F. Yang, R.-J. Chen\*\*, C. Su, C.-Y. Lee, National Chiao Tung University, \*Taipei Medical University Hospital, \*\*Wan Fang Hospital

A multi-functional cardiac signal processor (CSP) with integrated sensor interfaces is designed for mobile healthcare applications, especially for heart activity diagnosis in different phases. Applying dedicated processing engines, the CSP extracts critical cardiac signal features based on compressed data with 90% storage reduction, while keeping the data network secure. Implemented in 90nm CMOS, the CSP consumes 22.6-46.5µW at 0.5/1.0V in different configurations. In addition, the 10.2µW biopotential and 11.4µW capacitive sensor interfaces further enhance the system functionality.

## 18.5 - 11:40 a.m.

A 0.25V 460nW Asynchronous Neural Signal Processor with Inherent Leakage Suppression, T.-T. Liu, J. Rabaey, University of California, Berkeley

A neural signal processor exploits an asynchronous timing strategy to dynamically minimize leakage and to self-adapt to process variations and different operating conditions. Based on a logic topology with built-in leakage suppression, the self-timed processor demonstrates robust sub-threshold operation down to 0.25V, while consuming only 460nW in 0.03mm<sup>2</sup> in a 65nm CMOS technology, representing a 4.4× reduction in power compared to state-of-the-art designs.

# Session 19 – TAPA II ΔΣ Converters

Friday, June 15, 10:00 a.m.

Chairpersons: I. Fujimori, Broadcom Corp.

M. Yoshioka, Fujitsu Labs, Ltd.

## 19.1 - 10:00 a.m.

A 10 MHz BW 50 fJ/conv. Continuous Time ΔΣ Modulator with High-order Single Opamp Integrator using Optimization-based Design Method, K. Matsukawa, K. Obata, Y. Mitani, S. Dosho, Panasonic Corporation

We propose not only new power- and area-efficient circuit configurations but also an optimization design method for such configurations. So far, design difficulties of the modulator, such as the trade-off between loop stability and performance and unknown distortion mechanisms, have been serious obstacles to improving efficiency. The major factors in overcoming these obstacles are new high-order single-opamp integrators designed with an optimization-based method and tuning systems for harmonic distortion. Two design examples for a TV-tuner application confirm that this design approach can maximize the performance of various types of modulators. A simple 3rd-order modulator achieved a FOM of 101 fJ/conv., and a more complex 4th-order one achieved 50 fJ/conv., which is less than half of the best previously reported value.

## 19.2 - 10:25 a.m.

A 5MHz BW 70.7dB SNDR Noise-Shaped Two-Step Quantizer Based ΔΣ ADC, T. Oh, N. Maghari\*, U.-K. Moon, Oregon State University, \*University of Florida

In this paper, a new ΔΣ ADC using a noise-shaped two-step integrating quantizer is presented. Attaining an extra order of noise-shaping from the integrating quantizer, the proposed ΔΣ ADC achieves second-order noise-shaping with a first-order loop filter. Furthermore, this quantizer provides 8b quantization in itself, drastically reducing the oversampling requirement. The proposed ADC also incorporates a new feedback DAC topology that alleviates the feedback DAC complexity of a two-step 8b quantizer. The measured results of the prototype ADC implemented in 0.13µm CMOS demonstrate a peak SNDR of 70.7dB at 8.1mW power, with an 8× OSR at 80MHz sampling frequency.
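The 5MHz bandwidth in the title is consistent with the quoted sampling rate and oversampling ratio under the usual definition OSR = fs / (2 × BW). The sketch below assumes that definition, which the abstract does not state explicitly; the function name is illustrative.

```python
def signal_bandwidth_hz(sampling_rate_hz: float, osr: float) -> float:
    """Signal bandwidth implied by a sampling rate and oversampling ratio.

    Assumes the conventional definition OSR = fs / (2 * BW).
    """
    return sampling_rate_hz / (2.0 * osr)


if __name__ == "__main__":
    # 80 MHz sampling frequency and 8x OSR, as quoted in the abstract.
    bw = signal_bandwidth_hz(80e6, 8)
    print(f"implied bandwidth: {bw / 1e6:.1f} MHz")
```

With an 80MHz sampling frequency and an 8× OSR, the implied bandwidth is 5MHz, matching the paper title.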

## 19.3 - 10:50 a.m.

An 85dB SFDR 67dB SNDR 8OSR 240MS/s ΣΔ ADC with Nonlinear Memory Error Calibration, S.-C. Lee, B. Elies\*, Y. Chiu\*, University of Illinois at Urbana-Champaign, \*University of Texas at Dallas

A 1-0 MASH sigma-delta ADC demonstrates a digital calibration technique treating both amplifier distortion and capacitor mismatch. The output-referred error analysis accurately models a nonlinear modulator. The identification of multiple error parameters is accomplished by correlating various moments of the ADC output with a one-bit pseudorandom noise (PN). The prototype ADC employing 29dB gain amplifiers measures 85dB SFDR and 67dB SNDR for a -1dBFS (1.1Vpp) 5MHz sinusoidal input at 240MS/s. The core ADC consumes 37mW from a 1.25V supply and occupies 0.28mm<sup>2</sup> in a 65nm CMOS low-leakage digital process, in which the transistor threshold voltages are around 0.5V.

## 19.4 - 11:15 a.m.

A Reconfigurable Mostly-Digital ΔΣ ADC with a Worst-Case FOM of 160dB, G. Taylor, I. Galton\*, Analog Devices, \*University of California at San Diego

This paper presents a mostly-digital background-calibrated delta-sigma modulator ADC based on voltage-controlled ring oscillators (VCROs). As a result of several new techniques, its performance is in line with the best delta-sigma modulators published to date, but it occupies much less circuit area and, unlike other high-performance ADCs, it is reconfigurable and consists mainly of digital circuitry. It does not use op-amps, analog integrators, feedback DACs, comparators, or reference voltages, so its performance is set by the speed of its digital circuitry and its supply voltage can be scaled with its sample rate to save power. The sample rate is tunable from 1.3-2.4GHz, over which the SNDR spans 70-75dB, the bandwidth spans 5-37.5MHz, and the minimum SNDR + 10log(bandwidth/power dissipation) figure of merit (FOM) is 160dB. The 65nm CMOS delta-sigma modulator occupies 0.075 square millimeters and operates from a single 0.9-1.2V supply.
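The figure of merit quoted above can be evaluated with a few lines of code. The sketch below assumes bandwidth in Hz and power in W, which is the common convention for this FOM but is not stated in the abstract. The abstract gives no power figures, so the SNDR and bandwidth pairing in the example is purely hypothetical.

```python
import math


def fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """FOM = SNDR + 10*log10(bandwidth / power), in dB.

    Units are an assumption: bandwidth in Hz and power in W.
    """
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


def power_for_fom(sndr_db: float, bandwidth_hz: float, target_fom_db: float) -> float:
    """Power in W at which the given SNDR and bandwidth reach the target FOM."""
    return bandwidth_hz / 10.0 ** ((target_fom_db - sndr_db) / 10.0)


if __name__ == "__main__":
    # Hypothetical operating point: 70 dB SNDR paired with 5 MHz bandwidth.
    # The abstract does not say which SNDR goes with which bandwidth.
    p = power_for_fom(70.0, 5e6, 160.0)
    print(f"power for a 160 dB FOM: {p * 1e3:.1f} mW")
    print(f"check: FOM = {fom_db(70.0, 5e6, p):.1f} dB")
```

Under that hypothetical pairing, a 160dB FOM would correspond to about 5mW; the actual operating points and power are reported in the paper, not here.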

## 19.5 - 11:40 a.m.

A 71dB Dynamic Range Third-Order ΔΣ TDC Using Charge-Pump, M. Gande, N. Maghari\*, T. Oh, U.-K. Moon, Oregon State University, \*University of Florida

A high-resolution time-to-digital converter (TDC) architecture is proposed. The architecture combines the principles of noise-shaping quantization and a charge pump to build a third-order ΔΣ TDC with a dedicated feedback DAC. Fabricated in a 0.13µm CMOS process, the prototype TDC achieves better than 71dB DR and 67dB SNDR in a 2.81MHz signal bandwidth (OSR=16) and consumes 2.58mW.

# Session 20 – TAPA I Clock and Interconnect

Friday, June 15, 1:30 p.m.

Chairpersons: N. Kurd, Intel Corp.

K. Kobayashi, Kyoto Institute of Technology

## 20.1 - 1:30 p.m.

A Shorted Global Clock Design for Multi-GHz 3D Stacked Chips, L.-T. Pang, P. Restle, M. Wordeman, J. Silberman, R. Franch, G. Maier\*, IBM TJ Watson Research Center, \*IBM Systems and Technology Group

A global clock distribution technique for 3D stacked chips where the clock tree and grid are shorted between strata is presented and compared with a DLL-based technique. Both permit at-speed testing of the strata before and after stack assembly. The shorting-based technique is implemented in a 2-strata eDRAM test chip using an IBM 45nm SOI 3D technology. Operation above 2.5GHz is measured.

## 20.2 - 1:55 p.m.

A 3-stage Pseudo Single-phase Flip-flop Family, H. Partovi, A. Yeung, L. Ravezzi, M. Horowitz\*, Velouce Technologies, Inc., \*Stanford University

This paper presents an energy-efficient 3-stage Pseudo Single-phase family of Flip-flops (PSPFF) targeted for use in a 3GHz microprocessor in a 40nm, 0.9V CMOS technology. With latencies in line with the fast pulsed-latch and an average switching energy comparable to the master-slave flip-flop, PSPFF achieves an energy-delay product (EDP) which is 42% and 24% lower than the pulsed-latch and the master-slave flip-flop respectively. Measurement results confirm an improvement of at least 300MHz in operating frequency when using the PSPFF in place of the master-slave flip-flop.

## 20.3 - 2:20 p.m.

A Standard Cell Compatible Bidirectional Repeater with Thyristor Assist, S. Satpathy, D. Sylvester, D. Blaauw, University of Michigan

A thyristor-assisted standard cell compatible self-timed bidirectional repeater with no configuration overhead enables 8mm interconnects to achieve 37% higher speed at 20% lower energy over conventional repeaters in 65nm CMOS at 1.0V. Absence of configuration logic removes the need for clocking, yielding up to 14× higher energy efficiency at very low data switching activity.

## 20.4 - 2:45 p.m.

An Integral Path Self-Calibration Scheme for a 20.1-26.7GHz Dual-Loop PLL in 32nm SOI CMOS, M. Ferriss, J.-O. Plouchart, A. Natarajan, A. Rylyakov, B. Parker, A. Babakhani, S. Yaldiz, B. Sadhu, A. Valdes-Garcia, J. Tierno, D. Friedman, IBM TJ Watson Research Center

A bandwidth self-calibration scheme is introduced as part of a 20.1GHz to 26.7GHz low-noise PLL in 32nm SOI CMOS. A dual-loop architecture in combination with an integral path measurement and correction scheme desensitizes the loop transfer function to variations in the VCO's small-signal gain. Self-calibration reduces the spread of gain peaking from 2.4dB to 1dB when measured at 70 sites on a 300mm wafer. The PLL has a measured phase noise of -126.5dBc/Hz at a 10MHz offset from 20.1GHz.

# Session 21 – TAPA II DC-DC Converters

Friday, June 15, 1:30 p.m.

Chairpersons: T. Burd, AMD

M. Takamiya, University of Tokyo

## 21.1 - 1:30 p.m.

A 50nA Quiescent Current Asynchronous Digital-LDO with PLL-Modulated Fast-DVS Power Management in 40nm CMOS for 5.6 times MIPS Performance, Y.-H. Lee, S.-Y. Peng, C.-H. Wu, C.-C. Chiu, Y.-Y. Yang, M.-H. Huang, K.-H. Chen, Y.-H. Lin\*, S.-W. Wang, C.-Y. Yeh\*, C.-C. Huang\*, C.-C. Lee\*, National Chiao Tung University, \*Realtek Semiconductor Corp.

A 50nA quiescent current asynchronous digital-LDO (DLDO) integrated with a PLL-modulated switching regulator (SWR) provides hybrid power management. The proposed bidirectional asynchronous wave pipeline (BAWP) in the asynchronous DLDO realizes Fast-DVS (F-DVS) operation within tens of nanoseconds. The SWR with the leading phase amplifier achieves on-the-fly DVS and 94% peak efficiency, and improves MIPS performance by 5.6 times through hybrid operation. The fabricated chip occupies 1.04mm<sup>2</sup> in 40nm CMOS.

## 21.2 - 1:55 p.m.

High Area-Efficient DC-DC Converter using Time-Mode Miller Compensation (TMMC), S.-W. Hong, T.-H. Kong, S. Jung, Su.-W. Lee, S.-W. Wang, J.-P. Im, G.-H. Cho, KAIST

This paper introduces Time-Mode Miller Compensation (TMMC) for the controller design of a DC-DC converter. Using this concept, the area of the DC-DC converter can be significantly reduced without any off-chip compensation components. The chip is implemented in 0.18µm I/O CMOS whose size is similar to that of 0.35µm CMOS, and the core size of this work is only 0.12mm<sup>2</sup>. Peak efficiency is 90.6%, with a switching frequency of 1.15MHz.

## 21.3 - 2:20 p.m.

A 900mA 93% Efficient 50µA Quiescent Current Fixed Frequency Hysteretic Buck Converter Using a Highly Digital Hybrid Voltage- and Current-mode Control, Q. Khan, A. Elshazly, S. Rao, R. Inti, P. Hanumolu, Oregon State University

A hysteretic buck converter employs a hybrid voltage/current mode control to regulate output voltage and switching frequency independently. Fabricated in a 130nm CMOS process, the prototype consumes only 50µA quiescent current and operates at a constant switching frequency of 1MHz over a wide range of output voltages (0.7-to-1.8V) and inductor values (1-to-5µH) with a peak efficiency of 93% at 900mA load current. The output ripple and the settling time of the converter are less than ±2.5mV and 10µs, respectively.

## 21.4 - 2:45 p.m.

A 198-ns/V $V_O$-Hopping Reconfigurable RGB LED Driver with Automatic $\Delta V_O$ Detection and Quasi-Constant-Frequency Predictive Peak Current Control, Y. Zhang, H. Chen\*, D. Ma, University of Texas at Dallas, \*Linear Technology Corporation

A CMOS RGB LED driver is presented, adopting a single-converter, reconfigurable structure to adaptively bias RGB color LEDs with high precision. Automatic $\Delta V_O$ detection and fast $V_O$-hopping techniques are proposed, achieving a 198-ns/V $V_O$-hopping speed in 0.35µm CMOS. This is at least one order of magnitude faster than the state of the art. While predictive peak current control and burst-mode operation are employed for robust operation, the switching frequency is still stabilized around 1MHz by an adaptive off-timer for switching noise spectrum control. The driver consumes 8.6 times less headroom power than its fixed-output counterparts.

# Session 22 – TAPA I Digital Timing Generation Circuits

Friday, June 15, 3:25 p.m.

Chairpersons: A. Emami, CalTech

K. Sunaga, NEC Corp.

## 22.1 - 3:25 p.m.

Design of a 2.5-GHz, 3-ps Jitter, 8-Locking-Cycle, All-Digital Delay-Locked Loop with Cycle-by-Cycle Phase Adjustment, C.-Y. Cheng, J.-S. Wang, C.-T. Yeh, J.-S. Sheu, National Chung-Cheng University, \*United Microelectronics Corp.

This paper describes the design of a multi-GHz all-digital delay-locked loop (ADDLL). An HDSC-based coarse-fine architecture is adopted to achieve low power and freedom from harmonic locking over a large operating frequency range. To prevent long locking times in GHz operation, a new resettable coarse delay line and a new asynchronous-binary-search design are proposed for fast coarse and fine locking, respectively. Furthermore, a novel maintenance operation is proposed so that phase adjustment can be performed cycle by cycle to effectively suppress jitter. Measurement results show that the designed 1.0V 55nm ADDLL has a peak-to-peak jitter of 3 ps and a locking time of 8 cycles when operated at 2.5 GHz with a power dissipation of only 1.96 mW.

## 22.2 - 3:50 p.m.

A 1.5GHz 1.35mW -112dBc/Hz In-band Noise Digital Phase-Locked Loop with 50fs/mV Supply-Noise Sensitivity, A. Elshazly, R. Inti, M. Talegaonkar, P.K. Hanumolu, Oregon State University

A highly digital PLL employs a 1b TDC and a low-power regulator to reduce output jitter in the presence of a large amount of supply noise. Fabricated in 0.13μm CMOS, the ring-oscillator-based DPLL consumes 1.35mW at 1.5GHz output frequency and achieves better than 50fs/mV worst-case noise sensitivity (i.e., 10ps peak-to-peak jitter degradation with 200mVpp supply noise). The proposed DPLL achieves the lowest power and the best reported supply noise rejection compared to state-of-the-art PLLs.
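The parenthetical figure follows from multiplying the sensitivity by the supply noise amplitude. The sketch below shows that arithmetic, assuming the jitter degradation scales linearly with supply noise; the function name is illustrative.

```python
def jitter_degradation_ps(sensitivity_fs_per_mv: float, supply_noise_mv: float) -> float:
    """Jitter degradation in ps for a given supply-noise sensitivity.

    Assumes a linear relation between supply noise and added jitter.
    """
    return sensitivity_fs_per_mv * supply_noise_mv / 1000.0


if __name__ == "__main__":
    # 50 fs/mV sensitivity and 200 mVpp supply noise, from the abstract.
    print(f"added jitter: {jitter_degradation_ps(50, 200):.1f} ps peak-to-peak")
```

With 50fs/mV and 200mVpp of supply noise this gives the 10ps peak-to-peak degradation quoted in the abstract.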

## 22.3 - 4:15 p.m.

A 61-dB SNDR 700 μm<sup>2</sup> Second-Order All-Digital TDC with Low-Jitter Frequency Shift Oscillators and Dynamic Flipflops, T. Konishi, K. Okuno, S. Izumi, M. Yoshimoto, H. Kawaguchi, Kobe University

We present a small-area second-order all-digital time-to-digital converter (TDC) with two frequency shift oscillators (FSOs) comprising inverter chains and dynamic flipflops featuring low jitter. The proposed FSOs can maintain their phase states through continuous oscillation, unlike conventional gated ring oscillators (GROs) that are affected by transistor leakage. Our proposed FSO TDC is more robust and is suitable for all-digital TDC architectures in recent leaky processes. Low-jitter dynamic flipflops are adopted as a quantization noise propagator (QNP). A frequency mismatch occurring between the two FSOs can be canceled out using a least mean squares (LMS) filter so that second-order noise shaping is possible. In a standard 65-nm CMOS process, an SNDR of 61 dB is achievable at an input bandwidth of 500 kHz and a sampling rate of 16 MHz, where the respective area and power are 700 μm<sup>2</sup> and 281 μW.

## 22.4 - 4:40 p.m.

A 7b, 3.75ps Resolution Two-Step Time-to-Digital Converter in 65nm CMOS Using Pulse-Train Time Amplifier, K. Kim, Y. Kim, W. Yu, S. Cho, KAIST

This paper presents a time-to-digital converter (TDC) using a novel pulse-train time amplifier. The proposed TDC exploits repetitive pulses with gated delay-lines for calibration-free and programmable time amplification and quantization. Using this circuit, a 7-bit two-step time-to-digital converter is implemented. The prototype chip fabricated in a 65nm CMOS process achieves 3.75ps of time resolution at 200Msps while consuming 3.6mW and occupying 0.02mm<sup>2</sup>.

# Session 23 – TAPA II Power Management Circuits

Friday, June 15, 3:25 p.m.

Chairpersons: H. Bergveld, NXP Semiconductors

H. Nakamoto, Fujitsu Labs, Ltd.

## 23.1 - 3:25 p.m.

A 0.45-V Input On-Chip Gate Boosted (OGB) Buck Converter in 40-nm CMOS with More Than 90% Efficiency in Load Range from 2μW to 50μW, X. Zhang, P.-H. Chen, Y. Ryu\*, K. Ishida, Y. Okuma\*, K. Watanabe\*, T. Sakurai, M. Takamiya, University of Tokyo, \*STARC

A 0.45-V input, 0.4-V output on-chip gate boosted (OGB) buck converter with a clock-gated digital PWM controller in 40-nm CMOS achieves the highest efficiency to date at an output power of less than 40µW. A linear delay trimming technique based on a logarithmic stress voltage (LSV) scheme is also proposed; it compensates for the die-to-die delay variations of a delay line in the PWM controller with good controllability.

## 23.2 - 3:50 p.m.

A Fully Electrical Startup Batteryless Boost Converter with 50mV Input Voltage for Thermoelectric Energy Harvesting, H.-Y. Tang, P.-S. Weng, P.-C. Ku, L.-H. Lu, National Taiwan University

A fully electrical startup boost converter is presented in this paper. With a three-stage stepping-up architecture, the proposed circuit is capable of performing thermoelectric energy harvesting at an input voltage as low as 50 mV. Due to the zero-current-switching (ZCS) operation of the boost converter and automatic shutdown of the low-voltage starter and the auxiliary converter, conversion efficiency up to 73% is demonstrated. The boost converter does not require bulky transformers or mechanical switches for kick-start, making it very attractive for body area sensor network applications.

## 23.3 - 4:15 p.m.

Integrated All-silicon Thin-film Power Electronics on Flexible Sheets For Ubiquitous Wireless Charging Stations based on Solar-energy Harvesting, L. Huang, W. Rieutort-Louis, Y. Hu, J. Sanz-Robinson, S. Wagner, J.C. Sturm, N. Verma, Princeton University

With the explosion in the number of battery-powered portable devices, ubiquitous powering stations that exploit energy harvesting can provide an extremely compelling means of charging. We present a system on a flexible sheet that, for the first time, integrates the power electronics using the same thin-film amorphous-silicon (a-Si) technology as that used for established flexible photovoltaics. This demonstrates a key step towards future large-area flexible sheets which could cover everyday objects, to convert them into wireless charging stations. In this work, we combine the thin-film circuits with flexible solar cells to provide embedded power inversion, harvester control, and power amplification. This converts DC outputs from the solar modules to AC power for wireless device charging through patterned capacitive antennas. With 0.5-2nF transfer antennas and solar modules of 100cm<sup>2</sup>, the system provides 47-120µW of power at 11-22% overall power-transfer efficiency under indoor lighting.

## 23.4 - 4:40 p.m.

A 2.98nW Bandgap Voltage Reference Using a Self-Tuning Low Leakage Sample and Hold, Y.-P. Chen, M. Fojtik, D. Blaauw, D. Sylvester, University of Michigan

A novel low-power voltage reference using a sample-and-hold circuit with self-calibrating duty cycle and leakage compensation is presented. Implemented in 180nm CMOS, it shows a temperature coefficient of 24.7ppm/°C and a power consumption of 2.98nW, which marks a 251× power improvement over the best prior bandgap reference.

## 23.5 - 5:05 p.m.

A 635pW Battery Voltage Supervisory Circuit for Miniature Sensor Nodes, I. Lee, S. Bang, Y. Lee, Y. Kim, G. Kim, D. Sylvester, D. Blaauw, University of Michigan

We propose a low-power battery voltage supervisory circuit for micro-scale sensor systems that provides power-on reset, brown-out detection, and recovery detection to prevent malfunction and battery damage. Ultra-low power is achieved using a 57pA, fast-stabilizing two-stage voltage reference and an 81pA leakage-based oscillator and clocked comparator. The supervisor was fabricated in 180nm CMOS and integrated with a complete 1mm<sup>3</sup> sensor system. It consumes 635pW at 3.6V supply voltage, which is an 850× reduction over the best prior work.