On Potential Design Impacts of Electromigration Awareness
Andrew B. Kahng, Siddhartha Nath and Tajana S. Rosing
VLSI CAD Laboratory, UC San Diego
-1-

Outline
  Motivation
  Previous Work
  Our Work
    Preliminaries
    Study 1: MTTF vs. Fmax
    Study 2: MTTF vs. Area, Power
    Insights on Conventional EM Fixes
  Conclusions
-2-

Electromigration in Interconnects
Electromigration (EM) is the gradual displacement of metal atoms in an interconnect
Iavg causes DC EM and affects power delivery networks
Irms causes AC EM and affects clock and logic signals
-3-

EM Lifetime
EM degrades interconnect lifetime
Black's Equation calculates the lifetime of an interconnect segment due to EM degradation
$t_{50} = \frac{A^*}{J^n} \exp\left(\frac{E_a}{kT}\right)$
  t50: median time to failure (= log_e 2 x MTTF)
  A*: geometry-dependent constant
  J: current density in interconnect segment
  n: constant (= 2)
  Ea: activation energy of metal atoms
  k: Boltzmann's constant
  T: temperature of the interconnect
-4-

Parameters Affecting EM MTTF
[Diagram: Wwire, driver size, fanout, Vdd, frequency and temperature linked to Jrms and MTTF; the individual arrows did not survive extraction]
Legend
  A -> B, inverse relation: if A increases then B decreases
  A -> B, direct relation: if A increases then B increases
  Parameter classes: design parameters, runtime parameters
-5-

Why Is EM Important Now?

[Figure residue: axes labeled Current Density (MA/cm2), Year, Jrms and MTTF (years); plot values not recoverable]
ITRS 2011 data shows that EM will be a significant reliability issue
Physical design teams trade off performance and/or resources to meet EM MTTF
What values of MTTF do we really need?
In the US, people replace
  Cell phones every 2 years
  Laptops every 3 to 5 years
  Servers every 3 to 7 years
Devices can be designed with small EM lifetimes
-6-

Examples of EM Guardband
[Diagram: the parameter diagram from the Parameters Affecting EM MTTF slide is repeated here]
To meet the EM MTTF margin at a given wire width upper bound
  Reduce Jrms -> reduce driver size -> slower circuit
To meet the EM MTTF margin at a given performance requirement
  Increase Wwire -> increased capacitance and dynamic power
-7-
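The guardband examples rest on Black's Equation from the EM Lifetime slide. A minimal Python sketch of that equation follows, to show how strongly lifetime depends on current density when n = 2. The values of A*, Ea and J are hypothetical and are not taken from the slides. The temperature reuses the 378 K bound from the Study 1 setup purely as an example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant k in eV/K


def black_t50(a_star, j, ea_ev, temp_k, n=2.0):
    """Median time to failure t50 = (A* / J^n) * exp(Ea / (k * T))."""
    return (a_star / j ** n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def mttf_from_t50(t50):
    """The slide states t50 = log_e(2) x MTTF, so MTTF = t50 / log_e(2)."""
    return t50 / math.log(2.0)


# Hypothetical inputs, chosen only to illustrate the scaling (not from the slides).
A_STAR = 1.0    # geometry-dependent constant, arbitrary units
EA_EV = 0.85    # activation energy in eV (assumed value)
TEMP_K = 378.0  # interconnect temperature in K

t50_full = black_t50(A_STAR, j=2.0, ea_ev=EA_EV, temp_k=TEMP_K)
t50_half = black_t50(A_STAR, j=1.0, ea_ev=EA_EV, temp_k=TEMP_K)

# With n = 2, halving the current density quadruples the lifetime.
lifetime_gain = t50_half / t50_full
```

Because A* and the exponential term cancel in the ratio, the 4x gain holds for any A*, Ea and T. This is why reducing Jrms (smaller drivers) or widening wires buys EM margin at the cost of speed or power.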

To Meet EM Lifetime Requirements
Three major categories of prior work
  EM MTTF modeling
    Black69 (Black's Equation)
    Liew89 (AC lifetime models)
    Lu07 and Wu12 (Joule heating)
  Architecture changes to mitigate EM
    Srinivasan04 (RAMP)
    Romanescu08 (core cannibalization)
  Synthesis and physical design (PD) techniques to reduce current density violations
    Dasgupta96 (limit Jrms violation at synthesis)

    Jerke04 (limit Jrms violation at PD)
    Lienig03 (post-route Jrms fixes)
-9-

Key Idea
We quantify the impact of EM guardband on performance (Fmax), area and power
From Black's Equation with n = 2:
$(I_{rms,limit})^2 = (I_{rms,default})^2 \times \frac{MTTF_{default}}{MTTF_{reduced}}$
Decrease MTTFreduced -> increase Irms,limit -> impact on Fmax, area / power
We study impacts on Fmax, area and power
-11-

Approach
We conduct two studies
  1. MTTF vs. Fmax tradeoffs with fixed resource budget
  2. MTTF vs. resources tradeoffs with fixed performance requirement
Assumptions
  10 years = example default EM MTTF
  Six testcases; we report three representative ones (AES, DMA, JPEG)
-12-

Key Contributions
We are the first to quantify impacts of EM guardband on performance and resources by using PD flows
We introduce EM slack as an accurate measure of potential performance improvements in different circuits at reduced MTTF requirements
  Black's Equation cannot accurately quantify the impacts of EM-awareness in circuits
We study how tightness vs. looseness of timing constraints determines area and power trends at reduced MTTF
Our study flow/methodology can potentially be used by
  architects and front-end designers to improve performance at no area cost
  physical designers whose levers are conventional SI and EM fixing methods
-13-
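The Key Idea relation, (Irms,limit)^2 = (Irms,default)^2 x MTTFdefault / MTTFreduced, can be tabulated directly. The sketch below assumes a hypothetical default limit of 1.0 mA at the 10-year example MTTF. Only the ratio between rows is meaningful.

```python
import math


def irms_limit(irms_default, mttf_default, mttf_reduced):
    """Irms limit at a reduced MTTF, from the n = 2 form of Black's Equation."""
    return irms_default * math.sqrt(mttf_default / mttf_reduced)


MTTF_DEFAULT_YEARS = 10.0  # example default EM MTTF from the Approach slide
IRMS_DEFAULT_MA = 1.0      # hypothetical default limit, not from the slides

# Sweep the MTTF requirement from 10 years down to 1 year.
limits = {
    years: irms_limit(IRMS_DEFAULT_MA, MTTF_DEFAULT_YEARS, float(years))
    for years in range(10, 0, -1)
}

for years, limit in limits.items():
    print(f"MTTF {years:2d} years -> Irms,limit = {limit:.3f} mA")
```

The limit grows only with the square root of the lifetime ratio, so cutting MTTF by 4x doubles the allowed Irms.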

EM Slack
[The equations on this slide did not survive extraction. Surviving labels: "When EM violations occur", "Black's Equation", "Theoretical limit of Irms,net".]
Basic concept: EM slack of a net (units: mA) [defining expression not recoverable]
-15-

Significance of EM Slack
Positive EM slack -> potential for improved Fmax
If EM slack > 0, a part of it can be used to increase Irms,limit by reducing MTTF (from Black's Equation), and to improve Fmax by using SP&R knobs (e.g., gate sizing) without causing EM violations
[Equation garbled in extraction]
-16-
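As a worked instance of trading MTTF for a higher current limit, the Key Idea relation (n = 2) can be solved for the MTTF needed to reach a target Irms,limit. The 20% target increase below is a hypothetical number chosen for illustration. The 10-year default is the example value used in the deck.

```latex
\begin{align*}
(I_{rms,limit})^2 &= (I_{rms,default})^2 \times \frac{MTTF_{default}}{MTTF_{reduced}} \\
MTTF_{reduced} &= MTTF_{default} \times \left(\frac{I_{rms,default}}{I_{rms,limit}}\right)^2 \\
               &= 10~\text{years} \times \left(\frac{1}{1.2}\right)^2 \approx 6.9~\text{years}
\end{align*}
```

Under this relation, raising the limit by 20% costs roughly three years of the 10-year MTTF.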

Study 1: MTTF vs. Fmax
Study MTTF vs. Fmax tradeoffs given upper bounds on area, temperature and #EM violations
Setup
  Three testcases: AES, DMA and JPEG
  Two technology libraries: TSMC 45GS and 65GPLUS
  Upper bounds
    temperature = 378 K
    area = 66% utilization
    #EM violations = 25
  Synopsys Design Compiler and Cadence SOC Encounter flows
  Thermal analysis using Hotspot
-18-

Automated Flow to Determine Fmax
[Flowchart, reconstructed from extracted labels; the label PI could not be placed]
  RTL, LIB, SDC -> Synthesis (DC) -> NETLIST
  MTTFreduced, Tech. LEF -> LEF
  NETLIST, LEF, AR, UTILtar -> Place and Route (SOCE)
  Reported: timing slack, peak temperature, UTILeff, #EM violations
  Check constraints
  Met all constraints?
    Y: Increase frequency
    N: Decrease frequency
  Frequency tried before?
    Y: EXIT
    N: run the flow again at the new frequency
-19-
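The "Check constraints" step of the flow can be sketched as a small predicate over the reported metrics, using the upper bounds listed in the Study 1 setup. The class and field names are hypothetical. The sketch also assumes that timing is met when slack is non-negative and that the 66% area bound applies to UTILeff.

```python
from dataclasses import dataclass

# Upper bounds from the Study 1 setup slide.
MAX_TEMPERATURE_K = 378.0
MAX_UTILIZATION = 0.66
MAX_EM_VIOLATIONS = 25


@dataclass
class RunReport:
    """Metrics reported after place and route (field names are hypothetical)."""
    timing_slack_ps: float
    peak_temperature_k: float
    effective_utilization: float  # UTILeff as a fraction, e.g. 0.64
    em_violations: int


def violated_constraints(report):
    """Return the names of all constraints that the run fails."""
    violated = []
    if report.timing_slack_ps < 0.0:  # assumption: non-negative slack meets timing
        violated.append("timing")
    if report.peak_temperature_k > MAX_TEMPERATURE_K:
        violated.append("temperature")
    if report.effective_utilization > MAX_UTILIZATION:  # assumption: bound on UTILeff
        violated.append("area")
    if report.em_violations > MAX_EM_VIOLATIONS:
        violated.append("em")
    return violated


def meets_all_constraints(report):
    """True when the frequency just tried is feasible."""
    return not violated_constraints(report)


passing = RunReport(5.0, 360.0, 0.64, 12)
failing = RunReport(3.0, 381.5, 0.70, 12)
```

Returning the list of violated constraints, rather than a single flag, makes it easy to see which bound stops frequency scaling for a given design.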

Derating LEF
We derate current density limits in the technology Library Exchange Format (LEF) file
  Reduced lifetime (ratio > 1) increases Jrms limit
  Reduced toggle rate (ratio > 1) increases Jrms limit
  Increased maximum temperature increases Jrms limit
-21-

Binary Search for Fmax
Increase frequency by a fixed step until some constraint is violated
Perform binary search between the current F and the last feasible F to find Fmax
-23-
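The two-phase search (step up until a constraint fails, then bisect between the last feasible and the first infeasible frequency) can be sketched as follows. The `is_feasible` callback stands in for a full synthesis, place-and-route and constraint-check run and is hypothetical. The frequency cap and the MHz resolution used as a stopping rule are also assumptions, replacing the flow's "frequency tried before?" exit. The sketch further assumes that feasibility is monotonic in frequency.

```python
def find_fmax(is_feasible, f_start_mhz, step_mhz, f_cap_mhz, resolution_mhz=1.0):
    """Return the highest feasible frequency found, or None if the start fails."""
    if not is_feasible(f_start_mhz):
        return None

    # Phase 1: step up until some constraint is violated (or the cap is hit).
    last_feasible = f_start_mhz
    current = f_start_mhz + step_mhz
    while current <= f_cap_mhz and is_feasible(current):
        last_feasible = current
        current += step_mhz
    if current > f_cap_mhz:
        return last_feasible

    # Phase 2: bisect between the last feasible and the first infeasible frequency.
    low, high = last_feasible, current
    while high - low > resolution_mhz:
        mid = (low + high) / 2.0
        if is_feasible(mid):
            low = mid
        else:
            high = mid
    return low


def toy_is_feasible(freq_mhz):
    """Toy oracle: pretend every constraint is met up to 1234 MHz."""
    return freq_mhz <= 1234.0


fmax = find_fmax(toy_is_feasible, 500.0, 200.0, 5000.0)
```

Each oracle call corresponds to one expensive tool run, so the coarse step bounds the interval cheaply and bisection then needs only a logarithmic number of additional runs.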

Flow to Fix EM Violations
[Flowchart, reconstructed from extracted labels; the branch targets are not recoverable]
  Input: netlist from the Fmax determination flow
  Group nets depending on the extent of Irms,limit violations
  Create nondefault rules (NDR) for each net group
  Perform ECO route
  All EM violations fixed? (Y / N)
  Decrease fanout
  Downsize drivers
  Perform timing analysis
  Timing met? (Y / N)
  return
-25-
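The first step of the fixing flow groups nets by how far they exceed Irms,limit, so that each group can receive its own nondefault rule. A minimal sketch of such grouping is given below. The bin edges on the ratio Irms,net / Irms,limit are hypothetical, since the slides do not state how the groups are defined.

```python
from collections import defaultdict

# Hypothetical bin edges on Irms,net / Irms,limit; the source does not give them.
GROUP_EDGES = (1.0, 1.25, 1.5)


def violation_group(irms_net_ma, irms_limit_ma):
    """Return None for a net within its limit, else a group index (0 = mildest)."""
    ratio = irms_net_ma / irms_limit_ma
    if ratio <= GROUP_EDGES[0]:
        return None
    group = 0
    for edge in GROUP_EDGES[1:]:
        if ratio > edge:
            group += 1
    return group


def group_nets(nets, irms_limit_ma):
    """Map group index -> list of violating net names."""
    groups = defaultdict(list)
    for name, irms_net_ma in nets.items():
        group = violation_group(irms_net_ma, irms_limit_ma)
        if group is not None:
            groups[group].append(name)
    return dict(groups)


# Hypothetical net currents in mA against a 1.0 mA limit.
example_nets = {"n1": 0.9, "n2": 1.1, "n3": 1.3, "n4": 2.0}
example_groups = group_nets(example_nets, irms_limit_ma=1.0)
```

Nets in higher groups would be candidates for wider NDRs, while nets within the limit are left on default routing rules.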

Observation 1
[Figure (45nm): % increase in performance for DMA, AES and JPEG, y-axis 0% to 90%, x-axis ticks from 10 down to 1; plot values not recoverable]
Fmax scaling is not uniform across designs or across reduced MTTF values, unlike what Black's Equation alone would suggest
Fmax scaling is determined by the EM slack in each design at each MTTF requirement
Large Fmax improvements may be setup artifacts

-27-

Observation 2
[Figure: % of EM violations and % EM slack for DMA, AES and JPEG, x-axis ticks from 10 down to 1; plot values not recoverable]
EM slack (not timing slack) limits performance scaling due to AC EM
EM slack determines Fmax at fixed resources
% of positive EM slack is usable to improve Fmax by reducing MTTF requirement
EM violations in critical paths lead to positive EM slack
-28-

Observation 3
[Figure: series labeled TEMP, AREA and EM for DMA, AES and JPEG, y-axis "% of target utilization", x-axis ticks from 10 down to 1; plot values not recoverable]
Area and temperature can be dominating constraints at lower MTTF requirements

Area limits Fmax scaling for MTTF 7 years (DMA)
Area upper bounds are violated for MTTF 6 years; temperature upper bounds are violated for MTTF 3 years
-29-

Study 2: MTTF vs. Area, Power
Study MTTF vs. area and power tradeoffs at a fixed performance requirement
Setup
  DMA at 2000 MHz (2 ps slack after SP&R at 45nm)
  AES at 1100 MHz (1.6 ps slack after SP&R at 45nm)
  JPEG at 850 MHz (93 ps slack after SP&R at 45nm)
  Two technology libraries: TSMC 45GS and 65GPLUS
-31-

Observation 4
[Figure: area (µm2, about 48750 to 49000) and power (mW, about 35 to 36.6), x-axis ticks from 0 to 10; plot values not recoverable]
Large positive timing slack at MTTF = 10 years can lead to smaller area when MTTF requirement is reduced
-32-
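The Study 2 setup quotes absolute timing slack per design. Expressing it as a fraction of the clock period makes the "tight vs. loose" distinction easier to see. The sketch below uses only the frequencies and slacks listed in the setup; the variable and function names are illustrative.

```python
# (frequency in MHz, slack in ps after SP&R at 45nm), from the Study 2 setup.
SETUP = {
    "DMA": (2000.0, 2.0),
    "AES": (1100.0, 1.6),
    "JPEG": (850.0, 93.0),
}


def slack_fraction(freq_mhz, slack_ps):
    """Timing slack as a fraction of the clock period."""
    period_ps = 1.0e6 / freq_mhz  # 1 MHz corresponds to a 1e6 ps period
    return slack_ps / period_ps


fractions = {name: slack_fraction(f, s) for name, (f, s) in SETUP.items()}

for name, frac in fractions.items():
    print(f"{name}: slack is {100.0 * frac:.2f}% of the clock period")
```

DMA and AES have slack well under 1% of their periods, while JPEG has close to 8%, which makes it the loosely constrained design of the three.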

Observation 5
[Figure: area (µm2, about 13560 to 13740) and power (mW, about 14.5 to 17), x-axis ticks from 0 to 10; plot values not recoverable]
Area and power can decrease as MTTF requirement is reduced for designs with loose timing constraints
Small positive timing slack at MTTF = 10 years can lead to increase in area as MTTF requirement is reduced
-33-

Conventional EM Fixes and MTTF
Study how conventional SI and EM fixing methods affect area and performance at reduced MTTF requirements
Setup
  Sweep MTTF from 10 years down to 1 year
  Apply per-net NDRs, driver downsizing and fanout reduction fixes
  Study using AES, JPEG and DMA testcases
  Two technology libraries: TSMC 45GS and 65GPLUS
Insights are very instance-, technology/library- and flow-specific
-35-

Observation 6
[Figure: % change in Fmax and area for AES, JPEG and DMA, y-axis 0% to 3%; plot values not recoverable]

Fixing EM violations using NDRs can be effective in improving Fmax only till MTTF = 7 years
  % increase in Fmax is less than 5%
  % increase in area is ~2%
-36-

Observation 7
[Figure: % change in Fmax and area vs. fanout and vs. driver size; plot values not recoverable]
NDRs can be more effective knobs to increase Fmax with less increase in area
Fanout reductions to fix EM can increase Fmax by 3% at the cost of 1.86% increase in area
Driver downsizing to fix EM can increase Fmax by 2.5% at the cost of 2% increase in area
-37-

Conclusions
We study and quantify potential impacts of improved EM-awareness in designs through two basic studies
Our key observations
  Study 1: Available performance scaling (up to 80%) from MTTF reduction is dependent on EM slack
  Study 2: Area and power can decrease when MTTF is reduced in designs with loose timing constraints
  Additional studies: NDRs can be more effective in increasing performance, by ~5% at the cost of a 2% increase in area, for MTTF up to 7 years

Ongoing work
  EM reliability requirements in multiple operating modes
  Combined impacts of EM and other back-end-of-the-line reliability mechanisms on interconnect lifetime
-39-

Acknowledgments
Work supported by IMPACT, SRC, NSF, Qualcomm Inc. and NXP Semiconductors
-40-

Thank You!
-41-

Backup
-42-

Hotspot Setup
We use Hotspot5.0 calibrated with a thermal package from Qualcomm Inc.
We perform two kinds of modeling
  Without heat spreader and heat sink when profiling a single block of AES, JPEG or DMA (area in µm2)
  With heat spreader and heat sink when profiling 50x50 blocks of AES, JPEG or DMA in an area of ~5 mm2
We get the same values of temperature for a single block from both these methods
-43-