#### Implementation strategies for digital design

Chip complexity grows faster than design productivity

**Design gap** (from 1980):

- Compound complexity growth rate (transistors/chip): **58%/year**
- Compound productivity growth rate (transistors/person-month): 21%/year

→ Team size increases (now ~1000 people)
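The gap compounds: complexity outgrows per-person productivity by a factor of 1.58/1.21 ≈ 1.31 every year. The Python sketch below shows this. It assumes, as a simplification not stated on the slide, that design time per chip stays constant. Under that assumption, the required team size scales with complexity divided by productivity.

```python
def design_gap(years: float, complexity_growth: float = 0.58,
               productivity_growth: float = 0.21) -> float:
    """Factor by which chip complexity outgrows per-person productivity after `years`.

    Defaults are the compound rates quoted above (58%/year and 21%/year).
    Assuming constant design time per chip (a simplification), this is also
    the factor by which team size must grow.
    """
    return ((1 + complexity_growth) / (1 + productivity_growth)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years: gap x{design_gap(years):.1f}")
```

Over a decade the gap reaches roughly 14×, which is consistent with the growth in team size.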

# Productivity leaps are due to the introduction of new design technologies

Timeline (starting in the 70s):

- Programmable Logic Arrays
- Standard cells
- Macro cells, module compilers
- Gate arrays
- Reconfigurable hardware



#### Semiautomatic placement and routing



#### Intel 4004 – custom design

2,300 PMOS transistors

10 μm process

Clock: 108 kHz

Area: 3 mm × 4 mm



Courtesy Intel

### **Transition to Automation and Regular Structures**



**Courtesy Intel** 

Intel 8286



## **Tradeoff between flexibility and efficiency**

#### ADVANTAGES of Flexibility (Programmability):

- Reusable design for multiple applications
- Generic hardware with upgradable software

#### DISADVANTAGES:

- Loss in performance: large overhead and slower operation (instruction decoding)
- Increase in energy consumption: for the same reason (instruction-decoding overhead)



### Cost of an integrated circuit



Cost per IC = variable cost of one chip + $\frac{\text{fixed costs}}{\#\text{ of chips}}$

#### NRE (Fixed Costs) and RE (Variable Costs)
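The fixed (NRE) cost is amortized over the production volume, while the variable (RE) cost is paid on every chip. The small sketch below implements the per-IC cost formula. The dollar figures in the example are hypothetical and are chosen only to show the trend.

```python
def cost_per_ic(nre_cost: float, re_cost: float, volume: int) -> float:
    """Cost of one IC = variable (RE) cost + fixed (NRE) cost / number of chips."""
    if volume <= 0:
        raise ValueError("volume must be a positive number of chips")
    return re_cost + nre_cost / volume


# Hypothetical example: $2M of NRE and $5 of RE per chip.
for volume in (10_000, 100_000, 1_000_000):
    print(f"{volume:>9,} chips: ${cost_per_ic(2e6, 5.0, volume):,.2f} per chip")
```

At low volume the NRE term dominates ($205 per chip at 10,000 chips). At high volume the cost per chip approaches the RE cost ($7 per chip at one million chips).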



#### **Die Yield**

Empirical formula (for a modern CMOS process, $\alpha \approx 3$):

$$\text{Die yield} = \left[1 + \frac{\#\text{ defects per unit area} \times \text{die area}}{\alpha}\right]^{-\alpha}$$

Figure: Die yield (%) vs. die area (cm<sup>2</sup>).

For example, with 1 defect/cm<sup>2</sup>:

| Die area | Yield |
|----------|-------|
| 0.025 cm<sup>2</sup> | 99% |
| 0.25 cm<sup>2</sup> | 78% |
| 2.5 cm<sup>2</sup> | 16.2% |
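The empirical yield formula translates directly into code. The minimal sketch below uses α = 3, as given for a modern CMOS process.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float, alpha: float = 3.0) -> float:
    """Empirical die yield = [1 + (defects per unit area * die area) / alpha] ** (-alpha)."""
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# 1 defect/cm^2, for the die areas used in the example
for area in (0.025, 0.25, 2.5):
    print(f"{area:>6} cm^2: {100 * die_yield(1.0, area):.1f}%")
```

For 1 defect/cm² this prints about 97.5%, 78.7% and 16.2%, showing how quickly yield collapses as the die grows. As α grows large, the formula approaches the Poisson model e^(−defects per unit area × die area).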

#### Implementation approaches to digital design



#### **Custom design**

Transistor-by-transistor design of the complete circuit

- [+] High performance
- [-] Time-consuming  $\rightarrow$  high design cost
  - $\rightarrow$ long time-to-market

**When to use it:**

| Situation | Example |
|-----------|---------|
| Blocks that have to be used many times | Library cells |
| Very high-volume ICs | Microprocessors |
| Cost is not an issue | Defense applications,<br>supercomputing applications |

#### **Cell-based semicustom design**

Requires a library of predesigned cells from which the design is built

| Type of library | Typical contents |
|-----------------|------------------|
| Standard cells  | Logic gates,<br>MSI circuits: decoders, multiplexers, encoders |
| Macro cells     | Memory banks |
| Mega cells      | Microprocessor, DSP, PCI interface |

# **1st-Generation Standard Cell: Example**

Figure: rows of cells separated by routing channels [Brodersen92].

Not very dense: a lot of space is used by the routing channels.

# Standard Cell – The New Generation



Cell structure hidden under the interconnect layers



Much denser structure: no space is occupied by routing channels, because routing takes place in the higher interconnect layers.



#### Suited to the fabless industry model

| Company type | Activities | Examples |
|--------------|------------|----------|
| Fabless company | Design + testing + sale | Marvell, Qualcomm, Dialog Semiconductor, Altera, Xilinx |
| Foundry | Fabrication + standard cells for fabless companies | TSMC, UMC, SMIC |
| IDM (Integrated Device Manufacturer) | Design + fabrication + testing + sale | Intel, Samsung, STM |
| IP vendor | Macro cell libraries, soft macromodules, IP | ARM |

#### **Macrocells or Macromodules**

| Type | Design | Characteristics |
|------|--------|-----------------|
| Hard macrocells | Custom designed for a **specific CMOS** process | Predetermined functionality AND predetermined physical implementation<br>e.g. embedded microprocessor, embedded memory |
| Soft macrocells | Portable to different CMOS processes (gate netlist) | Predetermined functionality **but NO** physical implementation<br>Typically provided by IP vendors [with software tools, testing procedures and tools] |

# MacroModules



256×32 (8192-bit) SRAM generated by a hard-macro module generator

# Soft MacroModules and IP



```
string mat = "booth";
directive (multtype = mat);
output signed [16] Z = A * B;
```



Synopsys Design Compiler
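The directive above selects a Booth multiplier architecture for `Z = A * B`. The Python sketch below illustrates radix-4 (modified) Booth multiplication. It shows only what Booth recoding does and is not Design Compiler's output or API. The multiplier is recoded into digits in {−2, −1, 0, +1, +2}, so an n-bit operand needs only n/2 partial products, each equal to 0, ±A or ±2A. The function name and the default width n = 16 are illustrative assumptions.

```python
def booth_radix4_multiply(a: int, b: int, n: int = 16) -> int:
    """Multiply a by b using radix-4 (modified) Booth recoding of the multiplier b.

    b is treated as an n-bit two's-complement number (n must be even);
    a is an ordinary Python int.
    """
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    if not -(1 << (n - 1)) <= b < (1 << (n - 1)):
        raise ValueError("b does not fit in n bits")
    pattern = b & ((1 << n) - 1)  # two's-complement bit pattern of b

    def bit(i: int) -> int:
        # Bit i of the multiplier, with the implicit b_{-1} = 0.
        return 0 if i < 0 else (pattern >> i) & 1

    product = 0
    for i in range(0, n, 2):
        # Overlapping triplet (b_{i+1}, b_i, b_{i-1}) -> digit in {-2, -1, 0, 1, 2}
        digit = bit(i - 1) + bit(i) - 2 * bit(i + 1)
        # Partial product is 0, +/-a or +/-2a, weighted by 4**(i/2) = 2**i
        product += (digit * a) << i
    return product


print(booth_radix4_multiply(-7, 3))  # -21
```

Compared with a plain shift-and-add multiplier, radix-4 recoding halves the number of partial products. In hardware, each ±A or ±2A term is just a shift plus a conditional negation.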



#### **Array-based implementation**

#### Do not require a complete manufacturing run

| Type | Description |
|------|-------------|
| Gate array (or sea of gates) | Mask-programmable arrays: prediffused wafers, ONLY the metal layers are missing |
| Field Programmable Gate Array (FPGA) | Complete separation between the manufacturing phase and the implementation phase<br>Manufacturing phase has large volumes<br>[+] Short time-to-market and LOW NRE<br>[-] Performance loss and HIGH RE |
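The NRE/RE tradeoff implies a break-even volume. Using cost per IC = RE + NRE / volume, consider two options. One has low NRE and high RE (such as an FPGA). The other has high NRE and low RE (such as a mask-programmed gate array). The two cost the same per chip at N* = (NRE_high − NRE_low) / (RE_high − RE_low). The sketch below computes that crossover. All cost figures in it are hypothetical.

```python
def cost_per_ic(nre_cost: float, re_cost: float, volume: int) -> float:
    """Cost per IC = variable (RE) cost + fixed (NRE) cost / volume."""
    return re_cost + nre_cost / volume


def break_even_volume(nre_low: float, re_high: float,
                      nre_high: float, re_low: float) -> float:
    """Volume at which a low-NRE/high-RE option costs the same per chip as a
    high-NRE/low-RE option, i.e. the N solving
    re_high + nre_low / N == re_low + nre_high / N.
    """
    if re_high <= re_low or nre_high <= nre_low:
        raise ValueError("expected one low-NRE/high-RE and one high-NRE/low-RE option")
    return (nre_high - nre_low) / (re_high - re_low)


# Hypothetical numbers: FPGA with $50k NRE and $60 per chip versus a
# mask-programmed gate array with $1.5M NRE and $10 per chip.
n_star = break_even_volume(50e3, 60.0, 1.5e6, 10.0)
print(f"break-even at about {n_star:,.0f} chips")
```

Below the break-even volume the FPGA is cheaper per chip. Above it, the higher NRE of the mask-programmed option pays off.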

#### How are cells programmed?



#### What type of logic can be programmed?

Array-based programmable logic (e.g. PLA, PAL)
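A PLA has two planes. The AND plane forms product terms from the inputs and their complements, and the OR plane sums selected terms for each output. Both planes are programmable in a PLA, whereas a PAL fixes the OR plane. The behavioral sketch below models this, using a half adder as the example. The data layout (terms as dictionaries, outputs as lists of term indices) is an assumption chosen for illustration.

```python
from typing import Dict, List, Sequence

# A product term maps an input index to the literal it requires:
# 1 means the true input x_i, 0 means its complement x_i'.
ProductTerm = Dict[int, int]


def pla_eval(and_plane: Sequence[ProductTerm],
             or_plane: Sequence[Sequence[int]],
             inputs: Sequence[int]) -> List[int]:
    """Evaluate a PLA on one input vector.

    and_plane: the programmed product terms (AND plane).
    or_plane: for each output, the indices of the product terms it ORs together.
    """
    # AND plane: a term is 1 when every literal it uses matches the inputs.
    terms = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # OR plane: each output is the OR of its selected product terms.
    return [int(any(terms[t] for t in selected)) for selected in or_plane]


# Half adder programmed as a PLA: S = a b' + a' b, C = a b
HALF_ADDER_AND = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
HALF_ADDER_OR = [[0, 1], [2]]

for a in (0, 1):
    for b in (0, 1):
        s, c = pla_eval(HALF_ADDER_AND, HALF_ADDER_OR, [a, b])
        print(f"a={a} b={b} -> S={s} C={c}")
```

Because the OR plane selects terms by index, several outputs can share the same product term.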



#### **Cell-based programmable logic**

Look-up-table (LUT) based logic cells: can implement any combinational function of their inputs
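A k-input LUT is a small memory of 2^k configuration bits addressed by its inputs. That is why it can implement any combinational function of those inputs: programming it just means storing the function's truth table. The Python sketch below models the idea. The function names and the bit ordering (input i drives address bit i) are assumptions for illustration, not any vendor's configuration format.

```python
from typing import Callable, Sequence


def lut_config(func: Callable[..., bool], k: int) -> int:
    """Return the 2**k configuration bits that make a k-input LUT implement func.

    Bit `index` of the result holds func's output when input i equals bit i of `index`.
    """
    config = 0
    for index in range(2 ** k):
        inputs = [(index >> i) & 1 for i in range(k)]
        if func(*inputs):
            config |= 1 << index
    return config


def lut_eval(config: int, inputs: Sequence[int]) -> int:
    """Evaluate a configured LUT: the inputs simply address the stored truth table."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (config >> address) & 1


# Example: configure a 3-input LUT as a majority gate.
majority = lut_config(lambda a, b, c: a + b + c >= 2, 3)
print(hex(majority))  # 0xe8
```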



Sea of gates



Courtesy LSI Logic

## **RAM-based FPGA**



Xilinx XC4000EX

Courtesy Xilinx

#### Xilinx Virtex UltraScale

| Value                              | Deliverables                                                                                                                                                                                                                                                                        |
|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Programmable System<br>Integration | <ul> <li>Up to 5.5M System Logic Cells at 20nm using 2<sup>nd</sup> generation 3D<br/>IC</li> <li>Integrated 100G Ethernet MAC and 150G Interlaken cores</li> </ul>                                                                                                                 |
| Increased System Performance       | <ul> <li>Up to two speed grades of improvement at high utilization</li> <li>30G transceivers for chip-to-chip, chip-to-optics, 28G backplanes</li> <li>16G backplane-capable transceivers at half the power</li> <li>2,400 Mb/s DDR4 for robust operation over varying PVT</li> </ul> |
| BOM Cost Reduction                 | <ul> <li>Up to 50% lower cost - half the cost per port for Nx100G systems</li> <li>VCXO and fractional PLL integration reduces clocking component cost</li> <li>2,400 Mb/s DDR4 in a mid-speed grade</li> </ul>                                                                     |
| Total Power Reduction              | <ul> <li>Up to 40% lower power vs. previous generation</li> <li>Fine granular clock gating with ASIC-like clocking</li> <li>Enhanced logic cell packing reduces dynamic power</li> </ul>                                                                                            |
| Accelerated Design<br>Productivity | <ul> <li>Footprint compatibility with Kintex UltraScale devices for<br/>scalability</li> <li>Seamless footprint migration from 20nm planar to 16nm FinFET</li> <li>Co-optimized with Vivado Design Suite for rapid design closure</li> </ul>                                        |

#### **IC Production – Installed capacity**



Worldwide Capacity by Product Type as of Dec-2012 (Installed Monthly Capacity in 200mm-Equiv. Wafers x1000)

Source: IC Insights

### IC Production Breakdown by Region

#### Regional Capacity by Product Type as of Dec-2012 (Installed Monthly Capacity in 200mm-Equiv. Wafers x1000)

| Product | Americas | Europe  | Japan   | Korea   | Taiwan  | China   | ROW     | Total    |
|---------|----------|---------|---------|---------|---------|---------|---------|----------|
| Analog  | 323.5    | 328.3   | 391.9   | 31.5    | 19.2    | 153.5   | 139.4   | 1,387.4  |
| Memory  | 382.9    | 29.3    | 1,109.1 | 1,821.8 | 1,194.3 | 338.3   | 362.0   | 5,237.4  |
| Logic   | 341.1    | 167.8   | 704.2   | 363.1   | 37.7    | 70.9    | 106.5   | 1,791.3  |
| Micro   | 727.2    | 326.4   | 251.1   | 26.3    | 5.1     | 11.2    | 146.6   | 1,493.8  |
| Foundry | 296.5    | 135.5   | 124.5   | 254.6   | 1,899.0 | 737.3   | 543.3   | 3,990.7  |
| Other   | 50.0     | 128.5   | 119.2   | 67.5    | 10.1    | 19.8    | 201.3   | 596.5    |
| Total   | 2,121.3  | 1,115.7 | 2,700.1 | 2,564.7 | 3,165.4 | 1,330.8 | 1,499.0 | 14,497.0 |

Source: IC Insights

#### Korea, Japan ≈ 2.3–2.4× Europe; Taiwan ≈ 2.8× Europe
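The regional multiples quoted above can be checked directly against the Total row of the table. In this short Python sketch, the dictionary simply copies that row.

```python
# "Total" row of the regional table: installed monthly capacity, Dec-2012,
# in thousands of 200mm-equivalent wafers (source: IC Insights).
total_capacity = {
    "Americas": 2121.3, "Europe": 1115.7, "Japan": 2700.1, "Korea": 2564.7,
    "Taiwan": 3165.4, "China": 1330.8, "ROW": 1499.0,
}

# Each region's capacity as a multiple of Europe's
ratio_to_europe = {region: cap / total_capacity["Europe"]
                   for region, cap in total_capacity.items()}

for region in ("Japan", "Korea", "Taiwan"):
    print(f"{region}: {ratio_to_europe[region]:.2f}x Europe")
```

This gives about 2.42× for Japan, 2.30× for Korea and 2.84× for Taiwan. The regional totals also add up to the worldwide total of 14,497.0.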