#### Implementation strategies for digital design

Chip complexity grows faster than design productivity

From 1980: Design gap

Compound complexity growth rate (#transistors/chip): **58%/Year**

Compound productivity growth rate (transistors/person-month): **21%/Year**

→ Team size increases (now 1000)
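The effect of the two compound rates can be checked with a few lines of Python. This is only a sketch: it assumes that design effort (person-months) is simply complexity divided by productivity, and the function name `relative_effort` is made up for illustration.

```python
# Compound annual growth rates quoted above.
COMPLEXITY_GROWTH = 0.58    # transistors per chip, per year
PRODUCTIVITY_GROWTH = 0.21  # transistors per person-month, per year


def relative_effort(years: float) -> float:
    """Design effort after `years`, relative to the effort at year 0.

    Assumption: effort = complexity / productivity, so the effort
    is multiplied by (1 + 0.58) / (1 + 0.21) every year.
    """
    yearly_ratio = (1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)
    return yearly_ratio ** years


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(f"after {years:2d} years: effort x{relative_effort(years):.1f}")
```

Under this assumption the required effort grows by about 30% per year, which is why team sizes keep increasing.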

#### Productivity leaps are due to the introduction of new design technologies

Time: 70s

Programmable Logic Arrays

Standard cells

Macro cells, module compilers

Gate arrays

Reconfigurable hardware

#### Complex digital ICs (e.g. Microprocessors)



#### Semiautomatic placement and routing



#### Intel 4004 – custom design

2300 PMOS transistors

10 µm process

Clock: 108 kHz

Area: 3 mm × 4 mm



Courtesy Intel

## Transition to Automation and Regular Structures



Intel 4004 ('71)



Intel 8080



Intel 8085



Intel 80286



Intel 80486

**Courtesy Intel** 

#### Tradeoff between flexibility and efficiency

Energy efficiency

#### **ADVANTAGES of Flexibility (Programmability):**

- Reusable design for multiple applications
- Generic hardware with upgradable software

#### **DISADVANTAGES:**

- Loss in performance: large overhead and slower operation (instruction decoding)
- Increase in energy consumption: same reason.

Custom Hardwired (fixed at manufacturing)

Configurable HW (parameters)

Application specific processor (DSP, GPU)

Embedded microprocessor

Flexibility

#### Cost of an integrated circuit

Recurring Expenses (Variable costs)



Proportional to the Sales volume

+ Non Recurring Expenses (Fixed costs)



Independent of the Sales volume

$$\text{Total cost of 1 chip} = \text{variable cost of 1 chip} + \frac{\text{fixed costs}}{\text{\# of chips}}$$
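A minimal Python sketch of the total-cost relation shows how the fixed costs are amortized over the production volume. The function name and the numbers in the usage loop are hypothetical, chosen only for illustration.

```python
def total_cost_per_chip(variable_cost: float, fixed_costs: float, n_chips: int) -> float:
    """Total cost of 1 chip = variable cost of 1 chip + fixed costs / # of chips."""
    if n_chips <= 0:
        raise ValueError("n_chips must be positive")
    return variable_cost + fixed_costs / n_chips


if __name__ == "__main__":
    # Hypothetical figures: 5 per chip in variable cost, 1,000,000 in fixed costs.
    for volume in (1_000, 100_000, 10_000_000):
        cost = total_cost_per_chip(5.0, 1_000_000.0, volume)
        print(f"{volume:>10,d} chips -> {cost:.2f} per chip")
```

At low volumes the fixed-cost share dominates; at high volumes the cost per chip approaches the variable cost.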

#### NRE (Fixed Costs) and RE (Variable Costs)

NRE = time and person-months required for the design + production equipment specific to the chip

$$\text{RE} = \frac{\text{cost of die} + \text{cost of die test} + \text{cost of packaging}}{\text{final test yield}}$$

where the final test yield is the % of good chips, and

$$\text{cost of die} = \frac{\text{cost of wafer}}{\text{\# dies per wafer} \times \text{die yield}}$$

where the die yield is the % of good dies.
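The two recurring-cost relations can be chained in code. The sketch below uses hypothetical function names and made-up numbers; only the formulas come from the text.

```python
def cost_of_die(wafer_cost: float, dies_per_wafer: int, die_yield: float) -> float:
    """Cost of one good die: wafer cost spread over the good dies only."""
    return wafer_cost / (dies_per_wafer * die_yield)


def recurring_cost(die_cost: float, die_test_cost: float,
                   packaging_cost: float, final_test_yield: float) -> float:
    """RE per good chip: die, test and packaging costs divided by final test yield."""
    return (die_cost + die_test_cost + packaging_cost) / final_test_yield


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    die = cost_of_die(wafer_cost=1000.0, dies_per_wafer=100, die_yield=0.5)
    chip = recurring_cost(die, die_test_cost=2.0, packaging_cost=3.0,
                          final_test_yield=0.5)
    print(f"cost of die: {die:.2f}, RE per chip: {chip:.2f}")
```

Both yields appear in a denominator, so every discarded die or chip raises the cost of the ones that are sold.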

#### Die Yield

Empirical formula (for a modern CMOS process, $\alpha \approx 3$):

$$\text{Die yield} = \left[ 1 + \frac{\text{\# defects per unit area} \times \text{die area}}{\alpha} \right]^{-\alpha}$$

For example, with 1 defect/cm<sup>2</sup>:

| Die area             | Yield |
|----------------------|-------|
| 0.025 cm<sup>2</sup> | 97.5% |
| 0.25 cm<sup>2</sup>  | 78.7% |
| 2.5 cm<sup>2</sup>   | 16.2% |
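The empirical yield formula is easy to evaluate directly. The following sketch computes it for the three die areas listed, assuming $\alpha = 3$ and 1 defect/cm<sup>2</sup> as in the example; the function name `die_yield` is chosen here for illustration.

```python
def die_yield(defects_per_cm2: float, die_area_cm2: float, alpha: float = 3.0) -> float:
    """Empirical die yield: [1 + (defect density * die area) / alpha] ** (-alpha)."""
    return (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


if __name__ == "__main__":
    for area in (0.025, 0.25, 2.5):  # die areas in cm^2
        print(f"{area:>6} cm^2 -> {die_yield(1.0, area):.1%}")
```

The yield falls much faster than linearly with die area, which is one reason large dies are expensive.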



#### Implementation approaches to digital design



Higher Performance Higher NREs

Higher Flexibility
Higher REs

#### **Custom design**

Transistor by transistor design of complete circuit

- [+] High performance
- [-] Time consuming → high design cost
  - → long time to market

| When                                   | Example                                                |
|----------------------------------------|--------------------------------------------------------|
| Blocks that have to be used many times | e.g. library cells                                     |
| Very high volume ICs                   | e.g. microprocessors                                   |
| Cost is not an issue                   | e.g. defense applications, supercomputing applications |
### **Cell-based semicustom design**

We need a library of cells to be used for the design

| Type of library | Contents                                                          |
|-----------------|-------------------------------------------------------------------|
| Standard cells  | Logic gates,<br>MSI circuits: decoders, multiplexers,<br>encoders |
| Macro cells     | Memory bank                                                       |
| Mega cells      | Microprocessor, DSP, PCI interface                                |

## 1<sup>st</sup> generation Standard Cell — Example



Not very dense – lot of space used by the routing channels

# Standard Cell – The New Generation



Cell structure hidden under interconnect layers



Much denser structure: no space is occupied by routing channels. Routing occurs through the higher interconnect layers.

#### Design at a high level of abstraction



### Suitable to Fabless industry model

| Company type                         | Activities                                         | Examples                                                     |
|--------------------------------------|----------------------------------------------------|--------------------------------------------------------------|
| Fabless company                      | Design + testing + sale                            | e.g. Marvell, Qualcomm, Dialog Semiconductor, Altera, Xilinx |
| Foundry                              | Fabrication + standard cells for fabless companies | e.g. TSMC, UMC, SMIC                                         |
| IDM (Integrated Device Manufacturer) | Design + fabrication + testing + sale              | e.g. Intel, Samsung, STM                                     |
| IP vendor                            | Macro cell library<br>Soft macromodules<br>IP      | e.g. ARM                                                     |

#### **Macrocells or Macromodules**

| Type            | Portability                                         | Characteristics                                                                                                                                         |
|-----------------|-----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| Hard macrocells | Custom designed for a specific CMOS process         | Predetermined functionality AND predetermined physical implementation<br>e.g. embedded microprocessor, embedded memory                                  |
| Soft macrocells | Portable to different CMOS processes (gate netlist) | Predetermined functionality but NO physical implementation<br>Typically provided by IP vendors (with software tools, testing procedures and tools)     |

#### MacroModules



256×32 (8192-bit) SRAM generated by a hard-macro module generator

#### Soft MacroModules and IP



```
string mat = "booth";
directive (multtype = mat);
output signed [16] Z = A * B;
```



## Semicustom design flow

FUNCTIONAL IMPLEMENTATION

| Step | Activity                               | Inputs / outputs                                 |
|------|----------------------------------------|--------------------------------------------------|
| 1.   | Design capture                         | Schematic<br>+ VHDL<br>+ IP blocks               |
| 2.   | Logic synthesis                        | Gate netlist + placement and routing constraints |
| 3.   | Pre-layout simulation and verification |                                                  |

OK? If no, iterate on the functional implementation; if yes, continue with the physical implementation.

PHYSICAL IMPLEMENTATION

| Step | Activity                                | Outputs             |
|------|-----------------------------------------|---------------------|
| 4.   | Floor planning                          |                     |
| 5.   | Placement and routing                   | Layout and<br>masks |
| 6.   | Electrical circuit extraction           |                     |
| 7.   | Post-layout simulation and verification |                     |

SEND TO FOUNDRY

#### **Array-based implementation**

#### Do not require a complete manufacturing run

| Type                                 | Characteristics                                                                                                                                                                              |
|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Gate array (or sea of gates)         | Mask-programmable arrays<br>Pre-diffused wafers: ONLY the metal layers are missing                                                                                                           |
| Field Programmable Gate Array (FPGA) | Complete separation between the manufacturing phase and the implementation phase<br>Manufacturing phase has large volumes<br>[+] Short time-to-market and LOW NRE<br>[-] Performance loss and HIGH RE |
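The "low NRE, high RE" trade-off can be made concrete with the cost model given earlier (total cost of 1 chip = variable cost + fixed costs / # of chips). The sketch below computes the volume at which a high-NRE, low-RE option (for instance a cell-based design) matches a low-NRE, high-RE option (for instance an FPGA). The function name and all numbers are hypothetical.

```python
def break_even_volume(nre_a: float, re_a: float, nre_b: float, re_b: float) -> float:
    """Volume N at which options A and B have the same cost per chip.

    Option A: higher NRE, lower RE. Option B: lower NRE, higher RE.
    Solves re_a + nre_a / N == re_b + nre_b / N for N.
    """
    if nre_a <= nre_b or re_b <= re_a:
        raise ValueError("expected A to have higher NRE and lower RE than B")
    return (nre_a - nre_b) / (re_b - re_a)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    n = break_even_volume(nre_a=1_000_000.0, re_a=5.0, nre_b=50_000.0, re_b=55.0)
    print(f"break-even at about {n:,.0f} chips")
```

Below the break-even volume the low-NRE option is cheaper per chip; above it the high-NRE option wins.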

#### How are cells programmed

#### **Fuse-based FPGA [Write Once]**

- Fuse: normally a SHORT circuit (a high current can blow the fuse)
- Antifuse: normally an OPEN circuit (a high voltage can cause oxide breakdown, creating a short circuit)





**Non-volatile FPGA**

Non-volatile memory that controls an interconnection.

**SRAM-based FPGA** 

(look-up table)

#### What type of logic can be programmed?

Array-based programmable logic (e.g. PLA, PAL)



Any combinatorial function [any AND-OR Function]
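As a rough software model of the AND-OR structure, the sketch below evaluates a programmable AND plane followed by a programmable OR plane. The data layout (dictionaries for product terms, index lists for the OR plane) is an assumption made for this illustration, not a description of any real device.

```python
from typing import Dict, List, Sequence


def pla_eval(inputs: Sequence[int],
             and_plane: List[Dict[int, int]],
             or_plane: List[List[int]]) -> List[int]:
    """Evaluate a PLA-style AND-OR array on one input vector.

    and_plane: one dict per product term, mapping input index -> required value.
    or_plane:  one list per output, naming the product terms that are ORed.
    """
    # AND plane: a product term is 1 only if every listed input matches.
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # OR plane: an output is 1 if any selected product term is 1.
    return [int(any(products[p] for p in selected)) for selected in or_plane]


# Example programming (a half adder): out0 = A XOR B, out1 = A AND B.
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
OR_PLANE = [[0, 1], [2]]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, pla_eval([a, b], AND_PLANE, OR_PLANE))
```

Changing only the two "plane" tables reprograms the function, which mirrors how the array is configured without changing its structure.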

### Cell-based programmable logic

Look Up Table based logic cells: Any combinatorial logic
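A look-up table can implement any function of its inputs because it simply stores the truth table. The sketch below models a k-input LUT in Python; `make_lut` and `lut_eval` are hypothetical helper names, and the bit ordering (input 0 is the least significant address bit) is an arbitrary choice.

```python
from typing import Callable, List


def make_lut(func: Callable[..., object], k: int) -> List[int]:
    """'Program' a k-input LUT: store func's truth table as 2**k configuration bits."""
    table = []
    for addr in range(2 ** k):
        bits = [(addr >> i) & 1 for i in range(k)]  # input i is address bit i
        table.append(int(bool(func(*bits))))
    return table


def lut_eval(table: List[int], *inputs: int) -> int:
    """Evaluate the LUT: the inputs form the address of the stored bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


if __name__ == "__main__":
    # Example: 3-input majority function.
    majority = make_lut(lambda a, b, c: a + b + c >= 2, 3)
    print(majority)
    print(lut_eval(majority, 1, 1, 0))
```

A k-input LUT needs 2^k stored bits regardless of the function, which is the price of this generality.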



#### Sea of gates



Memory

Subsystem

Random Logic

LSI Logic LEA300K (0.6 µm CMOS)

Courtesy LSI Logic

### **RAM-based FPGA**



Xilinx XC4000ex

#### **Xilinx Virtex UltraScale**

| Value                              | Deliverables                                                                                                                                                                                                                                                                        |
|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Programmable System<br>Integration | <ul> <li>Up to 5.5M System Logic Cells at 20nm using 2<sup>nd</sup> generation 3D IC</li> <li>Integrated 100G Ethernet MAC and 150G Interlaken cores</li> </ul>                                                                                                                     |
| Increased System Performance       | <ul> <li>Up to two speed-grade improvement with high utilization</li> <li>30G transceivers for chip-to-chip, chip-to-optics, 28G backplanes</li> <li>16G backplane capable transceivers at half the power</li> <li>2,400 Mb/s DDR4 for robust operation over varying PVT</li> </ul> |
| BOM Cost Reduction                 | <ul> <li>Up to 50% lower cost – half the cost per port for Nx100G systems</li> <li>VCXO and fractional PLL integration reduces clocking component cost</li> <li>2,400 Mb/s DDR4 in a mid-speed grade</li> </ul>                                                                     |
| Total Power Reduction              | <ul> <li>Up to 40% lower power vs. previous generation</li> <li>Fine granular clock gating with ASIC-like clocking</li> <li>Enhanced logic cell packing reduces dynamic power</li> </ul>                                                                                            |
| Accelerated Design<br>Productivity | <ul> <li>Footprint compatibility with Kintex UltraScale devices for<br/>scalability</li> <li>Seamless footprint migration from 20nm planar to 16nm FinFET</li> <li>Co-optimized with Vivado Design Suite for rapid design closure</li> </ul>                                        |

#### IC Production – Installed capacity

Worldwide Capacity by Product Type as of Dec-2012 (Installed Monthly Capacity in 200mm-Equiv. Wafers x1000)



Source: IC Insights

# IC Production Breakdown by Region

Regional Capacity by Product Type as of Dec-2012 (Installed Monthly Capacity in 200mm-Equiv. Wafers x1000)

| Product | Americas | Europe  | Japan   | Korea   | Taiwan  | China   | ROW     | Total    |
|---------|----------|---------|---------|---------|---------|---------|---------|----------|
| Analog  | 323.5    | 328.3   | 391.9   | 31.5    | 19.2    | 153.5   | 139.4   | 1,387.4  |
| Memory  | 382.9    | 29.3    | 1,109.1 | 1,821.8 | 1,194.3 | 338.3   | 362.0   | 5,237.4  |
| Logic   | 341.1    | 167.8   | 704.2   | 363.1   | 37.7    | 70.9    | 106.5   | 1,791.3  |
| Micro   | 727.2    | 326.4   | 251.1   | 26.3    | 5.1     | 11.2    | 146.6   | 1,493.8  |
| Foundry | 296.5    | 135.5   | 124.5   | 254.6   | 1,899.0 | 737.3   | 543.3   | 3,990.7  |
| Other   | 50.0     | 128.5   | 119.2   | 67.5    | 10.1    | 19.8    | 201.3   | 596.5    |
| Total   | 2,121.3  | 1,115.7 | 2,700.1 | 2,564.7 | 3,165.4 | 1,330.8 | 1,499.0 | 14,497.0 |

Source: IC Insights

Korea, Japan ≈ 2× Europe; Taiwan ≈ 3× Europe
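The ratios can be checked against the "Total" row of the table. The short sketch below copies those totals into a dictionary (the name `TOTAL_CAPACITY` is chosen here) and computes each region's capacity relative to Europe.

```python
# "Total" row of the regional capacity table
# (installed monthly capacity, 200mm-equivalent wafers x1000, Dec-2012).
TOTAL_CAPACITY = {
    "Americas": 2121.3,
    "Europe": 1115.7,
    "Japan": 2700.1,
    "Korea": 2564.7,
    "Taiwan": 3165.4,
    "China": 1330.8,
    "ROW": 1499.0,
}


def ratio_to_europe(region: str) -> float:
    """Installed capacity of `region` relative to Europe."""
    return TOTAL_CAPACITY[region] / TOTAL_CAPACITY["Europe"]


if __name__ == "__main__":
    for region in ("Korea", "Japan", "Taiwan"):
        print(f"{region}: {ratio_to_europe(region):.1f}x Europe")
```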