# Optimal Wire and Transistor Sizing for Circuits With Non-Tree Topology

Lieven Vandenberghe (UCLA) Stephen Boyd (Stanford University) Abbas El Gamal (Stanford University)

## Contribution

dominant time constant as measure for RC circuit delay

- applies to general (nontree) RC circuits
- can be efficiently, globally optimized

example applications: sizing of

- clock meshes
- busses with crosstalk

# Outline

- Elmore delay minimization in RC trees
- dominant time constant minimization in general RC circuits
- example applications
- conclusions

#### **RC** models for digital circuits



- C > 0, G > 0 (capacitance and conductance matrices)
- simple model for transistors & wires

# Sizing problem

design variables: transistor and wire widths

 $C(\boldsymbol{x})\text{, }G(\boldsymbol{x})$  are affine in design variables  $\boldsymbol{x}$ 

#### tradeoff between

- threshold delay (e.g., 50%)
- power:  $\frac{1}{2}V^T C(x)V$  per transition (*i.e.*, affine in x)
- area (approximated by affine function of x)
- . . . intractable

### The Elmore delay



- good approximation of 50% delay (for monotonic step resp.)
- efficiently & globally minimized for RC **trees** (via geometric programming; c.f. TILOS)
- no useful convexity properties for **non-tree** circuits

#### **Dominant time constant**

- node voltages have form  $v_k(t) = 1 \sum_{i=1}^n \alpha_{ik} e^{\lambda_i t}$
- $0 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ : roots of  $det(\lambda C(x) + G(x)) = 0$
- slowest, *i.e.*, **dominant** time constant is  $T^{\mathrm{dom}} = -1/\lambda_1$

$$- \times \times \times \times - \frac{-1/T^{\text{dom}}}{\times}$$

good approximation of max threshold delay (usually better than Elmore delay)

#### Sizing with dominant time constant constraint

$$T^{\text{dom}} \leq T \iff TG(x) - C(x) \geq 0$$

- **convex** constraint in x (linear matrix inequality)
- no restrictions on topology (*i.e.*, G, C)

**example:** minimize linear function (e.g., area, power) s.t.

- upper bound on  $T^{\mathrm{dom}}$
- upper and lower bounds on  $x_i$
- a convex optimization problem (semidefinite program)

# Semidefinite programming (SDP)

minimize 
$$c^T x$$
  
subject to  $F_0 + x_1 F_1 + \dots + x_m F_m \ge 0$ 

- linear objective function, linear matrix inequality constraint  $(F_i = F_i^T \in \mathbf{R}^{n \times n})$
- convex (but not necessarily differentiable) constraint
- **global** optimum efficiently computed using recent interior-point methods

# Outline

- Elmore delay minimization in RC trees
- dominant time constant minimization in general RC circuits
- example applications
  - clock meshes
  - busses with crosstalk
- conclusions

### **Clock distribution mesh**

- used in high-performance designs to reduce clock skew (*e.g.*, DEC alpha)
- quantities of interest: skew, maximum delay, power
- **non-tree** topology: Elmore delay methods find **local** optimum [Desai et al., DAC96]

#### **Clock mesh example**



- multiple synchronized drivers
- variables  $x_i$ : interconnect widths



minimize power subject to  $T^{\mathrm{dom}} \leq T$ ,  $0 \leq x_i \leq w_{\mathrm{max}}$ 

#### Power versus dominant time constant



**globally** optimal tradeoff curve via semidefinite programming

optimal **topology** varies along tradeoff curve

#### Two solutions on tradeoff curve



- $T^{\text{dom}}$  is a good approximation of max. 50% delay
- minimizing  $T^{\text{dom}}$  reduces skew

### **Busses with crosstalk**

- capacitive coupling in deep submicron
- **non-tree** topology (non-grounded capacitors)
- Elmore delay is not a good delay measure (non-monotonic step response)

### Example



- ullet variables: widths  $w_{ij}$ , spacing  $s_1$ ,  $s_2$
- coupling capacitances  $\sim 1/s_{ij}$
- minimize total width  $s_1 + s_2$  subject to bound  $T^{\text{dom}} \leq T$ , upper and lower bounds on  $s_{ij}$ ,  $w_{ij}$

#### Total width versus dominant time constant



globally optimal tradeoff curve via semidefinite programming

#### **Effect on crosstalk level**

apply unit step to bus line 2, zero input to lines 1 and 3



minimizing  $T^{\text{dom}}$  indirectly reduces crosstalk level

### **Computational complexity of SDP**

#### interior-point methods

- worst-case: #iterations  $\sim \sqrt{\text{problem size}}$
- in practice: #iterations between 5 and 50
- can exploit structure to reduce computation per iteration

#### structure in SDPs arising in circuit sizing

- G(x), C(x) sparse; each entry depends on very few variables
- can evaluate  $T^{\mathrm{dom}}$  very efficiently using Lanczos algorithm

## Conclusions

dominant time constant as measure for RC circuit delay

- applies to general (nontree) RC circuits
  - multiple sources
  - loops of resistors
  - capacitive coupling
- efficiently, globally optimized via semidefinite programming

no specialized implementation for large-scale problems