## Evaluation of Booth encoding techniques for parallel multiplier implementation

D. Villeger and V.G. Oklobdzija

Indexing terms: Digital arithmetic, Multipliers

Although generally used in parallel multipliers, Booth encoding is shown to be obsolete due to the improvements in bit compression trees. It was found that a single row of 4:2 compressors reduces the number of partial products to one half, which is the essential function of the Booth encoding technique. With a single row of 4:2 compressors this reduction is achieved in less time and with fewer gates.

Introduction: The Booth algorithm [1] is widely used in implementations of hardware or software multipliers because its application makes it possible to reduce the number of partial products. It can be used for both sign-magnitude and 2's complement numbers with no need for a correction term or a correction step.

Booth-MacSorley recoding: A modification of the Booth algorithm was proposed by MacSorley [2] in which a triplet of bits is scanned instead of two bits. This technique has the advantage of reducing the number of partial products by one half regardless of the inputs. The recoding is performed in two steps: encoding and selection. The purpose of the encoding is to scan the triplet of bits of the multiplier and define the operation to be performed on the multiplicand, as shown in Fig. 1. This method is actually an application of a signed-digit representation in radix 4. The Booth-MacSorley algorithm, usually called the modified Booth algorithm or simply the Booth algorithm, can be generalised to any radix. For example, a 3 bit recoding would require the following set of digits to be multiplied by the multiplicand: 0, ±1, ±2, ±3. The difficulty lies in the fact that $\pm 3Y$ must be computed by adding $\pm Y$ to $\pm 2Y$, which means that a carry propagation occurs. The delay caused by the carry propagation renders this scheme slower than a conventional scheme. Consequently, only 2 bit Booth recoding is used and therefore considered in this Letter.
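The 2 bit (radix 4) recoding described above can be illustrated in software. The following is a minimal behavioural sketch, not the circuit of Fig. 1: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is assumed to be a 2's complement integer that fits in `n_bits` bits, and a sign-magnitude magnitude can be handled by choosing `n_bits` at least one larger than its bit length.

```python
def booth_radix4_digits(x, n_bits):
    """Recode a 2's complement multiplier x into radix-4 digits in {-2,...,2}.

    Each digit is derived from an overlapping triplet (b1, b0, prev) of
    multiplier bits. Digits are returned least significant first.
    """
    if n_bits % 2:          # work on an even number of bits (sign-extended)
        n_bits += 1
    digits = []
    prev = 0                # implicit bit to the right of the LSB
    for i in range(0, n_bits, 2):
        b0 = (x >> i) & 1           # Python's >> sign-extends negative ints
        b1 = (x >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n_bits):
    """Multiply x by y by summing the recoded partial products."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(x, n_bits)):
        partial = digit * y         # selection step: 0, +/-Y or +/-2Y
        product += partial << (2 * k)
    return product
```

For a 4 bit multiplier only two partial products are summed instead of four; for example, 7 is recoded as the digits -1 and +2 (7 = -1 + 2 × 4).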



Fig. 1 Implementation of modified Booth recoding

Booth recoding compared with use of 4:2 compressors: Booth recoding necessitates the internal use of a 2's complement representation in order to efficiently perform subtraction of the partial products as well as additions. However, the floating point standard specifies the sign magnitude representation which is followed by most of the non-standard floating point numbers in use today. Thus, we assume the use of the sign magnitude representation and compare the multiplier implementations using Booth encoding with those not using it but resorting to efficient partial product addition techniques such as the use of 4:2 compressors.

The advantage of Booth recoding is that it generates only half of the partial products compared to the multiplier implementation which does not use Booth recoding. However, the benefit achieved comes at the expense of increased hardware complexity. Indeed, this implementation requires hardware for the encoding and for the selection of the partial products $(0, \pm Y, \pm 2Y)$. An optimised encoding is shown in Fig. 2. The multiplexers and buffers are considered to be equivalent to an XOR gate. This implementation is then equivalent to a level of XOR gates and a level of AND gates. The selection can be implemented with a simple 5-input multiplexer, which is roughly equivalent to three XOR gates. However, because one input is grounded, this circuit can be designed with only a 4-input multiplexer, i.e. two XOR gates, and an AND gate. In this case, the Booth recoding circuit is equivalent to three XOR plus two AND gates.



Fig. 2 Optimised encoding circuit

On the other hand, reducing the number of partial products by one half can be achieved with one level of AND gates and one row of 4:2 compressors. The 4:2 cell is designed with three XOR levels as shown in Fig. 3 and implemented in [4,5]. The use of higher order compressors would result in even higher levels of compression. However, the main disadvantage of the Booth technique is the complexity introduced by the internal use of the 2's complement representation, which is necessary to compute negative partial products. Indeed, because the Booth recoding method calculates $-Y$ and $-2Y$, it needs to extend the sign of negative partial products. It further needs to complement $Y$ when $-Y$ or $-2Y$ is needed, that is, to calculate $-Y = \mathrm{inv}(Y) + 1$, where $\mathrm{inv}(Y)$ represents the inversion of every bit of $Y$. Consequently, two extra bits are necessary in the scheme: one for the sign extension and one for conversion into 2's complement. Both of the bits will be placed in the same row, therefore not increasing the number of rows. However, the correction bit (which is needed for correct sign calculation) will be placed right in the middle of the multiplier tree, therefore not only increasing the number of rows by one but creating this increase in the worst possible place, i.e. in the critical path of the multiplier.
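The halving of the number of rows by 4:2 compressors can be checked with a small bit-level model. The sketch below uses a commonly described XOR/multiplexer arrangement with three XOR levels on the sum path; it is an assumption and not necessarily the exact cell of Fig. 3. The names `compressor_4_2` and `compress_rows` are hypothetical, and the rows are assumed to be non-negative integers narrower than `width` bits.

```python
def compressor_4_2(x1, x2, x3, x4, cin):
    """One 4:2 cell: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)."""
    x12 = x1 ^ x2                   # XOR level 1
    x34 = x3 ^ x4                   # XOR level 1 (in parallel)
    x1234 = x12 ^ x34               # XOR level 2
    s = x1234 ^ cin                 # XOR level 3
    cout = x3 if x12 else x1        # multiplexer, independent of cin
    carry = cin if x1234 else x4    # multiplexer
    return s, carry, cout


def compress_rows(a, b, c, d, width):
    """Reduce four rows to two with a single row of 4:2 cells.

    The cout of each column feeds the cin of the next column. Because cout
    does not depend on cin, no carry ripples along the row in hardware.
    """
    sum_word = 0
    carry_word = 0
    cin = 0
    for i in range(width):
        s, carry, cout = compressor_4_2(
            (a >> i) & 1, (b >> i) & 1, (c >> i) & 1, (d >> i) & 1, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)
        cin = cout
    sum_word |= cin << width        # cout of the most significant column
    return sum_word, carry_word
```

The two returned words always add up to the sum of the four input rows, so four partial products are replaced by two without any 2's complement correction bits.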

The conclusions are summarised in Table 2.

Table 2: Comparison of sign-magnitude number multiplication with and without Booth encoding

| Booth encoding | No Booth encoding |
|----------------|-------------------|
| Internal representation: 2's complement (some partial products need to be subtracted) | Internal representation: sign magnitude (all the partial products are positive) |
| Hardware for encoding and selection | One row of 4:2 compressors |
| Sign extension | Only one XOR is used to compute the sign in parallel |
| Two extra bits (sign extension and complementation) | No extra bit |
| The normalisation requires some leading zero detectors and leading one detectors | The normalisation and even the rounding are easy [5] |
| The schematic and the layout are not regular | The simplicity of the schematic allows a highly regular layout |
| 1 XOR + 1 AND (encoding), 2 XOR + 1 AND (multiplexer); total: 3 XOR + 2 AND | 1 AND (partial product generation), 3 XOR (4:2 compressor); total: 3 XOR + 1 AND |

Fig. 3 Structure of 4:2 compressor cell

Conclusion: When Booth recoding is used, the schematic and the layout of the resulting implementation are less regular, leading to a more difficult design or VHDL description. In terms of speed, the Booth technique is at best equal to or worse than the use of the 4:2 compressors. In the 2's complement representation and without Booth encoding, the last row of the partial product (depending on the sign of the multiplier) is generated by using an AND gate with an inverted input. In other words, the number of gate levels is the same as in the sign magnitude case. However, the sign extension is needed with or without Booth encoding. This feature makes the two schemes comparable, although using 4:2 compressors is slightly better because of the simplicity and the lower number of gate levels.

Acknowledgment: We thank T. Soulas and S. Liu for their input.

© IEE 1993

Electronics Letters Online No. 19931300

D. Villeger (Ecole Superieure d'Ingenieurs en Electrotechnique et Electronique, 93162 Noisy le Grand Cedex, France)

V. G. Oklobdzija (Electrical and Computer Engineering Department, University of California, Davis, CA 95616, USA)

## References

- BOOTH, A.D.: 'A signed binary multiplication technique', Quarterly J. Mechan. Appl. Math., 1951, IV
- MACSORLEY, O.L.: 'High speed arithmetic in binary computers', Proc. IRE, 1961, 49, (1)
- WEINBERGER, A.: '4:2 carry-save adder module', IBM Tech. Disclosure Bulletin, 1981, 23
- MORI, J., et al.: 'A 10ns 54 × 54-b parallel structured full array multiplier with 0.5-μm CMOS technology', IEEE J. Solid State Circuits, 1991, 26, (4)
- SOULAS, T., VILLEGER, D., and OKLOBDZIJA, V.G.: 'An ASIC multiplier for complex numbers'. Proc. EURO-ASIC-93, 22-25 February 1993, (Paris, France)

## Application of the FDTD method and a full time-domain near-field transform to the problem of radiation from a PCB

I.J. Craddock and C.J. Railton

Indexing terms: Finite-difference time domain method, Electromagnetic compatibility

The finite-difference time-domain method is combined with a full time-domain near-field transform to yield accurately and efficiently the radiated field levels measured at a distance of 3m from a printed circuit board.

Introduction: With the introduction of stringent new EC electromagnetic compatibility (EMC) standards the problem of quantifying and then minimising unwanted emissions from equipment assumes a new importance in hardware design. Solutions to this type of problem would be easier and cheaper to achieve if there existed a method of simulating the emissive characteristics of the proposed design. The development of simulation techniques suitable for application to realistic problems is accordingly the subject of much research.

This Letter describes the application of a well known electromagnetic analysis technique (the finite-difference time-domain, or FDTD method), along with new extensions to this technique, to the efficient analysis of a simple but realistic EMC problem for which measured data exist.

Trial problem: The field strength produced by the structure shown in Fig. 1 has been measured between 50 and 600MHz at a distance of 3m in a 10m semi-anechoic chamber (a typical EMC test configuration) [1]. The structure consists of a 2.8mm wide $50\Omega$ track on a large PCB, terminated at one end by a $50\Omega$ load and driven at the other by a CMOS IC powered by a shielded battery. This eliminates the need for power cables, which would themselves cause radiation. In [1] the radiated field levels were predicted using the FDTD technique with a large computational domain. The aim of the work described in this Letter is to show that the measured results can be predicted far more efficiently by using a near-field transform in conjunction with a smaller domain.



Fig. 1 Geometry of trial problem

FDTD method: This numerical technique has been applied with success to the analysis of a range of electromagnetic problems and uses the widely accepted discretisation of the Maxwell curl equations in space and time proposed by Yee in 1966 [2]. The electromagnetic behaviour of the structure of interest is modelled with full rigour (unlike many commercial analysis tools); the method imposes no restrictions on the geometry of the structure, and the results are available over a wide frequency band.
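The leapfrog character of the Yee discretisation can be seen in one dimension. The following is a minimal sketch only, assuming normalised units (unit cell size, unit wave impedance), a uniform grid, perfectly conducting end walls and a Gaussian soft source; the function name `fdtd_1d` and all parameter values are hypothetical and unrelated to the three-dimensional PCB model of this Letter.

```python
import math


def fdtd_1d(n_cells=201, n_steps=150, courant=0.5, source_cell=100):
    """Advance a 1-D Yee grid (Ez, Hy) and return the final Ez samples.

    courant is c*dt/dx and must not exceed 1 for stability in 1-D.
    hy[i] is staggered half a cell between ez[i] and ez[i + 1].
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for n in range(n_steps):
        # Half step: update H from the spatial difference of E.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Half step: update E from the spatial difference of H.
        # ez[0] and ez[-1] are left at zero (conducting walls).
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: a Gaussian pulse added to the field at one cell.
        ez[source_cell] += math.exp(-((n - 30) / 10.0) ** 2)
    return ez
```

With these default values the pulse splits into two waves that travel half a cell per time step in opposite directions; the cost per step is proportional to the number of cells, which is why a smaller computational domain is attractive.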

The computational effort associated with the FDTD algorithm increases linearly with the electrical size of the structure and the spatial resolution required to describe the structure adequately (i.e. the size of its smallest feature). One method used to reduce the computational requirement is the employment of a non-uniform spatial discretisation of the Maxwell equations, whereby high spatial resolution is used in regions of fine geometrical detail or rapid field variation but in, for example, free space, the resolution may

Near-field transformations: Although the FDTD method could be used to model the entire problem space (basically a $1.5 \times 3 \times 0.5\,\mathrm{m}^3$ volume) under consideration, this is, for the reasons given above, a computationally expensive option. In this Letter FDTD is used to solve for only the fields on and enclosing the PCB, and a different time-domain technique is used to extend the FDTD results to a point 3m away.

This second technique is commonly known as a near-field transformation and may be briefly described as follows. The equivalence principle is used to replace the fields within a closed surface S, which encloses all the field sources of interest, by equivalent electric and magnetic currents, J and M, on S. Vector potential theory allows the calculation of the fields induced by these currents at any point P, in the volume outside S, from relations such as

$$\begin{split} \mathbf{E_P}(t) &= \frac{1}{4\pi} \int_S \left( \frac{\mu}{r} \frac{\partial \mathbf{J}(t-\tau)}{\partial t} - \frac{1}{r^2} \hat{\mathbf{R}} \times \mathbf{M}(t-\tau) \right. \\ &\left. - \frac{1}{cr} \hat{\mathbf{R}} \times \frac{\partial \mathbf{M}(t-\tau)}{\partial t} \right) \, dS \end{split}$$

where $\mathbf{E_P}(t)$ is the electric field at P at time t, r is the distance to the point P, $\hat{\mathbf{R}}$ is the unit vector in the direction of P, $\tau$ is the time delay to the point, c is the speed of light and $\mu$ is the permeability of the medium.
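A numerical evaluation of the surface integral above can be sketched by splitting S into small flat patches and applying a midpoint rule. This is an illustrative sketch under stated assumptions, not the transform implemented in this Letter: the `Patch` class and `e_field_at` function are hypothetical, the time delay is taken as $\tau = r/c$ (a homogeneous medium is assumed), and the currents and their time derivatives are assumed to be available as functions of time on each patch.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Vec = Tuple[float, float, float]
MU0 = 4e-7 * math.pi        # permeability of free space, H/m
C0 = 299792458.0            # speed of light, m/s


@dataclass
class Patch:
    centre: Vec                         # patch centre on S, in metres
    area: float                         # patch area dS, in square metres
    dJ_dt: Callable[[float], Vec]       # time derivative of J
    M: Callable[[float], Vec]           # magnetic current
    dM_dt: Callable[[float], Vec]       # time derivative of M


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def e_field_at(p: Vec, t: float, patches, mu=MU0, c=C0) -> Vec:
    """Midpoint-rule approximation of E_P(t) from the patch currents."""
    total = [0.0, 0.0, 0.0]
    for patch in patches:
        rvec = tuple(p[k] - patch.centre[k] for k in range(3))
        r = math.sqrt(sum(v * v for v in rvec))
        rhat = tuple(v / r for v in rvec)   # unit vector towards P
        tr = t - r / c                      # retarded time t - tau (assumed tau = r/c)
        dj = patch.dJ_dt(tr)
        rxm = cross(rhat, patch.M(tr))
        rxdm = cross(rhat, patch.dM_dt(tr))
        for k in range(3):
            total[k] += (mu / r * dj[k]
                         - rxm[k] / r ** 2
                         - rxdm[k] / (c * r)) * patch.area
    return tuple(v / (4.0 * math.pi) for v in total)
```

Each patch contributes the three terms of the integrand evaluated at its own retarded time, so only the tangential fields on S need to be stored during the FDTD run.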

The combination of FDTD and a time-domain near-field transform has been described elsewhere, for example [3-5], but the application has been to the determination of the far-field charac-