### Hybrid Latch Flip-Flop with Improved Power Efficiency

Nikola Nedovic and Vojin G. Oklobdzija

Advanced Computer System Design Laboratory, Department of Electrical and Computer Engineering, University of California, Davis, CA 95616

(nikola, vojin)@ece.ucdavis.edu, (510) 486-8171, (510) 486-0790 FAX

http://www.ece.ucdavis.edu/acsel

#### Abstract

An improved design of the Hybrid Latch Flip-Flop is presented. The proposed design overcomes the problem of the glitch at the output and reduces the power consumption and delay of the circuit, resulting in a total Power-Delay-Product improvement of about 20%. It also exhibits better soft-clock edge properties than the original circuit. This is accomplished by careful design of the keeper elements and by introducing a feedback path that suppresses unnecessary transitions in the circuit. The new design introduces an insignificant area increase.

#### 1. Introduction

The Hybrid Latch Flip-Flop (HLFF) [1] is one of today's high-performance flip-flops. It introduces a new mechanism for performing flip-flop functionality, based on generating an explicit transparency window in which the transition is allowed. This approach greatly reduces the complexity of the locking mechanism, resulting in small area and small delay. Moreover, the explicit presence of the transparency window allows the use of a simpler latch structure in the second stage and makes the circuit robust to uncertainty in the clock arrival time, which is known as the soft-clock edge property.

However, use of this design is associated with considerable power consumption. One of the main reasons is the high internal activity of the circuit even when input activity is small. In addition, increased output activity due to glitches when the output is at the high level contributes to the total power dissipation.

This paper presents a new technique that aims to reduce the power consumption resulting from unnecessary transitions in the circuit.
This is done by employing a feedback path that uses the state of the circuit to prevent precharging of the internal node when it is redundant, which reduces the power consumption associated with the precharge. The two flip-flops proposed in the paper also use a more careful design to further reduce the power consumed in overpowering the keepers in the circuit.

The paper is organized as follows. Section 2 provides background on the principle of operation and the limitations of the Hybrid Latch Flip-Flop. Section 3 describes the improvements proposed in the paper. Section 4 gives the principle of operation of the proposed designs. Section 5 describes the simulation methodology used for comparison of the circuits. Section 6 presents the results obtained. Section 7 concludes the paper.

#### 2. HLFF principle of operation and shortcomings

The circuit is shown in Figure 1, and its principle of operation is briefly explained here. The transparency window is defined by the propagation time of three inverters. The first stage of the circuit conditionally generates a glitch in the transparency window based on the level of the input (D) signal. It is formed by a static 3-input CMOS NAND gate, which provides the functionality needed to generate the wanted glitch. The second stage captures the generated glitch. If the glitch has not appeared (which corresponds to a low level of the input), the output is brought to zero.

Figure 1. Hybrid Latch Flip-Flop

However, the simplicity of the circuit leads to unnecessary internal transitions, which increase the total power consumption of the flip-flop. Every time the input is high, the glitch is generated, regardless of the previous state of the output. If the output is already at the high level, generating the glitch only increases the circuit's intrinsic power dissipation without doing any useful work.

Also, the circuit suffers from an unwanted glitch at the output, generated by a race condition.
This happens because the second stage assumes the default state of the internal node (X) to be high (the absence of a glitch at node X is the signal for resetting the output). If the previous flip-flop state was high, then under the realistic assumption of a non-zero propagation time from the active clock edge to the transition of the internal node, the output makes a false transition from the high to the low level and returns to the high level only after the transition of the internal node.

The technique chosen for keeping the output at a defined state (avoiding dynamic behavior) also has certain shortcomings. A keeper is used to hold the value of a dynamic node that would otherwise be in high impedance and therefore sensitive to leakage current and noise, especially in low-power applications where clock gating techniques are usually employed. This simple method has disadvantages as well: in order to change the state of the HLFF, the keeper has to be overpowered, which introduces further unnecessary power consumption and an increase in delay. This is particularly true for the keeper at the output since, in some cases, such as pass-transistor logic driven by the flip-flop, it must have a certain minimal driving capability, a requirement that conflicts with the need for the keeper to be weak.

These disadvantages make the HLFF unsuitable for applications where low power is required. A considerable portion of the power dissipated in the HLFF is due to these unnecessary and false transitions, and the resulting glitches generally increase the power consumed by the subsequent logic as well.

#### 3. Improvements

One of the most important contributions of this work is abandoning the unconditional precharge of the internal node, which is closely tied to the excessive power dissipation of the circuit discussed in the previous section.
This is accomplished by controlling the return of the internal node to the inactive (high) state using information about the previous flip-flop state, allowing the internal node to stay at the low level until the input condition changes. This approach efficiently eliminates the unnecessary transitions of the internal node as well as the race condition at the output.

There are two main disadvantages of this approach in terms of propagation time. One is the introduction of another critical path, for capturing a low input level. This is not a limiting drawback, since transistor sizing can be used to ensure that this path does not exceed the capture-one path; the only potential disadvantage is a somewhat increased load on the flip-flop input due to the bigger pMOS transistor connected to it. The other drawback is the increased output load due to the feedback, which, although small (a minimal-size transistor can be used) and outside the critical path, can affect the total propagation delay.

Figure 2. Conditional Precharge Flip-Flop

There is one more potential disadvantage of the proposed approach. It relates to environments where the input node is subject to high activity, for example due to noise or unbalanced logic delays that might cause glitches. If the output is at the high level, this situation may result in excessive power dissipation on the internal node. To overcome this problem, a scheme with alternative local feedback is proposed (Figure 3). Since the internal node reacts instantaneously to input changes, only one transition (from the low to the high logic level) is allowed, while the rest of the input activity has no effect on the circuit's power dissipation. This solution, however, exhibits slightly higher dissipation than the previous one when the input node is at a steady level.

Figure 3. Alternative Conditional Precharge Flip-Flop

Another important improvement of the proposed circuit is the design of the conditional keeper.
The keeper at the output is carefully designed to avoid any clash with critical-path transitions. The main observation is that keeping the output at the low level can be separated from keeping it at the high level. The output should be kept at the low level when the internal node is high. Similarly, the output should be kept at the high level at all times except during the transparency window. This approach avoids the clash at the output and allows the keeper to be sized independently to meet the requirements of the subsequent logic. The two proposed schemes of the circuit, called Conditional Precharge Flip-Flop (CPFF), are shown in Figures 2 and 3.

#### 4. Principle of operation of CPFF

The Conditional Precharge Flip-Flop operates as follows. The transparency window is generated by inserting an odd number of inverters in the clock path, in the same way as in the HLFF. A high level at the D input of the flip-flop during the transparency window drives the internal node to ground. This sets the output (Q) to the high level. Once the internal node is at the low level (after the propagation time required to set the output to VDD), it remains low as long as the input is high, because of the path to ground provided by the feedback from the output. Unlike in the HLFF, the only way the internal node can switch to the high level again is after an input transition from high to low later in the clock cycle. If the input remains high over several clock cycles, no change in the flip-flop internal state occurs. Similarly, if the input is at the low level during the transparency window, the internal node remains high, which resets the output.

The second stage captures the state of the internal node. The low level is captured unconditionally, since the high-to-low transition of the internal node is synchronous with the clock edge (more precisely, with the transparency window).
The exception is the internal node transition that occurs after low-to-high switching of the input when Q is at the high level, regardless of the state of the clock. However, since the output is already high in this case, it is not necessary to synchronize the high-to-low transition of the internal node. The high level at the internal node has to be synchronized, since it can occur with no relationship to the output or the clock. The synchronization is done in the same way as in the HLFF, by restricting the transition to the transparency window.

Holding the output at the desired level is performed by conditional keepers. The high level of the output is held when either the clock or the delayed clock is low. In other words, the output is kept high outside the transparency window, which is the only period in which it can be driven low. Similarly, the output is kept at the low level if the node that drives it to VDD is in its inactive state (when the internal node of the flip-flop is high). This ensures that there is no conflict between the paths to VDD and ground at any moment, while high impedance on the output node is avoided as well.

Operation of the alternative version of the Conditional Precharge Flip-Flop is similar to that of the CPFF described above; the only difference is in the realization of the feedback for the conditional precharge of the internal node. While it was previously achieved by using the state of the output to suppress unwanted precharge, the state of the internal node itself is now used. This is useful for reducing the internal power dissipation of the circuit in environments where the input experiences a large number of transitions in a single clock cycle. Namely, if the output is at the high level, the first stage of the CPFF behaves as an inverter, so multiple transitions at the flip-flop input increase its power dissipation.
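The inverter-like behavior described above can be illustrated with a toy logic-level model. This is only a sketch under strong assumptions: ideal 0/1 levels, no delays, no clock, the output Q already high, and the internal node starting low. The function names (`count_transitions`, `cpff_internal_node`, `alt_cpff_internal_node`) and the input pattern are hypothetical and are not taken from the paper or from its HSPICE setup. The model counts internal-node transitions for a glitchy input within one clock cycle, once with the CPFF first stage acting as an inverter and once with the internal node locked high after the first low input, as described for the Alternative CPFF.

```python
# Toy logic-level model (an assumption for illustration, not the authors' simulation).
# Levels are 0/1; delays, the clock and transistor sizing are ignored.


def count_transitions(levels):
    """Count the level changes in a sequence of 0/1 samples."""
    return sum(a != b for a, b in zip(levels, levels[1:]))


def cpff_internal_node(d_levels):
    """CPFF with Q high: the first stage acts as an inverter, so X = NOT D."""
    return [1 - d for d in d_levels]


def alt_cpff_internal_node(d_levels, x_start=0):
    """Alternative CPFF (assumed behavior): X locks high after the first low D."""
    x = x_start
    trace = []
    for d in d_levels:
        if d == 0:
            x = 1  # first low input precharges X; later input activity is ignored
        trace.append(x)
    return trace


# Hypothetical glitchy input within a single clock cycle, starting high.
d_glitchy = [1, 0, 1, 0, 1, 0, 1]

cpff_x = cpff_internal_node(d_glitchy)
alt_x = alt_cpff_internal_node(d_glitchy)

print("CPFF internal-node transitions:", count_transitions(cpff_x))            # 6
print("Alternative CPFF internal-node transitions:", count_transitions(alt_x))  # 1
```

In this simplified picture every input glitch toggles the internal node of the CPFF, whereas the locked node toggles only once. The model says nothing about the actual energy per transition, which depends on the node capacitance and the sizing used in the simulations.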
Local feedback employed in the Alternative CPFF locks the internal node at the high level once a low level of the input is observed, regardless of the state of the output. The drawback of this circuit is somewhat higher power dissipation at low input switching activity (due to the additional circuitry and the load on the internal signal). Also, since the internal node is locked at the high level after one high-to-low and a subsequent low-to-high transition of the input while the output is high, behavior similar to that of the HLFF with respect to the previously described glitch at the output can be observed in this particular case.

#### 5. Simulation

Before the results are compared, the circuits are optimized for both speed and power. Optimization is performed using the Levenberg-Marquardt algorithm embedded in HSPICE, with 0.25 µm Fujitsu transistor models. All input and output signals are loaded by 14 minimal inverters in the technology used. The clock frequency used in the simulations is 500 MHz.

The measure of circuit speed is the data-to-output delay, as the true merit of flip-flop performance. Optimization for this parameter is not trivial, since transistor widths can be optimized only with defined input waveforms, which are, strictly speaking, unknown unless the optimal data-to-clock time (the one that results in the optimal data-to-output delay) is specified. However, this time is not known until the end of the optimization process. To overcome these dependencies, an iterative method of optimization is used. This method consists of four steps:

- 1) Initial optimization for the conventional optimization parameter (clock-to-output time), with a guess as the initial value.
- 2) Measurement of the data-to-output characteristic, which gives the current 'optimal' relative position of data with respect to the clock edge.
- 3) Several steps of data-to-output optimization with the fixed data-to-clock time obtained from step 2.
- 4) Comparison of the obtained results with those from the previous cycle and, if there is a significant change, return to step 2.

In practice, it is sometimes necessary to slightly displace some of the optimization parameters before entering the next cycle, which in essence amounts to performing some of the Levenberg-Marquardt optimization steps manually. It should be noted, however, that this procedure is not guaranteed to always yield the best result and relies on a well-chosen initial state for each optimization. The results must not be taken without awareness of these fluctuations.

The power consumption that is optimized is taken to be the sum of the power dissipated by the circuit itself and the power dissipated in the circuits that drive its inputs. In this way, the load that the gate imposes on its driving circuits is taken into account, which gives a fairer comparison and approaches the conditions of real circuit environments. The simulation conditions are mostly taken from [2], with some small changes in the iterative optimization algorithm.

#### 6. Results

The performance evaluation results of the proposed circuit are given in Table 1. The power entry refers to an input activity of 25%, i.e. one input change per two clock periods, which is assumed to be a good approximation of the realistic case. The proposed circuit outperforms the HLFF in terms of PDP by about 20%.

Table 1. Performance Comparison

| | Power [mW] | D-Q delay [ps] | PDP [fJ] |
|---------------------|---------------|-------------------|----------|
| HLFF | 0.879 | 123 | 108.11 |
| CPFF | 0.751 | 115 | 86.37 |
| Alternative CPFF | 0.762 | 116 | 88.39 |

The behavior of the circuit under different input patterns can be observed in Figure 4. An important observation is that for a quiet input the power consumption of the proposed circuit is much lower than that of the HLFF, because of the conditional precharge capability.

Figure 4. Power Consumption Comparison

The difference between the proposed structure and its alternative version can also be observed in Figure 4. For low input activity, the proposed scheme performs slightly better than the alternative circuit. This is expected, since the power dissipation of the alternative circuit is increased by the dissipation of the additional inverter and the parasitic capacitance on the internal node.

In addition, the new design has shown a favorable characteristic in suppressing clock arrival time uncertainty (the soft-clock edge property) compared to the HLFF. This is a consequence of the feedback from the output, which appears to relax the rigidity of the data-to-clock time requirement for optimal propagation delay.

Figure 5. Typical Flip-Flop Waveforms

#### 7. Conclusion

The design of an improved Hybrid Latch Flip-Flop is presented. It uses a new conditional precharge technique to reduce the circuit's power consumption and eliminate the glitch at the output. The new design also proposes a conditional keeper structure that avoids the clash at the output, saving power and reducing the delay in the critical path. The simulations show an improvement of about 20% over the original circuit in the relevant flip-flop performance parameter (PDP). An improvement in the soft-clock edge property is also observed. In addition, another structure is proposed for use under conditions of high activity at the flip-flop input. The cost of using this circuit is somewhat higher power dissipation under regular low-activity input behavior.

#### 8. References

[1] H. Partovi et al., "Flow-through latch and edge-triggered flip-flop hybrid elements", 1996 IEEE International Solid-State Circuits Conference, Digest of Technical Papers, San Francisco, CA, USA, 8-10 Feb. 1996.

[2] V. Stojanovic and V. Oklobdzija, "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems", IEEE Journal of Solid-State Circuits, vol. 34, no. 4, pp. 536-548, April 1999.