Report Number: CSL-TR-95-675
Institution: Stanford University, Computer Systems Laboratory
Title: An Analysis of Division Algorithms and Implementations
Author: Oberman, Stuart F.
Author: Flynn, Michael J.
Date: July 1995
Abstract: Floating-point division is generally regarded as a low
frequency, high latency operation in typical floating-point
applications. However, the increasing emphasis on high
performance graphics and the industry-wide usage of
performance benchmarks forces processor designers to pay
close attention to all aspects of floating-point computation.
Many algorithms are suitable for implementing division in
hardware. This paper presents four major classes of
algorithms in a unified framework, namely digit recurrence,
functional iteration, very high radix, and variable latency.
Digit recurrence algorithms, the most common of which is SRT,
use subtraction as the fundamental operator, and they
converge to a quotient linearly. Division by functional
iteration converges to a quotient quadratically using
multiplication. Very high radix division algorithms are
similar to digit recurrence algorithms, but they incorporate
multiplication to reduce the latency. Variable latency
division algorithms reduce the average latency to form the
quotient. These algorithms are explained and compared in this
work. It is found that for low-cost implementations where
chip area must be minimized, digit recurrence algorithms are
suitable. An implementation of division by functional
iteration can provide the lowest latency for typical
multiplier latencies. Variable latency algorithms show
promise for simultaneously minimizing average latency while
also minimizing area.
http://i.stanford.edu/pub/cstr/reports/csl/tr/95/675/CSL-TR-95-675.pdf