Digital Signal Processing (DSP) Numerical Fidelity

June 8th, 2018 by

Digital signal processing (DSP) architectures are built to meet accurate, predictable real-time processing requirements, where a filter's output is computed from the input data and the filter coefficients. DSP therefore relies heavily on arithmetic operations such as addition and multiplication, and the numerical fidelity of those operations must be maximized for the system to perform well.


To achieve this, the errors introduced by using a finite number of bits for number representation and arithmetic must be minimized. DSPs address this through the choice of numeric representation and through dedicated hardware features.

DSPs represent numbers in one of two ways: fixed point or floating point.

Fixed-point DSPs perform integer and fractional arithmetic, typically on 16-, 24-, or 32-bit data, and support both signed and unsigned integers and fractions. The fractions are values in the range -1.0 to 1.0. Compared with floating-point DSPs, fixed-point DSPs can be implemented in hardware at lower cost, with less silicon and lower energy consumption. They also support faster clock rates.
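To make the fractional format concrete, here is a minimal sketch of 16-bit signed fractional arithmetic in Python. It assumes the common Q15 layout (one sign bit, 15 fractional bits), which the text above does not name explicitly; the helper names `float_to_q15`, `q15_to_float`, and `q15_mul` are hypothetical and chosen only for illustration.

```python
# Assumption: Q15 format, i.e. a 16-bit signed integer n represents n / 32768.
Q15_SCALE = 1 << 15
Q15_MIN, Q15_MAX = -32768, 32767


def float_to_q15(x):
    """Quantize a real value to the nearest Q15 integer, clamped to 16 bits."""
    n = round(x * Q15_SCALE)
    return max(Q15_MIN, min(Q15_MAX, n))


def q15_to_float(n):
    """Convert a Q15 integer back to a real value."""
    return n / Q15_SCALE


def q15_mul(a, b):
    """Multiply two Q15 values.

    The raw product has 30 fractional bits, so it is rounded and shifted
    right by 15 to return to Q15, then clamped to the 16-bit range.
    """
    product = a * b + (1 << 14)   # add half an LSB so the shift rounds
    return max(Q15_MIN, min(Q15_MAX, product >> 15))


half = float_to_q15(0.5)                   # 16384
quarter = q15_mul(half, half)              # 8192
print(half, quarter, q15_to_float(quarter))   # 16384 8192 0.25
print(float_to_q15(1.0))                   # 32767: +1.0 itself is not representable
```

Note that in this layout the largest representable fraction is just under 1.0, while -1.0 is representable exactly.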

Floating-point DSPs represent numbers with an exponent and a mantissa. The exponent controls the dynamic range, while the mantissa controls the precision. Numbers are scaled to use the available word length and maximize the attainable precision. Floating-point numbers offer a high dynamic range, which is valuable when handling large data sets or data sets with no defined or predictable range. However, the numbers are not evenly spaced; the gap between adjacent numbers depends on their magnitude. Larger numbers are separated by larger gaps, and smaller numbers by smaller ones.
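The uneven spacing can be observed directly. The sketch below uses Python's built-in 64-bit floats as a stand-in for a DSP's floating-point format (an assumption; actual DSPs often use 32-bit or vendor-specific formats) and relies on `math.ulp`, which requires Python 3.9 or later. The helper names `split` and `gap_after` are hypothetical.

```python
import math


def split(x):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent."""
    return math.frexp(x)   # mantissa magnitude lies in [0.5, 1) for nonzero x


def gap_after(x):
    """Spacing between x and the neighbouring representable value."""
    return math.ulp(x)


print(split(8.0))                            # (0.5, 4)
print(gap_after(1.0))                        # gap near 1.0
print(gap_after(1024.0))                     # gap near 1024.0 is larger
print(gap_after(1024.0) / gap_after(1.0))    # 1024.0

# At 2**53 the gap has grown to 2.0, so adding 1.0 changes nothing.
print(2.0**53 + 1.0 == 2.0**53)              # True
```

The gap scales with the exponent while the number of mantissa bits stays fixed, which is why relative precision is roughly constant and absolute precision is not.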


DSP dedicated hardware features for improving numerical fidelity include:

Large accumulator registers hold the intermediate and final results of arithmetic operations. They are at least four bits wider than normal registers to prevent overflow during accumulation. The extra bits, called guard bits, let intermediate computation steps retain higher precision.

Flags indicate when an overflow or underflow occurs. They are usually tied to interrupts so that exception-handling routines can be called.

Saturated arithmetic clamps a result that falls outside the representable range to the maximum (or minimum) value that can be represented. Saturation prevents the wrap-around phenomenon, in which an overflowing value reappears with the wrong sign and magnitude.
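The three features above can be illustrated together with a small software model. This is only a sketch under stated assumptions: 16-bit Q15 operands, a 32-bit product, and a 40-bit accumulator (8 guard bits, a width chosen here for illustration). The function names `wrap`, `saturate`, and `mac` are hypothetical and do not correspond to any particular DSP's instruction set.

```python
def wrap(value, bits):
    """Reduce value to a two's-complement integer of the given width.

    This models what a plain register does on overflow: it wraps around.
    """
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit set: value is negative
        value -= 1 << bits
    return value


def saturate(value, bits):
    """Clamp value to the signed range of the given width.

    Returns (result, overflow_flag); the flag models a status bit that
    hardware could route to an interrupt.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def mac(samples, coeffs, acc_bits):
    """Multiply-accumulate in an accumulator that is acc_bits wide."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = wrap(acc + x * h, acc_bits)
    return acc


samples = [32767] * 4    # four full-scale Q15 samples
coeffs = [32767] * 4     # four full-scale Q15 coefficients

print(mac(samples, coeffs, 32))   # -262140: a 32-bit accumulator wraps around
wide = mac(samples, coeffs, 40)
print(wide)                       # 4294705156: the guard bits keep the exact sum
print(saturate(wide, 32))         # (2147483647, True): clamped, flag raised
```

With the narrower accumulator the sum of four positive products comes out negative, whereas the guard bits preserve the exact intermediate result and saturation then limits the final value to the largest 32-bit number.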

Organizations can optimize the numerical fidelity of their DSP architecture by using the above strategies for number representation and dedicated hardware features.


