Typical characteristics
Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repetitively on a set of data. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted again to analog form, as diagrammed below. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
A simple digital processing system
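The analog-in, process, analog-out flow above can be sketched in C. The `read_sample`/`write_sample` hooks below are hypothetical stand-ins for ADC/DAC drivers, simulated with arrays so the sketch is self-contained:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ADC/DAC hooks, simulated with arrays so the sketch
   compiles anywhere; a real system would talk to converter hardware. */
static const int16_t adc_samples[4] = {100, -200, 300, -400};
static int16_t dac_samples[4];
static size_t adc_pos, dac_pos;

static int16_t read_sample(void)      { return adc_samples[adc_pos++]; }
static void    write_sample(int16_t s){ dac_samples[dac_pos++] = s; }

/* The "manipulated digitally" stage: here just a gain of 2. */
static int16_t process(int16_t s) { return (int16_t)(s * 2); }

/* Run the analog-in -> digital processing -> analog-out loop.
   The latency constraint means each sample must finish processing
   before the next one arrives from the converter. */
static void run_pipeline(void) {
    for (size_t i = 0; i < 4; i++)
        write_sample(process(read_sample()));
}
```

In a real-time system the loop body has a hard deadline of one sample period, which is why deferred or batch processing is not an option.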
Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power-supply and space constraints. A specialized digital signal processor, however, tends to provide a lower-cost solution, with better performance, lower latency, and no requirement for specialized cooling or large batteries. The architecture of a digital signal processor is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.

Architecture
By the standards of general-purpose processors, DSP instruction sets are often highly irregular. One implication for software architecture is that hand-optimized assembly is commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms. Hardware features visible through DSP instruction sets commonly include:
•Hardware modulo addressing, allowing circular buffers to be implemented without having to constantly test for wrapping
•A memory architecture designed for streaming data, using DMA extensively and expecting code to be written to know about cache hierarchies and the associated delays
•Memory architectures that support several accesses per instruction cycle, so that multiple arithmetic units can be kept busy
•Separate program and data memories (Harvard architecture), and sometimes concurrent access on multiple data buses
•Special SIMD (single instruction, multiple data) operations
•VLIW techniques, so that each instruction drives multiple arithmetic units in parallel
•Special arithmetic operations, such as fast multiply–accumulates (MACs); many fundamental DSP algorithms, such as FIR filters and the fast Fourier transform (FFT), depend heavily on multiply–accumulate performance
•Bit-reversed addressing, a special addressing mode useful for calculating FFTs
•Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing
•Deliberate exclusion of a memory management unit; DSPs frequently use multi-tasking operating systems, but have no support for virtual memory or memory protection, since operating systems that use virtual memory require more time for context switching among processes, which increases latency

Program flow
•Floating-point unit integrated directly into the datapath
•Pipelined architecture
•Highly parallel multiplier–accumulators (MAC units)
•Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations

Memory architecture
•DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time:
oHarvard architecture
oModified von Neumann architecture
•Use of direct memory access
•Memory-address calculation unit
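On a general-purpose CPU, the pointer-wrapping that a DSP's address-calculation unit performs automatically (modulo addressing) has to be written out by hand. A sketch of a circular delay line driving an FIR filter's multiply–accumulate loop, with illustrative coefficient values:

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS 4

/* Circular delay line: on a DSP, modulo addressing in the
   address-calculation unit would wrap the index automatically;
   here every wrap is an explicit test. */
static int32_t delay[TAPS];
static size_t pos;

/* Illustrative coefficients only (a smoothing-style 4-tap filter). */
static const int32_t coeff[TAPS] = {1, 2, 2, 1};

/* Push one input sample and compute one FIR output sample. */
static int32_t fir_step(int32_t x) {
    delay[pos] = x;
    int32_t acc = 0;                          /* the MAC accumulator */
    size_t idx = pos;
    for (size_t t = 0; t < TAPS; t++) {
        acc += coeff[t] * delay[idx];         /* one multiply-accumulate */
        idx = (idx == 0) ? TAPS - 1 : idx - 1; /* manual modulo wrap */
    }
    pos = (pos + 1) % TAPS;                   /* advance write position */
    return acc;
}
```

Feeding this filter a unit impulse returns the coefficients 1, 2, 2, 1 one per step, which is a quick way to check that the circular indexing is correct.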
 Data operations
•Saturation arithmetic, in which operations that produce overflows accumulate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum + 1 doesn't overflow to the minimum as in many general-purpose CPUs; instead it stays at the maximum)
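A DSP performs this clamping in hardware at no extra cost; as a sketch of the behavior, 16-bit saturating addition can be emulated in C like this:

```c
#include <stdint.h>

/* 16-bit saturating add: the result clamps at INT16_MAX / INT16_MIN
   instead of wrapping around as ordinary two's-complement math does. */
static int16_t sat_add16(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;  /* widen so the true sum fits */
    if (sum > INT16_MAX) return INT16_MAX;  /* clamp positive overflow */
    if (sum < INT16_MIN) return INT16_MIN;  /* clamp negative overflow */
    return (int16_t)sum;
}
```

For audio, clamping at full scale produces mild clipping distortion, whereas wraparound would jump from the loudest positive value to the loudest negative one, which sounds far worse.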