Complex-valued arithmetic boosts audio DSP applications for automotive infotainment



Courtesy of Embedded.com

Many new embedded consumer applications, such as computer audio and video reproduction, home theatre, electronic musical instruments, and car infotainment, require higher-quality digital audio. New formats, more demanding than the classic audio CD, are emerging in these mass markets to meet user demand for better sound quality at lower cost.

Meeting the high performance requirements of such mainstream audio applications requires moving beyond the approaches traditionally employed to reduce the computational burden and to force some algorithms into the real domain.

Today, audio can be modelled better using complex-domain representations such as the STFT, the Gabor transform or complex wavelets, especially on an advanced multicore DSP/RISC system-on-a-chip design (Figure 1, below) that natively supports floating-point arithmetic in the complex domain and an extended 32-bit mantissa, offering greater numeric precision without scaling problems.

Typical of the audio formats that can be addressed with such approaches is Dolby Digital compression (used by most DVD-Videos, DTV, satellite and cable systems, and also supported by the new PC audio interfaces), a lossy format providing up to 5.1 discrete channels of information with compression ratios ranging from 3.4:1 to 15:1.

After decoding, Dolby's discrete channels emerge for left, right, left rear, right rear and center channels, with a separate Low Frequency Effects (bass) channel for added high-impact effects. Dolby Digital uses a bit pool, which assigns bits to channels as needed (quantization varies between 16, 18, 20 and 24 bits/sample) rather than maintaining a fixed number of bits per channel.

Another is DTS (Digital Theatre Systems), a lossy audio format capable of delivering up to 6.1 discrete channels of audio in 24-bit precision. It is less compressed than Dolby Digital, since it compresses the seven channels independently with a 3:1 ratio, removing two-thirds of the redundant PCM data and multiplexing the remaining one-third into a single bitstream. More complex schemes have also been proposed, such as THX Ultra2 Cinema Mode and THX MusicMode.

For high-quality audio-only material, SuperAudioCD has been proposed: it can reproduce up to 5.1 high-quality channels using the DST (Direct Stream Transfer) system. Another high-end audio format is the DVD-Audio disc, which allocates most of its space to audio. It reproduces better-than-CD audio recordings using higher sampling rates (88.2, 96, and in some cases 192 kHz, compared with the 44.1 kHz of regular CDs) and higher resolution (20 to 24 bits, compared with the 16 bits of CDs), in stereo or in schemes of up to 6 full channels.

To deal with such a high volume of data without reducing the quality, DVD-Audio can employ some form of lossless compression, such as MLP (Meridian Lossless Packing), a data packing method that provides a variable, signal-dependent compression ratio averaging only about 2:1 but capable of exactly reconstructing the original audio stream.
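As a rough illustration of the data volume involved, the sketch below computes raw PCM bit rates. The 6-channel, 96 kHz, 24-bit configuration is just one illustrative choice from the ranges quoted above, and the helper name `pcm_bit_rate` is hypothetical. The 2:1 figure is only the average ratio cited for MLP, so the packed rate is an estimate, not a guaranteed value.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Audio CD: stereo, 16 bits, 44.1 kHz.
cd_rate = pcm_bit_rate(44_100, 16, 2)

# Illustrative DVD-Audio configuration (assumption): 6 channels, 24 bits, 96 kHz.
dvd_audio_rate = pcm_bit_rate(96_000, 24, 6)

# Estimate after lossless packing at the average 2:1 ratio quoted for MLP.
packed_estimate = dvd_audio_rate / 2.0

print(f"CD:               {cd_rate / 1e6:.3f} Mbit/s")
print(f"DVD-Audio (raw):  {dvd_audio_rate / 1e6:.3f} Mbit/s")
print(f"DVD-Audio (~2:1): {packed_estimate / 1e6:.3f} Mbit/s")
```

Under these assumptions the raw multichannel stream is nearly ten times the CD rate, which is why even a modest lossless ratio matters.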

A common claim of all the new formats is that they overcome the limitations of the audio CD standard, which merely uses a stereo stream of uncompressed 16-bit PCM data sampled at 44.1 kHz. Better audio quality is achieved not only by increasing the number of channels but also by using more quantization bits (up to 24 per sample), higher sampling frequencies (up to 192 kHz) and more complex processing algorithms. Moreover, post-processing systems that actively deal with the acoustic characteristics of the room where the audio is reproduced (such as Room Processors, Adaptable Speaker Array, etc.) are becoming practical choices.

In such scenarios, specialized high-performance DSP processors are essential to guarantee the required quality during all the processing stages needed for compressing, decompressing and processing high-precision, high-sampling-rate audio material. Besides the classical audio algorithms, such processors make it possible to exploit more powerful and innovative algorithms, typically multichannel based, for new speech and music applications.

Audio applications employing complex arithmetic
A large and growing number of audio applications make use, to a greater or lesser extent, of complex-valued arithmetic, including:

(1) Fast and accurate filtering of audio streams
This covers the classical family of FFT-based fast convolution algorithms, including new very-low-latency techniques.
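A minimal sketch of the basic idea behind FFT-based fast convolution, using only the Python standard library: both sequences are zero-padded, transformed, multiplied bin by bin as complex numbers, and transformed back. The function names are hypothetical, the recursive radix-2 FFT is a teaching aid rather than an optimized kernel, and the low-latency partitioned variants mentioned above are not shown.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fast_convolve(signal, taps):
    """Linear convolution computed as a product of zero-padded spectra."""
    out_len = len(signal) + len(taps) - 1
    size = 1
    while size < out_len:
        size *= 2
    spec_a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    spec_b = fft([complex(v) for v in taps] + [0j] * (size - len(taps)))
    product = [a * b for a, b in zip(spec_a, spec_b)]  # complex multiplies
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]


def direct_convolve(signal, taps):
    """Reference time-domain convolution, for comparison only."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out
```

The two routines should agree to within floating-point rounding. The frequency-domain route pays off once the filter becomes long, because its cost grows as N log N rather than with the product of the two lengths.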

(2) Audio encoding/decoding
The number of encoding/decoding schemes for speech and audio signals is very large. Recently, much research has been devoted to new techniques for encoding the latest multichannel high-quality formats. Many of these encoding/decoding algorithms work in the frequency domain and require complex-valued computations.
Even for common transform-based compression schemes (mostly based on the widely used real-valued DCT), complex-valued implementations have been proposed, for example for the fast computation of the IDCT.

(3) Acoustic Echo cancellation
The problem of acoustic echo cancellation (AEC) in hands-free devices or audio conference systems is mainly an identification and filtering problem. Because of the increasing demand for wideband speech quality, the sampling frequency of many hands-free systems is now high, and therefore very long FIR filters have to be identified to model the impulse responses of typical office rooms.
The computational burden of the AEC in these cases can easily become very high, especially in stereo devices, and often calls for efficient complex-valued computations such as those in the block or partitioned-block frequency-domain adaptation schemes (e.g. generalizations of the simplest fast LMS). More advanced approaches make direct use of time-frequency analyses that can easily require complex-valued operations; see for example the use of Gabor transforms in stereo AEC devices.
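The simplest fast LMS can be sketched as an overlap-save, block frequency-domain adaptive filter. The version below is a minimal single-channel illustration, not a production AEC: the function names, block length and step size `mu` are assumptions chosen for the example, and there is no power normalization, double-talk detection or partitioning.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Normalized inverse FFT."""
    return [v / len(x) for v in fft(x, inverse=True)]


def fast_lms(far_end, mic, block=8, mu=0.02):
    """Identify `block` echo-path taps; `block` must be a power of two."""
    weights = [0j] * (2 * block)   # filter held in the frequency domain
    previous = [0.0] * block       # previous far-end block (overlap-save)
    for start in range(0, len(far_end) - block + 1, block):
        current = far_end[start:start + block]
        spectrum = fft([complex(v) for v in previous + current])
        # Filtering is a complex product; only the last `block` outputs are valid.
        echo = ifft([s * w for s, w in zip(spectrum, weights)])[block:]
        error = [mic[start + i] - echo[i].real for i in range(block)]
        err_spec = fft([0j] * block + [complex(e) for e in error])
        # Correlate input with error, then keep the first `block` lags
        # (the gradient constraint of the fast LMS).
        grad = ifft([s.conjugate() * e for s, e in zip(spectrum, err_spec)])[:block]
        grad_spec = fft(grad + [0j] * block)
        weights = [w + mu * g for w, g in zip(weights, grad_spec)]
        previous = current
    return [v.real for v in ifft(weights)[:block]]
```

With a noise-free microphone signal and a far-end signal bounded by 1 in magnitude, the returned taps should approach the true echo path. Real rooms need far longer filters, which is where the partitioned-block schemes come in.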

(4) Noise reduction
Common companions of AEC devices are noise reduction algorithms, for example those based on spectral subtraction. Most techniques work in the frequency domain and can require complex-valued computations, taking advantage of tight integration with the AEC itself to reduce complexity. Recently, wavelet-based techniques have also been proposed.
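The spectral subtraction idea can be sketched on a single frame as follows: the magnitude of each complex bin is reduced by a noise estimate while its phase is kept. This is a bare-bones illustration. The function name, the `floor` parameter and the assumption that a per-bin noise magnitude estimate is already available are all choices made for the example, and windowing and overlap-add are omitted.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract a per-bin noise magnitude from one frame, keeping the noisy phase.

    `noise_mag` must have one entry per FFT bin and be symmetric around the
    Nyquist bin so that the output stays real.
    """
    spectrum = fft([complex(v) for v in frame])
    cleaned = []
    for value, noise in zip(spectrum, noise_mag):
        mag = abs(value)
        target = max(mag - noise, floor * mag)  # clamp at the spectral floor
        # Rescale the complex bin: magnitude changes, phase is preserved.
        cleaned.append(value * (target / mag) if mag > 0.0 else 0j)
    restored = fft(cleaned, inverse=True)
    return [v.real / len(frame) for v in restored]
```

A non-zero `floor` is a common way to limit the "musical noise" that hard clamping to zero tends to produce.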

(5) Audio conditioning (3D)
New techniques are emerging to accurately reproduce audio scenes or to modify the actual listening environment. When these techniques are applied to quality audio, the required high sampling frequencies can make the modelling filters very long. Therefore cross-talk reduction and 3D filtering are often implemented using efficient complex-valued frequency-domain techniques to reduce computational cost and enhance performance. Subband-based techniques are also used in these applications.

(6) Microphone array processing
Effective static or adaptive beamforming algorithms are particularly difficult to realize with microphone arrays because of the characteristics of the problem (wideband signals, highly reverberant environments). Many different proposals have been made to improve performance in real-world arrays; they can easily require heavy computations implemented in the complex domain or rely on time-frequency analysis/synthesis techniques such as subband approaches.

(7) Spatial localization
Both binaural and multisensor approaches to spatial localization of sound can involve complex-domain computations. A typical case is the self-aiming video camera employed in many new teleconferencing devices. The problem of automatically localizing the active talker is often mapped onto the problem of repeated time-delay estimation between pairs of microphones drawn from a suitable (small) microphone array. Both complex-valued full-band FFT computations and subband approaches have been proposed to estimate these delays robustly.
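One full-band FFT approach to time-delay estimation between a pair of microphones can be sketched with a cross-spectrum. The optional phase-transform (PHAT) weighting is a widely used robustness measure chosen here as an assumption; the article does not say which weighting the cited proposals use. The function name is hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_delay(mic_a, mic_b, phat=True):
    """Return the lag in samples by which mic_b trails mic_a (negative if it leads)."""
    size = 1
    while size < len(mic_a) + len(mic_b):
        size *= 2  # zero-pad so circular correlation equals linear correlation
    spec_a = fft([complex(v) for v in mic_a] + [0j] * (size - len(mic_a)))
    spec_b = fft([complex(v) for v in mic_b] + [0j] * (size - len(mic_b)))
    # Cross-spectrum: one complex multiply per frequency bin.
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    if phat:
        # Keep only the phase of each bin (assumed weighting, see lead-in).
        cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = fft(cross, inverse=True)
    peak = max(range(size), key=lambda k: corr[k].real)
    # Indices in the upper half of the buffer represent negative lags.
    return peak if peak < size // 2 else peak - size
```

Given the sampling rate and the microphone spacing, the estimated lag for each pair can then be converted into a direction of arrival for steering the camera.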

(8) Bandwidth enhancement
In applications such as speech transmission, telephony and digital radio, bandwidth enhancers are becoming more and more appealing. These non-linear devices try to reconstruct missing parts of the spectrum from the parts already known, e.g. to obtain a wideband speech signal (16 kHz) from its narrowband counterpart (8 kHz) by predicting the missing spectral information. Such devices often employ complex-valued non-linear algorithms such as complex neural networks or other frequency-domain techniques.


