 |
|
Application to Audio Coding
Texas Instruments Audio and Video/Imaging Series
|
The Odd Discrete Fourier Transform (ODFT) is used in many modern audio codecs which employ the Modified Discrete Cosine Transform (MDCT) for efficient signal representation. The ODFT is used implicitly in the computation of MDCT and also explicitly for the purpose of Psychoacoustic modeling. These codecs typically operate on one or more audio channel pairs. In the most prevalent stereo coding case, one channel pair consisting of Left and Right channel data is processed. Computation of the ODFT and MDCT of the two channels is responsible for a large fraction of the total complexity of the audio codec encoder. Here we present a fast algorithm for the ODFT computation of two channels. The proposed algorithm utilizes a special form of odd frequency symmetry in ODFT representation. The implication of odd frequency symmetry for the purpose of designing fast algorithms is not well understood in the literature. We present a formulation which allows for a complexity reduction by a factor of two in the computation of the ODFT of the two channels.
Introduction
In many applications one needs to compute a so called Odd Frequency Discrete Fourier Transform (ODFT) of a signal. For a sequence {x(n),n=0,...,N-1} its ODFT is defined as:
(1)
The ODFT differs from the conventional Discrete Fourier Transform (DFT) in the sense that frequency bins in the ODFT are offset by one-half frequency bin in comparison to the DFT. In other words ODFT analysis splits the frequency axis between 0 and 2p into N equally spaced frequency bins, each of width 2p/N. In contrast, in conventional DFT analysis, the first frequency bin is centered at the frequency f=0 and the remaining N-1 bins of width 2p/N are stacked starting from f = 2p/N. This offset in frequency bins results from the terms (k+1/2) in the exponent in Equation 1 (as opposed to k in conventional DFT). The bin stacking in ODFT is commonly referred to as odd stacking while the stacking in DFT is referred to as even stacking.
The ODFT is useful in various signal processing operations and particularly in the context of audio codecs. Many modern audio codecssuch as AC-3 [4], AAC [1], PAC [8], MP3 [2], and so onemploy the so called Modified Discrete Cosine Transformation (MDCT) for efficient signal representation (in other words, as the so called coding filterbank). As we will see in the next section, the MDCT uses the odd stacking of frequency bins similar to ODFT. The MDCT is sometimes computed with the help of ODFT. This is specially the case when both ODFT and MDCT analysis need to be performed on a given data (see next section). Also in audio codec applications, ODFT (and/or MDCT) analysis of two channel data is typically necessary (in other words, the Left and Right audio channels). This analysis is performed on a frame by frame basis (for example, a frame length of 1024 audio samples in AAC). ODFT has also been employed for other audio processing algorithms such as time scale modification [7].
As is well known in the signal processing world, the computation of DFT is done with the help of a fast algorithm called the Fast Fourier Transform (FFT). Several implementations of FFT are known, in particular an algorithm developed by Cooley and Tukey [3]. A variation called Split Radix FFT (SRFFT) is known to have a lower complexity than the Cooley-Tukey algorithm and is increasingly popular. When the DFT analysis of real data is involved, it is well known that substantial reduction in computational complexity can be achieved by utilizing the frequency symmetry of the DFT for real data. One such optimization for real data, for example for the SRFFT algorithm, called RSRFFT, is described in [9].
Optimization of ODFT for real data on the other hand is not well understood or well known. As described in the next section, ODFT can be computed using an FFT in a two step procedure whereby the input data is first pre-multiplied by a complex exponential function and subsequently an FFT on this complex data is performed. The problem in optimizing this algorithm for real data results from the pre-multiplication step which renders the data complex thus eliminating the symmetry that would have existed in the DFT of data (if it was real).
Here we describe a fast algorithm for ODFT when the application requires the computation of the transformation for two channels of real data. In this we show that by utilizing a special symmetry that exists in the ODFT domain (called odd frequency symmetry as opposed to even frequency symmetry in the DFT case), it is possible to compute the two transforms (for the two channels) with the help of a single complex transform (of equal length) and minimal post-modification. Thus we obtain a substantial reduction in complexity.
The DFT, ODFT, and, FFT
The DFT is widely used in signal processing and transforms a sequence of N complex numbers {x(n),n=0,...,N-1} into a sequence of N complex numbers {x(k),k=0,...,N-1} using the following transform:
(2)
If {x(n),n=0,...,N-1} is a sequence of real numbers, the DFT possesses a symmetry called even frequency symmetry as shown below:
X(k) = X * (N-k) (3)
where (*) represents the operation of complex conjugation.
The ODFT on the other hand as defined by Equation 1 has a different type of symmetry for real data. This form of symmetry, called the odd frequency symmetry, is described as:
X(k) = X * (N-k-1) (4)
The ODFT can be further expressed as
(5)
From Equation 5 it becomes apparent that the ODFT of sequence {x(n),n=0,...,N-1} is equivalent to the DFT of sequence {x'(n),n=0,...,N-1} (where x'(n) is as defined above).
The computation of a length N DFT requires N2 multiplications and approximately the same number of additions. In 1965, Cooley and Tukey [3] presented an FFT algorithm for DFT computation which requires a computational effort proportional to N log2N for N = 2m. Since then many FFT algorithms have been developed for DFT computations. One of these, for N = 2m, is the Split Radix FFT (SRFFT) [9], a mixed-radix approach, where the frequency index k is split into three subsets as demonstrated in Equations 6, 7, and 8:
(6)
(7)
(8)
The computational complexity of SRFFT for complex data [5] is given by:
(9)
where µc(N) and ac(N) are respectively the total number of real multiplications and real additions necessary to compute a length N complex DFT via the SRFFT.
The Modified Discrete Cosine Transform (MDCT)
The MDCT was first introduced in [11] and then further developed in [12]. MDCT is a linear orthogonal lapped transform, which is based on the idea of time domain aliasing cancellation (TDAC). It is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications (especially quantization), since it helps to avoid artifacts which arise due to the block boundaries. Compared to other Fourier-related transforms the MDCT is a bit different since it has half as many outputs as inputs (due to the overlapping). It is a linear function F :
2N →
N (where
is the set of real numbers) transforming 2N real numbers {x(n),n=0,...,2N-1} into N real numbers {X(k),k=0,...,N-1} according to:
(10)
The computation of MDCT using the above formula requires Θ(N2) operations, although it is possible to achieve Θ(N log2N) complexity by recursively factorizing the computation. Other ways to compute this is via other transforms such as DFT or a DCT combined with Θ(N) pre- and post-processing steps.
Audio Compression Applications
High quality audio require large amounts of data. For example, an audio disc (CD) stores a stereo audio signal comprised of two 16-bit data words sampled at 44.1 kHz; thus about 1.41 million bits/second of audio data are output from the player. Audio compression algorithms or audio codecs are designed to decrease the data rate of audio signals in order to reduce the size of audio files for various applications. Many codecs use a psychoacoustical model of the human auditory system (which exploits the perceptual redundancies in audio data) to reduce the amount of data needed to represent an audio signal by identifying imperceptible signal content. The applications of audio compression include distribution of streaming audio (real-time streaming) or interactive applications such as coding of speech for digital transmission over cell phone networks, handheld devices, and so on.
In order to achieve audio data compression there exist many different algorithms [10]. Examples include the MPEG-1 Layer I, II, and III, Dolby's AC-3, MPEG-2 AAC, MP3, PAC, and WMA. Some of the differences between these are in the way these use the human audio perceptual model, the type of sub-band coding, and some special tricks such as pre-echo detection to handle certain artifacts. The MDCT (or a hybrid representation with a MDCT stage) is a filterbank of choice in many of these audio codecs. These codecs typically operate on one or more audio channel pairs. For example in the most prevalent stereo coding case, one channel pair consisting of Left and Right channels is coded. The output of MDCT filterbank is quantized using quantizer steps derived from the Psychoaocutic model. Computation of Pyschoacoustic model also requires spectrum analysis, therefore, in codecs like AAC and PAC both an ODFT and MDCT is computed in parallel. A combined algorithm that generates both MDCT and ODFT in an integrated module is sometimes employed.
Motivations and Objectives for Developing a Fast Algorithm for Audio Codecs
The usability of codecs is determined by many factors including: the compression factor, the perceived audio quality, the speed of compression (and decompression), and the hardware/software support. For real-time applications on consumer devices the complexity of the algorithm becomes one of the critical factors in determining its usability. In some algorithmssuch as AAC, MP3, and PACa large number of samples have to be analyzed in order to implement a psychoacoustic model and/or achieve higher coding gain in the frequency domain. Thus, it is important to have fast implementation and optimized structure for the codec. Implementation of various audio encoders on TI platforms such as the C6713 DSP and decoders on platforms such as the C55xx/C64xx DSP families has been utilized. The complexity for performing two channel forward and/or backward transforms is a big fraction of the complexity of a typical audio codec. Therefore a substantial speed up in the transform algorithm increases the usability of these algorithms.
Simultaneous Computation of ODFT and MDCT
If applications that require both ODFT and MDCT for a frame of date (in other words, audio codec encoders that use ODFT for Psychoacoustic Modeling and MDCT for signal representation), it has been proposed [6] that the MDCT as defined in Equation 10 can be computed as:
(11)
In Equation 11 it is apparent that the inside summation is a length 2N ODFT of the sequence {x(n),n=0,...,2N-1}. Hence the above formulation allows for the computation of a length 2N ODFT (to be used, for example, in psychoacoutic modeling) and a length N MDCT in an integrated framework.
Proposed Fast Algorithm for the Computation of 2-Channel ODFT
We address the problem of fast computation of the ODFT of two real sequences (stereo channels),
xL(n) and xR(n) of length N. The ODFT for the two channels are defined respectively as:
(12)
and
(13)
Computation of these transforms via FFT using the formulation in Equation 5 requires two complex transforms of length N each. Here we investigate the possibility of computing the two transforms in Equations 12) and 13 using a single complex transform of length N.
As noted above for real sequences ODFT has the odd frequency symmetry of the form
X(k) = X * (N-k-1) (14)
This somewhat unusual form of symmetry for ODFT (in comparison to conventional even frequency symmetry in DFT) has led to a misconception that this symmetry is not particularly useful for the sake of fast algorithms. Here we show how it can be advantageously utilized to compute the ODFT transformation of two channels simultaneously using a single complex transform of length N. For this let's define:
(15)
and
(16)
Then we have (from the odd frequency symmetry property):
(17)
(18)
The transform of the two channels together is performed by defining a complex sequence:
(19)
Then the ODFT of y(n), Y(k) is given by (using the linearity of ODFT operation):
(20)
This implies that
(21)
Next we use the symmetry properties of XL(k) and XR(k) in Equations 17 and 18 to form Y(N-k-1) as
(22)
We therefore have
(23)
Solving the above four equations, we have:
(24)
(25)
(26)
(27)
Thus, the ODFT of two channels xL(n) and xR(n) can be computed using the following steps:
- Form an N point complex sequence
(n) = xL(n) + jxR(n)
- Pre-multiply
(n) by a complex exponent to form 
- Compute an N point complex DFT of y(n), Y(k), using (for example) SRFFT
- Compute XL(k) and XR(k) using the relationships in Equations 24-27
It may be noted that the operations in step 4 are purely addition operations, since the factor of 1/2 may be absorbed in an overall gain factor. Hence each of the equations in this step require N/2 additions.
Complexity Analysis of the Proposed Algorithm
As noted above the DFT of real data allows for significant complexity reduction since by using the symmetry of DFT one can perform the transformation using FFT of a half length, (see [9]). The ODFT however involves the computation of the DFT of a complex sequence (Equation 5), the conventional approaches to reducing the complexity of the transform (which rely on x(n) being a real sequence) are not valid anymore and one is forced to perform two transforms of full length. Below we analyze the complexity using this conventional approach to computing two channel ODFT and the complexity using the proposed algorithm.
Complexity using the Traditional Approach
Using the conventional approach, the following operations are necessary:
- Pre-multiplication for each of the two channels Left and Right requiring a total of 2N multiplications.
- Two FFTs. Assuming each is a mixed radix FFT the number of operations can be estimated as in [9].
The total operation is given by
(28)
where µc(N,2) and ac(N,2) are respectively the total number of real multiplications and real additions necessary to compute the two-channel ODFT using a conventional approach.
Complexity using the Proposed Algorithm
Using the proposed algorithm requires the following operations:
- 4N multiplications for pre-multiplication step.
- Operations related to one N length complex FFT.
- 4(N/2) = 2N additions for the post twiddle step.
Once again, assuming that the FFT is a fast split radix FFT as in [9], the total number of operations can be counted as:
(29)
where µp(N,2) and ap(N,2) are respectively the total number of real multiplications and real additions in the proposed algorithm. Comparing Equation 29 with Equation 28 clearly illustrates the advantages of the proposed approach.
Conclusions, Discussions, and Directions
This paper presented a fast algorithm to compute ODFT (and hence MDCT) for two channel applications. The purpose of this is to improve and substantially reduce computation requirements for modern audio codecs (particularly the encoder). The proposed algorithm offers a speed of about two times in the computation of two channel ODFT transformations.
About the Authors
Neelu Sinha is currently an Assistant Professor in the Department of
Computer Science at Fairleigh Dickinson University in Madison, NJ. She received her MS and PhD in Electrical Engineering from Iowa State University in Ames, IA. She was previously at Control Data Corporation in Plymouth, MN and the Bell Laboratories in Whippany, NJ. Her current areas of interest are digital image and audio watermarking, information security, multimedia
database indexing and content based retrieval,
distance learning and Web assisted online course offerings.
Dr. Anibal Ferreira is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Porto in Portugal. He has been teaching courses in Signal Theory, Telecommunication Systems, Digital Signal Processing, Multimedia Communication Technologies and DSP Simulations. In 1990/1991 and in 1993, Dr. Ferreira joined the Signal Processing Research Department at AT&T Bell Laboratories (in Murray Hill, NJ) as a consultant, where he co-authored the development and optimization of the first version of the AT&T Perceptual Audio Codec (PAC). More recently, Dr. Ferreira has joined ATC Labs in June 2004 as Chief Technology Officer. He graduated in Electrical and Computer Engineering from University of Porto, Portugal, in 1988, and received his M.Sc. and Ph.D. degrees in 1992 and 1998 respectively. Dr. Ferreira is a recipient of the "Portuguese 1998 IBM Scientific Prize" that distinguishes relevant research work on Applied Computing. His interests include psychoacoustics, audio analysis and coding, multirate filter banks and algorithms for real-time digital audio applications. He is a member of AES, IEEE and ASA, and participates in several technical committees.
References