Linear Predictive Spectral Shaping for Acoustical Echo Cancellation

M.Eng. Thesis, November 1995

Supervisor: P. Kabal

The purpose of this thesis is to study adaptive acoustical echo cancellation for signals with variable-rank covariance matrices. Solutions based on the least-mean-square (LMS) algorithm are presented, with a focus on discrete-cosine-transform (DCT) domain finite-impulse-response (FIR) filters.

In speech-related applications, the covariance matrix of the reference signal is often nearly singular, i.e., rank-deficient, which has the effect that some of the transform-domain tap coefficients stop adapting and effectively "freeze". During this low-rank phase, the frozen taps can retain any value without effect on the mean-square error (MSE), while the remaining taps track the evolution of the system and keep the MSE at a minimum.

When the covariance matrix becomes nonsingular, however, there are no longer any frozen coefficients, and a unique tap coefficient vector yields minimum MSE. The MSE abruptly "jumps", and convergence of the taps to the unique vector will take additional time due to the (obsolete) values of the previously frozen coefficients. To remedy the situation, one applies a method dubbed "spectral shaping".

The objective of spectral shaping is to replace, during the low-rank phase, each frozen coefficient by an estimate of the corresponding coefficient of the unique full-rank solution. This is achieved in the transform domain by a combination of forward and backward linear predictors. By using spectral shaping, the frozen coefficients are thus "prepared" to be unfrozen when the covariance matrix gains full rank, resulting in a reduced jump in the MSE.
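
The freezing of transform-domain taps under a rank-deficient reference can be illustrated with a minimal sketch of a power-normalized DCT-domain LMS update. This is not the algorithm of the thesis: the function names (`dct_matrix`, `tdlms_step`), the step size `mu`, the smoothing factor `beta`, the normalization rule, and the toy echo path are all assumptions chosen for illustration. With a constant reference signal only the first DCT bin carries energy, so the remaining taps keep their starting values while the error is still driven toward zero.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def tdlms_step(w, power, x_window, d, T, mu=0.1, beta=0.9, eps=1e-8):
    """One power-normalized LMS update in the DCT domain (illustrative).

    w        transform-domain taps, updated in place
    power    running per-bin power estimates, updated in place
    x_window the N most recent reference samples
    d        desired (echo) sample
    Returns the a priori error.
    """
    u = [sum(t * x for t, x in zip(row, x_window)) for row in T]
    e = d - sum(wk * uk for wk, uk in zip(w, u))
    for k, uk in enumerate(u):
        power[k] = beta * power[k] + (1.0 - beta) * uk * uk
        w[k] += mu * e * uk / (power[k] + eps)
    return e

# Rank-one reference: a constant signal excites only DCT bin 0.
N = 4
T = dct_matrix(N)
echo_path = [0.5, -0.2, 0.1, 0.05]   # hypothetical room response
w = [0.0, 0.5, 0.5, 0.5]             # taps 1..3 start at arbitrary values
power = [1.0] * N
x_window = [1.0] * N
d = sum(h * x for h, x in zip(echo_path, x_window))
for _ in range(300):
    err = tdlms_step(w, power, x_window, d, T)
```

After the loop, `w[0]` has adapted to cancel the echo while `w[1]`, `w[2]` and `w[3]` still hold their arbitrary initial value of 0.5. These are the obsolete values that would cause the MSE jump once the reference regains full rank.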

Predictive Split Vector Quantization for Speech Coding

M.Eng. Thesis, May 1994

Supervisors: W.-Y. Chan and P. Kabal

The purpose of this thesis is to examine techniques for efficiently coding speech Linear Predictive Coding (LPC) coefficients. Vector Quantization (VQ) is an efficient approach to encoding speech at low bit rates. However, its exponentially growing complexity poses a formidable barrier. Thus, a structured vector quantizer is normally used instead.

Summation Product Codes (SPC's) are a family of structured vector quantizers that circumvent the complexity obstacle. The performance of SPC vector quantizers can be traded off against their storage and encoding complexity. Besides the complexity factors, the design algorithm can also affect the performance of the quantizer. The conventional generalized Lloyd's algorithm (GLA) generates sub-optimal codebooks. For a particular SPC such as multistage VQ, the GLA is applied to design the stage codebooks stage-by-stage. Joint design algorithms on the other hand update all the stage codebooks simultaneously.

In this thesis, a general formulation of, and an algorithmic solution to, the joint codebook design problem are provided for SPC's. The key to this algorithm is that every SPC has a reference product codebook which minimizes the overall distortion. This joint design algorithm is tested with a novel SPC, namely "Predictive Split VQ (PSVQ)".

VQ of speech Line Spectral Frequencies (LSF's) using PSVQ is also presented. A result of this work is that PSVQ, designed using the joint codebook design algorithm, requires only 20 bits/frame (20 ms) for transparent coding of 10th-order LSF parameters.
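
The complexity argument behind structured quantizers can be made concrete with a little arithmetic. The sketch below compares the storage of a single unstructured 20-bit codebook for a 10-dimensional vector against a split structure. The split into sub-vectors of 3, 3 and 4 dimensions with 7, 7 and 6 bits is a hypothetical allocation chosen for illustration, not the one used in the thesis, and the predictive part of PSVQ is not modelled.

```python
def codebook_storage(dims, bits):
    """Number of stored scalars when sub-vector i has dims[i] dimensions
    and its own codebook of 2**bits[i] entries."""
    return sum(d * 2 ** b for d, b in zip(dims, bits))

frame_bits = 20
frame_ms = 20
lsf_rate = frame_bits * 1000 // frame_ms         # bits per second for the LSFs

unstructured = codebook_storage([10], [20])      # one 20-bit codebook
split = codebook_storage([3, 3, 4], [7, 7, 6])   # hypothetical 3-way split
```

Both structures spend 20 bits per frame (1000 bits/s), but the unstructured codebook would hold 10,485,760 scalars against 1,024 for the split one. The search cost scales in the same way.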

A Low-Delay Code Excited Linear Prediction Speech Coder at 8 kbits/s

M.Eng. Thesis, May 1994

Supervisor: P. Kabal

The goal of this thesis is to design a high-quality, low-delay 8 kb/s speech coder. This research is motivated by the need of the telecommunications industry to standardize a high-quality, low-delay, low-rate speech coder. To meet these requirements, we use a coder based on code-excited linear prediction (CELP). To meet the demands of high quality and low bit rate, a vector quantizer is used to code the excitation signal. To meet the low-delay requirement, backward adaptation of the synthesis filters is used. The focus of the research is on comparing different pitch synthesis filters in the CELP coder. Of the three pitch synthesis filters examined in this research (third-order, first-order integer-delay, and first-order fractional-delay), the last produces the best quality.
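
For readers unfamiliar with pitch synthesis filters, the following minimal sketch shows the integer-delay case as a recursive filter that adds scaled copies of past output, roughly one pitch period back, to the excitation. The function name `pitch_synthesis` and the coefficient values are assumptions for illustration. The fractional-delay filter, which needs interpolation between samples, is not covered here.

```python
def pitch_synthesis(excitation, lag, taps):
    """Compute y[n] = x[n] + sum_j taps[j] * y[n - (lag - half + j)].

    `taps` has odd length and is centred on `lag`: one tap gives a
    first-order integer-delay filter, three taps a third-order filter.
    """
    half = len(taps) // 2
    if lag - half < 1:
        raise ValueError("lag too small for this number of taps")
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for j, b in enumerate(taps):
            idx = n - (lag - half + j)
            if idx >= 0:              # zero initial filter state
                acc += b * y[idx]
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 11
one_tap = pitch_synthesis(impulse, lag=4, taps=[0.5])
three_tap = pitch_synthesis(impulse, lag=4, taps=[0.1, 0.3, 0.1])
```

With the one-tap filter the impulse repeats every 4 samples, halving each time. The three-tap filter spreads each repetition over the delays 3, 4 and 5.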

Array Processing for Detection and Localization of Narrowband, Wideband and Distributed Sources

Ph.D. Thesis, May 1994

Supervisor: P. Kabal

The detection and estimation techniques that are used in array processing depend on the spatial and temporal characteristics of the signals that arrive at the array. In this dissertation, we consider narrowband as well as wideband signals. For narrowband signals, a detection method based on the predictive stochastic complexity (PSC) is developed. The PSC of data is computed for all the models with order smaller than the number of sensors. The number of signals is selected by choosing the minimum of the PSC over all models. The PSC criterion is on-line and can be used for time varying systems and target tracking.

We also consider wideband signals. One approach to wideband array processing is based on sampling the spectrum of the source signals to generate narrowband signals. Then, using a focusing approach, the information at different frequency bins is combined. Here, an optimal method to select a focusing subspace for the well-known coherent signal-subspace method (CSM) is proposed. It is also shown that with the CSM, unbiased estimation of the directions-of-arrival (DOAs) is not possible. Inspired by the CSM algorithm, a new method for wideband array processing is developed which is based on two-sided transformation of the correlation matrices (TCT). The TCT estimator can generate unbiased estimates of the DOAs and has a lower resolution threshold than the CSM algorithm.

In array processing it is frequently assumed that the signals are generated by point sources. This is an assumption which is not satisfied in reality. In this dissertation, a method is developed for localization of spatially distributed sources. The method is based on generalization of the MUSIC algorithm and is applied to coherent and incoherent distribution of sources.

Auditory Distortion Measures for Speech Coder Evaluation

Ph.D. Thesis, October 1993

Supervisor: P. Kabal

One of the important research problems in the area of speech coding is to determine the sound quality of coded speech signals. This quality can best be evaluated by a subjective assessment, which is often time-consuming and difficult to administer. An objective measure which is consistent with subjective assessment could play a vital role in the evaluation as well as in the design of a low bit-rate speech coder. In this dissertation, we introduce two distortion measures for speech coder evaluation. Since the perceptual abilities of a human being determine the precision with which speech data must be processed, we consider the details of cochlear (inner ear) and other auditory processing. Using Lyon's auditory model, the time-domain speech signal is mapped onto a perceptual domain (PD). Any speech utterance is communicated to the brain through a series of all-or-none electrical spikes (firings), and the PD representation provides information pertaining to the probability of firing in the neural channels. Our first measure, namely the cochlear discrimination information (CDI), evaluates the cross-entropy of the neural firings for the coded speech with respect to those for the original one. With this measure, we also compute the rate-distortion function determining the lowest bit rate required for a specified amount of distortion. In the second measure, namely the cochlear hidden Markovian (CHM) measure, we attempt to capture the high-level processing in the brain with simple hidden Markov models (HMMs). We characterize the firing events by HMMs where the order of occurrence of PD observations and correlations among adjacent observations are modeled suitably. For computing the coder distortion, the PD observations of the coded speech are matched against the HMMs derived from the PD observations of the original speech. Experimental results show that these measures conform to subjective evaluation results in the majority of cases.

Finally, the introduced measures are also applied in speech coder analysis, e.g., in pitch frequency determination and the evaluation of noise weighting schemes.
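
The idea of comparing firing probabilities with an information measure can be sketched as follows. This is only an illustration under stated assumptions: each neural channel is treated as an independent Bernoulli (fire / no fire) source, the measure is the Kullback-Leibler divergence summed over channels, and the direction of the divergence (original relative to coded) is a choice made here. The function name `bernoulli_discrimination` is hypothetical, and the actual CDI definition in the dissertation may differ.

```python
import math

def bernoulli_discrimination(p_orig, p_coded, eps=1e-12):
    """Sum over channels of the Kullback-Leibler divergence between
    Bernoulli firing probabilities of the original and coded speech."""
    total = 0.0
    for p, q in zip(p_orig, p_coded):
        p = min(max(p, eps), 1.0 - eps)   # keep the logarithms finite
        q = min(max(q, eps), 1.0 - eps)
        total += (p * math.log(p / q)
                  + (1.0 - p) * math.log((1.0 - p) / (1.0 - q)))
    return total

same = bernoulli_discrimination([0.2, 0.7], [0.2, 0.7])
different = bernoulli_discrimination([0.5], [0.25])
```

Identical firing probabilities give zero distortion, and the value grows as the coded speech drives the channels differently from the original.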

Pitch Modelling for Speech Coding at 4.8 kbits/s

M.Eng. Thesis, September 1993

Supervisor: P. Kabal

The purpose of this thesis is to examine techniques for efficiently modelling the Long-Term Predictor (LTP), or pitch filter, in low-rate speech coders. The emphasis is on a class of coders referred to as Linear Prediction (LP) based analysis-by-synthesis coders, and more specifically on the Code-Excited Linear Prediction (CELP) coder, which is currently the most commonly used in low-rate transmission. The experiments are performed on a CELP-based coder developed by the U.S. Department of Defense (DoD) and Bell Labs, with an output bit rate of 4.8 kbits/s.

A multi-tap LTP outperforms a single-tap LTP, but at the expense of a greater number of bits. A single-tap LTP can be improved by increasing the time resolution of the LTP. This results in a fractional delay LTP, which produces a significant increase in prediction gain and perceived periodicity at the cost of more bits, but less than for the multi-tap case.
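
As a point of reference for the single-tap case, the sketch below performs an open-loop search for the integer lag and coefficient that minimize the prediction residual energy. It is a textbook-style illustration, not the closed-loop procedure of the coder studied in the thesis. The function name `single_tap_ltp` and the test signal are assumptions.

```python
def single_tap_ltp(x, min_lag, max_lag):
    """Open-loop single-tap long-term predictor: choose the integer lag
    and coefficient b minimising sum (x[n] - b * x[n - lag])**2."""
    start = max_lag                       # first sample with full history
    energy = sum(x[n] ** 2 for n in range(start, len(x)))
    best_lag, best_b, best_removed = None, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(start, len(x)))
        den = sum(x[n - lag] ** 2 for n in range(start, len(x)))
        if den == 0.0:
            continue
        removed = num * num / den         # energy removed by the best b
        if best_lag is None or removed > best_removed:
            best_lag, best_b, best_removed = lag, num / den, removed
    residual = max(energy - best_removed, 0.0)
    return best_lag, best_b, residual

# Perfectly periodic toy signal with a period of 5 samples.
pattern = [1.0, -0.5, 0.3, 0.8, -1.0]
signal = [pattern[n % 5] for n in range(40)]
lag, b, residual = single_tap_ltp(signal, min_lag=3, max_lag=8)
```

For this signal the search returns a lag of 5 with a coefficient of 1 and essentially no residual. When the true pitch period falls between integer lags, the residual of an integer search stays larger, which is the motivation for fractional delays.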

The first new approach in this work is to use a pseudo-three-tap pitch filter with one or two degrees of freedom of the predictor coefficients, which gives a better quality reconstructed speech and also a more desirable frequency response than a one-tap pitch prediction filter. The pseudo-three-tap pitch filter with one degree of freedom is of particular interest as no extra bits are needed to code the pitch coefficients.

The second new approach is to perform time scaling/shifting on the original speech, further reducing the minimum mean-square error and allowing a smoother and more accurate reconstruction of the pitch structure. The time scaling technique allows a saving of 1 bit in coding the pitch parameters while very closely maintaining the quality of the reconstructed speech. In addition, no extra bits are needed for the time scaling operation, as no extra side information has to be transmitted to the receiver.

Toll-Quality Speech Coding at 8 kb/s

M.Eng. Thesis, February 1993

Supervisor: P. Kabal

There has been an ongoing effort to achieve very high quality speech coding at medium transmission bit rates. Consequently, the TIA has chosen the Vector Sum Excited Linear Prediction (VSELP) implementation of an 8 kb/s coder to be the standard for North American digital cellular telephony. However, it was only recently, in view of the increased research focus on developing toll-quality speech coding at such bit rates, that the CCITT imposed a set of specifications for standardizing low-delay coders operating at 8 kb/s. The Low-Delay Code-Excited Linear Prediction (LD-CELP) coder suggested by Chen is presently the only potential candidate for CCITT standardization, achieving a one-way coding delay of 10 ms. However, just like the VSELP coding algorithm, the 8 kb/s LD-CELP version does not quite yield toll-quality reconstructed speech. The purpose of the work in this thesis is to show that, by slightly relaxing the coding delay constraint, perceptual enhancement techniques yield toll-quality coding after redesigning and fine-tuning the optimization and quantization procedures of a CELP coder.

Issues in forward adaptive linear prediction analysis, such as windowing and prediction order, are studied. Once a suitable analysis method is chosen, attention is directed toward the quantization of the LPC parameters. With transparent quantization of those parameters being a must for toll-quality coding, an LSF split vector quantization scheme endowed with an improved perceptual distortion measure overcomes the challenge. Joint optimization of the CELP synthesis parameters is then shown to yield improved results when compared to the usual sequential approach. Due to the limited bit resources for quantizing the synthesis parameters, an efficient vector quantizer for the gains (pitch and codebook) is developed. Nevertheless, perceptual enhancement techniques remain the major contributors to toll-quality coding. The speech periodicity is improved both by increasing the resolution of the long-term predictor delays and by combining the spectral noise weighting with an adaptive harmonic weighting scheme. Coded speech quality comparable to that of 7-bit log PCM is, however, only attained with the introduction of a delayed-decision coding technique, extending the CELP parameter selection process beyond the subframe boundary at no extra cost in coding delay.

M.Eng. Thesis, February 1993

Supervisor: P. Kabal

The purpose of this thesis is to study the coding of wideband speech and to improve on previous Code-Excited Linear Prediction (CELP) coders in terms of speech quality and bit rate. To accomplish this task, improved coding techniques are introduced and the operating bit rate is reduced while maintaining and even enhancing the speech quality.

The first approach considers the quantization of Linear Predictive Coding (LPC) parameters and uses a three-way split vector quantization. Both scalar and vector quantization are initially studied; results show that, with adequate codebook training, vector quantization generates better results while using fewer bits. Nevertheless, the use of vector quantizers remains highly complex in terms of memory and number of computations. A new quantization scheme, split vector quantization (split VQ), is investigated to overcome this complexity problem. Using a new weighted distance measure as a selection criterion for split VQ, the average spectral distortion is significantly reduced to match the results obtained with scalar quantizers.
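
The split VQ idea with a weighted distance can be sketched in a few lines: the parameter vector is cut into sub-vectors, and each is matched independently against its own small codebook. The weights, codebook entries and function names below are hypothetical toy values, not those of the thesis, and the weighted squared error is only one possible form of weighted distance measure.

```python
def weighted_nearest(codebook, target, weights):
    """Index of the code vector minimising the weighted squared error."""
    def dist(c):
        return sum(w * (t - v) ** 2 for w, t, v in zip(weights, target, c))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def split_vq_encode(x, weights, codebooks):
    """Quantise each sub-vector of x independently with its own codebook."""
    indices, start = [], 0
    for cb in codebooks:
        size = len(cb[0])
        indices.append(weighted_nearest(cb, x[start:start + size],
                                        weights[start:start + size]))
        start += size
    return indices

def split_vq_decode(indices, codebooks):
    """Concatenate the selected code vectors."""
    out = []
    for i, cb in zip(indices, codebooks):
        out.extend(cb[i])
    return out

# Toy 5-dimensional example split 2 + 2 + 1 (all values hypothetical).
codebooks = [
    [[0.1, 0.2], [0.3, 0.5]],
    [[0.6, 0.7], [0.8, 0.9]],
    [[1.0], [1.2]],
]
x = [0.28, 0.45, 0.62, 0.75, 1.15]
weights = [1.0, 1.0, 2.0, 2.0, 0.5]
indices = split_vq_encode(x, weights, codebooks)
x_hat = split_vq_decode(indices, codebooks)
```

Each of the three searches examines only two candidates here. In a real coder the memory and search savings come from replacing one large codebook with several much smaller ones.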

The second approach introduces a new pitch predictor with an increased temporal resolution for periodicity. This new technique has the advantage of maintaining the same quality obtained with conventional multiple-coefficient predictors at a reduced bit rate. Furthermore, the conventional CELP noise weighting filter is modified to allow more freedom and better accuracy in the modeling of both tilt and formant structures. Throughout this process, different noise weighting schemes are evaluated, and the results show that the new filter contributes greatly to solving the problem of high-frequency distortion.

The final wideband CELP coder operates at 11.7 kbits/s and generates reconstructed speech of high perceptual quality using the fractional pitch predictor and the new perceptual noise weighting filter.
