Multimedia Signal Processing Laboratory

Report Abstracts

P. Kabal

TSP Speech Database

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2, 29 pp., Nov. 2018

This report describes the TSP speech database. The database consists of over 1400 utterances spoken by 24 speakers (half male, half female). The data was recorded to Digital Audio Tape in an anechoic room. The database includes the original samples (48 kHz sampling rate) and the same data filtered and subsampled to 16 kHz and 8 kHz sampling rates.

Database Download Link

P. Kabal

Combinatorial Coding and Lexicographic Ordering

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1, 30 pp., Feb. 2018

This document examines methods to generate a combinatorial index for the selection of items, and the decoding of the index to produce the corresponding selection of items. Marching through the indices produces lexicographically ordered selections. Three cases are considered: Selections with no repeated items, selections with repetitions, and selections with prescribed repetition multiplicities.
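As an illustration of the first case only (selections with no repeated items), the following is a minimal sketch of a lexicographic index for a k-item selection from n items, together with its decoder. The function names and the convention that items are the integers 0 to n-1 are assumptions made for this example; they are not taken from the report.

```python
from math import comb


def rank_combination(selection, n):
    """Lexicographic index of a sorted selection of distinct items from range(n)."""
    k = len(selection)
    index = 0
    prev = -1
    for position, item in enumerate(selection):
        # Count the selections that agree so far but use a smaller item here.
        for smaller in range(prev + 1, item):
            index += comb(n - smaller - 1, k - position - 1)
        prev = item
    return index


def unrank_combination(index, n, k):
    """Decode a lexicographic index back into the sorted selection."""
    selection = []
    candidate = 0
    for position in range(k):
        while True:
            # Number of selections that place `candidate` at this position.
            count = comb(n - candidate - 1, k - position - 1)
            if index < count:
                break
            index -= count
            candidate += 1
        selection.append(candidate)
        candidate += 1
    return selection
```

Decoding the indices 0, 1, 2, ... in turn with `unrank_combination` steps through the selections in lexicographic order, and `rank_combination` inverts that mapping.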

P. Kabal

ITU-T G.723.1 Speech Coder: A Matlab Implementation

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2f, 54 pp., Dec. 2017 (initial version Nov. 2003)

Matlab code: G.723.1-v2r1b.tar.gz

This report documents the details of the processing steps in the ITU-T G.723.1 Speech Coder. This report accompanies an implementation of that coder in Matlab. The Matlab implementation was designed to facilitate experimentation and research using a practical speech coder as a base.

P. Kabal

Minimum Mean-Square Error Filtering: Autocorrelation/Covariance, General Delays and Multirate Systems

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 0.985, 215 pp., April 2011

These notes examine procedures for solving for minimum mean-square error filters. The stochastic case and the block-based (least-squares) analyses are covered in a single formalism. The filtering is analyzed in more generality than in many expositions, allowing for configurations with general filter delays and flexible windows for the least squares problem. The important linear prediction problem is examined in detail. For the equally spaced delay case, a rich set of results ensues. Several topics are covered that are missing from many textbooks: affine estimation (non-zero means), cyclostationary signals (for multirate signals), fractionally spaced equalizers, joint process estimation in relation to the Levinson algorithm, and an approximate formulation for linearly constrained filters.

P. Kabal

Frequency Domain Representations of Sampled and Wrapped Signals

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.5, 16 pp., March 2011 (initial version Jan. 2008)

These notes examine the relationships between frequency domain representations of discrete-time and wrapped signals derived from a continuous-time signal. The first part of these notes develops the relationships for periodic signals which allow for the analysis of periodic signals within the framework of the Fourier transform. The second part examines the relationships between the Fourier series, the Discrete-Time Fourier Transform (DTFT) and the Discrete Fourier Transform (DFT).

P. Kabal

The Equivalence of ADPCM and CELP Coding

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.2, 14 pp., March 2011 (initial version April 2010)

This document examines coding schemes which differentially code signals while at the same time controlling the frequency characteristics of the coding (quantization) error. We show that a (vector quantized) version of an Adaptive Differential Pulse Code Modulation (ADPCM) system using noise feedback to shape the quantization noise can be converted to an equivalent system which is in the form of a Code Excited Linear Prediction (CELP) system. While this equivalence is known by, or at least not a surprise to, the signal processing cognoscenti, it is not widely appreciated by many others. We also try to add a historical perspective on the development of these systems.

P. Kabal

Minimum Phase & All-Pass Filters

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2.1, 28 pp., March 2011 (initial version Nov. 2007)

This document analyzes minimum-phase and all-pass filters. The analysis allows for complex-valued filter coefficients. The properties of the frequency responses (amplitude, phase, and group delay) of these filters are discussed.

P. Kabal

Time Windows for Linear Prediction of Speech

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2a, 43 pp., Nov. 2009 (initial version 2003-10)

This report examines the time windows used for linear prediction (LP) analysis of speech. The goal of windowing is to create frames of data each of which will be used to calculate an autocorrelation sequence. Several factors enter into the choice of window. The time and spectral properties of Hamming and Hann windows are examined. We also consider windows based on Discrete Prolate Spherical Sequences including multiwindow analysis. Multiwindow analysis biases the estimation of the correlation more than single window analysis. Windows with frequency responses based on the ultraspherical polynomials are discussed. This family of windows includes Dolph-Chebyshev and Saramäki windows. This report also considers asymmetrical windows as used in modern speech coders. The frequency response of these windows is poor relative to conventional windows. Finally, the presence of a "pedestal" in the time window (as in the case of a Hamming window) is shown to be deleterious to the time evolution of the LP parameters.

P. Kabal

FIR Filters: Frequency-Weighted and Minimum-Phase Designs

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.6, 32 pp., Nov. 2007 (initial version Sept. 2004)

Matlab code: FilterDesign-M-v2r0.tar.gz

P. Kabal

Improving the Presentation of Matlab Plots

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, June 2006

Matlab code: Matlab-Plot-v1r3.tar.gz

This document describes a number of strategies for producing publication-quality plots from Matlab. One finds much to criticize in the quality of plots that are reproduced in today's journals. This is because the authors supply the plots without having a clear view of how they will be processed to produce the final plot on the printed page. We give some guidelines and supply Matlab routines that streamline the application of these guidelines.

P. Kabal

Matlab Plots in Microsoft Word

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Jan. 2006

This report looks at different options for inserting plots generated from Matlab into a Microsoft Word document. For publication quality output, it is important to control the size of the graphic that will appear in the final document. The graphic should be drawn at its final size in Matlab. Scaling in Word is undesirable, as it not only scales the plot, but also the text on the graphic. This report outlines a procedure that sets the size of the figure and the font size in Matlab. Once set, the graphic can be imported into Word with no further scaling.

Results indicate that the PostScript format is the best option for good quality graphics. Graphics imported using cut and paste from Matlab (EMF or bitmap format) are noticeably inferior in quality.

P. Kabal

Windows for Transform Processing

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Dec. 2005

This report examines the time windows used in processing of signals in a transformed domain. The goal of windowing is to create frames of data, each of which will be used to calculate a transformed sequence. The transform coefficients are then modified (filtering for instance for noise reduction) or coded (transform coding). The modified transform coefficients are then applied to an inverse transform and windowed again before creating an output signal using addition of the overlapped blocks. It is the analysis window (before the transform) and the synthesis window (after the inverse transform) that are examined in this report. The requirement for perfect reconstruction (when the transform coefficients are not modified) is developed. This gives a condition on the product of the analysis and synthesis windows. An argument is given to show that if additive noise is introduced in the transform domain, the windowing should be equally apportioned between these windows, i.e. the analysis and synthesis windows should be the same. The windowing requirements for systems implementing block-by-block filtering of the input signal in the transform domain are also examined.
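The perfect reconstruction condition mentioned above can be checked numerically. The sketch below assumes a square-root Hann window and a 50% overlap, which are choices made for this example rather than details taken from the report. It sums the shifted copies of the analysis-synthesis window product; perfect reconstruction (with unmodified transform coefficients) requires that sum to be constant.

```python
import math


def sqrt_hann(length):
    """Square root of a periodic Hann window (an assumed example window)."""
    return [math.sin(math.pi * n / length) for n in range(length)]


def overlap_add_gain(analysis, synthesis, hop):
    """Sum of shifted copies of the window product, one value per hop phase.

    With unmodified transform coefficients, each output sample equals the
    input sample times the gain for its phase, so all gains must equal 1.
    """
    product = [a * s for a, s in zip(analysis, synthesis)]
    gain = [0.0] * hop
    for n, value in enumerate(product):
        gain[n % hop] += value
    return gain


window = sqrt_hann(16)
equal_split = overlap_add_gain(window, window, 8)      # same window twice
hann = [w * w for w in window]
analysis_only = overlap_add_gain(hann, [1.0] * 16, 8)  # all windowing up front
```

Both `equal_split` and `analysis_only` are constant at 1, since the condition only constrains the product of the two windows. The check says nothing about which split is preferable when noise is added in the transform domain.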

P. Kabal

Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003

This report examines schemes that modify linear prediction (LP) analysis for speech signals. First, techniques which improve the conditioning of the LP equations are examined. White noise compensation for the correlations is justified from the point of view of reducing the range of values which the predictor coefficients can take on. A number of other techniques which modify the correlations are investigated (highpass noise, selective power spectrum modification). The efficacy of these procedures is measured over a large speech database. The results show that white noise compensation is the method of choice: it is both effective and simple.

Other methods to prematurely terminate the iterative solution of the correlation equations (Durbin recursion) to circumvent problems of ill-conditioning are also investigated.

The report also considers the bandwidth expansion of digital filters which have resonances. In speech coding such resonances correspond to the formant frequencies. Bandwidth expansion of the LP filter serves to avoid unnatural sharp resonances that may be artefacts of pitch and formant interaction. Lag windowing of the correlation values has been used with the aim of both bandwidth expansion and helping the conditioning of the LP equations. Experiments show that the benefit for conditioning is minimal. This report also discusses bandwidth expansion of the prediction coefficients after LP analysis using radial scaling of the z-transform. A simple new formula is given which can be used to estimate the bandwidth expansion.
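Two of the operations named in this abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the report's implementation: white noise compensation is taken to mean scaling the zero-lag correlation by (1 + fraction), the predictor is written as A(z) = 1 + a_1 z^-1 + ... with the leading 1 stored at index 0, and the bandwidth estimate is the common pole-radius approximation, which is not necessarily the new formula given in the report.

```python
import math


def levinson_durbin(autocorr, order):
    """Durbin recursion: returns ([1, a_1, ..., a_order], prediction error)."""
    a = [1.0] + [0.0] * order
    err = autocorr[0]
    for m in range(1, order + 1):
        acc = autocorr[m] + sum(a[i] * autocorr[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def white_noise_compensate(autocorr, fraction):
    """Scale the zero-lag correlation by (1 + fraction)."""
    return [autocorr[0] * (1.0 + fraction)] + list(autocorr[1:])


def expand_bandwidth(predictor, gamma):
    """Radial scaling A(z / gamma): coefficient k is multiplied by gamma**k."""
    return [a * gamma ** k for k, a in enumerate(predictor)]


def approx_expansion_hz(gamma, sample_rate):
    """Textbook estimate of the added 3 dB bandwidth for radial scaling."""
    return -math.log(gamma) * sample_rate / math.pi
```

For a pure sinusoid the correlation matrix is singular and the order-2 prediction error collapses to zero; with a small compensation fraction the error stays positive, which is the conditioning benefit described above. Under the textbook estimate, gamma = 0.994 at an 8 kHz sampling rate corresponds to roughly 15 Hz of expansion.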

R. Der

Stable Symmetric Distributions and Their Role in the Signal Separation Problem

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003

This report examines the problem of blind source separation when the sources are distributed from a stable class. We show that cost functions extremising any marginal property of a mixture of signals are constant over the set of symmetric stable distributions, and thus cannot solve the blind source separation problem in full generality. These distributions are non-pathological, but have infinite energy. The notable exception is the Gaussian distribution, for which the separation problem is inherently undetermined. For finite variance signals, the use of marginal statistics for blind signal separation is justified.

P. Kabal

An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003)

This report examines the standard which describes a method for the objective measure of perceived audio quality (ITU-R Recommendation BS.1387). This standard uses a number of psycho-acoustical measures which are combined to give a measure of the quality difference between two instances of a signal (a reference and a test signal). Many aspects of the standard are under-specified. This report examines alternate interpretations. It also looks at efficiency issues in the implementation of computationally intensive parts of the algorithm.

Matlab code: PQevalAudio-v1r0.tar.gz

R. Der

Blind Signal Separation

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Sept. 2001

Blind Signal Separation is the task of separating signals when only their mixtures are observed. Recently, Independent Component Analysis has become a favourite method of researchers for attacking this problem. We review the techniques, from cumulant-based algorithms to Infomax to second-order statistics, from feedback to feedforward architectures, from the instantaneous to the convolutional problem. A new method for reducing the whitening effect on speech, known to occur in feedforward architectures, is introduced. The procedure also possesses significant stabilization properties, being based on performing the filter update in the LP-residual domain of speech. Experimental tests are conducted, and the algorithms compared.

P. Kabal

Generating Gaussian Pseudo-Random Deviates

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Oct. 2000

This report examines low-complexity methods to generate pseudo-random Gaussian (normal) deviates. We introduce a new method based on modelling the Gaussian probability density function using piecewise linear segments. This approach is shown to be both efficient and accurate. It does not require the calculation of transcendental functions.

All of the methods considered map one or more uniform distributions to create the Gaussian deviates. This report investigates the effect of the use of discrete variates, particularly in the tails of the Gaussian distribution. In addition, we give a new interpretation of the method of aliases that suggests its application to non-uniform quantization.
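For readers unfamiliar with the method of aliases, the following is a minimal sketch of the standard table construction (in the form usually attributed to Walker and Vose) for drawing from a discrete distribution using one uniform index and one uniform fraction. It illustrates the general technique only; the function names are assumptions, and the sketch does not reproduce the report's new interpretation or its application to quantization.

```python
import random


def build_alias_table(probs):
    """Build (prob, alias) tables so each cell holds at most two outcomes."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]          # cell s keeps outcome s with this probability
        alias[s] = g                 # and otherwise yields outcome g
        scaled[g] += scaled[s] - 1.0
        (small if scaled[g] < 1.0 else large).append(g)
    for i in large + small:          # leftovers fill their whole cell
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one outcome: pick a cell uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_probabilities(prob, alias):
    """Recover the distribution encoded by the tables (for checking)."""
    n = len(prob)
    out = [p / n for p in prob]
    for i in range(n):
        out[alias[i]] += (1.0 - prob[i]) / n
    return out
```

`implied_probabilities` is included only so the tables can be verified against the target distribution without relying on random draws.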

P. Kabal

Formatting a Thesis with LaTeX

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000 (updated June 2005)

Thesis Macros:

This report describes the use of LaTeX to format a thesis. A number of topics are covered: content and organization of the thesis, LaTeX macros for controlling the thesis layout, formatting mathematical expressions, generating bibliographic references, importing figures and graphs, generating graphs in Matlab, and formatting tables. The LaTeX macros used to format a thesis (and this document) are described. As well, Matlab procedures are shown to illustrate methods that can be used to format graphs in a form suitable for inclusion in a LaTeX document.

P. Kabal

Matlab Plots in Microsoft Word

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000
Superseded by the version of Jan. 2006

P. Kabal

Measuring Speech Activity

MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, August 1999

This report discusses the algorithm described in ITU-T Recommendation P.56 for measuring the active speech level. Method B in P.56 determines a speech activity factor representing the fraction of time that the signal is considered to be active speech (as opposed to background idle noise) and the corresponding active level for the speech part of the signal. The basic algorithm generates an envelope value at each sample time. The envelope values are compared with a discrete set of thresholds. The (approximate) active speech level is determined by interpolating in the log domain between the threshold values. In this report we assess the effects on the speech active level due to interpolation. Recommendation P.56 allows for sampling rates as low as 600 Hz. Results for subsampled data are compared with those calculated at the full speech sampling rate.

C. C. Chu and P. Kabal

Codebook Excited Linear Prediction of Speech: Performance in the Presence of Channel Errors

Technical Report 88-10, INRS-Telecommunications, University of Quebec, March 1988

P. Kabal

Code Excited Linear Prediction Coding of Speech at 4.8 kb/s

Technical Report 87-36, INRS-Telecommunications, University of Quebec, July 1987

This report describes a software implementation of an algorithm for digital coding of speech at low bit rates. The reconstruction of the speech signal is accomplished by exciting a cascade of a formant synthesis filter and a pitch synthesis filter with an excitation waveform. The excitation waveform is selected from a dictionary of waveforms using a frequency weighted mean-square error criterion. At transmission rates in the neighborhood of 5 kb/s, this scheme produces speech with better quality than any other known scheme.

J.-L. Moncet and P. Kabal

Codeword Selection for CELP Coders

Technical Report 87-35, INRS-Telecommunications, University of Quebec, July 1987

This report describes the algorithm used for selecting an excitation waveform for a CELP coder operating at 5 kb/s. Each candidate waveform is used to synthesize a segment of speech. A frequency weighted error criterion is used to find the waveform which regenerates the best output speech. The synthesis operation uses both a pitch synthesis filter and a formant synthesis filter. The pitch synthesis filter is optimized to give the best output speech. This optimization offers a significant improvement over a procedure which uses a pitch filter chosen by analyzing the input speech. Simplified sequential versions of this strategy also give good quality speech. The quantization of the parameters is also considered.

C. C. Chu and P. Kabal

Coding of LPC Parameters for Low Bit Rate Speech Coders

Technical Report 87-19, INRS-Telecommunications, University of Quebec, March 1987

This report summarizes the results of a study of the use of line spectral frequencies (LSF's) for the low bit rate coding of the linear predictive (LPC) parameters for use in a speech coder. Different forms of quantization using LSF's for the LPC coefficients are examined. An LSF based scheme allows the quantizer to take into account the perceptual impact of spectral distortion. One of the schemes considered takes advantage of the frame-to-frame correlation of the LSF parameters. The LSF based coding scheme is compared to a quantization based on a reflection coefficient representation. The application of this quantization scheme to the low rate, 4800 bits/sec, Code Excited Linear Predictive (CELP) coder is considered. In this context, adaptive gain factors for the differential (frame-to-frame) LSF quantizer are useful. In addition, a frame-to-frame interpolation scheme is proposed. With these modifications, LSF coding of 10 LPC parameters requires 1150 bits/sec.

L. Barbeau, D. Bernardi, C. C. Chu, P. Kabal, J.-L. Moncet, and D. O'Shaughnessy

Speech Enhancement in the Presence of Interfering Music and Noise

Technical Report 87-09, INRS-Telecommunications, University of Quebec, Jan. 1987

This report summarizes the results of speech enhancement experiments for a signal consisting of speech in the presence of interfering music and noise. Filtering was applied to remove hum and high frequency components. The composite signal was then frequency equalized to flatten the noise spectrum. A reference recording of the same passage of music which interferes with the original recording was obtained and time aligned with the composite recording.

The time-aligned reference music was processed through an adaptive filter and then subtracted from the composite recording. This results in a noticeable reduction and muffling of the music level. Before music cancellation, the music tended to dominate the composite signal; after cancellation, the speech has a generally higher level than the music.

A number of other techniques were also investigated. The most successful of these is spectral subtraction. This involves suppressing those frequency components present in the music from the composite signal. This has the effect of suppressing the music, but since the desired speech component also contains the same frequency components, the speech quality is also affected.

The adaptive filtering approach has the least subjective effect on the speech components but does not completely suppress the music. The speech components are considerably more intelligible after music cancellation has been carried out. Spectral subtraction lends a somewhat unnatural quality to the resultant signal, but does render more complete suppression of the music. The speech is slightly muffled. The intelligibility of the speech can be judged to be about the same or better than for the adaptive filtering approach.

R. P. Ramachandran and P. Kabal

The Computation of Line Spectral Frequencies Using Chebyshev Polynomials

Technical Report 85-27, INRS-Telecommunications, University of Quebec, Sept. 1985

Line spectral frequencies provide an alternate parameterization of the analysis and synthesis filters used in linear predictive coding (LPC) of speech. In this paper, a new method of converting between the direct form predictor coefficients and line spectral frequencies is presented. Both even and odd order LPC systems are considered. The system polynomial for the analysis filter is converted to two even order symmetric polynomials with interlacing roots on the unit circle. The line spectral frequencies are given by the positions of the roots of these two auxiliary polynomials. The response of each of these polynomials on the unit circle is expressed as a series expansion in Chebyshev polynomials. The line spectral frequencies are found using an iterative root finding algorithm which searches for real roots of a real function. The algorithm developed is simple in structure and is designed to constrain the maximum number of evaluations of the series expansions. The method is highly accurate and can be used in a form that avoids the storage of trigonometric tables or the computation of trigonometric functions. The reconversion of line spectral frequencies to predictor coefficients uses an efficient algorithm derived by expressing the root factors as an expansion in Chebyshev polynomials.
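The core numerical step described above, evaluating a Chebyshev series in x = cos(w) and searching for its real roots, can be sketched as follows. This is an illustration under assumptions: it uses the standard Clenshaw recurrence and a plain grid search with bisection, and it takes the series coefficients as given rather than deriving them from predictor coefficients. The function names and the grid and iteration counts are hypothetical.

```python
def chebyshev_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) with the Clenshaw recurrence."""
    b1 = 0.0
    b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2


def find_roots(coeffs, grid_points=64, bisections=40):
    """Roots in x = cos(w), scanned from x = 1 down to x = -1.

    Scanning x downwards corresponds to increasing frequency w, and no
    trigonometric function is needed during the search.
    """
    roots = []
    x_hi = 1.0
    f_hi = chebyshev_series(coeffs, x_hi)
    for i in range(1, grid_points + 1):
        x_lo = 1.0 - 2.0 * i / grid_points
        f_lo = chebyshev_series(coeffs, x_lo)
        if f_hi * f_lo < 0.0:            # sign change brackets a root
            a, fa, b = x_hi, f_hi, x_lo
            for _ in range(bisections):
                mid = 0.5 * (a + b)
                fm = chebyshev_series(coeffs, mid)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
        x_hi, f_hi = x_lo, f_lo
    return roots
```

A root x found this way corresponds to the frequency w = arccos(x). The grid must be fine enough that no two roots fall in the same interval.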

P. Kabal and B. Sayar

Rounding and Scaling in Fixed-Point FFT Implementations

Technical Report 85-24, INRS-Telecommunications, University of Quebec, June 1985

The calculation of the discrete Fourier transform using a fast Fourier transform (FFT) algorithm with fixed-point arithmetic is considered. The input data is scaled to prevent overflow and to maintain accuracy. New conditions on the magnitudes of the input components to avoid overflow during the computation of the FFT are derived. Particular emphasis is placed on an implementation using a digital signal processing architecture based on a 16-bit fixed-point representation for the data and the provision for double precision accumulation of sums and products. Simulation results to assess the error performance (signal-to-noise ratio) are presented for various forms of the implementation. Algorithm variants as well as different rounding options are compared. Execution times for implementations based on a single chip signal processor (the Texas Instruments TMS320) are also given. These show that a considerable increase in accuracy can be obtained with only a small penalty in execution time, by applying an alternating form of rounding rather than truncation.

P. Kabal and R. Rabipour

Adaptive Transform Coding (ATC) of Speech - Phase II

Technical Report 83-07, INRS-Telecommunications, University of Quebec, April 1983

P. Kabal

Quantizers for the Gamma Distribution and Other Symmetrical Distributions

Technical Report 83-08, INRS-Telecommunications, University of Quebec, April 1983

C. Side

Un Systeme d'Analyse et de Synthese de Parole par Prediction Lineaire pour un Taux de Transmission Inferieur a 2400 BPS

Technical Report 82-02, INRS-Telecommunications, University of Quebec, February 1982

P. Kabal

Adaptive Transform Coding of Speech at 9.6 kb/s

Technical Report 82-06, INRS-Telecommunications, University of Quebec, May 1982

P. Kabal

Feasibility Study of a Hardware Implementation of a 4.8 kb/s RELP Speech Coder

Technical Report 81-08, INRS-Telecommunications, University of Quebec, May 1981

This report investigates the feasibility of a hardware implementation of a speech coder based on Residual Excited Linear Prediction (RELP) at 4.8 kb/s. To this end, the basic RELP algorithm has been restructured and simplified to be compatible with real-time processing. In addition, the computations have been implemented with integer arithmetic using 16 bit precision, augmented with the judicious use of double precision accumulation. An architecture based on a microprocessor supplemented with a peripheral processor built around a high speed multiplier/accumulator is proposed. This arrangement can be the basis for a simple, cost-effective and flexible implementation of a hardware RELP coder.

A. Roset

Application of Quadrature Mirror Filters to Split Band Voice Coding Process

Technical Report 80-03, INRS-Telecommunications, University of Quebec, January 1980

This report discusses an application of quadrature mirror filters for an 8 sub-band coder; this system allows us to take advantage of the differences in the long term power and of the just noticeable noise in each band.

P. Kabal

Minimum Mean Square Error Quantizers

Technical Report 80-09, INRS-Telecommunications, University of Quebec, May 1980

This report discusses the design of quantizers which minimize the mean square error for a signal with a given probability density function. Tables of optimal non-uniform quantizers are given for signals with Gaussian, Laplace (exponential) and gamma distributions. These figures correct values given previously in the literature. An appendix documents a program for calculating an optimal quantizer for an empirically derived tabulated probability density.
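The design problem can be illustrated with the classical Lloyd-Max iteration for a unit-variance Gaussian density: decision thresholds are placed midway between output levels, and each output level is moved to the centroid of its decision interval. The sketch below is a minimal example under those assumptions; the starting levels and iteration count are arbitrary choices, and it is not the program documented in the report's appendix.

```python
import math


def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(num_levels, iterations=200):
    """Return (output levels, decision thresholds) for a unit Gaussian."""
    # Arbitrary starting point: levels spread evenly over [-2, 2].
    levels = [-2.0 + 4.0 * (i + 0.5) / num_levels for i in range(num_levels)]
    thresholds = []
    for _ in range(iterations):
        # Nearest-neighbour condition: thresholds midway between levels.
        thresholds = [0.5 * (levels[i] + levels[i + 1])
                      for i in range(num_levels - 1)]
        edges = [-math.inf] + thresholds + [math.inf]
        # Centroid condition: each level is the conditional mean of its
        # interval, (pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo)) for a Gaussian.
        levels = [(gaussian_pdf(lo) - gaussian_pdf(hi))
                  / (gaussian_cdf(hi) - gaussian_cdf(lo))
                  for lo, hi in zip(edges[:-1], edges[1:])]
    return levels, thresholds
```

For two levels the iteration settles at plus and minus sqrt(2/pi), the conditional mean of each half of the Gaussian density.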

M. Belleau and P. Kabal

Optimal Quantizers in Linear Predictive Coding of Speech

Technical Report 80-23, INRS-Telecommunications, University of Quebec, May 1980

D. C. Stevenson and P. Kabal

Comparative Evaluation of Residual-Excited Linear Prediction and Sub-Band Coding for Speech Transmission at 9.6 kb/s

Technical Report 79-14, INRS-Telecommunications, University of Quebec, October 1979

P. Kabal

Simulation of Digital Coding Techniques for Speech Transmission at 9.6 kb/s

Technical Report 78-08, INRS-Telecommunications, University of Quebec, December 1978

Speech transmission at 9.6 kb/s is of significant interest because that is the highest rate currently attainable over analog voice lines. Two methods of speech coding, residual-excited linear prediction (RELP) and sub-band coding (SBC), are simulated and evaluated.