TSP Speech Database
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2, 29 pp., Nov. 2018
This report describes the TSP speech database. The database consists of over 1400 utterances spoken by 24 speakers (half male, half female). The data was recorded to Digital Audio Tape in an anechoic room. The database includes the original samples (48 kHz sampling rate) and the same data filtered and subsampled to 6 kHz and 8 kHz sampling rates.
Combinatorial Coding and Lexicographic Ordering
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1, 30 pp., Feb. 2018
This document examines methods to generate a combinatorial index for the selection of items, and the decoding of the index to produce the corresponding selection of items. Marching through the indices produces lexicographically ordered selections. Three cases are considered: Selections with no repeated items, selections with repetitions, and selections with prescribed repetition multiplicities.
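As a minimal sketch of the first case (no repeated items), the standard combinatorial-number-system approach below assumes items labelled 0 to n-1 and a selection listed in increasing order. The function names are hypothetical, and this is not necessarily the formulation used in the report. The index counts how many selections precede the given one in lexicographic order, and decoding reverses that count.

```python
from math import comb

def rank_combination(sel, n):
    """Lexicographic index (0-based) of an increasing k-subset `sel` of {0, ..., n-1}."""
    k = len(sel)
    index = 0
    prev = -1
    for i, c in enumerate(sel):
        # Count the selections that share sel[:i] but have a smaller item v at position i.
        for v in range(prev + 1, c):
            index += comb(n - v - 1, k - i - 1)
        prev = c
    return index

def unrank_combination(index, n, k):
    """Inverse of rank_combination: decode an index into the increasing k-subset."""
    sel = []
    v = 0
    for i in range(k):
        # Skip whole blocks of selections whose item at position i is v.
        while True:
            count = comb(n - v - 1, k - i - 1)
            if index < count:
                break
            index -= count
            v += 1
        sel.append(v)
        v += 1
    return sel
```

For n = 4 and k = 2 the lexicographic order is {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}, so rank_combination([1, 2], 4) gives 3 and unrank_combination(4, 4, 2) gives [1, 3]. Marching the index from 0 to comb(n, k) - 1 through unrank_combination produces the selections in lexicographic order.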
ITU-T G.723.1 Speech Coder: A Matlab Implementation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2f, 54 pp., Dec. 2017 (initial version Nov. 2003)
Matlab code: G.723.1-v2r1b.tar.gz
This report documents the details of the processing steps in the ITU-T G.723.1 Speech Coder. It accompanies an implementation of that coder in Matlab. The Matlab implementation was designed to facilitate experimentation and research using a practical speech coder as a base.
Minimum Mean-Square Error Filtering: Autocorrelation/Covariance, General Delays and Multirate Systems
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 0.985, 215 pp., April 2011
These notes examine procedures for solving for minimum mean-square error filters. The stochastic case and the block-based (least-squares) analyses are covered in a single formalism. The filtering is analyzed in more generality than in many expositions, allowing for configurations with general filter delays and flexible windows for the least-squares problem. The important linear prediction problem is examined in detail. For the equally spaced delay case, a rich set of results ensues. Several topics are covered that are missing from many textbooks: affine estimation (non-zero means), cyclostationary signals (for multirate signals), fractionally spaced equalizers, joint process estimation in relation to the Levinson algorithm, and an approximate formulation for linearly constrained filters.
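A minimal sketch of the block-based (least-squares) case with general filter delays follows. It assumes real-valued data, non-negative integer delays (not necessarily equally spaced), and a covariance-style window that uses only the samples for which every tap is defined. The function name is hypothetical. The coefficients solve the normal equations R w = p built from the data matrix.

```python
import numpy as np

def ls_filter(x, d, delays):
    """Least-squares w minimizing sum_n (d[n] - sum_k w[k] x[n - delays[k]])**2.

    Assumes real-valued x and d of equal length and non-negative integer delays.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n = np.arange(max(delays), len(x))               # covariance-style window: every tap defined
    X = np.column_stack([x[n - k] for k in delays])  # one column per (possibly non-uniform) delay
    R = X.T @ X                                      # correlation matrix
    p = X.T @ d[n]                                   # cross-correlation vector
    return np.linalg.solve(R, p)
```

If d is generated exactly by a filter with taps at delays 0, 2 and 5, ls_filter recovers those taps to rounding error; with noise in d it returns the least-squares estimate over the frame.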
Frequency Domain Representations of Sampled and Wrapped Signals
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.5, 16 pp., March 2011 (initial version Jan. 2008)
These notes examine the relationships between frequency domain representations of discrete-time and wrapped signals derived from a continuous-time signal. The first part of these notes develops the relationships for periodic signals which allow for the analysis of periodic signals within the framework of the Fourier transform. The second part examines the relationships between the Fourier series, the Discrete-Time Fourier Transform (DTFT) and the Discrete Fourier Transform (DFT).
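One standard relationship of this kind can be checked numerically: sampling the DTFT of a finite sequence at N equally spaced frequencies gives the N-point DFT of the sequence wrapped (time-aliased) modulo N. The sketch below assumes a real test sequence longer than N; the helper names are hypothetical.

```python
import numpy as np

def wrap(x, N):
    """Time-wrap a finite sequence onto N points: y[m] = sum of x[n] over n = m (mod N)."""
    y = np.zeros(N, dtype=complex)
    for n, value in enumerate(x):
        y[n % N] += value
    return y

def dtft(x, omega):
    """Evaluate X(e^{j omega}) = sum_n x[n] e^{-j omega n} at the given frequencies."""
    n = np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * w * n)) for w in omega])

x = np.random.default_rng(1).standard_normal(20)  # 20 samples, longer than the DFT size
N = 8
omega_k = 2 * np.pi * np.arange(N) / N
# N samples of the DTFT equal the N-point DFT of the wrapped sequence.
print(np.allclose(np.fft.fft(wrap(x, N)), dtft(x, omega_k)))  # True
```

The identity holds because exp(-j 2 pi k n / N) depends only on n mod N, so samples whose indices differ by a multiple of N are indistinguishable at the DFT frequencies.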
The Equivalence of ADPCM and CELP Coding
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.2, 14 pp., March 2011 (initial version April 2010)
This document examines coding schemes that differentially code signals while at the same time controlling the frequency characteristics of the coding (quantization) error. We show that a (vector-quantized) version of an Adaptive Differential Pulse Code Modulation (ADPCM) system that uses noise feedback to shape the quantization noise can be converted to an equivalent system in the form of a Code Excited Linear Prediction (CELP) system. While this equivalence is known to, or at least not a surprise to, the signal processing cognoscenti, it is not widely appreciated by others. We also try to add a historical perspective on the development of these systems.
Minimum Phase & All-Pass Filters
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2.1, 28 pp., March 2011 (initial version Nov. 2007)
This document analyzes minimum-phase and all-pass filters. The analysis allows for complex-valued filter coefficients. The properties of the frequency responses (amplitude, phase, and group delay) of these filters are discussed.
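As an illustration that allows complex coefficients, the sketch below builds a minimum-phase FIR filter with the same magnitude response as a given one by reflecting each zero outside the unit circle to its conjugate-reciprocal position. The ratio of the original response to the minimum-phase response is then an all-pass function. The function name is hypothetical, and root finding with numpy.roots is only adequate for modest filter orders.

```python
import numpy as np

def minimum_phase_fir(b):
    """FIR coefficients with the same |B(e^jw)| as b but all zeros inside or on the unit circle.

    b holds B(z) = b[0] + b[1] z^-1 + ... = b[0] * prod_i (1 - z_i z^-1); entries may be complex.
    """
    b = np.asarray(b, dtype=complex)
    gain = b[0]
    new_zeros = []
    for z in np.roots(b):
        if abs(z) > 1.0:
            # conj(z) * (1 - z^-1 / conj(z)) has the same magnitude response as (1 - z z^-1)
            gain *= np.conj(z)
            new_zeros.append(1.0 / np.conj(z))
        else:
            new_zeros.append(z)
    return gain * np.poly(new_zeros)
```

For example, B(z) = 1 - 2.5 z^-1 + z^-2 has zeros at 2 and 0.5; reflecting the zero at 2 gives the minimum-phase filter 2 - 2 z^-1 + 0.5 z^-2 with an identical magnitude response.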
Time Windows for Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 2a, 43 pp., Nov. 2009 (initial version 2003-10)
This report examines the time windows used for linear prediction (LP) analysis of speech. The goal of windowing is to create frames of data, each of which will be used to calculate an autocorrelation sequence. Several factors enter into the choice of window. The time and spectral properties of Hamming and Hann windows are examined. We also consider windows based on Discrete Prolate Spheroidal Sequences, including multiwindow analysis. Multiwindow analysis biases the estimation of the correlation more than single-window analysis does. Windows with frequency responses based on the ultraspherical polynomials are discussed; this family of windows includes the Dolph-Chebyshev and Saramäki windows. This report also considers asymmetrical windows as used in modern speech coders. The frequency response of these windows is poor relative to that of conventional windows. Finally, the presence of a "pedestal" in the time window (as in the case of a Hamming window) is shown to be deleterious to the time evolution of the LP parameters.
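A minimal sketch of the windowing step: a frame is multiplied by the window and the autocorrelation used in autocorrelation-method LP analysis is computed from the product. The 240-sample frame (30 ms at 8 kHz) and the function name are assumptions for illustration. The endpoint values show the Hamming pedestal, while the Hann window tapers to zero.

```python
import numpy as np

def windowed_autocorrelation(frame, window, max_lag):
    """Autocorrelation r[0..max_lag] of a windowed frame, as used for autocorrelation-method LP."""
    xw = np.asarray(frame, dtype=float) * window
    N = len(xw)
    return np.array([np.dot(xw[:N - k], xw[k:]) for k in range(max_lag + 1)])

N = 240                     # assumed frame length: 30 ms at 8 kHz
hamming = np.hamming(N)     # 0.54 - 0.46 cos(.), symmetric
hann = np.hanning(N)        # 0.5 - 0.5 cos(.), symmetric
print(hamming[0], hann[0])  # Hamming endpoints sit on a 0.08 pedestal; Hann tapers to 0
```

Because the Hamming window does not taper to zero, a sample entering or leaving the frame changes the autocorrelation abruptly, which is one way to see why a pedestal can disturb the frame-to-frame evolution of the LP parameters.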
FIR Filters: Frequency-Weighted and Minimum-Phase Designs
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Version 1.6, 32 pp., Nov. 2007 (initial version Sept. 2004)
Matlab code: FilterDesign-M-v2r0.tar.gz
Improving the Presentation of Matlab Plots
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, June 2006
Matlab code: Matlab-Plot-v1r3.tar.gz
This document describes a number of strategies for producing publication-quality plots from Matlab. One finds much to criticize in the quality of plots reproduced in today's journals. This is because authors supply plots without a clear view of how they will be processed to produce the final plot on the printed page. We give some guidelines and supply Matlab routines that streamline the application of these guidelines.
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Jan. 2006
This report looks at different options for inserting plots generated in Matlab into a Microsoft Word document. For publication-quality output, it is important to control the size of the graphic that will appear in the final document. The graphic should be drawn at its final size in Matlab. Scaling in Word is undesirable, as it scales not only the plot but also the text on the graphic. This report outlines a procedure that sets the size of the figure and the font size in Matlab. Once set, the graphic can be imported into Word with no further scaling.
Results indicate that the PostScript format is the best option for good quality graphics. Graphics imported using cut and paste from Matlab (EMF or bitmap format) are noticeably inferior in quality.
Windows for Transform Processing
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Dec. 2005
This report examines the time windows used in processing signals in a transformed domain. The goal of windowing is to create frames of data, each of which will be used to calculate a transformed sequence. The transform coefficients are then modified (for instance, filtered for noise reduction) or coded (transform coding). The modified transform coefficients are then inverse transformed and windowed again, and the output signal is formed by adding the overlapped blocks. This report examines both the analysis window (applied before the transform) and the synthesis window (applied after the inverse transform). The requirement for perfect reconstruction (when the transform coefficients are not modified) is developed; it gives a condition on the product of the analysis and synthesis windows. An argument is given to show that if additive noise is introduced in the transform domain, the windowing should be equally apportioned between these windows, i.e., the analysis and synthesis windows should be the same. The windowing requirements for systems implementing block-by-block filtering of the input signal in the transform domain are also examined.
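A minimal sketch of the perfect-reconstruction check follows. It assumes an FFT as the transform, 50% overlap, and the square root of a periodic Hann window as both the analysis and the synthesis window. This is one common choice consistent with the equal-apportioning argument, not necessarily a window recommended in the report, and the function names are hypothetical. With the transform coefficients left unmodified, the output equals the input wherever the shifted products of the two windows sum to one.

```python
import numpy as np

def sqrt_periodic_hann(N):
    """Square root of the periodic Hann window 0.5 - 0.5 cos(2 pi n / N)."""
    n = np.arange(N)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / N))

def ola_gain(wa, ws, hop):
    """Sum over frame shifts of wa*ws; all ones means perfect reconstruction."""
    N = len(wa)
    prod = wa * ws
    g = np.zeros(hop)
    for m in range(0, N, hop):
        seg = prod[m:m + hop]
        g[:len(seg)] += seg
    return g

def stft_ola(x, N, hop, wa, ws):
    """Analysis window -> FFT -> (no modification) -> inverse FFT -> synthesis window -> overlap-add."""
    xp = np.concatenate([x, np.zeros(N)])
    y = np.zeros(len(x) + N)
    for start in range(0, len(x), hop):
        X = np.fft.rfft(xp[start:start + N] * wa)
        # Transform-domain processing (filtering, coding) would modify X here.
        y[start:start + N] += np.fft.irfft(X, N) * ws
    return y[:len(x)]
```

With N = 512 and a hop of 256, ola_gain returns all ones for this window pair, so stft_ola reproduces the input apart from the first N - hop samples, which are not covered by a full set of overlapping frames.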
Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines schemes that modify linear prediction (LP) analysis for speech signals. First, techniques that improve the conditioning of the LP equations are examined. White noise compensation for the correlations is justified from the point of view of reducing the range of values that the predictor coefficients can take on. A number of other techniques that modify the correlations are investigated (highpass noise, selective power spectrum modification). The efficacy of these procedures is measured over a large speech database. The results show that white noise compensation is the method of choice: it is both effective and simple.
Methods that prematurely terminate the iterative solution of the correlation equations (Durbin recursion) to circumvent problems of ill-conditioning are also investigated.
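A minimal sketch of white noise compensation followed by the Durbin recursion is shown below. The compensation scales r[0] by (1 + eps), which adds eps r[0] to every diagonal element of the correlation matrix and therefore lowers its condition number. The value eps = 1e-4 (about -40 dB) and the function names are assumptions for illustration, and the predictor convention x[n] ~ sum_j a[j-1] x[n-j] is chosen here.

```python
import numpy as np

def white_noise_compensate(r, eps=1e-4):
    """Add white noise at relative level eps (assumed 1e-4, about -40 dB) by scaling r[0]."""
    r = np.array(r, dtype=float)
    r[0] *= 1.0 + eps
    return r

def levinson_durbin(r, order):
    """Durbin recursion for the predictor x[n] ~ sum_{j=1..order} a[j-1] x[n-j].

    Returns predictor coefficients, reflection coefficients, and the final prediction error energy.
    """
    a = np.zeros(order)
    refl = np.zeros(order)
    err = r[0]
    for m in range(order):
        # Correlation between the order-m residual and the sample at lag m+1.
        acc = r[m + 1] - np.dot(a[:m], r[m:0:-1])
        k = acc / err
        a_prev = a[:m].copy()
        a[:m] = a_prev - k * a_prev[::-1]
        a[m] = k
        refl[m] = k
        err *= 1.0 - k * k
    return a, refl, err
```

Terminating the loop early, for example when the error energy or a reflection coefficient indicates ill-conditioning, is the kind of modification the report investigates; the sketch always runs to the full order.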
The report also considers the bandwidth expansion of digital filters which have resonances. In speech coding such resonances correspond to the formant frequencies. Bandwidth expansion of the LP filter serves to avoid unnatural sharp resonances that may be artefacts of pitch and formant interaction. Lag windowing of the correlation values has been used with the aim of both bandwidth expansion and helping the conditioning of the LP equations. Experiments show that the benefit for conditioning is minimal. This report also discusses bandwidth expansion of the prediction coefficients after LP analysis using radial scaling of the z-transform. A simple new formula is given which can be used to estimate the bandwidth expansion.
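A minimal sketch of radial scaling follows. With A(z) = 1 - sum_k a_k z^-k, replacing a_k by a_k gamma^k gives A(z/gamma), so every pole radius is multiplied by gamma. The bandwidth helper uses the common approximation that a pole of radius r has a bandwidth of about -(fs/pi) ln r, so scaling adds about -(fs/pi) ln gamma. That approximation is not necessarily the new formula given in the report, and both function names are hypothetical.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """Radial scaling a[k-1] -> a[k-1] * gamma**k, i.e. A(z) -> A(z / gamma)."""
    k = np.arange(1, len(a) + 1)
    return np.asarray(a, dtype=float) * gamma ** k

def approx_added_bandwidth_hz(gamma, fs):
    """Common approximation -(fs/pi) ln(gamma) for the bandwidth added to each resonance."""
    return -fs / np.pi * np.log(gamma)
```

For example, gamma = 0.994 at fs = 8000 Hz adds roughly 15 Hz of bandwidth to each resonance under this approximation.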
Stable Symmetric Distributions and Their Role in the Signal Separation Problem
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Feb. 2003
This report examines the problem of blind source separation when the sources are drawn from a stable class of distributions. We show that cost functions extremizing any marginal property of a mixture of signals are constant over the set of symmetric stable distributions, and thus cannot solve the blind source separation problem in full generality. These distributions are non-pathological, but have infinite energy. The notable exception is the Gaussian distribution, for which the separation problem is inherently indeterminate. For finite-variance signals, the use of marginal statistics for blind signal separation is justified.
An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003)
This report examines the standard which describes a method for the objective measure of perceived audio quality (ITU-R Recommendation BS.1387). This standard uses a number of psycho-acoustical measures which are combined to give a measure of the quality difference between two instances of a signal (a reference and a test signal). Many aspects of the standard are under-specified. This report examines alternate interpretations. It also looks at efficiency issues in the implementation of computationally intensive parts of the algorithm.
Matlab code: PQevalAudio-v1r0.tar.gz
Blind Signal Separation
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Sept. 2001
Blind Signal Separation is the task of separating signals when only their mixtures are observed. Recently, Independent Component Analysis has become a favourite method of researchers for attacking this problem. We review the techniques, from cumulant-based algorithms to Infomax to second-order statistics, from feedback to feedforward architectures, from the instantaneous to the convolutional problem. A new method for reducing the whitening effect on speech, known to occur in feedforward architectures, is introduced. The procedure also possesses significant stabilization properties, being based on performing the filter update in the LP-residual domain of speech. Experimental tests are conducted, and the algorithms compared.
Generating Gaussian Pseudo-Random Deviates
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, Oct. 2000
This report examines low-complexity methods to generate pseudo-random Gaussian (normal) deviates. We introduce a new method based on modelling the Gaussian probability density function using piecewise linear segments. This approach is shown to be both efficient and accurate. It does not require the calculation of transcendental functions.
All of the methods considered map one or more uniform distributions to create the Gaussian deviates. This report investigates the effect of the use of discrete variates, particularly in the tails of the Gaussian distribution. In addition, we give a new interpretation of the method of aliases that suggests its application to non-uniform quantization.
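For reference, a minimal sketch of the method of aliases itself (Vose's construction of Walker's alias table) is given below. It maps one uniform index and one uniform variate to a draw from a discrete distribution, such as the segment probabilities of a piecewise model. It does not reproduce the report's piecewise-linear generator or its new interpretation of the method, and the function names are hypothetical.

```python
import random

def build_alias_table(probs):
    """Vose's construction of Walker's alias table for a discrete distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # keep s with probability scaled[s], else take its alias l
        alias[s] = l
        scaled[l] += scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:      # leftovers are 1 up to rounding error
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """One draw: a uniform column index, then a uniform variate decides column or alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each draw costs one uniform index and one uniform comparison regardless of the number of outcomes, which is why the method suits low-complexity generators.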
Formatting a Thesis with LaTeX
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000 (updated June 2005)
Thesis Macros: ThesisStyle.zip
This report describes the use of LaTeX to format a thesis. A number of topics are covered: content and organization of the thesis, LaTeX macros for controlling the thesis layout, formatting mathematical expressions, generating bibliographic references, importing figures and graphs, generating graphs in Matlab, and formatting tables. The LaTeX macros used to format a thesis (and this document) are described. As well, Matlab procedures are shown to illustrate methods that can be used to format graphs in a form suitable for inclusion in a LaTeX document.
Matlab Plots in Microsoft Word
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, March 2000
Superseded by the version of Jan. 2006
Measuring Speech Activity
MMSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, August 1999
This report discusses the algorithm described in ITU-T Recommendation P.56 for measuring the active speech level. Method B in P.56 determines a speech activity factor, representing the fraction of time that the signal is considered to be active speech (as opposed to background idle noise), and the corresponding active level for the speech part of the signal. The basic algorithm generates an envelope value at each sample time. The envelope values are compared with a discrete set of thresholds. The (approximate) active speech level is determined by interpolating in the log domain between the threshold values. In this report, we assess the effect of interpolation on the active speech level. Recommendation P.56 allows for sampling rates as low as 600 Hz. Results for subsampled data are compared with those calculated at the full speech sampling rate.
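A hypothetical sketch of the interpolation step only is shown below; the envelope computation and the threshold comparisons are omitted. It assumes that, for each threshold c_j (in dB), an active level A_j (in dB) has already been measured, and that the level is taken where A - c falls to a margin, assumed here to be the 15.9 dB margin of P.56 Method B. Linear interpolation is done in the dB domain, which is one reading of "interpolating in the log domain" and not necessarily the exact procedure analyzed in the report.

```python
def interpolate_active_level(c_db, a_db, margin_db=15.9):
    """Active speech level (dB) by linear interpolation in the dB domain (hypothetical helper).

    c_db: increasing envelope thresholds (dB); a_db: active level measured with each threshold (dB).
    The level is taken where a - c falls to the margin; returns None if no crossing is found.
    """
    diff = [a - c for a, c in zip(a_db, c_db)]
    for j in range(1, len(diff)):
        if diff[j - 1] >= margin_db >= diff[j] and diff[j - 1] > diff[j]:
            t = (diff[j - 1] - margin_db) / (diff[j - 1] - diff[j])  # fractional crossing position
            return a_db[j - 1] + t * (a_db[j] - a_db[j - 1])
    return None
```

Because the same fraction t is applied to thresholds and levels, the returned level equals the interpolated threshold plus the margin; the coarser the threshold spacing, the more the result depends on this interpolation.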
C. C. Chu and P. Kabal
Codebook Excited Linear Prediction of Speech: Performance in the Presence of Channel Errors
Technical Report 88-10, INRS-Telecommunications, University of Quebec, March 1988.
Code Excited Linear Prediction Coding of Speech at 4.8 kb/s
Technical Report 87-36, INRS-Telecommunications, University of Quebec, July 1987
This report describes a software implementation of an algorithm for digital coding of speech at low bit rates. The reconstruction of the speech signal is accomplished by exciting a cascade of a formant synthesis filter and a pitch synthesis filter with an excitation waveform. The excitation waveform is selected from a dictionary of waveforms using a frequency weighted mean-square error criterion. At transmission rates in the neighborhood of 5 kb/s, this scheme produces speech with better quality than any other known scheme.
J.-L. Moncet and P. Kabal
Codeword Selection for CELP Coders
Technical Report 87-35, INRS-Telecommunications, University of Quebec, July 1987
This report describes the algorithm used for selecting an excitation waveform for a CELP coder operating at 5 kb/s. Each candidate waveform is used to synthesize a segment of speech. A frequency weighted error criterion is used to find the waveform which regenerates the best output speech. The synthesis operation uses both a pitch synthesis filter and a formant synthesis filter. The pitch synthesis filter is optimized to give the best output speech. This optimization offers a significant improvement over a procedure which uses a pitch filter chosen by analyzing the input speech. Simplified sequential versions of this strategy also give good quality speech. The quantization of the parameters is also considered.
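A minimal sketch of a generic analysis-by-synthesis codebook search is shown below. It assumes that the target is the weighted input speech with the filter memory contribution already removed, that the weighted synthesis filter is represented by an all-pole 1/A(z) with A(z) = 1 - sum_k a_k z^-k, and that the pitch synthesis filter is omitted. For each candidate with filtered output y, the optimal gain is g = (t . y)/(y . y), and the remaining error is ||t||^2 - (t . y)^2/(y . y), so the best candidate maximizes (t . y)^2/(y . y). Function names are hypothetical; this is not the report's exact algorithm.

```python
import numpy as np

def impulse_response(a, n):
    """Impulse response of 1/A(z), A(z) = 1 - sum_k a[k-1] z^-k, truncated to n samples."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc += ak * h[i - k]
        h[i] = acc
    return h

def search_codebook(target, codebook, h):
    """Return (index, gain) of the codevector whose filtered output best matches target."""
    N = len(target)
    best_idx, best_gain, best_score = -1, 0.0, -np.inf
    for idx, c in enumerate(codebook):
        y = np.convolve(c, h)[:N]          # zero-state response of the synthesis filter
        energy = np.dot(y, y)
        if energy <= 0.0:
            continue
        corr = np.dot(target, y)
        score = corr * corr / energy       # error reduction achieved with the optimal gain
        if score > best_score:
            best_idx, best_gain, best_score = idx, corr / energy, score
    return best_idx, best_gain
```

If the target is exactly a scaled, filtered codevector, the search returns that codevector and its scale factor, since by the Cauchy-Schwarz inequality no other candidate can achieve a larger score.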
C. C. Chu and P. Kabal
Coding of LPC Parameters for Low Bit Rate Speech Coders
Technical Report 87-19, INRS-Telecommunications, University of Quebec, March 1987
This report summarizes the results of a study of the use of line spectral frequencies (LSFs) for the low bit rate coding of the linear predictive (LPC) parameters for use in a speech coder. Different forms of quantization using LSFs for the LPC coefficients are examined. An LSF-based scheme allows the quantizer to take into account the perceptual impact of spectral distortion. One of the schemes considered takes advantage of the frame-to-frame correlation of the LSF parameters. The LSF-based coding scheme is compared to a quantization based on a reflection coefficient representation. The application of this quantization scheme to the low-rate (4800 bits/sec) Code Excited Linear Predictive (CELP) coder is considered. In this context, adaptive gain factors for the differential (frame-to-frame) LSF quantizer are useful. In addition, a frame-to-frame interpolation scheme is proposed. With these modifications, LSF coding of 10 LPC parameters requires 1150 bits/sec.
L. Barbeau, D. Bernardi, C. C. Chu, P. Kabal, J.-L. Moncet, and D. O'Shaughnessy
Speech Enhancement in the Presence of Interfering Music and Noise
Technical Report 87-09, INRS-Telecommunications, University of Quebec, Jan. 1987
This report summarizes the results of speech enhancement experiments for a signal consisting of speech in the presence of interfering music and noise. Filtering was applied to remove hum and high frequency components. The composite signal was then frequency equalized to flatten the noise spectrum. A reference recording of the same passage of music which interferes with the original recording was obtained and time aligned with the composite recording.
The time-aligned reference music was processed through an adaptive filter and then subtracted from the composite recording. This results in a noticeable reduction and muffling of the music. Before cancellation, the music tended to dominate the composite signal; after cancellation, the speech generally has a higher level than the music.
A number of other techniques were also investigated. The most successful of these is spectral subtraction, which suppresses, in the composite signal, those frequency components present in the music. This suppresses the music, but since the desired speech also contains the same frequency components, the speech quality is also affected.
The adaptive filtering approach has the least subjective effect on the speech components but does not completely suppress the music. The speech is considerably more intelligible after music cancellation has been carried out. Spectral subtraction lends a somewhat unnatural quality to the resulting signal, but achieves more complete suppression of the music. The speech is slightly muffled. The intelligibility of the speech can be judged to be about the same as, or better than, that obtained with the adaptive filtering approach.
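A minimal sketch of the adaptive cancellation step follows. It assumes the reference music is already time-aligned with the composite recording and uses a normalized LMS (NLMS) update. The summary does not state which adaptive algorithm was used, so NLMS, the filter order and the step size are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def nlms_cancel(d, ref, order=32, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered reference from the composite signal d (NLMS update)."""
    w = np.zeros(order)
    buf = np.zeros(order)        # most recent reference samples, buf[0] = ref[n]
    e = np.zeros(len(d))
    for n in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = np.dot(w, buf)       # estimate of the music component in the composite signal
        e[n] = d[n] - y          # output: composite minus estimated music (speech + residual)
        w += (mu / (eps + np.dot(buf, buf))) * e[n] * buf
    return e, w
```

When the composite signal contains speech as well, the speech acts as noise in the adaptation, which is one reason such a canceller reduces but does not completely remove the music.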
R. P. Ramachandran and P. Kabal
The Computation of Line Spectral Frequencies Using Chebyshev Polynomials
Technical Report 85-27, INRS-Telecommunications, University of Quebec, Sept. 1985
Line spectral frequencies provide an alternate parameterization of the analysis and synthesis filters used in linear predictive coding (LPC) of speech. In this paper, a new method of converting between the direct form predictor coefficients and line spectral frequencies is presented. Both even and odd order LPC systems are considered. The system polynomial for the analysis filter is converted to two even order symmetric polynomials with interlacing roots on the unit circle. The line spectral frequencies are given by the positions of the roots of these two auxiliary polynomials. The response of each of these polynomials on the unit circle is expressed as a series expansion in Chebyshev polynomials. The line spectral frequencies are found using an iterative root finding algorithm which searches for real roots of a real function. The algorithm developed is simple in structure and is designed to constrain the maximum number of evaluations of the series expansions. The method is highly accurate and can be used in a form that avoids the storage of trigonometric tables or the computation of trigonometric functions. The reconversion of line spectral frequencies to predictor coefficients uses an efficient algorithm derived by expressing the root factors as an expansion in Chebyshev polynomials.
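The Chebyshev-series root search of the report is not reproduced here. As a simple reference check, the sketch below assumes an even order p and the convention A(z) = 1 + sum_k alpha_k z^-k. It forms P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) and finds the LSFs by direct polynomial root finding with numpy.roots, which relies on the trigonometric and eigenvalue computations the report's method is designed to avoid. Reconversion multiplies out the quadratic root factors, with the odd-numbered LSFs assigned to P and the even-numbered ones to Q. The function names are hypothetical.

```python
import numpy as np

def lpc_to_lsf(A):
    """LSFs (radians, increasing) of A(z) = 1 + sum_k A[k] z^-k; assumes even order and minimum phase."""
    A = np.asarray(A, dtype=float)
    Ae = np.concatenate((A, [0.0]))
    Ar = np.concatenate(([0.0], A[::-1]))
    P = Ae + Ar                          # symmetric: trivial root at z = -1
    Q = Ae - Ar                          # antisymmetric: trivial root at z = +1
    w = np.concatenate((np.angle(np.roots(P)), np.angle(np.roots(Q))))
    eps = 1e-9
    return np.sort(w[(w > eps) & (w < np.pi - eps)])   # drop trivial roots and negative angles

def lsf_to_lpc(lsf):
    """Inverse for even order: rebuild P and Q from root factors and average them."""
    P = np.array([1.0, 1.0])             # (1 + z^-1)
    Q = np.array([1.0, -1.0])            # (1 - z^-1)
    for i, w in enumerate(lsf):
        factor = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if i % 2 == 0:
            P = np.convolve(P, factor)
        else:
            Q = np.convolve(Q, factor)
    return 0.5 * (P + Q)[:-1]
```

For A(z) = 1 - 1.2 z^-1 + 0.5 z^-2, P(z) factors as (1 + z^-1)(1 - 1.7 z^-1 + z^-2) and Q(z) as (1 - z^-1)(1 - 0.7 z^-1 + z^-2), so the LSFs are arccos(0.85) and arccos(0.35).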
P. Kabal and B. Sayar
Rounding and Scaling in Fixed-Point FFT Implementations
Technical Report 85-24, INRS-Telecommunications, University of Quebec, June 1985
The calculation of the discrete Fourier transform using a fast Fourier transform (FFT) algorithm with fixed-point arithmetic is considered. The input data is scaled to prevent overflow and to maintain accuracy. New conditions on the magnitudes of the input components to avoid overflow during the computation of the FFT are derived. Particular emphasis is placed on an implementation using a digital signal processing architecture based on a 16-bit fixed-point representation for the data and the provision for double precision accumulation of sums and products. Simulation results to assess the error performance (signal-to-noise ratio) are presented for various forms of the implementation. Algorithm variants as well as different rounding options are compared. Execution times for implementations based on a single-chip signal processor (the Texas Instruments TMS320) are also given. These show that a considerable increase in accuracy can be obtained, with only a small penalty in execution time, by applying an alternating form of rounding rather than truncation.
P. Kabal and R. Rabipour
Adaptive Transform Coding (ATC) of Speech - Phase II
Technical Report 83-07, INRS-Telecommunications, University of Quebec, April 1983
Quantizers for the Gamma Distribution and Other Symmetrical Distributions
Technical Report 83-08, INRS-Telecommunications, University of Quebec, April 1983
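A Linear Prediction Speech Analysis and Synthesis System for Transmission Rates Below 2400 bps (French title: Un système d'analyse et de synthèse de parole par prédiction linéaire pour un taux de transmission inférieur à 2400 bps)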
Technical Report 82-02, INRS-Telecommunications, University of Quebec, February 1982
Adaptive Transform Coding of Speech at 9.6 kb/s
Technical Report 82-06, INRS-Telecommunications, University of Quebec, May 1982
Feasibility Study of a Hardware Implementation of a 4.8 kb/s RELP Speech Coder
Technical Report 81-08, INRS-Telecommunications, University of Quebec, May 1981
This report investigates the feasibility of a hardware implementation of a speech coder based on Residual Excited Linear Prediction (RELP) at 4.8 kb/s. To this end, the basic RELP algorithm has been restructured and simplified to be compatible with real-time processing. In addition, the computations have been implemented with integer arithmetic using 16-bit precision, augmented with the judicious use of double precision accumulation. An architecture based on a microprocessor supplemented with a peripheral processor built around a high-speed multiplier/accumulator is proposed. This arrangement can be the basis for a simple, cost-effective and flexible implementation of a hardware RELP coder.
Application of Quadrature Mirror Filters to Split Band Voice Coding Process
Technical Report 80-03, INRS-Telecommunications, University of Quebec, January 1980
This report discusses an application of quadrature mirror filters to a coder with 8 sub-bands. This system allows us to take advantage of the differences in the long-term power and in the just-noticeable noise in each band.
Minimum Mean Square Error Quantizers
Technical Report 80-09, INRS-Telecommunications, University of Quebec, May 1980
This report discusses the design of quantizers which minimize the mean square error for a signal with a given probability density function. Tables of optimal non-uniform quantizers are given for signals with Gaussian, Laplace (exponential) and gamma distributions. These figures correct values given previously in the literature. An appendix documents a program for calculating an optimal quantizer for an empirically derived tabulated probability density.
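A minimal sketch of the classical Lloyd iteration for a minimum mean-square error quantizer is shown below. It assumes the density is sampled on a fine uniform grid, which also covers a tabulated, empirically derived density. Thresholds are set midway between output levels, and each output level is moved to the centroid of its cell. This is not the program documented in the report's appendix, and the function name, grid and iteration count are assumptions.

```python
import numpy as np

def lloyd_max(pdf, x, levels, iters=500):
    """Thresholds and output levels of an MMSE quantizer for a density sampled on the uniform grid x."""
    p = pdf(x)                                       # normalization is not needed
    y = np.linspace(x[0], x[-1], levels + 2)[1:-1]   # initial output levels, evenly spread
    for _ in range(iters):
        t = 0.5 * (y[:-1] + y[1:])                   # thresholds midway between output levels
        cell = np.searchsorted(t, x)                 # cell index of each grid point
        for i in range(levels):
            m = cell == i
            mass = p[m].sum()
            if mass > 0:
                y[i] = (x[m] * p[m]).sum() / mass    # centroid of the cell
    t = 0.5 * (y[:-1] + y[1:])
    return t, y
```

For a unit-variance Gaussian density and two levels, the outputs converge to plus and minus sqrt(2/pi), about 0.798, with the threshold at zero.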
M. Belleau and P. Kabal
Optimal Quantizers in Linear Predictive Coding of Speech
Technical Report 80-23, INRS-Telecommunications, University of Quebec, May 1980
D. C. Stevenson and P. Kabal
Comparative Evaluation of Residual-Excited Linear Prediction and Sub-Band Coding for Speech Transmission at 9.6 kb/s
Technical Report 79-14, INRS-Telecommunications, University of Quebec, October 1979
Simulation of Digital Coding Techniques for Speech Transmission at 9.6 kb/s
Technical Report 78-08, INRS-Telecommunications, University of Quebec, December 1978
Speech transmission at 9.6 kb/s is of significant interest because that is the highest rate currently attainable over analog voice lines. Two methods of speech coding, residual-excited linear prediction (RELP) and sub-band coding (SBC), are simulated and evaluated.