CompAudio


Routine:

CompAudio [options] AFileA [AFileB]

Purpose:

Compare audio files, printing statistics

Description:

This program gathers and prints statistics for one or two input audio files. With one input file, this program prints the statistics for that file. With two input files, the signal-to-noise ratio (SNR) of the second file relative to the first file is also printed. For this calculation, the first audio file is used as the reference signal. The "noise" is the difference between sample values in the files.

For SNR comparisons between two-channel audio files, the data is treated as complex values. For more than two channels, the statistics of the individual channels are computed. For a very large number of channels, the audio files are treated as if they were single channel files.

For each file, the following statistical quantities are calculated and printed for each channel. However if the number of channels is large, these values are calculated across all channels.

Mean:
 Xm = SUM x(i) / N
Standard deviation:
 sd = sqrt [ (SUM x(i)^2 - Xm^2) / (N-1) ]
Max value:
 Xmax = max (x(i))
Min value:
 Xmin = min (x(i))

The above values are reported as a percent of full scale. For instance for 16-bit integer data, full scale is 32768.

Number of Overloads:
Count of values at or exceeding the full scale value. For integer data from a saturating A/D converter, the presence of such values is an indication of a clipped signal.
Number of Anomalous Transitions:
An anomalous transition is defined as a transition from a sample value greater than +0.5 full scale directly to a sample value less than -0.5 full scale or vice versa. A large number of such transitions is an indication of byte-swapped data.
Active Level:
This measurement calculates the active speech level using Method B of ITU-T Recommendation P.56. The smoothed envelope of the signal is used to divide the signal into active and non-active regions. The active level is the average envelope value for the active region. The activity factor in percent is also reported. The speech activity measurements are calculated only for stereo or mono files with sampling rates appropriat for recording speech.

An optional delay range can be specified when comparing files. The samples in file B are delayed relative to those in file A by each of the delay values in the delay range. For each delay, the SNR with optimized gain factor (see below) is calculated. For the delay corresponding to the largest SNR (with optimized gain), the full regalia of file comparison values is reported.

For multi-channel files, the comparisons are carried out over all samples in all channels. For stereo files, the two channels can be considered to represent the real and imaginary parts of a complex signal.

Conventional SNR:
            SUM |xa(i)|^2
  SNR = --------------------- .
        SUM |xa(i) - xb(i)|^2
The corresponding value in dB is printed.
SNR with optimized gain factor:
  SNR = 1 / (1 - r^2) ,
where r is the (normalized) correlation coefficient,
                 SUM xa(i)*xb'(i)
  r = -------------------------------------- .
      sqrt [ SUM |xa(i)|^2 * SUM |xb(i)|^2 ]
The SNR value in dB is printed. This SNR calculation corresponds to using an optimized gain factor Sf for file B,
       SUM xa(i)*xb'(i)
  Sf = ---------------- .
        SUM |xb(i)|^2
Segmental SNR:
This is the average of SNR values calculated for segments of data. The default segment length corresponds to 16 ms (128 samples at a sampling rate of 8000 Hz). However if the sampling rate is such that the segment length is outside the range of 64 to 1024 samples, the segment is set to the appropriate edge value. For each segment, the SNR is calculated as
                    SUM xa(i)^2
  SS(k) = 1 + ------------------------- .
              eps + SUM [xa(i)-xb(i)]^2
The term eps in the denominator prevents division by zero. The additive unity term discounts segments with SNR's less than unity. The final average segmental SNR in dB is calculated as
  SSNR = 10 * log10 ( 10^[SUM log10 (SS(k)) / N] - 1 ) dB.
The first term in the bracket is the geometric mean of SS(k). The subtraction of the unity term tends to compensate for the unity term in SS(k).

If any of these SNR values is infinite, only the optimal gain factor is printed as part of the message (Sf is the optimized gain factor),

  "File A = Sf * File B".

An example of the output for a stereo file over a range of delays is shown below.

  Delay:  20,  SNR = 5.2461    dB  (Gain for File B = 0.55831)
  Delay:  21,  SNR = 65.394    dB  (Gain for File B = 0.66674)
  Delay:  22,  SNR = 5.2462    dB  (Gain for File B = 0.55831)
  File A:
   Channel 1:
     Number of Samples: 23829
     Std Dev = 855.02 (2.609%),  Mean = -9.9431 (-0.03034%)
     Maximum = 4774 (14.57%),  Minimum = -6156 (-18.79%)
     Active Level: 946.7 (2.889%), Activity Factor: 81.6%
   Channel 2:
     Number of Samples: 23829
     Std Dev = 427.5 (1.305%),  Mean = -4.9737 (-0.01518%)
     Maximum = 2387 (7.285%),  Minimum = -3078 (-9.393%)
     Active Level: 473.34 (1.445%), Activity Factor: 81.6%
  File B:
   Channel 1:
     Number of Samples: 23829
     Std Dev = 1282.4 (3.914%),  Mean = -14.886 (-0.04543%)
     Maximum = 7160 (21.85%),  Minimum = -9233 (-28.18%)
     Active Level: 1418.6 (4.329%), Activity Factor: 81.7%
   Channel 2:
     Number of Samples: 23829
     Std Dev = 641.2 (1.957%),  Mean = -7.4417 (-0.02271%)
     Maximum = 3580 (10.93%),  Minimum = -4617 (-14.09%)
     Active Level: 709.29 (2.165%), Activity Factor: 81.7%
  Best match at delay = 21
  SNR      = 6.0233   dB
  SNR      = 65.394   dB  (Gain for File B = 0.66674)
  Seg. SNR = 6.0203   dB  (256 sample segments)
  Max Diff = 3077 (9.39%),  No. Diff = 46127 (573 runs)

Options:

Input file(s), AFileA [AFileB]:
The environment variable AUDIOPATH specifies a list of directories to be searched for the input audio file(s). Specifying "-" as the input file indicates that input is from standard input. One or two input files can be specified.
-d DL:DU, --delay=DL:DU
Specify a delay range (in samples). Each delay in the delay range represents a delay of file B relative to file A. The default range is 0:0.
-s SAMP, --segment=SAMP
Segment length (in samples) to be used for calculating the segmental signal-to-noise ratio. The default is a length corresponding to 16 ms.
-g GAIN, --gain=GAIN
A gain factor applied to the data from the input files. This gain applies to all channels in a file. The gain value can be given as a real number (e.g., "0.003") or as a ratio (e.g., "1/256"). This option may be given more than once. Each invocation applies to the input files that follow the option.
-l L:U, --limits=L:U
Sample limits for the input files (numbered from zero). Each invocation applies to the input files that follow the option. The specification "L:" means from sample L to the end; "N" means from sample 0 to sample N-1.
-h, --help
Print a list of options and exit.
-v, --version
Print the version number and exit.

See routine CopyAudio for a description of other parameters.

-t FTYPE, --type=FTYPE
Input file type and environment variable AF_FILETYPE
-P PARMS, --parameters=PARMS
Input file parameters and environment variable AF_INPUTPAR

Environment variables:

AUDIOPATH:
This environment variable specifies a list of directories to be searched when opening the input audio files. Directories in the list are separated by colons (semicolons for Windows).

Author / version:

P. Kabal / v10r2 2018-11-16

See Also

CopyAudio, InfoAudio


Main Index AFsp