Package lib :: Module percep_spec2

Module percep_spec2


Perceptual spectrum for speech.


Version: $Revision: 1.43 $

Date: $Date: 2007/03/07 23:54:40 $

Classes
  roughness_c
This should be based on Hutchinson and Knopoff 1978; Kameoka and Kuriyagawa 1969a; Viemeister 1988; Aures 1985; Plomp and Levelt 1965.
  peakalign_c
  fft_filterbank
Functions
 
tau_lp(fc)
This is the cutoff frequency of the modulation transfer function in the ear, as a function of frequency.
 
cochlear_filter(fofc)
Transfer function taken from "Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank", IEEE Transactions on Speech and Audio Processing, 10(7), October 2002, pages 495-503.
 
threshold(f)
In pressure amplitude.
 
accum_abs_sub(a, b, o)
This does a little computation in a block-wise fashion to keep all the data within the processor's cache.
 
list_accum_abs_sub_dly(tk, dly)
 
process_voicing(tk, tick, Dt)
Returns: (avg, min) where avg-min is the voicing estimator.
 
block_percep_spec(data, dt, Dt, **kwargs)
This computes the perceptual spectrum in blocks and glues them together.
 
perceptual_spec(data, dt, Dt, bmin=CBmin, bmax=CBmax, db=BBSZ, do_mod=0, do_dissonance=False, PlompBouman=True, do_peakalign=False, e=None)
This returns something roughly like the neural signals leaving the ear.
 
test()
Variables
  pylab = None
  CBmax = erb_scale.f_to_erb(8000.0)
  CBmin = erb_scale.f_to_erb(50.0)
  BBSZ = 0.5
  E = 0.333
  Neural_Tick = 1e-4
  DISS_BWF = 0.25
  MAX_DISS_FREQ = 250.0
  HP_DISS_FREQ = 30.0
  BLOCK_EDGE = 0.3

Imports: sys, M, erb_scale, gpkmisc, die, numpy, VM, power, PS, gammatone2


Function Details

tau_lp(fc)


This is the cutoff frequency of the modulation transfer function in the ear, as a function of frequency. From R. Plomp and M. A. Bouman, "Relation of hearing threshold and duration of tone pulses", JASA 31(6), page 749ff, June 1959.

cochlear_filter(fofc)


Transfer function taken from "Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank", IEEE Transactions on Speech and Audio Processing, 10(7), October 2002, pages 495-503. Author: Frank Baumgarte.

threshold(f)


In pressure amplitude. Crudely taken from Handbook of Perception, vol. 4: Hearing, E. C. Carterette and M. P. Friedman, editors, Academic Press 1978, ISBN 0-12-161904-4. The curve near 70 dB was used.

accum_abs_sub(a, b, o)


This does a little computation in a block-wise fashion to keep all the data within the processor's cache. It computes sum(abs(a - b)).
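
A minimal sketch of the block-wise idea, not the module's implementation. The helper name blockwise_abs_diff_sum and its block size are assumptions made for illustration, the role of the third argument o is not documented here and is left out, and plain Python lists stand in for the numpy arrays the module works with. The sketch walks both inputs in fixed-size chunks, so only a small window of each is touched at a time, and accumulates the sum of absolute differences.

```python
def blockwise_abs_diff_sum(a, b, block=4096):
    """Return sum(abs(a[i] - b[i])), computed one block at a time.

    Hypothetical helper: the name and the block size are illustrative
    assumptions, not taken from percep_spec2.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    total = 0.0
    for start in range(0, len(a), block):
        stop = start + block
        # Work on one small window of each input at a time.
        total += sum(abs(x - y) for x, y in zip(a[start:stop], b[start:stop]))
    return total


a = [1.0, -2.0, 3.5, 0.0, 4.0]
b = [0.5, 2.0, 3.5, -1.0, 1.0]
result = blockwise_abs_diff_sum(a, b, block=2)
```

With numpy arrays the same loop would slice views of a and b, so each block stays small enough to be reused from cache between the subtraction, the absolute value, and the sum.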

process_voicing(tk, tick, Dt)

Parameters:
  • Dt (float) - the sampling rate of the output vectors.
  • tick (float) - the sampling rate of the vectors in tk.
  • tk ([ numpy.ndarray, ...]) - a list of vectors, each of which represents the neural firing of a particular point in the cochlea.
Returns:
(avg, min) where avg-min is the voicing estimator. avg: float, min: float
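
This page does not say how avg and min are computed. As an illustration only, the sketch below assumes an average magnitude difference function (AMDF) evaluated over a range of delays, which is one common way to get an estimator of this shape; the function names, the delay range, and the single-vector input are all assumptions, not the module's API. For a periodic signal the AMDF drops close to zero at delays equal to the period, so avg - min is large; for noise the AMDF is nearly flat, so avg - min is small.

```python
import math
import random


def amdf(x, lag):
    """Mean of |x[t] - x[t + lag]| over the overlapping part of x."""
    n = len(x) - lag
    return sum(abs(x[t] - x[t + lag]) for t in range(n)) / n


def avg_and_min(x, lags):
    """Return (avg, min) of the AMDF over the candidate lags."""
    values = [amdf(x, lag) for lag in lags]
    return sum(values) / len(values), min(values)


# A periodic test signal (period 20 samples) and seeded uniform noise.
periodic = [math.sin(2.0 * math.pi * t / 20.0) for t in range(400)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]

lags = range(5, 60)  # assumed delay range, in samples
p_avg, p_min = avg_and_min(periodic, lags)
n_avg, n_min = avg_and_min(noise, lags)

voiced_score = p_avg - p_min    # large for the periodic signal
unvoiced_score = n_avg - n_min  # small for the noise
```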

block_percep_spec(data, dt, Dt, **kwargs)


This computes the perceptual spectrum in blocks and glues them together. It's useful when the data is too large to fit in memory.
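
A rough sketch of block-and-glue processing under stated assumptions, not the module's code. The helper process_in_blocks, its block_len and edge parameters, and the stand-in analysis smooth3 are hypothetical, and the sketch assumes one output value per input sample (the real function takes separate input and output sampling arguments, dt and Dt). Each block is extended by a margin on both sides so the analysis sees context, and the margins are discarded before the pieces are joined.

```python
def process_in_blocks(data, block_len, edge, fn):
    """Apply fn to overlapping blocks of data and glue the results.

    Each block is extended by `edge` samples on both sides; the extended
    margins are dropped before gluing. fn must return one output per
    input sample. All names here are illustrative assumptions.
    """
    out = []
    n = len(data)
    for start in range(0, n, block_len):
        stop = min(start + block_len, n)
        lo = max(0, start - edge)
        hi = min(n, stop + edge)
        processed = fn(data[lo:hi])
        # Keep only the part that corresponds to data[start:stop].
        out.extend(processed[start - lo:start - lo + (stop - start)])
    return out


def smooth3(x):
    """Three-point moving average with clamped ends (stand-in analysis)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


signal = [float(i * i % 7) for i in range(20)]
glued = process_in_blocks(signal, block_len=6, edge=1, fn=smooth3)
whole = smooth3(signal)
```

Because the margin covers the one-sample reach of smooth3, the glued result matches processing the whole signal at once. An analysis with longer memory would need a correspondingly larger margin.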

perceptual_spec(data, dt, Dt, bmin=CBmin, bmax=CBmax, db=BBSZ, do_mod=0, do_dissonance=False, PlompBouman=True, do_peakalign=False, e=None)


This returns something roughly like the neural signals leaving the ear. It filters into 1-ERB-wide bins, then takes the cube root of the amplitude.

Returns:
(channel_info, data, time_offset), where
  • time_offset is the time of the first output datum relative to the time of the first input sample.
  • channel_info
  • data
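
To make the band layout and compression described for perceptual_spec more concrete, here is a hedged sketch. It assumes the widely used Glasberg and Moore ERB-rate formula, which may differ from what erb_scale.f_to_erb actually computes, and the helpers erb_to_f, band_centers, and compress are hypothetical names introduced for illustration. The defaults mirror the documented limits (50 Hz to 8000 Hz) and the spacing BBSZ = 0.5.

```python
import math


def f_to_erb(f_hz):
    """ERB-rate (in ERB units) for a frequency in Hz, Glasberg-Moore form.

    Assumption: the module's erb_scale.f_to_erb may use another formula.
    """
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)


def erb_to_f(erb):
    """Inverse of f_to_erb (hypothetical helper)."""
    return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3


def band_centers(f_lo=50.0, f_hi=8000.0, step=0.5):
    """Center frequencies spaced `step` ERB units apart from f_lo up to f_hi."""
    e_lo, e_hi = f_to_erb(f_lo), f_to_erb(f_hi)
    n = int(math.floor((e_hi - e_lo) / step)) + 1
    return [erb_to_f(e_lo + i * step) for i in range(n)]


def compress(amplitude, exponent=1.0 / 3.0):
    """Cube-root compression of a non-negative amplitude."""
    return amplitude ** exponent


centers = band_centers()
compressed = [compress(a) for a in (0.0, 1.0, 8.0, 1000.0)]
```

The centers are dense at low frequencies and sparse at high ones, and the compression maps a 1000-fold amplitude range onto roughly a 10-fold range.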