This data (and the data it refers to) is copyright 2007, 2008 by Greg Kochanski, and is licensed in England under Creative Commons Noncommercial-Attribution License. You may copy and/or use this file (and referenced files) for noncommercial purposes so long as the author is properly acknowledged. For commercial licensing, contact Isis Innovation.
This corpus contains the data from the "Tick1" experiment from ESRC grant "Articulation and Coarticulation in the Lower Vocal Tract" with G. Kochanski and J. Coleman as principal investigators. Data is courtesy of the UK's Economic and Social Research Council, derived from project RES-000-23-1094, 7/2005 through 3/2008.
The files DB.fiat, DBsub.fiat and DBsent.fiat contain metadata describing the recordings, in the FIAT 1.2 file format. The experimental data itself consists of speech recordings, which are stored in subdirectories. The corpus also contains hand-checked files that mark the beginning and end of utterances, and hand-checked positions for finger taps and metronome ticks.
This corpus of data consists partly of short files of repetitive speech: phrases like "Nothing Matters. Nothing Matters. Nothing Matters. ..." (There are 75 different phrases.) The remainder consists of the same phrases (and a few others) spoken in a more standard laboratory phonology context: a randomized list of phrases.
It also includes some longer, rhythmic passages from Dr. Seuss.
There are 14 speakers, all of them speakers of Southern British English. The corpus contains 1308 audio files, totalling 2.6 gigabytes of uncompressed audio.
The corpus contains a large number of directories. Each directory contains several files of interest:
The original recording, in Microsoft WAV format. It is a two-channel file. One channel contains the recorded speech, and the other channel contains either metronome ticks or an audio channel from a microphone positioned to pick up finger taps. (The subject's finger tapped on a hardcover book about 2cm from the microphone.) The finger tap channel will pick up some speech, but faintly, and the speech channel will pick up some finger tap sounds. However, metronome ticks were coupled in electronically and are completely isolated from the speech channel.
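As a rough illustration of working with the two-channel recording, the sketch below separates the interleaved channels using only the Python standard library. It assumes 16-bit PCM samples, which the text above does not state, and it makes no claim about which channel holds the speech; the function name split_channels is hypothetical.

```python
import array
import sys
import wave


def split_channels(path):
    """Return (sample_rate, channel_0, channel_1) for a two-channel WAV file.

    Assumption: samples are 16-bit PCM. The corpus documentation does not
    say which channel carries speech, so inspect both.
    """
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian machines.
        samples.byteswap()

    # Stereo frames are interleaved: ch0, ch1, ch0, ch1, ...
    return rate, samples[0::2], samples[1::2]
```

Because the tap microphone also picks up faint speech, comparing the overall level of the two channels is one plausible way to tell them apart in a tap recording.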
These are the start and end-points of the speech in the utterance, automatically generated but checked for accuracy by a human. A small amount of silence (probably <100ms) is included within the marked endpoints on either side of the utterance. See the above publication for details. The data files are in a format suitable for reading by the ESPS package Xwaves, and can be read by Wavesurfer. Python 2.5 code for reading these files is available on Sourceforge, in the speechresearch project, in file gmisclib/xwaves_lab.py . In brief, the format contains a bunch of header lines, then a line consisting of a single hash mark ('#'), then two relevant lines. The one containing an asterisk in the third field marks the utterance start (the time is in the first field). Likewise, the line containing '%' marks the end. Times are relative to the beginning of the raw.wav files.
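The format described above can be read with a few lines of Python 3. This is only a minimal sketch based on that description, not the gmisclib/xwaves_lab.py code: it skips everything up to the lone '#' line, then looks for '*' and '%' in the third whitespace-separated field. That the '%' marker also sits in the third field is an assumption, and the function name is hypothetical.

```python
def read_utterance_endpoints(lines):
    """Return (start_time, end_time) in seconds from label-file lines.

    Either value is None if its marker line is not found.
    Assumption: fields are whitespace-separated and the '%' marker, like
    the '*' marker, appears in the third field.
    """
    start = end = None
    in_body = False
    for line in lines:
        stripped = line.strip()
        if not in_body:
            # The header ends at a line consisting of a single hash mark.
            in_body = (stripped == "#")
            continue
        fields = stripped.split()
        if len(fields) < 3:
            continue
        time = float(fields[0])  # time relative to the start of raw.wav
        if "*" in fields[2]:
            start = time
        elif "%" in fields[2]:
            end = time
    return start, end
```

It could be used as `with open(path) as f: start, end = read_utterance_endpoints(f)`, where path points at one of the label files.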
This file contains experimental tick or tap events. For the metronome data, it contains the times at which metronome ticks occur. For the "tick" data, if it exists, it lists the times at which the subject's finger tapped to mark a stressed syllable. This is computed from one of the channels of the raw.wav file, but manually checked. This file is in the Xwaves label format, same as ue.lbl.
This file contains computed tick or tap locations. It is meaningful only for metronome data, where it simply marks the metronome ticks.
This file (and other files with the ".dat" extension) is stored in the GPK ASCII Image format. This can be read by code available on Sourceforge, in the files gpkio/ascii_read.c and gpkio/read.c. (Note that the gpklib library is required for this code; it can be found in the gpklib subdirectory of the same Sourceforge project.) A Python interface to these libraries is available in the gpk_img_python subdirectory of the same project, and is documented at http://kochanski.org/gpk/code/speechresearch/gpk_img_python
This data format consists of a header, followed by data. The header consists of lines in the form value and the data section is a two-dimensional array of values, either in ASCII, in IEEE-754 binary format for floating point values, or in binary integer formats. The header information loosely follows NASA's FITS standard (Flexible Image Transport System). (Incidentally, the same software will read and write FITS format images, too.)
Other files are computed from the raw data, and are preserved for convenience. These were used in the "What marks the beat of speech?" paper.
An irregularity measure that separates voiced speech from unvoiced. It quantifies speech that is not fully voiced.
The perceptual loudness.
A measure of duration for the current syllable. Essentially, it measures how far one can go (in time) before the spectrum changes substantially.
The RMS (intensity or power).
A standard computation of the speech fundamental frequency.
A measurement of the average slope of the speech spectrum.
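For readers unfamiliar with these measures, the simplest of them, RMS, can be sketched as a framewise computation. This is a generic textbook version, not necessarily the computation used to produce the corpus files; the frame length and hop size are free parameters chosen here for illustration.

```python
import math


def frame_rms(samples, frame_len, hop):
    """Return the RMS value of each full frame of `samples`.

    Frames are `frame_len` samples long and start every `hop` samples;
    a trailing partial frame is dropped.
    """
    out = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out
```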
For the full data set, please see http://www.phon.ox.ac.uk/tick1_info .
When using the data with "rep*" in the "text" field, the appropriate publication to reference is DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?" G. Kochanski and C. Orphanidou, Journal of the Acoustical Society of America, ISSN 0001-4966, Volume 123(5), pages 2780-2791.
Files whose text field is in the form "sent" are long lists of randomized sentences. These "sent" files were used, along with the "rep*" files, in another publication: "Testing the Ecological Validity of Repetitive Speech", Greg Kochanski and Christina Orphanidou, presented at the 2007 International Congress of the Phonetic Sciences (ICPhS2007), 6-10 August 2007. It is available on the web at http://kochanski.org/gpk/papers/2007/icphs.pdf, http://ora.ouls.ox.ac.uk/objects/uuid:1999c687-49a0-4808-9a50-2f82ab66d96f , or http://tinyurl.com/3u2ba4 .
Utterances with "rep*" in the text field are repetitive speech; each phrase is repeated 10-15 times in succession. Files where the text field equals "fox", "king", and "lucky" are longer texts that were not used. They are from three books by Dr. Seuss (Geisel).
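The text-field conventions above can be summarized in a small helper. This is a sketch under two assumptions: that the asterisk in "rep*" is a wildcard (any value beginning with "rep"), and that the text field has already been extracted from the metadata as a string. The function name and the category labels are hypothetical.

```python
from fnmatch import fnmatchcase


def classify_text_field(text):
    """Map a metadata text-field value to the kind of material it denotes."""
    if fnmatchcase(text, "rep*"):
        return "repetitive speech"
    if text == "sent":
        return "randomized sentence list"
    if text in ("fox", "king", "lucky"):
        return "longer Dr. Seuss text"
    return "other"
```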
More detailed documentation is in the DB.fiat file that contains the bulk of the metadata. Some comments by the Oxford Library system on putting this corpus on the web are at http://oxfordrepo.blogspot.com/2008/10/modelling-and-storing-phonetics.html
Last Modified Mon Jan 10 17:10:31 2011. Greg Kochanski.