S0006: BREF sub-corpus BREF-80
The BREF corpus was designed to provide enough read speech data for the development and evaluation of continuous speech recognition
systems (both speaker-dependent and speaker-independent), and
to provide a large corpus of continuous speech for the acquisition
of acoustic-phonetic knowledge of spoken French. All the recorded
texts were selected from extracts of the French newspaper Le Monde
so as to provide a large vocabulary (over 20,000 words) and a
wide range of phonetic environments. The entire BREF corpus contains
over 100 hours of speech material from 120 speakers.
The BREF-80 sub-corpus consists of two ISO 9660 CD-ROMs, BREF80-1
and BREF80-2, containing speaker-independent training data from
80 speakers. Together these two CDs contain 5,330 sentences, an
average of about 67 sentences per speaker. Although these data
represent only a small portion of the entire BREF corpus, the
sentences were selected to cover most of the BREF training prompts,
so as to preserve a wide range of phonetic contexts with a minimum
amount of speech data. The BREF-80 sub-corpus on these CDs was
therefore selected specifically for training speaker-independent,
vocabulary-independent speech recognizers.
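The entry does not say how the BREF-80 sentences were chosen. One common way to pick a small subset of prompts that still covers many phonetic contexts is greedy set cover: repeatedly take the sentence that adds the most contexts not yet covered. The Python sketch below illustrates that idea only. It assumes phonetic contexts are modeled as diphones (adjacent phone pairs), and the function names `diphones` and `greedy_cover`, the budget parameter and the toy phone strings are all hypothetical, not part of BREF.

```python
from typing import Dict, List, Set


def diphones(phones: List[str]) -> Set[str]:
    """Hypothetical context model: the set of adjacent phone pairs."""
    return {f"{a}-{b}" for a, b in zip(phones, phones[1:])}


def greedy_cover(sentences: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedy set cover (illustrative, not the BREF-80 procedure):
    pick up to `budget` sentences, each time taking the one that adds
    the most contexts not covered by the sentences already chosen."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(sentences)
    while remaining and len(chosen) < budget:
        # Sentence contributing the largest number of new contexts
        # (ties go to the first such sentence in insertion order).
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy prompts with made-up phone transcriptions (hypothetical data).
prompts = {
    "s1": ["b", "o", "Z", "u", "R"],  # 4 diphones
    "s2": ["b", "o"],                 # 1 diphone, already in s1
    "s3": ["s", "a", "l", "y"],       # 3 diphones
}
contexts = {name: diphones(p) for name, p in prompts.items()}
selected = greedy_cover(contexts, budget=3)
print(selected)  # ['s1', 's3']: s2 adds no new diphone, so it is skipped
```

With this greedy rule a redundant sentence such as `s2` is never selected, which mirrors the stated goal of keeping phonetic variety while minimizing the amount of speech data. A real selection would use a richer context model (for example triphones) and the full set of training prompts.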