S0052 : FIXED0IT - Italian Fixed Network Speech SpeechDat(M) Corpus
DB1 Phonetically rich sentences & application oriented utterances
The Italian Fixed Network Speech SpeechDat(M) Corpus, version 1.0, was recorded
within the scope of the SpeechDat(M) project (LRE-63314), funded by the European
Commission. Recordings were made over a primary rate ISDN interface, yielding an
A-law coded signal sampled at 8 kHz with 8 bits per sample. The data files are
formatted according to the conventions of the SAM European project. The speech
data are compressed with the GNU gzip program. All software needed to use the
corpus is provided on the CDs.
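As an illustration of the signal format described above, the following Python sketch decompresses a gzip-compressed file and expands 8-bit A-law samples to 16-bit linear PCM using the standard ITU-T G.711 expansion. It is not the software shipped on the CDs. The function names, the file name in the usage note, and the assumption that a signal file holds raw samples with no header (controlled by the hypothetical header_bytes parameter) are illustrative only and should be checked against the corpus documentation.

```python
import gzip

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the corpus description


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code word to 16-bit linear PCM (ITU-T G.711)."""
    code ^= 0x55                    # undo the alternate-bit inversion of A-law
    magnitude = (code & 0x0F) << 4  # 4-bit mantissa
    segment = (code & 0x70) >> 4    # 3-bit segment number
    if segment == 0:
        magnitude += 0x008
    else:
        magnitude += 0x108
        magnitude <<= segment - 1
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def read_alaw_gz(path: str, header_bytes: int = 0) -> list[int]:
    """Read a gzip-compressed A-law file and return linear PCM samples.

    Assumption: the payload is raw A-law bytes; header_bytes is a
    hypothetical knob for skipping a header if one turns out to exist.
    """
    with gzip.open(path, "rb") as handle:
        raw = handle.read()
    return [alaw_to_linear(byte) for byte in raw[header_bytes:]]


def duration_seconds(samples: list[int]) -> float:
    """Duration of a recording, one sample per byte at 8 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

With a hypothetical file name, read_alaw_gz("utterance.gz") would return a list of integers in the range -32256 to 32256, and duration_seconds applied to that list gives the length of the recording in seconds.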
The corpus contains the speech of about 1,000 speakers (about 500 males and
500 females) and was designed to support the creation of voice-driven teleservices.
Each caller spoke at least 39 items, comprising:
- isolated and connected digits
- natural numbers
- money amounts
- spelled words
- time and date phrases
- yes/no questions
- city names
- common application words
- application words in phrases
- phonetically rich sentences
Most items are read; some are spoken spontaneously.
The recordings come with extensive and standardised documentation. All speech
is carefully transcribed at the orthographic level; in addition, a number of
clearly audible non-speech events are included in the transcription. The age
and regional background of the speakers are also provided. A pronunciation
dictionary is included, containing all words that occur in the corpus, each with
a corresponding SAMPA broad-class phonemic transcription.
Validation and premastering of the CD-ROMs were performed by the Speech Processing
Expertise Centre (SPEX), Leidschendam, The Netherlands.
DB2 Phonetically rich sentences sub-set (S0053)
See ELRA-S0052 for a description. DB2 is a sub-set of DB1; it contains only the
phonetically rich sentence items.