
S0052 : FIXED0IT - Italian Fixed Network Speech SpeechDat(M) Corpus

DB1 Phonetically rich sentences & application oriented utterances

The Italian Fixed Network Speech SpeechDat(M) Corpus version 1.0 was recorded within the scope of the SpeechDat(M) project (LRE-63314), funded by the European Commission. Recordings were made through a primary rate ISDN interface, yielding an A-law coded signal sampled at 8 kHz with 8 bits per sample. The data files follow the format defined by the European SAM project. The speech data are compressed with the GNU gzip program. All software needed to use the corpus is provided on the CDs.
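For readers who want to inspect the audio without the bundled software, the signal format described above (gzip-compressed, 8 kHz, 8-bit A-law) can be expanded to linear PCM with a few lines of Python. This is only a minimal sketch, not the tooling shipped on the CDs. It assumes that each speech file, once decompressed, is a headerless stream of A-law bytes, which the description does not state explicitly. The function names are hypothetical, and the expansion follows the standard ITU-T G.711 A-law rule.

```python
import gzip

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the corpus description


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed linear sample."""
    code ^= 0x55                     # undo the bit inversion applied by G.711
    magnitude = (code & 0x0F) << 4   # 4-bit mantissa
    segment = (code & 0x70) >> 4     # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_gzipped_alaw(blob: bytes) -> list[int]:
    """Decompress gzip data and expand every byte as an A-law sample.

    Assumption: the decompressed payload is raw A-law with no header.
    """
    raw = gzip.decompress(blob)
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(samples: list[int]) -> float:
    """Length of the decoded signal in seconds at 8 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

In practice the bytes of one compressed speech file would be passed to decode_gzipped_alaw, and the resulting integers can be written out as 16-bit PCM for listening or feature extraction.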

The corpus contains the speech of about 1,000 speakers (about 500 males and 500 females) and was designed to support the creation of voice-driven teleservices. The callers spoke at least 39 items, comprising:

  • isolated and connected digits
  • natural numbers
  • money amounts
  • spelled words
  • time and date phrases
  • yes/no questions
  • city names
  • common application words
  • application words in phrases
  • phonetically rich sentences

Most items are read; some are spoken spontaneously.

The recordings come with extensive and standardised documentation. All speech is carefully transcribed at the orthographic level; in addition, a number of clearly audible non-speech events are included in the transcription. Moreover, age and regional background of the speakers are provided. A pronunciation dictionary is added, containing all words that occur in the corpus, with a corresponding SAMPA broad-class phonemic transcription.

Validation and premastering of the CD-ROMs were performed by the Speech Processing Expertise Centre (SPEX), Leidschendam, The Netherlands.

DB2 Phonetically rich sentences sub-set (S0053)

See ELRA-S0052 for the description. DB2 is a sub-set of DB1; it contains only the phonetically rich sentence items.

 




Copyright © 2002 ELDA