S0042 : POLYCOST

The POLYCOST speech database was recorded during January-March 1996 as a common initiative entitled "Speaker Recognition in Telephony" within the COST 250 action. The main purpose of the database is to compare and validate speaker recognition algorithms. The data was collected via international telephone lines, with more than five sessions per speaker, and with English spoken by non-native speakers.

The database contains 1,285 calls (around 10 sessions per speaker) recorded by 133 speakers (74 males and 59 females) from 13 different countries. Approximately 10 speakers per country were provided by each partner.

Each session comprises 15 prompts, including one prompt for DTMF detection, 10 prompts with connected digits uttered in English, 2 prompts with sentences uttered in English and 2 prompts in the speaker's mother tongue. One of the prompts in the speaker's mother tongue consists of free speech.

* English:

  • 4 prompts distributed throughout the session in which the speaker pronounces his or her 7-digit client code
  • 5 prompts distributed throughout the session in which the speaker pronounces a sequence of 10 digits (the same from session to session and from speaker to speaker)
  • 2 prompts in which the speaker pronounces the sentences "Joe took father's green shoe bench out" and "He eats several light tacos", as fixed password phrases which are common to all speakers
  • 1 prompt in which the speaker is supposed to give his or her international phone number

* Mother tongue:

  • 1 prompt in which the speaker gives his or her first name, family name, gender (female/male), town and country
  • 1 prompt with free speech

The database was collected through the European telephone network and was recorded through an ISDN card on an XTL SUN platform with an 8 kHz sampling rate. Most of the calls were automatically classified by DTMF detection. Manual classification was used when no DTMF code or a wrong DTMF PIN code was received (circa 10% of the database).

Character set: ISO-8859-1
Medium:
CD-ROMs. The first CD contains speech data from speakers M001-M069, and the second CD contains data from speakers F001-F060 plus M070-M074.
Total size: CD1: 636 MB
Total size: CD2: 610 MB
File format:
A-law, 8 kHz sampling rate, 8 bits/sample, with no file header.
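
Because the files carry no header, a reader has to supply the format parameters itself. The following is a minimal sketch of how such a file might be decoded to 16-bit linear PCM, assuming standard ITU-T G.711 A-law encoding and one sample per byte. The function names are hypothetical and are not part of the database distribution.

```python
SAMPLE_RATE_HZ = 8000  # stated in the file format description above


def alaw_to_linear(a_val: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a_val ^= 0x55                    # undo the even-bit inversion used by A-law
    t = (a_val & 0x0F) << 4          # 4-bit mantissa
    seg = (a_val & 0x70) >> 4        # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    else:
        t += 0x108
        if seg > 1:
            t <<= seg - 1
    return t if a_val & 0x80 else -t  # sign bit set means positive


# Precompute all 256 values once so decoding is a table lookup.
_ALAW_TABLE = [alaw_to_linear(b) for b in range(256)]


def decode_alaw(raw: bytes) -> list[int]:
    """Decode headerless A-law bytes into a list of linear PCM samples."""
    return [_ALAW_TABLE[b] for b in raw]


def duration_seconds(raw: bytes, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """One byte is one sample, so duration is byte count over sample rate."""
    return len(raw) / sample_rate
```

A file would be read with `open(path, "rb").read()` and passed to `decode_alaw`; the path and file naming are left to the user, since the directory layout of the CDs is not described here.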

 




Copyright © 2002 ELDA