S0042 : POLYCOST
The POLYCOST speech database was recorded during January-March 1996 as a common
initiative entitled "Speaker Recognition in Telephony" within the COST 250
action. The main purpose of the database is to compare and validate speaker
recognition algorithms. The data was collected via international telephone lines,
with more than five sessions per speaker, and with English spoken by non-native speakers.
The database contains 1,285 calls (around 10 sessions per speaker) made
by 133 speakers (74 males and 59 females) from 13 different countries. Each partner
provided approximately 10 speakers per country.
Each session comprises 15 prompts, including one prompt for
DTMF detection, 10 prompts with connected digits uttered in English, 2 prompts
with sentences uttered in English and 2 prompts in the speaker's mother tongue.
One of the prompts in the speaker's mother tongue consists of free speech.
* English:
- 4 prompts distributed throughout the session in which the speaker pronounces
his or her 7-digit client code
- 5 prompts distributed throughout the session in which the speaker pronounces
a sequence of 10 digits (the same from session to session and from speaker
to speaker)
- 2 prompts in which the speaker pronounces the sentences "Joe took
father's green shoe bench out" and "He eats several light tacos",
as fixed password phrases which are common to all speakers
- 1 prompt in which the speaker is supposed to give his or her international
phone number
* Mother tongue:
- 1 prompt in which the speaker gives his or her first name, family name,
gender (female/male), town and country
- 1 prompt with free speech
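The session layout above can be checked with a short tally. The following is a minimal sketch in Python; the dictionary keys are illustrative labels chosen here, not names used in the database, and it assumes the international phone number prompt counts among the 10 connected-digit prompts, which is the reading under which the stated totals agree.

```python
# Prompt counts per session, transcribed from the description above.
# The keys are illustrative labels (assumptions), not database identifiers.
SESSION_PROMPTS = {
    "dtmf_detection": 1,
    "english_digits_client_code": 4,      # 7-digit client code
    "english_digits_fixed_sequence": 5,   # same 10-digit sequence for everyone
    "english_digits_phone_number": 1,     # international phone number
    "english_sentences": 2,               # fixed password phrases
    "mother_tongue_personal_details": 1,
    "mother_tongue_free_speech": 1,
}


def count_prompts(prefix=""):
    """Sum the prompt counts whose label starts with the given prefix."""
    return sum(n for name, n in SESSION_PROMPTS.items() if name.startswith(prefix))


total_prompts = count_prompts()                     # all prompts in one session
digit_prompts = count_prompts("english_digits")     # connected-digit prompts
native_prompts = count_prompts("mother_tongue")     # mother-tongue prompts
```

Under these assumptions the tally gives 15 prompts per session, 10 of them connected digits and 2 in the mother tongue, matching the figures stated above.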
The database was collected through the European telephone network
and was recorded through an ISDN card on an XTL SUN platform with an 8 kHz sampling
rate. Most of the calls were automatically classified by DTMF detection. Manual
classification was used when the DTMF PIN code was missing or wrong (circa
10% of the database).
Character set: ISO-8859-1
Medium: CD-ROMs. The first CD contains speech data from speakers M001-M069,
and the second CD contains data from speakers F001-F060 plus M070-M074.
Total size: CD1: 636 MB
Total size: CD2: 610 MB
File format: A-law, 8 kHz sampling rate, 8 bits/sample, with no file header.
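Because the files carry no header, a reader has to supply the format parameters itself. The following is a minimal sketch in Python of one way to expand such data to 16-bit linear PCM, assuming the samples follow the standard ITU-T G.711 A-law coding; the function names are illustrative and are not part of the database distribution.

```python
import struct


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                      # undo the alternate-bit inversion of A-law
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        magnitude <<= segment - 1
    # After the inversion, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


# Precompute all 256 codes once so decoding is a table lookup.
_DECODE_TABLE = [alaw_to_linear(c) for c in range(256)]


def decode_alaw(raw: bytes) -> list:
    """Decode headerless A-law bytes into a list of linear samples."""
    return [_DECODE_TABLE[b] for b in raw]


def duration_seconds(raw: bytes, sample_rate: int = 8000) -> float:
    """One byte is one sample, so duration is byte count over sample rate."""
    return len(raw) / sample_rate


def to_pcm16_bytes(samples) -> bytes:
    """Pack linear samples as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)
```

A file would be read in binary mode and passed straight to decode_alaw, since there is no header to skip. At 8 bits per sample and 8 kHz, one second of speech occupies 8,000 bytes.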