WISPR: Welsh and Irish Speech Processing Resources

Introduction

The WISPR (Welsh and Irish Speech Processing Resources) project was a major project funded by the EU "Interreg" programme. Additional funding was provided by the Welsh Language Board.

The project's aim was to develop text-to-speech synthesis for the Welsh and Irish languages, and to collect speech databases for those languages. The work was carried out jointly by the Language Technologies Unit at Canolfan Bedwyr, University of Wales, Bangor, and Trinity College Dublin, with support from Dublin City University and University College Dublin.

Very little work had previously been done on developing speech technology tools for the Welsh and Irish languages. The WISPR team members believe that the best way to disseminate speech technology in a minority language environment is to provide freely distributable tools and applications that are easy for end users to use, and liberally licensed so that developers can integrate them into their own software.

Text-to-Speech Synthesis

Text-to-speech synthesis (TTS) allows a computer to read text aloud. It is distinct from machine translation: in TTS the text is not translated, but simply read out. It can be used in screenreaders for visually impaired people, which read out the contents of the computer screen, such as e-mails and web pages (sometimes at great speed). It can also be of use when interacting with a computer system over a telephone. More recent applications involving mobile phones will be able to use TTS for such tasks as reading out messages.

The WISPR project used the popular open-source "Festival" framework for TTS. Some enhancements to Festival were developed by the Welsh WISPR team, such as allowing Festival to cope with input text in UTF-8 format (so that all possible characters in Welsh could be handled).
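As an illustration of why UTF-8 input matters for Welsh, the short Python sketch below checks which circumflexed vowels fit in the single-byte Latin-1 encoding. It is not part of the WISPR or Festival code, and the function names are hypothetical. It shows that "ŵ" and "ŷ" fall outside Latin-1, whereas UTF-8 can encode them (using two bytes each).

```python
# Illustrative sketch only: this is not WISPR or Festival code.
# It shows that some Welsh letters cannot be stored in Latin-1,
# which is one reason UTF-8 input handling is useful for Welsh text.

WELSH_CIRCUMFLEX_VOWELS = "âêîôûŵŷ"


def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in WELSH_CIRCUMFLEX_VOWELS:
        print(ch, "Latin-1:", fits_latin1(ch), "UTF-8 bytes:", utf8_length(ch))
```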

Festival was developed originally at the Centre for Speech Technology Research, University of Edinburgh, and subsequently at Carnegie Mellon University, USA, under the "Festvox" project.

WISPR outputs

All the resources, software and applications developed during the WISPR project are free of charge for non-commercial purposes, and available for download from these pages. In addition, most of the resources are openly available according to a BSD-style license (the exception is the older "hl_diphone" voice, which is free only for non-commercial purposes). This ensures the fewest restrictions possible, so that the WISPR speech technology outputs can be used with a wide range of software, including proprietary and open source software.

However, all resources, software and applications are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please click here to view the WISPR resources license

The outputs are classified according to the type of people who will need them, as follows:-

End users: End users who simply wish to install and use a Welsh speech synthesiser.
Developers: Developers with relevant technical expertise who wish to adapt and work on the Welsh TTS system, or who wish to develop a Festival TTS voice for another language.

The resources available are as follows:

1 Resources for end users

2 Resources for developers

2.1 Welsh TTS using Unix/Linux/Cygwin
2.2 Python scripts for use when building a new voice or other tasks
2.3 Recording scripts for Welsh
2.4 Recorded speech data for Welsh
2.5 Technical documentation
2.6 Scientific papers

1 Resources for end users

1.1 Welsh text-to-speech synthesis for Windows

To download a Windows version of the three Welsh TTS voices, with
instructions in Welsh, click here.

To download a Windows version of the three Welsh TTS voices, with
instructions in English, click here.

This will bring up the background information and installation instructions.

To download the voices, click on the link under "Installer", and follow the installation instructions further down the page. This will install the three voices: a basic-quality male South Welsh speaker, a female North Welsh speaker, and a male North Welsh speaker. It is possible to switch between the three voices "on the fly".

The lead voice developer in the production of the Welsh MSAPI voice was Dr Briony Williams.
Technical work was carried out by Dr Rhys James Jones and Dewi Bryn Jones.

1.2 Online Speaking Clock

Two natural-sounding time-telling voices (one male, one female) have been included in a web-based demo at the following website.

The speaking clock was developed by Dr Briony Williams (creation of recording prompts
and overall supervision) and Dr Ivan Uemlianin (production of final system).

2 Resources for developers

2.1 Welsh TTS using Unix/Linux/Cygwin

When using Unix, Linux or Cygwin, it is necessary to download and compile Festival, which includes the Edinburgh Speech Tools (EST). This can be downloaded from this directory (it includes the WISPR Welsh-specific enhancements to the Festival code). The EST code should be compiled before the Festival code.

Next, some code common to all voices should be downloaded. This includes the file "welshtoken.scm", which should be placed in the directory <Festival directory>/lib/voices/welsh/Tokenisation/
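The placement of "welshtoken.scm" can be sketched in Python as follows. This is a minimal sketch, not a WISPR installer: the function name and its arguments are hypothetical, and only the target sub-directory is taken from the text above.

```python
# Minimal sketch (hypothetical helper, not part of the WISPR downloads):
# copy welshtoken.scm into <Festival directory>/lib/voices/welsh/Tokenisation/
from pathlib import Path
import shutil


def install_tokeniser(festival_dir, source_file):
    """Copy the tokenisation file into the Welsh Tokenisation directory.

    festival_dir: root of the local Festival installation (assumed known).
    source_file:  path to the downloaded welshtoken.scm.
    Returns the path of the installed copy.
    """
    target_dir = Path(festival_dir) / "lib" / "voices" / "welsh" / "Tokenisation"
    # Create the directory tree if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(source_file, target_dir))
```

The Festival directory and the location of the downloaded file depend on the local setup, so both are passed in as arguments rather than assumed.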

Finally, the individual voices can be downloaded and installed. The voices comprise the following:

"Old" diphone voice hl_diphone: This is a male South Welsh diphone voice, with lower speech quality, but with recent enhancements (UTF-8 input, and support for tokenisation of Welsh abbreviations, acronyms, etc).
"New" North Welsh diphone voices cb_cy_cw (female) and cb_cy_llg (male): compared to the "old" voice, these include improved letter-to-sound rules and a statistical model for duration.
Two time-telling voices cb_amser_cw_ldom (female) and cb_time_llyr_ldom (male): These are both North Welsh voices using a limited-domain unit selection technique in which time intervals are rounded to the nearest five minutes.
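The rounding step used by the time-telling voices can be sketched in Python as follows. This is only an illustration of rounding to the nearest five minutes: the function name is hypothetical, and the handling of exact midpoints (rounded up here) is an assumption, since the behaviour of the WISPR voices at midpoints is not stated.

```python
# Sketch of rounding a clock time to the nearest five minutes.
# Assumption: times exactly halfway between two boundaries round up.
from datetime import datetime, timedelta


def round_to_five_minutes(t: datetime) -> datetime:
    """Return t rounded to the nearest five-minute boundary."""
    # Seconds elapsed since the start of the current hour.
    seconds = t.minute * 60 + t.second + t.microsecond / 1e6
    start_of_hour = t.replace(minute=0, second=0, microsecond=0)
    # Adding half an interval (150 s) before floor division rounds to nearest.
    steps = int((seconds + 150) // 300)
    return start_of_hour + timedelta(minutes=5 * steps)


if __name__ == "__main__":
    print(round_to_five_minutes(datetime(2006, 5, 24, 10, 58, 0)))
```

Using a timedelta from the start of the hour means that times near the end of an hour (or day) roll over correctly to the next hour (or day).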

The lead developer for the Welsh voices was Dr Briony Williams, who had the following responsibilities:

Overall responsibility for the WISPR TTS voice creation project;

Creation of diphone recording script;

Creation of letter-to-sound rules;

Creation of lexicon;

Creation of "old" diphone voice in the 1990s;

Co-ordination of recording process;

Co-ordination of processing of recordings (removal of artefacts, pitchmarking, LPC encoding);

Co-ordination of suprasegmental processing (CART trees for duration and phrase breaks);

Co-ordination of new tokenisation and UTF-8 work.

Others who worked on the production of the WISPR Welsh voices were the following:

Dr Rhys James Jones

Tokenisation rules (adding a new "hook" into the Festival code) and UTF-8 input handling;

Several edits to the Festival Scheme code;

Implementing the removal of artefacts from the recordings;

Implementing the production of MSAPI versions of the voices.

Dr Ivan Uemlianin

Processing of recordings (trimming, pitchmarking, LPC encoding);

Creation of the time-telling voices from recorded prompts;

Creation of several special-purpose scripts in Python (see below);

Assisting with the recording sessions.

Dr Ksenia Shalonova

Training of CART tree for duration.

2.2 Python scripts for use when building a new voice or other tasks

Several tools were produced for use in building a new Festival voice. These take the form of Python scripts. Python is a scripting language which can be downloaded from www.python.org. The tools are as follows:-

SpeechCluster: This is a multipurpose suite of tools for automating several tedious aspects of developing a new Festival voice. Further details and downloads are on the SpeechCluster page.
Optese: This is a highly efficient implementation of the "greedy algorithm" for selecting texts for recording when developing a unit selection voice in TTS. Further details and downloads are on the Optese page.
pyHTK: This script is a convenient wrapper around the HTK suite for creating a speech recogniser. It makes HTK easier to use, especially for those who are not experienced in its use. Further details and downloads are available on the pyHTK page.
lff2scm: This script takes a set of hand-written letter-to-sound rules as input, in context-sensitive critically-ordered format ("linguist-friendly" format). It outputs the rules in the "Scheme" format used by Festival. It can be downloaded from the lff2scm page.
txt2xml: This Python script is designed to be used in conjunction with the JSpeechRecorder software from the Bavarian Archive for Speech Signals, for recording speech. The script takes a plain text file as input, one recording prompt per line, where the first line of the file is an arbitrary comment or metadata field, and the second line gives the base filename for output recorded speech. The output file is an XML file suitable for use with JSpeechRecorder. The script can be downloaded here.
utils: This is a suite of short Python scripts for various low-level tasks. Further details and downloads are on the utils page.
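The input layout described for txt2xml above (a comment line, then a base filename, then one prompt per line) can be read with a few lines of Python. The sketch below is not the txt2xml script: the class and function names are hypothetical, skipping blank lines is an assumption, and the XML output stage is omitted because the JSpeechRecorder schema is not described here.

```python
# Sketch of reading the plain-text prompt layout described for txt2xml.
# Hypothetical names throughout; the XML output stage is deliberately omitted.
from dataclasses import dataclass
from typing import List


@dataclass
class PromptFile:
    comment: str          # first line: arbitrary comment or metadata
    base_filename: str    # second line: base filename for recorded speech
    prompts: List[str]    # remaining lines: one recording prompt per line


def parse_prompt_text(text: str) -> PromptFile:
    """Split the text into comment, base filename and prompts."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected a comment line and a base filename line")
    comment = lines[0].strip()
    base_filename = lines[1].strip()
    # Assumption: blank lines are not prompts and are skipped.
    prompts = [line.strip() for line in lines[2:] if line.strip()]
    return PromptFile(comment, base_filename, prompts)
```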

These scripts were created by Dr Ivan Uemlianin.
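The "greedy algorithm" mentioned for Optese can be illustrated with a generic sketch. The Python code below is not the Optese implementation, and its names are hypothetical. As a simplifying assumption it uses character pairs as the units to be covered, where a real TTS system would use phone-based units such as diphones. At each step it picks the sentence that adds the most units not yet covered.

```python
# Generic sketch of greedy text selection (not the Optese implementation).
# Assumption: character bigrams stand in for phone-based units.
from typing import Dict, List, Set


def units(sentence: str) -> Set[str]:
    """Return the set of character bigrams in the sentence."""
    s = sentence.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def greedy_select(sentences: List[str], max_sentences: int) -> List[str]:
    """Pick sentences one at a time, each maximising newly covered units."""
    remaining: Dict[str, Set[str]] = {s: units(s) for s in sentences}
    covered: Set[str] = set()
    chosen: List[str] = []
    while remaining and len(chosen) < max_sentences:
        # Sentence with the largest number of not-yet-covered units.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds any new unit
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


if __name__ == "__main__":
    print(greedy_select(["abc", "abcd", "xy", "bc"], max_sentences=10))
```

The loop stops either when the requested number of sentences is reached or when no remaining sentence contributes a new unit, so redundant sentences are never selected.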

2.3 Recording scripts for Welsh

Pseudo-Welsh nonsense words used as prompts for a Welsh diphone voice: gogdiphs
300 sentences from the Welsh Bible (some are very long): basic-000-299
351 sentences from a Welsh undergraduate dissertation: sentences 000-181, sentences 182-350

The "nonsense word" script (used for the WISPR voices) was created by Dr Briony Williams,
who also selected and organised the two other databases.

2.4 Recorded speech data for Welsh

Pseudo-Welsh nonsense words, male North Welsh speaker: sound files, pitchmark files and label files
Pseudo-Welsh nonsense words, female North Welsh speaker: sound files, pitchmark files and label files
First 265 of the 300 sentences from the Bible, male North Welsh speaker (soundfiles only): sentences 000-264
351 sentences from a Welsh undergraduate dissertation, male North Welsh speaker (soundfiles only): sentences 000-181, sentences 182-350

The recording sessions and recruitment of voice talents were carried out by Dr Briony Williams,
with assistance from Dr Ivan Uemlianin and Dr Rhys James Jones. The processing of the recordings
was carried out by Dr Ivan Uemlianin, supervised by Dr Briony Williams.

2.5 Technical documentation

Welsh phoneset, plus details (written by Dr Briony Williams)
How to build Festival (and EST) natively in Windows (written by Dr Rhys James Jones)
How to include a lexicon and LTS rules in a Welsh Festival voice (written by Dr Ivan Uemlianin)
How to build a Welsh Festival diphone voice (written by Dr Ivan Uemlianin)
How to train a CART tree for duration using data labelled hierarchically under Emu with .hlb files; the relevant files are here (written by Dr Briony Williams and Dr Ksenia Shalonova)

2.6 Scientific papers

2006

Integrating Festival and Windows. Rhys James Jones, Ambrose Choy, Briony Williams. Interspeech 2006 (9th International Conference on Spoken Language Processing, Pittsburgh, USA, 17-21 September 2006).
Tools and resources for speech synthesis arising from a Welsh TTS project. Briony Williams, Rhys James Jones and Ivan Uemlianin. Fifth Language Resources and Evaluation Conference (LREC), Genoa, Italy, 24-26 May 2006.
Poster to accompany the above paper.

2005

Experiences of creating a research capability in speech technology for two minority languages. Briony Williams, Delyth Prys and Ailbhe Ní Chasaide. Interspeech 2005 (9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 4-8 September 2005).
Poster to accompany the above paper.
SpeechCluster: A speech database builder's multitool. Ivan A. Uemlianin. Paper given at Lesser Used Languages & Computer Linguistics, European Academy Bozen/Bolzano, Italy, October 2005 (to be published in Proceedings, early 2006)
Text-to-speech synthesis for Welsh and Irish: an overview. Briony Williams. Powerpoint format presentation given at the conference of the North American Association of Celtic Language Teachers, Bangor, UK, June 9-12 2005.

2004

WISPR: Speech Processing Resources for Welsh and Irish. Delyth Prys, Briony Williams, Bill Hicks, Dewi Jones, Ailbhe Ní Chasaide, Christer Gobl, Julie Berndsen, Fred Cummins, Máire Ní Chiosáin, John McKenna, Rónán Scaife, Elaine Uí Dhonnchadha. Pre-Conference Workshop on "First Steps for Language Documentation of Minority Languages", 4th Language Resources and Evaluation Conference (LREC), Lisbon, Portugal, 24-30 May 2004.
Speech technology in Welsh and Irish: the WISPR project. Briony Williams, Delyth Prys and Dewi Jones. ELRA Newsletter (European Language Resources Association), vol. 9, no. 4, Oct-Dec 2004

Language Technologies
(Canolfan Bedwyr)
University of Wales, Bangor 2001 - 2006