WISPR: Welsh and Irish Speech Processing Resources

Introduction

The WISPR (Welsh and Irish Speech Processing Resources) project was a major project funded by the EU "Interreg" programme, with additional funding from the Welsh Language Board.

The project aimed to develop text-to-speech synthesis for the Welsh and Irish languages and to collect speech databases for both. It was carried out jointly by the Language Technologies Unit at Canolfan Bedwyr, University of Wales, Bangor, and Trinity College Dublin, with support from Dublin City University and University College Dublin.

Very little work had previously been done on speech technology tools for Welsh and Irish. The WISPR team believes that the best way to disseminate speech technology in a minority-language environment is to provide freely distributable tools and applications that are easy for end users to use, and liberally licensed so that developers can integrate them into their own software.

Text-to-Speech Synthesis

Text-to-speech synthesis (TTS) allows a computer to read text aloud. It is distinct from machine translation: in TTS the text is not translated, only read out. TTS is used in screenreaders for visually impaired people, which read out the contents of the computer screen, such as e-mails and web pages (sometimes at great speed). It is also useful when interacting with a computer system over the telephone, and newer mobile phone applications can use it for tasks such as reading out messages.

The WISPR project used the popular open-source "Festival" framework for TTS. The Welsh WISPR team developed several enhancements to Festival, such as support for input text in UTF-8 format, so that every character used in Welsh (including ŵ and ŷ) can be handled.
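
To illustrate why UTF-8 support matters for Welsh, the short Python sketch below (not part of the WISPR code) checks which Welsh circumflexed vowels cannot be represented in the older ISO-8859-1 (Latin-1) encoding, and shows the UTF-8 byte sequences for those that cannot. The names WELSH_CIRCUMFLEX, latin1_unsupported and utf8_bytes are hypothetical, chosen only for this example.

```python
from __future__ import annotations

# Welsh circumflexed vowels. Latin-1 contains â, ê, î, ô and û,
# but not ŵ (U+0175) or ŷ (U+0177).
WELSH_CIRCUMFLEX = "âêîôûŵŷ"


def latin1_unsupported(text: str) -> list[str]:
    """Return the characters in text that cannot be encoded as Latin-1."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 byte sequence of a character as hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    print(latin1_unsupported(WELSH_CIRCUMFLEX))
    for ch in "ŵŷ":
        print(ch, utf8_bytes(ch))
```

Both ŵ and ŷ need two bytes in UTF-8 and have no Latin-1 code at all, so a system restricted to Latin-1 input cannot represent them.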

Festival was originally developed at the Centre for Speech Technology Research, University of Edinburgh, and subsequently at Carnegie Mellon University, USA, under the "Festvox" project.

WISPR outputs

All the resources and applications developed during the WISPR project are free of charge and available for download from these pages. All source code is also openly available under a BSD-style licence. This imposes as few restrictions as possible, so that the WISPR speech technology outputs can be used with a wide range of software, both proprietary and open source.

The outputs are grouped according to the people who will need them:

End users: people who simply wish to install and use a Welsh speech synthesiser.
Developers: people with relevant technical expertise who wish to adapt and work on the Welsh TTS system, or to develop a Festival TTS voice for another language.

The resources available are as follows:

1 Resources for end users

1.1 Welsh text-to-speech synthesis for Windows
1.2 Online Speaking Clock

2 Resources for developers

2.1 Welsh TTS using Unix/Linux/Cygwin
2.2 Python scripts for use when building a new voice or other tasks
2.3 Recording scripts for Welsh
2.4 Recorded speech data for Welsh
2.5 Technical documentation
2.6 Scientific papers

1 Resources for end users

1.1 Welsh text-to-speech synthesis for Windows

To download a Windows version of the three Welsh TTS voices, click here. This brings up background information and installation instructions.

To download the voices, click the link under "Installer" and follow the installation instructions further down that page. This installs three voices: a basic-quality male South Welsh speaker, a female North Welsh speaker and a male North Welsh speaker. You can switch between the three voices "on the fly".

1.2 Online Speaking Clock

Two natural-sounding time-telling voices (one male, one female) have been included in a web-based demo at the following website.

2 Resources for developers

2.1 Welsh TTS using Unix/Linux/Cygwin

On Unix, Linux or Cygwin, you need to download and compile Festival, which includes the Edinburgh Speech Tools (EST). Both can be downloaded from this directory; this version of the code includes the WISPR Welsh-specific enhancements to Festival. Compile the EST code before the Festival code, since Festival is built on top of EST.

Next, download the code common to all voices. This includes the file "welshtoken.scm", which should be placed in the following directory: <Festival directory>/lib/voices/welsh/Tokenisation/
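
As a minimal sketch of this installation step, assuming Python 3 and that welshtoken.scm has already been downloaded, the file could be copied into place as follows. The function name install_welsh_tokeniser is hypothetical; copying the file by hand works equally well.

```python
import shutil
from pathlib import Path


def install_welsh_tokeniser(festival_dir: str, source: str = "welshtoken.scm") -> Path:
    """Copy the Welsh tokenisation file into the Festival tree.

    Destination layout, as described on this page:
    <Festival directory>/lib/voices/welsh/Tokenisation/
    """
    target_dir = Path(festival_dir) / "lib" / "voices" / "welsh" / "Tokenisation"
    # Create the directory (and any missing parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    # copy2 copies the file contents and also preserves its timestamps.
    shutil.copy2(source, target)
    return target
```

For example, install_welsh_tokeniser("/usr/local/festival") would create /usr/local/festival/lib/voices/welsh/Tokenisation/welshtoken.scm, where /usr/local/festival is a hypothetical stand-in for your own Festival directory.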

Finally, download and install the individual voices, which are as follows:

"Old" diphone voice hl_diphone: This is a male South Welsh diphone voice with lower speech quality, but with recent enhancements (UTF-8 input, and tokenisation of Welsh abbreviations, acronyms, etc.).
"New" North Welsh diphone voices cb_cy_cw (female) and cb_cy_llg (male): compared with the "old" voice, these have improved letter-to-sound rules and a statistical duration model.
Two time-telling voices cb_amser_cw_ldom (female) and cb_time_llyr_ldom (male): These are both North Welsh voices built with a limited-domain unit selection technique, in which times are rounded to the nearest five minutes.
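
The rounding rule used by the time-telling voices can be sketched in Python as follows. This shows only the arithmetic, assuming times are given on a 24-hour clock in whole minutes and wrap at midnight; it is not the voices' actual implementation, and the name round_to_five_minutes is hypothetical.

```python
from __future__ import annotations

MINUTES_PER_DAY = 24 * 60


def round_to_five_minutes(hour: int, minute: int) -> tuple[int, int]:
    """Round a 24-hour clock time to the nearest five minutes."""
    total = hour * 60 + minute
    # Adding 2 before integer division sends a remainder of 0-2 minutes
    # down and a remainder of 3-4 minutes up.
    rounded = (total + 2) // 5 * 5
    # Wrap past midnight, e.g. 23:58 becomes 00:00.
    rounded %= MINUTES_PER_DAY
    return divmod(rounded, 60)
```

Because minutes are whole numbers, no time lies exactly halfway between two five-minute marks, so there is never a tie to break: 9:07 rounds down to 9:05 and 9:08 rounds up to 9:10.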

2.2 Python scripts for use when building a new voice or other tasks

Several tools were produced for use in building a new Festival voice. These take the form of Python scripts (Python is a scripting language that can be downloaded from www.python.org). The tools are as follows:

SpeechCluster: This is a multipurpose suite of tools for automating several tedious aspects of developing a new Festival voice. Further details and downloads are on the SpeechCluster page.
Optese: This is an efficient implementation of the "greedy algorithm" for selecting texts to record when developing a unit selection TTS voice. Further details and downloads are on the Optese page.
pyHTK: This script is a convenient wrapper around the HTK toolkit for building a speech recogniser. It makes HTK easier to use, especially for those with little experience of it. Further details and downloads are available on the pyHTK page.
lff2scm: This script takes as input a set of hand-written letter-to-sound rules in context-sensitive, critically ordered ("linguist-friendly") format, and outputs them in the Scheme format used by Festival. It can be downloaded from the lff2scm page.
txt2xml: This Python script is designed for use with the JSpeechRecorder software from the Bavarian Archive for Speech Signals, for recording speech. It takes a plain text file as input: the first line is an arbitrary comment or metadata field, the second line gives the base filename for the recorded speech files, and each subsequent line holds one recording prompt. The output is an XML file suitable for use with JSpeechRecorder. The script can be downloaded here.
utils: This is a suite of short Python scripts for various low-level tasks. Further details and downloads are on the utils page.
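
The greedy algorithm that Optese implements can be outlined as follows: at each step, pick the candidate sentence that adds the most units not yet covered by the sentences already chosen, and stop when a budget is reached or no sentence adds anything new. The Python sketch below is a simplified illustration, not Optese itself: it assumes that adjacent letter pairs stand in for diphones (a real voice build would use phone transcriptions), and the names units and greedy_select are hypothetical.

```python
from __future__ import annotations


def units(sentence: str) -> set[str]:
    """Hypothetical unit extractor: adjacent letter pairs stand in for diphones.

    '#' marks word boundaries, so boundary transitions count as units too.
    """
    s = "#" + sentence.lower().replace(" ", "#") + "#"
    return {s[i:i + 2] for i in range(len(s) - 1)}


def greedy_select(sentences: list[str], max_sentences: int) -> list[str]:
    """Repeatedly pick the sentence that adds the most not-yet-covered units."""
    covered: set[str] = set()
    remaining = {i: units(s) for i, s in enumerate(sentences)}
    chosen: list[str] = []
    while remaining and len(chosen) < max_sentences:
        # Score = number of new units; ties go to the earliest sentence.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(sentences[best])
        covered |= gain
        del remaining[best]
    return chosen
```

For example, greedy_select(["ab", "abc", "cd"], 2) returns ["abc", "cd"]: "abc" covers the most pairs, and "cd" then adds three new ones while "ab" adds only one. Greedy selection of this kind does not guarantee the smallest possible recording script, but it is simple and usually gives good coverage quickly.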

2.3 Recording scripts for Welsh

Pseudo-Welsh nonsense words used as prompts for a Welsh diphone voice: gogdiphs
300 sentences from the Welsh Bible (some are very long): basic-000-299
351 sentences from a Welsh undergraduate dissertation: sentences 000-181, sentences 182-350
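
Diphone nonsense-word prompts are generally designed so that every pair of adjacent phones is recorded at least once in a fixed, controlled context. The Python sketch below shows the idea: each ordered pair of phones is one diphone, so a set of n phones gives n × n candidates, each embedded in a carrier frame. The phone list and the carrier frame "t a _ _ a t" are hypothetical and are not taken from the gogdiphs prompts.

```python
from __future__ import annotations

from itertools import product


def diphone_prompts(phones: list[str], left: str = "t a", right: str = "a t") -> dict[str, str]:
    """Map each diphone 'x-y' to a nonsense-word prompt containing it.

    The carrier frame (left/right context) is hypothetical; frames are
    typically chosen to keep the target diphone in a stable context.
    """
    prompts: dict[str, str] = {}
    # product(..., repeat=2) yields every ordered pair, including x == y.
    for first, second in product(phones, repeat=2):
        prompts[f"{first}-{second}"] = f"{left} {first} {second} {right}"
    return prompts
```

With the three phones ["a", "b", "p"], this gives nine prompts, for example "t a b p a t" for the diphone b-p.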

2.4 Recorded speech data for Welsh

Pseudo-Welsh nonsense words, male North Welsh speaker: sound files, pitchmark files and label files
Pseudo-Welsh nonsense words, female North Welsh speaker: sound files, pitchmark files and label files
First 265 of the 300 sentences from the Bible, male North Welsh speaker (sound files only): sentences 000-264
351 sentences from a Welsh undergraduate dissertation, male North Welsh speaker (sound files only): sentences 000-181, sentences 182-350
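
Label files record where each speech segment ends. As a sketch, assuming the xlabel-style format often used with Festival (header lines, a line containing only "#", then one segment per line as an end time, a colour field and the label), a file could be read into segments with start and end times as follows. The format is an assumption, since this page does not specify it, and read_label_file is a hypothetical name.

```python
from __future__ import annotations


def read_label_file(text: str) -> list[tuple[str, float, float]]:
    """Parse xlabel-style label text into (label, start, end) tuples.

    Assumed layout: optional header lines, a line containing only '#',
    then one segment per line as '<end_time> <colour> <label>'.
    """
    lines = [line.strip() for line in text.splitlines()]
    # Skip the header if there is one; otherwise treat every line as data.
    body = lines[lines.index("#") + 1:] if "#" in lines else lines
    segments: list[tuple[str, float, float]] = []
    start = 0.0  # assume the first segment starts at time zero
    for line in body:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        end = float(fields[0])
        label = " ".join(fields[2:])
        segments.append((label, start, end))
        start = end  # each segment begins where the previous one ended
    return segments
```

The duration of each segment is then end minus start, which is the kind of quantity a statistical duration model could be trained on.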

2.5 Technical documentation

2.6 Scientific papers

2006

Integrating Festival and Windows. Rhys James Jones, Ambrose Choy, Briony Williams. Interspeech 2006 (9th International Conference on Spoken Language Processing, Pittsburgh, USA, 17-21 September 2006).
Tools and resources for speech synthesis arising from a Welsh TTS project. Briony Williams, Rhys James Jones and Ivan Uemlianin. Fifth Language Resources and Evaluation Conference (LREC), Genoa, Italy, 24-26 May 2006.
Poster to accompany the above paper.

2005

Experiences of creating a research capability in speech technology for two minority languages. Briony Williams, Delyth Prys and Ailbhe Ní Chasaide. Interspeech 2005 (9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 4-8 September 2005).
Poster to accompany the above paper.
SpeechCluster: A speech database builder's multitool. Ivan A. Uemlianin. Paper given at Lesser Used Languages & Computer Linguistics, European Academy Bozen/Bolzano, Italy, October 2005 (to be published in Proceedings, early 2006)
Text-to-speech synthesis for Welsh and Irish: an overview. Briony Williams. PowerPoint presentation given at the conference of the North American Association of Celtic Language Teachers, Bangor, UK, 9-12 June 2005.

2004

WISPR: Speech Processing Resources for Welsh and Irish. Delyth Prys, Briony Williams, Bill Hicks, Dewi Jones, Ailbhe Ní Chasaide, Christer Gobl, Julie Berndsen, Fred Cummins, Máire Ní Chiosáin, John McKenna, Rónán Scaife, Elaine Uí Dhonnchadha. Pre-Conference Workshop on "First Steps for Language Documentation of Minority Languages", 4th Language Resources and Evaluation Conference (LREC), Lisbon, Portugal, 24-30 May 2004.
Speech technology in Welsh and Irish: the WISPR project. Briony Williams, Delyth Prys and Dewi Jones. ELRA Newsletter (European Language Resources Association), vol. 9, no. 4, Oct-Dec 200

Language Technologies
(Canolfan Bedwyr)
University of Wales, Bangor 2001 - 2006