The WISPR Project (Welsh and Irish Speech Processing Resources)
was a major project funded by the EU "Interreg" programme. Additional
funding was provided by the Welsh Language Board.
The project's aim was to develop
text-to-speech synthesis for the Welsh and Irish languages, together
with collecting speech databases for those languages. It was developed
jointly by the Language Technologies Unit at Canolfan Bedwyr,
University of Wales, Bangor, and Trinity College Dublin, with support
from Dublin City University and University College Dublin.
Very little work had previously been done on developing
speech technology tools for the Welsh and Irish languages. The WISPR team members
believe that the best way to disseminate speech technology in a minority language
environment is to provide freely
distributable tools and applications that are easy for the end user to
use, and liberally licensed so that developers can integrate them into their own software.
Text-to-speech synthesis (TTS) allows a computer to read text aloud. It
is distinct from machine translation, since in TTS the text is not translated,
simply read out. It can be used in screenreaders for visually impaired people,
which read out the contents of the computer screen, such as e-mails and web pages (sometimes at great speed).
It can also be of use when interacting with a computer system over a telephone.
More recent applications involving mobile phones will be able to use TTS for
such tasks as reading out messages.
The WISPR project used the popular open-source "Festival" framework for TTS.
Some enhancements to Festival were developed by the Welsh WISPR team, such
as allowing Festival to cope with input text in UTF-8 format (so that
characters in Welsh could be handled).
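To illustrate why UTF-8 support matters here, the following minimal Python sketch shows that Welsh circumflexed letters such as ŵ and ŷ each occupy two bytes in UTF-8, so software that assumes one byte per character will mishandle them. The function name utf8_byte_lengths is hypothetical, and the sketch is not part of the WISPR or Festival code.

```python
def utf8_byte_lengths(text):
    """Map each non-ASCII character in text to its UTF-8 length in bytes.

    Hypothetical helper for illustration only; not taken from Festival.
    """
    return {ch: len(ch.encode("utf-8")) for ch in text if ord(ch) > 127}


if __name__ == "__main__":
    # "d\u0175r t\u0177" is the Welsh "dŵr tŷ", written with escapes so the
    # example does not depend on the encoding of this source file.
    sample = "d\u0175r t\u0177"
    encoded = sample.encode("utf-8")
    print(len(sample), "characters,", len(encoded), "bytes")
    print(utf8_byte_lengths(sample))
```

A byte-oriented tokeniser would see eight separate units in this six-character string, which is the kind of problem that UTF-8-aware input handling avoids.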
Festival was originally developed at
the Centre for Speech Technology
Research, University of Edinburgh, and subsequently
at Carnegie Mellon University, USA, under the "Festvox" project.
All the resources and applications developed during the WISPR project are
free of charge, and available for download from these pages. In
addition, all source code is openly available according to a BSD-style
license. This ensures the fewest restrictions possible, so that the
WISPR speech technology outputs can be used with a wide
range of software, including proprietary and open source software.
The outputs are classified
according to the type of people who will need them, as follows:-
- End users: End users who simply wish to install and use
a Welsh speech synthesiser.
- Developers: Developers with relevant technical expertise
who wish to adapt and work on the Welsh TTS system, or who wish to develop
a Festival TTS voice for another language.
The resources available are as follows:
1 Resources for end users
2 Resources for developers
1 Resources for end users
1.1 Welsh text-to-speech synthesis for Windows
To download a Windows version of the three Welsh TTS voices, click here.
This will bring up the background information and installation instructions.
To download the voices, click on the link under "Installer", and follow the installation instructions further down the page. This will install the three voices: a basic-quality male South Welsh speaker, a female North Welsh speaker, and a male North Welsh speaker. It is possible to switch between the three voices "on the fly".
1.2 Online Speaking Clock
Two natural-sounding time-telling voices (one male, one female)
have been included in a web-based demo at the following
2 Resources for developers
2.1 Welsh TTS using Unix/Linux/Cygwin
When using Unix, Linux or Cygwin, it is necessary to download and
compile Festival, which includes the Edinburgh Speech Tools
(EST). This can be downloaded from this link (which
includes the WISPR Welsh-specific enhancements to the Festival code). The EST
code should be compiled before the Festival code.
Next, some code common to
all voices should be downloaded. This includes the file "welshtoken.scm",
which should be placed in the directory <Festival directory>/lib/voices/welsh/Tokenisation/
Finally, the individual voices can be
downloaded and installed. The voices comprise the following:
- "Old" diphone voice hl_diphone: This is a
male South Welsh diphone voice, with lower speech quality, but with recent
enhancements (UTF-8 input, and support for tokenisation of
abbreviations, acronyms, etc).
- "New" North Welsh diphone voices cb_cy_cw (female)
and cb_cy_llg (male): compared to the "old" voice, these include improved
letter-to-sound rules and a statistical model for duration.
- Time-telling voices cb_amser_cw_ldom (female) and cb_time_llyr_ldom (male):
These are both North Welsh voices using a limited-domain unit selection
technique in which time intervals are rounded to the nearest five minutes.
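The rounding step used by the time-telling voices can be sketched in Python as follows. This is a minimal illustration, not the WISPR implementation: the function name round_to_five_minutes is hypothetical, and the choice to round exact ties (2 minutes 30 seconds) upwards is an assumption.

```python
from datetime import datetime, timedelta


def round_to_five_minutes(t):
    """Round a datetime to the nearest five-minute mark.

    Assumption: a time exactly halfway between two marks rounds up.
    """
    # Seconds elapsed since the start of the current hour.
    seconds = t.minute * 60 + t.second
    # Adding half an interval (150 s) before integer division rounds to nearest.
    rounded = ((seconds + 150) // 300) * 300
    start_of_hour = t.replace(minute=0, second=0, microsecond=0)
    # timedelta handles carrying over into the next hour or the next day.
    return start_of_hour + timedelta(seconds=rounded)


if __name__ == "__main__":
    print(round_to_five_minutes(datetime(2005, 9, 4, 14, 58)))
```

Rounding keeps the set of possible utterances small (twelve minute values per hour), which is what makes a limited-domain voice practical to record.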
2.2 Python scripts for use when building a new voice or other tasks
Several tools were produced for use in building a new Festival voice. These
take the form of Python scripts. Python is a scripting language which can be
downloaded from www.python.org.
The tools are as follows:-
- SpeechCluster: This is a multipurpose suite of tools
for automating several tedious aspects of developing a new Festival voice.
Further details and downloads are on
the SpeechCluster page.
- Optese: This is a highly efficient implementation of the
"greedy algorithm" for selecting texts for recording when developing
a unit selection voice in
TTS. Further details and downloads are on
the Optese page.
- pyHTK: This script is a convenient wrapper around the
HTK suite for creating a speech recogniser. It makes HTK easier to use, especially
for those who
are not experienced in its use. Further details and downloads are available
on the pyHTK page.
- lff2scm: This script takes a set of hand-written letter-to-sound
rules as input, in context-sensitive critically-ordered format ("linguist-friendly"
format). It outputs the rules in the "Scheme" format used by Festival. It
can be downloaded from the lff2scm page.
- txt2xml: This Python script is designed to be used in
conjunction with the JSpeechRecorder software
from the Bavarian Archive for Speech Signals, for recording
speech. The script takes a plain text file as input, one recording prompt
per line, where the first line of the file is an arbitrary comment or metadata
line gives the base filename for output recorded speech. The output file
is an XML file suitable for use with JSpeechRecorder. The script can
be downloaded here.
- utils: This is a suite of short Python scripts for various low-level tasks.
Further details and downloads are on the utils page.
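As an illustration of the "greedy algorithm" for text selection mentioned under Optese, here is a minimal Python sketch. It is not Optese's code: the names units and greedy_select are hypothetical, and adjacent character pairs are used as a crude stand-in for the phonetic units (such as diphones) that a real system would derive from a pronunciation lexicon.

```python
def units(sentence):
    """Return the set of adjacent character pairs in a sentence.

    Assumption: character pairs stand in for real phonetic units.
    """
    chars = sentence.lower().replace(" ", "_")
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def greedy_select(sentences, max_prompts=None):
    """Repeatedly pick the sentence that adds the most uncovered units.

    Stops when no remaining sentence adds anything new, or when
    max_prompts sentences have been chosen.
    """
    covered = set()
    selected = []
    remaining = list(sentences)
    while remaining and (max_prompts is None or len(selected) < max_prompts):
        # max() returns the first best candidate, so ties keep input order.
        best = max(remaining, key=lambda s: len(units(s) - covered))
        gain = units(best) - covered
        if not gain:
            break
        selected.append(best)
        covered |= gain
        remaining.remove(best)
    return selected, covered


if __name__ == "__main__":
    prompts, coverage = greedy_select(["abcd", "cdef", "xy"], max_prompts=2)
    print(prompts, len(coverage))
```

This naive version recomputes every candidate's gain on each pass, so it is far from the "highly efficient" implementation described above; it only shows the selection principle.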
2.3 Recording scripts for Welsh
2.4 Recorded speech data for Welsh
2.5 Technical documentation
2.6 Scientific papers
of creating a research capability in speech technology for two minority languages.
Briony Williams, Delyth Prys and Ailbhe Ní Chasaide. Interspeech 2005
(9th European Conference on Speech Science and Technology, Lisbon, Portugal,
4-8 September 2005).
- Poster to accompany the above paper.
- SpeechCluster: A speech database builder's
multitool. Ivan A. Uemlianin. Paper given at Lesser Used Languages & Computer
Linguistics, European Academy Bozen/Bolzano, Italy, October 2005 (to be published
synthesis for Welsh and Irish: an overview. Briony Williams.
Powerpoint format presentation given at the conference of the North American
Association of Celtic Language Teachers, Bangor, UK, June 9-12 2005.
Speech Processing Resources for Welsh and Irish. Delyth
Prys, Briony Williams, Bill Hicks, Dewi Jones, Ailbhe Ní Chasaide,
Christer Gobl, Julie Berndsen, Fred Cummins, Máire Ní Chiosáin,
John McKenna, Rónán Scaife, Elaine Uí Dhonnchadha.
Pre-Conference Workshop on "First Steps for Language Documentation
of Minority Languages", 4th Language Resources and Evaluation Conference
(LREC), Lisbon, Portugal, 24-30 May 2004.
technology in Welsh and Irish: the WISPR project. Briony
Williams, Delyth Prys and Dewi Jones. ELRA Newsletter (European Language
Resources Association), vol. 9, no. 4, Oct-Dec 200
University of Wales, Bangor 2001 - 2006