Introduction

The WISPR (Welsh and Irish Speech Processing Resources) project was a major project funded by the EU "Interreg" programme. Additional funding was provided by the Welsh Language Board.

The project's aim was to develop text-to-speech synthesis for the Welsh and Irish languages, and to collect speech databases for those languages. It was carried out jointly by the Language Technologies Unit at Canolfan Bedwyr, University of Wales, Bangor, and Trinity College Dublin, with support from Dublin City University and University College Dublin.

Very little work had previously been done on developing speech technology tools for the Welsh and Irish languages. The WISPR team members believe that the best way to disseminate speech technology in a minority language environment is to provide freely distributable tools and applications that are easy for the end user to use, and liberally licensed so that developers can integrate them into their own software.

Text-to-Speech Synthesis

Text-to-speech synthesis (TTS) allows a computer to read text aloud. It is distinct from machine translation: in TTS the text is not translated, simply read out. It can be used in screenreaders for visually impaired people, which read out the contents of the computer screen, such as e-mails and web pages (sometimes at great speed). It can also be of use when interacting with a computer system over a telephone. More recent applications involving mobile phones will be able to use TTS for such tasks as reading out messages.

The WISPR project used the popular open-source "Festival" framework for TTS. Some enhancements to Festival were developed by the Welsh WISPR team, such as allowing Festival to cope with input text in UTF-8 format (so that all possible characters in Welsh could be handled).
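To illustrate why UTF-8 input matters for Welsh, the short Python sketch below shows that the circumflexed letters ŵ and ŷ cannot be represented in the single-byte Latin-1 encoding, whereas UTF-8 can encode them (using two bytes each). This is an illustration only and is not part of the WISPR code; the function name `encoding_report` is hypothetical.

```python
def encoding_report(text):
    """Return (number of UTF-8 bytes, whether the text fits in Latin-1)."""
    utf8_length = len(text.encode("utf-8"))
    try:
        text.encode("latin-1")
        fits_latin1 = True
    except UnicodeEncodeError:
        # Raised when a character has no Latin-1 code point, e.g. ŵ or ŷ.
        fits_latin1 = False
    return utf8_length, fits_latin1


if __name__ == "__main__":
    # Welsh words for "water", "house" and "sea".
    for word in ["dŵr", "tŷ", "môr"]:
        print(word, encoding_report(word))
```

Letters such as ô are available in Latin-1, but ŵ and ŷ are not, which is why a single-byte Western European encoding is not enough for all Welsh text.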

Festival was developed originally at the Centre for Speech Technology Research, University of Edinburgh, and subsequently at Carnegie Mellon University, USA, under the "Festvox" project.

WISPR outputs

All the resources, software and applications developed during the WISPR project are free of charge for non-commercial purposes, and available for download from these pages. In addition, most of the resources are openly available according to a BSD-style license (the exception is the older "hl_diphone" voice, which is free only for non-commercial purposes). This ensures the fewest restrictions possible, so that the WISPR speech technology outputs can be used with a wide range of software, including proprietary and open source software.

However, all resources, software and applications are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please click here to view the WISPR resources license

The outputs are classified according to the type of people who will need them, as follows:

1 Resources for end users

2 Resources for developers

1 Resources for end users

1.1 Welsh text-to-speech synthesis for Windows

To download a Windows version of the three Welsh TTS voices, with instructions in Welsh, click here.

To download a Windows version of the three Welsh TTS voices, with instructions in English, click here.

This will bring up the background information and installation instructions.

To download the voices, click on the link under "Installer", and follow the installation instructions further down the page. This will install the three voices: a basic-quality male South Welsh speaker, a female North Welsh speaker, and a male North Welsh speaker. It is possible to switch between the three voices "on the fly".

The lead voice developer in the production of the Welsh MSAPI voice was Dr Briony Williams.
Technical work was carried out by Dr Rhys James Jones and Dewi Bryn Jones.

1.2 Online Speaking Clock

Two natural-sounding time-telling voices (one male, one female) have been included in a web-based demo at the following website.

The speaking clock was developed by Dr Briony Williams (creation of recording prompts and overall supervision) and Dr Ivan Uemlianin (production of final system).

2 Resources for developers

2.1 Welsh TTS using Unix/Linux/Cygwin

When using Unix, Linux or Cygwin, it is necessary to download and compile Festival, which includes the Edinburgh Speech Tools (EST). This can be downloaded from this directory (it includes the WISPR Welsh-specific enhancements to the Festival code). The EST code should be compiled before the Festival code.
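The build order described above can be sketched in Python as follows. This is only a rough outline under stated assumptions: the directory names `speech_tools` and `festival`, and the `./configure` followed by `make` sequence, reflect the usual layout of Festival source distributions and are not confirmed by this page, so the README files in the downloaded archives should take precedence. The function names `build_steps` and `run_build` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_steps(source_root):
    """Return (directory, command) pairs in the order they should be run."""
    root = Path(source_root)
    steps = []
    # Assumed directory names; EST (speech_tools) must come before Festival.
    for component in ("speech_tools", "festival"):
        steps.append((root / component, ["./configure"]))
        steps.append((root / component, ["make"]))
    return steps


def run_build(source_root):
    """Run each build step in order, stopping at the first failure."""
    for directory, command in build_steps(source_root):
        subprocess.run(command, cwd=directory, check=True)
```

Calling `run_build` with the directory into which both archives were unpacked would then compile EST first and Festival second.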

Next, some code common to all voices should be downloaded. This includes the file "welshtoken.scm", which should be placed in the directory <Festival directory>/lib/voices/welsh/Tokenisation/
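As a minimal sketch of this step, the following Python function copies "welshtoken.scm" into the directory named above, creating it if it does not yet exist. The function name `install_token_file` is hypothetical, and the location of the Festival directory depends on where Festival was unpacked.

```python
import shutil
from pathlib import Path


def install_token_file(festival_dir, token_file):
    """Copy the tokenisation file into <Festival directory>/lib/voices/welsh/Tokenisation/."""
    target_dir = Path(festival_dir) / "lib" / "voices" / "welsh" / "Tokenisation"
    # Create the directory tree if the voices have not been installed yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(token_file, target_dir))
```

For example, `install_token_file("festival", "welshtoken.scm")` would be appropriate if both the Festival directory and the downloaded file are in the current working directory.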

Finally, the individual voices can be downloaded and installed. The voices comprise the following:

The lead developer for the Welsh voices was Dr Briony Williams, who had the following responsibilities:

Overall responsibility for the WISPR TTS voice creation project;
Creation of diphone recording script;
Creation of letter-to-sound rules;
Creation of lexicon;
Creation of "old" diphone voice in the 1990s;
Co-ordination of recording process;
Co-ordination of processing of recordings (removal of artefacts, pitchmarking, LPC encoding);
Co-ordination of suprasegmental processing (CART trees for duration and phrase breaks);
Co-ordination of new tokenisation and UTF-8 work.

Others who worked on the production of the WISPR Welsh voices were the following:

Dr Rhys James Jones

Tokenisation rules (adding a new "hook" into the Festival code) and UTF-8 input handling;
Several edits to the Festival Scheme code;
Implementing the removal of artefacts from the recordings;
Implementing the production of MSAPI versions of the voices.

Dr Ivan Uemlianin

Processing of recordings (trimming, pitchmarking, LPC encoding);
Creation of the time-telling voices from recorded prompts;
Creation of several special-purpose scripts in Python (see below);
Assisting with the recording sessions.

Dr Ksenia Shalonova

Training of CART tree for duration.

2.2 Python scripts for use when building a new voice or other tasks

Several tools were produced for use in building a new Festival voice. These take the form of Python scripts. Python is a scripting language which can be downloaded from www.python.org. The tools are as follows:

2.3 Recording scripts for Welsh

2.4 Recorded speech data for Welsh

2.5 Technical documentation

  1. Welsh phoneset, plus details (written by Dr Briony Williams)
  2. How to build Festival (and EST) natively in Windows (written by Dr Rhys James Jones)
  3. How to include a lexicon and LTS rules in a Welsh Festival voice (written by Dr Ivan Uemlianin)
  4. How to build a Welsh Festival diphone voice (written by Dr Ivan Uemlianin)
  5. How to train a CART tree for duration using data labelled hierarchically under Emu with .hlb files (the relevant files are here) (written by Dr Briony Williams and Dr Ksenia Shalonova)

2.6 Scientific papers

2006
2005
2004

Language Technologies
(Canolfan Bedwyr)
University of Wales, Bangor 2001 - 2006