WISPR: Welsh and Irish Speech Processing Resources

Welsh Text-to-Speech for Windows
Festival Speech Synthesis System MSAPI Interface with Welsh Voices, v1.0

REMARKS

This is the public release of three Welsh voices developed as part of the WISPR (Welsh and Irish Speech Processing Resources) project by the Language Technologies Unit (LTU) at Canolfan Bedwyr, Bangor University.

Windows integration is provided by additional software components developed by the LTU, which bridge between Windows and Festival (originally an open source Unix/Linux based speech engine) via the MSAPI (Microsoft Speech API) standard. This release implements only the minimum of MSAPI needed for basic speak functionality.

The Welsh text-to-speech voices support the basic speak command in MSAPI. Varying the speech rate of the voice is also supported.

The Festival speech engine has been stripped down to a minimum in order to reduce the installer size for the three Welsh voices to 56 MB.

The licensing of all speech resources produced by the WISPR project is BSD-like, meaning you have the freedom to use this software in your projects or products at no financial cost. The source code for the MSAPI components will be published soon, but if you have an immediate need then please contact d.b.jones@bangor.ac.uk

Most files in this distribution may be used for commercial and/or non-commercial purposes. However, the "hl_diphone" voice files are for non-commercial use only (see the note below).

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See also the 'Known Issues & Limitations' section below.

This release contains the "hl_diphone" voice files for non-commercial use only (i.e. all files under http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/hl_diphone/ ). Enquiries concerning commercial use of these files should be addressed to the University of Edinburgh.

DOWNLOAD LOCATION

Windows Installer: http://www.e-gymraeg.org/wispr/1.0/wispr_msapi.zip

License Details: http://www.e-gymraeg.org/wispr/1.0/LICENSE.html

Acknowledgements: http://www.e-gymraeg.org/wispr/1.0/ACKNOWLEDGEMENTS.html

VOICES INCLUDED

Three voices are included in the installer. One of them can be selected at installation time. Instructions to change the active voice after installation are given in the ‘Configuration file’ section below.

1)      cb_cy_llg_diphone: North Welsh male, sampling rate 16 kHz

2)      cb_cy_cw_diphone: North Welsh female, sampling rate 16 kHz

3)      hl_diphone: South Welsh male, sampling rate 10 kHz

Voices 1) and 2) were developed wholly by the WISPR project. Voice 3) was originally developed by Dr Briony Williams formerly of the Centre for Speech Technology Research (CSTR), Edinburgh.

PREVIOUS RELEASES

The URLs below refer to previous releases of the MSAPI framework, the early ones of which were available only to beta testers and interested parties. The hl_diphone voice alone was packaged with these releases.

Version 0.1: Release Note: http://www.bangor.ac.uk/ar/cb/wispr/0.1/ReleaseNote.txt

Version 0.2: Release Note: http://www.bangor.ac.uk/ar/cb/wispr/0.2/

Version 0.3: Release Note: http://www.bangor.ac.uk/ar/cb/wispr/0.3/

Version 0.4 was produced for internal use and testing within Canolfan Bedwyr and was not released publicly.

Version 0.5: Release Note: http://www.e-gymraeg.org/wispr/0.5/

FEATURES

-        Config file read on voice initialisation, allowing any installed Festival voice to be used with the MSAPI interface (see Technical notes)

-        Support for stop speaking

-        Support for varying the speech rate

-        Standalone MSAPI interface; client-server setup is not required (see Technical Notes)

-        Supports UTF-8 character encoding and includes extra tokenisations for text found in menu options

-        'Time interpretation' tokenisation for the Welsh Speaking Clock

-        Error Logging (see Technical notes)

KNOWN ISSUES & LIMITATIONS

- The MSAPI interface works best in Windows XP. The Festival engine might not work on Windows 98/ME/2000 machines. Please install onto these operating systems at your own risk.

- If the speech middleware is missing from Windows (as is common in Windows 98/ME), the installer contains merge modules that add it, together with a Speech icon in your Control Panel (see the 'Configuration' section below).

- For the time being, we can only label Windows XP Home and Professional as operating system requirements, though the installer is believed to work with Windows 2000 Professional also.

- Many aspects of MSAPI support are missing, e.g. changing the volume.

- Occasional temporary degradation of voice quality when switching to and from the voice

- The quality of the hl_diphone voice is lower than that of the other two voices, due to Windows having to resample the voice before output.

SYSTEM REQUIREMENTS

- Windows XP Home/Professional

- Intel Pentium class processor (233 MHz or faster recommended)

- 64 MB RAM

- 60 MB free hard disk space

INSTALLATION INSTRUCTIONS

To install the voice

- unzip all contents of the zip file into a temporary folder

- double click on the setup.exe program

- follow instructions on screen.

- re-boot Windows if necessary

To hear the voice

- Go to the Control Panel and choose the Speech icon.

- In Speech Properties dialog window, click on the 'Text to Speech' tab.

- In the voice selection dropdown select 'Festival Welsh Diphone Voice 1.0'

- Enter any Welsh text in the text box underneath

- Press the 'Preview Voice' button to listen to the voice; press stop to terminate the voice
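For developers, the same check can be sketched programmatically through the MSAPI (SAPI 5) automation interface. The sketch below is an illustration only and has not been tested against this release: it assumes Python with the third-party pywin32 package on Windows, and it assumes the voice is registered under the description shown in the dropdown above. The function names find_voice_index and speak_welsh are hypothetical. Only the speak and rate features are used, since these are the ones this release supports.

```python
DEFAULT_VOICE = "Festival Welsh Diphone Voice 1.0"  # name shown in the Speech dialog


def find_voice_index(descriptions, wanted=DEFAULT_VOICE):
    """Return the index of the first description containing `wanted`, or None."""
    for index, description in enumerate(descriptions):
        if wanted.lower() in description.lower():
            return index
    return None


def speak_welsh(text, rate=0):
    """Speak `text` with the Welsh voice via SAPI automation (Windows only)."""
    # Imported here so the helper above can be used without pywin32 installed.
    import win32com.client

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = voice.GetVoices()
    descriptions = [tokens.Item(i).GetDescription() for i in range(tokens.Count)]

    index = find_voice_index(descriptions)
    if index is None:
        raise RuntimeError("Welsh voice not found among: %r" % (descriptions,))

    voice.Voice = tokens.Item(index)
    voice.Rate = rate  # SAPI rate runs from -10 (slowest) to 10 (fastest)
    voice.Speak(text)


# Example call (on a Windows machine with the voice installed):
#     speak_welsh("Bore da", rate=-2)
```

If the voice is not found, the error message lists the descriptions that SAPI did report, which may help when diagnosing an installation problem.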

If you experience any installation difficulties, please contact d.b.jones@bangor.ac.uk

CONFIGURATION

Festival 1.96 beta (created by Alan Black on September 19th 2005, including Canolfan Bedwyr’s additions to support UTF-8), built with the Microsoft Visual Studio .NET compiler (make files generated in Cygwin)

MSAPI developed with Microsoft Speech SDK 5.1

Merge modules included in installer:

SpPhones.msm - Universal Phone Set Merge Module.

For further details see: http://www.microsoft.com/speech/download/old/ups.asp

To install Microsoft speech middleware and a Speech icon into your Control Panel (if not already there):

Sp5.msm

Sp5Intl.msm

SpCommon.msm

(see 'Microsoft Speech SDK Setup 5.1' topic in Microsoft Speech SDK 5.1 Help)

FILES/DIRECTORIES CREATION

C:\festival\festival (Main directory)

C:\festival\festival\lib (Voice library files)

C:\festival\festival\lib\etc

C:\festival\festival\lib\multisyn

C:\festival\festival\lib\voices\...

C:\festival\festival\win32 (Dll and other win32 apps location)

C:\festival\festival\win32\tmp (Temporary wave file folder)

C:\festival\festival\win32\log (Error logging)

TECHNICAL/DEVELOPERS NOTES

CONFIGURATION FILE

A configuration file must be present at C:\festival\festival\win32\config\config.ini. This file consists of two key=value pairs, one per line. The following keys must be present:

command (the exact command to start the voice from the Festival command-line, including the enclosing brackets)

sampling_rate (in Hz)

For example:

command=(voice_cb_cy_llg_diphone)
sampling_rate=16000

For the voices included with this release, the following commands can be used.

(voice_cb_cy_llg_diphone): North Welsh male
(voice_cb_cy_cw_diphone): North Welsh female
(voice_hl_diphone): South Welsh male, with a lower internal sampling rate than the North Welsh voices.

The sampling rate should remain as 16000 for all voices.

If a different voice is required, the file can be changed on the fly without having to re-start the MSAPI interface.
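As a rough illustration of the file format described above, the following Python sketch renders and checks a config.ini for one of the three voices. It is not part of the release. The short labels (north_male and so on) and the function names are hypothetical, and the tolerance for blank lines and surrounding spaces is a choice of this sketch; how strictly the MSAPI interface itself reads the file is not stated in these notes.

```python
from pathlib import Path

# Voice commands listed in this release note. The dictionary keys are
# hypothetical labels used only by this sketch.
VOICE_COMMANDS = {
    "north_male": "(voice_cb_cy_llg_diphone)",
    "north_female": "(voice_cb_cy_cw_diphone)",
    "south_male": "(voice_hl_diphone)",
}


def render_config(command, sampling_rate=16000):
    """Return the text of config.ini as two key=value lines."""
    if not (command.startswith("(") and command.endswith(")")):
        raise ValueError("command must include the enclosing brackets")
    return "command=%s\nsampling_rate=%d\n" % (command, sampling_rate)


def parse_config(text):
    """Parse key=value lines and check that both required keys are present."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key=value line: %r" % line)
        pairs[key.strip()] = value.strip()
    missing = {"command", "sampling_rate"} - set(pairs)
    if missing:
        raise ValueError("missing keys: %s" % sorted(missing))
    pairs["sampling_rate"] = int(pairs["sampling_rate"])
    return pairs


def write_config(path, voice):
    """Write config.ini for one of the labelled voices at `path`."""
    Path(path).write_text(render_config(VOICE_COMMANDS[voice]), encoding="ascii")
```

For example, write_config(r"C:\festival\festival\win32\config\config.ini", "north_female") would switch to the North Welsh female voice while keeping the sampling rate at 16000.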

ERROR LOG

Log files are kept in C:\festival\festival\win32\log, with file names of the form YYYYMM.txt. They record any errors that occur during use of the MSAPI/Festival interface.

For example:

2005/05/24 09:46:13 Error festival_eval_command - (voice_hl_diphone) (0)
2005/05/25 12:28:13 Error festival_text_to_wave - 'Maximize' (0)

And the information is logged in the following format:

<Time Stamp> <Type> <Error encountered> - <Additional info> (<Error code as generated by GetLastError>)
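The log format above can be read with a short script. The following Python sketch is an illustration only: the regular expression is inferred from the two example lines rather than from the logging code, and the names LogEntry, parse_log_line and log_file_name are hypothetical.

```python
import re
from collections import namedtuple
from datetime import datetime

LogEntry = namedtuple("LogEntry", "timestamp kind source info code")

# timestamp, type, error encountered, " - ", additional info, "(error code)"
LOG_LINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\S+) - (.*) \((-?\d+)\)$"
)


def parse_log_line(line):
    """Return a LogEntry for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp, kind, source, info, code = match.groups()
    timestamp = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    return LogEntry(timestamp, kind, source, info, int(code))


def log_file_name(when):
    """Return the YYYYMM.txt file name for the given date."""
    return when.strftime("%Y%m") + ".txt"


if __name__ == "__main__":
    sample = "2005/05/25 12:28:13 Error festival_text_to_wave - 'Maximize' (0)"
    print(parse_log_line(sample))
```

The additional info field is matched greedily, so any brackets inside it (as in the first example line) are kept, and only the final bracketed number is treated as the error code.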

TEMPORARY FOLDER CREATION

The folder C:\festival\festival\win32\tmp is required to store the temporary wave file before it is played by MSAPI. Without this folder, the wave file cannot be saved and, as a result, no speech will be played.
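If the voice is silent, one thing worth checking is that the expected sub-folders exist. The Python sketch below is an illustration only, not part of the release. The function names are hypothetical, and treating tmp, log and config as the set of folders to check is an assumption drawn from the paths named in these notes.

```python
from pathlib import Path

# Sub-folders of the win32 directory that are named in these notes.
EXPECTED_SUBDIRS = ("tmp", "log", "config")


def missing_subdirs(win32_dir):
    """Return the names of expected sub-folders that do not exist."""
    base = Path(win32_dir)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


def ensure_subdirs(win32_dir):
    """Create any missing sub-folders and return the names that were created."""
    base = Path(win32_dir)
    created = missing_subdirs(base)
    for name in created:
        (base / name).mkdir(parents=True)
    return created
```

For a default installation the argument would be C:\festival\festival\win32.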

SOURCE CODE

The Windows build of Festival is built from Festival version 1.96 beta and the associated speech_tools, taken from:

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/Software/Festival/WISPR/Merged/Snapshots/MSAPI_V1.0_20060623/festival

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/Software/Festival/WISPR/Merged/Snapshots/MSAPI_V1.0_20060623/speech_tools

The common Voices folders packaged in this installation (LTS, Lexicon, Tokenisation) are taken from:

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/common/Snapshot/MSAPI_V1.0_20060623/festival/lib/voices/

The cb_cy_llg_diphone folder packaged is taken from:

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/cb_cy_llg_diphone/Snapshot/MSAPI_V1.0_20060623

The cb_cy_cw_diphone folder packaged is taken from:

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/cb_cy_cw_diphone/Snapshot/MSAPI_V1.0_20060623

The hl_diphone folder packaged is taken from:

http://bedwyr-redhat.bangor.ac.uk/svn/repos/WISPR/FestVox_Voices/hl_diphone/Snapshot/MSAPI_V1.0_20060623