Welsh and Irish Speech Processing Resources

lff2scm : linguist-friendly format letter to sound rules to Scheme

Author:	Ivan A. Uemlianin
Contact:	i.uemlianin@bangor.ac.uk
Copyright:	2005, University of Wales, Bangor
Date:	2005/06/10

Contents

Overview
Set up
- Download
- Installation
Linguist-Friendly Format Definition
- General
- Header
- Rules
lff2scm Usage
- Some common uses:

Overview

lff2scm converts Festival letter-to-sound (LTS) rules from a 'linguist-friendly' format (lff) to Scheme.

Set up

Download

lff2scm.py can be downloaded from here. The individual components can be downloaded from here.

Installation

No installation is necessary as long as you have Python. The script lff2scm.py should run as it is.

Linguist-Friendly Format Definition

General

Any line beginning with whitespace is translated into a Scheme comment. The exception is a completely blank line, which is kept blank.
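
The rule above can be sketched in a few lines of Python. This is an illustration only, not the code in lff2scm.py: the function name translate_comment_line is hypothetical, the ';  ' prefix is an assumption modelled on the comment shown in the Scheme header example further down, and treating whitespace-only lines as blank is also an assumption.

```python
def translate_comment_line(line):
    """Return the Scheme form of a blank or whitespace-led lff line.

    Returns None for any other line, which would be handled as a rule.
    """
    text = line.rstrip("\n")
    if text.strip() == "":
        # Completely blank lines are kept blank (assumption: so are
        # lines that contain nothing but whitespace).
        return ""
    if text[0].isspace():
        # Assumed comment layout: ';' plus two spaces plus the text.
        return ";  " + text.strip()
    return None


print(translate_comment_line("    # symbols"))
```

A line such as '[angos]=anggos' does not start with whitespace, so this sketch returns None for it and leaves it to the rule conversion.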

Header

There is no obligatory header, but if you want to use Festival phoneme variables (see the Festival docs), you should include a 'symbols' section in the lff file. This is the format:

<whitespace># symbols

            x    <interpretation>
            y    <interpretation>
            z    <interpretation>

            # variables

            X    <interpretation>
            Y    <interpretation>
            Z    <interpretation>

            # end symbols

At the moment, a plain <interpretation> is assumed to mean 'one and only one of' some symbol (i.e. whatever symbol or phoneme is given), and is otherwise ignored. lff2scm does, however, scan for and translate the following qualifying phrases:

lff	scm
one or more of	+
zero or more of	*
zero or one of	?

Interpretations containing these phrases should be of the following form:

<qual-phrase>  [<set>]

Where <set> is a space-separated list of symbols. For example:

V    one or more of [i e eh ae ah oh ao uu ih uh ei ai aw oi ow]

This will result in two translations:

The set will be recorded in the scheme header, as:

;  Sets used in the rules
(
    (V i e eh ae ah oh ao uu ih uh ei ai aw oi ow )
)

In the rules section 'V' will be replaced by 'V +', e.g.:

lff	scm
V[ngh]w=ng h	( V + [ n g h ] w = ng h )
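
As a rough sketch of how a variable definition with a qualifying phrase could be read, the Python below parses one line into its name, its Scheme qualifier and its set, and formats the set entry as in the header example above. The names QUALIFIERS, parse_variable and format_set_entry are hypothetical, and the regular expression is an assumption about the input layout, not the parser used by lff2scm.py.

```python
import re

# Qualifying phrases and their Scheme equivalents, from the table above.
QUALIFIERS = {
    "one or more of": "+",
    "zero or more of": "*",
    "zero or one of": "?",
}

# Assumed layout: <name> <qual-phrase> [<space-separated set>]
_DEFINITION = re.compile(r"^\s*(\S+)\s+(.*?)\s*\[([^\]]*)\]\s*$")


def parse_variable(line):
    """Return (name, qualifier, members), or None if the line does not fit."""
    match = _DEFINITION.match(line)
    if match is None:
        return None
    name, phrase, members = match.groups()
    qualifier = QUALIFIERS.get(phrase)
    if qualifier is None:
        return None
    return name, qualifier, members.split()


def format_set_entry(name, members):
    """Format one set entry in the style of the Scheme header example."""
    return "(" + name + " " + " ".join(members) + " )"


name, qualifier, members = parse_variable(
    "V    one or more of [i e eh ae ah oh ao uu ih uh ei ai aw oi ow]"
)
print(format_set_entry(name, members))
print(name, qualifier)
```

The returned qualifier is what would be appended to the variable name in the rules section, giving 'V +' for the example definition.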

Rules

As far as lff2scm is concerned, you can have pretty much anything you like here. lff2scm expects an equals sign between a left-hand side and a right-hand side, but that's about it. lff2scm puts a space between each pair of adjacent characters and wraps the whole line in parentheses, e.g.:

lff	scm
[angos]=anggos	( [ a n g o s ] = a n g g o s )
[aratoi]!=arato/i	( [ a r a t o i ] ! = a r a t o / i )
#h[en]#=e^n	( # h [ e n ] # = e ^ n )
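
The default conversion shown in the table above can be approximated very simply. The sketch below is not taken from lff2scm.py; the function name rule_to_scheme is hypothetical, and it assumes that any whitespace already present in the rule is dropped.

```python
def rule_to_scheme(rule):
    """Space out every character of an lff rule and wrap it in parentheses."""
    # Assumption: whitespace already present in the rule is discarded.
    chars = [c for c in rule.strip() if not c.isspace()]
    return "( " + " ".join(chars) + " )"


for rule in ("[angos]=anggos", "[aratoi]!=arato/i", "#h[en]#=e^n"):
    print(rule_to_scheme(rule))
```

Because every character is treated alike, symbols such as '!', '/' and '^' need no special handling in this mode.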

With the 'to phonemes' option -p, the right-hand side of each rule is assumed to be a space-separated string of phonemes, so it is not split into characters, e.g.:

lff	scm
V[ngh]w=ng h	( V [ n g h ] w = ng h )
V[nghr]=ng rh	( V [ n g h r ] = ng rh )
V[nghl]=ng lh	( V [ n g h l ] = ng lh )
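
A minimal sketch of the -p behaviour, again as an illustration and not the script's own code: the rule is split at the first equals sign, the left-hand side is spaced out character by character, and the right-hand side is kept as whole phonemes. The function name rule_to_scheme_phonemes is hypothetical, and splitting at the first '=' is an assumption.

```python
def rule_to_scheme_phonemes(rule):
    """Convert an lff rule whose right-hand side is a list of phonemes."""
    # Assumption: the first '=' separates the two sides of the rule.
    lhs, equals, rhs = rule.strip().partition("=")
    if not equals:
        raise ValueError("rule has no equals sign: " + repr(rule))
    left = " ".join(c for c in lhs if not c.isspace())
    right = " ".join(rhs.split())  # phonemes stay whole
    parts = [part for part in (left, "=", right) if part]
    return "( " + " ".join(parts) + " )"


for rule in ("V[ngh]w=ng h", "V[nghr]=ng rh", "V[nghl]=ng lh"):
    print(rule_to_scheme_phonemes(rule))
```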

lff2scm Usage

Calling lff2scm.py with no arguments will output this brief usage reminder:

lff2scm.py: lts rule format converter
    Usage:
        lff2scm.py (-p) ltsFilename

    - If '-p' lff2scm assumes rhs of rules are phonemes,
      otherwise, lff2scm assumes rhs of rules are characters.
      lhs of rules is always assumed to be characters
    - Assumes input is in 'linguist-friendly' format.
    - Pipes output to stdout.

Some common uses:

$ lff2scm.py newepen.lff > newepen.scm
$ lff2scm.py -p gogwel.lff > gogwel.scm
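
If the conversion needs to be driven from another Python program, the commands above could be wrapped roughly as follows. The helper names build_command and convert are hypothetical, and the sketch assumes that lff2scm.py is executable and on the PATH, as the shell examples suggest. Only the documented -p option is used.

```python
import subprocess


def build_command(lff_path, phonemes=False, script="lff2scm.py"):
    """Build the argument list for one lff2scm.py call."""
    command = [script]
    if phonemes:
        command.append("-p")  # treat the rhs of rules as phonemes
    command.append(lff_path)
    return command


def convert(lff_path, scm_path, phonemes=False):
    """Run lff2scm.py and write its stdout to scm_path."""
    with open(scm_path, "w") as out:
        subprocess.run(build_command(lff_path, phonemes), stdout=out, check=True)


print(build_command("newepen.lff"))
print(build_command("gogwel.lff", phonemes=True))
```

Calling convert("gogwel.lff", "gogwel.scm", phonemes=True) would then correspond to the second shell command above.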