Welsh and Irish Speech Processing Resources
Author: | Ivan A. Uemlianin |
---|---|
Contact: | i.uemlianin@bangor.ac.uk |
Copyright: | 2005, University of Wales, Bangor |
Date: | 2005/06/10 |
lff2scm converts festival letter-to-sound rules from 'linguist-friendly' format to scheme.
lff2scm.py can be downloaded from here. The individual components can be downloaded from here.
No installation is necessary as long as you have Python; the script lff2scm.py should run as it is.
There is no obligatory header, but if you want to use festival phoneme variables (see the festival docs), you should use a 'symbols' section in the lff file. This is the format:
<whitespace># symbols
x <interpretation>
y <interpretation>
z <interpretation>
# variables
X <interpretation>
Y <interpretation>
Z <interpretation>
# end symbols
At the moment, <interpretation> is assumed to mean 'one and only one of' some symbol (i.e. whatever symbol or phoneme is given) and is otherwise ignored. However, lff2scm does scan for and translate the following qualifying phrases:
lff | scm |
---|---|
one or more of | + |
zero or more of | * |
zero or one of | ? |
Interpretations containing these phrases should be of the following form:
<qual-phrase> [<set>]
Where <set> is a space-separated list of symbols. For example:
V one or more of [i e eh ae ah oh ao uu ih uh ei ai aw oi ow]
This will result in two translations:
The set will be recorded in the scheme header, as:
; Sets used in the rules
(
(V i e eh ae ah oh ao uu ih uh ei ai aw oi ow )
)
In the rules section 'V' will be replaced by 'V +', e.g.:
lff | scm |
---|---|
V[ngh]w=ng h | ( V + [ n g h ] w = ng h ) |
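The two translations described above can be sketched in Python as follows. This is a minimal illustration written for this page, not the code from lff2scm.py itself: the names `QUALIFIERS`, `parse_symbol_line` and `apply_operators` are hypothetical, and the sketch assumes each symbol line has the form `<name> <qual-phrase> [<set>]`.

```python
import re

# lff qualifying phrases and their scheme operators (from the table above).
QUALIFIERS = {
    "one or more of": "+",
    "zero or more of": "*",
    "zero or one of": "?",
}

def parse_symbol_line(line):
    """Split a symbol line into (name, operator, set members).

    Hypothetical helper: lines without a recognised qualifying phrase
    yield (name, None, []), mirroring 'one and only one of'.
    """
    name, _, interpretation = line.strip().partition(" ")
    match = re.match(r"(.+?)\s*\[([^\]]*)\]\s*$", interpretation.strip())
    if match is None or match.group(1) not in QUALIFIERS:
        return name, None, []
    return name, QUALIFIERS[match.group(1)], match.group(2).split()

def apply_operators(tokens, operators):
    """Insert each variable's operator directly after the variable token."""
    out = []
    for token in tokens:
        out.append(token)
        if operators.get(token):
            out.append(operators[token])
    return out

name, op, members = parse_symbol_line(
    "V one or more of [i e eh ae ah oh ao uu ih uh ei ai aw oi ow]"
)
tokens = apply_operators("V [ n g h ] w = ng h".split(), {name: op})
print("( " + " ".join(tokens) + " )")  # ( V + [ n g h ] w = ng h )
```

The `members` list holds the symbols that would be written to the set in the scheme header; the exact whitespace of that header is not reproduced here.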
As far as lff2scm is concerned, you can have pretty much anything you like in the rules section. lff2scm expects an equals sign between a left-hand side and a right-hand side, but that's about it. Without the -p option, lff2scm puts a space between every character and wraps the line in parentheses, e.g.:
lff | scm |
---|---|
[angos]=anggos | ( [ a n g o s ] = a n g g o s ) |
[aratoi]!=arato/i | ( [ a r a t o i ] ! = a r a t o / i ) |
#h[en]#=e^n | ( # h [ e n ] # = e ^ n ) |
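The character-level conversion shown in this table can be approximated in a few lines of Python. This is only a sketch of the behaviour visible in the examples, not the actual lff2scm.py source; the function name `convert_char_rule` is hypothetical, and the sketch does not handle the variable substitution from the symbols section.

```python
def convert_char_rule(rule):
    """Space out every character of an lff rule and wrap it in parentheses.

    Hypothetical helper: any whitespace already in the rule is discarded,
    so each remaining character becomes its own token.
    """
    chars = [c for c in rule.strip() if not c.isspace()]
    return "( " + " ".join(chars) + " )"

for rule in ("[angos]=anggos", "[aratoi]!=arato/i", "#h[en]#=e^n"):
    print(convert_char_rule(rule))
# ( [ a n g o s ] = a n g g o s )
# ( [ a r a t o i ] ! = a r a t o / i )
# ( # h [ e n ] # = e ^ n )
```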
With the 'to phonemes' option -p, the right-hand side of each rule is assumed to be a space-separated string of phonemes and is not split into characters, e.g.:
lff | scm |
---|---|
V[ngh]w=ng h | ( V [ n g h ] w = ng h ) |
V[nghr]=ng rh | ( V [ n g h r ] = ng rh ) |
V[nghl]=ng lh | ( V [ n g h l ] = ng lh ) |
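For comparison, the -p behaviour in this table could be sketched as below. Again, this is an illustration under assumptions rather than the real implementation: `convert_phoneme_rule` is a hypothetical name, and the sketch assumes the first equals sign in the line separates the two sides.

```python
def convert_phoneme_rule(rule):
    """Split the lhs into characters but keep the rhs phonemes whole.

    Hypothetical helper: the rule is divided at the first '='.
    """
    lhs, sep, rhs = rule.strip().partition("=")
    if not sep:
        raise ValueError("rule has no '=': %r" % rule)
    lhs_chars = [c for c in lhs if not c.isspace()]
    return "( " + " ".join(lhs_chars + ["="] + rhs.split()) + " )"

print(convert_phoneme_rule("V[nghr]=ng rh"))  # ( V [ n g h r ] = ng rh )
```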
Calling lff2scm.py with no arguments will output this brief usage reminder:
lff2scm.py: lts rule format converter
Usage:
lff2scm.py (-p) ltsFilename
- If '-p' lff2scm assumes rhs of rules are phonemes,
otherwise, lff2scm assumes rhs of rules are characters.
lhs of rules is always assumed to be characters
- Assumes input is in 'linguist-friendly' format.
- Pipes output to stdout.
$ lff2scm.py newepen.lff > newepen.scm
$ lff2scm.py -p gogwel.lff > gogwel.scm