Welsh and Irish Speech Processing Resources

Author: | Ivan A. Uemlianin |
---|---|
Contact: | i.uemlianin@bangor.ac.uk |
Copyright: | 2006, University of Wales, Bangor |
The following little scripts have all been tarballed for easy download and are available from here. The individual tools can be downloaded here.
fixeoln_one.py converts the line endings of a text file between DOS/Windows and Unix formats.
Usage:
python fixeoln_one.py [-d | -u] filename
Use -d to convert to DOS line endings or -u to convert to Unix line endings.
N.B.: this overwrites the file, so make a backup if you want to keep your original.
This script was cribbed from the O'Reilly book Programming Python (Lutz, 2001).
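As a rough illustration of the conversion described above (not the script's actual code, which is not reproduced here), a minimal Python sketch might look like this. The function names are hypothetical, and treating a lone carriage return as a line ending is an assumption.

```python
def convert_line_endings(data: bytes, to_dos: bool) -> bytes:
    """Return data with every line ending normalised to CRLF (DOS) or LF (Unix)."""
    # Collapse CRLF and lone CR to LF first, so files with mixed endings
    # are handled too (counting a lone CR as a line ending is an assumption).
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", b"\r\n") if to_dos else unix


def convert_file(path: str, to_dos: bool) -> None:
    """Overwrite the file at path in place, as the script above is said to do."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(convert_line_endings(data, to_dos))
```

Working on bytes rather than text avoids any dependence on the file's character encoding.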
This is just a little shell script that uses sox to convert audio files to mono.
# Usage: mono.sh oldfile newfile
# converts oldfile to mono
# outputs to newfile
# (arguments are quoted so that file names containing spaces work)
sox "$1" -c1 "$2" pick -l
Not really worth making into a script perhaps, but sox is handy. Here is a shortlist of other useful sox niblets:
Get the left-hand channel from a stereo signal and resample to 16 kHz:
sox stereo/enghraifft.wav -r 16000 -c1 mono/16k/enghraifft.wav avg -l
Convert every file in the stereo directory the same way, writing the results to mono:
for f in stereo/*; do sox "$f" -r 16000 -c1 "mono/$(basename "$f")" avg -l; done
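The same batch conversion could be driven from Python. The sketch below only builds the argument lists and does not run anything. The function name and its defaults are hypothetical, and the sox options (including the `avg -l` effect) are copied from the commands above without being checked against any particular sox version.

```python
from pathlib import Path


def sox_commands(filenames, src_dir="stereo", dst_dir="mono", rate=16000):
    """Build one sox argument list per file name; nothing is executed here."""
    commands = []
    for name in filenames:
        src = Path(src_dir) / name
        dst = Path(dst_dir) / name
        # Options copied from the one-liner above: resample to `rate`,
        # write one channel, keep the left-hand channel only.
        commands.append(
            ["sox", str(src), "-r", str(rate), "-c1", str(dst), "avg", "-l"]
        )
    return commands
```

Each list can then be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with unusual file names.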
o2r2h.py uses docutils to convert a plain text file, written in a modified reStructuredText format (see below), into HTML.
Usage:
o2r2h.py readme.txt > readme.html
o2r2h.py outputs to stdout so it can form part of a pipe (as shown in the usage example).
reStructuredText is a very lightweight yet expressive markup. It can be used to generate LaTeX and old-style OpenOffice format files as well as HTML; see the docutils docs. Here is a quick reference for reStructuredText.
The only modification is to headings. Whereas reStructuredText (rst) uses various underlinings for headings, we use line-initial asterisks to indicate heading level (as in emacs' default outline-mode headings).
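To make the heading convention concrete, here is a minimal sketch of how asterisk headings might be rewritten as standard rst underlined headings before the text is handed to docutils. It is not taken from o2r2h.py: the function name, the underline characters chosen for each level, and the rule that any line starting with asterisks followed by a space is a heading (which rules out `*` bullet lists) are all assumptions.

```python
import re

# Hypothetical mapping from outline depth (1, 2, 3, ...) to underline character.
UNDERLINES = "=-~^"

# One or more line-initial asterisks, whitespace, then the heading title.
HEADING = re.compile(r"^(\*+)\s+(.*\S)\s*$")


def outline_to_rst(text: str) -> str:
    """Rewrite outline-mode style headings as rst underlined headings."""
    out = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Depths beyond the mapping reuse the last underline character.
            depth = min(len(match.group(1)), len(UNDERLINES))
            title = match.group(2)
            out.append(title)
            out.append(UNDERLINES[depth - 1] * len(title))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

The converted text is plain rst, so it could then be passed to `docutils.core.publish_string` with `writer_name="html"`.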
To do: get character encoding working properly. The user should be able to specify an encoding (e.g., utf-8) on the command line.
renameFiles.py is a handy little script that renames files by changing a specified part of each file name. It applies the change to all applicable files in the current directory.
Usage:
renameFiles.py [-d] old [new]
If you set the -d switch, renameFiles will display the effects without actually making the changes.
For example, renameFiles.py eg example_ will make the following changes:
old | new |
---|---|
eg1.wav | example_1.wav |
eg234.html | example_234.html |
egtoday.doc | example_today.doc |
Called with a single argument, renameFiles.py will delete that string from all filenames in the current directory, e.g., renameFiles.py test_ will make the following changes:
old | new |
---|---|
test_1.wav | 1.wav |
test_234.html | 234.html |
test_today.doc | today.doc |
renameFiles also works with regular expressions. For example, renameFiles2.py 'bob03(\d)' 'susan00\1' will make the following changes:
old | new |
---|---|
bob025.wav | bob025.wav |
bob026.wav | bob026.wav |
bob027.wav | bob027.wav |
bob028.wav | bob028.wav |
bob029.wav | bob029.wav |
bob030.wav | susan000.wav |
bob031.wav | susan001.wav |
bob032.wav | susan002.wav |
bob033.wav | susan003.wav |
bob034.wav | susan004.wav |
bob035.wav | susan005.wav |
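The behaviour shown in the tables above can be approximated with Python's `re.sub`. This is a sketch under assumptions rather than the script's own code: the function names are hypothetical, `old` is always treated as a regular expression, and every match in a name is replaced, not only the first.

```python
import os
import re


def plan_renames(filenames, old, new=""):
    """Return (old_name, new_name) pairs for the names the pattern changes."""
    pattern = re.compile(old)
    plan = []
    for name in filenames:
        renamed = pattern.sub(new, name)
        if renamed != name:
            plan.append((name, renamed))
    return plan


def rename_files(old, new="", directory=".", dry_run=False):
    """Apply the plan in directory; with dry_run, only print it (like -d)."""
    for src, dst in plan_renames(sorted(os.listdir(directory)), old, new):
        print(f"{src} -> {dst}")
        if not dry_run:
            os.rename(os.path.join(directory, src), os.path.join(directory, dst))
```

Separating the planning step from the renaming step makes the dry run trivial and lets the plan be checked without touching the file system.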
Notes on using regular expressions:
wlist.py lists all words in a file.
Usage:
wlist.py fn
This lists the words in fn to stdout. It also creates files fstem.alist of possible acronyms and fstem.nlist of words containing numbers (where fstem is fn without its extension).
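A minimal sketch of this kind of word listing is given below. It is not the code of wlist.py: the function names, the word pattern, the definition of a possible acronym (two or more letters, all upper case) and the utf-8 encoding are assumptions made for illustration.

```python
import re
from pathlib import Path

# A word is a run of letters or digits, optionally joined by ' or -.
WORD = re.compile(r"\w+(?:['-]\w+)*")


def classify_words(text):
    """Return (all words, possible acronyms, words containing digits)."""
    words = sorted(set(WORD.findall(text)))
    acronyms = [w for w in words if len(w) > 1 and w.isalpha() and w.isupper()]
    with_numbers = [w for w in words if any(c.isdigit() for c in w)]
    return words, acronyms, with_numbers


def write_lists(fn):
    """Print the word list and write fstem.alist and fstem.nlist next to fn."""
    path = Path(fn)
    words, acronyms, with_numbers = classify_words(path.read_text(encoding="utf-8"))
    print("\n".join(words))
    path.with_suffix(".alist").write_text("\n".join(acronyms) + "\n", encoding="utf-8")
    path.with_suffix(".nlist").write_text("\n".join(with_numbers) + "\n", encoding="utf-8")
```

Because `\w` is Unicode-aware in Python 3, accented Welsh and Irish letters such as ŵ and á are kept inside words.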
To do: this is an ancient script written for a slightly different purpose (i.e., extracting information from sphinxTrain transcription files). It needs updating for current purposes.