WISPR - Welsh and Irish Speech Processing Resources

Utils README

Author: Ivan A. Uemlianin
Contact: i.uemlianin@bangor.ac.uk
Copyright: 2006, University of Wales, Bangor

Download

The following little scripts have all been tarballed for easy download and are available from here. The individual tools can be downloaded here.

fixeoln_one.py

This converts line-end formats in a text file between DOS/Windows and Unix formats.

Usage:

python fixeoln_one.py [-d | -u] filename

Use -d or -u to convert to dos or unix respectively.

N.B.: this overwrites the file, so back it up first if you want to keep your original.

This script was cribbed from the O'Reilly book Programming Python (Lutz, 2001).
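
The conversion itself can be sketched in a few lines. This is only an illustration of the idea, not the code of fixeoln_one.py: the function name convert_eol and its target argument are hypothetical. It works on bytes, so it makes no assumption about the character encoding of the file.

```python
def convert_eol(data: bytes, target: str) -> bytes:
    """Return data with every line ending converted to the target style.

    target is "unix" (LF) or "dos" (CRLF). Hypothetical helper, not the
    original script's code.
    """
    # Normalise to LF first, so mixed or already-converted input is safe.
    unix = data.replace(b"\r\n", b"\n")
    if target == "unix":
        return unix
    if target == "dos":
        return unix.replace(b"\n", b"\r\n")
    raise ValueError("target must be 'unix' or 'dos'")
```

To mimic the overwrite behaviour described above, one would read the file in binary mode ("rb"), pass the contents through convert_eol, and write the result back in binary mode ("wb").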

mono.sh

This is just a little shell script that uses sox to convert audio files to mono.

# Usage: mono.sh oldfile newfile
# converts oldfile to mono
# outputs to newfile

sox "$1" -c1 "$2" pick -l

Not really worth making into a script perhaps, but sox is handy. Here is a shortlist of other useful sox niblets:

o2r2h.py

This uses docutils to convert a plain text file, written in a modified reStructuredText format (see below) into html.

Usage:

o2r2h.py readme.txt > readme.html

o2r2h.py outputs to stdout so it can form part of a pipe (as shown in the usage example).

reStructuredText and modifications

reStructuredText is a very lightweight yet expressive markup. As well as html, it can be used to generate LaTeX and old-style OpenOffice format files; see the docutils docs. Here is a quick reference for reStructuredText.

The only modification is to headings. Whereas reStructuredText (rst) uses various underlinings for headings, we use line-initial asterisks to indicate heading level (as in emacs' default outline-mode headings).
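
The heading convention can be illustrated with a small preprocessing step that turns asterisk headings back into standard rst before handing the text to docutils. This is a sketch under assumptions, not the code of o2r2h.py: the function name outline_to_rst, the choice of underline characters, and the rule that a heading is one or more asterisks followed by a space are all hypothetical.

```python
import re

# Assumed mapping from outline depth to rst underline character; the
# characters actually used by o2r2h.py are not documented here.
UNDERLINES = "=-~^"

# One or more line-initial asterisks, whitespace, then the heading text.
HEADING = re.compile(r"^(\*+)\s+(.*\S)\s*$")


def outline_to_rst(text: str) -> str:
    """Rewrite line-initial-asterisk headings as rst underlined headings."""
    out = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match is None:
            # Not a heading: pass the line through unchanged.
            out.append(line)
            continue
        depth = len(match.group(1))
        title = match.group(2)
        # Depths beyond the mapping reuse the last underline character.
        char = UNDERLINES[min(depth, len(UNDERLINES)) - 1]
        out.append(title)
        out.append(char * len(title))
    return "\n".join(out)
```

For example, the line "** Usage" would become "Usage" followed by a line of five hyphens. Note that under this rule an unindented rst bullet written with an asterisk would also be read as a heading.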

TODO

Get character encoding working properly: user should be able to specify an encoding (e.g., utf-8) on the command-line.

renameFiles.py

A handy little script to rename files. It changes specified parts of file names. It makes the change to all applicable files in the current directory.

Usage:

renameFiles.py [-d] old new

If you set the -d switch, renameFiles will display the effects without actually making the changes.

For example, renameFiles.py eg example_ will make the following changes:

old            new
eg1.wav        example_1.wav
eg234.html     example_234.html
egtoday.doc    example_today.doc

Called with a single argument, renameFiles.py will delete that string from all filenames in the current directory, e.g., renameFiles.py test_ will make the following changes:

old              new
test_1.wav       1.wav
test_234.html    234.html
test_today.doc   today.doc

Regular expressions

renameFiles will work with regular expressions. For example, renameFiles.py 'bob03(\d)' 'susan00\1' will make the following changes:

old           new
bob025.wav    bob025.wav
bob026.wav    bob026.wav
bob027.wav    bob027.wav
bob028.wav    bob028.wav
bob029.wav    bob029.wav
bob030.wav    susan000.wav
bob031.wav    susan001.wav
bob032.wav    susan002.wav
bob033.wav    susan003.wav
bob034.wav    susan004.wav
bob035.wav    susan005.wav

Notes on using regular expressions:

  • renameFiles uses regular expressions in the Python syntax, which is probably the same as the one you're using. Remember you can display the effects of any changes with the -d switch.
  • arguments may have to be quoted if they contain special characters (as in the example above).
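
The behaviour described in this section can be sketched with Python's re module. This is a minimal illustration, not the source of renameFiles.py: the names plan_renames and rename_files are hypothetical, and it assumes that every match in a file name is replaced and that a missing second argument means "replace with nothing".

```python
import os
import re


def plan_renames(names, old, new=""):
    """Return (old_name, new_name) pairs for the names the pattern changes."""
    pattern = re.compile(old)
    plan = []
    for name in names:
        renamed = pattern.sub(new, name)
        if renamed != name:
            plan.append((name, renamed))
    return plan


def rename_files(old, new="", dry_run=False, directory="."):
    """Apply the plan to a directory; with dry_run, only display it."""
    names = sorted(os.listdir(directory))
    for source, target in plan_renames(names, old, new):
        print(source, "->", target)
        if not dry_run:
            os.rename(os.path.join(directory, source),
                      os.path.join(directory, target))
```

Separating the plan from the renaming makes a display-only mode straightforward, since both modes share the same list of changes. The sketch does not check for name collisions or skip subdirectories; a real tool would probably want to do both.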

wlist.py

Lists all words in a file.

Usage:

wlist.py fn

This lists the words in fn to stdout. It also creates files fstem.alist of possible acronyms and fstem.nlist of words containing numbers (where fstem is fn without its extension).
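
As a rough illustration of the three lists, the sketch below classifies the tokens of a text. It is not the code of wlist.py, and its rules are assumptions: tokens are split on whitespace, surrounding ASCII punctuation is stripped, a "possible acronym" is a token of two or more characters with no lower-case letters, and a token containing a digit goes in the numbers list only. The function name classify_words is hypothetical.

```python
import string


def classify_words(text):
    """Return (words, acronyms, with_digits) as sorted lists of unique tokens."""
    words, acronyms, with_digits = set(), set(), set()
    for token in text.split():
        # Strip surrounding ASCII punctuation only; other marks are kept.
        word = token.strip(string.punctuation)
        if not word:
            continue
        words.add(word)
        if any(ch.isdigit() for ch in word):
            with_digits.add(word)
        elif len(word) > 1 and word.isupper():
            acronyms.add(word)
    return sorted(words), sorted(acronyms), sorted(with_digits)
```

The output file names could then be built with os.path.splitext, e.g. os.path.splitext(fn)[0] + ".alist" for the acronym list.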

TODO

This is an ancient script written for a slightly different purpose (i.e., extracting information from sphinxTrain transcription files). Update it for current purposes.