WISPR - Welsh and Irish Speech Processing Resources

pyhtk: A python wrapper for HTK

Author: Ivan A. Uemlianin
Contact: i.uemlianin@bangor.ac.uk
Copyright: 2006, University of Wales, Bangor

Introduction

What is HTK? What is PyHTK?: TODO

Requirements

The only requirements are python, HTK and SpeechCluster. Note that registration is required for HTK, but the software is free of charge and models built with HTK can be commercialised.

Download

pyhtk can be downloaded from here. The individual components can be downloaded from here.

Further Work

A list of useful features to add, in no particular order:

  • Integrate iteration into pyhtk (i.e. instead of using the horrible hack utils/pyhtkIter.sh).
  • Test pyhtk in more challenging contexts (e.g. recognition beyond forced alignment, ATK, HTS).

Usage

Getting Help

Running pyhtk.py -h prints the following usage summary:

pyhtk Usage summary
===================

pyhtk.py [options]

Option       Function
===========  ========
-s           Sets up for building an acoustic model
-b           Sets up and builds an acoustic model
-a <hmmdir>  Forced alignment using the hmm in <hmmdir>
-c1          Clean up log and model files ready for another build
-c2          Clean up everything except model files

Building an Acoustic Model

  1. Make a directory for your HTK Project, and copy pyhtk.py and your wav and lab files into it. For example, using the bash shell:
    $ mkdir <myHTKProject>
    $ cp pyhtk.py <myHTKProject>
    $ cd <myHTKProject>
    $ mkdir wav
    $ cp /path/to/wav/files wav
    $ mkdir lab
    $ cp /path/to/lab/files lab

n.b.:

  • The wav files can be any sample-rate you like, as long as they're all the same.
  • The lab files can be in any format supported by SpeechCluster (including esps, TextGrid and htk-lab; see the SpeechCluster docs for details), and they don't all have to be in the same format.
  2. Run pyhtk.py with one of these options:

    pyhtk.py -s

    This sets things up without actually building the acoustic model (AM), which is useful if you want to check your setup.

    pyhtk.py -b

    This sets things up and builds the model.

  3. Logfiles in HTML format are saved in log/. They have tables of contents, and errors are highlighted in red.
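
Since all the wav files must share one sample rate, it can be worth checking this before running pyhtk.py. The following is a minimal sketch and not part of pyhtk: the function names sample_rates and check_sample_rates are hypothetical, and the code assumes the files are uncompressed PCM wav files with a .wav extension, which is what Python's standard wave module can read.

```python
import wave
from pathlib import Path


def sample_rates(wav_dir):
    """Map each sample rate found in wav_dir to the names of the files using it."""
    rates = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave.open raises wave.Error for files that are not PCM wav.
        with wave.open(str(path), "rb") as wav_file:
            rates.setdefault(wav_file.getframerate(), []).append(path.name)
    return rates


def check_sample_rates(wav_dir):
    """Return True if the wav files in wav_dir use at most one sample rate."""
    rates = sample_rates(wav_dir)
    for rate, names in sorted(rates.items()):
        print(f"{rate} Hz: {len(names)} file(s)")
    return len(rates) <= 1
```

Calling check_sample_rates("wav") from inside the project directory prints one line per sample rate found. If it returns False, the files need resampling to a common rate before a build.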

Forced Alignment

TODO: What is forced alignment?

  1. Given a fresh, clean HTK acoustic model directory, as built by pyhtk.py, copy pyhtk.py and your wav and lab files into it. For example, using the bash shell:
    $ cp pyhtk.py <myHTK_AM>
    $ cd <myHTK_AM>
    $ mkdir wav
    $ cp /path/to/wav/files wav
    $ mkdir lab
    $ cp /path/to/lab/files lab

n.b.:

  • The wav files can be any sample-rate you like, as long as they're all the same.
  • The lab files can be in any format supported by SpeechCluster (including esps, TextGrid and htk-lab; see the SpeechCluster docs for details), and they don't all have to be in the same format.
  2. Run pyhtk.py with this option:

    pyhtk.py -a <hmmdir>

    where <hmmdir> is the directory of the HMM you want to use. This sets things up and does the alignment.

  3. Logfiles in HTML format are saved in log/. They have tables of contents, and errors are highlighted in red.
  4. Results are saved in results/.
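
If you want to run the alignment from another Python script, the steps above could be wrapped roughly as follows. This is a sketch under stated assumptions, not part of pyhtk: alignment_command and run_alignment are hypothetical helper names, and the interpreter argument assumes that a command called python on your path is a version able to run pyhtk.py. Only the -a <hmmdir> option and the wav/, lab/ and results/ directories come from the documentation above.

```python
import subprocess
from pathlib import Path


def alignment_command(hmmdir, interpreter="python", script="pyhtk.py"):
    """Build the argument list equivalent to 'pyhtk.py -a <hmmdir>'."""
    return [interpreter, script, "-a", str(hmmdir)]


def run_alignment(project_dir, hmmdir, interpreter="python"):
    """Run forced alignment in project_dir and list what ends up in results/."""
    project = Path(project_dir)
    # Check the layout described in step 1 before starting anything.
    missing = [name for name in ("pyhtk.py", "wav", "lab")
               if not (project / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing from {project}: {', '.join(missing)}")
    # check=True raises CalledProcessError if pyhtk.py exits with an error.
    subprocess.run(alignment_command(hmmdir, interpreter), cwd=project, check=True)
    results = project / "results"
    return sorted(p.name for p in results.iterdir()) if results.is_dir() else []
```

The layout check fails early with a clear message if pyhtk.py, wav/ or lab/ is missing. The logfiles in log/ remain the place to look if the alignment itself goes wrong.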

Recognition: TODO

Advanced: TODO

Cleaning up afterwards

pyhtk.py -c1

    Cleans up everything apart from the wav and lab files, ready for another build. n.b.: this also removes the model, if one has been built.

pyhtk.py -c2

    Cleans up everything apart from the model itself (i.e., the wav and lab files are removed too).

Windows

I haven't tested this on Windows, but pyhtk.py should work there as long as:

On Windows it should even be pointyclicky: double-clicking pyhtk.py should run it. If you rename it to pyhtk.pyw, you won't get the annoying Command Prompt window that pops up otherwise.