How to get input symbols from a fst model using the OpenFST python extension?


OpenFST provides a Python extension. Is it possible to read the input symbols (isyms) from a compiled FST model using that API?

I can't find the right property or method to do that:

>>> import fst
>>> f = fst.Fst('/home/jan/Downloads/en_us_nostress/model.fst')
>>> dir(f)
['__class__', '__delattr__', '__doc__', '__format__',
'__getattribute__', '__hash__', '__init__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '_arc_type', '_check_mutating_imethod',
'_fst_type', '_weight_type', 'arcsort', 'closure', 'concat',
'connect', 'decode', 'encode', 'invert', 'minimize', 'project',
'properties', 'prune', 'push', 'relabel', 'reweight', 'rmepsilon',
'topsort', 'union', 'verify', 'write']
>>> dir(fst)
['ACCEPTOR','ACCESSIBLE', 'ACYCLIC', 'ADD_ARC_PROPERTIES',
'ADD_STATE_PROPERTIES', 'ADD_SUPERFINAL_PROPERTIES', 
'ARC_SORT_PROPERTIES', 'BINARY_PROPERTIES', 'COACCESSIBLE',
'COPY_PROPERTIES', 'CYCLIC', 'DELETE_ARC_PROPERTIES', 
'DELETE_STATE_PROPERTIES', 'EPSILONS', 'ERROR', 'EXPANDED', 
'EXTRINSIC_PROPERTIES', 'FST_PROPERTIES', 'Fst',
'FstError', 'FstWarning', 'INITIAL_ACYCLIC', 'INITIAL_CYCLIC',
'INTRINSIC_PROPERTIES', 'I_DETERMINISTIC', 'I_EPSILONS',
'I_LABEL_INVARIANT_PROPERTIES', 'I_LABEL_SORTED', 'MUTABLE',
'NEG_TRINARY_PROPERTIES', 'NON_I_DETERMINISTIC',
'NON_O_DETERMINISTIC', 'NOT_ACCEPTOR', 'NOT_ACCESSIBLE',
'NOT_COACCESSIBLE', 'NOT_I_LABEL_SORTED', 'NOT_O_LABEL_SORTED',
'NOT_STRING', 'NOT_TOP_SORTED', 'NO_EPSILONS', 'NO_I_EPSILONS',
'NO_O_EPSILSONS', 'NULL_PROPERTIES', 'O_DETERMINISTIC', 'O_EPSILONS',
'O_LABEL_INVARIANT_PROPERTIES', 'O_LABEL_SORTED',
'POS_TRINARY_PROPERTIES', 'RM_SUPERFINAL_PROPERTIES',
'SET_ARC_PROPERTIES', 'SET_FINAL_PROPERTIES', 'SET_START_PROPERTIES',
'STATE_SORT_PROPERTIES', 'STRING', 'TOP_SORTED',
'TRINARY_PROPERTIES', 'UNWEIGHTED', 'WEIGHTED',
'WEIGHT_INVARIANT_PROPERTIES', 'Weight', '_Fst', '_MutableFst',
'__builtins__', '__doc__', '__file__', '__name__', '__package__',
'__pyx_capi__', '__test__', '_fst_error_fatal_old',
'_get_compose_filter', '_get_queue_type', '_get_rand_arc_selection',
'_get_replace_label_type', 'arcmap', 'compose', 'convert',
'determinize', 'difference', 'disambiguate', 'epsnormalize',
'equal', 'equivalent', 'intersect', 'isomorphic', 'prune', 'push',
'randequivalent', 'randgen', 'replace', 'reverse', 'rmepsilon',
'shortestdistance', 'shortestpath', 'synchronize']
There are 2 answers below.

Answer 1 (best answer)

I suspect that you're using OpenFST 1.5.0, in which this feature was not yet available. I believe it was added in version 1.5.1 or so; see the release notes: http://www.openfst.org/twiki/bin/view/News/FstNews

I'm using 1.5.3 and can access the symbol tables through input_symbols and output_symbols:

>>> dir(fst._Fst)
['__class__', '__delattr__', '__doc__', '__format__',
'__getattribute__', '__hash__', '__init__', '__new__',
'__pyx_vtable__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_repr_svg_', 'arc_type', 'arcs', 'copy', 'draw', 'final',
'fst_type', 'input_symbols', 'num_arcs', 'num_input_epsilons',
'num_output_epsilons', 'output_symbols', 'properties', 'start',
'states', 'text', 'verify', 'weight_type', 'write']

There's a little more information at http://www.openfst.org/twiki/bin/view/FST/PythonExtension, under "FST object attributes and properties".
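
As a rough sketch of how this might look in practice: the snippet below turns a symbol table into a plain dictionary. It assumes a build new enough to expose input_symbols(), that the module is importable as pywrapfst (the question's 1.5.0 build imported it as fst), and that iterating a symbol table yields (label, symbol) pairs. The helper names symbol_table_to_dict and read_input_symbols are made up for illustration, and none of this has been checked against every release.

```python
def symbol_table_to_dict(symbols):
    """Convert a symbol table into {label_id: symbol}.

    Assumption: iterating the table yields (label_id, symbol) pairs,
    which is how pywrapfst's SymbolTable is expected to behave.
    """
    if symbols is None:
        # Assumption: the accessor returns None when the FST was
        # compiled without an embedded symbol table.
        return {}
    return {label: symbol for label, symbol in symbols}


def read_input_symbols(path):
    """Load a compiled FST and return its input symbols as a dict."""
    # Imported lazily so the helper above works without OpenFST installed.
    # Assumption: the extension is installed under the name pywrapfst.
    import pywrapfst as fst

    # Assumption: Fst.read is available; the question's older build
    # constructed the object with fst.Fst(path) instead.
    f = fst.Fst.read(path)
    return symbol_table_to_dict(f.input_symbols())
```

Calling read_input_symbols on the model path from the question would then give a mapping from label ids to symbols, or an empty dict if the model carries no input symbol table. Passing f.output_symbols() to symbol_table_to_dict works the same way for the output side.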

Answer 2

In OpenFST 1.7.6, you can extract both the input and output symbol tables with options of the fstprint command. Below is an abridged version of that command's --help output. The FST itself is printed to the screen in text form, while the symbol table(s) are written to the file(s) you specify.

Update: To avoid a lengthy screen print, send the output of fstprint to a scratch file such as "junk.txt", which you can delete afterwards while keeping the symbol table file. See the example at the end of the code section.

fstprint --help
Prints out binary FSTs in simple text format.

  Usage: c:\OpenFST\fstprint.exe [binary.fst [text.fst]]

  Flags Description:
    .
    .
    .
    .
    --save_isymbols: type = string, default = ""
      Save input symbol table to file
    --save_osymbols: type = string, default = ""
      Save output symbol table to file
   .
   .
   .

Usage example that works on Windows 10 at the PowerShell command prompt:

 fstprint --save_osymbols=HCLout.sym HCL.fst junk.txt

HCLout.sym will contain the output symbol table from HCL.fst, and junk.txt will contain the binary FST in simple text format; you can delete junk.txt afterwards. For the input symbols asked about in the question, use --save_isymbols in the same way.
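
If you want the result back in Python, the same steps can be scripted. This is only a sketch: it assumes fstprint is on your PATH and that the saved symbol file is plain text with one symbol and its integer id per line, separated by whitespace. The function names parse_symbol_file and dump_input_symbols are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def parse_symbol_file(path):
    """Parse a text symbol table into {label_id: symbol}.

    Assumption: each non-empty line is 'symbol<whitespace>id'.
    """
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        symbol, label = line.rsplit(None, 1)
        table[int(label)] = symbol
    return table


def dump_input_symbols(fst_path, sym_path, fstprint="fstprint"):
    """Run fstprint with --save_isymbols and parse the saved table.

    The text form of the FST goes to a scratch file that is deleted
    with its temporary directory, like junk.txt in the example above.
    """
    with tempfile.TemporaryDirectory() as scratch:
        subprocess.run(
            [
                fstprint,
                f"--save_isymbols={sym_path}",
                str(fst_path),
                str(Path(scratch) / "junk.txt"),
            ],
            check=True,  # raise if fstprint exits with an error
        )
    return parse_symbol_file(sym_path)
```

Swapping --save_isymbols for --save_osymbols in the command list gives the output symbol table instead.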