Emitters

Understanding and using emission models in HidTen.

Emitters are components that compute emission probabilities or scores for observations given hidden states. They define how likely each observation is to be emitted from each state.

A hidten.hmm.HMM can have multiple emitters, each responsible for a different aspect of the observation space. New emitters can be added with add_emitter(). When using multiple emitters, one input track per emitter must be provided to the Algorithms for Inference and Training, in the same order as the emitters.
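To make this concrete: conceptually, each emitter scores its own input track per state, and the per-state scores of independent tracks combine multiplicatively. The following plain NumPy sketch illustrates that combination for a single time step; it is an illustration only, not HidTen's actual internals:

```python
import numpy as np

# Hypothetical per-state emission probabilities from two emitters for one
# time step of a 3-state head: emitter 1 scores track A, emitter 2 track B.
track_a_scores = np.array([0.5, 0.3, 0.2])  # P(obs_a | state) per state
track_b_scores = np.array([0.1, 0.6, 0.3])  # P(obs_b | state) per state

# For independent tracks, the combined emission score per state is the
# element-wise product of the per-emitter scores.
combined = track_a_scores * track_b_scores
print(combined)
```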

Categorical

hidten.tf.emitter.categorical.TFCategoricalEmitter handles discrete observations with categorical distributions. Like everything in HidTen, CategoricalEmitters support multiple heads, i.e. parallel models, each with its own set of parameters and states.

In the following example, a CategoricalEmitter is created for a discrete alphabet {0, 1, 2, 3} and 2 HMM heads with 3 states each:

from hidten.tf import TFHMM
from hidten.tf.emitter.categorical import TFCategoricalEmitter

hmm = TFHMM(states=[3, 3])

# Create categorical emitter for discrete alphabet {0, 1, 2, 3}
emitter = TFCategoricalEmitter()
hmm.add_emitter(emitter)
emitter.initializer = [
    # head 1
    0.25, 0.25, 0.25, 0.25,  # state 1
    0.4, 0.2, 0.2, 0.2,      # state 2
    0., 0.5, 0.5, 0.,        # state 3
    # head 2
    0.3, 0.3, 0.2, 0.2,      # state 1
    0.5, 0.1, 0.2, 0.2,      # state 2
    0., 0.4, 0.4, 0.2,       # state 3
]

hmm.build((None, None, 4))

print(hmm.emitter[0].matrix())
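Given such a matrix, the categorical emission score for a one-hot observation is simply a row lookup. The following plain NumPy sketch uses the head-1 values from the initializer above; it illustrates what the lookup computes, not HidTen's internal implementation:

```python
import numpy as np

# Head 1 emission matrix from the example: 3 states x 4 symbols.
matrix = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.4,  0.2,  0.2,  0.2 ],
    [0.0,  0.5,  0.5,  0.0 ],
])

# One-hot encoding of symbol 1; the per-state emission probabilities
# reduce to a matrix-vector product.
obs = np.array([0.0, 1.0, 0.0, 0.0])
per_state = matrix @ obs
print(per_state)  # probability of emitting symbol 1 from each state
```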

Continuous

hidten.tf.emitter.multivariate_normal.TFMVNormalEmitter handles continuous observations with multivariate normal distributions.

from hidten.tf import TFHMM
from hidten.tf.emitter.multivariate_normal import TFMVNormalEmitter

hmm = TFHMM(states=[2, 2])

# Create multivariate normal emitter for 3-dimensional observations
emitter = TFMVNormalEmitter()
hmm.add_emitter(emitter)
emitter.initializer = [
    # head 1, state 1
    # means
    0.0, 0.0, 0.0,
    # variances
    0.9, 0.5, 1.0,
    # head 1, state 2
    # means
    0.5, 0.4, 0.6,
    # variances
    1.2, 0.5, 1.0,
    # head 2, state 1
    # means
    1.0, 2.0, 3.0,
    # variances
    0.9, 0.7, 1.0,
    # head 2, state 2
    # means
    0.7, 0.2, 0.3,
    # variances
    1.0, 1.0, 1.0,
]

hmm.build((None, None, 3))

print(hmm.emitter[0].matrix())
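For reference, the density a diagonal-covariance normal emitter evaluates can be written out directly. The means and variances below are the head-1, state-1 values from the example; this is a plain NumPy illustration of the formula, not HidTen's implementation:

```python
import numpy as np

def diag_normal_pdf(x, means, variances):
    """Density of a multivariate normal with diagonal covariance."""
    norm = np.prod(1.0 / np.sqrt(2.0 * np.pi * variances))
    expo = np.exp(-0.5 * np.sum((x - means) ** 2 / variances))
    return norm * expo

# Head 1, state 1 parameters from the example above.
means = np.array([0.0, 0.0, 0.0])
variances = np.array([0.9, 0.5, 1.0])

x = np.array([0.1, -0.2, 0.3])
print(diag_normal_pdf(x, means, variances))
```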

Padding

When the input sequences have variable length, they can be padded with zeros. In that case, a hidten.tf.emitter.base.TFPaddingEmitter must be added to the model; it expects the binary padding as a separate input track, where False or 0 marks a padding position. Padding positions are ignored in the Algorithms for Inference and Training.

Note: HidTen only supports padding sequences to the right.

The following code creates an HMM with 2 states and runs it on input sequences with padding.

import numpy as np

from hidten import HMMMode
from hidten.tf import TFHMM
from hidten.tf.emitter.categorical import TFCategoricalEmitter
from hidten.tf.emitter import TFPaddingEmitter

hmm = TFHMM(states=2)
hmm.add_emitter(TFCategoricalEmitter())
hmm.emitter[0].allow = [(0, 0), (1, 1)]
hmm.add_emitter(TFPaddingEmitter())
observations = np.array([
    [ [1., 0], [1, 0], [0, 1], [0, 0] ],
    [ [0., 1], [1, 0], [1, 0], [1, 0] ]
])
padding = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])
posterior = hmm(observations, padding, mode=HMMMode.POSTERIOR)
<tf.Tensor: shape=(2, 4, 1, 3), dtype=float32, numpy=
array([[[[1., 0., 0.]],

        [[1., 0., 0.]],

        [[0., 1., 0.]],

        [[0., 0., 1.]]],


       [[[0., 1., 0.]],

        [[1., 0., 0.]],

        [[1., 0., 0.]],

        [[1., 0., 0.]]]], dtype=float32)>

Under the hood, the PaddingEmitter creates a new state (last channel) that will be used only for padding positions. Notice how the previously defined HMM has 2 states, but HMMMode.POSTERIOR returns shape […, 3].
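The effect on the posterior can be mimicked in plain NumPy: every position flagged 0 in the padding track ends up with all of its mass on the appended padding state. This is a conceptual sketch of the result, not the actual mechanism:

```python
import numpy as np

# Posterior over the 2 real states for one sequence of length 4.
posterior = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],  # value here is irrelevant: this position is padded
])
padding = np.array([1, 1, 1, 0])  # 0 marks a padding position

# Append a padding state and route all mass of padded positions to it.
mask = padding.astype(bool)
padded = np.zeros((posterior.shape[0], posterior.shape[1] + 1))
padded[mask, :-1] = posterior[mask]
padded[~mask, -1] = 1.0
print(padded)
```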

Allowing and sharing

Emissions of specific symbols at specific states can be allowed or disallowed using the hidten.emitter.Emitter.allow property.

With the hidten.emitter.Emitter.share property, ranges of consecutive emission parameters can be tied together so that they share a single value.

In the following example, an emitter is created that only allows specific symbols at specific states and shares parameters between some of them:

from hidten import HMMConfig
from hidten.tf.categorical import TFCategoricalEmitter

emitter = TFCategoricalEmitter()
emitter.hmm_config = HMMConfig(states=[3, 2])

emitter.allow = [
    # head 1
    (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 2), (0, 2, 3),
    # head 2
    (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1), (1, 1, 2)
]

# this shares the parameters of the three emissions allowed from state 1
# in head 1 and of the two emissions allowed from state 2 in head 1
emitter.share = [(0, 3), (3, 5)]

emitter.initializer = [
    0.2, # shared 3 times, but will be rescaled to sum to one
    0.5, # shared 2 times
    1.,
    0.3, 0.7,
    0.2, 0.3, 0.5
]

emitter.build((None, None, 4))

print(emitter.matrix())
tf.Tensor(
[[[0.33333334 0.33333334 0.33333334 0.        ]
  [0.5        0.         0.5        0.        ]
  [0.         0.         0.         1.        ]]

 [[0.29999998 0.7        0.         0.        ]
  [0.2        0.3        0.5        0.        ]
  [0.         0.         0.         0.        ]]], shape=(2, 3, 4), dtype=float32)

Side note: This example demonstrates how to use an emitter without it being part of an HMM, which is done by providing a hidten.hmm.HMMConfig.

Creating Custom Emitters

To create custom emitters, inherit from hidten.tf.emitter.base.TFEmitter:

from hidten.tf.emitter import TFEmitter

class CustomEmitter(TFEmitter):
    ...
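Until this section is completed, the following plain NumPy sketch only illustrates the concept an emitter implements: mapping an observation to one score per state. The Poisson emitter here is entirely hypothetical and independent of the actual TFEmitter interface:

```python
import numpy as np
from math import factorial

# Hypothetical Poisson emitter: one rate per state (2 states).
rates = np.array([1.0, 5.0])

def poisson_scores(count):
    """P(count | state) for every state, under the Poisson distribution."""
    return np.exp(-rates) * rates ** count / factorial(count)

# Per-state emission probabilities for observing the count 3.
print(poisson_scores(3))
```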

TODO: This section is incomplete.