Priors
A hidten.prior.Prior can be attached to any hidten.generic.Module.
A prior rates the matrix() of its module and returns prior scores that describe how likely the current parameters are.
In the following example we’ll train a model that contains an HMM with a categorical
emitter. We’ll attach a hidten.tf.prior.dirichlet.TFDirichletPrior to the emitter.
Let’s assume we know a priori that the first symbol is much more likely to be observed than
the others, and that this is not necessarily reflected in the training data.
We’ll use the concentration parameters of the Dirichlet prior to encode this knowledge.
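To see why this works, recall the Dirichlet log-density: log p(x | α) = log Γ(Σα) − Σ log Γ(αᵢ) + Σ (αᵢ − 1) log xᵢ. A large concentration on the first component rewards emission rows that put mass on the first symbol. Here is a minimal plain-NumPy sketch of that density (independent of the hidten API; the concentration values match the ones used in the example below):

```python
import math

import numpy as np


def dirichlet_log_pdf(x: np.ndarray, alpha: np.ndarray) -> float:
    """log p(x | alpha) = log G(sum a) - sum log G(a_i) + sum (a_i - 1) log x_i."""
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + float(((alpha - 1.0) * np.log(x)).sum())


alpha = np.array([100.0, 0.1, 0.1])       # high concentration on the first symbol
skewed = np.array([0.8, 0.1, 0.1])        # emission row with mass on the first symbol
uniform = np.array([1 / 3, 1 / 3, 1 / 3])

# The prior scores the skewed row much higher than the uniform one, so
# minimizing the negative log-density pulls training towards the first symbol.
print(dirichlet_log_pdf(skewed, alpha) > dirichlet_log_pdf(uniform, alpha))  # True
```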
First, we define a model.
import numpy as np
import tensorflow as tf
from hidten import HMMMode
from hidten.tf import TFHMM
from hidten.tf.prior.dirichlet import TFDirichletPrior
class HMMModel(tf.keras.Model):
    def __init__(self, use_prior: bool = False) -> None:
        super().__init__()
        self.hmm = TFHMM(states=4)
        self.hmm.emitter[0].initializer = tf.keras.initializers.GlorotNormal()
        self._use_prior = use_prior
        if use_prior:
            prior = TFDirichletPrior()
            # concentration parameters are shared between all states in the head
            prior.share = list(range(3)) * 4
            # a priori we expect a high concentration on the first symbol
            prior.initializer = [100] + [0.1] * 2
            self.hmm.emitter[0].prior = prior
        self.hmm.transitioner.allow = [
            (0, 0, 0), (0, 1, 1),
            (0, 0, 1), (0, 1, 2),
            (0, 2, 2),
            (0, 2, 3),
            (0, 3, 3),
            (0, 3, 1),
        ]
        self.hmm.transitioner.share = [(0, 2), (2, 4)]
        self.hmm.transitioner.values = [0.6, 0.4, 0.8, 0.2, 0.1, 0.9]
        self.out = self.add_weight(
            shape=(1, 4, 5),
            initializer=tf.keras.initializers.GlorotNormal(),
        )

    def build(self, input_shape: tuple[int | None, ...]) -> None:
        self.hmm.build(input_shape)

    def call(self, x: tf.Tensor) -> tf.Tensor:
        x = tf.nn.softmax(x)  # normalize inputs so the HMM outputs are not NaN
        hmm_out = self.hmm(x, mode=HMMMode.POSTERIOR, parallel=25)
        if self._use_prior:
            prior_log_pdf = self.hmm.prior_scores()
            # maximizing the prior log-density means minimizing its negative
            prior_loss = -prior_log_pdf
            self.add_loss(prior_loss)
        return tf.einsum("bthd,hdo->bto", hmm_out, self.out)
We train the model on random data. This data clearly does not reflect the distribution we want to learn, but the Dirichlet prior should help guide the learning process.
ds = tf.data.Dataset.from_tensor_slices((
np.random.normal(size=(64, 1000, 3)),
np.random.randint(0, 5, size=(64, 1000)),
))
ds = ds.batch(32)
model = HMMModel(use_prior=True)
model.build((None, None, 3))
model.compile(
    # larger learning rate so the effect of the prior is visible after a very
    # short training run
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    jit_compile=True,
)
before_training = model.hmm.emitter[0].matrix()
model.fit(ds, epochs=3)
after_training = model.hmm.emitter[0].matrix()
Before training, the emission matrix was:
<tf.Tensor: shape=(1, 4, 3), dtype=float32, numpy=
array([[[0.35459444, 0.31414294, 0.33126262],
[0.37932956, 0.4167584 , 0.20391206],
[0.2959837 , 0.31581268, 0.38820365],
[0.2735348 , 0.53488946, 0.19157575]]], dtype=float32)>
After training we can observe a shift towards the first symbol:
<tf.Tensor: shape=(1, 4, 3), dtype=float32, numpy=
array([[[0.64187217, 0.17431244, 0.18381536],
[0.66569066, 0.22450018, 0.10980911],
[0.5790523 , 0.18882595, 0.2321217 ],
[0.5522496 , 0.32969874, 0.11805164]]], dtype=float32)>
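As a quick sanity check, we can average the probability of the first symbol across the four states in the matrices printed above (plain NumPy, using the printed values):

```python
import numpy as np

before = np.array([[0.35459444, 0.31414294, 0.33126262],
                   [0.37932956, 0.4167584 , 0.20391206],
                   [0.2959837 , 0.31581268, 0.38820365],
                   [0.2735348 , 0.53488946, 0.19157575]])
after = np.array([[0.64187217, 0.17431244, 0.18381536],
                  [0.66569066, 0.22450018, 0.10980911],
                  [0.5790523 , 0.18882595, 0.2321217 ],
                  [0.5522496 , 0.32969874, 0.11805164]])

# mean probability assigned to the first symbol, averaged over the four states
print(round(before[:, 0].mean(), 2))  # 0.33
print(round(after[:, 0].mean(), 2))   # 0.61
```

Despite the uninformative random training data, the prior roughly doubled the average probability of the first symbol.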