API Reference

HidTen: A framework for combining hidden Markov models with modern deep learning.

This page provides a complete reference to all public classes, functions, and modules in HidTen.

Core Module

The main hidten module exports the primary classes and functions.

HMM Classes

Core HMM implementation with configuration and modes.

class hidten.hmm.HMM

Bases: Generic[T_Tensor]

A modular hidden Markov model (HMM).

add_emitter(emitter)
Parameters:

emitter (Emitter[T_Tensor])

Return type:

None

config: HMMConfig
emission_scores(*observations)

Computes the joint emission scores for all emitters.

Parameters:

observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension. A tensor of shape (B, T, D) may also be supplied if all heads should receive the same input.

Returns:

The joint emission scores of shape (B, T, H, Q).

Return type:

Tensor

property emitter: list[Emitter[T_Tensor]]
abstractmethod joint_log_prob(*observations, states)

Computes the joint log-probability of the observations o and the states x, i.e. log P(o, x).

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • states (Tensor) – The hidden state indices of shape (B, T, H).

Returns:

The joint log-probability of the observations and the states of shape (B, H).

Return type:

Tensor
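For intuition, the joint log-probability factorizes into the start, transition, and emission terms of the state path. The following is an illustrative single-head numpy sketch of this quantity, not the hidten implementation (which is batched over B and H):

```python
import numpy as np

def joint_log_prob(log_start, log_trans, log_emit, obs, states):
    """log P(o, x) for one sequence of a single-head HMM.

    log_start: (Q,) log start distribution
    log_trans: (Q, Q) log transition matrix
    log_emit:  (Q, D) log emission matrix
    obs:       (T,) observation symbol indices
    states:    (T,) hidden state indices
    """
    lp = log_start[states[0]] + log_emit[states[0], obs[0]]
    for t in range(1, len(obs)):
        lp += log_trans[states[t - 1], states[t]]  # transition term
        lp += log_emit[states[t], obs[t]]          # emission term
    return lp
```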

prior_scores()

Calculates the prior scores for the HMM.

Returns:

The prior scores of shape (H), where H is the number of heads.

Return type:

Tensor

abstractmethod sample(B, T)

Samples tensors from the HMM by repeatedly sampling states according to the possible state transitions and then sampling possible emissions from those states.

Parameters:
  • B (int) – Number of sequences to sample.

  • T (int) – Number of time steps to sample.

Returns:

State tensor of shape (B, T, H, Q) and emission tensors (as many as emitters in the HMM) of shape (B, T, H, D), where Q is the number of states and D is the size of the emission alphabet.

Return type:

tuple[tf.Tensor, tuple[tf.Tensor, ...]]
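The sampling procedure described above is ancestral sampling. A minimal single-head numpy sketch of the idea (the library version is batched and returns one-hot tensors of shape (B, T, H, Q) and (B, T, H, D); here we return plain index sequences):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(start, trans, emit, T):
    """Ancestral sampling from a single-head HMM.

    start: (Q,) start distribution, trans: (Q, Q) row-stochastic
    transition matrix, emit: (Q, D) emission distributions.
    Returns state indices and emission indices, each of shape (T,).
    """
    Q, D = emit.shape
    states = np.empty(T, dtype=int)
    emissions = np.empty(T, dtype=int)
    states[0] = rng.choice(Q, p=start)
    for t in range(T):
        if t > 0:
            # sample the next state given the previous one
            states[t] = rng.choice(Q, p=trans[states[t - 1]])
        # sample an emission from the current state
        emissions[t] = rng.choice(D, p=emit[states[t]])
    return states, emissions
```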

property transitioner: Transitioner[T_Tensor]
use_padding()

Returns True if the HMM supports variable-length inputs.

Return type:

bool

class hidten.hmm.HMMConfig(*, states, heads=None, max_states=None)

Bases: ModelConfig

The basic configuration for any HMM.

Parameters:
  • states (Sequence[int])

  • heads (int)

  • max_states (int)

heads: int

Number of heads in the HMM. If not specified, it is inferred from the length of states. Defaults to 1.

max_states: int

Maximal number of states across all HMM heads.

model_config = {'frozen': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

states: Sequence[int]

The number of states in each head of the HMM. Can also be given as a single integer, which is automatically expanded to the given number of heads.
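The interplay of states and heads can be sketched in plain Python; this is an illustrative stand-in for the documented expansion rules, not the pydantic validator itself:

```python
def expand_states(states, heads=None):
    """Sketch of the documented behaviour: a single integer is
    repeated once per head, and heads defaults to len(states)."""
    if isinstance(states, int):
        states = [states] * (heads if heads is not None else 1)
    if heads is None:
        heads = len(states)
    if len(states) != heads:
        raise ValueError("length of states must match heads")
    return list(states), heads
```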

classmethod validate_config(values)
class hidten.hmm.HMMMode(*values)

Bases: Enum

Collection of modes for an HMM. Changing the mode leads to a different behaviour when calling a hidten.HMM as a function via __call__.

BACKWARD_LOG = 2

Enables calculation of the logarithmic backward variables.

EMISSION_SCORES = 7

Only outputs the scores of the given emissions without utilizing any transitions of the model.

FORWARD_LOG = 0

Enables calculation of the logarithmic forward variables.

FORWARD_SCALED = 1

Enables calculation of the scaled forward variables.

LIKELIHOOD_LOG = 3

Enables calculation of the log-likelihood.

MEA = 6

Enables calculation of MEA state sequences.

POSTERIOR = 4

Enables calculation of posterior state distributions.

VITERBI = 5

Enables calculation of Viterbi state sequences.
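The VITERBI mode corresponds to the classic max-product recursion. For reference, an illustrative single-head numpy sketch of that recursion (the library version is batched over B and H and works on emission scores):

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit, obs):
    """Most likely state sequence argmax_x P(o, x) for one head."""
    T = len(obs)
    Q = log_start.shape[0]
    delta = log_start + log_emit[:, obs[0]]      # (Q,) best path scores
    back = np.zeros((T, Q), dtype=int)           # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # (Q, Q): prev x next
        back[t] = scores.argmax(axis=0)          # best predecessor
        delta = scores.max(axis=0) + log_emit[:, obs[t]]
    # backtrack from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```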

Configuration

Base configuration classes and decorators for model setup.

class hidten.config.ModelConfig

Bases: BaseModel

A base model configuration class that new config classes should inherit from. A method self.validate_config can be defined where value errors can be thrown or fields can be updated.

model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

hidten.config.with_config(config_class, overwrite_init=False)

Decorator for any class that requires a number of simple configuration attributes. Expects the decorated class to have a ModelConfig object of name config. Adds methods from_config and get_config to the class, as well as an attribute config_cls that is the type of the given config.

Example use case:

```python
class MyClassConfig(ModelConfig):
    n: int
    p: float

@with_config(MyClassConfig)
class MyClass:
    def __init__(self, **kwargs) -> None:
        self.config = self.config_cls(**kwargs)
        self.other_init_attr = 62 * self.config.n
```

Parameters:
  • config_class (type) – Class that inherits from ModelConfig.

  • overwrite_init (bool, optional) – If set to True, overwrites the __init__ method of the decorated class to automatically create the self.config attribute. A new method called self.post_config_init can be implemented that is called at the end of the overwritten __init__. Defaults to manual __init__ implementation.

Return type:

Callable[[type[T_Model]], type[T_Model]]
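A plain-Python stand-in for what such a decorator does may help; this sketch mimics the documented behaviour (config_cls, from_config, get_config, and the optional overwritten __init__) with a simple non-pydantic config class, and is not the hidten implementation:

```python
def with_config(config_class, overwrite_init=False):
    """Attach config_cls, from_config and get_config to a class."""
    def decorator(cls):
        cls.config_cls = config_class

        def from_config(kls, config):
            return kls(**config)

        def get_config(self):
            return dict(self.config.__dict__)

        cls.from_config = classmethod(from_config)
        cls.get_config = get_config
        if overwrite_init:
            def __init__(self, **kwargs):
                self.config = self.config_cls(**kwargs)
                post = getattr(self, "post_config_init", None)
                if post is not None:
                    post()
            cls.__init__ = __init__
        return cls
    return decorator

class MyClassConfig:
    def __init__(self, n=1, p=0.5):
        self.n = n
        self.p = p

@with_config(MyClassConfig, overwrite_init=True)
class MyClass:
    pass
```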

Emitters

Emission models that map observations to feature values.

class hidten.emitter.Emitter

Bases: Module[T_Tensor]

The Emitter base class.

An emitter is a module that maps sequences of observations to feature values. The feature values describe how likely each observation is to be emitted by each hidden state.

property clip_max: float

Clips the calculated emission scores to the given maximum.

property clip_min: float

Clips the calculated emission scores to the given minimum.

abstractmethod emission_scores(observations)

Computes how likely each observation is emitted by any hidden state.

Parameters:

observations (Tensor) – The observation sequences of shape (B, T, D) or (B, T, H, D), where D is the size of the emission alphabet.

Returns:

A state tensor of shape (B, T, H, Q), where

Q is the number of hidden states.

Return type:

Tensor

abstractmethod matrix()

Calculates the emission matrix based on parameters of this class.

Returns:

The emission matrix of shape (H, Q, K) where

H is the number of heads, Q is the number of hidden states and K is the matrix_dim.

Return type:

Tensor

abstractmethod sample(state)

Samples a possible emission given the current state.

Parameters:

state (Tensor) – Tensor of shape (B, H, Q), where B is a batch dimension, H is the number of heads of the HMM and Q is the number of states.

Returns:

Emission tensor of shape (B, H, Q).

Return type:

Tensor

abstractmethod state_emissions(states)

Computes how likely any observation is emitted by the given hidden states.

Parameters:

states (Tensor) – The input state sequences of shape (B, T, H, Q), where Q is the number of hidden states.

Returns:

The emission distributions of shape

(B, T, H, D), where D is the size of the emission alphabet.

Return type:

Tensor

class hidten.emitter.PaddingEmitter

Bases: Emitter[T_Tensor]

A specialized emitter to handle padding tracks. Adding this emitter makes an HMM support variable-length sequences. Accepts padding as a binary input track where “true” or “1” indicates a non-padded position that should be used in calculations.

property allow: ndarray

Not supported for PaddingEmitter.

abstractmethod emission_scores(observations)

Returns emission scores with the following scheme:

                         | input 0 | input 1
-------------------------|---------|--------
  padding emission state |    1    |    0
  all other states       |    0    |    1

Parameters:

observations (Tensor) – The observation sequences of shape (B, T, D) or (B, T, H, D), where D is the size of the emission alphabet.

Returns:

A state tensor of shape (B, T, H, Q), where

Q is the number of hidden states including an internal, appended padding emission state.

Return type:

Tensor
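The scheme above amounts to routing the binary padding track into the appended padding state. A hypothetical numpy sketch of that construction, omitting the head dimension H for brevity (not the hidten implementation):

```python
import numpy as np

def padding_scores(padding, n_states):
    """padding: binary track of shape (B, T), 1 = non-padded.
    Returns scores of shape (B, T, Q+1), where the appended last
    state is the internal padding emission state."""
    B, T = padding.shape
    scores = np.zeros((B, T, n_states + 1))
    scores[..., :n_states] = padding[..., None]  # real states score 1 on input 1
    scores[..., n_states] = 1 - padding          # padding state scores 1 on input 0
    return scores
```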

property share: ndarray | None

Not supported for PaddingEmitter.

Transitioners

Transition models that handle state-to-state transitions.

class hidten.transitioner.TransitionMode(*values)

Bases: IntFlag

Collection of modes for the transitioner step.

ALLOWED = 8

Instead of the regular transition matrix, uses a matrix that contains ones for allowed transitions and zeros otherwise.

LOG_SUM_EXP = 2

Calculates the log-sum-exp over the latent states.

MAX = 4

Computes the maximum over the latent states.

REVERSE = 16

Computes steps in reverse order.

SUM = 1

Sums over the latent states.

class hidten.transitioner.Transitioner

Bases: Module[T_Tensor]

The Transitioner base class.

A transitioner is a module that implements a discrete time step of a Markov chain for a given number of heads. The state graph of each head can be restricted in the number of states and the allowed transitions between states.

property allow_start: ndarray

Restrict the starting distribution by only allowing the given states. If this method is not called before the model is built, all states are allowed.

Returns:

Array of pairs (h, i), where

i is an allowed starting state for head h. If only tuples of one index (i,) or plain integers i are given, they are assumed to apply to every head.

Return type:

indices (np.ndarray)

property hmm_config: HMMConfig
abstractmethod matrix(transition_delta=None)

Computes the state transition matrices for each model.

Parameters:

transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

Returns:

The state transition probabilities of shape

(H, Q, Q) (or (B, T, H, Q, Q) if transition_delta is provided).

Return type:

Tensor

property mode: TransitionMode
prior_scores()

Calculates the prior scores for the modules’ parameters in log-scale.

Returns:

The prior scores of shape (H), where H

is the number of heads.

Return type:

Tensor

property prior_start: Prior[T_Tensor] | None
abstractmethod sample(state=None)

Samples a possible next state given the previous.

Parameters:

state (Tensor, optional) – Tensor of shape (B, H, Q), where B is a batch dimension, H is the number of heads of the HMM and Q is the number of states. If no state is supplied, the Transitioner uses the start distribution with B=1.

Returns:

Next state tensor of shape (B, H, Q).

Return type:

Tensor

property share_start: list[int]

Allows sharing parameters of the start distribution.

Parameters:

indices (list of ranges or indices) – A sequence of index pairs (a,b) where a and b are indices of the tuples in Transitioner.allow_start. It has to be of the same length as the initial values given for the start distribution. Alternatively, it can be a list of indices of the same length as allow_start. These indices are then used to mark which value in the given initial values will be used.

abstractmethod start_dist()

Computes the initial distribution for each model.

Returns:

The start distributions of shape (H, Q).

Return type:

Tensor

abstractmethod step(x, t=None)

Performs a single time step of the Markov chain.

Parameters:
  • x (Tensor) – The input tensor of shape (H, B, Q). B is the batch size, H is the number of heads and Q is the number of states.

  • t (int, optional) – The time step index. This is only needed for transitioners that use transition deltas.

Returns:

The output tensor of shape (H, B, Q).

Return type:

Tensor
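In SUM mode, one step is a per-head matrix product of the state distributions with the transition matrices; in MAX mode, the sum over the previous state is replaced by a maximum. An illustrative numpy sketch of both (not the hidten implementation):

```python
import numpy as np

def step_sum(x, trans):
    """One SUM-mode time step per head.

    x:     (H, B, Q) state distributions
    trans: (H, Q, Q) row-stochastic transition matrices
    Returns the propagated distributions of shape (H, B, Q).
    """
    return np.einsum("hbq,hqr->hbr", x, trans)

def step_max(x, trans):
    """MAX mode: Viterbi-style maximisation over the previous state."""
    # (H, B, Q, 1) * (H, 1, Q, Q) -> max over the previous-state axis
    return (x[..., :, None] * trans[:, None]).max(axis=-2)
```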

Priors

Prior distributions for model parameters.

class hidten.prior.Prior

Bases: Module[T_Tensor]

The prior base class.

A prior is a module that takes as input a tensor of shape (…, Q, K) representing the parameters of another module, where Q is the number of hidden states and K is a kernel size. The prior rates this parameter set, usually in a probabilistic way, and returns prior_scores that rate how likely the parameters are.

abstractmethod prior_scores(values)

Computes prior scores for the input values.

Parameters:

values (Tensor) – The input values of shape (H, Q, D), where D is the prior feature dimension.

Returns:

The prior scores of shape

(H) with prior scores for each head summed over the states.

Return type:

Tensor

Generic Base Classes

Abstract base classes for modules and distributions.

class hidten.generic.Module

Bases: Generic[T_Tensor]

Base class for all modules in the HMM framework.

__call__(*observations)

Calls the module with the given observations and returns the result.

Parameters:

observations (Tensor or tuple of Tensors) – The input sequence(s). Each tensor i should be broadcastable to the shape (N_1, .., N_k, D_{in}^{(i)}), i.e. they agree on all dimensions except for the last dimension D_{in}^{(i)}, the individual feature dimensions.

Returns:

Output of shape (N_1, .., N_k, D_{out}).

Return type:

Tensor

property allow: ndarray

Restrict the matrix of the module by only allowing a subset of the matrix entries to be modeled. If this method is not called before the model is built, a dense matrix is modeled (everything allowed).

Returns:

Array of triplets (h, i, j),

where (i, j) is an allowed matrix entry from state i to column j for some head index h. For emitters, j refers to a symbol in the emission alphabet; for transitioners, j refers to another state. If pairs (i, j) are given, they are assumed to be the same for all available heads and are therefore internally expanded to triplets. The given order of indices is preserved throughout the Module. If an initializer is specified, values in this initializer always correspond to the (possibly automatically) expanded triplets.

Return type:

indices (np.ndarray)

property heads: int

The number of heads in the HMM configuration.

property hmm_config: HMMConfig
abstractmethod matrix()

Calculates a matrix based on a parameter kernel for this module. This could be a transition matrix or a matrix describing emission distributions.

Returns:

The emission matrix of shape (H, Q, R) where

H is the number of heads, Q is the number of hidden states and R is a feature dimension.

Return type:

Tensor

property max_states: int

The maximum number of states across all heads.

property prior: Prior[T_Tensor] | None
prior_scores()

Calculates the prior scores for the modules’ parameters in log-scale.

Returns:

The prior scores of shape (H), where H

is the number of heads.

Return type:

Tensor

property share: ndarray | None

Defines if some parameters of the module are shared among different states.

Parameters:

indices (list of ranges or list of indices or array of indices) – A sequence of index pairs (a,b) where a and b are indices of the tuples in allow. It has to be of the same length as the initial values given for the kernel. Alternatively, it can be a list of indices of the same length as allow. These indices are then used to mark which value in the given initial values will be used.

Returns:

Array of indices of the same length as

allow, where indices in share mark which values in the parameter kernel are shared.

Return type:

share (np.ndarray)

property states: list[int]

The number of states for each head.

Utilities

Utility functions and helper classes.

hidten.util.expand_indices(indices, inner_length, max_head, max_index=None)

Expands tuples in a given list to the given inner_length. All tuples must have length inner_length or inner_length-1. Shorter tuples (i_1, ..., i_p) are expanded to (j, i_1, ..., i_p) for j = 1, ..., max_head. They are expanded in their original position, shifting subsequently specified tuples by max_head-1 positions.

Parameters:
  • indices (sequence of sequence of int) – The sequence of tuples of integers to expand. The inner sequence can also be an integer, which is internally replaced by a tuple of length one.

  • inner_length (int) – Target length of the given tuples.

  • max_head (int) – Maximal first index in the tuples.

  • max_index (list[int | list[int]], optional) – List of maximal possible indices (i_1, ..., i_p) to use over different heads j. If the maximal indices are lists themselves, they are understood per-axes. For example, max_index=[(3, 5), (2, 3)] are maximal indices for two heads, in which the first head has a tensor of shape (3, 5). Defaults to no index checking.

Return type:

list[tuple[int, …]]
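A simplified sketch of this expansion may clarify the rules; it omits the max_index checking, and assumes 0-based head indices (whether heads are 0- or 1-indexed internally is not specified here):

```python
def expand_indices(indices, inner_length, max_head):
    """Expand short tuples by prepending every head index in place."""
    out = []
    for idx in indices:
        if isinstance(idx, int):
            idx = (idx,)          # bare integers become length-one tuples
        idx = tuple(idx)
        if len(idx) == inner_length:
            out.append(idx)       # already fully specified
        elif len(idx) == inner_length - 1:
            # repeat once per head, head index prepended
            out.extend((h,) + idx for h in range(max_head))
        else:
            raise ValueError("tuple of unexpected length")
    return out
```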

hidten.util.expand_share(share, N)

Converts a given representation of shared parameters to the internally used representation.

Parameters:
  • share (list of ranges or indices) – If a list of indices is given, it is left unchanged. For a given list of ranges (a,b), the method returns a list of ascending indices of length N where the indices in list[a:b] are the same.

  • N (int) – Size of the resulting list, which is equal to the number of virtual parameters. With shared parameters, this is larger than the actual number of parameters.

Return type:

list[int]
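Under one plausible reading of the range semantics above, positions a..b-1 of the result collapse onto a single shared index while all other positions receive fresh ascending indices. A hypothetical sketch of that reading (the library's exact semantics may differ in details):

```python
def expand_share(share, N):
    """Convert a list of ranges (a, b) to N ascending indices where
    positions a..b-1 share one index; index lists pass through."""
    if all(isinstance(s, int) for s in share):
        return list(share)        # already in the internal representation
    out = []
    nxt = 0                       # next fresh index to assign
    pos = 0
    ranges = sorted(share)
    while pos < N:
        if ranges and ranges[0][0] == pos:
            a, b = ranges.pop(0)
            out.extend([nxt] * (b - a))   # whole range shares one index
            pos = b
        else:
            out.append(nxt)
            pos += 1
        nxt += 1
    return out
```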

hidten.util.validate_indices(indices)

Validates the indices. Valid indices are those such that all states have at least one outgoing edge in each head and there are no multi-edges between states of the same head.

Parameters:

indices (list of triplets) – List of triplets (h, i, j), where (i, j) is an allowed connection between states i and j for some head index h of the HMM.

Return type:

None

TensorFlow Backend

The TensorFlow backend provides concrete implementations optimized for TensorFlow.

Core TensorFlow HMM

Main TensorFlow implementation of HMM.

class hidten.tf.hmm.TFHMM(*args, **kwargs)

Bases: Layer, HMM[Tensor], TFViterbiMixin, TFForwardBackwardMixin

The TensorFlow implementation of the modular HMM.

build(input_shape)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...])

Return type:

None

call(*observations, transition_delta=None, mode=HMMMode.FORWARD_LOG, parallel=1, output_dtype=tf.int32)

Calls the layer with an operation defined by its call mode.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • mode (HMMMode) – The call mode of the HMM. Defaults to HMMMode.FORWARD_LOG.

  • parallel (int) – The parallel factor for the HMM. This is equal to the number of sequence chunks that are processed in parallel. Defaults to 1.

  • output_dtype (tf.dtypes.DType, optional) – The output dtype for Viterbi and MEA modes. Defaults to tf.int32.

Returns:

A tensor whose shape depends on the call mode of the HMM (see TFForwardBackwardMixin and TFViterbiMixin).

Return type:

Tensor

property emitter: list[TFEmitter]
classmethod from_config(config, custom_objects=None)

Initializes and returns an object of this class with the given configuration dictionary.

Parameters:
  • config (dict[str, Any])

  • custom_objects (None)

Return type:

T_Model

get_config()

Returns the config used in the class as a dictionary.

Return type:

dict[str, Any]

joint_log_prob(*observations, states)

Computes the joint log-probability of the observations o and the states x, i.e. log P(o, x).

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • states (Tensor) – The hidden state indices of shape (B, T, H).

Returns:

The joint log-probability of the observations and the states of shape (B, H).

Return type:

Tensor

sample(B, T)

Samples tensors from the HMM by repeatedly sampling states according to the possible state transitions and then sampling possible emissions from those states.

Parameters:
  • B (int) – Number of sequences to sample.

  • T (int) – Number of time steps to sample.

Returns:

State tensor of shape (B, T, H, Q) and emission tensors (as many as emitters in the HMM) of shape (B, T, H, D), where Q is the number of states and D is the size of the emission alphabet.

Return type:

tuple[tf.Tensor, tuple[tf.Tensor, ...]]

property transitioner: TFTransitioner

TensorFlow Emitters

TensorFlow-specific emission models.

class hidten.tf.emitter.base.TFEmitter(*args, **kwargs)

Bases: TFLayer, Emitter[Tensor]

Base class for TF emitters.

call(emissions, use_padding=False)
Parameters:
  • emissions (Tensor)

  • use_padding (bool)

Return type:

Tensor

class hidten.tf.emitter.base.TFPaddingEmitter(*args, **kwargs)

Bases: Layer, PaddingEmitter[Tensor]

TensorFlow implementation of the PaddingEmitter.

call(emissions, use_padding=False)
Parameters:
  • emissions (Tensor)

  • use_padding (bool)

Return type:

Tensor

emission_scores(observations)

Returns emission scores with the following scheme:

                         | input 0 | input 1
-------------------------|---------|--------
  padding emission state |    1    |    0
  all other states       |    0    |    1

Parameters:

observations (Tensor) – The observation sequences of shape (B, T, D) or (B, T, H, D), where D is the size of the emission alphabet.

Returns:

A state tensor of shape (B, T, H, Q), where

Q is the number of hidden states including an internal, appended padding emission state.

Return type:

Tensor

matrix()

Calculates the emission matrix based on parameters of this class.

Returns:

The emission matrix of shape (H, Q, K) where

H is the number of heads, Q is the number of hidden states and K is the matrix_dim.

Return type:

Tensor

class hidten.tf.emitter.categorical.TFCategoricalEmitter(*args, **kwargs)

Bases: TFEmitter

An emitter for categorical observations.

build(input_shape)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...])

Return type:

None

emission_scores(observations)

Computes how likely each observation is emitted by any hidden state.

Parameters:

observations (Tensor) – The observation sequences of shape (B, T, D) or (B, T, H, D), where D is the size of the emission alphabet.

Returns:

A state tensor of shape (B, T, H, Q), where

Q is the number of hidden states.

Return type:

Tensor

property initializer: Initializer

The initializer of the emission matrix. Defaults to constant values.

matrix()

Calculates the emission matrix based on parameters of this class.

Returns:

The emission matrix of shape (H, Q, K) where

H is the number of heads, Q is the number of hidden states and K is the matrix_dim.

Return type:

Tensor

sample(state)

Samples a possible emission given the current state.

Parameters:

state (Tensor) – Tensor of shape (B, H, Q), where B is a batch dimension, H is the number of heads of the HMM and Q is the number of states.

Returns:

Emission tensor of shape (B, H, Q).

Return type:

Tensor

state_emissions(states)

Computes how likely any observation is emitted by the given hidden states.

Parameters:

states (Tensor) – The input state sequences of shape (B, T, H, Q), where Q is the number of hidden states.

Returns:

The emission distributions of shape

(B, T, H, D), where D is the size of the emission alphabet.

Return type:

Tensor

class hidten.tf.emitter.multivariate_normal.TFMVNormalEmitter(*args, **kwargs)

Bases: TFEmitter

An emitter for multivariate normally distributed observations.

build(input_shape)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...])

Return type:

None

emission_scores(observations)

Computes how likely each observation is emitted by any hidden state.

Parameters:

observations (Tensor) – The observation sequences of shape (B, T, D) or (B, T, H, D), where D is the size of the emission alphabet.

Returns:

A state tensor of shape (B, T, H, Q), where

Q is the number of hidden states.

Return type:

Tensor

classmethod from_config(config, custom_objects=None)

Initializes and returns an object of this class with the given configuration dictionary.

Parameters:
  • config (dict[str, Any])

  • custom_objects (None)

Return type:

T_Model

get_config()

Returns the config used in the class as a dictionary.

Return type:

dict[str, Any]

get_default_initializer()
Return type:

Initializer

property initializer: Initializer

The initializer of the distribution. The flat initializer must be resizeable to (H, Q, P) where P is the number of distribution parameters per head and state.

Let D be the dimensionality of the observations. If full covariance is used, then the initializer is required to contain D means followed by the flat, row-major covariance matrix (P = D + D^2). The full, symmetric covariance matrix is expected. Otherwise the means are followed by D variances (P = 2D) per head and state.

Defaults to means of zeros and variances of ones.

log_prob(observations)

Computes the log density function of the observations under the multivariate normal distribution.

Parameters:

observations (Tensor) – The observations to compute the log probability for. The shape should be (B, T, D) or (B, T, H, D).

Returns:

The log probability of the observations under the

multivariate normal distribution of shape (B, T, H, Q).

Return type:

Tensor

matrix(sqrt_variance=False)

Calculates a tensor with the parameters of the multivariate normal distributions per head and state. The first D values are the means, the rest are the variances, either diagonal or full.

Parameters:

sqrt_variance (bool) – Whether to return the square root of the variance (the scale matrix A) instead of the variance itself. For full covariance, this is the Cholesky factor A such that covariance = A @ A^T.

Returns:

The emission matrix of shape (H, Q, P) where

H is the number of heads, Q is the number of hidden states and P is the number of parameters.

Return type:

Tensor

property matrix_dim: int

Last dimension of the matrix.

mean(matrix=None)

The mean vector of the multivariate normal distribution with shape (H, Q, D).

Parameters:

matrix (Tensor | None)

Return type:

Tensor

sample(state)

Samples a possible emission given the current state.

Parameters:

state (Tensor) – Tensor of shape (B, H, Q), where B is a batch dimension, H is the number of heads of the HMM and Q is the number of states.

Returns:

Emission tensor of shape (B, H, Q).

Return type:

Tensor

sqrt_variance(matrix=None)

Returns the square root of the variance of the multivariate normal distribution. For full covariance, this is the Cholesky factor A such that covariance = A @ A^T.

Parameters:

matrix (Tensor | None)

Returns:

A matrix A of shape (H, Q, D, D) if full

covariance is used, or a vector of shape (H, Q, D) with entry-wise sqrt(variance) if diagonal covariance is used.

Return type:

Tensor

state_emissions(states)

Computes how likely any observation is emitted by the given hidden states.

Parameters:

states (Tensor) – The input state sequences of shape (B, T, H, Q), where Q is the number of hidden states.

Returns:

The emission distributions of shape

(B, T, H, D), where D is the size of the emission alphabet.

Return type:

Tensor

variance(matrix=None)

The covariance matrix of the multivariate normal distribution with shape (H, Q, D, D) if full covariance is used, or (H, Q, D) if diagonal covariance is used.

Parameters:

matrix (Tensor | None)

Return type:

Tensor

class hidten.tf.emitter.multivariate_normal.TFMVNormalEmitterConfig(*, full_covariance=False, temperature=1.0)

Bases: ModelConfig

The configuration for a multivariate normal emitter.

Parameters:
  • full_covariance (bool)

  • temperature (float)

full_covariance: bool

Whether to use a full covariance matrix or a diagonal one. Defaults to False.

model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

temperature: float

Temperature for the multivariate normal distribution.

hidten.tf.emitter.multivariate_normal.mvn_init_transform(x, input_dim, full_covariance=False, num_components=1)

Transforms means and (co-)variances to kernel values.

Parameters:
  • x (Tensor) – Initializer values with shape that can be reshaped to (H*Q, init_total_dim) with structure: [means_comp1, …, means_compZ, vars_comp1, …, vars_compZ, mix_coef]

  • input_dim (int) – Dimensionality of observations.

  • full_covariance (bool) – Whether to use full covariance matrices.

  • num_components (int) – Number of mixture components.

Returns:

Transformed kernel values of shape (H*Q*output_total_dim,) with structure: [mean_comp1, …, mean_compZ, scale_comp1, …, scale_compZ, mix_coef]

Return type:

Tensor

hidten.tf.emitter.multivariate_normal.mvn_log_prob(values, mean, scale)

Computes the log density function of the values under the multivariate normal distribution with parameters mean and scale. Supports multiple mixture components Z.

Parameters:
  • values (Tensor) – The values to compute the log probability for. The shape should be (B, T, D) or (B, T, H, D) or (B, T, H, Q, D).

  • mean (Tensor) – The mean of the multivariate normal distribution of shape (H, Q, Z, D).

  • scale (Tensor) – The scale of the multivariate normal distribution. A Matrix A such that covariances = A @ A^T of shape (H, Q, Z, D, D) if full covariance is used, or (H, Q, Z, D) if diagonal covariance is used.

Returns:

The log probability of the values under the

multivariate normal distribution of shape (B, T, H, Q, Z).

Return type:

Tensor
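For the diagonal-covariance case, the density above reduces to a sum of independent one-dimensional Gaussian terms. An illustrative numpy sketch, without the (B, T) against (H, Q, Z) broadcasting that the library performs (not the hidten implementation):

```python
import numpy as np

def diag_mvn_log_prob(values, mean, scale):
    """Diagonal-covariance multivariate normal log density.

    values, mean, scale all broadcastable to (..., D), with
    variance = scale**2. Returns log N(values; mean, diag(scale**2)).
    """
    D = values.shape[-1]
    z = (values - mean) / scale                  # standardised residuals
    return (-0.5 * np.sum(z**2, axis=-1)
            - np.sum(np.log(scale), axis=-1)     # log-determinant term
            - 0.5 * D * np.log(2 * np.pi))       # normalisation constant
```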

TensorFlow Transitioners

TensorFlow-specific transition models.

class hidten.tf.transitioner.TFTransitioner(*args, **kwargs)

Bases: TFLayer, Transitioner[Tensor]

build(input_shape=None)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...] | None)

Return type:

None

call(x, t=None, batch_first=True)
Parameters:
  • x (Tensor)

  • t (int | None)

  • batch_first (bool)

Return type:

Tensor

property initializer: Initializer

The initializer of the transition values that are allowed in the Transitioner. Defaults to constant values.

property initializer_start: Initializer

The initializer of the start distribution of allowed states in the Transitioner. Defaults to constant values.

matrix(transition_delta=None)

Computes the state transition matrices for each model.

Parameters:

transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

Returns:

The state transition probabilities of shape

(H, Q, Q) (or (B, T, H, Q, Q) if transition_delta is provided).

Return type:

Tensor

property matrix_dim: int

Last dimension of the matrix.

sample(state=None)

Samples a possible next state given the previous.

Parameters:

state (Tensor, optional) – Tensor of shape (B, H, Q), where B is a batch dimension, H is the number of heads of the HMM and Q is the number of states. If no state is supplied, the Transitioner uses the start distribution with B=1.

Returns:

Next state tensor of shape (B, H, Q).

Return type:

Tensor

start_dist()

Computes the initial distribution for each model.

Returns:

The start distributions of shape (H, Q).

Return type:

Tensor

step(x, t=None)

Performs a single time step of the Markov chain.

Parameters:
  • x (Tensor) – The input tensor of shape (H, B, Q). B is the batch size, H is the number of heads and Q is the number of states.

  • t (int, optional) – The time step index. This is only needed for transitioners that use transition deltas.

Returns:

The output tensor of shape (H, B, Q).

Return type:

Tensor

property train_start_dist: bool
property train_transitions: bool

TensorFlow Priors

TensorFlow-specific prior models.

class hidten.tf.prior.base.TFCombinedPrior(*args, **kwargs)

Bases: TFPrior

A prior that combines multiple priors by summing their scores.

Parameters:

temperature (float)

add_prior(prior)
Parameters:

prior (TFPrior)

Return type:

None

build(input_shape=None)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...] | None)

Return type:

None

property hmm_config: HMMConfig
prior_scores(values)

Computes prior scores for the input values by summing the scores from all contained priors.

Parameters:

values (Tensor) – The input values of shape (H, Q, D), where D is the prior feature dimension.

Returns:

The prior scores of shape

(H) with prior scores for each head summed over the states.

Return type:

Tensor

property priors: list[TFPrior]
class hidten.tf.prior.base.TFPrior(*args, **kwargs)

Bases: TFLayer, Prior[Tensor]

Base class for TF priors.

Parameters:

temperature (float)

call(values)

Calls the prior with the given inputs.

Parameters:
  • values (Tensor) – The input tensor of shape (H, Q, D).

Returns:

The output tensor of shape (H,) in log-scale.

Return type:

Tensor

property temperature: float

Scaling factor for the prior log-likelihood. Values > 1 strengthen the prior, values < 1 weaken it, and 1.0 leaves it unchanged.

class hidten.tf.prior.dirichlet.TFDirichletPrior(*args, **kwargs)

Bases: TFPrior

A Dirichlet prior for categorical observations. Can be added to a hidten.tf.categorical.TFCategoricalEmitter.

build(input_shape)
Parameters:

input_shape (TensorShape | tuple[int | None, ...] | tuple[tuple[int | None, ...], ...])

Return type:

None

classmethod from_config(config, custom_objects=None)

Initializes and returns an object of this class with the given configuration dictionary.

Parameters:
  • config (dict[str, Any])

  • custom_objects (None)

Return type:

T_Model

get_config()

Returns the config used in the class as a dictionary.

Return type:

dict[str, Any]

get_default_initializer()
Return type:

Initializer

property initializer: Initializer

The initializer of the emission matrix.

For a single component, the initializer values are the Dirichlet concentration parameters (all must be > 0). Defaults to ones.

For multiple components, the flat initializer must be reshapable to (H, Q, P) where P is the number of distribution parameters per head and state.

Let D be the alphabet size and C be the number of mixture components. The initializer is ordered as follows:
  • Concentrations of component 1 (D values)
  • …
  • Concentrations of component C (D values)
  • Mixture coefficients (C values)

Defaults to concentration ones and uniform mixture coefficients.

log_dirichlet_pdf(x)

Compute the log pdf of a Dirichlet distribution.

Parameters:

x (Tensor) – The input values of shape (H, Q, D), where the values must sum to 1 along the last dimension.

Returns:

The log pdf with shape (H, Q) for single component,

or (H, Q, C) for multiple components.

Return type:

Tensor
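For a single component, the density computed here can be cross-checked with a plain NumPy/Python sketch of the Dirichlet log pdf (an illustrative re-derivation, not hidten's TensorFlow implementation; `dirichlet_log_pdf` is a hypothetical helper name):

```python
import numpy as np
from math import lgamma

def dirichlet_log_pdf(x, alpha):
    """Log pdf of Dirichlet(alpha) at x (both 1-D, x summing to 1)."""
    # Log normalizer: lgamma(sum(alpha)) - sum(lgamma(alpha_d))
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a) for a in alpha)
    # Plus the data term: sum_d (alpha_d - 1) * log(x_d)
    return log_norm + float(np.sum((np.asarray(alpha) - 1.0) * np.log(x)))

x = np.array([0.2, 0.3, 0.5])
alpha = [1.0, 1.0, 1.0]
# With all-ones concentrations the Dirichlet is uniform on the simplex,
# so the log pdf is log((D-1)!) = log(2) for D = 3.
print(dirichlet_log_pdf(x, alpha))  # -> 0.6931...
```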

matrix()

Calculates a tensor with the parameters of the Dirichlet distributions per head and state.

For multiple components, the matrix contains the concentration parameters of each component as well as the mixture coefficients: (conc component 1, …, conc component C, mixture coefficients)

Returns:

The parameter matrix of shape (H, Q, P) where

H is the number of heads, Q is the number of hidden states and P is the number of parameters.

Return type:

Tensor

property matrix_dim: int

Last dimension of the matrix.

mean()

Compute the mean of the Dirichlet prior distribution.

For a single component, the mean is alpha / sum(alpha). For a mixture, the mean is the mixture-weighted average of per-component means: sum_c w_c * alpha_c / sum_d(alpha_c_d).

Padding states (where all concentrations are zero) are returned as zeros.

Returns:

Mean of shape (H, Q, D).

Return type:

Tensor

prior_scores(values)

Computes log-Dirichlet densities.

Parameters:

values (Tensor) – The input values of shape (H, Q, D), where the values must sum to 1 along the last dimension.

Returns:

The prior scores with shape (H) with dirichlet log

densities for each head summed over the states.

Return type:

Tensor

class hidten.tf.prior.dirichlet.TFDirichletPriorConfig(*, components=1)

Bases: ModelConfig

The configuration for a Dirichlet prior.

Parameters:

components (int)

components: int

The number of mixture components.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

hidten.tf.prior.dirichlet.dirichlet_init_transform(x, input_dim, num_components=1)

Transforms concentration parameters and mixture coefficients to kernel values.

Parameters:
  • x (Tensor) – Initializer values with structure: [conc_comp1, …, conc_compC, mix_coef]

  • input_dim (int) – Dimensionality of the categorical distribution.

  • num_components (int) – Number of mixture components.

Returns:

Transformed kernel values with concentrations in inverse-softplus space and mixture coefficients in log space.

Return type:

Tensor
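The transform described above can be sketched in NumPy, assuming the documented layout [conc_comp1, …, conc_compC, mix_coef]: concentrations go to inverse-softplus space and mixture coefficients to log space (illustrative only; `init_to_kernel` is a hypothetical name, not hidten's implementation):

```python
import numpy as np

def inv_softplus(v):
    # Inverse of softplus(v) = log(1 + exp(v)); valid for v > 0.
    return np.log(np.expm1(v))

def init_to_kernel(x, input_dim, num_components=1):
    """Split [conc_1 .. conc_C, mix] and map each part to kernel space."""
    conc = np.asarray(x[: input_dim * num_components], dtype=float)
    mix = np.asarray(x[input_dim * num_components :], dtype=float)
    return np.concatenate([inv_softplus(conc), np.log(mix)])

# Two components over an alphabet of size 3, uniform mixture coefficients:
kernel = init_to_kernel([1.0] * 6 + [0.5, 0.5], input_dim=3, num_components=2)
```

Applying softplus to the first part and exp to the last part recovers the original initializer values.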

Algorithms

Core algorithms for inference and learning.

class hidten.tf.forward_backward.TFForwardBackwardMixin

Bases: Scanner

Mixin class for the forward-backward algorithm in TensorFlow. Must only be used as a mixin in the TFHMM class.

backward_log(*observations, transition_delta=None, parallel=1)

Runs the vectorized and differentiable backward algorithm and returns backward log-probabilities.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int, optional) – The parallel factor for the HMM. Defaults to no parallelization.

  • self (TFHMM)

Returns:

The backward log-probabilities for the input

sequence of shape (B, T, H, Q).

Return type:

Tensor

forward_log(*observations, transition_delta=None, parallel=1)

Runs the vectorized and differentiable forward algorithm that calculates logarithmic forward probabilities.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int, optional) – The parallel factor for the HMM. Defaults to no parallelization.

  • self (TFHMM)

Returns:

The forward log-probabilities of

the input sequence of shape (B, T, H, Q).

Return type:

Tensor

forward_scaled(*observations, transition_delta=None, parallel=1)

Runs the scaled variant of the forward algorithm in vectorized and differentiable manner.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int, optional) – The parallel factor for the HMM. Defaults to no parallelization.

  • self (TFHMM)

Returns:

The scaled forward probabilities (not

logarithmic, but summing to one over the last dimension) for the input sequence of shape (B, T, H, Q) and the scaling factors of shape (B, T, H, 1).

Return type:

Tuple of tensors

likelihood_log(*observations, transition_delta=None, parallel=1)

Computes the log-likelihood of the observations using the forward algorithm.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int, optional) – The parallel factor for the HMM. Defaults to no parallelization.

  • self (TFHMM)

Returns:

The log-likelihood of each observation and head of

shape (B, H, 1).

Return type:

Tensor

posterior(*observations, transition_delta=None, normalize=True, parallel=1, return_emission_scores=False)

Runs the vectorized and differentiable forward-backward algorithm and returns the posterior state probabilities. The result is not logarithmic!

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T-1, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • normalize (bool) – Whether to normalize the posterior. If false, the output will be the naked product of scaled forward and backward probabilities. This can result in posterior values > 1 due to numerical errors.

  • parallel (int, optional) – The parallel factor for the HMM. Defaults to no parallelization.

  • return_emission_scores (bool, optional) – Whether to return the emission scores.

  • self (TFHMM)

Returns:

The posterior state probabilities for the input

sequence of shape (B, T, H, Q), where Q is the maximum number of states in all HMMs.

Return type:

Tensor

class hidten.tf.viterbi.TFViterbiMixin

Bases: Scanner

Mixin class for the Viterbi algorithm in TensorFlow. Must only be used as a mixin in the TFHMM class.

maximum_expected_accuracy(*observations, transition_delta=None, parallel=1, output_type=tf.int32, invalid_state=-1, validate_state_seqs=True)

For each batch sequence and each head, computes the state sequence X with maximum sum of the posterior probabilities and P(X) > 0.

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int) – The parallel factor for the HMM.

  • output_type (tf.dtypes.DType) – The type of the output tensor.

  • invalid_state (int) – Indicates padding and invalid states (e.g., when no possible state sequence exists).

  • validate_state_seqs (bool) – If True, the function checks for the edge case when there is no state sequence with non-zero probability.

  • self (TFHMM)

Returns:

The maximum expected accuracy state sequence

for the input sequence of shape (B, T, H).

Return type:

Tensor (int)

viterbi(*observations, transition_delta=None, parallel=1, output_type=tf.int32, return_variables=False, invalid_state=-1, validate_state_seqs=True, scaled=True)

Computes the most likely state sequences for the given observations with the Viterbi algorithm.

The implementation is logarithmic (underflow-safe) and capable of decoding many sequences in parallel on the GPU. Optionally the function can also parallelize over the sequence length at the cost of memory usage (recommended for long sequences and HMMs with few states).

Parameters:
  • observations (Tensor or tuple of Tensors) – The input sequence(s). Tensor i should be broadcastable to shape (B, T, H, D_i) where B is the batch size, T is the time dimension, H is the number of HMM heads and D_i is the feature dimension.

  • transition_delta (Tensor or None, optional) – Time-varying change that is added on top of the transitioner kernel. Should be of shape (B, T, K) or broadcastable to it, where K is the kernel size of transitioner (see transitioner.kernel.shape).

  • parallel (int) – The parallel factor for the HMM.

  • output_type (tf.dtypes.DType) – The type of the output tensor.

  • return_variables (bool) – If True, the function returns additional variables.

  • invalid_state (int) – Indicates padding and invalid states (e.g., when no possible state sequence exists).

  • validate_state_seqs (bool) – If True, the function checks for the edge case when there is no state sequence with non-zero probability.

  • scaled (bool) – If True, the Viterbi log-probs are scaled by the maximum log-prob at each time step to improve numerical stability. Only affects the parallelized implementation.

  • self (TFHMM)

Returns:

The most likely state sequence for the input

sequence of shape (B, T, H). If return_variables is True, the function also returns the Viterbi values of shape (B, T, H, Q).

Return type:

Tensor (int)

class hidten.tf.scan.Scanner

Bases: object

TensorFlow Utilities

TensorFlow-specific utility functions.

hidten.tf.util.clip(x, min_val=None, max_val=None)

Clips the values of x to be between min_val and max_val.

Parameters:
  • x (Tensor)

  • min_val (float | int | None)

  • max_val (float | int | None)

Return type:

Tensor

hidten.tf.util.ensure_array(x)

Ensures that the input is a numpy array.

Parameters:

x (Tensor | ndarray | list | float | int)

Return type:

ndarray

hidten.tf.util.epsilon(x)

Returns a small value to avoid numerical issues depending on the dtype of x.

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

float

hidten.tf.util.flat_to_triangular(flat, upper=False)

Creates triangular matrices from flat vectors.

Parameters:
  • flat (Tensor) – A tensor representing lower (or upper) triangular elements of shape (..., M).

  • upper (bool, optional) – Whether the output matrix should be upper or lower triangular. Defaults to lower.

Returns:

Tensor with lower (or upper) triangular elements filled

from flat, with shape (..., N, N) where N = (-1 + sqrt(1 + 8 * M)) / 2.

Return type:

Tensor

Raises:

ValueError – if flat cannot be mapped to a triangular matrix.
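The index arithmetic is easy to verify with a NumPy sketch: M flat values fill the lower (or upper) triangle of an N x N matrix with N = (-1 + sqrt(1 + 8 * M)) / 2 (illustrative only, not hidten's TensorFlow implementation):

```python
import numpy as np

def flat_to_triangular(flat, upper=False):
    """Fill an (N, N) triangular matrix from M = N*(N+1)/2 flat values."""
    m = flat.shape[-1]
    n = int((-1 + np.sqrt(1 + 8 * m)) / 2)
    if n * (n + 1) // 2 != m:
        raise ValueError(f"{m} values cannot fill a triangular matrix")
    out = np.zeros(flat.shape[:-1] + (n, n))
    idx = np.triu_indices(n) if upper else np.tril_indices(n)
    out[..., idx[0], idx[1]] = flat  # row-major fill of the chosen triangle
    return out

tri = flat_to_triangular(np.array([1.0, 2.0, 3.0]))  # M = 3, so N = 2
# -> [[1. 0.]
#     [2. 3.]]
```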

hidten.tf.util.inverse_softplus(x)

Compute the element-wise inverse softplus of the input tensor.

Parameters:

x (Tensor) – Input tensor containing the values to which the inverse softplus is applied.

Returns:

A tensor with the same shape as x.

Return type:

tf.Tensor
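The inverse softplus satisfies softplus(inverse_softplus(x)) = x for x > 0. A numerically stable NumPy sketch uses the identity log(exp(x) - 1) = x + log(1 - exp(-x)) (illustrative only, not hidten's TensorFlow implementation):

```python
import numpy as np

def inverse_softplus(x):
    """Element-wise inverse of softplus(y) = log(1 + exp(y)), for x > 0.
    Written as x + log(1 - exp(-x)) to avoid overflow for large x."""
    x = np.asarray(x, dtype=float)
    return x + np.log(-np.expm1(-x))

# softplus(0) = log(2), so the inverse maps log(2) back to 0:
y = inverse_softplus(np.log(2.0))  # -> 0.0 (up to rounding)
```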

hidten.tf.util.local_to_global_factors(local_sf, global_almost_scaled, batch_first=False)

Computes the global scaling factors from the local scaling factors. Also computes correction factors that convert a locally scaled tensor to a globally scaled tensor.

Parameters:
  • local_sf (Tensor) – The local scaling factors of shape (H, B, C, Z, 1).

  • global_almost_scaled (Tensor) – The global variables scaled up to the previous step but unscaled for the current step of shape (H, B, Z).

  • batch_first (bool, optional) – If True, (only) local_sf is assumed to have shape (B, C, H, Z, 1) instead.

Returns:

(global_scaled, global_sf, correction)

of shapes ((H, B, Q), (H, B, C, 1), (H, B, C, Z)) (independent of the batch_first argument).

Return type:

(Tuple of tensors)

hidten.tf.util.log_to_scaled(log_probs)

Utility function that converts log probabilities to scaled probabilities.

Parameters:

log_probs (Tensor) – The log probabilities of shape (B, T, H, Q).

Returns:

The scaled probabilities of shape

(B, T, H, Q) and the scaling factors of shape (B, T, H, 1) that were used for scaling.

Return type:

Tuple of Tensors

hidten.tf.util.log_zero(x)

Returns the maximum value to avoid numerical issues depending on the dtype of x.

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

float

hidten.tf.util.logit(x)

Computes the element-wise logit function, the inverse of the sigmoid function: ln(x / (1-x))

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

Tensor
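A minimal NumPy sketch of the logit, using log1p for better accuracy near x = 1 (illustrative only, not hidten's TensorFlow implementation):

```python
import numpy as np

def logit(x):
    """Element-wise inverse of the sigmoid: ln(x / (1 - x))."""
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log1p(-x)

# sigmoid(0) = 0.5, so logit(0.5) = 0:
print(logit(0.5))
```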

hidten.tf.util.map_diagonal(x, fn)

Applies a function to the batched diagonals of batched matrices.

Parameters:
  • x (Tensor)

  • fn (Callable)

Return type:

Tensor

hidten.tf.util.n_shared_parameters(indices, share)

Calculates the number of shared parameters by a given number of indices and shared ranges of those indices.

Parameters:
  • indices (ndarray)

  • share (ndarray | list[int] | None)

Return type:

int

hidten.tf.util.np_dtype(x)

Returns the numpy dtype of the input.

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

dtype

hidten.tf.util.safe_div(x, y)

Safely divides the first tensor by the second.

Parameters:
  • x (Tensor)

  • y (Tensor)

Return type:

Tensor

hidten.tf.util.safe_log(x)

Computes element-wise logarithm with output_i=log_zero_val where x_i=0.

Parameters:

x (TFTensor) – Input tensor.

Returns:

Element-wise logarithm of x, with log_zero_val where x

is 0. Same dtype and shape as x.

Return type:

TFTensor
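The masking pattern can be sketched in NumPy; the inner `where` keeps the log out of the zero entries so no warning or NaN is produced (illustrative only; the LOG_ZERO constant stands in for hidten's dtype-dependent log_zero value):

```python
import numpy as np

LOG_ZERO = np.finfo(np.float64).min  # stand-in for hidten's dtype-dependent log_zero

def safe_log(x):
    """log(x) element-wise, with LOG_ZERO substituted where x == 0."""
    x = np.asarray(x, dtype=float)
    # Inner where avoids evaluating log(0); outer where picks the sentinel.
    return np.where(x > 0, np.log(np.where(x > 0, x, 1.0)), LOG_ZERO)

out = safe_log(np.array([1.0, 0.0, np.e]))  # -> [0.0, LOG_ZERO, ~1.0]
```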

hidten.tf.util.safe_matmul(A, B, transpose_b=False, balance_factor=4, use_balancing=False)

Performs a batched matrix multiplication of A and B. This is functionally equivalent to tf.matmul (except for unused arguments), but numerically this method is guarded against loss of precision when A is very thin: let A have shape ( …, m, n ) and B have shape ( …, n, p ); if m is large compared to n (e.g. m=512, n=15), the numerical precision of, e.g., the forward algorithm suffers compared to the equivalent case of performing multiple smaller, more balanced matrix multiplications.

When both m and n are statically known and m > balance_factor * n, this method will express the multiplication of A and B as multiple smaller multiplications like this: (…, m, n) x (…, n, p) -> (…, c, z, n) x (…, 1, n, p), where m = c*z + r with z = balance_factor * n and c and r chosen appropriately.

Note: Swapping this method with the default tf.matmul makes some tests with small HMMs but large batch sizes fail due to numerical precision issues.

Parameters:
  • A (Tensor)

  • B (Tensor)

  • transpose_b (bool)

  • balance_factor (int)

  • use_balancing (bool)

Return type:

Tensor
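The chunking scheme described above can be sketched in NumPy for 2-D inputs (a hypothetical `balanced_matmul`, illustrative only; the result is mathematically identical to a direct matmul, the point being that each chunk multiplication is better balanced):

```python
import numpy as np

def balanced_matmul(A, B, balance_factor=4):
    """Split a thin (m, n) x (n, p) product into c chunks of z = balance_factor * n
    rows each, plus a remainder chunk for the leftover r rows."""
    m, n = A.shape
    z = balance_factor * n
    if m <= z:
        return A @ B  # already balanced enough
    c, r = divmod(m, z)
    # (c, z, n) @ (n, p) broadcasts to c smaller multiplications.
    main = (A[: c * z].reshape(c, z, n) @ B).reshape(c * z, -1)
    return np.concatenate([main, A[c * z :] @ B]) if r else main

rng = np.random.default_rng(0)
A, B = rng.normal(size=(130, 3)), rng.normal(size=(3, 5))
assert np.allclose(balanced_matmul(A, B), A @ B)
```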

hidten.tf.util.safe_norm(x)

Safely makes a probability distribution out of x.

Parameters:

x (Tensor)

Return type:

Tensor

hidten.tf.util.scale(values)

Scales the values by their sum over the last dimension and returns the scaled values and the scaling factors.

Parameters:

values (Tensor) – The values to scale, shape (..., Q).

Returns:

The scaled values of shape (..., Q),

where the last dimension sums to one, and the scaling factors of shape (..., 1) that were used for scaling.

Return type:

(tuple of Tensors)
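The behavior is a normalization over the last axis, returning both the normalized values and the sums used as scaling factors. A NumPy sketch (illustrative only, not hidten's TensorFlow implementation):

```python
import numpy as np

def scale(values):
    """Normalize over the last axis; return scaled values and the factors used."""
    factors = values.sum(axis=-1, keepdims=True)  # shape (..., 1)
    return values / factors, factors

scaled, sf = scale(np.array([[1.0, 3.0], [2.0, 2.0]]))
# scaled -> [[0.25 0.75], [0.5 0.5]];  sf -> [[4.], [4.]]
```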

hidten.tf.util.scaled_to_log(scaled_probs: Tensor, scaling_factors: Tensor, time_axis: int = 1, return_cumsum: bool = False) → Tensor
hidten.tf.util.scaled_to_log(scaled_probs: Tensor, scaling_factors: Tensor, time_axis: int, return_cumsum: bool) → Tensor | tuple[Tensor, Tensor]

Utility function that converts scaled probabilities to log-probabilities.

Parameters:
  • scaled_probs (Tensor) – The scaled probabilities of shape (..., T, ..., Q).

  • scaling_factors (Tensor) – The scaling factors of shape (..., T, ..., 1) that were used for scaling.

  • time_axis (int, optional) – The axis along which the time dimension is located. Defaults to 1.

  • return_cumsum (bool, optional) – If True, the cumulative sum of the scaling factors is returned as well. Defaults to False.

Returns:

The log-probabilities of shape

(..., T, ..., Q). If return_cumsum is True, a tuple of tensors is returned, where the first tensor is the log-probabilities and the second tensor is the cumulative sum of the scaling factors of shape (..., T, ..., 1).

Return type:

(Tensor or tuple of Tensors)
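A NumPy sketch of the conversion, assuming the usual forward-algorithm scaling convention where the true probability at step t equals the scaled value times the running product of scaling factors up to t (illustrative only, not hidten's TensorFlow implementation):

```python
import numpy as np

def scaled_to_log(scaled_probs, scaling_factors, time_axis=1):
    """Recover log-probabilities from per-step scaled values, assuming
    true_t = scaled_t * prod_{s<=t} factor_s."""
    log_cumsum = np.cumsum(np.log(scaling_factors), axis=time_axis)
    return np.log(scaled_probs) + log_cumsum

# Two time steps, constant scaled value 0.5 and scaling factor 2:
out = scaled_to_log(np.full((1, 2, 1, 2), 0.5), np.full((1, 2, 1, 1), 2.0))
# t=0: log(0.5) + log(2) = 0;  t=1: log(0.5) + log(4) = log(2)
```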

hidten.tf.util.setup_initializer(initializer, transform=None, validate_condition=None, validate_msg='')
Parameters:
  • initializer (Initializer | Tensor | ndarray | list | float)

  • transform (Callable | None)

  • validate_condition (Callable | None)

  • validate_msg (str)

Return type:

Initializer

hidten.tf.util.shared_tensor(indices, values, shape, share=None)

Creates a dense tensor with optional parameter sharing using scatter operations.

Parameters:
  • indices (T_TFTensor) – Tensor of shape (N, 2), dtype int64.

  • values (T_TFTensor) – Tensor of shape (..., K), dtype float32.

  • shape (T_TFTensor) – Shape of the resulting tensor not including the leading dimensions of values.

  • share (T_TFTensor, optional) – Tensor of shape (N, ), dtype int32 or int64. Defaults to no parameter sharing.

Returns:

A dense tensor with shared parameters placed at

given indices. The shape is (..., ) + shape.

Return type:

T_TFTensor

hidten.tf.util.tiny(x)

Returns a tiny value to avoid numerical issues depending on the dtype of x.

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

float

hidten.tf.util.triangular_to_flat(triag, upper=False)

Creates flat vectors from triangular matrices.

Parameters:
  • triag (tensor) – A tensor representing lower (or upper) triangular elements of shape (..., N, N).

  • upper (bool, optional) – Whether the input matrix should be interpreted as upper or lower triangular. Defaults to lower.

Returns:

A batch of vector-shaped tensors representing vectorized

lower (or upper) triangular elements of triag, with shape (..., M) where M = N * (N + 1) / 2.

Return type:

Tensor

hidten.tf.util.zero(x)

Returns a small value that can be used without trouble in divisions.

Parameters:

x (Tensor | ndarray | list | tuple | float | int)

Return type:

float

hidten.tf.util.zero_row_softmax(x)

Computes a softmax over the rightmost axis but returns a row of zeros when the input row contains only log_zero values.

Parameters:

x (Tensor)

Return type:

Tensor
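The special case is rows that contain only log_zero values: a plain softmax would turn them into a uniform distribution, whereas this function zeroes them out. A NumPy sketch (illustrative only; LOG_ZERO stands in for hidten's dtype-dependent log_zero value):

```python
import numpy as np

LOG_ZERO = -1e30  # stand-in for hidten's dtype-dependent log_zero

def zero_row_softmax(x):
    """Softmax over the last axis, but all-LOG_ZERO rows become all-zero rows
    instead of the uniform distribution a plain softmax would produce."""
    shifted = x - x.max(axis=-1, keepdims=True)  # standard max-shift for stability
    soft = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    dead = np.all(x <= LOG_ZERO, axis=-1, keepdims=True)
    return np.where(dead, 0.0, soft)

out = zero_row_softmax(np.array([[0.0, 0.0], [LOG_ZERO, LOG_ZERO]]))
# -> [[0.5 0.5], [0. 0.]]
```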