mbrs.decoders.pruning_mbr module

class mbrs.decoders.pruning_mbr.DecoderPruningMBR(cfg: ~mbrs.decoders.pruning_mbr.DecoderPruningMBR.Config, metric: ~mbrs.metrics.base.Metric, selector: ~mbrs.selectors.base.Selector = <mbrs.selectors.nbest.SelectorNbest object>)

Bases: DecoderMBR

Pruning MBR decoder class.

References

J. Cheng and A. Vlachos, 2023, “Faster Minimum Bayes Risk Decoding with Confidence-based Pruning”. https://aclanthology.org/2023.emnlp-main.767/

class Config(alpha: float = 0.99, sampling_scheduler: list[int] = <factory>, num_bootstrap_samples: int = 500, seed: int = 0)

Bases: Config

Configuration for the decoder.

  • alpha (float): Prune hypotheses based on this confidence threshold.

  • sampling_scheduler (list[int]): Sample size schedule. At step t, the number of samples is the t-th element of this list.

  • num_bootstrap_samples (int): Number of bootstrap samples.

  • seed (int): Random seed for bootstrap sampling.
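To make the roles of alpha, sampling_scheduler, num_bootstrap_samples and seed concrete, the following is a minimal sketch of confidence-based pruning in the spirit of Cheng and Vlachos (2023). It is not the mbrs implementation: the function names pruning_mbr_sketch and token_overlap are hypothetical, the utility is a toy token-overlap score standing in for a real metric, and the pruning rule (drop a hypothesis when its bootstrap win rate against the current best falls below 1 - alpha) is an assumption based on the paper's description.

```python
import random


def token_overlap(hyp: str, ref: str) -> float:
    """Toy utility (assumption): Jaccard overlap of whitespace tokens."""
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r) if h | r else 0.0


def pruning_mbr_sketch(
    hypotheses: list[str],
    references: list[str],
    nbest: int = 1,
    alpha: float = 0.99,
    sampling_scheduler: tuple[int, ...] = (2, 4, 8),
    num_bootstrap_samples: int = 500,
    seed: int = 0,
) -> tuple[list[float], list[int]]:
    """Return (top-k scores, top-k indices) among hypotheses that survive pruning."""
    rng = random.Random(seed)
    alive = list(range(len(hypotheses)))  # indices of surviving hypotheses
    scores = {i: 0.0 for i in alive}

    for num_refs in sampling_scheduler:
        # Step t uses the first sampling_scheduler[t] references.
        refs = references[:num_refs]
        utils = {i: [token_overlap(hypotheses[i], r) for r in refs] for i in alive}
        scores = {i: sum(u) / len(u) for i, u in utils.items()}
        best = max(alive, key=scores.__getitem__)

        # Bootstrap: resample reference columns with replacement and count how
        # often each hypothesis scores at least as well as the current best.
        wins = {i: 0 for i in alive}
        for _ in range(num_bootstrap_samples):
            cols = [rng.randrange(len(refs)) for _ in refs]
            best_total = sum(utils[best][c] for c in cols)
            for i in alive:
                if sum(utils[i][c] for c in cols) >= best_total:
                    wins[i] += 1

        # Keep a hypothesis only if its win rate is at least 1 - alpha.
        # The current best always ties with itself, so it is never pruned.
        alive = [i for i in alive if wins[i] / num_bootstrap_samples >= 1.0 - alpha]
        if len(alive) == 1:
            break

    ranked = sorted(alive, key=scores.__getitem__, reverse=True)[:nbest]
    return [scores[i] for i in ranked], ranked
```

In this sketch a larger alpha makes pruning more conservative, because a hypothesis survives with a smaller win rate. Fixing seed makes the bootstrap resampling reproducible.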

alpha: float = 0.99
num_bootstrap_samples: int = 500
sampling_scheduler: list[int]
seed: int = 0
cfg: Config
decode(hypotheses: list[str], references: list[str], source: str | None = None, nbest: int = 1, reference_lprobs: Tensor | None = None) → Output

Select the n-best hypotheses based on the strategy.

Parameters:
  • hypotheses (list[str]) – Hypotheses.

  • references (list[str]) – References.

  • source (str, optional) – A source.

  • nbest (int) – Return the n-best hypotheses.

  • reference_lprobs (Tensor, optional) – Log-probabilities for each reference sample. The shape must be (len(references),). See https://arxiv.org/abs/2311.05263.

Returns:

The n-best hypotheses.

Return type:

DecoderMBR.Output
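The reference_lprobs parameter carries one log-probability per reference. As a rough illustration of why the shape must be (len(references),), the sketch below normalizes such log-probabilities into weights and uses them for a weighted expected utility. This is an assumption about how the values could be used, following the model-based MBR idea in the linked paper, and not a description of the library's internals. Plain Python lists stand in for the Tensor, and the function names are hypothetical.

```python
import math


def normalize_lprobs(reference_lprobs: list[float]) -> list[float]:
    """Softmax over log-probabilities, shifted by the maximum for stability."""
    m = max(reference_lprobs)
    exps = [math.exp(lp - m) for lp in reference_lprobs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_expected_utility(
    utilities: list[float], reference_lprobs: list[float]
) -> float:
    """Expected utility of one hypothesis, weighting each reference by its probability."""
    if len(utilities) != len(reference_lprobs):
        # Mirrors the documented shape constraint: one entry per reference.
        raise ValueError("reference_lprobs must have one entry per reference")
    weights = normalize_lprobs(reference_lprobs)
    return sum(w * u for w, u in zip(weights, utilities))
```

When reference_lprobs is None, the natural fallback in such a sketch would be uniform weights, which reduces to a plain average over the references.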

decode_pruning(hypotheses: list[str], references: list[str], source: str | None = None, nbest: int = 1, reference_lprobs: Tensor | None = None) → tuple[list[float], list[int]]

Select the n-best hypotheses using pruning MBR decoding.

Parameters:
  • hypotheses (list[str]) – Hypotheses.

  • references (list[str]) – References.

  • source (str, optional) – A source.

  • nbest (int) – Return the n-best hypotheses.

  • reference_lprobs (Tensor, optional) – Log-probabilities for each reference sample. The shape must be (len(references),). See https://arxiv.org/abs/2311.05263.

Returns:

  • list[float] – Top-k scores.

  • list[int] – Top-k indices.

Return type:

tuple[list[float], list[int]]
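Since decode_pruning returns scores and indices rather than sentences, a caller typically maps the indices back onto the hypothesis list. A minimal sketch of that step, using a hypothetical helper name and illustrative values that are not produced by the library:

```python
def pick_hypotheses(
    hypotheses: list[str], scores: list[float], indices: list[int]
) -> list[tuple[str, float]]:
    """Pair each returned index with its hypothesis text and score."""
    return [(hypotheses[i], s) for s, i in zip(scores, indices)]


# Illustrative values only (assumption), shaped like the documented return value.
hypotheses = ["hyp a", "hyp b", "hyp c"]
topk_scores, topk_indices = [0.71, 0.64], [2, 0]
selected = pick_hypotheses(hypotheses, topk_scores, topk_indices)
```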

metric: Metric