mbrs.metrics.ter module

class mbrs.metrics.ter.MetricTER(cfg: Config)

Bases: Metric

TER (Translation Edit Rate) metric class.

class Config(normalized: bool = False, no_punct: bool = False, asian_support: bool = False, case_sensitive: bool = False, num_workers: int = 8)

Bases: Config

TER metric configuration.

  • normalized (bool): Enable character normalization. By default, this strips newlines, unescapes XML-encoded characters, and fixes the tokenization of punctuation. When ‘asian_support’ is also enabled, specific Asian (CJK) character sequences are additionally split down to the character level.

  • no_punct (bool): Remove punctuation. Can be used in conjunction with ‘asian_support’ to also remove typical punctuation markers in Asian languages (CJK).

  • asian_support (bool): Enable special treatment of Asian characters. This option only has an effect when ‘normalized’ and/or ‘no_punct’ is enabled. If ‘normalized’ is also enabled, Asian (CJK) characters are split down to the character level. If ‘no_punct’ is also enabled, specific Unicode ranges for CJK and full-width punctuation are removed as well.

  • case_sensitive (bool): If True, sentences are not lowercased. If False (the default), sentences are lowercased.

  • num_workers (int): Number of workers for multiprocessing.
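
The interaction between the flags above can be sketched as follows. This is a minimal illustration, not the library's code: `TERConfigSketch` and `asian_support_is_effective` are hypothetical names that only mirror the documented fields, defaults, and the stated rule that ‘asian_support’ needs ‘normalized’ and/or ‘no_punct’ to have any effect.

```python
from dataclasses import dataclass


@dataclass
class TERConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    normalized: bool = False
    no_punct: bool = False
    asian_support: bool = False
    case_sensitive: bool = False
    num_workers: int = 8


def asian_support_is_effective(cfg: TERConfigSketch) -> bool:
    # Per the option descriptions, 'asian_support' only matters when
    # 'normalized' and/or 'no_punct' is also enabled.
    return cfg.asian_support and (cfg.normalized or cfg.no_punct)


only_asian = TERConfigSketch(asian_support=True)
with_norm = TERConfigSketch(asian_support=True, normalized=True)

print(asian_support_is_effective(only_asian))  # flag set alone: no effect
print(asian_support_is_effective(with_norm))   # combined with 'normalized'
```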

asian_support: bool = False
case_sensitive: bool = False
no_punct: bool = False
normalized: bool = False
num_workers: int = 8
HIGHER_IS_BETTER: bool = False
cfg: Config
corpus_score(hypotheses: list[str], references_lists: list[list[str]], sources: list[str] | None = None) → float

Calculate the corpus-level score.

Parameters:
  • hypotheses (list[str]) – Hypotheses.

  • references_lists (list[list[str]]) – Lists of references.

  • sources (list[str], optional) – Sources.

Returns:

The corpus score.

Return type:

float
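
TER is conventionally aggregated over a corpus by pooling edit counts and reference lengths before dividing, which generally differs from averaging per-sentence scores. The sketch below shows only that arithmetic, with made-up per-sentence statistics. Whether `corpus_score` aggregates in exactly this way, and on what scale it reports the result, is an assumption not confirmed by this page.

```python
# (edit count, reference length) per sentence; made-up numbers for illustration.
sentence_stats = [(1, 2), (3, 10)]

# Corpus-level ratio: pool edits and reference lengths, then divide once.
total_edits = sum(edits for edits, _ in sentence_stats)
total_ref_len = sum(ref_len for _, ref_len in sentence_stats)
corpus_ratio = total_edits / total_ref_len

# Mean of per-sentence ratios, shown only for contrast.
mean_sentence_ratio = sum(e / n for e, n in sentence_stats) / len(sentence_stats)

print(round(corpus_ratio, 3), round(mean_sentence_ratio, 3))
```

Pooling weights long sentences more heavily, so the two numbers diverge whenever sentence lengths vary.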

pairwise_scores(hypotheses: list[str], references: list[str], *_, **__) → Tensor

Calculate the pairwise scores.

Parameters:
  • hypotheses (list[str]) – Hypotheses.

  • references (list[str]) – References.

Returns:

Score matrix of shape (H, R), where H is the number of hypotheses and R is the number of references.

Return type:

Tensor
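
The (H, R) layout can be sketched with plain nested lists. This is an illustration only: `toy_score` is a hypothetical placeholder scorer, not TER, and the real method returns a Tensor. The selection step assumes lower scores are better, matching `HIGHER_IS_BETTER = False`.

```python
def toy_score(hypothesis: str, reference: str) -> float:
    # Placeholder scorer (NOT TER): counts words found in only one string.
    return float(len(set(hypothesis.split()) ^ set(reference.split())))


def pairwise_sketch(hypotheses: list[str], references: list[str]) -> list[list[float]]:
    # Row i holds hypothesis i scored against every reference: shape (H, R).
    return [[toy_score(h, r) for r in references] for h in hypotheses]


hypotheses = ["a b c", "a b d", "x y z"]
references = ["a b c", "a b d"]
matrix = pairwise_sketch(hypotheses, references)

# Lower is better here, so pick the hypothesis with the lowest mean score.
row_means = [sum(row) / len(row) for row in matrix]
best = min(range(len(hypotheses)), key=row_means.__getitem__)

print(len(matrix), len(matrix[0]))  # H and R
print(hypotheses[best])
```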

score(hypothesis: str, reference: str, *_, **__) → float

Calculate the score of the given hypothesis.

Parameters:
  • hypothesis (str) – Hypothesis.

  • reference (str) – Reference.

Returns:

The score of the given hypothesis.

Return type:

float
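
To give a feel for what a sentence-level score measures, here is a simplified stand-in: word-level edit distance divided by the reference length. It is not the library's implementation. Full TER also counts block shifts as edits, and the scale and empty-reference handling below are this sketch's own choices. `simplified_ter` is a hypothetical name.

```python
def simplified_ter(hypothesis: str, reference: str, case_sensitive: bool = False) -> float:
    # Mirror the documented default: lowercase unless case_sensitive is True.
    if not case_sensitive:
        hypothesis, reference = hypothesis.lower(), reference.lower()
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        # Degenerate case; a convention chosen for this sketch only.
        return 0.0 if not hyp else 1.0

    # Standard dynamic-programming edit distance over words
    # (insertions, deletions, substitutions; no shifts).
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = curr
    return prev[-1] / len(ref)


print(simplified_ter("the cat", "the cat sat down"))
print(simplified_ter("The cat sat", "the cat sat"))
```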

scores(hypotheses: list[str], references: list[str], *_, **__) → Tensor

Calculate the scores of the given hypotheses.

Parameters:
  • hypotheses (list[str]) – N hypotheses.

  • references (list[str]) – N references.

Returns:

The N scores of the given hypotheses.

Return type:

Tensor
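
Unlike `pairwise_scores`, this method pairs the i-th hypothesis with the i-th reference. A minimal sketch of that element-wise pairing follows. `toy_score` and `scores_sketch` are hypothetical names, the placeholder scorer is not TER, and raising on a length mismatch is this sketch's choice rather than documented library behavior.

```python
def toy_score(hypothesis: str, reference: str) -> float:
    # Placeholder scorer (NOT TER): 0.0 on exact match, 1.0 otherwise.
    return 0.0 if hypothesis == reference else 1.0


def scores_sketch(hypotheses: list[str], references: list[str]) -> list[float]:
    # Element-wise pairing requires the same number of items on both sides.
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    return [toy_score(h, r) for h, r in zip(hypotheses, references)]


result = scores_sketch(["a", "b", "c"], ["a", "x", "c"])
print(result)
```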