mbrs.metrics.ter module

class mbrs.metrics.ter.MetricTER(cfg: Config)

Bases: Metric

TER metric class.

class Config(normalized: bool = False, no_punct: bool = False, asian_support: bool = False, case_sensitive: bool = False, num_workers: int = 8)

Bases: Config

TER metric configuration.

normalized (bool): Enable character normalization. By default, this strips newlines, unescapes XML-encoded characters, and fixes tokenization around punctuation. When ‘asian_support’ is also enabled, Asian (CJK) character sequences are additionally split down to the character level.
no_punct (bool): Remove punctuation. Combine with ‘asian_support’ to also remove punctuation marks typical of Asian (CJK) languages.
asian_support (bool): Enable special treatment of Asian characters. This option has an effect only when ‘normalized’ and/or ‘no_punct’ is enabled. With ‘normalized’, Asian (CJK) characters are split down to the character level. With ‘no_punct’, specific Unicode ranges for CJK and full-width punctuation are also removed.
case_sensitive (bool): If True, sentences are not lowercased.
num_workers (int): Number of workers for multiprocessing.
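
The interaction between the flags above can be sketched with a stand-in dataclass. `TERConfigSketch` and `asian_support_has_effect` are hypothetical names introduced only for illustration; they mirror the documented fields and defaults but are not part of mbrs.

```python
from dataclasses import dataclass


@dataclass
class TERConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    normalized: bool = False
    no_punct: bool = False
    asian_support: bool = False
    case_sensitive: bool = False
    num_workers: int = 8


def asian_support_has_effect(cfg: TERConfigSketch) -> bool:
    # Per the field descriptions, asian_support only matters when
    # normalized and/or no_punct is enabled as well.
    return cfg.asian_support and (cfg.normalized or cfg.no_punct)


default_cfg = TERConfigSketch()
cjk_only = TERConfigSketch(asian_support=True)
cjk_normalized = TERConfigSketch(asian_support=True, normalized=True)
```

Under this reading, `cjk_only` behaves like the default configuration, while `cjk_normalized` additionally splits CJK sequences into characters.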

asian_support: bool = False

case_sensitive: bool = False

no_punct: bool = False

normalized: bool = False

num_workers: int = 8

HIGHER_IS_BETTER: bool = False
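
Since this flag is False, a lower TER score indicates a better hypothesis. A minimal sketch of direction-aware selection follows; `pick_best` and the example scores are hypothetical and not part of mbrs.

```python
HIGHER_IS_BETTER = False  # value documented for MetricTER


def pick_best(scores: list[float], higher_is_better: bool) -> int:
    """Return the index of the best score under the given direction."""
    indices = range(len(scores))
    if higher_is_better:
        return max(indices, key=lambda i: scores[i])
    return min(indices, key=lambda i: scores[i])


# Hypothetical error rates for three candidate hypotheses.
example_scores = [0.40, 0.15, 0.30]
best_index = pick_best(example_scores, HIGHER_IS_BETTER)
```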

cfg: Config

corpus_score(hypotheses: list[str], references_lists: list[list[str]], sources: list[str] | None = None) → float

Calculate the corpus-level score.

Parameters:

hypotheses (list[str]) – Hypotheses.
references_lists (list[list[str]]) – Lists of references.
sources (list[str], optional) – Sources.

Returns:

The corpus score.

Return type:

float

pairwise_scores(hypotheses: list[str], references: list[str], *_, **__) → Tensor

Calculate the pairwise scores.

Parameters:

hypotheses (list[str]) – Hypotheses.
references (list[str]) – References.

Returns:

Score matrix of shape (H, R), where H is the number of hypotheses and R is the number of references.

Return type:

Tensor
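
The (H, R) layout can be illustrated with plain nested lists. This is only a sketch of the shape convention: `pairwise_matrix` and `toy_score` are hypothetical helpers, `toy_score` is a placeholder that does not compute TER, and the real method returns a Tensor.

```python
from typing import Callable


def pairwise_matrix(
    hypotheses: list[str],
    references: list[str],
    score_fn: Callable[[str, str], float],
) -> list[list[float]]:
    """Score every hypothesis against every reference.

    Row i, column j holds score_fn(hypotheses[i], references[j]),
    so the result has H rows and R columns.
    """
    return [[score_fn(hyp, ref) for ref in references] for hyp in hypotheses]


def toy_score(hypothesis: str, reference: str) -> float:
    # Placeholder scorer, NOT TER: absolute difference in word counts.
    return float(abs(len(hypothesis.split()) - len(reference.split())))


hyps = ["a b c", "a b"]             # H = 2
refs = ["a b c", "a", "a b c d"]    # R = 3
matrix = pairwise_matrix(hyps, refs, toy_score)
```

Here `matrix` has two rows (one per hypothesis) and three columns (one per reference).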

score(hypothesis: str, reference: str, *_, **__) → float

Calculate the score of the given hypothesis.

Parameters:

hypothesis (str) – Hypothesis.
reference (str) – Reference.

Returns:

The score of the given hypothesis.

Return type:

float
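
For intuition about what a sentence-level edit rate measures, here is a simplified sketch: word-level edit distance divided by the reference length. It deliberately ignores the shift operation that TER also counts, applies none of the normalization options, and returns a plain ratio, so its values and scale may differ from what this method returns. Both function names are hypothetical.

```python
def word_edit_distance(hyp_words: list[str], ref_words: list[str]) -> int:
    """Levenshtein distance over words (insert, delete, substitute)."""
    previous = list(range(len(ref_words) + 1))
    for i, hyp_word in enumerate(hyp_words, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_words, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def simplified_edit_rate(
    hypothesis: str, reference: str, case_sensitive: bool = False
) -> float:
    """Edits divided by reference length, without TER's shift operation."""
    if not case_sensitive:
        # Mirrors the documented default: sentences are lowercased.
        hypothesis, reference = hypothesis.lower(), reference.lower()
    hyp_words, ref_words = hypothesis.split(), reference.split()
    if not ref_words:
        # Assumption of this sketch: an empty reference yields 0.0 only
        # when the hypothesis is empty too, otherwise 1.0.
        return 0.0 if not hyp_words else 1.0
    return word_edit_distance(hyp_words, ref_words) / len(ref_words)
```

With the default `case_sensitive=False`, "The cat sat" and "the cat sat" count as identical; with `case_sensitive=True`, the capitalized word counts as one substitution.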

scores(hypotheses: list[str], references: list[str], *_, **__) → Tensor

Calculate the scores of the given hypotheses.

Parameters:

hypotheses (list[str]) – N hypotheses.
references (list[str]) – N references.

Returns:

The N scores of the given hypotheses.

Return type:

Tensor
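
Unlike pairwise_scores, this method pairs the i-th hypothesis with the i-th reference. A minimal sketch of that element-wise pairing follows; `elementwise_scores` and `toy_mismatch` are hypothetical, the scorer is a placeholder rather than TER, and the length check is an assumption of the sketch, not documented behavior.

```python
from typing import Callable


def elementwise_scores(
    hypotheses: list[str],
    references: list[str],
    score_fn: Callable[[str, str], float],
) -> list[float]:
    """Score hypotheses[i] against references[i] for each i."""
    if len(hypotheses) != len(references):
        # Assumption of this sketch: mismatched lengths are rejected.
        raise ValueError("hypotheses and references must have the same length")
    return [score_fn(hyp, ref) for hyp, ref in zip(hypotheses, references)]


def toy_mismatch(hypothesis: str, reference: str) -> float:
    # Placeholder scorer, NOT TER: 0.0 for an exact match, else 1.0.
    return 0.0 if hypothesis == reference else 1.0


paired = elementwise_scores(["a", "b c"], ["a", "b d"], toy_mismatch)
```

The result holds N values, one per hypothesis-reference pair.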
