mbrs.metrics.ter module
- class mbrs.metrics.ter.MetricTER(cfg: Config)
Bases: Metric
TER metric class.
- class Config(normalized: bool = False, no_punct: bool = False, asian_support: bool = False, case_sensitive: bool = False, num_workers: int = 8)
Bases: Config
TER metric configuration.
normalized (bool): Enable character normalization. Normalization strips newlines, decodes XML-encoded characters, and fixes the tokenization of punctuation. When ‘asian_support’ is also enabled, it additionally splits specific Asian (CJK) character sequences down to the character level.
no_punct (bool): Remove punctuation. Can be combined with ‘asian_support’ to also remove typical punctuation marks in Asian (CJK) languages.
asian_support (bool): Enable special treatment of Asian characters. This option only has an effect when ‘normalized’ and/or ‘no_punct’ is enabled. If ‘normalized’ is also enabled, Asian (CJK) characters are split down to the character level. If ‘no_punct’ is also enabled, specific Unicode ranges for CJK and full-width punctuation are removed as well.
case_sensitive (bool): If True, does not lowercase sentences.
num_workers (int): Number of workers for multiprocessing.
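The interaction between the flags above can be sketched in plain Python. The `TERConfigSketch` dataclass and the `cjk_effects` helper are hypothetical stand-ins that only mirror the documented fields and defaults; they are not part of mbrs and do not compute TER.

```python
from dataclasses import dataclass


@dataclass
class TERConfigSketch:
    """Hypothetical stand-in mirroring the documented Config fields and defaults."""

    normalized: bool = False
    no_punct: bool = False
    asian_support: bool = False
    case_sensitive: bool = False
    num_workers: int = 8


def cjk_effects(cfg: TERConfigSketch) -> list[str]:
    """List the CJK-specific behaviors that the documented rules would activate."""
    effects = []
    # asian_support alone does nothing; it needs normalized and/or no_punct.
    if cfg.asian_support and cfg.normalized:
        effects.append("split CJK characters")
    if cfg.asian_support and cfg.no_punct:
        effects.append("remove CJK and full-width punctuation")
    return effects


print(cjk_effects(TERConfigSketch(asian_support=True)))
print(cjk_effects(TERConfigSketch(normalized=True, asian_support=True)))
print(cjk_effects(TERConfigSketch(normalized=True, no_punct=True, asian_support=True)))
```

With the real class, the same keyword arguments would presumably be passed to `MetricTER.Config(...)` and the result handed to `MetricTER(cfg)`, following the signatures listed above.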
- corpus_score(hypotheses: list[str], references_lists: list[list[str]], sources: list[str] | None = None) → float
Calculate the corpus-level score.
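As a rough illustration of what "corpus-level" means for an edit-rate metric: edits and reference lengths are summed over all sentences and divided once, rather than averaging per-sentence rates. The sketch below is a simplified stand-in and not the mbrs implementation. It assumes one reference per hypothesis, uses plain word-level Levenshtein distance without TER's shift operation, and applies none of the `Config` options. The names `word_edit_distance` and `corpus_edit_rate` are hypothetical.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute; no shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_edit_rate(hypotheses: list[str], references: list[str]) -> float:
    """Sum edits and reference lengths over the corpus, then divide once."""
    total_edits = 0
    total_ref_words = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_words, ref_words = hyp.split(), ref.split()
        total_edits += word_edit_distance(hyp_words, ref_words)
        total_ref_words += len(ref_words)
    return total_edits / total_ref_words if total_ref_words else 0.0


hyps = ["the cat sat", "a dog barked loudly at night"]
refs = ["the cat sat down", "the dog barked at night"]
# 1 edit + 2 edits over 4 + 5 reference words
print(corpus_edit_rate(hyps, refs))
```

The scale and exact aggregation of the value returned by `corpus_score` are not stated here, so the number printed by this sketch should not be compared directly with the library's output.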
- pairwise_scores(hypotheses: list[str], references: list[str], *_, **__) → Tensor
Calculate the pairwise scores.
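A minimal sketch of pairwise scoring, in which every hypothesis is scored against every reference. It assumes the result is laid out as one row per hypothesis and one column per reference; that layout is an assumption, not something stated above. The sketch returns nested lists rather than a `Tensor`, and `word_edit_rate` is a hypothetical stand-in (word-level Levenshtein distance divided by reference length, without TER's shift operation).

```python
def word_edit_rate(hyp: str, ref: str) -> float:
    """Stand-in scorer: word-level edit distance divided by reference length."""
    hyp_words, ref_words = hyp.split(), ref.split()
    prev = list(range(len(ref_words) + 1))
    for i, h in enumerate(hyp_words, start=1):
        curr = [i]
        for j, r in enumerate(ref_words, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref_words) if ref_words else 0.0


def pairwise_matrix(hypotheses: list[str], references: list[str]) -> list[list[float]]:
    """Score each hypothesis (row) against each reference (column)."""
    return [[word_edit_rate(h, r) for r in references] for h in hypotheses]


hyps = ["the cat sat", "a dog barked"]
refs = ["the cat sat", "the dog barked", "a bird sang"]
matrix = pairwise_matrix(hyps, refs)
for row in matrix:
    print(row)
```

In this sketch an identical hypothesis and reference give 0.0, and larger values mean more edits.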