Manual of CLI options

mbrs-generate

usage: mbrs-generate [-h] [--config_path Path] [--plugin_dir Path]
                     [-o FileType('w')] [--lprobs [FileType('w')]]
                     [--length_normalized_lprobs [FileType('w')]] [-m str]
                     [-n int] [-s {,eps}] [--beam_size int] [-e float]
                     [--lang_pair str] [--max_length int] [--min_length int]
                     [--length_penalty [float]] [--batch_size int]
                     [--sampling_size int] [--unique bool] [--retry int]
                     [--fp16 bool] [--bf16 bool] [--cpu bool] [--seed int]
                     [-q bool] [--report FileType('w')]
                     [--report_format {asciidoc,double_grid,double_outline,fancy_grid,fancy_outline,github,grid,heavy_grid,heavy_outline,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,mixed_grid,mixed_outline,moinmoin,orgtbl,outline,pipe,plain,presto,pretty,psql,rounded_grid,rounded_outline,rst,simple,simple_grid,simple_outline,textile,tsv,unsafehtml,youtrack}]
                     [-w int]
                     [FileType('r', encoding='utf-8')]

Named Arguments

--config_path

Path to a config file containing default values to use.

--plugin_dir

Path to a directory containing user-defined plugins.

GenerationArguments [‘generation’]

Generation arguments.

generation.input

Input file. If not specified, read from stdin.

Default: -

-o, --output

Output file.

Default: -

--lprobs

File to which the reference log-probabilities are written. This option is typically used for model-based estimation.

--length_normalized_lprobs

File to which the length-normalized reference log-probabilities are written. This option is typically used for model-based estimation.
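The two files hold different aggregates of per-token scores. The sketch below computes both quantities. It assumes the usual definitions: a sequence log-probability is the sum of its token log-probabilities, and the length-normalized version divides that sum by the number of tokens. How mbrs counts tokens (for example, special tokens) is not stated here, and the function names are hypothetical.

```python
import math
from typing import List


def sequence_lprob(token_lprobs: List[float]) -> float:
    """Total log-probability of a sequence: the sum of its token log-probabilities."""
    return sum(token_lprobs)


def length_normalized_lprob(token_lprobs: List[float]) -> float:
    """Average per-token log-probability (total log-probability divided by length)."""
    if not token_lprobs:
        raise ValueError("empty sequence")
    return sum(token_lprobs) / len(token_lprobs)


# Hypothetical per-token log-probabilities of one sampled sentence.
tokens = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(sequence_lprob(tokens))           # approx. log(0.0625)
print(length_normalized_lprob(tokens))  # approx. log(0.0625) / 3
```

Length normalization keeps longer samples from being penalized simply for having more tokens.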

-m, --model

Model name or path.

Default: 'facebook/m2m100_418M'

-n, --num_candidates

Number of candidates to be returned.

Default: 1

-s, --sampling

Possible choices: '' (empty string), eps

Sampling method.

Default: ''

--beam_size

Beam size.

Default: 5

-e, --eps, --epsilon

Cutoff parameter for epsilon sampling.

Default: 0.02
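With -s eps, tokens whose probability falls below the --eps cutoff are excluded before sampling. The sketch below shows one epsilon-sampling step on a toy next-token distribution; it is not the code mbrs runs. The function name is hypothetical. The fallback that keeps the most likely token when every probability is below the cutoff is also an assumption of this sketch.

```python
import random
from typing import Dict, Optional


def epsilon_sample(probs: Dict[str, float], eps: float = 0.02,
                   rng: Optional[random.Random] = None) -> str:
    """Draw one token from `probs` with epsilon sampling.

    Tokens whose probability is below `eps` are discarded, the surviving
    probabilities are renormalized, and one token is drawn from them.
    """
    if rng is None:
        rng = random.Random()
    kept = {tok: p for tok, p in probs.items() if p >= eps}
    if not kept:
        # Assumption of this sketch: always keep at least the most likely token.
        best = max(probs, key=probs.get)
        kept = {best: probs[best]}
    total = sum(kept.values())
    tokens = list(kept)
    weights = [kept[tok] / total for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (hypothetical values).
dist = {"house": 0.6, "building": 0.39, "cat": 0.01}
rng = random.Random(0)
samples = [epsilon_sample(dist, eps=0.02, rng=rng) for _ in range(1000)]
print("cat" in samples)  # False: p("cat") = 0.01 < eps, so it is never drawn
```

Unlike top-k sampling, the number of surviving tokens varies with the shape of each distribution.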

--lang_pair

Language code pair (source-target). Some models, such as M2M100, use this information.

Default: 'en-de'

--max_length

Maximum length of an output sentence.

Default: 1024

--min_length

Minimum length of an output sentence.

Default: 1

--length_penalty

Length penalty.

--batch_size

Batch size.

Default: 1

--sampling_size

Number of samples generated in a single inference call. The model generates this many samples at a time until the total number of samples reaches --num_candidates.

Default: 8
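The interaction between --sampling_size and --num_candidates can be pictured as a sequence of batch sizes. The sketch below assumes that the final call requests only the remaining samples; the function name is hypothetical.

```python
from typing import List


def sampling_chunks(num_candidates: int, sampling_size: int) -> List[int]:
    """Split num_candidates samples into per-call batch sizes of at most sampling_size."""
    if num_candidates < 0 or sampling_size <= 0:
        raise ValueError("num_candidates must be >= 0 and sampling_size > 0")
    chunks = []
    remaining = num_candidates
    while remaining > 0:
        n = min(sampling_size, remaining)  # never request more than is still needed
        chunks.append(n)
        remaining -= n
    return chunks


print(sampling_chunks(20, 8))  # [8, 8, 4]
```

Under this assumption, -n 20 --sampling_size 8 would correspond to three generation calls of 8, 8 and 4 samples.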

--unique, --nounique

Generate unique sentences for each input.

Default: False

--retry

Maximum number of times to retry sampling when generating unique sentences. If no unique sentences are found after this number of attempts, non-unique sentences are included in the output.

Default: 100
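The --unique and --retry options combine as a bounded loop. The loop keeps sampling and skips sentences that were already seen. Once the retry budget is spent, it fills the remaining slots with duplicates. The sketch below assumes one initial attempt plus at most --retry retries. `sample_fn` and the other names are hypothetical, and this is not mbrs's implementation.

```python
import random
from typing import Callable, List


def sample_unique(sample_fn: Callable[[int], List[str]], num_candidates: int,
                  sampling_size: int, retry: int) -> List[str]:
    """Collect up to num_candidates distinct samples, retrying at most `retry` times.

    sample_fn(k) is a hypothetical callback that returns k sampled sentences.
    If the retries run out, the remaining slots are filled with non-unique samples.
    """
    unique: List[str] = []
    seen = set()
    duplicates: List[str] = []
    attempts = 0
    # One initial attempt plus at most `retry` retries (an assumption of this sketch).
    while len(unique) < num_candidates and attempts <= retry:
        for sent in sample_fn(sampling_size):
            if sent in seen:
                duplicates.append(sent)
            elif len(unique) < num_candidates:
                seen.add(sent)
                unique.append(sent)
        attempts += 1

    # Retries exhausted: pad with non-unique samples.
    result = list(unique)
    pool = duplicates if duplicates else list(unique)
    i = 0
    while len(result) < num_candidates and pool:
        result.append(pool[i % len(pool)])
        i += 1
    return result


rng = random.Random(0)


def toy_sampler(k: int) -> List[str]:
    """A toy sampler that can only ever produce two distinct sentences."""
    return [rng.choice(["Hallo Welt.", "Hallo, Welt!"]) for _ in range(k)]


out = sample_unique(toy_sampler, num_candidates=4, sampling_size=2, retry=3)
print(len(out))  # 4: at most two are unique, the rest are repeats
```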

--fp16, --nofp16

Use float16.

Default: False

--bf16, --nobf16

Use bfloat16.

Default: False

--cpu, --nocpu

Force the use of the CPU.

Default: False

--seed

Random number seed.

Default: 0

-q, --quiet, --noq, --noquiet

Do not report statistics.

Default: False

--report

Report file.

Default: -

--report_format

Possible choices: asciidoc, double_grid, double_outline, fancy_grid, fancy_outline, github, grid, heavy_grid, heavy_outline, html, jira, latex, latex_booktabs, latex_longtable, latex_raw, mediawiki, mixed_grid, mixed_outline, moinmoin, orgtbl, outline, pipe, plain, presto, pretty, psql, rounded_grid, rounded_outline, rst, simple, simple_grid, simple_outline, textile, tsv, unsafehtml, youtrack

Report runtime statistics with the given format.

Default: 'rounded_outline'

-w, --width

Number of digits used to display floating-point values.

Default: 1

mbrs-decode

usage: mbrs-decode [-h] [--config_path Path] [--plugin_dir Path] -n int
                   [-s [str]] [-r [str]] [--reference_lprobs [str]]
                   [-o FileType('w', encoding='utf-8')]
                   [--format {plain,json}] [--num_references [int]]
                   [--decoder {mbr,aggregate_mbr,centroid_mbr,probabilistic_mbr,pruning_mbr,rerank}]
                   [--metric {bertscore,bleu,bleurt,chrf,comet,cometkiwi,metricx,ter,xcomet}]
                   [--selector {diverse,nbest}] [--nbest int] [--quiet bool]
                   [--report FileType('w')]
                   [--report_format {asciidoc,double_grid,double_outline,fancy_grid,fancy_outline,github,grid,heavy_grid,heavy_outline,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,mixed_grid,mixed_outline,moinmoin,orgtbl,outline,pipe,plain,presto,pretty,psql,rounded_grid,rounded_outline,rst,simple,simple_grid,simple_outline,textile,tsv,unsafehtml,youtrack}]
                   [-w int] [--metric.lowercase bool] [--metric.force bool]
                   [--metric.tokenize [str]] [--metric.smooth_method str]
                   [--metric.smooth_value [float]]
                   [--metric.max_ngram_order int]
                   [--metric.effective_order bool] [--metric.trg_lang str]
                   [--metric.num_workers int]
                   str

Named Arguments

--config_path

Path to a config file containing default values to use.

--plugin_dir

Path to a directory containing user-defined plugins.

CommonArguments [‘common’]

Common arguments.

common.hypotheses

Hypotheses file.

-n, --num_candidates

Number of candidates.

-s, --source

Source file.

-r, --references

References file.

--reference_lprobs

Reference log-probabilities file.

-o, --output

Output file.

Default: -

--format

Possible choices: plain, json

Output format.

Default: Format.plain

--num_references

Number of references for each sentence.
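The sketch below splits a flat file into per-sentence blocks. It assumes, though this is not stated above, that the hypotheses file lists --num_candidates consecutive lines for each source sentence. It also assumes the references file follows the same layout with --num_references lines per sentence. `group_lines` is a hypothetical helper.

```python
from typing import List


def group_lines(lines: List[str], group_size: int) -> List[List[str]]:
    """Split a flat list of lines into consecutive blocks of group_size lines."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if len(lines) % group_size != 0:
        raise ValueError(f"{len(lines)} lines is not a multiple of {group_size}")
    return [lines[i:i + group_size] for i in range(0, len(lines), group_size)]


# Two source sentences with -n 3 candidates each (hypothetical data).
hyps = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(group_lines(hyps, 3))  # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

Checking that the line count is a multiple of the block size catches a mismatched -n early.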

--decoder

Possible choices: mbr, aggregate_mbr, centroid_mbr, probabilistic_mbr, pruning_mbr, rerank

Type of the decoder.

Default: 'mbr'
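MBR decoding selects the candidate with the highest expected utility. The utility is a metric such as the one chosen with --metric, and the expectation is taken over references, for example the candidates themselves used as pseudo-references. The sketch below shows that selection rule. A toy bag-of-words F1 stands in for the metric, and references are weighted uniformly; both are assumptions. The sketch is not mbrs's `mbr` decoder, and the other decoder types are not shown.

```python
from collections import Counter
from typing import Callable, List, Sequence


def bag_of_words_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 of clipped word overlap (a stand-in for BLEU, chrF, COMET, ...)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def expected_utilities(hypotheses: Sequence[str], references: Sequence[str],
                       utility: Callable[[str, str], float]) -> List[float]:
    """Expected utility of each hypothesis, with references weighted uniformly."""
    return [sum(utility(h, r) for r in references) / len(references) for h in hypotheses]


def mbr_decode(hypotheses: Sequence[str], references: Sequence[str],
               utility: Callable[[str, str], float] = bag_of_words_f1) -> str:
    """Return the hypothesis with the highest expected utility."""
    scores = expected_utilities(hypotheses, references, utility)
    best = max(range(len(hypotheses)), key=lambda i: scores[i])
    return hypotheses[best]


# The candidates double as pseudo-references.
candidates = ["the cat sat", "a cat sat", "the dog ran"]
print(mbr_decode(candidates, candidates))  # the cat sat
```

The cost is one utility call per hypothesis-reference pair, which is what makes large candidate sets expensive.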

--metric

Possible choices: bertscore, bleu, bleurt, chrf, comet, cometkiwi, metricx, ter, xcomet

Type of the metric.

Default: 'bleu'

--selector

Possible choices: diverse, nbest

Type of the selector.

Default: 'nbest'

--nbest

Number of best hypotheses (n-best) to return.

Default: 1
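With --selector nbest, candidates are presumably ranked by their scores and the top --nbest are kept. The sketch below assumes the scores are expected utilities where higher is better; for an error metric such as TER, the order would have to be reversed. `select_nbest` is a hypothetical name, and the diverse selector is not covered.

```python
from typing import List, Sequence, Tuple


def select_nbest(hypotheses: Sequence[str], scores: Sequence[float],
                 nbest: int) -> List[Tuple[str, float]]:
    """Return the `nbest` highest-scoring hypotheses, best first."""
    order = sorted(range(len(hypotheses)), key=lambda i: scores[i], reverse=True)
    return [(hypotheses[i], scores[i]) for i in order[:nbest]]


print(select_nbest(["x", "y", "z"], [0.2, 0.9, 0.5], nbest=2))  # [('y', 0.9), ('z', 0.5)]
```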

--quiet, --noquiet

Suppress verbose information and the report.

Default: False

--report

Report file.

Default: -

--report_format

Possible choices: asciidoc, double_grid, double_outline, fancy_grid, fancy_outline, github, grid, heavy_grid, heavy_outline, html, jira, latex, latex_booktabs, latex_longtable, latex_raw, mediawiki, mixed_grid, mixed_outline, moinmoin, orgtbl, outline, pipe, plain, presto, pretty, psql, rounded_grid, rounded_outline, rst, simple, simple_grid, simple_outline, textile, tsv, unsafehtml, youtrack

Report runtime statistics with the given format.

Default: 'rounded_outline'

-w, --width

Number of digits used to display floating-point values.

Default: 1

MetricBLEU.Config [‘metric’]

BLEU metric configuration.

  • lowercase (bool): If True, lowercased BLEU is computed.

  • force (bool): Ignore data that looks like it is already tokenized.

  • tokenize (str, optional): The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.

  • smooth_method (str): The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).

  • smooth_value (float, optional): The smoothing value for the floor and add-k methods. If None, the default value is used.

  • max_ngram_order (int): If given, it overrides the maximum n-gram order (default: 4) when computing precisions.

  • effective_order (bool): If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU will be computed. (default: True)

  • trg_lang (str): An optional language code to raise potential tokenizer warnings.

  • num_workers (int): Number of workers for multiprocessing.
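These fields appear to mirror the options of sacrebleu's BLEU class. The sketch below is a simplified sentence-level BLEU written from scratch. It shows what max_ngram_order, effective_order and smooth_method='exp' do for a single sentence pair. It is an illustration, not the code mbrs or sacrebleu runs: it uses whitespace tokenization instead of '13a', supports only one reference, and omits the 'floor' and 'add-k' methods, force, trg_lang and num_workers.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_ngram_order: int = 4,
                  effective_order: bool = True, lowercase: bool = False) -> float:
    """Simplified single-reference sentence BLEU on a 0-100 scale.

    Uses whitespace tokenization (not sacrebleu's '13a') and 'exp' smoothing.
    """
    if lowercase:
        hyp, ref = hyp.lower(), ref.lower()
    h, r = hyp.split(), ref.split()
    correct, total = [], []
    for n in range(1, max_ngram_order + 1):
        h_ngrams, r_ngrams = ngram_counts(h, n), ngram_counts(r, n)
        correct.append(sum((h_ngrams & r_ngrams).values()))  # clipped matches
        total.append(sum(h_ngrams.values()))
    if not any(correct):
        return 0.0

    precisions = [0.0] * max_ngram_order
    smooth = 1.0
    eff_order = max_ngram_order
    for i in range(max_ngram_order):
        if total[i] == 0:
            break  # the hypothesis is too short to contain (i+1)-grams
        if effective_order:
            eff_order = i + 1
        if correct[i] == 0:
            # 'exp' smoothing: the k-th zero-match order gets 100 / (2^k * total).
            smooth *= 2
            precisions[i] = 100.0 / (smooth * total[i])
        else:
            precisions[i] = 100.0 * correct[i] / total[i]

    used = precisions[:eff_order]
    if any(p == 0.0 for p in used):
        return 0.0  # a zero precision makes the geometric mean zero
    geo_mean = math.exp(sum(math.log(p) for p in used) / eff_order)
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(h) >= len(r) else math.exp(1 - len(r) / len(h))
    return bp * geo_mean


print(sentence_bleu("the cat", "the cat"))                         # ~100.0
print(sentence_bleu("the cat", "the cat", effective_order=False))  # 0.0
```

A two-word hypothesis has no 3-grams or 4-grams, so without effective_order its score collapses to 0. This is why the option matters for sentence-level BLEU.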

--metric.lowercase, --metric.nolowercase

If True, lowercased BLEU is computed.

Default: False

--metric.force, --metric.noforce

Ignore data that looks like it is already tokenized.

Default: False

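--metric.tokenize

The tokenizer to use. If not given, language-specific tokenizers are used, with ‘13a’ as the fallback.

--metric.smooth_method

The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).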

Default: 'exp'

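--metric.smooth_value

The smoothing value for the ‘floor’ and ‘add-k’ methods. If not given, the default value is used.

--metric.max_ngram_order

Maximum n-gram order used when computing precisions.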

Default: 4

--metric.effective_order, --metric.noeffective_order

If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU will be computed.

Default: True

--metric.trg_lang

An optional language code used to raise potential tokenizer warnings.

Default: ''

--metric.num_workers

Number of workers for multiprocessing.

Default: 8

mbrs-score

usage: mbrs-score [-h] [--config_path Path] [--plugin_dir Path] [-s [str]]
                  [-r [str]] [--format {plain,json}]
                  [--metric {bertscore,bleu,bleurt,chrf,comet,cometkiwi,metricx,ter,xcomet}]
                  [--quiet bool] [-w int] [--metric.lowercase bool]
                  [--metric.force bool] [--metric.tokenize [str]]
                  [--metric.smooth_method str] [--metric.smooth_value [float]]
                  [--metric.max_ngram_order int]
                  [--metric.effective_order bool] [--metric.trg_lang str]
                  [--metric.num_workers int]
                  str

Named Arguments

--config_path

Path to a config file containing default values to use.

--plugin_dir

Path to a directory containing user-defined plugins.

CommonArguments [‘common’]

Common arguments.

common.hypotheses

Hypotheses file.

-s, --sources

Sources file.

-r, --references

References file.

--format

Possible choices: plain, json

Output format.

Default: Format.json

--metric

Possible choices: bertscore, bleu, bleurt, chrf, comet, cometkiwi, metricx, ter, xcomet

Type of the metric.

Default: 'bleu'

--quiet, --noquiet

Suppress verbose information and the report.

Default: False

-w, --width

Number of digits used to display floating-point values.

Default: 1

MetricBLEU.Config [‘metric’]

BLEU metric configuration.

  • lowercase (bool): If True, lowercased BLEU is computed.

  • force (bool): Ignore data that looks like it is already tokenized.

  • tokenize (str, optional): The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.

  • smooth_method (str): The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).

  • smooth_value (float, optional): The smoothing value for the floor and add-k methods. If None, the default value is used.

  • max_ngram_order (int): If given, it overrides the maximum n-gram order (default: 4) when computing precisions.

  • effective_order (bool): If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU will be computed. (default: True)

  • trg_lang (str): An optional language code to raise potential tokenizer warnings.

  • num_workers (int): Number of workers for multiprocessing.

--metric.lowercase, --metric.nolowercase

If True, lowercased BLEU is computed.

Default: False

--metric.force, --metric.noforce

Ignore data that looks like it is already tokenized.

Default: False

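--metric.tokenize

The tokenizer to use. If not given, language-specific tokenizers are used, with ‘13a’ as the fallback.

--metric.smooth_method

The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).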

Default: 'exp'

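--metric.smooth_value

The smoothing value for the ‘floor’ and ‘add-k’ methods. If not given, the default value is used.

--metric.max_ngram_order

Maximum n-gram order used when computing precisions.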

Default: 4

--metric.effective_order, --metric.noeffective_order

If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU will be computed.

Default: True

--metric.trg_lang

An optional language code used to raise potential tokenizer warnings.

Default: ''

--metric.num_workers

Number of workers for multiprocessing.

Default: 8