Manual of CLI options
mbrs-generate
usage: mbrs-generate [-h] [--config_path Path] [--plugin_dir Path]
[-o FileType('w')] [--lprobs [FileType('w')]]
[--length_normalized_lprobs [FileType('w')]] [-m str]
[-n int] [-s {,eps}] [--beam_size int] [-e float]
[--lang_pair str] [--max_length int] [--min_length int]
[--length_penalty [float]] [--batch_size int]
[--sampling_size int] [--unique bool] [--retry int]
[--fp16 bool] [--bf16 bool] [--cpu bool] [--seed int]
[-q bool] [--report FileType('w')]
[--report_format {asciidoc,double_grid,double_outline,fancy_grid,fancy_outline,github,grid,heavy_grid,heavy_outline,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,mixed_grid,mixed_outline,moinmoin,orgtbl,outline,pipe,plain,presto,pretty,psql,rounded_grid,rounded_outline,rst,simple,simple_grid,simple_outline,textile,tsv,unsafehtml,youtrack}]
[-w int]
[FileType('r', encoding='utf-8')]
Named Arguments
- --config_path
Path to a config file containing default values to use.
- --plugin_dir
Path to a directory containing user-defined plugins.
GenerationArguments [‘generation’]
Generation arguments.
- generation.input
Input file. If not specified, read from stdin.
Default:
-
- -o, --output
Output file.
Default:
-
- --lprobs
Reference log-probabilities file. This option is typically used for model-based estimation.
- --length_normalized_lprobs
Length-normalized reference log-probabilities file. This option is typically used for model-based estimation.
- -m, --model
Model name or path.
Default:
'facebook/m2m100_418M'
- -n, --num_candidates
Number of candidates to be returned.
Default:
1
- -s, --sampling
Possible choices: '' (empty string), eps
Sampling method.
Default:
''
- --beam_size
Beam size.
Default:
5
- -e, --eps, --epsilon
Cutoff parameter for epsilon sampling.
Default:
0.02
- --lang_pair
Language code pair. Some models, such as M2M100, use this information.
Default:
'en-de'
- --max_length
Maximum length of an output sentence.
Default:
1024
- --min_length
Minimum length of an output sentence.
Default:
1
- --length_penalty
Length penalty.
- --batch_size
Batch size.
Default:
1
- --sampling_size
Sampling size in a single inference. The model generates this number of samples at a time until the total number of samples reaches --num_candidates.
Default:
8
- --unique, --nounique
Generate unique sentences for each input. (default: False)
Default:
False
- --retry
Number of times to retry sampling when generating unique sentences. If no unique sentences are found after this number of attempts, non-unique sentences will be included in the outputs.
Default:
100
- --fp16, --nofp16
Use float16. (default: False)
Default:
False
- --bf16, --nobf16
Use bfloat16. (default: False)
Default:
False
- --cpu, --nocpu
Force the use of the CPU. (default: False)
Default:
False
- --seed
Random number seed.
Default:
0
- -q, --quiet, --noq, --noquiet
Do not report statistics. (default: False)
Default:
False
- --report
Report file.
Default:
-
- --report_format
Possible choices: asciidoc, double_grid, double_outline, fancy_grid, fancy_outline, github, grid, heavy_grid, heavy_outline, html, jira, latex, latex_booktabs, latex_longtable, latex_raw, mediawiki, mixed_grid, mixed_outline, moinmoin, orgtbl, outline, pipe, plain, presto, pretty, psql, rounded_grid, rounded_outline, rst, simple, simple_grid, simple_outline, textile, tsv, unsafehtml, youtrack
Report runtime statistics with the given format.
Default:
'rounded_outline'
- -w, --width
Number of digits for floating-point values.
Default:
1
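The options listed above can be combined into one invocation. The following is a minimal sketch, not taken from the mbrs documentation, that assembles an `mbrs-generate` argument list in Python. The file names `sources.txt` and `candidates.txt` are hypothetical, and the command is only printed, not executed.

```python
import shlex

# Hypothetical file names, used only for illustration.
source_file = "sources.txt"
candidates_file = "candidates.txt"

# Epsilon sampling with 8 candidates per input, using only flags listed above.
argv = [
    "mbrs-generate",
    source_file,                        # positional input file (stdin if omitted)
    "--output", candidates_file,
    "--model", "facebook/m2m100_418M",  # documented default model
    "--lang_pair", "en-de",
    "--num_candidates", "8",
    "--sampling", "eps",
    "--epsilon", "0.02",
    "--batch_size", "4",
    "--seed", "0",
]

# shlex.join quotes each argument so the result can be pasted into a shell.
command = shlex.join(argv)
print(command)
```

The list form can also be handed to `subprocess.run(argv, check=True)` once the tool is installed, which avoids shell quoting issues altogether.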
mbrs-decode
usage: mbrs-decode [-h] [--config_path Path] [--plugin_dir Path] -n int
[-s [str]] [-r [str]] [--reference_lprobs [str]]
[-o FileType('w', encoding='utf-8')]
[--format {plain,json}] [--num_references [int]]
[--decoder {mbr,aggregate_mbr,centroid_mbr,probabilistic_mbr,pruning_mbr,rerank}]
[--metric {bertscore,bleu,bleurt,chrf,comet,cometkiwi,metricx,ter,xcomet}]
[--selector {diverse,nbest}] [--nbest int] [--quiet bool]
[--report FileType('w')]
[--report_format {asciidoc,double_grid,double_outline,fancy_grid,fancy_outline,github,grid,heavy_grid,heavy_outline,html,jira,latex,latex_booktabs,latex_longtable,latex_raw,mediawiki,mixed_grid,mixed_outline,moinmoin,orgtbl,outline,pipe,plain,presto,pretty,psql,rounded_grid,rounded_outline,rst,simple,simple_grid,simple_outline,textile,tsv,unsafehtml,youtrack}]
[-w int] [--metric.lowercase bool] [--metric.force bool]
[--metric.tokenize [str]] [--metric.smooth_method str]
[--metric.smooth_value [float]]
[--metric.max_ngram_order int]
[--metric.effective_order bool] [--metric.trg_lang str]
[--metric.num_workers int]
str
Named Arguments
- --config_path
Path to a config file containing default values to use.
- --plugin_dir
Path to a directory containing user-defined plugins.
CommonArguments [‘common’]
Common arguments.
- common.hypotheses
Hypotheses file.
- -n, --num_candidates
Number of candidates.
- -s, --source
Source file.
- -r, --references
References file.
- --reference_lprobs
References log-probabilities file.
- -o, --output
Output file.
Default:
-
- --format
Possible choices: plain, json
Output format.
Default:
Format.plain
- --num_references
Number of references for each sentence.
- --decoder
Possible choices: mbr, aggregate_mbr, centroid_mbr, probabilistic_mbr, pruning_mbr, rerank
Type of the decoder.
Default:
'mbr'
- --metric
Possible choices: bertscore, bleu, bleurt, chrf, comet, cometkiwi, metricx, ter, xcomet
Type of the metric.
Default:
'bleu'
- --selector
Possible choices: diverse, nbest
Type of the selector.
Default:
'nbest'
- --nbest
Return the n-best hypotheses.
Default:
1
- --quiet, --noquiet
Suppress verbose information and the report. (default: False)
Default:
False
- --report
Report file.
Default:
-
- --report_format
Possible choices: asciidoc, double_grid, double_outline, fancy_grid, fancy_outline, github, grid, heavy_grid, heavy_outline, html, jira, latex, latex_booktabs, latex_longtable, latex_raw, mediawiki, mixed_grid, mixed_outline, moinmoin, orgtbl, outline, pipe, plain, presto, pretty, psql, rounded_grid, rounded_outline, rst, simple, simple_grid, simple_outline, textile, tsv, unsafehtml, youtrack
Report runtime statistics with the given format.
Default:
'rounded_outline'
- -w, --width
Number of digits for floating-point values.
Default:
1
MetricBLEU.Config [‘metric’]
BLEU metric configuration.
lowercase (bool): If True, lowercased BLEU is computed.
force (bool): Ignore data that looks already tokenized.
tokenize (str, optional): The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.
smooth_method (str): The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).
smooth_value (float, optional): The smoothing value for the floor and add-k methods. None falls back to the default value.
max_ngram_order (int): If given, it overrides the maximum n-gram order (default: 4) when computing precisions.
effective_order (bool): If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU is computed. (default: True)
trg_lang (str): An optional language code to raise potential tokenizer warnings.
num_workers (int): Number of workers for multiprocessing.
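- --metric.lowercase, --metric.nolowercase
If True, lowercased BLEU is computed. (default: False)
Default:
False
- --metric.force, --metric.noforce
Ignore data that looks already tokenized. (default: False)
Default:
False
- --metric.tokenize
The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.
- --metric.smooth_method
The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).
Default:
'exp'
- --metric.smooth_value
The smoothing value for the floor and add-k methods. None falls back to the default value.
- --metric.max_ngram_order
If given, overrides the maximum n-gram order when computing precisions.
Default:
4
- --metric.effective_order, --metric.noeffective_order
If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU is computed. (default: True)
Default:
True
- --metric.trg_lang
An optional language code to raise potential tokenizer warnings.
Default:
''
- --metric.num_workers
Number of workers for multiprocessing.
Default:
8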
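As an illustration of how the common and `--metric.*` options fit together, here is a minimal sketch that builds an `mbrs-decode` argument list in Python. It is not taken from the mbrs documentation: the file names are hypothetical, `flag_pairs` is a helper introduced only for this example, and the command is printed rather than executed. Boolean options are left out to keep the sketch simple.

```python
import shlex


def flag_pairs(options):
    """Flatten a mapping of option name -> value into ['--name', 'value', ...]."""
    argv = []
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


# Only option names documented above are used; file names are hypothetical.
options = {
    "num_candidates": 8,
    "source": "sources.txt",
    "output": "decoded.txt",
    "decoder": "mbr",
    "metric": "bleu",
    "metric.smooth_method": "exp",
    "metric.max_ngram_order": 4,
    "metric.num_workers": 8,
    "nbest": 1,
}

# The hypotheses file is the positional argument.
argv = ["mbrs-decode", "candidates.txt", *flag_pairs(options)]
print(shlex.join(argv))
```

Keeping the options in a dictionary makes it easy to vary a single setting, for example the decoder or the metric, between runs.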
mbrs-score
usage: mbrs-score [-h] [--config_path Path] [--plugin_dir Path] [-s [str]]
[-r [str]] [--format {plain,json}]
[--metric {bertscore,bleu,bleurt,chrf,comet,cometkiwi,metricx,ter,xcomet}]
[--quiet bool] [-w int] [--metric.lowercase bool]
[--metric.force bool] [--metric.tokenize [str]]
[--metric.smooth_method str] [--metric.smooth_value [float]]
[--metric.max_ngram_order int]
[--metric.effective_order bool] [--metric.trg_lang str]
[--metric.num_workers int]
str
Named Arguments
- --config_path
Path to a config file containing default values to use.
- --plugin_dir
Path to a directory containing user-defined plugins.
CommonArguments [‘common’]
Common arguments.
- common.hypotheses
Hypotheses file.
- -s, --sources
Sources file.
- -r, --references
References file.
- --format
Possible choices: plain, json
Output format.
Default:
Format.json
- --metric
Possible choices: bertscore, bleu, bleurt, chrf, comet, cometkiwi, metricx, ter, xcomet
Type of the metric.
Default:
'bleu'
- --quiet, --noquiet
Suppress verbose information and the report. (default: False)
Default:
False
- -w, --width
Number of digits for floating-point values.
Default:
1
MetricBLEU.Config [‘metric’]
BLEU metric configuration.
lowercase (bool): If True, lowercased BLEU is computed.
force (bool): Ignore data that looks already tokenized.
tokenize (str, optional): The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.
smooth_method (str): The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).
smooth_value (float, optional): The smoothing value for the floor and add-k methods. None falls back to the default value.
max_ngram_order (int): If given, it overrides the maximum n-gram order (default: 4) when computing precisions.
effective_order (bool): If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU is computed. (default: True)
trg_lang (str): An optional language code to raise potential tokenizer warnings.
num_workers (int): Number of workers for multiprocessing.
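- --metric.lowercase, --metric.nolowercase
If True, lowercased BLEU is computed. (default: False)
Default:
False
- --metric.force, --metric.noforce
Ignore data that looks already tokenized. (default: False)
Default:
False
- --metric.tokenize
The tokenizer to use. If None, defaults to language-specific tokenizers with ‘13a’ as the fallback default.
- --metric.smooth_method
The smoothing method to use (‘floor’, ‘add-k’, ‘exp’ or ‘none’).
Default:
'exp'
- --metric.smooth_value
The smoothing value for the floor and add-k methods. None falls back to the default value.
- --metric.max_ngram_order
If given, overrides the maximum n-gram order when computing precisions.
Default:
4
- --metric.effective_order, --metric.noeffective_order
If True, stop including n-gram orders for which precision is 0. This should be True if sentence-level BLEU is computed. (default: True)
Default:
True
- --metric.trg_lang
An optional language code to raise potential tokenizer warnings.
Default:
''
- --metric.num_workers
Number of workers for multiprocessing.
Default:
8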
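A scoring run can be assembled in the same way. The sketch below is an assumption-laden illustration rather than an official recipe: `build_score_command` is a hypothetical helper, the file names are placeholders, and only flags documented above for `mbrs-score` are used. Nothing is executed.

```python
import shlex


def build_score_command(hypotheses, references, metric="bleu", width=1):
    """Return an argv list for mbrs-score using only flags documented above."""
    return [
        "mbrs-score",
        hypotheses,                  # positional hypotheses file
        "--references", references,
        "--metric", metric,
        "--format", "json",          # documented default output format
        "--width", str(width),
    ]


# Hypothetical file names, used only for illustration.
argv = build_score_command("decoded.txt", "references.txt", width=2)
print(shlex.join(argv))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run(argv, check=True)` without going through a shell.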