Releases · awslabs/sockeye
3.1.4
3.1.3
[3.1.3]
Added
- Added support for adding source prefixes to the input in JSON format during inference.
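For illustration, a minimal sketch of such a JSON input (the `source_prefix` key name is an assumption for this sketch; the release note does not spell out the exact schema):

```python
import json

# Hypothetical JSON-format input line for sockeye-translate with --json-input;
# the "source_prefix" key name is assumed here for illustration.
line = json.dumps({
    "text": "guten Morgen",
    "source_prefix": "<2en>",  # prefix tokens prepended to the source sentence
})
print(line)  # one such JSON object per input line
```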
[3.1.2]
Changed
- Optimized creation of the source length mask by using `expand` instead of `repeat_interleave`.
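A minimal sketch of the difference, with illustrative shapes: `repeat_interleave` materializes a real copy of the mask per attention head, while `expand` returns a broadcasted view over the same storage.

```python
import torch

batch, num_heads, max_len = 2, 8, 5
lengths = torch.tensor([3, 5])
# (batch, 1, max_len) boolean mask: True marks padded source positions.
mask = torch.arange(max_len)[None, None, :] >= lengths[:, None, None]

# repeat_interleave allocates a (batch * num_heads, 1, max_len) copy ...
copied = mask.repeat_interleave(num_heads, dim=0)

# ... while expand broadcasts over the head dimension without new storage.
viewed = mask.unsqueeze(1).expand(batch, num_heads, 1, max_len)

print(copied.data_ptr() == mask.data_ptr())  # False: new allocation
print(viewed.data_ptr() == mask.data_ptr())  # True: just a view
```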
[3.1.1]
Changed
- Updated torch dependency to 1.10.x (`torch>=1.10.0,<1.11.0`).
3.1.0
[3.1.0]
Sockeye is now exclusively based on PyTorch.
Changed
- Renamed `x_pt` modules to `x`. Updated entry points in `setup.py`.
Removed
- Removed MXNet from the codebase.
- Removed device locking / GPU acquisition logic. Removed dependency on `portalocker`.
- Removed arguments `--softmax-temperature`, `--weight-init-*`, `--mc-dropout`, `--horovod`, `--device-ids`.
- Removed all MXNet-related tests.
3.0.15
[3.0.15]
Fixed
- Fixed GPU-based scoring by copying tensors to the CPU before converting them to NumPy arrays.
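The essence of the fix, as a sketch: calling `.numpy()` directly on a CUDA tensor raises a `TypeError`, so the tensor has to be moved to host memory first.

```python
import torch

scores = torch.rand(4)
if torch.cuda.is_available():
    scores = scores.cuda()

# scores.numpy() would raise TypeError for a CUDA tensor;
# copying to the CPU first always works.
host_scores = scores.cpu().numpy()
```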
[3.0.14]
Added
- Added support for the Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14.
  Checkpoint decoder metrics will now include TER scores, and early stopping can be determined
  via TER improvements (`--optimized-metric ter`).
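A hedged usage sketch of the referenced sacrebleu implementation, assuming the `corpus_ter` helper that sacrebleu 1.4.x exposes alongside `corpus_bleu`:

```python
import sacrebleu  # sacrebleu>=1.4.14 provides the TER implementation

hyps = ["the cat sat on the mat"]
refs = [["the cat is on the mat"]]  # one list per reference stream

ter = sacrebleu.corpus_ter(hyps, refs)
print(ter.score)  # lower is better; --optimized-metric ter minimizes this
```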
3.0.13
[3.0.13]
Changed
- Use `expand` instead of `repeat` for attention masks to avoid allocating additional memory.
- Avoid repeated `transpose` calls when initializing cached encoder-attention states in the decoder (a caching sketch follows this list).
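A sketch of the second point, with illustrative names and shapes only: transpose the encoder output once when the decoder cache is initialized, rather than once per decode step.

```python
import torch

encoder_out = torch.randn(10, 2, 512)  # (src_len, batch, hidden), illustrative

# One-time transpose when the cache is initialized ...
cached_states = encoder_out.transpose(0, 1).contiguous()  # (batch, src_len, hidden)

for step in range(5):
    # ... so each decode step reuses cached_states without re-transposing.
    attend_to = cached_states
```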
[3.0.12]
Removed
- Removed unused code for Weight Normalization. Minor code cleanups.
[3.0.11]
Fixed
- Fixed training with a single, fixed learning rate instead of a rate scheduler (`--learning-rate-scheduler none --initial-learning-rate ...`).
3.0.10
[3.0.10]
Changed
- End-to-end trace of `decode_step` of the Sockeye model. This creates less overhead during decoding and yields a small speedup.
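A minimal sketch of the idea with a toy stand-in module (not Sockeye's actual `decode_step`): tracing the whole step produces a single graph call, removing per-step Python overhead from the decoding loop.

```python
import torch

class DecodeStep(torch.nn.Module):
    """Toy stand-in for a model's single decoding step."""
    def __init__(self, vocab: int = 32, hidden: int = 16):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, hidden)
        self.output = torch.nn.Linear(hidden, vocab)

    def forward(self, prev_tokens: torch.Tensor) -> torch.Tensor:
        return self.output(self.embed(prev_tokens))

step = DecodeStep().eval()
example = torch.tensor([[1], [2]])       # (batch, 1) previous tokens
traced = torch.jit.trace(step, example)  # end-to-end traced step
logits = traced(example)                 # called once per decoding step
```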
[3.0.9]
Fixed
- Fixed not calling the traced target embedding module during inference.
[3.0.8]
Changed
- Added support for JIT tracing source/target embeddings and JIT scripting the output layer during inference.
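For the scripting half, a sketch with a toy stand-in layer (not Sockeye's actual class): unlike tracing, `torch.jit.script` also compiles data-dependent control flow, which can matter for an output layer with optional branches.

```python
import torch

class OutputLayer(torch.nn.Module):
    """Toy stand-in for an output projection."""
    def __init__(self, hidden: int = 16, vocab: int = 32):
        super().__init__()
        self.proj = torch.nn.Linear(hidden, vocab)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

scripted = torch.jit.script(OutputLayer())  # compiled, control flow included
logits = scripted(torch.randn(2, 16))
```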
3.0.7
[3.0.7]
Changed
- Improved training speed by using `torch.nn.functional.multi_head_attention_forward` for self- and encoder-attention
  during training. This requires reorganizing the parameter layout of the key-value input projections,
  as the current Sockeye attention interleaves them for faster inference.
  Attention masks (both source masks and autoregressive masks) need some shape adjustments, as the requirements
  of the fused MHA op differ slightly. A layout-conversion sketch follows this list.
  - Non-interleaved format for joint key-value input projection parameters:
    `in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)`
  - Interleaved format for the joint key-value input projection stores key and value parameters, grouped by heads:
    `Shape: ((num_heads * 2 * hidden_per_head), hidden)`
  - Models save and load key-value projection parameters in interleaved format.
  - When `model.training == True`, key-value projection parameters are put into
    non-interleaved format for `torch.nn.functional.multi_head_attention_forward`.
  - When `model.training == False`, i.e. `model.eval()` is called, key-value projection
    parameters are converted back into interleaved format in place.
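A sketch of the layout conversion under the shapes given above (illustrative code, not Sockeye's implementation; the key-then-value block ordering is assumed):

```python
import torch

num_heads, hidden = 4, 16
hidden_per_head = hidden // num_heads

# Interleaved storage format: key and value rows grouped per head.
interleaved = torch.randn(num_heads * 2 * hidden_per_head, hidden)

# Convert to the non-interleaved (2*hidden, hidden) layout expected by
# torch.nn.functional.multi_head_attention_forward: all keys, then all values
# (ordering assumed here for illustration).
grouped = interleaved.view(num_heads, 2, hidden_per_head, hidden)
keys = grouped[:, 0].reshape(hidden, hidden)
values = grouped[:, 1].reshape(hidden, hidden)
non_interleaved = torch.cat([keys, values], dim=0)  # (2*hidden, hidden)
```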
[3.0.6]
Fixed
- Fixed a checkpoint decoder issue that prevented using `bleu` as `--optimized-metric` for distributed training (#995).
[3.0.5]
Fixed
- Fixed data download in multilingual tutorial.
3.0.4
[3.0.4]
- Make sure data permutation indices are in int64 format (doesn't seem to be the case by default on all platforms).
[3.0.3]
Fixed
- Fixed ensemble decoding for models without target factors.
[3.0.2]
Changed
- `sockeye-translate`: Beam search now computes and returns secondary target factor scores. Secondary target factors
  do not participate in beam search, but are greedily chosen at every time step (see the sketch after this list).
  Accumulated scores for secondary factors are not normalized by length. Factor scores are included in JSON output (`--output-type json`).
- `sockeye-score` now returns tab-separated scores for each target factor. Users can decide how to combine factor scores
  depending on the downstream application. Scores for the first, primary factor (i.e. output words) are normalized;
  scores for other factors are not.
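A sketch of the greedy rule for one secondary factor at a single time step (shapes and names are illustrative, not Sockeye internals):

```python
import torch

beam_size, factor_vocab = 5, 7
factor_logits = torch.randn(beam_size, factor_vocab)  # one secondary factor

# Secondary factors skip beam search: choose greedily per time step ...
factor_ids = factor_logits.argmax(dim=-1)

# ... and accumulate unnormalized log-probabilities as the factor's score.
log_probs = factor_logits.log_softmax(dim=-1)
step_scores = log_probs.gather(1, factor_ids.unsqueeze(1)).squeeze(1)
```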
[3.0.1]
Fixed
- Parameter averaging (`sockeye-average`) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.
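A minimal sketch of CPU-side averaging, assuming checkpoints that are plain state dicts (file names hypothetical): `map_location="cpu"` is what makes GPU-trained checkpoints loadable on CPU-only hosts.

```python
import torch

paths = ["params.00010", "params.00011"]  # hypothetical checkpoint files
states = [torch.load(p, map_location="cpu") for p in paths]  # force CPU tensors
average = {name: sum(s[name] for s in states) / len(states)
           for name in states[0]}
torch.save(average, "params.average")
```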
3.0.0
[3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch
Sockeye is now based on PyTorch.
We maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0.
If MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet, but MXNet is no longer strictly required.
Added
- Added model converter CLI `sockeye.mx_to_pt` that converts MXNet models to PyTorch models.
- Added `--apex-amp` training argument that runs the entire model in FP16 mode, replacing `--dtype float16` (requires Apex).
- Training automatically uses Apex fused optimizers if available (requires Apex).
- Added training argument `--label-smoothing-impl` to choose the label smoothing implementation (the default, `mxnet`, uses the same logic as MXNet Sockeye 2); a generic sketch follows this list.
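A generic label-smoothing sketch (not either of Sockeye's implementations): mix the one-hot target with a uniform distribution over the vocabulary.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits: torch.Tensor, labels: torch.Tensor,
                           alpha: float = 0.1) -> torch.Tensor:
    # (1 - alpha) * NLL of the target + alpha * uniform cross-entropy.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - alpha) * nll + alpha * uniform).mean()

loss = smoothed_cross_entropy(torch.randn(4, 10), torch.tensor([1, 2, 3, 4]))
```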
Changed
- CLI names point to the PyTorch code base (e.g. `sockeye-train` etc.).
- MXNet-based CLIs are now accessible via `sockeye-<name>-mx`.
- MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.
- `sockeye-train` now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: `torchrun --no_python --nproc_per_node N sockeye-train --dist ...`
- Updated the quickstart tutorial to cover multi-device training with PyTorch Sockeye.
- Changed `--device-ids` argument (plural) to `--device-id` (singular). For multi-GPU training, see the distributed mode noted above.
- Updated default value: `--pad-vocab-to-multiple-of 8`
- Removed `--horovod` argument used with `horovodrun` (use `--dist` with `torchrun`).
- Removed `--optimizer-params` argument (use `--optimizer-betas`, `--optimizer-eps`).
- Removed `--no-hybridization` argument (use `PYTORCH_JIT=0`, see Disable JIT for Debugging).
- Removed `--omp-num-threads` argument (use `--env=OMP_NUM_THREADS=N`).
Removed
- Removed support for constrained decoding (both positive and negative lexical constraints).
- Removed support for beam histories.
- Removed `--amp-scale-interval` argument.
- Removed `--kvstore` argument.
- Removed arguments: `--weight-init`, `--weight-init-scale`, `--weight-init-xavier-factor-type`, `--weight-init-xavier-rand-type`.
- Removed `--decode-and-evaluate-device-id` argument.
- Removed arguments: `--monitor-pattern`, `--monitor-stat-func`.
- Removed CUDA-specific requirements files in `requirements/`.