
Conflicts resolution #1

Closed

wants to merge 275 commits into from
Conversation

samirsalman
Owner

Description

Please add a clear and concise description of the changes.

This PR fixes a bug/adds a new feature/refactors the code/does something else.
It is related to issues: marian-nmt#998, marian-nmt#999, ...

List of changes:

  • ...
  • ...
  • ...

Added dependencies: none

How to test

Describe how to test your changes, adding command line examples and sample input/output files if relevant.
Point to unit tests or regression tests covering the changes if they have been added.

Describe how you have tested your code, including the OS and the CMake command used.

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

Hieu Hoang and others added 30 commits April 29, 2021 00:44
* Enable compute86 where supported
* Enable online packing/quantization
* Add half-precision min/max quantization for model weights
* Change the default quantization of the B matrix to min/max; revert an erroneous commit for AggregateAll
* Fix missing half-precision quantization
* Fix the quantization range for A
* Set all default values for the quantization range to 0.f
* Use a 7-bit clip for weight-matrix quantization to avoid overflowing VPMADDUBSW (see the sketch below)
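To make the last point concrete, here is a minimal, hypothetical sketch of min/max quantization with a 7-bit clip; the function name, layout, and scale handling are illustrative and not Marian's actual implementation. Clipping weights to ±63 (7 bits) keeps the two adjacent uint8×int8 products that VPMADDUBSW accumulates within a signed 16-bit range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: quantize weights to int8 using a min/max (absolute-max)
// range, clipping to 7 bits (|q| <= 63) so the adjacent uint8 x int8 products
// summed by VPMADDUBSW stay within a signed 16-bit accumulator.
std::vector<int8_t> quantizeMinMax7Bit(const std::vector<float>& weights, float& scaleOut) {
  float maxAbs = 0.f;
  for(float w : weights)
    maxAbs = std::max(maxAbs, std::fabs(w));

  const float clipMax = 63.f;  // 7-bit clip instead of the full int8 range of 127
  scaleOut = maxAbs > 0.f ? clipMax / maxAbs : 1.f;

  std::vector<int8_t> quantized(weights.size());
  for(std::size_t i = 0; i < weights.size(); ++i) {
    float q = std::round(weights[i] * scaleOut);
    q = std::max(-clipMax, std::min(clipMax, q));  // clamp into [-63, 63]
    quantized[i] = static_cast<int8_t>(q);
  }
  return quantized;
}
```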
…-stalled

This fixes a recently discovered bug by checking whether a validator exists before resetting its stalled validations.
The regression test for it is in marian-nmt/marian-regression-tests#80.
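A tiny illustrative sketch of the guard described above; the `Validator` type and `resetStalled()` are hypothetical names, not the actual Marian code.

```cpp
#include <memory>

// Illustrative stand-ins; the real validator type and reset logic live in Marian.
struct Validator { void resetStalled() { /* reset stalled-validation counters */ } };

void maybeResetStalled(const std::shared_ptr<Validator>& validator) {
  if(validator)                 // the fix: guard against a missing validator
    validator->resetStalled();  // previously called unconditionally, hence the bug
}
```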
dependabot bot and others added 27 commits April 14, 2023 21:03
Bumps [regression-tests](https://github.com/marian-nmt/marian-regression-tests) from `2a8bed3` to `89ce02e`.
- [Release notes](https://github.com/marian-nmt/marian-regression-tests/releases)
- [Commits](marian-nmt/marian-regression-tests@2a8bed3...89ce02e)

---
updated-dependencies:
- dependency-name: regression-tests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Without these quotes, cmake fails in a confusing manner
on systems whose cpuinfo output includes spaces.

This arose in the context of attempting to compile natively on an M1 Mac.

$ /usr/sbin/sysctl -n machdep.cpu.features machdep.cpu.leaf7_features
sysctl: unknown oid 'machdep.cpu.leaf7_features'

Obviously, this didn't work out well; there is still much more to do.
Still, the quotes are cheap and eliminate a confusing failure mode.
For this reason, I added them to the Linux path as well as the Darwin path.
Bumps [src/3rd_party/fbgemm](https://github.com/marian-nmt/FBGEMM) from `6f45243` to `0e33146`.
- [Commits](marian-nmt/FBGEMM@6f45243...0e33146)

---
updated-dependencies:
- dependency-name: src/3rd_party/fbgemm
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ed version during training

Adds an option to replace the current parameters with their smoothed version during training. This could potentially help with convergence and training stability.
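Parameter smoothing of this kind is usually maintained as an exponential moving average of the weights; the following is a minimal sketch under that assumption, with all names (`SmoothedParams`, `decay`) hypothetical rather than Marian's.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of exponentially smoothed parameters. At a chosen point
// during training, the live parameters can be overwritten with the smoothed
// copy, which is what the new option enables.
struct SmoothedParams {
  std::vector<float> smoothed;
  float decay = 0.999f;  // smoothing factor; an assumed value

  void update(const std::vector<float>& current) {
    if(smoothed.empty())
      smoothed = current;  // initialize on first use
    for(std::size_t i = 0; i < current.size(); ++i)
      smoothed[i] = decay * smoothed[i] + (1.f - decay) * current[i];
  }

  // Replace the live parameters with their smoothed version.
  void replaceWithSmoothed(std::vector<float>& current) const { current = smoothed; }
};
```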
…g with fp16 fails

This PR adds a do-while loop to training. It should only repeat if an fp16 training run was interrupted by a DivergenceException thrown from training/scheduler.h, and if --throw-on-divergence and --fp16-fallback-to-fp32 are enabled.

The repeated training run will continue from the last checkpoint (similar to a manually interrupted training) but attempt training in fp32. If that run, or any other fp32 training, happens to diverge, training will exit with an unhandled DivergenceException. This is intentional, to indicate a fatal error.
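A hedged sketch of that control flow; `DivergenceException` and the two options are taken from the description above, while `trainOnce()`, `Options`, and `trainWithFallback()` are illustrative stand-ins, not Marian's actual API.

```cpp
#include <stdexcept>

// Illustrative stand-ins; the real training loop, options, and exception live in Marian.
struct DivergenceException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

struct Options {
  bool throwOnDivergence = true;   // --throw-on-divergence
  bool fp16FallbackToFp32 = true;  // --fp16-fallback-to-fp32
  bool fp16 = true;                // training currently runs in fp16
};

void trainOnce(bool useFp16);  // hypothetical: resumes from the last checkpoint when repeated

void trainWithFallback(Options options) {
  bool retry = false;
  do {
    retry = false;
    try {
      trainOnce(options.fp16);
    } catch(const DivergenceException&) {
      // Repeat only when the diverged run was fp16 and both flags are enabled.
      if(options.fp16 && options.throwOnDivergence && options.fp16FallbackToFp32) {
        options.fp16 = false;  // continue from the last checkpoint, but in fp32
        retry = true;
      } else {
        throw;  // a diverged fp32 run exits with an unhandled exception on purpose
      }
    }
  } while(retry);
}
```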
A small simplification to create the correctly named tarball via `make marian_tgz`, resulting in e.g. `marian-2023-06-28-8390b1d.tgz`.

This will be executed every time `make marian_tgz` is invoked, but it depends on the correct targets and will pick up changed commit revisions, etc. Uses the PST time zone.
LSH vocab filtering for GPU.

Speed is not competitive with non-LSH decoding. Checking in for completeness and possible future use of LSH on the GPU for non-filtering purposes.

E.g. decoding 22k sentences with mini-batch 256 and maxi-batch 10, using a production SSRU model:
Without LSH: 53.86 sec. With LSH: 108.27 sec.
This PR adds:
* An implementation of BLEURT with conversion script
* Some code refactoring for COMET models
* A more cleanly separated "evaluate" and "embed" functionality for COMET/COMET-QE/BLEURT
* A number of MBR-related scripts.
Fixes and extends the unit test for layer norm. The previous version had a strange usage of Glorot uniform initialization.
Various small improvements: missing operators, missing gradient computations, etc. The two most useful ones are probably:
* A working backward step (gradient) for the scatter operation
* The possibility to use LayerNorm and RMSNorm without scale and bias vectors (especially in the new layer framework); see the sketch below
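As a point of reference, RMSNorm without scale and bias reduces to dividing each vector by its root mean square; a minimal illustrative sketch (not Marian's operator) follows.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative RMSNorm over a single vector, with the scale (gamma) and bias
// (beta) vectors omitted: the variant the commit above makes possible.
std::vector<float> rmsNormNoScaleBias(const std::vector<float>& x, float eps = 1e-6f) {
  float meanSq = 0.f;
  for(float v : x)
    meanSq += v * v;
  meanSq /= static_cast<float>(x.size());

  float invRms = 1.f / std::sqrt(meanSq + eps);
  std::vector<float> y(x.size());
  for(std::size_t i = 0; i < x.size(); ++i)
    y[i] = x[i] * invRms;  // no learned scale or bias applied
  return y;
}
```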
Undoes the accidental renaming of the scale parameter in the Norms layer, restoring its name to "weight".
Reusing these YAML configs helps speed up coreleaf loading. The only consumers of this quicksand API are the leaf, and I think the small memory tradeoff of keeping them in cache is worth the speedup.

Related work items: #146810
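A hedged sketch of the caching idea: parse each YAML config once and reuse the parsed object on later loads, trading a little memory for speed. The function and cache here are illustrative, not the quicksand API itself; the only assumption is the yaml-cpp library that Marian bundles.

```cpp
#include <map>
#include <string>
#include "yaml-cpp/yaml.h"  // assumes the yaml-cpp library bundled with Marian

// Illustrative config cache: parse each YAML file once and return the cached
// node on subsequent requests, trading a little memory for faster loading.
const YAML::Node& cachedConfig(const std::string& path) {
  static std::map<std::string, YAML::Node> cache;
  auto it = cache.find(path);
  if(it == cache.end())
    it = cache.emplace(path, YAML::LoadFile(path)).first;
  return it->second;
}
```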
…ation number) when requested.

This PR adds the option `--overwrite-checkpoints` (true by default, to mimic the current behavior), which can be set to `false` to force full checkpoint saving and preservation at saving intervals. E.g. for a model named `rus.enu.generalnn.replica_1.model.iter37769.npz`, Marian will then also save `rus.enu.generalnn.replica_1.model.iter37769.npz.optimizer.npz` and `rus.enu.generalnn.replica_1.model.iter37769.npz.progress.yml`.
…nmt#1000)

Bumps [src/3rd_party/sentencepiece](https://github.com/marian-nmt/sentencepiece) from `8dc9172` to `fb6f8e4`.
- [Commits](marian-nmt/sentencepiece@8dc9172...fb6f8e4)

---
updated-dependencies:
- dependency-name: src/3rd_party/sentencepiece
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR explicitly disables server compilation in the macOS build with clang. It seems an update to the macos-12 environment provided OpenSSL and Boost, which, when found by CMake, enable compilation of marian-server, which doesn't work with clang.
Set compatible versions of Python modules after Cython 3.0 release.
This PR adds `--custom-fallbacks` and generalizes the previous attempt at handling diverged trainings.

Now we can specify any number of fallback options that get used in subsequent diverged trainings. E.g. we can restart a training from the last checkpoint with fp16 training turned off, and if we still encounter a divergence, we can also lower the learning rate on the next attempt. This would be achieved by adding the following to a config file:

```
custom-fallbacks:
  - fp16: false
    precision: [float32, float32]
    cost-scaling: []
  - fp16: false
    precision: [float32, float32]
    cost-scaling: []
    learn-rate: 0.0001
```

On the command line we can specify json-style options like `--custom-fallbacks "{fp16: false, precision: [float32, float32], cost-scaling: []}" "{fp16: false, precision: [float32, float32], cost-scaling: [], learn-rate: 0.0001}"` where each string in `"..."` gets parsed to a Yaml list entry.

The previous option `--fp16-fallback-to-fp32` is now just an alias for the corresponding `--custom-fallbacks` values (the first entry above). Any number of fallbacks can be specified.
This PR fixes fine-tuning a model trained with an older version of Marian by:
- adding the removed option `num-devices` to the list of deprecated options
- checking whether `loss-{arg,var}-{slow,fast}` are present in the .progress.yml file
…rgence

Make sure that the averaged loss is actually well-defined and not inf or nan.
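A one-line version of that check, with `averagedLoss` as a hypothetical stand-in for the value being validated:

```cpp
#include <cmath>

// Illustrative: the averaged loss is only usable if it is finite,
// i.e. neither inf nor nan.
inline bool lossIsWellDefined(float averagedLoss) {
  return std::isfinite(averagedLoss);
}
```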
marian-nmt#1003)

* Add an option to skip SentencePiece encoding during training/decoding, allowing spmIDs to be passed directly
* Update changelog
* numbers -> pieces
samirsalman changed the base branch from master to dynamic_swap_mvp on August 24, 2023 at 08:32