Releases: alibaba/ChatLearn
v1.0.2
What's Changed
- Hotfix: update parameters that are reordered/concatenated in vLLM parameter sync by @SeaOfOcean in #120
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
- [doc] refine performance image size by @SeaOfOcean in #39
- Update issue templates by @SeaOfOcean in #40
- Add unit test and daily build by @SeaOfOcean in #41
- raise error if parameter sync breaks by @charles9304 in #42
- Refine the description of docs. by @adoda in #45
- Trigger UT when the pull request is approved and the approval count is 2 by @SeaOfOcean in #52
- fix none src model and skip load ckpt for vllm by @stcy07 in #50
- refine log output for trainer model by @stcy07 in #49
- make the *penalty options of sampling_params configurable. by @charles9304 in #54
- set env concurrent. by @adoda in #55
- Refine vllm inference and keep the API same as non-vllm by @SeaOfOcean in #46
- Speedup ut && format. by @adoda in #57
- Refine efficient memory sharing by @SeaOfOcean in #58
- fix vllm_module InferenceMemoryManager args error by @SeaOfOcean in #61
- [UT] rm duplicate ray stop by @SeaOfOcean in #62
- fix onload offload in save_checkpoint by @SeaOfOcean in #63
- fix exit with log_monitor error by @SeaOfOcean in #60
- disable onload/offload when not colocated. by @charles9304 in #65
- Parameter sync fallback to P2P when TP size is odd by @SeaOfOcean in #64
- fix cpu_per_process and gpu_per_process when num_gpu/num_cpu is 1 by @SeaOfOcean in #67
- Reverse DP replicas in parameter sync when tp size is odd by @SeaOfOcean in #68
- Upload Python Package when release is published by @SeaOfOcean in #69
- stop the container from the previous run when running UT by @SeaOfOcean in #73
- Support get tp/pp for torch_module/deepspeed_module and fix ut. by @adoda in #72
- Add DingTalk group to README. by @adoda in #74
- fix policy generation OOM when continuing training by @SeaOfOcean in #77
- Increase the num of episodes to allow the model to converge more fully by @adoda in #76
- set build time to 00:30 am UTC+8 by @SeaOfOcean in #75
- feat: add and use a multi-thread tokenize tool in VLLMPromptPipeline by @stcy07 in #56
- add load ckpt for value model and warnings by @stcy07 in #78
- Be compatible with grouped-query attention for QWen2. by @charles9304 in #79
- fix missing import in example by @SeaOfOcean in #80
- Upgrade version number by @SeaOfOcean in #81
- Revert "fix exit with log_monitor error (#60)" by @SeaOfOcean in #82
- fix dp_rank not in dp2send_actors when the inference replica num is less than the training replica num by @SeaOfOcean in #83
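One change above makes the *penalty options of sampling_params configurable. As background on what a repetition penalty does, here is a generic pure-Python sketch (illustrative only, not ChatLearn's or vLLM's implementation; the function name is hypothetical):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty: for tokens already generated,
    divide positive logits by the penalty and multiply negative logits
    by it, making repeats less likely in both cases."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5]
# Tokens 0 and 1 were already generated; token 2 is untouched.
penalized = apply_repetition_penalty(logits, [0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0
```

A penalty of 1.0 is a no-op; values above 1.0 discourage repetition.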
New Contributors
- @charles9304 made their first contribution in #42
- @adoda made their first contribution in #45
- @stcy07 made their first contribution in #50
Full Changelog: v1.0.0...v1.0.1
v1.0.0
What's Changed
- Support vLLM as generation engine.
- Support custom flow.
- Support efficient memory sharing.
- Support CPU module.
- Add Llama2 DPO/OnlineDPO/GRPO example based on Megatron-LM.
- Add QWen2 DPO example based on DeepSpeed.
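The DPO examples listed above optimize the standard Direct Preference Optimization objective. A minimal sketch of the per-pair loss (a generic illustration using only the published formula, not ChatLearn's code; all names are hypothetical):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))),
    where pi_* are policy log-probs and ref_* are reference log-probs."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does,
# so the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.5)
```

beta controls how strongly the policy is pulled away from the reference model; the implicit-reward margin replaces an explicit reward model.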
Full Changelog: v0.2.0...v1.0.0
v0.2.0
What's Changed
- Support Llama2 based on the official Megatron-LM repo
- Refactor tutorial docs and add tools to convert Megatron ckpt to HF
- Fix parameter sync when src_pipe != tgt_pipe and tgt_pipe != 1
- Reduce the number of ports required
- Refine resume training and doc
- Add node address to error message and exit with error code
- Show the log in each worker node and refine docs
- Add continue train docs and check applied device
- Join log thread with timeout and trigger when process exit
- Support custom model flow
- Feat: support optimizer offload
- Doc: add faq
Full Changelog: v0.1.0...v0.2.0
v0.1.0
First release of ChatLearn
- Enable RLHF with Megatron-LM.
- Support Llama/GPT/Bloom/Baichuan models with SFT / Reward / RLHF.
- Add docs.
- Support resume training.