PaddlePaddle/PaddleNLP v3.0.0-beta1
Repository: PaddlePaddle/PaddleNLP
Tag: v3.0.0-beta1
Published: 2024-08-22T03:41:34Z
Prerelease: yes
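Release notes: PaddleNLP has been upgraded from v3.0.0-beta0 to v3.0.0-beta1, bringing a number of important updates and enhancements. The Yuan, mamba, and jamba models have been added, and the LLM inference code has been optimized for better compatibility and efficiency.
On the core performance side, a fast tokenizer was added, broadcasting of MoE optimizer parameters was implemented, and layer normalization was accelerated. Several bugs were also fixed, including the safetensors shape slicing issue and the mmap issue on Windows, improving system stability and compatibility.
Documentation and tests were comprehensively updated and refined to keep the documentation accurate and the code readable. Support for domestically produced hardware was also strengthened, including DCU and XPU optimizations, along with configuration updates for PIR mode and auto-parallel.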
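Major Changes and New Features
1. New Models and Features
- New models: #8654 introduced the Yuan model; #8513 and #8517 added the mamba and jamba models, respectively, and follow-up pull requests fixed related bugs so that these models run stably.
- LLM inference optimization: across multiple pull requests, we optimized the LLM inference code and added support for new models and parameters, further improving inference efficiency and compatibility.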
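2. Core Performance Optimizations
- Fast tokenizer: #8832 added a fast tokenizer based on the `tokenizers` library, significantly improving tokenization speed.
- MoE optimization: #8810 implemented broadcasting of MoE (Mixture of Experts) optimizer parameters, improving training efficiency.
- Layer normalization acceleration: across multiple pull requests, we added fast_rmsnorm, enabled use_fast_layer_norm, and updated the benchmark configurations, further speeding up model training. In particular, #8717 added support for use_fast_layer_norm during fine-tuning, giving users more flexibility.
- Training performance: #8803 added the `enable_sp_async_reduce_scatter` option to improve training performance.
- Dictionary arguments: #8446 added support for dictionary-valued arguments to the trainer's argparser, making parameter passing more flexible. In addition, #8904 updated the tensorboard requirement to stay compatible with the latest version.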
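The normalization changes above (fast_rmsnorm, use_fast_layer_norm) accelerate layers whose math is standard. As a minimal, unoptimized reference sketch, assuming a single hidden vector stored as a plain Python list, RMSNorm and LayerNorm compute the following. The function names `rms_norm` and `layer_norm` are hypothetical and are not PaddleNLP APIs; fast implementations typically fuse these steps into one kernel, and this sketch only shows the math, not how PaddleNLP's kernels are written.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Illustrative RMSNorm (hypothetical helper): y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.

    Unlike LayerNorm, there is no mean subtraction and no bias term.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(
    x: List[float], weight: List[float], bias: List[float], eps: float = 1e-5
) -> List[float]:
    """Illustrative LayerNorm (hypothetical helper): y_i = (x_i - mean) / sqrt(var + eps) * weight_i + bias_i."""
    if not (len(x) == len(weight) == len(bias)):
        raise ValueError("x, weight and bias must have the same length")
    mean = sum(x) / len(x)
    # Population (biased) variance, as used by layer normalization.
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]
```

With unit weights, zero bias, and `eps=0`, the RMSNorm output has a mean square of 1, while the LayerNorm output additionally has zero mean. RMSNorm skips the mean computation and the bias, which is one reason it is cheaper.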
3. Bug Fixes
- safetensors: #8702 fixed a safetensors shape issue.
- mmap on Windows: #8734 fixed an mmap issue, improving compatibility on Windows.
- Other fixes: #8687, #8730, and several other pull requests fixed additional bugs.
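4. Documentation and Test Updates
- Documentation: across multiple pull requests, we updated the documentation, cleaned up code style, and refreshed version information, keeping the documentation accurate and readable.
- README fixes and enhancements: #8741 fixed broken links in the README; in addition, several contributors updated the README and added new test cases, keeping the documentation in sync with the code.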
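5. Other Important Changes
##### Enhanced Support for Domestic Hardware
- DCU support: #8580 implemented high-performance LLM training and inference on DCU, broadening PaddleNLP's hardware coverage.
- XPU optimization: #8527 added LoRA optimization for XPU; #8697 enabled allgather on XPU, and #8710 fixed the gather issue for unified checkpoints, further improving training efficiency on XPU.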
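##### PIR Mode Support
- Export and loading: #8689 changed how the llama model is exported in PIR mode; #8712 and #8766 added support for loading or saving the Llama2-7b model in three modes (old IR, PIR model file, PIR JSON file), giving users more flexibility and compatibility.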
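##### Auto-Parallel Optimization
- Configuration updates: #8679 changed `max_steps` in the Llama2-7b configuration for auto-parallel; #8767 and #8828 improved saving and loading in the auto trainer; #8750 updated the loss for global clipping, further improving the efficiency and accuracy of auto-parallel.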
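Assuming that "global clip" in #8750 refers to gradient clipping by global norm (the technique Paddle exposes as `paddle.nn.ClipGradByGlobalNorm`), the following plain-Python sketch shows what that clipping computes. The function name `clip_by_global_norm` is hypothetical, and this is a single-process illustration, not PaddleNLP's auto-parallel implementation.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], clip_norm: float) -> List[List[float]]:
    """Illustrative global-norm clipping (hypothetical helper, not a PaddleNLP API).

    grads holds one flattened gradient per parameter. Every gradient is
    multiplied by the SAME factor clip_norm / max(global_norm, clip_norm),
    so the relative direction of the full gradient is preserved.
    """
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    # Global norm: L2 norm over every element of every parameter's gradient.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Shrink only when the global norm exceeds clip_norm; otherwise scale is 1.0.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]
```

For example, gradients `[[3.0], [4.0]]` have a global norm of 5, so clipping to 1.0 scales them to `[[0.6], [0.8]]`. In a sharded setting, each rank holds only part of the gradients, so the squared sums must be reduced across ranks before taking the square root; clipping each shard by its local norm would give a different result.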
What's Changed
- [DCU] high performance LLM train and inference for DCU by @yuguo-Jack in https://github.com/PaddlePaddle/PaddleNLP/pull/8580
- fix benchmark dir and add CUDA_DEVICE_MAX_CONNECTIONS to qwen by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8678
- bug fix by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8687
- [XPU] add lora optimization by @dynamicheart in https://github.com/PaddlePaddle/PaddleNLP/pull/8527
- [pir save] Modify export llama model file in pir mode by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleNLP/pull/8689
- [AutoParallel]Change `max_steps` in Llama2-7b config for auto-parallel. by @heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8679
- [benchmark] Change the mirror source for pip by @mmglove in https://github.com/PaddlePaddle/PaddleNLP/pull/8699
- update loss base of auto-parallel tests by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8701
- Add new mistral by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/7425
- [Safetensors] Fix safetensors shape by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8702
- [BUG] Round num_samples down to prevent prefetching from exceeding the maximum dataset length... by @JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8690
- xpu use allgather by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8697
- add fast_rmsnorm by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8680
- enable use_fast_layer_norm for llama2 benchmark by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8714
- fix xpu gather for unified ckpt by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8710
- [inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8712
- fix fast_ln backward by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8719
- finetune support use_fast_layer_norm by @tianhaodongbd in https://github.com/PaddlePaddle/PaddleNLP/pull/8717
- bug fix by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8730
- disable lora by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8674
- [Safetensors] Fix mmap for Windows system by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8734
- correct broken links in readme by @jzhang533 in https://github.com/PaddlePaddle/PaddleNLP/pull/8741
- revert benchmark fix by @ronny1996 in https://github.com/PaddlePaddle/PaddleNLP/pull/8747
- [LLM] Add Yuan model by @zhaogf01 in https://github.com/PaddlePaddle/PaddleNLP/pull/8654
- fix nlp dir and auto_parallel_ci exit -6 by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8744
- [LLM] Update sequence parallel linear import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8706
- [Bug fixes] Fix ring attention by @zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/8740
- update a100 loss by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8708
- [PaddleNLP 3.0] Update README by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8681
- [AutoParallel] update loss for global clip by @JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/8750
- [NPU] Fix sequence parallel lib import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8760
- [DEV] Update develop version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8754
- [inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8766
- add benchmark baichuan2 scripts by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8683
- Add the missing truncation=True in llm/predictor.py by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8768
- fix the ce for the unittest by @wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8772
- Enable parallel_config to use commas as delimiters. by @Difers in https://github.com/PaddlePaddle/PaddleNLP/pull/8677
- fix incorrect token counting in `llm/predictor.py` by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8769
- Refine savable by @ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8758
- [CodeStyle] remove markdownlint-cli by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8779
- [XPU] use allgather and fp32 multinomial for XPU by @houj04 in https://github.com/PaddlePaddle/PaddleNLP/pull/8787
- fix version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8791
- [BUG] Add 20 redundant data…