
PaddlePaddle/PaddleNLP v3.0.0-beta1

v3.0.0-beta1

Repository: PaddlePaddle/PaddleNLP

Tag: v3.0.0-beta1

Published: 2024-08-22T03:41:34Z

Prerelease: yes

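Release notes: PaddleNLP has been upgraded from v3.0.0-beta0 to v3.0.0-beta1, bringing a number of important updates and enhancements. This release introduces the Yuan, mamba, and jamba models and optimizes the LLM inference code, improving compatibility and efficiency.

On core performance, this release adds a fast tokenizer, implements MoE optimizer parameter broadcasting, and accelerates layer normalization. It also fixes several bugs, including a safetensors shape slicing issue and an mmap issue on Windows, improving stability and compatibility.

Documentation and tests were updated and cleaned up throughout to keep the documentation accurate and the code readable. Support for domestic (Chinese-made) hardware was also strengthened, including DCU and XPU optimizations, along with configuration updates for PIR mode and auto parallel.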

Main changes and new features

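1. New models and features

  • New models: #8654 introduces the Yuan model; #8513 and #8517 add the new mamba and jamba models respectively, with related bugs fixed in follow-up pull requests to keep the models running stably.
  • LLM inference optimization: across several pull requests, we optimized the LLM inference code and added support for new models and parameters, further improving inference efficiency and compatibility.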

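2. Core performance optimizations

  • Fast tokenizer: #8832 adds a fast tokenizer based on the tokenizers library, significantly improving tokenization speed.
  • MoE optimization: #8810 implements broadcasting of MoE (Mixture of Experts) optimizer parameters, improving training efficiency.
  • Layer normalization acceleration: across several pull requests, we added fast_rmsnorm (#8680), enabled use_fast_layer_norm for the llama2 benchmark (#8714), and updated the benchmark configuration to further speed up training. In particular, #8717 supports use_fast_layer_norm during fine-tuning, giving users more flexibility.
  • Training performance: #8803 adds the enable_sp_async_reduce_scatter option to improve training performance.
  • Dict argument support: #8446 adds support for dict arguments to the trainer's argparser, making argument passing more flexible. In addition, #8904 updates the tensorboard requirement for compatibility with the latest version.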
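
For readers unfamiliar with what fast_rmsnorm speeds up, the following is a minimal pure-Python reference for the RMSNorm computation itself. It is an illustrative sketch only, not PaddleNLP's fused kernel; the function name rms_norm, its parameters, and the eps default are assumptions chosen for this example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm for one vector: x / sqrt(mean(x^2) + eps) * weight.

    Hypothetical helper for illustration; a fused kernel computes the same
    quantity in a single pass on the accelerator.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension (no mean subtraction,
    # which is what distinguishes RMSNorm from LayerNorm).
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    # Scale each element by the inverse RMS and the learned weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


hidden = [3.0, 4.0]
scale = [1.0, 1.0]
normalized = rms_norm(hidden, scale, eps=0.0)
```

With a unit weight and eps set to zero, the output has a root mean square of 1, which is a convenient sanity check when comparing a fast kernel against a reference.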

3. Bug fixes

  • safetensors fix: #8702 fixes a safetensors shape issue.
  • Windows mmap fix: #8734 fixes an mmap issue, improving compatibility on Windows.
  • Other bug fixes: fixes in several pull requests, including #8687 and #8730.

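4. Documentation and test updates

  • Documentation: across several pull requests, we updated documentation, cleaned up code style, and updated version information, keeping the documentation accurate and readable.
  • README fixes and enhancements: #8741 fixes broken links in the README; several contributors also updated the README and added new test cases, keeping the documentation in sync with the code.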

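5. Other important changes

##### Enhanced support for domestic hardware

  • DCU support: #8580 implements high-performance LLM training and inference on DCU, extending the range of hardware PaddleNLP supports.
  • XPU optimization: #8527 adds LoRA optimization for XPU; #8697 implements allgather on XPU and #8710 fixes a gather issue with unified checkpoints, further improving training efficiency on XPU.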

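##### PIR mode support

  • Export and load: #8689 changes how the llama model is exported in PIR mode; #8712 and #8766 support loading or saving the Llama2-7b model in three patterns (old IR, PIR model file, PIR JSON file), giving users more flexibility and compatibility.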

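##### Auto parallel optimization

  • Configuration updates: #8679 changes max_steps in the Llama2-7b config to suit auto parallel; #8767 and #8828 improve saving and loading in the auto trainer; #8750 updates the loss for global clip, further improving the efficiency and accuracy of auto parallel.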
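
Assuming "global clip" above refers to gradient clipping by global norm, the idea can be sketched in plain Python as follows. This is not the code from #8750; the function name clip_by_global_norm and the list-of-lists gradient representation are hypothetical and chosen only for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one shared factor so their joint L2 norm
    does not exceed max_norm. Returns (clipped_grads, global_norm).

    Hypothetical helper: each inner list stands in for one parameter's
    gradient tensor.
    """
    # The norm is taken over every element of every gradient together,
    # not per parameter; that is what makes the clipping "global".
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if global_norm <= max_norm or global_norm == 0.0:
        # Already within bounds: return copies unchanged.
        return [list(tensor) for tensor in grads], global_norm
    factor = max_norm / global_norm
    return [[g * factor for g in tensor] for tensor in grads], global_norm


grads = [[3.0], [4.0]]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```

Because the scaling factor depends on every gradient at once, a sharded setup generally has to aggregate the squared norms across all shards before clipping, which may be why the expected loss values change when this behavior is adjusted.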

What's Changed

  • [DCU] high performance LLM train and inference for DCU by @yuguo-Jack in https://github.com/PaddlePaddle/PaddleNLP/pull/8580
  • fix benchmark dir and add CUDA_DEVICE_MAX_CONNECTIONS to qwen by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8678
  • bug fix by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8687
  • [XPU] add lora optimization by @dynamicheart in https://github.com/PaddlePaddle/PaddleNLP/pull/8527
  • [pir save] Modiy export llama model file in pir mode by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleNLP/pull/8689
  • [AutoParallel]Change max_steps in Llama2-7b config for auto-parallel. by @heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8679
  • [benchmark] Change the mirror source for pip by @mmglove in https://github.com/PaddlePaddle/PaddleNLP/pull/8699
  • update loss base of auto-parallel tests by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8701
  • Add new mistral by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/7425
  • [Safetensors] Fix safetensors shape by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8702
  • [BUG] Round num_samples down to prevent prefetch from exceeding the maximum dataset length... by @JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8690
  • xpu use allgather by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8697
  • add fast_rmsnorm by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8680
  • enable use_fast_layer_norm for llama2 benchmark by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8714
  • fix xpu gather for unified ckpt by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8710
  • [inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8712
  • fix fast_ln backward by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8719
  • finetune support use_fast_layer_norm by @tianhaodongbd in https://github.com/PaddlePaddle/PaddleNLP/pull/8717
  • bug fix by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8730
  • disable lora by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8674
  • [Safetensors] Fix mmap for Windows system by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8734
  • correct broken links in readme by @jzhang533 in https://github.com/PaddlePaddle/PaddleNLP/pull/8741
  • revert benchmark fix by @ronny1996 in https://github.com/PaddlePaddle/PaddleNLP/pull/8747
  • [LLM] Add Yuan model by @zhaogf01 in https://github.com/PaddlePaddle/PaddleNLP/pull/8654
  • fix nlp dir and auto_parallel_ci exit -6 by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8744
  • [LLM] Update sequence parallel linear import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8706
  • [Bug fixes] Fix ring attention by @zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/8740
  • update a100 loss by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8708
  • [PaddleNLP 3.0] Update README by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8681
  • [AutoParallel] update loss for global clip by @JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/8750
  • [NPU] Fix sequence parallel lib import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8760
  • [DEV] Update develop version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8754
  • [inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8766
  • add benchmark baichuan2 scripts by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8683
  • Add the missing truncation=True in llm/predictor.py by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8768
  • fix the ce for the unittest by @wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8772
  • Enable parallel_config to use commas as delimiters. by @Difers in https://github.com/PaddlePaddle/PaddleNLP/pull/8677
  • fix incorrect token counting in llm/predictor.py by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8769
  • Refine savable by @ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8758
  • [CodeStyle] remove markdownlint-cli by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8779
  • [XPU] use allgather and fp32 multinomial for XPU by @houj04 in https://github.com/PaddlePaddle/PaddleNLP/pull/8787
  • fix version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8791
  • [BUG] Add 20 redundant data…

