What does this release signal mean?

Baidu (ERNIE) published PaddlePaddle/PaddleNLP v3.0.0-beta1. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: beta release of Baidu's NLP library · repository PaddlePaddle/PaddleNLP · tag v3.0.0-beta1 · published 2024-08-22T03:41:34Z · prerelease: yes. onlylabs links this event to 1 captured evidence page and 6 related release signals.

Baidu (ERNIE) Release: PaddlePaddle/PaddleNLP v3.0.0-beta1

Captured source

GitHub/github.com/PaddlePaddle/PaddleNLP

PaddlePaddle/PaddleNLP v3.0.0-beta1

published Aug 22, 2024 · seen 5d · captured 14h · http 200 · method plain

v3.0.0-beta1

Repository: PaddlePaddle/PaddleNLP

Tag: v3.0.0-beta1

Published: 2024-08-22T03:41:34Z

Prerelease: yes

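Release notes: PaddleNLP has been upgraded from v3.0.0-beta0 to v3.0.0-beta1, bringing a number of important updates and enhancements. This release introduces the Yuan, mamba, and jamba models and optimizes the LLM inference code, improving compatibility and efficiency.

For core performance, it adds a fast tokenizer, implements MoE optimizer parameter broadcasting, and accelerates layer normalization. It also fixes several bugs, including a safetensors shape-slicing issue and an mmap issue on Windows, improving stability and compatibility.

Documentation and tests were comprehensively updated to keep the documentation accurate and the code readable. Support for domestic (Chinese-made) hardware was also strengthened, including optimizations for DCU and XPU, along with configuration updates for PIR mode and auto-parallel.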

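Main changes and new features

1. New models and features

New models: #8654 introduces the Yuan model; #8513 and #8517 add the new mamba and jamba models, respectively, and follow-up pull requests fix related bugs to keep the models running stably.
LLM inference optimization: several pull requests optimize the LLM inference code and add support for new models and parameters, further improving inference efficiency and compatibility.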

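2. Core performance optimizations

Fast tokenizer: #8832 adds a fast tokenizer based on the tokenizers library, significantly improving tokenization speed.
MoE optimization: #8810 implements broadcasting of MoE (Mixture of Experts) optimizer parameters, improving training efficiency.
Layer normalization acceleration: several pull requests add fast_rmsnorm, enable use_fast_layer_norm, and update the benchmark configuration, further speeding up training. In particular, #8717 supports use_fast_layer_norm during fine-tuning, giving users more flexibility.
Training performance: #8803 adds the enable_sp_async_reduce_scatter option to optimize training performance.
Dictionary argument support: #8446 adds support for dictionary arguments to the trainer's argparser, making argument passing more flexible. In addition, #8904 updates the tensorboard requirement to stay compatible with the latest version.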
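For readers unfamiliar with the operation behind fast_rmsnorm, the sketch below shows the standard RMSNorm computation in plain Python. It is a reference formula only, not PaddleNLP's kernel; the function name `rms_norm` and its parameters are illustrative assumptions, and a fused implementation would compute the same result faster.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm for one hidden vector: x / sqrt(mean(x^2) + eps) * weight.

    Hypothetical helper for illustration; not part of PaddleNLP.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squared activations (no mean subtraction, unlike LayerNorm).
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    # Scale each element by the inverse RMS and the learned per-channel weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


hidden = [3.0, 4.0]
normalized = rms_norm(hidden, weight=[1.0, 1.0], eps=0.0)
```

With unit weights and `eps=0.0`, the output has a mean square of 1, which is the property the normalization is meant to enforce.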

3. Bug fixes

safetensors fix: #8702 fixes a shape issue in safetensors.
Windows mmap fix: #8734 fixes an mmap issue, improving Windows compatibility.
Other bug fixes: fixes in several pull requests, including #8687 and #8730.
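To give context for the safetensors and mmap fixes listed above, here is a minimal sketch of where shape metadata lives in a safetensors file and how a row can be sliced through a memory map. It does not reproduce either bug and is not PaddleNLP's or the safetensors library's code; the helpers `write_minimal_safetensors` and `read_row` are hypothetical and handle only one 2-D float32 tensor.

```python
import json
import mmap
import os
import struct
import tempfile


def write_minimal_safetensors(path, name, rows, cols, values):
    """Write one float32 tensor using the safetensors layout:
    8-byte little-endian header length, JSON header, then raw tensor bytes."""
    data = struct.pack(f"<{rows * cols}f", *values)
    header = {name: {"dtype": "F32", "shape": [rows, cols], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


def read_row(path, name, row):
    """Read a single row of a 2-D float32 tensor via mmap, using the header's shape."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            (header_len,) = struct.unpack("<Q", mm[:8])
            info = json.loads(mm[8:8 + header_len].decode("utf-8"))[name]
            rows, cols = info["shape"]
            if not 0 <= row < rows:
                raise IndexError("row out of range")
            # data_offsets are relative to the first byte after the header.
            start = 8 + header_len + info["data_offsets"][0] + row * cols * 4
            return list(struct.unpack(f"<{cols}f", mm[start:start + cols * 4]))
        finally:
            # Close the mapping explicitly; on Windows a file that is still
            # mapped cannot be deleted or replaced.
            mm.close()


path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_minimal_safetensors(path, "w", 2, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
second_row = read_row(path, "w", 1)
os.remove(path)
```

The slice offsets are derived entirely from the header's `shape` and `data_offsets`, so a wrong shape produces wrong slices, and the explicit `mm.close()` is what allows the file to be removed afterwards on Windows.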

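4. Documentation and test updates

Documentation improvements: several pull requests update documentation, clean up code style, and update version information, keeping the documentation accurate and readable.
README fixes and enhancements: #8741 fixes broken links in the README; several contributors also updated the README and added new test cases, keeping documentation and code in sync.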

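5. Other important changes

##### Domestic hardware support

DCU support: #8580 implements high-performance LLM training and inference on DCU, extending PaddleNLP's hardware coverage.
XPU optimization: #8527 adds a LoRA optimization for XPU; #8697 implements allgather on XPU and #8710 fixes a gather issue with unified checkpoints, further improving training efficiency on XPU.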

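##### PIR mode support

Export and loading: #8689 modifies how the llama model is exported in PIR mode; #8712 and #8766 add support for loading or saving the Llama2-7b model in three modes (old IR, PIR model file, PIR JSON file), giving users more flexibility and compatibility.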

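##### Auto-parallel optimization

Configuration updates: #8679 changes max_steps in the Llama2-7b config to suit auto-parallel; #8767 and #8828 improve saving and loading in the auto trainer; #8750 updates the loss for global clip, further improving the efficiency and accuracy of auto-parallel.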
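The release notes do not define "global clip". Assuming it refers to gradient clipping by global norm (an assumption, not confirmed by the source), the sketch below shows the computation in plain Python. The function `clip_by_global_norm` is a hypothetical illustration and not PaddleNLP's implementation.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Scale all gradients by one shared factor so their combined L2 norm is at most max_norm.

    Hypothetical helper; assumes "global clip" means clipping by global norm.
    """
    # The norm is taken over every element of every gradient, not per tensor.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= max_norm or global_norm == 0.0:
        return [list(grad) for grad in grads]
    scale = max_norm / global_norm
    return [[g * scale for g in grad] for grad in grads]


grads = [[3.0], [4.0]]  # two parameters, combined norm 5.0
clipped = clip_by_global_norm(grads, max_norm=1.0)
```

Because the norm spans all parameters, a sharded setup has to combine partial norms across devices before scaling, which is one plausible reason a change here would shift the expected loss values in auto-parallel tests.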

What's Changed

[DCU] high performance LLM train and inference for DCU by @yuguo-Jack in https://github.com/PaddlePaddle/PaddleNLP/pull/8580
fix benchmark dir and add CUDA_DEVICE_MAX_CONNECTIONS to qwen by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8678
bug fix by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8687
[XPU] add lora optimization by @dynamicheart in https://github.com/PaddlePaddle/PaddleNLP/pull/8527
[pir save] Modiy export llama model file in pir mode by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleNLP/pull/8689
[AutoParallel]Change max_steps in Llama2-7b config for auto-parallel. by @heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8679
[benchmark] Change the mirror source for pip by @mmglove in https://github.com/PaddlePaddle/PaddleNLP/pull/8699
update loss base of auto-parallel tests by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8701
Add new mistral by @wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/7425
[Safetensors] Fix safetensors shape by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8702
[BUG] Round num_samples down to prevent prefetch from exceeding the dataset's maximum length... by @JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8690
xpu use allgather by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8697
add fast_rmsnorm by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8680
enable use_fast_layer_norm for llama2 benchmark by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8714
fix xpu gather for unified ckpt by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8710
[inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8712
fix fast_ln backward by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8719
finetune support use_fast_layer_norm by @tianhaodongbd in https://github.com/PaddlePaddle/PaddleNLP/pull/8717
bug fix by @FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8730
disable lora by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8674
[Safetensors] Fix mmap for Windows system by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8734
correct broken links in readme by @jzhang533 in https://github.com/PaddlePaddle/PaddleNLP/pull/8741
revert benchmark fix by @ronny1996 in https://github.com/PaddlePaddle/PaddleNLP/pull/8747
[LLM] Add Yuan model by @zhaogf01 in https://github.com/PaddlePaddle/PaddleNLP/pull/8654
fix nlp dir and auto_parallel_ci exit -6 by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8744
[LLM] Update sequence parallel linear import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8706
[Bug fixes] Fix ring attention by @zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/8740
update a100 loss by @zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8708
[PaddleNLP 3.0] Update README by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8681
[AutoParallel] update loss for global clip by @JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/8750
[NPU] Fix sequence parallel lib import by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8760
[DEV] Update develop version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8754
[inference] support load or save Llama2-7b in three patterns by @lizexu123 in https://github.com/PaddlePaddle/PaddleNLP/pull/8766
add benchmark baichuan2 scripts by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8683
Add the missing truncation=True in llm/predictor.py by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8768
fix the ce for the unittest by @wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8772
Enable parallel_config to use commas as delimiters. by @Difers in https://github.com/PaddlePaddle/PaddleNLP/pull/8677
fix incorrect token counting in llm/predictor.py by @lszxb in https://github.com/PaddlePaddle/PaddleNLP/pull/8769
Refine savable by @ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8758
[CodeStyle] remove markdownlint-cli by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8779
[XPU] use allgather and fp32 multinomial for XPU by @houj04 in https://github.com/PaddlePaddle/PaddleNLP/pull/8787
fix version show by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8791
[BUG] Add 20 redundant data…

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

Beta release of Baidu's NLP library