Release: Baidu (ERNIE), published Dec 16, 2024

PaddlePaddle/PaddleNLP v3.0.0-beta3

PaddlePaddle/PaddleNLP


v3.0.0-beta3

Repository: PaddlePaddle/PaddleNLP

Tag: v3.0.0-beta3

Published: 2024-12-16T09:35:00Z

Prerelease: no

Release notes: This update improves the core PaddleNLP experience. It adds the Llama-3.2 and DeepSeekV2 models, upgrades TokenizerFast, and refactors SFTTrainer.

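PaddleNLP now also supports offloading and reloading optimizer states. It adds fine-grained recomputation, which improves training performance by 7% on Llama models. For Unified Checkpoint, the asynchronous save logic has been further optimized, and a new checkpoint compression feature saves up to 78.5% of storage space. Finally, we have made extensive optimizations to large-model inference, auto-parallelism, multi-hardware support, and the documentation.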

Key Updates and Enhancements

1. New Models

  • Added the Llama-3.2 (#9199) and DeepSeekV2 (#9250) models, further expanding the choice of large models.

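2. Infrastructure Improvements

  • Refactored SFTTrainer and SFTConfig to improve code maintainability. (#9318)
  • Added support for offloading and reloading optimizer states (#9467), reducing memory usage.
  • Implemented fine-grained recomputation via hooks. On Llama models, for example, this improves training performance by 7%. (#9396)
  • Unified Checkpoint optimizations:
    • Updated the asynchronous save logic (#9173, #9274, #9321), significantly improving checkpoint save and load efficiency.
    • Added support for expert parallelism (#9055), making model training more flexible.
    • Unified Checkpoint can now be used with sharding_comm_overlap enabled. (#9392)
    • Added checkpoint compression, which saves up to 78.5% of storage space. (#9183)
    • Reduced checkpoint loading time through multithreading. (#9034)
  • Tokenizer enhancements:
    • padding_side can now be specified when calling a tokenizer (#9258), improving usability.
    • The Qwen tokenizer now supports adding special tokens (#9344), making it more flexible.
    • Fixed the missing clean_up_tokenization_spaces in TokenizerFast (#9304), improving text-processing accuracy.
    • Moved the tokenizers' _pad function into the base class. (#9280)
    • Added support for BertTokenizerFast and allowed tokenizers to be registered at call time. (#9353)
    • Improved the handling of special inputs in the chat templates of the Qwen, Gemma, and Yuan models. (#9462)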
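
For context on the padding_side change: the parameter controls whether pad tokens go before ("left") or after ("right") the real tokens in a batch. Left padding is commonly used for batched generation with decoder-only models, because it makes every prompt end at the same position. The following is a plain-Python sketch of that behavior only. It is not PaddleNLP's implementation, and the helper name `pad_batch` and the pad id `0` are hypothetical.

```python
from typing import List, Tuple


def pad_batch(
    batch: List[List[int]], pad_id: int = 0, padding_side: str = "right"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to a common length.

    Hypothetical helper illustrating padding_side semantics; not the PaddleNLP API.
    Returns (input_ids, attention_mask), where the mask is 1 for real tokens.
    """
    if padding_side not in ("left", "right"):
        raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")
    max_len = max((len(seq) for seq in batch), default=0)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        pads, ones, zeros = [pad_id] * n_pad, [1] * len(seq), [0] * n_pad
        if padding_side == "left":
            # Pad tokens go first, so every sequence ends at the same position.
            input_ids.append(pads + seq)
            attention_mask.append(zeros + ones)
        else:
            # Pad tokens go last, so every sequence starts at position 0.
            input_ids.append(seq + pads)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


# The same batch padded both ways, choosing the side per call.
batch = [[101, 7, 8], [101, 9]]
print(pad_batch(batch, padding_side="right"))
# → ([[101, 7, 8], [101, 9, 0]], [[1, 1, 1], [1, 1, 0]])
print(pad_batch(batch, padding_side="left"))
# → ([[101, 7, 8], [0, 101, 9]], [[1, 1, 1], [0, 1, 1]])
```

The attention mask is returned alongside the ids because downstream attention must ignore pad positions regardless of which side they are on.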

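3. Inference Performance Improvements

  • LLM inference now supports quantized models loaded directly from BOS (Baidu Object Storage). (#9197)
  • Strengthened support for FP8 quantization in LLM inference (e.g., #9328, #9423) to meet diverse precision requirements.
  • Enhanced support for speculative decoding and Append Attention. (#9180, #9244)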
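
For readers new to speculative decoding: a small draft model proposes several tokens, and the large target model verifies them, keeping the longest prefix it agrees with. The following is a minimal sketch of the greedy variant in plain Python. It assumes toy deterministic "models", each written as a function from a token prefix to the next token. It is not PaddleNLP's implementation, and the names `speculative_decode`, `draft_next`, `target_next`, and `k` are hypothetical. Under greedy verification, the output is identical to decoding with the target model alone. The speedup comes from verifying several positions at once.

```python
from typing import Callable, List

# A toy "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    prompt: List[int],
    draft_next: NextToken,
    target_next: NextToken,
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding sketch (hypothetical helper, toy models)."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, target_len - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed position. A real system scores
        #    all positions in one batched forward pass; here we loop for clarity.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target_next(tokens + proposal[:i]) != tok:
                break
            accepted += 1
        tokens += proposal[:accepted]
        # 3. Append the target's own token at the first mismatch (or after a fully
        #    accepted proposal), so each round makes at least one token of progress.
        if len(tokens) < target_len:
            tokens.append(target_next(tokens))
    return tokens


# Toy models: the target counts up mod 10; the draft guesses wrong after token 5.
def target_next(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def draft_next(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10


print(speculative_decode([1], draft_next, target_next, max_new_tokens=8, k=3))
# → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this run, the rejected draft token after 5 is replaced by the target's own choice, 6. That substitution is why the result matches plain greedy decoding.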

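4. Expanded Hardware Compatibility

  • Strengthened support for Intel HPU (#9273), which now supports dynamic-graph inference.
  • Added Unified Checkpoint support for XPU and other domestic Chinese hardware. (#9312)
  • Fixed bugs in XPU and DCU support and improved performance. (#9414, #9433)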

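5. Auto-Parallel Optimizations

  • Fixed multiple issues in auto-parallel training (e.g., #9217, #9355), improving the stability of parallel training.
  • Updated the auto-parallel configuration and the checkpoint converter (e.g., #9136, #9432), improving training flexibility and stability.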

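6. Documentation and Test Updates

  • Updated several documents, including the LLM model docs (e.g., #9314) and the quantization docs (e.g., #9330), to keep them current and accurate.
  • Added new test cases, such as distributed data-loading tests (#9438), to increase test coverage.
  • Fixed broken links and formatting issues in the documentation (e.g., #9127, #9515).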

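This release continues PaddleNLP's progress toward a more comprehensive, efficient, and stable NLP solution. We look forward to bringing users more innovation and value in future releases.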

What's Changed

  • [Unified Checkpoint] update async_save_info in develop by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9173
  • add flashmask rm by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/9154
  • [LLM_INFER] Support quantized model from bos and fix docs by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9197
  • fix ci not set no_proxy and modify tests in pir mode by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9205
  • [Models] Add Llama-3.2 by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9199
  • move some auto_parallel args into class AutoTrainingArguments by @Wennie396 in https://github.com/PaddlePaddle/PaddleNLP/pull/9155
  • [Performance] Compatible with flashmask API rename upgrade by @GuoxiaWang in https://github.com/PaddlePaddle/PaddleNLP/pull/9019
  • [AutoParallel] add vpp align and pp amp test by @AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/9176
  • fix auto ci return bug when run in v100 by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9216
  • fix auto ci return bug when run in v100 by @AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/9228
  • [LLM] Add tools for parameters by @Hanyonggong in https://github.com/PaddlePaddle/PaddleNLP/pull/9137
  • [AutoParallel] Add test for fuse_ffn and fuse_attention_qkv pass by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9203
  • [CI] Fix ci import. by @ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9239
  • [Version] Update version info by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9241
  • [Auto Parallel] Adding align mode support by @zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/9150
  • [LLM INFER] top_p_sampling_reject support top_p=0 and custom seed by @gzy19990617 in https://github.com/PaddlePaddle/PaddleNLP/pull/9202
  • [INFER] update tune_cublaslt_gemm op and fix some bugs by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9222
  • Reduce the time spent on git downloading third-party libraries by @vivienfanghuagood in https://github.com/PaddlePaddle/PaddleNLP/pull/9246
  • [PIR] fix pir open bugs by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9248
  • Cherry-pick some PRs from incubate/paddlenlp-fleety by @sneaxiy in https://github.com/PaddlePaddle/PaddleNLP/pull/9245
  • [Unified Checkpoint] Support expert parallel by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9055
  • [PIR] fix pir dt2st for chatglm_v2 by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9251
  • Cherry-pick some PRs from incubate/paddlenlp-fleety by @LiYuRio in https://github.com/PaddlePaddle/PaddleNLP/pull/9253
  • [Unified Checkpoint] Fix generation config save by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9223
  • [AutoParallel] Fix tests for pass paddle AutoParallel CI by @liym27 in https://github.com/PaddlePaddle/PaddleNLP/pull/9267
  • change dataset by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/9266
  • [Unified Checkpoint] update async save logic by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9274
  • add config file for model chatglm2,gemma,yuan by @Mangodadada in https://github.com/PaddlePaddle/PaddleNLP/pull/9139
  • Fix async hang by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9276
  • [AutoParallel] Change llama test from sharding stage2 to stage1 by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9281
  • [Tokenizer] Enable padding_side as call time kwargs by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9258
  • [Trainer] fix save_model by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9286
  • [CI] Skip inference test cases by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9270
  • [LLM] Add deepseekv2 by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9250
  • [Tokenizer] Unify tokenizer _pad by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9280
  • [CI] Fix llm/alignment/rm/flashmask path by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9289
  • support attention mask using causal=True by @GuoxiaWang in https://github.com/PaddlePaddle/PaddleNLP/pull/9268
  • [FlashMask] Add FlashMask for Qwen2 by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9264
  • bug fix…
