PaddlePaddle/PaddleNLP v3.0.0-beta2
PaddlePaddle/PaddleNLP
Captured source
source ↗published Oct 8, 2024seen 5dcaptured 11hhttp 200method plain
v3.0.0-beta2
Repository: PaddlePaddle/PaddleNLP
Tag: v3.0.0-beta2
Published: 2024-10-08T08:52:31Z
Prerelease: yes
Release notes: 本次更新强化了PaddleNLP的基础设施,新增了Qwen2.5、Mixtral 8*22B模型并升级了Tokenizer功能,同时重命名了数据索引工具。
此外,还修复了MoE模型参数保存与加载等问题,提升了文本处理准确性,并更新了文档与测试用例。在推理性能、硬件支持及自动并行方面也进行了优化,包括支持更多模型与参数配置、多GPU推理、国产硬件支持增强以及分布式训练流程优化等。
核心变更与增强功能
1. 基础设施强化:
- 新增Qwen2.5模型(#9157 ),Mixtral 8*22B。进一步丰富模型库。
- Tokenizer功能升级,现支持加载额外解码标记added_tokens_decoder(#8997 ),提升灵活性。
- 数据索引工具
tool_helpers重命名为fast_dataindex(#9134 ),以更直观反映其功能特性。 - 实现训练过程中数据间隔跳过的功能(#8989 ),优化数据处理效率。
- Unified Checkpoint优化:
- 更新优化器异步保存信号(#8975 ),保证保存稳定。
- 修复统一检查点中的多项问题(#9082 ),确保功能正确性。
3. 问题修复:
- 解决了MoE模型参数保存与加载的问题(#9045 )。
- 修正Tokenizer中空格与特殊符号处理的不足(#9010 , #9144 ),提升文本处理准确性。
4. 文档与测试更新:
- 更新多个文档,涵盖LLM模型文档(如#8990 , #8999 )及量化文档(#9057 )等,确保信息的时效性与准确性。
- 新增测试用例,如针对PIR模式序列并行的测试(#9015 ),强化测试覆盖度。
- 修复文档中的链接错误(如#9127 ),提升用户体验。
5. 其他关键变更:
- 推理性能优化:
- LLM推理代码得到优化,支持更多模型与参数配置(如#8986 , #8995 ),拓宽应用场景。
- 实现Qwen2_Moe多GPU推理(#9121 )及wint4量化(#9129 ),提升推理效率。
- 加强LLM推理对FP8与INT8的支持(如#9032 , #9151 ),满足多样化精度需求。
- 硬件支持拓展:
- 增强对DCU、XPU、MLU等国产硬件的支持(如#8983 , #8504 , #9075 ),促进国产化替代。
- 优化上述硬件上的模型训练与推理性能,提升整体运算效率。
- 自动并行优化:
- 修复训练过程中数据重复跳过的问题(#8980 ),确保数据处理的正确性。
- 更新自动并行配置与检查点转换器(如#8847 , #9136 ),提升并行训练的灵活性与稳定性。
- 新增损失NaN/Inf检查器(#8943 ),及时发现并处理潜在数值问题。
- 优化分布式训练中的数据加载与梯度合并流程(如#9120 , #9179 ),提升训练速度与稳定性。
What's Changed
- [Unified checkpoint] update optimizer async save signal by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8975
- 更正run_dpo.py文件路径 by @Mangodadada in https://github.com/PaddlePaddle/PaddleNLP/pull/8952
- fix the loss base in llama_align_dygraph_dy2st_auto_bs2_bf16_DP2-MP1-… by @winter-wang in https://github.com/PaddlePaddle/PaddleNLP/pull/8986
- [Bug fix] fix skip consumed_samples twice bug by @zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/8980
- fix pip error in legacy benchmarks by @fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8978
- 【auto_parallel】Add checkpoint convertor by @xingmingyyj in https://github.com/PaddlePaddle/PaddleNLP/pull/8847
- [llm]update finetune.md by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8990
- tool_helpers升级后可以支持32766个数据集. by @JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8994
- add DCU inference docs by @YanhuiDua in https://github.com/PaddlePaddle/PaddleNLP/pull/8983
- [Distributed]Add loss nan/inf checker by @ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/8943
- 【llm】update docs by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8999
- [Feature] Fused Mixtral support by @penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/8901
- [XPU] Add README.md for llama2-7b by @xiguapipi in https://github.com/PaddlePaddle/PaddleNLP/pull/8979
- Add gcu llama readme by @EnflameGCU in https://github.com/PaddlePaddle/PaddleNLP/pull/8950
- fix qwen model use_casual_mask by @deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/9009
- [ZeroPadding] revert zero_padding #8973 by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9003
- [LLM Inference] Fix step.cu bug by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/8995
- Refine checkpoint converter by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9001
- [Feature] fused mixtral wint4 by @penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/9013
- llm inference docs by @Sunny-bot1 in https://github.com/PaddlePaddle/PaddleNLP/pull/8976
- [LLM Inference] Support Qwen2_Moe Inference Model by @CJ77Qi in https://github.com/PaddlePaddle/PaddleNLP/pull/8892
- fix llama3 static run by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/8849
- [paddle inference cpu]update cpu inference by @bukejiyu in https://github.com/PaddlePaddle/PaddleNLP/pull/8984
- fix the tipc ce case by @wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8748
- [Cherry-pick] Add is_distributed field in sharding reshard param_meta by @sneaxiy in https://github.com/PaddlePaddle/PaddleNLP/pull/9028
- [Tokenizer] Support for loading added_tokens_decoder by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8997
- [Inference] Add a8w8(fp8) a8w8c8(int8) quant_type support by @lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9032
- Fix checker of nan/inf by @ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/9029
- [Cherry-pick] add comm buffer size (#8963) by @ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/9031
- [Unified Checkpoint] Update async save info by @DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8982
- [llm]support pad to max_length & fix sp bug by @lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/9040
- [Bugfix] fix bias optional by @penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/9037
- fix setup.py for llm inference by @yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9041
- [Inference] Add cutlass gemm dequant op by @gzy19990617 in https://github.com/PaddlePaddle/PaddleNLP/pull/8909
- [Inference] update fakequant support by @lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9047
- add test for pir sequence parallel on llama model by @liym27 in https://github.com/PaddlePaddle/PaddleNLP/pull/9015
- Fix moe save load by @Meiyim in https://github.com/PaddlePaddle/PaddleNLP/pull/9045
- Update quantization.md by @ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9057
- 【Fix】Initialize dp degree in single GPU by @greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/9056
- fix bos download by @westfish in https://github.com/PaddlePaddle/PaddleNLP/pull/9023
- [Inference] Update fakequant script by @lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9054
- [AutoParallel][PIR] Fit pir grad merge by @AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/8985
- [MLU] Support rms_norm_mlu by @PeiyuLau in https://github.com/PaddlePaddle/PaddleNLP/pull/8504
- [Inference] support llama3 a8w8c8_fp8 inference and cutlass_fp8_gemm by @ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/8953
- [Inference] Qwen2 support fp8 inference by @ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/8954
- [Version] update version info by @DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9060
- [NPU] Fix baichuan2-13b-chat infer by @ronny1996 in https://github.com/PaddlePaddle/PaddleNLP/pull/9070
- [MLU] Fix Llama attrntion_mask in npu and mlu by @DrownFish19 in…
Excerpt shown — open the source for the full document.
Notability
notability 6.0/10Major NLP library release, beta of v3.