Release: Baidu (ERNIE), published Oct 11, 2025

PaddlePaddle/FastDeploy v2.2.1

Repository: PaddlePaddle/FastDeploy

Tag: v2.2.1

Published: 2025-10-11T07:01:10Z

Prerelease: no

Release notes:

New Features

  • Online weight updates now support enabling Prefix Caching
  • Added deployment support for the GLM 4.5 Air model

What's Changed

  • [docs] update best practice docs for release/2.2 by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/3970
  • [Docs] release 2.2.0 by @ming1753 in https://github.com/PaddlePaddle/FastDeploy/pull/3991
  • [docs] update readme by @yangjianfengo1 in https://github.com/PaddlePaddle/FastDeploy/pull/3996
  • [Optimize]Error messages about Model api. by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3972
  • [Cherry-Pick] get org_vocab_size from args by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3984
  • 【FIX】Change the name of sparse attn from moba to plas by @yangjianfengo1 in https://github.com/PaddlePaddle/FastDeploy/pull/4006
  • Fix down projection weight shape in fused MOE layer by @yuanlehome in https://github.com/PaddlePaddle/FastDeploy/pull/4041
  • [Fix] fix multi api server log dir by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3966
  • Fixed the issue of metrics file conflicts between multiple instances … by @zhuangzhuang12 in https://github.com/PaddlePaddle/FastDeploy/pull/4010
  • [Feature] Support mixed deployment with yiyan adapter in release22 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3974
  • [CI] update paddlepaddle==3.2.0 in release/2.2 by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3997
  • [setup optimize]Support git submodule (#4033) by @YuanRisheng in https://github.com/PaddlePaddle/FastDeploy/pull/4080
  • [CP]Glm45 air 2.2 by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4073
  • [feat] support prefix cache clearing when /clear_load_weight is called by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4091
  • [BugFix]fix tp/ep group gid by @gzy19990617 in https://github.com/PaddlePaddle/FastDeploy/pull/4038
  • Support limit thinking lengths. by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/4070
  • Add assertion for ENABLE_V1_KVCACHE_SCHEDULER by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/4146
  • [fix] fix ep group all-reduce by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4140
  • [Cherry-pick] fix MTP load with v1 loader by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/4153
  • [CP2.2] Machete support group scale & wint8 & v1 loader by @Sunny-bot1 in https://github.com/PaddlePaddle/FastDeploy/pull/4166
  • [Feature] support rdma IB transfer by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4123
  • [BugFix]2.2 glm all reduce tp group by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4188
  • [Executor] Adjust signal sending order in RL training (#3773) (#4066) by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/4178
  • [fix] initialize available_gpu_block_num with max_gpu_block_num by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4193
  • [fix]Modify follow-up push parameters and Modify the verification method for thinking length by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4177
  • Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4115
  • [Feature]CP support data clear by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4214
  • [fix] fix clearing caches synchronization and add more logs by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4212
  • fix ernie vl distributed attr. by @ZHUI in https://github.com/PaddlePaddle/FastDeploy/pull/4217
  • [2.2]include_stop_str_in_output=False not return eos text by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4231
  • [fix]update apply_chat_template by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4249
  • [fix]remove reasoning_max_tokens=max_toksns*0.8 in sampling_params by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4294
  • 【fix】Remove the logic that assigns the default value of 80% to reasoning_max_tokens in the offline component of FastDeploy by @kxz2002 in https://github.com/PaddlePaddle/FastDeploy/pull/4304
  • [feature]2.2 custom_allreduce support cudagraph recapture by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4307
  • [BUGFIX] clear request by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4320
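
Two of the entries above (#4294 and #4304) remove a default that set reasoning_max_tokens to 80% of max_tokens. The sketch below only illustrates what such a change means for callers. Both function names are hypothetical, and the int() rounding and the None return value are assumptions, not FastDeploy's actual code.

```python
from typing import Optional


def resolve_reasoning_max_tokens_old(
    max_tokens: int, reasoning_max_tokens: Optional[int] = None
) -> int:
    """Hypothetical pre-change behaviour: fall back to 80% of max_tokens."""
    if reasoning_max_tokens is None:
        # Assumed rounding: truncate toward zero.
        return int(max_tokens * 0.8)
    return reasoning_max_tokens


def resolve_reasoning_max_tokens_new(
    max_tokens: int, reasoning_max_tokens: Optional[int] = None
) -> Optional[int]:
    """Hypothetical post-change behaviour: no implicit default is applied."""
    # max_tokens is kept only so both helpers share the same signature.
    return reasoning_max_tokens
```

Under these assumptions, a caller that relied on the implicit 80% cap would need to pass a thinking-length limit explicitly after upgrading.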

Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/v2.2.0...v2.2.1

Notability

notability 4.0/10

Patch release (v2.2.0 to v2.2.1) of FastDeploy, Baidu's model deployment tool.