What does this release signal mean?

Baidu (ERNIE) published PaddlePaddle/FastDeploy v2.2.1 on 2025-10-11T07:01:10Z. This release signal is evidence of what shipped, changed, or was packaged for users. It is a minor, non-prerelease version of Baidu's deployment tool. The release notes list two new features: online weight updates can now be used with Prefix Caching enabled, and GLM 4.5 Air model deployment is supported. onlylabs links this event to 1 captured evidence page and 6 related release signals.

Baidu (ERNIE) Release: PaddlePaddle/FastDeploy v2.2.1

Captured source

GitHub/github.com/PaddlePaddle/FastDeploy

PaddlePaddle/FastDeploy v2.2.1

published Oct 11, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

v2.2.1

Repository: PaddlePaddle/FastDeploy

Tag: v2.2.1

Published: 2025-10-11T07:01:10Z

Prerelease: no

Release notes:

New features

Online weight updates now support enabling Prefix Caching
Added deployment support for the GLM 4.5 Air model

What's Changed

[docs] update best practice docs for release/2.2 by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/3970
[Docs] release 2.2.0 by @ming1753 in https://github.com/PaddlePaddle/FastDeploy/pull/3991
[docs] update readme by @yangjianfengo1 in https://github.com/PaddlePaddle/FastDeploy/pull/3996
[Optimize]Error messages about Model api. by @AuferGachet in https://github.com/PaddlePaddle/FastDeploy/pull/3972
[Cherry-Pick] get org_vocab_size from args by @zeroRains in https://github.com/PaddlePaddle/FastDeploy/pull/3984
【FIX】Change the name of sparse attn from moba to plas by @yangjianfengo1 in https://github.com/PaddlePaddle/FastDeploy/pull/4006
Fix down projection weight shape in fused MOE layer by @yuanlehome in https://github.com/PaddlePaddle/FastDeploy/pull/4041
[Fix] fix multi api server log dir by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/3966
Fixed the issue of metrics file conflicts between multiple instances … by @zhuangzhuang12 in https://github.com/PaddlePaddle/FastDeploy/pull/4010
[Feature] Support mixed deployment with yiyan adapter in release22 by @rainyfly in https://github.com/PaddlePaddle/FastDeploy/pull/3974
[CI] update paddlepaddle==3.2.0 in release/2.2 by @EmmonsCurse in https://github.com/PaddlePaddle/FastDeploy/pull/3997
[setup optimize]Support git submodule (#4033) by @YuanRisheng in https://github.com/PaddlePaddle/FastDeploy/pull/4080
[CP]Glm45 air 2.2 by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4073
[feat] support prefix cache clearing when /clear_load_weight is called by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4091
[BugFix]fix tp/ep group gid by @gzy19990617 in https://github.com/PaddlePaddle/FastDeploy/pull/4038
Support limit thinking lengths. by @K11OntheBoat in https://github.com/PaddlePaddle/FastDeploy/pull/4070
Add assertion for ENABLE_V1_KVCACHE_SCHEDULER by @Jiang-Jia-Jun in https://github.com/PaddlePaddle/FastDeploy/pull/4146
[fix] fix ep group all-reduce by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4140
[Cherry-pick] fix MTP load with v1 loader by @zoooo0820 in https://github.com/PaddlePaddle/FastDeploy/pull/4153
[CP2.2] Machete support group scale & wint8 & v1 loader by @Sunny-bot1 in https://github.com/PaddlePaddle/FastDeploy/pull/4166
[Feature] support rdma IB transfer by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4123
[BugFix]2.2 glm all reduce tp group by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4188
[Executor] Adjust signal sending order in RL training (#3773) (#4066) by @gongshaotian in https://github.com/PaddlePaddle/FastDeploy/pull/4178
[fix] initialize available_gpu_block_num with max_gpu_block_num by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4193
[fix]Modify follow-up push parameters and Modify the verification method for thinking length by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4177
Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4115
[Feature]CP support data clear by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4214
[fix] fix clearing caches synchronization and add more logs by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4212
fix ernie vl distributed attr. by @ZHUI in https://github.com/PaddlePaddle/FastDeploy/pull/4217
[2.2]include_stop_str_in_output=False not return eos text by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4231
[fix]update apply_chat_template by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4249
[fix]remove reasoning_max_tokens=max_toksns*0.8 in sampling_params by @luukunn in https://github.com/PaddlePaddle/FastDeploy/pull/4294
【fix】Remove the logic that assigns the default value of 80% to reasoning_max_tokens in the offline component of FastDeploy by @kxz2002 in https://github.com/PaddlePaddle/FastDeploy/pull/4304
[feature]2.2 custom_allreduce support cudagraph recapture by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4307
[BUGFIX] clear request by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4320

Full Changelog: https://github.com/PaddlePaddle/FastDeploy/compare/v2.2.0...v2.2.1
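
Most "What's Changed" entries above follow the shape `<title> by @<author> in <pull request URL>`, so they can be turned into structured records for filtering or counting. The following is a minimal sketch under that assumption. The names `ENTRY`, `parse_entry`, and `SAMPLE` are hypothetical helpers introduced here for illustration and are not part of FastDeploy or of the release page. The sample lines are copied from the list above.

```python
import re
from collections import Counter

# Assumed entry shape: "<title> by @<author> in <FastDeploy PR URL>".
# Lines that do not match (headings, truncated entries) yield None.
ENTRY = re.compile(
    r"^(?P<title>.+) by @(?P<author>[\w-]+) in "
    r"https://github\.com/PaddlePaddle/FastDeploy/pull/(?P<pr>\d+)$"
)


def parse_entry(line):
    """Return a dict with title, author, and PR number, or None if no match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return {
        "title": match["title"],
        "author": match["author"],
        "pr": int(match["pr"]),
    }


# A few entries copied verbatim from the changelog above.
SAMPLE = [
    "[Feature] support rdma IB transfer by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4123",
    "[CP]Glm45 air 2.2 by @ckl117 in https://github.com/PaddlePaddle/FastDeploy/pull/4073",
    "[feat] support prefix cache clearing when /clear_load_weight is called by @liyonghua0910 in https://github.com/PaddlePaddle/FastDeploy/pull/4091",
    "[Feature]CP support data clear by @ltd0924 in https://github.com/PaddlePaddle/FastDeploy/pull/4214",
]

entries = [e for e in map(parse_entry, SAMPLE) if e is not None]
by_author = Counter(e["author"] for e in entries)

for entry in entries:
    print(entry["pr"], entry["author"], entry["title"])
```

Returning `None` for non-matching lines lets the same function run over the whole captured page without special-casing headings or the "Full Changelog" line.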

Notability

notability 4.0/10

Minor release of Baidu's deployment tool.