microsoft/mscclpp v0.6.0

MSCCL++ v0.6.0

Repository: microsoft/mscclpp

Tag: v0.6.0

Published: 2024-12-23T19:01:13Z

Prerelease: no

Release notes:

Highlights

  • Improved NCCL API integration in MSCCL++ for better performance and usability
  • Enhanced execution plan-based executor in MSCCL++
  • Fixed several bugs to improve stability and reliability

What's Changed

  • Add support for different vector sizes in multimem instructions by @roshandathathri in https://github.com/microsoft/mscclpp/pull/332
  • NCCL API Executor Integration by @caiomcbr in https://github.com/microsoft/mscclpp/pull/331
  • Fix missing import in executor test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/334
  • bfloat16 support by @chhwang in https://github.com/microsoft/mscclpp/pull/336
  • Dynamically load libibverbs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/337
  • Auto-tune vector sizes for NVLS allreduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/338
  • Make ibverbs optional at compile time by @chhwang in https://github.com/microsoft/mscclpp/pull/340
  • ProxyChannel Support in Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/342
  • Support executors to send packets over ProxyChannel by @caiomcbr in https://github.com/microsoft/mscclpp/pull/344
  • Fix for ROCm 6.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/347
  • Fix bug for construct semaphore by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/341
  • Add proxy channel related operations by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/351
  • Add CI for rocm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/346
  • Tune threads per block for mscclpp executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/345
  • Fix NPKit exit event offset by @yzygitzh in https://github.com/microsoft/mscclpp/pull/356
  • Use IB transport flags only when an IB device exists by @chhwang in https://github.com/microsoft/mscclpp/pull/355
  • Update ROCm CI by @chhwang in https://github.com/microsoft/mscclpp/pull/357
  • Fixing RegisterMemory Allocation for ProxyChannels by @caiomcbr in https://github.com/microsoft/mscclpp/pull/353
  • Fix NCCL API bugs by @chhwang in https://github.com/microsoft/mscclpp/pull/363
  • Perf optimization & support clipping by @chhwang in https://github.com/microsoft/mscclpp/pull/364
  • Fix copyright messages by @chhwang in https://github.com/microsoft/mscclpp/pull/367
  • [Doc] mscclpp docs by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/348
  • Executor AllGather In-Place Support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/365
  • Fix algo repo name by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/369
  • Update docker image for cuda12.4 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/370
  • Fix in-place all-gather input buffer in executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/372
  • [docs] fix quickstart link by @jeffra in https://github.com/microsoft/mscclpp/pull/374
  • Add kernel-based verification for executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/378
  • Lazily create the context stream by @chhwang in https://github.com/microsoft/mscclpp/pull/381
  • Fixing Bug Const Offset in Execution Plan by @caiomcbr in https://github.com/microsoft/mscclpp/pull/380
  • Fix light load bug by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/379
  • Small Adjust in Test Data AllGather at Executor Test by @caiomcbr in https://github.com/microsoft/mscclpp/pull/384
  • Fix missing packet parameter for executor by @yzygitzh in https://github.com/microsoft/mscclpp/pull/385
  • NVLS support for msccl++ executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/375
  • Fix typo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/389
  • Improve CMake options by @chhwang in https://github.com/microsoft/mscclpp/pull/376
  • Fixing Message Boundary AllReduce Fallback Code by @caiomcbr in https://github.com/microsoft/mscclpp/pull/391
  • Fix mscclpp_benchmark by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/392
  • Add cross threadblock barrier by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/383
  • AllGather Executor Support in NCCL Interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/393
  • Providing reduce-scatter test support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/390
  • Select algo according to json config by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/396
  • Add connection events for NPKit by @yzygitzh in https://github.com/microsoft/mscclpp/pull/386
  • Revised ProxyChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/400
  • Setup pipeline for mscclpp over nccl by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/401
  • Exception Max Number Operation per Tb by @caiomcbr in https://github.com/microsoft/mscclpp/pull/405
  • Reduce memory usage for scratch buffer by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/403
  • [Cherry-pick] Move pipeline to official org (#406) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/416
  • [Cherry-pick] trigger ci for release branches (#426) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/427
  • [Cherry-pick] Disable CuMemMap check for ROCm (#411) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/424
  • [Cherry-pick] NVLS support for NCCL API (#410) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/425
  • [Cherry-pick] Fix nccl-test failure issue (#421) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/429
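
For readers unfamiliar with the data type behind the "bfloat16 support" entry above, here is a minimal host-side sketch of how a 32-bit float maps to bfloat16 and back. It only illustrates the format (the upper 16 bits of an IEEE 754 single, rounded to nearest even). The helper names `floatToBf16` and `bf16ToFloat` are hypothetical and are not taken from the MSCCL++ sources, which may well rely on the GPU vendor's own bfloat16 types instead.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not part of the MSCCL++ API.

// Convert a float to bfloat16 bits using round-to-nearest, ties-to-even.
inline std::uint16_t floatToBf16(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // reinterpret without UB
  if (std::isnan(value)) {
    // Set a mantissa bit so truncation cannot turn NaN into infinity.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
  const std::uint32_t roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + roundingBias) >> 16);
}

// Widen bfloat16 bits back to float by placing them in the upper half.
inline float bf16ToFloat(std::uint16_t half) {
  const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Because bfloat16 keeps the full 8-bit exponent of a float and only shortens the mantissa to 7 bits, values such as 1.0f and -2.0f round-trip exactly, while values that need more mantissa bits are rounded.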

New Contributors

  • @jeffra made their first contribution in https://github.com/microsoft/mscclpp/pull/374

Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.2...v0.6.0
