Release, Microsoft, published Jan 3, 2024

microsoft/superbenchmark v0.10.0

Release SuperBench v0.10.0

Repository: microsoft/superbenchmark

Tag: v0.10.0

Published: 2024-01-03T00:10:48Z

Prerelease: no

Release notes:

SuperBench 0.10.0 Release Notes

SuperBench Improvements

  • Support monitoring for AMD GPUs.
  • Support ROCm 5.7 and ROCm 6.0 Dockerfiles.
  • Add MSCCL support for NVIDIA GPUs.
  • Fix NUMA domains swap issue in NDv4 topology file.
  • Add NDv5 topology file.
  • Pin NCCL and NCCL-test to 2.18.3 to avoid a hang issue in CUDA 12.2.

Micro-benchmark Improvements

  • Add HPL random generator to gemm-flops with ROCm.
  • Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
  • Add HWDecoderFPS benchmark to measure the FPS of hardware decoder performance.
  • Update Docker image for H100 support.
  • Update MLC version to 3.10 in the CUDA/ROCm Dockerfiles.
  • Bug fix for GPU Burn test.
  • Support INT8 in cuBLASLt function.
  • Add hipBLASLt function benchmark.
  • Support cpu-gpu and gpu-cpu in ib-validation.
  • Support graph mode in NCCL/RCCL benchmarks for latency metrics.
  • Support C++ implementation in distributed inference benchmark.
  • Add O2 option for GPU-copy ROCm build.
  • Support different hipBLASLt data types in distributed inference benchmark.
  • Support in-place in NCCL/RCCL benchmark.
  • Support data type option in NCCL/RCCL benchmark.
  • Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
  • Update hipBLASLt GEMM metric unit to TFLOPS.
  • Support FP8 for hipBLASLt benchmark.
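
As an illustration of the TFLOPS unit mentioned for the hipBLASLt GEMM metric, the sketch below converts a measured GEMM time into TFLOPS using the common convention of 2*m*n*k floating-point operations per matrix multiply. The function name `gemm_tflops` is hypothetical, and whether SuperBench counts operations exactly this way is an assumption, not something these notes state.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return throughput in TFLOPS for one m x n x k GEMM that took `seconds`.

    Assumes the usual 2*m*n*k operation count (one multiply and one add
    per inner-product term); this is a convention, not SuperBench's code.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k
    return flops / seconds / 1e12


# Example: an 8192 x 8192 x 8192 GEMM that finishes in 10 ms.
print(f"{gemm_tflops(8192, 8192, 8192, 0.010):.2f} TFLOPS")  # prints 109.95 TFLOPS
```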

Model Benchmark Improvements

  • Change torch.distributed.launch to torchrun.
  • Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark.
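
The move from torch.distributed.launch to torchrun mainly changes how a training script learns its local rank: torchrun exports it in the `LOCAL_RANK` environment variable, while the older launcher passed a `--local_rank` (or `--local-rank`) argument by default. The sketch below shows one way a script could accept both; `resolve_local_rank` is a hypothetical helper and is not taken from the SuperBench source.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None) -> int:
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # The legacy launcher passed the rank as a flag; accept both spellings.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # Ignore any other arguments the script may define elsewhere.
    args, _ = parser.parse_known_args(argv)
    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

Launched as, for example, `torchrun --nproc_per_node=8 train.py` (where `train.py` is a placeholder script name), each worker would read its rank from the environment rather than from the command line.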

Result Analysis

  • Support baseline generation from multiple nodes.
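
To give a rough idea of what generating a baseline from several nodes can involve, the sketch below takes per-node metric results and uses the per-metric median as the baseline value. The aggregation rule, the function name `generate_baseline`, and the metric names are all assumptions made for illustration; these notes do not describe how SuperBench actually computes its baseline.

```python
from statistics import median


def generate_baseline(node_results):
    """Compute a per-metric baseline as the median across nodes.

    `node_results` maps a node name to a dict of metric name -> value.
    The median is an assumed aggregation rule, chosen here because it is
    less sensitive to a single underperforming node than the mean.
    """
    values_by_metric = {}
    for metrics in node_results.values():
        for name, value in metrics.items():
            values_by_metric.setdefault(name, []).append(value)
    return {name: median(values) for name, values in values_by_metric.items()}


# Hypothetical results from three nodes; node2 has a slow GEMM.
results = {
    "node0": {"gemm_tflops": 108.0, "allreduce_busbw": 230.0},
    "node1": {"gemm_tflops": 110.0, "allreduce_busbw": 228.0},
    "node2": {"gemm_tflops": 95.0, "allreduce_busbw": 231.0},
}
print(generate_baseline(results))
# prints {'gemm_tflops': 108.0, 'allreduce_busbw': 230.0}
```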

Notability

notability 3.0/10

Routine version release of benchmarking tool.