What does this release signal mean?

NVIDIA published v0.4.0 of NVIDIA/numba-cuda-mlir, a Numba CUDA backend that uses MLIR for compilation. This release signal is evidence of what shipped, changed, or was packaged for users. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/numba-cuda-mlir v0.4.0

Captured source

GitHub: github.com/NVIDIA/numba-cuda-mlir

Published Jun 15, 2026 · seen 1w · captured 1w · HTTP 200 · method plain

v0.4.0

Repository: NVIDIA/numba-cuda-mlir

Tag: v0.4.0

Published: 2026-06-15T16:24:32Z

Prerelease: no

Release notes: This first update focuses on platform support, debugging, ecosystem enablement, performance, and broader CUDA Python compatibility.

Highlights

Added support for Windows and integrated Windows tests into CI.
Added CUDA-gdb CI workflow and debugging support validation.
Added experimental third-party ecosystem coverage for nvmath-python, RAPIDS/cuDF, and numbast extension backends.
Improved warm compile-time performance by redesigning how the extension registry is refreshed. This delivers an additional ~40% speedup over the previous implementation on our benchmark suite and reaches a ~1.8x geomean warm compile-time speedup vs. numba-cuda.
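
To make the "geomean speedup" figure concrete, here is a minimal sketch of how a geometric-mean speedup is usually computed from per-benchmark timings. The timings are invented for illustration and are not taken from the project's benchmark suite; the variable names are likewise hypothetical.

```python
from statistics import geometric_mean

# Hypothetical warm compile times in seconds, one entry per benchmark.
# These numbers are illustrative only and do not come from the release notes.
baseline_s = [0.90, 1.20, 0.45, 2.00]   # reference compiler
candidate_s = [0.50, 0.60, 0.30, 1.00]  # compiler under test

# Per-benchmark speedup is baseline time divided by candidate time.
speedups = [b / c for b, c in zip(baseline_s, candidate_s)]

# The geometric mean is the n-th root of the product of the ratios, so a
# single outlier benchmark cannot dominate the summary the way it can
# with an arithmetic mean.
geomean_speedup = geometric_mean(speedups)

print(f"per-benchmark speedups: {[round(s, 2) for s in speedups]}")
print(f"geomean speedup: {geomean_speedup:.2f}x")
```

A geomean is the conventional summary for ratios because it gives the same answer whichever side is treated as the baseline: inverting every ratio inverts the geomean.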

Platform and Tooling Support

Introduced Windows CI coverage and related build fixes, including static CRT usage.
Added CUDA-gdb workflow coverage to validate debugging behavior.
Improved compatibility with newer libc++ versions.
Removed the implicit nvjitlink dependency derived from cudatoolkit.

Performance and Compilation

Replaced implicit context refresh with explicit initialization and version-tracked registries, reducing warm compile overhead.
Optimized CUDA Array Interface launch caching.
Avoided finalizing internal device callees during compilation.
Made disabling of LTOIR linker optimization user-controlled instead of unconditional.
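
The first item in this list (explicit initialization plus version-tracked registries) describes a general caching pattern. The sketch below shows that pattern in plain Python under stated assumptions: the classes `ExtensionRegistry` and `CompilerContext` and all their methods are hypothetical and are not the project's API. It only illustrates why a version check can make the warm path cheap.

```python
class ExtensionRegistry:
    """Hypothetical registry that bumps a version counter on every change."""

    def __init__(self):
        self._entries = {}
        self.version = 0

    def register(self, name, impl):
        self._entries[name] = impl
        self.version += 1  # any mutation marks dependent caches as stale

    def snapshot(self):
        return dict(self._entries)


class CompilerContext:
    """Hypothetical context that rebuilds its lookup table only when stale."""

    def __init__(self, registry):
        self._registry = registry
        self._seen_version = -1
        self._table = {}
        self.rebuilds = 0

    def initialize(self):
        # Explicit initialization: the caller decides when the first build runs.
        self._refresh_if_stale()

    def lookup(self, name):
        self._refresh_if_stale()
        return self._table[name]

    def _refresh_if_stale(self):
        # One integer comparison on the hot path instead of an unconditional rebuild.
        if self._seen_version != self._registry.version:
            self._table = self._registry.snapshot()
            self._seen_version = self._registry.version
            self.rebuilds += 1


registry = ExtensionRegistry()
registry.register("add", lambda a, b: a + b)

ctx = CompilerContext(registry)
ctx.initialize()
for _ in range(1000):  # warm path: the registry is unchanged
    ctx.lookup("add")
rebuilds_warm = ctx.rebuilds

registry.register("mul", lambda a, b: a * b)  # the registry changes once
product = ctx.lookup("mul")(3, 4)
rebuilds_after_change = ctx.rebuilds

print(rebuilds_warm, rebuilds_after_change, product)
```

The design choice is that repeated lookups cost a single comparison, and the expensive rebuild happens only when the registry has actually changed.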

CUDA Python Compatibility Improvements

Added full array.view() support, including dtype bitwidth changes.
Added support for vector types in local and shared memory.
Added CUDA vector / scalar operations and vector-to-complex conversions.
Added support for custom dtypes.
Added complex constructor support, including complex32.
Added support for complex CPointer getitem/setitem lowering.
Added support for NamedTuple usage in kernels.
Improved support for array slicing and shared-memory views.
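
As a host-side analogy for the array.view() item above, the sketch below reinterprets one buffer at different element widths using only the Python standard library. It does not use numba-cuda-mlir or run on a GPU, and the release notes do not spell out the project's exact view semantics. The assumption here is that they follow the familiar NumPy rule, where the element count scales with the ratio of item sizes and no data is copied.

```python
from array import array

# Four 32-bit floats: 16 bytes of storage.
buf = array("f", [1.0, 2.0, 3.0, 4.0])
as_f32 = memoryview(buf)

# Reinterpret the same memory as bytes, then as 16-bit unsigned integers.
# The total byte count is fixed, so the element count changes with the width.
as_u8 = as_f32.cast("B")
as_u16 = as_u8.cast("H")

print(len(as_f32), len(as_u8), len(as_u16))

# A view aliases the original storage: zeroing the first four bytes
# through the byte view changes the first float in the original buffer.
as_u8[0:4] = bytes(4)
first_value = buf[0]
print(first_value)
```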

Lowering and Type System Fixes

Unified vector type handling by replacing VectorTypeStub with VectorType / VectorTypeClass.
Introduced a value/storage data model to fix float16 and bool memory representation issues.
Fixed lowering for defaults, tuples, dtype tokens, heterogeneous tuple assignment, optional values, and string constant folding.
Fixed array.real / array.imag on shared-memory arrays so that the address space is preserved.
Fixed VectorType-to-complex setitem behavior.
Fixed to_numba_type handling for NumPy dtypes.
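
The value/storage distinction mentioned in this list is a common compiler pattern: a type can have one form as a computed value and another when written to memory. A boolean, for example, is a single truth bit as a value in LLVM-style IR but occupies a whole byte in memory. The sketch below illustrates the idea on the host with the standard struct module. The helper names are hypothetical, and this is not the project's data model, only an illustration of why the two forms need explicit conversion.

```python
import struct

# Value form vs. storage form, illustrated on the host.
# All helper names here are hypothetical.

def bool_to_storage(value: bool) -> bytes:
    # As a value a bool is one truth bit; in memory it takes one full byte.
    return struct.pack("<?", value)

def bool_from_storage(raw: bytes) -> bool:
    return struct.unpack("<?", raw)[0]

def half_to_storage(value: float) -> bytes:
    # IEEE 754 binary16 occupies exactly two bytes in memory.
    return struct.pack("<e", value)

def half_from_storage(raw: bytes) -> float:
    return struct.unpack("<e", raw)[0]

stored_flag = bool_to_storage(True)
stored_half = half_to_storage(1.5)

print(len(stored_flag), bool_from_storage(stored_flag))
print(len(stored_half), half_from_storage(stored_half))
```

Keeping the conversion in one pair of functions per type means loads and stores cannot disagree about the in-memory layout.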

Ecosystem and Extension Support

Enabled extension linkage in MLIR lowerings.
Added Extension API documentation.
Added Numbast MLIR source CI tests.
Added experimental cuDF / RAPIDS third-party test coverage, including use of pylibcudf from the active conda environment.
Prevented unintended invocation of the Numba-CUDA JIT and addressed resulting issues.

Documentation and Maintenance

Updated reference documentation.
Added PR documentation preview infrastructure.
Fixed PyPI-hosted README links.
Removed outdated conda install documentation.
Removed legacy @intrinsic implementations.
Removed dead NRT C++ code.
Removed cudasim support.
Removed unnecessary packaging dependency from numba_cuda._compat.

Bug Fixes

Fixed ICE for raise-only kernels.
Fixed shared-memory view behavior with None starts.
Fixed array slicing issues.
Fixed multiple lowering edge cases involving tuples, optionals, constants, and complex/vector interactions.
Fixed cuDF CI and Numbast CI issues.

Notability

notability 4.0/10

Minor release of a niche compiler tool.