NVIDIA/MatX v1.0.0
Repository: NVIDIA/MatX
Tag: v1.0.0
Published: 2026-03-04T19:48:29Z
Prerelease: no
Release notes:
v1.0.0
Release 1.0.0 marks a major update for MatX. 1.0.0 is the first version to require C++20 support for both the CUDA and host compilers. As a result, CUDA versions lower than 12.2.1 are not supported.
Among the major release highlights are:
- JIT Support
CUDA JIT support via a new CUDAJitExecutor. When used, this executor performs a second compilation pass and caches the resulting kernel for future use. JIT allows MatX to convert many runtime parameters into compile-time parameters, reducing the computation needed in the kernel. It also optionally enables kernel fusion through the NVIDIA MathDx libraries. When enabled, MatX can potentially fuse FFT and GEMM operations into other arithmetic expressions if certain criteria are met. Only FFT and BLAS fusion are supported now, but support for other MathDx libraries will be added in the future. For more information, see the docs.
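As a rough illustration of why turning a runtime parameter into a compile-time one can reduce work, the sketch below contrasts two plain host C++ functions. It is not MatX code and does not use CUDAJitExecutor; the function names and the choice of a stride parameter are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these names are MatX APIs.

// Runtime version: the stride is an ordinary argument, so the compiler
// has to treat it as an unknown value when generating the loop.
inline float sum_strided_runtime(const std::vector<float>& data,
                                 std::size_t stride) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < data.size(); i += stride) {
    acc += data[i];
  }
  return acc;
}

// Specialized version: the stride is a template parameter, so it is a
// compile-time constant inside the function body. A compiler may use
// that knowledge to simplify index arithmetic or unroll the loop.
template <std::size_t Stride>
float sum_strided_static(const std::vector<float>& data) {
  static_assert(Stride > 0, "stride must be positive");
  float acc = 0.0f;
  for (std::size_t i = 0; i < data.size(); i += Stride) {
    acc += data[i];
  }
  return acc;
}
```

Both functions compute the same result for the same stride. A JIT step can, in principle, generate the specialized form once the runtime value is known, which is the general idea behind the parameter conversion described above.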
- Logging
Logging to stdout or to a file is now fully supported. Logging is useful for seeing which code path MatX takes and for dumping verbose information about each function. Note that logging requires the header, which is not available in all C++20 compilers.
- Documentation
Added a tag showing the version of MatX in which each operator was introduced.
- Compile-time Properties
Specify compile-time properties on an operator for finer-grained control of its operation. For example, change the accumulation type of an operator.
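To show why the accumulation type matters, here is a minimal standalone C++ sketch in which the accumulator type is chosen at compile time, separately from the element type. The helper sum_as is hypothetical and is not the MatX property interface; it only demonstrates the numerical effect that such a property could control.

```cpp
#include <vector>

// Hypothetical illustration only; sum_as is not a MatX function.
// Acc is the accumulator type, selected at compile time and independent
// of the element type T of the input.
template <typename Acc, typename T>
Acc sum_as(const std::vector<T>& values) {
  Acc acc = Acc{0};
  for (const T& v : values) {
    // Each element is converted to the accumulator type before adding.
    acc += static_cast<Acc>(v);
  }
  return acc;
}
```

For the input {1.0e8f, 1.0f, -1.0e8f}, sum_as<float> returns 0 because adding 1.0f to 1.0e8f is lost to single-precision rounding, while sum_as<double> returns 1. Selecting a wider accumulator at compile time avoids that loss without changing the type of the stored data.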
Full Changelog
- CUDA JIT Support by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1071
- Add unsafe aliased memory checking system by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1079
- Add comprehensive logging system and exception disabling support by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1080
- Use new python dlpack interface, fixing warnings by @simonbyrne in https://github.com/NVIDIA/MatX/pull/1082
- Replaced all uses of SFINAE with concepts for better error messages by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1081
- Added JIT capabilities into all operators except transform operators. by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1085
- Clang/nvc++ fixes by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1083
- Add -lineinfo/--extended-lambda to the MatX interface target by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1087
- Remove extraneous lambda capture by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1086
- Use native nvcc flag when architectures aren't specified by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1091
- Remove extra this pointer from frexp lambda by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1090
- Use cudaMemcpyAsync rather than kernel when possible by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1088
- Add helper functions to clear MatX caches and allocations by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1092
- Use the underlying memory pointer to determine where memory resides in ToDlPack by @dylan-eustice in https://github.com/NVIDIA/MatX/pull/1093
- Handle argmin/argmax tuple accumulators in CUB by @Aminsed in https://github.com/NVIDIA/MatX/pull/1096
- Most non-transform operators working with JIT by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1094
- Add include for cinttypes in print.h by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1099
- Add MATX_EN_NVTIFF option by @tmartin-gh in https://github.com/NVIDIA/MatX/pull/1101
- Changed cudaExecutor to be const& by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1104
- Use cuda::std::accumulate in tensor.h by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1102
- Disable if compiler doesn't support it by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1109
- Use cuda::std::tuple instead of thrust::tuple by @miscco in https://github.com/NVIDIA/MatX/pull/1110
- Add SAR backprojection transform by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1108
- Change to Rank() on Type by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1112
- Add helpers for compile-time operator properties by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1114
- Added version each operator was added to docs by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1116
- Fixes for 32-bit builds. Tested w/ gcc 11.4 and CTK 12.9 by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1120
- Add fltflt division and fltflt operator overloads by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1121
- Export pybind11 and remove visibility flag by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1111
- Add FMA function for the fltflt data type by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1123
- Tylera/gtc 2025 tutorials by @cliffburdick in https://github.com/NVIDIA/MatX/pull/900
- cuBLASDx support by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1122
- Avoid warning about unused variables by @miscco in https://github.com/NVIDIA/MatX/pull/1125
- Add nvbench-based benchmarks for the fltflt data types by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1124
- add link to ust blog post by @aartbik in https://github.com/NVIDIA/MatX/pull/1126
- Allow host-pinned pointers in SetVals() by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1128
- Fix flags for aarch64 containers using FFTW by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1127
- Add fltflt rounding and fmod functions by @tbensonatl in https://github.com/NVIDIA/MatX/pull/1129
- gcc16 warning patch on pybind by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1131
- Update CCCL and deprecate old CTK by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1130
- Added unwrap operator by @cliffburdick in https://github.com/NVIDIA/MatX/pull/1133
New Contributors
- @Aminsed made their first contribution in https://github.com/NVIDIA/MatX/pull/1096
Full Changelog: https://github.com/NVIDIA/MatX/compare/v0.9.4...v1.0.0