NVIDIA/DALI v2.0.0
DALI v2.0.0
Repository: NVIDIA/DALI
Tag: v2.0.0
Published: 2026-03-03T16:32:22Z
Prerelease: no
Release notes:
Key Features and Enhancements ---
This DALI release includes the following key features and enhancements:
- Improved DALI dynamic mode:
  - Added asynchronous and deferred execution (#6210, #6204, #6124, #6216, #6152)
  - Improved multithreading, supporting no-gil Python 3.13t and Python 3.14. (#6200, #6174, #6136, #5884, #6201, #6202, #6164, #6142)
  - Added TorchData integration (#6198)
  - Improved usability and interoperability with other libraries (#6131, #6182, #6188, #6172, #6179, #6143)
  - Improved execution device specification and handling (#6194, #6165)
  - Improved examples and documentation (#6140, #6189, #6170)
- Added contrast-limited adaptive histogram equalization (CLAHE) operator (#6069)
  - Thank you @tonyreina for your contribution!
- Added support for CUDA 13.1U1 (#6163)
- Improved slice, full, zeros, ones operators (#6159, #6109, #6169)
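The CLAHE operator added in #6069 is listed here without further detail. As a rough illustration of the idea behind the technique (not DALI's implementation and not its operator API), the pure-Python sketch below performs clip-limited histogram equalization on a single tile. Full CLAHE also splits the image into a grid of tiles and interpolates between neighboring tile mappings, which is omitted here. The name clipped_equalize and its parameters are hypothetical.

```python
def clipped_equalize(tile, clip_limit, levels=256):
    """Equalize one tile of integer gray levels using a clipped histogram.

    tile       -- flat list of pixel values in range(levels)
    clip_limit -- maximum pixel count allowed per histogram bin
    """
    # Build the histogram of the tile.
    hist = [0] * levels
    for value in tile:
        hist[value] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Spread the excess evenly over all bins, then build the mapping
    # from the cumulative distribution of the adjusted histogram.
    bonus = excess / levels
    total = len(tile)
    lut = []
    running = 0.0
    for count in hist:
        running += count + bonus
        mapped = round((levels - 1) * running / total)
        lut.append(min(levels - 1, mapped))

    return [lut[value] for value in tile]
```

For a tile of twelve pixels at level 200 and four at level 10, a clip limit of 4 maps the two levels to 228 and 69, while a limit of 16 (no clipping takes place) maps them to 255 and 64. The clip limit therefore caps how far a dominant gray level can stretch the contrast.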
Fixed Issues ---
- Added DALI_MAX_IMAGE_SIZE env var to limit decoded image size in CPU and GPU decoders. (#6208)
- Fixed out-of-bounds reads in image format detection. (#6207)
- Fixed audio decoder handling of files over 2GB. (#6199)
- Fixed random crop operators to conform to the new random state passing. (#6190)
- Fixed displacement filter occasionally returning corrupted data due to missing synchronization. (#6168)
- Replaced pickle with JSON in DALI checkpoints format. (#6154)
- Fixed slicing with negative stride. (#6161)
- Fixed memory leak (#6153) in fixed-size pool allocator. (#6158)
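The notes above do not say why pickle was replaced with JSON in the checkpoint format (#6154). One commonly cited reason for such a change is that unpickling untrusted data can execute arbitrary code, whereas a JSON parser only produces plain data types. The sketch below shows the general pattern with Python's standard json module. The state fields and both function names are hypothetical and do not reflect DALI's actual checkpoint schema or API.

```python
import json


def save_checkpoint(state):
    """Serialize a plain-data state dict to a JSON string."""
    return json.dumps(state, sort_keys=True)


def load_checkpoint(text):
    """Parse a checkpoint string; only JSON data types can come back."""
    state = json.loads(text)
    if not isinstance(state, dict):
        raise ValueError("checkpoint must be a JSON object")
    return state


# Hypothetical state; the field names are illustrative only.
state = {"epoch": 3, "iteration": 1200, "rng_seed": 42}
restored = load_checkpoint(save_checkpoint(state))
```

A format like this is also readable and diffable as text, at the cost of supporting only strings, numbers, booleans, null, lists, and objects.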
Improvements ---
- Add a function that purges operator instance cache for an EvalContext. (#6216)
- Add TorchData integration in dynamic mode and create examples (#6198)
- Add exception propagation for deferred and async execution (#6210)
- Update VERSION to 2.0.0
- Add ndd.Stream.synchronize method and implement EvalMode.sync_full (#6204)
- ndd vs fn tests part 1: utils and automated tests (#6191)
- Add multithreading guide for dynamic mode (#6200)
- Limit thread count to 32 in ndd multithreading tests. (#6201)
- Fix the conda tests in free threaded env (#6202)
- Improved device handling. Remove mixed device. Make DALI work without GPU (#6194)
- Replace deprecated pkg_resources.require with packaging/importlib-based alternative (#6196)
- Add first class batch to tensor conversion with optional padding (#6182)
- Make DALI Dynamic and Pipeline APIs two separate sections (#6189)
- Documentation for ndd.DType (#6170)
- Add multithreaded tests for dynamic mode (#6164)
- Exclude ndd readers from operator docs (#6173)
- Update DALI_DEPS: libsound, openssl (#6185)
- Broadcast lists of scalars into any shape in ArgValue. (#6188)
- Add per-thread stream. Rework stream semantics. Add a real Python stream class. (#6174)
- Hide deprecated operators from documentation (#6180)
- Fix jupyter tests (#6184)
- Move to CUDA 13.1U1 (#6163)
- Improve the interoperability of dynamic mode with PyTorch (#6172)
- Remove debug mode references from documentation (#6175)
- Create examples showing ndd usage (#6140)
- Add __str__ and __repr__ generic formatting utilities (#6167)
- Add layout handling to full, zeros, ones operator family (#6159)
- Make EvalMode.eager the default (#6152)
- Default num_threads and stream for dynamic API (#6165)
- Dependency update 2026-02 (#6155)
- Unexperimentalize operators (#6134)
- Adjust performance threshold for dynamic mode in TL1_decoder_perf (#6160)
- Update PyTorch Lightning example notebook (#6145)
- Fix O_DIRECT expected to read number of bytes numpy reader (#6148)
- Add pkg_resources compatibility fallback using importlib.metadata (#6144)
- Relax numpy version constraints (#6137)
- Move inflate from experimental to decoders, fix doc hiding for ndd, bump deprecation cut-off for ndd to 2.0 (#6141)
- Support asynchronous execution in dynamic mode. (#6124)
- Fix conda free-threaded Python build (#6142)
- Add experimental Python 3.14 support and remove Python 3.9 (#6136)
- Add dynamic mode RN50 pipeline to hw decoder bench (#6115)
- Add --no-build-isolation flag to cocoapi pip install (#6132)
- Improve interoperability of ndd tensors with third party libraries (#6131)
- Fix cuFFT linking to respect BUILD_FFTS option (#6135)
- Enable cross-device copy with cudaMemcpyPeerAsync. (#6130)
- Add support for Python 3.13t (#5884)
- Upgrade GitHub Actions for Node 24 compatibility (#6133)
- Add PyTorch DataLoader Evaluator plugin (#6112)
- Hide ops API (#6123)
- Add the information of deprecation version origin (#6127)
- Change the defaults for build options in docker/build_helper.sh (#6129)
- Allow non-copying TensorList construction from a list of tensors. (#6128)
- Move all internal ndd API class/object public members to private (#6120)
- Support more border modes in Slice (#6109)
- Contrast-limited adaptive histogram equalization (CLAHE) to DALI image operators (#6069)
- Add USE_PREBUILD_PYBIND11 option to use system pybind11 (#6117)
- Drop Python 3.9 support (#6119)
- Move to cuda 13.1 (#6116)
- Remove old eager mode. (#6113)
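Two entries in this list (#6196, #6144) replace the deprecated pkg_resources with packaging/importlib-based code. As a hedged sketch of what such a migration can look like, the snippet below checks an installed distribution's version with the standard library's importlib.metadata. To stay free of third-party imports it uses a simplified numeric parser in place of the packaging library, so pre-release suffixes are ignored. The helper names are hypothetical and are not taken from DALI.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '1.26.4' into (1, 26, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def is_installed_at_least(dist_name, minimum):
    """Return True if dist_name is installed with version >= minimum."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return at_least(installed, minimum)
```

A call such as is_installed_at_least("numpy", "1.23") then plays the role that pkg_resources.require("numpy>=1.23") used to, returning a boolean instead of raising.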
Bug Fixes ---
- Allocate CPU outputs in host order. Reset workspace order to host whe… (#6217)
- Fix workspace stream handling in CPU imgcodec decoders. (#6215)
- Add missing pillow installation in TL0_self_test_Ampere test (#6213)
- Add DALI_MAX_IMAGE_SIZE env var to limit decoded image size in CPU and GPU decoders (#6208)
- Accept more types in BBoxRotate input_shape argument. (#6212)
- Fix out-of-bounds reads in image format detection (#6207)
- Rework instance cache. (#6206)
- Use notify_all instead of notify for EvalMode.async (#6205)
- Fix dynamic mode pyi files (#6187)
- Add sharding support to dynamic mode Reader (#6197)
- Fix audio decoder to support files over 2GB (#6199)
- Improve type hints in dynamic mode (#6183)
- Safely calling Operator._init_spec in invocation.py (#6193)
- Rework random crop operators (#6190)
- Fix batch creation from unevaluated tensors (#6178)
- Forbid passing axes to expand_dims as an input. (#6181)
- Fix stream handling in tensor join when called from Dynamic mode. (#6171)
- Fix batch construction from a tensor and layout. Add ability to change batch layout in batch and as_batch. (#6179)
- Add handling of default layouts in standalone operator calls. (#6176)
- Prevent deadlocks with asynchronous execution (#6177)
- Set the device of ndd tensor slices (#6169)
- Add missing __syncthreads in displacement filter. (#6168)
- Use JSON in pipeline checkpointing (#6154)
*…
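The entry "Use notify_all instead of notify for EvalMode.async (#6205)" points at a classic condition-variable pitfall: notify wakes at most one waiting thread, so other waiters can stay blocked even though the state they wait for has changed. The notes do not describe DALI's internals, so the sketch below only illustrates the general difference using Python's threading.Condition. The function count_woken and its parameters are hypothetical.

```python
import threading
import time


def count_woken(wake_all, waiters=3, timeout=0.5):
    """Start waiter threads, signal once, and count how many were notified."""
    cond = threading.Condition()
    state = {"waiting": 0, "woken": 0}

    def waiter():
        with cond:
            state["waiting"] += 1
            # wait() returns True if notified, False if the timeout expired.
            if cond.wait(timeout):
                state["woken"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(waiters)]
    for thread in threads:
        thread.start()

    # Signal only after every waiter has entered wait() and released the lock.
    while True:
        with cond:
            if state["waiting"] == waiters:
                if wake_all:
                    cond.notify_all()
                else:
                    cond.notify()
                break
        time.sleep(0.001)

    for thread in threads:
        thread.join()
    return state["woken"]
```

With three waiters, count_woken(False) is expected to report one notified thread and count_woken(True) all three. In the first case the remaining threads only return because of the timeout, which in code without a timeout would be a hang.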