NVIDIA/DALI v2.0.0

Repository: NVIDIA/DALI

Tag: v2.0.0

Published: 2026-03-03T16:32:22Z

Prerelease: no

Release notes:

Key Features and Enhancements
---

This DALI release includes the following key features and enhancements:

  • Improved DALI dynamic mode:
      • Added asynchronous and deferred execution (#6210, #6204, #6124, #6216, #6152)
      • Improved multithreading, supporting free-threaded (no-GIL) Python 3.13t and Python 3.14. (#6200, #6174, #6136, #5884, #6201, #6202, #6164, #6142)
      • Added TorchData integration (#6198)
      • Improved usability and interoperability with other libraries (#6131, #6182, #6188, #6172, #6179, #6143)
      • Improved execution device specification and handling (#6194, #6165)
      • Improved examples and documentation (#6140, #6189, #6170)
  • Added a contrast-limited adaptive histogram equalization (CLAHE) operator (#6069)
      • Thank you @tonyreina for your contribution!
  • Added support for CUDA 13.1U1 (#6163)
  • Improved the slice, full, zeros, and ones operators (#6159, #6109, #6169)
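
Asynchronous and deferred execution with exception propagation (#6124, #6210) follows a familiar general pattern: work is scheduled immediately, its result is materialized only when requested, and an error raised by the background work surfaces at that point instead of being lost. The sketch below illustrates only that general pattern with Python's standard `concurrent.futures`; the `Deferred` class, its `evaluate` method, and the `demo` function are hypothetical names and are not DALI's dynamic-mode API.

```python
from concurrent.futures import ThreadPoolExecutor


class Deferred:
    """Hypothetical deferred value: work starts on creation, result is fetched on demand.

    Illustrative only; this is not DALI's dynamic-mode API.
    """

    def __init__(self, executor, fn, *args):
        # Schedule the work immediately; the caller does not block here.
        self._future = executor.submit(fn, *args)

    def evaluate(self):
        # Block until the work finishes. If the worker raised, the same
        # exception is re-raised here, in the caller's thread.
        return self._future.result()


def demo():
    with ThreadPoolExecutor(max_workers=2) as pool:
        ok = Deferred(pool, sum, [1, 2, 3])
        bad = Deferred(pool, int, "not a number")  # int() will raise ValueError
        total = ok.evaluate()
        try:
            bad.evaluate()
            error = None
        except ValueError as exc:
            error = exc
    return total, error


if __name__ == "__main__":
    total, error = demo()
    print(total)                  # 6
    print(type(error).__name__)   # ValueError
```

The design point is that the failing computation does not crash the worker thread silently: the exception travels with the pending result and is raised where the value is consumed.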

Fixed Issues
---

  • Added DALI_MAX_IMAGE_SIZE env var to limit decoded image size in CPU and GPU decoders. (#6208)
  • Fixed out-of-bounds reads in image format detection. (#6207)
  • Fixed audio decoder handling of files over 2GB. (#6199)
  • Fixed random crop operators to conform to the new random state passing. (#6190)
  • Fixed the displacement filter occasionally returning corrupted data due to missing synchronization. (#6168)
  • Replaced pickle with JSON in the DALI checkpoint format. (#6154)
  • Fixed slicing with negative stride. (#6161)
  • Fixed a memory leak (#6153) in the fixed-size pool allocator. (#6158)
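
The notes do not state the root cause of the 2GB audio decoder limit (#6199). A failure at exactly that size is typical of a byte offset held in a signed 32-bit integer, whose largest value is 2^31 - 1 = 2,147,483,647. Assuming that class of bug, the sketch below uses the standard `ctypes` module to show how an offset one byte past that limit wraps to a negative value in a 32-bit type while a 64-bit type holds it correctly. The helper names are hypothetical and this is not DALI's code.

```python
import ctypes

# Assumption: a generic 32-bit offset overflow, not DALI's actual decoder code.
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest value a signed 32-bit int can hold


def offset_as_int32(offset):
    """Store a byte offset in a signed 32-bit C int, as a buggy reader might."""
    return ctypes.c_int32(offset).value  # ctypes truncates silently, no error


def offset_as_int64(offset):
    """Store the same offset in a signed 64-bit C int."""
    return ctypes.c_int64(offset).value


two_gib = 2**31  # 2 GiB, one byte past INT32_MAX
print(offset_as_int32(INT32_MAX))  # 2147483647 (still fits)
print(offset_as_int32(two_gib))    # -2147483648 (wrapped around)
print(offset_as_int64(two_gib))    # 2147483648 (correct)
```

A negative offset like this typically makes a later seek or read fail or land in the wrong place, which is why widening offsets to 64 bits is the usual remedy.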

Improvements
---

  • Add a function that purges the operator instance cache for an EvalContext. (#6216)
  • Add TorchData integration in dynamic mode and create examples (#6198)
  • Add exception propagation for deferred and async execution (#6210)
  • Update VERSION to 2.0.0
  • Add ndd.Stream.synchronize method and implement EvalMode.sync_full (#6204)
  • ndd vs fn tests part 1: utils and automated tests (#6191)
  • Add multithreading guide for dynamic mode (#6200)
  • Limit thread count to 32 in ndd multithreading tests. (#6201)
  • Fix the conda tests in free threaded env (#6202)
  • Improve device handling: remove the mixed device and make DALI work without a GPU (#6194)
  • Replace deprecated pkg_resources.require with packaging/importlib-based alternative (#6196)
  • Add first class batch to tensor conversion with optional padding (#6182)
  • Make DALI Dynamic and Pipeline APIs two separate sections (#6189)
  • Documentation for ndd.DType (#6170)
  • Add multithreaded tests for dynamic mode (#6164)
  • Exclude ndd readers from operator docs (#6173)
  • Update DALI_DEPS: libsound, openssl (#6185)
  • Broadcast lists of scalars into any shape in ArgValue. (#6188)
  • Add per-thread stream. Rework stream semantics. Add a real Python stream class. (#6174)
  • Hide deprecated operators from documentation (#6180)
  • Fix jupyter tests (#6184)
  • Move to CUDA 13.1U1 (#6163)
  • Improve the interoperability of dynamic mode with PyTorch (#6172)
  • Remove debug mode references from documentation (#6175)
  • Create examples showing ndd usage (#6140)
  • Add __str__ and __repr__ generic formatting utilities (#6167)
  • Add layout handling to the full, zeros, and ones operator family (#6159)
  • Make EvalMode.eager the default (#6152)
  • Default num_threads and stream for dynamic API (#6165)
  • Dependency update 2026-02 (#6155)
  • Unexperimentalize operators (#6134)
  • Adjust performance threshold for dynamic mode in TL1_decoder_perf (#6160)
  • Update PyTorch Lightning example notebook (#6145)
  • Fix the expected number of bytes to read with O_DIRECT in the numpy reader (#6148)
  • Add pkg_resources compatibility fallback using importlib.metadata (#6144)
  • Relax numpy version constraints (#6137)
  • Move inflate from experimental to decoders, fix doc hiding for ndd, bump deprecation cut-off for ndd to 2.0 (#6141)
  • Support asynchronous execution in dynamic mode. (#6124)
  • Fix conda free-threaded Python build (#6142)
  • Add experimental Python 3.14 support and remove Python 3.9 (#6136)
  • Add dynamic mode RN50 pipeline to hw decoder bench (#6115)
  • Add --no-build-isolation flag to cocoapi pip install (#6132)
  • Improve interoperability of ndd tensors with third party libraries (#6131)
  • Fix cuFFT linking to respect BUILD_FFTS option (#6135)
  • Enable cross-device copy with cudaMemcpyPeerAsync. (#6130)
  • Add support for Python 3.13t (#5884)
  • Upgrade GitHub Actions for Node 24 compatibility (#6133)
  • Add PyTorch DataLoader Evaluator plugin (#6112)
  • Hide ops API (#6123)
  • Add the information of deprecation version origin (#6127)
  • Change the defaults for build options in docker/build_helper.sh (#6129)
  • Allow non-copying TensorList construction from a list of tensors. (#6128)
  • Move all internal dnn API class/object public members to private (#6120)
  • Support more border modes in Slice (#6109)
  • Add contrast-limited adaptive histogram equalization (CLAHE) to DALI image operators (#6069)
  • Add USE_PREBUILD_PYBIND11 option to use system pybind11 (#6117)
  • Drop Python 3.9 support (#6119)
  • Move to CUDA 13.1 (#6116)
  • Remove old eager mode. (#6113)
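
Two entries (#6144, #6196) replace the deprecated `pkg_resources` API with `importlib.metadata`-based code. The notes do not show DALI's implementation; as a general sketch of the technique, the standard library's `importlib.metadata.version` looks up an installed distribution's version without importing `pkg_resources`, and raises `PackageNotFoundError` when the distribution is missing. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper: a stdlib-only stand-in for pkg_resources lookups.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # A version string if pip is installed in this environment, otherwise None.
    print(installed_version("pip"))
    print(installed_version("no-such-distribution-xyz"))  # None
```

To check a requirement such as `>=1.2` the way `pkg_resources.require` did, the returned string can be tested against the `packaging` library's `SpecifierSet`, for example `installed in SpecifierSet(">=1.2")`.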

Bug Fixes
---

  • Allocate CPU outputs in host order. Reset workspace order to host whe… (#6217)
  • Fix workspace stream handling in CPU imgcodec decoders. (#6215)
  • Add missing pillow installation in TL0_self_test_Ampere test (#6213)
  • Add DALI_MAX_IMAGE_SIZE env var to limit decoded image size in CPU and GPU decoders (#6208)
  • Accept more types in BBoxRotate input_shape argument. (#6212)
  • Fix out-of-bounds reads in image format detection (#6207)
  • Rework instance cache. (#6206)
  • Use notify_all instead of notify for EvalMode.async (#6205)
  • Fix dynamic mode pyi files (#6187)
  • Add sharding support to dynamic mode Reader (#6197)
  • Fix audio decoder to support files over 2GB (#6199)
  • Improve type hints in dynamic mode (#6183)
  • Safely call Operator._init_spec in invocation.py (#6193)
  • Rework random crop operators (#6190)
  • Fix batch creation from unevaluated tensors (#6178)
  • Forbid passing axes to expand_dims as an input. (#6181)
  • Fix stream handling in tensor join when called from Dynamic mode. (#6171)
  • Fix batch construction from a tensor and layout. Add ability to change batch layout in batch and as_batch. (#6179)
  • Add handling of default layouts in standalone operator calls. (#6176)
  • Prevent deadlocks with asynchronous execution (#6177)
  • Set the device of ndd tensor slices (#6169)
  • Add missing __syncthreads in displacement filter. (#6168)
  • Use JSON in pipeline checkpointing (#6154)
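
#6205 switches `EvalMode.async` from `notify` to `notify_all`. With a condition variable, `notify()` wakes at most one waiting thread, so when several threads wait for the same event the others can stay blocked even though the event has already happened. The sketch below shows the general pattern with Python's `threading.Condition`; the `Completion` class and `run_waiters` function are hypothetical illustrations, not DALI code.

```python
import threading


class Completion:
    """Hypothetical one-shot event that any number of threads can wait on."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False

    def set(self):
        with self._cond:
            self._done = True
            # notify_all() wakes every waiter. notify() would wake only one,
            # leaving the rest blocked on a flag that is already True.
            self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            # wait_for re-checks the predicate first, so a waiter that
            # arrives after set() returns immediately.
            return self._cond.wait_for(lambda: self._done, timeout)


def run_waiters(n=4, timeout=5.0):
    done = Completion()
    results = []
    lock = threading.Lock()

    def waiter():
        ok = done.wait(timeout)
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=waiter) for _ in range(n)]
    for t in threads:
        t.start()
    done.set()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_waiters())  # [True, True, True, True]
```

Replacing `notify_all()` with `notify()` in `set` would let one waiter finish while the others block until their timeout, which is the kind of stall the fix avoids.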

*…
