Release · NVIDIA · published Apr 28, 2026

NVIDIA/DALI v2.1.0

DALI v2.1.0

Repository: NVIDIA/DALI

Tag: v2.1.0

Published: 2026-04-28T18:16:40Z

Prerelease: no

Release notes:

Key Features and Enhancements ---

This DALI release includes the following key features and enhancements:

  • Added torchvision API:
      • Enabled composing DALI pipelines with torchvision-like API (#6281, #6278, #6276, #6275, #6272, #6266, #6229)
      • Added utilities for torch tensor/PIL image conversions (#6287)
  • Improved DALI dynamic:
      • Added new thread pool for better thread utilization and sharing across the process (#4635, #6219, #6245, #6224, #6254)
      • Improved error reporting (#6210, #6260)
      • Improved deletion order handling to avoid unnecessary syncs (#6277)
      • Improved readers API (#6252)
  • Improved video readers:
      • Added uniform_sample option to VideoReaderDecoder (#6258)
      • Added enable_frame_num='sequence' mode to video readers (#6237)
  • Improved free-threaded Python support:
      • Support free-threaded Python in DALI python_function (#6289)
      • Replaced dm-tree with optree dependency (#6225)
  • Added support for CUDA 13.2 (#6249)
  • Added support for instantiating operators and building pipelines in C API (#6253)
  • Updated JAX integration to support JAX 0.9. (#6238, #6286, #6259, #6256, #6247)

Fixed Issues ---

  • Documented workaround for CUDA graph capture clash between JAX and DALI (#6286)
  • Fixed out-of-bounds access and key handling in Caffe/Caffe2 reader (#6211)
  • Fixed range clamping in subscript operator. (#6242)
  • Fixed too strict contiguity check when importing tensors via DLPack (#6285)
  • Fixed inflate operator max output estimation (#6283)
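
To illustrate the DLPack item above (#6285), here is a minimal sketch of a contiguity check that ignores strides in unit dimensions. It is not DALI's implementation. The function name `is_c_contiguous` is hypothetical, and the sketch assumes row-major layout with strides given in elements, as in DLPack. The idea is that a dimension of extent 1 is never stepped over, so its stride cannot affect the memory layout and a check that insists on a particular value there is stricter than necessary.

```python
def is_c_contiguous(shape, strides):
    """Return True if (shape, strides) describes a dense row-major tensor.

    Hypothetical helper for illustration only; strides are in elements.
    Strides of dimensions with extent 1 are ignored, because such a
    dimension is never advanced and its stride has no effect on layout.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")

    # A tensor with no elements is trivially contiguous.
    if any(extent == 0 for extent in shape):
        return True

    expected = 1  # stride a dense layout would have for the current dimension
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent != 1 and stride != expected:
            return False
        expected *= extent
    return True
```

With this relaxed rule, a tensor of shape (2, 1, 3) with strides (3, 999, 1) is accepted, while a column-major (2, 3) tensor with strides (1, 2) is still rejected.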

Improvements ---

  • Update DALI_DEPS_VERSION (patch libtiff) (#6295)
  • Torchvision API to tensor/PIL image conversion operators (#6282)
  • Allow passing tensor arguments in reader constructors (#6252)
  • Update third-party dependencies (2026-04-09) (#6287)
  • Declare free-threaded Python support on the python_function plugin (#6289)
  • Torchvision API documentation (#6281)
  • Update VERSION to 2.1.0
  • Move dynamic API class constructor docs to class-level docstrings (#6273)
  • Torchvision normalize (#6278)
  • Pipeline building in C API (#6253)
  • Torchvision padding (#6276)
  • Torchvision gaussian blur (#6275)
  • Torchvision API - ColorJitter and Grayscale operators (#6272)
  • Add uniform_sample option to VideoReaderDecoder (#6258)
  • Torchvision API - center crop operator (#6266)
  • Add quiet argument to RandomBBoxCrop to suppress crop failure warning (#6270)
  • Fix Coverity detected defects (#6257)
  • Improve deadsnakes PPA key handling in aarch64-linux Dockerfile (#6268)
  • Use NewThreadPool in dynamic mode. Use only one default instance of ThreadPool per device. (#6254)
  • Change the way the deadsnakes PPA is accessed (#6263)
  • Torchvision API infrastructure (#6229)
  • New ThreadPool + thread pool facade (#6224)
  • Make result of AtScopeExit non-discardable. (#6248)
  • Move to CUDA 13.2 (#6249)
  • Add the ability to skip in-test timestamps (#6250)
  • Update third-party dependencies (2026.03) (#6243)
  • Add enable_frame_num='sequence' mode to video readers. (#6237)
  • Update JAX plugin to JAX 0.9. (#6238)
  • Add non-cooperative jobs to new ThreadPool (#6245)
  • Add shuffle_after_epoch_seed argument to file-based readers. (#6236)
  • Add missing numpy dependency to TL1_custom_src_pattern_build (#6240)
  • Set TensorList deletion order in set_order when possible (#6235)
  • Improve NDD operator filtering. (#6239)
  • Add numpy as an explicit conda dependency for dali_python_bindings (#6232)
  • Remove experimental C++ API documentation page. (#6230)
  • Rework NVTX annotations in dynamic mode (#6227)
  • Replace dm-tree with optree (#6225)
  • Add dump_artifacts flag to avoid dumping artifacts for expected test failures (#6223)
  • Improve nvcomp header detection for dynamic nvcomp builds (#6226)
  • Raise exceptions when an EvalContext is active in multiple threads (#6221)
  • Cleanup after instance cache rework. (#6209)
  • Remove start_immediately parameter from AddWork. (#6219)
  • Remove python tests with forced new executor. (#6222)
  • New thread pool (#4635)
  • Add exception propagation for deferred and async execution (#6210)

Bug Fixes ---

  • Compile the function ahead of time in the JAX example (#6286)
  • Add torchvision module to exclusion list in conda jupyter notebook (#6291)
  • Fix glob_to_regex for Python 3.14 (#6290)
  • DLPack import: Relax stride check in unit dimensions. (#6285)
  • Fix inflate operator: Reset max output volume. Use size in bytes, not elements. (#6283)
  • Fix stream handling in cvcuda resize. (#6284)
  • Defer DLTensor deletion when CUDA graph capture is active. (#6259)
  • Fix call stack depth handling for error tracebacks in dynamic mode (#6262)
  • Remove misleading legacy CMN warning for video layouts (#6269)
  • SequenceOperator: Do not keep thread pool and output order from the 1st iteration. (#6264)
  • Fix missing NVML_ENABLED guards around nvml.h includes (#6255)
  • Fix DALIGenericPeekableIterator missing pmap_compatible parameter. (#6256)
  • Fix UB in JpegCompressionDistortion: use data() for past-the-end pointer (#6251)
  • Fix compatibility with flax-basic_example.ipynb after JAX update (#6247)
  • Fix documentation switcher generation for 2.+ releases (#6246)
  • Fix range clamping in subscript operator. (#6242)
  • Fix TL1_python-nvjpeg_test test (#6233)
  • Fix EvalContext reentrancy (#6220)
  • Dynamic vs Pipeline mode equivalence tests, part 2 (#6214)
  • Fix builds with non-dynamic nvCOMP. (#6231)
  • Add missing include directory to nvCOMP stubgen command. (#6228)
  • Add ArgValue broadcasting when using ShapeFromSize callback. (#6218)
  • Fix out-of-bounds access and key handling in Caffe/Caffe2 reader (#6211)

Breaking API changes ---

There are no breaking changes in this DALI release.

Deprecated features ---

No features were deprecated in this release.

Known issues: ---

  • In some cases, the pass-through parallel external source (ES) outputs may be corrupted when used with the pipelined dynamic executor. The issue occurs only when all four of the following conditions are met:
      1. the pipeline uses the dynamic executor, exec_dynamic=True (the default),
      2. the external_source runs in parallel mode (parallel=True),
      3. the ES output is returned directly from the pipeline,
      4. the ES output is a single contiguous chunk of memory (either batch=True or batch_size=1).
    As a workaround, specify exec_dynamic=False when instantiating the pipeline, or add an extra fn.copy so that ES outputs are not returned directly from the pipeline.
  • A problem with insufficient static TLS allocation…
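
The four conditions of the external source known issue above can be written down as a small predicate, which may help when auditing existing pipelines. This is only a sketch: the function `es_passthrough_at_risk` and its parameter `returned_directly` are hypothetical names, not part of DALI, and the function inspects plain values rather than a real pipeline object.

```python
def es_passthrough_at_risk(*, exec_dynamic, parallel, returned_directly,
                           batch, batch_size):
    """Return True when all four conditions from the known issue hold.

    Hypothetical helper for illustration; it does not import or query DALI.
    """
    # Condition 4: the ES output is a single contiguous chunk of memory.
    single_chunk = batch or batch_size == 1
    # Conditions 1-3 combined with condition 4; all must hold at once.
    return bool(exec_dynamic and parallel and returned_directly and single_chunk)


# Affected configuration: default dynamic executor, parallel ES,
# output returned straight from the pipeline, batch mode.
affected = es_passthrough_at_risk(
    exec_dynamic=True, parallel=True, returned_directly=True,
    batch=True, batch_size=8)

# Either documented workaround clears one condition:
# exec_dynamic=False, or an extra fn.copy so the output is not returned directly.
with_legacy_executor = es_passthrough_at_risk(
    exec_dynamic=False, parallel=True, returned_directly=True,
    batch=True, batch_size=8)
with_extra_copy = es_passthrough_at_risk(
    exec_dynamic=True, parallel=True, returned_directly=False,
    batch=True, batch_size=8)
```

Because all four conditions must hold together, breaking any single one is enough, which is why both workarounds listed in the release notes are sufficient on their own.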