NVIDIA/DALI v2.1.0
DALI v2.1.0
Repository: NVIDIA/DALI
Tag: v2.1.0
Published: 2026-04-28T18:16:40Z
Prerelease: no
Release notes:
Key Features and Enhancements ---
This DALI release includes the following key features and enhancements:
- Added torchvision API:
  - Enabled composing DALI pipelines with torchvision-like API (#6281, #6278, #6276, #6275, #6272, #6266, #6229)
  - Added utilities for torch tensor/PIL image conversions (#6287)
- Improved DALI dynamic:
  - Added new thread pool for better thread utilization and sharing across the process (#4635, #6219, #6245, #6224, #6254)
  - Improved error reporting (#6210, #6260)
  - Improved deletion order handling to avoid unnecessary syncs (#6277)
  - Improved readers API (#6252)
- Improved video readers:
  - Added uniform_sample option to VideoReaderDecoder (#6258)
  - Added enable_frame_num='sequence' mode to video readers. (#6237)
- Improved free-threaded Python support:
  - Support free-threaded Python in DALI python_function (#6289)
  - Replaced dm-tree with optree dependency (#6225)
- Added support for CUDA 13.2 (#6249)
- Added support for instantiating operators and building pipelines in C API (#6253)
- Updated JAX integration to support JAX 0.9. (#6238, #6286, #6259, #6256, #6247)
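The notes above do not define what the new uniform_sample option of VideoReaderDecoder does. One common meaning of uniform sampling in video loading is picking a fixed number of frames evenly spaced over the whole clip. The sketch below illustrates only that general idea in plain Python. The helper name `uniform_frame_indices` and its parameters are hypothetical and are not part of DALI; check the operator documentation for the actual semantics.

```python
from typing import List


def uniform_frame_indices(num_frames: int, count: int) -> List[int]:
    """Pick `count` frame indices evenly spread over [0, num_frames - 1].

    Hypothetical illustration; not DALI's implementation.
    """
    if count < 1 or count > num_frames:
        raise ValueError("count must be in the range [1, num_frames]")
    if count == 1:
        # A single sample: take the first frame.
        return [0]
    # Integer arithmetic keeps the first and last frame as endpoints
    # and avoids floating-point rounding surprises.
    return [(i * (num_frames - 1)) // (count - 1) for i in range(count)]


if __name__ == "__main__":
    print(uniform_frame_indices(10, 4))
```

With 10 frames and 4 samples, this selects the first frame, the last frame, and two frames spaced evenly between them.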
Fixed Issues ---
- Documented workaround for CUDA graph capture clash between JAX and DALI (#6286)
- Fixed out-of-bounds access and key handling in Caffe/Caffe2 reader (#6211)
- Fixed range clamping in subscript operator. (#6242)
- Fixed too strict contiguity check when importing tensors via DLPack (#6285)
- Fixed inflate operator max output estimation (#6283)
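To illustrate the DLPack item above (#6285): the stride of a dimension with extent 1 is never used to compute an address, so a contiguity check can safely ignore it. The following is a minimal sketch of such a relaxed check, assuming row-major layout and strides expressed in elements, as DLPack does. The function name `is_contiguous` and the `relax_unit_dims` flag are hypothetical and are not taken from the DALI sources.

```python
from typing import Optional, Sequence


def is_contiguous(
    shape: Sequence[int],
    strides: Optional[Sequence[int]],
    relax_unit_dims: bool = True,
) -> bool:
    """Check whether (shape, strides) describes a dense row-major tensor.

    Strides are in elements. Hypothetical illustration; not DALI's code.
    """
    if strides is None:
        # In DLPack, missing strides mean a compact row-major tensor.
        return True
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if any(extent == 0 for extent in shape):
        # An empty tensor has no elements to misplace.
        return True
    expected = 1
    # Walk from the innermost dimension outwards.
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent == 1 and relax_unit_dims:
            # The index in this dimension is always 0, so its stride is irrelevant.
            continue
        if stride != expected:
            return False
        expected *= extent
    return True
```

For example, a tensor of shape (2, 1, 3) with strides (3, 999, 1) passes the relaxed check but fails a strict one, even though both describe the same memory layout.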
Improvements ---
- Update DALI_DEPS_VERSION (patch libtiff) (#6295)
- Torchvision API to tensor/PIL image conversion operators (#6282)
- Allow passing tensor arguments in reader constructors (#6252)
- Update third-party dependencies (2026-04-09) (#6287)
- Declare free-threaded Python support on the python_function plugin (#6289)
- Torchvision API documentation (#6281)
- Update VERSION to 2.1.0
- Move dynamic API class constructor docs to class-level docstrings (#6273)
- Torchvision normalize (#6278)
- Pipeline building in C API (#6253)
- Torchvision padding (#6276)
- Torchvision gaussian blur (#6275)
- Torchvision API - ColorJitter and Grayscale operators (#6272)
- Add uniform_sample option to VideoReaderDecoder (#6258)
- Torchvision API - center crop operator (#6266)
- Add quiet argument to RandomBBoxCrop to suppress crop failure warning (#6270)
- Fix Coverity detected defects (#6257)
- Improve deadsnakes PPA key handling in aarch64-linux Dockerfile (#6268)
- Use NewThreadPool in dynamic mode. Use only one default instance of ThreadPool per device. (#6254)
- Change the way the deadsnakes PPA is accessed (#6263)
- Torchvision API infrastructure (#6229)
- New ThreadPool + thread pool facade (#6224)
- Make result of AtScopeExit non-discardable. (#6248)
- Move to CUDA 13.2 (#6249)
- Add an ability to skip in-test timestamps (#6250)
- Update third-party dependencies (2026.03) (#6243)
- Add enable_frame_num='sequence' mode to video readers. (#6237)
- Update JAX plugin to JAX 0.9. (#6238)
- Add non-cooperative jobs to new ThreadPool (#6245)
- Add shuffle_after_epoch_seed argument to file-based readers. (#6236)
- Add numpy missing dependency to TL1_custom_src_pattern_build (#6240)
- Set TensorList deletion order in set_order when possible (#6235)
- Improve NDD operator filtering. (#6239)
- Add numpy as an explicit conda dependency for dali_python_bindings (#6232)
- Remove experimental C++ API documentation page. (#6230)
- Rework NVTX annotations in dynamic mode (#6227)
- Replace dm-tree with optree (#6225)
- Add dump_artifacts flag to avoid dumping artifacts for expected test failures (#6223)
- Improve nvcomp header detection for dynamic nvcomp builds (#6226)
- Raise exceptions when an EvalContext is active in multiple threads (#6221)
- Cleanup after instance cache rework. (#6209)
- Remove start_immediately parameter from AddWork. (#6219)
- Remove python tests with forced new executor. (#6222)
- New thread pool (#4635)
- Add exception propagation for deferred and async execution (#6210)
Bug Fixes ---
- Compile the function ahead of time in the JAX example (#6286)
- Add torchvision module to exclusion list in conda jupyter notebook (#6291)
- Fix glob_to_regex for Python 3.14 (#6290)
- DLPack import: Relax stride check in unit dimensions. (#6285)
- Fix inflate operator: Reset max output volume. Use size in bytes, not elements. (#6283)
- Fix stream handling in cvcuda resize. (#6284)
- Defer DLTensor deletion when CUDA graph capture is active. (#6259)
- Fix call stack depth handling for error tracebacks in dynamic mode (#6262)
- Remove misleading legacy CMN warning for video layouts (#6269)
- SequenceOperator: Do not keep thread pool and output order from the 1st iteration. (#6264)
- Fix missing NVML_ENABLED guards around nvml.h includes (#6255)
- Fix DALIGenericPeekableIterator missing pmap_compatible parameter. (#6256)
- Fix UB in JpegCompressionDistortion: use data() for past-the-end pointer (#6251)
- Fix compatibility with flax-basic_example.ipynb after JAX update (#6247)
- Fix documentation switcher generation for 2.+ releases (#6246)
- Fix range clamping in subscript operator. (#6242)
- Fix TL1_python-nvjpeg_test test (#6233)
- Fix EvalContext reentrancy (#6220)
- Dynamic vs Pipeline mode equivalence tests, part 2 (#6214)
- Fix builds with non-dynamic nvCOMP. (#6231)
- Add missing include directory to nvCOMP stubgen command. (#6228)
- Add ArgValue broadcasting when using ShapeFromSize callback. (#6218)
- Fix out-of-bounds access and key handling in Caffe/Caffe2 reader (#6211)
Breaking API changes --- There are no breaking changes in this DALI release.
Deprecated features --- No features were deprecated in this release.
Known issues: ---
- In some cases, the pass-through parallel external source outputs may be corrupted when used with the pipelined dynamic executor. The issue occurs when all four conditions are met: 1. the pipeline uses the dynamic executor, exec_dynamic=True (default), 2. the external_source runs in parallel mode (parallel=True), 3. the ES output is directly returned from the pipeline, 4. the ES output is a single contiguous chunk of memory (either batch=True or batch_size=1). Currently, as a workaround, the user can specify exec_dynamic=False when instantiating the pipeline or add an extra fn.copy to prevent directly returning ES outputs from the pipeline.
- A problem with insufficient static TLS allocation…
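The first known issue above (pass-through parallel external source) applies only when all four listed conditions hold at once, and the fourth is itself an either/or. The sketch below spells out that logic as a plain Python predicate. The function `es_output_may_be_corrupted` and its parameter `returned_directly` are hypothetical names chosen for illustration. It does not inspect a real DALI pipeline, and it only restates the conditions as written in these notes.

```python
def es_output_may_be_corrupted(
    *,
    exec_dynamic: bool,
    parallel: bool,
    returned_directly: bool,
    batch: bool,
    batch_size: int,
) -> bool:
    """Return True when all four conditions from the known issue are met.

    Hypothetical helper; the arguments mirror the settings named in the notes.
    """
    # Condition 4: the output is one contiguous chunk of memory.
    single_chunk = batch or batch_size == 1
    return exec_dynamic and parallel and returned_directly and single_chunk


if __name__ == "__main__":
    affected = es_output_may_be_corrupted(
        exec_dynamic=True,
        parallel=True,
        returned_directly=True,
        batch=True,
        batch_size=8,
    )
    print("affected:", affected)
```

Either workaround from the notes breaks one of the conditions: exec_dynamic=False negates the first, and an extra fn.copy means the external source output is no longer returned directly.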