NVIDIA/DALI v1.52.0
DALI v1.52.0
Repository: NVIDIA/DALI
Tag: v1.52.0
Published: 2025-10-27T14:10:55Z
Prerelease: no
Release notes:
Key Features and Enhancements ---
This DALI release includes the following key features and enhancements:
- Introduced experimental Dynamic Mode: imperative execution model with lazy evaluation for easier integration into Python workflows. (#6066, #6064, #6060, #6056, #6042, #6039, #6037, #6036, #5954)
- Added an augmentation gallery for Dynamic Mode (#6057)
- Added the DALI Dynamic documentation main page (#6052)
- Added pipeline ZOO - snippets and examples for common image and video processing use cases. (#5922)
- Added support for CUDA 13.0 U2 (#6063)
- Added the fn.decoders.numpy operator (#5953) and a CPU implementation of the fn.paste operator (#5968).
Thank you @5had3z for your contributions.
- Exposed knobs for the pipeline's dynamic executor:
  - Exposed the executor's stream_policy and concurrency options (#5983)
  - Added an environment variable to control the number of executor threads (#5949)
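Dynamic Mode is described as an imperative execution model with lazy evaluation. The sketch below illustrates only that general idea in plain Python: operations are recorded when they are called and run when a result is actually requested. LazyValue, map and evaluate are hypothetical names, not part of the DALI Dynamic API; the DALI Dynamic documentation (#6052) describes the real interface.

```python
from typing import Any, Callable, List


class LazyValue:
    """Hypothetical illustration: records a computation and runs it only on demand."""

    _UNSET = object()  # sentinel meaning "not evaluated yet"

    def __init__(self, thunk: Callable[[], Any]):
        self._thunk = thunk             # deferred computation
        self._value = LazyValue._UNSET  # cached result once evaluated

    def map(self, fn: Callable[[Any], Any]) -> "LazyValue":
        # Chaining only builds a new deferred node; nothing is executed here.
        return LazyValue(lambda: fn(self.evaluate()))

    def evaluate(self) -> Any:
        # The first call runs the whole chain; later calls reuse the cached result.
        if self._value is LazyValue._UNSET:
            self._value = self._thunk()
        return self._value


calls: List[str] = []


def load() -> List[int]:
    calls.append("load")  # records when the "expensive" step actually runs
    return [1, 2, 3]


pixels = LazyValue(load)
scaled = pixels.map(lambda xs: [2 * x for x in xs])
print(calls)              # []  (nothing has run yet)
print(scaled.evaluate())  # [2, 4, 6]
print(calls)              # ['load']
```

Deferring work this way lets a framework see a whole chain of operations before running it, while user code still reads as ordinary step-by-step Python.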
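As its name suggests, fn.decoders.numpy (#5953) presumably decodes in-memory buffers stored in NumPy's .npy format, complementing file-based reading. As background only, the sketch below parses an .npy header with the Python standard library, following the format described in NumPy's numpy.lib.format documentation. It is not DALI's implementation: parse_npy_header and make_npy_v1 are hypothetical helpers, and only C-order version 1.0 buffers are built.

```python
import ast
import struct
from typing import Any, Dict, Tuple

NPY_MAGIC = b"\x93NUMPY"


def parse_npy_header(buf: bytes) -> Tuple[Dict[str, Any], int]:
    """Return (header dict, offset of the array data) for an in-memory .npy file."""
    if not buf.startswith(NPY_MAGIC):
        raise ValueError("not an .npy buffer")
    major = buf[6]  # buf[7] holds the minor version
    if major == 1:
        (header_len,) = struct.unpack_from("<H", buf, 8)  # 2-byte little-endian length
        start = 10
    elif major in (2, 3):
        (header_len,) = struct.unpack_from("<I", buf, 8)  # 4-byte little-endian length
        start = 12
    else:
        raise ValueError(f"unsupported .npy version {major}")
    encoding = "latin1" if major < 3 else "utf8"
    header = ast.literal_eval(buf[start:start + header_len].decode(encoding))
    return header, start + header_len


def make_npy_v1(descr: str, shape: Tuple[int, ...], data: bytes) -> bytes:
    """Build a minimal version 1.0 .npy buffer (C order) for demonstration."""
    text = repr({"descr": descr, "fortran_order": False, "shape": shape})
    # Pad with spaces so that magic + version + length + header ends on a 64-byte boundary.
    pad = (-(10 + len(text) + 1)) % 64
    header = (text + " " * pad + "\n").encode("latin1")
    return NPY_MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header + data


payload = struct.pack("<6i", *range(6))  # six little-endian int32 values
buf = make_npy_v1("<i4", (2, 3), payload)
header, offset = parse_npy_header(buf)
print(header["shape"], header["descr"])        # (2, 3) <i4
print(struct.unpack_from("<6i", buf, offset))  # (0, 1, 2, 3, 4, 5)
```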
Fixed Issues ---
- Fixed stream ordering in Tensor::Copy and Tensor(List)GPU.as_cpu (#6070)
- Fixed conversion of pinned tensors to DLPack. (#6061)
- Fixed DLPack stride check if stride pointer is NULL
- Fixed handling of videos without keyframes and reuse of old indices (#6058)
- Fixed resize_crop_mirror video output shape (#5957)
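The DLPack stride fix concerns tensors whose strides pointer is NULL. In the DLPack C header (dlpack.h), a NULL strides field means the tensor is compact and row-major, and strides are counted in elements rather than bytes, so there is nothing to check in that case. A minimal Python sketch of that rule follows, under stated assumptions: None stands in for a NULL pointer, and row_major_strides and is_compact_row_major are hypothetical helpers, not DALI functions.

```python
from typing import Optional, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Compact C-order (row-major) strides, counted in elements as DLPack does."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_compact_row_major(shape: Sequence[int],
                         strides: Optional[Sequence[int]]) -> bool:
    """Check a DLPack-style (shape, strides) pair; None plays the role of a NULL pointer."""
    if strides is None:
        # NULL strides already mean "compact and row-major": skip the check.
        return True
    if len(strides) != len(shape):
        return False
    expected = row_major_strides(shape)
    # Simplifying choice for this sketch: strides of size-1 dimensions are ignored,
    # because they never change which element is addressed.
    return all(s == e for dim, s, e in zip(shape, strides, expected) if dim != 1)


print(row_major_strides((2, 3, 4)))                # (12, 4, 1)
print(is_compact_row_major((2, 3, 4), None))       # True
print(is_compact_row_major((2, 3, 4), (1, 2, 6)))  # False: column-major layout
```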
Improvements ---
- Update to FFmpeg 8.0
- Dynamic mode: add augmentation gallery (#6057)
- Add dynamic API for math functions + tests. (#6066)
- Rename DALI2 to dynamic (#6064)
- Move to CUDA 13.0 U2 (#6063)
- Dynamic mode: operator base classes and operator call generator (#6060)
- Update VERSION to 1.52.0
- Update deps 25/10 (#6053)
- Dynamic Mode: Tensor and Batch Types (#6056)
- Remove CMake from acknowledgements. (#6020)
- DALI Dynamic docs main page (#6052)
- Reduce minimum throughput for experimental decoder in TL1_decoder_perf (#6050)
- Fix TL0_video_plugin to run with sanitizer (#6040)
- Imperative mode: Invocation (#6042)
- Update LD_PRELOAD in sanitizer configuration, exclude more numba tests (#6041)
- Imperative mode: EvalContext, EvalMode, Type and Device (#6039)
- Update the test environment to Ubuntu 24.04 (#6033)
- Update curl 3.15 -> 3.16 (#6038)
- Add TensorList broadcasting constructor. (#6037)
- Backend changes for imperative mode (#6036)
- Add nvcc/nvjitlink version compatibility check to numba CUDA test (#6035)
- Unify minimum required CMake version. (#6022)
- Fix installation of Horovod in TL1_tensorflow-dali_test (#6024)
- Remove confusing warning on host decoder fallback (#6029)
- Add stream argument to TensorGPU DLPack constructor. (#6015)
- Cumulative dependency update for September 2025. (#6017)
- Silence false warnings in sanitized build (#6018)
- Relax the image decoder perf test threshold from 5% to 15% to account for outlier iterations (#6021)
- Bump CMake to 3.25.2 (#6019)
- Move to CUDA 13.0 U1 (#6016)
- Move to the gcc-toolset-14 (#6014)
- Update test packages (#6010)
- Correct support matrix entry for Orin (#6008)
- Silence a false positive warning triggered by GCC 12.2.1 (#6002)
- Fix CVE-2024-13978 and CVE-2025-8534 in libtiff (#6007)
- Bump up OpenCV version to 4.12 in conda (#6005)
- Move to the latest nvJPEG2k (#6000)
- Enable more aggressive binary compression (#6001)
- Use subprocess.run in get_tf_compiler_version to avoid CalledProcessError on grep (#5991)
- Add functions that change the type of the tensor or tensor list to a different type of the same size. (#5995)
- Update OpenCV version in tests (#5987)
- Improve performance of experimental.resize (#5662)
- Expose executor policy flags (#5983)
- Pin CMake to max 4.0.3 in jupyter_conda tests. (#5985)
- Add driver version check to the usage of numba_cuda (#5982)
- Fix nvComp installation in tests (#5984)
- Update DALI_DEPS_VERSION to use patched libtiff (#5981)
- Improve creating image batches in CV-CUDA ops (#5966)
- Dependency update 07-2025 (#5978)
- Make the numba operator compatible with the numba-cuda package (#5975)
- Adjust TF plugin build dependencies (#5976)
- fn.paste CPU impl (#5968)
- Make sure that protobuf always uses own absl version instead of system one (#5974)
- Thread pool with semaphore and spinlock (#5970)
- Extend GetInputDevice in OpSchema python bindings. (#5972)
- Remove data preparation instructions from the video superres use case (#5965)
- Added fn.decoders.numpy (#5953)
- Pipeline zoo - initial commit (#5922)
- Expose Stream, Operator and Workspace in Python (#5954)
- Fix nvcc not working with sanitizer (#5959)
- Make the number of dynamic executor threads configurable via environment variables. (#5949)
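#5995 adds functions that change the type of a tensor or tensor list to another type of the same size. The release notes do not name those functions, so the sketch below shows only the underlying idea with Python's standard library: the same bytes are viewed as a different element type without copying. reinterpret is a hypothetical helper, not the DALI API.

```python
import struct
from array import array


def reinterpret(values: array, fmt: str) -> memoryview:
    """View the bytes of `values` as elements of another type with the same item size.

    No data is copied: the returned view shares memory with `values`.
    """
    if struct.calcsize(fmt) != values.itemsize:
        raise ValueError(f"item size mismatch: {values.typecode!r} vs {fmt!r}")
    # memoryview.cast must go through a byte format ('B') between two non-byte types.
    return memoryview(values).cast("B").cast(fmt)


floats = array("f", [1.0, -2.0])
bits = reinterpret(floats, "I")
print([hex(b) for b in bits])  # ['0x3f800000', '0xc0000000']

bits[0] = 0x40400000           # the float32 bit pattern of 3.0
print(floats[0])               # 3.0, because the view aliases the original storage
```

Requiring equal item sizes is what keeps such a change cheap: the element count and memory footprint stay the same, so only the type metadata changes.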
Bug Fixes ---
- Fix stream ordering in Tensor::Copy and Tensor(List)GPU.as_cpu
- Fix conversion of pinned tensors to DLPack. (#6061)
- Fix DLPack tests to use HWC layout instead of NHWC (#6062)
- Fix handling of videos without keyframes and reuse of old indices (#6058)
- Refactor layout handling in Python backend + add layout dimensionality checks in Tensor and TensorList python bindings (#6054)
- Fix standalone op output streams. (#6055)
- Remove EvalContext destructor. (#6043)
- Fix static analysis issues (#6032)
- Install newer CMake in TL0_jupyter (#6034)
- Disable PYBIND11_FINDPYTHON in CMakeLists.txt (#6031)
- Remove a custom patch for PyCuda, add numba_cuda version constraint (#6023)
- Bugfix: Skip DLPack stride check if stride pointer is NULL
- Improve error handling in ThreadPool (#6011)
- Fix test_backend_impl launch command (#6003)
- Remove unnecessary default values from optional arguments. (#5992)
- Add missing backslash in test scripts. (#5986)
- Fix outdated DALI manylinux tag (#5980)
- resize_crop_mirror - invalid video output shape fix (#5957)
Breaking API changes ---
There are no breaking changes in this DALI release.
Deprecated features ---
No features were deprecated in this release.
Known issues ---
- In some cases, the pass-through parallel external source outputs may be corrupted when used with the pipelined dynamic executor. The issue occurs only when all four of the following conditions are met:
  1. the pipeline uses the dynamic executor (exec_dynamic=True, the default),
  2. the external_source runs in parallel mode (parallel=True),
  3. the external source (ES) output is returned directly from the pipeline,
  4. the ES output is a single contiguous chunk of memory (either batch=True or batch_size=1).
  Currently, as a workaround, users can specify exec_dynamic=False when instantiating the pipeline, or add an extra fn.copy so that ES outputs are not returned directly from the pipeline.
- A problem with insufficient static TLS…
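For the parallel external source issue, the two documented workarounds can be sketched as follows. This assumes a per-sample callback and the public pipeline_def, fn.external_source and fn.copy APIs. sample_source, build_pipeline and affected_by_es_issue are hypothetical names, and the data is made up. affected_by_es_issue encodes the rule stated in the known issue: all four conditions must hold at once.

```python
def affected_by_es_issue(exec_dynamic: bool, parallel: bool,
                         returned_directly: bool, contiguous: bool) -> bool:
    """True only when all four conditions listed in the known issue hold."""
    return exec_dynamic and parallel and returned_directly and contiguous


def sample_source(sample_info):
    # Hypothetical per-sample callback for a parallel external source.
    import numpy as np  # deferred so the condition helper works without NumPy installed
    if sample_info.idx_in_epoch >= 16:
        raise StopIteration  # signals the end of the epoch
    return np.full((4, 4), sample_info.idx_in_epoch, dtype=np.uint8)


def build_pipeline(workaround: str = "copy"):
    """Build a pipeline that avoids the issue via fn.copy or via exec_dynamic=False."""
    from nvidia.dali import fn, pipeline_def  # requires DALI to be installed

    @pipeline_def(batch_size=1,  # batch_size=1 makes the output one contiguous chunk
                  num_threads=2, device_id=0, py_num_workers=2,
                  exec_dynamic=(workaround != "no_dynamic"))
    def pipe():
        data = fn.external_source(source=sample_source, parallel=True, batch=False)
        if workaround == "copy":
            data = fn.copy(data)  # the ES output is no longer returned directly
        return data

    return pipe()
```

For example, `pipe = build_pipeline("copy"); pipe.build(); (out,) = pipe.run()`. The fn.copy variant keeps the dynamic executor at the cost of one extra copy per iteration, while exec_dynamic=False avoids the copy but gives up the dynamic executor.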