What does this repo signal mean?

NVIDIA published NVIDIA/cuBQL (C++). This repository signal may expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo NVIDIA/cuBQL · language C++ · New repo with 66 stars, routine for NVIDIA. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

NVIDIA Repo: NVIDIA/cuBQL

Captured source


GitHub: github.com/NVIDIA/cuBQL

NVIDIA/cuBQL repository metadata


published Oct 8, 2025 · seen 1w · captured 1w · http 200 · method plain

NVIDIA/cuBQL

Description: A CUDA BVH Build And Query Library

Language: C++

License: Apache-2.0

Stars: 66

Forks: 16

Open issues: 0

Created: 2025-10-08T22:13:06Z

Pushed: 2026-06-14T20:31:57Z

Default branch: main

Fork: no

Archived: no

README:

cuBQL - A CUDA "BVH Build-and-Query" Library

Build Status: Windows (https://github.com/NVIDIA/cuBQL/actions/workflows/Windows.yml), Ubuntu (https://github.com/NVIDIA/cuBQL/actions/workflows/Ubuntu.yml)

CuBQL (say: "cubicle") is a (mostly) header-only CUDA/C++ library for the easy and efficient GPU-construction and -traversal of bounding volume hierarchies (BVHes), with the ultimate goal of providing the tools and infrastructure to realize a wide range of (GPU-accelerated) spatial queries over various geometric primitives.

CuBQL is largely inspired by two libraries: the standard template library (STL) and cub. Like those two libraries, cuBQL largely relies on header-only CUDA/C++ code and on the use of templates and lambda functions to make sure that certain key operations (like traversing a BVH) work for different primitive types, different data types and dimensionalities (e.g., float3 vs int2), and multiple different but similar geometric queries (e.g., find closest point vs k-nearest neighbor (kNN) vs signed distance functions (SDF), etc.).

Throughout cuBQL, the main driving goals are robustness, generality, and ease of use: each builder for each BVH type should always work for all input types and dimensionalities, even for numerically challenging input data.
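
To make the "all input types and dimensionalities" goal concrete, here is a minimal host-side C++ sketch of a bounding box templated over a scalar type T and a dimension D, in the spirit of cuBQL's {int,float,double,long}x{2,3,4,N} types. The `Box` name and its `empty()`, `extend()`, and `contains()` members are hypothetical illustrations, not cuBQL's actual `box_t` API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>

// Hypothetical box type templated over scalar type T and dimension D
// (illustrative only; not cuBQL's actual box_t).
template <typename T, int D>
struct Box {
  static_assert(D > 0, "dimension must be positive");
  std::array<T, D> lower, upper;

  // An "empty" box has lower > upper, so the first extend() sets both corners.
  static Box empty() {
    Box b;
    b.lower.fill(std::numeric_limits<T>::max());
    b.upper.fill(std::numeric_limits<T>::lowest());
    return b;
  }

  // Grow the box so that it includes point p.
  void extend(const std::array<T, D>& p) {
    for (int i = 0; i < D; ++i) {
      lower[i] = std::min(lower[i], p[i]);
      upper[i] = std::max(upper[i], p[i]);
    }
  }

  // True if p lies inside the box (boundary included).
  bool contains(const std::array<T, D>& p) const {
    for (int i = 0; i < D; ++i)
      if (p[i] < lower[i] || p[i] > upper[i]) return false;
    return true;
  }
};
```

Because T and D are template parameters, the same code serves `Box<int,2>`, `Box<float,3>`, or `Box<double,4>` without duplication.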

cuBQL Functionality - Overview

CuBQL offers four separate layers of functionality:

Abstract BVH Type layer: defines the basic (GPU-friendly) type(s) for different kinds of BVHes. In particular, the cuBQL BVH types are templated over the geometric space the BVH is built over; i.e., you can realize not only BVHes over float3 data, but also BVHes over int4, double2, etc. (cuBQL spans the entire space of {int,float,double,long}x{2,3,4,N}).

BVH builders layer: provides a set of primarily GPU-side (but also some simple host-side) builder(s) for the underlying BVH type(s). This layer offers multiple builders with different speed/quality tradeoffs (though the default gpuBuilder should work well for most cases).

BVH Traversal Templates layer: though different types of geometric queries are often *similar in concept*, they often *differ slightly in detail*. Instead of only providing a fixed set of very specific geometric queries, cuBQL focuses on providing a set of traversal *templates* that, through the use of lambda functions, can easily be modified in their details. E.g., both a kNN and a find-closest-point query build on the same shrinking-radius query, differing only in how they process each candidate primitive encountered during traversal.

Various (specific) Geometric Queries, realized with the underlying layers. cuBQL provides these queries more as *samples* than anything else, fully assuming that many users will have requirements that the existing samples do not capture; the samples' use of the traversal templates should show how to realize those.
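
The shrinking-radius idea from the traversal-templates layer can be sketched on the host as follows. This assumes a simplified 2D binary BVH that follows the count/offset node convention of cuBQL's BinaryBVH; the names `shrinkingRadiusQuery` and `findClosest` are hypothetical, and the library's real traversal templates are GPU code with different signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Simplified 2D binary BVH following the count/offset convention:
// count == 0 -> inner node, children at nodes[offset] and nodes[offset+1];
// count  > 0 -> leaf holding primIDs[offset .. offset+count-1].
struct Node { float lo[2], hi[2]; uint32_t offset, count; };
struct BVH { std::vector<Node> nodes; std::vector<uint32_t> primIDs; };
struct Point { float x, y; };

// Squared distance from query q to a node's box (0 if q is inside it).
inline float sqrDistToBox(const Node& n, const float q[2]) {
  float d2 = 0.f;
  for (int i = 0; i < 2; ++i) {
    float d = 0.f;
    if (q[i] < n.lo[i]) d = n.lo[i] - q[i];
    else if (q[i] > n.hi[i]) d = q[i] - n.hi[i];
    d2 += d * d;
  }
  return d2;
}

// Generic shrinking-radius traversal: every node whose box lies within the
// current radius is visited; the user lambda processes one candidate
// primitive and returns the (possibly smaller) new squared radius.
template <typename ProcessPrim>
void shrinkingRadiusQuery(const BVH& bvh, const float q[2], float radius2,
                          ProcessPrim&& processPrim) {
  std::vector<uint32_t> stack{0};  // start at the root node
  while (!stack.empty()) {
    const Node& n = bvh.nodes[stack.back()];
    stack.pop_back();
    if (sqrDistToBox(n, q) > radius2) continue;  // culled by current radius
    if (n.count == 0) {
      // Fixed child order for simplicity; visiting the closer child first
      // would typically shrink the radius sooner.
      stack.push_back(n.offset);
      stack.push_back(n.offset + 1);
    } else {
      for (uint32_t i = 0; i < n.count; ++i)
        radius2 = processPrim(bvh.primIDs[n.offset + i], radius2);
    }
  }
}

// Closest-point query: the lambda tracks the best point found so far and
// shrinks the radius to its squared distance. Returns -1 if nothing is found.
inline int findClosest(const BVH& bvh, const std::vector<Point>& pts,
                       const float q[2]) {
  int best = -1;
  shrinkingRadiusQuery(bvh, q, std::numeric_limits<float>::infinity(),
                       [&](uint32_t primID, float r2) {
                         float dx = pts[primID].x - q[0];
                         float dy = pts[primID].y - q[1];
                         float d2 = dx * dx + dy * dy;
                         if (d2 < r2) { best = int(primID); return d2; }
                         return r2;
                       });
  return best;
}
```

A kNN query could reuse `shrinkingRadiusQuery` unchanged, with a lambda that keeps the k best candidates and returns the k-th best squared distance as the new radius.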

Supported BVH Type(s)

The main BVH type of this library is a binary BVH, where each node contains that node's bounding box, as well as two integer fields, count and offset (packed into one 64-bit word as a 16-bit and a 48-bit bitfield, respectively).

template<typename T, int D>   // T: scalar type (e.g., float), D: dimension (e.g., 3)
struct BinaryBVH {
  struct CUBQL_ALIGN(16) Node {
    box_t<T,D> bounds;
    uint64_t   offset : 48;
    uint64_t   count  : 16;
  };

  Node     *nodes;
  uint32_t  numNodes;
  uint32_t *primIDs;
  uint32_t  numPrims;
};

The count value is 0 for inner nodes; for leaf nodes it specifies the number of primitives in this leaf. For inner nodes, the offset indexes into the BinaryBVH::nodes[] array (the current node's two children are at nodes[offset] and nodes[offset+1], respectively); for leaf nodes it points into the BinaryBVH::primIDs[] array (i.e., that leaf contains primIDs[offset+0], primIDs[offset+1], etc.).
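
The offset/count convention can be exercised with a small host-side walk. The sketch below reuses the 48/16-bit bitfield layout from the `BinaryBVH::Node` definition above but substitutes a plain 2D float box for `box_t`; `Box2f` and `collectPrims` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 2D stand-in for box_t; only the bitfield layout below
// follows the BinaryBVH::Node definition.
struct Box2f { float lower[2], upper[2]; };

struct Node {
  Box2f    bounds;
  uint64_t offset : 48;  // inner: index of first child; leaf: index into primIDs
  uint64_t count  : 16;  // 0 for inner nodes, else number of prims in this leaf
};

// Appends the primIDs under nodeID to `out`, following the convention:
// count == 0 -> children at nodes[offset] and nodes[offset+1];
// count  > 0 -> primitives primIDs[offset] .. primIDs[offset+count-1].
inline void collectPrims(const std::vector<Node>& nodes,
                         const std::vector<uint32_t>& primIDs,
                         uint64_t nodeID, std::vector<uint32_t>& out) {
  const Node& n = nodes[nodeID];
  const uint64_t offset = n.offset, count = n.count;
  if (count == 0) {
    collectPrims(nodes, primIDs, offset + 0, out);
    collectPrims(nodes, primIDs, offset + 1, out);
  } else {
    for (uint64_t i = 0; i < count; ++i) out.push_back(primIDs[offset + i]);
  }
}
```

Checking that every primID shows up exactly once in `out` is a cheap sanity test for any builder that emits this layout.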

A WideBVH type (templated over BVH width) is supported as well. WideBVHes always have a fixed branching factor of N (i.e., a fixed number of N children in each inner node); however, some of these may be 'null' (marked as not valid). Note that most builders only work for binary BVHes; these can then be "collapsed" into WideBVHes.
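
One plausible way to "collapse" a binary BVH into an N-wide one is to start from a binary inner node's two children and repeatedly replace an inner child by its own two children until N slots are filled; unused slots stay 'null'. This is a hypothetical host-side sketch (bounds omitted for brevity; `BinNode`, `WideNode`, and `collapse` are illustrative names, and it assumes the root is an inner node), not cuBQL's actual collapse code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binary node using the count/offset convention (bounds omitted for brevity).
struct BinNode { uint32_t offset, count; };

// Hypothetical N-wide node: each slot is either 'null' (valid == false),
// an inner child (count == 0, offset = index of a wide node), or a leaf
// (count > 0, offset = index into primIDs).
template <int N>
struct WideNode {
  bool     valid[N];
  uint32_t offset[N];
  uint32_t count[N];
};

// Collapses the binary subtree rooted at inner node binID into wide nodes,
// appending them to `wide`; returns the index of the new wide node.
template <int N>
uint32_t collapse(const std::vector<BinNode>& bin, uint32_t binID,
                  std::vector<WideNode<N>>& wide) {
  static_assert(N >= 2, "wide nodes need at least two slots");
  assert(bin[binID].count == 0 && "sketch assumes an inner node");
  // Start with the two binary children; greedily replace the first inner
  // candidate by its two children until N slots are used or only leaves remain.
  std::vector<uint32_t> cand{bin[binID].offset, bin[binID].offset + 1};
  bool opened = true;
  while (opened && int(cand.size()) < N) {
    opened = false;
    for (std::size_t i = 0; i < cand.size(); ++i) {
      const BinNode& c = bin[cand[i]];
      if (c.count == 0) {
        cand[i] = c.offset;
        cand.push_back(c.offset + 1);
        opened = true;
        break;
      }
    }
  }
  const uint32_t wideID = uint32_t(wide.size());
  wide.emplace_back();  // value-initialized: all slots start 'null'
  for (int k = 0; k < int(cand.size()); ++k) {
    const BinNode c = bin[cand[k]];
    // Recurse before writing: emplace_back inside the call may reallocate `wide`.
    const uint32_t off = (c.count == 0) ? collapse<N>(bin, cand[k], wide) : c.offset;
    wide[wideID].valid[k]  = true;
    wide[wideID].offset[k] = off;
    wide[wideID].count[k]  = c.count;
  }
  return wideID;
}
```

This greedy scheme opens the first inner candidate it finds; a more careful collapse might instead pick which child to open based on a cost such as surface area.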

Though most of the algorithms and data types in this library could absolutely be templated over both dimensionality and underlying data type (e.g., a BVH over double4 data rather than float3), for the sake of readability this has not been done (yet?) in this particular implementation. If this is a feature you would like to have, please let me know.

(on-GPU) BVH Construction

The main workhorse of this library is a CUDA-accelerated, on-device parallel BVH builder (with spatial median splits). The primary feature of this builder is its simplicity: it is still "reasonably fast", but much simpler than other variants. Though performance will obviously vary for different data types, data distributions, etc., right now this builder builds a BinaryBVH over 10 million uniformly distributed random points in under 13ms; that's not the fastest builder I have, but IMHO quite reasonable for most applications. In addition to this cuBQL::gpuBuilder() there are also various other builders, including a regular Morton/radix builder, a wide GPU builder (for BVHes with branching factors greater than 2), a surface-area-heuristic (SAH) builder, and a modified Morton/radix builder that is significantly more robust than a regular Morton/radix builder for numerically challenging inputs.
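
For intuition about spatial median splits, here is a small recursive host-side sketch: it splits each node at the midpoint of its primitives' centroid bounds along the widest axis, and falls back to splitting the primitive list in half when that split is degenerate (e.g., all centroids coincide), which is one simple way to stay robust on challenging input. It is not cuBQL's gpuBuilder, which builds on the device in parallel; `Box3`, `SimpleBVH`, `buildHostBVH`, and the `maxLeafSize` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Box3 { float lower[3], upper[3]; };
struct Node { Box3 bounds; uint32_t offset, count; };  // count == 0: inner node
struct SimpleBVH { std::vector<Node> nodes; std::vector<uint32_t> primIDs; };

inline float center(const Box3& b, int d) { return 0.5f * (b.lower[d] + b.upper[d]); }

// Builds the subtree over primIDs[begin, end) into nodes[nodeID].
inline void buildRec(SimpleBVH& bvh, const std::vector<Box3>& boxes, uint32_t nodeID,
                     uint32_t begin, uint32_t end, uint32_t maxLeafSize) {
  // Compute the node's bounds and the bounds of the primitives' centroids.
  Box3 bounds = boxes[bvh.primIDs[begin]];
  Box3 cent;
  for (int d = 0; d < 3; ++d) cent.lower[d] = cent.upper[d] = center(bounds, d);
  for (uint32_t i = begin; i < end; ++i) {
    const Box3& b = boxes[bvh.primIDs[i]];
    for (int d = 0; d < 3; ++d) {
      bounds.lower[d] = std::min(bounds.lower[d], b.lower[d]);
      bounds.upper[d] = std::max(bounds.upper[d], b.upper[d]);
      cent.lower[d] = std::min(cent.lower[d], center(b, d));
      cent.upper[d] = std::max(cent.upper[d], center(b, d));
    }
  }
  bvh.nodes[nodeID].bounds = bounds;
  if (end - begin <= maxLeafSize) {  // small enough: make a leaf
    bvh.nodes[nodeID].offset = begin;
    bvh.nodes[nodeID].count = end - begin;
    return;
  }
  // Spatial median: split the widest centroid axis at its midpoint.
  int dim = 0;
  for (int d = 1; d < 3; ++d)
    if (cent.upper[d] - cent.lower[d] > cent.upper[dim] - cent.lower[dim]) dim = d;
  const float mid = 0.5f * (cent.lower[dim] + cent.upper[dim]);
  uint32_t* ids = bvh.primIDs.data();
  uint32_t split = uint32_t(std::partition(ids + begin, ids + end, [&](uint32_t id) {
                              return center(boxes[id], dim) < mid;
                            }) - ids);
  // Fallback for degenerate splits (one side empty): split the list in half.
  if (split == begin || split == end) split = begin + (end - begin) / 2;
  const uint32_t child = uint32_t(bvh.nodes.size());
  bvh.nodes.resize(child + 2);  // children stored as a pair: nodes[offset], nodes[offset+1]
  bvh.nodes[nodeID].offset = child;
  bvh.nodes[nodeID].count = 0;
  buildRec(bvh, boxes, child, begin, split, maxLeafSize);
  buildRec(bvh, boxes, child + 1, split, end, maxLeafSize);
}

// Takes one box per primitive and returns nodes plus a permuted primID array.
inline SimpleBVH buildHostBVH(const std::vector<Box3>& boxes, uint32_t maxLeafSize = 2) {
  assert(!boxes.empty() && maxLeafSize >= 1 && "sketch assumes non-empty input");
  SimpleBVH bvh;
  bvh.primIDs.resize(boxes.size());
  for (uint32_t i = 0; i < uint32_t(boxes.size()); ++i) bvh.primIDs[i] = i;
  bvh.nodes.resize(1);  // root node
  buildRec(bvh, boxes, 0, 0, uint32_t(boxes.size()), maxLeafSize);
  return bvh;
}
```

The midpoint split never looks at primitive counts, which keeps each step trivially cheap; the price is that unevenly distributed data can produce unbalanced trees, which is where the SAH builder mentioned above typically does better.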

For all builders, the overall build process is always the same: create an array of bounding boxes (one box per primitive), and call the builder with a pointer to this array and the number of primitives. For GPU-side builders this array has to live in device (or managed) memory; for host-side builds it has to be in host memory. Obviously, device-side builders create the node and primitive ID arrays in device memory, while host-side builders create them in host memory.

Given such an array, the builder (in this case, for float3 data) gets invoked as follows:

#include "cuBQL/bvh.h"
...
box3f *d_boxes = 0;
int numBoxes = 0;
userCodeForGeneratingPrims(&d_boxes,&numBoxes, ...);
......

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New repo with 66 stars, routine for NVIDIA.