docs(rust): add async public API design #398
Conversation
This design document proposes exposing a parallel, native-async public API on the Rust ADBC kernel. The existing synchronous ADBC traits and the C FFI surface used by the ODBC driver are preserved byte-identically.

Motivation: the Node.js SQL driver's SEA support is being built as a napi-rs native addon that wraps this kernel. napi-rs cannot consume the current sync-over-async public surface because `block_on` / `block_in_place` inside a napi-rs tokio worker thread causes nested-runtime panics.

The design covers runtime-ownership refactoring, async constructors, async metadata/execute methods, a new `AsyncResultReader` trait, async auth provider mirrors, Drop semantics for borrowed-runtime mode, a phased implementation plan, a testing strategy, and compatibility guarantees for ODBC.

Co-authored-by: Isaac
lidavidm
left a comment
Two things:
(1) Please see the upstream proposal: apache/arrow-adbc#3714
(2) We just released an ADBC driver manager for Node.js; is there any reason we can't use that instead?
Adds §18 to the async public API design, based on a full audit of the Node driver's existing auth/transport stack and how it interacts with the proposed kernel async surface. Five gaps not fully addressed in the initial design are documented:

1. The kernel's `set_auth_provider` uses `OnceLock` — panics on a second call — which blocks any post-construction token rotation. Proposes adding interior mutability (`Arc<RwLock<String>>`) to `PersonalAccessToken` with an `update_token` setter; a small kernel change (~20 LOC, sketched below).
2. Mid-stream token refresh strategy: TS-driven pre-emptive push based on the JWT `exp` claim, plus a 401-retry safety net for idempotent metadata ops only.
3. TLS trust-store mismatch — Node uses its bundled Mozilla CA list plus the `NODE_EXTRA_CA_CERTS` env var; Rust uses the OS system store and ignores `NODE_EXTRA_CA_CERTS`. Documents concrete failure scenarios, proposes a TS-side bridge (`caPath` option, temp-file serialization for in-memory `ca` Buffers), and flags a kernel PR for in-memory `Certificate::from_pem`.
4. mTLS gap — Node supports client certs, the kernel doesn't. Flagged as a known SEA v1 limitation with a separate kernel issue.
5. Proxy divergence — Rust has no SOCKS support. The Node binding must fail fast with a clear error for SOCKS users.

Also addresses the User-Agent masquerade (the kernel defaults to a JDBC UA string that would misattribute SEA traffic) and the read-timeout default (60s is too short for real workloads), and adds five new Open Questions plus a new Phase 2b for the mutable-PAT work.

Co-authored-by: Isaac
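A hedged sketch of the item-1 kernel change: the type and method names follow the summary above (`PersonalAccessToken`, `update_token`); the locking scheme and `auth_header` accessor are illustrative, not the kernel's actual API.

```rust
use std::sync::{Arc, RwLock};

pub struct PersonalAccessToken {
    // Interior mutability replaces the write-once OnceLock, so the Node
    // driver can rotate the token after construction.
    token: Arc<RwLock<String>>,
}

impl PersonalAccessToken {
    pub fn new(token: impl Into<String>) -> Self {
        Self { token: Arc::new(RwLock::new(token.into())) }
    }

    /// Replaces the token in place. In-flight requests keep the value they
    /// already read; subsequent requests pick up the new one.
    pub fn update_token(&self, token: impl Into<String>) {
        *self.token.write().expect("token lock poisoned") = token.into();
    }

    pub fn auth_header(&self) -> String {
        format!("Bearer {}", self.token.read().expect("token lock poisoned"))
    }
}
```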
## 1. Summary
This design proposes adding a **parallel, native-async public API** to the `databricks-adbc` Rust kernel. The existing synchronous ADBC trait surface (which the ODBC C++ driver consumes via a C FFI) remains byte-identical. The new async surface unlocks a Node.js binding (and future Python/JNI bindings) that cannot tolerate the `block_on` / `block_in_place` bridges currently used everywhere a sync public method wraps async internals.
We already have a Node.js driver manager that could load the driver without this; it would be better for this proposal to go to the arrow-adbc repo.
We have yet to align on whether we want the Rust kernel to adhere to the ADBC spec. If the decision is not to follow the spec but to have an independent Databricks Rust kernel, then this binding might be reimplemented. This doc PR is just an experiment in what we can achieve; the final design doc will be decided later.
Adds §19 — a validation pass comparing this design against the actual implementation of `@apache-arrow/adbc-driver-manager` (Apache's Node-side napi-rs wrapper over the ADBC C API). Reading Apache's implementation produced three categories of findings.

**Patterns to adopt verbatim** (§19.2):

- Capture-then-reject error pattern (sketched below). Never use `env.throw()` inside `AsyncTask::reject` — it creates a pending V8 exception that escapes as an `uncaughtException` instead of rejecting the Promise. Fix: capture the raw error in `compute()` into a field on the Task, then construct a structured JS Error object in `reject()` via `create_error` + `set_named_property`. Non-negotiable — getting this wrong produces production-breaking `uncaughtException`s.
- `Option<Arc<T>>` state with `close()` = `take()` — uniform across Database/Connection/Statement/ResultIterator.
- Raw napi classes prefixed with underscore, TS wrappers mandatory (our translation layer).
- `Symbol.asyncDispose` support for `using` / `await using` ergonomics.
- String-keyed `HashMap` for options.

**Patterns to deliberately diverge from** (§19.3):

- Streaming granularity: Apache drains the entire `RecordBatchReader` into one IPC buffer per query, which would OOM on our EXTERNAL_LINKS GB-scale results. We stay with batch-at-a-time `next_batch()`.
- `AsyncTask` + libuv vs `#[napi] async fn` + tokio: Apache is sync-kernel, so `AsyncTask` fits; our kernel is async, so `#[napi] async fn` is right.
- Node version floor: we need Node 16+; Apache requires Node 22+. Not a choice for us.
- musl-linux: included in our matrix; Apache excludes it.
- `cancel()`: Apache omits it; we need it for long SEA queries.

**Validated existing design choices:**

- Arrow IPC bytes as the FFI wire format.
- Two-layer TS wrapper over raw napi classes.
- Structured JS error with `sqlState`.
- `IBackend` / `SeaBackend` split.

Four independent reasons reaffirm we cannot adopt Apache's npm package directly (§19.7): ADBC-compliance coupling risk, alpha status, the Node 22+ floor, and the auth feature gap. We stay aligned with their binding-layer conventions while building our own crate.

Adds two new open questions (#12 runtime model for Borrowed mode, #13 capture-then-reject style guide) and updates the file-by-file map with concrete version pins (napi-rs 3.0, napi-build 2, arrow-ipc/array/schema >=53.1.0).

Co-authored-by: Isaac
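A hedged sketch of the capture-then-reject pattern against napi-rs's `Task` trait. `KernelError`, `run_query`, and the property layout are placeholders, and it assumes (matching Apache's usage) that the `Ok` value returned from `reject()` becomes the Promise's rejection value.

```rust
use napi::{Env, Error, JsObject, Result, Task};

// Placeholders for the kernel side of the sketch.
struct KernelError { message: String, sql_state: String }
fn run_query(_sql: &str) -> std::result::Result<Vec<u8>, KernelError> { unimplemented!() }

struct QueryTask {
    sql: String,
    captured: Option<KernelError>, // raw error captured in compute()
}

impl Task for QueryTask {
    type Output = Vec<u8>; // e.g. Arrow IPC bytes
    type JsValue = JsObject;

    fn compute(&mut self) -> Result<Self::Output> {
        // Runs on a libuv worker thread: no Env is available here, so we
        // only capture the failure and return a plain napi Error.
        run_query(&self.sql).map_err(|e| {
            let reason = e.message.clone();
            self.captured = Some(e);
            Error::from_reason(reason)
        })
    }

    fn resolve(&mut self, env: Env, output: Self::Output) -> Result<Self::JsValue> {
        let mut obj = env.create_object()?;
        obj.set_named_property("ipc", env.create_buffer_with_data(output)?.into_raw())?;
        Ok(obj)
    }

    fn reject(&mut self, env: Env, err: Error) -> Result<Self::JsValue> {
        // Never env.throw() here: a pending V8 exception escapes as an
        // uncaughtException instead of rejecting the Promise. Build a
        // structured JS Error and hand it back as the rejection value.
        let mut js_err = env.create_error(err)?;
        if let Some(k) = self.captured.take() {
            js_err.set_named_property("sqlState", env.create_string(&k.sql_state)?)?;
        }
        Ok(js_err)
    }
}
```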
Adds a short §20 to the design covering the convergence plan with apache/arrow-adbc#3714 (an active upstream PR that adds async traits to `adbc_core`). Three items:

1. Adopt RPITIT style (`fn foo() -> impl Future<Output = T> + Send`, sketched below) instead of `#[async_trait]` from day one. Supersedes prior references to async-trait in §5.4/§5.5. Zero-cost convergence when #3714 lands.
2. Convergence is a mechanical ~200 LOC refactor when upstream merges — no behavioral change, and the napi-rs binding is unaffected. Our `AsyncResultReader` becomes `Pin<Box<dyn RecordBatchStream + Send>>`; our inherent methods become trait impls; our Databricks-specific metadata method variants go away since upstream `get_objects` covers them. Our `AsyncAuthProvider` stays — ADBC has no auth trait.
3. The deep-dive analysis of the upstream PR and the Apache Node driver manager (`@apache-arrow/adbc-driver-manager`) lives in a companion reference doc (adbc-upstream-references.md) to keep the design focused on architecture rather than investigation notes.

Why this was not folded into §19 (Apache Node binding validation): different subject matter (runtime trait shape vs binding-layer patterns) and different convergence paths (we already implement the `adbc_core` sync traits and will add the async ones; we don't adopt the Node package).

Co-authored-by: Isaac
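A minimal sketch of the RPITIT shape adopted in item 1 (Rust 1.75+); trait, struct, and method names are illustrative, not the kernel's actual surface.

```rust
use std::future::Future;

pub struct KernelError; // placeholder error type for the sketch

pub trait AsyncStatement {
    // The trait names its future inline (RPITIT): no #[async_trait] macro,
    // no boxed futures, and the same shape apache/arrow-adbc#3714 uses,
    // so later convergence is a mechanical rename, not a macro migration.
    fn execute_update(&mut self, sql: &str)
        -> impl Future<Output = Result<u64, KernelError>> + Send;
}

pub struct SeaStatement;

impl AsyncStatement for SeaStatement {
    fn execute_update(&mut self, _sql: &str)
        -> impl Future<Output = Result<u64, KernelError>> + Send {
        // A real implementation would drive the SEA HTTP calls here.
        async move { Ok(0) }
    }
}
```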
For (2), I assume you mean the Node.js side? @kentkwu can you help out here?
Summary of the design
Proposes a parallel, native-async public API on the Rust ADBC kernel. The existing synchronous ADBC trait implementations and the C FFI surface used by the ODBC driver are preserved byte-identically — ODBC keeps linking against the same symbols with the same behavior.
Motivation: the Node.js SQL driver's SEA support is being built as a napi-rs native addon that wraps this kernel. napi-rs cannot consume the current sync-over-async public surface because `block_on` / `block_in_place` inside a napi-rs tokio worker thread triggers nested-runtime panics or starves the worker pool.

Approach: publish the async infrastructure that already exists inside the kernel (e.g. `StreamingCloudFetchProvider::next_batch`, `#[async_trait] DatabricksClient`, the existing `CancellationToken` plumbing, memory-bounded concurrency via `chunks_in_memory: AtomicUsize`, and the `TokenStore` Fresh/Stale/Expired state machine) as an additive public contract. No new async primitives are built — this is mostly a visibility / surface-area change.

Estimated ~700 LOC added, ~0 LOC deleted, across a 9-phase implementation plan.
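A minimal reproduction of the failure mode driving this design (not kernel code; `fetch_token` stands in for the kernel's async-internal auth path):

```rust
use tokio::runtime::Handle;

// Stand-in for the kernel's async-internal work (e.g. the auth path).
async fn fetch_token() -> String {
    "token".into()
}

// Shape of the bridge behind today's sync public surface.
fn get_auth_header_sync(kernel_rt: &Handle) -> String {
    // From a plain ODBC C/C++ thread this is fine: the thread can block.
    // From inside a #[napi] async fn, which already runs on a tokio worker
    // thread, block_on panics with a nested-runtime error, and the
    // block_in_place variant ties up the worker pool instead.
    kernel_rt.block_on(fetch_token())
}
```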
This PR is docs-only. Once aligned, each phase in §10 becomes a separate code PR.
Key decisions and alternatives considered
Inherent async methods on existing structs, parallel to sync ADBC trait impls — chosen. Both surfaces share client, session, and handle state. Rejected alternative: separate crate for the async surface (would require lifting shared types into a third core crate with arbitrary-feeling visibility boundaries).
New `AsyncResultReader` trait next to the existing `ResultReader` — chosen. The existing `ResultReader` feeds `ResultReaderAdapter` → Arrow's `RecordBatchReader`, which is a fundamentally sync iterator and cannot be async. A parallel trait is cleaner than retrofitting.

`RuntimeMode` enum (`Owned(Runtime)` | `Borrowed(Handle)`) — proposed; see the sketch after this list. ODBC continues to get the `Owned` path (today's behavior, zero change). napi-rs passes a `Handle` and gets `Borrowed` mode. Alternative: always take an optional `Handle`. Flagged as an Open Question for reviewer input.

`AsyncAuthProvider` as a new trait, not a replacement for `AuthProvider` — chosen. Every concrete auth type (PAT, OAuth M2M, OAuth U2M) implements both. The sync trait stays forever as the ADBC contract. Removes the `block_in_place + Handle::current().block_on(...)` bridge inside `get_auth_header` for async callers.

Gated behind an `async-api` Cargo feature, enabled by default — proposed. Open Question for reviewers.

Drop in Borrowed mode spawns cleanup onto the `Handle` and returns immediately — chosen. Avoids JS-thread freezes during V8 GC of napi-rs wrappers. Callers should prefer explicit `close_async().await` for deterministic cleanup.
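A hedged sketch of the `RuntimeMode` decision and the Borrowed-mode Drop behavior described above; variant names mirror the design, and the cleanup body is illustrative.

```rust
use tokio::runtime::{Handle, Runtime};

pub enum RuntimeMode {
    /// Kernel creates and owns its runtime: today's ODBC path, unchanged.
    Owned(Runtime),
    /// Kernel borrows the host's runtime: the napi-rs path.
    Borrowed(Handle),
}

impl RuntimeMode {
    /// Uniform access point for spawning work in either mode.
    pub fn handle(&self) -> Handle {
        match self {
            RuntimeMode::Owned(rt) => rt.handle().clone(),
            RuntimeMode::Borrowed(h) => h.clone(),
        }
    }
}

struct Connection {
    mode: RuntimeMode,
    // ... session state ...
}

impl Drop for Connection {
    fn drop(&mut self) {
        if let RuntimeMode::Borrowed(handle) = &self.mode {
            // Spawn-and-forget: never block the JS thread during V8 GC.
            // Callers wanting determinism use close_async().await instead.
            handle.spawn(async move {
                // release the server-side session (illustrative)
            });
        }
        // Owned mode: dropping the Runtime shuts down as it does today.
    }
}
```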
Rejected alternatives (§14 of the doc):

- Removing `block_on` from sync methods entirely — sync-only C/C++ callers have no runtime to enter.
- Making the public surface async-only; each language binding (napi-rs, PyO3, JNI) consumes the Rust-native async API directly.

Areas needing specific review focus
**ADBC / ODBC byte-identical guarantee (§8 of the doc).** Does the plan actually leave ODBC behavior unchanged? The existing `databricks-odbc` integration test suite is the proposed tripwire — is that sufficient, or do we need additional gates?

**Runtime ownership model (§4.3, §6.5).** Is the `RuntimeMode` enum the right abstraction, or would an optional `Handle` (always Borrowed when provided, always Owned otherwise) be simpler?

**`AsyncResultReader` trait shape (§5.4; sketched below).** Does `schema()` / `next_batch()` / `cancel()` cover every realistic consumer need, or are we missing things like row-count hints, schema metadata propagation, or partition iteration?

**Drop semantics (§6.4).** Is spawn-and-forget in Borrowed mode acceptable, or do we need a stronger guarantee (e.g., a reaper task that awaits all outstanding cleanups on kernel shutdown)?

**Feature flag default (§9).** Enable `async-api` by default, or opt-in?

**Upstream coordination / rollout.** This design benefits from a discussion with kernel maintainers before the first code PR lands. Who owns that conversation, and on what timeline?
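For reference while reviewing the §5.4 question above, a sketch of the trait shape in the RPITIT style adopted in §20. Only the three method names come from the design; the error type and bounds are placeholders.

```rust
use std::future::Future;

use arrow_array::RecordBatch;
use arrow_schema::SchemaRef;

pub struct KernelError; // placeholder for the kernel's error type

pub trait AsyncResultReader: Send {
    /// Result-set schema, available before the first batch arrives.
    fn schema(&self) -> SchemaRef;

    /// Next batch, or Ok(None) once the stream is exhausted.
    fn next_batch(&mut self)
        -> impl Future<Output = Result<Option<RecordBatch>, KernelError>> + Send;

    /// Best-effort cancellation of the server-side query and in-flight fetches.
    fn cancel(&mut self)
        -> impl Future<Output = Result<(), KernelError>> + Send;
}
```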
Notes
The source repo (msrathore-db/databricks) is not GitHub-linked as a fork, so a cross-repo PR wasn't possible.

Co-authored-by: Isaac