
feat(rust-pyo3): add POC PyO3 Python bindings for the kernel #423

Draft
vikrantpuppala wants to merge 2 commits into main from rust-pyo3/poc-binding

Conversation

@vikrantpuppala

Summary

This is a proof-of-concept / RFC. Not for merge as-is.

Adds a new satellite crate rust-pyo3/ that exposes the Databricks ADBC Rust kernel to Python via PyO3 + abi3. This validates Decision 2 of the multi-language strategy in docs/kernel-strategy-final-recommendation.md: a first-party PyO3 binding rather than routing through adbc_driver_manager.

What works

  • End-to-end smoke against a live SEA warehouse (PAT auth).
  • Inline + CloudFetch result paths.
  • pyarrow.Table return via Arrow C Data Interface (zero-copy where the schema permits).
  • abi3-py39 wheel — a single wheel covers Python 3.9+.
  • Performance at parity with or better than the existing Thrift-based path on most query sizes (see the numbers in #422).

Public surface

import databricks_adbc_pyo3 as dbx

conn  = dbx.Connection(host, http_path, access_token)  # keyword-only options: catalog=None, schema=None
rs    = conn.execute(sql)            # → ResultSet (returns when schema is known)
rs.num_columns(); rs.column_names(); rs.arrow_schema()
batch = rs.fetch_next_batch()        # → pyarrow.RecordBatch | None  (streaming)
table = rs.fetch_all_arrow()         # → pyarrow.Table              (drains rest)
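
For orientation, here is a minimal sketch of what the binding-side declaration for this surface could look like, assuming pyo3 with the abi3-py39 feature. This is illustrative, not the crate's actual code; the kernel wiring is elided:

use pyo3::prelude::*;

#[pyclass]
struct Connection;

#[pymethods]
impl Connection {
    #[new]
    #[pyo3(signature = (host, http_path, access_token, *, catalog=None, schema=None))]
    fn new(
        host: String,
        http_path: String,
        access_token: String,
        catalog: Option<String>,
        schema: Option<String>,
    ) -> Self {
        // The real binding feeds these into the kernel's string-keyed config.
        let _ = (host, http_path, access_token, catalog, schema);
        Connection
    }
}

#[pymodule]
fn databricks_adbc_pyo3(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<Connection>()
}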

Explicitly out of scope (POC)

  • PAT only — no OAuth M2M, U2M, Azure SP, or external credential providers.
  • No metadata methods (get_objects, get_table_schema, get_table_types).
  • No async execute, no cancel(), no Ctrl-C signal handling, no logging bridge into Python logging.
  • No prepared statements / parameter binding from Python.
  • No tests, no CI integration, not packaged for PyPI.

Design notes for reviewers

  • Wraps the kernel's ADBC Optionable layer (string-keyed config). The kernel-strategy doc proposes typed Rust config structs (DatabricksConfig, AuthConfig, …) as Phase 0a; once that lands, this binding should switch to typed construction. The string-keyed layering is a deliberate POC compromise: Phase 0a is a separate workstream, and it didn't seem worth blocking the binding spike on it.

  • Uses a new Statement::execute_owned inherent method (added in #422) that returns Box<dyn RecordBatchReader + Send + 'static>, decoupling the reader's lifetime from the Statement. This lets the PyO3 wrapper hold the reader past the Statement's drop, which is needed for streaming.

  • Releases the GIL during all kernel-side work (execute_owned, batch drain) and reacquires it once for the pyarrow conversion phase; a condensed sketch of this path follows the list.
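
A condensed sketch of the execute path described above. The shapes are assumed, not the PR's exact code: Statement is a stub, execute_owned is the inherent method added in #422, and the Mutex is one way to satisfy #[pyclass]'s thread-safety requirements for the !Sync reader:

use std::sync::Mutex;

use arrow::error::ArrowError;
use arrow::record_batch::RecordBatchReader;
use pyo3::exceptions::PyRuntimeError;
use pyo3::prelude::*;

struct Statement; // stand-in for the kernel type

impl Statement {
    fn execute_owned(
        &mut self,
        _sql: &str,
    ) -> Result<Box<dyn RecordBatchReader + Send + 'static>, ArrowError> {
        unimplemented!("kernel call")
    }
}

#[pyclass]
struct ResultSet {
    // The owned reader outlives the Statement that produced it, which is
    // what lets fetching keep streaming after the Statement drops.
    reader: Mutex<Box<dyn RecordBatchReader + Send + 'static>>,
}

fn execute_impl(py: Python<'_>, mut stmt: Statement, sql: &str) -> PyResult<ResultSet> {
    // All kernel-side work runs with the GIL released.
    let reader = py
        .allow_threads(|| stmt.execute_owned(sql))
        .map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
    drop(stmt); // the reader's lifetime is decoupled from the Statement
    Ok(ResultSet { reader: Mutex::new(reader) })
}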

Dependency

Depends on #422 for the execute_owned API and the server_side_closed cleanup short-circuit. This branch is based on that one; if #422 lands first, this rebases trivially onto main.

Open questions

  1. Do we want to keep this satellite under databricks-adbc/rust-pyo3/ long-term, or move it to its own repo (databricks-adbc-python or similar) once it grows beyond a POC?
  2. Phase 0a (typed config structs) is a precondition for a production-quality binding. Should this PR land as-is and we do Phase 0a + a typed-API rewrite as a follow-up, or is it cleaner to hold this until Phase 0a is done?
  3. The kernel-strategy doc commits to PyO3 + abi3, but the existing Python design doc (python-driver-rust-adbc-sea-design.md §4.2-4.3) still describes the adbc_driver_manager integration. The newer doc supersedes those sections; do we rewrite them as part of this work or as a follow-up?

Test plan

  • maturin develop --release produces an abi3-py39 wheel.
  • import databricks_adbc_pyo3 works in a fresh venv.
  • examples/e2e_smoke.py round-trips small (SELECT 1) and large (1M rows via range()) queries against a dogfood warehouse.
  • Production tests, CI integration, real auth, real packaging — all deferred.

This pull request and its description were written by Isaac.

Commit 1: perf(rust): reduce per-query overhead and coalesce small batches (#422)

Four small kernel changes that together close the per-query gap vs the
existing Thrift backend on small/medium results, with no regressions on
large CloudFetch queries. Illustrative sketches of each change follow the
list.

1. Skip redundant DELETE for inline-Closed statements.
   The SEA server returns status=Closed alongside inline result data —
   the statement is already cleaned up server-side, so issuing a DELETE
   is a wasted round-trip (~250ms). Plumb a `server_side_closed: bool`
   through ExecuteResult; Statement::execute_single skips registering
   the statement_id for cleanup when set.

2. Make Drop for Statement non-blocking.
Drop previously called block_on(close_statement(...)), forcing every caller
   to pay a synchronous cleanup round-trip even when nothing was waiting
   for the result. Spawn the close on the runtime instead — best-effort
   fire-and-forget. Saves ~250ms on every CloudFetch query.

3. Coalesce small batches on the inline path.
   InlineArrowProvider was emitting 200+ tiny RecordBatches per 100K-row
   result (one per IPC message). Add a batch_merge_target_rows knob and
   apply the same coalescing logic the CloudFetch download path uses.
Reduces per-batch overhead in language bindings (e.g. PyO3, ODBC).

4. Enable batch_merge_target_rows by default (128k rows).
   Was 0 (disabled). All consumers now get coalesced batches by default;
   no API change.
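
A hedged sketch of change (1). Field and function names follow the commit text; the surrounding types are simplified stand-ins, not the kernel's actual definitions:

struct ExecuteResult {
    statement_id: String,
    // True when the SEA response carried status=Closed alongside inline
    // result data, i.e. the server has already cleaned the statement up.
    server_side_closed: bool,
}

fn register_for_cleanup(result: &ExecuteResult, pending_cleanup: &mut Vec<String>) {
    if result.server_side_closed {
        // Statement is already gone server-side; a DELETE here would be
        // a wasted ~250ms round-trip.
        return;
    }
    pending_cleanup.push(result.statement_id.clone());
}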
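
Change (2) in sketch form, assuming a tokio runtime handle is available on the Statement (the actual field layout may differ):

use tokio::runtime::Handle;

struct Statement {
    statement_id: String,
    runtime: Handle,
}

async fn close_statement(_id: String) -> Result<(), std::io::Error> {
    Ok(()) // stand-in for the HTTP DELETE round-trip
}

impl Drop for Statement {
    fn drop(&mut self) {
        // Previously: self.runtime.block_on(close_statement(...)), a
        // synchronous ~250ms round-trip on every drop. Spawning turns
        // cleanup into best-effort fire-and-forget.
        let id = self.statement_id.clone();
        self.runtime.spawn(async move {
            let _ = close_statement(id).await; // errors intentionally ignored
        });
    }
}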
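
Change (3) as a sketch: accumulate small batches until a row target is hit, then merge with arrow's concat_batches. The Coalescer type is illustrative; only the batch_merge_target_rows knob comes from the commit text:

use arrow::compute::concat_batches;
use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

struct Coalescer {
    schema: SchemaRef,
    target_rows: usize, // the batch_merge_target_rows knob
    pending: Vec<RecordBatch>,
    pending_rows: usize,
}

impl Coalescer {
    /// Buffer `batch`; emit one merged batch once the row target is reached.
    fn push(&mut self, batch: RecordBatch) -> Result<Option<RecordBatch>, ArrowError> {
        self.pending_rows += batch.num_rows();
        self.pending.push(batch);
        if self.pending_rows < self.target_rows {
            return Ok(None); // keep accumulating tiny IPC-message batches
        }
        let merged = concat_batches(&self.schema, &self.pending)?;
        self.pending.clear();
        self.pending_rows = 0;
        Ok(Some(merged))
    }
}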
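
And change (4), the default flip, with an illustrative config struct name:

pub struct BatchMergeConfig {
    /// Merge adjacent batches until roughly this many rows; 0 disables merging.
    pub batch_merge_target_rows: usize,
}

impl Default for BatchMergeConfig {
    fn default() -> Self {
        // Was 0 (disabled). 128k rows by default means every consumer now
        // gets coalesced batches with no API change.
        Self { batch_merge_target_rows: 128 * 1024 }
    }
}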

Measured on a dogfood warehouse: randomized interleaved benchmark vs the
Thrift backend, median wall time on the fetchall_arrow path. The ratio
column is Rust (after) / Thrift; lower is better.

  size        Rust (before)   Rust (after)   Thrift    ratio
  SELECT 1        500 ms          394 ms      387 ms    1.02x
  10K rows        950 ms          893 ms     1014 ms    0.88x
  500K rows      2600 ms         2178 ms     3305 ms    0.66x
  1M rows        3700 ms         3579 ms     3814 ms    0.94x
  10M rows       8700 ms         8677 ms     8802 ms    0.99x
Commit 2: feat(rust-pyo3): add POC PyO3 Python bindings for the kernel

This is a proof-of-concept satellite crate that exposes the Rust kernel
to Python via PyO3 + abi3, validating the multi-language strategy in
docs/kernel-strategy-final-recommendation.md (Decision 2: PyO3 over
adbc_driver_manager for Python).

Status: DRAFT / RFC. Not for merge as-is.

Surface:
  Connection(host, http_path, access_token, *, catalog=None, schema=None)
    .execute(sql)              -> ResultSet
  ResultSet
    .num_columns(), .column_names(), .arrow_schema()
    .fetch_next_batch()        -> pyarrow.RecordBatch | None  (streaming)
    .fetch_all_arrow()         -> pyarrow.Table              (drains rest)

Working:
- End-to-end smoke against a live SEA warehouse (PAT auth).
- Inline + CloudFetch result paths.
- pyarrow.Table return via Arrow C Data Interface (zero-copy where
  schema permits).
- abi3-py39 wheel — one wheel covers Python 3.9+.
- Performance at parity with or better than the existing Thrift-based
  path on most query sizes.

Deferred (POC scope):
- PAT only — no OAuth M2M / U2M / Azure SP / external credential providers.
- No metadata methods (get_objects, get_table_schema, get_table_types).
- No async execute, cancel, Ctrl-C signal handling, logging bridge.
- No prepared statements / parameter binding.
- No tests, no CI integration, not packaged for PyPI.

Design:
- Wraps the kernel's ADBC Optionable layer (string-keyed config).
  Should switch to typed config structs once Phase 0a (DatabricksConfig,
  AuthConfig, ...) lands per the kernel-strategy doc.
- Uses a new `Statement::execute_owned` inherent method (in this PR's
  parent perf branch) that returns `Box<dyn RecordBatchReader + Send +
  'static>`, decoupling the reader's lifetime from the Statement so the
  binding can hold the reader past the Statement's drop.
- Releases the GIL during all kernel-side work.

Depends on the perf-small-query-optimizations branch for the
`execute_owned` API and `server_side_closed` cleanup short-circuit.