perf(rust): reduce per-query overhead and coalesce small batches#422

Open
vikrantpuppala wants to merge 1 commit into `main` from `rust/perf-small-query-optimizations`

Conversation

@vikrantpuppala
Collaborator

Summary

Four small kernel changes that together close the per-query gap vs the existing Thrift backend on small/medium results, with no regressions on large CloudFetch queries.

  1. Skip redundant DELETE for inline-Closed statements. The SEA server returns status=Closed alongside inline result data — the statement is already cleaned up server-side, so issuing a DELETE is a wasted round-trip (~250ms). Plumbs a server_side_closed: bool through ExecuteResult; Statement::execute_single skips registering the statement_id for cleanup when set.

  2. Make Drop for Statement non-blocking. Drop previously called block_on(close_statement(...)), forcing every caller to pay a synchronous cleanup round-trip even when nothing was waiting for the result. Spawn the close on the runtime instead — best-effort fire-and-forget. Saves ~250ms on every CloudFetch query.

  3. Coalesce small batches on the inline path. InlineArrowProvider was emitting 200+ tiny RecordBatches per 100K-row result (one per IPC message). Adds a batch_merge_target_rows parameter and applies the same coalescing logic the CloudFetch download path already uses. Reduces per-batch overhead at language bindings (e.g. PyO3, ODBC).

  4. Enable batch_merge_target_rows by default (128k rows). Was 0 (disabled). All consumers now get coalesced batches by default; no API change.

Behavior change to call out

Default batch_merge_target_rows flips from 0 to 128_000. Consumers that previously saw many small batches per chunk will now see ~1 large batch per chunk (post-merge). Set the option explicitly to 0 to opt out.

Benchmark

Dogfood warehouse, randomized interleaved (Rust vs Thrift) benchmark, 20 runs per size, median wall time on fetchall_arrow path:

| size | Rust before | Rust after | Thrift | ratio (after/Thrift) |
|:--|--:|--:|--:|--:|
| SELECT 1 | 500ms | 394ms | 387ms | 1.02× |
| 10K | 950ms | 893ms | 1014ms | 0.88× |
| 100K | 1450ms | 1148ms | 1145ms | 1.00× |
| 500K | 2600ms | 2178ms | 3305ms | 0.66× |
| 1M | 3700ms | 3579ms | 3814ms | 0.94× |
| 10M | 8700ms | 8677ms | 8802ms | 0.99× |

Test plan

  • `cargo +stable fmt --all -- --check`: clean
  • `cargo clippy --all-targets -- -D warnings`: clean
  • `cargo test`: full suite (349 tests) passes
  • End-to-end smoke test against the dogfood warehouse, covering both inline and CloudFetch paths

This pull request and its description were written by Isaac.

