feat(dpp)!: convert Signer trait to async#3492
Conversation
Convert the Signer<K> trait in rs-dpp to an async trait via #[async_trait], enabling true async signing for iOS HSM / Secure Enclave / keychain backends without blocking any thread during biometric prompts or hardware calls. - Signer::sign and Signer::sign_create_witness are now async; can_sign_with stays sync - All 9 concrete Signer implementations converted (SimpleSigner, SingleKeySigner, DummySigner, 2x TestAddressSigner, IdentitySignerWasm, PlatformAddressSignerWasm, VTableSigner, AddressSigner) - Propagate async through rs-dpp state-transition builders, rs-sdk transition builders, rs-drive-abci tests, wasm-sdk, and strategy-tests - Iterator closures calling signer.sign(...) rewritten as sequential for loops (order-preserving, borrow-friendly) - Teach #[stack_size] macro to wrap async fn bodies in a tokio current-thread runtime inside the spawned thread so integration tests keep working BREAKING CHANGE: rs-sdk-ffi Signer vtable redesigned with a completion- callback pattern. The old synchronous SignCallback + free_result_callback pair is replaced with SignAsyncCallback + SignCompletionCallback, so the iOS side can return from the sign callback immediately and invoke the Rust completion function later (from any thread) when the HSM/keychain response arrives. No thread is blocked during the wait. Migration notes for iOS consumers: - dash_sdk_signer_create no longer takes a free_result_callback - The sign callback signature changes: iOS stashes (completion_ctx, completion) and invokes completion(ctx, sig, len, err) when ready - SingleKeySigner now routes through a native (non-callback) path with no C-callback bounce, via signer_handle_from_single_key Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Important Review skippedToo many files! This PR contains 182 files, which is 32 over the limit of 150. ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (182)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🕓 Ready for review — 7 ahead in queue (commit 2366428) |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3492 +/- ##
============================================
- Coverage 88.30% 88.29% -0.01%
============================================
Files 2474 2474
Lines 302380 300964 -1416
============================================
- Hits 267008 265737 -1271
+ Misses 35372 35227 -145
🚀 New features to boost your workflow:
|
|
✅ DashSDKFFI.xcframework built for this PR.
SwiftPM (host the zip at a stable URL, then use): .binaryTarget(
name: "DashSDKFFI",
url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
checksum: "37a3f4f1b1db3c9df75ab91b71843821f447e5bef4eaab9c1e62dfd10514e6b8"
)Xcode manual integration:
|
QuantumExplorer
left a comment
There was a problem hiding this comment.
-
packages/rs-sdk-ffi/src/signer.rs:266-299: ifsign_asyncreturns without ever invokingcompletion, thecompletion_ctxbox is leaked andrx.awaitnever resolves. TheErr(_recv_err)branch here does not catch that case, so this hangs permanently instead of surfacing the intended protocol error. That makes cancellation / forgotten completion paths wedge the signer future forever. -
packages/rs-sdk/src/sdk.rs:316-327: auto-detect still parses the first proof withPlatformVersion::latest()and only learns the real protocol version after successful proof parsing. On an older network where proof interpretation differs fromlatest(), the first proof-backed request can fail before the SDK ever bootstraps itself, so the advertised auto-detect behavior is incomplete.
…llback If the FFI signer's sign_async returns without ever invoking completion, the oneshot Sender sits inside a Box handed to the C side and never drops, so the previous `rx.await` match arm `Err(_recv_err)` could never fire and the awaiting async fn would hang forever. Bound the wait with tokio::time::timeout (5 min — generous enough for biometric + HSM round-trips). On timeout we surface a recoverable ProtocolError and leak the one Box<Sender> rather than risk a use-after-free if the caller eventually does invoke completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The async Signer conversion across rs-dpp / rs-sdk is mostly mechanical, but the new rs-sdk-ffi completion-callback bridge still has two real correctness holes at the language boundary. A missing completion callback does not take the documented recoverable-error path because Rust has already leaked ownership of the sender into completion_ctx, so the caller hangs indefinitely instead. Separately, the completion entry point trusts the foreign side to call it exactly once; a double callback reconstructs a Box from already-freed memory and turns a Swift/iOS logic bug into host-process UB.
Reviewed commit: 91f68cc
🔴 2 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-sdk-ffi/src/signer.rs`:
- [BLOCKING] lines 266-300: Missing completion callback now hangs forever instead of surfacing the documented protocol error
`VTableSigner::sign()` moves the `oneshot::Sender` into `completion_ctx` with `Box::into_raw(tx_box)` and then awaits `rx`. If the foreign `sign_async` implementation never calls `dash_sdk_sign_async_completion`—for example on a cancelled biometric prompt or an early iOS error path—nothing ever reconstructs or drops that boxed sender. That means the `Err(_recv_err)` branch at the end of the await is unreachable in the exact failure mode its comment describes: the receiver does not observe a closed channel, it just waits forever while the boxed sender leaks. In practice this wedges every caller waiting on the signature instead of returning the advertised recoverable `ProtocolError`.
- [BLOCKING] lines 397-431: Double-calling the completion callback is use-after-free UB
`dash_sdk_sign_async_completion()` unconditionally does `Box::from_raw(completion_ctx as *mut oneshot::Sender<SignResult>)` and consumes that sender on the first callback. If the foreign signer ever invokes the completion twice for the same request—success/error races, retry bookkeeping bugs, or duplicate platform callbacks are all realistic here—the second invocation rebuilds a `Box` from already-freed memory and sends/drops through a dangling pointer. Because this is an exported C entry point on a cross-language async boundary, relying on documentation alone for the single-shot guarantee is too weak; the Rust side needs a one-shot guard so duplicate completions become a harmless no-op instead of host-process memory corruption.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
One blocking memory-safety issue at the FFI boundary: dash_sdk_sign_async_completion unconditionally reconstructs a Boxoneshot::Sender from the C-supplied completion_ctx, so a double-invocation (a likely Swift mistake across biometric/HSM callbacks) becomes a use-after-free / double-free in the host process. Several related FFI-contract hardening issues follow from the same design: no catch_unwind on extern "C" entry points, synchronous dash_sdk_signer_sign builds a new Tokio runtime per call and panics if invoked from within one, handles are not shared-ownership so a concurrent destroy during an in-flight async sign is UAF, and the checked-in C header still advertises the removed synchronous signer ABI. Remaining items are suggestion-grade (intentional Sender leak on timeout with no cap, hardcoded P2PKH witness on the callback path, hardcoded null signer_ptr in dash_sdk_signer_create) or nitpick (silent false on bincode encode failure in can_sign_with).
Reviewed commit: 9169382
🔴 1 blocking | 🟡 7 suggestion(s) | 💬 1 nitpick(s)
9 additional findings
🔴 blocking: dash_sdk_sign_async_completion double-invocation is use-after-free / double-free
packages/rs-sdk-ffi/src/signer.rs (lines 429-463)
dash_sdk_sign_async_completion unconditionally executes Box::from_raw(completion_ctx as *mut oneshot::Sender<SignResult>) on every call and lets the reconstructed Box<Sender> drop at end-of-scope. The single-shot guarantee lives only in a # Safety doc comment on SignCompletionCallback ("Must be called at most once per completion_ctx"), and the contract must be upheld by hand on the Swift/Kotlin side of a C ABI. Realistic triggers exist: a Swift closure that calls completion(ctx, ..) on both the success and cancellation paths of a biometric/HSM prompt; a retry-on-timeout helper that fires a stashed completion twice; or a race where two threads observe the same (ctx, completion) pair and both fire. The first call reconstructs the Box, drops the Sender, and frees the heap slot; the second call performs Box::from_raw on that already-freed pointer and drops again — classic UAF / double-free on an attacker-reachable path (the iOS signer can be any third-party integration and the completion is held across biometric prompts, app backgrounding, and possibly process-state changes). This is defense-in-depth the Rust side must provide — a doc-only single-shot contract is not enough at a C ABI boundary. Fix by wrapping the sender in a structure that supports an atomic "taken" flag (e.g. Arc<Mutex<Option<oneshot::Sender<_>>>> or an AtomicPtr swap-to-null before dereference) so the second invocation observes an empty slot and returns without touching freed memory.
🟡 suggestion: dash_sdk_signer_sign builds a fresh Tokio runtime per call and panics if invoked from within a runtime
packages/rs-sdk-ffi/src/signer_simple.rs (lines 116-133)
dash_sdk_signer_sign now accepts any SignerHandle and bridges the async Signer::sign by building a new current_thread Tokio runtime and calling runtime.block_on(...) on every FFI invocation. Two concrete FFI-boundary problems: (1) If the caller is already inside a Tokio runtime — true for anything that composes this helper with async dash_sdk_* entry points on the same thread — runtime.block_on panics with "Cannot start a runtime from within a runtime" and the panic unwinds into C/Swift. (2) For a callback-backed signer produced by dash_sdk_signer_create, VTableSigner::sign no longer returns a signature directly; it waits on a oneshot::Receiver that is only filled when the foreign completion fires. If the foreign implementation posts its completion back to the same thread (or to the main queue), block_on has already blocked that thread, so the completion cannot run and the call hangs until the 5-minute timeout. The function is documented as intended for the native SingleKeySigner path, whose async fn is pure CPU work — use futures::executor::block_on (no Tokio context required, does not conflict with a surrounding runtime), or reject non-native handles explicitly, or expose a sync helper on SingleKeySigner and call that. Per-call runtime construction is also wasteful in batch-signing loops.
🟡 suggestion: Async signer handle can be freed while Rust is still awaiting it
packages/rs-sdk-ffi/src/signer.rs (lines 523-529)
SignerHandle * is a raw opaque pointer destroyed at any time via dash_sdk_signer_destroy, which immediately does Box::from_raw(handle as *mut VTableSigner) and invokes the foreign destroy callback. On the Rust side the async FFI entry points borrow the box as VTableSignerRef(&*(signer_handle as *const VTableSigner)) (e.g. identity/transfer.rs:102-103) and hand that borrow to an await chain that can remain suspended for up to SIGN_ASYNC_COMPLETION_TIMEOUT (5 minutes) while a biometric/HSM completion is outstanding. A concurrent destroy from another C thread during that window frees the vtable/state while an in-flight future still references it; a subsequent completion or method call on the borrow is a cross-boundary UAF. Model the handle as shared ownership (e.g. Arc<VTableSigner> internally, with per-call clones) so in-flight async operations keep the signer alive until their sign future completes, and document the new destroy semantics (drop last reference rather than free immediately).
🟡 suggestion: Missing catch_unwind on extern "C" signer entry points
packages/rs-sdk-ffi/src/signer.rs (lines 428-463)
dash_sdk_sign_async_completion, dash_sdk_signer_create, dash_sdk_signer_destroy, and dash_sdk_bytes_free are all extern "C" and allocate (Box, CStr::to_string_lossy -> String, to_vec for the signature copy). An allocation failure or other panic inside any of these bodies unwinds into the C/Swift caller. Since Rust 1.81 that aborts the process rather than being UB, but aborting the host iOS app from inside a biometric-prompt completion is still a severe end-user failure mode on a path explicitly driven from foreign code. Wrap each body in std::panic::catch_unwind and map the Err case to a ProtocolError::Generic (or a silent drop for destroy/free). In dash_sdk_sign_async_completion specifically, a panic after Box::from_raw but before tx.send(...) would abort while still holding the reconstructed Sender — catching it at least surfaces it as a recoverable signer error on the awaiting task side.
🟡 suggestion: Checked-in C header still exposes the removed synchronous signer ABI
packages/rs-sdk-ffi/test_header.h (lines 1350-1352)
The Rust FFI surface changed dash_sdk_signer_create to take SignAsyncCallback and complete signing via a callback, but the checked-in C header still declares the old synchronous ABI with IOSSignCallback returning uint8_t * and a FreeResultCallback (test_header.h:362-370 and :1350-1352). Any Swift/C consumer that uses this checked-in header instead of regenerating bindings will compile against the wrong boundary contract and either fail to link or call the symbol with an incompatible prototype. Regenerate the header (or delete it and rely on a build-time cbindgen step) as part of this ABI change so non-Rust callers see the same contract the Rust library now exports. The stale dash_sdk_bytes_free doc comment also references the removed SignerVTable::free_result slot and should be updated alongside.
🟡 suggestion: Timeout path intentionally leaks Sender box; no cap on concurrent outstanding requests
packages/rs-sdk-ffi/src/signer.rs (lines 300-334)
When SIGN_ASYNC_COMPLETION_TIMEOUT fires, rx is dropped but tx — wrapped in a Box<oneshot::Sender<SignResult>> — is still owned by the C side via completion_ctx. The in-code comment correctly notes "we leak it rather than risk a use-after-free if the caller eventually does invoke the completion," which is the right choice per request, but there is no cap on the number of outstanding timed-out requests. A buggy or hostile FFI signer that never invokes its completion leaks one Box plus the SignAsyncCallback closure capture per call, held for 5 minutes and then leaked forever. A client that issues sign requests in a loop (strategy tests, retries, batch broadcasts) against a non-responsive signer will accumulate memory with no upper bound. Either cap concurrent outstanding sign requests per VTableSigner, or track outstanding completion_ctx allocations on the VTableSigner and reclaim them at destroy-time. Risk is currently bounded (this is a client SDK, not a validator) which is why this is a suggestion.
🟡 suggestion: Callback-path sign_create_witness hard-codes P2PKH regardless of key type
packages/rs-sdk-ffi/src/signer.rs (lines 339-354)
For Inner::Callback, sign_create_witness always returns AddressWitness::P2pkh { signature: <raw sig from sign()> }, independent of IdentityPublicKey::key_type(). For ECDSA_SECP256K1 this is correct; if an iOS signer for BLS12_381, EDDSA_25519_HASH160, or BIP13_SCRIPT_HASH is ever wired through the callback path, the returned witness is semantically wrong (a P2PKH witness containing a non-ECDSA signature). Consensus nodes will reject such a witness so this is not a safety bug today, but it is a silent footgun the moment the ECDSA assumption is relaxed anywhere in the callers. Either gate on key.key_type() and return InvalidIdentityPublicKeyTypeError for non-ECDSA keys (mirroring SimpleSigner::sign_create_witness), or fully remove this impl from the Signer<IdentityPublicKey> axis since witness creation is conceptually a Signer<PlatformAddress> concern.
🟡 suggestion: dash_sdk_signer_create hardcodes signer_ptr = null, preventing stateful FFI signers
packages/rs-sdk-ffi/src/signer.rs (lines 488-507)
dash_sdk_signer_create takes only function pointers and constructs the VTableSigner with signer_ptr = std::ptr::null_mut(). The function-pointer types (SignAsyncCallback, CanSignCallback) all receive that null pointer as their signer: *const c_void argument. Swift @convention(c) function pointers cannot capture state (the language rejects non-global closures at that type), so an iOS HSM signer that needs to distinguish multiple Secure-Enclave-backed identities has no boundary-safe way to do so — it must rely on process-global static state on the Swift side, which breaks the moment more than one SDK instance exists in the app (watch-kit extension plus host app, tests, etc.). Add an explicit signer_ptr: *mut c_void parameter (plus optional per-ptr destroy) to dash_sdk_signer_create so callers can supply their own per-signer context, or document prominently that FFI-created signers must use static state and do not compose with multi-instance clients.
💬 nitpick: can_sign_with silently swallows bincode encode failure
packages/rs-sdk-ffi/src/signer.rs (lines 356-375)
VTableSigner::can_sign_with calls bincode::encode_to_vec(identity_public_key, ...) and returns false if encoding fails. Because this method is the selection predicate used to choose which key to sign with during state-transition construction, a silent false surfaces upstream as an opaque "no signer can sign" error at a much higher layer, and the underlying serialization bug is invisible from the FFI caller's perspective. Bincode encoding of an IdentityPublicKey should not fail in practice, but if the key schema ever drifts from what the FFI vtable was built against, a silent boundary loss makes the diagnostic trail unrecoverable. At minimum, tracing::warn! on the Err arm preserves the signal without changing the boundary contract.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-sdk-ffi/src/signer.rs`:
- [BLOCKING] lines 429-463: dash_sdk_sign_async_completion double-invocation is use-after-free / double-free
`dash_sdk_sign_async_completion` unconditionally executes `Box::from_raw(completion_ctx as *mut oneshot::Sender<SignResult>)` on every call and lets the reconstructed `Box<Sender>` drop at end-of-scope. The single-shot guarantee lives only in a `# Safety` doc comment on `SignCompletionCallback` ("Must be called at most once per `completion_ctx`"), and the contract must be upheld by hand on the Swift/Kotlin side of a C ABI. Realistic triggers exist: a Swift closure that calls `completion(ctx, ..)` on both the success and cancellation paths of a biometric/HSM prompt; a retry-on-timeout helper that fires a stashed completion twice; or a race where two threads observe the same `(ctx, completion)` pair and both fire. The first call reconstructs the Box, drops the `Sender`, and frees the heap slot; the second call performs `Box::from_raw` on that already-freed pointer and drops again — classic UAF / double-free on an attacker-reachable path (the iOS signer can be any third-party integration and the completion is held across biometric prompts, app backgrounding, and possibly process-state changes). This is defense-in-depth the Rust side must provide — a doc-only single-shot contract is not enough at a C ABI boundary. Fix by wrapping the sender in a structure that supports an atomic "taken" flag (e.g. `Arc<Mutex<Option<oneshot::Sender<_>>>>` or an `AtomicPtr` swap-to-null before dereference) so the second invocation observes an empty slot and returns without touching freed memory.
- [SUGGESTION] lines 523-529: Async signer handle can be freed while Rust is still awaiting it
`SignerHandle *` is a raw opaque pointer destroyed at any time via `dash_sdk_signer_destroy`, which immediately does `Box::from_raw(handle as *mut VTableSigner)` and invokes the foreign `destroy` callback. On the Rust side the async FFI entry points borrow the box as `VTableSignerRef(&*(signer_handle as *const VTableSigner))` (e.g. `identity/transfer.rs:102-103`) and hand that borrow to an await chain that can remain suspended for up to `SIGN_ASYNC_COMPLETION_TIMEOUT` (5 minutes) while a biometric/HSM completion is outstanding. A concurrent destroy from another C thread during that window frees the vtable/state while an in-flight future still references it; a subsequent completion or method call on the borrow is a cross-boundary UAF. Model the handle as shared ownership (e.g. `Arc<VTableSigner>` internally, with per-call clones) so in-flight async operations keep the signer alive until their `sign` future completes, and document the new destroy semantics (drop last reference rather than free immediately).
- [SUGGESTION] lines 428-463: Missing catch_unwind on extern "C" signer entry points
`dash_sdk_sign_async_completion`, `dash_sdk_signer_create`, `dash_sdk_signer_destroy`, and `dash_sdk_bytes_free` are all `extern "C"` and allocate (Box, `CStr::to_string_lossy` -> `String`, `to_vec` for the signature copy). An allocation failure or other panic inside any of these bodies unwinds into the C/Swift caller. Since Rust 1.81 that aborts the process rather than being UB, but aborting the host iOS app from inside a biometric-prompt completion is still a severe end-user failure mode on a path explicitly driven from foreign code. Wrap each body in `std::panic::catch_unwind` and map the `Err` case to a `ProtocolError::Generic` (or a silent drop for `destroy`/`free`). In `dash_sdk_sign_async_completion` specifically, a panic after `Box::from_raw` but before `tx.send(...)` would abort while still holding the reconstructed Sender — catching it at least surfaces it as a recoverable signer error on the awaiting task side.
- [SUGGESTION] lines 300-334: Timeout path intentionally leaks Sender box; no cap on concurrent outstanding requests
When `SIGN_ASYNC_COMPLETION_TIMEOUT` fires, `rx` is dropped but `tx` — wrapped in a `Box<oneshot::Sender<SignResult>>` — is still owned by the C side via `completion_ctx`. The in-code comment correctly notes "we leak it rather than risk a use-after-free if the caller eventually does invoke the completion," which is the right choice per request, but there is no cap on the number of outstanding timed-out requests. A buggy or hostile FFI signer that never invokes its completion leaks one Box plus the `SignAsyncCallback` closure capture per call, held for 5 minutes and then leaked forever. A client that issues sign requests in a loop (strategy tests, retries, batch broadcasts) against a non-responsive signer will accumulate memory with no upper bound. Either cap concurrent outstanding sign requests per `VTableSigner`, or track outstanding `completion_ctx` allocations on the `VTableSigner` and reclaim them at destroy-time. Risk is currently bounded (this is a client SDK, not a validator) which is why this is a suggestion.
- [SUGGESTION] lines 339-354: Callback-path sign_create_witness hard-codes P2PKH regardless of key type
For `Inner::Callback`, `sign_create_witness` always returns `AddressWitness::P2pkh { signature: <raw sig from sign()> }`, independent of `IdentityPublicKey::key_type()`. For `ECDSA_SECP256K1` this is correct; if an iOS signer for `BLS12_381`, `EDDSA_25519_HASH160`, or `BIP13_SCRIPT_HASH` is ever wired through the callback path, the returned witness is semantically wrong (a P2PKH witness containing a non-ECDSA signature). Consensus nodes will reject such a witness so this is not a safety bug today, but it is a silent footgun the moment the ECDSA assumption is relaxed anywhere in the callers. Either gate on `key.key_type()` and return `InvalidIdentityPublicKeyTypeError` for non-ECDSA keys (mirroring `SimpleSigner::sign_create_witness`), or fully remove this impl from the `Signer<IdentityPublicKey>` axis since witness creation is conceptually a `Signer<PlatformAddress>` concern.
- [SUGGESTION] lines 488-507: dash_sdk_signer_create hardcodes signer_ptr = null, preventing stateful FFI signers
`dash_sdk_signer_create` takes only function pointers and constructs the `VTableSigner` with `signer_ptr = std::ptr::null_mut()`. The function-pointer types (`SignAsyncCallback`, `CanSignCallback`) all receive that null pointer as their `signer: *const c_void` argument. Swift `@convention(c)` function pointers cannot capture state (the language rejects non-global closures at that type), so an iOS HSM signer that needs to distinguish multiple Secure-Enclave-backed identities has no boundary-safe way to do so — it must rely on process-global static state on the Swift side, which breaks the moment more than one `SDK` instance exists in the app (watch-kit extension plus host app, tests, etc.). Add an explicit `signer_ptr: *mut c_void` parameter (plus optional per-ptr destroy) to `dash_sdk_signer_create` so callers can supply their own per-signer context, or document prominently that FFI-created signers must use static state and do not compose with multi-instance clients.
In `packages/rs-sdk-ffi/src/signer_simple.rs`:
- [SUGGESTION] lines 116-133: dash_sdk_signer_sign builds a fresh Tokio runtime per call and panics if invoked from within a runtime
`dash_sdk_signer_sign` now accepts any `SignerHandle` and bridges the async `Signer::sign` by building a new `current_thread` Tokio runtime and calling `runtime.block_on(...)` on every FFI invocation. Two concrete FFI-boundary problems: (1) If the caller is already inside a Tokio runtime — true for anything that composes this helper with async `dash_sdk_*` entry points on the same thread — `runtime.block_on` panics with "Cannot start a runtime from within a runtime" and the panic unwinds into C/Swift. (2) For a callback-backed signer produced by `dash_sdk_signer_create`, `VTableSigner::sign` no longer returns a signature directly; it waits on a `oneshot::Receiver` that is only filled when the foreign completion fires. If the foreign implementation posts its completion back to the same thread (or to the main queue), `block_on` has already blocked that thread, so the completion cannot run and the call hangs until the 5-minute timeout. The function is documented as intended for the native `SingleKeySigner` path, whose async fn is pure CPU work — use `futures::executor::block_on` (no Tokio context required, does not conflict with a surrounding runtime), or reject non-native handles explicitly, or expose a sync helper on `SingleKeySigner` and call that. Per-call runtime construction is also wasteful in batch-signing loops.
In `packages/rs-sdk-ffi/test_header.h`:
- [SUGGESTION] lines 1350-1352: Checked-in C header still exposes the removed synchronous signer ABI
The Rust FFI surface changed `dash_sdk_signer_create` to take `SignAsyncCallback` and complete signing via a callback, but the checked-in C header still declares the old synchronous ABI with `IOSSignCallback` returning `uint8_t *` and a `FreeResultCallback` (`test_header.h:362-370` and `:1350-1352`). Any Swift/C consumer that uses this checked-in header instead of regenerating bindings will compile against the wrong boundary contract and either fail to link or call the symbol with an incompatible prototype. Regenerate the header (or delete it and rely on a build-time cbindgen step) as part of this ABI change so non-Rust callers see the same contract the Rust library now exports. The stale `dash_sdk_bytes_free` doc comment also references the removed `SignerVTable::free_result` slot and should be updated alongside.
…-trait # Conflicts: # packages/rs-drive-abci/src/execution/validation/state_transition/check_tx_verification/v0/mod.rs # packages/rs-drive-abci/src/execution/validation/state_transition/state_transitions/shield/tests.rs
Replace the ad-hoc inline `tokio::runtime::Builder::new_current_thread()` in `dash_sdk_signer_sign` with the workspace-canonical `dash_async::block_on` (rs-dash-async). `block_on` is runtime-aware and handles the no-runtime, current-thread, and multi-thread flavors safely — avoiding the deadlocks that motivated PR #3432/#3497. Because `block_on` requires `Send + 'static` futures, the signing future now owns its inputs: the data slice is copied into a `Vec<u8>` up front and `&VTableSigner` is reconstructed from a pointer-sized integer inside the future (safe because `block_on` blocks the caller for the duration of the FFI call and `VTableSigner: Send + Sync`). The async-bridge error is surfaced as a distinct `InternalError` so callers can distinguish bridging failures from signing failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New tests landed on `v3.1-dev` while this branch converted `Signer::sign` and `sign_create_witness` to `async fn`. The upstream tests were written against the pre-async API and called `BatchTransition::new_token_*_transition` (and `perform_document_replace_on_profile_after_epoch_change`) synchronously, producing `impl Future` values that were then chained with `.expect(...)` / `.unwrap(...)`. After the merge this compiled as `E0599: no method named 'expect' found for opaque type 'impl Future<...>'` (54 sites across `token/burn` and `token/unfreeze`) and triggered `unused implementer of Future` warnings in wrapper tests across `token/freeze`, `token/mint`, `token/unfreeze`, and `document/replacement` (13 sites) — i.e. tests that compiled but silently did nothing. This commit mechanically fixes all affected sites: - Converts enclosing `#[test] fn test_...` to `#[tokio::test] async fn test_...` - Inserts `.await` before every `.expect(...)` / `.unwrap(...)` that followed an async signer-dependent call - Inserts `.await` before the terminating `;` of wrapper test bodies that invoke async helpers (`test_token_*_two_member_group_with_keeps_history`, `test_token_mint_by_owner_requires_group_other_member`, `perform_document_replace_on_profile_after_epoch_change`) - Updates `Cargo.lock` with the new `dash-async` dependency of `rs-sdk-ffi` introduced in the preceding commit Verified: `cargo check -p drive-abci --tests` and `cargo check --workspace --tests` both clean (warnings unchanged from pre-merge baseline). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses @thepastaclaw review (PR #3492, thread r3128065543). Previously `dash_sdk_sign_async_completion` unconditionally reconstructed `Box<oneshot::Sender<SignResult>>` from the C-supplied `completion_ctx` on every call. A single-shot guarantee only as documentation made the function a latent use-after-free / double-free if the foreign signer ever invoked completion more than once — a realistic failure mode for biometric retries, success/error races, or duplicate platform callbacks. Introduce a leaked `CompletionSlot { sender: AtomicPtr<Sender> }` as the completion context. The first call atomically swaps the inner sender pointer to null (`AcqRel`) and wraps it back into a `Box` that is dropped exactly once at end of scope; subsequent calls observe null and return without touching freed memory. The slot itself is intentionally leaked — a small bounded cost per sign_async call — because the FFI boundary cannot guarantee when (or whether) the foreign caller stops invoking completion. Adds a regression test `completion_callback_duplicate_is_no_op` that invokes completion three times and asserts only the first result is observed and no crash occurs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses @lklimek review (PR #3492, threads r3085996493 / r3086015077). The crate-root `#![allow(async_fn_in_trait)]` is not a replacement for `#[async_trait]` — it simply permits native `async fn` in traits that are used only through static dispatch. `Signer<K>` must keep the `#[async_trait]` attribute because it is object-safe and stored as `Arc<dyn Signer<IdentityPublicKey>>` (see `VTableSigner::Inner::Native` in packages/rs-sdk-ffi/src/signer.rs). Add short comments above both the crate-level allow and the `Signer` trait declaration so future readers don't mistake the mix for inconsistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses @lklimek review (PR #3492, thread r3086203435). `SingleKeySigner::new_from_slice(&[u8], Network)` exposed a network parameter that only affected WIF encoding (via the wrapped `PrivateKey`), never signing itself — and the only caller in `rs-sdk-ffi` papered over this by hardcoding `Network::Mainnet` with a "network doesn't matter for signing" comment, leaning on internal `SingleKeySigner` behaviour. Drop the parameter: the constructor now tags the key with `Network::Mainnet` as an arbitrary placeholder and callers that care about WIF formatting use `from_hex` / `new` which still accept the network explicitly. `new_from_slice` has only a single caller in the workspace (`packages/rs-sdk-ffi/src/signer_simple.rs`), which also drops the now- unused `Network` import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses @lklimek review (PR #3492, thread r3085919989), deferred to issue #3535. The `#[stack_size]` proc-macro currently expands async function bodies to an inline `tokio::runtime::Builder::new_current_thread().build() .block_on(async move #block)` on a spawned thread — the same pattern PR #3490 / #3497 identified as runtime-within-runtime deadlock-prone. Adding a `dash-async` dependency to this macro crate is cross-cutting and out of scope for the async-signer PR, so the migration is tracked in issue #3535 with a `TODO(CMT-002)` marker at the call site. Issue #3535 body updated to include this site alongside the rs-sdk-ffi FFI bridging sites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream PR #3506 landed three new unit tests in `shielded/builder/shield.rs` (test_build_shield_multiple_inputs_all_plumbed, test_build_shield_user_fee_increase_non_zero_succeeds, test_build_shield_memo_is_fully_plumbed) that call `build_shield_transition` synchronously and chain `.is_ok()` / `.err()` onto the returned `impl Future<Output = Result<_, _>>`. After this branch made the builder async these sites fail to compile with E0599 on the opaque Future. Convert the three `#[test] fn` wrappers to `#[tokio::test] async fn` and add `.await` to the `build_shield_transition(...)` call in each (the two pre-existing async tests already did this correctly). `cargo test -p dpp --all-features shield` — 145 passed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@QuantumExplorer thanks for the review — replies inline: QEX-1 ( QEX-2 ( 🤖 Co-authored by Claudius the Magnificent AI Agent |
Trim multi-line commentary added in this session down to 2-3 lines per
block per coding-best-practices ("1 line is great, 2 lines are good, 3
is mediocre"). No behavioural change — comments only.
Also add a one-block rationale on the (intentional, crate-wide)
`#![allow(clippy::result_large_err)]` in rs-dpp — verified as still
actively used: removing it produces 642 warnings on `ProtocolError`.
`#![allow(async_fn_in_trait)]` kept for the same reason (31 warnings
without it on new state-transition async method traits).
Net: 46 insertions vs 120 deletions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lice Previously `SingleKeySigner::new_from_slice` hardcoded `Network::Mainnet` because its only caller — the FFI entry point — didn't expose a network parameter. Network only affects WIF encoding and address derivation (not signing), but the caller should still pass the correct value. - `simple-signer`: `new_from_slice(data, network)` now takes network. - `rs-sdk-ffi`: `dash_sdk_signer_create_from_private_key` gains a trailing `DashSDKNetwork` parameter, mapped to `dashcore::Network` via the same pattern as `sdk.rs` (ABI-breaking for C callers). - swift-sdk: `KeyManager` signer-creation APIs accept a `network` parameter (testnet default mirrors existing `SDK(network:)` usage); `Addresses` helpers pass `sdk.network` through; test targets updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Prior-SHA blockers (missing completion timeout, duplicate-completion UAF) are fixed on the current head. One suggestion-level FFI hazard remains: the synchronous dash_sdk_signer_sign helper accepts callback-backed signer handles and will deadlock-until-timeout if the foreign completion is dispatched onto the same thread the caller is blocking.
Reviewed commit: 2366428
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-sdk-ffi/src/signer_simple.rs`:
- [SUGGESTION] lines 131-139: `dash_sdk_signer_sign` deadlocks callback-backed signers that complete on the caller's thread
`dash_sdk_signer_sign` bridges async via `dash_async::block_on`, which — when invoked with no active tokio runtime (the typical FFI-from-Swift path) — builds a temporary current-thread runtime and drives the future on the caller's OS thread. Combined with the fact that this helper accepts any `SignerHandle`, including `Inner::Callback` handles whose `VTableSigner::sign` only returns after the foreign code invokes the completion callback (`packages/rs-sdk-ffi/src/signer.rs:287-339`), this creates a deadlock-until-timeout: any foreign signer that dispatches its completion onto the same thread/queue that called `dash_sdk_signer_sign` (Swift main queue, a serial DispatchQueue, etc.) cannot run its completion handler because that thread is parked polling tokio. The call then sits idle until `SIGN_ASYNC_COMPLETION_TIMEOUT` (5 minutes) elapses. The native private-key path is fine because it never bounces back through C. Either reject callback-based handles in this entry point, or document loudly that this sync helper is only safe for the native private-key signer path.
| // SAFETY: `signer_handle` is valid for the call; `block_on` blocks the | ||
| // caller so the allocation can't be freed mid-sign. `VTableSigner: Send + | ||
| // Sync`. The usize round-trip gives the `async move` a `Send + 'static` | ||
| // capture of the !Clone handle. | ||
| let signer_addr = signer_handle as usize; | ||
| let result = block_on(async move { | ||
| let signer: &VTableSigner = unsafe { &*(signer_addr as *const VTableSigner) }; | ||
| signer.sign(&dummy_key, &data_owned).await | ||
| }); |
There was a problem hiding this comment.
🟡 Suggestion: dash_sdk_signer_sign deadlocks callback-backed signers that complete on the caller's thread
dash_sdk_signer_sign bridges async via dash_async::block_on, which — when invoked with no active tokio runtime (the typical FFI-from-Swift path) — builds a temporary current-thread runtime and drives the future on the caller's OS thread. Combined with the fact that this helper accepts any SignerHandle, including Inner::Callback handles whose VTableSigner::sign only returns after the foreign code invokes the completion callback (packages/rs-sdk-ffi/src/signer.rs:287-339), this creates a deadlock-until-timeout: any foreign signer that dispatches its completion onto the same thread/queue that called dash_sdk_signer_sign (Swift main queue, a serial DispatchQueue, etc.) cannot run its completion handler because that thread is parked polling tokio. The call then sits idle until SIGN_ASYNC_COMPLETION_TIMEOUT (5 minutes) elapses. The native private-key path is fine because it never bounces back through C. Either reject callback-based handles in this entry point, or document loudly that this sync helper is only safe for the native private-key signer path.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/rs-sdk-ffi/src/signer_simple.rs`:
- [SUGGESTION] lines 131-139: `dash_sdk_signer_sign` deadlocks callback-backed signers that complete on the caller's thread
`dash_sdk_signer_sign` bridges async via `dash_async::block_on`, which — when invoked with no active tokio runtime (the typical FFI-from-Swift path) — builds a temporary current-thread runtime and drives the future on the caller's OS thread. Combined with the fact that this helper accepts any `SignerHandle`, including `Inner::Callback` handles whose `VTableSigner::sign` only returns after the foreign code invokes the completion callback (`packages/rs-sdk-ffi/src/signer.rs:287-339`), this creates a deadlock-until-timeout: any foreign signer that dispatches its completion onto the same thread/queue that called `dash_sdk_signer_sign` (Swift main queue, a serial DispatchQueue, etc.) cannot run its completion handler because that thread is parked polling tokio. The call then sits idle until `SIGN_ASYNC_COMPLETION_TIMEOUT` (5 minutes) elapses. The native private-key path is fine because it never bounces back through C. Either reject callback-based handles in this entry point, or document loudly that this sync helper is only safe for the native private-key signer path.
Summary
Convert
Signer<K>inrs-dppto an async trait via#[async_trait], enabling true async signing for iOS HSM / Secure Enclave / keychain backends without blocking any thread during biometric prompts or hardware calls.Trait shape:
Scope
Signerimplementations converted (SimpleSigner,SingleKeySigner,DummySigner, 2×TestAddressSigner,IdentitySignerWasm,PlatformAddressSignerWasm,VTableSigner,AddressSigner)rs-dppstate-transition builders,rs-sdktransition builders,rs-drive-abcitests,wasm-sdk, andstrategy-tests— 177 files changedsigner.sign(...)rewritten as sequentialforloops (order-preserving, borrow-friendly — notry_join_all)#[stack_size]macro taught to wrapasync fnbodies in a tokio current-thread runtime inside the spawned thread so integration tests keep workingBreaking change: rs-sdk-ffi Signer vtable redesign
The old synchronous
SignCallback+free_result_callbackpair is replaced with a completion-callback pattern:The iOS side returns from
sign_asyncimmediately, stashes(completion_ctx, completion), and invokescompletion(ctx, sig, len, err)later — possibly from a different thread, possibly minutes later after a biometric prompt. The Rust side awaits atokio::sync::oneshot::Receiverin the meantime. No thread is blocked during the wait.iOS migration required (follow-up PR on swift-sdk)
The iOS Swift code in `packages/swift-sdk/` is left unmodified in this PR; updating it is explicit follow-up work.
Test plan
🤖 Generated with Claude Code