cluster: replay queued up events by dkropachev · Pull Request #836 · scylladb/python-driver

dkropachev · 2026-04-30T15:57:52Z

Summary

Replay queued node-up events after down handling completes instead of dropping them while the host remains down.
Track down-handling revisions so stale or superseded callbacks do not clear newer host state work.
Keep queued-up replay invalidatable until on_up() reacquires the host lock, and preserve no-retry auth failures for hosts that were never marked up.
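
The revision tracking described in the summary can be illustrated with a minimal sketch. Everything named here (`HostState`, `begin_down`, `finish_down`, `down_revision`) is hypothetical and chosen for illustration; it is not the driver's actual code. The idea is only that a down-handling callback remembers the revision it started with and refuses to clear state if a newer round has begun.

```python
import threading


class HostState:
    """Hypothetical per-host bookkeeping; not the driver's real class."""

    def __init__(self):
        self.lock = threading.Lock()
        self.down_revision = 0        # bumped each time down handling starts
        self.down_in_progress = False


def begin_down(state):
    """Start a round of down handling and return its revision token."""
    with state.lock:
        state.down_revision += 1
        state.down_in_progress = True
        return state.down_revision


def finish_down(state, revision):
    """Clear the in-progress flag only if this callback is still the latest."""
    with state.lock:
        if revision != state.down_revision:
            return False              # stale callback: leave newer work alone
        state.down_in_progress = False
        return True
```

With this shape, a callback from an older round returns `False` and leaves the flag set for the newer round to clear.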

Tests

uv run pytest tests/unit/test_cluster.py -q
git diff --check

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass tests.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

Host status events can race when an UP notification arrives while DOWN handling is still running in the executor. Previously the UP path could complete first, only for the pending DOWN path to remove pools and start a reconnector afterwards, leaving host liveness state stale. Track per-host liveness epochs and queue UP handling while DOWN handling is active. Replay the queued UP only if no newer DOWN or REMOVE event superseded it, and guard reconnection and pool cleanup against stale host objects. Add unit coverage for superseded up/down/remove sequences, queued replay, and endpoint updates.
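
As a rough illustration of the epoch-and-queue approach in the commit message, the sketch below models a single host. The names `HostLiveness`, `on_down_start`, `on_down_finish`, and `replay_up` are assumptions made for this example, and it simplifies to one down handler at a time; only the `expected_epoch` parameter name is taken from the diff. It is not the driver's implementation.

```python
import threading


class HostLiveness:
    """Hypothetical model of one host's liveness state."""

    def __init__(self):
        self.lock = threading.Lock()
        self.epoch = 0               # bumped by every DOWN (or REMOVE) event
        self.is_up = False
        self.down_active = False     # True while down handling is running
        self.queued_up_epoch = None  # epoch at which an UP was queued, if any


def on_down_start(h):
    with h.lock:
        h.epoch += 1
        h.is_up = False
        h.down_active = True
        h.queued_up_epoch = None     # a newer DOWN supersedes any queued UP


def on_up(h):
    with h.lock:
        if h.down_active:
            h.queued_up_epoch = h.epoch  # defer until down handling completes
            return "queued"
        h.is_up = True
        return "applied"


def replay_up(h, expected_epoch):
    # Re-check under the lock: the queued UP is valid only for its own epoch.
    with h.lock:
        if h.epoch != expected_epoch:
            return "superseded"
        h.is_up = True
        return "replayed"


def on_down_finish(h):
    with h.lock:
        h.down_active = False
        expected = h.queued_up_epoch
        h.queued_up_epoch = None
    if expected is None:
        return "no-replay"
    return replay_up(h, expected)
```

In this sketch, an UP that arrives during down handling is replayed once the handler finishes, while an UP queued before a later DOWN is discarded, either when the DOWN clears the queue or when `replay_up` sees a mismatched epoch.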

    def on_up(self, host):
+        return self._on_up(host)
+
+    def _on_up(self, host, expected_epoch=None):


github-code-quality Bot found potential problems Apr 30, 2026

Comment thread cassandra/cluster.py Fixed

dkropachev force-pushed the fix/replay-up-after-down-handling branch 3 times, most recently from 2094ebd to db683a0 (April 30, 2026 17:33)

github-code-quality Bot found potential problems Apr 30, 2026

Comment thread cassandra/cluster.py Fixed

dkropachev force-pushed the fix/replay-up-after-down-handling branch from db683a0 to cff85ac (April 30, 2026 17:56)

github-code-quality Bot found potential problems Apr 30, 2026

Comment thread cassandra/cluster.py Fixed

dkropachev force-pushed the fix/replay-up-after-down-handling branch from cff85ac to 368e7e6 (April 30, 2026 21:19)

github-code-quality Bot found potential problems Apr 30, 2026

Comment thread cassandra/cluster.py

    def on_up(self, host):
        return self._on_up(host)

    def _on_up(self, host, expected_epoch=None):

dkropachev wants to merge 1 commit into master from fix/replay-up-after-down-handling
