(improvement)Optimize DCAware/RackAware/TokenAware/HostFilter policies with host distance caching and overall perf. improvements#651
Conversation
|
This is interesting, my change has exposed this - Need to understand this better :-/ |
if host is None:
host = self._cluster.metadata.get_host_by_host_id(host_id)
if host and host.endpoint != endpoint:
log.debug("[control connection] Updating host ip from %s to %s for (%s)", host.endpoint, endpoint, host_id)
old_endpoint = host.endpoint
host.endpoint = endpoint
self._cluster.metadata.update_host(host, old_endpoint)
reconnector = host.get_and_set_reconnection_handler(None)
if reconnector:
reconnector.cancel()
self._cluster.on_down(host, is_host_addition=False, expect_host_to_be_down=True)So first we update the host with the new endpoint, then mark it as down? |
|
This fixes it for me: which also makes sense to me. |
76ee195 to
edab823
Compare
|
I think CI failure is unrelated and is #359 |
edab823 to
1884f59
Compare
|
By using the (not amazing) benchmark from #653 , I got the following results: For master branch as a baseline: This branch (with just DC aware improvements): ** 433 -> 781 Kops/sec improvement ** With improvement to rack aware (on top of master), I got: ** 277 -> 324 Kops/sec improvement ** And on top of this branch: ** 277 -> 344 Kops/sec improvement ** And finally, for #650 : which kinda makes me suspect that branch is no good :-/ |
Sent separate PR - #654 |
|
With rack aware added (3rd commit), these are the current numbers: |
cc0204d to
6282e6f
Compare
Now that I also cache non-local hosts, not just remote (duh!), perf. is better: |
|
Added for TokenAware as well some optimization (need to improve commit message). So reasonable improvement, at least in this micro-benchmark. |
8f96d39 to
87c6a01
Compare
|
Last push, I think I'm done: |
…tate Refactor `DCAwareRoundRobinPolicy` to simplify distance calculations and memory usage. Key changes: - Remove `_hosts_by_distance` and the complex caching of LOCAL hosts. - `distance()` now checks `host.datacenter` directly for LOCAL calculation, which is correct and static. - Only cache `_remote_hosts` to efficiently handle `used_hosts_per_remote_dc`. - Optimize control plane operations (`on_up`, `on_down`) to only rebuild the remote cache when necessary (when remote hosts change or local DC changes). - This removes the overhead of maintaining a redundant LOCAL cache and ensures correct behavior even if a local host is marked down. - Snapshot `self._remote_hosts` to a local variable before reads for GIL-free Python 3.13+ safety. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | Mem KB --------------------------------------------------------- DCAware | 195 | +84% | 1.5 RackAware | 68 | 0% | 9.3 TokenAware(DCAware) | 19 | +6% | 1.7 TokenAware(RackAware) | 16 | -6% | 2.8 Default(DCAware) | 136 | +49% | 1.6 HostFilter(DCAware) | 69 | +30% | 1.7 DCAware nearly doubles throughput (+84%), memory footprint drops from 9.3 KB to 1.5 KB. DefaultPolicy and HostFilter benefit transitively since they wrap DCAware. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
…distances Refactor `RackAwareRoundRobinPolicy` to simplify distance calculations and memory usage. Add self._remote_hosts to cache remote hosts distance, self._non_local_rack_hosts for non-local rack host distance. - Only cache `_remote_hosts` to efficiently handle `used_hosts_per_remote_dc`. - Optimize control plane operations (`on_up`, `on_down`) to only rebuild the remote cache when necessary (when remote hosts change or local DC changes). - Snapshot `self._remote_hosts` to a local variable before reads for GIL-free Python 3.13+ safety. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | delta | Mem KB ----------------------------------------------------------------- DCAware | 188 | +77% | | 1.5 RackAware | 146 | +115% | +115% | 2.0 TokenAware(DCAware) | 20 | +11% | | 1.7 TokenAware(RackAware) | 19 | +12% | +19% | 2.2 Default(DCAware) | 123 | +35% | | 1.6 HostFilter(DCAware) | 57 | +8% | | 1.7 RackAware more than doubles throughput (+115%), from 68 to 146 Kops/s. TokenAware(RackAware) also benefits transitively (+12% cumulative). Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
…y planning.
Optimize TokenAwarePolicy query plan generation
This patch significantly improves the performance of TokenAwarePolicy by
reducing overhead in list materialization and distance calculation.
Key changes:
1. Introduced `make_query_plan_with_exclusion()` to the LoadBalancingPolicy
interface.
- This allows a parent policy (like TokenAware) to request a plan from
a child policy while efficiently skipping a set of already-yielded
hosts (replicas).
- Implemented optimized versions in `DCAwareRoundRobinPolicy` and
`RackAwareRoundRobinPolicy`. These implementations integrate the
exclusion check directly into their generation loops, avoiding the
need for inefficient external filtering or full list materialization.
2. Optimized `TokenAwarePolicy.make_query_plan`:
- Removed list materialization of the child query plan.
- Replaced multiple passes over replicas (checking `child.distance`
each time) with a single pass that buckets replicas into local/remote
lists.
- Utilizes `make_query_plan_with_exclusion` to yield the remainder
of the plan.
- Added `__slots__` to reduce memory overhead and attribute access cost.
Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs):
Policy | Kops/s | vs master | delta | Mem KB
-----------------------------------------------------------------
DCAware | 159 | +50% | | 1.5
RackAware | 128 | +88% | | 2.0
TokenAware(DCAware) | 81 | +350% | +305% | 2.9
TokenAware(RackAware) | 64 | +276% | +237% | 3.6
Default(DCAware) | 199 | +119% | +62% | 1.6
HostFilter(DCAware) | 56 | +6% | | 1.7
TokenAware throughput increases ~4x: DCAware-wrapped goes from 18 to 81 Kops/s
(+350%), RackAware-wrapped from 17 to 64 Kops/s (+276%).
DefaultPolicy also benefits strongly (+119%) from the new exclusion interface.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Use index-based loops and excluded fast-paths in DC/Rack-aware policies to speed up TokenAware replica selection and exclusion handling. Snapshot `self._remote_hosts` to local variables in make_query_plan and make_query_plan_with_exclusion for GIL-free Python 3.13+ safety. Removed hot-path try/except and avoid eager list conversion. Updated TokenAware tests/mocks for token_map and deterministic ordering. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | delta | Mem KB ----------------------------------------------------------------- DCAware | 201 | +90% | +26% | 1.5 RackAware | 163 | +140% | +27% | 2.0 TokenAware(DCAware) | 96 | +433% | +19% | 1.7 TokenAware(RackAware) | 88 | +418% | +38% | 2.2 Default(DCAware) | 137 | +51% | -31% | 1.6 HostFilter(DCAware) | 65 | +23% | +16% | 1.7 Index-based loops add +19-38% to TokenAware, +26-27% to DC/RackAware. Cumulative: DCAware ~2x, RackAware ~2.4x, TokenAware ~5x vs master. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
DefaultLoadBalancingPolicy: add make_query_plan_with_exclusion - forward exclusions to child policy - preserve target_host preference while skipping excluded hosts HostFilterPolicy: add make_query_plan_with_exclusion - forward exclusions to child policy - filter excluded hosts via predicate in exclusion-aware plans - add isinstance guard to avoid redundant set() conversion when excluded is already a set Also add isinstance guard in DefaultLoadBalancingPolicy.make_query_plan_with_exclusion for consistency. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | delta | Mem KB ----------------------------------------------------------------- DCAware | 206 | +94% | | 1.5 RackAware | 172 | +153% | | 2.0 TokenAware(DCAware) | 97 | +439% | | 1.7 TokenAware(RackAware) | 89 | +424% | | 2.2 Default(DCAware) | 132 | +45% | -4% | 1.6 HostFilter(DCAware) | 54 | +2% | -17% | 1.7 This commit primarily adds exclusion-aware interfaces to Default and HostFilter policies. The throughput deltas are within noise; the main benefit is architectural — enabling these policies to participate in the exclusion protocol for composition with TokenAware. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Add an LRU cache (OrderedDict-based, default size 1024) to TokenAwarePolicy that avoids repeated token-to-replica lookups for the same (keyspace, routing_key) pair. The cache is automatically invalidated when the token_map object identity changes (topology rebuild), using direct reference comparison (`is not`) instead of `id()` to avoid stale cache hits from id reuse after GC. Set cache_replicas_size=0 to disable. Only the non-tablet code path is cached; the tablet path is unchanged. Thread-safety fixes: - Add `super().__init__()` call to initialize `_hosts_lock` from LoadBalancingPolicy base class. - Add `_cache_lock` (threading.Lock) to protect the OrderedDict-based LRU cache, since `move_to_end()` + `popitem()` sequences are not atomic even under CPython's GIL. - Add `_hosts_lock` and `_cache_lock` to `__slots__`. Includes 7 new unit tests for cache hit, miss (different key/keyspace), topology invalidation, eviction, disabled mode, and tablet bypass. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | delta | Mem KB ----------------------------------------------------------------- DCAware | 200 | +89% | | 1.5 RackAware | 167 | +146% | | 2.0 TokenAware(DCAware) | 64 | +256% | -34% | 207.5 TokenAware(RackAware) | 62 | +265% | -30% | 87.1 Default(DCAware) | 142 | +56% | | 1.6 HostFilter(DCAware) | 66 | +25% | | 1.7 Note: The cache shows a regression vs the previous commit in this micro-benchmark because mock get_replicas is O(1). In production with real metadata token ring lookups, the cache amortizes that cost. The cache adds ~87-208 KB memory for 1024 entries. The primary value of this commit is correctness (thread-safety, cache invalidation) and amortized lookup cost for real workloads with repeated partition keys.
Restructure the shuffle block in TokenAwarePolicy.make_query_plan to use a nested 'if' instead of 'and', making it explicit that LWT queries skip both the shuffle and the list copy (Paxos leader optimization). Includes 4 new unit tests verifying: LWT deterministic ordering, LWT skips list copy, non-LWT shuffle works, and LWT+cache determinism. Benchmark (100K queries, 45-node/5-DC topology, Python 3.14, median of 5 runs): Policy | Kops/s | vs master | Mem KB --------------------------------------------------------- DCAware | 204 | +92% | 1.5 RackAware | 180 | +165% | 2.0 TokenAware(DCAware) | 60 | +233% | 207.5 TokenAware(RackAware) | 57 | +235% | 87.1 Default(DCAware) | 132 | +45% | 1.6 HostFilter(DCAware) | 66 | +25% | 1.7 Cumulative improvement over master (all 7 commits): DCAware: +92% (106 -> 204 Kops/s) RackAware: +165% ( 68 -> 180 Kops/s) TokenAware(DCAware): +233% ( 18 -> 60 Kops/s) TokenAware(RackAware):+235% ( 17 -> 57 Kops/s) Default(DCAware): +45% ( 91 -> 132 Kops/s) HostFilter(DCAware): +25% ( 53 -> 66 Kops/s) Note: TokenAware benchmark uses unique keys per query (0% cache hit rate). Real workloads with repeated partition keys will benefit further from the replica cache added in the previous commit. The LWT shortcut itself has no measurable impact on non-LWT queries (which this benchmark exercises); its benefit is deterministic ordering for LWT.
5f283d1 to
bd6a9c5
Compare
|
Latest numbers: |
There was a problem hiding this comment.
Pull request overview
This PR optimizes load balancing policies (DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, HostFilterPolicy) with host distance caching and general performance improvements. The key insight is caching computed host distance data (remote hosts, non-local-rack hosts) and replica lookups to avoid repeated computation in the hot query-planning path.
Changes:
- Introduce
_remote_hosts(COW dict) onDCAwareRoundRobinPolicyandRackAwareRoundRobinPolicyfor O(1) distance lookups, plus_non_local_rack_hostsfor rack-aware iteration; both refreshed on topology changes. - Add an LRU replica cache to
TokenAwarePolicy(keyed by(keyspace, routing_key), invalidated on token map changes) and restructuremake_query_planto use direct distance bucketing instead of repeatedyield_in_orderscans. - Add
make_query_plan_with_exclusionAPI toLoadBalancingPolicyand all subclasses, enablingTokenAwarePolicyto skip already-yielded replicas when querying the child policy for remaining hosts.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cassandra/policies.py |
Core optimization: distance caching in DC/RackAware policies, LRU replica cache in TokenAwarePolicy, new make_query_plan_with_exclusion method across policies, formatting cleanup |
tests/unit/test_policies.py |
New tests for make_query_plan_with_exclusion, replica cache (hit/miss/eviction/invalidation/disabled), LWT determinism, tablet bypass; test mocks updated for new token_map-based replica resolution; formatting cleanup |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Reapply distance sorting (LOCAL_RACK -> LOCAL -> REMOTE) to non-replica hosts in TokenAwarePolicy.make_query_plan, restoring behavioral parity with the original yield_in_order approach - Remove __slots__ from TokenAwarePolicy to avoid breaking downstream subclasses that add instance attributes - Fix DCAwareRoundRobinPolicy.on_up to only refresh remote host cache when a host is actually new, avoiding unnecessary rebuilds during reconnection storms
…olicies TokenAwarePolicy.make_query_plan now uses isinstance to detect when the child policy already yields hosts in distance order (DCAware, RackAware), streaming non-replica hosts directly instead of collecting and re-sorting them by distance. Other child policies (e.g. RoundRobinPolicy) still get the distance re-sort fallback to guarantee correct ordering.
There was a problem hiding this comment.
Pull request overview
This PR optimizes the load balancing policies in the ScyllaDB Python driver by introducing distance caching via Copy-On-Write (COW) strategy, an LRU cache for token-to-replica lookups in TokenAwarePolicy, and a new make_query_plan_with_exclusion API to avoid redundant iteration.
Changes:
- Introduces
_remote_hostsand_non_local_rack_hostscached dictionaries/lists inDCAwareRoundRobinPolicyandRackAwareRoundRobinPolicyfor O(1) distance lookups, rebuilt on topology changes. - Adds an LRU cache (
OrderedDict) inTokenAwarePolicyfor token-to-replica lookups, invalidated by token_map object identity change; includes LWT deterministic ordering (no shuffle). - Adds
make_query_plan_with_exclusion()toLoadBalancingPolicyand its subclasses to efficiently skip already-yielded replicas inTokenAwarePolicy.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cassandra/policies.py |
Core performance optimizations: COW distance caching, LRU replica cache, make_query_plan_with_exclusion API, LWT shuffle skip, code formatting |
tests/unit/test_policies.py |
New tests for exclusion-based query planning, LRU cache behavior, LWT determinism, cache invalidation, and test infrastructure updates for new mock requirements |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| child_policy.make_query_plan_with_exclusion.assert_called() | ||
| elif child_policy.make_query_plan_with_exclusion.called: | ||
| child_policy.make_query_plan_with_exclusion.assert_called() |
Refactor
DCAwareRoundRobinPolicyto use a Copy-On-Write (COW) strategy for managing host distances.Results (5 DCs × 3 racks × 3 nodes = 45 nodes, 100K queries, median of 5 iterations):
Key changes:
_remote_hoststo cacheREMOTEhosts, enabling O(1) distance lookups during query planning for distance.IGNOREDhosts do not need to be stored in the cache.For 'LOCAL' we do a simple comparison.
_refresh_remote_hoststo handle node changes.TokenAwarePolicy(default 1024 entries, auto-invalidated on topology change).TokenAwarePolicyskips distance re-sorting for DCAware/RackAware child policies (they already yield in distance order), with a fallback re-sort for other child policies.TokenAwarePolicyno longer uses__slots__to avoid breaking downstream subclasses.This is a different attempt from #650 to add caching to host distance to make query planning faster.
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.