PIP-121: Implement AutoClusterFailover#547

Open
BewareMyPower wants to merge 13 commits into apache:main from BewareMyPower:bewaremypower/auto-cluster-failover
@BewareMyPower (Contributor) commented Mar 5, 2026

Compared to the Java implementation, this PR changes the delay-based failover and switch-back configs to counter-based configs. See apache/pulsar#25326 for the reason.
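To illustrate what a counter-based policy can look like, here is a minimal sketch. It is not the code from this PR: the class name `CounterFailoverPolicy`, the `Target` enum, and both threshold parameters are hypothetical, and it assumes the counters track consecutive probe results for the primary cluster.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the API added by the PR.
enum class Target { Primary, Secondary };

class CounterFailoverPolicy {
   public:
    CounterFailoverPolicy(uint32_t failoverThreshold, uint32_t switchBackThreshold)
        : failoverThreshold_(failoverThreshold), switchBackThreshold_(switchBackThreshold) {
        assert(failoverThreshold > 0 && switchBackThreshold > 0);
    }

    // Feed the result of one probe of the primary cluster and return the
    // cluster the client should use afterwards.
    Target onPrimaryProbe(bool reachable) {
        if (current_ == Target::Primary) {
            // Count consecutive failures; any success resets the streak.
            consecutive_ = reachable ? 0 : consecutive_ + 1;
            if (consecutive_ >= failoverThreshold_) {
                current_ = Target::Secondary;
                consecutive_ = 0;
            }
        } else {
            // Count consecutive successes; any failure resets the streak.
            consecutive_ = reachable ? consecutive_ + 1 : 0;
            if (consecutive_ >= switchBackThreshold_) {
                current_ = Target::Primary;
                consecutive_ = 0;
            }
        }
        return current_;
    }

    Target current() const { return current_; }

   private:
    const uint32_t failoverThreshold_;
    const uint32_t switchBackThreshold_;
    uint32_t consecutive_ = 0;
    Target current_ = Target::Primary;
};
```

With thresholds expressed as probe counts, the effective delay is roughly the threshold multiplied by the probe interval, so a single stray probe result does not trigger a switch on its own.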

@BewareMyPower force-pushed the bewaremypower/auto-cluster-failover branch from 7b1ba7d to 5df139c, March 13, 2026 08:19
@BewareMyPower self-assigned this, Mar 13, 2026
@BewareMyPower requested a review from Copilot, March 13, 2026 14:01
@BewareMyPower marked this pull request as ready for review, March 13, 2026 14:01
Copilot AI left a comment

Pull request overview

Adds an AutoClusterFailover implementation of ServiceInfoProvider to automatically switch the client’s service URL between a primary cluster and fallback clusters based on periodic TCP reachability probes, plus unit tests for basic failover/switchback behavior.

Changes:

  • Introduces new public pulsar::AutoClusterFailover API (builder/config) and implementation with periodic probing and delayed failover/switchback.
  • Adds new tests exercising failover to an available secondary and switching back to primary after recovery.
  • Makes ServiceInfo default-constructible to support the new builder/config construction pattern.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| lib/AutoClusterFailover.cc | Implements the async probing + state machine for primary/secondary switching. |
| include/pulsar/AutoClusterFailover.h | Adds the public API (Config + Builder) for wiring AutoClusterFailover into Client::create(...). |
| include/pulsar/ServiceInfo.h | Adds a default constructor to ServiceInfo to enable default construction in the new builder/config. |
| tests/ServiceInfoProviderTest.cc | Adds probe-based tests for failover and switchback timing behavior. |

Copilot AI left a comment

Pull request overview

Introduces a new AutoClusterFailover ServiceInfoProvider implementation that periodically probes configured Pulsar service URLs and automatically switches between a primary and one (or more) secondary clusters based on availability.

Changes:

  • Added public pulsar::AutoClusterFailover API (builder/config) in include/pulsar/AutoClusterFailover.h.
  • Implemented probing + failover/switchback logic using Asio in lib/AutoClusterFailover.cc.
  • Added new gtest coverage for failover and switchback timing behavior in tests/ServiceInfoProviderTest.cc.
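The selection step described in this overview (a primary plus one or more secondaries, chosen by availability) could look roughly like the sketch below. Everything here is an assumption for illustration: `Probe`, `firstReachable`, and `chooseServiceUrl` are hypothetical names, the probe is injected as a callback rather than performed with Asio, the preference order (primary first, then secondaries in list order) is a guess, and the threshold counters are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical probe callback: returns true if the given service URL is reachable.
using Probe = std::function<bool(const std::string&)>;

// Returns the index of the first URL the probe reports as reachable,
// or std::nullopt if none is reachable.
std::optional<std::size_t> firstReachable(const std::vector<std::string>& urls, const Probe& probe) {
    for (std::size_t i = 0; i < urls.size(); ++i) {
        if (probe(urls[i])) return i;
    }
    return std::nullopt;
}

// Assumed preference order: the primary if reachable, otherwise the first
// reachable secondary, otherwise stay on the current URL.
std::string chooseServiceUrl(const std::string& primary, const std::vector<std::string>& secondaries,
                             const std::string& current, const Probe& probe) {
    assert(probe && "probe callback must be set");
    if (probe(primary)) return primary;
    const auto index = firstReachable(secondaries, probe);
    return index ? secondaries[*index] : current;
}
```

Injecting the probe as a callback keeps the selection logic testable without opening real TCP connections.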

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| include/pulsar/AutoClusterFailover.h | New public header/API for configuring and constructing the failover provider. |
| lib/AutoClusterFailover.cc | Core implementation: periodic async TCP probing and cluster switching logic. |
| tests/ServiceInfoProviderTest.cc | Adds tests exercising failover to secondary and switchback to primary after recovery. |

2 participants