Thread-safe config reads and SIGHUP reload for HTTP API server by mason-sharp · Pull Request #89 · pgEdge/ace

mason-sharp · 2026-03-13T21:08:58Z

A follow up to the SIGHUP changes in PR #84

Replace all bare config.Cfg reads with config.Get() across auth.go, table_diff.go, merkle.go, listen.go, and setup.go to eliminate data races with config.Set() during SIGHUP reload.

Switch HTTP API handlers from stale s.cfg snapshot to config.Get() so reloaded values (block size, concurrency, timeouts, etc.) take effect on the next request. Remove the cfg field from APIServer.

Add SIGHUP config reload handling for both API-only entry points: ./ace server (StartAPIServerCLI) and ./ace start --component=api. Invalid configs are rejected with the previous config retained.

While the test included is basic, there was testing done to verify the change.

Replace all bare config.Cfg reads with config.Get() across auth.go, table_diff.go, merkle.go, listen.go, and setup.go to eliminate data races with config.Set() during SIGHUP reload. Switch HTTP API handlers from stale s.cfg snapshot to config.Get() so reloaded values (block size, concurrency, timeouts, etc.) take effect on the next request. Remove the cfg field from APIServer. Add SIGHUP config reload handling for both API-only entry points: ./ace server (StartAPIServerCLI) and ./ace start --component=api. Invalid configs are rejected with the previous config retained. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-13T21:09:17Z

📝 Walkthrough

Walkthrough

The PR refactors configuration access throughout the codebase from static snapshots to a centralized runtime accessor, enabling hot-reload of configuration via SIGHUP signals. Changes include removing stored config from APIServer struct, updating all config access patterns to use config.Get(), adding SIGHUP signal handling in CLI, and introducing related tests and documentation.

Changes

Cohort / File(s)	Summary
API Server Configuration Refactoring `internal/api/http/server.go`, `internal/api/http/handler.go`, `internal/api/http/config_reload_test.go`	Removes cfg field from APIServer struct; updates handler methods and resolver functions to use `config.Get()` for live configuration access; introduces test validating resolver methods pick up reloaded configuration values.
Configuration Access Pattern Refactoring `internal/consistency/diff/table_diff.go`, `internal/consistency/mtree/merkle.go`, `internal/infra/cdc/*`, `internal/infra/db/auth.go`	Replaces global `config.Cfg` access with dynamic `config.Get()` accessor across consistency checking, merkle tree operations, CDC replication, and database connection handling.
CLI Signal Handling and Scheduler Reload `internal/cli/cli.go`	Introduces SIGHUP-based configuration reload for API and scheduler modes; adds `schedulerReloadLoop` function to handle graceful in-flight job draining and atomic config swap.
Documentation `docs/scheduling.md`	Adds section documenting SIGHUP signaling mechanism for scheduler configuration reload, including behavior for waiting on in-flight job completion before swapping configuration.

Poem

🐰 Hops with glee ✨
The scheduler drinks fresh water from the spring,
No restart needed, just SIGHUP's ring!
Config refreshed while jobs still run,
The reload dance has just begun,
Live and nimble, we adapt with cheer!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: replacing config.Cfg with config.Get() for thread safety and adding SIGHUP reload functionality to the HTTP API server.
Description check	✅ Passed	The description is directly related to the changeset, explaining the follow-up to PR `#84`, the switch from config.Cfg to config.Get(), removal of the cfg field from APIServer, and SIGHUP handling for API entry points.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/ACE/sighup-follow-up

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.

OpenGrep is compatible with Semgrep configurations. Add an opengrep.yml or semgrep.yml configuration file to your project to enable OpenGrep analysis.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (4)

internal/consistency/diff/table_diff.go (1)

730-735: Add a nil-config guard in validation path.

cfg.TableDiff is dereferenced immediately; returning a clear validation error is safer than panicking.

Proposed fix

 	cfg := config.Get()
+	if cfg == nil {
+		return fmt.Errorf("configuration not loaded")
+	}
 	if t.BlockSize > cfg.TableDiff.MaxBlockSize && !t.OverrideBlockSize {
 		return fmt.Errorf("block row size should be <= %d", cfg.TableDiff.MaxBlockSize)
 	}

Also applies to: 741-743

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/consistency/diff/table_diff.go` around lines 730 - 735, The
validation currently dereferences cfg.TableDiff without checking for nil; update
the validation path (the block where cfg := config.Get() compares t.BlockSize
against cfg.TableDiff.MaxBlockSize/MinBlockSize and checks t.OverrideBlockSize)
to first guard that cfg.TableDiff != nil and return a clear error (e.g., "table
diff configuration missing") if it is nil; apply the same nil-config guard to
the other identical check later (the code around the second occurrence
referencing cfg.TableDiff at lines ~741-743) so neither MaxBlockSize nor
MinBlockSize dereferences can panic.

internal/infra/cdc/setup.go (1)

42-43: Guard config.Get() before dereferencing.

config.Get().MTree.CDC can panic when config is unavailable. Returning an explicit error is safer.

Proposed fix

-	cfg := config.Get().MTree.CDC
+	rootCfg := config.Get()
+	if rootCfg == nil {
+		return 0, fmt.Errorf("configuration not loaded")
+	}
+	cfg := rootCfg.MTree.CDC

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/infra/cdc/setup.go` around lines 42 - 43, Guard the result of
config.Get() and its nested fields before dereferencing: replace the direct
access cfg := config.Get().MTree.CDC and slot := cfg.SlotName with a nil-check
flow that obtains root := config.Get(), returns an explicit error if root is nil
or if root.MTree or root.MTree.CDC is nil, then assigns cfg := root.MTree.CDC
and slot := cfg.SlotName; ensure the containing function returns an error when
config is unavailable rather than letting a panic occur.

docs/scheduling.md (1)

44-47: Add a concrete command example for sending SIGHUP.

Consider including one explicit example (e.g., kill -HUP <pid>) so operators can apply this immediately.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/scheduling.md` around lines 44 - 47, Update the "Signaling the Scheduler
to Reload the Configuration" section to include a concrete command example such
as "kill -HUP <pid>" and an alternative like "pkill -HUP <process_name>", plus a
brief note how to locate the PID (e.g., pgrep or systemctl show) and mention the
systemd variant "systemctl kill -s HUP <service>" so operators immediately have
actionable commands; place this immediately after the sentence that explains
using SIGHUP.

internal/api/http/handler.go (1)

223-223: Use one config snapshot per request to avoid mixed-version task settings.

At Line 223, Line 363, Line 444, Line 509, Line 584, Line 666, Line 744, Line 802, Line 866, Line 937, Line 1012, and Line 1091, each handler reads config.Get() independently. Since pkg/config/config.go:116-179 allows Set() to swap config between reads, a single request can assemble a task with mixed old/new config values.

♻️ Suggested pattern (snapshot once, reuse)

 func (s *APIServer) handleTableDiff(w http.ResponseWriter, r *http.Request) {
+	cfg := config.Get()
+	if cfg == nil {
+		writeError(w, http.StatusServiceUnavailable, "configuration unavailable")
+		return
+	}
+
 	...
-	task.BlockSize = s.resolveBlockSize(req.BlockSize)
-	task.ConcurrencyFactor = s.resolveConcurrency(req.Concurrency)
-	task.CompareUnitSize = s.resolveCompareUnitSize(req.CompareUnitSize)
-	task.MaxDiffRows = s.resolveMaxDiffRows(req.MaxDiffRows)
+	task.BlockSize = s.resolveBlockSize(cfg, req.BlockSize)
+	task.ConcurrencyFactor = s.resolveConcurrency(cfg, req.Concurrency)
+	task.CompareUnitSize = s.resolveCompareUnitSize(cfg, req.CompareUnitSize)
+	task.MaxDiffRows = s.resolveMaxDiffRows(cfg, req.MaxDiffRows)
 	...
-	task.TaskStorePath = config.Get().Server.TaskStorePath
+	task.TaskStorePath = cfg.Server.TaskStorePath
 }
 
-func (s *APIServer) resolveBlockSize(requested int) int {
+func (s *APIServer) resolveBlockSize(cfg *config.Config, requested int) int {
 	if requested > 0 {
 		return requested
 	}
-	if cfg := config.Get(); cfg != nil && cfg.TableDiff.DiffBlockSize > 0 {
+	if cfg != nil && cfg.TableDiff.DiffBlockSize > 0 {
 		return cfg.TableDiff.DiffBlockSize
 	}
 	return 100000
 }

Also applies to: 363-363, 444-444, 509-509, 584-584, 666-666, 744-744, 802-802, 866-866, 937-937, 1012-1012, 1091-1091

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/api/http/handler.go` at line 223, Multiple places in handler.go call
config.Get() per-use (e.g., setting task.TaskStorePath =
config.Get().Server.TaskStorePath) which can mix config versions during a
request; fix by taking a single snapshot of the config at the start of each
request handler (e.g., cfg := config.Get()) and then replace subsequent
config.Get() calls in that handler with cfg (use cfg.Server.TaskStorePath,
cfg.OtherField, etc.), ensuring all task assembly and reads (including where
task.TaskStorePath is set) consistently use the same cfg for the duration of the
request.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/api/http/config_reload_test.go`:
- Around line 16-20: The test cleanup currently calls
config.Set(&config.Config{}) when original == nil, which mutates global state;
instead, restore the exact prior state by calling config.Set(original) in all
cases (i.e., allow original to be nil) so that when original is nil the global
remains nil; update the cleanup logic around the original variable and the
config.Set call (references: original, config.Set, config.Config) to ensure no
cross-test pollution.

In `@internal/cli/cli.go`:
- Around line 1456-1462: The reload path currently calls config.Reload and
immediately config.Set the newCfg without validating it; update the reload logic
(the block that calls config.Reload/newCfg and the similar block around lines
where this appears) to validate the reloaded config with
canStartAPIServer(newCfg) (the same check used at startup) before calling
config.Set, and if validation fails log an error with details and continue
without replacing the live config; ensure you reference the functions
config.Reload, config.Set and canStartAPIServer in the change and preserve the
existing success/failure logging behavior.

In `@internal/consistency/mtree/merkle.go`:
- Around line 50-52: The aceSchema() helper currently reads
config.Get().MTree.Schema on every call which can drift during reloads; instead
snapshot the schema once at task/run start and reuse that value. Modify call
sites to accept a schema parameter (or store the snapshot on a per-task struct)
and remove direct use of config.Get().MTree.Schema from aceSchema()/caller code
so aceSchema() returns the pre-captured string (or remove aceSchema and use the
injected snapshot); reference aceSchema(), config.Get().MTree.Schema and any
task/run struct or function that orchestrates the operation to propagate and use
the single captured schema value.

In `@internal/infra/cdc/listen.go`:
- Around line 61-63: The bug is that processChanges reads live config via
config.Get() allowing mid-stream schema changes; instead, capture the schema
once in processReplicationStream and pass it down: modify
processReplicationStream to read cfg := config.Get().MTree.CDC (or specifically
mtreeSchema := cfg.SomeSchemaField) before starting the stream and change the
signature of processChanges(ctx context.Context, pool *pgxpool.Pool, changes
[]cdcMsg, mtreeSchema string) to accept the schema, then update all calls to
processChanges in this file (and related call sites e.g., where
processReplicationStream invokes it) so processChanges uses the passed
mtreeSchema instead of calling config.Get() internally. Ensure any other uses of
config.Get() in this CDC path are replaced with the captured value to prevent
mixed-schema writes.

---

Nitpick comments:
In `@docs/scheduling.md`:
- Around line 44-47: Update the "Signaling the Scheduler to Reload the
Configuration" section to include a concrete command example such as "kill -HUP
<pid>" and an alternative like "pkill -HUP <process_name>", plus a brief note
how to locate the PID (e.g., pgrep or systemctl show) and mention the systemd
variant "systemctl kill -s HUP <service>" so operators immediately have
actionable commands; place this immediately after the sentence that explains
using SIGHUP.

In `@internal/api/http/handler.go`:
- Line 223: Multiple places in handler.go call config.Get() per-use (e.g.,
setting task.TaskStorePath = config.Get().Server.TaskStorePath) which can mix
config versions during a request; fix by taking a single snapshot of the config
at the start of each request handler (e.g., cfg := config.Get()) and then
replace subsequent config.Get() calls in that handler with cfg (use
cfg.Server.TaskStorePath, cfg.OtherField, etc.), ensuring all task assembly and
reads (including where task.TaskStorePath is set) consistently use the same cfg
for the duration of the request.

In `@internal/consistency/diff/table_diff.go`:
- Around line 730-735: The validation currently dereferences cfg.TableDiff
without checking for nil; update the validation path (the block where cfg :=
config.Get() compares t.BlockSize against
cfg.TableDiff.MaxBlockSize/MinBlockSize and checks t.OverrideBlockSize) to first
guard that cfg.TableDiff != nil and return a clear error (e.g., "table diff
configuration missing") if it is nil; apply the same nil-config guard to the
other identical check later (the code around the second occurrence referencing
cfg.TableDiff at lines ~741-743) so neither MaxBlockSize nor MinBlockSize
dereferences can panic.

In `@internal/infra/cdc/setup.go`:
- Around line 42-43: Guard the result of config.Get() and its nested fields
before dereferencing: replace the direct access cfg := config.Get().MTree.CDC
and slot := cfg.SlotName with a nil-check flow that obtains root :=
config.Get(), returns an explicit error if root is nil or if root.MTree or
root.MTree.CDC is nil, then assigns cfg := root.MTree.CDC and slot :=
cfg.SlotName; ensure the containing function returns an error when config is
unavailable rather than letting a panic occur.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a2543e0c-d27d-462b-8870-123b5d8e6bb4

📥 Commits

Reviewing files that changed from the base of the PR and between c3af6dd and e620446.

📒 Files selected for processing (10)

docs/scheduling.md
internal/api/http/config_reload_test.go
internal/api/http/handler.go
internal/api/http/server.go
internal/cli/cli.go
internal/consistency/diff/table_diff.go
internal/consistency/mtree/merkle.go
internal/infra/cdc/listen.go
internal/infra/cdc/setup.go
internal/infra/db/auth.go

💤 Files with no reviewable changes (1)

internal/api/http/server.go

coderabbitai · 2026-03-13T21:15:25Z

internal/api/http/config_reload_test.go

+		if original != nil {
+			config.Set(original)
+		} else {
+			config.Set(&config.Config{})
+		}


⚠️ Potential issue | 🟡 Minor

Restore exact prior global state in test cleanup.

When original == nil, setting &config.Config{} changes post-test global semantics. Restore nil instead to avoid cross-test pollution.

Proposed fix

t.Cleanup(func() { if original != nil { config.Set(original) } else { - config.Set(&config.Config{}) + config.Set(nil) } })

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if original != nil {

config.Set(original)

} else {

config.Set(&config.Config{})

}

if original != nil {

config.Set(original)

} else {

config.Set(nil)

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/api/http/config_reload_test.go` around lines 16 - 20, The test cleanup currently calls config.Set(&config.Config{}) when original == nil, which mutates global state; instead, restore the exact prior state by calling config.Set(original) in all cases (i.e., allow original to be nil) so that when original is nil the global remains nil; update the cleanup logic around the original variable and the config.Set call (references: original, config.Set, config.Config) to ensure no cross-test pollution.

coderabbitai · 2026-03-13T21:15:26Z

internal/cli/cli.go

+				newCfg, err := config.Reload(config.CfgPath)
+				if err != nil {
+					logger.Error("api: config reload failed (keeping current config): %v", err)
+					continue
+				}
+				config.Set(newCfg)
+				logger.Info("api: configuration reloaded successfully")


⚠️ Potential issue | 🟠 Major

Validate reloaded API config before config.Set() (parity with startup checks).

Both API reload loops set the new config without canStartAPIServer gating. This can accept an API-invalid config at runtime instead of retaining the previous one.

Proposed fix

newCfg, err := config.Reload(config.CfgPath) if err != nil { logger.Error("api: config reload failed (keeping current config): %v", err) continue } + if ok, vErr := canStartAPIServer(newCfg); !ok { + logger.Error("api: new config rejected (keeping current config): %v", vErr) + continue + } config.Set(newCfg) logger.Info("api: configuration reloaded successfully")

Also applies to: 1602-1608

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/cli/cli.go` around lines 1456 - 1462, The reload path currently calls config.Reload and immediately config.Set the newCfg without validating it; update the reload logic (the block that calls config.Reload/newCfg and the similar block around lines where this appears) to validate the reloaded config with canStartAPIServer(newCfg) (the same check used at startup) before calling config.Set, and if validation fails log an error with details and continue without replacing the live config; ensure you reference the functions config.Reload, config.Set and canStartAPIServer in the change and preserve the existing success/failure logging behavior.

coderabbitai · 2026-03-13T21:15:26Z

internal/consistency/mtree/merkle.go

 func aceSchema() string {
-	return config.Cfg.MTree.Schema
+	return config.Get().MTree.Schema
 }


⚠️ Potential issue | 🟠 Major

Avoid live schema lookup on every aceSchema() call.

This introduces per-call config drift risk during reloads: one long operation can target different schemas. Snapshot the schema once per task/run and reuse it consistently.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/consistency/mtree/merkle.go` around lines 50 - 52, The aceSchema() helper currently reads config.Get().MTree.Schema on every call which can drift during reloads; instead snapshot the schema once at task/run start and reuse that value. Modify call sites to accept a schema parameter (or store the snapshot on a per-task struct) and remove direct use of config.Get().MTree.Schema from aceSchema()/caller code so aceSchema() returns the pre-captured string (or remove aceSchema and use the injected snapshot); reference aceSchema(), config.Get().MTree.Schema and any task/run struct or function that orchestrates the operation to propagate and use the single captured schema value.

coderabbitai · 2026-03-13T21:15:26Z

internal/infra/cdc/listen.go

+	cfg := config.Get().MTree.CDC
 	publication := cfg.PublicationName
 	slotName := cfg.SlotName


⚠️ Potential issue | 🟠 Major

Snapshot config-derived schema once per stream to avoid mixed-schema writes.

Using live config.Get() inside processChanges allows schema changes mid-stream after SIGHUP, so one CDC run can write to different schemas. Capture schema once in processReplicationStream and pass it through.

Proposed direction

- cfg := config.Get().MTree.CDC + rootCfg := config.Get() + if rootCfg == nil { + return fmt.Errorf("configuration not loaded") + } + cfg := rootCfg.MTree.CDC + mtreeSchema := rootCfg.MTree.Schema ... - if err := processChanges(processingCtx, pool, c); err != nil { + if err := processChanges(processingCtx, pool, c, mtreeSchema); err != nil {

// apply outside selected range func processChanges(ctx context.Context, pool *pgxpool.Pool, changes []cdcMsg, mtreeSchema string) error { // use mtreeSchema directly, no config.Get() here }

Also applies to: 554-556

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/infra/cdc/listen.go` around lines 61 - 63, The bug is that processChanges reads live config via config.Get() allowing mid-stream schema changes; instead, capture the schema once in processReplicationStream and pass it down: modify processReplicationStream to read cfg := config.Get().MTree.CDC (or specifically mtreeSchema := cfg.SomeSchemaField) before starting the stream and change the signature of processChanges(ctx context.Context, pool *pgxpool.Pool, changes []cdcMsg, mtreeSchema string) to accept the schema, then update all calls to processChanges in this file (and related call sites e.g., where processReplicationStream invokes it) so processChanges uses the passed mtreeSchema instead of calling config.Get() internally. Ensure any other uses of config.Get() in this CDC path are replaced with the captured value to prevent mixed-schema writes.

coderabbitai bot reviewed Mar 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread-safe config reads and SIGHUP reload for HTTP API server#89

Thread-safe config reads and SIGHUP reload for HTTP API server#89
mason-sharp wants to merge 1 commit intomainfrom
feature/ACE/sighup-follow-up

mason-sharp commented Mar 13, 2026

Uh oh!

coderabbitai bot commented Mar 13, 2026 •

edited

Loading

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 13, 2026

Uh oh!

coderabbitai bot Mar 13, 2026

Uh oh!

coderabbitai bot Mar 13, 2026

Uh oh!

coderabbitai bot Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mason-sharp commented Mar 13, 2026

Uh oh!

coderabbitai bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 13, 2026 •

edited

Loading