fix: check for existing buildkitd before mounting sticky disk #71
When setup-docker-builder is invoked twice in the same job (e.g. via a composite action called twice), the second invocation was calling `setupStickyDisk()` before detecting the already-running buildkitd. This caused a new sticky disk to be mounted on top of /var/lib/buildkit while buildkitd was still running with in-memory metadata referencing snapshot directories from the original disk. The subsequent build then failed with:

```
ERROR: failed to solve: failed to read dockerfile: failed to walk: resolve: lstat /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/N: no such file or directory
```

Fix: move the buildkitd process check to the very beginning of `startBlacksmithBuilder()`, before any sticky disk setup. If buildkitd is already running, log an informational message and return immediately so the fallback path reuses the existing configured builder (from the first invocation) without corrupting its overlayfs snapshot state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
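The reordered check can be sketched roughly like this. This is a minimal illustration, not the action's actual code: `isBuildkitdRunning`, the `pgrep` probe, and the return shape are assumptions, and the real function is async and takes more parameters.

```typescript
import { execSync } from "node:child_process";

// Hypothetical sketch: probe for a running buildkitd before any sticky disk work.
function isBuildkitdRunning(): boolean {
  try {
    // pgrep exits non-zero when no process matches, which throws here.
    execSync("pgrep -x buildkitd", { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

function startBlacksmithBuilder(): { addr: string | null } {
  // The fix: this guard now runs before setupStickyDisk(), so a second
  // invocation never remounts a sticky disk under a live buildkitd.
  if (isBuildkitdRunning()) {
    console.info("buildkitd already running; reusing existing builder");
    return { addr: null };
  }
  // ...first invocation: mount sticky disk, start buildkitd, return its address.
  return { addr: "unix:///run/buildkit/buildkitd.sock" };
}
```

Because the early-exit path returns `addr: null`, a second invocation never records a builder address of its own, which is what the cleanup fix below relies on.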
Why?
====

When setup-docker-builder is called twice in one job, the second instance's post-action cleanup kills the buildkitd process that the first instance started. GitHub Actions runs post steps in reverse order, so the second cleanup runs first and tears down buildkitd before the first instance's builds can export their build records. This causes "connection refused" errors in the first build's post step and "buildkitd has crashed" warnings in the first setup's cleanup.

How?
====

- Inspected the failing job logs from the ee-monorepo CI run to trace the exact sequence: Post Setup (2nd) kills buildkitd, then Post Build (1st) gets connection refused.
- Confirmed that `startBlacksmithBuilder` returns `addr: null` when it detects existing buildkitd, which means `buildkitdAddr` is never saved to state for the second instance.
- Used `buildkitdAddr` presence in state as the ownership signal: only the instance that started buildkitd should shut it down.
- Extracted the shutdown logic into `maybeShutdownBuildkitd()` with early returns to flatten the deeply nested if/try/catch structure, and deduplicated the crash-log printing into `logBuildkitdCrashLogs()`.
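The ownership check described above can be sketched like this. This is a hypothetical, simplified shape: the real post step reads state via `@actions/core` and actually signals the process, and the state accessor is injected here only to keep the sketch self-contained.

```typescript
// Hypothetical sketch: decide whether this instance owns buildkitd.
// Returns true if a shutdown was performed, false if it was skipped.
function maybeShutdownBuildkitd(getState: (key: string) => string): boolean {
  // Only the instance that started buildkitd saved buildkitdAddr to state;
  // the early-exit path returned addr: null and saved nothing.
  const addr = getState("buildkitdAddr");
  if (!addr) {
    console.info("this instance did not start buildkitd; skipping shutdown");
    return false;
  }
  // ...here the real cleanup would SIGTERM the buildkitd it started, wait
  // for it to exit, and call logBuildkitdCrashLogs() if it had already died.
  console.info(`shutting down buildkitd at ${addr}`);
  return true;
}
```

With post steps running in reverse order, the second instance hits the early return and leaves buildkitd alive, and only the first instance (the owner) tears it down.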
When you rebuild I will test this again!
@chadxz - sorry for the delay; you should be good to go here!
Tested this against our monorepo where we hit the original bug: we have composite actions that call the setup action, so a single workflow ends up calling it twice. The logs show the fix working as expected: the first invocation starts buildkitd, and the second invocation detects it and bails out early. Cleanup also looked correct: the second post-step skipped shutdown since it did not start buildkitd, and the first post-step shut it down gracefully. Full job logs: https://app.blacksmith.sh/convergint/runs/23514311534/jobs/68442899155
Based on #65 by @chadxz, with dist rebuilt.
Note
Medium Risk
Changes builder initialization control flow to early-exit when `buildkitd` is already running, which affects how repeated action invocations behave and could impact workflows that relied on re-initialization side effects. Scope is localized to setup sequencing and logging, with no new external interfaces.

Overview

Prevents repeated `setup-docker-builder` invocations in the same job from remounting the sticky disk over an already-running `buildkitd`. `startBlacksmithBuilder()` now checks for an existing `buildkitd` process before calling `setupStickyDisk()`; if found, it logs and returns early so the action reuses the already-configured builder instead of corrupting BuildKit's on-disk snapshot state.

Written by Cursor Bugbot for commit 730dff6.