fix: repair TCP port-remapping slow path#3
Open
ghazariann wants to merge 5 commits intomultikernel:mainfrom
Open
fix: repair TCP port-remapping slow path#3ghazariann wants to merge 5 commits intomultikernel:mainfrom
ghazariann wants to merge 5 commits intomultikernel:mainfrom
Conversation
congwang-mk
requested changes
Mar 24, 2026
Contributor
congwang-mk
left a comment
There was a problem hiding this comment.
Thanks for your PR. Some comments below
| _NR_PIDFD_GETFD = 438 # x86_64 and aarch64 (asm-generic) | ||
|
|
||
|
|
||
| def _pidfd_open(pid: int) -> int: |
Contributor
There was a problem hiding this comment.
duplicated with the one in _context.py ?
Author
There was a problem hiding this comment.
Yes, I refrain from importing it from _context, should we separate all syscall wrappers into a different file, like _syscalls.py
tests/test_sandbox.py
Outdated
| t1 = threading.Thread(target=run, args=(0, code_hold)) | ||
| t1.start() | ||
| import time | ||
| time.sleep(1) |
Contributor
There was a problem hiding this comment.
This could be flaky, could it be improved?
Signed-off-by: Cong Wang <cwang@multikernel.io>
Signed-off-by: Vahagn <wanghang25@mails.tsinghua.edu.cn>
Signed-off-by: Vahagn <wanghang25@mails.tsinghua.edu.cn>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2. Four bugs in the TCP port-remapping slow path. The slow path (triggered when the virtual port is already taken) had not been tested (all existing tests ran sequentially, so the port was always free by the time the second sandbox started).
Bugs fixed
Bug 1 -
_allocate_real_portheld socket blocked the child's bindThe parent kept the allocated real port socket open to prevent stealing, then
rewrote the child's sockaddr to that port and sent CONTINUE. The child's
bind()hitEADDRINUSEbecause the parent was already holding the sameaddress:port (only released later in
PortMap.close()).Fixed by closing it immediately, accepting the same tiny race window already mentioned in
_try_reserve_port.Bug 2 -
os.pidfd_opennot available in Python buildfixup_getsocknameusedos.pidfd_openwhich raisesAttributeErroron some builds.Fixed by using
_libc.syscall(_NR_PIDFD_OPEN)directly, consistent with therest of the codebase. In the future, it makes sense to separate all syscalls into
a different file like
_syscall.py, since an identical_NR_PIDFD_OPENwasfound in
_context.py.Bug 3 -
fixup_getsocknameleaked the duplicated fds.detach()does not callos.close(), solocal_fdwas never closed on thesuccess path. The leaked fd kept the child's socket alive after the child
exited, holding the port bound in the parent. Fixed by moving\
os.close(local_fd)into thefinallyblock.Bug 4 -
NotifSupervisor.stop()never calledport_map.close()Proxy listener sockets were left running after sandbox teardown. Fixed by
calling
port_map.close()at the end ofstop().Tests
Two new tests added to
TestPortRemap:test_slow_path_host_holds_virtual_port: parent process holds the virtualport, forcing the slow path without threads or timing dependencies.
test_slow_path_two_concurrent_sandboxes: two sandboxes run concurrently; sandbox 1 holds the port open while sandbox 2 must remap.