Skip to content

Fix Surface Flux Tally Seg Fault#3891

Draft
jtramm wants to merge 2 commits intoopenmc-dev:developfrom
jtramm:fix/lattice-surface-tally-crash
Draft

Fix Surface Flux Tally Seg Fault#3891
jtramm wants to merge 2 commits intoopenmc-dev:developfrom
jtramm:fix/lattice-surface-tally-crash

Conversation

@jtramm
Copy link
Contributor

@jtramm jtramm commented Mar 19, 2026

Description

PR #3742 moved the surface dereference in event_cross_surface() above the lattice translation check so that the surf variable would be available for score_surface_tally() at the end of the function. This caused an out-of-bounds access on model::surfaces when the particle crosses a lattice boundary, because surface() is SURFACE_NONE (0) and surface_index() evaluates to -1.

The fix moves the surface dereference and score_surface_tally call into the else branch (actual surface crossings), where surface() is guaranteed to be a valid nonzero surface ID. A regression test with a lattice geometry and surface tallies is added to prevent recurrence.

Please include a summary of the change and which issue is fixed if applicable. Please also include relevant motivation and context.

Fixes #3890

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 18) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

John Tramm and others added 2 commits March 19, 2026 16:54
PR openmc-dev#3742 moved the surface dereference in event_cross_surface() above
the lattice translation check so that the surf variable would be
available for score_surface_tally() at the end of the function. This
caused an out-of-bounds access on model::surfaces when the particle
crosses a lattice boundary, because surface() is SURFACE_NONE (0) and
surface_index() evaluates to -1.

The fix moves the surface dereference and score_surface_tally call into
the else branch (actual surface crossings), where surface() is
guaranteed to be a valid nonzero surface ID. A regression test with a
lattice geometry and surface tallies is added to prevent recurrence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jtramm jtramm requested a review from GuySten March 19, 2026 19:43
@jtramm jtramm added the Bugs label Mar 19, 2026
@jtramm jtramm marked this pull request as ready for review March 19, 2026 19:43
@GuySten
Copy link
Contributor

GuySten commented Mar 19, 2026

Thanks for digging into this issue @jtramm.

I do not think it makes sense to not score anything when crossing lattice boundaries.
In principle a user may use current score with a cellfrom filter so even if the notion of surface is unavailable during lattice crossing we should still score to the partial current between two cells.
The only thing we use the surface for is to calculate direction sign and surface crossing cosine.
This data is defined when crossing a lattice boundary so I think the solution should be to implement another code branch.

@jtramm
Copy link
Contributor Author

jtramm commented Mar 19, 2026

@GuySten - that makes sense, I hadn't thought of the cellfrom filter issue. Claude is also saying " In score_surface_tally(): surf.normal() is a virtual method call. For lattice crossings there is no surface, so this either crashes (if the caller passed garbage) or computes a garbage normal."

Anyhow - I will mark this as draft if you want to put in your own PR with a more robust fix.

@jtramm jtramm marked this pull request as draft March 19, 2026 20:13
@GuySten
Copy link
Contributor

GuySten commented Mar 19, 2026

I think I will have time to fix this next week.

@jtramm
Copy link
Contributor Author

jtramm commented Mar 19, 2026

Thanks @GuySten!

I'll also note that the bug is not only triggered when the Flux Tally is in use. It gets triggered potentially anytime there is a lattice due to the:

  const auto& surf {*model::surfaces[surface_index()].get()};

being moved up before the conditional by #3742. So, from what I'm seeing, it's a fairly critical bug that may affect most people (given how common lattices are), not just those using the new tally.

@GuySten
Copy link
Contributor

GuySten commented Mar 19, 2026

That does not makes sense to me. We have a regression test suite that does contain lattices. I think the bug should only appear when using both surface tallies and lattices together in the same problem.

@jtramm
Copy link
Contributor Author

jtramm commented Mar 19, 2026

void Particle::event_cross_surface()
{
  // Saving previous cell data
  for (int j = 0; j < n_coord(); ++j) {
    cell_last(j) = coord(j).cell();
  }
  n_coord_last() = n_coord();

  // Set surface that particle is on and adjust coordinate levels
  surface() = boundary().surface();
  n_coord() = boundary().coord_level();

  const auto& surf {*model::surfaces[surface_index()].get()};

My reading is that if a lattice is present this function will trigger but surface_index() will be SURFACE_NONE (-1), thus resulting in undefined behavior when .get() is called.

I discovered the bug originally with a model that didn't have any tallies in it at all, just a lattice.

@GuySten
Copy link
Contributor

GuySten commented Mar 19, 2026

When I ran your example the code crash for me after this line, in the score_surface_tally function. Maybe undefined behavior causes it to crash randomly in different places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surface flux tally causes OpenMC to crash/seg fault when used in model with a lattice

2 participants