Set PyTorch parallelism limits for aarch64/{generic,neoverse_n1,neoverse_v1} #181

Draft
bedroge wants to merge 3 commits into EESSI:main from bedroge:more_pytorch_fixes

bedroge commented Mar 12, 2026

Still keeping this as draft to potentially solve other issues.

bedroge marked this pull request as draft on March 12, 2026 at 15:46
boegel commented Mar 12, 2026

You're seeing out-of-memory errors only for aarch64/* targets?

That's interesting, since all partitions in our main build cluster have the same amount of RAM per core...

bedroge commented Mar 12, 2026

Not completely sure yet, but it definitely worked fine for skylake_avx512 and zen5. It failed for neoverse_v1 with:

cc1plus: out of memory allocating 65536 bytes after a total of 319553536 bytes
cc1plus: out of memory allocating 139696 bytes after a total of 301793280 bytes
cc1plus: out of memory allocating 3201600 bytes after a total of 98828288 bytes
cc1plus: out of memory allocating 65536 bytes after a total of 348848128 bytes
virtual memory exhausted: Cannot allocate memory
cc1plus: out of memory allocating 193048 bytes after a total of 369950720 bytes
cc1plus: out of memory allocating 145472 bytes after a total of 401604608 bytes
cc1plus: out of memory allocating 146840 bytes after a total of 386990080 bytes
cc1plus: out of memory allocating 65536 bytes after a total of 548405248 bytes
cc1plus: out of memory allocating 151096 bytes after a total of 391184384 bytes
cc1plus: out of memory allocating 65536 bytes after a total of 620494848 bytes
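To get a quick overview of a log like the one above, the cc1plus lines can be counted and the largest per-process total extracted. This is only a rough sketch: the function name `summarize_oom` and the sample string are made up for illustration, and the regular expression assumes the message format is exactly as shown in this log.

```python
import re

# Matches the cc1plus out-of-memory message in the form shown in the log above.
OOM_RE = re.compile(
    r"cc1plus: out of memory allocating (\d+) bytes after a total of (\d+) bytes"
)


def summarize_oom(log_text):
    """Return (number of OOM lines, largest 'total' in bytes) for a build log."""
    totals = [int(m.group(2)) for m in OOM_RE.finditer(log_text)]
    if not totals:
        return 0, 0
    return len(totals), max(totals)


# Three lines copied from the log above; the middle one is not a cc1plus message.
sample = """\
cc1plus: out of memory allocating 65536 bytes after a total of 319553536 bytes
virtual memory exhausted: Cannot allocate memory
cc1plus: out of memory allocating 65536 bytes after a total of 620494848 bytes
"""

count, largest = summarize_oom(sample)
print(f"{count} cc1plus OOM lines, largest total {largest / 2**20:.1f} MiB")
```

If each failing compiler process had allocated well under 1 GiB, as in the lines above, that would be consistent with many parallel compile jobs exhausting memory together rather than with one oversized translation unit, but the log alone does not prove that.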

I've now started builds for the other Arm targets. I just noticed that the updated hooks file is not applied yet anyway, because the build is still using the one from CVMFS.
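For readers unfamiliar with the idea behind this PR, here is a minimal sketch of capping build parallelism for selected CPU targets. It does not reproduce the actual hooks file: `PARALLEL_LIMITS`, `capped_parallelism`, and the cap value of 16 are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-target caps; the value 16 is illustrative and is NOT
# taken from this PR. Target names follow the PR title.
PARALLEL_LIMITS = {
    "aarch64/generic": 16,
    "aarch64/neoverse_n1": 16,
    "aarch64/neoverse_v1": 16,
}


def capped_parallelism(cpu_target, requested, limits=PARALLEL_LIMITS):
    """Return the number of parallel build jobs to use for cpu_target.

    Targets without an entry in `limits` keep the requested value;
    targets with an entry are capped at that limit.
    """
    limit = limits.get(cpu_target)
    if limit is None:
        return requested
    return min(requested, limit)


# A capped target is reduced; an unlisted target is left alone.
print(capped_parallelism("aarch64/neoverse_v1", 64))
print(capped_parallelism("x86_64/amd/zen5", 64))
```

Fewer concurrent compiler processes means lower peak memory use at the cost of a longer build, which is the trade-off a limit like this makes.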
