fast-get: fix a recent regression#10639
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a userspace regression in Zephyr’s fast-get cleanup path where memory-domain partition removal can occur from a different thread than the one that originally fast-get’d the buffer (observed with SRC in DP mode and back-to-back aplay -r 44100).
Changes:
- Change
fast_put()userspace partition removal to use theentry->threadmemory domain instead ofk_current_get(). - Update debug logging to reference
entry->threadinstead of the current thread.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lgirdwood
left a comment
There was a problem hiding this comment.
LGTM ,can you respond to copilot. Thanks !
zephyr/lib/fast-get.c
Outdated
| #if CONFIG_USERSPACE | ||
| struct k_mem_domain *mdom = k_current_get()->mem_domain_info.mem_domain; | ||
|
|
||
| /* We only get there for large buffers */ |
There was a problem hiding this comment.
The relation between num_partitions and large buffers is not obvious. Could you make the comment clearer?
zephyr/lib/fast-get.c
Outdated
| /* A userspace thread makes the request */ | ||
| if (k_current_get() != entry->thread && | ||
| if (mdom != entry->mdom && | ||
| !fast_get_domain_exists(k_current_get(), ret, |
There was a problem hiding this comment.
I see that the function fast_get_domain_exists iterates through partitions. Why not calling it fast_get_partition_exists?
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100 back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread. In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
In fast_put() sram_ptr == entry->sram_ptr, use the former since it's shorter. Also clarify a comment and rename a function to better match its role. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Before dropping the DP thread memory domain, check if any partitions are left still attached to it. If so, warn and remove them. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
| /* We only get there for large buffers */ | ||
| /* | ||
| * We only get there for large buffers, since small buffers with | ||
| * enabled userspace don't create fast-get entries |
There was a problem hiding this comment.
Just to clarify, I think this part is well understood (you can deduce this info just by examining fast_get function itself). No need to further explain it.
The mystery part that I was referring to in my previous comment is this:
mdom->num_partitions > 1
I am not sure if this comment is even related. I am guessing the related comment is below:
/* A userspace thread makes the request */
Am I right? If yes, maybe you could just change it to:
/* if num_partitions > 0, the request comes form a userspace thread */
Or you can combine both comments like this:
/* We only get there for large buffers.
/* Check if the request comes from a userspace thread by examining num_partitions
*/
There was a problem hiding this comment.
@wjablon1 yes, that test is related to the "userspace thread" comment. @softwarecki wanted to replace that test with a K_USER check, then it will be clearer
There was a problem hiding this comment.
Ok, if we already have a plan for this, forget about my comment and remove unnecessary clarification
|
@lrudyX I see a single WCL "module creation" test failing to create a copier module with this PR. Doesn't look related? |
|
no PTL nocodec device was available. Re-run jenkins tests |
|
SOFCI TEST |
Test failed at deleting pipeline with SRC in DP, which was fixed by "Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition")". It is highly related. |
tmleman
left a comment
There was a problem hiding this comment.
LGTM, thanks for fixing!
singalsu
left a comment
There was a problem hiding this comment.
OK for commit "audio: src: free copied coefficients". I don't know enough to review the other commits.
|
"Internal Intel CI" https://sof-ci.01.org/sof-pr-viewer/#/build/PR10639/build14876904 passes now. Let me just remove the added debugging, after which it should be ready for merging |
Until now SRC was relying on the final module clean up to free copied with fast_get() coefficients. This however isn't clean. Fix it and release partitions explicitly. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
|
The "Internal Intel CI" failure this time is a known intermittent "random HDA DMA" failure |
softwarecki
left a comment
There was a problem hiding this comment.
The fast_put flow is still incorrect for multiple module instances. The fast_get adds partitions to the memory domain of every instance that allocates the buffer, but the partition is removed only for the instance that releases the memory last.
Storing the memory domain pointer inside the entry also looks pointless in the current design. Should we instead maintain a separate entry per module instance?
Additionally, memory domain retrieval for the proxy variant needs fixing. The proxy already stores a pointer to the memory domain in its context structure.
| case MOD_RES_FAST_GET: | ||
| fast_put(res->heap, container->sram_ptr); | ||
| #if CONFIG_USERSPACE | ||
| mdom = mod->mdom; |
There was a problem hiding this comment.
The userspace proxy already stores a pointer to the memory domain in a different place. Need to distinguishing the behavior for PROXY/APPLICATION.
There was a problem hiding this comment.
@softwarecki is this patch breaking anything for the "thread" model? If not, I'd propose to merge it to fix the "application" model which is currently broken because of this fast-get regression. Once it's all fixed we can carefully check what we can improve
| #if CONFIG_USERSPACE | ||
| struct userspace_context *user_ctx; | ||
| #endif /* CONFIG_USERSPACE */ | ||
| #if CONFIG_SOF_USERSPACE_APPLICATION |
There was a problem hiding this comment.
I think this change is unnecessary, because the proxy already stores the memory domain in its context.
| LOG_DBG("grant access to thread %p first was %p", k_current_get(), | ||
| entry->thread); | ||
| int err = fast_get_access_grant(k_current_get(), ret, size); | ||
| if (mdom != entry->mdom && |
There was a problem hiding this comment.
I don't really understand the purpose of this check. The entry allocation together with the mdom assignment happens in a different part of the code.
If the same module instance calls fast_get again for the same address (unlikely), then this condition will indeed trigger. But what about other instances of the same module? Do they always rely on fast_get_partition_exists on every call?
There was a problem hiding this comment.
maybe just checking for "partition exists" would be enough, but this would be an optimisation, we can optimise it later.
@softwarecki I don't think this is correct. That's exactly why |
Recent commits 7d0621e ("fast-get: fix partition leak for multi-thread usage") and 3c757dd ("fast-get: fix crash by freeing buffer before removing partition") introduced a regression when using SRC in DP mode in userspace on PTL and running two aplay -r 44100
back to back. The reason is that SRC isn't calling mod_fast_put() explicitly and is instead relying on automatic clean up, which then happens in a different thread.
In fact those two commits weren't needed AFAICS, since fast_put() shouldn't be called directly from userspace, instead only mod_fast_put() should be called, which is a syscall, and therefore fast_put() then will have access to all the memory.