MDEV-34804: mhnsw: compiler-independent choice of CPU-specific optimizations #4671
Open
hadeer-r wants to merge 1 commit into MariaDB:main from
hadeer-r (author):
I noticed a few CI test failures, but the logs suggest they are unrelated to my changes.
gkodinov (Member) requested changes on Mar 13, 2026:
Thank you for your contribution! This is a preliminary review.
Please squash all of your commits into a single commit whose message complies with CODING_STANDARDS.md.
hadeer-r (author):
Thanks @gkodinov, I squashed my commits into one.
…ations
Replace GCC-specific __attribute__((target(...))) function multi-versioning with a function pointer dispatch mechanism for dot_product, alloc_size, align_ptr, and fix_tail.
A Vector_ops struct holds function pointers for all four operations. At startup, choose_vector_ops_impl() probes CPU capabilities and selects the best available implementation (AVX-512, AVX2, NEON, POWER, or fallback).
Move x86 CPU detection into a separate vector_mhnsw_x86.cc, compiled with appropriate -mavx2/-mavx512 flags. This approach works on GCC, Clang, MSVC, and musl libc, matching the pattern used in mysys/crc32/crc32c.cc.
Other changes:
- Replace GCC vector_size extension in dot_product_avx2 with portable AVX2 intrinsics (_mm256_add_ps, _mm256_extractf128_ps, etc.)
- Decouple bloom_filters.h macros from vector dispatch macros.
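The AVX2 part of this commit can be illustrated with a minimal sketch that relies only on standard intrinsics, including the _mm256_add_ps and _mm256_extractf128_ps calls named in the commit message, so GCC, Clang, and MSVC should all accept it. It is not the patch's dot_product_avx2: the function names are hypothetical, and it assumes int16_t vector components with the length zero-padded to a multiple of 16. The intrinsic path is compiled only when the translation unit is built with -mavx2 (GCC/Clang) or /arch:AVX2 (MSVC), mirroring the per-file flags described above; otherwise a scalar loop is used.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Sketch only: assumes int16_t components and len % 16 == 0, with the
// padding beyond the real dimensions set to zero.

#ifdef __AVX2__
// Horizontal sum of eight floats using intrinsics only, without relying on
// the GCC vector_size extension.
static inline float hsum256_ps(__m256 v)
{
  __m128 lo = _mm256_castps256_ps128(v);          // lanes 0..3
  __m128 hi = _mm256_extractf128_ps(v, 1);        // lanes 4..7
  __m128 s = _mm_add_ps(lo, hi);                  // 4 partial sums
  s = _mm_add_ps(s, _mm_movehl_ps(s, s));         // lanes 0,1 hold pair sums
  s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x55));  // lane 0 holds the total
  return _mm_cvtss_f32(s);
}

// Hypothetical AVX2 kernel: 16 int16_t values per iteration.
float dot_product_avx2_sketch(const int16_t *v1, const int16_t *v2, size_t len)
{
  __m256 acc = _mm256_setzero_ps();
  for (size_t i = 0; i < len; i += 16)
  {
    __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(v1 + i));
    __m256i b = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(v2 + i));
    // madd multiplies int16 pairs and adds adjacent products into 8 x int32
    acc = _mm256_add_ps(acc, _mm256_cvtepi32_ps(_mm256_madd_epi16(a, b)));
  }
  return hsum256_ps(acc);
}
#else
// Scalar stand-in used when this translation unit is built without AVX2.
float dot_product_avx2_sketch(const int16_t *v1, const int16_t *v2, size_t len)
{
  float sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += float(int32_t(v1[i]) * v2[i]);
  return sum;
}
#endif
```

Because a file built this way may contain AVX2 instructions, it is only safe to call after a runtime CPU check has confirmed AVX2 support, which is what the dispatch mechanism provides.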
Issue: MDEV-34804
Description
Replace GCC-specific __attribute__((target(...))) function multi-versioning with a compiler-independent function pointer dispatch mechanism for MHNSW vector operations.
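A minimal, portable sketch of such a dispatch table follows, assuming int16_t components stored in zero-padded, SIMD-aligned buffers. The field names follow the four operations listed in this PR, but their signatures, the Impl<> template, the CAP_* bits, and choose_vector_ops_sketch() are hypothetical, and the meaning given to alloc_size, align_ptr, and fix_tail (allocation size with alignment slack, rounding a pointer up to the register width, zeroing the padding) is inferred from their names. Scalar loops stand in for the real AVX-512/AVX2 kernels. The chooser takes the capability bits as a parameter so it can be exercised on any machine, whereas choose_vector_ops_impl() in the patch probes the CPU itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical capability bits, e.g. filled in by a CPUID/XGETBV probe.
enum : unsigned { CAP_AVX2 = 1u << 0, CAP_AVX512 = 1u << 1 };

// One table per CPU variant; signatures are assumptions.
struct Vector_ops
{
  float (*dot_product)(const int16_t *v1, const int16_t *v2, size_t len);
  size_t (*alloc_size)(size_t n);            // bytes to allocate for n dims
  void *(*align_ptr)(void *ptr);             // round up to the SIMD alignment
  void (*fix_tail)(int16_t *dims, size_t n); // zero the padding after n dims
  const char *name;                          // sketch-only, for diagnostics
};

// Scalar stand-in parameterized by register width in bytes; a real variant
// would use intrinsics in a file compiled with the matching flags.
template <size_t BYTES>
struct Impl
{
  static constexpr size_t DIMS = BYTES / sizeof(int16_t);
  static size_t padded(size_t n) { return (n + DIMS - 1) / DIMS * DIMS; }

  static float dot_product(const int16_t *v1, const int16_t *v2, size_t len)
  {
    int64_t sum = 0;
    for (size_t i = 0; i < padded(len); i++)  // padding is zero, so harmless
      sum += int32_t(v1[i]) * v2[i];
    return float(sum);
  }
  static size_t alloc_size(size_t n)
  { return padded(n) * sizeof(int16_t) + BYTES - 1; }  // + alignment slack
  static void *align_ptr(void *ptr)
  {
    uintptr_t p = reinterpret_cast<uintptr_t>(ptr);
    return reinterpret_cast<void *>((p + BYTES - 1) & ~uintptr_t(BYTES - 1));
  }
  static void fix_tail(int16_t *dims, size_t n)
  { std::memset(dims + n, 0, (padded(n) - n) * sizeof(int16_t)); }
};

template <size_t B>
constexpr Vector_ops make_ops(const char *name)
{
  return {Impl<B>::dot_product, Impl<B>::alloc_size, Impl<B>::align_ptr,
          Impl<B>::fix_tail, name};
}

static const Vector_ops ops_avx512 = make_ops<64>("avx512");
static const Vector_ops ops_avx2 = make_ops<32>("avx2");
static const Vector_ops ops_fallback = make_ops<sizeof(int16_t)>("fallback");

// Hypothetical counterpart of choose_vector_ops_impl(): pick the best
// variant once at startup; callers then go through the returned table.
const Vector_ops *choose_vector_ops_sketch(unsigned caps)
{
  if (caps & CAP_AVX512) return &ops_avx512;
  if (caps & CAP_AVX2) return &ops_avx2;
  return &ops_fallback;
}
```

Every call site goes through the table (for example ops->dot_product(a, b, n)), so adding another CPU variant means adding one more table and one more branch in the chooser, with no compiler-specific attributes involved.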
What changed
- New dispatch mechanism: A Vector_ops struct holds function pointers for dot_product, alloc_size, align_ptr, and fix_tail. At startup, choose_vector_ops_impl() probes CPU capabilities and selects the best available SIMD implementation (AVX-512 → AVX2 → NEON → POWER → fallback).
- Separate x86 detection: CPU detection (CPUID + XGETBV) moved to vector_mhnsw_x86.cc for clean separation, following the pattern used in mysys/crc32/crc32c.cc.
- Replaced the GCC __attribute__((vector_size(...))) extension in dot_product_avx2 with standard _mm256_* intrinsics for MSVC compatibility.
- Decoupled bloom filter macros: vector_mhnsw.cc now uses its own MHNSW_AVX2/MHNSW_AVX512 macros, independent from bloom_filters.h.
Benchmark results
1M iterations, 1024 dimensions, AVX-512, 50 runs (µs per call):
The ~10% overhead compared with the old approach is expected (an indirect call versus an inlined function). The function-pointer and if/else approaches show no significant difference; the function pointer was chosen because it scales better structurally as more CPU variants are added.
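To repeat the comparison between the two dispatch styles, a rough harness along these lines times an indirect call through a function pointer against an if/else over direct calls. It is a sketch, not the harness behind the PR's measurements: the kernels, use_fast_path, and us_per_call() are hypothetical, and a portable 4-way unrolled loop stands in for a SIMD variant.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in kernels: a plain loop and a 4-way unrolled loop
// (the latter playing the role of a SIMD variant; assumes n % 4 == 0).
float dot_fallback(const int16_t *a, const int16_t *b, size_t n)
{
  int64_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += int32_t(a[i]) * b[i];
  return float(s);
}

float dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
  int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  for (size_t i = 0; i < n; i += 4)
  {
    s0 += int32_t(a[i]) * b[i];
    s1 += int32_t(a[i + 1]) * b[i + 1];
    s2 += int32_t(a[i + 2]) * b[i + 2];
    s3 += int32_t(a[i + 3]) * b[i + 3];
  }
  return float(s0 + s1 + s2 + s3);
}

// Style 1: function pointer chosen once at startup (indirect call).
float (*dot_ptr)(const int16_t *, const int16_t *, size_t) = dot_fallback;

// Style 2: flag chosen once at startup, if/else around direct calls.
bool use_fast_path = false;
float dot_branch(const int16_t *a, const int16_t *b, size_t n)
{
  if (use_fast_path)
    return dot_unrolled4(a, b, n);
  return dot_fallback(a, b, n);
}

// Average microseconds per call. a[0] changes on every iteration so the
// compiler cannot hoist the call out of the loop; the sum of all results
// is returned through 'checksum' so the results stay observable.
template <class Dot>
double us_per_call(Dot dot, std::vector<int16_t> &a,
                   const std::vector<int16_t> &b, size_t iters,
                   double &checksum)
{
  double sum = 0;
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i = 0; i < iters; i++)
  {
    a[0] = int16_t(i & 0xff);
    sum += dot(a.data(), b.data(), a.size());
  }
  auto t1 = std::chrono::steady_clock::now();
  checksum = sum;
  return std::chrono::duration<double, std::micro>(t1 - t0).count() /
         double(iters);
}
```

Build with optimizations (for example -O2) and set dot_ptr and use_fast_path from a runtime input, as a CPU probe would, so the compiler cannot resolve either dispatch at build time. Timings depend on the CPU and compiler, so none are claimed here.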