
MDEV-34804: mhnsw: compiler-independent choice of CPU-specific optimizations#4671

Open
hadeer-r wants to merge 1 commit into MariaDB:main from hadeer-r:MDEV-34804


@hadeer-r hadeer-r commented Feb 20, 2026

Issue: MDEV-34804

Description

Replace GCC-specific __attribute__((target(...))) function multi-versioning with a compiler-independent function pointer dispatch mechanism for MHNSW vector operations.

What changed

  1. New dispatch mechanism: A Vector_ops struct holds function pointers for dot_product, alloc_size, align_ptr, and fix_tail. At startup, choose_vector_ops_impl() probes CPU capabilities and selects the best available SIMD implementation (AVX-512 → AVX2 → NEON → POWER → fallback).

  2. Separate x86 detection: CPU detection (CPUID + XGETBV) moved to vector_mhnsw_x86.cc for clean separation, following the pattern used in mysys/crc32/crc32c.cc.

  3. Replaced the GCC __attribute__((vector_size(...))) extension in dot_product_avx2 with standard _mm256_* intrinsics for MSVC compatibility.

  4. Decoupled bloom filter macros: vector_mhnsw.cc now uses its own MHNSW_AVX2/MHNSW_AVX512 macros, independent from bloom_filters.h

Benchmark results

1M iterations, 1024 dimensions, AVX-512, 50 runs (microseconds per call):

| Approach | Min | Max | Avg |
|---|---|---|---|
| Function pointer (this PR) | 0.034 | 0.040 | 0.036 |
| if/else branching | 0.035 | 0.039 | 0.037 |
| Old __attribute__((target)) | 0.031 | 0.036 | 0.033 |

The ~10% overhead vs the old approach is expected (indirect call vs inlined function). The function pointer and if/else approaches show no significant difference. Function pointer was chosen because it scales better structurally as more CPU variants are added.
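
To make the two compared approaches concrete, here is a rough sketch of how such a per-call measurement could be set up. It is not the benchmark used for the numbers above: dot_scalar, dot_via_pointer, dot_via_branch and usec_per_call are hypothetical names, the "SIMD" branch is a stand-in that calls the same scalar loop, and an optimizing compiler may inline or devirtualize either path, so absolute timings from this sketch should not be compared with the table.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

static float dot_scalar(const int16_t *a, const int16_t *b, size_t len)
{
  int64_t sum= 0;
  for (size_t i= 0; i < len; i++)
    sum+= int32_t(a[i]) * int32_t(b[i]);
  return float(sum);
}

// Approach 1: indirect call through a pointer chosen once at startup.
static float (*chosen_dot)(const int16_t *, const int16_t *, size_t)=
  dot_scalar;

static float dot_via_pointer(const int16_t *a, const int16_t *b, size_t len)
{
  return chosen_dot(a, b, len);
}

// Approach 2: test a capability flag on every call.
static bool have_simd_flag= false;

static float dot_via_branch(const int16_t *a, const int16_t *b, size_t len)
{
  if (have_simd_flag)
    return dot_scalar(a, b, len); // stand-in for a SIMD variant
  return dot_scalar(a, b, len);
}

// Average wall-clock microseconds per call of fn over `iterations` calls.
template <typename F>
static double usec_per_call(F fn, size_t dims, size_t iterations)
{
  std::vector<int16_t> a(dims, 3), b(dims, 2);
  volatile float sink= 0; // keeps the calls from being optimized away
  auto start= std::chrono::steady_clock::now();
  for (size_t i= 0; i < iterations; i++)
    sink= fn(a.data(), b.data(), dims);
  auto stop= std::chrono::steady_clock::now();
  (void) sink;
  return std::chrono::duration<double, std::micro>(stop - start).count() /
         double(iterations);
}
```

Calling usec_per_call(dot_via_pointer, 1024, 1000000) and usec_per_call(dot_via_branch, 1024, 1000000) several times and recording min, max and average would give figures in the same units as the table.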

@gkodinov added the External Contribution label (All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.) Feb 20, 2026
@hadeer-r hadeer-r marked this pull request as ready for review March 9, 2026 12:20

hadeer-r commented Mar 12, 2026

I noticed a few CI test failures, but after reviewing the logs, they appear unrelated to my changes.

  1. buildbot/amd64-msan-clang-20: This exact test is also failing on the latest main commit (ef4be39).
  2. amd64-ubuntu-2204-debug and amd64-ubuntu-2204-debug-ps: both passed in my previous runs. Could a maintainer please re-run them?


@gkodinov gkodinov left a comment


Thank you for your contribution! This is a preliminary review.

Please squash all of your commits into a single one and make it contain a commit message that complies with CODING_STANDARDS.md.

hadeer-r commented

Thanks @gkodinov, I squashed my commits into a single one.

…ations

Replace GCC-specific __attribute__((target(...))) function multi-versioning
with a function pointer dispatch mechanism for dot_product, alloc_size,
align_ptr, and fix_tail.

A Vector_ops struct holds function pointers for all four operations.
At startup, choose_vector_ops_impl() probes CPU capabilities and selects
the best available implementation (AVX-512, AVX2, NEON, POWER, or fallback).

Move x86 CPU detection into a separate vector_mhnsw_x86.cc, compiled
with appropriate -mavx2/-mavx512 flags.

This approach works on GCC, Clang, MSVC, and musl libc, matching
the pattern used in mysys/crc32/crc32c.cc.

Other changes:
- Replace GCC vector_size extension in dot_product_avx2 with portable
  AVX2 intrinsics (_mm256_add_ps, _mm256_extractf128_ps, etc.)
- Decouple bloom_filters.h macros from vector dispatch macros.
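
The first item under "Other changes" mentions reducing a 256-bit accumulator with intrinsics instead of the GCC vector_size extension. A minimal sketch of that kind of horizontal sum is shown below. It is an illustration under assumptions, not the PR's dot_product_avx2: the function name dot_product_f32_sketch is hypothetical, it works on float inputs for simplicity, and it falls back to a scalar loop when the translation unit is not compiled with AVX enabled.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical float dot product with an intrinsics-only horizontal sum.
static float dot_product_f32_sketch(const float *a, const float *b,
                                    size_t len)
{
  float sum= 0.0f;
  size_t i= 0;
#if defined(__AVX__)
  __m256 acc= _mm256_setzero_ps();
  for (; i + 8 <= len; i+= 8)
    acc= _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                          _mm256_loadu_ps(b + i)));
  // Fold the eight lanes down to one without compiler vector extensions.
  __m128 lo= _mm256_castps256_ps128(acc);    // lanes 0..3
  __m128 hi= _mm256_extractf128_ps(acc, 1);  // lanes 4..7
  __m128 s= _mm_add_ps(lo, hi);              // four partial sums
  s= _mm_add_ps(s, _mm_movehl_ps(s, s));     // two partial sums in lanes 0, 1
  s= _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); // lane 0 += lane 1
  sum= _mm_cvtss_f32(s);
#endif
  // Scalar tail; this is also the whole computation when AVX is not enabled.
  for (; i < len; i++)
    sum+= a[i] * b[i];
  return sum;
}
```

Because the reduction uses only documented intrinsics, the same source should be accepted by GCC, Clang and MSVC, which is the portability point the commit message makes.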