(improvement) Optimize VectorType deserialization with struct.unpack and numpy #730
Draft
mykaul wants to merge 2 commits into scylladb:master from
Conversation
Add bulk deserialization using struct.unpack for common numeric vector types
instead of element-by-element deserialization. This provides significant
performance improvements, especially for small vectors and integer types.
Optimized types:
- FloatType ('>Nf' format)
- DoubleType ('>Nd' format)
- Int32Type ('>Ni' format)
- LongType ('>Nq' format)
- ShortType ('>Nh' format)
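For illustration, the bulk path boils down to building one big-endian format string for the whole vector and unpacking every element in a single C-level call (a minimal sketch, not the driver's exact code; `bulk_deserialize` is a hypothetical name):

```python
import struct

def bulk_deserialize(byts, fmt_char, count):
    """Unpack `count` big-endian elements of one numeric type in a single call.

    fmt_char is one of 'f', 'd', 'i', 'q', 'h'
    (float, double, int32, int64, short).
    """
    # e.g. '>3f' for Vector<float, 3>
    s = struct.Struct('>%d%s' % (count, fmt_char))
    return list(s.unpack(byts))

# Example: a 3-element float vector serialized big-endian
data = struct.pack('>3f', 1.0, 2.0, 3.0)
print(bulk_deserialize(data, 'f', 3))  # [1.0, 2.0, 3.0]
```

Replacing N separate `subtype.deserialize()` calls with one `unpack` is what makes the small-vector case so much faster: the per-call Python overhead dominates at that size.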
Performance improvements (measured with CASS_DRIVER_NO_CYTHON=1):
Small vectors (3-4 elements):
Vector<float, 3> : 0.88 μs → 0.25 μs (3.58x faster)
Vector<float, 4> : 0.78 μs → 0.28 μs (2.79x faster)
Medium vectors (128 elements):
Vector<float, 128> : 4.72 μs → 4.06 μs (1.16x faster)
Vector<double, 128> : 4.83 μs → 4.01 μs (1.20x faster)
Vector<int, 128> : 2.27 μs → 1.25 μs (1.82x faster)
Large vectors (384-1536 elements):
Vector<float, 384> : 15.38 μs → 14.67 μs (1.05x faster)
Vector<float, 768> : 32.43 μs → 30.72 μs (1.06x faster)
Vector<float, 1536> : 63.74 μs → 63.24 μs (1.01x faster)
The optimization is most effective for:
- Small vectors (3-4 elements): 2.8-3.6x speedup
- Integer vectors: 1.8x speedup
- Medium-sized float/double vectors: 1.2-1.3x speedup
For very large vectors (384+ elements), the benefit is minimal as the
deserialization time is dominated by data copying rather than function
call overhead.
Variable-size subtypes and other numeric types continue to use the
element-by-element fallback path.
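The variable-size fallback walks length-prefixed elements one at a time. Schematically (a sketch only: this `uvint_unpack` is a simple LEB128-style stand-in for the driver's unsigned-vint codec, whose actual wire encoding may differ):

```python
def uvint_unpack(byts):
    """Decode an unsigned LEB128-style varint; returns (value, bytes_read).

    Illustrative stand-in for the driver's uvint codec.
    """
    value = shift = idx = 0
    while True:
        b = byts[idx]
        value |= (b & 0x7F) << shift
        idx += 1
        if not b & 0x80:
            return value, idx
        shift += 7

def deserialize_variable(byts, count, element_deserialize):
    """Read `count` length-prefixed elements, one at a time."""
    rv, idx = [], 0
    for i in range(count):
        size, bytes_read = uvint_unpack(byts[idx:])
        idx += bytes_read
        rv.append(element_deserialize(byts[idx:idx + size]))
        idx += size
    return rv

# Example: two length-prefixed UTF-8 strings
payload = bytes([2]) + b'hi' + bytes([3]) + b'you'
print(deserialize_variable(payload, 2, lambda b: b.decode()))  # ['hi', 'you']
```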
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
For vectors with 32 or more elements, use numpy.frombuffer(), which provides a 1.3-1.5x speedup for large vectors (128+ elements) compared to struct.unpack.
The hybrid approach:
- Small vectors (< 32 elements): struct.unpack (2.8-3.6x faster than baseline)
- Large vectors (>= 32 elements): numpy.frombuffer().tolist() (1.3-1.5x faster than struct.unpack)
The threshold of 32 elements balances code complexity with performance gains.
Benchmark results:
- float[128]: 2.15 μs → 1.87 μs (1.15x faster)
- float[384]: 6.17 μs → 4.44 μs (1.39x faster)
- float[768]: 12.25 μs → 8.45 μs (1.45x faster)
- float[1536]: 24.44 μs → 15.77 μs (1.55x faster)
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
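The hybrid dispatch described above can be sketched as follows (the 32-element threshold and the `'>f4'` dtype match the commit message; the function name and structure are illustrative, and this assumes numpy is installed):

```python
import struct
import numpy as np

NUMPY_THRESHOLD = 32  # below this, struct.unpack wins; at or above, numpy.frombuffer

def deserialize_float_vector(byts, count):
    """Deserialize a big-endian float32 vector via the size-based hybrid path."""
    if count >= NUMPY_THRESHOLD:
        # '>f4' = big-endian 32-bit float, matching the wire format
        return np.frombuffer(byts, dtype='>f4', count=count).tolist()
    return list(struct.Struct('>%df' % count).unpack(byts))

small = struct.pack('>4f', 0.0, 1.0, 2.0, 3.0)
large = struct.pack('>128f', *[float(i) for i in range(128)])
print(deserialize_float_vector(small, 4))         # [0.0, 1.0, 2.0, 3.0]
print(len(deserialize_float_vector(large, 128)))  # 128
```

`.tolist()` returns plain Python floats, so callers see the same result type from either branch.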
Pull request overview
This PR optimizes VectorType (de)serialization in cassandra/cqltypes.py by introducing bulk numeric (de)serialization via a cached struct.Struct, and an optional numpy-based deserialization fast path for larger vectors.
Changes:
- Cache a per-parameterized-vector struct.Struct to bulk unpack/pack common numeric vector subtypes.
- Add an optional numpy frombuffer(...).tolist() deserialization fast path for vectors with vector_size >= 32.
- Refactor variable-size vector deserialization to a fixed-iteration loop with stricter bounds checks.
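The per-parameterized-vector caching amounts to computing the Struct once, when the parameterized type is created, rather than on every deserialize call. A rough sketch (names such as `make_vector_type` and the type-name map are hypothetical, not the driver's API):

```python
import struct

# Fixed-size numeric CQL subtypes -> struct format characters,
# mirroring the PR's list (float, double, int32, int64, short).
_STRUCT_FMT = {'float': 'f', 'double': 'd', 'int': 'i',
               'bigint': 'q', 'smallint': 'h'}

def make_vector_type(subtype_name, vector_size):
    """Build a vector deserializer, caching the Struct at creation time."""
    fmt = _STRUCT_FMT.get(subtype_name)
    vector_struct = struct.Struct('>%d%s' % (vector_size, fmt)) if fmt else None

    def deserialize(byts):
        if vector_struct is not None:
            # Fast path: one bulk unpack using the pre-built Struct
            return list(vector_struct.unpack(byts))
        # Variable-size or unsupported subtypes would fall back to the
        # element-by-element path here
        raise NotImplementedError('element-by-element fallback')

    return deserialize

deser = make_vector_type('int', 3)
print(deser(struct.pack('>3i', 10, 20, 30)))  # [10, 20, 30]
```

Pre-compiling the format with `struct.Struct` avoids re-parsing the format string on every call, which is where part of the small-vector win comes from.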
Comment on lines +56 to 57:

```python
import numpy as np
```
Comment on lines 1500 to +1504:

```diff
 try:
     size, bytes_read = uvint_unpack(byts[idx:])
     idx += bytes_read
     rv.append(cls.subtype.deserialize(byts[idx:idx + size], protocol_version))
     idx += size
-except:
+except (IndexError, KeyError):
     raise ValueError("Error reading additional data during vector deserialization after successfully adding {} elements"\
-                     .format(len(rv)))
+                     .format(i))
```
Comment on lines +1476 to +1479:

```python
if cls._vector_struct is not None:
    if HAVE_NUMPY and cls.vector_size >= 32 and cls._numpy_dtype is not None:
        return np.frombuffer(byts, dtype=cls._numpy_dtype, count=cls.vector_size).tolist()
    return list(cls._vector_struct.unpack(byts))
```
Summary
- Use struct.unpack for known numeric types (float, double, int32, int64, short), caching a struct.Struct object at type-creation time
- Add an optional numpy fast path (np.frombuffer().tolist()) for vectors with >= 32 elements, delivering ~4x speedup for 768/1536-dimension float vectors
Performance (pure Python path, CASS_DRIVER_NO_CYTHON=1): benchmarked on Vector<float, 3>, Vector<float, 128>, Vector<float, 768>, and Vector<float, 1536>.
Details
Commit 1 — struct.unpack optimization:
- At apply_parameters() time, cache a struct.Struct('>Nf') for the vector's subtype and dimension
- deserialize() calls list(struct.unpack(byts)) — a single C-level bulk unpack
- Serialization packs with struct.pack(*v)
Commit 2 — numpy for large vectors:
- np.frombuffer(byts, dtype='>f4', count=N).tolist()
- .tolist() batch-converts with better cache locality
- _numpy_dtype cached on the class at type-creation time
Both commits modify only cassandra/cqltypes.py. No Cython dependency.
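A quick way to reproduce the kind of numbers above is a timeit harness comparing per-element unpacking against one bulk unpack (an illustrative script, not the PR's benchmark; absolute timings will vary by machine):

```python
import struct
import timeit

N = 128
data = struct.pack('>%df' % N, *[float(i) for i in range(N)])
s = struct.Struct('>%df' % N)  # pre-compiled, as the PR caches per type

def per_element():
    # Baseline: one struct call per element
    return [struct.unpack_from('>f', data, 4 * i)[0] for i in range(N)]

def bulk():
    # Optimized: single bulk unpack
    return list(s.unpack(data))

assert per_element() == bulk()
print('per-element:', timeit.timeit(per_element, number=10_000))
print('bulk:       ', timeit.timeit(bulk, number=10_000))
```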