
Native serialization for Indexes.#300

Open
razdoburdin wants to merge 10 commits into main from dev/razdoburdin_streaming


@razdoburdin

This PR adds native stream serialization to all SVS index types as an alternative to the existing (legacy) directory-based serialization. It avoids filesystem round-trips of the data. Native serialization does not require the stream to be seekable, so it introduces no additional restrictions.
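To illustrate why a seekable stream is not needed, here is a minimal sketch of forward-only serialization. The helpers `save_floats` and `load_floats` are hypothetical and are not part of the SVS API; the length-prefixed layout is an assumption made for the example. Each block writes its size first, so the writer never has to seek back to patch a header and the reader only moves forward.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the SVS API): write a length prefix, then the payload.
inline void save_floats(std::ostream& os, const std::vector<float>& v) {
    const std::uint64_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(
        reinterpret_cast<const char*>(v.data()),
        static_cast<std::streamsize>(n * sizeof(float))
    );
}

// Hypothetical helper (not the SVS API): read the prefix, then exactly that many
// floats. Only forward reads are used; tellg/seekg are never called.
inline std::vector<float> load_floats(std::istream& is) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    if (!is) {
        throw std::runtime_error("truncated stream: missing size");
    }
    std::vector<float> v(n);
    is.read(
        reinterpret_cast<char*>(v.data()),
        static_cast<std::streamsize>(n * sizeof(float))
    );
    if (!is) {
        throw std::runtime_error("truncated stream: missing payload");
    }
    return v;
}

// Round-trip two blocks through an in-memory stream, with no temporary files.
inline bool round_trip_in_memory() {
    const std::vector<float> a{1.0f, 2.0f, 3.0f};
    const std::vector<float> b{4.5f};
    std::stringstream ss;
    save_floats(ss, a);
    save_floats(ss, b);
    const auto a2 = load_floats(ss);
    const auto b2 = load_floats(ss);
    return a2 == a && b2 == b;
}
```

The same two functions would work unchanged on a socket or pipe wrapper exposed as `std::istream`/`std::ostream`, since nothing depends on the stream position.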

See the following PRs for details: #280, #281, #285, #286, #289, #292, #294, #296, #299

Main changes are:

  1. A CRTP base class, Archiver, factors the binary I/O primitives (write_size, read_size, write_name, read_name, read_from_istream) out of DirectoryArchiver. DirectoryArchiver and the new StreamArchiver class both inherit from Archiver. StreamArchiver has its own magic number ("SVS_STRM") to distinguish native streams from directory archives.
  2. The monolithic Writer is split via CRTP into two derived classes. FileWriter owns a std::ofstream, writes a header, and flushes in its destructor. StreamWriter wraps an external std::ostream& and does no header or lifecycle management. This allows io::save(data, os) to write vector data directly to any stream.
  3. save(stream) in the orchestrator Impl classes no longer saves to a temporary directory and then packs it into the stream. Instead, it calls impl().save(stream) directly.
  4. Dispatch between the new (native) and old (legacy) deserialization paths happens in the orchestrators. Deserializer::build(is) reads the magic number and exposes is_native() to choose the path.
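The CRTP layout from item 1 can be sketched roughly as follows. The class and function names follow the description above, but the bodies, the 8-byte magic layout, and the `write_magic`/`check_magic` helpers are assumptions made for illustration, not the actual SVS implementation. `DirectoryArchiver` would be a second derived class with its own magic value, which is omitted here because the description does not state it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Illustrative CRTP base: shared binary primitives live here, while each
// derived archiver only supplies its own `magic` constant.
template <typename Derived> struct Archiver {
    static void write_size(std::ostream& os, std::uint64_t n) {
        os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    }
    static std::uint64_t read_size(std::istream& is) {
        std::uint64_t n = 0;
        is.read(reinterpret_cast<char*>(&n), sizeof(n));
        if (!is) {
            throw std::runtime_error("truncated stream");
        }
        return n;
    }
    static void write_name(std::ostream& os, const std::string& name) {
        write_size(os, name.size());
        os.write(name.data(), static_cast<std::streamsize>(name.size()));
    }
    static std::string read_name(std::istream& is) {
        std::string name(read_size(is), '\0');
        is.read(name.data(), static_cast<std::streamsize>(name.size()));
        if (!is) {
            throw std::runtime_error("truncated stream");
        }
        return name;
    }
    // Hypothetical helpers (assumed, not named in the PR description).
    static void write_magic(std::ostream& os) {
        os.write(Derived::magic.data(), static_cast<std::streamsize>(Derived::magic.size()));
    }
    static bool check_magic(std::istream& is) {
        std::array<char, 8> buf{};
        is.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        return static_cast<bool>(is) && buf == Derived::magic;
    }
};

// The magic value is taken from the PR description; storing it as the first
// 8 bytes of the stream is an assumption of this sketch.
struct StreamArchiver : Archiver<StreamArchiver> {
    static constexpr std::array<char, 8> magic{'S', 'V', 'S', '_', 'S', 'T', 'R', 'M'};
};

// Write magic + name, then read both back from the same in-memory stream.
inline std::string round_trip_name(const std::string& name) {
    std::stringstream ss;
    StreamArchiver::write_magic(ss);
    StreamArchiver::write_name(ss, name);
    if (!StreamArchiver::check_magic(ss)) {
        throw std::runtime_error("unexpected magic");
    }
    return StreamArchiver::read_name(ss);
}

// True when `bytes` begins with the native stream magic.
inline bool native_reader_accepts(const std::string& bytes) {
    std::istringstream is(bytes);
    return StreamArchiver::check_magic(is);
}
```

Because the base reaches `Derived::magic` only inside member functions, the derived class is complete by the time those functions are instantiated, which is what makes the CRTP pattern work without virtual dispatch.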

razdoburdin and others added 9 commits March 4, 2026 15:00
Reopening of #275 for the developer branch

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR introduces native serialization for the DynamicFlat index.

Main changes are:
1. `auto_dynamic_assemble` now accepts a lazy loader. This is required
for buffer-free deserialization.
2. A new class, `Deserializer`, is introduced. It is responsible for
conditionally reading overhead data (such as names of temporary files)
in the case of legacy models.
3. `IDTranslator` is refactored to support saving to and loading from a
stream.
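The lazy-loader idea from item 1 can be sketched as below. `ToyIndex`, `toy_assemble`, `write_block`, and `read_block` are hypothetical names for this example only and do not reflect the real `auto_dynamic_assemble` signature. The point is that the assembler receives callables rather than already-loaded buffers, so each component is read straight from the stream into its final container at the moment the assembler asks for it.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical index with two components stored back to back in a stream.
struct ToyIndex {
    std::vector<float> data;
    std::vector<std::uint64_t> ids;
};

// Hypothetical block writer: element count followed by raw elements.
template <typename T> void write_block(std::ostream& os, const std::vector<T>& v) {
    const std::uint64_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(
        reinterpret_cast<const char*>(v.data()),
        static_cast<std::streamsize>(n * sizeof(T))
    );
}

// Hypothetical block reader: reads directly into the destination vector.
template <typename T> std::vector<T> read_block(std::istream& is) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    if (!is) {
        throw std::runtime_error("truncated stream");
    }
    std::vector<T> v(n);
    is.read(reinterpret_cast<char*>(v.data()), static_cast<std::streamsize>(n * sizeof(T)));
    if (!is) {
        throw std::runtime_error("truncated stream");
    }
    return v;
}

// The assembler decides when each loader runs. On a forward-only stream the
// call order must match the order in which the components were written.
template <typename DataLoader, typename IdLoader>
ToyIndex toy_assemble(DataLoader&& load_data, IdLoader&& load_ids) {
    ToyIndex index;
    index.data = load_data(); // first component in the stream
    index.ids = load_ids();   // second component in the stream
    return index;
}

// Serialize two components, then assemble from lazy loaders bound to the stream.
inline ToyIndex demo_lazy_assemble() {
    std::stringstream ss;
    write_block(ss, std::vector<float>{0.5f, 1.5f});
    write_block(ss, std::vector<std::uint64_t>{10, 20});
    return toy_assemble(
        [&ss] { return read_block<float>(ss); },
        [&ss] { return read_block<std::uint64_t>(ss); }
    );
}
```

Passing eager values instead would force the caller to read everything before assembly starts, which is the intermediate buffering this design is meant to avoid.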

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the `Vamana` index.

Main changes are:
1. A new overload of `svs::index::vamana::auto_assemble`, required for
direct deserialization, accepts lazy loaders and calls them in a
flexible order to cover legacy serialized models.
2. The `save_table` method is renamed to `metadata` to avoid confusion,
since it doesn't save anything.
3. Some minor refactoring to streamline the logic.
4. Serialization for `MutableVamana` is also implemented to avoid
compilation errors. Deserialization and related tests for
`MutableVamana` are expected later.

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Rafik Saliev <rafik.f.saliev@intel.com>
This PR introduces native serialization for the `MutableVamana` index.

Main changes are:

1. A new overload of svs::index::vamana::auto_dynamic_assemble, required
for direct deserialization, accepts lazy loaders and calls them in a
flexible order to cover legacy serialized models.
2. The test file tests/svs/index/vamana/dynamic_index.cpp is restored to
the test build, since it is the right place for serialization tests.
3. Some minor refactoring to streamline the logic.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR separates the deserialization paths for newly serialized models
and legacy ones.

The main changes are:
1. Legacy models are again deserialized via intermediate files, as was
done before.
2. The native deserialization path is cleaned up, since it no longer
needs to support legacy models.
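A minimal sketch of how the two paths could be told apart on a stream that cannot be rewound. `ToyDeserializer` is a hypothetical stand-in; only the `build`/`is_native` names and the "SVS_STRM" magic come from this pull request, while the 8-byte prefix layout and the `consumed()` accessor are assumptions. Because the stream may not be seekable, the sketch keeps the bytes it has already read so a legacy path could still see them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>

class ToyDeserializer {
  public:
    // Read up to 8 bytes from the front of the stream and remember them.
    static ToyDeserializer build(std::istream& is) {
        ToyDeserializer d;
        is.read(d.head_.data(), static_cast<std::streamsize>(d.head_.size()));
        d.got_ = static_cast<std::size_t>(is.gcount());
        return d;
    }

    // Native streams are assumed to start with the full magic string.
    bool is_native() const {
        return got_ == kMagic.size() && std::string_view(head_.data(), got_) == kMagic;
    }

    // Bytes already taken from the stream (hypothetical accessor), so that a
    // legacy reader can prepend them instead of seeking backwards.
    std::string_view consumed() const { return std::string_view(head_.data(), got_); }

  private:
    static constexpr std::string_view kMagic = "SVS_STRM";
    std::array<char, 8> head_{};
    std::size_t got_ = 0;
};

// Toy orchestrator-level dispatch: report which path would be taken.
inline std::string classify(const std::string& bytes) {
    std::istringstream is(bytes);
    const auto d = ToyDeserializer::build(is);
    return d.is_native() ? "native" : "legacy";
}
```

With this split, the native branch never needs legacy-specific handling, and the legacy branch can keep using intermediate files as before.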

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the IVF index.

Main changes are:

1. A new overload of svs::index::ivf::load_ivf_index accepting an
istream is introduced.
2. Added related tests.
3. The save(std::ostream&) method for DynamicIVF is just a placeholder
for now. A real implementation is expected later.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the `MultiMutableVamana` index.
Should be merged after:
#286

Main changes are:

1. A new overload of `svs::index::vamana::auto_multi_dynamic_assemble`,
required for direct deserialization, accepts lazy loaders and calls them
in a flexible order to cover legacy serialized models.
2. Added related tests.
3. The `supports_saving` flag is kept `false`, since it isn't used.
4. `MultiMutableVamana` doesn't have an orchestrator, so there are no
changes on that side.

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the DynamicIVF index.

Main changes are:

1. A new overload of `svs::index::ivf::load_dynamic_ivf_index` accepting
an istream is introduced.
2. Added related tests.
3. Minor refactoring and streamlining of the logic.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the Inverted index.

Main changes are:

1. A new overload of svs::index::inverted::assemble_from_clustering
accepting an istream is introduced.
2. Added related tests.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
@razdoburdin razdoburdin requested review from ibhati and rfsaliev and removed request for ahuber21, ibhati, mihaic and yuejiaointel March 24, 2026 08:21