Reopening of #275 for developer branch

---------
Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR introduces native serialization for the DynamicFlat index. Main changes are:
1. `auto_dynamic_assemble` now accepts a lazy loader. This is mandatory for buffer-free deserialization.
2. A new class, `Deserializer`, is introduced. It is responsible for conditionally reading overhead data (such as the names of temporary files) in the case of legacy models.
3. `IDTranslator` is refactored to cover saving to and loading from a stream.

---------
Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
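The lazy-loader idea from item 1 can be sketched roughly as follows. This is an illustration only, not the SVS API: `ToyIndex`, `read_array`, `write_array`, `assemble_from_stream`, and `round_trip_example` are hypothetical names, and the length-prefixed layout is an assumption. The point is that the assembler receives callables instead of already-loaded objects, so each component is read from the stream at the moment it is needed, with no intermediate buffer or temporary file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical toy index: external ids plus a flat array of values.
struct ToyIndex {
    std::vector<std::uint32_t> ids;
    std::vector<float> data;
};

// Write a length-prefixed array of trivially copyable values.
template <typename T> void write_array(std::ostream& os, const std::vector<T>& v) {
    std::uint64_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(
        reinterpret_cast<const char*>(v.data()),
        static_cast<std::streamsize>(n * sizeof(T))
    );
}

// Read a length-prefixed array using forward-only reads (no seeking).
template <typename T> std::vector<T> read_array(std::istream& is) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    std::vector<T> out(n);
    is.read(
        reinterpret_cast<char*>(out.data()),
        static_cast<std::streamsize>(n * sizeof(T))
    );
    return out;
}

// Hypothetical assembler: it takes loaders rather than loaded objects and
// invokes them in the order in which the components appear in the stream.
template <typename IdLoader, typename DataLoader>
ToyIndex assemble_from_stream(std::istream& is, IdLoader&& load_ids, DataLoader&& load_data) {
    ToyIndex index;
    index.ids = load_ids(is);   // evaluated only now, directly from the stream
    index.data = load_data(is); // evaluated after the ids have been consumed
    return index;
}

// Serialize two components into an in-memory stream and assemble them back.
inline ToyIndex round_trip_example() {
    std::stringstream ss;
    write_array<std::uint32_t>(ss, {7, 8, 9});
    write_array<float>(ss, {1.0f, 2.0f, 3.0f});
    return assemble_from_stream(
        ss,
        [](std::istream& is) { return read_array<std::uint32_t>(is); },
        [](std::istream& is) { return read_array<float>(is); }
    );
}
```

Because the loaders are plain callables, the assembler is free to decide when and in which order to invoke them, which is presumably what makes this shape convenient for supporting more than one on-disk layout.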
This PR introduces native serialization for the `Vamana` index. Main changes are:
1. A new overload of `svs::index::vamana::auto_assemble`, required for direct deserialization, accepts lazy loaders and calls them in a flexible order to cover legacy serialized models.
2. The `save_table` method is renamed to `metadata` to avoid confusion, since it doesn't save anything.
3. Some minor refactoring to streamline the logic.
4. Serialization for `MutableVamana` is also implemented to avoid compilation errors. Deserialization and related tests for `MutableVamana` are expected later.

---------
Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Rafik Saliev <rafik.f.saliev@intel.com>
This PR introduces native serialization for the `MutableVamana` index. Main changes are:
1. A new overload of `svs::index::vamana::auto_dynamic_assemble`, required for direct deserialization, accepts lazy loaders and calls them in a flexible order to cover legacy serialized models.
2. The test file `tests/svs/index/vamana/dynamic_index.cpp` is returned to the test build, since it is the right place for serialization tests.
3. Some minor refactoring to streamline the logic.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR separates the deserialization paths for newly serialized models and legacy ones. The main changes are:
1. Legacy models are now deserialized through intermediate files, as was done before.
2. The native deserialization path is cleaned up, since it no longer needs to support legacy models.

---------
Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
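Choosing between the two paths could look roughly like the sketch below. The PR description mentions a `"SVS_STRM"` magic number and a `Deserializer::build(is)` with `is_native()`; everything else here (`ToyDeserializer`, `consumed()`, `choose_path`) is a hypothetical stand-in, and how SVS itself treats the bytes consumed from a legacy stream is not stated in the source.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Magic number quoted in the PR description for native streams.
constexpr std::array<char, 8> kNativeMagic = {'S', 'V', 'S', '_', 'S', 'T', 'R', 'M'};

// Hypothetical stand-in for the `Deserializer` described in this PR.
class ToyDeserializer {
  public:
    // Probe the stream with a single forward read; no seeking is required.
    static ToyDeserializer build(std::istream& is) {
        ToyDeserializer d;
        is.read(d.header_.data(), static_cast<std::streamsize>(d.header_.size()));
        d.native_ = (is.gcount() == 8) && (d.header_ == kNativeMagic);
        return d;
    }

    bool is_native() const { return native_; }

    // Bytes consumed while probing. On a non-seekable stream they cannot be
    // pushed back, so a legacy reader would have to take them from here
    // (an assumption of this sketch, not a documented SVS behavior).
    const std::array<char, 8>& consumed() const { return header_; }

  private:
    std::array<char, 8> header_{};
    bool native_ = false;
};

// Report which deserialization path would be taken for a given stream.
inline std::string choose_path(std::istream& is) {
    ToyDeserializer d = ToyDeserializer::build(is);
    return d.is_native() ? "native" : "legacy";
}
```

A stream that starts with the eight magic bytes takes the native path; anything else, including a stream shorter than eight bytes, falls back to the legacy path.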
This PR introduces native serialization for the IVF index. Main changes are:
1. A new overload of `svs::index::ivf::load_ivf_index` accepting an istream is introduced.
2. Added related tests.
3. The `save(std::ostream&)` method for DynamicIVF is just a placeholder for now. A real implementation is expected later.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the `MultiMutableVamana` index. Should be merged after: #286

Main changes are:
1. A new overload of `svs::index::vamana::auto_multi_dynamic_assemble`, required for direct deserialization, accepts lazy loaders and calls them in a flexible order to cover legacy serialized models.
2. Added related tests.
3. The `supports_saving` flag is kept false, since it isn't used.
4. `MultiMutableVamana` doesn't have an orchestrator, so there are no changes on that side.

---------
Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the DynamicIVF index. Main changes are:
1. A new overload of `svs::index::ivf::load_dynamic_ivf_index` accepting an istream is introduced.
2. Added related tests.
3. Minor refactoring and streamlining of the logic.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR introduces native serialization for the Inverted index. Main changes are:
1. A new overload of `svs::index::inverted::assemble_from_clustering` accepting an istream is introduced.
2. Added related tests.

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
This PR adds native stream serialization to all SVS index types as an alternative to the existing (legacy) directory-based serialization. It avoids filesystem round-trips of the data. Native serialization does not require the stream to be seekable, so no additional restrictions are introduced.
See the following PRs for details: #280, #281, #285, #286, #289, #292, #294, #296, #299
Main changes are:
- `Archiver` extracts binary I/O primitives (`write_size`, `read_size`, `write_name`, `read_name`, `read_from_istream`) from `DirectoryArchiver`. `DirectoryArchiver` and the new `StreamArchiver` class inherit from `Archiver`. `StreamArchiver` has its own magic number (`"SVS_STRM"`) to distinguish native streams from directory archives.
- `Writer` is split via CRTP with two derived classes: `FileWriter` owns an `std::ofstream`, writes a header, and flushes in its destructor; `StreamWriter` wraps an external `std::ostream&` with no header or lifecycle management. This allows `io::save(data, os)` to write vector data directly to any stream.
- `save(stream)` in the orchestrator `Impl` classes no longer does temp-dir->pack. Instead, it directly calls `impl().save(stream)`.
- `Deserializer::build(is)` reads the magic number and exposes `is_native()` to choose the path.
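As a rough illustration of the CRTP split of the writer, the sketch below shares the append logic in a base class while each derived class decides where the bytes go and who owns the stream. All names (`WriterBase`, `ToyFileWriter`, `ToyStreamWriter`, `save_to_stream`) and the `"HDR0"` header are hypothetical; the real SVS classes may differ in interface and header contents.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// CRTP base: the shared write logic lives here; the derived class supplies
// `stream()`, so no virtual dispatch is needed.
template <typename Derived> class WriterBase {
  public:
    template <typename T> void append(const std::vector<T>& values) {
        std::ostream& os = static_cast<Derived&>(*this).stream();
        os.write(
            reinterpret_cast<const char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(T))
        );
    }
};

// Owns its file: writes a (placeholder) header on construction and flushes
// when destroyed.
class ToyFileWriter : public WriterBase<ToyFileWriter> {
  public:
    explicit ToyFileWriter(const std::string& path)
        : file_(path, std::ios::binary) {
        file_.write("HDR0", 4); // hypothetical header, not the SVS format
    }
    ~ToyFileWriter() { file_.flush(); }
    std::ostream& stream() { return file_; }

  private:
    std::ofstream file_;
};

// Wraps a stream owned by the caller: no header, no lifecycle management.
class ToyStreamWriter : public WriterBase<ToyStreamWriter> {
  public:
    explicit ToyStreamWriter(std::ostream& os)
        : os_(os) {}
    std::ostream& stream() { return os_; }

  private:
    std::ostream& os_;
};

// Hypothetical analogue of `io::save(data, os)`: raw vector data goes
// straight to whatever stream the caller provides.
inline void save_to_stream(const std::vector<float>& data, std::ostream& os) {
    ToyStreamWriter writer(os);
    writer.append(data);
}
```

With this shape, writing into a `std::ostringstream`, a socket-backed stream, or a file stream goes through the same `append` code, and only the file-owning variant adds a header and flushes on destruction.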