store: Consolidate index creation using CreateIndex and add postponed index support#6434

Open
lutter wants to merge 12 commits into master from lutter/create-index

lutter commented Mar 11, 2026

Currently, GRAPH_POSTPONE_ATTRIBUTE_INDEX_CREATION only defers index creation when copying subgraphs. Subgraphs whose performance is limited by how fast we can write to the database, such as amp subgraphs, will also benefit from deferring index creation.

Besides postponing index creation for syncing subgraphs, this PR also refactors how indexes are created for subgraph tables, replacing scattered raw SQL generation with a unified CreateIndex abstraction.

This PR will also be the basis for speeding up graphman restore by having it defer index creation until after the data import.

Index creation consolidation

  • Introduce Table::indexes() returning all indexes (time-travel, attribute, aggregate) as structured CreateIndex objects instead of raw SQL strings
  • Parse and round-trip our own index definitions including BRIN indexes with minmax_multi_ops operator classes and various WHERE clauses
  • Remove the index_def: Option<IndexList> parameter threading and simplify callers across copy.rs, prune.rs, and deployment_store.rs
  • Add a create_index example tool for testing index definition parsing
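
To illustrate the idea of structured index objects, here is a minimal sketch. It is not the actual `CreateIndex` type from this PR: `IndexSketch`, its fields, `to_sql`, `split_postponed`, and the index and table names used with it are all hypothetical stand-ins. It only shows why a structured value is easier to filter and render than a raw SQL string.

```rust
/// Hypothetical, simplified stand-in for a structured index definition.
/// The real `CreateIndex` in this PR is assumed to carry more detail.
#[derive(Debug, Clone, PartialEq)]
struct IndexSketch {
    name: String,
    table: String,
    method: String,       // e.g. "btree" or "brin"
    columns: Vec<String>, // column names or expressions
    cond: Option<String>, // optional WHERE clause
    postponable: bool,    // assumption: marks indexes that may be deferred
}

impl IndexSketch {
    /// Render the definition as a single-line `create index` statement.
    fn to_sql(&self, concurrently: bool, if_not_exists: bool) -> String {
        let mut sql = String::from("create index");
        if concurrently {
            sql.push_str(" concurrently");
        }
        if if_not_exists {
            sql.push_str(" if not exists");
        }
        sql.push_str(&format!(
            " {} on {} using {} ({})",
            self.name,
            self.table,
            self.method,
            self.columns.join(", ")
        ));
        if let Some(cond) = &self.cond {
            sql.push_str(&format!(" where {}", cond));
        }
        sql
    }
}

/// Split indexes into (create now, create later). With structured values
/// this is a plain filter; no SQL string inspection is needed.
fn split_postponed(
    indexes: Vec<IndexSketch>,
    postpone_enabled: bool,
) -> (Vec<IndexSketch>, Vec<IndexSketch>) {
    indexes
        .into_iter()
        .partition(|idx| !(postpone_enabled && idx.postponable))
}
```

In this sketch the decision to postpone lives next to the index definition, so callers such as a copy or prune routine would not need to thread a separate index list through their signatures.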

Postponed index creation

  • Allow deferring index creation during initial sync via CreateIndex::to_postpone(), controlled by the GRAPH_POSTPONE_INDEXES env var
  • Trigger creation of postponed indexes when a subgraph gets within a configurable number of blocks of the chain head (GRAPH_POSTPONE_INDEXES_CREATION_THRESHOLD, default 10000)
  • Re-create any missing postponed indexes on subgraph restart as a safety net, using IF NOT EXISTS + CONCURRENTLY
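
The chain-head trigger described above can be sketched as a simple distance check. This is an illustration only: `near_chain_head` and `threshold_from_env` are hypothetical helpers, and treating "within" as an inclusive comparison is an assumption, not something the PR description states.

```rust
/// Default taken from the PR description.
const DEFAULT_THRESHOLD: u64 = 10_000;

/// Hypothetical helper: true once the subgraph is within `threshold`
/// blocks of the chain head. Assumes an inclusive comparison.
fn near_chain_head(subgraph_block: u64, chain_head: u64, threshold: u64) -> bool {
    // saturating_sub avoids underflow if the subgraph block is
    // momentarily reported ahead of the known chain head.
    chain_head.saturating_sub(subgraph_block) <= threshold
}

/// Hypothetical helper: read the threshold from the environment and
/// fall back to the default if the variable is unset or unparsable.
fn threshold_from_env() -> u64 {
    std::env::var("GRAPH_POSTPONE_INDEXES_CREATION_THRESHOLD")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_THRESHOLD)
}
```

With the default threshold, a subgraph at block 990,000 against a chain head of 1,000,000 would satisfy the check in this sketch, while one at block 500,000 would not.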

Test plan

  • Unit tests pass (just test-unit)
  • DDL test constants updated to match new single-line index format
  • Verify postponed index creation triggers correctly near chain head
  • Verify indexes are recreated on subgraph restart

lutter added 12 commits March 10, 2026 17:29
Move some of the responsibility to the caller to make it clearer what is
being used

…able

All callers used the name of the table that was passed in, except for the
use in `copy.rs`, but copying doesn't change table names, so the src and
dst names of tables are the same if they exist in the dst

Also, test some variations of the same index definition

Consolidate index creation into a single `Table::indexes()` method that
returns all indexes (time-travel, attribute, aggregate) as `CreateIndex`
objects. This replaces the old string-based methods and eliminates the
`index_def: Option<IndexList>` parameter threading through the codebase.

Key changes:
- Add `Table::indexes()` combining time_travel + attribute + aggregate indexes
- Add `attr_index_spec()` and `add_attribute_indexes()` structured helpers
- Move env var check into `CreateIndex::to_postpone()` so callers need not check
- Simplify `Table::as_ddl()` to iterate indexes with postpone filtering
- Remove old `create_time_travel_indexes`, `create_attribute_indexes`,
  `create_postponed_indexes`, `create_aggregate_indexes` string methods
- Remove `index_def` parameter from Layout, DeploymentStore, SubgraphStore
- Update copy.rs to use `indexes()` + `references_column_not_in()` for new fields
- Update prune.rs to use simplified `as_ddl()` without index_def
- Update all DDL test constants for new single-line index format

Add a trigger that creates postponed indexes when a subgraph gets
within a configurable number of blocks (default 10000) of the chain
head. This ensures indexes are in place before the subgraph starts
serving queries.

The new env var GRAPH_POSTPONE_INDEXES_CREATION_THRESHOLD controls
how many blocks before the chain head to trigger index creation. The
creation is idempotent (IF NOT EXISTS + CONCURRENTLY) and only
attempted once per subgraph run via an AtomicBool guard.
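
A once-per-run guard of the kind mentioned here could look roughly like the following. This is a sketch under assumptions: `PostponedIndexTrigger` and `fire_once` are hypothetical names, and the actual code in this PR may structure the guard differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical guard that lets index creation be attempted at most
/// once per subgraph run.
struct PostponedIndexTrigger {
    attempted: AtomicBool,
}

impl PostponedIndexTrigger {
    fn new() -> Self {
        Self {
            attempted: AtomicBool::new(false),
        }
    }

    /// Run `create` only on the first call; returns true if this call ran it.
    fn fire_once<F: FnOnce()>(&self, create: F) -> bool {
        // `swap` returns the previous value, so only the first caller
        // sees `false` and proceeds; later callers return early.
        if self.attempted.swap(true, Ordering::SeqCst) {
            return false;
        }
        create();
        true
    }
}
```

Because the flag is set before `create` runs, a failed attempt is not retried within the same run in this sketch; the idempotent `IF NOT EXISTS` statement is what makes a later attempt on restart safe.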

Replace the IndexList-based `recreate_invalid_indexes` call in
`start_subgraph()` with a call to `create_postponed_indexes()`. This
uses `IF NOT EXISTS` and `CONCURRENTLY` to safely create any missing
postponed indexes on every restart, acting as a safety net.

Remove the now-unused `IndexList::recreate_invalid_indexes` method.