Datassert is a high-performance CLI for building a DuckDB-backed assertion store from Babel export files, with a focus on fast local builds and simple command-driven workflows.

Install the CLI from GitHub:

    go install github.com/SkyeAv/datassert@latest

Verify the install:

    datassert --help

Build a Datassert database from Babel exports:

    datassert build --babel-dir /path/to/babel

| Flag | Required | Default | Description |
|---|---|---|---|
| `--babel-dir` | Yes | N/A | Directory containing Babel `*Class.ndjson.zst` and `*Synonyms.ndjson.zst` files |
| `--db-path` | No | `./datassert.duckdb` | Output path for the DuckDB database |
| `--batch-size` | No | `1000000` | Number of records written per Parquet batch |
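
To make the effect of `--batch-size` concrete, the sketch below computes how many batches a given record count would produce under fixed-size batching. It is only an illustration: `batchCount` and the record total are hypothetical, and it assumes every batch except possibly the last is filled to the configured size, which this README does not confirm.

```go
package main

import "fmt"

// batchCount returns how many batches are needed to hold `records` items when
// each batch holds at most `batchSize` items (ceiling division).
// Hypothetical helper for illustration; not part of Datassert.
func batchCount(records, batchSize int) int {
	if records <= 0 || batchSize <= 0 {
		return 0
	}
	return (records + batchSize - 1) / batchSize
}

func main() {
	const records = 2_500_000 // hypothetical record total

	// Compare the default batch size with a smaller one.
	for _, size := range []int{1_000_000, 500_000} {
		fmt.Printf("batch size %d -> %d batches\n", size, batchCount(records, size))
	}
}
```

Under these assumptions, 2,500,000 records yield 3 batches at the default size and 5 batches at 500,000. Smaller batches hold fewer records in memory at a time but produce more batches to write.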
- `--babel-dir` is scanned for files matching `*Class.ndjson.zst` and `*Synonyms.ndjson.zst`.
- File matching is non-recursive (top-level of the provided directory).
- Staging Parquet files are written to `./.parquet-store/`.
- Final DuckDB database is written to `--db-path`.
- Build creates and loads `SOURCES`, `CATEGORIES`, `CURIES`, and `SYNONYMS`, then indexes/sorts synonyms for query performance.

Use defaults for db path and batch size:

    datassert build --babel-dir ./babel-exports

Write database to a custom location:

    datassert build --babel-dir ./babel-exports --db-path ./data/datassert.duckdb

Tune Parquet batch size:

    datassert build --babel-dir ./babel-exports --batch-size 500000

- Displays progress bars for class and synonym processing phases.
- Uses CPU-based concurrency (`NumCPU()/2` for class processing and `NumCPU()/4` for synonym processing).
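
The concurrency rule above can be sketched with Go's `runtime.NumCPU`. The helper name `workerCount` is hypothetical, and the floor of one worker is an assumption added so the sketch behaves sensibly on machines with few cores. How Datassert handles that case is not stated here.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount divides the CPU count by divisor using integer division.
// The minimum of 1 is an assumption for this sketch, not documented
// Datassert behavior.
func workerCount(numCPU, divisor int) int {
	n := numCPU / divisor
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	cpus := runtime.NumCPU()
	fmt.Printf("logical CPUs:    %d\n", cpus)
	fmt.Printf("class workers:   %d\n", workerCount(cpus, 2)) // NumCPU()/2
	fmt.Printf("synonym workers: %d\n", workerCount(cpus, 4)) // NumCPU()/4
}
```

On an 8-core machine this gives 4 workers for class processing and 2 for synonym processing.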
Skye Lane Goetz