Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).
Stop searching by keywords. Search by meaning.
# Current: keyword matching only
git log --grep="memory leak" # Only finds commits with exact text "memory leak"
# Misses: "fix kfree_skb double free"
# Misses: "plug UAF in reset path"
# Misses: "resolve dangling pointer"
# CommitMind: semantic search
commitmind search "memory leak"
# >> #1 [0.94] a3f2c1d Fix kfree_skb double free in netfilter
# >> #2 [0.91] b7e4a2f Plug use-after-free in device reset path
# >> #3 [0.87] c9d1b3e Resolve dangling pointer in slab allocator
CommitMind understands the meaning of your query and finds semantically related commits - even when the exact words don't match.
Git commits --> Sentence embeddings (all-MiniLM-L6-v2) --> TurboQuant compression (7.6x) --> Semantic search (asymmetric scoring)
- Extract commit messages + file change metadata from git history
- Embed each commit into a 384-dimensional vector (local model, no API needed)
- Compress vectors with TurboQuant (Google's ICLR 2026 algorithm) - 87% memory savings
- Search using asymmetric inner-product estimation (no decompression needed)
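The extraction step can be sketched as follows. This is a minimal illustration, not CommitMind's actual implementation: the function names `read_git_log` and `parse_git_log` are hypothetical, and the sketch assumes the commit subject and changed file names are the metadata of interest. It relies only on standard `git log` options (`--name-only`, `--max-count`, and `%x1e`/`%x1f` hex-byte placeholders used as separators).

```python
import subprocess

RECORD_SEP = "\x1e"  # emitted by %x1e at the start of each commit record
FIELD_SEP = "\x1f"   # emitted by %x1f between the hash and the subject


def read_git_log(repo_path=".", max_commits=None):
    """Run `git log` and return its raw text (hypothetical helper)."""
    cmd = ["git", "-C", repo_path, "log", "--name-only",
           "--pretty=format:%x1e%H%x1f%s"]
    if max_commits is not None:
        cmd.append(f"--max-count={max_commits}")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_git_log(raw):
    """Split raw log text into dicts with hash, subject, and changed files."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Drop blank lines; the first remaining line is the header.
        lines = [ln for ln in record.split("\n") if ln.strip()]
        if not lines:
            continue
        commit_hash, _, subject = lines[0].partition(FIELD_SEP)
        commits.append({"hash": commit_hash, "subject": subject, "files": lines[1:]})
    return commits
```

Calling `parse_git_log(read_git_log("."))` inside a repository would return one dictionary per commit; the subject and file list could then be joined into the text that gets embedded.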
pip install commitmind
Or install from source:
git clone https://github.com/wjddusrb03/commitmind.git
cd commitmind
pip install -e ".[dev]"
# 1. Index your repository
cd your-project
commitmind index
# Output:
# Indexing complete!
# > 3,842 commits indexed
# > Compressed: 18.2 MB -> 2.4 MB (7.6x)
# > Saved to .commitmind/index.pkl
# 2. Search by meaning
commitmind search "authentication bug fix"
# 3. View stats
commitmind stats

| Command | Description |
|---|---|
| `commitmind index` | Index commits with TurboQuant compression |
| `commitmind search "query"` | Semantic search over commits |
| `commitmind stats` | Show index statistics |
| `commitmind update` | Add new commits to existing index |
# Index with options
commitmind index --max-commits 1000 # Limit to recent 1000 commits
commitmind index --branch main # Index specific branch
commitmind index --bits 2 # Use 2-bit quantization (more compression)
# Search with options
commitmind search "query" -k 10 # Return top 10 results
- New team member: "What authentication changes were made recently?"
- Bug tracking: "Find commits related to network timeout issues"
- Security audit: "Show all SQL injection related fixes"
- Code archaeology: Search the Linux kernel's 1M+ commits by meaning
- Cross-language: Search English commits with Korean queries (and vice versa)
Thanks to TurboQuant compression:
| Commits | Uncompressed | CommitMind | Savings |
|---|---|---|---|
| 1,000 | 1.5 MB | 0.2 MB | 87% |
| 10,000 | 15 MB | 2.0 MB | 87% |
| 100,000 | 150 MB | 20 MB | 87% |
| 1,000,000 | 1.5 GB | 200 MB | 87% |
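The figures in this table can be reproduced with simple arithmetic. The sketch below assumes the uncompressed embeddings are stored as 32-bit floats (an assumption; the README states only the 384 dimensions and the 7.6x ratio) and that 1 MB means 10^6 bytes.

```python
DIMS = 384           # embedding dimensions stated in this README
BYTES_PER_FLOAT = 4  # assumption: uncompressed vectors are float32
RATIO = 7.6          # compression ratio quoted in this README


def index_sizes_mb(n_commits):
    """Return (uncompressed_mb, compressed_mb) for n_commits vectors."""
    raw = n_commits * DIMS * BYTES_PER_FLOAT / 1e6
    return raw, raw / RATIO


for n in (1_000, 10_000, 100_000, 1_000_000):
    raw, comp = index_sizes_mb(n)
    saved = 1 - comp / raw
    print(f"{n:>9,} commits: {raw:8.1f} MB -> {comp:7.1f} MB ({saved:.0%} saved)")
```

The savings figure depends only on the ratio: 1 - 1/7.6 is roughly 0.87, which is why every row shows 87%.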
CommitMind uses TurboQuant (Google Research, ICLR 2026):
- PolarQuant: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit)
- QJL: Quantized Johnson-Lindenstrauss residual correction (1-bit)
- Asymmetric scoring: Compute similarity WITHOUT decompressing vectors
This achieves ~7.6x compression with minimal accuracy loss.
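To illustrate what "scoring without decompression" means, here is a toy sketch of a QJL-style 1-bit estimator on its own. It is not the full two-stage TurboQuant pipeline and not CommitMind's code: all function names are hypothetical, the rotation and Lloyd-Max stage is omitted, and the estimator is applied directly to the vector rather than to a residual. The stored vector keeps only one sign bit per random projection plus its norm, while the query stays in full precision.

```python
import math
import random


def make_projection(m, d, seed=0):
    """m x d matrix with i.i.d. standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compress(x, S):
    """Keep one sign per projection row, plus the norm of x."""
    signs = [1 if dot(row, x) >= 0 else -1 for row in S]
    return signs, math.sqrt(dot(x, x))


def estimate_inner_product(q, compressed, S):
    """Estimate <q, x> from the sign sketch; x is never reconstructed."""
    signs, norm = compressed
    acc = sum(dot(row, q) * s for row, s in zip(S, signs))
    # sqrt(pi/2) corrects for E[<g,q> * sign(<g,x>)] = sqrt(2/pi) * <q,x> / |x|
    return math.sqrt(math.pi / 2) * norm * acc / len(S)


S = make_projection(m=2048, d=8)
x = [0.5, -0.1, 0.3, 0.0, 0.2, -0.4, 0.1, 0.6]  # "stored" vector (toy data)
q = [0.4, 0.0, 0.2, 0.1, 0.3, -0.3, 0.0, 0.5]   # full-precision query (toy data)
exact = dot(q, x)
approx = estimate_inner_product(q, compress(x, S), S)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The toy uses far more projection rows than dimensions so the estimate is tight, which is not a storage-efficient setting. In the scheme described above, the 1-bit stage only needs to correct the residual left by the 3-bit quantizer.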
- Python 3.9+
- Git repository
- CPU only (no GPU required)
- ~500 MB disk for embedding model (downloaded once)
Issues and pull requests are welcome! If you find a bug or have suggestions, please open an issue.
MIT License
If you use CommitMind in your research:
@software{commitmind2026,
title={CommitMind: Semantic Git Commit Search with TurboQuant Compression},
author={wjddusrb03},
year={2026},
url={https://github.com/wjddusrb03/commitmind}
}
- langchain-turboquant - LangChain VectorStore with TurboQuant compression
- TurboQuant paper - Original ICLR 2026 paper by Google Research