CommitMind

Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).

Stop searching by keywords. Search by meaning.

The Problem

# Current: keyword matching only
git log --grep="memory leak"     # Only finds commits whose message matches the pattern "memory leak"
                                  # Misses: "fix kfree_skb double free"
                                  # Misses: "plug UAF in reset path"
                                  # Misses: "resolve dangling pointer"

The Solution

# CommitMind: semantic search
commitmind search "memory leak"
# >> #1 [0.94] a3f2c1d  Fix kfree_skb double free in netfilter
# >> #2 [0.91] b7e4a2f  Plug use-after-free in device reset path
# >> #3 [0.87] c9d1b3e  Resolve dangling pointer in slab allocator

CommitMind embeds your query and ranks commits by semantic similarity, so it finds related commits even when the exact words don't match.
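To make the ranking idea concrete, here is a minimal sketch of scoring commits by cosine similarity and keeping the top results. The 3-dimensional vectors, the `Update README badges` commit, and the helper names `cosine` and `top_k` are made up for illustration; they are not CommitMind's actual data or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, commits, k=3):
    """Return the k (score, sha, message) tuples most similar to the query."""
    scored = [(cosine(query_vec, vec), sha, msg) for sha, msg, vec in commits]
    return sorted(scored, reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings"; the real model produces 384 dimensions.
commits = [
    ("a3f2c1d", "Fix kfree_skb double free in netfilter", [0.9, 0.1, 0.0]),
    ("d41e7aa", "Update README badges", [0.0, 0.1, 0.9]),
    ("b7e4a2f", "Plug use-after-free in device reset path", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "memory leak"

for rank, (score, sha, msg) in enumerate(top_k(query, commits, k=2), start=1):
    print(f"#{rank} [{score:.2f}] {sha}  {msg}")
```

The unrelated commit scores near zero and drops out of the top results, even though none of the messages contain the query words.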

How It Works

Git commits --> Sentence embeddings --> TurboQuant compression --> Semantic search
                (all-MiniLM-L6-v2)      (7.6x compression)       (asymmetric scoring)

1. Extract commit messages and file-change metadata from the git history.
2. Embed each commit into a 384-dimensional vector (local model, no API needed).
3. Compress the vectors with TurboQuant (Google's ICLR 2026 algorithm), saving about 87% of memory.
4. Search using asymmetric inner-product estimation (no decompression needed).
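The extraction step could be sketched roughly as below. This is an assumption about how such a step might work, not CommitMind's actual code: the `%x1e`/`%x1f` separators and the helpers `parse_git_log`, `read_commits`, and `to_text` are hypothetical names chosen for the example. Only standard `git log` options are used.

```python
import subprocess

FIELD_SEP = "\x1f"   # emitted by git for %x1f, separates sha from subject
RECORD_SEP = "\x1e"  # emitted by git for %x1e, marks the start of a commit

def parse_git_log(raw):
    """Split raw `git log --name-only` output into commit records."""
    records = []
    for chunk in raw.split(RECORD_SEP):
        chunk = chunk.strip("\n")
        if not chunk:
            continue
        header, _, file_block = chunk.partition("\n")
        sha, subject = header.split(FIELD_SEP, 1)
        files = [line for line in file_block.split("\n") if line]
        records.append({"sha": sha, "subject": subject, "files": files})
    return records

def read_commits(repo_path=".", max_commits=1000):
    """Run git and return parsed records for the most recent commits."""
    cmd = ["git", "-C", repo_path, "log", f"--max-count={max_commits}",
           "--name-only", "--pretty=format:%x1e%H%x1f%s"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_git_log(out)

def to_text(record):
    """Assumed input to the embedding model: subject plus changed paths."""
    return record["subject"] + "\n" + " ".join(record["files"])
```

Calling `read_commits(".", max_commits=1000)` inside a repository would return one record per commit; each `to_text(record)` string is what a sentence-embedding model would then encode.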

Installation

pip install commitmind

Or install from source:

git clone https://github.com/wjddusrb03/commitmind.git
cd commitmind
pip install -e ".[dev]"

Quick Start

# 1. Index your repository
cd your-project
commitmind index

# Output:
# Indexing complete!
#   > 3,842 commits indexed
#   > Compressed: 18.2 MB -> 2.4 MB (7.6x)
#   > Saved to .commitmind/index.pkl

# 2. Search by meaning
commitmind search "authentication bug fix"

# 3. View stats
commitmind stats

CLI Commands

Command	Description
`commitmind index`	Index commits with TurboQuant compression
`commitmind search "query"`	Semantic search over commits
`commitmind stats`	Show index statistics
`commitmind update`	Add new commits to existing index

Options

# Index with options
commitmind index --max-commits 1000    # Limit to recent 1000 commits
commitmind index --branch main         # Index specific branch
commitmind index --bits 2              # Use 2-bit quantization (more compression)

# Search with options
commitmind search "query" -k 10        # Return top 10 results

Use Cases

New team member: "What authentication changes were made recently?"
Bug tracking: "Find commits related to network timeout issues"
Security audit: "Show all SQL injection related fixes"
Code archaeology: Search the Linux kernel's 1M+ commits by meaning
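Cross-language: Search English commits with Korean queries (and vice versa); this needs a multilingual embedding model, since all-MiniLM-L6-v2 is trained mainly on English text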

Memory Efficiency

Thanks to TurboQuant compression, the index takes about 87% less memory than uncompressed 384-dimensional float32 vectors:

Commits	Uncompressed	CommitMind	Savings
1,000	1.5 MB	0.2 MB	87%
10,000	15 MB	2.0 MB	87%
100,000	150 MB	20 MB	87%
1,000,000	1.5 GB	200 MB	87%
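The table can be reproduced with a few lines of arithmetic. The sketch below assumes uncompressed vectors are stored as float32 and that 1 MB means 10^6 bytes; the 7.6x ratio is the figure quoted in this README, and `index_sizes` is a hypothetical helper name.

```python
DIM = 384            # embedding size stated above
FLOAT32_BYTES = 4    # assumption: uncompressed vectors are float32
RATIO = 7.6          # compression ratio quoted in this README

def index_sizes(n_commits):
    """Return (uncompressed_bytes, compressed_bytes, savings_fraction)."""
    raw = n_commits * DIM * FLOAT32_BYTES
    compressed = raw / RATIO
    return raw, compressed, 1 - compressed / raw

for n in (1_000, 10_000, 100_000, 1_000_000):
    raw, comp, saved = index_sizes(n)
    print(f"{n:>9,}  {raw / 1e6:8.1f} MB  {comp / 1e6:7.1f} MB  {saved:.0%}")

# 3 bits (scalar quantization) + 1 bit (QJL) = 4 bits per dimension,
# which is 8x smaller than a 32-bit float before any per-vector overhead.
payload_bytes = DIM * (3 + 1) // 8           # packed codes per vector
implied_bytes = DIM * FLOAT32_BYTES / RATIO  # bytes per vector implied by 7.6x
print(payload_bytes, round(implied_bytes, 1))
```

The gap between the 4-bit payload and the size implied by 7.6x is small, which would be consistent with a few bytes of per-vector metadata; that interpretation is an inference, not something the source states.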

How TurboQuant Works

CommitMind uses TurboQuant (Google Research, ICLR 2026):

PolarQuant: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit)
QJL: Quantized Johnson-Lindenstrauss residual correction (1-bit)
Asymmetric scoring: Compute similarity WITHOUT decompressing vectors

Storing 3 + 1 = 4 bits per dimension instead of a 32-bit float, this achieves ~7.6x compression with minimal accuracy loss.
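The following is a simplified stand-in for the idea, not TurboQuant itself: it uses a random orthogonal rotation followed by plain uniform scalar quantization (rather than Lloyd-Max), omits the QJL residual step, and stores one code per byte instead of bit-packing. All function names are hypothetical. It does show the asymmetric part: the query stays in floating point and is scored directly against the integer codes.

```python
import numpy as np

def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # column sign flips keep q orthogonal

def compress(vectors, rotation, bits=3):
    """Rotate each row, then quantize it to 2**bits uniform levels."""
    rotated = vectors @ rotation.T
    lo = rotated.min(axis=1, keepdims=True)
    span = rotated.max(axis=1, keepdims=True) - lo
    scale = np.where(span > 0, span, 1.0) / (2 ** bits - 1)
    codes = np.round((rotated - lo) / scale).astype(np.uint8)
    return codes, lo[:, 0], scale[:, 0]

def asymmetric_scores(query, rotation, codes, lo, scale):
    """Inner products between a float query and the quantized vectors.

    Each stored vector is lo + scale * codes, so
    <q, x> = lo * sum(q) + scale * <q, codes>; no decompressed copy is built.
    """
    q_rot = rotation @ query
    return lo * q_rot.sum() + scale * (codes @ q_rot)

# Random unit vectors stand in for commit embeddings.
rng = np.random.default_rng(1)
vectors = rng.standard_normal((50, 384))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

rotation = random_rotation(384)
codes, lo, scale = compress(vectors, rotation)
approx = asymmetric_scores(vectors[7], rotation, codes, lo, scale)
exact = vectors @ vectors[7]
print("max abs error:", np.abs(approx - exact).max())
```

Because the rotation is orthogonal, inner products are preserved before quantization, so the only error comes from rounding each coordinate to one of eight levels.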

Requirements

Python 3.9+
Git repository
CPU only (no GPU required)
~500 MB disk for embedding model (downloaded once)

Contributing

Issues and pull requests are welcome! If you find a bug or have suggestions, please open an issue.

License

MIT License

Citation

If you use CommitMind in your research, please cite it:

@software{commitmind2026,
  title={CommitMind: Semantic Git Commit Search with TurboQuant Compression},
  author={wjddusrb03},
  year={2026},
  url={https://github.com/wjddusrb03/commitmind}
}
