
Code review fixes: config centralization, dead code removal, import cleanup, threshold normalization#18

Draft
Copilot wants to merge 3 commits into cursor/serf-long-shot-plan-system-b0d4 from copilot/sub-pr-17

Conversation


Copilot AI commented Mar 8, 2026

The initial SERF implementation had several code quality issues identified during review: hardcoded config values scattered across the codebase, unreachable dead code, imports buried inside CLI function bodies, and inconsistent evaluation threshold scales.

Changes

  • Dead code (pipeline.py): Unreachable return pd.read_csv(..., encoding="latin-1") after the encoding-retry loop — Latin-1 never raises UnicodeDecodeError. Replaced with ValueError.
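The encoding-retry pattern in question can be sketched as follows. This is a minimal standalone version: the encoding list and function signature are assumptions for illustration, not a copy of pipeline.py.

```python
import pandas as pd

# Assumed encoding fallback order; latin-1 last because it accepts any byte
ENCODINGS = ("utf-8", "cp1252", "latin-1")

def read_csv_with_fallback(path: str) -> pd.DataFrame:
    """Try each encoding in turn. Since latin-1 maps every possible byte,
    it can never raise UnicodeDecodeError, so the loop always returns
    before falling through on an encoding problem."""
    for enc in ENCODINGS:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError:
            continue
    # Defensive: only reachable if the encoding list is edited to drop
    # latin-1. A plain `return pd.read_csv(...)` here is dead code;
    # raising makes the impossible path explicit.
    raise ValueError(f"Could not decode {path} with any of {ENCODINGS}")
```

Because latin-1 succeeds on any byte sequence, a trailing `return pd.read_csv(..., encoding="latin-1")` after such a loop can never execute, which is why replacing it with a raise is the clearer choice.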

  • Inline imports (cli/main.py): All serf module imports and stdlib imports (asyncio, random, pandas) were inside Click command bodies. Moved to module top-level per project conventions.

  • Config centralization (cli/main.py, pipeline.py, analyze/profiler.py, block/pipeline.py, block/embeddings.py, match/matcher.py): Replaced hardcoded "gemini/gemini-2.0-flash", embedding model strings, and numeric defaults (block sizes, iteration counts, concurrency) with config.get(). Model CLI options now use default=None with runtime resolution (model = model or config.get("models.llm")). Added missing er.convergence section and models.analyze_llm key to config.yml.
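The default=None / runtime-resolution pattern described above can be illustrated with a small sketch. The `config_get` shim and its stored value are stand-ins, not serf's actual config API:

```python
# Stand-in for serf's config module; real values would come from config.yml
_CONFIG = {"models.llm": "gemini/gemini-2.0-flash"}

def config_get(key: str, default=None):
    """Hypothetical dotted-key lookup mimicking config.get()."""
    return _CONFIG.get(key, default)

def resolve_model(cli_value=None):
    # The CLI option is declared with default=None; the effective default
    # is resolved at runtime, so editing config.yml changes behavior
    # without touching the CLI code.
    return cli_value or config_get("models.llm")
```

The advantage over `default=config.get(...)` in the decorator is that the config lookup happens at call time, not at import time, so tests and config edits are picked up without a reload.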

  • Threshold scale normalization (evaluator.py, config.yml): COVERAGE_THRESHOLD and OVERLAP_THRESHOLD were percentage-scale (0–100) but ERROR_THRESHOLD was fraction-scale (0–1) with a * 100 compensating multiplier in the comparison. Normalized all three to percentage scale, removed the * 100 multiplier, updated config.yml values to match. Effective thresholds are unchanged.
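A sketch of the normalized comparison follows. The specific threshold values and the `passes()` helper are illustrative assumptions; only the single-scale idea comes from the change above:

```python
# Before: ERROR_THRESHOLD was fraction-scale (e.g. 0.05) while the others
# were percentage-scale, forcing a compensating `* 100` in the comparison.
# After: all three on one percentage scale (0-100), multiplier removed.
COVERAGE_THRESHOLD = 95.0   # assumed value, percent
OVERLAP_THRESHOLD = 10.0    # assumed value, percent
ERROR_THRESHOLD = 5.0       # e.g. was 0.05 as a fraction; cutoff unchanged

def passes(coverage_pct: float, overlap_pct: float, error_pct: float) -> bool:
    """All inputs and thresholds share the 0-100 scale, so comparisons
    need no scale conversion."""
    return (coverage_pct >= COVERAGE_THRESHOLD
            and overlap_pct <= OVERLAP_THRESHOLD
            and error_pct <= ERROR_THRESHOLD)
```

Keeping every threshold on one scale removes a class of bugs where a future edit forgets (or double-applies) the conversion factor.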

  • Subprocess-isolated embedding (block/subprocess_embed.py, block/pipeline.py, pipeline.py): Introduced subprocess isolation for PyTorch embedding and FAISS clustering to avoid MPS/FAISS memory conflicts on macOS. Embedding model updated to intfloat/multilingual-e5-base.
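Subprocess isolation of this kind can be sketched as below. The child here returns dummy vectors instead of loading a model, and the JSON-over-stdio transport is an assumption about the mechanism, not a copy of block/subprocess_embed.py:

```python
import json
import subprocess
import sys

def embed_in_subprocess(texts):
    """Run the embedding step in a child interpreter so the embedding
    runtime (e.g. PyTorch/MPS) and FAISS never share one address space;
    inputs go in via stdin, vectors come back via stdout as JSON."""
    child_code = (
        "import sys, json;"
        "texts = json.load(sys.stdin);"
        # A real child would load a model here (e.g. multilingual-e5-base);
        # this stub emits one-dimensional length-based vectors instead.
        "json.dump([[float(len(t))] for t in texts], sys.stdout)"
    )
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        input=json.dumps(texts),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Because the child is a fresh interpreter, any GPU/MPS allocations it makes die with the process, so they can never collide with FAISS state in the parent.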

  • Docker support: Added Dockerfile, docker-compose.yml, and .dockerignore for containerized execution and benchmarking.



…vel, fix threshold scale

Co-authored-by: rjurney <42149+rjurney@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update Serf initial implementation for Semantic Entity Resolution Framework" to "Code review fixes: config centralization, dead code removal, import cleanup, threshold normalization" Mar 9, 2026
@rjurney
Contributor

rjurney commented Mar 9, 2026

/gemini update this PR and resolve conflicts

@gemini-code-assist

It looks like you're trying to update the pull request and resolve conflicts. As gemini-code-assist, I am an AI assistant and cannot directly perform actions like updating a pull request or resolving conflicts. My capabilities are limited to providing code reviews and summaries.

If you'd like me to review the current state of the pull request, you can use the /gemini review command. If you want a summary, use /gemini summary.

For a full list of commands and how to use them, please refer to the help message by typing /gemini help.

@rjurney
Contributor

rjurney commented Mar 9, 2026

@copilot update this PR and resolve conflicts

…h subprocess embedding

Co-authored-by: rjurney <42149+rjurney@users.noreply.github.com>
Author

Copilot AI commented Mar 9, 2026

@copilot update this PR and resolve conflicts

Updated and conflicts resolved in commit 6b9a748.

The base branch had advanced with subprocess-isolated embedding/FAISS clustering (fixes macOS MPS segfault), a new analyze_llm config key, updated embedding model (multilingual-e5-base), Docker support, and several smaller fixes. Here's how conflicts were resolved:

  • config.yml: Merged base's new analyze_llm key and embedding model change with our er.convergence section and normalized eval thresholds (percentages)
  • pipeline.py: Adopted base's subprocess embedding approach, kept our dead-code fix and config.get() defaults, adopted base's model: str | None = None pattern
  • cli/main.py: Kept our top-level imports, adopted base's default=None for model options with model = model or config.get("models.llm") at runtime
  • profiler.py: Kept our top-level config import, adopted base's model: str | None = None with runtime lookup

Also brought in all non-conflicting base branch additions: subprocess_embed.py, updated block/pipeline.py, Docker files, pyproject.toml, and doc updates.

