Code review fixes: config centralization, dead code removal, import cleanup, threshold normalization#18
Conversation
…vel, fix threshold scale Co-authored-by: rjurney <42149+rjurney@users.noreply.github.com>
|
/gemini update this PR and resolve conflicts |
|
It looks like you're trying to update the pull request and resolve conflicts. As If you'd like me to review the current state of the pull request, you can use the For a full list of commands and how to use them, please refer to the help message by typing |
|
@copilot update this PR and resolve conflicts |
…h subprocess embedding Co-authored-by: rjurney <42149+rjurney@users.noreply.github.com>
Updated and conflicts resolved in commit The base branch had advanced with subprocess-isolated embedding/FAISS clustering (fixes macOS MPS segfault), a new
Also brought in all non-conflicting base branch additions: |
Initial SERF implementation had several code quality issues identified during review: hardcoded config values scattered across the codebase, unreachable dead code, imports inside CLI functions, and inconsistent evaluation threshold scales.
Changes
Dead code (
pipeline.py): Unreachablereturn pd.read_csv(..., encoding="latin-1")after the encoding-retry loop — Latin-1 never raisesUnicodeDecodeError. Replaced withValueError.Inline imports (
cli/main.py): All serf module imports and stdlib imports (asyncio,random,pandas) were inside Click command bodies. Moved to module top-level per project conventions.Config centralization (
cli/main.py,pipeline.py,analyze/profiler.py,block/pipeline.py,block/embeddings.py,match/matcher.py): Replaced hardcoded"gemini/gemini-2.0-flash", embedding model strings, and numeric defaults (block sizes, iteration counts, concurrency) withconfig.get(). Model CLI options now usedefault=Nonewith runtime resolution (model = model or config.get("models.llm")). Added missinger.convergencesection andmodels.analyze_llmkey toconfig.yml.Threshold scale normalization (
evaluator.py,config.yml):COVERAGE_THRESHOLDandOVERLAP_THRESHOLDwere percentage-scale (0–100) butERROR_THRESHOLDwas fraction-scale (0–1) with a* 100compensating multiplier in the comparison. Normalized all three to percentage scale, removed the* 100multiplier, updatedconfig.ymlvalues to match. Effective thresholds are unchanged.Subprocess-isolated embedding (
block/subprocess_embed.py,block/pipeline.py,pipeline.py): Introduced subprocess isolation for PyTorch embedding and FAISS clustering to avoid MPS/FAISS memory conflicts on macOS. Embedding model updated tointfloat/multilingual-e5-base.Docker support: Added
Dockerfile,docker-compose.yml, and.dockerignorefor containerized execution and benchmarking.💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.