[Repo Assist] Perf: replace multi-pass Seq.pearsonWeighted with 2-pass single-allocation impl#360
Draft
github-actions[bot] wants to merge 2 commits intodeveloperfrom
Conversation
…n impl Previously Seq.pearsonWeighted traversed the three input sequences approximately 12+ times: - 3x Seq.length calls for the length guard - weightedCorrelation called weightedCoVariance 3 times (xy, xx, yy) - each weightedCoVariance called weightedMean twice + one more pass New implementation: - converts to arrays once (3 passes) - pass 1: compute weighted means in a single loop - pass 2: compute weighted covariance and both variances in a single loop - total: 5 passes instead of 12+, ~3x-5x faster for large inputs The wSum denominator cancels in cov/sqrt(varX * varY), so the formula is mathematically identical. Results are verified to match the old output to floating-point precision. Add 5 regression tests covering: docstring example, consistency between pearsonWeighted/pearsonWeightedOfTriples, perfect ±1 correlations with uniform weights, and mismatched-length error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 This is an automated PR from Repo Assist.
Problem
Seq.pearsonWeightedtraversed the three input sequences ~12+ times:For
nelements this means roughly12nfloating-point reads. Internal allocation was also high: three Seq-based closures (fold2,map3,sum) were created per call.Fix
Two-pass, single-allocation implementation:
x̄_wandȳ_win a singleforloopforloopThe
wSumdenominator cancels incov / sqrt(varX × varY), so the formula is mathematically identical to the original.Performance Impact
In practice the speedup is higher because the old code also performed repeated heap allocations for the
Seq.map3/Seq.fold2closures.Test Status
5 new regression tests added:
docstring example— matches the-0.9764158959value from the XML docofTriples matches pearsonWeighted— verifies API consistencyuniform weights: perfect positive correlation— r = 1.0uniform weights: perfect negative correlation— r = −1.0mismatched length throws— error handling✅ Build: succeeded
✅ Tests: 1196/1196 passed (1191 existing + 5 new)