
feat(tl): add ancestral_linkage #52

Open
colganwi wants to merge 2 commits into main from feat/ancestral-linkage

@colganwi
Collaborator

Summary

  • Adds tl.ancestral_linkage to measure how closely related cells of different categories are on the lineage tree
  • Pairwise mode (target=None): computes a category × category linkage matrix stored in tdata.uns['{key}_linkage']
  • Single-target mode (target=<cat>): computes per-cell distance to the nearest cell of the given category, stored in tdata.obs['{target}_linkage']
  • Supports metric='path' (branch-length path distance) and metric='lca' (LCA depth)
  • Optional test='permutation' with parallel fork-based workers (n_threads)
  • Optional symmetrize for the pairwise matrix
  • by_tree=True adds per-tree breakdowns in the stats table
  • Adds tqdm as a package dependency (used for permutation progress)
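
For readers unfamiliar with the single-target mode, the idea can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the PR's implementation: the distance matrix `D`, the `labels` array, and the helper name `nearest_target_distance` are hypothetical, and the sketch does not exclude a cell from matching itself.

```python
import numpy as np

def nearest_target_distance(D, labels, target):
    """Per-cell distance to the nearest cell labelled `target` (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    mask = np.asarray(labels) == target
    if not mask.any():
        raise ValueError(f"no cells with category {target!r}")
    # Restrict columns to target cells, then take the row-wise minimum.
    return D[:, mask].min(axis=1)

# Toy path distances between four leaves (made-up values).
D = np.array([
    [0, 2, 6, 4],
    [2, 0, 6, 4],
    [6, 6, 0, 6],
    [4, 4, 6, 0],
])
labels = np.array(["A", "A", "B", "B"])
per_cell = nearest_target_distance(D, labels, "B")
```

Cells already in the target category get a distance of 0 here because they match themselves; whether the PR treats that case the same way is not stated in the summary.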

Test plan

  • Run conda run -n pycea python -m pytest tests/test_ancestral_linkage.py — 35 tests covering pairwise/single-target modes, known values, symmetrization, permutation tests, parallel execution, and edge cases

🤖 Generated with Claude Code

…atedness

Computes pairwise or single-target linkage scores between cell categories
using path distance or LCA depth on the lineage tree. Supports permutation
testing, parallel execution (fork-based), symmetrization, and per-tree stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 97.88732% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.51%. Comparing base (5725107) to head (f778001).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pycea/tl/ancestral_linkage.py | 97.87% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   92.91%   93.51%   +0.60%     
==========================================
  Files          34       35       +1     
  Lines        2554     2838     +284     
==========================================
+ Hits         2373     2654     +281     
- Misses        181      184       +3     
| Files with missing lines | Coverage Δ |
|---|---|
| src/pycea/tl/__init__.py | 100.00% <100.00%> (ø) |
| src/pycea/tl/ancestral_linkage.py | 97.87% <97.87%> (ø) |

... and 1 file with indirect coverage changes

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a6a745aaa


# Choose strategy: Dijkstra handles the natural "closest" direction for each metric
is_named = isinstance(aggregate, str)
use_dijkstra = is_named and ((aggregate == "min" and metric == "path") or (aggregate == "max" and metric == "lca"))

P1: Compute lca+max scores from all targets, not nearest path

Routing aggregate='max' with metric='lca' through the Dijkstra shortcut is not generally correct: Dijkstra picks the target leaf with minimum path distance, but maximizing LCA depth depends on both path length and target depth ((d_src + d_tgt - path)/2). When leaves have unequal depths (non-ultrametric trees), the best LCA target can be farther by path, so this branch underestimates linkage in pairwise, single-target, and permutation computations.
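
A small counterexample may make this concrete. The sketch below uses a toy non-ultrametric tree with hypothetical node names (`root`, `A`, `s`, `t1`, `t2`) and made-up branch lengths; it is not taken from the PR's code or tests. It shows the target that is nearest by path differing from the target with the deepest LCA.

```python
# Toy tree as child -> (parent, branch length). All names and lengths are hypothetical.
parent = {
    "A": ("root", 1.0),
    "s": ("A", 1.0),    # source leaf, depth 2
    "t1": ("A", 5.0),   # target leaf, depth 6
    "t2": ("root", 1.0) # target leaf, depth 1
}

def depth(node):
    """Sum of branch lengths from the root down to `node`."""
    d = 0.0
    while node in parent:
        node, length = parent[node]
        d += length
    return d

def ancestors(node):
    """`node` followed by its ancestors up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node][0]
        chain.append(node)
    return chain

def lca(a, b):
    """Lowest common ancestor of two nodes."""
    seen = set(ancestors(a))
    for n in ancestors(b):
        if n in seen:
            return n
    raise ValueError("nodes are not in the same tree")

def path_distance(a, b):
    return depth(a) + depth(b) - 2 * depth(lca(a, b))

def lca_depth(a, b):
    return depth(lca(a, b))

targets = ["t1", "t2"]
nearest_by_path = min(targets, key=lambda t: path_distance("s", t))
deepest_lca = max(targets, key=lambda t: lca_depth("s", t))
```

Here `t2` is closer by path (3 versus 6), but `t1` shares the deeper ancestor `A` with `s` (LCA depth 1 versus 0), which agrees with (d_src + d_tgt - path)/2 for both targets.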

Comment on lines +83 to +86
if "tree_distances" in tdata.obsp:
    D = tdata.obsp["tree_distances"]
    if isinstance(D, np.ndarray):
        precomputed = D

P1: Validate cached tree_distances before reusing

The all-pairs path reuses tdata.obsp['tree_distances'] whenever it exists as a dense array, but it never verifies that this cache was computed with the same metric, depth_key, or tree selection. If users previously ran tree_distance with different parameters, this function silently consumes stale distances and returns incorrect linkage values for mean/max/custom aggregates.
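
One possible shape for such a check is sketched below. It assumes the parameters used to build the cache are recorded next to it under a metadata key; the key name `tree_distances_params`, the helper `get_precomputed`, and the `tree_keys` argument are all hypothetical, and nothing in this PR indicates that `tree_distance` currently writes such metadata. Plain dicts stand in for `tdata.obsp` and `tdata.uns`.

```python
import numpy as np

def get_precomputed(obsp, uns, *, metric, depth_key, tree_keys):
    """Return the cached dense matrix only when its recorded parameters match."""
    D = obsp.get("tree_distances")
    if not isinstance(D, np.ndarray):
        return None
    # Hypothetical metadata key; assumed to be written when the cache is built.
    recorded = uns.get("tree_distances_params")
    requested = {
        "metric": metric,
        "depth_key": depth_key,
        "tree_keys": tuple(tree_keys),
    }
    if recorded != requested:
        return None  # stale or unverifiable cache: caller should recompute
    return D

# Stand-ins for tdata.obsp and tdata.uns.
obsp = {"tree_distances": np.zeros((2, 2))}
uns = {"tree_distances_params": {"metric": "path", "depth_key": "depth", "tree_keys": ("tree",)}}

hit = get_precomputed(obsp, uns, metric="path", depth_key="depth", tree_keys=["tree"])
miss = get_precomputed(obsp, uns, metric="lca", depth_key="depth", tree_keys=["tree"])
```

Treating missing metadata as a cache miss costs a recomputation but avoids silently reusing distances of unknown provenance.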

Comment on lines +192 to +193
else:  # min
    sym = np.minimum(arr, arr_T)

P2: Raise on invalid symmetrize mode

Unknown symmetrize values currently fall into the else branch and are treated as 'min', so a typo (for example, 'meen') silently changes analysis output instead of failing fast. This makes results hard to trust because invalid user input produces a valid-looking but incorrect matrix.
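
A minimal sketch of the fail-fast variant follows. The set of accepted modes (`"mean"`, `"max"`, `"min"`) is an assumption inferred from the quoted branch and the typo example; the function name `symmetrize_matrix` is hypothetical and the PR's actual options may differ.

```python
import numpy as np

VALID_MODES = ("mean", "max", "min")  # assumed set of modes, not confirmed by the PR

def symmetrize_matrix(arr, mode):
    """Symmetrize a square matrix, rejecting unknown modes instead of defaulting."""
    arr = np.asarray(arr, dtype=float)
    arr_T = arr.T
    if mode == "mean":
        return (arr + arr_T) / 2
    if mode == "max":
        return np.maximum(arr, arr_T)
    if mode == "min":
        return np.minimum(arr, arr_T)
    raise ValueError(f"symmetrize must be one of {VALID_MODES}, got {mode!r}")

arr = np.array([[0.0, 1.0], [3.0, 0.0]])
sym_min = symmetrize_matrix(arr, "min")
sym_mean = symmetrize_matrix(arr, "mean")
```

With an explicit `min` branch and a trailing `raise`, a typo such as `'meen'` produces a `ValueError` rather than a matrix.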

…d permutation test

Adds alternative='two-sided' to support two-tailed p-values. Default None
preserves existing one-sided behavior (more-related direction).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
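
To illustrate the difference between the two settings, here is a rough sketch of one-sided and two-sided permutation p-values. It rests on several assumptions that the commit message does not confirm: the statistic is a path distance (so smaller means more related), the two-sided test measures deviation from the mean of the null distribution, and the p-value uses the common (1 + count) / (1 + n) convention. The function name `permutation_p_value` is hypothetical.

```python
import numpy as np

def permutation_p_value(observed, null, alternative=None):
    """Permutation p-value; `alternative=None` is one-sided (lower tail)."""
    null = np.asarray(null, dtype=float)
    if alternative is None:
        # One-sided: how often is a permuted value at least as small as observed?
        extreme = np.sum(null <= observed)
    elif alternative == "two-sided":
        # Two-sided: how often does a permuted value deviate from the null mean
        # at least as much as the observed value does?
        center = null.mean()
        extreme = np.sum(np.abs(null - center) >= abs(observed - center))
    else:
        raise ValueError(f"alternative must be None or 'two-sided', got {alternative!r}")
    return (1 + extreme) / (1 + null.size)

null = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up permuted statistics
p_one_sided = permutation_p_value(1.0, null)
p_two_sided = permutation_p_value(1.0, null, alternative="two-sided")
```

For an LCA-depth statistic the one-sided comparison would presumably flip to the upper tail, since a larger depth means more related.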