Add structured metadata filtering to CZI dataset search

Follow-up to #4 — thanks @hansen7 for the duplicate-output fix (055305e). The parsing error is resolved, but the underlying relevance problem persists: when the agent calls `search_czi_datasets({"query": "lung, Mus musculus", "n_datasets": 5})`, the top results are embryo and skin datasets (similarity ~0.77) rather than lung datasets. The root cause is twofold. First, the function's docstring says "input is a string containing: tissue, condition, and organism," which tells the agent to pack everything into the `query` parameter — so the `organism` and `tissue` filter parameters that already exist in the signature are never actually used. Second, even when those filters are passed, they are silently skipped if the filtered set has fewer than `n_datasets` rows, with no warning or relaxation strategy, so the caller has no idea filtering was dropped.

**Proposed fix:** (1) Update the docstring to explicitly instruct the agent to pass `organism` and `tissue` as separate parameters whenever the query contains those constraints, so the existing filter logic actually gets invoked. (2) When strict filtering returns fewer than `n_datasets` rows, apply controlled relaxation (e.g., drop the `tissue` filter but keep `organism`, then fall back to unfiltered) and include a warning in the output so the agent knows the results are broader than requested. This keeps the current embedding-ranking approach intact while ensuring structured metadata is applied as a hard filter first. Happy to open a PR for this if it sounds reasonable.
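To make step (2) concrete, here is a rough sketch of the relaxation order. It is not the current implementation: the helper name `search_with_relaxation`, the row layout (dicts with `organism`, `tissue`, and a precomputed `similarity` score), and the warning wording are all assumptions for illustration. The real function ranks by embedding similarity, which is stubbed out here as a plain sort.

```python
def search_with_relaxation(rows, n_datasets, organism=None, tissue=None):
    """Hypothetical sketch: hard-filter on metadata, relax in stages, report it.

    `rows` is assumed to be a list of dicts with 'organism', 'tissue' and a
    precomputed 'similarity' score. Returns (top_rows, warning_or_None).
    """
    def matches(row, org, tis):
        # Case-insensitive exact match on whichever filters are set.
        if org is not None and row["organism"].lower() != org.lower():
            return False
        if tis is not None and row["tissue"].lower() != tis.lower():
            return False
        return True

    # Stage 1: strict filters, no warning.
    stages = [(organism, tissue, None)]
    # Stage 2: drop tissue but keep organism (only when both were given).
    if organism is not None and tissue is not None:
        stages.append((organism, None,
                       f"fewer than {n_datasets} datasets matched tissue={tissue!r}; "
                       f"tissue filter dropped, organism={organism!r} kept"))
    # Stage 3: unfiltered fallback (only when some filter was given).
    if organism is not None or tissue is not None:
        stages.append((None, None,
                       f"fewer than {n_datasets} datasets matched the filters; "
                       "all filters dropped, results are unfiltered"))

    for i, (org, tis, warning) in enumerate(stages):
        candidates = [r for r in rows if matches(r, org, tis)]
        is_last = i == len(stages) - 1
        if len(candidates) >= n_datasets or is_last:
            # Stand-in for the existing embedding ranking.
            ranked = sorted(candidates, key=lambda r: r["similarity"], reverse=True)
            return ranked[:n_datasets], warning
```

With this shape, the agent-side call from the report would become something like `search_czi_datasets({"query": "lung", "organism": "Mus musculus", "tissue": "lung", "n_datasets": 5})`, and any relaxation shows up in the returned warning instead of happening silently. One open design question: when the tissue filter is dropped, strict matches are re-ranked together with the broader set, so it may be worth pinning them to the top.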

Add structured metadata filtering to CZI dataset search #5
