Skip to content

Support optionally_keyed_by("foo", dict, use_msgspec=True)#918

Open
ahal wants to merge 4 commits intotaskcluster:mainfrom
ahal:ahal/push-wqqyrtqxppyt
Open

Support optionally_keyed_by("foo", dict, use_msgspec=True)#918
ahal wants to merge 4 commits intotaskcluster:mainfrom
ahal:ahal/push-wqqyrtqxppyt

Conversation

@ahal
Copy link
Collaborator

@ahal ahal commented Mar 10, 2026

This works around msgspec's limitation where unions can't contain more than one dict.

ahal added 3 commits March 10, 2026 14:14
This tests a broader scope of code and will allow us to catch
regressions in refactors that change the bigger picture.
This is currently an expected fail due to msgspec limitations.
This value isn't used anywhere, we just want to make sure msgspec
doesn't raise an exception. Simplify things a bit.
@ahal ahal self-assigned this Mar 10, 2026
@ahal ahal requested a review from a team as a code owner March 10, 2026 19:25
@ahal ahal requested a review from jcristau March 10, 2026 19:25
@ahal ahal force-pushed the ahal/push-wqqyrtqxppyt branch 2 times, most recently from 8fea0a3 to 1226ac0 Compare March 10, 2026 20:47
@ahal ahal force-pushed the ahal/push-wqqyrtqxppyt branch from 1226ac0 to 6399170 Compare March 10, 2026 20:55
Copy link
Contributor

@jcristau jcristau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too meta for my brain I'm afraid I don't understand the problem or fix :(

Is there a msgspec issue about this, or is this something that can't/won't be fixed there?

@ahal
Copy link
Collaborator Author

ahal commented Mar 11, 2026

From: https://jcristharif.com/msgspec/supported-types.html#union-optional

Type unions are supported, with a few restrictions. These restrictions are in place to remove any ambiguity during decoding - given an encoded value there must always be a single type in a given typing.Union that can decode that value.

  • Unions may contain at most one type that encodes to an object (dict, ...)

So yeah, probably won't be fixed in msgspec.

Problem

So initially our plan was to convert optionally_keyed_by("foo", <type>) into this type expression:

Union[dict[Literal("by-foo"), dict[str, <type>]], <type>]

Which worked great until we ran into a schema that used optionally_keyed_by("foo", dict[str, str]). This resulted in a union with multiple dicts and msgspec raised an error.

Solution

I'm not surprised this patch breaks your brain, as Abhishek and I have spent literal days trying to solve it. At a high level, the idea is that instead of validating the above type expression all at once, we break it down into two parts. First we validate the outer by-foo dict (if it exists). Then we validate the inner <type> after the fact.

This way we can avoid the scenario that resulted in Union[dict. dict]. Had I known about this limitation from msgspec from the outset, I might have pushed for using something else.. but in the end I don't think this patch is too bad.

Lmk if that makes sense or if there's any parts you want better comments.

Comment on lines +293 to +297
with pytest.raises(msgspec.ValidationError):
TestSchema.validate({"field": {"by-bar": "a"}})

# Inner dict values are Any, so mixed types are accepted
result = TestSchema.validate({"field": {"by-foo": {"a": 1, "c": "d"}}})
assert result.field == {"by-foo": {"a": 1, "c": "d"}}
with pytest.raises(msgspec.ValidationError):
TestSchema.validate({"field": {"by-bar": {1: "b"}}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be changing what's being tested, and unrelated to the commit message?

if use_msgspec:
# msgspec implementation - return type hints
# msgspec implementation - use Annotated[Any, OptionallyKeyedBy]
_type = arguments[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by-nit: might want to make this not raise IndexError if there's no arguments

Comment on lines +353 to +361
# First validate the outer keyed-by dict
bykeys = UnionTypes(*[Literal[f"by-{field}"] for field in keyed_by.fields])
msgspec.convert(obj, dict[bykeys, dict[str, Any]])

# Next validate each inner value against the wrapped type
for pattern, value in keyed_by_dict.items():
msgspec.convert(pattern, str)
msgspec.convert(value, keyed_by.wrapped_type)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions here:

  • why do we force the keys to be str?
  • this doesn't seem to handle things like optionally_keyed_by("foo", "bar", str) and nested "by-foo" + "by-bar"? By contrast the voluptuous implementation's validator function does recurse.

TestSchema.validate({"field": {"by-bar": {"a": "b"}}})


def test_optionally_keyed_by_mulitple_keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "multiple"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants