feat: Add FFI_TableProviderFactory support #1396
This wraps the new FFI_TableProviderFactory APIs in datafusion-ffi.
Pull request overview
Adds Python/Rust bindings for DataFusion’s TableProviderFactory via the FFI_TableProviderFactory, enabling custom CREATE EXTERNAL TABLE ... STORED AS <format> behavior from Python.
Changes:
- Add `PySessionContext::register_table_factory` Rust API to register `TableProviderFactory` implementations via FFI capsules.
- Add Python-facing `SessionContext.register_table_factory()` wrapper and `TableProviderFactoryExportable` protocol type hint.
- Extend the `datafusion-ffi-example` with a table provider factory implementation and a corresponding example test.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/context.rs | Adds Rust binding to register an FFI-backed TableProviderFactory into SessionState table factories. |
| python/datafusion/context.py | Exposes SessionContext.register_table_factory() and documents the new API. |
| python/datafusion/catalog.py | Introduces TableProviderFactoryExportable protocol for typing/FFI export contract. |
| examples/datafusion-ffi-example/src/table_provider_factory.rs | Implements an example TableProviderFactory and exports it via PyCapsule. |
| examples/datafusion-ffi-example/src/lib.rs | Registers the new example class with the example Python module. |
| examples/datafusion-ffi-example/python/tests/_test_table_provider_factory.py | Adds an example-level test exercising CREATE EXTERNAL TABLE with a registered factory. |
    def register_table_factory(
        self, format: str, factory: TableProviderFactoryExportable
    ) -> None:
        """Register a :py:class:`~datafusion.TableProviderFactoryExportable`.

        The registered factory can be reference from SQL DDL statements executed
        against this context.

        Args:
            format: The value to be used in `STORED AS ${format}` clause.
            factory: A PyCapsule that implements TableProviderFactoryExportable"
        """
        self.ctx.register_table_factory(format, factory)
No core test coverage is added for SessionContext.register_table_factory. The new test is under examples/..., but the main pytest testpaths (pyproject.toml) only include python/tests and python/datafusion, so this behavior likely won’t run in CI. Consider adding a unit/integration test under python/tests that registers a factory and exercises CREATE EXTERNAL TABLE end-to-end.
    let capsule = factory
        .getattr("__datafusion_table_provider_factory__")?
        .call1((codec_capsule,))?;
    let capsule = capsule.cast::<PyCapsule>().map_err(py_datafusion_err)?;
register_table_factory always calls factory.__datafusion_table_provider_factory__ and therefore fails if the caller passes an already-exported PyCapsule (unlike register_catalog_provider_list / register_catalog_provider, which accept either an exportable object or a capsule). Consider mirroring the existing pattern: if the object has the export method, call it with the codec capsule; otherwise, try casting directly to PyCapsule and validate it.
    - let capsule = factory
    -     .getattr("__datafusion_table_provider_factory__")?
    -     .call1((codec_capsule,))?;
    - let capsule = capsule.cast::<PyCapsule>().map_err(py_datafusion_err)?;
    + // Support both exportable factory objects and already-exported PyCapsules,
    + // mirroring the pattern used by catalog registration functions.
    + let capsule = if factory.hasattr("__datafusion_table_provider_factory__")? {
    +     let exporter = factory.getattr("__datafusion_table_provider_factory__")?;
    +     let capsule_obj = exporter.call1((codec_capsule,))?;
    +     capsule_obj.cast::<PyCapsule>().map_err(py_datafusion_err)?
    + } else {
    +     factory.cast::<PyCapsule>().map_err(py_datafusion_err)?
    + };
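The dispatch in the suggestion above can also be read as a small Python sketch. This is only an illustration of the "exportable object or ready-made capsule" pattern, not the binding's code: `FakeCapsule`, `ExportableFactory`, `resolve_capsule`, and the capsule name string are hypothetical stand-ins, since real PyCapsules can only be created from native code.

```python
from typing import Any

# Name of the export hook that the Rust binding looks up on the factory object.
EXPORT_METHOD = "__datafusion_table_provider_factory__"


class FakeCapsule:
    """Hypothetical stand-in for a PyCapsule (assumption, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


class ExportableFactory:
    """Hypothetical factory object that exposes the export hook."""

    def __datafusion_table_provider_factory__(self, codec: Any) -> FakeCapsule:
        # The capsule name used here is an assumption, not taken from the PR.
        return FakeCapsule("datafusion_table_provider_factory")


def resolve_capsule(factory: Any, codec: Any = None) -> FakeCapsule:
    """Accept either an exportable object or an already-exported capsule."""
    if hasattr(factory, EXPORT_METHOD):
        # Exportable object: ask it to produce the capsule.
        candidate = getattr(factory, EXPORT_METHOD)(codec)
    else:
        # Otherwise assume the caller passed the capsule itself.
        candidate = factory
    if not isinstance(candidate, FakeCapsule):
        raise TypeError(f"expected a capsule, got {type(candidate).__name__}")
    return candidate
```

Both call shapes then resolve to a capsule, and anything else raises `TypeError`, which corresponds to the failed `cast::<PyCapsule>()` in the Rust version.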
        The registered factory can be reference from SQL DDL statements executed
        against this context.

        Args:
            format: The value to be used in `STORED AS ${format}` clause.
            factory: A PyCapsule that implements TableProviderFactoryExportable"
Docstring issues: “can be reference” should be “can be referenced”, and the factory arg description is inaccurate (the method takes an object implementing TableProviderFactoryExportable, not a PyCapsule) and includes a stray trailing quote. Please update the docstring to match the actual expected input and fix the typo/quote.
    - The registered factory can be reference from SQL DDL statements executed
    - against this context.
    - Args:
    -     format: The value to be used in `STORED AS ${format}` clause.
    -     factory: A PyCapsule that implements TableProviderFactoryExportable"
    + The registered factory can be referenced from SQL DDL statements executed
    + against this context.
    + Args:
    +     format: The value to be used in `STORED AS ${format}` clause.
    +     factory: An object implementing :class:`TableProviderFactoryExportable`.
    foo
    STORED AS my_format
    LOCATION '';
    """)
CREATE EXTERNAL TABLE is executed via a DataFrame returned from ctx.sql(...); in this codebase DDL statements require .collect() to actually apply side effects (see e.g. python/tests/test_expr.py where CREATE TABLE ... is followed by .collect()). Without collecting here, the external table may never be created before the subsequent SELECT runs.
    - """)
    + """).collect()
    import pyarrow as pa
    import pytest
Unused imports pyarrow as pa and pytest will be flagged by the repo's Ruff configuration for examples/* (no per-file ignore for F401). Please remove them or use them in the test.
    - import pyarrow as pa
    - import pytest
Which issue does this PR close?
Closes #1393
Rationale for this change
This PR wraps the new FFI_TableProviderFactory to support custom `CREATE EXTERNAL TABLE` statements.
What changes are included in this PR?
Mostly just the new wrappers and additions to the FFI example. There are also two temporary commits that switch to the latest version of datafusion and make the updates needed for that to work.
Are there any user-facing changes?
There's now a `TableProviderFactoryExportable` protocol for use in the new `Context::register_table_factory` API.
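A minimal sketch of what such a protocol could look like on the Python side, assuming it is a `typing.Protocol` with a single export hook. Only the hook name `__datafusion_table_provider_factory__` comes from this PR; the parameter name `session`, the `object` return annotation, the `runtime_checkable` decorator, and the `MyFactory` and `is_exportable` helpers are assumptions made for illustration.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TableProviderFactoryExportable(Protocol):
    """Sketch of the export contract; the real definition may differ."""

    def __datafusion_table_provider_factory__(self, session: Any) -> object:
        """Return a PyCapsule wrapping an FFI_TableProviderFactory."""
        ...


class MyFactory:
    """Hypothetical implementer; a real one would be backed by native code."""

    def __datafusion_table_provider_factory__(self, session: Any) -> object:
        # Placeholder return value; a real factory returns a PyCapsule here.
        return object()


def is_exportable(obj: Any) -> bool:
    """Structural check: does obj provide the export hook?"""
    return isinstance(obj, TableProviderFactoryExportable)
```

Because the check is structural, `MyFactory` satisfies the protocol without inheriting from it, which is what lets factories defined in separate extension modules be passed to the registration call.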