feat: Add FFI_TableProviderFactory support #1396

Open
davisp wants to merge 1 commit into apache:main from davisp:feat/ffi-table-provider-factory

Conversation

@davisp
Member

davisp commented Feb 25, 2026

Which issue does this PR close?

Closes #1393

Rationale for this change

This PR wraps DataFusion's new FFI_TableProviderFactory so that custom CREATE EXTERNAL TABLE statements can be handled by a factory registered on a SessionContext.

What changes are included in this PR?

Mostly the new wrappers plus additions to the FFI example, though there are also two temporary commits that move to the latest version of datafusion and make the updates needed for that to work.

Are there any user-facing changes?

There is now a TableProviderFactoryExportable protocol for use with the new SessionContext.register_table_factory API.
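For readers new to the protocol, here is a minimal sketch of what it might look like with typing.Protocol. It assumes, based only on the Rust binding reviewed below (which calls the method with a single codec capsule argument and expects a PyCapsule back), that the whole contract is one `__datafusion_table_provider_factory__(codec)` method. The real definition lives in python/datafusion/catalog.py and may differ; DummyFactory is hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TableProviderFactoryExportable(Protocol):
    """Sketch of the export contract (assumed, not the library's definition)."""

    def __datafusion_table_provider_factory__(self, codec: Any) -> Any:
        """Return a PyCapsule wrapping an FFI_TableProviderFactory.

        ``codec`` is the capsule the Rust binding passes in
        (``codec_capsule`` in the code under review).
        """
        ...


class DummyFactory:
    # Hypothetical implementer: a real one would build the capsule in Rust
    # and return it; this stand-in just echoes the codec for illustration.
    def __datafusion_table_provider_factory__(self, codec: Any) -> Any:
        return ("capsule-for", codec)
```

Because the protocol is runtime_checkable, isinstance() only checks that the method exists; it does not validate what the method returns.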

davisp force-pushed the feat/ffi-table-provider-factory branch 2 times, most recently from 0b226d0 to 97f49af on February 25, 2026 19:28
davisp force-pushed the feat/ffi-table-provider-factory branch 4 times, most recently from f6a87ae to 0fe60b4 on March 10, 2026 18:47
This wraps the new FFI_TableProviderFactory APIs in datafusion-ffi.
davisp force-pushed the feat/ffi-table-provider-factory branch from 0fe60b4 to 9e77535 on March 10, 2026 18:57
davisp marked this pull request as ready for review March 10, 2026 19:28
timsaucer requested review from Copilot and timsaucer March 11, 2026 05:49
Contributor

Copilot AI left a comment

Pull request overview

Adds Python/Rust bindings for DataFusion’s TableProviderFactory via the FFI_TableProviderFactory, enabling custom CREATE EXTERNAL TABLE ... STORED AS <format> behavior from Python.

Changes:

  • Add PySessionContext::register_table_factory Rust API to register TableProviderFactory implementations via FFI capsules.
  • Add Python-facing SessionContext.register_table_factory() wrapper and TableProviderFactoryExportable protocol type hint.
  • Extend the datafusion-ffi-example with a table provider factory implementation and a corresponding example test.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/context.rs: Adds Rust binding to register an FFI-backed TableProviderFactory into SessionState table factories.
  • python/datafusion/context.py: Exposes SessionContext.register_table_factory() and documents the new API.
  • python/datafusion/catalog.py: Introduces TableProviderFactoryExportable protocol for typing/FFI export contract.
  • examples/datafusion-ffi-example/src/table_provider_factory.rs: Implements an example TableProviderFactory and exports it via PyCapsule.
  • examples/datafusion-ffi-example/src/lib.rs: Registers the new example class with the example Python module.
  • examples/datafusion-ffi-example/python/tests/_test_table_provider_factory.py: Adds an example-level test exercising CREATE EXTERNAL TABLE with a registered factory.


Comment on lines +834 to +846
def register_table_factory(
    self, format: str, factory: TableProviderFactoryExportable
) -> None:
    """Register a :py:class:`~datafusion.TableProviderFactoryExportable`.

    The registered factory can be reference from SQL DDL statements executed
    against this context.

    Args:
        format: The value to be used in `STORED AS ${format}` clause.
        factory: A PyCapsule that implements TableProviderFactoryExportable"
    """
    self.ctx.register_table_factory(format, factory)

Copilot AI Mar 11, 2026

No core test coverage is added for SessionContext.register_table_factory. The new test is under examples/..., but the main pytest testpaths (pyproject.toml) only include python/tests and python/datafusion, so this behavior likely won’t run in CI. Consider adding a unit/integration test under python/tests that registers a factory and exercises CREATE EXTERNAL TABLE end-to-end.

Copilot uses AI. Check for mistakes.
Comment on lines +671 to +674
let capsule = factory
    .getattr("__datafusion_table_provider_factory__")?
    .call1((codec_capsule,))?;
let capsule = capsule.cast::<PyCapsule>().map_err(py_datafusion_err)?;

Copilot AI Mar 11, 2026

register_table_factory always calls factory.__datafusion_table_provider_factory__ and therefore fails if the caller passes an already-exported PyCapsule (unlike register_catalog_provider_list / register_catalog_provider, which accept either an exportable object or a capsule). Consider mirroring the existing pattern: if the object has the export method, call it with the codec capsule; otherwise, try casting directly to PyCapsule and validate it.

Suggested change
let capsule = factory
    .getattr("__datafusion_table_provider_factory__")?
    .call1((codec_capsule,))?;
let capsule = capsule.cast::<PyCapsule>().map_err(py_datafusion_err)?;
// Support both exportable factory objects and already-exported PyCapsules,
// mirroring the pattern used by catalog registration functions.
let capsule_obj = if factory.hasattr("__datafusion_table_provider_factory__")? {
    factory
        .getattr("__datafusion_table_provider_factory__")?
        .call1((codec_capsule,))?
} else {
    factory.clone()
};
// cast() borrows from the object, so bind it outside the if/else first.
let capsule = capsule_obj.cast::<PyCapsule>().map_err(py_datafusion_err)?;
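The either/or dispatch suggested here can be mirrored in pure Python to make the control flow concrete. This is a sketch only: resolve_factory_capsule is a hypothetical helper, not part of datafusion-python, and it detects capsules by CPython's "PyCapsule" type name because types.CapsuleType only exists from Python 3.13.

```python
from typing import Any

_EXPORT_METHOD = "__datafusion_table_provider_factory__"


def _is_capsule(obj: Any) -> bool:
    # CPython's capsule type is named "PyCapsule"; comparing the name avoids
    # depending on types.CapsuleType, which was added in Python 3.13.
    return type(obj).__name__ == "PyCapsule"


def resolve_factory_capsule(factory: Any, codec_capsule: Any) -> Any:
    """Return the capsule to register (hypothetical mirror of the Rust logic)."""
    if hasattr(factory, _EXPORT_METHOD):
        # Exportable object: ask it to export itself, passing the codec.
        exported = getattr(factory, _EXPORT_METHOD)(codec_capsule)
    else:
        # Otherwise assume the caller already passed an exported capsule.
        exported = factory
    # Validate in both branches, like the Rust code casting to PyCapsule.
    if not _is_capsule(exported):
        raise TypeError(
            f"expected a PyCapsule or an object with {_EXPORT_METHOD}(), "
            f"got {type(exported).__name__}"
        )
    return exported
```

Validating after the branch, rather than only in the capsule branch, also catches an export method that returns something other than a capsule.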

Comment on lines +839 to +844
The registered factory can be reference from SQL DDL statements executed
against this context.

Args:
    format: The value to be used in `STORED AS ${format}` clause.
    factory: A PyCapsule that implements TableProviderFactoryExportable"

Copilot AI Mar 11, 2026

Docstring issues: “can be reference” should be “can be referenced”, and the factory arg description is inaccurate (the method takes an object implementing TableProviderFactoryExportable, not a PyCapsule) and includes a stray trailing quote. Please update the docstring to match the actual expected input and fix the typo/quote.

Suggested change
The registered factory can be reference from SQL DDL statements executed
against this context.

Args:
    format: The value to be used in `STORED AS ${format}` clause.
    factory: A PyCapsule that implements TableProviderFactoryExportable"

The registered factory can be referenced from SQL DDL statements executed
against this context.

Args:
    format: The value to be used in `STORED AS ${format}` clause.
    factory: An object implementing :class:`TableProviderFactoryExportable`.

foo
STORED AS my_format
LOCATION '';
""")

Copilot AI Mar 11, 2026

CREATE EXTERNAL TABLE is executed via a DataFrame returned from ctx.sql(...); in this codebase DDL statements require .collect() to actually apply side effects (see e.g. python/tests/test_expr.py where CREATE TABLE ... is followed by .collect()). Without collecting here, the external table may never be created before the subsequent SELECT runs.

Suggested change
""")
""").collect()

Comment on lines +20 to +21
import pyarrow as pa
import pytest

Copilot AI Mar 11, 2026

Unused imports pyarrow as pa and pytest will be flagged by the repo's Ruff configuration for examples/* (no per-file ignore for F401). Please remove them or use them in the test.

Suggested change
import pyarrow as pa
import pytest


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for FFI_TableProviderFactory

2 participants