Merged
5 changes: 3 additions & 2 deletions README.md
@@ -97,14 +97,15 @@ from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'patients.parquet'")

+# Same plan — backend chosen at execution time
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="PatientID", new_name="patient_id")
     .rows_filter(Col("age") >= 18)
     .math_round(column="score", decimals=2)
 )

-result, protocol = plan.process(rel)
+result, protocol = plan.process(rel, backend=DuckDBBackend(con))
 ```

 ## Available Operations
33 changes: 17 additions & 16 deletions docs/api/backends.md
@@ -9,23 +9,22 @@ The backend determines how data is stored and transformed:
 - **PolarsBackend** (default): Operates on Polars DataFrames using native Polars expressions
 - **DuckDBBackend** (optional): Operates on DuckDB relations using SQL generation

-All operations, validation, dry-run, and serialization work identically regardless of backend. Pipelines serialized with one backend can be loaded and executed with another.
+A `TransformPlan` is a pure, backend-agnostic recipe of operations. The backend is chosen at execution time by passing it to `process()`, `validate()`, or `dry_run()`. If no backend is specified, `PolarsBackend` is used by default. Pipelines serialized with one backend can be loaded and executed with another.

 ```python
 from transformplan import TransformPlan

-# Default — uses PolarsBackend
-plan = TransformPlan()
+# Build a plan — no backend needed
+plan = TransformPlan().col_drop("temp").math_add("age", 1)

-# Explicit Polars backend
-from transformplan.backends.polars import PolarsBackend
-plan = TransformPlan(backend=PolarsBackend())
+# Execute with default PolarsBackend
+result, protocol = plan.process(polars_df)

-# DuckDB backend
+# Execute with DuckDB backend
 import duckdb
 from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
-plan = TransformPlan(backend=DuckDBBackend(con))
+result, protocol = plan.process(duckdb_rel, backend=DuckDBBackend(con))
 ```

 ## Backend ABC
@@ -96,38 +95,40 @@ con = duckdb.connect()
rel = con.sql("SELECT * FROM 'data.parquet'")

plan = (
TransformPlan(backend=DuckDBBackend(con))
TransformPlan()
.col_rename(column="ID", new_name="id")
.rows_filter(Col("age") >= 18)
.math_standardize(column="score", new_column="z_score")
)

result, protocol = plan.process(rel)
result, protocol = plan.process(rel, backend=DuckDBBackend(con))
```

## Cross-Backend Serialization

Pipelines are backend-agnostic when serialized. You can build a pipeline with one backend and execute it with another:
Pipelines are inherently backend-agnostic. The same serialized plan can be executed with any backend:

```python
import polars as pl
import duckdb
from transformplan import TransformPlan, Col
from transformplan.backends.duckdb import DuckDBBackend

# Build and serialize with Polars (default)
# Build and serialize
plan = (
TransformPlan()
.col_rename(column="ID", new_name="id")
.rows_filter(Col("age") >= 18)
)
plan.to_json("pipeline.json")

# Load and execute with DuckDB
# Load and execute with Polars (default)
restored = TransformPlan.from_json("pipeline.json")
result, protocol = restored.process(polars_df)

# Or execute with DuckDB
con = duckdb.connect()
rel = con.sql("SELECT * FROM 'data.parquet'")
plan_duckdb = TransformPlan.from_json("pipeline.json", backend=DuckDBBackend(con))
result, protocol = plan_duckdb.process(rel)
result, protocol = restored.process(rel, backend=DuckDBBackend(con))
```

## Type System
14 changes: 9 additions & 5 deletions docs/api/plan.md
@@ -4,7 +4,7 @@ The main class for building and executing transformation pipelines.

 ## Overview

-`TransformPlan` uses a deferred execution model: operations are registered via method chaining, then executed together when you call `process()`, `validate()`, or `dry_run()`. An optional `backend` parameter selects the execution engine (defaults to `PolarsBackend`).
+`TransformPlan` uses a deferred execution model: operations are registered via method chaining, then executed together when you call `process()`, `validate()`, or `dry_run()`. The plan itself is backend-agnostic — the backend is chosen at execution time (defaults to `PolarsBackend`).

 ```python
 from transformplan import TransformPlan, Col
@@ -22,15 +22,19 @@ df_result, protocol = plan.process(df)

## Backend Selection

The backend is passed at execution time, not at construction:

```python
from transformplan.backends.duckdb import DuckDBBackend

plan = TransformPlan().col_drop("temp").math_add("age", 1)

# Default (Polars)
plan = TransformPlan()
result, protocol = plan.process(polars_df)

# DuckDB
import duckdb
from transformplan.backends.duckdb import DuckDBBackend
con = duckdb.connect()
plan = TransformPlan(backend=DuckDBBackend(con))
result, protocol = plan.process(duckdb_rel, backend=DuckDBBackend(con))
```

See [Backends](backends.md) for details on each backend.
6 changes: 3 additions & 3 deletions docs/api/validation.md
@@ -69,7 +69,7 @@ if not result.is_valid:

 ## DuckDB Validation

-Validation works identically with DuckDB relations:
+Validation works identically with DuckDB relations — pass the backend at validation time:

 ```python
 import duckdb
@@ -80,12 +80,12 @@ con = duckdb.connect()
 rel = con.sql("SELECT 'Alice' AS name, 25 AS age, 50000 AS salary")

 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_drop("age")
     .rows_filter(Col("age") > 18)  # Error: age was dropped!
 )

-result = plan.validate(rel)
+result = plan.validate(rel, backend=DuckDBBackend(con))
 # ValidationResult(valid=False, errors=1)
 ```

12 changes: 7 additions & 5 deletions docs/getting-started/quickstart.md
@@ -74,7 +74,7 @@ print(df_result)

 ## Using the DuckDB Backend

-TransformPlan supports DuckDB as an alternative backend. All 88 operations, validation, and dry-run work identically — only the data type changes from Polars DataFrames to DuckDB relations.
+TransformPlan supports DuckDB as an alternative backend. All 88 operations, validation, and dry-run work identically — the same plan works with both Polars DataFrames and DuckDB relations. Simply pass the backend at execution time:

 ```python
 import duckdb
@@ -89,18 +89,20 @@ rel = con.sql("""
     UNION ALL SELECT 'Diana', 'Sales', 70000, 2
 """)

+# Same plan as before — no backend in constructor
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="name", new_name="employee")
     .math_multiply(column="salary", value=1.05)
     .math_round(column="salary", decimals=0)
     .rows_filter(Col("years") >= 3)
 )

-# Validate and execute — same API as Polars
-result = plan.validate(rel)
+# Pass backend at execution time
+backend = DuckDBBackend(con)
+result = plan.validate(rel, backend=backend)
 if result.is_valid:
-    df_result, protocol = plan.process(rel)
+    df_result, protocol = plan.process(rel, backend=backend)
 ```

 ## Viewing the Audit Protocol
5 changes: 3 additions & 2 deletions docs/index.md
@@ -96,14 +96,15 @@ from transformplan.backends.duckdb import DuckDBBackend
 con = duckdb.connect()
 rel = con.sql("SELECT * FROM 'patients.parquet'")

+# Same plan — backend chosen at execution time
 plan = (
-    TransformPlan(backend=DuckDBBackend(con))
+    TransformPlan()
     .col_rename(column="PatientID", new_name="patient_id")
     .rows_filter(Col("age") >= 18)
     .math_round(column="score", decimals=2)
 )

-result, protocol = plan.process(rel)
+result, protocol = plan.process(rel, backend=DuckDBBackend(con))
 ```

 ## Available Operations
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "transformplan"
-version = "0.1.2"
+version = "0.1.3"
 description = "Safe, reproducible data transformations with built-in auditing and validation"
 readme = "README.md"
 requires-python = ">=3.10"
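
The documentation changes above all describe one pattern: the plan records operations only, and the backend is supplied when the plan is executed. The following is a minimal, self-contained sketch of that pattern, not transformplan's implementation. Every name in it (`Plan`, `Backend`, `DictRowsBackend`, `ColumnarBackend`, `rows_min`) is a hypothetical stand-in chosen for illustration, and plain Python containers take the place of Polars DataFrames and DuckDB relations.

```python
"""Execution-time backend selection, sketched with hypothetical classes."""
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from typing import Any


class Backend(ABC):
    """Applies one named operation to backend-specific data."""

    @abstractmethod
    def apply(self, data: Any, op: str, args: dict[str, Any]) -> Any: ...


class DictRowsBackend(Backend):
    """Stand-in default backend: data is a list of dict rows."""

    def apply(self, data, op, args):
        if op == "col_rename":
            old, new = args["column"], args["new_name"]
            return [{(new if k == old else k): v for k, v in row.items()} for row in data]
        if op == "rows_min":
            return [row for row in data if row[args["column"]] >= args["value"]]
        raise ValueError(f"unknown operation: {op}")


class ColumnarBackend(Backend):
    """Stand-in alternative backend: data is a dict of column lists."""

    def apply(self, data, op, args):
        if op == "col_rename":
            old, new = args["column"], args["new_name"]
            return {(new if k == old else k): v for k, v in data.items()}
        if op == "rows_min":
            keep = [i for i, v in enumerate(data[args["column"]]) if v >= args["value"]]
            return {k: [col[i] for i in keep] for k, col in data.items()}
        raise ValueError(f"unknown operation: {op}")


class Plan:
    """Backend-agnostic recipe: stores (operation, arguments) pairs only."""

    def __init__(self, ops=None):
        self._ops = [(name, dict(args)) for name, args in (ops or [])]

    def col_rename(self, column: str, new_name: str) -> Plan:
        self._ops.append(("col_rename", {"column": column, "new_name": new_name}))
        return self

    def rows_min(self, column: str, value: float) -> Plan:
        self._ops.append(("rows_min", {"column": column, "value": value}))
        return self

    def to_json(self) -> str:
        # The serialized form carries no backend information.
        return json.dumps(self._ops)

    @classmethod
    def from_json(cls, text: str) -> Plan:
        return cls(json.loads(text))

    def process(self, data: Any, backend: Backend | None = None) -> Any:
        # The backend is resolved here, at execution time, with a default.
        if backend is None:
            backend = DictRowsBackend()
        for name, args in self._ops:
            data = backend.apply(data, name, args)
        return data


plan = Plan().col_rename(column="ID", new_name="id").rows_min(column="age", value=18)
restored = Plan.from_json(plan.to_json())

rows = [{"ID": 1, "age": 17}, {"ID": 2, "age": 30}]
cols = {"ID": [1, 2], "age": [17, 30]}

print(restored.process(rows))                             # [{'id': 2, 'age': 30}]
print(restored.process(cols, backend=ColumnarBackend()))  # {'id': [2], 'age': [30]}
```

Because the plan object never holds a backend, a single instance (or its JSON form) can be run against either data representation, which is the property the updated docs rely on when they call `process(rel, backend=DuckDBBackend(con))` on a plan built with a bare `TransformPlan()`.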