Runs the batch processing workflows. There are two Dataform repositories: one for development and one for production.
The test repository is used for development and testing purposes and is not connected to the rest of the pipeline infrastructure.
Pipelines can be run manually from the Dataform UI.
- Create a new dev workspace in the test Dataform repository.
- Make adjustments to the Dataform configuration files and manually run a workflow to verify them.
- Push your changes to a dev branch and open a PR with a link to the BigQuery artifacts generated in the test workflow.
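The branch-and-PR part of the steps above can be sketched with plain git. This is a minimal, illustrative flow (the branch name, file path, and commit message are made up; the throwaway repo lets the commands run end-to-end, whereas in practice you run them inside your clone of HTTPArchive/dataform):

```shell
set -e
# Throwaway repo so the sketch is self-contained; normally you'd be in your clone.
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email "dev@example.com" && git config user.name "Dev"
git commit -q --allow-empty -m "init"

git checkout -q -b my-dev-branch                # dev branch for the change
mkdir -p definitions && echo "-- adjusted config" > definitions/example.sqlx
git add definitions/
git commit -q -m "Adjust pipeline config"
git log --oneline -1
# git push origin my-dev-branch
# ...then open a PR with a link to the BigQuery artifacts from the test workflow.
```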
Some useful hints:

- In the workflow settings vars, set `dev_name: dev` to process sampled data in the dev workspace.
- Change the `current_month` variable to a month in the past. This may be helpful for testing pipelines based on `chrome-ux-report` data.
- The `definitions/extra/test_env.sqlx` script helps set up the tables required to run pipelines in a dev workspace. It's disabled by default.
- In `workflow_settings.yaml`, set `environment: dev` to process sampled data.
- For development and testing, you can modify variables in `includes/constants.js`, but note that these are programmatically generated.
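Taken together, the hints above amount to a small edit to `workflow_settings.yaml`. A sketch of what that fragment might look like (key names under `vars` are taken from the hints; the example month value is illustrative, and other settings are omitted):

```yaml
# workflow_settings.yaml (illustrative fragment)
vars:
  environment: dev          # process sampled data instead of the full dataset
  dev_name: dev             # process sampled data in the dev workspace
  current_month: "2024-06-01"  # a past month, useful for chrome-ux-report tests
```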
- `definitions/` - Contains the core Dataform SQL definitions and declarations
  - `output/` - Contains the main pipeline transformation logic
  - `declarations/` - Contains declarations of referenced tables/views and other resource definitions
- `includes/` - Contains shared JavaScript utilities and constants
- `infra/` - Infrastructure code and deployment configurations
  - `bigquery-export/` - BigQuery export service
  - `dataform-service/` - Cloud Run function for Dataform workflow automation
  - `tf/` - Terraform configurations
- `docs/` - Additional documentation
A GitHub PAT is saved to a Secret Manager secret:
- repository: HTTPArchive/dataform
- permissions:
- Commit statuses: read
- Contents: read, write
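Storing the PAT could look like the following Secret Manager CLI sketch. The secret name `dataform-github-token` and the `GITHUB_PAT` environment variable are assumptions for illustration; only the repository and permissions listed above come from this document:

```shell
# Store a GitHub PAT in Secret Manager (secret name is illustrative).
printf '%s' "$GITHUB_PAT" | gcloud secrets create dataform-github-token \
    --replication-policy="automatic" \
    --data-file=-
```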