Deploying Pipelines with GitHub Actions

GlassFlow enables seamless integration with GitHub, allowing users to build, deploy, and maintain their data pipelines using GitHub Actions.

By storing pipeline code in a Python file and its configuration in a YAML file within a GitHub repository, users can leverage CI/CD workflows to automatically deploy updates to GlassFlow whenever changes are pushed.

Key Features

  • Version Control: Maintain your pipeline in GitHub, ensuring versioning and collaboration.
  • CI/CD Integration: Automate deployments using GitHub Actions.
  • Local Development: Develop and test locally before pushing changes.
  • Team Collaboration: Work with colleagues using GitHub workflows.

Setting Up Your Pipeline

The following steps show how to set up a pipeline in a GitHub repository. Alternatively, fork our template repository on GitHub to get started.

1. Structure Your GitHub Repository

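Ensure your repository contains these files:

  • transform.py: the transformation code.
  • pipeline.yaml: the pipeline configuration.
  • .github/workflows/on_push.yaml: the GitHub Actions workflow that deploys the pipeline.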
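To catch a missing file before pushing, you can run a small local check. This is a sketch that assumes the file names used in this guide (transform.py, pipeline.yaml, and .github/workflows/on_push.yaml). The function name missing_files is hypothetical and not part of any GlassFlow tooling.

```python
from pathlib import Path

# Files created by the steps in this guide; adjust if your repository differs.
REQUIRED_FILES = [
    "transform.py",
    "pipeline.yaml",
    ".github/workflows/on_push.yaml",
]


def missing_files(repo_root=".") -> list[str]:
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Run it from the repository root, for example with print(missing_files()). An empty list means all three files are in place.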

2. Define Your Pipeline Code

Create a transform.py file containing your transformation code.
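Below is a minimal transform.py sketch. It assumes that GlassFlow calls a function named handler and passes it two arguments: the event as a dict (data) and a logger-like object (log). It also assumes GlassFlow forwards whatever the function returns. Check this signature against the GlassFlow template repository. The email normalization and the processed flag are purely illustrative.

```python
import json


def handler(data: dict, log) -> dict:
    """Transform one event (assumes GlassFlow calls handler(data, log) per event)."""
    # Log the incoming event; log is assumed to provide an info() method.
    log.info("Received event: " + json.dumps(data))
    # Illustrative transformation: normalize an email field, if present.
    if "email" in data:
        data["email"] = data["email"].strip().lower()
    # Mark the event so downstream consumers can see it passed through the transformer.
    data["processed"] = True
    return data
```

Because handler is a plain function, you can test it locally before pushing. For example, call handler({"email": " A@B.com "}, logging.getLogger("local")).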

3. Configure Your Pipeline

Create a pipeline.yaml file that specifies the pipeline settings. Its components are described in the Pipeline YAML specification section below.
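The action fills in space_id and pipeline_id for newly created spaces and pipelines. This suggests that a pipeline.yaml without these IDs leads to new resources, while a file that already has them updates the existing pipeline. The sketch below models that decision on a plain dict. Treating a missing ID as "create" is an inference from this guide, not documented GlassFlow behavior, and planned_actions is a hypothetical helper.

```python
def planned_actions(config: dict) -> list[str]:
    """Predict what a deploy would do, based on which IDs pipeline.yaml already has."""
    actions = []
    # Assumed rule: no space_id yet means a new space is created.
    if not config.get("space_id"):
        actions.append("create space")
    # Assumed rule: no pipeline_id means a new pipeline; otherwise the existing one is updated.
    if config.get("pipeline_id"):
        actions.append("update pipeline")
    else:
        actions.append("create pipeline")
    return actions
```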

4. Set Up GitHub Actions Workflow

Create .github/workflows/on_push.yaml so that the deployment runs automatically on every push.
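The workflow file itself is YAML, but a rough Python check can confirm two things this guide relies on. First, the workflow runs on push. Second, it takes the token from a repository secret (assumed here to be named GlassFlowPAT, as in the next step) rather than hard-coding it. This is a line-based scan, not a YAML parser, and check_workflow is a hypothetical helper.

```python
def check_workflow(text: str, secret_name: str = "GlassFlowPAT") -> list[str]:
    """Rough, line-based sanity check of a workflow file; not a YAML parser."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines()]
    # Accept "on: push", "on: [push, ...]" or a nested "push:" key under "on:".
    inline_push = any(ln.startswith("on:") and "push" in ln for ln in lines)
    nested_push = any(ln == "push:" or ln.startswith("push: ") for ln in lines)
    if not (inline_push or nested_push):
        problems.append("no push trigger found")
    # The token should come from repository secrets, never be hard-coded.
    if "${{ secrets." + secret_name + " }}" not in text:
        problems.append(f"secret {secret_name} is not referenced")
    return problems
```

For example, call check_workflow(Path(".github/workflows/on_push.yaml").read_text()). An empty list means neither problem was detected.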

5. Set Up GitHub Secrets for GitHub Actions

Store Your Token Securely

The deployment action authenticates with a GlassFlow Personal Access Token (PAT). To add the token to your repository:

  • Navigate to Settings > Secrets and variables > Actions in your repository.
  • Click New repository secret.
  • Set the secret name (e.g., GlassFlowPAT).
  • Retrieve your access token from your GlassFlow profile and paste it as the secret value.

GitHub stores the token encrypted, and the action does not print it in the workflow logs.
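If a step in your workflow needs the token directly, GitHub Actions can map the secret into an environment variable, e.g. env: GLASSFLOW_PAT: ${{ secrets.GlassFlowPAT }}. The sketch below reads the token from that variable. The variable name GLASSFLOW_PAT and the helper glassflow_token are hypothetical, and the GlassFlow action may read the token differently.

```python
import os


def glassflow_token(var_name: str = "GLASSFLOW_PAT") -> str:
    """Return the GlassFlow PAT from an environment variable (name is hypothetical)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail fast, and never echo the token or other sensitive values.
        raise RuntimeError(f"{var_name} is not set; map the repository secret into the step's env")
    return token
```

Reading the token from the environment keeps it out of the repository. The error message names only the variable, never its value.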


Set Workflow Permissions

The action writes the space_id and pipeline_id assigned to newly created spaces and pipelines back into your YAML file, so it needs write access to the repository:

  • Go to Settings > Actions > General > Workflow Permissions.
  • Ensure Read and write permissions are enabled.
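Write permission is needed because the action edits a tracked file and presumably commits the change back. The sketch below shows the kind of edit involved: it sets a top-level key such as pipeline_id in the YAML text. It assumes the IDs are top-level keys and uses simple line-based matching instead of a YAML library. It is not the action's actual implementation, and set_top_level_key is a hypothetical helper.

```python
import re


def set_top_level_key(yaml_text: str, key: str, value: str) -> str:
    """Set an unindented 'key: value' line, appending it if the key is absent."""
    line = f"{key}: {value}"
    # ^ with MULTILINE anchors at line starts, so indented (nested) keys are not matched.
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    if pattern.search(yaml_text):
        # A function replacement stops backslashes in value from being treated as escapes.
        return pattern.sub(lambda _match: line, yaml_text, count=1)
    # Key absent: append it on its own line, keeping a trailing newline.
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + line + "\n"
```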

6. Push Changes and Deploy

Commit your files and push them to GitHub, for example with git add, git commit, and git push.

The push triggers the GitHub Actions workflow, which deploys your changes to GlassFlow.


Pipeline YAML specification

Note:

A pipeline consists of three components: a source, a transformer, and a sink. All three are required to define a pipeline. Currently, a pipeline can contain only one component of each type, and the components are connected sequentially (source -> transformer -> sink).
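The constraint above can be expressed as a small validation function. This sketch assumes the components are read into a list of dicts, each with a type key, in the order they are connected. Both that representation and the name validate_components are hypothetical.

```python
REQUIRED_ORDER = ["source", "transformer", "sink"]


def validate_components(components: list[dict]) -> None:
    """Raise ValueError unless there is exactly one source, transformer and sink, in that order."""
    types = [c.get("type") for c in components]
    # Each component type must appear exactly once.
    for required in REQUIRED_ORDER:
        count = types.count(required)
        if count != 1:
            raise ValueError(f"expected exactly one {required}, found {count}")
    # The components must form the single chain source -> transformer -> sink.
    if types != REQUIRED_ORDER:
        chain = " -> ".join(map(str, types))
        raise ValueError(f"components must be connected as source -> transformer -> sink, got {chain}")
```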

Pipeline Components

Transformer

The transformer component configures the transformation step of the pipeline, which runs your transformation code from transform.py.

Source

The source component configures the pipeline's data source. To use a managed source connector, set kind to the type of connector you want to configure. To send data without a managed connector (e.g. via the API or the Python SDK), omit the kind parameter.

A complete list of source connectors can be found on the integrations page.

Sink

The sink component configures the pipeline's data sink. To use a managed sink connector, set kind to the type of connector you want to configure. To consume data without a managed connector (e.g. via the API or the Python SDK), omit the kind parameter.

A complete list of sink connectors can be found on the integrations page.
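Sources and sinks follow the same rule. A kind value selects a managed connector, and omitting it means you move data through the API or Python SDK yourself. Below is a minimal sketch of that rule. It assumes components are dicts with an optional kind key. The helper connector_mode and the example kind value are hypothetical; the real kind values are listed on the integrations page.

```python
def connector_mode(component: dict) -> str:
    """Report whether a source or sink component uses a managed connector."""
    # A non-empty kind selects a managed connector.
    kind = component.get("kind")
    if kind:
        return f"managed connector: {kind}"
    # No kind: data is sent (source) or consumed (sink) via the API or Python SDK.
    return "no managed connector: use the API or Python SDK"
```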

Monitoring and Debugging

  • Check workflow logs in GitHub Actions under the Actions tab.
  • Use the GlassFlow WebApp to monitor the pipeline, retrieve access tokens, and view logs.
  • Update transform.py and pipeline.yaml, then push the changes to trigger a redeployment.

Conclusion

By integrating GlassFlow with GitHub Actions, you can manage your data pipelines efficiently with CI/CD, enabling automated deployments, collaboration, and a streamlined development workflow.