GCP Series: How to Run Workflows with Google Cloud Composer
Overview: Google Cloud Composer
A fully managed workflow orchestration service built on Apache Airflow.
- Author, schedule, and monitor pipelines that span hybrid and multi-cloud environments
- Built on the Apache Airflow open source project and operated using Python
- Frees you from lock-in and is easy to use
How to Use?
https://cloud.google.com/composer/docs/how-to
Benefits
Fully managed workflow orchestration
Cloud Composer’s managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows rather than on provisioning resources.
Integrates with other Google Cloud products
End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline.
Supports hybrid and multi-cloud
Author, schedule, and monitor your workflows through a single orchestration tool—whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud.
Key features
Hybrid and multi-cloud
Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment.
Open source
Cloud Composer is built on Apache Airflow, which gives users portability and freedom from lock-in. Google contributes back to this open source project, which integrates with a broad range of platforms, a range that will expand as the Airflow community grows.
Easy orchestration
Cloud Composer pipelines are configured as directed acyclic graphs (DAGs) using Python, which makes them easy to author. One-click deployment yields instant access to a rich library of connectors and multiple graphical representations of your workflow in action, making troubleshooting easy. Automatic synchronization of your DAGs ensures your jobs stay on schedule.
Why use Cloud Composer?
Cloud Composer is a fully managed workflow orchestration service, enabling you to create workflows that span across clouds and on-premises data centers. Built on the popular Apache Airflow open source project and operated using the Python programming language, Cloud Composer is free from lock-in and easy to use. By using Cloud Composer instead of a local instance of Apache Airflow, users can benefit from the best of Airflow with no installation or management overhead.
https://cloud.google.com/composer/docs/concepts/overview#why_use
Workflows, DAGs, and tasks
In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data. In Airflow, workflows are created using DAGs, or “Directed Acyclic Graphs”.
A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python scripts, which define the DAG structure (tasks and their dependencies) using code.
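To make the idea of "tasks and their dependencies" concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not the Airflow API and not how Cloud Composer stores DAGs internally; the task names and the helper functions `execution_order` and `is_valid_dag` are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before that task may start.
dependencies = {
    "prepare_data": set(),
    "monitor_api": set(),
    "run_pipeline": {"prepare_data", "monitor_api"},
    "send_email": {"run_pipeline"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_dag(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)
print(is_valid_dag({"a": {"b"}, "b": {"a"}}))  # a depends on b and b on a
```

The "acyclic" part of the name matters: if two tasks depend on each other, no valid execution order exists, which is why the second call reports the graph as invalid.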
Each task in a DAG can represent almost anything—for example, one task might perform any of the following functions:
- Preparing data for ingestion
- Monitoring an API
- Sending an email
- Running a pipeline
A DAG shouldn’t be concerned with the function of each constituent task. Its purpose is to ensure that each task is executed at the right time, in the right order, and with the right issue handling.
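The separation between what a task does and how the DAG runs it can be sketched with a toy runner. This is an illustrative assumption, not Airflow's scheduler: `run_workflow`, its `max_retries` parameter, and the sample tasks are hypothetical, and the status strings are simply borrowed from the names Airflow uses for task states. The runner knows nothing about the task bodies; it only enforces ordering, retries failures, and skips tasks whose upstream task did not succeed.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, max_retries=1):
    """Run callables in dependency order and return a status per task."""
    status = {}
    for name in TopologicalSorter(deps).static_order():
        # Skip a task if any task it depends on did not succeed.
        if any(status.get(up) != "success" for up in deps.get(name, ())):
            status[name] = "upstream_failed"
            continue
        # One initial attempt plus up to max_retries further attempts.
        for _ in range(max_retries + 1):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception:
                status[name] = "failed"
    return status


attempts = {"count": 0}


def flaky_prepare():
    # Fails on the first attempt only, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient error")


def broken_pipeline():
    # Always fails, so the downstream task should never run.
    raise RuntimeError("permanent error")


tasks = {
    "prepare_data": flaky_prepare,
    "run_pipeline": broken_pipeline,
    "send_email": lambda: None,
}
deps = {
    "prepare_data": set(),
    "run_pipeline": {"prepare_data"},
    "send_email": {"run_pipeline"},
}

result = run_workflow(tasks, deps, max_retries=1)
print(result)
```

In this run, the flaky task recovers on its retry, the broken task exhausts its attempts, and the email task is never started because its upstream task failed.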
For more information on DAGs and tasks, see the Apache Airflow documentation.
Environments
To run workflows, you first need to create an environment. Airflow depends on many microservices to run, so Cloud Composer provisions the Google Cloud components needed to run your workflows. These components are collectively known as a Cloud Composer environment.
Environments are self-contained Airflow deployments based on Google Kubernetes Engine, and they work with other Google Cloud services using connectors built into Airflow. You can create one or more environments in a single Google Cloud project. You can create Cloud Composer environments in any supported region.
For an in-depth look at the components of an environment, see Cloud Composer environment architecture.