GCP Series: Breakdown of Cloud Operations Tools and Resources

Google Cloud Platform (GCP): Cloud Operations Tools and Resources

Cloud Monitoring

Gain visibility into the performance, availability, and health of your applications and infrastructure.

  • Collect metrics from multicloud and hybrid infrastructure in real time

  • Apply site reliability engineering (SRE) practices used extensively at Google, based on service-level objectives (SLOs) and service-level indicators (SLIs)

  • Visualize insights via dashboards and charts, and generate alerts

  • Collaborate by integrating with Slack, PagerDuty, and other incident management tools

  • Day-zero integration for Google Cloud metrics

Key features

SLO monitoring

Automatically infer or custom define service-level objectives (SLOs) for applications and get alerted when SLO violations occur.
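To make the SLO arithmetic concrete, here is a minimal sketch of a request-based availability SLO. The SLI is the fraction of good requests, the error budget is how many bad requests the target still tolerates, and the burn rate compares the observed error rate with the allowed one. The request counts and the 99.9% target are illustrative assumptions, and this is not a description of how Cloud Monitoring computes SLOs internally.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good events
    budget_total: float      # bad events the SLO allows over the window
    budget_remaining: float  # allowance left (negative means the SLO is violated)
    burn_rate: float         # observed error rate divided by the allowed error rate


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compute a request-based availability SLI and its error budget."""
    if total <= 0 or not 0 < target < 1:
        raise ValueError("need total > 0 and 0 < target < 1")
    bad = total - good
    allowed_error_rate = 1 - target
    budget_total = allowed_error_rate * total
    return SloReport(
        sli=good / total,
        budget_total=budget_total,
        budget_remaining=budget_total - bad,
        burn_rate=(bad / total) / allowed_error_rate,
    )


# Hypothetical window: 1,000,000 requests, 400 failed, 99.9% availability target.
report = evaluate_slo(good=999_600, total=1_000_000, target=0.999)
print(report)
```

A burn rate above 1 means that, at the current error rate, the budget would be used up before the end of the measurement window; burn-rate thresholds like this are a common basis for SLO alerts.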

Custom metrics

Instrument your application to monitor application and business-level metrics via Cloud Monitoring.
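As a rough illustration of custom-metric instrumentation, the sketch below builds the JSON request body for the Monitoring API v3 `projects.timeSeries.create` method (a POST to `https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries`) carrying one gauge data point. It only constructs the payload; authentication and the HTTP call are left out. The project ID, metric name, and label are hypothetical. User-defined metric types live under the `custom.googleapis.com/` prefix, and in practice the Cloud Monitoring client libraries build this request for you.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def build_time_series_request(project_id: str, metric_suffix: str,
                              value: float, labels: Dict[str, str]) -> dict:
    """Build a timeSeries.create request body holding one GAUGE double point."""
    end_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "timeSeries": [
            {
                "metric": {
                    # User-defined metrics use the custom.googleapis.com/ prefix.
                    "type": f"custom.googleapis.com/{metric_suffix}",
                    "labels": labels,
                },
                # "global" is the simplest monitored-resource type to write against.
                "resource": {"type": "global", "labels": {"project_id": project_id}},
                "metricKind": "GAUGE",
                "valueType": "DOUBLE",
                # A gauge point needs only an end time.
                "points": [
                    {"interval": {"endTime": end_time}, "value": {"doubleValue": value}}
                ],
            }
        ]
    }


body = build_time_series_request(
    project_id="my-project",                    # hypothetical
    metric_suffix="store/checkout_latency_ms",  # hypothetical metric name
    value=182.5,
    labels={"region": "us-east1"},
)
print(json.dumps(body, indent=2))
```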

Google Cloud integration

Discover and monitor all Google Cloud resources and services, with no additional instrumentation, integrated right into the Google Cloud console.

Additional features

Monitoring agent: Deploy our open source agent on your Google Cloud hosts and other environments to collect metrics and monitor applications.
Logging integration: Drill down from dashboards and charts to logs. Create, visualize, and alert on metrics based on log data.
Dashboards: Get core visibility into your cloud resources and services with no configuration. Define custom dashboards and take advantage of Google’s powerful data visualization tools.
Group/cluster support: Define relationships based on resource names, tags, security groups, projects, regions, accounts, and other criteria. Use those relationships to create targeted dashboards and topology-aware alerting policies.
Alerting: Configure alerting policies to notify you when events occur or particular system or custom metrics violate rules that you define. Use multiple conditions to define complex alerting rules. Receive notifications via email, SMS, Slack, PagerDuty, and more.
Uptime monitoring: Monitor the availability of your internet-accessible URLs, VMs, APIs, and load balancers from probes around the globe. Use Cloud Monitoring alerting to be notified of outages.
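The idea of using multiple conditions in one alerting policy can be sketched as follows: each condition checks whether a metric has stayed above a threshold for some number of consecutive samples, and the policy combines the results with AND or OR, loosely mirroring the combiner option of a Cloud Monitoring alerting policy. This is a simplified, hypothetical evaluator (real policies express durations in seconds and are evaluated server-side), and the metric names and thresholds are made up.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class ThresholdCondition:
    """Hypothetical condition: `metric` stays above `threshold` for the last
    `duration_points` samples."""
    metric: str
    threshold: float
    duration_points: int

    def __post_init__(self) -> None:
        if self.duration_points < 1:
            raise ValueError("duration_points must be >= 1")

    def is_met(self, series: Sequence[float]) -> bool:
        if len(series) < self.duration_points:
            return False
        window = series[-self.duration_points:]
        return all(value > self.threshold for value in window)


def policy_fires(conditions: Sequence[ThresholdCondition],
                 data: Mapping[str, Sequence[float]], combiner: str = "OR") -> bool:
    """Combine per-condition results with AND or OR."""
    results = [c.is_met(data.get(c.metric, [])) for c in conditions]
    if combiner == "AND":
        return all(results)
    if combiner == "OR":
        return any(results)
    raise ValueError(f"unsupported combiner: {combiner}")


conditions = [
    ThresholdCondition("cpu_utilization", threshold=0.8, duration_points=3),
    ThresholdCondition("p95_latency_ms", threshold=200.0, duration_points=2),
]
data = {
    "cpu_utilization": [0.50, 0.90, 0.95, 0.92],
    "p95_latency_ms": [120.0, 130.0, 90.0],
}
print(policy_fires(conditions, data, "OR"))   # True: the CPU condition alone is enough
print(policy_fires(conditions, data, "AND"))  # False: the latency condition is not met
```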

Pricing

Cloud Monitoring includes all Google Cloud metrics and all product features at no additional cost. The free allotment and prices for chargeable metrics are listed below.

Monitoring data
  • Price (1): $0.2580/MiB for 150–100,000 MiB; $0.1510/MiB for 100,000–250,000 MiB; $0.0610/MiB above 250,000 MiB
  • Free allotment per month: all Google Cloud metrics; first 150 MiB per billing account for chargeable metrics
  • Effective date: July 1, 2018

Monitoring API calls
  • Price (1): $0.01 per 1,000 API calls
  • Free allotment per month: first 1 million API calls per billing account
  • Effective date: July 1, 2018
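To see how the tiers add up, here is a small worked calculation based on the rates in the table above. It assumes, as is typical for tiered pricing, that each rate applies only to the MiB that fall inside its band and that the first 150 MiB of chargeable metrics is free. Check the current pricing page before relying on these numbers, since rates may have changed since the July 1, 2018 effective date.

```python
from decimal import Decimal

# Assumption: marginal tiers. Each (upper bound in MiB, $/MiB) rate applies
# only to the volume inside its band; the first 150 MiB is free.
FREE_MIB = Decimal(150)
TIERS = [
    (Decimal(100_000), Decimal("0.2580")),
    (Decimal(250_000), Decimal("0.1510")),
    (None, Decimal("0.0610")),  # everything above 250,000 MiB
]


def monitoring_data_cost(mib: Decimal) -> Decimal:
    """Monthly cost of chargeable metric data under marginal tiered pricing."""
    cost = Decimal(0)
    lower = FREE_MIB
    for upper, rate in TIERS:
        if mib <= lower:
            break
        top = mib if upper is None else min(mib, upper)
        cost += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return cost.quantize(Decimal("0.01"))


def api_call_cost(calls: int) -> Decimal:
    """$0.01 per 1,000 calls after the first 1 million per billing account."""
    chargeable = max(0, calls - 1_000_000)
    return (Decimal(chargeable) / 1000 * Decimal("0.01")).quantize(Decimal("0.01"))


print(monitoring_data_cost(Decimal(200_000)))  # 40861.30
print(api_call_cost(3_000_000))                # 20.00
```

For 200,000 MiB, the first band covers 99,850 MiB at $0.2580 ($25,761.30) and the second covers 100,000 MiB at $0.1510 ($15,100.00). `Decimal` is used to avoid floating-point rounding in currency values.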

Cloud Logging

Real-time log management and analysis at scale.

  • Securely store, search, analyze, and alert on all of your log data and events

  • Ingest custom log data from any source

  • An exabyte-scale, fully managed service for your application and infrastructure logs

  • Analyze log data in real time

All features

Logs viewer: Search, sort, and query logs with flexible query statements, rich histogram visualizations, simple field explorers, and the ability to save queries.
Custom logs / Ingestion API: Write any custom log, from any source, into Cloud Logging using the public write APIs.
Logs alerting: Integrate with Cloud Monitoring to set alerts on log events, including the logs-based metrics you have defined.
Advanced analytics with BigQuery: Optionally, export log data to BigQuery in real time, with one-click configuration, for advanced analytics and SQL querying.
Logs retention: Configure different retention periods for different log buckets, and use the Logs Router to define which logs are stored in which bucket.
Logs-based metrics: Create metrics from log data; these metrics appear in Cloud Monitoring, where you can visualize them and add them to dashboards.
Audit logging: Access audit logs that record admin activity and data access events within Google Cloud. Admin Activity audit logs are retained for 400 days at no additional cost.
Third-party integrations: Integrate with external systems by configuring the Logs Router to export logs to Pub/Sub.
Logs archival: Store logs for longer at lower cost by exporting them to Cloud Storage.
Error Reporting: Error Reporting cuts through the noise by automatically analyzing your logs for exceptions and intelligently aggregating them into meaningful error groups.
Log buckets and views: Log buckets provide a first-class log storage solution that lets you centralize or subdivide your logs as needed. Log views then let you control which logs a user can access, all through standard IAM controls.
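For custom log ingestion, the sketch below builds a request body for the Logging API v2 `entries.write` method (a POST to `https://logging.googleapis.com/v2/entries:write`). It creates one structured entry with a `jsonPayload` and a severity, and leaves out authentication and the HTTP call. The project ID, log ID, and payload fields are hypothetical. Note that the log ID must be URL-encoded inside `logName`.

```python
import json
from typing import Dict, Optional
from urllib.parse import quote

# LogSeverity values accepted by Cloud Logging.
SEVERITIES = {"DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY"}


def build_write_request(project_id: str, log_id: str, severity: str,
                        payload: dict, labels: Optional[Dict[str, str]] = None) -> dict:
    """Build an entries.write request body containing one structured entry."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    entry = {
        # The log ID is URL-encoded, so "/" becomes "%2F".
        "logName": f"projects/{project_id}/logs/{quote(log_id, safe='')}",
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "severity": severity,
        "jsonPayload": payload,
    }
    if labels:
        entry["labels"] = labels
    return {"entries": [entry]}


body = build_write_request(
    project_id="my-project",   # hypothetical
    log_id="checkout/orders",  # hypothetical
    severity="ERROR",
    payload={"message": "payment declined", "order_id": "A-1042"},
)
print(json.dumps(body, indent=2))
```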

Pricing

All Cloud Logging features are free to use; charges apply only to ingested log volume above the free allotment.

Logging data
  • Price (1): $0.50/GiB
  • Free allotment per month: first 50 GiB per project
  • Effective date: July 1, 2018

(1) For pricing purposes, all units such as MB and GB represent binary measures. For example, 1 MB is 2^20 bytes and 1 GB is 2^30 bytes. These binary units are also known as mebibytes (MiB) and gibibytes (GiB), respectively. Note also that MB and MiB, and GB and GiB, are used interchangeably.
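Applying the logging rate listed above, with 1 GiB = 2^30 bytes, gives a simple per-project estimate. This sketch assumes the $0.50/GiB rate and the 50 GiB monthly allotment from the table; current prices may differ.

```python
from decimal import Decimal

GIB = 2**30  # 1 GiB = 2^30 bytes
FREE_GIB_PER_PROJECT = Decimal(50)
PRICE_PER_GIB = Decimal("0.50")


def logging_cost(ingested_bytes: int) -> Decimal:
    """Monthly ingestion charge for one project."""
    gib = Decimal(ingested_bytes) / GIB
    chargeable = max(Decimal(0), gib - FREE_GIB_PER_PROJECT)
    return (chargeable * PRICE_PER_GIB).quantize(Decimal("0.01"))


print(logging_cost(120 * GIB))  # 70 chargeable GiB -> 35.00
print(logging_cost(40 * GIB))   # within the free allotment -> 0.00
```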

Cloud Trace

Distributed tracing for everyone

Cloud Trace is a distributed tracing system that collects latency data from your applications and displays it in the Google Cloud Console. You can track how requests propagate through your application and receive detailed near real-time performance insights. Cloud Trace automatically analyzes all of your application’s traces to generate in-depth latency reports to surface performance degradations, and can capture traces from all of your VMs, containers, or App Engine projects.
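Propagation works by forwarding a trace context from one service to the next. As a sketch, the code below parses and re-emits the `X-Cloud-Trace-Context` header, whose value has the form `TRACE_ID/SPAN_ID;o=OPTIONS` (a 32-character hex trace ID, a decimal span ID, and `o=1` when the request should be traced). Services instrumented with OpenTelemetry may use the W3C `traceparent` header instead. The helper names are hypothetical, and in practice a new span ID would be generated randomly rather than passed in.

```python
import re
from dataclasses import dataclass
from typing import Optional

# TRACE_ID/SPAN_ID;o=OPTIONS: a 32-character hex trace ID, a decimal
# unsigned 64-bit span ID, and o=1 when the request should be traced.
_HEADER_RE = re.compile(r"([0-9a-fA-F]{32})/(\d+)(?:;o=([01]))?")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: int
    sampled: bool

    def to_header(self) -> str:
        return f"{self.trace_id}/{self.span_id};o={int(self.sampled)}"


def parse_trace_header(value: str) -> Optional[TraceContext]:
    """Parse an X-Cloud-Trace-Context value; return None if it is malformed."""
    match = _HEADER_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, span_id, options = match.groups()
    if int(span_id) >= 2**64:
        return None
    return TraceContext(trace_id.lower(), int(span_id), options == "1")


def child_context(parent: TraceContext, new_span_id: int) -> TraceContext:
    """Keep the trace ID and sampling decision; only the span ID changes."""
    return TraceContext(parent.trace_id, new_span_id, parent.sampled)


incoming = parse_trace_header("105445aa7843bc8bf206b12000100000/1;o=1")
outgoing = child_context(incoming, new_span_id=2)
print(outgoing.to_header())  # 105445aa7843bc8bf206b12000100000/2;o=1
```

Because every downstream call carries the same trace ID, the backend can stitch the spans from all services into one end-to-end trace.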

Find performance bottlenecks

Using Cloud Trace, you can inspect detailed latency information for a single request or view aggregate latency for your entire application. With the tools and filters provided, you can quickly find where bottlenecks occur and identify their root cause faster. Cloud Trace is based on the tools used at Google to keep our services running at extreme scale.
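One simple way to locate a bottleneck inside a single trace is to compute each span's self time: its duration minus the time covered by its direct children. The sketch below uses a hypothetical, simplified `Span` record (not Cloud Trace's data model) and made-up timings for a checkout request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Map span_id -> time not covered by direct children.

    Assumes a span's children run sequentially (no overlap); with concurrent
    children this simple subtraction can go negative.
    """
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


trace = [
    Span("1", None, "GET /checkout", 0, 480),
    Span("2", "1", "auth.verify", 5, 25),
    Span("3", "1", "db.query orders", 30, 410),
    Span("4", "1", "render", 415, 470),
]
times = self_times(trace)
worst = max(trace, key=lambda s: times[s.span_id])
print(worst.name, times[worst.span_id])  # db.query orders 380.0
```

Here the root span lasts 480 ms, but 380 ms of that is spent in the database query, which makes it the first place to look.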

Fast, automatic issue detection

Cloud Trace continuously gathers and analyzes trace data from your project to automatically identify recent changes to your application’s performance. These latency distributions, available through the Analysis Reports feature, can be compared across time periods or application versions, and Cloud Trace will automatically alert you if it detects a significant shift in your app’s latency profile.
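Cloud Trace's own detection method is not described here. As a generic illustration of comparing latency distributions between versions, the sketch below compares a chosen percentile of two sample sets and flags a shift when the newer one is more than 20% slower. The nearest-rank percentile definition, the 20% tolerance, and the sample latencies are all assumptions.

```python
import math
from typing import Sequence, Tuple


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_shift(baseline: Sequence[float], candidate: Sequence[float],
                  p: float = 95.0, tolerance: float = 0.20) -> Tuple[float, float, bool]:
    """Flag a shift when the candidate's p-th percentile exceeds the baseline's
    by more than `tolerance` (relative)."""
    base = percentile(baseline, p)
    cand = percentile(candidate, p)
    return base, cand, cand > base * (1 + tolerance)


v1_ms = [80, 85, 90, 95, 100, 105, 110, 120, 130, 150]
v2_ms = [82, 88, 92, 99, 104, 112, 125, 160, 210, 260]
print(latency_shift(v1_ms, v2_ms, p=50))  # (100, 104, False)
print(latency_shift(v1_ms, v2_ms, p=95))  # (150, 260, True)
```

In this example the median barely moves while the 95th percentile regresses sharply, which is why comparing distributions, not only averages or medians, matters.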

Broad platform support

Cloud Trace’s language-specific SDKs can instrument applications running on VMs, including VMs not managed by Google Cloud. The Trace SDK is currently available for Java, Node.js, Ruby, and Go, and the Trace API can be used to submit and retrieve trace data from any source. A Zipkin collector is also available, which allows Zipkin tracers to submit data to Cloud Trace. Traces from App Engine applications are captured automatically.
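Submitting data from any source through the Trace API amounts to sending spans. The sketch below builds a request body for the Trace API v2 `traces:batchWrite` method (a POST to `https://cloudtrace.googleapis.com/v2/projects/PROJECT_ID/traces:batchWrite`) containing one span, using the v2 `Span` fields `name`, `spanId`, `displayName`, `startTime`, `endTime`, and the optional `parentSpanId`. Verify the field details against the Trace API reference before use; the project ID and display name are hypothetical, and authentication is omitted.

```python
import json
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def rfc3339(ts: datetime) -> str:
    """Format an aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_batch_write(project_id: str, display_name: str, start: datetime,
                      end: datetime, trace_id: Optional[str] = None,
                      parent_span_id: Optional[str] = None) -> dict:
    """Build a traces:batchWrite body containing one span."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    while span_id == "0" * 16:                    # avoid the all-zero ID
        span_id = secrets.token_hex(8)
    span = {
        "name": f"projects/{project_id}/traces/{trace_id}/spans/{span_id}",
        "spanId": span_id,
        "displayName": {"value": display_name},
        "startTime": rfc3339(start),
        "endTime": rfc3339(end),
    }
    if parent_span_id:
        span["parentSpanId"] = parent_span_id
    return {"spans": [span]}


start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
body = build_batch_write("my-project", "GET /checkout", start,
                         start + timedelta(milliseconds=250))
print(json.dumps(body, indent=2))
```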

Features

Easy setup

All App Engine applications are traced automatically, and libraries are available to trace applications running elsewhere with minimal setup. All of the performance reports and analysis described above are available out of the box.

Performance insights

Each endpoint-level trace is automatically evaluated for performance bottlenecks.

Automatic analysis

Automatic daily performance reports are created for each traced application. You can also generate reports on demand.

Extensibility for custom workloads

The Trace API and language-specific SDKs are available to trace applications running on virtual machines and containers. Trace data can be viewed in the Cloud Trace UI or retrieved through the Trace API.

Latency shift detection

Your application’s performance reports are compared over time to identify latency degradation.

https://cloud.google.com/trace/docs/reference

Cloud Profiler

Continuous CPU and heap profiling to improve performance and reduce costs.

Actionable application profiling

Poorly performing code increases the latency and cost of applications and web services every day, often without anyone noticing or acting on it. Cloud Profiler addresses this by continuously analyzing the performance of CPU- or memory-intensive functions executed across an application. Cloud Profiler presents the call hierarchy and resource consumption of the relevant functions in an interactive flame graph that helps developers understand which paths consume the most resources and the different ways in which their code is actually called.

Low-impact production profiling

While it’s possible to measure code performance in development environments, the results generally don’t map well to what’s happening in production. Many production profiling techniques either slow down code execution or can only inspect a small subset of a codebase. Cloud Profiler uses statistical techniques and extremely low-impact instrumentation that runs across all production application instances to provide a complete picture of an application’s performance without slowing it down.
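Cloud Profiler's agent is not shown here, but the statistical idea behind low-overhead profiling can be sketched in a few lines: a background thread periodically snapshots another thread's call stack and counts how often each stack appears, so the most frequent stacks are the hot paths. The sketch relies on the CPython-specific `sys._current_frames()`, prints results in the "folded stack" text format that common flame-graph tools accept, and profiles a made-up `workload`. It samples only one thread and is far simpler than a production profiler.

```python
import collections
import sys
import threading
import time
from typing import Callable, Tuple

Stack = Tuple[str, ...]


def _stack_of(frame) -> Stack:
    """Return the call stack as a root-first tuple of function names."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


def _sampler(target_id: int, interval_s: float, stop: threading.Event,
             counts: "collections.Counter[Stack]") -> None:
    # Wake up periodically and record whatever the target thread is executing.
    while not stop.is_set():
        frame = sys._current_frames().get(target_id)  # CPython-specific
        if frame is not None:
            counts[_stack_of(frame)] += 1
        time.sleep(interval_s)


def sample_profile(func: Callable[[], object], interval_s: float = 0.005):
    """Run func() in this thread while a background thread samples its stack."""
    counts: "collections.Counter[Stack]" = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=_sampler,
        args=(threading.get_ident(), interval_s, stop, counts),
        daemon=True,
    )
    sampler.start()
    try:
        func()
    finally:
        stop.set()
        sampler.join()
    return counts


def to_folded(counts) -> str:
    """Render samples as 'root;child;leaf COUNT' lines for flame-graph tools."""
    return "\n".join(f"{';'.join(stack)} {n}" for stack, n in counts.most_common())


def hot_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> None:
    for _ in range(30):
        hot_loop(300_000)
        sum(range(1_000))


counts = sample_profile(workload)
print(to_folded(counts))
```

Because each sample is cheap and taken only occasionally, the overhead stays small, while over many samples the counts approximate where time is actually spent.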

Broad platform support

Cloud Profiler allows developers to analyze applications running anywhere, including Google Cloud, other cloud platforms, or on-premises, with support for Java, Go, Node.js, and Python. A full explanation of language support is available in the documentation.

Technical resources

https://cloud.google.com/profiler/docs

Cloud Debugger

Investigate your code’s behavior in production.

Real-time application debugging

Cloud Debugger is a feature of Google Cloud that lets you inspect the state of a running application in real time, without stopping or slowing it down. Your users are not affected while you capture the call stack and variables at any location in your source code. You can use it to understand the behavior of your code in production and to analyze its state to track down hard-to-find bugs.

Debug in production

Cloud Debugger can be used with production applications. With a few clicks, you can take a snapshot of your running application’s state or inject a new logging statement. A snapshot captures the call stack and variables at a specific code location the first time any instance executes that code. An injected logpoint behaves as if it were part of the deployed code, writing its log messages to the same log stream. Both are available through a simple, user-friendly interface.

Multiple source options

Cloud Debugger is easier to use when source code is available. It knows how to display the correct version of the source code when a version control system is used, such as Cloud Source Repositories, GitHub, Bitbucket, or GitLab. When other source repositories are used, you can upload the source files as part of your build-and-deploy process. It can also display local files when used for local development. If you don’t have access to source code, just type the filename and line number directly in the user interface to take a snapshot or inject a logpoint.

Collaborate while debugging

Easily collaborate with other team members by sharing your debug session. Sharing a debug session is as easy as sending the Console URL.

Use your workflows

Cloud Debugger is integrated into existing developer workflows. Launch Debugger and take snapshots directly from Cloud Logging, Error Reporting, dashboards, IDEs, and the gcloud command-line interface.

Features

Debug snapshot

Capture the state of your application in production at a specific line location.

Debug logpoints

Inject a new logging statement on demand at a specific line location.

Conditional debugging

Capture a snapshot or write a logpoint message only when you need it, using a simple conditional expression written in your application’s language.
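To illustrate the semantics described in this section (a snapshot captures variables and the call stack the first time a line runs, a logpoint writes a message each time, and either can be gated by a condition written in the application's language), here is a minimal Python sketch built on `sys.settrace`. It is a teaching aid, not how Cloud Debugger works: `sys.settrace` noticeably slows the traced code, which is exactly what Debugger is designed to avoid. `install_probe` and `apply_discount` are hypothetical names.

```python
import sys
from typing import Callable, Dict, List, Optional

HitHandler = Callable[[Dict[str, object], List[str]], None]


def install_probe(target: Callable, line_offset: int, condition: Optional[str],
                  on_hit: HitHandler, once: bool) -> Callable[[], None]:
    """Attach a hypothetical probe to the line `line_offset` lines below the
    `def` line of `target`, in the current thread only.

    When that line is about to run and `condition` (a Python expression over
    the frame's variables) is true, call on_hit(copy_of_locals, call_stack).
    once=True mimics a snapshot; once=False mimics a logpoint.
    Returns a function that removes the probe.
    """
    code = target.__code__
    lineno = code.co_firstlineno + line_offset
    cond = compile(condition, "<condition>", "eval") if condition else None
    fired = False

    def local_tracer(frame, event, arg):
        nonlocal fired
        if event == "line" and frame.f_lineno == lineno and not (once and fired):
            env = dict(frame.f_locals)
            # eval() runs arbitrary code: only use conditions you trust.
            if cond is None or eval(cond, frame.f_globals, env):
                fired = True
                stack, f = [], frame
                while f is not None:  # innermost frame first
                    stack.append(f.f_code.co_name)
                    f = f.f_back
                on_hit(env, stack)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only frames running the target function get a line tracer.
        return local_tracer if frame.f_code is code else None

    sys.settrace(global_tracer)
    return lambda: sys.settrace(None)


def apply_discount(price, pct):
    discounted = price * (1 - pct / 100)
    return round(discounted, 2)


# Conditional logpoint on the `return` line: log only results below 10.
logs: List[str] = []
remove = install_probe(
    apply_discount, 2, "discounted < 10",
    lambda env, stack: logs.append(f"price={env['price']} discounted={env['discounted']}"),
    once=False,
)
for price, pct in [(20, 10), (10, 50), (8, 0)]:
    apply_discount(price, pct)
remove()
print(logs)  # ['price=10 discounted=5.0', 'price=8 discounted=8.0']

# Snapshot on the same line: captured only on the first execution.
snapshots = []
remove = install_probe(apply_discount, 2, None,
                       lambda env, stack: snapshots.append((env, stack)), once=True)
apply_discount(50, 50)
apply_discount(30, 10)
remove()
env, stack = snapshots[0]
print(len(snapshots), env["discounted"], stack[0])  # 1 25.0 apply_discount
```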

IDE integration

Use Cloud Debugger within your IDE.

Easy setup

Debugger is automatically enabled for App Engine applications. Follow simple steps to enable it for Google Kubernetes Engine or Compute Engine.

https://cloud.google.com/debugger/docs

https://cloud.google.com/debugger/api

https://cloud.google.com/blog/products/gcp/diagnose-problems-in-your-production-apps-faster-with-google-cloud-debugger