All three major cloud data warehouses — Snowflake, BigQuery, and Redshift — provide query history, information schema metadata, and basic monitoring APIs. But the specifics of each platform's observability surface matter for how easy it is to implement lineage extraction, freshness monitoring, and schema change detection. This post covers the practical differences.
Snowflake: Rich Query History, Excellent API Coverage
Snowflake provides the most accessible query history for lineage extraction. The ACCOUNT_USAGE.QUERY_HISTORY view (and the INFORMATION_SCHEMA.QUERY_HISTORY table function) contains full query text, execution timestamps, warehouse name, role, and other metadata for all queries executed in the account. Retention is 365 days in ACCOUNT_USAGE (with some ingestion latency) and 7 days in INFORMATION_SCHEMA. For lineage extraction, the combination of full SQL text and metadata enables column-level lineage reconstruction for most transformation patterns.
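As a concrete illustration, a naive table-level extractor over query text pulled from QUERY_HISTORY might look like the sketch below. The regexes only catch simple INSERT INTO ... SELECT and CREATE TABLE ... AS SELECT shapes and are an illustrative stand-in for a real SQL parser:

```python
import re

# Minimal sketch of table-level lineage extraction from raw query text
# (e.g. the QUERY_TEXT column of Snowflake's ACCOUNT_USAGE.QUERY_HISTORY).
# A production extractor would use a real SQL parser; these regexes are
# purely illustrative and miss CTEs, subqueries, MERGE, and more.
TARGET_RE = re.compile(
    r"(?:insert\s+into|create\s+(?:or\s+replace\s+)?table)\s+([\w.]+)", re.IGNORECASE
)
SOURCE_RE = re.compile(r"(?:from|join)\s+([\w.]+)", re.IGNORECASE)

def lineage_edges(query_text):
    """Return (source_table, target_table) pairs found in one query."""
    targets = [t.lower() for t in TARGET_RE.findall(query_text)]
    sources = [s.lower() for s in SOURCE_RE.findall(query_text)]
    return {(s, t) for t in targets for s in sources if s != t}

sql = (
    "INSERT INTO analytics.orders_daily "
    "SELECT * FROM raw.orders JOIN raw.customers ON orders.cid = customers.id"
)
edges = lineage_edges(sql)
```

In practice each edge would be joined back to the same row's timestamp, warehouse, and role metadata to build a time-aware lineage graph.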
Snowflake's ACCESS_HISTORY view (Enterprise tier and above) provides object-level access lineage that Snowflake itself computes, reducing the need to parse SQL text for some use cases. For teams on Enterprise or Business Critical tiers, this provides a native foundation for lineage that can be supplemented with SQL parsing for column-level granularity.
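Because ACCESS_HISTORY exposes accessed and modified objects as JSON arrays, flattening one row into object-level lineage edges is mechanical. A minimal sketch, assuming the documented "objectName" key and ignoring column-level detail and error handling:

```python
import json

# Hedged sketch: flatten one ACCOUNT_USAGE.ACCESS_HISTORY row into
# object-level lineage edges. The DIRECT_OBJECTS_ACCESSED and
# OBJECTS_MODIFIED columns arrive as JSON arrays of objects carrying an
# "objectName" key; everything else about the row is ignored here.
def access_history_edges(objects_accessed_json, objects_modified_json):
    sources = [o["objectName"] for o in json.loads(objects_accessed_json) if "objectName" in o]
    targets = [o["objectName"] for o in json.loads(objects_modified_json) if "objectName" in o]
    return [(s, t) for t in targets for s in sources]
```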
For freshness monitoring, Snowflake's INFORMATION_SCHEMA.TABLE_STORAGE_METRICS view and the LAST_ALTERED column in INFORMATION_SCHEMA.TABLES provide last-write timestamps at the table level. Snowflake also supports native dynamic data masking and row access policies, which create governance considerations for observability tools that need to inspect data distributions for anomaly detection.
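A freshness check over those timestamps can be as simple as the sketch below. The (table_name, last_altered) row shape and the single global threshold are illustrative assumptions; real monitors typically learn per-table expectations:

```python
from datetime import datetime, timedelta, timezone

# Sketch of a freshness check over (table_name, last_altered) rows, such as
# those selected from INFORMATION_SCHEMA.TABLES. A table is flagged stale
# when its last write is older than max_age.
def stale_tables(rows, max_age, now=None):
    now = now or datetime.now(timezone.utc)
    return [name for name, last_altered in rows if now - last_altered > max_age]
```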
BigQuery: Strong Native Lineage, Different Permissions Model
Google Cloud's BigQuery Data Lineage (part of Data Catalog and Dataplex) provides native column-level lineage for transformations that run as BigQuery jobs. For teams that run their entire transformation layer in BigQuery — dbt using the BigQuery adapter, Dataflow jobs, or direct SQL — this native lineage covers the transformation graph without requiring SQL parsing by a third-party tool.
The limitation is scope: BigQuery native lineage only covers operations that run as BigQuery jobs. External transformations — Python scripts that write to BigQuery via the Storage Write API, Airflow operators that call the BigQuery API, or Spark jobs running on Dataproc — may not be captured. Third-party lineage tools that supplement native BigQuery lineage with query log parsing can fill these gaps.
BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view provides query history similar to Snowflake's QUERY_HISTORY, with full SQL text and metadata. BigQuery jobs are retained for 180 days. One practical difference from Snowflake: BigQuery's information schema requires project-level IAM permissions that are typically controlled by the GCP organization admin rather than the data warehouse admin, which can create friction for observability tool onboarding in tightly governed GCP environments.
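A sketch of the history query an integration might issue against JOBS_BY_PROJECT follows. The region qualifier and lookback window are illustrative defaults, and executing the string (for example via the google-cloud-bigquery client) is left out:

```python
# Sketch of the history query an observability integration might run against
# BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT. The view must be qualified
# with a region; the column list and lookback window here are illustrative.
def jobs_history_sql(region="region-us", lookback_days=7):
    return (
        f"SELECT job_id, user_email, creation_time, query "
        f"FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT "
        f"WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), "
        f"INTERVAL {lookback_days} DAY) AND query IS NOT NULL"
    )
```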
For schema monitoring, BigQuery provides strong table metadata through the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view, which covers nested and repeated fields in BigQuery's native STRUCT and ARRAY column types — a consideration that Snowflake and Redshift do not have in the same way.
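One way to turn COLUMN_FIELD_PATHS into schema change detection is to diff two snapshots of field paths (paths such as "payload.items.sku"). Representing a snapshot as a set of strings is an assumption of this sketch, not a BigQuery interface:

```python
# Sketch: detect nested-field schema changes by diffing two snapshots of the
# field paths reported by INFORMATION_SCHEMA.COLUMN_FIELD_PATHS. Returns the
# paths that appeared and disappeared between snapshots.
def schema_diff(before, after):
    return {"added": sorted(after - before), "removed": sorted(before - after)}
```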
Redshift: Query History Access, Performance Considerations
Redshift's STL_QUERY and SVL_QUERY_SUMMARY system tables provide query history with SQL text, but with important retention and access considerations. STL tables retain data for roughly two to five days, depending on cluster load, which is significantly shorter than Snowflake or BigQuery. Teams that want longer query history for lineage extraction must either export STL data to S3 on a schedule or enable Redshift audit logging, which writes query activity logs to CloudWatch or S3.
Redshift's SYS_QUERY_HISTORY view (available in Redshift Serverless and newer provisioned clusters) provides better retention and accessibility than the STL tables and is the preferred source for observability integrations in current Redshift deployments. For teams still on older Redshift versions, recovering full SQL text requires STL_QUERYTEXT: STL_QUERY truncates long statements, and STL_QUERYTEXT stores the full text split into 200-character rows that must be reassembled in sequence order.
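Reassembly is mechanical: group STL_QUERYTEXT rows by query ID, order by sequence number, and concatenate. A sketch, assuming rows arrive as (query_id, sequence, text) tuples and that Redshift's escaped newlines ("\n" as two literal characters) should be restored:

```python
from collections import defaultdict

# Sketch: reassemble full SQL text from Redshift's STL_QUERYTEXT, which
# splits each statement into 200-character rows ordered by the `sequence`
# column. Input rows are assumed to be (query_id, sequence, text) tuples.
def reassemble_queries(rows):
    parts = defaultdict(list)
    for query_id, sequence, text in rows:
        parts[query_id].append((sequence, text))
    return {
        qid: "".join(t for _, t in sorted(chunks)).replace("\\n", "\n").rstrip()
        for qid, chunks in parts.items()
    }
```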
One practical consideration with Redshift is that large-scale lineage extraction queries against STL and SVL system tables can consume compute resources on the leader node, which also serves user queries. Running lineage extraction during off-peak hours and using appropriate query resource groups is important for teams on capacity-constrained Redshift clusters.
Multi-Warehouse Environments
Many data teams do not run on a single warehouse. A common pattern is Redshift as the primary warehouse for established pipelines, with Snowflake or BigQuery in evaluation or in production for newer use cases. Another is BigQuery as the primary warehouse with Snowflake used by a specific team or function. Observability tooling needs to span the full data stack regardless of which warehouses are involved.
Decube supports Snowflake, BigQuery, and Redshift connections simultaneously, allowing a single lineage graph to span multiple warehouse environments. Cross-warehouse lineage edges — where data moves from one warehouse to another via an ETL or ELT process — are captured where the movement is visible in query history. This is particularly useful for teams in migration scenarios where data flows between an old and new warehouse for an extended period.
Observability Across Your Entire Stack
Decube connects to Snowflake, BigQuery, and Redshift in minutes.
Book a Demo