Use Cases

Real-world applications of clgraph for SQL pipeline development, testing, governance, and AI integration.


Four Pillars of clgraph

clgraph provides four complementary capabilities that address different aspects of SQL pipeline management:

  • SQL Debugging & Lineage

    Understand your data flow, trace issues to source, and confidently make changes with complete lineage visibility.

    Learn more

  • Multi-Environment Execution

    Test locally with DuckDB, deploy to BigQuery in production: same SQL, different backends, zero vendor lock-in.

    Learn more

  • Data Catalog & Governance

    Extract metadata from SQL comments, propagate PII tracking, and maintain compliance documentation.

    Learn more

  • AI & LLM Integration

    Generate documentation, build text-to-SQL systems, and create conversational data agents with schema-grounded AI.

    Learn more


Quick Comparison

Capability   What it solves                             Key benefit
-----------  -----------------------------------------  ------------------------------
Lineage      "Where does this data come from?"          Debug in seconds, not hours
Execution    "How do I test without touching prod?"     Local testing with DuckDB
Metadata     "What does this column mean?"               Auto-documented data catalog
AI/LLM       "How do I query this data?"                 Schema-grounded SQL generation

Example Pipeline

All use cases in this section share the same e-commerce pipeline example:

source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance

Try the examples yourself:

cd clgraph

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py

Use Case Details

SQL Debugging & Lineage

For data engineers who need to understand and debug complex SQL pipelines

Key capabilities:

  • Trace any column backward to its sources
  • Analyze impact of schema changes before making them
  • Understand query dependencies and execution order
  • Debug data quality issues by following the data path

Example:

from clgraph import Pipeline

# Load the e-commerce pipeline example
pipeline = Pipeline.from_sql_files("examples/sql_files", dialect="bigquery")

# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_value")
print(f"Found {len(sources)} source columns for lifetime_value")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
print(f"Changing total_amount affects {len(impacts)} columns")

Full guide


Multi-Environment Execution

For teams who want fast local testing and production-ready deployments

Key capabilities:

  • Run the same SQL against DuckDB locally or BigQuery in production
  • Generate fake data for testing
  • Integrate with CI/CD pipelines
  • Prevent vendor lock-in with portable SQL

Example:

# Mock executor for demonstration: records each query instead of running it.
# Reuses the `pipeline` loaded in the lineage example above.
executed_queries = []

def mock_executor(sql):
    executed_queries.append(sql)
    return {"status": "success"}

# Execute the pipeline
result = pipeline.run(executor=mock_executor, max_workers=1)
print(f"Executed {len(executed_queries)} queries in topological order")

Full guide


Data Catalog & Governance

For data teams building self-documenting data platforms with compliance tracking

Key capabilities:

  • Extract metadata from inline SQL comments
  • Automatically propagate PII flags through lineage
  • Query columns by owner, tag, or custom metadata
  • Generate compliance reports for GDPR/CCPA

Example:

# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())
print(f"Found {len(pii_columns)} PII columns")

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
print(f"Finance owns {len(finance_columns)} columns")

metric_columns = list(pipeline.get_columns_by_tag("metric"))
print(f"Found {len(metric_columns)} metric columns")

Full guide


AI & LLM Integration

For teams building AI-powered data tools and conversational interfaces

Key capabilities:

  • Generate column descriptions using LLMs with lineage context
  • Build text-to-SQL systems with schema grounding (no hallucinations)
  • Create conversational agents with LineageAgent
  • Expose pipeline to external AI via MCP server

Example:

from clgraph.agent import LineageAgent
from langchain_openai import ChatOpenAI

# Create conversational agent
agent = LineageAgent(pipeline, llm=ChatOpenAI(model="gpt-4"))

# Ask questions in natural language
result = agent.query("Where does revenue come from?")
print(result.answer)

# Generate SQL without hallucinations
result = agent.query("Write SQL to get monthly revenue by region")
print(result.data["sql"])

Full guide


Next Steps