Use Cases

Real-world applications of clgraph for SQL pipeline development, testing, governance, and AI integration.


Four Pillars of clgraph

clgraph provides four complementary capabilities that address different aspects of SQL pipeline management:

  • SQL Debugging & Lineage

    Understand your data flow, trace issues to source, and confidently make changes with complete lineage visibility.

    Learn more

  • Multi-Environment Execution

    Test locally with DuckDB, deploy to BigQuery in production: same SQL, different backends, zero vendor lock-in.

    Learn more

  • Data Catalog & Governance

    Extract metadata from SQL comments, propagate PII tracking, and maintain compliance documentation.

    Learn more

  • AI & LLM Integration

    Generate documentation, build text-to-SQL systems, and create conversational data agents with schema-grounded AI.

    Learn more


Quick Comparison

Capability   What it solves                             Key benefit
-----------  -----------------------------------------  ------------------------------
Lineage      "Where does this data come from?"          Debug in seconds, not hours
Execution    "How do I test without touching prod?"     Local testing with DuckDB
Metadata     "What does this column mean?"               Auto-documented data catalog
AI/LLM       "How do I query this data?"                 Schema-grounded SQL generation

Example Pipeline

All use cases in this section share the same e-commerce pipeline example:

source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance

Try the examples yourself:

cd clgraph

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py

Use Case Details

SQL Debugging & Lineage

For data engineers who need to understand and debug complex SQL pipelines

Key capabilities:

  • Trace any column backward to its sources
  • Analyze impact of schema changes before making them
  • Understand query dependencies and execution order
  • Debug data quality issues by following the data path

Example:

from clgraph import Pipeline

# Load the e-commerce pipeline example
pipeline = Pipeline.from_sql_files("examples/sql_files", dialect="bigquery")

# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_value")
print(f"Found {len(sources)} source columns for lifetime_value")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
print(f"Changing total_amount affects {len(impacts)} columns")

Full guide


Multi-Environment Execution

For teams who want fast local testing and production-ready deployments

Key capabilities:

  • Run the same SQL against DuckDB locally or BigQuery in production
  • Generate fake data for testing
  • Integrate with CI/CD pipelines
  • Prevent vendor lock-in with portable SQL

Example:

# Mock executor for demonstration: records each query instead of running it.
# Reuses the `pipeline` loaded in the lineage example above.
executed_queries = []

def mock_executor(sql):
    executed_queries.append(sql)
    return {"status": "success"}

# Execute the pipeline
result = pipeline.run(executor=mock_executor, max_workers=1)
print(f"Executed {len(executed_queries)} queries in topological order")

Full guide


Data Catalog & Governance

For data teams building self-documenting data platforms with compliance tracking

Key capabilities:

  • Extract metadata from inline SQL comments
  • Automatically propagate PII flags through lineage
  • Query columns by owner, tag, or custom metadata
  • Generate compliance reports for GDPR/CCPA

Example:

# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())
print(f"Found {len(pii_columns)} PII columns")

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
print(f"Finance owns {len(finance_columns)} columns")

metric_columns = list(pipeline.get_columns_by_tag("metric"))
print(f"Found {len(metric_columns)} metric columns")

Full guide


AI & LLM Integration

For teams building AI-powered data tools and conversational interfaces

Key capabilities:

  • Generate column descriptions using LLMs with lineage context
  • Build text-to-SQL systems with schema grounding (no hallucinations)
  • Create conversational agents with LineageAgent
  • Expose pipeline to external AI via MCP server

Example:

from clgraph.agent import LineageAgent
from langchain_openai import ChatOpenAI

# Create conversational agent
agent = LineageAgent(pipeline, llm=ChatOpenAI(model="gpt-4"))

# Ask questions in natural language
result = agent.query("Where does revenue come from?")
print(result.answer)

# Generate SQL without hallucinations
result = agent.query("Write SQL to get monthly revenue by region")
print(result.data["sql"])

Full guide


Next Steps