# Use Cases
Real-world applications of clgraph for SQL pipeline development, testing, governance, and AI integration.
## Four Pillars of clgraph
clgraph provides four complementary capabilities that address different aspects of SQL pipeline management:
- **SQL Debugging & Lineage:** Understand your data flow, trace issues to source, and confidently make changes with complete lineage visibility.
- **Multi-Environment Execution:** Test locally with DuckDB and deploy to BigQuery in production; same SQL, different backends, zero vendor lock-in.
- **Data Catalog & Governance:** Extract metadata from SQL comments, propagate PII tracking, and maintain compliance documentation.
- **AI & LLM Integration:** Generate documentation, build text-to-SQL systems, and create conversational data agents with schema-grounded AI.
## Quick Comparison
| Capability | What it solves | Key benefit |
|---|---|---|
| Lineage | "Where does this data come from?" | Debug in seconds, not hours |
| Execution | "How do I test without touching prod?" | Local testing with DuckDB |
| Metadata | "What does this column mean?" | Auto-documented data catalog |
| AI/LLM | "How do I query this data?" | Schema-grounded SQL generation |
## Example Pipeline
All use cases in this section use an e-commerce pipeline example:
```
source_orders ─────┐
source_customers ──┼──> raw_* tables ──> stg_orders_enriched ──┬──> int_daily_metrics
source_products ───┤                                           ├──> mart_customer_ltv
source_order_items─┘                                           └──> mart_product_performance
```
Try the examples yourself:
```bash
cd clgraph

# Lineage analysis (no execution)
uv run python examples/sql_files/run_lineage.py

# Full execution with DuckDB
uv run python examples/sql_files/run_with_duckdb.py

# Metadata management
uv run python examples/sql_files/run_metadata.py
```
## Use Case Details
### SQL Debugging & Lineage

*For data engineers who need to understand and debug complex SQL pipelines*
Key capabilities:
- Trace any column backward to its sources
- Analyze impact of schema changes before making them
- Understand query dependencies and execution order
- Debug data quality issues by following the data path
Example:
```python
from clgraph import Pipeline

# Load the e-commerce pipeline example
pipeline = Pipeline.from_sql_files("examples/sql_files", dialect="bigquery")

# Where does this metric come from?
sources = pipeline.trace_column_backward("mart_customer_ltv", "lifetime_value")
print(f"Found {len(sources)} source columns for lifetime_value")

# What breaks if I rename this column?
impacts = pipeline.trace_column_forward("raw_orders", "total_amount")
print(f"Changing total_amount affects {len(impacts)} columns")
```
### Multi-Environment Execution

*For teams who want fast local testing and production-ready deployments*
Key capabilities:
- Run the same SQL against DuckDB locally or BigQuery in production
- Generate fake data for testing
- Integrate with CI/CD pipelines
- Prevent vendor lock-in with portable SQL
Example:
```python
# Mock executor for demonstration
executed_queries = []

def mock_executor(sql):
    executed_queries.append(sql)
    return {"status": "success"}

# Execute the pipeline
result = pipeline.run(executor=mock_executor, max_workers=1)
print(f"Executed {len(executed_queries)} queries in topological order")
```
### Data Catalog & Governance

*For data teams building self-documenting data platforms with compliance tracking*
Key capabilities:
- Extract metadata from inline SQL comments
- Automatically propagate PII flags through lineage
- Query columns by owner, tag, or custom metadata
- Generate compliance reports for GDPR/CCPA
Example:
```python
# PII propagates through transformations
pipeline.propagate_all_metadata()
pii_columns = list(pipeline.get_pii_columns())
print(f"Found {len(pii_columns)} PII columns")

# Find columns by metadata
finance_columns = list(pipeline.get_columns_by_owner("finance"))
print(f"Finance owns {len(finance_columns)} columns")

metric_columns = list(pipeline.get_columns_by_tag("metric"))
print(f"Found {len(metric_columns)} metric columns")
```
AI & LLM Integration
For teams building AI-powered data tools and conversational interfaces
Key capabilities:
- Generate column descriptions using LLMs with lineage context
- Build text-to-SQL systems with schema grounding, so generated queries reference only tables and columns that actually exist
- Create conversational agents with LineageAgent
- Expose pipeline to external AI via MCP server
Example:
```python
from clgraph.agent import LineageAgent
from langchain_openai import ChatOpenAI

# Create conversational agent
agent = LineageAgent(pipeline, llm=ChatOpenAI(model="gpt-4"))

# Ask questions in natural language
result = agent.query("Where does revenue come from?")
print(result.answer)

# Generate SQL without hallucinations
result = agent.query("Write SQL to get monthly revenue by region")
print(result.data["sql"])
```
## Next Steps
- Installation - Get clgraph installed
- Quick Start - Build your first pipeline
- Concepts - Understand the fundamentals