API Reference

Complete API documentation for clgraph - SQL column lineage analysis and pipeline orchestration.

Quick Start

```python
from clgraph import Pipeline

# Build a pipeline from a list of SQL query strings
pipeline = Pipeline.from_sql_list(queries, dialect="bigquery")

# Trace lineage - find source columns
sources = pipeline.trace_column_backward("analytics.customer_metrics", "total_amount")
print(f"Source columns: {sources}")

# Forward impact analysis
impacts = pipeline.trace_column_forward("raw.orders", "amount")
print(f"Impacted columns: {impacts}")

# Export to JSON
data = pipeline.to_json()
print(f"Exported {len(data['columns'])} columns")
```

Core Classes

Pipeline

The main entry point for all lineage operations.

- Pipeline - Create pipelines, trace lineage, manage metadata, execute queries

Lineage

Classes for understanding data flow.

Validation

Classes for SQL quality validation.

Export

Export lineage data to various formats.
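The Quick Start shows that `pipeline.to_json()` returns a dict with a `columns` key. A minimal sketch of persisting such an export with the standard library - the sample dict below is a stand-in for illustration, not real clgraph output:

```python
import json

# Stand-in for the dict returned by pipeline.to_json(); only the
# "columns" key (seen in the Quick Start) is assumed here.
export = {
    "columns": [
        {"table": "raw.orders", "column": "amount"},
        {"table": "analytics.customer_metrics", "column": "total_amount"},
    ]
}

# Persist the export so downstream tools can consume it
with open("lineage.json", "w") as fh:
    json.dump(export, fh, indent=2)

# Reload and verify the round-trip
with open("lineage.json") as fh:
    reloaded = json.load(fh)
print(f"Exported {len(reloaded['columns'])} columns")
```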

Visualization

Create GraphViz visualizations.
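As a rough illustration of the kind of GraphViz DOT output such helpers render, here is a hand-rolled stdlib sketch - this is not the clgraph API, and the edge list is a made-up sample:

```python
# Build a GraphViz DOT string from column-level lineage edges.
# The edge list is a made-up sample, not real clgraph output.
edges = [
    ("raw.orders.amount", "analytics.customer_metrics.total_amount"),
]

lines = ["digraph lineage {", "  rankdir=LR;"]
for src, dst in edges:
    lines.append(f'  "{src}" -> "{dst}";')
lines.append("}")
dot = "\n".join(lines)
print(dot)
```

The resulting string can be rendered with the `dot` command-line tool, e.g. `dot -Tpng lineage.dot -o lineage.png`.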

Comparison

Track changes between pipeline versions.
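One simple way to think about version comparison is as a set difference over exported column identifiers. A hand-rolled sketch of that idea over two stand-in exports - this illustrates the concept, not the clgraph comparison API:

```python
# Two stand-in exports from different pipeline versions (made-up data).
old = {"columns": [{"table": "raw.orders", "column": "amount"}]}
new = {
    "columns": [
        {"table": "raw.orders", "column": "amount"},
        {"table": "raw.orders", "column": "discount"},
    ]
}

def column_ids(export):
    """Flatten an export dict into a set of 'table.column' identifiers."""
    return {f"{c['table']}.{c['column']}" for c in export["columns"]}

added = column_ids(new) - column_ids(old)
removed = column_ids(old) - column_ids(new)
print(f"Added: {sorted(added)}, removed: {sorted(removed)}")
```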

LLM-Powered Features

Natural language interfaces and AI-powered tools.

Common Imports

```python
from clgraph import (
    # Main entry point
    Pipeline,

    # Single-query lineage
    SQLColumnTracer,

    # Export formats
    JSONExporter,
    CSVExporter,

    # Visualization functions
    visualize_pipeline_lineage,
    visualize_table_dependencies,
    visualize_lineage_path,
)

# LLM features
from clgraph.agent import LineageAgent
from clgraph.tools import (
    TraceBackwardTool,
    TraceForwardTool,
    ListTablesTool,
    GenerateSQLTool,  # Requires LLM
)
```

SQL Dialects

clgraph supports multiple SQL dialects via sqlglot:

```python
# BigQuery (default)
pipeline = Pipeline.from_sql_list(queries, dialect="bigquery")

# Snowflake
pipeline = Pipeline.from_sql_list(queries, dialect="snowflake")

# PostgreSQL
pipeline = Pipeline.from_sql_list(queries, dialect="postgres")

# Other: mysql, redshift, spark, duckdb, etc.
```