qualink

quality + link: linking your data to quality.

Blazing-fast data quality framework for Python, built on Apache DataFusion.

$ uv add qualink
🚀

High Performance

Leverages Apache DataFusion for blazing-fast SQL-based data quality checks with zero-copy Arrow processing.
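SQL-based checks are fast largely because many data quality metrics reduce to aggregates that an engine can compute together in one scan of the table. The sketch below illustrates that idea only. It uses Python's built-in `sqlite3` as a stand-in for DataFusion so it runs without dependencies, and the `users` table, its columns, and the query are hypothetical rather than the SQL qualink generates.

```python
import sqlite3


def table_metrics(conn: sqlite3.Connection) -> dict:
    # One aggregate query computes several metrics in a single pass.
    # COUNT(col) skips NULLs, so it doubles as a non-null counter.
    row_count, user_id_non_null, email_non_null, email_distinct = conn.execute(
        """
        SELECT
            COUNT(*),
            COUNT(user_id),
            COUNT(email),
            COUNT(DISTINCT email)
        FROM users
        """
    ).fetchone()
    return {
        "row_count": row_count,
        # Completeness: fraction of rows where user_id is not NULL.
        "user_id_completeness": user_id_non_null / row_count if row_count else 0.0,
        # No duplicates among the non-null email values.
        "email_has_no_duplicates": email_distinct == email_non_null,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (None, "a@example.com")],
)
metrics = table_metrics(conn)
print(metrics)
```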

🔧

25+ Built-in Constraints

Completeness, uniqueness, statistics, patterns, formats, cross-table comparisons, and more, all ready to use.
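At their core, constraints like these pair a metric computed over a column with a pass/fail decision. A minimal pure-Python sketch of three of the listed categories, assuming a column is a plain list with `None` for missing values; the function names are hypothetical and are not qualink's constraint API:

```python
import re
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[object]]) -> float:
    """Fraction of values that are not None."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def has_no_duplicates(values: Sequence[Optional[object]]) -> bool:
    """True if no non-null value appears more than once."""
    present = [v for v in values if v is not None]
    return len(present) == len(set(present))


def pattern_match_ratio(values: Sequence[Optional[str]], pattern: str) -> float:
    """Fraction of non-null values that fully match the regex."""
    regex = re.compile(pattern)
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(bool(regex.fullmatch(v)) for v in present) / len(present)


emails = ["a@example.com", "b@example.com", None, "not-an-email"]
print(completeness(emails))       # 0.75
print(has_no_duplicates(emails))  # True
print(pattern_match_ratio(emails, r"[^@\s]+@[^@\s]+\.[a-z]+"))
```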

📄

YAML Configuration

Define your entire validation suite declaratively in YAML; no code is required for standard checks.
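Declarative configuration generally works by mapping each entry in the file to a registered check implementation. The sketch below assumes a hypothetical schema (the `type`, `column`, and `value` keys are invented for illustration and are not qualink's YAML format) and uses a plain dict in place of a parsed YAML document, so no YAML library is needed:

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]

# Hypothetical config, shaped like what a YAML loader might return.
# This is NOT qualink's schema; it only illustrates the declarative idea.
CONFIG = {
    "table": "users",
    "checks": [
        {"type": "is_complete", "column": "user_id"},
        {"type": "is_unique", "column": "email"},
        {"type": "min_rows", "value": 1},
    ],
}


def _is_complete(rows: List[Row], spec: dict) -> bool:
    return all(r.get(spec["column"]) is not None for r in rows)


def _is_unique(rows: List[Row], spec: dict) -> bool:
    vals = [r.get(spec["column"]) for r in rows if r.get(spec["column"]) is not None]
    return len(vals) == len(set(vals))


def _min_rows(rows: List[Row], spec: dict) -> bool:
    return len(rows) >= spec["value"]


# Registry mapping each declarative "type" to the function that evaluates it.
REGISTRY: Dict[str, Callable[[List[Row], dict], bool]] = {
    "is_complete": _is_complete,
    "is_unique": _is_unique,
    "min_rows": _min_rows,
}


def run_config(config: dict, rows: List[Row]) -> Dict[str, bool]:
    results = {}
    for i, spec in enumerate(config["checks"]):
        fn = REGISTRY[spec["type"]]  # raises KeyError for unknown check types
        results[f"{i}:{spec['type']}"] = fn(rows, spec)
    return results


rows = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "a@example.com"},
]
print(run_config(CONFIG, rows))
```

Keeping the mapping in a registry means new check types can be supported without changing how the configuration is parsed.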

⚡

Async First

Built with asyncio for non-blocking execution. Run checks sequentially or in parallel.
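The difference between sequential and parallel execution can be sketched with plain `asyncio`. The `fake_check` coroutine is a hypothetical stand-in that simply sleeps; it does not reflect how qualink schedules work internally:

```python
import asyncio
import time


async def fake_check(name: str, delay: float) -> tuple:
    # Stand-in for an I/O-bound check (e.g. waiting on a query engine).
    await asyncio.sleep(delay)
    return (name, True)


async def run_sequential(checks):
    # Each check starts only after the previous one has finished.
    results = []
    for name, delay in checks:
        results.append(await fake_check(name, delay))
    return results


async def run_parallel(checks):
    # gather() runs the coroutines concurrently and preserves input order.
    return list(await asyncio.gather(*(fake_check(n, d) for n, d in checks)))


async def main():
    checks = [("completeness", 0.1), ("uniqueness", 0.1), ("size", 0.1)]
    t0 = time.perf_counter()
    seq = await run_sequential(checks)
    t1 = time.perf_counter()
    par = await run_parallel(checks)
    t2 = time.perf_counter()
    return seq, par, t1 - t0, t2 - t1


seq, par, seq_time, par_time = asyncio.run(main())
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Concurrency only pays off when checks actually await I/O or work offloaded from the event loop; CPU-bound Python code inside a coroutine still runs one coroutine at a time.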

📊

Multiple Formatters

Output results as human-readable text, JSON for pipelines, or Markdown for reports.
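The formatters render the same validation result in different shapes. A minimal sketch of that idea, using a hypothetical `CheckOutcome` record rather than qualink's actual result type:

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CheckOutcome:
    # Hypothetical result record; qualink's result objects may differ.
    check: str
    constraint: str
    passed: bool


def to_text(results: List[CheckOutcome]) -> str:
    # One human-readable line per constraint.
    return "\n".join(
        f"[{'PASS' if r.passed else 'FAIL'}] {r.check}: {r.constraint}" for r in results
    )


def to_json(results: List[CheckOutcome]) -> str:
    # Machine-readable output for pipelines.
    return json.dumps([asdict(r) for r in results])


def to_markdown(results: List[CheckOutcome]) -> str:
    # A Markdown table suitable for reports or PR comments.
    lines = ["| Check | Constraint | Status |", "| --- | --- | --- |"]
    lines += [
        f"| {r.check} | {r.constraint} | {'PASS' if r.passed else 'FAIL'} |"
        for r in results
    ]
    return "\n".join(lines)


results = [
    CheckOutcome("Critical", "is_complete(user_id)", True),
    CheckOutcome("Critical", "is_unique(email)", False),
]
print(to_text(results))
print(to_markdown(results))
```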

🖥️

CLI – qualinkctl

Run validations from the terminal with a single command. Perfect for CI/CD pipelines and automation.

🏗️

Fluent Builder API

Chain methods to define checks with a clean, readable, Pythonic builder pattern.
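The fluent style relies on each builder method returning the builder itself, with a final `build()` producing the finished object. The sketch below is a hypothetical miniature of that pattern (the class names and the default `WARNING` level are assumptions), not qualink's `Check.builder` implementation:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleCheck:
    name: str
    level: str
    constraints: Tuple[str, ...]


class SimpleCheckBuilder:
    """Hypothetical builder: each method records state and returns self."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._level = "WARNING"  # assumed default, for illustration only
        self._constraints: List[str] = []

    def with_level(self, level: str) -> "SimpleCheckBuilder":
        self._level = level
        return self  # returning self is what enables chaining

    def is_complete(self, column: str) -> "SimpleCheckBuilder":
        self._constraints.append(f"is_complete({column})")
        return self

    def is_unique(self, column: str) -> "SimpleCheckBuilder":
        self._constraints.append(f"is_unique({column})")
        return self

    def build(self) -> SimpleCheck:
        # Freeze the accumulated state into an immutable object.
        return SimpleCheck(self._name, self._level, tuple(self._constraints))


check = (
    SimpleCheckBuilder("Critical")
    .with_level("ERROR")
    .is_complete("user_id")
    .is_unique("email")
    .build()
)
print(check)
```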

Quick Example

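import asyncio

from datafusion import SessionContext
from qualink.checks import Check, Level
from qualink.constraints import Assertion
from qualink.core import ValidationSuite
from qualink.formatters import MarkdownFormatter


async def main():
    # Register users.csv with DataFusion as a table named "users"
    ctx = SessionContext()
    ctx.register_csv("users", "users.csv")

    # Build a suite with one ERROR-level check and run it against the table
    result = await (
        ValidationSuite()
        .on_data(ctx, "users")
        .with_name("User Data Quality")
        .add_check(
            Check.builder("Critical")
            .with_level(Level.ERROR)
            .is_complete("user_id")               # user_id must have no nulls
            .is_unique("email")                   # email values must not repeat
            .has_size(Assertion.greater_than(0))  # table must contain at least one row
            .build()
        )
        .run()
    )

    # Render the validation result as a Markdown report
    print(MarkdownFormatter().format(result))


if __name__ == "__main__":
    asyncio.run(main())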

⚡ Benchmark Highlights

Real-world validation on NYC Yellow Taxi trip data.

42M records · 654 MB of Parquet data · 92 constraints · 1.44 s engine time

12 check groups · 98.9% pass rate · powered by Apache DataFusion
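For a rough sense of scale, the headline figures imply the following rates. This assumes the 98.9% pass rate is measured over the 92 constraints and uses the rounded 42M and 654 MB values, so the results are approximations rather than separately reported measurements:

```python
records = 42_000_000   # rounded "42M" from the benchmark figures
data_mb = 654          # Parquet data size, MB
constraints = 92
engine_seconds = 1.44

rows_per_second = records / engine_seconds  # ~29.2 million rows/s
mb_per_second = data_mb / engine_seconds    # ~454 MB/s of Parquet
passed = round(constraints * 0.989)         # 91 constraints passing

print(
    f"{rows_per_second / 1e6:.1f}M rows/s, "
    f"{mb_per_second:.0f} MB/s, {passed}/{constraints} passed"
)
```

Under that assumption, a 98.9% pass rate corresponds to a single failing constraint (91/92 ≈ 98.9%).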

🧭 Available Now

Profile, persist, monitor, and bootstrap data quality workflows with features already available in qualink.

📈 Analyzers

Compute reusable dataset and column metrics before turning them into checks.
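An analyzer computes metrics once so that several checks can reuse them without rescanning the data. A minimal pure-Python sketch, with a hypothetical `analyze_column` function and made-up sample values; qualink's analyzer API is not shown here:

```python
import statistics
from typing import Dict, List, Optional


def analyze_column(values: List[Optional[float]]) -> Dict[str, float]:
    """Compute a small, reusable metric profile for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "size": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
        "min": min(present) if present else float("nan"),
        "max": max(present) if present else float("nan"),
        "mean": statistics.fmean(present) if present else float("nan"),
    }


fares = [12.5, 7.0, None, 30.0]
metrics = analyze_column(fares)

# The same metric dictionary backs several checks without rescanning the data.
checks = {
    "mostly_complete": metrics["completeness"] >= 0.7,
    "non_negative": metrics["min"] >= 0,
    "plausible_mean": 0 < metrics["mean"] < 100,
}
print(metrics, checks)
```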

🗄️ Metrics Repository

Persist analyzer outputs over time to track quality trends, regressions, and baselines.
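A metrics repository is essentially an append-only, time-indexed store of analyzer outputs. The sketch below assumes a hypothetical `MetricsRepository` class backed by an in-memory SQLite table; qualink's own repository interface and storage backends may differ:

```python
import sqlite3
from datetime import date
from typing import List, Tuple


class MetricsRepository:
    """Hypothetical append-only store of (run_date, name, value) rows."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics (run_date TEXT, name TEXT, value REAL)"
        )

    def save(self, run_date: date, name: str, value: float) -> None:
        self.conn.execute(
            "INSERT INTO metrics VALUES (?, ?, ?)",
            (run_date.isoformat(), name, value),
        )
        self.conn.commit()

    def history(self, name: str) -> List[Tuple[str, float]]:
        # ISO-8601 date strings sort lexicographically in chronological order.
        return self.conn.execute(
            "SELECT run_date, value FROM metrics WHERE name = ? ORDER BY run_date",
            (name,),
        ).fetchall()


repo = MetricsRepository()
repo.save(date(2024, 1, 2), "row_count", 1_050)
repo.save(date(2024, 1, 1), "row_count", 1_000)
repo.save(date(2024, 1, 1), "completeness.user_id", 1.0)
print(repo.history("row_count"))
```

A time-ordered history like this is what baselines and anomaly detection read from.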

🔍 Anomaly Detection

Detect unexpected metric shifts using rate-of-change and z-score strategies.
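Both strategies compare a new metric value against earlier ones. The sketch below is a minimal pure-Python version of each, with hypothetical function names; the thresholds and the handling of zero baselines are assumptions, not qualink's defaults:

```python
import statistics
from typing import Sequence


def rate_of_change_anomaly(previous: float, current: float, max_rate: float) -> bool:
    """Flag if the relative change from previous to current exceeds max_rate."""
    if previous == 0:
        return current != 0  # assumption: any move away from zero is anomalous
    return abs(current - previous) / abs(previous) > max_rate


def z_score_anomaly(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag if current lies more than `threshold` sample std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold


row_counts = [1000, 1010, 990, 1005, 995]
print(rate_of_change_anomaly(row_counts[-1], 1500, max_rate=0.2))  # True (+50.8%)
print(z_score_anomaly(row_counts, 1002))                           # False
print(z_score_anomaly(row_counts, 1500))                           # True
```

Rate of change reacts to sudden jumps between consecutive runs, while the z-score accounts for how much the metric normally varies across its history.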

💡 Intelligent Rule Suggestions

Generate candidate qualink rules from profiling results to bootstrap validation suites faster.
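Rule suggestion can be as simple as turning profiling facts into candidate constraints. The heuristics below are hypothetical (they are not qualink's suggestion logic), and the returned strings only mirror the `is_complete` and `is_unique` builder methods shown in the Quick Example:

```python
from typing import Dict, List


def suggest_rules(column: str, profile: Dict[str, float]) -> List[str]:
    """Turn simple profile facts into candidate rule strings (hypothetical heuristics)."""
    suggestions = []
    # Fully populated column: propose a completeness rule.
    if profile["completeness"] == 1.0:
        suggestions.append(f"is_complete({column})")
    # Every value distinct: propose a uniqueness rule.
    if profile["size"] > 0 and profile["distinct"] == profile["size"]:
        suggestions.append(f"is_unique({column})")
    # Numeric column with no negatives observed: propose a lower bound.
    if profile.get("min", -1) >= 0:
        suggestions.append(f"min({column}) >= 0")
    return suggestions


profiles = {
    "user_id": {"size": 100, "completeness": 1.0, "distinct": 100, "min": 1},
    "email": {"size": 100, "completeness": 0.98, "distinct": 97},
}
for col, prof in profiles.items():
    print(col, suggest_rules(col, prof))
```

Suggestions like these are worth reviewing before adoption: a column that happens to be fully distinct in one snapshot is not necessarily unique by design.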