Getting Started

qualink (short for "quality link") is a fast data quality framework for Python, built on Apache DataFusion. The name reflects the project's mission: linking your data to quality by connecting raw datasets to explicit validation rules. It lets you define, run, and report data-quality checks against CSV, Parquet, JSON, or any other data source supported by DataFusion, all powered by SQL under the hood.

Why qualink?

Architecture Overview

┌─────────────────────────────────────────────────┐
│              ValidationSuite                    │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐   │
│  │  Check 1  │  │  Check 2  │  │  Check N  │   │
│  │  (ERROR)  │  │ (WARNING) │  │  (INFO)   │   │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘   │
│        │              │              │          │
│  ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐   │
│  │Constraints│  │Constraints│  │Constraints│   │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘   │
│        │              │              │          │
│        └──────────────┼──────────────┘          │
│                       ▼                         │
│              Apache DataFusion                  │
│            (SQL query engine)                   │
└─────────────────────────────────────────────────┘
                       │
                       ▼
              ValidationResult
              ┌───────────────┐
              │ success: bool │
              │ status: str   │
              │ report: ...   │
              └───────────────┘
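
To make the lower half of the diagram concrete, here is a minimal sketch of how a single constraint might be reduced to a SQL query and evaluated. It is not qualink's API: the helper `completeness_sql` and the `users` table are hypothetical, and Python's built-in `sqlite3` stands in for Apache DataFusion so the example runs without extra dependencies.

```python
import sqlite3


def completeness_sql(table: str, column: str) -> str:
    """Build a query returning the fraction of non-null values in a column.

    Hypothetical helper for illustration; identifiers are interpolated
    without escaping, so pass trusted names only.
    """
    # COUNT(column) skips NULLs, while COUNT(*) counts every row.
    return f'SELECT CAST(COUNT("{column}") AS REAL) / COUNT(*) FROM "{table}"'


# Assumption: sqlite3 plays the role of the SQL engine in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, "d@example.com")],
)

# Run the generated SQL and compare the metric against a threshold.
ratio = conn.execute(completeness_sql("users", "email")).fetchone()[0]
passed = ratio >= 0.95
print(f"completeness={ratio:.2f} passed={passed}")
```

One of the four `email` values is NULL, so the metric comes out at 0.75 and the constraint fails against a 0.95 threshold.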

Key Concepts

Concept          Description
---------------  ----------------------------------------------------
ValidationSuite  Top-level entry point that orchestrates checks
Check            A named group of constraints with a severity level
Constraint       A single validation rule (e.g., completeness ≥ 0.95)
Assertion        A reusable predicate for numeric comparisons
Level            Severity: ERROR, WARNING, or INFO
Formatter        Converts results to Human, JSON, or Markdown output
qualinkctl       CLI tool to run YAML validations from the terminal
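
The relationships between these concepts can be sketched in plain Python. Every name below (`greater_or_equal`, `completeness`, `run_suite`, the dataclass fields) is illustrative only and is not taken from qualink's actual API. In particular, the rule that only failed ERROR-level checks make `success` false is an assumption made for this sketch.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Level(Enum):
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


# An assertion is a reusable predicate over a numeric metric value.
Assertion = Callable[[float], bool]
Metric = Callable[[list], float]


def greater_or_equal(threshold: float) -> Assertion:
    return lambda value: value >= threshold


def completeness(column: str) -> Metric:
    """Metric: fraction of rows whose `column` is not None."""
    def metric(rows: list) -> float:
        if not rows:
            return 1.0
        return sum(row.get(column) is not None for row in rows) / len(rows)
    return metric


@dataclass
class Constraint:
    name: str
    metric: Metric
    assertion: Assertion


@dataclass
class Check:
    name: str
    level: Level
    constraints: list = field(default_factory=list)


def run_suite(checks: list, rows: list) -> dict:
    """Evaluate every constraint and roll failures up into one result."""
    failures = []
    for check in checks:
        for constraint in check.constraints:
            value = constraint.metric(rows)
            if not constraint.assertion(value):
                failures.append({"check": check.name, "constraint": constraint.name,
                                 "level": check.level.name, "value": value})
    failed_levels = {f["level"] for f in failures}
    # Assumed roll-up rule: the most severe failed level decides the status.
    if "ERROR" in failed_levels:
        status = "error"
    elif "WARNING" in failed_levels:
        status = "warning"
    else:
        status = "success"
    return {"success": "ERROR" not in failed_levels, "status": status, "failures": failures}


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None},
        {"id": 3, "email": "c@example.com"}, {"id": 4, "email": "d@example.com"}]
checks = [
    Check("critical", Level.ERROR,
          [Constraint("id completeness", completeness("id"), greater_or_equal(1.0))]),
    Check("quality", Level.WARNING,
          [Constraint("email completeness", completeness("email"), greater_or_equal(0.95))]),
]
result = run_suite(checks, rows)
print(json.dumps(result, indent=2))  # a JSON-style rendering of the result
```

Here the ERROR-level check passes and the WARNING-level check fails, so under the assumed roll-up rule the result reports `success` as true with a status of `"warning"`.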

Next Steps