Release Notes

All notable changes to qualink (quality + link — linking your data to quality) are documented here.


v0.0.3 — Next Release

Status: Upcoming

This release expands qualink beyond core validation with metric profiling, historical monitoring, richer datasource support, secret-backed connections, and improved CLI/reporting workflows.

✨ Highlights


📈 Metrics and Monitoring

| Feature | Description |
| --- | --- |
| AnalysisRunner | Runs analyzers against a registered table and collects metrics into one context. |
| AnalyzerContext | Stores analyzer metrics, metadata, and any per-analyzer errors. |
| InMemoryMetricsRepository | Lightweight in-process storage for analyzer history. |
| FileSystemMetricsRepository | JSON-backed repository for persisted metric history. |
| AnomalyDetectionRunner | Loads historical metrics and applies one or more anomaly strategies per metric. |
| RelativeRateOfChangeStrategy | Detects sudden jumps from the previous metric value. |
| ZScoreStrategy | Detects deviations from the historical mean using standard deviation. |
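
Both strategies are standard anomaly-detection techniques. The sketch below illustrates what each one tests, written in plain Python rather than through qualink's own API. The function names, parameter names, and default thresholds are hypothetical and chosen only for illustration. Relative rate of change compares the new value to the previous one as a ratio. The z-score measures how many sample standard deviations the new value lies from the historical mean.

```python
from statistics import mean, stdev


def relative_rate_of_change_anomaly(
    history: list[float],
    current: float,
    max_rate_increase: float = 2.0,  # hypothetical: more than doubling is anomalous
    max_rate_decrease: float = 0.5,  # hypothetical: more than halving is anomalous
) -> bool:
    """Flag `current` if its ratio to the previous value leaves the allowed band."""
    if not history:
        return False  # nothing to compare against yet
    previous = history[-1]
    if previous == 0:
        # A ratio is undefined here; treat any change away from zero as a jump.
        return current != 0
    ratio = current / previous
    return not (max_rate_decrease <= ratio <= max_rate_increase)


def z_score_anomaly(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # the sample standard deviation needs at least two points
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # a flat history makes any change an outlier
    return abs(current - mu) / sigma > threshold


# Daily row counts for one table (illustrative numbers).
row_counts = [1000.0, 1020.0, 990.0, 1010.0, 1005.0]
print(relative_rate_of_change_anomaly(row_counts, 2500.0))  # True: ratio is about 2.49
print(z_score_anomaly(row_counts, 1015.0))  # False: about 0.89 standard deviations
```

The ratio test reacts to a single step change even when the history is noisy, while the z-score test needs a few points of history but tolerates gradual drift that stays within the usual spread.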

💡 Rule Bootstrapping


🗂️ Datasources and Connections


🖥️ CLI and Output Improvements


📚 Documentation and Examples


v0.0.1 — Initial Release

Released: March 2026

The first public release of qualink — a blazing-fast data quality framework for Python, built on Apache DataFusion.

✨ Highlights


🏗️ Core Framework

| Component | Description |
| --- | --- |
| ValidationSuite | Orchestrates checks against a DataFusion table with sequential or parallel execution. |
| Check / CheckBuilder | Groups constraints under a severity level with a fluent builder pattern. |
| Constraint | Base class for all quality rules; easily extensible. |
| Level | Three severity levels: ERROR, WARNING, INFO. |
| ValidationResult | Structured result with overall status, per-check breakdown, and execution timing. |
| LoggingMixin | Configurable structured logging for debugging and observability. |
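
To illustrate how severity levels typically roll up into an overall status, here is a minimal sketch. It assumes a Deequ-style rule in which the suite status is the most severe level among failing checks. The `Severity` enum, `CheckOutcome` dataclass, and `overall_status` function are hypothetical stand-ins, not qualink's `Level`, `Check`, or `ValidationResult`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Ordered from least to most severe so max() picks the worst failure.
    INFO = 0
    WARNING = 1
    ERROR = 2


@dataclass
class CheckOutcome:
    name: str
    severity: Severity
    passed: bool


def overall_status(outcomes: list[CheckOutcome]) -> str:
    """Return "SUCCESS", or the name of the most severe level among failed checks."""
    failed = [o.severity for o in outcomes if not o.passed]
    if not failed:
        return "SUCCESS"
    return max(failed).name


outcomes = [
    CheckOutcome("order_id is complete", Severity.ERROR, passed=True),
    CheckOutcome("email format", Severity.WARNING, passed=False),
    CheckOutcome("row count drift", Severity.INFO, passed=False),
]
print(overall_status(outcomes))  # WARNING
```

Under this assumed rule, a passing ERROR-level check never raises the status, and a failing INFO-level check is reported without escalating a suite that otherwise passed.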

📏 Constraints

Data Quality Checks

| Constraint | Description |
| --- | --- |
| Completeness | Asserts that a column's non-null ratio meets a threshold. |
| Uniqueness | Asserts that a column's values are unique. |
| UniqueValueRatio | Checks the ratio of distinct values to total values. |
| Distinctness | Asserts that the count of distinct values satisfies a condition. |
| ApproxCountDistinct | Estimates the distinct count using HyperLogLog. |
| Size | Validates the total row count of a table. |
| ColumnCount | Validates the number of columns. |
| ColumnExists | Asserts that a specific column exists. |
| Statistics | Checks min, max, mean, median, standard deviation, and sum. |
| ApproxQuantile | Validates approximate quantile (percentile) values. |
| Compliance | Asserts that a SQL condition holds for a given fraction of rows. |
| CustomSQL | Runs an arbitrary SQL expression as a quality check. |
| Correlation | Checks the correlation between two numeric columns. |
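
Several of these constraints reduce to simple ratios. The plain-Python sketch below shows plausible definitions for Completeness, Uniqueness, and Compliance over in-memory values. The helper names and the empty-input conventions are assumptions, and qualink itself runs its checks against a DataFusion table rather than Python lists. The docstrings give equivalent SQL aggregate expressions.

```python
from typing import Any, Callable


def completeness(values: list[Any]) -> float:
    """Non-null fraction. SQL: COUNT(col) * 1.0 / COUNT(*)."""
    if not values:
        return 1.0  # assumption: an empty column counts as complete
    return sum(v is not None for v in values) / len(values)


def is_unique(values: list[Any]) -> bool:
    """No non-null value repeats. SQL: COUNT(DISTINCT col) = COUNT(col)."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def compliance(rows: list[dict[str, Any]], condition: Callable[[dict[str, Any]], bool]) -> float:
    """Fraction of rows satisfying `condition`.

    SQL: AVG(CASE WHEN <condition> THEN 1.0 ELSE 0.0 END)
    """
    if not rows:
        return 1.0  # assumption: an empty table is vacuously compliant
    return sum(bool(condition(r)) for r in rows) / len(rows)


rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 17},
    {"id": 3, "age": 52},
]
ages = [r["age"] for r in rows]
ids = [r["id"] for r in rows]
print(completeness(ages))  # 0.75
print(is_unique(ids))      # False: id 3 appears twice
print(compliance(rows, lambda r: r["age"] is not None and r["age"] >= 18))  # 0.5
```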

String Constraints

| Constraint | Description |
| --- | --- |
| MinLength | Validates minimum string length. |
| MaxLength | Validates maximum string length. |
| PatternMatch | Asserts values match a regex pattern. |
| Format | Validates common formats: email, URL, IPv4, phone, SSN, credit card, UUID, and date. |

Assertions

| Assertion | Operators |
| --- | --- |
| Assertion | equal_to, greater_than, greater_than_or_equal, less_than, less_than_or_equal, between, in_set |
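
One way to picture these operators is as predicates applied to a computed metric value. The sketch below maps each listed operator name to a Python predicate. `make_assertion` is a hypothetical helper, `between` is assumed to be inclusive, and `equal_to` uses exact equality, whereas a real implementation may apply a tolerance for floats.

```python
import operator
from typing import Any, Callable

_COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "equal_to": operator.eq,  # exact equality; a real check may allow a float tolerance
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
}


def make_assertion(op: str, *args: Any) -> Callable[[Any], bool]:
    """Build a predicate over a metric value from an operator name (hypothetical helper)."""
    if op in _COMPARISONS:
        (bound,) = args
        compare = _COMPARISONS[op]
        return lambda value: compare(value, bound)
    if op == "between":
        low, high = args
        return lambda value: low <= value <= high  # assumed inclusive on both ends
    if op == "in_set":
        allowed = frozenset(args[0])
        return lambda value: value in allowed
    raise ValueError(f"unknown operator: {op!r}")


completeness_ok = make_assertion("greater_than_or_equal", 0.95)
print(completeness_ok(0.97))                            # True
print(make_assertion("between", 0, 100)(101))           # False
print(make_assertion("in_set", {"EUR", "USD"})("GBP"))  # False
```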

🔗 Cross-Table Comparisons

| Comparison | Description |
| --- | --- |
| ReferentialIntegrity | Checks foreign-key integrity between two tables. |
| RowCountMatch | Asserts row counts match across tables. |
| SchemaMatch | Compares column names and types between tables. |

📄 YAML-Driven Configuration


☁️ Object Store Support


📊 Formatters

| Formatter | Output |
| --- | --- |
| HumanFormatter | Pretty-printed table for terminal / console output. |
| JsonFormatter | Machine-readable JSON for pipelines and APIs. |
| MarkdownFormatter | Markdown tables for reports, PRs, and documentation. |
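
As a rough illustration of the Markdown output style, the hypothetical `to_markdown` helper below renders a list of result rows as a pipe table. The column names are invented for the example and do not reflect the actual MarkdownFormatter layout.

```python
def to_markdown(rows: list[dict[str, object]]) -> str:
    """Render result rows as a Markdown pipe table; columns come from the first row."""
    if not rows:
        return ""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Escape pipes so cell text cannot break the table layout.
        cells = (str(row.get(h, "")).replace("|", "\\|") for h in headers)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


results = [
    {"check": "order_id is complete", "level": "ERROR", "status": "PASS"},
    {"check": "email format", "level": "WARNING", "status": "FAIL"},
]
print(to_markdown(results))
```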

📦 Installation

pip install qualink

Requirements: Python ≥ 3.12 • DataFusion ≥ 51.0.0 • PyArrow ≥ 15.0.0


🙏 Acknowledgements

Built on top of the incredible Apache DataFusion and Apache Arrow projects.


Have feedback? Open an issue on GitHub.