Comparison API Reference

qualink.comparison — Low-level cross-table comparison utilities.

These classes are used internally by the cross-table constraints but can also be used standalone.

from qualink.comparison.referential_integrity import ReferentialIntegrity, ReferentialIntegrityResult
from qualink.comparison.row_count_match import RowCountMatch, RowCountMatchResult
from qualink.comparison.schema_match import SchemaMatch, SchemaMatchResult

qualink.comparison.referential_integrity

@dataclass(frozen=True)
class ReferentialIntegrityResult

Field            Type   Default
match_ratio      float  required
unmatched_count  int    required
total_count      int    required

Properties:


class ReferentialIntegrity(LoggingMixin)

Checks that all values in *child_table.child_column* exist in *parent_table.parent_column*.

Executes a LEFT ANTI JOIN via DataFusion SQL.

ReferentialIntegrity(child_table: str, child_column: str, parent_table: str, parent_column: str)

Param          Type  Default
child_table    str   required
child_column   str   required
parent_table   str   required
parent_column  str   required

async run(ctx: SessionContext) → ReferentialIntegrityResult
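
The anti-join behind this check can be illustrated without qualink or DataFusion. The sketch below uses Python's built-in sqlite3 module, which has no LEFT ANTI JOIN keyword, so NOT EXISTS stands in for it. The names IntegritySketch and referential_integrity_sketch are hypothetical, and the definition of match_ratio (matched child rows divided by all child rows, 1.0 for an empty child table) is an assumption; the reference above does not state how the library computes it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegritySketch:
    """Hypothetical stand-in with the same three fields as ReferentialIntegrityResult."""
    match_ratio: float
    unmatched_count: int
    total_count: int


def referential_integrity_sketch(conn, child_table, child_column, parent_table, parent_column):
    """Count child rows whose key has no counterpart in the parent column."""
    # Identifiers are interpolated into the SQL text, so pass trusted names only.
    total = conn.execute(f"SELECT COUNT(*) FROM {child_table}").fetchone()[0]
    # SQLite has no LEFT ANTI JOIN keyword; NOT EXISTS expresses the same anti-join.
    unmatched = conn.execute(
        f"SELECT COUNT(*) FROM {child_table} AS c "
        f"WHERE NOT EXISTS (SELECT 1 FROM {parent_table} AS p "
        f"WHERE p.{parent_column} = c.{child_column})"
    ).fetchone()[0]
    # Assumed definition: share of child rows that found a parent row.
    ratio = 1.0 if total == 0 else (total - unmatched) / total
    return IntegritySketch(ratio, unmatched, total)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (2,), (9,)])

# One of the four orders points at customer 9, which does not exist.
result = referential_integrity_sketch(conn, "orders", "customer_id", "customers", "id")
print(result)
```

With the real class, the same four names would be passed to the ReferentialIntegrity constructor and the check awaited through run(ctx) on a DataFusion SessionContext that has both tables registered.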

qualink.comparison.row_count_match

Row count match check between two DataFusion tables.

@dataclass(frozen=True)
class RowCountMatchResult

Field    Type   Default
count_a  int    required
count_b  int    required
ratio    float  required

Properties:


class RowCountMatch(LoggingMixin)

Compares row counts of two tables via DataFusion.

RowCountMatch(table_a: str, table_b: str)

Param    Type  Default
table_a  str   required
table_b  str   required

async run(ctx: SessionContext) → RowCountMatchResult
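
To make the three result fields concrete, here is a minimal pure-Python sketch of a row count comparison. RowCountSketch and row_count_sketch are hypothetical names, and the ratio formula (smaller count divided by larger count, 1.0 when both tables are empty) is an assumption; the reference above does not define how the library derives ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowCountSketch:
    """Hypothetical stand-in with the same three fields as RowCountMatchResult."""
    count_a: int
    count_b: int
    ratio: float


def row_count_sketch(rows_a, rows_b):
    """Compare the sizes of two in-memory row collections."""
    count_a, count_b = len(rows_a), len(rows_b)
    larger = max(count_a, count_b)
    # Assumed definition: smaller / larger, so 1.0 means identical counts.
    ratio = 1.0 if larger == 0 else min(count_a, count_b) / larger
    return RowCountSketch(count_a, count_b, ratio)


source_rows = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
target_rows = [{"id": 1}, {"id": 2}, {"id": 3}]

counts = row_count_sketch(source_rows, target_rows)
print(counts)
```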

qualink.comparison.schema_match

Schema match check between two DataFusion tables.

@dataclass(frozen=True)
class SchemaMatchResult

Field             Type                        Default
matching_columns  list[str]                   field(default_factory=list)
only_in_a         list[str]                   field(default_factory=list)
only_in_b         list[str]                   field(default_factory=list)
type_mismatches   dict[str, tuple[str, str]]  field(default_factory=dict)

Properties:


class SchemaMatch(LoggingMixin)

Compares schemas of two tables via DataFusion.

SchemaMatch(table_a: str, table_b: str)

Param    Type  Default
table_a  str   required
table_b  str   required

async run(ctx: SessionContext) → SchemaMatchResult
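
The four result fields can be read as a partition of the two column sets. The sketch below shows one plausible way to compute them from plain name-to-type mappings. SchemaSketch and schema_match_sketch are hypothetical names, and it is an assumption that matching_columns holds only columns that agree in both name and type, with same-named columns of differing type reported in type_mismatches as (type in a, type in b).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaSketch:
    """Hypothetical stand-in with the same four fields as SchemaMatchResult."""
    matching_columns: list[str] = field(default_factory=list)
    only_in_a: list[str] = field(default_factory=list)
    only_in_b: list[str] = field(default_factory=list)
    type_mismatches: dict[str, tuple[str, str]] = field(default_factory=dict)


def schema_match_sketch(schema_a: dict[str, str], schema_b: dict[str, str]) -> SchemaSketch:
    """Compare two {column name: type name} mappings."""
    matching, mismatches = [], {}
    for name, type_a in schema_a.items():
        if name not in schema_b:
            continue
        type_b = schema_b[name]
        if type_a == type_b:
            matching.append(name)
        else:
            # Same column name, different type: record both sides.
            mismatches[name] = (type_a, type_b)
    only_in_a = [name for name in schema_a if name not in schema_b]
    only_in_b = [name for name in schema_b if name not in schema_a]
    return SchemaSketch(matching, only_in_a, only_in_b, mismatches)


schema_a = {"id": "Int64", "name": "Utf8", "created": "Date32"}
schema_b = {"id": "Int64", "name": "LargeUtf8", "email": "Utf8"}

diff = schema_match_sketch(schema_a, schema_b)
print(diff)
```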