Comparison API Reference
qualink.comparison — Low-level cross-table comparison utilities.
These classes are used internally by the cross-table constraints but can also be used standalone.
from qualink.comparison.referential_integrity import ReferentialIntegrity, ReferentialIntegrityResult
from qualink.comparison.row_count_match import RowCountMatch, RowCountMatchResult
from qualink.comparison.schema_match import SchemaMatch, SchemaMatchResult
qualink.comparison.referential_integrity
@dataclass(frozen=True)
class ReferentialIntegrityResult
| Field | Type | Default |
|---|---|---|
| match_ratio | float | required |
| unmatched_count | int | required |
| total_count | int | required |
Properties:
is_valid: bool
class ReferentialIntegrity(LoggingMixin)
Checks that all values in *child_table.child_column* exist in *parent_table.parent_column*.
Executes a LEFT ANTI JOIN via DataFusion SQL.
ReferentialIntegrity(child_table: str, child_column: str, parent_table: str, parent_column: str)
| Param | Type | Default |
|---|---|---|
| child_table | str | — |
| child_column | str | — |
| parent_table | str | — |
| parent_column | str | — |
async run(ctx: SessionContext) → ReferentialIntegrityResult
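To illustrate the anti-join idea behind this check, here is a minimal, self-contained sketch. It makes several assumptions: it uses the standard-library `sqlite3` module as a stand-in for DataFusion, it expresses the anti join as `LEFT JOIN ... WHERE ... IS NULL` because SQLite has no `LEFT ANTI JOIN` syntax, and it assumes `match_ratio` is the fraction of child rows that found a parent. The helper `check_referential_integrity` and the `orders`/`customers` tables are hypothetical and are not part of qualink; the SQL that qualink actually generates may differ.

```python
import sqlite3


def check_referential_integrity(conn, child_table, child_column, parent_table, parent_column):
    """Return (match_ratio, unmatched_count, total_count) for child -> parent.

    Hypothetical helper, not qualink's implementation. Identifiers are
    interpolated into the SQL, so only pass trusted table and column names.
    """
    # Anti join: keep child rows for which no parent row was found.
    anti_join_sql = (
        f"SELECT COUNT(*) FROM {child_table} AS c "
        f"LEFT JOIN {parent_table} AS p ON c.{child_column} = p.{parent_column} "
        f"WHERE p.{parent_column} IS NULL"
    )
    unmatched_count = conn.execute(anti_join_sql).fetchone()[0]
    total_count = conn.execute(f"SELECT COUNT(*) FROM {child_table}").fetchone()[0]
    # Assumption: an empty child table is treated as fully matched.
    if total_count == 0:
        match_ratio = 1.0
    else:
        match_ratio = (total_count - unmatched_count) / total_count
    return match_ratio, unmatched_count, total_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
# customer_id 9 has no matching row in customers.
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (2,), (9,)])

match_ratio, unmatched_count, total_count = check_referential_integrity(
    conn, "orders", "customer_id", "customers", "id"
)
print(match_ratio, unmatched_count, total_count)  # 0.75 1 4
```

The three returned values correspond by name to the fields of `ReferentialIntegrityResult`. How `is_valid` is derived from them is not stated here, so the sketch leaves it out.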
qualink.comparison.row_count_match
Row count match check between two DataFusion tables.
@dataclass(frozen=True)
class RowCountMatchResult
| Field | Type | Default |
|---|---|---|
| count_a | int | required |
| count_b | int | required |
| ratio | float | required |
Properties:
is_match: bool
class RowCountMatch(LoggingMixin)
Compares row counts of two tables via DataFusion.
RowCountMatch(table_a: str, table_b: str)
| Param | Type | Default |
|---|---|---|
| table_a | str | — |
| table_b | str | — |
async run(ctx: SessionContext) → RowCountMatchResult
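The shape of a row count comparison result can be sketched in plain Python. This is an illustration under stated assumptions, not qualink's code: `RowCountSketch` and `compare_row_counts` are hypothetical names, `ratio` is assumed to be the smaller count divided by the larger, and `is_match` is assumed to mean the two counts are equal. The library may define either differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowCountSketch:
    """Hypothetical stand-in mirroring the fields of RowCountMatchResult."""

    count_a: int
    count_b: int
    ratio: float

    @property
    def is_match(self) -> bool:
        # Assumption: a match means the two counts are exactly equal.
        return self.count_a == self.count_b


def compare_row_counts(count_a: int, count_b: int) -> RowCountSketch:
    """Build a result from two row counts that were already computed."""
    larger = max(count_a, count_b)
    # Assumption: ratio is smaller / larger, and two empty tables give 1.0.
    ratio = 1.0 if larger == 0 else min(count_a, count_b) / larger
    return RowCountSketch(count_a=count_a, count_b=count_b, ratio=ratio)


result = compare_row_counts(950, 1000)
print(result.ratio, result.is_match)  # 0.95 False
```

A ratio alongside the raw counts lets a caller apply a tolerance (for example, accepting tables within 5% of each other) instead of requiring exact equality.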
qualink.comparison.schema_match
Schema match check between two DataFusion tables.
@dataclass(frozen=True)
class SchemaMatchResult
| Field | Type | Default |
|---|---|---|
| matching_columns | list[str] | field(default_factory=list) |
| only_in_a | list[str] | field(default_factory=list) |
| only_in_b | list[str] | field(default_factory=list) |
| type_mismatches | dict[str, tuple[str, str]] | field(default_factory=dict) |
Properties:
is_match: bool
class SchemaMatch(LoggingMixin)
Compares schemas of two tables via DataFusion.
SchemaMatch(table_a: str, table_b: str)
| Param | Type | Default |
|---|---|---|
| table_a | str | — |
| table_b | str | — |
async run(ctx: SessionContext) → SchemaMatchResult
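The four result fields suggest a column-by-column diff of two schemas. The sketch below shows one way such a diff could be computed in plain Python. It rests on assumptions: `SchemaDiffSketch` and `compare_schemas` are hypothetical names, schemas are modeled as `dict[str, str]` mappings from column name to type name, `matching_columns` is assumed to hold columns that agree in both name and type, and `is_match` is assumed to be true only when nothing differs. qualink's actual rules may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaDiffSketch:
    """Hypothetical stand-in mirroring the fields of SchemaMatchResult."""

    matching_columns: list[str] = field(default_factory=list)
    only_in_a: list[str] = field(default_factory=list)
    only_in_b: list[str] = field(default_factory=list)
    type_mismatches: dict[str, tuple[str, str]] = field(default_factory=dict)

    @property
    def is_match(self) -> bool:
        # Assumption: schemas match when no column is missing or retyped.
        return not (self.only_in_a or self.only_in_b or self.type_mismatches)


def compare_schemas(schema_a: dict[str, str], schema_b: dict[str, str]) -> SchemaDiffSketch:
    """Diff two {column_name: type_name} mappings."""
    matching: list[str] = []
    only_in_a: list[str] = []
    mismatches: dict[str, tuple[str, str]] = {}
    for name, type_a in schema_a.items():
        if name not in schema_b:
            only_in_a.append(name)
        elif schema_b[name] == type_a:
            matching.append(name)
        else:
            # Same column name, different type: record (type in A, type in B).
            mismatches[name] = (type_a, schema_b[name])
    only_in_b = [name for name in schema_b if name not in schema_a]
    return SchemaDiffSketch(matching, only_in_a, only_in_b, mismatches)


diff = compare_schemas(
    {"id": "Int64", "name": "Utf8", "amount": "Float64"},
    {"id": "Int64", "name": "LargeUtf8", "created_at": "Date32"},
)
print(diff.matching_columns)  # ['id']
print(diff.only_in_a)         # ['amount']
print(diff.only_in_b)         # ['created_at']
print(diff.type_mismatches)   # {'name': ('Utf8', 'LargeUtf8')}
print(diff.is_match)          # False
```

Keeping type mismatches in a separate mapping from missing columns lets a caller distinguish a column that was dropped from one that was only retyped.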