Commit c1f9cff

Implement isPrimaryKey check, fix and refactor check tests (#250)
* Implement isPrimaryKey check.
* Fix and refactor checks tests.
* Check isPrimaryKey: use optional hint argument.
* Check tests: use List[str] instead of list[str].
* Check tests: fix isUnique check tests.
* Check tests: fix test_fail_hasMax test to use self.hasMax.
* Check tests: add case with hint into test_fail_isUnique.
* Check tests: update run_check return type to List[Row].
* Check isPrimaryKey: add comment about columns Scala type conversion.
1 parent e210023 commit c1f9cff

3 files changed

Lines changed: 473 additions & 704 deletions

docs/checks.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Here are the current supported functionalities of Checks.
 | | areAnyComplete(columns) | Done |
 | | haveAnyCompleteness(columns, assertion) | Done |
 | | isUnique(column) | Done |
-| | isPrimaryKey(column, *columns) | Not Implemented |
+| | isPrimaryKey(column, *columns) | Done |
 | | hasUniqueness(columns, assertion) | Done |
 | | hasDistinctness(columns, assertion) | Done |
 | | hasUniqueValueRatio(columns, assertion) | Done |

pydeequ/checks.py

Lines changed: 8 additions & 5 deletions
@@ -243,15 +243,18 @@ def isPrimaryKey(self, column, *columns, hint=None):
         Currently only checks uniqueness, but reserved for primary key checks if there is another
         assertion to run on primary key columns.
 
-        # how does column and columns differ
-        :param str column: Column in Data Frame to run the assertion on.
+        Uniqueness is checked for the list of all columns: [column] + columns.
+
+        :param str column: The 1st column in Data Frame to run the assertion on.
+        :param list[str] columns: Additional columns to run the assertion on.
         :param str hint: A hint that states why a constraint could have failed.
-        :param list[str] columns: Columns to run the assertion on.
         :return: isPrimaryKey self: A Check.scala object that asserts completion in the columns.
         """
+        # This relies on Py4J's implicit conversion from Seq to varargs:
+        columns_seq = to_scala_seq(self._jvm, columns)
         hint = self._jvm.scala.Option.apply(hint)
-        print(f"Unsolved integration: {hint}")
-        raise NotImplementedError("Unsolved integration of Python tuple => varArgs")
+        self._Check = self._Check.isPrimaryKey(column, hint, columns_seq)
+        return self
 
     def hasUniqueness(self, columns, assertion, hint=None):
         """
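
The new docstring says uniqueness is checked over `[column] + columns`. As a rough illustration of that semantics and of the `column, *columns, hint=None` calling convention, here is a minimal pure-Python sketch. It is not the PyDeequ or Deequ implementation: `is_primary_key` and the `orders` rows are hypothetical, it works on plain dictionaries instead of a Spark DataFrame, and it assumes no null values in the key columns.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def is_primary_key(
    rows: List[Dict[str, Any]],
    column: str,
    *columns: str,
    hint: Optional[str] = None,
) -> bool:
    """Hypothetical stand-in: True when every key combination occurs exactly once."""
    # `hint` is keyword-only because it follows *columns; it is accepted here
    # only to mirror the signature in the diff and is otherwise unused.
    # *columns arrives as a tuple, so convert it before concatenating.
    key_columns = [column] + list(columns)
    # Count how often each combination of key values appears.
    counts = Counter(tuple(row[name] for name in key_columns) for row in rows)
    return all(count == 1 for count in counts.values())


# Hypothetical sample data: order_id repeats, (order_id, line_no) does not.
orders = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 2, "line_no": 1},
]

single = is_primary_key(orders, "order_id")
composite = is_primary_key(orders, "order_id", "line_no", hint="duplicate order lines")
```

With this data, `order_id` alone is not unique, while the composite key `(order_id, line_no)` is. Because `hint` comes after `*columns`, it has to be passed by keyword. A positional string would be treated as one more column name.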
