fix: Multiple Spark Enhancements #1936
pull-request.yml
on: pull_request
Lint commit message
57s
Lint source code
2m 47s
Validate Docs
1m 1s
Annotations
3 errors
Lint commit message
You have commit messages with errors
⧗ input: Update spark numeric stats calculations
In the Pandas implementation, the numeric stats like min/max/stddev/etc. by default ignore null values.
This commit updates the spark implementation to more closely match that.
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
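The null-ignoring behavior this commit message describes can be illustrated in plain Python. This is only a sketch of the idea, not the project's Spark code: the function name `numeric_stats_ignoring_nulls` is hypothetical, and the sample standard deviation is an assumption chosen to match the pandas default.

```python
import math
import statistics


def numeric_stats_ignoring_nulls(values):
    """Compute min/max/mean/std over the non-missing entries only (hypothetical helper)."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        # A completely null column has no defined statistics.
        return {"min": None, "max": None, "mean": None, "std": None}
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        # Sample standard deviation needs at least two data points.
        "std": statistics.stdev(present) if len(present) > 1 else None,
    }


stats = numeric_stats_ignoring_nulls([1.0, None, 3.0, float("nan")])
empty = numeric_stats_ignoring_nulls([None, None])
```

Here the `None` and NaN entries are dropped before any statistic is computed, so only 1.0 and 3.0 contribute.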
⧗ input: Update spark null checks for describe_counts_spark method
Need to add the isnan() check because Pandas isnull check will count NaN as null, but Spark does not
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
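The distinction this commit message draws between null and NaN can be sketched without Spark. The helper `is_missing` below is hypothetical and only mirrors the described pandas behavior, where NaN counts as missing; a null-only check undercounts.

```python
import math


def is_missing(value):
    """True for None and for float NaN (hypothetical helper mirroring the pandas behavior)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


column = [1.0, None, float("nan"), 4.0]

# Checking only for null misses the NaN entry.
null_only = sum(v is None for v in column)

# Adding the NaN check counts both kinds of missing value.
null_or_nan = sum(is_missing(v) for v in column)
```

With this sample column, the null-only check finds one missing value and the combined check finds two.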
⧗ input: Update Spark frequency counts
The previous calculation of counts was actually counting an already summarized dataframe, so it wasn't capturing the correct counts for each instance of a value.
This is updated by summing the count value instead of performing a row count operation.
✖ body's lines must not be longer than 120 characters [body-max-line-length]
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 3 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
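The difference between counting rows and summing the count column on already summarized data can be shown with a small sketch. The `(value, count)` row shape and both function names are assumptions for illustration; the log does not show the actual dataframe layout.

```python
from collections import defaultdict

# Each row is (value, count): the data has already been summarized once.
summarized = [("a", 3), ("b", 1), ("a", 2)]


def count_rows(rows):
    """Row count per value: counts summary rows, not original occurrences."""
    totals = defaultdict(int)
    for value, _count in rows:
        totals[value] += 1
    return dict(totals)


def sum_counts(rows):
    """Sum of the count column per value: recovers the original frequencies."""
    totals = defaultdict(int)
    for value, count in rows:
        totals[value] += count
    return dict(totals)


by_row = count_rows(summarized)
by_sum = sum_counts(summarized)
```

For value "a", the row count reports 2 while the summed count reports 5, which is the kind of mismatch the commit message describes.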
⧗ input: Adding tests for issue 1429
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
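The recurring type-empty and subject-empty errors indicate that the commit header does not follow the `type: subject` shape expected by the lint configuration. The sketch below approximates that check with a simplified regular expression; it is not commitlint's actual parser, the function names are hypothetical, and the 120-character limit is taken from the error text in this log.

```python
import re

# Simplified Conventional Commits header: type, optional (scope), optional "!", then ": subject".
HEADER = re.compile(r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?!?: (?P<subject>.+)$")


def lint_header(header):
    """Return rule names that would plausibly fail for this header (approximation)."""
    if HEADER.match(header) is None:
        # Without a "type: subject" shape, neither part can be extracted.
        return ["subject-empty", "type-empty"]
    return []


def lint_body(body, max_len=120):
    """Flag the body if any line exceeds max_len characters."""
    if any(len(line) > max_len for line in body.splitlines()):
        return ["body-max-line-length"]
    return []


failing = lint_header("Adding tests for issue 1429")
passing = lint_header("test: add tests for issue 1429")
long_body = lint_body("x" * 121)
```

Under this approximation, prefixing the header with a type such as `test:` or `fix:` is enough to clear both header errors.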
⧗ input: Edge Case - completely null numeric field in Spark
Discovered this edge case with real data, and still need to fix the rendering of an empty histogram.
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
⧗ input: Add handling for spark DecimalType
This change addresses issue #1602 (https://github.com/ydataai/ydata-profiling/issues/1602).
Computations in the summarize process result in some floats when computing against decimal columns.
To resolve this, we simply convert those types to a DoubleType when performing those numeric operations.
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
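A plain-Python analogy shows why mixing decimal and float values is a problem and how converting up front avoids it. This is not the Spark change itself, which the log does not show; `to_double` is a hypothetical helper standing in for a cast to a double type.

```python
from decimal import Decimal


def to_double(value):
    """Convert a Decimal to float, passing nulls through (hypothetical helper)."""
    return None if value is None else float(value)


prices = [Decimal("1.50"), Decimal("2.25"), None]

# Mixing Decimal and float directly is rejected by Python.
try:
    Decimal("1.50") * 0.5
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Converting first lets ordinary float arithmetic proceed.
doubles = [to_double(p) for p in prices]
present = [d for d in doubles if d is not None]
mean = sum(present) / len(present)
```

The conversion trades exact decimal precision for compatibility with float-based computations, which is usually acceptable for summary statistics.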
⧗ input: Add handling for spark correlations with no numeric fields
This change addresses issue #1722 (https://github.com/ydataai/ydata-profiling/issues/1722).
Assembling a vector column in Spark with no numeric columns results in features with a NULL size, NULL indices, and an empty list of values.
This causes an exception to be raised when computing correlations.
The solution here is to avoid computing the correlation matrix when there are no interval columns (numeric).
✖ footer's lines must not be longer than 120 characters [footer-max-line-length]
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 3 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
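The guard this commit message describes, skipping the correlation matrix when no numeric columns exist, can be sketched in plain Python. The function names and the dict-of-lists column layout are assumptions for illustration, and the Pearson helper assumes non-constant columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns, numeric_names):
    """Return a nested dict of correlations, or None when there is nothing to correlate."""
    if not numeric_names:
        # No numeric columns: skip the computation instead of raising.
        return None
    return {
        a: {b: pearson(columns[a], columns[b]) for b in numeric_names}
        for a in numeric_names
    }


skipped = correlation_matrix({"name": ["x", "y"]}, [])
matrix = correlation_matrix({"x": [1, 2, 3], "y": [2, 4, 6]}, ["x", "y"])
```

Returning `None` leaves it to the caller to decide how to render a missing correlation section.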
⧗ input: Allow NoneType values to be string formatted
This change addresses issue #1723 (https://github.com/ydataai/ydata-profiling/issues/1723).
It implements a "N/A" string as the default when formatting NoneType values.
✖ subject may not be empty [subject-empty]
✖ type may not be empty [type-empty]
✖ found 2 problems, 0 warnings
ⓘ Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
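The "N/A" default described in this commit message might look like the following in a formatting helper. The name `fmt_number` and its `precision` parameter are hypothetical; the project's actual formatter is not shown in this log.

```python
def fmt_number(value, precision=2):
    """Format a number with fixed precision, falling back to "N/A" for None (hypothetical helper)."""
    if value is None:
        # None cannot be formatted with a numeric format spec, so return a placeholder.
        return "N/A"
    return f"{value:.{precision}f}"


missing = fmt_number(None)
rounded = fmt_number(3.14159)
```

Checking for `None` before applying the format spec avoids the `TypeError` that numeric formatting of `None` would otherwise raise.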
⧗ input: Multiple Spark fixes to approach closer parity to Pandas profiles
Addresses handling of completely null numeric columns, and gracefully handling empty correlation sets and plots.
✖ subject may not
Lint source code
Process completed with exit code 1.
Lint source code
Process completed with exit code 2.