
Commit 2df302d

gaogaotiantian authored and HyukjinKwon committed
[SPARK-56607][PYTHON][FOLLOWUP] Use pyspark.sql.DataFrame to support connect-only
### What changes were proposed in this pull request?

Use `pyspark.sql.DataFrame`, not the classic one, in `mlutils.py`.

### Why are the changes needed?

We have a connect-only CI that does not even have the classic `DataFrame` class. This util should work with the Connect `DataFrame` too.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`test_pipeline` and `test_parity_pipeline` passed locally.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #55630 from gaogaotiantian/fix-mlutils.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 6bfe0ef commit 2df302d

1 file changed: `python/pyspark/testing/mlutils.py` (6 additions, 2 deletions)
```diff
@@ -25,8 +25,7 @@
 from pyspark.ml.classification import Classifier, ClassificationModel
 from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
 from pyspark.ml.wrapper import _java2py
-from pyspark.sql import SparkSession
-from pyspark.sql.classic.dataframe import DataFrame
+from pyspark.sql import DataFrame, SparkSession
 from pyspark.sql.types import DoubleType
 from pyspark.testing.utils import ReusedPySparkTestCase as PySparkTestCase

@@ -100,6 +99,11 @@ def tearDownClass(cls):


 class MockDataset(DataFrame):
+    def __new__(cls, *args, **kwargs):
+        # DataFrame by default creates classic DataFrame, we need this to
+        # overwrite the default behavior.
+        return object.__new__(cls)
+
     def __init__(self):
         self.index = 0
```
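The `__new__` override above is the interesting part: when a parent class dispatches instantiation to a concrete implementation inside `__new__` (as `pyspark.sql.DataFrame` hands back a classic `DataFrame` by default), a test-only subclass inherits that dispatch and never gets an instance of itself unless it overrides `__new__`. Below is a minimal, pyspark-free sketch of this pattern; the class names `Base`, `ClassicImpl`, `BrokenMock`, and `FixedMock` are illustrative stand-ins, not pyspark APIs.

```python
class ClassicImpl:
    """Stands in for the concrete 'classic' implementation class."""


class Base:
    """Parent whose __new__ always dispatches to the concrete
    implementation, regardless of which subclass is being constructed."""

    def __new__(cls, *args, **kwargs):
        # Hands back a ClassicImpl instance no matter what cls is.
        return object.__new__(ClassicImpl)


class BrokenMock(Base):
    # Inherits the dispatching __new__, so BrokenMock() is NOT a
    # BrokenMock instance -- the analogue of MockDataset before the fix.
    pass


class FixedMock(Base):
    def __new__(cls, *args, **kwargs):
        # Bypass the dispatch so we really get an instance of this class,
        # mirroring the __new__ added to MockDataset in this commit.
        return object.__new__(cls)
```

With this in place, `FixedMock()` yields a genuine `FixedMock` instance, while `BrokenMock()` silently yields a `ClassicImpl` instead.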
0 commit comments