[GH-2004] Geopandas.GeoSeries: Implement Test Framework#2005
Conversation
|
@zhangfengcdt PR definitely isn't fully ready yet, but I want to hear your thoughts. What types of common tests should we include for each function? Originally, I was thinking we have type checking tests (done) and tests for comparing our output to original geopandas function output, but the result of As I mentioned in the PR description, I want this PR to flush out all the common tests we use for every method. |
I think we need both basic tests that manually check the results from the API and the match to geopandas test suits. For the later, we might get some idea from the pandas on spark package, especially how For the former, Sedona has a test suit for expression (functions) and we may want to at least cover the cases in there: |
|
For various reasons, none of the built in assert methods work out of the box, even when using parameters like Another reason I'd like to avoid using the pyspark testing functions (e.g assertPandasOnSparkEqual and assertDataFrameEqual) because they've been removed and added across different version. They're not available until 3.5.0, and we'd have to start using annoying conditional logic like |
|
I moved old test code to |
Looks great! I think for the first step, we could target covering whatever these scala tests cover in exact results. |
That made perfect sense originally, but now I see why it wasn't done in Scala. These test files we're using ( Another thought. Originally, I had a test checking if the sedona sql function's result matched our new sgpd result. However, we agreed to remove it because it seemed trivial that of course the results would match. Following that same logic, if we can assume the new sedona geopandas results match the sedona sql results, then why do we need to replicate the Scala tests again if we already know sedona sql passes it? Or maybe we should just not make that assumption (in that case maybe it is worth bringing back that old sgpd vs sedona test). Personally, I'm fine with making the assumption. If anything, maybe we should hard-code results for the |
I think the reason why we still need the hard-coded tests are that they are not simply replicating the scala expression tests, they should extend the scala cases. Basically, the goals are:
This means we will very possibly cover more cases than the scala expression tests. |
|
Yes that makes sense, but the hard-coded outputs from the original scala tests would clutter up the codebase too much due to their sheer size (possibly like 1000+ lines for some test cases). IMO, it's not reasonable to hard code those outputs. Then at that point, if we're not extending the scala tests, is it worth still replicate those tests exactly in in python ( How do you feel about hard-coding the geopandas match test cases instead? Those ones are smaller, so the expected results won't be too bad. |
Oh, I meant we can replicate the types of checks from that scala test, not the exact data, we can use smaller inputs for sure. |
|
How's this |
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject.What changes were proposed in this PR?
Implement GeoSeries test skeleton. The goal here is to fully flush out the common testing code we will use for testing GeoSeries methods. Once we have this merged in, I can rapid-fire GeoSeries method implementations using the same testing structure, occasionally adding function specific tests when needed.
This PR also fixes a bug in the
__repr__()method and changes the return type of.area()to pd.Series to be consistent with the Geopandas behavior.How was this patch tested?
Add tests for existing functionality
Did this PR include necessary documentation updates?