Fix dmrpp error handling#880
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #880 +/- ##
==========================================
+ Coverage 89.11% 89.21% +0.10%
==========================================
Files 34 34
Lines 1956 1985 +29
==========================================
+ Hits 1743 1771 +28
- Misses 213 214 +1
🚀 New features to boost your workflow:
|
|
@betolink the dmrpp parser has been integrated into pydap (see here), and see for example: pydap/PR#649 for parsing inline missing values (that cause some issues with attributes when parsing variables). I just have not yet made a release since then, and I still need to make sure that the improvements I have made work with a virtualizarr local dev branch that I have not pushed yet to fully migrate the parser away from vz (it was working during AGU but I need to update a few things... ). |
|
@Mikejmnez nice!! should we try to merge this and then migrate once the pydap parser is ready and gets released? we could also integrate this PR (the idea of the safe attr getter) into your branch and close this PR. The next PR here would be to use PyDAP. I'm open to both options. |
|
Yeah good question - I have also been looking at parsing various attributes currently unsupported. I find the idea of "required=True|False" attributes very interesting. What constitutes required? That logic may need to live outside of pydap (and so perhaps in VZ) because all attributes should be parsed fine by pydap, the question is whether these are supported by Virtualizarrt... For example, dmrpps may have:
These should be parsed by the dmrpp parser, but are these supported by other APIs? |
Do you guys want to just move this parser out of VirtualiZarr entirely? We have a 3rd-party extension system now, which this DMRPP parser predates. It's something of a smell that VZ devs will have to approve PRs changing the DMRPP parser, when we don't really know anything about it. EDIT: We could still advertise it in VZ, or even import then re-export it to retain backwards compatibility. But if you are re-implementing it in PyDAP then we surely should just move development of the parser over there? |
I think that's the idea, I'd would like to merge this PR as the last work we do here and then start working towards removing the code, then we could import it when is ready in pydap and make it backwards compatible like you said. I wonder when you mentioned 3rd party extensions, what is the scope? will the parsers become extensions? or what is the criteria of core vs 3rd party. |
I was referring to the parsers. They are a mechanism for extension by 3rd parties.
👍 Just let me know when you want me to review/merge this.
@Mikejmnez can you explain more about what these mean? Or maybe open another issue if that's a separate topic? |
@betolink Sounds good with me.
@TomNicholas These are two kinds of inline values that can occur within a DMRpp. These are not parsed currently by the parser, and results in the parser dropping the variable when attempting to parse it with a warning. This PR that @betolink has helps log these variables to track them I think. But the parser should parse them (which is what I am currently working on the pydap side). My understanding was that it was unsupported by VZ, but I am glad to see Max's work on enabling this ! |
|
@Mikejmnez one note, Max PR is for when VZ already parsed these into Kerchunk, so then we can open that store and append without having to recreate it all from scratch. This PR is only logging "{var} could not be parsed because of {x}" (instead of just the parser blowing) and also enumerates phony dimensions, right now all they become "phony_dim" instead of "phony_dim_{n}". I'm excited to see your work and Max's in VZ, we'll be in a position to roundtrip chunkmanifest with inline values! |
Max's (draft) PR is actually more general than that. It was motivated by Kerchunk, but has other uses too. The idea is it will allow storing some chunks inlined into VZ's in-memory manifests, regardless of the parser they came from. |
|
Alright, I think this PR is ready for review then @TomNicholas @Mikejmnez |
TomNicholas
left a comment
There was a problem hiding this comment.
lemme know when you want me to merge
|
Thanks Tom! anytime is good. |
- zarr-developers#880 Fix dmrpp error handling - zarr-developers#868 Fix error with Zarr-Python 3.1.0 - zarr-developers#924 Fix coordinate name issue - zarr-developers#916 Fix ZarrParser to use public attribute Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* add empty release notes * Fix wrong PR link references in release notes - Line 57: PR text said #927 but URL pointed to #932 (correct PR) - Line 115: PR text said #565 but URL pointed to #822 (correct PR) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Move PR #818 from v2.2.0 to unreleased section This PR was merged after v2.4.0 but was incorrectly listed under the v2.2.0 release notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add missing bug fixes to unreleased section - #880 Fix dmrpp error handling - #868 Fix error with Zarr-Python 3.1.0 - #924 Fix coordinate name issue - #916 Fix ZarrParser to use public attribute Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add missing documentation entries to unreleased section - #918 FAQ answer on "why still write native zarr?" - #893 Update FAQ regarding virtualizing existing Zarr V2 data - #937 R2 docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add missing internal change #909 to unreleased section Compiled regular expressions for improved performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add missing documentation entries to v2.4.0 release notes - #855 Add example of virtualizing GOES - #856 Update kerchunk comparison in FAQ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add missing entries to v2.2.0 release notes - #829 Raise informative error on Zarr V2 parsing with Zarr-Python<3.1.3 - #805 Revert unnecessary dtype conversion in icechunk writer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I think a while ago we talked about eventually moving the parser to pydap as that would be the more native place to improve it and fix bugs, I think we should move in that direction and offer a thin wrapper for virtualization. However that integration is still TDB, this PR addresses some of the quirkiness of some DMRpp files mostly by adding a safe attribute getter function. Unit tests were added and now the parsers shows up as one of the valid parsers for VirtualiZarr.
docs/releases.rstapi.rstp.s. once #794 gets merged, we should be able to create and append Zarr stores from NASA earthdata using DMRpp + kerchunk really fast!
cc @Mikejmnez @owenlittlejohns @TomNicholas @danielfromearth