Support JWST and Roman by icweaver · Pull Request #25 · JuliaAstro/ASDF.jl

icweaver · 2026-03-21T11:38:34Z

Hey folks, these are a few toggles I added to try and support some of the quirks I ran into "out in the wild". I've split them into the following three commits:

Don't choke on unknown tags: dc40329
1. This just provides a fallback for STScI's many asdf tags. Could be replaced by specific converters in the future.
2. Also adds a metadata tag and a newer ndarray-1.1.0 tag that Roman seems to use now.
3. This feature is behind an ASDF.load(<filename.asdf>; extensions = false) kwarg by default to preserve current behavior. Similar to the extensions kwarg in the Python impl.
Checksum or not to checksum: 56a8c33
1. Apparently some Roman data products store their file's checksum based on the uncompressed file instead of compressed (ASDF.jl's default). This breaks things.
2. Added a flag to get around this, similarly to the Python impl.
3. 2026-04-03: Update from ASDF office hours. This seems to be a genuine bug, so we will keep this checksum validation flag in on our end. Thanks for helping us confirm this, Brett!
Blocks vs Frames: 0d425a0
1. Roman seems to use the block layout for their Lz4 compression, instead of frame. This adds support for both, and automatically handles it using magic numbers.

Usage examples:

JWST
Roman

Does this seem reasonable to folks? I'd especially appreciate feedback for the last point, which I have little experience in and relied heavily on Claude for to come up with the needed incantations.

…r Roman)

codecov · 2026-03-21T11:39:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.74%. Comparing base (ff79be9) to head (ab17e30).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   99.70%   99.74%   +0.03%     
==========================================
  Files           1        1              
  Lines         343      389      +46     
==========================================
+ Hits          342      388      +46     
  Misses          1        1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

eschnett · 2026-03-21T15:47:06Z

+        data[3] == 0x4D && data[4] == 0x18)
+        return decode(LZ4FrameCodec(), data)
+    else
+        # If the data was originally created from Python's ASDF, then it will be in block instead of frame layout,


Can we also produce the block layout? Should we? Can Python handle the frame layout?

Thanks for reminding me, will add that in. Looks like they can, but as a plug-in https://github.com/asdf-format/asdf-compression

Added in e012b21

There is a layout option for NDArrayWrapper now to flip between frame and block. Does that seem like a reasonable place to control things? I guess it doesn't really make sense for other compression schemes, so I just set the default layout as default

I don't love from a maintainability point of view that there is now a hand-rolled Lz4-specific encode and decode path to accommodate Python's asdf scheme. There's also the matter of compatibility with Lz4 frame support on the Python side, but since that's still an experimental plugin, maybe that can be a problem for future us to deal with.

I am now squarely out of my comfort zone and would gladly accept any suggestions for simplifying things, haha. Thanks again for taking a look!

If this is only for lz4 then I would call it lz4_layout, with values :frame and :block. I am sure that other compression schemes will also want to have options in the future, e.g. specifying the compression level.

Sounds good, just renamed here e5b6bd4

eschnett · 2026-03-21T15:49:14Z

The checksum bit sounds weird. There is a standard for checksums, and THE major player in the field gets the implementation wrong? Does this make checksums useless in practice? Well, the world is what it is...

icweaver · 2026-03-21T20:44:37Z

Yea, the checksum bit was bothering me too. Here's what I am seeing for the sample Roman data I tried:

julia> using ASDF, MD5

julia> af = ASDF.load_file("docs/data/roman.asdf"; extensions = true);

julia> ndarray = af.metadata["roman"]["data"];

julia> header = ndarray.lazy_block_headers.block_headers[ndarray.source + 1]
ASDF.BlockHeader(IOStream(<file docs/data/roman.asdf>), 176347, UInt8[0xd3, 0x42, 0x4c, 0x4b], 0x0030, 0x00000000, UInt8[0x6c, 0x7a, 0x34, 0x00], 0x0000000003ffe53b, 0x0000000003ffe53b, 0x0000000003fc0100, UInt8[0xef, 0x4e, 0x63, 0x45, 0xc4, 0xd6, 0xcd, 0xa0, 0xed, 0x4d, 0x14, 0x27, 0x43, 0xa7, 0xb2, 0xbc], true)

julia> block_data_start = header.position + 6 + header.header_size
176401

julia> seek(header.io, block_data_start)
IOStream(<file docs/data/roman.asdf>)

julia> data = Array{UInt8}(undef, header.used_size);

julia> nb = readbytes!(header.io, data)
67102011

julia> data_decompressed = ASDF.decode_Lz4(data);

julia> md5(data_decompressed) == header.checksum
true

I wonder if this behavior could just be an uncompressed data streaming thing for this specific instance with Roman's data products. I don't work with AWS S3, but it seems that this a selling point for them I guess. fwiw, the JWST data I tried looks to behave as expected (it's just downloaded as a regular file). Here are worked examples for both

Doc previews:

eschnett · 2026-04-04T16:47:59Z

You have some unchecked to-do items: Is this PR ready to be merged, or are you still working on it?

icweaver · 2026-04-04T17:41:05Z

Thanks Erik, just updated the top-level comment and merged. According to office hours yesterday, the checksum issue is a genuine bug on the Python side, so according to Brett it sounded like we are good to go over here.

Re: DKIST and newer Roman files, most things look to work in informal tests I tried, and the remaining bits are tracked over in #32

Adds support for structured (nested) [datatypes](https://www.asdf-format.org/projects/asdf-standard/en/latest/generated/stsci.edu/asdf/core/datatype-1.0.0.html), and support for ascii and ucs (also outlined in that link), which some newer Roman data files seem to use. This required changing how the `datatype` field is parsed, so I thought it would be better to have this as a separate PR instead of cramming it into #25. To add this support, the new `AsciiDatatype` and `Ucs4Datatype` types were introduced. These are spiritual variants of `Datatype`, but since that one is an enum, it seemed like it would be more trouble than it's worth to get them to all subtype a common abstract type. This PR also includes a grip of tests using the reference files from [asdf-standard](https://github.com/asdf-format/asdf-standard/tree/main/reference_files/1.6.0). I just included a copy of the most recent ones (v1.6.0) in the test directory here for simplicity. On a related note, should the asdf spec version [printed here](https://github.com/JuliaAstro/ASDF.jl/blob/9810e49925cb5cd0f3ad053f8c7b1bbac82027a0/src/ASDF.jl#L1199) be updated to reflect this? Could be a good first PR to generalize things to dynamically load different versions directly from that repo instead. Another good one might be to start refactoring `src/ASDF.jl` now that it is starting to get a little long # Try this PR 1. Install Julia: <https://julialang.org/downloads/> 1. Create a new environment: ```julia > julia julia> # Press ] to enter pkg mode pkg> activate my-env ``` 1. Install this branch: ```julia (my-env) pkg> add ASDF#datatype ``` 1. Load a sample asdf file: ```julia (my-env) pkg> # Press backspace to return to Julia REPL julia> using ASDF julia> load("path to your asdf file") ``` Sample asdf files [are available here](https://github.com/JuliaAstro/ASDF.jl/tree/datatype/test/data/asdf-1.6.0). Documentation preview: <https://juliaastro.org/ASDF.jl/previews/PR32/>

icweaver added 4 commits March 20, 2026 20:32

feat: extensions/tags fallback

dc40329

feat: checksum validation flag

56a8c33

feat: autodetect block or frame layout for Lz4 compression (needed fo…

0d425a0

…r Roman)

Merge branch 'main' into strict

a821211

icweaver mentioned this pull request Mar 21, 2026

docs: Usage examples #19

Merged

eschnett reviewed Mar 21, 2026

View reviewed changes

icweaver added 2 commits March 22, 2026 00:34

feat: add Lz4 block writing for python asdf compat

e012b21

refactor: rename layout --> lz4_layout

e5b6bd4

icweaver added this to the Release v2 milestone Mar 25, 2026

icweaver added 6 commits March 30, 2026 10:35

Merge branch 'main' into strict

c305d7a

Merge branch 'main' into strict

c2af315

test: update

c67bc14

Merge branch 'main' into strict

4e591e7

test: update

e9a377d

test: added some

ab17e30

icweaver mentioned this pull request Apr 4, 2026

feat: More datatype support #32

Merged

icweaver merged commit 5e5154b into main Apr 4, 2026
22 checks passed

icweaver deleted the strict branch April 4, 2026 17:38

icweaver mentioned this pull request Apr 4, 2026

Release version 2.0 #13

Closed

13 tasks

icweaver referenced this pull request Jul 2, 2026

style: Apply Runic.jl formatting (#49)

919c3f8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support JWST and Roman#25

Support JWST and Roman#25
icweaver merged 12 commits into
mainfrom
strict

icweaver commented Mar 21, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

eschnett Mar 21, 2026

Uh oh!

icweaver Mar 21, 2026

Uh oh!

icweaver Mar 22, 2026

Uh oh!

eschnett Mar 22, 2026

Uh oh!

icweaver Mar 22, 2026

Uh oh!

eschnett commented Mar 21, 2026

Uh oh!

icweaver commented Mar 21, 2026 •

edited

Loading

Uh oh!

eschnett commented Apr 4, 2026

Uh oh!

Uh oh!

icweaver commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

icweaver commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

eschnett Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

icweaver Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

icweaver Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

eschnett Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

icweaver Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

eschnett commented Mar 21, 2026

Uh oh!

icweaver commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eschnett commented Apr 4, 2026

Uh oh!

Uh oh!

icweaver commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

icweaver commented Mar 21, 2026 •

edited

Loading

codecov Bot commented Mar 21, 2026 •

edited

Loading

icweaver commented Mar 21, 2026 •

edited

Loading