Consolidate DataFetchers#626
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the data fetching architecture by introducing a centralized GetArray utility and dedicated fetchers for Zarr and NetCDF data, alongside adding necessary security headers for SharedArrayBuffer in the Next.js configuration. The review identifies several issues in the new implementation, including a bug where a fillValue of zero is incorrectly handled, inconsistent variable usage during chunk fetching when overrides are present, and potential crashes from missing null checks on LRU cache lookups. Further improvements include removing unnecessary await keywords on synchronous functions, deleting debug logs, and cleaning up redundant code.
| const tempChunk = cache.get(`${cacheBase}_chunk_${id}`); | ||
| tempChunk.scaling = scalingFactor; | ||
| RescaleArray(tempChunk.data, delta); | ||
| cache.set(`${cacheBase}_chunk_${id}`, tempChunk); | ||
| } |
There was a problem hiding this comment.
There is no check to ensure tempChunk exists before accessing its properties. Since the cache uses an LRU policy, entries can be evicted at any time. Accessing tempChunk.scaling on an evicted entry will cause a runtime crash.
const tempChunk = cache.get(`${cacheBase}_chunk_${id}`);
if (tempChunk) {
tempChunk.scaling = scalingFactor;
RescaleArray(tempChunk.data, delta);
cache.set(`${cacheBase}_chunk_${id}`, tempChunk);
}| console.log("thisone") | ||
| const inputArray = analysisMode ? analysisArray : await GetCurrentArray(analysisStore) |
There was a problem hiding this comment.
| const is2D = outputShape.length === 2 | ||
| async function Analyze(){ | ||
| const dataArray = analysisMode ? analysisArray : GetCurrentArray(analysisStore) | ||
| const dataArray = analysisMode ? analysisArray : await GetCurrentArray(analysisStore) |
| export class ZarrFetcher { | ||
| private outVar: zarr.Array<zarr.DataType, any> | undefined; | ||
| private variable: string; | ||
|
|
||
| constructor(variable: string) { | ||
| this.variable = variable; | ||
| } | ||
|
|
||
| async init(){ | ||
| const { currentStore } = useZarrStore.getState(); | ||
| const group = await currentStore; | ||
| const tempOutVar = await zarr.open(group.resolve(this.variable), { kind: "array" }); | ||
| this.outVar = tempOutVar; | ||
| } | ||
|
|
||
| async getMetadata(): Promise<any> { | ||
| if (!this.outVar) { | ||
| await this.init(); | ||
| if (!this.outVar){ | ||
| throw new Error(`Init failed`); | ||
| } | ||
| } | ||
| if (!this.outVar.is("number") && !this.outVar.is("bigint")) { | ||
| throw new Error(`Unsupported data type: ${this.outVar.dtype}`); | ||
| } | ||
|
|
||
| const symbols = Object.getOwnPropertySymbols(this.outVar); | ||
| const contextSymbol = symbols.find(s => s.toString().includes("zarrita.context")); | ||
| const fillValue = contextSymbol && !Number.isNaN((this.outVar as any)[contextSymbol]?.fill_value) | ||
| ? (this.outVar as any)[contextSymbol].fill_value | ||
| : NaN; | ||
| return { | ||
| shape: this.outVar.shape, | ||
| chunkShape: GetSize(this.outVar)[2], | ||
| fillValue, | ||
| dtype: this.outVar.dtype, | ||
| }; | ||
| } | ||
|
|
||
| async fetchChunk({ | ||
| rank, | ||
| chunkShape, | ||
| x, | ||
| y, | ||
| z, | ||
| xDimIndex, | ||
| yDimIndex, | ||
| zDimIndex, | ||
| idx4D, | ||
| }: any): Promise<FetchOutput> { | ||
| if (!this.outVar) { | ||
| await this.init(); | ||
| if (!this.outVar){ | ||
| throw new Error(`Init failed`); | ||
| } | ||
| } | ||
|
|
||
| const chunkSlice = new Array(rank).fill(0); | ||
| chunkSlice[xDimIndex] = zarr.slice(x * chunkShape[xDimIndex], (x + 1) * chunkShape[xDimIndex]); | ||
| chunkSlice[yDimIndex] = zarr.slice(y * chunkShape[yDimIndex], (y + 1) * chunkShape[yDimIndex]); | ||
| if (zDimIndex >= 0) { | ||
| chunkSlice[zDimIndex] = zarr.slice(z * chunkShape[zDimIndex], (z + 1) * chunkShape[zDimIndex]); | ||
| } | ||
| if (rank >= 4) { | ||
| chunkSlice[0] = idx4D; | ||
| } | ||
|
|
||
| const chunk = await fetchWithRetry( | ||
| () => zarr.get(this.outVar!, chunkSlice), | ||
| `variable ${this.variable}`, | ||
| useGlobalStore.getState().setStatus | ||
| ); | ||
|
|
||
| if (!chunk || chunk.data instanceof BigInt64Array || chunk.data instanceof BigUint64Array) { | ||
| throw new Error("BigInt arrays not supported."); | ||
| } | ||
| return { | ||
| data: chunk.data as Float32Array, | ||
| shape: chunkShape, | ||
| stride: chunk.stride, | ||
| }; | ||
| } | ||
| } No newline at end of file |
| async headers() { // Needed for SharedArrayBuffer. Security reasons won't let sharedArrayBuffer | ||
| return [ | ||
| { | ||
| source: '/(.*)', | ||
| headers: [ | ||
| { | ||
| key: 'Cross-Origin-Opener-Policy', | ||
| value: 'same-origin', | ||
| }, | ||
| { | ||
| key: 'Cross-Origin-Embedder-Policy', | ||
| value: 'require-corp', | ||
| }, | ||
| ], | ||
| }, | ||
| ] | ||
| }, |
There was a problem hiding this comment.
This is leftover from the SharedBufferArray that will be used for the workers.
There was a problem hiding this comment.
I think we should try to add this in the fetch options of the zarrita fetch store. And maybe remove for now.
There was a problem hiding this comment.
While I'm not an expert almost certain this is unrelated to Zarrita.
There was a problem hiding this comment.
if it is a fetch operation, then is zarrita. And we should pass headers directly there, instead of setting this globally. Similar to, https://zarrita.dev/packages/storage.html#adding-authentication-headers
There was a problem hiding this comment.
Let me rephrase. Although I'm not certain what this is doing. It's unrelated to zarrita.
This addresses #602 by merging the logic of the netcdf and zarr fetchers. I dunno if I should make the fetcher objects a class or keep it as it is. Cause the current approach is a little odd.
I've left my code for a class approach in if it makes sense to switch later.