Fixed a output seismograph file bug in GPU mode#1278
Merged
Conversation
…not 1, the output seismograph file appended the previous data at the end, thereby overwriting the original data.
There was a problem hiding this comment.
Pull request overview
Fixes incorrect seismogram output in GPU mode when seismograms are written in chunks (e.g., when NTSTEP_BETWEEN_OUTPUT_SAMPLE is not 1) by ensuring device-side seismogram buffers are flushed to host before file output.
Changes:
- Flush GPU seismogram buffers to host immediately before
write_seismograms_to_file()runs. - Adjust CUDA seismogram transfer logic to copy only the “valid” portion of the buffer.
- Add a CPU-only stub for the new CUDA flush entry point for non-CUDA builds.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/specfem2D/write_seismograms.F90 |
Calls a new GPU flush routine before writing seismograms to disk. |
src/gpu/write_seismograms_cuda.cu |
Implements flush_seismograms_cuda() and changes device→host transfer strategy. |
src/gpu/specfem2D_gpu_cuda_method_stubs.c |
Adds a no-op stub for flush_seismograms_cuda() when CUDA is not enabled. |
Comments suppressed due to low confidence (1)
src/gpu/write_seismograms_cuda.cu:276
- The subsequent host->Fortran copy loops over j=0..nlength_seismogram-1 for every receiver even when only valid_length samples were copied from the device. This does extra work and also makes the earlier host-buffer zeroing necessary. Consider copying/converting only the first valid_length samples (or seismo_current samples in flush), leaving the unused tail untouched.
// copies values into host array
for (int irec=0; irec < mp->nrec_local; irec++){
for (int j=0; j < nlength_seismogram; j++){
sisux[j + nlength_seismogram * irec] = (double) h_seismo[j + nlength_seismogram * irec];
sisuz[j + nlength_seismogram * irec] = (double) h_seismo[j + nlength_seismogram * irec + size];
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+250
to
+269
| // clears host buffers first, then copies only the samples that were actually written. | ||
| // This prevents stale tail values from a previous chunk from leaking into the final trace. | ||
| for (int i = 0; i < 2 * size; i++) { | ||
| h_seismo[i] = 0.0f; | ||
| } | ||
|
|
||
| // copies only the valid part from GPU to CPU (note: could use async mem copy in future...) | ||
| for (int irec = 0; irec < mp->nrec_local; irec++) { | ||
| int device_offset = irec * nlength_seismogram; | ||
| int host_offset = irec * nlength_seismogram; | ||
|
|
||
| print_CUDA_error_if_any(cudaMemcpy(h_seismo + host_offset, | ||
| d_seismo + device_offset, | ||
| sizeof(realw) * valid_length, | ||
| cudaMemcpyDeviceToHost),72001); | ||
| print_CUDA_error_if_any(cudaMemcpy(h_seismo + size + host_offset, | ||
| d_seismo + size + device_offset, | ||
| sizeof(realw) * valid_length, | ||
| cudaMemcpyDeviceToHost),72002); | ||
| } |
Comment on lines
+293
to
+311
| Mesh* mp = (Mesh*)(*Mesh_pointer_f); | ||
|
|
||
| synchronize_cuda(); | ||
|
|
||
| if (mp->nrec_local == 0) return; | ||
|
|
||
| int i_sig = *i_sigf - 1; | ||
| int seismo_current = *seismo_currentf; | ||
| int nlength_seismogram = *nlength_seismogramf; | ||
|
|
||
| if (seismo_current <= 0) return; | ||
|
|
||
| if (seismo_current > nlength_seismogram) seismo_current = nlength_seismogram; | ||
|
|
||
| realw* h_seismo = mp->h_seismograms[i_sig]; | ||
| realw* d_seismo = mp->d_seismograms[i_sig]; | ||
|
|
||
| cudaStreamSynchronize(mp->compute_stream); | ||
|
|
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Member
|
thanks for the contribution! 🙏 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixed a bug in GPU mode where, when NTSTEP_BETWEEN_OUTPUT_SAMPLE was not 1, the output seismograph file appended the previous data at the end, thereby overwriting the original data.
Before repair, single-channel data when NTSTEP_BETWEEN_OUTPUT_SAMPLE = 1.
(There is some repetition of the data above at the end of the data, which overwrites the original data.)
After repair, single-channel data when NTSTEP_BETWEEN_OUTPUT_SAMPLE = 10.