Skip to content

Implementation of PEFRL Time Integration in LTS GPU Framework of SPECFEM3D #1863

@Takunmi

Description

@Takunmi

Hello everyone,

I am currently experimenting with modifying the LTS time integration in the GPU implementation of SPECFEM3D by replacing the original Newmark scheme with a PEFRL symplectic integrator.

This post ended up being a bit long because I wanted to describe the modification strategy, formulas, and current issues as clearly as possible. Thank you very much for taking the time to read through it, and I would greatly appreciate any comments or suggestions.

Background

I am modifying the time integration scheme in the GPU Local Time Stepping (LTS) implementation of SPECFEM3D.
The goal is to replace the original Newmark time integrator with the PEFRL symplectic integrator, while keeping the existing GPU force computation kernels unchanged.
The modification idea is illustrated in the following figure.‌More detailed instructions are provided after the image.‌
Image

Original Time Integration Structure (Newmark)

In the original SPECFEM3D implementation, the time stepping structure is:

do m = 1, p_level_m_loops(ilevel)

  do iphase = 1,2

    compute_forces_viscoelastic_cuda_lts
    lts_boundary_contribution_cuda
    compute_stacey_viscoelastic_GPU
    compute_add_sources_viscoelastic_GPU
    assemble_MPI_vector_async_sendrecv_lts

  end do

  lts_newmark_update   ! A → V(half) → D

end do

In this structure:

  • The force evaluation is performed twice (iphase = 1,2)
  • Then the Newmark time integration updates velocity and displacement.

Conceptually this corresponds to:

A(t) → V(t + Δt/2) → U(t + Δt)

PEFRL Time Integration Scheme

The PEFRL integrator updates displacement and velocity through several stages.
The update sequence is:

U1= U(t) + ξ Δt V(t)
V1 = V(t) + λ Δt A(U1)
U2 = U1 + χ Δt V1
V2 = V1 + (1 − 2λ) Δt A(U2)
U3 = U2 + (1 − 2(χ + ξ)) Δt V2
V3 = V2 + λ Δt A(U3)
U4 = U3 + χ Δt V3
V4 = V3 + λ Δt A(U4)
U5 = U4 + ξ Δt V4

Implementation Strategy

Instead of calling Newmark once per LTS loop, I expanded the integration into multiple PEFRL stages.
The modified structure becomes:

do m = 1, p_level_m_loops(ilevel)

  ! PEFRL first displacement update
  U = U + ξ * dt * V

  do s = 1,4

    do iphase = 1,2

      compute_forces_viscoelastic_cuda_lts
      lts_boundary_contribution_cuda
      compute_stacey_viscoelastic_GPU
      compute_add_sources_viscoelastic_GPU
      assemble_MPI_vector_async_sendrecv_lts

    end do

    call lts_PEFRL_update   ! A → V → U

  end do

end do

Key Design Choices

1. Force kernels remain unchanged
All original GPU kernels for force computation are reused:

  • viscoelastic internal forces
  • LTS boundary contribution
  • Stacey absorbing boundary
  • source terms
  • MPI halo exchange

Thus the physical force operator remains identical to the original implementation.

2. PEFRL stage loop

A new loop
do s = 1,4
epresents the internal PEFRL stages.
Each stage performs:

  1. force evaluation → acceleration
  2. velocity update
  3. displacement update

according to the PEFRL coefficients.

3. Displacement-only update
The first PEFRL step
U1 = U + ξ Δt V
is implemented before entering the stage loop as a displacement-only update.

4. Mapping between PEFRL variables and SPECFEM variables
The update sequence inside lts_PEFRL_update follows:
A → V → U
meaning acceleration is computed first, followed by velocity and displacement updates.
PS: The full code for this section is provided at the end of this post. The example shown uses kernel_P; the other two kernels (p2 and r) were modified in the same way.

Observed Issues

Two problems appear during testing.

Issue 1 – Phase Error in Degenerated Single-Level Model

To simplify debugging, I tested a uniform mesh model with only one LTS level.
In this case, LTS should effectively behave like a standard global time stepping scheme.
However, the waveform gradually shows phase drift as simulation time increases.
Observed behavior:
amplitude remains approximately stable
waveform phase progressively shifts

Image

Issue 2 – Exponential Instability in Normal LTS Model
For normal multi-level LTS models, the solution becomes unstable and the wavefield grows exponentially after some simulation time.
Observed behavior:
solution initially looks correct
after several hundred time steps amplitudes grow rapidly

Image

Questions

I would greatly appreciate suggestions regarding the following:

  1. Possible reuse of Newmark-based variable relations inside PEFRL stages

Could the phase drift observed in the single-level model be caused by variable relations originally derived for the Newmark scheme being reused inside the PEFRL stages?

The force computation kernels themselves were not modified, and they still use the original Newmark-based variable relations. However, with PEFRL, these routines are now executed multiple times within one time step.

I am wondering whether some of these relations implicitly assume a single Newmark update per time step, and repeating them several times inside the PEFRL stages could disturb the time integration and introduce phase drift.

  1. Another possible issue I am considering is whether the existing LTS velocity reconstruction formula, originally derived for the Newmark scheme, may not be compatible with PEFRL.

In the current code, when ilevel > 1, velocity is reconstructed from displacement using the following relation:

if (ilevel > 1) {
    realw deltat_inv = 1.0f/deltat_lts;

    veloc[index]   += deltat_inv * (displ[indexm1]   - displ[index]);
    veloc[index+1] += deltat_inv * (displ[indexm1+1] - displ[index+1]);
    veloc[index+2] += deltat_inv * (displ[indexm1+2] - displ[index+2]);
}

This formula originate from the Newmark time integration structure, where velocity is related to displacement differences across time levels.
I am wondering whether keeping this Newmark-derived velocity reconstruction inside the LTS framework could introduce inconsistencies, possibly leading to:exponential instability in the multi-level LTS model.

Any suggestions or references would be greatly appreciated.

Thank you very much for your time.

__global__ void lts_pefrl_update_kernel_P(
  realw* displ,
  realw* veloc,
  realw* accel,
  realw* rmassxyz,
  realw* rmassxyz_mod,
  realw* cmassxyz,
  int    ilevel,
  int    step_m,              // 
  int    NGLOB_AB,
  int    NGLOB_PLEVELS_END,
  int    num_p_level,
  realw  c_dt,          // PEFRL velocity coefficient * dt
  realw  d_dt,          // PEFRL displacement coefficient * dt
  realw  deltat_lts,    // 
  int    is_last_stage,
  int    is_first_stage,
  int    collect_accel,
  realw* accel_collected,
  int*   mask_ibool_collected
) {
  int iglob = threadIdx.x
            + (blockIdx.x + blockIdx.y * gridDim.x) * blockDim.x;

  if (iglob >= NGLOB_PLEVELS_END) return;

  /* ----------------------------
   * 1. M^{-1} a
   * ---------------------------- */
  realw ax, ay, az;
  if (ilevel < num_p_level) {
    ax = rmassxyz[iglob*3  ] * accel[iglob*3  ];
    ay = rmassxyz[iglob*3+1] * accel[iglob*3+1];
    az = rmassxyz[iglob*3+2] * accel[iglob*3+2];
  } else {
    ax = rmassxyz_mod[iglob*3  ] * accel[iglob*3  ];
    ay = rmassxyz_mod[iglob*3+1] * accel[iglob*3+1];
    az = rmassxyz_mod[iglob*3+2] * accel[iglob*3+2];
  }
  accel[iglob*3  ] = ax;
  accel[iglob*3+1] = ay;
  accel[iglob*3+2] = az;

  int index   = iglob*3 + NDIM * NGLOB_AB * (ilevel-1);
  int indexm1 = iglob*3 + NDIM * NGLOB_AB * (ilevel-2);

  /* ----------------------------
   * 2. V update (PEFRL)
   *    V += c_dt * a
   * ---------------------------- */
  veloc[index  ] += c_dt * ax;
  veloc[index+1] += c_dt * ay;
  veloc[index+2] += c_dt * az;

  /* ------------------------------------------
   * 3. LTS
   *    V += (2 / Δt) * (u_{m-1} - u_m)
   * ------------------------------------------ */
  if (ilevel > 1 && step_m == 1 && is_first_stage) {
    realw fac = 1.0f / deltat_lts;

    realw dux = displ[indexm1  ] - displ[index  ];
    realw duy = displ[indexm1+1] - displ[index+1];
    realw duz = displ[indexm1+2] - displ[index+2];

    if (ilevel == num_p_level) {
      // coarsest level
      realw invx = 1.0f / (1.0f + rmassxyz[iglob*3  ] * cmassxyz[iglob*3  ]);
      realw invy = 1.0f / (1.0f + rmassxyz[iglob*3+1] * cmassxyz[iglob*3+1]);
      realw invz = 1.0f / (1.0f + rmassxyz[iglob*3+2] * cmassxyz[iglob*3+2]);

      veloc[index  ] += invx * fac * dux;
      veloc[index+1] += invy * fac * duy;
      veloc[index+2] += invz * fac * duz;
    } else {
      veloc[index  ] += fac * dux;
      veloc[index+1] += fac * duy;
      veloc[index+2] += fac * duz;
    }
  }

  /* ----------------------------
   * 4. U update (PEFRL)
   *    U += d_dt * V
   * ---------------------------- */
  displ[index  ] += d_dt * veloc[index  ];
  displ[index+1] += d_dt * veloc[index+1];
  displ[index+2] += d_dt * veloc[index+2];

  /* ----------------------------------------
   * 5. collect_accel
   * ---------------------------------------- */
  if (collect_accel && is_last_stage) {
    if (!mask_ibool_collected[iglob]) {

      if (ilevel < num_p_level) {
        accel_collected[iglob*3  ] = rmassxyz_mod[iglob*3  ] / rmassxyz[iglob*3  ] * ax;
        accel_collected[iglob*3+1] = rmassxyz_mod[iglob*3+1] / rmassxyz[iglob*3+1] * ay;
        accel_collected[iglob*3+2] = rmassxyz_mod[iglob*3+2] / rmassxyz[iglob*3+2] * az;
      } else {
        accel_collected[iglob*3  ] = ax;
        accel_collected[iglob*3+1] = ay;
        accel_collected[iglob*3+2] = az;
      }

      mask_ibool_collected[iglob] = 1;
    }
  }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions