Hello everyone,
I am currently experimenting with modifying the LTS time integration in the GPU implementation of SPECFEM3D by replacing the original Newmark scheme with a PEFRL symplectic integrator.
This post ended up being a bit long because I wanted to describe the modification strategy, formulas, and current issues as clearly as possible. Thank you very much for taking the time to read through it, and I would greatly appreciate any comments or suggestions.
Background
I am modifying the time integration scheme in the GPU Local Time Stepping (LTS) implementation of SPECFEM3D.
The goal is to replace the original Newmark time integrator with the PEFRL symplectic integrator, while keeping the existing GPU force computation kernels unchanged.
The modification idea is illustrated in the following figure.More detailed instructions are provided after the image.

Original Time Integration Structure (Newmark)
In the original SPECFEM3D implementation, the time stepping structure is:
do m = 1, p_level_m_loops(ilevel)
do iphase = 1,2
compute_forces_viscoelastic_cuda_lts
lts_boundary_contribution_cuda
compute_stacey_viscoelastic_GPU
compute_add_sources_viscoelastic_GPU
assemble_MPI_vector_async_sendrecv_lts
end do
lts_newmark_update ! A → V(half) → D
end do
In this structure:
- The force evaluation is performed twice (iphase = 1,2)
- Then the Newmark time integration updates velocity and displacement.
Conceptually this corresponds to:
A(t) → V(t + Δt/2) → U(t + Δt)
PEFRL Time Integration Scheme
The PEFRL integrator updates displacement and velocity through several stages.
The update sequence is:
U1= U(t) + ξ Δt V(t)
V1 = V(t) + λ Δt A(U1)
U2 = U1 + χ Δt V1
V2 = V1 + (1 − 2λ) Δt A(U2)
U3 = U2 + (1 − 2(χ + ξ)) Δt V2
V3 = V2 + λ Δt A(U3)
U4 = U3 + χ Δt V3
V4 = V3 + λ Δt A(U4)
U5 = U4 + ξ Δt V4
Implementation Strategy
Instead of calling Newmark once per LTS loop, I expanded the integration into multiple PEFRL stages.
The modified structure becomes:
do m = 1, p_level_m_loops(ilevel)
! PEFRL first displacement update
U = U + ξ * dt * V
do s = 1,4
do iphase = 1,2
compute_forces_viscoelastic_cuda_lts
lts_boundary_contribution_cuda
compute_stacey_viscoelastic_GPU
compute_add_sources_viscoelastic_GPU
assemble_MPI_vector_async_sendrecv_lts
end do
call lts_PEFRL_update ! A → V → U
end do
end do
Key Design Choices
1. Force kernels remain unchanged
All original GPU kernels for force computation are reused:
- viscoelastic internal forces
- LTS boundary contribution
- Stacey absorbing boundary
- source terms
- MPI halo exchange
Thus the physical force operator remains identical to the original implementation.
2. PEFRL stage loop
A new loop
do s = 1,4
epresents the internal PEFRL stages.
Each stage performs:
- force evaluation → acceleration
- velocity update
- displacement update
according to the PEFRL coefficients.
3. Displacement-only update
The first PEFRL step
U1 = U + ξ Δt V
is implemented before entering the stage loop as a displacement-only update.
4. Mapping between PEFRL variables and SPECFEM variables
The update sequence inside lts_PEFRL_update follows:
A → V → U
meaning acceleration is computed first, followed by velocity and displacement updates.
PS: The full code for this section is provided at the end of this post. The example shown uses kernel_P; the other two kernels (p2 and r) were modified in the same way.
Observed Issues
Two problems appear during testing.
Issue 1 – Phase Error in Degenerated Single-Level Model
To simplify debugging, I tested a uniform mesh model with only one LTS level.
In this case, LTS should effectively behave like a standard global time stepping scheme.
However, the waveform gradually shows phase drift as simulation time increases.
Observed behavior:
amplitude remains approximately stable
waveform phase progressively shifts
Issue 2 – Exponential Instability in Normal LTS Model
For normal multi-level LTS models, the solution becomes unstable and the wavefield grows exponentially after some simulation time.
Observed behavior:
solution initially looks correct
after several hundred time steps amplitudes grow rapidly
Questions
I would greatly appreciate suggestions regarding the following:
- Possible reuse of Newmark-based variable relations inside PEFRL stages
Could the phase drift observed in the single-level model be caused by variable relations originally derived for the Newmark scheme being reused inside the PEFRL stages?
The force computation kernels themselves were not modified, and they still use the original Newmark-based variable relations. However, with PEFRL, these routines are now executed multiple times within one time step.
I am wondering whether some of these relations implicitly assume a single Newmark update per time step, and repeating them several times inside the PEFRL stages could disturb the time integration and introduce phase drift.
- Another possible issue I am considering is whether the existing LTS velocity reconstruction formula, originally derived for the Newmark scheme, may not be compatible with PEFRL.
In the current code, when ilevel > 1, velocity is reconstructed from displacement using the following relation:
if (ilevel > 1) {
realw deltat_inv = 1.0f/deltat_lts;
veloc[index] += deltat_inv * (displ[indexm1] - displ[index]);
veloc[index+1] += deltat_inv * (displ[indexm1+1] - displ[index+1]);
veloc[index+2] += deltat_inv * (displ[indexm1+2] - displ[index+2]);
}
This formula originate from the Newmark time integration structure, where velocity is related to displacement differences across time levels.
I am wondering whether keeping this Newmark-derived velocity reconstruction inside the LTS framework could introduce inconsistencies, possibly leading to:exponential instability in the multi-level LTS model.
Any suggestions or references would be greatly appreciated.
Thank you very much for your time.
__global__ void lts_pefrl_update_kernel_P(
realw* displ,
realw* veloc,
realw* accel,
realw* rmassxyz,
realw* rmassxyz_mod,
realw* cmassxyz,
int ilevel,
int step_m, //
int NGLOB_AB,
int NGLOB_PLEVELS_END,
int num_p_level,
realw c_dt, // PEFRL velocity coefficient * dt
realw d_dt, // PEFRL displacement coefficient * dt
realw deltat_lts, //
int is_last_stage,
int is_first_stage,
int collect_accel,
realw* accel_collected,
int* mask_ibool_collected
) {
int iglob = threadIdx.x
+ (blockIdx.x + blockIdx.y * gridDim.x) * blockDim.x;
if (iglob >= NGLOB_PLEVELS_END) return;
/* ----------------------------
* 1. M^{-1} a
* ---------------------------- */
realw ax, ay, az;
if (ilevel < num_p_level) {
ax = rmassxyz[iglob*3 ] * accel[iglob*3 ];
ay = rmassxyz[iglob*3+1] * accel[iglob*3+1];
az = rmassxyz[iglob*3+2] * accel[iglob*3+2];
} else {
ax = rmassxyz_mod[iglob*3 ] * accel[iglob*3 ];
ay = rmassxyz_mod[iglob*3+1] * accel[iglob*3+1];
az = rmassxyz_mod[iglob*3+2] * accel[iglob*3+2];
}
accel[iglob*3 ] = ax;
accel[iglob*3+1] = ay;
accel[iglob*3+2] = az;
int index = iglob*3 + NDIM * NGLOB_AB * (ilevel-1);
int indexm1 = iglob*3 + NDIM * NGLOB_AB * (ilevel-2);
/* ----------------------------
* 2. V update (PEFRL)
* V += c_dt * a
* ---------------------------- */
veloc[index ] += c_dt * ax;
veloc[index+1] += c_dt * ay;
veloc[index+2] += c_dt * az;
/* ------------------------------------------
* 3. LTS
* V += (2 / Δt) * (u_{m-1} - u_m)
* ------------------------------------------ */
if (ilevel > 1 && step_m == 1 && is_first_stage) {
realw fac = 1.0f / deltat_lts;
realw dux = displ[indexm1 ] - displ[index ];
realw duy = displ[indexm1+1] - displ[index+1];
realw duz = displ[indexm1+2] - displ[index+2];
if (ilevel == num_p_level) {
// coarsest level
realw invx = 1.0f / (1.0f + rmassxyz[iglob*3 ] * cmassxyz[iglob*3 ]);
realw invy = 1.0f / (1.0f + rmassxyz[iglob*3+1] * cmassxyz[iglob*3+1]);
realw invz = 1.0f / (1.0f + rmassxyz[iglob*3+2] * cmassxyz[iglob*3+2]);
veloc[index ] += invx * fac * dux;
veloc[index+1] += invy * fac * duy;
veloc[index+2] += invz * fac * duz;
} else {
veloc[index ] += fac * dux;
veloc[index+1] += fac * duy;
veloc[index+2] += fac * duz;
}
}
/* ----------------------------
* 4. U update (PEFRL)
* U += d_dt * V
* ---------------------------- */
displ[index ] += d_dt * veloc[index ];
displ[index+1] += d_dt * veloc[index+1];
displ[index+2] += d_dt * veloc[index+2];
/* ----------------------------------------
* 5. collect_accel
* ---------------------------------------- */
if (collect_accel && is_last_stage) {
if (!mask_ibool_collected[iglob]) {
if (ilevel < num_p_level) {
accel_collected[iglob*3 ] = rmassxyz_mod[iglob*3 ] / rmassxyz[iglob*3 ] * ax;
accel_collected[iglob*3+1] = rmassxyz_mod[iglob*3+1] / rmassxyz[iglob*3+1] * ay;
accel_collected[iglob*3+2] = rmassxyz_mod[iglob*3+2] / rmassxyz[iglob*3+2] * az;
} else {
accel_collected[iglob*3 ] = ax;
accel_collected[iglob*3+1] = ay;
accel_collected[iglob*3+2] = az;
}
mask_ibool_collected[iglob] = 1;
}
}
}
Hello everyone,
I am currently experimenting with modifying the LTS time integration in the GPU implementation of SPECFEM3D by replacing the original Newmark scheme with a PEFRL symplectic integrator.
This post ended up being a bit long because I wanted to describe the modification strategy, formulas, and current issues as clearly as possible. Thank you very much for taking the time to read through it, and I would greatly appreciate any comments or suggestions.
Background
I am modifying the time integration scheme in the GPU Local Time Stepping (LTS) implementation of SPECFEM3D.

The goal is to replace the original Newmark time integrator with the PEFRL symplectic integrator, while keeping the existing GPU force computation kernels unchanged.
The modification idea is illustrated in the following figure.More detailed instructions are provided after the image.
Original Time Integration Structure (Newmark)
In the original SPECFEM3D implementation, the time stepping structure is:
In this structure:
Conceptually this corresponds to:
PEFRL Time Integration Scheme
The PEFRL integrator updates displacement and velocity through several stages.
The update sequence is:
Implementation Strategy
Instead of calling Newmark once per LTS loop, I expanded the integration into multiple PEFRL stages.
The modified structure becomes:
Key Design Choices
1. Force kernels remain unchanged
All original GPU kernels for force computation are reused:
Thus the physical force operator remains identical to the original implementation.
2. PEFRL stage loop
A new loop
do s = 1,4epresents the internal PEFRL stages.
Each stage performs:
according to the PEFRL coefficients.
3. Displacement-only update
The first PEFRL step
U1 = U + ξ Δt Vis implemented before entering the stage loop as a displacement-only update.
4. Mapping between PEFRL variables and SPECFEM variables
The update sequence inside lts_PEFRL_update follows:
A → V → U
meaning acceleration is computed first, followed by velocity and displacement updates.
PS: The full code for this section is provided at the end of this post. The example shown uses kernel_P; the other two kernels (p2 and r) were modified in the same way.
Observed Issues
Two problems appear during testing.
Issue 1 – Phase Error in Degenerated Single-Level Model
To simplify debugging, I tested a uniform mesh model with only one LTS level.
In this case, LTS should effectively behave like a standard global time stepping scheme.
However, the waveform gradually shows phase drift as simulation time increases.
Observed behavior:
amplitude remains approximately stable
waveform phase progressively shifts
Issue 2 – Exponential Instability in Normal LTS Model
For normal multi-level LTS models, the solution becomes unstable and the wavefield grows exponentially after some simulation time.
Observed behavior:
solution initially looks correct
after several hundred time steps amplitudes grow rapidly
Questions
I would greatly appreciate suggestions regarding the following:
Could the phase drift observed in the single-level model be caused by variable relations originally derived for the Newmark scheme being reused inside the PEFRL stages?
The force computation kernels themselves were not modified, and they still use the original Newmark-based variable relations. However, with PEFRL, these routines are now executed multiple times within one time step.
I am wondering whether some of these relations implicitly assume a single Newmark update per time step, and repeating them several times inside the PEFRL stages could disturb the time integration and introduce phase drift.
In the current code, when ilevel > 1, velocity is reconstructed from displacement using the following relation:
This formula originate from the Newmark time integration structure, where velocity is related to displacement differences across time levels.
I am wondering whether keeping this Newmark-derived velocity reconstruction inside the LTS framework could introduce inconsistencies, possibly leading to:exponential instability in the multi-level LTS model.
Any suggestions or references would be greatly appreciated.
Thank you very much for your time.