This code file implements a recursive algorithm to solve for the diagonal elements of the Non-Equilibrium Green's Function (NEGF). It is designed to handle matrices by 3 diagonals. The primary outputs are various Green's functions: Retarded (Grl, Grd, Gru), Electron (Gnl, Gnd, Gnu), and Hole (Gpl, Gpd, Gpu - though these are not present in the final kernel's signature). The calculations are performed on a GPU using Numba and CUDA.
Gpu_3d_Block_Recursive:- Description: This is the main CUDA kernel that implements the recursive algorithm to compute Green's functions. It takes coefficient matrices (AD, ALD, AUD) and self-energies (Sigin, Sigout) as inputs and calculates the retarded (Grl, Grd, Gru) and electron (Gnl, Gnd, Gnu) Green's functions. It also uses several helper device functions for matrix operations.
- Parameters:
AD (complex64[:,:,:,:]): Matrix of coefficients (diagonal elements).ALD (complex64[:,:,:,:]): Matrix of coefficients (lower diagonal elements).AUD (complex64[:,:,:,:]): Matrix of coefficients (upper diagonal elements).Sigin (complex64[:,:,:,:]): Matrix of in-scattering self-energies (diagonal).Sigout (complex64[:,:,:,:]): Matrix of out-scattering self-energies (diagonal).gaL (complex64[:,:,:,:]): Output array for advanced Green's function (left-connected).grL (complex64[:,:,:,:]): Output array for retarded Green's function (left-connected).ginL (complex64[:,:,:,:]): Output array for electron in-scattering Green's function (left-connected).gipL (complex64[:,:,:,:]): Output array for hole in-scattering Green's function (left-connected). (Note: mentioned in comments, but not directly used in the same way as others in the kernel body for final output G's).Grl (complex64[:,:,:,:]): Output array for Retarded Green's Function (lower diagonal).Grd (complex64[:,:,:,:]): Output array for Retarded Green's Function (diagonal).Gru (complex64[:,:,:,:]): Output array for Retarded Green's Function (upper diagonal).Gnd (complex64[:,:,:,:]): Output array for Electron Green's Function (diagonal).Gnu (complex64[:,:,:,:]): Output array for Electron Green's Function (upper diagonal).Gnl (complex64[:,:,:,:]): Output array for Electron Green's Function (lower diagonal).Al_cr (complex64[:,:,:,:]): Helper array for Hermitian conjugate of AUD.Ad_cr (complex64[:,:,:,:]): Helper array for Hermitian conjugate of AD.Au_cr (complex64[:,:,:,:]): Helper array for Hermitian conjugate of ALD.Gal (complex64[:,:,:,:]): Output array for advanced Green's function (lower diagonal).Gad (complex64[:,:,:,:]): Output array for advanced Green's function (diagonal).Gau (complex64[:,:,:,:]): Output array for advanced Green's function (upper diagonal).
- Block/Grid Dimensions: The kernel uses
i = cuda.grid(1), suggesting it is launched with a 1D grid of thread blocks. The size of the grid typically corresponds toNe, the number of energy points or independent calculations.
inverse_matrix(mat, B, A):- Description: Computes the inverse of a matrix
matusing Gaussian elimination with partial pivoting and stores it inB.Ais used as an augmented matrix internally. - Parameters:
mat (complex64[:,:]): Input matrix to be inverted.B (complex64[:,:]): Output matrix to store the inverse.A (complex64[:,:]): Internal temporary augmented matrix.
- Returns:
complex64[:,:](reference toB).
- Description: Computes the inverse of a matrix
mul_mat(A, B, C):- Description: Performs matrix multiplication of
AandB(A @ B) and stores the result inC. - Parameters:
A (complex64[:,:]): First input matrix.B (complex64[:,:]): Second input matrix.C (complex64[:,:]): Output matrix for the result.
- Returns:
complex64[:,:](reference toC).
- Description: Performs matrix multiplication of
add_num(a, b, c):- Description: Adds two complex scalars
aandband stores the result inc. (Docstring incorrectly states matrix multiplication). - Parameters:
a (complex64): First scalar.b (complex64): Second scalar.c (complex64): Output scalar for the sum.
- Returns:
complex64(the sum).
- Description: Adds two complex scalars
sub_mat(A, B, C):- Description: Subtracts matrix
Bfrom matrixA(A - B) and stores the result inC. - Parameters:
A (complex64[:,:]): First input matrix.B (complex64[:,:]): Second input matrix (to be subtracted).C (complex64[:,:]): Output matrix for the result.
- Returns:
complex64[:,:](reference toC).
- Description: Subtracts matrix
add_mat(A, B, C):- Description: Adds matrix
Aand matrixB(A + B) and stores the result inC. - Parameters:
A (complex64[:,:]): First input matrix.B (complex64[:,:]): Second input matrix.C (complex64[:,:]): Output matrix for the result.
- Returns: (void, modifies
Cin-place).
- Description: Adds matrix
set_mat(B, A):- Description: Copies the elements from matrix
Ainto matrixB. - Parameters:
B (complex64[:,:]): Destination matrix.A (complex64[:,:]): Source matrix.
- Returns: (void, modifies
Bin-place).
- Description: Copies the elements from matrix
neg_mat(A):- Description: Negates all elements in matrix
A. - Parameters:
A (complex64[:,:]): Input matrix (modified in-place).
- Returns: (void, modifies
Ain-place).
- Description: Negates all elements in matrix
pw_mul(A, B, C):- Description: Performs element-wise multiplication of matrices
AandBand stores the result inC. - Parameters:
A (complex64[:,:]): First input matrix.B (complex64[:,:]): Second input matrix.C (complex64[:,:]): Output matrix for the result.
- Returns:
complex64[:,:](reference toC).
- Description: Performs element-wise multiplication of matrices
add_scalar_mat(a, B, C):- Description: Adds a scalar
ato all elements of matrixBand stores the result inC. - Parameters:
a (complex64): Scalar value.B (complex64[:,:]): Input matrix.C (complex64[:,:]): Output matrix for the result.
- Returns:
complex64[:,:](reference toC).
- Description: Adds a scalar
mul_scalar_mat(a, B, C):- Description: Multiplies all elements of matrix
Bby a scalaraand stores the result inC. - Parameters:
a (complex64): Scalar value.B (complex64[:,:]): Input matrix.C (complex64[:,:]): Output matrix for the result.
- Returns:
complex64[:,:](reference toC).
- Description: Multiplies all elements of matrix
abs_err(A, error):- Description: Calculates the sum of the absolute values (magnitudes) of all complex elements in matrix
A. The result is stored in the first element of theerrorarray. - Parameters:
A (complex64[:,:]): Input matrix.error (float32[:]): Output array (1-element) to store the sum.
- Returns:
float32[:](reference toerror).
- Description: Calculates the sum of the absolute values (magnitudes) of all complex elements in matrix
conjugate(num):- Description: Computes the complex conjugate of a scalar
num. - Parameters:
num (complex64): Input scalar.
- Returns:
complex64(the complex conjugate).
- Description: Computes the complex conjugate of a scalar
hconj_mat(A, B):- Description: Computes the Hermitian conjugate (conjugate transpose) of matrix
Aand stores it inB. - Parameters:
A (complex64[:,:]): Input matrix.B (complex64[:,:]): Output matrix for the Hermitian conjugate.
- Returns:
complex64[:,:](reference toB).
- Description: Computes the Hermitian conjugate (conjugate transpose) of matrix
- N/A: The file primarily consists of the main CUDA kernel and its associated device functions.
EPS (float): A small constant1.0e-19used ininverse_matrixto handle potential division by zero or very small pivot elements, ensuring numerical stability.Np (int): Represents the size of the matrices (e.g., number of points in a spatial dimension, or layers). Inferred from1+Grl.shape[0]in the kernel. The header comments also mention "Np = Size Of The Matrices".Ne (int): Represents the number of independent calculations, often corresponding to energy points. Inferred fromGrl.shape[1]in the kernel.- Green's Function Matrices:
Grl, Grd, Gru: Components of the Retarded Green's Function (lower, diagonal, upper).Gnl, Gnd, Gnu: Components of the Electron Green's Function (lower, diagonal, upper).Gal, Gad, Gau: Components of the Advanced Green's Function (lower, diagonal, upper).grL, ginL, gaL: Left-connected Green's functions used in intermediate calculations.
- Coefficient Matrices:
AD, ALD, AUD: Coefficient matrices representing the system's Hamiltonian or coupling terms (diagonal, lower, upper).
- Self-Energy Matrices:
Sigin, Sigout: In-scattering and out-scattering self-energies, crucial for NEGF calculations.
The Gpu_3d_Block_Recursive kernel is designed to be launched on a CUDA-capable GPU. Below is a conceptual example of how it might be set up and called.
import numpy as np
from numba import cuda
# Assuming Gpu_3d_Block_Recursive is in the current scope or imported
# from current_module import Gpu_3d_Block_Recursive
# Define problem parameters (example values)
Np_val = 10 # Number of points/layers
Ne_val = 100 # Number of energy points
mat_size = 4 # Size of the sub-matrices (e.g., 4x4)
# Initialize input matrices on the CPU (typically with numpy)
# These would be populated with actual physical parameters
AD_cpu = np.random.rand(Np_val, Ne_val, mat_size, mat_size).astype(np.complex64)
ALD_cpu = np.random.rand(Np_val, Ne_val, mat_size, mat_size).astype(np.complex64)
AUD_cpu = np.random.rand(Np_val, Ne_val, mat_size, mat_size).astype(np.complex64)
Sigin_cpu = np.random.rand(Np_val, Ne_val, mat_size, mat_size).astype(np.complex64)
Sigout_cpu = np.random.rand(Np_val, Ne_val, mat_size, mat_size).astype(np.complex64) # Though Sigout is not used in the current kernel code
# Allocate memory on the GPU and copy data
AD_gpu = cuda.to_device(AD_cpu)
ALD_gpu = cuda.to_device(ALD_cpu)
AUD_gpu = cuda.to_device(AUD_cpu)
Sigin_gpu = cuda.to_device(Sigin_cpu)
Sigout_gpu = cuda.to_device(Sigout_cpu) # Matching kernel signature
# Allocate output arrays on the GPU
# Note: Np in kernel (Np-1 for array dim) vs Np_val for allocation size
Grl_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
Grd_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gru_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gnl_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gnd_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gnu_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
# Allocate intermediate/helper arrays on the GPU
gaL_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
grL_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
ginL_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
gipL_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64) # As per signature
Al_cr_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Ad_cr_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Au_cr_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gal_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gad_gpu = cuda.device_array((Np_val, Ne_val, mat_size, mat_size), dtype=np.complex64)
Gau_gpu = cuda.device_array((Np_val - 1, Ne_val, mat_size, mat_size), dtype=np.complex64)
# Configure kernel launch parameters
threadsperblock = 32 # Example value, needs tuning
blockspergrid = (Ne_val + (threadsperblock - 1)) // threadsperblock
# Launch the kernel
Gpu_3d_Block_Recursive[blockspergrid, threadsperblock](
AD_gpu, ALD_gpu, AUD_gpu, Sigin_gpu, Sigout_gpu,
gaL_gpu, grL_gpu, ginL_gpu, gipL_gpu,
Grl_gpu, Grd_gpu, Gru_gpu,
Gnd_gpu, Gnu_gpu, Gnl_gpu,
Al_cr_gpu, Ad_cr_gpu, Au_cr_gpu,
Gal_gpu, Gad_gpu, Gau_gpu
)
cuda.synchronize()
# Copy results back to CPU if needed
Grd_results_cpu = Grd_gpu.copy_to_host()
# print(Grd_results_cpu)- Internal Dependencies:
- The main kernel
Gpu_3d_Block_Recursiveheavily relies on the various CUDA device functions defined within the same file (e.g.,inverse_matrix,mul_mat,hconj_mat, etc.) for its computations.
- The main kernel
- External Libraries:
numpy: Used for creating, initializing, and manipulating numerical arrays on the CPU before they are transferred to the GPU. Also used for handling results copied back from the GPU.numba.cuda: Essential for the entire GPU computation. It provides:@cuda.jitdecorator for JIT compiling Python functions into CUDA kernels and device functions.cuda.to_device()for transferring data from CPU to GPU.cuda.device_array()for allocating memory directly on the GPU.cuda.local.array()for creating statically sized arrays local to a thread within a kernel.cuda.grid(1)for determining thread index within the kernel.cuda.synchronize()for ensuring kernel completion.
- Interactions:
- This file provides a core computational kernel for solving a part of a Non-Equilibrium Green's Function problem.
- It is expected to be called by a higher-level Python script that sets up the overall physics simulation, prepares the input data (Hamiltonians, self-energies), manages GPU resources, and orchestrates the kernel launch.
- The output Green's functions (Grd, Grl, Gru, Gnd, Gnl, Gnu) are fundamental quantities in NEGF theory and would be used by subsequent parts of the simulation for calculating physical observables like current, density of states, etc.