Chern Numbers in the Lowest Landau Level Dataset Guide:
This dataset is roughly 250 GB in size, and several copies and formats are available. The recommended format is .H5; the uncompressed .npz files can also be accessed directly, but this is significantly slower. Tar archives of the original .npz files are also provided for ease of data transfer and archival.
Raw Data
The raw .npz files exist in compressed and uncompressed form. The dataset is compressed in segments: for efficiency, the directory for each system size is stored as a separate archive. Tar + zstd compression is used, so each archive (for example "N=128.tar.zst") must be decompressed before use.
The following command may be used for decompression:
tar --use-compress-program=unzstd -xvf N=128.tar.zst
and similarly for other system-sizes.
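To extract every archive in one pass, the command above can be wrapped in a short Python script. This is a minimal sketch, assuming `tar` and `unzstd` are available on the PATH and that all archives sit in one directory with a ".tar.zst" suffix; the helper names `decompress_command` and `decompress_all` are illustrative and not part of the dataset tooling.

```python
import subprocess
from pathlib import Path


def decompress_command(archive):
    """Build the tar command from this guide for a single archive."""
    return ["tar", "--use-compress-program=unzstd", "-xvf", str(archive)]


def decompress_all(directory="."):
    """Extract every *.tar.zst archive found in `directory` (assumed layout)."""
    directory = Path(directory)
    for archive in sorted(directory.glob("*.tar.zst")):
        # cwd=directory extracts next to the archive; check=True raises on failure
        subprocess.run(decompress_command(archive.name), cwd=directory, check=True)
```

Calling `decompress_all()` in the download directory would then extract each system size in turn.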
The file-sizes for each system-size archive are as follows:
| System Size | File Size (GB) |
|---|---|
| 8 | 18.43 GB |
| 16v1 | 16.95 GB |
| 16v2 | 16.95 GB |
| 32 | 12.96 GB |
| 64 | 20.53 GB |
| 96 | 19.23 GB |
| 128 | 22.15 GB |
| 192 | 19.69 GB |
| 256 | 21.88 GB |
| 512 | 24.99 GB |
| 1024 | 21.96 GB |
| 2048 | 20.62 GB |
| Total | 236.34 GB |
- "16v1" and "16v2" are two separate datasets for $N_\phi=16$. These may be used to validate analysis methods and to get a rough idea of the precision of analysis metrics.
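One way to use the two $N_\phi=16$ datasets is to evaluate the same metric on each and take the difference as a rough scale for its uncertainty. The sketch below is only an illustration: the metric (fraction of states with a nonzero Chern number) and the helper names are assumptions chosen for the example, not metrics prescribed by this guide.

```python
import numpy as np


def nonzero_chern_fraction(chern_numbers):
    """Example metric: fraction of states carrying a nonzero Chern number."""
    chern_numbers = np.asarray(chern_numbers)
    return np.count_nonzero(chern_numbers) / chern_numbers.size


def compare_datasets(metric, data_v1, data_v2):
    """Evaluate `metric` on both datasets and return (value_v1, value_v2, spread)."""
    value_v1 = metric(data_v1)
    value_v2 = metric(data_v2)
    # The absolute difference between two independent datasets gives a rough
    # scale for the statistical uncertainty of the metric.
    return value_v1, value_v2, abs(value_v1 - value_v2)
```

Any other function of the loaded arrays can be passed as `metric` in the same way.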
For the already uncompressed (original) data, each system-size directory contains one file per trial in .npz format (compressed NumPy format). Each .npz file holds the data for a single trial; its name contains the timestamp at which it was saved and a trial number (which is irrelevant).
An example of opening the data from one .npz file is below:
import numpy as np

def load_trial_data(file_path):
    """Load trial data efficiently from a .npz file."""
    # The context manager closes the underlying archive once the arrays are read
    with np.load(file_path) as data:
        return {
            "PotentialMatrix": data["PotentialMatrix"],
            "ChernNumbers": data["ChernNumbers"],
            "SumChernNumbers": data["SumChernNumbers"],
            "eigs00": data["eigs00"],
            "eigs0pi": data["eigs0pi"],
            "eigsPi0": data["eigsPi0"],
            "eigsPipi": data["eigsPipi"],
        }
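Since every trial lives in its own file, analysis usually starts by gathering one array from all files in a system-size directory. The following is a minimal sketch of that step; the helper name `stack_trials` and the directory name in the usage note are assumptions, while the array keys are the ones listed above.

```python
from pathlib import Path

import numpy as np


def stack_trials(directory, key="eigsPipi"):
    """Stack one saved array from every .npz trial file into shape (num_trials, N)."""
    paths = sorted(Path(directory).glob("*.npz"))
    rows = []
    for path in paths:
        # Indexing the opened archive reads only the requested array;
        # the context manager closes the file afterwards.
        with np.load(path) as data:
            rows.append(data[key])
    return [p.name for p in paths], np.stack(rows)
```

For example, `names, eigs = stack_trials("N=128")` would collect the `eigsPipi` arrays, assuming the extracted directory is named after its archive.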
The eigenvalues are saved in arrays of length $N_\phi$.
There is also a parameter SumChernNumbers: this should be validated when analyzing data to ensure the Chern numbers in the trial sum to one. Sum trials have
Finally, the random potential realization is also saved. The potential matrix is indexed and constructed as in the function below; the matrix should have size parameter POTENTIAL_MATRIX_SIZE = int(4*np.sqrt(NUM_STATES)). It is directly compatible with the simulation code at https://github.com/edeleu/iqhe-simulation.
import numpy as np

IMAG = 1j  # imaginary unit
NUM_STATES = 1024  # placeholder (assumption): set to the number of states used in the simulation

def constructPotential(size, mean=0, stdev=1):
    # define a real, periodic gaussian potential over a size x size field where
    # V(x,y)=sum V_{m,n} exp(2pi*i*m*x/L_x) exp(2pi*i*n*y/L_y)
    # we allow V_(0,0) to be at V[size,size] such that we can have coefficients
    # from V_(-size,-size) to V_(size,size). We want to have both negatively
    # and positively indexed coefficients!
    # note: `mean` is accepted but unused; all draws below are zero-mean
    V = np.zeros((2*size+1, 2*size+1), dtype=complex)
    # loop over values in the positive +i,+j quadrant and +i,-j quadrant,
    # assigning conjugates at opposite quadrants
    for i in range(size + 1):
        for j in range(-size, size + 1):
            # Real and imaginary parts
            real_part = np.random.normal(0, stdev)
            imag_part = np.random.normal(0, stdev)
            # Assign the complex value
            V[size + i, size + j] = real_part + IMAG * imag_part
            # Enforce the symmetry condition, satisfy that V_{i,j}^* = V_{-i,-j}
            if not (i == 0 and j == 0):  # Avoid double-setting the origin
                V[size - i, size - j] = real_part - IMAG * imag_part
    # set origin equal to a REAL number! (DC OFFSET)
    # V[size, size] = np.random.normal(0, stdev) + 0*IMAG
    # Set DC offset to zero so avg over real-space potential = 0
    V[size, size] = 0
    return V/np.sqrt(NUM_STATES)
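The symmetry $V_{-m,-n} = V_{m,n}^*$ enforced in the function above is what makes the real-space potential real, so it is a useful sanity check on any loaded potential matrix. The sketch below assumes the layout built by `constructPotential` (an odd-sized square matrix with $V_{0,0}$ at the centre); the helper names and the box lengths `Lx`, `Ly` are illustrative placeholders.

```python
import numpy as np


def is_conjugate_symmetric(V, tol=1e-12):
    """Check V_{-m,-n} == conj(V_{m,n}) for a matrix with V_{0,0} at the centre."""
    # Reversing both axes maps index (size+m, size+n) to (size-m, size-n)
    return np.allclose(V[::-1, ::-1], np.conj(V), atol=tol)


def potential_at(V, x, y, Lx=1.0, Ly=1.0):
    """Evaluate V(x,y) = sum_{m,n} V_{m,n} exp(2*pi*i*m*x/Lx) exp(2*pi*i*n*y/Ly)."""
    size = (V.shape[0] - 1) // 2
    modes = np.arange(-size, size + 1)
    phase_x = np.exp(2j * np.pi * modes * x / Lx)  # one factor per m
    phase_y = np.exp(2j * np.pi * modes * y / Ly)  # one factor per n
    return phase_x @ V @ phase_y
```

For a matrix that passes the symmetry check, the imaginary part of `potential_at(V, x, y)` should vanish up to floating-point rounding.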
Processed Data
The dataset also exists in a pre-processed form to simplify loading, which becomes cumbersome with a large number of trials and individual .npz files. Each system size of the dataset has been pre-processed into .H5 format with GZIP compression, which is compatible with a wide variety of languages (Python, Mathematica, MATLAB, Julia, etc.). This format allows specific parts of the dataset to be transferred to memory quickly and efficiently for simple processing. This is the best way of accessing the dataset.
The structure of the file for each system-size is as follows:
/filenames [num_trials] dtype=string
/PotentialMatrix [num_trials, POTENTIAL_MATRIX_SIZE, POTENTIAL_MATRIX_SIZE] dtype=complex128
/ChernNumbers [num_trials, $N_\phi$] dtype=int64
/SumChernNumbers [num_trials] dtype=float64
/eigs00 [num_trials, $N_\phi$] dtype=float64
/eigs0pi [num_trials, $N_\phi$] dtype=float64
/eigsPi0 [num_trials, $N_\phi$] dtype=float64
/eigsPipi [num_trials, $N_\phi$] dtype=float64
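Because HDF5 datasets can be sliced on disk, a block of trials can be read without loading a whole array into memory. The following is a minimal sketch assuming the layout listed above; `read_trial_block`, `iter_trial_blocks`, and the default block size are illustrative choices rather than part of the dataset tooling.

```python
import h5py


def read_trial_block(path, start, stop, keys=("eigsPipi", "ChernNumbers")):
    """Read trials [start, stop) of the chosen datasets into memory."""
    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset reads only the requested rows from disk
        return {key: f[key][start:stop] for key in keys}


def iter_trial_blocks(path, block_size=1000, keys=("eigsPipi", "ChernNumbers")):
    """Yield consecutive blocks of trials so that memory use stays bounded."""
    with h5py.File(path, "r") as f:
        num_trials = f[keys[0]].shape[0]
    for start in range(0, num_trials, block_size):
        yield read_trial_block(path, start, min(start + block_size, num_trials), keys)
```

This is mainly useful for the larger arrays such as `PotentialMatrix`, where loading every trial at once may not fit in memory.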
Iterating through the trials of data in Python may look like the following:
import h5py
from tqdm import tqdm

file = "N=1024_GZIP.h5"
with h5py.File(file, "r") as f:
    print(list(f.keys()))  # list available datasets
    # Access the original filenames
    filenames = f["filenames"][:]
    num_trials = len(filenames)
    print(num_trials)  # total number of trials
    # Next, load the desired datasets into memory
    total_eigsPiPi = f["eigsPipi"][:]  # shape (n_trials, n_eigs)
    total_chernNumbers = f["ChernNumbers"][:]  # shape (n_trials, n_eigs)
    total_sumChernNumbers = f["SumChernNumbers"][:]  # shape (n_trials,)
    # total_potentials = f["PotentialMatrix"]  # shape (n_trials, L, L)
    # Iterate through trials
    for i in tqdm(range(num_trials)):
        # skip trials whose Chern numbers do not sum to one
        if total_sumChernNumbers[i] != 1:
            continue
        eigs = total_eigsPiPi[i]
        cherns = total_chernNumbers[i]
        # process eigs, cherns, whatever...
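Once the arrays are in memory, the per-trial check can also be written as a vectorised NumPy filter. This is a minimal sketch under two assumptions: that `SumChernNumbers` is best compared with a tolerance because it is stored as float64, and that entry `k` of an eigenvalue array corresponds to entry `k` of `ChernNumbers` in the same trial. The helper names are illustrative.

```python
import numpy as np


def select_valid_trials(eigs, cherns, sum_cherns):
    """Keep only trials whose Chern numbers sum to one."""
    # SumChernNumbers is stored as float64, so compare with a tolerance
    valid = np.isclose(sum_cherns, 1.0)
    return eigs[valid], cherns[valid]


def pooled_nonzero_chern_energies(eigs, cherns):
    """Pool all trials and return the energies of states with nonzero Chern number."""
    # Boolean indexing flattens the (n_trials, n_eigs) arrays into one 1D array
    return eigs[cherns != 0]
```

With the arrays loaded above, `select_valid_trials(total_eigsPiPi, total_chernNumbers, total_sumChernNumbers)` would return the filtered eigenvalues and Chern numbers in one step.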
Loading the data in Mathematica may look like...
(* list datasets *)
Import["N=1024.h5", "Datasets"]
(* trial count *)
filenames = Import["N=1024.h5", {"Datasets", "filenames"}]
numTrials = Length[filenames]
(* load eigs for trial 5 *)
Import["N=1024.h5", {"Datasets", "eigsPipi", 5}]
(*Iterate through trials*)
allTrialEigs = Import["N=1024.h5", {"Datasets", "eigsPipi"}]
As the potential matrix data takes up a significant amount of storage, there is also a version of the dataset further processed into .CSV files. Each file is organized into the following columns: Trial # | eigs00 | eigsPi0 | eigs0Pi | eigsPiPi | chern
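The column layout above can be read back into per-trial groups with the standard library alone. The sketch below makes several assumptions that should be checked against the actual files: a single header row, comma separation, and the column order exactly as listed. The helper name `read_trials_csv` is illustrative.

```python
import csv
from collections import defaultdict


def read_trials_csv(lines):
    """Group CSV rows by trial number (column order assumed as listed in this guide)."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row (assumed to be present)
    trials = defaultdict(list)
    for row in reader:
        trial = int(float(row[0]))
        # Columns 1-4: eigs00, eigsPi0, eigs0Pi, eigsPiPi; column 5: Chern number
        energies = tuple(float(value) for value in row[1:5])
        trials[trial].append((energies, int(float(row[5]))))
    return dict(trials)
```

The function accepts any iterable of lines, so an open file handle (opened with `newline=""`) can be passed directly.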
Thus, if there are 1,024 eigenvalues in a given trial, it will occupy 1,024 rows. There is another even further processed .CSV that only contains energy values where the condition