Raw data processing notebook to Python files by DhmhtrhsPakakis · Pull Request #4 · Astro-BEAM-AUTh/data-analysis

DhmhtrhsPakakis · 2026-05-08T17:31:46Z

I converted the Hydrogen_Preprocessing_Custom.ipynb notebook into Python files in the raw_data_pipeline_tools folder and added a main.py that uses those files to run the pipeline.

The pipeline takes the raw observation data as input and outputs either the calibrated spectrum or the target-only spectrum.

Note: The output of this pipeline, the calibrated signal, will be the input for the main data analysis pipelines (HI line extraction, background removal, etc.). In this version the calibrated signal is not returned; it is only used in main for plots. If this cannot be changed afterwards so that the signal is returned, do not accept the PR.

Copilot

Pull request overview

This PR converts a raw-data preprocessing notebook into importable Python modules plus a main.py runner intended to ingest ON/OFF observations, optionally calibrate (ON/OFF), and visualize the resulting spectra.

Changes:

Added .dat → .csv conversion utility (dat_to_csv).
Added a helper to average a time series into an FFT-sized spectrum (get_avg_signal).
Added main.py to orchestrate loading, optional calibration, and plotting.
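
The `.dat` to `.csv` conversion listed above could look roughly like the sketch below. It assumes the `.dat` file holds raw float32 samples (as the `np.fromfile` call in the diff suggests); the function name `dat_to_csv_sketch` and the `y_axis` column header are illustrative assumptions, not the PR's actual code.

```python
import csv

import numpy as np


def dat_to_csv_sketch(dat_path, csv_path, column="y_axis"):
    """Read raw float32 samples and write them as a single-column CSV.

    Hypothetical helper: the header name is an assumption chosen to match
    the 'power_au|y_axis' column filter used in main.py.
    """
    samples = np.fromfile(dat_path, dtype=np.float32)
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([column])  # header row
        writer.writerows([float(value)] for value in samples)  # one sample per row
    return samples.size
```

The return value (the number of samples written) is only there to make the helper easy to check; the PR's version may return nothing.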

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `raw_data_pipeline_tools/data_convert_to_csv.py` | Adds a utility to convert binary `.dat` float32 samples into a single-column CSV. |
| `raw_data_pipeline_tools/average_signal_fftsize.py` | Adds a helper that reshapes samples into `fft_size` blocks and averages them. |
| `raw_data_pipeline_tools/__init__.py` | Introduces a package marker for the new tools folder. |
| `main.py` | Adds a runnable script wiring conversion, averaging, optional calibration, and plotting. |
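
The block-averaging step described for `average_signal_fftsize.py` can be sketched as below. This is a minimal illustration, not the PR's implementation: the name `average_blocks` is hypothetical, and trimming trailing samples that do not fill a whole block is an assumption (a plain `reshape(-1, fft_size)` raises when the length is not a multiple of `fft_size`).

```python
import numpy as np


def average_blocks(time_series, fft_size):
    """Average consecutive fft_size-long blocks into one spectrum."""
    samples = np.asarray(time_series).ravel()  # also flattens (N, 1) DataFrame output
    n_blocks = samples.size // fft_size
    if n_blocks == 0:
        raise ValueError("need at least fft_size samples")
    # Assumption: drop the incomplete trailing block instead of failing.
    trimmed = samples[: n_blocks * fft_size]
    return trimmed.reshape(n_blocks, fft_size).mean(axis=0)
```

For example, `average_blocks(np.arange(10), 4)` averages the blocks `[0, 1, 2, 3]` and `[4, 5, 6, 7]` and ignores the last two samples.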


+def get_avg_signal(time_series, fft_size):
+    """
+    Create the average spectrum from the time series.
+    """
+    reshaped = time_series.reshape(-1, fft_size)


+    fft_size = 2048
+
+    frequencies = np.linspace(1.4205 - 0.003840/2, 1.4205000 + 0.003840/2, fft_size)
+
+    on_observation_filename = '2502202_Hot202020.csv'
+    off_observation_filename = "2502202_Cold202020.csv"
+
+    # Files Management
+    if on_observation_filename.endswith(".dat"):
+        base_name = on_observation_filename.split('.')[0]
+        on_observation_filename = f"{base_name}.csv"
+        dat_to_csv(f"{base_name}.dat", on_observation_filename)
+
+    if off_observation_filename.endswith(".dat"):
+        base_name = off_observation_filename.split('.')[0]
+        off_observation_filename = f"{base_name}.csv"
+        dat_to_csv(f"{base_name}.dat", off_observation_filename)
+
+    # ON signal processing
+    on_series_df = pd.read_csv(on_observation_filename)
+    on_series_df = on_series_df.filter(regex='power_au|y_axis')
+    on_series_np = on_series_df.to_numpy()
+
+    avg_on = on_series_np
+    if len(on_series_np) > fft_size:
+        avg_on = get_avg_signal(on_series_np, fft_size)
+
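
The hunk above derives the CSV name with `split('.')[0]`, which cuts at the first dot anywhere in the string, so a name such as `run.v2.dat` would become `run.csv`. A minimal sketch of a suffix-only replacement using `pathlib` follows; the helper name `csv_name_for` is hypothetical.

```python
from pathlib import Path


def csv_name_for(filename):
    """Return the .csv counterpart of a .dat filename, else the name unchanged."""
    path = Path(filename)
    if path.suffix.lower() == ".dat":
        # with_suffix only replaces the final extension, not earlier dots.
        return str(path.with_suffix(".csv"))
    return filename
```

This keeps any earlier dots in the file or directory names intact and leaves non-`.dat` inputs untouched.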




+from raw_data_pipeline_tools.data_convert_to_csv import dat_to_csv
+from raw_data_pipeline_tools.average_signal_fftsize import get_avg_signal
+
+def main():



+    """
+    y_data_list = np.fromfile(filename, dtype=np.float32)
+
+    with open(csv_filename, 'w', newline='') as csvfile:



+    # Check for Calibration & Plotting
+    if calibration:
+        off_series_df = pd.read_csv(off_observation_filename)
+        off_series_df = off_series_df.filter(regex='power_au|y_axis')
+        off_series_np = off_series_df.to_numpy()
+
+        avg_off = off_series_np
+        if len(off_series_np) > fft_size:
+            avg_off = get_avg_signal(off_series_np, fft_size)
+
+        calibrated_signal = avg_on / avg_off
+
+        fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, ncols=1, figsize=(8, 10))
+
+        ax1.plot(frequencies, avg_off, color='blue')
+        ax1.set_title('Avg Cold/Off')
+        ax1.set_ylabel('Relative Power')
+
+        ax2.plot(frequencies, avg_on, color='red')
+        ax2.set_title('Avg Hot/On')
+        ax2.set_ylabel('Relative Power')
+
+        ax3.plot(frequencies, calibrated_signal, color='green')
+        ax3.set_title('On/Off calibration')
+        ax3.set_ylabel('Relative Power')
+        ax3.set_xlabel('Frequencies')
+        plt.tight_layout()
+        plt.show()
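
The calibration in this hunk divides `avg_on` by `avg_off` directly, so any zero-valued OFF channel produces `inf` or `nan` together with a runtime warning. A minimal sketch of a guarded ON/OFF ratio is shown below; the function name `on_off_ratio` and the choice to mark zero-OFF channels as NaN are assumptions, not part of the PR.

```python
import numpy as np


def on_off_ratio(avg_on, avg_off):
    """Element-wise ON/OFF ratio; channels where OFF is zero become NaN."""
    on = np.asarray(avg_on, dtype=float)
    off = np.asarray(avg_off, dtype=float)
    if on.shape != off.shape:
        raise ValueError(f"shape mismatch: {on.shape} vs {off.shape}")
    ratio = np.full(on.shape, np.nan)
    # Divide only where OFF is non-zero; other entries keep their NaN fill.
    np.divide(on, off, out=ratio, where=off != 0)
    return ratio
```

Matplotlib leaves gaps at NaN values when plotting a line, so flagged channels would show up as breaks in the calibrated spectrum rather than as spikes.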


Copilot

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 7 comments.

+    try:
+        on_signal_numpy = np.fromfile(on_file_path, dtype=np.float32)
+        off_signal_numpy = np.fromfile(off_file_path, dtype=np.float32)
+
+    except FileNotFoundError:
+        print("File not found")  # noqa: T201
+    except Exception as e:  # noqa: BLE001
+        print(f"Convert Error: {e}")  # noqa: T201
+    else:
+        return on_signal_numpy, off_signal_numpy
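
As written, the hunk above prints a message and then implicitly returns `None` when loading fails, so a caller that unpacks the result into two arrays would hit a `TypeError` later. One alternative, sketched under the assumption that callers prefer an exception, is to raise with the offending path. The name `load_on_off` and the empty-file check are illustrative additions.

```python
from pathlib import Path

import numpy as np


def load_on_off(on_file_path, off_file_path):
    """Load ON and OFF float32 sample files, raising on missing or empty input."""
    signals = []
    for label, path in (("ON", on_file_path), ("OFF", off_file_path)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} file not found: {path}")
        data = np.fromfile(path, dtype=np.float32)
        if data.size == 0:
            raise ValueError(f"{label} file is empty: {path}")
        signals.append(data)
    return signals[0], signals[1]
```

With this shape the function either returns two arrays or raises, which keeps the return type consistent for the rest of the pipeline.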


+from .average_signal_fftsize import get_avg_signal
+from .convert_to_numpy import convert_dat_to_numpy
+from .preprocessing_plots import create_preprocessing_plots


+
+
+def preprocessing_pipeline(
+    on_signal_filename: str, off_signal_filename: str, fft_size: int, calibration_method: str = "on/off", plot_analysis: bool = True


+    elif calibration_method == "on-off":
+        calibrated_signal: np.ndarray = on_spectrum_avg - off_spectrum_avg
+    else:
+        msg = "Calibration Method does not exists."
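
The method dispatch in this hunk can be sketched as a small standalone function. This is a hedged illustration: only the `"on-off"` branch and the error message are visible in the diff, so the `"on/off"` branch (taken from the default argument of `preprocessing_pipeline`), the `ValueError`, and the name `calibrate` are assumptions.

```python
import numpy as np


def calibrate(on_spectrum_avg, off_spectrum_avg, calibration_method="on/off"):
    """Return the calibrated spectrum for the chosen method."""
    on = np.asarray(on_spectrum_avg, dtype=float)
    off = np.asarray(off_spectrum_avg, dtype=float)
    if calibration_method == "on/off":
        return on / off  # ratio calibration
    if calibration_method == "on-off":
        return on - off  # difference calibration
    msg = (
        f"Unknown calibration method {calibration_method!r}; "
        "expected 'on/off' or 'on-off'."
    )
    raise ValueError(msg)
```

Returning the array from every valid branch also addresses the note in the PR description that the calibrated signal must be available to downstream analysis, not only to the plots.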


+def create_preprocessing_plots(on_spectrum_avg: np.array, off_spectrum_avg: np.array, calibrated_signal: np.array, fft_size: int) -> None:
+    """
+    Create on_spectrum , off_spectrum , calibrated_spectrum plots in frequencies axes.
+
+    Args:
+        on_spectrum_avg (np.array): the on spectrum
+        off_spectrum_avg (np.array): the off spectrum
+        calibrated_signal (np.array): the calibrated spectrum
+        fft_size (int): the fft size used in the observation
+    """
+    frequencies = np.linspace(1.4205 - 0.003840 / 2, 1.4205000 + 0.003840 / 2, fft_size)
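
The frequency axis above hard-codes a centre of 1.4205 and a span of 0.003840 (presumably GHz, given the HI line near 1.4204 GHz). A minimal sketch that lifts these into parameters is shown below, next to an FFT-bin variant. Both function names are hypothetical, and whether the bins follow the `linspace` or the FFT layout depends on the acquisition setup, which the PR does not state.

```python
import numpy as np


def frequency_axis(fft_size, center=1.4205, bandwidth=0.003840):
    """Same axis as the diff: both band edges included, fft_size points."""
    return np.linspace(center - bandwidth / 2, center + bandwidth / 2, fft_size)


def fft_bin_axis(fft_size, center=1.4205, bandwidth=0.003840):
    """Assumed alternative: FFT bin centres spaced bandwidth / fft_size apart."""
    offsets = np.fft.fftshift(np.fft.fftfreq(fft_size, d=1.0 / bandwidth))
    return center + offsets
```

The two axes differ slightly: `linspace` spaces points by `bandwidth / (fft_size - 1)` and includes the upper edge, while the FFT layout spaces them by `bandwidth / fft_size` and stops one bin short of it.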


+# Import functions from folder
+from raw_data_pipeline_tools.preprocessing_pipeline import preprocessing_pipeline
+
+
+def main() -> None:
+    on_filename = "/home/dimitrios-pakakis/Desktop/Astro/data-analysis/2502202_Hot202020.dat"
+    off_filename = "/home/dimitrios-pakakis/Desktop/Astro/data-analysis/2502202_Cold202020.dat"
+    fft_size = 2048
+    _ = preprocessing_pipeline(
+        on_signal_filename=on_filename, off_signal_filename=off_filename, fft_size=fft_size, calibration_method="on/off", plot_analysis=True



Add Python pipeline scripts

ce5e113

DhmhtrhsPakakis requested review from PanagiotisPetrid and dyka3773 May 8, 2026 17:31

dyka3773 assigned DhmhtrhsPakakis May 8, 2026

dyka3773 requested a review from Copilot May 9, 2026 10:32

Copilot started reviewing on behalf of dyka3773 May 9, 2026 10:32

Copilot AI reviewed May 9, 2026

DhmhtrhsPakakis and others added 7 commits May 9, 2026 16:11

feat: Add function to convert .dat files to numpy arrays

220907c

feat: Enhance get_avg_signal function with input validation

7137649

feat: Remove data_convert_to_csv script as it's no longer needed

ff32398

feat: Add create_preprocessing_plots function for spectrum visualization

a94bb8d

feat: Implement preprocessing_pipeline function for signal calibration and visualization

91d00f5

feat: Refactor main function to utilize preprocessing_pipeline for signal processing

9792325

perf: convert the project to be UV compatible and improve type annotations and code quality

c151159

dyka3773 requested a review from Copilot May 10, 2026 19:23

Copilot started reviewing on behalf of dyka3773 May 10, 2026 19:23

Copilot AI reviewed May 10, 2026

DhmhtrhsPakakis added 5 commits May 15, 2026 18:54

fix: Update import statements and improve error message for calibration method

2fef5c5

style: Improve code formatting and type annotations in preprocessing_pipeline function

6d53693

style: Improve type annotations and formatting in create_preprocessing_plots function

5122a31

style: Enhance type annotations and improve error message formatting in get_avg_signal function

ed78973

style: Enhance docstring formatting and improve error handling in convert_dat_to_numpy function

7c6f42b

Raw data processing notebook to Python files #4
DhmhtrhsPakakis wants to merge 13 commits into main from pipeline-scripts

3 participants

