NSDF Tutorial: Using NSDF for End-to-End Analysis of Scientific Data - Earth Science

Python 3.10 License Slack Docker Ruff DOI

Overview

This tutorial introduces the NSDF ecosystem, which enhances scientific data access, analysis, and visualization through cloud technologies. It provides step-by-step guidance on using GEOtiled, a module of the SOMOSPIE engine, to retrieve raw data from public sources such as the USGS portal and to efficiently compute terrain attributes from digital elevation models (DEMs) across large geographic areas while preserving accuracy. The resulting data is split into multiple files for analysis with NSDF services and stored across both public and private platforms.
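As a rough illustration of what "computing a terrain attribute from a DEM" means, the sketch below derives slope from a tiny in-memory elevation grid using finite differences. Slope is picked here only as an example attribute, and this is not GEOtiled's implementation or API: the function name `slope_degrees` and its `cell_size` parameter are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Return per-cell slope in degrees for a DEM given as a list of rows.

    Hypothetical helper for illustration only (not part of GEOtiled).
    `cell_size` is the ground distance between neighboring cells, in the
    same unit as the elevation values.
    """
    rows, cols = len(dem), len(dem[0])
    if rows < 2 or cols < 2:
        raise ValueError("the DEM needs at least 2 rows and 2 columns")

    slopes = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences inside the grid, one-sided at the borders.
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dz_dy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell_size)
            dz_dx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell_size)
            # Slope is the angle of the steepest ascent at this cell.
            slopes[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slopes
```

A real workflow operates on georeferenced rasters that are far larger than this toy grid, which is why the tutorial relies on GEOtiled rather than a loop like this one.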

By the end of the tutorial, you will learn how to:

  • Build a modular workflow that integrates your application with NSDF services

  • Upload, download, and stream data across public and private storage platforms

  • Use the NSDF dashboard for large-scale data access, visualization, and analysis

You can download the introductory slides here. The tutorial follows the process shown in Figure 1.


Figure 1. Workflow diagram illustrating the tutorial's process of data collection, transformation, analysis, and storage using the SOMOSPIE engine and NSDF services.


Table of contents

  1. Running the Tutorial
  2. Option 1: GitHub Codespaces (Recommended)
  3. Option 2: Docker
  4. Option 3: Jetstream2
  5. APPENDIX: Prerequisites
  6. Community and Resources
  7. Publications
  8. Copyright and License
  9. Authors
  10. Acknowledgments

Running the Tutorial

This tutorial can be executed in three different environments:

  • GitHub Codespaces – a cloud-based development environment that requires no local installation beyond a GitHub account.

  • Docker – a container-based approach that requires Git and Docker installed on your local machine.

  • Jetstream2 – a remote instance that you access via SSH.

Choose one of the three options below based on your preferred setup.

Option 1: GitHub Codespaces (Recommended)

Requirements: A GitHub account. No software installation required.

Open in GitHub Codespaces <= Click here to create a new codespace, then use the following settings:

  1. Repository: TauferLab/NSDF-Tutorial-2025.
  2. Use the main branch of the repository.
  3. Dev container: NSDF Tutorial – Session II.
  4. Click Create Codespace.

This process may take a few minutes. Once the codespace is ready, run hands-on/session II/Tutorial.ipynb in Jupyter.


Figure 2. Creating GitHub Codespaces

Figure 3. Setting up your Codespace

Figure 4. VS Code interface in Codespaces

After the codespace is created, proceed to Session II by opening the file hands-on/session II/1.Tutorial.ipynb.

Figure 5. Opening the tutorial file

Option 2: Docker

Requirements: Git, Docker Desktop (v4.15.10 or newer), 8 GB RAM, and 5 GB of disk space (see the Appendix for installation details).

  1. Install Git and Docker Desktop.
  2. Open Docker Desktop before continuing.
  3. Open a terminal and run:
    git clone https://github.com/TauferLab/NSDF-Tutorial-2025.git
    cd NSDF-Tutorial-2025/hands-on/session\ II/Materials/
    docker-compose up -d
  4. Open http://127.0.0.1:5000/lab/tree/Tutorial.ipynb in a browser.
  5. To stop the container:
    docker-compose down
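The container can take a moment to start serving Jupyter after `docker-compose up -d` returns. A minimal sketch of a readiness check, assuming the address from step 4 and using only the Python standard library, is shown below. The helper name `wait_for_server` and its parameters are hypothetical and not part of the tutorial materials.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers an HTTP request or `timeout_s` elapses.

    Hypothetical helper: returns True as soon as the server responds,
    False if it never does within the time limit.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (e.g. connection refused); retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, calling `wait_for_server("http://127.0.0.1:5000/lab")` before opening the browser should return `True` once Jupyter is reachable, assuming the port mapping used in this section.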

Option 3: Jetstream2

Requirements: Access to a Jetstream2 instance via SSH.

Initial Setup (First Time Only)

  1. If you do not already have a Jetstream2 instance, follow the Jetstream2 Setup Manual to create and connect to one.
  2. Connect to your Jetstream2 instance via SSH.
  3. Clone the tutorial repository:
    git clone https://github.com/TauferLab/NSDF-Tutorial-2025.git
  4. Navigate to the session materials directory and build the environment:
    cd NSDF-Tutorial-2025/hands-on/session\ II/Materials/
    module load miniforge
    ./build_jetstream_environment.sh
  5. Start Jupyter Lab:
    cd ..
    jupyter-ip.sh
  6. Open the Jupyter Lab URL printed in the terminal in your web browser.
  7. In Jupyter Lab, select the kernel named NSDF-Tutorial.

Starting the Environment (After Installation)

  1. Load Conda:
    module load miniforge
    
  2. From the root of the cloned repository (NSDF-Tutorial-2025), navigate to the tutorial directory and start Jupyter:
    cd hands-on/session\ II
    jupyter-ip.sh
  3. Open the Jupyter Lab URL printed in the terminal in your browser.

Accessing OpenVisuspy Dashboards

  1. From your local machine, create an SSH tunnel to forward the dashboard port:
    ssh -L 8989:127.0.0.1:8989 <Jetstream Native SSH>
  2. After executing the corresponding dashboard cell in Jupyter, open your browser and navigate to:
    http://localhost:8989
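If the dashboard page does not load, it can help to confirm that something is listening on the forwarded port. The sketch below is one way to do that from your local machine with the Python standard library. The function name `port_is_open` is hypothetical, and the port number 8989 is taken from the tunnel command above.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: a successful connection only shows that the
    port accepts connections, not that the dashboard itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the SSH tunnel running, `port_is_open("127.0.0.1", 8989)` should return `True`; a `False` result suggests the tunnel is not active.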

APPENDIX: Prerequisites

💡 ONLY IF YOU RUN THE TUTORIAL WITH DOCKER

To install Git and Docker Desktop on your computer, follow these steps:

  • To install Git: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
  • To install Docker Desktop: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
  • Be sure you are running a recent version of Docker Desktop. Versions earlier than 4.15.10 may not work.

After installation, confirm that both tools are correctly set up by executing the following commands in your terminal.

💡 Note: For Windows users, we recommend using the PowerShell terminal for these verifications.

  • To verify the Git installation:
# Check the Git version
git --version

Expected output (your Git version may differ):

git version 2.42.0
  • To verify Docker Desktop installation: Open the Docker Desktop application before running Docker commands.
# Check the Docker installation information
docker info

Expected output:

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false

Server:
 Containers: 120
  Running: 0
  Paused: 0
  Stopped: 120
 Images: 48

💡 Note: The specific numbers in the output might vary based on your installation details and additional information may also appear.
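The two manual checks above can also be scripted. The following sketch, which assumes only the Python standard library, looks for `git` and `docker` on your PATH and reports the first line of each tool's `--version` output. The helper names `tool_version` and `check_prerequisites` are hypothetical and not part of the tutorial materials.

```python
import shutil
import subprocess


def tool_version(command, version_args=("--version",)):
    """Return the first line of `<command> --version`, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        # The command is not on PATH.
        return None
    try:
        result = subprocess.run(
            [path, *version_args],
            capture_output=True,
            text=True,
            timeout=15,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def check_prerequisites(commands=("git", "docker")):
    """Map each command name to its version line (None if it was not found)."""
    return {name: tool_version(name) for name in commands}
```

Printing `check_prerequisites()` gives a quick overview of which tools are installed. Note that this only confirms the command-line clients exist; `docker info` is still needed to confirm that Docker Desktop is actually running.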

Using Docker

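cd Materials
# Build the image from the Dockerfile in this directory
docker build --platform linux/amd64 -t globalcomputinglab/somospie_openvisus .
# Pull the prebuilt image tagged "tutorial"
docker pull --platform linux/amd64 globalcomputinglab/somospie_openvisus:tutorial
# Run the container, publishing port 5000 (Jupyter) and port 8989 (dashboards)
docker run -d -p 5000:5000 -p 8989:8989 --name tutorial --platform linux/amd64 globalcomputinglab/somospie_openvisus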

Visit: http://localhost:5000/

Using Your Local Machine

  1. Install Conda
  2. Run:
    cd Materials
    conda env create -f environment.yml
    conda activate NSDF-Tutorial
    cd GEOtiled/geotiled
    pip install -e .
    ./setup_openvisuspy.sh
    jupyter notebook Tutorial.ipynb

Community and Resources

NSDF and SOMOSPIE are open-source projects. Questions, discussions, and contributions are welcome. Contributions can include new packages, bug fixes, documentation, or even new core features.

NSDF Resources:

OpenVisus Resources:

SOMOSPIE Resources:

GEOtiled Resources:

Publications

[1] Roa, C., Olaya, P., Llamas, R., Vargas, R., Taufer, M. GEOtiled: A Scalable Workflow for Generating Large Datasets of High-Resolution Terrain Parameters. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (2023). link

[2] Olaya, P., Luettgau, J., Roa, C., Llamas, R., Vargas, R., Wen, S., Chung, I-H., Seelam, S., Park, Y., Lofstead, J., et al. Enabling Scalability in the Cloud for Scientific Workflows: An Earth Science Use Case. IEEE International Conference on Cloud Computing (2023). link

[3] D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, and M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data-driven decisions. In Proceedings of the 2019 15th International Conference on eScience (eScience) (2019). link

[4] V. Pascucci and R. J. Frank, "Global Static Indexing for Real-Time Exploration of Very Large Regular Grids," SC '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, Denver, CO, USA, 2001, pp. 45-45, link

[5] Pascucci, Valerio, et al. "The ViSUS visualization framework." High Performance Visualization. Chapman and Hall/CRC, 2012. 439-452. link

[6] Brian Summa, Giorgio Scorzelli, Ming Jiang, Peer-Timo Bremer, and Valerio Pascucci. 2011. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2, Article 7 (April 2011), 13 pages. link

Copyright and License

Copyright (c) 2024, Global Computing Lab

Catalog Comparison Tool is distributed under the terms of the Apache License, Version 2.0 with LLVM Exceptions. See LICENSE for more details.

Authors

This project was created by the NSDF team and the SOMOSPIE team. To reach out, email us at info@nationalsciencedatafabric.org or contact Dr. Michela Taufer at mtaufer@utk.edu.

Acknowledgments

The authors of this tutorial would like to express their gratitude to:

  • NSF through the awards 2138811, 2103845, 2334945, 2138296, and 2331152.
  • The Dataverse team link
  • Vargas Lab led by Dr. Rodrigo Vargas link

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.