This tutorial introduces the NSDF ecosystem, which enhances scientific data access, analysis, and visualization through cloud technologies. It provides step-by-step guidance on using GEOtiled, a module of the SOMOSPIE engine, to retrieve raw data from public sources such as the USGS portal and to efficiently compute terrain attributes from digital elevation models (DEMs) across large geographic areas while preserving accuracy. The data is processed into multiple files for analysis using NSDF services and stored across both public and private platforms.
By the end of the tutorial, you will learn how to:
- Build a modular workflow that integrates your application with NSDF services
- Upload, download, and stream data across public and private storage platforms
- Use the NSDF dashboard for large-scale data access, visualization, and analysis
You can download the introductory slides here. The tutorial follows the process shown in Figure 1.
Figure 1. Workflow diagram illustrating the tutorial's process of data collection, transformation, analysis, and storage using the SOMOSPIE engine and NSDF services.
- Running the Tutorial
- Option 1: GitHub Codespaces (Recommended)
- Option 2: Docker
- Option 3: Jetstream2
- APPENDIX: Prerequisites for Docker
- Community and Resources
- Publications
- Copyright and License
- Authors
- Acknowledgments
This tutorial can be executed in three different environments:
- GitHub Codespaces – a cloud-based development environment that requires no local installation beyond a GitHub account.
- Docker – a container-based approach that requires Git and Docker installed on your local machine.
- Jetstream2 – a cloud instance that you connect to via SSH.
You can choose one of the three options below based on your preferred setup.
Requirements: A GitHub account. No software installation required.
Click here to create a new codespace, then configure it as follows:
- Repository: TauferLab/NSDF-Tutorial-2025
- Branch: use the main branch of the repository.
- Dev container: NSDF Tutorial – Session II
- Click Create Codespace.
This process may take a few minutes. Once ready, run Hands-on/session II/Tutorial.ipynb in Jupyter.
Requirements: Git, Docker Desktop (v4.15.10 or newer), 8 GB RAM, and 5 GB disk space (see the Appendix for installation details).
- Install Git and Install Docker Desktop
- Open Docker Desktop before continuing.
- Open a terminal and run:
git clone https://github.com/TauferLab/NSDF-Tutorial-2025.git
cd NSDF-Tutorial-2025/session\ II/Materials/
docker-compose up -d
- Open http://127.0.0.1:5000/lab/tree/Tutorial.ipynb in a browser.
- To stop the container:
docker-compose down
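The container can take a moment to start after `docker-compose up -d`, so the notebook URL may not respond right away. The following is a minimal sketch, not part of the tutorial materials, that polls the address given above until it answers. The function name `wait_for_url` and its `attempts` and `delay` parameters are hypothetical and chosen only for illustration; the code uses the Python standard library alone.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Return True once `url` answers an HTTP request, False if it never does.

    Hypothetical helper for illustration; it is not part of the tutorial repository.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is up and listening.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; pause before the next attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


# Example call (assumes the Docker option, which serves Jupyter on port 5000):
#   wait_for_url("http://127.0.0.1:5000/lab/tree/Tutorial.ipynb")
```

A `False` result after all attempts suggests checking that Docker Desktop is open and that the container is still running.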
Requirements: Access to a Jetstream2 instance via SSH.
- If you do not already have a Jetstream2 instance, follow the Jetstream2 Setup Manual to create and connect to one.
- Connect to your Jetstream2 instance via SSH.
- Clone the tutorial repository:
git clone https://github.com/TauferLab/NSDF-Tutorial-2025.git
- Navigate to the session materials directory and build the environment:
cd NSDF-Tutorial-2025/hands-on/session\ II/Materials/
module load miniforge
./build_jetstream_environment.sh
- Start Jupyter Lab:
cd ..
jupyter-ip.sh
- Open the Jupyter Lab URL printed in the terminal in your web browser.
- In Jupyter Lab, select the kernel named NSDF-Tutorial.
- Load Conda:
module load miniforge
- Navigate to the tutorial directory and start Jupyter:
cd hands-on/session\ II
jupyter-ip.sh
- Open the Jupyter Lab URL printed in the terminal in your browser.
- From your local machine, create an SSH tunnel to forward the dashboard port:
ssh -L 8989:127.0.0.1:8989 <Jetstream Native SSH>
- After executing the corresponding dashboard cell in Jupyter, open your browser and navigate to:
http://localhost:8989
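If the dashboard page does not load, it can help to confirm that the forwarded port is accepting connections on your local machine. The sketch below assumes the SSH tunnel above is running and forwards port 8989. `port_is_open` is a hypothetical helper name, and the check only shows that something is listening on the port, not that it is the NSDF dashboard.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8989 is the dashboard port forwarded by the SSH tunnel.
    state = "open" if port_is_open("127.0.0.1", 8989) else "closed"
    print(f"Local port 8989 is {state}")
```

Because `ssh -L` opens the local port as soon as the tunnel starts, an open port indicates that the tunnel is up; the dashboard itself still requires its notebook cell to be running on the Jetstream2 side.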
💡 ONLY IF YOU RUN THE TUTORIAL WITH DOCKER
To install Git and Docker Desktop on your computer, follow these steps:
- To install Git: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
- To install Docker Desktop: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
- Be sure you are running the most recent version of Docker! Versions earlier than 4.15.10 may not work.
After installation, confirm that both tools are correctly set up by executing the following commands in your terminal.
💡 Note: For Windows users, we recommend using the PowerShell terminal for these verifications.
- To verify the Git installation:
# Check the Git version
git --version
Expected output (NOTE: git version can be different):
git version 3.12.0
- To verify Docker Desktop installation: Open the Docker Desktop application before running Docker commands.
# Check the Docker installation information
docker info
Expected output:
Client:
Version: 24.0.5
Context: default
Debug Mode: false
Server:
Containers: 120
Running: 0
Paused: 0
Stopped: 120
Images: 48
💡 Note: The specific numbers in the output might vary based on your installation details and additional information may also appear.
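The two checks above can also be scripted. The following is a small sketch under stated assumptions: `parse_git_version` and `tool_on_path` are hypothetical helper names, and the script only confirms that the `git` and `docker` executables are on your PATH and reads the Git version string. It does not confirm that Docker Desktop is running, so `docker info` remains the authoritative check.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_git_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from `git --version` output, or None if absent."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    for tool in ("git", "docker"):
        status = "found" if tool_on_path(tool) else "NOT found"
        print(f"{tool}: {status} on PATH")
    if tool_on_path("git"):
        # check=False: report whatever Git prints instead of raising on failure.
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=False
        )
        print("Parsed Git version:", parse_git_version(result.stdout))
```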
cd Materials
docker build --platform linux/amd64 -t globalcomputinglab/somospie_openvisus .
docker pull --platform linux/amd64 globalcomputinglab/somospie_openvisus:tutorial
docker run -d -p 5000:5000 -p 8989:8989 --name tutorial --platform linux/amd64 globalcomputinglab/somospie_openvisus
Visit: http://localhost:5000/
- Install Conda
- Run:
cd Materials
conda env create -f environment.yml
conda activate NSDF-Tutorial
cd GEOtiled/geotiled
pip install -e .
./setup_openvisuspy.sh
jupyter notebook Tutorial.ipynb
NSDF and SOMOSPIE are open-source projects. Questions, discussions, and contributions are welcome. Contributions can include new packages, bug fixes, documentation, or even new core features.
NSDF Resources:
- Slack workspace: nsdf-workspace.
- GitHub Discussions: Discussions and Q&A.
- Mailing list: https://groups.google.com/g/nsdf - nsdf@googlegroups.com
- LinkedIn: LinkedIn
OpenVisus Resources:
- GitHub: Open Source distribution of the ViSUS capabilities
- Webpage: VISUS - High performance Big Data Analysis and Visualization Solutions
SOMOSPIE Resources:
- GitHub: SOMOSPIE software
- Webpage: SOMOSPIE overview
- Questions: Michela Taufer mtaufer@utk.edu
GEOtiled Resources:
- GitHub: GEOtiled software
- Webpage: GEOtiled overview
- Questions: Michela Taufer mtaufer@utk.edu
[1] Roa, C., Olaya, P., Llamas, R., Vargas, R., Taufer, M. GEOtiled: A Scalable Workflow for Generating Large Datasets of High-Resolution Terrain Parameters. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (2023). link
[2] Olaya, P., Luettgau, J., Roa, C., Llamas, R., Vargas, R., Wen, S., Chung, I.-H., Seelam, S., Park, Y., Lofstead, J., et al. Enabling Scalability in the Cloud for Scientific Workflows: An Earth Science Use Case. IEEE International Conference on Cloud Computing (2023). link
[3] D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, and M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data-driven decisions. In Proceedings of the 2019 15th International Conference on eScience (eScience) (2019). link
[4] V. Pascucci and R. J. Frank, "Global Static Indexing for Real-Time Exploration of Very Large Regular Grids," SC '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, Denver, CO, USA, 2001, pp. 45-45, link
[5] Pascucci, Valerio, et al. "The ViSUS visualization framework." High Performance Visualization. Chapman and Hall/CRC, 2012. 439-452. link
[6] Brian Summa, Giorgio Scorzelli, Ming Jiang, Peer-Timo Bremer, and Valerio Pascucci. 2011. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2, Article 7 (April 2011), 13 pages. link
Copyright (c) 2024, Global Computing Lab
Catalog Comparison Tool is distributed under the terms of the Apache License, Version 2.0 with LLVM Exceptions. See LICENSE for more details.
This project was created by the NSDF team and the SOMOSPIE team. To reach out, email us at info@nationalsciencedatafabric.org or contact Dr. Michela Taufer at mtaufer@utk.edu.
The authors of this tutorial would like to express their gratitude to:
- NSF through the awards 2138811, 2103845, 2334945, 2138296, and 2331152.
- The Dataverse team link
- Vargas Lab led by Dr. Rodrigo Vargas link
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.




