Commit 9191a8a

Merge pull request #29 from sam-grant/sgrant/EAF
Added EAF tutorial
2 parents 5af0567 + 4a5847b commit 9191a8a

20 files changed

Lines changed: 398 additions & 0 deletions

EAF/Docs/01-Introduction.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# Introduction

The Elastic Analysis Facility (EAF) is a web-based platform designed for Python analysis and ML tasks. It runs on container-based infrastructure rather than traditional virtual machines. This approach allows the underlying hardware resources to be swapped without breaking the container, which is what makes the platform "elastic".

![EAF architecture](../Images/EAF_scheme.png)

## Navigation

- Next: [Accessing EAF](02-AccessingEAF.md)
- [Back to Main](../README.md)

EAF/Docs/02-AccessingEAF.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Accessing EAF

EAF is entirely web-based at [analytics-hub.fnal.gov](https://analytics-hub.fnal.gov). To access EAF from outside the FNAL network, you will need either:

- A Fermilab VPN connection; or
- A configured proxy.

In addition, you will need an active services account.

## Setting up a Firefox proxy (SCD recommended method)

1. Ensure you have a valid Kerberos ticket:
```bash
klist # Check ticket
kinit <username>@FNAL.GOV # Create new ticket
```

2. Start an SSH tunnel:
```bash
# -f: run in the background, -N: no remote command, -D 9999: local SOCKS proxy on port 9999
ssh -f -N -D 9999 <username>@mu2egpvm01.fnal.gov # Start tunnel
```

3. Configure Firefox:

- Enter `about:config` in the address bar
- Modify the following parameters:

| Parameter | Value |
|-----------|-------|
| network.proxy.socks | 127.0.0.1 |
| network.proxy.socks_port | 9999 |
| network.proxy.socks_remote_dns | true |
| network.proxy.type | 1 |

To disable the proxy, reset `network.proxy.type` to its default value.
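
If Firefox cannot reach EAF after these steps, it can help to confirm that the SSH tunnel is actually listening before changing any browser settings. The following is a minimal sketch, assuming the tunnel was started on port 9999 as above; the function name `is_port_open` is hypothetical and the check only confirms that something accepts TCP connections on that port, not that it is a working SOCKS proxy.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9999, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:9999")
    else:
        print("Nothing is listening on 127.0.0.1:9999; try restarting the SSH tunnel")
```

Run this on the same machine as the browser, since the tunnel binds to the local loopback address.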
36+
37+
## Navigation
38+
39+
- Previous: [Overview](01-Introduction.md)
40+
- Next: [Starting an EAF Server](03-StartingAServer.md)
41+
- [Back to Main](../README.md)

EAF/Docs/03-StartingAServer.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
# Starting an EAF Server
2+
3+
1. Navigate to [analytics-hub.fnal.gov](https://analytics-hub.fnal.gov)
4+
2. Sign in with your Fermilab Services (SSO) account
5+
3. Click "Start My Server"
6+
4. In the Server Options:
7+
- Go to the "FIFE" server box
8+
- Click "CPU Interactives"
9+
- Select "AL9"
10+
- Scroll to bottom and click "Start"
11+
12+
![Server starting](../Images/ServerStarting.png)
13+
14+
The server may take a few minutes to initialise.
15+
16+
## Navigation
17+
18+
- Previous: [Accessing EAF](02-AccessingEAF.md)
19+
- Next: [Navigating JupyterHub](04-JupyterHub.md)
20+
- [Back to Main](../README.md)

EAF/Docs/04-JupyterHub.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Navigating EAF
2+
3+
Upon loading, you should land on a `JupyterHub` launcher page that offers a suite of applications:
4+
5+
- Terminal
6+
- Python notebook
7+
- Text editor
8+
- Interactive Python console
9+
10+
![The EAF area](../Images/JupyterHub.png)
11+
12+
## Directory Access
13+
14+
Your user area will be automatically created in `/home` with access to:
15+
16+
- `/exp/mu2e/app`
17+
- `/exp/mu2e/data`
18+
- `/pnfs` (possible via `xroot`, see [`anapytools`](07-anapytools.md))
19+
20+
You can test this by starting a terminal and running `ls /exp/mu2e`.
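
The same check can be made from a notebook or Python console. This is a small sketch using only the standard library; the helper name `check_dirs` is hypothetical, and `/pnfs` is left out because it is not expected to be mounted directly.

```python
import os

# Paths listed in this section; /pnfs is omitted because it is reached via xroot
EXPECTED_DIRS = ["/exp/mu2e/app", "/exp/mu2e/data"]


def check_dirs(paths):
    """Map each path to True if it exists as a directory, else False."""
    return {path: os.path.isdir(path) for path in paths}


if __name__ == "__main__":
    # Also check the home directory that should have been created for you
    for path, found in check_dirs(EXPECTED_DIRS + [os.path.expanduser("~")]).items():
        print(f"{path}: {'ok' if found else 'missing'}")
```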

## Resources

Available resources per user:

- 8 guaranteed cores
- 64 GB memory
- 23 GB storage

## Navigation

- Previous: [Starting an EAF Server](03-StartingAServer.md)
- Next: [Using Conda/Mamba](05-CondaMamba.md)
- [Back to Main](../README.md)

EAF/Docs/05-CondaMamba.md

Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
# Using Conda/Mamba
2+
3+
Conda is an open-source Python package and environment management tool. Mamba is a `C++` implementation of Conda: it has the same command syntax but is supposed to be more efficient. We will use it to control our Python environment. To initialise:
4+
5+
1. Configure your `.bash_profile` using any text editor (such as `emacs` or `vim`):
6+
7+
```bash
8+
# Add to ~/.bash_profile
9+
if [ -f ~/.bashrc ]; then
10+
. ~/.bashrc
11+
fi
12+
```
13+
14+
2. Initialise Mamba:
15+
16+
```bash
17+
mamba init
18+
```
19+
20+
Ensure that you either start a new shell after initialisation, or run
21+
22+
```bash
23+
source ~/.bashrc
24+
```

## Basic Conda/Mamba commands

Below is a list of basic Conda/Mamba commands. You should **not** need to use most of these.

### Environment management

```bash
# Activate an environment
mamba activate myenv

# Deactivate current environment
mamba deactivate

# List all environments
mamba env list

# Remove an environment by name
mamba env remove -n myenv

# Export environment to YAML file
mamba env export > env.yml

# Create environment from YAML file
mamba env create -f env.yml

# Create a new named environment
mamba create -n myenv
```

### Package management

```bash
# Install a package
mamba install package_name

# Install multiple packages
mamba install package1 package2

# Remove a package
mamba remove package_name

# Update a specific package
mamba update package_name

# Update all packages
mamba update --all

# List installed packages
mamba list

# Search for a package
mamba search package_name
```

### General

```bash
# Initialise Mamba
mamba init

# Clean caches (index cache, downloaded tarballs, unused packages)
mamba clean --all

# Display Mamba system information
mamba info
```

## Navigation

- Previous: [Navigating JupyterHub](04-JupyterHub.md)
- Next: [The Mu2e Python Environment](06-TheMu2eEnvironment.md)
- [Back to Main](../README.md)

EAF/Docs/06-TheMu2eEnvironment.md

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# The Mu2e Python environment
2+
3+
To ensure that users have immediate access to the libraries and tools needed to conduct analysis, we have an installed a Mu2e Python environment on `/cvmfs`, called `mu2e_env`, that can used on **both** EAF and the virtual machines.
4+
5+
To set up this environment on EAF:
6+
7+
1. Create an environment symlink to the `current` version:
8+
```bash
9+
ln -s /cvmfs/mu2e.opensciencegrid.org/env/ana/current ~/.conda/envs/mu2e_env
10+
```
11+
12+
2. Activate the environment:
13+
```bash
14+
mamba activate mu2e_env
15+
```

## Available libraries in `v1.2.0`

- matplotlib
- pandas
- uproot
- scipy
- scikit-learn
- pytorch
- tensorflow
- jupyterlab
- notebook
- statsmodels
- awkward
- urllib3 (v1.26.16)
- ipykernel
- conda-pack
- fsspec-xrootd
- htop
- vector
- plotly
- dash
- anapytools (v2.0.0)
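
Once the environment is active, one way to confirm that these libraries are visible is to ask Python whether each module can be located, without importing it. The sketch below covers only a subset of the list, and the import names are assumptions where they differ from the package names (`scikit-learn` is imported as `sklearn` and `pytorch` as `torch`); `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names for a subset of the listed packages (assumed; adjust as needed)
MODULES = [
    "matplotlib", "pandas", "uproot", "scipy", "sklearn", "torch",
    "tensorflow", "awkward", "vector", "plotly", "dash", "anapytools",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(MODULES)
    print("All modules found" if not missing else f"Missing: {', '.join(missing)}")
```

Because `find_spec` only locates modules, this runs quickly even for heavy libraries such as `tensorflow`.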

## Use on the `gpvms`

To use `mu2e_env` on the Mu2e virtual machines:

```bash
# Activate
source /cvmfs/mu2e.opensciencegrid.org/env/ana/current/bin/activate

# Deactivate
source /cvmfs/mu2e.opensciencegrid.org/env/ana/current/bin/deactivate
```

Suggested aliases for `.my_bashrc`:

```bash
alias pystart="source /cvmfs/mu2e.opensciencegrid.org/env/ana/current/bin/activate"
alias pystop="source /cvmfs/mu2e.opensciencegrid.org/env/ana/current/bin/deactivate"
```

When using alongside `muse`:

```bash
mu2einit
muse setup ops
pystart # or source /cvmfs/mu2e.opensciencegrid.org/env/ana/current/bin/activate
```

Activate after `muse setup` to ensure that your environment variables are set correctly.

## Interactive use

`mu2e_env` also includes a built-in interactive kernel, also called `mu2e_env`. See the following exercise for more information!

## Navigation

- Previous: [Using Conda/Mamba](05-CondaMamba.md)
- Next: [Exercise: "Hello world!"](07-HelloWorld.md)
- [Back to Main](../README.md)

EAF/Docs/07-HelloWorld.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Exercise: "Hello world!"
2+
3+
If you have been following the steps in the previous sections, you should now have a running EAF server and access to the `mu2e_env` Python environment.
4+
5+
## Starting a notebook
6+
7+
To start writing and running analysis code interactively, navigate to the launcher page by pressing the "+" button on the top-left of the screen. Then, select the "Notebook" with the kernel called `mu2e_env`.
8+
9+
![Start a notebook](../Images/StartANotebook.png)
10+
11+
## Managing your files
12+
13+
This will create a new file named `Untitled.ipynb` in your current directory, which you can see by looking at explorer tab on the left. To rename your notebook, right-click on the file and click "Rename".
14+
15+
You can also do this using `mv` from your terminal tab.
16+
17+
![Rename your notebook](../Images/RenameNotebook.png)
18+
19+
## Running code
20+
21+
To starting running code in your notebook, click on the first "cell" and type
22+
23+
```python
24+
print('Hello world!')
25+
```
26+
To run your cell, either press the "play" button or use `Shift+Return` while the cell is selected.
27+
28+
Note that attempts to run code and import modules may run slowly on the very first try. Jupyter needs time to connect to the kernel and Python needs time to compile and cache `.pyc` files. Following attempts will be much faster.
29+
30+
![Rename your notebook](../Images/HelloWorld.png)
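
As a follow-up cell, it can be useful to confirm which Python interpreter the notebook kernel is running. This is a minimal sketch using the standard library; `kernel_info` is a hypothetical helper name.

```python
import sys


def kernel_info():
    """Return the interpreter path and environment prefix of the running kernel."""
    return {"executable": sys.executable, "prefix": sys.prefix}


info = kernel_info()
print(f"Interpreter: {info['executable']}")
print(f"Environment prefix: {info['prefix']}")
```

With the `mu2e_env` kernel selected, we would expect both paths to point into the `mu2e_env` environment (either the symlink under `~/.conda/envs` or the `/cvmfs` location), although the exact form depends on how the kernel was registered.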

## Navigation

- Previous: [The Mu2e Python environment](06-TheMu2eEnvironment.md)
- Next: [anapytools](08-anapytools.md)
- [Back to Main](../README.md)

EAF/Docs/08-anapytools.md

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
# anapytools
2+
3+
[`anapytools`](https://github.com/Mu2e/anapytools.git) is a custom utilities library, installed in `mu2e_env`, that allows users to interface with `SAM` and `/pnfs` from EAF, and provides a multithreading tool.
4+
5+
## Setup
6+
7+
```bash
8+
source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh
9+
kinit ${USER}@FNAL.GOV
10+
/cvmfs/mu2e.opensciencegrid.org/bin/vomsCert
11+
```
12+
13+
## Example Usage
14+
15+
`anapytools` provides a tool to enable remote access to `/pnfs`, which is not directly accessible from EAF.
16+
17+
```python
18+
# Create file list from SAM dataset
19+
from anapytools.read_data import DataReader
20+
reader = DataReader()
21+
file_list = reader.get_file_list(defname='nts.mu2e.CeEndpointMix1BBSignal.Tutorial_2024_03.tka')
22+
23+
# Read file from /pnfs using xroot
24+
file = reader.read_file(filename='nts.sgrant.CosmicCRYExtractedCatTriggered.MDC2020ae_best_v1_3.001205_00000000.root')
25+
```

Example:

![Reading Data Example](../Images/ReadData.png)

In addition, `anapytools` provides a tool for multithreading, which is extremely useful for running analysis jobs over multiple files.

```python
# Parallel processing
from anapytools.read_data import DataReader
from anapytools.parallelise import ParallelProcessor

reader = DataReader()
processor = ParallelProcessor()

def process_function(filename):
    # Open one file; add your per-file analysis here
    file = reader.read_file(filename, quiet=True)
    return

# Apply process_function to every file in file_list (created with get_file_list above)
processor.multithread(process_function, file_list)
```

Example:

![Parallel Processing Example](../Images/Parallelise.png)
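
The general pattern behind a call like `processor.multithread(process_function, file_list)` can be illustrated with the Python standard library alone. The sketch below is not the `anapytools` implementation; it swaps in `concurrent.futures.ThreadPoolExecutor` to show the idea of applying one function to many files on a pool of threads. The names `run_threaded` and the placeholder file list are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_function(filename):
    """Hypothetical per-file task: here it just returns the length of the filename."""
    return len(filename)


def run_threaded(function, items, max_workers=4):
    """Apply function to every item on a pool of worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(function, items))


file_list = ["a.root", "bb.root", "ccc.root"]  # placeholder names, not real datasets
results = run_threaded(process_function, file_list)
print(results)  # [6, 7, 8]
```

Threads tend to suit this kind of workload because much of the time per file is spent waiting on remote I/O rather than on computation.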

## Navigation

- Previous: [Exercise: "Hello world!"](07-HelloWorld.md)
- Next: [Custom environments](09-CustomEnvironments.md)
- [Back to Main](../README.md)

EAF/Docs/09-CustomEnvironments.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
## Custom environments
2+
3+
If desired, you can create your own environment from scratch.
4+
5+
To create a new environment from scratch:
6+
```bash
7+
mamba create -q -y -n my_env
8+
mamba activate my_env
9+
```
10+
11+
Install packages using: `mamba install <package_name>`
12+
13+
See [Using Conda/Mamba](05-CondaMamba.md) for more help!
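
After activating a custom environment in a terminal, you can check from Python which environment is active. This is a small sketch that relies on the `CONDA_DEFAULT_ENV` environment variable, which Conda-style activation sets; `active_conda_env` is a hypothetical helper name.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the active Conda/Mamba environment name, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print(f"Active environment: {active_conda_env()}")
```

Run this with `python` from the terminal where you activated the environment; inside a notebook the variable may instead reflect the environment the server was started from.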

## Navigation

- Previous: [anapytools](08-anapytools.md)
- [Back to Main](../README.md)

EAF/Images/EAF_FIFE.png

560 KB