Commit 1020103

Merge pull request #12 from Sydney-Informatics-Hub/dgx-workshop-updates

More edits for the workshop

2 parents: 7df6c58 + 7b829c9

9 files changed: 36 additions & 14 deletions

Image files changed:

* fig/default_data_source.png (151 KB)
* fig/workload_comp_resource.png (-10.3 KB)
* fig/workload_connect_jupyter.png (96.9 KB)
* fig/workload_datasource.png (91.8 KB)
* fig/workload_definition.png (-2.53 KB)
* fig/workload_environment.png (16.4 KB)
* fig/workload_logs.png (378 KB)

notebooks/data_sources.md

Lines changed: 8 additions & 1 deletion

@@ -1 +1,8 @@
-# Data Sources
+# Data Sources
+Data sources in Run:ai allow you to connect additional storage systems to your Run:ai projects, enabling seamless access to the datasets required by your workloads. Run:ai supports various types of data sources, including PVC ([Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)), NFS, S3-compatible storage, and more.
+
+When a Run:ai project is created, a default data source (PVC) is automatically set up for the project. You can find it under "Data Sources", listed as `pvc-<dashr_project_shortcode>`:
+
+![Default Data Source](../fig/default_data_source.png)
+
+This dedicated persistent storage is accessible from any workload running within the same project. The default mount path inside the container is `/scratch/<dashr_project_shortcode>`. It is especially useful for storing intermediate results, model checkpoints, and [uploading data to the cluster](./data_transfer.md) from external sources (*e.g.* RDS).
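To illustrate how a workload might use this mount, here is a minimal sketch. The shortcode `myproj` and the helper function are hypothetical placeholders, not part of Run:ai; the actual mount is `/scratch/<dashr_project_shortcode>`:

```python
import os

# Hypothetical project shortcode -- substitute your own. Inside a Run:ai
# workload container, the project's default PVC is mounted at
# /scratch/<dashr_project_shortcode>.
PROJECT_SHORTCODE = "myproj"
SCRATCH = f"/scratch/{PROJECT_SHORTCODE}"

def checkpoint_path(run_name: str, step: int) -> str:
    """Build a checkpoint path under the project's persistent scratch mount."""
    return os.path.join(SCRATCH, "checkpoints", run_name, f"step_{step:06d}.pt")

print(checkpoint_path("demo", 42))
```

Anything written under this path persists across workloads in the same project, unlike files saved to the container's ephemeral filesystem.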

notebooks/jupyter_tutorial.md

Lines changed: 28 additions & 13 deletions

@@ -6,32 +6,47 @@
 * Being granted permission to an active project
 * An [environment](environments.md) to run such job
 * Have created a data source, e.g. a PVC, to store your input and output data
-* Understand the compute resources you need to run the job and have the option available under Compute Resources
+* Understand the compute resources you need to run the job and have the option available under "Compute Resources"
 
 In this tutorial, we will create a simple Jupyter Lab workload that allows you to run Jupyter notebooks interactively on the SIH GPU cluster.
 
 ## Step 1: Create a workload
 Navigate to the Workloads section of the platform and click on the "NEW WORKLOAD" button. Select "Workspace" from the dropdown menu.
+
 ![New workload](../fig/workload_create_workspace.png)
 
 ## Step 2: Configure the workload from scratch
-Fill in the necessary details for your workload:
+Define the necessary information for your workload:
 
 * Under "Projects" select the project it will be linked to
-* Under "Templates" select "Start from sratch" (do not use any existing template)
+* Under "Templates" select "Start from scratch" (*i.e.* do not use any existing template)
+* Provide a descriptive name for the workload
+
 ![Project and Template](../fig/workload_definition.png)
-* Under "Environment" select the Jupyter Lab container environment you want to run
+
+* Select an environment to create the container. The SIH team has prepared a [pre-built image](https://hub.docker.com/r/sydneyinformaticshub/dgx-interactive-jupyterlab) (`sydneyinformaticshub/dgx-interactive-jupyterlab`) with Jupyter Lab and commonly used data science packages installed.
+
 ![Software environment](../fig/workload_environment.png)
-* Under "Compute resource" select the resources required.
+
+* Select the amount of compute resources needed to run the workload. In this tutorial, we will select the `small-fraction` option, which requests one H200 GPU with 10% of its memory (~14 GB).
+
 ![Compute resource](../fig/workload_comp_resource.png)
 
-There are other optional components you can add to a workload depending on the needs of your task. These include:
+* Configure the [data source](./data_sources.md) to be mounted into the container. Here we select the default PVC created for the project. The mount path inside the container is set to `/scratch/<dashr_project_shortcode>`.
+
+![Data source](../fig/workload_datasource.png)
+
+* Lastly, click "CREATE WORKLOAD" to submit the workload to the cluster.
+
+## Step 3: Connect to Jupyter Lab
+
+When the status changes to "Running", you can access the Jupyter Lab interface by selecting "Jupyter" under "CONNECT".
+
+![Connect to the Jupyter Lab interface](../fig/workload_connect_jupyter.png)
+
+## (Optional) Step 4: Inspect system logs
+You can review the system logs to access details about event history, workload metrics, and real-time container output. This information is especially useful for debugging issues when a workload fails to start.
+
+![Workload logs](../fig/workload_logs.png)
 
-* Volume (i.e. temporary data storage)
-* Data Sources (e.g. PVCs)
-* Other general settings
-![](../fig/workload_additional_setups.png)
 
-## Optional: Create a workload from a template
-Besides the project allocation, all the other workload components can be populated from a pre-defined template:
-![Create workload from existing template](../fig/workload_template.png)
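As a back-of-the-envelope check on the `small-fraction` sizing quoted in the tutorial diff above: this sketch assumes the H200's published 141 GB HBM3e capacity, and the fraction arithmetic is ours for illustration, not how Run:ai itself computes allocations.

```python
H200_MEMORY_GB = 141  # published HBM3e capacity of an NVIDIA H200

def fraction_gb(total_gb: float, fraction: float) -> float:
    """GPU memory (in GB) granted by a fractional compute resource."""
    return total_gb * fraction

# 10% of an H200 comes to ~14 GB, matching the tutorial's figure.
print(round(fraction_gb(H200_MEMORY_GB, 0.10), 1))
```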
