|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Edison Analysis API Tutorial\n", |
| 8 | + "\n", |
| 9 | + "This notebook walks through an example use case: using `Edison Analysis` to perform data analysis.\n", |
| 10 | + "\n", |
| 11 | + "The only dependency you need to follow along is `edison-client`, which you can install via pip:\n", |
| 12 | + "\n", |
| 13 | + "```bash\n", |
| 14 | + "pip install edison-client\n", |
| 15 | + "```\n", |
| 16 | + "\n", |
| 17 | + "We recommend reading the `edison-client` [docs](https://pypi.org/project/edison-client/) before following this tutorial.\n", |
| 18 | + "\n", |
| 19 | + "To run an `Edison Analysis` job, take the following steps:\n", |
| 20 | + "\n", |
| 21 | + "1. Upload any artifacts to the data storage service\n", |
| 22 | + "2. Start an `Edison Analysis` run using the Edison client, passing the data storage entry IDs\n", |
| 23 | + " along with any other details in the task config\n", |
| 24 | + "3. Use the output of the task to obtain any data generated by the task" |
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "code", |
| 29 | + "execution_count": null, |
| 30 | + "metadata": {}, |
| 31 | + "outputs": [], |
| 32 | + "source": [ |
| 33 | + "import time\n", |
| 34 | + "\n", |
| 35 | + "from edison_client import EdisonClient\n", |
| 36 | + "from edison_client.models import RuntimeConfig, TaskRequest\n", |
| 37 | + "from edison_client.models.app import JobNames" |
| 38 | + ] |
| 39 | + }, |
| 40 | + { |
| 41 | + "cell_type": "code", |
| 42 | + "execution_count": null, |
| 43 | + "metadata": {}, |
| 44 | + "outputs": [], |
| 45 | + "source": [ |
| 46 | + "# Instantiate the Edison client with your API key created via the platform\n", |
| 47 | + "EDISON_API_KEY = \"\" # Add your API key here\n", |
| 48 | + "client = EdisonClient(api_key=EDISON_API_KEY)" |
| 49 | + ] |
| 50 | + }, |
| 51 | + { |
| 52 | + "cell_type": "markdown", |
| 53 | + "metadata": {}, |
| 54 | + "source": [ |
| 55 | + "## File management with Edison Analysis\n", |
| 56 | + "\n", |
| 57 | + "`Edison Analysis` is designed to run data analysis on files provided by the user or caller. To provide `Edison Analysis` with this data,\n", |
| 58 | + "you'll need to upload it to the Edison data storage service. This service is your one-stop shop for sharing, storing and\n", |
| 59 | + "updating data to be used in the Edison ecosystem." |
| 60 | + ] |
| 61 | + }, |
| 62 | + { |
| 63 | + "cell_type": "code", |
| 64 | + "execution_count": null, |
| 65 | + "metadata": {}, |
| 66 | + "outputs": [], |
| 67 | + "source": [ |
| 68 | + "# Uploading a single file to the data storage service\n", |
| 69 | + "single_file_upload_response = await client.astore_file_content(\n", |
| 70 | + " name=\"Demo file entry for a single file\",\n", |
| 71 | + " file_path=\"./datasets/brain_size_data.csv\", # ADD DATASET PATH HERE\n", |
| 72 | + " description=\"This is a test file that will be analysed by Edison Analysis\",\n", |
| 73 | + ")" |
| 74 | + ] |
| 75 | + }, |
| 76 | + { |
| 77 | + "cell_type": "code", |
| 78 | + "execution_count": null, |
| 79 | + "metadata": {}, |
| 80 | + "outputs": [], |
| 81 | + "source": [ |
| 82 | + "# Uploading a directory to the data storage service\n", |
| 83 | + "directory_upload_response = await client.astore_file_content(\n", |
| 84 | + " name=\"Demo file entry for a whole directory\",\n", |
| 85 | + " file_path=\"./datasets\", # ADD DATASET FOLDER PATH HERE\n", |
| 86 | + " description=\"This is a directory that will be analysed by Edison Analysis\",\n", |
| 87 | + " as_collection=True,\n", |
| 88 | + ")" |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "markdown", |
| 93 | + "metadata": {}, |
| 94 | + "source": [ |
| 95 | + "## Running Your Job\n", |
| 96 | + "\n", |
| 97 | + "When running an `Edison Analysis` job, there are a few things to consider when configuring the agent. The first\n", |
| 98 | + "are the core configuration settings: `language`, `max_steps` and `query`. Beyond these core settings, several\n", |
| 99 | + "other options are available. The key ones are listed below:\n", |
| 100 | + "\n", |
| 101 | + "### Additional tools available:\n", |
| 102 | + "- `query_ensembl`: query the Ensembl database\n", |
| 103 | + "- `get_convert_gene`: for converting gene IDs from one type to another, for example Ensembl, Entrez, Refseq.\n", |
| 104 | + "- `search_web`: expose exa.ai (/search) web search as a tool\n", |
| 105 | + "- `crawl_web`: expose exa.ai (/contents) web crawl as a tool\n", |
| 106 | + "- `research_web`: expose exa.ai (/research) web research as a tool\n", |
| 107 | + "- `query_literature`: allow `Edison Analysis` to do calls to `Edison Literature` for literature search\n", |
| 108 | + "\n", |
| 109 | + "You can ask the agent to use a tool in either the user query or the system prompt. For example: \"Use the query_literature tool to compare your findings against published literature.\"\n", |
| 110 | + "\n", |
| 111 | + "### Modifying system prompt\n", |
| 112 | + "There are two options to modify the system prompt:\n", |
| 113 | + "1. Replace the existing system prompt completely using `prompting_config[\"system_prompt\"]`\n", |
| 114 | + "2. Append additional guidelines to the existing system prompt using `prompting_config[\"system_prompt_additional_guidelines\"]`\n", |
| 115 | + "\n", |
| 116 | + "Build the `prompting_config` dictionary, then assign it to the `\"prompting_config\"` key within `environment_config`." |
| 117 | + ] |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "code", |
| 121 | + "execution_count": null, |
| 122 | + "metadata": {}, |
| 123 | + "outputs": [], |
| 124 | + "source": [ |
| 125 | + "# Define your task\n", |
| 126 | + "USER_QUERY = \"Teach me something new about crows.\" # The actual query you want Edison Analysis to run\n", |
| 127 | + "SYSTEM_PROMPT = \"\" # By setting this, you will replace the system prompt entirely.\n", |
| 128 | + "SYSTEM_PROMPT_ADDITIONAL_GUIDELINES = (\n", |
| 129 | + " \"Make all figures in dark mode.\" # This will be appended to the system prompt\n", |
| 130 | + ")\n", |
| 131 | + "_SYSTEM_PROMPT_CONFIG = {\n", |
| 132 | + " \"system_prompt\": SYSTEM_PROMPT,\n", |
| 133 | + " \"system_prompt_additional_guidelines\": SYSTEM_PROMPT_ADDITIONAL_GUIDELINES,\n", |
| 134 | + "}\n", |
| 135 | + "LANGUAGE = \"PYTHON\" # Choose between \"R\" and \"PYTHON\"\n", |
| 136 | + "MAX_STEPS = 30 # You can change this to impose a limit on the number of steps the agent can take" |
| 137 | + ] |
| 138 | + }, |
| 139 | + { |
| 140 | + "cell_type": "code", |
| 141 | + "execution_count": null, |
| 142 | + "metadata": {}, |
| 143 | + "outputs": [], |
| 144 | + "source": [ |
| 145 | + "# Create a task\n", |
| 146 | + "task_data = TaskRequest(\n", |
| 147 | + " name=JobNames.ANALYSIS,\n", |
| 148 | + " query=USER_QUERY,\n", |
| 149 | + " runtime_config=RuntimeConfig(\n", |
| 150 | + " max_steps=MAX_STEPS,\n", |
| 151 | + " environment_config={\n", |
| 152 | + " \"language\": LANGUAGE,\n", |
| 153 | + " \"prompting_config\": {\n", |
| 154 | + " k: v for k, v in _SYSTEM_PROMPT_CONFIG.items() if v\n", |
| 155 | + " }, # See above for documentation\n", |
| 156 | + " \"data_storage_uris\": [\n", |
| 157 | + " f\"data_entry:{directory_upload_response.data_storage.id}\"\n", |
| 158 | + " ],\n", |
| 159 | + " \"additional_tools\": None, # See above for options\n", |
| 160 | + " },\n", |
| 161 | + " ),\n", |
| 162 | + ")\n", |
| 163 | + "trajectory_id = client.create_task(task_data)\n", |
| 164 | + "print(\n", |
| 165 | + " f\"Task running on platform. You can view progress live at: https://platform.edisonscientific.com/trajectories/{trajectory_id}\"\n", |
| 166 | + ")" |
| 167 | + ] |
| 168 | + }, |
| 169 | + { |
| 170 | + "cell_type": "code", |
| 171 | + "execution_count": null, |
| 172 | + "metadata": {}, |
| 173 | + "outputs": [], |
| 174 | + "source": [ |
| 175 | + "# Jobs take on average 3-10 minutes to complete\n", |
| 176 | + "# We also have inbuilt support for polling, asynchronous tasks and other utilities documented here:\n", |
| 177 | + "# https://edisonscientific.gitbook.io/edison-cookbook/edison-client\n", |
| 178 | + "status = \"in progress\"\n", |
| 179 | + "while status in {\"in progress\", \"queued\"}:\n", |
| 180 | + " status = client.get_task(trajectory_id).status\n", |
| 181 | + " time.sleep(15)\n", |
| 182 | + "\n", |
| 183 | + "if status == \"failed\":\n", |
| 184 | + " raise RuntimeError(\"Task failed\")\n", |
| 185 | + "\n", |
| 186 | + "job_result = client.get_task(trajectory_id, verbose=True)\n", |
| 187 | + "answer = job_result.environment_frame[\"state\"][\"state\"][\"answer\"]\n", |
| 188 | + "print(f\"The agent's answer to your research question is: \\n{answer}\")" |
| 189 | + ] |
| 190 | + }, |
| 191 | + { |
| 192 | + "cell_type": "markdown", |
| 193 | + "metadata": {}, |
| 194 | + "source": [ |
| 195 | + "## Download Task Output\n", |
| 196 | + "\n", |
| 197 | + "While the task is executing, it will create some artifacts: the notebook,\n", |
| 198 | + "where the analysis code is written, and any other files created during the task.\n", |
| 199 | + "\n", |
| 200 | + "Once the task has completed, you may want to check the contents of the notebook or look through the artifacts generated.\n", |
| 201 | + "To obtain these artifacts, you will need to inspect the output of the agent's final `environment_frame`." |
| 202 | + ] |
| 203 | + }, |
| 204 | + { |
| 205 | + "cell_type": "code", |
| 206 | + "execution_count": null, |
| 207 | + "metadata": {}, |
| 208 | + "outputs": [], |
| 209 | + "source": [ |
| 210 | + "output_data = job_result.environment_frame[\"state\"][\"info\"][\"output_data\"]\n", |
| 211 | + "print(output_data)" |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "code", |
| 216 | + "execution_count": null, |
| 217 | + "metadata": {}, |
| 218 | + "outputs": [], |
| 219 | + "source": [ |
| 220 | + "for output_file in output_data:\n", |
| 221 | + " download_response = await client.afetch_data_from_storage(\n", |
| 222 | + " data_storage_id=output_file[\"entry_id\"]\n", |
| 223 | + " )\n", |
| 224 | + "\n", |
| 225 | + " # Note there are two potential outcomes here. If the file is above ~10MB,\n", |
| 226 | + " # the client downloads it to your local filesystem. Otherwise, it returns\n", |
| 227 | + " # a RawFetchResponse object which contains the raw content.\n", |
| 228 | + " print(download_response)" |
| 229 | + ] |
| 230 | + } |
| 231 | + ], |
| 232 | + "metadata": { |
| 233 | + "kernelspec": { |
| 234 | + "display_name": ".venv", |
| 235 | + "language": "python", |
| 236 | + "name": "python3" |
| 237 | + }, |
| 238 | + "language_info": { |
| 239 | + "codemirror_mode": { |
| 240 | + "name": "ipython", |
| 241 | + "version": 3 |
| 242 | + }, |
| 243 | + "file_extension": ".py", |
| 244 | + "mimetype": "text/x-python", |
| 245 | + "name": "python", |
| 246 | + "nbconvert_exporter": "python", |
| 247 | + "pygments_lexer": "ipython3" |
| 248 | + } |
| 249 | + }, |
| 250 | + "nbformat": 4, |
| 251 | + "nbformat_minor": 4 |
| 252 | +} |
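
The polling cell in this notebook loops until the status leaves `"in progress"` or `"queued"`, with no upper bound on how long it waits. A minimal sketch of a bounded variant is shown below. The helper name `wait_for_terminal_status` and its parameters are hypothetical and not part of `edison-client`. It takes any zero-argument callable that returns a status string, so it makes no assumptions about the client beyond the two pending status values used in the notebook.

```python
import time
from typing import Callable

# Pending status strings, taken from the polling cell in the notebook.
PENDING_STATUSES = {"in progress", "queued"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_interval: float = 15.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it leaves PENDING_STATUSES or the timeout elapses.

    Hypothetical helper: returns the first non-pending status, or raises
    TimeoutError if the status is still pending once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in PENDING_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task still {status!r} after {timeout} seconds")
        # Sleep only while the task is still pending, so a finished task
        # is reported without an extra delay.
        sleep(poll_interval)
```

In the notebook this could be called as `wait_for_terminal_status(lambda: client.get_task(trajectory_id).status)`, reusing the `get_task` call from the polling cell. The `"failed"` check from that cell would then be applied to the returned value.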