Commit 9e01f74

Edison Analysis Tutorial (#35)
1 parent a135643 commit 9e01f74

2 files changed

Lines changed: 430 additions & 0 deletions

Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Edison Analysis API Tutorial\n",
8+
"\n",
9+
"This notebook walks through an example use case: using `Edison Analysis` to perform data analysis.\n",
10+
"\n",
11+
"The only dependency you need to follow along is `edison-client`, which you can install via pip:\n",
12+
"\n",
13+
"```bash\n",
14+
"pip install edison-client\n",
15+
"```\n",
16+
"\n",
17+
"We recommend reading the `edison-client` [docs](https://pypi.org/project/edison-client/) before following this tutorial.\n",
18+
"\n",
19+
"To run an `Edison Analysis` job, take the following steps:\n",
20+
"\n",
21+
"1. Upload any artifacts to the data storage service\n",
22+
"2. Start an `Edison Analysis` run using the Edison client, passing the data storage entry IDs\n",
23+
"   along with any other details in the task config\n",
24+
"3. Use the output of the task to obtain any data generated by the task"
25+
]
26+
},
27+
{
28+
"cell_type": "code",
29+
"execution_count": null,
30+
"metadata": {},
31+
"outputs": [],
32+
"source": [
33+
"import time\n",
34+
"\n",
35+
"from edison_client import EdisonClient\n",
36+
"from edison_client.models import RuntimeConfig, TaskRequest\n",
37+
"from edison_client.models.app import JobNames"
38+
]
39+
},
40+
{
41+
"cell_type": "code",
42+
"execution_count": null,
43+
"metadata": {},
44+
"outputs": [],
45+
"source": [
46+
"# Instantiate the Edison client with your API key created via the platform\n",
47+
"EDISON_API_KEY = \"\" # Add your API key here\n",
48+
"client = EdisonClient(api_key=EDISON_API_KEY)"
49+
]
50+
},
51+
{
52+
"cell_type": "markdown",
53+
"metadata": {},
54+
"source": [
55+
"## File management with Edison Analysis\n",
56+
"\n",
57+
"`Edison Analysis` is designed to run data analysis on files provided by the user or caller. To provide `Edison Analysis` with this data, \n",
58+
"you'll need to upload it to the Edison data storage service. This service is your one-stop shop for sharing, storing and\n",
59+
"updating data to be used in the Edison ecosystem."
60+
]
61+
},
62+
{
63+
"cell_type": "code",
64+
"execution_count": null,
65+
"metadata": {},
66+
"outputs": [],
67+
"source": [
68+
"# Uploading a single file to the data storage service\n",
69+
"single_file_upload_response = await client.astore_file_content(\n",
70+
"    name=\"Demo file entry for a single file\",\n",
71+
"    file_path=\"./datasets/brain_size_data.csv\",  # ADD DATASET PATH HERE\n",
72+
"    description=\"This is a test file that will be analysed by Edison Analysis\",\n",
73+
")"
74+
]
75+
},
76+
{
77+
"cell_type": "code",
78+
"execution_count": null,
79+
"metadata": {},
80+
"outputs": [],
81+
"source": [
82+
"# Uploading a directory to the data storage service\n",
83+
"directory_upload_response = await client.astore_file_content(\n",
84+
"    name=\"Demo file entry for a whole directory\",\n",
85+
"    file_path=\"./datasets\",  # ADD DATASET FOLDER PATH HERE\n",
86+
"    description=\"This is a directory that will be analysed by Edison Analysis\",\n",
87+
"    as_collection=True,\n",
88+
")"
89+
]
90+
},
91+
{
92+
"cell_type": "markdown",
93+
"metadata": {},
94+
"source": [
95+
"## Running Your Job\n",
96+
"\n",
97+
"When running an `Edison Analysis` job, there are a few things to consider in how you configure the agent. The first things \n",
98+
"to note are the core configuration settings like `language`, `max_steps` and `query`. In addition to these core settings you have some\n",
99+
"other options too. The key ones are listed below:\n",
100+
"\n",
101+
"### Additional tools available:\n",
102+
"- `query_ensembl`: query the Ensembl database\n",
103+
"- `get_convert_gene`: convert gene IDs from one type to another (for example Ensembl, Entrez, RefSeq)\n",
104+
"- `search_web`: expose exa.ai (/search) web search as a tool\n",
105+
"- `crawl_web`: expose exa.ai (/contents) web crawl as a tool\n",
106+
"- `research_web`: expose exa.ai (/research) web research as a tool\n",
107+
"- `query_literature`: allow `Edison Analysis` to call `Edison Literature` for literature search\n",
108+
"\n",
109+
"- You can instruct the agent on tool usage in either the user or the system prompt. For example: \"Use the query_literature tool to compare your findings against published literature.\"\n",
110+
"\n",
111+
"### Modifying system prompt\n",
112+
"There are two options to modify the system prompt:\n",
113+
"1. Replace the existing system prompt completely using `prompting_config[\"system_prompt\"]`\n",
114+
"2. Append additional guidelines to the existing system prompt using `prompting_config[\"system_prompt_additional_guidelines\"]`\n",
115+
"\n",
116+
"Build the `prompting_config` dictionary then assign it to the `\"prompting_config\"` key within `environment_config`"
117+
]
118+
},
119+
{
120+
"cell_type": "code",
121+
"execution_count": null,
122+
"metadata": {},
123+
"outputs": [],
124+
"source": [
125+
"# Define your task\n",
126+
"USER_QUERY = \"Teach me something new about crows.\"  # The actual query you want `Edison Analysis` to run\n",
127+
"SYSTEM_PROMPT = \"\" # By setting this, you will replace the system prompt entirely.\n",
128+
"SYSTEM_PROMPT_ADDITIONAL_GUIDELINES = (\n",
129+
" \"Make all figures in dark mode.\" # This will be appended to the system prompt\n",
130+
")\n",
131+
"_SYSTEM_PROMPT_CONFIG = {\n",
132+
" \"system_prompt\": SYSTEM_PROMPT,\n",
133+
" \"system_prompt_additional_guidelines\": SYSTEM_PROMPT_ADDITIONAL_GUIDELINES,\n",
134+
"}\n",
135+
"LANGUAGE = \"PYTHON\" # Choose between \"R\" and \"PYTHON\"\n",
136+
"MAX_STEPS = 30 # You can change this to impose a limit on the number of steps the agent can take"
137+
]
138+
},
139+
{
140+
"cell_type": "code",
141+
"execution_count": null,
142+
"metadata": {},
143+
"outputs": [],
144+
"source": [
145+
"# Create a task\n",
146+
"task_data = TaskRequest(\n",
147+
"    name=JobNames.ANALYSIS,\n",
148+
"    query=USER_QUERY,\n",
149+
"    runtime_config=RuntimeConfig(\n",
150+
"        max_steps=MAX_STEPS,\n",
151+
"        environment_config={\n",
152+
"            \"language\": LANGUAGE,\n",
153+
"            \"prompting_config\": {\n",
154+
"                k: v for k, v in _SYSTEM_PROMPT_CONFIG.items() if v\n",
155+
"            },  # See above for documentation\n",
156+
"            \"data_storage_uris\": [\n",
157+
"                f\"data_entry:{directory_upload_response.data_storage.id}\"\n",
158+
"            ],\n",
159+
"            \"additional_tools\": None,  # See above for options\n",
160+
"        },\n",
161+
"    ),\n",
162+
")\n",
163+
"trajectory_id = client.create_task(task_data)\n",
164+
"print(\n",
165+
"    f\"Task running on platform, you can view progress live at: https://platform.edisonscientific.com/trajectories/{trajectory_id}\"\n",
166+
")"
167+
]
168+
},
169+
{
170+
"cell_type": "code",
171+
"execution_count": null,
172+
"metadata": {},
173+
"outputs": [],
174+
"source": [
175+
"# Jobs take on average 3-10 minutes to complete\n",
176+
"# We also have inbuilt support for polling, asynchronous tasks and other utilities documented here:\n",
177+
"# https://edisonscientific.gitbook.io/edison-cookbook/edison-client\n",
178+
"status = \"in progress\"\n",
179+
"while status in {\"in progress\", \"queued\"}:\n",
180+
"    status = client.get_task(trajectory_id).status\n",
181+
"    time.sleep(15)\n",
182+
"\n",
183+
"if status == \"failed\":\n",
184+
"    raise RuntimeError(\"Task failed\")\n",
185+
"\n",
186+
"job_result = client.get_task(trajectory_id, verbose=True)\n",
187+
"answer = job_result.environment_frame[\"state\"][\"state\"][\"answer\"]\n",
188+
"print(f\"The agent's answer to your research question is: \\n{answer}\")"
189+
]
190+
},
191+
{
192+
"cell_type": "markdown",
193+
"metadata": {},
194+
"source": [
195+
"## Download Task Output\n",
196+
"\n",
197+
"While the task is executing, it creates artifacts: the notebook, \n",
198+
"where the analysis code is written, plus any other files created during the task.\n",
199+
"\n",
200+
"Once the task has completed you may want to check the contents of the notebook or look through the artifacts generated. \n",
201+
"To obtain these artifacts, you will need to inspect the output of the agent's final `environment_frame`"
202+
]
203+
},
204+
{
205+
"cell_type": "code",
206+
"execution_count": null,
207+
"metadata": {},
208+
"outputs": [],
209+
"source": [
210+
"output_data = job_result.environment_frame[\"state\"][\"info\"][\"output_data\"]\n",
211+
"print(output_data)"
212+
]
213+
},
214+
{
215+
"cell_type": "code",
216+
"execution_count": null,
217+
"metadata": {},
218+
"outputs": [],
219+
"source": [
220+
"for output_file in output_data:\n",
221+
"    download_response = await client.afetch_data_from_storage(\n",
222+
"        data_storage_id=output_file[\"entry_id\"]\n",
223+
"    )\n",
224+
"\n",
225+
"    # Note: there are two possible outcomes here. If the file is above ~10MB,\n",
226+
"    # the client downloads it to your local filesystem. Otherwise, it returns\n",
227+
"    # a RawFetchResponse object, which contains the raw content.\n",
228+
"    print(download_response)"
229+
]
230+
}
231+
],
232+
"metadata": {
233+
"kernelspec": {
234+
"display_name": ".venv",
235+
"language": "python",
236+
"name": "python3"
237+
},
238+
"language_info": {
239+
"codemirror_mode": {
240+
"name": "ipython",
241+
"version": 3
242+
},
243+
"file_extension": ".py",
244+
"mimetype": "text/x-python",
245+
"name": "python",
246+
"nbconvert_exporter": "python",
247+
"pygments_lexer": "ipython3"
248+
}
249+
},
250+
"nbformat": 4,
251+
"nbformat_minor": 4
252+
}
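
The polling cell in the notebook loops while the status is `in progress` or `queued`, with no upper bound on how long it waits. A minimal sketch of a bounded variant follows. `wait_for_status`, its parameters, the default `timeout`, and the final status string `"success"` are hypothetical names introduced for illustration and are not part of `edison-client`; only the two pending status strings come from the notebook. The status source is passed in as a callable so the sketch runs without a client.

```python
import time
from typing import Callable

# Pending status strings taken from the notebook's polling cell.
PENDING_STATUSES = frozenset({"in progress", "queued"})


def wait_for_status(
    get_status: Callable[[], str],
    interval: float = 15.0,
    timeout: float = 1800.0,  # assumed default, not taken from the client
) -> str:
    """Poll get_status() until it returns a non-pending status.

    Raises TimeoutError if the status is still pending once `timeout`
    seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in PENDING_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Stand-in for `lambda: client.get_task(trajectory_id).status`.
# "success" is a made-up terminal status used only for this demonstration.
_fake_statuses = iter(["queued", "in progress", "success"])
final_status = wait_for_status(lambda: next(_fake_statuses), interval=0.0)
print(final_status)
```

In the notebook, the same helper could presumably be called as `wait_for_status(lambda: client.get_task(trajectory_id).status)` before checking for `"failed"`, which also avoids sleeping once more after the task has already finished.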
