Commit 637fa41

Merge branch 'main' into main
2 parents 9ad4350 + bb4ff2c commit 637fa41

12 files changed

Lines changed: 362 additions & 52 deletions

.github/workflows/python-unit-tests.yml

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ jobs:

       - name: Install the latest version of uv
         uses: astral-sh/setup-uv@v6
+        with:
+          version: "latest"

       - name: Install dependencies
         run: |

contributing/adk_project_overview_and_architecture.md

Lines changed: 10 additions & 9 deletions
@@ -28,20 +28,22 @@ Google Agent Development Kit (ADK) for Python

 Adhere to this structure for compatibility with ADK tooling.

-my_adk_project/ \
-└── src/ \
-    └── my_app/ \
-        ├── agents/ \
-        │   ├── my_agent/ \
+```
+my_adk_project/
+└── src/
+    └── my_app/
+        ├── agents/
+        │   ├── my_agent/
         │   │   ├── __init__.py # Must contain: from. import agent \
         │   │   └── agent.py # Must contain: root_agent = Agent(...) \
-        │   └── another_agent/ \
-        │       ├── __init__.py \
+        │   └── another_agent/
+        │       ├── __init__.py
         │       └── agent.py\
+```

 agent.py: Must define the agent and assign it to a variable named root_agent. This is how ADK's tools find it.

-__init__.py: In each agent directory, it must contain from. import agent to make the agent discoverable.
+`__init__.py`: In each agent directory, it must contain from. import agent to make the agent discoverable.

 ## Local Development & Debugging

@@ -108,4 +110,3 @@ Test Cases: Create JSON files with input and a reference (expected tool calls an
 Metrics: tool_trajectory_avg_score (does it use tools correctly?) and response_match_score (is the final answer good?).

 Run via: adk web (UI), pytest (for CI/CD), or adk eval (CLI).
-
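The layout rules in this file (each agent directory has an `__init__.py` that imports `agent`, and an `agent.py` that assigns `root_agent`) can be checked mechanically. The following is a minimal sketch under stated assumptions: `check_agent_dir` is a hypothetical helper that is not part of ADK, and it uses plain text matching instead of importing the modules, so it only approximates what ADK tooling actually does.

```python
import re
from pathlib import Path


def check_agent_dir(agent_dir: Path) -> list[str]:
  """Return layout problems for one agent directory (hypothetical helper)."""
  problems = []
  init_file = agent_dir / "__init__.py"
  agent_file = agent_dir / "agent.py"

  # __init__.py must import the agent module so the package exposes it.
  if not init_file.is_file():
    problems.append("missing __init__.py")
  else:
    init_text = init_file.read_text(encoding="utf-8")
    if not re.search(r"^\s*from\s*\.\s*import\s+agent\b", init_text, re.M):
      problems.append("__init__.py does not import agent")

  # agent.py must assign a module-level variable named root_agent.
  if not agent_file.is_file():
    problems.append("missing agent.py")
  else:
    agent_text = agent_file.read_text(encoding="utf-8")
    if not re.search(r"^root_agent\s*=", agent_text, re.M):
      problems.append("agent.py does not assign root_agent")
  return problems
```

An empty list means the directory follows the convention; any other result names what is missing.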

contributing/samples/adk_answering_agent/README.md

Lines changed: 28 additions & 3 deletions
@@ -2,7 +2,11 @@

 The ADK Answering Agent is a Python-based agent designed to help answer questions in GitHub discussions for the `google/adk-python` repository. It uses a large language model to analyze open discussions, retrieve information from document store, generate response, and post a comment in the github discussion.

-This agent can be operated in three distinct modes: an interactive mode for local use, a batch script mode for oncall use, or as a fully automated GitHub Actions workflow (TBD).
+This agent can be operated in three distinct modes:
+
+- An interactive mode for local use.
+- A batch script mode for oncall use.
+- A fully automated GitHub Actions workflow (TBD).

 ---

@@ -50,6 +54,15 @@ The `main.py` is reserved for the Github Workflow. The detailed setup for the au

 ---

+## Update the Knowledge Base
+
+The `upload_docs_to_vertex_ai_search.py` is a script to upload ADK related docs to Vertex AI Search datastore to update the knowledge base. It can be executed with the following command in your terminal:
+
+```bash
+export PYTHONPATH=contributing/samples # If not already exported
+python -m adk_answering_agent.upload_docs_to_vertex_ai_search
+```
+
 ## Setup and Configuration

 Whether running in interactive or workflow mode, the agent requires the following setup.
@@ -59,7 +72,7 @@ The agent requires the following Python libraries.

 ```bash
 pip install --upgrade pip
-pip install google-adk requests
+pip install google-adk
 ```

 The agent also requires gcloud login:
@@ -68,16 +81,28 @@ The agent also requires gcloud login:
 gcloud auth application-default login
 ```

+The upload script requires the following additional Python libraries.
+
+```bash
+pip install google-cloud-storage google-cloud-discoveryengine
+```
+
 ### Environment Variables
 The following environment variables are required for the agent to connect to the necessary services.

 * `GITHUB_TOKEN=YOUR_GITHUB_TOKEN`: **(Required)** A GitHub Personal Access Token with `issues:write` permissions. Needed for both interactive and workflow modes.
 * `GOOGLE_GENAI_USE_VERTEXAI=TRUE`: **(Required)** Use Google Vertex AI for the authentication.
 * `GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID`: **(Required)** The Google Cloud project ID.
 * `GOOGLE_CLOUD_LOCATION=LOCATION`: **(Required)** The Google Cloud region.
-* `VERTEXAI_DATASTORE_ID=YOUR_DATASTORE_ID`: **(Required)** The Vertex AI datastore ID for the document store (i.e. knowledge base).
+* `VERTEXAI_DATASTORE_ID=YOUR_DATASTORE_ID`: **(Required)** The full Vertex AI datastore ID for the document store (i.e. knowledge base), with the format of `projects/{project_number}/locations/{location}/collections/{collection}/dataStores/{datastore_id}`.
 * `OWNER`: The GitHub organization or username that owns the repository (e.g., `google`). Needed for both modes.
 * `REPO`: The name of the GitHub repository (e.g., `adk-python`). Needed for both modes.
 * `INTERACTIVE`: Controls the agent's interaction mode. For the automated workflow, this is set to `0`. For interactive mode, it should be set to `1` or left unset.

+The following environment variables are required to upload the docs to update the knowledge base.
+
+* `GCS_BUCKET_NAME=YOUR_GCS_BUCKET_NAME`: **(Required)** The name of the GCS bucket to store the documents.
+* `ADK_DOCS_ROOT_PATH=YOUR_ADK_DOCS_ROOT_PATH`: **(Required)** Path to the root of the downloaded adk-docs repo.
+* `ADK_PYTHON_ROOT_PATH=YOUR_ADK_PYTHON_ROOT_PATH`: **(Required)** Path to the root of the downloaded adk-python repo.
+
 For local execution in interactive mode, you can place these variables in a `.env` file in the project's root directory. For the GitHub workflow, they should be configured as repository secrets.
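The README change requires `VERTEXAI_DATASTORE_ID` to be the full resource path rather than a bare ID. A quick way to catch a short ID early is to validate the value against the documented format. The sketch below is illustrative only: `parse_datastore_id` is a hypothetical helper that does not appear in the commit, and the sample values in the usage note are made up.

```python
import re

# Pattern for the full datastore ID format quoted in the README.
_DATASTORE_ID_RE = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/collections/(?P<collection>[^/]+)"
    r"/dataStores/(?P<datastore_id>[^/]+)$"
)


def parse_datastore_id(full_id: str) -> dict[str, str]:
  """Split a full datastore ID into its parts; raise ValueError if malformed."""
  match = _DATASTORE_ID_RE.match(full_id)
  if match is None:
    raise ValueError(f"Not a full Vertex AI datastore ID: {full_id!r}")
  return match.groupdict()
```

For example, `parse_datastore_id("projects/123/locations/global/collections/default_collection/dataStores/my-store")` returns the four named parts, while a bare `"my-store"` raises `ValueError`.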

contributing/samples/adk_answering_agent/settings.py

Lines changed: 5 additions & 0 deletions
@@ -29,6 +29,11 @@
 if not VERTEXAI_DATASTORE_ID:
   raise ValueError("VERTEXAI_DATASTORE_ID environment variable not set")

+GOOGLE_CLOUD_PROJECT = os.getenv("GOOGLE_CLOUD_PROJECT")
+GCS_BUCKET_NAME = os.getenv("GCS_BUCKET_NAME")
+ADK_DOCS_ROOT_PATH = os.getenv("ADK_DOCS_ROOT_PATH")
+ADK_PYTHON_ROOT_PATH = os.getenv("ADK_PYTHON_ROOT_PATH")
+
 OWNER = os.getenv("OWNER", "google")
 REPO = os.getenv("REPO", "adk-python")
 BOT_RESPONSE_LABEL = os.getenv("BOT_RESPONSE_LABEL", "bot responded")
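The four settings added in this hunk are read with `os.getenv` and no default, so each is `None` when its variable is unset, and nothing fails at import time. The upload script therefore has to check them itself. One possible way to do that in a single pass is sketched below. `REQUIRED_UPLOAD_VARS` and `missing_env_vars` are hypothetical names introduced for illustration and are not part of the commit.

```python
import os

# Variables the upload script checks before doing any work.
REQUIRED_UPLOAD_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GCS_BUCKET_NAME",
    "VERTEXAI_DATASTORE_ID",
    "ADK_DOCS_ROOT_PATH",
    "ADK_PYTHON_ROOT_PATH",
)


def missing_env_vars(names, environ=None):
  """Return the names that are unset or empty in the environment mapping."""
  env = os.environ if environ is None else environ
  return [name for name in names if not env.get(name)]
```

Passing a plain dictionary as `environ` keeps the check easy to test without touching the real process environment.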
Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+
+from adk_answering_agent.settings import ADK_DOCS_ROOT_PATH
+from adk_answering_agent.settings import ADK_PYTHON_ROOT_PATH
+from adk_answering_agent.settings import GCS_BUCKET_NAME
+from adk_answering_agent.settings import GOOGLE_CLOUD_PROJECT
+from adk_answering_agent.settings import VERTEXAI_DATASTORE_ID
+from google.api_core.exceptions import GoogleAPICallError
+from google.cloud import discoveryengine_v1beta as discoveryengine
+from google.cloud import storage
+import markdown
+
+GCS_PREFIX_TO_ROOT_PATH = {
+    "adk-docs": ADK_DOCS_ROOT_PATH,
+    "adk-python": ADK_PYTHON_ROOT_PATH,
+}
+
+
+def cleanup_gcs_prefix(project_id: str, bucket_name: str, prefix: str) -> bool:
+  """Delete all the objects with the given prefix in the bucket."""
+  print(f"Start cleaning up GCS: gs://{bucket_name}/{prefix}...")
+  try:
+    storage_client = storage.Client(project=project_id)
+    bucket = storage_client.bucket(bucket_name)
+    blobs = list(bucket.list_blobs(prefix=prefix))
+
+    if not blobs:
+      print("GCS target location is already empty, no need to clean up.")
+      return True
+
+    bucket.delete_blobs(blobs)
+    print(f"Successfully deleted {len(blobs)} objects.")
+    return True
+  except GoogleAPICallError as e:
+    print(f"[ERROR] Failed to clean up GCS: {e}", file=sys.stderr)
+    return False
+
+
+def upload_directory_to_gcs(
+    source_directory: str, project_id: str, bucket_name: str, prefix: str
+) -> bool:
+  """Upload the whole directory into GCS."""
+  print(
+      f"Start uploading directory {source_directory} to GCS:"
+      f" gs://{bucket_name}/{prefix}..."
+  )
+
+  if not os.path.isdir(source_directory):
+    print(f"[Error] {source_directory} is not a directory or does not exist.")
+    return False
+
+  storage_client = storage.Client(project=project_id)
+  bucket = storage_client.bucket(bucket_name)
+  file_count = 0
+  for root, dirs, files in os.walk(source_directory):
+    # Modify the 'dirs' list in-place to prevent os.walk from descending
+    # into hidden directories.
+    dirs[:] = [d for d in dirs if not d.startswith(".")]
+
+    # Keep only .md and .py files.
+    files = [f for f in files if f.endswith(".md") or f.endswith(".py")]
+
+    for filename in files:
+      local_path = os.path.join(root, filename)
+
+      relative_path = os.path.relpath(local_path, source_directory)
+      gcs_path = os.path.join(prefix, relative_path)
+
+      try:
+        content_type = None
+        if filename.lower().endswith(".md"):
+          # Vertex AI search doesn't recognize text/markdown,
+          # convert it to html and use text/html instead
+          content_type = "text/html"
+          with open(local_path, "r", encoding="utf-8") as f:
+            md_content = f.read()
+          html_content = markdown.markdown(
+              md_content, output_format="html5", encoding="utf-8"
+          )
+          if not html_content:
+            print(" - Skipped empty file: " + local_path)
+            continue
+          gcs_path = gcs_path.removesuffix(".md") + ".html"
+          bucket.blob(gcs_path).upload_from_string(
+              html_content, content_type=content_type
+          )
+        else:  # Python files
+          bucket.blob(gcs_path).upload_from_filename(
+              local_path, content_type=content_type
+          )
+        type_msg = (
+            f"(type {content_type})" if content_type else "(type auto-detect)"
+        )
+        print(
+            f" - Uploaded {type_msg}: {local_path} ->"
+            f" gs://{bucket_name}/{gcs_path}"
+        )
+        file_count += 1
+      except GoogleAPICallError as e:
+        print(
+            f"[ERROR] Error uploading file {local_path}: {e}", file=sys.stderr
+        )
+        return False
+
+  print(f"Successfully uploaded {file_count} files to GCS.")
+  return True
+
+
+def import_from_gcs_to_vertex_ai(
+    full_datastore_id: str,
+    gcs_bucket: str,
+) -> bool:
+  """Triggers a bulk import task from a GCS folder to Vertex AI Search."""
+  print(f"Triggering FULL SYNC import from gs://{gcs_bucket}/**...")
+
+  try:
+    client = discoveryengine.DocumentServiceClient()
+    gcs_uri = f"gs://{gcs_bucket}/**"
+    request = discoveryengine.ImportDocumentsRequest(
+        # parent has the format of
+        # "projects/{project_number}/locations/{location}/collections/{collection}/dataStores/{datastore_id}/branches/default_branch"
+        parent=full_datastore_id + "/branches/default_branch",
+        # Specify the GCS source and use "content" for unstructured data.
+        gcs_source=discoveryengine.GcsSource(
+            input_uris=[gcs_uri], data_schema="content"
+        ),
+        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.FULL,
+    )
+    operation = client.import_documents(request=request)
+    print(
+        "Successfully started full sync import operation."
+        f" Operation Name: {operation.operation.name}"
+    )
+    return True
+
+  except GoogleAPICallError as e:
+    print(f"[ERROR] Error triggering import: {e}", file=sys.stderr)
+    return False
+
+
+def main():
+  # Check required environment variables.
+  if not GOOGLE_CLOUD_PROJECT:
+    print(
+        "[ERROR] GOOGLE_CLOUD_PROJECT environment variable not set. Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+  if not GCS_BUCKET_NAME:
+    print(
+        "[ERROR] GCS_BUCKET_NAME environment variable not set. Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+  if not VERTEXAI_DATASTORE_ID:
+    print(
+        "[ERROR] VERTEXAI_DATASTORE_ID environment variable not set."
+        " Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+  if not ADK_DOCS_ROOT_PATH:
+    print(
+        "[ERROR] ADK_DOCS_ROOT_PATH environment variable not set. Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+  if not ADK_PYTHON_ROOT_PATH:
+    print(
+        "[ERROR] ADK_PYTHON_ROOT_PATH environment variable not set. Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+
+  for gcs_prefix in GCS_PREFIX_TO_ROOT_PATH:
+    # 1. Cleanup the GCS for a clean start.
+    if not cleanup_gcs_prefix(
+        GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME, gcs_prefix
+    ):
+      print("[ERROR] Failed to clean up GCS. Exiting...", file=sys.stderr)
+      return 1
+
+    # 2. Upload the docs to GCS.
+    if not upload_directory_to_gcs(
+        GCS_PREFIX_TO_ROOT_PATH[gcs_prefix],
+        GOOGLE_CLOUD_PROJECT,
+        GCS_BUCKET_NAME,
+        gcs_prefix,
+    ):
+      print("[ERROR] Failed to upload docs to GCS. Exiting...", file=sys.stderr)
+      return 1
+
+  # 3. Import the docs from GCS to Vertex AI Search.
+  if not import_from_gcs_to_vertex_ai(VERTEXAI_DATASTORE_ID, GCS_BUCKET_NAME):
+    print(
+        "[ERROR] Failed to import docs from GCS to Vertex AI Search."
+        " Exiting...",
+        file=sys.stderr,
+    )
+    return 1
+
+  print("--- Sync task has been successfully initiated ---")
+  return 0
+
+
+if __name__ == "__main__":
+  sys.exit(main())
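The file selection and object naming in `upload_directory_to_gcs` can be exercised without any cloud access. The sketch below isolates that part under stated assumptions: `iter_upload_targets` is a hypothetical helper that is not in the commit, and it builds object names with `posixpath.join` instead of the script's `os.path.join`, a swapped-in choice so that names use `/` even where the local separator differs.

```python
import os
import posixpath


def iter_upload_targets(source_directory: str, prefix: str):
  """Yield (local_path, object_name) pairs for .md and .py files.

  Hypothetical helper mirroring the filtering in upload_directory_to_gcs.
  """
  for root, dirs, files in os.walk(source_directory):
    # Prune hidden directories in place so os.walk does not descend into them.
    dirs[:] = sorted(d for d in dirs if not d.startswith("."))
    for filename in sorted(files):
      if not filename.endswith((".md", ".py")):
        continue
      local_path = os.path.join(root, filename)
      relative_path = os.path.relpath(local_path, source_directory)
      # Join with "/" regardless of the local path separator.
      object_name = posixpath.join(prefix, *relative_path.split(os.sep))
      if object_name.endswith(".md"):
        # Markdown is uploaded as converted HTML, so rename the object.
        object_name = object_name.removesuffix(".md") + ".html"
      yield local_path, object_name
```

Listing the pairs before uploading gives a dry run of which files a sync would touch and where they would land in the bucket.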
