Commit e4c2647

Feat: add default infra to htc (#5)
1 parent f2b67d4 commit e4c2647

249 files changed

Lines changed: 23654 additions & 0 deletions

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
# Load Test for HTC

## Overview

This is a general-purpose gRPC load test.

See [src/README.md](src/README.md) for details on running the program. See below for
deployment on Google Cloud.

## Deployment with Cloud Shell

The following link will walk you through a quick start in Cloud Shell:

[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fgooglecloudplatform%2Frisk-and-research-blueprints&cloudshell_git_branch=main&cloudshell_workspace=examples%2Frisk%2Floadtest&cloudshell_tutorial=QUICKSTART.md&show=terminal)

## Deploy with Terraform

### Requirements

You must have the following installed:
* `gcloud` (see [installation](https://cloud.google.com/sdk/docs/install))
* `kubectl` (see [install tools](https://kubernetes.io/docs/tasks/tools/))
* A bash-based shell (Linux or macOS)

Note that Cloud Shell meets these requirements.

### Configuration

Create `terraform.tfvars` with the following content. Each entry in `zones` is a zone
letter (for example `a`), as the examples show:
```
project_id="<project id>"
region="<region>"
zones=["<letter zone1>", "<letter zone2>", "<letter zone3>"]
```

For example, in us-central1:
```
project_id="<project id>"
region="us-central1"
zones=["a", "b", "c", "f"]
```

For example, in europe-west1:
```
project_id="<project id>"
region="europe-west1"
zones=["b", "c", "d"]
```
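
The zone values above are single letters rather than full zone names. A minimal sketch of how such letters map to Google Cloud zone names, assuming the module simply joins the region and the letter with a hyphen (the helper `expand_zones` is hypothetical and not part of this repository):

```python
from typing import List


def expand_zones(region: str, zone_letters: List[str]) -> List[str]:
    """Join a region with each zone letter, e.g. us-central1 + a -> us-central1-a.

    Assumption: this mirrors how the Terraform module interprets `zones`.
    """
    return [f"{region}-{letter}" for letter in zone_letters]


zones = expand_zones("europe-west1", ["b", "c", "d"])
print(zones)
```

This can serve as a quick check that every letter you list corresponds to a zone that exists in the chosen region.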

### Create infrastructure

Authorize `gcloud` if needed:
```sh
gcloud auth login --quiet --update-adc
```

Set the `gcloud` project:

```bash
gcloud config set project YOUR_PROJECT_ID
```

You may need to enable some basic APIs for Terraform to work:
```sh
gcloud services enable iam.googleapis.com cloudresourcemanager.googleapis.com
```

Initialize and run Terraform:
```sh
terraform init
terraform apply
```

NOTE: If the APIs were only just enabled, `terraform apply` may fail with timing
errors. If that happens, re-run `terraform apply`.

## Seeing infrastructure & Running Test Workloads

### See what Terraform created

Inspect the possible run scripts:
```sh
terraform output
```

Key outputs:
* `local_test_scripts` contains a list of shell scripts that you can run for different load tests.
* `get_credentials` is the command line to fetch the credentials for `kubectl`.
* `lookerstudio_create_dashboard_url` is a link to create a new Looker Studio dashboard from a template.
* `monitoring_dashboard_url` links to a custom monitoring dashboard for the load test.
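
To consume these outputs from a script, `terraform output -json` prints each output as an object with a `value` field. The sketch below parses that JSON; the sample text is illustrative only, and the placeholder values stand in for whatever your deployment actually prints. The helper `output_values` is hypothetical.

```python
import json


def output_values(terraform_output_json: str) -> dict:
    """Map each output name to its value from `terraform output -json` text."""
    raw = json.loads(terraform_output_json)
    return {name: entry["value"] for name, entry in raw.items()}


# Illustrative sample only; real values come from your own deployment.
sample = json.dumps({
    "get_credentials": {
        "sensitive": False,
        "type": "string",
        "value": "<command printed by terraform>",
    },
    "local_test_scripts": {
        "sensitive": False,
        "type": ["list", "string"],
        "value": ["<script 1>", "<script 2>"],
    },
})

values = output_values(sample)
for script in values["local_test_scripts"]:
    print(script)
```

In practice the input would come from running `terraform output -json` in the deployment directory and passing its standard output to `output_values`.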

### Running the GUI

Create a virtual environment:
```sh
python3 -m venv ui/.venv
ui/.venv/bin/python3 -m pip install -r ui/requirements.txt
```

Run the Gradio dashboard:
```sh
ui/.venv/bin/python3 ui/main.py generated/config.yaml
```

Open port 8080 in a browser, or use Web Preview on port 8080 in Cloud Shell. The UI lets you load
tests, inspect the jobs from BigQuery (similar to the dashboard), and follow deep
links into the Console.

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| access\_level\_name | (VPC-SC) Access Level full name. When providing this variable, additional identities will be added to the access level, these are required to work within an enforced VPC-SC Perimeter. | `string` | `null` | no |
| additional\_quota\_enabled | Enable quota requests for additional resources | `bool` | `false` | no |
| admin\_project | The admin project where cloudbuild/cloudrun configurations will be managed. | `string` | n/a | yes |
| cluster\_project\_id | The GCP project ID where the cluster is created. | `string` | n/a | yes |
| cluster\_project\_number | The GCP project number where the cluster is created. | `string` | n/a | yes |
| enable\_csi\_parallelstore | Enable the Parallelstore CSI Driver | `bool` | `true` | no |
| env | The environment to prepare (ex. development) | `string` | n/a | yes |
| gke\_cluster\_names | GKE Cluster Name to be used in configurations | `list(string)` | n/a | yes |
| hsn\_bucket | Enable hierarchical namespace GCS buckets | `bool` | `false` | no |
| infra\_project | The infrastructure project where resources will be managed. | `string` | n/a | yes |
| network\_name | VPC Network Name | `string` | n/a | yes |
| network\_self\_link | VPC Network self link | `string` | n/a | yes |
| parallelstore\_deployment\_type | Parallelstore Instance deployment type (SCRATCH or PERSISTENT) | `string` | `"SCRATCH"` | no |
| pubsub\_exactly\_once | Enable Pub/Sub exactly once subscriptions | `bool` | `true` | no |
| quota\_contact\_email | Contact email for quota requests | `string` | `""` | no |
| region | The region where the cloud resources will be deployed. | `string` | n/a | yes |
| regions | List of regions where GKE clusters should be created. Used for multi-region deployments. | `list(string)` | <pre>[<br> "us-central1"<br>]</pre> | no |
| service\_name | service name (e.g. 'transactionhistory') | `string` | n/a | yes |
| service\_perimeter\_mode | (VPC-SC) Service perimeter mode: ENFORCE, DRY\_RUN. | `string` | `"DRY_RUN"` | no |
| service\_perimeter\_name | (VPC-SC) Service perimeter name. The created projects in this step will be assigned to this perimeter. | `string` | `null` | no |
| storage\_capacity\_gib | Capacity in GiB for the selected storage system (Parallelstore or Lustre) | `number` | `null` | no |
| storage\_locations | Map of region to location (zone) for storage instances e.g. {"us-central1" = "us-central1-a"} | `map(string)` | `{}` | no |
| storage\_type | The type of storage system to deploy (PARALLELSTORE, LUSTRE, or null for none) | `string` | `null` | no |
| team | Environment Team, must be the same as the fleet scope team | `string` | n/a | yes |

## Outputs

| Name | Description |
|------|-------------|
| test\_scripts | Test configuration shell scripts |
| ui\_config | Yaml configuration for UI deployment |

<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# Agent for HTC

## Overview

The Agent for HTC is a tool for deploying [unary gRPC](https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc) services. A unary gRPC service is an RPC service
that takes in a [protobuf message](https://protobuf.dev/overview/) and returns a
protobuf message.

The agent can use [loadtest](../loadtest) or [american-option](../american-option) as
workloads; these serve as the example workloads here. You can also use your own workload.

## Agent Modes

The Agent is intended to be deployed as a sidecar to the gRPC service. It can
be deployed in [Cloud Run](https://cloud.google.com/run) or [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine). Cloud Run
provides scale-to-zero, rapid scaling, and a fully managed service. Google
Kubernetes Engine provides immense flexibility, scalability, and enterprise control.

The following examples focus on Cloud Run, but some can also run on Google Kubernetes
Engine (or locally).

### BigQuery

![BigQuery Pattern](docs/bigquery_pattern.png "BigQuery Pattern")

The agent is deployed alongside the workload in the same container on Cloud Run. It
receives HTTP requests from [BigQuery Remote Functions](https://cloud.google.com/bigquery/docs/remote-functions), converts the
JSON requests into protobuf, dispatches them to the gRPC service, and returns the
results to BigQuery as JSON.

This enables flexible data analysis from BigQuery (or a notebook leveraging BigQuery)
that accesses arbitrary gRPC-based services.
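
The request and response shapes follow the BigQuery remote function contract: the request body carries a `calls` array with one argument list per row, and the response carries a `replies` array with one element per call. The sketch below illustrates only that mapping. The function `call_grpc_stub` is a hypothetical stand-in for the protobuf conversion and gRPC dispatch, and the single-argument `task` shape is taken from the local test input in this README; the agent's actual implementation is not shown here.

```python
import json


def call_grpc_stub(argument: dict) -> dict:
    """Hypothetical stand-in for the gRPC call; echoes the task id."""
    return {"id": argument["task"]["id"], "ok": True}


def handle_remote_function(body: str) -> str:
    """Turn a remote function request body into a response body.

    Each entry of `calls` is the argument list for one row; this sketch
    assumes a single argument per call.
    """
    request = json.loads(body)
    replies = [call_grpc_stub(call[0]) for call in request["calls"]]
    return json.dumps({"replies": replies})


body = json.dumps({
    "requestId": "id1",
    "calls": [
        [{"task": {"id": "1", "minMicros": "500000"}}],
        [{"task": {"id": "2", "minMicros": "500000"}}],
    ],
})
print(handle_remote_function(body))
```

The replies must stay in the same order as the calls, since BigQuery matches them to rows by position.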

### Pub/Sub Push

![Pub/Sub Push Pattern](docs/pubsub_push_pattern.png "Pub/Sub Push Pattern")

The agent is deployed alongside the workload in the same container on Cloud Run. It
receives HTTP requests from a [Pub/Sub Push Subscription](https://cloud.google.com/pubsub/docs/push), converts each
JSON request into protobuf, dispatches it to the gRPC service,
and publishes the protobuf result as JSON to a topic.
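
A push subscription, by default, delivers each message wrapped in a JSON envelope whose `message.data` field is base64-encoded. The sketch below shows how a JSON request could be recovered from such an envelope. The helper `decode_push_envelope`, the example subscription name, and the `task` payload shape (borrowed from the local test input in this README) are assumptions for illustration, not the agent's actual code.

```python
import base64
import json


def decode_push_envelope(body: str) -> dict:
    """Extract the JSON payload carried in a Pub/Sub push request body."""
    envelope = json.loads(body)
    data = base64.b64decode(envelope["message"]["data"])
    return json.loads(data)


# Build an example envelope the way Pub/Sub would wrap a published message.
payload = {"task": {"id": "1", "minMicros": "500000"}}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
body = json.dumps({
    "message": {"data": encoded, "messageId": "example-id"},
    "subscription": "projects/your-project-id/subscriptions/example-sub",
})

print(decode_push_envelope(body))
```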

### Pub/Sub

![Pub/Sub Pattern](docs/pubsub_pattern.png "Pub/Sub Pattern")

The agent is deployed alongside the workload in the same container on Cloud Run,
in the same container on GKE, or on the same machine. It
pulls JSON requests from a [Pub/Sub Subscription](https://cloud.google.com/pubsub/docs/overview), converts each into protobuf, dispatches it to the gRPC service,
and publishes the protobuf result as JSON to a topic.

### File IO

![File Pattern](docs/file_pattern.png "File Pattern")

The agent reads from a JSONL file, dispatches each line to the gRPC service, and writes the
results to a JSONL output file.

This mode is intended mainly for testing.
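
The read, dispatch, and write loop of this mode can be sketched in a few lines. This is a minimal illustration, assuming one JSON request per line and one JSON result per output line. The `dispatch` callable is a hypothetical stand-in for the gRPC call, and skipping blank lines is a choice made for this sketch.

```python
import json
from typing import Callable, Iterable, Iterator


def process_jsonl(lines: Iterable[str], dispatch: Callable[[dict], dict]) -> Iterator[str]:
    """Yield one JSON result line for each non-empty JSONL input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines (a choice made for this sketch).
        yield json.dumps(dispatch(json.loads(line)))


def echo_dispatch(request: dict) -> dict:
    """Hypothetical dispatcher that wraps the request instead of calling gRPC."""
    return {"request": request}


input_lines = ['{"task": {"id": "1"}}\n', "\n", '{"task": {"id": "2"}}\n']
results = list(process_jsonl(input_lines, echo_dispatch))
for result in results:
    print(result)
```

Because `process_jsonl` accepts any iterable of strings, an open file handle can be passed directly in place of the in-memory list.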

## Test Modes

The Agent also includes two modes for testing: gRPC (direct) and Pub/Sub.

The Agent can be used for latency testing (ongoing tests that measure throughput
and latency) or for batch testing (send all data, then wait for all results).

TO BE ADDED.

## Testing Locally

NOTE: It is assumed that loadtest tasks have been generated as `../loadtest/tasks.jsonl`,
and that loadtest is running on port 2002. See the [loadtest README.md](../loadtest/README.md)
for instructions on getting it running.

### Build the container

```sh
docker build -t agent .
```

### Running File IO

The following command:
* Runs the container on the host network (so it can connect to the gRPC service).
* Mounts the local loadtest folder (so it can read `tasks.jsonl`).
* Runs the `agent file` subcommand with the input and output files.
* Configures the gRPC endpoint, service, and method for targeting the gRPC service.

```sh
docker run \
  --network host \
  -v "$PWD/../loadtest:/data" \
  agent \
  agent file /data/tasks.jsonl /data/tasks_output.jsonl \
  --endpoint http://localhost:2002/main.LoadTestService/RunLibrary
```

### Running BigQuery

Start the agent in BigQuery RDF mode, as with File IO. It listens on port 8080
(on the host) for HTTP requests.

```sh
docker run \
  --network host \
  agent \
  --logJSON \
  agent \
  rdf \
  --endpoint http://localhost:2002/main.LoadTestService/RunLibrary
```

Create an `input.json` file with the following tasks:

```sh
cat > input.json <<EOF
{
  "requestId": "id1",
  "caller": "caller1",
  "sessionUser": "sessionUser1",
  "userDefinedContext": {},
  "calls": [
    [{"task":{"id":"1","minMicros":"500000"}}],
    [{"task":{"id":"2","minMicros":"500000"}}],
    [{"task":{"id":"3","minMicros":"500000"}}],
    [{"task":{"id":"4","minMicros":"500000"}}]
  ]
}
EOF
```
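
For larger batches, the same payload can be generated rather than written by hand. The sketch below reproduces the structure shown above for any list of task IDs. The helper `build_request` is hypothetical, and the field values are copied from the example rather than from any documented schema.

```python
import json
from typing import Iterable


def build_request(task_ids: Iterable[str], min_micros: int = 500000) -> dict:
    """Build a request with one single-argument call per task id."""
    return {
        "requestId": "id1",
        "caller": "caller1",
        "sessionUser": "sessionUser1",
        "userDefinedContext": {},
        "calls": [
            [{"task": {"id": str(task_id), "minMicros": str(min_micros)}}]
            for task_id in task_ids
        ],
    }


request = build_request(["1", "2", "3", "4"])
print(json.dumps(request, indent=2))
```

Redirecting the printed output to `input.json` yields a file equivalent to the heredoc version.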

Use curl to test it out:

```sh
curl -H "Content-Type: application/json" --data @input.json http://localhost:8080/
```

### Running Pub/Sub

Steps:
* Create two topics and two subscriptions.
* Run the agent locally, pulling from the request subscription and publishing to the response topic.
* Delete the topics and subscriptions when finished.

Create the topics and subscriptions:

```sh
gcloud pubsub topics create test-reqs
gcloud pubsub subscriptions create --topic test-reqs test-reqs-sub
gcloud pubsub topics create test-resps
gcloud pubsub subscriptions create --topic test-resps test-resps-sub
```

Run the agent in Pub/Sub mode:

```sh
docker run \
  --network host \
  agent \
  agent pubsub \
  --sub-name projects/your-project-id/subscriptions/test-reqs-sub \
  --topic-name projects/your-project-id/topics/test-resps \
  --endpoint http://localhost:2002/main.LoadTestService/RunLibrary
```

Delete the topics and subscriptions:

```sh
gcloud pubsub subscriptions delete test-reqs-sub
gcloud pubsub topics delete test-reqs
gcloud pubsub subscriptions delete test-resps-sub
gcloud pubsub topics delete test-resps
```

### Running Pub/Sub Push

TO BE COMPLETED.

This requires additional curl commands.