
Commit 0219c0b

[Docs] Generate external links CSS decoration automatically (without manual hardcoded HTML)
1 parent 495d376 commit 0219c0b

File tree: 84 files changed, +418 −381 lines. Large commits have some content hidden by default; only a subset of the changed files is shown below.

docs/assets/javascripts/extra.js

Lines changed: 7 additions & 0 deletions

```diff
@@ -155,4 +155,11 @@ window.addEventListener("DOMContentLoaded", function() {
     }
   });
 })
+
+document.querySelectorAll('a[href^="http"]').forEach(link => {
+    if (!link.href.includes(location.hostname)) {
+        link.setAttribute('target', '_blank');
+        link.setAttribute('rel', 'noopener noreferrer');
+    }
+});
 })()
```
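The check performed by the added snippet can be distilled into a small pure function for testing — a sketch only; `isExternalLink` is an illustrative name, not part of the commit:

```javascript
// A link counts as external when its href starts with "http" and does not
// contain the current page's hostname — the same substring test the added
// snippet applies to each anchor. (isExternalLink is a hypothetical helper.)
function isExternalLink(href, hostname) {
  return href.startsWith("http") && !href.includes(hostname);
}

// The DOM pass then reduces to:
// document.querySelectorAll('a[href^="http"]').forEach(link => {
//   if (isExternalLink(link.href, location.hostname)) {
//     link.setAttribute("target", "_blank");
//     link.setAttribute("rel", "noopener noreferrer");
//   }
// });

console.log(isExternalLink("https://github.com/dstackai/dstack", "dstack.ai")); // true
console.log(isExternalLink("https://dstack.ai/docs/", "dstack.ai"));            // false
```

Note the substring test is deliberately loose: any URL that merely contains the hostname anywhere (not only in its host part) is treated as internal.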

docs/assets/stylesheets/extra.css

Lines changed: 32 additions & 2 deletions

```diff
@@ -1350,7 +1350,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
     visibility: visible;
 }*/
 
-.twemoji.external {
+/* .twemoji.external {
     position: relative;
     top: 2.5px;
     height: 18.5px;
@@ -1364,7 +1364,7 @@ html .md-footer-meta.md-typeset a:is(:focus,:hover) {
     position: relative;
     top: 1.5px;
    margin-right: -7px;
-}
+} */
 
 /*.md-tabs__item:nth-child(6) .md-tabs__link:before {
     position: relative;
@@ -1801,3 +1801,33 @@ img.border {
     font-size: 12px !important;
     padding: 30px !important;
 }
+
+/* External link indicator */
+a[href^="http"]:not(:where(
+    /* exclude http:// dstack links */
+    [href^="http://dstack.ai"],
+    /* exclude https://dstack.ai links */
+    [href^="https://dstack.ai"]
+)):after {
+    content: '';
+    display: inline-block;
+    width: 18.5px;
+    height: 18.5px;
+    margin-left: 0.15em;
+    vertical-align: -0.2em;
+    background-color: currentColor;
+    mask-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"></path></svg>');
+    mask-size: 100%;
+    mask-repeat: no-repeat;
+    mask-position: center;
+    -webkit-mask-image: url('data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor"><path d="m11.93 5 2.83 2.83L5 17.59 6.42 19l9.76-9.75L19 12.07V5z"></path></svg>');
+    -webkit-mask-size: 100%;
+    -webkit-mask-repeat: no-repeat;
+    -webkit-mask-position: center;
+    text-decoration: none;
+}
+
+/* Exclude links inside .md-social */
+.md-social a[href^="http"]:after {
+    display: none;
+}
```
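The exclusion logic of the new selector can be expressed as a small predicate for sanity-checking — a sketch only; `showsIndicator` is an illustrative name, and the CSS itself, not this function, is what ships:

```javascript
// Mirrors a[href^="http"]:not(:where([href^="http://dstack.ai"],
// [href^="https://dstack.ai"])):after — the indicator appears on http(s)
// links except those pointing at dstack.ai. (Hypothetical helper.)
function showsIndicator(href) {
  const excluded = ["http://dstack.ai", "https://dstack.ai"];
  return href.startsWith("http") && !excluded.some(p => href.startsWith(p));
}

console.log(showsIndicator("https://github.com/dstackai/dstack")); // true
console.log(showsIndicator("https://dstack.ai/docs"));             // false
console.log(showsIndicator("/docs/concepts/fleets/"));             // false (relative link)
```

The `.md-social` override in the stylesheet additionally hides the indicator for social-footer links regardless of their href; this sketch deliberately ignores that rule.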

docs/assets/stylesheets/landing.css

Lines changed: 4 additions & 4 deletions

```diff
@@ -327,7 +327,7 @@
     margin-right: -7px;
 }
 
-.md-button-secondary.external:after {
+/* .md-button-secondary.external:after {
     content: url('data:image/svg+xml,<svg fill="rgba(0, 0, 0, 0.87)" xmlns="http://www.w3.org/2000/svg" width="20px" height="20px" viewBox="0 0 16 16"><polygon points="5 4.31 5 5.69 9.33 5.69 2.51 12.51 3.49 13.49 10.31 6.67 10.31 11 11.69 11 11.69 4.31 5 4.31" data-v-e1bdab2c=""></polygon></svg>');
     line-height: 14px;
     margin-left: 5px;
@@ -343,7 +343,7 @@
     position: relative;
     top: 2.5px;
     margin-right: -7px;
-}
+} */
 
 .md-header__buttons .md-button-secondary,
 .md-typeset .md-button-secondary,
@@ -702,13 +702,13 @@
     line-height: 32px;
 }
 
-.tx-landing__highlights_grid h3.external:after {
+/* .tx-landing__highlights_grid h3.external:after {
     content: url('data:image/svg+xml,<svg fill="black" xmlns="http://www.w3.org/2000/svg" width="22px" height="22px" viewBox="0 0 16 16"><polygon points="5 4.31 5 5.69 9.33 5.69 2.51 12.51 3.49 13.49 10.31 6.67 10.31 11 11.69 11 11.69 4.31 5 4.31" data-v-e1bdab2c=""></polygon></svg>');
     margin-left: 2px;
     position: relative;
     top: 3px;
     margin-right: -7px;
-}
+} */
 
 .tx-landing__highlights_grid p {
     font-size: 16px;
```

docs/blog/archive/ambassador-program.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -58,8 +58,8 @@ yourself and your experience. We’ll reach out with a starter kit and next step
 Get involved
 </a>
 
-Have questions? Reach out via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}!
+Have questions? Reach out via [Discord](https://discord.gg/u8SmfwPpMd)!
 
 > 💜 In the meantime, we’re thrilled to
-> welcome [Park Chansung :material-arrow-top-right-thin:{ .external }](https://x.com/algo_diver){:target="_blank"}, the
+> welcome [Park Chansung](https://x.com/algo_diver), the
 > first `dstack` ambassador.
```

docs/blog/archive/efa.md

Lines changed: 5 additions & 5 deletions

```diff
@@ -10,7 +10,7 @@ categories:
 
 # Efficient distributed training with AWS EFA
 
-[Amazon Elastic Fabric Adapter (EFA) :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"} is a high-performance network interface designed for AWS EC2 instances, enabling
+[Amazon Elastic Fabric Adapter (EFA)](https://aws.amazon.com/hpc/efa/) is a high-performance network interface designed for AWS EC2 instances, enabling
 ultra-low latency and high-throughput communication between nodes. This makes it an ideal solution for scaling
 distributed training workloads across multiple GPUs and instances.
 
@@ -39,7 +39,7 @@ network interfaces, you’ll need to disable public IPs. Note, the `dstack`
 server in this case should have access to the private subnet of the VPC.
 
 You’ll also need to specify an AMI that includes the GDRCopy drivers. For example, you can use the
-[AWS Deep Learning Base GPU AMI :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/releasenotes/aws-deep-learning-base-gpu-ami-ubuntu-22-04/){:target="_blank"}.
+[AWS Deep Learning Base GPU AMI](https://aws.amazon.com/releasenotes/aws-deep-learning-base-gpu-ami-ubuntu-22-04/).
 
 Here’s an example backend configuration:
 
@@ -164,10 +164,10 @@ $ dstack apply -f examples/misc/efa/task.dstack.yml -R
 EFA.
 
 > Have questions? You're welcome to join
-> our [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"} or talk
-> directly to [our team :material-arrow-top-right-thin:{ .external }](https://calendly.com/dstackai/discovery-call){:target="_blank"}.
+> our [Discord](https://discord.gg/u8SmfwPpMd) or talk
+> directly to [our team](https://calendly.com/dstackai/discovery-call).
 
 !!! info "What's next?"
     1. Check [fleets](../../docs/concepts/fleets.md), [tasks](../../docs/concepts/tasks.md), and [volumes](../../docs/concepts/volumes.md)
     2. Also see [dev environments](../../docs/concepts/dev-environments.md) and [services](../../docs/concepts/services.md)
-    3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}
+    3. Join [Discord](https://discord.gg/u8SmfwPpMd)
```

docs/blog/posts/amd-mi300x-inference-benchmark.md

Lines changed: 6 additions & 6 deletions

```diff
@@ -12,7 +12,7 @@ categories:
 
 At `dstack`, we've been adding support for AMD GPUs with [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets),
 so we saw this as a great chance to test our integration by benchmarking AMD GPUs. Our friends at
-[Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"}, who build top-tier
+[Hot Aisle](https://hotaisle.xyz/), who build top-tier
 bare metal compute for AMD GPUs, kindly provided the hardware for the benchmark.
 
 <img src="https://dstack.ai/static-assets/static-assets/images/dstack-hotaisle-amd-mi300x-prompt-v5.png" width="750" />
@@ -106,7 +106,7 @@ Here is the spec of the bare metal machine we got:
 ??? info "TGI"
     The `ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm` Docker image was used.
 
-For conducting the tests, we've been using the [`benchmark_serving` :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py){:target="_blank"} provided by vLLM.
+For conducting the tests, we've been using the [`benchmark_serving`](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py) provided by vLLM.
 
 ## Observations
 
@@ -175,7 +175,7 @@ to vLLM.
 
 <img src="https://raw.githubusercontent.com/dstackai/benchmarks/refs/heads/main/amd/inference/gpu_vram_tgi_vllm.png" width="750" />
 
-This difference may be related to how vLLM [pre-allocates GPU cache :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/models/performance.html){:target="_blank"}.
+This difference may be related to how vLLM [pre-allocates GPU cache](https://docs.vllm.ai/en/latest/models/performance.html).
 
 ## Conclusion
 
@@ -203,22 +203,22 @@ like the H100 and H200, as well as possibly Google TPU.
 ### Source code
 
 The source code used for this benchmark can be found in our
-[GitHub repo :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/benchmarks/tree/main/amd/inference){:target="_blank"}.
+[GitHub repo](https://github.com/dstackai/benchmarks/tree/main/amd/inference).
 
 If you have questions, feedback, or want to help improve the benchmark, please reach out to our team.
 
 ## Thanks to our friends
 
 ### Hot Aisle
 
-[Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"}
+[Hot Aisle](https://hotaisle.xyz/)
 is the primary sponsor of this benchmark, and we are sincerely grateful for their hardware and support.
 
 If you'd like to use top-tier bare metal compute with AMD GPUs, we recommend going
 with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets) easily.
 
 ### RunPod
 If you’d like to use on-demand compute with AMD GPUs at affordable prices, you can configure `dstack` to
-use [RunPod :material-arrow-top-right-thin:{ .external }](https://runpod.io/){:target="_blank"}. In
+use [RunPod](https://runpod.io/). In
 this case, `dstack` will be able to provision fleets automatically when you run dev environments, tasks, and
 services.
```

docs/blog/posts/amd-on-runpod.md

Lines changed: 6 additions & 6 deletions

````diff
@@ -33,14 +33,14 @@ One of the main advantages of the `MI300X` is its VRAM. For example, with the `H
 version of Llama 3.1 405B into a single node with 8 GPUs—you'd have to use FP8 instead. However, with the `MI300X`, you
 can fit FP16 into a single node with 8 GPUs, and for FP8, you'd only need 4 GPUs.
 
-With the [latest update :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/releases/0.18.11rc1){:target="_blank"},
+With the [latest update](https://github.com/dstackai/dstack/releases/0.18.11rc1),
 you can now specify an AMD GPU under `resources`. Below are a few examples.
 
 ## Configuration
 
 === "Service"
     Here's an example of a [service](../../docs/concepts/services.md) that deploys
-    Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"}.
+    Llama 3.1 70B in FP16 using [TGI](https://huggingface.co/docs/text-generation-inference/en/installation_amd).
 
     <div editor-title="examples/inference/tgi/amd/service.dstack.yml">
 
@@ -72,7 +72,7 @@ you can now specify an AMD GPU under `resources`. Below are a few examples.
 
 === "Dev environment"
     Here's an example of a [dev environment](../../docs/concepts/dev-environments.md) using
-    [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"}'s
+    [TGI](https://huggingface.co/docs/text-generation-inference/en/installation_amd)'s
     Docker image:
 
     ```yaml
@@ -111,11 +111,11 @@ cloud resources and run the configuration.
 ## What's next?
 
 1. The examples above demonstrate the use of
-   [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"}.
+   [TGI](https://huggingface.co/docs/text-generation-inference/en/installation_amd).
    AMD accelerators can also be used with other frameworks like vLLM, Ollama, etc., and we'll be adding more examples soon.
 2. RunPod is the first cloud provider where dstack supports AMD. More cloud providers will be supported soon as well.
-3. Want to give RunPod and `dstack` a try? Make sure you've signed up for [RunPod :material-arrow-top-right-thin:{ .external }](https://www.runpod.io/){:target="_blank"},
+3. Want to give RunPod and `dstack` a try? Make sure you've signed up for [RunPod](https://www.runpod.io/),
    then [set up](../../docs/reference/server/config.yml.md#runpod) the `dstack server`.
 
-> Have questioned or feedback? Join our [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd){:target="_blank"}
+> Have questions or feedback? Join our [Discord](https://discord.gg/u8SmfwPpMd)
 server.
````

docs/blog/posts/amd-on-tensorwave.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -14,7 +14,7 @@ Since last month, when we introduced support for private clouds and data centers
 to orchestrate AI containers with any AI cloud vendor, whether they provide on-demand compute or reserved clusters.
 
 In this tutorial, we’ll walk you through how `dstack` can be used with
-[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
+[TensorWave](https://tensorwave.com/) using
 [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets).
 
 <img src="https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png" width="630"/>
@@ -237,4 +237,4 @@ Want to see how it works? Check out the video below:
 !!! info "What's next?"
     1. See [SSH fleets](../../docs/concepts/fleets.md#ssh-fleets)
     2. Read about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
-    3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+    3. Join [Discord](https://discord.gg/u8SmfwPpMd)
```

docs/blog/posts/benchmark-amd-containers-and-partitions.md

Lines changed: 10 additions & 10 deletions

```diff
@@ -16,7 +16,7 @@ Our new benchmark explores two important areas for optimizing AI workloads on AM
 
 <!-- more -->
 
-This benchmark was supported by [Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"},
+This benchmark was supported by [Hot Aisle](https://hotaisle.xyz/),
 a provider of AMD GPU bare-metal and VM infrastructure.
 
 ## Benchmark 1: Bare-metal vs containers
@@ -56,11 +56,11 @@ Our experiments consistently demonstrate that running multi-node AI workloads in
 
 ## Benchmark 2: Partition performance isolated vs mesh
 
-The AMD GPU can be [partitioned :material-arrow-top-right-thin:{ .external }](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/gpu-partitioning/mi300x/overview.html){:target="_blank"} into smaller, independent units (e.g., NPS4 mode splits one GPU into four partitions). This promises better memory bandwidth utilization. Does this theoretical gain translate to better performance in practice?
+The AMD GPU can be [partitioned](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/gpu-partitioning/mi300x/overview.html) into smaller, independent units (e.g., NPS4 mode splits one GPU into four partitions). This promises better memory bandwidth utilization. Does this theoretical gain translate to better performance in practice?
 
 ### Finding 1: Higher performance for isolated partitions
 
-First, we sought to reproduce and extend findings from the [official ROCm blog :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html){:target="_blank"}. We benchmarked the memory bandwidth of a single partition (in CPX/NPS4 mode) against a full, unpartitioned GPU (in SPX/NPS1 mode).
+First, we sought to reproduce and extend findings from the [official ROCm blog](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html). We benchmarked the memory bandwidth of a single partition (in CPX/NPS4 mode) against a full, unpartitioned GPU (in SPX/NPS1 mode).
 
 <img src="https://dstack.ai/static-assets/static-assets/images/benchmark-amd-containers-and-partitions-chart4a.png" width="750"/>
 
@@ -100,7 +100,7 @@ GPU partitioning is only practical if used dynamically—for instance, to run mu
 #### Limitations
 
 1. **Reproducibility**: AMD’s original blog post on partitioning lacked detailed setup information, so we had to reconstruct the benchmarks independently.
-2. **Network tuning**: These benchmarks were run on a default, out-of-the-box network configuration. Our results for RCCL (~339 GB/s) and RDMA (~726 Gbps) are slightly below the peak figures [reported by Dell :material-arrow-top-right-thin:{ .external }](https://infohub.delltechnologies.com/en-us/l/generative-ai-in-the-enterprise-with-amd-accelerators/rccl-and-perftest-for-cluster-validation-1/4/){:target="_blank"}. This suggests that further performance could be unlocked with expert tuning of network topology, MTU size, and NCCL environment variables.
+2. **Network tuning**: These benchmarks were run on a default, out-of-the-box network configuration. Our results for RCCL (~339 GB/s) and RDMA (~726 Gbps) are slightly below the peak figures [reported by Dell](https://infohub.delltechnologies.com/en-us/l/generative-ai-in-the-enterprise-with-amd-accelerators/rccl-and-perftest-for-cluster-validation-1/4/). This suggests that further performance could be unlocked with expert tuning of network topology, MTU size, and NCCL environment variables.
 
 ## Benchmark setup
 
@@ -352,7 +352,7 @@ The `SIZE` value is `1M`, `2M`, .., `8G`.
 
 **vLLM data parallel**
 
-1. Build nginx container (see [vLLM-nginx :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/stable/deployment/nginx.html#build-nginx-container){:target="_blank"}).
+1. Build nginx container (see [vLLM-nginx](https://docs.vllm.ai/en/stable/deployment/nginx.html#build-nginx-container)).
 
 2. Create `nginx.conf`
 
@@ -471,13 +471,13 @@ HIP_VISIBLE_DEVICES=0 python3 toy_inference_benchmark.py \
 
 ## Source code
 
-All source code and findings are available in [our GitHub repo :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/benchmarks/tree/main/amd/baremetal_container_partition){:target="_blank"}.
+All source code and findings are available in [our GitHub repo](https://github.com/dstackai/benchmarks/tree/main/amd/baremetal_container_partition).
 
 ## References
 
-* [AMD Instinct MI300X GPU partitioning overview :material-arrow-top-right-thin:{ .external }](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/gpu-partitioning/mi300x/overview.html){:target="_blank"}
-* [Deep dive into partition modes by AMD :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html){:target="_blank"}.
-* [RCCL and PerfTest for cluster validation by Dell :material-arrow-top-right-thin:{ .external }](https://infohub.delltechnologies.com/en-us/l/generative-ai-in-the-enterprise-with-amd-accelerators/rccl-and-perftest-for-cluster-validation-1/4/){:target="_blank"}.
+* [AMD Instinct MI300X GPU partitioning overview](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/gpu-partitioning/mi300x/overview.html)
+* [Deep dive into partition modes by AMD](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html).
+* [RCCL and PerfTest for cluster validation by Dell](https://infohub.delltechnologies.com/en-us/l/generative-ai-in-the-enterprise-with-amd-accelerators/rccl-and-perftest-for-cluster-validation-1/4/).
 
 ## What's next?
 
@@ -487,5 +487,5 @@ Benchmark the performance impact of VMs vs bare-metal for inference and training
 
 #### Hot Aisle
 
-Big thanks to [Hot Aisle :material-arrow-top-right-thin:{ .external }](https://hotaisle.xyz/){:target="_blank"} for providing the compute power behind these benchmarks.
+Big thanks to [Hot Aisle](https://hotaisle.xyz/) for providing the compute power behind these benchmarks.
 If you’re looking for fast AMD GPU bare-metal or VM instances, they’re definitely worth checking out.
```
