1 parent b2eccc2 commit 160894e
1 file changed
README.md
@@ -81,7 +81,7 @@
- Use features such as paged attention, the flash attention backend, and CUDA graph:
```bash
- CUDA_VISIBLE_DEVICES=0,1,2,3 python python/infinilm/server/inference_server.py --device nvidia --model=/models/9G7B_MHA/ --enable-paged-attn --attn=flash-atten --enable-graph
+ CUDA_VISIBLE_DEVICES=0,1,2,3 python python/infinilm/server/inference_server.py --device nvidia --model=/models/9G7B_MHA/ --enable-paged-attn --attn=flash-attn --enable-graph
```
- Test inference service performance: