instruction-tuned and thus does not respond to instructions. Make sure you are
using an instruction-tuned model (`2b-it-sfp`, `2b-it`, `7b-it-sfp`, `7b-it`)
and not a pre-trained model (any model with a `-pt` suffix).

**What sequence lengths are supported?**

See `seq_len` in `configs.cc`. For the Gemma 3 models larger than 1B, this is
typically 32K, but 128K would also work given enough RAM. Note that long
sequences will be slow because the cost of attention grows quadratically with
sequence length.
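To make the quadratic cost concrete, here is a back-of-the-envelope Python sketch (not part of gemma.cpp). It counts query-key dot products for one attention head in one layer, assuming every token attends to all previous tokens. It ignores the number of heads and layers, the head dimension, and any local (sliding-window) attention layers, which scale differently. The helper name `causal_attention_scores` is hypothetical.

```python
def causal_attention_scores(seq_len: int) -> int:
    """Count query-key dot products for causal self-attention (one head, one layer).

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n,
    i.e. n * (n + 1) / 2, which grows quadratically in n.
    """
    return seq_len * (seq_len + 1) // 2


short_len = 32 * 1024   # 32K tokens
long_len = 128 * 1024   # 128K tokens
ratio = causal_attention_scores(long_len) / causal_attention_scores(short_len)
# 4x more tokens means roughly 16x more attention scores under this assumption.
print(f"128K vs 32K: {ratio:.1f}x more attention scores")
```

In other words, a sequence four times longer costs roughly sixteen times as much attention work. This is why 128K contexts are much slower than 32K ones even when they fit in RAM.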

**How do I convert my fine-tune to a `.sbs` compressed model file?**

For PaliGemma (1 and 2) checkpoints, you can use
[…]

pytorch checkpoint. (The code may need updates to work with Gemma-2 models.)

**What are some easy ways to make the model run faster?**

1. Make sure you are using the 8-bit switched floating point `-sfp` models.
   These are half the size of bf16 and thus use less memory bandwidth and cache
   space.
2. If you're on a laptop, make sure the power mode is set to maximize
   performance and power-saving mode is **off**. On most laptops, power-saving
   modes activate automatically when the computer is not plugged in.
3. Close other unused CPU-intensive applications.
4. On Macs, we have anecdotally observed a "warm-up" period during which speed
   ramps up as the performance cores get engaged.
5. Experiment with the `--num_threads` argument value. Depending on the device,
   larger values don't always mean better performance.
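As a rough illustration of why halving the weight size helps, the Python sketch below estimates weight storage and an upper bound on decode speed. Its helper names are hypothetical and not a gemma.cpp API. It assumes 1 byte per weight for `-sfp` and 2 bytes for bf16, and that generating each token streams all weights from memory once. It also uses a made-up parameter count and a made-up 50 GB/s memory bandwidth, and it ignores the KV cache, activations, and caching effects.

```python
# Bytes per weight for each storage format (assumed; per-tensor metadata ignored).
BYTES_PER_WEIGHT = {
    "sfp": 1,   # 8-bit switched floating point
    "bf16": 2,  # bfloat16
}


def weight_gib(num_params: int, fmt: str) -> float:
    """Approximate size of the weights in GiB."""
    return num_params * BYTES_PER_WEIGHT[fmt] / 2**30


def max_tokens_per_sec(num_params: int, fmt: str, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed, assuming each token reads all weights once."""
    bytes_per_token = num_params * BYTES_PER_WEIGHT[fmt]
    return bandwidth_gb_s * 1e9 / bytes_per_token


num_params = 2_000_000_000  # hypothetical round parameter count
bandwidth_gb_s = 50.0       # hypothetical memory bandwidth in GB/s
for fmt in ("sfp", "bf16"):
    print(
        f"{fmt}: {weight_gib(num_params, fmt):.2f} GiB of weights, "
        f"at most {max_tokens_per_sec(num_params, fmt, bandwidth_gb_s):.1f} tokens/s"
    )
```

Under these assumptions, the `-sfp` weights take half the memory and allow up to twice the decode throughput. Real speedups will likely be smaller, since other costs such as computation and the KV cache also matter.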

We're also working on algorithmic and optimization approaches for faster
inference; stay tuned.