
Enable intfloat/e5-mistral-7b-instruct model with 32k token lens on hpu device#2715

Open
ZhengHongming888 wants to merge 2 commits into
huggingface:mainfrom
ZhengHongming888:intel_gaudi_mistral_7b_new


@ZhengHongming888 ZhengHongming888 commented Jun 4, 2024

This PR is part of the work to enable Sentence Transformers inference and training on Intel's Gaudi2 accelerator (the hpu device).

This PR enables the intfloat/e5-mistral-7b-instruct model with 32k-token inputs on the hpu device. It is a revision of PR #2656.

It makes two changes:

  1. More efficient padding for long inputs: sequence lengths are padded to a multiple of 128 instead of the next power of 2. Power-of-2 padding wastes more and more compute as the input grows, so the new scheme reduces the padding overhead for long sequences.

  2. Support for 32k-token inputs to the 7B Mistral model on the hpu device. The HPU-specific options are passed as arguments through the high-level encode call rather than being hard-coded as in the previous PR.
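To illustrate the first change, here is a minimal sketch comparing the two padding schemes. The function names `pad_to_power_of_two` and `pad_to_multiple` are hypothetical and chosen for illustration only; they are not the helpers used in this PR, and the actual implementation may differ.

```python
def pad_to_power_of_two(length: int) -> int:
    """Round a sequence length up to the next power of 2 (illustrative only)."""
    if length <= 1:
        return 1
    return 1 << (length - 1).bit_length()


def pad_to_multiple(length: int, multiple: int = 128) -> int:
    """Round a sequence length up to the next multiple of `multiple` (illustrative only)."""
    return ((length + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # Compare the number of wasted padding tokens for a few input lengths.
    for length in (100, 1000, 5000, 17000):
        p2 = pad_to_power_of_two(length)
        m128 = pad_to_multiple(length)
        print(f"len={length}: power-of-2 pads {p2 - length}, multiple-of-128 pads {m128 - length}")
```

For a 17000-token input, for instance, the power-of-2 scheme pads up to 32768 tokens, while the multiple-of-128 scheme pads only up to 17024.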

Example usage for the 7B Mistral model with 32k-token inputs:

hpu_kwargs = {"attn_softmax_bf16": True, "reuse_cache": True, "use_flash_attention": True, "flash_attention_recompute": True, "flash_attention_causal_mask": True}
emb = model.encode(sentences, batch_size=32, kwargs={"hpu_kwargs": hpu_kwargs})

Questions and comments are welcome.

Thanks.
