Can't use Hugging Face Inference API (serverless) due to hardcoded /generate path #849

@eschnou

Spring AI supports Hugging Face Inference Endpoints. However, this does not work with the 'serverless' version of the Inference API, because a hardcoded '/generate' subpath is appended to the configured URL.

Bug description
Configure Hugging Face to use a serverless inference endpoint such as:
spring.ai.huggingface.chat.url=https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct

This will result in an exception with the following error:
```
404 Not Found: "{"error":"Model meta-llama/Meta-Llama-3-8B-Instruct/generate does not exist"}"
```

This is because the chat model calls the `generate` method (which makes the OpenAPI-generated client call /generate), whereas it seems it should call the `compatGenerate` method, which invokes the endpoint at the / path.

https://github.com/spring-projects/spring-ai/blob/v1.0.0-M1/models/spring-ai-huggingface/src/main/java/org/springframework/ai/huggingface/HuggingfaceChatModel.java#L97
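To illustrate the failure mode, here is a minimal sketch, assuming the client simply appends a fixed sub-path to the configured URL. `ServerlessPathDemo` and `resolve` are hypothetical names used for illustration only; this is not Spring AI code.

```java
import java.net.URI;

// Hypothetical demo class; it is not part of Spring AI.
final class ServerlessPathDemo {

    // Mimics a client that appends a fixed sub-path to the configured base URL.
    static URI resolve(String baseUrl, String subPath) {
        // Drop a single trailing slash so the join never produces "//".
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + subPath);
    }

    public static void main(String[] args) {
        String baseUrl =
                "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct";

        // Hardcoded "/generate": the suffix ends up after the model id.
        // prints /models/meta-llama/Meta-Llama-3-8B-Instruct/generate
        System.out.println(resolve(baseUrl, "/generate").getPath());

        // Root path: the model id stays intact.
        // prints /models/meta-llama/Meta-Llama-3-8B-Instruct
        System.out.println(resolve(baseUrl, "").getPath());
    }
}
```

The first path matches the model name reported in the 404 message above, which suggests the serverless API treats the `/generate` suffix as part of the model id.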

Environment
Spring AI 1.0.0-M1

Expected behavior
I would expect this library to work with the serverless version of the inference endpoints (much cheaper 😅). If the path has to differ between the two kinds of endpoint, it should be configurable.
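As a rough sketch of what "configurable" could look like, assuming the sub-path is read from a property and falls back to `/generate` when unset: the `example.chat.path` key, the `ConfigurablePathSketch` class, and the `endpoint` method are all made up for illustration and do not exist in Spring AI. Only `spring.ai.huggingface.chat.url` comes from the configuration shown above.

```java
import java.util.Properties;

// Hypothetical sketch; neither this class nor the "example.chat.path" key exists in Spring AI.
final class ConfigurablePathSketch {

    static final String URL_KEY = "spring.ai.huggingface.chat.url"; // property used in this issue
    static final String PATH_KEY = "example.chat.path";             // made-up key, assumption only
    static final String DEFAULT_PATH = "/generate";                 // keeps the current behaviour

    // Builds the request URL from the base URL and an optional, configurable sub-path.
    static String endpoint(Properties props) {
        String url = props.getProperty(URL_KEY);
        if (url == null) {
            throw new IllegalArgumentException("missing " + URL_KEY);
        }
        String path = props.getProperty(PATH_KEY, DEFAULT_PATH);
        String base = url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
        // An empty path or "/" means: call the base URL itself (the serverless case).
        if (path.isEmpty() || path.equals("/")) {
            return base;
        }
        return base + (path.startsWith("/") ? path : "/" + path);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(URL_KEY,
                "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct");
        props.setProperty(PATH_KEY, "/");
        System.out.println(endpoint(props));
    }
}
```

Defaulting to `/generate` would leave existing dedicated-endpoint setups unchanged, while serverless users could opt into the root path.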
