Currently this library uses the blocking `litellm.completion` method in `inference.py`, even though the library's own methods are async. The LLM inference file should be updated to be asynchronous as well, unless there are other blockers I'm not aware of.
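As a minimal sketch of the general pattern (litellm also ships a native async `acompletion` coroutine, which would be the cleaner long-term fix), a blocking call can be offloaded with `asyncio.to_thread` so it stops stalling the event loop. The `blocking_completion` stand-in below is hypothetical, used in place of the real `litellm.completion` call:

```python
import asyncio
import time

def blocking_completion(prompt: str) -> str:
    # Hypothetical stand-in for the blocking litellm.completion call
    time.sleep(0.1)
    return f"echo: {prompt}"

async def acomplete(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the event loop stays free
    return await asyncio.to_thread(blocking_completion, prompt)

async def main() -> list[str]:
    # Two requests now run concurrently instead of serializing the event loop
    return await asyncio.gather(acomplete("a"), acomplete("b"))

print(asyncio.run(main()))
```

With the thread offload, concurrent calls overlap; switching `inference.py` to `await litellm.acompletion(...)` would avoid the extra threads entirely.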