Add bindings for multi-modal input of images and audio. Also, how are PDFs supported, for example?