Provide documentation for using Adaptive Slot Attention on single images

I would like to use your checkpoints to obtain relevance masks for images, but there is no documentation on how to use Adaptive Slot Attention on, for example, a single image.

Could you provide documentation or example code that avoids cli/eval.py, which uses a trainer (possibly as a workaround) to supply the data?
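
To make the request concrete, here is a rough sketch of the post-processing I have in mind. It does not use this repository's API at all: I am assuming the model can give me a per-slot attention array of shape `(num_slots, num_patches)` over a patch grid, and `attention_to_masks` is a hypothetical helper I made up. The part I am missing is how to load a checkpoint and get that array for one image.

```python
import numpy as np

def attention_to_masks(attn, grid_hw, image_hw):
    """Turn soft slot attention into one boolean mask per slot.

    attn:     (num_slots, num_patches) array. ASSUMED output format,
              not confirmed against this repository.
    grid_hw:  (rows, cols) of the patch grid, rows * cols == num_patches.
    image_hw: (H, W) of the image the masks should cover.
    """
    num_slots, num_patches = attn.shape
    gh, gw = grid_hw
    if gh * gw != num_patches:
        raise ValueError("grid_hw does not match the number of patches")

    # Hard-assign every patch to the slot with the highest attention.
    labels = attn.argmax(axis=0).reshape(gh, gw)

    # Nearest-neighbour upsampling: look up the patch each pixel falls into.
    H, W = image_hw
    rows = np.arange(H) * gh // H
    cols = np.arange(W) * gw // W
    labels_full = labels[rows[:, None], cols[None, :]]

    # One boolean (H, W) mask per slot, stacked to (num_slots, H, W).
    return np.stack([labels_full == k for k in range(num_slots)])

# Toy input standing in for real model output: 2 slots on a 2x2 patch grid.
toy_attn = np.array([[0.9, 0.1, 0.2, 0.3],
                     [0.1, 0.9, 0.8, 0.7]])
toy_masks = attention_to_masks(toy_attn, grid_hw=(2, 2), image_hw=(4, 4))
```

If the model already returns masks in some other form, a pointer to the right output key would be enough.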