A repository of scripts, tools, configurations and experiments in the MARMoT project.

- data: datasets and data preparation pipelines
- tokenizer: tokenization models and tokenizer training scripts
- tools: scripts and tools used in some experiments
- sandbox: playground for testing models, tools and pipelines