Code for processing the Stack Exchange data dump is available in h4_code (used to build https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences). A separate notebook for further processing (e.g., converting all HTML to Markdown) is in StackExchangeProcessing.ipynb (used to build https://huggingface.co/datasets/lvwerra/stack-exchange-paired).
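
To give a rough idea of what an HTML-to-Markdown conversion step can look like, here is a minimal sketch using only the Python standard library. It is not the code from StackExchangeProcessing.ipynb; the names `MiniMarkdown` and `html_to_markdown` are hypothetical, and only a handful of tags (`p`, `strong`/`b`, `em`/`i`, `code`, `a`, `li`) are handled. A real pipeline would likely use a dedicated conversion library instead.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Hypothetical helper: maps a small subset of HTML tags to Markdown."""

    # Inline tags whose Markdown marker is the same on both sides.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML fragment to Markdown (unsupported tags are dropped, text is kept)."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = '<p>Use <code>git pull</code>. See <a href="https://example.com">docs</a>.</p>'
    print(html_to_markdown(sample))
```

Unsupported tags are silently dropped while their text content is kept, which is a deliberate simplification for this sketch.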