Here are the steps I want to go through to test out the meta-learning idea:

- [x] Extract the format-analyzer parser (bblfsh + operator, whitespace, and special-case handling)
- [x] Transform parsed files (virtual nodes + UAST) into a graph
- [x] Create a model made of a GGNN encoder and an LSTM decoder with [Deep Graph Library](https://github.com/dmlc/dgl) and [PyTorch](https://pytorch.org/)
- [x] Overfit 1 file formatted by Prettier to check that the model is expressive enough to learn the formatting of one file
- [x] Overfit 1 project formatted by Prettier, still to check expressiveness
- [ ] Gather a dataset of diverse and reasonably well-maintained (i.e. formatted) projects to learn from (like @warenlg's top JavaScript repos dataset)
- [ ] Define an evaluation scheme covering both interpolation (modeling style on training repos) and extrapolation (modeling style on unseen repos)
- [ ] Test 4 approaches to train the model:
  - [ ] One model per repository (like style-analyzer)
  - [ ] One model for all repositories
  - [ ] One model for all repositories with multi-task learning (one task per repository)
  - [ ] One model for all repositories with [meta-learning](https://arxiv.org/pdf/1703.03400.pdf) (one task per repository + learning to adapt)
- [ ] Plug the system into the visualizer to understand the results
- [ ] If results seem promising, evaluate further and report back to inform product decisions
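The GGNN encoder step above boils down to a gated propagation update over the file graph. Here is a minimal NumPy sketch of one such round (in the style of Li et al.'s gated graph networks) — the graph, sizes, and parameter names are illustrative assumptions, not the actual DGL/PyTorch model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny file graph: 5 UAST nodes with 8-dim hidden states.
n_nodes, d = 5, 8
A = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)  # adjacency matrix
h = rng.standard_normal((n_nodes, d))                     # initial node states

def init(shape):
    # Small random parameters; real training would learn these.
    return rng.standard_normal(shape) * 0.1

W = init((d, d))                          # message transform
Wz, Uz = init((d, d)), init((d, d))       # update gate
Wr, Ur = init((d, d)), init((d, d))       # reset gate
Wh, Uh = init((d, d)), init((d, d))       # candidate state

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(3):                        # propagation steps
    m = A @ (h @ W)                       # aggregate neighbour messages
    z = sigmoid(m @ Wz + h @ Uz)          # GRU-style update gate
    r = sigmoid(m @ Wr + h @ Ur)          # reset gate
    cand = np.tanh(m @ Wh + (r * h) @ Uh) # candidate state
    h = (1.0 - z) * h + z * cand          # gated node update
```

The final `h` would then feed the LSTM decoder; in practice this update is what DGL's gated graph convolution layer provides out of the box.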
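For the fourth training approach, the MAML inner/outer loop ("one task per repository + learning to adapt") can be illustrated on a toy problem. Everything here is a made-up stand-in: each "repository" is a task whose style is a single scalar `c`, and the model is one parameter `theta`; the real model would replace both with the graph network:

```python
import numpy as np

# Hypothetical tasks: one scalar "style" per repository; the per-task
# loss is (theta - c)^2. MAML learns an initialisation theta that
# adapts well to any task after one gradient step (Finn et al., 2017).
tasks = np.array([-1.0, 0.0, 2.0])
alpha, beta = 0.1, 0.1        # inner (adaptation) and outer (meta) rates
theta = 5.0                   # meta-initialisation

def inner_adapt(theta, c):
    """One inner-loop gradient step on task c (grad of (theta-c)^2)."""
    return theta - alpha * 2.0 * (theta - c)

def meta_loss(theta):
    """Loss measured *after* adaptation, averaged over tasks."""
    return np.mean([(inner_adapt(theta, c) - c) ** 2 for c in tasks])

initial = meta_loss(theta)
for _ in range(100):
    # Exact meta-gradient: chain rule through the inner step,
    # d(theta')/d(theta) = 1 - 2 * alpha.
    grads = [2.0 * (inner_adapt(theta, c) - c) * (1.0 - 2.0 * alpha)
             for c in tasks]
    theta -= beta * np.mean(grads)

assert meta_loss(theta) < initial  # meta-training improved adaptability
```

The contrast with plain multi-task learning is the objective: multi-task learning minimises the loss of the shared parameters directly, while MAML minimises the loss obtained *after* a per-repository adaptation step.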