Hi everyone,
I would like to propose a custom lightweight architecture called LookThem V5 to be included or benchmarked in this repository.
LookThem V5 uses a custom Ratio-based Attention mechanism (without the traditional QKV structure). Trained from scratch for only 40 epochs on Google Colab, it reached 35.46% validation accuracy with a very small model (around 1.4M parameters / 5.38 MB).
Since I do not have the GPU resources to run the full 200-epoch cyclic learning rate protocol required by this repository, I would be very grateful if anyone with better hardware could benchmark it following the official procedure here.
All the code, notebooks, and training logs are available on my Hugging Face repository:
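As a quick sanity check on the size figure, here is a minimal sketch of how the parameter count relates to the file size. It assumes the weights are stored as 32-bit floats (4 bytes each) and that the size is measured in MiB; both are assumptions, and `model_size_mib` is a hypothetical helper, not something from the notebook.

```python
def model_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_params * bytes_per_param / (1024 * 1024)


# "Around 1.4M" is the rounded figure from the post; the exact count is not stated here.
approx_params = 1_400_000
print(f"{model_size_mib(approx_params):.2f} MiB")
```

Under those assumptions, roughly 1.4M parameters comes out at about 5.3 MiB, which is in line with the reported 5.38 MB.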
👉 https://huggingface.co/ASomeoneWhoInterestedWithAI/LookThem_Tiny-ImageNet
How to integrate and benchmark the code:
- Open the notebook named LookThemV5ContinueTrain.ipynb in the Hugging Face repo.
- Extract the model architecture code (LookThemLayer, etc.).
- To train it completely from scratch, please remove the model checkpoint import/loading section.
- Change the starting epoch back to 0, and adjust the training loop from 40 to 200 epochs to match this repository's benchmark procedure.
- Feel free to adjust the rest of the pipeline configuration (such as the data loader or training script) to seamlessly fit your environment.
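The epoch changes in the steps above can be sketched roughly as follows. This is only an illustration: `epoch_range` and its parameters are hypothetical names, and the actual variable names in LookThemV5ContinueTrain.ipynb may differ.

```python
from typing import Optional


def epoch_range(total_epochs: int = 200, resume_from: Optional[int] = None) -> range:
    """Return the epochs to run.

    From scratch (resume_from is None): epochs 0 .. total_epochs - 1.
    Continuing from a checkpoint: epochs resume_from .. total_epochs - 1.
    """
    start = 0 if resume_from is None else resume_from
    if not 0 <= start <= total_epochs:
        raise ValueError(f"start epoch {start} is outside 0..{total_epochs}")
    return range(start, total_epochs)


# From-scratch benchmark run: no checkpoint is loaded, so training starts at epoch 0.
for epoch in epoch_range(total_epochs=200):
    pass  # placeholder: the notebook's train/validate step for one epoch goes here
```

For comparison, `epoch_range(total_epochs=200, resume_from=40)` would mimic continuing from a 40-epoch checkpoint, which is the behaviour to remove when benchmarking from scratch.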
Thank you very much! I would love to see how this architecture performs under your full 200-epoch protocol.