Subtask 1

Prompts
Evaluation dataset
Metrics
Fine-tuning hyperparameters
Example

Prompts

Zero-shot ICL using LLMs

User prompt:

Given a text, a question, and multiple choice answers, reply with only the letter (e.g., 'A', 'B', etc.) of the correct answer.
Do not include any explanation or additional text — only the single letter.

Text: <text>

Question: <question>

<optionId>) <text>
<optionId>) <text>
<optionId>) <text>
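
As a minimal sketch of how the template above might be filled in and how a reply could be checked, the snippet below assembles the user prompt and extracts a single answer letter. The names `build_prompt` and `parse_answer`, the strict parsing rule (the reply must start with the letter), and the sample data are illustrative assumptions, not the repository's code.

```python
import re


def build_prompt(instruction, text, question, options):
    """Fill the prompt template.

    `options` is a list of (option_id, option_text) pairs, e.g. [("A", "...")].
    `instruction` is the instruction paragraph that opens the user prompt.
    """
    option_lines = "\n".join(f"{oid}) {otext}" for oid, otext in options)
    return (
        f"{instruction}\n\n"
        f"Text: {text}\n\n"
        f"Question: {question}\n\n"
        f"{option_lines}"
    )


def parse_answer(reply, valid_ids):
    """Return the answer letter if the reply starts with a valid one, else None.

    Assumption: anything other than a leading standalone letter is treated
    as an invalid answer rather than searched for a letter.
    """
    match = re.match(r"^\W*([A-Z])\b", reply.strip().upper())
    if match and match.group(1) in valid_ids:
        return match.group(1)
    return None


# Hypothetical exercise, for illustration only.
prompt = build_prompt(
    "Reply with only the letter of the correct answer.",
    "Ana lives in Madrid.",
    "Where does Ana live?",
    [("A", "Madrid"), ("B", "Lima"), ("C", "Quito")],
)
answer = parse_answer("A", ["A", "B", "C"])
```

Rejecting replies that do not begin with a letter keeps scoring conservative: a model that ignores the "single letter" instruction is counted as wrong instead of being given the benefit of a guess.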

Evaluation dataset

Proficiency level	A1	A2	B1	B2	C1	C2	Total
Number of exercises	6	10	13	6	4	1	40

Metrics

Results are reported as accuracy (%), broken down by proficiency level and overall.
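
For reference, per-level and overall accuracy could be computed along the following lines. This is a sketch under assumptions: the record layout `(level, predicted, gold)` and the function name are hypothetical, and the overall figure is taken here as the share of all questions answered correctly, which the README does not state explicitly.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Compute accuracy (%) per proficiency level and over all records.

    `records` is an iterable of (level, predicted_letter, gold_letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, predicted, gold in records:
        total[level] += 1
        correct[level] += int(predicted == gold)

    per_level = {
        level: round(100 * correct[level] / total[level], 2) for level in total
    }
    # Assumption: overall accuracy is computed over all questions pooled
    # together, not as the mean of the per-level scores.
    overall = round(100 * sum(correct.values()) / sum(total.values()), 2)
    return per_level, overall


# Hypothetical predictions, for illustration only.
per_level, overall = accuracy_by_level(
    [("A1", "A", "A"), ("A1", "B", "B"), ("A1", "C", "A"), ("C2", "B", "D")]
)
```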

Strategy	Model	A1	A2	B1	B2	C1	C2	Overall
ICL (zero-shot)	Claude 3.7 Sonnet	100.00	96.08	98.59	94.59	79.17	60.00	95.37
ICL (zero-shot)	Qwen 2.5 72B	96.97	96.08	98.59	97.30	83.33	40.00	95.83
ICL (zero-shot)	DeepSeek R1	96.97	98.04	98.59	97.30	83.33	60.00	96.30
ICL (zero-shot)	Llama 3.3 70B	93.94	98.04	91.55	91.89	70.83	40.00	94.44
ICL (zero-shot)	QwQ 32B	96.97	98.04	97.18	97.30	79.17	20.00	95.37
ICL (zero-shot)	Mistral Large	93.94	98.04	97.18	86.49	70.83	40.00	92.13
ICL (zero-shot)	Gemma 3 12B	96.97	98.04	97.18	94.59	70.83	40.00	93.98
ICL (zero-shot)	Phi 4 14B	96.97	98.08	94.37	100.00	75.00	40.00	93.98
ICL (zero-shot)	DeepSeek R1 Distill Qwen 14B	84.85	92.16	91.55	78.38	70.83	20.00	86.11
ICL (zero-shot)	Qwen 2.5 14B Instruct 1M	87.88	98.04	97.18	100.00	83.33	40.00	94.91
ICL (zero-shot)	Mistral Nemo 12B Instruct	100.00	98.04	95.77	89.19	54.17	20.00	91.20
ICL (zero-shot)	Qwen 2.5 7B Instruct	93.94	96.08	90.14	91.89	70.83	20.00	90.28
ICL (zero-shot)	Ministral 8B Instruct	93.94	98.04	92.96	89.19	79.17	20.00	92.13
ICL (zero-shot)	Gemma 3 4B	81.82	90.20	83.10	78.38	75.00	20.00	82.87
ICL (zero-shot)	Phi 4 3B	81.82	92.16	88.73	75.68	54.17	40.00	82.41
Fine-tuning (zero-shot)	Gemma 3 12B (version 0)	100.00	90.20	92.96	81.08	70.83	40.00	88.89
Fine-tuning (zero-shot)	Gemma 3 12B (version 1)	93.94	93.14	91.55	87.84	66.67	20.00	88.43
Ensemble (majority voting), ICL (zero-shot)	Gemma3 12B + Phi4 14B + Qwen2.5 14B + Ministral 8B	96.97	98.04	97.18	100.00	75.00	40.00	95.37
Ensemble (random forest), ICL (zero-shot)	Gemma3 12B + Phi4 14B + Qwen2.5 14B + Ministral 8B	96.97	98.04	97.18	97.30	70.83	40.00	94.44
NLI (zero-shot)	DeBERTa Base Long NLI	66.67	92.16	77.47	56.76	41.67	80.00	71.95
NLI (zero-shot)	DeBERTa v3 Large tasksource NLI	87.88	96.08	88.73	56.76	58.33	60.00	81.00
NLI (zero-shot)	A2T RoBERTa SMFA ACE arg	48.49	37.26	35.21	51.35	45.83	20.00	41.35
NLI (zero-shot)	Longformer Base 4096 BNE ES NLI	57.58	50.98	40.85	35.14	58.33	40.00	46.94
Ensemble, NLI (zero-shot)	DeBERTa Base + DeBERTa v3 Large + A2T RoBERTa + Longformer Base	87.88	96.08	88.73	56.76	58.33	60.00	81.00
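
The majority-voting ensemble rows combine the letters predicted by several models for the same question. A minimal sketch of such a vote is shown below. The tie-breaking rule (the earliest model in the list wins) is an assumption, since the README does not say how ties among four models are resolved, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent answer letter among the model predictions.

    `predictions` holds one letter per model, in a fixed model order.
    Assumption: on a tie, the tied letter predicted by the earliest model wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for letter in predictions:
        if counts[letter] == top:
            return letter


# Hypothetical votes from four models for one question.
winner = majority_vote(["A", "B", "B", "C"])
```

With an even number of voters, 2-2 splits are possible, so the tie-breaking rule can affect the reported accuracy and is worth stating alongside the results.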

Fine-tuning hyperparameters

Parameter	Value
Model ID from HuggingFace	`google/gemma-3-12b-it`
Batch size	`4`
Gradient accumulation	`8`
Learning rate	`2e-4`
Epochs	`20`
Max sequence length	`2048`
LoRA rank (`r`)	`16`
LoRA alpha	`16`
LoRA dropout	`0`
GPUs	4 NVIDIA L40
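
To make the arithmetic implied by the table explicit, the sketch below stores the values in a plain dataclass and derives two quantities: the LoRA scaling factor (`alpha / r` in standard LoRA) and the number of sequences per optimizer update. The class and field names are hypothetical. Whether the batch size of 4 is per GPU or global is not stated in the README, so both readings are exposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the hyperparameter table above.
    model_id: str = "google/gemma-3-12b-it"
    batch_size: int = 4
    gradient_accumulation: int = 8
    learning_rate: float = 2e-4
    epochs: int = 20
    max_seq_length: int = 2048
    lora_r: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.0
    num_gpus: int = 4

    @property
    def lora_scaling(self) -> float:
        """Scaling applied to the low-rank update in standard LoRA."""
        return self.lora_alpha / self.lora_r

    def sequences_per_update(self, batch_is_per_gpu: bool) -> int:
        """Sequences consumed per optimizer step.

        Assumption: if the batch size is per GPU, training is data-parallel
        across all GPUs; otherwise the batch size is already global.
        """
        per_step = self.batch_size * self.gradient_accumulation
        return per_step * self.num_gpus if batch_is_per_gpu else per_step


cfg = FineTuneConfig()
scaling = cfg.lora_scaling
global_reading = cfg.sequences_per_update(batch_is_per_gpu=False)
per_gpu_reading = cfg.sequences_per_update(batch_is_per_gpu=True)
```

With rank and alpha both set to 16, the scaling factor is 1, so the low-rank update is applied without additional rescaling.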
