Commit 505a300

Update README with SIDE model info and requirements
Added details about the SIDE model checkpoint and updated requirements.
1 parent defdac2 commit 505a300

1 file changed

Lines changed: 33 additions & 9 deletions

summarization/README.md

@@ -9,13 +9,17 @@ code-summarization/
 ├── fft_train.py # Full fine-tuning training script
 ├── qlora_train.py # QLoRA training script
 ├── codereval/
-├── infer_summarization_fft.py # Inference for FFT models
-├── infer_summarization_qlora.py # Inference for QLoRA models
-├── evaluate_summarization_metrics.py # Compute BLEU, ROUGE, METEOR, etc.
-├── evaluate_summarization_llm_judge.py # LLM-as-judge evaluation (GPT-5 mini)
-├── aggregate_llm_judge_scores.py # Aggregate LLM judge scores using mean
-├── cs_codereval_eval_dataset_java_v2.jsonl # CoderEval Java benchmark
-└── cs_codereval_eval_dataset_py_v2.jsonl # CoderEval Python benchmark
+│   ├── infer_summarization_fft.py # Inference for FFT models
+│   ├── infer_summarization_qlora.py # Inference for QLoRA models
+│   ├── evaluate_summarization_metrics.py # Compute BLEU, ROUGE, METEOR, etc.
+│   ├── evaluate_summarization_llm_judge.py # LLM-as-judge evaluation (GPT-5)
+│   ├── aggregate_llm_judge_scores.py # Aggregate LLM judge scores using mean
+│   ├── cs_codereval_eval_dataset_java_v2.jsonl # CoderEval Java benchmark
+│   └── cs_codereval_eval_dataset_py_v2.jsonl # CoderEval Python benchmark
+└── dataset/
+    └── code_x_glue_ct_code_to_text/
+        ├── java/
+        └── python/
 ```

 ## Step 1: Training
@@ -106,7 +110,10 @@ python codereval/evaluate_summarization_metrics.py \
 - `--language`: Programming language (`java` or `python`)
 - `--summary_field`: Field name containing generated summaries
 - `--output_file`: Output path for results (without extension)
-- `--side_checkpoint`: Path to SIDE model checkpoint (Java only)
+- `--side_checkpoint`: Path to SIDE model checkpoint (Java only, default: `path/to/SIDE/checkpoint`)
+
+**Note on SIDE Score:**
+SIDE (Semantic Identifier for Documentation Evaluation) is computed only for Java. You need to download the SIDE model checkpoint from the [SIDE repository](https://github.com/antonio-mastropaolo/code-summarization-metric) and provide the path via `--side_checkpoint`.

 **Output:**
 - `<output_file>.txt`: Human-readable results
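
Review note: the hunk above says SIDE is computed only for Java and needs a downloaded checkpoint passed via `--side_checkpoint`. A minimal sketch of how that rule could be checked follows. The flag names and the placeholder default come from the README; `build_parser`, `should_compute_side`, and `side_status` are hypothetical helpers written for illustration, not the code in `evaluate_summarization_metrics.py`.

```python
import argparse
from pathlib import Path


def build_parser():
    # Flag names mirror the README; everything else here is an assumption.
    parser = argparse.ArgumentParser(description="Hypothetical SIDE gating sketch")
    parser.add_argument("--language", choices=["java", "python"], required=True)
    parser.add_argument("--side_checkpoint", default="path/to/SIDE/checkpoint")
    return parser


def should_compute_side(language, side_checkpoint):
    """Return True only for Java runs whose checkpoint path exists on disk."""
    return language == "java" and Path(side_checkpoint).exists()


def side_status(argv):
    """Parse an argument list and describe whether SIDE would be computed."""
    args = build_parser().parse_args(argv)
    if should_compute_side(args.language, args.side_checkpoint):
        return "SIDE will be computed"
    if args.language == "java":
        return f"SIDE skipped: checkpoint not found at {args.side_checkpoint}"
    return "SIDE skipped: only computed for Java"


print(side_status(["--language", "python"]))
```

Checking the path up front lets a Java run report a missing checkpoint immediately instead of after the other metrics have been computed.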
@@ -191,9 +198,26 @@ python codereval/aggregate_llm_judge_scores.py
 - **ROUGE-1/2/L**: Recall-oriented metrics
 - **chrF**: Character-level F-score
 - **BERTScore**: Contextual embedding similarity
-- **SIDE**: Semantic similarity for code summaries (Java only)
+- **SIDE**: Semantic similarity for code summaries (Java only) - [GitHub](https://github.com/antonio-mastropaolo/code-summarization-metric)

 ### LLM-as-Judge Metrics
 - **Content Adequacy**: How well the summary captures code functionality
 - **Conciseness**: Absence of unnecessary information
 - **Fluency**: Readability and clarity
+
+## Requirements
+
+```
+torch
+transformers
+datasets
+trl
+peft
+bitsandbytes
+sacrebleu
+rouge-score
+bert-score
+nltk
+openai
+sentence-transformers # Required for SIDE score
+```
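
Review note: the new Requirements block lists pip package names without versions. As a rough sketch, the snippet below checks whether each listed package is importable in the current environment. The mapping from pip names to import names (for example `rouge-score` to `rouge_score`) and the `missing_packages` helper are assumptions added for illustration and are not part of the repository.

```python
import importlib.util

# pip distribution name (as listed in the README) -> assumed import name.
REQUIREMENTS = {
    "torch": "torch",
    "transformers": "transformers",
    "datasets": "datasets",
    "trl": "trl",
    "peft": "peft",
    "bitsandbytes": "bitsandbytes",
    "sacrebleu": "sacrebleu",
    "rouge-score": "rouge_score",
    "bert-score": "bert_score",
    "nltk": "nltk",
    "openai": "openai",
    "sentence-transformers": "sentence_transformers",  # needed for SIDE
}


def missing_packages(requirements):
    """Return pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in requirements.items()
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(REQUIREMENTS)
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All listed packages are importable")
```

This only confirms that the modules can be found. It does not check versions or GPU support, which `torch` and `bitsandbytes` in particular may require.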
