This repository implements all four assignment tasks for CSE425/EEE474:
- Task 1: LSTM Autoencoder
- Task 2: Variational Autoencoder (VAE)
- Task 3: Transformer Decoder (autoregressive)
- Task 4: RLHF-style preference tuning (policy gradient with human or simulated rewards)

Repository layout:

- data/raw_midi/: raw MIDI files grouped by genre subfolders
- data/processed/: tokenized sequences and vocabulary
- data/train_test_split/: optional split metadata
- src/preprocessing/: parsing, tokenization, piano-roll conversion
- src/models/: AE, VAE, Transformer, optional diffusion placeholder
- src/training/: training entrypoints for each task
- src/evaluation/: metrics and comparisons
- src/generation/: sequence sampling and MIDI export
- outputs/generated_midis/: generated samples
- outputs/plots/: loss/perplexity plots
- outputs/survey_results/: human feedback CSV and summaries

Install dependencies:

    pip install -r requirements.txt

Place MIDI files inside genre folders:

    data/raw_midi/
        classical/*.mid
        jazz/*.mid
        rock/*.mid
        pop/*.mid
        electronic/*.mid
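
Before preprocessing, it can help to confirm that the genre folders are populated as expected. The following is a minimal sketch, not part of the repository: the helper name `count_midi_files` is hypothetical, and it assumes files use the lowercase `.mid` extension shown in the layout above.

```python
from pathlib import Path


def count_midi_files(root):
    """Return {genre_folder_name: number of .mid files} for each subfolder of root.

    Hypothetical helper for illustration; the repository may do this differently.
    """
    root = Path(root)
    counts = {}
    # Each immediate subfolder is treated as one genre.
    for genre_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[genre_dir.name] = sum(1 for _ in genre_dir.glob("*.mid"))
    return counts


if __name__ == "__main__":
    root = Path("data/raw_midi")
    if not root.is_dir():
        print(f"{root} not found; create it and add genre subfolders first")
    else:
        for genre, n in count_midi_files(root).items():
            print(f"{genre}: {n} file(s)")
```

A genre that reports zero files usually means the folder name or file extension does not match the layout above.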

Usage:

- Preprocess and tokenize MIDI files

      python -m src.preprocessing.midi_parser --input data/raw_midi --output data/processed

- Train Task 1 (LSTM Autoencoder)

      python -m src.training.train_ae --data data/processed/sequences.npz --out outputs

- Train Task 2 (VAE)

      python -m src.training.train_vae --data data/processed/sequences.npz --out outputs

- Train Task 3 (Transformer)

      python -m src.training.train_transformer --data data/processed/sequences.npz --out outputs

- Run Task 4 (RLHF fine-tuning)

      python -m src.training.train_rlhf --model-checkpoint outputs/checkpoints/transformer.pt --data data/processed/sequences.npz --out outputs

- Evaluate

      python -m src.evaluation.metrics --real data/processed/sequences.npz --generated outputs/generated_tokens.npz --out outputs

- Export generated tokens to MIDI

      python -m src.generation.generate_music --checkpoint outputs/checkpoints/transformer.pt --vocab data/processed/vocab.json --out outputs/generated_midis

Baselines:

- Random note generator: src/evaluation/metrics.py (random_baseline)
- Markov chain baseline: notebooks/baseline_markov.ipynb and utility in src/generation/sample_latent.py
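
To illustrate what the two baselines do conceptually, here is a small standard-library sketch. It is not the code in `metrics.py` or the notebook: the function names are hypothetical, and it assumes sequences are plain lists of integer token ids. The random baseline draws every token uniformly, while the first-order Markov chain samples each next token in proportion to how often it followed the current token in the training sequences.

```python
import random
from collections import Counter, defaultdict


def sample_uniform_tokens(vocab_size, length, rng):
    """Random baseline: draw each token uniformly from [0, vocab_size)."""
    return [rng.randrange(vocab_size) for _ in range(length)]


def fit_markov(sequences):
    """Count first-order transitions: counts[prev][nxt] = number of occurrences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_markov(counts, start, length, rng):
    """Generate up to `length` tokens by following observed transitions."""
    out = [start]
    while len(out) < length:
        successors = counts.get(out[-1])
        if not successors:  # dead end: this token was never seen with a successor
            break
        tokens = list(successors)
        weights = [successors[t] for t in tokens]
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy corpus; the integers stand in for whatever token ids the tokenizer emits.
    corpus = [[60, 62, 64, 62, 60], [60, 64, 67, 64, 60]]
    model = fit_markov(corpus)
    print(sample_markov(model, start=60, length=8, rng=rng))
    print(sample_uniform_tokens(vocab_size=128, length=8, rng=rng))
```

Because the Markov sampler only ever emits transitions seen in training, it gives a stronger reference point than uniform sampling when comparing against the learned models.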

Notes:

- Task 4 supports both real human scores (CSV) and a simulated reward function for debugging.
- Replace simulated rewards with survey results to complete final deliverables.
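
The switch between human and simulated rewards could look roughly like the sketch below. Everything here is an assumption made for illustration: the CSV column names `sample_id` and `score`, the function names, and the toy reward itself (which only penalizes immediate token repeats) are not taken from `train_rlhf`.

```python
import csv
from collections import defaultdict


def simulated_reward(tokens):
    """Toy debugging reward in [0, 1]: 1 minus the fraction of immediate repeats."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return 1.0 - repeats / (len(tokens) - 1)


def load_human_scores(csv_path):
    """Read {sample_id: mean score} from a CSV.

    Assumes a header row with columns `sample_id` and `score` (hypothetical names).
    Multiple ratings for the same sample are averaged.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["sample_id"]] += float(row["score"])
            counts[row["sample_id"]] += 1
    return {sid: totals[sid] / counts[sid] for sid in totals}


def reward_for(sample_id, tokens, human_scores=None):
    """Prefer a human score when one exists; otherwise fall back to the simulated reward.

    Note: human and simulated scores may be on different scales and would need
    to be normalized before being mixed in one training run.
    """
    if human_scores and sample_id in human_scores:
        return human_scores[sample_id]
    return simulated_reward(tokens)
```

Keeping the fallback in one function means the training loop does not need to change once the survey CSV is available.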