| 1 | +# Inside the DeepRaaga Neural Engine: Code Internals |
| 2 | + |
| 3 | +We often talk about the implications of merging deep learning with ancient musical grammar. Today, we're popping the hood to show how the DeepRaaga core models operate at the code level. |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## 1. The Preprocessing Pipeline (`data_ingestion.py`) |
| 8 | + |
| 9 | +Before a neural network can generate a Carnatic Sanchara, it needs a representation of the music it can learn from. We don't feed audio files (such as `.wav` or `.mp3`) directly into our LSTM block; at tens of thousands of samples per second, the dimensionality is too high for the model to learn raga grammar effectively. |
| 10 | + |
| 11 | +Instead, we extract symbolic note data from MIDI files: |
| 12 | +```python |
| 13 | +from note_seq import midi_io, sequences_lib  # assumed import path; not shown in the original |
| 14 | + |
| 15 | +def extract_note_sequence(midi_path): |
| 16 | +    """Parse a MIDI file into a quantized Magenta NoteSequence. |
| 17 | +    Pitch-bend events (the gamakas) are carried along in the sequence.""" |
| 18 | +    raw_sequence = midi_io.midi_file_to_note_sequence(midi_path) |
| 19 | +    quantized_sequence = sequences_lib.quantize_note_sequence(raw_sequence, steps_per_quarter=4) |
| 20 | +    return quantized_sequence |
| 21 | +``` |
| 22 | +Using `NoteSequence` protobufs, we represent the music as discrete note events plus continuous pitch-bend events, which carry the microtonal bends essential to Carnatic ragas. |
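
To make the `steps_per_quarter=4` setting concrete, here is a minimal pure-Python sketch of what snapping notes to a 16th-note grid involves. It is an illustration under stated assumptions, not the library's implementation: the `(pitch, start, end)` tuple layout and the helper names `quantize_time` and `quantize_notes` are hypothetical.

```python
import math


def steps_per_second(qpm, steps_per_quarter=4):
    """Grid resolution in steps per second for a tempo in quarter notes per minute."""
    return steps_per_quarter * qpm / 60.0


def quantize_time(seconds, qpm, steps_per_quarter=4):
    """Round a time in seconds to the nearest grid step."""
    return int(math.floor(seconds * steps_per_second(qpm, steps_per_quarter) + 0.5))


def quantize_notes(notes, qpm, steps_per_quarter=4):
    """Quantize (pitch, start_seconds, end_seconds) tuples to integer steps.

    Each note keeps a length of at least one step so that very short
    notes do not collapse to zero duration.
    """
    quantized = []
    for pitch, start, end in notes:
        start_step = quantize_time(start, qpm, steps_per_quarter)
        end_step = max(quantize_time(end, qpm, steps_per_quarter), start_step + 1)
        quantized.append((pitch, start_step, end_step))
    return quantized


# At 120 qpm there are 8 steps per second, so slightly loose timing
# snaps onto the grid.
print(quantize_notes([(60, 0.0, 0.49), (62, 0.51, 1.0)], qpm=120))
# [(60, 0, 4), (62, 4, 8)]
```

The trade-off is that anything finer than a 16th note in onset timing is lost, which is one reason the pitch-bend events need to be carried separately.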
| 23 | + |
| 24 | +## 2. Raga-Conditioned LSTM Training Loop (`train_model.py`) |
| 25 | + |
| 26 | +The magic behind DeepRaaga is the **Conditioning Vector**. Rather than training one unconditioned model on all Indian music, we embed the *Raga_ID* and feed that embedding to the LSTM alongside every note. |
| 27 | + |
| 28 | +```python |
| 29 | +# Sketch of our conditioning block (assumes `import tensorflow as tf`) |
| 30 | +class RagaLSTM(tf.keras.Model): |
| 31 | +    def __init__(self, vocab_size, raga_classes): |
| 32 | +        super().__init__() |
| 33 | +        self.raga_embedding = tf.keras.layers.Embedding(raga_classes, 64) |
| 34 | +        self.note_embedding = tf.keras.layers.Embedding(vocab_size, 128) |
| 35 | +        self.lstm = tf.keras.layers.LSTM(256, return_sequences=True) |
| 36 | +        self.dense = tf.keras.layers.Dense(vocab_size, activation='softmax') |
| 37 | + |
| 38 | +    def call(self, inputs, raga_id): |
| 39 | +        # The raga embedding acts as a learned bias; raga_id has shape (batch,) |
| 40 | +        condition = self.raga_embedding(raga_id)  # (batch, 64) |
| 41 | +        x = self.note_embedding(inputs)  # (batch, time, 128) |
| 42 | + |
| 43 | +        # Repeat the raga vector at every time step so the shapes line up for concat |
| 44 | +        condition = tf.tile(condition[:, tf.newaxis, :], [1, tf.shape(x)[1], 1]) |
| 45 | +        lstm_out = self.lstm(tf.concat([x, condition], axis=-1)) |
| 46 | +        return self.dense(lstm_out) |
| 47 | +``` |
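| 48 | +This biases the LSTM's output distribution toward the *Arohana/Avarohana* of the injected raga. The bias is learned rather than guaranteed: a softmax still assigns non-zero probability to every note, so strict adherence would need an extra constraint at sampling time. |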
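
One way to turn that preference into a hard guarantee is to mask out disallowed notes before sampling. The sketch below is an assumption about how this could be done, not something the DeepRaaga code above is shown to do: `masked_softmax` is a hypothetical helper, and it assumes that logit index `i` corresponds to MIDI pitch `i`. It uses Mohanam (S R2 G3 P D2), whose swaras sit at pitch classes 0, 2, 4, 7 and 9 above the tonic.

```python
import math

# Pitch classes (semitones above the tonic) of raga Mohanam: S R2 G3 P D2.
MOHANAM_PITCH_CLASSES = {0, 2, 4, 7, 9}


def masked_softmax(logits, allowed_pitch_classes, tonic=0):
    """Softmax over logits with every out-of-raga pitch forced to probability 0.

    Assumes index i of `logits` is MIDI pitch i (a hypothetical vocabulary layout).
    """
    masked = [
        logit if (pitch - tonic) % 12 in allowed_pitch_classes else float("-inf")
        for pitch, logit in enumerate(logits)
    ]
    peak = max(masked)
    if peak == float("-inf"):
        raise ValueError("no pitch in the vocabulary is allowed by this raga")
    # Subtract the peak for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(value - peak) for value in masked]
    total = sum(exps)
    return [e / total for e in exps]


# With uniform logits over one octave, the mass is split over the five swaras.
probs = masked_softmax([0.0] * 12, MOHANAM_PITCH_CLASSES)
print([round(p, 2) for p in probs])
```

A pitch-class mask only enforces which swaras may appear. The ordering rules implied by the ascent and descent would still have to be learned by the model or checked by a separate rule.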
| 49 | + |
| 50 | +## 3. Real-Time Generation via FastAPI (`app.py`) |
| 51 | + |
| 52 | +To make these models accessible on the web, we wrap the inference logic in an asynchronous FastAPI server. When you adjust the "Temperature" or "Raga" sliders on our React frontend, a REST call triggers the generation: |
| 53 | + |
| 54 | +```python |
| 55 | +@app.post("/api/generate") |
| 56 | +async def generate_melody(request: GenerationRequest): |
| 57 | +    raga_vector = get_raga_embedding(request.raga_name) |
| 58 | +    seed_sequence = generate_seed(request.raga_name) |
| 59 | + |
| 60 | +    # Run autoregressive generation |
| 61 | +    output = model.generate( |
| 62 | +        seed=seed_sequence, |
| 63 | +        condition=raga_vector, |
| 64 | +        temperature=request.temperature |
| 65 | +    ) |
| 66 | + |
| 67 | +    return {"midi_data": sequence_to_midi_base64(output)} |
| 68 | +``` |
| 69 | + |
| 70 | +By decoupling the heavy ML operations from the frontend, DeepRaaga keeps the browser responsive while inference runs on the server, and the returned MIDI plays back in the page through `Tone.js`. |
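
The internals of `model.generate` are not shown above, so the following is only a sketch of what an autoregressive loop with a temperature control typically looks like. Every name in it (`apply_temperature`, `generate`, `toy_next_logits`) is hypothetical, and the toy scoring function merely stands in for the trained network.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn logits into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_logits, seed, steps, temperature=1.0, rng=None):
    """Extend `seed` by `steps` tokens, sampling one token at a time.

    `next_logits` is any callable mapping the sequence so far to a list of
    logits over the vocabulary (a stand-in for the model's forward pass).
    """
    rng = rng or random.Random()
    sequence = list(seed)
    for _ in range(steps):
        probs = apply_temperature(next_logits(sequence), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        sequence.append(token)
    return sequence


def toy_next_logits(sequence, vocab_size=5):
    """Hypothetical scorer that simply favours repeating the last token."""
    return [2.0 if token == sequence[-1] else 0.0 for token in range(vocab_size)]


melody = generate(toy_next_logits, seed=[0], steps=8, temperature=0.8,
                  rng=random.Random(0))
print(melody)
```

Lower temperatures concentrate probability on the top-scoring note, which gives more predictable phrases. Higher temperatures flatten the distribution and produce more adventurous, and more error-prone, output.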