Hi, thanks for releasing this project. I am testing the inference path on piano performance MIDI and exporting the resulting music21 score to MusicXML.
One interoperability issue I ran into is that the inferred MusicXML does not contain any explicit tempo information (`<direction><metronome>...` or `<sound tempo="..."/>`). As a result, downstream renderers and players such as alphaTab, OSMD with a playback pipeline, or a MuseScore round-trip fall back to their default tempo.
Simply copying the input MIDI's first `set_tempo` is not always correct for performance-MIDI-to-score conversion. For example, a MIDI file may declare a default 120 BPM tempo while the inferred score grid implies a slower or faster tactus, or the piece may need a local tempo map to preserve expressive timing.
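To make the tactus point concrete: given aligned pairs of predicted score offset (in quarter notes) and performance onset time (in seconds), a least-squares fit of seconds per quarter note gives a global tempo that can differ from whatever `set_tempo` the MIDI file carries. This is only a sketch; `estimate_global_bpm` is a hypothetical helper, and it assumes the alignment pairs are already available, which the current inference path does not return.

```python
def estimate_global_bpm(offsets_ql, onsets_sec):
    """Fit onset_sec ~ a + b * offset_ql by least squares and return 60 / b.

    offsets_ql: predicted score offsets in quarter notes (hypothetical input).
    onsets_sec: matching performance onset times in seconds.
    """
    n = len(offsets_ql)
    if n < 2 or n != len(onsets_sec):
        raise ValueError("need at least two aligned (offset, onset) pairs")
    mean_x = sum(offsets_ql) / n
    mean_y = sum(onsets_sec) / n
    sxx = sum((x - mean_x) ** 2 for x in offsets_ql)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(offsets_ql, onsets_sec))
    if sxx == 0:
        raise ValueError("all offsets are identical; tempo is undefined")
    sec_per_quarter = sxy / sxx  # slope b: seconds per quarter note
    if sec_per_quarter <= 0:
        raise ValueError("onsets do not increase with score offset")
    return 60.0 / sec_per_quarter  # quarter notes per minute
```

With onsets every 0.75 s per quarter note, `estimate_global_bpm([0, 1, 2, 3, 4], [0.0, 0.75, 1.5, 2.25, 3.0])` returns 80.0, even if the MIDI file itself declares 120 BPM.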
I noticed `MultistreamTokenizer.detokenize_mxl(..., midi_sequence=...)` contains comments about using the input MIDI timings to create `(score offset, time in seconds)` pairs:

```python
# If we have the input midi timings, we can use them to set the tempo
# We first set tempo marks to track where their location `should` be
# The inserted tempo marks therefore form (offset, time in seconds) pairs.
```
However, the public inference helper `quantize_path()` currently calls `detokenize_mxl(y_hat)` without passing the MIDI sequence. In addition, the inserted `MetronomeMark(number=midi_sequence[i].start)` values appear to be seconds rather than BPM, so they are not valid MusicXML tempo markings for playback.
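What I would expect the tempo marks to carry instead is a BPM derived from consecutive (offset, seconds) pairs: `bpm = 60 * d_offset / d_seconds`, with offsets in quarter notes. A rough sketch of that conversion (`tempo_map_from_pairs` is a hypothetical helper, not part of the repo) that also skips chord notes sharing an offset and merges near-identical neighbouring tempi:

```python
def tempo_map_from_pairs(pairs, merge_tol_bpm=0.5, min_delta_ql=1e-6):
    """Convert (score_offset_ql, time_sec) pairs into (score_offset_ql, bpm) entries.

    Each entry gives the quarter-note tempo in effect from that offset onward.
    """
    pairs = sorted(pairs)
    tempo_map = []
    for (o0, t0), (o1, t1) in zip(pairs, pairs[1:]):
        d_ql = o1 - o0
        d_sec = t1 - t0
        if d_ql <= min_delta_ql or d_sec <= 0:
            # Notes sharing an offset (chords) or non-increasing times carry no tempo info.
            continue
        bpm = 60.0 * d_ql / d_sec
        if tempo_map and abs(tempo_map[-1][1] - bpm) < merge_tol_bpm:
            continue  # same tempo as the previous mark; avoid redundant marks
        tempo_map.append((o0, bpm))
    return tempo_map
```

For example, `tempo_map_from_pairs([(0, 0.0), (1, 0.5), (2, 1.0), (3, 1.75), (4, 2.5)])` gives `[(0, 120.0), (2, 80.0)]`. Each entry could then become a `music21.tempo.MetronomeMark(number=bpm)` inserted at that offset instead of `number=midi_sequence[i].start`. For expressive performances, the raw per-segment values will likely be jittery and need some smoothing; I have not checked how this interacts with the rest of `detokenize_mxl`.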
Would it make sense for inference/export to expose one of these options?
- preserve the input MIDI tempo when no score timing alignment is requested;
- accept an explicit `bpm` argument for MusicXML export;
- derive and export a tempo map from the alignment between input performance onset times and predicted score offsets/downbeats;
- return the predicted score timing information separately, so callers can generate their own MusicXML `<sound tempo="..."/>` marks.
For my local wrapper I can add a constant metronome mark after inference, but this is not enough for expressive performance MIDI and it is hard for downstream tools to know the intended playback tempo without additional output from the model/inference step.
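For reference, the constant-tempo workaround can be applied directly to the exported MusicXML with the standard library. This sketch assumes a `score-partwise` document without an XML namespace (which, as far as I know, is what music21 writes) and only touches the first measure of the first part; `add_constant_tempo` and `make_tempo_direction` are my own hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def make_tempo_direction(bpm):
    """Build <direction> with a visible metronome mark and a playback <sound tempo>."""
    value = f"{round(bpm, 2):g}"
    direction = ET.Element("direction", placement="above")
    dtype = ET.SubElement(direction, "direction-type")
    metronome = ET.SubElement(dtype, "metronome")
    # Beat unit fixed to a quarter note so that per-minute matches <sound tempo>,
    # which MusicXML always expresses in quarter notes per minute.
    ET.SubElement(metronome, "beat-unit").text = "quarter"
    ET.SubElement(metronome, "per-minute").text = value
    ET.SubElement(direction, "sound", tempo=value)
    return direction


def add_constant_tempo(musicxml, bpm):
    """Insert one tempo direction at the start of the first measure of the first part."""
    root = ET.fromstring(musicxml)
    if root.tag != "score-partwise":
        raise ValueError("expected a partwise MusicXML document")
    part = root.find("part")
    measure = part.find("measure") if part is not None else None
    if measure is None:
        raise ValueError("no measure found in the first part")
    # Keep leading <print>/<attributes> first; the direction still sits at beat 0.
    idx = 0
    while idx < len(measure) and measure[idx].tag in ("print", "attributes"):
        idx += 1
    measure.insert(idx, make_tempo_direction(bpm))
    return ET.tostring(root, encoding="unicode")
```

Note that `ET.tostring(..., encoding="unicode")` drops the XML declaration and DOCTYPE, so a real wrapper may want to re-add them. This still only yields one global tempo, which is exactly the limitation described above for expressive performance MIDI.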