
Expose or export tempo information during inference #9


@fchange

Hi, thanks for releasing this project. I am testing the inference path on piano performance MIDI and exporting the resulting music21 score to MusicXML.

One interoperability issue I ran into is that the inferred MusicXML contains no explicit tempo information (neither `<direction><metronome>...` nor `<sound tempo="..."/>`). As a result, downstream renderers and players such as alphaTab, OSMD-based playback pipelines, or MuseScore round-trips fall back to their default tempo.
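To confirm this on an exported file, a small stdlib-only check can look for both kinds of tempo markup. This is a sketch that assumes an uncompressed, un-namespaced partwise MusicXML string (a `.musicxml` file, not a zipped `.mxl`); `find_tempo_markings` is a hypothetical helper, not part of the project.

```python
import xml.etree.ElementTree as ET


def find_tempo_markings(musicxml_text: str) -> dict:
    """Collect explicit tempo information from an uncompressed MusicXML string.

    Checks the two usual locations: <sound tempo="..."/> (playback tempo,
    in quarter notes per minute) and <metronome><per-minute> (the printed mark).
    Assumes no XML namespace, as in typical partwise MusicXML output.
    """
    root = ET.fromstring(musicxml_text)
    # Playback tempi from any <sound> element that carries a tempo attribute.
    sound_tempi = [
        float(el.get("tempo")) for el in root.iter("sound") if el.get("tempo") is not None
    ]
    # Printed metronome values, kept as the raw text content of <per-minute>.
    metronome_marks = [el.findtext("per-minute") for el in root.iter("metronome")]
    return {"sound_tempo": sound_tempi, "metronome": metronome_marks}
```

If both lists come back empty for the inferred score, players have nothing to go on and will use their own default.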

Simply copying the first `set_tempo` event from the input MIDI is not always correct for performance-MIDI-to-score conversion. For example, the MIDI file may contain only a default 120 BPM tempo while the inferred score grid represents a slower or faster tactus, or the performance may need a local tempo map to preserve expressive timing.
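To illustrate the mismatch: a MIDI `set_tempo` value only maps the file's own ticks to seconds, whereas a score needs seconds per *notated* quarter note. A minimal sketch, assuming aligned (score offset in quarter lengths, onset time in seconds) pairs are available; both function names are hypothetical.

```python
from typing import Sequence


def midi_tempo_to_bpm(microseconds_per_quarter: int) -> float:
    """Convert a MIDI set_tempo value (microseconds per MIDI quarter note) to BPM."""
    return 60_000_000 / microseconds_per_quarter


def estimate_global_bpm(score_offsets_ql: Sequence[float], onset_times_s: Sequence[float]) -> float:
    """Fit onset time (s) against score offset (quarter lengths) by least squares.

    The slope is seconds per score quarter note, so BPM = 60 / slope.
    This reflects the inferred score grid, not the MIDI file's tick grid.
    """
    n = len(score_offsets_ql)
    if n < 2 or n != len(onset_times_s):
        raise ValueError("need at least two aligned (offset, time) pairs")
    mean_x = sum(score_offsets_ql) / n
    mean_y = sum(onset_times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in score_offsets_ql)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(score_offsets_ql, onset_times_s))
    if sxx == 0:
        raise ValueError("score offsets must not all be equal")
    if sxy <= 0:
        raise ValueError("onset times must increase with score offset")
    return 60.0 / (sxy / sxx)
```

For instance, a file carrying only the default `set_tempo` of 500000 µs per quarter reports 120 BPM, while notated quarter notes landing every 0.75 s fit to 80 BPM. A single global fit still flattens rubato, so expressive timing needs a local tempo map.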

I noticed that `MultistreamTokenizer.detokenize_mxl(..., midi_sequence=...)` contains comments about using the input MIDI timings to create (score offset, time in seconds) pairs:

```python
# If we have the input midi timings, we can use them to set the tempo
# We first set tempo marks to track where their location `should` be
# The inserted tempo marks therefore form (offset, time in seconds) pairs.
```

However, the public inference helper `quantize_path()` currently calls `detokenize_mxl(y_hat)` without passing the MIDI sequence. In addition, the values inserted via `MetronomeMark(number=midi_sequence[i].start)` appear to be onset times in seconds rather than BPM, so they are not directly valid MusicXML tempo markings for playback.
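The (offset, time in seconds) pairs described in those comments already hold what a tempo map needs; the missing step is turning each segment between two anchors into BPM. A minimal sketch, assuming the pairs are (score offset in quarter lengths, onset time in seconds); `pairs_to_tempo_map` is a hypothetical name, not the project's API. Values like these, rather than raw second values, are what would be passed to `MetronomeMark(number=...)` or written as `<sound tempo>`.

```python
from typing import List, Tuple


def pairs_to_tempo_map(pairs: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Turn (score offset in quarter notes, time in seconds) anchors into
    (score offset, BPM) tempo changes, one per segment between anchors.

    BPM over a segment = 60 * (quarter notes elapsed) / (seconds elapsed).
    """
    anchors = sorted(pairs)  # order by score offset
    tempo_map = []
    for (off0, t0), (off1, t1) in zip(anchors, anchors[1:]):
        d_ql, d_s = off1 - off0, t1 - t0
        if d_ql <= 0 or d_s <= 0:
            continue  # skip duplicate or non-monotonic anchors
        tempo_map.append((off0, 60.0 * d_ql / d_s))
    return tempo_map
```

With anchors at 0 s, 2 s, and 5 s on offsets 0, 4, and 8, the first bar plays at 120 BPM and the second at 80 BPM; the current code would instead emit marks reading 0, 2, and 5 BPM.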

Would it make sense for inference/export to expose one of these options?

  1. preserve the input MIDI tempo when no score timing alignment is requested;
  2. accept an explicit `bpm` argument for MusicXML export;
  3. derive and export a tempo map from the alignment between input performance onset times and predicted score offsets/downbeats;
  4. return the predicted score timing information separately, so callers can generate their own MusicXML `<sound tempo="..."/>` marks.
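For options 2 and 4, the MusicXML side of the work is small. A sketch, assuming the caller has an uncompressed partwise MusicXML string and a single BPM value; `make_tempo_direction` and `add_constant_tempo` are hypothetical helper names. It writes both a printed `<metronome>` and a playback `<sound tempo>` into the first measure of the first part.

```python
import xml.etree.ElementTree as ET


def make_tempo_direction(bpm: float) -> ET.Element:
    """Build a MusicXML <direction> with a printed quarter-note metronome mark
    and a matching playback tempo (<sound tempo> is in quarter notes per minute)."""
    direction = ET.Element("direction", placement="above")
    direction_type = ET.SubElement(direction, "direction-type")
    metronome = ET.SubElement(direction_type, "metronome")
    ET.SubElement(metronome, "beat-unit").text = "quarter"
    ET.SubElement(metronome, "per-minute").text = f"{bpm:g}"
    ET.SubElement(direction, "sound", tempo=f"{bpm:g}")
    return direction


def add_constant_tempo(musicxml_text: str, bpm: float) -> str:
    """Insert one tempo direction at the start of the first measure of the first part."""
    root = ET.fromstring(musicxml_text)
    first_measure = root.find("part/measure")
    if first_measure is None:
        raise ValueError("no <part>/<measure> found; expected partwise MusicXML")
    # Place the direction after leading <print>/<attributes> so it precedes the first note.
    index = 0
    for child in first_measure:
        if child.tag not in ("print", "attributes"):
            break
        index += 1
    first_measure.insert(index, make_tempo_direction(bpm))
    return ET.tostring(root, encoding="unicode")
```

Note that `ET.tostring` drops the XML declaration and DOCTYPE, so a real exporter would need to re-add them. A per-segment tempo map can be written the same way, provided each direction is attached at the measure and offset it applies to.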

In my local wrapper I can add a constant metronome mark after inference, but that is not enough for expressive performance MIDI. Without additional output from the model or inference step, it is hard for downstream tools to know the intended playback tempo.
