Commit bde3b8f

improved gui refresh
1 parent e061abc commit bde3b8f

8 files changed

Lines changed: 203 additions & 304 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -37,9 +37,9 @@ pytch
 ```
 hit return and sing!
 
-## Contribution
+## Contributing
 
-Every contribution is welcome. To ensure consistent style we use [black](https://github.com/psf/black).
+Every contribution is welcome. Please feel free to open and issue or a pull request. To ensure consistent style we use [black](https://github.com/psf/black).
 You can add automated style checks at commit time using [pre-commit](https://pre-commit.com/)
 
 ```bash

paper/paper.bib

Lines changed: 0 additions & 10 deletions
@@ -1,13 +1,3 @@
-@inproceedings{MeierSM25_RealTimeF0_ISMIR,
-author = {Peter Meier and Sebastian Strahl and Simon Schw{\"a}r and Meinard M{\"u}ller},
-title = {libf0-realtime: TODO},
-booktitle = {Submitted to the International Society for Music Information Retrieval Conference ({ISMIR})},
-address = {},
-year = {2025},
-url-pdf = {},
-url-code = {}
-}
-
 @article{MeierCM24_RealTimePLP_TISMIR,
 author = {Peter Meier and Ching-Yu Chiu and Meinard M{\"u}ller},
 title = {{A} Real-Time Beat Tracking System with Zero Latency and Enhanced Controllability},

paper/paper.md

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@ bibliography: paper.bib
 
 # Summary
 Polyphonic singing is one of the most widespread forms of music-making. During a performance, singers must constantly adjust their pitch to stay in tune with one another — a complex skill that requires extensive practice. Research has shown that pitch monitoring tools can assist singers in fine-tuning their intonation during a performance [@BerglinPD22_VisualFeedback_JPM]. Specifically, real-time visualizations of the fundamental frequency (F0), which represents the pitch of the singing voice, help singers assess their pitch relative to a fixed reference or other voices.
-To support the monitoring of polyphonic singing performances, we developed `pytch`, an interactive Python tool with a graphical user interface (GUI) designed to record, process, and visualize multiple voices in real time. The GUI displays vocal spectra and estimated F0 trajectories for all singers, as well as the harmonic intervals between them. Additionally, users can adjust visual and algorithmic parameters interactively to accommodate different input devices, microphone signals, singing styles, and use cases. Written in Python, `pytch` utilizes the `libf0-realtime` library [@MeierSM25_RealTimeF0_ISMIR] for real-time F0 estimation and `pyqtgraph`[^1] for efficient visualizations of the analysis results.
+To support the monitoring of polyphonic singing performances, we developed `pytch`, an interactive Python tool with a graphical user interface (GUI) designed to record, process, and visualize multiple voices in real time. The GUI displays vocal spectra and estimated F0 trajectories for all singers, as well as the harmonic intervals between them. Additionally, users can adjust visual and algorithmic parameters interactively to accommodate different input devices, microphone signals, singing styles, and use cases. Written in Python, `pytch` utilizes the `libf0` library [@RosenzweigSM22_libf0_ISMIR-LBD] for real-time F0 estimation and `pyqtgraph`[^1] for efficient visualizations of the analysis results.
 Our tool builds upon a late-breaking demo in [@KriegerowskiS_Pytch_2017], which we refer to as version 1. Since then, the tool has been significantly extended with a new real-time graphics engine, a modular audio processing backend that facilitates the integration of additional algorithms, and improved support for a wider range of platforms and recording hardware, which we refer to as version 2. Over its seven years of development, `pytch` has been tested and refined through use in several rehearsals, workshops, and field studies — including Sardinian quartet singing (see demo video[^2]) and traditional Georgian singing (see demo video[^3]).
 
 [^1]: <https://www.pyqtgraph.org>
@@ -72,7 +72,7 @@ In addition to live monitoring, `pytch` can also be used to analyze pre-recorded
 # Audio Processing
 The real-time audio processing pipeline implemented in the file `audio.py` is the heart of `pytch` and consists of two main stages: recording and analysis. The recording stage captures multichannel audio waveforms from the soundcard or an external audio interface using the `sounddevice` library. The library is based on PortAudio and supports a wide range of operating systems, audio devices, and sampling rates. The recorded audio is received in chunks via a recording callback and fed into a ring buffer shared with the analysis process. When the buffer is sufficiently filled with audio chunks, the analysis process reads the recorded audio to compute several audio features.
 
-For each channel, the analysis stage computes the audio level in dBFS, a time--frequency representation of the audio signal via the Short-Time Fourier Transform (see [@Mueller21_FMP_SPRINGER] for fundamentals of music processing), and an estimate of the F0 along with a confidence value, using the `libf0-realtime` library [@MeierSM25_RealTimeF0_ISMIR]. The library includes several real-time implementations of well-known F0 estimation algorithms, such as YIN [@CheveigneK02_YIN_JASA] and SWIPE [@CamachoH08_SawtoothWaveform_JASA]. YIN is a time-domain algorithm that computes the F0 based on a tweaked auto-correlation function. It is computationally efficient and well-suited for low-latency applications, but it tends to suffer from estimation errors, particularly confusions with higher harmonics such as the octave. In contrast, SWIPE is a frequency-domain algorithm that estimates the F0 by matching different spectral representations of the audio with sawtooth-like kernels. While more computationally demanding, SWIPE typically yields more reliable estimates, in particular for vocal input signals. `pytch` allows users to choose between these algorithms depending on their specific needs and system capabilities. The obtained F0 estimates, which are natively computed in the unit Hz, are converted to the unit cents using a user-specified reference frequency. Depending on the audio quality and vocal characteristics, F0 estimates may exhibit artifacts such as discontinuities or pitch slides, which can make the resulting trajectories difficult to interpret [@RosenzweigSM19_StableF0_ISMIR]. Previous research has shown that using throat microphones can improve the isolation of individual voices in group singing contexts, resulting in cleaner signals and more accurate F0 estimates [@Scherbaum16_LarynxMicrophones_IWFMA]. To further enhance interpretability, `pytch` includes several optional post-processing steps: a confidence threshold to discard estimates with low confidence score, a median filter to smooth the trajectories, and a gradient filter to suppress abrupt pitch slides. As a final step in the audio analysis, the harmonic intervals between the F0 trajectories are computed. Every audio feature is stored separately in a dedicated ring buffer. After processing, the pipeline sets a flag that notifies the GUI that new data is ready for visualization.
+For each channel, the analysis stage computes the audio level in dBFS, a time--frequency representation of the audio signal via the Short-Time Fourier Transform (see [@Mueller21_FMP_SPRINGER] for fundamentals of music processing), and an estimate of the F0 along with a confidence value, using the `libf0` library [@RosenzweigSM22_libf0_ISMIR-LBD]. The library includes several implementations of well-known F0 estimation algorithms. We make use of YIN [@CheveigneK02_YIN_JASA], which is a time-domain algorithm that computes the F0 based on a tweaked auto-correlation function. It is computationally efficient and well-suited for low-latency applications, but it tends to suffer from estimation errors, particularly confusions with higher harmonics such as the octave. The obtained F0 estimates, which are natively computed in the unit Hz, are converted to the unit cents using a user-specified reference frequency. Depending on the audio quality and vocal characteristics, F0 estimates may exhibit artifacts such as discontinuities or pitch slides, which can make the resulting trajectories difficult to interpret [@RosenzweigSM19_StableF0_ISMIR]. Previous research has shown that using throat microphones can improve the isolation of individual voices in group singing contexts, resulting in cleaner signals and more accurate F0 estimates [@Scherbaum16_LarynxMicrophones_IWFMA]. To further enhance interpretability, `pytch` includes several optional post-processing steps: a confidence threshold to discard estimates with low confidence score, a median filter to smooth the trajectories, and a gradient filter to suppress abrupt pitch slides. As a final step in the audio analysis, the harmonic intervals between the F0 trajectories are computed. Every audio feature is stored separately in a dedicated ring buffer. After processing, the pipeline sets a flag that notifies the GUI that new data is ready for visualization.
 
 
 # Graphical User Interface (GUI)
@@ -89,6 +89,6 @@ The main GUI is organized into three horizontal sections. On the left, a control
 The right section, referred to as the "trajectory view," provides time-based visualizations of either the F0 trajectories ("pitches" tab) or the harmonic intervals between voices ("differential" tab) with a 10 second time context. Using the controls in the left-side menu, the user can select the F0 estimation algorithm and improve the real-time visualization by adjusting the confidence threshold, the median filter length for smoothing, and the tolerance of the gradient filter. F0 and interval trajectories can be displayed with respect to a fixed reference frequency or a dynamic one derived from a selected channel, the lowest, or highest detected voice. Axis limits for this section can also be manually set.
 
 # Acknowledgements
-We would like to thank Lukas Dietz for his help with the implementation, Peter Meier and Sebastian Strahl for their support with integrating the real-time F0 algorithms, and all the singers who contributed to testing `pytch` during its development.
+We would like to thank Lukas Dietz for his help with the implementation, Peter Meier and Sebastian Strahl for the collaboration on real-time implementations, and all the singers who contributed to testing `pytch` during its development.
 
 # References
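
Note on the Audio Processing paragraph in paper/paper.md: the conversion from Hz to cents and the optional post-processing steps it describes can be sketched roughly as follows. This is an illustrative sketch only, not the code in `audio.py`. The function names, the default values, and the exact definition of the gradient filter (here: discard a frame whose jump from the previous frame exceeds a tolerance in cents) are assumptions made for the example.

```python
import math
from statistics import median


def hz_to_cents(f0_hz, ref_hz):
    """Convert a frequency in Hz to cents relative to ref_hz (1200 cents per octave)."""
    return 1200.0 * math.log2(f0_hz / ref_hz)


def postprocess(f0_hz, conf, ref_hz=220.0, conf_threshold=0.5,
                median_len=3, max_jump_cents=100.0):
    """Hypothetical post-processing chain; None marks a discarded frame."""
    # 1) Confidence threshold: keep only confident, positive F0 estimates.
    cents = [
        hz_to_cents(f, ref_hz) if (c >= conf_threshold and f > 0) else None
        for f, c in zip(f0_hz, conf)
    ]

    # 2) Median filter: smooth each valid frame over its valid neighbours.
    half = median_len // 2
    smoothed = []
    for i, value in enumerate(cents):
        if value is None:
            smoothed.append(None)
            continue
        window = [v for v in cents[max(0, i - half): i + half + 1] if v is not None]
        smoothed.append(median(window))

    # 3) Gradient filter (assumed form): drop frames that jump too far
    #    from the directly preceding frame.
    result = []
    for i, value in enumerate(smoothed):
        prev = smoothed[i - 1] if i > 0 else None
        if value is not None and prev is not None and abs(value - prev) > max_jump_cents:
            result.append(None)
        else:
            result.append(value)
    return result


def interval_cents(cents_a, cents_b):
    """Harmonic interval between two voices in cents; None if either is missing."""
    if cents_a is None or cents_b is None:
        return None
    return cents_a - cents_b
```

With a reference of 220 Hz, an isolated octave error (a single 440 Hz frame among 220 Hz frames) is removed by the median filter, while a sustained jump of an octave survives the median filter and only its first frame is discarded by the gradient step.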

pytch/__init__.py

Lines changed: 0 additions & 10 deletions
@@ -1,13 +1,3 @@
 import logging
 
-try:  # Python 2.7+
-    from logging import NullHandler
-except ImportError:
-
-    class NullHandler(logging.Handler):
-        def emit(self, record):
-            pass
-
-
 logging.basicConfig(level=logging.INFO)
-logging.getLogger(__name__).addHandler(NullHandler())
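
Note on the removed fallback: `logging.NullHandler` has been part of the standard library since Python 2.7 and 3.1, so the `try`/`except ImportError` shim is dead code on any Python 3 interpreter. The commit drops the handler registration entirely and keeps only `logging.basicConfig`. If a no-op handler were still wanted, a minimal sketch could look like this (the logger name is a hypothetical placeholder, not one used by `pytch`):

```python
import logging

# Attach a no-op handler directly from the standard library; no fallback
# class is needed on Python 3.
logger = logging.getLogger("pytch_example")  # hypothetical logger name
logger.addHandler(logging.NullHandler())


def handler_types(log):
    """Return the class names of the handlers attached to a logger."""
    return [type(h).__name__ for h in log.handlers]
```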
