feat: Add audacity marker file support and creating annotation from rttm file #92
yojul wants to merge 5 commits into
…rom generic _serialize and _write methods
Thanks for this PR. Note that RTTM files may contain annotations for multiple audio files (hence the second
One okayish solution could be to add an option
Thank you for your feedback. For more consistency with other "from" methods and the
Thus, it ensures consistency with … I also added a condition to only read lines starting with "SPEAKER", which correspond to speech segments.
Problem
When evaluating speaker diarization pipelines, one might want to work with Annotation objects: creating annotations from RTTM (or another format), as well as serializing or writing annotations to RTTM (or another format).
Audacity's marker track feature is a very convenient (and free) way to create ground-truth segmentation for speaker diarization. The format is a .txt file, very similar to the already implemented LAB file support, but tab-separated.
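To make the format concrete, here is a minimal standalone sketch of reading and writing such a file. It assumes each line is `start<TAB>end<TAB>label` with times in seconds, and that optional frequency-range lines start with a backslash; the function names `iter_audacity_labels` and `to_audacity` are hypothetical and are not part of this PR.

```python
from typing import Iterable, Iterator, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, label)


def iter_audacity_labels(lines: Iterable[str]) -> Iterator[Segment]:
    """Yield (start, end, label) from tab-separated marker lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with a backslash
        # (frequency ranges) carry no segment and are skipped.
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # A marker may have an empty label, so the third field is optional.
        label = fields[2] if len(fields) > 2 else ""
        yield float(fields[0]), float(fields[1]), label


def to_audacity(segments: Iterable[Segment]) -> str:
    """Serialize segments back to tab-separated text, one per line."""
    return "".join(f"{s:.6f}\t{e:.6f}\t{label}\n" for s, e, label in segments)


text = "0.000000\t1.500000\tspk1\n1.500000\t3.250000\tspk2\n"
segments = list(iter_audacity_labels(text.splitlines()))
```

The only difference from a whitespace-separated LAB-style line in this sketch is the tab delimiter, which lets labels contain spaces.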
Solution
Refactor methods for various file format support
- _serialize method to replace multiple to_{format} methods.
- _write method to replace multiple write_{format} methods.
- to_<format> and write_<format> are now partial methods derived from the generic methods.
Currently supported formats are:
- annotation.to_rttm() and annotation.write_rttm(file).
- annotation.to_audacity() and annotation.write_audacity(file).
- annotation.to_lab() and annotation.write_lab(file).
Therefore, to add a new format, one only needs to implement an _iter_{format} method, similarly to _iter_rttm or _iter_lab.
Creating annotation from audacity or rttm
Similarly to the from_df class method, I created from_audacity and from_rttm class methods to easily create annotations from those file formats.
Usage :
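As an illustration of the design described above, the following is a minimal self-contained sketch rather than the PR's actual code. `MiniAnnotation` is a hypothetical stand-in for the library's Annotation class, the `fmt` parameter name is an assumption, and requiring a `uri` argument in `from_rttm` is only one possible way to handle RTTM files that cover several audio files.

```python
from functools import partialmethod
from typing import Iterable, Iterator, List, Tuple

Segment = Tuple[float, float, str]  # (start, end, label), times in seconds


class MiniAnnotation:
    """Hypothetical stand-in for Annotation; not the library's class."""

    def __init__(self, segments: Iterable[Segment], uri: str = "audio"):
        self.segments: List[Segment] = sorted(segments)
        self.uri = uri

    # One line generator per format: the only format-specific code.
    def _iter_rttm(self) -> Iterator[str]:
        for start, end, label in self.segments:
            yield (f"SPEAKER {self.uri} 1 {start:.3f} {end - start:.3f} "
                   f"<NA> <NA> {label} <NA> <NA>\n")

    def _iter_lab(self) -> Iterator[str]:
        for start, end, label in self.segments:
            yield f"{start:.3f} {end:.3f} {label}\n"

    def _iter_audacity(self) -> Iterator[str]:
        for start, end, label in self.segments:
            yield f"{start:.3f}\t{end:.3f}\t{label}\n"

    # Generic methods look up the line generator by format name.
    def _serialize(self, fmt: str) -> str:
        return "".join(getattr(self, f"_iter_{fmt}")())

    def _write(self, file, fmt: str) -> None:
        for line in getattr(self, f"_iter_{fmt}")():
            file.write(line)

    # Public per-format methods are partial applications of the generic ones.
    to_rttm = partialmethod(_serialize, fmt="rttm")
    to_lab = partialmethod(_serialize, fmt="lab")
    to_audacity = partialmethod(_serialize, fmt="audacity")
    write_rttm = partialmethod(_write, fmt="rttm")
    write_lab = partialmethod(_write, fmt="lab")
    write_audacity = partialmethod(_write, fmt="audacity")

    @classmethod
    def from_rttm(cls, lines: Iterable[str], uri: str) -> "MiniAnnotation":
        segments = []
        for line in lines:
            fields = line.split()
            # Assumption: keep only SPEAKER records that belong to `uri`.
            if len(fields) < 8 or fields[0] != "SPEAKER" or fields[1] != uri:
                continue
            start, duration = float(fields[3]), float(fields[4])
            segments.append((start, start + duration, fields[7]))
        return cls(segments, uri=uri)

    @classmethod
    def from_audacity(cls, lines: Iterable[str], uri: str = "audio") -> "MiniAnnotation":
        segments = []
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            # Skip malformed lines and backslash-prefixed frequency lines.
            if len(fields) < 3 or fields[0].startswith("\\"):
                continue
            segments.append((float(fields[0]), float(fields[1]), fields[2]))
        return cls(segments, uri=uri)


ann = MiniAnnotation([(0.0, 1.5, "A"), (1.5, 3.0, "B")], uri="file1")
rttm_text = ann.to_rttm()
same = MiniAnnotation.from_rttm(rttm_text.splitlines(), uri="file1")
```

In this sketch, supporting another format would only require one more `_iter_<format>` generator plus two `partialmethod` lines, which mirrors the extension path described above.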