[Dataset] MeetingBank Dataset #312

@ddeepak95

MeetingBank Corpus

A collection of transcribed city council meetings from 6 major U.S. cities, converted to ConvoKit format for conversational AI research and meeting summarization. The corpus contains 1,366 meetings totaling over 3,579 hours of video, and is suited to studying political discourse, meeting dynamics, and automated summarization.

Attribution: Yebowen Hu, Tim Ganter, Hanieh Deilamsalehy, Franck Dernoncourt, Hassan Foroosh, Fei Liu, "MeetingBank: A Benchmark Dataset for Meeting Summarization," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), Toronto, Canada, July 2023, pp. 4512-4522. [Online]. Available: https://zenodo.org/records/7989108

Dataset details

Speaker-level information

Speakers in the dataset are participants in city council meetings, including council members, city officials, and public speakers. Each speaker is identified by a unique identifier that combines the meeting name with their speaker number (e.g., "SeattleCityCouncil_12142015_speaker_0").

Speaker metadata includes:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • utterance_count: Total number of utterances contributed by this speaker
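
As a minimal sketch of the speaker ID scheme described above, the following plain-Python helpers build an ID from a meeting name and speaker number and tally utterances per speaker. The function names `make_speaker_id` and `count_utterances` are hypothetical and are not part of ConvoKit or the corpus tooling; only the ID pattern is taken from the example above.

```python
from collections import Counter


def make_speaker_id(meeting_name: str, speaker_number: int) -> str:
    # Follows the pattern of the example ID in this description:
    # "<meeting_name>_speaker_<number>".
    return f"{meeting_name}_speaker_{speaker_number}"


def count_utterances(speaker_ids: list) -> Counter:
    # One entry per utterance; the tally corresponds to `utterance_count`.
    return Counter(speaker_ids)


speaker_0 = make_speaker_id("SeattleCityCouncil_12142015", 0)
speaker_1 = make_speaker_id("SeattleCityCouncil_12142015", 1)

# Hypothetical sequence of speakers for three consecutive utterances.
counts = count_utterances([speaker_0, speaker_1, speaker_0])
print(speaker_0, counts[speaker_0])
```

Because the meeting name is part of the ID, the same person appearing in two meetings would get two distinct speaker IDs under this scheme.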

Utterance-level information

For each utterance (speech segment), we provide:

  • id: An identifier for the utterance (composed of the meeting ID concatenated with the utterance's index in the meeting)
  • conversation_id: An identifier for the meeting/conversation to which the utterance belongs
  • reply_to: ID of the previous utterance in the conversation (None if it's the first utterance)
  • speaker: The speaker who delivered the utterance
  • timestamp: Time offset of the utterance within the meeting (in microseconds)
  • text: Transcribed textual content of the utterance

Utterance metadata:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • duration: Duration of the speech segment (in microseconds)
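
The fields above can be illustrated with a small plain-Python sketch that turns an ordered list of speech segments into utterance records. This is not the conversion script used for the corpus: the `UtteranceRecord` class, the `build_utterances` helper, the input dictionary keys, the underscore separator in the utterance ID, and the use of the meeting name as `conversation_id` are all assumptions made for illustration, and the sample text is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UtteranceRecord:
    """Plain-Python stand-in for one utterance; not a ConvoKit class."""
    id: str
    conversation_id: str
    reply_to: Optional[str]
    speaker: str
    timestamp: int  # offset within the meeting, in microseconds
    text: str
    meta: dict = field(default_factory=dict)


def build_utterances(meeting_name: str, city: str, segments: list) -> list:
    records = []
    previous_id = None  # the first utterance replies to nothing
    for index, seg in enumerate(segments):
        utt_id = f"{meeting_name}_{index}"  # separator is an assumption
        records.append(
            UtteranceRecord(
                id=utt_id,
                conversation_id=meeting_name,  # assumption: one ID per meeting
                reply_to=previous_id,
                speaker=f"{meeting_name}_speaker_{seg['speaker_number']}",
                timestamp=seg["start"],
                text=seg["text"],
                meta={
                    "city": city,
                    "meeting_name": meeting_name,
                    "duration": seg["duration"],
                },
            )
        )
        previous_id = utt_id
    return records


# Made-up segments; times are in microseconds.
segments = [
    {"speaker_number": 0, "start": 0, "duration": 4_000_000, "text": "Good afternoon."},
    {"speaker_number": 1, "start": 4_000_000, "duration": 2_500_000, "text": "Thank you."},
]
utterances = build_utterances("SeattleCityCouncil_12142015", "Seattle", segments)
```

Each record's `reply_to` points at the record before it, so the utterances of a meeting form a single chronological chain.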

Conversation-level information

Each conversation represents a complete city council meeting. The conversation structure follows a linear progression where each utterance replies to the previous one, creating a chronological chain of the meeting proceedings. Conversations are organized by city and meeting date, with each meeting containing multiple agenda items and discussion segments.

Meeting metadata includes:

  • city: The city where the meeting took place
  • meeting_name: The specific meeting identifier
  • num_speakers: Total number of unique speakers in the meeting
  • total_utterances: Total number of speech segments in the meeting
  • total_duration: Total duration of the meeting (in microseconds)
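
A rough sketch of how the meeting-level fields relate to the utterances, assuming each utterance is available as a dictionary with `speaker` and `duration` keys. The helper `summarize_meeting` is hypothetical, and treating `total_duration` as the sum of segment durations is an assumption; the corpus may derive it differently (for example, from the video length).

```python
def summarize_meeting(city: str, meeting_name: str, utterances: list) -> dict:
    # Unique speaker IDs seen in this meeting.
    speakers = {u["speaker"] for u in utterances}
    return {
        "city": city,
        "meeting_name": meeting_name,
        "num_speakers": len(speakers),
        "total_utterances": len(utterances),
        # Assumption: total duration is the sum of segment durations (microseconds).
        "total_duration": sum(u["duration"] for u in utterances),
    }


# Made-up utterances for illustration.
sample = [
    {"speaker": "SeattleCityCouncil_12142015_speaker_0", "duration": 4_000_000},
    {"speaker": "SeattleCityCouncil_12142015_speaker_1", "duration": 2_500_000},
    {"speaker": "SeattleCityCouncil_12142015_speaker_0", "duration": 1_500_000},
]
meeting_meta = summarize_meeting("Seattle", "SeattleCityCouncil_12142015", sample)
```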

Quick stats

Number of conversations in the dataset = 1366
Number of speakers in the dataset = 12272
Number of utterances in the dataset = 1011870

=== CITY BREAKDOWN ===
Alameda: 164 transcripts
Boston: 32 transcripts
Denver: 401 transcripts
KingCounty: 132 transcripts
LongBeach: 310 transcripts
Seattle: 327 transcripts
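
As a quick consistency check, the per-city transcript counts above can be summed and compared with the reported number of conversations. The snippet only uses the figures listed in this section; the variable names are illustrative.

```python
# Transcript counts copied from the city breakdown above.
city_counts = {
    "Alameda": 164,
    "Boston": 32,
    "Denver": 401,
    "KingCounty": 132,
    "LongBeach": 310,
    "Seattle": 327,
}

reported_conversations = 1366
total_transcripts = sum(city_counts.values())

# The breakdown accounts for every conversation in the corpus.
print(total_transcripts == reported_conversations)  # True
```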

Contact

Please email any questions to: dv292@cornell.edu

Dataset Link

https://drive.google.com/drive/u/1/folders/15OXtWuMj2GYBAeYGo1EzJlcIzSco6Z1Q
