Handle Bill Text Limit #2169

Description

@Mephistic

Summary

Now that we're scraping the text of longer bills from the PDFs, we're seeing occasional errors where a Bill exceeds the maximum allowed size of a Firestore document (1 MiB, i.e. 1,048,576 bytes). When that happens the write fails, which prevents the bill from being scraped at all.

Instead of failing altogether, we should handle this case more gracefully:

  • Catch Firestore writes that fail because the document exceeds the maximum allowed size, and retry saving the Bill document without the DocumentText (basically, the current behavior where we only have the PDF)
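A minimal sketch of the catch-and-retry idea, not the scraper's actual code. `BillDoc`, `BillWriter`, `isDocumentTooLargeError`, and `saveBillWithFallback` are hypothetical names introduced for illustration. The sketch assumes the write error carries the message text quoted under Additional Info and, when present, a numeric gRPC `code` of 3 (INVALID_ARGUMENT).

```typescript
// Hypothetical shape; the scraper's real Bill type will have more fields.
interface BillDoc {
  id: string
  DocumentText?: string
  [field: string]: unknown
}

// Anything that persists a bill. In the scraper this would wrap the Firestore write.
type BillWriter = (bill: BillDoc) => Promise<void>

// gRPC status code 3 is INVALID_ARGUMENT.
const INVALID_ARGUMENT = 3

// Assumption: the error message contains the phrase seen in the logged error,
// and the `code` property, if set at all, is the numeric gRPC status.
function isDocumentTooLargeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false
  const { code, message } = err as { code?: unknown; message?: unknown }
  return (
    typeof message === "string" &&
    message.includes("exceeds the maximum allowed size") &&
    (code === undefined || code === INVALID_ARGUMENT)
  )
}

// Try the full document first; on a size-limit failure, retry once without DocumentText.
// Any other error (or a size failure with no DocumentText to drop) is rethrown.
async function saveBillWithFallback(
  bill: BillDoc,
  write: BillWriter
): Promise<"full" | "withoutText"> {
  try {
    await write(bill)
    return "full"
  } catch (err) {
    if (!isDocumentTooLargeError(err) || bill.DocumentText === undefined) {
      throw err
    }
    // Copy so the caller's object is left untouched.
    const withoutText: BillDoc = { ...bill }
    delete withoutText.DocumentText
    await write(withoutText)
    return "withoutText"
  }
}
```

Returning which variant was saved lets the caller log that the text was dropped, so oversized bills stay visible rather than silently losing their DocumentText.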

Success Criteria

  • Bills with excessively long DocumentText can still be scraped successfully, with only the oversized DocumentText field excluded

Additional Info

  • Error: In fetchBillBatch: Error: 3 INVALID_ARGUMENT: Document 'projects/digital-testimony-dev/databases/(default)/documents/generalCourts/194/bills/H5500' cannot be written because its size (1,373,153 bytes) exceeds the maximum allowed size of 1,048,576 bytes.

  • Example Bill: Bill H5500 in Court 194
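For logging which bills lost their text and by how much, the numbers can be pulled out of the error message. This is a sketch only: `parseSizeLimitError` and `SizeLimitDetails` are hypothetical names, and the pattern assumes the message keeps the exact wording of the error quoted above, which is not guaranteed to be stable.

```typescript
// Hypothetical result shape for illustration.
interface SizeLimitDetails {
  path: string
  sizeBytes: number
  limitBytes: number
}

// Extract the document path, actual size, and limit from a size-limit error message.
// Returns null when the message does not match the assumed wording.
function parseSizeLimitError(message: string): SizeLimitDetails | null {
  const match = message.match(
    /Document '([^']+)' cannot be written because its size \(([\d,]+) bytes\) exceeds the maximum allowed size of ([\d,]+) bytes/
  )
  if (!match) return null
  const [, path = "", size = "0", limit = "0"] = match
  // The message formats numbers with thousands separators, e.g. "1,373,153".
  const toNumber = (s: string): number => Number(s.replace(/,/g, ""))
  return { path, sizeBytes: toNumber(size), limitBytes: toNumber(limit) }
}

const details = parseSizeLimitError(
  "In fetchBillBatch: Error: 3 INVALID_ARGUMENT: Document 'projects/digital-testimony-dev/databases/(default)/documents/generalCourts/194/bills/H5500' cannot be written because its size (1,373,153 bytes) exceeds the maximum allowed size of 1,048,576 bytes."
)
if (details !== null) {
  // For H5500 the overage is 1,373,153 - 1,048,576 = 324,577 bytes.
  console.log(`${details.path} is over the limit by ${details.sizeBytes - details.limitBytes} bytes`)
}
```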

Metadata

Assignees: No one assigned

Labels: good first issue (Good for newcomers), scraper (Backend work related to content scraping)
