Invalid XML characters break docket parsers #348

#### Summary
When a page on PACER (or elsewhere) contains characters that are not in the [valid list of XML characters](https://www.w3.org/TR/xml11/Overview.html#charsets), lxml's html5 parser fails.

> *This is not hypothetical. I was scraping a docket at the Ohio Northern Bankruptcy Court (ohnb), and `docketreport.parse()` failed because of some invalid XML characters coming back from the request.*

#### Tasks
- Update the code in `juriscraper/lib/html_utils.py` to escape these characters, probably with a regex so we don't lose too much speed.
- Capture the raw response of the docket that failed to parse, and include it in the test suite.
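
As a rough starting point for the first task, here is a minimal sketch of a regex-based cleaner. Several things here are assumptions: the function name `strip_invalid_xml_chars` is hypothetical and not an existing helper in `html_utils.py`, the character ranges follow the XML 1.0 `Char` production rather than the XML 1.1 list linked above, and the sketch removes the offending characters instead of escaping them.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: these are the characters the parser rejects.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return `text` with characters that are not valid in XML removed.

    Hypothetical helper for illustration; the pattern is compiled once
    at import time so repeated calls stay cheap.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    dirty = "Doc\x00ket\x0b entry\x1f\n"
    print(repr(strip_invalid_xml_chars(dirty)))
```

Removing the characters is the simplest option; if the original bytes matter, the same pattern could be passed to `sub` with a replacement such as `"\uFFFD"` instead of the empty string.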

#### Questions
- Has anyone seen this type of error coming from a PACER scrape? You would have seen an `All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters` traceback bubble up the stack.
- Any opposition to having someone (possibly me) work on a patch for `html_utils` to protect against this type of data?
