feat(CLAUDE.md): created for opinions and oral_args submodules #1862
Open
grossir wants to merge 1 commit into
#1793 Juriscraper has a lot of quirks that confuse both human users and agents. This CLAUDE.md attempts to create guidelines for the development process (issue creation, development, code review). In a sense, it could be used by humans too. I have tried to keep it concise and limit it to non-obvious points that we value in scraper development.
Contributor
Author
I think it needs some more comments on backscraping, `download_backwards`, etc. It got a little confused today while implementing a backscraper.
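To make the backscraping concern above concrete, here is a minimal sketch of the general idea behind a backscraper: split a historical date range into fixed-size windows and request each window in turn. This is a standalone illustration, not Juriscraper's implementation; `date_windows` and `days_interval` are hypothetical names, and how each window would be passed to the scraper's own backscrape hook is deliberately left out.

```python
from datetime import date, timedelta


def date_windows(start, end, days_interval):
    """Yield inclusive (window_start, window_end) pairs covering start..end.

    Hypothetical helper: a real backscraper would hand each pair to
    whatever method performs the historical request.
    """
    if days_interval < 1:
        raise ValueError("days_interval must be at least 1")
    current = start
    while current <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(current + timedelta(days=days_interval - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2024, 1, 1), date(2024, 1, 10), 4))
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Windows are inclusive on both ends and never overlap, so a record filed on a boundary date is requested exactly once.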
Luis-manzur
suggested changes
Mar 20, 2026
| - Sentry links let reviewers count occurrences, see affected courts, and trace exact failing records. Any proposed fix should address those specific occurrences. | ||
| - Show data examples (screenshots, SQL counts, specific records) to illustrate the problem | ||
| - Reviewers need real edge cases to verify fixes against. Without examples, PRs get reviewed blindly and bugs slip through. | ||
| - If the issue is getting to big, use the `<details><summary>Title of collapsible section</summary>A lot of content</details>` tags to organize extra info. |
Contributor
suggestion
Suggested change
| - If the issue is getting to big, use the `<details><summary>Title of collapsible section</summary>A lot of content</details>` tags to organize extra info. | |
| - If the issue is getting too big, use the `<details><summary>Title of collapsible section</summary>A lot of content</details>` tags to organize extra info. |
| - Use `date_filed_is_approximate = True` when exact dates aren't available; don't set unrealistic dates | ||
| - `"Unknown"` status when status can't be determined, not `"Published"` | ||
| - Try to parse the most amount of data possible. If a source shows the author of an opinion, the disposition, a summary, etc; try to pick it up. If there is some interesting field you don't know where to fit in the accepted return keys, highlight it to the user so they can research. | ||
| - If a source shows different opinion types for the same case, consider using `ClusterSite` as a base class to group those together. |
Contributor
No mention of ClusterSite mechanics — mentions considering ClusterSite but doesn't explain how it works or when to use it vs OpinionSiteLinear.
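As a starting point for that discussion, the sketch below illustrates the quoted guidelines in plain Python: flag approximate dates instead of inventing exact ones, fall back to `"Unknown"` status, and group opinions that belong to the same case. It does not use or describe the real `ClusterSite` API. The helper names (`build_case`, `group_by_docket`), the dictionary keys, and the choice of the docket number as the grouping key are all assumptions made for illustration.

```python
from collections import defaultdict


def build_case(name, docket, url, exact_date=None, approximate_date=None, status=None):
    """Build one case record (hypothetical shape, not Juriscraper's schema)."""
    if exact_date is None and approximate_date is None:
        raise ValueError("provide an exact date or a realistic approximate one")
    return {
        "name": name,
        "docket": docket,
        "url": url,
        "date": exact_date or approximate_date,
        # Flag the date as approximate instead of passing it off as exact.
        "date_filed_is_approximate": exact_date is None,
        # Default to "Unknown" rather than assuming "Published".
        "status": status or "Unknown",
    }


def group_by_docket(cases):
    """Group records sharing a docket number (assumed grouping key)."""
    clusters = defaultdict(list)
    for case in cases:
        clusters[case["docket"]].append(case)
    return dict(clusters)


cases = [
    build_case("A v. B", "22-100", "https://example.com/majority.pdf",
               exact_date="2024-03-01", status="Published"),
    build_case("A v. B", "22-100", "https://example.com/dissent.pdf",
               approximate_date="2024-03-01"),
    build_case("C v. D", "22-200", "https://example.com/opinion.pdf",
               exact_date="2024-03-05"),
]
clusters = group_by_docket(cases)
```

Here the majority and dissent for docket `22-100` end up in one group, while the single opinion for `22-200` stays on its own. Whether `ClusterSite` groups by docket, by case name, or by something else would need to be confirmed against the codebase.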
| - Real data captures encoding quirks, edge cases, and unexpected fields that synthetic data misses. | ||
| - Auto-generate example compare files via: `python -m unittest -v tests.local.test_ScraperExampleTest juriscraper.opinions` | ||
| - Manually created compare files drift from actual scraper output. The test command generates them from saved responses. | ||
| - Test both scraper AND backscraper with `sample_caller`. |
Contributor
Consider adding a concrete command, something like `python sample_caller.py -c juriscraper.opinions.united_states.state.conn`; that would help.