Update!

Hey to anyone using this! Starting April 2nd, 2020, I will be abandoning this repo for a private, forked version of it. Every so often, I might make some updates to this repo if they're general enough, but I'll probably forget to, if I'm honest. There's still good stuff here, but I no longer want to deal with the hassle of constantly guessing what should be public and what should be private.

Thanks!

My web-scraping projects

This is most of the code from my personal web-scraping Python projects. Feel free to learn from the source code, but if you borrow anything, please credit me as the source.

Warning

This code is mostly for me. That means I'm still working on it, playing around with it, and I understand things about it that you, a stranger, may not. Although I "try" to document things and be clear, do not assume any of the code works in the way that you think it will.

Note:

Additionally, much of this code is aimed at scraping particular websites, so please do not start running stuff willy-nilly. For example, if you want the data from the website I scraped for the manga_updates.py/manga_project project, please just help yourself to what I've collected (everything_json.json and everything_json_issues_slimmed.json).

Please do not try to scrape it again yourself; they don't need multiple people bombarding their site with my code.

If you want to see how it's done by running through the code yourself, please limit yourself to ~20 requests per run.
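One way to enforce the ~20-request cap is to route every request through a small counter that refuses to go past the limit and pauses between calls. The sketch below is hypothetical and not taken from this repo's scripts: `RequestBudget`, `MAX_REQUESTS_PER_RUN`, `urllib_fetch`, and the one-second default delay are all assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable

# Hypothetical constant mirroring the "~20 requests per run" suggestion above.
MAX_REQUESTS_PER_RUN = 20


def urllib_fetch(url: str) -> bytes:
    """Hypothetical default fetcher using only the standard library."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


class RequestBudget:
    """Hypothetical helper that caps how many requests a single run may make."""

    def __init__(self, fetch: Callable[[str], bytes] = urllib_fetch,
                 limit: int = MAX_REQUESTS_PER_RUN,
                 delay_seconds: float = 1.0) -> None:
        self.fetch = fetch
        self.limit = limit
        self.delay_seconds = delay_seconds
        self.used = 0

    def get(self, url: str) -> bytes:
        # Refuse to send anything once the per-run budget is spent.
        if self.used >= self.limit:
            raise RuntimeError(f"request budget of {self.limit} exhausted")
        # Wait between consecutive requests (not before the first one).
        if self.used > 0:
            time.sleep(self.delay_seconds)
        self.used += 1
        return self.fetch(url)
```

Creating one `RequestBudget` per run and calling `budget.get(url)` everywhere a script would otherwise fetch a page makes the run stop with an error instead of quietly sending a 21st request.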
