Scrape instance storage IOPS data and present as an additional filterable, sortable column #892
Ngalstyan4 wants to merge 2 commits into
Conversation
As a now external user of the dataset, I'd incorporate this into the Go scraper (most of the methods should be there to help you). Other than that LGTM!
Thanks for taking a look, @IAmJSD!
Happy to make the necessary changes to bring this upstream. But I would first want to hear from a current maintainer (you or someone else!) that they are interested in having this upstream.
In the meantime, the fork I linked above, along with the GitHub CI, works for me for now.
I'm no longer a maintainer so this isn't my call, just inputting my 2 cents :)
I agree with Astrid’s suggestion. The UI/table changes look reasonable, but I’d prefer to see the IOPS scrape folded into the Go scraper so it becomes part of the normal EC2 data generation path rather than a separate Node script.
OK, thanks for getting back to me, @emilydunenfeld!
I will do my best to get around to this in the next couple of weeks and fold this into the existing Go scraper.
Happy to also defer to someone on your team to make the changes if you would prefer that.
In any case, I will check for any changes on this PR before starting, and will update the PR if there are still no updates from other people at that time.
Thanks again!
Hi Ngalstyan4! Will look into this soon, ty for your patience.
This is an attempt at addressing #476
AWS doesn't expose IOPS through its API (discussed here), so this scrapes the data from the EC2 instance types documentation (https://aws.amazon.com/ec2/instance-types/), specifically the "Instance store specifications" tables in gp.md, co.md, mo.md, so.md, ac.md, and hpc.md, and merges the IOPS values
Check out https://ec2instances.narekg.me/ to see this deployed.
The deploy source is in the main branch of my fork. A daily GitHub Actions cron re-scrapes and redeploys.
This is the plan I wrote for codegen assistance:
I am using ec2instances.info to gather EC2 instance info and choose instances.
I ran into an issue where local instance storage NVMe specs are not filterable in the UI.
There have been past issues reporting this on the repo.
Issue refs: #476, a currently open issue with some links.
Another issue: #587.
There they discuss whether this could be done via the AWS API, but the conclusion seems to be no.
It would be worth investigating this further before giving up on the API.
This amazon page has instance type info: https://docs.aws.amazon.com/ec2/latest/instancetypes/ (markdown of the page: https://docs.aws.amazon.com/ec2/latest/instancetypes/instance-types.md)
In there, there is a section with title.
And under it contains the following:
Current generation instances
For the best performance, we recommend that you use the following instance types when you launch new instances. For more information, see Amazon EC2 Instance Types.
If you open each .md file from above, you will see an "Instance store specifications" section containing a table with a column called "100% random read IOPS / Write IOPS".
That column contains read and write IOPS for local NVMe. I would like to scrape and clean this data and expose it as 2 columns that can be sorted, filtered, etc.
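To make the parsing step concrete, here is a rough sketch of splitting that column into two numeric fields. It is not the code in this PR. It assumes the table is a plain pipe-delimited markdown table and that each cell looks like `read / write` with comma-grouped numbers. The sample rows, the instance names, and the function names are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Column name taken from the docs as quoted above.
IOPS_HEADER = "100% random read IOPS / Write IOPS"

# Hypothetical sample in the ASSUMED layout; names and numbers are made up.
SAMPLE_MD = """\
## Instance store specifications

| Instance type | Instance store volumes | 100% random read IOPS / Write IOPS |
|---|---|---|
| ex1.large | 1 x 75 GB NVMe SSD | 30,000 / 15,000 |
| ex1.xlarge | 1 x 150 GB NVMe SSD | 60,000 / 30,000 |
| ex2.large | 1 x 32 GB SSD | - |
"""


def split_row(line: str) -> list:
    """Split one markdown table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_iops_cell(cell: str) -> Optional[Tuple[int, int]]:
    """Return (read, write) if the cell holds exactly two numbers, else None."""
    numbers = re.findall(r"\d[\d,]*", cell)
    if len(numbers) != 2:
        return None
    read, write = (int(n.replace(",", "")) for n in numbers)
    return read, write


def parse_instance_store_iops(markdown: str) -> Dict[str, Dict[str, int]]:
    """Map instance type -> {'read_iops', 'write_iops'} for rows that parse."""
    result: Dict[str, Dict[str, int]] = {}
    iops_col = None  # index of the IOPS column in the current table, if any
    for line in markdown.splitlines():
        if not line.lstrip().startswith("|"):
            iops_col = None  # we left the table
            continue
        cells = split_row(line)
        if IOPS_HEADER in cells:
            iops_col = cells.index(IOPS_HEADER)
            continue
        if iops_col is None or iops_col >= len(cells):
            continue
        parsed = parse_iops_cell(cells[iops_col])
        if parsed is not None:  # skips the separator row and empty cells
            result[cells[0]] = {"read_iops": parsed[0], "write_iops": parsed[1]}
    return result
```

Rows without two parseable numbers are skipped rather than guessed at, so instance types with no published IOPS would simply have no value in the new columns.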
Here is the full plan:
Propose a mechanism for scraping and exposing all instance storage IOPS. Make this an internal route that only I can call to trigger scraping, and that I can schedule for regular once-a-day scraping via Cloudflare's scheduled (cron) triggers.
Add a read-only debug endpoint where I can send a GET request and get the raw data that has been scraped and structured, as well as the last scrape date and time. Keep each unique version of a scraped .md file in a Cloudflare blob store with its scrape date. If something changes in a .md file since the last version, make another copy with a new date. Do this even if the exact same file existed at some point in the past history. This way I will have copies of all files retained whenever there is a change. If the latest scrape looks exactly like the last one, do not save a separate file. If something does change, make sure it is reflected in the website.
Ideally, I would like to avoid scraping everything in the cloud myself. Instead, I would like to download https://instances.vantagestaging.sh/www_pre_build.tar.gz once a day, as shown in the README, and base things on that. Be careful not to run into Cloudflare Worker usage limits, however. If you think this is doomed to fail, tell me early instead of going forward.
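The retention rule in the plan above (save a new copy only when a file differs from the most recent stored version, even if it matches an older one) could look roughly like this. It is a sketch under assumptions, not the deployed implementation: an in-memory dict stands in for the Cloudflare blob store, and `Snapshot`, `SnapshotStore`, and `record` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Snapshot:
    scraped_at: str  # scrape timestamp, supplied by the caller
    sha256: str      # hash of the file body, used for change detection
    body: str        # raw markdown as scraped


@dataclass
class SnapshotStore:
    # filename -> versions in scrape order (stand-in for a real blob store)
    history: Dict[str, List[Snapshot]] = field(default_factory=dict)

    def record(self, filename: str, body: str, scraped_at: str) -> bool:
        """Store body if it differs from the latest version; return True if stored."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        versions = self.history.setdefault(filename, [])
        # Compare only against the most recent version, so content that
        # reappears after an intermediate change is saved again.
        if versions and versions[-1].sha256 == digest:
            return False
        versions.append(Snapshot(scraped_at, digest, body))
        return True
```

A `True` return is the natural signal to regenerate the site data. The last entry of each list also gives the debug endpoint its "last changed" timestamp, though the time of the last scrape attempt would need to be tracked separately.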