Scrape Instance store IOPS data and present as an additional filterable sortable column by Ngalstyan4 · Pull Request #892 · vantage-sh/ec2instances.info

Ngalstyan4 · 2026-04-24T19:02:52Z

This is an attempt at addressing #476.

AWS doesn't expose IOPS through its API (discussed here), so this scrapes the data from the EC2 instance types documentation (https://aws.amazon.com/ec2/instance-types/), specifically the "Instance store specifications" tables in gp.md, co.md, mo.md, so.md, ac.md, and hpc.md, and merges in the IOPS values.

Check out https://ec2instances.narekg.me/ to see this deployed.
The deploy source is in the main branch of my fork. A daily GitHub Actions cron re-scrapes and redeploys.

This is the plan I wrote for codegen assistance:

I use ec2instances.info to look up EC2 instance information and choose instance types.

I ran into an issue: local instance storage NVMe specs are not filterable in the UI.

Past issues on the repo have reported this.
Issue refs: #476, a currently open issue with some links.

Another issue: #587
There they discuss whether this could be done via the AWS API, but the conclusion seems to be no.
It would be worth investigating this further before giving up on the API.

This Amazon page has instance type info: https://docs.aws.amazon.com/ec2/latest/instancetypes/ (Markdown version of the page: https://docs.aws.amazon.com/ec2/latest/instancetypes/instance-types.md)

In there, there is a section with title.

And under it contains the following:

Current generation instances

For the best performance, we recommend that you use the following instance types when you launch new instances. For more information, see Amazon EC2 Instance Types.

**General purpose:** M5 | M5a | M5ad | M5d | M5dn | M5n | M5zn | M6a | M6g | M6gd | M6i | M6id | M6idn | M6in | M7a | M7g | M7gd | M7i | M7i-flex | M8a | M8azn | M8g | M8gb | M8gd | M8gn | M8i | M8id | M8i-flex | Mac1 | Mac2 | Mac2-m1ultra | Mac2-m2 | Mac2-m2pro | Mac-m4 | Mac-m4pro | Mac-m4max | T2 | T3 | T3a | T4g
**Compute optimized:** C5 | C5a | C5ad | C5d | C5n | C6a | C6g | C6gd | C6gn | C6i | C6id | C6in | C7a | C7g | C7gd | C7gn | C7i | C7i-flex | C8a | C8g | C8gb | C8gd | C8gn | C8i | C8ib | C8id | C8in | C8i-flex
**Memory optimized:** R5 | R5a | R5ad | R5b | R5d | R5dn | R5n | R6a | R6g | R6gd | R6i | R6id | R6idn | R6in | R7a | R7g | R7gd | R7i | R7iz | R8a | R8g | R8gb | R8gd | R8gn | R8i | R8id | R8i-flex | U-3tb1 | U-6tb1 | U-9tb1 | U-12tb1 | U-18tb1 | U-24tb1 | U7i-6tb | U7i-8tb | U7i-12tb | U7in-16tb | U7in-24tb | U7in-32tb | U7inh-32tb | X1 | X1e | X2gd | X2idn | X2iedn | X2iezn | X8g | X8aedz | X8i | z1d
**Storage optimized:** D2 | D3 | D3en | H1 | I3 | I3en | I4g | I4i | I7i | I7ie | I8g | I8ge | Im4gn | Is4gen
**Accelerated computing:** DL1 | DL2q | F1 | F2 | G4ad | G4dn | G5 | G5g | G6 | G6e | G6f | Gr6 | Gr6f | G7e | Inf1 | Inf2 | P4d | P4de | P5 | P5e | P5en | P6-B200 | P6-B300 | P6e-GB200 | Trn1 | Trn1n | Trn2 | Trn2u | VT1
**High-performance computing:** Hpc6a | Hpc6id | Hpc7a | Hpc7g | Hpc8a

If you open each .md file from above, you will see an "Instance store specifications" section with a table that has a column called "100% random read IOPS / Write IOPS".
That column contains read and write IOPS for local NVMe. I would like to scrape and clean this data and expose it as two columns that can be sorted, filtered, etc.
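As a rough illustration of the parsing step described above, here is a minimal Python sketch. It assumes the tables are plain Markdown pipe tables, that the instance type is in the first column, and that IOPS cells look like `16,283 / 7,105`. The function names and the cell format are assumptions for illustration, not what the PR's scraper actually does.

```python
import re

# Header text of the column we want, compared case-insensitively.
IOPS_HEADER = "100% random read iops / write iops"


def parse_iops_cell(cell):
    """Split a cell such as '16,283 / 7,105' into (read, write) ints.

    Returns None when the cell is empty or not in the assumed format.
    """
    parts = [p.strip() for p in cell.split("/")]
    if len(parts) != 2:
        return None
    try:
        read, write = (int(re.sub(r"[,\s]", "", p)) for p in parts)
    except ValueError:
        return None
    return read, write


def parse_instance_store_table(markdown):
    """Return {instance_type: (read_iops, write_iops)} from pipe-table rows."""
    result = {}
    iops_col = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            iops_col = None  # left the table; wait for the next header row
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        lowered = [c.lower() for c in cells]
        if IOPS_HEADER in lowered:
            iops_col = lowered.index(IOPS_HEADER)
            continue
        if iops_col is None or iops_col >= len(cells):
            continue
        parsed = parse_iops_cell(cells[iops_col])
        if parsed is not None:  # skips separator rows and empty cells
            result[cells[0]] = parsed
    return result
```

Returning two separate integers per instance type is what makes it possible to expose read and write IOPS as two independently sortable columns.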

Here is the full plan:

Propose a mechanism for scraping and exposing all instance storage IOPS. Make this an internal route that only I can call to trigger scraping, and that I can schedule for regular once-a-day scraping via Cloudflare scheduled jobs.
Add a read-only debug endpoint where I can send a GET request and get the raw data that has been scraped and structured, as well as the last scrape date and time. Keep each unique version of a scraped .md file in a Cloudflare blob store with its scrape date. If something changes in an .md file since the last version, make another copy with a new date. Do this even if the exact same file existed at some point in past history. This way I will have copies of all files retained whenever there is a change. If the latest scrape looks exactly like the last one, do not save a separate file. If something does change, make sure it is reflected on the website.
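The retention rule above (store a new copy only when the content differs from the most recent version, even if identical content existed further back in history) could be sketched roughly as follows. This is a hypothetical in-memory stand-in for the blob store; `SnapshotStore` and `save_if_changed` are illustrative names, not part of the PR or of any Cloudflare API.

```python
import hashlib
from datetime import datetime, timezone


class SnapshotStore:
    """In-memory stand-in for a blob store of dated file versions."""

    def __init__(self):
        # name -> list of (ISO timestamp, sha256 hex digest, content)
        self.versions = {}

    def save_if_changed(self, name, content, now=None):
        """Store content unless it matches the latest saved version.

        Only the most recent version is compared, so content that
        reappears after an intermediate change is stored again.
        Returns True when a new version was saved.
        """
        now = now or datetime.now(timezone.utc)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(name, [])
        if history and history[-1][1] == digest:
            return False
        history.append((now.isoformat(), digest, content))
        return True
```

Comparing a digest against the latest entry, rather than against every past entry, is what keeps a full change history while still skipping unchanged daily scrapes.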

Ideally, I would like to avoid scraping everything in the cloud myself. Instead, I would like to download https://instances.vantagestaging.sh/www_pre_build.tar.gz, as shown in the README, once a day and base things on that. Be careful not to run into Cloudflare Worker usage limits, however. If you think this is doomed to fail, tell me early instead of going forward.

IAmJSD · 2026-05-04T11:44:21Z

As a now external user of the dataset, I'd incorporate this into the Go scraper (most of the methods should be there to help you). Other than that LGTM!

Ngalstyan4 · 2026-05-04

Thanks for taking a look, @IAmJSD!
Happy to make the necessary changes to bring this upstream, but I would first want to hear from a current maintainer (you or someone else!) that they are interested in having this upstream.

In the meantime, the fork I linked above, along with the GitHub CI, works for me.

IAmJSD · 2026-05-04

I'm no longer a maintainer so this isn't my call, just inputting my 2 cents :)

emilydunenfeld · 2026-05-12

I agree with Astrid’s suggestion. The UI/table changes look reasonable, but I’d prefer to see the IOPS scrape folded into the Go scraper so it becomes part of the normal EC2 data generation path rather than a separate Node script.

Ngalstyan4 · 2026-05-12

OK, thanks for getting back to me, @emilydunenfeld!
I will do my best to get around to this in the next couple of weeks and fold this into the existing Go scraper.
Happy to also defer to someone on your team to make the changes if you would prefer that.

In any case, I will check for any changes on this PR before starting my own, and will update the PR if there are still no updates from other people at that time.

Thanks again!

emilydunenfeld · 2026-05-05T18:36:37Z

Hi Ngalstyan4! Will look into this soon, ty for your patience

Ngalstyan4 added 2 commits April 24, 2026 13:26

Scrape Instance store read/write IOPS and expose in table

b000fdb

Update golden copy

dee39a1

Ngalstyan4 mentioned this pull request Apr 24, 2026

Display Attached Storage iOPs for relevant instance classes #476

Open

IAmJSD suggested changes May 4, 2026

Ngalstyan4 wants to merge 2 commits into vantage-sh:develop from Ngalstyan4:narek/nvme-iops

3 participants
