Commit c5f4e46

Allow Software Heritage IP addresses
Software Heritage is similar to the Internet Archive (which is already allowed), but it focuses on archiving source code from forges rather than pages from websites. Software Heritage does not use generic web crawling. Instead, it uses forge-specific and VCS-specific tools designed to use as few resources as possible, notably by pulling incrementally, using (or adding) APIs, and skipping repositories that have not changed.

See-also: https://github.com/TecharoHQ/anubis/pulls/276
See-also: https://www.softwareheritage.org/
See-also: https://www.softwareheritage.org/software-heritage-faq/
See-also: https://docs.softwareheritage.org/user/faq/
See-also: https://gitlab.softwareheritage.org/swh
See-also: https://gitlab.softwareheritage.org/swh/infra/add-forge-now-requests/-/merge_requests/4
1 parent dbd64e0 commit c5f4e46

3 files changed

Lines changed: 6 additions & 0 deletions

data/crawlers/_allow-good.yaml

Lines changed: 1 addition & 0 deletions
@@ -8,5 +8,6 @@
 - import: (data)/crawlers/marginalia.yaml
 - import: (data)/crawlers/mojeekbot.yaml
 - import: (data)/crawlers/commoncrawl.yaml
+- import: (data)/crawlers/software-heritage.yaml
 - import: (data)/crawlers/wikimedia-citoid.yaml
 - import: (data)/crawlers/yandexbot.yaml

data/crawlers/software-heritage.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- name: software-heritage
+  action: ALLOW
+  # https://docs.softwareheritage.org/user/faq/#which-ip-address-range-should-we-mark-as-safe-in-our-anti-bot-protection-systems
+  remote_addresses: [ "128.93.166.0/26" ]
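
The new rule admits any client whose address falls inside 128.93.166.0/26. A /26 leaves 32 - 26 = 6 host bits, so the range holds 2^6 = 64 addresses, 128.93.166.0 through 128.93.166.63. The following is a minimal Go sketch of that membership check using the standard library's `net/netip`. It is an illustration only, not Anubis's actual matcher; the helper `isSoftwareHeritage` is hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// softwareHeritagePrefix mirrors the remote_addresses entry in
// data/crawlers/software-heritage.yaml: 6 host bits, 64 addresses
// (128.93.166.0 through 128.93.166.63).
var softwareHeritagePrefix = netip.MustParsePrefix("128.93.166.0/26")

// isSoftwareHeritage is a hypothetical helper (not part of Anubis) that
// reports whether addr lies inside the allowed range.
func isSoftwareHeritage(addr string) (bool, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	// Unmap so IPv4-mapped IPv6 forms such as ::ffff:128.93.166.10 are
	// compared as plain IPv4; Prefix.Contains would otherwise return false.
	return softwareHeritagePrefix.Contains(ip.Unmap()), nil
}

func main() {
	for _, a := range []string{"128.93.166.1", "128.93.166.63", "128.93.166.64", "203.0.113.7"} {
		ok, err := isSoftwareHeritage(a)
		if err != nil {
			fmt.Println(a, "error:", err)
			continue
		}
		fmt.Printf("%s allowed=%v\n", a, ok)
	}
	// Prints:
	// 128.93.166.1 allowed=true
	// 128.93.166.63 allowed=true
	// 128.93.166.64 allowed=false
	// 203.0.113.7 allowed=false
}
```

`netip.Addr` is used rather than `net.IP` because it is a small comparable value type and `netip.Prefix.Contains` performs the masked comparison directly.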

docs/docs/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+- Allow requests from Software Heritage
 - Expose [pprof endpoints](https://pkg.go.dev/net/http/pprof) on the metrics listener to enable profiling Anubis in production.
 - fix: prevent nil pointer panic in challenge validation when threshold rules match during PassChallenge (#1463)
 - Instruct reverse proxies to not cache error pages.