Commit d30954a

Merge pull request #21 from proxymesh/feature/python-direct-proxy-examples
Refactor Python proxy examples to use libraries directly
2 parents 82ed11e + e74b333 commit d30954a

28 files changed

Lines changed: 557 additions & 567 deletions

.github/workflows/proxy_integration_tests_javascript.yml

Lines changed: 2 additions & 2 deletions
@@ -19,12 +19,12 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
         with:
           persist-credentials: false
 
       - name: Set up Node
-        uses: actions/setup-node@v4
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0 (Node 24)
         with:
           node-version: "24"
           cache: npm

.github/workflows/proxy_integration_tests_php.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
         with:
           persist-credentials: false
 

.github/workflows/proxy_integration_tests_python.yml

Lines changed: 6 additions & 5 deletions
@@ -19,22 +19,23 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
         with:
           persist-credentials: false
 
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 (Node 24)
         with:
-          python-version: "3.x"
+          # Pin for reproducible dependency wheels (pycurl, etc.); adjust as needed.
+          python-version: "3.12"
 
       - name: Install system dependencies (pycurl)
         run: sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev
 
-      - name: Install python-proxy-headers and example dependencies
+      - name: Install example dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install python-proxy-headers requests urllib3 aiohttp httpx cloudscraper autoscraper pycurl
+          pip install -r python/requirements.txt
 
       - name: Require PROXY_URL Actions secret
         env:

.github/workflows/proxy_integration_tests_ruby.yml

Lines changed: 2 additions & 2 deletions
@@ -19,12 +19,12 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
         with:
           persist-credentials: false
 
       - name: Set up Ruby
-        uses: ruby/setup-ruby@2e007403fc1ec238429ecaa57af6f22f019cc135 # v1.234.0
+        uses: ruby/setup-ruby@3ff19f5e2baf30647122352b96108b1fbe250c64 # v1.299.0 (Node 24)
         with:
           ruby-version: "3.3"
           bundler-cache: true

README.md

Lines changed: 24 additions & 37 deletions
@@ -9,69 +9,56 @@ Example code for using proxy servers in different programming languages. Current
 
 ## Python Proxy Examples
 
-### Using python-proxy-headers
-
-The [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) library enables sending custom headers to proxy servers and receiving proxy response headers. This is essential for services like [ProxyMesh](https://proxymesh.com) that use custom headers for country selection and IP assignment.
-
 **Installation:**
 
 ```bash
-pip install python-proxy-headers
+pip install -r python/requirements.txt
 ```
 
-**Running Examples:**
+`pycurl` needs libcurl and `curl-config` (for example Debian/Ubuntu: `libcurl4-openssl-dev`). The test runner skips `pycurl-*` examples when `pycurl` is not installed, and skips `scrapy-proxy` when `import scrapy` fails (for example a broken `cryptography` / `cffi` install).
 
-All examples read proxy configuration from environment variables:
+**Running Examples:**
 
 ```bash
 # Required: Set your proxy URL
 export PROXY_URL='http://user:pass@proxy.example.com:8080'
 
-# Optional: Custom test URL (default: https://api.ipify.org?format=json)
+# Optional: Target URL (default: https://api.ipify.org?format=json)
 export TEST_URL='https://httpbin.org/ip'
 
-# Optional: Send a custom header to the proxy
-export PROXY_HEADER='X-ProxyMesh-Country'
-export PROXY_VALUE='US'
-
-# Optional: Read a specific header from the response
+# Optional: Print one response header
 export RESPONSE_HEADER='X-ProxyMesh-IP'
 
-# Run a single example
-python python/requests-proxy-headers.py
+# Single example
+python python/requests-proxy.py
 
-# Run all examples as tests
+# All examples as tests
 python python/run_tests.py
 
-# Run specific examples
-python python/run_tests.py requests-proxy-headers httpx-proxy-headers
+# Specific examples (substring match, like the JS runner)
+python python/run_tests.py requests httpx
 ```
 
 **Examples:**
 
 | Library | Example | Description |
 |---------|---------|-------------|
-| [requests](https://docs.python-requests.org/) | [requests-proxy-headers.py](python/requests-proxy-headers.py) | Simple HTTP requests with proxy headers |
-| [requests](https://docs.python-requests.org/) | [requests-proxy-headers-session.py](python/requests-proxy-headers-session.py) | Session-based requests for connection pooling |
-| [urllib3](https://urllib3.readthedocs.io/) | [urllib3-proxy-headers.py](python/urllib3-proxy-headers.py) | Low-level HTTP client with proxy headers |
-| [aiohttp](https://docs.aiohttp.org/) | [aiohttp-proxy-headers.py](python/aiohttp-proxy-headers.py) | Async HTTP client with proxy headers |
-| [httpx](https://www.python-httpx.org/) | [httpx-proxy-headers.py](python/httpx-proxy-headers.py) | Modern HTTP client with proxy headers |
-| [httpx](https://www.python-httpx.org/) | [httpx-async-proxy-headers.py](python/httpx-async-proxy-headers.py) | Async httpx with proxy headers |
-| [pycurl](http://pycurl.io/) | [pycurl-proxy-headers.py](python/pycurl-proxy-headers.py) | libcurl bindings with proxy headers |
-| [pycurl](http://pycurl.io/) | [pycurl-proxy-headers-lowlevel.py](python/pycurl-proxy-headers-lowlevel.py) | Low-level pycurl integration |
-| [cloudscraper](https://github.com/venomous/cloudscraper) | [cloudscraper-proxy-headers.py](python/cloudscraper-proxy-headers.py) | Cloudflare bypass with proxy headers |
-| [autoscraper](https://github.com/alirezamika/autoscraper) | [autoscraper-proxy-headers.py](python/autoscraper-proxy-headers.py) | Automatic web scraping with proxy headers |
-
-> **Note:** Most Python HTTP libraries do not expose custom headers on HTTPS `CONNECT` tunneling by default. These examples use [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) adapters to send proxy headers and read proxy response headers consistently.
-
-### Basic Proxy Examples
-
-* [requests-proxy.py](python/requests-proxy.py) - Basic proxy usage with requests
-* [requests-random-proxy.py](python/requests-random-proxy.py) - Random proxy rotation
+| [requests](https://docs.python-requests.org/) | [requests-proxy.py](python/requests-proxy.py) | Basic `GET` with `proxies=` |
+| [requests](https://docs.python-requests.org/) | [requests-session-proxy.py](python/requests-session-proxy.py) | Session with pooled connections |
+| [urllib3](https://urllib3.readthedocs.io/) | [urllib3-proxy.py](python/urllib3-proxy.py) | `ProxyManager` |
+| [aiohttp](https://docs.aiohttp.org/) | [aiohttp-proxy.py](python/aiohttp-proxy.py) | Async client, `proxy=` on the request |
+| [httpx](https://www.python-httpx.org/) | [httpx-proxy.py](python/httpx-proxy.py) | Sync client, `proxy=` on the client |
+| [httpx](https://www.python-httpx.org/) | [httpx-async-proxy.py](python/httpx-async-proxy.py) | Async client |
+| [pycurl](http://pycurl.io/) | [pycurl-proxy.py](python/pycurl-proxy.py) | libcurl via `setopt` (`PROXY`, `WRITEDATA`, etc.) |
+| [cloudscraper](https://github.com/VeNoMouS/cloudscraper) | [cloudscraper-proxy.py](python/cloudscraper-proxy.py) | Requests-based scraper with `proxies` |
+| [autoscraper](https://github.com/alirezamika/autoscraper) | [autoscraper-proxy.py](python/autoscraper-proxy.py) | Offline `html=` demo (matches upstream tests); README shows `request_args` + `proxies` for live URLs |
+| [Scrapy](https://scrapy.org/) | [scrapy-proxy.py](python/scrapy-proxy.py) | `scrapy runspider` with `meta['proxy']` |
+
+### Other Python scripts
 
-### Scrapy
+* [requests-random-proxy.py](python/requests-random-proxy.py) - Random proxy rotation
 
-* [scrapy-proxy-headers.py](python/scrapy-proxy-headers.py) - Scrapy spider with proxy headers
+> **Note:** Like the Ruby, JavaScript, and PHP examples here, these scripts use each library's normal proxy options only. Most of them do not send custom headers on the HTTPS `CONNECT` tunnel or surface proxy `CONNECT` response headers. For that, see [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) or [scrapy-proxy-headers](https://github.com/proxymesh/scrapy-proxy-headers).
 
 ## JavaScript / Node.js Proxy Examples
 
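
The updated README describes two runner behaviours: examples are selected by substring match, and an example is skipped when its library cannot be imported. The sketch below illustrates one way those two rules could be expressed. It is not the contents of `python/run_tests.py`, which this commit page does not show; the function names `select_examples` and `can_import` are hypothetical.

```python
import importlib
from typing import Iterable, List


def select_examples(names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep names that contain any pattern as a substring.

    With no patterns, every name is kept (run everything).
    """
    patterns = list(patterns)
    if not patterns:
        return list(names)
    return [name for name in names if any(p in name for p in patterns)]


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly.

    A real import is attempted (rather than only locating the module) so that
    a broken install that raises during import also counts as unavailable.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # ImportError, or errors raised by a broken dependency
        return False
    return True


if __name__ == "__main__":
    # Hypothetical example names, taken from the README table above.
    examples = ["requests-proxy", "httpx-proxy", "httpx-async-proxy", "pycurl-proxy"]
    for name in select_examples(examples, ["httpx", "pycurl"]):
        module = name.split("-")[0]
        status = "run" if can_import(module) else "skip (library not importable)"
        print(f"{name}: {status}")
```

Attempting the import instead of checking for the package on disk matches the README's description of skipping `scrapy-proxy` when `import scrapy` fails.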

python/aiohttp-proxy-headers.py

Lines changed: 0 additions & 48 deletions
This file was deleted.

python/aiohttp-proxy.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+#!/usr/bin/env python3
+"""
+aiohttp with an HTTP proxy.
+
+Configuration via environment variables:
+    PROXY_URL - Proxy URL (required), e.g., http://user:pass@proxy:8080
+    TEST_URL - URL to request (default: https://api.ipify.org?format=json)
+    RESPONSE_HEADER - Optional header name to print from the response
+
+Documentation: https://docs.aiohttp.org/en/stable/client_advanced.html#proxy-support
+"""
+import asyncio
+import os
+import sys
+
+import aiohttp
+
+proxy_url = os.environ.get('PROXY_URL') or os.environ.get('HTTPS_PROXY')
+if not proxy_url:
+    print('Error: Set PROXY_URL environment variable', file=sys.stderr)
+    sys.exit(1)
+
+test_url = os.environ.get('TEST_URL', 'https://api.ipify.org?format=json')
+response_header = os.environ.get('RESPONSE_HEADER')
+
+
+async def main() -> None:
+    timeout = aiohttp.ClientTimeout(total=30)
+    async with aiohttp.ClientSession(timeout=timeout) as session:
+        async with session.get(test_url, proxy=proxy_url) as response:
+            body = await response.text()
+            print(f'Status: {response.status}')
+            print(f'Body: {body}')
+            if response_header:
+                print(f'{response_header}: {response.headers.get(response_header)}')
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
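
The script above reads its settings at module level: `PROXY_URL` with a fallback to `HTTPS_PROXY`, `TEST_URL` with a default, and an optional `RESPONSE_HEADER`. A minimal sketch of the same lookup as a reusable function follows. The names `ProxyConfig` and `load_config` are hypothetical and do not appear in the commit; raising `ValueError` instead of calling `sys.exit(1)` is a choice made here so the function can be tested.

```python
import os
from typing import Mapping, NamedTuple, Optional

DEFAULT_TEST_URL = "https://api.ipify.org?format=json"


class ProxyConfig(NamedTuple):
    proxy_url: str
    test_url: str
    response_header: Optional[str]


def load_config(environ: Mapping[str, str] = os.environ) -> ProxyConfig:
    """Read the same three settings the example script reads.

    PROXY_URL wins over HTTPS_PROXY; an empty or missing value for both
    is treated as an error, mirroring the script's `if not proxy_url` check.
    """
    proxy_url = environ.get("PROXY_URL") or environ.get("HTTPS_PROXY")
    if not proxy_url:
        raise ValueError("Set PROXY_URL environment variable")
    return ProxyConfig(
        proxy_url=proxy_url,
        test_url=environ.get("TEST_URL", DEFAULT_TEST_URL),
        response_header=environ.get("RESPONSE_HEADER"),
    )


if __name__ == "__main__":
    # Passing a plain dict keeps the demo independent of the real environment.
    print(load_config({"PROXY_URL": "http://user:pass@proxy:8080"}))
```

Taking the environment as a parameter lets a caller pass a plain dictionary in tests while defaulting to `os.environ` in normal use.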

python/autoscraper-proxy-headers.py

Lines changed: 0 additions & 45 deletions
This file was deleted.

python/autoscraper-proxy.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+"""
+AutoScraper with a proxy (how to pass ``request_args``).
+
+The AutoScraper project tests ``build`` / ``get_result_similar`` with **inline HTML**
+only — see ``tests/unit/test_build.py`` and ``tests/integration/`` in
+https://github.com/alirezamika/autoscraper — not with live URLs. That keeps tests
+deterministic. This script does the same for the integration runner.
+
+**Using a proxy with a real URL** matches the library README::
+
+    scraper.build(url, wanted_list, request_args={'proxies': proxies, 'timeout': 30})
+    scraper.get_result_similar(url, request_args={'proxies': proxies, 'timeout': 30})
+
+``PROXY_URL`` is required here so this example fits the same env as the other scripts;
+this demo does not open a network connection — it only exercises AutoScraper on
+embedded HTML.
+
+Configuration via environment variables:
+    PROXY_URL - Required by the test runner (same as other examples), e.g.
+        http://user:pass@proxy:8080
+
+Documentation: https://github.com/alirezamika/autoscraper
+"""
+import os
+import sys
+
+from autoscraper import AutoScraper
+
+# Same idea as upstream tests/unit/test_build.py — fixed HTML, no HTTP.
+SAMPLE_HTML = """<!DOCTYPE html>
+<html><head><title>Proxy example</title></head>
+<body>
+<h1>AutoScraper proxy example</h1>
+<p>Paragraph one.</p>
+</body></html>
+"""
+PLACEHOLDER_URL = 'https://example.invalid/autoscraper-proxy-demo'
+
+
+def main() -> None:
+    scraper = AutoScraper()
+    wanted_list = ['AutoScraper proxy example']
+    learned = scraper.build(
+        html=SAMPLE_HTML,
+        url=PLACEHOLDER_URL,
+        wanted_list=wanted_list,
+    )
+    similar = scraper.get_result_similar(html=SAMPLE_HTML, url=PLACEHOLDER_URL)
+    print(f'AutoScraper build: {learned}')
+    print(f'AutoScraper get_result_similar: {similar}')
+    if not learned:
+        sys.exit(1)
+
+
+if __name__ == '__main__':
+    if not (os.environ.get('PROXY_URL') or os.environ.get('HTTPS_PROXY')):
+        print('Error: Set PROXY_URL environment variable', file=sys.stderr)
+        sys.exit(1)
+    main()
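
The docstring above shows `request_args={'proxies': proxies, 'timeout': 30}` for live URLs but leaves `proxies` undefined, since the demo itself stays offline. The sketch below shows one way that dictionary could be built from `PROXY_URL`. The helper name `build_request_args` is hypothetical, and routing both `http` and `https` traffic through the same proxy is an assumption; the `proxies` mapping uses the scheme-keyed form that `requests` accepts.

```python
from typing import Any, Dict


def build_request_args(proxy_url: str, timeout: float = 30) -> Dict[str, Any]:
    """Build the request_args dict shown in the docstring.

    Assumption: the same proxy handles both plain HTTP and HTTPS targets.
    """
    proxies = {"http": proxy_url, "https": proxy_url}
    return {"proxies": proxies, "timeout": timeout}


if __name__ == "__main__":
    args = build_request_args("http://user:pass@proxy:8080")
    print(args)
    # With a live URL, the docstring's calls would then look like:
    #   scraper.build(url, wanted_list, request_args=args)
    #   scraper.get_result_similar(url, request_args=args)
```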
