Commit c7ec702

Merge branch 'ThaiHipster-robert_branch'
2 parents 492d120 + cd7e9ce commit c7ec702

4 files changed

Lines changed: 9 additions & 3 deletions

.build.sh

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 python3 -m pip install --user virtualenv
 python3 -m venv .venv
 source .venv/bin/activate
-cd scrapy/schools
+cd schools
 pip install -r requirements.txt
 pip install schools --no-index --find-links .

docker-compose.yml

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 version: '3.5'
 services:
   crawler_api:
-    build: scrapy/schools # Build the Scrapy Dockerfile.
+    build: schools # Build the Scrapy Dockerfile.
     depends_on:
       - mongodb_container
     logging:
@@ -38,7 +38,7 @@ services:
       - "7480:6379"
     # command: rq worker crawling-tasks
   redis_worker:
-    build: scrapy/schools
+    build: schools
     depends_on:
       - mongodb_container
       - redis

schools/schools/spiders/scrapy_vanilla.py

Lines changed: 6 additions & 0 deletions
@@ -162,6 +162,12 @@ def parse_items(self, response):
         if 'text/html' in str(response.headers['Content-Type']):
             for href in response.xpath('//a/@href').getall():
                 yield Request(response.urljoin(href), self.parse_items)
+            yield from response.follow_all(
+                css="a[href]" \
+                    + ":not([href^='javascript:'])" \
+                    + ":not([href^='tel:'])" \
+                    + ":not([href^='mailto:'])",
+                callback=self.parse_items)

     def init_from_school_list(self, school_list):
         """
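
Note on the added selector: the CSS passed to `response.follow_all` keeps only `<a>` elements that have an `href` attribute whose value does not start with `javascript:`, `tel:`, or `mailto:`. The sketch below approximates that filter with the standard library alone, so it can be checked without Scrapy installed. The names `LinkCollector`, `followable_links`, and `SAMPLE` are hypothetical and not part of the commit, and the sketch does not resolve relative URLs or schedule requests the way `follow_all` does.

```python
from html.parser import HTMLParser

# Prefixes excluded by the :not([href^='...']) clauses in the diff.
SKIPPED_PREFIXES = ("javascript:", "tel:", "mailto:")


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping the excluded prefixes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # a[href] only requires that the attribute is present.
        if "href" not in attributes:
            return
        href = attributes["href"] or ""
        # [href^='...'] is a case-sensitive prefix match on the raw value.
        if not href.startswith(SKIPPED_PREFIXES):
            self.links.append(href)


def followable_links(html):
    """Return the href values the selector in the diff would keep."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical sample page, for illustration only.
SAMPLE = """
<a href="/about">About</a>
<a href="mailto:info@example.org">Mail</a>
<a href="tel:+15551234567">Call</a>
<a href="javascript:void(0)">Menu</a>
<a name="top">Anchor without href</a>
<a href="https://example.org/contact">Contact</a>
"""

print(followable_links(SAMPLE))
# ['/about', 'https://example.org/contact']
```

Because the prefix match is case-sensitive, a link written as `MAILTO:...` would pass this filter. The same appears to hold for the CSS attribute selector in the diff, which may or may not matter for the crawled pages.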
