Commit 20f0efb

Merge pull request #8 from InsightDataCommunity/dev-dualdocker-20190204
Dev dualdocker 20190204
2 parents 279e969 + e7cbc96 commit 20f0efb

30 files changed

Lines changed: 3808 additions & 41 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,8 @@
 results/
 .vscode/
 
+src/app/models/SentimentV1/uncased_L-12_H-768_A-12/
+*.zip
 # Byte-compiled / optimized / DLL files
 .DS_store
 __pycache__/
@@ -14,6 +16,7 @@ __pycache__/
 
 # model files
 *.h5
+*.bin
 
 # Distribution / packaging
 .Python

README.md

Lines changed: 71 additions & 1 deletion
@@ -1,7 +1,10 @@
 # Sherlock
 Sherlock is a web platform that allows users to create an image classifier for custom images, based on pre-trained CNN models. It also allows using the customized CNN to pre-label images, and re-training the customized CNN when more training data becomes available.
 
-Sherlock is currently serveing as RESTful APIs.
+Sherlock is currently serving as RESTful APIs.
+
+- [Sherlock for NLP](#sherlock-for-nlp)
+
 
 [Here](http://bit.ly/michaniki_demo) are the slides for project Sherlock (previously called Michaniki).
 
@@ -75,6 +78,10 @@ Move to the directory where you cloned *Sherlock* , and run:
 ```bash
 docker-compose up --build
 ```
+Training using BERT runs much faster on a GPU with >12 GB of RAM (tested with an Nvidia K80). To train with a GPU, run:
+```bash
+docker-compose -f docker-compose-gpu.yml up --build
+```
 
 If everything goes well, you should start seeing the building message of the docker containers:
 ```
@@ -253,3 +260,66 @@ curl -X POST \
 -F train_bucket_prefix=S3_BUCKET_PREFIX/sha
 ```
 
+## Sherlock for NLP
+
+### 1. Predict Sentiment of Text with Simple run_classifier
+```bash
+curl -X POST \
+http://127.0.0.1:3031/sentimentV1/predict \
+-H 'Cache-Control: no-cache' \
+-H 'Postman-Token: eeedb319-2218-44b9-86eb-63a3a1f62e14' \
+-H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
+-F textv='the movie was bad' \
+-F model_name=base
+```
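The same request can be assembled from a script. The following is a minimal sketch using only the Python 3 standard library; the endpoint and the `textv` / `model_name` field names come from the curl example above, while the helper names (`encode_multipart`, `build_predict_request`, `send`) are hypothetical and not part of Sherlock.

```python
import urllib.request
import uuid

# Endpoint taken from the curl example; adjust host/port to your deployment.
PREDICT_URL = "http://127.0.0.1:3031/sentimentV1/predict"


def encode_multipart(fields):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="{}"'.format(name))
        lines.append("")
        lines.append(str(value))
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


def build_predict_request(text, model_name="base"):
    """Build (but do not send) the POST request for the predict endpoint."""
    body, content_type = encode_multipart({"textv": text, "model_name": model_name})
    return urllib.request.Request(
        PREDICT_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send(request):
    """Send the request; requires the Sherlock containers to be running."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the containers up, `send(build_predict_request("the movie was bad"))` should return the raw response body; its format is not documented here, so the sketch does not parse it.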
+
+### 2. Train a new classification model using a pre-trained BERT model
+
+**The new text dataset should be stored in S3 first. The directory structure in S3 should look like this**:
+```
+.
+├── YOUR_BUCKET_NAME
+│ ├── train.tsv
+│ ├── val.tsv
+│ ├── test.tsv
+```
+The folder name you use for *YOUR_MODEL_NAME* (passed as `train_bucket_prefix`) will be used to identify this model once it is trained.
+
+The names of the train, val and test files **can't be changed**.
+The train.tsv and val.tsv files should have the format below (without a header):
+id label None Sentence
+1 0 NC Text
+The test.tsv file should have only the id and sentence columns (with a header).
+**The S3 folders should have public access permission**.
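The file layout described above can be produced with a few lines of Python. This is a sketch under stated assumptions: columns are tab-separated (implied by the `.tsv` extension), the third column is a throwaway placeholder (`NC`, as in the sample row), and the test header uses the literal names `id` and `sentence`. The function names are hypothetical.

```python
def _clean(text):
    # Collapse tabs and newlines so a sentence cannot break the column layout.
    return " ".join(str(text).split())


def format_labeled_rows(rows, placeholder="NC"):
    """Format (id, label, sentence) rows for train.tsv / val.tsv, no header."""
    return "".join(
        "{}\t{}\t{}\t{}\n".format(row_id, label, placeholder, _clean(text))
        for row_id, label, text in rows
    )


def format_test_rows(rows):
    """Format (id, sentence) rows for test.tsv, with a header line."""
    header = "id\tsentence\n"  # assumed header names
    return header + "".join(
        "{}\t{}\n".format(row_id, _clean(text)) for row_id, text in rows
    )


if __name__ == "__main__":
    with open("train.tsv", "w", encoding="utf-8") as handle:
        handle.write(format_labeled_rows([(1, 0, "the movie was bad")]))
    with open("test.tsv", "w", encoding="utf-8") as handle:
        handle.write(format_test_rows([(1, "a surprisingly good film")]))
```

Writing the rows by hand rather than through the `csv` module avoids any quoting of sentences that contain quote characters, which keeps the files plain tab-separated text.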
+
+To call this API, do:
+```bash
+curl -X POST \
+http://127.0.0.1:3031/sentimentV1/trainbert \
+-H 'Cache-Control: no-cache' \
+-H 'Postman-Token: 4e90e1d6-de18-4501-a82c-f8a878616b12' \
+-H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
+-F train_bucket_name=YOUR_BUCKET_NAME \
+-F train_bucket_prefix=YOUR_MODEL_NAME
+```
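Because the three file names are fixed, it can be useful to check a bucket listing before issuing the training request. The sketch below assumes the files sit directly under the `train_bucket_prefix` folder and that you already have a list of object keys (for example from an S3 listing call); `missing_dataset_files` is a hypothetical helper, not part of Sherlock.

```python
# File names required by the README; they cannot be changed.
REQUIRED_FILES = ("train.tsv", "val.tsv", "test.tsv")


def missing_dataset_files(object_keys, prefix):
    """Return the required file names that are absent under the given prefix.

    object_keys: iterable of S3 object keys, e.g. "my_model/train.tsv".
    prefix: the value you would pass as train_bucket_prefix.
    """
    prefix = prefix.strip("/")
    present = set(object_keys)
    missing = []
    for name in REQUIRED_FILES:
        # Assumption: files live directly under the prefix folder.
        key = "{}/{}".format(prefix, name) if prefix else name
        if key not in present:
            missing.append(name)
    return missing
```

For example, `missing_dataset_files(["sentiment/train.tsv", "sentiment/test.tsv"], "sentiment")` reports that `val.tsv` still needs to be uploaded.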
+### 3. Label all text in a csv file using a pre-trained BERT model
+
+**The new test tsv file should be stored in the same S3 bucket as above for that model. The directory structure in S3 should look like this**:
+```
+.
+├── YOUR_BUCKET_NAME
+│ ├── train.tsv
+│ ├── val.tsv
+│ ├── test.tsv
+```
+To call this API, do:
+```bash
+curl -X POST \
+http://127.0.0.1:3031/sentimentV1/testbert \
+-H 'Cache-Control: no-cache' \
+-H 'Postman-Token: 4e90e1d6-de18-4501-a82c-f8a878616b12' \
+-H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
+-F test_bucket_name=YOUR_BUCKET_NAME \
+-F test_bucket_prefix=YOUR_MODEL_NAME
+```
+At the end of prediction, a file named 'test_results.csv' will be uploaded to the same S3 bucket.

docker-compose-gpu.yml

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
version: '2.3'
2+
3+
services:
4+
michaniki_client:
5+
build:
6+
context: ./src
7+
dockerfile: Dockerfile-gpu
8+
runtime: nvidia
9+
ports:
10+
- "3031:3031"
11+
environment:
12+
- PORT=3031
13+
- FLAS_APP=app/__init__.py
14+
- FLASK_DEBUG=1
15+
- REDIS_URL="redis://redis"
16+
- REDIS_PORT=6379
17+
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
18+
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
19+
20+
volumes:
21+
- ./src:/opt/src
22+
23+
command: ./entryPoint.sh
24+
depends_on:
25+
- redis
26+
networks:
27+
- michaniki
28+
29+
inference_server:
30+
build:
31+
context: ./src
32+
dockerfile: Dockerfile-gpu
33+
runtime: nvidia
34+
environment:
35+
- REDIS_URL="redis://redis"
36+
- REDIS_PORT=6379
37+
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
38+
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
39+
40+
command: ['python', 'app/models/InceptionV3/inception_inference_server.py']
41+
volumes:
42+
- ./src:/opt/src
43+
44+
networks:
45+
- michaniki
46+
depends_on:
47+
- michaniki_client
48+
- redis
49+
50+
sentiment_inference_server:
51+
build:
52+
context: ./src
53+
dockerfile: Dockerfile-sentiment
54+
runtime: nvidia
55+
environment:
56+
- REDIS_URL="redis://redis"
57+
- REDIS_PORT=6379
58+
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
59+
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
60+
command: ['python', 'app/models/SentimentV1/sentiment_infer_server.py']
61+
volumes:
62+
- ./src:/opt/src
63+
networks:
64+
- michaniki
65+
depends_on:
66+
- michaniki_client
67+
- redis
68+
69+
celery_worker:
70+
build:
71+
context: ./src
72+
dockerfile: Dockerfile-gpu
73+
runtime: nvidia
74+
command: ['celery', '-A', 'app.celeryapp:michaniki_celery_app', 'worker', '-l', 'info']
75+
volumes:
76+
- ./src:/opt/src
77+
networks:
78+
- michaniki
79+
depends_on:
80+
- michaniki_client
81+
environment:
82+
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
83+
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
84+
- DB_HOST=db
85+
- DB_USERNAME=root
86+
- DB_PASSWORD=michaniki
87+
- DB_NAME=michanikidb
88+
- BROKER_URL=redis://redis:6379/0
89+
90+
91+
redis:
92+
image: redis:4.0.5-alpine
93+
command: ["redis-server", "--appendonly", "yes"]
94+
hostname: redis
95+
networks:
96+
- michaniki
97+
98+
networks:
99+
michaniki:

docker-compose.yml

Lines changed: 27 additions & 5 deletions
@@ -1,4 +1,4 @@
-version: '3'
+version: '2.3'
 
 services:
   michaniki_client:
@@ -13,8 +13,10 @@ services:
       - REDIS_PORT=6379
       - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
       - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
+
     volumes:
       - ./src:/opt/src
+
     command: ./entryPoint.sh
     depends_on:
       - redis
@@ -28,15 +30,35 @@ services:
       - REDIS_PORT=6379
       - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
       - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
+
     command: ['python', 'app/models/InceptionV3/inception_inference_server.py']
     volumes:
       - ./src:/opt/src
+
     networks:
       - michaniki
     depends_on:
       - michaniki_client
       - redis
-
+
+  sentiment_inference_server:
+    build:
+      context: ./src
+      dockerfile: Dockerfile-sentiment
+    environment:
+      - REDIS_URL="redis://redis"
+      - REDIS_PORT=6379
+      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
+      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
+    command: ['python', 'app/models/SentimentV1/sentiment_infer_server.py']
+    volumes:
+      - ./src:/opt/src
+    networks:
+      - michaniki
+    depends_on:
+      - michaniki_client
+      - redis
+
   celery_worker:
     build: ./src
     command: ['celery', '-A', 'app.celeryapp:michaniki_celery_app', 'worker', '-l', 'info']
@@ -54,14 +76,14 @@ services:
       - DB_PASSWORD=michaniki
       - DB_NAME=michanikidb
       - BROKER_URL=redis://redis:6379/0
-
+
+
   redis:
     image: redis:4.0.5-alpine
     command: ["redis-server", "--appendonly", "yes"]
     hostname: redis
     networks:
       - michaniki
-
+
 networks:
   michaniki:
-

src/Dockerfile-gpu

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
FROM tensorflow/tensorflow:1.12.0-gpu
2+
3+
# utils
4+
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils
5+
6+
#RUN conda install gxx_linux-64
7+
8+
#RUN apt-get install -y --force-yes default-libmysqlclient-dev mysql-client build-essential
9+
10+
# Grab requirements.txt.
11+
COPY requirements-gpu.txt /tmp/requirements-gpu.txt
12+
13+
# Install dependencies
14+
RUN pip install -qr /tmp/requirements-gpu.txt
15+
16+
# create a user for web server
17+
RUN adduser --disabled-password --gecos "" foo
18+
19+
COPY ./ /opt/src
20+
21+
WORKDIR /opt/src

src/Dockerfile-sentiment

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
FROM continuumio/miniconda:4.4.10
2+
3+
# utils
4+
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils
5+
6+
RUN conda install gxx_linux-64
7+
8+
9+
RUN apt-get install -y --force-yes default-libmysqlclient-dev mysql-client build-essential
10+
11+
# Grab requirements.txt.
12+
COPY requirementssenti.txt /tmp/requirementssenti.txt
13+
14+
# Install dependencies
15+
RUN pip install -qr /tmp/requirementssenti.txt
16+
17+
RUN pip install fasttext
18+
19+
# create a user for web server
20+
RUN adduser --disabled-password --gecos "" foo
21+
22+
COPY ./ /opt/src
23+
24+
WORKDIR /opt/src

src/__init__.py

Whitespace-only changes.

src/app/apis/InceptionV3/API_helpers.py

Lines changed: 4 additions & 1 deletion
@@ -69,7 +69,10 @@ def download_a_dir_from_s3(bucket_name, bucket_prefix, local_path):
             os.makedirs(save_path)
         except OSError:
             pass
-        mybucket.download_file(obj.key, os.path.join(save_path, filename))
+        try:
+            mybucket.download_file(obj.key, os.path.join(save_path, filename))
+        except OSError:
+            pass
 
     print "* Helper: Images Loaded at: {}".format(output_path)
     return output_path
