|
1 | 1 | # Sherlock |
2 | 2 | Sherlock is a web platform that lets users create an image classifier for custom images, based on pre-trained CNN models. It can also use the customized CNN to pre-label images, and re-train the customized CNN when more training data becomes available. |
3 | 3 |
|
4 | | -Sherlock is currently serveing as RESTful APIs. |
| 4 | +Sherlock is currently serving as RESTful APIs. |
| 5 | + |
| 6 | +- [Sherlock for NLP](#sherlock-for-nlp) |
| 7 | + |
5 | 8 |
|
6 | 9 | [Here](http://bit.ly/michaniki_demo) are the slides for project Sherlock (previously called Michaniki). |
7 | 10 |
|
@@ -75,6 +78,10 @@ Move to the directory where you cloned *Sherlock* , and run: |
75 | 78 | ```bash |
76 | 79 | docker-compose up --build |
77 | 80 | ``` |
| 81 | +Training with BERT runs much faster on a GPU with more than 12 GB of RAM (tested with an Nvidia K80). To train on a GPU, run: |
| 82 | +```bash |
| 83 | +docker-compose -f docker-compose-gpu.yml up --build |
| 84 | +``` |
78 | 85 |
|
79 | 86 | If everything goes well, you should start seeing the build messages of the Docker containers: |
80 | 87 | ``` |
@@ -253,3 +260,66 @@ curl -X POST \ |
253 | 260 | -F train_bucket_prefix=S3_BUCKET_PREFIX/sha |
254 | 261 | ``` |
255 | 262 |
|
| 263 | +## Sherlock for NLP |
| 264 | + |
| 265 | +### 1. Predict Sentiment of Text with Simple run_classifier |
| 266 | +```bash |
| 267 | +curl -X POST \ |
| 268 | + http://127.0.0.1:3031/sentimentV1/predict \ |
| 269 | + -H 'Cache-Control: no-cache' \ |
| 270 | + -H 'Postman-Token: eeedb319-2218-44b9-86eb-63a3a1f62e14' \ |
| 271 | + -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \ |
| 272 | + -F textv='the movie was bad' \ |
| 273 | + -F model_name=base |
| 274 | +``` |
| 275 | + |
| 276 | +### 2. Train a new classification model using pre-trained BERT model |
| 277 | + |
| 278 | +**The new text dataset must first be stored in S3. The directory structure in S3 should look like this**: |
| 279 | +``` |
| 280 | +. |
| 281 | +├── YOUR_BUCKET_NAME |
| 282 | +│ ├── train.tsv |
| 283 | +│ ├── val.tsv |
| 284 | +│ ├── test.tsv |
| 285 | +``` |
| 286 | +The folder name you give to *YOUR_MODEL_NAME* will be used to identify this model once it is trained. |
| 287 | + |
| 288 | +The names of the train, val, and test files **can't be changed**. |
| 289 | +The train and val files should have the format below (without a header): |
| 290 | +id label None Sentence |
| 291 | +1 0 NC Text |
| 292 | +The test.tsv file should have only the id and sentence columns (with a header). |
| 293 | +**The S3 folders should have public access permission**. |
| 294 | + |
| 295 | +To call this API, do: |
| 296 | +```bash |
| 297 | +curl -X POST \ |
| 298 | + http://127.0.0.1:3031/sentimentV1/trainbert \ |
| 299 | + -H 'Cache-Control: no-cache' \ |
| 300 | + -H 'Postman-Token: 4e90e1d6-de18-4501-a82c-f8a878616b12' \ |
| 301 | + -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \ |
| 302 | + -F train_bucket_name=YOUR_BUCKET_NAME \ |
| 303 | + -F train_bucket_prefix=YOUR_MODEL_NAME |
| 304 | +``` |
| 305 | +### 3. Label all text in a tsv file using a pre-trained BERT model |
| 306 | + |
| 307 | +**The new test tsv file should be stored in the same S3 bucket used above for that model. The directory structure in S3 should look like this**: |
| 308 | +``` |
| 309 | +. |
| 310 | +├── YOUR_BUCKET_NAME |
| 311 | +│ ├── train.tsv |
| 312 | +│ ├── val.tsv |
| 313 | +│ ├── test.tsv |
| 314 | +``` |
| 315 | +To call this API, do: |
| 316 | +```bash |
| 317 | +curl -X POST \ |
| 318 | + http://127.0.0.1:3031/sentimentV1/testbert \ |
| 319 | + -H 'Cache-Control: no-cache' \ |
| 320 | + -H 'Postman-Token: 4e90e1d6-de18-4501-a82c-f8a878616b12' \ |
| 321 | + -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \ |
| 322 | + -F test_bucket_name=YOUR_BUCKET_NAME \ |
| 323 | + -F test_bucket_prefix=YOUR_MODEL_NAME |
| 324 | +``` |
| 325 | +When prediction finishes, a file named 'test_results.csv' is uploaded to the same S3 bucket. |
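
The TSV layout required in section 2 of the added README text (four columns without a header for the train and val files, id and sentence with a header for test.tsv) can be sketched as a small shell script. This is only an illustration under stated assumptions: the sample sentences, the header names `id` and `sentence`, the temporary directory, and the `check_columns` helper are hypothetical and not part of Sherlock. Tab separation is assumed from the `.tsv` extension.

```bash
#!/usr/bin/env bash
# Sketch: build tiny train.tsv / val.tsv / test.tsv files and check their column counts.
set -euo pipefail

# Hypothetical scratch directory; Sherlock itself reads the files from S3.
data_dir="$(mktemp -d)"

# train.tsv and val.tsv: id, label, unused placeholder (NC), sentence. No header row.
printf '1\t0\tNC\tthe movie was bad\n2\t1\tNC\tthe movie was great\n' > "$data_dir/train.tsv"
printf '3\t1\tNC\ta pleasant surprise\n' > "$data_dir/val.tsv"

# test.tsv: header row (names assumed), then id and sentence only.
printf 'id\tsentence\n1\tnot worth the ticket\n' > "$data_dir/test.tsv"

# check_columns FILE EXPECTED: succeed only if every line has EXPECTED tab-separated fields.
check_columns() {
  awk -F'\t' -v n="$2" 'NF != n { bad = 1 } END { exit bad }' "$1"
}

check_columns "$data_dir/train.tsv" 4
check_columns "$data_dir/val.tsv" 4
check_columns "$data_dir/test.tsv" 2
echo "TSV files look well-formed in $data_dir"
```

Running a check like this before uploading catches a missing column early, since a malformed row would otherwise only surface once training has started.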