BERT_Tokenizer_for_classification:

This repo is a step-by-step guide to using a BERT-style tokenizer for tasks like sentiment analysis with models such as CNNs and LSTMs. BERT tokenizes text into subword units rather than whole words, and we can use the same tokenization technique to feed tokenized data to our traditional models.
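To make the subword idea concrete, here is a minimal sketch of greedy longest-match-first splitting, the general scheme used by WordPiece-style tokenizers. It is not the code from this repo: the function names and the tiny `toy_vocab` are made up for illustration, and a real BERT vocabulary has tens of thousands of entries plus extra text normalization.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first lookup."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a piece that continues a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: the whole word maps to the unknown token.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


def encode(sentence, vocab):
    """Lowercase, split on whitespace, and map each subword piece to its id."""
    tokens = []
    for word in sentence.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    ids = [vocab[t] for t in tokens]
    return tokens, ids


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {
    tok: i
    for i, tok in enumerate(
        ["[PAD]", "[UNK]", "the", "movie", "was", "un", "##watch", "##able"]
    )
}

tokens, ids = encode("The movie was unwatchable", toy_vocab)
print(tokens)  # ['the', 'movie', 'was', 'un', '##watch', '##able']
print(ids)     # [2, 3, 4, 5, 6, 7]
```

The resulting integer ids are what an embedding layer in a CNN or LSTM would consume, with id 0 reserved here for padding.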

Experiment:

We will experiment with BERT's tokenizer utility, then build a 1-D CNN model to see the whole flow. To minimize the amount of padding, we will use a batching trick that groups sentences of similar length into the same batch during training.
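The batching trick can be sketched roughly as follows: sort the tokenized sentences by length, cut the sorted list into batches, and pad each batch only up to its own longest sentence. This is a simplified illustration under assumed names (`length_bucketed_batches`, `pad_id`), not the repo's actual implementation.

```python
def length_bucketed_batches(sequences, batch_size, pad_id=0):
    """Group sequences of similar length and pad each batch to its own max length."""
    ordered = sorted(sequences, key=len)  # shortest first
    batches = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        width = max(len(seq) for seq in chunk)
        # Pad on the right so every row in this batch has the same length.
        batches.append([seq + [pad_id] * (width - len(seq)) for seq in chunk])
    return batches


# Made-up token id sequences of different lengths.
sequences = [[5, 6, 7, 8, 9, 10], [1], [2, 3], [4, 5, 6, 7, 8], [9, 9]]
batches = length_bucketed_batches(sequences, batch_size=2)

# Count padding with and without the trick.
bucketed_pad = sum(row.count(0) for batch in batches for row in batch)
global_pad = sum(max(map(len, sequences)) - len(seq) for seq in sequences)
print(bucketed_pad, global_pad)  # 4 14
```

In a real training loop the labels have to be reordered together with the sentences, and the batches are usually shuffled each epoch so the model does not always see short sentences first.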

Please feel free to reuse these steps to combine the tokenizer with other kinds of models.