Add sentence tokenization to process longer texts. #71

Open
askonivala wants to merge 1 commit into kamalkraj:dev from askonivala:dev
Conversation

@askonivala
BERT supports sequences of at most 512 tokens. Adding simple sentence tokenization to the API would let users process longer texts.
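For readers who want a picture of the idea, here is a minimal sketch, not the code in this PR: split the input into sentences and run the model on each one, so no single call exceeds the sequence limit. The regex splitter, `split_sentences`, `predict_long_text`, and the stand-in `fake_predict` are all hypothetical names chosen for illustration; a real setup would likely use a proper sentence tokenizer and the project's own predict call.

```python
import re

# Assumption: a naive regex stands in for a real sentence tokenizer.
# It splits on whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def predict_long_text(text, predict_fn):
    """Call predict_fn once per sentence and concatenate the results.

    predict_fn is a hypothetical callable that takes one sentence and
    returns a list of per-word predictions.
    """
    results = []
    for sentence in split_sentences(text):
        results.extend(predict_fn(sentence))
    return results


def fake_predict(sentence):
    """Hypothetical stand-in for the model: tags every word as 'O'."""
    return [(word, "O") for word in sentence.split()]


if __name__ == "__main__":
    text = "Steve went to Paris. He liked it! Did you?"
    print(split_sentences(text))
    print(predict_long_text(text, fake_predict))
```

A regex like this will mis-split abbreviations such as "Dr. Smith", and a single sentence can still exceed the limit, so treat it only as an illustration of the control flow.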

@tanmayag78
Is there any other way to handle longer texts? This approach has higher time complexity and will be inefficient on very large texts. MITIE NER and Stanford NER, for example, handle longer texts more efficiently, though they are not as accurate as BERT-NER.
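One possible mitigation for the cost concern, which nobody in this thread proposed, is to pack several consecutive sentences into each model call instead of calling the model once per sentence. The sketch below assumes a hypothetical helper `pack_sentences` and approximates token counts by whitespace-separated words; BERT's WordPiece tokenizer usually yields more tokens than that, so the budget should be set well below 512.

```python
def pack_sentences(sentences, max_tokens=128, count_tokens=None):
    """Greedily group consecutive sentences into chunks under a token budget.

    count_tokens is a hypothetical hook for a real tokenizer; by default
    tokens are approximated by whitespace-separated words.
    A single sentence longer than max_tokens is kept whole as its own chunk.
    """
    if count_tokens is None:
        count_tokens = lambda s: len(s.split())

    chunks = []
    current = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current = []
            current_len = 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(pack_sentences(["a b", "c d", "e f g"], max_tokens=4))
```

Fewer, fuller chunks mean fewer forward passes, though whether this closes the gap with lighter NER tools would have to be measured.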

Labels

None yet

Projects

None yet

2 participants