This repository was archived by the owner on Jul 4, 2023. It is now read-only.

Wrong number of classes is derived from label_encoder.vocab_size #116

@guanqun-yang


Behaviors

The following code snippet is taken directly from this library's README.md. I expect n_class to equal 2, since the encoded labels take only two values, [1, 2], but label_encoder.vocab_size returns 3.

import itertools

import numpy as np

from torchnlp.datasets import imdb_dataset
from torchnlp.encoders.text import WhitespaceEncoder
from torchnlp.encoders import LabelEncoder

from collections import Counter

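# NOTE: `train` and `test` are not defined in this snippet; they are assumed
# to have been loaded beforehand as lists of records with "text" and "sentiment" keys.
sentence_corpus = [record["text"] for record in itertools.chain(train, test)]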
label_corpus = [record["sentiment"] for record in itertools.chain(train, test)]

sentence_encoder = WhitespaceEncoder(sentence_corpus)
label_encoder = LabelEncoder(label_corpus)

for record in itertools.chain(train, test):
    record["text"] = sentence_encoder.encode(record["text"])
    record["sentiment"] = label_encoder.encode(record["sentiment"])

print(np.unique([record["sentiment"].item() for record in itertools.chain(train, test)]))
# [1 2]

vocab_size = sentence_encoder.vocab_size
n_class = label_encoder.vocab_size

print(vocab_size, n_class)
# 11402 3
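
One possible explanation, offered as an assumption rather than a confirmed reading of the library: the encoded labels start at 1 instead of 0, which suggests index 0 is reserved for a placeholder such as an unknown label, and that placeholder is counted in vocab_size. The sketch below uses a hypothetical TinyLabelEncoder (a toy stand-in, not torchnlp's LabelEncoder; the "<unk>" name and the reserved_labels parameter are illustrative) to show how one reserved entry produces exactly this off-by-one.

```python
from collections import Counter

UNKNOWN_LABEL = "<unk>"  # hypothetical placeholder name, chosen for illustration


class TinyLabelEncoder:
    """Toy stand-in for a label encoder; NOT the torchnlp implementation."""

    def __init__(self, labels, reserved_labels=(UNKNOWN_LABEL,)):
        # Reserved labels occupy the lowest indices; observed labels follow.
        self.index_to_label = list(reserved_labels)
        for label in Counter(labels):  # Counter preserves first-seen order
            if label not in self.index_to_label:
                self.index_to_label.append(label)
        self.label_to_index = {
            label: index for index, label in enumerate(self.index_to_label)
        }

    @property
    def vocab_size(self):
        # Counts reserved entries as well as observed labels.
        return len(self.index_to_label)

    def encode(self, label):
        return self.label_to_index[label]


labels = ["pos", "neg", "pos", "neg"]

with_reserved = TinyLabelEncoder(labels)
without_reserved = TinyLabelEncoder(labels, reserved_labels=())

# Two distinct encoded values, but a vocabulary of three entries.
encoded = sorted({with_reserved.encode(label) for label in labels})
print(encoded, with_reserved.vocab_size)  # [1, 2] 3

# With no reserved entry, the vocabulary size matches the number of classes.
n_class = without_reserved.vocab_size
print(n_class)  # 2
```

If torchnlp's LabelEncoder works the same way, the reported 3 would be two real classes plus one reserved entry. Whether that is the case, and whether the reserved entry can be disabled, should be checked against the library's documentation.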

Steps to Reproduce the Problem

Run the code snippet above as-is after installing the library with pip install pytorch-nlp.
