The Sequence class in features does not have a "dtype" attribute #8002

@gonzalo-santamaria-iic

Describe the bug

I'm not sure if this is a bug.

Most FeatureType objects expose a dtype attribute, but it is missing when the feature is a Sequence or a List.

When I try to run multi-label classification with this example script from the transformers library:

https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_classification.py#L442

I get this error on the linked line:

AttributeError: 'List' object has no attribute 'dtype'. Did you mean: '_type'?

Given the check the script is attempting to perform, could we add a self.dtype = "list" attribute to the list-like FeatureTypes (Sequence, List, etc.)?
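Until something like that exists, a caller-side fallback may be enough. The following is only a sketch: `feature_dtype`, `ScalarStandIn`, and `ListStandIn` are hypothetical names used for illustration, not part of datasets, and treating every feature that lacks dtype as "list" is an assumption that may not hold for other nested feature types.

```python
from dataclasses import dataclass
from typing import Any


def feature_dtype(feature: Any, default: str = "list") -> str:
    """Return feature.dtype if present, otherwise fall back to `default`.

    Assumption: a feature without a dtype attribute is list-like.
    """
    return getattr(feature, "dtype", default)


# Hypothetical stand-ins so the sketch runs without datasets installed.
@dataclass
class ScalarStandIn:
    dtype: str  # mimics a feature that exposes dtype


@dataclass
class ListStandIn:
    feature: Any  # mimics a list-like feature with no dtype attribute


scalar_result = feature_dtype(ScalarStandIn("string"))
list_result = feature_dtype(ListStandIn(ScalarStandIn("int64")))
print(scalar_result)
print(list_result)
```

With real datasets objects, the same helper could be called as feature_dtype(features["label"]) in place of features["label"].dtype.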

Steps to reproduce the bug

For example, this code works for me:

from datasets import ClassLabel, Features, Sequence, Value
features = {'text': Value('string'), 'label': ClassLabel(names=['No', 'Yes'])}
print(features["text"].dtype)   # prints: string
print(features["label"].dtype)  # prints: int64

and this code does not work for me:

from datasets import ClassLabel, Features, Sequence, Value
features = {'text': Value('string'), 'label': Sequence(ClassLabel(names=['No', 'Yes']))}
print(features["label"].dtype) # it could be equal to "list"?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'List' object has no attribute 'dtype'. Did you mean: '_type'?

Expected behavior

The dtype attribute should be equal to "list" for objects created with Sequence.

from datasets import ClassLabel, Features, Sequence, Value
features = {'text': Value('string'), 'label': Sequence(ClassLabel(names=['No', 'Yes']))}
print(features["label"].dtype)  # expected: list
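One way the requested attribute could look, as a rough sketch only: `ListFeatureSketch` is a hypothetical stand-in class, not the actual datasets implementation, and declaring dtype as a class-level constant is just one possible design.

```python
from dataclasses import dataclass
from typing import Any, ClassVar


@dataclass
class ListFeatureSketch:
    """Hypothetical stand-in for a list-like feature type."""

    feature: Any
    length: int = -1
    # Class-level constant: excluded from __init__, shared by all instances.
    dtype: ClassVar[str] = "list"


label = ListFeatureSketch(feature="class_label")
# The kind of check a downstream script could then perform safely.
is_multi_label = label.dtype == "list"
print(label.dtype, is_multi_label)
```

Because dtype is a ClassVar here, it is not a constructor argument, so existing call sites that build the feature would not need to change.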

Environment info

I have installed datasets==4.5.0.
