v1.0 sentence polarity corrupted on download #59

@claravdw

Description

Hi,

I'm not sure this package is still being maintained, but I'm using it for teaching (thanks!). I get the following error when trying to download the v1.0 sentence polarity dataset:

```
> library(textdata)
> d <- dataset_sentence_polarity()
Do you want to download:
 Name: v1.0 sentence polarity 
 URL: http://www.cs.cornell.edu/people/pabo/movie-review-data 
 License: Cite the paper when used. 
 Size: 2 MB (cleaned 1.4 MB) 
 Download mechanism: https 

1: Yes
2: No

Selection: 1
trying URL 'https://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz'
Content type 'application/x-gzip' length 487770 bytes (476 KB)
==================================================
downloaded 476 KB

Error: The size of the connection buffer (131072) was not large enough
to fit a complete line:
  * Increase it by setting `Sys.setenv("VROOM_CONNECTION_SIZE")`
```

I had to increase `VROOM_CONNECTION_SIZE` all the way up to 50000000 before the data would load, and, as expected, this was because the file was corrupted: only the first review (row in the dataset) contained any text, while the others contained only newline characters.
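A minimal R sketch of the workaround above plus a quick corruption check. `count_nonempty()` is a hypothetical helper, not part of textdata, and the code assumes the returned tibble stores the reviews in a column named `text` (an assumption; check `names(d)` first). The download is wrapped in `interactive()` because `dataset_sentence_polarity()` prompts for confirmation.

```r
# Hypothetical helper (not part of textdata): counts entries that contain
# something other than whitespace/newlines
count_nonempty <- function(text) {
  sum(nzchar(trimws(text)))
}

if (interactive()) {
  library(textdata)
  # Workaround from this report: raise vroom's connection buffer (in bytes)
  # before the dataset is read; passed as a string to avoid "5e+07"
  Sys.setenv(VROOM_CONNECTION_SIZE = "50000000")
  d <- dataset_sentence_polarity()
  # Assumes the review column is called `text`. A healthy download should
  # give nonempty == total; the corrupted one described here gives 1
  print(c(nonempty = count_nonempty(d$text), total = nrow(d)))
}
```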
