Linecount for limit and double-quotes #1160

@luke-kiernan

I'm using limit and skipto to parse part of a larger file. The results are correct for a small file, but for a large file the read starts one line later than it should: with skipto set to 15, the first parsed row is line 16 of the file, not line 15.

big_file_kwargs = Dict(:skipto => 15, :limit => 2000)
small_file_kwargs = Dict(:skipto => 21, :limit => 5)
common_kwargs = Dict(:header => false, :ntasks => 1, :delim => '\t')
small_filepath = "case5.m"
big_filepath = "ACTIVSg2000.m"

using CSV
using DataFrames

# correct
df1 = DataFrame(CSV.File(small_filepath;
                        common_kwargs...,
                        small_file_kwargs...))
display(df1[1, :])
for (i, line) in enumerate(eachline(small_filepath))
    if i == small_file_kwargs[:skipto]
        println("first row of dataframe should be: $line")
    end
end
# incorrect: starts at line 16, not line 15.
df2 = DataFrame(CSV.File(big_filepath;
                        common_kwargs...,
                        big_file_kwargs...))
display(df2[1, :])
for (i, line) in enumerate(eachline(big_filepath))
    if i == big_file_kwargs[:skipto]
        println("first row of dataframe should be: $line")
    end
end

The inputs I'm using can be found here.
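
To make the expected-versus-actual comparison easier to repeat, the line-lookup loop above could be factored into a small helper. This is only a sketch using Base Julia: `nth_line` is a hypothetical name (not part of CSV.jl), and the temporary tab-delimited file is a stand-in for the real inputs.

```julia
# Hypothetical helper (not part of CSV.jl): return the n-th physical line
# of a file, or `nothing` if the file has fewer than n lines.
function nth_line(path::AbstractString, n::Integer)
    for (i, line) in enumerate(eachline(path))
        i == n && return line
    end
    return nothing
end

# Stand-in input: a temporary tab-delimited file with 30 lines.
path, io = mktemp()
for i in 1:30
    println(io, "row$i\t$i")
end
close(io)

# The line that the first parsed row should match when skipto = 21.
expected = nth_line(path, 21)
println("first row of dataframe should be: $expected")
```

The returned line can then be compared by eye (or field by field) against `df[1, :]` for whichever `skipto` value is being tested.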
