Poor Header Detection #57

@sohamyedgaonkar


Table: [image]

The extracted CSV is:
Date,Particulars \nBrought Forward,Vch Type,Vch No.,"Debit \n40,26,47,921.52","Credit \n47,30,75,585.09"
31-12-2023,Dr Material ( Local ) - MFG,Purchase,PV/23-24/4481,,"4,68,525.56"
,Dr Material ( Local ) - MFG,Purchase,PV/23-24/4482,,"5,35,388.72"
,Dr Material ( Local ) - MFG,Purchase,PV/23-24/4483,,"4,68,525.56"

The problem is identifying where the headers stop: the first data row ("Brought Forward" and its totals) is merged into the header cells.
My use case usually deals with single-row headers.
Currently I am using:

cropped_output.pdf

Configuration:
# Assumed import path (gmft); it may differ between versions
from gmft.auto import AutoFormatConfig, AutoTableFormatter

config_hdr = AutoFormatConfig()  # config may be passed like so
config_hdr.verbosity = 3
config_hdr.enable_multi_header = False  # single-row headers expected
config_hdr.semantic_spanning_cells = False  # [Experimental] Merge headers
config_hdr.large_table_if_n_rows_removed = 0
formatter = AutoTableFormatter(config=config_hdr)
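
As a stopgap, the merged header could be split in post-processing. The following is only a minimal sketch, not part of the library: `split_merged_header` and `_HEADER_BREAK` are hypothetical names, and it assumes that every merged header cell joins the real header and the spilled first-row value with a newline (either a real one or the literal two characters `\n`, as in the CSV above). It works on plain CSV rows using only the standard library.

```python
import csv
import io
import re

# Matches a real newline or the literal two-character sequence "\n",
# together with any surrounding whitespace (assumed header/value separator).
_HEADER_BREAK = re.compile(r"\s*(?:\\n|\n)\s*")


def split_merged_header(rows):
    """Hypothetical helper: keep the first line of each header cell as the
    column name and move the remainder into a new first data row."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    names, spill = [], []
    for cell in header:
        parts = _HEADER_BREAK.split(cell, maxsplit=1)
        names.append(parts[0].strip())
        spill.append(parts[1].strip() if len(parts) > 1 else "")
    if not any(spill):
        # Nothing was merged into the header, so no row is inserted.
        return [names, *body]
    return [names, spill, *body]


# Raw string so that "\n" stays a backslash followed by "n", as in the issue.
csv_text = r'''Date,Particulars \nBrought Forward,Vch Type,Vch No.,"Debit \n40,26,47,921.52","Credit \n47,30,75,585.09"
31-12-2023,Dr Material ( Local ) - MFG,Purchase,PV/23-24/4481,,"4,68,525.56"
,Dr Material ( Local ) - MFG,Purchase,PV/23-24/4482,,"5,35,388.72"
'''

rows = list(csv.reader(io.StringIO(csv_text)))
fixed = split_merged_header(rows)
for row in fixed:
    print(row)
```

With this, the header becomes a single row and "Brought Forward" with its debit and credit totals becomes the first data row. It does not fix the detection itself, and it would mis-split any header that legitimately spans two lines.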

Labels

structure accuracy: issue related to recognizing table structure ("format")
