Commit dfa17bd

fix: hi_res PDF parsing: only uncategorized text for extracted elements (#3975)
1 parent 8fc4181 commit dfa17bd

17 files changed

Lines changed: 171 additions & 167 deletions

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,11 @@
1-
## 0.17.6-dev0
1+
## 0.17.6-dev1
22

33
### Enhancements
44

55
### Features
66

77
### Fixes
8+
- **Do not use NLP to determine element types for extracted elements with hi_res.** This avoids extraneous Title elements in hi_res outputs. The change applies only to *extracted* elements, that is, text objects found outside the Object Detection objects (the latter are mapped to *inferred* elements). *Extracted* and *inferred* elements are merged to form the list of `Element`s returned by `partition_pdf()`.
89

910
## 0.17.5
1011

test_unstructured/partition/pdf_image/test_pdf.py

Lines changed: 2 additions & 2 deletions
@@ -823,8 +823,8 @@ def test_partition_categorization_backup():
823823
example_doc_path("pdf/layout-parser-paper-fast.pdf"),
824824
strategy=PartitionStrategy.HI_RES,
825825
)
826-
# Should have changed the element class from Text to Title
827-
assert isinstance(elements[0], Title)
826+
# Should NOT have changed the element class from Text to Title
827+
assert isinstance(elements[0], Text)
828828
assert elements[0].text == text
829829

830830

test_unstructured/partition/test_msg.py

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ def test_partition_msg_can_process_attachments():
141141
"Text",
142142
"Text",
143143
"Image",
144-
"Title",
144+
"Text",
145145
"Text",
146146
"Title",
147147
"Title",
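
The changelog entry and the two test updates above describe one rule: with `hi_res`, text that comes only from the PDF text layer is no longer run through NLP-based typing. The following is a minimal sketch of that rule, not the library's implementation. `Candidate`, `looks_like_title`, and `categorize` are hypothetical names invented for illustration, and the title heuristic is a toy stand-in for the real NLP check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a text region found while partitioning a PDF."""

    text: str
    source: str  # "extracted" (PDF text layer) or "inferred" (object detection)
    detected_type: Optional[str] = None  # label from the layout model, if any


def looks_like_title(text: str) -> bool:
    """Toy heuristic standing in for the NLP check; not the library's logic."""
    words = text.split()
    return 0 < len(words) <= 5 and not text.rstrip().endswith(".")


def categorize(candidate: Candidate, use_nlp_for_extracted: bool) -> str:
    # Inferred elements keep the label assigned by object detection.
    if candidate.source == "inferred" and candidate.detected_type:
        return candidate.detected_type
    # After this commit (flag off), extracted text stays uncategorized.
    if candidate.source == "extracted" and not use_nlp_for_extracted:
        return "UncategorizedText"
    # Before this commit (flag on), short fragments could be promoted to Title.
    return "Title" if looks_like_title(candidate.text) else "NarrativeText"


stray = Candidate(text="rate (mm/year)", source="extracted")
before = categorize(stray, use_nlp_for_extracted=True)
after = categorize(stray, use_nlp_for_extracted=False)
print(before, "->", after)
```

The `use_nlp_for_extracted` flag exists only to contrast the two behaviors side by side; the commit itself simply changes the behavior.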

test_unstructured_ingest/expected-structured-output-html/biomed-api/65/11/main.PMC6312790.pdf.html

Lines changed: 15 additions & 15 deletions
@@ -14,9 +14,9 @@
1414
<p class="NarrativeText" id="0ac27467f42b45650b5bf093d76055d6">
1515
Contents lists available at ScienceDirect
1616
</p>
17-
<h1 class="Title" id="2a9146ee3c09e107d11693c7b6e4725c">
17+
<p class="UncategorizedText" id="2a9146ee3c09e107d11693c7b6e4725c">
1818
Data in Brief
19-
</h1>
19+
</p>
2020
<p class="NarrativeText" id="97e80c6e7dc2754c9083b263ff65039e">
2121
journal homepage: www.elsevier.com/locate/dib
2222
</p>
@@ -28,19 +28,19 @@ <h1 class="Title" id="2b6774c69a58146cac55073356d3e265">
2828
Data on environmental sustainable corrosion inhibitor for stainless steel in aggressive environment
2929
</h1>
3030
<img alt="" class="Image" id="151a01a072f2b18a3fda459fd6e71d79"/>
31-
<h1 class="Title" id="5781f22b6e47a24d5f7847e6a4720677">
31+
<p class="UncategorizedText" id="5781f22b6e47a24d5f7847e6a4720677">
3232
(Jee
33-
</h1>
33+
</p>
3434
<h1 class="Title" id="bddd1cbc864e9b44cc0715a1cccf8dbc">
3535
Omotayo Sanni n, Abimbola Patricia I. Popoola
3636
</h1>
3737
<p class="NarrativeText" id="589a383c831226a006e06ae55dba9b55">
3838
Department of Chemical, Metallurgical and Materials Engineering, Tshwane University of Technology, Pretoria, South Africa
3939
</p>
40-
<p class="NarrativeText" id="658c1a75d44888e4fe434dc3daf48818">
40+
<p class="UncategorizedText" id="658c1a75d44888e4fe434dc3daf48818">
4141
a r t i c l e i n f o
4242
</p>
43-
<p class="NarrativeText" id="b9e48f235de5b531427187eb6ea135fe">
43+
<p class="UncategorizedText" id="b9e48f235de5b531427187eb6ea135fe">
4444
a b s t r a c t
4545
</p>
4646
<h1 class="Title" id="911bfead9b546998812e2d1d615ecc87">
@@ -88,19 +88,19 @@ <h1 class="Title" id="c6936b1417c38444f77baf72b2ff53dd">
8888
<h1 class="Title" id="13fd694e1ff862d163b840a246964e58">
8989
Value of the data
9090
</h1>
91-
<p class="NarrativeText" id="5f1c4074c1b5d641b724b99be6f5ddfd">
91+
<p class="UncategorizedText" id="5f1c4074c1b5d641b724b99be6f5ddfd">
9292
© Data presented here provide optimum conditions of waste material as inhibitor for stainless steel
9393
</p>
9494
<li class="ListItem" id="afed004de4c50d761640b6c18729a988">
9595
Type 316 in 0.5 M H2SO4 medium. The given data describe the inhibitive performance of eco-friendly egg shell powder on austenitic stainless steel Type 316 corrosion in sulphuric acid environment.
9696
</li>
97-
<p class="NarrativeText" id="f93d89ccb971e2b60f44afbf710673c6">
97+
<p class="UncategorizedText" id="f93d89ccb971e2b60f44afbf710673c6">
9898
© The data obtained for the inhibition of waste product (egg shell powder) on stainless steel Type 316
9999
</p>
100100
<li class="ListItem" id="cb6e8acb9c24820b59f8973cc236ef35">
101101
can be used as basis in determining the inhibitive performance of the same inhibitor in other environments.
102102
</li>
103-
<p class="NarrativeText" id="5964ede27be8850de7a13e0dd32c1b21">
103+
<p class="UncategorizedText" id="5964ede27be8850de7a13e0dd32c1b21">
104104
© The data can be used to examine the relationship between the process variable as it affect the
105105
</p>
106106
<li class="ListItem" id="e1f7e635d8739a97d8d0000ba8004f61">
@@ -152,9 +152,9 @@ <h1 class="Title" id="0a05f8568758bcff4e2912e0fd11eb02">
152152
<table class="Table" id="7e0388ec6fd4ec451d96232e30d41e7c" style="border: 1px solid black; border-collapse: collapse;">
153153
Inhibitor be (V/dec) ba (V/dec) Ecorr (V) icorr (A/cm?) Polarization Corrosion concentration (g) resistance (Q) rate (mm/year) oO 0.0335 0.0409 —0.9393 0.0003 24.0910 2.8163 2 1.9460 0.0596 —0.8276 0.0002 121.440 1.5054 4 0.0163 0.2369 —0.8825 0.0001 42.121 0.9476 6 0.3233 0.0540 —0.8027 5.39E-05 373.180 0.4318 8 0.1240 0.0556 —0.5896 5.46E-05 305.650 0.3772 10 0.0382 0.0086 —0.5356 1.24E-05 246.080 0.0919
154154
</table>
155-
<h1 class="Title" id="d61e56d1a4c761ad3c69f4b970ba4f3c">
155+
<p class="UncategorizedText" id="d61e56d1a4c761ad3c69f4b970ba4f3c">
156156
rate (mm/year)
157-
</h1>
157+
</p>
158158
<p class="NarrativeText" id="3a5534c2aafc2d8a4c0b65d530d00ab3">
159159
The plot of inhibitor concentration over degree of surface coverage versus inhibitor concentration gives a straight line as shown in Fig. 5. The strong correlation reveals that egg shell adsorption on stainless surface in 0.5 M H2SO4 follow Langmuir adsorption isotherm. Figs. 6–8 show the SEM/EDX surface morphology analysis of stainless steel. Figs. 7 and 8 are the SEM/EDX images of the stainless steel specimens without and with inhibitor after weight loss experiment in sulphuric acid medium. The stainless steel surface corrosion product layer in the absence of inhibitor was porous and as a result gives no corrosion protection. With the presence of ES, corrosion damage was minimized, with an evidence of ES present on the metal surface as shown in Fig. 8.
160160
</p>
@@ -232,12 +232,12 @@ <h1 class="Title" id="bffefa92b06bc6009f81965d3dadc0ce">
232232
<p class="NarrativeText" id="25833fe4955e01b455cf77d0cfd7d71f">
233233
The potentiodynamic polarization method was performed on the prepared test samples immersed in 0.5 M H2SO4 solution in the presence and absence of different ES concentrations. A three electrode system was used; stainless steel Type 316 plate as working electrode with an exposed area of 1.0 cm2, platinum rod as counter electrode and silver chloride electrode as reference electrode. The electrode was polished, degreased in acetone and thoroughly rinsed with distilled water before the experiment. Current density against applied potential was plotted. The slope of the linear part in anodic and cathodic plots gives anodic and cathodic constants according to the Stern–Geary equation, and the
234234
</p>
235-
<h1 class="Title" id="57906367eca399b52f7eecbf78345bf4">
235+
<p class="UncategorizedText" id="57906367eca399b52f7eecbf78345bf4">
236236
ð2Þ
237-
</h1>
238-
<h1 class="Title" id="cff55ae1916232dbda5239f59c897cb9">
237+
</p>
238+
<p class="UncategorizedText" id="cff55ae1916232dbda5239f59c897cb9">
239239
ð3Þ
240-
</h1>
240+
</p>
241241
<div class="Header" id="e40c3ee561b10ca5b7a76900c8d5b263">
242242
O. Sanni, A.P.I. Popoola / Data in Brief 22 (2019) 451–457
243243
</div>

test_unstructured_ingest/expected-structured-output-html/biomed-api/75/29/main.PMC6312793.pdf.html

Lines changed: 14 additions & 14 deletions
@@ -14,9 +14,9 @@
1414
<p class="NarrativeText" id="fefc7aa600d4266a6cca6d017bc77306">
1515
Contents lists available at ScienceDirect
1616
</p>
17-
<h1 class="Title" id="6e552bae24f7a412e4b5764d0428a5eb">
17+
<p class="UncategorizedText" id="6e552bae24f7a412e4b5764d0428a5eb">
1818
Data in Brief
19-
</h1>
19+
</p>
2020
<p class="NarrativeText" id="c1b3d4f53698b892fcc23fc10a72e6fb">
2121
journal homepage: www.elsevier.com/locate/dib
2222
</p>
@@ -28,9 +28,9 @@ <h1 class="Title" id="bc3a10c851cc6305a52a1bc8d8cf785c">
2828
A benchmark dataset for the multiple depot vehicle scheduling problem
2929
</h1>
3030
<img alt="" class="Image" id="3934d1d731466b344854fc9932fd9e3d"/>
31-
<h1 class="Title" id="cb34109c5030876248f9a9bbdd65093f">
31+
<p class="UncategorizedText" id="cb34109c5030876248f9a9bbdd65093f">
3232
(eee
33-
</h1>
33+
</p>
3434
<p class="NarrativeText" id="0cda4eb20070fdf01ec0d47b2a550241">
3535
Sarang Kulkarni a,b,c,n, Mohan Krishnamoorthy d,e, Abhiram Ranade f, Andreas T. Ernst c, Rahul Patil b
3636
</p>
@@ -52,16 +52,16 @@ <h1 class="Title" id="cb34109c5030876248f9a9bbdd65093f">
5252
<p class="UncategorizedText" id="03b4116b32ee9de3beea142b52694b19">
5353
e School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072,
5454
</p>
55-
<h1 class="Title" id="bfcbabb9ed9169f6a4be19576064f702">
55+
<p class="UncategorizedText" id="bfcbabb9ed9169f6a4be19576064f702">
5656
Australia
57-
</h1>
57+
</p>
5858
<p class="NarrativeText" id="85875ebbc1de554e92edc54674add1d5">
5959
f Department of Computer Science and Engineering, IIT Bombay, Powai, Mumbai 400076, India
6060
</p>
61-
<p class="NarrativeText" id="f9f33fff8fbb981301df3055b60e12c7">
61+
<p class="UncategorizedText" id="f9f33fff8fbb981301df3055b60e12c7">
6262
a r t i c l e i n f o
6363
</p>
64-
<p class="NarrativeText" id="4f3f69dd17ddae776c656ec73d9837ae">
64+
<p class="UncategorizedText" id="4f3f69dd17ddae776c656ec73d9837ae">
6565
a b s t r a c t
6666
</p>
6767
<p class="NarrativeText" id="34522460857b10c63d8c2c8d2fbb3087">
@@ -106,13 +106,13 @@ <h1 class="Title" id="05334542b26bb9988adc1abd9a371496">
106106
<li class="ListItem" id="26ac34f98623dc94e0854dc5e841d4e4">
107107
© The data provide all the information that is required to model the MDVSP by using the existing mathematical formulations.
108108
</li>
109-
<p class="NarrativeText" id="79e2a2e0c24e1e8befe2b6beb2f1df64">
109+
<p class="UncategorizedText" id="79e2a2e0c24e1e8befe2b6beb2f1df64">
110110
e All the problem instances are available for use without any restrictions.
111111
</p>
112112
<li class="ListItem" id="d401597b8ff2854bfb89f2833d02a763">
113113
e The benchmark solutions and solution time for the problem instances are presented in [3] and can be used for the comparison.
114114
</li>
115-
<p class="NarrativeText" id="c1cff3abe7c7915accab35910df1c5cd">
115+
<p class="UncategorizedText" id="c1cff3abe7c7915accab35910df1c5cd">
116116
© The dataset includes a program that can generate similar problem instances of different sizes.
117117
</p>
118118
<h1 class="Title" id="fb765d6762e6a423cb8b9dab27359732">
@@ -121,9 +121,9 @@ <h1 class="Title" id="fb765d6762e6a423cb8b9dab27359732">
121121
<p class="NarrativeText" id="1f3d79f338b86fbfcfa7054f11de28f0">
122122
The dataset contains 60 different problem instances of the multiple depot vehicle scheduling pro- blem (MDVSP). Each problem instance is provided in a separate file. Each file is named as ‘RN-m-n-k.dat’, where ‘m’, ‘n’, and ‘k’ denote the number of depots, the number of trips, and the instance number for the size, ‘ðm;nÞ’, respectively. For example, the problem instance, ‘RN-8–1500-01.dat’, is the first problem instance with 8 depots and 1500 trips. For the number of depots, m, we used three values, 8,12, and 16. The four values for the number of trips, n, are 1500, 2000, 2500, and 3000. For each size, ðm;nÞ, five instances are provided. The dataset can be downloaded from https://orlib.uqcloud.net. For each problem instance, the following information is provided:
123123
</p>
124-
<h1 class="Title" id="fc547df12bfc22e91a0b5927670caa78">
124+
<p class="UncategorizedText" id="fc547df12bfc22e91a0b5927670caa78">
125125
The number of depots mð
126-
</h1>
126+
</p>
127127
<p class="UncategorizedText" id="320f6d28582c354d35673c2a4119851f">
128128
Þ,
129129
</p>
@@ -187,9 +187,9 @@ <h1 class="Title" id="0db20c23a12e1b6eadee6eb8aecc17d8">
187187
<table class="Table" id="63de709cd751564fc9622864af4e9310" style="border: 1px solid black; border-collapse: collapse;">
188188
Instance size (m, n) Average number of Locations Times Vehicles (8, 1500) 568.40 975.20 652.20 668,279.40 (8, 2000) 672.80 1048.00 857.20 1,195,844.80 (8, 2500) 923.40 1078.00 1082.40 1,866,175.20 (8, 3000) 977.00 1113.20 1272.80 2,705,617.00 (12, 1500) 566.00 994.00 642.00 674,191.00 (12, 2000) 732.60 1040.60 861.20 1,199,659.80 (12, 2500) 875.00 1081.00 1096.00 1,878,745.20 (12, 3000) 1119.60 1107.40 1286.20 2,711,180.40 (16, 1500) 581.80 985.40 667.80 673,585.80 (16, 2000) 778.00 1040.60 872.40 1,200,560.80 (16, 2500) 879.00 1083.20 1076.40 1,879,387.00 (16, 3000) 1087.20 1101.60 1284.60 2,684,983.60
189189
</table>
190-
<h1 class="Title" id="ec04cd3d411eed35515b3ea80ebac5af">
190+
<p class="UncategorizedText" id="ec04cd3d411eed35515b3ea80ebac5af">
191191
Possible empty travels
192-
</h1>
192+
</p>
193193
<div class="Header" id="fa23407a7c3c99ae3b6fb79034698807">
194194
S. Kulkarni et al. / Data in Brief 22 (2019) 484–487
195195
</div>

test_unstructured_ingest/expected-structured-output-html/biomed-path/07/07/sbaa031.073.PMC7234218.pdf.html

Lines changed: 2 additions & 2 deletions
@@ -76,8 +76,8 @@ <h1 class="Title" id="32e2d561158fd20a749e2329cb9d94dc">
7676
<p class="NarrativeText" id="ad58a94e747d9fe18e2550e58c54f6bc">
7777
Camila Loureiro*1, Corsi-Zuelli Fabiana1, Fachim Helene Aparecida1, Shuhama Rosana1, Menezes Paulo Rossi1, Dalton Caroline F2,
7878
</p>
79-
<h1 class="Title" id="6a0290d48528f40c9c2288fddff94e3e">
79+
<p class="UncategorizedText" id="6a0290d48528f40c9c2288fddff94e3e">
8080
AQ3
81-
</h1>
81+
</p>
8282
</body>
8383
</html>

test_unstructured_ingest/expected-structured-output-html/google-drive/recalibrating-risk-report.pdf.html

Lines changed: 43 additions & 43 deletions
@@ -11,7 +11,7 @@ <h1 class="Title" id="614b7a52d42e8e3b66edf4943093c85c">
1111
WORLD ASSOCIATION
1212
</h1>
1313
<img alt="" class="Image" id="4ab4d4df6aeb3d4fb6d8102edd876ab8"/>
14-
<p class="NarrativeText" id="7137c1e14141fad3ad306fe68918a967">
14+
<p class="UncategorizedText" id="7137c1e14141fad3ad306fe68918a967">
1515
Recalibrating risk
1616
</p>
1717
<p class="NarrativeText" id="dbdc2d6c6381e4fa1c7b8058bf86abef">
@@ -89,69 +89,69 @@ <h1 class="Title" id="bf248ce5194cc4686f97a2769cd9744a">
8989
In terms of accidents, hydropower is the deadliest electricity generator, mostly due to collapsing dams and the consequences of flooding. The Banqiao Dam failure in 1975 led to at least 26,000 people drowning, and as many as 150,000 deaths resulting from the secondary effects of the accident. In comparison, radiation exposure following Chernobyl caused 54 deaths2, while no casualties due to radiation are likely to occur from the accident at Fukushima Daiichi.
9090
</p>
9191
<img alt="25 24.6 20 18.4 e 15 10 5 4.6 2.8 0 Coal Oil Bio m ass Natural gas 0.07 Wind 0.04 Hydropower 0.02 Solar 0.01 Nuclear " class="Image" id="c0a86e51afb417a3b057d7cf101bbed6"/>
92-
<h1 class="Title" id="a8706e82b3f90cffc996a24348e3b670">
92+
<p class="UncategorizedText" id="a8706e82b3f90cffc996a24348e3b670">
9393
r
94-
</h1>
95-
<h1 class="Title" id="da631c23500655c51b9311a61f55744f">
94+
</p>
95+
<p class="UncategorizedText" id="da631c23500655c51b9311a61f55744f">
9696
a
97-
</h1>
98-
<h1 class="Title" id="d78a11e9e55235934c3a4922053c68e5">
97+
</p>
98+
<p class="UncategorizedText" id="d78a11e9e55235934c3a4922053c68e5">
9999
e
100-
</h1>
101-
<h1 class="Title" id="8d14df8b7fd7744365fbf8e02d69415a">
100+
</p>
101+
<p class="UncategorizedText" id="8d14df8b7fd7744365fbf8e02d69415a">
102102
y
103-
</h1>
104-
<h1 class="Title" id="f4df01bee1b8ffb973ac8539649c5189">
103+
</p>
104+
<p class="UncategorizedText" id="f4df01bee1b8ffb973ac8539649c5189">
105105
W
106-
</h1>
107-
<h1 class="Title" id="b733cf49de269e22bed7c9883b958669">
106+
</p>
107+
<p class="UncategorizedText" id="b733cf49de269e22bed7c9883b958669">
108108
T
109-
</h1>
110-
<h1 class="Title" id="c4b47d788b26c3d5c62ad462ed3ca2db">
109+
</p>
110+
<p class="UncategorizedText" id="c4b47d788b26c3d5c62ad462ed3ca2db">
111111
r
112-
</h1>
113-
<h1 class="Title" id="bff4435574259239761670b31432cc8a">
112+
</p>
113+
<p class="UncategorizedText" id="bff4435574259239761670b31432cc8a">
114114
e
115-
</h1>
116-
<h1 class="Title" id="8ba15a3a71eb0bb689c582098cce6730">
115+
</p>
116+
<p class="UncategorizedText" id="8ba15a3a71eb0bb689c582098cce6730">
117117
p
118-
</h1>
119-
<h1 class="Title" id="5fde097ba00ad7647206ae11c721d28c">
118+
</p>
119+
<p class="UncategorizedText" id="5fde097ba00ad7647206ae11c721d28c">
120120
s
121-
</h1>
121+
</p>
122122
<p class="UncategorizedText" id="81331ee9da4145c2651d6483696fe966">
123123
8
124124
</p>
125-
<h1 class="Title" id="81f1f3b9da6df38d938bf7871fa069b5">
125+
<p class="UncategorizedText" id="81f1f3b9da6df38d938bf7871fa069b5">
126126
e
127-
</h1>
128-
<h1 class="Title" id="aa4a79651a9a0087b66fcc40a2213113">
127+
</p>
128+
<p class="UncategorizedText" id="aa4a79651a9a0087b66fcc40a2213113">
129129
i
130-
</h1>
131-
<h1 class="Title" id="6d1c0d05d3a424b43d9572188a76c2d4">
130+
</p>
131+
<p class="UncategorizedText" id="6d1c0d05d3a424b43d9572188a76c2d4">
132132
t
133-
</h1>
134-
<h1 class="Title" id="392a17b2f3eba46f4bcf078e0b204514">
133+
</p>
134+
<p class="UncategorizedText" id="392a17b2f3eba46f4bcf078e0b204514">
135135
i
136-
</h1>
137-
<h1 class="Title" id="d24a9a771e46fdd6b269f1ecaf0b5eec">
136+
</p>
137+
<p class="UncategorizedText" id="d24a9a771e46fdd6b269f1ecaf0b5eec">
138138
l
139-
</h1>
140-
<h1 class="Title" id="9dc4537afa8ae0b959a542f9ba5c1e03">
139+
</p>
140+
<p class="UncategorizedText" id="9dc4537afa8ae0b959a542f9ba5c1e03">
141141
S
142-
</h1>
143-
<h1 class="Title" id="919dac2487a4c860747318a132a54a72">
142+
</p>
143+
<p class="UncategorizedText" id="919dac2487a4c860747318a132a54a72">
144144
a
145-
</h1>
146-
<h1 class="Title" id="04ee5d05c3fcfffd945762e803478600">
145+
</p>
146+
<p class="UncategorizedText" id="04ee5d05c3fcfffd945762e803478600">
147147
t
148-
</h1>
149-
<h1 class="Title" id="63dabde368e2cf310d20a885fe50314a">
148+
</p>
149+
<p class="UncategorizedText" id="63dabde368e2cf310d20a885fe50314a">
150150
a
151-
</h1>
152-
<h1 class="Title" id="796538927664e4d87312c428469428f5">
151+
</p>
152+
<p class="UncategorizedText" id="796538927664e4d87312c428469428f5">
153153
F
154-
</h1>
154+
</p>
155155
<p class="FigureCaption" id="d1496d2dc28f6877646e280c0c47e9ab">
156156
Figure 3. Comparison of number of fatalities due to electricity generation, including accidents and air pollution3
157157
</p>
@@ -251,9 +251,9 @@ <h1 class="Title" id="3d819f053bf67ec228cf8c23aca02ac7">
251251
<li class="ListItem" id="59f05d231c2357ab111ee31b0da3c25d">
252252
World Health Organization (2020). Road traffic injuries. Available at: https://www.who.int/news-room/fact-sheets/ detail/road-traffic-injuries
253253
</li>
254-
<h1 class="Title" id="a95a2add68d668b944cc332c88ea721e">
254+
<p class="UncategorizedText" id="a95a2add68d668b944cc332c88ea721e">
255255
i
256-
</h1>
256+
</p>
257257
<li class="ListItem" id="2ab37467d413d491735b002a679afdb8">
258258
ii BBC (2020). Plane crash fatalities fell more than 50% in 2019. Available at: https://www.bbc.co.uk/news/ business-50953712
259259
</li>

test_unstructured_ingest/expected-structured-output-html/local-single-file-with-pdf-infer-table-structure/layout-parser-paper-with-table.jpg.html

Lines changed: 2 additions & 2 deletions
@@ -114,9 +114,9 @@ <h1 class="Title" id="a98721b4c18e53da7ee4e38512d91480">
114114
<li class="ListItem" id="04d62ad595016d7b490dff67a00b9f35">
115115
import layoutparser as lp
116116
</li>
117-
<h1 class="Title" id="9d40bf1b2e2af1692f5689a1c44ab2ae">
117+
<p class="UncategorizedText" id="9d40bf1b2e2af1692f5689a1c44ab2ae">
118118
wwe
119-
</h1>
119+
</p>
120120
<li class="ListItem" id="cafbdebf75706654ed769cd9785e8697">
121121
image = cv2.imread("image_file") # load images
122122
</li>
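
The fixture updates above all follow the same pattern: fragments previously emitted as `<h1 class="Title">` (or, in some cases, `<p class="NarrativeText">`) are now emitted as `<p class="UncategorizedText">`. One way to review such a change is to tally element classes before and after. The sketch below uses only the standard library; `ClassCounter` and `count_classes` are hypothetical helpers, and the two snippets are abbreviated from the last fixture shown above.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts the value of the class attribute on every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class:
            self.counts[css_class] += 1


def count_classes(html: str) -> Counter:
    parser = ClassCounter()
    parser.feed(html)
    return parser.counts


before_html = (
    '<li class="ListItem" id="04d62ad595016d7b490dff67a00b9f35">import layoutparser as lp</li>'
    '<h1 class="Title" id="9d40bf1b2e2af1692f5689a1c44ab2ae">wwe</h1>'
)
after_html = (
    '<li class="ListItem" id="04d62ad595016d7b490dff67a00b9f35">import layoutparser as lp</li>'
    '<p class="UncategorizedText" id="9d40bf1b2e2af1692f5689a1c44ab2ae">wwe</p>'
)

before_counts = count_classes(before_html)
after_counts = count_classes(after_html)
# Classes that lost elements, and classes that gained them.
print("lost:", dict(before_counts - after_counts))
print("gained:", dict(after_counts - before_counts))
```

In practice the two strings would be read from the old and new versions of a fixture file; the element `id`s stay the same across the change, so only the tag and class differ.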
