diff --git a/CHANGELOG.md b/CHANGELOG.md
index ad3afdfc3e..baa69aae9f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,10 +1,11 @@
-## 0.17.6-dev0
+## 0.17.6-dev1
 
 ### Enhancements
 
 ### Features
 
 ### Fixes
+- **Do not use NLP to determine element types for extracted elements with hi_res.** This avoids extraneous Title elements in hi_res outputs. This only applies to *extracted* elements, meaning text objects that are found outside of Object Detection objects which get mapped to *inferred* elements. (*extracted* and *inferred* elements get merged together to form the list of `Element`s returned by `partition_pdf()`)
 
 ## 0.17.5
 
diff --git a/test_unstructured/partition/pdf_image/test_pdf.py b/test_unstructured/partition/pdf_image/test_pdf.py
index 6d1145eb80..7a0c8ff29c 100644
--- a/test_unstructured/partition/pdf_image/test_pdf.py
+++ b/test_unstructured/partition/pdf_image/test_pdf.py
@@ -823,8 +823,8 @@ def test_partition_categorization_backup():
         example_doc_path("pdf/layout-parser-paper-fast.pdf"),
         strategy=PartitionStrategy.HI_RES,
     )
-    # Should have changed the element class from Text to Title
-    assert isinstance(elements[0], Title)
+    # Should NOT have changed the element class from Text to Title
+    assert isinstance(elements[0], Text)
     assert elements[0].text == text
 
 
diff --git a/test_unstructured/partition/test_msg.py b/test_unstructured/partition/test_msg.py
index d1d66876ed..94b12d5578 100644
--- a/test_unstructured/partition/test_msg.py
+++ b/test_unstructured/partition/test_msg.py
@@ -141,7 +141,7 @@ def test_partition_msg_can_process_attachments():
         "Text",
         "Text",
         "Image",
-        "Title",
+        "Text",
         "Text",
         "Title",
         "Title",
diff --git a/test_unstructured_ingest/expected-structured-output-html/biomed-api/65/11/main.PMC6312790.pdf.html b/test_unstructured_ingest/expected-structured-output-html/biomed-api/65/11/main.PMC6312790.pdf.html
index a55cccdbbd..210109c06e 100644
--- a/test_unstructured_ingest/expected-structured-output-html/biomed-api/65/11/main.PMC6312790.pdf.html
+++ b/test_unstructured_ingest/expected-structured-output-html/biomed-api/65/11/main.PMC6312790.pdf.html
@@ -14,9 +14,9 @@
Contents lists available at ScienceDirect
-Data in Brief - +
journal homepage: www.elsevier.com/locate/dib
@@ -28,19 +28,19 @@(Jee - +
Department of Chemical, Metallurgical and Materials Engineering, Tshwane University of Technology, Pretoria, South Africa
-+
a r t i c l e i n f o
-+
a b s t r a c t
+
© Data presented here provide optimum conditions of waste material as inhibitor for stainless steel
+
© The data obtained for the inhibition of waste product (egg shell powder) on stainless steel Type 316
+
© The data can be used to examine the relationship between the process variable as it affect the
rate (mm/year) - +
The plot of inhibitor concentration over degree of surface coverage versus inhibitor concentration gives a straight line as shown in Fig. 5. The strong correlation reveals that egg shell adsorption on stainless surface in 0.5 M H2SO4 follow Langmuir adsorption isotherm. Figs. 6–8 show the SEM/EDX surface morphology analysis of stainless steel. Figs. 7 and 8 are the SEM/EDX images of the stainless steel specimens without and with inhibitor after weight loss experiment in sulphuric acid medium. The stainless steel surface corrosion product layer in the absence of inhibitor was porous and as a result gives no corrosion protection. With the presence of ES, corrosion damage was minimized, with an evidence of ES present on the metal surface as shown in Fig. 8.
@@ -232,12 +232,12 @@The potentiodynamic polarization method was performed on the prepared test samples immersed in 0.5 M H2SO4 solution in the presence and absence of different ES concentrations. A three electrode system was used; stainless steel Type 316 plate as working electrode with an exposed area of 1.0 cm2, platinum rod as counter electrode and silver chloride electrode as reference electrode. The electrode was polished, degreased in acetone and thoroughly rinsed with distilled water before the experiment. Current density against applied potential was plotted. The slope of the linear part in anodic and cathodic plots gives anodic and cathodic constants according to the Stern–Geary equation, and the
-ð2Þ - -
ð3Þ - +
Contents lists available at ScienceDirect
-Data in Brief - +
journal homepage: www.elsevier.com/locate/dib
@@ -28,9 +28,9 @@(eee - +
Sarang Kulkarni a,b,c,n, Mohan Krishnamoorthy d,e, Abhiram Ranade f, Andreas T. Ernst c, Rahul Patil b
@@ -52,16 +52,16 @@e School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072,
-Australia - +
f Department of Computer Science and Engineering, IIT Bombay, Powai, Mumbai 400076, India
-+
a r t i c l e i n f o
-+
a b s t r a c t
@@ -106,13 +106,13 @@
+
e All the problem instances are available for use without any restrictions.
+
© The dataset includes a program that can generate similar problem instances of different sizes.
The dataset contains 60 different problem instances of the multiple depot vehicle scheduling pro- blem (MDVSP). Each problem instance is provided in a separate file. Each file is named as ‘RN-m-n-k.dat’, where ‘m’, ‘n’, and ‘k’ denote the number of depots, the number of trips, and the instance number for the size, ‘ðm;nÞ’, respectively. For example, the problem instance, ‘RN-8–1500-01.dat’, is the first problem instance with 8 depots and 1500 trips. For the number of depots, m, we used three values, 8,12, and 16. The four values for the number of trips, n, are 1500, 2000, 2500, and 3000. For each size, ðm;nÞ, five instances are provided. The dataset can be downloaded from https://orlib.uqcloud.net. For each problem instance, the following information is provided:
-The number of depots mð - +
Þ,
@@ -187,9 +187,9 @@Possible empty travels - +
Camila Loureiro*1, Corsi-Zuelli Fabiana1, Fachim Helene Aparecida1, Shuhama Rosana1, Menezes Paulo Rossi1, Dalton Caroline F2,
-AQ3 - +