Commit ce493b6

Update README.md
fixing typos
1 parent 5e933c3 commit ce493b6

1 file changed

Lines changed: 8 additions & 7 deletions

README.md

@@ -2,9 +2,9 @@

 DOCX files are complex, and their complexity makes scraping documents
 for their content difficult. The aim of this package is to simplify
-`.docx` files to just the components which carry meaning thereby easing the
-process of document identification and scraping by converting a `.docx`
-file into a predictable an *human readable* JSON file.
+`.docx` files to just the components which carry meaning, thereby easing the
+process of pattern matching and data extraction by converting a `.docx`
+file into a predictable and *human readable* JSON file.

 Simplifying a complex document down to it's *meaningful* parts of course
 requires taking a position on what does and does-not convey meaning in a
@@ -43,9 +43,13 @@ etc.), you'll need to clone [this fork](https://github.com/jdthorpe/python-docx)

 ### General

-* **"friendly-names"**: (*Default = `True`*): Use user-friendly type names
+* **"friendly-name"**: (*Default = `True`*): Use user-friendly type names
 such as "table-cell", over standard element names like "CT_Tc"

+* **"merge-consecutive-text"**: (*Default = `True`*): Sentences and even single
+words can be represented by multiple text elements. If `True`,
+concatenate consecutive text elements into a single text element.
+
 ### Ignoring Invisible things

 * **"ignore-empty-paragraphs"**: (*Default = `True`*): Empty paragraphs are
@@ -147,9 +151,6 @@ often used to divide sections of a document into logical components.

 ### Special content

-* **"merge-consecutive-text"**: (*Default = `True`*): Sentences and even single
-words can be represented by multiple text elements. If `True`,
-concatenate consecutive text elements into a single text element.
 * **"flatten-hyperlink"**: (*Default = `True`*): Flatten hyperlinks, including
 their contents in the flow of normal text.
 * **"flatten-smartTag"**: (*Default = `True`*): Flatten smartTag elements,
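
The "merge-consecutive-text" option that this commit moves into the General section describes concatenating adjacent text elements. A minimal sketch of that idea follows. It is not the package's implementation: the function name `merge_consecutive_text` and the element shape (dicts with `"TYPE"` and `"VALUE"` keys) are assumptions made only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical element shape, assumed for this sketch only:
# {"TYPE": "text", "VALUE": "..."} for text, other TYPE values otherwise.
Element = Dict[str, Any]


def merge_consecutive_text(elements: List[Element]) -> List[Element]:
    """Concatenate runs of adjacent text elements into single elements."""
    merged: List[Element] = []
    for element in elements:
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous.get("TYPE") == "text"
            and element.get("TYPE") == "text"
        ):
            # Extend the text element already collected.
            previous["VALUE"] += element["VALUE"]
        else:
            # Shallow-copy so the caller's dicts are never mutated.
            merged.append(dict(element))
    return merged


runs = [
    {"TYPE": "text", "VALUE": "Hel"},
    {"TYPE": "text", "VALUE": "lo "},
    {"TYPE": "text", "VALUE": "world"},
    {"TYPE": "tab"},
    {"TYPE": "text", "VALUE": "!"},
]
print(merge_consecutive_text(runs))
```

Under these assumptions, the first three elements collapse into one text element, while the non-text element in the middle keeps the trailing text element separate.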
