Commit 68917ca ("readme update")
fix various typos
1 parent cee5ff0

2 files changed: 88 additions & 78 deletions

File tree:

docs/readme.md: 60 additions & 56 deletions
@@ -2,28 +2,28 @@ CSV Lint plug-in documentation
 ==============================

 **CSV Lint** is a plug-in for Notepad++ to work with [comma-separated values](https://en.wikipedia.org/wiki/Comma-separated_values)
-(csv) and fixed width data files.
+(CSV) and fixed-width data files.


 * [CSV Lint plug-in](https://github.com/BdR76/CSVLint/)
 * [Notepad++ homepage](http://notepad-plus-plus.org/)

 ![preview screenshot](../csvlint_preview.png?raw=true "CSVLint plug-in preview")

-Use the **CSV Lint** plug-in to quickly and easily inspect csv data files,
+Use the **CSV Lint** plug-in to quickly and easily inspect CSV data files,
 apply syntax highlighting to columns, detect technical errors and fix datetime
 and decimal formatting. It's not meant as a replacement for a
 [spreadsheet](https://www.reddit.com/r/datascience/comments/1dsnbww/youre_not_helping_excel_please_stop_helping/)
-program, but rather it's a quality control tool to examine, verify or polish up
+program, but rather as a quality control tool to examine, verify or polish up
 a dataset before further processing.

 First install and open Notepad++, then go to the menu item `Plugins > Plugins Admin...`,
 search for "csv lint", check the checkbox and press Install. This will add
 CSV Lint under the `Plugins > CSV Lint` menu item and a CSV Lint icon in the
-toolbar icon.
+toolbar.

 CSV Lint doesn't require an internet connection and doesn't use any cloud service.
-All data processing is done offline on the pc that runs Notepad++.
+All data processing is done offline on the PC that runs Notepad++.

 **If you find the CSV Lint plug-in useful you can buy me a coffee!**
 [![paypal](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/donate/?hosted_button_id=T8QZSFBNAPERL)
@@ -37,7 +37,7 @@ happen that the beginning of the file displays column colors but at the end of the
 file it's still uncolored.

 There are four pre-defined color schemes you can select using the
-`Highlighting` button on the `Plugins > CSV Lint > Setttings` dialog.
+`Highlighting` button on the `Plugins > CSV Lint > Settings` dialog.
 At first startup, the plug-in will select a light or dark mode color scheme,
 depending on the Dark Mode setting in the Notepad++ `config.xml`.

@@ -81,13 +81,13 @@ displayed. This contains information about the file, which separator is used
 and if it has column headers, as well as all the definitions for each column,
 such as datatype and width.

-The metadata is very important for the other functionality to work. When the
+The metadata is very important for the other features to work. When the
 content of the file and the metadata get out-of-sync, for example when editing
-the data file, the plug-in will detect the columns incorrectly. This can lead
+the data file, the plug-in may detect the columns incorrectly. This can lead
 to unexpected results when reformatting or validating the data.

-The metadata is based on the `schema.ini` format and it is important for the
-edit options in the plug-in to work correctly. You can press "Detect columns"
+The metadata is based on the `schema.ini` format and is required for the editing
+features in the plug-in to work correctly. You can press "Detect columns"
 to automatically detect the metadata from the datafile, and/or manually
 edit the metadata in the textbox. When manually editing the metadata, always
 press the save icon (blue disk) to apply it before continuing.
@@ -96,7 +96,7 @@ press the save icon (blue disk) to apply it before continuing.

 Press the "Detect columns" button to auto-detect column types from the
 currently active file. The auto-detection function will try to infer the
-column separator character and column data types by looking at the data.
+column separator character and column datatypes by looking at the data.
 When a file is opened and no schema.ini is found then
 this auto-detection feature will also run once by default.

@@ -135,10 +135,10 @@ Here you can enter the column separator character or Fixed Width.
 When selecting "Fixed Width" you can optionally provide a comma separated
 list of the column ending positions in "Column end positions". For example
 the text data `2025-10-15HbA1c 123.5` has column end positions `10, 16, 21`.
-The plug-in expects the column character positions, but if you instead enter
+The plug-in expects column character positions, but if you instead enter
 the individual column widths, so `10, 6, 5` in this example, that will also
 work in most cases. Leave "Fixed positions" empty and the plug-in will
-try to detect fixed width columns same as auto-detect.
+try to detect fixed width columns the same way as auto-detect.
 Click the `[..]` button to paste the current column widths into the textbox,
 either as absolute column positions or right-click for individual column widths.
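The end-position example above can be sketched in a few lines of Python; this illustrates the slicing idea, not the plug-in's own code:

```python
def split_fixed_width(line, end_positions):
    # Slice a fixed-width record at the given column end positions.
    starts = [0] + end_positions[:-1]
    return [line[a:b] for a, b in zip(starts, end_positions)]

split_fixed_width("2025-10-15HbA1c 123.5", [10, 16, 21])
# -> ['2025-10-15', 'HbA1c ', '123.5']
```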

@@ -159,7 +159,7 @@ in the same folder as the data file. The next time you open the datafile with
 the plug-in, it will automatically load the metadata from this file.

 The file and column metadata will be saved under a section with the filename.
-A `schema.ini` file can contain the meta data for more
+A `schema.ini` file can contain the metadata for more
 than one data file, using a separate section for each file.

 ### Toggle syntax highlighting ###
@@ -203,7 +203,7 @@ decimals digits for example `NumberDigits=2` for values like "1.23" or
 ### Comment lines

 Some data files contain comments or documentation about the data.
-CSV Lint supports a `SkipLines` option to skip at start of the file, and a `CommentChar` option to skip lines thst start with this character.
+CSV Lint supports a `SkipLines` option to skip lines at the start of the file, and a `CommentChar` option to skip lines that start with this character.
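The `SkipLines` and `CommentChar` behaviour described above can be approximated like this; the function and its exact semantics are assumptions for illustration, not the plug-in's implementation:

```python
import csv
import io

def read_csv_skipping(text, skip_lines=0, comment_char="#"):
    # Drop the first skip_lines lines, then drop any line that
    # starts with the comment character, then parse the rest as CSV.
    lines = text.splitlines()[skip_lines:]
    data_lines = [ln for ln in lines if not ln.startswith(comment_char)]
    return list(csv.reader(io.StringIO("\n".join(data_lines))))

sample = "# exported 2025\nid,score\n1,10\n2,20\n"
rows = read_csv_skipping(sample)
# rows -> [['id', 'score'], ['1', '10'], ['2', '20']]
```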

 Note that the `SkipLines` and `CommentChar` keywords are not officially part of the schema.ini format
 (please upvote [the suggestion here](https://feedback.azure.com/d365community/search/?q=schema.ini)),
@@ -213,7 +213,8 @@ so applications that use the ODBC Text driver will ignore this setting.

 Enumeration, or coded values, is when a column may only contain a certain set of values.
 For example boolean columns that can only contain `true`/`false` or `Yes`/`No`,
-or a variable "TestStage" that can only contain `Warmup`, `Training` or `Recovery`.
+or a variable "TestStage" that can only contain `Warmup`, `Training` or `Recovery`,
+or a multiple choice variable that can only contain `1`, `2`, `3`, `4` or `5`.

 This type of column is also not supported by the schema.ini format, but
 CSV Lint does support this using the `Enumeration` keyword followed by the
@@ -225,8 +226,11 @@ enumeration items separated by a `|` character, see example below
 Col7=TestStage Text Width 8
 ;Col7=TestStage Enumeration Recovery|Training|Warmup

+Col13=Question1 Integer Width 1
+;Col13=Question1 Enumeration 1|2|3|4|5
+
 Auto-detect will check the unique values for string or integer columns; if the unique value count is less than or equal to the `UniqueValuesMax` setting, then
-it will be treated as a enumeration column, and stored as such in the meta data.
+it will be automatically detected and annotated with the `Enumeration` keyword in the metadata.

 The plug-in menu options `Convert data` and `Generate metadata` will also export these enumeration metadata, where possible.
 When converting the data to SQL insert statements, the script will also contain column constraints or enum types, depending on the SQL database type.
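The `UniqueValuesMax` rule described above can be sketched as follows; this is assumed logic for illustration, and the plug-in's actual detection may differ:

```python
def detect_enumeration(values, unique_values_max=10):
    # If a column has at most unique_values_max unique values,
    # report them as a pipe-separated Enumeration, schema.ini style.
    unique = sorted(set(values))
    if len(unique) <= unique_values_max:
        return "Enumeration " + "|".join(unique)
    return None

detect_enumeration(["Warmup", "Training", "Recovery", "Training", "Warmup"])
# -> 'Enumeration Recovery|Training|Warmup'
```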
@@ -262,15 +266,19 @@ or add `00:00:00` as a time part to all values.

 ### Decimal separator ###

-Set the decimal separator for all decimal/float values, select either the dot `.` or the comma `,`.
+Set the decimal separator for all decimal/float values,
+select either the dot `.` or the comma `,`.

 ### Replace CrLf within values ###

-Replace new-line characters (carriage return / line feed) within quoted values with a given string.
-New lines usually indicate the next record in a dataset. However, quoted values may also contain a new line character.
-Sometimes this can cause problems and these values aren't processed correctly.
-You can use this option to replace the new-lines with for example `<br>` or `\par` or just a space ` `.
-The plug-in will only replace the new-line characters within a quoted value, not the new-lines at the end of each record.
+Replace new-line characters (carriage return / line feed) within quoted values
+with a given string. New lines usually indicate the next record in a dataset.
+However, quoted values may also contain a new line character. In some
+applications this can cause problems and these values aren't processed
+correctly. You can use this option to replace the new-lines with
+for example `<br>` or `\par` or just a space ` `.
+The plug-in will only replace the new-line characters within a quoted value,
+not the new-lines at the end of each record.
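The idea of replacing new-lines only inside quoted values can be sketched with a regular expression; this is an assumed approach, not the plug-in's actual implementation:

```python
import re

def replace_crlf_in_quotes(text, replacement=" "):
    # The pattern matches one double-quoted CSV value; doubled "" inside
    # a value is allowed. Only newlines inside the match are replaced,
    # so record-ending newlines are left untouched.
    def fix(m):
        return m.group(0).replace("\r\n", replacement).replace("\n", replacement)
    return re.sub(r'"(?:[^"]|"")*"', fix, text)

raw = 'id,note\n1,"line one\nline two"\n2,plain\n'
replace_crlf_in_quotes(raw, "<br>")
# -> 'id,note\n1,"line one<br>line two"\n2,plain\n'
```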

 ### Align vertically ###

@@ -312,16 +320,20 @@ Also see `Plugins > CSV Lint > Settings`.

 Validate data
 -------------
-Validate data based on the meta data. When you press "Validate data",
+Validate data based on the metadata. When you press "Validate data",
 the input data will be checked for technical errors based on the given metadata.
 The line and column numbers of any errors will be logged in the textbox on the
 right. It will check the input data for the following errors:

 * Values that are too long, example value "abcde" when column is "Width 4"
 * Non-numeric values in numeric columns, example value "n/a" when column datatype is Integer
 * Incorrect decimal separator, example value "12.34" when DecimalSymbol is set to comma
-* Too many decimals, example value "12.345" when NumberDigits=2.
+* Too many decimals, example value "12.345" when NumberDigits=2
 * Incorrect date format, example value "12/31/2025" when DateTimeFormat=dd/mm/yyyy
+* Date value out of range, example value "01/01/2205" when YearMaximum=2050
+* Too few or too many columns
+* Invalid enumeration code, example value "Cooldown" when Enumeration="Warmup|Training|Recovery"
+* Unexpected column names, the name found in the first header row doesn't match the metadata (only when `ColNameHeader=True`)
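A few of the checks in the list above could be sketched like this; this is assumed logic for illustration, and the plug-in's exact rules may differ:

```python
from datetime import datetime

def validate_value(value, width=None, number_digits=None, date_format=None):
    # Returns a list of error strings for one value, covering the
    # width, decimals and date format checks from the list above.
    errors = []
    if width is not None and len(value) > width:
        errors.append(f"too long for Width {width}")
    if number_digits is not None and "." in value:
        if len(value.split(".")[1]) > number_digits:
            errors.append(f"more than NumberDigits={number_digits} decimals")
    if date_format is not None:
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            errors.append(f"does not match date format {date_format}")
    return errors

validate_value("12.345", number_digits=2)
# -> ['more than NumberDigits=2 decimals']
```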

 Important note: If you've edited the data file, for example changed the column
 separator or added columns using the Split function, make sure to also update
@@ -331,10 +343,10 @@ then saving it.

 Sort data
 ---------
-Sort data on a single column, and take into account the data type of the column.
+Sort data on a single column, and take into account the datatype of the column.
 String text columns will be sorted alphabetically. Integer, decimal and
 datetime columns will be sorted according to their respective values.
-Note, that the resulting new dataset will have quotes applied according to the
+Note that the resulting dataset will have quotes applied according to the
 current `Apply quotes` setting in the Reformat dialog.

 ![CSV Lint sort data dialog](/docs/csvlint_sort_data.png?raw=true "CSV Lint plug-in sort data dialog")
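The difference between a plain text sort and the datatype-aware sort described above, as a sketch (not the plug-in's own code):

```python
rows = [["10", "a"], ["2", "b"], ["1", "c"]]

alpha = sorted(rows, key=lambda r: r[0])         # string sort
numeric = sorted(rows, key=lambda r: int(r[0]))  # integer sort
# alpha   -> [['1', 'c'], ['10', 'a'], ['2', 'b']]
# numeric -> [['1', 'c'], ['2', 'b'], ['10', 'a']]
```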
@@ -400,7 +412,7 @@ value `PT123` into `PT12300`, see other examples below:
 ### Search and replace ###

 Search and replace a string with another string for all values in a column.
-Unlike the default "Search and replace" function of Notepad++, this only affect
+Unlike the default "Search and replace" function of Notepad++, this only affects
 the values in a single column, not the values of any other columns. Note that
 this is case-sensitive, for example search for `no` and replace with `False`:

@@ -419,7 +431,7 @@ contains string values like `error` or `N/A`. This option will create two new
 columns, one with just the valid values and one containing the invalid values.

 As an example, if there is a column VISITDAT and in the metadata it is defined
-as a date value formated as `dd-mm-yyyy`, see some example results below:
+as a date value formatted as `dd-mm-yyyy`, see some example results below:

 | visitdat | visitdat (2) | visitdat (3) |
 |------------|--------------|--------------|
@@ -432,7 +444,7 @@ as a date value formated as `dd-mm-yyyy`, see some example results below:
 ### Split on character ###

 Split on a character, split the value on the Nth occurrence of character or
-string. For example split on `/` with Nth occurrence `1` will split the orginal
+string. For example split on `/` with Nth occurrence `1` will split the original
 value `121/84` into `121` and `84`, see examples below:

 | bpvalue | bpvalue (2) | bpvalue (3) |
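The split described above could be sketched as follows; the semantics (one split at occurrence N, counting from 1) are assumed for illustration, not the plug-in's own code:

```python
def split_nth(value, sep, n):
    # Split value at the Nth occurrence of sep into two parts.
    parts = value.split(sep)
    if len(parts) <= n:
        return (value, "")  # fewer than n occurrences: nothing to split
    return (sep.join(parts[:n]), sep.join(parts[n:]))

split_nth("121/84", "/", 1)  # -> ('121', '84')
split_nth("1/2/3", "/", 2)   # -> ('1/2', '3')
```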
@@ -528,28 +540,29 @@ The amount of each of these 3 values is listed under "Unique values".

 Select Columns
 --------------
-Select columns and/or put columns in a different order.
-Select one or more columns.
+Select columns and/or rearrange to put columns in a different order using the
+`Move Up` or `Move Down` buttons.

 ![CSV Lint select columns dialog](/docs/csvlint_select_columns.png?raw=true "CSV Lint plug-in select columns dialog")

-Check the `Select distinct values` checkbox to list all unique values in a column, or
-combination of columns, and count how often that unique value or combination
-of values was found. This can be useful to check if the dataset contains the
-expected amount of unique names, patients, product codes, barcodes etc.
+Check the `Select distinct values` checkbox to list only the distinct
+values of a column or combination of columns, and count how often that
+unique value or combination of values was found. This can be useful to check
+if the dataset contains the expected number of unique names, patients,
+product codes, barcodes etc.

 As an example, suppose you have a data file where each line is one blood pressure
-measurement of a participant, and you want to verify that each participant in
+measurement of a patient, and you want to verify that each patient in
 the data file has exactly 3 measurements. In that case you can select just the
-column participantId and select sort by `count`, to sort the result by the new
+column patientId and select sort by `count`, to sort the result by the new
 `count_distinct` column.

-If the data is correct, it should list all participantId with a `count_distinct`
+If the data is correct, it should list every patientId with a `count_distinct`
 value of 3. And, because it's sorted by `count_distinct`, you can check the
-beginning and end of the list to see if there are any participants with fewer
+beginning and end of the list to see if there are any patients with fewer
 or more than 3 measurements.

-Right-click Ascending or Descing to disable sorting, the resulting list
+Right-click Ascending or Descending to disable sorting; the resulting list
 of values will be in the order in which the values were first found in the dataset.
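The distinct-count idea from the example above, sketched with made-up values (not the plug-in's own code):

```python
from collections import Counter

rows = [["p01"], ["p02"], ["p01"], ["p01"], ["p02"]]
counts = Counter(r[0] for r in rows)
by_count = sorted(counts.items(), key=lambda kv: kv[1])
# by_count -> [('p02', 2), ('p01', 3)]
```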

Convert data
@@ -565,8 +578,8 @@ For XML, enter a `Table/tag name` to use as tag name for each record,
 or leave it empty to use the current filename.

 Select SQL to convert the data to an SQL script to create a database
-table and inserts all records from the csv datafile into that table.
-The insert statement will be grouped in batches of X lines of csv data,
+table and inserts all records from the CSV datafile into that table.
+The insert statements will be grouped in batches of X lines of CSV data,
 as set by the Batch size number in the plug-in Settings.
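Batched insert generation as described above could look roughly like this; the column names, quoting and exact statement layout are assumptions for illustration, not the plug-in's actual output:

```python
def sql_inserts(table, header, rows, batch_size=2):
    # Emit one INSERT statement per batch of batch_size rows;
    # all values are quoted as strings for simplicity.
    stmts = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        values = ",\n".join(
            "(" + ", ".join(f"'{v}'" for v in row) + ")" for row in batch
        )
        stmts.append(f"INSERT INTO {table} ({', '.join(header)}) VALUES\n{values};")
    return stmts

stmts = sql_inserts("bloodpressure", ["id", "sys"], [["1", "121"], ["2", "130"], ["3", "118"]])
# len(stmts) -> 2 (two rows in the first batch, one in the second)
```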

Depending on which database type you select, MySQL/MariaDB, MS-SQL or
@@ -604,20 +617,11 @@ See below for an example of an SQL insert script the plugin will generate:

 Note: Oracle is not supported as Database type, but you can work around this by
 using Database type `MySQL / MariaDB` and setting the Batch size to `1`.
-If there are any Date or DateTime values, then add this line
+If there are any Date or DateTime values, then add the following line
 at the top of the script.

     ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

-Or, alternatively, use Notepad++ Search and Replace to apply `TO_DATE()`
-to any datetime values, so in the resulting INSERT script press `Ctrl + H`
-and then:
-
-    Find what: '(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})'
-    Replace with: TO_DATE\('\1', 'YYYY-MM-DD HH24:MI:SS'\)
-    Search mode: Regular expression
-    -> [Replace all]
-
 Generate metadata
 -----------------
 Generate metadata in different formats based on the column separator,
@@ -657,20 +661,20 @@ which contains the following columns:

 ### Python ###

-Generates a [Python](https://www.python.org/) script to read the csv data file
+Generates a [Python](https://www.python.org/) script to read the CSV data file
 as a dataframe. It contains the required scripting for the appropriate
 datatypes, and it is meant as a starting point for further script development.

 ### R-script ###

-Generates an [R-script](https://www.r-project.org/) to read the csv data file
+Generates an [R-script](https://www.r-project.org/) to read the CSV data file
 as a dataframe. It contains the required scripting for the appropriate
 datatypes, and it is meant as a starting point for further script development in
 [R-Studio](https://www.rstudio.com/products/rstudio/).

 ### PowerShell ###

-Generates a [PowerShell](https://en.wikipedia.org/wiki/PowerShell) script to read the csv data file
+Generates a [PowerShell](https://en.wikipedia.org/wiki/PowerShell) script to read the CSV data file
 as a dataset variable. It contains the required scripting for the appropriate
 datatypes, and it is meant as a starting point for further script development.
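A generated "read the CSV as a dataframe" script might look roughly like this; the column names, dtypes and inline sample data are illustrative assumptions, not the plug-in's actual generated output:

```python
import io

import pandas as pd

# Inline sample instead of a file, so the sketch is self-contained.
csvdata = io.StringIO("patientId;sys;visitdat\np01;121;15-10-2025\n")
df = pd.read_csv(
    csvdata,
    sep=";",
    dtype={"patientId": "string", "sys": "Int64"},
    parse_dates=["visitdat"],
    dayfirst=True,  # dd-mm-yyyy dates, as in the VISITDAT example above
)
```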
