|
3 | 3 | <head> |
4 | 4 | <!-- Book generated using mdBook --> |
5 | 5 | <meta charset="UTF-8"> |
6 | | - <title>Xlsx - ROAPI Documentation</title> |
| 6 | + <title>MS Excel compatible formats - ROAPI Documentation</title> |
7 | 7 |
|
8 | 8 |
|
9 | 9 | <!-- Custom HTML head --> |
@@ -168,20 +168,97 @@ <h1 class="menu-title">ROAPI Documentation</h1> |
168 | 168 |
|
169 | 169 | <div id="content" class="content"> |
170 | 170 | <main> |
171 | | - <h1 id="xlsxmicrosoft-excel"><a class="header" href="#xlsxmicrosoft-excel">Xlsx(Microsoft Excel)</a></h1> |
172 | | -<p>To load the <code>.xlsx</code>, <code>sheet_name</code> needs to be specified in a config.</p> |
173 | | -<p>By default, most <code>.xlsx</code> files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this sheet_name as needed if your spreadsheet uses a different sheet_name.</p> |
174 | | -<p>Ex) Sheet1</p> |
175 | | -<p><img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /></p> |
| 171 | + <h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats</a></h1> |
| 172 | +<p>ROAPI can load several Microsoft Excel compatible formats: xls, xlsx, xlsb, and ods.</p> |
| 173 | +<h2 id="configuration"><a class="header" href="#configuration">Configuration</a></h2> |
| 174 | +<p>To load MS Excel compatible files, configure the table as follows:</p> |
| 175 | +<pre><code class="language-yaml">tables: |
| 176 | + - name: "<table name>" |
| 177 | + uri: "<files path>" |
| 178 | + option: |
| 179 | + format: "<file format>" |
| 180 | + sheet_name: "Sheet1" |
| 181 | + rows_range_start: 2 |
| 182 | + rows_range_end: 5 |
| 183 | + columns_range_start: 1 |
| 184 | + columns_range_end: 6 |
| 185 | + schema_inference_lines: 3 |
| 186 | +</code></pre> |
| 187 | +<ul> |
| 188 | +<li><strong>format</strong> - the file format. Currently supported formats: |
| 189 | +<ul> |
| 190 | +<li>xls (Microsoft Excel 5.0/95 Workbook)</li> |
| 191 | +<li>xlsx (Excel Workbook)</li> |
| 192 | +<li>xlsb (Excel Binary Workbook)</li> |
| 193 | +<li>ods (OpenDocument Spreadsheet)</li> |
| 194 | +</ul> |
| 195 | +</li> |
| 196 | +<li><strong>sheet_name</strong> - the name of the sheet that contains the table data. By default, the first sheet in most files is named Sheet1. Change <code>sheet_name</code> if your spreadsheet uses a different name. |
| 197 | +<img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /> |
| 198 | +If no <code>sheet_name</code> is specified, ROAPI uses the first sheet in the file.</li> |
| 199 | +<li><strong>Table range options</strong> |
| 200 | +<ul> |
| 201 | +<li><strong>rows_range_start</strong> - the first row of the table. This row contains the column names. By default, <code>rows_range_start</code> is 0 (the first row in the spreadsheet).</li> |
| 202 | +<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li> |
| 203 | +<li><strong>columns_range_start</strong> - the first column of the table. By default, <code>columns_range_start</code> is 0 (the first column in the spreadsheet).</li> |
| 204 | +<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br /> |
| 205 | +For example, to take only selected data: |
| 206 | +<img src="../../images/spread_sheet_range.png" alt="spread_sheet_range" /> |
| 207 | +the config file looks like:</li> |
| 208 | +</ul> |
| 209 | +</li> |
| 210 | +</ul> |
| 211 | +<pre><code class="language-yaml">tables: |
| 212 | + - name: "<table name>" |
| 213 | + uri: "<files path>" |
| 214 | + option: |
| 215 | + format: "<file format>" |
| 216 | + sheet_name: "Sheet1" |
| 217 | + rows_range_start: 1 |
| 218 | + rows_range_end: 4 |
| 219 | + columns_range_start: 1 |
| 220 | + columns_range_end: 3 |
| 221 | +</code></pre> |
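The range options can be read as zero-based bounds on a grid of cells. The following Python sketch illustrates that reading only. `select_range` is a hypothetical helper, not part of ROAPI, and treating the end bounds as inclusive is an assumption based on the "last row" and "last column" wording above.

```python
from typing import Any, List, Optional


def select_range(
    grid: List[List[Any]],
    rows_range_start: int = 0,
    rows_range_end: Optional[int] = None,
    columns_range_start: int = 0,
    columns_range_end: Optional[int] = None,
) -> List[List[Any]]:
    """Return the sub-grid described by the four range options.

    Assumption: indexes are zero-based and both end bounds are inclusive.
    A missing end bound means "read to the end", as the defaults describe.
    """
    last_row = len(grid) - 1 if rows_range_end is None else rows_range_end
    selected = []
    for row in grid[rows_range_start:last_row + 1]:
        last_col = len(row) - 1 if columns_range_end is None else columns_range_end
        selected.append(row[columns_range_start:last_col + 1])
    return selected


# Hypothetical 6x5 sheet whose cells are labelled by their position.
sheet = [[f"r{r}c{c}" for c in range(5)] for r in range(6)]

# Same values as the config above: rows 1..4 and columns 1..3.
table = select_range(sheet, 1, 4, 1, 3)
header, data = table[0], table[1:]  # the first selected row holds column names
```

Under these assumptions the selection has four rows and three columns, and the row at `rows_range_start` supplies the column names.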
| 222 | +<ul> |
| 223 | +<li><strong>schema_inference_lines</strong> - the number of rows (inside the table range) to use for schema inference. This number includes the row with column names. For example, <code>schema_inference_lines: 3</code> means ROAPI uses the first row to infer column names and the next 2 rows to infer column types. If this option is not specified, ROAPI reads all rows to infer column data types.</li> |
| 224 | +</ul> |
| 225 | +<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema inference</a></h2> |
| 226 | +<p>ROAPI can infer the schema of the data automatically. The first row of the data range is the row with column names. After inferring column names, ROAPI infers data types by scanning either all remaining rows or the limited number of rows specified in the <code>schema_inference_lines</code> option. |
| 227 | +If a column contains more than one data type (for example, float and int), ROAPI uses the Utf8 data type.</p> |
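The inference rules described above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not ROAPI's implementation: `infer_schema` and `type_name` are hypothetical helpers, and mapping a column with no values to `Null` is an assumption that the text does not state.

```python
from typing import Any, List, Optional, Tuple


def type_name(value: Any) -> str:
    """Map a Python cell value to an Arrow-style type name (illustrative only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    return "Utf8"


def infer_schema(
    rows: List[List[Any]],
    schema_inference_lines: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Infer (column name, data type) pairs from a table range.

    The first row holds column names. The limit counts that header row too,
    so schema_inference_lines=3 inspects the header plus two data rows.
    """
    if not rows:
        return []
    sample = rows if schema_inference_lines is None else rows[:schema_inference_lines]
    header, data = sample[0], sample[1:]
    schema = []
    for idx, name in enumerate(header):
        kinds = {type_name(row[idx]) for row in data if row[idx] is not None}
        if not kinds:
            dtype = "Null"  # assumption: no values seen for this column
        elif len(kinds) == 1:
            dtype = kinds.pop()
        else:
            dtype = "Utf8"  # more than one data type falls back to Utf8
        schema.append((str(name), dtype))
    return schema


rows = [
    ["id", "price", "label"],
    [1, 1.5, "a"],
    [2, 3, "b"],
    [3, "x", "c"],
]
full = infer_schema(rows)
limited = infer_schema(rows, schema_inference_lines=2)
```

In this sketch, scanning every row types `price` as `Utf8` because the column mixes floats, ints, and strings, while limiting the scan to two lines sees only `1.5` and types it as `Float64`. A small limit is faster but can miss values that appear later in the column.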
| 228 | +<p>It is also possible to specify the schema in the configuration file. ROAPI then skips schema inference from the data, so the table loads faster.</p> |
176 | 229 | <pre><code class="language-yaml">tables: |
177 | 230 | - name: "excel_table" |
178 | 231 | uri: "path/to/file.xlsx" |
179 | 232 | option: |
180 | 233 | format: "xlsx" |
181 | | - sheet_name: "Sheet1" |
| 234 | + schema: |
| 235 | + columns: |
| 236 | + - name: "int_column" |
| 237 | + data_type: "Int64" |
| 238 | + nullable: true |
| 239 | + - name: "string_column" |
| 240 | + data_type: "Utf8" |
| 241 | + nullable: true |
| 242 | + - name: "float_column" |
| 243 | + data_type: "Float64" |
| 244 | + nullable: true |
| 245 | + - name: "datetime_column" |
| 246 | + data_type: !Timestamp [Seconds, null] |
| 247 | + nullable: true |
| 248 | + - name: "duration_column" |
| 249 | + data_type: !Duration Second |
| 250 | + nullable: true |
| 251 | + - name: "date32_column" |
| 252 | + data_type: Date32 |
| 253 | + nullable: true |
| 254 | + - name: "date64_column" |
| 255 | + data_type: Date64 |
| 256 | + nullable: true |
| 257 | + - name: "null_column" |
| 258 | + data_type: Null |
| 259 | + nullable: true |
182 | 260 | </code></pre> |
183 | | -<p>If no <code>sheet_name</code> is specified, ROAPI will throw the error.</p> |
184 | | -<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/xlsx.md">Edit this page on GitHub.</a></footer> |
| 261 | +<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/excel.md">Edit this page on GitHub.</a></footer> |
185 | 262 | </main> |
186 | 263 |
|
187 | 264 | <nav class="nav-wrapper" aria-label="Page navigation"> |
|