
Commit fd80b6e

committed
Deploying to gh-pages from @ 1f4099d 🚀
1 parent 67e6319 commit fd80b6e

9 files changed

Lines changed: 179 additions & 25 deletions


config/blob-store.html

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -280,7 +280,7 @@ <h2 id="azure-storage"><a class="header" href="#azure-storage">Azure Storage</a>
280280

281281
<nav class="nav-wrapper" aria-label="Page navigation">
282282
<!-- Mobile navigation buttons -->
283-
<a rel="prev" href="../config/dataset-formats/xlsx.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
283+
<a rel="prev" href="../config/dataset-formats/excel.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
284284
<i class="fa fa-angle-left"></i>
285285
</a>
286286

@@ -294,7 +294,7 @@ <h2 id="azure-storage"><a class="header" href="#azure-storage">Azure Storage</a>
294294
</div>
295295

296296
<nav class="nav-wide-wrapper" aria-label="Page navigation">
297-
<a rel="prev" href="../config/dataset-formats/xlsx.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
297+
<a rel="prev" href="../config/dataset-formats/excel.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
298298
<i class="fa fa-angle-left"></i>
299299
</a>
300300

config/dataset-formats/arrow.html

Lines changed: 2 additions & 2 deletions
@@ -192,7 +192,7 @@ <h1 id="arrow"><a class="header" href="#arrow">Arrow</a></h1>
192192
<i class="fa fa-angle-left"></i>
193193
</a>
194194

195-
<a rel="next prefetch" href="../../config/dataset-formats/xlsx.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
195+
<a rel="next prefetch" href="../../config/dataset-formats/excel.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
196196
<i class="fa fa-angle-right"></i>
197197
</a>
198198

@@ -206,7 +206,7 @@ <h1 id="arrow"><a class="header" href="#arrow">Arrow</a></h1>
206206
<i class="fa fa-angle-left"></i>
207207
</a>
208208

209-
<a rel="next prefetch" href="../../config/dataset-formats/xlsx.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
209+
<a rel="next prefetch" href="../../config/dataset-formats/excel.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
210210
<i class="fa fa-angle-right"></i>
211211
</a>
212212
</nav>
Lines changed: 86 additions & 9 deletions
@@ -3,7 +3,7 @@
33
<head>
44
<!-- Book generated using mdBook -->
55
<meta charset="UTF-8">
6-
<title>Xlsx - ROAPI Documentation</title>
6+
<title>MS Excel compatible formats - ROAPI Documentation</title>
77

88

99
<!-- Custom HTML head -->
@@ -168,20 +168,97 @@ <h1 class="menu-title">ROAPI Documentation</h1>
168168

169169
<div id="content" class="content">
170170
<main>
171-
<h1 id="xlsxmicrosoft-excel"><a class="header" href="#xlsxmicrosoft-excel">Xlsx(Microsoft Excel)</a></h1>
172-
<p>To load the <code>.xlsx</code>, <code>sheet_name</code> needs to be specified in a config.</p>
173-
<p>By default, most <code>.xlsx</code> files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this sheet_name as needed if your spreadsheet uses a different sheet_name.</p>
174-
<p>Ex) Sheet1</p>
175-
<p><img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /></p>
171+
<h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats.</a></h1>
172+
<p>ROAPI supports loading several Microsoft Excel compatible formats: xls, xlsx, xlsb, and ods.</p>
173+
<h2 id="configuration"><a class="header" href="#configuration">Configuration</a></h2>
174+
<p>To load MS Excel compatible files, specify the config like this:</p>
175+
<pre><code class="language-yaml">tables:
176+
- name: "&lt;table name&gt;"
177+
uri: "&lt;files path&gt;"
178+
option:
179+
format: "&lt;file format&gt;"
180+
sheet_name: "Sheet1"
181+
rows_range_start: 2
182+
rows_range_end: 5
183+
columns_range_start: 1
184+
columns_range_end: 6
185+
schema_inference_lines: 3
186+
</code></pre>
187+
<ul>
188+
<li><strong>format</strong> - the name of the file format. Currently supported file formats:
189+
<ul>
190+
<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
191+
<li>xlsx (Excel Workbook)</li>
192+
<li>xlsb (Excel Binary Workbook)</li>
193+
<li>ods (OpenDocument Spreadsheet)</li>
194+
</ul>
195+
</li>
196+
<li><strong>sheet_name</strong> - the name of the sheet that contains the table data. Most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> if your spreadsheet uses a different name.
197+
<img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" />
198+
If no <code>sheet_name</code> is specified, ROAPI will use the first sheet.</li>
199+
<li><strong>Table range options</strong>
200+
<ul>
201+
<li><strong>rows_range_start</strong> - the first row of the table. It contains the column names. By default, <code>rows_range_start</code> is 0 (the first row in the spreadsheet).</li>
202+
<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
203+
<li><strong>columns_range_start</strong> - the first column of the table. By default, <code>columns_range_start</code> is 0 (the first column in the spreadsheet).</li>
204+
<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br />
205+
For example, to take only selected data:
206+
<img src="../../images/spread_sheet_range.png" alt="spread_sheet_range" />
207+
the config file looks like:</li>
208+
</ul>
209+
</li>
210+
</ul>
211+
<pre><code class="language-yaml">tables:
212+
- name: "&lt;table name&gt;"
213+
uri: "&lt;files path&gt;"
214+
option:
215+
format: "&lt;file format&gt;"
216+
sheet_name: "Sheet1"
217+
rows_range_start: 1
218+
rows_range_end: 4
219+
columns_range_start: 1
220+
columns_range_end: 3
221+
</code></pre>
222+
<ul>
223+
<li><strong>schema_inference_lines</strong> - the number of rows (inside the table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use the first row to infer column names and the next 2 rows to infer column types. If this option is not specified, ROAPI reads all rows to infer column data types.</li>
224+
</ul>
225+
<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema inference.</a></h2>
226+
<p>ROAPI can infer the schema of the data automatically. The first row of the data range holds the column names. After inferring the column names, ROAPI infers data types by scanning either all remaining rows or the limited number of rows specified in the <code>schema_inference_lines</code> option.
227+
If a column contains more than one data type (for example, float and int), ROAPI uses the Utf8 data type.</p>
228+
<p>It is also possible to specify the schema in the configuration file. This skips schema inference from the data, so the table loads faster.</p>
176229
<pre><code class="language-yaml">tables:
177230
- name: "excel_table"
178231
uri: "path/to/file.xlsx"
179232
option:
180233
format: "xlsx"
181-
sheet_name: "Sheet1"
234+
schema:
235+
columns:
236+
- name: "int_column"
237+
data_type: "Int64"
238+
nullable: true
239+
- name: "string_column"
240+
data_type: "Utf8"
241+
nullable: true
242+
- name: "float_column"
243+
data_type: "Float64"
244+
nullable: true
245+
- name: "datetime_column"
246+
data_type: !Timestamp [Seconds, null]
247+
nullable: true
248+
- name: "duration_column"
249+
data_type: !Duration Second
250+
nullable: true
251+
- name: "date32_column"
252+
data_type: Date32
253+
nullable: true
254+
- name: "date64_column"
255+
data_type: Date64
256+
nullable: true
257+
- name: "null_column"
258+
data_type: Null
259+
nullable: true
182260
</code></pre>
183-
<p>If no <code>sheet_name</code> is specified, ROAPI will throw the error.</p>
184-
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/xlsx.md">Edit this page on GitHub.</a></footer>
261+
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/excel.md">Edit this page on GitHub.</a></footer>
185262
</main>
186263

187264
<nav class="nav-wrapper" aria-label="Page navigation">
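The table range options documented in the page added above can be illustrated with a small Python sketch. This is not ROAPI's implementation; it only models the documented behavior under two assumptions that the page implies but does not state outright: indices are 0-based, and the `*_range_end` values are inclusive ("the last row/column of the table"). The function name `select_range` and the sample sheet are hypothetical.

```python
from typing import Any, List, Optional

Row = List[Any]


def select_range(
    sheet: List[Row],
    rows_range_start: int = 0,
    rows_range_end: Optional[int] = None,
    columns_range_start: int = 0,
    columns_range_end: Optional[int] = None,
) -> List[Row]:
    """Return the cells inside the configured table range.

    Assumption: indices are 0-based and end bounds are inclusive.
    None mirrors the documented default of reading everything.
    """
    row_stop = len(sheet) if rows_range_end is None else rows_range_end + 1
    selected = []
    for row in sheet[rows_range_start:row_stop]:
        col_stop = len(row) if columns_range_end is None else columns_range_end + 1
        selected.append(row[columns_range_start:col_stop])
    return selected


# Hypothetical sheet: a title row and an empty first column surround the table.
sheet = [
    ["title", None, None, None, None],
    [None, "id", "name", "score", "note"],
    [None, 1, "a", 1.5, "x"],
    [None, 2, "b", 2.5, "y"],
    [None, 3, "c", 3.5, "z"],
    [None, 4, "d", 4.5, "w"],
]

# Same values as the second config example: rows 1..4, columns 1..3.
table = select_range(sheet, 1, 4, 1, 3)
header, data = table[0], table[1:]
```

Under these assumptions, the first selected row (`header`) carries the column names and the remaining rows (`data`) carry the values, which matches the description of `rows_range_start`.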

images/spread_sheet_range.png

31.4 KB

print.html

Lines changed: 85 additions & 8 deletions
@@ -682,20 +682,97 @@ <h2 id="large-datasets-1"><a class="header" href="#large-datasets-1">Large Datas
682682
option:
683683
format: "arrow" # or arrows
684684
</code></pre>
685-
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="xlsxmicrosoft-excel"><a class="header" href="#xlsxmicrosoft-excel">Xlsx(Microsoft Excel)</a></h1>
686-
<p>To load the <code>.xlsx</code>, <code>sheet_name</code> needs to be specified in a config.</p>
687-
<p>By default, most <code>.xlsx</code> files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this sheet_name as needed if your spreadsheet uses a different sheet_name.</p>
688-
<p>Ex) Sheet1</p>
689-
<p><img src="config/dataset-formats/../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /></p>
685+
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats.</a></h1>
686+
<p>ROAPI supports loading several Microsoft Excel compatible formats: xls, xlsx, xlsb, and ods.</p>
687+
<h2 id="configuration"><a class="header" href="#configuration">Configuration</a></h2>
688+
<p>To load MS Excel compatible files, specify the config like this:</p>
689+
<pre><code class="language-yaml">tables:
690+
- name: "&lt;table name&gt;"
691+
uri: "&lt;files path&gt;"
692+
option:
693+
format: "&lt;file format&gt;"
694+
sheet_name: "Sheet1"
695+
rows_range_start: 2
696+
rows_range_end: 5
697+
columns_range_start: 1
698+
columns_range_end: 6
699+
schema_inference_lines: 3
700+
</code></pre>
701+
<ul>
702+
<li><strong>format</strong> - the name of the file format. Currently supported file formats:
703+
<ul>
704+
<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
705+
<li>xlsx (Excel Workbook)</li>
706+
<li>xlsb (Excel Binary Workbook)</li>
707+
<li>ods (OpenDocument Spreadsheet)</li>
708+
</ul>
709+
</li>
710+
<li><strong>sheet_name</strong> - the name of the sheet that contains the table data. Most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> if your spreadsheet uses a different name.
711+
<img src="config/dataset-formats/../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" />
712+
If no <code>sheet_name</code> is specified, ROAPI will use the first sheet.</li>
713+
<li><strong>Table range options</strong>
714+
<ul>
715+
<li><strong>rows_range_start</strong> - the first row of the table. It contains the column names. By default, <code>rows_range_start</code> is 0 (the first row in the spreadsheet).</li>
716+
<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
717+
<li><strong>columns_range_start</strong> - the first column of the table. By default, <code>columns_range_start</code> is 0 (the first column in the spreadsheet).</li>
718+
<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br />
719+
For example, to take only selected data:
720+
<img src="config/dataset-formats/../../images/spread_sheet_range.png" alt="spread_sheet_range" />
721+
the config file looks like:</li>
722+
</ul>
723+
</li>
724+
</ul>
725+
<pre><code class="language-yaml">tables:
726+
- name: "&lt;table name&gt;"
727+
uri: "&lt;files path&gt;"
728+
option:
729+
format: "&lt;file format&gt;"
730+
sheet_name: "Sheet1"
731+
rows_range_start: 1
732+
rows_range_end: 4
733+
columns_range_start: 1
734+
columns_range_end: 3
735+
</code></pre>
736+
<ul>
737+
<li><strong>schema_inference_lines</strong> - the number of rows (inside the table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use the first row to infer column names and the next 2 rows to infer column types. If this option is not specified, ROAPI reads all rows to infer column data types.</li>
738+
</ul>
739+
<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema inference.</a></h2>
740+
<p>ROAPI can infer the schema of the data automatically. The first row of the data range holds the column names. After inferring the column names, ROAPI infers data types by scanning either all remaining rows or the limited number of rows specified in the <code>schema_inference_lines</code> option.
741+
If a column contains more than one data type (for example, float and int), ROAPI uses the Utf8 data type.</p>
742+
<p>It is also possible to specify the schema in the configuration file. This skips schema inference from the data, so the table loads faster.</p>
690743
<pre><code class="language-yaml">tables:
691744
- name: "excel_table"
692745
uri: "path/to/file.xlsx"
693746
option:
694747
format: "xlsx"
695-
sheet_name: "Sheet1"
748+
schema:
749+
columns:
750+
- name: "int_column"
751+
data_type: "Int64"
752+
nullable: true
753+
- name: "string_column"
754+
data_type: "Utf8"
755+
nullable: true
756+
- name: "float_column"
757+
data_type: "Float64"
758+
nullable: true
759+
- name: "datetime_column"
760+
data_type: !Timestamp [Seconds, null]
761+
nullable: true
762+
- name: "duration_column"
763+
data_type: !Duration Second
764+
nullable: true
765+
- name: "date32_column"
766+
data_type: Date32
767+
nullable: true
768+
- name: "date64_column"
769+
data_type: Date64
770+
nullable: true
771+
- name: "null_column"
772+
data_type: Null
773+
nullable: true
696774
</code></pre>
697-
<p>If no <code>sheet_name</code> is specified, ROAPI will throw the error.</p>
698-
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/xlsx.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="blob-store"><a class="header" href="#blob-store">Blob store</a></h1>
775+
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/excel.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="blob-store"><a class="header" href="#blob-store">Blob store</a></h1>
699776
<p>ROAPI currently supports the following blob storages:</p>
700777
<ul>
701778
<li>Filesystem</li>
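The schema inference rules described in the Excel section of this diff (header row first, then type scanning, with a fallback to Utf8 for mixed columns) can be modeled with a short Python sketch. This is an illustration of the documented rules, not ROAPI's actual code. The names `cell_type` and `infer_schema` are hypothetical, the mapping from Python values to type names is a simplification, and reporting `Null` for a column with no values is an assumption the page does not confirm.

```python
from typing import Any, List, Optional, Tuple


def cell_type(value: Any) -> Optional[str]:
    """Map a Python value to a type name; None marks an empty cell."""
    if value is None:
        return None
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    return "Utf8"


def infer_schema(
    table: List[List[Any]],
    schema_inference_lines: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Infer (column name, data type) pairs from a table range."""
    header, data = table[0], table[1:]
    if schema_inference_lines is not None:
        # The limit counts the header row, so N lines leave N - 1 data rows.
        data = data[: max(schema_inference_lines - 1, 0)]
    schema = []
    for idx, name in enumerate(header):
        seen = {cell_type(row[idx]) for row in data if idx < len(row)}
        seen.discard(None)
        if not seen:
            dtype = "Null"  # assumption: no values observed in this column
        elif len(seen) == 1:
            dtype = seen.pop()
        else:
            dtype = "Utf8"  # documented fallback for mixed-type columns
        schema.append((str(name), dtype))
    return schema


# Hypothetical table: the "value" column mixes int, float, and text.
table = [
    ["id", "value", "label"],
    [1, 1, "a"],
    [2, 2.5, "b"],
    [3, "n/a", "c"],
]

full = infer_schema(table)        # scans all three data rows
limited = infer_schema(table, 2)  # header plus one data row
```

With `schema_inference_lines: 2`, only the first data row is scanned, so the `value` column looks like a pure integer column in this sketch. That is the trade-off of limiting inference rows, and a reason to specify the schema explicitly when the data is known to be mixed.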

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

searchindex.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
