
Commit 61796d2

committed
Deploying to gh-pages from @ f066db0 🚀
1 parent bc477da commit 61796d2

5 files changed

Lines changed: 120 additions & 114 deletions


config/config-file.html

Lines changed: 8 additions & 0 deletions
@@ -224,6 +224,14 @@ <h2 id="key-value-stores"><a class="header" href="#key-value-stores">Key value s
 value: name
 </code></pre>
 <p>The above config will create a keyvalue store named <code>spacex_launch_name</code> that allows you to lookup SpaceX launch names using launch ids.</p>
+<h2 id="datafusion-configuration"><a class="header" href="#datafusion-configuration">DataFusion configuration</a></h2>
+<p>You can override DataFusion configuration settings by specifying them in the <code>datafusion</code> section of your config file. This allows you to tune the query engine's behavior for your specific use case:</p>
+<pre><code class="language-yaml">datafusion:
+"execution.collect_statistics": "true"
+"execution.batch_size": "8192"
+"sql_parser.enable_ident_normalization": "true"
+</code></pre>
+<p>The <code>datafusion</code> field accepts a map of configuration key-value pairs where both keys and values are strings. You can reference the <a href="https://docs.rs/datafusion/latest/datafusion/config/struct.ConfigOptions.html">DataFusion configuration documentation</a> for a complete list of available configuration options.</p>
 <h2 id="specify-a-config-file-on-startup"><a class="header" href="#specify-a-config-file-on-startup">Specify a config file on startup</a></h2>
 <p>Use <code>-c</code> argument to run ROAPI using a specific config file:</p>
 <pre><code class="language-bash">roapi -c ./roapi.yml
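The new DataFusion section states that both keys and values must be strings. A YAML loader reads an unquoted `true` as a boolean and an unquoted `8192` as an integer, which is presumably why every value in the example is quoted. The Python sketch below shows one way to check an already-parsed `datafusion` mapping; `validate_datafusion_section` is a hypothetical helper for illustration only and is not part of ROAPI, which is written in Rust and parses its config on its own.

```python
from typing import Any, Dict, Mapping


def validate_datafusion_section(section: Any) -> Dict[str, str]:
    """Hypothetical check that a parsed `datafusion` section is a str -> str map.

    Assumes the config file has already been parsed into Python objects
    (for example by a YAML loader); ROAPI's own loader is not modelled here.
    """
    if not isinstance(section, Mapping):
        raise TypeError("the datafusion section must be a mapping")
    validated: Dict[str, str] = {}
    for key, value in section.items():
        # Unquoted YAML scalars such as true or 8192 arrive as bool/int, not str.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                f"datafusion option {key!r} must be a string key with a string value, "
                f"got {type(key).__name__} -> {type(value).__name__}"
            )
        validated[key] = value
    return validated


# The section from the example above, as a YAML loader would return it.
options = validate_datafusion_section({
    "execution.collect_statistics": "true",
    "execution.batch_size": "8192",
    "sql_parser.enable_ident_normalization": "true",
})
print(options["execution.batch_size"])  # prints 8192
```

Passing `{"execution.batch_size": 8192}` (an unquoted number) raises `TypeError` in this sketch instead of silently forwarding a non-string value.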

config/dataset-formats/excel.html

Lines changed: 51 additions & 56 deletions
@@ -168,71 +168,66 @@ <h1 class="menu-title">ROAPI Documentation</h1>
 
 <div id="content" class="content">
 <main>
-<h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats.</a></h1>
-<p>ROAPI supports loading a few Microsoft Excel compatible formats like xls, xlsx, xlsb, ods.</p>
-<h2 id="configuration"><a class="header" href="#configuration">Configuration</a></h2>
-<p>To load MS Excel compatible files the config should be specified like:</p>
+<h1 id="excel"><a class="header" href="#excel">Excel</a></h1>
+<p>ROAPI supports loading Microsoft Excel compatible formats including xls, xlsx, xlsb, and ods files.</p>
 <pre><code class="language-yaml">tables:
-- name: "&lt;table name&gt;"
-uri: "&lt;files path&gt;"
-option:
-format: "&lt;file format&gt;"
-sheet_name: "Sheet1"
-rows_range_start: 2
-rows_range_end: 5
-columns_range_start: 1
-columns_range_end: 6
-schema_inference_lines: 3
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "Sheet1"
 </code></pre>
+<h2 id="supported-formats"><a class="header" href="#supported-formats">Supported Formats</a></h2>
 <ul>
-<li><strong>format</strong> - name of file format. Currently supported files format:
-<ul>
-<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
-<li>xlsx (Excel Workbook)</li>
-<li>xlsb (Excel Binary Workbook)</li>
-<li>ods (OpenDocument Spreadsheet)</li>
-</ul>
-</li>
-<li><strong>sheet_name</strong> - the name of the spread sheet with table data. By default, most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> as needed if your spreadsheet uses a different name.
-<img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" />
-If no <code>sheet_name</code> is specified, ROAPI will use first spreadsheet.</li>
-<li><strong>Table range options</strong>
-<ul>
-<li><strong>rows_range_start</strong> - the first row of the table. It contains column names. By default, <code>rows_range_start</code> is 0 (the first raw in spreadsheet)</li>
-<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
-<li><strong>columns_range_start</strong> - the column of the table. By default, <code>columns_range_start</code> is 0 (first column in spreadsheet)</li>
-<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br />
-For example, to take only selected data:
-<img src="../../images/spread_sheet_range.png" alt="spread_sheet_range" />
-the config file looks like:</li>
-</ul>
-</li>
+<li><strong>xls</strong> - Microsoft Excel 5.0/95 Workbook</li>
+<li><strong>xlsx</strong> - Excel Workbook</li>
+<li><strong>xlsb</strong> - Excel Binary Workbook</li>
+<li><strong>ods</strong> - OpenDocument Spreadsheet</li>
 </ul>
+<h2 id="sheet-selection"><a class="header" href="#sheet-selection">Sheet Selection</a></h2>
+<p>You can specify which sheet to load from the spreadsheet using the <code>sheet_name</code> option. By default, ROAPI will use the first sheet if no sheet name is specified.
+<img src="../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /></p>
 <pre><code class="language-yaml">tables:
-- name: "&lt;table name&gt;"
-uri: "&lt;files path&gt;"
-option:
-format: "&lt;file format&gt;"
-sheet_name: "Sheet1"
-rows_range_start: 1
-rows_range_end: 4
-columns_range_start: 1
-columns_range_end: 3
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "MyDataSheet"
 </code></pre>
+<h2 id="table-range-options"><a class="header" href="#table-range-options">Table Range Options</a></h2>
+<p>You can specify a specific range of cells to load from the spreadsheet:
+You can specify a specific range of cells to load from the spreadsheet:</p>
 <ul>
-<li><strong>schema_inference_lines</strong> - the number of rows (inside table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use first row for column names inference and 2 rows for column types inference. If this option is not specified then ROAPI reads all rows for column data types inference.</li>
+<li><strong>rows_range_start</strong> - The first row of the table containing column names (default: 0)</li>
+<li><strong>rows_range_end</strong> - The last row of the table (default: all rows)</li>
+<li><strong>columns_range_start</strong> - The first column of the table (default: 0)</li>
+<li><strong>columns_range_end</strong> - The last column of the table (default: all columns)</li>
 </ul>
-<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema inference.</a></h2>
-<p>ROAPI can infer schema of data automatically. The first row of data range is a row with column names. After column names inference ROAPI will infer data types by scanning all remaining rows or limited number of rows specified in <code>schema_inference_lines</code> option.
-If column contains more than one data type (for exaple, float and int) then ROAPI use Utf8 datatype.</p>
-<p>Also, it is possible to specify schema in configuration file. This allows to avoid schema inference from data and loading of table will be faster.</p>
+<p><img src="../../images/spread_sheet_range.png" alt="spread_sheet_range" /></p>
+<pre><code class="language-yaml">tables:
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "Sheet1"
+rows_range_start: 1
+rows_range_end: 4
+columns_range_start: 1
+columns_range_end: 3
+</code></pre>
+<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema Inference</a></h2>
+<p>ROAPI can automatically infer the schema from your Excel data. The first row within the specified range is treated as column names, and ROAPI will analyze the remaining rows to determine data types.</p>
+<p>You can control schema inference with the <code>schema_inference_lines</code> option, which specifies how many rows to analyze (including the header row). For example, <code>schema_inference_lines: 3</code> will use the first row for column names and analyze 2 additional rows for data types.</p>
+<p>If a column contains mixed data types (like both integers and floats), ROAPI will default to the Utf8 (string) data type.</p>
+<h2 id="explicit-schema-definition"><a class="header" href="#explicit-schema-definition">Explicit Schema Definition</a></h2>
+<p>For better performance and predictable data types, you can define the schema explicitly in your configuration:</p>
 <pre><code class="language-yaml">tables:
-- name: "excel_table"
-uri: "path/to/file.xlsx"
-option:
-format: "xlsx"
-schema:
-columns:
+- name: "excel_table"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+schema:
+columns:
 - name: "int_column"
 data_type: "Int64"
 nullable: true
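The Excel page describes two mechanisms in prose: the range options (defaulting to 0 for the first row and column) that cut a rectangle out of a sheet, and schema inference, which takes the first row of that rectangle as column names, samples `schema_inference_lines - 1` further rows for types, and falls back to Utf8 when a column mixes types. The Python sketch below models both on a plain list of rows. It is not ROAPI's implementation: it assumes the bounds are 0-based and inclusive (the page does not say whether `*_range_end` is inclusive), maps Python types to Arrow type names by hand, and treats a column with no sampled values as Utf8. `select_range`, `infer_schema` and the sample sheet are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hand-written mapping from Python cell types to Arrow type names (an assumption).
ARROW_TYPES = {bool: "Boolean", int: "Int64", float: "Float64", str: "Utf8"}


def select_range(
    sheet: Sequence[Sequence[Any]],
    rows_range_start: int = 0,
    rows_range_end: Optional[int] = None,
    columns_range_start: int = 0,
    columns_range_end: Optional[int] = None,
) -> List[List[Any]]:
    """Cut the configured rectangle out of a sheet.

    Bounds are treated as 0-based and inclusive; None means "to the end".
    """
    row_stop = len(sheet) if rows_range_end is None else rows_range_end + 1
    table = []
    for row in sheet[rows_range_start:row_stop]:
        col_stop = len(row) if columns_range_end is None else columns_range_end + 1
        table.append(list(row[columns_range_start:col_stop]))
    return table


def infer_schema(
    table: List[List[Any]], schema_inference_lines: Optional[int] = None
) -> Dict[str, str]:
    """First row gives column names, later rows give types; mixed types become Utf8."""
    header, *rows = table
    if schema_inference_lines is not None:
        # schema_inference_lines counts the header row, so sample one row fewer.
        rows = rows[: max(schema_inference_lines - 1, 0)]
    schema = {}
    for index, name in enumerate(header):
        seen = {type(row[index]) for row in rows if index < len(row) and row[index] is not None}
        if len(seen) == 1:
            schema[name] = ARROW_TYPES.get(seen.pop(), "Utf8")
        else:
            # Mixed types (e.g. int and float) or no sampled values: fall back to Utf8.
            schema[name] = "Utf8"
    return schema


# Hypothetical sheet: a title row, an empty first column and a trailing note row.
sheet = [
    ["Report", None, None, None],
    [None, "id", "price", "name"],
    [None, 1, 9.5, "a"],
    [None, 2, 10, "b"],
    [None, 3, 11.0, "c"],
    [None, "note", None, None],
]

# Same bounds as the range example in the diff above.
table = select_range(sheet, rows_range_start=1, rows_range_end=4,
                     columns_range_start=1, columns_range_end=3)
print(infer_schema(table))
# prints {'id': 'Int64', 'price': 'Utf8', 'name': 'Utf8'}
print(infer_schema(table, schema_inference_lines=2))
# prints {'id': 'Int64', 'price': 'Float64', 'name': 'Utf8'}
```

Limiting the sample with `schema_inference_lines: 2` hides the integer `10` in the `price` column, so the sketch infers Float64 where a full scan infers Utf8. That trade-off (faster inference, but types chosen from fewer rows) is also a reason the page suggests an explicit schema when types must be predictable.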

print.html

Lines changed: 59 additions & 56 deletions
@@ -376,6 +376,14 @@ <h2 id="key-value-stores"><a class="header" href="#key-value-stores">Key value s
 value: name
 </code></pre>
 <p>The above config will create a keyvalue store named <code>spacex_launch_name</code> that allows you to lookup SpaceX launch names using launch ids.</p>
+<h2 id="datafusion-configuration"><a class="header" href="#datafusion-configuration">DataFusion configuration</a></h2>
+<p>You can override DataFusion configuration settings by specifying them in the <code>datafusion</code> section of your config file. This allows you to tune the query engine's behavior for your specific use case:</p>
+<pre><code class="language-yaml">datafusion:
+"execution.collect_statistics": "true"
+"execution.batch_size": "8192"
+"sql_parser.enable_ident_normalization": "true"
+</code></pre>
+<p>The <code>datafusion</code> field accepts a map of configuration key-value pairs where both keys and values are strings. You can reference the <a href="https://docs.rs/datafusion/latest/datafusion/config/struct.ConfigOptions.html">DataFusion configuration documentation</a> for a complete list of available configuration options.</p>
 <h2 id="specify-a-config-file-on-startup"><a class="header" href="#specify-a-config-file-on-startup">Specify a config file on startup</a></h2>
 <p>Use <code>-c</code> argument to run ROAPI using a specific config file:</p>
 <pre><code class="language-bash">roapi -c ./roapi.yml
@@ -682,71 +690,66 @@ <h2 id="large-datasets-1"><a class="header" href="#large-datasets-1">Large Datas
 option:
 format: "arrow" # or arrows
 </code></pre>
-<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats.</a></h1>
-<p>ROAPI supports loading a few Microsoft Excel compatible formats like xls, xlsx, xlsb, ods.</p>
-<h2 id="configuration"><a class="header" href="#configuration">Configuration</a></h2>
-<p>To load MS Excel compatible files the config should be specified like:</p>
+<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="excel"><a class="header" href="#excel">Excel</a></h1>
+<p>ROAPI supports loading Microsoft Excel compatible formats including xls, xlsx, xlsb, and ods files.</p>
 <pre><code class="language-yaml">tables:
-- name: "&lt;table name&gt;"
-uri: "&lt;files path&gt;"
-option:
-format: "&lt;file format&gt;"
-sheet_name: "Sheet1"
-rows_range_start: 2
-rows_range_end: 5
-columns_range_start: 1
-columns_range_end: 6
-schema_inference_lines: 3
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "Sheet1"
 </code></pre>
+<h2 id="supported-formats"><a class="header" href="#supported-formats">Supported Formats</a></h2>
 <ul>
-<li><strong>format</strong> - name of file format. Currently supported files format:
-<ul>
-<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
-<li>xlsx (Excel Workbook)</li>
-<li>xlsb (Excel Binary Workbook)</li>
-<li>ods (OpenDocument Spreadsheet)</li>
-</ul>
-</li>
-<li><strong>sheet_name</strong> - the name of the spread sheet with table data. By default, most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> as needed if your spreadsheet uses a different name.
-<img src="config/dataset-formats/../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" />
-If no <code>sheet_name</code> is specified, ROAPI will use first spreadsheet.</li>
-<li><strong>Table range options</strong>
-<ul>
-<li><strong>rows_range_start</strong> - the first row of the table. It contains column names. By default, <code>rows_range_start</code> is 0 (the first raw in spreadsheet)</li>
-<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
-<li><strong>columns_range_start</strong> - the column of the table. By default, <code>columns_range_start</code> is 0 (first column in spreadsheet)</li>
-<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br />
-For example, to take only selected data:
-<img src="config/dataset-formats/../../images/spread_sheet_range.png" alt="spread_sheet_range" />
-the config file looks like:</li>
-</ul>
-</li>
+<li><strong>xls</strong> - Microsoft Excel 5.0/95 Workbook</li>
+<li><strong>xlsx</strong> - Excel Workbook</li>
+<li><strong>xlsb</strong> - Excel Binary Workbook</li>
+<li><strong>ods</strong> - OpenDocument Spreadsheet</li>
 </ul>
+<h2 id="sheet-selection"><a class="header" href="#sheet-selection">Sheet Selection</a></h2>
+<p>You can specify which sheet to load from the spreadsheet using the <code>sheet_name</code> option. By default, ROAPI will use the first sheet if no sheet name is specified.
+<img src="config/dataset-formats/../../images/xlsx_sheet_name.png" alt="xlsx_sheet_name" /></p>
 <pre><code class="language-yaml">tables:
-- name: "&lt;table name&gt;"
-uri: "&lt;files path&gt;"
-option:
-format: "&lt;file format&gt;"
-sheet_name: "Sheet1"
-rows_range_start: 1
-rows_range_end: 4
-columns_range_start: 1
-columns_range_end: 3
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "MyDataSheet"
 </code></pre>
+<h2 id="table-range-options"><a class="header" href="#table-range-options">Table Range Options</a></h2>
+<p>You can specify a specific range of cells to load from the spreadsheet:
+You can specify a specific range of cells to load from the spreadsheet:</p>
 <ul>
-<li><strong>schema_inference_lines</strong> - the number of rows (inside table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use first row for column names inference and 2 rows for column types inference. If this option is not specified then ROAPI reads all rows for column data types inference.</li>
+<li><strong>rows_range_start</strong> - The first row of the table containing column names (default: 0)</li>
+<li><strong>rows_range_end</strong> - The last row of the table (default: all rows)</li>
+<li><strong>columns_range_start</strong> - The first column of the table (default: 0)</li>
+<li><strong>columns_range_end</strong> - The last column of the table (default: all columns)</li>
 </ul>
-<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema inference.</a></h2>
-<p>ROAPI can infer schema of data automatically. The first row of data range is a row with column names. After column names inference ROAPI will infer data types by scanning all remaining rows or limited number of rows specified in <code>schema_inference_lines</code> option.
-If column contains more than one data type (for exaple, float and int) then ROAPI use Utf8 datatype.</p>
-<p>Also, it is possible to specify schema in configuration file. This allows to avoid schema inference from data and loading of table will be faster.</p>
+<p><img src="config/dataset-formats/../../images/spread_sheet_range.png" alt="spread_sheet_range" /></p>
+<pre><code class="language-yaml">tables:
+- name: "mytable"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+sheet_name: "Sheet1"
+rows_range_start: 1
+rows_range_end: 4
+columns_range_start: 1
+columns_range_end: 3
+</code></pre>
+<h2 id="schema-inference"><a class="header" href="#schema-inference">Schema Inference</a></h2>
+<p>ROAPI can automatically infer the schema from your Excel data. The first row within the specified range is treated as column names, and ROAPI will analyze the remaining rows to determine data types.</p>
+<p>You can control schema inference with the <code>schema_inference_lines</code> option, which specifies how many rows to analyze (including the header row). For example, <code>schema_inference_lines: 3</code> will use the first row for column names and analyze 2 additional rows for data types.</p>
+<p>If a column contains mixed data types (like both integers and floats), ROAPI will default to the Utf8 (string) data type.</p>
+<h2 id="explicit-schema-definition"><a class="header" href="#explicit-schema-definition">Explicit Schema Definition</a></h2>
+<p>For better performance and predictable data types, you can define the schema explicitly in your configuration:</p>
 <pre><code class="language-yaml">tables:
-- name: "excel_table"
-uri: "path/to/file.xlsx"
-option:
-format: "xlsx"
-schema:
-columns:
+- name: "excel_table"
+uri: "path/to/file.xlsx"
+option:
+format: "xlsx"
+schema:
+columns:
 - name: "int_column"
 data_type: "Int64"
 nullable: true

searchindex.js

Lines changed: 1 addition & 1 deletion

searchindex.json

Lines changed: 1 addition & 1 deletion
