config/config-file.html
Lines changed: 8 additions & 0 deletions
@@ -224,6 +224,14 @@ <h2 id="key-value-stores"><a class="header" href="#key-value-stores">Key value s
224
224
value: name
225
225
</code></pre>
226
226
<p>The above config will create a key-value store named <code>spacex_launch_name</code> that allows you to look up SpaceX launch names using launch IDs.</p>
<p>You can override DataFusion configuration settings by specifying them in the <code>datafusion</code> section of your config file. This allows you to tune the query engine's behavior for your specific use case:</p>
229
+
<pre><code class="language-yaml">datafusion:
230
+
  "execution.collect_statistics": "true"
231
+
  "execution.batch_size": "8192"
232
+
  "sql_parser.enable_ident_normalization": "true"
233
+
</code></pre>
234
+
<p>The <code>datafusion</code> field accepts a map of configuration key-value pairs in which both keys and values are strings. See the <a href="https://docs.rs/datafusion/latest/datafusion/config/struct.ConfigOptions.html">DataFusion configuration documentation</a> for a complete list of available options.</p>
227
235
<h2 id="specify-a-config-file-on-startup"><a class="header" href="#specify-a-config-file-on-startup">Specify a config file on startup</a></h2>
228
236
<p>Use the <code>-c</code> argument to run ROAPI with a specific config file:</p>
<li><strong>format</strong> - name of file format. Currently supported files format:
189
-
<ul>
190
-
<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
191
-
<li>xlsx (Excel Workbook)</li>
192
-
<li>xlsb (Excel Binary Workbook)</li>
193
-
<li>ods (OpenDocument Spreadsheet)</li>
194
-
</ul>
195
-
</li>
196
-
<li><strong>sheet_name</strong> - the name of the spread sheet with table data. By default, most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> as needed if your spreadsheet uses a different name.
If no <code>sheet_name</code> is specified, ROAPI will use first spreadsheet.</li>
199
-
<li><strong>Table range options</strong>
200
-
<ul>
201
-
<li><strong>rows_range_start</strong> - the first row of the table. It contains column names. By default, <code>rows_range_start</code> is 0 (the first raw in spreadsheet)</li>
202
-
<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
203
-
<li><strong>columns_range_start</strong> - the column of the table. By default, <code>columns_range_start</code> is 0 (first column in spreadsheet)</li>
204
-
<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br/>
<p>You can specify which sheet to load from the spreadsheet using the <code>sheet_name</code> option. By default, ROAPI will use the first sheet if no sheet name is specified.
<h2 id="table-range-options"><a class="header" href="#table-range-options">Table Range Options</a></h2>
198
+
<p>You can specify a range of cells to load from the spreadsheet:</p>
222
200
<ul>
223
-
<li><strong>schema_inference_lines</strong> - the number of rows (inside table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use first row for column names inference and 2 rows for column types inference. If this option is not specified then ROAPI reads all rows for column data types inference.</li>
201
+
<li><strong>rows_range_start</strong> - The first row of the table containing column names (default: 0)</li>
202
+
<li><strong>rows_range_end</strong> - The last row of the table (default: all rows)</li>
203
+
<li><strong>columns_range_start</strong> - The first column of the table (default: 0)</li>
204
+
<li><strong>columns_range_end</strong> - The last column of the table (default: all columns)</li>
<p>ROAPI can infer schema of data automatically. The first row of data range is a row with column names. After column names inference ROAPI will infer data types by scanning all remaining rows or limited number of rows specified in <code>schema_inference_lines</code> option.
227
-
If column contains more than one data type (for exaple, float and int) then ROAPI use Utf8 datatype.</p>
228
-
<p>Also, it is possible to specify schema in configuration file. This allows to avoid schema inference from data and loading of table will be faster.</p>
<p>ROAPI can automatically infer the schema from your Excel data. The first row within the specified range is treated as column names, and ROAPI will analyze the remaining rows to determine data types.</p>
220
+
<p>You can control schema inference with the <code>schema_inference_lines</code> option, which specifies how many rows to analyze (including the header row). For example, <code>schema_inference_lines: 3</code> will use the first row for column names and analyze 2 additional rows for data types.</p>
221
+
<p>If a column contains mixed data types (like both integers and floats), ROAPI will default to the Utf8 (string) data type.</p>
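<p>As a rough illustration, the sheet, range, and inference options described above might be combined in one table definition like the sketch below. The table name, file path, and numeric values are hypothetical, and the surrounding <code>tables</code>/<code>uri</code>/<code>option</code> layout is assumed from ROAPI's usual table config rather than shown in this diff. Whether the range end values are inclusive is not stated here, so check that before relying on exact bounds.</p>
<pre><code class="language-yaml">tables:
  - name: "sales"                 # hypothetical table name
    uri: "test_data/sales.xlsx"   # hypothetical path
    option:
      format: "xlsx"              # one of xls, xlsx, xlsb, ods
      sheet_name: "Sheet1"        # omit to load the first sheet
      rows_range_start: 2         # row that holds the column names (default: 0)
      rows_range_end: 100         # omit to read all rows
      columns_range_start: 1      # default: 0
      columns_range_end: 6        # omit to read all columns
      schema_inference_lines: 3   # header row plus 2 rows analyzed for data types
</code></pre>
<p>With a small <code>schema_inference_lines</code> value, only the sampled rows inform the inferred types, so a column whose mixed values first appear further down the sheet may not be detected from the sample.</p>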
print.html
Lines changed: 59 additions & 56 deletions
@@ -376,6 +376,14 @@ <h2 id="key-value-stores"><a class="header" href="#key-value-stores">Key value s
376
376
value: name
377
377
</code></pre>
378
378
<p>The above config will create a key-value store named <code>spacex_launch_name</code> that allows you to look up SpaceX launch names using launch IDs.</p>
<p>You can override DataFusion configuration settings by specifying them in the <code>datafusion</code> section of your config file. This allows you to tune the query engine's behavior for your specific use case:</p>
381
+
<pre><code class="language-yaml">datafusion:
382
+
  "execution.collect_statistics": "true"
383
+
  "execution.batch_size": "8192"
384
+
  "sql_parser.enable_ident_normalization": "true"
385
+
</code></pre>
386
+
<p>The <code>datafusion</code> field accepts a map of configuration key-value pairs in which both keys and values are strings. See the <a href="https://docs.rs/datafusion/latest/datafusion/config/struct.ConfigOptions.html">DataFusion configuration documentation</a> for a complete list of available options.</p>
379
387
<h2 id="specify-a-config-file-on-startup"><a class="header" href="#specify-a-config-file-on-startup">Specify a config file on startup</a></h2>
380
388
<p>Use the <code>-c</code> argument to run ROAPI with a specific config file:</p>
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="ms-excel-compatible-formats"><a class="header" href="#ms-excel-compatible-formats">MS Excel compatible formats.</a></h1>
686
-
<p>ROAPI supports loading a few Microsoft Excel compatible formats like xls, xlsx, xlsb, ods.</p>
<p>To load MS Excel compatible files the config should be specified like:</p>
693
+
<footer id="open-on-gh">Found a bug? <a href="https://github.com/roapi/docs/edit/main/src/config/dataset-formats/arrow.md">Edit this page on GitHub.</a></footer><div style="break-before: page; page-break-before: always;"></div><h1 id="excel"><a class="header" href="#excel">Excel</a></h1>
694
+
<p>ROAPI supports loading Microsoft Excel compatible formats including xls, xlsx, xlsb, and ods files.</p>
<li><strong>format</strong> - name of file format. Currently supported files format:
703
-
<ul>
704
-
<li>xls (Microsoft Excel 5.0/95 Workbook)</li>
705
-
<li>xlsx (Excel Workbook)</li>
706
-
<li>xlsb (Excel Binary Workbook)</li>
707
-
<li>ods (OpenDocument Spreadsheet)</li>
708
-
</ul>
709
-
</li>
710
-
<li><strong>sheet_name</strong> - the name of the spread sheet with table data. By default, most files initially use Sheet1 as the <code>sheet_name</code>. Be sure to change this <code>sheet_name</code> as needed if your spreadsheet uses a different name.
If no <code>sheet_name</code> is specified, ROAPI will use first spreadsheet.</li>
713
-
<li><strong>Table range options</strong>
714
-
<ul>
715
-
<li><strong>rows_range_start</strong> - the first row of the table. It contains column names. By default, <code>rows_range_start</code> is 0 (the first raw in spreadsheet)</li>
716
-
<li><strong>rows_range_end</strong> - the last row of the table. By default, ROAPI reads all data.</li>
717
-
<li><strong>columns_range_start</strong> - the column of the table. By default, <code>columns_range_start</code> is 0 (first column in spreadsheet)</li>
718
-
<li><strong>columns_range_end</strong> - the last column of the table. By default, ROAPI reads all columns.<br/>
<p>You can specify which sheet to load from the spreadsheet using the <code>sheet_name</code> option. By default, ROAPI will use the first sheet if no sheet name is specified.
<h2 id="table-range-options"><a class="header" href="#table-range-options">Table Range Options</a></h2>
720
+
<p>You can specify a range of cells to load from the spreadsheet:</p>
736
722
<ul>
737
-
<li><strong>schema_inference_lines</strong> - the number of rows (inside table range) to use in schema inference. This number includes the row with column names, so, for example, <code>schema_inference_lines: 3</code> means ROAPI will use first row for column names inference and 2 rows for column types inference. If this option is not specified then ROAPI reads all rows for column data types inference.</li>
723
+
<li><strong>rows_range_start</strong> - The first row of the table containing column names (default: 0)</li>
724
+
<li><strong>rows_range_end</strong> - The last row of the table (default: all rows)</li>
725
+
<li><strong>columns_range_start</strong> - The first column of the table (default: 0)</li>
726
+
<li><strong>columns_range_end</strong> - The last column of the table (default: all columns)</li>
<p>ROAPI can infer schema of data automatically. The first row of data range is a row with column names. After column names inference ROAPI will infer data types by scanning all remaining rows or limited number of rows specified in <code>schema_inference_lines</code> option.
741
-
If column contains more than one data type (for exaple, float and int) then ROAPI use Utf8 datatype.</p>
742
-
<p>Also, it is possible to specify schema in configuration file. This allows to avoid schema inference from data and loading of table will be faster.</p>
<p>ROAPI can automatically infer the schema from your Excel data. The first row within the specified range is treated as column names, and ROAPI will analyze the remaining rows to determine data types.</p>
742
+
<p>You can control schema inference with the <code>schema_inference_lines</code> option, which specifies how many rows to analyze (including the header row). For example, <code>schema_inference_lines: 3</code> will use the first row for column names and analyze 2 additional rows for data types.</p>
743
+
<p>If a column contains mixed data types (like both integers and floats), ROAPI will default to the Utf8 (string) data type.</p>