When working in Jupyter notebooks or other environments that support rich HTML display,
DataFusion DataFrames are automatically rendered as formatted HTML tables. This is provided
by the DataFrame's _repr_html_ method, which Jupyter calls automatically to produce a richer
view than plain-text output.
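Jupyter's rich display works by duck typing. When the value of a cell defines a _repr_html_ method, IPython calls it and renders the returned string as HTML instead of the plain-text repr. The small class below is a hypothetical stand-in, not DataFusion's implementation. It shows the protocol in isolation and depends on neither DataFusion nor IPython.

```python
import html


class TinyTable:
    """Hypothetical object demonstrating Jupyter's _repr_html_ protocol."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def __repr__(self):
        # Plain-text fallback used by print() and non-HTML frontends
        return f"TinyTable({len(self.rows)} rows x {len(self.columns)} columns)"

    def _repr_html_(self):
        # Jupyter calls this automatically when the object is displayed
        header = "".join(f"<th>{html.escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{header}</tr>{body}</table>"


t = TinyTable(["a", "b"], [[1, "x"], [2, "<y>"]])
print(repr(t))          # TinyTable(2 rows x 2 columns)
print(t._repr_html_())
```

If t is the last expression in a notebook cell, Jupyter renders the HTML table. print(t) still falls back to __repr__. Escaping with html.escape keeps cell contents such as <y> from being interpreted as markup.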
In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:

```python
from IPython.display import display

# As the last expression in a cell, df is displayed as an HTML table
df

# Explicit display also uses HTML rendering
display(df)
```

DataFusion provides extensive customization options for HTML table rendering through the
datafusion.html_formatter module.
You can customize how DataFrames are rendered by configuring the formatter:

```python
from datafusion.html_formatter import configure_formatter

# Change the default styling
configure_formatter(
    max_cell_length=25,            # Maximum characters in a cell before truncation
    max_width=1000,                # Maximum width in pixels
    max_height=300,                # Maximum height in pixels
    max_memory_bytes=2097152,      # Maximum memory for rendering (2 MB)
    min_rows=10,                   # Minimum number of rows to display
    max_rows=10,                   # Maximum rows to display in __repr__
    enable_cell_expansion=True,    # Allow expanding truncated cells
    custom_css=None,               # Additional custom CSS
    show_truncation_message=True,  # Show message when data is truncated
    style_provider=None,           # Custom styling provider
    use_shared_styles=True,        # Share styles across tables
)
```

The formatter settings affect all DataFrames displayed after configuration.
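The hypothetical helper truncate_cell below makes max_cell_length concrete. It shortens a value's text and reports whether anything was cut. It illustrates the idea only. DataFusion's own truncation, including the expandable cells enabled by enable_cell_expansion, happens in the generated HTML and may differ in detail.

```python
def truncate_cell(value, max_cell_length=25, ellipsis="..."):
    """Hypothetical helper: shorten a cell's text to max_cell_length characters.

    Returns a tuple (display_text, was_truncated).
    """
    text = str(value)
    if len(text) <= max_cell_length:
        return text, False
    # Keep the first max_cell_length characters and mark the cut
    return text[:max_cell_length] + ellipsis, True


cells = ["short", "a" * 30, 12345]
results = [truncate_cell(c, max_cell_length=10) for c in cells]
for text, cut in results:
    print(text, cut)
if any(cut for _, cut in results):
    print("Some values were truncated.")
```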
For advanced styling needs, you can create a custom style provider:

```python
from datafusion.html_formatter import StyleProvider, configure_formatter


class MyStyleProvider(StyleProvider):
    def get_table_styles(self):
        return {
            "table": "border-collapse: collapse; width: 100%;",
            "th": "background-color: #007bff; color: white; padding: 8px; text-align: left;",
            "td": "border: 1px solid #ddd; padding: 8px;",
            "tr:nth-child(even)": "background-color: #f2f2f2;",
        }

    def get_value_styles(self, dtype, value):
        """Return custom styles for specific values."""
        if dtype == "float" and value < 0:
            return "color: red;"
        return None


# Apply the custom style provider
configure_formatter(style_provider=MyStyleProvider())
```

The use_shared_styles parameter (enabled by default) optimizes performance when displaying
multiple DataFrames in notebook environments:
```python
from datafusion.html_formatter import configure_formatter

# Default: use shared styles (recommended for notebooks)
configure_formatter(use_shared_styles=True)

# Disable shared styles (each DataFrame includes its own styles)
configure_formatter(use_shared_styles=False)
```

When use_shared_styles=True:

- CSS styles and JavaScript are included only once per notebook session.
- HTML output is smaller because styles are not duplicated.
- Rendering is faster when many DataFrames are displayed.
- Styling is consistent across all DataFrames.
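The bookkeeping behind shared styles can be sketched in plain Python. The hypothetical SharedStyleRenderer below takes a selector-to-declarations dictionary, shaped like the one returned by get_table_styles() in the custom style provider example, and turns it into a <style> block. It emits that block only on the first render and scopes every rule under a CSS class, so later tables reuse the same rules. This only approximates the effect described above. The class, its methods, and the df-table class name are assumptions, not DataFusion's code.

```python
class SharedStyleRenderer:
    """Hypothetical renderer that emits its stylesheet only once per session."""

    def __init__(self, styles, css_class="df-table"):
        self.styles = styles
        self.css_class = css_class
        self._styles_emitted = False

    def stylesheet(self):
        # Scope every selector under the table's CSS class
        rules = []
        for selector, declarations in self.styles.items():
            if selector == "table":
                scoped = f".{self.css_class}"
            else:
                scoped = f".{self.css_class} {selector}"
            rules.append(f"{scoped} {{ {declarations} }}")
        return "<style>" + "\n".join(rules) + "</style>"

    def render(self, rows_html):
        # Prepend the stylesheet only the first time a table is rendered
        prefix = ""
        if not self._styles_emitted:
            prefix = self.stylesheet()
            self._styles_emitted = True
        return f'{prefix}<table class="{self.css_class}">{rows_html}</table>'


renderer = SharedStyleRenderer({
    "table": "border-collapse: collapse;",
    "th": "background-color: #007bff; color: white;",
})
first = renderer.render("<tr><th>a</th></tr>")
second = renderer.render("<tr><th>b</th></tr>")
print(first.count("<style>"), second.count("<style>"))  # 1 0
```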
For complete control over rendering, you can implement a custom formatter:

```python
from datafusion.html_formatter import Formatter, configure_formatter, get_formatter


class MyFormatter(Formatter):
    def format_html(self, batches, schema, has_more=False, table_uuid=None):
        # Create your custom HTML here
        html = "<div class='my-custom-table'>"
        # ... formatting logic ...
        html += "</div>"
        return html


# Set as the global formatter
configure_formatter(formatter_class=MyFormatter)

# Or use the formatter just for specific operations
# (batches and schema are assumed to be the record batches and schema to render)
formatter = get_formatter()
custom_html = formatter.format_html(batches, schema)
```

Reset to default formatting:
```python
from datafusion.html_formatter import reset_formatter

# Reset to default settings
reset_formatter()
```

Get the current formatter settings:
```python
from datafusion.html_formatter import get_formatter

formatter = get_formatter()
print(formatter.max_rows)
print(formatter.theme)
```

You can also use a context manager to temporarily change formatting settings:
```python
from datafusion.html_formatter import formatting_context

# Default formatting
df.show()

# Temporarily use different formatting
with formatting_context(max_rows=100, theme="dark"):
    df.show()  # Will use the temporary settings

# Back to default formatting
df.show()
```

You can control how much data is displayed and how much memory is used for rendering:
```python
from datafusion.html_formatter import configure_formatter

configure_formatter(
    max_memory_bytes=4 * 1024 * 1024,  # 4 MB maximum memory for display
    min_rows=20,                       # Always show at least 20 rows
    max_rows=50,                       # Show up to 50 rows in output
)
```

These parameters let you trade how much data is shown against rendering time and memory use.
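The sketch below shows one plausible way the three limits could interact. It collects rows until either max_rows is reached or the estimated size would exceed max_memory_bytes, but it never stops before min_rows rows have been collected. The function name, the use of len(repr(row)) as a size estimate, and the exact precedence are all assumptions made for illustration. DataFusion's own logic works on Arrow record batches and may differ.

```python
def select_rows(rows, max_memory_bytes, min_rows, max_rows):
    """Hypothetical sketch: choose which rows to display under row and memory limits.

    Returns a tuple (selected_rows, has_more).
    """
    selected = []
    used_bytes = 0
    for row in rows:
        if len(selected) >= max_rows:
            break
        # Crude size estimate; a real implementation would measure Arrow buffers
        row_bytes = len(repr(row).encode("utf-8"))
        over_budget = used_bytes + row_bytes > max_memory_bytes
        # The memory limit only applies once min_rows rows have been collected
        if over_budget and len(selected) >= min_rows:
            break
        selected.append(row)
        used_bytes += row_bytes
    has_more = len(selected) < len(rows)
    return selected, has_more


rows = [("abc",)] * 100
shown, has_more = select_rows(rows, max_memory_bytes=20, min_rows=5, max_rows=50)
print(len(shown), has_more)  # 5 True
```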
Recommended practices:

- Global Configuration: Use configure_formatter() at the beginning of your notebook to set up consistent formatting for all DataFrames.
- Memory Management: Set appropriate max_memory_bytes limits to prevent performance issues with large datasets.
- Shared Styles: Keep use_shared_styles=True (the default) for better performance in notebooks with multiple DataFrames.
- Reset When Needed: Call reset_formatter() when you want to start fresh with default settings.
- Cell Expansion: Use enable_cell_expansion=True when cells might contain longer content that users may want to see in full.
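The save-and-restore pattern behind reset_formatter() and a temporary override such as formatting_context() can be shown generically with contextlib. The settings dictionary, reset_settings, and temporary_settings are hypothetical and do not touch DataFusion's real formatter. They only show why overrides do not leak out of a with-block, even when the block raises.

```python
from contextlib import contextmanager

# Hypothetical global settings, standing in for a formatter's configuration
DEFAULTS = {"max_rows": 10, "max_cell_length": 25}
settings = dict(DEFAULTS)


def reset_settings():
    """Restore every setting to its default value."""
    settings.clear()
    settings.update(DEFAULTS)


@contextmanager
def temporary_settings(**overrides):
    """Apply overrides inside a with-block and restore the previous values afterwards."""
    unknown = set(overrides) - set(settings)
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    saved = dict(settings)
    settings.update(overrides)
    try:
        yield settings
    finally:
        # Runs even if the block raises, so the overrides never leak
        settings.clear()
        settings.update(saved)


with temporary_settings(max_rows=100):
    print(settings["max_rows"])  # 100
print(settings["max_rows"])      # 10
```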
Related documentation:

- :doc:`../dataframe/index` - Complete guide to using DataFrames
- :doc:`../io/index` - I/O Guide for reading data from various sources
- :doc:`../data-sources` - Comprehensive data sources guide
- :ref:`io_csv` - CSV file reading
- :ref:`io_parquet` - Parquet file reading
- :ref:`io_json` - JSON file reading
- :ref:`io_avro` - Avro file reading
- :ref:`io_custom_table_provider` - Custom table providers
- API Reference - Full API reference