Commit 8f7bd79

Update the rendering docs to use the current dataframe_formatter module instead of the deprecated html_formatter
1 parent a185cfb commit 8f7bd79

1 file changed: 121 additions, 104 deletions

docs/source/user-guide/dataframe/rendering.rst

Lines changed: 121 additions & 104 deletions
@@ -15,18 +15,18 @@
1515
.. specific language governing permissions and limitations
1616
.. under the License.
1717
18-
HTML Rendering in Jupyter
19-
=========================
18+
DataFrame Rendering
19+
===================
2020

21-
When working in Jupyter notebooks or other environments that support rich HTML display,
22-
DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality
23-
is provided by the ``_repr_html_`` method, which is automatically called by Jupyter to provide
24-
a richer visualization than plain text output.
21+
DataFusion provides configurable rendering for DataFrames in both plain text and HTML
22+
formats. The ``datafusion.dataframe_formatter`` module controls how DataFrames are
23+
displayed in Jupyter notebooks (via ``_repr_html_``), in the terminal (via ``__repr__``),
24+
and anywhere else a string or HTML representation is needed.
2525

26-
Basic HTML Rendering
27-
--------------------
26+
Basic Rendering
27+
---------------
2828

29-
In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:
29+
In a Jupyter environment, displaying a DataFrame triggers HTML rendering:
3030

3131
.. code-block:: python
3232
@@ -36,162 +36,179 @@ In a Jupyter environment, simply displaying a DataFrame object will trigger HTML
3636
# Explicit display also uses HTML rendering
3737
display(df)
3838
39-
Customizing HTML Rendering
40-
---------------------------
39+
In a terminal or when converting to string, plain text rendering is used:
40+
41+
.. code-block:: python
4142
42-
DataFusion provides extensive customization options for HTML table rendering through the
43-
``datafusion.html_formatter`` module.
43+
# Plain text table output
44+
print(df)
4445
45-
Configuring the HTML Formatter
46-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
46+
Configuring the Formatter
47+
-------------------------
4748

48-
You can customize how DataFrames are rendered by configuring the formatter:
49+
You can customize how DataFrames are rendered by configuring the global formatter:
4950

5051
.. code-block:: python
5152
52-
from datafusion.html_formatter import configure_formatter
53-
54-
# Change the default styling
53+
from datafusion.dataframe_formatter import configure_formatter
54+
5555
configure_formatter(
56-
max_cell_length=25, # Maximum characters in a cell before truncation
57-
max_width=1000, # Maximum width in pixels
58-
max_height=300, # Maximum height in pixels
59-
max_memory_bytes=2097152, # Maximum memory for rendering (2MB)
60-
min_rows=10, # Minimum number of rows to display
61-
max_rows=10, # Maximum rows to display in __repr__
62-
enable_cell_expansion=True,# Allow expanding truncated cells
63-
custom_css=None, # Additional custom CSS
56+
max_cell_length=25, # Maximum characters in a cell before truncation
57+
max_width=1000, # Maximum width in pixels (HTML only)
58+
max_height=300, # Maximum height in pixels (HTML only)
59+
max_memory_bytes=2097152, # Maximum memory for rendering (2MB)
60+
min_rows=10, # Minimum number of rows to display
61+
max_rows=10, # Maximum rows to display
62+
enable_cell_expansion=True, # Allow expanding truncated cells (HTML only)
63+
custom_css=None, # Additional custom CSS (HTML only)
6464
show_truncation_message=True, # Show message when data is truncated
65-
style_provider=None, # Custom styling provider
66-
use_shared_styles=True # Share styles across tables
65+
style_provider=None, # Custom styling provider (HTML only)
66+
use_shared_styles=True, # Share styles across tables (HTML only)
6767
)
6868
6969
The formatter settings affect all DataFrames displayed after configuration.
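
The byte literal in the call above is easier to audit when it is computed. A minimal sketch, assuming only the keyword names listed in the ``configure_formatter`` call; ``mib`` and ``settings`` are hypothetical helpers introduced here for illustration, and the final call is left as a comment so the snippet runs without datafusion installed:

```python
def mib(n: int) -> int:
    """Convert mebibytes to bytes."""
    return n * 1024 * 1024


# Keyword names are the ones listed in the configure_formatter call above.
settings = {
    "max_cell_length": 25,
    "max_memory_bytes": mib(2),  # same value as the 2097152 literal
    "min_rows": 10,
    "max_rows": 10,
}

# With datafusion installed, the dict could be unpacked into the call:
# configure_formatter(**settings)
```

Keeping the settings in one dict also makes it simple to reuse the same values in tests or across notebooks.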
7070

7171
Custom Style Providers
72-
-----------------------
72+
----------------------
7373

74-
For advanced styling needs, you can create a custom style provider:
74+
For HTML styling, you can create a custom style provider that implements the
75+
``StyleProvider`` protocol:
7576

7677
.. code-block:: python
7778
78-
from datafusion.html_formatter import StyleProvider, configure_formatter
79-
80-
class MyStyleProvider(StyleProvider):
81-
def get_table_styles(self):
82-
return {
83-
"table": "border-collapse: collapse; width: 100%;",
84-
"th": "background-color: #007bff; color: white; padding: 8px; text-align: left;",
85-
"td": "border: 1px solid #ddd; padding: 8px;",
86-
"tr:nth-child(even)": "background-color: #f2f2f2;",
87-
}
88-
89-
def get_value_styles(self, dtype, value):
90-
"""Return custom styles for specific values"""
91-
if dtype == "float" and value < 0:
92-
return "color: red;"
93-
return None
94-
79+
from datafusion.dataframe_formatter import configure_formatter
80+
81+
class MyStyleProvider:
82+
def get_cell_style(self):
83+
"""Return CSS style string for table data cells."""
84+
return "border: 1px solid #ddd; padding: 8px; text-align: left;"
85+
86+
def get_header_style(self):
87+
"""Return CSS style string for table header cells."""
88+
return (
89+
"background-color: #007bff; color: white; "
90+
"padding: 8px; text-align: left;"
91+
)
92+
9593
# Apply the custom style provider
9694
configure_formatter(style_provider=MyStyleProvider())
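
The two-method shape of a style provider can be checked before the object is handed to ``configure_formatter``. A minimal sketch, assuming the protocol consists of exactly ``get_cell_style`` and ``get_header_style`` as shown in the example above; ``StyleProviderShape`` is a local stand-in defined here for illustration, not an import from datafusion:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StyleProviderShape(Protocol):
    """Hypothetical local stand-in for the two-method protocol in the docs."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class MyStyleProvider:
    def get_cell_style(self) -> str:
        return "border: 1px solid #ddd; padding: 8px; text-align: left;"

    def get_header_style(self) -> str:
        return (
            "background-color: #007bff; color: white; "
            "padding: 8px; text-align: left;"
        )


class Incomplete:
    """Missing get_header_style, so it does not match the shape."""

    def get_cell_style(self) -> str:
        return "padding: 8px;"


provider = MyStyleProvider()
# isinstance() on a runtime-checkable Protocol only checks that the methods exist.
provider_ok = isinstance(provider, StyleProviderShape)
incomplete_ok = isinstance(Incomplete(), StyleProviderShape)
```

The check only confirms that both methods are present. It does not validate the CSS strings they return.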
9795
96+
Custom Cell Formatters
97+
----------------------
98+
99+
You can register custom formatters for specific Python types. A cell formatter is any
100+
callable that takes a value and returns a string:
101+
102+
.. code-block:: python
103+
104+
from datafusion.dataframe_formatter import get_formatter
105+
106+
formatter = get_formatter()
107+
108+
# Format floats to 2 decimal places
109+
formatter.register_formatter(float, lambda v: f"{v:.2f}")
110+
111+
# Format dates in a custom way
112+
from datetime import date
113+
formatter.register_formatter(date, lambda v: v.strftime("%B %d, %Y"))
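
Because a cell formatter is just a callable from value to string, it can be written as a named function and exercised on its own before it is registered. A minimal sketch with no datafusion dependency; ``format_float`` and ``format_date`` are illustrative names, not part of the library:

```python
from datetime import date


def format_float(value: float) -> str:
    """Render a float with two decimal places."""
    return f"{value:.2f}"


def format_date(value: date) -> str:
    """Render a date as ISO 8601, which does not depend on the locale."""
    return value.isoformat()


two_places = format_float(3.14159)         # "3.14"
iso_day = format_date(date(2024, 1, 31))   # "2024-01-31"
```

Named functions like these could then be passed to ``register_formatter`` in place of the lambdas shown above.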
114+
115+
Custom Cell and Header Builders
116+
-------------------------------
117+
118+
For full control over the HTML of individual cells or headers, you can set custom
119+
builder functions:
120+
121+
.. code-block:: python
122+
123+
from datafusion.dataframe_formatter import get_formatter
124+
125+
formatter = get_formatter()
126+
127+
# Custom cell builder receives (value, row, col, table_id) and returns HTML
128+
def my_cell_builder(value, row, col, table_id):
129+
color = "red" if isinstance(value, (int, float)) and value < 0 else "black"
130+
return f"<td style='color: {color}; padding: 8px;'>{value}</td>"
131+
132+
formatter.set_custom_cell_builder(my_cell_builder)
133+
134+
# Custom header builder receives a schema field and returns HTML
135+
def my_header_builder(field):
136+
return f"<th style='background: #333; color: white; padding: 8px;'>{field.name}</th>"
137+
138+
formatter.set_custom_header_builder(my_header_builder)
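
The cell builder above interpolates ``value`` into the HTML unescaped, so a string cell containing markup would be rendered as markup. A hedged variant, assuming the same ``(value, row, col, table_id)`` signature shown above; ``safe_cell_builder`` is an illustrative name, and only the standard library is used:

```python
from html import escape


def safe_cell_builder(value, row, col, table_id):
    """Build a <td> element, escaping the value so it cannot inject markup."""
    is_negative = (
        isinstance(value, (int, float))
        and not isinstance(value, bool)  # bool is a subclass of int
        and value < 0
    )
    color = "red" if is_negative else "black"
    return f"<td style='color: {color}; padding: 8px;'>{escape(str(value))}</td>"


negative_cell = safe_cell_builder(-1.5, 0, 0, "t1")
markup_cell = safe_cell_builder("<b>x</b>", 0, 1, "t1")
```

A builder written this way could be installed with ``set_custom_cell_builder`` exactly as in the example above.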
139+
98140
Performance Optimization with Shared Styles
99141
--------------------------------------------
100142

101-
The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying
102-
multiple DataFrames in notebook environments:
143+
The ``use_shared_styles`` parameter (enabled by default) optimizes performance when
144+
displaying multiple DataFrames in notebook environments:
103145

104146
.. code-block:: python
105147
106-
from datafusion.html_formatter import StyleProvider, configure_formatter
148+
from datafusion.dataframe_formatter import configure_formatter
149+
107150
# Default: Use shared styles (recommended for notebooks)
108151
configure_formatter(use_shared_styles=True)
109152
110153
# Disable shared styles (each DataFrame includes its own styles)
111154
configure_formatter(use_shared_styles=False)
112155
113156
When ``use_shared_styles=True``:
157+
114158
- CSS styles and JavaScript are included only once per notebook session
115159
- This reduces HTML output size and prevents style duplication
116160
- Improves rendering performance with many DataFrames
117161
- Applies consistent styling across all DataFrames
118162

119-
Creating a Custom Formatter
120-
----------------------------
163+
Working with the Formatter Directly
164+
------------------------------------
121165

122-
For complete control over rendering, you can implement a custom formatter:
166+
You can use ``get_formatter()`` and ``set_formatter()`` for direct access to the global
167+
formatter instance:
123168

124169
.. code-block:: python
125170
126-
from datafusion.html_formatter import Formatter, get_formatter
127-
128-
class MyFormatter(Formatter):
129-
def format_html(self, batches, schema, has_more=False, table_uuid=None):
130-
# Create your custom HTML here
131-
html = "<div class='my-custom-table'>"
132-
# ... formatting logic ...
133-
html += "</div>"
134-
return html
135-
136-
# Set as the global formatter
137-
configure_formatter(formatter_class=MyFormatter)
138-
139-
# Or use the formatter just for specific operations
171+
from datafusion.dataframe_formatter import (
172+
DataFrameHtmlFormatter,
173+
get_formatter,
174+
set_formatter,
175+
)
176+
177+
# Get and modify the current formatter
140178
formatter = get_formatter()
141-
custom_html = formatter.format_html(batches, schema)
179+
print(formatter.max_rows)
180+
print(formatter.max_cell_length)
142181
143-
Managing Formatters
144-
-------------------
182+
# Create and set a fully custom formatter
183+
custom_formatter = DataFrameHtmlFormatter(
184+
max_cell_length=50,
185+
max_rows=20,
186+
enable_cell_expansion=False,
187+
)
188+
set_formatter(custom_formatter)
145189
146190
Reset to default formatting:
147191

148192
.. code-block:: python
149193
150-
from datafusion.html_formatter import reset_formatter
151-
194+
from datafusion.dataframe_formatter import reset_formatter
195+
152196
# Reset to default settings
153197
reset_formatter()
154198
155-
Get the current formatter settings:
156-
157-
.. code-block:: python
158-
159-
from datafusion.html_formatter import get_formatter
160-
161-
formatter = get_formatter()
162-
print(formatter.max_rows)
163-
print(formatter.theme)
164-
165-
Contextual Formatting
166-
----------------------
167-
168-
You can also use a context manager to temporarily change formatting settings:
169-
170-
.. code-block:: python
171-
172-
from datafusion.html_formatter import formatting_context
173-
174-
# Default formatting
175-
df.show()
176-
177-
# Temporarily use different formatting
178-
with formatting_context(max_rows=100, theme="dark"):
179-
df.show() # Will use the temporary settings
180-
181-
# Back to default formatting
182-
df.show()
183-
184199
Memory and Display Controls
185200
---------------------------
186201

187202
You can control how much data is displayed and how much memory is used for rendering:
188203

189204
.. code-block:: python
190205
206+
from datafusion.dataframe_formatter import configure_formatter
207+
191208
configure_formatter(
192209
max_memory_bytes=4 * 1024 * 1024, # 4MB maximum memory for display
193210
min_rows=20, # Always show at least 20 rows
194-
max_rows=50 # Show up to 50 rows in output
211+
max_rows=50, # Show up to 50 rows in output
195212
)
196213
197214
These parameters help balance comprehensive data display against performance considerations.
@@ -216,7 +233,7 @@ Additional Resources
216233
* :doc:`../io/index` - I/O Guide for reading data from various sources
217234
* :doc:`../data-sources` - Comprehensive data sources guide
218235
* :ref:`io_csv` - CSV file reading
219-
* :ref:`io_parquet` - Parquet file reading
236+
* :ref:`io_parquet` - Parquet file reading
220237
* :ref:`io_json` - JSON file reading
221238
* :ref:`io_avro` - Avro file reading
222239
* :ref:`io_custom_table_provider` - Custom table providers
