 .. specific language governing permissions and limitations
 .. under the License.

-HTML Rendering in Jupyter
-=========================
+DataFrame Rendering
+===================

-When working in Jupyter notebooks or other environments that support rich HTML display,
-DataFusion DataFrames automatically render as nicely formatted HTML tables. This functionality
-is provided by the ``_repr_html_`` method, which is automatically called by Jupyter to provide
-a richer visualization than plain text output.
+DataFusion provides configurable rendering for DataFrames in both plain text and HTML
+formats. The ``datafusion.dataframe_formatter`` module controls how DataFrames are
+displayed in Jupyter notebooks (via ``_repr_html_``), in the terminal (via ``__repr__``),
+and anywhere else a string or HTML representation is needed.

-Basic HTML Rendering
---------------------
+Basic Rendering
+---------------

-In a Jupyter environment, simply displaying a DataFrame object will trigger HTML rendering:
+In a Jupyter environment, displaying a DataFrame triggers HTML rendering:

 .. code-block:: python

@@ -36,162 +36,179 @@ In a Jupyter environment, simply displaying a DataFrame object will trigger HTML
     # Explicit display also uses HTML rendering
     display(df)

-Customizing HTML Rendering
----------------------------
+In a terminal or when converting to string, plain text rendering is used:
+
+.. code-block:: python

-DataFusion provides extensive customization options for HTML table rendering through the
-``datafusion.html_formatter`` module.
+    # Plain text table output
+    print(df)

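As a rough illustration of the two display hooks described above, the sketch below defines a toy class that implements both ``__repr__`` and ``_repr_html_``. ``TinyTable`` is a hypothetical stand-in and not part of DataFusion. Jupyter looks for ``_repr_html_`` when it displays an object, while ``print`` and the plain terminal use ``__repr__``.

```python
import html


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not a DataFusion class."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def __repr__(self):
        # Plain text path: used by print() and the terminal
        lines = [" | ".join(self.columns)]
        lines += [" | ".join(str(v) for v in row) for row in self.rows]
        return "\n".join(lines)

    def _repr_html_(self):
        # Rich path: Jupyter looks for this method when displaying an object
        head = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = TinyTable(["a", "b"], [[1, 2], [3, 4]])
print(table)                # plain text rendering
print(table._repr_html_())  # HTML string that a notebook would render
```

The same object therefore renders differently depending on where it is shown, which is the behaviour the formatter settings in the following sections configure.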
-Configuring the HTML Formatter
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Configuring the Formatter
+-------------------------

-You can customize how DataFrames are rendered by configuring the formatter:
+You can customize how DataFrames are rendered by configuring the global formatter:

 .. code-block:: python

-    from datafusion.html_formatter import configure_formatter
-
-    # Change the default styling
+    from datafusion.dataframe_formatter import configure_formatter
+
     configure_formatter(
-        max_cell_length=25,        # Maximum characters in a cell before truncation
-        max_width=1000,            # Maximum width in pixels
-        max_height=300,            # Maximum height in pixels
-        max_memory_bytes=2097152,  # Maximum memory for rendering (2MB)
-        min_rows=10,               # Minimum number of rows to display
-        max_rows=10,               # Maximum rows to display in __repr__
-        enable_cell_expansion=True,# Allow expanding truncated cells
-        custom_css=None,           # Additional custom CSS
+        max_cell_length=25,           # Maximum characters in a cell before truncation
+        max_width=1000,               # Maximum width in pixels (HTML only)
+        max_height=300,               # Maximum height in pixels (HTML only)
+        max_memory_bytes=2097152,     # Maximum memory for rendering (2MB)
+        min_rows=10,                  # Minimum number of rows to display
+        max_rows=10,                  # Maximum rows to display
+        enable_cell_expansion=True,   # Allow expanding truncated cells (HTML only)
+        custom_css=None,              # Additional custom CSS (HTML only)
         show_truncation_message=True, # Show message when data is truncated
-        style_provider=None,       # Custom styling provider
-        use_shared_styles=True     # Share styles across tables
+        style_provider=None,          # Custom styling provider (HTML only)
+        use_shared_styles=True,       # Share styles across tables (HTML only)
     )

 The formatter settings affect all DataFrames displayed after configuration.

 Custom Style Providers
------------------------
+----------------------

-For advanced styling needs, you can create a custom style provider:
+For HTML styling, you can create a custom style provider that implements the
+``StyleProvider`` protocol:

 .. code-block:: python

-    from datafusion.html_formatter import StyleProvider, configure_formatter
-
-    class MyStyleProvider(StyleProvider):
-        def get_table_styles(self):
-            return {
-                "table": "border-collapse: collapse; width: 100%;",
-                "th": "background-color: #007bff; color: white; padding: 8px; text-align: left;",
-                "td": "border: 1px solid #ddd; padding: 8px;",
-                "tr:nth-child(even)": "background-color: #f2f2f2;",
-            }
-
-        def get_value_styles(self, dtype, value):
-            """Return custom styles for specific values"""
-            if dtype == "float" and value < 0:
-                return "color: red;"
-            return None
-
+    from datafusion.dataframe_formatter import configure_formatter
+
+    class MyStyleProvider:
+        def get_cell_style(self):
+            """Return CSS style string for table data cells."""
+            return "border: 1px solid #ddd; padding: 8px; text-align: left;"
+
+        def get_header_style(self):
+            """Return CSS style string for table header cells."""
+            return (
+                "background-color: #007bff; color: white; "
+                "padding: 8px; text-align: left;"
+            )
+
     # Apply the custom style provider
     configure_formatter(style_provider=MyStyleProvider())

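Because the provider is described as a protocol, a class needs no base class, only the two methods shown above. A minimal sketch of how such a structural check could look, using a locally defined ``StyleProviderLike`` protocol as a hypothetical stand-in (the library's own ``StyleProvider`` may differ in detail):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StyleProviderLike(Protocol):
    """Hypothetical stand-in mirroring the two methods used above."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class MyStyleProvider:
    # No inheritance: matching method names is enough for a Protocol
    def get_cell_style(self) -> str:
        return "border: 1px solid #ddd; padding: 8px; text-align: left;"

    def get_header_style(self) -> str:
        return "background-color: #007bff; color: white; padding: 8px; text-align: left;"


class Incomplete:
    # Missing get_header_style, so it does not satisfy the protocol
    def get_cell_style(self) -> str:
        return "padding: 4px;"


provider_ok = isinstance(MyStyleProvider(), StyleProviderLike)
incomplete_ok = isinstance(Incomplete(), StyleProviderLike)
print(provider_ok, incomplete_ok)
```

A runtime check like this only verifies that the methods exist, not that they return valid CSS.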
+Custom Cell Formatters
+----------------------
+
+You can register custom formatters for specific Python types. A cell formatter is any
+callable that takes a value and returns a string:
+
+.. code-block:: python
+
+    from datafusion.dataframe_formatter import get_formatter
+
+    formatter = get_formatter()
+
+    # Format floats to 2 decimal places
+    formatter.register_formatter(float, lambda v: f"{v:.2f}")
+
+    # Format dates in a custom way
+    from datetime import date
+    formatter.register_formatter(date, lambda v: v.strftime("%B %d, %Y"))
+
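Because a cell formatter is just a callable, it can be written and checked on its own before being registered. The sketch below exercises the two formatters from the example above as plain functions. The names ``format_float`` and ``format_date`` are illustrative only, and the month name assumes the default C/English locale.

```python
from datetime import date


def format_float(value: float) -> str:
    """Render a float with two decimal places."""
    return f"{value:.2f}"


def format_date(value: date) -> str:
    """Render a date as month name, zero-padded day, and year."""
    return value.strftime("%B %d, %Y")


float_text = format_float(3.14159)
date_text = format_date(date(2024, 1, 5))
print(float_text)
print(date_text)
```

Either function could then be passed to ``register_formatter`` in place of the lambdas.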
+Custom Cell and Header Builders
+-------------------------------
+
+For full control over the HTML of individual cells or headers, you can set custom
+builder functions:
+
+.. code-block:: python
+
+    from datafusion.dataframe_formatter import get_formatter
+
+    formatter = get_formatter()
+
+    # Custom cell builder receives (value, row, col, table_id) and returns HTML
+    def my_cell_builder(value, row, col, table_id):
+        color = "red" if isinstance(value, (int, float)) and value < 0 else "black"
+        return f"<td style='color: {color}; padding: 8px;'>{value}</td>"
+
+    formatter.set_custom_cell_builder(my_cell_builder)
+
+    # Custom header builder receives a schema field and returns HTML
+    def my_header_builder(field):
+        return f"<th style='background: #333; color: white; padding: 8px;'>{field.name}</th>"
+
+    formatter.set_custom_header_builder(my_header_builder)
+
 Performance Optimization with Shared Styles
 --------------------------------------------

-The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying
-multiple DataFrames in notebook environments:
+The ``use_shared_styles`` parameter (enabled by default) optimizes performance when
+displaying multiple DataFrames in notebook environments:

 .. code-block:: python

-    from datafusion.html_formatter import StyleProvider, configure_formatter
+    from datafusion.dataframe_formatter import configure_formatter
+
     # Default: Use shared styles (recommended for notebooks)
     configure_formatter(use_shared_styles=True)

     # Disable shared styles (each DataFrame includes its own styles)
     configure_formatter(use_shared_styles=False)

 When ``use_shared_styles=True``:
+
 - CSS styles and JavaScript are included only once per notebook session
 - This reduces HTML output size and prevents style duplication
 - Improves rendering performance with many DataFrames
 - Applies consistent styling across all DataFrames

-Creating a Custom Formatter
-----------------------------
+Working with the Formatter Directly
+------------------------------------

-For complete control over rendering, you can implement a custom formatter:
+You can use ``get_formatter()`` and ``set_formatter()`` for direct access to the global
+formatter instance:

 .. code-block:: python

-    from datafusion.html_formatter import Formatter, get_formatter
-
-    class MyFormatter(Formatter):
-        def format_html(self, batches, schema, has_more=False, table_uuid=None):
-            # Create your custom HTML here
-            html = "<div class='my-custom-table'>"
-            # ... formatting logic ...
-            html += "</div>"
-            return html
-
-    # Set as the global formatter
-    configure_formatter(formatter_class=MyFormatter)
-
-    # Or use the formatter just for specific operations
+    from datafusion.dataframe_formatter import (
+        DataFrameHtmlFormatter,
+        get_formatter,
+        set_formatter,
+    )
+
+    # Get and inspect the current formatter
     formatter = get_formatter()
-    custom_html = formatter.format_html(batches, schema)
+    print(formatter.max_rows)
+    print(formatter.max_cell_length)

-Managing Formatters
--------------------
+    # Create and set a fully custom formatter
+    custom_formatter = DataFrameHtmlFormatter(
+        max_cell_length=50,
+        max_rows=20,
+        enable_cell_expansion=False,
+    )
+    set_formatter(custom_formatter)

 Reset to default formatting:

 .. code-block:: python

-    from datafusion.html_formatter import reset_formatter
-
+    from datafusion.dataframe_formatter import reset_formatter
+
     # Reset to default settings
     reset_formatter()

-Get the current formatter settings:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import get_formatter
-
-    formatter = get_formatter()
-    print(formatter.max_rows)
-    print(formatter.theme)
-
-Contextual Formatting
-----------------------
-
-You can also use a context manager to temporarily change formatting settings:
-
-.. code-block:: python
-
-    from datafusion.html_formatter import formatting_context
-
-    # Default formatting
-    df.show()
-
-    # Temporarily use different formatting
-    with formatting_context(max_rows=100, theme="dark"):
-        df.show()  # Will use the temporary settings
-
-    # Back to default formatting
-    df.show()
-
 Memory and Display Controls
 ---------------------------

 You can control how much data is displayed and how much memory is used for rendering:

 .. code-block:: python

+    from datafusion.dataframe_formatter import configure_formatter
+
     configure_formatter(
         max_memory_bytes=4 * 1024 * 1024,  # 4MB maximum memory for display
         min_rows=20,                       # Always show at least 20 rows
-        max_rows=50                        # Show up to 50 rows in output
+        max_rows=50,                       # Show up to 50 rows in output
     )

 These parameters help balance comprehensive data display against performance considerations.
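The byte values in these examples are binary megabytes: ``2097152`` equals ``2 * 1024 * 1024``, and the 4MB limit is ``4 * 1024 * 1024``. A small helper can make such limits easier to read; ``mib`` is a hypothetical name and not part of DataFusion.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib(count: int) -> int:
    """Convert a number of mebibytes to bytes."""
    return count * MIB


default_limit = mib(2)  # same value as the 2097152 used earlier
larger_limit = mib(4)   # same value as 4 * 1024 * 1024
print(default_limit, larger_limit)
```

The result could then be supplied as ``max_memory_bytes=mib(4)``.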
@@ -216,7 +233,7 @@ Additional Resources
 * :doc:`../io/index` - I/O Guide for reading data from various sources
 * :doc:`../data-sources` - Comprehensive data sources guide
 * :ref:`io_csv` - CSV file reading
-* :ref:`io_parquet` - Parquet file reading
+* :ref:`io_parquet` - Parquet file reading
 * :ref:`io_json` - JSON file reading
 * :ref:`io_avro` - Avro file reading
 * :ref:`io_custom_table_provider` - Custom table providers