Commit 1289bf0

Update documentation. Resolves #192
1 parent 719a818 commit 1289bf0

6 files changed

Lines changed: 145 additions & 784 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ assert len(results) == 1
 assert results[0].text() == "lexbor is AwesOme"
 ```
 
-* [Detailed overview](https://github.com/rushter/selectolax/blob/master/examples/walkthrough.ipynb)
+* [More examples](https://selectolax.readthedocs.io/en/latest/examples.html)
 
 ### Available backends
 
docs/examples.rst

Lines changed: 128 additions & 3 deletions
@@ -13,12 +13,15 @@ Basic HTML Parsing
 There are 3 ways to create or parse objects in Selectolax:
 
 1. Parse HTML as a full document using ``LexborHTMLParser()``
-2. Parse HTML as a fragment using ``LexborHTMLParser().parse_fragment()``
+2. Parse HTML as a fragment using ``LexborHTMLParser(..., is_fragment=True)``
 3. Create single node using ``LexborHTMLParser().create_tag()``
 
 - ``LexborHTMLParser()`` - Returns the HTML tree as parsed by Lexbor, unmodified. The HTML is assumed to be a full document. ``<html>``, ``<head>``, and ``<body>`` tags are added if missing.
 
-- ``parse_fragment()`` - Intended for HTML fragments/partials. Returns a list of Nodes. Given HTML doesn't need to contain ``<html>``, ``<head>``, ``<body>``. HTML can have multiple root elements.
+- ``LexborHTMLParser(..., is_fragment=True)`` - Intended for HTML fragments/partials.
+Behaves the same way as `DocumentFragment` in browsers.
+Drops ``<html>``, ``<head>``, and ``<body>`` tags if present in the input HTML.
+Use it to parse snippets of HTML that are not complete documents.
 
 - ``create_tag()`` - Create a single empty node for given tag.
 
@@ -55,7 +58,7 @@ There are 3 ways to create or parse objects in Selectolax:
 html_tree = LexborHTMLParser(html)
 
 # Parse HTML as a fragment
-frag_tree = LexborHTMLParser().parse_fragment(fragment)
+frag_tree = LexborHTMLParser(html, is_fragment=True)
 
 # Create a single node
 node = LexborHTMLParser().create_tag("div")
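The fragment behaviour described in the two hunks above (``<html>``, ``<head>``, and ``<body>`` dropped; several top-level elements allowed, as in a browser ``DocumentFragment``) can be pictured without Selectolax. The sketch below is a toy model built on the standard library's ``html.parser``, not Lexbor's parser: it only reports the top-level element names of a snippet. ``FragmentRoots`` and ``fragment_roots`` are hypothetical names, and the model assumes well-formed input with explicit end tags (no implied ``</p>``, no stray closing tags).

```python
from html.parser import HTMLParser

# Wrapper tags that, per the documentation, fragment parsing drops.
WRAPPERS = {"html", "head", "body"}
# Void elements never receive an end tag, so they must not change depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FragmentRoots(HTMLParser):
    """Toy model: record the top-level element names of an HTML snippet,
    treating <html>, <head> and <body> as if they were not there."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.roots = []

    def handle_starttag(self, tag, attrs):
        if tag in WRAPPERS:
            return
        if self.depth == 0:
            self.roots.append(tag)  # a new top-level element
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in WRAPPERS or tag in VOID:
            return
        self.depth -= 1


def fragment_roots(html):
    """Return the top-level element names found in ``html``."""
    parser = FragmentRoots()
    parser.feed(html)
    parser.close()
    return parser.roots


print(fragment_roots("<li>one</li><li>two</li>"))  # ['li', 'li']
print(fragment_roots("<html><body><p>a</p><p>b<br></p></body></html>"))  # ['p', 'p']
```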
@@ -173,6 +176,34 @@ Ensure exactly one match exists, otherwise raise an error.
 
 ValueError: Expected 1 match, but found 2 matches
 
+CSS Chaining
+~~~~~~~~~~~~
+
+Chain multiple CSS selectors to progressively filter results.
+
+.. code-block:: python
+
+html = """
+<div id="container">
+<span class="red"></span>
+<span class="green"></span>
+<span class="red"></span>
+<span class="green"></span>
+</div>
+"""
+
+parser = LexborHTMLParser(html)
+
+# Chain selectors: start with div, then span, then .red
+red_spans = parser.select('div').css("span").css(".red").matches
+print([node.html for node in red_spans])
+
+**Output:**
+
+.. code-block:: text
+
+['<span class="red"></span>', '<span class="red"></span>']
+
 HTML manipulation
 -----------------
 
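The chaining example can be modelled with the standard library's ``xml.etree.ElementTree``, which stands in for Selectolax here because the sample markup is also well-formed XML. Each call to the hypothetical ``select()`` helper is one chaining step: it searches the current matches and their descendants and keeps the elements that satisfy a predicate. Including the current nodes themselves is an assumption inferred from the documented output: the ``<span>`` elements have no descendants, so a descendants-only ``.css(".red")`` step could not return them.

```python
import xml.etree.ElementTree as ET

# Markup from the documentation example (also well-formed XML).
HTML = """<div id="container">
<span class="red"></span>
<span class="green"></span>
<span class="red"></span>
<span class="green"></span>
</div>"""


def select(scope, predicate):
    """One hypothetical chaining step: search every node in ``scope``
    together with its descendants, keep matches, and drop duplicates."""
    seen, result = set(), []
    for node in scope:
        for el in node.iter():  # iter() yields ``node`` itself first
            if predicate(el) and id(el) not in seen:
                seen.add(id(el))
                result.append(el)
    return result


def has_class(name):
    """Predicate for a class selector such as ``.red``."""
    return lambda el: name in el.get("class", "").split()


root = ET.fromstring(HTML)

divs = select([root], lambda el: el.tag == "div")  # step 1: 'div'
spans = select(divs, lambda el: el.tag == "span")  # step 2: 'span'
red = select(spans, has_class("red"))              # step 3: '.red'

print(len(divs), len(spans), len(red))  # 1 4 2
print([(el.tag, el.get("class")) for el in red])  # [('span', 'red'), ('span', 'red')]
```

Each step can only narrow or descend from the previous result, which is why the chain reads as progressive filtering.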
@@ -399,6 +430,42 @@ Add, modify, and remove element attributes.
 <p class="p3" vid>Lorem ipsum</p>
 </div>
 
+Inserting Nodes
+~~~~~~~~~~~~~~~
+
+Insert new content into the DOM at specific positions.
+
+.. code-block:: python
+
+html = """
+<div id="container">
+<span class="red"></span>
+<span class="green"></span>
+<span class="red"></span>
+<span class="green"></span>
+</div>
+"""
+
+parser = LexborHTMLParser(html)
+
+# Insert text before an element
+red_node = parser.css_first('.red')
+red_node.insert_before("Hello")
+
+# Insert HTML nodes
+subtree = LexborHTMLParser("<div>Hi</div>")
+green_node = parser.css_first('.green')
+green_node.insert_before(subtree)
+
+# Insert before, after, or as child
+car_div = LexborHTMLParser().create_tag("div")
+car_div.inner_html = "Car"
+green_node.insert_before(car_div)
+green_node.insert_after(car_div)
+green_node.insert_child(car_div)
+
+print(parser.body.html)
+
 Tree Traversal
 --------------
 
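The three insertion positions used in the "Inserting Nodes" example (before a node, after it, and inside it as a child) can be pictured with a standard-library model. The sketch below uses ``xml.etree.ElementTree`` in place of Selectolax; its ``insert_before``, ``insert_after`` and ``insert_child`` are hypothetical helpers that share the documented method names but not their implementation, and ``insert_child`` is assumed here to append the new element as the last child. Each insertion receives its own deep copy, since ElementTree would otherwise share one element object between three positions.

```python
import copy
import xml.etree.ElementTree as ET


def parent_of(root, target):
    """Find the parent of ``target`` (ElementTree keeps no parent links)."""
    for parent in root.iter():
        for child in parent:
            if child is target:
                return parent
    raise ValueError("target is not in the tree")


def insert_before(root, ref, new):
    """Place ``new`` immediately before ``ref`` among its siblings."""
    parent = parent_of(root, ref)
    parent.insert(list(parent).index(ref), new)


def insert_after(root, ref, new):
    """Place ``new`` immediately after ``ref`` among its siblings."""
    parent = parent_of(root, ref)
    parent.insert(list(parent).index(ref) + 1, new)


def insert_child(ref, new):
    """Make ``new`` the last child of ``ref``."""
    ref.append(new)


root = ET.fromstring('<div id="container"><span class="red"/><span class="green"/></div>')
green = root.find("span[@class='green']")

car = ET.Element("div")
car.text = "Car"

insert_before(root, green, copy.deepcopy(car))
insert_after(root, green, copy.deepcopy(car))
insert_child(green, copy.deepcopy(car))

print(ET.tostring(root, encoding="unicode"))
# <div id="container"><span class="red" /><div>Car</div><span class="green"><div>Car</div></span><div>Car</div></div>
```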
@@ -544,6 +611,37 @@ Extract all links and images from HTML content.
 Advanced selectors
 ------------------
 
+Text Content Filtering
+~~~~~~~~~~~~~~~~~~~~~~
+
+Use advanced selectors to filter elements based on their text content.
+
+.. code-block:: python
+
+html = """
+<script>
+var super_variable = 100;
+</script>
+<script>
+console.log('debug');
+</script>
+"""
+
+parser = LexborHTMLParser(html)
+
+# Filter script tags containing specific text
+scripts_with_super = parser.select('script').text_contains("super").matches
+print([node.text() for node in scripts_with_super])
+
+**Output:**
+
+.. code-block:: text
+
+['\n var super_variable = 100;\n']
+
+CSS Attribute and Pseudo-class Selectors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. code-block:: python
 
 html = """
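Filtering by text content, as in the "Text Content Filtering" example, amounts to selecting elements first and then keeping those whose text contains a substring. The sketch below models that two-stage idea with ``xml.etree.ElementTree`` rather than Selectolax; ``text_of`` and ``text_contains`` are hypothetical helpers, the comparison is case-sensitive by choice in this sketch, and the two scripts are wrapped in a ``<root>`` element so the snippet is a single well-formed XML document.

```python
import xml.etree.ElementTree as ET

# The two scripts from the documentation example, wrapped in <root>.
HTML = """<root>
<script>
    var super_variable = 100;
</script>
<script>
    console.log('debug');
</script>
</root>"""


def text_of(el):
    """All text inside ``el``, including text of nested elements."""
    return "".join(el.itertext())


def text_contains(nodes, needle):
    """Keep only the nodes whose text contains ``needle`` (case-sensitive)."""
    return [el for el in nodes if needle in text_of(el)]


root = ET.fromstring(HTML)
scripts = root.findall(".//script")         # stage 1: structural selection
matches = text_contains(scripts, "super")   # stage 2: text filter

print(len(scripts), len(matches))  # 2 1
print(text_of(matches[0]).strip())  # var super_variable = 100;
```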
@@ -607,6 +705,33 @@ Advanced selectors
 First article title: First Post
 Post ID 1 title: First Post
 
+Text Content Pseudo-class Selectors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use lexbor-specific pseudo-classes for case-sensitive and case-insensitive text matching.
+
+.. code-block:: python
+
+html = '<div><p>hello </p><p id="main">lexbor is AwesOme</p></div>'
+parser = LexborHTMLParser(html)
+
+# Case-insensitive search
+results_ci = parser.css('p:lexbor-contains("awesome" i)')
+print(f"Case-insensitive results: {len(results_ci)}")
+
+# Case-sensitive search
+results_cs = parser.css('p:lexbor-contains("AwesOme")')
+print(f"Case-sensitive results: {len(results_cs)}")
+print(f"Matching text: {results_cs[0].text()}")
+
+**Output:**
+
+.. code-block:: text
+
+Case-insensitive results: 1
+Case-sensitive results: 1
+Matching text: lexbor is AwesOme
+
 Sibling Navigation
 ------------------
 
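The two modes of the ``:lexbor-contains`` example reduce to a substring test with or without case folding. The sketch below is a plain-Python model of that distinction, not Lexbor's matcher: the hypothetical ``contains`` helper uses ``str.casefold()``, while Lexbor's own case-insensitive comparison may fold case differently (for example, ASCII-only), so non-ASCII text is not covered by this model. It also includes the combination the example leaves out: a lowercase needle without the ``i`` flag, which the case-sensitive mode should not match.

```python
def contains(text, needle, ignore_case=False):
    """Substring test in two modes, modelling the ``i`` flag."""
    if ignore_case:
        return needle.casefold() in text.casefold()
    return needle in text


# Text of the two <p> elements from the documentation example.
paragraphs = ["hello ", "lexbor is AwesOme"]

ci = [p for p in paragraphs if contains(p, "awesome", ignore_case=True)]
cs = [p for p in paragraphs if contains(p, "AwesOme")]
cs_miss = [p for p in paragraphs if contains(p, "awesome")]

print(len(ci), len(cs), len(cs_miss))  # 1 1 0
```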