-
`new_from($source)`
Create a new `HTMLScraper` object from the passed source.
`$source` can be of type `DOMNodeList`, `DOMNode` or `string`.
Returns:

| Type | Description |
| --- | --- |
| `array` | When `$source` is an instance of `DOMNodeList`. The array contains `HTMLScraper` objects. |
| `HTMLScraper` | When `$source` is an instance of `DOMNode` or a `string`. |

-
`CSS_to_Xpath(string $path) : string`
Translates a CSS selector into an XPath expression.
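To give a feel for what such a translation involves, here is a minimal sketch that handles only `tag`, `#id` and `.class` selectors without combinators. The function name `simple_css_to_xpath` is hypothetical and this is not the library's `CSS_to_Xpath()` implementation, which presumably supports far more of the CSS grammar.

```php
<?php
// Hypothetical helper (not part of the library): translates a very small
// subset of CSS (tag, #id, .class, and combinations such as p.note) to XPath.
function simple_css_to_xpath(string $selector): string
{
    $pattern = '/^([a-zA-Z][a-zA-Z0-9]*|\*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$/';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // A missing tag name means "any element".
    $tag = ($m[1] ?? '') !== '' ? $m[1] : '*';
    $xpath = '//' . $tag;

    if (($m[2] ?? '') !== '') {
        $xpath .= "[@id='{$m[2]}']";
    }

    if (($m[3] ?? '') !== '') {
        // Each class becomes a whitespace-delimited token test on @class.
        foreach (explode('.', ltrim($m[3], '.')) as $class) {
            $xpath .= "[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
        }
    }

    return $xpath;
}

echo simple_css_to_xpath('div'), "\n";    // //div
echo simple_css_to_xpath('#main'), "\n";  // //*[@id='main']
```

The class test pads `@class` with spaces so that `.note` does not accidentally match an element whose class is `footnote`.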
-
`__toString() : string`
Magic function to convert `HTMLScraper` into a `string` containing the HTML code of the loaded document.
-
`textContent() : string`
Get the textContent of the loaded HTML document.
-
`load_HTML_str(string $source, int $options = NULL) : bool`
Load HTML from a string.
`$options`
Used to pass LIBXML constant flags. `LIBXML_NOERROR | LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED` is always applied (even when `$options` is `NULL`).
Returns
`TRUE` on success and `FALSE` on failure.
-
`load_HTML_file(string $filename, int $options = NULL, array $context = NULL) : bool`
Load HTML from a file.
`$options`
See `$options` in `HTMLScraper->load_HTML_str()`.
`$context`
See `$context` in `stream_context_create()`.
Returns
`TRUE` on success and `FALSE` on failure.
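The effect of the always-applied LIBXML flags can be seen with PHP's built-in `DOMDocument`. The sketch below is an illustration only: the helper `load_html_string` and the constant `ALWAYS_ON` are hypothetical names, and the library's own loader may differ in detail.

```php
<?php
// The flags this README says are always applied when loading HTML.
const ALWAYS_ON = LIBXML_NOERROR | LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED;

// Hypothetical helper (not the library's code): loads an HTML string with
// the always-on flags OR-ed together with any caller-supplied flags.
function load_html_string(string $source, ?int $options = null): ?DOMDocument
{
    $doc = new DOMDocument();
    $flags = ALWAYS_ON | ($options ?? 0);

    // loadHTML() returns false on failure; map that to null here.
    return $doc->loadHTML($source, $flags) ? $doc : null;
}

$doc = load_html_string('<div><p>Hi</p></div>');

// The fragment is serialized without an added doctype or <html>/<body> wrappers.
echo $doc->saveHTML();
```

`LIBXML_HTML_NODEFDTD` suppresses the default doctype and `LIBXML_HTML_NOIMPLIED` suppresses the implied `<html>` and `<body>` elements, so fragments round-trip without wrappers. `LIBXML_NOERROR` silences parser error reports for malformed markup.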
-
`xpath(string $expr, int ...$items)`
Get the `DOMNode`(s) that match the passed XPath path expression.
`$items`
Index of the `DOMNode` to be returned from the `DOMNodeList` matching the XPath path expression.
It is 0-indexed (e.g. to get the first node use `0`, for the second node use `1`, and so on).
Negative values can be used to reference list items from the end (e.g. use `-1` for the last node, `-2` for the second-to-last node, and so on).
If an invalid index is used, `NULL` is returned (e.g. if only two nodes match the XPath path expression, then using `3` will return `NULL`).
Returns:

| Type | Description |
| --- | --- |
| `NULL` | When no node matches the XPath path expression. |
| `DOMNodeList` | When no `...$items` are passed. |
| `DOMNode` | When only one `...$items` is passed. |
| `array` | When more than one `...$items` are passed. The array contains `DOMNode` or `NULL`. |

Returns the `DOMNodeList` (or the `DOMNode` when an `$items` index is specified) that matches the specified XPath path expression.
-
`querySelector(string $selector, int ...$items)`
Same as `HTMLScraper->xpath()` except that it uses a CSS selector instead of an XPath path expression.
-
`xpath_extract($mapper, string $expr, int ...$items)`
Find `DOMNode`(s) in the same way as in `HTMLScraper->xpath()`, then extract data from the `DOMNode`(s) as specified by the `$mapper`.
`$mapper`
It can be any one of the `string` values listed below or a `function` that takes a `DOMNode` and returns any extracted value.

| Mapper Value | Description |
| --- | --- |
| `'innerHTML'` | Maps `DOMNode` to its innerHTML. |
| `'outerHTML'` | Maps `DOMNode` to its outerHTML. |
| `'textContent'` | Maps `DOMNode` to its textContent. |
| `'textContentTrim'` | Maps `DOMNode` to its textContent without any whitespace at the beginning or at the end. |
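The mapper table above can be pictured as a small dispatch over the mapper value. The following sketch uses only PHP's built-in DOM classes. `apply_mapper` is a hypothetical name and the library's actual dispatch is not shown in this README, so treat this as an assumption about the behaviour rather than its implementation.

```php
<?php
// Hypothetical helper (not the library's code): applies one of the
// documented mapper names, or a user-supplied callable, to a DOMNode.
function apply_mapper($mapper, DOMNode $node)
{
    if ($mapper === 'outerHTML') {
        // Serialize the node itself.
        return $node->ownerDocument->saveHTML($node);
    }
    if ($mapper === 'innerHTML') {
        // Serialize each child and concatenate the results.
        $html = '';
        foreach ($node->childNodes as $child) {
            $html .= $node->ownerDocument->saveHTML($child);
        }
        return $html;
    }
    if ($mapper === 'textContent') {
        return $node->textContent;
    }
    if ($mapper === 'textContentTrim') {
        return trim($node->textContent);
    }
    if (is_callable($mapper)) {
        return $mapper($node);
    }
    throw new InvalidArgumentException('Unknown mapper');
}

$doc = new DOMDocument();
$doc->loadHTML('<p> Hello <b>world</b> </p>', LIBXML_NOERROR);
$p = $doc->getElementsByTagName('p')->item(0);

echo apply_mapper('textContentTrim', $p), "\n";  // Hello world
echo apply_mapper(function (DOMNode $n) {
    return strtoupper($n->nodeName);
}, $p), "\n";                                    // P
```

The string names are checked before `is_callable()` so that a mapper name can never be mistaken for a global function of the same name.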
-
`querySelector_extract($mapper, string $selector, int ...$items)`
Same as `HTMLScraper->xpath_extract()` except that it uses a CSS selector instead of an XPath path expression.
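The `...$items` rules shared by `xpath()`, `querySelector()` and their `_extract` variants (0-based indexes, negative indexes counted from the end, `NULL` for out-of-range indexes, and a return type that depends on how many indexes are passed) can be sketched with the built-in `DOMXPath`. The helper `pick_items` is a hypothetical name used for illustration; the library's internals may be organized differently.

```php
<?php
// Hypothetical helper (not the library's code): selects nodes from a
// DOMNodeList following the documented $items rules.
function pick_items(DOMNodeList $nodes, int ...$items)
{
    if ($nodes->length === 0) {
        return null;                 // nothing matched at all
    }
    if (count($items) === 0) {
        return $nodes;               // no index given: return the whole list
    }

    $picked = [];
    foreach ($items as $i) {
        // Negative indexes count back from the end of the list.
        $pos = $i < 0 ? $nodes->length + $i : $i;
        $inRange = $pos >= 0 && $pos < $nodes->length;
        $picked[] = $inRange ? $nodes->item($pos) : null;
    }

    // One index yields a single node (or NULL); several yield an array.
    return count($items) === 1 ? $picked[0] : $picked;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b</li><li>c</li></ul>', LIBXML_NOERROR);
$xpath = new DOMXPath($doc);
$lis = $xpath->query('//li');

echo pick_items($lis, 0)->textContent, "\n";   // a
echo pick_items($lis, -1)->textContent, "\n";  // c
var_dump(pick_items($lis, 3));                 // NULL
```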
-
`innerHTML(DOMNode &$node) : string`
Returns the innerHTML of the passed `DOMNode`.
-
`outerHTML(DOMNode &$node) : string`
Returns the outerHTML of the passed `DOMNode`.
-
`xpath(DOMNode &$node, string $expr, int ...$items)`
Similar to `HTMLScraper->xpath()` except that it works on a `DOMNode` instead of the `HTMLScraper`'s `DOMDocument`.
-
`querySelector(DOMNode &$node, string $selector, int ...$items)`
Similar to `DOMNodeHelper::xpath()` except that it uses a CSS selector instead of an XPath path expression.
-
`getChildNode(DOMNode &$node, int ...$indexes)`
Get one or more child nodes of the `DOMNode`.
`$indexes`
See `$items` in `HTMLScraper->xpath()`.
Returns:

| Type | Description |
| --- | --- |
| `DOMNodeList` | When no `...$indexes` are passed. |
| `DOMNode` | When only one `...$indexes` is passed. |
| `array` | When more than one `...$indexes` are passed. The array contains `DOMNode` or `NULL`. |

-
`getChildElements(DOMNode &$node, int ...$indexes) : array`
Same as `DOMNodeHelper::getChildNode()` except that it works on child elements instead of child nodes.
-
`remove_self(DOMNode &$node)`
Removes the `DOMNode` from its parent `DOMDocument`.
-
`filter_child_elements_xpath(DOMNode &$node, string ...$exprs)`
Removes the child elements of the passed `DOMNode` that match the passed XPath path expression(s).
-
`filter_child_elements_querySelector(DOMNode &$node, string ...$selectors)`
Removes the child elements of the passed `DOMNode` that match the passed CSS selector(s).
-
`filter_child_elements_index(DOMNode &$node, int ...$indexes)`
Removes the child elements of the passed `DOMNode` specified by the `...$indexes`.
`$indexes`
See `$items` in `HTMLScraper->xpath()`.
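As a rough picture of what the `filter_child_elements_*` helpers do, the sketch below removes elements matched by XPath expressions using only built-in DOM classes. It assumes the expressions are evaluated relative to the passed node, which this README does not state. `remove_matching_children` is a hypothetical name, not one of the library's functions.

```php
<?php
// Hypothetical helper (not the library's code): removes every node matched
// by the given XPath expressions, evaluated relative to $node (an assumption).
function remove_matching_children(DOMNode $node, string ...$exprs): void
{
    $xpath = new DOMXPath($node->ownerDocument);
    foreach ($exprs as $expr) {
        $matches = $xpath->query($expr, $node);
        if ($matches === false) {
            continue;                // malformed expression: skip it
        }
        // Copy the matches to an array so removal does not disturb iteration.
        foreach (iterator_to_array($matches) as $match) {
            if ($match->parentNode !== null) {
                $match->parentNode->removeChild($match);
            }
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div id="post"><p>Keep</p><span class="ad">Ad</span><p>Also keep</p></div>',
    LIBXML_NOERROR
);
$div = (new DOMXPath($doc))->query('//div')->item(0);

remove_matching_children($div, './span');
echo $div->textContent, "\n";  // KeepAlso keep
```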