Let's restart this project step by step. I have already created a new branch.
Here are the tasks:
- Rewrite Claude.md, keeping the development workflow and basic guidance for a Rust library and CLI
- Clear the existing Rust codebase, TypeScript type bindings, and type management scripts
- Release management, the website, and GitHub workflows can stay
- Write tests as described below
- Implement the features and check that the tests pass
Tests for features to be built (described later)
- Test text trimming and cleaning of CLI arguments, or of parts extracted from them (as described in the features)
- Test opening a browser via WebDriver using https://example.com
- Test error handling when WebDriver is not working (connect to a non-configured port)
- Test opening https://example.com and reading its HTML title
- Thoroughly test the rules for building the HTML node tree from HTML sources
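For the error-handling test, one way to get a "non-configured port" reliably is to bind an ephemeral port, release it, and then connect to it. The sketch below assumes a plain TCP pre-flight check. `check_webdriver_reachable` and `unused_local_addr` are hypothetical helpers. An open socket only shows that something is listening, not that it speaks the WebDriver protocol. A real test would pass the unused address to whichever WebDriver client the project adopts and assert that the client returns an error.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical pre-flight check: Ok(()) if something accepts TCP connections
/// at `addr`, otherwise an error message naming the address. An open socket
/// does not prove the listener actually speaks WebDriver.
fn check_webdriver_reachable(addr: SocketAddr, timeout: Duration) -> Result<(), String> {
    TcpStream::connect_timeout(&addr, timeout)
        .map(|_stream| ())
        .map_err(|e| format!("WebDriver not reachable at {addr}: {e}"))
}

/// Hypothetical test helper: returns a loopback address with nothing listening,
/// by binding an ephemeral port (port 0) and releasing it right away.
fn unused_local_addr() -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind an ephemeral port");
    let addr = listener.local_addr().expect("read the bound address");
    drop(listener); // free the port so connection attempts are refused
    addr
}
```

The test only checks for an `Err`, not a specific error kind, because the exact error (refused or timed out) can differ between platforms.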
Features to develop
- Create a CLI that accepts a --link argument; it can be given once or multiple times, and duplicates are checked
- A URL browser that uses WebDriver to open a given URL in the local browser
- Define storage for URLs per domain, keeping unique URLs only, with data per URL (described below)
- Per URL, store the fetch status and the HTML node tree (described below)
- Per URL, load the URL in the browser as needed and extract the page's HTML source by executing JavaScript in the browser
- Then parse the source into a node tree using the rules below
- When all links are fetched, show the HTML [head > title] of each page, taken from the node tree
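The CLI and storage features could start from a std-only sketch like the one below. All names (`parse_links`, `domain_of`, `UrlStore`, `FetchStatus`, `UrlEntry`) are hypothetical. The sketch makes three assumptions:
- Duplicates are rejected as an error. Silently skipping them would also satisfy "check duplicates".
- The domain is the lowercased text between `://` and the next `/`, `?` or `#`.
- `html_source` stands in for the node tree until that type exists.

```rust
use std::collections::BTreeMap;

/// Fetch state of one URL (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum FetchStatus {
    Pending,
    Fetched,
    Failed(String),
}

/// Data stored per URL; `html_source` is a placeholder for the node tree.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
struct UrlEntry {
    status: FetchStatus,
    html_source: Option<String>,
}

/// Collects `--link <url>` and `--link=<url>` values, trimming each one.
/// Blank values, duplicates, unknown arguments, or no links at all are errors.
fn parse_links(args: &[String]) -> Result<Vec<String>, String> {
    let mut links: Vec<String> = Vec::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        let raw = if arg == "--link" {
            iter.next().ok_or("--link requires a value")?.clone()
        } else if let Some(value) = arg.strip_prefix("--link=") {
            value.to_string()
        } else {
            return Err(format!("unexpected argument: {arg}"));
        };
        let link = raw.trim().to_string();
        if link.is_empty() {
            return Err("--link value is blank".to_string());
        }
        if links.contains(&link) {
            return Err(format!("duplicate --link: {link}"));
        }
        links.push(link);
    }
    if links.is_empty() {
        return Err("at least one --link is required".to_string());
    }
    Ok(links)
}

/// Naive host extraction: the lowercased text between "://" and the next '/', '?' or '#'.
fn domain_of(url: &str) -> Option<String> {
    let rest = url.split_once("://")?.1;
    let host = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// URLs grouped by domain; keying the inner map by URL keeps each URL unique.
#[derive(Debug, Default)]
struct UrlStore {
    domains: BTreeMap<String, BTreeMap<String, UrlEntry>>,
}

impl UrlStore {
    /// Adds `url` as Pending. Returns false if it is already stored or has no host.
    fn add(&mut self, url: &str) -> bool {
        let domain = match domain_of(url) {
            Some(domain) => domain,
            None => return false,
        };
        let urls = self.domains.entry(domain).or_default();
        if urls.contains_key(url) {
            return false;
        }
        let entry = UrlEntry { status: FetchStatus::Pending, html_source: None };
        urls.insert(url.to_string(), entry);
        true
    }
}
```

In `main`, the arguments would come from `std::env::args().skip(1).collect::<Vec<String>>()`. The naive `domain_of` keeps any port as part of the host and ignores user info. A URL-parsing crate would handle those cases properly.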
Rules for HTML node tree
- Per element, save the tag name, class, id, and content
- Split each element's class by whitespace and trim, saving it as a list of words
- Per element's id, ignore ids that are numeric and seem
- Ignore nodes that are blank
- Ignore tags like script, style, noscript, svg, path, img, video, audio, etc.
- Ignore tags that have the same parent nodes (including class and ID) and the same content
- Merge the contents of two or more immediate siblings if they have text-only contents
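Below is a minimal std-only Rust sketch of the node model and several of the rules above. `Node`, `clean_tree`, and `head_title` are hypothetical names. The sketch assumes parsing has already produced elements whose `content` is their own text.

It also makes two further assumptions:
- Only consecutive childless siblings with the same tag name are merged, joined with a single space.
- The ignored-tag list is the one from the rules, without any "etc." additions.

The numeric-id rule and the duplicate-parent rule are not covered, since their exact criteria still need to be pinned down.

```rust
/// One element of the simplified node tree (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
struct Node {
    name: String,         // lowercased tag name
    classes: Vec<String>, // class attribute split into words
    id: Option<String>,
    content: String,      // the element's own text
    children: Vec<Node>,
}

/// Tags removed together with their subtrees (the rules say "etc.", so extend as needed).
const IGNORED_TAGS: &[&str] = &["script", "style", "noscript", "svg", "path", "img", "video", "audio"];

/// Splits a class attribute on whitespace; `split_whitespace` also trims and skips empty pieces.
fn split_classes(class_attr: &str) -> Vec<String> {
    class_attr.split_whitespace().map(str::to_string).collect()
}

impl Node {
    /// Convenience constructor (hypothetical) for the parser and tests.
    fn element(name: &str, class_attr: &str, id: Option<&str>, content: &str, children: Vec<Node>) -> Node {
        Node {
            name: name.to_ascii_lowercase(),
            classes: split_classes(class_attr),
            id: id.map(str::to_string),
            content: content.to_string(),
            children,
        }
    }
}

/// Cleans a list of siblings: drops ignored tags and blank nodes, recurses into
/// children, and merges consecutive childless siblings that share a tag name.
fn clean_children(children: Vec<Node>) -> Vec<Node> {
    let mut out: Vec<Node> = Vec::new();
    for mut child in children {
        if IGNORED_TAGS.contains(&child.name.as_str()) {
            continue;
        }
        child.children = clean_children(child.children);
        child.content = child.content.trim().to_string();
        // Blank: no text left and no surviving children.
        if child.content.is_empty() && child.children.is_empty() {
            continue;
        }
        let merge = child.children.is_empty()
            && matches!(out.last(), Some(prev) if prev.children.is_empty() && prev.name == child.name);
        if merge {
            let prev = out.last_mut().expect("checked by `merge`");
            prev.content.push(' ');
            prev.content.push_str(&child.content);
        } else {
            out.push(child);
        }
    }
    out
}

/// Applies the rules to a whole tree, keeping the root itself.
fn clean_tree(mut root: Node) -> Node {
    root.content = root.content.trim().to_string();
    root.children = clean_children(root.children);
    root
}

/// Text of the first `title` that is a direct child of a `head` element (`head > title`).
fn head_title(node: &Node) -> Option<&str> {
    if node.name == "head" {
        if let Some(title) = node.children.iter().find(|c| c.name == "title") {
            return Some(title.content.as_str());
        }
    }
    node.children.iter().find_map(head_title)
}
```

Because ignored tags are skipped before recursion, their whole subtree disappears. Blank checks run after recursion, so a `div` whose only children were ignored is dropped as well.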