Implemented a comprehensive Puppeteer-based scraping system to extract UniFi Network API documentation from the authenticated portal at https://unifi.ui.com/settings/api-docs.
Authentication (scripts/scraper/auth/unifi-login.js)
- Session cookie management
- Credential handling (env vars + interactive prompts)
- Session validation and reuse
- 2FA support (60-second manual completion window)
- Secure cookie storage
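
Validation before reuse could start with a local expiry check, sketched below. This assumes cookies are saved in the array shape Puppeteer returns (objects with `name` and an `expires` Unix timestamp in seconds, `-1` for session cookies). The function name `hasUsableSession` and the `requiredNames` parameter are hypothetical and not taken from unifi-login.js.

```javascript
// Hypothetical helper: decide whether saved cookies are worth reusing.
// Assumes Puppeteer-style cookie objects: { name, value, expires } where
// `expires` is a Unix timestamp in seconds, or -1 for a session cookie.
function hasUsableSession(cookies, requiredNames = [], nowSeconds = Date.now() / 1000) {
  if (!Array.isArray(cookies) || cookies.length === 0) return false;

  // Keep cookies that have not expired; session cookies (-1) count as live.
  const live = cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
  if (live.length === 0) return false;

  // Every cookie the caller considers essential must still be present.
  const liveNames = new Set(live.map((c) => c.name));
  return requiredNames.every((name) => liveNames.has(name));
}

// Made-up cookie names and timestamps, for illustration only.
const saved = [
  { name: 'auth', value: 'abc', expires: 2000 },
  { name: 'prefs', value: 'x', expires: -1 },
];
console.log(hasUsableSession(saved, ['auth'], 1000)); // true
console.log(hasUsableSession(saved, ['auth'], 3000)); // false
```

A passing check only shows that the cookies have not expired locally. The portal can still reject them, so loading an authenticated page remains the final test.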
Navigation Extractor (scripts/scraper/extractors/navigation-extractor.js)
- Multiple selector strategies with fallbacks
- Automatic category discovery
- Endpoint link extraction
- Heuristic-based navigation finding
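
The fallback idea can be sketched as trying selectors in priority order and keeping the first one that matches. The helper name `queryWithFallbacks` and the selector strings are placeholders: the real portal markup is not described here, and navigation-extractor.js may be organized differently.

```javascript
// Hypothetical helper: try selectors in priority order and return the first
// one that matches anything. `root` is any object with querySelectorAll(),
// such as `document` inside page.evaluate().
function queryWithFallbacks(root, selectors) {
  for (const selector of selectors) {
    const matches = Array.from(root.querySelectorAll(selector));
    if (matches.length > 0) return { selector, matches };
  }
  return { selector: null, matches: [] };
}

// Placeholder selectors; the real portal markup is not known here.
const NAV_SELECTORS = ['nav[aria-label="API"] a', '.sidebar a', 'a[href*="/api/"]'];

// Stub root so the sketch runs outside a browser.
const stubRoot = {
  querySelectorAll: (sel) => (sel === '.sidebar a' ? ['Sites', 'Devices'] : []),
};
console.log(queryWithFallbacks(stubRoot, NAV_SELECTORS).selector); // .sidebar a
```

Returning the selector that matched, along with the matches, makes it easy to log which fallback was needed after a portal UI change.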
Endpoint Extractor (scripts/scraper/extractors/endpoint-extractor.js)
- Detailed endpoint information extraction
- Retry logic with exponential backoff
- Batch processing with rate limiting
- Section-based content extraction
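
Retry with exponential backoff and rate-limited batching can be combined roughly as follows. This is a sketch under assumptions: `withRetry`, `processInBatches`, and their option names and default delays are hypothetical, not the values used in endpoint-extractor.js.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry an async task, doubling the delay each attempt
// (baseDelayMs, 2x, 4x, ...). Rethrows the last error when retries run out.
async function withRetry(task, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Hypothetical helper: process items in small batches, pausing between
// batches so the portal is not hit with a burst of requests.
// batchSize = 1 gives strictly sequential processing.
async function processInBatches(items, worker, { batchSize = 1, pauseMs = 1000, retry = {} } = {}) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.all(batch.map((item) => withRetry(() => worker(item), retry)));
    results.push(...settled);
    if (i + batchSize < items.length) await sleep(pauseMs);
  }
  return results;
}
```

A call such as `processInBatches(endpointLinks, extractOne, { batchSize: 1 })` would visit endpoints one at a time, where `endpointLinks` and `extractOne` stand in for whatever the extractor actually uses.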
Schema Extractor (scripts/scraper/extractors/schema-extractor.js)
- JSON schema parsing
- HTML table extraction
- Definition list parsing
- Property list extraction
- Multiple format support
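
As an illustration of handling more than one source format, the sketch below parses a JSON block when possible and otherwise maps a scraped table to property descriptors. The column names (`Name`, `Type`, `Required`, `Description`) and both function names are assumptions, not the extractor's actual contract.

```javascript
// Hypothetical helper: parse a JSON schema block, returning null if the
// text is not valid JSON so the caller can fall back to another format.
function parseJsonBlock(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Hypothetical helper: turn a scraped HTML table (header cells plus rows of
// cell text) into a list of property descriptors.
function tableToProperties(headers, rows) {
  const keys = headers.map((h) => h.trim().toLowerCase());
  return rows.map((cells) => {
    const record = {};
    keys.forEach((key, i) => {
      record[key] = (cells[i] ?? '').trim();
    });
    return {
      name: record.name ?? record.property ?? '',
      type: record.type ?? 'unknown',
      required: /^(yes|true|required)$/i.test(record.required ?? ''),
      description: record.description ?? '',
    };
  });
}

// Made-up rows, for illustration only.
const properties = tableToProperties(
  ['Name', 'Type', 'Required', 'Description'],
  [['siteId', 'string', 'Yes', 'Site identifier'], ['limit', 'integer', '', 'Page size']],
);
console.log(properties);
```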
API Spec Parser (scripts/scraper/parsers/api-spec-parser.js)
- Structured API specification format
- Normalization and validation
- Statistics generation
- Version management
- Completeness checking
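
Normalization and statistics generation might look like the sketch below. The field names (`method`, `path`, `description`, `parameters`) are assumptions about the scraped shape rather than the parser's actual schema, and both function names are hypothetical.

```javascript
// Hypothetical normalizer: uppercase the method, force a single leading
// slash on the path, and default missing fields.
function normalizeEndpoint(raw) {
  return {
    method: String(raw.method ?? 'GET').trim().toUpperCase(),
    path: '/' + String(raw.path ?? '').trim().replace(/^\/+|\/+$/g, ''),
    description: String(raw.description ?? '').trim(),
    parameters: Array.isArray(raw.parameters) ? raw.parameters : [],
  };
}

// Hypothetical statistics: counts per HTTP method plus a simple
// completeness signal (endpoints that have no description).
function computeStats(endpoints) {
  const byMethod = {};
  let missingDescription = 0;
  for (const endpoint of endpoints) {
    byMethod[endpoint.method] = (byMethod[endpoint.method] ?? 0) + 1;
    if (!endpoint.description) missingDescription += 1;
  }
  return { total: endpoints.length, byMethod, missingDescription };
}

// Made-up endpoints, for illustration only.
const normalized = [
  { method: ' get ', path: 'sites/', description: 'List sites' },
  { method: 'post', path: '/sites' },
].map(normalizeEndpoint);
console.log(computeStats(normalized));
```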
Wait-for-Selectors (scripts/scraper/utils/wait-for-selectors.js)
- Multi-selector waiting with fallbacks
- Text content matching
- Element stability detection
- Retry with exponential backoff
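
Multi-selector waiting can be sketched with `Promise.any` over Puppeteer's `page.waitForSelector`. The wrapper name `waitForAnySelector` and its default timeout are hypothetical; only `waitForSelector(selector, { timeout })` is a real Puppeteer call here.

```javascript
// Hypothetical helper: wait until any one of several selectors appears.
// `page` is expected to be a Puppeteer Page; only waitForSelector() is used.
async function waitForAnySelector(page, selectors, timeoutMs = 10000) {
  const attempts = selectors.map(async (selector) => {
    const handle = await page.waitForSelector(selector, { timeout: timeoutMs });
    return { selector, handle };
  });
  try {
    // Resolves with the first selector that shows up.
    return await Promise.any(attempts);
  } catch {
    // Promise.any rejects only when every individual wait has failed.
    throw new Error(`None of the selectors appeared: ${selectors.join(', ')}`);
  }
}
```

The waits that lose the race are not cancelled and keep running until their own timeout, so a short `timeoutMs` keeps the page from idling long after the first match.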
Screenshot Debugger (scripts/scraper/utils/screenshot-debugger.js)
- Automatic screenshot capture
- HTML and DOM dumps
- Element highlighting
- Debug snapshots
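
A debug snapshot that pairs a screenshot with an HTML dump could be captured as sketched below. `page.screenshot({ path, fullPage })` and `page.content()` are standard Puppeteer calls. The function name, the timestamped file naming, and the CommonJS `require` style are assumptions, and the project itself may use ES modules.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: save a full-page screenshot and the current HTML
// side by side, using a timestamped base name so snapshots never collide.
async function captureDebugSnapshot(page, label, dir = 'screenshots') {
  await fs.mkdir(dir, { recursive: true });
  const stamp = new Date().toISOString().replace(/[:.]/g, '-');
  const base = path.join(dir, `${stamp}-${label}`);

  await page.screenshot({ path: `${base}.png`, fullPage: true });
  await fs.writeFile(`${base}.html`, await page.content(), 'utf8');

  return { screenshot: `${base}.png`, html: `${base}.html` };
}
```

Calling it from a `catch` block, for example `await captureDebugSnapshot(page, 'login-failed')`, records what the page looked like at the moment a selector failed.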
Main Script (scripts/scraper/scrape-api-docs.js)
- Command-line interface
- Environment variable loading
- Interactive credential prompts
- Progress indicators (ora spinners)
- Validation and statistics
- JSON output
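
The flags that appear in this document (`--version=<x.y.z>`, `--headed`, `--debug`, `--clear-session`) could be parsed with a small hand-rolled function such as the one below. It is a sketch: the function name, the option object shape, and the choice to reject unknown flags are assumptions about how scrape-api-docs.js behaves.

```javascript
// Hypothetical parser for the flags shown in this document.
function parseCliArgs(argv) {
  const options = { version: null, headed: false, debug: false, clearSession: false };
  for (const arg of argv) {
    if (arg.startsWith('--version=')) options.version = arg.slice('--version='.length);
    else if (arg === '--headed') options.headed = true;
    else if (arg === '--debug') options.debug = true;
    else if (arg === '--clear-session') options.clearSession = true;
    else throw new Error(`Unknown argument: ${arg}`);
  }
  return options;
}

// In a real script this would be parseCliArgs(process.argv.slice(2)).
console.log(parseCliArgs(['--version=10.1.68', '--headed']));
```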
Comparison Tool (scripts/compare/diff-api-specs.js)
- Version-to-version comparison
- Added/removed/modified detection
- Parameter-level change tracking
- Colored console output
- JSON diff export
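
Added/removed/modified detection with parameter-level tracking can be sketched by keying endpoints on method and path. The spec shape used here (`{ endpoints: [{ method, path, parameters: [{ name, type }] }] }`) and both function names are assumptions for illustration and may not match the file format diff-api-specs.js reads.

```javascript
// Hypothetical parameter diff: reports added, removed, and retyped parameters.
function diffParameters(oldParams, newParams) {
  const oldByName = new Map(oldParams.map((p) => [p.name, p]));
  const newByName = new Map(newParams.map((p) => [p.name, p]));
  const changes = [];
  for (const [name, param] of newByName) {
    const previous = oldByName.get(name);
    if (!previous) changes.push({ name, change: 'added' });
    else if (previous.type !== param.type) {
      changes.push({ name, change: 'type', from: previous.type, to: param.type });
    }
  }
  for (const name of oldByName.keys()) {
    if (!newByName.has(name)) changes.push({ name, change: 'removed' });
  }
  return changes;
}

// Hypothetical spec diff: endpoints are identified by "METHOD path".
function diffSpecs(oldSpec, newSpec) {
  const index = (spec) => new Map(spec.endpoints.map((e) => [`${e.method} ${e.path}`, e]));
  const before = index(oldSpec);
  const after = index(newSpec);
  const result = { added: [], removed: [], modified: [] };

  for (const [key, endpoint] of after) {
    if (!before.has(key)) {
      result.added.push(key);
      continue;
    }
    const changes = diffParameters(before.get(key).parameters ?? [], endpoint.parameters ?? []);
    if (changes.length > 0) result.modified.push({ endpoint: key, changes });
  }
  for (const key of before.keys()) {
    if (!after.has(key)) result.removed.push(key);
  }
  return result;
}
```

Because the result is plain data, `JSON.stringify(diffSpecs(a, b), null, 2)` is enough for a JSON export of the diff.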
Doc Generator (scripts/update/update-docs.js)
- Markdown generation from API spec
- Diff integration (badges for new/modified)
- Table of contents generation
- Getting Started section
- Parameter and schema tables
- Example code blocks
- Changelog section
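
Rendering one endpoint to markdown, with a badge driven by the diff result and a parameter table, might look like this. The badge text, the heading level, and the table columns are placeholders, not the format update-docs.js actually emits.

```javascript
// Escape characters that would break a markdown table cell.
const escapeCell = (text) => text.replace(/\|/g, '\\|').replace(/\n/g, ' ');

// Hypothetical renderer: turns one endpoint into a markdown section.
// `status` is 'added', 'modified', or null (no badge).
function renderEndpoint(endpoint, status = null) {
  const badge = status === 'added' ? ' **NEW**' : status === 'modified' ? ' **MODIFIED**' : '';
  const lines = [`### ${endpoint.method} ${endpoint.path}${badge}`, ''];
  if (endpoint.description) lines.push(endpoint.description, '');

  const params = endpoint.parameters ?? [];
  if (params.length > 0) {
    lines.push('| Name | Type | Description |', '| --- | --- | --- |');
    for (const p of params) {
      lines.push(`| ${p.name} | ${p.type} | ${escapeCell(p.description ?? '')} |`);
    }
    lines.push('');
  }
  return lines.join('\n');
}

// Made-up endpoint, for illustration only.
console.log(renderEndpoint(
  { method: 'GET', path: '/sites', description: 'List sites.', parameters: [{ name: 'limit', type: 'integer', description: 'Page size' }] },
  'added',
));
```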
scripts/
├── package.json                      # NPM dependencies and scripts
├── .env.example                      # Environment variable template
├── README.md                         # User documentation
├── scraper/
│   ├── scrape-api-docs.js            # Main entry point ✅
│   ├── auth/
│   │   └── unifi-login.js            # Authentication ✅
│   ├── extractors/
│   │   ├── navigation-extractor.js   # Navigation ✅
│   │   ├── endpoint-extractor.js     # Endpoints ✅
│   │   └── schema-extractor.js       # Schemas ✅
│   ├── parsers/
│   │   └── api-spec-parser.js        # Parser ✅
│   └── utils/
│       ├── wait-for-selectors.js     # Waiting utilities ✅
│       └── screenshot-debugger.js    # Debug helpers ✅
├── compare/
│   └── diff-api-specs.js             # Comparison tool ✅
└── update/
    └── update-docs.js                # Doc generator ✅
{
  "scrape": "node scraper/scrape-api-docs.js",
  "scrape:headed": "node scraper/scrape-api-docs.js --headed",
  "compare": "node compare/diff-api-specs.js",
  "update-docs": "node update/update-docs.js"
}

- puppeteer@^24.0.0 - Browser automation
- dotenv@^16.4.0 - Environment variable loading
- prompts@^2.4.2 - Interactive CLI prompts
- chalk@^5.3.0 - Terminal colors
- ora@^8.0.1 - Spinner animations

Credentials:
- Never stored in code
- Environment variables (.env)
- Interactive prompts as fallback
- Masked password input

Session Cookies:
- Stored in session-cookies.json (gitignored)
- Validation before reuse
- Automatic cleanup option

Gitignore Additions:
scripts/session-cookies.json
scripts/screenshots/
scripts/scraped-api-spec-*.json
scripts/api-diff-*.json
scripts/node_modules/
scripts/package-lock.json

- ✅ Multiple selector strategies with fallbacks
- ✅ Retry logic with exponential backoff
- ✅ Session cookie reuse
- ✅ 2FA support
- ✅ Error handling throughout
- ✅ Validation and completeness checking
- ✅ Headed/headless mode toggle
- ✅ Debug mode with screenshots
- ✅ DOM structure dumps
- ✅ Element highlighting
- ✅ Progress indicators
- ✅ Interactive credential prompts
- ✅ Version shorthand (e.g., 10.1.68 instead of full path)
- ✅ Colored console output
- ✅ Comprehensive documentation
- ✅ NPM scripts for common tasks
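
The version shorthand could be resolved as sketched below. The file name pattern `scraped-api-spec-<version>.json` is inferred from the gitignore entry `scripts/scraped-api-spec-*.json` and is an assumption, as is the function name `resolveSpecPath`.

```javascript
// Hypothetical resolver: a bare version such as "10.1.68" maps to the
// assumed spec file name; anything else is treated as an explicit path.
function resolveSpecPath(arg) {
  const isVersion = /^\d+\.\d+\.\d+$/.test(arg);
  return isVersion ? `scraped-api-spec-${arg}.json` : arg;
}

console.log(resolveSpecPath('10.1.68'));         // scraped-api-spec-10.1.68.json
console.log(resolveSpecPath('out/custom.json')); // out/custom.json
```
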
cd scripts
npm install
npm run scrape -- --version=10.1.68
npm run compare 10.0.160 10.1.68
npm run update-docs 10.1.68
node scraper/scrape-api-docs.js --headed --debug

The scraper is fully implemented but requires manual testing with actual UniFi credentials:

Authentication Flow
- Test with valid credentials
- Test with 2FA enabled
- Test session cookie reuse

Extraction Accuracy
- Verify navigation structure
- Validate endpoint details
- Check schema extraction

Comparison & Documentation
- Test diff generation
- Verify markdown output
- Check for broken links

When testing with real credentials:
# 1. Test authentication
npm run scrape:headed -- --clear-session
# 2. Test headless scraping
npm run scrape -- --version=10.1.68
# 3. Test comparison (requires v10.0.160 scraped first)
npm run compare 10.0.160 10.1.68
# 4. Test documentation generation
npm run update-docs 10.1.68
# 5. Validate output
cat docs/UNIFI_API.md

Portal Structure Dependency
- Selectors may break if UniFi portal UI changes
- Multiple fallback strategies implemented to mitigate

Sequential Processing
- Endpoints processed one at a time (rate limiting consideration)
- Can be slow for large API surfaces

Authentication Requirement
- Requires valid UniFi account with API doc access
- Cannot be fully automated without credentials

Early Access Version
- v10.1.68 is EA and subject to change
- Documentation may need frequent updates

✅ Criteria Met (Implementation Complete):
- Authentication module with session management
- Navigation structure extraction
- Endpoint detail extraction
- Schema and parameter extraction
- API specification normalization
- Version comparison tool
- Markdown documentation generator
- Comprehensive error handling
- Debug capabilities
- User documentation
⏸️ Pending (Requires Manual Testing):
- Successful authentication to portal
- Accurate extraction of v10.1.68 API
- Valid comparison with v10.0.160
- Correct markdown generation
- No broken links in output
Next steps:
- Manual testing with UniFi credentials
- Validate scraper output
- Review generated documentation
- Fix any selector issues discovered
Future enhancements:
- Parallel extraction with rate limiting
- OpenAPI/Swagger output format
- Automated CI/CD integration
- Scheduled documentation updates
- HTML diff visualization
- scripts/package.json
- scripts/.env.example
- scripts/README.md
- scripts/scraper/scrape-api-docs.js
- scripts/scraper/auth/unifi-login.js
- scripts/scraper/extractors/navigation-extractor.js
- scripts/scraper/extractors/endpoint-extractor.js
- scripts/scraper/extractors/schema-extractor.js
- scripts/scraper/parsers/api-spec-parser.js
- scripts/scraper/utils/wait-for-selectors.js
- scripts/scraper/utils/screenshot-debugger.js
- scripts/compare/diff-api-specs.js
- scripts/update/update-docs.js
- SCRAPER_IMPLEMENTATION.md (this file)
- .gitignore (added scraper-specific ignores)
The UniFi API documentation scraper is fully implemented, and all components follow the plan specifications. It has not yet been run against the live portal, so the next step is manual validation with actual UniFi credentials.
The implementation demonstrates:
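- Credential handling that keeps secrets out of code (environment variables, masked prompts, gitignored session cookies)
- Extraction with multiple fallback strategies and retries
- Debugging support through screenshots, HTML and DOM dumps, and element highlighting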
- User-friendly CLI interface
- Comprehensive documentation for maintenance
Once manual testing confirms functionality, the scraper can be used to extract v10.1.68 API documentation and update the project docs accordingly.