NOTE: Generated by AI, but it should help you understand what's going on if you wish to contribute or are curious. As always, reading the actual code or emailing me is the way to be sure =)

Architecture Document: to-userscript

This document provides a detailed overview of the architecture for the to-userscript converter. It is intended for developers looking to understand the project's structure, data flow, and core design principles.

1. High-Level Overview

The to-userscript project is a Node.js command-line tool designed to convert a standard WebExtension into a single, self-contained userscript (.user.js) or a vanilla JavaScript file. Its primary goal is to emulate the WebExtension environment (APIs, resources, execution lifecycle) within the constraints of a userscript engine like Tampermonkey or Greasemonkey.

The tool achieves this by:

  1. Parsing the extension's manifest.json.
  2. Reading all specified JavaScript and CSS files.
  3. Inlining all local assets (images, fonts, HTML, CSS) as Data URLs to make the script self-contained.
  4. Polyfilling common WebExtension APIs (chrome.storage, chrome.runtime, chrome.i18n, etc.).
  5. Orchestrating the execution of content scripts and injection of styles according to the manifest's run_at rules.
  6. Generating a single output file with a proper userscript metadata block.

2. Core Concepts & Design Patterns

The project is built around several key architectural concepts:

2.1. Abstraction Layer (Adapter Pattern)

The most critical part of the polyfill is the abstraction layer. It provides a common internal interface for core functionalities that have different implementations depending on the target environment.

  • Interface: _storageSet, _storageGet, _fetch, _openTab, etc.
  • Implementations:
    • userscript target: Maps to the userscript manager's GM_* functions (GM_setValue, GM_xmlhttpRequest, etc.). This is defined in templates/abstractionLayer.userscript.template.js.
    • vanilla target: Maps to browser-native APIs like IndexedDB for storage and fetch. This is defined in templates/abstractionLayer.vanilla.template.js.
    • postmessage target: For code running inside an iframe (like an options or popup page). It forwards all API calls to the parent window via postMessage. This is defined in templates/abstractionLayer.postmessage.template.js.
    • handle_postmessage: The counterpart to the above, it runs in the main userscript context to listen for and handle API requests from iframes.

This design decouples the WebExtension API polyfill from the underlying execution environment, making the system extensible.
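
The adapter idea can be illustrated with a minimal sketch. The names _storageSet and _storageGet come from the interface listed above; createMemoryLayer and createStorageArea are hypothetical helpers invented for this example, and the in-memory Map only stands in for a real backend such as GM_setValue or IndexedDB.

```javascript
// Hypothetical backend: an in-memory stand-in for the userscript (GM_*) or
// vanilla (IndexedDB) implementations. Only the interface shape matters here.
function createMemoryLayer() {
  const store = new Map();
  return {
    async _storageSet(items) {
      for (const [key, value] of Object.entries(items)) store.set(key, value);
    },
    async _storageGet(keys) {
      const result = {};
      for (const key of keys) {
        if (store.has(key)) result[key] = store.get(key);
      }
      return result;
    },
  };
}

// Polyfill side: depends only on the interface, never on a concrete backend.
function createStorageArea(layer) {
  return {
    set: (items) => layer._storageSet(items),
    get: (keys) => layer._storageGet(Array.isArray(keys) ? keys : [keys]),
  };
}

// Swapping the backend means passing a different layer object.
const storageLocal = createStorageArea(createMemoryLayer());
```

Because createStorageArea never touches the backend directly, a new target would only need to supply another object with the same two methods.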

2.2. WebExtension API Polyfill

The tool constructs a chrome and browser object that mimics the real WebExtension APIs.

  • Source: templates/polyfill.template.js
  • Functionality: It provides stubs and working implementations for APIs like runtime, storage, i18n, tabs, contextMenus, and notifications.
  • Dependency: It relies on the Abstraction Layer to perform its tasks. For example, chrome.storage.local.set() calls _storageSet().
  • Context Isolation: The polyfill uses a with block and proxies to create a sandboxed global scope for the extension's scripts. This ensures that window, chrome, etc., refer to the polyfilled versions, minimizing conflicts with the host page.
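
A rough sketch of the with-plus-Proxy technique follows. It is not the project's actual code: createSandbox and runInSandbox are hypothetical names, and the real polyfill may intercept more operations than this.

```javascript
// Build a scope object whose identifier lookups are intercepted by a Proxy.
function createSandbox(overrides) {
  return new Proxy(overrides, {
    // Claim every identifier so lookups inside `with` always hit the proxy.
    has: () => true,
    // Resolve overrides first, then fall back to the real global object.
    get: (target, key) => {
      if (key === Symbol.unscopables) return undefined;
      return key in target ? target[key] : globalThis[key];
    },
  });
}

// `with` is forbidden in strict mode, so the wrapper is compiled with
// `new Function`, whose body is non-strict unless it opts in itself.
function runInSandbox(code, sandbox) {
  const runner = new Function("sandbox", `with (sandbox) { ${code} }`);
  return runner(sandbox);
}

const fakeChrome = { runtime: { id: "polyfilled-extension" } };
const sandbox = createSandbox({ chrome: fakeChrome });

// The script sees the polyfilled `chrome`, not whatever the page defines.
const seenId = runInSandbox("return chrome.runtime.id;", sandbox);
```

Since the has trap answers true for every name, undeclared assignments made by the sandboxed code land on the overrides object rather than on the page's global scope.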

2.3. Asset Inlining and Management

A key feature is making the script self-contained. assetsGenerator.js is the engine for this.

  1. Recursive Processing: It starts with top-level files (like options/popup HTML) and recursively scans them for asset references (src, href, url()).
  2. Asset Conversion:
    • Binary assets (images, fonts) are read and converted to Base64 Data URLs.
    • Text assets (CSS, HTML) are read, and their contents are also recursively scanned for more assets before being inlined.
  3. Asset Map: All processed assets are stored in a large JavaScript object EXTENSION_ASSETS_MAP, which is injected into the final script.
  4. runtime.getURL Polyfill: The polyfilled chrome.runtime.getURL function does not return a relative path. Instead, it looks up the requested path in the EXTENSION_ASSETS_MAP and generates a blob: or data: URL from the in-memory content. This allows the extension's code to access its resources as if they were files.
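
The two halves of this process, build-time conversion and run-time lookup, can be sketched as below. The entry shape ({ mime, base64 }), the helper names toAssetEntry and createGetURL, the small MIME table, and the null result for unknown paths are all assumptions made for illustration; the real EXTENSION_ASSETS_MAP may be structured differently.

```javascript
// Hypothetical MIME table; a real implementation would cover more types.
const MIME_BY_EXTENSION = {
  png: "image/png",
  css: "text/css",
  html: "text/html",
  woff2: "font/woff2",
};

// Build time (Node.js): turn raw file bytes into a Base64 map entry.
function toAssetEntry(path, bytes) {
  const extension = path.split(".").pop().toLowerCase();
  return {
    mime: MIME_BY_EXTENSION[extension] || "application/octet-stream",
    base64: Buffer.from(bytes).toString("base64"),
  };
}

// Run time: resolve an extension-relative path to a data: URL from the map.
function createGetURL(assetsMap) {
  return function getURL(path) {
    const key = path.replace(/^\/+/, ""); // tolerate a leading slash
    const entry = assetsMap[key];
    if (!entry) return null; // assumption: unknown paths yield null
    return `data:${entry.mime};base64,${entry.base64}`;
  };
}

const assetsMap = {
  "icons/icon.png": toAssetEntry("icons/icon.png", [0x89, 0x50, 0x4e, 0x47]),
};
const getURL = createGetURL(assetsMap);
```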

2.4. Execution Orchestration

The generated userscript doesn't just dump all the code into the page. It follows the execution logic defined in the manifest.

  • Source: templates/orchestration.template.js and scriptAssembler.js.
  • Lifecycle: The orchestration logic is the main function of the generated script.
  • Matching: It first checks if the current page URL matches any of the content_scripts patterns from the manifest.
  • Phased Execution: If there's a match, it executes code in the order defined by run_at:
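    1. document-start (document_start in the manifest)
    2. document-end (document_end in the manifest)
    3. document-idle (document_idle in the manifest, the default)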
  • Assembly: scriptAssembler.js is responsible for taking all the individual script contents and generating a single executeAllScripts function string, which neatly orders the code and CSS injections for each phase.
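
The matching and phasing steps can be approximated with the sketch below. Both functions are simplified, hypothetical stand-ins: matchPatternToRegExp handles only the common match-pattern forms, and planExecution returns a plain data structure rather than the executeAllScripts function string that scriptAssembler.js generates. Normalizing the manifest's underscore spelling to hyphens is an assumption of this example.

```javascript
// Simplified match-pattern conversion (scheme://host/path with wildcards).
function matchPatternToRegExp(pattern) {
  if (pattern === "<all_urls>") return /^(?:https?|file|ftp):\/\/.*/;
  const parts = /^(\*|https?|file|ftp):\/\/(\*|\*\.[^/*]+|[^/*]*)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`Unsupported match pattern: ${pattern}`);
  const [, scheme, host, path] = parts;
  const escape = (text) => text.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const schemeRe = scheme === "*" ? "https?" : scheme;
  const hostRe =
    host === "*" ? "[^/]*"
    : host.startsWith("*.") ? `(?:[^/]+\\.)?${escape(host.slice(2))}`
    : escape(host);
  const pathRe = path.split("*").map(escape).join(".*");
  return new RegExp(`^${schemeRe}://${hostRe}${pathRe}$`);
}

const PHASES = ["document-start", "document-end", "document-idle"];

// Group the files of every matching content_scripts entry by phase.
function planExecution(contentScripts, url) {
  const plan = new Map(PHASES.map((phase) => [phase, { css: [], js: [] }]));
  for (const entry of contentScripts) {
    const applies = entry.matches.some((p) => matchPatternToRegExp(p).test(url));
    if (!applies) continue;
    // The manifest default is document_idle; normalize "_" to "-".
    const phase = (entry.run_at || "document_idle").replace(/_/g, "-");
    plan.get(phase).css.push(...(entry.css || []));
    plan.get(phase).js.push(...(entry.js || []));
  }
  return PHASES.map((phase) => ({ phase, ...plan.get(phase) }));
}
```

A caller would walk the returned array in order, injecting each phase's CSS and running its JS once the document reaches the corresponding state.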

2.5. Inter-Context Communication (Message Bus)

Since UI pages (options, popup) are rendered in sandboxed iframes, a message bus is required to emulate chrome.runtime.sendMessage and other cross-context communication.

  • Source: templates/messaging.template.js
  • Mechanism: It uses window.postMessage to send events between the main userscript context and any iframe contexts.
  • createEventBus: Sets up the low-level on/emit listeners.
  • createRuntime: Builds a chrome.runtime-like object on top of the event bus, handling request/response logic for sendMessage.
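
The layering of createRuntime on top of createEventBus can be sketched as follows. This is an illustration under stated assumptions, not the template's code: createLoopbackTransport is a hypothetical synchronous stand-in for window.postMessage so the example runs outside a browser, and the contextName field on the sender object is invented for this sketch.

```javascript
// Stand-in for window.postMessage: delivers each message to all subscribers.
function createLoopbackTransport() {
  const listeners = new Set();
  return {
    post: (message) => listeners.forEach((listener) => listener(message)),
    subscribe: (listener) => listeners.add(listener),
  };
}

// Low-level on/emit layer.
function createEventBus(transport) {
  const handlers = new Map();
  transport.subscribe(({ type, payload }) => {
    (handlers.get(type) || []).forEach((handler) => handler(payload));
  });
  return {
    on: (type, handler) => handlers.set(type, [...(handlers.get(type) || []), handler]),
    emit: (type, payload) => transport.post({ type, payload }),
  };
}

// chrome.runtime-like layer: pairs each request with its response by id.
function createRuntime(bus, contextName) {
  let nextId = 0;
  const pending = new Map();
  const listeners = [];
  bus.on("request", ({ id, from, message }) => {
    const sendResponse = (response) => bus.emit("response", { id, response });
    listeners.forEach((listener) => listener(message, { contextName: from }, sendResponse));
  });
  bus.on("response", ({ id, response }) => {
    const resolve = pending.get(id);
    if (!resolve) return; // response belongs to another context
    pending.delete(id);
    resolve(response);
  });
  return {
    onMessage: { addListener: (listener) => listeners.push(listener) },
    sendMessage: (message) =>
      new Promise((resolve) => {
        const id = `${contextName}:${nextId++}`;
        pending.set(id, resolve);
        bus.emit("request", { id, from: contextName, message });
      }),
  };
}

const transport = createLoopbackTransport();
const background = createRuntime(createEventBus(transport), "background");
const popup = createRuntime(createEventBus(transport), "popup");
background.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === "ping") sendResponse({ type: "pong", to: sender.contextName });
});
```

Prefixing request ids with the context name keeps two contexts from resolving each other's pending calls when they share one channel.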

3. Architectural Flow & Module Breakdown

The project has two main workflows, driven by the CLI commands convert and download.

3.1. convert Workflow

This is the primary workflow for converting an extension.

cli/workflow.js -> run() acts as the main conductor.

  1. Source Analysis (determineSourceType): Determines if the source is a URL, local archive (.crx, .xpi, .zip), or a directory.
  2. Preparation (Download/Unpack):
    • If the source is a URL (e.g., Chrome/Firefox store), cli/download.js is used to fetch the extension archive. It contains specific logic (getCrxUrl, getFirefoxAddonUrl) to find the direct download link.
    • If the source is an archive, cli/unpack.js is used to extract its contents into a temporary directory. It can handle .zip, .xpi, and CRXv2/v3 formats.
  3. Core Conversion (convert.js -> convertExtension()): This is the pure, library-level conversion function.
    • Manifest Parsing (manifestParser.js): Reads manifest.json, applies localization from _locales using locales.js, and normalizes the structure.
    • Resource Processing (resourceProcessor.js): Reads all JS and CSS files listed in content_scripts and background into memory maps, keyed by their relative paths.
    • Output Building (outputBuilder.js): This is the assembler. It orchestrates the creation of the final script.
      • It initializes assetsGenerator.js to process all assets and create the EXTENSION_ASSETS_MAP.
      • It calls metadataGenerator.js to create the // ==UserScript== block, including resolving the best icon with getIcon.js.
      • It calls buildPolyfillString.js which combines the abstraction layer and polyfill templates.
      • It calls scriptAssembler.js to create the ordered script execution logic.
      • It injects all these generated parts into the master orchestration.template.js.
  4. Post-processing (cli/minify.js): Optionally, the final script is minified with terser or beautified with prettier.
  5. File Output: The final string is written to the specified output file.
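
As an illustration of the metadata step in the list above, the following sketch derives a // ==UserScript== block from a parsed manifest. buildMetadataBlock is a hypothetical function, not the API of metadataGenerator.js, and the choice of @run-at document-start (so that the script can schedule later phases itself) is an assumption of this example.

```javascript
// Derive a userscript metadata block from manifest fields (hypothetical).
function buildMetadataBlock(manifest, grants = []) {
  // Collect each match pattern once, across all content_scripts entries.
  const matches = new Set(
    (manifest.content_scripts || []).flatMap((entry) => entry.matches || [])
  );
  const rows = [
    ["name", manifest.name],
    ["version", manifest.version],
    ["description", manifest.description],
    ...[...matches].map((pattern) => ["match", pattern]),
    ...(grants.length ? grants : ["none"]).map((grant) => ["grant", grant]),
    ["run-at", "document-start"], // assumption, see the note above
  ].filter(([, value]) => value); // drop fields the manifest does not set

  // Pad keys so the values line up in a column.
  const width = Math.max(...rows.map(([key]) => key.length));
  const lines = rows.map(([key, value]) => `// @${key.padEnd(width)}  ${value}`);
  return ["// ==UserScript==", ...lines, "// ==/UserScript=="].join("\n");
}

const metadataBlock = buildMetadataBlock(
  {
    name: "Demo Extension",
    version: "1.0.0",
    content_scripts: [{ matches: ["https://example.com/*"] }],
  },
  ["GM_setValue", "GM_getValue"]
);
```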

3.2. File-by-File Module Responsibilities

src/cli/ - Command-Line Interface Layer

  • index.js: The CLI entry point. Uses yargs to define commands (convert, download, require) and their options. Delegates execution to workflow.js.
  • workflow.js: The high-level orchestrator for CLI commands. It manages temporary directories, spinners, and the step-by-step flow of downloading, unpacking, and converting. It separates CLI concerns from the core conversion logic.
  • download.js: Handles downloading files from URLs. It includes a progress bar and logic to determine the downloadable URL from store pages.
  • downloadExt.js: A helper specifically for constructing the direct download URL for a Chrome Web Store extension.
  • unpack.js: Extracts extension archives (.crx, .xpi, .zip) using yauzl. Contains logic to handle the CRX header.
  • minify.js: A wrapper around terser and prettier to provide minification and beautification, correctly preserving the userscript metadata block.
  • require.js: Logic for the require command, which generates a metadata block that @requires another userscript.
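
To make the CRX header handling mentioned for unpack.js more concrete, here is a sketch of locating the embedded ZIP payload. findZipOffset is a hypothetical helper, not the module's actual function; the offsets follow the CRX2 and CRX3 container layouts (a "Cr24" magic, a little-endian version, then length fields).

```javascript
// Return the byte offset at which the ZIP archive starts inside the buffer.
function findZipOffset(buffer) {
  // No CRX magic: treat the file as a plain .zip or .xpi archive.
  if (buffer.toString("latin1", 0, 4) !== "Cr24") return 0;

  const version = buffer.readUInt32LE(4);
  if (version === 2) {
    // CRX2: magic, version, public key length, signature length, key, signature.
    const publicKeyLength = buffer.readUInt32LE(8);
    const signatureLength = buffer.readUInt32LE(12);
    return 16 + publicKeyLength + signatureLength;
  }
  if (version === 3) {
    // CRX3: magic, version, header length, then the header itself.
    const headerLength = buffer.readUInt32LE(8);
    return 12 + headerLength;
  }
  throw new Error(`Unsupported CRX version: ${version}`);
}

// Usage: strip the header, then hand the remainder to a ZIP reader.
function stripCrxHeader(buffer) {
  return buffer.subarray(findZipOffset(buffer));
}
```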

src/ - Core Logic Layer

  • convert.js: A high-level library function that encapsulates the entire conversion process. It's the main entry point for using the converter programmatically.
  • manifestParser.js: Responsible for reading, parsing, and normalizing manifest.json. It integrates with locales.js to provide localized names and descriptions.
  • resourceProcessor.js: Reads the content of all JS and CSS files specified in the manifest.
  • assetsGenerator.js: The asset inlining engine. Recursively finds all referenced assets and converts them to Data URLs so that they are self-contained within the script.
  • scriptAssembler.js: Organizes the JS and CSS from content scripts into an executeAllScripts function, respecting the run_at order.
  • outputBuilder.js: The master assembler. It takes the output from all other core modules and uses templates to build the final script string.
  • buildPolyfillString.js: Specifically responsible for constructing the complete polyfill code by combining the messaging, abstraction layer, and assets helper templates.
  • abstractionLayer.js: Selects the correct abstraction layer code based on the target and determines the necessary @grant permissions for userscripts.
  • locales.js: Handles loading _locales/ directories and replacing __MSG_...__ placeholders.
  • getIcon.js: Finds the most appropriate icon from the manifest and converts it to a Data URL.
  • templateManager.js: A simple manager to read and cache the .template.js files.
  • utils.js: A collection of utility functions used across the project (e.g., normalizePath, convertMatchPatternToRegExp).
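
The placeholder replacement attributed to locales.js above can be illustrated with a short sketch. The localize function is hypothetical; it assumes the standard _locales/<lang>/messages.json shape, where each key maps to an object with a message field, and it leaves unknown placeholders untouched, which is a choice made for this example only.

```javascript
// Replace __MSG_name__ placeholders using a parsed messages.json object.
function localize(text, messages) {
  return text.replace(/__MSG_([A-Za-z0-9_@]+)__/g, (placeholder, name) => {
    // Message names are matched case-insensitively.
    const key = Object.keys(messages).find(
      (candidate) => candidate.toLowerCase() === name.toLowerCase()
    );
    return key ? messages[key].message : placeholder;
  });
}

const messages = {
  extName: { message: "Demo Extension" },
  extDescription: { message: "Does demo things." },
};

const localizedName = localize("__MSG_extName__", messages);
const localizedLine = localize("__MSG_extName__: __MSG_extDescription__", messages);
```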

src/templates/ - Generated Code Blueprint Layer

These files are not executed by the tool itself; they are the source code for the generated userscript.

  • orchestration.template.js: The main runtime logic of the final script. It contains the logic to check URL matches, trigger phased execution, and handle UI (popup/options modals).
  • polyfill.template.js: The core chrome.* API polyfill.
  • abstractionLayer.*.template.js: The different backends for the polyfill (Greasemonkey, Vanilla JS, PostMessage).
  • messaging.template.js: The postMessage-based event bus for communication between the main script and iframes.
  • trustedTypes.template.js: A small script injected via @require to bypass Trusted Types security policies on some websites.

4. The Generated Userscript Architecture

The final output file has its own internal architecture, composed from the templates:

// ==UserScript==
// ... Metadata Block ...
// ==/UserScript==

(function() { // IIFE for scope isolation
    'use strict';

    // 1. UNIFIED POLYFILL is defined here
    //    - messaging.template.js -> createEventBus, createRuntime
    //    - abstractionLayer.*.template.js -> _storageSet, _fetch, etc.
    //    - assetsGenerator code -> EXTENSION_ASSETS_MAP, _createAssetUrl
    //    - polyfill.template.js -> buildPolyfill() which creates chrome.*

    // 2. BACKGROUND SCRIPT ENVIRONMENT is defined and executed
    //    - Runs all background scripts inside the polyfill's scope.
    //    - This happens immediately on script start.

    // 3. ORCHESTRATION LOGIC is defined and executed
    //    - Checks if location.href matches a content_script pattern.
    //    - If it matches:
    //        - Calls `executeAllScripts()`.
    //        - This function injects CSS and runs JS in phases:
    //          - document-start
    //          - document-end
    //          - document-idle
    //    - Registers GM_registerMenuCommand for options/popup pages.
    //    - Options/Popup pages are rendered in a modal with an iframe.
    //    - The iframe's content is populated with the inlined HTML and
    //      a specialized 'postmessage' version of the polyfill.
})();