Edit Banana Logo

🍌 Edit Banana

中文 | English

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram's details and logical relationships.

Python License CUDA WeChat GitHub stars


Try It Now!

Try Online Demo

👆 Click above or https://www.editbanana.net/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.

Warning

Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.


💬 Join WeChat Group

Welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

Tip

If the QR code has expired, please submit an Issue to request an updated one.



📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the high-fidelity conversion, we provide one-to-one comparisons between the original static formats and the editable reconstruction results across four typical scenarios. All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO

🔒 Original Static Diagram (Input · Non-editable) | 🔓 DrawIO Reconstruction Result (Output · Fully Editable)

Example 1: Basic Flowchart

Original Diagram 1

✨ Editable Flowchart

Reconstruction Result 1

Example 2: Multi-level Architecture

Original Diagram 2

✨ Editable Architecture

Reconstruction Result 2

Example 3: Technical Schematic

Original Diagram 3

✨ Editable Schematic

Reconstruction Result 3

Example 4: Scientific Formula

Original Diagram 4

✨ Editable Formula

Reconstruction Result 4

Scenario 2: Human-in-the-Loop Modification



✨ Manual repair




✨ Save locally

Note

✨ Conversion Highlights:

  1. Preserves the layout logic, color matching, and element hierarchy of the original diagram.
  2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
  3. Accurate text recognition, supporting direct subsequent editing and format adjustment.
  4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization (see the XML fragment below).
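To make "fully editable" concrete, here is a minimal hand-written DrawIO fragment of the kind such a reconstruction produces (illustrative, not actual tool output): two shapes and a connecting arrow, each an independently selectable mxCell with its own style and geometry.

```xml
<mxGraphModel dx="800" dy="600" grid="1">
  <root>
    <mxCell id="0" />
    <mxCell id="1" parent="0" />
    <!-- A rounded "Start" node: fill, stroke, and position are all editable -->
    <mxCell id="2" value="Start" style="rounded=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
      <mxGeometry x="40" y="40" width="120" height="40" as="geometry" />
    </mxCell>
    <mxCell id="3" value="Process" style="rounded=0;" vertex="1" parent="1">
      <mxGeometry x="40" y="140" width="120" height="40" as="geometry" />
    </mxCell>
    <!-- A block arrow connecting Start to Process -->
    <mxCell id="4" style="edgeStyle=orthogonalEdgeStyle;endArrow=block;" edge="1" parent="1" source="2" target="3">
      <mxGeometry relative="1" as="geometry" />
    </mxCell>
  </root>
</mxGraphModel>
```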

🚀 Key Features

  • Advanced Segmentation: Uses our fine-tuned SAM 3 (Segment Anything Model 3) to segment diagram elements.

  • Fixed Multi-Round VLM Scanning: An extraction process that runs a fixed number of scanning rounds, guided by multimodal LLMs.

  • Text Recognition:

    • Local OCR for text localization; easy to install, runs offline.
    • Pix2Text for mathematical formula recognition and LaTeX conversion.
    • Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to the formula engine.
  • User System:

    • Registration: New users receive 10 free credits.
    • Credit System: Pay-per-use model prevents resource abuse.
    • Multi-User Concurrency: Built-in support for concurrent user sessions; a global lock serializes GPU access for thread safety, while an LRU (Least Recently Used) cache persists image embeddings across requests for high performance and stability (see the sketch below).
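A minimal sketch of how a global lock plus an LRU embedding cache can work together (illustrative only; names like run_encoder_on_gpu are hypothetical, not the repository's actual API):

```python
import threading
from collections import OrderedDict

class EmbeddingCache:
    """LRU cache keyed by image hash; evicts the least recently used embedding."""

    def __init__(self, max_items: int = 32):
        self._items = OrderedDict()
        self._max_items = max_items
        self._lock = threading.Lock()

    def get(self, key: str):
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as recently used
                return self._items[key]
            return None

    def put(self, key: str, embedding) -> None:
        with self._lock:
            self._items[key] = embedding
            self._items.move_to_end(key)
            if len(self._items) > self._max_items:
                self._items.popitem(last=False)  # evict the oldest entry

# A single global lock serializes GPU access across concurrent requests.
GPU_LOCK = threading.Lock()
CACHE = EmbeddingCache()

def embed_image(image_id: str, image):
    """Return a cached embedding, or compute one while holding the GPU lock."""
    cached = CACHE.get(image_id)
    if cached is not None:
        return cached
    with GPU_LOCK:  # thread-safe GPU access
        embedding = run_encoder_on_gpu(image)  # hypothetical model call
    CACHE.put(image_id, embedding)
    return embedding
```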

🛠️ Architecture Pipeline

  1. Input: Image (PNG/JPG/BMP/TIFF/WebP).
  2. Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
  3. Text Extraction (Parallel):
    • Local OCR (Tesseract) detects text bounding boxes.
    • High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
  4. DrawIO XML Generation: Merging spatial data from SAM3 with the OCR text results (see the sketch below).
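The flow can be pictured as a short script; every function below is an illustrative placeholder rather than the project's actual module API:

```python
from pathlib import Path

def convert(image_path: str, out_dir: str = "output") -> Path:
    """Illustrative end-to-end flow: segment, read text, merge into DrawIO XML."""
    image = load_image(image_path)            # hypothetical loader (PNG/JPG/BMP/TIFF/WebP)

    # 1) Segmentation: the fine-tuned SAM3 mask decoder proposes shape masks.
    shapes = sam3_segment(image)              # hypothetical call

    # 2) Text extraction (parallel in the real pipeline):
    boxes = tesseract_detect(image)           # hypothetical: text bounding boxes via OCR
    latex = [pix2text_recognize(crop(image, box))  # hypothetical: high-res crops to Pix2Text
             for box in boxes]

    # 3) Merge SAM3 spatial data with the OCR results into DrawIO XML.
    xml = build_drawio_xml(shapes, boxes, latex)   # hypothetical merger
    out = Path(out_dir) / (Path(image_path).stem + ".xml")
    out.write_text(xml, encoding="utf-8")
    return out
```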

📂 Project Structure

Click to expand project structure
Edit-Banana/
├── config/               # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/       # OCR & Text Extraction Module (standalone entry)
│   ├── src/
│   └── main.py             # OCR-only entry point
├── input/                # [Manual] Input images directory
├── models/               # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/               # [Manual] Results directory
├── sam3/                 # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/         # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/              # Setup and utility scripts
│   ├── setup_sam3.sh       # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py       # Download RMBG model from ModelScope
│   └── merge_xml.py        # XML merge utilities
├── main.py               # CLI entry (modular pipeline)
├── server_pa.py          # FastAPI backend server
└── requirements.txt      # Python dependencies

📦 Installation & Setup

Follow these core phases to set up the project locally.

Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

1. Prerequisites & Environment

  • Python 3.10+ & CUDA-capable GPU (highly recommended)

  • Install PyTorch with CUDA support (e.g., for CUDA 11.8):

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

2. Clone Repository & Init Directories

git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output

Phase 2: Models & Core Dependencies

Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).

1. Base Dependencies

pip install -r requirements.txt

2. SAM3 & Model Assets

  • SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the lib and copy the BPE vocab to models/. Verify with:

    python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
  • SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.

  • Text Local OCR (Tesseract):

    sudo apt install tesseract-ocr tesseract-ocr-chi-sim
🧩 Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand
  • PaddleOCR (alternative, better for mixed text): Use paddlepaddle==3.2.2 (to avoid a known bug in 3.3.0).

    pip install paddlepaddle==3.2.2 paddleocr
  • Formula (Pix2Text):

    pip install pix2text onnxruntime-gpu
  • Background Removal (RMBG): pip install onnxruntime modelscope, then run python scripts/setup_rmbg.py.

Phase 3: Configuration & Troubleshooting

1. Final Configuration

Copy the example config and adjust the asset paths:

cp config/config.yaml.example config/config.yaml

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
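For example, if the weights and vocab live under models/, the relevant entries might look like this (values are illustrative; use whatever paths your setup produced):

```yaml
sam3:
  checkpoint_path: models/sam3_ms/sam3.pt   # downloaded from ModelScope / Hugging Face
  bpe_path: models/bpe_vocab.txt.gz         # copied in by scripts/setup_sam3.sh (name may differ)
```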

🛠️ Before First Run Checklist & Troubleshooting - Click to expand

Checklist:

  • Config files copied and model paths set in config.yaml
  • SAM3 weights (sam3.pt) and BPE vocab placed under models/
  • SAM3 library installed via scripts/setup_sam3.sh
  • Tesseract or PaddleOCR installed

Common Issues:

  • "no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set sam3.device: "cpu".
  • "Model file not found at ...rmbg/...": RMBG is optional. Enable by downloading via script.
  • "PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fallback to Tesseract.

🔤 Usage

Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.

Run and test locally

  1. One-time setup

    git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
    python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
    pip install -r requirements.txt
    sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)

    Install the SAM3 library and download model weights + BPE. Then:

    mkdir -p input output
    cp config/config.yaml.example config/config.yaml
    # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
  2. Test with CLI

    # Put a diagram image in input/, e.g. input/test.png
    python main.py -i input/test.png
    # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
  3. Optional: test the web API

    python server_pa.py
    # In another terminal:
    curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
    # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload

⚙️ Configuration

Customize the pipeline behavior in config/config.yaml (an illustrative excerpt follows this list):

  • sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.

  • paths: Set input/output directories.

  • dominant_color: Fine-tune color extraction sensitivity.
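As a rough illustration of how these groups fit together (key names below are hypothetical; treat config/config.yaml.example as the authoritative reference):

```yaml
sam3:
  score_threshold: 0.5     # hypothetical: minimum mask confidence to keep
  nms_threshold: 0.7       # hypothetical: overlap threshold for NMS
  max_iterations: 3        # hypothetical: cap on multi-round scanning loops
paths:
  input_dir: input
  output_dir: output
dominant_color:
  sensitivity: 0.8         # hypothetical: color extraction tolerance
```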


📌 Development Roadmap

| Feature Module | Status | Description |
| --- | --- | --- |
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction, and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📍 Planned | Support custom template import |
| Batch Export Optimization | 📍 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📍 Planned | Support local VLM deployment, independent of APIs |

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/xxx)
  3. Commit your changes (git commit -m 'feat: add xxx')
  4. Push to the branch (git push origin feature/xxx)
  5. Open a Pull Request

  • Bug Reports: Issues
  • Feature Suggestions: Discussions


📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).


🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star History Chart