Step 3 creates masks/ from the images/ kept in Step 2. These masks exclude regions you do not want SfM or 3DGS to use, such as people, the camera operator, tripods, hands, vehicles, sky, stitch seams, and blown-out highlights.
Mask polarity is white = use, black = exclude. Creating masks before Metashape, COLMAP, SphereSfM, or another SfM route helps avoid unwanted features from people, sky, or other unstable regions.
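The polarity rule can be illustrated with a minimal sketch. It is not how any SfM tool reads masks internally; `keep_features`, the nested-list mask layout, and the 128 threshold are all assumptions made for the example.

```python
# Hypothetical illustration of "white = use, black = exclude".
# A mask is modeled as rows of grayscale values (0 = black, 255 = white).

def keep_features(keypoints, mask, threshold=128):
    """Keep only (x, y) keypoints that land on white (usable) mask pixels.

    The threshold of 128 is an assumption for this sketch.
    """
    kept = []
    for x, y in keypoints:
        if mask[y][x] >= threshold:  # white pixel: the feature may be used
            kept.append((x, y))
    return kept


mask = [
    [255, 0],
    [0, 255],
]
usable = keep_features([(0, 0), (1, 0), (1, 1)], mask)
```

Here the keypoint at `(1, 0)` sits on a black pixel and is dropped, while the other two survive.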
| Goal | Recommended settings |
|---|---|
| Mask people or the camera operator in 360° frames | Confirm Image Type: 360°, then use Model: YOLO/SAM2.1, Quality: High |
| Get higher-accuracy person or sky masks | Model: SAM3.1 |
| Try sky masks without SAM3.1 | Model: Mask2Former, target sky |
| Process normal photos or normal video frames | Add or copy them into the scene and confirm Image Type: Normal |
| Process only the source you just added | Choose that video or still folder in Source |
| Keep existing masks untouched | Choose Scope: Images without masks only |
| Rebuild only masks affected by changed settings | Choose Scope: Missing + changed settings |
| Exclude visible nearby stitch seams | Stitch ON |
| Exclude blown-out windows or lights | Overexp ON |
| Apply your own fixed mask to every image | Custom ON |
When unsure, start with YOLO/SAM2.1 + High + person for 360° images, then regenerate only problem images with higher quality or a different model.
- Confirm that `Images Folder` points to the scene `images/`.
- Check `Source`. Leave it on `All` for normal runs. Choose a specific video or still folder when you only want to process newly added material.
- Choose `Scope`. Use `Images without masks only` to preserve existing masks, `Missing + changed settings` to update only images affected by setting changes, and `Regenerate all images` only when you intentionally want a full rebuild.
- Confirm the `Image Type` status. Step 3 detects it from Step 1 records, external image registration, or image headers, and splits mixed inputs by image type automatically.
- Choose the mask `Model` and detection targets.
- Enable `Stitch`, `Overexp`, or `Custom` only when needed.
- Run `Mask Preview` on one image and confirm that the regions to be masked are shown with a red overlay.
- If the preview looks reasonable, press `Generate` for the selected range.
- Scan the thumbnail list for missed regions or false positives.
- Select only problem images, adjust settings, and run `Regenerate Mask`.
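The three Scope choices can be sketched as a selection rule over the image list. This is only an illustration: how the app actually records which settings produced a mask is not documented here, so `select_images`, `settings_used`, and the dictionary-based settings are hypothetical.

```python
# Hypothetical sketch of the three Scope options.
# settings_used maps an image name to the settings its saved mask was built with.

def select_images(images, existing_masks, settings_used, current_settings, scope):
    """Return the image names that a run would process under the given scope."""
    if scope == "Regenerate all images":
        return list(images)
    if scope == "Images without masks only":
        return [name for name in images if name not in existing_masks]
    if scope == "Missing + changed settings":
        return [
            name
            for name in images
            if name not in existing_masks
            or settings_used.get(name) != current_settings
        ]
    raise ValueError(f"unknown scope: {scope}")


images = ["a.jpg", "b.jpg", "c.jpg"]
existing_masks = {"a.jpg", "b.jpg"}
settings_used = {
    "a.jpg": {"model": "YOLO/SAM2.1"},
    "b.jpg": {"model": "SAM3.1"},
}
current = {"model": "SAM3.1"}
```

With this data, `Images without masks only` would select just `c.jpg`, and `Missing + changed settings` would also pick up `a.jpg`, whose mask was built with different settings.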
Step 3 stops before running if images marked for removal still remain, or if unrelated images are mixed into the scene. Apply Step 2 decisions first, or clean up images/ so it contains only the images you want to mask.
| Model | Best for |
|---|---|
| `YOLO/SAM2.1` | Recommended route for fast person masking |
| `SAM3.1` | Higher-accuracy English-prompt masks for people, sky, tripods, hands, phones, and cleanup |
| `Mask2Former` | Trying semantic masks such as sky without SAM3.1 |
YOLO/SAM2.1 is the recommended route when you want fast person masks. YOLO detects the selected targets, then SAM2.1 refines their shapes. The usual target is person only. Add vehicles or other available targets only when you also want to mask them.
SAM3.1 is prompt-driven rather than a fixed class list. You describe what to mask with English prompts. Single words such as person, sky, or tripod work, and short natural-language phrases such as selfie stick, cell phone, person wearing a red jacket, hand holding a phone, or tripod legs near the floor can also be used. The checkboxes are shortcuts for common prompts, so you do not have to type them manually. Add any missing targets in the extra prompt field.
After generation, select only the images with misses or false detections. Combine a prompt with the mask regeneration mode either to add regions to the current mask or to remove unwanted targets, such as logo or sign, from the masked area.
Use add prompts to pick up targets that were not masked. For example, enter tripod legs near the floor for a missed tripod near the feet, or hand holding a phone for a missed phone, set the mode to Add, and regenerate only the affected images. Use subtract prompts when areas you do not want masked were picked up by mistake. For example, enter male icon, female icon, logo, or sign, set the mode to Subtract, and remove those targets from the masked area.
On first use, if models/sam3.1/sam3.1_multiplex.pt is missing, Hugging Face access approval and SAM License acceptance are required. GUI downloads use a Read token from the approved account. This app does not save the token.
Mask2Former uses ADE20K semantic classes. Use it when you want to try semantic targets such as sky without setting up SAM3.1.
Quality controls the balance between accuracy and processing time.
| Quality | Use when |
|---|---|
| `Standard` | Quick checks. Also a reasonable starting point for normal images |
| `High` | Recommended starting point for 360° images. Adds person-oriented tiles and top/bottom assist |
| `Best` | Targeted fixes for images that still leak. Slower |
Mask Expand grows or shrinks mask boundaries. Positive values make black excluded regions larger; negative values make them tighter. Increase it when silhouettes leak through, and decrease it when masks remove too much.
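The effect of the sign can be sketched with a simple pixel-neighborhood operation. The app's actual algorithm and units are not stated, so `expand_mask`, the one-pixel-per-step growth, and the 8-neighborhood are assumptions for illustration only.

```python
# Hypothetical sketch: positive amounts grow the black (excluded) region,
# negative amounts shrink it by growing the white (usable) region instead.

def expand_mask(mask, amount):
    """Return a copy of mask with the black region grown or shrunk by |amount| pixels."""
    grow_value = 0 if amount > 0 else 255
    result = [row[:] for row in mask]
    for _ in range(abs(amount)):
        source = [row[:] for row in result]
        height, width = len(source), len(source[0])
        for y in range(height):
            for x in range(width):
                # A pixel takes grow_value if any pixel in its 3x3 window has it.
                for ny in range(max(0, y - 1), min(height, y + 2)):
                    for nx in range(max(0, x - 1), min(width, x + 2)):
                        if source[ny][nx] == grow_value:
                            result[y][x] = grow_value
    return result


mask = [[255] * 5 for _ in range(5)]
mask[2][2] = 0                   # one excluded pixel in the center
grown = expand_mask(mask, 1)     # excluded area becomes a 3x3 block
tightened = expand_mask(grown, -1)
```

In this toy case, tightening the grown mask by one step returns the original single excluded pixel.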
After the model-based mask, Step 3 can merge extra masks. The order is always model -> stitch seam -> overexposure -> custom.
| Option | Use when |
|---|---|
| `Stitch` | Excluding stitch seams in 360° images |
| `Overexp` | Excluding blown-out windows, lights, or reflections |
| `Custom` | AND-merging your own PNG mask into every output |
If stitch seams are barely visible, keep stitch masks off or use a narrow seam mask first. Usually keep it off for stabilized, direction-locked, or AI-stitched footage where seam positions move.
Custom masks use PNG input. White means use, black means exclude. They apply only to images with matching dimensions.
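The merge order and the dimension rule for custom masks can be sketched as follows. The source only states that the custom mask is AND-merged; treating the stitch and overexposure masks the same way is an assumption, and `and_merge` and `merge_masks` are hypothetical names.

```python
# Hypothetical sketch of the merge order: model -> stitch seam -> overexposure -> custom.
# Masks are rows of grayscale values (255 = use, 0 = exclude).

def and_merge(base, extra):
    """A pixel stays white only if it is white in both masks."""
    return [
        [min(b, e) for b, e in zip(base_row, extra_row)]
        for base_row, extra_row in zip(base, extra)
    ]


def same_size(a, b):
    return len(a) == len(b) and len(a[0]) == len(b[0])


def merge_masks(model, stitch=None, overexp=None, custom=None):
    merged = model
    for extra in (stitch, overexp):
        if extra is not None:
            merged = and_merge(merged, extra)
    # Custom masks apply only to images with matching dimensions.
    if custom is not None and same_size(merged, custom):
        merged = and_merge(merged, custom)
    return merged


model = [[255, 255], [0, 255]]
stitch = [[255, 0], [255, 255]]
skipped = merge_masks(model, stitch=stitch, custom=[[0]])
applied = merge_masks(model, stitch=stitch, custom=[[0, 255], [255, 255]])
```

Because a pure AND is commutative, the fixed order does not change the result in this sketch; it only mirrors the sequence described above.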
Mask Preview builds a temporary mask for the currently displayed image using the current settings. It does not save to masks/. Use it first to check missed regions or over-masking.
Show Preview switches between the temporary preview and the saved mask display. For 360° images, perspective preview makes it easier to inspect feet, tripods, silhouettes, and stitch-seam regions.
Regenerate Mask saves a new mask to masks/ for the current image, or for selected images in thumbnail mode.
With SAM3.1, choose how prompt detections are applied to the saved mask:
- `Replace`: rebuild the mask from current settings
- `Add`: add detected targets to the masked area
- `Subtract`: remove detected targets from the masked area
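The three modes can be pictured as pixel operations on the saved mask. This is a simplified sketch, not the app's implementation: `apply_detection` is a hypothetical helper, the detection is modeled as a grid of booleans, and Replace here ignores any extra stitch, overexposure, or custom masks.

```python
# Hypothetical sketch of the regeneration modes.
# saved: rows of grayscale values (255 = use, 0 = exclude / masked).
# detected: rows of booleans, True where the prompt matched.

def apply_detection(saved, detected, mode):
    if mode == "Replace":
        # Rebuild from the detection alone.
        return [[0 if hit else 255 for hit in row] for row in detected]
    if mode == "Add":
        # Detected pixels join the masked (black) area.
        return [
            [0 if hit else pixel for pixel, hit in zip(saved_row, hit_row)]
            for saved_row, hit_row in zip(saved, detected)
        ]
    if mode == "Subtract":
        # Detected pixels are released back to usable (white).
        return [
            [255 if hit else pixel for pixel, hit in zip(saved_row, hit_row)]
            for saved_row, hit_row in zip(saved, detected)
        ]
    raise ValueError(f"unknown mode: {mode}")


added = apply_detection([[255, 255, 0]], [[True, False, False]], "Add")
subtracted = apply_detection([[0, 0, 255]], [[False, True, False]], "Subtract")
replaced = apply_detection([[0, 255, 255]], [[True, True, False]], "Replace")
```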
It is usually faster to generate the full set at Standard/High quality, then regenerate only images with visible misses.
| Output | Meaning |
|---|---|
| `masks/` | Mask images corresponding to images/ |
| Temporary preview | Visual check only; not a saved mask |
| Run log | Shown in the integrated log panel |
Generated masks/ can be imported into Metashape, COLMAP, or SphereSfM, and Step 5 converts them to match training images when creating datasets. Polarity remains white = use, black = exclude.
- Start with `YOLO/SAM2.1` + `High` + `person`.
- Use `SAM3.1` when you also need sky, tripods, hands, or prompt-based cleanup.
- Use `Mask2Former` for a quick sky-mask test without SAM3.1.
- If the camera operator or tripod remains near the bottom of a 360° image, raise quality or use SAM3.1 prompts such as `tripod`, `hand`, or `selfie stick` to add those areas to the mask.
- Turn `Overexp` on only for footage where blown-out areas are actually harmful; it can remove too much in some interiors.
- Normal images do not use stitch seam masks or 360° pole projection assist.
- Use `Images without masks only` when you want to keep masks you already reviewed.
- Use `Missing + changed settings` after changing mask settings to update only the affected images.
- After adding a new source, choose that source in `Source` to keep the run and review focused.
- If SAM3.1 stops due to GPU memory, completed masks remain saved. Rerun with the same settings to continue from unfinished images.
- Semantic/person masking uses third-party libraries and model weights with separate license terms. See ../THIRD_PARTY_LICENSES.md.