Feat/image segmentation #1981
Open
NANDAGOPALNG wants to merge 8 commits into
Pull request overview
Adds a new end-to-end example under examples/ intended to demonstrate SAM 2-based image segmentation workflows (points/boxes) and related environment setup assets.
Changes:
- Added a new SAM 2 segmentation example script with multiple prompt demonstrations (points, boxes, batched inputs).
- Added a setup script to download sample images and a SAM 2 checkpoint.
- Added example-specific README and requirements.
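For orientation, point and box prompting with SAM 2 can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the helper names `make_point_prompt`, `make_box_prompt`, and `segment` are hypothetical, and it assumes the `sam2` package exposes `build_sam2` and `SAM2ImagePredictor` as in the upstream repository. Heavy dependencies are imported lazily so the prompt helpers can be used on their own.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def make_point_prompt(points: Sequence[Point], labels: Sequence[int]):
    """Validate point prompts: one label per point, 1 = foreground, 0 = background."""
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one label")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 0 (background) or 1 (foreground)")
    coords = [[float(x), float(y)] for x, y in points]
    return coords, [int(label) for label in labels]


def make_box_prompt(x0: float, y0: float, x1: float, y1: float) -> List[float]:
    """Validate a box prompt given in XYXY pixel coordinates."""
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box must satisfy x0 < x1 and y0 < y1")
    return [float(x0), float(y0), float(x1), float(y1)]


def segment(image, checkpoint: str, config: str,
            points: Optional[list] = None, labels: Optional[list] = None,
            box: Optional[list] = None):
    """Hypothetical wrapper: run one SAM 2 prediction for the given prompts."""
    # Lazy imports: these need numpy, torch and the sam2 package installed.
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    predictor = SAM2ImagePredictor(build_sam2(config, checkpoint, device=device))
    predictor.set_image(image)  # image: HxWx3 RGB array
    masks, scores, _ = predictor.predict(
        point_coords=np.array(points) if points is not None else None,
        point_labels=np.array(labels) if labels is not None else None,
        box=np.array(box) if box is not None else None,
        multimask_output=False,
    )
    return masks, scores
```

Validating prompts before the model is loaded keeps malformed input from surfacing as a shape error deep inside the predictor.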
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| examples/Image Segmentation with SAM 2/inference_example.py | New SAM 2 segmentation demo script (device setup, visualization, prompt demos). |
| examples/Image Segmentation with SAM 2/README.md | New example documentation (setup, run instructions, output, structure). |
| examples/Image Segmentation with SAM 2/requirements.txt | Python dependencies for running the example. |
| examples/Image Segmentation with SAM 2/setup.sh | Helper script to download sample assets and the model checkpoint. |
Comment on lines +43 to +47
| Upload your segmented image result below: | ||
| | ||
| ``` | ||
|  | ||
| ``` |
Comment on lines +56 to +63
| ``` | ||
| examples/ | ||
| └── text_driven_segmentation/ | ||
| ├── README.md | ||
| ├── requirements.txt | ||
| ├── inference_example.py | ||
| └── setup.sh | ||
| ``` |
Comment on lines +24 to +29
| # Use bfloat16 for faster inference on supported GPUs | ||
| torch.autocast("cuda", dtype=torch.bfloat16).__enter__() | ||
| # Enable tf32 for Ampere GPUs | ||
| if torch.cuda.get_device_properties(0).major >= 8: | ||
| torch.backends.cuda.matmul.allow_tf32 = True | ||
| torch.backends.cudnn.allow_tf32 = True |
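Calling `__enter__()` directly on `torch.autocast` turns autocast on for the rest of the process and never runs the matching `__exit__`. One possible alternative, sketched here under the assumption that the script has a single inference section, is to return a context manager and scope it with `with`. The helper name `inference_context` is hypothetical; `torch` is imported lazily so the CPU path works without a GPU stack.

```python
import contextlib


def inference_context(device_type: str):
    """Return a context manager for inference on the given device type.

    On CUDA this enables bfloat16 autocast (and TF32 on Ampere or newer GPUs);
    on any other device it is a no-op.
    """
    if device_type == "cuda":
        import torch  # lazy import: only needed on the CUDA path

        # TF32 matmuls are available on compute capability 8.0 and above.
        if torch.cuda.get_device_properties(0).major >= 8:
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True
        return torch.autocast("cuda", dtype=torch.bfloat16)
    return contextlib.nullcontext()


# Usage: autocast is active only inside the block and is exited cleanly.
with inference_context("cpu"):
    status = "inference would run here"
```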
Comment on lines +387 to +389
| sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt" | ||
| model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml" | ||
| |
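A relative checkpoint path like the one above is resolved against the current working directory, so the script only finds it when launched from the example folder. A minimal sketch of one way to anchor it to the script's own location and fail early with a readable message follows; the helper names `checkpoint_path` and `require_file` are hypothetical, and only the checkpoint path is handled here.

```python
from pathlib import Path


def checkpoint_path(script_file: str,
                    relative: str = "../checkpoints/sam2.1_hiera_large.pt") -> Path:
    """Resolve the checkpoint relative to the script's directory, not the CWD."""
    return (Path(script_file).resolve().parent / relative).resolve()


def require_file(path: Path) -> Path:
    """Raise a clear error before model loading if the file is missing."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found: {path}. Run setup.sh to download it."
        )
    return path
```

Inside the script this could be called as `require_file(checkpoint_path(__file__))`.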
Comment on lines +8 to +15
| import os | ||
| | ||
| import cv2 | ||
| import matplotlib.pyplot as plt | ||
| import numpy as np | ||
| import torch | ||
| from PIL import Image | ||
| |
Comment on lines +1 to +9
| #!/bin/bash | ||
| | ||
| echo "Setting up environment for Grounded SAM2 Image Segmentation..." | ||
| | ||
| # Create necessary directories | ||
| echo "Creating directories..." | ||
| mkdir -p images | ||
| mkdir -p ../checkpoints/ | ||
| |
Comment on lines +1 to +25
| # Text-Driven Image Segmentation with SAM 2 | ||
| | ||
| This example demonstrates **image segmentation** using the **Segment Anything Model 2**. | ||
| You can specify an object in the image via a **Points or Boxes*, and the model automatically segments that region. | ||
| | ||
| --- | ||
| | ||
| ## 🧠 Overview | ||
| | ||
| Points or Boxes segmentation allows you to extract a specific object or region from an image by providing a natural language prompt. | ||
| | ||
| This implementation integrates SAM 2 with a grounding model (like GroundingDINO/GLIP) to link text to image regions. | ||
| | ||
| --- | ||
| | ||
| ## ⚙️ Requirements | ||
| | ||
| Install dependencies before running the script: | ||
| | ||
| ```bash | ||
| pip install opencv-python-headless matplotlib pillow tqdm | ||
| pip install git+https://github.com/facebookresearch/segment-anything.git@main | ||
| pip install git+https://github.com/IDEA-Research/GroundingDINO.git@main | ||
| pip install --upgrade roboflow albumentations | ||
| ``` |
Comment on lines +29 to +37
| ## 🚀 How to Run | ||
| | ||
| Run the segmentation example script: | ||
| | ||
| ```bash | ||
| python inference_example.py --image-path path/to/image.jpg --text-prompt "segment the person" | ||
| ``` | ||
| | ||
| You can also modify the script to test different input images or prompts. |
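The documented command passes a text prompt, while the overview describes the script as demonstrating point and box prompts. If the script were to take its prompts from the command line, the interface could look roughly like the sketch below. The `--point` and `--box` flags, the helper names, and the sample file name `images/truck.jpg` are hypothetical and not taken from the PR; only `--image-path` appears in the README.

```python
import argparse


def parse_floats(text: str, count: int):
    """Parse a comma-separated list of exactly `count` numbers."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != count:
        raise argparse.ArgumentTypeError(f"expected {count} comma-separated numbers")
    return parts


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="SAM 2 point/box segmentation demo")
    parser.add_argument("--image-path", required=True, help="input image file")
    # Hypothetical flag: may be repeated, one "x,y" pixel coordinate per use.
    parser.add_argument("--point", action="append", default=[],
                        type=lambda s: parse_floats(s, 2), help="foreground point x,y")
    # Hypothetical flag: a single box in XYXY pixel coordinates.
    parser.add_argument("--box", default=None,
                        type=lambda s: parse_floats(s, 4), help="box x0,y0,x1,y1")
    return parser


# Example invocation, parsed from an explicit argument list.
args = build_parser().parse_args(
    ["--image-path", "images/truck.jpg", "--point", "500,375", "--box", "425,600,700,875"]
)
```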
Comment on lines +1 to +6
| """ | ||
| Text-Driven Image Segmentation with Grounded SAM2 | ||
| | ||
| This module demonstrates image segmentation using Meta's Segment Anything Model 2 | ||
| combined with text prompts for automatic object detection and segmentation. | ||
| """ |
Description
This project demonstrates text-driven image segmentation using Meta's Segment Anything Model 2 (SAM 2). The implementation focuses on converting various input prompts (points, boxes) into precise object segmentation masks.
How has this change been tested, please provide a testcase or example of how you tested the change?
Yes, I have run the code on my local system and it works perfectly.
This PR addresses the issue "Adding new feature of image segmentation by using Grounded SAM2 #1977".
Could you consider this PR for Hacktoberfest 2025?