Vision-to-Policy (V2P): Multimodal Policy Diagram Translation

Python 3.10+ | License: MIT | Backend: OpenAI Vision API

Vision-to-Policy (V2P) is the official artifact repository for the paper:

Rethinking Access-Control Policy Authoring as a Multimodal Challenge
Accepted at SACMAT 2026

This repository implements the direct vision-to-specification translation pipeline described in Section 3 of the paper, where access-control policy diagrams are translated into structured policy specifications using vision-language models (VLMs).

Overview

V2P is a model-agnostic pipeline that converts Access Control Directed Acyclic Graph (DAG) images into structured knowledge graphs. Given a diagram depicting users, resources, and policies, the system supports:

Entity extraction (nodes),
Relation classification (typed edges),
End-to-end graph reconstruction (nodes + edges + paths).

These correspond to the experimental pipeline evaluated in:

Section 3 — Direct VLM Translation Pipeline
Section 4 — Empirical Framing of Vision-to-Specification Translation

Reproducing Paper Results

This repository reproduces the two empirical tables in the paper. Table 1 maps to a single batch script and Table 2 to two scripts run in sequence (all run from the repository root):

Paper Table	What it reports	Script to run
Table 1 — Entity and Relation Recovery (micro-averaged P/R/F1)	End-to-end node + edge recovery across SAG/MAG/DAG, with and without legend	`bash run_v1.sh`
Table 2 — Assignment Directional Structural Error (Misdirection, Mis/FN)	Reversed-edge analysis across SAG/MAG/DAG, with and without legend	`bash run_assignment_misdirection_analyzer.sh` followed by `bash run_make_assignment_misdirection_table.sh`

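Both tables cover the three structural regimes, SAG (Sparse), MAG (Moderate), and DAG (Dense) Authorization Graphs, under both legend conditions (with and without legend). In regime names, DAG abbreviates Dense Authorization Graph; elsewhere in this README, "Access Control DAG" refers to the directed acyclic graph depicted in an image.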

Step-by-step commands, per-regime invocations, and the underlying CLI flags are documented in Reproducing Results below.

Environment and Estimated Runtime

The results in the paper were generated under the following environment. We recommend reviewers use a comparable setup; deviations (notably in the OpenAI model version) can change absolute numbers but should preserve the qualitative trends in Tables 1 and 2.

Hardware

CPU: standard x86_64 workstation (e.g., Intel Core i7 / Xeon, 8+ cores)
Memory: 16 GB RAM
GPU: not required — all inference runs through the OpenAI Vision API; no local model weights
Network: stable connection to api.openai.com is required for the duration of the run

Software

OS: Ubuntu 22.04 LTS (also tested on macOS 14)
Python: 3.10+
Python dependencies: listed in requirements.txt as minimum versions (openai>=1.0, pydantic>=2.0, python-dotenv>=1.0, Pillow>=10.0)
VLM backend: gpt-5-nano (default; configurable via --model)
Image detail: high for Table 1 relation extraction; low is supported for cost-sensitive runs

Estimated wall-clock runtime

End-to-end relation extraction takes approximately 3 minutes per image (case) with gpt-5-nano at --image_detail high. Total runtime scales linearly with the number of images in each regime × legend combination, and inversely with --workers for parallel execution.

Step	Dataset coverage	Approx. runtime
`bash run_v1.sh` (Table 1)	6 regime × legend combinations, 130 images total (SAG: 51, MAG: 9, DAG: 5, each ×2 legend conditions)	~3 min per case; ~6.5 hours sequential, ~1.6 hours with `--workers 4`
`bash run_assignment_misdirection_analyzer.sh` (Table 2, step 1)	Post-hoc analysis of predicted JSON from step above	< 5 min total (local analysis only, no API calls)
`bash run_make_assignment_misdirection_table.sh` (Table 2, step 2)	Aggregation across the 6 CSV outputs	< 1 min (no API calls)
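The Table 1 runtime figures follow from simple arithmetic on the image counts above. The sketch below reproduces that estimate; it assumes ideal linear scaling with `--workers` (no rate limiting or retries), and the function name `estimate_hours` is illustrative rather than part of the repository.

```python
# Back-of-the-envelope runtime estimate for the Table 1 batch run.
# Image counts and the ~3 min/image figure come from this README;
# perfect scaling with the worker count is an assumption.
IMAGES_PER_REGIME = {"SAG": 51, "MAG": 9, "DAG": 5}
LEGEND_CONDITIONS = 2          # with legend, without legend
MINUTES_PER_IMAGE = 3.0        # gpt-5-nano at --image_detail high


def estimate_hours(workers: int = 1) -> float:
    """Return the estimated wall-clock hours for the full run."""
    total_images = sum(IMAGES_PER_REGIME.values()) * LEGEND_CONDITIONS
    return total_images * MINUTES_PER_IMAGE / 60 / workers


total_images = sum(IMAGES_PER_REGIME.values()) * LEGEND_CONDITIONS
print(f"images: {total_images}")
print(f"sequential: {estimate_hours(1):.1f} h")
print(f"4 workers:  {estimate_hours(4):.1f} h")
```

Real runs will usually be somewhat slower than the ideal figure because API latency varies between requests.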

Note: VLM endpoints are non-deterministic, so absolute numbers may shift slightly across runs; qualitative trends across SAG/MAG/DAG and legend conditions are preserved.

Pipeline Mapping to the Paper

The repository structure directly corresponds to the architecture in Figure 2 (Direct VLM Pipeline):

Paper Component	Implementation
VLM Translation Pipeline	`access_control_run.py`, `src/core_processor.py`
Prompt + Structured Output	`src/access_prompt.py`
Processing Strategies	`src/processing_strategies.py`
Evaluation Metrics	`src/evaluation.py`, `src/eval_metric.py`
Assignment Misdirection Analysis	`assignment_misdirection_analyzer.py`

Key Idea

Rather than using multi-stage diagram parsers, V2P performs single-pass structured generation, producing JSON outputs that explicitly enumerate:

nodes,
typed relations,
policy paths.

This design makes structural errors (omissions, misdirection, relation failures) directly observable and measurable.
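To make the idea concrete, here is a minimal sketch of what consuming such a JSON output could look like. The field names (`nodes`, `relations`, `paths`, `source`, `target`, `type`) and the example identifiers are assumptions chosen for illustration; the schema actually emitted by the pipeline is defined in the repository's prompt and processing code and may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Relation:
    source: str
    target: str
    type: str  # e.g. "assignment"; other type names are not specified here


@dataclass
class PolicyGraph:
    nodes: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    paths: list[list[str]] = field(default_factory=list)


def parse_graph(raw: str) -> PolicyGraph:
    """Parse a JSON string in the hypothetical schema described above."""
    data = json.loads(raw)
    return PolicyGraph(
        nodes=list(data.get("nodes", [])),
        relations=[Relation(**r) for r in data.get("relations", [])],
        paths=[list(p) for p in data.get("paths", [])],
    )


def dangling_relations(graph: PolicyGraph) -> list[Relation]:
    """Return relations whose endpoints are not declared as nodes."""
    known = set(graph.nodes)
    return [r for r in graph.relations
            if r.source not in known or r.target not in known]


example = json.dumps({
    "nodes": ["u1", "ua1", "pc1"],
    "relations": [
        {"source": "u1", "target": "ua1", "type": "assignment"},
        {"source": "ua1", "target": "pc2", "type": "assignment"},
    ],
    "paths": [["u1", "ua1", "pc1"]],
})
graph = parse_graph(example)
print(dangling_relations(graph))
```

Because nodes and edges are listed explicitly, a structural inconsistency such as an edge to an undeclared node can be detected with a few lines of set logic.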

Authors & Affiliations

Sherifdeen Lawal, University of Texas at San Antonio
Xingmeng Zhao, University of Colorado
Enrique Navarroespino, University of Texas at San Antonio
Anthony Rios, University of Texas at San Antonio
Ram Krishnan, University of Texas at San Antonio

Overview

flowchart LR
  images["DAG images<br/>(PNG / JPEG)"] --> entry[access_control_run.py]
  entry --> cli[src/cli.py]
  cli --> engine[src/core_processor.py]
  engine --> strategies[src/processing_strategies.py]
  strategies --> openai["OpenAI Vision API"]
  openai --> strategies
  strategies --> output["JSON results<br/>experiments/"]

The pipeline supports three primary modes:

Mode (`--method`)	What it does
`extract_entities`	Identify nodes (users, objects, policy classes) from the image.
`relation_classification`	Binary relation check for each entity pair (requires entity list).
`relation_extraction`	End-to-end: extract nodes + edges from the image in one pass. This is the mode used to populate Table 1.

Additional experimental methods (enumerate_paths, path_generation, extract_relation) are also available via the CLI.
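For intuition about the `relation_classification` mode, the sketch below enumerates the candidate entity pairs that a per-pair binary check would have to cover. It assumes directed relations, so ordered pairs are generated, and it mimics the effect of `--subset_size` with a seeded random sample. The function `candidate_pairs` is hypothetical and is not claimed to match `src/entity_pair_generator.py`.

```python
import random
from itertools import permutations


def candidate_pairs(entities, subset_size=None, seed=0):
    """Return ordered (source, target) pairs over distinct entities.

    If subset_size is given and smaller than the number of pairs,
    a reproducible random sample of that size is returned instead.
    """
    unique = list(dict.fromkeys(entities))   # de-duplicate, keep order
    pairs = list(permutations(unique, 2))    # n * (n - 1) ordered pairs
    if subset_size is not None and subset_size < len(pairs):
        pairs = random.Random(seed).sample(pairs, subset_size)
    return pairs


pairs = candidate_pairs(["u1", "ua1", "pc1"])
print(len(pairs), pairs[:2])
```

The number of pairs grows quadratically with the number of entities, which is one reason a subset limit is useful for quick tests.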

Data Format

Input: PNG or JPEG images of Access Control DAG graphs.
Images can include or exclude a legend (--with_legend / --no_legend).

Output: JSON files containing extracted entities, classified relations, or full knowledge graphs, saved under the output directory.

Dataset layout (when using the bundled SubgraphsWithTriples data):

datasets/
  GroundTruthGraphsImages/        # --input points here
    SAG_with_legend/
    MAG_with_legend/
    DAG_with_legend/
    SAG_wo_legend/
    MAG_wo_legend/
    DAG_wo_legend/
  GroundTruthGraphsJSON/             # ground-truth (auto-resolved)
    SAG/                             # GT for SAG_with_legend + SAG_wo_legend
    MAG/                             # GT for MAG_with_legend + MAG_wo_legend
    DAG/                             # GT for DAG_with_legend + DAG_wo_legend
  PredictedPathGenerationJSON/
    SAG_with_legend/
    MAG_with_legend/
    DAG_with_legend/
    SAG_wo_legend/
    MAG_wo_legend/
    DAG_wo_legend/

Place your own images in datasets/ or pass an explicit --input path.
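The layout above implies a simple naming rule: both legend variants of a regime share one ground-truth folder. A minimal sketch of that mapping follows, assuming the directory names shown in the layout; `resolve_gt_dir` is a hypothetical helper, not the repository's actual resolution logic.

```python
from pathlib import Path

LEGEND_SUFFIXES = ("_with_legend", "_wo_legend")


def resolve_gt_dir(input_dir, gt_root="datasets/GroundTruthGraphsJSON"):
    """Map e.g. .../SAG_wo_legend to <gt_root>/SAG."""
    name = Path(input_dir).name
    for suffix in LEGEND_SUFFIXES:
        if name.endswith(suffix):
            return Path(gt_root) / name[: -len(suffix)]
    raise ValueError(f"cannot infer regime from directory name: {name!r}")


gt = resolve_gt_dir("datasets/GroundTruthGraphsImages/SAG_with_legend")
print(gt.as_posix())
```

For images outside this naming scheme, passing `--gt_input` explicitly avoids relying on any such inference.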

Installation

git clone git@github.com:UTSA-ICS/Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge.git
cd Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge

python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

pip install -r requirements.txt

Dependencies (see requirements.txt):

openai >= 1.0
pydantic >= 2.0
python-dotenv >= 1.0
Pillow >= 10.0

Configuration

Copy the example environment file and fill in your API key:

cp .env.example .env

Edit .env:

OPENAI_API_KEY="<OPENAI_API_KEY>"

Security: .env is in .gitignore. Never commit real API keys.
Use placeholders (<OPENAI_API_KEY>, <YOUR_NAME>, etc.) in documentation and pull requests.
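A quick way to confirm the key is set before starting a long run is sketched below. It only inspects a mapping of environment variables (by default `os.environ`) and rejects a missing value or the unedited placeholder; the helper name `require_api_key` is an assumption and not part of the repository. In practice the variables from `.env` would first be loaded into the environment, for example with python-dotenv's `load_dotenv()`.

```python
import os

PLACEHOLDER = "<OPENAI_API_KEY>"


def require_api_key(env=None) -> str:
    """Return the API key, or raise if it is missing or a placeholder."""
    env = os.environ if env is None else env
    # Strip whitespace and any literal quotes left over from the .env line.
    key = env.get("OPENAI_API_KEY", "").strip().strip('"')
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "OPENAI_API_KEY is missing or still set to the placeholder"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "example-key"}))
```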

Reproducing Results

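The two paper tables are reproduced by two workflows. Table 2 is a post-hoc analysis of the predictions produced for Table 1, so run the Table 1 workflow first. Run all commands from the repository root (Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge/).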

Table 1 — Entity and Relation Recovery

Table 1 reports micro-averaged precision, recall, and F1 for entity recovery and relation recovery across the three structural regimes (SAG/MAG/DAG) under both legend conditions. It is produced by the end-to-end relation extraction pipeline (--method relation_extraction), which emits a single JSON specification per image enumerating nodes, typed edges, and paths.
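As a reminder of what micro-averaging means here, the sketch below pools true positives, false positives, and false negatives over all images before computing precision, recall, and F1. It uses exact set matching on hypothetical edge tuples; the repository's evaluation code (including options such as `--fuzzy_matching`) may match items differently.

```python
def micro_prf(samples):
    """Micro-averaged precision, recall, F1 over (predicted, gold) pairs."""
    tp = fp = fn = 0
    for predicted, gold in samples:
        predicted, gold = set(predicted), set(gold)
        tp += len(predicted & gold)   # items found in both
        fp += len(predicted - gold)   # predicted but not in ground truth
        fn += len(gold - predicted)   # in ground truth but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Two illustrative images with made-up edges (source, target).
samples = [
    ({("u1", "ua1"), ("ua1", "pc1")},
     {("u1", "ua1"), ("ua1", "pc1"), ("o1", "oa1")}),
    ({("u2", "ua2"), ("pc1", "ua2")},
     {("u2", "ua2"), ("ua2", "pc1")}),
]
precision, recall, f1 = micro_prf(samples)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Pooling counts before dividing means images with more entities or edges contribute proportionally more to the final score than small ones.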

Batch run (recommended for full reproduction):

bash run_v1.sh

This script iterates over all six regime × legend combinations, writes per-image predictions under experiments/relation_extraction/<regime>_<legend>/, and computes the micro-averaged metrics that populate Table 1.

Per-regime invocation (for inspection or partial reruns):

# SAG, with legend
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/SAG_with_legend \
  --output experiments/relation_extraction/SAG_with_legend \
  --method relation_extraction \
  --model gpt-5-nano --image_detail high --few_shot zero

# MAG, without legend
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/MAG_wo_legend \
  --output experiments/relation_extraction/MAG_wo_legend \
  --no_legend --method relation_extraction \
  --model gpt-5-nano --image_detail high --few_shot zero

Swap SAG/MAG/DAG and with_legend/wo_legend (adding --no_legend for the wo_legend directories, as in the MAG example) to reproduce the remaining four rows.
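The six per-regime invocations differ only in the directory tag and the legend flag, so they can be generated mechanically. The sketch below builds and prints the six command lines using only the flags shown in the examples above; it does not execute anything, and it is not claimed to mirror what `run_v1.sh` does internally.

```python
from itertools import product

REGIMES = ("SAG", "MAG", "DAG")
LEGENDS = ("with_legend", "wo_legend")


def build_command(regime: str, legend: str) -> list[str]:
    """Build the argv list for one regime/legend combination."""
    tag = f"{regime}_{legend}"
    cmd = [
        "python", "access_control_run.py",
        "--input", f"datasets/GroundTruthGraphsImages/{tag}",
        "--output", f"experiments/relation_extraction/{tag}",
    ]
    if legend == "wo_legend":
        cmd.append("--no_legend")   # images without a legend
    cmd += [
        "--method", "relation_extraction",
        "--model", "gpt-5-nano",
        "--image_detail", "high",
        "--few_shot", "zero",
    ]
    return cmd


commands = [build_command(r, l) for r, l in product(REGIMES, LEGENDS)]
for cmd in commands:
    print(" ".join(cmd))
```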

Table 2 — Assignment Directional Structural Error (Misdirection)

Table 2 quantifies reversed-edge errors among predicted assignment relations, normalized by assignment false negatives (Mis/FN). It is a post-hoc analysis of predictions already produced for Table 1 (no additional API calls).
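One plausible reading of this metric is sketched below: an assignment edge counts as misdirected when the ground-truth edge (s, t) is missing from the prediction but its reverse (t, s) was predicted, and Mis/FN is the share of false negatives explained this way. This definition is an assumption made for illustration; the paper and `assignment_misdirection_analyzer.py` are the authoritative sources.

```python
def misdirection_stats(predicted, gold):
    """Return (misdirected, false_negatives, mis_over_fn) for one sample."""
    predicted, gold = set(predicted), set(gold)
    false_negatives = gold - predicted
    # A missed edge whose reverse was predicted is treated as misdirected.
    misdirected = {(s, t) for (s, t) in false_negatives if (t, s) in predicted}
    ratio = len(misdirected) / len(false_negatives) if false_negatives else 0.0
    return len(misdirected), len(false_negatives), ratio


# Made-up assignment edges as (source, target) tuples.
gold_edges = {("u1", "ua1"), ("ua1", "pc1"), ("o1", "oa1")}
pred_edges = {("ua1", "u1"), ("ua1", "pc1")}
mis, fn, ratio = misdirection_stats(pred_edges, gold_edges)
print(mis, fn, ratio)
```

In this example one of the two missed edges was predicted in the wrong direction, while the other was omitted entirely.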

Step 1 — Per-regime misdirection analysis:

bash run_assignment_misdirection_analyzer.sh

This iterates over the six regime × legend combinations and writes per-sample CSVs to experiments/assignment_misdirection_results/<regime>_<legend>/.

Step 2 — Aggregate into the table:

bash run_make_assignment_misdirection_table.sh

This consumes the per-sample CSVs and emits the aggregated Mis/FN values reported in Table 2.

Additional CLI modes (not used for Table 1 or Table 2)

The following modes are available for ablations and inspection but are not required to reproduce the paper's tables:

# Entity-only extraction (Section 3 ablation)
python access_control_run.py --method extract_entities

# Binary relation classification given a fixed entity list
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/DAG_with_legend \
  --output experiments/relation_classification/DAG_with_legend \
  --entities_input datasets/GroundTruthGraphsJSON/DAG \
  --gt_input datasets/GroundTruthGraphsJSON/DAG \
  --method relation_classification \
  --relation_source ground_truth \
  --model gpt-5-nano --image_detail low

Per-regime misdirection inspection (single regime, useful for debugging):

python3 assignment_misdirection_analyzer.py \
  --dir datasets/GroundTruthGraphsImages/SAG_with_legend \
  --glob "*path_generation.json" \
  --out_prefix assign_misdirection_with \
  --out_dir experiments/assignment_misdirection_results/SAG_with_legend

python3 make_assignment_misdirection_table.py \
  --with_sag /path/to/sag_assign_misdirection_with_per_sample.csv \
  --wo_sag /path/to/sag_assign_misdirection_wo_per_sample.csv \
  --out_csv /path/to/assign_misdirection_results.csv

Full CLI Help

python access_control_run.py --help

CLI Reference

Argument	Default	Description
`--input`	`datasets/`	Image file or directory.
`--output`	`experiments/`	Output file or directory.
`--method`	`extract_entities`	Processing mode (see table above).
`--model`	`gpt-5-nano`	Vision model (`gpt-5-nano`, `gpt-5-mini`, `gpt-4o-mini`, `gpt-4o`).
`--image_detail`	`low`	`low` (cost-efficient, ~2.8k tokens) or `high` (~54k tokens).
`--few_shot`	`zero`	`zero` or `few` (Context7-style few-shot).
`--workers`	`4`	Parallel workers for batch processing (1 = sequential).
`--relation_source`	`ground_truth`	Entity source for `relation_classification`: `ground_truth` or `predicted`.
`--entities_input`	—	Entity folder for `relation_classification`.
`--gt_input`	—	Explicit ground-truth directory for evaluation.
`--with_legend` / `--no_legend`	with	Legend handling.
`--subset_size`	—	Limit to N random relations per graph (testing).
`--comprehensive_eval`	off	Broader evaluation across all possible relations.
`--fuzzy_matching`	off	Fuzzy entity name matching in evaluation.
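The token figures listed for `--image_detail` give a rough sense of how the two settings compare over a full run. The sketch below multiplies them by the 130 images used for Table 1, assuming the quoted figures are per-image input tokens; prompt text and model output tokens are not included, and actual usage will vary with image size.

```python
# Approximate per-image token figures quoted in the CLI reference.
TOKENS_PER_IMAGE = {"low": 2_800, "high": 54_000}
TOTAL_IMAGES = 130  # 65 diagrams x 2 legend conditions


def estimated_image_tokens(detail: str, images: int = TOTAL_IMAGES) -> int:
    """Rough image-token total for a run at the given detail level."""
    return TOKENS_PER_IMAGE[detail] * images


for detail in ("low", "high"):
    print(f"{detail}: ~{estimated_image_tokens(detail):,} image tokens")
```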

Project Structure

Access-Control-Policy/
├── README.md
├── access_control_run.py
├── assignment_misdirection_analyzer.py
├── datasets
│   ├── GroundTruthGraphsImages
│   ├── GroundTruthGraphsJSON
│   ├── PredictedGraphsJSON
│   └── test.md
├── experiments
│   ├── assignment_misdirection_results
│   ├── relation_classification
│   ├── relation_extraction
│   └── test.md
├── logs
├── make_assignment_misdirection_table.py
├── performance_results.csv
├── requirements.txt
├── run_assignment_misdirection_analyzer.sh
├── run_make_assignment_misdirection_table.sh
├── run_v1.sh
└── src
    ├── __init__.py
    ├── __pycache__
    ├── access_prompt.py
    ├── cli.py
    ├── config.py
    ├── core_processor.py
    ├── entity_pair_generator.py
    ├── eval_metric.py
    ├── evaluation.py
    ├── file_utils.py
    └── processing_strategies.py

Troubleshooting

Problem	Solution
`OpenAI API key not provided`	Set `OPENAI_API_KEY` in `.env` or export it in your shell.
`ModuleNotFoundError: No module named 'src'`	Run from the repository root (the directory that contains `access_control_run.py`).
Wrong output directory	Set `--output` explicitly; the default auto-nests under `experiments/<method>/`.
Truncated model responses	Increase `max_tokens` in `src/config.py` → `APIConfig`.
Numbers differ slightly from the paper	OpenAI VLM endpoints are non-deterministic; small variations are expected. Qualitative trends across SAG/MAG/DAG and legend conditions should be preserved.

Security Notes

Never commit .env, API keys, or paths that reveal personal information.
Use these placeholders in all shared docs: <OPENAI_API_KEY>, <YOUR_NAME>, <YOUR_ORG>, <YOUR_EMAIL>.
.env is listed in .gitignore.

License

This project is licensed under the MIT License.
See the LICENSE file for details.

Citation

@inproceedings{lawal2026v2p,
  title   = {Rethinking Access-Control Policy Authoring as a Multimodal Challenge},
  author  = {Lawal, Sherifdeen and Zhao, Xingmeng and Navarroespino, Enrique and Rios, Anthony and Krishnan, Ram},
  booktitle = {Proceedings of the ACM Symposium on Access Control Models and Technologies (SACMAT)},
  year    = {2026},
  note    = {Artifact: Vision-to-Policy (V2P)},
  url     = {https://github.com/UTSA-ICS/Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge}
}
