Skip to content

UTSA-ICS/Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge

Repository files navigation

Vision-to-Policy (V2P): Multimodal Policy Diagram Translation

SACMAT 2026 Artifact Python License

Python 3.10+ | License: MIT | Backend: OpenAI Vision API

Vision-to-Policy (V2P) is the official artifact repository for the paper:

Rethinking Access-Control Policy Authoring as a Multimodal Challenge
Accepted at SACMAT 2026

This repository implements the direct vision-to-specification translation pipeline described in Section 3 of the paper, where access-control policy diagrams are translated into structured policy specifications using vision-language models (VLMs).

Overview

V2P is a model-agnostic pipeline that converts Access Control Directed Acyclic Graph (DAG) images into structured knowledge graphs. Given a diagram depicting users, resources, and policies, the system supports:

  • Entity extraction (nodes),
  • Relation classification (typed edges),
  • End-to-end graph reconstruction (nodes + edges + paths),

These correspond to the experimental pipeline evaluated in:

  • Section 3 — Direct VLM Translation Pipeline
  • Section 4 — Empirical Framing of Vision-to-Specification Translation

Reproducing Paper Results

This repository reproduces the two empirical tables in the paper. Each table maps to a single batch script (run from the repository root):

Paper Table What it reports Script to run
Table 1 — Entity and Relation Recovery (micro-averaged P/R/F1) End-to-end node + edge recovery across SAG/MAG/DAG, with and without legend bash run_v1.sh
Table 2 — Assignment Directional Structural Error (Misdirection, Mis/FN) Reversed-edge analysis across SAG/MAG/DAG, with and without legend bash run_assignment_misdirection_analyzer.sh followed by bash run_make_assignment_misdirection_table.sh

Both tables cover the three structural regimes — SAG (Sparse), MAG (Moderate), and DAG (Dense) Authorization Graphs — under both legend conditions.

Step-by-step commands, per-regime invocations, and the underlying CLI flags are documented in Reproducing Results below.

Environment and Estimated Runtime

The results in the paper were generated under the following environment. We recommend reviewers use a comparable setup; deviations (notably in the OpenAI model version) can change absolute numbers but should preserve the qualitative trends in Tables 1 and 2.

Hardware

  • CPU: standard x86_64 workstation (e.g., Intel Core i7 / Xeon, 8+ cores)
  • Memory: 16 GB RAM
  • GPU: not required — all inference runs through the OpenAI Vision API; no local model weights
  • Network: stable connection to api.openai.com is required for the duration of the run

Software

  • OS: Ubuntu 22.04 LTS (also tested on macOS 14)
  • Python: 3.10+
  • Python dependencies: pinned in requirements.txt (openai>=1.0, pydantic>=2.0, python-dotenv>=1.0, Pillow>=10.0)
  • VLM backend: gpt-5-nano (default; configurable via --model)
  • Image detail: high for Table 1 relation extraction; low is supported for cost-sensitive runs

Estimated wall-clock runtime

End-to-end relation extraction takes approximately 3 minutes per image (case) with gpt-5-nano at --image_detail high. Total runtime scales linearly with the number of images in each regime × legend combination, and inversely with --workers for parallel execution.

Step Dataset coverage Approx. runtime
bash run_v1.sh (Table 1) 6 regime × legend combinations, 130 images total (SAG: 51, MAG: 9, DAG: 5, each ×2 legend conditions) ~3 min per case; ~6.5 hours sequential, ~1.6 hours with --workers 4
bash run_assignment_misdirection_analyzer.sh (Table 2, step 1) Post-hoc analysis of predicted JSON from step above < 5 min total (local analysis only, no API calls)
bash run_make_assignment_misdirection_table.sh (Table 2, step 2) Aggregation across the 6 CSV outputs < 1 min (no API calls)

Note: VLM endpoints are non-deterministic, so absolute numbers may shift slightly across runs; qualitative trends across SAG/MAG/DAG and legend conditions are preserved.

Pipeline Mapping to the Paper

The repository structure directly corresponds to the architecture in Figure 2 (Direct VLM Pipeline):

Paper Component Implementation
VLM Translation Pipeline access_control_run.py, src/core_processor.py
Prompt + Structured Output src/access_prompt.py
Processing Strategies src/processing_strategies.py
Evaluation Metrics src/evaluation.py, src/eval_metric.py
Assignment Misdirection Analysis assignment_misdirection_analyzer.py

Key Idea

Rather than using multi-stage diagram parsers, V2P performs single-pass structured generation, producing JSON outputs that explicitly enumerate:

  • nodes,
  • typed relations,
  • policy paths.

This design makes structural errors (omissions, misdirection, relation failures) directly observable and measurable.


Authors & Affiliations

  1. Sherifdeen Lawal, University of Texas at San Antonio
  2. Xingmeng Zhao, University of Colorado
  3. Enrique Navarroespino, University of Texas at San Antonio
  4. Anthony Rios, University of Texas at San Antonio
  5. Ram Krishnan, University of Texas at San Antonio

Overview

flowchart LR
  images["DAG images<br/>(PNG / JPEG)"] --> entry[access_control_run.py]
  entry --> cli[src/cli.py]
  cli --> engine[src/core_processor.py]
  engine --> strategies[src/processing_strategies.py]
  strategies --> openai["OpenAI Vision API"]
  openai --> strategies
  strategies --> output["JSON results<br/>experiments/"]
Loading

The pipeline supports three primary modes:

Mode (--method) What it does
extract_entities Identify nodes (users, objects, policy classes) from the image.
relation_classification Binary relation check for each entity pair (requires entity list).
relation_extraction End-to-end: extract nodes + edges from the image in one pass. This is the mode used to populate Table 1.

Additional experimental methods (enumerate_paths, path_generation, extract_relation) are also available via the CLI.


Data Format

Input: PNG or JPEG images of Access Control DAG graphs.
Images can include or exclude a legend (--with_legend / --no_legend).

Output: JSON files containing extracted entities, classified relations, or full knowledge graphs, saved under the output directory.

Dataset layout (when using the bundled SubgraphsWithTriples data):

datasets/
  GroundTruthGraphsImages/        # --input points here
    SAG_with_legend/
    MAG_with_legend/
    DAG_with_legend/
    SAG_wo_legend/
    MAG_wo_legend/
    DAG_wo_legend/
  GroundTruthGraphsJSON/             # ground-truth (auto-resolved)
    SAG/                             # GT for SAG_with_legend + SAG_wo_legend
    MAG/                             # GT for MAG_with_legend + MAG_wo_legend
    DAG/                             # GT for DAG_with_legend + DAG_wo_legend
  PredictedPathGenerationJSON/
    SAG_with_legend/
    MAG_with_legend/
    DAG_with_legend/
    SAG_wo_legend/
    MAG_wo_legend/
    DAG_wo_legend/
    

Place your own images in datasets/ or pass an explicit --input path.


Installation

git clone git@github.com:UTSA-ICS/Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge.git
cd Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge

python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

pip install -r requirements.txt

Dependencies (see requirements.txt):

  • openai >= 1.0
  • pydantic >= 2.0
  • python-dotenv >= 1.0
  • Pillow >= 10.0

Configuration

Copy the example environment file and fill in your API key:

cp .env.example .env

Edit .env:

OPENAI_API_KEY="<OPENAI_API_KEY>"

Security: .env is in .gitignore. Never commit real API keys.
Use placeholders (<OPENAI_API_KEY>, <YOUR_NAME>, etc.) in documentation and pull requests.


Reproducing Results

The two paper tables are reproduced by two independent workflows. Run from the repository root (Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge/).

Table 1 — Entity and Relation Recovery

Table 1 reports micro-averaged precision, recall, and F1 for entity recovery and relation recovery across the three structural regimes (SAG/MAG/DAG) under both legend conditions. It is produced by the end-to-end relation extraction pipeline (--method relation_extraction), which emits a single JSON specification per image enumerating nodes, typed edges, and paths.

Batch run (recommended for full reproduction):

bash run_v1.sh

This script iterates over all six regime × legend combinations, writes per-image predictions under experiments/relation_extraction/<regime>_<legend>/, and computes the micro-averaged metrics that populate Table 1.

Per-regime invocation (for inspection or partial reruns):

# SAG, with legend
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/SAG_with_legend \
  --output experiments/relation_extraction/SAG_with_legend \
  --method relation_extraction \
  --model gpt-5-nano --image_detail high --few_shot zero
# MAG, without legend
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/MAG_wo_legend \
  --output experiments/relation_extraction/MAG_wo_legend \
  --no_legend --method relation_extraction \
  --model gpt-5-nano --image_detail high --few_shot zero

Swap SAG/MAG/DAG and with_legend/wo_legend to reproduce the remaining four rows.

Table 2 — Assignment Directional Structural Error (Misdirection)

Table 2 quantifies reversed-edge errors among predicted assignment relations, normalized by assignment false negatives (Mis/FN). It is a post-hoc analysis of predictions already produced for Table 1 (no additional API calls).

Step 1 — Per-regime misdirection analysis:

bash run_assignment_misdirection_analyzer.sh

This iterates over the six regime × legend combinations and writes per-sample CSVs to experiments/assignment_misdirection_results/<regime>_<legend>/.

Step 2 — Aggregate into the table:

bash run_make_assignment_misdirection_table.sh

This consumes the per-sample CSVs and emits the aggregated Mis/FN values reported in Table 2.

Additional CLI modes (not used for Table 1 or Table 2)

The following modes are available for ablations and inspection but are not required to reproduce the paper's tables:

# Entity-only extraction (Section 3 ablation)
python access_control_run.py --method extract_entities

# Binary relation classification given a fixed entity list
python access_control_run.py \
  --input datasets/GroundTruthGraphsImages/DAG_with_legend \
  --output experiments/relation_classification/DAG_with_legend \
  --entities_input datasets/GroundTruthGraphsJSON/DAG \
  --gt_input datasets/GroundTruthGraphsJSON/DAG \
  --method relation_classification \
  --relation_source ground_truth \
  --model gpt-5-nano --image_detail low

Per-regime misdirection inspection (single regime, useful for debugging):

python3 assignment_misdirection_analyzer.py \
  --dir datasets/GroundTruthGraphsImages/SAG_with_legend \
  --glob "*path_generation.json" \
  --out_prefix assign_misdirection_with \
  --out_dir experiments/assignment_misdirection_results/SAG_with_legend
python3 make_assignment_misdirection_table.py \
  --with_sag /path/to/sag_assign_misdirection_with_per_sample.csv \
  --wo_sag /path/to/sag_assign_misdirection_wo_per_sample.csv \
  --out_csv /path/to/assign_misdirection_results.csv

Full CLI Help

python access_control_run.py --help

CLI Reference

Argument Default Description
--input datasets/ Image file or directory.
--output experiments/ Output file or directory.
--method extract_entities Processing mode (see table above).
--model gpt-5-nano Vision model (gpt-5-nano, gpt-5-mini, gpt-4o-mini, gpt-4o).
--image_detail low low (cost-efficient, ~2.8k tokens) or high (~54k tokens).
--few_shot zero zero or few (Context7-style few-shot).
--workers 4 Parallel workers for batch processing (1 = sequential).
--relation_source ground_truth Entity source for relation_classification: ground_truth or predicted.
--entities_input Entity folder for relation_classification.
--gt_input Explicit ground-truth directory for evaluation.
--with_legend / --no_legend with Legend handling.
--subset_size Limit to N random relations per graph (testing).
--comprehensive_eval off Broader evaluation across all possible relations.
--fuzzy_matching off Fuzzy entity name matching in evaluation.

Project Structure

Access-Control-Policy/
├── README.md
├── access_control_run.py
├── assignment_misdirection_analyzer.py
├── datasets
│   ├── GroundTruthGraphsImages
│   ├── GroundTruthGraphsJSON
│   ├── PredictedGraphsJSON
│   └── test.md
├── experiments
│   ├── assignment_misdirection_results
│   ├── relation_classification
│   ├── relation_extraction
│   └── test.md
├── logs
├── make_assignment_misdirection_table.py
├── performance_results.csv
├── requirements.txt
├── run_assignment_misdirection_analyzer.sh
├── run_make_assignment_misdirection_table.sh
├── run_v1.sh
└── src
    ├── __init__.py
    ├── __pycache__
    ├── access_prompt.py
    ├── cli.py
    ├── config.py
    ├── core_processor.py
    ├── entity_pair_generator.py
    ├── eval_metric.py
    ├── evaluation.py
    ├── file_utils.py
    └── processing_strategies.py


Troubleshooting

Problem Solution
OpenAI API key not provided Set OPENAI_API_KEY in .env or export it in your shell.
ModuleNotFoundError: No module named 'src' Run from the repository root (Access-Control-Policy/).
Wrong output directory Set --output explicitly; the default auto-nests under experiments/<method>/.
Truncated model responses Increase max_tokens in src/config.pyAPIConfig.
Numbers differ slightly from the paper OpenAI VLM endpoints are non-deterministic; small variations are expected. Qualitative trends across SAG/MAG/DAG and legend conditions should be preserved.

Security Notes

  • Never commit .env, API keys, or paths that reveal personal information.
  • Use these placeholders in all shared docs: <OPENAI_API_KEY>, <YOUR_NAME>, <YOUR_ORG>, <YOUR_EMAIL>.
  • .env is listed in .gitignore.

License

This project is licensed under the MIT License.
See the LICENSE file for details.


Citation

@inproceedings{lawal2026v2p,
  title   = {Rethinking Access-Control Policy Authoring as a Multimodal Challenge},
  author  = {Lawal, Sherifdeen and Zhao, Xingmeng and Navarroespino, Enrique and Rios, Anthony and Krishnan, Ram},
  booktitle = {Proceedings of the ACM Symposium on Access Control Models and Technologies (SACMAT)},
  year    = {2026},
  note    = {Artifact: Vision-to-Policy (V2P)},
  url     = {https://github.com/UTSA-ICS/Rethinking-Access-Control-Policy-Authoring-as-a-Multimodal-Challenge}
}

About

Vision-to-Policy (V2P): Translating access-control policy diagrams into structured specifications via vision-language models (SACMAT 2026 artifact).

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors