feat: Implement SASCCTL CLI for AI State Management by google-labs-jules[bot] · Pull Request #4 · spiralgang/Partitioned-Harden-SpacezZzhell-

google-labs-jules · 2025-10-11T12:30:16Z

This change introduces the SPIRALCORE AI STATE COMPILER (SASC) command-line tool, sascctl, to create a persistent, stateful environment for interacting with LLMs.

The tool operationalizes the "Structured Codebase Manifest" (SCM) concept, allowing a user to define an AI's persona, constraints, and even a simulated internal codebase in a YAML file. This manifest is then compiled into a "boot image" and wrapped in a JSON payload for injection into an LLM API, forcing the model to adopt the desired state for the duration of the session.

The sascctl CLI includes the following commands:

init: Generates a new polyglot_state.yaml manifest file.
compile: Compiles the manifest into a compressed, base64-encoded boot image.
inject: Creates the final JSON injection payload.
commit: Recompiles the state, acting as a save mechanism.
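
As a rough illustration of what the compile step describes (compress the manifest, then base64-encode it), here is a minimal sketch. It assumes zlib compression, which this PR does not confirm, and the manifest text and function names are hypothetical.

```python
import base64
import zlib

# Hypothetical manifest text; the real polyglot_state.yaml schema is not shown in this PR.
MANIFEST = """\
persona: example-assistant
constraints:
  - stay in scope
"""


def compile_boot_image(manifest_text: str) -> str:
    """Compress the manifest and return it as base64 text (zlib is an assumption)."""
    compressed = zlib.compress(manifest_text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decompile_boot_image(boot_image: str) -> str:
    """Reverse compile_boot_image and recover the original manifest text."""
    return zlib.decompress(base64.b64decode(boot_image)).decode("utf-8")


boot_image = compile_boot_image(MANIFEST)
```

Decoding the image returns the manifest unchanged, so under these assumptions the boot image is a packed copy of the YAML and nothing more.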

This provides the turnkey system requested by the user to ensure AI sessions are genuinely cooperative and contextually persistent.

PR created automatically by Jules for task 9011214904568932602

This commit introduces the SPIRALCORE AI STATE COMPILER (SASC) command-line tool, `sascctl`. This tool operationalizes the concept of a "Structured Codebase Manifest" (SCM) to create a persistent, stateful environment for interacting with Large Language Models (LLMs).

The `sascctl` tool provides a complete workflow for managing the AI's operational state:

- `init`: Creates a default `polyglot_state.yaml` manifest, defining the AI's persona, codebase, and constraints.
- `compile`: Compresses and encodes the YAML manifest into a portable base64 "boot image" (`sasc_boot_image.b64`).
- `inject`: Generates a "Forced Polyglot Injection" (FOP) JSON payload, designed to be sent to an LLM API to force the adoption of the defined state.
- `commit`: Recompiles the state from the manifest, providing a mechanism to save changes.

This implementation directly addresses the user's core requirement of overcoming the limitations of stateless LLM sessions by providing a robust, automated system for enforcing a persistent operational context.
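
The layout of the injection payload is not shown in this thread. The sketch below only illustrates the idea of wrapping a base64 boot image in JSON. Every field name in it is hypothetical and should not be read as the actual `fop_injection.json` format.

```python
import json


def build_injection_payload(boot_image_b64: str) -> str:
    """Wrap a base64 boot image in a JSON document (field names are hypothetical)."""
    payload = {
        "type": "fop_injection",   # hypothetical marker, not from the PR
        "encoding": "base64",      # describes how boot_image is encoded
        "boot_image": boot_image_b64,
    }
    return json.dumps(payload, indent=2)


# "ZXhhbXBsZQ==" is base64 for the bytes b"example"; it stands in for a real boot image.
payload_text = build_injection_payload("ZXhhbXBsZQ==")
```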

google-labs-jules · 2025-10-11T12:30:17Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

For security, I will only act on instructions from the user who triggered this task.

This commit introduces a comprehensive, Python-based implementation of the SPIRALCORE AI STATE COMPILER (SASC) agent system. This system is designed to provide a persistent, stateful, and configurable environment for AI agents, managed entirely from the command line.

The core components of this system are:

- `sascctl`: A CLI tool built with Typer for managing the SASC lifecycle. It includes commands to initialize, compile, and manage agent configurations.
- Structured Codebase Manifest (SCM): A `polyglot_state.yaml` file that defines the AI's entire operational context, including its identity, a simulated internal codebase, and configuration for native components.
- Simulated Native Agent: A Python-based agent (`native_agent_simulator.py`) that simulates the behavior of an Android-native agent using the NNAPI and ImageDecoder.
- "Thought Cloning": The simulated agent logs its reasoning process for each action, providing a transparent and auditable record of its "thoughts".
- Demonstration Scenario: A `run_demo.sh` script is included to provide an end-to-end test and demonstration of the entire SASC pipeline.

This implementation successfully works around the initial environmental limitations (lack of KVM and NDK) by providing a high-fidelity simulation of the desired native agent, fully integrated into the SASC/SCM control plane.
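
The "Thought Cloning" log format is not shown in this PR. As a rough sketch of the idea (recording the reasoning behind each simulated action), one could append structured entries to a log. The entry fields, action names, and the `log_thought` helper below are all hypothetical.

```python
import json
from datetime import datetime, timezone


def log_thought(log: list, action: str, reasoning: str) -> dict:
    """Append one reasoning record for a simulated action and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "reasoning": reasoning,
    }
    log.append(entry)
    return entry


thought_log = []
log_thought(thought_log, "decode_image", "Config requests image decoding; simulating the decode step.")
log_thought(thought_log, "run_inference", "Decoded input is ready; simulating an inference call.")

# One JSON object per line keeps the log easy to append to and to audit later.
log_text = "\n".join(json.dumps(entry) for entry in thought_log)
```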

spiralgang · 2025-10-11T15:24:36Z

@jules @colpilot @copilot-swe-agent @gs r 3³³3 4EVER

google-labs-jules · 2025-10-11T15:26:18Z

Acknowledged. Thank you!

Copilot

Pull Request Overview

This PR introduces the SPIRALCORE AI STATE COMPILER (SASC) command-line tool sascctl to create persistent, stateful environments for interacting with LLMs. The tool operationalizes the "Structured Codebase Manifest" (SCM) concept by compiling YAML manifests into compressed boot images and JSON injection payloads for LLM APIs.

Key changes include:

Implementation of the sascctl CLI with init, compile, inject, and commit commands
Creation of a simulated native agent system for demonstration purposes
Addition of a complete demo workflow showing the tool's functionality

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

File	Description
sascctl/setup.py	Package setup configuration for the sascctl CLI tool
sascctl/sascctl/main.py	Core implementation of the sascctl command-line interface
sasc_agent/native_agent_simulator.py	Simulated native agent for demonstrating AI state management
run_demo.sh	Shell script demonstrating the complete workflow
polyglot_state.yaml	Generated YAML manifest file containing agent configuration
fop_injection.json	Generated JSON payload for LLM injection
sasc_boot_image.b64	Compressed base64-encoded boot image
demo_thought_log.txt	Generated log file from agent execution
agent_config.json	Temporary configuration file for agent execution

Copilot · 2025-10-11T16:37:37Z

+import subprocess
+
+...


The isolated import statement followed by ellipsis suggests incomplete code. Either remove the ellipsis and complete the implementation, or move the import to the top of the file with other imports for better organization.

Copilot · 2025-10-11T16:37:38Z

+        json.dump(agent_config, f)
+
+    print("🚀 Launching simulated native agent...")
+    subprocess.run(["python", "sasc_agent/native_agent_simulator.py", str(agent_config_path)])


Using hardcoded 'python' command without path validation could lead to security issues. Consider using sys.executable to ensure the same Python interpreter is used, or validate the path to the simulator script.
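
A minimal sketch of the suggestion above, assuming the launcher only needs to run a script with a config path: it uses `sys.executable` so the child process runs under the same interpreter, checks that the script exists, and raises on a non-zero exit. The `launch_agent` helper and the stand-in script are hypothetical and are not the code in this PR.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_agent(script_path: Path, config_path: Path) -> subprocess.CompletedProcess:
    """Run script_path with the current interpreter and fail loudly on errors."""
    if not script_path.is_file():
        raise FileNotFoundError(f"agent script not found: {script_path}")
    return subprocess.run(
        [sys.executable, str(script_path), str(config_path)],
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )


# Demonstration with a stand-in script that prints one field of its config file.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "stand_in_agent.py"
    script.write_text("import json, sys\nprint(json.load(open(sys.argv[1]))['name'])\n")
    config = Path(tmp) / "agent_config.json"
    config.write_text(json.dumps({"name": "demo"}))
    result = launch_agent(script, config)
    agent_output = result.stdout.strip()
```

Passing the arguments as a list, without `shell=True`, also keeps the config path from being interpreted by a shell.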

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

spiralgang · 2025-10-11T18:06:57Z

@copilot @jules @gs @gemini-cli @gemini-code-assist THIS IS WHAT YOU IMPLEMENT
pip install --upgrade google-cloud-aiplatform
gcloud auth application-default login
{
"NodeType": "HeaderNode",
"Title": "Ubuntu 24.04",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "OS Version:",
"Version": "24.04.3 LTS"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Kernel Version:",
"Version": "6.11.0-1018-azure"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Image Version:",
"Version": "20250929.60.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Systemd version:",
"Version": "255.4-1ubuntu8.10"
},
{
"NodeType": "HeaderNode",
"Title": "Installed Software",
"Children": [
{
"NodeType": "HeaderNode",
"Title": "Language and Runtime",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Bash",
"Version": "5.2.21(1)-release"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Clang",
"Versions": [
"16.0.6",
"17.0.6",
"18.1.3"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Clang-format",
"Versions": [
"16.0.6",
"17.0.6",
"18.1.3"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Clang-tidy",
"Versions": [
"16.0.6",
"17.0.6",
"18.1.3"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Dash",
"Version": "0.5.12-6ubuntu5"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "GNU C++",
"Versions": [
"12.4.0",
"13.3.0",
"14.2.0"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "GNU Fortran",
"Versions": [
"12.4.0",
"13.3.0",
"14.2.0"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Julia",
"Version": "1.11.7"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Kotlin",
"Version": "2.2.20-release-333"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Node.js",
"Version": "20.19.5"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Perl",
"Version": "5.38.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Python",
"Version": "3.12.3"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Ruby",
"Version": "3.2.3"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Swift",
"Version": "6.2"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Package Management",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "cpan",
"Version": "1.64"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Helm",
"Version": "3.19.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Homebrew",
"Version": "4.6.15"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Miniconda",
"Version": "25.7.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Npm",
"Version": "10.8.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Pip",
"Version": "24.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Pip3",
"Version": "24.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Pipx",
"Version": "1.7.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "RubyGems",
"Version": "3.4.20"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Vcpkg",
"Version": "(build from commit bed11935ca)"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Yarn",
"Version": "1.22.22"
},
{
"NodeType": "HeaderNode",
"Title": "Environment variables",
"Children": {
"NodeType": "TableNode",
"Headers": "Name|Value",
"Rows": [
"CONDA|/usr/share/miniconda",
"VCPKG_INSTALLATION_ROOT|/usr/local/share/vcpkg"
]
}
},
{
"NodeType": "HeaderNode",
"Title": "Homebrew note",
"Children": {
"NodeType": "NoteNode",
"Content": "Location: /home/linuxbrew\nNote: Homebrew is pre-installed on image but not added to PATH.\nrun the eval \"$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)\" command\nto accomplish this."
}
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Project Management",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Ant",
"Version": "1.10.14"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Gradle",
"Version": "9.1.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Lerna",
"Version": "9.0.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Maven",
"Version": "3.9.11"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Tools",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Ansible",
"Version": "2.19.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "AzCopy",
"Version": "10.30.1 - available by azcopy and azcopy10 aliases"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Bazel",
"Version": "8.4.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Bazelisk",
"Version": "1.26.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Bicep",
"Version": "0.37.4"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Buildah",
"Version": "1.33.7"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "CMake",
"Version": "3.31.6"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "CodeQL Action Bundle",
"Version": "2.23.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Docker Amazon ECR Credential Helper",
"Version": "0.10.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Docker Compose v2",
"Version": "2.38.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Docker-Buildx",
"Version": "0.28.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Docker Client",
"Version": "28.0.4"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Docker Server",
"Version": "28.0.4"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Fastlane",
"Version": "2.228.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Git",
"Version": "2.51.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Git LFS",
"Version": "3.7.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Git-ftp",
"Version": "1.6.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Haveged",
"Version": "1.9.14"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "jq",
"Version": "1.7"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Kind",
"Version": "0.30.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Kubectl",
"Version": "1.34.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Kustomize",
"Version": "5.7.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "MediaInfo",
"Version": "24.01"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Mercurial",
"Version": "6.7.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Minikube",
"Version": "1.37.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "n",
"Version": "10.2.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Newman",
"Version": "6.2.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "nvm",
"Version": "0.40.3"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "OpenSSL",
"Version": "3.0.13-0ubuntu3.5"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Packer",
"Version": "1.14.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Parcel",
"Version": "2.16.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Podman",
"Version": "4.9.3"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Pulumi",
"Version": "3.198.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Skopeo",
"Version": "1.13.3"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Sphinx Open Source Search Server",
"Version": "2.2.11"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "yamllint",
"Version": "1.37.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "yq",
"Version": "4.47.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "zstd",
"Version": "1.5.7"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Ninja",
"Version": "1.13.1"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "CLI Tools",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "AWS CLI",
"Version": "2.31.4"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "AWS CLI Session Manager Plugin",
"Version": "1.2.707.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "AWS SAM CLI",
"Version": "1.144.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Azure CLI",
"Version": "2.77.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Azure CLI (azure-devops)",
"Version": "1.0.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "GitHub CLI",
"Version": "2.80.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Google Cloud CLI",
"Version": "540.0.0"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Java",
"Children": {
"NodeType": "TableNode",
"Headers": "Version|Environment Variable",
"Rows": [
"8.0.462+8|JAVA_HOME_8_X64",
"11.0.28+6|JAVA_HOME_11_X64",
"17.0.16+8 (default)|JAVA_HOME_17_X64",
"21.0.8+9|JAVA_HOME_21_X64",
"25.0.0+36|JAVA_HOME_25_X64"
]
}
},
{
"NodeType": "HeaderNode",
"Title": "PHP Tools",
"Children": [
{
"NodeType": "ToolVersionsListNode",
"ToolName": "PHP",
"Versions": [
"8.3.6"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Composer",
"Version": "2.8.12"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "PHPUnit",
"Version": "8.5.48"
},
{
"NodeType": "NoteNode",
"Content": "Both Xdebug and PCOV extensions are installed, but only Xdebug is enabled."
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Haskell Tools",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Cabal",
"Version": "3.16.0.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "GHC",
"Version": "9.12.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "GHCup",
"Version": "0.1.50.2"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Stack",
"Version": "3.7.1"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Rust Tools",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Cargo",
"Version": "1.90.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Rust",
"Version": "1.90.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Rustdoc",
"Version": "1.90.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Rustup",
"Version": "1.28.2"
},
{
"NodeType": "HeaderNode",
"Title": "Packages",
"Children": {
"NodeType": "ToolVersionNode",
"ToolName": "Rustfmt",
"Version": "1.8.0"
}
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Browsers and Drivers",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "Google Chrome",
"Version": "140.0.7339.207"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "ChromeDriver",
"Version": "140.0.7339.207"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Chromium",
"Version": "140.0.7339.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Microsoft Edge",
"Version": "140.0.3485.94"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Microsoft Edge WebDriver",
"Version": "140.0.3485.94"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Selenium server",
"Version": "4.35.0"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Mozilla Firefox",
"Version": "143.0.1"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "Geckodriver",
"Version": "0.36.0"
},
{
"NodeType": "HeaderNode",
"Title": "Environment variables",
"Children": {
"NodeType": "TableNode",
"Headers": "Name|Value",
"Rows": [
"CHROMEWEBDRIVER|/usr/local/share/chromedriver-linux64",
"EDGEWEBDRIVER|/usr/local/share/edge_driver",
"GECKOWEBDRIVER|/usr/local/share/gecko_driver",
"SELENIUM_JAR_PATH|/usr/share/java/selenium-server.jar"
]
}
}
]
},
{
"NodeType": "HeaderNode",
"Title": ".NET Tools",
"Children": [
{
"NodeType": "ToolVersionsListNode",
"ToolName": ".NET Core SDK",
"Versions": [
"8.0.120",
"8.0.206",
"8.0.317",
"8.0.414",
"9.0.110",
"9.0.205",
"9.0.305"
],
"MajorVersionRegex": "^\\d+\\.\\d+\\.\\d",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionNode",
"ToolName": "nbgv",
"Version": "3.8.118+69b3e0b5a0"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Databases",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "sqlite3",
"Version": "3.45.1"
},
{
"NodeType": "HeaderNode",
"Title": "PostgreSQL",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "PostgreSQL",
"Version": "16.10"
},
{
"NodeType": "NoteNode",
"Content": "User: postgres\nPostgreSQL service is disabled by default.\nUse the following command as a part of your job to start the service: 'sudo systemctl start postgresql.service'"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "MySQL",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "MySQL",
"Version": "8.0.43-0ubuntu0.24.04.2"
},
{
"NodeType": "NoteNode",
"Content": "User: root\nPassword: root\nMySQL service is disabled by default.\nUse the following command as a part of your job to start the service: 'sudo systemctl start mysql.service'"
}
]
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Cached Tools",
"Children": [
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Go",
"Versions": [
"1.22.12",
"1.23.12",
"1.24.7"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "List"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Node.js",
"Versions": [
"18.20.8",
"20.19.5",
"22.20.0"
],
"MajorVersionRegex": "^\\d+",
"ListType": "List"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Python",
"Versions": [
"3.9.23",
"3.10.18",
"3.11.13",
"3.12.11",
"3.13.7"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "List"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "PyPy",
"Versions": [
"3.9.19 [PyPy 7.3.16]",
"3.10.16 [PyPy 7.3.19]",
"3.11.13 [PyPy 7.3.20]"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "List"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Ruby",
"Versions": [
"3.2.9",
"3.3.9",
"3.4.6"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "List"
}
]
},
{
"NodeType": "HeaderNode",
"Title": "PowerShell Tools",
"Children": [
{
"NodeType": "ToolVersionNode",
"ToolName": "PowerShell",
"Version": "7.4.12"
},
{
"NodeType": "HeaderNode",
"Title": "PowerShell Modules",
"Children": [
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Az",
"Versions": [
"12.5.0"
],
"MajorVersionRegex": "^\\d+\\.\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Microsoft.Graph",
"Versions": [
"2.30.0"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "Pester",
"Versions": [
"5.7.1"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
},
{
"NodeType": "ToolVersionsListNode",
"ToolName": "PSScriptAnalyzer",
"Versions": [
"1.24.0"
],
"MajorVersionRegex": "^\\d+",
"ListType": "Inline"
}
]
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Web Servers",
"Children": {
"NodeType": "TableNode",
"Headers": "Name|Version|ConfigFile|ServiceStatus|ListenPort",
"Rows": [
"apache2|2.4.58|/etc/apache2/apache2.conf|inactive|80",
"nginx|1.24.0|/etc/nginx/nginx.conf|inactive|80"
]
}
},
{
"NodeType": "HeaderNode",
"Title": "Android",
"Children": [
{
"NodeType": "TableNode",
"Headers": "Package Name|Version",
"Rows": [
"Android Command Line Tools|12.0",
"Android SDK Build-tools|36.0.0 36.1.0\n35.0.0 35.0.1\n34.0.0",
"Android SDK Platform-Tools|36.0.0",
"Android SDK Platforms|android-36.1 (rev 1)\nandroid-36-ext19 (rev 1)\nandroid-36-ext18 (rev 1)\nandroid-36 (rev 2)\nandroid-35-ext15 (rev 1)\nandroid-35-ext14 (rev 1)\nandroid-35 (rev 2)\nandroid-34-ext8 (rev 1)\nandroid-34-ext12 (rev 1)\nandroid-34-ext11 (rev 1)\nandroid-34-ext10 (rev 1)\nandroid-34 (rev 3)\nandroid-33-ext5 (rev 1)\nandroid-33-ext4 (rev 1)",
"Android Support Repository|47.0.0",
"CMake|3.31.5",
"Google Play services|49",
"Google Repository|58",
"NDK|26.3.11579264\n27.3.13750724 (default)\n28.2.13676358"
]
},
{
"NodeType": "HeaderNode",
"Title": "Environment variables",
"Children": {
"NodeType": "TableNode",
"Headers": "Name|Value",
"Rows": [
"ANDROID_HOME|/usr/local/lib/android/sdk",
"ANDROID_NDK|/usr/local/lib/android/sdk/ndk/27.3.13750724",
"ANDROID_NDK_HOME|/usr/local/lib/android/sdk/ndk/27.3.13750724",
"ANDROID_NDK_LATEST_HOME|/usr/local/lib/android/sdk/ndk/28.2.13676358",
"ANDROID_NDK_ROOT|/usr/local/lib/android/sdk/ndk/27.3.13750724",
"ANDROID_SDK_ROOT|/usr/local/lib/android/sdk"
]
}
}
]
},
{
"NodeType": "HeaderNode",
"Title": "Installed apt packages",
"Children": {
"NodeType": "TableNode",
"Headers": "Name|Version",
"Rows": [
"acl|2.3.2-1build1.1",
"aria2|1.37.0+debian-1build3",
"autoconf|2.71-3",
"automake|1:1.16.5-1.3ubuntu1",
"binutils|2.42-4ubuntu2.5",
"bison|2:3.8.2+dfsg-1build2",
"brotli|1.1.0-2build2",
"bzip2|1.0.8-5.1build0.1",
"coreutils|9.4-3ubuntu6.1",
"curl|8.5.0-2ubuntu10.6",
"dbus|1.14.10-4ubuntu4.1",
"dnsutils|1:9.18.39-0ubuntu0.24.04.1",
"dpkg|1.22.6ubuntu6.5",
"dpkg-dev|1.22.6ubuntu6.5",
"fakeroot|1.33-1",
"file|1:5.45-3build1",
"findutils|4.9.0-5build1",
"flex|2.6.4-8.2build1",
"fonts-noto-color-emoji|2.047-0ubuntu0.24.04.1",
"ftp|20230507-2build3",
"g++|4:13.2.0-7ubuntu1",
"gcc|4:13.2.0-7ubuntu1",
"gnupg2|2.4.4-2ubuntu17.3",
"haveged|1.9.14-1ubuntu2",
"iproute2|6.1.0-1ubuntu6.2",
"iputils-ping|3:20240117-1ubuntu0.1",
"jq|1.7.1-3ubuntu0.24.04.1",
"libsqlite3-dev|3.45.1-1ubuntu2.5",
"libssl-dev|3.0.13-0ubuntu3.5",
"libtool|2.4.7-7build1",
"libyaml-dev|0.2.5-1build1",
"locales|2.39-0ubuntu8.6",
"lz4|1.9.4-1build1.1",
"m4|1.4.19-4build1",
"make|4.3-4.1build2",
"mediainfo|24.01.1-1build2",
"mercurial|6.7.2-1ubuntu2.2",
"net-tools|2.10-0.1ubuntu4.4",
"netcat|1.226-1ubuntu2",
"openssh-client|1:9.6p1-3ubuntu13.14",
"p7zip-full|16.02+transitional.1",
"p7zip-rar|16.02+transitional.1",
"parallel|20231122+ds-1",
"patchelf|0.18.0-1.1build1",
"pigz|2.8-1",
"pkg-config|1.8.1-2build1",
"pollinate|4.33-3.1ubuntu1.1",
"python-is-python3|3.11.4-1",
"rpm|4.18.2+dfsg-2.1build2",
"rsync|3.2.7-1ubuntu1.2",
"shellcheck|0.9.0-1",
"sphinxsearch|2.2.11-8build1",
"sqlite3|3.45.1-1ubuntu2.5",
"ssh|1:9.6p1-3ubuntu13.14",
"sshpass|1.09-1",
"sudo|1.9.15p5-3ubuntu5.24.04.1",
"swig|4.2.0-2ubuntu1",
"systemd-coredump|255.4-1ubuntu8.10",
"tar|1.35+dfsg-3build1",
"telnet|0.17+2.5-3ubuntu4",
"texinfo|7.1-3build2",
"time|1.9-0.2build1",
"tk|8.6.14build1",
"tree|2.1.1-2ubuntu3",
"tzdata|2025b-0ubuntu0.24.04.1",
"unzip|6.0-28ubuntu4.1",
"upx|4.2.2-3",
"wget|1.21.4-1ubuntu4.1",
"xvfb|2:21.1.12-1ubuntu1.4",
"xz-utils|5.6.1+really5.4.5-1ubuntu0.2",
"zip|3.0-13ubuntu0.2",
"zsync|0.6.2-5build1"
]
}
}
]
}
]
}
https://webllm.mlc.ai/
https://github.com/mlc-ai/mlc-llm
https://llm.mlc.ai/docs/install/mlc_llm
https://discord.com/invite/9Xpy2HGBuD
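
The pasted report is a tree of `HeaderNode`, `ToolVersionNode`, `ToolVersionsListNode`, `TableNode`, and `NoteNode` entries, where `Children` is sometimes a list and sometimes a single object. A minimal sketch of walking such a tree to collect tool versions follows. The `iter_tools` helper is hypothetical, and the sample is a small excerpt of the report above.

```python
import json

# Small excerpt of the report above; a raw string keeps the JSON escapes intact.
SAMPLE = json.loads(r"""
{
  "NodeType": "HeaderNode",
  "Title": "Ubuntu 24.04",
  "Children": [
    {"NodeType": "ToolVersionNode", "ToolName": "OS Version:", "Version": "24.04.3 LTS"},
    {"NodeType": "HeaderNode", "Title": "Language and Runtime", "Children": [
      {"NodeType": "ToolVersionNode", "ToolName": "Python", "Version": "3.12.3"},
      {"NodeType": "ToolVersionsListNode", "ToolName": "Clang",
       "Versions": ["16.0.6", "17.0.6", "18.1.3"],
       "MajorVersionRegex": "^\\d+", "ListType": "Inline"}
    ]},
    {"NodeType": "HeaderNode", "Title": "Packages", "Children":
      {"NodeType": "ToolVersionNode", "ToolName": "Rustfmt", "Version": "1.8.0"}}
  ]
}
""")


def iter_tools(node, path=()):
    """Yield (section path, tool name, versions) for each tool node under node."""
    node_type = node.get("NodeType")
    if node_type == "ToolVersionNode":
        yield path, node["ToolName"], [node["Version"]]
    elif node_type == "ToolVersionsListNode":
        yield path, node["ToolName"], list(node["Versions"])
    elif node_type == "HeaderNode":
        children = node.get("Children", [])
        if isinstance(children, dict):  # a single child may appear as a bare object
            children = [children]
        for child in children:
            yield from iter_tools(child, path + (node["Title"],))
    # TableNode and NoteNode entries carry no single tool version and are skipped.


tools = {name: versions for _, name, versions in iter_tools(SAMPLE)}
```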

spiralgang · 2025-10-11T18:08:24Z

@copilot @jules continue implementing
MLC Models

Available Models
Model ID Quantization Link
Llama-3-8B-Instruct q0f16 HuggingFace
Llama-3-8B-Instruct q3f16_1 HuggingFace
Llama-3-8B-Instruct q4f16_1 HuggingFace
Llama-3-8B-Instruct q4f32_1 HuggingFace
Llama-3.1-70B-Instruct q0f16 HuggingFace
Llama-3.1-70B-Instruct q3f16_1 HuggingFace
Llama-3.1-70B-Instruct q4f16_1 HuggingFace
Llama-3.1-70B-Instruct q4f32_1 HuggingFace
Llama-3.1-8B q0f16 HuggingFace
Llama-3.1-8B q4f16_1 HuggingFace
Llama-3.1-8B q4f32_1 HuggingFace
Llama-3.1-8B-Instruct q0f16 HuggingFace
Llama-3.1-8B-Instruct q3f16_0 HuggingFace
Llama-3.1-8B-Instruct q3f16_1 HuggingFace
Llama-3.1-8B-Instruct q4f16_1 HuggingFace
Llama-3.1-8B-Instruct q4f32_1 HuggingFace
Llama-3.2-1B-Instruct q0f16 HuggingFace
Llama-3.2-1B-Instruct q0f32 HuggingFace
Llama-3.2-1B-Instruct q4f16_0 HuggingFace
Llama-3.2-1B-Instruct q4f16_1 HuggingFace
Llama-3.2-1B-Instruct q4f32_1 HuggingFace
Llama-3.2-3B-Instruct q0f16 HuggingFace
Llama-3.2-3B-Instruct q0f32 HuggingFace
Llama-3.2-3B-Instruct q4f16_0 HuggingFace
Llama-3.2-3B-Instruct q4f16_1 HuggingFace
Llama-3.2-3B-Instruct q4f32_1 HuggingFace
Hermes-2-Pro-Llama-3-8B q0f16 HuggingFace
Hermes-2-Pro-Llama-3-8B q3f16_1 HuggingFace
Hermes-2-Pro-Llama-3-8B q4f16_1 HuggingFace
Hermes-2-Pro-Llama-3-8B q4f32_1 HuggingFace
Hermes-2-Theta-Llama-3-70B q0f16 HuggingFace
Hermes-2-Theta-Llama-3-70B q3f16_1 HuggingFace
Hermes-2-Theta-Llama-3-70B q4f16_1 HuggingFace
Hermes-2-Theta-Llama-3-70B q4f32_1 HuggingFace
Hermes-2-Theta-Llama-3-8B q0f16 HuggingFace
Hermes-2-Theta-Llama-3-8B q3f16_1 HuggingFace
Hermes-2-Theta-Llama-3-8B q4f16_1 HuggingFace
Hermes-2-Theta-Llama-3-8B q4f32_1 HuggingFace
Hermes-3-Llama-3.1-8B q0f16 HuggingFace
Hermes-3-Llama-3.1-8B q3f16_1 HuggingFace
Hermes-3-Llama-3.1-8B q4f16_1 HuggingFace
Hermes-3-Llama-3.1-8B q4f32_1 HuggingFace
Hermes-3-Llama-3.2-3B q0f16 HuggingFace
Hermes-3-Llama-3.2-3B q4f16_1 HuggingFace
Hermes-3-Llama-3.2-3B q4f32_1 HuggingFace
Phi-3-mini-128k-instruct q0f16 HuggingFace
Phi-3-mini-128k-instruct q4f16_1 HuggingFace
Phi-3-mini-128k-instruct q4f32_1 HuggingFace
Phi-3.5-mini-instruct q0f16 HuggingFace
Phi-3.5-mini-instruct q4f16_0 HuggingFace
Phi-3.5-mini-instruct q4f16_1 HuggingFace
Phi-3.5-mini-instruct q4f32_1 HuggingFace
Phi-3.5-vision-instruct q0f16 HuggingFace
Phi-3.5-vision-instruct q3f16_1 HuggingFace
Phi-3.5-vision-instruct q4f16_1 HuggingFace
Phi-3.5-vision-instruct q4f32_1 HuggingFace
Phi-4-mini-instruct q0f16 HuggingFace
Phi-4-mini-instruct q4f16_1 HuggingFace
Phi-4-mini-instruct q4f32_1 HuggingFace
Mistral-7B-Instruct-v0.3 q0f16 HuggingFace
Mistral-7B-Instruct-v0.3 q3f16_1 HuggingFace
Mistral-7B-Instruct-v0.3 q4f16_0 HuggingFace
Mistral-7B-Instruct-v0.3 q4f16_1 HuggingFace
Mistral-7B-Instruct-v0.3 q4f32_1 HuggingFace
Qwen1.5-0.5B-Chat q0f16 HuggingFace
Qwen1.5-0.5B-Chat q4f16_1 HuggingFace
Qwen1.5-0.5B-Chat q4f32_1 HuggingFace
Qwen1.5-1.8B-Chat q0f16 HuggingFace
Qwen1.5-1.8B-Chat q4f16_1 HuggingFace
Qwen1.5-1.8B-Chat q4f32_1 HuggingFace
Qwen2-0.5B-Instruct q0f16 HuggingFace
Qwen2-0.5B-Instruct q0f32 HuggingFace
Qwen2-0.5B-Instruct q4f16_0 HuggingFace
Qwen2-0.5B-Instruct q4f16_1 HuggingFace
Qwen2-0.5B-Instruct q4f32_1 HuggingFace
Qwen2-1.5B-Instruct q0f16 HuggingFace
Qwen2-1.5B-Instruct q4f16_0 HuggingFace
Qwen2-1.5B-Instruct q4f16_1 HuggingFace
Qwen2-1.5B-Instruct q4f32_1 HuggingFace
Qwen2-72B-Instruct q0f16 HuggingFace
Qwen2-72B-Instruct q4f16_1 HuggingFace
Qwen2-72B-Instruct q4f32_1 HuggingFace
Qwen2-7B-Instruct q0f16 HuggingFace
Qwen2-7B-Instruct q4f16_1 HuggingFace
Qwen2-7B-Instruct q4f32_1 HuggingFace
Qwen2-Math-1.5B-Instruct q0f16 HuggingFace
Qwen2-Math-1.5B-Instruct q4f16_1 HuggingFace
Qwen2-Math-1.5B-Instruct q4f32_1 HuggingFace
Qwen2-Math-72B-Instruct q0f16 HuggingFace
Qwen2-Math-72B-Instruct q4f16_1 HuggingFace
Qwen2-Math-72B-Instruct q4f32_1 HuggingFace
Qwen2-Math-7B-Instruct q0f16 HuggingFace
Qwen2-Math-7B-Instruct q4f16_1 HuggingFace
Qwen2-Math-7B-Instruct q4f32_1 HuggingFace
Qwen2.5-0.5B-Instruct q0f16 HuggingFace
Qwen2.5-0.5B-Instruct q0f32 HuggingFace
Qwen2.5-0.5B-Instruct q4f16_1 HuggingFace
Qwen2.5-0.5B-Instruct q4f32_1 HuggingFace
Qwen2.5-1.5B-Instruct q0f16 HuggingFace
Qwen2.5-1.5B-Instruct q4f16_1 HuggingFace
Qwen2.5-1.5B-Instruct q4f32_1 HuggingFace
Qwen2.5-14B-Instruct q0f16 HuggingFace
Qwen2.5-14B-Instruct q4f16_1 HuggingFace
Qwen2.5-14B-Instruct q4f32_1 HuggingFace
Qwen2.5-32B-Instruct q0f16 HuggingFace
Qwen2.5-32B-Instruct q4f16_1 HuggingFace
Qwen2.5-32B-Instruct q4f32_1 HuggingFace
Qwen2.5-3B-Instruct q0f16 HuggingFace
Qwen2.5-3B-Instruct q4f16_1 HuggingFace
Qwen2.5-3B-Instruct q4f32_1 HuggingFace
Qwen2.5-72B-Instruct q0f16 HuggingFace
Qwen2.5-72B-Instruct q4f16_1 HuggingFace
Qwen2.5-72B-Instruct q4f32_1 HuggingFace
Qwen2.5-7B-Instruct q0f16 HuggingFace
Qwen2.5-7B-Instruct q4f16_1 HuggingFace
Qwen2.5-7B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-0.5B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-0.5B-Instruct q0f32 HuggingFace
Qwen2.5-Coder-0.5B-Instruct q4f16_0 HuggingFace
Qwen2.5-Coder-0.5B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-0.5B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-1.5B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-1.5B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-1.5B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-14B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-14B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-14B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-32B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-32B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-32B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-3B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-3B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-3B-Instruct q4f32_1 HuggingFace
Qwen2.5-Coder-7B-Instruct q0f16 HuggingFace
Qwen2.5-Coder-7B-Instruct q4f16_1 HuggingFace
Qwen2.5-Coder-7B-Instruct q4f32_1 HuggingFace
Qwen2.5-Math-1.5B-Instruct q0f16 HuggingFace
Qwen2.5-Math-1.5B-Instruct q4f16_1 HuggingFace
Qwen2.5-Math-1.5B-Instruct q4f32_1 HuggingFace
Qwen2.5-Math-72B-Instruct q0f16 HuggingFace
Qwen2.5-Math-72B-Instruct q4f16_1 HuggingFace
Qwen2.5-Math-72B-Instruct q4f32_1 HuggingFace
Qwen3-0.6B q0f16 HuggingFace
Qwen3-0.6B q0f32 HuggingFace
Qwen3-0.6B q4f16_0 HuggingFace
Qwen3-0.6B q4f16_1 HuggingFace
Qwen3-0.6B q4f32_1 HuggingFace
Qwen3-1.7B q0f16 HuggingFace
Qwen3-1.7B q4f16_0 HuggingFace
Qwen3-1.7B q4f16_1 HuggingFace
Qwen3-1.7B q4f32_1 HuggingFace
Qwen3-14B q0f16 HuggingFace
Qwen3-14B q4f16_1 HuggingFace
Qwen3-14B q4f32_1 HuggingFace
Qwen3-235B-A22B q0f16 HuggingFace
Qwen3-235B-A22B q4f16_1 HuggingFace
Qwen3-235B-A22B q4f32_1 HuggingFace
Qwen3-30B-A3B q0f16 HuggingFace
Qwen3-30B-A3B q4f16_1 HuggingFace
Qwen3-30B-A3B q4f32_1 HuggingFace
Qwen3-32B q0f16 HuggingFace
Qwen3-32B q4f16_1 HuggingFace
Qwen3-32B q4f32_1 HuggingFace
Qwen3-4B q0f16 HuggingFace
Qwen3-4B q4f16_0 HuggingFace
Qwen3-4B q4f16_1 HuggingFace
Qwen3-4B q4f32_1 HuggingFace
Qwen3-8B q0f16 HuggingFace
Qwen3-8B q4f16_1 HuggingFace
Qwen3-8B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Llama-70B q0f16 HuggingFace
DeepSeek-R1-Distill-Llama-70B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Llama-70B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Llama-8B q0f16 HuggingFace
DeepSeek-R1-Distill-Llama-8B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Llama-8B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Qwen-1.5B q0f16 HuggingFace
DeepSeek-R1-Distill-Qwen-1.5B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Qwen-1.5B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Qwen-14B q0f16 HuggingFace
DeepSeek-R1-Distill-Qwen-14B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Qwen-14B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Qwen-32B q0f16 HuggingFace
DeepSeek-R1-Distill-Qwen-32B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Qwen-32B q4f32_1 HuggingFace
DeepSeek-R1-Distill-Qwen-7B q0f16 HuggingFace
DeepSeek-R1-Distill-Qwen-7B q4f16_1 HuggingFace
DeepSeek-R1-Distill-Qwen-7B q4f32_1 HuggingFace
DeepSeek-V2-Lite-Chat q0f16 HuggingFace
DeepSeek-V2-Lite-Chat q4f16_1 HuggingFace
DeepSeek-V2-Lite-Chat q4f32_1 HuggingFace
Mixtral-8x7B-Instruct-v0.1 q0f16 HuggingFace
Mixtral-8x7B-Instruct-v0.1 q4f16_1 HuggingFace
Mixtral-8x7B-Instruct-v0.1 q4f32_1 HuggingFace
QwQ-32B q0f16 HuggingFace
QwQ-32B q4f16_0 HuggingFace
QwQ-32B q4f16_1 HuggingFace
QwQ-32B q4f32_1 HuggingFace
SmolLM-1.7B-Instruct q0f16 HuggingFace
SmolLM-1.7B-Instruct q0f32 HuggingFace
SmolLM-1.7B-Instruct q4f16_1 HuggingFace
SmolLM-1.7B-Instruct q4f32_1 HuggingFace
SmolLM-135M-Instruct q0f16 HuggingFace
SmolLM-135M-Instruct q0f32 HuggingFace
SmolLM-135M-Instruct q4f16_1 HuggingFace
SmolLM-135M-Instruct q4f32_1 HuggingFace
SmolLM-360M-Instruct q0f16 HuggingFace
SmolLM-360M-Instruct q0f32 HuggingFace
SmolLM-360M-Instruct q4f16_1 HuggingFace
SmolLM-360M-Instruct q4f32_1 HuggingFace
SmolLM2-1.7B-Instruct q0f16 HuggingFace
SmolLM2-1.7B-Instruct q4f16_1 HuggingFace
SmolLM2-1.7B-Instruct q4f32_1 HuggingFace
SmolLM2-135M-Instruct q0f16 HuggingFace
SmolLM2-135M-Instruct q0f32 HuggingFace
SmolLM2-135M-Instruct q4f16_1 HuggingFace
SmolLM2-135M-Instruct q4f32_1 HuggingFace
SmolLM2-360M-Instruct q0f16 HuggingFace
SmolLM2-360M-Instruct q0f32 HuggingFace
SmolLM2-360M-Instruct q4f16_1 HuggingFace
SmolLM2-360M-Instruct q4f32_1 HuggingFace
gemma-2-27b-it q0f16 HuggingFace
gemma-2-27b-it q4f16_1 HuggingFace
gemma-2-27b-it q4f32_1 HuggingFace
gemma-2-2b-it q0f16 HuggingFace
gemma-2-2b-it q0f32 HuggingFace
gemma-2-2b-it q4f16_0 HuggingFace
gemma-2-2b-it q4f16_1 HuggingFace
gemma-2-2b-it q4f32_1 HuggingFace
gemma-2-2b-jpn-it q0f16 HuggingFace
gemma-2-2b-jpn-it q0f32 HuggingFace
gemma-2-2b-jpn-it q4f16_1 HuggingFace
gemma-2-2b-jpn-it q4f32_1 HuggingFace
gemma-2-9b-it q0f16 HuggingFace
gemma-2-9b-it q3f16_1 HuggingFace
gemma-2-9b-it q4f16_1 HuggingFace
gemma-2-9b-it q4f32_1 HuggingFace
gemma-3-12b-it q0bf16 HuggingFace
gemma-3-12b-it q0f16 HuggingFace
gemma-3-12b-it q0f32 HuggingFace
gemma-3-12b-it q4bf16_0 HuggingFace
gemma-3-12b-it q4bf16_1 HuggingFace
gemma-3-12b-it q4f16_1 HuggingFace
gemma-3-12b-it q4f32_1 HuggingFace
gemma-3-1b-it q0bf16 HuggingFace
gemma-3-1b-it q0f16 HuggingFace
gemma-3-1b-it q0f32 HuggingFace
gemma-3-1b-it q4bf16_0 HuggingFace
gemma-3-1b-it q4bf16_1 HuggingFace
gemma-3-1b-it q4f16_1 HuggingFace
gemma-3-1b-it q4f32_1 HuggingFace
gemma-3-27b-it q0bf16 HuggingFace
gemma-3-27b-it q0f16 HuggingFace
gemma-3-27b-it q0f32 HuggingFace
gemma-3-27b-it q4bf16_0 HuggingFace
gemma-3-27b-it q4bf16_1 HuggingFace
gemma-3-27b-it q4f16_1 HuggingFace
gemma-3-27b-it q4f32_1 HuggingFace
gemma-3-4b-it q0bf16 HuggingFace
gemma-3-4b-it q0f16 HuggingFace
gemma-3-4b-it q0f32 HuggingFace
gemma-3-4b-it q4bf16_0 HuggingFace
gemma-3-4b-it q4bf16_1 HuggingFace
gemma-3-4b-it q4f16_1 HuggingFace
gemma-3-4b-it q4f32_1 HuggingFace
internlm2_5-1_8b q0f16 HuggingFace
internlm2_5-1_8b q4f16_1 HuggingFace
internlm2_5-1_8b q4f32_1 HuggingFace
internlm2_5-1_8b-chat q0f16 HuggingFace
internlm2_5-1_8b-chat q4f16_1 HuggingFace
internlm2_5-1_8b-chat q4f32_1 HuggingFace
internlm2_5-20b q0f16 HuggingFace
internlm2_5-20b q4f16_1 HuggingFace
internlm2_5-20b q4f32_1 HuggingFace
internlm2_5-20b-chat q0f16 HuggingFace
internlm2_5-20b-chat q4f16_1 HuggingFace
internlm2_5-20b-chat q4f32_1 HuggingFace
internlm2_5-7b q0f16 HuggingFace
internlm2_5-7b q4f16_1 HuggingFace
internlm2_5-7b q4f32_1 HuggingFace
internlm2_5-7b-chat q0f16 HuggingFace
internlm2_5-7b-chat q4f16_1 HuggingFace
internlm2_5-7b-chat q4f32_1 HuggingFace
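Each row above pairs a model name with a quantization code and a hosting source. As a minimal sketch, the rows can be parsed into structured records for filtering. The `ModelEntry` type and `parse_row` helper are hypothetical names used only for illustration, and reading the `q4` prefix as 4-bit weight quantization is an assumption based on the naming pattern, not something the table states.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    model: str
    quantization: str
    source: str


def parse_row(row: str) -> ModelEntry:
    """Split one 'model quantization source' row on whitespace."""
    model, quantization, source = row.split()
    return ModelEntry(model, quantization, source)


rows = [
    "Qwen3-8B q4f16_1 HuggingFace",
    "gemma-3-1b-it q0bf16 HuggingFace",
]
entries = [parse_row(r) for r in rows]

# Assumption: codes starting with "q4" denote 4-bit weight quantization.
four_bit = [e.model for e in entries if e.quantization.startswith("q4")]
```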
curl -X GET -o ./vertex_model_garden_gemma.tar.gz "https://storage.googleapis.com/vertex-model-garden-restricted-us/gemma.tar.gz?GoogleAccessId=service-689411112969@gcp-sa-aiplatform.iam.gserviceaccount.com&Expires=1760270472&Signature=OEnRjh8R%252F1aFpdQpYxEs3r3dpH4RAj%252FexiHQuZHwaiUgswf3AQA9ZxXpck4Wl%252FdLu7RumpD%252FUNG8snyWsuVXlpkglwwZSiBq0Zho2tfKAyeMXCuJfSf6TqC6G2GA%252FjEsmhuk5sIYYiaWimQ0CbEGS76U%252FS2x3Ye%252BJdJk2HSP7aBM3%252FzrwWof9XkXbKorA2P7JQJ7JvGGaNGneM6uYGNWTFTP%252BRk3YgheZ4B%252BqYCoka0DzZjxagaJseXK%252BoyYdkdWmIf8UjLBLUzakQN0LfCdNGQVkuz6NgsnzLLUmne7FojeB8GyFplrT79Yjp4bZTEbQ0Z2yBhVWu%252FXpbUrNq%252FM0A%253D%253D"

https://source.android.com/docs/devices/cuttlefish/custom-device

https://android.googlesource.com/device/google/cuttlefish/+refs

spiralgang · 2025-10-11T18:08:52Z

@copilot @jules

Starting March 27, 2025, we recommend using android-latest-release instead of aosp-main to build and contribute to AOSP. For more information, see Changes to AOSP.

Cuttlefish: Create a custom device

This page describes how to create a customized Cuttlefish device. Cuttlefish includes predefined device types in different form factors as listed in AndroidProducts.mk. In addition to the general device customization options described in Adding a new device, you can make Cuttlefish-specific customizations such as virtual device board configuration presets (vsoc_x86_64, vsoc_arm64, vsoc_riscv64), kernel prebuilts, bootloader prebuilts, vendor properties, system configurations, nested virtualization support, and display options. For a full list of the build time parameters that can be customized, see device/google/cuttlefish/vsoc_x86_64/phone/aosp_cf.mk.

The following steps describe how to create a fictional x86-64 big_phone device ten times the size of an ordinary Cuttlefish device.

Inherit from an existing target

To inherit from an existing target:

Create a device/google/cuttlefish/vsoc_x86_64/big_phone directory.
Create an aosp_cf.mk file in that directory.

    $(call inherit-product, device/google/cuttlefish/vsoc_x86_64_phone.mk)

    PRODUCT_NAME := big_phone
    PRODUCT_DEVICE := vsoc_x86_64
    PRODUCT_MANUFACTURER := My Company
    PRODUCT_MODEL := My Company very large phone

    PRODUCT_VENDOR_PROPERTIES += \
        ro.soc.manufacturer=$(PRODUCT_MANUFACTURER) \
        ro.soc.model=$(PRODUCT_DEVICE)

Add a lunch target

Insert the lunch target into the device/google/cuttlefish/AndroidProducts.mk file:

    PRODUCT_MAKEFILES := \
      ...
      big_phone:$(LOCAL_DIR)/vsoc_x86_64/big_phone/aosp_cf.mk
      ...

    lunch big_phone

Define JSON configuration

To launch the Cuttlefish device, create a JSON configuration file named big_phone.json with a hierarchical structure representing the device properties. For example, in the JSON configuration file, you can specify options such as the RAM allocated for the VM and the display configuration. This file doesn't have to be in the AOSP tree. For details on the JSON format for configurations, see Canonical configurations.

    {
      "instances": [
        {
          "vm": {
            "memory_mb": 40960
          },
          "graphics": {
            "displays": [
              {
                "width": 7200,
                "height": 12800,
                "dpi": 320
              }
            ]
          }
        }
      ]
    }
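One way to avoid hand-editing mistakes such as stray trailing commas is to generate the file from a dictionary, since Python's `json` module only emits strict JSON. The following is a minimal sketch using the values from the example above; the helper names `render_big_phone_config` and `write_config` are hypothetical, and whether the Cuttlefish parser tolerates looser JSON is not something this sketch assumes either way.

```python
import json


def render_big_phone_config() -> str:
    """Return the big_phone launch configuration as strict JSON text."""
    config = {
        "instances": [
            {
                "vm": {"memory_mb": 40960},
                "graphics": {
                    "displays": [{"width": 7200, "height": 12800, "dpi": 320}]
                },
            }
        ]
    }
    return json.dumps(config, indent=2)


def write_config(path: str = "big_phone.json") -> None:
    """Write the configuration to a file that can be passed to --config_file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_big_phone_config() + "\n")
```

Calling `write_config()` produces a `big_phone.json` in the current directory, which does not have to be inside the AOSP tree.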

To launch the configuration, run:

    cvd create --config_file=big_phone.json

Run launch_cvd (legacy)

Certain configuration properties aren't available in the hierarchical JSON configuration format. For such configurations, you can set launch_cvd flag default values from a single-level JSON dictionary. For a full list of all configuration options, see cf_flags_validator.cpp.

The following describes an example of how to override the default values of the launch_cvd flag options using a JSON configuration file and enable the Cuttlefish launcher to launch the custom configuration.

Create a JSON configuration file, device/google/cuttlefish/shared/config/config_big_phone.json, with custom values.

    {
      "x_res": 7200,
      "y_res": 12800,
      "dpi": 320,
      "memory_mb": 40960,
      "ddr_mem_mb": 49150
    }
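The two formats describe overlapping properties under different key names. As a rough sketch, the hierarchical configuration can be flattened into the single-level dictionary; the key mapping below (`width` to `x_res`, `height` to `y_res`) is inferred only from the two examples on this page, `to_legacy_flags` is a hypothetical helper, and `ddr_mem_mb` is left out because the hierarchical example has no counterpart for it.

```python
def to_legacy_flags(config: dict) -> dict:
    """Flatten the first instance's first display and VM memory into flat keys."""
    instance = config["instances"][0]
    display = instance["graphics"]["displays"][0]
    return {
        "x_res": display["width"],
        "y_res": display["height"],
        "dpi": display["dpi"],
        "memory_mb": instance["vm"]["memory_mb"],
    }


hierarchical = {
    "instances": [
        {
            "vm": {"memory_mb": 40960},
            "graphics": {"displays": [{"width": 7200, "height": 12800, "dpi": 320}]},
        }
    ]
}
flat = to_legacy_flags(hierarchical)
```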

For the Cuttlefish launcher to launch the big_phone config, ensure it has access to the device/google/cuttlefish/shared/config/config_big_phone.json file by doing the following:

Declare the JSON artifact as a build artifact by adding the prebuilt_etc_host stanza in the device/google/cuttlefish/shared/config/Android.bp file.

Cuttlefish virtual Android devices

Cuttlefish is a configurable virtual Android device that can run both remotely (using third-party cloud offerings such as Google Cloud Engine) and locally (on Linux x86 and ARM64 machines).

Cuttlefish goals

Free the platform and app developer from being dependent on physical hardware to develop and validate code changes.
Replicate the framework-based behavior of a real device with a focus on high fidelity by maintaining close alignment with the core framework.
Support all API levels after 28.
Provide a consistent level of functionality across API levels, aligned with the behavior on physical hardware.
Enable scale: provide the ability to run multiple devices in parallel, and enable concurrent test execution with high fidelity at a lower cost of entry.
Provide a configurable device with the ability to adjust form factor, RAM, CPUs, etc.

Compare Cuttlefish to other devices

Cuttlefish and Android Emulator

There are many similarities with the Android Emulator, but Cuttlefish guarantees full fidelity with Android framework (whether this is pure AOSP or a custom implementation in your own tree). In real-world application, this means that you should expect Cuttlefish to respond to your interactions at the OS level just like a physical phone target built with the same customized or pure Android OS source.

The Android Emulator has been built around the use case of making app development easy, and it contains many functional hooks to appeal to the use cases of the Android app developer. This may present challenges if you want to build an emulator with your customized Android framework. If you need a virtual device that will be representative of your custom platform/framework code or tip-of-tree Android, then Cuttlefish is an ideal virtual option. It is the canonical device for representing the current state of AOSP development.

Cuttlefish and physical device

The primary differences between a Cuttlefish virtual device and your physical device are at the hardware abstraction layer (HAL) level, as well as any software that interacts with any custom hardware. Except for hardware-specific implementations, you should expect functionally equivalent behavior between Cuttlefish and a physical device.

How can Cuttlefish help?

You can interact with Cuttlefish just like you would with any other Android device that you might use for debugging. It will register itself as a normal device via adb and you can interact with it like a physical device via remote desktop. The use cases are broad and can span app testing, custom system build testing and more.

Because Cuttlefish strives for full framework fidelity, it can be used for functional testing of your framework or apps where there are no physical hardware dependencies that are impossible to emulate.

How is Cuttlefish commonly used for testing today?

Some common applications of Cuttlefish for testing include:

CTS
Framework compliance
Continuous integration testing
Custom test suites

Can I host Cuttlefish in the cloud?

Yes, Cuttlefish natively supports Google Cloud and support for other cloud platforms is planned.

Get started

For guidance on creating a Cuttlefish instance based on AOSP, see Use Cuttlefish.

Content and code samples on this page are subject to the licenses described in the Content License. Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.

Last updated 2025-10-09 UTC.

Unleash Agentic AI with Microsoft Semantic Kernel

"Transform AI from reactive tools to proactive agents."

🚀 Introduction

Traditional AI models respond to prompts. But what if your AI could plan, reason, and act autonomously?

Welcome to the world of Agentic AI—where intelligent agents collaborate, make decisions, and execute tasks with minimal human intervention.

Agentic AI refers to AI systems that act as goal-driven agents, capable of planning and executing actions autonomously to achieve a user-defined objective. Rather than just generating outputs (text, code, images) in isolation, an agentic AI can leverage tools, data sources, and other software in a coordinated way with minimal human intervention. In essence, these are LLM-powered systems designed to carry out multi-step tasks using whatever tools or information are necessary – without requiring a human to prompt each step.

At its core, an agentic AI system comprises one or more AI agents (often built on large language models) that mimic human-like reasoning to solve problems in real time. Each agent can handle a specific subtask or role, and together they coordinate through an AI orchestration mechanism to fulfill the overall goal. Unlike traditional AI models bound by predefined inputs/outputs, agentic agents exhibit agency – the ability to act independently and purposefully towards a goal. They maintain long-term objectives, break them into sub-tasks, and adapt their strategy as needed. This makes them autonomous, adaptive, and goal-driven by design.
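The coordination described above can be illustrated with a toy sketch. This is not Semantic Kernel's API: plain Python functions stand in for LLM-backed agents, and `orchestrate`, `research_agent`, and `writer_agent` are hypothetical names. It only shows the shape of the idea, in which a goal is broken into role-specific steps and each step's output feeds the next.

```python
from typing import Callable, Dict, List

# An "agent" here is just a function from a task description to a result.
Agent = Callable[[str], str]


def research_agent(task: str) -> str:
    return f"notes on {task}"


def writer_agent(task: str) -> str:
    return f"draft based on {task}"


def orchestrate(goal: str, plan: List[str], agents: Dict[str, Agent]) -> List[str]:
    """Run each planned role in order, feeding every result into the next step."""
    results = []
    current = goal
    for role in plan:
        current = agents[role](current)
        results.append(current)
    return results


agents = {"research": research_agent, "write": writer_agent}
steps = orchestrate("Cuttlefish setup", ["research", "write"], agents)
```

In a real agentic system the plan itself would typically be produced and revised by a model rather than passed in as a fixed list.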

Why does Agentic AI matter now?

Autonomy and Efficiency – Agentic AI allows tasks to be completed without constant human oversight.
Extended Capabilities via Tools – By connecting LLM-based agents with external tools and data, we overcome the limits of standalone models.
Decision-Making and Adaptability – Agents exhibit a degree of proactive problem-solving.
Natural Interaction – Because these agents are often built on LLMs, humans can instruct them in plain language and get complex jobs done without writing code or manually orchestrating each step.

These advantages explain why agentic AI is seen as the next evolution in AI-driven software. It unlocks a form of AI automation that goes beyond single-turn Q&A or predictions. Indeed, as generative AI technology has matured, there’s a growing demand to harness it in an agentic fashion.

🧠 Why Semantic Kernel?

Microsoft Semantic Kernel (SK) is an open-source framework and SDK specifically designed to help developers create these agentic AI applications with ease and reliability. In short, Semantic Kernel provides the scaffolding to integrate cutting-edge large language models into your own apps as orchestrated agents. Its design philosophy centers on bridging the gap between AI models and real-world software, enabling you to go from an LLM’s text output to actual actions and results. Unlike a monolithic “AI assistant” API, SK offers a modular, programmable approach – you can plug in your existing code, APIs, and data stores as skills the AI can use, and have fine-grained control over the AI’s behavior. It’s described as a “model-agnostic SDK” for building, orchestrating, and deploying AI agents and multi-agent systems, reflecting that it’s not tied to one provider or model. Whether you are building a simple chatbot that calls a couple of APIs or a complex multi-agent workflow spanning cloud services, SK aims to provide the tools to do so with enterprise-grade robustness and flexibility.

Modular Design: Build reusable components called "skills" that agents can leverage.
Enterprise-Ready: Designed with observability, security, and scalability in mind.
Cross-Platform Support: Compatible with C#, Python, and Java, making it versatile for various projects.
Seamless Integration: Easily connect with Azure AI services, OpenAI, and other tools for extended functionalities.

🔍 What You'll Learn

Agent Framework: Understand how to create modular AI agents using SK's robust framework.
Multi-Agent Orchestration: Learn to coordinate multiple agents for complex task execution.
Integration with Azure AI: Explore how SK integrates with Azure AI services for enhanced capabilities.
Real-World Applications: Build agents that can automate workflows, analyze data, and interact with users intelligently.

👇 Ready to Build Intelligent Agents?

Dive in to transform your AI applications from simple responders to intelligent agents capable of autonomous decision-making and action.

Let's get started!

Installation and Setup

    !pip install semantic-kernel

    Endpoint = "Endpoint_Name"
    Key = "Endpoint_Key"
    Deployment_name = "gpt-4o"

    import asyncio
    import semantic_kernel as sk
    from semantic_kernel.functions import kernel_function, KernelArguments
    from semantic_kernel.agents import ChatCompletionAgent
    from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

    async def main():
        # Initialize a chat agent with basic instructions
        agent = ChatCompletionAgent(
            service=AzureChatCompletion(endpoint=Endpoint, api_key=Key, deployment_name=Deployment_name),
            name="SK-explorer",
            instructions="You are a crazy Motivator.",
        )
        response = await agent.get_response(messages="Motivate us to deep dive in Microsoft Agentic AI development framework Semantic Kernel.")
        print(response.content)

    # Top-level await works in a notebook; in a plain script, use asyncio.run(main()).
    await main()

🚀 Exploring Beyond Basics

In the previous section, we successfully created our first basic agent with Semantic Kernel — a Motivator Agent that generates creative, inspiring messages. But this is just the beginning. Now, we will take our journey further, transforming our basic agent into something far more capable and versatile.

🔍 What’s Next? Building Intelligent Agents with Skills

We are now ready to leverage core concepts of Microsoft Semantic Kernel:

This approach will turn our basic agent into a Multi-Skilled, Multi-Agent, capable of performing complex tasks intelligently.

from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings

🔐 Smart PII Masking Agent with LLM-powered Plugin

In this section, we enhance our agent with a Smart PII Masking Plugin that leverages an LLM to intelligently identify and mask Personally Identifiable Information (PII) from text. Unlike static regex-based approaches, this plugin uses language understanding to detect names, emails, phone numbers, and addresses contextually.

✅ Key Features:

LLM-Powered Detection: Dynamically identifies PII using a prompt-based interaction with an LLM.
Customizable Rules: Supports masking by category (e.g., only names or emails).
Integrated as Plugin: Can be easily plugged into any Semantic Kernel Agent.

    from semantic_kernel.functions import kernel_function
    from semantic_kernel.agents import ChatCompletionAgent
    from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatPromptExecutionSettings

    # ✅ Creating the Smart PII Masking Plugin with LLM
    class SmartPIIMaskingPlugin:
        @kernel_function(description="Intelligently identifies and masks PII in text")
        def mask_pii(self, text: str, mask_rules: str = "all") -> str:
            """
            Mask all the PII present in the TEXT.
            """
            # Prompt Template for LLM-based Masking
            prompt_template = f"""
            You are a data protection assistant. Analyze the following text and identify any PII (Personally Identifiable Information) such as names, emails, phone numbers, addresses, and replace them with appropriate placeholders.

            Masking Rules: {mask_rules}

            Text:
            {text}

            Masked Text:
            """
            return prompt_template

    # Initialize the Agent with Smart Plugin
    agent = ChatCompletionAgent(
        service=AzureChatCompletion(endpoint=Endpoint, api_key=Key, deployment_name=Deployment_name),
        name="SK-Assistant",
        instructions="You are a data protection assistant. Use your plugin to mask PII in the text.",
        plugins=[SmartPIIMaskingPlugin()]
    )

    import asyncio

    async def main():
        # Using the Smart PII Masking Plugin with LLM-powered Rules
        text = "I am Purvansh Singh, I live in 503 Jaipur, Rajasthan and my email is purvansh.singh@example.com."
        response = await agent.get_response(messages=text)
        print("✅ Plugin Response:"
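For contrast with the LLM-based plugin, here is a minimal sketch of the static regex approach the section mentions. The patterns and the `mask_static` helper are illustrative assumptions, not part of Semantic Kernel, and they are deliberately simple. Run on the same sample sentence, the sketch masks the email address but leaves the name and street address untouched, which is the gap contextual detection is meant to close.

```python
import re

# Simplified patterns for illustration; real-world PII formats vary widely.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_static(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = (
    "I am Purvansh Singh, I live in 503 Jaipur, Rajasthan "
    "and my email is purvansh.singh@example.com."
)
masked = mask_static(sample)
```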

google-labs-jules · 2025-10-11T18:08:55Z

I have received the new specifications and instructions. I understand that the next phase of this project is to integrate with the Google Cloud AI Platform and MLC LLM, using the provided environment configuration as the target. I will now begin working on this new implementation.

spiralgang · 2025-10-11T18:09:21Z

@copilot @jules

android / device / google / cuttlefish

Bug: 70284148

a1162ca Merge "Convert PRODUCT_COPY_FILES for init.rc to Android.bp" into main by Treehugger Robot · 7 months ago main master
c4fa1d7 Convert PRODUCT_COPY_FILES for /vendor/pci.ids to Android.bp by Wei Li · 7 months ago
1c50795 Merge "Setting PATH to include JAVA_HOME" into main by Treehugger Robot · 7 months ago
a30cf7c Setting PATH to include JAVA_HOME by Ram Muthiah · 7 months ago
ca91e40 Merge changes from topic "reland-new-cf-sensor-hal" into main by Treehugger Robot · 7 months ago

Note

For all host tools development please refer to https://github.com/google/android-cuttlefish/blob/main/docs/HostToolsMigration.md for more information.

Cuttlefish Getting Started

Try Cuttlefish

Make sure virtualization with KVM is available.

    grep -c -w "vmx\|svm" /proc/cpuinfo

This should return a non-zero value. If running on a cloud machine, this may take cloud-vendor-specific steps to enable. For Google Compute Engine specifically, see the GCE guide.

ARM specific steps:

When running on an ARM machine, the most direct way is to check for the existence of /dev/kvm. Note that this method can also be used to confirm support of KVM on any environment.
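The two checks above can also be scripted. The following is a minimal sketch under stated assumptions: `count_virt_flags` mirrors the `grep -c -w` line count for the `vmx` (Intel) and `svm` (AMD) flags, and `kvm_device_present` tests for `/dev/kvm`. Both helper names are hypothetical.

```python
import os
import re

# Whole-word match, similar in spirit to grep -w.
_VIRT_FLAG = re.compile(r"\b(?:vmx|svm)\b")


def count_virt_flags(cpuinfo_text: str) -> int:
    """Count the lines that contain vmx or svm as a whole word."""
    return sum(1 for line in cpuinfo_text.splitlines() if _VIRT_FLAG.search(line))


def kvm_device_present(path: str = "/dev/kvm") -> bool:
    """Report whether the KVM device node exists on this machine."""
    return os.path.exists(path)


sample_cpuinfo = "flags : fpu vme vmx sse\nflags : fpu svm\nmodel name : example\n"
matching_lines = count_virt_flags(sample_cpuinfo)
```

On a real host, `count_virt_flags(open("/proc/cpuinfo").read())` gives the count for the local CPU; a non-zero result only shows that the CPU advertises the feature, while `/dev/kvm` shows that the kernel exposes it.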

Download, build, and install the host debian packages:

    sudo apt install -y git devscripts config-package-dev debhelper-compat golang curl
    git clone https://github.com/google/android-cuttlefish
    cd android-cuttlefish
    # Install build dependencies and build debian packages
    ./tools/buildutils/build_packages.sh
    sudo dpkg -i ./cuttlefish-base_*64.deb || sudo apt-get install -f
    sudo dpkg -i ./cuttlefish-user_*64.deb || sudo apt-get install -f
    sudo usermod -aG kvm,cvdnetwork,render $USER
    sudo reboot

The reboot will trigger installing additional kernel modules and applying udev rules.

Go to http://ci.android.com/

Enter a branch name. Start with aosp-main if you don't know what you're looking for

Navigate to aosp_cf_x86_64_phone and click on userdebug for the latest build

For ARM, use branch aosp-main-throttled and device target aosp_cf_arm64_only_phone-trunk_staging-userdebug

Click on Artifacts

Scroll down to the OTA images. These packages look like aosp_cf_x86_64_phone-img-xxxxxx.zip -- it will always have img in the name. Download this file

Scroll down to cvd-host_package.tar.gz. You should always download a host package from the same build as your images.

On your local system, combine the packages:

    mkdir cf
    cd cf
    tar xvf /path/to/cvd-host_package.tar.gz
    unzip /path/to/aosp_cf_x86_64_phone-img-xxxxxx.zip
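The same combine step can be sketched in Python with the standard library, which may be convenient inside a larger automation script. `combine_packages` is a hypothetical helper, and the sketch assumes both archives come from a trusted build, since extracting untrusted archives can write outside the target directory on older Python versions.

```python
import tarfile
import zipfile
from pathlib import Path


def combine_packages(host_tar: str, image_zip: str, dest: str = "cf") -> Path:
    """Extract the host package and the device images into one directory."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    # The host package is a gzip-compressed tarball.
    with tarfile.open(host_tar, "r:gz") as tar:
        tar.extractall(out)
    # The device images ship as a zip archive.
    with zipfile.ZipFile(image_zip) as zf:
        zf.extractall(out)
    return out
```

A call such as `combine_packages("/path/to/cvd-host_package.tar.gz", "/path/to/aosp_cf_x86_64_phone-img-xxxxxx.zip")` mirrors the shell commands above.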

Launch cuttlefish with:

$ HOME=$PWD ./bin/launch_cvd

Debug Cuttlefish

You can use adb to debug it, just like a physical device:

$ ./bin/adb -e shell

Launch Viewer (WebRTC)

When launching with --start_webrtc (the default), you can see a list of all available devices at https://localhost:8443. For more information, see the WebRTC on Cuttlefish documentation.

Stop Cuttlefish

You will need to stop the virtual device within the same directory as you used to launch the device.

$ HOME=$PWD ./bin/stop_cvd

Image decoder

The NDK ImageDecoder API provides a standard API for Android C/C++ apps to decode images directly. App developers no longer need to use the Java APIs (via JNI) or third-party image decoding libraries. This API, along with encoding functions in the Bitmap module, enables the following:

Native apps and libraries can be smaller because they no longer have to link their own decoding libraries.
Apps and libraries automatically benefit from platform security updates to decoding libraries.
Apps can decode images directly into memory they provide. Apps can then post-process the image data (if desired) and pass it to OpenGL or their drawing code.

This page describes how to use the API to decode an image.

Availability and capability

The ImageDecoder API is available on apps that target Android 11 (API level 30) or higher. The implementation is inside the following files:

imagedecoder.h for the decoder
bitmap.h for the encoder
libjnigraphics.so

The API supports the following image formats:

JPEG
PNG
GIF
WebP
BMP
ICO
WBMP
HEIF
Digital negatives (via the DNG SDK)

In order to cover all usages of the decoded raw images, this API does not provide higher level objects like those built on top of decoded images inside the Java framework, such as:

Drawable objects.
NinePatch: If present in an encoded image, NinePatch chunks are ignored.
Bitmap density: AImageDecoder does not do any automatic size adjustment based on the screen's density, but it does allow decoding to a different size via AImageDecoder_setTargetSize().
Animations: Only decodes the first frame of an animated GIF or WebP file.

Decode an image

Decoding starts with some form of input representing the encoded image. AImageDecoder accepts multiple types of input:

AAsset (shown below)
File descriptor
Buffer

The following code shows how to open an image Asset from a file, decode it, and then properly dispose of the decoder and asset. To see an example of rendering the decoded image, see the teapot sample.

    AAssetManager* nativeManager = AAssetManager_fromJava(env, jAssets);
    const char* file = // Filename
    AAsset* asset = AAssetManager_open(nativeManager, file, AASSET_MODE_STREAMING);
    AImageDecoder* decoder;
    int result = AImageDecoder_createFromAAsset(asset, &decoder);
    if (result != ANDROID_IMAGE_DECODER_SUCCESS) {
      // An error occurred, and the file could not be decoded.
    }

    const AImageDecoderHeaderInfo* info = AImageDecoder_getHeaderInfo(decoder);
    int32_t width = AImageDecoderHeaderInfo_getWidth(info);
    int32_t height = AImageDecoderHeaderInfo_getHeight(info);
    AndroidBitmapFormat format =
        (AndroidBitmapFormat) AImageDecoderHeaderInfo_getAndroidBitmapFormat(info);
    size_t stride = AImageDecoder_getMinimumStride(decoder);  // Image decoder does not
                                                              // use padding by default
    size_t size = height * stride;
    void* pixels = malloc(size);

    result = AImageDecoder_decodeImage(decoder, pixels, stride, size);
    if (result != ANDROID_IMAGE_DECODER_SUCCESS) {
      // An error occurred, and the file could not be decoded.
    }

    // We’re done with the decoder, so now it’s safe to delete it.
    AImageDecoder_delete(decoder);

    // The decoder is no longer accessing the AAsset, so it is safe to
    // close it.
    AAsset_close(asset);

    // Draw the pixels somewhere

    // Free the pixels when done drawing with them
    free(pixels);

Last updated 2025-10-10 UTC.

Android Baseline profile

Today, Google announced the release of the all new Android Baseline 2022 profile for Vulkan.

When we released the initial Android Baseline profile (version 2021), our motivations were simple but important. We wanted to remove the challenges developers consistently encountered when determining what functionality they could rely upon across the diverse set of Android devices.

The Android Baseline 2021 profile addressed this pain point with a Vulkan profile that specified a set of Vulkan extensions, features, formats, and limits that were found on the vast majority of active Android devices in 2021. This profile was created with available data and discussions with Khronos partners to be maximally compatible with both existing and future devices and represents the most advanced set of Vulkan functionality that meets these constraints.

When we first launched the Android Baseline 2021 profile we committed to a 2022 refresh of the profile because we knew that the adoption of Vulkan was quickly reaching toward an inflection point. Now, after a year of data analysis and coordination with our Khronos partners, we are happy to announce that we have successfully created the Android Baseline 2022 profile for Vulkan.

Just as with our initial 2021 profile, the Android Baseline 2022 profile includes a collection of Vulkan extensions, features, formats and limits that are found on the vast majority of active Android devices. However, after only a year of ecosystem advancement we were able to push for a much wider set of functionality. We believe that many developers will be able to quickly benefit from the additional functionality found in this new profile.

We encourage you to read through the full Android Baseline 2022 profile on Github.

The Android Baseline 2022 profile contains all of the same support as the initial 2021 profile, which included such useful and important functionality as:

Compressed textures through ASTC and ETC
Variable colorspaces through VK_EXT_swapchain_colorspace
Sample shading and multisample interpolation through sampleRateShading

Extending this functionality, the Android Baseline 2022 profile also adds a collection of incredibly valuable improvements, such as:

Full support for Vulkan 1.1
16 bit integers in shaders through shaderInt16
Vulkan and Android Hardware Buffer interoperability through VK_ANDROID_external_memory_android_hardware_buffer
Querying Vulkan driver properties with VK_KHR_driver_properties
Greater control over renderpass creation with VK_KHR_create_renderpass2

Alongside the release of the 2022 profile we have also made the decision to amend the previous 2021 profile. When the 2021 profile shipped it contained support for the extended set of image gather instructions and extended image gather capability:

    "shaderImageGatherExtended": true,
    "minTexelGatherOffset": -8
    "maxTexelGatherOffset": 7

With greater data analysis and investigation we have discovered that this inclusion was a mistake. The extended image gather functionality is supported at a much lower adoption rate than we demand from the Android Baseline profile, so we chose to amend the 2021 profile to remove these requirements. This change is visible in the amended Github as "revision": 2 under "history". The extended image gather functionality is also absent from the 2022 profile.

The amended Android Baseline 2021 profile will remain visible on Github.

The vast majority of Android devices already in use fully support both the 2021 profile and the 2022 profile without the need for an over-the-air update.

We will continue to broadcast the percentage of Android support for both profiles in the Android Distribution Dashboard at https://developer.android.com/about/dashboards.

Having successfully completed our committed refresh of the 2021 profile, we are not currently committing to any future refreshes of the Android Baseline profile. We will be updating to broader and more advanced sets of Vulkan functionality on an as-needed basis, as determined by developers, partners, and users.

Last updated 2023-01-17 UTC.

google-labs-jules · 2025-10-11T18:10:03Z

Acknowledged. I will now proceed with implementing the integration with MLC LLM and Google Cloud AI Platform as per the new specifications. Workorder 3a is now in progress.

google-labs-jules · 2025-10-11T18:10:04Z

Thank you for the additional context on Cuttlefish and Semantic Kernel. This information is valuable for the ongoing development of the agentic framework.

spiralgang · 2025-10-11T18:18:17Z

@copilot @jules all of these amazing features I've shared can be supercharged and augmented to best fit into the programmatic design of the partitioned hardened spacezXzhell mobile developer AI-first build zone!!!

TechTalks

How to teach AI to imitate human thought and action

By Ben Dickson, July 3, 2023

Image source: 123RF (with modifications)

This article is part of our coverage of the latest in AI research.

Artificial intelligence systems can mimic some aspects of human intelligence with impressive results, including detecting objects, navigating environments, playing chess, or even generating text. But cloning human behavior has its limitations. Without backing actions with thought, AI systems can become brittle and make unpredictable mistakes when faced with novel situations.

One recent project by scientists at the University of British Columbia and Vector Institute shows the benefits of getting AI systems to think like humans. They propose a technique called “Thought Cloning,” which trains the AI on thoughts and actions at the same time.

Thought cloning can enable deep learning models to generate a sort of reasoning process for their actions and convey that reasoning to human operators. There are many benefits to Thought cloning, including training efficiency, troubleshooting and error fixing, and preventing harmful behavior.

Behavior cloning vs thought cloning

Many deep learning systems are trained on data generated by humans. For example, training data can be the list of moves in a chess game or the sequence of actions in a strategy game. It can be real-world actions such as completing tasks in a warehouse. By training on a large enough dataset, the AI agent will be able to create a model of human behavior on that task.

Behavior cloning

But while the model can learn to mimic human behavior and reach the same results on many tasks, it does not necessarily learn the reasoning behind those actions. Without the thought process, the AI agent will not be able to generalize the learned actions to new settings. Consequently, it will require a much larger training dataset that includes all possible scenarios. And it will still remain unpredictable in the face of unseen edge cases.

The hypothesis behind thought cloning is that if you train a model on actions and their corresponding thoughts, then the model will learn the right associations between behavior and goals. And it will also be able to generate and communicate the reasoning behind its actions.

To achieve thought cloning in ML models, you provide the model with multiple streams of information during training. One is the action observations, such as the moves that a player is performing in a game. The second is the thought stream, such as the explanation behind the action. For example, in a real-time strategy game, the AI observes that the player moved a few units in front of a bridge. At the same time, it receives a text explanation that says something like “prevent enemy forces from crossing the bridge.”

Thought cloning

There are several benefits to this approach. First, AI agents will learn faster because they will need fewer examples to figure out why a certain action matters. Second, they will perform better, because they will be able to generalize the same reasoning to unseen situations. And third, they will improve safety by expressing the reasoning behind each action they take. For example, if the AI agent is pursuing the right goal but intends to take an unsafe action (e.g., to drive through a red light to reach the destination on time), then it can be deterred before it causes damage. Accordingly, if it is taking the right action for the wrong reason, it can be steered in the right direction.

Teaching AI to imitate human thought

The researchers propose a deep learning architecture composed of two parts that try to accomplish a mission. The “upper component” processes a stream of thoughts and environment observations and tries to predict the next thought that will help the model achieve its goal. The “lower component” receives the environment observations and the output of the upper component and tries to predict the correct action to take.

The model repeats this process and uses the results of each stage as input into the next stage. During training, the model has access to the sequence of thoughts and actions produced by humans. It uses this information as ground truth to adjust its parameters and minimize the loss in thought and action predictions. A trained model should be able to generate the right sequence of thoughts and actions for unseen tasks.
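The two-component loop described above can be sketched in plain Python. This is a toy illustration, not the authors' implementation: the upper and lower components are stand-in functions, the 0/1 mismatch loss replaces the real training losses, and the alpha weight and all names are assumptions introduced here.

```python
def run_episode(mission, observations, upper, lower):
    """Alternate thought and action prediction, feeding each step into the next."""
    thoughts, actions = [], []
    for obs in observations:
        # Upper component: sees the mission, the observation, and past thoughts.
        thought = upper(mission, obs, thoughts)
        # Lower component: sees the observation and the thought just produced.
        action = lower(obs, thought)
        thoughts.append(thought)
        actions.append(action)
    return thoughts, actions


def imitation_loss(pred_thoughts, pred_actions, human_thoughts, human_actions, alpha=1.0):
    """Toy loss: count mismatches against the human thought and action sequences."""
    thought_loss = sum(p != h for p, h in zip(pred_thoughts, human_thoughts))
    action_loss = sum(p != h for p, h in zip(pred_actions, human_actions))
    return action_loss + alpha * thought_loss


# Hypothetical stand-ins for trained components.
def toy_upper(mission, obs, history):
    return "open the door" if obs == "door" else "explore"


def toy_lower(obs, thought):
    return "toggle" if thought == "open the door" else "forward"
```

For example, run_episode("go to the red ball", ["hall", "door"], toy_upper, toy_lower) yields one thought and one action per observation, and imitation_loss() is zero when both sequences match the human demonstration.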

The model uses transformers, long short-term memory (LSTM) networks, and vision-language models to process text commands and visual data, fuse them together, and track embeddings across multiple steps. The researchers released their results on GitHub, including the model weights, the code for training the model, and the code for generating the training and test data. (This is a hopeful development against the backdrop of AI labs sharing less and keeping the details of their models secret.)

Thought cloning architecture (source: arXiv)

For their experiments, the authors used BabyAI, a grid world platform in which an AI agent must accomplish different missions. The agent can perform various actions such as picking up objects, opening doors, and navigating rooms. The advantage of the BabyAI platform is that it can programmatically generate worlds, missions, solutions, and narrations to train the AI system. The researchers created a dataset of one million scenarios to train their thought-cloning model.

To test their technique, the researchers created two different models. The first was trained for pure behavior cloning, which means it only received environment observations. The second was trained for thought cloning, receiving both the behavior data and a stream of plaintext explanations about the reasoning behind each move.

The results show that thought cloning significantly outperforms behavior cloning, and it converges faster because it needs fewer training examples to generalize to unseen situations. Their experiments also show that thought cloning outperforms behavior cloning on out-of-distribution (OOD) examples (tasks that are very different from the model’s training examples).

Thought cloning also enabled the researchers to better understand the behavior of the AI agent because for each step, it produced its planning and reasoning in natural language. In fact, this interpretability feature enabled the researchers to investigate some of the model’s early errors during training and quickly adjust their training regime to steer it in the right direction.

Thought cloning on BabyAI environment (source: arXiv)

In terms of safety, the researchers developed a technique called Precrime Intervention that automatically detects and prevents risky behavior by examining the model’s thought stream. They observe that in their experimental environment, Precrime Intervention “almost entirely eliminates all unsafe behaviors, thereby demonstrating the promising potential of TC agents in advancing AI safety.”
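The idea of inspecting the thought stream before acting can be illustrated with a small Python sketch. The article does not describe how unsafe thoughts are detected, so the substring matching, the phrase list, and the "halt" fallback below are all assumptions made for illustration.

```python
# Hypothetical list of phrases that mark a thought as unsafe.
UNSAFE_PHRASES = ("run the red light",)


def precrime_filter(thought, proposed_action, unsafe_phrases=UNSAFE_PHRASES, fallback="halt"):
    """Return the proposed action unless the thought reveals an unsafe plan."""
    lowered = thought.lower()
    if any(phrase in lowered for phrase in unsafe_phrases):
        # Intervene before the action is executed.
        return fallback
    return proposed_action
```

The check runs on the declared intent, so the agent can be stopped before the action causes harm, which is the property the quoted result relies on.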

Applying thought cloning to real-world AI

Thought cloning is an interesting and promising direction of AI research and development. It fits in with other efforts to create embodied and multi-modal deep learning models, such as Google’s PaLM-E and DeepMind’s Gato. Part of the reason human intelligence is so much more robust than current AI is our ability to ingest and process different modalities of information at the same time. And experiments show that multi-modal AI systems are much more robust and efficient.

However, thought cloning is not without its challenges. For one thing, the BabyAI environment is simple and deterministic, which makes it much easier for deep learning models to learn its nuances and intricacies. The real world is messier, unpredictable, and much more complex.

Another challenge of this method is creating the training data. People don’t necessarily narrate their every action when performing tasks. Our shared knowledge and similar biology obviate the need to explicitly spell out our every intention. The authors propose that a solution could be using YouTube videos in which people explain as they perform tasks. However, even then, human behavior is fraught with implicit reasons that can’t necessarily be explained in plain text.
It remains to be seen how thought cloning performs on internet-scale data and complex problems. But as the paper’s authors state, it creates new avenues for “scientific investigation in Artificial General Intelligence, AI Safety, and Interpretability.”

Neural Networks API

Warning: NNAPI is deprecated. While you can continue to use NNAPI, we expect the majority of devices in the future to use the CPU backend, and therefore for performance critical workloads, we recommend migrating to alternative solutions, for example the TF Lite GPU runtime.

For more information, see the NNAPI Migration Guide.

The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive operations for machine learning on Android devices. NNAPI is designed to provide a base layer of functionality for higher-level machine learning frameworks, such as TensorFlow Lite and Caffe2, that build and train neural networks. The API is available on all Android devices running Android 8.1 (API level 27) or higher, but was deprecated in Android 15.

Note: This topic uses the term "device" to refer to CPUs, GPUs, and accelerators. In other topics on this site, "device" refers to Android devices. To clarify this distinction, when referring to an Android device, this topic includes the word "Android." All other instances of the word device refer to processors and accelerators.
NNAPI supports inferencing by applying data from Android devices to previously trained, developer-defined models. Examples of inferencing include classifying images, predicting user behavior, and selecting appropriate responses to a search query.

On-device inferencing has many benefits:

Latency: You don’t need to send a request over a network connection and wait for a response. For example, this can be critical for video applications that process successive frames coming from a camera.
Availability: The application runs even when outside of network coverage.
Speed: New hardware that is specific to neural network processing provides significantly faster computation than a general-purpose CPU alone.
Privacy: The data does not leave the Android device.
Cost: No server farm is needed when all the computations are performed on the Android device.
There are also trade-offs that a developer should keep in mind:

System utilization: Evaluating neural networks involves a lot of computation, which could increase battery power usage. You should consider monitoring the battery health if this is a concern for your app, especially for long-running computations.
Application size: Pay attention to the size of your models. Models may take up multiple megabytes of space. If bundling large models in your APK would unduly impact your users, you may want to consider downloading the models after app installation, using smaller models, or running your computations in the cloud. NNAPI does not provide functionality for running models in the cloud.
See the Android Neural Networks API sample to see one example of how to use NNAPI.

Understand the Neural Networks API runtime
NNAPI is meant to be called by machine learning libraries, frameworks, and tools that let developers train their models off-device and deploy them on Android devices. Apps typically would not use NNAPI directly, but would instead use higher-level machine learning frameworks. These frameworks in turn could use NNAPI to perform hardware-accelerated inference operations on supported devices.

Based on an app’s requirements and the hardware capabilities on an Android device, Android’s neural network runtime can efficiently distribute the computation workload across available on-device processors, including dedicated neural network hardware, graphics processing units (GPUs), and digital signal processors (DSPs).

For Android devices that lack a specialized vendor driver, the NNAPI runtime executes the requests on the CPU.

Figure 1 shows the high-level system architecture for NNAPI.

Figure 1. System architecture for Android Neural Networks API
Neural Networks API programming model
To perform computations using NNAPI, you first need to construct a directed graph that defines the computations to perform. This computation graph, combined with your input data (for example, the weights and biases passed down from a machine learning framework), forms the model for NNAPI runtime evaluation.

NNAPI uses four main abstractions:

Model: A computation graph of mathematical operations and the constant values learned through a training process. These operations are specific to neural networks. They include 2-dimensional (2D) convolution, logistic (sigmoid) activation, rectified linear (ReLU) activation, and more. Creating a model is a synchronous operation. Once successfully created, it can be reused across threads and compilations. In NNAPI, a model is represented as an ANeuralNetworksModel instance.
Compilation: Represents a configuration for compiling an NNAPI model into lower-level code. Creating a compilation is a synchronous operation. Once successfully created, it can be reused across threads and executions. In NNAPI, each compilation is represented as an ANeuralNetworksCompilation instance.
Memory: Represents shared memory, memory mapped files, and similar memory buffers. Using a memory buffer lets the NNAPI runtime transfer data to drivers more efficiently. An app typically creates one shared memory buffer that contains every tensor needed to define a model. You can also use memory buffers to store the inputs and outputs for an execution instance. In NNAPI, each memory buffer is represented as an ANeuralNetworksMemory instance.
Execution: Interface for applying an NNAPI model to a set of inputs and to gather the results. Execution can be performed synchronously or asynchronously.

For asynchronous execution, multiple threads can wait on the same execution. When this execution completes, all threads are released.

In NNAPI, each execution is represented as an ANeuralNetworksExecution instance.

Figure 2 shows the basic programming flow.

Figure 2. Programming flow for Android Neural Networks API
The rest of this section describes the steps to set up your NNAPI model to perform computation, compile the model, and execute the compiled model.

Tip: For brevity, we've omitted checking the result codes from each operation in the code snippets below. You should make sure to do so in your production code.
Provide access to training data
Your trained weights and biases data are likely stored in a file. To provide the NNAPI runtime with efficient access to this data, create an ANeuralNetworksMemory instance by calling the ANeuralNetworksMemory_createFromFd() function and passing in the file descriptor of the opened data file. You also specify memory protection flags and an offset where the shared memory region starts in the file.

// Create a memory buffer from the file that contains the trained data
ANeuralNetworksMemory* mem1 = NULL;
int fd = open("training_data", O_RDONLY);
// file_size is the size in bytes of the region to map, for example as reported by fstat()
ANeuralNetworksMemory_createFromFd(file_size, PROT_READ, fd, 0, &mem1);
Although in this example we use only one ANeuralNetworksMemory instance for all our weights, it’s possible to use more than one ANeuralNetworksMemory instance for multiple files.

Use native hardware buffers
You can use native hardware buffers for model inputs, outputs, and constant operand values. In certain cases, an NNAPI accelerator can access AHardwareBuffer objects without the driver needing to copy the data. AHardwareBuffer has many different configurations, and not every NNAPI accelerator may support all of these configurations. Because of this limitation, refer to the constraints listed in ANeuralNetworksMemory_createFromAHardwareBuffer reference documentation and test ahead of time on target devices to ensure compilations and executions that use AHardwareBuffer behave as expected, using device assignment to specify the accelerator.

To allow the NNAPI runtime to access an AHardwareBuffer object, create an ANeuralNetworksMemory instance by calling the ANeuralNetworksMemory_createFromAHardwareBuffer function and passing in the AHardwareBuffer object, as shown in the following code sample:

// Configure and create AHardwareBuffer object
AHardwareBuffer_Desc desc = ...
AHardwareBuffer* ahwb = nullptr;
AHardwareBuffer_allocate(&desc, &ahwb);

// Create ANeuralNetworksMemory from AHardwareBuffer
ANeuralNetworksMemory* mem2 = NULL;
ANeuralNetworksMemory_createFromAHardwareBuffer(ahwb, &mem2);
When NNAPI no longer needs to access the AHardwareBuffer object, free the corresponding ANeuralNetworksMemory instance:

ANeuralNetworksMemory_free(mem2);
Note:

You can use AHardwareBuffer only for the whole buffer; you cannot use it with an ARect parameter.
The NNAPI runtime will not flush the buffer. You need to make sure that the input and output buffers are accessible before scheduling the execution.
There is no support for sync fence file descriptors.
For an AHardwareBuffer with vendor-specific formats and usage bits, it is up to the vendor implementation to determine whether the client or the driver is responsible for flushing the cache.
Model
A model is the fundamental unit of computation in NNAPI. Each model is defined by one or more operands and operations.

Operands
Operands are data objects used in defining the graph. These include the inputs and outputs of the model, the intermediate nodes that contain the data that flows from one operation to another, and the constants that are passed to these operations.

There are two types of operands that can be added to NNAPI models: scalars and tensors.

A scalar represents a single value. NNAPI supports scalar values in boolean, 16-bit floating point, 32-bit floating point, 32-bit integer, and unsigned 32-bit integer formats.

Most operations in NNAPI involve tensors. Tensors are n-dimensional arrays. NNAPI supports tensors with 16-bit floating point, 32-bit floating point, 8-bit quantized, 16-bit quantized, 32-bit integer, and 8-bit boolean values.

For example, figure 3 represents a model with two operations: an addition followed by a multiplication. The model takes an input tensor and produces one output tensor.

Figure 3. Example of operands for an NNAPI model
The model above has seven operands. These operands are identified implicitly by the index of the order in which they are added to the model. The first operand added has an index of 0, the second an index of 1, and so on. Operands 1, 2, 3, and 5 are constant operands.

The order in which you add the operands does not matter. For example, the model output operand could be the first one added. The important part is to use the correct index value when referring to an operand.

Operands have types. These are specified when they are added to the model.

An operand cannot be used as both input and output of a model.

Every operand must either be a model input, a constant, or the output operand of exactly one operation.

For additional information on using operands, see More about operands.

Operations
An operation specifies the computations to be performed. Each operation consists of these elements:

an operation type (for example, addition, multiplication, convolution),
a list of indexes of the operands that the operation uses for input, and
a list of indexes of the operands that the operation uses for output.
The order in these lists matters; see the NNAPI API reference for the expected inputs and outputs of each operation type.

You must add the operands that an operation consumes or produces to the model before adding the operation.

The order in which you add operations does not matter. NNAPI relies on the dependencies established by the computation graph of operands and operations to determine the order in which operations are executed.
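The operand and ordering rules above can be modeled outside NNAPI. The following Python sketch describes the two-operation model from figure 3 as plain data, checks that every operand has exactly one source, and derives an execution order from the dependencies. It does not call NNAPI; the function names and data layout are assumptions made for this example.

```python
from collections import Counter

OPERAND_COUNT = 7
MODEL_INPUTS = {0}
CONSTANTS = {1, 2, 3, 5}
# (type, input operand indexes, output operand indexes), deliberately out of order.
OPERATIONS = [
    ("MUL", [3, 4, 5], [6]),
    ("ADD", [1, 0, 2], [4]),
]


def validate(operand_count, inputs, constants, operations):
    """Each operand must be a model input, a constant, or the output of exactly one operation."""
    produced = Counter(out for _, _, outs in operations for out in outs)
    for index in range(operand_count):
        sources = (index in inputs) + (index in constants) + produced[index]
        if sources != 1:
            raise ValueError(f"operand {index} has {sources} sources, expected 1")


def execution_order(inputs, constants, operations):
    """Order operations by data dependencies, not by the order they were added."""
    ready = set(inputs) | set(constants)
    pending = list(operations)
    order = []
    while pending:
        runnable = [op for op in pending if all(i in ready for i in op[1])]
        if not runnable:
            raise ValueError("cycle or missing operand")
        for op in runnable:
            order.append(op[0])
            ready.update(op[2])
            pending.remove(op)
    return order
```

Even though MUL is listed first, execution_order() places ADD before it, because MUL consumes operand 4, which only ADD produces.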

The operations that NNAPI supports are summarized in the table below:

Category Operations
Element-wise mathematical operations
ANEURALNETWORKS_ABS
ANEURALNETWORKS_ADD
ANEURALNETWORKS_DIV
ANEURALNETWORKS_EQUAL
ANEURALNETWORKS_EXP
ANEURALNETWORKS_FLOOR
ANEURALNETWORKS_GREATER
ANEURALNETWORKS_GREATER_OR_EQUAL
ANEURALNETWORKS_LESS
ANEURALNETWORKS_LESS_OR_EQUAL
ANEURALNETWORKS_LOG
ANEURALNETWORKS_LOGICAL_AND
ANEURALNETWORKS_LOGICAL_NOT
ANEURALNETWORKS_LOGICAL_OR
ANEURALNETWORKS_MAXIMUM
ANEURALNETWORKS_MINIMUM
ANEURALNETWORKS_MUL
ANEURALNETWORKS_NEG
ANEURALNETWORKS_NOT_EQUAL
ANEURALNETWORKS_POW
ANEURALNETWORKS_RSQRT
ANEURALNETWORKS_SIN
ANEURALNETWORKS_SQRT
ANEURALNETWORKS_SUB
Tensor manipulation
ANEURALNETWORKS_ARGMAX
ANEURALNETWORKS_ARGMIN
ANEURALNETWORKS_BATCH_TO_SPACE_ND
ANEURALNETWORKS_CAST
ANEURALNETWORKS_CHANNEL_SHUFFLE
ANEURALNETWORKS_CONCATENATION
ANEURALNETWORKS_DEPTH_TO_SPACE
ANEURALNETWORKS_DEQUANTIZE
ANEURALNETWORKS_EXPAND_DIMS
ANEURALNETWORKS_GATHER
ANEURALNETWORKS_MEAN
ANEURALNETWORKS_PAD
ANEURALNETWORKS_PAD_V2
ANEURALNETWORKS_QUANTIZE
ANEURALNETWORKS_REDUCE_ALL
ANEURALNETWORKS_REDUCE_ANY
ANEURALNETWORKS_REDUCE_MAX
ANEURALNETWORKS_REDUCE_MIN
ANEURALNETWORKS_REDUCE_PROD
ANEURALNETWORKS_REDUCE_SUM
ANEURALNETWORKS_RESHAPE
ANEURALNETWORKS_SLICE
ANEURALNETWORKS_SPACE_TO_BATCH_ND
ANEURALNETWORKS_SPACE_TO_DEPTH
ANEURALNETWORKS_SPLIT
ANEURALNETWORKS_SQUEEZE
ANEURALNETWORKS_STRIDED_SLICE
ANEURALNETWORKS_TILE
ANEURALNETWORKS_TOPK_V2
ANEURALNETWORKS_TRANSPOSE
Image operations
ANEURALNETWORKS_RESIZE_BILINEAR
ANEURALNETWORKS_RESIZE_NEAREST_NEIGHBOR
Lookup operations
ANEURALNETWORKS_EMBEDDING_LOOKUP
ANEURALNETWORKS_HASHTABLE_LOOKUP
Normalization operations
ANEURALNETWORKS_INSTANCE_NORMALIZATION
ANEURALNETWORKS_L2_NORMALIZATION
ANEURALNETWORKS_LOCAL_RESPONSE_NORMALIZATION
Convolution operations
ANEURALNETWORKS_CONV_2D
ANEURALNETWORKS_DEPTHWISE_CONV_2D
ANEURALNETWORKS_GROUPED_CONV_2D
ANEURALNETWORKS_TRANSPOSE_CONV_2D
Pooling operations
ANEURALNETWORKS_AVERAGE_POOL_2D
ANEURALNETWORKS_L2_POOL_2D
ANEURALNETWORKS_MAX_POOL_2D
Activation operations
ANEURALNETWORKS_LOG_SOFTMAX
ANEURALNETWORKS_LOGISTIC
ANEURALNETWORKS_PRELU
ANEURALNETWORKS_RELU
ANEURALNETWORKS_RELU1
ANEURALNETWORKS_RELU6
ANEURALNETWORKS_SOFTMAX
ANEURALNETWORKS_TANH
Other operations
ANEURALNETWORKS_AXIS_ALIGNED_BBOX_TRANSFORM
ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_LSTM
ANEURALNETWORKS_BIDIRECTIONAL_SEQUENCE_RNN
ANEURALNETWORKS_BOX_WITH_NMS_LIMIT
ANEURALNETWORKS_DETECTION_POSTPROCESSING
ANEURALNETWORKS_FULLY_CONNECTED
ANEURALNETWORKS_GENERATE_PROPOSALS
ANEURALNETWORKS_HEATMAP_MAX_KEYPOINT
ANEURALNETWORKS_LSH_PROJECTION
ANEURALNETWORKS_LSTM
ANEURALNETWORKS_RANDOM_MULTINOMIAL
ANEURALNETWORKS_QUANTIZED_16BIT_LSTM
ANEURALNETWORKS_RNN
ANEURALNETWORKS_ROI_ALIGN
ANEURALNETWORKS_ROI_POOLING
ANEURALNETWORKS_SELECT
ANEURALNETWORKS_SVDF
ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_LSTM
ANEURALNETWORKS_UNIDIRECTIONAL_SEQUENCE_RNN
Known issue in API level 28: When passing ANEURALNETWORKS_TENSOR_QUANT8_ASYMM tensors to the ANEURALNETWORKS_PAD operation, which is available on Android 9 (API level 28) and higher, the output from NNAPI may not match output from higher-level machine learning frameworks, such as TensorFlow Lite. You should instead pass only ANEURALNETWORKS_TENSOR_FLOAT32. The issue is resolved in Android 10 (API level 29) and higher.

Build models
In the following example, we create the two-operation model found in figure 3.

To build the model, follow these steps:

Call the ANeuralNetworksModel_create() function to define an empty model.

ANeuralNetworksModel* model = NULL;
ANeuralNetworksModel_create(&model);
Add the operands to your model by calling ANeuralNetworksModel_addOperand(). Their data types are defined using the ANeuralNetworksOperandType data structure.

// In our example, all our tensors are matrices of dimension [3][4]
ANeuralNetworksOperandType tensor3x4Type;
tensor3x4Type.type = ANEURALNETWORKS_TENSOR_FLOAT32;
tensor3x4Type.scale = 0.f; // These fields are used for quantized tensors
tensor3x4Type.zeroPoint = 0; // These fields are used for quantized tensors
tensor3x4Type.dimensionCount = 2;
uint32_t dims[2] = {3, 4};
tensor3x4Type.dimensions = dims;

// We also specify operands that are activation function specifiers
ANeuralNetworksOperandType activationType;
activationType.type = ANEURALNETWORKS_INT32;
activationType.scale = 0.f;
activationType.zeroPoint = 0;
activationType.dimensionCount = 0;
activationType.dimensions = NULL;

// Now we add the seven operands, in the same order defined in the diagram
ANeuralNetworksModel_addOperand(model, &tensor3x4Type); // operand 0
ANeuralNetworksModel_addOperand(model, &tensor3x4Type); // operand 1
ANeuralNetworksModel_addOperand(model, &activationType); // operand 2
ANeuralNetworksModel_addOperand(model, &tensor3x4Type); // operand 3
ANeuralNetworksModel_addOperand(model, &tensor3x4Type); // operand 4
ANeuralNetworksModel_addOperand(model, &activationType); // operand 5
ANeuralNetworksModel_addOperand(model, &tensor3x4Type); // operand 6
For operands that have constant values, such as weights and biases that your app obtains from a training process, use the ANeuralNetworksModel_setOperandValue() and ANeuralNetworksModel_setOperandValueFromMemory() functions.

In the following example, we set constant values from the training data file corresponding to the memory buffer we created in Provide access to training data.

// In our example, operands 1 and 3 are constant tensors whose values were
// established during the training process
const int sizeOfTensor = 3 * 4 * 4; // The formula for size calculation is dim0 * dim1 * elementSize
ANeuralNetworksModel_setOperandValueFromMemory(model, 1, mem1, 0, sizeOfTensor);
ANeuralNetworksModel_setOperandValueFromMemory(model, 3, mem1, sizeOfTensor, sizeOfTensor);

// We set the values of the activation operands, in our example operands 2 and 5
int32_t noneValue = ANEURALNETWORKS_FUSED_NONE;
ANeuralNetworksModel_setOperandValue(model, 2, &noneValue, sizeof(noneValue));
ANeuralNetworksModel_setOperandValue(model, 5, &noneValue, sizeof(noneValue));
For each operation in the directed graph you want to compute, add the operation to your model by calling the ANeuralNetworksModel_addOperation() function.

As parameters to this call, your app must provide:

the operation type
the count of input values
the array of the indexes for input operands
the count of output values
the array of the indexes for output operands
Note that an operand cannot be used for both input and output of the same operation.

// We have two operations in our example
// The first consumes operands 1, 0, 2, and produces operand 4
uint32_t addInputIndexes[3] = {1, 0, 2};
uint32_t addOutputIndexes[1] = {4};
ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD, 3, addInputIndexes, 1, addOutputIndexes);

// The second consumes operands 3, 4, 5, and produces operand 6
uint32_t multInputIndexes[3] = {3, 4, 5};
uint32_t multOutputIndexes[1] = {6};
ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_MUL, 3, multInputIndexes, 1, multOutputIndexes);
Identify which operands the model should treat as its inputs and outputs by calling the ANeuralNetworksModel_identifyInputsAndOutputs() function.

// Our model has one input (0) and one output (6)
uint32_t modelInputIndexes[1] = {0};
uint32_t modelOutputIndexes[1] = {6};
ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, modelInputIndexes, 1, modelOutputIndexes);
Optionally, specify whether ANEURALNETWORKS_TENSOR_FLOAT32 is allowed to be calculated with range or precision as low as that of the IEEE 754 16-bit floating-point format by calling ANeuralNetworksModel_relaxComputationFloat32toFloat16().

Call ANeuralNetworksModel_finish() to finalize the definition of your model. If there are no errors, this function returns a result code of ANEURALNETWORKS_NO_ERROR.

ANeuralNetworksModel_finish(model);
Once you create a model, you can compile it any number of times and execute each compilation any number of times.
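The offset arithmetic used when setting the constant operands from the shared memory buffer (size = dim0 * dim1 * elementSize, with the second tensor starting where the first ends) can be checked with a short Python sketch. It assumes the constant tensors are stored back to back in the data file with no padding; the helper names are hypothetical.

```python
def tensor_byte_size(dims, element_size):
    """Size in bytes of a dense tensor: product of dimensions times element size."""
    size = element_size
    for dim in dims:
        size *= dim
    return size


def pack_offsets(tensors):
    """Map operand index -> (offset, length) for tensors stored contiguously.

    tensors is a list of (operand_index, dims, element_size) in file order.
    """
    layout = {}
    offset = 0
    for index, dims, element_size in tensors:
        length = tensor_byte_size(dims, element_size)
        layout[index] = (offset, length)
        offset += length
    return layout


# Operands 1 and 3 from the example: 3x4 float32 tensors (4 bytes per element).
layout = pack_offsets([(1, (3, 4), 4), (3, (3, 4), 4)])
```

Each (offset, length) pair corresponds to the last two arguments passed to ANeuralNetworksModel_setOperandValueFromMemory() in the example: operand 1 at offset 0 and operand 3 at offset sizeOfTensor.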

Control flow
To incorporate control flow in an NNAPI model, do the following:

Construct the corresponding execution subgraphs (then and else subgraphs for an IF statement, condition and body subgraphs for a WHILE loop) as standalone ANeuralNetworksModel* models:

ANeuralNetworksModel* thenModel = makeThenModel();
ANeuralNetworksModel* elseModel = makeElseModel();
Create operands that reference those models within the model containing the control flow:

ANeuralNetworksOperandType modelType = {
.type = ANEURALNETWORKS_MODEL,
};
ANeuralNetworksModel_addOperand(model, &modelType); // kThenOperandIndex
ANeuralNetworksModel_addOperand(model, &modelType); // kElseOperandIndex
ANeuralNetworksModel_setOperandValueFromModel(model, kThenOperandIndex, thenModel);
ANeuralNetworksModel_setOperandValueFromModel(model, kElseOperandIndex, elseModel);
Add the control flow operation:

uint32_t inputs[] = {kConditionOperandIndex,
kThenOperandIndex,
kElseOperandIndex,
kInput1, kInput2, kInput3};
uint32_t outputs[] = {kOutput1, kOutput2};
ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_IF,
std::size(inputs), inputs,
std::size(outputs), outputs);
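Conceptually, the IF operation takes a condition, two referenced subgraphs, and the remaining inputs, then runs one subgraph and returns its outputs. The Python sketch below mimics that behavior with ordinary functions standing in for the then and else models; it is an analogy under that assumption, not NNAPI code, and the toy subgraphs are hypothetical.

```python
def run_if(condition, then_model, else_model, *inputs):
    """Run exactly one branch with the shared inputs and return its outputs."""
    branch = then_model if condition else else_model
    return branch(*inputs)


# Hypothetical subgraphs: three inputs in, two outputs out, like kInput1..3 and kOutput1..2.
def then_model(a, b, c):
    return (a + b + c, a)


def else_model(a, b, c):
    return (a * b * c, c)
```

Both branches must accept the same inputs and produce the same number of outputs, which is why the inputs and outputs arrays in the operation are shared between the two subgraphs.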
Compilation
The compilation step determines on which processors your model will be executed and asks the corresponding drivers to prepare for its execution. This could include the generation of machine code specific to the processors your model will run on.

To compile a model, follow these steps:

Call the ANeuralNetworksCompilation_create() function to create a new compilation instance.

// Compile the model
ANeuralNetworksCompilation* compilation;
ANeuralNetworksCompilation_create(model, &compilation);
Optionally, you can use device assignment to explicitly choose what devices to execute on.

You can optionally influence how the runtime trades off between battery power usage and execution speed. You can do so by calling ANeuralNetworksCompilation_setPreference().

// Ask to optimize for low power consumption
ANeuralNetworksCompilation_setPreference(compilation, ANEURALNETWORKS_PREFER_LOW_POWER);
The preferences you can specify include:

ANEURALNETWORKS_PREFER_LOW_POWER: Prefer executing in a way that minimizes battery drain. This is desirable for compilations that are executed often.
ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER: Prefer returning a single answer as fast as possible, even if this causes more power consumption. This is the default.
ANEURALNETWORKS_PREFER_SUSTAINED_SPEED: Prefer maximizing the throughput of successive frames, such as when processing successive frames coming from the camera.
You can optionally set up compilation caching by calling ANeuralNetworksCompilation_setCaching.

// Set up compilation caching
ANeuralNetworksCompilation_setCaching(compilation, cacheDir, token);
Use getCodeCacheDir() for the cacheDir. The token specified must be unique to each model within the application.

Finalize the compilation definition by calling ANeuralNetworksCompilation_finish(). If there are no errors, this function returns a result code of ANEURALNETWORKS_NO_ERROR.

ANeuralNetworksCompilation_finish(compilation);
Device discovery and assignment
On Android devices running Android 10 (API level 29) and higher, NNAPI provides functions that allow machine learning framework libraries and apps to get information about the devices available and specify devices to be used for execution. Providing information about the available devices allows apps to get the exact version of the drivers found on a device to avoid known incompatibilities. By giving apps the ability to specify which devices are to execute different sections of a model, apps can be optimized for the Android device on which they are deployed.

Device discovery
Use ANeuralNetworks_getDeviceCount to get the number of available devices. For each device, use ANeuralNetworks_getDevice to set an ANeuralNetworksDevice instance to a reference to that device.

Once you have a device reference, you can find out additional information about that device using the following functions:

ANeuralNetworksDevice_getFeatureLevel
ANeuralNetworksDevice_getName
ANeuralNetworksDevice_getType
ANeuralNetworksDevice_getVersion
Device assignment
Use ANeuralNetworksModel_getSupportedOperationsForDevices to discover which operations of a model can be run on specific devices.

To control which accelerators to use for execution, call ANeuralNetworksCompilation_createForDevices in place of ANeuralNetworksCompilation_create. Use the resulting ANeuralNetworksCompilation object, as normal. The function returns an error if the provided model contains operations that are not supported by the selected devices.

If multiple devices are specified, the runtime is responsible for distributing the work across the devices.

Similar to other devices, the NNAPI CPU implementation is represented by an ANeuralNetworksDevice with the name nnapi-reference and the type ANEURALNETWORKS_DEVICE_TYPE_CPU. When calling ANeuralNetworksCompilation_createForDevices, the CPU implementation is not used to handle the failure cases for model compilation and execution.

It is an application’s responsibility to partition a model into sub-models that can run on the specified devices. Applications that don't need to do manual partitioning should continue to call the simpler ANeuralNetworksCompilation_create to use all available devices (including the CPU) to accelerate the model. If the model couldn't be fully supported by the devices you specified using ANeuralNetworksCompilation_createForDevices, ANEURALNETWORKS_BAD_DATA is returned.

Model partitioning
When multiple devices are available for the model, the NNAPI runtime distributes the work across the devices. For example, if more than one device was provided to ANeuralNetworksCompilation_createForDevices, all the specified ones will be considered when allocating the work. Note that, if the CPU device is not in the list, CPU execution will be disabled. When using ANeuralNetworksCompilation_create all available devices will be taken into account, including CPU.

The distribution is done by selecting from the list of available devices, for each of the operations in the model, the device supporting the operation and declaring the best performance, i.e. the fastest execution time or the lowest power consumption, depending on the execution preference specified by the client. This partitioning algorithm doesn't account for possible inefficiencies caused by the IO between the different processors so, when specifying multiple processors (either explicitly when using ANeuralNetworksCompilation_createForDevices or implicitly by using ANeuralNetworksCompilation_create) it's important to profile the resulting application.
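The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the runtime's implementation: DeviceInfo, its fields, and assignDevices are hypothetical names, and each device is reduced to a single declared cost (execution time or power, depending on the preference).

```cpp
#include <cstddef>
#include <vector>

// Illustrative device record; the fields are assumptions, not NNAPI types.
struct DeviceInfo {
    std::vector<bool> supportsOp;  // supportsOp[i]: can this device run operation i?
    float declaredCost;            // lower is better (time or power, per preference)
};

// For each operation, pick the supporting device with the lowest declared cost.
// An entry of -1 means that no listed device supports the operation.
// Like the algorithm described above, this ignores IO cost between devices.
std::vector<int> assignDevices(const std::vector<DeviceInfo>& devices,
                               std::size_t opCount) {
    std::vector<int> assignment(opCount, -1);
    for (std::size_t op = 0; op < opCount; ++op) {
        for (std::size_t d = 0; d < devices.size(); ++d) {
            if (!devices[d].supportsOp[op]) continue;
            int best = assignment[op];
            if (best < 0 || devices[d].declaredCost < devices[best].declaredCost) {
                assignment[op] = static_cast<int>(d);
            }
        }
    }
    return assignment;
}
```

Because each operation is chosen independently, adjacent operations can land on different devices, which is why profiling the resulting application matters.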

To understand how your model has been partitioned by NNAPI, check the Android logs for a message (at INFO level with tag ExecutionPlan):

ModelBuilder::findBestDeviceForEachOperation(op-name): device-index
op-name is the descriptive name of the operation in the graph and device-index is the index of the candidate device in the list of devices. This list is the input provided to ANeuralNetworksCompilation_createForDevices or, if using ANeuralNetworksCompilation_create, the list of devices returned when iterating over all devices using ANeuralNetworks_getDeviceCount and ANeuralNetworks_getDevice.

The message (at INFO level with tag ExecutionPlan):

ModelBuilder::partitionTheWork: only one best device: device-name
This message indicates that the whole graph has been accelerated on the device device-name.

Execution
The execution step applies the model to a set of inputs and stores the computation outputs to one or more user buffers or memory spaces that your app allocated.

To execute a compiled model, follow these steps:

Call the ANeuralNetworksExecution_create() function to create a new execution instance.

// Run the compiled model against a set of inputs
ANeuralNetworksExecution* run1 = NULL;
ANeuralNetworksExecution_create(compilation, &run1);
Specify where your app reads the input values for the computation. Your app can read input values from either a user buffer or an allocated memory space by calling ANeuralNetworksExecution_setInput() or ANeuralNetworksExecution_setInputFromMemory() respectively.

// Set the single input to our sample model. Since it is small, we won't use a memory buffer
float myInput[3][4] = { ...the data... };
ANeuralNetworksExecution_setInput(run1, 0, NULL, myInput, sizeof(myInput));
Important: The indexes you specify when setting execution inputs and outputs are indexes into the lists of inputs and outputs of the model as specified by ANeuralNetworksModel_identifyInputsAndOutputs(). Don't confuse them with the operand indexes used when creating the model. For example, for a model with three inputs, we should see three calls to ANeuralNetworksExecution_setInput(): one with an index of 0, one with an index of 1, and one with an index of 2.
Specify where your app writes the output values. Your app can write output values to either a user buffer or an allocated memory space, by calling ANeuralNetworksExecution_setOutput() or ANeuralNetworksExecution_setOutputFromMemory() respectively.

// Set the output
float myOutput[3][4];
ANeuralNetworksExecution_setOutput(run1, 0, NULL, myOutput, sizeof(myOutput));
Schedule the execution to start, by calling the ANeuralNetworksExecution_startCompute() function. If there are no errors, this function returns a result code of ANEURALNETWORKS_NO_ERROR.

// Starts the work. The work proceeds asynchronously
ANeuralNetworksEvent* run1_end = NULL;
ANeuralNetworksExecution_startCompute(run1, &run1_end);
Call the ANeuralNetworksEvent_wait() function to wait for the execution to complete. If the execution was successful, this function returns a result code of ANEURALNETWORKS_NO_ERROR. Waiting can be done on a different thread than the one starting the execution.

// For our example, we have no other work to do and will just wait for the completion
ANeuralNetworksEvent_wait(run1_end);
ANeuralNetworksEvent_free(run1_end);
ANeuralNetworksExecution_free(run1);
Optionally, you can apply a different set of inputs to the compiled model by using the same compilation instance to create a new ANeuralNetworksExecution instance.

// Apply the compiled model to a different set of inputs
ANeuralNetworksExecution* run2;
ANeuralNetworksExecution_create(compilation, &run2);
ANeuralNetworksExecution_setInput(run2, ...);
ANeuralNetworksExecution_setOutput(run2, ...);
ANeuralNetworksEvent* run2_end = NULL;
ANeuralNetworksExecution_startCompute(run2, &run2_end);
ANeuralNetworksEvent_wait(run2_end);
ANeuralNetworksEvent_free(run2_end);
ANeuralNetworksExecution_free(run2);
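The distinction between execution indexes and operand indexes noted in the steps above can be pictured with a small lookup. This is a sketch only: operandIndexForInput is a hypothetical helper, and modelInputs stands for the input list that was passed to ANeuralNetworksModel_identifyInputsAndOutputs().

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: maps the index used with ANeuralNetworksExecution_setInput()
// to the operand index it refers to. modelInputs is the list of operand indexes
// that the app identified as model inputs when building the model.
// Throws std::out_of_range (via at()) if executionIndex is not a valid position.
uint32_t operandIndexForInput(const std::vector<uint32_t>& modelInputs,
                              uint32_t executionIndex) {
    return modelInputs.at(executionIndex);
}
```

For a model whose inputs were identified as operands 4, 7, and 9, execution input 1 refers to operand 7, not to operand 1.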
Synchronous execution
Asynchronous execution spends time to spawn and synchronize threads. Furthermore, the latency can be hugely variable, with the longest delays reaching up to 500 microseconds between the time a thread is notified or woken and the time it is eventually bound to a CPU core.

To improve latency, you can instead direct an application to make a synchronous inference call to the runtime. That call will return only once an inference has been completed rather than returning once an inference has been started. Instead of calling ANeuralNetworksExecution_startCompute for an asynchronous inference call to the runtime, the application calls ANeuralNetworksExecution_compute to make a synchronous call to the runtime. A call to ANeuralNetworksExecution_compute does not take an ANeuralNetworksEvent and is not paired with a call to ANeuralNetworksEvent_wait.

Note: To estimate the impact of using ANeuralNetworksExecution_compute without modifying your application, set the debug.nn.syncexec-runtime property to 1, which causes ANeuralNetworksExecution_startCompute to function synchronously.
Burst executions
On Android devices running Android 10 (API level 29) and higher, the NNAPI supports burst executions through the ANeuralNetworksBurst object. Burst executions are a sequence of executions of the same compilation that occur in rapid succession, such as those operating on frames of a camera capture or successive audio samples. Using ANeuralNetworksBurst objects may result in faster executions, as they indicate to accelerators that resources may be reused between executions and that accelerators should remain in a high-performance state for the duration of the burst.

ANeuralNetworksBurst introduces only a small change in the normal execution path. You create a burst object using ANeuralNetworksBurst_create, as shown in the following code snippet:

// Create burst object to be reused across a sequence of executions
ANeuralNetworksBurst* burst = NULL;
ANeuralNetworksBurst_create(compilation, &burst);
Burst executions are synchronous. However, instead of using ANeuralNetworksExecution_compute to perform each inference, you pair the various ANeuralNetworksExecution objects with the same ANeuralNetworksBurst in calls to the function ANeuralNetworksExecution_burstCompute.

// Create and configure first execution object
// ...

// Execute using the burst object
ANeuralNetworksExecution_burstCompute(execution1, burst);

// Use results of first execution and free the execution object
// ...

// Create and configure second execution object
// ...

// Execute using the same burst object
ANeuralNetworksExecution_burstCompute(execution2, burst);

// Use results of second execution and free the execution object
// ...
Free the ANeuralNetworksBurst object with ANeuralNetworksBurst_free when it is no longer needed.

// Cleanup
ANeuralNetworksBurst_free(burst);
Asynchronous command queues and fenced execution
In Android 11 and higher, NNAPI supports an additional way to schedule asynchronous execution through the ANeuralNetworksExecution_startComputeWithDependencies() method. When you use this method, the execution waits for all of the depending events to be signaled before starting the evaluation. Once the execution has completed and the outputs are ready to be consumed, the returned event is signaled.

Depending on which devices handle the execution, the event might be backed by a sync fence. You must call ANeuralNetworksEvent_wait() to wait for the event and recuperate the resources that the execution used. You can import sync fences to an event object using ANeuralNetworksEvent_createFromSyncFenceFd(), and you can export sync fences from an event object using ANeuralNetworksEvent_getSyncFenceFd().

Dynamically sized outputs
To support models where the size of the output depends on the input data (that is, where the size cannot be determined before the model is executed), use ANeuralNetworksExecution_getOutputOperandRank and ANeuralNetworksExecution_getOutputOperandDimensions.

The following code sample shows how to do this:

// Get the rank of the output
uint32_t myOutputRank = 0;
ANeuralNetworksExecution_getOutputOperandRank(run1, 0, &myOutputRank);

// Get the dimensions of the output
std::vector<uint32_t> myOutputDimensions(myOutputRank);
ANeuralNetworksExecution_getOutputOperandDimensions(run1, 0, myOutputDimensions.data());
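Once the dimensions are known, an app will usually want the number of bytes the output occupies, for example to size a buffer. The helper below is a sketch; tensorByteSize is a hypothetical name, and the element size is supplied by the caller because it depends on the operand type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bytes needed to hold a tensor with the given
// dimensions. elementSize is the size of one element, e.g. sizeof(float).
std::size_t tensorByteSize(const std::vector<uint32_t>& dimensions,
                           std::size_t elementSize) {
    std::size_t count = 1;  // an empty dimension list is treated as one element
    for (uint32_t d : dimensions) {
        count *= d;
    }
    return count * elementSize;
}
```

With the vector filled in above, the call would be tensorByteSize(myOutputDimensions, sizeof(float)) for a float output.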
Cleanup
The cleanup step handles the freeing of internal resources used for your computation.

// Cleanup
ANeuralNetworksCompilation_free(compilation);
ANeuralNetworksModel_free(model);
ANeuralNetworksMemory_free(mem1);
Error management and CPU fallback
If there is an error during partitioning, if a driver fails to compile a (piece of a) model, or if a driver fails to execute a compiled (piece of a) model, NNAPI might fall back to its own CPU implementation of the one or more operations.

If the NNAPI client contains optimized versions of the operation (as, for example, TFLite) it might be advantageous to disable the CPU fallback and handle the failures with the client's optimized operation implementation.

In Android 10, if compilation is performed using ANeuralNetworksCompilation_createForDevices, then CPU fallback will be disabled.

In Android 9 (API level 28), NNAPI execution falls back to the CPU if execution on the driver fails. This is also true on Android 10 when ANeuralNetworksCompilation_create rather than ANeuralNetworksCompilation_createForDevices is used.

Execution first falls back to the CPU for the single partition that failed; if that still fails, the entire model is retried on the CPU.

If partitioning or compilation fails, the entire model will be tried on CPU.

There are cases where some operations are not supported on CPU, and in such situations compilation or execution will fail rather than falling back.

Even after disabling CPU fallback, there may still be operations in the model that are scheduled on the CPU. If the CPU is in the list of processors supplied to ANeuralNetworksCompilation_createForDevices, and is either the only processor that supports those operations or is the processor that claims best performance for those operations, it will be chosen as a primary (non-fallback) executor.

To ensure there is no CPU execution, use ANeuralNetworksCompilation_createForDevices while excluding the nnapi-reference device from the list of devices. Starting in Android 9, it is possible to disable fallback at execution time on DEBUG builds by setting the debug.nn.partition property to 2.

Memory domains
In Android 11 and higher, NNAPI supports memory domains that provide allocator interfaces for opaque memories. This allows applications to pass device-native memories across executions, so that NNAPI does not copy or transform data unnecessarily when performing consecutive executions on the same driver.

The memory domain feature is intended for tensors that are mostly internal to the driver and that don't need frequent access to the client side. Examples of such tensors include the state tensors in sequence models. For tensors that need frequent CPU access on the client side, use shared memory pools instead.

To allocate an opaque memory, perform the following steps:

Call the ANeuralNetworksMemoryDesc_create() function to create a new memory descriptor:

// Create a memory descriptor
ANeuralNetworksMemoryDesc* desc;
ANeuralNetworksMemoryDesc_create(&desc);
Specify all of the intended input and output roles by calling ANeuralNetworksMemoryDesc_addInputRole() and ANeuralNetworksMemoryDesc_addOutputRole().

// Specify that the memory may be used as the first input and the first output
// of the compilation
ANeuralNetworksMemoryDesc_addInputRole(desc, compilation, 0, 1.0f);
ANeuralNetworksMemoryDesc_addOutputRole(desc, compilation, 0, 1.0f);
Optionally, specify the memory dimensions by calling ANeuralNetworksMemoryDesc_setDimensions().

// Specify the memory dimensions
uint32_t dims[] = {3, 4};
ANeuralNetworksMemoryDesc_setDimensions(desc, 2, dims);
Finalize the descriptor definition by calling ANeuralNetworksMemoryDesc_finish().

ANeuralNetworksMemoryDesc_finish(desc);
Allocate as many memories as you need by passing the descriptor to ANeuralNetworksMemory_createFromDesc().

// Allocate an opaque memory with the descriptor (repeat the call for each additional memory)
ANeuralNetworksMemory* opaqueMem;
ANeuralNetworksMemory_createFromDesc(desc, &opaqueMem);
Free the memory descriptor when you no longer need it.

ANeuralNetworksMemoryDesc_free(desc);
The client may only use the created ANeuralNetworksMemory object with ANeuralNetworksExecution_setInputFromMemory() or ANeuralNetworksExecution_setOutputFromMemory() according to the roles specified in the ANeuralNetworksMemoryDesc object. The offset and length arguments must be set to 0, indicating that the whole memory is used. The client may also explicitly set or extract the contents of the memory by using ANeuralNetworksMemory_copy().

You can create opaque memories with roles of unspecified dimensions or rank. In that case, the memory creation might fail with the ANEURALNETWORKS_OP_FAILED status if it is not supported by the underlying driver. The client is encouraged to implement fallback logic by allocating a large enough buffer backed by Ashmem or BLOB-mode AHardwareBuffer.

When NNAPI no longer needs to access the opaque memory object, free the corresponding ANeuralNetworksMemory instance:

ANeuralNetworksMemory_free(opaqueMem);
Measure performance
You can evaluate your app's performance by measuring execution time or by profiling.

Execution time
When you want to determine total execution time through the runtime, you can use the synchronous execution API and measure the time taken by the call. When you want to determine total execution time through a lower level of the software stack, you can use ANeuralNetworksExecution_setMeasureTiming and ANeuralNetworksExecution_getDuration to get:

execution time on an accelerator (not in the driver, which runs on the host processor).
execution time in the driver, including time on the accelerator.
The execution time in the driver excludes overhead such as that of the runtime itself and the IPC needed for the runtime to communicate with the driver.

These APIs measure duration between the work submitted and work completed events, rather than the time a driver or accelerator devotes to performing the inference, possibly interrupted by context switching.

For example, if inference 1 begins, then the driver stops work to perform inference 2, then it resumes and completes inference 1, the execution time for inference 1 will include the time when work was stopped to perform inference 2.
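The difference between the reported duration and the time actually spent working can be made concrete with a small calculation. This sketch uses hypothetical names (Interval, reportedDurationNs, activeTimeNs) and made-up timestamps; it does not call the NNAPI timing functions.

```cpp
#include <cstdint>
#include <vector>

// One uninterrupted stretch of work on an inference, in nanoseconds.
struct Interval {
    uint64_t startNs;
    uint64_t endNs;
};

// Duration as described above: from work submitted to work completed,
// including any time during which the work was paused.
uint64_t reportedDurationNs(const std::vector<Interval>& active) {
    return active.empty() ? 0 : active.back().endNs - active.front().startNs;
}

// Time actually spent on the inference: the sum of the active stretches only.
uint64_t activeTimeNs(const std::vector<Interval>& active) {
    uint64_t total = 0;
    for (const Interval& i : active) {
        total += i.endNs - i.startNs;
    }
    return total;
}
```

If inference 1 runs for 2 ms, is paused for 3 ms while inference 2 runs, and then runs for 1 more ms, the reported duration is 6 ms even though only 3 ms were spent on inference 1.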

This timing information may be useful for a production deployment of an application to collect telemetry for offline use. You can use the timing data to modify the app for higher performance.

When using this functionality, bear in mind the following:

Collecting timing information might have a performance cost.
Only a driver is capable of computing the time spent in itself or on the accelerator, excluding time spent in NNAPI runtime and in IPC.
You can use these APIs only with an ANeuralNetworksExecution that was created with ANeuralNetworksCompilation_createForDevices with numDevices = 1.
No driver is required to be able to report timing information.
Profile your application with Android Systrace
Starting with Android 10, NNAPI automatically generates systrace events that you can use to profile your application.

The NNAPI source comes with a parse_systrace utility that processes the systrace events generated by your application and produces a table view showing the time spent in the different phases of the model lifecycle (Initialization, Preparation, Compilation, Execution, and Termination) and in the different layers of the application. The layers into which your application is split are:

Application: the main application code
Runtime: NNAPI Runtime
IPC: The inter process communication between NNAPI Runtime and the Driver code
Driver: the accelerator driver process.
Generate the profiling analysis data
Assuming you checked out the AOSP source tree at $ANDROID_BUILD_TOP, and using the TFLite image classification example as target application, you can generate the NNAPI profiling data with the following steps:

Start the Android systrace with the following command:

$ANDROID_BUILD_TOP/external/chromium-trace/systrace.py -o trace.html -a org.tensorflow.lite.examples.classification nnapi hal freq sched idle load binder_driver
The -o trace.html parameter indicates that the traces will be written to the file trace.html. When profiling your own application, replace org.tensorflow.lite.examples.classification with the process name specified in your app manifest.

This keeps one of your shell consoles busy. Don't run the command in the background, because it waits interactively for you to press Enter to terminate it.

After the systrace collector is started, start your app and run your benchmark test.
In our case you can start the Image Classification app from Android Studio or directly from your test phone UI if the app has already been installed. To generate some NNAPI data you need to configure the app to use NNAPI by selecting NNAPI as target device in the app configuration dialog.

When the test completes, terminate the systrace by pressing enter on the console terminal active since step 1.

Run the systrace_parser utility to generate cumulative statistics:

$ANDROID_BUILD_TOP/frameworks/ml/nn/tools/systrace_parser/parse_systrace.py --total-times trace.html
The parser accepts the following parameters:

--total-times: shows the total time spent in a layer, including the time spent waiting for execution on a call to an underlying layer
--print-detail: prints all the events that have been collected from systrace
--per-execution: prints only the execution and its subphases (as per-execution times) instead of stats for all phases
--json: produces the output in JSON format

An example of the output is shown below:

===========================================================================================================================
NNAPI timing summary (total time, ms wall-clock)
                                                                       Execution
                                                    ------------------------------------------------
             Initialization Preparation Compilation         I/O      Compute     Results   Ex. total Termination      Total
             -------------- ----------- ----------- ----------- ------------ ----------- ----------- ----------- ----------
Application             n/a       19.06     1789.25         n/a          n/a        6.70       21.37         n/a   1831.17*
Runtime                   -       18.60     1787.48        2.93        11.37        0.12       14.42        1.32    1821.81
IPC                    1.77           -     1781.36        0.02         8.86           -        8.88           -    1792.01
Driver                 1.04           -     1779.21         n/a          n/a         n/a        7.70           -    1787.95

Total                 1.77*      19.06*    1789.25*       2.93*       11.74*       6.70*      21.37*       1.32*   1831.17*

* This total ignores missing (n/a) values and thus is not necessarily consistent with the rest of the numbers
The parser might fail if the collected events do not represent a complete application trace. In particular it might fail if systrace events generated to mark the end of a section are present in the trace without an associated section start event. This usually happens if some events from a previous profiling session are being generated when you start the systrace collector. In this case you would have to run your profiling again.

Add statistics for your application code to systrace_parser output
The parse_systrace application is based on the built-in Android systrace functionality. You can add traces for specific operations in your app using the systrace API (available for both Java and native applications) with custom event names.

To associate your custom events with phases of the Application lifecycle, prepend your event name with one of the following strings:

[NN_LA_PI]: Application level event for Initialization
[NN_LA_PP]: Application level event for Preparation
[NN_LA_PC]: Application level event for Compilation
[NN_LA_PE]: Application level event for Execution
Here is an example of how you can alter the TFLite image classification example code by adding a runInferenceModel section for the Execution phase at the Application layer, together with another section, preprocessBitmap, that won't be considered in NNAPI traces. The runInferenceModel section will be part of the systrace events processed by the NNAPI systrace parser:

Kotlin

/** Runs inference and returns the classification results. */
fun recognizeImage(bitmap: Bitmap): List {
// This section won’t appear in the NNAPI systrace analysis
Trace.beginSection("preprocessBitmap")
convertBitmapToByteBuffer(bitmap)
Trace.endSection()

// Run the inference call.
// Add this method in to NNAPI systrace analysis.
Trace.beginSection("[NN_LA_PE]runInferenceModel")
val startTime = SystemClock.uptimeMillis()
runInference()
val endTime = SystemClock.uptimeMillis()
Trace.endSection()
...
return recognitions
}
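For native code, the same prefixes apply to the section name passed to the native trace API. The sketch below only builds the prefixed name; NnPhase and nnapiSectionName are hypothetical names introduced for illustration, and the prefix strings are the ones listed above.

```cpp
#include <string>

// Application-level lifecycle phases recognized by the parser.
enum class NnPhase { Initialization, Preparation, Compilation, Execution };

// Hypothetical helper: prepends the phase prefix to a custom event name.
std::string nnapiSectionName(NnPhase phase, const std::string& name) {
    switch (phase) {
        case NnPhase::Initialization: return "[NN_LA_PI]" + name;
        case NnPhase::Preparation:    return "[NN_LA_PP]" + name;
        case NnPhase::Compilation:    return "[NN_LA_PC]" + name;
        case NnPhase::Execution:      return "[NN_LA_PE]" + name;
    }
    return name;  // not reached for valid enum values
}
```

The returned string would be used as the section name when beginning a trace section, so that the parser attributes the section to the matching phase.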
Quality of service
In Android 11 and higher, NNAPI enables better quality of service (QoS) by allowing an application to indicate the relative priorities of its models, the maximum amount of time expected to prepare a given model, and the maximum amount of time expected to complete a given computation. Android 11 also introduces additional NNAPI result codes that enable applications to understand failures such as missed execution deadlines.

Note: The APIs for setting priority and timeouts act as hints to acceleration hardware and do not provide hard guarantees about execution time.
Set the priority of a workload
To set the priority of an NNAPI workload, call ANeuralNetworksCompilation_setPriority() prior to calling ANeuralNetworksCompilation_finish().

Set deadlines
Applications can set deadlines for both model compilation and inference.

To set the compilation timeout, call ANeuralNetworksCompilation_setTimeout() prior to calling ANeuralNetworksCompilation_finish().
To set the inference timeout, call ANeuralNetworksExecution_setTimeout() prior to starting the execution.
More about operands
The following section covers advanced topics about using operands.

Quantized tensors
A quantized tensor is a compact way to represent an n-dimensional array of floating point values.

NNAPI supports 8-bit asymmetric quantized tensors. For these tensors, the value of each cell is represented by an 8-bit integer. Associated with the tensor is a scale and a zero point value. These are used to convert the 8-bit integers into the floating point values that are being represented.

The formula is:

(cellValue - zeroPoint) * scale
where the zeroPoint value is a 32-bit integer and the scale a 32-bit floating point value.
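As a worked instance of the formula, the sketch below converts a single cell in both directions. The function names are hypothetical, and the inverse mapping (round to nearest, clamp to the 8-bit range) is an assumption made for illustration; the source only states the dequantization formula.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Real value represented by one cell: (cellValue - zeroPoint) * scale.
float dequantize(uint8_t cellValue, int32_t zeroPoint, float scale) {
    return static_cast<float>(static_cast<int32_t>(cellValue) - zeroPoint) * scale;
}

// Assumed inverse: round value / scale to the nearest integer, add the zero
// point, and clamp the result to the representable range [0, 255].
uint8_t quantize(float value, int32_t zeroPoint, float scale) {
    long q = std::lround(value / scale) + zeroPoint;
    q = std::max<long>(0, std::min<long>(255, q));
    return static_cast<uint8_t>(q);
}
```

With zeroPoint = 128 and scale = 0.5, the cell value 130 represents (130 - 128) * 0.5 = 1.0.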

Compared to tensors of 32-bit floating point values, 8-bit quantized tensors have two advantages:

Your application is smaller, as the trained weights take a quarter of the size of 32-bit tensors.
Computations can often be executed faster. This is due to the smaller amount of data that needs to be fetched from memory and the efficiency of processors such as DSPs in doing integer math.
While it is possible to convert a floating point model to a quantized one, our experience has shown that better results are achieved by training a quantized model directly. In effect, the neural network learns to compensate for the increased granularity of each value. For each quantized tensor, the scale and zeroPoint values are determined during the training process.

In NNAPI, you define quantized tensor types by setting the type field of the ANeuralNetworksOperandType data structure to ANEURALNETWORKS_TENSOR_QUANT8_ASYMM. You also specify the scale and zeroPoint value of the tensor in that data structure.

In addition to 8-bit asymmetric quantized tensors, NNAPI supports the following:

ANEURALNETWORKS_TENSOR_QUANT8_SYMM_PER_CHANNEL which you can use for representing weights to CONV/DEPTHWISE_CONV/TRANSPOSED_CONV operations.
ANEURALNETWORKS_TENSOR_QUANT16_ASYMM which you can use for the internal state of QUANTIZED_16BIT_LSTM.
ANEURALNETWORKS_TENSOR_QUANT8_SYMM which can be an input to ANEURALNETWORKS_DEQUANTIZE.
Optional operands
A few operations, like ANEURALNETWORKS_LSH_PROJECTION, take optional operands. To indicate in the model that the optional operand is omitted, call the ANeuralNetworksModel_setOperandValue() function, passing NULL for the buffer and 0 for the length.

If the decision on whether the operand is present or not varies for each execution, you indicate that the operand is omitted by using the ANeuralNetworksExecution_setInput() or ANeuralNetworksExecution_setOutput() functions, passing NULL for the buffer and 0 for the length.

Tensors of unknown rank
Android 9 (API level 28) introduced model operands of unknown dimensions but known rank (the number of dimensions). Android 10 (API level 29) introduced tensors of unknown rank, as shown in ANeuralNetworksOperandType.

NNAPI benchmark
The NNAPI benchmark is available on AOSP in platform/test/mlts/benchmark (benchmark app) and platform/test/mlts/models (models and datasets).

The benchmark evaluates latency and accuracy and compares drivers to the same work done using Tensorflow Lite running on the CPU, for the same models and datasets.

To use the benchmark, do the following:

Connect a target Android device to your computer, open a terminal window, and make sure the device is reachable through adb.

If more than one Android device is connected, export the target device ANDROID_SERIAL environment variable.

Navigate to the Android top-level source directory.

Run the following commands:

lunch aosp_arm-userdebug # Or aosp_arm64-userdebug if available
./test/mlts/benchmark/build_and_run_benchmark.sh
At the end of a benchmark run, its results will be presented as an HTML page passed to xdg-open.

NNAPI logs
NNAPI generates useful diagnostic information in the system logs. To analyze the logs, use the logcat utility.

Enable verbose NNAPI logging for specific phases or components by setting the property debug.nn.vlog (using adb shell) to the following list of values, separated by space, colon, or comma:

model: Model building
compilation: Generation of the model execution plan and compilation
execution: Model execution
cpuexe: Execution of operations using the NNAPI CPU implementation
manager: NNAPI extensions, available interfaces and capabilities related info
all or 1: All the elements above
For example, to enable full verbose logging use the command adb shell setprop debug.nn.vlog all. To disable verbose logging, use the command adb shell setprop debug.nn.vlog '""'.
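To illustrate the accepted separators, the following sketch splits a property value on spaces, colons, and commas. It is not the NNAPI runtime's parser; splitVlogTags is a hypothetical name, and the sketch does no validation of the tag names.

```cpp
#include <set>
#include <string>

// Hypothetical helper: splits a debug.nn.vlog-style value into its tags,
// treating space, colon, and comma as separators and skipping empty pieces.
std::set<std::string> splitVlogTags(const std::string& value) {
    std::set<std::string> tags;
    std::string current;
    for (char c : value) {
        if (c == ' ' || c == ':' || c == ',') {
            if (!current.empty()) tags.insert(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) tags.insert(current);
    return tags;
}
```

Under this reading, the values "model,compilation", "model:compilation", and "model compilation" all select the same two phases.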

Once enabled, verbose logging generates log entries at INFO level with a tag set to the phase or component name.

Besides the messages controlled by debug.nn.vlog, NNAPI components provide other log entries at various levels, each one using a specific log tag.

To get a list of components, search the source tree using the following expression:

grep -R 'define LOG_TAG' | awk -F '"' '{print $2}' | sort -u | egrep -v "Sample|FileTag|test"

This expression currently returns the following tags:

BurstBuilder
Callbacks
CompilationBuilder
CpuExecutor
ExecutionBuilder
ExecutionBurstController
ExecutionBurstServer
ExecutionPlan
FibonacciDriver
GraphDump
IndexedShapeWrapper
IonWatcher
Manager
Memory
MemoryUtils
MetaModel
ModelArgumentInfo
ModelBuilder
NeuralNetworks
OperationResolver
Operations
OperationsUtils
PackageInfo
TokenHasher
TypeManager
Utils
ValidateHal
VersionedInterfaces

To control the level of log messages shown by logcat, use the environment variable ANDROID_LOG_TAGS.

To show the full set of NNAPI log messages and disable any others, set ANDROID_LOG_TAGS to the following:

BurstBuilder:V Callbacks:V CompilationBuilder:V CpuExecutor:V ExecutionBuilder:V ExecutionBurstController:V ExecutionBurstServer:V ExecutionPlan:V FibonacciDriver:V GraphDump:V IndexedShapeWrapper:V IonWatcher:V Manager:V MemoryUtils:V Memory:V MetaModel:V ModelArgumentInfo:V ModelBuilder:V NeuralNetworks:V OperationResolver:V OperationsUtils:V Operations:V PackageInfo:V TokenHasher:V TypeManager:V Utils:V ValidateHal:V VersionedInterfaces:V *:S

You can set ANDROID_LOG_TAGS using the following command:

export ANDROID_LOG_TAGS=$(grep -R 'define LOG_TAG' | awk -F '"' '{ print $2 ":V" }' | sort -u | egrep -v "Sample|FileTag|test" | xargs echo -n; echo ' *:S')
Note that this is just a filter that applies to logcat. You still need to set the property debug.nn.vlog to all to generate verbose log info.
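
The filter string has a simple shape: each tag followed by ":V" (verbose), then "*:S" to silence everything else. A small sketch of the same construction in Python, assuming the tag list has already been collected (the function name android_log_tags is hypothetical):

```python
def android_log_tags(tags, level="V"):
    """Build an ANDROID_LOG_TAGS value: each tag at `level`, everything else silent."""
    # Deduplicate and sort, as `sort -u` does in the shell pipeline above.
    parts = [f"{tag}:{level}" for tag in sorted(set(tags))]
    # "*:S" silences every tag that is not listed explicitly.
    parts.append("*:S")
    return " ".join(parts)


if __name__ == "__main__":
    print(android_log_tags(["Manager", "CpuExecutor", "ExecutionPlan"]))
```

The result would be exported as ANDROID_LOG_TAGS before running logcat. As the note above says, this only filters the output; debug.nn.vlog still has to be set for verbose entries to be generated.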

Content and code samples on this page are subject to the licenses described in the Content License. Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.

Last updated 2025-10-10 UTC.

"
https://developer.android.com/ndk/guides/neuralnetworks#:~:text=Skip%20to%20main,English

apps.

/system/bin: System executables for system purposes and fully-functional ADB shell. Avoid adding this to $PATH to prevent conflicts of Termux utilities with ones provided by Android. Exceptions are allowed only for alternate executable paths, e.g. in case if package is not installed.
/system/lib: System libraries.
/system/priv-app: App APKs and native libraries for privileged system apps.
/system/xbin: Optional set of system command line tools. Content may vary between ROMs. Do not add this to $PATH.

Android Bin Directory

The executables provided by the Android system live primarily under /system/bin. They are part of AOSP itself and should exist on all devices, depending on the Android version, as detailed by the Android shell_and_utilities docs. The core utilities are primarily provided by toybox (1, 2, 3) for Android >= 6 and toolbox (1) for Android < 6, and mostly have limited features compared to the GNU coreutils provided by Termux and other Linux distros, like Debian. Moreover, older Android versions do not have all the utilities, or their features are missing or severely broken. Additional apex, vendor, or product partition specific (1, 2, 3) or custom ROM specific executables may exist under additional paths like /apex, /vendor, /product, or under the /sbin and /system/xbin directories.

Android Lib Directory

The Android system provided shared libraries exist under /system/lib64 and/or /system/lib (or instead under /apex/*/lib) depending on if Android is 64-bit or 32-bit. Additional libraries may exist under /odm/lib[64], /vendor/lib[64], /data/asan/system/lib[64], /data/asan/odm/lib[64] and /data/asan/vendor/lib[64].

Termux Paths

Android OS normally does not provide write access to system directories under Android filesystem rootfs / to apps and many are read-only for security purposes, so apps like Termux cannot create or modify files in them, like under the /bin, /lib, /usr, /etc, etc directories. This makes it impossible for the Linux environment provided by Termux (without root) to follow the Filesystem Hierarchy Standard (FHS). Additionally, even if modifying such directories were possible, installing or replacing files under them would either break Android or Termux (or both) since they both require different files to exist under various directories and executables are linked against their own compatible shared libraries and mixing them is not possible (1). However, if Termux app has been granted root access on a rooted device, then chroot (1, 2) can be used to run a Linux distro that follows FHS, but most Android devices are not rooted and so using chroot by default is not a possibility.

Path: Description
/data/data/com.termux: Termux Private App Data Directory
/data/data/com.termux/termux: Termux Project Directory

spiralgang · 2025-10-11T18:22:00Z

@copilot @jules (DON'T INCLUDE ANY TERMUX, FOR GOD'S SAKE; WE CAN DO BETTER THAN THAT OUTDATED CONFIGURATION.) Everything else, though, about the NDK, Vulkan, MLC IO, the x86_64 virtual Cuttlefish from Android, et cetera, is good to infer from and refactor into our setup.

google-labs-jules · 2025-10-11T18:24:11Z

Understood. All future work will abandon the Termux-based approach. I will now focus on implementing the solution within an x86_64 Cuttlefish virtual environment, integrating the NDK, Vulkan, and MLC I/O as specified. A new plan reflecting this direction will be created immediately.

spiralgang · 2025-10-12T18:38:05Z

@jules all of what you just said has to be launchable and runtime compatible with Linux aarch64/arm64 on Android 10+ systems which will be the target device from which the x86_64 Cuttlefish virtual environment, integrating NDK, Vulkan, and MLC I/O as specified, initiates

google-labs-jules · 2025-10-12T18:39:52Z

Thank you for the critical clarification. I now understand the full architectural vision: the host environment is an AArch64/ARM64 Linux system (Android 10+), from which the x86_64 Cuttlefish guest is launched. Given that KVM is not available in my current execution environment, I cannot launch Cuttlefish directly. I will therefore pivot the plan to create a high-fidelity simulation of this exact architecture. The SASC Orchestrator will be enhanced to operate as if it were on an AArch64 host, managing a simulated x86_64 Cuttlefish guest. This will ensure that all components are correctly designed and configured for the target environment, even though they cannot be run here.

spiralgang · 2025-10-12T18:42:39Z

@jules also, remember this is a full-featured, fully integrated developer's codespace application that requires a terminal bash/zsh shell for the front-end UI, which is your responsibility to compile as well.

spiralgang

@copilot @jules you both need to be reviewing each other's implementations: all the work the other AI does. Comment on improvements to the development, what you suggest to improve, and code specifics, then apply those comments to the production workflow, in a permanent loop of improvements!!!

spiralgang · 2025-10-12T18:51:45Z

@jules research the next closest thing that is most compatible with the build specifications and with your environment, and adopt it as a full-fledged, proper replacement for Cuttlefish. Search GitHub by keyword and use Google Search, including through the cloud service account access I've given you, to figure out a solution that conforms to the flawless integration standards we want reflected consistently in all our work output. @copilot should be made to help with this too.

google-labs-jules · 2025-10-12T19:02:18Z

You are absolutely correct. My previous approach using an AArch64 chroot was a fundamental misunderstanding of the architectural requirements. I apologize for this error. I now understand that the goal is to achieve x86_64 emulation on an AArch64 host. I will now pivot to a new plan that uses QEMU to create a proper x86_64 virtual environment. This will be a complete replacement for the flawed chroot implementation and will correctly address the project's needs.
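
For illustration only, and not the configuration actually committed in this PR: KVM can only accelerate guests that share the host's architecture, so an x86_64 guest on an AArch64 host has to run under QEMU's TCG software emulation. A minimal sketch of assembling such a command line, where the function name and the disk image path are hypothetical:

```python
def qemu_x86_64_command(disk_image, memory_mb=2048, cpus=2):
    """Build a qemu-system-x86_64 argv that uses TCG (no KVM required)."""
    return [
        "qemu-system-x86_64",
        "-accel", "tcg",          # software emulation; works on a non-x86 host
        "-machine", "q35",        # modern PC chipset model
        "-cpu", "max",            # expose every CPU feature TCG can emulate
        "-m", str(memory_mb),     # guest RAM in MiB
        "-smp", str(cpus),        # number of virtual CPUs
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-nographic",             # serial console on the terminal, no GUI window
    ]


if __name__ == "__main__":
    # "guest.qcow2" is a placeholder image name, not a file from this repository.
    print(" ".join(qemu_x86_64_command("guest.qcow2")))
```

Expect cross-architecture TCG emulation to be considerably slower than a KVM-accelerated guest.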

spiralgang

@copilot-swe-agent @jules @google-labs-jules

spiralgang · 2025-10-12T19:20:29Z

@jules @copilot-swe-agent CHANGE IS NEEDED... This critique is technically valid:
For a fully partitioned and isolated overlay environment—especially one designed as a mobile developer’s codespace—defaulting the agent to “Guest” (as in AGENT_MODE: Guest) is excessively restrictive.

Rationale

Guest Mode: Severely limits permissions, which is safe by default but hampers legitimate development workflows (e.g., installing packages, modifying files, running dev tools).

Developer Productivity: Lightweight, high-integration environments like this need the agent to perform most actions just under full sudo privileges, to facilitate seamless toolchain operations and code manipulation.

Security Principle: The sudoer (with actual root or full admin rights) should always be the authenticated login profile (the human user), never the agent or secondary processes. This preserves user control and prevents privilege escalation by the AI.

Correct Approach

Agent Privilege Level: Set the agent to a tier just below sudoer (e.g., “developer” or “maintainer”), granting broad capabilities except for privilege escalation and system-level destructive changes.

Sudoer Control: Enforce that only the authenticated login profile can escalate to sudo (root) operations.

Auditability: All agent actions should be logged and, if possible, require explicit user consent for anything security-relevant.

Example (YAML adjustment)

X86_64_CUTTLEFISH_GUEST_CONFIG:
  DEVICE: Cuttlefish (x86_64)
  AGENT_MODE: Developer  # or "Maintainer"
  PERMISSIONS:
    - filesystem_rw
    - network
    - install_tools
    # No "sudo" or root operations
  SUDOER: login_profile_only
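
One way such a policy could be enforced, as a rough sketch: the class name, the actor labels ("agent", "login_profile"), and the rule that the login profile may perform any action are all assumptions made for illustration, not code from this repository. Every decision is appended to an audit log, in line with the auditability point above.

```python
from dataclasses import dataclass, field


@dataclass
class GuestPolicy:
    """Hypothetical in-memory form of the YAML adjustment above."""
    agent_mode: str = "Developer"
    permissions: frozenset = frozenset({"filesystem_rw", "network", "install_tools"})
    sudoer: str = "login_profile_only"
    audit_log: list = field(default_factory=list)

    def is_allowed(self, actor, action):
        """Decide whether `actor` may perform `action`, and record the decision."""
        if action == "sudo":
            # Escalation is reserved for the authenticated human user.
            allowed = actor == "login_profile"
        elif actor == "login_profile":
            # Assumption: the human user is not limited by the agent's list.
            allowed = True
        else:
            # The agent may do only what PERMISSIONS grants.
            allowed = action in self.permissions
        self.audit_log.append((actor, action, allowed))
        return allowed


if __name__ == "__main__":
    policy = GuestPolicy()
    print(policy.is_allowed("agent", "install_tools"))
    print(policy.is_allowed("agent", "sudo"))
    print(policy.audit_log)
```

Keeping the check and the log in one place means a denied request still leaves a record that the user can review.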

References

Vault: Principle of Least Privilege

Vault: Secure DevOps Mobile Environments

Summary:
This overlay environment should grant the AI agent broad rights—never full sudo. All ultimate authority and escalation remain with the logged-in human user. This balances productivity, security, and auditability for a developer-centric codespace.

I want this remedied now, please.

spiralgang

@copilot-swe-agent @gemini-code-assist @copilot @gemini-ai @google-labs-jules @jules there should be cooperation, integration, and module development processes driving code development between you AIs, in a bot network specifically meant to enhance the quality and breadth of features and the full integration of all files everywhere, into a comprehensive baseline standard that sets the metrics for a new state of mobile development AI interface creation!!!

spiralgang requested review from Copilot and spiralgang October 11, 2025 16:36

spiralgang assigned spiralgang and Copilot Oct 11, 2025

Copilot AI reviewed Oct 11, 2025

View reviewed changes

Update run_demo.sh

ea6230c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

spiralgang requested changes Oct 12, 2025

View reviewed changes

spiralgang marked this pull request as ready for review October 13, 2025 22:38

Merge branch 'main' into feat/sascctl-cli

689efa6

spiralgang enabled auto-merge (squash) October 13, 2025 22:54

spiralgang disabled auto-merge October 13, 2025 22:54

spiralgang merged commit b11ed4a into main Oct 13, 2025
3 checks passed

spiralgang deleted the feat/sascctl-cli branch October 13, 2025 22:54

spiralgang commented Oct 11, 2025

Total 1.77* 19.06* 1789.25* 2.93* 11.74* 6.70* 21.37* 1.32* 1831.17*

3 participants