Prompt-based GUI and terminal automation #3463
Replies: 6 comments
-
For such an assistant to become usable, it must be both real-time and always-learning. Real-time means it will listen to the user, monitor feedback from the computer, and react accordingly. Always-learning means it will watch and imitate the user's actions, ask the user questions, and search online to learn more.
-
Indeed, that would be very cool. Efforts like this exist, e.g. https://robotme.org/. We'll probably not get into this in the first version, but in subsequent versions it is definitely on the table!
-
Sounds a bit like speech recognition software (e.g. Dragon NaturallySpeaking) that can perform specific actions like clicking somewhere, opening programs, or dictating text, combined with intent recognition like the current voice assistants (Alexa, Siri, ...), but more flexible in what it can understand. The end product could be an app that runs in the background of your PC or smartphone and that you can talk to: ask it any question and command it to do stuff on the computer for you. Here is my research for "Linux Voice Interface": https://pad.nixnet.services/d1W89tL8Qj6-65-UJcp5SA?view Especially check out Almond aka Genie from Stanford. Maybe you can collaborate with them to create an open-source, privacy-preserving voice assistant. Integration with Home Assistant would also be great.
-
It has now been partly implemented: as part of my ideology, the project Cybergod has been released. Here's the program in action: cybergod_with_background.mp4. If anyone is interested in Cybergod, please join the official Discord group.
-
I developed a terminal interaction environment for agents, capable of converting all info from the terminal into meaningful text, including cursor and styling information. The terminal environment can also be captured as an image with the cursor highlighted in red. OpenDevin is working on this right now.
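The cursor-aware text serialization described above can be sketched roughly like this; this is an illustrative reconstruction, not the actual environment code, and the `render_screen` helper and marker character are my own invention:

```python
# Sketch: serialize a terminal screen into plain text with the cursor
# position made explicit, so an LLM agent can "see" where input will go.

def render_screen(lines, cursor_row, cursor_col, marker="█"):
    """Return the screen as text, with the cursor cell replaced by a marker."""
    out = []
    for r, line in enumerate(lines):
        if r == cursor_row:
            # Pad the line so the cursor column always exists.
            padded = line.ljust(cursor_col + 1)
            line = padded[:cursor_col] + marker + padded[cursor_col + 1:]
        out.append(line)
    return "\n".join(out)

screen = ["$ ls", "README.md  src", "$ "]
print(render_screen(screen, cursor_row=2, cursor_col=2))
```

A real implementation would sit behind a terminal emulator that parses the ANSI escape stream and also tracks per-cell styling, but the agent-facing output is essentially this kind of annotated text grid.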
-
GUI and terminal automation as an agent capability is a compelling direction: it's one of the few things that lets AI operate in the real computer environment rather than just at the LLM API layer. A few things we found matter for production GUI automation agents:
- Capability declaration: what can this agent actually click/type vs. what is off-limits? Without explicit capability bounds, an automation agent can end up doing things the user didn't intend. We use capability manifests at spawn time: the agent knows it can interact with [application X, terminal, browser Y] but not [email client, file system outside /tmp].
- Action attribution: if the agent takes a destructive action (deletes a file, sends an email), you need an audit trail that shows exactly what the agent was instructed to do vs. what it inferred was correct. Signed execution receipts for each discrete action.
- Cost and rate limits: screenshot capture plus vision model calls are expensive. Agents need to know their budget for vision calls and decide when to screenshot vs. when to infer state from accessible text.
- Graceful fallback: GUI elements change and applications update. A robust automation agent needs to handle "element not found" without silently failing.
For the terminal automation specifically, we've been building this as part of KinthAI's agent execution environment, i.e. agents that can run terminal commands within a sandboxed context: https://blog.kinthai.ai/openclaw-multi-tenancy-why-vm-per-user-doesnt-scale covers the isolation model. Are you targeting single-user desktop automation or multi-user server-side automation?
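The first two points above (capability manifests and signed execution receipts) could look roughly like this minimal sketch; it assumes nothing about KinthAI's actual API, and every name here (`MANIFEST`, `check_capability`, `execution_receipt`) is hypothetical:

```python
import hashlib
import hmac
import json
import time

# Hypothetical spawn-time capability manifest for one agent.
MANIFEST = {"allowed": {"terminal", "browser"}}
SECRET = b"per-agent signing key"  # would come from the agent runtime

def check_capability(target):
    """Refuse any action outside the declared capability bounds."""
    if target not in MANIFEST["allowed"]:
        raise PermissionError(f"agent has no capability for {target!r}")

def execution_receipt(target, action, instructed_by):
    """Record what the agent did and why, signed so the trail can't be forged."""
    record = {
        "target": target,
        "action": action,
        "instructed_by": instructed_by,  # user instruction vs. agent inference
        "ts": time.time(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

check_capability("terminal")  # allowed: no exception
receipt = execution_receipt("terminal", "rm /tmp/x", "user")
print(receipt["sig"][:8])
```

An HMAC only proves the receipt was produced by a holder of the key; a production audit trail would more likely use asymmetric signatures so verifiers don't need the signing secret.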


-
I have always wanted to make a bot that executes GUI and terminal tasks like a human, such as "check and clean up disks", "make a funny video and upload it to YouTube", "edit and test this bash script till it is bug-free", "talk to people on Twitter and post ads".
Of course, these tasks can be done by domain-specific software, but since ChatGPT shows promising capabilities and Open-Assistant is working on it, I wonder if it can target human-level computer operations and become a real killer assistant.
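A task like "edit and test this bash script till it is bug-free" is essentially a bounded run-observe-repair loop. A minimal sketch, assuming bash is available and with the model-driven repair step stubbed out (`fix_script` is a hypothetical placeholder):

```python
import os
import subprocess
import tempfile

def run_bash(script_text):
    """Run a bash script, returning (exit_code, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    try:
        proc = subprocess.run(["bash", path], capture_output=True, text=True)
        return proc.returncode, proc.stderr
    finally:
        os.unlink(path)

def fix_script(script_text, stderr):
    """Placeholder: a real assistant would ask an LLM to patch the script."""
    return script_text

script = "echo hello"
for _ in range(3):  # bounded retries, not "till forever"
    code, err = run_bash(script)
    if code == 0:
        break
    script = fix_script(script, err)
```

The retry bound matters: without it, a repair step that never converges would loop indefinitely, which is exactly the kind of silent failure an assistant must avoid.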