This repository was archived by the owner on Aug 2, 2025. It is now read-only.

Commit eff18cb (parent: 1e9a550)

2 files changed: 92 additions & 47 deletions

README.md (20 additions & 15 deletions)
@@ -1,21 +1,26 @@
-# RunPod-Fooocus-API
+![Static Badge](https://img.shields.io/badge/API_version-0.3.29-blue) ![Static Badge](https://img.shields.io/badge/API_coverage-100%25-vividgreen) ![Static Badge](https://img.shields.io/badge/API_tests-passed-vividgreen) ![Static Badge](https://img.shields.io/badge/Known_bugs-1-red) ![Static Badge](https://img.shields.io/badge/Fooocus_version-2.0.78-lightgrey)
 
-This is a RunPod Fooocus-API worker that expects a Fooocus-API v0.3.26 instance installed on a RunPod Network Volume.
+# RunPod-Fooocus-API
 
-Ready-to-use Docker Image with this repo's code: https://hub.docker.com/r/3wad/runpod-fooocus-api (use `3wad/runpod-fooocus-api:0.2.41`)
+This is a RunPod Fooocus-API worker that expects a **Fooocus-API `v0.3.29`** instance installed on a RunPod Network Volume.
+For ready-to-use serverless endpoint image with this repo's code use: [`3wad/runpod-fooocus-api:0.3.29`](https://hub.docker.com/r/3wad/runpod-fooocus-api/tags)
 
 ## How to prepare Network Volume
-- Create RunPod network volume. 15GB is just enough for the generic Foocus with Juggernaut model. You can increase its size any time if you need additional models, loras etc. But unfortunately, it cannot be reduced back.
-- Create a custom Pod Template and use the `konieshadow/fooocus-api:v0.3.26` image. I went with 30GB disk sizes, mount path /workspace, and expose http 8888 and tcp 22.
-- Run the network volume with the custom fooocus-api image you've just created. You don't need a strong GPU pod, the installation is CPU and download-intensive, but be aware that some older-gen pods might not support the required CUDA versions. Let it download and install everything. After the Juggernaut model is downloaded, use the connect button to load into the Fooocus-API docs running on the pod's 8888 port. Here you should try all the API methods you plan to use. Not only to verify they work, but also additional models are downloaded once you run inpaint, outpaint, upscale, vary and image inputs (canny, face swap etc.) endpoints for the first time.
-- After that you are ready to connect to the pod's console and use cp -r /app/* /workspace/ to copy everything into the persistent network volume
+- [**Create RunPod network volume:**](https://www.runpod.io/console/user/storage)
+17GB is just enough for the generic Foocus with Juggernaut and all controlnet models. You can increase its size any time if you need additional models, loras etc. But unfortunately, it cannot be reduced back without creating new one.
+- [**Create a custom Pod Template:**](https://www.runpod.io/console/user/templates) and use the `konieshadow/fooocus-api:v0.3.29` image. I went with 30GB disk sizes, mount path `/workspace`, and expose `http 8888` and `tcp 22`.
+- [**Run a GPU pod:**](https://www.runpod.io/console/gpu-secure-cloud) with network volume and custom fooocus-api template you've just created. You don't need a strong GPU pod, the installation is CPU and download-intensive, but be aware that some older-gen pods might not support the required CUDA versions. Let it download and install everything. After the Juggernaut model is downloaded, use the connect button to load into the Fooocus-API docs running on the pod's 8888 port. Here you should try all the API methods you plan to use. Not only to verify they work, but also because additional up-to-date models are downloaded once you run inpaint, outpaint, upscale, vary and img2img (canny, face swap etc.) endpoints for the first time.
+- After that you are ready to connect to the pod's console and use `cp -r /app/* /workspace/` to copy everything into the persistent network volume
 - Once everything is copied successfully, you can terminate the pod. You have the network volume ready.
-- ---
-- Now you can use our premade image: `3wad/runpod-fooocus-api:0.2.41` and skip the next step OR create your custom docker image from this repo that will run on the actual serverless API. Feel free to adjust handler.py based on how you want to make your requests and it's parameters, or add additional features.
-- Once you build it, upload it to the Docker Hub.
-- Now you create a custom Serverless Pod Template using the Docker Hub image you've just uploaded (or our premade one). Active container disk should be slightly bigger than the size of the worker docker image.
-- Create a new Serverless API Endpoint. Make sure to choose your (or ours) Docker Hub image and not the `konieshadow/fooocus-api` from step 2. In Advanced settings choose your created network volume.
-- Other settings are your choice, but I personally found that using 4090/L4 GPUs + Flashboot is the most cost-effective one. In frequent use, the 4090 is able to return an image in ~8s including cold start, making it ~4x cheaper to run this on RunPod than for example using DALLE-3 API. This fact can of course vary based on datacenter locations and GPU availability.
+---
+- Now you can use our premade image: `3wad/runpod-fooocus-api:0.3.29` and skip the next step OR create your custom docker image from this repo that will run on the actual serverless API. Feel free to adjust the code to your needs.
+- *If you built your own image, upload it to the Docker Hub.*
+- [**Create a custom Serverless Pod Template:**](https://www.runpod.io/console/serverless/user/templates) using the Docker Hub image you've just uploaded (or our premade one). Active container disk should be slightly bigger than the size of that docker image. In the case of our prebuild one, it's currently about 13.7GB
+- [**Create a new Serverless API Endpoint:**](https://www.runpod.io/console/serverless) Make sure to choose your (or our) Docker Hub image and not the `konieshadow/fooocus-api` from the step 2. In Advanced settings choose your created network volume.
+- Other settings are your choice, but I personally found that using 4090/L4 GPUs + Flashboot is the most cost-effective one. In frequent use, the 4090 is able to return a txt2img in ~8s including cold start, making it **~25x** cheaper to run Fooocus on RunPod than for example using DALLE-3 API *(01/24 prices: 0,0016usd/img vs 0,04usd/img). This fact can of course vary based on datacenter locations and GPU availability.*
+
+## How to send requests
+[request_examples.js](https://github.com/davefojtik/RunPod-Fooocus-API/blob/main/request_examples.js) contain example payloads for all endpoints on your serverless worker. But don't hesitate to ask in the [Discussions](https://github.com/davefojtik/RunPod-Fooocus-API/discussions) if you need more help.
 
-## Contributors welcomed
-Feel free to do pull requests, fixes, improvements and suggestions to the code. I can spend limited time on this as it's only a side project for our community discord bot. So more people would definitely help me maintain this.
+## Contributors Welcomed
+Feel free to do pull requests, fixes, improvements and suggestions to the code. I can spend only limited time on this as it's a side project for our community discord bot. So any cooperation will help manage this repo better.
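Before opening request_examples.js, the request shape can be sketched in Python. The endpoint ID and API key below are placeholders, and the `api_name` routing field is an assumption inferred from the handler's route-table lookup, not a confirmed parameter name; only the `"input"` wrapper is certain, since the handler reads `event["input"]`:

```python
import json

# Hypothetical values -- substitute your own RunPod endpoint ID and API key
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"
RUNSYNC_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

def build_request(prompt: str, api_name: str = "txt2img") -> dict:
    """Wrap the Fooocus-API parameters the way the worker expects:
    everything under "input" is handed to run_inference()."""
    return {
        "input": {
            "api_name": api_name,  # assumed routing key matching the handler's route table
            "prompt": prompt,
        }
    }

headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
body = build_request("a cinematic photo of a lighthouse at dusk")
print(json.dumps(body))
# To actually send it: requests.post(RUNSYNC_URL, headers=headers, json=body)
```

The send itself is left as a comment so the sketch stays offline; `/runsync` blocks until the job finishes, while `/run` returns a job ID to poll.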

src/handler.py (72 additions & 32 deletions)
@@ -1,4 +1,6 @@
 # Native
+import shutil
+import os
 import time
 import requests
 import re
@@ -29,15 +31,16 @@ def wait_for_service(url):
            print("Service not ready yet. Retrying...")
        except Exception as err:
            print("Error: ", err)
-
-        time.sleep(1)
-
+        time.sleep(0.5)
 
def run_inference(params):
    config = {
        "baseurl": "http://127.0.0.1:8888",
        "api": {
-            "txt2img": ("POST", "/v1/generation/text-to-image"),
+            "home": ("GET", "/"),
+            "ping": ("GET", "/ping"),
+            "txt2img": ("POST", "/v1/generation/text-to-image"),
+            "txt2img-ip": ("POST", "/v2/generation/text-to-image-with-ip"),
            "upscale-vary": ("POST", "/v1/generation/image-upscale-vary"), #multipart/form-data
            "upscale-vary2": ("POST", "/v2/generation/image-upscale-vary"),
            "inpaint-outpaint": ("POST", "/v1/generation/image-inpait-outpaint"), #multipart/form-data
@@ -46,10 +49,12 @@ def run_inference(params):
            "img2img2": ("POST", "/v2/generation/image-prompt"),
            "queryjob": ("GET", "/v1/generation/query-job"),
            "jobqueue": ("GET", "/v1/generation/job-queue"),
-            "stop": ("POST", "/v1/generation/stop"),
-            "models": ("GET", "/v1/engines/all-models"),
-            "models-refresh": ("POST", "/v1/engines/refresh-models"),
-            "styles": ("GET", "/v1/engines/styles")
+            "jobhistory": ("GET", "/v1/generation/job-history"),
+            "stop": ("POST", "/v1/generation/stop"),
+            "describe": ("POST", "/v1/tools/describe-image"), #multipart/form-data
+            "models": ("GET", "/v1/engines/all-models"),
+            "models-refresh": ("POST", "/v1/engines/refresh-models"),
+            "styles": ("GET", "/v1/engines/styles")
        },
        "timeout": 300
    }
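The `api` dictionary in run_inference() is a plain dispatch table: each two-tuple stores the HTTP verb and path, and the handler concatenates the path onto `baseurl`. A minimal standalone sketch of that lookup, with only two routes copied in:

```python
# Trimmed copy of the dispatch table used by run_inference()
config = {
    "baseurl": "http://127.0.0.1:8888",
    "api": {
        "txt2img": ("POST", "/v1/generation/text-to-image"),
        "describe": ("POST", "/v1/tools/describe-image"),
    },
}

def resolve(api_name):
    """Return (verb, full_url), or raise like the handler does."""
    if api_name in config["api"]:
        verb, path = config["api"][api_name]
        return verb, "%s%s" % (config["baseurl"], path)
    raise Exception("Method '%s' not yet implemented" % api_name)

print(resolve("describe"))
```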
@@ -60,38 +65,56 @@ def run_inference(params):
        api_config = config["api"][api_name]
    else:
        raise Exception("Method '%s' not yet implemented" % api_name)
-    #
+
    api_verb = api_config[0]
    api_path = api_config[1]
    response = {}
 
-    if api_verb == "GET":
-        response = sd_session.get(
-            url='%s%s' % (config["baseurl"], api_path),
-            timeout=config["timeout"])
+    # You can send the input_image, input_mask, and cn_images as PNG: binary encoded into base64 string OR as url string
+    input_imgs = {'input_image':None, 'input_mask':None, 'cn_img1':None, 'cn_img2':None, 'cn_img3':None, 'cn_img4':None, 'image_prompts':[None,None,None,None], 'image':None}
+
+    def process_img(value):
+        if re.search(r'https?:\/\/\S+', value) is not None:
+            return requests.get(value).content
+        elif re.search(r'^[A-Za-z0-9+/]+[=]{0,2}$', value) is not None and value != "None":
+            return base64.b64decode(value)
+        else:
+            return value
 
-    # You can send the input_image, input_mask, and cn_images as binary encoded into url-safe base64 string or as publicly accessible url link string
-    input_imgs = {'input_image':None, 'input_mask':None, 'cn_img1':None, 'cn_img2':None, 'cn_img3':None, 'cn_img4':None,}
-    for key in input_imgs.items():
+    for key, value in input_imgs.items():
        if key in params:
            try:
-                if not re.search(r'https?://\S+', params[key]) is None:
-                    img_url = params.get(key)
-                    input_imgs[key] = requests.get(img_url).content
+                if key == "image_prompts":
+                    for index, prompt in enumerate(params.get("image_prompts", [])):
+                        input_imgs["image_prompts"][index] = process_img(prompt["cn_img"])
                else:
-                    input_imgs[key] = base64.urlsafe_b64decode(params[key])
+                    input_imgs[key] = process_img(params[key])
            except Exception as e:
                print("Image conversion task failed: ", e)
-                return None
+                return e
+
+    # --- Send requests to the Fooocus-API ---
+    if api_verb == "GET":
+        response = sd_session.get(url='%s%s' % (config["baseurl"], api_path), timeout=config["timeout"])
 
    if api_verb == "POST":
        # If the request should be multipart/form-data, convert the application/json data into it.
-        if api_name in ["upscale-vary", "inpaint-outpaint", "img2img"]:
+        if api_name in ["upscale-vary", "inpaint-outpaint", "img2img", "describe"]:
            try:
-                # Remove the input_image key/value from original request data (it gets confused otherwise)
-                del params['input_image']
+                # Replace the original image params with the processed ones
+                for key, value in input_imgs.items():
+                    if value is not None:
+                        if type(value) == list:
+                            for i, value in enumerate(input_imgs['image_prompts']):
+                                if value is not None:
+                                    params['image_prompts'][i]['cn_img'] = (key+'.png', value, 'image/png')
+                        else:
+                            if isinstance(value, bytes):
+                                params[key] = (key+'.png', value, 'image/png')
+                            else:
+                                params[key] = value
                # Convert
-                multipart_data = MultipartEncoder(fields={key: (f'{key}.png', value, 'image/png') for key, value in input_imgs.items()}, **params)
+                multipart_data = MultipartEncoder(fields={**params})
                headers = {'Content-Type': multipart_data.content_type}
                response = sd_session.post(
                    url='%s%s' % (config["baseurl"], api_path),
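The new process_img() decides per value whether it received a URL, a base64 string, or neither. Its detection logic can be exercised offline with the same two regexes; here the network fetch and decode are replaced by labels so the sketch stays self-contained:

```python
import base64
import re

def classify_input(value):
    # Same detection order as the diff's process_img(): URL first,
    # then base64, otherwise the value is passed through unchanged.
    if re.search(r'https?:\/\/\S+', value) is not None:
        return "url"        # process_img() would requests.get() this
    elif re.search(r'^[A-Za-z0-9+/]+[=]{0,2}$', value) is not None and value != "None":
        return "base64"     # process_img() would base64.b64decode() this
    return "passthrough"

encoded = base64.b64encode(b"fake-png-bytes").decode()
print(classify_input("https://example.com/cat.png"))  # url
print(classify_input(encoded))                        # base64
print(classify_input("None"))                         # passthrough
```

Worth noting: the base64 pattern also matches ordinary alphanumeric words, so any plain string other than `"None"` will be routed to the decode branch, where a malformed length raises and is caught by the surrounding try/except.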
@@ -100,17 +123,29 @@ def run_inference(params):
                    timeout=config["timeout"])
            except Exception as e:
                print("multipart/form-data task failed: ", e)
-                return None
+                return e
        # If the final request should be application/json. Send the original request data
        else:
+            # Convert the processed binary image back to url-safe-base64
            for key, value in input_imgs.items():
-                if not value is None:
-                    params[key] = base64.urlsafe_b64encode(value).decode('utf-8')
+                if value is not None:
+                    if type(value) == list:
+                        for i, value in enumerate(input_imgs['image_prompts']):
+                            if isinstance(value, bytes):
+                                params['image_prompts'][i]["cn_img"] = base64.b64encode(value).decode('utf-8')
+                    elif isinstance(value, bytes):
+                        params[key] = base64.b64encode(value).decode('utf-8')
            response = sd_session.post(
                url='%s%s' % (config["baseurl"], api_path),
                json=params,
                timeout=config["timeout"])
-    return response.json()
+
+    # --- Return the API response to the RunPod ---
+    content_type = response.headers.get('Content-Type', '')
+    if 'application/json' in content_type:
+        return response.json()
+    else:
+        return response.text
 
# ---------------------------------------------------------------------------- #
#                                RunPod Handler                                #
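The commit also stops assuming every Fooocus-API response is JSON: the worker now inspects the Content-Type header and falls back to the raw text body. That branch can be tested with a stand-in response object (FakeResponse is purely illustrative, not part of the repo):

```python
import json

class FakeResponse:
    """Stand-in for a requests.Response, just enough for unpack() below."""
    def __init__(self, content_type, body):
        self.headers = {'Content-Type': content_type}
        self.text = body
    def json(self):
        return json.loads(self.text)

def unpack(response):
    # Same logic as the new tail of run_inference()
    content_type = response.headers.get('Content-Type', '')
    if 'application/json' in content_type:
        return response.json()
    else:
        return response.text

print(unpack(FakeResponse('application/json', '{"job_id": 1}')))  # {'job_id': 1}
print(unpack(FakeResponse('text/html', '<h1>Fooocus</h1>')))      # <h1>Fooocus</h1>
```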
@@ -119,16 +154,21 @@ def handler(event):
    '''
    This is the handler function that will be called by the serverless.
    '''
+    # Clear outputs (Comment out or delete this part to keep images on the volume. Keep in mind you'll always need enough space to generate more)
+    print("Clearing outputs...")
+    shutil.rmtree('/workspace/outputs/files')
+    os.makedirs('/workspace/outputs/files')
+    shutil.rmtree('/workspace/repositories/Fooocus/outputs')
+    os.makedirs('/workspace/repositories/Fooocus/outputs')
 
    json = run_inference(event["input"])
-
+
    # Return the output that you want to be returned like pre-signed URLs to output artifacts
    return json
 
-
if __name__ == "__main__":
    wait_for_service(url='http://127.0.0.1:8888/v1/generation/text-to-image')
 
    print("Fooocus API Service is ready. Starting RunPod...")
 
-    runpod.serverless.start({"handler": handler})
+    runpod.serverless.start({"handler": handler})
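The new output-clearing step in handler() is a wipe-and-recreate on two fixed `/workspace` paths. The same pattern, demonstrated on a throwaway directory (the helper name clear_outputs is ours, not the handler's):

```python
import os
import shutil
import tempfile

def clear_outputs(*dirs):
    # Same wipe-and-recreate pattern the handler applies to the
    # Fooocus output folders before each job.
    for d in dirs:
        shutil.rmtree(d, ignore_errors=True)
        os.makedirs(d, exist_ok=True)

# Demo on a temp dir instead of /workspace/outputs/files
demo = os.path.join(tempfile.mkdtemp(), "files")
os.makedirs(demo)
open(os.path.join(demo, "stale.png"), "w").close()
clear_outputs(demo)
print(os.listdir(demo))  # []
```

Note the sketch adds `ignore_errors=True` / `exist_ok=True` so a missing folder does not raise; the committed code calls the bare functions, so it assumes both paths already exist on the volume.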
