merging development into master (#62)

Kabeer2004 · web-flow · commit cdcab75843c0 · 2025-05-08T02:56:15.000+05:30
chore: update README.md for v2

feat: modified orpheus generator script to handle batch generation
diff --git a/README.md b/README.md
@@ -31,6 +31,8 @@
     <a href="https://github.com/existence-master/Sentient/issues/">Report Bug</a>
   <span> · </span>
     <a href="https://github.com/existence-master/Sentient/issues/">Request Feature</a>
+  <span> · </span>
+    <a href="https://www.youtube.com/watch?v=l481bvpCjbc">Watch our Ad!</a>
   </h4>
 </div>
 
@@ -75,27 +77,27 @@ We at [Existence](https://existence.technology) believe that AI won't simply die
 ### :camera: Screenshots
 
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431842199-b76c7a9a-1689-42de-93ed-5d04d6c7ad10.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDIxOTktYjc2YzdhOWEtMTY4OS00MmRlLTkzZWQtNWQwNGQ2YzdhZDEwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY2ZThhMDIyZmJkZWYxYzE5MzMyNTYzZDM5NjY0MmM3ZDc2NmJjMmYwNGU5MjUzMmJhYTE1NDU3NDhhZGIwODgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.U2Bn6mIdJF2SvXpJ9fyKe2c36-feA2wKtvQNcYjaEYY" alt="screenshot" />
+  <img src="https://i.postimg.cc/jqNX99VF/image.png" alt="screenshot" />
   <p align="center">Context is streamed in from your apps - Sentient uses this context to 👇</p>
 </div>
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431841076-c7337318-38e2-4515-848d-df6ce9ec8685.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDEwNzYtYzczMzczMTgtMzhlMi00NTE1LTg0OGQtZGY2Y2U5ZWM4Njg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFkYmIzYWJkMDExMmU0NzllMmZmNjU0NmUyNzIyYzJlZjUwMzM1ZDY0NjY0NjlhYTM4ODNiOGNmNDRkYzhhZTQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.s87SsI2uPocqdoRQK-b_1R89ApFKnvOoVzislh77bAw" alt="screenshot" />
+  <img src="https://i.postimg.cc/FRVMVKxj/image.png" alt="screenshot" />
   <p align="center">Learn Long-Term Memories about you</p>
 </div>
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431841142-33edc431-6be9-45b3-9b9c-5262f459ede6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDExNDItMzNlZGM0MzEtNmJlOS00NWIzLTliOWMtNTI2MmY0NTllZGU2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFlOTVhMWEyODVhMWZmN2ZmMzNjMGMyZWMxZjQwYzFkNGM4OGZhZTQ4YjVkYTc5MmRhY2ZmZGQxZTBmOTY4NjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.GUEsRDZzletVFm4uKBQhRehk4l2FhzEJuX5jFnglbZ4" alt="screenshot" />
+  <img src="https://i.postimg.cc/hth7Fzzt/image.png" alt="screenshot" />
   <p align="center">Learn Short-Term Memories about you</p>
 </div>
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431841274-ea980432-1357-451b-93d2-d952a65f4607.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDEyNzQtZWE5ODA0MzItMTM1Ny00NTFiLTkzZDItZDk1MmE2NWY0NjA3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZjMTM5OWRlMGI1ODI0Zjg4YmJiYjk2MDBmMWNjNDdhMDRjODM2YjBhNjJjY2JiMzMxMGNlM2UzYjU5OGFmYzcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.F76G4nymktipQtkQZQ_9sfmMFKiQ1AH-0hMoPWt0DQE" alt="screenshot" />
+  <img src="https://i.postimg.cc/FFM9FYBK/image.png" alt="screenshot" />
   <p align="center">Perform Actions for you, asynchronously and by combining all the different tools it needs to complete a task.</p>
 </div>
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431842176-c1ec90b6-edcc-4f9c-bc94-aa2e40b6422f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDIxNzYtYzFlYzkwYjYtZWRjYy00ZjljLWJjOTQtYWEyZTQwYjY0MjJmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRkZGMyY2Y1NTkyMDk1YjY4NWEwZjY1NDUxNWQ5NDc2NWU1OTAwZmM3ZjVjYWNmZDQzYWE1ZGNkMjJiYjQ3ZDImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.4djs4rCVqHY4L5_gshezAiMNgIcLui_eiFbZc8rsKrY" alt="screenshot" />
+  <img src="https://i.postimg.cc/TPpSW9yv/image.png" alt="screenshot" />
   <p align="center">You can also voice-call Sentient anytime for a low-latency, human-like interactive experience.</p>
 </div>
 <div align="center"> 
-  <img src="https://private-user-images.githubusercontent.com/59280736/431842396-03af93ff-6acd-44c7-a973-dca20ac205bd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDQzMTkyMTYsIm5iZiI6MTc0NDMxODkxNiwicGF0aCI6Ii81OTI4MDczNi80MzE4NDIzOTYtMDNhZjkzZmYtNmFjZC00NGM3LWE5NzMtZGNhMjBhYzIwNWJkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA0MTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwNDEwVDIxMDE1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZmNGNkZjI5OTQ1YmFjNmYzMmIzNThiOWEyZmIyZTBiMjVlMjczNTc2NmY3MjU1NjkzOTMwNjUwYzgyZDliMzImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.XCieUKi8dB-r8H75QHWKwX7UBtC6m1NXbSFxbUV_lkI" alt="screenshot" />
+  <img src="https://i.postimg.cc/tJSWPhZ8/image.png" alt="screenshot" />
   <p align="center">Your profile can also be enriched with data from other social media sites.</p>
 </div>
 
diff --git a/src/server/tests/test_orpheus.py b/src/server/tests/test_orpheus.py
@@ -137,7 +137,7 @@ def run_async():
 from llama_cpp import Llama
 
 # Set the path to your GGUF model file (update this to the correct path)
-MODEL_PATH = "./models/orpheus-3b-0.1-ft-q4_k_m.gguf"  # Replace with your GGUF file path
+MODEL_PATH = "../voice/models/orpheus-3b-0.1-ft-q4_k_m.gguf"  # Replace with your GGUF file path
 
 # Number of layers to offload to GPU (adjust based on your GPU memory, e.g., 30 for 8GB VRAM)
 N_GPU_LAYERS = 20
@@ -161,6 +161,23 @@ def run_async():
 END_TOKEN_IDS = [128009, 128260, 128261, 128257]
 CUSTOM_TOKEN_PREFIX = "<custom_token_"
 
+# Default text to be spoken if no text is provided
+DEFAULT_TEXT = "This is a default sentence."
+BATCH_SENTENCES = [
+    "Good morning Kabeer!",
+    "You've got a busy day ahead.",
+    "Meetings, presentations and even a night out with the boys! <chuckle>",
+    "You ready to crush this?",
+]
+
+def create_filename(sentence, max_words=3, max_length=50):
+    words = sentence.split()[:max_words]
+    base = "_".join(words)
+    safe_base = "".join(c for c in base if c.isalnum() or c in ("_", "-"))
+    if len(safe_base) > max_length:
+        safe_base = safe_base[:max_length]
+    return safe_base + ".wav"
+
 def format_prompt(prompt, voice=DEFAULT_VOICE):
     """Format prompt for Orpheus model with voice prefix and special tokens."""
     if voice not in AVAILABLE_VOICES:
@@ -351,55 +368,76 @@ def list_available_voices():
     print("<laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>")
 
 def main():
-    # Parse command line arguments
-    parser = argparse.ArgumentParser(description="Orpheus Text-to-Speech using local GGUF model")
-    parser.add_argument("--text", type=str, help="Text to convert to speech")
-    parser.add_argument("--voice", type=str, default=DEFAULT_VOICE, help=f"Voice to use (default: {DEFAULT_VOICE})")
-    parser.add_argument("--output", type=str, help="Output WAV file path")
+    parser = argparse.ArgumentParser(description="Generate speech from text.")
+    parser.add_argument("text", nargs="*", help="Text to convert to speech")
+    parser.add_argument("--voice", default=DEFAULT_VOICE, help=f"Voice to use (default: {DEFAULT_VOICE})")
+    parser.add_argument("--output", help="Output WAV file path, or output directory when --batch is set")
+    parser.add_argument("--batch", action="store_true", help="Process predefined batch of sentences")
+    parser.add_argument("--temperature", type=float, default=TEMPERATURE, help="Temperature for generation")
+    parser.add_argument("--top_p", type=float, default=TOP_P, help="Top-p sampling parameter")
+    parser.add_argument("--repetition_penalty", type=float, default=REPETITION_PENALTY, help="Repetition penalty (>=1.1 required for stable generation)")
     parser.add_argument("--list-voices", action="store_true", help="List available voices")
-    parser.add_argument("--temperature", type=float, default=TEMPERATURE, help="Temperature for generation")
-    parser.add_argument("--top_p", type=float, default=TOP_P, help="Top-p sampling parameter")
-    parser.add_argument("--repetition_penalty", type=float, default=REPETITION_PENALTY, 
-                       help="Repetition penalty (>=1.1 required for stable generation)")
     
     args = parser.parse_args()
-    
+
     if args.list_voices:
         list_available_voices()
         return
-    
-    # Use text from command line or prompt user
-    prompt = args.text
-    if not prompt:
-        if len(sys.argv) > 1 and sys.argv[1] not in ("--voice", "--output", "--temperature", "--top_p", "--repetition_penalty"):
-            prompt = " ".join([arg for arg in sys.argv[1:] if not arg.startswith("--")])
+
+    if args.batch:
+        # Batch mode
+        if args.output:
+            batch_dir = args.output
+            if not os.path.isdir(batch_dir):
+                os.makedirs(batch_dir, exist_ok=True)
         else:
-            prompt = input("Enter text to synthesize: ")
-            if not prompt:
-                prompt = "Hello, I am Orpheus, an AI assistant with emotional speech capabilities."
-    
-    # Default output file if none provided
-    output_file = args.output
-    if not output_file:
-        os.makedirs("outputs", exist_ok=True)
-        timestamp = time.strftime("%Y%m%d_%H%M%S")
-        output_file = f"outputs/{args.voice}_{timestamp}.wav"
-        print(f"No output file specified. Saving to {output_file}")
-    
-    # Generate speech
-    start_time = time.time()
-    audio_segments = generate_speech_from_api(
-        prompt=prompt,
-        voice=args.voice,
-        temperature=args.temperature,
-        top_p=args.top_p,
-        repetition_penalty=args.repetition_penalty,
-        output_file=output_file
-    )
-    end_time = time.time()
-    
-    print(f"Speech generation completed in {end_time - start_time:.2f} seconds")
-    print(f"Audio saved to {output_file}")
+            batch_dir = "outputs"
+            os.makedirs(batch_dir, exist_ok=True)
+
+        for sentence in BATCH_SENTENCES:
+            filename = create_filename(sentence)
+            output_file = os.path.join(batch_dir, filename)
+            print(f"Generating audio for: {sentence}")
+            start_time = time.time()
+            audio_segments = generate_speech_from_api(
+                prompt=sentence,
+                voice=args.voice,
+                temperature=args.temperature,
+                top_p=args.top_p,
+                repetition_penalty=args.repetition_penalty,
+                output_file=output_file
+            )
+            end_time = time.time()
+            print(f"Speech generation for '{sentence}' completed in {end_time - start_time:.2f} seconds")
+            print(f"Audio saved to {output_file}")
+    else:
+        # Non-batch mode
+        if args.text:
+            prompt = " ".join(args.text)
+        else:
+            prompt = DEFAULT_TEXT
+            print(f"No text provided. Using default text: {DEFAULT_TEXT}")
+
+        if args.output:
+            output_file = args.output
+        else:
+            os.makedirs("outputs", exist_ok=True)
+            timestamp = time.strftime("%Y%m%d_%H%M%S")
+            output_file = f"outputs/{args.voice}_{timestamp}.wav"
+            print(f"No output file specified. Saving to {output_file}")
+
+        start_time = time.time()
+        audio_segments = generate_speech_from_api(
+            prompt=prompt,
+            voice=args.voice,
+            temperature=args.temperature,
+            top_p=args.top_p,
+            repetition_penalty=args.repetition_penalty,
+            output_file=output_file
+        )
+        end_time = time.time()
+        print(f"Speech generation completed in {end_time - start_time:.2f} seconds")
+        print(f"Audio saved to {output_file}")
 
 if __name__ == "__main__":
     main()
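
Note on batch output names: the `create_filename` helper added in this commit builds each batch file name from the first three words of the sentence, dropping every character that is not alphanumeric, `_`, or `-`. The sketch below copies that helper and prints the names it yields for `BATCH_SENTENCES`. `indexed_filename` is a hypothetical wrapper, not part of the commit, shown as one possible way to keep names unique and non-empty when two sentences share their opening words or a sentence has no alphanumeric characters.

```python
# Sentences copied from BATCH_SENTENCES in this commit.
BATCH_SENTENCES = [
    "Good morning Kabeer!",
    "You've got a busy day ahead.",
    "Meetings, presentations and even a night out with the boys! <chuckle>",
    "You ready to crush this?",
]


def create_filename(sentence, max_words=3, max_length=50):
    """Copy of the helper added in this commit."""
    words = sentence.split()[:max_words]
    base = "_".join(words)
    safe_base = "".join(c for c in base if c.isalnum() or c in ("_", "-"))
    if len(safe_base) > max_length:
        safe_base = safe_base[:max_length]
    return safe_base + ".wav"


def indexed_filename(index, sentence):
    """Hypothetical wrapper (an assumption, not in the commit).

    Prefixes a zero-padded index so two sentences never map to the same
    file, and falls back to a fixed stem when nothing survives filtering.
    """
    stem = create_filename(sentence)[: -len(".wav")] or "sentence"
    return f"{index:02d}_{stem}.wav"


names = [create_filename(s) for s in BATCH_SENTENCES]
for sentence, name in zip(BATCH_SENTENCES, names):
    print(f"{name!r} <- {sentence!r}")

# A sentence with no alphanumeric characters yields a bare extension.
print(repr(create_filename("!!!")))
print(repr(indexed_filename(1, "!!!")))
```

With the four sentences in this commit the names do not collide, so the wrapper only matters if the batch list grows to include sentences that start with the same three words.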