This guide walks you through integrating LLM transcript processing into your existing loom-transcript-scraper without disturbing its folder layout, so that future runs keep working as before.
Before making any changes, create a backup of your existing setup:
# Backup your main script
cp process.py process.py.original
You have two options for integration:
Run the integration script to automatically update your process.py file:
python process_llm_integration.py
This will:
- Add LLM processing functions to your main script
- Add command-line arguments for controlling LLM processing
- Create a backup of your original script (as process.py.backup)
If you prefer to manually integrate the changes or have a heavily customized process.py:
- Copy the clean_transcript() and process_for_llm() functions from integrated_solution.py
- Add these functions to your process.py file
- Add the command-line arguments for LLM processing
- Add the code to create the LLM directory
- Add the call to process transcripts after saving them
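The argument and directory steps above could look roughly like the sketch below. It is a minimal illustration, not the code the integration script generates. The flag names `--process-llm` and `--llm-dir` and the default directory `llm_ready_transcripts` come from this guide; `build_parser` and `ensure_llm_dir` are hypothetical helper names, and the sketch assumes your process.py parses its arguments with argparse.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser showing only the LLM-related arguments."""
    parser = argparse.ArgumentParser(description="Download Loom transcripts")
    parser.add_argument(
        "--process-llm",
        action="store_true",
        help="also write an LLM-ready copy of each transcript",
    )
    parser.add_argument(
        "--llm-dir",
        default="llm_ready_transcripts",
        help="directory for LLM-ready transcripts",
    )
    return parser


def ensure_llm_dir(args):
    """Create the LLM directory only when LLM processing was requested."""
    if not args.process_llm:
        return None
    llm_dir = Path(args.llm_dir)
    # parents/exist_ok make repeated runs and nested paths safe.
    llm_dir.mkdir(parents=True, exist_ok=True)
    return llm_dir
```

Because the flag defaults to off, running process.py without `--process-llm` should behave exactly as it did before the integration.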
To download and process transcripts in one step:
python process.py --process-llm
This will:
- Download transcripts from Loom as usual
- Process each transcript for LLM use
- Save processed transcripts to the llm_ready_transcripts directory
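As a rough sketch of the per-transcript save step, assuming processed copies use the `_llm.txt` suffix described in the verification checks: `llm_output_path` and `save_llm_copy` are hypothetical names, and the `clean` callable stands in for whatever `process_for_llm()` does in your copy of integrated_solution.py, whose real signature may differ.

```python
from pathlib import Path


def llm_output_path(transcript_path, llm_dir="llm_ready_transcripts"):
    """Map e.g. 'talk.txt' to 'llm_ready_transcripts/talk_llm.txt'."""
    return Path(llm_dir) / f"{Path(transcript_path).stem}_llm.txt"


def save_llm_copy(transcript_path, llm_dir, clean):
    """Write a processed copy; the original file is only read, never modified."""
    source = Path(transcript_path)
    target = llm_output_path(source, llm_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    text = source.read_text(encoding="utf-8")
    target.write_text(clean(text), encoding="utf-8")
    return target
```

Writing to a separate directory rather than overwriting in place is what keeps the original transcripts intact for later runs.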
You can specify a custom directory for LLM-ready transcripts:
python process.py --process-llm --llm-dir custom_directory
If you already have transcripts downloaded and want to process them separately:
python integrated_solution.py
For customization:
python integrated_solution.py --source-dir "my_transcripts" --target-dir "llm_ready" --force
After running the processing, check that:
- The original transcripts remain intact in their original location
- Processed transcripts are stored in the target directory with the "_llm.txt" suffix
- The processing has correctly preserved timestamps and formatted the text
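The first two checks can be partly scripted. The sketch below is only an illustration: it assumes transcripts are `.txt` files in a single flat directory and that each processed copy is named `<original name>_llm.txt`, and `find_unprocessed` is a hypothetical helper, not part of integrated_solution.py.

```python
from pathlib import Path


def find_unprocessed(source_dir, llm_dir="llm_ready_transcripts"):
    """Return names of .txt transcripts with no matching *_llm.txt copy."""
    source, target = Path(source_dir), Path(llm_dir)
    missing = []
    for transcript in sorted(source.glob("*.txt")):
        if transcript.name.endswith("_llm.txt"):
            continue  # skip processed copies if both directories are the same
        if not (target / f"{transcript.stem}_llm.txt").is_file():
            missing.append(transcript.name)
    return missing
```

An empty result means every original has a processed counterpart; it says nothing about whether the content of each copy is correct, so a manual comparison is still worthwhile.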
Example verification:
# List the processed transcripts
ls -la llm_ready_transcripts/
# Compare an original transcript with its processed version
diff -y --suppress-common-lines "original_transcript.txt" "llm_ready_transcripts/original_transcript_llm.txt"
If you need to revert to the original script:
python process_llm_integration.py --restore
Or manually restore from your backup:
cp process.py.original process.py

# Add Loom video URLs to loom-videos.txt
# Then run:
python process.py --process-llm

python integrated_solution.py --force

python process.py --process-llm --llm-dir "/path/to/llm_transcripts"

Issue: No transcripts are being processed for LLM
Solution: Check that the source directory contains transcript files and that you're using the correct path
Issue: Error when running the integrated script
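Solution: Ensure every module the script imports is available. string and re are part of the Python standard library and need no installation, so an import error usually points to a missing third-party package or to running the script with a different Python interpreter than the one you normally use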
Issue: Processed transcripts missing timestamps
Solution: Check the regex pattern in the clean_transcript() function and adjust it if your transcripts use a different timestamp format
Issue: Original script functionality broke after integration
Solution: Restore from backup and try the manual integration approach
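For the missing-timestamps issue above, it can help to test a pattern against a few raw transcript lines before editing clean_transcript(). The sketch below is not the pattern from integrated_solution.py. It assumes timestamps appear at the start of a line as M:SS, MM:SS, or H:MM:SS, and `TIMESTAMP_RE`, `split_timestamp`, and `count_timestamped_lines` are hypothetical names.

```python
import re

# Assumed format: a line starts with M:SS, MM:SS, or H:MM:SS.
TIMESTAMP_RE = re.compile(r"^\s*(\d{1,2}:\d{2}(?::\d{2})?)\s*(.*)$")


def split_timestamp(line):
    """Return (timestamp, text); timestamp is None if the line has none."""
    match = TIMESTAMP_RE.match(line)
    if match:
        return match.group(1), match.group(2).strip()
    return None, line.strip()


def count_timestamped_lines(text):
    """Count lines that start with a recognizable timestamp."""
    return sum(1 for line in text.splitlines() if split_timestamp(line)[0])
```

If `count_timestamped_lines` gives the same number for an original transcript and its processed copy, the timestamps most likely survived; a lower count for the copy suggests the pattern in clean_transcript() does not match your format.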