This document serves as a detailed version of the FramePack-eichi main README, providing in-depth information about each feature and setting. For more practical usage methods, please refer to the User Guide, and for update history, see the Changelog.
- F1 Model: Forward generation method, more dynamic movement, simple operation
- Standard Model: Reverse generation method, precise control, feature-rich
- Forward Generation Method: Generates intuitive movements in the normal generation direction (from first to last)
- Simple Interface: Omits Section (keyframe images) and Final (endframe) functions
- Movement Richness: Easier to generate dynamic movements compared to the standard version
- Beginner-Friendly: Fewer settings, intuitive operation
- Image Influence: Function to control change from the initial Image in the first section (adjustable from 100.0% to 102.0%)
- Reverse Generation Method: Unique approach that generates from the final frame backwards
- Feature-Rich Interface: Complex UI allowing detailed settings
- Keyframe Control: Fine control using Image, Final, and section images
- Advanced User-Oriented: Enables sophisticated control through detailed settings
- Image: Main starting keyframe
- Final Frame: Final frame (optional)
- Section Settings: Individual keyframe images and prompts can be set for each section
- Red Frame (Section 0): Automatically copied to all even-numbered sections (0,2,4,6...)
- Blue Frame (Section 1): Automatically copied to all odd-numbered sections (1,3,5,7...)
- Toggle On/Off: Can be toggled with a checkbox
- Benefit: Covers all sections with just two keyframe settings
- Ability to set individual prompts for each section
- Section-specific prompts are only used during generation of that section
- Common prompt is used when left blank
- Note: This feature is an experimental implementation and its effectiveness is not guaranteed
- Automatically embeds prompts, seed values, and section information in generated images
- Settings can be retrieved from saved images
- Standard metadata format compatible with SD-based tools
- Setting information can be shared along with images
- Reduces VRAM usage during LoRA application with 8-bit floating-point format
- Performance improvement with
scaled_mmoptimization for RTX 40 series GPUs - Recommended to keep disabled by default (may result in warnings or errors in some environments)
- "Use LoRA" checkbox: Toggles LoRA on/off
- LoRA file selection: Select the LoRA file to use
- Application strength slider: Adjust LoRA influence from 0.0 to 1.0
- Format: Unified to HunyuanVideo format in v1.9, improving compatibility
- Note: When using LoRA, there may be a waiting time before the progress bar starts
- In v1.9, with code introduced from kohya-ss/FramePack-LoRAReady, both high VRAM and low VRAM modes use the same direct application method, significantly improving stability
- Specify output folder name
- "Save and Open Output Folder" button allows saving settings and opening the folder
- Settings are retained after restarting the application
- Adjustable from 0 to 100 range with slider (0=uncompressed, 16=default, higher values=higher compression/lower quality)
- Lower values result in higher quality but larger file sizes
- Setting to 16 may solve black screen issues
FramePack's most distinctive feature is its unique "future to past" video generation approach. Conventional video generation AIs create frames sequentially from the first frame to the future, which can lead to quality degradation and reduced consistency in longer videos.
In FramePack, it first generates the final frame from the input image, then creates each frame in reverse direction. This allows it to maintain high quality and consistency even in longer videos.
For the F1 model, generation happens in the conventional direction (from first to last). This makes it easier to create dynamic movements, but the complexity of settings has been greatly simplified.
FramePack-eichi improves quality further by strategically placing multiple keyframe images:
-
Preventing Abrupt Changes in Final Section:
- The original endframe with images set only for the first (last 1 second) section had an issue where the image changed suddenly in the final section (near the first 1 second)
- FramePack-eichi takes a straightforward approach by allowing keyframe images to be set for all sections
- Particularly important keyframes are highlighted with red frames, and setting images for these allows automatic copying
- As mentioned above, FramePack generates videos from the final section, so both the section order and keyframe arrangement are set from the end
- In 6-second mode, FramePack may not reach the keyframe images for loop mode before the loop ends
- 8-second mode provides more gradual image transitions than 6-second mode
- In any case (including for multiple scenes), the greater the difference between images, the larger the movement changes, resulting in smoother motion
-
Loop Function Optimization:
- In loop mode, the first keyframe is automatically copied to the Final Frame
- Since v1.5.1, copying from input images in normal mode has been stopped, and image copying is only enabled in loop mode
- By setting the loop's starting pose in keyframe image 1, you can create smooth cyclic videos
-
Section-Specific Prompt Settings: *Added in v1.2 [Experimental Implementation]
- Set unique prompts for each section to achieve different movements or expressions per section
- For example, you can naturally express changes in movement like "walking" → "sitting" → "waving"
- The influence of prompts is subtle, but effective when combined with keyframe images
-
Short Video Modes: *Added in v1.6.1
- Support for 1-second, 2-second, 3-second, and 4-second short video modes
- Each mode has optimized section numbers and copy patterns
- Enables specialized control for short-duration expressions
The F1 model has these major differences from the standard version:
-
Forward Generation Approach:
- Generates from the first frame sequentially, allowing more natural movement transitions
- Section (keyframe images) and Final (endframe) functions become unnecessary
-
Simplified Interface:
- Only set the "Image" image
- Adjust the degree of change in the first section with the "Image Influence" slider (100.0% to 102.0%)
-
Operability and Results:
- High-quality results with fewer settings
- Produces more movement and dynamic footage
- Simple operation, user-friendly for beginners
Relationship between Image (input image), Final Frame (final frame), and keyframe images:
- About Priority:
- Basically, except for the very last section, each section is based on the linearly generated previous section
- If a keyframe image is set for the current section, it is used; if not, an intermediate state inferred from other images is used
- If a keyframe image is set for the final section, it takes priority over the Image.
This structure allows for fine control over each section, resulting in more natural and consistent movements.
In v1.7, the keyframe image copy function was greatly improved, introducing a more efficient and intuitive system:
-
Visual Distinction with Red/Blue Frames:
- Red Frame (Section 0): Automatically copied to all even-numbered sections (0,2,4,6...)
- Blue Frame (Section 1): Automatically copied to all odd-numbered sections (1,3,5,7...)
-
Keyframe Setting Efficiency:
- Cover all sections with just two keyframe settings
- Previously required setting each section individually, now greatly streamlined with pattern-based automatic copying
-
Dynamic Section Number Adaptation:
- Accurately calculates section numbers based on selected video length and frame size
- Automatically adjusts copy destinations based on calculated section numbers
-
Flexible Control with Checkbox:
- Easily toggle automatic keyframe copy function on/off
- For complex videos, you can turn it off to control each section individually when needed
This system significantly reduces the effort of keyframe settings, especially when creating longer videos (10+ seconds).
Prompt settings are as important as keyframe images:
-
Basic Prompt Structure:
- Writing in order of subject → movement → other elements is effective
- Example:
The character walks gracefully, with clear movements, across the room.
-
Movement Specification Levels:
- No prompt: Almost no movement is generated
- Simple movement: Basic movements are generated with prompts like
moves back and forth, side to side - Specific movement: More complex movements are generated with detailed prompts like
dances powerfully, with clear movements, full of energy
-
Important Notes:
- Words representing large movements like "dance" may result in more exaggerated movements than expected
- Practical prompt examples:
- Gentle movement:
The character breathes calmly, with subtle body movements. - Moderate movement:
The character walks forward, gestures with hands, with natural posture. - Complex movement:
The character performs dynamic movements with energy and flowing motion.
- Gentle movement:
-
Deep Structure of Prompts (LLAMA and CLIP Separation):
-
Inside FramePack, prompts are processed by two different models:
-
LLAMA Model (256 token limit):
- Responsible for detailed understanding of text and context processing
- Used for controlling overall content and sequence of the video
- Character count guideline: About 1000-1300 characters (English) or 200-400 characters (Japanese)
- Relates to control of scene context and narrative
-
CLIP Model (77 token limit):
- Model specialized in associating images and text
- Influences the generation of specific visual features in video frames
- Character count guideline: About 300-400 characters (English) or 50-150 characters (Japanese)
- Relates to control of style, subject, and visual attributes
-
-
Effective Prompt Writing Strategy:
-
First 300-400 characters (English)/50-150 characters (Japanese):
- Important "visual part" processed by both LLAMA and CLIP
- Describe main visual elements, style, subject, and overall tone here
- Example:
A young woman with long flowing hair, cinematic lighting, detailed facial features, soft expressions, gentle movements
-
Latter 600-900 characters (English)/150-250 characters (Japanese):
- "Narrative part" processed only by LLAMA
- Describe movement details, scene context, and sequence information here
- Example:
The camera slowly pans from left to right. The woman gradually turns her head, her expressions changing from neutral to a slight smile. There is a sense of emotional buildup as if emotional music is playing in the background.
-
-
Section-Specific Prompt Usage: *Added in v1.2 [Experimental Implementation]
- Keep section-specific prompts concise, focusing on important movements for that section
- Clear, specific instructions are more effective than long sentences
- Example: Section 1 "walking motion", Section 2 "sitting motion", Section 3 "waving motion"
- Note: The effect of section prompts is subtle, and combination with image settings is important
-
F1 Model Prompts (v1.9.1):
- Expressions that clearly specify movement are particularly effective
- Specifying emotional expressions and movement speed concretely also yields good results
- Example:
A character enthusiastically dancing with dynamic movements, arms swinging freely, head nodding to the rhythm, full of energy and life
-
Style Adjustment Using LoRA: *Added in v1.3 [Experimental Implementation]
- Combining LoRA selection and prompts can emphasize specific styles or expressions (probably)
- LoRA effects can be adjusted with application strength (0.1-0.3 for subtle effects, 0.5-0.8 for pronounced effects)
- Effects are maximized when prompts and LoRA selection match
The quality of FramePack's video generation is largely determined by keyframe image selection. Important points for choosing ideal difference images:
-
Optimal Difference Level:
- Excessively small differences: Using almost identical images ("so-called intelligent differences") will generate almost no movement
- Excessively large differences: Using completely unrelated images will not lead to natural movement
- Ideal differences: Changes that AI can find relationships between, such as different poses of the same character, are optimal
-
Maintaining Relevance:
- Simply flipping an image horizontally will be recognized by AI as a completely different image and won't lead to natural movement
- Changes in face direction, hand position, and body pose are ideal difference elements
- Maintaining consistency in background and clothing as much as possible allows AI to focus on character movement
- Coincidentally, variations of a character created with similar prompts using image generation AI can be one of the ideal difference elements
-
Characteristics of Ideal Difference Images:
- Same character with slightly changed posture
- Subtle changes in expression (e.g., expressionless → smiling, however, without face position change, movement will be weak)
- Pose changes with natural hand or arm movements
- Gentle changes in head direction
-
Experimental Approach:
- Selection of difference images has a stronger artistic aspect than scientific, so trial and error is important
- Initially try differences with similar poses, then gradually adjust the magnitude of differences
- Noting successful combinations allows application to future works
-
Combination with Image Generation AI:
- If ideal difference images are not available, using image generation AI to create different poses of the same character is also effective
- Keep pose changes specified in prompts modest and avoid dramatic changes for more natural movement
This structured approach helps maximize the strengths of both models and generate more expressive videos.
The F1 model introduces a new parameter called "Image Influence":
What it affects: This directly adjusts the degree of change from the Image in the first section.
Technical mechanism:
- Set in a very narrow range from 100.0% to 102.0%
- Setting to 100.0% maintains faithfulness to the Image
- As it approaches 102.0%, movement decreases, approaching almost static
Effects:
- At 100.0%, natural movement is generated
- Around 101.0%, changes from the initial frame become more gradual
- At 102.0%, almost no movement is generated
The standard FramePack-eichi has two important functions for controlling video movement: "EndFrame Influence Adjustment" and "All-Padding". While these may seem similar, their operating principles and effects are completely different.
What it affects: This directly changes the strength of the final frame (Final Frame) itself.
Technical mechanism:
- Exactly multiplies the specified value to the latent representation of the final frame
- Implemented in code as
modified_end_frame_latent = end_frame_latent * end_frame_strength - Value range is 0.01 to 1.00, with 1.00 being the default (no change)
Effects:
- Reducing the value from 1.0 to 0.5 makes the influence of the final frame "overall" exactly half
- Setting to 0.3 makes the influence of the final frame "overall" exactly 30%
- By weakening the direct influence of the final frame, characteristics of the first frame (Input Image) appear earlier
What it affects: This changes how sections are connected to each other.
Technical mechanism:
- Normally, padding values between sections are automatically calculated as
[3, 2, 2, 2, 1, 0] - When All-Padding is enabled, this value is unified to a single specified value (e.g.,
[1.5, 1.5, 1.5, 1.5, 1.5, 0]) - Value range is 0.2 to 3.0, with 1.0 being the standard value
- The final section (0th) is always forced to 0
Effects:
- Setting to higher values like 1.5 makes each section reference the previous section more strongly, resulting in rarer changes
- Setting to lower values like 0.5 makes each section reference the previous section less, resulting in more changes
- The "distribution" of change amount is altered, but the overall strength of frames remains the same
Guidelines for appropriate settings based on the scene:
- Large difference images: 0.3 to 0.6 (achieving gradual changes)
- Moderate differences: 0.5 to 0.8 (balanced transition)
- Small differences: 0.8 to 1.0 (near-standard influence)
- Facial expression changes: 0.7 to 0.8 (natural transition of expressions)
- Large movements of body or hands: 0.3 to 0.5 (more natural intermediate frames)
- Smooth transitions: 1.5 to 2.0 (pure transitions where section boundaries are not noticeable)
- Standard movement: 1.0 (balanced transition)
- Active movement: 0.5 to 0.7 (larger changes in each section)
- Extreme movement: 0.2 to 0.4 (very active and unpredictable movement)
- Influence 0.5 + All-Padding 0.5: For more dynamic movement
- Influence 0.3 + Short-time mode: When creating quick-change loop animations
- Influence 0.8 + Long-time mode: Expressing gentle movements with slow changes
- Ultra-low influence (0.01 to 0.1): Almost ignoring the final frame and using a change of thinking where the first frame is the "goal"
- High All-Padding value (2.0+) + Influence 0.5: Weakens the influence of the final frame without making section boundaries noticeable
-
Port setting:
--portparameter (Default: 8001)- Port number used by WebUI
- Change if it conflicts with other applications
-
Server address:
--serverparameter (Default: '127.0.0.1')- Change to
0.0.0.0for access within local network
- Change to
-
Auto browser launch:
--inbrowseroption- Automatically opens browser at startup
- Image Influence: Slider (Default: 100.0%)
- Range: 100.0% to 102.0%
- Adjusts the degree of change from the Image in the first section
- Lower values: More active movement (100.0% is standard)
- Higher values: More static movement (102.0% is almost stationary)
-
GPU memory preservation setting:
gpu_memory_preservationslider (Default: 10GB) *Existing feature- Lower value = More VRAM usage = Faster processing
- Higher value = Less VRAM usage = Stable operation
- Mechanism: The smaller the setting value, the more VRAM is released for the transformer model
- Calculation method: Available VRAM for the tool is the upper limit (with margin) minus the setting value (minimum 6GB is always secured)
- Example: With 16GB VRAM, assuming margin to 14GB, "14-(10-6)=10GB" is used
- Setting to the lower limit of 6GB results in "14-(6-6)=14GB", using almost the upper limit (reduces processing time by about 10 seconds per section, but with memory swap risk)
- Recommended values:
- 8GB VRAM: 7-8GB
- 12GB VRAM: 6-8GB
- 16GB or more: Around 6GB
- Note: Increase the value if running other applications simultaneously
- This tool secures a 3GB margin in the background for other image generation tools to run
- When using LoRA, it's better to prepare an additional margin
-
High VRAM mode: Auto-detection (v1.5.1: 60GB+ free VRAM, v1.6: 100GB+ free VRAM) *Feature improvement
- When enabled: Models are always kept on GPU, reducing memory transfer overhead
- Effect: Up to 20% processing speed improvement
- In v1.6, the threshold was raised, making low VRAM mode used in most environments
- Even in low VRAM mode, the same direct application method as high VRAM mode is used, improving feature consistency
- FP8 optimization: Checkbox (Default: Disabled)
- When enabled: Applies quantization using 8-bit floating-point format
- Effect: Significantly reduces VRAM usage, also improves processing speed on RTX 40 series GPUs
- Note: May cause warnings or errors in some environments
- Recommendation: Keep disabled normally, consider enabling only when using LoRA in low VRAM environments
-
Frame size setting:
frame_sizedropdown menu (Default: 1 second) *Added in v1.5- 0.5 second: Generates frames for 0.5 seconds, section count and processing time almost double
- By giving differences to images of each frame with All-padding 0 mode, even more intense movement is possible
- 1 second: Generates frames for 1 second
-
Steps:
stepsslider (Default: 25) *Existing feature- Increasing values improves quality but proportionally increases processing time
- Recommended range: 20-30 (often similar quality can be achieved with 20)
- 15 or less: Noticeable quality degradation occurs
-
TeaCache:
use_teacachecheckbox (Default: Enabled) *Existing feature- Enabled: Processing approximately 15-20% faster
- Side effect: May slightly degrade representation of hands and fingertips
- Usage: Recommended to enable for general video generation, disable when fine detail is important
-
Random seed value:
seednumeric input or "Use Random Seed" checkbox *Feature added by nirvash- Same seed value = Reproducible results
- Random seed: Generates different movements each time
- Note: Results change with the same seed if prompts or images change
-
Distilled CFG Scale:
gsslider (Default: 10.0) *Existing feature- Distilled guidance scale value
- Lower values = More free movement (increased deviation from prompt)
- Higher values = More faithful to prompt (movement may be restricted)
- Recommendation: Maintain default value (changing is for advanced users)
-
MP4 compression setting:
mp4_crfslider (Default: 16) *Merged from original in v1.6.2- Range: 0 to 100 (0=uncompressed, 100=maximum compression)
- Lower values result in higher quality videos but larger file sizes
- Higher values result in higher compression and smaller file sizes, but lower quality
- Setting to 16 may solve black screen issues
- Usage: Low values (0-10) for archival purposes, moderate values (16-30) for web sharing
-
Use LoRA:
use_loracheckbox (Default: Disabled)- Enabled: Customizes the model using a LoRA file
- When using LoRA, there may be longer waiting times before the counter starts
-
LoRA file: File selection component
- Specifies the LoRA file to use
- Supported format: Unified to HunyuanVideo format in v1.9
-
LoRA strength:
lora_strengthslider (Default: 0.8)- Range: 0.0 to 1.0
- Lower values: Subtle effect
- Higher values: Strong effect
- Optimal value varies by LoRA file
-
Improvements in v1.9
- Significantly improved stability with code from kohya-ss/FramePack-LoRAReady
- Unified application method for high VRAM and low VRAM modes
- Reduced code complexity by using the common load_and_apply_lora function
-
Video length: Radio buttons +
total_second_lengthslider *Extension of unique feature- 1 second: Ultra-short video (about 30 frames @ 30fps) - Added in v1.5.1
- 2 seconds: Short video (about 60 frames @ 30fps) - Added in v1.6.1
- 3 seconds: Short video (about 90 frames @ 30fps) - Added in v1.6.1
- 4 seconds: Short video (about 120 frames @ 30fps) - Added in v1.6.1
- 6 seconds: Standard mode (about 180 frames @ 30fps)
- 8 seconds: Standard mode (about 240 frames @ 30fps)
- 10 seconds: Long video (about 300 frames @ 30fps)
- 12 seconds: Long video (about 360 frames @ 30fps)
- 16 seconds: Long video (about 480 frames @ 30fps)
- 20 seconds: Long video (about 600 frames @ 30fps)
-
Automatic keyframe copy:
enable_keyframe_copycheckbox (Default: Disabled - Changed in v1.5.1) *Unique feature- Enabled: Keyframe images are automatically copied to other sections
- Disabled: Each keyframe needs to be set individually
- Usage: Advanced users designing complex movements may disable this
-
Output folder: Output folder setting field (Default: 'outputs') *Added in v1.2
- Destination for generated videos and images
- Folder name can be directly entered in the input field
- "Save and Open Output Folder" button opens the folder
- Settings are saved in JSON format and maintained after restart
-
Save section still images:
save_section_framescheckbox (Default: Disabled) *Feature added by nirvash- Enabled: Final frame of each section is saved as a still image
- Usage: Useful for checking connection points between sections
-
Save section videos:
keep_section_videoscheckbox (Default: Disabled) *Unique feature- Enabled: Video files for each section are retained, will remain if ended with "End"
- Disabled: Only the final completed video is saved (intermediate files are deleted), note they don't go to trash
- Usage: Useful for checking movements in each section individually
-
Save tensor data: Checkbox (Default: Disabled) *Added in v1.8
- Enabled: Saves latent representations of generated videos in .safetensors format
- Usage: Useful for later combining with other videos or saving the generation process
-
Preset save: "Save" button
- Saves current prompt with a name
- Setting a blank name sets it as the default prompt at startup
-
Apply preset: "Apply" button
- Applies the selected preset's prompt to current generation settings
-
Preset management:
- Delete: Removes unwanted presets (default preset cannot be deleted)
- Clear: Clears the editing field
- Metadata embedding: Automatically enabled
- Automatically saves prompts, seed values, and section information in generated images
- Standard metadata format compatible with SD-based tools
- Setting information can be shared along with images
-
Language selection: Set with command parameter in docker-compose.yml
# For Japanese: command: ["--lang", "ja"] # For Traditional Chinese: command: ["--lang", "zh-tw"] # For English (default): command: ["--lang", "en"]
-
Volume settings: Set with volumes parameter in docker-compose.yml
volumes: - ./data:/app/framepack/data - ./models:/app/framepack/hf_download - ./outputs:/app/framepack/outputs
- data: Storage location for configuration files
- models: Location for downloaded models
- outputs: Storage location for generated videos