Single Face: How Face Tracking Works
This page provides an in-depth explanation of how the `_process_face_tracking_single` function works. We will delve into the algorithms and logic behind this process, including how it handles face embeddings and position tracking, and how it determines whether a face in the current frame is the same face it was tracking in previous frames.
The core job of `_process_face_tracking_single` is to ensure that, as video frames are processed, the face swap is applied to the same face throughout the video. This involves accurately tracking the desired face and using this identified face for the swap. The process is not straightforward, since faces are constantly moving, rotating, and scaling, and can even disappear from the camera for a few frames.
Let's review the variables in more detail:
- `first_face_embedding` (numpy.ndarray, Optional):
  - Holds the face embedding for the face we are tracking. A face embedding is a high-dimensional vector (a long list of numbers) that uniquely represents a face's features. The key idea is that similar faces have similar embeddings. We get this embedding from the face analyzer.
  - Initialized to `None`, because when the tracker starts we don't yet have a face to track.
  - Updated each frame by a weighted average.
- `first_face_position` (Tuple[float, float], Optional):
  - A tuple containing the (x, y) coordinates of the face's center in the frame.
  - Used for measuring how consistent a candidate face's movement is with the previously tracked face.
  - Also initialized to `None` and updated each frame by a weighted average.
- `first_face_id` (int, Optional):
  - A unique identification number for the face in each frame.
  - If the faces in two frames share the same identification number, our confidence that they are the same face increases.
  - Also initialized to `None` and updated each frame.
- `face_lost_count` (int):
  - A counter tracking how many consecutive frames the tracking algorithm has failed to find the face it was tracking.
  - If this count exceeds a certain threshold (not defined directly in this function; it is implicit in whether we found a face or not), the face is no longer considered tracked.
  - Initialized to `0` and incremented or reset each frame.
- `face_position_history` (deque):
  - A double-ended queue (deque) storing the last 30 face positions.
  - Used to calculate an average position, which helps predict where the face should be if it is not visible in the current frame.
  - Limited to a size of 30, so it acts as a short memory of where the face was.
- `target_face` (Face):
  - An object containing the information about each detected face in the current frame.
  - Includes properties like the bounding box, landmarks, and embedding.
  - It is what the algorithm uses to decide whether this face is the one we were previously tracking.
- `source_face` (List[Face]):
  - A list containing the `Face` object for our source face.
  - This is the face we use to replace the tracked target face.
- `active_source_index` (int):
  - An integer determining which index from `source_face` to use.
  - Usually just `0`, meaning we only use one source face.
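The tracker state described above can be sketched as a small container. This is a hypothetical `SingleFaceTrackerState` class for illustration only; the real module keeps these values as module-level globals:

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np


class SingleFaceTrackerState:
    """Hypothetical grouping of the tracking variables described above.
    The real code stores them as globals; the class is only for clarity."""

    def __init__(self, history_size: int = 30):
        self.first_face_embedding: Optional[np.ndarray] = None
        self.first_face_position: Optional[Tuple[float, float]] = None
        self.first_face_id: Optional[int] = None
        self.face_lost_count: int = 0
        # deque(maxlen=30) automatically drops the oldest position once
        # 30 entries are stored, giving the tracker its short memory.
        self.face_position_history: deque = deque(maxlen=history_size)
```

Because the deque is created with `maxlen=30`, appending a 31st position silently evicts the oldest one; no manual trimming is needed.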
The `_process_face_tracking_single` function can be broken down into distinct stages:
- Initialization Check:
  - The function first checks if `first_face_embedding` is `None`. If it is, this is the first frame in which we've encountered a face to track. In this case:
    - The `target_face`'s embedding is extracted using `extract_face_embedding` and assigned to `first_face_embedding`. This essentially becomes the "reference fingerprint" of the face we want to track.
    - The `target_face`'s center position is extracted using `get_face_center` and assigned to `first_face_position`.
    - The `target_face`'s unique id is obtained using `id(target_face)` and assigned to `first_face_id`.
    - `face_lost_count` is set to `0`, because we found a face.
    - `face_position_history` is cleared, so that only this face is tracked.
    - `first_face_position` is appended to `face_position_history`.
    - The `_process_face_swap` function is called, and the current timestamp is saved to the `last_swap_time` global variable, marking the first frame's successful swap.
    - The function then returns, having processed the initial face.
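The initialization steps above can be sketched as follows. The helper implementations (`extract_face_embedding`, `get_face_center`) are simple stand-ins for the real face-analyzer functions, and the `state` object stands in for the module's globals:

```python
import time
from collections import deque
from types import SimpleNamespace


# Stub helpers standing in for the real face-analyzer functions.
def extract_face_embedding(face):
    return face["embedding"]


def get_face_center(face):
    x1, y1, x2, y2 = face["bbox"]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def initialize_tracking(state, target_face):
    # Record the reference "fingerprint", position, and id of the face.
    state.first_face_embedding = extract_face_embedding(target_face)
    state.first_face_position = get_face_center(target_face)
    state.first_face_id = id(target_face)
    state.face_lost_count = 0
    # Forget any earlier positions so that only this face is tracked.
    state.face_position_history.clear()
    state.face_position_history.append(state.first_face_position)
    # The real code now calls _process_face_swap and stores the
    # timestamp in the last_swap_time global.
    state.last_swap_time = time.time()


state = SimpleNamespace(face_position_history=deque(maxlen=30))
face = {"embedding": [0.1, 0.2, 0.3], "bbox": (10, 20, 50, 60)}
initialize_tracking(state, face)
```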
- Face Detection and Best Match Finding:
  - If `first_face_embedding` is not `None`, the algorithm is already tracking a face.
  - The function calls `_detect_faces(frame)` to get a new list of faces that may contain the face we are tracking.
  - If no face is detected:
    - `face_lost_count` is incremented, since the face was not found in the current frame.
    - If the use of a pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
    - The function then returns.
  - Otherwise, the algorithm loops through all detected faces and scores them based on how well they match the face we are tracking. For each detected face in the current frame:
    - `target_embedding`: a new embedding of the detected `target_face` is obtained using `extract_face_embedding`.
    - `target_position`: the position of the detected `target_face` is obtained using `get_face_center`.
    - Embedding Similarity: the cosine similarity between `first_face_embedding` and the `target_embedding` is calculated using the `cosine_similarity` function. This score represents how similar the new face looks to the previously tracked face based on their "fingerprints". The closer it is to 1.0, the more similar the faces are in appearance.
    - Position Consistency: calculated as the inverse of the distance between the `target_position` and the average of the previous positions stored in `face_position_history`. This score is not the raw distance: the closer the new face position is to the average of the last 30 positions, the higher the score. If `face_position_history` is empty, the previous `first_face_position` is used instead.
    - Total Match Score: a weighted combination of these two metrics (`embedding_similarity` and `position_consistency`) is calculated using the weights from `modules.globals.embedding_weight_size` and `modules.globals.position_size` with the following formula:
      - `TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT`
      - `match_score = (EMBEDDING_WEIGHT * embedding_similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT`
    - Stickiness: if the unique `id` of the detected face is the same as the current `first_face_id`, the score is multiplied by `(1 + STICKINESS_FACTOR)`. This makes the current face more likely to be picked as the best match and helps avoid sudden flickering or swapping between multiple faces.
  - The face with the highest total `match_score` is chosen as the `best_match_face`.
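Putting the scoring pieces together, a minimal sketch of the match score looks like this. The weight values are assumptions standing in for the `modules.globals` settings, and `position_consistency` uses one plausible inverse-distance form:

```python
import numpy as np

# Assumed stand-ins for the values read from modules.globals.
EMBEDDING_WEIGHT = 0.7
POSITION_WEIGHT = 0.3
WEIGHT_DISTRIBUTION = 1.0
STICKINESS_FACTOR = 0.2


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 = same direction (very similar faces); near 0.0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def position_consistency(pos, history) -> float:
    # Inverse distance to the average of recent positions: closer => higher.
    avg = np.mean(history, axis=0) if history else np.array(pos)
    dist = float(np.linalg.norm(np.array(pos) - avg))
    return 1.0 / (1.0 + dist)


def match_score(first_emb, first_id, history, cand_emb, cand_pos, cand_id):
    sim = cosine_similarity(first_emb, cand_emb)
    pos = position_consistency(cand_pos, history)
    total_weight = EMBEDDING_WEIGHT * WEIGHT_DISTRIBUTION + POSITION_WEIGHT
    score = (EMBEDDING_WEIGHT * sim + POSITION_WEIGHT * pos) / total_weight
    if cand_id == first_id:
        # Stickiness bonus: the face with the same id gets a boost,
        # which discourages flickering between candidates.
        score *= 1 + STICKINESS_FACTOR
    return score
```

A candidate with an identical embedding, a position on the history average, and a matching id scores 1.2 under these weights; moving the candidate far from the average drags the score down through the position term.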
- Face Tracking Update:
  - After going through all detected faces, we check whether a `best_match_face` was found and whether its score is higher than `modules.globals.sticky_face_value`.
  - If a good face is found:
    - `face_lost_count` is reset to `0`.
    - The tracked face's embedding is updated to reflect the new `best_match_face` by taking a weighted average of the old and new embeddings, using `OLD_WEIGHT` from `modules.globals.old_embedding_weight` and `NEW_WEIGHT` from `modules.globals.new_embedding_weight`.
    - The tracked face's position is updated to the position of the `best_match_face`, obtained using `get_face_center`.
    - The unique `id` is also updated with the new id using `id(best_match_face)`.
    - The new position is added to the breadcrumbs (`face_position_history`).
    - The `_process_face_swap` function is called, which performs the actual face swap using our `best_match_face`.
  - If no good face is found:
    - `face_lost_count` is incremented, since the face was not found in the current frame.
    - If the use of a pseudo face is enabled via `modules.globals.use_pseudo_face` and our best score is below `modules.globals.pseudo_face_threshold`, a pseudo face is created and used for the swap.
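The weighted-average embedding update can be sketched as below; the 0.8/0.2 split is an assumption standing in for `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`:

```python
import numpy as np

OLD_WEIGHT = 0.8  # assumed stand-in for modules.globals.old_embedding_weight
NEW_WEIGHT = 0.2  # assumed stand-in for modules.globals.new_embedding_weight


def update_tracked_embedding(old_emb: np.ndarray, new_emb: np.ndarray) -> np.ndarray:
    # Blending keeps the reference "fingerprint" stable while letting it
    # slowly adapt to gradual changes in pose and lighting.
    return OLD_WEIGHT * old_emb + NEW_WEIGHT * new_emb
```

Because the old embedding dominates the blend, a single bad detection cannot drag the reference far from the true face.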
- Do Nothing:
  - If for any reason no face was found and no pseudo face was substituted, nothing is done to the current frame.
- Face Embeddings: Face embeddings are a technique for capturing the essence of a face in a high-dimensional space. A mathematical algorithm (the face analyzer) translates a face into a long list of numbers. Similar faces will have similar numbers.
- Cosine Similarity: This is a way to measure how similar two vectors (like embeddings) are. A cosine similarity of 1.0 means the vectors point in exactly the same direction (a near-identical face), while values near 0.0 mean they are unrelated.
- Position Consistency: By tracking the position of the face and only selecting a face that is near our previous location, we minimize the possibility of tracking the wrong face.
- Weighted Averages: By taking weighted averages we get a more consistent update to the face position and embedding that helps to avoid jitteriness and abrupt changes when the face moves.
- Stickiness: The "stickiness" logic is used to ensure that the tracker sticks to the same face between frames instead of quickly swapping to a new face when a new face is detected.
- Pseudo Face: When we can't track a face, we can create a pseudo face by using our breadcrumb history.
- `modules.globals`: This module contains all settings and global configurations. This is where all the weights, distances, and variables are kept, which means you can change many settings on the fly from the UI without changing the code.
- Thresholds: The thresholds, like the stickiness and pseudo face thresholds, greatly affect the way the face tracker works. These values may need to be tweaked to get the desired effect.
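The pseudo-face idea builds on the breadcrumb history: when no face is found, averaging the recent positions gives a plausible location for the lost face. A minimal sketch of that estimate follows (the real pseudo-face construction also fabricates a bounding box and landmarks around this point):

```python
from collections import deque
from typing import Optional, Tuple


def estimate_lost_face_position(history: deque) -> Optional[Tuple[float, float]]:
    """Average the recent breadcrumb positions to guess where the
    lost face should be. Returns None when there is no history."""
    if not history:
        return None
    xs = [p[0] for p in history]
    ys = [p[1] for p in history]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


history = deque([(10.0, 20.0), (12.0, 22.0), (14.0, 24.0)], maxlen=30)
```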
The `_process_face_tracking_single` function is a powerful algorithm for tracking a single face in a video. It uses a combination of face embeddings for recognition, position for consistency, and smoothing techniques to create a seamless face-swapping experience. While there are several constants and variables in the function, these values are carefully tuned to achieve a good trade-off between accuracy and responsiveness.
Understanding these details provides insight into the complexities of the face-swapping process and the methods used to keep the desired face consistent throughout a video.