Many Faces: How Face Tracking Works
This page provides an in-depth analysis of the `_process_face_tracking_many` function. It describes how the function tracks multiple faces (up to 10 in this implementation), with a focus on dynamic tracking, avoiding face-ID mix-ups, and keeping face swapping smooth and consistent. This function is the most complex, because it has to handle a dynamic number of tracked objects.

The primary objective of `_process_face_tracking_many` is to track up to 10 distinct faces in a video so that each one can be swapped with a different source face. This requires tracking a varying number of faces, correctly assigning a source face to each target face, and handling faces that enter, exit, or are temporarily lost from the frame.
The key variables in this function are:

- `tracked_faces_many` (`dict`): Stores the data for each tracked face. The keys are integer track ids, and each value is a dictionary containing:
  - `embedding` (`numpy.ndarray`): the face embedding of the tracked face.
  - `position` (`Tuple[float, float]`): the (x, y) coordinates of the center of the tracked face.
  - `id` (`int`): the unique identification number of the face.
  - The dictionary is created in `globals()` if it does not already exist.
- `position_histories_many` (`dict`): Stores each face's position history. The keys are the same track ids, and each value is a `deque` holding the last 30 positions of that face.
  - The history is used to compute an average position, which helps estimate where the face should be when it is not detected in the current frame.
  - Because it is limited to 30 entries, it acts as a short-term memory of where the face has been.
  - The dictionary is created in `globals()` if it does not already exist.
- `target_embedding` (`numpy.ndarray`): The face embedding of the current `target_face`.
- `target_position` (`Tuple[float, float]`): The (x, y) coordinates of the center of the current `target_face`.
- `face_id` (`int`): The unique identification number of the current `target_face`.
- `use_pseudo_face` (`bool`): Whether a pseudo face should be created.
- `target_face` (`Face`): An object holding the information for one detected face in the current frame, including its bounding box, landmarks, and embedding. The algorithm uses it to decide whether this face matches one of the tracked faces.
- `source_face` (`List[Face]`): A list of source `Face` objects, i.e. the faces used to replace the tracked target faces.
- `source_index` (`int`): The index into `source_face` to use for this `target_face`, i.e. which source face is swapped onto it.
- `source_face_order` (`List[int]`): The order in which source faces are used: `[0, 1]` by default, or `[1, 0]` when face flipping is enabled. This parameter is not actually used in `_process_face_tracking_many` and is kept only for compatibility.
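A minimal sketch of the two tracking dictionaries, assuming plain Python lists stand in for the `numpy.ndarray` embeddings so the example has no dependencies. The 3-element embedding, the sample coordinates, the face id `42`, and the `HISTORY_LEN` constant are illustrative assumptions; the field names and the 30-entry history limit come from the list above.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Point = Tuple[float, float]
HISTORY_LEN = 30  # history limit from the description; the constant name is hypothetical

# track id -> {"embedding": ..., "position": (x, y), "id": int}
tracked_faces_many: Dict[int, dict] = {}
# track id -> deque of the most recent (x, y) positions for that track
position_histories_many: Dict[int, Deque[Point]] = {}

# Register one face under track id 0 (values are made up for illustration).
tracked_faces_many[0] = {"embedding": [0.1, 0.7, 0.2], "position": (120.0, 80.0), "id": 42}
position_histories_many[0] = deque(maxlen=HISTORY_LEN)

# Append 35 positions; a deque with maxlen discards the oldest entries automatically.
for step in range(35):
    position_histories_many[0].append((120.0 + step, 80.0))

print(len(position_histories_many[0]))  # 30
print(position_histories_many[0][0])    # (125.0, 80.0): the first 5 positions were evicted
```

Using `deque(maxlen=...)` gives the short-memory behavior without any manual trimming: once the history is full, each new position pushes out the oldest one.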
The `_process_face_tracking_many` function works through the following steps:

Initialization of Global Dictionaries:

- The function checks `globals()` for the face-tracking dictionaries. If they do not exist, it creates them and adds them to `globals()`, so they persist and are shared between function calls.
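The persistence pattern can be sketched as follows. `ensure_tracking_state` and `track_one_more_face` are hypothetical helpers written for illustration; the point is that `globals()` returns the module's own namespace dictionary, so entries created there outlive a single function call.

```python
def ensure_tracking_state() -> None:
    """Create the tracking dictionaries in the module namespace if they are missing."""
    g = globals()  # the module's global namespace (a real, mutable dict)
    if "tracked_faces_many" not in g:
        g["tracked_faces_many"] = {}
    if "position_histories_many" not in g:
        g["position_histories_many"] = {}


def track_one_more_face() -> int:
    """Hypothetical helper: add a dummy track and report how many tracks exist."""
    ensure_tracking_state()
    tracked = globals()["tracked_faces_many"]
    new_key = len(tracked)
    tracked[new_key] = {"embedding": [1.0, 0.0], "position": (0.0, 0.0), "id": new_key}
    return len(tracked)


print(track_one_more_face())  # 1
print(track_one_more_face())  # 2: the dictionary survived the previous call
```

The existence checks make repeated calls to `ensure_tracking_state` harmless: they never reset state that earlier frames have built up.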
Extraction of Target Face Data:

- The function extracts the embedding (`target_embedding`), position (`target_position`), and unique id (`face_id`) of the input `target_face`.
- `use_pseudo_face` is initially set to `False`.
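The helpers that produce these values (`extract_face_embedding` and `get_face_center`) are not reproduced on this page. The sketch below assumes a detected face exposes an `[x1, y1, x2, y2]` bounding box and an embedding, and takes the face position as the center of that box; the `Face` class, its fields, and `bbox_center` are hypothetical stand-ins, not the project's actual types. `face_id` is left out because this page does not say how it is derived.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Face:
    """Hypothetical stand-in for a detected face."""
    bbox: List[float]  # assumed layout: [x1, y1, x2, y2] in pixels
    embedding: List[float] = field(default_factory=list)


def bbox_center(face: Face) -> Tuple[float, float]:
    """Assumed position definition: the midpoint of the bounding box."""
    x1, y1, x2, y2 = face.bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def extract_target_data(face: Face) -> Tuple[List[float], Tuple[float, float], bool]:
    target_embedding = face.embedding
    target_position = bbox_center(face)
    use_pseudo_face = False  # starts as False, as described above
    return target_embedding, target_position, use_pseudo_face


face = Face(bbox=[100.0, 50.0, 200.0, 170.0], embedding=[0.6, 0.8])
print(extract_target_data(face))  # ([0.6, 0.8], (150.0, 110.0), False)
```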
Iterating Through Tracked Faces and Finding the Best Match:

- `best_match_score` is initialized to `-1` and `best_match_key` to `None`.
- The code then loops over the track ids and their data in `tracked_faces_many` using `items()`. For each tracked face it retrieves `track_embedding`, `track_position`, and `track_history` (from `position_histories_many`), and computes:
  - Similarity Score: the cosine similarity between `track_embedding` and `target_embedding`. It measures how closely the current `target_face` resembles the tracked face based on their embeddings ("fingerprints"); the closer the value is to 1.0, the more similar the two faces are.
  - Position Consistency Score: the inverse of the distance between `target_position` and the average position in `track_history` (or `track_position` if `track_history` is empty). The closer the face is to where the track has recently been, the higher this score.
  - Total Match Score: a weighted combination of the similarity score and the position consistency score, using global variables from `modules.globals`:
    - `TOTAL_WEIGHT = EMBEDDING_WEIGHT * modules.globals.weight_distribution_size + POSITION_WEIGHT`
    - `score = (EMBEDDING_WEIGHT * similarity + POSITION_WEIGHT * position_consistency) / TOTAL_WEIGHT`
  - Stickiness: if `face_id` matches the tracked face's `id`, the score is boosted further: `if track_data.get("id") == face_id: score *= (1 + STICKINESS_FACTOR)`
- If the calculated `score` is higher than `best_match_score`, the code updates both `best_match_score` and `best_match_key`.
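A minimal, dependency-free sketch of the matching loop, assuming embeddings are plain lists and that the "inverse of the distance" is computed as `1 / (1 + d)`; the `+ 1` guard against division by zero is an assumption, not taken from the source. The weights are passed in as parameters here instead of being read from `modules.globals`, and the numbers in the usage example are made up.

```python
import math
from collections import deque
from typing import Deque, Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def position_consistency(target: Point, history: Deque[Point], fallback: Point) -> float:
    # Reference point: mean of the recent history, or the stored position if it is empty.
    if history:
        ref = (sum(p[0] for p in history) / len(history),
               sum(p[1] for p in history) / len(history))
    else:
        ref = fallback
    return 1.0 / (1.0 + math.dist(target, ref))  # assumed form of "inverse distance"


def find_best_match(target_embedding: Sequence[float], target_position: Point, face_id: int,
                    tracked: Dict[int, dict], histories: Dict[int, Deque[Point]],
                    embedding_weight: float, position_weight: float,
                    weight_distribution_size: float,
                    stickiness_factor: float) -> Tuple[Optional[int], float]:
    total_weight = embedding_weight * weight_distribution_size + position_weight
    best_match_score: float = -1.0
    best_match_key: Optional[int] = None
    for track_id, track_data in tracked.items():
        similarity = cosine_similarity(track_data["embedding"], target_embedding)
        consistency = position_consistency(
            target_position, histories.get(track_id, deque()), track_data["position"])
        score = (embedding_weight * similarity + position_weight * consistency) / total_weight
        if track_data.get("id") == face_id:  # stickiness bonus for the same face id
            score *= 1 + stickiness_factor
        if score > best_match_score:
            best_match_score, best_match_key = score, track_id
    return best_match_key, best_match_score


tracked = {
    0: {"embedding": [1.0, 0.0], "position": (100.0, 100.0), "id": 7},
    1: {"embedding": [0.0, 1.0], "position": (300.0, 100.0), "id": 8},
}
histories = {0: deque([(100.0, 100.0)], maxlen=30), 1: deque(maxlen=30)}
key, score = find_best_match([0.9, 0.1], (102.0, 100.0), 7, tracked, histories,
                             embedding_weight=0.6, position_weight=0.4,
                             weight_distribution_size=1.0, stickiness_factor=0.2)
print(key, round(score, 3))  # 0 0.876
```

In this example track 0 would score about 0.730 without stickiness; the `* 1.2` boost for the matching `id` widens its lead over the other track.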
Updating Tracked Face Based on Best Match:

- After the loop, the function checks whether a `best_match_key` was found and whether `best_match_score` is higher than `modules.globals.sticky_face_value`.
- If a good match is found:
  - The tracked face data is retrieved from `tracked_faces_many`, and `track_history` from `position_histories_many`, both using `best_match_key`.
  - The face's embedding is updated to a weighted average of `track_embedding` and the new `target_embedding`, using `modules.globals.old_embedding_weight` and `modules.globals.new_embedding_weight`.
  - The face's position is updated to a weighted average of `track_position` and the new `target_position`, with fixed weights of 0.8 for the old position and 0.2 for the new one.
  - The face's unique `id` is also updated.
  - The new `target_position` is appended to `track_history`.
  - The score is stored in a global variable so the UI can display it. If `best_match_key` is below `10`: `setattr(modules.globals, f"target_face{best_match_key + 1}_score", best_match_score)`
  - `source_index` is determined with `source_index = best_match_key % len(source_face)`.
  - `_process_face_swap` is called and its result is returned.
- Otherwise, if no good match was found, the function checks whether to create a pseudo face: `modules.globals.use_pseudo_face` must be enabled and the best score must be lower than `modules.globals.pseudo_face_threshold`. If so:
  - The pseudo-face position is taken from the track history in `position_histories_many` for `best_match_key`. If that history is empty, the tracked face's `position` is used, and if that is `None`, the current `target_position` is used.
  - A pseudo face is created at this position by calling `create_pseudo_face()`.
  - `source_index` is set with `source_index = 0 if not source_face else 0 % len(source_face)`, which evaluates to `0` in both cases.
  - `_process_face_swap` is called with the pseudo face, `source_face`, and `source_index`.
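The update and fallback logic can be sketched as below, again with plain lists for embeddings. Three details are assumptions rather than facts from the source: that the track's `id` is refreshed to the current `face_id`, that the pseudo-face position is the mean of the track history, and the helper names themselves (`update_track`, `pick_source_index`, `pseudo_face_position`). `create_pseudo_face()` and `_process_face_swap` are not modeled.

```python
from collections import deque
from typing import Deque, Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def update_track(track_data: dict, history: Deque[Point],
                 target_embedding: Sequence[float], target_position: Point, face_id: int,
                 old_embedding_weight: float, new_embedding_weight: float) -> None:
    # Embedding: weighted average using the configurable old/new weights.
    track_data["embedding"] = [old_embedding_weight * old + new_embedding_weight * new
                               for old, new in zip(track_data["embedding"], target_embedding)]
    # Position: fixed 0.8 (old) / 0.2 (new) blend, which smooths jitter.
    (ox, oy), (nx, ny) = track_data["position"], target_position
    track_data["position"] = (0.8 * ox + 0.2 * nx, 0.8 * oy + 0.2 * ny)
    track_data["id"] = face_id  # assumption: id refreshed to the current face id
    history.append(target_position)


def pick_source_index(track_key: int, source_faces: Sequence[object]) -> int:
    # Cycles through the source faces by track id.
    return track_key % len(source_faces)


def pseudo_face_position(best_match_key: Optional[int], tracked: Dict[int, dict],
                         histories: Dict[int, Deque[Point]], target_position: Point) -> Point:
    history = histories.get(best_match_key)
    if history:
        # Assumption: the average of the recent positions is used.
        n = len(history)
        return (sum(p[0] for p in history) / n, sum(p[1] for p in history) / n)
    position = tracked.get(best_match_key, {}).get("position")
    return position if position is not None else target_position


track = {"embedding": [1.0, 0.0], "position": (100.0, 100.0), "id": 7}
history = deque([(100.0, 100.0)], maxlen=30)
update_track(track, history, [0.0, 1.0], (110.0, 100.0), 7,
             old_embedding_weight=0.9, new_embedding_weight=0.1)
print(track["embedding"], track["position"])  # [0.9, 0.1] (102.0, 100.0)
print(pick_source_index(3, ["src_a", "src_b"]))  # 1
print(pseudo_face_position(0, {0: track}, {0: history}, (0.0, 0.0)))  # (105.0, 100.0)
```

The 0.8/0.2 blend means a face that jumps 10 pixels moves its tracked position by only 2 pixels in one frame, which is where the reduced flicker comes from.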
Handling New Faces:

- If no good `best_match_key` was found and no pseudo face was created, the function checks whether the target face is a new face that should be tracked, i.e. whether `tracked_faces_many` holds fewer than 10 faces.
- If more faces can be tracked:
  - A new track id (`new_key`) is created from the number of keys currently in `tracked_faces_many`.
  - A new entry containing the `embedding`, `position`, and `id` of the `target_face` is added to `tracked_faces_many` under `new_key`.
  - A new deque is created in `position_histories_many` under the same `new_key`, and the current position is appended to it.
  - `source_index` is assigned with `source_index = new_key % len(source_face)`.
  - The face's score is initialized in the global variable for display in the UI. If `new_key` is less than `10`: `setattr(modules.globals, f"target_face{new_key + 1}_score", 0.00)`
  - `_process_face_swap` is called with `target_face`, `source_face`, and `source_index`, and its result is returned.
- If the maximum number of faces is already tracked, no new face is added and the current frame is returned unchanged.
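A minimal sketch of the new-face branch. `add_new_track` and the `settings` object (a `SimpleNamespace` standing in for `modules.globals`) are hypothetical; the 10-face cap, the `new_key = len(...)` rule, the 30-entry history, the modulo source assignment, and the score attribute name follow the description above.

```python
from collections import deque
from types import SimpleNamespace
from typing import Deque, Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
MAX_TRACKED_FACES = 10  # hard-coded limit described on this page
HISTORY_LEN = 30


def add_new_track(tracked: Dict[int, dict], histories: Dict[int, Deque[Point]],
                  target_embedding: Sequence[float], target_position: Point, face_id: int,
                  source_faces: Sequence[object], settings: SimpleNamespace) -> Optional[int]:
    """Register a new face; return its source index, or None when the tracker is full."""
    if len(tracked) >= MAX_TRACKED_FACES:
        return None  # the caller returns the frame unchanged
    new_key = len(tracked)  # next track id = current number of tracks
    tracked[new_key] = {"embedding": list(target_embedding),
                        "position": target_position, "id": face_id}
    histories[new_key] = deque([target_position], maxlen=HISTORY_LEN)
    if new_key < 10:  # one UI slot per face: target_face1_score .. target_face10_score
        setattr(settings, f"target_face{new_key + 1}_score", 0.00)
    return new_key % len(source_faces)


tracked: Dict[int, dict] = {}
histories: Dict[int, Deque[Point]] = {}
settings = SimpleNamespace()
sources = ["src_a", "src_b", "src_c"]
results = [add_new_track(tracked, histories, [1.0, 0.0], (float(i), 0.0), i, sources, settings)
           for i in range(12)]
print(results)  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, None, None]
```

Deriving `new_key` from the dictionary size only stays collision-free as long as tracks are never removed, which matches the behavior described here.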
Key design features:

- Dynamic Tracking: The function tracks a varying number of faces (up to 10) as they enter and exit the frame.
- Modular Design: The function builds on previously discussed functions such as `extract_face_embedding`, `get_face_center`, `_process_face_swap`, and `cosine_similarity`.
- Weighted Averaging: Weighted averaging of embeddings and positions smooths updates, reducing abrupt changes and flickering and improving tracking stability.
- Position History Tracking: A short history of previous positions lets the tracker estimate where a face is likely to be, making it more robust.
- Stickiness Factor: Boosting the score of the track whose `id` matches the current `face_id` makes the tracker more likely to keep following the same face between frames than to switch to a different one.
- Global Dictionaries: Keying the global dictionaries by track id makes it straightforward to identify, access, and modify each tracked face.
- Pseudo Faces: When a face is not matched in the current frame, its tracking history can be used to create a pseudo face, making tracking more robust.
- Global Module: Values in `modules.globals`, such as thresholds, weights, and distances, can be configured on the fly from the UI without modifying the code.
Considerations and limitations:

- Performance: Tracking multiple faces simultaneously is computationally intensive and may reduce overall application performance, depending on device limitations.
- Robustness: Although the tracker is fairly robust, it can struggle with rapid movement, overlapping faces, or faces that disappear for extended periods.
- Max Tracked Faces: The limit of `10` tracked faces is hard-coded in this function and can only be raised by changing the code.
- Settings: The values of `STICKINESS_FACTOR`, `embedding_weight_size`, `position_size`, and `pseudo_face_threshold` all affect tracking performance and can be changed on the fly from the UI.
- Memory Management: The tracking dictionaries persist in `globals()` between calls, so their memory use should be kept in mind as more faces are tracked; growth is bounded by the 10-face limit and the 30-entry position histories.
The `_process_face_tracking_many` function represents an advanced approach to face tracking, enabling multi-face processing through careful management of face embeddings, positions, position history, and ID assignment. This walkthrough should give a thorough understanding of this core component and of the complexity involved in robust multi-face swapping in real-time video. Dynamically tracking up to 10 faces with good stability and accuracy is a significant achievement.