Preprocessing: Added functions for outlier and anomaly detection. #535
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,316 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "0", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import pathlib\n", | ||
| "\n", | ||
| "import matplotlib.pyplot as plt\n", | ||
| "import shapely\n", | ||
| "from matplotlib.lines import Line2D\n", | ||
| "\n", | ||
| "import pedpy\n", | ||
| "from pedpy.plotting.plotting import PEDPY_ORANGE, PEDPY_PETROL" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "1", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Preprocessing\n", | ||
| "\n", | ||
| "*PedPy* provides functions for two preprocessing tasks:\n", | ||
| "\n", | ||
| "1. Outlier detection\n", | ||
| "2. Correcting invalid trajectories\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "2", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Outlier detection\n", | ||
| "\n", | ||
| "*PedPy* provides a function that detects and corrects outliers and also detects vertical displacements within the trajectory, which occur when the tracking of a person is interrupted and the tracker continues tracking something else instead.\n", | ||
| "\n", | ||
| "The algorithm for detecting outliers splits the trajectory into multiple dataframes, one per person, and calculates the distance between each pair of consecutive points. The expected distance d is defined as the 99% quantile of the distances between all consecutive points, multiplied by the tolerance t.\n", | ||
| "\n", | ||
| "$$\n", | ||
| "d = t \\cdot q_{0.99}\n", | ||
| "$$\n", | ||
| "\n", | ||
| "##### tolerance:\n", | ||
| "The tolerance parameter can be chosen manually. A low value for this parameter means a low tolerance for potential outliers, which can be useful in trajectories where pedestrians’ speed stays within a similar range. If pedestrian speed varies, for example in bottleneck experiments, the tolerance should be chosen higher. A value between 2 and 10 should cover most cases.\n", | ||
| "\n", | ||
| "\n", | ||
| "If an outlier is detected, the program checks whether there are consecutive outliers. Since the distance can no longer be used as an indicator, the function searches for the next frame within a realistic range r. Every subsequent frame that is not within this range is also considered an outlier, and the factor n is increased by one. In this case, as points should not be considered valid again by accident, the tolerance t' is much smaller.\n", | ||
| "\n", | ||
| "$$\n", | ||
| "r = n \\cdot t' \\cdot q_{0.99}\n", | ||
| "$$\n", | ||
| "\n", | ||
| "##### quantile:\n", | ||
| "\n", | ||
| "Like the tolerance, the quantile used for the expected distance can be chosen manually. Because the expected distance is the product of the tolerance and the quantile value, changing the quantile also changes how strict a given tolerance is.\n", | ||
| "\n", | ||
| "For every part of the trajectory where anomalies were detected, the corresponding person ID and the frames in which outliers occurred are written to the log output.\n", | ||
| "\n", | ||
| "Outliers in the middle of the trajectory are corrected by interpolating the incorrect points as a straight line between the two correct points before and after the outlier occurs. Outliers at the beginning or at the end are extrapolated in the average direction of the trajectory." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "3", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "trajectory_data = pedpy.load_trajectory(\n", | ||
| " trajectory_file=pathlib.Path(\"demo-data/preprocessing/uni_corr_500_08_modified.txt\"),\n", | ||
| " default_unit=pedpy.TrajectoryUnit.METER,\n", | ||
| ")\n", | ||
| "trajectory_data_corrected, changed_index_orig, changed_index_new = pedpy.detect_anomalies_in_trajectories(\n", | ||
| " trajectory_data, tolerance=6, quantile=0.98\n", | ||
| ")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "4", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "\n", | ||
| "### Invalid trajectories\n", | ||
| "\n", | ||
| "If more than a certain percentage of all frames in the trajectory data of a single person ID are considered outliers, this part of the trajectory is considered invalid.\n", | ||
| "\n", | ||
| "##### percentage_invalid:\n", | ||
| "The percentage mentioned above can be set with the percentage_invalid parameter, an integer between 1 and 100. The default value is 20%.\n", | ||
| "\n", | ||
| "##### deleting:\n", | ||
| "\n", | ||
| "The boolean parameter deleting determines whether invalid data sets are removed from the returned trajectory.\n", | ||
| "\n", | ||
| "### Focus on displacement detection\n", | ||
| "\n", | ||
| "##### displacements_only:\n", | ||
| "\n", | ||
| "It is possible to filter only for displacements in the trajectory by setting displacements_only = True. In this case, anomalies that do not occur at the very beginning or the very end are ignored, and the trajectory data of the affected person ID is only cropped after a displacement. If outliers occur at the very beginning, they are removed as well.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "5", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "trajectory_data_jumps_only, index_orig, index_new = pedpy.detect_anomalies_in_trajectories(\n", | ||
| " trajectory_data, displacements_only=True\n", | ||
| ")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "6", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Other parameters\n", | ||
| "\n", | ||
| "In the following description, the term trajectory means the trajectory data of a single person ID.\n", | ||
| "\n", | ||
| "##### max_length:\n", | ||
| "An integer value. Sometimes it may happen that a few outliers occur directly one after another without a jump back to the correct trajectory. The max_length parameter defines how many frames long these consecutive outliers can be before the program checks whether this indicates a vertical displacement in the trajectory. The default value is 8.\n", | ||
| "\n", | ||
| "##### critical_length_traj:\n", | ||
| "The minimum length a trajectory can have. This integer value is only relevant when there appears to be a displacement in the trajectory. If the supposed displacement occurs before enough frames have accumulated to count as a trajectory in its own right, every frame before the detected anomaly is treated as an outlier. If the minimum length has already been reached, the trajectory is cropped at the displacement. The default value is 10% of the trajectory’s length." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "7", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "traj_data_low_tolerance = pedpy.detect_anomalies_in_trajectories(\n", | ||
| " trajectory_data, tolerance=3, quantile=0.95, percentage_invalid=20, deleting=True, max_length=10\n", | ||
| ")[0]" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "8", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Compare original and corrected trajectory\n", | ||
| "\n", | ||
| "The function returns a corrected copy of the input trajectory data. Furthermore, it returns two lists: the first contains the person IDs of all parts of the original trajectory where anomalies were found, and the second contains the corresponding person IDs in the corrected trajectory. In most cases, these are the same; subsequent person IDs shift only if some person IDs were deleted.\n", | ||
| "\n", | ||
| "These lists can be used to plot the trajectory segments to get an impression of the outliers and how they were corrected. The petrol line represents the original trajectory, and the orange line represents the corrected one." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "9", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "walk_area = pedpy.WalkableArea(\n", | ||
| " shapely.from_wkt(\n", | ||
| " \"POLYGON ((10 -2, -10 -2, -10 7, 10 7, 10 -2), (9 6, -9 6, -9 5, 9 5, 9 6), (-9 -1, 9 -1, 9 0, -9 0, -9 -1))\"\n", | ||
| " )\n", | ||
| ")\n", | ||
| "\n", | ||
| "%config InlineBackend.figure_format = 'retina'\n", | ||
| "\n", | ||
| "\n", | ||
| "for i in range(len(changed_index_orig)):\n", | ||
| " original_trajectory = trajectory_data.data[trajectory_data.data[\"id\"] == changed_index_orig[i]]\n", | ||
| " trajectory_corrected = trajectory_data_corrected.data[trajectory_data_corrected.data[\"id\"] == changed_index_new[i]]\n", | ||
| " pedpy.plot_trajectories(\n", | ||
|
Collaborator
Some comments about the plots being made:
Collaborator
Author
Done. I tried to find a combination of two PedPy colors that works in both light and dark mode. I did not add a title because, depending on the parameters, a trajectory could have outliers and a later displacement, so it is difficult to define a clear type. |
||
| " traj=pedpy.TrajectoryData(data=original_trajectory, frame_rate=trajectory_data.frame_rate),\n", | ||
| " walkable_area=walk_area,\n", | ||
| " traj_width=1.75,\n", | ||
| " traj_color=PEDPY_PETROL,\n", | ||
| " ).set_aspect(\"equal\")\n", | ||
| " pedpy.plot_trajectories(\n", | ||
| " traj=pedpy.TrajectoryData(data=trajectory_corrected, frame_rate=trajectory_data.frame_rate),\n", | ||
| " walkable_area=walk_area,\n", | ||
| " traj_width=0.5,\n", | ||
| " traj_color=PEDPY_ORANGE,\n", | ||
| " ).set_aspect(\"equal\")\n", | ||
| " legend_elements = [\n", | ||
| " Line2D([0], [0], color=PEDPY_PETROL, lw=2, label=\"Original\"),\n", | ||
| " Line2D([0], [0], color=PEDPY_ORANGE, lw=2, label=\"Corrected\"),\n", | ||
| " ]\n", | ||
| "\n", | ||
| " plt.legend(handles=legend_elements, bbox_to_anchor=(1, 1), fontsize=8)\n", | ||
| " plt.xlabel(f\"personID {changed_index_orig[i]} / {changed_index_new[i]}\")\n", | ||
| " plt.show()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "10", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Correct invalid trajectories\n", | ||
| "\n", | ||
| "When working with head trajectories, participants may occasionally lean over obstacles. As a result, their trajectories can leave the walkable area for some frames, and this data cannot be processed by *PedPy*.\n", | ||
| "\n", | ||
| "To address this, there is a function that moves trajectory points that lie inside a wall or too close to it. The distance that should remain between the point and the wall afterwards is calculated by linear interpolation. The new distance lies within the interval between min_distance and max_distance:\n", | ||
| "\n", | ||
| "$$\n", | ||
| "d' = (d - b) \\cdot \\frac{e - s}{e - b} + s\n", | ||
| "$$\n", | ||
| "\n", | ||
| "- d' is the new distance to the wall\n", | ||
| "- d is the original distance to the wall\n", | ||
| "- b corresponds to back_distance\n", | ||
| "- s corresponds to min_distance\n", | ||
| "- e corresponds to max_distance\n", | ||
| "\n", | ||
| "```{eval-rst}\n", | ||
| ".. figure:: images/parameters_preprocessing.png\n", | ||
| " :width: 400px\n", | ||
| " :align: center\n", | ||
| "```\n", | ||
| "\n", | ||
| "If a point lies inside the geometry or too close to it, it will be pushed outward. The distance interval for these points starts at back_distance, which must be negative because it represents the maximum depth inside the wall, and ends at max_distance. Points located deeper inside an obstacle are assigned a smaller new distance than points located near the boundary of the interval.\n", | ||
| "\n", | ||
| "For example, a point that lies deep inside an obstacle receives a new distance close to min_distance, which is the smallest possible new distance. A point that is already outside the obstacle but needs to be adjusted for smoother results also receives a new distance, but this value is only slightly larger than its original distance.\n", | ||
| "\n", | ||
| "It is essential that max_distance is larger than min_distance, and that back_distance is negative. Depending on the geometry and the parameter values, it can also be beneficial to buffer the geometry beforehand to create thicker walls. If the walls are too thin, the function may accidentally move a point to the wrong side.\n", | ||
| "\n", | ||
| "The function returns a pedpy.TrajectoryData: the corrected trajectory, or the original trajectory if it was already valid." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "11", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "trajectory_data = pedpy.load_trajectory(\n", | ||
| " trajectory_file=pathlib.Path(\"demo-data/preprocessing/030_c_56_h0_invalid.txt\"),\n", | ||
| " default_unit=pedpy.TrajectoryUnit.METER,\n", | ||
| ")\n", | ||
| "\n", | ||
| "walk_area = pedpy.WalkableArea(\n", | ||
| " [\n", | ||
| " (3.5, -2),\n", | ||
| " (3.5, 8),\n", | ||
| " (-3.5, 8),\n", | ||
| " (-3.5, -2),\n", | ||
| " ],\n", | ||
| " obstacles=[\n", | ||
| " [\n", | ||
| " (-0.7, -1.1),\n", | ||
| " (-0.25, -1.1),\n", | ||
| " (-0.25, -0.15),\n", | ||
| " (-0.4, 0.0),\n", | ||
| " (-2.8, 0.0),\n", | ||
| " (-2.8, 6.7),\n", | ||
| " (-3.05, 6.7),\n", | ||
| " (-3.05, -0.3),\n", | ||
| " (-0.7, -0.3),\n", | ||
| " (-0.7, -1.0),\n", | ||
| " ],\n", | ||
| " [\n", | ||
| " (0.25, -1.1),\n", | ||
| " (0.7, -1.1),\n", | ||
| " (0.7, -0.3),\n", | ||
| " (3.05, -0.3),\n", | ||
| " (3.05, 6.7),\n", | ||
| " (2.8, 6.7),\n", | ||
| " (2.8, 0.0),\n", | ||
| " (0.4, 0.0),\n", | ||
| " (0.25, -0.15),\n", | ||
| " (0.25, -1.1),\n", | ||
| " ],\n", | ||
| " ],\n", | ||
| ")\n", | ||
| "\n", | ||
| "print(\"Valid before: \", pedpy.is_trajectory_valid(traj_data=trajectory_data, walkable_area=walk_area))\n", | ||
| "\n", | ||
| "valid_trajectory = pedpy.correct_invalid_trajectories(\n", | ||
| " trajectory_data=trajectory_data,\n", | ||
| " walkable_area=walk_area,\n", | ||
| " min_distance_obst=0.01,\n", | ||
| " max_distance_obst=0.05,\n", | ||
| " back_distance_obst=-0.5,\n", | ||
| " min_distance_wall=0.01,\n", | ||
| " max_distance_wall=0.05,\n", | ||
| " back_distance_wall=-0.5,\n", | ||
| ")\n", | ||
| "print(\"Valid after: \", pedpy.is_trajectory_valid(traj_data=valid_trajectory, walkable_area=walk_area))" | ||
|
Collaborator
It would be nice to include two plots, the uncorrected and the corrected trajectories.
Collaborator
Author
Done. I added two exemplary plots to give an idea of how the function modifies the trajectory. I will add a plotting function to the notebook, similar to the one used for outlier detection, when I include a list of modified person IDs in the return values. |
||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "12", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "The values for min_distance, max_distance, and back_distance can be chosen separately for the walls around the geometry (the `_wall` parameters) and for the obstacles within it (the `_obst` parameters).\n", | ||
| "\n", | ||
| "The following example shows how the function corrects invalid trajectories: the first plot shows the original invalid trajectory, the second plot the corrected one.\n", | ||
| "\n", | ||
| "\n", | ||
| "\n" | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": {}, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
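The linear interpolation described in the "Correct invalid trajectories" cell can be checked with a few lines of plain Python. This is only a sketch of the formula as written in the notebook, not PedPy's implementation: the helper name `new_wall_distance` is hypothetical, and leaving points at or beyond max_distance untouched is an assumption.

```python
def new_wall_distance(d, back_distance, min_distance, max_distance):
    """Map an original wall distance d to a new one by linear interpolation.

    Hypothetical helper illustrating d' = (d - b) * (e - s) / (e - b) + s;
    it is not part of PedPy.
    """
    if back_distance >= 0:
        raise ValueError("back_distance must be negative")
    if max_distance <= min_distance:
        raise ValueError("max_distance must be larger than min_distance")
    if d >= max_distance:
        # Assumption: points far enough from the wall are left unchanged.
        return d
    scale = (max_distance - min_distance) / (max_distance - back_distance)
    return (d - back_distance) * scale + min_distance


# Parameter values taken from the notebook example.
params = dict(back_distance=-0.5, min_distance=0.01, max_distance=0.05)

deep_inside = new_wall_distance(-0.5, **params)   # maximum depth inside the wall
on_boundary = new_wall_distance(0.0, **params)    # exactly on the wall
just_outside = new_wall_distance(0.04, **params)  # outside, but too close

print(deep_inside, on_boundary, just_outside)
```

With these values, a point at the maximum depth is mapped to min_distance, and a point that is already outside is moved only slightly further away, which matches the behaviour described in the notebook.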

From the description above, it is not clear what the parameters do that you use in this example. Can you include parameters that you use in the description?
Done.
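To make the roles of tolerance and quantile in the "Outlier detection" cell more concrete, the threshold d = t · q can be sketched with NumPy on synthetic data. This is a simplified illustration under stated assumptions, not the code of `detect_anomalies_in_trajectories`: the helper `flag_jumps` is hypothetical, it handles a single person only, and it skips the consecutive-outlier logic entirely.

```python
import numpy as np


def flag_jumps(xy, tolerance=6.0, quantile=0.99):
    """Flag steps longer than tolerance * quantile of all step lengths.

    Hypothetical helper; xy is an (n, 2) array of positions of one person.
    """
    # Distance between each pair of consecutive points.
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    # Expected distance d = t * q.
    expected = tolerance * np.quantile(steps, quantile)
    return steps > expected, expected


# Synthetic trajectory: 300 frames, walking along x with 0.04 m per frame.
xy = np.column_stack([np.arange(300) * 0.04, np.zeros(300)])
# Displace a single frame by 2 m to imitate a tracking outlier.
xy[150, 1] += 2.0

mask, expected = flag_jumps(xy, tolerance=6.0, quantile=0.99)
flagged_steps = np.flatnonzero(mask)
print(flagged_steps, expected)
```

Two steps are flagged here, the jump to the displaced point and the jump back, which is why a single outlier shows up as a pair of unusually long distances.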