- Variable Description
- Variable description for original data set
- Variable description of "tidydata.txt" datatable
- Steps taken to generate the "tidydata.txt" datatable
- R packages used
- Loading raw data into R
- Merging of raw data files
- Extraction of mean and standard deviation for each measurement
- Naming activities with descriptive activity names
- Adding appropriate labels
- Generating a second, independent tidy dataset containing average of each variable for each activity performed by each subject
- Writing data to text file
- About the raw data
- Description
- Citation
The following is copied from the file "features_info.txt" in the zipfile containing all raw data. The zipfile can be downloaded here: "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).
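As a rough illustration of two of the derivations described above, the sketch below computes a magnitude signal with the Euclidean norm and a Jerk signal as a time derivative. The sample values are made up, and approximating the derivative with a first-order finite difference is an assumption; the original processing code is not part of the raw data archive.

```r
# Toy 3-axial body acceleration signal (values are invented for illustration).
sampling_rate_hz <- 50          # sampling rate stated in features_info.txt
dt <- 1 / sampling_rate_hz      # time between two samples in seconds

acc <- data.frame(
  x = c(0.00, 0.03, 0.04, 0.00),
  y = c(0.00, 0.04, 0.03, 0.00),
  z = c(0.00, 0.00, 0.00, 0.05)
)

# Magnitude: Euclidean norm of the three axes at each time step.
acc_mag <- sqrt(acc$x^2 + acc$y^2 + acc$z^2)

# Jerk: change in acceleration per unit time (finite difference, an assumption).
jerk <- as.data.frame(lapply(acc, function(v) diff(v) / dt))
```

Note that the finite difference yields one value fewer than the input signal.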
The following table summarises each variable of the feature vector for each pattern, whereby '-XYZ' is used to denote the 3-axial signals in the X, Y, and Z directions:
| Variable | Description |
|---|---|
| tBodyAcc-X | Acceleration of body in X direction |
| tBodyAcc-Y | Acceleration of body in Y direction |
| tBodyAcc-Z | Acceleration of body in Z direction |
| tGravityAcc-X | Acceleration of gravity in X direction |
| tGravityAcc-Y | Acceleration of gravity in Y direction |
| tGravityAcc-Z | Acceleration of gravity in Z direction |
| tBodyAccJerk-X | Body acceleration jerk for X direction |
| tBodyAccJerk-Y | Body acceleration jerk for Y direction |
| tBodyAccJerk-Z | Body acceleration jerk for Z direction |
| tBodyGyro-X | Body gyroscope measurement for X direction |
| tBodyGyro-Y | Body gyroscope measurement for Y direction |
| tBodyGyro-Z | Body gyroscope measurement for Z direction |
| tBodyGyroJerk-X | Body gyroscope jerk for X direction |
| tBodyGyroJerk-Y | Body gyroscope jerk for Y direction |
| tBodyGyroJerk-Z | Body gyroscope jerk for Z direction |
| tBodyAccMag | Magnitude of body acceleration |
| tGravityAccMag | Magnitude of gravity acceleration |
| tBodyAccJerkMag | Magnitude of body acceleration jerk |
| tBodyGyroMag | Magnitude of body gyroscope measurement |
| tBodyGyroJerkMag | Magnitude of body gyroscope jerk measurement |
| fBodyAcc-X | Frequency of body acceleration for X direction |
| fBodyAcc-Y | Frequency of body acceleration for Y direction |
| fBodyAcc-Z | Frequency of body acceleration for Z direction |
| fBodyAccJerk-X | Frequency of body acceleration jerk for X direction |
| fBodyAccJerk-Y | Frequency of body acceleration jerk for Y direction |
| fBodyAccJerk-Z | Frequency of body acceleration jerk for Z direction |
| fBodyGyro-X | Frequency of body gyroscope measurement for X direction |
| fBodyGyro-Y | Frequency of body gyroscope measurement for Y direction |
| fBodyGyro-Z | Frequency of body gyroscope measurement for Z direction |
| fBodyAccMag | Frequency of body acceleration magnitude |
| fBodyAccJerkMag | Frequency of body acceleration jerk magnitude |
| fBodyGyroMag | Frequency of body magnitude gyroscope measurement |
| fBodyGyroJerkMag | Frequency of body magnitude gyroscope jerk measurement |
The following table lists the set of variables that were estimated from the signals:
| Variable | Description |
|---|---|
| mean() | Mean value |
| std() | Standard deviation |
| mad() | Median absolute deviation |
| max() | Largest value in array |
| min() | Smallest value in array |
| sma() | Signal magnitude area |
| energy() | Energy measure. Sum of the squares divided by the number of values. |
| iqr() | Interquartile range |
| entropy() | Signal entropy |
| arCoeff() | Autoregression coefficients with Burg order equal to 4 |
| correlation() | Correlation coefficient between two signals |
| maxInds() | Index of the frequency component with largest magnitude |
| meanFreq() | Weighted average of the frequency components to obtain a mean frequency |
| skewness() | Skewness of the frequency domain signal |
| kurtosis() | Kurtosis of the frequency domain signal |
| bandsEnergy() | Energy of a frequency interval within the 64 bins of the FFT of each window. |
| angle() | Angle between two vectors. |
Additional vectors obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
| Variable |
|---|
| gravityMean |
| tBodyAccMean |
| tBodyAccJerkMean |
| tBodyGyroMean |
| tBodyGyroJerkMean |
The complete list of variables of each feature vector is available in 'features.txt'
The following table provides a description for each column heading present in the "tidydata.txt" datatable
| Variable Name | Description |
|---|---|
| Subject_ID | Subject ID number, range 1-30 |
| Activity_Name | Activity Name; can take the terms: "Walking", "Walking Upstairs", "Walking Downstairs", "Sitting", "Standing", "Laying" |
| Average_Measurement_Value | Mean value measured for each activity name |
| Measurement_Type | Can take any of the 46 terms listed in the "measurement variable" table below |
The following table provides a description for each measurement variable present in the "tidydata.txt" datatable
| Variable Name | Description |
|---|---|
| Time_Domain_Body_Accelerometer_Mean_X | Mean acceleration of body in X direction |
| Time_Domain_Body_Accelerometer_Mean_Y | Mean acceleration of body in Y direction |
| Time_Domain_Body_Accelerometer_Mean_Z | Mean acceleration of body in Z direction |
| Time_Domain_Gravity_Accelerometer_Mean_X | Mean acceleration of gravity in X direction |
| Time_Domain_Gravity_Accelerometer_Mean_Y | Mean acceleration of gravity in Y direction |
| Time_Domain_Gravity_Accelerometer_Mean_Z | Mean acceleration of gravity in Z direction |
| Time_Domain_Body_Accelerometer_Jerk_Mean_X | Mean body acceleration jerk for X direction |
| Time_Domain_Body_Accelerometer_Jerk_Mean_Y | Mean body acceleration jerk for Y direction |
| Time_Domain_Body_Accelerometer_Jerk_Mean_Z | Mean body acceleration jerk for Z direction |
| Time_Domain_Body_Gyroscope_Mean_X | Mean body gyroscope measurement for X direction |
| Time_Domain_Body_Gyroscope_Mean_Y | Mean body gyroscope measurement for Y direction |
| Time_Domain_Body_Gyroscope_Mean_Z | Mean body gyroscope measurement for Z direction |
| Time_Domain_Body_Gyroscope_Jerk_Mean_X | Mean jerk signal of body for X direction |
| Time_Domain_Body_Gyroscope_Jerk_Mean_Y | Mean jerk signal of body for Y direction |
| Time_Domain_Body_Gyroscope_Jerk_Mean_Z | Mean jerk signal of body for Z direction |
| Time_Domain_Body_Accelerometer_Magnitude_Mean | Mean magnitude of body acceleration |
| Time_Domain_Gravity_Accelerometer_Magnitude_Mean | Mean magnitude of gravity acceleration |
| Time_Domain_Body_Accelerometer_Jerk_Magnitude_Mean | Mean magnitude of body acceleration jerk |
| Time_Domain_Body_Gyroscope_Magnitude_Mean | Mean magnitude of body gyroscope measurement |
| Time_Domain_Body_Gyroscope_Jerk_Magnitude_Mean | Mean magnitude of body gyroscope jerk measurement |
| Frequency_Domain_Body_Accelerometer_Mean_X | Mean frequency of body acceleration for X direction |
| Frequency_Domain_Body_Accelerometer_Mean_Y | Mean frequency of body acceleration for Y direction |
| Frequency_Domain_Body_Accelerometer_Mean_Z | Mean frequency of body acceleration for Z direction |
| Frequency_Domain_Body_Accelerometer_Mean_Frequency_X | Mean frequency of body acceleration for X direction frequency |
| Frequency_Domain_Body_Accelerometer_Mean_Frequency_Y | Mean frequency of body acceleration for Y direction frequency |
| Frequency_Domain_Body_Accelerometer_Mean_Frequency_Z | Mean frequency of body acceleration for Z direction frequency |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_X | Mean frequency of body acceleration jerk for X direction |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_Y | Mean frequency of body acceleration jerk for Y direction |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_Z | Mean frequency of body acceleration jerk for Z direction |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_Frequency_X | Mean frequency of body acceleration jerk for X direction frequency |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_Frequency_Y | Mean frequency of body acceleration jerk for Y direction frequency |
| Frequency_Domain_Body_Accelerometer_Jerk_Mean_Frequency_Z | Mean frequency of body acceleration jerk for Z direction frequency |
| Frequency_Domain_Body_Gyroscope_Mean_X | Mean frequency of body gyroscope measurement for X direction |
| Frequency_Domain_Body_Gyroscope_Mean_Y | Mean frequency of body gyroscope measurement for Y direction |
| Frequency_Domain_Body_Gyroscope_Mean_Z | Mean frequency of body gyroscope measurement for Z direction |
| Frequency_Domain_Body_Gyroscope_Mean_Frequency_X | Mean frequency of body gyroscope measurement for X direction frequency |
| Frequency_Domain_Body_Gyroscope_Mean_Frequency_Y | Mean frequency of body gyroscope measurement for Y direction frequency |
| Frequency_Domain_Body_Gyroscope_Mean_Frequency_Z | Mean frequency of body gyroscope measurement for Z direction frequency |
| Frequency_Domain_Body_Accelerometer_Magnitude_Mean | Mean frequency of body acceleration magnitude |
| Frequency_Domain_Body_Accelerometer_Magnitude_Mean_Frequency | Mean frequency of body acceleration magnitude frequency |
| Frequency_Domain_Body_Accelerometer_Jerk_Magnitude_Mean | Mean frequency of body acceleration jerk magnitude |
| Frequency_Domain_Body_Accelerometer_Jerk_Magnitude_Mean_Frequency | Mean frequency of body acceleration jerk magnitude frequency |
| Frequency_Domain_Body_Gyroscope_Magnitude_Mean | Mean frequency of body gyroscope magnitude |
| Frequency_Domain_Body_Gyroscope_Magnitude_Mean_Frequency | Mean frequency of body gyroscope magnitude frequency |
| Frequency_Domain_Body_Gyroscope_Jerk_Magnitude_Mean | Mean frequency of body gyroscope jerk magnitude |
| Frequency_Domain_Body_Gyroscope_Jerk_Magnitude_Mean_Frequency | Mean frequency of body gyroscope jerk magnitude frequency |
The following R packages were used for this assignment:
"tidyverse", package version 1.3.0
Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
"stringr", package version 1.4.0
Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. https://CRAN.R-project.org/package=stringr
"plyr", package version 1.8.6
Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2019). dplyr: A Grammar of Data Manipulation. R package version 0.8.3. https://CRAN.R-project.org/package=dplyr
Using download.file(), the zipfile containing the raw data was downloaded and directly saved in the appropriate directory (working directory). The following link was used to download the zipfile:
"https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
Using unzip(), the zipfile was unzipped and its contents were directly saved in the appropriate directory (working directory)
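The download and unzip steps above can be sketched as follows. The helper name `fetch_raw_data`, the local file name "dataset.zip", and the `mode = "wb"` setting are assumptions made for illustration; only download.file(), unzip(), and the URL come from the description above.

```r
# Hypothetical helper: download the archive once and unpack it into dest_dir.
fetch_raw_data <- function(url, dest_dir = getwd(), zip_name = "dataset.zip") {
  zip_path <- file.path(dest_dir, zip_name)
  if (!file.exists(zip_path)) {
    # mode = "wb" writes the archive as binary, which matters on Windows.
    download.file(url, destfile = zip_path, mode = "wb")
  }
  # Extract all files next to the archive.
  unzip(zip_path, exdir = dest_dir)
  invisible(zip_path)
}

data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# fetch_raw_data(data_url)  # uncomment to download and extract the raw data
```

The call is left commented out so that the sketch does not trigger a download when it is merely sourced.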
1. Reading in the data
Using read.table(), the relevant files were read into R. The table below lists the original file name, the R object name that each was assigned to upon reading it into R, as well as the number of observations and variables each contains.
| File name | R Object Name | No. Observations | No. Variables | Description |
|---|---|---|---|---|
| "features.txt" | features | 561 | 2 | List of all features |
| "subject_test.txt" | subject_test | 2947 | 1 | Each row identifies the subject who performed the activity for each window sample. Its range is from 2 to 24. |
| "X_test.txt" | x_test | 2947 | 561 | Test set |
| "y_test.txt" | y_test | 2947 | 1 | Test labels |
| "subject_train.txt" | subject_train | 7352 | 1 | Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30. |
| "X_train.txt" | x_train | 7352 | 561 | Training set |
| "y_train.txt" | y_train | 7352 | 1 | Training labels |
Using rename() and rename_all(), the columns of the six test and train files were appropriately renamed.
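A minimal sketch of the reading and renaming step, using two tiny stand-in files so that it runs without the real data. The file contents are invented, and base R `names<-` is used here in place of rename() and rename_all(); the exact renaming calls in the original script are not shown in this document.

```r
# Write two tiny stand-in files (invented contents) into a temporary folder.
tmp_dir <- tempfile("har_demo_")
dir.create(tmp_dir)
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X"),
           file.path(tmp_dir, "features.txt"))
writeLines(c("0.28 -0.99", "0.27 -0.98", "0.26 -0.97"),
           file.path(tmp_dir, "X_test.txt"))

# Read both files; read.table() splits on whitespace by default.
features <- read.table(file.path(tmp_dir, "features.txt"),
                       stringsAsFactors = FALSE)
x_test <- read.table(file.path(tmp_dir, "X_test.txt"))

# Label the measurement columns with the feature names (second column).
names(x_test) <- features$V2
```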
2. Merging the data
The three "test" and the three "train" files were merged using cbind() to generate the dataframes "testmerged" and "trainmerged", respectively. These two dataframes were then merged together using rbind() to generate the dataframe "alldata". The table below lists these dataframes, as well as the number of observations and variables each contains.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| testmerged | 2947 | 563 |
| trainmerged | 7352 | 563 |
| alldata | 10299 | 563 |
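The merge can be sketched with toy stand-ins for the six files. The values and the column order inside cbind() are assumptions; the object names follow the tables above.

```r
# Toy stand-ins: 2 test rows and 3 train rows with 2 measurements each.
subject_test <- data.frame(subject_ID = c(2, 4))
y_test <- data.frame(activity_ID = c(5, 1))
x_test <- data.frame(m1 = c(0.1, 0.2), m2 = c(0.3, 0.4))

subject_train <- data.frame(subject_ID = c(1, 3, 5))
y_train <- data.frame(activity_ID = c(2, 6, 4))
x_train <- data.frame(m1 = c(0.5, 0.6, 0.7), m2 = c(0.8, 0.9, 1.0))

# cbind() places the files side by side (same rows, more columns).
testmerged <- cbind(subject_test, y_test, x_test)
trainmerged <- cbind(subject_train, y_train, x_train)

# rbind() stacks the two sets (same columns, more rows).
alldata <- rbind(testmerged, trainmerged)
```

With the real data the same pattern gives 1 + 1 + 561 = 563 columns and 2947 + 7352 = 10299 rows, matching the table above.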
Using grepl(), the dataframe "alldata" was subsetted to include only columns whose names contain the term "mean" or "std". A vector named "names_subset" containing the terms "mean" and "std" was created to facilitate this subsetting step. The resulting dataframe was called "alldata_subset". A second subsetted dataframe called "ID_type_subset" was generated from the first and second columns of the "alldata" dataframe. These columns contain the "activity_ID" and the "activity_type". Using cbind(), "ID_type_subset" and "alldata_subset" were merged to generate "alldata_meanstd", the dataframe containing only the mean measurements and the standard deviation of each measurement. The table below lists these objects, as well as the number of observations and variables each contains.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| alldata_subset | 10299 | 46 |
| ID_type_subset | 10299 | 2 |
| alldata_meanstd | 10299 | 48 |
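A minimal sketch of this kind of column subsetting on a toy dataframe. The toy column names and values are assumptions. Because grepl() works with a single pattern, this sketch joins the two terms into one regular expression with "|"; whether the original script did the same is not stated here.

```r
# Toy dataframe; check.names = FALSE keeps "()" and "-" in the column names.
alldata <- data.frame(
  subject_ID = c(1, 2),
  activity_ID = c(5, 1),
  `tBodyAcc-mean()-X` = c(0.28, 0.27),
  `tBodyAcc-std()-X` = c(-0.99, -0.98),
  `tBodyAcc-max()-X` = c(-0.93, -0.94),
  check.names = FALSE
)

names_subset <- c("mean", "std")

# Build the pattern "mean|std" and flag every column name that matches it.
keep <- grepl(paste(names_subset, collapse = "|"), colnames(alldata))
alldata_subset <- alldata[, keep, drop = FALSE]

# Keep the two identifier columns and put them in front of the measurements.
ID_type_subset <- alldata[, 1:2]
alldata_meanstd <- cbind(ID_type_subset, alldata_subset)
```

Note that the term "mean" also matches names containing "meanFreq", which is why mean-frequency variables appear in the measurement table earlier in this document.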
The activity labels were read in using read.table() to create the dataframe "activitylabs", and its columns were renamed using rename() to "activity_ID" and "activity_name". The dataframe "alldata_meanstd" was merged with "activitylabs" by the shared column "activity_ID" using merge(). This resulted in the dataframe "alldata_meanstd_actnames". The columns of this dataframe were then rearranged so that the column "activity_name" was placed in second position. The table below lists these objects, as well as the number of observations and variables each contains.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| activitylabs | 6 | 2 |
| alldata_meanstd_actnames | 10299 | 49 |
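The labelling step can be sketched as below, assuming a toy "alldata_meanstd" with a single measurement column. The way the columns are reordered is one possible approach, not necessarily the one used in the original script.

```r
# Activity lookup table with the six activity names used in this document.
activitylabs <- data.frame(
  activity_ID = 1:6,
  activity_name = c("Walking", "Walking Upstairs", "Walking Downstairs",
                    "Sitting", "Standing", "Laying"),
  stringsAsFactors = FALSE
)

# Toy measurements (invented values).
alldata_meanstd <- data.frame(
  subject_ID = c(1, 1, 2),
  activity_ID = c(5, 1, 5),
  m1 = c(0.28, 0.27, 0.26)
)

# Join on the shared key; merge() puts the key column first.
alldata_meanstd_actnames <- merge(alldata_meanstd, activitylabs,
                                  by = "activity_ID")

# Move "activity_name" into second position.
other_cols <- setdiff(names(alldata_meanstd_actnames),
                      c("activity_ID", "activity_name"))
alldata_meanstd_actnames <-
  alldata_meanstd_actnames[, c("activity_ID", "activity_name", other_cols)]
```

By default merge() sorts the result by the key column, so the row order after this step differs from the row order of the input.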
The vector "columnnames" was created using colnames() on the dataframe "alldata_meanstd_actnames" in order to extract all column names. Using str_replace_all(), a new vector called "columnnames2" was created based on the contents of the vector "columnnames". The following table summarises the changes performed by str_replace_all():
| Pattern | Replacement |
|---|---|
| "Acc" | "Accelerometer_" |
| "Gyro" | "Gyroscope_" |
| "Mag" | "Magnitude_" |
| "Freq" | "Frequency" |
| "mean()" | "Mean" |
| "std" | "Standard_Deviation" |
| "BodyBody" | "Body_" |
| "Body" | "Body_" |
| "Jerk" | "Jerk_" |
| "Gravity" | "Gravity_" |
| "X" | "_X" |
| "Y" | "_Y" |
| "Z" | "_Z" |
| "MeanFrequency" | "Mean_Frequency" |
| "Body__" | "Body_" |
| "activity_ID" | "Activity_ID" |
| "activity_name" | "Activity_Name" |
| "subject_ID" | "Subject_ID" |
| "^t" | "Time_Domain_" |
| "^f" | "Frequency_Domain_" |
| "[\(\)-]" | "" |
The vector "columnnames2" was then used to replace the original column names in the dataframe "alldata_meanstd_actnames".
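The effect of the replacement table can be sketched as follows. This version swaps in base R gsub() for str_replace_all() and assumes the rules are applied one after another in the order listed; the helper name `rename_columns` and the three sample feature names are chosen for illustration.

```r
# Replacement rules in table order: names are patterns, values are replacements.
rules <- c(
  "Acc" = "Accelerometer_", "Gyro" = "Gyroscope_", "Mag" = "Magnitude_",
  "Freq" = "Frequency", "mean()" = "Mean", "std" = "Standard_Deviation",
  "BodyBody" = "Body_", "Body" = "Body_", "Jerk" = "Jerk_",
  "Gravity" = "Gravity_", "X" = "_X", "Y" = "_Y", "Z" = "_Z",
  "MeanFrequency" = "Mean_Frequency", "Body__" = "Body_",
  "activity_ID" = "Activity_ID", "activity_name" = "Activity_Name",
  "subject_ID" = "Subject_ID",
  "^t" = "Time_Domain_", "^f" = "Frequency_Domain_",
  "[()-]" = ""  # same characters as the last table row: "(", ")" and "-"
)

# Hypothetical helper: apply every rule in turn to a character vector.
rename_columns <- function(x, rules) {
  for (i in seq_along(rules)) {
    x <- gsub(names(rules)[i], rules[[i]], x)
  }
  x
}

columnnames <- c("subject_ID", "activity_name", "tBodyAcc-mean()-X",
                 "tGravityAcc-std()-Z", "fBodyBodyGyroJerkMag-meanFreq()")
columnnames2 <- rename_columns(columnnames, rules)
```

The order matters: "mean()" is read as a regular expression and therefore only replaces "mean", so the leftover parentheses and hyphens are removed by the final rule.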
Generating a second, independent tidy dataset containing average of each variable for each activity performed by each subject
Using group_by(), the "alldata_meanstd_actnames" dataframe was grouped first by the variable "Subject_ID" and then by the variable "Activity_Name". Using summarise_all(), the mean of each variable for each activity performed by each subject was calculated. After ungrouping the dataframe using ungroup(), the rows were arranged by the variable "Subject_ID" and then by the variable "Activity_ID" using arrange(). The resulting dataframe was named "alldata_tidy", and the table below lists the number of observations and variables it contains.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| alldata_tidy | 180 | 49 |
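The grouping and averaging step can be sketched on toy data as shown below. This version uses base R aggregate() and order() as stand-ins for group_by(), summarise_all(), and arrange(); the measurement columns "m1" and "m2" and their values are invented.

```r
# Toy data: repeated observations per subject and activity (invented values).
alldata <- data.frame(
  Subject_ID = c(1, 1, 1, 2, 2),
  Activity_Name = c("Walking", "Walking", "Sitting", "Walking", "Walking"),
  m1 = c(1, 3, 5, 2, 4),
  m2 = c(10, 20, 30, 40, 60),
  stringsAsFactors = FALSE
)

# Mean of every measurement for each subject-activity combination.
alldata_tidy <- aggregate(cbind(m1, m2) ~ Subject_ID + Activity_Name,
                          data = alldata, FUN = mean)

# Sort by subject first, then by activity.
alldata_tidy <- alldata_tidy[order(alldata_tidy$Subject_ID,
                                   alldata_tidy$Activity_Name), ]
```

Each subject-activity combination ends up as exactly one row, which is why the real data yields 30 x 6 = 180 rows.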
The definition of tidy data as stated on the cran.r-project.org website is as follows:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
(https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html, accessed 25.02.2021)
To this end, the datatable "alldata_tidy" was reorganised into the tidy format.
A vector "subID" containing the numbers 1 to 30, each repeated 276 times, was created using rep() (there are 30 "Subject_ID"s, each of which performs 6 activities, and for each activity 46 measurements are taken; this means there are 6 x 46 = 276 measurements for each "Subject_ID"). Similarly, a vector "actID" containing the 6 "activity_name" terms from the "activitylabs" table, where each individual term was repeated 46 times and the whole series was repeated 30 times, was created using rep() (there are 46 measurements for each of the 6 terms "Walking", "Walking Upstairs", "Walking Downstairs", "Sitting", "Standing", and "Laying" for each of the 30 subjects). The table below summarises the dimensions of these two vectors.
| R Object Name | Dimensions |
|---|---|
| subID | [1:8280] |
| actID | [1:8280] |
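The two rep() calls can be sketched as follows. The variable names `activity_names`, `n_subjects`, and `n_measurements` are introduced here for readability and are assumptions; the counts come from the paragraph above.

```r
activity_names <- c("Walking", "Walking Upstairs", "Walking Downstairs",
                    "Sitting", "Standing", "Laying")
n_subjects <- 30
n_measurements <- 46

# Each subject ID is repeated 6 x 46 = 276 times in a row.
subID <- rep(1:n_subjects, each = length(activity_names) * n_measurements)

# Each activity name is repeated 46 times, and that block is repeated 30 times.
actID <- rep(rep(activity_names, each = n_measurements), times = n_subjects)
```

Both vectors have 30 x 6 x 46 = 8280 elements, so they line up row by row when bound together.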
A vector "measurements" was created containing the names of columns 4:49 of the "alldata_meanstd_actnames" dataframe using colnames(). This vector was repeated 180 times using rep() and converted to a dataframe using as.data.frame(). Subsequently, the vectors "subID" and "actID" and the dataframe "measurements" were merged together using cbind() to create the dataframe "tidy1", the columns of which were renamed using rename(); "subID", "actID" and "measurements" were renamed to "Subject_ID", "Activity_ID", and "Measurement_Type", respectively. The dataframe "alldata_tidy2" was created by transposing the dataframe "alldata_tidy" using t(). The first three rows of "alldata_tidy2" were then removed, before the entire dataframe was unlisted using unlist() to rearrange all measurement values into a single column. The table below summarises the dimensions of the dataframes.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| measurements | 8280 | 1 |
| tidy1 | 8280 | 3 |
| alldata_tidy2 | 8280 | 1 |
The final dataframe "finaldata_tidy" was created by merging the columns of the dataframes "tidy1" and "alldata_tidy2" using cbind(). The table below summarises the dimensions of this final, tidy dataframe.
| R Object Name | No. Observations | No. Variables |
|---|---|---|
| finaldata_tidy | 8280 | 4 |
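The wide-to-long rearrangement can be sketched on a toy table with two subjects, two activities, and two measurements. The toy values are invented, and as.vector() on the transposed matrix is used here in place of the unlist() call described above; both are meant to flatten the values one subject-activity pair at a time.

```r
# Toy wide table: one row per subject-activity pair (invented values).
alldata_tidy <- data.frame(
  Subject_ID = c(1, 1, 2, 2),
  Activity_Name = c("Walking", "Sitting", "Walking", "Sitting"),
  m1 = c(0.1, 0.2, 0.3, 0.4),
  m2 = c(1.1, 1.2, 1.3, 1.4),
  stringsAsFactors = FALSE
)
measure_cols <- c("m1", "m2")
n_meas <- length(measure_cols)

# Identifier columns: repeat each pair once per measurement.
tidy1 <- data.frame(
  Subject_ID = rep(alldata_tidy$Subject_ID, each = n_meas),
  Activity_Name = rep(alldata_tidy$Activity_Name, each = n_meas),
  Measurement_Type = rep(measure_cols, times = nrow(alldata_tidy)),
  stringsAsFactors = FALSE
)

# Transpose so each column is one subject-activity pair, then flatten
# column by column so the values follow the same order as tidy1.
values <- as.vector(t(as.matrix(alldata_tidy[, measure_cols])))

finaldata_tidy <- cbind(tidy1, Average_Measurement_Value = values)
```

The result has one row per subject, activity, and measurement type, which for the real data gives 180 x 46 = 8280 rows.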
Using the command write.table(), the final tidy datatable "finaldata_tidy" was saved in the appropriate directory as a "txt" file.
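A minimal sketch of the export, writing a two-row stand-in table to a temporary folder. The `row.names = FALSE` argument and the output location are assumptions; the description above only states that write.table() was used.

```r
# Stand-in for the final tidy table (invented values).
finaldata_tidy <- data.frame(
  Subject_ID = c(1, 1),
  Activity_Name = c("Walking", "Walking"),
  Measurement_Type = c("m1", "m2"),
  Average_Measurement_Value = c(0.1, 1.1),
  stringsAsFactors = FALSE
)

# Write to a temporary location; row.names = FALSE omits the row index column.
out_file <- file.path(tempdir(), "tidydata.txt")
write.table(finaldata_tidy, file = out_file, row.names = FALSE)

# Read the file back to confirm it round-trips.
check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```

Reading the file back with `header = TRUE` restores the four column names.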
A description of the "Human Activity Recognition Using Smartphones Data Set" can be found here:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
The raw data can be downloaded via this link:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012