Skip to content

JaneYakovleva/getting-and-cleaning-data-coursera

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

getting-and-cleaning-data-coursera

This repository created according to peer assessments for "Getting and Cleaning Data Course"


##About project

The purpose of this project is to demonstrate ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.

The data linked to from the course website represent data collected from the accelerometers from the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones


##Files descriotion

'UCI HAR Dataset' directory with raw data

README.md - provides information about project and description of script workflow (all data transformation)

run_analysis.R - file with code for performing analysis

result.txt - result data

codebook.md - вescription of variables in the result data (original variables (features) description can be find in 'UCI HAR Dataset' directory)


##Script description run_analysis.R - code for performing analysis.

First of all, the script loads and setups necessary environment and variables. Assuming r working directory sets to project root directory.

obtainData - additional function for downloading and unzipping data from the Web. If directory already exist, downloading will be skipped. Takes as input the target URL and name of directory for storing data.

getFilePath - function for generating path from sequence of directories. Takes as input an ordered sequence of directories.

readData - wrapper for reading data from file to data frame. Takes as input the path to data assuming datasetDirectory already include in path.

The processing starts with the reading data sets and the merging it together. Column names are specified in accordance to "features.txt" file. Result stored in inputData.

Data frame meanAndStdMeas storing only the measurements on the mean and standard deviation for each measurement from inputData.

Next step is reading output signals - activity, - and setting it to data frame.

Then the script edits the column names replacing the reduction and noisy literals. Script also reads the subject data and adds it in the data frame.

Finally, for more convenience, meanAndStdMeas storing as data.table into firstTidyDataSet. Then the average of each variable for each activity and each subject is calculated and stored in secondTidyDataSet.

Last step is creating comma separated text file named "result.txt" which contains secondTidyDataSet.

The file can be found in the repository near run_analysis.R scripts.

About

Course project for https://www.coursera.org/course/getdata

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages