@@ -99,7 +99,8 @@ str(gapminder)
9999 $ gdpPercap: num 779 821 853 836 740 ...
100100```
101101
102- We can also examine individual columns of the data frame with our ` class ` function:
102+ We can also examine individual columns of the data frame with the `class` or
103+ `typeof` functions:
103104
104105
105106``` r
@@ -110,6 +111,14 @@ class(gapminder$year)
110111[1] "integer"
111112```
112113
114+ ``` r
115+ typeof(gapminder$year)
116+ ```
117+
118+ ``` output
119+ [1] "integer"
120+ ```
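For an integer column the two functions agree, but they answer different questions: `class` reports how R treats an object, while `typeof` reports how it is stored internally. A minimal sketch of the difference, assuming a small made-up data frame (`toy` and its values are hypothetical and not part of gapminder):

```r
# Hypothetical toy data frame, used only to illustrate class() vs typeof()
toy <- data.frame(
  country = factor(c("Norway", "Chile")),
  year = c(1952L, 1957L)
)

class(toy$year)      # "integer"
typeof(toy$year)     # "integer"

class(toy$country)   # "factor"
typeof(toy$country)  # "integer": factors are stored as integer codes

class(toy)           # "data.frame"
typeof(toy)          # "list": a data frame is a list of columns
```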
121+
113122``` r
114123class(gapminder$country)
115124```
@@ -400,6 +409,131 @@ tail(gapminder_norway)
400409```
401410
402411
412+
413+ ## Removing columns and rows in data frames
414+
415+ To remove columns from a data frame, we can use the `subset` function.
416+ This function allows us to remove columns using their names.
417+ If we want to keep all columns except `continent`, `pop` and `gdpPercap`, we can use the following `subset` command:
418+
419+
420+ ``` r
421+ life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
422+ head(life_expectancy)
423+ ```
424+
425+ ``` output
426+ country year lifeExp below_average
427+ 1 Afghanistan 1952 28.801 TRUE
428+ 2 Afghanistan 1957 30.332 TRUE
429+ 3 Afghanistan 1962 31.997 TRUE
430+ 4 Afghanistan 1967 34.020 TRUE
431+ 5 Afghanistan 1972 36.088 TRUE
432+ 6 Afghanistan 1977 38.438 TRUE
433+ ```
434+
435+ We can also use a logical vector to achieve the same result. Make sure the
436+ vector's length matches the number of columns in the data frame; otherwise R
437+ repeats the shorter vector until it matches the length of the longer one, a
438+ behavior called "vector recycling":
439+
440+
441+ ``` r
442+ life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)]
443+ head(life_expectancy)
444+ ```
445+
446+ ``` output
447+ country year lifeExp below_average
448+ 1 Afghanistan 1952 28.801 TRUE
449+ 2 Afghanistan 1957 30.332 TRUE
450+ 3 Afghanistan 1962 31.997 TRUE
451+ 4 Afghanistan 1967 34.020 TRUE
452+ 5 Afghanistan 1972 36.088 TRUE
453+ 6 Afghanistan 1977 38.438 TRUE
454+ ```
455+
456+ :::::: spoiler
457+
458+ ### Vector Recycling
459+
460+ Vector recycling occurs when working with vectors of different lengths: R
461+ repeats the elements of the shorter vector until it matches the length of
462+ the longer one. For more information, see the book R for Data Science and its
463+ [chapter about vectors](https://r4ds.had.co.nz/vectors.html#scalars-and-recycling-rules).
464+ ::::::::
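To see recycling in action, here is a minimal sketch using a small made-up data frame (`toy` is hypothetical and not part of gapminder). It shows how a logical selector that is shorter than the number of columns is silently repeated, which is why matching the length is the safer habit:

```r
# Hypothetical four-column data frame, used only to illustrate recycling
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# A two-element selector is recycled to TRUE, FALSE, TRUE, FALSE
recycled <- toy[c(TRUE, FALSE)]
names(recycled)   # "a" "c"

# A full-length selector states the intent explicitly
explicit <- toy[c(TRUE, FALSE, FALSE, FALSE)]
names(explicit)   # "a"

# The same rule applies to arithmetic on vectors of different lengths
recycled_sum <- c(1, 2, 3, 4) + c(10, 20)
recycled_sum      # 11 22 13 24
```

R gives no warning in either case here, because the longer length is a multiple of the shorter one.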
465+
466+ Alternatively, we can use column positions:
467+
468+
469+ ``` r
470+ life_expectancy <- gapminder[-c(3, 4, 6)]
471+ head(life_expectancy)
472+ ```
473+
474+ ``` output
475+ country year lifeExp below_average
476+ 1 Afghanistan 1952 28.801 TRUE
477+ 2 Afghanistan 1957 30.332 TRUE
478+ 3 Afghanistan 1962 31.997 TRUE
479+ 4 Afghanistan 1967 34.020 TRUE
480+ 5 Afghanistan 1972 36.088 TRUE
481+ 6 Afghanistan 1977 38.438 TRUE
482+ ```
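Column positions depend on the order of the columns, so a selection by position can break if that order changes. As a rough sketch, assuming a small made-up data frame (`toy` and `drop_cols` are hypothetical names), the same removal can be written with names and base R indexing:

```r
# Hypothetical toy data frame, used only to compare the two approaches
toy <- data.frame(
  country = c("A", "B"),
  pop = c(10, 20),
  lifeExp = c(50, 60)
)

# Remove by position: drops the second column, whatever it happens to be
by_position <- toy[-2]

# Remove by name: builds a logical vector with one entry per column
drop_cols <- c("pop")
by_name <- toy[!(names(toy) %in% drop_cols)]

names(by_position)  # "country" "lifeExp"
names(by_name)      # "country" "lifeExp"
```

Because `names(toy) %in% drop_cols` always has one element per column, this form never relies on recycling.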
483+
484+ Note that typically we select the rows we want to keep, rather than removing rows we do not want in the data.
485+ However, to remove rows from a data frame, we can use their positions.
486+ To practice on a smaller subset, we will filter the data to only those entries from Afghanistan after the year 2000.
487+ This smaller dataset will be easier for us to inspect by eye and see the changes we are making.
488+
489+
490+ ``` r
491+ # Filter data for Afghanistan after the year 2000:
492+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan" &
493+                              gapminder$year > 2000, ]
494+
495+ # Now remove data for 2002, that is, the first row:
496+ afghanistan_20c[-1, ]
497+ ```
498+
499+ ``` output
500+ country year pop continent lifeExp gdpPercap below_average
501+ 12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
502+ ```
503+
504+
505+ In research, we often remove rows based on features of the data itself, rather than its location.
506+ For example, you may want to remove all the missing data prior to an analysis. Let's first add some missing values (NAs) into the data, and then use `na.omit()` to remove them.
507+
508+
509+ ``` r
510+ # Turn some values into NAs:
511+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
512+ afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
513+ head(afghanistan_20c)
514+ ```
515+
516+ ``` output
517+ country year pop continent lifeExp gdpPercap below_average
518+ 1 Afghanistan NA 8425333 Asia 28.801 779.4453 TRUE
519+ 2 Afghanistan NA 9240934 Asia 30.332 820.8530 TRUE
520+ 3 Afghanistan NA 10267083 Asia 31.997 853.1007 TRUE
521+ 4 Afghanistan NA 11537966 Asia 34.020 836.1971 TRUE
522+ 5 Afghanistan NA 13079460 Asia 36.088 739.9811 TRUE
523+ 6 Afghanistan NA 14880372 Asia 38.438 786.1134 TRUE
524+ ```
525+
526+ ``` r
527+ # Remove NAs
528+ na.omit(afghanistan_20c)
529+ ```
530+
531+ ``` output
532+ country year pop continent lifeExp gdpPercap below_average
533+ 12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
534+ ```
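`na.omit()` drops every row that has a missing value in any column. When only one column matters, a logical condition gives finer control. A minimal sketch, assuming a small data frame with made-up values (`toy` is hypothetical and not taken from gapminder):

```r
# Hypothetical toy data with NAs in different columns (values are made up)
toy <- data.frame(
  year = c(NA, 2002L, 2007L),
  lifeExp = c(41.8, NA, 43.8)
)

# Drop rows with an NA in any column
all_complete <- na.omit(toy)
nrow(all_complete)   # 1

# Drop only the rows where year is missing
year_known <- toy[!is.na(toy$year), ]
nrow(year_known)     # 2

# complete.cases() returns the logical vector behind the first approach
complete.cases(toy)  # FALSE FALSE TRUE
```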
535+
536+
403537## Factors
404538
405539Here is another thing to look out for: in a `factor`, each different value