content/data-manipulation/reading/aggregation/_index.md
+7 -3 (7 additions, 3 deletions)
@@ -11,7 +11,7 @@ This reading, and following readings, will provide examples from the `titanic.cs
## Groupby
-
The `.groupby()` function groups data together from one or more columns. As we group the data together, it forms a new **GroupBy** object. The offical[pandas documenation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) states that a "group by" accomplshes the following:
+
The `.groupby()` function groups data together from one or more columns. As we group the data together, it forms a new **GroupBy** object. The official [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) states that a "group by" accomplishes the following:
1. Splitting: Split the data based on the criteria provided.
1. Applying: Provide an applicable function to the groups that were split.
1. Combining: Combine the results from the function into a new data structure.
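The three steps above can be sketched in a few lines. This is a minimal illustration, not the lesson's own code: the small DataFrame stands in for `titanic.csv`, and the column names `pclass` and `fare` are assumptions about that file.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of titanic.csv (column names are assumed)
titanic = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# Splitting: group the rows by passenger class; this returns a GroupBy object
grouped = titanic.groupby("pclass")

# Applying + combining: take the mean fare of each group and
# collect the results into a new Series indexed by pclass
mean_fare = grouped["fare"].mean()
print(mean_fare)
```

Nothing is computed when `.groupby()` is called; the aggregate function (`.mean()` here) triggers the apply and combine steps.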
-
Applying an aggregate function to multipled grouped columns can also be accomplished with method chaining. The following image uses columns from the titanic dataset as an example.
+
Applying an aggregate function to multiple grouped columns can also be accomplished with method chaining. The following image uses columns from the titanic dataset as an example.

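As a rough text equivalent of grouping on multiple columns with method chaining, the sketch below groups by two columns and chains an aggregate onto the result. The data and the column names `pclass`, `sex`, and `age` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real titanic.csv column names may differ
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 3, 3, 3],
    "sex": ["female", "male", "female", "male", "male", "female"],
    "age": [30.0, 40.0, 50.0, 20.0, 30.0, 10.0],
})

# Group by two columns, select one column, and aggregate, all in one chain
mean_age = titanic.groupby(["pclass", "sex"])["age"].mean()
print(mean_age)
```

The result is a Series with a two-level index, one level per grouping column.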
Note that the `mode()` function can return multiple values per column if there are multiple modes (values that appear with equal frequency). This may result in a DataFrame with more rows than expected. If you need only one mode value, you may want to use `mode()[0]` or apply mode to specific columns individually.
+
{{% /notice %}}
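The behavior described in the note can be seen on a small example. This is a sketch with made-up data; the column names `embarked` and `pclass` are assumptions.

```python
import pandas as pd

# Hypothetical data: "embarked" has two modes (C and S), "pclass" has one (3)
df = pd.DataFrame({
    "embarked": ["S", "S", "C", "C", "Q"],
    "pclass": [3, 3, 3, 1, 2],
})

# DataFrame.mode() returns one row per mode; columns with fewer modes are padded with NaN
all_modes = df.mode()
print(all_modes)

# Taking element 0 keeps a single value; modes come back in sorted order
first_mode = df["embarked"].mode()[0]
print(first_mode)
```

Here `all_modes` has two rows even though `pclass` has only one mode, which is the "more rows than expected" case.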
+
### Aggregation Using a Dictionary
pandas also lets you provide a dictionary with column names as keys and aggregate functions as the associated values.
-
This dictionary object has now become a tempate for the aggregations we want to preform. However, on it's own, it does nothing. Once passed to the agg() method, it will pick out the specific location of data we want to examine. Making a subset table.
+
This dictionary object has now become a template for the aggregations we want to perform. However, on its own, it does nothing. Once passed to the `.agg()` method, it will pick out the specific data we want to examine, producing a subset table.
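A minimal sketch of this pattern follows. The dictionary name `agg_template`, the sample data, and the column names are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions
titanic = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "age": [30.0, 50.0, 20.0, 40.0],
    "fare": [80.0, 60.0, 8.0, 10.0],
})

# The dictionary is only a template: column name -> aggregate function
agg_template = {"age": "mean", "fare": "max"}

# Passing it to .agg() applies each function to its column, per group
summary = titanic.groupby("pclass").agg(agg_template)
print(summary)
```

Only the columns named in the dictionary appear in `summary`, which is why the result is a subset table.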
Creating a function to aggregate data or create new columns is another common practice used when analyzing data. Pandas utilizes the `.apply()` method to execute a function on a pandas Series or DataFrame.
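As a small sketch of `.apply()` with a custom function, the example below flags passengers aged 20 or under who survived and then counts them. The function name `is_young_survivor`, the data, and the column names `age` and `survived` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions
titanic = pd.DataFrame({
    "age": [15.0, 20.0, 35.0, 8.0],
    "survived": [1, 0, 1, 1],
})

def is_young_survivor(row):
    """Return True for a passenger aged 20 or under who survived."""
    return row["age"] <= 20 and row["survived"] == 1

# axis=1 passes each row to the function; the result becomes a new column
titanic["young_survivor"] = titanic.apply(is_young_survivor, axis=1)

# Summing a boolean column counts the True values
count = int(titanic["young_survivor"].sum())
print(count)
```

For a condition this simple, a boolean mask would usually be faster; `.apply()` is most useful when the logic does not vectorize easily.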
{{% notice blue Example "rocket" %}}
-
Suppose you wanted to know how many survivors under the age of 20 are still alive from the titanic dataset:
+
Suppose you wanted to know how many survivors age 20 and under are still alive from the titanic dataset:
Note in the output image above the inclusion of the `axis` parameter when printing the dataframe a second time. The axis parameter specifies that the two DataFrames should be joined along the columns instead of rows, providing a cleaner dataset.
{{% /notice %}}
-
In the lesson on exploring data with python we covered how to create a DataFrame using the `.concat()` method by providing two Series as parameters. The `.concat` function can alse be used to add a Series within an existing DataFrame!
+
In the lesson on exploring data with Python, we covered how to create a DataFrame using the `.concat()` function by providing two Series as parameters. The `.concat()` function can also be used to add a Series to an existing DataFrame!
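A minimal sketch of adding a Series to an existing DataFrame is shown below. The data and names are made up; `axis=1` tells `pd.concat()` to join along the columns.

```python
import pandas as pd

# Hypothetical existing DataFrame
df = pd.DataFrame({"name": ["Ann", "Bob", "Cal"], "age": [29, 41, 35]})

# A Series to add; its name becomes the new column label
fare = pd.Series([72.5, 8.05, 13.0], name="fare")

# axis=1 joins side by side, aligning rows on the index
combined = pd.concat([df, fare], axis=1)
print(combined)
```

Rows are matched by index label, so a Series with a different index would produce NaN values where labels do not line up.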
The `.merge()` function is used to combine two DataFrames based on common columns or indices, similar to SQL joins. Unlike `.concat()` which simply stacks DataFrames, `.merge()` intelligently combines rows based on matching values in specified columns.
+
+
### Common Merge Types
+
+
There are four main types of merges:
+
1. `inner`: Returns only rows with matching values in both DataFrames (default)
+
1. `left`: Returns all rows from the left DataFrame and matching rows from the right
+
1. `right`: Returns all rows from the right DataFrame and matching rows from the left
+
1. `outer`: Returns all rows from both DataFrames, filling in missing values with NaN
The `inner` merge will return only the 3 passengers (IDs 1, 2, 3) that exist in both DataFrames. The `left` merge will return all 4 passengers, with NaN values for the ticket information of passenger 4 who doesn't have a ticket.
+
{{% /notice %}}
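The four merge types can be compared side by side on a scenario like the one described: four passengers, only three of whom have a ticket. This is a sketch; the DataFrame names, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: passenger 4 has no matching ticket row
passengers = pd.DataFrame({
    "passenger_id": [1, 2, 3, 4],
    "name": ["Ann", "Bob", "Cal", "Dee"],
})
tickets = pd.DataFrame({
    "passenger_id": [1, 2, 3],
    "ticket": ["A10", "B20", "C30"],
})

# The how parameter selects the merge type; "inner" is the default
inner = passengers.merge(tickets, on="passenger_id", how="inner")
left = passengers.merge(tickets, on="passenger_id", how="left")
right = passengers.merge(tickets, on="passenger_id", how="right")
outer = passengers.merge(tickets, on="passenger_id", how="outer")

print(len(inner), len(left), len(right), len(outer))
```

With this data, `inner` and `right` keep 3 rows, while `left` and `outer` keep all 4, with NaN in the `ticket` column for passenger 4.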
+
+
### Merging on Multiple Columns
+
+
You can merge on multiple columns by passing a list to the `on` parameter:
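A minimal sketch of such a merge, assuming two hypothetical DataFrames that share the columns `pclass` and `embarked`:

```python
import pandas as pd

# Hypothetical data; both frames share the key columns pclass and embarked
bookings = pd.DataFrame({
    "pclass": [1, 1, 3],
    "embarked": ["S", "C", "S"],
    "passengers": [10, 5, 40],
})
prices = pd.DataFrame({
    "pclass": [1, 1, 3],
    "embarked": ["S", "C", "Q"],
    "avg_fare": [80.0, 95.0, 8.0],
})

# A row matches only when every column in the list matches
merged = bookings.merge(prices, on=["pclass", "embarked"])
print(merged)
```

Because the default merge type is `inner`, the `(3, "S")` booking is dropped: `prices` has a row for class 3, but not for class 3 embarking at `"S"`.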