Commit 97eb2e3

Sample code for the article on YData (#743)
* Sample code for the article on YData
* Linter issues
1 parent ac9079d commit 97eb2e3

7 files changed: +10067 -0 lines changed


ydata-profiling/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Automate Python Data Analysis With YData Profiling
+
+This folder provides the code examples for the Real Python tutorial [Automate Python Data Analysis With YData Profiling](https://realpython.com/ydata-profiling-eda/).
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+import pandas as pd
+from ydata_profiling import ProfileReport
+
+df = pd.read_csv("flight_data_2024_sample.csv")
+
+# Split into flights originating from LAX and ATL
+df_lax = df[df["origin"] == "LAX"]
+df_atl = df[df["origin"] == "ATL"]
+
+lax_profile = ProfileReport(df_lax, title="LAX Flights")
+atl_profile = ProfileReport(df_atl, title="ATL Flights")
+
+comparison = lax_profile.compare(atl_profile)
+comparison.to_file("airport_comparison.html")
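
Note on the comparison script above: if either filter matches no rows, one of the two reports is built from an empty DataFrame, which I would expect profiling to reject. A minimal sketch of a guard follows, using a small in-memory frame in place of the CSV. `split_by_origin` is a hypothetical helper introduced for illustration; it is not part of pandas or ydata-profiling.

```python
import pandas as pd


def split_by_origin(df: pd.DataFrame, codes: list[str]) -> dict[str, pd.DataFrame]:
    """Return one non-empty subset per airport code (hypothetical helper)."""
    subsets = {}
    for code in codes:
        subset = df[df["origin"] == code]
        if subset.empty:
            # Fail early with a clear message instead of profiling nothing
            raise ValueError(f"No flights with origin {code!r}")
        subsets[code] = subset
    return subsets


# Toy data standing in for flight_data_2024_sample.csv
flights = pd.DataFrame(
    {"origin": ["LAX", "ATL", "LAX"], "dest": ["JFK", "ORD", "SEA"]}
)
subsets = split_by_origin(flights, ["LAX", "ATL"])
```

The resulting `subsets["LAX"]` and `subsets["ATL"]` frames could then be passed to `ProfileReport` as in the script.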

ydata-profiling/flight_data_2024_sample.csv

Lines changed: 10001 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+import pandas as pd
+from ydata_profiling import ProfileReport
+
+df = pd.read_csv("flight_data_2024_sample.csv")
+
+profile = ProfileReport(df)
+profile.to_file("flight_report.html")
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import pandas as pd
+from ydata_profiling import ProfileReport
+
+df = pd.read_csv("flight_data_2024_sample.csv")
+
+profile = ProfileReport(
+    df,
+    variables={
+        "descriptions": {
+            "origin": "Airport code where the flight originated",
+            "dest": "Airport code of flight destination",
+            "crs_dep_time": "Scheduled departure time at origin (hhmm)",
+        }
+    },
+)
+profile.to_file("documented_report.html")
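
Note on the descriptions script above: a misspelled key in `descriptions` is easy to miss because it corresponds to no column in the data. A small sketch of a pre-check, assuming a toy frame in place of the CSV; `unknown_columns` is a hypothetical helper, not a ydata-profiling function.

```python
import pandas as pd

descriptions = {
    "origin": "Airport code where the flight originated",
    "dest": "Airport code of flight destination",
    "crs_dep_time": "Scheduled departure time at origin (hhmm)",
}


def unknown_columns(df: pd.DataFrame, descriptions: dict[str, str]) -> list[str]:
    """List description keys that are not columns of df (hypothetical helper)."""
    return sorted(set(descriptions) - set(df.columns))


# Toy frame that deliberately lacks crs_dep_time
flights = pd.DataFrame({"origin": ["LAX"], "dest": ["JFK"]})
missing = unknown_columns(flights, descriptions)
```

An empty `missing` list would mean every description has a matching column.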
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import pandas as pd
+from ydata_profiling import ProfileReport
+
+df = pd.read_csv("flight_data_2024_sample.csv")
+
+# Option 1: Generate a minimal report
+profile = ProfileReport(df, minimal=True)
+profile.to_file("minimal_report.html")
+
+# Option 2: Sample your data before profiling
+df_sample = df.sample(n=10000, random_state=42)
+profile = ProfileReport(df_sample)
+profile.to_file("sampled_report.html")
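
Note on Option 2 above: `DataFrame.sample` raises `ValueError` when `n` exceeds the number of rows and `replace` is left at its default of `False`, so a fixed `n=10000` only works on files with at least that many rows. One possible guard is sketched below; `safe_sample` and the `dep_delay` column are hypothetical names used for illustration.

```python
import pandas as pd


def safe_sample(df: pd.DataFrame, n: int, seed: int = 42) -> pd.DataFrame:
    """Sample at most n rows, never more than df contains (hypothetical helper)."""
    return df.sample(n=min(n, len(df)), random_state=seed)


# Toy frame with fewer rows than the requested sample size
small = pd.DataFrame({"dep_delay": range(25)})
sampled = safe_sample(small, 10_000)
```

Capping `n` keeps the script usable on both the sample file and a larger dataset without switching to sampling with replacement.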
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import pandas as pd
+from ydata_profiling import ProfileReport
+
+df = pd.read_csv("flight_data_2024_sample.csv")
+df["fl_date"] = pd.to_datetime(df["fl_date"])
+
+profile = ProfileReport(
+    df,
+    title="Flight Delay Report",
+    tsmode=True,
+    sortby="fl_date",
+)
+profile.to_file("flight_timeseries_report.html")
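
Note on the time-series script above: `pd.to_datetime` raises on unparseable values by default, so one malformed `fl_date` entry would stop the script before any report is built. A hedged sketch of a more tolerant load follows, using toy data; the `dep_delay` column and the choice to drop bad rows are assumptions, not something the commit does.

```python
import pandas as pd

# Toy data standing in for the CSV; the last date is deliberately malformed
raw = pd.DataFrame(
    {
        "fl_date": ["2024-01-03", "2024-01-01", "not a date"],
        "dep_delay": [5, -2, 12],
    }
)

# Unparseable values become NaT instead of raising
raw["fl_date"] = pd.to_datetime(raw["fl_date"], errors="coerce")
n_bad = int(raw["fl_date"].isna().sum())

# Drop rows without a usable date and order the rest chronologically
clean = raw.dropna(subset=["fl_date"]).sort_values("fl_date")
```

Reporting `n_bad` before profiling makes it visible how many rows were excluded from the time-series report.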
