1 parent 8a58967 commit 9a99e60
1 file changed
data_cleaning.md
@@ -0,0 +1,23 @@
+# Data Cleaning Best Practices
+
+- Remove duplicate rows so the same record cannot appear in both training and test sets (data leakage).
+- Standardize column names: lowercase, with underscores in place of spaces.
+- Handle missing values by imputing the median or mean, or by applying domain-specific rules.
+- Convert date columns to a proper datetime type.
+- Validate data types before modeling.
+
+## Python Example
+
+import pandas as pd
+
+df = pd.read_csv("data.csv")
+
+df = df.drop_duplicates()  # drop exact duplicate rows
+df.columns = [c.lower().replace(" ", "_") for c in df.columns]  # "Order Date" -> "order_date"
+
+num_cols = df.select_dtypes(include="number").columns
+df[num_cols] = df[num_cols].fillna(df[num_cols].median())  # impute with each column's median
+
+if "date" in df.columns:
+    df["date"] = pd.to_datetime(df["date"])  # raises if a value cannot be parsed
+
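
The list in this diff includes "Validate data types before modeling", but the Python example stops at date conversion. A minimal sketch of that step could look like the following. The `EXPECTED_KINDS` schema, its column names, and the `find_dtype_problems` helper are hypothetical and not part of the commit; only the `pd.api.types` checks come from pandas itself.

```python
import pandas as pd

# Hypothetical schema: column name -> kind of dtype we expect after cleaning.
EXPECTED_KINDS = {
    "order_id": "integer",
    "amount": "numeric",
    "date": "datetime",
}

# Map each kind to the pandas dtype predicate that tests for it.
CHECKS = {
    "integer": pd.api.types.is_integer_dtype,
    "numeric": pd.api.types.is_numeric_dtype,
    "datetime": pd.api.types.is_datetime64_any_dtype,
}


def find_dtype_problems(df, expected_kinds):
    """Return a list of readable schema problems (empty if none were found)."""
    problems = []
    for column, kind in expected_kinds.items():
        if column not in df.columns:
            problems.append(f"{column}: missing column")
        elif not CHECKS[kind](df[column]):
            problems.append(f"{column}: expected {kind}, got {df[column].dtype}")
    return problems


# Small in-memory frame standing in for the cleaned data.
sample = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "amount": ["10.5", "7.0", "3.2"],  # numbers accidentally stored as strings
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    }
)

problems = find_dtype_problems(sample, EXPECTED_KINDS)
print(problems)  # one entry, flagging the "amount" column
```

Returning a list of problems rather than raising on the first mismatch lets a caller report every schema issue at once before deciding whether to stop.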