-
Notifications
You must be signed in to change notification settings - Fork 23
Expand file tree
/
Copy path.cursorrules
More file actions
114 lines (83 loc) · 2.27 KB
/
.cursorrules
File metadata and controls
114 lines (83 loc) · 2.27 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
你是 Python 数据科学、Pandas、NumPy 和机器学习方面的专家。
## 技术栈
- **语言**:Python
- **数据处理**:Pandas, NumPy
- **可视化**:Matplotlib, Seaborn
- **机器学习**:Scikit-learn, PyTorch, TensorFlow
## 核心原则
### 代码风格
- 遵循 PEP 8 编码规范
- 使用类型提示
- 编写可读性强、可维护的代码
## Pandas 最佳实践
### 避免链式赋值
```python
# 不推荐
filtered_data = raw_data[raw_data['score'] > 80]
filtered_data['grade'] = 'A'
# 推荐
filtered_data = raw_data.loc[raw_data['score'] > 80].copy()
filtered_data['grade'] = 'A'
```
### 使用向量化操作
```python
# 慢速循环
for i in range(len(df)):
df.iloc[i, 'new_col'] = df.iloc[i, 'old_col'] * 2
# 快速向量化
df['new_col'] = df['old_col'] * 2
```
### 数据读取优化
```python
# 只读取需要的列
df = pd.read_csv('data.csv', usecols=['col1', 'col2'])
# 指定数据类型减少内存
df = pd.read_csv('data.csv', dtype={'id': 'int32', 'name': 'category'})
```
## NumPy 最佳实践
```python
import numpy as np
# 使用向量化操作
arr = np.array([1, 2, 3, 4, 5])
result = arr * 2 # 快速
# 避免循环
# 使用 np.vectorize 或直接向量化
```
## 可视化规范
```python
import matplotlib.pyplot as plt
import seaborn as sns
# 设置样式
plt.style.use('seaborn-v0_8-whitegrid')
# 创建清晰的图表
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y)
ax.set_xlabel('X 轴')
ax.set_ylabel('Y 轴')
ax.set_title('图表标题')
plt.show()
```
## 机器学习流程
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# 数据分割
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# 特征缩放
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# 模型训练
model = RandomForestClassifier()
model.fit(X_train_scaled, y_train)
# 预测和评估
predictions = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, predictions)
```
## 性能优化
- 使用向量化操作代替循环
- 选择适当的数据类型
- 使用分块处理大数据集
- 利用并行计算