Commit a17250f
Author: Sunser

    Add traffic threshold stop controls

1 parent c4e0a23
13 files changed, 456 additions and 37 deletions

.env.example

Lines changed: 8 additions & 2 deletions
@@ -19,6 +19,9 @@ EC_MAX_CONCURRENCY=4
 # [Optional] Traffic warning threshold percentage; for example, 95 means alert or hand off to manual handling once usage reaches 95%.
 EC_TRAFFIC_WARNING_PERCENT=95

+# [Optional] Action when traffic exceeds the threshold: notify_only only alerts; notify_and_stop alerts and stops running instances that match the keep-alive target.
+EC_TRAFFIC_EXCEEDED_ACTION=notify_only
+
 # [Optional] Log level: debug, info, warn, or error.
 EC_LOG_LEVEL=info

@@ -37,8 +40,11 @@ EC_WECHAT_AGENTID=0
 # [Optional] WeCom (Enterprise WeChat) recipients; one user: user1, several users: user1,user2.
 EC_WECHAT_TOUSER=

-# [Optional] Notification events; allowed values are auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,error or all.
-EC_NOTIFY_EVENTS=auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,error
+# [Optional] Notification events; allowed values are auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,traffic_stop,error or all.
+EC_NOTIFY_EVENTS=auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,traffic_stop,error
+
+# [Optional] Interval between "manual decision required" notifications; the same instance and reason is notified at most once per interval. Accepts Go durations such as 30m or 1h.
+EC_MANUAL_REQUIRED_NOTIFY_INTERVAL=1h

 # [Optional] Whether to enable background automatic keep-alive.
 EC_KEEP_ALIVE_ENABLED=true
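
Review note: a value such as `EC_NOTIFY_EVENTS` has to be split and matched somewhere before a `traffic_stop` notification goes out. The following is a minimal sketch of that step, not the project's code. `parseEvents` and `shouldNotify` are hypothetical names; the only behaviour taken from the diff is that events are comma-separated and that `all` enables every event.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEvents splits a comma-separated event list into a set,
// trimming spaces and dropping empty entries. Hypothetical helper.
func parseEvents(raw string) map[string]bool {
	set := make(map[string]bool)
	for _, part := range strings.Split(raw, ",") {
		if name := strings.TrimSpace(part); name != "" {
			set[name] = true
		}
	}
	return set
}

// shouldNotify reports whether event is enabled, treating "all" as a wildcard.
func shouldNotify(enabled map[string]bool, event string) bool {
	return enabled["all"] || enabled[event]
}

func main() {
	enabled := parseEvents("traffic_exceeded, traffic_stop,error")
	fmt.Println(shouldNotify(enabled, "traffic_stop")) // true
	fmt.Println(shouldNotify(enabled, "auto_start"))   // false
}
```

Keeping the events in a set makes the `all` wildcard a single extra lookup and keeps the check independent of the order in which events were configured.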

README.md

Lines changed: 18 additions & 4 deletions
@@ -145,6 +145,7 @@ http://你的服务器IP:43210
 | Variable | Required | Default | Description |
 | --- | --- | --- | --- |
 | `EC_TRAFFIC_WARNING_PERCENT` | No | `95` | Traffic warning threshold percentage. For example, `95` means that once usage reaches 95% of the quota, an alert is sent or the corresponding keep-alive policy is triggered. |
+| `EC_TRAFFIC_EXCEEDED_ACTION` | No | `notify_only` | Action taken when the traffic partition of a running instance exceeds the threshold. Supports `notify_only` and `notify_and_stop`. |

 Account traffic is counted against two quota pools:

@@ -153,6 +154,15 @@ http://你的服务器IP:43210

 Quota pools are shared at the account level, not dedicated to a single instance. Multiple instances under the same account in the same partition share the corresponding quota.

+Allowed values for `EC_TRAFFIC_EXCEEDED_ACTION`:
+
+| Value | Description |
+| --- | --- |
+| `notify_only` | Only send the traffic alert; do not stop instances. |
+| `notify_and_stop` | Send the traffic alert and submit a stop for running instances that are within the current keep-alive target scope and whose traffic partition is already over the threshold. |
+
+`notify_and_stop` does not write a manual-pause state. In other words, if traffic is below the threshold the following month, stopped instances can still be restarted automatically by the keep-alive policy. `notify_and_stop` cannot be combined with `EC_TRAFFIC_POLICY=ignore_limit`; this prevents "stop when over the threshold" and "start regardless of the threshold" from fighting each other.
+
 ### Keep-alive parameters

 | Variable | Required | Default | Description |
@@ -199,7 +209,8 @@ http://你的服务器IP:43210
 | `EC_WECHAT_CORPSECRET` | Required when notifications are enabled | | Secret of the WeCom self-built app. |
 | `EC_WECHAT_AGENTID` | Required when notifications are enabled | `0` | AgentId of the WeCom self-built app. |
 | `EC_WECHAT_TOUSER` | Required when notifications are enabled | | Recipients. Use `user1` for one user, `user1,user2` for several. |
-| `EC_NOTIFY_EVENTS` | No | `auto_start`<br>`manual_start`<br>`manual_stop`<br>`manual_required`<br>`traffic_exceeded`<br>`error` | List of notification events, comma-separated. |
+| `EC_NOTIFY_EVENTS` | No | `auto_start`<br>`manual_start`<br>`manual_stop`<br>`manual_required`<br>`traffic_exceeded`<br>`traffic_stop`<br>`error` | List of notification events, comma-separated. |
+| `EC_MANUAL_REQUIRED_NOTIFY_INTERVAL` | No | `1h` | Interval between "manual decision required" notifications. The same instance and reason is notified at most once per interval, which avoids repeated reminders when nobody responds. |

 Notifications are sent as text messages through a WeCom (Enterprise WeChat) self-built app, not through a group bot webhook.

@@ -212,6 +223,7 @@ http://你的服务器IP:43210
 | `manual_stop` | Manual stop from the web page. |
 | `manual_required` | Traffic is over the threshold or unknown; a manual decision is required. |
 | `traffic_exceeded` | One of the account's traffic quota pools has reached the warning threshold. |
+| `traffic_stop` | A running instance was submitted for stop by the threshold-exceeded action. |
 | `error` | An Alibaba Cloud API call, start, stop, or notification delivery failed. |
 | `all` | Send all events. |

@@ -301,14 +313,16 @@ EC_PASSWORD=change-me-to-a-long-random-password
 EC_REGION_REFRESH_INTERVAL=24h
 EC_MAX_CONCURRENCY=4
 EC_TRAFFIC_WARNING_PERCENT=95
+EC_TRAFFIC_EXCEEDED_ACTION=notify_only
 EC_LOG_LEVEL=info

 EC_NOTIFY_ENABLED=false
 EC_WECHAT_CORPID=
 EC_WECHAT_CORPSECRET=
 EC_WECHAT_AGENTID=0
 EC_WECHAT_TOUSER=
-EC_NOTIFY_EVENTS=auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,error
+EC_NOTIFY_EVENTS=auto_start,manual_start,manual_stop,manual_required,traffic_exceeded,traffic_stop,error
+EC_MANUAL_REQUIRED_NOTIFY_INTERVAL=1h

 EC_KEEP_ALIVE_ENABLED=true
 EC_KEEP_ALIVE_TARGET=spot_only
@@ -417,7 +431,7 @@ The Web settings page can only change non-secret settings, including:
 | `ecs:DescribeInstanceStatus` | Reserved for instance status queries. |
 | `ecs:DescribeNetworkInterfaces` | Read NIC IPv6 addresses. |
 | `ecs:StartInstance` | Background keep-alive and manual start from the web page. |
-| `ecs:StopInstance` | Manual stop from the web page. |
+| `ecs:StopInstance` | Manual stop from the web page, and traffic-protection stops when `notify_and_stop` is enabled. |
 | `cms:QueryMetricList` | Read CloudMonitor metrics to estimate the instance's traffic for the current month. |
 | `cdt:ListCdtInternetTraffic` | Read account CDT traffic for account-level quota and keep-alive threshold decisions. |

@@ -470,7 +484,7 @@ environment:

 ```text
 2026-05-19 05:36:28 [INFO] refresh finished accounts=1 duration=16.5s errors=0 instances=3
-2026-05-19 05:36:28 [INFO] keepalive check finished checked=3 manual_required=0 skipped=3 starts=0
+2026-05-19 05:36:28 [INFO] keepalive check finished checked=3 manual_required=0 skipped=3 starts=0 traffic_stops=0
 2026-05-19 05:36:18 [DEBUG] traffic cms instance traffic loaded account=Huhu instance=i-xxx region=cn-hangzhou used=0.11GB
 ```
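
Review note: the README describes `EC_MANUAL_REQUIRED_NOTIFY_INTERVAL` as "at most one notification per instance and reason per interval". A minimal sketch of such a throttle is shown here under stated assumptions: the `throttle` type, its `allow` method, the `at` demo clock, and the reason strings are all hypothetical and do not come from this commit, whose notifier code is not part of the files shown.

```go
package main

import (
	"fmt"
	"time"
)

// throttle remembers when each (instance, reason) pair was last notified.
// Hypothetical type; it is not safe for concurrent use without a mutex.
type throttle struct {
	interval time.Duration
	lastSent map[string]time.Time
}

func newThrottle(interval time.Duration) *throttle {
	return &throttle{interval: interval, lastSent: make(map[string]time.Time)}
}

// allow reports whether a notification may be sent at time now and,
// if so, records now as the last send time for that pair.
func (t *throttle) allow(instanceID, reason string, now time.Time) bool {
	key := instanceID + "|" + reason
	if last, ok := t.lastSent[key]; ok && now.Sub(last) < t.interval {
		return false
	}
	t.lastSent[key] = now
	return true
}

// at returns a fixed demo clock advanced by the given number of minutes.
func at(minutes int) time.Time {
	base := time.Date(2026, 5, 19, 5, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

func main() {
	th := newThrottle(time.Hour)
	fmt.Println(th.allow("i-xxx", "traffic_exceeded", at(0)))  // true: first notification
	fmt.Println(th.allow("i-xxx", "traffic_exceeded", at(30))) // false: still inside the interval
	fmt.Println(th.allow("i-xxx", "traffic_unknown", at(30)))  // true: different reason
	fmt.Println(th.allow("i-xxx", "traffic_exceeded", at(60))) // true: interval has elapsed
}
```

Passing the current time in as a parameter, rather than calling `time.Now()` inside `allow`, keeps the rule easy to test with a fixed clock.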

internal/config/config.go

Lines changed: 43 additions & 10 deletions
@@ -38,19 +38,21 @@ type DiscoveryConfig struct {

 type TrafficConfig struct {
 	WarningPercent float64
+	ExceededAction string
 }

 type LoggingConfig struct {
 	Level string
 }

 type NotificationConfig struct {
-	Enabled          bool
-	WeChatCorpID     string
-	WeChatCorpSecret string
-	WeChatAgentID    int
-	WeChatToUser     []string
-	NotifyEvents     []string
+	Enabled                      bool
+	WeChatCorpID                 string
+	WeChatCorpSecret             string
+	WeChatAgentID                int
+	WeChatToUser                 []string
+	NotifyEvents                 []string
+	ManualRequiredNotifyInterval time.Duration
 }

 type KeepAliveConfig struct {
@@ -207,10 +209,11 @@ func defaultConfig() Config {
 			RegionRefreshInterval: 24 * time.Hour,
 			MaxConcurrency:        4,
 		},
-		Traffic: TrafficConfig{WarningPercent: 95},
+		Traffic: TrafficConfig{WarningPercent: 95, ExceededAction: "notify_only"},
 		Logging: LoggingConfig{Level: "info"},
 		Notification: NotificationConfig{
-			NotifyEvents: []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "error"},
+			NotifyEvents:                 []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "traffic_stop", "error"},
+			ManualRequiredNotifyInterval: time.Hour,
 		},
 		KeepAlive: KeepAliveConfig{
 			Enabled: true,
@@ -278,6 +281,8 @@ func applyTraffic(cfg *TrafficConfig, key, value string) error {
 			return fmt.Errorf("warning_percent must be a number")
 		}
 		cfg.WarningPercent = number
+	case "exceeded_action":
+		cfg.ExceededAction = scalar(value)
 	default:
 		return fmt.Errorf("unknown traffic field %q", key)
 	}
@@ -316,6 +321,12 @@ func applyNotification(cfg *NotificationConfig, key, value string) error {
 		cfg.WeChatToUser = parseList(value)
 	case "notify_events":
 		cfg.NotifyEvents = parseList(value)
+	case "manual_required_notify_interval":
+		duration, err := parseDuration(value)
+		if err != nil {
+			return err
+		}
+		cfg.ManualRequiredNotifyInterval = duration
 	default:
 		return fmt.Errorf("unknown notification field %q", key)
 	}
@@ -395,6 +406,14 @@ func validate(cfg *Config) error {
 	if cfg.Traffic.WarningPercent <= 0 {
 		return errors.New("traffic.warning_percent must be greater than 0")
 	}
+	if cfg.Traffic.ExceededAction == "" {
+		cfg.Traffic.ExceededAction = "notify_only"
+	}
+	switch cfg.Traffic.ExceededAction {
+	case "notify_only", "notify_and_stop":
+	default:
+		return fmt.Errorf("unsupported traffic.exceeded_action: %s", cfg.Traffic.ExceededAction)
+	}
 	if cfg.KeepAlive.Target == "" {
 		cfg.KeepAlive.Target = "spot_only"
 	}
@@ -413,7 +432,10 @@ func validate(cfg *Config) error {
 		return fmt.Errorf("unsupported logging.level: %s", cfg.Logging.Level)
 	}
 	if cfg.Notification.NotifyEvents == nil {
-		cfg.Notification.NotifyEvents = []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "error"}
+		cfg.Notification.NotifyEvents = []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "traffic_stop", "error"}
+	}
+	if cfg.Notification.ManualRequiredNotifyInterval <= 0 {
+		return errors.New("notification.manual_required_notify_interval must be greater than 0")
 	}
 	if err := validateNotifyEvents(cfg.Notification.NotifyEvents); err != nil {
 		return err
@@ -423,6 +445,9 @@ func validate(cfg *Config) error {
 	default:
 		return fmt.Errorf("unsupported traffic_policy: %s", cfg.KeepAlive.TrafficPolicy)
 	}
+	if cfg.Traffic.ExceededAction == "notify_and_stop" && cfg.KeepAlive.TrafficPolicy == "ignore_limit" {
+		return errors.New("traffic.exceeded_action=notify_and_stop cannot be combined with keep_alive.traffic_policy=ignore_limit")
+	}
 	switch cfg.KeepAlive.StopMode {
 	case "StopCharging", "KeepCharging":
 	default:
@@ -469,7 +494,7 @@ func validate(cfg *Config) error {
 func validateNotifyEvents(events []string) error {
 	for _, event := range events {
 		switch event {
-		case "all", "auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "error":
+		case "all", "auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "traffic_stop", "error":
 		default:
 			return fmt.Errorf("unsupported notification.notify_events: %s", event)
 		}
@@ -510,6 +535,9 @@ func applyEnv(cfg *Config, includeGlobal bool) error {
 	} else if ok {
 		cfg.Traffic.WarningPercent = value
 	}
+	if value, ok := lookupEnvString("EC_TRAFFIC_EXCEEDED_ACTION"); ok {
+		cfg.Traffic.ExceededAction = value
+	}
 	if value, ok := lookupEnvString("EC_LOG_LEVEL"); ok {
 		cfg.Logging.Level = strings.ToLower(value)
 	}
@@ -533,6 +561,11 @@ func applyEnv(cfg *Config, includeGlobal bool) error {
 	if value, ok := lookupEnvList("EC_NOTIFY_EVENTS"); ok {
 		cfg.Notification.NotifyEvents = value
 	}
+	if value, ok, err := lookupEnvDuration("EC_MANUAL_REQUIRED_NOTIFY_INTERVAL"); err != nil {
+		return err
+	} else if ok {
+		cfg.Notification.ManualRequiredNotifyInterval = value
+	}
 	if value, ok := lookupEnvBool("EC_KEEP_ALIVE_ENABLED"); ok {
 		cfg.KeepAlive.Enabled = value
 	}
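
Review note: `applyEnv` calls `lookupEnvDuration`, whose body is not part of this diff. The sketch below shows one way a helper with the same three-value return shape could behave; it is an assumption, not the project's implementation. `lookupDuration` is a hypothetical name, the injectable `lookup` parameter exists only to make the sketch testable, and treating a blank value as "not set" is a guess about the intended semantics.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// lookupDuration reads key through lookup and parses it as a Go duration.
// It returns ok=false when the variable is unset or blank, and an error
// when a value is present but cannot be parsed.
func lookupDuration(lookup func(string) (string, bool), key string) (time.Duration, bool, error) {
	raw, found := lookup(key)
	raw = strings.TrimSpace(raw)
	if !found || raw == "" {
		return 0, false, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, false, fmt.Errorf("%s: invalid duration %q: %w", key, raw, err)
	}
	return d, true, nil
}

func main() {
	// os.LookupEnv already has the signature the helper expects.
	d, ok, err := lookupDuration(os.LookupEnv, "EC_MANUAL_REQUIRED_NOTIFY_INTERVAL")
	fmt.Println(d, ok, err)
}
```

Returning the error separately from `ok` lets the caller distinguish "not configured, keep the default" from "configured but invalid", which matches how the `if value, ok, err := ...; err != nil` branch in `applyEnv` is written.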

internal/config/config_test.go

Lines changed: 37 additions & 1 deletion
@@ -48,6 +48,9 @@ accounts:
 	if cfg.KeepAlive.TrafficPolicy != "manual_only_when_exceeded" {
 		t.Fatalf("traffic policy = %q", cfg.KeepAlive.TrafficPolicy)
 	}
+	if cfg.Traffic.ExceededAction != "notify_only" {
+		t.Fatalf("traffic exceeded action = %q, want notify_only", cfg.Traffic.ExceededAction)
+	}
 	if cfg.KeepAlive.StopMode != "StopCharging" {
 		t.Fatalf("stop mode = %q, want StopCharging", cfg.KeepAlive.StopMode)
 	}
@@ -72,6 +75,9 @@ accounts:
 	if len(cfg.Notification.NotifyEvents) != 2 || cfg.Notification.NotifyEvents[0] != "auto_start" {
 		t.Fatalf("notify events = %#v", cfg.Notification.NotifyEvents)
 	}
+	if cfg.Notification.ManualRequiredNotifyInterval != time.Hour {
+		t.Fatalf("manual required notify interval = %s, want 1h", cfg.Notification.ManualRequiredNotifyInterval)
+	}
 	if len(cfg.Accounts) != 1 {
 		t.Fatalf("accounts len = %d, want 1", len(cfg.Accounts))
 	}
@@ -151,11 +157,13 @@ func TestLoadBytesUsesEnvironmentAccountAliases(t *testing.T) {
 	t.Setenv("EC_ACCOUNT_INTL_PROD_REGIONS", "ap-southeast-1")
 	t.Setenv("EC_REFRESH_INTERVAL", "10m")
 	t.Setenv("EC_TRAFFIC_WARNING_PERCENT", "90")
+	t.Setenv("EC_TRAFFIC_EXCEEDED_ACTION", "notify_and_stop")
 	t.Setenv("EC_NOTIFY_ENABLED", "true")
 	t.Setenv("EC_WECHAT_CORPID", "corp-env")
 	t.Setenv("EC_WECHAT_CORPSECRET", "secret-env")
 	t.Setenv("EC_WECHAT_AGENTID", "1000003")
 	t.Setenv("EC_WECHAT_TOUSER", "user-a,user-b")
+	t.Setenv("EC_MANUAL_REQUIRED_NOTIFY_INTERVAL", "30m")

 	cfg, err := config.LoadBytes([]byte(`
 server:
@@ -170,9 +178,15 @@ server:
 	if cfg.Traffic.WarningPercent != 90 {
 		t.Fatalf("warning percent = %v, want 90", cfg.Traffic.WarningPercent)
 	}
+	if cfg.Traffic.ExceededAction != "notify_and_stop" {
+		t.Fatalf("traffic exceeded action = %q, want notify_and_stop", cfg.Traffic.ExceededAction)
+	}
 	if !cfg.Notification.Enabled || cfg.Notification.WeChatCorpID != "corp-env" || cfg.Notification.WeChatAgentID != 1000003 {
 		t.Fatalf("wechat notification config = %#v", cfg.Notification)
 	}
+	if cfg.Notification.ManualRequiredNotifyInterval != 30*time.Minute {
+		t.Fatalf("manual required notify interval = %s, want 30m", cfg.Notification.ManualRequiredNotifyInterval)
+	}
 	if len(cfg.Notification.WeChatToUser) != 2 || cfg.Notification.WeChatToUser[1] != "user-b" {
 		t.Fatalf("wechat receivers = %#v", cfg.Notification.WeChatToUser)
 	}
@@ -222,7 +236,7 @@ accounts:
 		t.Fatalf("LoadBytes() error = %v", err)
 	}

-	want := []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "error"}
+	want := []string{"auto_start", "manual_start", "manual_stop", "manual_required", "traffic_exceeded", "traffic_stop", "error"}
 	if len(cfg.Notification.NotifyEvents) != len(want) {
 		t.Fatalf("notify events = %#v, want %#v", cfg.Notification.NotifyEvents, want)
 	}
@@ -233,6 +247,28 @@ accounts:
 	}
 }

+func TestLoadBytesRejectsTrafficStopWithIgnoreLimitPolicy(t *testing.T) {
+	_, err := config.LoadBytes([]byte(`
+server:
+  password: "secret"
+traffic:
+  exceeded_action: "notify_and_stop"
+keep_alive:
+  traffic_policy: "ignore_limit"
+accounts:
+  - name: "cn"
+    site: "china"
+    access_key_id: "ak"
+    access_key_secret: "sk"
+`))
+	if err == nil {
+		t.Fatal("LoadBytes() error = nil, want validation error")
+	}
+	if !strings.Contains(err.Error(), "notify_and_stop") || !strings.Contains(err.Error(), "ignore_limit") {
+		t.Fatalf("LoadBytes() error = %v, want conflict message", err)
+	}
+}
+
 func TestLoadBytesRejectsUnknownNotificationEvent(t *testing.T) {
 	_, err := config.LoadBytes([]byte(`
 server:

internal/config/write.go

Lines changed: 2 additions & 0 deletions
@@ -52,6 +52,7 @@ func renderGlobalSettings(original string, cfg Config) string {

 	builder.WriteString("\ntraffic:\n")
 	writeKeyValue(&builder, "warning_percent", formatFloat(cfg.Traffic.WarningPercent))
+	writeKeyValue(&builder, "exceeded_action", quote(cfg.Traffic.ExceededAction))

 	builder.WriteString("\nlogging:\n")
 	writeKeyValue(&builder, "level", quote(cfg.Logging.Level))
@@ -63,6 +64,7 @@ func renderGlobalSettings(original string, cfg Config) string {
 	writeKeyValue(&builder, "agentid", rawOrString(original, "notification", "agentid", "${EC_WECHAT_AGENTID}"))
 	writeKeyValue(&builder, "touser", rawOrString(original, "notification", "touser", quote("${EC_WECHAT_TOUSER}")))
 	writeKeyValue(&builder, "notify_events", renderList(cfg.Notification.NotifyEvents))
+	writeKeyValue(&builder, "manual_required_notify_interval", quote(formatDuration(cfg.Notification.ManualRequiredNotifyInterval.String())))

 	builder.WriteString("\nkeep_alive:\n")
 	writeKeyValue(&builder, "enabled", strconv.FormatBool(cfg.KeepAlive.Enabled))
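
Review note: `time.Duration.String()` renders 30 minutes as `30m0s`, while the write test in this commit expects `manual_required_notify_interval: "30m"`, so `formatDuration` presumably trims the trailing zero units. Its body is not shown in the diff. The sketch below is one way to do that trimming; `compactDuration` is a hypothetical name, and unlike the `formatDuration` call above it takes a `time.Duration` rather than a string.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// compactDuration renders d like time.Duration.String but drops trailing
// zero-valued units, so 30m0s becomes 30m and 1h0m0s becomes 1h.
func compactDuration(d time.Duration) string {
	s := d.String()
	if strings.HasSuffix(s, "m0s") {
		s = strings.TrimSuffix(s, "0s")
	}
	if strings.HasSuffix(s, "h0m") {
		s = strings.TrimSuffix(s, "0m")
	}
	return s
}

func main() {
	fmt.Println(compactDuration(30 * time.Minute)) // 30m
	fmt.Println(compactDuration(time.Hour))        // 1h
	fmt.Println(compactDuration(90 * time.Minute)) // 1h30m
	fmt.Println(compactDuration(45 * time.Second)) // 45s
}
```

Matching on the `m0s` and `h0m` suffixes, rather than on a bare `0s` or `0m`, avoids damaging values such as `10m` or `0s`, and the trimmed result can still be read back with `time.ParseDuration`.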

internal/config/write_test.go

Lines changed: 8 additions & 1 deletion
@@ -29,6 +29,7 @@ discovery:

 traffic:
   warning_percent: 95
+  exceeded_action: "notify_only"

 logging:
   level: "info"
@@ -40,6 +41,7 @@ notification:
   agentid: 1000002
   touser: ["user-a", "user-b"]
   notify_events: ["auto_start", "error"]
+  manual_required_notify_interval: "1h"

 keep_alive:
   enabled: true
@@ -67,9 +69,11 @@ accounts:
 	}
 	cfg.Server.RefreshInterval = 2 * time.Minute
 	cfg.Traffic.WarningPercent = 88
+	cfg.Traffic.ExceededAction = "notify_and_stop"
 	cfg.Logging.Level = "debug"
 	cfg.Notification.Enabled = true
-	cfg.Notification.NotifyEvents = []string{"traffic_exceeded", "error"}
+	cfg.Notification.NotifyEvents = []string{"traffic_exceeded", "traffic_stop", "error"}
+	cfg.Notification.ManualRequiredNotifyInterval = 30 * time.Minute
 	cfg.KeepAlive.StopMode = "KeepCharging"
 	cfg.KeepAlive.IncludeInstanceIDs = []string{"i-1", "i-2"}

@@ -90,7 +94,10 @@ accounts:
 		`access_key_secret: "${EC_ACCOUNT_CN1_ACCESS_KEY_SECRET}"`,
 		`refresh_interval: "2m"`,
 		`warning_percent: 88`,
+		`exceeded_action: "notify_and_stop"`,
 		`level: "debug"`,
+		`notify_events: ["traffic_exceeded", "traffic_stop", "error"]`,
+		`manual_required_notify_interval: "30m"`,
 		`stop_mode: "KeepCharging"`,
 		`include_instance_ids: ["i-1", "i-2"]`,
 	} {
