feat(group-a): EvaluationState extension + Hat-BMAD mappings (#228, #229) by ComBba · Pull Request #257 · Two-Weeks-Team/somm.dev

ComBba · 2026-02-09T07:08:44Z

Summary

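Epic #223, Group A Wave 2. Extends EvaluationState with the fields needed for full_techniques evaluation, and builds the mapping scheme between the six thinking hats (Hat), the BMAD items, and the 75 techniques.

Base: feat/group-a-foundation (PR #256). The base will be switched to master once Wave 1 is merged.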

Changes

#228: EvaluationState extension + ItemScore enhancement + Quality Gate

backend/app/models/graph.py

ItemScore.score constraint relaxed: ge=0, le=100 → ge=0 (BMAD items have per-item maximums of 4 to 7 points)
8 Optional fields added: item_name, max_score, status, hat_used, evidence, rationale, confidence, unevaluated_reason

backend/app/graph/state.py

14 new fields: evaluation_mode, quality_gate, category_scores, hat_contributions, total_score, strengths, improvements, and others
merge_hat_contributions reducer added

backend/app/constants.py (new)

Quality Gate logic: PASS(≥70) / CONCERNS(≥50) / FAIL(<50) / INCOMPLETE(coverage<0.5)
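The thresholds above can be read as a small decision function. The following is a minimal sketch, not the code in constants.py: the constant names and the function name get_quality_gate appear in the review changelog, but the signature, the string return values, and the rule that low coverage takes precedence over the score are assumptions made for illustration.

```python
# Sketch only. Threshold names follow the changelog; everything else is assumed.
PASS_THRESHOLD = 70
CONCERNS_THRESHOLD = 50
COVERAGE_THRESHOLD = 0.5


def get_quality_gate(total_score: float, coverage: float) -> str:
    """Map a total score and a coverage ratio to a quality gate label."""
    # Assumption: insufficient coverage is checked first, so a high score
    # computed from too few evaluated items is still reported as INCOMPLETE.
    if coverage < COVERAGE_THRESHOLD:
        return "INCOMPLETE"
    if total_score >= PASS_THRESHOLD:
        return "PASS"
    if total_score >= CONCERNS_THRESHOLD:
        return "CONCERNS"
    return "FAIL"
```

Under these assumptions, the boundary values listed in the Testing table (49/50/69/70 and coverage 0.49/0.5) each fall on a different side of a threshold.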

#229 — Hat→BMAD + Technique→Hat/Item/Priority mappings

backend/app/criteria/hat_mappings.py (new)

Hat enum (white, red, black, yellow, green, blue)
HAT_TO_ITEMS / ITEM_TO_HATS bidirectional mappings
validate_coverage(): verifies that all 17 BMAD items are covered

backend/app/criteria/technique_mappings.py (new)

75 techniques → BMAD item mapping (TECHNIQUE_TO_ITEMS)
get_techniques_for_mode(): per-mode filtering
- full_techniques: all 75
- six_sommeliers: P0+P1
- grand_tasting: P0 only
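To make the bidirectional mapping concrete, here is a rough sketch of how a forward map, its derived reverse map, and a coverage check could fit together. The names Hat, HAT_TO_ITEMS, ITEM_TO_HATS, and validate_coverage come from this PR; the item list, the hat-to-item assignments, and the return type of validate_coverage are hypothetical and only five item ids are used instead of the real 17.

```python
from enum import Enum
from typing import Dict, List, Sequence


class Hat(str, Enum):
    WHITE = "white"
    RED = "red"
    BLACK = "black"
    YELLOW = "yellow"
    GREEN = "green"
    BLUE = "blue"


# Hypothetical subset of BMAD item ids, for illustration only.
BMAD_ITEMS: List[str] = ["A1", "A2", "B1", "C1", "D1"]

# Hypothetical assignments; the real mapping lives in hat_mappings.py.
HAT_TO_ITEMS: Dict[Hat, List[str]] = {
    Hat.WHITE: ["A1", "B1"],
    Hat.RED: ["D1"],
    Hat.BLACK: ["A2", "C1"],
    Hat.YELLOW: ["A1"],
    Hat.GREEN: ["C1"],
    Hat.BLUE: ["D1", "A2"],
}

# The reverse lookup is derived from the forward map so the two cannot drift.
ITEM_TO_HATS: Dict[str, List[Hat]] = {}
for hat, items in HAT_TO_ITEMS.items():
    for item in items:
        ITEM_TO_HATS.setdefault(item, []).append(hat)


def validate_coverage(all_items: Sequence[str] = BMAD_ITEMS) -> List[str]:
    """Return the items no hat covers; an empty list means full coverage."""
    return sorted(item for item in all_items if item not in ITEM_TO_HATS)
```

Returning the uncovered items rather than a bare boolean is a design choice of this sketch: a failing check can then report which items are missing.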
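The mode rules above amount to a lookup from mode to the set of allowed priorities. The sketch below is one table-driven way to express that, under stated assumptions: the technique ids are invented, the Priority values 0 to 2 are assumed, and the real function filters ids from the technique registry rather than from the priority table itself.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Priority(Enum):
    P0 = 0
    P1 = 1
    P2 = 2


# Hypothetical technique ids and priorities, for illustration only.
TECHNIQUE_PRIORITY: Dict[str, Priority] = {
    "five-whys": Priority.P0,
    "pre-mortem": Priority.P0,
    "swot": Priority.P1,
    "scamper": Priority.P2,
}

# Each mode allows a superset of the priorities of the mode before it.
MODE_PRIORITIES: Dict[str, FrozenSet[Priority]] = {
    "grand_tasting": frozenset({Priority.P0}),
    "six_sommeliers": frozenset({Priority.P0, Priority.P1}),
    "full_techniques": frozenset(Priority),
}


def get_techniques_for_mode(mode: str) -> List[str]:
    """Return the sorted technique ids whose priority the mode allows."""
    # Unknown modes fall back to every priority, matching the behavior the
    # reviewed diff shows for invalid modes.
    allowed = MODE_PRIORITIES.get(mode, frozenset(Priority))
    return sorted(t for t, p in TECHNIQUE_PRIORITY.items() if p in allowed)
```

Because the allowed sets are nested, the ordering grand_tasting < six_sommeliers < full_techniques in the checklist follows directly from the table.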

Testing

165 passed, 0 failed

Test file	Tests	Coverage
test_quality_gate.py	10	Boundary-value tests (49/50/69/70, coverage 0.49/0.5)
test_item_score_model.py	8	Backward compatibility, BMAD scale, validation
test_hat_mappings.py	10	Full coverage of all 17 items, bidirectional mapping consistency
test_technique_mappings.py	11 (added)	Per-mode filtering, technique→item mapping

Checklist

All 17 BMAD items are mapped to at least one Hat
full_techniques mode: returns 75 techniques
Mode ordering: grand_tasting < six_sommeliers < full_techniques
ItemScore remains backward compatible (code that uses only the existing fields is unaffected)
All 165 tests pass

Closes #228, Closes #229

#226, #227)
- #224: Remove input_source field, add fairthon_source for YAML preservation
- Remove filter_techniques() and determine_available_inputs() from loader
- Add get_all_techniques() and has_readme_content() to registry
- #225: Replace 75 Korean YAML technique definitions with English Fairthon templates
- Map Fairthon categories to wine-themed folder names
- Remove nameKo field from schema and all YAMLs
- #226: Unify EvaluationMode enum as single source of truth
- Remove duplicate EvaluationMode from models/graph.py
- Replace SIX_HATS with SIX_SOMMELIERS everywhere
- Add FULL_TECHNIQUES mode
- Update frontend TypeScript types
- #227: Define BMAD 17-item evaluation canon (A1-D4, 100 points)
- Create backend/app/criteria/ module with bmad_items.py
- A=25pts, B=25pts, C=30pts, D=20pts
- Fix pre-existing auth test failures (7 tests)
- Update _check_ownership tests for 3-param signature
- Fix Secure cookie assertion for localhost environment

Closes #224, Closes #225, Closes #226, Closes #227

)
- #228: Extend EvaluationState with full_techniques fields
- Add 14 new state fields (evaluation_mode, quality_gate, category_scores, etc.)
- Extend ItemScore: relax score constraint (ge=0), add 8 optional fields
- Create constants.py with quality gate logic (PASS/CONCERNS/FAIL/INCOMPLETE)
- Add merge_hat_contributions reducer
- #229: Create hat and technique mapping modules
- hat_mappings.py: Hat enum, Hat→BMAD items, reverse ITEM→Hats lookup
- technique_mappings.py: 75 technique→BMAD item mappings
- Mode filtering: full_techniques(75) > six_sommeliers(P0+P1) > grand_tasting(P0)
- All 17 BMAD items covered by at least one hat

Closes #228, Closes #229

vercel · 2026-02-09T07:08:50Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
somm-dev	Ready	Preview, Comment	Feb 9, 2026 7:09am

coderabbitai · 2026-02-09T07:08:52Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

gemini-code-assist · 2026-02-09T07:09:08Z

Summary of Changes

Hello @ComBba, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

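This PR extends the evaluation system's core data models and logic to lay the groundwork for 'full_techniques' evaluation. It adds fields to the EvaluationState and ItemScore models so they can carry detailed evaluation information, and builds the mapping scheme between the six hats, the 75 techniques, and the BMAD items. It also introduces a quality gate on evaluation results, with the goal of making evaluations more reliable and useful.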

Highlights

EvaluationState model extension: 14 new fields required for full_techniques evaluation (e.g. evaluation_mode, quality_gate, hat_contributions, total_score) were added to the EvaluationState TypedDict.
ItemScore model improvement: the constraint on ItemScore's score field was relaxed to ge=0, and 8 detailed evaluation fields such as item_name, max_score, hat_used, and evidence were added so that each score can carry richer evaluation information.
Quality gate logic: a new get_quality_gate function defining PASS(≥70), CONCERNS(≥50), FAIL(<50), and INCOMPLETE(coverage<0.5) was added in backend/app/constants.py.
Six-hat (Hat) mapping system: backend/app/criteria/hat_mappings.py implements the six-hat Enum and the bidirectional mappings to BMAD items (HAT_TO_ITEMS, ITEM_TO_HATS), and includes a validate_coverage function that verifies coverage of all BMAD items.
Technique mapping and filtering: backend/app/criteria/technique_mappings.py defines the mapping between the 75 techniques and BMAD items (TECHNIQUE_TO_ITEMS) and adds filtering of techniques by evaluation mode (e.g. full_techniques, six_sommeliers, grand_tasting).
hat_contributions merge logic: a merge_hat_contributions reducer was added to backend/app/graph/state.py to merge hat_contributions dictionaries, supporting merges of nested data structures.

Changelog

backend/app/constants.py
- Added the quality gate thresholds (PASS_THRESHOLD, CONCERNS_THRESHOLD, COVERAGE_THRESHOLD) and the get_quality_gate function.
backend/app/criteria/hat_mappings.py
- Added the six-hat (Hat) Enum, the bidirectional mappings HAT_TO_ITEMS (hat→BMAD items) and ITEM_TO_HATS (BMAD item→hats), and related utility functions (get_items_for_hat, get_hats_for_item, get_all_hats, validate_coverage).
backend/app/criteria/technique_mappings.py
- Added the mapping between the 75 techniques and BMAD items (TECHNIQUE_TO_ITEMS), the get_techniques_for_mode function that filters techniques by evaluation mode, the get_techniques_for_item function that returns the techniques for a given BMAD item, and related helpers.
backend/app/graph/state.py
- Added the merge_hat_contributions function, which merges hat_contributions dictionaries.
- Added 14 new fields to the EvaluationState TypedDict, including evaluation_mode, quality_gate, hat_contributions, total_score, category_scores, strengths, and improvements.
backend/app/models/graph.py
- Relaxed the constraint on the ItemScore.score field from ge=0, le=100 to ge=0 (the upper bound was removed).
- Added 8 optional fields to the ItemScore model: item_name, max_score, status, hat_used, evidence, rationale, confidence, and unevaluated_reason.
backend/tests/test_hat_mappings.py
- Added test cases that verify the new hat_mappings module (mapping consistency, coverage, ordering, etc.).
backend/tests/test_item_score_model.py
- Added test cases that verify the ItemScore changes (relaxed score constraint, new fields) and backward compatibility.
backend/tests/test_quality_gate.py
- Added test cases that verify the boundary values and correctness of the new quality gate logic.
backend/tests/test_technique_mappings.py
- Removed the import of InvalidMappingError from app.techniques.mappings.
- Added test cases that verify the new app.criteria.technique_mappings module (per-mode filtering, technique-to-item mapping, priorities).

Activity

After creating the PR, ComBba provided a detailed summary of the changes along with test results and a checklist.
ComBba reported that all 165 tests passed.
The PR closes issues #228 (feat(state): EvaluationState extension + ItemScore enhancement + Quality Gate) and #229 (feat(criteria): Hat→BMAD + Technique→Hat/Item/Priority mappings).

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This PR significantly extends EvaluationState and introduces a mapping system for full_techniques evaluation, incorporating new hat_mappings, technique_mappings modules, and quality_gate logic. However, two security concerns were identified: the inclusion of sensitive API keys in the persistent graph state, posing a risk of credential exposure, and a potential crash in the state merging logic when handling structured data from agents. It is recommended to handle secrets using non-persistent configuration and to harden the merging logic against unhashable types. Additionally, specific review comments provide suggestions for improving code readability and maintainability through Pythonic style.

gemini-code-assist · 2026-02-09T07:13:06Z

+    evaluation_mode: NotRequired[str]
+    github_url: NotRequired[Optional[str]]
+    github_analysis: NotRequired[Optional[dict]]
+    user_api_key: NotRequired[Optional[str]]


The user_api_key field is added to the EvaluationState TypedDict, which is used as the persistent state for the LangGraph evaluation pipeline. Storing sensitive API keys in a persistent state store (managed by a checkpointer as seen in backend/app/graph/graph.py) is insecure as these stores are typically not encrypted and the state may be exposed via logs, debugging tools, or API responses. This violates the principle of secure data handling for sensitive credentials.
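One way to act on this, sketched below under assumptions, is to separate sensitive keys from the state before anything is handed to a persistent store, and to pass the secrets to each run separately. The helper name split_secrets and the SENSITIVE_KEYS set are hypothetical and not part of this PR; how the secrets are then supplied to the pipeline depends on the framework and is not shown.

```python
from typing import Any, Dict, Tuple

# Hypothetical list of state keys that should never reach a checkpointer.
SENSITIVE_KEYS = frozenset({"user_api_key"})


def split_secrets(state: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a state dict into a persistable part and a secrets part."""
    persistable = {k: v for k, v in state.items() if k not in SENSITIVE_KEYS}
    secrets = {k: v for k, v in state.items() if k in SENSITIVE_KEYS}
    return persistable, secrets


# Usage: only `persistable` is stored; `secrets` stays in memory for the run.
persistable, secrets = split_secrets(
    {"evaluation_mode": "full_techniques", "user_api_key": "example-key"}
)
```

The input dict is left unmodified, so a caller can still use the original state within the current request.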

gemini-code-assist · 2026-02-09T07:13:06Z

+                merged = dict(existing)
+                for k, v in data.items():
+                    if isinstance(v, list) and isinstance(merged.get(k), list):
+                        merged[k] = sorted(set(merged[k]) | set(v))


The use of set() and sorted() on the merged list will cause a TypeError and crash the evaluation pipeline if the list contains unhashable or non-comparable types, such as dictionaries. Since hat_contributions is intended to aggregate data from various agents (sommeliers), it is highly likely to contain structured data (e.g., lists of improvement objects). An attacker could potentially trigger a Denial of Service (DoS) for an evaluation by influencing the LLM output to include such structured data in a merged field.

Suggested change

-                        merged[k] = sorted(set(merged[k]) | set(v))
+                        combined = merged[k] + v
+                        unique = []
+                        for item in combined:
+                            if item not in unique:
+                                unique.append(item)
+                        try:
+                            merged[k] = sorted(unique)
+                        except TypeError:
+                            merged[k] = unique
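For context, the suggestion can be exercised in a self-contained reducer. This is a sketch under assumptions, not the function in state.py: the outer shape (hat name mapped to a dict of fields), the helper name _dedupe, and the rule that non-list values are overwritten by the incoming value are all guesses made for illustration.

```python
from typing import Any, Dict, List, Optional


def _dedupe(items: List[Any]) -> List[Any]:
    """Drop duplicates without hashing, then sort if the items are comparable."""
    unique: List[Any] = []
    for item in items:
        if item not in unique:  # equality check works for dicts and lists too
            unique.append(item)
    try:
        return sorted(unique)
    except TypeError:
        # Non-comparable items (e.g. dicts) keep first-seen order instead.
        return unique


def merge_hat_contributions(
    current: Optional[Dict[str, Any]], incoming: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge per-hat contribution dicts; list fields are unioned."""
    merged = {hat: dict(data) for hat, data in (current or {}).items()}
    for hat, data in (incoming or {}).items():
        existing = merged.setdefault(hat, {})
        for k, v in data.items():
            if isinstance(v, list) and isinstance(existing.get(k), list):
                existing[k] = _dedupe(existing[k] + v)
            else:
                # Assumption: scalar or new fields take the incoming value.
                existing[k] = v
    return merged
```

The membership test makes deduplication quadratic in the list length, which should be acceptable for short contribution lists but is worth revisiting if the lists grow large.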

gemini-code-assist · 2026-02-09T07:13:06Z

+for hat, items in HAT_TO_ITEMS.items():
+    for item in items:
+        if item not in ITEM_TO_HATS:
+            ITEM_TO_HATS[item] = []
+        ITEM_TO_HATS[item].append(hat)


The logic that builds ITEM_TO_HATS can be made more concise with dict.setdefault(). This removes the if statement that checks whether item is already in the dictionary, which makes the code more Pythonic.

for hat, items in HAT_TO_ITEMS.items():
    for item in items:
        ITEM_TO_HATS.setdefault(item, []).append(hat)

gemini-code-assist · 2026-02-09T07:13:06Z

+    result = []
+    for tech in all_techs:
+        if hat in tech.applicable_hats:
+            result.append(tech.id)
+    return sorted(result)


Instead of building the result list with a for loop, a list comprehension makes the code more concise and Pythonic.

Suggested change

-    result = []
-    for tech in all_techs:
-        if hat in tech.applicable_hats:
-            result.append(tech.id)
-    return sorted(result)
+    return sorted([tech.id for tech in all_techs if hat in tech.applicable_hats])

gemini-code-assist · 2026-02-09T07:13:06Z

+    if mode == "full_techniques":
+        return sorted(all_ids)
+
+    if mode == "grand_tasting":
+        return sorted(
+            [
+                t_id
+                for t_id in all_ids
+                if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) == Priority.P0
+            ]
+        )
+
+    if mode == "six_sommeliers":
+        return sorted(
+            [
+                t_id
+                for t_id in all_ids
+                if TECHNIQUE_PRIORITY.get(t_id, Priority.P2)
+                in (Priority.P0, Priority.P1)
+            ]
+        )
+
+    return sorted(all_ids)


The logic in get_techniques_for_mode is somewhat repetitive. Because full_techniques mode and any invalid mode both return all techniques, it would be better to refactor into an if/elif/else structure that makes the logic clearer and calls sorted() only once at the end.

    if mode == "grand_tasting":
        filtered_ids = [
            t_id
            for t_id in all_ids
            if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) == Priority.P0
        ]
    elif mode == "six_sommeliers":
        filtered_ids = [
            t_id
            for t_id in all_ids
            if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) in (Priority.P0, Priority.P1)
        ]
    else:  # "full_techniques" or any other invalid mode
        filtered_ids = all_ids
    return sorted(filtered_ids)

gemini-code-assist · 2026-02-09T07:13:07Z

+    best = None
+    best_priority = Priority.P2.value + 1
+    for t_id in tech_ids:
+        p = TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value
+        if p < best_priority:
+            best_priority = p
+            best = t_id
+    return best


Instead of using a for loop to find the highest-priority technique, the built-in min() function with a key argument makes the code far more concise and Pythonic.

return min(tech_ids, key=lambda t_id: TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value)
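One detail worth noting: the original loop returns None when tech_ids is empty, whereas a bare min() raises ValueError on an empty iterable. The sketch below keeps the None behavior by passing default=None. The function name best_technique, the technique ids, and the Priority values are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, Optional


class Priority(Enum):
    P0 = 0
    P1 = 1
    P2 = 2


# Hypothetical priorities, for illustration only.
TECHNIQUE_PRIORITY: Dict[str, Priority] = {
    "five-whys": Priority.P0,
    "swot": Priority.P1,
}


def best_technique(tech_ids: Iterable[str]) -> Optional[str]:
    """Return the id with the lowest priority value, or None if there are none."""
    return min(
        tech_ids,
        # Ids missing from the table are treated as P2, as in the reviewed diff.
        key=lambda t_id: TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value,
        default=None,
    )
```

On ties, min() returns the first minimal element, which matches the strict `<` comparison in the loop being replaced.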

gemini-code-assist · 2026-02-09T07:13:07Z

+    current: Optional[Dict[str, object]], incoming: Optional[Dict[str, object]]
+) -> Dict[str, object]:


Using Any rather than object as the type hint is more in line with Python typing conventions and states more clearly that any type is allowed. I recommend importing Any from typing and using it.

Suggested change

-    current: Optional[Dict[str, object]], incoming: Optional[Dict[str, object]]
-) -> Dict[str, object]:
+    current: Optional[Dict[str, Any]], incoming: Optional[Dict[str, Any]]
+) -> Dict[str, Any]:

gemini-code-assist · 2026-02-09T07:13:07Z

+    def test_full_techniques_returns_75(self):
+        """get_techniques_for_mode('full_techniques') returns 75 techniques."""
+        from app.criteria.technique_mappings import get_techniques_for_mode
+
+        techs = get_techniques_for_mode("full_techniques")
+        assert len(techs) == 75, f"Expected 75 techniques, got {len(techs)}"


The total technique count of 75 is hard-coded in the test, so the test will fail and need manual edits whenever techniques are added or removed. Getting the total count dynamically via get_registry().count() would make the test more maintainable and robust. The other test cases in this file (test_mode_count_ordering, test_get_techniques_for_mode_invalid_returns_all) also hard-code 75, so it would be good to update them as well.

Suggested change

-    def test_full_techniques_returns_75(self):
-        """get_techniques_for_mode('full_techniques') returns 75 techniques."""
-        from app.criteria.technique_mappings import get_techniques_for_mode
-
-        techs = get_techniques_for_mode("full_techniques")
-        assert len(techs) == 75, f"Expected 75 techniques, got {len(techs)}"
+    def test_full_techniques_returns_all_techniques(self):
+        """get_techniques_for_mode('full_techniques') returns all techniques from the registry."""
+        from app.criteria.technique_mappings import get_techniques_for_mode
+        from app.techniques.registry import get_registry
+
+        techs = get_techniques_for_mode("full_techniques")
+        expected_count = get_registry().count()
+        assert len(techs) == expected_count, f"Expected {expected_count} techniques, got {len(techs)}"

ComBba added 2 commits February 9, 2026 15:56

ComBba self-assigned this Feb 9, 2026

vercel Bot deployed to Preview – somm-dev February 9, 2026 07:09 View deployment

ComBba force-pushed the feat/group-a-foundation branch from bbc7143 to 1be88eb Compare February 9, 2026 07:10

gemini-code-assist Bot reviewed Feb 9, 2026

View reviewed changes

ComBba mentioned this pull request Feb 9, 2026

feat(group-b): Infrastructure layer — BaseTechnique, YAMLTechnique, TechniqueRouter, BYOK, Quota (#230, #231, #232, #233, #238, #251) #258

Merged

ComBba deleted the branch feat/group-a-foundation February 9, 2026 07:46

ComBba closed this Feb 9, 2026

ComBba deleted the feat/group-a-wave2 branch February 9, 2026 09:29
