feat(group-a): EvaluationState extension + Hat-BMAD mappings (#228, #229)#257

Closed
ComBba wants to merge 2 commits into feat/group-a-foundation from feat/group-a-wave2

Conversation

@ComBba ComBba commented Feb 9, 2026

Contributor

Summary

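Epic #223: Group A Wave 2. Extends EvaluationState with the fields required for full_techniques evaluation and builds the mapping scheme between the six thinking hats (Hat), the BMAD items, and the 75 techniques.

Base: feat/group-a-foundation (PR #256). The base will be switched to master after Wave 1 is merged.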

Changes

#228: EvaluationState extension + stronger ItemScore + Quality Gate

backend/app/models/graph.py

  • Relaxed the ItemScore.score constraint: ge=0, le=100 → ge=0 (each BMAD item has a maximum of 4 to 7 points)
  • Added 8 Optional fields: item_name, max_score, status, hat_used, evidence, rationale, confidence, unevaluated_reason

backend/app/graph/state.py

  • 14 new fields, including evaluation_mode, quality_gate, category_scores, hat_contributions, total_score, strengths, improvements
  • Added the merge_hat_contributions reducer

backend/app/constants.py (new)

  • Quality Gate logic: PASS (≥70) / CONCERNS (≥50) / FAIL (<50) / INCOMPLETE (coverage < 0.5)
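
For reviewers who want the thresholds in executable form, here is a minimal sketch of the gate. The constant and function names match those mentioned elsewhere in this PR (PASS_THRESHOLD, CONCERNS_THRESHOLD, COVERAGE_THRESHOLD, get_quality_gate), but the signature, the string return values, and the rule that low coverage overrides the score are assumptions, not the code in constants.py.

```python
# Sketch only: the real signature and return type of get_quality_gate
# are not shown in this PR description.
PASS_THRESHOLD = 70
CONCERNS_THRESHOLD = 50
COVERAGE_THRESHOLD = 0.5


def get_quality_gate(total_score: float, coverage: float) -> str:
    """Return a gate label for a 0-100 total score and a 0-1 coverage ratio."""
    # Assumption: insufficient coverage is checked first and overrides the score.
    if coverage < COVERAGE_THRESHOLD:
        return "INCOMPLETE"
    if total_score >= PASS_THRESHOLD:
        return "PASS"
    if total_score >= CONCERNS_THRESHOLD:
        return "CONCERNS"
    return "FAIL"
```

With these assumptions, the boundary values listed under Testing (49/50/69/70 and coverage 0.49/0.5) each fall on a distinct side of a threshold.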

#229: Hat→BMAD + Technique→Hat/Item/Priority mappings

backend/app/criteria/hat_mappings.py (new)

  • Hat enum (white, red, black, yellow, green, blue)
  • HAT_TO_ITEMS / ITEM_TO_HATS bidirectional mappings
  • validate_coverage(): verifies that all 17 BMAD items are covered

backend/app/criteria/technique_mappings.py (new)

  • 75 techniques → BMAD items mapping (TECHNIQUE_TO_ITEMS)
  • get_techniques_for_mode(): per-mode filtering
    • full_techniques: all 75
    • six_sommeliers: P0+P1
    • grand_tasting: P0 only

Testing

165 passed, 0 failed
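
| Test file | Tests | What it covers |
| --- | --- | --- |
| test_quality_gate.py | 10 | Boundary values (49/50/69/70, coverage 0.49/0.5) |
| test_item_score_model.py | 8 | Backward compatibility, BMAD scale, validation |
| test_hat_mappings.py | 10 | Full coverage of the 17 items, bidirectional mapping consistency |
| test_technique_mappings.py | 11 (added) | Per-mode filtering, technique→item mapping |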

Checklist

  • All 17 BMAD items are mapped to at least one Hat
  • full_techniques mode: returns 75 techniques
  • Mode ordering: grand_tasting < six_sommeliers < full_techniques
  • ItemScore stays backward compatible (code that uses only the existing fields is unaffected)
  • All 165 tests pass

Closes #228, Closes #229

#226, #227)

- #224: Remove input_source field, add fairthon_source for YAML preservation
  - Remove filter_techniques() and determine_available_inputs() from loader
  - Add get_all_techniques() and has_readme_content() to registry

- #225: Replace 75 Korean YAML technique definitions with English Fairthon templates
  - Map Fairthon categories to wine-themed folder names
  - Remove nameKo field from schema and all YAMLs

- #226: Unify EvaluationMode enum as single source of truth
  - Remove duplicate EvaluationMode from models/graph.py
  - Replace SIX_HATS with SIX_SOMMELIERS everywhere
  - Add FULL_TECHNIQUES mode
  - Update frontend TypeScript types

- #227: Define BMAD 17-item evaluation canon (A1-D4, 100 points)
  - Create backend/app/criteria/ module with bmad_items.py
  - A=25pts, B=25pts, C=30pts, D=20pts

- Fix pre-existing auth test failures (7 tests)
  - Update _check_ownership tests for 3-param signature
  - Fix Secure cookie assertion for localhost environment

Closes #224, Closes #225, Closes #226, Closes #227
)

- #228: Extend EvaluationState with full_techniques fields
  - Add 14 new state fields (evaluation_mode, quality_gate, category_scores, etc.)
  - Extend ItemScore: relax score constraint (ge=0), add 8 optional fields
  - Create constants.py with quality gate logic (PASS/CONCERNS/FAIL/INCOMPLETE)
  - Add merge_hat_contributions reducer

- #229: Create hat and technique mapping modules
  - hat_mappings.py: Hat enum, Hat→BMAD items, reverse ITEM→Hats lookup
  - technique_mappings.py: 75 technique→BMAD item mappings
  - Mode filtering: full_techniques(75) > six_sommeliers(P0+P1) > grand_tasting(P0)
  - All 17 BMAD items covered by at least one hat

Closes #228, Closes #229
@ComBba ComBba self-assigned this Feb 9, 2026

@gemini-code-assist

Summary of Changes

Hello @ComBba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

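This PR extends the core data models and logic of the evaluation system to lay the groundwork for 'full_techniques' evaluation. It adds fields to the EvaluationState and ItemScore models so they can carry detailed evaluation information, and builds the mapping scheme between the six thinking hats, the 75 techniques, and the BMAD items. It also introduces a quality gate on evaluation results, with the goal of making evaluations more reliable and useful.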

Highlights

  • EvaluationState model extension: 14 new fields required for full_techniques evaluation (e.g., evaluation_mode, quality_gate, hat_contributions, total_score) were added to the EvaluationState TypedDict.
  • ItemScore model improvements: the constraint on the score field was relaxed to ge=0, and 8 detailed evaluation fields such as item_name, max_score, hat_used, and evidence were added so that each score carries richer evaluation information.
  • Quality gate logic: a new get_quality_gate function in backend/app/constants.py defines PASS (≥70), CONCERNS (≥50), FAIL (<50), and INCOMPLETE (coverage < 0.5).
  • Six-hat (Hat) mapping system: backend/app/criteria/hat_mappings.py implements the six-hat Enum and the bidirectional mappings to BMAD items (HAT_TO_ITEMS, ITEM_TO_HATS), and includes a validate_coverage function that verifies coverage of all BMAD items.
  • Technique mapping and filtering: backend/app/criteria/technique_mappings.py defines the mapping between the 75 techniques and the BMAD items (TECHNIQUE_TO_ITEMS) and adds filtering of techniques by evaluation mode (e.g., full_techniques, six_sommeliers, grand_tasting).
  • hat_contributions merge logic: a merge_hat_contributions reducer was added to backend/app/graph/state.py to merge hat_contributions dictionaries, including their nested list values.
Changelog
  • backend/app/constants.py
    • Added the quality gate thresholds (PASS_THRESHOLD, CONCERNS_THRESHOLD, COVERAGE_THRESHOLD) and the get_quality_gate function.
  • backend/app/criteria/hat_mappings.py
    • Added the six-hat (Hat) Enum, the bidirectional mappings HAT_TO_ITEMS (hat → BMAD items) and ITEM_TO_HATS (BMAD item → hats), and the related utility functions (get_items_for_hat, get_hats_for_item, get_all_hats, validate_coverage).
  • backend/app/criteria/technique_mappings.py
    • Added the mapping between the 75 techniques and the BMAD items (TECHNIQUE_TO_ITEMS), the get_techniques_for_mode function that filters techniques by evaluation mode, the get_techniques_for_item function that returns the techniques for a given BMAD item, and related helpers.
  • backend/app/graph/state.py
    • Added the merge_hat_contributions function, which merges hat_contributions dictionaries.
    • Added 14 new fields to the EvaluationState TypedDict, including evaluation_mode, quality_gate, hat_contributions, total_score, category_scores, strengths, and improvements.
  • backend/app/models/graph.py
    • The constraint on the ItemScore.score field was relaxed from ge=0, le=100 to ge=0 (the upper bound was removed).
    • Added 8 optional fields to the ItemScore model: item_name, max_score, status, hat_used, evidence, rationale, confidence, and unevaluated_reason.
  • backend/tests/test_hat_mappings.py
    • Added test cases that verify the new hat_mappings module (mapping consistency, coverage, ordering, and so on).
  • backend/tests/test_item_score_model.py
    • Added test cases that verify the ItemScore changes (relaxed score constraint, new fields) and backward compatibility.
  • backend/tests/test_quality_gate.py
    • Added test cases that verify the boundary values and correctness of the new quality gate logic.
  • backend/tests/test_technique_mappings.py
    • Removed the InvalidMappingError import from app.techniques.mappings.
    • Added test cases that verify the new app.criteria.technique_mappings module (per-mode filtering, technique-to-item mapping, priorities).

@ComBba ComBba force-pushed the feat/group-a-foundation branch from bbc7143 to 1be88eb Compare February 9, 2026 07:10

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This PR significantly extends EvaluationState and introduces a mapping system for full_techniques evaluation, incorporating new hat_mappings, technique_mappings modules, and quality_gate logic. However, two security concerns were identified: the inclusion of sensitive API keys in the persistent graph state, posing a risk of credential exposure, and a potential crash in the state merging logic when handling structured data from agents. It is recommended to handle secrets using non-persistent configuration and to harden the merging logic against unhashable types. Additionally, specific review comments provide suggestions for improving code readability and maintainability through Pythonic style.

evaluation_mode: NotRequired[str]
github_url: NotRequired[Optional[str]]
github_analysis: NotRequired[Optional[dict]]
user_api_key: NotRequired[Optional[str]]

security-high high

The user_api_key field is added to the EvaluationState TypedDict, which is used as the persistent state for the LangGraph evaluation pipeline. Storing sensitive API keys in a persistent state store (managed by a checkpointer as seen in backend/app/graph/graph.py) is insecure as these stores are typically not encrypted and the state may be exposed via logs, debugging tools, or API responses. This violates the principle of secure data handling for sensitive credentials.
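
One way to act on this is to split an incoming request into a checkpoint-safe state and a separate per-run secrets mapping before the graph is invoked. The sketch below is plain Python with hypothetical names (SENSITIVE_KEYS, split_secrets). It does not show how the project wires its graph, and whether a given runtime channel is actually excluded from checkpoints has to be verified for the LangGraph version in use.

```python
from typing import Any, Dict, Tuple

# Hypothetical: keys that must never be written to the persistent graph state.
SENSITIVE_KEYS = frozenset({"user_api_key"})


def split_secrets(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate checkpoint-safe fields from secrets.

    Returns (state, secrets). Only `state` should be handed to the
    checkpointed graph; `secrets` is meant for a per-run, non-persisted channel.
    """
    state = {k: v for k, v in payload.items() if k not in SENSITIVE_KEYS}
    secrets = {k: v for k, v in payload.items() if k in SENSITIVE_KEYS}
    return state, secrets


state, secrets = split_secrets(
    {"evaluation_mode": "full_techniques", "user_api_key": "sk-test"}
)
```

Keeping the deny-list in one place also makes it straightforward to assert in a test that no sensitive key appears in EvaluationState.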

merged = dict(existing)
for k, v in data.items():
    if isinstance(v, list) and isinstance(merged.get(k), list):
        merged[k] = sorted(set(merged[k]) | set(v))

security-medium medium

The use of set() and sorted() on the merged list will cause a TypeError and crash the evaluation pipeline if the list contains unhashable or non-comparable types, such as dictionaries. Since hat_contributions is intended to aggregate data from various agents (sommeliers), it is highly likely to contain structured data (e.g., lists of improvement objects). An attacker could potentially trigger a Denial of Service (DoS) for an evaluation by influencing the LLM output to include such structured data in a merged field.

Suggested change
-         merged[k] = sorted(set(merged[k]) | set(v))
+         combined = merged[k] + v
+         unique = []
+         for item in combined:
+             if item not in unique:
+                 unique.append(item)
+         try:
+             merged[k] = sorted(unique)
+         except TypeError:
+             merged[k] = unique
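
For context, here is the suggestion above placed inside a complete, runnable reducer. The parameter names, the None handling, and the "incoming value wins" rule for non-list values are assumptions, since only a fragment of merge_hat_contributions is visible in this review.

```python
from typing import Any, Dict, List, Optional


def _dedupe_preserving_order(items: List[Any]) -> List[Any]:
    """Drop duplicates by equality, so dicts and other unhashables are fine."""
    unique: List[Any] = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique


def merge_hat_contributions(
    current: Optional[Dict[str, Any]], incoming: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    merged = dict(current or {})
    for k, v in (incoming or {}).items():
        if isinstance(v, list) and isinstance(merged.get(k), list):
            unique = _dedupe_preserving_order(merged[k] + v)
            try:
                merged[k] = sorted(unique)
            except TypeError:
                # Non-comparable items (e.g. dicts): keep first-seen order.
                merged[k] = unique
        else:
            # Assumption: for non-list values the incoming value wins.
            merged[k] = v
    return merged
```

The equality-based dedupe is quadratic in the list length, which seems acceptable for small per-hat lists but would be worth revisiting if these lists grow large.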

Comment on lines +33 to +37
for hat, items in HAT_TO_ITEMS.items():
    for item in items:
        if item not in ITEM_TO_HATS:
            ITEM_TO_HATS[item] = []
        ITEM_TO_HATS[item].append(hat)

medium

The logic that builds ITEM_TO_HATS can be made more concise with dict.setdefault(). This removes the if statement that checks whether item is already in the dictionary and makes the code more Pythonic.

for hat, items in HAT_TO_ITEMS.items():
    for item in items:
        ITEM_TO_HATS.setdefault(item, []).append(hat)
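
To see the reverse mapping and a coverage check working together, here is a small self-contained sketch. The Hat member names, the sample HAT_TO_ITEMS assignments, and the uncovered_items helper are hypothetical; the real mapping and the real validate_coverage signature are not shown in this review.

```python
from enum import Enum
from typing import Dict, List


class Hat(Enum):
    WHITE = "white"
    RED = "red"
    BLACK = "black"
    YELLOW = "yellow"
    GREEN = "green"
    BLUE = "blue"


# Hypothetical sample; the actual hat-to-item assignments differ.
HAT_TO_ITEMS: Dict[Hat, List[str]] = {
    Hat.WHITE: ["A1", "A2"],
    Hat.BLACK: ["A2", "B1"],
    Hat.GREEN: ["B1"],
}

# Build the reverse lookup with setdefault, as suggested above.
ITEM_TO_HATS: Dict[str, List[Hat]] = {}
for hat, items in HAT_TO_ITEMS.items():
    for item in items:
        ITEM_TO_HATS.setdefault(item, []).append(hat)


def uncovered_items(all_items: List[str]) -> List[str]:
    """Return the items no hat maps to (an empty list means full coverage)."""
    return [item for item in all_items if item not in ITEM_TO_HATS]
```

Returning the list of uncovered items rather than a bare boolean is a design choice of this sketch: a failing coverage test can then name the missing item directly.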

Comment on lines +113 to +117
    result = []
    for tech in all_techs:
        if hat in tech.applicable_hats:
            result.append(tech.id)
    return sorted(result)

medium

Instead of building the result list with a for loop, a list comprehension makes the code more concise and Pythonic.

Suggested change
-     result = []
-     for tech in all_techs:
-         if hat in tech.applicable_hats:
-             result.append(tech.id)
-     return sorted(result)
+     return sorted([tech.id for tech in all_techs if hat in tech.applicable_hats])

Comment on lines +147 to +169
    if mode == "full_techniques":
        return sorted(all_ids)

    if mode == "grand_tasting":
        return sorted(
            [
                t_id
                for t_id in all_ids
                if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) == Priority.P0
            ]
        )

    if mode == "six_sommeliers":
        return sorted(
            [
                t_id
                for t_id in all_ids
                if TECHNIQUE_PRIORITY.get(t_id, Priority.P2)
                in (Priority.P0, Priority.P1)
            ]
        )

    return sorted(all_ids)

medium

The logic in get_techniques_for_mode is somewhat redundant. Both full_techniques mode and invalid modes return every technique, so it would be better to refactor to an if/elif/else structure that makes the logic clearer and calls sorted() only once at the end.

    if mode == "grand_tasting":
        filtered_ids = [
            t_id
            for t_id in all_ids
            if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) == Priority.P0
        ]
    elif mode == "six_sommeliers":
        filtered_ids = [
            t_id
            for t_id in all_ids
            if TECHNIQUE_PRIORITY.get(t_id, Priority.P2)
            in (Priority.P0, Priority.P1)
        ]
    else:  # "full_techniques" or any other invalid mode
        filtered_ids = all_ids

    return sorted(filtered_ids)
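
As an alternative to the if/elif chain, the same filtering can be written as a lookup table from mode to the lowest priority it admits. This is a different technique from the reviewer's suggestion, not the PR's code. The sketch assumes Priority is an integer-valued enum (IntEnum is used here so members compare directly), uses made-up technique ids, and takes all_ids as a parameter, whereas the real function presumably reads the ids from the registry.

```python
from enum import IntEnum
from typing import Dict, List


class Priority(IntEnum):
    P0 = 0
    P1 = 1
    P2 = 2


# Hypothetical sample data; the real module maps 75 technique ids.
TECHNIQUE_PRIORITY: Dict[str, Priority] = {
    "tech-a": Priority.P0,
    "tech-b": Priority.P1,
    "tech-c": Priority.P2,
}

# Lowest-urgency priority each mode still includes.
MODE_MAX_PRIORITY: Dict[str, Priority] = {
    "grand_tasting": Priority.P0,
    "six_sommeliers": Priority.P1,
    "full_techniques": Priority.P2,
}


def get_techniques_for_mode(mode: str, all_ids: List[str]) -> List[str]:
    # Unknown modes fall back to every technique, as in the code under review.
    ceiling = MODE_MAX_PRIORITY.get(mode, Priority.P2)
    return sorted(
        t_id
        for t_id in all_ids
        if TECHNIQUE_PRIORITY.get(t_id, Priority.P2) <= ceiling
    )
```

With a table, the ordering grand_tasting ⊆ six_sommeliers ⊆ full_techniques holds by construction, because each mode only raises the priority ceiling.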

Comment on lines +184 to +191
    best = None
    best_priority = Priority.P2.value + 1
    for t_id in tech_ids:
        p = TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value
        if p < best_priority:
            best_priority = p
            best = t_id
    return best

medium

Instead of using a for loop to find the highest-priority technique, the built-in min() with a key argument makes the code far more concise and Pythonic.

    return min(tech_ids, key=lambda t_id: TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value)
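
One caveat when adopting this: the original loop returns None for an empty tech_ids, whereas min() raises ValueError on an empty sequence. If the surrounding function does not already return early for an empty list, passing default=None preserves the old behavior. The sketch below uses a hypothetical function name and sample data.

```python
from enum import Enum
from typing import Dict, List, Optional


class Priority(Enum):
    P0 = 0
    P1 = 1
    P2 = 2


# Hypothetical sample data for illustration only.
TECHNIQUE_PRIORITY: Dict[str, Priority] = {
    "tech-a": Priority.P0,
    "tech-b": Priority.P1,
    "tech-c": Priority.P2,
}


def best_technique(tech_ids: List[str]) -> Optional[str]:
    """Return the id with the lowest priority value, or None if the list is empty."""
    return min(
        tech_ids,
        key=lambda t_id: TECHNIQUE_PRIORITY.get(t_id, Priority.P2).value,
        default=None,  # keeps the loop's "empty input -> None" behavior
    )
```

On ties, min() returns the first minimal element, which matches the strict `<` comparison in the original loop.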

Comment on lines +10 to +11
current: Optional[Dict[str, object]], incoming: Optional[Dict[str, object]]
) -> Dict[str, object]:

medium

Using Any rather than object as the type hint is more in line with Python typing conventions and states more clearly that any type is accepted. I recommend importing Any from typing.

Suggested change
-     current: Optional[Dict[str, object]], incoming: Optional[Dict[str, object]]
- ) -> Dict[str, object]:
+     current: Optional[Dict[str, Any]], incoming: Optional[Dict[str, Any]]
+ ) -> Dict[str, Any]:

Comment on lines +175 to +180
    def test_full_techniques_returns_75(self):
        """get_techniques_for_mode('full_techniques') returns 75 techniques."""
        from app.criteria.technique_mappings import get_techniques_for_mode

        techs = get_techniques_for_mode("full_techniques")
        assert len(techs) == 75, f"Expected 75 techniques, got {len(techs)}"

medium

The total technique count, 75, is hard-coded in the test, so the test will fail and need manual updates whenever techniques are added or removed. Fetching the count dynamically with get_registry().count() makes the test more maintainable and robust. Other test cases in this file (test_mode_count_ordering, test_get_techniques_for_mode_invalid_returns_all) also hard-code 75, so it would be good to update them together.

Suggested change
-     def test_full_techniques_returns_75(self):
-         """get_techniques_for_mode('full_techniques') returns 75 techniques."""
-         from app.criteria.technique_mappings import get_techniques_for_mode
-         techs = get_techniques_for_mode("full_techniques")
-         assert len(techs) == 75, f"Expected 75 techniques, got {len(techs)}"
+     def test_full_techniques_returns_all_techniques(self):
+         """get_techniques_for_mode('full_techniques') returns all techniques from the registry."""
+         from app.criteria.technique_mappings import get_techniques_for_mode
+         from app.techniques.registry import get_registry
+         techs = get_techniques_for_mode("full_techniques")
+         expected_count = get_registry().count()
+         assert len(techs) == expected_count, f"Expected {expected_count} techniques, got {len(techs)}"
