fix: Repair _remend, card fallback, parser thematic breaks and headings#12
Conversation
… headings - Rewrite _remend to strip fenced code content before counting inline markers, add underscore bold/italic handling, fix dead star_count2 code, and use proper code fence detection (closes unclosed fences directly) - BaseFormatConverter.render_postable now calls card_to_fallback_text instead of returning empty string for card messages - Markdown parser _THEMATIC_BREAK_RE now matches spaced forms (e.g. * * *) - Heading parser strips trailing # characters - Add CLAUDE.md agent guidance for LLM/Claude Code contributors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a project quick reference guide in CLAUDE.md, refines card-to-fallback text conversion in the base format converter, and improves markdown parsing and streaming repair logic. Key changes include updating thematic break and heading parsing, and refactoring the _remend function to ignore content within code fences when repairing unclosed constructs. Feedback highlights that the thematic break regex incorrectly allows mixed characters, and the parity-based repair logic fails for double-character markers like bold (**). Additionally, the code fence closing logic is duplicated and hardcodes backticks, and a redundant type check was identified in the format converter.
| # Patterns used by the block parser | ||
| _HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)") | ||
| _THEMATIC_BREAK_RE = re.compile(r"^(\*{3,}|-{3,}|_{3,})\s*$") | ||
| _THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$") |
There was a problem hiding this comment.
The updated regex for thematic breaks allows mixed characters (e.g., * - *), which is invalid according to the CommonMark specification. A thematic break must consist of a sequence of three or more matching characters (-, _, or *), though they may be separated by spaces.
| _THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$") | |
| _THEMATIC_BREAK_RE = re.compile(r"^(?:(?:\*\s*){3,}|(?:-\s*){3,}|(?:_\s*){3,})\s*$") |
| while j < len(outside_fences): | ||
| if outside_fences[j] == "\\": | ||
| j += 2 | ||
| continue | ||
| if temp[j] == "*": | ||
| run = 0 | ||
| while j < len(temp) and temp[j] == "*": | ||
| run += 1 | ||
| j += 1 | ||
| star_count += run | ||
| continue | ||
| if outside_fences[j] == "*": | ||
| star_count += 1 | ||
| j += 1 | ||
|
|
||
| if star_count % 2 != 0: | ||
| result += "*" |
There was a problem hiding this comment.
The parity-based approach for fixing unclosed markers only works for single-character markers (italic). It fails for double-character markers like bold (**). For example, if the text is **bold, the star_count will be 2, and since 2 % 2 == 0, no closing marker will be appended, leaving the bold text unclosed. The same issue applies to the underscore (_) handling logic starting on line 114.
| in_code_fence = False | ||
| for line in result.split("\n"): | ||
| stripped = line.lstrip() | ||
| if stripped.startswith("```") or stripped.startswith("~~~"): | ||
| in_code_fence = not in_code_fence | ||
|
|
||
| if in_code_fence: | ||
| result += "\n```" | ||
| return result |
There was a problem hiding this comment.
This logic for detecting and closing code fences is duplicated from the existing _is_inside_code_fence helper. Additionally, it hardcodes the backtick fence (```) as the closer, which will be incorrect if the block was opened with tildes (~~~).
if _is_inside_code_fence(result):
# For simplicity, we append a newline and backticks to close the fence
result += "\n```"
return result| if message.get("type") == "card": | ||
| from chat_sdk.cards import is_card_element | ||
|
|
||
| if is_card_element(message): | ||
| from chat_sdk.cards import card_to_fallback_text | ||
|
|
||
| return card_to_fallback_text(message) | ||
| return "" |
There was a problem hiding this comment.
The call to is_card_element(message) is redundant here because the preceding if message.get("type") == "card" already performs the same check, and the message is already known to be a dictionary from the check on line 177.
if message.get("type") == "card":
from chat_sdk.cards import card_to_fallback_text
return card_to_fallback_text(message)There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab35767141
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| under_count += 1 | ||
| j += 1 | ||
|
|
||
| if under_count % 2 != 0: | ||
| result += "_" |
There was a problem hiding this comment.
Avoid stalling streams on single underscores
The new underscore parity block makes _remend treat plain text like foo_bar\n as unbalanced and append an extra _. That flips _is_clean to false, but _find_clean_prefix only backtracks on INLINE_MARKER_CHARS (*, ~, `, [), so it cannot recover a clean prefix for underscore-only cases and returns an empty string. In practice, get_committable_text() can emit nothing for common single-underscore text (e.g., snake_case) until another underscore arrives.
Useful? React with 👍 / 👎.
| if heading_match: | ||
| depth = len(heading_match.group(1)) | ||
| heading_text = heading_match.group(2).strip() | ||
| heading_text = heading_match.group(2).rstrip().rstrip("#").rstrip() |
There was a problem hiding this comment.
Preserve literal trailing # in heading text
This heading normalization removes all trailing # characters unconditionally, so headings like # C# or # foo#bar# lose meaningful content. Markdown only strips optional closing hashes when they are a closing sequence, not when # is part of the title text, so this introduces text corruption for valid headings containing literal # at the end.
Useful? React with 👍 / 👎.
| # Patterns used by the block parser | ||
| _HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)") | ||
| _THEMATIC_BREAK_RE = re.compile(r"^(\*{3,}|-{3,}|_{3,})\s*$") | ||
| _THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$") |
There was a problem hiding this comment.
Reject mixed-marker thematic breaks
The updated thematic-break regex now matches mixed markers such as - * - or * _ *, which are not thematic breaks. Since thematic breaks are parsed before lists, these lines are now converted to thematicBreak nodes instead of list/paragraph content, changing AST output for non-divider input. The pattern should require one repeated marker type (with optional spaces) rather than any mix of -, *, and _.
Useful? React with 👍 / 👎.
Hardens the socket-mode port against the seven review findings on PR #86: * Add ``connect_timeout_s`` config (default 30s) and wrap the initial socket-mode handshake in ``asyncio.wait_for`` so a hung ``SocketModeClient.connect()`` can't make ``initialize()`` block forever (hazard #11). On timeout we tear the loop down before raising. * Reject forwarded socket events whose ``timestamp`` field is outside the same 5-minute window ``_verify_signature`` enforces, so a captured forwarded payload can't be replayed indefinitely (hazard #12). * Narrow the ``contextlib.suppress`` in ``stop_socket_mode`` to ``CancelledError`` only — surprising loop crashes are no longer silently swallowed during shutdown. * Replace ``asyncio.get_event_loop().create_task`` in the socket-mode ``wrap_async`` helper with ``get_running_loop`` (Python 3.12+ compatibility, hazard #5). * Have the socket-mode interactive branch ack with ``response_action: errors`` instead of an empty ack when dispatch raises — an empty ack on ``view_submission`` silently closes the modal so the user sees no signal anything went wrong. * Drop ``is_ext_shared_channel`` from the synthesized ``event_callback`` payload in the socket-mode events_api branch so socket and webhook paths feed identical shapes into ``_process_event_payload`` (hazard #7). * Add a regression test firing two concurrent ``_route_socket_event`` events_api dispatches for different teams via ``asyncio.gather`` to pin down the existing ``copy_context()`` isolation against future drift (hazard #6). Also fixes a pre-existing B010 lint warning in the test stub. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Summary
_remendrewrite -- removed deadstar_count2code, added_strip_fenced_codeto ignore markers inside code blocks, added_/__underscore handling, added code fence closure detectionrender_postablecard fallback -- now callscard_to_fallback_text()instead of returning""for card messages (both dict and object forms)_THEMATIC_BREAK_REnow matches* * *,- - -,_ _ _etc.#in headings -- parser strips trailing#and whitespace from ATX headingsTest plan
_remend)ruff checkandruff formatclean#stripped from headings🤖 Generated with Claude Code