Skip to content

fix: Repair _remend, card fallback, parser thematic breaks and headings#12

Merged
patrick-chinchill merged 1 commit into
mainfrom
fix/markdown-streaming-audit-fixes
Apr 7, 2026
Merged

fix: Repair _remend, card fallback, parser thematic breaks and headings#12
patrick-chinchill merged 1 commit into
mainfrom
fix/markdown-streaming-audit-fixes

Conversation

@patrick-chinchill
Copy link
Copy Markdown
Collaborator

Summary

  • Critical: _remend rewrite -- removed dead star_count2 code, added _strip_fenced_code to ignore markers inside code blocks, added _/__ underscore handling, added code fence closure detection
  • Critical: render_postable card fallback -- now calls card_to_fallback_text() instead of returning "" for card messages (both dict and object forms)
  • Important: spaced thematic breaks -- _THEMATIC_BREAK_RE now matches * * *, - - -, _ _ _ etc.
  • Important: trailing # in headings -- parser strips trailing # and whitespace from ATX headings
  • CLAUDE.md -- agent guidance file for LLM contributors

Test plan

  • All 2,477 existing tests pass (including the exhaustive prefix invariant test that was failing with the old _remend)
  • ruff check and ruff format clean
  • Verified spaced thematic breaks parse correctly
  • Verified trailing # stripped from headings
  • Verified card fallback text generated (not empty string)

🤖 Generated with Claude Code

… headings

- Rewrite _remend to strip fenced code content before counting inline
  markers, add underscore bold/italic handling, fix dead star_count2 code,
  and use proper code fence detection (closes unclosed fences directly)
- BaseFormatConverter.render_postable now calls card_to_fallback_text
  instead of returning empty string for card messages
- Markdown parser _THEMATIC_BREAK_RE now matches spaced forms (e.g. * * *)
- Heading parser strips trailing # characters
- Add CLAUDE.md agent guidance for LLM/Claude Code contributors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@patrick-chinchill patrick-chinchill merged commit 620506f into main Apr 7, 2026
6 checks passed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a project quick reference guide in CLAUDE.md, refines card-to-fallback text conversion in the base format converter, and improves markdown parsing and streaming repair logic. Key changes include updating thematic break and heading parsing, and refactoring the _remend function to ignore content within code fences when repairing unclosed constructs. Feedback highlights that the thematic break regex incorrectly allows mixed characters, and the parity-based repair logic fails for double-character markers like bold (**). Additionally, the code fence closing logic is duplicated and hardcodes backticks, and a redundant type check was identified in the format converter.

# Patterns used by the block parser
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)")
_THEMATIC_BREAK_RE = re.compile(r"^(\*{3,}|-{3,}|_{3,})\s*$")
_THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The updated regex for thematic breaks allows mixed characters (e.g., * - *), which is invalid according to the CommonMark specification. A thematic break must consist of a sequence of three or more matching characters (-, _, or *), though they may be separated by spaces.

Suggested change
_THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$")
_THEMATIC_BREAK_RE = re.compile(r"^(?:(?:\*\s*){3,}|(?:-\s*){3,}|(?:_\s*){3,})\s*$")

Comment on lines +101 to 110
while j < len(outside_fences):
if outside_fences[j] == "\\":
j += 2
continue
if temp[j] == "*":
run = 0
while j < len(temp) and temp[j] == "*":
run += 1
j += 1
star_count += run
continue
if outside_fences[j] == "*":
star_count += 1
j += 1

if star_count % 2 != 0:
result += "*"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The parity-based approach for fixing unclosed markers only works for single-character markers (italic). It fails for double-character markers like bold (**). For example, if the text is **bold, the star_count will be 2, and since 2 % 2 == 0, no closing marker will be appended, leaving the bold text unclosed. The same issue applies to the underscore (_) handling logic starting on line 114.

Comment on lines +75 to +83
in_code_fence = False
for line in result.split("\n"):
stripped = line.lstrip()
if stripped.startswith("```") or stripped.startswith("~~~"):
in_code_fence = not in_code_fence

if in_code_fence:
result += "\n```"
return result
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for detecting and closing code fences is duplicated from the existing _is_inside_code_fence helper. Additionally, it hardcodes the backtick fence (```) as the closer, which will be incorrect if the block was opened with tildes (~~~).

    if _is_inside_code_fence(result):
        # For simplicity, we append a newline and backticks to close the fence
        result += "\n```"
        return result

Comment on lines 188 to 195
if message.get("type") == "card":
from chat_sdk.cards import is_card_element

if is_card_element(message):
from chat_sdk.cards import card_to_fallback_text

return card_to_fallback_text(message)
return ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The call to is_card_element(message) is redundant here because the preceding if message.get("type") == "card" already performs the same check, and the message is already known to be a dictionary from the check on line 177.

            if message.get("type") == "card":
                from chat_sdk.cards import card_to_fallback_text

                return card_to_fallback_text(message)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab35767141

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +121 to +125
under_count += 1
j += 1

if under_count % 2 != 0:
result += "_"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid stalling streams on single underscores

The new underscore parity block makes _remend treat plain text like foo_bar\n as unbalanced and append an extra _. That flips _is_clean to false, but _find_clean_prefix only backtracks on INLINE_MARKER_CHARS (*, ~, `, [), so it cannot recover a clean prefix for underscore-only cases and returns an empty string. In practice, get_committable_text() can emit nothing for common single-underscore text (e.g., snake_case) until another underscore arrives.

Useful? React with 👍 / 👎.

if heading_match:
depth = len(heading_match.group(1))
heading_text = heading_match.group(2).strip()
heading_text = heading_match.group(2).rstrip().rstrip("#").rstrip()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve literal trailing # in heading text

This heading normalization removes all trailing # characters unconditionally, so headings like # C# or # foo#bar# lose meaningful content. Markdown only strips optional closing hashes when they are a closing sequence, not when # is part of the title text, so this introduces text corruption for valid headings containing literal # at the end.

Useful? React with 👍 / 👎.

# Patterns used by the block parser
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)")
_THEMATIC_BREAK_RE = re.compile(r"^(\*{3,}|-{3,}|_{3,})\s*$")
_THEMATIC_BREAK_RE = re.compile(r"^([-*_]\s*){3,}\s*$")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject mixed-marker thematic breaks

The updated thematic-break regex now matches mixed markers such as - * - or * _ *, which are not thematic breaks. Since thematic breaks are parsed before lists, these lines are now converted to thematicBreak nodes instead of list/paragraph content, changing AST output for non-divider input. The pattern should require one repeated marker type (with optional spaces) rather than any mix of -, *, and _.

Useful? React with 👍 / 👎.

patrick-chinchill pushed a commit that referenced this pull request May 9, 2026
Hardens the socket-mode port against the seven review findings on
PR #86:

* Add ``connect_timeout_s`` config (default 30s) and wrap the initial
  socket-mode handshake in ``asyncio.wait_for`` so a hung
  ``SocketModeClient.connect()`` can't make ``initialize()`` block
  forever (hazard #11). On timeout we tear the loop down before raising.
* Reject forwarded socket events whose ``timestamp`` field is outside
  the same 5-minute window ``_verify_signature`` enforces, so a captured
  forwarded payload can't be replayed indefinitely (hazard #12).
* Narrow the ``contextlib.suppress`` in ``stop_socket_mode`` to
  ``CancelledError`` only — surprising loop crashes are no longer
  silently swallowed during shutdown.
* Replace ``asyncio.get_event_loop().create_task`` in the socket-mode
  ``wrap_async`` helper with ``get_running_loop`` (Python 3.12+
  compatibility, hazard #5).
* Have the socket-mode interactive branch ack with
  ``response_action: errors`` instead of an empty ack when dispatch
  raises — an empty ack on ``view_submission`` silently closes the
  modal so the user sees no signal anything went wrong.
* Drop ``is_ext_shared_channel`` from the synthesized ``event_callback``
  payload in the socket-mode events_api branch so socket and webhook
  paths feed identical shapes into ``_process_event_payload`` (hazard
  #7).
* Add a regression test firing two concurrent ``_route_socket_event``
  events_api dispatches for different teams via ``asyncio.gather`` to
  pin down the existing ``copy_context()`` isolation against future
  drift (hazard #6).

Also fixes a pre-existing B010 lint warning in the test stub.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant