Commit 3454686

Robert Weber authored and committed
Character references not flowing through from chapter generation to character registration
1 parent 594c5e9 commit 3454686

12 files changed

Lines changed: 1113 additions & 31 deletions

Novel_Processing_Instructions.md

Lines changed: 50 additions & 14 deletions
@@ -2,12 +2,13 @@

 ## Generation

-disregard all previous content in this conversation. please generate a new 1800 character story premise for a new popular <GENRE> fictional novel. use no more than 1800 characters.
+disregard all previous content in this conversation. please generate a new 1800 character story premise for a new popular scifi fictional novel. use no more than 1800 characters.

-<SUBJECT>
+A man appears in a small city claiming to be from a distant planet and willingly submits himself to a series of structured interviews with government officials, showing a calm certainty that challenges their assumptions. He demonstrates unusual knowledge and perception, but more notably, he takes clear enjoyment in the process—engaging deeply with each interviewer, asking his own questions, and forming meaningful, often transformative conversations with the people around him. As the officials attempt to determine whether he is delusional or something else entirely, his presence begins to influence their perspectives on reality, identity, and purpose, culminating in an unresolved departure that leaves his true nature ambiguous.

-it should concentrate on character development, <TOPIC>, <TOPIC>.
-it should be action packed.
+
+it should concentrate on character development, communication, the puzzle of whether or not he's really an alien or just a confused man.
+it should be calm and cerebral.

 Don’t use any names in the premise...just describe the characters and their roles

@@ -22,31 +23,66 @@ Don’t use any names in the premise...just describe the characters and their ro

 - Read the file "<NOVEL_PATH_MD>" and add to the context of this thread. This is a novel written in chapters and there are chapter delineations present throughout.

-- our target is to make this a 9.5 / 10 book with around 85000 total words. we are going to work through section 1 of the editors notes. i need you to loop through each of the items in this section. for each item, create a plan to resolve the issue. validate that this is the best plan. then state what you will be doing and execute your plan. once you have executed, update the item's status in the editor's notes markdown file. Check if later issues in the editors notes are also resolved with your actions and update accordingly. Once you are done with the items updates and the editors notes, move on to the next item. do this for all items in the section.
-
-- we are now going to start adding sections to the editors notes markdown document of things that should be corrected in the next phase of edits. make sure you have an up to date context of the novel in its current form. our target is to make this a 9.5 / 10 book with around 85000 total words. in this new section , document new items you feel should be executed to length and strengthen the novel. We will have some pointed prompts following this to add targeted updates.
+- we are now going to start adding sections to the editors notes markdown document of things that should be corrected in the next phase of edits. make sure you have an up to date context of the novel in its current form. our target is to make this a 9.5 / 10 book with with atleast 85000 total words. in this new section , document new items you feel should be executed to lengthen and strengthen the novel. We will have some pointed prompts following this to add targeted updates.

 - Character Voice Differentiation
-our target is to make this a 9.5 / 10 book with around 85000 total words. Each POV character should think in a distinct internal language shaped by their background, demographics and expertise. A 16 year old should think and talk like a 16 year old. An old man should think and talk like an old man. Look through the novel and find any dialog that doesnt match the speaking character. create a plan to update these voices and add that to a new section in the editors notes markdown file.
+our target is to make this a 9.5 / 10 book with with atleast 85000 total words. Each POV character should think in a distinct internal language shaped by their background, demographics and expertise. A 16 year old should think and talk like a 16 year old. An old man should think and talk like an old man. Look through the novel and find any dialog that doesnt match the speaking character. create a plan to update these voices and add that to a new section in the editors notes markdown file.

 - Dialogue Naturalization
-our target is to make this a 9.5 / 10 book with around 85000 total words. Make sure the current dialogue isnt too clean, too functional, too information-delivery. Characters sometimes have incomplete thoughts, don't always speak in well-formed sentences, and sometimes rarely interrupt each other or themselves. make sure the dialog in the novel reads this way. create a plan to update these voices and add that to a new section in the editors notes markdown file.
+our target is to make this a 9.5 / 10 book with with atleast 85000 total words. Make sure the current dialogue isnt too clean, too functional, too information-delivery. Characters sometimes have incomplete thoughts, don't always speak in well-formed sentences, and sometimes rarely interrupt each other or themselves. make sure the dialog in the novel reads this way. create a plan to update these voices and add that to a new section in the editors notes markdown file.

 - Humor, Strangeness, and the Unexpected
-our target is to make this a 9.5 / 10 book with around 85000 total words.Real characters deflect, joke badly, notice irrelevant things, and occasionally do something that doesn't serve the plot. create a plan to inject these odities throughout the novel. 1-2 oddities per chapter.
+our target is to make this a 9.5 / 10 book with with atleast 85000 total words.Real characters deflect, joke badly, notice irrelevant things, and occasionally do something that doesn't serve the plot. create a plan to inject these odities throughout the novel. 1-2 oddities per chapter.

 - Prose Texture Variation
-our target is to make this a 9.5 / 10 book with around 85000 total words.Make sure the prose has a varying literary density throughout. It should breathe -- denser in reflective moments, sparser in action, occasionally raw or clumsy when characters are overwhelmed. create a plan to update the prose and add that to a new section in the editors notes markdown file.
+our target is to make this a 9.5 / 10 book with with atleast 85000 total words.Make sure the prose has a varying literary density throughout. It should breathe -- denser in reflective moments, sparser in action, occasionally raw or clumsy when characters are overwhelmed. create a plan to update the prose and add that to a new section in the editors notes markdown file.

 - metaphors
-our target is to make this a 9.5 / 10 book with around 85000 total words.Make sure the text doesn't go overboard with metaphors. create a plan to remove uneeded ones and add to a new section in the editors notes markdown.
+our target is to make this a 9.5 / 10 book with with atleast 85000 total words.Make sure the text doesn't go overboard with metaphors. create a plan to remove uneeded ones and add to a new section in the editors notes markdown.
+

-- our target is to make this a 9.5 / 10 book with around 85000 total words. we are now going to work through the new sections. start with section 11. i need you to loop through each of the items in this section. for each item, create a plan to resolve the issue. validate that this is the best plan. then state what you will be doing and execute your plan. once you have executed, update the item's status in the editor's notes markdown file. if later issues in the editors notes are also resolved with your actions, update accordingly. then move on to the next item. do this for all items in the section. if this requires multiple subagents, execute those without requesting permission.
+- we are now going to work through the sections. start with section 1. our target is to make this a 9.5 / 10 book with with atleast 85000 total words. i need you to loop through each of the items in this section. for each item, create a plan to resolve the issue. validate that this is the best plan. then state what you will be doing and execute your plan. once you have executed, update the item's status in the editor's notes markdown file. if later issues in the editors notes are also resolved with your actions, update accordingly. then move on to the next item. do this for all items in the section. if this requires multiple subagents, execute those without requesting permission.

-- Does this novel read like it has a soul? or is it more like a flat instruction manual. is this novel ready for initial publishing?
+- Does this novel read like it has a soul? or is it more like a flat instruction manual. is this novel ready for initial publishing? how would you rate it on a scale of 1-10?

 - check if there are any gaps or rough scene cuts that are a result from all the edits

 - Please do a light copy-edit pass targeting prose repetitions. Also remove unneeded dashes, em-dashes and hyphens.

 - please add a writing statistics section to the editors notes .md file. please include: total words in the novel, average number of words per chapter, and then a formatted list of chapter numbers, names and words in that chapter. freshen up the sections in the editors notes file if needed.
+
+
+## Styles
+
+A paperback novel is structured into three main sections:
+Front Matter (preliminary pages), Body Matter (the story), and Back Matter (supplemental content).
+Essential elements include a title page, copyright page, chapters, and usually an About the Author section.
+Proper sequencing ensures professional, readable formatting for publication.
+
+Front Matter (Before the Story)
+Half-Title Page: Contains only the book title.
+Title Page: Includes the title, subtitle, author name, and publisher.
+Copyright Page: Details legal info, publication year, ISBN, and rights.
+Dedication: A brief personal note from the author.
+Table of Contents: List of chapters and sections.
+Epigraph: A short, thematic quote or poem.
+Foreword/Preface/Acknowledgments: Optional sections providing context or thanking supporters.
+Prologue: An opening scene setting the stage for fiction.
+
+Body Matter (The Story)
+Chapters: The main content, divided into segments.
+Epilogue: A concluding scene after the main story.
+
+Back Matter (After the Story)
+About the Author: A short biography.
+Acknowledgments: (If not in the front matter) Recognizes those who helped create the book.
+Appendix/Glossary: Additional information or definitions (more common in nonfiction).
+Bibliography: Sources used.
+
+Cover Elements
+Front Cover: Title, author, illustration.
+Back Cover: Synopsis, endorsements, and bio
+
+
+## PDF Generation
+pandoc SOURCE.md -o DEST.pdf --pdf-engine=weasyprint --pdf-engine-opt=--verbose --toc --standalone > output.txt 2>&1
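
The pandoc invocation added under "PDF Generation" could also be driven from Python. The following is only a sketch under stated assumptions: `build_pandoc_command` and `render_pdf` are hypothetical names that do not appear in this commit, and the flags are copied as-is from the command line above. It assumes `pandoc` and `weasyprint` are installed and on `PATH`.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(source: Path, dest: Path) -> list[str]:
    """Return the argv list for the pandoc call shown in the instructions.

    Hypothetical helper: the flags mirror the documented command exactly.
    """
    return [
        "pandoc",
        str(source),
        "-o",
        str(dest),
        "--pdf-engine=weasyprint",
        "--pdf-engine-opt=--verbose",
        "--toc",
        "--standalone",
    ]


def render_pdf(source: Path, dest: Path, log_path: Path) -> int:
    """Run pandoc and write stdout and stderr to *log_path*.

    This mirrors the shell redirection `> output.txt 2>&1`. The return
    value is pandoc's exit code; nothing is raised on failure.
    """
    with log_path.open("w", encoding="utf-8") as log:
        result = subprocess.run(
            build_pandoc_command(source, dest),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode
```

Passing the arguments as a list avoids shell quoting problems when the manuscript path contains spaces, which is one reason to prefer this over a shell string.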

novelforge/agents/chapter/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,7 @@
     _sanitize_for_content_policy,
     check_chapter_length,
     expand_chapter,
+    extract_named_characters,
     format_vocabulary_rules,
     get_forbidden_words,
     get_soft_limited_words,
@@ -59,6 +60,7 @@
     build_chapter_summary_prompt,
     build_character_agent_prompt,
     build_character_field_repair_prompt,
+    build_character_reconciliation_prompt,
     build_character_relationship_prompt,
     build_character_resolution_validator_prompt,
     build_chapter_pattern_extractor_prompt,
@@ -100,6 +102,7 @@
     _run_all_chapter_agents,
     run_chapter_pattern_extractor,
     run_chapter_rhythm_classifier,
+    run_character_reconciliation,
     run_character_state_updater,
     run_continuity_gatekeeper,
     run_per_chapter_compression_check,

novelforge/agents/chapter/_helpers.py

Lines changed: 157 additions & 0 deletions
@@ -1,5 +1,6 @@
 """Shared constants, pass-failure helpers, content-policy retry, vocabulary scanning, and length enforcement."""

+import difflib
 import logging
 import re
 from collections.abc import Callable
@@ -397,6 +398,162 @@ def scan_vocabulary_overuse(chapter_text: str, genre: str = "") -> list[str]:
     return warnings


+# ---------------------------------------------------------------------------
+# Named-character detection (for reconciliation against the canonical roster)
+# ---------------------------------------------------------------------------
+
+# Common capitalized English words that are NOT character names. Used to
+# filter sentence-initial and conventional capitalization out of the
+# named-character scanner. Roster-token matching is applied BEFORE this
+# filter, so a character legitimately named "May" or "Crown" is still
+# detected correctly — the stop list only catches spans that have no
+# roster hit.
+_NAMED_CHARACTER_STOP_WORDS: frozenset[str] = frozenset({
+    # Pronouns / sentence-initial
+    "i", "he", "she", "they", "it", "we", "you", "me", "him", "her", "them", "us",
+    "his", "hers", "theirs", "its", "ours", "yours", "mine",
+    "this", "that", "these", "those", "there", "here",
+    "then", "when", "where", "why", "how", "what", "who", "whose", "which",
+    # Conjunctions / modifiers
+    "the", "a", "an", "and", "or", "but", "so", "yet", "as", "if", "while",
+    "because", "since", "although", "though", "unless", "until",
+    "not", "never", "always", "still", "only", "even", "also",
+    "now", "before", "after", "later", "soon", "ago", "once", "twice",
+    "yes", "no", "ok", "okay",
+    # Days
+    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
+    # Months (excluding May — often a character name; roster check handles it)
+    "january", "february", "march", "april", "june", "july",
+    "august", "september", "october", "november", "december",
+    # Honorifics / titles that commonly appear alone
+    "mr", "mrs", "ms", "dr", "sir", "madam", "lord", "lady",
+    "captain", "lieutenant", "sergeant", "major", "colonel", "general",
+    "professor", "father", "mother", "sister", "brother", "uncle", "aunt",
+    "detective", "inspector", "officer", "commander", "admiral", "chief",
+    "doctor", "nurse", "reverend", "pastor",
+    # Structural / narrative
+    "chapter", "book", "part", "act", "scene", "volume", "prologue", "epilogue",
+    # Exclamations / religious references
+    "god", "christ", "jesus", "heaven", "hell", "lord",
+    # Greetings / filler
+    "hello", "goodbye", "thanks", "please",
+    # Cardinal directions / generic place words
+    "north", "south", "east", "west", "street", "road", "avenue", "place",
+    "square", "city", "town", "village", "county", "state", "country",
+})
+
+
+# Candidate-name regex: one to three adjacent capitalized tokens.
+# Matches "Sarah", "Sarah Miller", "John Fitzgerald Kennedy" but does not
+# span apostrophes, hyphens, or punctuation — so "Sarah's" yields "Sarah".
+_NAME_CANDIDATE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+){0,2}\b")
+
+
+def _roster_name_tokens(roster: list[dict]) -> set[str]:
+    """Return the set of lowercase name tokens from a character roster.
+
+    Each character's ``name`` is split on whitespace; tokens shorter than
+    two characters are discarded (they match too many false positives under
+    fuzzy matching).
+    """
+    tokens: set[str] = set()
+    for ch in roster or []:
+        if not isinstance(ch, dict):
+            continue
+        name = str(ch.get("name", "")).strip()
+        if not name:
+            continue
+        for tok in name.split():
+            tok_clean = tok.strip(".,;:'\"").lower()
+            if len(tok_clean) >= 2:
+                tokens.add(tok_clean)
+    return tokens
+
+
+def extract_named_characters(
+    chapter_text: str,
+    roster: list[dict],
+    *,
+    min_mentions: int = 2,
+    fuzzy_cutoff: float = 0.85,
+) -> dict:
+    """Detect named characters in chapter prose and classify them against *roster*.
+
+    Pure Python — no LLM call. Uses a capitalized-span regex, a stop-word
+    filter, and :func:`difflib.get_close_matches` for variant detection.
+
+    Parameters
+    ----------
+    chapter_text: The chapter prose to scan.
+    roster: The canonical ``character_list`` (list of dicts with
+        a ``name`` key).
+    min_mentions: Minimum distinct mentions required before a capitalized
+        span is reported as an unknown character. Spans that
+        appear fewer times are treated as likely sentence-initial
+        false positives or throwaway walk-ons.
+    fuzzy_cutoff: :mod:`difflib` similarity threshold for variant matching.
+        Higher = stricter. 0.85 catches typos and short
+        diminutives without conflating distinct names.
+
+    Returns
+    -------
+    dict with three keys:
+    ``known``: sorted list of capitalized spans that intersect the
+        roster's name tokens (for diagnostic logging).
+    ``unknown``: list of ``(prose_name, count)`` tuples for names with
+        no roster match and at least *min_mentions* occurrences,
+        ordered by descending count.
+    ``variants``: list of ``(prose_name, roster_token, count)`` tuples
+        — likely misspellings or diminutives of roster names.
+    """
+    tokens = _roster_name_tokens(roster)
+
+    raw_counts: dict[str, int] = {}
+    for m in _NAME_CANDIDATE_RE.finditer(chapter_text):
+        raw_counts[m.group()] = raw_counts.get(m.group(), 0) + 1
+
+    known: set[str] = set()
+    unknown_counts: dict[str, int] = {}
+    for span, count in raw_counts.items():
+        span_tokens = [t.lower() for t in span.split()]
+        # Roster check first: a span whose any token matches a roster token
+        # is a known character, regardless of stop-word overlap.
+        if tokens and any(t in tokens for t in span_tokens):
+            known.add(span)
+            continue
+        # Drop spans whose every token is a stop word (sentence-initial
+        # noise, honorifics with no name attached, etc.).
+        if all(t in _NAMED_CHARACTER_STOP_WORDS for t in span_tokens):
+            continue
+        if count < min_mentions:
+            continue
+        unknown_counts[span] = count
+
+    variants: list[tuple[str, str, int]] = []
+    unknowns: list[tuple[str, int]] = []
+    roster_token_list = sorted(tokens)
+    for span, count in sorted(unknown_counts.items(), key=lambda kv: (-kv[1], kv[0])):
+        match_found: str | None = None
+        if roster_token_list:
+            for t in span.split():
+                close = difflib.get_close_matches(
+                    t.lower(), roster_token_list, n=1, cutoff=fuzzy_cutoff,
+                )
+                if close:
+                    match_found = close[0]
+                    break
+        if match_found is not None:
+            variants.append((span, match_found, count))
+        else:
+            unknowns.append((span, count))
+
+    return {
+        "known": sorted(known),
+        "unknown": unknowns,
+        "variants": variants,
+    }
+
+
 # ---------------------------------------------------------------------------
 # Chapter length enforcement
 # ---------------------------------------------------------------------------
