Skip to content

Add GEDCOM 7 extensions: gedcom-occurrences and gedcom-evidence#173

Closed
glamberson wants to merge 4 commits into
FamilySearch:mainfrom
glamberson:add-gedcom-extensions
Closed

Add GEDCOM 7 extensions: gedcom-occurrences and gedcom-evidence#173
glamberson wants to merge 4 commits into
FamilySearch:mainfrom
glamberson:add-gedcom-extensions

Conversation

@glamberson
Copy link
Copy Markdown
Collaborator

Summary

This PR adds two GEDCOM 7 extensions to the registry that solve long-standing problems in genealogy data exchange.

Extensions Added

1. gedcom-occurrences (v0.2.0)

Repository: https://github.com/glamberson/gedcom-occurrences

Provides independent event records with multiple participants, solving the event synchronization problem that occurs when the same event (census, burial, ceremony) involves multiple people.

Key features:

  • Container-only model matching GRAMPS Event/EventRef pattern
  • Clean separation between event data and participation data
  • No data duplication or sync issues

2. gedcom-evidence (v0.1.0)

Repository: https://github.com/glamberson/gedcom-evidence

Enables "floating evidence" that can exist independently of identity conclusions, solving the "Which John Smith?" problem that has plagued genealogy software for 15+ years.

Key features:

  • Evidence containers preserve original source information
  • Gradual identity refinement as research progresses
  • Supports the Genealogical Proof Standard

Validation

All YAML files have been validated using the registry validator:

  • 14 structure files
  • 7 enumeration files
  • 2 enumeration-set files

Background

Both extensions are based on:

  • Analysis of 594+ BetterGEDCOM community discussions (2010-2025)
  • Professional genealogy research standards
  • Existing implementation patterns (GRAMPS, Evidence Explained)

Related Discussions

Please let me know if you need any clarification or would like me to make any adjustments to the submission.

This PR adds two GEDCOM 7 extensions to the registry:

1. gedcom-occurrences (v0.2.0) - Independent events with multiple participants
   - Solves the event synchronization problem
   - Implements container-only model matching GRAMPS Event/EventRef pattern
   - Repository: https://github.com/glamberson/gedcom-occurrences

2. gedcom-evidence (v0.1.0) - Floating evidence and research documentation
   - Solves the "Which John Smith?" problem
   - Allows evidence to exist independently of identity conclusions
   - Repository: https://github.com/glamberson/gedcom-evidence

Both extensions have been validated with the registry validator.
Both are based on analysis of 15+ years of genealogy community discussions.
@glamberson
Copy link
Copy Markdown
Collaborator Author

My apologies for submitting both extensions as one PR in my haste. I realize now they should have been submitted separately. Please let me know if you'd like me to split them and resubmit as two separate PRs.

Also, I wanted to note that we've since released patch versions of both extensions:

  • gedcom-occurrences v0.2.1
  • gedcom-evidence v0.1.1

These patch releases add the registry-compliant YAML files to the extension repositories themselves (in a registry-yaml/ directory) but don't change any functionality. The YAML files submitted in this PR remain correct and unchanged.

Thank you for your patience with this submission!

Copy link
Copy Markdown
Collaborator

@dthaler dthaler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really interesting submission, just some YAML formatting issues that should be easy to address.

Comment on lines +12 to +14

occurrence-specific participant attributes such as dwelling number,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
occurrence-specific participant attributes such as dwelling number,
occurrence-specific participant attributes such as dwelling number,

Comment thread structure/extension/_CONC.yaml Outdated
specification:
- Evidence Conclusion
- 'A reference from an individual or family to evidence that supports

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank lines in middle of sentences

@@ -1,34 +0,0 @@
%YAML 1.2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't delete other registered extensions

@@ -1,37 +0,0 @@
%YAML 1.2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't delete other registered extensions

- Add %YAML 1.2 header to all enum files
- Add proper document start (---) and end (...) markers
- Fix extension tags from list to single value format
- Fix indentation for specification and value of lists
- Add blank lines between sections for readability

Addresses validation errors from PR review
@dthaler
Copy link
Copy Markdown
Collaborator

dthaler commented Jul 28, 2025

FamilySearch/GEDCOM.io#283 recently added prerelease to the YAML schema.
Do you want these to have prerelease: true or do you consider the syntax of these immutable?

@glamberson
Copy link
Copy Markdown
Collaborator Author

@dthaler Thank you for the review! I've pushed fixes for all the YAML formatting issues:

  • Added %YAML 1.2 headers to all enumeration files
  • Added proper document markers (--- and ...)
  • Fixed extension tags from list to single value format
  • Corrected indentation for specification and value lists
  • Added blank lines between sections for readability

The validation should pass now. Please let me know if you need any other changes!

- Fix extension tags format in structure files (single value instead of list)
- Add missing newlines at end of enumset files

This should resolve all validation errors from the CI check
@glamberson
Copy link
Copy Markdown
Collaborator Author

Fixed the remaining validation errors:

  • Changed extension tags: from list format to single value format in all structure files
  • Added missing newlines at the end of enumset files

The validation should pass now. Thanks for your patience!

@albertemmerich
Copy link
Copy Markdown
Collaborator

Yes, very interesting extension! I have put in my program a similar model for source citations: Every event (or other recorded data) gets a source citation record, and these are cited by individual records or family records. The application finds all involved people by simply looking for the source citation XREF, and has their roles by the ROLE tag.
Looking more detailed at this proposal, we should decide, where the link in between individual and occurance is documented in the GEDCOM file: So far we have it on both sides. As GEDCOM is a format for computer readable genealogical data we should avoid any duplication (as we have so far with CHIL <=> FAMC and HUSB, WIFE <=> FAMS).
It is up to the application to show the user all person's events (if we put the link to the occurence) or vice versa. If we transfer both sides we have to handle mismatches in between these data. As we have today, if we have a CHIL in a family record, but no FAMC in the cited individual record.
But this is a point for an optimal realisation of a great suggestion - and it should find its way to a better GEDCOM structure in version 8.x ! This is more than my source citation model, as the event data are stored in the occurance, too - so we do not need to put them to in the individual records....

@albertemmerich
Copy link
Copy Markdown
Collaborator

The evidence extension:
I do not understand how this works with the rest of the GEDCOM code. If we have:

0 @E1@ _EVID
1 _ID John Smith
1 _ID son of Mary
1 _ID aged 40
1 SOUR @S1@
2 PAGE 1850 Census, p. 42

0 @I1@ INDI
1 NAME John /Smith/
1 _EVID @E1@
2 _CONF 3         # 60% confidence this is him

0 @I2@ INDI
1 NAME John /Smith/
1 _EVID @E1@
2 _CONF 2         # 40% confidence it's this one

What is the the family record of the mother? Or is there no family record including John as long as we have no 100% solution? Today I have John Smith as separate record @I3@ citing the source @S1@, linked as CHIL in Mary's family record and linked by ALIA to the @I1@ and @I2@. To document the confidence for the ALIA links I use substructures - so far not part of official gedcom standard.

I do agree that we urgently need a way to document different possible conclusions in GEDCOM transfer as long the known data can be interpreted with different solutions!

@glamberson
Copy link
Copy Markdown
Collaborator Author

Thank you for pointing out the prerelease field!

We do not need prerelease: true for these extensions. We consider the syntax stable and immutable at this point. These extensions have been:

We're committed to maintaining backward compatibility going forward. Any future enhancements will follow semantic versioning principles.

@glamberson
Copy link
Copy Markdown
Collaborator Author

@albertemmerich Thank you for your thoughtful feedback. You've raised excellent points about both extensions. Let me address each:

Regarding Occurrences Extension

You're absolutely right about the duplication concern. The current design does maintain bidirectional links (INDI → _OCUR and _OCUR → _PART), similar to GEDCOM's existing CHIL ↔ FAMC pattern.

Your point about avoiding duplication for GEDCOM 8.x is well-taken. For now, we've followed GEDCOM 7's patterns for compatibility, but we agree that future versions should consider unidirectional links with applications handling the reverse lookups.

The occurrence model does indeed store event data centrally, which is one of its key benefits - allowing multiple people to share the same documented event without duplicating the event details.

Regarding Evidence Extension

Your example highlights an important use case. The evidence extension is designed to work alongside existing GEDCOM structures, not replace them. In your example:

  1. You would still create family records for Mary with @i3@ as CHIL
  2. The _EVID records document the evidence for these conclusions
  3. The _CONF values indicate confidence levels for the links

For your specific workflow:

  • Keep @i3@ as the John Smith from the source
  • Use ALIA to link to @i1@ and @i2@ as you do now
  • The _EVID/_CONF structures document WHY you made these links
  • Future analysis tools can use the confidence values to highlight uncertain connections

The evidence extension essentially provides a formal way to document what you're already tracking - the reasoning and confidence behind genealogical conclusions.

Looking Forward

Both extensions aim to provide transport mechanisms for evidence-based genealogy without breaking existing GEDCOM compatibility. They're designed to be optional enhancements that tools can adopt incrementally.

Would love to hear more about your source citation model - it sounds like it aligns well with these extensions' goals.

@dthaler
Copy link
Copy Markdown
Collaborator

dthaler commented Jul 28, 2025

@glamberson Still getting YAML validation errors. And I do think it would help to split into two separate PRs even if it's not strictly required.

Comment thread structure/extension/_CONF.yaml Outdated
Comment on lines +11 to +19

from QUAY (quality of data) by assessing the researcher''s confidence

in the evidence''s relevance and interpretation.


Confidence levels can change as additional evidence is discovered

and analysis progresses.'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra blank lines in the middle of sentences.

Comment thread structure/extension/_EVID.yaml Outdated
https://github.com/glamberson/gedcom-evidence/_CONF: '{0:1}'
https://github.com/glamberson/gedcom-evidence/_FIND: '{0:M}'
https://github.com/glamberson/gedcom-evidence/_ID: '{0:M}'
superstructures: {}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/glamberson/gedcom-evidence?tab=readme-ov-file#quick-start shows that _EVID is not just used as a record, but also as a substructure of https://gedcom.io/terms/v7/record-INDI

I think you need two YAML files as a result, one for record-EVID and one for EVID as a substructure.

https://gedcom.io/terms/v7/DATE: '{0:1}'
https://gedcom.io/terms/v7/NOTE: '{0:M}'
https://gedcom.io/terms/v7/SNOTE: '{0:M}'
superstructures:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/glamberson/gedcom-evidence/blob/master/specification.md#2-research-documentation-_rdoc shows that _RDOC is also used for a record structure, so I think you also need a YAML file for record-_RDOC

- Split _EVID and _RDOC into separate record and structure files
- Fixed extension tags format from list to single value
- Removed blank lines within specification text
- Changed payload types appropriately (XREF for references)
- Added proper YAML headers to all files
- Added document markers (--- and ...) where missing
@glamberson
Copy link
Copy Markdown
Collaborator Author

@dthaler Thank you for your patience! I've pushed another commit that should fix all the remaining YAML validation errors:

Changes made:

  • ✅ Split _EVID into two files:
    • record/gedcom-evidence/record-EVID.yaml for the record usage
    • structure/extension/_EVID.yaml for the substructure usage (with payload: XREF)
  • ✅ Split _RDOC into two files:
    • record/gedcom-evidence/record-_RDOC.yaml for the record usage
    • structure/extension/_RDOC.yaml for the substructure usage (with payload: XREF)
  • ✅ Fixed all extension tags from list format to single value format
  • ✅ Removed all blank lines within specification text
  • ✅ Added proper YAML headers (%YAML 1.2) to all files
  • ✅ Added document markers (--- and ...) where needed

The validation should pass now. I agree it would be cleaner to split this into two PRs - would you like me to do that now, or shall we proceed with the current PR?

@glamberson
Copy link
Copy Markdown
Collaborator Author

This PR has been split into two separate PRs as requested:

Both PRs contain the same content that was in this combined PR, with all YAML validation fixes applied. Thank you for your patience with the review process!

@glamberson glamberson closed this Jul 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants