Add gedcom-evidence extension v0.1.3 #178

Closed
glamberson wants to merge 4 commits into FamilySearch:main from glamberson:add-gedcom-evidence

Conversation

@glamberson
Collaborator

@glamberson glamberson commented Jul 29, 2025

Summary

This PR adds the gedcom-evidence extension to the GEDCOM registry.

Note: This was previously part of PR #173 which has been split at reviewer request.

Extension Overview

Purpose: Enables "floating evidence" that can exist independently of identity conclusions, solving the "Which John Smith?" problem that has plagued genealogy software for 15+ years.

Repository: https://github.com/glamberson/gedcom-evidence
Version: 0.1.5 (includes all YAML validation fixes)
Status: Stable (not prerelease)

Key Features

  • Evidence containers preserve original source information exactly as found
  • Gradual identity refinement as research progresses
  • Supports GPS (Genealogical Proof Standard) methodology
  • Research documentation with _RDOC structures

Problem Solved

Current genealogy software forces users to make premature conclusions about identity. When you find a record mentioning "John Smith," you must immediately decide WHICH John Smith it refers to, potentially losing the original evidence if your conclusion proves wrong.

This extension allows evidence to "float" until you determine through research which person it describes.

YAML Structure Updates (v0.1.5)

  • Converted all 'extension tags' to array format
  • Fixed 'type: record' to 'type: structure'
  • Added proper quotation for @ symbols in payloads
  • Fixed missing empty objects for superstructures
  • Corrected all indentation issues

Testing

Background

Based on 15 years of community discussion, particularly insights from the BetterGEDCOM project (2010-2013) which identified evidence/conclusion separation as critical for genealogy data exchange.

Closes second part of #173

Enables 'floating evidence' that can exist independently of identity
conclusions, solving the 'Which John Smith?' problem that has plagued
genealogy software for 15+ years.

Key features:
- Evidence containers preserve original source information
- Gradual identity refinement as research progresses
- Supports the Genealogical Proof Standard

Repository: https://github.com/glamberson/gedcom-evidence
Collaborator

@tychonievich tychonievich left a comment

A few of these files are correctly formatted (e.g. enumeration-set/extension/enumset-Confidence.yaml), but many (e.g. enumeration/gedcom-evidence/enum-Hypothesis.yaml) lack the indentation that is required by YAML

- Add proper 2-space indentation for all list items as required by YAML spec
- Addresses review feedback from @tychonievich
- Updates gedcom-evidence to v0.1.4
@glamberson
Collaborator Author

@tychonievich Thank you for the review! I've fixed the YAML indentation issues in all four enumeration files (enum-High.yaml, enum-Medium.yaml, enum-Low.yaml, and enum-Hypothesis.yaml).

All list items now have proper 2-space indentation as required by the YAML specification. The changes have been pushed and should be ready for re-review.

I've also updated the gedcom-evidence extension to v0.1.4 with these fixes.

If there are any other formatting issues or concerns, please let me know.

Greg Lamberson
lamberson@yahoo.com

@tychonievich
Collaborator

@tychonievich Thank you for the review! I've fixed the YAML indentation issues

I'm still getting validation errors for missing indentation (https://github.com/FamilySearch/GEDCOM-registries/actions/runs/16598922334/job/46955973713?pr=178 has error messages like Error: 9:1 [indentation] wrong indentation: expected at least 1), and getting them on more files than I was before. Is there possibly something in your workflow that is removing whitespace?

- Convert 'extension tags' from string to array format
- Add proper YAML indentation for all list items
- Quote payloads containing @ symbols
- Fix 'type: record' to 'type: structure' (record is not a valid type)
- Fix missing empty objects for superstructures
- Ensure proper spacing between sections

All files now pass validation with registry_tools/validator.py
@glamberson
Collaborator Author

@tychonievich Luther, Sorry about that!

I've thoroughly fixed all the YAML validation problems:

  • Converted extension tags from strings to arrays (e.g., _FIND → [_FIND])
  • Fixed the indentation on ALL list items (not just the enumerations)
  • Quoted all payloads containing @ symbols
  • Changed type: record to type: structure (learned that "record" isn't a valid type in the schema)
  • Fixed missing empty objects for superstructures (superstructures: {})
  • Cleaned up spacing between sections
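For illustration, the mapping-level fixes in the list above can be sketched as a small Python function. This is a minimal sketch, not the contents of fix_yaml_format.py or registry_tools/validator.py: it assumes each definition has already been parsed into a plain dict, and the function name `normalize_definition` is hypothetical. Quoting payloads that contain @ symbols and fixing indentation happen at serialization time, so they are not covered here.

```python
def normalize_definition(doc: dict) -> dict:
    """Return a copy of a parsed registry definition with three fixes applied.

    Hypothetical helper for illustration; key names are taken from the thread.
    """
    fixed = dict(doc)

    # 'extension tags' must be a list, e.g. "_FIND" -> ["_FIND"].
    tags = fixed.get("extension tags")
    if isinstance(tags, str):
        fixed["extension tags"] = [tags]

    # "record" is not a valid type in the schema; use "structure" instead.
    if fixed.get("type") == "record":
        fixed["type"] = "structure"

    # A structure without superstructures still carries an explicit empty mapping.
    if fixed.get("type") == "structure" and not fixed.get("superstructures"):
        fixed["superstructures"] = {}

    return fixed


before = {"type": "record", "extension tags": "_FIND", "label": "Example"}
after = normalize_definition(before)
```

The input mapping is left unmodified, so the original and normalized forms can be compared side by side before anything is written back to disk.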

I ran all files through registry_tools/validator.py locally and they all pass now (14 files checked, 14 passed).

Note: I've updated the gedcom-evidence extension to v0.1.5 to reflect all these validation fixes.

Really appreciate your patience with this. The validation workflow seems to have run successfully this time - let me know if there's anything else that needs attention.
Thanks,
Greg Lamberson
lamberson@yahoo.com

@dthaler
Collaborator

dthaler commented Jul 29, 2025

Still seeing some YAML validation errors. Sorry YAML is so tricky!

Collaborator

@tychonievich tychonievich left a comment

There are several places where the YAML suggests a structure that either isn't supported by many GEDCOM tools (like variable-type pointers or pointers to substructures), is under-specified (XREF without a pointed-to type), or seems either incomplete or contradictory (superstructure/substructure sets not matching).

I'll be happy to help clean up the YAML once I have time to finish reading through the specification (which I think is located at https://github.com/glamberson/gedcom-evidence/blob/master/specification.md, correct me if that's the wrong link)

Comment thread structure/extension/_RDOC.yaml Outdated
extension tags: _RDOC
label: Research Documentation
lang: en-US
payload: XREF
Collaborator

The value of the payload for a pointer should be the URI it points to, like @<https://github.com/glamberson/gedcom-evidence/_RACT>@ (but replacing that with the correct URI; I'm not sure what it's pointing to).

Comment thread structure/extension/_ID.yaml Outdated
extension tags: _ID
label: Evidence Identifier
lang: en-US
payload: '@<https://gedcom.io/terms/v7/record-INDI>@ | @<https://gedcom.io/terms/v7/record-FAM>@ | @<https://gedcom.io/terms/v7/record-SOUR>@'
Collaborator

We don't currently have pointers that can point to multiple types of records in the spec, so this won't pass YAML validation. The workaround I know of now is replacing _ID with three different structures (perhaps _IDI, _IDF, and _IDS), each pointing to just one type of thing. Reverse pointers (from INDI to _ID) won't work because we also don't allow pointers to substructures.

As an aside, I have proposed before being looser with identifiers, but because many software implementations use relational databases, polymorphic pointers or non-record pointers can be problematic for implementers and don't seem likely to be added anytime soon.
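The workaround described above (one structure per pointed-to record type) can be sketched in Python. This is only an illustration under stated assumptions: the tag names _IDI, _IDF, and _IDS come from the suggestion in this comment, while the function name `split_pointer` and the suffix table are hypothetical.

```python
import re

# Matches GEDCOM registry pointer payloads of the form @<URI>@.
POINTER = re.compile(r"@<([^>]+)>@")

# Assumed mapping from record type to tag suffix, following the suggested
# names _IDI, _IDF, and _IDS.
SUFFIX_BY_RECORD = {"record-INDI": "I", "record-FAM": "F", "record-SOUR": "S"}


def split_pointer(tag: str, payload: str) -> dict[str, str]:
    """Turn one multi-target pointer payload into one single-target payload per tag."""
    result = {}
    for uri in POINTER.findall(payload):
        record = uri.rsplit("/", 1)[-1]  # e.g. "record-INDI"
        result[tag + SUFFIX_BY_RECORD[record]] = f"@<{uri}>@"
    return result


payload = (
    "@<https://gedcom.io/terms/v7/record-INDI>@ | "
    "@<https://gedcom.io/terms/v7/record-FAM>@ | "
    "@<https://gedcom.io/terms/v7/record-SOUR>@"
)
single_target = split_pointer("_ID", payload)
```

Each resulting entry would become its own YAML definition with a single-type pointer payload, which is the form the current validation accepts.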

Comment thread structure/extension/_EVID.yaml Outdated
Comment on lines +15 to +16
https://github.com/glamberson/gedcom-evidence/_CONF: '{0:1}'
https://gedcom.io/terms/v7/NOTE: '{0:M}'
Collaborator

Shouldn't this list include _ID and _FIND? They both list _EVID as a superstructure
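A mismatch like this one can be caught mechanically. The following is a minimal sketch, assuming the definitions have been parsed into a dict keyed by URI; `missing_substructure_entries` is a hypothetical helper, not part of registry_tools, and the cardinalities shown for _ID and _FIND are placeholders.

```python
def missing_substructure_entries(defs: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (parent, child) URI pairs where the child names the parent as a
    superstructure but the parent does not list the child as a substructure."""
    problems = []
    for child_uri, child in defs.items():
        for parent_uri in child.get("superstructures") or {}:
            parent = defs.get(parent_uri)
            if parent is None:
                continue  # parent is defined elsewhere, e.g. a standard structure
            if child_uri not in (parent.get("substructures") or {}):
                problems.append((parent_uri, child_uri))
    return sorted(problems)


BASE = "https://github.com/glamberson/gedcom-evidence/"
defs = {
    BASE + "_EVID": {
        "substructures": {
            BASE + "_CONF": "{0:1}",
            "https://gedcom.io/terms/v7/NOTE": "{0:M}",
        }
    },
    BASE + "_CONF": {"superstructures": {BASE + "_EVID": "{0:1}"}},
    # Cardinalities below are placeholders, not taken from the PR.
    BASE + "_ID": {"superstructures": {BASE + "_EVID": "{0:1}"}},
    BASE + "_FIND": {"superstructures": {BASE + "_EVID": "{0:1}"}},
}
mismatches = missing_substructure_entries(defs)
```

With the data above, the check reports _FIND and _ID as missing from _EVID's substructure list, matching the observation in this comment.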

Comment thread structure/extension/_EVID.yaml Outdated
extension tags: _EVID
label: Evidence Reference
lang: en-US
payload: XREF
Collaborator

The value of the payload for a pointer should be the URI it points to, like @<https://github.com/glamberson/gedcom-evidence/_RACT>@ (but replacing that with the correct URI; I'm not sure what it's pointing to).

Comment thread structure/extension/_CONF.yaml Outdated
extension tags: _CONF
label: Evidence Confidence
lang: en-US
payload: https://github.com/glamberson/gedcom-evidence/enumset-Confidence
Collaborator

Suggested change
- payload: https://github.com/glamberson/gedcom-evidence/enumset-Confidence
+ payload: https://gedcom.io/terms/v7/type-Enum
+ enumeration set: https://github.com/glamberson/gedcom-evidence/enumset-Confidence

Comment thread structure/extension/_CONC.yaml Outdated
extension tags: _CONC
label: Evidence Conclusion
lang: en-US
payload: '@<https://github.com/glamberson/gedcom-evidence/_EVID>@'
Collaborator

Because _EVID has a superstructure (is not a record) it cannot be pointed to.
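The rule stated here (only records, meaning structures with no superstructures, can be pointer targets) can be checked with a short Python sketch. It assumes definitions parsed into a dict keyed by URI; `non_record_pointer_targets` is a hypothetical name, and the superstructure shown for _EVID is a placeholder, since the thread does not say which structure contains it.

```python
import re

# Matches GEDCOM registry pointer payloads of the form @<URI>@.
POINTER = re.compile(r"@<([^>]+)>@")


def non_record_pointer_targets(defs: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (pointer, target) URI pairs where the target has a superstructure
    and therefore is not a record that can be pointed to."""
    problems = []
    for uri, doc in defs.items():
        payload = doc.get("payload") or ""
        for target in POINTER.findall(payload):
            target_doc = defs.get(target)
            if target_doc is not None and target_doc.get("superstructures"):
                problems.append((uri, target))
    return problems


BASE = "https://github.com/glamberson/gedcom-evidence/"
defs = {
    BASE + "_CONC": {"payload": f"@<{BASE}_EVID>@", "superstructures": {}},
    # Placeholder superstructure: the actual parent of _EVID is not given here.
    BASE + "_EVID": {"payload": "XREF", "superstructures": {BASE + "_PARENT": "{0:M}"}},
}
bad_pointers = non_record_pointer_targets(defs)
```

Targets that are not present in the dict (for example standard records such as record-INDI) are skipped rather than flagged, since their definitions live elsewhere in the registry.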

https://gedcom.io/terms/v7/SNOTE: '{0:M}'
https://gedcom.io/terms/v7/UID: '{0:M}'
https://github.com/glamberson/gedcom-evidence/_RACT: '{0:M}'
superstructures: []
Collaborator

_CONC and _ID and _RACT all list _RDOC as a substructure, so should this list them as superstructures?

Or is the intent to have those structures have substructures that point to an _RDOC? If so, that needs two URIs: one for the record (this one) and another for the pointer structure (compare pointer https://gedcom.io/terms/v7/SNOTE and record https://gedcom.io/terms/v7/record-SNOTE)


type: enumeration set

uri: https://github.com/glamberson/gedcom-evidence/enumset-Confidence
Collaborator

This seems like a duplicate of enumeration-set/extension/enumset-Confidence.yaml.

---
lang: en-US
type: enumeration
uri: https://github.com/glamberson/gedcom-evidence/enum-High
Collaborator

This seems like a duplicate of enumeration/extension/enum-High.yaml.
I think all the files in enumeration/gedcom-evidence can be removed from this PR.

Comment thread fix_yaml_format.py
@@ -0,0 +1,87 @@
#!/usr/bin/env python3
Collaborator

shouldn't have python files in the root of this repo.
If you want to contribute this (which seems useful to me), whether in this PR or in a separate PR since it's not specific to the gedcom-evidence extension, put it under registry_tools.

Comment thread gedcom-evidence/VERSION
@@ -0,0 +1 @@
0.1.5
Collaborator

seems like this file doesn't belong in this registry

@@ -0,0 +1,35 @@
%YAML 1.2
Collaborator

Current convention is that records go under structure (hence under structure/extension). Standard records are structure/standard/record-*.yaml.

Comment on lines +19 to +24
payload: "@<https://gedcom.io/terms/v7/record-INDI>@ | @<https://gedcom.io/terms/v7/record-FAM>@ | @<https://gedcom.io/terms/v7/record-SOUR>@"

substructures:
"https://gedcom.io/terms/v7/NOTE": "{0:M}"
"https://gedcom.io/terms/v7/QUAY": "{0:1}"
"https://github.com/glamberson/gedcom-evidence/_RDOC": "{0:M}"
Collaborator

In https://github.com/glamberson/gedcom-evidence/blob/master/specification.md the _ID structure is listed with a text payload and no substructures. I'm curious which one is more correct.

Collaborator Author

Ugh, Luther! I broke the rules. OK but I do have a path forward. This current mess was created because I kept thinking I was just cleaning up formatting and lost track of what I was doing (I won't even mention someone named Claude). What I want to do is retract this submission and resubmit after getting my sanity back. I have the right model to make a semblance of this work. It'll be a top level entity and a shadow reference. I doubt it'll be today, though. I'm really sorry for wasting your time on my mistakes. I'll get it together quickly, though! - Greg

Collaborator

No time was wasted! We benefited greatly from seeing how someone not on the steering committee put together a set of YAML files, and had several interesting and fruitful discussions along the way. Also, the data these files are leading toward look very interesting to me! I look forward to your revised submission.

@glamberson
Collaborator Author

Realizing my violations of GEDCOM and my increasingly sloppy submission, I'll retract it and resubmit with a top level entity and shadow architecture. I apologize for wasting everyone's time with shoddy work. Give me a day or two to get it right! Thanks! - Greg Lamberson lamberson@yahoo.com
