Add gedcom-evidence extension v0.1.3#178
Conversation
Enables 'floating evidence' that can exist independently of identity conclusions, solving the 'Which John Smith?' problem that has plagued genealogy software for 15+ years. Key features: - Evidence containers preserve original source information - Gradual identity refinement as research progresses - Supports the Genealogical Proof Standard Repository: https://github.com/glamberson/gedcom-evidence
tychonievich
left a comment
There was a problem hiding this comment.
A few of these files are correctly formatted (e.g. enumeration-set/extension/enumset-Confidence.yaml), but many (e.g. enumeration/gedcom-evidence/enum-Hypothesis.yaml) lack the indentation that is required by YAML
- Add proper 2-space indentation for all list items as required by YAML spec - Addresses review feedback from @tychonievich - Updates gedcom-evidence to v0.1.4
|
@tychonievich Thank you for the review! I've fixed the YAML indentation issues in all four enumeration files (enum-High.yaml, enum-Medium.yaml, enum-Low.yaml, and enum-Hypothesis.yaml). All list items now have proper 2-space indentation as required by the YAML specification. The changes have been pushed and should be ready for re-review. I've also updated the gedcom-evidence extension to v0.1.4 with these fixes. If there are any other formatting issues or concerns, please let me know. Greg Lamberson |
I'm still getting validation errors for missing indentation (https://github.com/FamilySearch/GEDCOM-registries/actions/runs/16598922334/job/46955973713?pr=178 has error messages like |
- Convert 'extension tags' from string to array format - Add proper YAML indentation for all list items - Quote payloads containing @ symbols - Fix 'type: record' to 'type: structure' (record is not a valid type) - Fix missing empty objects for superstructures - Ensure proper spacing between sections All files now pass validation with registry_tools/validator.py
|
@tychonievich Luther, Sorry about that! I've thoroughly fixed all the YAML validation problems:
I ran all files through registry_tools/validator.py locally and they all pass now (14 files checked, 14 passed). Note: I've updated the gedcom-evidence extension to v0.1.5 to reflect all these validation fixes. Really appreciate your patience with this. The validation workflow seems to have run successfully this time - let me |
|
Still seeing some YAML validation errors. Sorry YAML is so tricky! |
tychonievich
left a comment
There was a problem hiding this comment.
There are several places that the YAML suggests a structure that either isn't supported by many GEDCOM tools (like variable-type pointers or pointers to substructures), is under-specified (XREF without a pointed-to type), or seems either incomplete or contradictory (superstructure/substructure sets not matching).
I'll be happy to help clean up the YAML once I have time to finish reading through the specification (which I think is located at https://github.com/glamberson/gedcom-evidence/blob/master/specification.md, correct me if that's the wrong link)
| extension tags: _RDOC | ||
| label: Research Documentation | ||
| lang: en-US | ||
| payload: XREF |
There was a problem hiding this comment.
The value of the payload for a pointer should be the URI it points to, like @<https://github.com/glamberson/gedcom-evidence/_RACT>@ (but replacing that with the correct URI, I'm not sure what it's pointing to.
| extension tags: _ID | ||
| label: Evidence Identifier | ||
| lang: en-US | ||
| payload: '@<https://gedcom.io/terms/v7/record-INDI>@ | @<https://gedcom.io/terms/v7/record-FAM>@ | @<https://gedcom.io/terms/v7/record-SOUR>@' |
There was a problem hiding this comment.
We don't currently have pointers that can point to multiple types of records in the spec, so this won't pass YAML validation. The workaround I know of now is replacing _ID with three different structures (perhaps _IDI, _IDF, and _IDS), each pointing to just one type of thing. Reverse pointers (from INDI to _ID) won't work because we also don't allow pointers to substructures.
As an aside, I have proposed before being loser with identifiers, but because many software implementations use relational databases having polymorphic pointers or non-record pointers can be problematic form implementers and don't seem to be likely to be added anytime soon.
| https://github.com/glamberson/gedcom-evidence/_CONF: '{0:1}' | ||
| https://gedcom.io/terms/v7/NOTE: '{0:M}' |
There was a problem hiding this comment.
Shouldn't this list include _ID and _FIND? They both list _EVID as a superstructure
| extension tags: _EVID | ||
| label: Evidence Reference | ||
| lang: en-US | ||
| payload: XREF |
There was a problem hiding this comment.
The value of the payload for a pointer should be the URI it points to, like @https://github.com/glamberson/gedcom-evidence/_RACT@ (but replacing that with the correct URI, I'm not sure what it's pointing to.
| extension tags: _CONF | ||
| label: Evidence Confidence | ||
| lang: en-US | ||
| payload: https://github.com/glamberson/gedcom-evidence/enumset-Confidence |
There was a problem hiding this comment.
| payload: https://github.com/glamberson/gedcom-evidence/enumset-Confidence | |
| payload: https://gedcom.io/terms/v7/type-Enum | |
| enumeration set: https://github.com/glamberson/gedcom-evidence/enumset-Confidence |
| extension tags: _CONC | ||
| label: Evidence Conclusion | ||
| lang: en-US | ||
| payload: '@<https://github.com/glamberson/gedcom-evidence/_EVID>@' |
There was a problem hiding this comment.
Because _EVID has a superstructure (is not a record) it cannot be pointed to.
| https://gedcom.io/terms/v7/SNOTE: '{0:M}' | ||
| https://gedcom.io/terms/v7/UID: '{0:M}' | ||
| https://github.com/glamberson/gedcom-evidence/_RACT: '{0:M}' | ||
| superstructures: [] |
There was a problem hiding this comment.
_CONC and _ID and _RACT all list _RDOC as a substructure, so should this list them as superstructures?
Or is the intent to have those structures have substructures that point to an _RDOC? If so, that needs two URIs: one for the record (this one) and another for the pointer structure (compare pointer https://gedcom.io/terms/v7/SNOTE and record https://gedcom.io/terms/v7/record-SNOTE)
|
|
||
| type: enumeration set | ||
|
|
||
| uri: https://github.com/glamberson/gedcom-evidence/enumset-Confidence |
There was a problem hiding this comment.
This seems like a duplicate of enumeration-set/extension/enumset-Confidence.yaml.
| --- | ||
| lang: en-US | ||
| type: enumeration | ||
| uri: https://github.com/glamberson/gedcom-evidence/enum-High |
There was a problem hiding this comment.
This seems like a duplicate of enumeration/extension/enum-High.yaml.
I think all the files in enumeration/gedcom-evidence can be removed from this PR.
| @@ -0,0 +1,87 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
shouldn't have python files in the root of this repo.
If you want to contribute this (which seems useful to me), whether in this PR or in a separate PR since it's not specific to the gedcom-evidence extension, put it under registry_tools.
| @@ -0,0 +1 @@ | |||
| 0.1.5 | |||
There was a problem hiding this comment.
seems like this file doesn't belong in this registry
| @@ -0,0 +1,35 @@ | |||
| %YAML 1.2 | |||
There was a problem hiding this comment.
Current convention is that records go under structure (hence under structure/extension). Standard records are structure/standard/record-*.yaml.
| payload: "@<https://gedcom.io/terms/v7/record-INDI>@ | @<https://gedcom.io/terms/v7/record-FAM>@ | @<https://gedcom.io/terms/v7/record-SOUR>@" | ||
|
|
||
| substructures: | ||
| "https://gedcom.io/terms/v7/NOTE": "{0:M}" | ||
| "https://gedcom.io/terms/v7/QUAY": "{0:1}" | ||
| "https://github.com/glamberson/gedcom-evidence/_RDOC": "{0:M}" |
There was a problem hiding this comment.
In https://github.com/glamberson/gedcom-evidence/blob/master/specification.md the _ID structure is listed with a text payload and no substructures. I'm curious which one is more correct.
There was a problem hiding this comment.
Ugh, Luther! I broke the rules. OK but I do have a path forward. This current mess was created because I kept thinking I was just cleaning up formatting and lost track of what I was doing (I won't even mention someone named Claude). What I want to do is retract this submission and resubmit after getting my sanity back. I have the right model to make a semblance of this work. It'll be a top level entity and a shadow reference. I doubt it'll be today, though. I'm really sorry for wasting your time on my mistakes. I'll get it together quickly, though! - Greg
There was a problem hiding this comment.
No time was wasted! We benefited greatly from seeing how someone not on the steering committee put together a set of YAML files, and had several interesting and fruitful discussions along the way. Also, the data these files are leading toward look very interesting to me! I look forward to your revised submission.
|
Realizing my violations of GEDCOM and my increasingly sloppy submission, I'll retract it and resubmit with a top level entity and shadow architecture. I apologize for wasting everyone's time with shoddy work. Give me a day or two to get it right! Thanks! - Greg Lamberson lamberson@yahoo.com |
Summary
This PR adds the gedcom-evidence extension to the GEDCOM registry.
Note: This was previously part of PR #173 which has been split at reviewer request.
Extension Overview
Purpose: Enables "floating evidence" that can exist independently of identity conclusions, solving the "Which John Smith?" problem that has plagued genealogy software for 15+ years.
Repository: https://github.com/glamberson/gedcom-evidence
Version: 0.1.5 (includes all YAML validation fixes)
Status: Stable (not prerelease)
Key Features
Problem Solved
Current genealogy software forces users to make premature conclusions about identity. When you find a record mentioning "John Smith," you must immediately decide WHICH John Smith it refers to, potentially losing the original evidence if your conclusion proves wrong.
This extension allows evidence to "float" until you determine through research which person it describes.
YAML Structure Updates (v0.1.5)
Testing
Background
Based on 15 years of community discussion, particularly insights from the BetterGEDCOM project (2010-2013) which identified evidence/conclusion separation as critical for genealogy data exchange.
Closes second part of #173