Use .arg instead of .trees #3152
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage Diff:

| | main | #3152 |
|---|---|---|
| Coverage | 89.58% | 89.58% |
| Files | 28 | 28 |
| Lines | 31841 | 31841 |
| Branches | 5849 | 5849 |
| Hits | 28524 | 28524 |
| Misses | 1887 | 1887 |
| Partials | 1430 | 1430 |
Flags with carried forward coverage won't be shown.
jeromekelleher left a comment
I'm +1 on this, but we should document the change clearly somewhere (probably in several places), and make sure that there's wide support in the community before merging.
is not enforced in any way), and we will sometimes refer to them as ".trees"
By convention, these files are given the `.arg` suffix (although this
is not enforced in any way), and we will sometimes refer to them as ".arg"
files. We also refer to them as "tree sequence files".
We should add a "note" callout here saying that we changed the convention from .trees to .arg, with some reassurance that you can keep any .trees or .ts files if you want, because it makes no actual difference.
Were there any references to a ".ts" file?
I guess we'd want to scan downstream repos like msprime and tszip etc. to make sure the docs use this new convention (I wouldn't bother changing the actual code in test suites or anything, though).
Hmm. This should be discussed before being merged, yes? @petrelharp you'll probably be interested. I see several caveats/objections here:
My preference is pretty strongly for .trees. I think it's descriptive and specific in a way that .arg isn't; I don't think changing the filename suffix we use is really going to make much difference to how people perceive tskit and tree sequences; and I don't really want to change all the places where I talk about .trees files (in the manual, in the recipes, in the workshop, including voice recordings that I'd have to re-record, etc.). This seems like a lot of work for very little payoff, or perhaps even, I think, negative payoff.
Absolutely! Thought the best way to trigger discussion was a PR to show what would be needed.
Not that I could see in this repo.
Thanks @bhaller! First:
We're definitely not going to push this through without discussion and broad agreement. A quick response on the fundamental point:
Well, we would argue that yes, it is a general ARG format. We wrote a paper making this very point at great length and detail. So, this initial reaction alone (a confusion about this basic point from someone deep in the community) makes me feel like the change is worth making. We want tskit to be seen as "the" ARG library, and this is a useful step in that direction. There is no other "general ARG format". Please do read the paper; there's a lot in there which I don't really want to rehearse here. At the end of the day, this is just a change in the suggested convention used for naming files. We don't have to be exclusive about it: we can just say that by convention tskit files are given the extension ".arg", ".trees" or ".ts". If you want to keep the documentation on SLiM using .trees, that's totally fine.
Great.
OK. On vacation right now, with very little time before I need to leave my hotel; can't read the paper right now. :-> But it certainly seems like there are lots of other groups proposing lots of other file formats for information that is quite similar. It seems presumptuous to say "WE are _the_ ARG library, forevermore". Maybe we'd like to establish ourselves as being that, sure; that's a goal to aspire to. But simply taking the crown for ourselves seems like it might rub many people the wrong way.
But then that's just confusing. People will constantly ask "so, what's the difference between these different file formats?" Apart from ".JPEG" vs. ".jpeg" vs. ".jpg" (which is already annoying and confusing), I can't think of a case where a single file format on disk is given multiple distinct file extensions that actually all mean the same thing. I think that's a recipe for confusion. And I don't think it really means that I won't need to change workshop materials, re-record lectures, etc., because I'll need to change my materials to try to avoid that confusion.
For now, gotta go, will check it again this evening.
Please do read the paper, Ben; there's an extensive case built up to address exactly the points you're raising here. There's no hurry in responding; we're not going to merge.
One thing to note is that ARGweaver outputs a ".arg" file with a different format, which could be confusing, especially if anyone is converting between tools.
I don't have strong feelings here as long as a (specific) suffix is not a requirement. I'd probably continue to use The issue that @kitchensjn brings up about what to do with outputs from multiple tools in the arg-o-sphere is a real problem.
I have used |
I wasn't initially compelled by @bhaller's basic point. However, thinking more, I think a related point is that the name Nonetheless, I still kinda like the proposal, since as a relatively small field we want to actually settle on a single format and not have everyone inventing their own. Plus, the way tskit stores things is closer to "ARG" than it is to "sequence of trees". Another option would be to come up with a suffix like maybe
As we didn't have consensus on this PR, I'll close it. Thanks for all your input.
From a suggestion by @hyanwong on Slack. This could be advantageous as some folks don't realise tskit is an ARG library.