feat(file-mode-api): add filename extractor component #453
Maxime Carbonneau-Leclerc (maxi297)
left a comment
I have a concern about file naming. Can you add more context to this? I would like to make sure we avoid collisions
    relative_path = relative_path.lstrip("/")
    file_relative_path = Path(relative_path)

    full_path = files_directory / file_relative_path
Should we have the stream name somewhere in there? It feels like multiple streams could have a file with the same name.
Even more than this, should we have a unique ID per file? It feels like there could even be two files in the same stream with the same name...
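To make the concern concrete, here is a minimal sketch of how the path construction in the diff could collide. The `build_full_path` helper, the `/tmp/files` directory and the example records are hypothetical; only the three path operations are taken from the lines under review.

```python
from pathlib import Path


def build_full_path(files_directory: Path, relative_path: str) -> Path:
    # Mirrors the lines under review: strip leading slashes, then join.
    relative_path = relative_path.lstrip("/")
    file_relative_path = Path(relative_path)
    return files_directory / file_relative_path


files_directory = Path("/tmp/files")  # hypothetical base directory

# Two records from different (hypothetical) streams that share a relative path.
articles_file = build_full_path(files_directory, "/attachments/report.pdf")
tickets_file = build_full_path(files_directory, "attachments/report.pdf")

# Both resolve to the same location, so the second write would replace the first.
collision = articles_file == tickets_file
```

Nothing in the resulting path identifies the stream or the individual file, which is why a stream name or unique ID in the path would help.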
Well, for Zendesk support, we do actually, e.g.:
filename_extractor: "{{ record.relative_path }}/{{ record.file_name }}/"
Interpolates as:
hc/article_attachments/"attachments_id"/"name_of_the_file.extension"
This works for this specific endpoint in Zendesk, but I can see it is not guaranteed for every future connector. So I guess we can let the user add any extra path, but have the component prefix the path with the stream name and the attachment/file ID.
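As a rough illustration of the interpolation described above, the sketch below uses a tiny hypothetical `render` function as a stand-in for the real template engine. The record values are made up, and the assumption that `relative_path` already contains the attachment ID is inferred from the interpolated example, not confirmed.

```python
import re


def render(template: str, record: dict) -> str:
    # Hypothetical stand-in for the real template engine: replaces
    # "{{ record.<key> }}" placeholders with values from the record.
    return re.sub(
        r"\{\{\s*record\.(\w+)\s*\}\}",
        lambda match: str(record[match.group(1)]),
        template,
    )


# Made-up record shaped like the Zendesk example (assumed field contents).
record = {
    "relative_path": "hc/article_attachments/12345",
    "file_name": "manual.pdf",
}

rendered = render("{{ record.relative_path }}/{{ record.file_name }}/", record)
# The attachment ID inside relative_path is what keeps this path unique.
```

With a record that lacks an ID in `relative_path`, the same template would give identical paths for identically named files.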
Ok, so what we are saying is that it is the developer's responsibility to make sure there are no clashes. Could we take this concern off the developer and handle it ourselves?
Regarding timing: I'm not 100% sure we need this right now, and maybe we can make filename_extractor optional in the future once we find a solution to this. Off the top of my head, I can only see one way: relying on the stream declaring a PK. That seemed to be common when I checked Confluence, Jira and Salesforce, so maybe this is viable in the future.
so what we are saying is that it is the developer's responsibility to make sure there are no clashes. Could we take this concern off the developer and handle it ourselves?
No, I didn't make myself clear. I'm sorry about that. To reduce the risk of collisions, I will add the stream name + unique ID on the backend (CDK).
What is the logic for the unique ID? Autogenerated UUID?
I think we can:
- Add the stream name to the path ourselves, reducing the collision risk.
- Make filename_extractor optional so the developer can include a unique ID. There is a risk that they get it wrong, but we can add some documentation to the component.
- Use an autogenerated UUID if filename_extractor is not present.
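The three points above could be sketched roughly as follows. This is only an illustration under assumptions: `build_file_path`, its parameters and the example values are hypothetical and not the CDK's actual implementation.

```python
import uuid
from pathlib import Path
from typing import Optional


def build_file_path(
    files_directory: Path,
    stream_name: str,
    extracted_filename: Optional[str] = None,
) -> Path:
    # Use the developer-provided value when present (hypothetical parameter
    # standing in for the rendered filename_extractor).
    relative = (extracted_filename or "").lstrip("/")
    if not relative:
        # No usable filename: fall back to an autogenerated UUID.
        relative = str(uuid.uuid4())
    # Always prefix with the stream name so streams cannot overwrite each other.
    return files_directory / stream_name / relative


base = Path("/tmp/files")  # hypothetical base directory
with_name = build_file_path(base, "article_attachments", "/12345/manual.pdf")
first = build_file_path(base, "article_attachments")
second = build_file_path(base, "article_attachments")
```

Here `with_name` keeps the developer-chosen path under the stream directory, while `first` and `second` get distinct UUID-based names. Collisions within a stream remain possible if the developer-provided value is not unique.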
Maxime Carbonneau-Leclerc (maxi297)
left a comment
Accepting under the premise that we are fine with the connector developer ensuring there are no file collisions for now.
Maxime Carbonneau-Leclerc (@maxi297) This is still true with slight modifications that we can refine in the future:
Merged commit 68480b7 into aldogonzalez8/poc-emit-file-reference-record
Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/12196