Assume that TESK is deployed with a config file that sets the output endpoint to some s3 instance, given as an http or https URL:
[default]
endpoint_url=http://some.endpoint.com
Then in the job json the "url" for outputs is set to the following:
# This works!
"url": "s3://output",
The s3 scheme means that "output" gets treated as the bucket name.
# These all fail!
"url": "s3://outputs3",
"url": "s3://s3output",
# Here is a less contrived example, showing how this can happen even when you don't intentionally use "s3" to mean "s3"
"url": "s3://shoulders3486output",
The s3 scheme is detected, but because the bucket name also contains "s3", it falsely triggers this regex:
    match = re.search('^([^.]+).s3', self.netloc)
This mangles the bucket name, leading to a bucket not found error.
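To make the mangling concrete, here is a minimal sketch that runs the quoted pattern against the netloc values from the examples above. It only exercises the regex in isolation. `extract_bucket` is a hypothetical helper written for this illustration, not a function in tesk-core, and the surrounding TESK logic is not reproduced.

```python
import re

# Pattern copied verbatim from the issue. The "." before "s3" is unescaped,
# so it matches any single character, not only a literal dot.
BUCKET_PATTERN = re.compile('^([^.]+).s3')


def extract_bucket(netloc):
    """Hypothetical helper: return the captured group, or None when there is no match."""
    match = BUCKET_PATTERN.search(netloc)
    return match.group(1) if match else None


# Netloc values taken from the s3:// URLs in the examples above.
for netloc in ["output", "outputs3", "shoulders3486output", "s3output"]:
    print(netloc, "->", extract_bucket(netloc))
```

With this pattern, "outputs3" is captured as "outpu" and "shoulders3486output" as "shoulde", which would explain a bucket not found error. In isolation the pattern does not match "s3output" at all, so this sketch does not explain that particular failure; it presumably arises elsewhere in the code path.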
But we can trick it...
# This works!
"url": "http://s3.foo.bar.baz/shoulders3486output",
HTTP is detected as the scheme, but the netloc part of the url contains "s3", so it is treated as s3 due to this logic:
    if 's3' in netloc:
        return S3Transput
The bucket name is now part of the URL "path" not the URL "netloc", so it doesn't get mangled.
The netloc part, s3.foo.bar.baz, is never actually used other than to detect whether it's an s3 transfer or an http transfer.
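The difference between the two URL forms can be seen by splitting them with Python's standard `urllib.parse.urlparse`. This is a sketch under the assumption that the netloc and path are derived in a comparable way; `describe` is a hypothetical helper for illustration and not part of tesk-core.

```python
from urllib.parse import urlparse


def describe(url):
    """Hypothetical helper: split a URL and apply the substring check quoted above."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        # Same substring test as the quoted "if 's3' in netloc" logic.
        "netloc_contains_s3": "s3" in parts.netloc,
    }


# s3:// form: the bucket name lands in the netloc.
print(describe("s3://shoulders3486output"))
# http:// workaround: the bucket name lands in the path instead.
print(describe("http://s3.foo.bar.baz/shoulders3486output"))
```

In the s3:// form the bucket name is the netloc, which is where the regex is applied. In the http:// form the netloc is "s3.foo.bar.baz" and the bucket name sits in the path as "/shoulders3486output", out of the regex's reach.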
The snippets quoted above come from tesk-core at commit 1a7b810:
tesk-core/src/tesk_core/filer_s3.py, line 64 (the regex)
tesk-core/src/tesk_core/filer.py, lines 416 to 417 (the 's3' in netloc check)