update hadoop to more recent version

The Hadoop version currently in use (2.7.3) is too old: it was released back in August 2016. One consequence is that its s3a filesystem layer does not support the `fs.s3a.path.style.access` property, which was only added in Hadoop 2.8.0, so the S3-compatible object store must be configured with virtual hosting for buckets. That, in turn, is not supported on OpenShift Container Storage 4.4 (or at least it is not enabled by default, and how to configure it is not properly documented).
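As a rough sketch of the version constraint, the check below assumes that `fs.s3a.path.style.access` first shipped in Hadoop 2.8.0. The helper names (`parse_version`, `supports_path_style_access`) are hypothetical and are not part of Hadoop, Spark, or the tutorial.

```python
# Assumption: path-style access support for s3a arrived in Hadoop 2.8.0.
PATH_STYLE_MIN_VERSION = (2, 8, 0)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '2.7.3' into a comparable tuple (2, 7, 3)."""
    # Drop any suffix such as '-SNAPSHOT' before splitting on dots.
    numeric = version.split("-")[0]
    parts = [int(p) for p in numeric.split(".")]
    # Pad short versions ('2.8' -> (2, 8, 0)) so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_path_style_access(hadoop_version: str) -> bool:
    """Return True if this Hadoop version should honour fs.s3a.path.style.access."""
    return parse_version(hadoop_version) >= PATH_STYLE_MIN_VERSION


print(supports_path_style_access("2.7.3"))  # the version this issue is about
print(supports_path_style_access("2.8.0"))
```

In a running notebook, the version string could probably be read through the JVM gateway, for example with `spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()`, assuming a `spark` session is available.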

That means the Spark + Object Store example in the [basic tutorial](https://opendatahub.io/docs/getting-started/basic-tutorial.html) won't work on the latest OpenShift Container Platform (4.4) with OpenShift Container Storage.

Despite having the following settings:
```
s3_endpoint_url = 'https://s3.openshift-storage.svc:443'
s3_bucket = 'odh-jupyterhub-9654ef69-1f36-48f1-b50f-4d2dbef1357d'

hadoopConf.set("fs.s3a.path.style.access", "true")
```
the code from the tutorial raises an exception while trying to connect to the bucket via a virtual host (http://bucket.s3endpoint/ instead of https://s3endpoint/bucket/):
```
Py4JJavaError: An error occurred while calling o96.csv.
: com.amazonaws.AmazonClientException: Unable to execute HTTP request: odh-jupyterhub-9654ef69-1f36-48f1-b50f-4d2dbef1357d.s3.openshift-storage.svc: Name or service not known
```
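The difference between the two addressing styles can be sketched in plain Python. This is only an illustration of how the URLs are shaped, not the code the AWS SDK or s3a uses internally, and `bucket_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def bucket_url(endpoint_url: str, bucket: str, path_style: bool) -> str:
    """Build the base URL of a bucket in either addressing style."""
    parts = urlsplit(endpoint_url)
    if path_style:
        # Path style: the host stays as configured, the bucket goes into the path.
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/", "", ""))
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host,
    # which then has to be resolvable through DNS.
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", "/", "", ""))


s3_endpoint_url = "https://s3.openshift-storage.svc:443"
s3_bucket = "odh-jupyterhub-9654ef69-1f36-48f1-b50f-4d2dbef1357d"

virtual_hosted = bucket_url(s3_endpoint_url, s3_bucket, path_style=False)
path_style = bucket_url(s3_endpoint_url, s3_bucket, path_style=True)

print(virtual_hosted)
print(path_style)
```

The host name in the virtual-hosted URL is the one that fails to resolve in the exception above, because the cluster has no DNS entry for a per-bucket subdomain of the `s3.openshift-storage.svc` service.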



update hadoop to more recent version #109
