---
title: Open Science Data Federation
---
<figure class="w-100 figure">
<iframe width="100%" height="500px" frameBorder="0" style="margin-top:1em" src="https://map.opensciencegrid.org/map/iframe?view=XRootD#38.61687,-97.86621|4|hybrid"></iframe>
<figcaption>Map of the United States showing the locations of current OSDF <a href="https://opensciencegrid.org/docs/data/stashcache/overview/#architecture">architectural components</a>.</figcaption>
</figure>
<h3 id="what-is-the-open-science-data-federation">What is the Open Science Data Federation?</h3>
<div class="p-2 bg-light">
<p>
The Open Science Data Federation (OSDF) is an OSG service that enables users and institutions to
make datasets available to distributed high-throughput computing (dHTC) environments such as the
<a href="/about/open_science_pool">Open Science Pool</a> (OSPool). The OSDF gives execution points remote
access to data through a global namespace and a set of <a href="https://opensciencegrid.org/docs/data/stashcache/overview/#architecture"><em>data caches</em></a>, which retrieve data stored in
<a href="https://opensciencegrid.org/docs/data/stashcache/overview/#architecture"><em>data origins</em></a>.
</p>
<p>
Because the caches provide a distributed data access layer, jobs running in the OSPool (or any
other resource pool) can reduce wide-area network consumption, load on the data origins, and the latency
of data access.
</p>
</div>
<h3 id="who-can-use-the-osdf">Who can use the OSDF?</h3>
<div class="p-2 bg-light">
<p>
Any US-based academic, government, or non-profit institution may operate a data origin to export its
users’ data. Researchers with an OSG Connect account may also use the origin at the <a href="https://connect.osg-htc.org">OSG Connect</a>
access point.
</p>
<p>
To learn how to access your data through the OSDF, please reach out to our team of Research Computing Facilitators
through <a href="mailto:support@opensciencegrid.org">support@opensciencegrid.org</a>.
</p>
</div>
<h3 id="who-can-access-my-data">Who can access my data?</h3>
<div class="p-2 bg-light">
<p>
Access control in the OSDF is managed by each <a href="https://opensciencegrid.org/docs/data/stashcache/overview/#architecture"><em>origin</em></a> service. An origin can be configured to make data public
or private and to set its own sharing rules. For example, the OSG Connect service lets users
make their data accessible to everyone (public) or only to their own jobs.
</p>
<p>
Data may also be visible to the administrators of the <a href="https://opensciencegrid.org/docs/data/stashcache/overview/#architecture"><em>cache</em></a> services and to the execution point where
the job runs. Non-public data is encrypted in transit over the network, but it is not encrypted on disk.
</p>
</div>
<h3 id="who-manages-the-osdf">Who manages the OSDF?</h3>
<div class="p-2 bg-light">
<p>
Origins on the OSDF are generally managed by the projects or institutions that own the underlying storage. Additionally, the
<a href="https://path-cc.io/">PATh project</a> (which funds many core OSG technologies and services) can host origins on behalf of
an organization; these PATh-hosted origins are still owned by, and configured specifically for, that organization, as described
<a href="{{ '/about/osdf/deploying_an_osdf_origin.html' | relative_url }}">here</a>.
</p>
<p>
The caches are largely managed by OSG staff, who operate the services remotely. Some caches are dedicated to a specific
experiment; for example, several caches serve only the <a href="https://www.ligo.caltech.edu/">LIGO</a> experiment. The cache
hardware is distributed throughout the US, including points of presence in the <a href="https://internet2.edu/">Internet2</a>
and <a href="https://www.es.net/">ESnet</a> networks and university facilities such as UW-Madison,
Chicago, Syracuse, UCSD, and Nebraska.
</p>
</div>