Commit ffc105d

Merge pull request IQSS#7729 from IQSS/7400-opendp-download
Auxiliary file download
2 parents 3257cd3 + cf7b6d1 commit ffc105d

23 files changed

Lines changed: 738 additions & 132 deletions

conf/docker-aio/run-test-suite.sh

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ fi
 
 # Please note the "dataverse.test.baseurl" is set to run for "all-in-one" Docker environment.
 # TODO: Rather than hard-coding the list of "IT" classes here, add a profile to pom.xml.
-source maven/maven.sh && mvn test -Dtest=DataversesIT,DatasetsIT,SwordIT,AdminIT,BuiltinUsersIT,UsersIT,UtilIT,ConfirmEmailIT,FileMetadataIT,FilesIT,SearchIT,InReviewWorkflowIT,HarvestingServerIT,MoveIT,MakeDataCountApiIT,FileTypeDetectionIT,EditDDIIT,ExternalToolsIT,AccessIT,DuplicateFilesIT,DownloadFilesIT,LinkIT,DeleteUsersIT,DeactivateUsersIT -Ddataverse.test.baseurl=$dvurl
+source maven/maven.sh && mvn test -Dtest=DataversesIT,DatasetsIT,SwordIT,AdminIT,BuiltinUsersIT,UsersIT,UtilIT,ConfirmEmailIT,FileMetadataIT,FilesIT,SearchIT,InReviewWorkflowIT,HarvestingServerIT,MoveIT,MakeDataCountApiIT,FileTypeDetectionIT,EditDDIIT,ExternalToolsIT,AccessIT,DuplicateFilesIT,DownloadFilesIT,LinkIT,DeleteUsersIT,DeactivateUsersIT,AuxiliaryFilesIT -Ddataverse.test.baseurl=$dvurl
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+Auxiliary Files can now be downloaded from the web interface.
+
+- Aux files uploaded as type=DP appear under "Differentially Private Statistics" under file level download. The rest appear under "Other Auxiliary Files".
+
+In addition, related changes were made, including the following:
+
+- New tooltip over the lock indicating if you have been granted access to a restricted file or not.
+- When downloading individual files, you will see "Restricted with Access Granted" or just "Restricted" (followed by "Users may not request access to files.") as appropriate.
+- When downloading individual files, instead of "Download" you should expect to see the file type such as "JPEG Image" or "Original File Format" if the type is unknown.
+- Downloaded aux files now have a file extension if it can be determined.
+
+Please note that the auxiliary files feature is experimental and if you don't need it, its API endpoints can be blocked.

doc/sphinx-guides/source/developers/aux-file-support.rst

Lines changed: 5 additions & 4 deletions
@@ -1,11 +1,11 @@
 Auxiliary File Support
 ======================
 
-Auxiliary file support is experimental. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the OpenDP project (OpenDP.io). In future versions, this approach may become more broadly used and supported.
+Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the OpenDP project (opendp.org). In future versions, this approach will likely become more broadly used and supported.
 
 Adding an Auxiliary File to a Datafile
 --------------------------------------
-To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and the formatTag and formatVersion (if applicable) associated with the auxiliary file. There are two form parameters. "Origin" specifies the application/entity that created the auxiliary file, an "isPublic" controls access to downloading the file. If "isPublic" is true, any user can download the file, else, access authorization is based on the access rules as defined for the DataFile itself.
+To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and the formatTag and formatVersion (if applicable) associated with the auxiliary file. There are multiple form parameters. "Origin" specifies the application/entity that created the auxiliary file, and "isPublic" controls access to downloading the file. If "isPublic" is true, any user can download the file if the dataset has been published, else, access authorization is based on the access rules as defined for the DataFile itself. The "type" parameter is used to group similar auxiliary files in the UI. Currently, auxiliary files with type "DP" appear under "Differentially Private Statistics", while all other auxiliary files appear under "Other Auxiliary Files".
 
 .. code-block:: bash
 
@@ -14,9 +14,10 @@ To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and
   export FILE_ID='12345'
   export FORMAT_TAG='dpJson'
   export FORMAT_VERSION='v1'
+  export TYPE='DP'
   export SERVER_URL=https://demo.dataverse.org
 
-  curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' "$SERVER_URL/api/access/datafile/$FILE_ID/metadata/$FORMAT_TAG/$FORMAT_VERSION"
+  curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/metadata/$FORMAT_TAG/$FORMAT_VERSION"
 
 You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file.
 
@@ -33,4 +34,4 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
   export FORMAT_TAG='dpJson'
   export FORMAT_VERSION='v1'
 
-  curl "$SERVER_URL/api/access/datafile/$FILE_ID/$FORMAT_TAG/$FORMAT_VERSION"
+  curl "$SERVER_URL/api/access/datafile/$FILE_ID/metadata/$FORMAT_TAG/$FORMAT_VERSION"
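
Reviewer note: the corrected download URL above can also be exercised from Java. The following is a minimal sketch, not code from this commit. The class and method names are hypothetical, and sending the `X-Dataverse-key` header on download is an assumption for files that are not public (the documented download curl command sends no token). The request is only built here; sending it would need `java.net.http.HttpClient`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the Dataverse code base.
final class AuxFileRequestSketch {

    /** Builds (but does not send) a GET request for one auxiliary file. */
    static HttpRequest downloadRequest(String serverUrl, long fileId,
            String formatTag, String formatVersion, String apiToken) {
        // Same path shape as the documented curl command, including "/metadata/".
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/metadata/" + formatTag + "/" + formatVersion);
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
        if (apiToken != null) {
            // Assumption: the same header used for upload also authorizes download.
            builder = builder.header("X-Dataverse-key", apiToken);
        }
        return builder.build();
    }
}
```

Calling `AuxFileRequestSketch.downloadRequest("https://demo.dataverse.org", 12345L, "dpJson", "v1", null)` yields an anonymous request, which per the guide text should only succeed for public auxiliary files on published datasets.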

doc/sphinx-guides/source/user/dataset-management.rst

Lines changed: 21 additions & 0 deletions
@@ -179,6 +179,8 @@ Additional download options available for tabular data (found in the same drop-d
 - Data File Citation (currently in either RIS, EndNote XML, or BibTeX format);
 - All of the above, as a zipped bundle.
 
+Differentially Private (DP) Metadata can also be accessed for restricted tabular files if the data depositor has created a DP Metadata Release. See :ref:`dp-release-create` for more information.
+
 Astronomy (FITS)
 ----------------
 
@@ -210,6 +212,8 @@ Restricted Files
 
 When you restrict a file it cannot be downloaded unless permission has been granted.
 
+Differentially Private (DP) Metadata can be accessed for restricted tabular files if the data depositor has created a DP Metadata Release. See :ref:`dp-release-create` for more information.
+
 See also :ref:`terms-of-access` and :ref:`permissions`.
 
 Edit Files
@@ -302,6 +306,23 @@ If you restrict any files in your dataset, you will be prompted by a pop-up to e
 
 See also :ref:`restricted-files`.
 
+.. _dp-release-create:
+
+Creating and Depositing Differentially Private Metadata (Experimental)
+----------------------------------------------------------------------
+
+Through an integration with tools from the OpenDP Project (opendp.org), the Dataverse Software offers an experimental workflow that allows a data depositor to create and deposit Differentially Private (DP) Metadata files, which can then be used for exploratory data analysis. This workflow allows researchers to view the DP metadata for a tabular file, determine whether or not the file contains useful information, and then make an informed decision about whether or not to request access to the original file.
+
+If this integration has been enabled in your Dataverse installation, you can follow these steps to create a DP Metadata Release and make it available to researchers, while still keeping the files themselves restricted and able to be accessed after a successful access request.
+
+- Deposit a tabular file and let the ingest process complete
+- Restrict the File
+- In the kebab next to the file on the dataset page, or from the "Edit Files" dropdown on the file page, click "OpenDP Tool"
+- Go through the process to create a DP Metadata Release in the OpenDP tool, and at the end of the process deposit the DP Metadata Release back to the Dataverse installation
+- Publish the Dataset
+
+Once the dataset is published, users will be able to request access using the normal process, but will also have the option to download DP Statistics in order to get more information about the file.
+
 Guestbook
 ---------

doc/sphinx-guides/source/user/find-use-data.rst

Lines changed: 13 additions & 0 deletions
@@ -153,6 +153,19 @@ Explore Data
 
 Some file types and datasets offer data exploration options if external tools have been installed. The tools are described in the :doc:`/admin/external-tools` section of the Admin Guide.
 
+Exploratory Data Analysis Using Differentially Private Metadata (Experimental)
+------------------------------------------------------------------------------
+
+Through an integration with tools from the OpenDP Project (opendp.org), the Dataverse Software offers an experimental workflow that allows a data depositor to create and deposit Differentially Private (DP) Metadata files, which can then be used for exploratory data analysis. This workflow allows researchers to view the DP metadata for a tabular file, determine whether or not the file contains useful information, and then make an informed decision about whether or not to request access to the original file.
+
+If the data depositor has made available DP metadata for one or more files in their dataset, these access options will appear on the access dropdown on both the Dataset Page and the File Page. These access options will be available even if a file is restricted. Three types of DP metadata will be available:
+
+- .PDF
+- .XML
+- .JSON
+
+For more information about how data depositors can enable access using the OpenDP tool, visit the :doc:`/user/dataset-management` section of the User Guide.
+
 .. |image-file-tree-view| image:: ./img/file-tree-view.png
    :class: img-responsive
 .. |image-file-search-facets| image:: ./img/file-search-facets.png

src/main/java/edu/harvard/iq/dataverse/AuxiliaryFile.java

Lines changed: 42 additions & 2 deletions
@@ -1,20 +1,39 @@
 
 package edu.harvard.iq.dataverse;
 
+import edu.harvard.iq.dataverse.util.BundleUtil;
 import java.io.Serializable;
+import java.util.MissingResourceException;
 import javax.persistence.Entity;
 import javax.persistence.GeneratedValue;
 import javax.persistence.GenerationType;
 import javax.persistence.Id;
 import javax.persistence.JoinColumn;
 import javax.persistence.ManyToOne;
+import javax.persistence.NamedNativeQueries;
+import javax.persistence.NamedNativeQuery;
+import javax.persistence.NamedQueries;
+import javax.persistence.NamedQuery;
 
 /**
  *
  * @author ekraffmiller
  * Represents a generic file that is associated with a dataFile.
  * This is a data representation of a physical file in StorageIO
  */
+@NamedQueries({
+    @NamedQuery(name = "AuxiliaryFile.lookupAuxiliaryFile",
+            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.formatTag = :formatTag and o.formatVersion = :formatVersion"),
+    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFiles",
+            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId"),
+    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByType",
+            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type = :type"),
+    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesWithoutType",
+            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),})
+@NamedNativeQueries({
+    @NamedNativeQuery(name = "AuxiliaryFile.findAuxiliaryFileTypes",
+            query = "select distinct type from auxiliaryfile where datafile_id = ?1")
+})
 @Entity
 public class AuxiliaryFile implements Serializable {
 
@@ -44,6 +63,12 @@ public class AuxiliaryFile implements Serializable {
 
     private String checksum;
 
+    /**
+     * A way of grouping similar auxiliary files together. The type could be
+     * "DP" for "Differentially Private Statistics", for example.
+     */
+    private String type;
+
     public Long getId() {
         return id;
     }
@@ -115,6 +140,21 @@ public String getChecksum() {
     public void setChecksum(String checksum) {
         this.checksum = checksum;
     }
-
-
+
+    public String getType() {
+        return type;
+    }
+
+    public void setType(String type) {
+        this.type = type;
+    }
+
+    public String getTypeFriendly() {
+        try {
+            return BundleUtil.getStringFromPropertyFile("file.auxfiles.types." + type, "Bundle");
+        } catch (MissingResourceException ex) {
+            return null;
+        }
+    }
+
 }
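
Reviewer note: `getTypeFriendly()` uses a missing bundle key as the signal that a type has no display name. The sketch below reproduces that contract with the JDK's own `ResourceBundle` so it can be run in isolation. It is an illustration under assumptions: `BundleUtil` is replaced by an in-memory `ListResourceBundle`, the class name is hypothetical, and the single entry is inferred from the guide text rather than copied from the real properties file, which is not part of this excerpt.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical stand-in for AuxiliaryFile.getTypeFriendly(); not Dataverse code.
final class TypeFriendlySketch {

    // Assumed content: one known type, mirroring the "DP" example in the docs.
    static final ResourceBundle BUNDLE = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"file.auxfiles.types.DP", "Differentially Private Statistics"}
            };
        }
    };

    /** Returns the display name for a type, or null when the bundle has no entry. */
    static String typeFriendly(String type) {
        try {
            // A null type becomes the key "file.auxfiles.types.null", which is also missing.
            return BUNDLE.getString("file.auxfiles.types." + type);
        } catch (MissingResourceException ex) {
            return null;
        }
    }
}
```

Callers therefore have to treat `null` as "not in the bundle", which is how `findAuxiliaryFileTypes(dataFile, inBundle)` in the service bean uses it.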

src/main/java/edu/harvard/iq/dataverse/AuxiliaryFileServiceBean.java

Lines changed: 78 additions & 3 deletions
@@ -8,13 +8,16 @@
 import java.io.InputStream;
 import java.security.DigestInputStream;
 import java.security.MessageDigest;
+import java.util.ArrayList;
+import java.util.List;
 import java.util.logging.Logger;
 import javax.ejb.EJB;
 import javax.ejb.Stateless;
 import javax.inject.Named;
 import javax.persistence.EntityManager;
 import javax.persistence.PersistenceContext;
 import javax.persistence.Query;
+import javax.persistence.TypedQuery;
 import org.apache.tika.Tika;
 
 /**
@@ -28,7 +31,7 @@ public class AuxiliaryFileServiceBean implements java.io.Serializable {
     private static final Logger logger = Logger.getLogger(AuxiliaryFileServiceBean.class.getCanonicalName());
 
     @PersistenceContext(unitName = "VDCNet-ejbPU")
-    private EntityManager em;
+    protected EntityManager em;
 
     @EJB
     private SystemConfig systemConfig;
@@ -54,9 +57,11 @@ public AuxiliaryFile save(AuxiliaryFile auxiliaryFile) {
      * @param formatVersion - to distinguish between multiple versions of a file
      * @param origin - name of the tool/system that created the file
      * @param isPublic boolean - is this file available to any user?
+     * @param type how to group the files such as "DP" for "Differentially
+     * Private Statistics".
      * @return success boolean - returns whether the save was successful
      */
-    public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile dataFile, String formatTag, String formatVersion, String origin, boolean isPublic) {
+    public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile dataFile, String formatTag, String formatVersion, String origin, boolean isPublic, String type) {
 
         StorageIO<DataFile> storageIO =null;
         AuxiliaryFile auxFile = new AuxiliaryFile();
@@ -81,6 +86,7 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
             auxFile.setFormatVersion(formatVersion);
             auxFile.setOrigin(origin);
             auxFile.setIsPublic(isPublic);
+            auxFile.setType(type);
             auxFile.setDataFile(dataFile);
             auxFile.setFileSize(storageIO.getAuxObjectSize(auxExtension));
             auxFile = save(auxFile);
@@ -101,7 +107,7 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
 
     public AuxiliaryFile lookupAuxiliaryFile(DataFile dataFile, String formatTag, String formatVersion) {
 
-        Query query = em.createQuery("select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.formatTag = :formatTag and o.formatVersion = :formatVersion");
+        Query query = em.createNamedQuery("AuxiliaryFile.lookupAuxiliaryFile");
 
         query.setParameter("dataFileId", dataFile.getId());
         query.setParameter("formatTag", formatTag);
@@ -114,4 +120,73 @@ public AuxiliaryFile lookupAuxiliaryFile(DataFile dataFile, String formatTag, St
         }
     }
 
+    public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile) {
+        TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
+        query.setParameter("dataFileId", dataFile.getId());
+        return query.getResultList();
+    }
+
+    /**
+     * @param inBundle If true, only return types that are in the bundle. If
+     * false, only return types that are not in the bundle.
+     */
+    public List<String> findAuxiliaryFileTypes(DataFile dataFile, boolean inBundle) {
+        List<String> allTypes = findAuxiliaryFileTypes(dataFile);
+        List<String> typesInBundle = new ArrayList<>();
+        List<String> typeNotInBundle = new ArrayList<>();
+        for (String type : allTypes) {
+            // Check if type is in the bundle.
+            String friendlyType = getFriendlyNameForType(type);
+            if (friendlyType != null) {
+                typesInBundle.add(type);
+            } else {
+                typeNotInBundle.add(type);
+            }
+        }
+        if (inBundle) {
+            return typesInBundle;
+        } else {
+            return typeNotInBundle;
+        }
+    }
+
+    public List<String> findAuxiliaryFileTypes(DataFile dataFile) {
+        Query query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFileTypes");
+        query.setParameter(1, dataFile.getId());
+        return query.getResultList();
+    }
+
+    public List<AuxiliaryFile> findAuxiliaryFilesByType(DataFile dataFile, String typeString) {
+        TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
+        query.setParameter("dataFileId", dataFile.getId());
+        query.setParameter("type", typeString);
+        return query.getResultList();
+    }
+
+    public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
+        List<AuxiliaryFile> otherAuxFiles = new ArrayList<>();
+        List<String> otherTypes = findAuxiliaryFileTypes(dataFile, false);
+        for (String typeString : otherTypes) {
+            TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
+            query.setParameter("dataFileId", dataFile.getId());
+            query.setParameter("type", typeString);
+            List<AuxiliaryFile> auxFiles = query.getResultList();
+            otherAuxFiles.addAll(auxFiles);
+        }
+        otherAuxFiles.addAll(findAuxiliaryFilesWithoutType(dataFile));
+        return otherAuxFiles;
+    }
+
+    public List<AuxiliaryFile> findAuxiliaryFilesWithoutType(DataFile dataFile) {
+        Query query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesWithoutType", AuxiliaryFile.class);
+        query.setParameter("dataFileId", dataFile.getId());
+        return query.getResultList();
+    }
+
+    public String getFriendlyNameForType(String type) {
+        AuxiliaryFile auxFile = new AuxiliaryFile();
+        auxFile.setType(type);
+        return auxFile.getTypeFriendly();
+    }
+
 }
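
Reviewer note: `findAuxiliaryFileTypes(dataFile, inBundle)` fills two lists and returns one of them. The same split can be written as a single partition step, sketched below as a possible alternative rather than a change this commit makes. The class name is hypothetical, and the `friendlyName` function stands in for `getFriendlyNameForType`, returning null when the bundle has no entry.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical alternative to the two-list loop; not Dataverse code.
final class AuxTypePartitionSketch {

    /**
     * Splits types into those with a friendly name (key true) and those
     * without one (key false), preserving the input order in each list.
     */
    static Map<Boolean, List<String>> partitionByBundle(
            List<String> types, Function<String, String> friendlyName) {
        return types.stream()
                .collect(Collectors.partitioningBy(t -> friendlyName.apply(t) != null));
    }
}
```

With a lookup that only knows "DP", an input of `["DP", "custom"]` puts "DP" under `true` and "custom" under `false`, matching what the loop in the commit returns for `inBundle = true` and `inBundle = false` respectively.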

src/main/java/edu/harvard/iq/dataverse/FileDownloadServiceBean.java

Lines changed: 8 additions & 0 deletions
@@ -280,6 +280,14 @@ private void redirectToBatchDownloadAPI(String multiFileString, Boolean download
         redirectToBatchDownloadAPI(multiFileString, true, downloadOriginal);
     }
 
+    public void redirectToAuxFileDownloadAPI(Long fileId, String formatTag, String formatVersion) {
+        String fileDownloadUrl = "/api/access/datafile/" + fileId + "/metadata/" + formatTag + "/" + formatVersion;
+        try {
+            FacesContext.getCurrentInstance().getExternalContext().redirect(fileDownloadUrl);
+        } catch (IOException ex) {
+            logger.info("Failed to issue a redirect to aux file download url (" + fileDownloadUrl + "): " + ex);
+        }
+    }
 
     /**
      * Launch an "explore" tool which is a type of ExternalTool such as
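
Reviewer note: `redirectToAuxFileDownloadAPI` builds its URL by plain string concatenation, so a null `formatVersion` would end up as the literal text "null" in the path. Whether that can happen here is not visible in this excerpt. The sketch below shows one way the path construction could be isolated and guarded; the class and method names are hypothetical and this is not what the commit does.

```java
import java.util.Objects;

// Hypothetical helper illustrating a null guard; not Dataverse code.
final class AuxDownloadPathSketch {

    /** Builds the relative download path, rejecting null parts instead of emitting "null". */
    static String auxFileDownloadPath(Long fileId, String formatTag, String formatVersion) {
        Objects.requireNonNull(fileId, "fileId");
        Objects.requireNonNull(formatTag, "formatTag");
        Objects.requireNonNull(formatVersion, "formatVersion");
        return "/api/access/datafile/" + fileId + "/metadata/" + formatTag + "/" + formatVersion;
    }
}
```

Keeping the path in a pure function like this would also make it testable without a `FacesContext`.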
