Merged
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,17 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased (Targeting 0.2.5)]

### Added
- Optional `modified` parameter for `fetch_file()`: the modified time of the remote file, as reported by the TIND API
- `fetch_file()` uses this to avoid unnecessary downloads if a file already exists at the target
location and has a modified time that is newer than the requested file

### Changed
- `fetch_file()` now raises `RecordNotFoundError` if `tind_download()` fails to return a written file path


## [0.2.4]

### Added
23 changes: 21 additions & 2 deletions tind_client/client.py
@@ -3,8 +3,10 @@
"""

import json
import logging
import os
import re
from datetime import datetime, timezone
from io import StringIO
from pathlib import Path
from typing import Any, Iterator
@@ -16,6 +18,7 @@
from .api import tind_get, tind_download
from .errors import RecordNotFoundError, TINDError

logger = logging.getLogger(__name__)

NS = "http://www.loc.gov/MARC21/slim"
E.register_namespace("", NS)
@@ -69,12 +72,17 @@ def fetch_metadata(self, record: str) -> Record:

        return records[0]

-    def fetch_file(self, file_url: str, output_dir: str = "") -> str:
+    def fetch_file(self, file_url: str, output_dir: str = "", modified: str = "") -> str:

Following @awilfox's suggestion, I think `modified` should actually be a `datetime.datetime`. It shouldn't be this method's responsibility to convert the incoming string into a value that can be compared.
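
A minimal sketch of what that suggestion could look like, assuming the caller parses the TIND timestamp before calling `fetch_file()`. The helper name `is_local_copy_current` is hypothetical and not part of this PR; it only illustrates the comparison once `modified` is already a `datetime.datetime`.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def is_local_copy_current(path: Path, modified: Optional[datetime]) -> bool:
    """Return True when ``path`` exists and is at least as new as ``modified``.

    Hypothetical helper: shows the freshness check with ``modified`` typed as a datetime.
    """
    if modified is None or not path.exists():
        # No timestamp to compare against, or nothing cached: download is needed.
        return False
    if modified.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC, as the PR does.
        modified = modified.replace(tzinfo=timezone.utc)
    local_mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return local_mtime >= modified
```

With a check like this, `fetch_file()` would only compare two timezone-aware values, and string parsing would stay with whichever caller reads the TIND metadata.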

        """Download a file from TIND and save it locally.

        If the file already exists in the output directory and was modified at or after a supplied
        ``modified`` timestamp, the file will not be re-downloaded.

        :param str file_url: The TIND file download URL.
        :param str output_dir: Directory in which to save the file.
            Falls back to ``default_storage_dir`` when empty.
        :param str modified: Optional modified timestamp from the file metadata returned by TIND.
            If not specified, the file will always be downloaded.
        :raises AuthorizationError: When the TIND API key is invalid or the file is restricted.
        :raises ValueError: When ``file_url`` is not a valid TIND file download URL.
        :raises RecordNotFoundError: When the file is invalid or not found.
@@ -84,9 +92,20 @@ def fetch_file(self, file_url: str, output_dir: str = "") -> str:
            raise ValueError("URL is not a valid TIND file download URL.")

        output_target = output_dir or self.default_storage_dir

        expected_filename = file_url.rstrip("/").split("/")[-2]
        expected_path = Path(output_target) / expected_filename

        if modified and expected_path.exists():
            meta_mtime = datetime.fromisoformat(modified).replace(tzinfo=timezone.utc)
            local_mtime = datetime.fromtimestamp(expected_path.stat().st_mtime, tz=timezone.utc)
            if local_mtime >= meta_mtime:
                logger.debug("Cached file at (%s) is newer; skipping download.", expected_path)
                return str(expected_path)

        (status, saved_to) = tind_download(file_url, output_dir=output_target, api_key=self.api_key)

-        if status != 200:
+        if status != 200 or not saved_to:
            raise RecordNotFoundError("Referenced file could not be downloaded.")

        return saved_to
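
The freshness check in this hunk parses `modified` with `datetime.fromisoformat()` and then calls `.replace(tzinfo=timezone.utc)`. If the string ever carried an explicit UTC offset, `replace()` would relabel the value as UTC rather than convert it. Whether TIND returns offsets is not stated in this PR, so the following is only a sketch under that assumption; `parse_modified` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_modified(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware UTC datetime.

    Hypothetical helper, not part of the PR.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        return parsed.replace(tzinfo=timezone.utc)
    # Convert rather than relabel, so an offset such as +02:00 shifts the clock time to UTC.
    return parsed.astimezone(timezone.utc)
```

A caller could run the metadata string through a helper like this once and hand the resulting `datetime` to `fetch_file()`, which fits the review comment about keeping string conversion out of the method.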