Commit 061090d

123: Support process arrays (#166)
* docs: add test requirements to development install instructions
* feat: deprecate process definition as dictionary
  - emit deprecation warning if payload process definition is a bare dictionary
  - expect payload process definition to be first element in a list
* test: update existing tests and add deprecation test
  - update existing tests fixture to use a list of process definitions
  - add a test to check for deprecation warning when using a dict for process definition
* fix: sidecar fix making default task config a dictionary
* docs: update CHANGELOG
* docs: update README
* fix: update warning and error messages to address review comments
1 parent 436691e commit 061090d

5 files changed

Lines changed: 169 additions & 113 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Deprecated
+
+- ([#123](https://github.com/stac-utils/stac-task/issues/123)) Bare `ProcessDefinition`
+objects are deprecated in favor of arrays of `ProcessDefinition` objects.
+
 ## [0.6.0]

 ### ⚠️ Breaking Change

README.md

Lines changed: 81 additions & 62 deletions
@@ -15,17 +15,20 @@
 - [collections](#collections)
 - [tasks](#tasks)
 - [TaskConfig Object](#taskconfig-object)
-- [Full Process Definition Example](#full-process-definition-example)
+- [Full ProcessDefinition Example](#full-processdefinition-example)
 - [Migration](#migration)
 - [0.4.x -\> 0.5.x](#04x---05x)
+- [0.5.x -\> 0.6.0](#05x---060)
 - [Development](#development)
 - [Contributing](#contributing)

-This Python library consists of the Task class, which is used to create custom tasks based
-on a "STAC In, STAC Out" approach. The Task class acts as wrapper around custom code and provides
-several convenience methods for modifying STAC Items, creating derived Items, and providing a CLI.
+This Python library consists of the Task class, which is used to create custom tasks
+based on a "STAC In, STAC Out" approach. The Task class acts as wrapper around custom
+code and provides several convenience methods for modifying STAC Items, creating derived
+Items, and providing a CLI.

-This library is based on a [branch of cirrus-lib](https://github.com/cirrus-geo/cirrus-lib/tree/features/task-class) except aims to be more generic.
+This library is based on a [branch of cirrus-lib](https://github.com/cirrus-geo/cirrus-lib/tree/features/task-class)
+except aims to be more generic.

 ## Quickstart for Creating New Tasks

@@ -59,25 +62,33 @@ class MyTask(Task):

 ## Task Input

-| Field Name | Type | Description |
-| ---------- | ----------------- | ------------------------- |
-| type | string | Must be FeatureCollection |
-| features | [Item] | A list of STAC `Item` |
-| process | ProcessDefinition | A Process Definition |
+Task input is often referred to as a 'payload'.
+
+| Field Name | Type | Description |
+| ---------- | ------------------------- | --------------------------------------------------- |
+| type | string | Must be FeatureCollection |
+| features | [Item] | An array of STAC Items |
+| process | [`ProcessDefinition`] | An array of `ProcessDefinition` objects. |
+| ~~process~~ | ~~`ProcessDefinition`~~ | **DEPRECATED** A `ProcessDefinition` object |

 ### ProcessDefinition Object

-A STAC task can be provided additional configuration via the 'process' field in the input
-ItemCollection.
+A Task can be provided additional configuration via the 'process' field in the input
+payload.
+
+| Field Name | Type | Description |
+| -------------- | ------------------ | ---------------------------------------------- |
+| description | string | Description of the process configuration |
+| upload_options | `UploadOptions` | An `UploadOptions` object |
+| tasks | Map<str, Map> | Dictionary of task configurations. |
+| ~~tasks~~ | ~~[`TaskConfig`]~~ | **DEPRECATED** A list of `TaskConfig` objects. |

-| Field Name | Type | Description |
-| -------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| description | string | Optional description of the process configuration |
-| upload_options | UploadOptions | Options used when uploading assets to a remote server |
-| tasks | Map<str, Map> | Dictionary of task configurations. A list of [task configurations](#taskconfig-object) is supported for backwards compatibility reasons, but a dictionary should be preferred. |

 #### UploadOptions Object

+Options used when uploading Item assets to a remote server can be specified in a
+'upload_options' field in the `ProcessDefinition` object.
+
 | Field Name | Type | Description |
 | ------------- | ------------- | --------------------------------------------------------------------------------------- |
 | path_template | string | **REQUIRED** A string template for specifying the location of uploaded assets |
@@ -88,16 +99,19 @@ ItemCollection.

 ##### path_template

-The path_template string is a way to control the output location of uploaded assets from a STAC Item using metadata from the Item itself.
-The template can contain fixed strings along with variables used for substitution.
-See [the PySTAC documentation for `LayoutTemplate`](https://pystac.readthedocs.io/en/stable/api/layout.html#pystac.layout.LayoutTemplate) for a list of supported template variables and their meaning.
+The 'path_template' string is a way to control the output location of uploaded assets
+from a STAC Item using metadata from the Item itself. The template can contain fixed
+strings along with variables used for substitution. See [the PySTAC documentation for
+`LayoutTemplate`](https://pystac.readthedocs.io/en/stable/api/layout.html#pystac.layout.LayoutTemplate)
+for a list of supported template variables and their meaning.

 ##### collections

-The collections dictionary provides a collection ID and JSONPath pattern for matching against STAC Items.
-At the end of processing, before the final STAC Items are returned, the Task class can be used to assign
-all of the Items to specific collection IDs. For each Item the JSONPath pattern for all collections will be
-compared. The first match will cause the Item's Collection ID to be set to the provided value.
+The 'collections' dictionary provides a collection ID and JSONPath pattern for matching
+against STAC Items. At the end of processing, before the final STAC Items are returned,
+the Task class can be used to assign all of the Items to specific collection IDs. For
+each Item the JSONPath pattern for all collections will be compared. The first match
+will cause the Item's Collection ID to be set to the provided value.

 For example:

@@ -107,15 +121,18 @@ For example:
 }
 ```

-In this example, the task will set any STAC Items that have an ID beginning with "LC08" to the `landsat-c2l2` collection.
+In this example, the task will set any STAC Items that have an ID beginning with "LC08"
+to the `landsat-c2l2` collection.

-See [JSONPath Online Evaluator](https://jsonpath.com) to experiment with JSONPath and [regex101](https://regex101.com) to experiment with regex.
+See [JSONPath Online Evaluator](https://jsonpath.com) to experiment with JSONPath and
+[regex101](https://regex101.com) to experiment with regex.

 #### tasks

-The tasks field is a dictionary with an optional key for each task. If present, it contains
-a dictionary that is converted to a set of keywords and passed to the Task's `process` function.
-The documentation for each task will provide the list of available parameters.
+The 'tasks' field is a dictionary with an optional key for each task. If present, it
+contains a dictionary that is converted to a set of keywords and passed to the Task's
+`process` function. The documentation for each Task will provide the list of available
+parameters.

 ```json
 {
@@ -130,32 +147,32 @@ The documentation for each task will provide the list of available parameters.
 }
 ```

-In the example above a task named `task-a` would have the `param1=value1` passed as a keyword, while `task-c`
-would have `param2=value2` passed. If there were a `task-b` to be run it would not be passed any keywords.
+In the example above, a task named `task-a` would have the `param1=value1` passed as a
+keyword, while `task-c` would have `param2=value2` passed. If there were a `task-b` to
+be run, it would not be passed any keywords.

 #### TaskConfig Object

-**DEPRECATED**: `tasks` should be a dictionary of parameters, with task names as keys. See [tasks](#tasks) for more information.
+**DEPRECATED** The 'tasks' field _should_ be a dictionary of parameters, with task names
+as keys. See [tasks](#tasks) for more information. `TaskConfig` objects are supported
+for backwards compatibility.

-A Task Configuration contains information for running a specific task.
+| Field Name | Type | Description |
+| ---------- | ------------- | ----------------------------------------------------------------------------------- |
+| name | str | **REQUIRED** Name of the task |
+| parameters | Map<str, str> | Dictionary of keyword parameters that will be passed to the Task `process` function |

-| Field Name | Type | Description |
-| ---------- | ------------- | ------------------------------------------------------------------------------------ |
-| name | str | **REQUIRED** Name of the task |
-| parameters | Map<str, str> | Dictionary of keyword parameters that will be passed to the Tasks `process` function |

-## Full Process Definition Example
-
-Process definitions are sometimes called "Payloads":
+### Full ProcessDefinition Example

 ```json
 {
   "description": "My process configuration",
-  "collections": {
-    "landsat-c2l2": "$[?(@.id =~ 'LC08.*')]"
-  },
   "upload_options": {
-    "path_template": "s3://my-bucket/${collection}/${year}/${month}/${day}/${id}"
+    "path_template": "s3://my-bucket/${collection}/${year}/${month}/${day}/${id}",
+    "collections": {
+      "landsat-c2l2": "$[?(@.id =~ 'LC08.*')]"
+    }
   },
   "tasks": {
     "task-name": {
@@ -169,13 +186,13 @@ Process definitions are sometimes called "Payloads":

 ### 0.4.x -> 0.5.x

-In 0.5.0, the previous use of fsspec to download Item Assets has been replaced with
-the stac-asset library. This has necessitated a change in the parameters
-that the download methods accept.
+In 0.5.0, the previous use of fsspec to download Item Assets has been replaced with the
+stac-asset library. This has necessitated a change in the parameters that the download
+methods accept.

 The primary change is that the Task methods `download_item_assets` and
-`download_items_assets` (items plural) now accept fewer explicit and implicit
-(kwargs) parameters.
+`download_items_assets` (items plural) now accept fewer explicit and implicit (kwargs)
+parameters.

 Previously, the methods looked like:

@@ -225,8 +242,9 @@ async def download_item_assets(
 ) -> Item:
 ```

-Additionally, `kwargs` keys were set to pass configuration through to fsspec. The most common
-parameter was `requester_pays`, to set the Requester Pays flag in AWS S3 requests.
+Additionally, `kwargs` keys were set to pass configuration through to fsspec. The most
+common parameter was `requester_pays`, to set the Requester Pays flag in AWS S3
+requests.

 Many of these parameters can be directly translated into configuration passed in a
 `DownloadConfig` object, which is just a wrapper over the `stac_asset.Config` object.
@@ -239,17 +257,16 @@ Migration of these various parameters to `DownloadConfig` are as follows:
 `FileNameStrategy.FILE_NAME` if True or `FileNameStrategy.KEY` if False
 - `overwrite`: set `overwrite`
 - `save_item`: none, Item is always saved
-- `absolute_path`: none. To create or retrieve the Asset hrefs as absolute paths, use either
-`Item#make_all_asset_hrefs_absolute()` or `Asset#get_absolute_href()`
+- `absolute_path`: none. To create or retrieve the Asset hrefs as absolute paths, use
+either `Item#make_all_asset_hrefs_absolute()` or `Asset#get_absolute_href()`

 ### 0.5.x -> 0.6.0

-Previously, the `validate` method was a _classmethod_, validating the payload
-argument passed. This has now been made an instance method, which validates
-the `self._payload` copy of the payload, from which the `Task` instance is
-constructed. This is behaviorally the same, in that construction will fail if
-validation fails, but allows implementers to utilize the instance method's
-convenience functions.
+Previously, the `validate` method was a _classmethod_, validating the payload argument
+passed. This has now been made an instance method, which validates the `self._payload`
+copy of the payload, from which the `Task` instance is constructed. This is
+behaviorally the same, in that construction will fail if validation fails, but allows
+implementers to utilize the instance method's convenience functions.

 Previous implementations of `validate` would have been similar to this:

@@ -270,12 +287,13 @@ And will now need to be updated to this form:

 ## Development

-Clone, install in editable mode with development requirements, and install the **pre-commit** hooks:
+Clone, install in editable mode with development and test requirements, and install the
+**pre-commit** hooks:

 ```shell
 git clone https://github.com/stac-utils/stac-task
 cd stac-task
-pip install -e '.[dev]'
+pip install -e '.[dev,test]'
 pre-commit install
 ```

@@ -293,4 +311,5 @@ pre-commit run --all-files

 ## Contributing

-Use Github [issues](https://github.com/stac-utils/stac-task/issues) and [pull requests](https://github.com/stac-utils/stac-task/pulls).
+Use Github [issues](https://github.com/stac-utils/stac-task/issues) and [pull
+requests](https://github.com/stac-utils/stac-task/pulls).
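
The README changes above deprecate a bare `ProcessDefinition` object in the payload's 'process' field in favor of an array. As a rough illustration of what that migration might look like for code that builds payloads, the sketch below wraps a legacy dictionary in a list. The `wrap_process` helper and the sample payload are hypothetical and are not part of stac-task.

```python
import copy
from typing import Any


def wrap_process(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload whose 'process' field is a list.

    Hypothetical helper for illustration only; not part of stac-task.
    """
    migrated = copy.deepcopy(payload)
    process = migrated.get("process")
    # Only the deprecated bare-dictionary form needs wrapping; a payload
    # that already uses a list (or has no 'process' field) is left as-is.
    if isinstance(process, dict):
        migrated["process"] = [process]
    return migrated


# Legacy payload using the deprecated bare-dictionary form.
legacy_payload = {
    "type": "FeatureCollection",
    "features": [],
    "process": {"tasks": {"task-a": {"param1": "value1"}}},
}

migrated_payload = wrap_process(legacy_payload)
```

Because the helper works on a deep copy, the legacy payload is left unmodified, and applying it to an already migrated payload changes nothing.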

stactask/task.py

Lines changed: 26 additions & 5 deletions
@@ -75,7 +75,6 @@ def __init__(
         upload: bool = True,
         validate: bool = True,
     ):
-
         self._payload = payload

         if not skip_validation and validate:
@@ -108,15 +107,37 @@ def __init__(

     @property
     def process_definition(self) -> dict[str, Any]:
-        process = self._payload.get("process", {})
+        process = self._payload.get("process", [])
         if isinstance(process, dict):
+            warnings.warn(
+                (
+                    "`process` as a bare dictionary will be unsupported in a future "
+                    "version; wrap it in a list to remove this warning"
+                ),
+                DeprecationWarning,
+                stacklevel=2,
+            )
             return process
-        else:
-            raise ValueError(f"process is not a dict: {type(process)}")
+
+        if not isinstance(process, list):
+            raise TypeError("unable to parse `process`: must be type list")
+
+        if not process:
+            return {}
+
+        if not isinstance(process[0], dict):
+            raise TypeError(
+                (
+                    "unable to parse `process`: the first element of the list must be "
+                    "a dictionary"
+                )
+            )
+
+        return process[0]

     @property
     def parameters(self) -> dict[str, Any]:
-        task_configs = self.process_definition.get("tasks", [])
+        task_configs = self.process_definition.get("tasks", {})
         if isinstance(task_configs, list):
             warnings.warn(
                 "task configs is list, use a dictionary instead",
tests/fixtures/sentinel2-l2a-j2k-payload.json
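The new `process_definition` logic in this diff reads as a short decision sequence: a bare dictionary is returned with a `DeprecationWarning`, any other non-list value raises `TypeError`, an empty list yields `{}`, and otherwise the first list element is returned if it is a dictionary. The standalone function below mirrors that sequence outside the `Task` class so the behavior can be tried in isolation. The name `first_process_definition` and the sample payloads are assumptions for illustration, not library API.

```python
import warnings
from typing import Any


def first_process_definition(payload: dict[str, Any]) -> dict[str, Any]:
    """Standalone sketch of the property logic shown in the diff."""
    process = payload.get("process", [])

    # Deprecated form: a bare dictionary is still accepted, with a warning.
    if isinstance(process, dict):
        warnings.warn(
            "`process` as a bare dictionary will be unsupported in a future "
            "version; wrap it in a list to remove this warning",
            DeprecationWarning,
            stacklevel=2,
        )
        return process

    if not isinstance(process, list):
        raise TypeError("unable to parse `process`: must be type list")

    # A missing or empty list means there is no configuration.
    if not process:
        return {}

    if not isinstance(process[0], dict):
        raise TypeError(
            "unable to parse `process`: the first element of the list must be "
            "a dictionary"
        )

    return process[0]


definition = {"tasks": {"task-a": {"param1": "value1"}}}

# Record warnings so the deprecation path can be observed directly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_list = first_process_definition({"process": [definition]})
    from_dict = first_process_definition({"process": definition})

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

Both call styles return the same dictionary; only the bare-dictionary call records a `DeprecationWarning`.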

Lines changed: 23 additions & 21 deletions
@@ -1,30 +1,32 @@
 {
   "type": "FeatureCollection",
   "id": "sentinel-s2-l2a/workflow-test/S2B_17HQD_20201103_0_L2A",
-  "process": {
-    "input_collections": [
-      "sentinel-2-l2a"
-    ],
-    "workflow": "cog-archive",
-    "upload_options": {
-      "path_template": "s3://sentinel-cogs/${collection}/${mgrs:utm_zone}/${mgrs:latitude_band}/${mgrs:grid_square}/${year}/${month}/${id}",
-      "public_assets": "ALL",
-      "collections": {
-        "sentinel-2-l2a": "$[?(@.id =~ 'S2[AB].*')]"
-      },
-      "headers": {
-        "CacheControl": "public, max-age=31536000, immutable"
-      }
-    },
-    "tasks": {
-      "nothing-task": {
-        "do_nothing": true
+  "process": [
+    {
+      "input_collections": [
+        "sentinel-2-l2a"
+      ],
+      "workflow": "cog-archive",
+      "upload_options": {
+        "path_template": "s3://sentinel-cogs/${collection}/${mgrs:utm_zone}/${mgrs:latitude_band}/${mgrs:grid_square}/${year}/${month}/${id}",
+        "public_assets": "ALL",
+        "collections": {
+          "sentinel-2-l2a": "$[?(@.id =~ 'S2[AB].*')]"
+        },
+        "headers": {
+          "CacheControl": "public, max-age=31536000, immutable"
+        }
       },
-      "derived-item-task": {
-        "parameter": "value"
+      "tasks": {
+        "nothing-task": {
+          "do_nothing": true
+        },
+        "derived-item-task": {
+          "parameter": "value"
+        }
       }
     }
-  },
+  ],
   "features": [
     {
       "type": "Feature",
0 commit comments

Comments
 (0)