Commit 5ee79ee

v1.3.0 - on-demand support and specific scheduling
1 parent 8154937 commit 5ee79ee

11 files changed

Lines changed: 1297 additions & 955 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,15 @@
22

33
All notable changes to this project will be documented in this file.
44

5+
## 1.3.0 - 2024-11-18
6+
### Added features
7+
- Added manual/on-demand scheduler configuration option. Implements SSM Parameter to store the Step Functions event structure. Note: Impacts file timestamp comparison during runtime; recommended for on-demand use cases only.
8+
- Added the ability to configure individual schedules in `SyncSettings`. Enables different schedules for multiple folders within a single SFTP connection. Individual schedules take precedence over the general `Schedule` setting. Reduces resource consumption by eliminating the need for multiple configuration files, and avoids creating a dedicated Transfer Family Connector (including public IPs) and Secrets per configuration file.
9+
10+
### Changed
11+
- Implemented individual EventBridge Scheduler rules for each `SyncSettings` item in the configuration files.
12+
- Simplified the Step Function by removing the MAP State. Individual records are now processed per execution, which enables the new features above and improves error visibility for failed executions. Further improvements planned.
13+
514
## 1.2.0 - 2024-11-07
615
### Added features
716
- Added the ability to define tags for every resource created by the solution. This can be configured using the `configuration/solution_parameters/parameters.json` file.

README.md

Lines changed: 80 additions & 26 deletions
@@ -1,12 +1,13 @@
11
# File Transfer Synchronization solution
22
## Introduction
33

4-
This solution implements an automated strategy for synchronizing remote SFTP repositories with local S3 buckets. It schedules and orchestrates the process of listing remote directories, detecting changes, and transferring files.
4+
This solution implements an automated strategy for synchronizing remote SFTP repositories with local S3 buckets. It orchestrates the process of listing remote directories, detecting changes, and transferring files. It can be run based on a scheduler or on-demand.
55

66
**The solution leverages the following AWS services:**
77
- [Amazon EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html)
88
- [AWS Step Functions](https://aws.amazon.com/step-functions/)
99
- [AWS Transfer Family SFTP Connectors](https://docs.aws.amazon.com/transfer/latest/userguide/creating-connectors.html)
10+
- [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html)
1011

1112
**Key features:**
1213
- Monitors remote SFTP servers using SFTP Connectors' [List capabilities](https://docs.aws.amazon.com/transfer/latest/userguide/sftp-connector-list-dir.html)
@@ -28,13 +29,19 @@ A combination of Lambda, Step Functions and Transfer Family features facilitates
2829

2930
### Component Interactions
3031

31-
1. **Event Bridge Scheduler**
32-
- The Event Bridge Scheduler triggers the Step Function execution based on the configured schedule (e.g., daily, hourly, or a custom cron expression).
33-
- There are multiple schedules based on the Configuration files in this project and the Event passed to Step Functions includes the required parameters according to each schedule configuration.
32+
1. **Execution Phase**
33+
34+
a. **On-Demand Execution**
35+
- You can manually execute the Step Function by using the event structure stored in SSM Parameter Store.
36+
- While executing the Step Function, you can modify the `FromTimestamp` parameter in the event to specify the starting date and time for the file copy process.
37+
38+
b. **Event Bridge Scheduler**
39+
- The Event Bridge Scheduler triggers the Step Function execution based on the configured schedule (e.g., daily, hourly, or a custom cron expression).
40+
- There are multiple schedules based on the Configuration files in this project and the Event passed to Step Functions includes the required parameters according to each schedule configuration.
3441

3542
2. **Step Function**
3643
- The Step Function orchestrates the entire process and coordinates the interaction between different components.
37-
- For each `SyncSettings`, it invokes the `RemoteFoldersList` Lambda function interacts with the Transfer Family SFTP Connector to asynchronously retrieve a list of files in the remote folders to be synchronized.
44+
- For each event, it invokes the `RemoteFoldersList` Lambda function, which interacts with the Transfer Family SFTP Connector to asynchronously retrieve a list of files in the remote folders to be synchronized.
3845
- It then uses the `GetListStatus` Lambda function to check whether the `List` process has finished and, if `Recursive` is enabled, to get the list of child folders so that a list can be run again for those subfolders.
3946
- The `SyncRemoteFolder` Lambda function detects if new or modified files are available in the remote server, and then invokes the Transfer Family SFTP Connector to asynchronously transfer those files from the remote repository to the local S3 bucket.
4047
- If any errors occur during the synchronization process, the Step Function captures the error and sends a notification to the configured SNS topic.
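The on-demand path described above can be sketched as follows. This is a minimal illustration, not the project's own tooling: it assumes the event stored in SSM Parameter Store is a JSON object with a top-level `FromTimestamp` key expressed in UTC epoch seconds. The helper names, the parameter name and the state machine ARN passed in by the caller are hypothetical placeholders; `get_parameter` and `start_execution` are the standard boto3 calls.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_on_demand_input(stored_event: str, from_time: Optional[datetime] = None) -> str:
    """Return the Step Functions input, optionally overriding FromTimestamp."""
    event = json.loads(stored_event)
    if from_time is not None:
        # Assumption: FromTimestamp is a UTC epoch value in seconds.
        # from_time must be timezone-aware so the conversion is unambiguous.
        event["FromTimestamp"] = int(from_time.timestamp())
    return json.dumps(event)


def start_on_demand_execution(
    parameter_name: str,
    state_machine_arn: str,
    from_time: Optional[datetime] = None,
) -> str:
    """Read the stored event from SSM and start one Step Functions execution."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    ssm = boto3.client("ssm")
    sfn = boto3.client("stepfunctions")
    # parameter_name and state_machine_arn are placeholders supplied by the caller.
    stored = ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_on_demand_input(stored, from_time),
    )
    return response["executionArn"]
```

Leaving `from_time` unset keeps whatever `FromTimestamp` the stored event already carries.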
@@ -70,11 +77,11 @@ To do so, you just need to push new configuration changes as `json` files to `./
7077

7178
The configuration file structure and content needs the following data:
7279

73-
```
80+
```json
7481
{
7582
"Description": <Connection Description>,
7683
"Name": <Identifying name for resources, no spaces allowed>,
77-
"Schedule": <Tag or AWS Cron Expression>,
84+
"Schedule": <Tag, AWS Cron Expression or "on-demand">,
7885
"Url": <Remote SFTP Server URL, FQDN and Port allowed>,
7986
"SecurityPolicyName": <TransferSFTPConnectorSecurityPolicy-2024-03 or TransferSFTPConnectorSecurityPolicy-2023-07>,
8087
"SyncSettings": [
@@ -87,7 +94,8 @@ The configuration file structure and content needs the following data:
8794
"RemoteFolders": {
8895
"Folder": <Remote Folder to Sync>,
8996
"Recursive": <true / false>
90-
}
97+
},
98+
(OPTIONAL) "Schedule": <Tag, AWS Cron Expression or "on-demand">
9199
},
92100
{ ... }
93101
],
@@ -101,23 +109,69 @@ The configuration file structure and content needs the following data:
101109
You can check the [example configuration file](configuration/examples/example-sftp-sync.json). Within AWS Account service limits, you can have as many configuration files as you need, and on the `SyncSettings` configuration list, you can define as many Remote to Local pairs as you wish and all will be run during the same schedule for the same Remote SFTP Server.
102110
The CDK Application will automatically resolve all the IAM Role permissions needed for the process to work and will create all the needed resources, including Event Bridge Scheduler, SFTP Connector and Secrets Manager Secret.
103111

104-
### Cron configuration
105-
For the Cron expression, you can use any of the pre-defined TAGs for simplicity or you can define your own cron expression. Keep in mind that this needs to be an [AWS Event Bridge Cron expression format](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-scheduled-rule-pattern.html#eb-cron-expressions). Available TAGs are:
106-
107-
| TAG | Expression |
108-
| :---------------- | :------: |
109-
| @monthly | 0 0 1 * ? * |
110-
| @daily | 0 0 * * ? * |
111-
| @hourly | 0 * * * ? * |
112-
| @minutely | * * * * ? * |
113-
| @sunday | 0 0 ? * 1 * |
114-
| @monday | 0 0 ? * 2 * |
115-
| @tuesday | 0 0 ? * 3 * |
116-
| @wednesday | 0 0 ? * 4 * |
117-
| @thursday | 0 0 ? * 5 * |
118-
| @friday | 0 0 ? * 6 * |
119-
| @saturday | 0 0 ? * 7 * |
120-
| @every10min | 0/10 * * * ? * |
112+
### Schedule Configuration
113+
114+
The file synchronization process can be configured to run based on a schedule or on-demand. The solution supports both global schedules for entire configurations and individual schedules for specific sync settings.
115+
116+
#### Scheduling Strategies
117+
118+
1. **Cron / Tag Schedule**:
119+
- Only considers files created in the remote repository between the current execution timestamp and the previous execution timestamp.
120+
- Useful for regular, periodic synchronization while avoiding duplicate transfers.
121+
122+
2. **On-Demand execution**:
123+
- When the `Schedule` value is `on-demand`, you can pass an optional `FromTimestamp` parameter at Step Function execution time to define the point in time (UTC timestamp) from which files are considered for copying.
124+
- By default the value for `FromTimestamp` is set to 0, meaning that all files (newer than 1 January 1970 00:00:00) will be compared.
125+
- Copies any files modified since the specified timestamp, including files that may have been deleted from S3 between runs but still exist on the remote SFTP server.
126+
127+
#### Individual Sync Setting Schedules
128+
129+
From version 1.3.0, you can define specific schedules for each item in your `SyncSettings` configuration list. This allows for:
130+
- Different schedules for multiple folders within a single SFTP connection
131+
- More granular control over synchronization timing
132+
- Reduced resource consumption by eliminating the need for multiple configuration files
133+
134+
Individual item level schedules take precedence over the general `Schedule` setting.
135+
136+
**Example scenario:**
137+
You need to synchronize a remote SFTP server with 10 folders: 5 are updated once a day at midnight, 2 are updated hourly, 1 is updated weekly, and for the remaining 2 you are notified when new files are available so that you can run an on-demand copy. Before this update, you would have needed to create 4 configuration files, each with its dedicated Transfer Family Connector, public IPs and Secrets. Now you can create a single configuration file (and its resources) with a different `Schedule` parameter for each item in the `SyncSettings` array, according to business needs.
138+
```json
139+
{
140+
"Schedule": "@daily",
141+
"SyncSettings": [
142+
{
143+
"LocalRepository": { ... },
144+
"RemoteFolders": { ... },
145+
"Schedule": "@hourly"
146+
},
147+
{
148+
"LocalRepository": { ... },
149+
"RemoteFolders": { ... }
150+
},
151+
{
152+
"LocalRepository": { ... },
153+
"RemoteFolders": { ... },
154+
"Schedule": "@weekly"
155+
},
156+
{
157+
"LocalRepository": { ... },
158+
"RemoteFolders": { ... },
159+
"Schedule": "on-demand"
160+
}
161+
]
162+
}
163+
```
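The precedence rule can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not the solution's actual code; the function name `effective_schedules` is hypothetical, and the `LocalRepository` / `RemoteFolders` keys are omitted for brevity.

```python
from typing import Any, Dict, List


def effective_schedules(config: Dict[str, Any]) -> List[str]:
    """Return the schedule that applies to each SyncSettings item, in order."""
    general = config.get("Schedule")
    # An item-level "Schedule" wins; otherwise the general one is used.
    return [item.get("Schedule", general) for item in config.get("SyncSettings", [])]


config = {
    "Schedule": "@daily",
    "SyncSettings": [
        {"Schedule": "@hourly"},
        {},  # no item-level schedule: falls back to "@daily"
        {"Schedule": "@weekly"},
        {"Schedule": "on-demand"},
    ],
}
print(effective_schedules(config))  # ['@hourly', '@daily', '@weekly', 'on-demand']
```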
164+
165+
#### Available Schedule Options
166+
* Predefined TAGs: @monthly, @daily, @hourly, @minutely, @sunday, @monday, @tuesday, @wednesday, @thursday, @friday, @saturday, @every10min
167+
* Custom cron expressions, which must use the [AWS Event Bridge Cron expression format](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-scheduled-rule-pattern.html#eb-cron-expressions)
168+
* "on-demand" for manual execution
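As a rough sketch of how these three kinds of value could be resolved, the snippet below maps each predefined TAG to the cron expression listed in the previous README's "Cron configuration" table. The function name `to_schedule_expression` is hypothetical, and the handling of custom values (a bare six-field expression wrapped in `cron(...)`) is an assumption rather than the project's confirmed behavior.

```python
from typing import Optional

# Tag-to-cron mapping taken from the table in the previous README version.
TAG_EXPRESSIONS = {
    "@monthly": "0 0 1 * ? *",
    "@daily": "0 0 * * ? *",
    "@hourly": "0 * * * ? *",
    "@minutely": "* * * * ? *",
    "@sunday": "0 0 ? * 1 *",
    "@monday": "0 0 ? * 2 *",
    "@tuesday": "0 0 ? * 3 *",
    "@wednesday": "0 0 ? * 4 *",
    "@thursday": "0 0 ? * 5 *",
    "@friday": "0 0 ? * 6 *",
    "@saturday": "0 0 ? * 7 *",
    "@every10min": "0/10 * * * ? *",
}


def to_schedule_expression(schedule: str) -> Optional[str]:
    """Return a cron(...) expression, or None when no scheduler rule is needed."""
    if schedule == "on-demand":
        return None  # manual execution only
    if schedule.startswith("cron("):
        return schedule  # already wrapped
    if schedule.startswith("@"):
        return f"cron({TAG_EXPRESSIONS[schedule]})"
    # Assumption: anything else is a bare six-field AWS cron expression.
    return f"cron({schedule})"
```

Unknown tags raise `KeyError` in this sketch, which surfaces typos in a configuration file early.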
169+
170+
#### Best Practices
171+
* Choose schedules that align with your data update frequency
172+
* Use individual schedules for folders with different update patterns
173+
* Consider resource usage and costs when setting frequent schedules
174+
* Test your configuration to ensure it meets your synchronization needs
121175

122176
### Target Bucket KMS Encryption
123177

@@ -187,7 +241,7 @@ After the first replication, the solution will only copy new or modified files f
187241
This project is built using Python3 and CDK. Before you start, make sure all the prerequisites are properly installed in your environment.
188242

189243
* AWS CLI https://aws.amazon.com/cli/
190-
* AWS CDK 2.150.0+ https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html#getting_started_install
244+
* AWS CDK 2.163.0+ https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html#getting_started_install
191245
* Python 3.9+
192246
* Python venv
193247

cli.py

Lines changed: 1 addition & 1 deletion
@@ -343,7 +343,7 @@ def main():
343343
host_key = fetch_host_key(config['Url'])
344344
if host_key:
345345
if host_key in public_keys:
346-
print_colored(f"The fetched host key already exists in the configuration:", Fore.YELLOW)
346+
print_colored("The fetched host key already exists in the configuration:", Fore.YELLOW)
347347
print(host_key)
348348
else:
349349
add_fetched_key = inquirer.confirm(

configuration/examples/example-sftp-sync.json

Lines changed: 13 additions & 1 deletion
@@ -24,7 +24,19 @@
2424
"RemoteFolders": {
2525
"Folder": "/home/folder-2",
2626
"Recursive": false
27-
}
27+
},
28+
"Schedule": "@weekly"
29+
},
30+
{
31+
"LocalRepository": {
32+
"BucketName": "my-other-local-bucket",
33+
"Prefix": "sftp-provider-1/root"
34+
},
35+
"RemoteFolders": {
36+
"Folder": "/home/folder-3",
37+
"Recursive": false
38+
},
39+
"Schedule": "on-demand"
2840
}
2941
]
3042
}

images/architecture_diagram.png

24 KB

images/stepfunctions_graph.png

-13.3 KB

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
[tool.ruff.format]
2+
quote-style = "single"
3+
indent-style = "tab"
