All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Reducing default maxTriggersInMemory to 10_000, but leaving it the same for the production environment. This allows reducing the reserve heap size in non-production environments.
- Introduced twTasks.processing.triggersLimit gauge to show how many triggers we can keep in memory for a bucket.
- Added support for Spring Boot 3.5.
- Updated the Spring Boot 3.4 version to 3.4.6.
- Dropped support for Spring Boot 3.3.
By default, when a task fails with a processing error, we log it with ERROR level and retry it according to the retry policy.
Override the ITaskRetryPolicy#getExceptionHandler method with a custom exception handler to change the default behavior, for example, if you want to log the failure differently or log it only on the last retry.
- Added support to cancel tasks in waiting state.
- Allowed the consumer's partition.assignment.strategy to be overridden in tw-tasks-kafka-listener-spring-boot-starter.
- Nothing to do for most cases.
It is worth keeping an eye on:
- changes to the assignors used (log message: Successfully synced group in generation Generation)
- assignment strategy failures on consumers in prod, and consumer state.
If you use com.wise.kafka.assignors.CanaryAwareRangeAssignor, consider setting this config:
spring.kafka.consumer.properties.partition.assignment.strategy:
com.wise.kafka.assignors.CanaryAwareRangeAssignor, org.apache.kafka.clients.consumer.RangeAssignor, org.apache.kafka.clients.consumer.CooperativeStickyAssignor
- Fixed an issue where a malformed message published to a trigger topic would cause task processing to become stuck.
- Fixed a lock timeout on update queries in tests, when running integration tests in a Transaction.
- introduced by 1.47.0
- Added timer twTasks.tasks.taskGrabbingTime tracking the time between a task being triggered and a task being grabbed for processing.
- Added resetAndInitialize(Collection<IJob> jobs) method to ITestJobsService to allow initializing only specific jobs in tests.
- Added support for Spring Boot 3.4.
- Dropped support for Spring Boot 3.2.
- Added autoInitialize property to enable/disable autoInitialization for job service, preventing registration and resumption when disabled.
- Added startTasksCleaner and startTaskResumer properties to switch on/off task cleaner and task resumer.
- Restoring Spring 4 compatibility.
- Context switches are reduced.
- MDC values are cleared more aggressively.
- When registering cron tasks, log error if job already exists, but task is in error state.
- If silent mode is turned on, then this log will not appear.
- Added support for task context
You will need to do the following migration:
Postgres:
ALTER TABLE tw_task_data
ADD COLUMN task_context_format SMALLINT,
ADD COLUMN task_context BYTEA;
MariaDB:
ALTER TABLE tw_task_data WAIT 2
ADD COLUMN IF NOT EXISTS task_context_format SMALLINT,
ADD COLUMN IF NOT EXISTS task_context BLOB,
ALGORITHM = INSTANT,
LOCK = NONE;
- Support for Spring Boot 3.3.
- Support for Spring Boot 3.1 and 2.7 versions.
- /getTaskTypes endpoint may be disabled through configuration property tw-tasks.core.tasks-management.enable-get-task-types: false. Services with an extreme number of tasks might benefit from this.
- Use static methods to create BeanPostProcessors.
- /getTaskTypes endpoint accepts optional query parameter status to filter only types of tasks in the particular status(es).
- Fixed a bug with taskType and taskSubType filters on query endpoints when multiple values are supplied, where it would consider only one value.
- Add compatibility with Spring Boot 3.2.
- Update dependencies
- Kafka producer instantiation will be attempted up to 5 times with a 500ms delay between each attempt. In some cases, it has been observed that the CI fails to start the Kafka producer because the kafka docker container itself seems to not be fully up & accessible yet.
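The bounded start-up retry described above can be sketched roughly as follows. This is a minimal illustration, not the code tw-tasks uses: the BoundedRetry class and its create method are hypothetical names, and a plain Supplier stands in for the real Kafka producer factory.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of tw-tasks): retries a factory a bounded number of times. */
final class BoundedRetry {
    private BoundedRetry() {
    }

    /** Calls the factory up to maxAttempts times, sleeping delayMillis between failed attempts. */
    static <T> T create(Supplier<T> factory, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The supplier simulates a broker that becomes reachable on the third attempt.
        String producer = create(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("broker not reachable yet");
            }
            return "producer";
        }, 5, 500);
        System.out.println(producer + " created after " + calls[0] + " attempts");
    }
}
```

With 5 attempts and a 500ms delay, a start-up that never succeeds fails after roughly 2 seconds of waiting, which keeps CI feedback reasonably fast.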
- When building a Spring ResponseEntity with an explicit status, provide an integer derived from the HttpStatus enum, rather than providing the HttpStatus directly, to handle binary incompatibility between Spring 5 and 6 causing NoSuchMethod errors when tw-tasks is used with Spring 6.
- Added taskType and taskSubType parameters to management query endpoints.
- Added /getTaskTypes endpoint to retrieve list of registered task types and sub-types.
- NullPointerException in TaskManagementService.getTaskData in case task is not found
- Setting METADATA_MAX_AGE_CONFIG to two minutes for producer
- Monitoring queries for Postgres finding approximate table sizes in the databases were using a wrong schema and thus no records were found.
- Support for Spring Boot 3.1
- Build against Spring Boot 3.0.6 --> 3.0.9
- Build against Spring Boot 2.7.11 --> 2.7.14
- Build against Spring Boot 2.6.14 --> 2.6.15
- Introduced a new configuration parameter tw-tasks.core.no-op-task-types that allows a default no operation task handler to pick up deprecated task types in your service.
- commitSync operation sometimes reporting a WakeupException.
- CronJob annotation for Spring bean's methods
- Circular dependency with graceful shutdown.
- docker-compose on Linux.
- Kafka consumer offset duration is always considered as positive since we cannot reset the offsets to future timestamps.
- Both PT1H and -PT1H are treated the same, i.e. as PT1H. This value is subtracted from the now() timestamp.
- Added second Kafka consumer for the tests in SeekToDurationOnRebalanceListenerIntTest class.
- Updated the docker-compose.yml to make kafka container run as expected.
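The sign-insensitive handling of the offset duration described above can be illustrated with java.time alone. This is a sketch under assumptions: OffsetRewind and rewindTarget are hypothetical names, and the library's actual implementation may differ.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration: the configured duration is always applied as a rewind into the past. */
final class OffsetRewind {
    private OffsetRewind() {
    }

    /** Returns the timestamp to seek to; the sign of the configured duration is ignored. */
    static Instant rewindTarget(Instant now, Duration configured) {
        return now.minus(configured.abs());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T12:00:00Z");
        // Both calls resolve to one hour before "now".
        System.out.println(rewindTarget(now, Duration.parse("PT1H")));
        System.out.println(rewindTarget(now, Duration.parse("-PT1H")));
    }
}
```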
- Support for Spring Boot 3.0.
- Replaced @Validated annotation with custom call to validator. @Validated annotation based approach made services startup slow.
- Improved the graceful shutdown speed to be less than medium delay interval (by default 5s).
- Changed MySqlTaskDao to JdbcTaskDao, because some Postgres users got confused/spooked having "mysql" in their stack trace.
- Support for Spring Boot 2.5.
- Added IPartitionKeyStrategy interface. This interface allows for custom strategies to be implemented by clients that want more control over the partition key generation.
- Add a basic implementation to IPartitionKeyStrategy: RandomPartitionKeyStrategy. This strategy always generates a random partition key (like the previous behaviour).
- Included IPartitionKeyStrategy into SimpleTaskProcessingPolicy.
- The Spring Boot Version from which the library dependencies are derived, was moved from 2.7 to 2.6. This should give better compatibility, as backward compatibility is usually better than forward one.
- Tasks' triggers' offset is committed synchronously, when partitions are revoked.
- Reworked paranoid tasks cleaner to work with latest mariadb drivers.
- Made it compatible with Spring Boot 2.7
- Removing support for Spring Boot 2.4
- Metric kafkaTasksExecutionTriggerer.failedCommitsCount was removed. kafkaTasksExecutionTriggerer.commitsCount got sync and success tags.
- Some initialization logs allowing to understand which lock keys are used.
- ConsistentKafkaConsumer is now committing offsets asynchronously at an interval, by default once every 5 seconds per partition. Notice that tw-tasks-kafka-listener is deprecated.
- ConsistentKafkaConsumer is doing a synchronous commit during revoking of partitions. This makes it much less likely that a node getting those partitions assigned will find duplicates.
- Inserting unique key into database is more consistent.
- Using CooperativeStickyAssignor,RangeAssignor when it is detected that kafka-clients is 3.+.
- Task grabbing is using just implicit transactions.
- Simple and small tw-tasks-jobs-test-spring-boot-starter module to reduce a bit of boilerplate in services testing jobs.
- Using CooperativeStickyAssignor when it is detected that kafka-clients is 3.+.
- Putting back ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName() + "," + RangeAssignor.class.getName() for Kafka consumers. Typically, the tw-tasks consumer group is shared with other kafka consumers in a service, so just CooperativeStickyAssignor would create issues on older kafka-clients.
- Deadline is removed after task processing, allowing follow-up database operations to succeed in any case.
Example case: task processing threw DeadlineExceededException and asking retry time threw it again.
- Spring's 4.x TransactionSynchronization does not have default methods implemented. In order for tw-tasks to work on Spring 4.x, restored TransactionSynchronizationAdapter class.
- tw-tasks-kafka-listener module is not depending on spring-kafka anymore, so it can be used also on older services.
- On offset loss in tw-tasks-kafka-listener, by default, we are rewinding back 1 hour.
- Reduced integration tests suite runtime from approximately 2 minutes to 25 seconds. This was mainly achieved by having different Kafka consumer groups for different things/tests and thus avoiding lengthy stop-the-consume re-balancing pauses. Can be reduced a bit more, but I had this work time-boxed.
- All Kafka consumers and producers register micrometer metrics.
- Small tweaks to consumers and producers configs. Important one is ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName() + "," + RangeAssignor.class.getName()
- tw-tasks-kafka-listener now rewinds 1 hour, when offset is lost. This can be changed via tw-tasks.impl.kafka.listener.autoResetOffsetTo property.
- Removed deprecated coreKafkaListenerTopicsConfiguringEnabled configuration property.
- Migrated CI from Circle to GHA.
- Stuck tasks resumer was hanging due to semaphore not getting released.
- Stuck tasks count metric now also has task status dimension.
- Scheduled and stuck tasks are now resumed concurrently, by default with the parallelism of 10. This eliminates a bottleneck for services relying on large volume of scheduled tasks.
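The two resumer changes above (concurrent resuming with a parallelism limit, and the semaphore that was not getting released) can be pictured with a small sketch. It is an illustration under assumptions, not the TasksResumer code: BoundedResumer and its methods are hypothetical names. The point is that the permit is released in a finally block, so a failing resume action cannot starve later ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: runs resume actions concurrently, bounded by a semaphore. */
final class BoundedResumer {
    private final Semaphore permits;
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    BoundedResumer(int parallelism) {
        this.permits = new Semaphore(parallelism);
    }

    /** Submits every action, blocking while the configured parallelism is already in use. */
    void resumeAll(List<Runnable> resumeActions) {
        for (Runnable action : resumeActions) {
            permits.acquireUninterruptibly();
            try {
                executor.execute(() -> runAndRelease(action));
            } catch (RuntimeException rejected) {
                // The action never started, so hand the permit back ourselves.
                permits.release();
                throw rejected;
            }
        }
    }

    private void runAndRelease(Runnable action) {
        int current = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(current, Math::max);
        try {
            action.run();
        } catch (RuntimeException e) {
            System.err.println("Resuming failed: " + e.getMessage());
        } finally {
            // Always release, even when the action throws; otherwise the resumer eventually hangs.
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    /** Stops accepting work and waits for running actions; returns false on timeout. */
    boolean shutdownAndAwait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int maxObservedParallelism() {
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        BoundedResumer resumer = new BoundedResumer(10);
        List<Runnable> actions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            actions.add(() -> { });
        }
        resumer.resumeAll(actions);
        resumer.shutdownAndAwait(5_000);
        System.out.println("max parallelism observed: " + resumer.maxObservedParallelism());
    }
}
```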
- JDK 11+ is a requirement.
- Opensource facelift.
- Better support for implementing rate-limiting as an ITaskConcurrencyPolicy implementation.
- Checking of some database transactions state is done only when assertions are enabled.
- We don't check if task is in submitted status, when grabbing, by default. The version check is enough. It can be turned on though via a property, it can be useful for tw-task test suites.
- Refactoring and optimizing code around metrics.
- Fixed high CPU usage around TasksProperties, due to @Validated annotations.
- Various small optimizations and library upgrades.
- Support type-level task management configuration tw-tasks.core.tasks-management.type-specific
- Core and tasks triggering system does not depend on Spring Kafka, nor its configuration. Services now have to specify the tw-tasks.core.triggering.kafka.bootstrap-servers parameter.
- Tasks triggering system has its own ObjectMapper instance.
- Node's tasks are resumed on startup by the same logic we resume other stuck tasks. On the startup, the current node tasks in PROCESSING state will be marked to ERROR now. This is a safer default option. For example, when nodeId is wrongly configured and not unique around the whole service cluster, we can easily have already executing task getting wrongly resumed and having it being executed twice at the same time.
- Fixed also start-up race conditions around same-node PROCESSING tasks resuming. It was possible to start processing a task and this same task getting immediately resumed by the start-up logic in the TasksResumer component.
- Increases the tasks grabbing maximum concurrency from 10 to 25 and makes it configurable by a property.
- Fixes Base64 encoder package.
- Fixes of fetching task info and data via management endpoints for tasks with empty data.
- Allows fetching task data as Base64.
- Fixes approximate tables rows counts queries.
- Removes copyDataToTwTaskField property and sets 1.21.1 as minimum upgradable version. We don't write into old data field anymore.
- Task data is still saved into old data field, to allow seamless upgrade process. When upgrade from versions before 1.21.1 is finished, i.e. all cluster nodes have the new version; TasksProperties.copyDataToTwTaskField can be set to false, stopping writing into the old data field.
- TasksProperties.Environment.previousVersion is mandatory.
- Task data is now binary.
- The payload is kept in the storage in compressed format. By default, gzip is used.
- Introduced tw-tasks-core-test-spring-boot-starter for simpler test setup in services.
Migration for MySql.
CREATE TABLE tw_task_data
(
task_id BINARY(16) PRIMARY KEY NOT NULL,
data_format INT NOT NULL,
data LONGBLOB NOT NULL
);
Migration for Postgres.
CREATE TABLE tw_task_data
(
task_id UUID PRIMARY KEY NOT NULL,
data_format INT NOT NULL,
data BYTEA NOT NULL
) WITH (toast_tuple_target = 8160);
ALTER TABLE tw_task_data
ALTER COLUMN data SET STORAGE EXTERNAL;
- Removed deprecated kafka-publisher modules. Tw-tkms has been successfully used in 19 services and is stable now.
- Stuck tasks warning has information and metrics about specific task types.
- Remove AdminClientTopicPartitionsManager and remove configureKafkaTopics. You need to remove the configuration property: tw-tasks.core.configure-kafka-topics.
- Fix AdminClient Jmx registration issue.
- Allowing most beans defined by auto configuration to be overridden.
- MDC corrections. Following MDC keys are now set for tasks under processing:
twTaskId, twTaskVersion, twTaskType, twTaskSubType.
twTaskVersionId is not set anymore.
- Task can now define its TwContext criticality and owner.
- Lots of corrections around entry points creation.
- Optimization and configuration for fetching approximate tasks and unique keys count by cluster wide tasks state monitor. Consult with com.transferwise.tasks.TasksProperties.ClusterWideTasksStateMonitor for added configuration options.
- Minor external libraries upgrades.
- Minor testsuite optimizations.
Some transactions are now using isolation level READ_UNCOMMITTED. If you are using JTA transaction manager, you may have to do two things.
- Wrap your datasource into org.springframework.jdbc.datasource.IsolationLevelDataSourceAdapter
- Set org.springframework.transaction.jta.JtaTransactionManager.setAllowCustomIsolationLevels to true.
Use separate DAOs for Core/Test/Management.
- ITaskDao - data access operations used by the core and extensions.
- IManagementTaskDao - data access operations used by the management extension.
- ITestTaskDao - data access operations used for testing purposes
Users of tw-tasks-core-test need to configure ITestTaskDao in the test configuration as from this version it is required by TestTasksService.
// either
@Bean
public ITestTaskDao postgresTestTaskDao(DataSource dataSource, TasksProperties tasksProperties) {
    return new PostgresTestTaskDao(dataSource, tasksProperties);
}
// or
@Bean
public ITestTaskDao mysqlTestTaskDao(DataSource dataSource, TasksProperties tasksProperties) {
    return new MySqlTestTaskDao(dataSource, tasksProperties);
}
- Partitions manager will log a warn only when a topic is missing or configured number of partitions is different from existing ones.
- Switched away from testcontainers, used docker-compose plugin for all integration tests.
- Removed support for xRequestId.
- Minor bugfixes for approximate tasks count in the database, related to multi schema setups.
- MySQL INSERT IGNORE has additional checks to make sure the failure was about duplicate records and not about something else.
- Added metrics for knowing approximate tasks count in the database.
- We are starting to use sequential UUIDs, which are more suitable for database storage. Gains are especially large and exponential on MariaDb. 1mln tasks 2x speed on db perf test. 2mln tasks 4x speed on db perf test.
Technically we use 38 bit timestamp (millis) prefix on random UUID as implicit task ids.
https://www.informit.com/articles/article.aspx?p=25862 https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/ https://en.wikipedia.org/wiki/Universally_unique_identifier#As_database_keys
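The idea of a timestamp-prefixed random UUID can be sketched as below. The exact bit layout used by tw-tasks is not stated here, so this is an assumption: the top 38 bits of the UUID are overwritten with the low 38 bits of the epoch milliseconds and the remaining bits stay random. TimePrefixedUuid is a hypothetical name.

```java
import java.util.UUID;

/** Hypothetical sketch of a sequential UUID: 38-bit millisecond prefix followed by random bits. */
final class TimePrefixedUuid {
    private static final int TIMESTAMP_BITS = 38;
    private static final int TAIL_BITS = 64 - TIMESTAMP_BITS;
    private static final long TIMESTAMP_MASK = (1L << TIMESTAMP_BITS) - 1;
    private static final long TAIL_MASK = (1L << TAIL_BITS) - 1;

    private TimePrefixedUuid() {
    }

    /** Replaces the top 38 bits of the random UUID with the low 38 bits of epochMillis. */
    static UUID create(long epochMillis, UUID random) {
        long prefix = (epochMillis & TIMESTAMP_MASK) << TAIL_BITS;
        // The low 26 bits of the upper half, including the version nibble, are kept as generated.
        long tail = random.getMostSignificantBits() & TAIL_MASK;
        return new UUID(prefix | tail, random.getLeastSignificantBits());
    }

    static UUID create() {
        return create(System.currentTimeMillis(), UUID.randomUUID());
    }

    public static void main(String[] args) {
        // Ids created in later milliseconds share a growing prefix, so they land close together in an index.
        System.out.println(create());
        System.out.println(create());
    }
}
```

Ids generated close in time end up near each other in a B-tree index, which is where the insert speed-up on large tables comes from. A 38-bit millisecond counter wraps roughly every 8.7 years, so the ordering is only approximate over long periods. Note also that java.util.UUID#compareTo compares the halves as signed longs, so ordering checks in Java should use Long.compareUnsigned on the most significant bits.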
- (id,version) index was removed on Postgres as well, making db perf test to run 25% faster.
- MariaDb schema for new services was redesigned. However, the code is still working and keeps working with older schema as well.
- Another, more optimal table schema was tested and proposed for MariaDb applications which for whatever reasons are forced to use random UUIDs with large number of tw tasks.
- Added a db perf test to demoapp and DemoAppRealTest, which is more suitable to compare database bottlenecks tests.
- When a task is being set to a final state, the next_event_time is set to current time. This will make the task cleaning process more accurate.
- Old tasks are now cleaned by ids only, without checking their versions. This allows multi-value queries to be executed, which should be more efficient.
The previous behaviour can be restored by setting TasksProperties.paranoidTasksCleaning=true.
- Moving away from deprecated LeaderSelector to LeaderSelectorV2.
- Added new metric twTasks.task.addings.count for tracking adding of new tasks.
- Background jobs start and stop messages contain group.id. It allows quickly to understand, if some service is using another service's identifier.
- Upgraded external libraries to latest.
- Optimized a TasksResumer query executed on startup for Postgres. Postgres was likely to decide to not use (status, next_event_time) and do a full scan instead.
- Properties minPriority and maxPriority on tw-tasks.core were renamed to highestPriority and lowestPriority. It will hopefully make it more clear, that lower priority numbers mean higher chance to be executed first.
- Fixes a bug, where using a max priority for a task causes a null pointer exception.
- IKafkaMessageHandler Topics can now specify a shard. Every shard will have its own KafkaConsumer and processing thread. It is useful in scenarios where low latency processing is desired for a specific topic. The downside of multiple shards is having more KafkaConsumers per application, possibly increasing the load on Kafka server.
- tw-leader-selector was upgraded, it now brings in tw-curator. This in turn means, that you don't have to define a CuratorFramework bean in your application, it will be created automatically if missing.
- Optimized some queries for a case where there is enormous number of waiting or stuck tasks.
- Debug metrics are disabled by default.
- We are marking all buckets as dirty, when some concurrency slot frees up. To support cases where multiple buckets have the same concurrency policy.
- Added some debug metrics for tasks processing cycle.
- Removed 1.7.5 and 1.7.4 version from repositories and correctly increased the minor version instead. Because the ClockHolder change may need some minor changes in services test suites.
- Moving away from global ClockHolder to mock the time in tests. In that way we will create less surprises and flakiness for services also needing to mock that global time for other reasons.
- Reducing jobs logs spam in applications test suite.
- TaskHandlerRegistry is initializing handlers list in lazy way to avoid possible circular dependencies in applications.
- ITestTaskService got 2 new methods for controlling the automatic tasks processing: stopProcessing() and resumeProcessing().
Fixing possible race-condition in ClusterWideTasksStateMonitor.
Moved https://github.com/transferwise/tw-tasks-jobs to the main repository in a form of extension that consists of extension core, spring boot starter and test components. The typical tw-tasks-jobs library consumer will replace tw-tasks-jobs dependency with:
implementation("com.transferwise.tasks:tw-tasks-jobs-spring-boot-starter:${twTasksVersion}")
testImplementation("com.transferwise.tasks:tw-tasks-jobs-test:${twTasksVersion}")
Note that com.transferwise.tasks.impl.jobs.JobsAutonfiguration is replaced with
com.transferwise.tasks.ext.jobs.autoconfigure.TwTasksExtJobsAutoConfiguration
The project is split into modules. The tw-tasks-executor artifact is no longer published. From now on there is a core module and related extensions that can be easily switched on and off. The typical library consumer will replace tw-tasks-executor dependency with:
implementation("com.transferwise.tasks:tw-tasks-core-spring-boot-starter:${twTasksVersion}")
implementation("com.transferwise.tasks:tw-tasks-incidents-spring-boot-starter:${twTasksVersion}")
implementation("com.transferwise.tasks:tw-tasks-kafka-listener-spring-boot-starter:${twTasksVersion}")
implementation("com.transferwise.tasks:tw-tasks-management-spring-boot-starter:${twTasksVersion}")
testImplementation("com.transferwise.tasks:tw-tasks-core-test:${twTasksVersion}")
Note that tw-tasks-incidents and tw-tasks-kafka-listener are deprecated and soon will be removed
- Build alerting based on exposed metrics instead of using tw-tasks-incidents
- Use spring-kafka or another kafka library instead of using tw-tasks-kafka-listener
ExponentialTaskRetryPolicy is now handling arithmetic overflows. But for that, the multiplier was refactored from double to integer.
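One way to make an exponential delay calculation overflow-safe with an integer multiplier is sketched below. It is not the ExponentialTaskRetryPolicy implementation: OverflowSafeBackoff, its delay method and the parameter names are hypothetical, and clamping to a maximum delay is an assumption about how an overflow would be handled.

```java
import java.time.Duration;

/** Hypothetical sketch: exponential delay with an integer multiplier that cannot silently overflow. */
final class OverflowSafeBackoff {
    private OverflowSafeBackoff() {
    }

    /** Returns base * multiplier^retryCount, clamped to max, also when the product overflows a long. */
    static Duration delay(Duration base, int multiplier, int retryCount, Duration max) {
        if (multiplier < 1 || retryCount < 0) {
            throw new IllegalArgumentException("multiplier must be >= 1 and retryCount >= 0");
        }
        long millis = base.toMillis();
        try {
            for (int i = 0; i < retryCount; i++) {
                // multiplyExact throws ArithmeticException instead of wrapping around.
                millis = Math.multiplyExact(millis, (long) multiplier);
            }
        } catch (ArithmeticException overflow) {
            return max;
        }
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(1);
        Duration max = Duration.ofHours(1);
        System.out.println(delay(base, 2, 3, max));
        // With plain long arithmetic this product would wrap around long before 1000 retries.
        System.out.println(delay(base, 2, 1000, max));
    }
}
```

An integer multiplier keeps the whole calculation in exact long arithmetic, which is what makes the overflow detectable; with a double multiplier the result would instead lose precision or become infinite.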
TwTasksManagement API has a getTask endpoint.
You can now secure all tasks management endpoints by specifying TasksProperties.TasksManagement.viewTaskDataRoles