What happened?
When CassandraIO is used to read all rows from a fairly large Cassandra cluster (~50 nodes, > 2 TB)
and a timeout exception occurs, the affected set of rows is never read: CassandraIO only logs the error and proceeds.
Root Cause
CassandraIO ReadAll does not let a pipeline handle or retry exceptions.
By contrast, JdbcIO throws the exception, which the Dataflow runner then retries on other nodes.
Ideally, there should be a way to plug in an exception handler to handle such corner cases in production.
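
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours and of what a pluggable handler could look like. It does not use Beam or the Cassandra driver, and it is not the actual CassandraIO code: `ReadErrorHandlingSketch`, `ReadErrorHandler`, `LOG_AND_CONTINUE`, `RETHROW` and `readAll` are all hypothetical names invented for illustration, and the `query` function stands in for a per-token-range Cassandra read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Apache Beam.
final class ReadErrorHandlingSketch {

  /** Hypothetical hook: decides what happens when reading one token range fails. */
  @FunctionalInterface
  interface ReadErrorHandler {
    void onError(String tokenRange, RuntimeException error);
  }

  /** Mirrors the behaviour described in this issue: report the failure and move on. */
  static final ReadErrorHandler LOG_AND_CONTINUE =
      (range, error) ->
          System.err.println("Failed to read " + range + ": " + error.getMessage());

  /** Propagates the failure so the caller (for example a runner) can retry the work. */
  static final ReadErrorHandler RETHROW =
      (range, error) -> {
        throw error;
      };

  /**
   * Reads every token range with the given query function (a stand-in for a
   * Cassandra read) and delegates any failure to the supplied handler.
   */
  static List<String> readAll(
      List<String> tokenRanges,
      Function<String, List<String>> query,
      ReadErrorHandler handler) {
    List<String> rows = new ArrayList<>();
    for (String range : tokenRanges) {
      try {
        rows.addAll(query.apply(range));
      } catch (RuntimeException e) {
        // With LOG_AND_CONTINUE the rows of this range are silently lost;
        // with RETHROW the exception leaves this method and nothing is hidden.
        handler.onError(range, e);
      }
    }
    return rows;
  }
}
```

With `LOG_AND_CONTINUE`, a timeout on one range yields a result that is silently missing that range's rows, which is the data loss reported here. With `RETHROW`, the same timeout surfaces to the caller, which is the behaviour that lets a runner retry the failed work.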
Ref in Code
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components