EVF Tutorial Scan Framework Creator
The steps thus far give us a working, converted log batch reader. There are many improvements we can make, but first let's try out our creation. Recall that the batch reader has thus far been an "orphan": nothing calls it. Let's fix that.
Prior to the EVF, Easy format plugins were based on the original ScanBatch. At the start of execution, the plugin creates a set of record readers which are passed to the scan batch. With EVF, we use the new scan framework. The new framework focuses on batches, and uses a new type of reader called a "batch reader." We provide a factory method to create batch readers on the fly. The batch reader itself does what the old record reader used to do, but using the EVF.
EVF is a general-purpose framework: it handles many kinds of scans. We must customize it for each specific reader. Rather than doing so by creating subclasses, we instead assemble the pieces we need through composition by providing a framework builder class. This builder class is what allows the Easy framework to operate with both "legacy" and EVF-based readers.
In fact, if you are especially cautious, you can leverage the framework builder mechanism to offer both the new and old versions of your reader, as was done with the "v2" and "v3" versions of the text (CSV) reader in Drill 1.16.
The batch creator is a "factory" class that selects a framework to use, then customizes that framework by setting a number of properties specific to our plugin. The Easy framework sets some additional generic properties for us.
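The division of labor described above can be sketched with plain Java stand-ins. Nothing here is Drill code: BuilderSketch, CreatorSketch, LogCreatorSketch, and the projection property are hypothetical names, chosen only to show how a plugin-specific class fills in a builder while a shared base class adds generic settings afterward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; none of these types are Drill classes.
class CompositionSketch {

  // Plays the role of a scan framework builder: a bag of settings.
  static class BuilderSketch {
    String readerFactoryName;
    String nullType;
    final List<String> projection = new ArrayList<>();
  }

  // Plays the role of the shared base class provided by the framework.
  static abstract class CreatorSketch {
    // Plugin-specific part: each plugin supplies its own settings.
    protected abstract BuilderSketch frameworkBuilder();

    // Generic part: the base class adds settings common to all plugins.
    // "projection" is only an assumed example of such a setting.
    final BuilderSketch build(List<String> projectedColumns) {
      BuilderSketch builder = frameworkBuilder();
      builder.projection.addAll(projectedColumns);
      return builder;
    }
  }

  // Plays the role of a plugin's nested creator class.
  static class LogCreatorSketch extends CreatorSketch {
    @Override
    protected BuilderSketch frameworkBuilder() {
      BuilderSketch builder = new BuilderSketch();
      builder.readerFactoryName = "log-reader-factory";
      builder.nullType = "nullable VARCHAR";
      return builder;
    }
  }

  public static void main(String[] args) {
    BuilderSketch b = new LogCreatorSketch().build(Arrays.asList("year", "month"));
    System.out.println(b.readerFactoryName + ", " + b.nullType + ", " + b.projection);
  }
}
```

The plugin never subclasses the framework itself; it only hands back a configured builder, which is what lets one base class serve many different readers.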
Let's add a nested class within the LogFormatPlugin class:
private static class LogScanBatchCreator extends ScanFrameworkCreator {
  private final LogFormatPlugin logPlugin;

  public LogScanBatchCreator(LogFormatPlugin plugin) {
    super(plugin);
    logPlugin = plugin;
  }

  @Override
  protected FileScanBuilder frameworkBuilder(
      EasySubScan scan) throws ExecutionSetupException {
    FileScanBuilder builder = new FileScanBuilder();

    // Tell the framework how to create our batch readers.
    // LogReaderFactory is the nested class defined below.
    builder.setReaderFactory(new LogReaderFactory(logPlugin));

    // The default type of regex columns is nullable VarChar,
    // so let's use that as the missing column type.
    builder.setNullType(Types.optional(MinorType.VARCHAR));

    // Pass along the output schema, if any
    builder.setOutputSchema(scan.getSchema());
    return builder;
  }
}
Our class extends ScanFrameworkCreator, which integrates with the Easy plugin framework. We hold onto the log format plugin for later use.
The main show is the frameworkBuilder() method which allows us to configure our preferred framework options.
The log reader reads from a file, so we use the FileScanBuilder class. If we wanted, we could also support the special columns column, which reads fields into an array as the CSV reader does.
We specify the factory that creates our batch readers by calling setReaderFactory(). We'll define the actual class shortly.
Next we call setNullType() to define a type to use for missing columns rather than the traditional nullable INT. We observe that the native type of a regex column is nullable VarChar. So, if the user asks for a column that we don't have, we should use that same type so that the column's type remains unchanged when the user later decides to define that column.
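The reasoning behind the null type can be illustrated with a small self-contained sketch. It is not Drill code: resolve() and the string type names are hypothetical stand-ins for the projection logic, under the assumption that a projected column the reader does not supply is filled in with the configured null type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of missing-column handling; not Drill code.
class NullTypeSketch {

  // Returns the output schema (column name -> type) for a projection list.
  // Columns the reader provides keep the reader's type; all others are
  // "missing" and receive the configured null type.
  static Map<String, String> resolve(List<String> projected,
                                     Map<String, String> readerColumns,
                                     String nullType) {
    Map<String, String> schema = new LinkedHashMap<>();
    for (String name : projected) {
      schema.put(name, readerColumns.getOrDefault(name, nullType));
    }
    return schema;
  }

  public static void main(String[] args) {
    List<String> projected = Arrays.asList("year", "level");

    // At first, the regex defines only "year"; "level" is missing.
    Map<String, String> before = new LinkedHashMap<>();
    before.put("year", "nullable VARCHAR");

    // Later, the user adds a regex group for "level".
    Map<String, String> after = new LinkedHashMap<>(before);
    after.put("level", "nullable VARCHAR");

    // Fill-in type of nullable INT: "level" changes type once defined.
    System.out.println(resolve(projected, before, "nullable INT"));
    System.out.println(resolve(projected, after, "nullable INT"));

    // Fill-in type of nullable VARCHAR: same type before and after.
    System.out.println(resolve(projected, before, "nullable VARCHAR"));
    System.out.println(resolve(projected, after, "nullable VARCHAR"));
  }
}
```

With nullable INT as the fill-in type, the level column is an INT in the first query and a VARCHAR in the second. With nullable VARCHAR, both queries produce the same schema.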
Next in the chain is a class to create our batch readers. Recall that EVF creates readers on the fly rather than up-front as in the legacy implementation. So, we need to provide the class that does the batch reader creation, again as a nested class within LogFormatPlugin:
private static class LogReaderFactory extends FileReaderFactory {
  private final LogFormatPlugin plugin;

  public LogReaderFactory(LogFormatPlugin plugin) {
    this.plugin = plugin;
  }

  // Called by EVF each time it is ready to read another file split.
  @Override
  public ManagedReader<? extends FileSchemaNegotiator> newReader(
      FileSplit split) {
    return new LogBatchReader(split, plugin.getConfig());
  }
}
This is simple enough: EVF calls the newReader() method when it is ready to read the next file split. (The split names a file and, if we said the plugin is block splittable, it also names a block offset and length.)
We are free to create the log batch reader any way we like: the constructor is up to us. We already trimmed it down earlier, so we simply use that constructor here.
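To make the "on the fly" behavior concrete, here is a minimal sketch of a scan loop that asks a factory for one reader per split, and only when it is ready to read that split. Every type here (SplitSketch, ReaderSketch, ReaderFactorySketch) is a hypothetical stand-in; the actual EVF scan operator is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins that mimic lazy reader creation; not Drill code.
class LazyReaderSketch {

  // A file name plus a block offset and length, as a split would carry.
  static class SplitSketch {
    final String path;
    final long start;
    final long length;

    SplitSketch(String path, long start, long length) {
      this.path = path;
      this.start = start;
      this.length = length;
    }
  }

  interface ReaderSketch {
    // Returns true if a batch was produced, false at end of data.
    boolean next();
  }

  interface ReaderFactorySketch {
    ReaderSketch newReader(SplitSketch split);
  }

  // Drives the scan: one reader per split, created just before it is used.
  static void runScan(List<SplitSketch> splits, ReaderFactorySketch factory) {
    for (SplitSketch split : splits) {
      ReaderSketch reader = factory.newReader(split);
      while (reader.next()) {
        // A real scan would pass the batch downstream here.
      }
    }
  }

  // Runs a two-split scan and returns the sequence of events observed.
  static List<String> demo() {
    List<String> events = new ArrayList<>();
    // Each reader created by this factory yields exactly one batch.
    ReaderFactorySketch factory = split -> {
      events.add("create reader for " + split.path);
      boolean[] done = {false};
      return () -> {
        if (done[0]) {
          return false;
        }
        done[0] = true;
        events.add("read batch from " + split.path);
        return true;
      };
    };
    runScan(Arrays.asList(
        new SplitSketch("a.log", 0, 100),
        new SplitSketch("b.log", 0, 200)), factory);
    return events;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

The recorded events interleave creation and reading: the reader for the second split is not created until the first reader has reported end of data.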
All the work we've done thus far is still an "orphan": nothing calls it. We're finally ready to change that. By default, the Easy plugin creates the scan operator the legacy way; that's why the old plugin worked. To trigger an EVF-based scan, we return our scan batch creator:
@Override
protected ScanBatchCreator scanBatchCreator(OptionManager options) {
  return new LogScanBatchCreator(this);
}
We mentioned before that we could be cautious and let the user choose between the old and new versions. Here's how that looks in the text reader:
@Override
protected ScanBatchCreator scanBatchCreator(OptionManager options) {
  // Create the "legacy", "V2" reader or the new "V3" version based on
  // the result set loader. This code should be temporary: the two
  // readers provide identical functionality for the user; only the
  // internals differ.
  if (options.getBoolean(ExecConstants.ENABLE_V3_TEXT_READER_KEY)) {
    return new TextScanBatchCreator(this);
  } else {
    return new ClassicScanBatchCreator(this);
  }
}
Here, TextScanBatchCreator is the text reader's equivalent of the creator we just wrote, and ClassicScanBatchCreator is a generic creator that does things the old-fashioned way.
With this method in place, our new version is "live". You should use your unit tests to step through the new code to make sure it works -- and to ensure you understand the EVF, or at least the parts you need.
We've now completed a "bare bones" conversion to the new framework. We'd be fine if we stopped here.
The new framework offers additional features that can further simplify the log format plugin. We'll look at those topics in the next section.