Zubair Nabi edited this page May 20, 2015 · 8 revisions
In this step, you will create the dataset file necessary for the StormEmailBenchmark.
Before you begin:
Make sure you have completed the [Preprocess Enron Email Dataset](Preprocess-Enron-Email-Dataset) step.
Three different datasets can be generated. The code that generates all three is in the package `com.ibm.streamsx.storm.email.benchmark.testing`.
- Compressed and serialized (used for the main application benchmark)
  - Generated by `CreateDatasetSequential`
  - Used with the topologies `EnronTopology`, `BareboneTopology`, and `TrivialTopology1`
- Compressed and unserialized
  - Generated by `CreateCompressedDatasetSequential`
  - Used with the topology `TrivialTopology2`
- Uncompressed and serialized
  - Generated by `CreateSerializedDatasetSequential`
  - Used with the topology `RestrictedTopology`
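To make the difference between the three formats concrete, here is a minimal Java sketch of how one email record could be encoded in each variant. It is an illustration under stated assumptions, not the generators' actual code: JDK serialization and GZIP are used as stand-ins for whatever serializer and codec the benchmark really uses, and the `Email` record type and its tab-separated plain-text layout are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DatasetFormatSketch {

    // Hypothetical stand-in for one preprocessed Enron email.
    static final class Email implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String from;
        final String subject;
        final String body;

        Email(String id, String from, String subject, String body) {
            this.id = id;
            this.from = from;
            this.subject = subject;
            this.body = body;
        }
    }

    // "Serialized": the object is turned into bytes by a serializer
    // (JDK serialization here, purely as an assumption).
    static byte[] serialize(Email e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static Email deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Email) in.readObject();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // "Unserialized": plain text, here a hypothetical tab-separated line.
    static byte[] plainText(Email e) {
        return String.join("\t", e.id, e.from, e.subject, e.body)
                .getBytes(StandardCharsets.UTF_8);
    }

    // "Compressed": a codec (GZIP here, as an assumption) applied to the bytes.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // The three dataset variants listed above.
    static byte[] compressedSerialized(Email e)   { return gzip(serialize(e)); }
    static byte[] compressedUnserialized(Email e) { return gzip(plainText(e)); }
    static byte[] uncompressedSerialized(Email e) { return serialize(e); }

    public static void main(String[] args) {
        Email e = new Email("1", "alice@example.com", "Q3 numbers", "See attached. ".repeat(50));
        System.out.println("compressed + serialized:   " + compressedSerialized(e).length + " bytes");
        System.out.println("compressed + unserialized: " + compressedUnserialized(e).length + " bytes");
        System.out.println("uncompressed + serialized: " + uncompressedSerialized(e).length + " bytes");
    }
}
```

A consumer reading each variant has to undo the steps in reverse order (decompress, then deserialize), which is why each topology is paired with exactly one dataset format.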
Each generator takes the output of the preprocessing stage as its input, and all three accept similar arguments.
For instance, to generate the compressed and serialized dataset:
```shell
java -cp target/storm-email-benchmark-1.0-jar-with-dependencies.jar com.ibm.streamsx.storm.email.benchmark.testing.CreateDatasetSequential <input_path: the output of CoalesceEnronDataset> <output_file_path_and_filename_with_ext>
```
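If you need all three datasets, the command above can be repeated once per generator class. The following Java launcher is a minimal sketch of that, assuming (not confirmed here, since only "similar" arguments are documented) that `CreateCompressedDatasetSequential` and `CreateSerializedDatasetSequential` take the same `<input_path> <output_file>` pair as `CreateDatasetSequential`. The class name `GenerateAllDatasets` and the output file names are hypothetical; run it from the project root so the relative jar path resolves.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

public class GenerateAllDatasets {
    // Jar and package names taken from the command above.
    static final String JAR = "target/storm-email-benchmark-1.0-jar-with-dependencies.jar";
    static final String PKG = "com.ibm.streamsx.storm.email.benchmark.testing.";

    // Assumption: every generator takes <input_path> <output_file>, like CreateDatasetSequential.
    static List<String> command(String generator, String inputPath, String outputFile) {
        return List.of("java", "-cp", JAR, PKG + generator, inputPath, outputFile);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: GenerateAllDatasets <output_of_CoalesceEnronDataset> <output_dir>");
            System.exit(1);
        }
        // Generator class -> hypothetical output file name.
        String[][] jobs = {
            {"CreateDatasetSequential", "compressed-serialized.dat"},
            {"CreateCompressedDatasetSequential", "compressed-unserialized.dat"},
            {"CreateSerializedDatasetSequential", "uncompressed-serialized.dat"},
        };
        for (String[] job : jobs) {
            String output = Paths.get(args[1], job[1]).toString();
            Process p = new ProcessBuilder(command(job[0], args[0], output))
                    .inheritIO() // stream the generator's output to this console
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException(job[0] + " exited with status " + exit);
            }
        }
    }
}
```

The generators run one after another rather than in parallel, which keeps disk contention from skewing the run and stops at the first failing generator.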