Zubair Nabi edited this page May 20, 2015 · 11 revisions
To prepare the dataset for the benchmark applications, first compile StormEmailBenchmark, then use the CoalesceEnronDataset class to combine all emails from the downloaded Enron maildir into a single file.
To generate the full dataset:
java -cp target/storm-email-benchmark-1.0-jar-with-dependencies.jar com.ibm.streamsx.storm.email.benchmark.testing.CoalesceEnronDataset <path_to_downloaded_data>/enron_mail_20110402/maildir <output_file_path_and_name_with_ext> no
To generate the 25% dataset:
java -cp target/storm-email-benchmark-1.0-jar-with-dependencies.jar com.ibm.streamsx.storm.email.benchmark.testing.CoalesceEnronDataset <path_to_downloaded_data>/enron_mail_20110402/maildir <output_file_path_and_name_with_ext> yes
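The two commands above differ only in their final argument: `no` for the full dataset and `yes` for the 25% dataset. A small wrapper can make that explicit. The sketch below is a dry run that only prints the command it would execute. The function name `coalesce_cmd`, the example paths, and the output file names are hypothetical, and it assumes the jar has already been built and that paths contain no spaces.

```shell
#!/bin/sh
# Jar and main class exactly as they appear in the commands above.
JAR="target/storm-email-benchmark-1.0-jar-with-dependencies.jar"
MAIN="com.ibm.streamsx.storm.email.benchmark.testing.CoalesceEnronDataset"

# Hypothetical helper: prints the CoalesceEnronDataset command line.
#   $1 = path to the extracted maildir
#   $2 = output file path and name, with extension
#   $3 = "no" for the full dataset, "yes" for the 25% dataset
coalesce_cmd() {
    maildir="$1"
    output="$2"
    subset="$3"
    case "$subset" in
        yes|no) ;;
        *) echo "third argument must be 'yes' or 'no'" >&2; return 1 ;;
    esac
    printf 'java -cp %s %s %s %s %s\n' "$JAR" "$MAIN" "$maildir" "$output" "$subset"
}

# Dry run with placeholder paths: print the full and the 25% commands.
coalesce_cmd /data/enron_mail_20110402/maildir enron_full.txt no
coalesce_cmd /data/enron_mail_20110402/maildir enron_25.txt yes
```

To run the command instead of printing it, the output of `coalesce_cmd` can be piped to `sh` from the project root, where the `target/` directory lives.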
Next Steps:
[Create dataset for Apache Storm benchmark](Create dataset for Apache Storm benchmark)
[Create dataset for InfoSphere Streams benchmark](Create dataset for InfoSphere Streams benchmark)