GitHub - ninadshr/spark-nested

# Spark Nesting Examples

This repo provides short samples showing how to handle complex (nested) types in Spark using both its API and SQL.

It contains a sample use case of customer transactions for a typical online shopping model. Creation scripts for the following tables are included in the code (DataCreator.scala):

* customer_details
* customer_payments
* customer_transactions
* transaction_details
* shipment_details

Sample entries for these entities are also provided with the code base.

The code snippets in NestingExamples.scala include examples for:

* Inserting data for the nested customer table with their payment information.
* Inserting data for nested transactions with shipping details.
* Fetching data from the above nested tables.
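
As a rough illustration of the nesting and fetching patterns listed above, here is a minimal sketch. It is not the code from NestingExamples.scala: the object name `NestingSketch`, the column names (`customer_id`, `name`, `payment_id`, `card_type`) and the in-memory sample rows are all assumptions made for the example. It nests payments under each customer with `collect_list(struct(...))`, then reads them back with `explode` in both API and SQL form.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, explode, struct}

object NestingSketch {

  // Nest each customer's payment rows into one array<struct> column.
  // Column names are illustrative assumptions, not the repo's schema.
  def nestPayments(customers: DataFrame, payments: DataFrame): DataFrame = {
    val grouped = payments
      .groupBy("customer_id")
      .agg(collect_list(struct(col("payment_id"), col("card_type"))).as("payments"))
    customers.join(grouped, Seq("customer_id"), "left_outer")
  }

  // Flatten the nested column back to one row per payment (API form).
  def flattenPayments(nested: DataFrame): DataFrame =
    nested
      .select(col("customer_id"), col("name"), explode(col("payments")).as("payment"))
      .select("customer_id", "name", "payment.payment_id", "payment.card_type")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nesting-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the CSV data.
    val customers = Seq((1, "Ann"), (2, "Bob")).toDF("customer_id", "name")
    val payments = Seq((10, 1, "visa"), (11, 1, "amex"), (12, 2, "visa"))
      .toDF("payment_id", "customer_id", "card_type")

    val nested = nestPayments(customers, payments)
    nested.printSchema()
    flattenPayments(nested).show()

    // The same fetch in SQL form, using LATERAL VIEW explode.
    nested.createOrReplaceTempView("customers_nested")
    spark.sql(
      """SELECT customer_id, p.payment_id, p.card_type
        |FROM customers_nested
        |LATERAL VIEW explode(payments) t AS p""".stripMargin).show()

    spark.stop()
  }
}
```

Note that `explode` drops customers whose `payments` array is null, so a customer with no payments appears in the nested table but not in the flattened result.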

This application is written using Spark 2.0 with Scala 2.11. To set up and run this code on a CDH cluster, follow the steps below:

Step 1: Upload all the CSV data files included in src/main/resources to your HDFS gateway node.

Step 2: Run these commands from that node:

    hdfs dfs -mkdir -p /data/customer_details
    hdfs dfs -mkdir -p /data/customer_payments
    hdfs dfs -mkdir -p /data/customer_transactions
    hdfs dfs -mkdir -p /data/shipment_details
    hdfs dfs -mkdir -p /data/transaction_details

    hdfs dfs -put customer_details.csv /data/customer_details
    hdfs dfs -put customer_payments.csv /data/customer_payments
    hdfs dfs -put customer_transactions.csv /data/customer_transactions
    hdfs dfs -put shipment_details.csv /data/shipment_details
    hdfs dfs -put transaction_details.csv /data/transaction_details

Step 3: Compile the code with mvn clean install. Upload the jar created in the target directory to your Spark2 client gateway.

Step 4: Sample command to run this code on a CDH cluster:

    spark2-submit --class org.spark.nested.NestedDriver --master yarn spark-nested-1.0-SNAPSHOT.jar yarn

To run locally

Keep all the data files in a local path and point all table locations in DataCreator.scala to that path. Once done, compile the code using mvn clean install and run it with:

    spark2-submit --class org.spark.nested.NestedDriver --master yarn spark-nested-1.0-SNAPSHOT.jar
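
One way to make the switch between HDFS and local paths less error-prone is to derive every table location from a single base directory. The sketch below is only an illustration of that idea: `TableLocations` and `locations` are hypothetical names, and DataCreator.scala may organise its paths differently.

```scala
object TableLocations {

  // Table names taken from the list in this README.
  val tables: Seq[String] = Seq(
    "customer_details",
    "customer_payments",
    "customer_transactions",
    "transaction_details",
    "shipment_details")

  // Map each table name to a directory under baseDir (hypothetical helper).
  def locations(baseDir: String): Map[String, String] = {
    val root = baseDir.stripSuffix("/") // tolerate a trailing slash
    tables.map(t => t -> s"$root/$t").toMap
  }

  def main(args: Array[String]): Unit = {
    // Defaults to the HDFS layout used in Step 2; pass a local directory to override.
    val baseDir = args.headOption.getOrElse("/data")
    locations(baseDir).foreach { case (table, path) => println(s"$table -> $path") }
  }
}
```

With a layout like this, running locally would mean changing only the base directory rather than editing each table location by hand.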
