ninadshr/spark-nested

# Spark Nesting Examples

This repo provides quick samples of how to handle complex (nested) types in Spark, using both its API and SQL.

It contains a sample use case of customer transactions for a typical online shopping model. Creation scripts for the tables below are included in the code (DataCreator.scala):

  • customer_details
  • customer_payments
  • customer_transactions
  • transaction_details
  • shipment_details

Sample entries for these entities are also provided with the code base.
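
As a rough illustration of how one of these CSV data sets could be made queryable, the sketch below reads a directory of CSV files into a DataFrame and registers it as a temporary view. It is not taken from DataCreator.scala, which may define its tables differently. The object and function names (`CsvLoadSketch`, `loadCsvAsView`) are hypothetical, the path `/data/customer_details` is the HDFS location used in the setup steps of this README, and the presence of a header row in the CSV files is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvLoadSketch {

  // Read one CSV directory into a DataFrame and expose it to SQL under viewName.
  def loadCsvAsView(spark: SparkSession, path: String, viewName: String): DataFrame = {
    val df = spark.read
      .option("header", "true")      // assumption: the first line holds column names
      .option("inferSchema", "true") // let Spark guess the column types
      .csv(path)
    df.createOrReplaceTempView(viewName)
    df
  }

  def main(args: Array[String]): Unit = {
    // The master is not set here; it is expected to come from spark2-submit.
    val spark = SparkSession.builder().appName("csv-load-sketch").getOrCreate()

    val customers = loadCsvAsView(spark, "/data/customer_details", "customer_details")
    customers.printSchema()
    spark.sql("SELECT COUNT(*) AS row_count FROM customer_details").show()

    spark.stop()
  }
}
```

If the files have no header row, dropping the `header` option and passing an explicit schema to `spark.read.schema(...)` would be the safer choice.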

NestingExamples.scala contains examples for:

  • Inserting data into a nested customer table that holds each customer's payment information.
  • Inserting data into a nested transactions table that holds shipping details.
  • Fetching data from the nested tables above.
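
The general pattern behind these examples can be sketched as follows. This is not the code from NestingExamples.scala; it is a minimal illustration, assuming hypothetical column names (`customer_id`, `name`, `payment_id`, `card_type`) and made-up sample rows, of one common way to nest child rows into an `array<struct>` column with the DataFrame API and to read them back with both the API and SQL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, explode, struct}

object NestingSketch {

  // Nest each customer's payment rows into a single array<struct> column.
  def nestPayments(customers: DataFrame, payments: DataFrame): DataFrame = {
    val paymentsByCustomer = payments
      .groupBy("customer_id")
      .agg(collect_list(struct(col("payment_id"), col("card_type"))).as("payments"))
    // A left outer join keeps customers without payments; their "payments" is null.
    customers.join(paymentsByCustomer, Seq("customer_id"), "left_outer")
  }

  // Flatten the nested column back to one row per (customer, payment).
  // Note: explode drops rows whose array is null or empty.
  def flattenPayments(nested: DataFrame): DataFrame =
    nested
      .select(col("customer_id"), col("name"), explode(col("payments")).as("payment"))
      .select(col("customer_id"), col("name"),
        col("payment.payment_id"), col("payment.card_type"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nesting-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, not the CSV data shipped with the repo.
    val customers = Seq((1, "Alice"), (2, "Bob")).toDF("customer_id", "name")
    val payments = Seq((10, 1, "VISA"), (11, 1, "AMEX"), (12, 2, "VISA"))
      .toDF("payment_id", "customer_id", "card_type")

    val nested = nestPayments(customers, payments)
    nested.printSchema()

    // The same read expressed in SQL, using LATERAL VIEW explode.
    nested.createOrReplaceTempView("customers_nested")
    spark.sql(
      """SELECT customer_id, name, p.payment_id, p.card_type
        |FROM customers_nested
        |LATERAL VIEW explode(payments) exploded AS p""".stripMargin).show()

    flattenPayments(nested).show()
    spark.stop()
  }
}
```

Keeping the payments inside the customer row avoids a join at read time, at the cost of an explode whenever individual payments are needed.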

This application is written using Spark 2.0 with Scala 2.11. To set up and run this code on a CDH cluster, follow the steps below:

Step 1: Upload all the CSV data files included in src/main/resources to your HDFS gateway node.

Step 2: Run these commands from that node:


hdfs dfs -mkdir -p /data/customer_details
hdfs dfs -mkdir -p /data/customer_payments
hdfs dfs -mkdir -p /data/customer_transactions
hdfs dfs -mkdir -p /data/shipment_details
hdfs dfs -mkdir -p /data/transaction_details

hdfs dfs -put customer_details.csv /data/customer_details
hdfs dfs -put customer_payments.csv /data/customer_payments
hdfs dfs -put customer_transactions.csv /data/customer_transactions
hdfs dfs -put shipment_details.csv /data/shipment_details
hdfs dfs -put transaction_details.csv /data/transaction_details

Step 3: Compile the code with the mvn clean install command. Upload the jar created in the target directory to your Spark2 client gateway.

Step 4: Run the code on the CDH cluster, for example:

spark2-submit --class org.spark.nested.NestedDriver --master yarn spark-nested-1.0-SNAPSHOT.jar yarn

To run locally

Keep all data files in a local path and point every table location in DataCreator.scala to that path. Then compile the code with mvn clean install and run it with:

spark2-submit --class org.spark.nested.NestedDriver --master "local[*]" spark-nested-1.0-SNAPSHOT.jar
