HIVE_Big_Data

MovieLens Genre Analytics Using Hive on Hadoop

Objective

Analyze the MovieLens dataset with Hive on Hadoop, running in the Cloudera QuickStart VM, to identify the most popular genres and extract insights relevant to streaming platforms such as Netflix or Prime Video.

Tech Stack

  • Apache Hive
  • Hadoop HDFS
  • Cloudera QuickStart VM
  • Linux Shell
  • Excel / Matplotlib for visualization

Key Features

  • Genre-wise popularity using Hive explode and split
  • Data stored and queried in HDFS
  • Business insights for recommendation engines
  • Visualization charts
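
To make the explode-and-split step concrete, here is a minimal Python sketch of the same logic: each pipe-separated genres string is split into a list, and each list element is counted as its own record, which is roughly what `explode` followed by a `GROUP BY` does in Hive. The sample rows are made up for illustration, and counting movies per genre is only one possible reading of "popularity"; the project's actual Hive queries may differ.

```python
from collections import Counter

# Made-up sample rows in the movies.csv layout (movieId, title, genres).
# Genres are pipe-separated, as in MovieLens.
SAMPLE_ROWS = [
    (1, "Movie A", "Adventure|Comedy"),
    (2, "Movie B", "Comedy|Romance"),
    (3, "Movie C", "Comedy"),
]


def genre_counts(rows):
    """Count movies per genre from (movieId, title, genres) rows."""
    counts = Counter()
    for _movie_id, _title, genres in rows:
        # split: turn "Adventure|Comedy" into ["Adventure", "Comedy"]
        for genre in genres.split("|"):
            # explode + GROUP BY: one count per (movie, genre) pair
            counts[genre] += 1
    return counts


if __name__ == "__main__":
    # List genres from most to least frequent.
    for genre, n in genre_counts(SAMPLE_ROWS).most_common():
        print(genre, n)
```

A movie with several genres contributes one count to each of them, so the per-genre totals add up to more than the number of movies.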

Project Structure

  • datasets/: Input CSV files
  • Hive_Queries/: All Hive scripts
  • visualizations/: Charts generated from the query output, plus Hive CLI output as proof of execution

Report

See Movie analytics.docx for the full write-up.

How to Run

  1. Set up the Cloudera QuickStart VM
  2. Load movies.csv into HDFS
  3. Create an external Hive table over the HDFS directory that holds the file
  4. Run the queries from Hive_Queries/
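
Steps 3 and 4 can be sketched as follows. This hypothetical Python helper only builds the HiveQL text; it does not run anything. It assumes the standard MovieLens movies.csv header (movieId, title, genres). The table name `movies` and the HDFS directory `/user/cloudera/movielens/movies` are placeholders, not values taken from this project, and the scripts in the repository may be written differently.

```python
# Hypothetical defaults: neither the table name nor the HDFS directory
# comes from the project; adjust both to your own setup.
def external_table_ddl(table="movies", location="/user/cloudera/movielens/movies"):
    """Return HiveQL for an external table over the directory holding movies.csv."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  movieId STRING,\n"
        "  title STRING,\n"
        "  genres STRING\n"
        ")\n"
        # OpenCSVSerde copes with quoted titles that contain commas;
        # it reads every column as STRING.
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        # Skip the CSV header row.
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


def genre_count_query(table="movies"):
    """Return HiveQL that counts movies per genre with split and explode."""
    return (
        "SELECT genre, COUNT(*) AS movie_count\n"
        f"FROM {table}\n"
        # split() takes a regex, so the pipe is escaped; the raw string
        # keeps both backslashes for Hive's string literal.
        r"LATERAL VIEW explode(split(genres, '\\|')) g AS genre" "\n"
        "GROUP BY genre\n"
        "ORDER BY movie_count DESC;"
    )


if __name__ == "__main__":
    print(external_table_ddl())
    print()
    print(genre_count_query())
```

The printed statements can be pasted into the Hive CLI or saved to a `.hql` file. Because the table is external, dropping it leaves movies.csv in HDFS untouched.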