
Hello, my name is Yuvia Cardenas, your Maitre D'. Allow me to introduce Team Gourmand: Justin Evans, your Sous Chef; Woody Sims, your Pastry Chef; and Cristina Lucin, your Chef de Cuisine.

Menu - Intro Yuvia
Hors d'oeuvre - Executive Summary Yuvia
Soup du Jour - Acquire Justin
Aperitif - Prepare Justin
Salad - Explore Cristina
Entree - Modeling Woody
Dessert - Conclusion Recommendations Cristina

INTRO SLIDE

Welcome to all travelers who were led here by the Michelin Guide and by our shared love of food, as we use Data Science to distill the essence of fine dining perfection. At present, a star award from the Michelin Guide is widely accepted as the pre-eminent culinary achievement of restaurateurs and chefs alike. Internally, Michelin preserves the integrity of its reviews by keeping reviewers (commonly called "inspectors") anonymous. Externally, inspectors are strictly advised not to disclose their line of work to anyone, not even their parents. The amount of secrecy in this process, and the importance of these reviews in the culinary world, led our team to ask the following question: "What factors can be revealed by examining Michelin restaurant reviews?"

Today, for your dining pleasure, Team Gourmand will serve you some delectable data. We hope you enjoy.

EXECUTIVE SUMMARY SLIDE

For the Hors d'oeuvre, here is our Executive Summary:

Some key parts of Acquisition & Preparation:

Dataset of all Michelin Awardee restaurants worldwide was acquired from Kaggle (Updated quarterly)
We utilized the Michelin Guide URL for each restaurant and Beautiful Soup to scrape the review text, enhancing the original data set.

Some highlights from Exploration & Modeling:

"Bib Gourmand" was our baseline (50.3% of dataset)
3 Michelin star restaurants had reviews with the most words, but 2 star restaurants had the highest sentiment scores

Some key takeaways from our project:

Restaurants with higher Michelin award levels have, on average, longer reviews
An excellent and unique dining experience is a strong driver of a 3 Michelin Star award
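
For anyone following along in a notebook, the "longer reviews at higher award levels" comparison comes down to a group average. The sketch below is a minimal illustration under assumptions: the rows are toy data, not our results, and the keys `award` and `word_count` are illustrative names.

```python
from collections import defaultdict
from statistics import mean


def mean_by_award(rows, value_key):
    """Average a numeric field (e.g. word count) within each award level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["award"]].append(row[value_key])
    return {award: mean(values) for award, values in groups.items()}


# Toy rows for illustration only; these are not figures from the project.
toy_rows = [
    {"award": "Bib Gourmand", "word_count": 60},
    {"award": "Bib Gourmand", "word_count": 80},
    {"award": "3 Stars", "word_count": 100},
    {"award": "3 Stars", "word_count": 120},
]
averages = mean_by_award(toy_rows, "word_count")
```

The same helper could be pointed at a sentiment column to compare average sentiment by award level.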

Now to our Sous Chef Justin, for a little more on the history of Michelin and how we sourced our data:

Soup du jour:

HISTORY slide

The Michelin Guide was created in 1900 by French brothers Édouard and André Michelin. The brothers sold tires, and the guide was created to increase the number of cars on the road in France, which at the time was estimated to be fewer than 3,000.
The guide featured information valuable for motorists, including repair shops, local hotels, and dining reviews.
Over the years, the Michelin Guide has undergone many changes and developments, including adding additional award categories and expanding to over 40 countries.

ACQUIRE slide

Our dataset of all Michelin Awardee restaurants worldwide was acquired from Kaggle. This dataset is updated quarterly with new additions of Michelin Awardee restaurants and removal of restaurants that no longer carry the award. From this initial dataset, we utilized the Michelin Guide URL for each restaurant and Beautiful Soup to web-scrape the review text for each restaurant, enhancing the original dataset.

The review text for each restaurant was then appended back to the original dataframe
Each row represents a Michelin Awardee restaurant
Each column represents a feature of the restaurant, including feature-engineered columns
We acquired 6780 restaurant reviews, including 6 NaN values caused by restaurants that are no longer active Michelin Awardees
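
The review-extraction step could be sketched roughly as follows. This is an illustration under stated assumptions, not our actual scraper: we used Beautiful Soup, while this sketch swaps in Python's built-in `html.parser` so it runs without extra installs. The class name `review-text` is a hypothetical placeholder, not the real Michelin Guide markup, and downloading the page is left out, so the function takes HTML that has already been fetched.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside the first <div> carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # > 0 while we are inside the target <div>
        self.done = False    # True once the target <div> has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            if tag == "div":
                self.depth += 1  # track nested divs so we close correctly
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_review(html, target_class="review-text"):
    """Return the review text, or None when the page has no review block."""
    parser = ReviewTextParser(target_class)
    parser.feed(html)
    text = " ".join(" ".join(parser.chunks).split())  # normalize whitespace
    return text or None


# Hypothetical page snippets for illustration.
page = (
    '<html><body><div class="header">Menu</div>'
    '<div class="review-text"><p>Refined  cooking</p><p>with flair.</p></div>'
    "<div>Footer</div></body></html>"
)
review = extract_review(page)
missing = extract_review("<html><body><div>Page not found</div></body></html>")
```

Returning `None` for a page with no review block is one way the missing values mentioned above could arise once the results are appended back to the dataframe.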

Aperitif PREPARE SLIDE

Our data set was prepared following standard Data Processing procedures. Some of the preparation steps we took were:

Dropped features not useful for this project
Cleaned column names and values utilizing REGEX and string methods
Feature engineering including:
- Columns with clean and lemmatized text
- 'word_count' representing the word count of restaurant reviews
- A feature representing the sentiment score of the review text
- Missing values in the price column were imputed with the mode
We split the dataset into train, validate and test sets, stratifying on our target, the award level
We scaled all numeric columns and encoded price level and country into dummy variables
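
A few of the preparation steps above can be sketched in plain Python. This is a minimal illustration, not our project code: the key names (`review`, `price`, `award`), the 60/20/20 split fractions, and the random seed are all assumptions, since this script does not state them.

```python
import random
from collections import Counter, defaultdict


def add_word_count(rows, text_key="review"):
    """Add a 'word_count' field based on whitespace-separated tokens."""
    for row in rows:
        row["word_count"] = len((row.get(text_key) or "").split())
    return rows


def impute_mode(rows, key="price"):
    """Fill missing values of `key` with the most frequent observed value."""
    observed = [row[key] for row in rows if row.get(key) is not None]
    mode = Counter(observed).most_common(1)[0][0]
    for row in rows:
        if row.get(key) is None:
            row[key] = mode
    return mode


def stratified_split(rows, target="award", fractions=(0.6, 0.2, 0.2), seed=42):
    """Split rows into train/validate/test, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, validate, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        validate += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to test
    return train, validate, test


# Toy rows for illustration only.
toy = [{"award": "Bib Gourmand", "review": "Hearty fare", "price": "$$"} for _ in range(10)]
toy += [{"award": "1 Star", "review": "Refined cooking with flair", "price": None} for _ in range(9)]
toy += [{"award": "1 Star", "review": None, "price": "$$$"}]

add_word_count(toy)
price_mode = impute_mode(toy)
train, validate, test = stratified_split(toy)
```

Splitting within each award level separately is what keeps rare categories, such as three-star restaurants, represented in all three sets.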

Now here's Cristina to present an exploratory Salad course:

Salade

TARGET VARIABLE SLIDE

#1 Our first question as we began exploration was: What is the distribution of award levels in our dataset?

Bib Gourmand, though not a star rating, is a fourth category of Michelin award that recognizes restaurants with a simpler style of cooking. Michelin describes Bib Gourmand restaurants as ones that "leave you with a sense of satisfaction, at having eaten so well at such a reasonable price."
One Michelin Star was the second most frequent award category, followed by two- and three-star restaurants. Only 2 percent of restaurants in our dataset have received the highest and most prestigious three-star designation.
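
The distribution question, and the baseline that falls out of it, can be sketched as a simple frequency count. The labels below are toy data chosen for illustration, not the real counts from our dataset, and `award_distribution` is a hypothetical helper name.

```python
from collections import Counter


def award_distribution(labels):
    """Return (award, share) pairs sorted from most to least frequent."""
    counts = Counter(labels)
    total = len(labels)
    return [(award, count / total) for award, count in counts.most_common()]


# Toy labels for illustration only.
toy_labels = ["Bib Gourmand"] * 5 + ["1 Star"] * 3 + ["2 Stars"] + ["3 Stars"]
distribution = award_distribution(toy_labels)

# Always predicting the most frequent class gives the baseline accuracy.
baseline_award, baseline_accuracy = distribution[0]
```

A model is only useful if it beats the accuracy of always guessing the most frequent award level.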
