Thanks for showing interest in our Co-learning activity. We promise you a journey full of learning through solving real problems. To get into the Data Science field, the first and last thing you need is a portfolio that shows you have solved problems using data science. Ask any good Data Scientist and they will tell you the same.

“Let’s put theory into practice”

In this activity, we will solve old Kaggle problems to enable you to participate in ongoing Kaggle competitions.

Why Old Kaggle problems?

Kaggle is an amazing data science community that hosts competitions from top companies, with close-to-real datasets, quality discussions, and published solutions. A gold mine of knowledge is already there; we just need to dig into it :)

Testimonial (Click to Watch Video)

Course Outline

Python brush up with ML concepts and libraries [3 hours - optional]

A brief summary of basic Python concepts: syntax, data types, built-in data structures such as lists, tuples, and dictionaries, plus loops and functions.

Understanding basic Machine Learning concepts :

  1. What is machine learning?
  2. Supervised and Unsupervised machine learning concepts.
  3. Bias - Variance trade-off.
  4. Overfitting and Underfitting in machine learning.
  5. Understanding classification and regression.
  6. Brief summary of scikit-learn.
  7. Understanding the problem statement for Kaggle problems.

Batch Categories

For enrollment in any batch, a one-on-one discussion will be scheduled with our team, so that we can understand your aspirations for joining this course and your current understanding of Data Science and Kaggle. Problems have been categorised according to batch category.

1. Beginner

The Beginner batch will consist of people who know Data Science conceptually and want to transition into the field. They have worked only with small or academic datasets and want to understand what real-world problems look like and how to begin tackling them. This batch is for people who are new to the world of Data Science and want guidance to get their hands dirty with Kaggle problems.

2. Intermediate

The Intermediate batch will consist of people who have already embraced the world of Data Science but still face challenges with real-life datasets. They have knowledge of, and hands-on experience in, Data Science problem solving and want to move a step further. This batch is for people who want to strengthen their Data Science skills by plunging into complex Kaggle problems.

The whole idea is to mentor and guide you well in your Data Science journey. Unlike others, we do not want to give you false hopes and then fail to help you make a successful career transition. Once you finish the course, you can go on to solve plenty of Kaggle problems with the help of a close-knit, premium community of Kagglers and guidance from mentors.

Problems to be undertaken

1. Titanic dataset (Beginner)

The sinking of the Titanic is one of the most infamous disasters in recent human history, resulting in the death of 1502 out of 2224 passengers and crew. Analysis shows that while some amount of luck was involved, some groups of people were more likely to survive than others. Train a machine learning model to predict what sort of people were likely to survive.
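As a quick sanity check before any modelling, the figures quoted above already give a baseline to beat. The sketch below only does that arithmetic; the variable names are illustrative, and the class balance in the actual Kaggle training file may differ from these whole-ship figures.

```python
# Figures quoted in the problem description above.
people_on_board = 2224
deaths = 1502

survivors = people_on_board - deaths
survival_rate = survivors / people_on_board

# A "model" that always predicts "did not survive" is correct for every death,
# so its accuracy equals the death rate. Any real model should beat this.
baseline_accuracy = deaths / people_on_board

print(f"survivors: {survivors}")
print(f"survival rate: {survival_rate:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

On these figures the always-"did not survive" baseline is right roughly two times out of three, which is why accuracy alone can flatter a weak classifier.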

2. Predicting a Pulsar Star (Intermediate)

Pulsars are a rare type of neutron star that produce radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the interstellar medium, and states of matter. Machine learning tools are now being used to automatically label pulsar candidates to facilitate rapid analysis. Classification systems in particular are being widely adopted, which treat the candidate data sets as binary classification problems.

3. Chronic Kidney Disease dataset (Intermediate)

The data was collected over a 2-month period in India and has 25 features (e.g., red blood cell count, white blood cell count). The target is 'classification', which is either 'ckd' (chronic kidney disease) or 'notckd'. Use machine learning techniques to predict whether a patient is suffering from chronic kidney disease.

4. Employee Attrition (Intermediate)

The key to success in any organization is attracting and retaining top talent. For an HR analyst, one of the key tasks is to determine which factors keep employees at the company and which prompt others to leave. The dataset contains data points on employees who are either currently working at the company or have resigned. The objective is to identify these factors so they can be improved to prevent the loss of good people.

Exploratory Data Analysis Syllabus (6-8 hours)

To delve into machine learning, the first step is to identify and analyze our dataset in order to ascertain different patterns, detect missing values and anomalies and identify different characteristics of our data. In other words, we attempt to understand what the data is trying to tell us.

Understand the different approaches to implement EDA :

  1. Understanding the data.
  2. Identifying variables and checking data types.
  3. Analyzing the basic metrics of different data types.
  4. Univariate Analysis - Non-Graphical.
  5. Univariate Analysis - Graphical (VDA).
  6. Bivariate Analysis.
  7. Pair Plot Analysis
  8. Missing value treatment.
  9. Outlier treatment.
  10. Correlation Analysis.
  11. Dimensionality Reduction.
  12. Binning
  13. Log Transformation
  14. Scaling
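
A few of the steps listed above (missing value treatment, outlier detection, log transformation, scaling, and binning) can be sketched with the Python standard library alone. The fare values, the 1.5 * IQR rule, and the band cut-offs are assumptions chosen for illustration; in the course you would more likely use pandas on the real dataset.

```python
import math
import statistics

# Hypothetical fares with one missing value (None) and one large outlier.
fares = [7.25, 8.05, None, 13.0, 26.55, 71.28, 512.33]

# Missing value treatment: impute with the median of the observed values.
observed = [x for x in fares if x is not None]
median_fare = statistics.median(observed)
imputed = [median_fare if x is None else x for x in fares]

# Outlier detection with the common 1.5 * IQR rule (upper fence only).
q1, _, q3 = statistics.quantiles(imputed, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [x for x in imputed if x > upper_fence]

# Log transformation: log1p compresses the long right tail.
logged = [math.log1p(x) for x in imputed]

# Scaling: min-max scale the logged values to the range [0, 1].
lo, hi = min(logged), max(logged)
scaled = [(x - lo) / (hi - lo) for x in logged]

# Binning: map each fare to a coarse band (cut-offs are arbitrary here).
def fare_band(fare):
    if fare < 10:
        return "low"
    if fare < 50:
        return "mid"
    return "high"

bands = [fare_band(x) for x in imputed]

print("imputed:", imputed)
print("outliers:", outliers)
print("bands:", bands)
```

Whether to drop, cap, or keep a flagged outlier depends on the problem; the rule above only flags candidates for a closer look.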

Dataset split (1 hour)

It is common practice in machine learning to split data into training, validation, and test sets, allowing us to calibrate our model and test its performance on unseen data.

Learn the concepts and significance of data splitting in model development :

  1. Train
  2. Test
  3. Validation (different types)
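
A minimal sketch of two common splitting schemes, a single hold-out split and k-fold cross validation, is shown here. The function names, the 60/20/20 proportions, and the seed are assumptions for illustration; scikit-learn ships its own, more complete utilities for this.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the rows once and cut them into three disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def k_fold_indices(n_rows, k):
    """Yield (train_idx, val_idx) pairs; each row is validated exactly once."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

rows = list(range(100))
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))

for train_idx, val_idx in k_fold_indices(10, 3):
    print("validate on", val_idx)
```

The test set should be touched only once, at the very end; all model selection happens on the validation set or on the cross-validation folds.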

ML Algorithms- Supervised learning (8-10 hours)

Understanding the theoretical and mathematical foundations of machine learning algorithms is the crux of developing accurate predictive models. Different machine learning algorithms are suited to different types and distributions of data.

Delve into the intricacies of the mathematics and mechanism of the functioning of these algorithms:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Trees
  4. SVM
  5. Naive Bayes
  6. Random Forest
  7. XGBoost
  8. LightGBM
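
To give a flavour of the mathematics behind the first algorithm in the list, here is a sketch of simple (one-feature) linear regression fitted with the closed-form least-squares solution: slope = covariance(x, y) / variance(x), intercept = mean(y) - slope * mean(x). The data points and the function name are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying roughly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The other algorithms in the list do not have such a compact closed form, which is part of why libraries such as scikit-learn, XGBoost, and LightGBM are used in practice.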

Evaluation metrics (3-4 hours)

A developed model needs to be evaluated for its performance before it is actually deployed in a real-time environment.

Learn the concepts used to measure the performance of a model, based on several metrics:

  1. Confusion Matrix
  2. F1 Score
  3. Gain and Lift Chart
  4. AUC - ROC
  5. Log Loss
  6. Gini Coefficient
  7. Root Mean Square Error
  8. Cross Validation
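
Several of these metrics are short enough to write by hand, which helps in understanding what the library versions compute. The sketch below uses hand-written helpers (not the scikit-learn functions of similar names) on made-up labels and probabilities, with a 0.5 decision threshold as an assumption. The Gini coefficient is derived from AUC as 2 * AUC - 1.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, used for regression targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

auc = roc_auc(y_true, y_prob)
print("tp, fp, fn, tn:", confusion_counts(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", auc, "Gini:", 2 * auc - 1)
print("log loss:", round(log_loss(y_true, y_prob), 4))
```

Note that F1 depends on the chosen threshold, while AUC and log loss are computed from the probabilities directly.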

ML Explainability (3-4 hours)

Towards a better understanding of why machine learning models make the decisions they do, and why it matters.

Learn the concepts used to assess the interpretability of ML models:

  1. LIME
  2. Algorithmic Generalisation

Hyperparameter tuning (3-4 hours)

Once a machine learning algorithm has been identified, it is imperative to choose an optimal set of hyperparameters so that the model fits the data better. Such parameters cannot be learnt during the training process, so tuning them requires some expertise.

Develop an intuition to streamline the parameter optimization process.
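
One of the simplest tuning strategies is a grid search: try each candidate value, score it on held-out validation data, and keep the best. The sketch below tunes the number of neighbours k of a tiny one-feature k-nearest-neighbours classifier. The classifier, the data points, and the grid are all assumptions chosen to keep the example small; k-NN is not one of the algorithms listed in the syllabus.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0

def accuracy(train, val, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(1 for x, y in val if knn_predict(train, x, k) == y)
    return hits / len(val)

# Hypothetical (feature, label) pairs; (3.5, 1) is a noisy training point.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.5, 1),
         (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
val = [(1.5, 0), (3.4, 0), (6.5, 1), (7.5, 1)]

# Grid search: k is a hyperparameter, so it is chosen on validation data.
grid = [1, 3, 5]
scores = {k: accuracy(train, val, k) for k in grid}
best_k = max(scores, key=scores.get)

print("validation accuracy per k:", scores)
print("best k:", best_k)
```

With k = 1 the model copies the noisy point and misclassifies its neighbour; larger k smooths that out. The same loop structure applies to any model and any hyperparameter grid.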

What will you get out of this course?

  • You’ll get to understand the entire Machine Learning pipeline.
  • You’ll get to learn abstract topics such as Statistics & Exploratory Data Analysis.
  • You’ll learn methods to identify Outliers and methods to eliminate their effects.
  • You’ll get to learn Dimensionality Reduction.
  • You’ll be delving into the intricacies of different ML algorithms.
  • You’ll get to understand the nuances of Hyperparameter tuning.
  • You’ll be getting hands on experience through Weekly Assignments.
  • You’ll be able to reach out to peers and mentors through sync ups every Wednesday and Friday.
  • You’ll get to submit your work on Kaggle as a notebook.
  • You’ll be building a strong LinkedIn profile visible to recruiters.
  • If you are interested, we will teach you to create content and increase your LinkedIn presence to build your personal brand during the course.
  • This course should give you enough confidence to solve the ML/DL assignments that companies give to candidates for ML roles.

Post - course benefits

  • Get your Resume and GitHub reviewed by experts. [Additional service at a lower price]
  • Once you have completed the course, you will get full-fledged support from mentors in our community for any technical help, guidance, etc.
  • As most companies prefer to give candidates an assignment for ML roles, we can mentor you through these assignments.
  • As we will have mentored you and will know your skills and achievements, we will refer you for any AI/ML job that fits your profile.
  • As LinkedIn is the platform for catching recruiters' attention, we will shout out your achievements and help boost your work on LinkedIn to increase its visibility.
  • You will be eligible for the intermediate and advanced courses in the same series.

Resume and LinkedIn Review

  • Design and formatting
  • Content, Buzzwords, Metrics
  • Objective/Summary strength
  • ATS Score (With screenshot) & how to Improve
  • LinkedIn Profile strength review
  • Activity/Content review
  • LinkedIn Profile score (With screenshot) & How to Improve

Rules and Regulations

  • Since the course timelines are fixed, you will have to submit the assigned learning modules and work within the given time.
  • Please join the course based on your available bandwidth. The course fee is non-refundable.
  • Classes will mostly be held on weekends (Saturday and Sunday) for 2 hours per day.
  • This course does not provide any certificate. A certificate alone won't get you a job. Knowledge and projects will. That is what we primarily focus on.

How to apply?

If you feel you are qualified and are up for the learning ride, please send your profile (Resume, LinkedIn, GitHub) to colearninglounge@gmail.com with the subject line "Kaggle based project learning".

While applying, do let us know:

  1. Why do you want to join this course?
  2. What are your expectations from this course?

Training period fee - ₹ 5999/- per person

  • The course runs from 22nd August 2020 until completion of the challenge or 25th October 2020, whichever comes first.
  • To maintain the quality of learning, a maximum of 20 people is allowed per batch.
  • The last day to apply is 12th August 2020.
  • For any query, email us at colearninglounge@gmail.com.

Scholarship opportunity

During the span of the course, if you help us create content (learning material) for the course, we will refund the fee based on your contribution.

Decisions about contribution and reward are solely ours.