Program Journey

Program Pipeline

[Figure: Program pipeline]

The Data Science Transition program is a hybrid program that mixes self-paced learning with mentor-led LIVE instruction. In your self-paced learning, you pick up the basics of ML concepts. In the mentor-led LIVE instruction, you learn problem-solving skills from a working data scientist. The mentor-led instruction is designed so that, irrespective of your progress on Glabs, you should be able to pick up the skills and get value from it. The more you progress on Glabs, the more value you will get. The only prerequisite: you must be comfortable with what is covered in the Basecamp.
FOR BEST RESULTS: Plan your study so that you finish the projects on Glabs in 6 months and attend as many LIVE sessions as you can. Do at least 1-2 hackathons within this time, starting from month 3. Build 1-2 awesome portfolio projects. Then you will be ready to make the transition in your career.

Recommended Program Effort

Non-programmers: 8-10 hours per week
Programmers: 4-6 hours per week

Self-paced learning proceeds on Glabs → Check it out here. As you work through the Glabs content at your own pace, attend the LIVE mentor sessions. The LIVE sessions will cover the topics listed below. Please note that the order may vary and topics may change based on feedback from industry and mentors. However, this list should give you a good idea of what to expect.
When a session is scheduled on Glabs, its prerequisites will be listed in the session description. Try to go through them before the session.

Data Science Pipeline

A typical Data Science pipeline is as follows. In the Data Science Transition Program, you will learn how to execute the different parts of the pipeline.
[Figure: A typical Data Science pipeline]

Identify Business Problem

A problem or pain point of the business is presented to you. As a data scientist, you must be able to translate this business problem into a well-defined data science problem that you can solve.

Data Collection

In this stage, you collect the data required to solve the problem you have defined.

Data Cleaning and EDA

The next stage is to prepare the data: you clean and process what you collected and explore it (EDA) so that it is ready for solving the data science problem. A data scientist spends about 60-70% of their time at this stage.
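To make this stage concrete, here is a minimal sketch in plain Python (standard library only). The dataset, its columns (`id`, `city`, `sales`), and the helpers `clean` and `summarize` are hypothetical; in the program this kind of work is done with pandas, and real cleaning involves many more decisions than the three shown here (dropping incomplete rows, removing duplicates, fixing types).

```python
import csv
import io
import statistics

# Hypothetical raw data with typical problems: a missing value,
# an exact duplicate row, and a numeric column read in as text.
RAW = """id,city,sales
1,Pune,120
2,Delhi,
3,Mumbai,95
3,Mumbai,95
4,Delhi,140
"""

def clean(rows):
    """Drop incomplete rows, remove exact duplicates, cast sales to float."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip records with a missing or empty field.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of a row we already kept
        seen.add(key)
        cleaned.append({**row, "sales": float(row["sales"])})
    return cleaned

def summarize(rows, column):
    """A tiny EDA step: count, mean, and median of one numeric column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

rows = clean(csv.DictReader(io.StringIO(RAW)))
print(summarize(rows, "sales"))
```

Of the five raw records, only three survive cleaning, which is a small-scale illustration of why this stage takes so much of a data scientist's time: every dropped or repaired row is a decision that affects the model later.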

Model Building & Evaluation

Then, you train a machine learning model on this data. Both supervised and unsupervised learning algorithms are used here. The output of the model is then used to find the right insights for the business and solve the problem. If the model is unsuitable or its results are unsatisfactory, you go back, collect more data, and rebuild the ML model.
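The train-evaluate-iterate loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the program's own method: it uses one-feature linear regression fitted by ordinary least squares, a held-out test split, R² as the evaluation metric, and an arbitrary acceptance threshold of 0.8. The data and the helper names (`fit_line`, `r_squared`, `build_and_evaluate`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """Share of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def build_and_evaluate(data, test_fraction=0.25, threshold=0.8, seed=0):
    """Split the data, train on one part, and evaluate on held-out data."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(2, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    preds = [slope * x + intercept for x, _ in test]
    score = r_squared([y for _, y in test], preds)
    # The feedback loop from the text: an unsatisfactory score means going
    # back to collect more (or better) data and rebuilding the model.
    verdict = "ok" if score >= threshold else "collect more data and rebuild"
    return (slope, intercept), score, verdict

# Hypothetical data: y is roughly 3x + 5 plus Gaussian noise.
noise = random.Random(42)
data = [(x, 3 * x + 5 + noise.gauss(0, 1)) for x in range(40)]
model, score, verdict = build_and_evaluate(data)
print(model, round(score, 3), verdict)
```

Evaluating on data the model has not seen is what makes the "go back and rebuild" decision meaningful. A score below the threshold does not by itself say whether more data, better features, or a different algorithm is needed; it only signals that an earlier stage of the pipeline needs revisiting.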

Reporting

Finally, you cut through the technical jargon and convey the key insights to the business. This is an important step where you showcase the solution to the business problem and convey the recommended data-driven decisions to the stakeholders. If the pain point is not solved, you go back and reframe the business problem.
The LIVE sessions on Saturdays are as follows:
Basecamp

| # | Session | Content |
|---|---------|---------|
| 1 | Understanding Git and GitHub and Basics of Programming Logic | In this session, we will understand what a version control system is, basic Git commands, and the logic of programming. |
| 2 | IPL Data Analysis Using Basic Python Constructs | We will work extensively with dictionaries and lists to get started with Python and analyze the IPL data. |
| 3 | Spy Games Data | This session will help us understand string operations, file I/O, conditional statements, loops, and functions in programming. |
| 4 | Loan Approval Analysis | We will use the pandas library to analyze the data and create a targeted marketing campaign for different segments. |
| 5 | Visualization for Company Stakeholders | In this session, we will use visualization techniques to help our company stakeholders get visual insights into the company's operations. |
| 6 | Understanding Basic Descriptive Stats with Titanic Data | In this session, we will see how even basic descriptive statistics can improve our understanding of the data. |
| 7 | Rent a Bike Regression Problem | We will understand the machine learning pipeline through this regression problem of predicting the number of bike sales. |
| 8 | Classification in Machine Learning and Program Expectations | We will understand the ML pipeline for classification through the task of predicting whether a person makes over $50K a year, and understand what to expect ahead in the program. |
Mentor-Led Sessions

| # | Session Name | Description | Stage |
|---|--------------|-------------|-------|
| 1 | Automobile Data Exploration | In this session, we do some basic analysis of automobile data that learners can expand on further. This kind of analysis can be used to find patterns and either resolve quality issues in time or prevent them from happening altogether. | Data Cleaning and EDA |
| 2 | IPL Data Analysis | In this session, we analyze which factors led to one of the teams winning. This would help plan strategy for future seasons. | Data Cleaning and EDA |
| 3 | Big Mart Regression | The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. The aim is to build a predictive model that estimates the sales of each product at a particular store. | Model Building, Model Evaluation |
| 4 | Election Data Visualization | The Lok Sabha election is a very complex affair, as it involves many factors: there are more than 700 million voters and more than 800,000 polling stations. This scale is exactly what makes it a perfect topic to analyze. | Data Cleaning and EDA |
| 5 | Azure ML Studio | Azure ML Studio provides an interactive, visual workspace where you drag and drop datasets and analysis modules onto an interactive canvas. This session will give you an idea of how to build a low-code machine learning solution. | Model Building, Model Evaluation |
| 6 | Visualizing Cricket Performance | An IPL match raises several questions about what happens during play. This analysis aims to find which factors led one of the teams to win and how much they matter. | Data Cleaning and EDA |
| 7 | Feature Selection with Breast Cancer Data | Breast cancer (BC) is one of the most common cancers among women worldwide, accounting for a large share of new cancer cases. This analysis aims to observe which features are most helpful in predicting whether a cancer is malignant or benign, and to see general trends that may aid us in model selection and hyperparameter selection. | Model Building, Model Evaluation |
| 8 | TED Data Analysis | Ever since we began watching TED Talks, they have never ceased to amaze us. We attempt to find insights about the world of TED, its speakers, and its viewers, and try to answer a few questions. | Data Cleaning and EDA |
| 9 | Low-Code Machine Learning Solution | Creating a low-code solution using useful libraries like Sweetviz, PyCaret, and pandas-profiling. | Model Building, Model Evaluation |
| 10 | Intro to Plotly | In this session, we will learn how to approach a new library, using Plotly as the example. Plotly is a library used to create interactive visualizations. | Data Cleaning and EDA |
| 11 | Startup Data Analysis | Many innovative startups are coming up in the region, and there is a lot of funding for them as well. We will analyze the startup ecosystem in this session. | Data Cleaning and EDA |
| 12 | Plotly Dash | Dash is an open-source Python framework for building analytical web applications. It is a powerful library that simplifies the development of data-driven applications. This session will help us create an interactive web app using Dash. | Reporting |
| 13 | A/B Testing | A/B testing is an essential element of developing products in an organization. In this session, we will understand A/B testing and its impact on business. | Model Building, Model Evaluation |
| 14 | Salary Prediction | Given income data for various individuals, we will study the data and predict which salary bracket each person falls into. | Model Building, Model Evaluation |
| 15 | Data Cleaning with Exit Survey | Are employees who worked for the institutes for only a short period resigning due to some kind of dissatisfaction? What about employees who have been there longer? We will try to answer these questions in this session. | Data Cleaning and EDA |
| 16 | Time Series: FB Prophet | In this session, we will get started with time series data in a hands-on manner. We will understand various ways to use and gain insights from time series data. | Model Building, Model Evaluation |
| 17 | Cleaning Data with Regex | Natural Language Processing is currently being applied widely for various purposes. To make it work, the input data has to be prepared in a specific way, which requires a lot of data cleaning. In this session, we will understand the data cleaning required for Natural Language Processing. | Data Cleaning and EDA |
| 18 | Data Collection: Web Scraping | Data extraction from the web ranges from manual copy-paste to complex automation. Websites come in different formats, which calls for different web scrapers, but the core process remains the same. | Data Collection |
| 19 | Customer Marketing Strategy with Clustering | Segmentation in marketing is a technique used to divide customers or other entities into groups based on attributes such as behaviour or demographics. Here we will use credit card data to segment the customers. | Model Building, Model Evaluation |
| 20 | Deployment with Flask/Heroku | We don't just create an ML model; we also need to deploy it. In this session, we will create an interactive web app using Flask and deploy an ML model using Heroku. | Reporting |
| 21 | Chennai Water Management Analysis | We will analyze the various water bodies of Chennai, try to address the water crisis the city is facing, and suggest ways to avoid it in the future. | Data Cleaning and EDA |
| 22 | Credit Delinquency Prediction | Delinquency describes something or someone who fails to accomplish what is required by law, duty, or contractual agreement, such as failing to make a required payment or perform a particular action. This use case requires learners to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial delinquency in the next two years. | Model Building, Model Evaluation |
| 23 | Medical Insurance Code-Along | The cost of treatment depends on many factors: diagnosis, type of clinic, city of residence, age, and so on. We have no data on the diagnosis of patients, but we have other information that can help us draw conclusions about the health of patients and practice regression analysis. | Model Building, Model Evaluation |
| 24 | Basics of NLP with Twitter Data | In this session, we will understand the basics of handling text data for various NLP applications. | Model Building, Model Evaluation |
Self-Learning Sprints on Glabs

| Sprint | Module | Outcome |
|--------|--------|---------|
| 1 | Getting Started with Data Science | We will get introduced to the world of Data Science by understanding the basics of programming, the setup required, and various use cases. |
| 2 | Python Fundamentals | In this sprint, we will start with the very basics of Python programming and do some hands-on exercises to get well versed in it. |
| 3 | NumPy and Pandas | In this sprint, we will work with the NumPy and pandas libraries, which are the pillars of data wrangling, and do some hands-on exercises as well as projects. |
| 4 | Visualization in Python | We will learn visualization with Matplotlib to generate insights from data. |
| 5 | Foundation of Statistics | We will understand descriptive statistics as well as probability, which play a vital role in making sense of data. |
| 6 | Inferential Statistics and Linear Regression | In this sprint, we will study inferential statistics and our first Machine Learning algorithm, Linear Regression. |
| 7 | Regularization and Data Exploration | We will understand regularization's role in improving results and spend time on various data preprocessing techniques. |
| 8 | Classification in Machine Learning | In this sprint, we will understand classification and feature selection techniques for choosing the best features. |
| 9 | Tree-Based ML Algorithms | We will understand tree-based algorithms and ensembling techniques in this sprint. |
| 10 | Advanced ML | In this sprint, we will understand advanced Machine Learning techniques like Boosting and Clustering. We will also understand the challenges in Machine Learning and ways to tackle them. |
| 11 | Business Problem Solving with Data Science | As part of any organization, our overall goal as data scientists is to solve business problems. In this sprint, we will understand how to do that. |
| 12 | Introduction to NLP | This sprint will introduce us to the world of NLP with text and sentiment analysis. |
| 13 | Modeling Language | In this sprint, we will understand advanced NLP topics like language and topic modelling. |
| 14 | Parsing and Chatbot | With all the techniques learnt, we will try to build a chatbot and understand parsing techniques. |
| 15 | Data Science Project 1 | The objective of this problem is to predict the ‘basis point spread’ over AAA bonds. |
| 16 | Data Science Project 2 | We need to create a model that uses NLP to predict which class a particular message belongs to. |
| 17 | Data Science Project 3 | As a data scientist, your aim is to predict customer satisfaction. |
| 18 | Transitioning to a Data Science Career | In the final sprint, we will understand how to transition into the amazing world of Data Science. |


The Data Science Pipeline is iterative and non-linear, and so is your journey. Think of it like a Christopher Nolan movie: stay through the first few scenes, and as you get towards the end, all the pieces fall into place. We hope you enjoy the transition journey with us.

