Program Journey

Program Pipeline

The Data Science Transition program is a hybrid program that mixes self-paced learning with mentor-led LIVE instruction. In the self-paced learning, you pick up the basics of ML concepts. In the mentor-led LIVE instruction, you learn problem-solving skills from a real data scientist. The mentor-led instruction is designed so that, irrespective of your progress on Glabs, you can pick up the skills and get value; the more you progress on Glabs, the more value you will get. The only prerequisite is that you must be comfortable with what is covered in the Basecamp.

FOR BEST RESULTS: Plan your study so that you finish the projects on Glabs in 6 months and attend as many LIVE sessions as you can. Do at least 1-2 hackathons within this time (start the hackathons from month 3). Have 1-2 awesome portfolio projects. Then you are ready to make a transition in your career.

Recommended Program Effort

Non-Programmers: 8-10 hours of effort per week.

Programmers: 4-6 hours of effort per week.

Self-paced learning proceeds on Glabs. As you work through the Glabs content at your own pace, attend the LIVE mentor sessions. The LIVE sessions will cover the topics listed below. Please note that the order might vary and topics might change based on feedback from industry and mentors, but this should give you a good idea of what to expect.

When a session is scheduled in Glabs, its prerequisites are listed in the session description. Try to go through the prerequisites before the session.

Data Science Pipeline

A typical Data Science pipeline is as follows. In the Data Science Transition Program, you will learn how to execute the different parts of the pipeline.

Identify Business Problem

A problem or pain point for the business is presented to you. As a data scientist, you must be able to formulate a data science problem from the business problem that you would be ready to solve.

Data Collection

In this stage, based on the problem that you have defined, collect the data that is required for solving the problem.

Data Cleaning and EDA

The next stage is to prepare the collected data so that it can be used to solve the data science problem. A data scientist spends about 60-70% of their time at this stage.
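To make the cleaning and EDA stage concrete, here is a minimal sketch that uses only the Python standard library. The records, field names (`city`, `rentals`), and helper functions are invented for illustration and are not part of the program material; in practice you would typically do this with Pandas.

```python
import statistics

# Hypothetical raw records; the field names and values are invented for illustration.
raw_rows = [
    {"city": " Mumbai ", "rentals": "120"},
    {"city": "pune", "rentals": ""},       # missing value
    {"city": "Mumbai", "rentals": "135"},
    {"city": "Pune", "rentals": "n/a"},    # unparseable value
    {"city": "Pune", "rentals": "90"},
]


def clean(rows):
    """Normalise the text field and drop rows whose numeric field cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            rentals = int(row["rentals"])
        except ValueError:
            continue  # skip missing or malformed numbers
        cleaned.append({"city": row["city"].strip().title(), "rentals": rentals})
    return cleaned


def summarise(rows):
    """Return the mean of the numeric field per city as a first EDA step."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["rentals"])
    return {city: statistics.mean(values) for city, values in by_city.items()}


cleaned_rows = clean(raw_rows)
summary = summarise(cleaned_rows)
print(summary)
```

Dropping unparseable rows is only one possible choice; depending on the problem, imputing the missing values may be more appropriate.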

Model Building & Evaluation

Then, we train a machine learning model on this data. ML algorithms, both supervised and unsupervised, are used here. The output of the model is then used to derive the right insights for the business and solve the problem. If the model is unsuitable or does not give satisfactory results, you go back, collect more data, and rebuild the ML model.
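The train, evaluate, and decide loop described above can be sketched as follows. This is a hedged illustration in plain Python: the data points, the `ACCEPTABLE_RMSE` threshold, and the function names are assumptions made up for the example, and a one-variable least-squares line stands in for whichever ML algorithm you would actually use.

```python
import math


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    squared_errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data, roughly following y = 2x + 1.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x = [7, 8]            # held-out points the model has not seen
test_y = [15.0, 17.1]

slope, intercept = fit_line(train_x, train_y)
test_error = rmse(test_x, test_y, slope, intercept)

ACCEPTABLE_RMSE = 0.5      # hypothetical threshold agreed with the business
model_is_acceptable = test_error <= ACCEPTABLE_RMSE

if model_is_acceptable:
    print("Model accepted; move on to reporting.")
else:
    print("Model rejected; collect more data and rebuild.")
```

Evaluating on held-out points rather than on the training data is what makes the accept or reject decision meaningful.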

Reporting

Finally, you have to cut through the technical jargon and convey the key insights to the business. This is an important step where you showcase the solution to the business problem and convey the recommended data-driven decisions to the stakeholders. If the pain point is not solved, you go back and reframe the business problem.

The LIVE sessions on Saturdays are as follows:

Basecamp

Session: Understanding Git and GitHub and Basics of Programming Logic
Content: In this session we will understand what a version control system is, basic Git commands, and the logic of programming.

Session: IPL Data Analysis using Basic Python Constructs
Content: We will work extensively with dictionaries and lists to get started with our Python training and analyze the IPL data.

Session: Spy Games Data
Content: This session will help us understand string operations, file I/O, conditional statements, loops, and functions in programming.

Session: Loan Approval Analysis
Content: We will use the Pandas library to analyze the data and create a targeted marketing campaign for different segments.

Session: Visualization for Company Stakeholder
Content: In this session, with the help of visualization techniques, we will help our company stakeholder get visual insights into the company's operations.

Session: Understanding Basic Descriptive Stats with Titanic Data
Content: In this session we will get an idea of how even basic descriptive statistics are used and can help improve our understanding of the data.

Session: Rent a Bike Regression Problem
Content: We will understand the machine learning pipeline with this regression problem statement of predicting the number of bike sales.

Session: Classification in Machine Learning and Program Expectations
Content: We will understand the ML pipeline for classification with the task of predicting whether a person makes over $50K a year, and understand what to expect ahead in the program.


Mentor Led Sessions

Session Name: Automobile Data exploration
Description: In this session we do some basic analysis of automobile data, which the learners can expand on further. This analysis can be used to find patterns and either resolve quality issues in the nick of time or prevent them from happening altogether.
Stage: Data Cleaning and EDA

Session Name: IPL data analysis
Description: In this session, we analyze which factors led one of the teams to win. This would help plan out strategy for the future sessions.
Stage: Data Cleaning and EDA

Session Name: Big mart regression
Description: The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. The aim is to build a predictive model and find out the sales of each product at a particular store.
Stage: Model Building, Model Evaluation

Session Name: Election data visualization
Description: The Lok Sabha election is a very complex affair as it involves a lot of factors. There are more than 700 million voters and more than 800,000 polling stations. It is this very fact that makes it a perfect topic to analyze.
Stage: Data Cleaning and EDA

Session Name: ML studio Azure
Description: Azure ML Studio provides an interactive, visual workspace where you drag and drop datasets and analysis modules onto an interactive canvas. This session will give you an idea of how to build a low-code machine learning solution.
Stage: Model Building, Model Evaluation

Session Name: Visualizing cricket performance
Description: We want to know what happens during an IPL match, which raises several questions. This analysis examines which factors led one of the teams to win and how much they matter.
Stage: Data Cleaning and EDA

Session Name: Feature selection with Breast cancer data
Description: Breast cancer (BC) is one of the most common cancers among women worldwide, representing the majority of new cancer cases. This analysis aims to observe which features are most helpful in predicting malignant or benign cancer and to see general trends that may aid us in model selection and hyperparameter selection.
Stage: Model Building, Model Evaluation

Session Name: Ted Data Analysis
Description: Ever since we began watching TED Talks, they have never ceased to amaze us. We attempt to find insights about the world of TED, its speakers, and its viewers, and try to answer a few questions.
Stage: Data Cleaning and EDA

Session Name: Low Code Machine Learning Solution
Description: Creating a low-code solution using useful libraries like Sweetviz, PyCaret, and pandas-profiling.
Stage: Model Building, Model Evaluation

Session Name: Intro to Plotly
Description: In this session we will understand how to approach a new library, using Plotly as the example. Plotly is a library used to create interactive visualizations.
Stage: Data Cleaning and EDA

Session Name: Startup Data Analysis
Description: There are a lot of innovative startups coming up in the region and a lot of funding for these startups as well. We will analyze the startup ecosystem in this session.
Stage: Data Cleaning and EDA

Session Name: Plotly Dash
Description: Dash is an open-source Python framework used for building analytical web applications. It is a powerful library that simplifies the development of data-driven applications. This session will help us create an interactive web app using Dash.
Stage: Reporting

Session Name: A/B testing
Description: A/B testing is an essential element of developing a product in an organization. In this session we will understand A/B testing and its impact on business.
Stage: Model Building, Model Evaluation

Session Name: Salary Prediction
Description: Given data on the income of various individuals, we will study the data and predict which salary bracket each individual falls under.
Stage: Model Building, Model Evaluation

Session Name: Daily cleaning with exit survey
Description: Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer? We will try to answer these questions in this session.
Stage: Data Cleaning and EDA

Session Name: Time series - FB Prophet
Description: In this session we will get started with time series data in a hands-on manner. We will understand various ways to use and gain insights from time series data.
Stage: Model Building, Model Evaluation

Session Name: Cleaning Data with Regex
Description: Natural Language Processing is widely applied for various purposes at the moment. To make it work, the input data has to be prepared in a specific manner, which requires a lot of data cleaning. In this session we will understand the data cleaning required for Natural Language Processing.
Stage: Data Cleaning and EDA

Session Name: Data collection - web scraping
Description: Data extraction from the web ranges from manual copy-paste to complex automation. Websites come in different formats, resulting in the need for different web scrapers. The core process remains the same, though.
Stage: Data Collection

Session Name: Customer Marketing Strategy with Clustering
Description: Segmentation in marketing is a technique used to divide customers or other entities into groups based on attributes such as behaviour or demographics. Here we will use credit card data to segment the customers.
Stage: Model Building, Model Evaluation

Session Name: Deployment with Flask/Heroku
Description: We don't just create an ML model; we also need to deploy it. In this session we will create an interactive web app using Flask and deploy an ML model using Heroku.
Stage: Reporting

Session Name: Chennai water Management analysis
Description: We will analyze the various water bodies of Chennai, try to address the water crisis the city is facing, and suggest ways to avoid the same in the future.
Stage: Data Cleaning and EDA

Session Name: Credit Delinquency Prediction
Description: Delinquency describes the failure to accomplish that which is required by law, duty, or contractual agreement, such as the failure to make a required payment or perform a particular action. This use case requires learners to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial delinquency in the next two years.
Stage: Model Building, Model Evaluation

Session Name: Medical Insurance code along
Description: The cost of treatment depends on many factors: diagnosis, type of clinic, city of residence, age, and so on. We have no data on the diagnosis of patients, but we have other information that can help us draw conclusions about the health of patients and practice regression analysis.
Stage: Model Building, Model Evaluation

Session Name: Basics of NLP with Twitter Data
Description: In this session we will understand the basics of how to deal with text data to be used for various applications of NLP.
Stage: Model Building, Model Evaluation
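As a small taste of the regex-based text cleaning session listed above, here is a minimal sketch using Python's built-in `re` module. The `clean_text` function, the cleaning rules, and the sample sentence are assumptions chosen for illustration; the actual session may apply different rules.

```python
import re


def clean_text(text):
    """Lowercase the text and strip URLs, mentions, hashtags, and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)       # remove mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    text = re.sub(r"\s+", " ", text)           # collapse repeated whitespace
    return text.strip()


# Hypothetical tweet-like input, invented for this example.
sample = "Loving the new phone!!! 😍 Check https://example.com @friend #gadgets"
cleaned = clean_text(sample)
print(cleaned)
```

The order of the substitutions matters: URLs, mentions, and hashtags are removed before punctuation is stripped, otherwise their fragments would be left behind as ordinary words.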


Self Learning Sprints on Glabs

Sprint: Getting Started with Data Science
Modules: Welcome to Data Science Program; Data Science Journey :: Getting Started; Introduction to Programming; Introduction to Jupyter Notebook and Google Colab; Introduction to Data Science
Outcome: We will be introduced to the world of Data Science by understanding the basics of programming, the setup required, and various use cases.

Sprint: Python Fundamentals
Modules: Python : Getting Started; Python: Handling Program Flow
Outcome: In this sprint we will start with the very basics of Python programming, understand it, and do some hands-on exercises to get well versed with it.

Sprint: Numpy and Pandas
Modules: NumPy: Manipulating Data; Pandas : Wrangling Data
Outcome: In this sprint we will understand the NumPy and Pandas libraries, which are the pillars of data wrangling, and do some hands-on exercises as well as projects.

Sprint: Visualization in Python
Modules: Matplotlib: Data Visualization; Fundamentals of Git and GitHub
Outcome: We will understand visualization using Matplotlib to generate insights from the data.

Sprint: Foundation of Statistics
Modules: Descriptive Statistics: Summarizing Data; Probability for Machine Learning
Outcome: We will understand descriptive statistics as well as probability, which play a vital role in making sense of the data.

Sprint: Inferential Statistics and Linear Regression
Modules: Inferential Statistics: Making Inference from Data; Linear Regression: Make Your First Prediction
Outcome: In this sprint we will study inferential statistics and our first Machine Learning algorithm, Linear Regression.

Sprint: Regularization and Data Exploration
Modules: Regularization in Machine Learning; Data Exploration: EDA and Data Preprocessing
Outcome: We will understand Regularization’s role in improving results and spend time understanding the various data preprocessing techniques.

Sprint: Classification in Machine Learning
Modules: Building a Classification Model using Logistic Regression; Feature Selection in Machine Learning
Outcome: In this sprint we will understand classification and feature selection techniques to come up with the best features.

Sprint: Tree Based ML Algorithms
Modules: Decision Trees in Machine Learning; Ensemble Techniques
Outcome: We will understand tree-based algorithms and ensembling techniques in this sprint.

Sprint: Advanced ML
Modules: Boosting Algorithms : Gradient Boosting Algorithm; Clustering Algorithms; Machine Learning Challenges
Outcome: In this sprint we will understand advanced Machine Learning techniques like Boosting and Clustering. We will also understand the challenges in Machine Learning and ways to tackle them.

Sprint: Business Problem Solving with Data Science
Modules: How to Solve a Business Problem using Data Science
Outcome: As a part of any organization, our overall goal as Data Scientists is to solve a business problem. In this sprint we will understand how to do that.

Sprint: Introduction to NLP
Modules: Introduction to Text Analysis with NLP; Sentiment Analysis With NLP
Outcome: This sprint will introduce us to the world of NLP with Text and Sentiment Analysis.

Sprint: Modeling Language
Modules: Language Modeling; Topic Models
Outcome: In this sprint we will understand advanced NLP topics like Language and Topic modelling.

Sprint: Parsing and Chatbot
Modules: Parsing Text; Understanding Chatbot using Rasa Tech Stack
Outcome: With all the techniques learnt, we will try to build a chatbot and understand parsing techniques.

Sprint: Data Science Project 1
Modules: Predicting Mutual Fund Return
Outcome: The objective of this problem is to predict the ‘basis point spread’ over AAA bonds.

Sprint: Data Science Project 2
Modules: Customer Message Domain Classification
Outcome: We need to create a model that uses NLP to predict which class a particular message belongs to.

Sprint: Data Science Project 3
Modules: Modelling Customer’s Feedback
Outcome: As a data scientist, your aim is to predict customer satisfaction.

Sprint: Transitioning to a Data Science Career
Modules: Preparing for Data Science Jobs; Getting your Github Ready; Overview of Data Science Interview Process
Outcome: In the final sprint we will understand how to transition into the amazing world of Data Science.


The Data Science Pipeline is iterative and non-linear. Hence, your journey is also non-linear. Think of it like a Christopher Nolan movie: stay through the initial few scenes and, as you go towards the end, all the pieces fall into place. We hope you enjoy the transition journey with us.
