Predict Pushback Time at Airports (EDA, Data Processing, Feature Engineering)
Led a team of 3 in a competition hosted by NASA and the Federal Aviation Administration (USA). The aim was to predict the minutes until pushback of a flight using public air traffic and weather data.
Focused on reading and writing good research papers and developed several heuristic-based and machine-learning methods for this task. Ranked 23rd out of 408 participants.
Clickbait Headline Detector (PyTorch, Transformer, NLU, Language Generation)
Built a system for the Webis Clickbait-17 challenge to identify Twitter posts that are clickbait in nature.
Pre-trained and fine-tuned DeBERTa and ELECTRA models and achieved an accuracy of 87.2% on the test set.
Explored a text generation task by fine-tuning the T5 transformer and measuring the similarity between the original and generated headlines for an article. Obtained 81% accuracy on the test set.
NHL Goal Prediction (Python, Pandas, Docker, Flask)
Led a team of 4 to build an end-to-end machine learning pipeline: data acquisition from an API, data cleaning, feature extraction and selection, interactive visualization, model training, testing, and deployment.
Achieved 92% accuracy on live data and received full marks for the project.
Human Activity Recognition using CNN-LSTM (Python, TensorFlow, Keras)
Extracted features from each block of training sequences using a CNN and modeled the extracted feature sequences with an LSTM to classify 6 human activities (walking, sitting, standing, etc.).
Trained on the UCI HAR dataset and tested in real time on smartphone accelerometer and gyroscope data, reaching 90.07% test accuracy.
Real-time Product Analysis using Data Mining
Developed a price comparison engine that lets buyers compare products across various e-commerce sites and purchase at the best price.
Human Gender and Age Estimation on Real-time Video (MATLAB, Object Detection)
Implemented Biologically Inspired Features, Kernel Partial Least Square Regression, and Viola-Jones algorithm in MATLAB.
Minimized computation for real-time prediction by merging the separate gender detection and age estimation tasks into a single step while preserving overall accuracy.
Extreme Weather Classification
Detected extreme weather events from tabular atmospheric data with 16 features per time point and location (latitude and longitude). The data set was a subset of
Performed data wrangling and exploratory data analysis, and implemented logistic regression from scratch. Also evaluated various boosting and bagging classification models.
CropHarvest Classification
A binary classification task to predict whether a given region of land contains crops.
The data set consists of monthly aggregated remote sensing (satellite), meteorological and topographical data, and is a processed subset of
Won a 36-hour state-level hackathon. Led a team of 3.
Applied Natural Language Processing to the project descriptions in candidates’ resumes and used statistical analysis of each candidate’s skills to recommend a job subdomain based on their profile. These inferences also feed a skill recommendation module for other candidates, so the recommendations become more data-driven as the data grows.
Implemented Principal Component Analysis (from scratch), Linear Discriminant Analysis, and a K-Nearest Neighbour classifier in Python. Test accuracy: 96.47%
Smart Dustbin
A touch-free, GSM-based overflow indicator system for garbage and waste collection bins in smart cities.
Hardware used: ATmega32 Microcontroller, GSM Modem, IR Sensors, Motor
Modified Round Robin Algorithm [Own Algorithm]
The traditional Round Robin (RR) scheduling algorithm reduces the penalty that short jobs suffer under FCFS and that long jobs suffer under SJF.
In fact, jobs never starve under RR, but there is scope for improvement in waiting time, turnaround time, throughput, and number of context switches.
Developed a Dynamic Quantum Time algorithm that combines traditional RR with Priority Scheduling and improves performance by ~40% compared with the simple RR algorithm.
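The comparison criteria above can be illustrated with a minimal sketch. It simulates fixed-quantum RR and, for contrast, one possible dynamic-quantum variant. The variant is only an assumption made for illustration (shortest-remaining-first ordering within each round, quantum set to the mean remaining time) and is not necessarily the algorithm described here. The function names `round_robin`, `dynamic_quantum_rr`, and `_averages` are hypothetical, and all jobs are assumed to arrive at time 0.

```python
import math
from collections import deque


def _averages(bursts, finish):
    # With every arrival at t=0, turnaround equals the finish time.
    n = len(bursts)
    waiting = [finish[i] - bursts[i] for i in range(n)]
    return sum(waiting) / n, sum(finish) / n


def round_robin(bursts, quantum):
    """Fixed-quantum RR. Returns (avg waiting, avg turnaround, context switches)."""
    if not bursts or quantum <= 0:
        raise ValueError("need at least one job and a positive quantum")
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queue = deque(range(len(bursts)))
    clock, switches, last = 0, 0, None
    while queue:
        job = queue.popleft()
        if last is not None and last != job:
            switches += 1  # CPU handed to a different job
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            queue.append(job)  # not finished: back of the queue
        else:
            finish[job] = clock
        last = job
    return (*_averages(bursts, finish), switches)


def dynamic_quantum_rr(bursts):
    """Illustrative variant (assumption): per round, serve the jobs
    shortest-remaining first with quantum = ceil(mean remaining time)."""
    if not bursts:
        raise ValueError("need at least one job")
    remaining = list(bursts)
    finish = [0] * len(bursts)
    clock, switches, last = 0, 0, None
    active = [i for i in range(len(bursts)) if remaining[i] > 0]
    while active:
        active.sort(key=lambda i: remaining[i])  # priority to short jobs
        quantum = math.ceil(sum(remaining[i] for i in active) / len(active))
        for job in active:
            if last is not None and last != job:
                switches += 1
            run = min(quantum, remaining[job])
            clock += run
            remaining[job] -= run
            if remaining[job] == 0:
                finish[job] = clock
            last = job
        active = [i for i in active if remaining[i] > 0]
    return (*_averages(bursts, finish), switches)


if __name__ == "__main__":
    jobs = [24, 3, 3]  # burst times in arbitrary time units
    print("fixed quantum 4:", round_robin(jobs, 4))
    print("dynamic quantum:", dynamic_quantum_rr(jobs))
```

Running both functions on the same burst list gives a side-by-side view of average waiting time, average turnaround time, and context switches, the criteria named above.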