Projects

Dense Stereo Vision for Real Time Depth Estimation
Stereo 3D Tracking of Infants in Natural Play Conditions
Deep Learning for Fruit Segmentation and Counting
Neural Network Design
Semantic Segmentation of Infants
SLAM
Structure From Motion
Q-Learning: PACMAN
Q-Learning: Gridworld
Learning Fantasy Basketball Stats
KMeans Image Segmentation
Face Warping
Face Replacement
Gesture Controlled Robotic Arm
Fruit Tree Reconstruction

Dense Stereo Vision for Real Time Depth Estimation

GPU-accelerated Semi-Global Block Matching running on a quad-rotor for real-time dense stereo depth estimation, using the NVIDIA TX2, CUDA, OpenCV and OpenVX.
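Once a block matcher has produced a disparity map, depth follows from the rectified-stereo relation Z = f * B / d. The sketch below shows only that last step in plain Python; the focal length and baseline are hypothetical values chosen for illustration, not the calibration of the actual rig.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a stereo disparity (pixels) to metric depth (meters).

    Uses the rectified-stereo relation Z = f * B / d.
    Returns None for non-positive disparities (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def depth_map(disparity_rows, focal_length_px, baseline_m):
    """Apply disparity_to_depth to every pixel of a 2D disparity image."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_rows
    ]


# Hypothetical calibration values, for illustration only.
FOCAL_PX = 700.0
BASELINE_M = 0.12

example_depth = depth_map([[70.0, 35.0], [0.0, 14.0]], FOCAL_PX, BASELINE_M)
```

Larger disparities map to closer points, and pixels without a valid match (disparity 0) are left as None rather than given an infinite depth.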

Stereo 3D Tracking of Infants in Natural Play Conditions

This project was part of the SmarToyGym system at the Rehabilitation Robotics Laboratory. I designed a stereo camera system and 3D pose tracker to identify and track infant limb movements in natural play conditions with varying degrees of occlusion.

Abstract
This paper describes the design and implementation of a multiple view stereoscopic 3D vision system and a supporting infant tracker pipeline to track limb movement in natural play environments and identify potential metrics to quantify movement behavior. So far, human pose estimation and tracking with 3D cameras has been focused primarily on adults and cannot be directly extended to infants because of differences in visual features such as shapes, sizes and appearance. With rehabilitation in mind, we propose a portable, compact, markerless, low cost and high resolution 3D vision system and a tracking algorithm that exploits infant appearance attributes and depth information. This approach achieved a mean 3D tracking error of 8.21cm and a standard deviation of 8.75cm. We also identify two potential metrics for movement behavior analysis - approximate entropy and interaction events.
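As a rough illustration of the approximate entropy metric mentioned in the abstract, here is a minimal sketch of the standard ApEn(m, r) definition applied to a 1D sequence such as a limb coordinate over time. The template length `m` and tolerance `r` are assumed values, not the ones used in the paper.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1D sequence.

    m is the template length and r the absolute tolerance; both defaults
    are illustrative assumptions.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r (Chebyshev distance).
            # The self-match is included, so the count is never zero.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


constant_apen = approximate_entropy([1.0] * 30)
periodic_apen = approximate_entropy([0.0, 1.0] * 20)
```

Highly regular signals (constant or strictly periodic) give values near zero, while irregular movement produces larger values.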

Details regarding the project can be found in our paper titled "Stereo 3D Tracking of Infants in Natural Play Conditions", currently accepted into ICORR 2017.
Link soon

Deep Learning for Fruit Segmentation

This project was part of a larger undertaking by the Precision Agriculture team at the Kumar Lab (GRASP, University of Pennsylvania).

Details regarding the project can be found in our paper titled "Counting Apples and Oranges with Deep Learning: A Data Driven Approach", currently accepted into RAL 2017 and ICRA 2017.
http://ieeexplore.ieee.org/document/7814145/

The video shown below is from our next endeavour: training the network on mango data. Here the network was trained on just 21 mango images (2400x1600 pixels) for 10,000 iterations. The network was designed in Caffe using Python.

Neural Network Classification of Handwritten Digits using Python

I built an artificial neural network from scratch to classify handwritten digits using backpropagation. The hidden layer has 64 hidden units, each of which is visualised as a 20x20 pixel grid cell. To generate this video, I lowered the learning rate and trained the network over ~550 epochs (for a smoother visualisation across epochs); each frame in the video shows the hidden layer at a given epoch.

Each hidden unit represents different stroke detections and patterns learnt from the handwritten digits. This was easily one of the most exciting and rewarding projects I've worked on.

I initially built this neural network for an assignment for the CIS519 "Introduction to Machine Learning" course taught by Eric Eaton at the University of Pennsylvania. This program was written in Python, using only the 'Image' library for image representation of the hidden units.
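For readers unfamiliar with backpropagation, the following is a minimal sketch of a one-hidden-layer network trained with it, in plain Python. It is not the original assignment code: the toy OR dataset, the layer sizes, the learning rate and the `TinyNet` class are all assumptions made to keep the example small (the network described above has 64 hidden units over 20x20 inputs).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden layer, sigmoid activations, squared-error loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr):
        hidden, out = self.forward(x)
        # Output delta: dLoss/dz for loss = 0.5 * (out - y)^2.
        delta_out = (out - y) * out * (1.0 - out)
        # Hidden deltas: push the output delta back through w2
        # (computed before w2 is modified).
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w2, hidden)]
        # Gradient-descent updates.
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, dh in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        return 0.5 * (out - y) ** 2


# Toy dataset (logical OR), standing in for digit images.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def mean_loss(net):
    return sum(0.5 * (net.forward(x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


net = TinyNet(n_in=2, n_hidden=3)
loss_before = mean_loss(net)
for _ in range(2000):
    for x, y in DATA:
        net.train_step(x, y, lr=0.5)
loss_after = mean_loss(net)
```

In a digit classifier of the kind described here, each row of `w1` would hold 400 weights, which is what can be reshaped into a 20x20 image to visualise a hidden unit.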

Semantic Segmentation of Infants in Natural Play

I evaluated a pretrained VOC-8S model, trained on adult humans, on detecting infants in natural play conditions. The model performed reasonably well, identifying the infant region but lacking granularity around limb extremities. The next step is to retrain the model on a labelled dataset of infants, using the existing weights as the initial weights (net surgery).

The CNN architecture used was the FCN architecture introduced by Shelhamer et al., implemented using Caffe and Python. This experiment was part of the work at the Rehabilitation Robotics Laboratory for the SmarToyGym project.

SLAM Robot Mapping and Localization in 2D

White pixels indicate obstacles and boundaries. Varying levels of black to grey pixels represent traversable pathways. The blue pixels indicate current Lidar sensor hits, and the yellow triangle represents the robot, with the longer edge pointing in the direction of the robot's current orientation.

This project was part of the ESE650 : Learning in Robotics course taught by Dr. Daniel Lee at The University of Pennsylvania.

Structure From Motion

Part 1: Here are some of the initial steps in the SFM pipeline. The Fundamental Matrix between two images in a sequence is estimated, with RANSAC used to reject erroneous outliers and keep a set of at least 8 strong correspondences. The next step involves estimating the Essential Matrix and generating a set of 4 possible camera pose configurations. The video illustrates the RANSAC algorithm working with different threshold values to refine the strength of the correspondences, and also shows a visualisation of the epipolar lines.
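The relations behind these steps can be summarised as follows. This is the standard textbook formulation, written under the assumption that both views share the same known intrinsics K; the symbols are not taken from the original project code.

```latex
% Epipolar constraint for a correspondence x <-> x' (homogeneous pixel coordinates),
% and the epipolar line of x in the second image
x'^{\top} F \, x = 0, \qquad l' = F x

% Essential matrix from the fundamental matrix, assuming known intrinsics K
E = K^{\top} F K

% Four candidate camera poses from the SVD  E = U \,\mathrm{diag}(1, 1, 0)\, V^{\top}
R \in \{\, U W V^{\top},\; U W^{\top} V^{\top} \,\}, \qquad
t = \pm u_3, \qquad
W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
```

Each correspondence contributes one linear equation in the nine entries of F, and F is only defined up to scale, which is why at least 8 correspondences are needed. Here u_3 is the last column of U, and the physically valid pose among the four is the one that places triangulated points in front of both cameras.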


Part 2: This is the second half of the SFM project that I just completed for the CIS580 Machine Perception course taught by Dr. Jianbo Shi. In this video I visualise results from the final bundle adjustment step (the final point cloud), followed by the results of the preceding Perspective-n-Point step. Here the different colored points denote the points added from different frames in the image sequence, and the camera positions are shown in red. The next visualisation is the earlier triangulation step, where two frames from the image sequence are used to generate a set of triangulated 3D points. The triangulated points shown in black are the result of non-linear refinement.
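For reference, bundle adjustment is usually posed as minimising the total reprojection error over all camera poses and 3D points. The formulation below is the generic one, with notation chosen here for illustration rather than taken from the course material.

```latex
% v_{ij} = 1 if point j is visible in frame i, and 0 otherwise
\min_{\{R_i, t_i\},\, \{X_j\}} \;
\sum_{i} \sum_{j} v_{ij} \,
\bigl\| \, x_{ij} - \pi\!\bigl( K \, [\, R_i \mid t_i \,] \, \tilde{X}_j \bigr) \bigr\|^{2},
\qquad
\pi\!\left( \begin{bmatrix} a \\ b \\ c \end{bmatrix} \right)
= \begin{bmatrix} a / c \\ b / c \end{bmatrix}
```

Here x_{ij} is the observed pixel location of point j in frame i, and \tilde{X}_j is the 3D point in homogeneous coordinates. The non-linear refinement of triangulated points is the same objective with the camera poses held fixed.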

Approximate Q-Learning on PACMAN

This was one of my results for the 'Reinforcement Learning' project/homework that was part of the CIS519 'Introduction to Machine Learning' course taught by Eric Eaton at the University of Pennsylvania.

This Q-Learning agent uses function approximation and state abstraction to help PACMAN win as many games as possible. Here, the agent is trained on only 50 games and is then exposed to 10 test game instances. In the above video, PACMAN wins 8 out of 10 games, an 80% win rate. The agent learns weights for state features, where different states can share the same features. A feature function fn(state, action) maps each state-action pair to a vector of feature values.
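The weight update used in approximate Q-learning can be sketched as below. This is a generic illustration, not the course framework's code: the feature names, reward and hyperparameters are hypothetical.

```python
def q_value(weights, features):
    """Q(s, a) as a weighted sum of the feature values for (s, a)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def update_weights(weights, features, reward, max_next_q, alpha, gamma):
    """One approximate Q-learning update.

    difference = (reward + gamma * max_a' Q(s', a')) - Q(s, a)
    w_i       <- w_i + alpha * difference * f_i(s, a)
    """
    # Compute the TD difference once, before any weight is changed.
    difference = (reward + gamma * max_next_q) - q_value(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * difference * value
    return difference


# Hypothetical features for a state-action pair that walks into a ghost.
weights = {}
features = {"bias": 1.0, "ghost_one_step_away": 1.0}
td_difference = update_weights(weights, features, reward=-10.0,
                               max_next_q=0.0, alpha=0.5, gamma=0.9)
q_after = q_value(weights, features)
```

Because the weights are attached to features rather than to individual states, what is learned in one state carries over to every other state that shares those features.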

This project was implemented in Python, using a number of support methods and programs that replicate the Pacman and Gridworld environments, along with a Python library for Markov Decision Processes and helper modules such as util.py, which helped build the Q-Learning agent.

Video has been rendered at 6x the actual speed.

Q-Learning on GRIDWORLD

The next part of the 'Reinforcement Learning' project was to implement a Q-Learning agent within the provided Gridworld and Markov Decision Process framework. Unlike the Value Iteration agent, the Q-Learning agent actually 'learns' from experience, as seen in the video. Epsilon-greedy action selection is used: the agent chooses a random action a fraction epsilon of the time and otherwise follows the best Q-value.

This was one of my results for the CIS519 'Introduction to Machine Learning' course taught by Eric Eaton at the University of Pennsylvania.

Video is rendered at 4x the actual speed.
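Epsilon-greedy selection and the tabular Q-learning update can be sketched in a few lines. This is a standalone illustration under assumed names (`epsilon_greedy`, `q_update`, string states and actions), not the interface of the provided framework.

```python
import random


def epsilon_greedy(q_table, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha, gamma):
    """Tabular Q-learning: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    # A terminal next state has no actions, so its value defaults to 0.
    best_next = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)


q_table = {("s", "north"): 0.2, ("s", "east"): 0.7}
greedy_choice = epsilon_greedy(q_table, "s", ["north", "east"], epsilon=0.0)
random_choice = epsilon_greedy(q_table, "s", ["north", "east"], epsilon=1.0)

learned = {}
q_update(learned, "s", "east", reward=1.0, next_state="exit",
         next_actions=[], alpha=0.5, gamma=0.9)
```

With epsilon set to 0 the agent is purely greedy, and with epsilon set to 1 it explores at random; values in between trade off exploration against exploitation.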

Learning To Turn Fantasy Basketball Into Real Money

Fantasy basketball is a rapidly growing multibillion-dollar industry with colossal yet largely untapped potential for data mining. In fantasy basketball, fantasy owners aim to win money by picking the top statistically-performing NBA players on a weekly basis. In each game, every fantasy NBA player earns a certain number of points based on the actions of the corresponding real-life NBA player. With this in mind, we develop a machine learning (ML) model that predicts the statistical performance of NBA players in a given game. We perform feature selection from a wide array of both basic and derived statistics for individual players and opposing defenses. Linear regression, random forest regression, and support vector regression are compared and both feature and model parameters are searched to find the most accurate model for predicting future game fantasy scores. Additionally, we extend this model to predict the outcome of a given game by aggregating the individual fantasy scores of the players on each team in the game.

Project by Aaron Chan, Tim Hu and Shreyas Shivakumar

LINK to the original project report.

Image Segmentation with K-Means Clustering using Python

This was part of an assignment for the CIS519: Introduction to Machine Learning course at Penn, taught by Eric Eaton. The video shows my K-Means Clustering algorithm running on an image, iterating from K=1 to K=80 clusters, with the last 3 frames being the original image. I wrote this program in Python, using the PIL library for image representation.
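A minimal sketch of K-Means on RGB pixels is shown below. It works on a plain list of colour tuples rather than a PIL image, and the initialisation scheme, iteration count and sample pixels are assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k, iterations=10, seed=0):
    """Cluster colour tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k pixels picked at random positions.
    centroids = rng.sample(pixels, k)
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        for i, p in enumerate(pixels):
            labels[i] = min(range(k),
                            key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(pixels, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(ch) / len(members)
                                     for ch in zip(*members))
    return centroids, labels


# Six sample pixels: three reddish, three bluish.
pixels = [(250, 10, 10), (240, 20, 15), (255, 0, 5),
          (10, 10, 250), (20, 15, 240), (0, 5, 255)]
centroids, labels = kmeans(pixels, k=2)

# Segmented output: every pixel is replaced by its cluster's mean colour.
segmented = [centroids[label] for label in labels]
```

Running this for increasing K produces progressively finer colour segmentations, which is the effect the video steps through.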

Using Thin Plate Splines for Face Warping

This was part of a project for the CIS580 : Computer Vision and Computational Photography course at Penn, taught by Prof. Jianbo Shi. I referred to the lecture notes as well as "Principal Warps: Thin Plate Splines and Decompositions of Deformations" by Fred L. Bookstein.
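For context, the thin plate spline interpolant from Bookstein's paper has the form below. One such function is fitted per output coordinate, using n pairs of corresponding landmarks; the symbol names follow common usage and are not taken from the project code.

```latex
% Thin plate spline mapping for one output coordinate
f(x, y) = a_1 + a_x x + a_y y
          + \sum_{i=1}^{n} w_i \, U\!\bigl( \| (x_i, y_i) - (x, y) \| \bigr),
\qquad U(r) = r^{2} \log r^{2}

% Side conditions that make the non-affine part vanish at infinity
\sum_{i=1}^{n} w_i = 0, \qquad
\sum_{i=1}^{n} w_i x_i = 0, \qquad
\sum_{i=1}^{n} w_i y_i = 0
```

The affine terms (a_1, a_x, a_y) capture the global transformation, while the weights w_i bend the image locally so that each source landmark lands exactly on its target. Solving for them amounts to one linear system in n + 3 unknowns per coordinate.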

Photograph Source : Image

Face Replacement

This video was one of the test videos for a face replacement project that I was a part of in the CIS581 course at the University of Pennsylvania.

Original video: YouTube

Gesture Controlled Robot Arm

Here's a video of our Gesture Controlled Robotic Arm project. We're using a Leap Motion device on the client side along with our custom designed web application that communicates wirelessly with a 5-DOF Robotic Arm on the server side. The Robotic Arm was made using 5 BMS 860DMax servos, a laser cut plastic frame and an Arduino Yun microcontroller.

Link to code repository: GitHub

3D Reconstruction of Fruit Tree

I collected a dataset of an apple tree while I was at Biglerville, PA using my cellphone (Nexus 6P), and tested out some sparse 3D reconstruction using VisualSFM, along with some additional methods to clean up the point cloud and plot my trajectory.