I build perception and learning systems for robotics - from grasp detection and occlusion-robust vision to vision-language models, validated end-to-end on physical robots.
I am a second-year PhD student in Robotics Engineering at WPI, working in the Manipulation and Environmental Robotics Lab with Prof. Berk Calli.
Watching Yamaha's Motobot struggle against Valentino Rossi got me thinking about the gap between perceiving an environment and making the right decision in it - especially in conditions the system wasn't built for. That question has shaped most of my research.
I came to robotics through Electronics Engineering in Pune, India, which means I think about systems from the hardware up. My work spans robot perception, deep learning for manipulation, and vision-language reasoning - all validated on real robots, not just in simulation.
Outside the lab I am a competitive swimmer and a die-hard soccer fan (Glory Glory Man United!). The early-morning training discipline has been useful, especially when a model training run fails at 3 a.m.
I am looking for internship roles in robot perception, computer vision, deep learning, and physical AI - with teams building systems that need to work when the situation changes.
Real-time perception stack for an encoderless Franka Panda. WGAN-GP inpainting reconstructs occluded joints from markerless images; keypoint detection with UKF temporal smoothing feeds Image-Based Visual Servoing, enabling stable control under severe occlusion. Outperforms LaMa (FID 11.7, precision 0.91).
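A minimal sketch of the temporal-smoothing step, assuming a constant-velocity pixel-space state and filterpy's UKF; the state layout, noise magnitudes, and the stand-in `detections` array are illustrative placeholders, not the project's actual parameters.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

def fx(x, dt):
    # constant-velocity motion model over state [u, v, du, dv]
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
    return F @ x

def hx(x):
    # the detector observes only the pixel position (u, v)
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=1 / 30, fx=fx, hx=hx,
                            points=points)
ukf.x = np.array([320.0, 240.0, 0.0, 0.0])  # initial keypoint guess
ukf.R = np.eye(2) * 4.0    # detector noise (pixels^2), illustrative
ukf.Q = np.eye(4) * 0.01   # process noise, illustrative

# stand-in per-frame (u, v) keypoint detections from the network
detections = np.array([[320.0, 240.0], [322.1, 241.3], [323.9, 242.8]])

for z in detections:
    ukf.predict()
    ukf.update(z)
    smoothed_uv = ukf.x[:2]  # smoothed keypoint fed to the IBVS control law
```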
Feature-level fusion across multiple grasp expert networks. Complementarity quantified via Q-statistics and error correlation, showing that mid-accuracy, moderately correlated ensembles outperform SOTA individual models. Evaluated on Cornell, Jacquard, and GraspNet-1B, and on a physical Franka Panda system.
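For the complementarity measurement, a hedged sketch of the pairwise Q-statistic (Yule's Q) over per-sample correctness masks; the expert masks below are made-up stand-ins for real grasp-success outcomes, not results from the paper.

```python
import numpy as np

def q_statistic(correct_a: np.ndarray, correct_b: np.ndarray) -> float:
    """Yule's Q-statistic between two experts' per-sample correctness masks.

    Q -> 1 when the experts fail on the same samples; Q near 0 or negative
    indicates complementary errors, which is what an ensemble wants.
    Assumes the denominator is nonzero (experts disagree on something).
    """
    a, b = correct_a.astype(bool), correct_b.astype(bool)
    n11 = np.sum(a & b)      # both correct
    n00 = np.sum(~a & ~b)    # both wrong
    n10 = np.sum(a & ~b)     # only A correct
    n01 = np.sum(~a & b)     # only B correct
    return float((n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10))

# hypothetical per-sample success masks for two expert networks
expert_a = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=bool)
expert_b = np.array([1, 0, 1, 1, 1, 0, 1, 0], dtype=bool)
print(f"Q = {q_statistic(expert_a, expert_b):.2f}")  # ~ -0.14: complementary
```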
Proposal-conditioned VLA framework for cluttered tabletop grasping. RGB-D Mask R-CNN generates object proposals; confidence-based skill selection chooses the best planner across known, unknown, and occluded multi-object scenarios. Deployed end-to-end on a Yale OpenHand gripper.
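A sketch of the confidence-based skill-selection idea under stated assumptions: `GraspSkill`, the proposal dict, and the 0.5 threshold are hypothetical names and values for illustration, not the framework's actual interface.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class GraspSkill:
    """One grasp planner plus a scorer for its confidence on a proposal."""
    name: str
    confidence: Callable[[dict], float]  # proposal -> score in [0, 1]
    plan: Callable[[dict], np.ndarray]   # proposal -> grasp pose

def select_and_plan(skills: list[GraspSkill], proposal: dict,
                    min_conf: float = 0.5):
    """Pick the planner most confident on this object proposal.

    Returns (skill name, grasp pose), or None when no skill clears the
    threshold - e.g. a heavily occluded object that should be deferred
    or re-viewed before grasping.
    """
    scored = [(skill.confidence(proposal), skill) for skill in skills]
    best_conf, best_skill = max(scored, key=lambda s: s[0])
    if best_conf < min_conf:
        return None  # no planner trusts this scene; defer
    return best_skill.name, best_skill.plan(proposal)
```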
Real-time fusion of RGB-D, EEG, and audio for human activity tracking and attention estimation. Led a three-person team to full deployment, integrating OpenPose, temporal attention, GMMs, and optical flow into a full-stack GUI - not just a research prototype.
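One reading of the GMM stage, as a rough sketch: fit a mixture over fused per-frame features and treat component posteriors as a soft activity/attention signal. The feature dimensions, component count, and random features here are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# hypothetical fused feature vectors: one row per time step, columns mixing
# pose keypoints (RGB-D), EEG band power, and audio energy, time-aligned
features = np.random.default_rng(0).normal(size=(500, 12))

# cluster time steps into latent activity states with a GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(features)

# per-frame posterior over states doubles as a soft attention/activity signal
state_posteriors = gmm.predict_proba(features)   # shape (500, 3)
dominant_state = state_posteriors.argmax(axis=1)
```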
Coursework and independent projects spanning 3D vision, SLAM, and robotics systems.

Enhanced NeRF with improved rendering and sampling for photorealistic novel view synthesis.

Full SfM pipeline with epipolar geometry and bundle adjustment for monocular depth.

3.2% IoU improvement via voxel grid filtering and bird's-eye-view projection with PointNet++.

VIO at 29 FPS via stereo camera + IMU fusion for real-time 6-DOF pose estimation.

RRT-APF hybrid planner for dynamic environments with moving obstacles.

Multi-view calibration with robust homography estimation and distortion correction.
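For the calibration project above, a minimal sketch of the robust homography step using OpenCV's RANSAC estimator; the point sets are synthetic stand-ins for real feature matches.

```python
import cv2
import numpy as np

# synthetic matched pixel coordinates between two views (stand-ins for
# real feature matches from e.g. SIFT or ORB)
rng = np.random.default_rng(1)
pts_src = rng.uniform(0, 640, size=(50, 1, 2)).astype(np.float32)
pts_dst = pts_src + rng.normal(0, 0.5, size=pts_src.shape).astype(np.float32)

# RANSAC rejects outlier correspondences before the final fit
H, inlier_mask = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC,
                                    ransacReprojThreshold=3.0)
print("inliers:", int(inlier_mask.sum()), "of", len(pts_src))
```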
Looking for research and engineering internship roles in robot perception, computer vision, deep learning, and physical AI. If your team cares about what happens when the system meets the real world - let's talk.