A HackNYU2018 project, developed in 36 hours, that uses AI and computer vision to build a virtual personal fitness trainer. It uses 2D human pose estimation with a commodity webcam to critique your form and count your repetitions.
This project won "The Most Startup-Viable Hack" award, presented by Contrary Capital.
f2: live pose estimation in a busy environment; note: here the user has over-extended their right arm (the image is mirrored), which is considered bad form in this variant of the dumbbell shoulder press, hence the message.
The pose estimation is based on tf-pose-estimation by ildoonet. The model architecture, OpenPose, developed by the CMU Perceptual Computing Lab, consists of a deep convolutional neural network for feature extraction (MobileNet) and a two-branch, multi-stage CNN that predicts confidence maps and Part Affinity Fields (PAFs).
This feature allowed us to track the position of the user's joints using a commodity webcam.
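As a rough illustration of how tracked joints can drive form critique and repetition counting, the sketch below computes the angle at a joint from three 2D keypoints and counts a repetition each time that angle passes from a "lowered" threshold to a "raised" one. The function and class names and the 90/160 degree thresholds are assumptions made for this example; they are not taken from the project's code.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle ABC in degrees, measured at the middle joint b."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joints must not coincide")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


class RepCounter:
    """Count reps using two thresholds so jitter near one value is ignored."""

    def __init__(self, down_below: float = 90.0, up_above: float = 160.0):
        # Both thresholds are illustrative assumptions, in degrees.
        self.down_below = down_below
        self.up_above = up_above
        self.reps = 0
        self._lowered = False

    def update(self, elbow_angle: float) -> int:
        if elbow_angle <= self.down_below:
            self._lowered = True
        elif elbow_angle >= self.up_above and self._lowered:
            # A full lowered-to-raised transition completes one rep.
            self.reps += 1
            self._lowered = False
        return self.reps
```

For a shoulder press, `joint_angle(shoulder, elbow, wrist)` would be fed to `RepCounter.update` once per frame; a form check could compare the same angle against a limit to flag over-extension.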
f3: pseudo data flow diagram; note: the pose estimation model output must be processed, as it returns pose estimates for every human detected in the frame (see: Future Changes).
This app runs in the browser, and the pose estimation and form critique generation are performed on a Flask server. The webcam feed is captured using WebRTC, and screenshots are sent to the server as base64-encoded strings every 50 ms or as fast as the server can respond, whichever is slower (see: Future Changes).
This means the server could be run in the cloud on high-performance hardware and the client could be any device with a WebRTC-supported web browser and camera. There is also the option for video to be recorded and sent to the server for post-processing if the user's network connectivity is too slow to stream a live feed.
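On the server side, each base64-encoded screenshot has to be turned back into image bytes before it can be passed to the pose estimator. The sketch below shows one way to do that with the Python standard library only. The `decode_frame` name is hypothetical, and the handling of a `data:` URL prefix assumes the client may send the kind of string produced by `canvas.toDataURL()`; the source does not say which form was used.

```python
import base64
import binascii


def decode_frame(payload: str) -> bytes:
    """Return raw image bytes from a base64 string or a base64 data URL."""
    # A data URL looks like "data:image/jpeg;base64,<data>"; strip the header.
    if payload.startswith("data:"):
        header, sep, payload = payload.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URL")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
```

In a Flask request handler, the returned bytes could then be decoded into an image array by whichever imaging library the server uses.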
Multiple Pose Estimations for One User
Current: The model estimates joints for all subjects found in the input image; we then analyze the output and extract the pose that is most likely to be the user.
a. Modify model and training data to only estimate a single 'best' pose.
b. Implement re-identification and support multiple users at once. This is viable because forward-propagation time does not increase with the number of poses being estimated.
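The current step of extracting the most likely user pose could look something like the sketch below. The source does not state which heuristic was used, so this example assumes the user is the person closest to the camera and therefore the one whose detected joints span the largest bounding box. The pose format (joint name mapped to x, y, confidence), the `pick_user_pose` name, and the `min_joints` cut-off are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pose format: joint name -> (x, y, confidence),
# with x and y normalised to the range [0, 1].
Pose = Dict[str, Tuple[float, float, float]]


def bbox_area(pose: Pose) -> float:
    """Area of the axis-aligned box enclosing all detected joints."""
    xs = [x for x, _, _ in pose.values()]
    ys = [y for _, y, _ in pose.values()]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def pick_user_pose(poses: List[Pose], min_joints: int = 4) -> Optional[Pose]:
    """Pick the pose most likely to be the user, or None if none qualifies."""
    # Ignore fragmentary detections, which are common for background people.
    candidates = [p for p in poses if len(p) >= min_joints]
    if not candidates:
        return None
    # Assumption: the user stands closest to the camera, so appears largest.
    return max(candidates, key=bbox_area)
```

Other heuristics, such as mean joint confidence or distance from the frame centre, would slot into the same `key` function.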
Webcam Image Data Transfer
Current: Webcam captures are encoded as base64 strings and sent to the server in a POST request (note: this was done for ease of implementation due to the hackathon time constraint).
Possible Improvements: Implement WebSockets to transfer webcam captures instead.
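One reason a binary transport could help is that base64 encodes every 3 bytes as 4 characters, so each capture grows by about a third before any HTTP overhead is added. The small sketch below measures that ratio; the 30,000-byte frame is only a stand-in size chosen for the example, not a measurement from the project.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the ratio of base64-encoded size to raw size."""
    if not raw:
        raise ValueError("raw must be non-empty")
    return len(base64.b64encode(raw)) / len(raw)


# Stand-in for one compressed webcam capture (size is an assumption).
frame = bytes(30_000)
ratio = base64_overhead(frame)  # 40,000 encoded bytes / 30,000 raw bytes
```

Sending the raw bytes as a binary WebSocket message would avoid this inflation and the per-request cost of a new POST.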