The goal of this assignment is to design and implement a program that recognizes at least three different gestures of a person in front of a web camera.
We have chosen to implement three interactive "games". The first is a 3D world visualization tool that uses your position relative to the camera to orient the map. The second is a Balloon Pop game in which the player pops balloons by hitting them. The third is a Rock Paper Scissors game in which the player competes against the computer for rock-paper-scissors supremacy.
Below you can find a video of each game, plus a description of how it was built. Click on each title.
The downloads section also contains a .zip file with all the source code and a working executable in the Debug folder.
This part of the software waits until it can see someone, then displays a 3D view of Earth (from Google Earth). As you move your head around, the program rotates the image to give you the illusion of a 3D view. It also handles two depth levels as you move toward or away from the camera.
The idea for this came from one of Johnny Chung Lee's Wiimote projects. I figured it would be cool if we could do the same thing without all the extra gadgets.
At the most basic level, this application runs a face detection algorithm. If it sees a face, it starts tracking it by running the same algorithm on every frame. It then finds the position of the face relative to the camera and, depending on this position, displays one of 98 images.
How it works:
Face detection and tracking use OpenCV's cvHaarDetectObjects. I started with some example code and worked my way from there. This function uses a pre-trained Haar cascade, loaded from an XML file, that encodes the characteristic features of human faces, and it returns a rectangle for each face it finds. I use the position of that rectangle to calculate where the person is relative to the camera.
The image rotations work in a very rudimentary fashion. I took 98 screenshots (I think) of the world from different angles in Google Earth: a grid of 7 rows by 7 columns of viewing angles at each of 2 depth levels, for 7 × 7 × 2 = 98 images. The program then picks a specific image according to the coordinates of the user's face.
The GUI was built with OpenCV's HighGUI module. Again, I started with some sample code and worked my way from there.
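The step from "a rectangle around the face" to "where the person is relative to the camera" can be sketched as follows. This is a minimal illustration, not the original program's code: `FaceRect` is a hypothetical stand-in with the same fields as OpenCV's `CvRect` (x, y, width, height), and normalizing the offset to the range [-1, 1] is an assumption made for clarity.

```cpp
#include <cassert>

// Hypothetical stand-in for a detected face rectangle
// (same fields as OpenCV's CvRect: x, y, width, height).
struct FaceRect {
    int x, y, width, height;
};

// Offset of the face centre from the frame centre, normalized so that
// -1 is the left/top edge of the frame and +1 the right/bottom edge.
struct RelativePosition {
    double dx;
    double dy;
};

RelativePosition relativeToCamera(const FaceRect& face, int frameWidth, int frameHeight) {
    assert(frameWidth > 0 && frameHeight > 0);
    // Centre of the detected face rectangle, in pixels.
    const double cx = face.x + face.width / 2.0;
    const double cy = face.y + face.height / 2.0;
    // Centre of the camera frame, in pixels.
    const double halfW = frameWidth / 2.0;
    const double halfH = frameHeight / 2.0;
    return { (cx - halfW) / halfW, (cy - halfH) / halfH };
}
```

For a 640 × 480 frame, a 100 × 100 face at (270, 190) is centred in the frame and yields (0, 0); a face near the top-right corner yields a positive dx and a negative dy.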
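One way to pick one of the 98 images is sketched below. It assumes the face centre has already been normalized to [-1, 1] relative to the frame centre, that images are numbered `depth * 49 + row * 7 + col`, and that depth is decided by comparing the detected face width with a cutoff (`nearWidth`). The numbering scheme and the cutoff are hypothetical; the write-up does not say how the original program indexes its screenshots.

```cpp
#include <algorithm>
#include <cassert>

// Assumed layout: 7 columns x 7 rows of viewing angles per depth level,
// and 2 depth levels, giving the 98 screenshots described above.
constexpr int kGrid = 7;
constexpr int kDepthLevels = 2;
constexpr int kImageCount = kGrid * kGrid * kDepthLevels;
static_assert(kImageCount == 98, "7 x 7 grid at 2 depth levels");

// Map a normalized coordinate in [-1, 1] to a grid cell in [0, kGrid - 1].
int toCell(double t) {
    t = std::max(-1.0, std::min(1.0, t));  // clamp out-of-range values
    const int cell = static_cast<int>((t + 1.0) / 2.0 * kGrid);
    return std::min(cell, kGrid - 1);      // t == 1.0 would otherwise give kGrid
}

// dx, dy: face centre relative to the frame centre, normalized to [-1, 1].
// faceWidth: detected face width in pixels (a larger face means a closer user).
// nearWidth: hypothetical cutoff in pixels separating the two depth levels.
int selectImage(double dx, double dy, int faceWidth, int nearWidth) {
    assert(nearWidth > 0);
    const int col = toCell(dx);
    const int row = toCell(dy);
    const int depth = faceWidth >= nearWidth ? 1 : 0;
    const int index = depth * kGrid * kGrid + row * kGrid + col;
    assert(index >= 0 && index < kImageCount);
    return index;
}
```

A face in the middle of the frame at the far depth level maps to the centre image of the first grid (index 24); a close face in the bottom-right corner maps to the last image (index 97).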
Problems:
It is very difficult to gather any kind of statistical data on this program's accuracy. There are some bugs, especially in the face detection, that make the program feel glitchy at times. However, taking into account that I had two weeks to develop this, and that when I began I didn't even know how to open a video feed, I am satisfied with the results. The face detection works best when the face is directly facing the camera; if you tilt your face to the side, it has lots of problems. Its reliability also varies from person to person, so if it doesn't work well for you, I sincerely apologize. Lastly, the program is a little sluggish at the beginning while it loads the images into memory; performance improves after that.
Performance:
This program uses around 15 MB of RAM, though it does take a significant share of the CPU.
Further Development:
I would like to turn this into an add-on for the Google Earth API or Bing 3D Maps, but that will have to wait until I get some time off from school.
Additional Comments:
This program can also recognize yes/no gestures (nodding and shaking the head). It does this by keeping a history of the face's x and y positions and calculating the average direction of motion.
If you have any questions or comments, there is a contact form at the bottom of the page. You can also email me at luiscarrascob@gmail.com. All questions and comments are very welcome.
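A nod/shake classifier along these lines can be sketched as below. This is one possible interpretation of "average direction of motion", not the original code: it keeps the last N face positions, averages the absolute frame-to-frame movement along each axis, and reports "yes" when vertical movement dominates and "no" when horizontal movement dominates. The window size, minimum motion (in pixels per frame) and dominance ratio are hypothetical tuning values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

enum class Gesture { None, Yes, No };

class NodShakeDetector {
public:
    // window: number of recent positions kept (hypothetical default).
    // minMotion: average movement per frame, in pixels, needed to count as a gesture.
    // ratio: how much one axis must dominate the other.
    explicit NodShakeDetector(std::size_t window = 15, double minMotion = 3.0, double ratio = 2.0)
        : window_(window), minMotion_(minMotion), ratio_(ratio) {
        assert(window_ >= 2);
    }

    // Feed the face centre for the current frame; returns the gesture seen
    // over the most recent window, or None until the window is full.
    Gesture update(double x, double y) {
        xs_.push_back(x);
        ys_.push_back(y);
        if (xs_.size() > window_) {
            xs_.pop_front();
            ys_.pop_front();
        }
        if (xs_.size() < window_) return Gesture::None;

        // Average absolute movement per frame along each axis.
        double sumDx = 0.0, sumDy = 0.0;
        for (std::size_t i = 1; i < xs_.size(); ++i) {
            sumDx += std::fabs(xs_[i] - xs_[i - 1]);
            sumDy += std::fabs(ys_[i] - ys_[i - 1]);
        }
        const double n = static_cast<double>(xs_.size() - 1);
        const double avgDx = sumDx / n;
        const double avgDy = sumDy / n;

        if (avgDy >= minMotion_ && avgDy >= ratio_ * avgDx) return Gesture::Yes;  // nod
        if (avgDx >= minMotion_ && avgDx >= ratio_ * avgDy) return Gesture::No;   // shake
        return Gesture::None;
    }

private:
    std::size_t window_;
    double minMotion_, ratio_;
    std::deque<double> xs_, ys_;
};
```

Averaging absolute movement rather than signed displacement matters here: an up-and-down nod returns to where it started, so its signed average is close to zero.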
Lastly, in the downloads section you can find the source code, as well as a working executable inside the Debug folder of the .zip file. If you download the source, and need any help or explanation, please let me know!
The balloon popping game uses motion energy templates to determine where motion is taking place. It takes the last 5 frames and averages each pixel's change across them; any pixel whose average change is above a threshold is colored red. The balloons fall down the screen by being overlaid onto the current frame: each balloon's non-background pixels overwrite the frame's pixels at the balloon's current position. When the overlay and motion template functions are combined, the program checks whether there is motion at any of the pixels the balloon covers. If it finds any red pixels within the balloon's area, it “pops” the balloon and “drops” a new one.
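The three pieces described above (motion mask, overlay and hit test) can be sketched together as follows. This is a simplified illustration under several assumptions: frames are 8-bit grayscale buffers rather than color images, "averages all the pixels" is read as averaging each pixel's absolute frame-to-frame difference over the stored frames, and `Sprite` (with a single `background` value treated as transparent) is a hypothetical type, not the original program's data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // grayscale, row-major, width * height

// Motion mask: a pixel counts as motion ("red") when its average absolute
// change between consecutive frames exceeds `threshold`. With 5 stored
// frames there are 4 differences; comparing the sum against
// threshold * count avoids floating point. All frames must be the same size.
std::vector<bool> motionMask(const std::deque<Frame>& history, int threshold) {
    assert(history.size() >= 2);
    const std::size_t n = history.front().size();
    const int diffs = static_cast<int>(history.size() - 1);
    std::vector<bool> mask(n, false);
    for (std::size_t p = 0; p < n; ++p) {
        int sum = 0;
        for (std::size_t f = 1; f < history.size(); ++f)
            sum += std::abs(int(history[f][p]) - int(history[f - 1][p]));
        mask[p] = sum > threshold * diffs;
    }
    return mask;
}

// Hypothetical balloon image: pixels equal to `background` are transparent.
struct Sprite {
    int width, height;
    Frame pixels;             // row-major, width * height
    std::uint8_t background;  // value treated as "not part of the balloon"
};

// Overlay: copy the balloon's non-background pixels into the frame at (left, top).
void overlay(Frame& frame, int frameW, int frameH, const Sprite& s, int left, int top) {
    for (int y = 0; y < s.height; ++y)
        for (int x = 0; x < s.width; ++x) {
            const int fx = left + x, fy = top + y;
            if (fx < 0 || fy < 0 || fx >= frameW || fy >= frameH) continue;  // off-screen
            const std::uint8_t v = s.pixels[y * s.width + x];
            if (v != s.background) frame[fy * frameW + fx] = v;
        }
}

// Hit test: the balloon pops if any of its non-background pixels lies on motion.
bool isHit(const std::vector<bool>& mask, int frameW, int frameH, const Sprite& s, int left, int top) {
    for (int y = 0; y < s.height; ++y)
        for (int x = 0; x < s.width; ++x) {
            const int fx = left + x, fy = top + y;
            if (fx < 0 || fy < 0 || fx >= frameW || fy >= frameH) continue;
            if (s.pixels[y * s.width + x] == s.background) continue;  // transparent
            if (mask[fy * frameW + fx]) return true;
        }
    return false;
}
```

Checking only the balloon's non-background pixels, as the description does, means motion just outside the balloon's round shape (but inside its bounding box) does not pop it.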
The program was tested against an off-white wall under steady lighting. If the light source flickers too much (e.g., if a TV is on in the room), the motion template captures almost everything as motion and paints most of the image red. Also, if the user moves too quickly, the motion template is patchy and the balloon may not be popped. The motion template does not work reliably in the dark unless the user is close to the monitor, because the threshold is tuned for well-lit conditions and dark scenes do not produce enough pixel variance. Taking all this into account, the balloon popper is over 90% accurate for anyone using it with minimal instruction, and 100% accurate if the user keeps their movements smooth. Part of the difficulty is finding a threshold that isn't too sensitive: if it is too low, subtle light changes can trigger the detector, and if it is too high, the template is patchy.
If we had more time, it would be nice if the program popped balloons based on hand movement only, ignoring all other movement. It would also help to smooth the template with a better averaging scheme and more frames, but we didn't want to use too many resources and bog down the game.
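One averaging scheme that lets many frames contribute without storing them is an exponential moving average of each pixel's frame difference. The sketch below is not what the game currently does; it is an assumed alternative, and the smoothing factor `alpha` and the threshold are hypothetical parameters. It keeps one value per pixel no matter how many frames are averaged, which speaks to the concern about resource use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Exponential moving average of per-pixel frame differences ("motion energy").
// Older frames fade out geometrically, so the effective window is roughly
// 1 / alpha frames while memory stays at one double per pixel.
class SmoothedMotion {
public:
    SmoothedMotion(std::size_t pixels, double alpha) : energy_(pixels, 0.0), alpha_(alpha) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // prev, cur: consecutive grayscale frames with one byte per pixel.
    void update(const std::vector<std::uint8_t>& prev, const std::vector<std::uint8_t>& cur) {
        assert(prev.size() == energy_.size() && cur.size() == energy_.size());
        for (std::size_t p = 0; p < energy_.size(); ++p) {
            const double diff = std::abs(int(cur[p]) - int(prev[p]));
            energy_[p] = alpha_ * diff + (1.0 - alpha_) * energy_[p];
        }
    }

    // A pixel counts as motion when its smoothed energy exceeds the threshold.
    bool isMotion(std::size_t p, double threshold) const { return energy_[p] > threshold; }
    double energy(std::size_t p) const { return energy_[p]; }

private:
    std::vector<double> energy_;
    double alpha_;
};
```

A smaller `alpha` smooths out flicker and patchiness at the cost of a slower response to fast movements, so it trades off the same two failure modes described above.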