This post is the first in a series documenting my work on a low-cost infrared camera tracking system. These systems are usually used with retroreflective markers to perform high-fidelity motion tracking; some of the higher-end systems have precision down to a fraction of a millimeter even when the marker is several meters away from the cameras.
I was inspired to start this project by some YouTube videos, primarily the work done by Stuff Made Here. He’s been able to create some absolutely incredible systems that rely on high-fidelity motion tracking. Here’s one about a basketball-seeking hoop, and another about an automatic bow and arrow system.
These projects are incredible, but the tracking system they rely on is prohibitively expensive for the average hobbyist. The OptiTrack Prime platform costs $2,500 per camera and requires a subscription to use; even a barebones system is pushing $12,000. That may be the kind of money large movie studios and AAA game developers can find between their couch cushions, but it is significantly more than most people tinkering in their garage can afford.
The Hardware
Three factors determine how useful these camera systems are: accuracy, latency, and frame rate. Any cost-effective implementation of a tracking system inherently balances these three performance metrics against its cost. In the spirit of that balance, the goals for the camera units are:
- Camera cost under $100
- Frame rate of at least 120 FPS
- Accuracy within ±0.5 mm
- No more than 5 ms of latency
- Power, data, and sync through a single cable
These goals should position the cameras as slightly better than OptiTrack's Flex 3 cameras, which still cost $660 each. A full system consisting of six cameras and a central hub should cost less than $1,000. While not cheap, that is a significantly more accessible option than existing commercial solutions.
The Software
Of course, these cameras are useless without software that ties all of the different camera inputs together into 3D spatial information. While setting up this project, I thought a lot about how to structure the work so that users can depend on this system for the long term. One of the biggest drawbacks of the commercial options, in my opinion, is that if the company ever goes under, you’re stuck with an assortment of very expensive, shiny bricks that you can no longer use.
To avoid this, I’m going to release the software under the GPLv3 license. This gives users full access to the code and the freedom to implement new features as they see fit, provided they comply with the license and contribute any new features back to the existing code base if they plan to release them to the community.
The camera hardware designs will not be open-sourced, but a contingency will be established that releases the designs under an OSHW license should I no longer be able to produce and sell them.
The software will be written in Rust with as few dependencies as possible (ideally just nalgebra, a linear algebra package). It will take in the camera inputs, compute the spatial locations of the markers, and output that data over a network connection. Outputting over the network lets users consume the data in their language of choice rather than being tied to Rust just because my software is written in it.
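To make that concrete, here’s a rough sketch of what the output side could look like, pushing each marker update out as a UDP datagram with nothing but the standard library. The wire format, port, and type names are placeholders I made up for illustration, not a finished protocol:

```rust
use std::net::UdpSocket;

/// One solved marker position in meters, in the tracking volume's frame.
/// (The field names and the plain-text wire format are placeholders.)
struct MarkerUpdate {
    id: u32,
    x: f64,
    y: f64,
    z: f64,
}

/// Send each marker as its own datagram: "id x y z\n".
fn broadcast_markers(socket: &UdpSocket, markers: &[MarkerUpdate]) -> std::io::Result<()> {
    for m in markers {
        let line = format!("{} {:.4} {:.4} {:.4}\n", m.id, m.x, m.y, m.z);
        socket.send(line.as_bytes())?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Bind to an ephemeral local port and point the socket at a consumer
    // (the consumer address here is arbitrary).
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.connect("127.0.0.1:9000")?;
    broadcast_markers(&socket, &[MarkerUpdate { id: 0, x: 1.0, y: 0.5, z: 2.0 }])
}
```

A simple line-oriented text format like this is trivial to parse from Python, C++, or anything else, which is the whole point of going over the network.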
I really want to up my testing quality in this project, so I’m actually implementing an entire camera simulator in the test environment. The goal is to debug as many problems as possible and fully understand the hardware requirements before even touching the hardware design.
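To give a flavor of what that simulator involves, here’s a minimal sketch of its core idea: an idealized pinhole projection from a world-space marker into pixel coordinates using nalgebra. The struct, its numbers, and the centered principal point are assumptions made for the sketch, not the real simulator's parameters:

```rust
use nalgebra::{Isometry3, Point2, Point3};

/// An idealized pinhole camera for tests (all values here are placeholders).
struct SimCamera {
    pose: Isometry3<f64>, // world-to-camera transform
    focal_px: f64,        // focal length expressed in pixels
    width: u32,
    height: u32,
}

impl SimCamera {
    /// Project a world-space marker into pixel coordinates, returning None
    /// if it falls behind the camera or outside the sensor.
    fn project(&self, marker: &Point3<f64>) -> Option<Point2<f64>> {
        let cam = self.pose.transform_point(marker);
        if cam.z <= 0.0 {
            return None; // behind the image plane
        }
        let u = self.focal_px * cam.x / cam.z + self.width as f64 / 2.0;
        let v = self.focal_px * cam.y / cam.z + self.height as f64 / 2.0;
        if u < 0.0 || v < 0.0 || u >= self.width as f64 || v >= self.height as f64 {
            return None; // outside the frame
        }
        Some(Point2::new(u, v))
    }
}

fn main() {
    // A camera at the origin looking down +Z, and a marker 2 m in front of it.
    let cam = SimCamera { pose: Isometry3::identity(), focal_px: 800.0, width: 1280, height: 800 };
    let marker = Point3::new(0.1, -0.05, 2.0);
    println!("{:?}", cam.project(&marker)); // roughly (680, 380)
}
```

With something like this in place, a test can project a known marker, run the detection pipeline on the rendered image, and assert that the recovered center matches the known projection to within a tolerance.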
Current Work
So far, I have implemented:
- Representation of markers in 3D space
- Rasterization of those markers into a 2D camera view
- Filtering, thresholding, and segmenting of the image (sketched just after this list)
- Center-of-mass computation to find marker centers on the image plane
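Here’s roughly what the segmenting step looks like once the image has been thresholded: a flood fill that groups bright pixels into blobs. The row-major f64 image and the single fixed threshold are simplifications for the sketch, not necessarily how the real pipeline stores or thresholds its frames:

```rust
/// Group bright pixels into 4-connected blobs via threshold + flood fill.
/// `image` is row-major grayscale with `width * height` entries.
fn segment(image: &[f64], width: usize, height: usize, threshold: f64) -> Vec<Vec<(usize, usize)>> {
    debug_assert_eq!(image.len(), width * height);
    let mut visited = vec![false; image.len()];
    let mut blobs = Vec::new();
    for seed in 0..image.len() {
        if visited[seed] || image[seed] < threshold {
            continue;
        }
        // Flood-fill outward from this seed pixel, collecting the whole blob.
        let (mut blob, mut stack) = (Vec::new(), vec![seed]);
        visited[seed] = true;
        while let Some(i) = stack.pop() {
            let (x, y) = (i % width, i / width);
            blob.push((x, y));
            for (nx, ny) in [(x.wrapping_sub(1), y), (x + 1, y), (x, y.wrapping_sub(1)), (x, y + 1)] {
                if nx < width && ny < height {
                    let j = ny * width + nx;
                    if !visited[j] && image[j] >= threshold {
                        visited[j] = true;
                        stack.push(j);
                    }
                }
            }
        }
        blobs.push(blob);
    }
    blobs
}

fn main() {
    // A 4x3 image with one bright 2x2 blob in the top-left corner.
    let image = vec![
        0.9, 0.8, 0.0, 0.0,
        0.7, 0.9, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0,
    ];
    println!("{:?}", segment(&image, 4, 3, 0.5)); // one blob of four pixels
}
```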
You can see the output of the full pipeline in the image below. You may need to zoom in a bit, but there is a green dot at the center of each body marking the result of the center-of-mass calculation.
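That last step boils down to an intensity-weighted average over each segmented blob. Here’s a stripped-down sketch, with a blob stored as plain (x, y, intensity) triples instead of the test environment's actual image types:

```rust
/// Intensity-weighted centroid of one segmented blob, where `pixels` holds
/// an (x, y, intensity) triple for every pixel assigned to the blob.
fn center_of_mass(pixels: &[(u32, u32, f64)]) -> Option<(f64, f64)> {
    let total: f64 = pixels.iter().map(|&(_, _, w)| w).sum();
    if total <= 0.0 {
        return None; // empty or fully dark blob
    }
    let (mut sx, mut sy) = (0.0, 0.0);
    for &(x, y, w) in pixels {
        sx += w * x as f64;
        sy += w * y as f64;
    }
    // Weighting by intensity is what buys sub-pixel precision: a marker's
    // brightness falls off smoothly toward its edges, so the weighted mean
    // can land between pixel centers.
    Some((sx / total, sy / total))
}

fn main() {
    // A tiny three-pixel blob, brightest in the middle.
    let blob = [(10, 5, 0.2), (11, 5, 1.0), (12, 5, 0.2)];
    println!("{:?}", center_of_mass(&blob)); // approximately (11.0, 5.0)
}
```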
Next up is the actual server software that collects the views from the different cameras and converts them into real spatial coordinates. Once that’s completed and rigorously tested, it will be time to move on to hardware!