It’s been a bit since I published my first post on this project, and that’s because calibration took me a lot longer to figure out than I expected. There are several ways to calibrate these kinds of camera systems, but the one I’m most interested in is the same one most manufacturers use: waving a wand in front of the cameras until they all figure out where they are. This sounds pretty magical, and it sure beats relying on manual measurements or holding up a checkerboard.
Before we start, I do want to acknowledge: I made this way more complicated than it needed to be, since I’m trying to rely on as few external libraries as possible. If you’re working with a system like this and you just want plug-and-play camera calibration, all of this functionality already exists in OpenCV.
Eight Point Calibration
The first step in calibration is to get a good initial estimate of where the cameras are relative to each other. To do this, we rely on a concept called the “fundamental matrix”. Consider two cameras that can both see the same eight points. The projection of point i in the first camera is u_i, and the projection of point i in the second camera is v_i (both in homogeneous pixel coordinates). The fundamental matrix F is the 3x3 matrix such that u_i^T F v_i = 0 for all i. Each correspondence gives one linear constraint on the entries of F, so eight points pin the matrix down to an overall scale factor, and we just pick the solution with unit norm for numerical reasons.
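To make that concrete, here’s a rough numpy sketch of the standard normalized eight-point algorithm. It isn’t pulled from my codebase (the helper name eight_point_fundamental is just a placeholder); it’s only meant to illustrate the linear solve described above.

```python
import numpy as np

def eight_point_fundamental(u, v):
    """Estimate F from N >= 8 correspondences so that u_i^T F v_i ~= 0.

    u, v: (N, 2) arrays of pixel coordinates in the first and second
    camera, where row i of each array is the projection of the same
    3D point. Returns a rank-2, unit-norm 3x3 fundamental matrix.
    """
    def normalize(pts):
        # Shift to zero mean and scale so the average distance from the
        # origin is sqrt(2); this keeps the linear system well conditioned.
        mean = pts.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
        T = np.array([[scale, 0, -scale * mean[0]],
                      [0, scale, -scale * mean[1]],
                      [0, 0, 1]])
        pts_h = np.column_stack([pts, np.ones(len(pts))])
        return (T @ pts_h.T).T, T

    un, Tu = normalize(u)
    vn, Tv = normalize(v)

    # Each correspondence u_i^T F v_i = 0 is one linear equation in the
    # nine entries of F.
    A = np.stack([np.kron(un[i], vn[i]) for i in range(len(un))])

    # The unit-norm solution is the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce rank 2, a property every valid fundamental matrix has.
    Uf, Sf, Vtf = np.linalg.svd(F)
    F = Uf @ np.diag([Sf[0], Sf[1], 0.0]) @ Vtf

    # Undo the normalization and rescale to unit norm.
    F = Tu.T @ F @ Tv
    return F / np.linalg.norm(F)
```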
From this fundamental matrix, it is possible to extract the focal lengths of each camera as well as the SE(3) transformation between the two. I’m likely to butcher the details of how exactly to do this, so instead I’ll link the papers that helped me understand it (with a rough sketch of the pose-recovery step after the list).
Longuet-Higgins, H. Christopher. “A computer algorithm for reconstructing a scene from two projections.” Nature 293.5828 (1981): 133-135.
Hartley, Richard. “Extraction of focal lengths from the fundamental matrix.” Unpublished manuscript 2 (1993).
Kanatani, Kenichi, and Chikara Matsunaga. “Closed-form expression for focal lengths from the fundamental matrix.” Proc. 4th Asian Conf. Comput. Vision. Vol. 1. No. 1. 2000.
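As a rough illustration of the second half of that step, here’s what recovering the relative pose can look like once the intrinsics are known (for example, with focal lengths recovered per Hartley’s or Kanatani’s method). Treat the exact conventions here, including which camera the transform maps from and to, as my assumptions rather than a faithful transcription of the papers.

```python
import numpy as np

def decompose_essential(F, K1, K2):
    """Relative pose candidates from F, given intrinsics K1 and K2.

    Uses the convention u_i^T F v_i = 0 from earlier. Returns the four
    (R, t) candidates; the physically correct one is whichever places
    triangulated points in front of both cameras (cheirality check,
    not shown here). Translation is only recovered up to scale.
    """
    # Essential matrix from the fundamental matrix and intrinsics.
    E = K1.T @ F @ K2

    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so the recovered rotations are proper
    # (determinant +1) rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt

    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])

    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]

    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```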
Bundle Adjustment
The next step is to fit the cameras to a larger dataset than just eight points. There will always be some error inherent in the system, and considering a large number of observations gives us the best chance of limiting those errors. Bundle adjustment sounds fancy, but in reality it’s just iterative, gradient-based least squares.
I formulate the camera adjustment process as an optimization problem: the unknowns are the camera parameters and the 3D wand marker locations, and the cost is the reprojection error between the projections predicted by those estimates and the actual observations. The problem is also constrained by the distance between the two markers belonging to the same wand observation, which is fixed and known. I use Levenberg-Marquardt updates to solve this, but there are likely plenty of other approaches that would work; a rough sketch of a single update is below.
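To show the shape of that solver without pretending it’s my actual implementation, here’s a minimal numpy sketch of one Levenberg-Marquardt update on a generic residual vector. The names residual_fn and lm_step are placeholders: in the real problem, residual_fn would stack every reprojection error plus the wand-length constraint terms, and params would pack the camera parameters together with the estimated 3D marker positions.

```python
import numpy as np

def numeric_jacobian(residual_fn, params, eps=1e-6):
    """Forward-difference Jacobian of the residual vector w.r.t. params."""
    r0 = residual_fn(params)
    J = np.zeros((len(r0), len(params)))
    for j in range(len(params)):
        p = params.copy()
        p[j] += eps
        J[:, j] = (residual_fn(p) - r0) / eps
    return J

def lm_step(residual_fn, params, lam):
    """One Levenberg-Marquardt update: solve (J^T J + lam*I) d = -J^T r."""
    r = residual_fn(params)
    J = numeric_jacobian(residual_fn, params)
    A = J.T @ J + lam * np.eye(len(params))
    delta = np.linalg.solve(A, -J.T @ r)
    return params + delta, 0.5 * float(r @ r)
```

The damping factor lam gets adjusted between iterations: raise it when a step increases the cost and lower it when a step is accepted, so the update slides between gradient descent and Gauss-Newton behavior.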
Since all of the data so far is simulated in an ideal environment, the expected final error value should be zero. And, after lots of bug hunting, it is! I’ve linked the paper I used to implement all of this below.
Mitchelson, Joel, and Adrian Hilton. “Wand-based multiple camera studio calibration.” Center Vision, Speech and Signal Process (2003).
Next Steps
While the calibration functionality exists in the codebase, I still need to implement a way to put the server into a “calibration” state so that a calibration can actually be run. This should be pretty boilerplate, all things considered.
After that should come real-world testing! My current planned testing setup will just consist of two Raspberry Pis, two Arducam OV9281 camera modules, and some Ethernet cables to wire everything together. I may need to add some extra wires for camera sync purposes, but overall I’m pretty confident this will give me an accurate (if slow) approximation of the final system.
Until next time!