OpenCV is great for all kinds of computer vision tasks. Many of these can run in a fully automated fashion, where parameters for the CV algorithms are provided by the user before the program begins or can be determined algorithmically at run time. Some, however, cannot.
For example, for my current project I am trying to find the optimal settings for OpenCV’s implementation of the stereo block matching algorithm. This requires computing disparity pictures, examining them visually, and deciding whether the parameters let the block matcher perform well or not. This is fairly subjective work, and it’s really annoying if you have to restart your program in order to see results with other settings or, if you’re using e.g. the Python interpreter, retype your arguments and display the window again. Of course, you could run a loop over all possible parameter combinations, but that makes it hard to experiment.
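To get a feeling for why the exhaustive loop becomes tedious, here's a minimal sketch of such a sweep. The parameter names and ranges are invented for illustration; they stand in for whatever your CV algorithm actually takes:

```python
import itertools

# Hypothetical parameter grid; the names and ranges are placeholders for
# whatever your CV algorithm actually accepts.
search_ranges = range(0, 161, 16)   # 0, 16, ..., 160 -> 11 values
window_sizes = range(5, 22, 2)      # 5, 7, ..., 21   -> 9 values

combinations = list(itertools.product(search_ranges, window_sizes))
print(len(combinations))  # 99 runs to sit through for just two parameters
```

With only two modest parameters you already have 99 disparity images to inspect, and each new parameter multiplies that number.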
In this example, I will show how to implement a GUI in Python that lets you tune settings in your program and recompute values on the fly depending on user inputs in a graphical window.
GUI background: Working with callbacks
If you’ve programmed a standalone program before, you’ve most likely written some kind of user interface. In Python, a great tool for creating command line UIs is argparse. I use it for just about everything because it takes all the work of putting together usage messages and parsing user inputs off your hands.
A command line UI is relatively easy to write even in simple scripts because you can halt the program’s progress until you’ve received the input you need. Graphical interfaces are normally a bit more complex because they don’t halt the program’s progress. Also, receiving the inputs from the program is sometimes more complicated. The principle behind most points of interaction is simple, though: You register a GUI element and pass it a callback so that it knows what to do with the input the user gives it.
A callback is very simple: it's just a function that you pass around without calling it. A very simple example could look like this:
```python
def foo():
    print("You called foo.")

def bar():
    print("You called bar.")

for callback in foo, bar:
    callback()
```
Here I have two functions, foo and bar. I make a tuple out of them containing only the function objects, and then iterate over both items in that tuple, calling each item with no arguments. Since both functions require no arguments, this works, so the output is:
```
You called foo.
You called bar.
```
It’s really that simple.
Of course, the functions could be a little more complicated, and they could also require arguments. It's therefore important to register a callback that is compatible with the element you attach it to – if the element passes an argument, your callback needs to accept an argument, and so on.
Many GUI elements ask for a callback when they’re initialized. They use that somewhere in their own internals. You should know from the documentation how many arguments they pass to the callback and what kind they are. You use this to control what happens when the user interacts with the GUI.
Of course, most of the time you write a callback, you’ll want it to do something more complex than a Hello World. Your function will probably need more information than the GUI element passes to it. There are many ways to provide the missing data: The most common, and probably the most logical, way of doing this is by using object orientation, but other possibilities include using partially applied functions and global variables. In my opinion, global variables are okay, but only if there’s a good reason for having them and they’re used read-only. As much as I think partially applied functions are cool, I have never been in a situation where they were able to solve a problem more neatly than object orientation. Perhaps it’s a question of taste.
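As a sketch of those two approaches (the class and function names here are invented for illustration, not taken from the real code): a bound method carries its object's state into the callback, and functools.partial bakes extra arguments into a plain function so it fits a one-argument calling convention.

```python
import functools

class DisparityViewer(object):
    """Toy stand-in for an object that owns GUI state; not the real tuner."""
    def __init__(self, image_name):
        self.image_name = image_name
        self.value = None

    def on_slider(self, value):
        # Bound method: ``self`` carries in everything beyond the single
        # argument the GUI element passes.
        self.value = value

viewer = DisparityViewer("left.png")
callback = viewer.on_slider      # passed without calling, as above
callback(42)
print(viewer.value)              # 42

# Partial application: bake extra arguments into a plain function so it
# matches the GUI's one-argument calling convention.
results = []

def record_slider(image_name, value):
    results.append((image_name, value))

partial_callback = functools.partial(record_slider, "left.png")
partial_callback(42)
print(results)                   # [('left.png', 42)]
```

Both callbacks can be handed to a GUI element that calls them with a single value, yet both know about the image they belong to.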
This code is from a project to develop a self-calibrating stereo camera. The code I've written so far lets you calibrate a stereo camera in an automated fashion, and you can build one from any two webcams you like. Once the stereo pair has been calibrated, the goal is to use the cameras in conjunction to produce 3D pictures. This will happen in real time, so I'm using the stereo block matching algorithm in OpenCV. It's possible to get good results with this algorithm, but only if you tune its parameters well.
Since originally writing this post, the code has changed a lot and grown more flexible, but I’m leaving the content as is. If you’d like to see the new code, check out the now-full-fledged StereoVision package, more specifically its UI utilities.
What I need is a window that shows the disparity image computed from two input images taken with the calibrated stereo camera, with the possibility of adjusting the algorithm’s three parameters: camera type, disparity search range and block size. I want the image to update automatically when a new parameter is set.
OpenCV provides all the tools needed to do this. I can create a window, show an image in it, and add sliders to it. Also, I can associate a callback with each slider that is called every time the slider is moved. The callback is called with a single argument: the slider’s new value. It looks like this, assuming you’ve already defined the variables I pass in these examples:
```python
# Create window
cv2.namedWindow(window_name)

# Show an image in the window
cv2.imshow(window_name, image)

# Add a slider
cv2.createTrackbar(slider_name, window_name, start_value, max_value, callback)
```
In my application, the result of all this looks like this:
How to make it happen
I'm going to take a step back and explain what my code does in a bit more detail here, so feel free to skip to whatever's interesting for you. If you want to do this yourself, you'll need a stereo camera that you've already calibrated.
Rectifying the stereo image pair
In order to make a 3D image, the two images you take with your stereo camera have to be rectified. This means lining up both images so that the same point in the scene can be found in both pictures by searching along a single line. If your cameras are aligned horizontally, which is the normal case, this will be a horizontal line; otherwise it will be a vertical line.
You can do this using my StereoCalibration class. It works like this:
```python
# This assumes you've already calibrated your camera and have saved the
# calibration files to disk. You can also initialize an empty calibration and
# calculate the calibration, or you can clone another calibration from one in
# memory
calibration = StereoCalibration(input_folder=my_folder)

# Now rectify two images taken with your stereo camera. The function expects
# a tuple of OpenCV Mats, which in Python are numpy arrays
rectified_pair = calibration.rectify((left_image, right_image))
```
Computing the disparity image
Now that you have a rectified image pair, you can compute the disparity between both pictures. A fast algorithm that’s implemented in OpenCV is stereo block matching. You can use it like this:
```python
# Initialize a stereo block matcher. See documentation for possible arguments
block_matcher = cv2.StereoBM()

# Compute disparity image from the left and right images
disparity = block_matcher.compute(rectified_pair[0], rectified_pair[1])

# Show normalized version of image so you can see the values
cv2.imshow(window_name, disparity / 255.)
```
Tuning the block matcher
You’ll notice pretty fast that using the algorithm in its default state will probably give you bad results. By changing the parameters you used to initialize the block matcher and trying out new combinations you can find the optimal parameters for your camera pair.
Although the goal here is to use a GUI to tune the block matcher manually, every GUI needs a good backend that works well on its own – otherwise your code ends up messy. That's why I implemented the GUI and the calibrated pair that you tune separately, keeping the interface apart from the backend's design. You can find the code for both on GitHub.
Here’s the calibrated camera pair:
```python
class CalibratedPair(webcams.StereoPair):
    """
    A stereo pair of calibrated cameras.

    Should be initialized with a context manager to ensure that the camera
    connections are closed properly.
    """
    def __init__(self, devices, calibration,
                 stereo_bm_preset=cv2.STEREO_BM_BASIC_PRESET,
                 search_range=0, window_size=5):
        """
        Initialize cameras.

        ``devices`` is an iterable of the device numbers. If you want to use
        the ``CalibratedPair`` in offline mode, pass None.
        ``calibration`` is a StereoCalibration object.
        ``stereo_bm_preset``, ``search_range`` and ``window_size`` are
        parameters for the ``block_matcher``.
        """
        if devices:
            super(CalibratedPair, self).__init__(devices)
        #: ``StereoCalibration`` object holding the camera pair's calibration.
        self.calibration = calibration
        self._bm_preset = cv2.STEREO_BM_BASIC_PRESET
        self._search_range = 0
        self._window_size = 5
        #: OpenCV camera type for ``block_matcher``
        self.stereo_bm_preset = stereo_bm_preset
        #: Number of disparities for ``block_matcher``
        self.search_range = search_range
        #: Search window size for ``block_matcher``
        self.window_size = window_size
        #: ``cv2.StereoBM`` object for block matching.
        self.block_matcher = cv2.StereoBM(self.stereo_bm_preset,
                                          self.search_range,
                                          self.window_size)

    def get_frames(self):
        """Rectify and return current frames from cameras."""
        frames = super(CalibratedPair, self).get_frames()
        return self.calibration.rectify(frames)

    def compute_disparity(self, pair):
        """
        Compute disparity from image pair (left, right).

        First, convert images to grayscale if needed. Then pass to the
        ``CalibratedPair``'s ``block_matcher`` for stereo matching.

        If you wish to visualize the image, remember to normalize it to 0-255.
        """
        gray = []
        if pair[0].ndim == 3:
            for side in pair:
                gray.append(cv2.cvtColor(side, cv2.COLOR_BGR2GRAY))
        else:
            gray = pair
        return self.block_matcher.compute(gray[0], gray[1])

    @property
    def search_range(self):
        """Number of disparities for ``block_matcher``."""
        return self._search_range

    @search_range.setter
    def search_range(self, value):
        """Set ``search_range`` to multiple of 16, replace ``block_matcher``."""
        if value == 0 or not value % 16:
            self._search_range = value
            self.replace_block_matcher()
        else:
            raise InvalidSearchRange("Search range must be a multiple of 16.")

    @property
    def window_size(self):
        """Search window size."""
        return self._window_size

    @window_size.setter
    def window_size(self, value):
        """Set search window size and update ``block_matcher``."""
        if value > 4 and value < 22 and value % 2:
            self._window_size = value
            self.replace_block_matcher()
        else:
            raise InvalidWindowSize("Window size must be an odd number between "
                                    "5 and 21 (inclusive).")

    @property
    def stereo_bm_preset(self):
        """Stereo BM preset used by ``block_matcher``."""
        return self._bm_preset

    @stereo_bm_preset.setter
    def stereo_bm_preset(self, value):
        """Set stereo BM preset and update ``block_matcher``."""
        if value in (cv2.STEREO_BM_BASIC_PRESET,
                     cv2.STEREO_BM_FISH_EYE_PRESET,
                     cv2.STEREO_BM_NARROW_PRESET):
            self._bm_preset = value
            self.replace_block_matcher()
        else:
            raise InvalidBMPreset("Stereo BM preset must be defined as "
                                  "cv2.STEREO_BM_*_PRESET.")

    def replace_block_matcher(self):
        """Replace ``block_matcher`` with current values."""
        self.block_matcher = cv2.StereoBM(preset=self._bm_preset,
                                          ndisparities=self._search_range,
                                          SADWindowSize=self._window_size)
```
The class inherits from webcams.StereoPair, which is just a class that handles dealing with two webcams as a stereo pair simultaneously. It also does some nice things like letting you take a picture with both cameras simultaneously or show their video streams live. If you use them in a with clause they will clean up after themselves and close the camera connection when they’re done.
All of this functionality is inherited from that class, with one modification that you can see in the method get_frames. This method calls the superclass’ get_frames method, collects the frames it returns, and rectifies them using the StereoCalibration object stored on the CalibratedPair. Because the inherited methods show_frames and show_videos call this method in order to get the frames they show, when you use a CalibratedPair rather than just a normal StereoPair to view camera outputs, you’ll always see the rectified images.
Non-inherited methods include replace_block_matcher, which replaces the block matcher stored on the CalibratedPair object. You’ll also notice several decorators being used in the class to do something that you don’t find in Python code too often: They act as getters and setters. They control access to variables that the class is meant to protect from the user.
The reason for this is that I don’t want the user to pass bad values to the block matcher. If the user passes bad values, OpenCV silently instantiates a new StereoBM object without throwing an error. That would lead to bad results. The decorators allow the object’s fields search_range, window_size and stereo_bm_preset to be accessed as normal variables – all they do is return their counterparts that are named with a leading underscore. Setting them is also done normally, but because I’ve used setter decorators, doing the following:
calibrated_pair.window_size = 5
actually calls the window_size setter method behind the scenes. The setter checks the input value and throws a meaningful error if the value is inappropriate.
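Stripped of the OpenCV specifics, the validate-then-store pattern can be sketched like this (a toy class with invented names, not the real CalibratedPair):

```python
class InvalidWindowSize(Exception):
    pass

class Matcher(object):
    """Minimal sketch of the validate-then-store property pattern."""
    def __init__(self):
        self._window_size = 5

    @property
    def window_size(self):
        # Reading the attribute just returns the underscored counterpart.
        return self._window_size

    @window_size.setter
    def window_size(self, value):
        # Reject values the backend would otherwise accept silently.
        if 5 <= value <= 21 and value % 2:
            self._window_size = value
        else:
            raise InvalidWindowSize("Window size must be an odd number "
                                    "between 5 and 21 (inclusive).")

matcher = Matcher()
matcher.window_size = 7       # ordinary attribute syntax, runs the setter
try:
    matcher.window_size = 6   # even value: raises InvalidWindowSize
except InvalidWindowSize:
    pass
print(matcher.window_size)    # 7 - the bad value was never stored
```

From the caller's point of view it's just attribute assignment; the validation is invisible until a bad value arrives.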
Storing the parameters for the block matcher means that they persist as part of the object, so when I change one variable, I retain the settings of the other variables without a lot of complicated bookkeeping code.
The core of the class, compute_disparity, checks the passed images to see if they are grayscale. If not, they’re converted to grayscale. Then they’re passed to the block matcher.
So if you use this class, you can just pass new parameters for the block matcher again and again, replacing it when you need to, and check the results. If you have an algorithm that can check the quality of disparity images, you don’t need a GUI – the class will work just fine for automated tuning.
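If you did have such a quality metric, automated tuning would just be a loop over the parameter grid. Here's a sketch of that idea with stand-in functions; none of these names come from the real code, and a real scorer would inspect the disparity image itself:

```python
import itertools

def tune_automatically(compute_disparity, score, search_ranges, window_sizes):
    """Try every parameter combination and return the best-scoring one."""
    best_params, best_score = None, float("-inf")
    for search_range, window_size in itertools.product(search_ranges,
                                                       window_sizes):
        disparity = compute_disparity(search_range, window_size)
        quality = score(disparity)
        if quality > best_score:
            best_params, best_score = (search_range, window_size), quality
    return best_params

# Stand-ins for demonstration: a real compute_disparity would come from a
# CalibratedPair-like object, and a real scorer would analyze the image.
def fake_disparity(search_range, window_size):
    return (search_range, window_size)

def fake_score(disparity):
    # Pretend the best parameters are search range 48 and window size 9.
    return -abs(disparity[0] - 48) - abs(disparity[1] - 9)

best = tune_automatically(fake_disparity, fake_score,
                          range(0, 161, 16), range(5, 22, 2))
print(best)  # (48, 9)
```

The GUI described next simply replaces the score function with your eyes.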
Implementing the frontend
Of course, you may not have an algorithm that can judge the quality of disparity pictures. I don't, which is why I'm stuck with my brain. To ease the parameter selection, I implemented a StereoBMTuner class, which hides the details of dealing with cv2's high-level GUI functions.
```python
class StereoBMTuner(object):
    """
    A class for tuning Stereo BM settings.

    Display a normalized disparity picture from two pictures captured with a
    ``CalibratedPair`` and allow the user to manually tune the settings for
    the stereo block matcher.
    """
    #: Window to show results in
    window_name = "Stereo BM Tuner"

    def __init__(self, calibrated_pair, image_pair):
        """Initialize tuner with a ``CalibratedPair`` and tune given pair."""
        #: Calibrated stereo pair to find Stereo BM settings for
        self.calibrated_pair = calibrated_pair
        cv2.namedWindow(self.window_name)
        cv2.createTrackbar("cam_preset", self.window_name,
                           self.calibrated_pair.stereo_bm_preset, 3,
                           self.set_bm_preset)
        cv2.createTrackbar("ndis", self.window_name,
                           self.calibrated_pair.search_range, 160,
                           self.set_search_range)
        cv2.createTrackbar("winsize", self.window_name,
                           self.calibrated_pair.window_size, 21,
                           self.set_window_size)
        #: (left, right) image pair to find disparity between
        self.pair = image_pair
        self.tune_pair(image_pair)

    def set_bm_preset(self, preset):
        """Set ``stereo_bm_preset`` and update disparity image."""
        try:
            self.calibrated_pair.stereo_bm_preset = preset
        except InvalidBMPreset:
            return
        self.update_disparity_map()

    def set_search_range(self, search_range):
        """Set ``search_range`` and update disparity image."""
        try:
            self.calibrated_pair.search_range = search_range
        except InvalidSearchRange:
            return
        self.update_disparity_map()

    def set_window_size(self, window_size):
        """Set ``window_size`` and update disparity image."""
        try:
            self.calibrated_pair.window_size = window_size
        except InvalidWindowSize:
            return
        self.update_disparity_map()

    def update_disparity_map(self):
        """Update disparity map in GUI."""
        disparity = self.calibrated_pair.compute_disparity(self.pair)
        cv2.imshow(self.window_name, disparity / 255.)
        cv2.waitKey()

    def tune_pair(self, pair):
        """Tune a pair of images."""
        self.pair = pair
        self.update_disparity_map()
```
When you instantiate an object of this class, it sets up a named window, creates three trackbars and calls the tune_pair method, which shows the disparity map and the other GUI elements. It’s important to note that the class itself stores the CalibratedPair, as well as the image pair, that it’s currently working on. This is important for the callbacks.
The callbacks registered with the trackbars receive only a single argument: the trackbar's new value. They are called every time the trackbar is moved.
Each callback is pretty simple. It receives the value from the trackbar and tries to use it to set the appropriate field on the StereoBMTuner’s CalibratedPair. If this is successful, the CalibratedPair swaps out its StereoBM object. If the passed value was inappropriate, the error is caught and nothing is done. If another error occurs, the method stops. If no error occurs, the StereoBMTuner calls its method update_disparity_map, which passes the image pair stored on the StereoBMTuner to the CalibratedPair and asks for a new disparity image. The CalibratedPair computes the disparity image with the new parameters and returns it to the StereoBMTuner, which normalizes the image and shows it in the GUI. This continues until the user presses a key on the keyboard, allowing the user to find the optimal parameters for the block matching algorithm for the image pair in question.
Putting it all together
This is all well and good – we have all the building blocks, so now we can use them to make a program. I've implemented the program so that it takes a series of pictures stored in a folder, as well as a calibration folder, and iterates through all of the images, showing their disparity maps and letting the user tune the block matcher. After the user has gone through all the images, they receive a report of which settings they chose and how often. The code looks like this:
```python
def main():
    """Let user tune all images in the input folder and report chosen values."""
    parser = argparse.ArgumentParser(description="Read images taken from a "
                                     "calibrated stereo pair, compute "
                                     "disparity maps from them and show them "
                                     "interactively to the user, allowing the "
                                     "user to tune the stereo block matcher "
                                     "settings in the GUI.")
    parser.add_argument("calibration_folder",
                        help="Directory where calibration files for the "
                             "stereo pair are stored.")
    parser.add_argument("image_folder",
                        help="Directory where input images are stored.")
    args = parser.parse_args()
    calibration = calibrate_stereo.StereoCalibration(
                                        input_folder=args.calibration_folder)
    input_files = find_files(args.image_folder)
    calibrated_pair = CalibratedPair(None, calibration)
    image_pair = [cv2.imread(image) for image in input_files[:2]]
    rectified_pair = calibration.rectify(image_pair)
    tuner = StereoBMTuner(calibrated_pair, rectified_pair)
    chosen_arguments = []
    while input_files:
        image_pair = [cv2.imread(image) for image in input_files[:2]]
        rectified_pair = calibration.rectify(image_pair)
        tuner.tune_pair(rectified_pair)
        chosen_arguments.append((calibrated_pair.stereo_bm_preset,
                                 calibrated_pair.search_range,
                                 calibrated_pair.window_size))
        input_files = input_files[2:]
    stereo_bm_presets, search_ranges, window_sizes = [], [], []
    for preset, search_range, size in chosen_arguments:
        stereo_bm_presets.append(preset)
        search_ranges.append(search_range)
        window_sizes.append(size)
    for name, values in (("Stereo BM presets", stereo_bm_presets),
                         ("Search ranges", search_ranges),
                         ("Window sizes", window_sizes)):
        report_variable(name, values)
        print()
```
As you can see, setting up the GUI is easy because all of the work is encapsulated in the classes explained above. All the main function really does is load the appropriate images, keep track of the parameters the user sets and report them at the end.
It's important to note that the complexity in all this work is pushed down to the most reusable level possible. It would have required a bit less thought to implement all of this as a single script, but it would have been at least as much work and the result would have been horrific. With a little bit of thought, each part of the program can be written fairly abstractly, making it reusable, so in the end, all that has to be implemented is exactly what this use case needs. All of the one-time logic is contained in the main function, and the repeated part of the logic is kept in a local function to keep the code clean.
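The report_variable helper that main calls isn't shown above. A minimal sketch of what it might do – counting how often each value was chosen and printing a summary – could look like this (the real implementation may differ):

```python
from collections import Counter

def report_variable(name, values):
    """Count how often each value was chosen and print a summary.

    Hypothetical sketch of the helper used in ``main``; the real code
    may format its report differently.
    """
    counts = Counter(values)
    print("{}:".format(name))
    for value, count in counts.most_common():
        print("  {} chosen {} time(s)".format(value, count))
    return counts

counts = report_variable("Window sizes", [5, 9, 9, 11, 9])
```

Putting the counting and printing in one helper is what lets main call it three times, once per parameter, without repeating itself.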
So there you have it – a GUI with only OpenCV as a dependency with simple user interaction that sets off complex action deeper within the program.