Super Resolution, Going from 3x to 8x Resolution in OpenCV

Super Resolution, Going from 3x to 8x Resolution in OpenCV

Download Source Code

A few weeks ago I published a tutorial on doing Super-resolution with OpenCV using the DNN module.

I would recommend that you go over that tutorial before reading this one but you can still easily follow along with this tutorial. For those of you who don’t know what Super-resolution is then here is an explanation.

Super Resolution can be defined as the class of Algorithms that upscales an image without losing quality, meaning you take a  low-resolution image like an image of size 224×224 and upscale it to a high-resolution version like 1792×1792 (An 8x resolution)  without any loss in quality. How cool is that?

Anyways that is Super resolution, so how is this different from the normal resizing you do?

When you normally resize or upscale an image you use Nearest Neighbor Interpolation. This just means you expand the pixels of the original image and then fill the gaps by copying the values of the nearest neighboring pixels.

The result is a pixelated version of the image.

There are better interpolation methods for resizing like bilinear or bicubic interpolation which take weighted average of neighboring pixels instead of just copying them.

Still the results are blurry and not great.

The super resolution methods enhance/enlarge the image without the loss of quality, Again, for more details on the theory of super resolution methods, I would recommend that you read my Super Resolution with OpenCV Tutorial.

In the above tutorial I describe several architectural improvements that happened with SR Networks over the years.

But unfortunately in that tutorial, I only showed you guys a single SR model which was good but it only did a 3x resolution. It was also from a 2016 paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network” 

That all changes now, in this tutorial we will work with multiple models, even those that will do 8x resolution.

Today, we won’t be using the DNN module, we could do that but for the super resolution problem OpenCV comes with a special module called dnn_superres which is designed to use 4 different powerful super resolution networks. One of the best things about this module is that It does the required pre and post processing internally, so with only a few lines of code you can do super resolution.

The 4 models we are going to use are:

  • EDSR: Enhanced Deep Residual Network from the paper Enhanced Deep Residual Networks for Single Image Super-Resolution (CVPR 2017) by Bee Lim et al.

  • ESPCN: Efficient Subpixel Convolutional Network from the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network (CVPR 2016) by Wenzhe Shi et al.

  • FSRCNN: Fast Super-Resolution Convolutional Neural Networks from the paper Accelerating the Super-Resolution Convolutional Neural Network (ECCV 2016) by Chao Dong et al.

  • LapSRN: Laplacian Pyramid Super-Resolution Network from the paper Deep Laplacian pyramid networks for fast and accurate super-resolution (CVPR 2017) by Wei-Sheng Lai et al.

Here are the papers for the models and some extra resources.

Make sure to download the zip folder from the download code section above. As you can see by clicking the Download models link that each model has different versions like 3x, 4x etc. This means that the model can perform 3x resolution, 4x resolution of the image, and so on. The download zip that I provide contains only a single version of each of the 4 models above.

You can feel free to test out other models by downloading them. These models should be present in your working directory if you want to use them with the dnn_superres module.

Now the inclusion of this super easy to use dnn_superres module is the result of the work of 2 developers Xavier Weber and Fanny Monori. They developed this module as part of their GSOC (Google summer of code) project. GSOC 2019 also made NVIDIA GPU support possible. 

It’s always amazing to see how a summer project for students by google brings forward some great developers making awesome contributions to the largest Computer Vision library out there.

The dnn_superes module in OpenCV was included in version 4.1.2  for C++ but the python wrappers were added in 4.3 version about a month back, so you have to make sure that you have OpenCV version 4.3 installed. And of course, since this module is included in the contrib module so make sure you have also installed OpenCV contrib package.

Note: You can’t install OpenCV 4.3 version by doing pip install as the latest version here open-contrib-python from pip is still version

So the pypi version of OpenCV is maintained by just one guy named: Olli-Pekka Heinisuo by username: skvark and he updates the pypi OpenCV package in his free time. Currently, he’s facing a compiling issue which is why 4.3 version has not come out as of 7-15-2020. But from what I have read, he will be building the .whl files for 4.3 version soon, it may be out this month. If that happens then I’ll update this post.

So right now the only way you will be able to use this module is if you have installed OpenCV 4.3 from Source. If you haven’t done that then you can easily follow my installation tutorial.

I should also take this moment to highlight the fact you should not always rely on OpenCV’s pypi package, no doubt skvark has been doing a tremendous job maintaining OpenCV’s pypi repo but this issue tells you that you can’t rely on  a single developer’s free time to update the library for production use cases, learn to install the Official library from source. Still, pip install opencv-contrib-python is a huge blessing for people starting out or in early stages of learning OpenCV, so hats off to skvark.

As you might have noticed among the 4 models above we have already learned to use ESPCNN in the previous tutorial, we will use it again but this time with the dnn_superres module.

Super Resolution with dnn_superres Code

Directory Hierarchy

After downloading the zip folder, unzip it and you will have the following directory structure.

This is how our directory structure looks like, it has a Jupyter notebook, a media folder with images and the model folder containing all 4 models.

You can now run the notebook Super_Resolution_with_dnn_superres.ipynb and start executing each cell as follows.

Import Libraries

Start by Importing the required libraries.

Initialize the Super Resolution Object

First you have to create the dnn_superres constructor by the following command.

Read Image

We will start by reading and displaying a sample image. We will be running the EDSR model (with 4x scale) to upscale this image.

Extracting Model Name & Scale

In the next few steps, will be using a setModel() function in which we will pass the model’s name and its scale. We could manually do that but all this information is already present in the model’s pathname so we just need to extract the model’s name and scale using simple text processing.

model name: edsr
model scale: 4

Reading the model

Finally we will read the model, this is where all the required weights of the model gets loaded. This is equivalent to DNN module’s readnet function

Setting Model Name & Scale

Here we are setting the name and scale of the model which we extracted above.

Why do we need to do that ?

So remember when I said that this module does not require us to do preprocessing or postprocessing because it does that internally. So in order to initiate the correct pre and post-processing pipelines, the module needs to know which model we will be using and what version meaning what scale 2x, 3x, 4x etc.

Running the Network

This is where all the magic happens. In this line a forward pass of the network is performed along with required pre and post-processing. We are also making note of the time taken as this information will tell us if the model can be run in real-time or not.

As you can see it takes a lot of time, in fact, EDSR is the most expensive model out of the four in terms of computation.

It should be noted that larger your input image’s resolution is the more time its going to take in this step.

Wall time: 45.1 s

Check the Shapes

We’re also checking the shapes of the original image and the super resolution image. As you can see the model upscaled the image by 4 times.

Shape of Original Image: (262, 347, 3) , Shape of Super Resolution Image: (1200, 1200, 3)

Comparing the Original Image & Result

Finally we will display the original image along with its super resolution version. Observe the difference in Quality.

Save the High Resolution Image

Although you can see the improvement in quality but still you can’t observe the true difference with matplotlib so its recommended that you save the SR image in disk and then look at it.

Creating Functions

Now that we have seen a step by step implementation of the whole pipeline, we’ll create the 2 following python functions so we can use different models on different images by just calling a function and passing some parameters.

Initialization Function: This function will contain parts of the network that will be set once, like loading the model.

Main Function: This function will contain the rest of the code. it will also have the option to either return the image or display it with matplotlib. We can also use this function to process a real-time video.

Initialization Function

Main Function

Set returndata = True when you just want the image. This is usually done when I’m working with videos. I’ve also added a few more optional variables to the method.

print_shape: This variable decides if you want to print out the shape of the model’s output.

name: This is the name by which you will save the image in disk.

save_img: This variable decides if you want to save the images in disk or not.

Now that we have created the initialization function and a main function, lets use all 4 models on different examples

The function above displays the original image along with the SR Image.

Initialize Enhanced Deep Residual Network (EDSR, 4x Resolution)

Run the network

Shape of Original Image: (221, 283, 3) , Shape of Super Resolution Image: (884, 1132, 3)
Wall time: 43.1 s

Initialize Efficient Subpixel Convolutional Network (ESPCN, 4x Resolution)

Run the network

Shape of Original Image: (256, 256, 3) , Shape of Super Resolution Image: (1024, 1024, 3)
Wall time: 295 ms

Initialize Fast Super-Resolution Convolutional Neural Networks (FSRCNN, 3x Resolution)

Run the network

Shape of Original Image: (232, 270, 3) , Shape of Super Resolution Image: (696, 810, 3)
Wall time: 253 ms

Initialize Laplacian Pyramid Super-Resolution Network (LapSRN, 8x Resolution)

Run the network

Shape of Original Image: (302, 357, 3) , Shape of Super Resolution Image: (2416, 2856, 3)
Wall time: 26 s

Applying Super Resolution on Video

Lastly, I’m also providing the code to run Super-resolution on Videos. Although the example video I’ve used sucks, but that’s the only video I tested on primarily because I’m only interested in doing super resolution on images as this is where most of my use cases lie. Feel free to test out different models for real-time feed.

Tip: You might also want to save the High res video in disk using the VideoWriter Class.


Here’s a chart for benchmarks using a 768×512 image with 4x resolution on an Intel i7-9700K CPU for all models.

The benchmark shows PSNR (Peak signal to noise ratio) and SSIM (structural similarity index measure) scores, these are the scores which measure how good the supre res network’s output is.

The best performing model is EDSR but it has the slowest inference time, the rest of the models can work in real time.

For detailed benchmarks you can see this page.  Also make sure to check Official OpenCV contrib page on dnn_superres module

If you thought upscaling to 8x resolution was cool then take a guess on the scaling ability of the current state of the Art algorithm in super-resolution?

So believe it or not the state of the art in SR can actually do a 64x resolution…yes 64x, that wasn’t a typo.

In fact, the model that does 64x was published just last month, here’s the paper for that model, here’s the GitHub repo and here is a ready to run colab notebook to test out the code. Also here’s a video demo of it. It’s pretty rare that such good stuff is easily accessible for programmers just a month after publication so make sure to check it out.

The model is far too complex to explain in this post but the authors took a totally different approach, instead of using supervised learning they used self-supervised learning. (This seems to be on the rise).

What’s Next?

computer vision

If you want to go forward from here and learn more advanced things and go into more detail, understand theory and code of different algorithms then be sure to check out our Computer Vision & Image Processing with Python Course (Urdu/Hindi). In this course, I go into a lot of detail regarding vision fundamentals and cover a plethora of algorithms and techniques to help you master Computer Vision.

If you want to start a career in Computer Vision & Artificial Intelligence then this course is for you. One of the best things about this course is that the video lectures are in Urdu/Hindi Language without any compromise on quality, so there is a personal/local touch to it.


In today’s tutorial we learned to use 4 different architectures to do Super resolution going from 3x to 8x resolution. 

Since the library handles preprocessing and postprocessing, so the code for all the models was almost the same and pretty short.

As I mentioned earlier, I only showed you results of a single version of each model, you should go ahead and try other versions of each model.

These models have been trained using DIV2K  BSDS and General100 datasets which contains images of diverse objects but the best results from a super-resolution model is obtained by training them for a domain-specific task, for e.g if you want the SR model to perform best on pedestrians then your dataset should consist mostly of pedestrian images. The best part about training SR networks is that you don’t need to spend hours doing manual annotation, you can just resize them and you’re all set.

Also I would raise a concern regarding these models that we must be careful using SR networks, for e.g. consider this scenario:

 You caught an image of a thief stealing your mail on your low res front door cam, the image looks blurry and you can’t make out who’s in the image.

Now you being a Computer Vision enthusiast thought of running a super res network to get a clearer picture.
After running the network, you get a much clearer image and you can almost swear that it’s Joe from the next block.

The same Joe that you thought was a friend of yours.

The same Joe that made different poses to help you create a pedestrian datasets for that SR network you’re using right now.

How could Joe do this?

Now you feel betrayed but yet you feel really Smart, you solved a crime with AI right?

You Start STORMING to Joe’s house to confront him with PROOF.

Now hold on! … like really hold on.

Don’t do that, seriously don’t do that.

Why did I go on a rant like that?

Well to be honest back when I initially learned about SR networks that’s almost exactly what I thought I would do. Solve Crimes by AI by doing just that (I know it was a ridiculous idea). But soon I realize that SR networks only learn to hallucinate data based on learned data, they can’t visualize a face with 100% accuracy that they’ve never seen. It’s still pretty useful but you have to use this technology carefully.

I hope you enjoyed this tutorial, feel free to comment below and I’ll gladly reply.

Emotion / Facial Expression Recognition with OpenCV.

Emotion / Facial Expression Recognition with OpenCV.

Download Source Code

Facial Recognition

A few weeks ago we learned how to do Super-Resolution using OpenCV’s DNN module, in today’s post we will perform Facial Expression Recognition AKA Emotion Recognition using the DNN module. Although the term emotion recognition is technically incorrect (I will explain why) for this problem but for the remainder of this post I’ll be using both of these terms, since emotion recognition is short and also good for SEO since people still search for emotion recognition while looking for facial expression recognition xD.

The post is structured in the following way:

  • First I will define Emotion Recognition & its importance.
  • Then I will discuss different approaches to tackle this problem.
  • Finally, we will Implement an Emotion Recognition pipeline using OpenCV’s DNN module. 

Emotion Recognition Or Facial Expression Recognition

Now let me start by clarifying what I meant when I said this problem is incorrectly quoted as Emotion recognition. So you see by saying that you’re doing emotion recognition you’re implying that you’re actually finding the emotion of a person whereas in a typical AI-based emotion recognition system you’ll find around and the one that we’re gonna built looks only at a single image of a person’s face to determine the emotion of that person. Now, in reality, our expression may at times exhibit what we feel but not always. People may smile for a picture or someone may have a face that inherently looks gloomy & sad but that doesn’t represent the person’s emotion. 

So If we were to build a system that actually recognizes the emotions of a person then we need to do more than look at a simple face image. We would also consider the body language of a person through a series of frames, so the network would be a combination of an LSTM & a CNN network. Also for a more robust system, we may also incorporate a voice tone recognition AI as the tone of a voice, and speech patterns tell a lot about the person’s feelings.

Watch this part of the interview of Lisa Feldman Barret who debunks these so-called Emotion recognition systems.

Since today we’ll only be looking at a single face image so it’s better to call our task Facial Expression Recognition rather than Emotion recognition.

Facial Expression Recognition Applications:

Monitoring facial expressions of several people over a period of time provides great insights if used carefully, so for this reason we can use this technology in the following applications.

1: Smart Music players that play music according to your mood:

Think about it, you come home after having a really bad day, you lie down on the bed looking really sad & gloomy and then suddenly just the right music plays to lift up your mood.

2: Student Mood Monitoring System:

Now a system that cleverly averages the expressions of multiple students over a period of time can get an estimate of how a particular topic or teacher is impacting students, does the topic being taught stresses out the students, is a particular session from a teacher a joyful experience for students. 

3: Smart Advertisement Banners:

Think about smart advertisement banners that have a camera attached to it, when a commercial airs, it checks real-time facial expressions of people consuming that ad and informing the advertiser if the ad had the desired effect or not. Similarly, companies can get feedback if customers liked their products or not without even asking them.

Also, check out this video in which the performance of a new Ice Cream flavor is tested on people using their expressions.

These are just some of the applications from top of my head, if you start thinking about it you can come up with more use cases. One thing to remember is that you have to be really careful as how you use this technology. Use it as an assistive tool and do not completely rely on it. For e.g don’t deploy on Airport and start interrogating every other black guy who triggers Angry expressions on the system for a couple of frames.

Facial Expression Recognition Approaches:

So let’s talk about the ways we could go about recognizing someone’s facial expressions. We will look at some classical approaches first then move on to deep learning.

Haar Cascades based Recognition:

Perhaps the oldest method that could work are Haar Cascades. So essentially these Haar Cascades also called viola jones Classifier is an outdated Object detection technique by Paul Viola and Michael Jones in 2001. It is a machine learning-based approach where a cascade is trained from a lot of positive and negative images. It is then used to detect objects in images.

The most popular use of these cascades is as a face detector which is still used today, although there are better methods available. 

Now instead of using face detection, we could train a cascade to detect expressions. Since you can only train a single class with a cascade so you’ll need multiple cascades. A better way to go about is to first perform face detection then look for different features inside the face ROI, like detecting a smile with this smile detection cascade. You can also train a frown detector and so on.

Truth be told, this method is so weak that I wouldn’t even try experimenting with this in this time and era but since people have used this in the past so I’m just putting it there.

Fisher, Eigen & LBPH based Recognition:

OpenCV’s built-in face_recognition module has 3 different face recognition algorithms, Eigenfaces face recognizer,  Fisherfaces face recognizer and Local binary patterns histograms (LBPH) Face Recognizer.

If you’re wondering why am I mentioning face recognition algorithms on a facial expression recognition post, So understand this,  these algorithms can extract some really interesting features like principal components and local histograms which you can then feed into an ML classifier like SVM, so in theory, you can repurpose them for emotion recognition, only this time the target classes are not the identities of people but some facial expressions. This will work best if you have a few classes, ideally 2-3. I haven’t seen many people work on emotion like this but take a look at this post in which a guy uses Fisher faces for facial expression recognition.

Again I would mention this is not a robust approach, but would work better than the previous one.

Histogram Oriented Gradients based Recognition (HOG):

Now similar to the above approach instead of using the face_recognizer module to extract features you can extract HOG features of faces, HOG based features are really effective. After extracting HOG features you can train an SVM or any other Machine learning classifier on top of it.

Custom Features with Landmark Detection:

One of the easiest and effective ways to create an emotion recognition system is to use a landmark detector like the one in dlib which allows you to detect 68 important landmarks on the face.

By using this detector you can extract facial features like eyes, eyebrows, mouth, etc. Now you can take custom measurements of these features like measuring the distance between the lip ends to detect if the person is smiling or not. Similarly, you can measure if the eyes are wide open or not, indicating surprise or shock.

Now there are two ways to go about it, either you can send these custom measurements to an ML classifier and let it learn to predict emotions based on these measurements or you use your own heuristics to determine when to call it happy, sad etc based on the measurements.

I do think the former approach is more effective than the latter. But if you’re just determining a singular emotion like if a person is smiling or not then it’s easier to use heuristics.

Deep Learning based Recognizer:

It should not come as a surprise that the State of the Art approach to detect emotions would be a deep learning-based approach. Let me explain how you would create a simple yet effective emotion recognizer system. So what you would simply do is train a Convolutional Neural Network (CNN) on different facial expression images (Ideally thousands of images for each class/emotion) and after the training showed it new samples and if done right it would perform better than all the above approaches I’ve mentioned.

Now that we have discussed different approaches, let’s move on to the coding part of the blog. 

Facial Expression Recognition in OpenCV

We will be using a deep learning classifier that will be loaded to the OpenCV DNN module. The authors trained this model using MS Cognitive Toolkit (formerly CNTK) and then converted this model to ONNX (Open neural network exchange ) format.

ONNX format allows developers to move models between different frameworks such as CNTK, Caffe2, Tensorflow, PyTorch etc.

There is also a javascript version of this model (version 1.2) with a live demo which you can check out here. In this post we will be using version 1.3 which has a better performance.

You can also look at the original source code used to train this model here, the authors explained the architectural details of their model in their research paper titled Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution.

In the paper, the authors demonstrate training a deep CNN using 4 different approaches: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. The model that we are going to use today was trained using cross-entropy loss, which according to the author’s conclusion was one of the best performing models.

The model was trained on FER+ dataset,  FER dataset was the standard dataset for emotion recognition task but in FER+ each image has been labeled by 10 crowd-sourced taggers, which provides a better quality of ground truth label for still image emotion than the original FER labels.

More information about the ONNX version of the model can be found here.

The input to our emotion recognition model is a grayscale image of 64×64 resolution. The output is the probabilities of 8 emotion classes: neutral, happiness, surprise, sadness, anger, disgust, fear, and contempt.

Here’s the architecture of the model.

Here are the steps we would need to perform:

  1.  Initialize the Dnn module.
  2. Read the image.
  3. Detect faces in the image.
  4. Pre-process all the faces.
  5. Run a forward pass on all the faces.
  6. Get the predicted emotion scores and convert them to probabilities.
  7. Finally get the emotion corresponding to the highest probability

Make sure you have the following Libraries Installed.

  • OpenCV ( possibly Version 4.0 or above)
  • Numpy
  • Matplotlib
  • bleedfacedetector

Bleedfacedetector is my face detection library which can detect faces using 4 different algorithms. You can read more about the library here.

You can install it by doing:

pip install bleedfacedetector

Before installing bleedfacedetector make sure you have OpenCV & Dlib installed.

pip install opencv-contrib-python

To install dlib you can do:

pip install dlib
pip install dlib==19.8.1

Directory Hierarchy

You can go ahead and download the source code from the download code section. After downloading the zip folder, unzip it and you will have the following directory structure.

You can now run the Jupyter notebook Facial Expression Recognition.ipynb and start executing each cell as follows.

Import Libraries

Initialize DNN Module

To use Models in ONNX format, you just have to use cv2.dnn.readNetFromONNX(model) and pass the model inside this function.

Read Image

This is our image on which we are going to perform emotion recognition.

  • Line 2: We’re reading the image form disk.
  • Line 5-6 : We’re setting the figure size and showing the image with matplotlib, [:,:,::-1] means to reverse image channels so we can show OpenCV BGR images properly in matplotlib. OpenCV BGR images.

Define the available classes / labels

Now we will create a list of all 8 available emotions that we need to detect.

Detect faces in the image

The next step is to detect all the faces in the image, since our target image only contains a single face so we will extract the first face we find. 

Line 4: We’re using an SSD based face detector with 20% filter confidence to detect faces, you can easily swap this detector with any other detector inside bleedfacedetector by just changing this line.

Line 7: We’re extracting the x,y,w,h coordinates from the first face we found in the list of faces.

Line 10-13: We’re padding the face by a value of 3, now this expands the face ROI boundaries, this way the model takes a look at a larger face image when predicting. I’ve seen this improves results in a lot of cases, Although this is not required.

Padded Vs Non Padded Face

Here you can see what the final face ROI looks like when it’s padded and when it’s not padded.

Pre-Processing Image

Before you pass the image to a neural network you perform some image processing to get the image in the right format. So the first thing we need to do is convert the face from BGR to Grayscale then we’ll resize the image to be of size 64x64. This is the size that our network requires. After that we’ll reshape the face image into (1, 1, 64, 64), this is the final format which the network will accept.

Line 2: Convert the padded face into GrayScale image
Line 5: Resize the GrayScale image into 64 x 64
Line 8: Finally we are reshaping the image into the required format for our model

Input the preprocessed Image to the Network

Forward Pass

Most of the Computations will take place in this step, This is the step where the image goes through the whole neural network.

Check the output

As you can see, the model outputs scores for each emotion class.

Shape of Output: (1, 8)
[[ 0.59999390 -0.05662632 7.5.22 -3.5109.508 -0.33268.593 -3.967.581.5 9.2001578 -3.1812003 ]]

Apply Softmax function to get probabilities:

We will convert the model scores to class probabilities between 0-1 by applying a Softmax function on it.

[9.1010029e-04 4.7197891e-04 9.6490067e-01 1.491846e-05
3.5819356e-04 9.4487186e-06 3.3313509e-02 2.1165248e-05]

Get Predicted emotion

Predicted Emotion is: Surprise

Display Final Result

We already have the correct prediction from the last step but to make it more cleaner we will display the final image with the predicted emotion, we will also draw a bounding box over the detected face.

Creating Functions

Now that we have seen a step by step implementation of the network, we’ll create the 2 following python functions.

Initialization Function: This function will contain parts of the network that will be set once, like loading the model.

Main Function: This function will contain all the rest of the code from preprocessing to postprocessing, it will also have the option to either return the image or display it with matplotlib.

Furthermore, the Main Function will be able to predict the emotions of multiple people in a single image, as we will be doing all the operations in a loop.

Initialization Function

Main Function

Set returndata = True when you just want the image. I usually do this when working with videos.

Initialize the Emotion Recognition

Call the initialization function once.

Calling the main function

Now pass in any image to the main function 

Real time emotion recognition on Video:

You can also take the above main function that we created and put it inside a loop and it will start detecting facial expressions on a video, below code detects emotions using your webcam in real time. Make sure to set returndata = True


Here’s the confusion matrix of the model from the author’s paper. As you can see this model is not good at predicting Disgust, Fear & Contempt classes.

You can try running the model on different images and you’ll also agree with the above matrix, that the last three classes are pretty difficult to predict, particularly because It’s also hard for us to differentiate between these many emotions based on just facial expression, a lot of micro expressions overlap between these classes and so it’s understandable why the algorithm would have a hard time differentiating between 8 different emotional expressions.

Improvement Suggestions:

Still, if you really want to detect some expressions that the model seems to fail on then the best way to go about is to train the model yourself on your own data. Ethnicity & color can make a lot of difference. Also, try removing some emotion classes so the model can focus only on those that you care about.

You can also try changing the padding value, this seems to help in some cases.

If you’re working on a live video feed then try to average the results of several frames instead of giving a new result on every new frame. 

What’s Next?

computer vision

If you want to go forward from here and learn more advanced things and go into more detail, understand theory and code of different algorithms then be sure to check out our Computer Vision & Image Processing with Python Course (Urdu/Hindi). In this course, I go into a lot of detail regarding vision fundamentals and cover a plethora of algorithms and techniques to help you master Computer Vision.

If you want to start a career in Computer Vision & Artificial Intelligence then this course is for you. One of the best things about this course is that the video lectures are in Urdu/Hindi Language without any compromise on quality, so there is a personal/local touch to it.


In this tutorial we first learned about the Emotion Recognition problem, why it’s important, and what are the different approaches we could take to develop such systems.

Then we learned to perform emotion recognition using OpenCV’s DNN module. After that, we went over some ways on how to improve our results.

I hope you enjoyed this tutorial. If you have any questions regarding this post then please feel free to comment below and I’ll gladly answer them.

Checkout Bleed AI Premium Subscribers. This will give you access to Graded Quizzes, Premium Colab Notebooks, Priority Support, Course discounts & Practice Assignments You can Sign Up for the membership here. It’s Free.

Computer Vision Crash Course with OpenCV & Python

Computer Vision Crash Course with OpenCV & Python

Download Source Code

computer vision

If you’re looking for a single stand-alone Tutorial that will give you a good overall taste of the exciting field of Computer Vision using OpenCV then this is it. This Tutorial will serve as a Crash Course to learn the basics of OpenCV Library. 

What is OpenCV:

OpenCV (Open Source Computer Vision ) is the biggest library for Computer Vision which contains more than 2500 optimized algorithms that can be used to do face detection, action recognition, image stitching, extracting 3d models, generating point clouds, augmented reality and a lot more.

So if you’re planning to perform Computer Vision weather on a deep learning project or on a Raspberry pie or you want to make a career in it then at some point you will definitely cross paths with this library. So it’s better that you get started with it today.

About this Crash Course Course:

Since I’m a member of the Official OpenCV.org Course team and this blog Bleed AI is all about making you master Computer Vision, so I feel I’m in a very good position to teach you about this library and that too in a single post.

Of Course, we won’t be able to cover a whole lot as I said it contains over 2500 algorithms, still, after going through this course you will be able to get a grip on fundamentals and built some interesting things.

Prerequisite: To follow along with this course it’s important that you are familiar with Python language & you have python installed in your system.

Make sure to download the Source code above to try out the code.

Let’s get Started

Crash Course Outline:

Here’s the outline for this course.

Installing OpenCV-python:

The easiest way to install OpenCV is by using a package manager like e.g. with pip.

So you can just Open Up the command prompt and run the following command:

pip install opencv-contrib-python

By doing the above, you will install opencv along with its contrib package which contains some extra algorithms. If you don’t need the extra algorithms then you can also run the following command:

pip install opencv-python

Make sure to install Only one of the above packages, not both. There are also some headless versions of OpenCV which do not contain any GUI functions, you can look those here.

The other Method to install OpenCV is installing it from the source. Now installing from the source has its perks but it’s much harder and I recommend only people who have prior experience with OpenCV attempting this. You can look at my tutorial of installing from the source here.

Note: Before you can install OpenCV, you must have numpy library installed on your system. You can install numpy by doing:

pip install numpy

After Installing OpenCV you should check your installation by opening up the command prompt or anaconda prompt, launching python interpreter by typing python and then importing OpenCV by doing: import cv2

Reading & Displaying an Image:

After installing OpenCV we will see how we can read & display an image in OpenCV. You can start by running the following cell in the jupyter notebook you downloaded from the source code section.

In OpenCV you can read an image by using the cv2.imread() function. 

image = cv2.imread(filename, [flag])

Note: The Square brackets i.e. [ ] are used to denote optional parameters


  • filename: Name of the image or path to the image.
  • flag: There are numerous flags but three most important ones are these: 0,1,-1

If you pass in 1 the image is read in Color, if 0 is passed the image is read in Grayscale (Black & white) and if -1 is used to read transparent images which we will learn in the next chapter, If no parameter is passed the image will be read in Color.

Line 1-5: Importing Opencv and numpy library.
Line 8: We are reading our image in grayscale, this function will read the image in a numpy array format.
Line 11: We are printing our image.


Now just by looking at the above output, you can get a lot of information about the image we used.

Take a guess on what’s written in the image

Go ahead …I’ll wait.

If you guessed the number 2 then congrats you’re right. In fact, there is a lot more information that you can extract from the above output. For e.g, I can tell that the size of the image is (21x22). I can also say that number 2 is written in white on a black image and is written in the middle of the image.

How was I able to get all that…especially considering I’m no Sherlock?

The size of the image can easily be extracted by counting the number of rows & columns. And since we are working with a single channel grayscale image, the values in the image represent the intensity of the image, meaning 0 represents black and 255 white color, and all the numbers between 0 and 255 are different shades of gray.

You can look at the colormap below of a Grayscale image to understand it better.

Beside’s counting the rows and columns, you can just use the shape function on a numpy array to find its shape


(21, 22)

The values returned are in rows, columns or you can call it height, width, or x,y. If we were dealing with a color image then img.shape would have returned height, width, channels.

Now it’s not ideal to print images, especially when they are 100s of pixels in width and height, so let’s see how we can show images with OpenCV.

To display an Image there are generally 3 steps. There are generally 3 steps involved in displaying an image properly.

Step 1: Use cv2.imshow() to show images.

cv2.imshow(Window_Name, img)


  • Window_Name: Any custom name you  assign to your window
  • img: Your image either be in uint8 datatype or float datatype having range 0-1.

Step 2: Also with cv2.imshow() you will have to use the cv2.waitKey() function. This function is a keyboard binding function. Its argument is the time in milliseconds. The function waits for specified milliseconds. If you press any key in that time frame, the program continues. If 0 is passed, it waits indefinitely for a keystroke. This function returns the ASCII value of the keyboard key pressed, for e.g. if you press ESC key then it will return 27 which is the ASCII value for the ESC key. For now, we won’t be using this returned value.


Note: The default delay is 0 which means wait forever until the user presses a key.

Step 3: The last step is to destroy the window we created so the program can end, now this is not required to view the image but if you don’t destroy the window then you can’t proceed to end the program and it can crash, so to destroy the windows you will do:


This will destroy all present image windows, there is also a function to destroy a specific window.
Now let’s see the full code in action.

Line 5: I’m resizing the image by 1000%  or by 10 times in both x and y direction using the function cv2.resize() since the original size of the image is too small. I will later discuss this function.
Line 8-14:  Showing the image and waiting for a keypress. Destroying the image when there is a keypress.


Accessing & Modifying Image Pixels and ROI:

For this example, I will be reading this image which is from one of my favorite Anime series.

You can access individual pixels of the image and modify them. Now before we get into that lets understand how an image is represented in OpenCV. We already know it’s a numpy array. But besides that you can find out other properties of the image.


The data type of the Image is: uint8
The dimensions of the Image is: 3

So the datatype of images you read with OpenCV is uint8 and if its a color image then it’s a 3-dimensional image.
Let’s talk about these dimensions. First 2 are the width and the height and the 3rd are the image channels. Now these are B (blue), G (green), & R (red) channels. In OpenCV due to historical reasons, colored images are stored in BGR instead of the common RGB format.

You can access any individual pixel value bypassing its x,y location in the image.


[143 161 168]

The tuple output above means that at location (300,300) the value for the blue channel is 143, the green channel is 161 and the red channel is 168.

Just like we can read individual image pixels, we can modify them too.

I’ve just made the pixel at location (300,300) black. Because I’ve only modified a single pixel the change is really hard to see. So now we will modify an ROI (Region of Interest) of the image so that we can see our changes.

Modifying a whole ROI is pretty similar to modifying a pixel, you just need to select a range instead of a single value.

Line 1-2: We are making a copy of the image so we don’t end up modifying the original.

Line 4-5: We are setting all pixels in x range 100-150 and in y range 80-120 equal to 0 (meaning black). Now, this should give us a black box in the image.

Line 8-10 :  Showing the image and waiting for a keypress. Destroying the image when there is a keypress.


Resizing an Image:

You can use cv2.resize() function to resize an image. You have 2 ways of resizing the image, either by passing in the new width & height values or by passing in percentages of those values using fx and fy params. We have already seen how we used the second method to resize the image to 10x its size so now I’m going to show you the first method.

You can see below both the original and the resized version of the image.


An obvious problem with the above approach you can see is that it’s not maintaining the aspect ratio of the image which is why the image looks distorted to you. A better approach would be to resize a single dimension at a time and shrink or expand the other dimension accordingly.

Resizing While Keeping the Aspect Ratio Constant:

So let’s resize the image while keeping the aspect ratio constant. This time we are going to resize the width to 300 and modify the height respectively.

Line 4-5: We are extracting the shape of the image, [:-1] indicates that I don’t want channels returned, just height and width. This ensures your code works for both color and grayscale images.

Line 7-11: We are calculating the ratio of the new width to the old width and then multiplying this ratio value with the height for getting the new value of the height. The logic behind is this, if we resized a 600 px width image to 300 px width then we would get a ratio of 0.5 and if the height was 200 px then by multiplying 0.5 with the height, we would get a new value of 100 px, by using these new values we won’t get any distortions.


Geometric Transformations:

You can apply different kinds of transformations to the image, now there are some complex transformations but for this post, I will only be discussing translation & rotation. Both of these are types of Affine transformations. So we will be using a function called warpAffine() to transform them.

transformed_ image = cv2.warpAffine(src, M, dsize[, dst[, flags[, borderMode[, borderValue]]]])


  • src: input image.
  • M: 2×3 transformation matrix.
  • dsize: size of the output image.
  • flags: combination of interpolation methods.
  • borderMode: pixel extrapolation method (see BorderTypes) by default its constant border.
  • borderValue: value used in case of a constant border; by default, it is 0, which means replaced values will be black.

Now you pass in a 2×3 Matrix into the warpaffine function which does the required transformation, the first two 2 columns of the matrix control, rotation, scale and shear, and the last column encodes translation (shift) of image.

Again, we will only focus on translation and rotation in this post.


Translation is the shifting of an object’s location, meaning the movement of image in x, y-direction. Suppose you want the image to move tx amount of pixels in the x-direction and ty amount of pixels in y-direction then you will construct below transformation matrix accordingly and pass it into the warpAffine function.

So now you just need to change the tx and ty values here for translation in x and y direction.

Line 6-10: We’re constructing the translation matrix so we move 120 px in x-direction and 40 px in the negative y-direction.



Similarly, we can also rotate an image, by passing a matrix into the warpaffine function. Now instead of designing a matrix for rotation, I’m going to use a built-in function called cv2.getRotationMatrix2D() which will return a rotation matrix according to our specifications.

M = cv2.getRotationMatrix2D( center, angle, scale )


  • center: This is the center of the rotation in the source image.
  • Angle: The Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the – top-left corner).
  • Scale: scaling factor.


Note: If you don’t like the black pixels that appear after translation or rotation then you can use a different border filling method, look at the available borderModes here.

Drawing on Image:

Let’s take a look at some drawing operations in OpenCV. So now we will learn how to draw a line, a circle, and a rectangle on an image. We will also learn to put text on the image. Since each drawing function will modify the image so we will be working on copies of the original image. We can easily make a copy of image by doing: img.copy()

Most of the drawing functions have below parameters in common.

  • img : Your Input Image
  • color : Color of the shape for a BGR image, pass it as a tuple i.e. (255,0,0), for Grayscale image just pass a single scalar value from 0-255.
  • thickness : Thickness of the line or circle etc. If -1 is passed for closed figures like circles, it will fill the shape. default thickness = 1
  • lineType : Type of line, popular choice is cv2.LINE_AA .

Drawing a Line:

We can draw a line in OpenCV by using the function cv2.line(). We know from basic geometry you can draw a line, you just need 2 points. So you’ll pass in coordinates of 2 points in this function.

cv2.line(img, pt1, pt2, color, [thickness])


  • pt1: First point of the line, this is a tuple of (x1,y1) point.
  • pt2: Second point of the line, this is a tuple of (x2,y2) point.


Drawing a Circle

We can draw a Circle in OpenCV by using the function cv2.Circle(). For drawing a circle we just need a center point and a radius.

cv2.circle(img, Center, radius, color, [thickness])


  • center: Center of the circle.
  • radius: Radius of the circle.


Drawing a Rectangle

We can also draw a rectangle in OpenCV by using the function cv2.rectangle(). You just have to pass two corners of a rectangle to draw it. It’s similar to the cv2.line function.

cv2.rectangle(img, pt1, pt2, color, [thickness])


  • pt1: Top left corner of the rectangle.
  • pt2: bottom right corner of the rectangle.


Putting Text:

Finally, we can also write Text by using the function cv2.putText(). Writing Text on images is an essential tool, you will be able to see real-time stats on image instead of just printing. This is really handy when you’re working with videos.

cv2.putText(img, text, origin, fontFace, fontScale, color, [thickness])


  • text: Text string to be drawn.
  • origin: Top-left corner of the text (x,y) origin position.
  • fontFace: Font type, we will use cv2.FONT_HERSHEY_SIMPLEX.
  • fontScale: Font scale, how large your text will be.


Cropping an Image:

We can also crop or slice an image, meaning we can extract any specific area of the image using its coordinates, the only condition is that it must be a rectangular area. You can segment irregular parts of images but the image is always stored as a rectangular object in the computer, this should not be a surprise since we already know that images are matrices. Now let’s say we wanted to crop naruto’s face then we would need four values namely X1 (lower bound on the x-axis),  X2 (Upper bound on y-axis), Y1 (lower bound on the y-axis) and Y2 (Upper bound on the y-axis)

After getting these values, you will pass them in like below.

face_roi = img [Start X : End X, Start Y: End Y]

Lets see the full script

Line 2: We are passing in the coordinates to crop naruto’s face, you can get these coordinates by several methods, some are of them are: by trial and error, by hovering the mouse over the image when using matplotlib notebook magic command, or by hovering over the image when you have installed OpenCV with QT support or by making a mouse click function that splits x,y coordinates. 


Note: If you’re gonna modify the cropped ROI, then it’s better to make a copy of it, otherwise modifying the cropped version would also affect the original. 

You can make a copy like this:

face_roi  = img[100:270,300:450].copy()

Image Smoothing/Blurring:

Smoothing or blurring an Image in OpenCV is really easy. If you’re thinking about why we would need to blur an image then understand that It’s very common to blur/smooth an image in vision applications, this reduces noise in the image. The noise can be present due to various factors like maybe the sensor by which you took the picture was corrupted or it malfunctioned, or environmental factors like the lightning was poor, etc. Now there are different types of blurring to deal with different types of noises and I have discussed each method in detail and even done a comparison between them inside our Computer Vision Course but for now, we will briefly look at just one method, the Gaussian Blurring method. This is the most common image smoothing technique used. It gets rid of Gaussian Noise. In simple words, this will work most of the time. 

Smoothed_image  = cv2.GaussianBlur(source-image, ksize, sigmaX)


  • source-image:  Our input image
  • ksize: Gaussian kernel size. kernel width and height can differ but they both must be positive and odd. 
  • sigmaX: Gaussian kernel standard deviation in X direction.

Again to keep this short, I won’t be getting into the math nor the parameter details for how this function works, although it’s really interesting. One thing you need to learn is that by controlling the kernel size you control the level of smoothing done. There is also a SigmaX and a SigmaY parameter that you can control.


Thresholding an Image:

For this section, we will be using this image.

There are times when we need a binary black & white mask of the image, where our target object is in white and the background black or vice versa. The easiest way to get a mask of our image is to threshold our image. There are different types of thresholding methods, I’ve introduced most of them in our Computer Vision course but for now, we are going to discuss the most basic and most used one. So what thresholding does is that it checks each pixel in the image against a threshold value and If the pixel value is smaller than the threshold value, it is set to 0, otherwise, it is set to the maximum value, (this maximum value is usually 255 so white color).

ret, thresholded_image = cv2.threshold(source_image, thresh, max_value, type)


  • Source_image: This is your input image.
  • thresh: Threshold value. (If you use THRESH_BINARY then all values above this are set to max_value.)
  • max_value: Maximum value, normally this is set to be 255.
  • type: Thresholding type. The most common types are THRESH_BINARY  & THRESH_BINARY_INV
  • ret: Boolean variable which tells us if thresholding was successful or not.

Now before you can threshold an image you need to convert the image into grayscale, now you could have loaded the image in grayscale but since we have a color image already we can convert to grayscale using cv2.cvtColor function. This function can be used to convert one color to different color formats for this post we are only concerned with the grayscale conversion.


Now that we have a grayscale image, we can apply our threshold.

Line 2: We are applying a threshold such that all pixels having an intensity above 220 are converted to 255 and all pixels below 220 become 0.


Now Let’s see the result of the inverted threshold, which just reverses the results above. For this you just need to pass in cv2.THRESH_BINARY_INV instead of cv2.THRESH_BINARY.


Edge Detection:

Now we will take a look at edge detection, why edge detection? Well edges encode the structure of the images and it encodes most of the information in the images so for this reason edge detection is an integral part of many Vision applications.

In OpenCV there are edge detectors such as Sobel filters and laplacian filters but the most effective is the Canny Edge detector. In our Computer Vision Course I go into detail of exactly how this detector works but for now let’s take a look at its implementation in OpenCV.

edges = cv2.Canny( image, threshold1, threshold2)


  • image: This is our input image.
  • edges: output edge map; single channels 8-bit image, which has the same size as image .
  • threshold1: First threshold for the hysteresis procedure.
  • threshold2: Second threshold for the hysteresis procedure.

Line 1-2: I’m detecting edges with lower and upper hysteresis values being 30 and 150. I can’t explain how these values work without going into the theory so, for now, understand that for any image you need to tune these 2 threshold values to get the correct results.


Contour Detection:

Contour detection is one of my most favorite topics because with just contour detection you can do a lot and I’ve built a number of cool applications using contours.

Contours can be defined simply as a curve joining all the continuous points (along the boundary), having the same color or intensity. In simple terms think of contours as white blobs on black background for e.g. in the output of threshold function or the edge detection function, each shape can be considered as an individual contour. So you can segment each shape, localize them or even recognize them.

The contours are a useful tool for shape analysis, object detection, and recognition, take a look at this detection and recognition application I’ve built using contour detection.

You can use this function to detect contours.

contours, hierarchy = cv2.findContours(image, mode, method[, offset])


  • image Source: This is your input image in binary format, this is either a black & white image obtained from a thresholding or a similar function or the output of a canny edge detector.
  • mode: Contour retrieval mode, for example  cv2.RETR_EXTERNAL mode lets you extract only external contours meaning if there is a contour inside a contour then that child contour will be ignored. You can see other RetrievalModes here
  • method: Contour approximation method, for most cases cv2.CHAIN_APPROX_NONE works just fine.

After you detect contours you can draw them on the image by using this function.

cv2.drawContours( image, contours, contourIdx, color, [thickness] )


  • image: original input image.
  • contours: This is a list of contours, each contour is stored as a vector.
  • contourIdx: Parameter indicating which contour to draw. If it is -1 then all the contours are drawn.
  • color: Color of the contours.

Line 7 Using an Inverted threshold as the shapes need to be white and background black.

Line 10: Detecting contours on the thresholded image and drawing it on the image copy.

Line 13: Draw detected Contours.


You can also get the number of objects or shapes present by counting the number of contours.


Total Shapes present in image are: 6

Since there are 6 shapes in the above image we are seeing 6 detected contours.

Like I said with contours you can build some really amazing things, In our Computer Vision course, I’ve discussed contours in a lot of depth and also created several steps by step applications with it. For e.g. take a look at this Virtual Pen & Eraser post that I created on LearnOpencv.

Morphological Operations:

In this section, we will take a look at morphological operations. This is one of the most used preprocessing techniques to get rid of noise in binary (black & white) masks. They need two inputs, one is our input image and a kernel (also called a structuring element) which decides the nature of the operation. Two very common morphological operations are Erosion and Dilation. Then there are other variants like Opening, Closing, Gradient, etc.

In this post, we will only be looking at Erosion & Dilation. These are all you need in most cases.


The fundamental idea of erosion is just like how it sounds, it erodes (eats away or eliminates) the boundaries of foreground objects (Always try to keep foreground in white). So what happens is that a kernel slides through the image. A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel is 1, otherwise, it is eroded (made to zero).

Erosion decreases the thickness or size of the foreground object or you can simply say the white region of image decreases. It is useful for removing small white noises.

eroded_image = cv2.erode(source_image, kernel, [iterations] )


  • source_image: Input image.
  • kernel: Structuring element or filter used for erosion if None is passed then, a 3x3 rectangular structuring element is used. The bigger kernel you create the stronger the impact of erosion on the image.
  • iterations: Number of times erosion is applied, the larger the number, greater the effect. 

We will be using this image for erosion, Notice the white spots, with erosion we will attempt to remove this noise. 

Line 5: Making a 7x7 kernel, bigger the kernel the stronger the effect.

Line 8: Applying erosion with 2 iterations, the values for kernel and iterations should be tuned according to your own images.


As you can see the white noise is gone but there is a small problem, our object (person) has become thinner. We can easily fix this by applying dilation which is the opposite of erosion.


It is just the opposite of erosion. It increases the white region in the image or size of the foreground object increases. So essentially dilation expands the boundary of Objects. Normally, in cases like noise removal, erosion is followed by dilation. Because, erosion removes white noises, but it also shrinks our object like we have seen in our example. So now we dilate it. Since noise is gone, they won’t come back, but our object area increases.

Dilation is also useful for removing black noise or in other words black holes in our object. So it helps in joining broken parts of an object.

dilated_image  = cv2.dilate( source_image, kernel, [iterations])

The parameters are the same as erosion.

We will attempt to fill up holes/gaps in this image.


computer vision

See the black holes/gaps are gone. You will find a combination of erosion and dilation used across many image processing applications.

Working with Videos:

We have learned how to deal with images in OpenCV, now let’s work with Videos in OpenCV. First, it should be clear to you that any operation you perform on images can be done on videos too since a video is nothing but a series of images, for e.g. Consider a 30 FPS video, which means this video shows 30 Frames (images) each second.

There are multiple ways to work with images in OpenCV, you first you have to initialize the camera Object by doing this:

cap = cv2.VideoCapture(arg)

Now there are 4 ways we can use the videoCapture Object depending what you pass in as arg:

1. Using Live camera feed: You pass in an integer number i.e. 0,1,2 etc e.g. cap = cv2.VideoCapture(0), now you will be able to use your webcam live stream.

2. Playing a saved Video on Disk: You pass in the path to the video file e.g. cap = cv2.VideoCapture(Path_To_video).

3. Live Streaming from URL using Ip camera or similar: You can stream from a URL e.g. cap = cv2.VideoCapture(protocol://host:port/script_name?script_params|auth) Note, that each video stream or IP camera feed has its own URL scheme.

4. Read a sequence of Images: You can also read sequences of images but this is not used much.

The next step After Initializing is read from video frame by frame, we do this by using cap.read().

ret, frame = cap.read()

  • ret:: A boolean variable which either returns True if the frame was successfully read otherwise False if it fails to read the next frame, this is a really important param when working with videos since after reading the last frame from the video this parameter will return false meaning it can’t read the next frame so we know we can exit the program now.
  • frame: This will be a frame/image of our video. Now everytime we run cap.read() it will give us a new frame so we will put cap.read() in a loop and show all the frames sequentially , it will look like we are playing a video but actually we are just displaying frame by frame.

After exiting the loop there is one last thing you must do, you must release the cap object you created by doing cap.release() otherwise your camera will stay on even after the program ends. You may also want to destroy any remaining windows after the loop.

Line 2: Initializing the VideoCapture object, if you’re using a usb cam then this value can be 1, 2 etc instead of 0

Line 6-17: looping and reading frame by frame from the camera, making sure it’s not corrupted and then converting to grayscale.

Line 24-25:  Check if the user presses the q under 1 millisecond after the imshow function, if yes then exit the loop. The ord() method converts a character to its ASCII value so we can compare it with the returned ASCII value of waitKey() method.

Line 28: Release the camera otherwise your cameras will be left on and the program will exit, this will cause problems the next time you run this cell.

Face Detection with Machine Learning:

In this section we will work with a machine learning-based face detection model, the model we are going to use is a Haar cascade based face detector. It’s the oldest known face detection technique that is still used today at some capacity, although there are more effective approaches, for e.g. take a look at Bleedfacedetector, a python library that I built a year back. It lets users use 4 different types of face detectors by just changing a single line of code.

This Haar Classifier has been trained on several positive (images with faces) and negative (images without faces) images. After training it has learned to recognize faces. 

Before using the face detector, you first must initialize it.


  • xml_model_file: This is your trained haar cascade model in a .xml file

Detected_faces = cv2.CascadeClassifier.detectMultiScale( image, [ scaleFactor], [minNeighbors])


  • Image: This is your input image.
  • scaleFactor: Parameter specifying how much the image size is reduced at each pyramid scale.
  • minNeighbors: Parameter specifying how many neighbors each candidate rectangle should have to retain it.

I’m not going to go into the details of this classifier so you can ignore the definitions of scaleFactor & minNeighbors and just remember that you should tune the value of scaleFactor for controlling speed/accuracy tradeoff. Also increase the number of minNeihbors if you’re getting lots of false detections. There is also a minSize & a maxSize parameter which I’m not discussing for now.

Let’s detect all the faces in this image.

computer vision

Line 8: We are performing face detection and obtaining a list of faces.

Line 11-13: Looping through each face in the list & drawing a rectangle using its coordinates on the image.The list of faces is an array, of x,y,w,h coordinates so an Object (face) is represented as 4 numbers, x,y is the top left corner of the object (face) and w,h is the width and height of the object (face). We can easily use these coordinates to draw a rectangle on the face.


computer vision

As you can see almost all faces were detected in the above image. Normally you don’t make deductions regarding a model based on a single image but if I were to make one then I’d say this model is racist or in ML terms this model is biased towards white people.

One issue with these cascades is that they will fail when the face is rotated or is tilted sideways or occupied but no worries you can use a stronger SSD based face detection using bleedfacedetecor.

There are also other Haar Cascades besides this face detector that you can use, take a look at the list here. Not all of them are good but you should try the eye & pedestrian cascades.

Image Classification with Deep Learning:

In this section, we will learn to use an image Classifier in OpenCV. We will be using OpenCV’s built-in DNN module. Recently I made a tutorial on performing Super Resolution with DNN module. The DNN module allows you to use pre-trained neural networks from popular frameworks like TensorFlow, PyTorch, ONNX etc. and use those models directly in OpenCV. One problem is that the DNN module does not allow you to train neural networks. Still, it’s a powerful tool, let’s take a look at an image classification pipeline using OpenCV.

Note: I will create a detailed post on OpenCV DNN module in a few weeks, for now I’m keeping this short.

DNN Pipeline 

Generally there are 4 steps when doing deep learning with DNN module.

  1. Read the image and the target classes.
  2. Initialize the DNN module with an architecture and model parameters.
  3. Perform the forward pass on the image with the module
  4. Post process the results.

Now for this we are using a couple of files, like the class labels file, the neural network model and its configuration file, all these files can be downloaded in the source code download section of this post.
We will start by reading the text file containing 1000 ImageNet Classes, and we extract and store each class in a python list.


Number of Classes 1000


[‘n01440764 tench, Tinca tinca’, ‘n01443537 goldfish, Carassius auratus’, ‘n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias’, ‘n01491361 tiger shark, Galeocerdo cuvieri’, ‘n01494475 hammerhead, hammerhead shark’]

All these classes are in the text file named synset_words.txt. In this text file, each class is in on a new line with its unique id, Also each class has multiple labels for e.g look at the first 3 lines in the text file:

  • ‘n01440764 tench, Tinca tinca’
  • ‘n01443537 goldfish, Carassius auratus’
  • ‘n01484850 great white shark, white shark

So for each line we have the Class ID, then there are multiple class names, they all are valid names for that class and we’ll just use the first one. So in order to do that we’ll have to extract the second word from each line and create a new list, this will be our labels list.

Here we will extract the labels (2nd element from each line) and create a labels list.


[‘tench’, ‘goldfish’, ‘great white shark’, ‘tiger shark’, ‘hammerhead’, ‘electric ray’, ‘stingray’, ‘cock’, ‘hen’, ‘ostrich’, ‘brambling’, ‘goldfinch’, ‘house finch’, ‘junco’, ‘indigo bunting’, ‘robin’, ‘bulbul’, ‘jay’, ‘magpie’, ‘chickadee’, ‘water ouzel’, ‘kite’, ‘bald eagle’, ‘vulture’, ‘great grey owl’, ‘European fire salamander’, ‘common newt’, ‘eft’, ‘spotted salamander’, ‘axolotl’, ‘bullfrog’, ‘tree frog’, ‘tailed frog’, ‘loggerhead’, ‘leatherback turtle’, ‘mud turtle’, ‘terrapin’, ‘box turtle’, ‘banded gecko’, ‘common iguana’, ‘American chameleon’, ‘whiptail’, ‘agama’, ‘frilled lizard’, ‘alligator lizard’, ‘Gila monster’, ‘green lizard’, ‘African chameleon’, ‘Komodo dragon’, ‘African crocodile’]

Now we will initialize our neural network which is a GoogleNet model trained in a caffe framework on 1000 classes of ImageNet. We will initialize it using cv2.dnn.readNetFromCaffe(), there are different initialization methods for different frameworks. 

This is the image upon which we will run our classification.

computer vision

Pre-processing the image:

Now before you pass an image in the network you need to preprocess it, this means resizing the image to the size it was trained on, for many neural networks this is 224x224, in pre-processing step you also do other things like Normalize the image (make the range of intensity values between 0-1) and mean subtraction, etc. These are all the steps the authors did on the images that were used during model training.

Fortunately In OpenCV you have a function called cv2.dnn.blobFromImage() which most of the time takes care of all the pre-processing for you.

blob = cv2.dnn.blobFromImage(image[, scalefactor,  [ size], [ mean])


  • Image: Input image.
  • Scalefactor: Used to normalize the image. This value is multiplied by the image, value of 1 means no scaling is done.
  • Size: The size to which the image will be resized to, this depends upon each model.
  • Mean: These are mean R,G,B Channel values from the whole dataset and these are subtracted from the image’s R,G,B channels respectively, this gives illumination invariance to the model.

There are other important parameters too but I’m skipping them for now.

Now this blob is our pre-processed image. It’s ready to be sent to the network but first you must set it as input

This is the most important step, now the image will go through the entire network and you will get an output. Most of the computation time will take place in this step.

Now if we check the size of Output predictions, we will see that it’s 1000. So the model has returned a list of probabilities for each of the 1000 classes in ImageNet dataset. The index of the highest probability is our target class index.


Total Number of Predictions are: 1000

You can try to print the predictions to understand it better, so we will print initial 50 predictions.


[9.21621759e-05 9.98483717e-01 5.40040501e-09 4.63205048e-08
 1.46981725e-08 9.53976155e-07 1.41102263e-09 9.43037321e-07
 1.11432279e-07 3.47782636e-11 1.09528010e-07 2.49071910e-08
 4.35386397e-07 1.09613385e-09 8.02263755e-10 5.40932188e-09
 2.00020311e-09 6.00099359e-10 5.15557423e-11 4.74516648e-10
 8.58448729e-11 1.90391162e-07 3.05192899e-10 1.51088759e-08
 1.51897750e-09 6.16360580e-07 5.98882507e-05 1.80176867e-04
 1.07785991e-06 2.55477469e-04 1.88719014e-07 5.45302964e-06
 2.64027094e-05 2.53552770e-07 5.08395566e-08 6.60280875e-07
 1.89136574e-06 8.86267983e-08 5.25031763e-04 3.21334414e-06
 6.80627727e-06 2.47660046e-06 1.29753553e-05 1.73194076e-06
 1.06492757e-06 3.31227341e-07 1.72065847e-06 4.86000363e-06
 3.48621292e-08 1.47009416e-07]

Now if I wanted to get the top most prediction or the highest probability then we would just need to do np.max() 



See, we got a class with 99.84% probability. This is really good, it means our network is pretty sure about the name of the target class.

If we wanted to check the index of the target class we can just do np.argmax()



Our network says the class with the highest probability is at index 1. We just have to use this index in the labels list to get the name of the actual predicted class



So our target class is goldfish which has a probability of 99.84%

In the final step we are just going to put the above information over the image.


computer vision

So this was an Image classification pipeline, similarly there are a lot of other interesting neural nets for different tasks, Object Detection, Image Segmentation, Image Colorization etc. I cover using 13-14 Different Neural nets with OpenCV using Video Walkthroughs and notebooks inside our Computer Vision Course and also show you how to use them with Nvidia & OpenCL GPUs.

What’s Next?

computer vision

If you want to go forward from here and learn more advanced things and go into more detail, understand theory and code of different algorithms then be sure to check out our Computer Vision & Image Processing with Python Course (Urdu/Hindi). In this course I go into a lot more detail on each of the topics that I’ve covered above.

If you want to start a career in Computer Vision & Artificial Intelligence then this course is for you. One of the best things about this course is that the video lectures are in Urdu/Hindi Language without any compromise on quality, so there is a personal/local touch to it.


In this post, we covered a lot of fundamentals in OpenCV. We learn to work with images as well as videos, this should serve as a good starting point but keep on learning. Remember to refer to OpenCV documentation and StackOverflow if you’re stuck on something. In a few weeks, I’ll be sharing our Computer Vision Resource Guide, which will help you in your Computer Vision journey.

If you have any questions or confusion regarding this post, feel free to comment on this post and I’ll be happy to help.

Today we are also releasing  Beta Version of Bleed AI Premium Subscribers. This will give you access to Graded Quizzes, Premium Colab Notebook, Priority Support, Course discounts & Practice Assignments You can Sign Up for the membership here. It’s Free.

Installation of OpenCV 4.3.0 in Windows 10 from Source with Nvidia GPU Support & Non-Free Flags Enabled.

Installation of OpenCV 4.3.0 in Windows 10 from Source with Nvidia GPU Support & Non-Free Flags Enabled.

In this post we are going to install OpenCV from Source in Windows 10. OpenCV stands for Open Source Computer Vision library. It’s the most widely used and powerful computer vision & Image processing library in the world. It has been around for more than 20 years and contains 1000s of Optimized algorithms written in C++ and has bindings in other languages like python, java, etc. 

Now if this is your first time dealing with OpenCV then I would highly recommend that instead of following this tutorial and installing from source, you just install OpenCV with pip by doing:

pip install opencv-contrib-python

Now if you have already used OpenCV in the past and want to have more control over how this library gets installed in your system then source installation is the way to go. By now you have realized that there are two ways to go about installing this library, one is the installation with a package manager like pip or conda, the other is an installation from source.

Now before we start with installing from source, you first need to understand what are the advantages vs disadvantages when you are installing from source vs with a package manager.

Installation with Package Manager (pip/conda):


  • The installation is really easy, you just have to run a single line of command. No extra knowledge required.
  • You can easily update the library with a single line too.


  • This method will install features that are preset by the library maintainers, these will be the most commonly used features. The programmer doesn’t have the ability to select features of his/her own choice.
  • The programmer can’t do any extra optimization by selecting different available Optimization flags.

Installation from Source:


  • The method allows you to add your own features or get rid of already present features, this is especially useful when you want to use a feature which does not come with the default installation or when you are deploying on a device with limited memory so then you can get rid of unnecessary features.
  • You can select some Optimization flags and get fast performance in some areas of the library.


  • The installation is a tedious process, many things along the way can go wrong.
  • To work with the flags you must be familiar with the library so you know which features to enable or disable for your specific purpose.
  • In order to update you have to do more than just run a single line of command.

In this tutorial the two main advantages from source installation that you will get, first you will be able to enable the Non-Free Algorithms in OpenCV. In later versions of OpenCV you can’t use algorithms like SIFT, SURF, etc as they are not installed with pip, but with source installation you will be able to install these. You will also be able to enable Cuda flags so that you can use Nvidia GPUs with the OpenCV DNN module, which will give you a huge speed boost when running neural nets in OpenCV.  You can still follow along if you don’t have an Nvidia GPU, you will just have to skip that part.

Note: I’m assuming that you already have python installed in your system, if not then you can go ahead and install anaconda distribution for python 3.7 here.

So let’s get started.

Step 0: Uninstall OpenCV (Do this only if you already have OpenCV installed)

If you are installing OpenCV in your system for the first time then you can ignore this step but if you already have installed OpenCV in your system then this step is for you.

  • Open up the command prompt and  do:
  • pip uninstall opencv-python

And also do

  • pip uninstall opencv-contrib-python
  • After Uninstalling these then you should see an error when you import OpenCV by doing:  import cv2

Step 1: Download OpenCV 

Click here to go to the OpenCV release page

  • Click on the Sources button, and you’ll download a zip folder named  opencv-4.3.0.zip
  • Make a new folder named “OpenCV-Installation” and extract this zip folder inside that folder. You can then get rid of the zip folder now.

Step 2: Download OpenCV contrib Package

The contrib package in OpenCV gives you a lot of other interesting modules that you can use with but unfortunately it does not come with the default installation so we have to download it from Github. Download the Source code (zip)

from here and Unzip it in the OpenCV-Installation directory. Note: It’s not required to unzip here but this way it will be cleaner.

After Unzipping both folders my Opencv-installation directory looks like this

So what we just did is that we created a folder named OpenCV- Installation and inside this folder we put opencv_contrib (Extracted) and opencv-4.3.0 (Extracted). 

Step 3 (a): Download Visual Studio 2019

Click here to download the Microsoft Visual Studio 2019 Community version

After Downloading you will get

Step 3 (b): Install Visual Studio 2019

In order to begin the installation process click on the vs installer file and continue with default settings. After that downloading will begin and after downloading the installation process will start

Step 3 (c): Select The Workloads in Visual Studio 2019

Finally, after completing the installation, you will be required to select workloads

As you can see above that we have Selected Python Development and Desktop Development C++

Note: if you are using the only python you would still need to select Desktop development with c++ otherwise at the time of CMake compilation of OpenCV you will get errors.

After Selecting the workloads Click the Install Button and your Packages will start downloading and then installing.

Step 4: Download and Install CMake

Now we have to download & install CMake. CMake is open-source software designed to build packages in a  compiler-independent manner. During the installation of OpenCV from source, CMake will help us to control the compilation process and it will generate native makefiles and workspaces. For more information about CMake you can look here.

Click here and download the latest version of CMake, In our case, we are downloading CMake-3.17.1.

  • Open the Installer file and Begin the installation but on the time of installation you’ll be asked 3 things
    • Agree with the terms & conditions (checkmark).
    • Do you want to add CMake to the system path, if you were going to use CMake with the command line then you should check this but since we’re going to be using the CMake GUI for configuration so we don’t need to check this for now.
    • Lastly, it will ask if you want to change the installation location (Keep it as default).

Step 5: Setting up CMake for Compilation

After completion of installation. Go to the Start Menu and type CMake in the search tab, click on it, you will get this window

  1. Where is the Source code:  Select the Opencv installation file OpenCV-4.3.0 which you just put inside the OpenCV-Installation directory.

  2. Where to build the binaries: This is where the build binaries will be saved, if a path does not exist (which will be true when you’re running this the first time) then CMake will ask you and then create it. For our case, we are just setting the path to be our OpenCV-installation directory followed by /build

It would look something like this

And after selecting both paths Click on the Configure button also Keep the setting as default and click on the Finish button.

Step 6 (a): Setting up Required Flags for our Installation

After compilation a window like this will appear, which allows you to select flags that you want. Now by checking these flags or unchecking them you will have a customized installation of OpenCV. If you were doing a pip installation then you wouldn’t have this ability. 

Checking each flag adds more functionality/modules etc to your OpenCV installation.

For this OpenCV installation a lot of flags are already checked but we’ll checkmark some more flags to include them and we will also provide the path of opencv-contrib package in order to use the modules that are present in opencv-contrib Package.


This flag enables the installation of the Python examples present in the sample folder created during the extraction of the OpenCV zip. 

In Search bar write INSTALL_PYTHON_EXAMPLES and check-mark it, like this:


Qt is a cross-platform app development framework for desktop, embedded systems, and mobile. With Qt, you can write GUIs directly in C++ using its Widgets modules. Qt is also packaged with an interactive graphical tool called Qt Designer which functions as a code generator for Widgets based GUIs.

In Search bar write WITH_QT and checkmark it, like this:


This will allow OpenGL support in OpenCV for building applications related to OpenGL. OpenGL is used to render 3D scenes. These 3D scenes can be later processed using OpenCV.

In Search bar write WITH_OPENGL and check mark it, like this:


This will allow us to use those Non-Free algorithms like SIFT & SURF.

In Search bar write OPENCV_ENABLE_NONFREE and checkmark it, like this:


To finally include the opencv contrib package that you downloaded in step 2 in OpenCV you would need to add its path to the OPENCV_EXTRA_MODULES_PATH  flag.

In Search bar write OPENCV_EXTRA_MODULES_PATH and provide a path of opencv- contrib package, like this:

In our case the path is: C:/Users/parde/Downloads/OpenCV-Installation/opencv-contrib/modules

Note:  In my system I had to replace backward slashes with forward slashes in the above path to avoid an error, this might not be the case with you.

Step 6 (b): Setting up Nvidia GPUs & Enable cuda flags.

Now if you don’t have an Nvidia GPU then please skip this step and go to step 6c. If you do have an Nvidia GPU then there are two parts to make this work. The first part is to install Cuda drivers, toolkit & cudnn and the second is to enable Cuda flags.

Part 1: Install Cuda Drivers, Cuda Toolkit & Cudnn:

Since the first part itself is long and many of you who have Nvidia GPUs may already have done most of that installation so I’m covering it in another blog post here, this post also lets you install TensorFlow 2.0 GPU in your system.

Part 2: Enable Cuda Flags:

After you have completed part 1, you can enable the below flags.


This option allows CUDA support in the OpenCV. You need to make sure that you have CUDA enabled GPU in your machine in order to use the CUDA library. 

In Search bar write WITH_CUDA and checkmark it, like this:


Also enable opencv_dnn_cuda flag.

In Search bar write OPENCV_DNN_CUDA and checkmark it, like this:


Now enter your Cuda architecture version that you found out at the end of the post linked in part 1 step 6(b), it’s important that you enter the correct version.

In Search bar write CUDA_ARCH_BIN and checkmark it, like this:

Note: In some cases I’ve noticed due to some reason the CUDA_ARCH_BIN flag is not visible until you press the configure button, so press the configure button and then put the value of this flag and then you have to again press the configure button in step 6c.

Step 6 (c): Configure & Generate

Click on the Configure button. After completion you will get this message:

After Configuration is done press the generate button.

Step 7: Build the Libraries in Visual Studio

In Order to build the libraries you need to open up OpenCV.sln which will be in your build folder or you can open it up by clicking the Open Project button on the CMake window.

After clicking the Open Project button you will get this window change the Debug mode to Release mode. Don’t forget this step.

After changing to Release  mode:

And now it’s time to install the libraries

  • Expand the CMake Target directory which is on the right side of your screen.
  • Right Click on ALL_BUILD 
  • Click the build option

It will take time to build the libraries, the time of building depends upon what and how many flags you enabled. If you have enabled Cuda flags it may take several hours to build.

 After the build is finished you will get this message. 

If you see this message it means that your build was successful.

Step 8: Install the Library in Visual Studio

  • Below the ALL_BUILD file, you will see an INSTALL file
  • Right Click on the INSTALL file
  • Click on the build option 

Your installation will begin, and at the `end of the build you will get this message

Step 9: Verify the Installation of OpenCV

  • Open the cmd terminal and run the Python Interpreter and then type 
    • import cv2  (Hit Enter)
    • cv2.__version__  (Hit Enter)

if your output is the same as the screenshot below then your installation went good. 

Congratulations! Your Installation Is Complete 🙂 

Using GPUs with OpenCV DNN Module:

For those of you that have installed OpenCV with GPU support might be wondering how to use it.

Now if you have never worked with the DNN Module then I would recommend that you take a look at the coding part of our Super Resolution blog post. 

So for e.g. if you wanted to run the above code on an Nvidia GPU then you just need to include 2 lines after you read the model and its configuration file with readNet function. 

These 2 lines are:


These lines must come before the forward pass is performed and that’s it. If you have configured everything properly then OpenCV will utilize your Nvidia GPU to run the network in the forward pass.

Also as a bonus tip, if you wanted to use OpenCL based GPU in your system then instead of those two lines you can use this line:


By running your models in Nvidia GPU or Intel-based GPU, you will find a tremendous speed increase, provided of course you have a strong GPU. If you didn’t install with GPU support then by putting these lines it wouldn’t use the GPU but it won’t give an error either.

I hope you found this tutorial useful. For future Tutorials by us, make sure to Subscribe to Bleed AI below

In Our Computer Vision & Image processing Course (Urdu/Hindi) I actually go over 13-14 different neural nets with OpenCV DNN and provide you coding examples & Video Walkthroughs to use all of them with the Nvidia and OpenCL based GPUs. Besides that the course itself is pretty comprehensive and arguably one of the best if you want to start a career in Computer Vision & Artificial Intelligence, the only prerequisite to this course is some programming experience in any language.

This course goes into the foundations of Image processing and Computer Vision, you learn from the ground up what the image is and how to manipulate it at the lowest level and then you gradually built up from there in the course, you learn other foundational techniques with their theories and how to use them effectively.

Super Resolution with OpenCV

Super Resolution with OpenCV

Have You seen those Sci fi movies in which the detective tells the techie to zoom in on an image of the suspect and run an enhancement program and suddently that part of image is magically enhanced to a higher resolution instead of being pixelated.

Feel free to take a look at a compilation of those exact scenes below.

It’s also absurd, the amount of times that they all got a reflection of something in the video. Anyways the point is that in the past few years we have made that aspect of Sci-fi a reality. Meaning today with deep learning methods we can actually enhance many low-resolution images to a high-resolution version, sometimes even as high as 8x resolution. This means you can take a 224×224 image and make it 1792×1792 without any loss in quality. This technique is called Super Resolution.

In this tutorial you will learn how to perform Super-Resolution with just OpenCV, specifically, we’ll be using OpenCV’s DNN module so you won’t be using any external frameworks like Pytorch or Tensorflow.

Before we start with the code I want to briefly discuss the amazing progress of Super-Resolution Algorithms. You can feel free to jump right into the code. But I would recommend giving the theory below a quick read even if you don’t understand all of it.

So technically speaking, Super Resolution can be defined as the class of Algorithms that upscales an image without losing quality. How would you upscale an image without this? well you could say you can resize the image and make it larger.

So when you typically resize an image, you use Nearest Neighbor Interpolation. This just means you expand the pixels of the original image and then fill the gaps by copying the values of the nearest neighboring pixels.

Figure 1: Nearest Neighbor interpolation.

Of Course the results would be terrible, you can do better by taking a weighted average of neighboring pixels instead of just copying them. This is essentially done by using Bilinear or Bicubic interpolation.


Still the results above are blurred and you can easily tell that its not the original version. So can we make this upscaled version look like the original with some fancy Algorithms? Well the short answer is No. No smart function or algorithm will be able to replace the missing information. The best we can do is approximate and fill the gaps based on the neighboring pixels. 

But fret not, Neural Networks come to the rescue. These algorithms can actually look at thousands of samples and remember the patterns so at the end of the day you don’t have to approximate the missing information, you can hallucinate based on the past seen data. 

Let me simplify this, what you can do is train a neural network by showing samples of high res images with their low res version. In fact that is what the SRCNN (Image Super-Resolution Using Deep Convolutional Networks ) paper in 2015 by Chao Dong et al did.

They simply input Low res (downscaled version) of images and made the model output a Higher resolution version and then compared it with the original High res version. The metric they were Optimizing was Peak signal to noise ratio (PSNR) score.

Figure 4: PSNR Equation


SRResNet (Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network ) in 2016 by Wenzhe Shi et al improved upon the previous SRCNN at two levels, first, it used Residual blocks (Convolution layers with skip connections) instead of normal Convolution layers. Why? Well the success of architectures like ResNets made the fact popular that Residual blocks are more powerful than simple convolutional layers, as it allowed to add more layers without overfitting.

Second, it shifted the upsampling step to the middle of the network. Now in any Super res architecture there has to be an upsampling step. Now if you’re using bicubic interpolation inside the network to upsample then you can either use it at the start or the end, you can’t use it in the middle because It’s a fix mathematical operation, it’s not learnable. Now if you want to have upsampling in between the layers then you can go for transpose layers to upsample the image. One problem tho, transpose convolutions adds zeros to upscale the image, you don’t have any gradient information to tune this upscaling process. The way SRResNet got around that was that it used sub pixel convolutions to upscale, without me explaining what this layer is you just have to understand that with this technique of upscaling is a learnable operation, so this improved results. This model was also optimizing the PSNR score.

Now Consider the PSNR score again: 

Figure 4: PSNR Equation

As you can see, the model will have a high PSNR score if the MSE (mean squared error) is low. Now this approach works well, but the problem here is that even with high PSNR scores the images do not necessarily look good to the human eye.

So the image fidelity or the human perception of image quality is not exactly correlated to psnr scores. Minimizing MSE would produce images that may look more like the original but may not necessarily look pleasing to the eye.

Consider all these images below that have almost equal MSE when compared to the reference image, even though we can clearly see that the image on the top is way closer to reference image than the bottom one.

Perceptual Loss:

MSE only cared about pixel-wise intensity differences not the actual contents of the image. This problem can be solved by using a better metric. Something called a Perceptual loss (Perceptual Losses for Real-Time Style Transfer and Super-Resolution in 2016 by Justin Johnson et al) can be used. It’s kind of a loss that correlates well with our perception of image quality. It works by simply passing the output of the model and the actual target image to a pre-trained model like VGG variants and then compute the difference between the resulting feature maps of that model and try to minimize that. The layers of the pre-trained model that generates those feature maps for loss calculation stays frozen during the training of the Super res network. This perceptual loss is also called the content loss in style transfer networks.

Figure 7: Perceptual Loss.

EnhanceNet (Single Image Super-Resolution Through Automated Texture Synthesis) by Mehdi S. M. Sajjadi et al, 2017 is a great network that effectively implements this loss.


Figure 8: SRGAN Architecture.

For a moment if we think about the Super Resolution problem then we can agree that we don’t care if the output image matches the original one exactly as long as it looks good, So why not use GANs (Generative Adversarial Networks) to generate realistic Upscaled versions of the image. That’s what SRGAN (Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network) Christian Ledig et al, 2017 did. Like all GANs SRGAN, had a generator that tries to generate realistic-looking Upscaled versions of the original images and it also had a discriminator that tried to tell if the generated image is the Original high res version or a generated Upscaled version. During the training they both get better over time and the generator learns to produce better looking Upscaled versions of the image.

In Addition to this, SRGAN also implemented a Perceptual loss function. So this network with the combination of Generative Loss & Perceptual Loss along with sub-pixel convolutions produces really High-quality Upscaled images.


An interesting variant of SRGANs is this ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks by Xintao Wang et al, 2018. The paper did lots of simple yet interesting things like removing the Batch norm layer, doing residual scaling, modifying discriminator loss, taking feature maps comparison before the activation function, etc. to the above network and so the performance improved. 

The network also computed a weighted average of two models, a GAN model and an MSE trained model, this way the Output looked real and also closely resembled the original image.

 This network got pretty popular in the gaming community, people upscaled old gaming graphics.

Figure 10: Nearest Neighbor and ESRGAN in Gaming.

Other Areas Of Super Resolution:

Domain/Task Specific Super Resolution: 

Needless to say if you train Super Res on a certain type of data then it will perform really well on that type. So people have trained really powerful Super res networks on domain problems like training only on faces and by utilizing face priors, you get a network (like this: Pixel Recursive Super Resolution) which can generate plausible high res face images from a very low res image. Of Course these types of networks can’t be used for CSI use cases as the details are totally made up by the algorithm.

Progressive Networks:

There are also progressive networks that break the training into steps so you can achieve a really high resolution with this for e.g. Progressive Face Super-Resolution via Attention to Facial Landmark, (2019) can improve the resolution by 8x.

Multi-Image Super-Resolution:

All the methods discussed above belong to theSingle Image Super-Resolution” category, while most of the interesting papers in SR are in this category but there is another area called “Multi-Image Super-Resolution” in which you have multiple images of the same scenes but the camera is slightly shifted, by some subpixels on each image. Then you use that extra information from all those individual images and construct a high res version. In fact Google’s New Pixel 3 uses a Multi-Image SR algorithm that uses those slight shifts of handheld motion to produce those amazing SR Zoom effects.

Note: Super resolution is a really popular subject and you’ll see a good number of research papers published each year in this area. There are other interesting papers that I have not discussed but the papers I have mentioned in essence capture the evolution of Super res networks.

Super Resolution with OpenCV Code

Now let’s start with the code, we are going to be using OpenCV’s DNN module, this was introduced in OpenCV version 3 and now in version 4.2 it has evolved a lot. This module lets you use pre trained neural networks from popular frameworks like tensorflow, pytorch, onnx etc and use those models directly in OpenCV. 

The Super Res model we’ll be using is called Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network”  by Wenzhe Shi et al, 2016. Although this does not use Perceptual loss nor a generative loss its still a really fast implementation because it uses Sub-Pixel Convolutions for upscaling. This Model will Enhance your image by 3x.

The model is in ONNX format (Open neural network exchange format). This is an industry-standard format for changing model frameworks, this means you can train a model in PyTorch or other common frameworks and then convert to onnx and then convert back to TensorFlow or any other framework.

OpenCV DNN module allows you to use models that are in ONNX format by using cv2.dnn.readNetFromONNX(). You can get a list of ONNX models from the ONNX Model Zoo.

Figure 14: ONNX model Conversion.

Here are the steps we would need to perform:

  1.  Initialize the Dnn module.
  2.  Read & Pre-process the image.
  3.  Set the preprocessed image as input and do a forward pass with the model.
  4.  Post-process the results to get the final image

Make sure you have the following Libraries Installed.

  • OpenCV ( possibly Version 4.0 or above)
  • Numpy
  • Matplotlib

Directory Hierarchy

Make sure to download the zip folder containing the source code, images, & model from above. After downloading, extract the folder and run the Jupyter notebook kernel from there.

This is how our directory structure looks like, it has a Jupyter notebook, a media folder with images and the model folder.

Import Libraries

Start by Importing the required libraries.

Initialize DNN Module

To use Models in ONNX format, you just have to use cv2.dnn.readNetFromONNX(model) and pass the model inside this function.

Read Image

This is our image on which we are going to perform super-resolution.

  • Line 2: We’re reading the image form disk.
  • Line 5-7 : We’re setting the figure size and showing the image with matplotlib, [:,:,::-1] means to reverse image channels so we can show OpenCV BGR images properly in matplotlib. OpenCV BGR images.

Preprocessing the image

Before you pass in an image to a neural network you perform some image processing to get the image in the right format. So the first thing we will do is resize the image to have the size 224×224. This is the size that our network requires. After that we’ll convert the image from RGB to YCbCr color format. So take a look at the components in this format.

  • Y:  This is called the lumma component. So basically this channel encodes brightness intensity of the image, you can think of this channel like the grayscale version of the image.
  • Cb: This is the blue-difference channel.
  • Cr: This is the red difference channel.

So why are we doing this? If we try to Upscale an RGB (or BGR in case of OpenCV) Color image then we would need to train a network that would have to learn to Upscale each individual channel, so instead of doing that we can do something smarter, That changes the image to the color format where you can just manipulate the main intensity channel. So the network we’re using has learned to upscale Y channel. After it does that, all we do is upscale the channels (These are responsible for color) using bicubic interpolation and merge it with the Y channel. This cuts our work to 1/3. After this we do some formatting of the Y channel and then finally normalize it by dividing with 255.0.

  • Line 2-5: We’re making a copy of the image and resizing it to the size the network requires.
  • Line 8-11: Changing the color channel to YCbCr & splitting it to individual channels, so we can just work with the Y channel.
  • Line 14-17: Formatting Y to the format that is acceptable by the network.
  • Line 20: Converting to float and normalizing the image as it was done in the original implementation.

Input the Blob Image to the Network 

Forward Pass

Most of the Computations will take place in this step, in my PC it took 90 ms for a single pass. This is the step where the image goes through the whole neural network.


After the network outputs the results, you need to post-process it. Mostly you reverse what you did in the preprocessing step.

  • Line 2-5: We’re reversing what we did in the preprocessing step.
  • Line 8: Clipping to stay in the uint8 range and avoid artifacts in the final image from rounding.
  • Line 11-12: Resizing the color channels according to the ‘Y’ channel.
  • Line 15-18: Merging all channels and converting the image back to BGR.

Display Final Result

  • Line 5-7: We’re displaying both the bicubic and super-resolution version of the image in subplots.

Creating Functions

Now that we have seen step by step implementation of the network, we’ll create the following 2 python functions.

Initialization Function: This function will contain parts of the network that will be set once, like loading the model.
Main Function: This function will contain all the rest of the code from preprocessing to postprocessing, it will also have the option to either return the enhanced image or display it with matplotlib.

Initialization Function

Main Function

Set returndata = True when you just want the enhanced image. I usually do this when working with videos.

Initialize the Super Resolution

Call the initialization function once.

Calling the main function

Now pass in any image to the main function and you’ll see a comparison of both its Bicubic and super-resolution version.

Granted the results are not that astonishing, it’s only doing 3x and there are models that can do 8x or more. But its a starting point, its really fast, easily under 100 ms on a CPU. And it’s better than Bicubic interpolation. And most importantly you can use this directly in OpenCV. In future I may consider writing a tutorial on other Super Resolution networks but for that I may have to use Pytorch or Tensorflow.


Super Resolution has some great applications.

Facial Recognition:  Super res algorithms can help in improving the accuracy of those Surveillance systems which have to perform facial recognition on low-resolution cameras. You can try this experiment yourself and see if this network helps you to improve performance on your facial recognition systems.

Satellite Imagery: All those satellite images can be further zoomed in without any loss in quality and with no extra hardware by just implementing super-resolution on them.

Medical: It can also prove really effective in medical imaging. Especially those images for which you are limited by the available lenses.

Data Compression/Bandwidth saving: Imagine the bandwidth savings if you can download low res images and then view in high resolution using a mobile version of a super-resolution algorithm.

In Fact if you think about it, this technique has limitless applications across many industries.

Note: There is still no Generic Super-resolution algorithm that does well in all problem domains. Meaning if you take a Super res network that was trained on a dataset of house pictures and test it on animals then it would do poorly. So almost all Super res networks have their weaknesses. The best thing is to train a super res on your own problem and then use it. The best part about it is that generating labels for any image is as easy as downsizing an image literally.

What’s Next?

computer vision

If you want to go forward from here and learn more advanced things and go into more detail, understand theory and code of different algorithms then be sure to check out our Computer Vision & Image Processing with Python Course (Urdu/Hindi). In this course, I go into a lot of detail regarding vision fundamentals and cover a plethora of algorithms and techniques to help you master Computer Vision.

If you want to start a career in Computer Vision & Artificial Intelligence then this course is for you. One of the best things about this course is that the video lectures are in Urdu/Hindi Language without any compromise on quality, so there is a personal/local touch to it.


So let’s quickly Summarize what we did here, first, we discussed what is Super-resolution then we went over a series of Super Res networks and we saw how each algorithm improved over the other. We also discussed other areas of Super-Resolution like multi-image Super-resolution and domain-specific super res networks.

After that, we learned how to perform a step by step pipeline to do inference with a super res network inside the OpenCV DNN module. 

I hope you enjoyed this tutorial. If you have any questions regarding this post then please feel free to comment below and I’ll gladly answer them.

Checkout Bleed AI Premium Subscribers. This will give you access to Graded Quizzes, Premium Colab Notebooks, Priority Support, Course discounts & Practice Assignments You can Sign Up for the membership here. It’s Free.