Building 4 Applications Using Real-Time Selfie Segmentation in Python



In this tutorial, you’ll learn how to do Real-Time Selfie Segmentation using Mediapipe in Python and then build the following 4 applications.

  1. Background Removal/Replacement
  2. Background Blur
  3. Background Desaturation
  4. Convert Image to Transparent PNG

And not only will these applications work on images but I’ll show you how to apply these to your real-time webcam feed running on a CPU.

Selfie Segmentation

Also, the model that we’ll use is almost the same one that Google Hangouts is currently using to segment people. So yes, we’re going to be learning a state-of-the-art approach to segmentation.

And on top of that, the code for building all 4 applications will be ridiculously simple.

Interested yet? Then keep reading this full post.

In the first part of this post, we’ll understand the problem of image segmentation and its types, then we’ll understand what selfie segmentation is. After that, we’ll take a look at Mediapipe and how to do selfie segmentation with it. And finally how to build all those 4 applications.

What is Image Segmentation?

If you’re somewhat familiar with computer vision basics, then you might already know about image segmentation, a very popular problem in computer vision.

Just like in an object detection task, where you localize objects in the image and draw boxes around them, in a segmentation task you’re doing almost the same thing. But instead of drawing a bounding box around each object, you’re trying to segment, i.e. draw out, the exact boundary of each target object.

A segmentation model is trying to segment out the buses in the image above

In other words, in segmentation, you’re trying to divide the image into groups of pixels based on some specific criteria.

So an image segmentation algorithm will take an input image and output groups of pixels, each group will belong to some class. Normally this output is actually an image mask where each pixel consists of a single number indicating the class it belongs to.

Now the task of image segmentation can be divided into several categories, let’s understand each of them.

  1. Semantic Segmentation
  2. Instance Segmentation
  3. Panoptic Segmentation
  4. Saliency Detection

What is Semantic Segmentation?

In this type of segmentation, our task is to assign a class label (pedestrian, car, road, tree etc.)  to every pixel in the image.

As you can see, all the objects in the image, including the buildings, sky, and sidewalk, are labeled with a certain color indicating the class they belong to, e.g. all cars are labeled blue, people are labeled red, and so on.

It’s worth noting that although we can extract any individual class (e.g. we can extract all cars by looking for blue pixels), we cannot distinguish between different instances of the same class, e.g. you can’t reliably say which blue pixel belongs to which car.

What is Instance Segmentation?

Another common category of segmentation is called Instance Segmentation. Here the goal is not to label all pixels in the image but only the pixels of certain selected classes that the model was trained on (e.g. cars, pedestrians, etc.).

As you can see in the image, the algorithm ignored the roads, sky, buildings etc. so here we’re only interested in labeling specific classes.

One other major difference in this approach is that we’re also differentiating between different instances of the same class, i.e. you can tell which pixel belongs to which instance.

What is Panoptic Segmentation?

If you’re a curious cat like me, you might wonder, well isn’t there an approach that,

A) Labels all pixels in the image like semantic segmentation.

B) And also differentiates between instances of the same class like instance segmentation.

Well, yes there is! It’s called Panoptic Segmentation, where not only is every pixel assigned a class, but we can also differentiate between different instances of the same class, i.e. we can tell which pixel belongs to which car.

This type of segmentation is the combination of both instance and semantic segmentation.

What is Saliency Detection?

Don’t be confused by the word “Detection” here; although saliency detection is not generally considered one of the core segmentation methods, it’s still essentially a major segmentation technique.

So here the goal is to segment out the most salient/prominent (things that stand out ) features in the image.

And this is done regardless of the class of the object. Here’s another example.

As you can see the most obvious object in the above image is the cat, which is exactly what’s being segmented out here.

So in saliency detection we’re trying to segment out the features that stand out the most in the image.


Selfie Segmentation:

Alright now that we have understood the fundamental segmentation techniques out there, let’s try to understand what selfie segmentation is.

Well, it’s obviously a no-brainer: it’s a segmentation technique that segments out people in images.

You might think, how is this different from semantic or instance Segmentation?

Well, to put it simply, you can consider selfie segmentation as a sort of a mix between semantic segmentation and Saliency detection.

What do I mean by that?

Take a look at the example output of Selfie segmentation on two images below.

In the first image (top) the segmentation is done perfectly, as every person is on a similar scale and prominent in the image, whereas in the second image (bottom) the woman is prominent and is segmented out correctly while her colleagues in the background are not segmented properly.

This is why the technique is called selfie segmentation: it tries to segment out prominent people in the image, and ideally everyone to be segmented should be on a similar scale in the image.

This is why I said that this technique is sort of a mix between saliency detection and semantic segmentation.

Now, you might wonder why we even need another segmentation technique. Why not just segment people using semantic or instance segmentation methods?

Well, Actually we could do that. Models like Mask-RCNN, DeepLabv3, and others are really good at segmenting people.

But here’s the problem.

Although these models provide state-of-the-art results, they are actually really slow, so they aren’t a good fit for real-time applications, especially on CPUs.

This is why the selfie segmentation model that we’ll use today is specifically designed both to segment people and to run in real time on CPUs and other low-end hardware. It’s built on a slight modification of the MobileNetV3 model, which itself contains clever algorithmic innovations for maximum speed and performance gains. To understand more about these algorithmic advances, you can read Google AI’s blog post on this model.

So what are the use cases for Selfie Segmentation?

The most popular use case for this problem is Video Conferencing. In fact, Google Hangouts is using approximately the same model that we’re going to learn to use today.

You can read the Google AI Blog release about this here.

Besides Video Conferencing, there are several other use cases for this model that we’re going to explore today.

MediaPipe:

Mediapipe is a cross-platform tool that allows you to run a variety of machine learning models in real-time. It’s designed primarily for facilitating the use of ML in streaming media.

This is the tool that we’ll be using today in order to use the selfie segmentation model. In future tutorials I’ll also be covering the usage of a few other models and building interesting applications out of them. So stay tuned for those blog posts at Bleed AI.

Alright Now let’s start with the Code!



To get started with Mediapipe, you first need to run the following command to install it

pip install mediapipe

Import the Libraries

Now let’s start by importing the required libraries.

Initialize the Selfie Segmentation Model

The first thing that you need to do is initialize the selfie segmentation class using mp.solutions.selfie_segmentation, and then set up the model by calling .SelfieSegmentation(0). There are two segmentation models in MediaPipe: passing 0 selects the general model, whose input is resized to 256x256x3 (height, width, channels), while passing 1 selects the landscape model, whose input is resized to 144x256x3 (height, width, channels).

You should select the model by taking into account the aspect ratio of the original image, although the landscape model is a bit faster. Both models automatically resize the input image before passing it through the network, and the output segmentation mask has the matching single-channel size: 256x256x1 for the general model or 144x256x1 for the landscape model.

Read an Image

Now let’s read a sample image using the function cv2.imread() and display the image using the matplotlib library.


Application 1: Remove/Replace Background

We will start by learning to use selfie segmentation to change the background of images. But first, we have to convert the image into RGB format, as the MediaPipe library expects images in this format while the function cv2.imread() reads images in BGR format; we will use the function cv2.cvtColor() to do this conversion.

Then we will pass the image to the MediaPipe Segmentation function which will perform the segmentation process and will return a probability map with pixel values near 1 for the indexes where the person is located in the image and pixel values near 0 for the background.

Notice that we have some gray areas in the map; this signifies areas where the model was not sure whether it was looking at the background or the person. So now what we need to do is some thresholding: set all pixels above a certain confidence to white and all other pixels to black.

So in this step, we’re going to threshold the mask above to get a binary black and white mask with pixel value 1 for the indexes where the person is located and 0 for the background.
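The thresholding itself is plain NumPy; a sketch, with an illustrative threshold value:

```python
import numpy as np

def binarize_mask(probability_map, threshold=0.9):
    # Pixels the model is confident about (> threshold) become 1 (person);
    # everything else, including the uncertain gray areas, becomes 0.
    return (probability_map > threshold).astype(np.uint8)
```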

Now we will use the numpy.where() function to create a new image that keeps the pixel values from the original sample image at the indexes where the mask has value 1 (white areas) and replaces the areas where the mask has value 0 (black areas) with 255, giving the subject of the sample image a white background. Right now we’re just adding a white (255) background, but later on we’ll add a separate image as the background.

But to create the required output image we will first have to convert the one-channel mask into a three-channel image using the function numpy.dstack(), as the function numpy.where() needs all of its images to have an equal number of channels.
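Putting numpy.dstack() and numpy.where() together, the compositing step could look like this (the helper name is mine; pass 255 for a white background):

```python
import numpy as np

def replace_background(image, binary_mask, background=255):
    # Stack the single-channel mask into three channels to match the image.
    mask_3c = np.dstack((binary_mask, binary_mask, binary_mask))
    # Keep original pixels where the mask is 1; use the background elsewhere.
    return np.where(mask_3c, image, background)
```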

Now, if instead of a white background you want to add another background image, you just need to replace 255 with that background image in the np.where() function.


Create a Background Modification Function

Now we will create a function that will use selfie segmentation to modify the background of an image depending upon the passed arguments. The following are the modifications that the function will be capable of:

  • Change Background: The function will replace the background of the image with a different provided background image OR it will make the background white for the cases when a separate background image is not provided.
  • Blur Background: The function will segment out the prominent person and then blur out the background.
  • Desaturate Background: The function will desaturate (convert to grayscale) the background of the image, giving the image a very interesting effect.
  • Transparent Background: The function will make the background of the image transparent.

Now we will utilize the function created above with the argument method='changeBackground' to change the backgrounds of a few sample images and check the results.


Change Background On Real-Time Webcam Feed

The results on the images look great, but how will the function we created above fare when applied to our real-time webcam feed? Well, let’s check it out. In the code below we will swap between different background images by pressing the key b on the keyboard.

Output:

Woah! That was cool; not only are the results great, but the model is pretty fast.

Video on Video Background Replacement:

Let’s take this one step further and instead of changing the background by an image, let’s replace it with a video loop.

Output:

That was pretty interesting. Now that you’ve learned how to segment the background successfully, it’s time to make use of this skill and create some other exciting applications out of it.


Application 2: Apply Background Blur

Now this application will actually save you a lot of money.

How?

Well, remember those expensive DSLR or mirrorless cameras that blur out the background? Today you’ll learn to achieve the same effect, in fact an even better one, by just using your webcam.

So now we will use the function created above to segment out the prominent person and then blur out the background.

All we need to do is blur the original image using cv2.GaussianBlur() and then, instead of replacing the background with a new image (like we did in the previous application), replace it with this blurred version of the image. This way the segmented person retains its original form while the rest of the image is blurred out.


Now let’s call the function with the argument method='blurBackground' over some samples. You can control the amount of blur by controlling the blur variable.

Background Blur On Video

Now we will utilize the function created above in a real-time webcam feed where we will be able to blur the background.

Output:

Application 3: Desaturate Background

Now we will use the function created above to desaturate (convert to grayscale) the background of the image. Again the only new thing that we’re doing here is just replacing the black parts of the segmented mask with the grayscale version of the original image.

We will have to pass the argument method='desatureBackground' this time, to desaturate the backgrounds of a few sample images.

Background Desaturation On Video

Now we will utilize the function created above in a real-time webcam feed where we will be able to desaturate the background of the video.

Output:

Application 4: Convert an Image to have a Transparent Background

Now we will use the function created above to segment out the prominent person and make the background of the image transparent; after that, we will store the resultant image on disk using the function cv2.imwrite().

To create an image with a transparent background (a four-channel image) we will need to add another channel, called the alpha channel, to the original image. This channel is a mask that decides which parts of the image are transparent; it can have values from 0 (black) to 255 (white), which determine the level of visibility: black (0) marks the transparent areas and white (255) marks the visible areas.

So we just need to add the segmentation mask to the original image.
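In code, that amounts to scaling the 0/1 mask to 0/255 and stacking it as a fourth channel (the helper name is mine; note the file must be saved as PNG, since JPEG has no alpha channel):

```python
import numpy as np

def make_transparent(image, binary_mask):
    # Alpha channel: 255 (visible) where the person is, 0 (transparent)
    # for the background.
    alpha = binary_mask.astype(np.uint8) * 255
    return np.dstack((image, alpha))

# Save with: cv2.imwrite('output.png', make_transparent(img, mask))
```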

We will have to pass the argument method='transparentBackground' to the function to get an image with transparent background.

You can go to the location where the image is saved, open it up with an image viewer and you’ll see that the background is transparent.

Further Resources

Note: These models work best for the scenarios where the person is close (< 2m) to the camera.


Summary:


Alright, So today we did a lot!

We understood the basic terminology regarding different segmentation techniques. In summary:

  • Image Segmentation: The task of dividing an image into groups of pixels based on some criteria.
  • Semantic Segmentation: In this type we assign a class label to every pixel in the image.
  • Instance Segmentation: Here we label only the pixels of certain selected classes and also differentiate between instances of the same class.
  • Panoptic Segmentation: This approach combines both semantic and instance segmentation.
  • Saliency Detection: Here we’re just interested in segmenting prominent objects in the image regardless of the class.
  • Selfie Segmentation: Here we want to segment prominent people in the image.

We also learned that Mediapipe is an awesome tool to use various ML models in real-time. Then we learned how to perform selfie segmentation with this tool and build 4 different useful applications from it. These applications were:

  • How to remove/replace backgrounds in images & videos.
  • How to desaturate the background to make the person pop out in an image or a video.
  • How to blur out the background.
  • How to give an image a transparent background and save it.

Let me know in the comments! You can also reach out to me personally for a 1-on-1 coaching/consultation session in AI/computer vision regarding your project or your career.

Ready to seriously dive into State of the Art AI & Computer Vision?
Then Sign up for these premium Courses by Bleed AI

This was my first MediaPipe tutorial and I’m planning to write tutorials on a few other models too. If you enjoyed this tutorial then do let me know in the comments! You’ll definitely get a reply from me.