
Controlling Subway Surfers Game with Pose Detection using Mediapipe and Python

By Taha Anwar and Rizwan Naeem

On August 9, 2021


In last week’s tutorial, we learned how to work with real-time pose detection and created a pose classification system. In this week’s tutorial, we’ll learn to play a popular game called “Subway Surfers”.

Of course, there’s more to it; this is an AI blog after all.

We will actually be using our body pose to control the game, not keyboard controls. The entire application will run in real-time on your CPU; you don’t even need a depth camera or a Kinect, your webcam will suffice.

Excited yet? Let’s get into it, but before that, let me tell you a short story that motivated me to build this application today. It starts with me giving a lecture on the importance of physical fitness. I know … I know … how this sounds, but just bear with me for a bit.

Hi all, Taha Anwar here. So here’s the thing: one of the things I enjoyed most in my early teenage years was having a fast metabolism due to my involvement in physical activities. I could eat whatever I wanted, make no conscious effort to exercise, and still stay fit.

But as I grew older and started spending most of my time in front of a computer, I noticed that I was gaining weight. I could no longer afford the luxury of binging on unhealthy food and skipping workouts.

Now I’m a bit of a foodie, so although I could compromise a bit on what I eat, I still needed to cut weight some other way. I quickly realized that unless I wanted to get obese, I needed to make a conscious effort to work out.

That’s about when I joined a local gym in my area, and guess what? … it didn’t work out (or I didn’t work out … enough 🙁), so I quit after a month.

So what was the reason? Well, I could provide multiple excuses, but to be honest, I was just lazy.

A few months later I joined the gym again, and again I quit after just two months.

Now I could have given up completely, but instead, 8 months back, I tried again. This time I even hired a trainer to keep me motivated, and as they say, the third time’s a charm, and luckily it was!

Eight months in, I’m still at it. I did see results and lost a couple of kilograms, though I haven’t reached my personal target yet, so I’m still working towards it.

If you’re reading this post, then you’re probably into computer science just like me, and you most likely spend a lot of time in front of a PC. Because of that, your physical and mental fitness can take a toll. I seriously can’t stress enough how important it is that you take out a couple of hours each week to exercise.

I’m not a fitness guru but I can say working out has many key benefits:

  • Helps you shed excess weight, keeps you physically fit.
  • Gives you mental clarity and improves your work quality.
  • Lots of health benefits.
  • Helps you get a partner, if you’re still single like me … lol

Because of these reasons, even though I have an introverted personality, I consciously take out a couple of hours each week to go to the gym or the park for running.

But here’s the thing: sometimes I wonder why I can’t combine what I do (working on a PC) with some physical activity so I could … you know, kill two birds with one stone.

This thought led me to create this post today. What I did was create a vision application that allows me to control a very popular game called Subway Surfers via my body movements by utilizing real-time pose detection.

And so in this tutorial, I’ll show you how to create this application that controls the Subway Surfers game using body gestures and movements, so that you can exercise, code, and have fun at the same time.


How will this Work?

This game is about a character running from a policeman, dodging different hurdles by jumping, crouching, and moving left and right. So we only need to worry about four controls, which are normally mapped to the keyboard:

  • Up arrow key to make the character jump
  • Down arrow key to make the character crouch
  • Left arrow key to move the character left
  • Right arrow key to move the character right

Using the PyAutoGUI library, we will automatically trigger the required keypress events, depending upon the body movements of the person, which we’ll capture using MediaPipe’s Pose Detection model.

I want the game’s character to:

  • Jump whenever the person controlling the character jumps.
  • Crouch whenever the person controlling the character crouches.
  • Move left whenever the person controlling the character moves to the left side of the screen.
  • Move right whenever the person controlling the character moves to the right side of the screen.

You can also use the techniques you’ll learn in this tutorial to control any other game. The simpler the game, the easier it will be to control. I have actually published two tutorials about game control via body gestures.

Alright now that we have discussed the basic mechanisms for creating this application, let me walk you through the exact step-by-step process I used to create this.

Outline

  1. Step 1: Perform Pose Detection
  2. Step 2: Control Starting Mechanism
  3. Step 3: Control Horizontal Movements
  4. Step 4: Control Vertical Movements
  5. Step 5: Control Keyboard and Mouse with PyautoGUI
  6. Step 6: Build the Final Application

Alright, let’s get started.


Import the Libraries

We will start by importing the required libraries.

Initialize the Pose Detection Model

After that, we will initialize the mp.solutions.pose class, call the mp.solutions.pose.Pose() function with appropriate arguments, and also initialize the mp.solutions.drawing_utils class that is needed to visualize the landmarks after detection.

Step 1: Perform Pose Detection

To implement the game control mechanisms, we will need the current pose info of the person controlling the game, as our intention is to control the character with the movements of the person in the frame. We want the game’s character to move left, move right, jump, and crouch in sync with the person’s movements.

So we will create a function detectPose() that takes an image as input and performs pose detection on the person in the image using MediaPipe’s pose detection solution to get thirty-three 3D landmarks on the body. The function will display the results or return them depending upon the passed arguments.


This function is quite similar to the one we had created in the previous post. The only difference is that we are not plotting the pose landmarks in 3D and we are passing a few more optional arguments to the function mp.solutions.drawing_utils.draw_landmarks() to specify the drawing style.

You probably do not want to lose control of the game’s character whenever some other person comes into the frame (and starts controlling the character), so that annoying scenario is already taken care of, as the solution we are using only detects the landmarks of the most prominent person in the image.

So you do not need to worry about losing control as long as you are the most prominent person in the frame as it will automatically ignore the people in the background.

Now we will test the function detectPose() created above to perform pose detection on a sample image and display the results.


It worked pretty well! If you want, you can test the function on other images too by just changing the value of the variable IMG_PATH in the cell above. It will work fine as long as there is a prominent person in the image.

Step 2: Control Starting Mechanism

In this step, we will implement the game starting mechanism. What we want is to start the game whenever the most prominent person in the image/frame joins both hands together. So we will create a function checkHandsJoined() that will check whether the hands of the person in an image are joined or not.

The function checkHandsJoined() will take in the results of the pose detection returned by the function detectPose() and will use the coordinates of the LEFT_WRIST and RIGHT_WRIST landmarks from the list of thirty-three landmarks to calculate the euclidean distance between the hands of the person.


It will then compare that distance with an appropriate threshold value to check whether the hands of the person in the image/frame are joined or not, and will display or return the results depending upon the passed arguments.
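The distance check itself is plain geometry. Here is a dependency-free sketch of that logic, using hypothetical pixel coordinates for the wrist landmarks and an illustrative threshold value:

```python
import math

def check_hands_joined(left_wrist, right_wrist, threshold=130):
    '''Classify the hands as joined if the euclidean distance between the
    (x, y) pixel coordinates of the two wrists is below the threshold.'''
    distance = int(math.hypot(left_wrist[0] - right_wrist[0],
                              left_wrist[1] - right_wrist[1]))
    status = 'Hands Joined' if distance < threshold else 'Hands Not Joined'
    return status, distance
```

In the real application, the wrist coordinates come from results.pose_landmarks (LEFT_WRIST and RIGHT_WRIST) scaled by the frame’s width and height, and the threshold needs tuning for your distance from the camera.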

Now we will test the function checkHandsJoined() created above on a real-time webcam feed to check whether it is working as we had expected or not.

Output Video:

Woah! I am stunned. This pose detection solution is best known for its speed, which is reflected in the results: the distance and the hands status update very fast and are also highly accurate.

Step 3: Control Horizontal Movements

Now comes the implementation of the left and right movement control mechanism of the game’s character. What we want is to make the game’s character move left and right with the horizontal movements of the person in the image/frame.

So we will create a function checkLeftRight() that will take in the pose detection results returned by the function detectPose() and will use the x-coordinates of the RIGHT_SHOULDER and LEFT_SHOULDER landmarks to determine the horizontal position (Left, Right, or Center) in the frame, after comparing the landmarks with the x-coordinate of the center of the image.

The function will visualize or return the resultant image and the horizontal position of the person depending upon the passed arguments.
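The classification reduces to comparing the shoulder x-coordinates with the frame’s vertical center line. A minimal sketch of that comparison, assuming pixel coordinates (the function name and signature are illustrative):

```python
def check_left_right(left_shoulder_x, right_shoulder_x, frame_width):
    '''Return 'Left' if both shoulders are left of the center line,
    'Right' if both are right of it, and 'Center' if they straddle it.'''
    mid_x = frame_width // 2
    if left_shoulder_x <= mid_x and right_shoulder_x <= mid_x:
        return 'Left'
    if left_shoulder_x >= mid_x and right_shoulder_x >= mid_x:
        return 'Right'
    return 'Center'
```

Note that with a horizontally flipped (selfie-view) frame, the person’s left shoulder appears on the right side of the image, which is why the function checks both landmarks instead of assuming an order.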


Now we will test the function checkLeftRight() created above on a real-time webcam feed and will visualize the results updating in real-time with the horizontal movements.

Output Video:

Cool! The speed and accuracy of this model never fail to impress me.

Step 4: Control Vertical Movements

In this step, we will implement the jump and crouch control mechanism of the game’s character. What we want is to make the game’s character jump and crouch whenever the person in the image/frame jumps and crouches.

So we will create a function checkJumpCrouch() that will check whether the posture of the person in an image is Jumping, Crouching, or Standing by utilizing the results of pose detection returned by the function detectPose().

The function checkJumpCrouch() will retrieve the RIGHT_SHOULDER and LEFT_SHOULDER landmarks from the list to calculate the y-coordinate of the midpoint of both shoulders and will determine the posture of the person by doing a comparison with an appropriate threshold value.

The threshold (MID_Y) will be the approximate y-coordinate of the midpoint of both shoulders of the person while in standing posture. It will be calculated before starting the game in Step 6: Build the Final Application and will be passed to the function checkJumpCrouch().

But the issue with this approach is that the midpoint of both shoulders of the person in standing posture will not always be exactly the same, as it will vary when the person moves closer to or further from the camera.

To tackle this issue we will add and subtract a margin to the threshold to get an upper and lower bound as shown in the image below.
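In code, the bounds check can be sketched like this; the margin of 15 pixels is an illustrative value, not necessarily the one used in the downloadable code:

```python
def check_jump_crouch(shoulders_mid_y, MID_Y, margin=15):
    '''Compare the current y-coordinate of the shoulders' midpoint with the
    standing reference MID_Y (y grows downward in image coordinates).'''
    lower_bound = MID_Y - margin   # above this line -> jumping
    upper_bound = MID_Y + margin   # below this line -> crouching
    if shoulders_mid_y < lower_bound:
        return 'Jumping'
    elif shoulders_mid_y > upper_bound:
        return 'Crouching'
    return 'Standing'
```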


Now we will test the function checkJumpCrouch() created above on the real-time webcam feed and will visualize the resultant frames. For testing purposes, we will use a default threshold value, which you can tune manually according to your height.

Output Video:

Great! When I lower my shoulders a certain range below the horizontal line (threshold), the result is Crouching; whenever my shoulders are near the horizontal line (i.e., between the upper and lower bounds), the result is Standing; and when my shoulders are a certain range above the horizontal line, the result is Jumping.

Step 5: Control Keyboard and Mouse with PyautoGUI

The Subway Surfers character wouldn’t be able to move left, right, jump, or crouch unless we provide it the required keyboard inputs. Now that we have the functions checkHandsJoined(), checkLeftRight(), and checkJumpCrouch(), we need to figure out a way to trigger the required keyboard keypress events, depending upon the output of the functions created above.

This is where the PyAutoGUI API shines. It allows you to easily control mouse and keyboard events through scripts. To get an idea of PyAutoGUI’s capabilities, you can check this video in which a bot is playing the game Sushi Go Round.

To run the cells in this step, it is not recommended to use the keyboard shortcut (Shift + Enter), as the cells with keypress events will behave differently when those events are combined with the Shift and Enter keys. You can either use the menubar (Cell>>Run Cell) or the toolbar (▶️Run) to run the cells.

Now let’s see how simple it is to trigger the up arrow keypress event using pyautogui.

Similarly, we can trigger the down arrow or any other keypress event by replacing the argument with that key name (the argument should be a string). You can click here to see the list of valid arguments.

To press multiple keys, we can pass a list of strings (key names) to the pyautogui.press() function.

Or to press the same key multiple times, we can pass a value (number of times we want to press the key) to the argument presses in the pyautogui.press() function.

This function presses the key(s) down and then releases them automatically. We can also control the key press and key release events individually by using the functions:

  • pyautogui.keyDown(key): Presses and holds down the specified key.
  • pyautogui.keyUp(key): Releases the specified key.

So with the help of these functions, keys can be held down for a longer period. In the cell below, we will hold down the shift key, press the enter key twice (to run the two cells below this one), and then release the shift key.

Now we will hold down the shift key and press the tab key and then we will release the shift key. This will switch the tab of your browser so make sure to have multiple tabs before running the cell below.

To trigger mouse button press events, we can use the pyautogui.click() function, and to specify the mouse button that we want to press, we can pass the value left, middle, or right to the argument button.

We can also move the mouse cursor to a specific position on the screen by specifying the x and y-coordinate values to the arguments x and y respectively.

Step 6: Build the Final Application

In the final step, we will have to combine all the components to build the final application.

We will use the outputs of the functions created above checkHandsJoined() (to start the game), checkLeftRight() (control horizontal movements) and checkJumpCrouch() (control vertical movements) to trigger the relevant keyboard and mouse events and control the game’s character with our body movements.

Now we will run the cell below and click here to play the game in our browser using our body gestures and movements.

Output Video:

While building big applications like this one, I always divide the application into smaller components and then, in the end, integrate all those components to make the final application.

This makes it really easy to learn and understand how everything comes together to build up the full application.

Join My Upcoming Computer Vision For Building Cutting Edge Applications Course

A Course that goes beyond basic applications and teaches you how to create some next-level apps that utilize physics, deep learning (LSTM + CNN) + classical image processing, hand and body gestures to do a variety of very interesting things.

Summary:

In this tutorial, we learned to perform pose detection on the most prominent person in the frame/image to get thirty-three 3D landmarks, then used those landmarks to extract useful info about the person’s body movements (horizontal position: left, center, or right; posture: jumping, standing, or crouching), and finally used that info to control a simple game.

Another thing we learned is how to trigger mouse and keyboard events programmatically using the PyAutoGUI library.

Now one drawback of controlling the game with body movements is that the game becomes much harder compared to controlling it via keyboard presses. 

But our aim, to make exercise fun and to learn to control Human-Computer Interaction (HCI) based games using AI, is achieved. Now if you want, you can extend this application further to control a much more complex application.

Let me know in the comments. You can also reach out to me personally for a 1-on-1 coaching/consultation session in AI/computer vision regarding your project or your career.

Ready to seriously dive into State of the Art AI & Computer Vision?
Then Sign up for these premium Courses by Bleed AI
