Reverse Engineering Challenge

Introduction

Hello and welcome back! Today we are going to solve the sh4ll1 challenge from crackmes.one. The name should sound pretty familiar! I solved sh4ll0 a while back. This is by the same author but this time the challenge is a little tricky. The description for this challenge gives us a tip saying there's noise in the stack. Let's keep this in mind as we analyze the binary. You're always welcome to take a look at my YouTube video!

Sh4ll1 Video Walkthrough

Optional Materials to Follow Along

If you want to follow along you can use my VM or you can use your own. No pressure! At the very least you'll need the binary. The binary comes in a password protected zip file. The password is: crackmes.one. I would also recommend a disassembler like Ghidra or IDA. With all of that out of the way, let's get reversing.

Initial Triage

Let's start by running file on the binary.

Great, it looks like we have symbols. Let's take a look at the symbols.

There are two functions that are interesting: systemo and systemv. These are interesting only because it appears they are user defined functions. Also notice the GLIBCXX_3.4 suffix on a few of the symbols. This indicates that the source for this binary was C++ and not C as we are used to seeing. Be prepared to see some disgusting disassembly. Well it's not that bad but C++ code is usually gross. Before we get to the disassembly let's go ahead and run strings.

Just like in the author's previous challenge, we have three messages: a prompt (Password:), a success message (Good Password), and an error message (Bad password). That's about all we can gather from the binary at the moment. Let's go ahead and open this up in IDA.

Static Analysis with IDA

This is pretty interesting. The main function calls two functions: systemv and systemo. Let's take a look at the systemv function.

This function looks a little strange. It is storing values in the stack variables. Interestingly, we don't see space being allocated for these variables. Since these are integers, we would expect to see at least 12 subtracted from the RSP register. The size of an integer is usually 4 bytes and since there are 3 variables we get 3 * 4 which is 12 (compilers typically round this up to 16 for alignment). This leads me to believe that this function contains inline assembly. That said, inline assembly is not the only possible explanation: on x86-64 Linux, the System V ABI gives leaf functions a 128-byte red zone below RSP that they can use without adjusting RSP, so ordinary local variables in an unoptimized build can produce the same stores. For this walkthrough, we'll assume this function looks something like this:

void systemv()
{
        asm(R"(
                movl $0x5, -0x4(%rbp)
                movl $0x7, -0x8(%rbp)
                movl $0x1f5, -0xC(%rbp)
        )");
}
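For comparison, here is a sketch of one other way to end up with the same stores without any inline assembly. It assumes an x86-64 Linux target (System V ABI) and an unoptimized GCC-style build; the function name systemv_plain and the variable names are made up for illustration and do not come from the binary. Under that ABI, a leaf function may use the 128 bytes below RSP (the red zone) without moving RSP, so plain locals typically compile to mov instructions relative to RBP with no sub rsp.

```cpp
#include <cassert>

// Hypothetical plain-C++ variant of systemv (names are illustrative only).
// volatile keeps the three stores from being optimized away.
void systemv_plain()
{
    volatile int a = 0x5;    // a 4-byte local in the function's own frame area
    volatile int b = 0x7;    // second 4-byte local
    volatile int c = 0x1f5;  // third 4-byte local
    (void)a; (void)b; (void)c;  // silence unused-variable warnings
}
```

The disassembly alone may not be enough to tell whether the author wrote something like this or used inline assembly, so treat both as guesses.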

The beauty of C/C++ is you have this level of control which you don't get in languages like Java or Python. This control can be great if you know what you're doing but C/C++ will allow you to shoot yourself in the foot. In fact, it gives you the gun! You might notice the assembly is a little different than what we are used to seeing. That's because GCC's inline assembly uses AT&T syntax by default, whereas IDA and Ghidra use Intel syntax by default. Alright, so we know that these values get stored in memory. Let's take a look at the systemo function.

The function prolog looks more like what we expect. We see that the RSP register is subtracted by 16 (0x10). This means that the stack frame for systemo is essentially the same as the stack frame for systemv. What does that mean? systemo essentially has access to the stack variables from systemv. Pretty interesting if you ask me. If that isn't clear, I'll show you what this looks like in gdb at the end of this blog post. You'll be able to see the systemo stack contain variables that the systemv function "created." With this knowledge we have just gained, let's go ahead and rename the variables to reflect the values they hold. These are the same values from the systemv function.

Now, let's go ahead and analyze this function with this new information in mind. We see that is_0x7 is added to is_0x5. The result is stored in is_0x5. We then see the result is multiplied by 0x2D and stored in the is_0x1F5 variable. So, is_0x1F5 holds ((5 + 7) * 45), which is 540. We see var_10 is initialized to 0. If you're confused by the 3 instructions after that, you are not alone. This confused me for a while. We are used to seeing a string get loaded in a register when we see a printf call; however, it is a little different for cout and cin. These are actually objects: cout is an instance of the ostream class and cin is an instance of the istream class. The stream object gets passed to the insertion operator (<<) in the case of cout, which is why we see cout get loaded into the RDI register. Similarly, a few lines down we'll see cin get passed to the extraction operator (>>). But for now, "Password:" and cout are passed to <<, which equates to cout << "Password:"; We then see var_10 and cin get passed to >>, which means var_10 is our user_input. Then we see a simple comparison between our user input and is_0x1F5, which, if you remember, gets set to 540. So, the password for this challenge is 540. I wonder if this is at all related to the skateboard trick? 🤔 Anyway, let's test our theory.
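To make the operator calls and the arithmetic concrete, here is a minimal sketch. The names expected_password and check_password are hypothetical, and the streams are passed in as parameters (instead of using cout and cin directly) purely so the logic can be exercised without a terminal. It writes << and >> as the ordinary function calls they really are, which is why the disassembly loads a stream object into RDI as the first argument.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// The value the binary compares against: (5 + 7) * 45.
constexpr int expected_password()
{
    return (0x5 + 0x7) * 0x2D;
}
static_assert(expected_password() == 540, "(5 + 7) * 45 should be 540");

// Hypothetical helper mirroring the prompt/read/compare sequence.
bool check_password(std::istream& in, std::ostream& out)
{
    int user_input = 0;
    std::operator<<(out, "Password: ");  // same as: out << "Password: ";
    in.operator>>(user_input);           // same as: in >> user_input;
    return user_input == expected_password();
}
```

Calling check_password(std::cin, std::cout) would behave like the prompt in the challenge, minus the success and error messages.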

And we were right!

In case you were wondering, the pseudo C++ code for this function is below:

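#include <iostream>
using namespace std;

void systemo()
{
    // Deliberately left uninitialized: their values come from whatever systemv left on the stack.
    int is_0x5, is_0x7, is_0x1F5, user_input = 0;
    is_0x5 = is_0x5 + is_0x7;    // 5 + 7 = 12
    is_0x1F5 = is_0x5 * 0x2D;    // 12 * 45 = 540
    cout << "Password: ";
    cin >> user_input;
    if (user_input == is_0x1F5)
    {
        cout << "Good password" << endl;
    }
    else
        cout << "Bad password" << endl;
}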

Notice I explicitly defined local variables in this function. This is required in order to set up the stack frame properly.

An Extremely Brief Primer on Stack Frames

A little bit of background knowledge. When a function runs, a stack frame is created. This stack frame holds all of the local variables, arguments passed from another function, and the return address for the function's caller. Alright, let's run the program in gdb. If you're using my VM, you'll notice that gdb has the gef extensions. This will allow us to view the stack. Let's take a look at the main function's stack.

This is the main stack frame. The RSP register (0x00007fffffffdfa0) represents the top of the stack while RBP (0x00007fffffffdfa0) represents the bottom of the current stack frame. The reason both are the same in this case is that the author did not pass argc or argv as arguments to the main function. Had this happened, we would have seen a sub rsp, 0x10 instruction, which would make room for the argc and argv variables. However, once we execute the call systemv instruction, the return address gets pushed onto the stack. The return address is exactly what it sounds like: it's the address in main that the instruction pointer (RIP) will point to after the systemv function ends. Whenever anything is pushed onto the stack, the stack pointer (RSP) is decremented by 8. Let's step into the systemv function and take a look at the stack frame when the function is first called and after the function prolog.

systemv stack frame before function prolog executes

systemv stack frame after function prolog

As you can see, before the function prolog, RSP has been decremented by 8 but RBP remains the same. Also note the return address (0x0000555555554a96) is at the top of the stack. Now remember, this function executes 3 instructions before it returns: mov [rbp-0x4], 0x5, mov [rbp-0x8], 0x7, and mov [rbp-0xc], 0x1F5. After this function returns, RSP and RBP will be restored to the values they held before the function call, but the three values it wrote are still sitting in memory below the stack pointer. Then we call the systemo function. Let's step into that function and skip to the end of the function prolog, that is, just after the sub rsp, 0x10 instruction executes, and take a look at the stack.

systemo stack frame contains the variables defined in the previous function

Well, would you look at that! We recognize these values from earlier! Because the systemv function technically never created a stack frame, the systemo function will have a similar stack layout. Therefore, when you create variables in the systemo function, they take the values previously set by the systemv function. I really hope that makes sense! I know it's a little confusing, especially if you don't have any experience with the stack. If you have any questions feel free to hit me up! My contact info is at the bottom of this blog post! This will only work if there is no other function call in between systemv and systemo, since another call would overwrite those stack slots. If you're using my VM, I created a few different variations of this program that will produce the expected result and some that will not. Feel free to play around with them. If you aren't using my VM you can find the source code here.
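If the gdb output is hard to picture, the following is a toy model of the same idea. It is only a sketch: the stack is simulated as a byte array, and every name in it (ToyStack, toy_systemv, toy_clobber, toy_systemo, the return addresses) is made up for illustration. Reading truly uninitialized locals in real C++ is undefined behavior, so the model reproduces the mechanics by hand: a call pushes a return address, the prolog pushes RBP and sets the new frame base, and the locals live at fixed offsets below RBP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model of a downward-growing stack (offsets into mem stand in for addresses).
struct ToyStack {
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(256, 0);
    std::size_t rsp = 256;  // top of the stack
    std::size_t rbp = 256;  // base of the current frame

    void push(std::uint64_t v) { rsp -= 8; std::memcpy(&mem[rsp], &v, 8); }
    std::uint64_t pop() { std::uint64_t v; std::memcpy(&v, &mem[rsp], 8); rsp += 8; return v; }

    // 32-bit store/load at [rbp - below]
    void store32(std::size_t below, std::int32_t v) { std::memcpy(&mem[rbp - below], &v, 4); }
    std::int32_t load32(std::size_t below) const { std::int32_t v; std::memcpy(&v, &mem[rbp - below], 4); return v; }

    void call(std::uint64_t ret) { push(ret); }  // call pushes the return address
    void prolog(std::size_t locals) { push(rbp); rbp = rsp; rsp -= locals; }
    void epilog() { rsp = rbp; rbp = static_cast<std::size_t>(pop()); pop(); }  // restore rbp, then ret
};

// Writes three values below RBP without reserving any space (no sub rsp).
void toy_systemv(ToyStack& s)
{
    s.call(0x1111);  // hypothetical return address
    s.prolog(0);
    s.store32(0x4, 0x5);
    s.store32(0x8, 0x7);
    s.store32(0xC, 0x1F5);
    s.epilog();
}

// A different call in between: its own locals land on the same slots.
void toy_clobber(ToyStack& s)
{
    s.call(0x3333);  // hypothetical return address
    s.prolog(0x10);
    s.store32(0x4, 0);
    s.store32(0x8, 0);
    s.epilog();
}

// Reserves 0x10 bytes and reads its "uninitialized" locals from the same offsets.
int toy_systemo(ToyStack& s)
{
    s.call(0x2222);  // hypothetical return address
    s.prolog(0x10);
    int a = s.load32(0x4);
    int b = s.load32(0x8);
    int result = (a + b) * 0x2D;
    s.epilog();
    return result;
}
```

Calling toy_systemv and then toy_systemo on the same ToyStack yields 540, while slipping toy_clobber in between yields a different value, which mirrors why the trick depends on the two functions being called back to back.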

Conclusion

This challenge was a little on the easier side but it did require us to know about stack frames and how they can be manipulated. I hope you all enjoyed this and learned something from this tutorial. If you have any questions feel free to hit me up on Twitter, Instagram, or Discord: jaybailey216#6540. If you have a challenge you want me to try next, let me know and I'll give it a shot! I'll see you all next time!

Peace out! ✌🏾

Reverse Engineering Challenge - Sh4ll1

Joshua Bailey