Reverse Engineering Challenge - Sh4ll0 Walkthrough

Introduction

Hello! Welcome to my first blog post! In this post, we will take a look at a simple reverse engineering challenge called Sh4ll0 from the website crackmes.one. My intended audience is noob reverse engineers like myself. However, if you are more experienced, I'd be delighted if you gave this a read and let me know how I can further improve my skills! I tend to be verbose in my walkthroughs. I also refrain from using the decompiled output because I think reading and understanding the disassembly makes us better reverse engineers. If you are a more visual person, I have a YouTube video where I walk through this challenge. You can find that here:

My Sh4ll0 Walkthrough YouTube Video

Optional Materials to Follow Along

If you want to follow along, feel free to download the VM I provide. You can find instructions on importing the VM here. If you don't want to use my VM, that's fine; my feelings won't be shattered. But you will at least need the binary. You can download the binary here. The binary comes in a password-protected zip file. The password is crackmes.one.

You'll also need a disassembler. I recommend IDA or Ghidra. With all of that out of the way, let's get reversing!

Initial Triage

When reversing, I start by running the file command. The file command provides some elementary information about a file (whether the binary is 32- or 64-bit, whether the binary is stripped, etc.). Go ahead and run file on the binary. You should see the following output:

We see that we are dealing with a 64-bit non-stripped binary. Non-stripped binaries are my favorite to analyze because user-defined symbols (like function names) are present in the binary. We can view these symbols using the nm command.

The output format is as follows: address of the symbol, symbol type, and symbol name. You can look at the man page for a full breakdown of the different symbol type identifiers. We see there are two functions of interest: badboy and goodboy. I picked out these two functions because they are not standard C functions. Intuitively, we want to execute the goodboy function because we are good boys and girls, right 😀. We also see the strcmp and scanf functions are used. The author is likely taking our user input and comparing it to some known string. Before we open this in Ghidra, the last thing I like to do is run strings on the binary. strings prints every sequence of printable characters that is at least four characters long. The output is long and contains much of the same information we saw in the nm output. However, we do find out a little bit of extra information.

I am only showing a subset of the strings output above. We see what looks like a prompt "Give me your password: " We also see what looks to be the program output: "Good boy" if we type the correct password, and "Bad boy" if we type the wrong password. Finally, we see a password we can try: "er2rg2e1h94flagergjerj". Since we know the binary takes our input and uses the strcmp function, we can probably safely assume that if we provide the password above, we will get the "Good boy" output. All of this helps us drive our analysis when we open a binary in Ghidra. Before we do that, let's run the binary with the potential password and see what happens.

The program returned a segmentation fault instead of "Good boy" or "Bad boy"

This is not what we were expecting to see in the slightest! We can try different standard passwords, but we will always get a segmentation fault. Clearly, we do not understand the program as well as we think we do. So, let's finally open this up in Ghidra. If you are unfamiliar with creating a project in Ghidra feel free to check out this short blog post detailing all of the steps.

Analyzing the Binary in Ghidra

In my YouTube videos, I always create a "pseudo.c" file. This file will hold the disassembly instructions translated to C-like syntax. I'll be updating this as we learn about the binary. Let's start with the first few lines of the main program.

Right away, we see some familiar strings. The prompt "Give me your password" gets loaded into the EDI register, and the printf function is called a few lines after that. We also see the fake flag is stored in a variable local_10 at address 0x400639. Since this is not a memorable variable name, let's rename it to fake_flag. In Ghidra, you can rename stack variables by clicking on the variable you want to rename, local_10, and pressing the L key. The L key is mapped to two different actions, so Ghidra will ask which action to perform. Select "Edit Label" and press Ok.

Rename the variable to fake_flag. You'll see the change reflected in the disassembly.

As you can see, the variable has been renamed. Wherever this variable is used within the main function, it will have our updated name. The C-like syntax for the lines above roughly translates to the following:

fake_flag = "er2rg2e1h94flagergjerj"; // MOV [RBP+fake_flag], "er2rg2e1h94flagergjerj"
/*
MOV EDI, "Give me your password: "
MOV EAX, 0x0
CALL printf
*/
printf("Give me your password: ");

In the next code segment, we see two variables (local_2b and local_11) get assigned the values 0x25 and 0x73, respectively. I like to rename everything because the default names are difficult to remember. Since we do not know the variables' purpose yet, let's rename them after their assigned values. So, local_2b becomes is_0x25 and local_11 becomes is_0x73.

At address 0x400649, we see the value 0x73 is loaded into the EAX register and then XORed with 3. So, the EAX register holds 0x73 ^ 0x3 -> 0x70. At address 0x400650, we see the variable local_2a is assigned the value of the AL register. The AL register holds the lower 8 bits of the EAX register, so this is essentially assigning 0x70 to local_2a. Right below that, we see local_29 get assigned 0x0. Just as we did before, let's rename these variables after the values they have been assigned. I've done that below.

Now we get to the instructions at addresses 0x400657 and 0x40065b, both of which use the LEA instruction. This instruction stands for "Load Effective Address." LEA computes the address described by its operand and stores that address, not the memory contents, in the destination register. Ghidra makes the instruction a little more confusing by adding local_28 in the instruction. The actual instruction looks like LEA RDX, [RBP + -0x20]. Ghidra adds the local_28 identifier because [RBP + -0x20] is the location of a variable on the stack, which Ghidra named local_28. So, it is essentially storing the address of local_28 in the RDX register. Similarly, the instruction below is LEA RAX, [RBP + -0x23], which stores the address of is_0x25 in the RAX register. The following instructions set up the scanf function call. In 64-bit binaries, arguments are passed via registers. The RDI and RSI registers hold the first and second arguments, respectively. With this in mind, we can infer that local_28 is the user input and is_0x25 is the format specifier. I've renamed the variables to reflect this new information.

Note: The EAX register is set to 0. If you recall, EAX was set to 0 earlier, before the call to printf. Functions with a variable number of arguments use the AL register to hold the number of vector registers. Vector registers are usually used to hold floating-point values. So, printf("%f", 1.0f); would result in EAX being set to 1.

You might notice a problem here. A format specifier needs a conversion character that tells scanf what type of data to read into the variable (e.g. s, x, f). At first, I assumed the format specifier was %s, but you know the saying about making assumptions. The author did a little tomfoolery here. Instead of using a static format specifier, the author calculated the specifier and assigned the result to a variable. If that doesn't make sense, hopefully, it will be clear when I write the code.

int is_0x73 = 0x73;
char format_specifier[3];
format_specifier[0] = '%'; // MOV [RBP+format_specifier], 0x25
format_specifier[1] = is_0x73 ^ 3; // MOV [RBP+is_0x70], AL
format_specifier[2] = '\0'; // MOV [RBP+is_0x0], 0x0

Of course, I'm not 100% sure this is verbatim what the author wrote, but it makes sense given the disassembly output. We can infer the format specifier is %p (0x70 is p in ASCII). So our pseudo.c file looks like this now:

fake_flag = "er2rg2e1h94flagergjerj"; // MOV [RBP+fake_flag], "er2rg2e1h94flagergjerj"
/*
MOV EDI, "Give me your password: "
MOV EAX, 0x0
CALL printf
*/
printf("Give me your password: ");
int is_0x73 = 0x73;
char format_specifier[3];
format_specifier[0] = '%'; // MOV [RBP+format_specifier], 0x25
format_specifier[1] = is_0x73 ^ 3; // MOV [RBP+is_0x70], AL
format_specifier[2] = '\0'; // MOV [RBP+is_0x0], 0x0

Now, let's look further down and see how our user input is used.

We see user_input gets loaded into the RAX register. Then, the is_0x73 variable is loaded into the RDX register and negated. Next, at address 0x40067b, the RDX register is added to the RAX register. This essentially subtracts 0x73 from our user_input. The result is written back to the user_input variable and is still available in the RAX register. Finally, we get to the crux of the code. At address 0x400686, the RAX register is used as the operand of the CALL instruction. Our user_input is used as a function pointer. This explains why we received a segmentation fault. If it's not clear, I'll provide a brief explanation.

When a program is executed, it is given an address space that it can use for all of its resources required, like functions and variables. A binary can ONLY access the memory inside this address space. Any attempt to access memory outside of its allowable address space will result in a segmentation fault because the program isn't allowed to access that memory. If you're a terrible programmer like me, you received many of these errors in your programming classes when using pointers.

The code looks something like this:

/*
MOV RAX, [RBP+user_input]
MOVSX RDX, [RBP+is_0x73]
NEG RDX
ADD RAX, RDX
MOV [RBP+user_input], RAX
*/
user_input -= 0x73; // The above assembly block translates to this line of C
(*user_input)(); // CALL RAX

Debugging Sh4ll0.bin with GDB w/ GEF

Now let's talk about what happened when we provided the fake flag "er2rg2e1h94flagergjerj" to the program. I'll use a debugger to demonstrate.

Let's open this up in gdb. If you installed gef, it'll look like this:

Because we have symbols, we can disassemble the main function by typing disas main.

Disassembling the main function in GDB

Let's set a breakpoint on the instruction right after the call to scanf, the MOV that loads our input into the RAX register at address 0x40066f, by typing b *0x40066f. Then run the program by typing r. When you run the program, you'll get prompted to insert a password. Insert the password from earlier: "er2rg2e1h94flagergjerj".

After you type in the password, the program will pause at the address we specified. This is just before our user input is placed in the RAX register. In gdb, our user input is represented as [rbp-0x20]. Unfortunately, we cannot change this identifier. At least I don't know how to do so. Let's execute this instruction and pause. We can do this by typing stepi (short for "step instruction"). Alternatively, you can simply type si, which has the same effect. Once you do that, take note of the value stored in the RAX register.

RAX holds the value 0xe even though we provided er2rg2e1h94flagergjerj

The RAX register holds the value 0xe, but we typed er2rg2e1h94flagergjerj. Why is that? Because the format specifier, %p, expects a pointer, and scanf reads a pointer as a hexadecimal number. Since e is a valid hexadecimal digit but r is not, scanf stopped after the e: only 0xe was stored in our user_input and the rest was discarded. You might already see the problem. When we reach the instruction at address 0x40067b, we subtract 0x73 (held, negated, in the RDX register) from the value stored in the RAX register (0xe).

You can see the results by stepping through the instructions 3 times. So, type si and press enter 3 times. Now, look at the value stored in the RAX register.

RAX after subtracting 0x73 is 0xffffffffffffff9b

The value in RAX is 0xffffffffffffff9b. This address is going to be used in the CALL instruction at address 0x400686. This is problematic because this address does not fall within the allowable address space for this program, thus resulting in a segmentation fault.

Let's solve this challenge already!

Alright, now let's put everything together and solve this challenge. Let's take a look at our completed C code.

char *fake_flag = "er2rg2e1h94flagergjerj";
printf("Give me your password: ");
int is_0x73 = 0x73;
void (*user_input)(); // Creating a void function pointer because the function we want to call does not return a value nor does it accept an argument
char format_specifier[3];
format_specifier[0] = '%';
format_specifier[1] = 'p';
format_specifier[2] = '\0';
scanf(format_specifier, &user_input);
user_input -= is_0x73; // We must add 0x73 to the address of the function we want to call
(*user_input)();

With this in mind, it is clear that we have to provide the program the address of the goodboy function (0x4005f2) plus 0x73. Where did I get the address? If you recall, when we ran the nm command earlier, we saw the address of the goodboy function. You can also see it in Ghidra by filtering for goodboy in the symbol tree and double-clicking on the function name.

This function simply prints out the string "Good boy" and exits. So, let's test our theory: run the program and provide 400665 (0x4005f2 + 0x73).

If you wanted to live on the edge you could also execute the badboy function which prints out "Bad boy". I'll leave that for you to try on your own if you wish.

Conclusion

Alright, that's it for this challenge. I hope you learned something new and enjoyed reading this post. Feel free to check out my YouTube channel and/or other blog posts! If you have any questions feel free to reach out to me on Twitter, Instagram, or Discord: jaybailey216#6540. If you have a challenge you would like me to try, let me know and I'll give it a shot! I'll see you all next time!

Peace out! ✌🏾