In 2014 the inaugural FLARE On presented seven challenges. I was one of the finishers, and you can read my write-up here. Each participant has a different take on the challenges, with different methods, skills, and strengths. Mine are forged by years of forensics, log analysis, and working a mission where results are required regardless of ability, training, or excuses. At the end of this post I’ve linked to other write-ups that I’ve seen.
Let’s begin by setting a level of expectation. You are reading a blog named GhettoForensics. The ultimate goal of Ghetto Forensics is to get by with whatever tools and knowledge you have to complete a mission. You will not find first-rate techniques and solutions here. In fact, when presented with multiple options, I often went out of my way to choose the worst, most cringe-worthy option available. For the lulz, and to show that you don’t need advanced reverse engineering training and experience to survive the industry. I hope you enjoy it.
For simplicity’s sake, unless otherwise necessary, all IDA output will be shown decompiled.
Without further ado.
Let’s roll up our sleeves and … oh, never mind, there’s the routine.
Filename Etymology: i_am_happy_you_are_to_playing_the_flareon_challenge
This is a routine where I re-implemented the instructions step by step: load the values into a Python script, mimic each operation, and after each step make sure my script produces the same result as the debugger, until the whole routine is done. The challenge takes an encoded value stored in-line with the code and decodes it. This value is best seen referenced in a debugger, but it is shown here statically:
When executed, this script prints the email address of:
Filename Etymology: very_success
Again a Borat-style filename? Would ‘rol rol rol your boat’ be too offensive?
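The step-by-step approach used on very_success can be sketched in Python by mirroring each instruction as one operation on an 8-bit “AL” value and logging every intermediate result, so each step can be checked against the debugger before moving on. The XOR/ADD/ROL sequence and its constants below are hypothetical placeholders, not the challenge’s actual routine.

```python
from typing import List, Optional, Sequence, Tuple

def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left, like x86 ROL on AL."""
    count &= 7
    return ((value << count) | (value >> (8 - count))) & 0xFF

# HYPOTHETICAL instruction sequence; replace with what the debugger shows.
STEPS: Sequence[Tuple[str, int]] = [
    ("xor", 0x5A),
    ("add", 0x13),
    ("rol", 3),
]

def run_steps(byte: int, steps: Sequence[Tuple[str, int]],
              trace: Optional[List[Tuple[str, int, int]]] = None) -> int:
    """Apply each step to an 8-bit value, optionally logging each intermediate AL."""
    al = byte & 0xFF
    for op, operand in steps:
        if op == "xor":
            al ^= operand
        elif op == "add":
            al = (al + operand) & 0xFF  # 8-bit wraparound, like ADD AL, imm8
        elif op == "rol":
            al = rol8(al, operand)
        else:
            raise ValueError(f"unsupported op: {op}")
        if trace is not None:
            trace.append((op, operand, al))
    return al

# Compare each logged value against AL in the debugger before trusting the next step.
log: List[Tuple[str, int, int]] = []
run_steps(0x41, STEPS, log)
for op, operand, al in log:
    print(f"{op} {operand:#04x} -> AL={al:#04x}")
```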
When you look at the executable, it has the tell-tale icon for a Python executable. This makes things a bit easier:
I’ve worked a lot with Python executables and knew where to go, though you would eventually find it through static analysis. It looks for a “PYZ” overlay in the executable, decompresses it, and runs the resulting compiled Python code:
Everyone has their favorite tools for dealing with such instances. My go-to is pyinstxtractor, hosted on SourceForge. Run it against the original executable and it’ll dump the results in your current directory. Now, the issue with this, which honestly had me confused for 30 minutes, is that it will overwrite anything in your directory. As it dumped the Python code to a file named ‘elfie’, overwriting the executable ‘elfie’, I scrambled trying to find the original source. I didn’t think to look at the original file again and realize it had been overwritten. After a herp-derp moment, I opened the file and saw legitimate, though obfuscated, Python code:
In this 56,694-line script there are thousands of variables holding what is obviously Base64-encoded data. While you could manually rename these and rebuild them, you could also just replace ‘exec’ with ‘print’ 🙂
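The exec-to-print swap can also be scripted rather than done in an editor. This is a minimal sketch that assumes the dumped script calls exec( directly. The file names in the usage comment are hypothetical. Only the exec calls are defanged, so any other top-level code in the file would still run.

```python
import re

# A standalone exec( call; identifiers such as my_exec( or executor( are left alone.
EXEC_CALL = re.compile(r"\bexec\s*\(")

def neuter_exec(source: str) -> str:
    """Turn every exec(...) into print(...) so the payload is shown, not run."""
    return EXEC_CALL.sub("print(", source)

def neuter_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with its exec calls replaced."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(neuter_exec(text))

# Hypothetical file names:
# neuter_file("elfie_dumped.py", "elfie_print.py")
```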
The result is another massive Python script. But, in this case, it’s only 48 lines and the email is pretty apparent, though in reverse:
Reverse it out to show:
Filename Etymology: elfie
Sounds obvious: it’s the alleged name of the goat.
As we debug it, we see that 2 + 2 does, indeed, equal 4. This is a good sign.
youPecks. You-P-Ecks. UPX. Hah!
Instead of ripping them out piece by piece, I just dump and reformat with a script:
Each byte of the key is added to the respective byte of the string ‘flarebearstare’.
Can I just take a moment to say how awesome I think ‘flarebearstare’ is? I think they named their team FLARE solely to use that phrase, and I would’ve done the same!
To decode, then, we just need to Base64 decode the transmitted text and then take each byte and _subtract_ its respective ‘flarebearstare’ value. Easy peasy.
But, not so.
A first pass gave exceptions from negative numbers. Huh, that’s weird. OK, we’ll just make sure the result is positive, and … Nope. WTF?
A closer look at the application eventually shows the issue. The Base64 alphabet is wrong. The case is swapped!
After a few side tests, I confirmed that the only difference in the output is the swapped case. With that, I take the transmitted Base64 string, swap its case, and it decodes perfectly with this script:
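A minimal sketch of such a decoder, not the original script. It assumes the per-byte addition wraps modulo 256, and that the custom alphabet is the standard Base64 alphabet with upper and lower case swapped, so swapping the case of the transmitted text restores standard Base64. The encode helper exists only to demonstrate the round trip. The actual transmitted string is not reproduced here.

```python
import base64

KEY = b"flarebearstare"

def decode_transmission(data: str) -> bytes:
    """Undo the swapped-case Base64, then subtract the repeating key."""
    # swapcase() maps the custom alphabet back onto the standard one;
    # digits, '+', '/', and '=' are unaffected.
    raw = base64.b64decode(data.swapcase())
    return bytes((b - KEY[i % len(KEY)]) % 256 for i, b in enumerate(raw))

def encode_like_sender(plain: bytes) -> str:
    """Inverse operation, used here only to demonstrate the round trip."""
    mixed = bytes((b + KEY[i % len(KEY)]) % 256 for i, b in enumerate(plain))
    return base64.b64encode(mixed).decode("ascii").swapcase()
```

The `% 256` on the subtraction is what keeps the result in byte range. It is not a substitute for the alphabet fix, which is why forcing positive results alone still produced garbage.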
Filename Etymology: sender / challenge.pcap
No imagination here. A sending application and challenge. What about sendto_flare-a-lot? 🙂
Then it was all like.
This challenge was an Android APK that, when executed, displays a screen to input an email address. I’ll cut to the chase on this one; there’s really only one function of note in its native library, Java_com_flareon_flare_ValidateActivity_validate. There are some basic math operations here, but I’ll let the other write-ups speak to those.
The algorithm checks whether the input is 46 bytes long. It then takes two bytes at a time, performs magic math on those two bytes, and compares the result to a respective output array. With 23 arrays (one per pair), the approach seems simple: do the math on each pair of bytes, and if the result matches the array, those bytes are correct.
Beyond that, I have no clue what this function is doing. I know what I’ve been told it’s doing, I’ve read other people’s explanations of it, and even had someone afterward sit down and walk me through it. Nope. Still no clue. I do believe that the brain is sometimes ‘color blind’ to things it shouldn’t be, and this challenge fell within that for me.
After spending a month poking at this on almost a daily basis, I had mentally given up. The answer eventually came to me and, upon completion on 28 Aug, I even made a public joke about this based on the time durations of my challenges 🙂
After a week of trying to reimplement the routine in Python, I gave up. There were just too many unknowns to deal with, given Python’s limited type casting, when you don’t know what the intent of the code is. I needed to know what the expected outputs should look like. Therefore, I attempted to debug it using various local Android virtual machines. I first tried Genymotion, which failed because they had removed all ARM support. I then switched to BlueStacks. However, that has a ‘broken’ NAT implementation that only allowed outgoing traffic. And AndyVM kept crashing regularly when making connections.
From there, I installed the IDA server on my own HTC One M7, which worked, but I then ran up against IDA Pro issues:
For each run, I would copy one of the 23 check tables into the code, brute force it, and add that to my output email. This was made easy with the HxD hex editor as you can simply highlight a block of text and “Copy As C#”, automatically formatting it for source code.
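A rough Python outline of that per-table brute force (the author’s version was C#). The native library’s math is not reproduced here. toy_transform() is a hypothetical stand-in, so only the search harness is meant to carry over: try every printable two-byte pair, and keep the one whose output matches the check table.

```python
from typing import Callable, Optional, Sequence, Tuple

Transform = Callable[[int, int], Tuple[int, ...]]
PRINTABLE = range(0x20, 0x7F)

def toy_transform(a: int, b: int) -> Tuple[int, ...]:
    """HYPOTHETICAL stand-in for the library's per-pair math (not the real routine)."""
    word = (a << 8) | b
    return (word % 251, word % 241)

def brute_force_pair(target: Tuple[int, ...], transform: Transform) -> Optional[bytes]:
    """Try every printable two-byte pair; return the first whose output matches."""
    for a in PRINTABLE:
        for b in PRINTABLE:
            if transform(a, b) == target:
                return bytes((a, b))
    return None

def recover(tables: Sequence[Tuple[int, ...]], transform: Transform) -> bytes:
    """Solve each check table independently and concatenate the pairs."""
    out = bytearray()
    for index, target in enumerate(tables):
        pair = brute_force_pair(target, transform)
        if pair is None:
            raise ValueError(f"no printable pair matches table {index}")
        out += pair
    return bytes(out)
```

Each table is solved independently in at most 95 × 95 attempts. That is what makes brute force practical even without understanding the math.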
After running through each set of characters I obtained the email address:
Oh, come on! an_arm_and_a_leg? rockets_armed? Give us something.
Running the file through de4dot produces output that is much more usable for analysis.
However, the values this program computes come from data inside the executable itself, so de-obfuscating it changes that block, and the resultant values will be completely different.
You can only work off the original. And that’s not easy to do statically, nor with ILSpy.
Instead, we’ll use dnSpy, which makes the solution almost effortless. In it we can simply look for the string builder with the underscore and the comparison immediately afterward:
Now, just debug. Step through the program until you get to this comparison, mouse over text2, and get your password.
Re-run the program, type that in, and get your email!
Why are you so meta? The application relies on the metadata stored within the executable.
Challenge 8 was steganography, something that eluded many early in the challenge. The easy part of stego is having a wide selection of tools available. The hard part is knowing when to use them or not. I cannot even express the anguish over Robert Hanssen’s actions and certain sectors of the forensic community having to use AnaDisk on every. single. floppy. disk. they processed. (To my knowledge, there were no positive results from trying it on every single investigation).
Is there a meaning to this? It just appears as drunken keyboard walking. I’m thinking it’s an internal term, likely a password for an APT campaign (because they never keyboard walk, lol).
Now, we get to the harder challenges. This is where I can show my true ghetto analysis attitude! And where I start taking studious notes on everything. I have a week left to get three more challenges done, so the pressure is on.
And let’s start off with a backhanded compliment of a program.
Followed by a look at some instructions and then a big sea of data.
I really dislike the IDA debugger (I’m heavily reliant on Right Click>Follow in Dump), but it’s best for this challenge. There’s a lot of code to get through, most of it useless, and, for me, IDA does a better job of recognizing and disassembling this code as you step along.
The first goal is to focus on the actual input portion in all of that. So, let’s run it in the debugger, then step through until we get to the input. Set a breakpoint after that part, type in some unique junk (‘ABCD_1234_ABCD_1234@flare-on.com’). Then start a debugger trace with Instruction Tracing. Then, hit F9, and relax.
This trace output contained 9,600 instructions. Not bad. Not easily readable either. Let’s channel our inner Unix admin. I’m at an advantage: I work from home, I’ve already started growing out my neck beard.
Wait, what? Where am I going with all this … We’re looking for loops. We’re looking for the same instructions to be called with varying registers. We’ve seeded the registers with somewhat unique values. I’m hoping to find a mov, xor, cmp, or something usable.
A first pass shows that there are no EAX = 00000031 or 00000065. After digging a little deeper, I see it:
I know that at 0x401A9C each respective byte is loaded into AL. Let’s then poke around for any single-byte XORs with ‘grep’ (Are you cringing at this process yet? I know you are. And I like that.)
Boom! So at 0x012FDF8 there’s a single-byte XOR. This may not even be relevant, but I like to log this stuff as I see it. While we’re at it, let’s hunt for any other math routines:
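The same grep hunt can be done in Python. This minimal sketch assumes an exported trace with one instruction per line, where an 8-hex-digit address is followed by whitespace, the mnemonic, and its operands. The exact IDA trace layout is an assumption here. The sketch tallies XOR/ROL/ROR/ADD/SUB/SHL/SHR instructions whose destination is an 8-bit register, grouped by address, so loop bodies stand out as high counts.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed trace layout: an 8-hex-digit address, whitespace, then "mnemonic dst, src".
BYTE_MATH = re.compile(
    r"(?P<addr>[0-9A-F]{8})\s+(?P<mnem>xor|rol|ror|add|sub|shl|shr)\s+(?P<dst>[a-d][lh])\b",
    re.IGNORECASE,
)

def byte_math_sites(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count 8-bit-register math instructions per (address, mnemonic)."""
    hits: Counter = Counter()
    for line in lines:
        match = BYTE_MATH.search(line)
        if match:
            hits[(match["addr"].upper(), match["mnem"].lower())] += 1
    return hits

# Hypothetical usage against an exported trace file:
# with open("trace.txt", encoding="utf-8", errors="replace") as f:
#     for (addr, mnem), n in byte_math_sites(f).most_common(20):
#         print(addr, mnem, n)
```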
We know from our input breakpoint that the program picks up around 0x40173B. I can see that also as the top of a loop. Based on that, I can search through the trace to find the bottom of the loop that causes a jz/jnz back to there. I see that at 0x401BC8. So now we have a fairly confined boundary to focus on.
Since we see the routine looping, we can sort of conclude that it doesn’t exit when a byte is wrong. Based on this, can we determine the overall email length? Let’s try.
Run a new trace with a unique and long “email”. For this test, I’ll use:
Because we know each character is unique, and we know the location, we can run a simple:
At 41 bytes it stops checking, so we have a pretty high-fidelity guess at the email length. The only reason I do a sort | uniq here is that the results are repeated twice, for some reason, so they show up as 82 bytes (two checks of 41 bytes each).
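The sort | uniq step can be mirrored in Python as well. This sketch makes the same kind of assumption about the trace format. It also assumes that each traced line records the resulting register as EAX=XXXXXXXX. It collects the low byte of EAX every time 0x401A9C executes and drops the repeats. When every input character is unique, the number of bytes left approximates the email length.

```python
import re
from typing import Iterable

LOAD_ADDR = "00401A9C"  # where each input byte is loaded into AL
LOAD_RE = re.compile(LOAD_ADDR + r".*?EAX=([0-9A-F]{8})", re.IGNORECASE)

def loaded_bytes(lines: Iterable[str]) -> bytes:
    """Low byte of EAX each time LOAD_ADDR runs, de-duplicated in first-seen order."""
    values = []
    for line in lines:
        match = LOAD_RE.search(line)
        if match:
            values.append(int(match.group(1), 16) & 0xFF)
    return bytes(dict.fromkeys(values))  # like `sort | uniq`, but keeps trace order
```

Keeping the first-seen order, rather than sorting, also shows which input position each check corresponds to.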
At this point, I’ll follow the code from AL all the way down to see what happens to it.
.text:00401A9C mov al, [eax+ecx]
Stack[000007B0]:0012FDF4 mov ah, [esp+ebx+0B4h] ; XOR key as AH
.text:00401B14 rol al, cl ; ROL key as CL
.text:00401B16 mov ebx, [esp+ebx+2Ch] ; Load cmpxchg value into EBX
Stack[000007B0]:0012FDF8 cmpxchg bl, dl
That last instruction, cmpxchg, was elusive to discover. When debugging, IDA would never display this opcode properly, nor the hex bytes around it, shown here at address 0x12FDF8:
I knew something was happening here, but could not determine exactly what. So, I switched to Immunity and saw the operation jump out:
At the very end, the respective input byte, after these operations, would be compared to a static table using cmpxchg. Knowing this, I thought of all the possible ways to collect these values and map them out. Then I thought of the worst way possible… spreadsheets!
Yes. I loaded an Excel spreadsheet and, for each byte, marked the XOR byte, ROL byte, and ultimate CMPX value. Is that a look of disgust I see? Oh yeaaahh
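The spreadsheet math also fits in a few lines of Python. This sketch assumes the order suggested by the listing above: XOR with the key in AH, then ROL by CL, then the cmpxchg comparison. Under that assumption, each input byte is recovered by rotating the comparison value right and XORing it with the key. The rows built in the check are made-up values, not the challenge’s table.

```python
from typing import Iterable, Tuple

def rol8(value: int, count: int) -> int:
    count &= 7
    return ((value << count) | (value >> (8 - count))) & 0xFF

def ror8(value: int, count: int) -> int:
    count &= 7
    return ((value >> count) | (value << (8 - count))) & 0xFF

def forward(in_byte: int, xor_key: int, rol_count: int) -> int:
    """Assumed check: (input XOR key) rotated left, compared against the table."""
    return rol8(in_byte ^ xor_key, rol_count)

def invert_row(xor_key: int, rol_count: int, cmp_value: int) -> int:
    """One spreadsheet row (XOR byte, ROL byte, CMPX value) back to an input byte."""
    return ror8(cmp_value, rol_count) ^ xor_key

def decode_rows(rows: Iterable[Tuple[int, int, int]]) -> bytes:
    return bytes(invert_row(*row) for row in rows)
```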
Once the routine was discovered, it took about 5 minutes to collect, reverse, and decode the email of:
Other than a possible backhanded compliment, especially when combined with the input text, there’s no real idea behind this.
Challenge 10 had a lot of different things going on but, at the end, it came down to a few small gimmick hurdles. Let’s get to them one at a time. You’re given an executable, loader. When executed it does quite a few things as I’ll show in my awesome tool that’s on Github and you should contribute to and I totally gave a demo on it at BlackHat 2015 Arsenal, Noriben.
At a high level, loader.exe is run as PID 2700. It drops aut1.tmp and aut2.tmp to %Temp%. Immediately after each, a corresponding file is created in C:\Windows\System32: challenge.sys and ioctl.exe, respectively. Then, a service named “challenge” is created (with services.exe:720 shown as the source) that points to that challenge.sys. We also see a new Class created for that service. Finally, loader runs “ioctl.exe” with the argument of 22E0DC.
And those [VT 0/57] ratings? Come on people, you upload your challenges to VirusTotal? That should be an automatic disqualification.
Upon loading loader into IDA, we quickly see that it’s the wrong way to go about this:
It’s an AutoIt executable, which means there will be an encoded, embedded script. These can be automatically extracted with Exe2Aut, which produces a script that begins with a few hundred lines of service-management code. Discard these; they’re generic and copy-pasted from elsewhere. Focus below that:
This is pretty straightforward: if Win7, drop this; if XP, drop that; otherwise, do nothing. Beyond the dropping, we see hex strings passed to “dothis()” with a second argument of “flarebearstare”. dothis() simply passes these along to decrypt() and executes the result. decrypt() is the odd one out, taking a big string of shellcode and throwing it up into memory.
For now, extract the shellcode, convert it from hex to raw bytes, save it to a file, and open it in IDA (which is like three key presses with WinHex, just saying).
A 256-count loop to build an array with byte swapping, followed by a whole other loop that XORs based on that array? My money’s on RC4. Let’s whip up a quick Python script with the encoded values and check:
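A minimal textbook RC4 in Python for that check, not the author’s original script. It assumes the key is the “flarebearstare” string passed as dothis()’s second argument. It also assumes each hex string is converted to raw bytes first, with any leading “0x” stripped. ENCODED_HEX in the usage comment is a placeholder, not a real value.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling (KSA), then XOR with the generated keystream (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # the 256-count array-building loop with swaps
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # the second loop that XORs data against the array
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

# Hypothetical usage; ENCODED_HEX stands in for one of the dothis() hex strings:
# print(rc4(b"flarebearstare", bytes.fromhex(ENCODED_HEX)))
```

RC4 is symmetric, so the same function both encrypts and decrypts.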
This results in the output of:
ShellExecute(@SystemDir & "\ioctl.exe", "22E0DC")
_CreateService("", "challenge", "challenge", @SystemDir & "\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START)
Nice! Fill back into our original script to get:
Yup, that was quite a bit of work for such anticlimactic results. I’m bored. Let’s go look at ioctl.exe.
Welp, that was equally boring. Take a hex value as arg1 and pass it along to DeviceIoControl as dwIoControlCode, where the hDevice (v7) is the “FileName” of \\.\challenge. So, take an arg and pass it to a memory-resident driver. Check.
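The 22E0DC argument is a standard Windows IOCTL code. The CTL_CODE macro packs it from four fields: device type, access, function, and method. Splitting it apart in Python shows that the device type sits in the upper 16 bits, so a WORD-only view of the code keeps just the lower half. The field layout below is the documented CTL_CODE one. The helper names are illustrative.

```python
from typing import NamedTuple

class IoctlCode(NamedTuple):
    device_type: int  # bits 16-31 (0x22 is FILE_DEVICE_UNKNOWN)
    access: int       # bits 14-15 (3 = read | write access)
    function: int     # bits 2-13
    method: int       # bits 0-1 (0 = METHOD_BUFFERED)

def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Pack fields in the same order and positions as the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method

def split_ioctl(code: int) -> IoctlCode:
    return IoctlCode(
        device_type=(code >> 16) & 0xFFFF,
        access=(code >> 14) & 0x3,
        function=(code >> 2) & 0xFFF,
        method=code & 0x3,
    )

print(split_ioctl(int("22E0DC", 16)))
```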
Because I’m not a glutton for punishment on non-Fridays, I would typically focus on the XP driver for the rest. However, there’s a glitch with that. The dwIoControlCodes in the XP driver are shown as WORD values, while the Windows 7 driver shows them as proper DWORDs:
They both have the same functionality, so for static analysis the Win7 driver may be more appropriate. There are a few things you should notice about these drivers. First, there are 199 referenced functions. Typically, then, I’d sort functions by size and look at the smallest, then the largest. The largest are more fun here…
It’s … so beautiful. m0n0sapiens put it most succinctly:
— m0n0sapiens (@m0n0sapiens) August 11, 2015
Or, in a more disco groove:
It’s raining threads! #Flareon
— int main(void) (@E___H___) August 18, 2015
If you follow the three big functions, you’ll see that all three end by pushing data into the same function, which feeds into this:
As with any unusual math routine that may be encoding, look for seed values and Google them. In this case, you’ll see it referenced as XTEA (eXtended Tiny Encryption Algorithm), a well-known routine. At the end of each of those three routines, a buffer is passed into this decryptor. But how is each one called?
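For reference, here is a textbook XTEA in Python with 32 cycles and delta 0x9E3779B9, which is the seed constant worth searching for. It assumes each 8-byte block is read as two little-endian 32-bit words, as an x86 driver would, and that the key is four 32-bit words. The driver’s actual key and round count are not reproduced here. Buffers must be a multiple of 8 bytes, which the 0x28- and 0x50-byte buffers are.

```python
import struct
from typing import Callable, Sequence, Tuple

DELTA = 0x9E3779B9  # the constant that gives TEA/XTEA away
MASK = 0xFFFFFFFF

def _mix(v: int) -> int:
    # ((v << 4) ^ (v >> 5)) + v; bits above 32 are dropped by the caller's mask.
    return (((v << 4) & MASK) ^ (v >> 5)) + v

def xtea_encrypt_block(v0: int, v1: int, key: Sequence[int], rounds: int = 32) -> Tuple[int, int]:
    total = 0
    for _ in range(rounds):
        v0 = (v0 + (_mix(v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + (_mix(v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return v0, v1

def xtea_decrypt_block(v0: int, v1: int, key: Sequence[int], rounds: int = 32) -> Tuple[int, int]:
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - (_mix(v0) ^ (total + key[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - (_mix(v1) ^ (total + key[total & 3]))) & MASK
    return v0, v1

def _apply(buf: bytes, key: Sequence[int], block_fn: Callable) -> bytes:
    if len(buf) % 8:
        raise ValueError("XTEA operates on 8-byte blocks")
    out = bytearray()
    for off in range(0, len(buf), 8):
        v0, v1 = struct.unpack_from("<2I", buf, off)
        out += struct.pack("<2I", *block_fn(v0, v1, key))
    return bytes(out)

def xtea_encrypt_buffer(buf: bytes, key: Sequence[int]) -> bytes:
    return _apply(buf, key, xtea_encrypt_block)

def xtea_decrypt_buffer(buf: bytes, key: Sequence[int]) -> bytes:
    return _apply(buf, key, xtea_decrypt_block)
```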
In this case, there is a single subroutine with a switch statement of 101 cases, each a DWORD value. If we find the one used by the dropper we see it pointing to the large “Triangle” routine. I’ll point it out below along with the other three large ones (which I’ll name Parse1, Parse2, and Parse3). I’ve modified this image to remove cruft:
Here we see the code sent from the dropper, 22E0DC, which points to that massive triangle function. Others have written up the details of this function and how it works. I skipped it. It had no meaningful calls out of it and wasn’t related to the XTEA decryption routine, so I put it on the back burner.
I focus on the XTEA and work back. For each Parse routine this decryptor is called with a buffer of data and a buffer size. That size is slightly obfuscated just because it is set at the very beginning in a mess of other values. I’ll do some magic photoshopping to demonstrate these.
Parse1() calls the decryptor with a 40 (0x28) byte buffer, while the other two call it with an 80 (0x50) byte buffer. Each buffer is made up of individual global bytes that are created by subroutines underneath each Parse() routine. The obvious and professional route is clear from a static perspective: follow the xrefs back from each byte, grab the value, and populate it into the binary.
That’s what others did. That’s not how I roll. Let’s do this live in a debugger. Our hurdle here is to attach to a device driver in memory. That would typically involve using WinDbg at a kernel level, which I do not know how to do (it’s on my bucket list, trust me, right below base jumping in South America). I don’t need to run it properly, I just need to throw it in memory for me to mess with.
So, I use CFF Explorer to modify the PE header so that the file is treated as a DLL, and save it. I then debug rundll32.exe with an argument that loads this new “DLL”. It works!
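A scripted version of that header edit, for when CFF Explorer isn’t handy. This is a sketch of one way to do it and not necessarily the exact field changed in CFF Explorer. It sets the IMAGE_FILE_DLL (0x2000) bit in the COFF header’s Characteristics field, which it locates through e_lfanew at offset 0x3C. The file names in the usage comment are hypothetical.

```python
import struct

IMAGE_FILE_DLL = 0x2000

def mark_as_dll(image: bytes) -> bytes:
    """Return a copy of a PE image with the IMAGE_FILE_DLL characteristic set."""
    data = bytearray(image)
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF header follows the 4-byte signature; Characteristics is at +18.
    char_off = e_lfanew + 4 + 18
    (characteristics,) = struct.unpack_from("<H", data, char_off)
    struct.pack_into("<H", data, char_off, characteristics | IMAGE_FILE_DLL)
    return bytes(data)

def patch_file(src_path: str, dst_path: str) -> None:
    with open(src_path, "rb") as src:
        patched = mark_as_dll(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(patched)

# Hypothetical usage:
# patch_file("challenge-7.sys", "challenge-7.dll")
```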
I take the base address where it is loaded in the debugger (0x9C0000) and rebase IDA to it. Now I can directly see where changes and calls are made. However, as I quickly learn, I have many errors in actually running this. The memory segments that it is loaded in are Executable only. So, in Immunity, switch to the memory map view and just set them all as Full Access. (Didn’t I warn you about how ghetto I was going to make this? You haven’t seen anything yet!)
I throw calls to the three Parse() routines and notice that Parse1() ends with a blank buffer. Passing it into the XTEA routine fills it with garbage. I try placing data into the buffer, and different junk comes back. This must be an INOUT buffer. But it’s not populated at all. I trace the calls that populate these bytes back, set a few breakpoints, and see that they’re never called. There are 40 conditions that are never met. From a debugger POV, I can now try to change those conditions, or BP at each and change the Z flag. Or I can make ghetto calls (my personal favorite).
While in ntoskrnl space, just because I was arbitrarily sitting there, I pull the xref from each subroutine in IDA and just … call them. One at a time. And watch the buffer fill. You can ghetto call because there are no arguments to pass in and no results back. It doesn’t break the stack … much.
I then call Parse1(), track it to the end, make the call, and get my email address:
You cringe at how I did that, but I got it done in just a few hours, so phooey on you.
Filename Etymology: loader / challenge-7.sys / challenge-xp.sys
I am disappoint. How about: driving_mr_pythagoras? dantes_inferno? Let’s brainstorm this, people.
This is the final challenge and it shows. With the exception of the issues with #6, this sample took me the longest out of all challenges from this year and last. There is a lot going on. And, being honest, the other write-ups will explain this challenge much better than I and will provide more professional answers. Read at your own risk.
The executable, CryptoGraph, contains fairly customized encryption that is seeded by a command line argument to decrypt an embedded resource into, ultimately, a JPG. For one, I’m glad they used JPG so that we could avoid the whole GIF vs JIF debate.
Part One of this challenge is processing the command line argument directly against embedded data to produce a new set of data. This data will vary based on the argument passed and how many times it had to verify the data contents.
Part Two takes the results of Part One to seed an RC5 decryption of another embedded resource, writing the result to disk.
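For reference, here is a textbook RC5-32 decryptor in Python. The 32-bit word size, 12 rounds, little-endian block layout, and independent (ECB-style) handling of 8-byte blocks are all assumptions based on common RC5 defaults. None of them are parameters confirmed from CryptoGraph, and the key would be whatever Part One produces.

```python
import struct
from typing import List, Tuple

P32 = 0xB7E15163  # RC5 magic constants for 32-bit words
Q32 = 0x9E3779B9
MASK = 0xFFFFFFFF

def _rotl(x: int, n: int) -> int:
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK

def _rotr(x: int, n: int) -> int:
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK

def rc5_key_schedule(key: bytes, rounds: int = 12) -> List[int]:
    """Expand the key into 2*(rounds+1) subkeys (standard RC5-32)."""
    c = max(1, (len(key) + 3) // 4)
    words = [0] * c
    for i in range(len(key) - 1, -1, -1):  # load key bytes as little-endian words
        words[i // 4] = ((words[i // 4] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)
    s = [(P32 + i * Q32) & MASK for i in range(t)]
    a = b = i = j = 0
    for _ in range(3 * max(t, c)):
        a = s[i] = _rotl((s[i] + a + b) & MASK, 3)
        b = words[j] = _rotl((words[j] + a + b) & MASK, a + b)
        i = (i + 1) % t
        j = (j + 1) % c
    return s

def rc5_encrypt_block(a: int, b: int, s: List[int], rounds: int = 12) -> Tuple[int, int]:
    a = (a + s[0]) & MASK
    b = (b + s[1]) & MASK
    for r in range(1, rounds + 1):
        a = (_rotl(a ^ b, b) + s[2 * r]) & MASK
        b = (_rotl(b ^ a, a) + s[2 * r + 1]) & MASK
    return a, b

def rc5_decrypt_block(a: int, b: int, s: List[int], rounds: int = 12) -> Tuple[int, int]:
    for r in range(rounds, 0, -1):
        b = _rotr((b - s[2 * r + 1]) & MASK, a) ^ a
        a = _rotr((a - s[2 * r]) & MASK, b) ^ b
    b = (b - s[1]) & MASK
    a = (a - s[0]) & MASK
    return a, b

def rc5_decrypt(data: bytes, key: bytes, rounds: int = 12) -> bytes:
    """Decrypt each 8-byte block independently."""
    if len(data) % 8:
        raise ValueError("RC5-32 operates on 8-byte blocks")
    s = rc5_key_schedule(key, rounds)
    out = bytearray()
    for off in range(0, len(data), 8):
        a, b = struct.unpack_from("<2I", data, off)
        out += struct.pack("<2I", *rc5_decrypt_block(a, b, s, rounds))
    return bytes(out)
```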
This seems fairly straightforward. We can brute force the command line options until we get a JPG. This is quite similar to the final challenge last year. However…
- The runtime duration of this application is approximately 15-20 hours.
- Even with the correct command line argument, the correct number of data loops needs to be determined. Running to the end will produce a garbage JPG.
Knowing that, I can see where people can write debugger scripts to fuzz registers or values at certain points. But, I have my limitations. I’m going straight in through the front door. That begins, however, with understanding what’s going on. Therefore I spend a few days doing nothing but debugging, following traces, and keeping notes. A LOT of notes.
Based on such notes, I’m proud to share one of the worst ways possible of finishing this challenge successfully.
For one, now that I’ve read other write-ups, I feel foolish for missing one of the very first checks, for a null value at 0x401714. Instead, I focused far past that. The issue here is that there are three distinct ways to view code in IDA: text view, graph view, and decompiler view. Due to the sheer size of many functions, I stayed in text view and decompiler view. However, as others learned during this challenge, graph view made it very easy to spot unusual jumps that skip past areas that should be reached. There’s a lesson in that.
When checking for the first argument there is an early loop where the correct argument will match a value from the embedded resource and then skip to the rest. If it doesn’t match, a global integer (which I’ve named Data_Checks) is incremented, and the process continues.
Past this is the main loop of the program, shown below, that repeats 32 times. Each time, the speed becomes slower and slower, based on the v16 value passed into Core_Decoding_Loops(), which often numbers in the millions.
There are a few references to incrementing Data_Checks and I tried my hardest to make sure the flow got to that value. After every loop that number incremented, which I took to be a good thing. (Spoiler Alert: It wasn’t).
For example, in this flow graph, I continually tried to follow the cyan (blue) lines leading to Data_Checks.
After following all of the logic at this point, things started to make sense. The continual iterations were due to data not being found at certain offsets of the resource during each round of modifications. There appeared to be at least one exit condition on the loops that would prevent continuous processing at certain points. A proper command line argument should make the data shift correctly to break out of such loops and speed up code execution. But, how do we test that theory?
There are many proper ways of doing it. Instead, here was mine: find the slowest computing procedure and, once it completes, patch the program to quit. Then brute force and see which number makes it end the soonest. For this, I chose to end immediately after that Core_Decoding_Loops() call. Through standard execution, getting from the beginning to past that loop with an arbitrary argument would take two minutes. That sounded like a good spread. I went to the instruction after that call, used Immunity to change the code to “call _cexit”, and patched the resulting bytes into the executable.
I wrote a quick Python script to brute force the numbers, timing out any process longer than 60 seconds, and waited.
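A minimal sketch of such a harness using Python’s subprocess module, not the original script. The executable name and candidate range in the usage comment are hypothetical. Because the patched program crashes rather than exiting cleanly, this sketch records the return code without interpreting it. Any run that finishes within the time limit counts as interesting, and results are sorted fastest first.

```python
import subprocess
import time
from typing import Iterable, List, Sequence, Tuple

def time_candidates(base_cmd: Sequence[str], candidates: Iterable[str],
                    timeout: float = 60.0) -> List[Tuple[str, int, float]]:
    """Run base_cmd + [arg] for each candidate; keep (arg, returncode, seconds) for fast ones."""
    finished = []
    for arg in candidates:
        start = time.monotonic()
        try:
            proc = subprocess.run([*base_cmd, arg], stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # subprocess.run kills the child before raising
        finished.append((arg, proc.returncode, time.monotonic() - start))
    finished.sort(key=lambda item: item[2])  # fastest first
    return finished

# Hypothetical usage against the patched binary:
# for arg, code, secs in time_candidates(["CryptoGraph_patched.exe"], map(str, range(256))):
#     print(arg, hex(code & 0xFFFFFFFF), f"{secs:.1f}s")
```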
Now, first, this is not the proper way of doing that. Second, that patch doesn’t make the program actually exit; it just crashes it with an unknown software exception (0xc0000417). So most numbers would simply time out without doing anything, while a small handful crashed.
Of the three command line arguments that crashed in under 60 seconds (205, 238, 240), 205 was unique in reaching that point in literally less than a second. That seemed odd enough to investigate further.
Using 205 as an argument changed the entire outlook of the program. Now, early checks that would increase the Data_Checks global value were skipped. On the very first pass, at 0x4016D4, a routine to ROR and XOR data was tested to ensure that the first DWORD was all nulls. Without a proper command line argument, it would appear similar to this:
However, once given 205, it produced:
Every additional check would also produce expected results, skipping large amounts of number crunching. Additionally, the Data_Checks value was never incremented. This value counts the number of loops in which the data validation failed, suggesting that it should always stay at zero.
The second part of this challenge was determining that after every large round of computation, shown in pseudocode earlier, the data is re-encoded. As this data is integral to the second part, it needs to be correct before sending it back. From letting the program run with ‘205’ on a second computer overnight (12 hours to run), I discovered that it would produce a garbage JPG by default. Therefore, we need to break out of this loop before it reaches 32 rounds. But, how many rounds do we let it run?
Others found the clean answer to this problem by examining comparisons on the back end. Me? I had a jug of sangria and time to kill on a Saturday afternoon. So, I manually brute forced it while catching up on my Black Butler episodes. It turns out that it didn’t take that long.
At the end of each round of checks, I set a breakpoint and disabled all the others. I would run to this CMP EAX, 20h (32 rounds) and then, at the following JB, just toggle the carry (C) flag to make it break out of the loop.
Each round produced junk JPGs until I hit round 10, opened the JPG expecting another round of garbage, and screamed like a teenage girl at a Justin Bieber concert. There I saw some sort of SportsBall player with an email!
Filename Etymology: CryptoGraph
Again. Let’s think about this. spin_me_right_round. grab_some_popcorn. one_bit_hahaha_two_bits_hahaha. …
After sending off the email I tried to figure out who this was and why he was there. TinEye reports him as Lionel Messi who is apparently a good SportsBall player. Or, is he?
There you have it. This was an amazingly fun challenge (except #6) and I learned much along the way. I am now prepared to go back and re-do the challenges using the methods detailed by others. My methods tend to be very brute-force-ish, very ‘mess with things in memory until they work’, CTF-speed hacks. But I am slowly forcing myself to learn the proper methods: WinDbg/GDB scripts, PIN tracing, more IDAPython, debugger memory fuzzing.
Jokes aside, it’s an awesome design and is self-supporting.
FireEye’s Official Solutions
Topher Timzen’s A Successful Yet Failed Flare ON Challenge – The Write-up
AcidShout’s 2015 FLARE-ON challenges writeup
Reno Robert’s v0ids3curity writeup
Mohamed Shetta’s FLARE On 2015 Walkthrough
z3r0zh0u’s XLOYE Write Ups
Julien Perrot’s Flare On 2 write-up
A Disturbing Lack of Taste’s Challenges #7 and #8
0x0A Tang’s Solving for Hashes in Flare-On #5
Did you find benefit or enjoyment from this post? Was it a waste of your time? Please, leave feedback! I’m open to critiques, criticisms, and attaboys. If you like it, I’ll keep creating them. Though next time, it’ll be a more forensics-related one.