Skip to main content

SECCON CTF 14 Quals

·13 mins· loading · loading · ·
Jaybird1291
Author
Jaybird1291
Table of Contents

Reversing
#

Crown Flash
#

Context & initial static pivot
#

I’m trying to get more familiar with Binary Ninja so I used it for this challenge. The quickest anchor is the user-facing prompt string, so I searched for “Flag:” and followed its code references:

One of the references leads to the main interaction routine:

At a high level, this routine:

  1. prints Flag:
  2. waits for input (with a timeout)
  3. reads a line into a buffer
  4. enforces an exact length check (0x25 bytes)
  5. calls a validator through an indirect function pointer (critical pivot)
  6. prints Correct! or Wrong

The binary asks for an input (“Flag: …”) and returns Wrong if validation fails. The interesting part is: the validation logic is not fully present in the main .text in an easy-to-follow way. Instead, it eventually does an indirect call into a memory area that looks like:

  • not part of the main ELF’s .text
  • not part of libc
  • typically a anonymous executable mapping (often RWX in CTF packers/JIT-style stubs)

This is a common CTF anti-static pattern: hide the real validator until runtime, then jump to it.

Goal: recover the validator and its associated constants, then rebuild a solver.


Dissecting the “routine” wrapper
#

Binary Ninja shows the routine as sub_251cce(...) with a lot of “mysterious” arguments because it’s inside a larger C++ program and BN is reconstructing a non-trivial calling context. The portion that matters for reversing is the I/O + dispatch logic:

  1. Stack canary / SSP noise (ignore, but recognize it) We see patterns like:
  • reads from fsbase + 0x28
  • compares at function exit
  • calls a noreturn abort-like function on mismatch (sub_35da60)

That’s the standard x86_64 stack protector canary. It’s not part of the challenge logic, but it is useful to recognize so we don’t spend time on it.

  1. Prompt + timeout: poll(..., 0x7530) The routine prints Flag: and then calls:
  • sub_35a400(arg4, 1, 0x7530) 0x7530 = 30000 ms (30 seconds).

Then:

  • if return == 0 -> prints Too slow!
  • if return < 0 -> errors out (poll)
  • else -> proceeds to read input

This is a typical anti-bruteforce/anti-automation measure, but more importantly it tells that the binary is checking stdin readiness via poll() rather than just blocking on read(). That’s why “syscall-first” debugging is effective:

  1. Input is read through C++ iostream (why the decompile looks “weird”) The visible routine does not directly read(0, buf, ...) in a clean way. It goes through libstdc++ iostreams, then later materializes exactly what the validator needs: a (ptr, len) pair.

That’s why the decompiler shows pointer arithmetic such as: *(rax + *(*rax - 0x18) + 0x20)

This pattern is typical of optimized C++ object layout:

  • rax is an iostream-related object (think std::istream / std::basic_ios internals),
  • the *(*rax - 0x18)-style term behaves like a runtime offset used to reach a base-subobject/state area (common with virtual inheritance + Itanium C++ ABI),
  • the final + 0x20 lands on a field that behaves like a stream state bitmask.

Semantically, the code is checking whether input succeeded using an iostate mask:

  • badbit = 0x1, eofbit = 0x2, failbit = 0x4 (libstdc++-style layout)
  • testing & 0x5 means “reject if badbit or failbit is set”
  • only if the masked result is zero does the routine proceed to validation.

Practical takeaway: we don’t need to fully reverse iostream internals. The register snapshot at the call site proves the actual validator ABI: rdi = input_ptr, rsi = length, rdx = constants_table.

  1. The exact length gate: arg_18 != 0x25

After the stream-state check, the routine enforces:

  • if (arg_18 != 0x25) -> "Wrong"

This is the first hard constraint we can reliably infer: the flag length is exactly 37 bytes.

A practical implication: when we later reconstruct the validator, our solver should iterate exactly 37 steps and treat the candidate as raw bytes (not necessarily null-terminated).

For now, I’ll create a in.bin with 37 A (without \n):

  1. The real pivot: the validator is an indirect call

Once length matches, the routine does:

  • sub_24ee10(&arg_30, arg3) (setup / context)
  • if (arg6(arg_10, arg_18, &data_21eda0) == 1) -> "Correct!" else "Wrong"

This line is the whole challenge:

ok = validator(input_ptr, input_len, table_ptr);

where:

  • input_ptr is arg_10
  • input_len is arg_18 (must be 0x25)
  • table_ptr is &data_21eda0 (the constant table used per index)
  • validator is arg6, not a fixed symbol, but a function pointer resolved at runtime

This matches our earlier observation of an indirect call *... and establishes the validator signature without needing to fully understand every C++ wrapper call


Finding input capture
#

To find where the candidate flag is read we start by catching read and inspecting registers using the Linux x86_64 ABI:

For read(fd, buf, count):

  • rdi = fd
  • rsi = buf
  • rdx = count

At the catchpoint:

  • inspect rsi to see what was read (x/s $rsi or x/32bx $rsi)
  • inspect rdx to see the maximum expected size
  • use bt to locate where the program continues after the read

This establishes:

  • the input buffer address
  • the expected length (or at least the read limit)
  • the control-flow path into validation

Identifying the pivot
#

Following the backtrace into the main code reveals a key sequence that looks like:

mov    ... , %rdi      ; input pointer
mov    ... , %rsi      ; length
lea    ... , %rdx      ; pointer to a constant table
call   *0xc8(%rsp)     ; indirect call through stack-stored pointer

At the call site, the program performs call QWORD PTR [rsp+0xc8], i.e., the validator entrypoint is not a static symbol but a pointer stored on the stack. Dumping that slot (x/gx $rsp+0xc8) reveals the concrete jump target 0x7ffff7ff6000.

Sanity check: the indirect call target (*(void**)($rsp+0xc8)) falls inside the rwxp anonymous mapping, so we’re not guessing, we’re literally jumping into runtime-generated code.

Disassembling that address (x/8i $tgt) shows valid function prologue instructions (pushes and constant initialization), confirming it is executable code. Finally, /proc/<pid>/maps shows 0x7ffff7ff6000-0x7ffff7ff7000 rwxp … 00:00 0, meaning an anonymous RWX page (not backed by the ELF or libc), which is a strong indicator of runtime-generated/unpacked code.

RWX / unpacking / threat model (sanity + implications)

Seeing an anonymous RWX mapping (rwxp + file 00:00 0) is not “normal program behavior” on modern Linux:

  • Why RWX matters: it means the page is both writable and executable, so the process can generate or decrypt code at runtime and immediately run it. This is a classic pattern for unpacking / JIT / staged validators.
  • Threat model intuition: the binary is deliberately structured so that the “real logic” is absent from .text and only exists in memory, defeating purely static reversing (strings, xrefs, naive decompile).
  • Reverse-engineering consequence: the workflow shifts from “find a function in the ELF” to:
    1. capture the function pointer (here: the indirect call *0xc8(%rsp) target),
    2. confirm the page is executable (info proc mappings / /proc/<pid>/maps),
    3. dump the page (dump memory / dump binary memory) and analyze the generated code directly.
  • Security aside: systems enforcing strict W^X policies typically avoid RWX; CTF binaries often keep it RWX for simplicity, but the concept is the same as real-world loaders (write -> mprotect RX -> execute).

Important observations:

  1. Indirect call (call *0xc8(%rsp))
    This is the hallmark: the target is not a fixed symbol, it’s a pointer computed/loaded at runtime.

  2. Registers line up with a classic validator signature:
    rdi points at the input string/buffer
    rsi is the input length (here: 37, 0x25)
    rdx points at a table of constants used during validation


Into runtime code
#

a.k.a the hidden validator

Single-stepping (si) over the call *... moves RIP into an address like:
0x00007ffff7ff6000 (cf. gef screenshot above)

That address range is suspicious:

  • not inside the main binary mapping
  • not a typical libc .text region
  • looks like a dedicated mapped page

This strongly suggests: the validator code is unpacked/constructed at runtime and executed directly.

The fastest way forward is to dump the page and analyze it.


Dumping runtime artifacts
#

  1. We dump the validator page which is a typical size of 0x1000 bytes:
dump binary memory validator.bin 0x7ffff7ff6000 0x7ffff7ff7000
  1. We dump the constant table used by the validator.

The table is addressed via (%rdx, %rcx, 4) = that’s an array of 32-bit words.
Length is 37 = 37 dwords = 148 bytes = 0x94

If table base is 0x21eda0, then end is:

  • 0x21eda0 + 0x94 = 0x21ee34
dump binary memory table.bin 0x21eda0 0x21ee34
  1. We dump the 4-byte cycling XOR key (the “salt”)

Near the end of the validator, a short 4-byte region is used via (i & 3) indexing:

x/4bx 0x7ffff7ff60d0

Observed bytes:

  • 42 19 66 99

Reading the validator logic
#

The validator iterates over each byte of the input and maintains a 32-bit accumulator/state (eax).

The loop structure is explicit in the dumped validator page:

  • cmp rcx, rsi; je ... shows rsi is the loop bound (input length) and rcx is the byte index.
  • movzx r8d, BYTE PTR [rdi+rcx] confirms rdi is the input buffer.
  • mov r9d, DWORD PTR [rdx+rcx*4] confirms rdx is the base of a uint32_t table indexed by i (rcx), matching the (%rdx,%rcx,4) addressing used to justify dumping 37 dwords.
  • and r9d, 0x3 + movzx r9d, BYTE PTR [r12+r9] shows the 4-byte XOR key is selected via (i & 3) from a small byte array at r12 (loaded by lea r12, [rip+0xb4]).

For each position i:

  1. Mix the input byte with a 4-byte repeating XOR key
  2. Add an index-dependent constant (i * 0x9E3779B9)
  3. Fold it into eax with multiply/rotate-style avalanche
  4. Derive a check value (r11) from eax
  5. Mix a per-index constant (r14) with a table entry
  6. Compare computed vs expected; fail fast on mismatch

This is a classic “rolling hash with per-position targets”.


Why byte-by-byte
#

At step i, the check is F(eax_i, b_i, i, table[i]) == 0, where eax_i is fully determined by bytes [0..i-1]. Therefore we can enumerate b_i ∈ [0..255] independently, then advance state.

So at iteration i, the check depends on:

  • the previous state eax (which depends on earlier bytes only)
  • the current candidate byte b
  • constants (table[i], XOR key, multipliers)

So for each position, brute-forcing b ∈ [0..255] is feasible:

  • test each byte
  • keep the one that satisfies the equality
  • advance to the next index with the updated eax

In many CTF validators, there is exactly one satisfying byte per index, producing a unique solution.


Reconstructed algorithm
#

Constants:

  • K4 = [0x42, 0x19, 0x66, 0x99]
  • EAX0 = 0x72616E64
  • C1 = 0x9E3779B9
  • C2 = 0x045D9F3B
  • C3 = 0x7ED55D16
  • C4 = 0xC761C23C

All arithmetic is modulo 2^32.

For index i and input byte b:

  1. r8 = (b & 0xFF) ^ K4[i & 3]
  2. r8 = r8 + (i * C1)
  3. eax = eax + r8
  4. eax = eax * C2
  5. eax = eax + rol32(eax, 7)
  6. r11 = eax ^ (eax >> 16)
  7. r14 = (i + 1) * C3 + C4
  8. target = table[i] ^ r14
  9. check: r11 == target

If any index fails, the validator returns 0. If all 37 positions pass, it returns 1.

As you could think it depends on r13d but it works so I guess it’s ok.

He got 0/20


Python solver
#

#!/usr/bin/env python3
"""
crown_flash (SECCON) - byte-by-byte stateful solver (pretty output)

- Replays the validator's per-byte update of EAX
- Inverts each step by brute-forcing the next input byte (0..255)
- Produces a readable trace suitable for screenshots
"""

from __future__ import annotations

import argparse
import string
import sys
import time
from typing import Tuple

# ---- Constants extracted from GDB ----

KEY = [0x42, 0x19, 0x66, 0x99]  # bytes at 0x7ffff7ff60d0

TABLE = [
    0x5A971813, 0xA5D22E60, 0x376433F2, 0xD2C130B1,
    0x5125721E, 0x6F98B14A, 0x8B0F519C, 0x70C58BC6,
    0x5DCA2401, 0x28A41E01, 0xB14C47C9, 0xD4EC4301,
    0x78CA9B01, 0xD2E7AF09, 0x5A60701A, 0x9085033F,
    0x6C8CF2D3, 0xC7C7F866, 0x308E6A2B, 0xD583D812,
    0x8B797162, 0xB4B76B2B, 0xA68736B6, 0x5E0F2E8D,
    0xA0FF2519, 0x594F9386, 0x52F9812B, 0x5480290B,
    0xD7B19C6A, 0x23B7ABED, 0xEA18BE84, 0xC50EE1A8,
    0xA5E30ABF, 0x3BED05CE, 0x82052868, 0xA3930232,
    0x69F8AB3B,
]

MASK32 = 0xFFFFFFFF

C1 = 0x9E3779B9
C2 = 0x045D9F3B
C3 = 0x7ED55D16
C4 = 0xC761C23C

INIT_EAX = 0x72616E64  # from: mov $0x72616e64,%eax

# ---------------------------------------------------------------------------


def u32(x: int) -> int:
    return x & MASK32


def rol32(x: int, r: int) -> int:
    x = u32(x)
    return u32((x << r) | (x >> (32 - r)))


def step(eax: int, i: int, b: int) -> Tuple[int, int]:
    """
    Mirrors the validator's per-byte update.

    - r8 = (byte ^ KEY[i&3]) + (i * C1)
    - eax = (eax + r8) * C2
    - eax = eax + rol(eax, 7)
    - r11 = (eax >> 16) ^ eax
    """
    r8 = (b ^ KEY[i & 3]) & 0xFF
    r8 = u32(r8 + u32(i * C1))

    eax2 = u32(eax + r8)
    eax2 = u32(eax2 * C2)
    eax2 = u32(eax2 + rol32(eax2, 7))

    r11 = u32((eax2 >> 16) ^ eax2)
    return eax2, r11


# ---- Pretty printing helpers ----

class Style:
    def __init__(self, color: bool) -> None:
        self.color = color and sys.stdout.isatty()

    def _c(self, code: str, s: str) -> str:
        if not self.color:
            return s
        return f"\x1b[{code}m{s}\x1b[0m"

    def bold(self, s: str) -> str:  return self._c("1", s)
    def dim(self, s: str) -> str:   return self._c("2", s)
    def red(self, s: str) -> str:   return self._c("31", s)
    def green(self, s: str) -> str: return self._c("32", s)
    def cyan(self, s: str) -> str:  return self._c("36", s)
    def yellow(self, s: str) -> str:return self._c("33", s)


def printable_byte(b: int) -> str:
    ch = chr(b)
    if ch in string.printable and ch not in "\r\x0b\x0c":
        if ch == "\n":
            return r"\n"
        if ch == "\t":
            return r"\t"
        return ch
    return "."


def progress_bar(i: int, total: int, width: int = 28) -> str:
    done = int(width * (i / total))
    return "[" + "#" * done + "-" * (width - done) + "]"


def solve(verbose: bool, color: bool, delay: float) -> bytes:
    st = Style(color)

    eax = INIT_EAX
    out = bytearray()
    n = len(TABLE)

    if verbose:
        print(st.bold("crown_flash solver"))
        print(st.dim(f"- bytes: {n}"))
        print(st.dim(f"- init eax: 0x{INIT_EAX:08x}"))
        print(st.dim(f"- key: {', '.join(f'0x{x:02x}' for x in KEY)}"))
        print(st.dim("- note: r15-branch intentionally NOT applied"))
        print()

    t0 = time.time()

    for i in range(n):
        # target = table[i] ^ (((i+1)*C3) + C4)
        r14 = u32(u32((i + 1) * C3) + C4)
        target = u32(TABLE[i] ^ r14)

        found = None
        next_eax = None

        for b in range(256):
            eax2, r11 = step(eax, i, b)
            if r11 == target:
                found = b
                next_eax = eax2
                break

        if found is None or next_eax is None:
            raise RuntimeError(f"No byte found at index {i} (eax=0x{eax:08x})")

        out.append(found)

        if verbose:
            bar = progress_bar(i + 1, n)
            ch = printable_byte(found)
            partial = out.decode("ascii", errors="replace")
            print(
                f"{st.cyan(bar)} {st.bold(f'{i:02d}/{n-1:02d}')}  "
                f"eax={st.yellow(f'0x{eax:08x}')}  "
                f"target={st.dim(f'0x{target:08x}')}  "
                f"byte={st.green(f'0x{found:02x}')}('{st.green(ch)}')  "
                f"-> eax'={st.yellow(f'0x{next_eax:08x}')}  "
                f"{st.dim('partial=')}\"{partial}\""
            )
            if delay > 0:
                time.sleep(delay)

        eax = next_eax

    dt = time.time() - t0
    result = bytes(out)

    if verbose:
        print()
        print(st.bold("Result"))
        try:
            s = result.decode("ascii")
            print(st.green(s))
        except UnicodeDecodeError:
            print(st.green(repr(result)))
        print(st.dim(f"done in {dt:.3f}s"))

    return result


def main() -> None:
    ap = argparse.ArgumentParser(description="crown_flash solver (pretty output)")
    ap.add_argument("-q", "--quiet", action="store_true", help="only print the final flag")
    ap.add_argument("--no-color", action="store_true", help="disable ANSI colors")
    ap.add_argument("--delay", type=float, default=0.0, help="sleep N seconds between steps (for nicer screenshots/videos)")
    args = ap.parse_args()

    flag = solve(verbose=not args.quiet, color=not args.no_color, delay=args.delay)

    if args.quiet:
        try:
            print(flag.decode("ascii"))
        except UnicodeDecodeError:
            print(flag)


if __name__ == "__main__":
    main()


Sanity checks
#

To be confident the reconstruction is correct we:

  • Confirm the validator reads table[i] as dword:
    • instruction pattern: mov (%rdx,%rcx,4), %r9d
  • Confirm input length is fixed:
    • rsi == 0x25 at the validator entry
  • Confirm the XOR key is used with (i & 3):
    • pattern: and $0x3, reg then movzbl (key + reg), ...

What I learnt
#

  • Syscall-first dynamic reversing is the fastest route when symbols are stripped and code is relocated.
  • An indirect call into a “weird” region is a strong unpack/JIT indicator.
  • Dumping a runtime validator + constants is often all that’s needed to solve.
  • Rolling-state validators are frequently solvable incrementally (256 brute-force per byte) when each index has an independent target.

FLAG: SECCON{good->sPLqsLsooJY,EFwBU8Std7Y}