Prox Offensive Information Security

June 22, 2026 · Prox Offensive · Research, Threat Emulation

Emulate, Detect, Evolve: An APT33 Adversary-Emulation Case Study

A public-safe walkthrough of emulating APT33-like behavior in a controlled lab to validate detection coverage — the ATT&CK mapping, the Sigma and Splunk artifacts, and what the telemetry actually caught.

Key takeaways

  • We emulated APT33-like behavior in a controlled lab to answer one question: would our detections actually fire?
  • The exercise is public-safe by design — methodology and defensive artifacts only, with offensive commands, credentials, and target details deliberately excluded.
  • Process-creation and command-line telemetry were the most reliable signals; the biggest challenge was tuning out legitimate admin tooling.
  • The point isn’t to “hack” — it’s to understand how an adversary behaves so defenders can see them. That same attacker’s-eye view is what sharpens how we map a client’s external exposure.

Adversary emulation is one of the most useful things a security team can do, and one of the most misunderstood. It is not about breaking into things. It is about reproducing how a known threat actor behaves in a safe environment, then checking whether your logging, detections, and people actually notice. This is a summary of one such exercise — modeling APT33 — and what it taught us. The full public-safe artifacts are on GitHub: apt33-scythe-case-study.

Why emulate APT33?

APT33 is a publicly reported threat group with well-documented tradecraft across discovery, persistence, and destructive impact. Emulating a named adversary — rather than generic “attacker” behavior — forces realism: you model the specific techniques defenders are likely to face, then measure detection coverage against them.

Crucially, the goal is detection validation, not exploitation. Every behavior is reproduced at a high level with benign, inert tooling so the focus stays on the telemetry it generates.

Public-safe by design

A recurring principle in our research: share the method and the detections, never the weapon.

  • All activity is confined to a private lab with dummy data — no real credentials, no malware, no external targets.
  • The published material omits copy-paste offensive commands and internal infrastructure details entirely.
  • What’s released is the part that helps defenders: the plan, the ATT&CK mapping, and the detection artifacts.

This is the same ethic that governs our client work — authorized, controlled, and oriented toward making you safer, not demonstrating that we can cause harm.

The emulation plan (high level)

The exercise moved through seven phases, each chosen to generate specific, observable telemetry:

  1. Preparation — establish baseline logging, validate time sync, create benign test files.
  2. Initial access simulation — user-initiated execution of a benign payload; capture process ancestry.
  3. Discovery — host, user, and network discovery behaviors.
  4. Persistence — a registry run-key entry using a benign binary.
  5. Defense evasion — simulated log clearing, in the lab only.
  6. Collection & exfiltration — stage a small dummy file and simulate a single transfer.
  7. Impact — simulated encryption of dummy files only.

Mapping to MITRE ATT&CK

Each phase maps to specific techniques, which is what makes coverage measurable:

| Technique | Name | Phase | Primary detection signal |
|-----------|------|-------|--------------------------|
| T1059 | Command and Scripting Interpreter | Discovery | Process creation + PowerShell logging |
| T1087 | Account Discovery | Discovery | Command-line discovery indicators |
| T1016 | System Network Configuration Discovery | Discovery | Network discovery commands |
| T1112 | Modify Registry | Persistence | Registry modification telemetry |
| T1070 | Indicator Removal on Host | Defense evasion | Event-log clearing behaviors |
| T1486 | Data Encrypted for Impact | Impact | File-modification patterns + process ancestry |
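
One way to make "measurable" concrete is to record which techniques produced at least one alert and summarize the result per phase. The sketch below is a minimal illustration, not the case study's tooling: the technique rows are copied from the mapping above, while the `Technique` class, the helper functions, and the `fired` set are hypothetical names and sample values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    technique_id: str
    name: str
    phase: str


# Rows copied from the ATT&CK mapping in this post.
TECHNIQUES = [
    Technique("T1059", "Command and Scripting Interpreter", "Discovery"),
    Technique("T1087", "Account Discovery", "Discovery"),
    Technique("T1016", "System Network Configuration Discovery", "Discovery"),
    Technique("T1112", "Modify Registry", "Persistence"),
    Technique("T1070", "Indicator Removal on Host", "Defense evasion"),
    Technique("T1486", "Data Encrypted for Impact", "Impact"),
]


def coverage_by_phase(techniques, detected_ids):
    """Return {phase: (detected, total)} for the emulated techniques."""
    summary = {}
    for t in techniques:
        detected, total = summary.get(t.phase, (0, 0))
        hit = 1 if t.technique_id in detected_ids else 0
        summary[t.phase] = (detected + hit, total + 1)
    return summary


def missed(techniques, detected_ids):
    """Return the technique IDs that produced no alert, in table order."""
    return [t.technique_id for t in techniques if t.technique_id not in detected_ids]


if __name__ == "__main__":
    # Hypothetical outcome for illustration only; not the exercise's results.
    fired = {"T1059", "T1087", "T1112"}
    for phase, (hit, total) in coverage_by_phase(TECHNIQUES, fired).items():
        print(f"{phase}: {hit}/{total}")
    print("gaps:", missed(TECHNIQUES, fired))
```

Tracking gaps by technique ID rather than by rule name keeps the summary stable when detections are renamed or split during tuning.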

The defender’s view

The output that matters lives in the detections, not the attack. The case study ships:

  • Sigma rules for discovery and tooling patterns (portable across SIEM backends).
  • Splunk hunting queries derived from the same logic, to validate detections against lab telemetry.

Detection depended on the following telemetry: Windows Security Event Log, Sysmon (process creation, network connections, image loads), EDR process trees, DNS and proxy logs, firewall and NetFlow records for lateral-movement visibility, and PowerShell module and script-block logging.
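
To show how a Sigma-style selection reads against process-creation telemetry, here is a toy evaluator. It is a sketch under stated assumptions: it is not pySigma, it supports only the `endswith` and `contains` modifiers, and the rule contents and sample events are illustrative placeholders rather than the rules shipped in the case study. The field names `Image`, `CommandLine`, and `ParentImage` follow Sysmon's process-creation event.

```python
# Toy Sigma-style matcher; illustrative only, not the published rules.
RULE = {
    "title": "Discovery-style utility launched (illustrative)",
    "selection": {
        # Assumed indicator list; tune to your own environment.
        "Image|endswith": ["\\whoami.exe", "\\ipconfig.exe", "\\net.exe"],
    },
}

EVENTS = [
    {
        "Image": "C:\\Windows\\System32\\whoami.exe",
        "CommandLine": "whoami",
        "ParentImage": "C:\\Windows\\System32\\cmd.exe",
    },
    {
        "Image": "C:\\Windows\\System32\\notepad.exe",
        "CommandLine": "notepad.exe",
        "ParentImage": "C:\\Windows\\explorer.exe",
    },
]


def _match_field(event, key, expected):
    """Compare one event field against a value or list, case-insensitively."""
    field, _, modifier = key.partition("|")
    value = str(event.get(field, "")).lower()
    candidates = expected if isinstance(expected, list) else [expected]
    candidates = [c.lower() for c in candidates]
    if modifier == "endswith":
        return any(value.endswith(c) for c in candidates)
    if modifier == "contains":
        return any(c in value for c in candidates)
    if modifier == "":
        return value in candidates
    raise ValueError(f"unsupported modifier: {modifier}")


def matches(rule, event):
    """AND across selection fields, OR within each field's value list."""
    return all(_match_field(event, k, v) for k, v in rule["selection"].items())


if __name__ == "__main__":
    for event in EVENTS:
        print(event["Image"], "->", matches(RULE, event))
```

A rule this broad will also fire on routine admin activity, so in practice it needs further constraints before it is useful as an alert.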

What the telemetry actually caught

The honest results — including where detection got noisy:

What worked

  • Process-creation and command-line telemetry were the most reliable signals.
  • Discovery patterns mapped cleanly to the Sigma rules and Splunk queries with minimal tuning.
  • Correlating the execution of a previously unseen binary with outbound connections sharply improved fidelity.
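
The correlation in the last bullet can be sketched as a join between process-creation and network-connection events. This is a minimal illustration under assumptions: the event dictionaries, their keys (`host`, `image`, `sha256`, `time`, `dest`), the baseline hash set, and the five-minute window are all hypothetical, and the addresses come from the reserved documentation range.

```python
from datetime import datetime, timedelta


def correlate(process_events, network_events, known_hashes,
              window=timedelta(minutes=5)):
    """Pair first-seen binaries with outbound connections from the same
    host and image that occur within `window` after process start."""
    hits = []
    for proc in process_events:
        if proc["sha256"] in known_hashes:
            continue  # binary is already part of the baseline
        for conn in network_events:
            same_process = (conn["host"] == proc["host"]
                            and conn["image"] == proc["image"])
            delay = conn["time"] - proc["time"]
            if same_process and timedelta(0) <= delay <= window:
                hits.append((proc["host"], proc["image"], conn["dest"]))
    return hits


if __name__ == "__main__":
    # Hypothetical lab data; hashes are placeholders, not real digests.
    start = datetime(2026, 1, 1, 12, 0, 0)
    processes = [
        {"host": "lab-01", "image": "C:\\Temp\\benign.exe",
         "sha256": "placeholder-new", "time": start},
    ]
    connections = [
        {"host": "lab-01", "image": "C:\\Temp\\benign.exe",
         "dest": "192.0.2.10", "time": start + timedelta(seconds=30)},
    ]
    print(correlate(processes, connections, known_hashes={"placeholder-known"}))
```

The nested loop is fine for lab-sized data; at production volume the same join would normally be expressed in the SIEM's own query language.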

What produced noise

  • Legitimate IT and admin tooling triggered discovery-like patterns.
  • Approved remote-access tools looked a lot like adversary tooling.
  • Broad query patterns needed allowlisting to avoid drowning in routine admin activity.

Next iteration

  • Add parent-child process constraints to detections.
  • Allowlist known admin hosts and approved tool hashes.
  • Add time-window correlation across user and host context, and expand coverage for lateral-movement pivots.
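
The first two items can be sketched as a filter applied before an alert is raised. Everything named here is an assumption made for illustration: the host name, the hash placeholder, and the parent and child process sets are hypothetical values, not the allowlists or constraints from the exercise.

```python
from ntpath import basename  # parses Windows-style paths on any OS

# Hypothetical values; replace with your own inventory.
ALLOWLISTED_HOSTS = {"admin-ws-01"}
APPROVED_HASHES = {"approved-tool-hash"}
SUSPICIOUS_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}
DISCOVERY_CHILDREN = {"whoami.exe", "ipconfig.exe", "net.exe"}


def should_alert(event):
    """Alert only on a discovery-style child spawned by an unusual parent,
    and never for allowlisted hosts or approved tool hashes."""
    if event.get("host") in ALLOWLISTED_HOSTS:
        return False
    if event.get("sha256") in APPROVED_HASHES:
        return False
    child = basename(event.get("Image", "")).lower()
    parent = basename(event.get("ParentImage", "")).lower()
    return child in DISCOVERY_CHILDREN and parent in SUSPICIOUS_PARENTS


if __name__ == "__main__":
    event = {
        "host": "lab-01",
        "sha256": "placeholder",
        "Image": "C:\\Windows\\System32\\whoami.exe",
        "ParentImage": "C:\\Program Files\\Microsoft Office\\WINWORD.EXE",
    }
    print(should_alert(event))
```

Checking the allowlists first keeps the suppression logic easy to audit, which matters because every allowlist entry is also a potential blind spot.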

Why this matters for your exposure

Emulating an adversary is how we keep our attacker’s-eye view honest — and that perspective is exactly what we bring to assessing your external exposure. Understanding how a real threat actor performs discovery, finds a foothold, and moves toward impact is what tells us which of your exposed, internet-facing assets actually matter, and in what order.

Our research informs our methodology; it isn’t the engagement. If you want that perspective applied to your own public attack surface — non-intrusively — start with an External Exposure Audit Sprint, or read more about how we work.

Prox Offensive — Emulate. Detect. Evolve.
