This article is a guide to reverse engineering Simatic S7 PLC program blocks. [1]
Last revision: May 10 2022.
Introduction
PLCs (Programmable Logic Controllers) are specialized computers designed to control industrial systems that have real-time processing requirements. They take inputs provided by sensors and generate outputs for actuators. As programmable devices, they execute user-provided software and are therefore susceptible to some classes of software attacks. The most publicized demonstration of this was the Stuxnet malware, whose end goal was to take control of, damage, and destroy arrays of centrifuges in a uranium enrichment plant. Analyzing the malicious PLC payload proved to be a long and tedious road [2], and to this day, tooling and knowledge related to those systems remain limited relative to broadly known architectures such as x86 or ARM.
We attempt to bridge some of this gap by providing S7 analysis modules for JEB Pro. This article shows how they can be used to acquire, analyze, disassemble and decompile PLC program blocks intended to run on Siemens Simatic S7-300 and S7-400 devices, a very popular line of PLCs used to operate industrial processes.
Terminology
Throughout the rest of this document, the terms PLC, S7 or S7 PLC are used interchangeably to refer to S7-300 or S7-400 PLC devices. Newer devices in the S7 product line, namely the S7-1200 and S7-1500, are not supported by this JEB extension and won’t be considered here.
The official IDE used to program S7 PLCs is called Step 7. Step 7 may be used as-is or as a part of the larger software suite Totally Integrated Automation (TIA).
A PLC program is made of blocks, such as data blocks, function blocks, and organization blocks. In this document, the term program may be understood as a collection of blocks.
A program is downloaded to a PLC from a Programming Station, that is, a Windows-based computer running the Step 7 editor. When a program is retrieved from a PLC, it is uploaded to the programming station.
The names of the assembly language, STL (Statement List), and of its bytecode counterpart, MC7, are sometimes used interchangeably.
Finally, the names Simatic, Step 7, and Totally Integrated Automation are trademarks of Siemens AG (“Siemens”).
Primer on S7
This section briefly presents what S7 programs are, their structure, as well as lower level details important to know from a reverse engineering perspective.
Programming Environment
S7 PLCs are programmed using Step 7 or TIA’s Step 7 (TIA is a platform required to program the most recent S7 devices). The IDE runs on a Windows computer referred to as the Programming Device. Once the program is written, it can be downloaded onto a physical PLC or onto a simulator program (such as PLCSIM, part of Step 7).
Blocks
A PLC program is a collection of blocks. Blocks have a type (data, code, etc.) and a number.
- Data blocks:
  - User data blocks are referred to as DB if they are shared by all code, or DI if they belong to a code block
  - System data blocks are named SDB
- Code blocks, also called logic blocks:
  - Organization Blocks (OB) are program entry points, called by the firmware
    - The principal OB is OB1, the program’s main entry point. It is executed repeatedly by the firmware.
    - Other OBs can be programmed; they are called when interrupts happen, exceptions occur, timers go off, etc.
  - Function Blocks (FB) and System Function Blocks (SFB) are routines operating on a provided data block, called the instance data block (DI)
  - Functions (FC) and System Functions (SFC) are routines that do not require a data block to operate
The distinction between FB and FC is subtle. Any FB could be written to behave equivalently to an FC, and vice versa. The two kinds exist as an easy way to distinguish between a function working as-is, like a C routine would (FC), and a function working on a collection of pseudo-encapsulated attributes, like a C++ class method would (FB).
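The FC/FB analogy can be sketched in a few lines of Python. This is only an illustration of the calling styles, not S7 code: the names `InstanceData`, `fc_add`, `fb_accumulate` and `fc_accumulate` are hypothetical, and the dataclass merely stands in for an instance data block.

```python
from dataclasses import dataclass


@dataclass
class InstanceData:
    """Hypothetical stand-in for an instance data block (DI): state kept between calls."""
    total: int = 0


def fc_add(a: int, b: int) -> int:
    """FC-like routine: works only on its parameters and returns a value."""
    return a + b


def fb_accumulate(di: InstanceData, value: int) -> int:
    """FB-like routine: reads and updates the instance data it is handed."""
    di.total += value
    return di.total


def fc_accumulate(total: int, value: int) -> int:
    """The same logic written FC-style: the caller carries the state explicitly."""
    return total + value


di = InstanceData()
fb_accumulate(di, 3)
fb_accumulate(di, 4)
print(di.total)                             # state lives in the instance data
print(fc_accumulate(fc_accumulate(0, 3), 4))  # state threaded through by the caller
```

Both variants compute the same result, which is the sense in which an FB can be rewritten as an FC and vice versa; the difference is where the state is kept.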
There are various ways to write PLC code. Programmers may choose to write ladder diagrams (LAD) or function block diagrams (FBD); complex processes may be better expressed in statement list (STL) or in a high-level Pascal-like language (SCL). Regardless of the source language, the program is compiled to MC7 bytecode, whose specifications are not public.
A piece of MC7 bytecode is packaged in a block, along with some metadata (authoring information, flags, etc.) and the interface of the block. The interface of a data block is the block definition itself, a structure type. The interface of a logic block is its set of inputs, outputs, local variables, as well as static variables in the case of an FB, or return value in the case of an FC.
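To make the packaging concrete, here is a minimal Python sketch of a block as described above: a type, a number, some metadata, an interface, and the bytecode. All class and field names (`BlockType`, `Interface`, `Block`, `local_vars`, `struct`, and so on) are assumptions made for illustration; they do not reflect the on-disk block format or any JEB API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class BlockType(Enum):
    OB = "organization block"
    FB = "function block"
    FC = "function"
    SFB = "system function block"
    SFC = "system function"
    DB = "data block"
    SDB = "system data block"


# Code blocks, also called logic blocks
LOGIC_TYPES = {BlockType.OB, BlockType.FB, BlockType.FC, BlockType.SFB, BlockType.SFC}


@dataclass
class Interface:
    # Maps of name -> type name; which sections are used depends on the block type
    inputs: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)
    local_vars: Dict[str, str] = field(default_factory=dict)
    statics: Dict[str, str] = field(default_factory=dict)   # FB only
    return_type: Optional[str] = None                        # FC only
    struct: Dict[str, str] = field(default_factory=dict)    # data blocks: the structure type


@dataclass
class Block:
    type: BlockType
    number: int
    metadata: Dict[str, str]   # authoring information, flags, etc.
    interface: Interface
    mc7: bytes = b""           # compiled bytecode; left empty in this sketch

    @property
    def name(self) -> str:
        return f"{self.type.name}{self.number}"

    @property
    def is_logic_block(self) -> bool:
        return self.type in LOGIC_TYPES


fc1 = Block(BlockType.FC, 1, {"author": "example"},
            Interface(inputs={"a": "INT", "b": "INT"}, return_type="INT"))
db10 = Block(BlockType.DB, 10, {}, Interface(struct={"setpoint": "REAL"}))
print(fc1.name, fc1.is_logic_block)
print(db10.name, db10.is_logic_block)
```

A program, in the sense used in this document, would then simply be a collection of such `Block` objects keyed by type and number.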
MC7 Code
PLCs may be programmed using a variety of methods, such as:
- Ladder logic (LAD)
- Function block diagrams (FBD)
- Assembly-like statement list (STL)
- Structured control language (SCL, a high-level Pascal-like language)
- Other methods exist
Step 7 compiles all source code to MC7 bytecode, a representation that is translated and executed by a virtual machine running on the PLC.
STL was relatively well-documented up until the S7-400 [3]. However, the binary specifications are not public at the time of writing. [4]
The MC7 instructions map to STL statements, with several notable exceptions (e.g. STL’s CALL is translated to UC/CC with additional code to prepare the Address Register pointer and the opened Data Block, to set up parameters in the Locals memory area in the case of an FC/SFC call, etc.).
Execution Environment
The execution environment for MC7 bytecode is the following:
- Memory areas:
  - Digital input, called I (0 to 65536 addressable bytes)
  - Digital output, called Q (0 to 65536 addressable bytes)
  - Global memory, called M (0 to 65536 addressable bytes)
  - Local memory, called L (0 to 65536 addressable bytes)
    - A special area V references the local memory of the caller method, i.e. if function f1 calls function f2, V in f2 is L of f1
  - Shared data block bytes via the DB1 register, called DB
  - Instance data block bytes via the DB2 register, called DI
  - Timers, called T (256 addressable 16-bit timers)
  - Counters, called C (256 addressable 16-bit counters)
- Registers:
  - A program counter PC, not directly accessible
    - The PC is modified by intra-routine branching instructions (JU/JL/JC/…)
  - A 16-bit Status Word register (only the 9 lower bits are used), from #0 to #8:
    - FC: First-Check: if 0, indicates that the boolean instruction to be executed is the first in a sequence of logic operations to be performed (“logic operation string”)
    - RLO: Result of Logic Operation: holds the result of the last executed bit logic operation
    - STA: Status: value of the current boolean address
    - OR: determines how binary-and and binary-or are combined
    - OS: Overflow Stored: copy of the OV bit
    - OV: Overflow: set by integer/floating-point instructions on overflow
    - CC0/CC1: Condition Codes: updated by arithmetic instructions and comparison instructions (see arithmetic and branching instructions for details on how CC0/CC1 are set and used)
    - BR: Binary Result: can be used to store the RLO (via SAVE); is used by system functions (SFC/SFB) as a success(1)/error(0) indicator
  - Two 32-bit address registers (AR1/AR2)
    - Each address register holds an MC7 4-byte pointer (see section on MC7 Types). The area part of the pointer may be ignored (for area-internal access), or may be used (for area-crossing access)
  - Two or four 32-bit accumulators (ACCU1/ACCU2, ACCU3/ACCU4 optionally)
  - Two data block registers, not directly accessible
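As an illustration of the Status Word layout listed above, the following Python sketch splits a 16-bit value into its nine named flags and packs them back. The function names are hypothetical, and the sketch assumes that CC0 sits at bit #6 and CC1 at bit #7, since the list above only gives the pair as CC0/CC1.

```python
# Flag names in bit order, from #0 to #8.
# Assumption: within the CC0/CC1 pair, CC0 is bit #6 and CC1 is bit #7.
STATUS_BITS = ("FC", "RLO", "STA", "OR", "OS", "OV", "CC0", "CC1", "BR")


def decode_status_word(sw: int) -> dict:
    """Return a {flag name: 0 or 1} mapping for the 9 lower bits of sw."""
    return {name: (sw >> bit) & 1 for bit, name in enumerate(STATUS_BITS)}


def encode_status_word(flags: dict) -> int:
    """Pack a {flag name: value} mapping back into a status word; missing flags are 0."""
    sw = 0
    for bit, name in enumerate(STATUS_BITS):
        if flags.get(name):
            sw |= 1 << bit
    return sw


flags = decode_status_word(0x0102)   # bits #1 and #8 are set
print(flags["RLO"], flags["BR"], flags["FC"])
print(hex(encode_status_word(flags)))
```

Bits above #8 are ignored on decoding, which matches the statement that only the 9 lower bits of the register are used.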
Translation in JEB
JEB’s MC7 plugin mirrors the execution environment, and adds several synthetic (artificial) registers to help with MC7 code representation and code translation to IR for the decompiler. The processor details can be examined in the GUI client (menu Native, handler Processor Registers).