VX Heaven

Library Collection Sources Engines Constructors Simulators Utilities Links Forum

"Smart" trash: building of logic

Electrical Ordered Freedom #3
August 2011

[Back to index] [Comments]


The main goal of garbage instructions - a hiding/protection of useful code (from av'ers, a watchful eye reverser and other curious). However, the "wrong" trash can lead to detection of viral code, thereby undermining all our efforts.

This text is about how to improve the quality of the generated garbage.

Who the opponent?

Assume that the file is checked, infected with our virus. Antivirus can operate so:

To bypass the emulator, will help us a working anti-emulation, from the signatures - the TrashGen, which (as it turns out xD) uses our virus. But if created trash-code will weak, then we'll detected by heuristics.

Plan of attack

So, for creation of a better trash-code, in the beginning I suggest to choose the compiler, whose generated code we will "imitate": ms, borland etc. Once selected compilers (eg, ms), you can still determine which mode of generation / optimization, we will imitate ("min size"/"max speed"). Since under different modes the code is generated in another way. For example, for ms-compiler, instruction EAX = 1 in mode "max speed" will be as follows:

        mov     eax, 1          ;0xB8 0x01 0x00 0x00 0x00

And for "min size"

        xor     eax, eax        ;0x33 0xC0
        inc     eax             ;0x40

Trashgen can generate both of these options, but it looks more realistic if you use one tactic (do not know what perversion will be new versions of the heuristics).

Further, in addition to various features that you have created in a trashgen, it must also be able to generate "realistic" code (similar to the usual code of standard programs written in HLL), namely:

But even such a trash-code generated taking into account given points, can easily be caught by heuristics.

Useful trash

The main problem is that the garbage-code - it's just trash, useless set of instructions. This is the reason anger heuristics. And if so, then the trash should be helpful. For this we need to implement 2 more tasks:

  1. The useful code should use result of work of a trash-code (or on the contrary, trash-code should affect somehow work of a useful code) aka "pseudo-purpose";
  2. "LOGICAL TRASH" technique;

The first point in the general case is implemented is quite simple: generate trash, runs it, and after trash working off the received result is used in a useful code. For example, have generated the following code:

        mov     eax, 100
        mov     ecx, eax
        sub     ecx, 95

After his execution ECX = 5. And this value can be added to key for decrypt the virus code (many of use cases). However, the generated code trash may be as follows:

        mov     eax, 100        ;1
        mov     ecx, eax        ;2
        mov     ecx, eax        ;3
        mov     ecx, eax        ;4
        sub     ecx, 95         ;5

After his execution ECX also = 5. But instructions 2 and 3 - garbage, for what will be punished by heuristics. The solution consists in creation a "logical trash".

"LOGICAL TRASH" technique

The idea is in that a garbage-code to make logical, like logic of code of usual programs. Normal code first initializes the parameters (registers, local variables etc); then executes instructions, which uses and/or otherwise affect these parameters. And instructions are a single whole - carry out the general problem, and among them aren't present superfluous, no "garbage". No re-initialization, is not used uninitialized parameters. All instructions related to each other, each of which affects the subsequent course of execution of a code.

Around this logic I created in new version of my engine xTG (v2.0.0), which works as follows:

	|	 |
	| begin  |
	    |					+--------------------------------------------+
	    |					|					     |
	    |					|					     |
	    |	 ________________________	|		 ________________________    |
	    |	|			 |	|		|			 |   |
	    |	| module: generation of	 |	|		| module: logic		 |   |
	    |	| instructions		 |	|		|			 |   |
	    |	|			 |	|		|			 |   |
	    |	|			 |	|		|			 |   |
	    |	| ---------------------- | 	|		| ---------------------- |   |
	    +-->| generation  		 |<-----+	   +--->| parser	      	 |   |
	+------>| of		 	 |		   |    |			 |   |
	|	| "right"		 | instr addr      |	| ---------------------- |   |
	|	| instruction		 |-----------------+ 	| emulator	 	 |   |
	|	| 			 |  			| 			 |   |
	|	| ---------------------- |			| ---------------------- | 0 |
	|	| updating of address	 |<----------------+	| analyzer of 	         |---+
	|     1	| to generate a new 	 |		   |  1 | instructions       	 |
	+-------| instruction		 | 0		   +----| (logic)		 |
		|			 |------+		|			 |
		| ---------------------- |	|		| ---------------------- |
		|________________________|	|		|________________________|
	 ________				|
	|	 |				|
	|  end	 |<-----------------------------+

More details:

  1. at first we cause the module of generation instructions;
  2. create "right" instruction: correct opcodes and other bytes. Btw, if the developed Logic-Engine (LE) will be applied to other engines, then LE should also do checking at the byte level;
  3. call module of logic, passing in it the address of (created) instructions;
  4. now the parser takes control. Parser can be built-in module logic. But also may be stand-alone engine (disassembler).

    The parser determines which instruction in front of him, and gets its parameters (operands: registers, addresses, etc); then it retains in some structure these param's and exposes some flags. The completed structure will be used by code-analyzer (see below).

    Also, for example, if we found instructions mov ecx, dword ptr [403008h] etc, then parser will replace address 403008h on other, the corresponding address in allocated memory for correct emulation of instruction;

  5. then we emulate (corrected) instruction. Also, code-emulator can be built-in module logic. But also may be stand-alone engine.

    Emulator receives the address of instruction, prepares a special environment, copies of this environment instruction, and emulates. Moreover, emulation can be at least 3 types: run of instructions in special environment, full imitation of execution of the instruction & the combination of these two methods (1-st method is suitable for most instructions). The result of emulation (current values of instruction param's etc) are stored in variables: the virtual registers etc.

    Btw, emulation - an excellent technology for viruses, with which help it is can be to create very interesting things (for UEP, VM, "logical trash" tech, morphing and other);

  6. and then we call the code-analyzer/checker of logic. Also, code-analyzer can be built-in module logic. But also may be stand-alone engine.

    The analyzer, on basis of data from the parser (filled structure) and the emulator, decides whether, fits instruction on logic or not.

    Analysis of instruction takes place in 2 stages:

    1. Checking of instructions params.

      First, the analyzer must know, what are the param's and how to check them. For this he uses flags, passed from parser. A set of flags can be:

      LGC_INSTR_INIT  equ     00000000000000000000000000000001b       ;instruction of initialization params;
      LGC_INSTR_CHG   equ     00000000000000000000000000000010b       ;... change parameters;
      LGC_P1_DST      equ     00000000000000000000000000000100b       ;1-st param - destination;
      LGC_P1_SRC      equ     00000000000000000000000000001000b       ;1-st param - source;
      LGC_P2_DST      equ     00000000000000000000000000010000b       ;2-nd param - destination;
      LGC_P2_SRC      equ     00000000000000000000000000100000b       ;2-nd param - source;
      LGC_P1_REG      equ     00000000000000000000000001000000b       ;1-st param - reg;
      LGC_P1_ADDR     equ     00000000000000000000000010000000b       ;1-st param - address;
      LGC_P1_NUM      equ     00000000000000000000000100000000b       ;1-st param - number;
      LGC_P2_REG      equ     00000000000000000000001000000000b       ;2-nd param - reg;
      LGC_P2_ADDR     equ     00000000000000000000010000000000b       ;2-nd param - address;
      LGC_P2_NUM      equ     00000000000000000000100000000000b       ;2-nd param - number;

      These flags can describe almost all instructions (some flags can be removed). If instruction contains more than 2 parameters, then other parameters are stored in a separate field.

      Further, the flags determines what and how to check: checks on possibility of initialization params, change their values, to use them in other instructions, and more. The result of each check is stored in the mask's. Their 2: regs_init & regs_used. Roughly speaking, this 2 dword, where each bit corresponds to a particular parameter (for example, some reg (EAX etc)). Moreover, the bits in regs_init indicate, whether it is possible to initialize parameter or not (protection against re-initialization). And on bits in regs_used we know, whether it is possible to use param in instructions or not.

    2. Check of states of instructions param

      So, if the first stage is passed, it means that the parameters suitable. Continue.

      State (of param) - is a stored value that takes parameter. States of all parameters are stored in the table of states, which is a buffer of a certain size.

      So, the analyzer takes current value of parameter that we got through emulation (and saved, for example, in the virtual register), and compares it with all the accumulated states of this parameter.

      If a match is found, then instruction - garbage, check isn't passed. In this case, from table of states we take the last saved state of this param and make it current (stored it state in virtual reg); and also restore the mask's to previous value. If no match is found, then it is a new state of param, instruction passed the check. Add this value to table of states;

  7. go back to module of generation instructions. Check, what value was returned to us by the logic-module: if 0, then instruction is not suitable on logic - on her address will generate a new instruction (overwrite). Jmp to point II. if 1, then instruction is suitable on logic. Jmp to point VIII.
  8. increase the address (to generate a new command) on size of a checked up instruction. Also we will find out, if we have generated the necessary number of bytes, then jmp to IX. If not all, then jmp to II.
  9. exit;

Examples of generating a simple garbage

; ---------------------------------------        ------------------------------------
;| this code doesn't pass the test of    |      | this code pass the test of         |
;| logic - not be created               |       | logic - will be created            |
; ---------------------------------------        ------------------------------------

        ;1.1                                    ;2.1
        mov     edi, 1000                       mov     edi, 1000
        mov     esi, edi                        mov     esi, edi
        mov     edi, esi                        mov     edi, 2000

        ;1.2                                    ;2.2
        mov     eax, 1000                       mov     eax, 1000
        mov     ecx, 2000                       mov     ecx, 2000
        xchg    eax, ecx                        xchg    eax, ecx
        xchg    eax, ecx                        inc     ecx
                                                xchg    eax, ecx

        ;1.3                                    ;2.3
        xor     edi, edi                        xor     edi, edi
        xor     ebx, ebx                        xor     ebx, ebx
        add     edi, ebx                        inc     ebx
                                                add     edi, ebx

        ;1.4                                    ;2.4
        mov     edx, 1000                       mov     edx, 1000
        mov     eax, edx                        mov     eax, edx
        dec     edx                             mov     edx, 2000
        mov     edx, 2000                      

        ;1.5                                    ;2.5
        mov     eax, 1000                       mov     eax, 1000
        mov     ecx, 1001                       mov     ecx, 1001
        sub     ecx, eax                        sub     ecx, eax
        sub     eax, ecx                        sub     eax, ecx
        inc     eax                             dec     eax

In Example 1.1 the first 2 commands are normal, and the third - junk. Reg EDI can be initialized, but it will take the same value, which is now - an unnecessary initialization. Example 2.1 (and later, all the others) shows the correct way to initialize.

In Example 1.2 the first 3 instructions "right", and the fourth - junk. Regs EAX & ECX again take the values that have already taken (check of states parameters).

In Example 1.3 the first 2 instructions "right", and the third - junk. EDI += 0 (check of states);

In Example 1.4 the first 3 instructions "right", and the fourth - junk. Reg EDX can't be initialized, if before, EDX didn't affect on value of another parameter (check of params);

In Example 1.5 the first 4 instructions "right", and the fifth - junk. After the execution of the first 4 instructions, states of EAX would be: 1000, 999. And after the execution fifth instructions EAX = 1000. This state has been (check states parameters).

Positive xD

Realization of technique "logical trash", examples of generation of a thrash-code, and also fuller understanding of an idea - all this you find in sources xTG v2.0.0.

I should say that in xTG most better quality of logic turns out at generation of a linear thrash-code without winapi-functions. In other cases, the logic will be fuzzy, but will be: on branching and with winapi. This is associated with a logic-module - it is more powerful than, the better the result.

If not satisfied with the logic of any instructions, just set the other flags.

Also, the variant of use of logic for already created code is possible. However the required code can be on 100% different from original.

Using this technique, our trash becomes useful and logical, which makes it quite effectively bypass the heuristic. And the only complex use of techniques will realize our desires into reality. Yo0!


  1. beauty on the fire "Emulation of a program code", 2004,
  2. beauty on the fire "Code-analyzers in antiviruses", 2004,
  3. Sl0n "Polymorphism. New technicians", 2004,
[email protected]
virmaking for is eternal
[Back to index] [Comments]
By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! aka