Macro and security

A macro in computer science is a rule or pattern that specifies how a certain input sequence (often a sequence of characters) should be mapped to a replacement input sequence (also often a sequence of characters) according to a defined procedure.

A macro is used to define variables or procedures,^[1] to allow code reuse, or to design domain-specific languages.

Macros can be separated into several types:

Text substitution macros as in the C language.^[1]
Macros in software. In some software, a sequence of instructions can be associated to a keyboard or mouse action. Some software can include a programming language (like VBA in Microsoft Office) allowing the control of software features.^[2]
Other types of macros that are not covered in this article.

Macros can be very useful to software users. They simplify regularly used actions (repetitive code for a programmer, or a sequence of actions in a program) so that the productivity of the user is increased. However, they also raise several problems, which are discussed in the following sections.

Flaws and macro viruses

Text substitution dangers

Text substitution macros, such as C macros, carry a few dangers. The C preprocessor is a powerful tool that can bring clarity to the code or, on the contrary, obscure it.^[3]

Compiler error

First, because macros are not checked for errors when they are defined (unlike C or assembly language code), it is possible to write macros that will not work.^[4]
For the C language, the preprocessor replaces each use of a macro with the text given in its definition. Only after that does the compiler check the code.

Example :

//file example.c

#include <stdio.h>
#include <stdlib.h>
#define someString foo

int main()
{
   printf("Example !");
   return 0;
}

This example compiles because the macro someString is not used. But if this macro were used, there would be a compilation error.

//file example.c

#include <stdio.h>
#include <stdlib.h>
#define someString foo

int main()
{
   printf("Example : %s", someString);
   return 0;
}

gcc -o example example.c
example.c: In function 'main':
example.c:6:26: error: 'foo' undeclared (first use in this function)
example.c:6:26: note: each undeclared identifier is reported only once for each function it appears in

This is just a simple example, but in large applications with many lines of code, such errors can become a real burden.
If a use of the macro does not compile, its definition may need to be changed. In the previous example, quotation marks have to be added around foo. In other cases the whole definition is wrong, so the macro and every use of it might need to be changed.
Depending on the case, this type of flaw can be minor or can affect the maintenance of an application.

Repetitive code

Second, a macro can duplicate code in its expansion, which can lead to wasted computing time.
The following code illustrates this:^[5]

// file example2.c

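#include <stdio.h>
#include <stdlib.h>
#define max(a,b) (((a) > (b)) ? (a) : (b))

// Declared before use so that main() and whatMax() can call them.
int whatMax(void);
int getNumber(void);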

int main()
{
   printf("The max ? %d", whatMax());
   return 0;
}

int whatMax()
{
   int foo = 5;
   return max( foo, getNumber() ); 
}

int getNumber()
{
   printf("I am called !\n");
   return 3;
}

The macro max is invoked with two arguments: the variable foo and the return value of the function getNumber().
It is a basic macro that computes the maximum of two values.
When this code is executed, it produces:

./example2
I am called !
The max ? 5

So far, nothing is wrong. But if getNumber() is changed to return, for example, six, the result is:

./example2
I am called !
I am called !
The max ? 6

The function getNumber() is called twice, although it only needs to be called once.
This is because of the macro. If the invocation of the macro max is replaced by its expansion, the return statement in whatMax becomes:

return (((foo) > (getNumber())) ? (foo) : (getNumber()));

When the value returned by getNumber() is greater than the value of foo, the function is called twice: once for the comparison and once for the result.
This can slow down the execution of a program when the function is expensive to evaluate.

Results not expected

The last point of this section shows that macros can produce unexpected results.
Some programs perform operations on critical data. Macros can make those operations more legible, but this can be double-edged.
The following code and its execution give an example:^[6]

// file example3.c

#include <stdio.h>
#include <stdlib.h>
#define cube(x) x*x*x
#define double(x) x+x
 
int main()
{
    int x = 3;
    int y = cube(x + 1);
    int z = 5 * double(x);
   
    printf("y is %d\n",y);
    printf("z is %d\n",z);  
   
    return 0;
}

./example3
y is 10
z is 18

The expected result was that y equals 64 and z equals 30. But when the preprocessor replaces the macro invocations, the result is:

// y:
x + 1*x + 1*x + 1 // equivalent to
x + (1 * x) + (1 * x) + 1 // or
3 + (1 * 3) + (1 * 3) + 1 // is equal to 10

// z:
5 * x+x // equivalent to
(5 * x) + x // or 18

Macros in C (and in other languages such as SMX, SAM76, assembly language, etc.) are just text substitution; they do nothing more. It is the programmer's responsibility to be careful about this, in this case by adding parentheses.
In other cases, however, some effects cannot be predicted at a glance, like this one:

// file example4.c

#include <stdio.h>
#include <stdlib.h>
#define double(x) ((x)+(x))
 
int main()
{
    int x = 3;
    int y = double (++x);
    
    printf("y is %d\n",y); 
   
    return 0;
}

8 might be expected as the value of y, but the result at execution is:

./example4
y is 10

The expanded expression modifies x twice without an intervening sequence point, so its behaviour is undefined in C and the result may differ between compilers. The output above can be explained by decomposing the execution stack used in this run:

y = (++x) + (++x);
/*
the stack is :
+
++x
++x

- the first '+' is unstacked
- the first '++x' is evaluated and unstacked => x = 4
- the second '++x' is evaluated and unstacked => x = 5
- then the '+' gives x+x = 5+5 = 10 */

The result is logical when the execution stack is examined, but it is not obvious at first sight.

Macro users therefore need to be careful, because errors can easily appear without attention (in addition, instructions like '++x' nested inside other instructions are not recommended either).

VBA-type/Winword macros flaws

These flaws are completely different from the previous ones: the main problem with VBA-type macros is viruses. Macro viruses are relatively recent; the first one, named Concept,^[7] was created in June 1995.^[8]
The main reason is that the high-level languages used to write macro code are powerful and easy to use, which considerably increases the pool of potential virus writers, and the documents containing the macros can be disseminated rapidly and widely by e-mail.^[9] Macro viruses can therefore spread quickly and be very destructive.

Different types of macro viruses

System Macro Viruses

System macros are macros that implement basic operations in a Word document (frequently used functions such as FileSave and FileSaveAs are macros).
The strength, and yet the weakness, of a Word document is that such macros can be redefined by users.
This gives the user great flexibility, but it is also a flaw that hackers can exploit to take control of the document and of the computer on which the Word document is opened.
This type of virus uses automatic and semi-automatic macros, which can be launched by an action or event without the user’s knowledge or consent.
For example, a Word document has the following macros: AutoExec, AutoNew, AutoClose, AutoOpen and AutoExit, so it is easy for a hacker to replace these basic functions with a macro virus that has the same name as the original function.^[9]
Also, a combination of shortcut keys can be associated with a system command (such as Ctrl+B to set the bold font), and the user can change these associations, replacing the commands with custom macros. A hacker can thus modify and create macros that the user will activate by using the shortcut key.
Such macros can also be activated by a macro button, such as a button labelled "Click here for further information", which seems common and innocuous.^[9]

Document to Macro Conversion

This type of macro virus cuts the text of a document and pastes it into the macro. The macro could be invoked through the AutoOpen macro so that the text is re-created when the (empty) document is opened. The user will not notice that the document is empty. The macro could also convert only some parts of the text in order to be less noticeable. Removing macros from the document manually or with an anti-virus program could then lead to a loss of content in the document.^[8]^{: 609–610}

Polymorphic Macros

Polymorphic viruses change their code in fundamental ways with each replication in order to avoid detection by anti-virus scanners.^[10] In WordBasic (the macro language of Word before Visual Basic for Applications), polymorphic viruses are difficult to write.
The macro's polymorphism relies on the encryption of the document. However, hackers have no control over the encryption key.
Furthermore, the encryption is inefficient: the encrypted macros are simply stored in the document, so the encryption key is too, and when a polymorphic macro replicates itself, the key does not change (the replication affects only the macro, not the encryption).
In addition to these difficulties, a macro cannot modify itself, but another macro can.
WordBasic is a powerful language that allows the following operations on macros:

Rename the variables used in the macro(s).
Insert random comments between the operators of its macro(s).
Insert between the operators of its macros other, ‘do-nothing’ WordBasic operators which do not affect the execution of the virus.
Replace some of its operators with others, equivalent ones, which perform the same function.
Swap around any operators the order of which does not impact the result of the macro’s execution.
Rename the macro(s) themselves to new, randomly selected names each time the virus replicates itself to a new document, with the appropriate changes in these parts of the virus body which refer to these macros.

So, in order to implement a macro virus that can change its contents, hackers have to create another macro whose task is to modify the content of the virus.
However, this type of macro virus is not widespread. Hackers frequently choose to write macro viruses because they are easy and quick to implement. Making a polymorphic macro requires extensive knowledge of the WordBasic language (it needs the advanced features) and more time than a "classic" macro virus.
Even if a hacker were to make a polymorphic macro, the polymorphic modification has to be carried out, so the document needs to be updated, and the update can be visible to the user.^[8]^{: 610–612}

Chained macros

During replication, a macro can create do-nothing macros. This idea can be combined with polymorphic macros, so that the macros are not necessarily do-nothing: each macro invokes the next one, so they can be arranged in a chain. In such a case, if they are not all removed during disinfection, some destructive payload is activated. Such an attack can crash WinWord with an internal error. Since WinWord 6.0, the number of macros per template is limited to 150, so the attack is limited too, but it can still be very annoying.^[8]^: 623

‘Mating’ Macro Viruses

Macro viruses can, in some cases, interact with each other. If two viruses are executed at the same time, each can modify the source code of the other.
The result is a new virus that cannot be recognized by anti-virus software. But the result is totally random: the new macro virus can be more infectious or less infectious, depending on which part of the virus has been changed.
However, when the 'mating' is unintentional, the resulting macro virus is more likely to be less infectious.
In order to replicate itself, a virus has to know the commands in its source code; if the code is changed in a random way, the macro cannot replicate itself.
Nevertheless, it is possible to create such macros intentionally (this differs from polymorphic macro viruses, which must use another macro to change their contents) in order to increase the infectivity of the two viruses.
In the example of the article,^[8]^{: 612–613} the macro virus Colors^[11] infected a document, but another virus, the macro virus Concept, had previously infected the user's system.
Both of these viruses use the command AutoOpen, so, at first, the macro virus Colors was detected, but the AutoOpen command in it was that of the macro virus Concept.
Moreover, when Concept duplicates itself, it is unencrypted, but the command in the virus Colors was encrypted (Colors encrypts its commands).
So, replication of the macro virus Concept results in the hybridization of this macro virus (which had infected the user's system first) and Colors.
The "hybrid" could replicate itself only if AutoOpen were not executed; this command comes from Concept, but the body of the hybrid is Colors, which creates conflicts.
This example shows the potential of mating macro viruses: if a pair of mating macro viruses is created, it becomes more difficult for virus-specific scanners to detect both macro viruses (in this hypothesis, only two viruses mate), and the virulence of the viruses may be reinforced.
Fortunately, this type of macro virus is rare (rarer than polymorphic macro viruses; possibly none exists), because creating two (or more) viruses that can interact with each other without reducing their virulence (or rather while reinforcing it) is complicated.

Macro Virus Mutators

Among the worst scenarios in the world of viruses would be a tool that creates a new virus by modifying an existing one. For executable files, this kind of tool is hard to create. But it is very simple for macro viruses, since the source code of macros is always available. Following the same idea as polymorphic macros, a macro can modify all macros present in the document. Only a few modifications are therefore needed to convert such a macro into a macro virus mutator. So it is easy to create macro virus generators, and thereby to quickly create several thousand variants of known viruses.^[8]^{: 613–614}

Parasitic macro viruses

Most macro viruses are stand-alone; they do not depend on other macros (for the infectious part of the virus, though some do for replication), but some macro viruses do. They are called parasitic macros.^[8]^{: 614–615} When launched, they look for other macros (viruses or not) and append their contents to them. In this way, all of the macros become viruses. But this type of macro cannot spread as quickly as stand-alone macros: it depends on other macros, so without them the virus cannot spread. Parasitic macros are therefore often hybrid: they are stand-alone and they can also infect other macros. This kind of macro virus poses real problems for virus-specific anti-virus software, because it changes the content of other viruses so that accurate detection is not possible.

Suboptimal anti-virus

There are different types of anti-virus (or scanner). One of them is the heuristic analysis anti-virus, which interprets or emulates macros.
Examining all of the branches of a macro is an NP-complete problem^[8]^: 605 (using backtracking), so the analysis of a single document containing macros would take too much time. Interpreting or emulating a macro therefore leads either to false positives or to undetected macro viruses.

Another type of anti-virus, the integrity checker, does not work in some cases: it checks only documents with the extensions DOT^[12] or DOC (as some anti-virus producers suggest to their users), but Word documents can have extensions other than those two, and the content of a document tends to change often.^[8]^: 605
So, like heuristic analysis, this can lead to false positives, because this type of anti-virus checks the whole document.
The last type of anti-virus considered here is the virus-specific scanner.^[8]^: 608 It searches for the signatures of viruses, so this type of anti-virus is weaker than the previous ones.
The viruses detected by virus-specific scanners are only those known to the producers of the software (so more updates are needed than for other types of scanners). Moreover, this type of anti-virus is really weak against morphing viruses (see the sections above): if a macro virus changes its content (and thus its signature), it can no longer be detected by virus-specific scanners, even though it is the same virus performing the same actions. Its signature no longer matches the one declared in the virus scanner.

In addition to the responsibility of the anti-virus, there is a responsibility of the users: if a potential macro virus is detected, the user can choose what to do with it (ignore it, quarantine it or destroy it), but the last option is the most dangerous.
Deleting a macro virus with the anti-virus can trigger some destructive macro viruses, which destroy data when they are deleted.
So both the virus scanners and the users are responsible for the security and integrity of the documents and the computer.
Furthermore, even if anti-virus software is not optimal at virus detection, most macro viruses are detected, and virus detection keeps improving, though new macro viruses keep being created as well.

Solutions

Text-substitution macros

Marco

Text-substitution macros can be kept separate from the compiler (as in C), but then they are not safe. Or they can be integrated into the compiler, but this limits their use. Marco is a solution that offers both advantages and so improves the productivity of the programmer, in particular for multi-language applications. It is language-independent: the only requirement is that the compiler of the target language produce descriptive error messages giving the location and cause of errors. Marco also needs a module and a plug-in to understand a language. The plug-in must contain three oracles: one that checks for syntactic well-formedness, one that determines a fragment's free names, and one that tests whether a fragment captures a given name. For the moment, it is just a prototype and supports only C++ and SQL. Marco shows that safe code using macros can be well-encapsulated and language-scalable at the same time.^[13]

MacroML

MacroML is an example of a macro language that allows type-safe macros. It is an expressive, typed language that supports generative macros. Macros are translated into the target language, MetaML, which yields a well-typed program without runtime errors.^[14]

Macro viruses

There will always be flaws in the world of viruses. Fortunately, many macro viruses can be countered with a good anti-virus. This section presents the characteristics that an anti-virus should have to protect against macro viruses.

Polymorphic macros

First, virus-specific anti-virus products have little effect on this kind of virus; an anti-virus that analyzes the content is more efficient. It is also worth noting that the modification process is relatively slow and is noticeable by the user.^[8]^{: 611–612}

Chained macros

In order to protect against these macro viruses, a scanner has to detect all elements of the chain and remove them properly.^[8]^: 612

‘Mating’ Macro Viruses

The anti-virus has to detect all macros of the virus but also any remnants of them. In the example of Colors, the anti-virus must not report that Microsoft's ScanProt (a macro virus protection tool) contains remnants of the virus, but the problem is that ScanProt does contain parts of the virus.^[8]^: 613

There is another problem: if a virus B is created by modifying some macros of a virus A, which consists of a set of macros, then virus B could be detected as a remnant of A, but the anti-virus would identify (and remove) only the non-modified macros. After that, in some cases, the replication of the virus (which now contains no original macros of A) could be successful. In effect, the anti-virus would have created a new virus.^[8]^: 613

Macro Virus Mutators

Anti-virus software has to be updated very easily and quickly to counter these viruses. A lot of information is added every day, and the performance of the anti-virus must not drop during updates.^[8]^: 614

Parasitic macro viruses

The problem here is that there is no easily identifiable set of macros: the anti-virus has to parse the body of the macro, then identify and cut out the parts added by the macro virus (fortunately, this task is not difficult to achieve).^[8]^: 615

References

Michael D. Ernst; Greg J. Badros; David Notkin (December 2002). "An Empirical Analysis of C Preprocessor Use". C preprocessor, macros.
Steven E. Ganz; Amr Sabry; Walid Taha (2001). "Macros as multi-stage computations: Type-safe, generative, binding macros in MacroML". macro language.
Gerard J. Holzmann. "The power of ten - Rules for developing safety critical code" (PDF). safety of macros. p. 4.
William Clinger; Jonathan Rees (1991). "Macros that work". safety of macros. pp. 1–2.
Vesselin Bontchev (1996). "Possible macro virus attacks and how to prevent them". Virus, Macros, Safety of Macros. 15 (7): 595–626. doi:10.1016/S0167-4048(97)88131-X.

Notes

[1] Michael D. Ernst; Greg J. Badros; David Notkin (December 2002). "An Empirical Analysis of C Preprocessor Use". C preprocessor, macros.
[2] Macro-définition (French)
[3] Gerard J. Holzmann. "The power of ten - Rules for developing safety critical code" (PDF). safety of macros. p. 4.
[4] Gerard J. Holzmann. "The power of ten - Rules for developing safety critical code" (PDF). safety of macros. pp. 1–2.
[5] Repetitive code of some macros
[6] Potential dangers in macros
[7] Vesselin Bontchev. "Macro Virus Identification Problems". macros viruses. Archived from the original on 2012-08-05.
[8] Vesselin Bontchev (1996). "Possible macro virus attacks and how to prevent them". Virus, Macros, Safety of Macros. 15 (7): 595. doi:10.1016/S0167-4048(97)88131-X.
[9] Paul Docherty; Peter Simpson (1999). "Macro attacks: What next after Melissa?". Viruses, Safety of Macros. 18 (5): 391–395. doi:10.1016/S0167-4048(99)80084-4.
[10] Polymorphics macros
[11] "Macro Virus Colors". Archived from the original on 2012-11-22. Retrieved 2012-12-15.
[12] DOT extension
[13] Byeongcheol Lee; Robert Grimm; Martin Hirzel; Kathryn S. McKinley (2012). "Marco: Safe, Expressive Macros for Any Language". Safety of Macros, Macro Language. Lecture Notes in Computer Science. 7313: 589–613. doi:10.1007/978-3-642-31057-7_26. ISBN 978-3-642-31056-0.
[14] Steven E. Ganz; Amr Sabry; Walid Taha (2001). "Macros as multi-stage computations: Type-safe, generative, binding macros in MacroML". macro language.

