Tuesday, January 19, 2010

5.6 Macro Substitution





The flexibility of the C preprocessor is often used to define simple functions as macros.[44]

[44] netbsdsrc/usr.bin/yacc/defs.h:131–133



#define IS_IDENT(c) (isalnum(c) || (c) == '_' || \
    (c) == '.' || (c) == '$')
#define IS_OCTAL(c) ((c) >= '0' && (c) <= '7')
#define NUMERIC_VALUE(c) ((c) - '0')

When reading code that contains macros, keep in mind that macros are neither functions nor statements. Careful coding practices need to be employed when macros are defined to avoid some common pitfalls. Consider the way c is parenthesized in the definition of the NUMERIC_VALUE macro above. Using c without parentheses would cause an expression of the type NUMERIC_VALUE(charval = b) to evaluate as charval = (b - '0') instead of the intended (charval = b) - '0'. By parenthesizing all macro arguments the code avoids precedence problems. A more difficult case concerns the number of times macro arguments are evaluated. Writing code of the form



while (IS_IDENT(a = getchar()))
    putchar(a);

would expand into



while ((isalnum(a = getchar()) || (a = getchar()) == '_' ||
    (a = getchar()) == '.' || (a = getchar()) == '$'))
    putchar(a);

reading as many as four different characters in every loop iteration, which is certainly not what was intended. In general, be very careful when you see macro arguments that have side effects such as assignment, pre/post increment/decrement, input/output, and function calls with these effects.


When you see macros used as statements, different problems may result. Consider the definition[45]

[45] netbsdsrc/lib/libc/stdlib/malloc.c:130



#define ASSERT(p) if (!(p)) botch(__STRING(p))

used in the following context.



if (k > n)
    ASSERT(process(n) == 0);
else
    ASSERT(process(k) == 0);

Once the macro expands the code will read as



if (k > n)
    if (!(process(n) == 0))
        botch(__STRING(process(n) == 0));
else
    if (!(process(k) == 0))
        botch(__STRING(process(k) == 0));

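which is not what was intended: the else binds to the nearest if, the one introduced by the first ASSERT, so the second check runs only when k > n and process(n) returns 0, and nothing is checked at all when k <= n. To avoid this problem we could try to define macros containing an if statement as a block.[46]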

[46] netbsdsrc/lib/libc/regex/engine.c:128



#define NOTE(str) \
{ if (m->eflags&REG_TRACE) printf("=%s\n", (str)); }

However, using the NOTE macro in the context



if (k > l)
    NOTE("k>l");
else
    process(k);

will expand into



if (k > l) {
    if (m->eflags&REG_TRACE) printf("=%s\n", ("k>l"));
}; else
    process(k);

which will not compile, since there is an extraneous semicolon before the else. For these reasons, in macro definitions you will often encounter statement sequences inside a do { ... } while (0) block.[47]

[47] netbsdsrc/sys/vm/vm_swap.c:293–297



#define getvndxfer(vnx) do { \
    int s = splbio(); \
    (vnx) = (struct vndxfer *)get_pooled_resource(&vndxfer_head); \
    splx(s); \
} while (0)

The blocks, apart from creating a scope where local variables can be defined, are enclosed within a do ... while statement to protect if statements from unwanted interactions. As an example, the code[48]

[48] netbsdsrc/sys/vm/vm_swap.c:101–104



#define DPRINTF(f, m) do { \
    if (vmswapdebug & (f)) \
        printf m; \
} while (0)

can be placed in an if branch without any of the problems we described above. The result of the macro expansion is not a statement but a statement fragment that requires a following semicolon in all circumstances. Therefore an invocation of a macro that expands to such a form must be written with a following semicolon; there is no choice in the matter. Thus such a macro invocation will always look exactly like a statement and the user of the macro cannot be misled or confused in the ways we discussed.


An alternative approach involves coding conditional operations as expressions by using the C ?: operator or the short-circuit evaluation property of the Boolean operators.[49]

[49] netbsdsrc/lib/libc/regex/regcomp.c:153–154



#define SEETWO(a, b) (MORE() && MORE2() && PEEK() == (a) && \
PEEK2() == (b))
#define EAT(c) ((SEE(c)) ? (NEXT(), 1) : 0)

Figure 5.20 Macros using locally defined variables.


#define PEEK() (*p->next)
#define GETNEXT() (*p->next++)
#define MORE() (p->next < p->end)

static void
p_str(register struct parse *p)
{
    REQUIRE(MORE(), REG_EMPTY);
    while (MORE())
        ordinary(p, GETNEXT());
}
[...]
static int /* the value */
p_count(register struct parse *p)
{
    register int count = 0;
    register int ndigits = 0;

    while (MORE() && isdigit(PEEK()) && count <= DUPMAX) {
        count = count*10 + (GETNEXT() - '0');
        ndigits++;
    }


Although the macro code above looks almost like a function definition, it is crucial to remember that the sequence is not called, like a function, but is lexically replaced at each point it is invoked. This means that identifiers within the macro body are resolved within the context of the function body they appear in and that any variable values modified within the macro body are propagated to the point the macro was "called." Consider the use of the argument p in the definition of the macros PEEK, GETNEXT, and MORE (Figure 5.20:1).[50] The p refers to the argument of the function p_str (Figure 5.20:2) when used in that function and the argument of the function p_count later (Figure 5.20:3).

[50] netbsdsrc/lib/libc/regex/regcomp.c:150–677


In addition, variables passed to macros can be modified inside the macro body, as demonstrated by the following code.[51]

[51] netbsdsrc/bin/rcp/rcp.c:595–606



#define getnum(t) (t) = 0; while (isdigit(*cp)) (t) = (t) * 10 \
    + (*cp++ - '0');
cp = buf;
if (*cp == 'T') {
    setimes++;
    cp++;
    getnum(mtime.tv_sec);
    if (*cp++ != ' ')
        SCREWUP("mtime.sec not delimited");
    getnum(mtime.tv_usec);
    if (*cp++ != ' ')
        SCREWUP("mtime.usec not delimited");
    getnum(atime.tv_sec);

In the above example each argument to getnum is assigned a value according to the string pointed to by cp. If getnum were defined as a proper C function, a pointer to the respective variable would have to be passed instead.


One final difference between macro substitution and function definitions results from the preprocessor's ability to concatenate macro arguments using the ## operator. The following code[52]

[52] netbsdsrc/sys/arch/i386/i386/vm86.c:153–154



#define DOVREG(reg) tf->tf_vm86_##reg = (u_short) \
vm86s.regs.vmsc.sc_##reg
#define DOREG(reg) tf->tf_##reg = (u_short) vm86s.regs.vmsc.sc_##reg

DOVREG(ds);
DOVREG(es);
DOVREG(fs);
DOVREG(gs);
DOREG(edi);
DOREG(esi);
DOREG(ebp);

will expand into



tf->tf_vm86_ds = (u_short) vm86s.regs.vmsc.sc_ds;
tf->tf_vm86_es = (u_short) vm86s.regs.vmsc.sc_es;
tf->tf_vm86_fs = (u_short) vm86s.regs.vmsc.sc_fs;
tf->tf_vm86_gs = (u_short) vm86s.regs.vmsc.sc_gs;
tf->tf_edi = (u_short) vm86s.regs.vmsc.sc_edi;
tf->tf_esi = (u_short) vm86s.regs.vmsc.sc_esi;
tf->tf_ebp = (u_short) vm86s.regs.vmsc.sc_ebp;

thus dynamically creating code to access structure fields with names such as tf_vm86_ds and sc_ds.


Older, pre-ANSI C compilers did not support the ## operator. However, they could be tricked into concatenating two tokens by separating them by an empty comment (/**/); you are likely to encounter this use when reading pre-ANSI C code. In the following code excerpt you can see exactly how the effect of the ## operator was achieved in earlier programs.[53]

[53] netbsdsrc/sys/arch/bebox/include/bus.h:97–103



#ifdef __STDC__
#define CAT(a,b) a##b
#else
#define CAT(a,b) a/**/b
#endif

Exercise 5.20
Locate all definitions of the min and max macros in the source code base, counting the ones that are correctly defined and the ones that can result in erroneous code.


Exercise 5.21
C++ and some C compilers support the inline keyword to define functions that are directly compiled at the point they are called, avoiding the associated function call overhead and providing the compiler with additional optimization opportunities in the context of the caller. Discuss how this facility relates to macro definitions in terms of readability, maintainability, and flexibility of code that uses each method.


Exercise 5.22
Compare the readability of C++ template-based code (Section 9.3.4) with similar functionality implemented by using macros. Create, using both approaches, a simple example and compare the compiler error messages after introducing a syntax and a semantic error in your code.




