Metaprogramming custom control structures in C

by Simon Tatham

This article describes a technique for using the C preprocessor to implement a form of metaprogramming in C, allowing a programmer to define custom looping and control constructions which behave syntactically like C's own for, while and if but manage control flow in a user-defined way. The technique is almost all portable C89, except that some constructions need the feature (available in both C99 and C++) of defining a variable in the initialiser of a for statement. Sample code is provided.

1. Motivation

The existing control constructions in C need no introduction. Every C programmer is familiar with if, while, do…while, and for statements, and their effects on control flow. Every so often, though, you come across a situation in which it would be nice to be able to invent slightly different control mechanisms.

As one example, if you're iterating over a circularly linked list without a distinguished head element, you might find it inconvenient that for does its test at the head of the loop, because the apparently obvious way to iterate over such a list (comparing your current-element pointer to the list head in the test clause) will execute zero times. So you might like a variant of for that does its test at the end of the loop:

for_after (elephant *e = head; e != head; e = e->next)
    do_stuff_with(e);

or, better still, one that has a separate test for the first iteration, so you can also check if the list is completely empty:

for_general (elephant *e = head; e != NULL; e != head; e = e->next)
    do_stuff_with(e);

There's also a lot of scope for program-specific control constructions, if you find yourself writing some particular kind of loop a lot in a specific code base. For instance, suppose you have some kind of API for retrieving a list of things, and the API requires a lot of setup and teardown and function calls. You might find that even the simplest of loops to actually do something with your list looks a bit like this:

{
    hippo_retrieval_context *hc;
    hc = new_hippo_retrieval_context();
    while (hippo_available(hc)) {
        hippo *h = get_hippo(hc);
        printf("We have a hippo called %s\n", h->name);
        free_hippo(hc, h);
    }
    free_hippo_retrieval_context(hc);
}

and if you find yourself needing to iterate over these lists a lot, then your program will become tediously full of copies of that kind of boilerplate. In such a program you might feel that it would be nice to wrap all that machinery up into a macro defined in your own program's headers, so that you could write just the part of the loop that needed to change every time:

FOR_EACH_HIPPO (hippo *h)
    printf("We have a hippo called %s\n", h->name);

If you were doing it like this, you'd probably also like to arrange that break statements were handled sensibly. In the above example, if the code which actually deals with a list element wants to terminate the loop, it probably has to make sure to write a second copy of the free_hippo call before its break statement, because the copy at the end of the while loop body will be skipped by the break. If you were inventing a custom loop construction, you'd like to arrange that break had configurable handling, so that it automatically cleaned up this sort of thing.

It's easy to imagine that if you felt strongly enough about wanting this sort of thing in your language, you could do it by inventing a sort of secondary preprocessor, which ran after the standard C preprocessor and knew enough about C syntax to be able to recognise things like statements, blocks, declarations and expressions. Then you could define ‘macros’ which looked rather like additional grammar rules, and implement your extra loop constructions as those.

But in fact, one need not go that far. There is already a way to implement almost exactly the above control constructions in standard C, if you're prepared to be devious enough with the existing C preprocessor. In this article, I'll show how, and provide sample code.

(I say ‘almost’, because due to limitations of C macro syntax, the one thing in the above snippets that can't be arranged is the separation of clauses in for_after and for_general with semicolons rather than commas.)

2. Mechanism

If we're going to build custom loop constructions of this type, then how can we do it?

If we want our finished loop construction to be used by means of syntax like this:

MY_LOOP_KEYWORD (parameters)
    statement or braced block

then it's clear that we're going to have to define MY_LOOP_KEYWORD as a macro, and also that the macro must expand to a statement prefix: that is, something which you can put on the front of a valid C statement to yield another valid C statement.

So what do those constraints allow us to do? Well, there are several types of statement prefix in the C syntax:

a label
while (stuff)
for (stuff; stuff; stuff)
if (stuff) {stuff} else
more than one of the above, one after another.

(There's also switch (stuff) and case labels, but we'd like to avoid those if possible because of their side effect of interfering with case labels from an outer switch. It turns out we don't need them anyway; the above list is sufficient.)

So we're going to explore the range of possibilities allowed by defining a macro to expand to a chain of those types of thing, and then prefixing it to a user-supplied statement.

The critical component in the list above is the if…else statement prefix, because it allows us to provide a braced statement block of our own, in which we can put code of our choice. This will of course be vital for any loop construction which has to run specialist code at the start and end, or between iterations.

So, to begin with, here's a construction that lets us run code of our choice and then run the user's code. Suppose we define our macro so that, when followed by a statement or block, it expands to this:

if (1) {
    /* code of our choice */
    goto body;
} else
    body:
        { /* block following the expanded macro */ }

You can see how this works by following the control flow through from the top. We come into the if statement; the condition is always true, so we execute the ‘code of our choice’ section first; then we reach the goto, which conveniently lets us get into the else clause of the same if statement even though that would not otherwise have been executed at all. So we execute our code, and then the user's code.
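
As a minimal sketch, the construction above could be wrapped in a macro as follows. The name RUN_BEFORE, the label name and the demo function are invented here for illustration and are not taken from the header described in section 4; the fixed label name also means this version can be used only once per function.

```c
/* RUN_BEFORE is a hypothetical name invented for this sketch. It runs
 * `code` and then the statement that follows the macro invocation. The
 * label name is fixed, so it can appear at most once per function. */
#define RUN_BEFORE(code)                \
    if (1) {                            \
        code;                           \
        goto run_before_body;           \
    } else                              \
        run_before_body:

/* Records the order of execution as decimal digits. */
int before_demo(void)
{
    int trace = 0;
    RUN_BEFORE(trace = trace * 10 + 1)
    {
        trace = trace * 10 + 2;     /* the user's block */
    }
    return trace;
}
```

Calling before_demo() should return 12: the digit 1 is appended by the injected code and the digit 2 by the block that follows the macro.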

That was nice and easy. Now what if we want to run code of our choice after the user's loop body?

Well, we certainly can't do that using only if and labels, because those can't stop execution from falling off the end of the user's block and on to the following code. The code we want to run afterwards has to be written above the user's code, which means control flow has to move backwards in the source file – and to do that, we have to use a loop statement. So we can do this, for example:

if (1)
    goto body;
else
    while (1)
        if (1) {
            /* code of our choice */
            break;
        } else
            body:
                { /* block following the expanded macro */ }

As before, we first go into the then-clause of the outermost if, which contains a goto that jumps us directly to the user's block. We execute that block, but then what? Well, that block is enclosed in a while (1) statement, so we now go back round to the top of the while and enter the then-clause of the inner if – where we then run code of our choice, and after that, execute the break statement which terminates the loop. So we've successfully injected code to be run after the user's code has been executed. And still the whole construction consists solely of a chain of statement prefixes prepended to the user's block, so we could feasibly define a macro to expand to all but the last line of the above snippet.
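
The same snippet, packaged as a macro, might look like the sketch below. RUN_AFTER and its label name are assumptions made for this example only, and the fixed label again limits it to one use per function.

```c
/* RUN_AFTER is a hypothetical name invented for this sketch. It runs the
 * statement that follows the macro first, and then runs `code`. */
#define RUN_AFTER(code)                 \
    if (1)                              \
        goto run_after_body;            \
    else                                \
        while (1)                       \
            if (1) {                    \
                code;                   \
                break;                  \
            } else                      \
                run_after_body:

/* Records the order of execution as decimal digits. */
int after_demo(void)
{
    int trace = 0;
    RUN_AFTER(trace = trace * 10 + 2)
    {
        trace = trace * 10 + 1;     /* the user's block runs first */
    }
    return trace;
}
```

Here after_demo() should also return 12, but this time the digit 1 comes from the user's block and the digit 2 from the code supplied to the macro, even though that code is written above the block in the source.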

So this is beginning to look pretty promising. We can use both of the techniques above to get code to run before and after the user's code; we can further mess about with the control flow by scattering extra labels all over the place and have our inserted code blocks test conditions, think hard about what to do, and then issue an appropriate goto.

Another thing we may well want to do is to bring variable declarations into scope, so that they can be accessed both by our added code in if blocks and by the user's code itself. This is unfortunately not feasible in old-style C89, since in that you can only open a new scope with an open brace character, and that would mean the user would have to provide an extra closing brace after their statement, which would look ugly and (worse still) confuse editors' automatic C indentation policies.

But if you're willing to allow yourself the one extra language feature of being able to use a declaration as the initialiser clause of a for statement (which is legal in both C99 and C++, so the chances are that your compiler probably has some mode that supports it), then suddenly it becomes possible to bring any declaration you like into scope.

So an obvious approach is to put the declaration in a for, and then use exactly the same technique as above to stop the for from actually causing a loop: i.e. repeat exactly the previous code snippet but replace the while with a for.

That's not quite ideal, though, because execution jumps over the declaration rather than passing through it. If the declaration doesn't include an initialiser, that makes no difference; but you might want to declare and initialise the variable in one go. One reason for that might be if you're working in C++ and need the constructor to be called; another reason might be if the user was providing a parameter to the loop macro which you wanted to treat as part of a combined declaration and initialiser, such as the ‘hippo *h’ in one of the examples in section 1. (If the user provides a macro parameter of the form ‘type *var’, then we can put that parameter before an assignment to produce a declaration-with-initialiser, but we can't extract the variable name on its own in order to do the declaration and initialisation separately.)

So, can we fix that? Well, we'll have to do two things. One is to recover control after the user's block runs (but we can do that using the while-based construction above), and the other is to find a way to transfer control back out of the for without falling into the trap of writing a break which is executed in the context of the while.

For this, a helpful trick is to put a label inside an if (0). Like this:

if (0)
    finished: ;
else
    for (/* declaration of our choice */ ;;)
        if (1)
            goto body;
        else
            while (1)
                if (1)
                    goto finished;
                else
                    body:
                        { /* block following the expanded macro */ }

So the initial if (0) is ignored and we begin executing the for statement, including its declaration. The construction inside the for should now be familiar from the previous example: its job is to execute the user's block and then loop back round to the goto finished statement, which transfers control into the then-clause of the outermost if. From there, control bypasses the whole else-clause and drops out of the bottom. So this approach to declarations works.
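
For illustration, here is that construction as a macro, in a sketch that needs C99 (or C++) for the declaration inside the for. WITH_DECL and the label names are hypothetical and exist only for this example.

```c
/* WITH_DECL is a hypothetical name invented for this sketch. It brings
 * `decl` into scope for the statement that follows, passing through its
 * initialiser on the way in, and leaves via the label inside if (0). */
#define WITH_DECL(decl)                                 \
    if (0)                                              \
        with_decl_finished: ;                           \
    else                                                \
        for (decl;;)                                    \
            if (1)                                      \
                goto with_decl_body;                    \
            else                                        \
                while (1)                               \
                    if (1)                              \
                        goto with_decl_finished;        \
                    else                                \
                        with_decl_body:

int decl_demo(void)
{
    int result = 0;
    WITH_DECL(int x = 41)
    {
        result = x + 1;     /* x is initialised and visible here */
    }
    return result;
}
```

decl_demo() should return 42, which shows both that the initialiser was executed and that the for did not loop.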

Now what about handling break? If you think about what would happen to some of the above snippets if the suffixed block executes break, you find that the first one (that executes code of our choice before the user's code) doesn't interfere with break at all, so that it would still terminate the next outermost loop or switch; but the other two, in which the user's code is embedded in a while or a for, both have the side effect of changing the meaning of break within the suffixed block so that it just terminates the while or for. This will be nasty if we have to embed either of those constructs inside an actual loop (by which I mean one intended to execute the user's code multiple times, unlike the above dummy loops that we always break out of after one iteration), because then a break in the user's code won't terminate the real loop, only the current iteration of the loop body – in other words, it'll become synonymous with continue, which isn't very useful.

At first sight one is inclined to think that since the problem arose due to a side effect of using a C loop keyword, we surely can't solve the problem except by removing the loop. But in fact, we head in the other direction: we can recover useful handling of break by adding another layer of loop! Specifically, we put the user's code inside two nested while loops, like this:

if (1)
    goto body;
else
    while (1)
        if (1) {
            /* we reach here if the block terminated by break */
        } else
            while (1)
                if (1) {
                    /* we reach here if the block terminated normally */
                } else
                    body:
                        { /* block following the expanded macro */ }

As in the previous examples, the outermost if jumps us into the innermost loop and executes the user's code block. If that block terminates normally, then we loop round to the top of the inner while, and execute a code snippet of our own choice. But if the user's block terminated by means of a break statement, that will terminate the inner while and cause us to loop round to the top of the other one. So we've now arranged for control flow to go to different places based on whether the user issued a break or not; and of course each of the code snippets we provide above can act on that information as it sees fit, including in particular jumping to a location of its choice via goto.
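
A sketch of the two-loop arrangement as a macro is shown below. The snippet above leaves the two handlers empty; this version assumes that each handler should simply leave the construction afterwards, and does so by jumping to a label placed inside an if (0) as described earlier. CATCH_BREAK and the label names are invented for the example.

```c
/* CATCH_BREAK is a hypothetical name invented for this sketch. It runs
 * `on_break` if the following statement exits via break, or `on_normal`
 * if it runs to completion, and then leaves the whole construction. */
#define CATCH_BREAK(on_break, on_normal)                \
    if (0)                                              \
        catch_break_done: ;                             \
    else if (1)                                         \
        goto catch_break_body;                          \
    else                                                \
        while (1)                                       \
            if (1) {                                    \
                on_break;                               \
                goto catch_break_done;                  \
            } else                                      \
                while (1)                               \
                    if (1) {                            \
                        on_normal;                      \
                        goto catch_break_done;          \
                    } else                              \
                        catch_break_body:

/* Returns 1 if the block exited via break, 2 if it ended normally. */
int break_demo(int do_break)
{
    int outcome = 0;
    CATCH_BREAK(outcome = 1, outcome = 2)
    {
        if (do_break)
            break;      /* terminates only the inner while (1) */
    }
    return outcome;
}
```

With these definitions break_demo(1) should return 1 and break_demo(0) should return 2.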

3. Handling else clauses

I've shown that it's possible to write an almost arbitrary control structure by this mechanism which expects a single block of code after it and arranges to call that block in a user-defined looping setup.

What if you want to pass more than one block of code to your control structure, as you can with the built-in if…else?

For instance, Python allows an else clause on the end of a for loop, which is executed if the loop terminates normally but skipped if it terminates by break. This is ideal for situations in which the for loop is searching for a suitable element of a list, and you want special-case handling if no such element turned out to exist. If C had that feature too, then you'd be able to write things like:

for (i = 0; i < n; i++) {
    if (array[i] == the_droid_we_are_looking_for)
        break;
} else {
    /* only executed if no array element matched */
    move_along();
}

whereas currently you have to do this by testing after the loop to see if i==n, or if the loop conditions weren't as simple as that then you might resort to something even uglier like declaring an extra flag variable.

Or you might make up your own constructions. I occasionally feel that it would be nice to put an else clause on a while loop, with the semantics (this time not like Python's) that the else clause is executed if and only if the main loop body was run zero times. This would be handy, for example, in cases where you're printing a message to the user every time you go round the loop but you feel it would be unfriendly not to print anything if you're not going round at all:

while ((p = get_a_thing()) != NULL) {
    printf("Processing %s\n", p->name);
    do_stuff_with(p);
} else {
    /* only executed if the condition was false the first time */
    printf("No things to process\n");
}

Again, to do this in standard C you have to do something fairly ugly, such as putting the while inside an if with the same condition, and also turning it into a do…while if the condition (like this one) has side effects that you need to avoid duplicating the first time round the loop.

Another output-related example is that of printing a collection of strings with commas between them, so that however many actual output values you're printing (at least, if it was non-zero) you print one comma fewer. I've always done that by means of a variable storing a separator string:

sep = "";    /* the first value has nothing before it */
while ((p = get_a_thing()) != NULL) {
    printf("%s%d", sep, p->value);
    sep = ",";   /* all subsequent values are preceded by a comma */
}

But it would be cute if you could avoid the extra variable declaration, by writing something like

while_interleaved ((p = get_a_thing()) != NULL)
    printf("%d", p->value);
and_in_between
    putchar(',');

and relying on the loop construct to do the job of arranging to run the second block one time fewer than the first.

All of these types of control structure can be implemented by extending the mechanism described in section 2. If we want to define a loop macro whose invocation has a syntactic form like this:

MY_LOOP_KEYWORD (parameters)
    statement or braced block
else
    statement or braced block

then all we have to do is to insert an unmatched if statement somewhere in our chain of statement prefixes, and then it will match an else written by the user after their first statement or block.

A reasonably general way to arrange this is by using an if (0) with a label on each side, like this:

else_clause:
    if (0)
        then_clause:
            { /* first block following the expanded macro */ }
    else
        { /* second block following the expanded macro */ }

(In case it's becoming unclear, our macro in this case would expand to everything up to and including the then_clause label; everything after that is code written by the user following the invocation of our control macro.)

Now we place other control constructions outside that one which can execute either goto then_clause (which will jump straight to the user's main loop body) or goto else_clause (which will jump to just before the if (0), and therefore head for the else clause).

Doing that has the unfortunate effect that control flow will go to the same place after executing either of the user's blocks. To get around that, we can put additional control constructions just before or after the then_clause label; those will only be run for one of the two clauses, so now we can arrange (by the techniques shown above) to redirect control to a different place after each one. For example, a general approach might look like this:

while (1)
    if (1) {
        /* we reach here after the user's else clause */
        goto somewhere;
    } else
        else_clause:
            if (0)
                while (1)
                    if (1) {
                        /* we reach here after the then clause */
                        goto somewhere_else;
                    } else
                        then_clause:
                            { /* user's first block */ }
            else
                { /* user's second block */ }

I'm assuming that code further up will jump to either the then_clause label or the else_clause one. If the former, we immediately execute the user's then-clause, then we go round to the top of the inner while and reach one of our two code snippets. If the latter, then we execute the if (0), which drops us through to the user's else clause (shown above with indentation reflecting its place in the real syntactic structure, though of course the user will indent it rather differently), and since that's not inside the inner while at all, we would then loop round to the top of the outer while and execute a different snippet.

So that demonstrates how to make use of two code blocks provided by the user. They will have to be separated by the keyword else, or else none of this trickery will work; but of course you can always trivially #define some synonyms for else to make the code look nicer, such as and_in_between in the last example above.
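
To make this concrete, here is one possible sketch of the ‘while with an else clause that runs only if the body ran zero times’ idea from earlier in this section. The names WHILE_OR_ELSE and or_else, and the label names, are all invented for this example. The condition appears twice in the expansion but is evaluated only once per test at run time, so conditions with side effects should behave as they would in an ordinary while.

```c
/* WHILE_OR_ELSE and or_else are hypothetical names invented for this
 * sketch. The body loops while `cond` holds; the clause after or_else
 * runs only if `cond` was already false at the very first test. */
#define WHILE_OR_ELSE(cond)                                     \
    if (0)                                                      \
        woe_done: ;                                             \
    else if (cond)                                              \
        goto woe_then;                                          \
    else if (1)                                                 \
        goto woe_else;                                          \
    else                                                        \
        while (1)                                               \
            if (1)                                              \
                goto woe_done;                                  \
            else                                                \
                woe_else:                                       \
                    if (0)                                      \
                        while (1)                               \
                            if (1) {                            \
                                if (cond)                       \
                                    goto woe_then;              \
                                goto woe_done;                  \
                            } else                              \
                                woe_then:
#define or_else else

/* Adds 10 per iteration, or returns -1 if there were no iterations. */
int count_or_flag(int n)
{
    int i = 0, result = 0;
    WHILE_OR_ELSE (i < n)
    {
        result += 10;
        i++;
    }
    or_else
    {
        result = -1;
    }
    return result;
}
```

count_or_flag(0) should return -1 and count_or_flag(3) should return 30. A break in the main body terminates the inner while (1) and then reaches the goto woe_done at the top of the outer one, so it leaves the loop without running the else clause.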

4. Construction kit

Hopefully the previous sections have shown that the general technique of expanding a macro into a well-chosen collection of statement prefixes is surprisingly powerful, and contains all the needed functionality to implement a wide range of looping constructions.

However, actually doing it by chaining together ifs and whiles and labels is a bit of a headache, and if you were trying to define a custom loop construction in a particular application to cope with some inconvenient piece of API (as in one of my motivating examples above) then you might very well run out of patience before getting the sequence of bits and pieces quite right. It would be nicer to have a pre-packaged collection of the snippets in the previous sections, in a form that was reasonably easy for a user to put together into whatever loop construct was most useful to them that day. A sort of ‘loop construction kit’.

Well, all of the trickery shown above has a nice property: because it's all in the form of statement prefixes, it's all composable. So you could quite feasibly define macros to do jobs like ‘execute this code before the suffixed block’, ‘execute this code after the suffixed block’, ‘bring this declaration into scope’, ‘catch and handle break’, and so on, and have each of those macros expand to a statement prefix. Then a user could define a loop macro simply by means of chaining together a collection of those prefixes.

So I've written one of these, and it's available for download at the bottom of this page. I won't document it in full in this article, because the main documentation is in comments in the header file itself and it's easier not to try to keep it in sync in two places; but here's an example of it in use. The following definition, if you've included my header file first, constructs exactly the FOR_EACH_HIPPO loop type described in section 1:

#define FOR_EACH_HIPPO(loopvar)                                 \
    MPP_DECLARE(1, hippo_retrieval_context *_hc)                \
    MPP_BEFORE(2, _hc = new_hippo_retrieval_context())          \
    MPP_AFTER(3, free_hippo_retrieval_context(_hc))             \
    MPP_WHILE(4, hippo_available(_hc))                          \
    MPP_BREAK_CATCH(5)                                          \
    MPP_DECLARE(6, hippo *_h = get_hippo(_hc))                  \
    MPP_DECLARE(7, loopvar = _h)                                \
    MPP_BREAK_THROW(5)                                          \
    MPP_FINALLY(8, free_hippo(_hc, _h))

I hope you'll agree that that sort of thing is a lot easier to write (and read) than the elaborate constructions in the previous sections! And yet it achieves more, by pasting together many things of about the size of the above snippets.

Most of the example should be reasonably clear, but I'll talk through it anyway just in case:

MPP_DECLARE brings a declaration into scope, specifically the hippo_retrieval_context that the original version of the loop had to instantiate surrounding the loop as a whole.
MPP_BEFORE and MPP_AFTER arrange to run the supplied pieces of code before and after the code that follows, i.e. before and after the whole loop. So the hippo_retrieval_context is allocated at the start of the loop, and freed when the loop terminates.
MPP_WHILE is the loop itself, and includes the termination condition.
The next two MPP_DECLAREs declare variables with scope inside the loop, holding the actual value retrieved by get_hippo. The second one refers to loopvar, the macro parameter passed in by the user.
MPP_BREAK_CATCH and MPP_BREAK_THROW are used to get round the problem discussed in section 2, where using a for loop to bring a declaration into scope has the side effect of causing break to do something unhelpful. MPP_BREAK_THROW is a macro which detects when the user's code has issued a break (by the technique shown in section 2), and responds by issuing a goto to a label defined by the corresponding MPP_BREAK_CATCH, which in turn issues another break statement. So any break written by the user will be propagated past the two dangerous MPP_DECLAREs, and instead will terminate the MPP_WHILE loop as the user really wanted.
MPP_FINALLY is another break-handling macro. It arranges that the free_hippo call takes place no matter whether the user exited the block naturally (by falling off the end) or by break.

The numbers used as the first parameter to each MPP_ macro are called ‘label IDs’. Most of those macros have to define labels somewhere in their structure and jump to them using gotos, as you can see in the code snippets in the previous sections. Those code snippets use fixed label names for simplicity; but of course in serious use you can't do that, or else you'd never be able to use the same MPP_ macro even twice in the same function, let alone twice in the same loop macro (as we do above, with three instances of MPP_DECLARE). We get round this by constructing label names using the C preprocessor's token-pasting operator ##: each label includes __LINE__ to ensure that multiple invocations of the same loop macro define different labels, and also includes the label ID passed to the macro defining the label. So the constraint is just that each separate MPP_ macro used in a loop construct definition must have a different label ID, to stop them colliding with each other. But it doesn't matter what the label IDs actually are; I use numbers above for brevity, but you could use descriptive names if you prefer.
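
The label-naming scheme can be sketched as follows. This is not the header's actual code: the SKETCH_ names are invented here, and the only point being illustrated is the usual two-level macro idiom needed to make __LINE__ expand to a number before ## pastes it into an identifier.

```c
/* All SKETCH_ names are hypothetical and invented for this example. */

/* Two levels are needed: SKETCH_CAT expands its arguments (turning
 * __LINE__ into a number) before SKETCH_CAT_ pastes them together. */
#define SKETCH_CAT_(a, b, c) a##b##c
#define SKETCH_CAT(a, b, c) SKETCH_CAT_(a, b, c)
#define SKETCH_LABEL(id, name) SKETCH_CAT(sketch_##name##_, __LINE__, _##id)

/* Same idea as the first construction in section 2, but with a label
 * that is unique per source line and per label ID. */
#define SKETCH_BEFORE(id, code)                 \
    if (1) {                                    \
        code;                                   \
        goto SKETCH_LABEL(id, body);            \
    } else                                      \
        SKETCH_LABEL(id, body):

/* Two prefixes on the same line, told apart by their label IDs. */
#define TWICE_BEFORE(a, b) SKETCH_BEFORE(1, a) SKETCH_BEFORE(2, b)

/* Records the order of execution as decimal digits. */
int label_demo(void)
{
    int trace = 0;
    TWICE_BEFORE(trace = trace * 10 + 1, trace = trace * 10 + 2)
        trace = trace * 10 + 3;
    SKETCH_BEFORE(1, trace = trace * 10 + 4)    /* ID 1 reused on a new line */
        trace = trace * 10 + 5;
    return trace;
}
```

label_demo() should return 12345. The two prefixes inside TWICE_BEFORE share a line number and differ only in their label IDs, while the later SKETCH_BEFORE reuses ID 1 safely because it sits on a different line.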

(One exception to the unique-IDs rule is that corresponding instances of MPP_BREAK_THROW and MPP_BREAK_CATCH must have the same number, so that one can jump to a label defined in the other. In more complex macros you might have to use two instances of each, and then the numbering makes it clear how they match up.)

You might notice that the variable names invented by the above macro begin with underscores (_hc and _h). This is just a convention I chose to make it unlikely that they'll clash with variable names used by the end user calling the loop macro. You don't have to follow the same convention, of course, but I'd suggest that some convention along those lines is probably useful.

Also on the subject of variable declarations, here's a useful feature of MPP_DECLARE. It places the declaration you give it in the initialisation clause of a for statement – but there's no actual need for the thing in a for statement to be a declaration. So MPP_DECLARE can take a declaration or an ordinary assignment. This is useful in the case where we're assigning to a macro parameter passed in by the user, as in the declaration above assigning to loopvar. It means that the user can call the loop macro as either FOR_EACH_HIPPO(hippo *h), declaring a new variable with scope limited to the loop body, or if they prefer they could instead call it as just FOR_EACH_HIPPO(h) where h is some variable of the right type which was already in scope. By writing the loop macro in the above form, we can arrange for both uses to work.
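
The following sketch shows the declaration-or-assignment point in isolation. SKETCH_DECLARE is a hypothetical stand-in built from the section 2 snippet with fixed label names, not the header's MPP_DECLARE, so it can be used only once per function.

```c
/* SKETCH_DECLARE is a hypothetical name invented for this example. Its
 * argument lands in the first clause of a for statement, which accepts
 * either a declaration (C99) or an ordinary expression. */
#define SKETCH_DECLARE(decl_or_assignment)              \
    if (0)                                              \
        sketch_finished: ;                              \
    else                                                \
        for (decl_or_assignment;;)                      \
            if (1)                                      \
                goto sketch_body;                       \
            else                                        \
                while (1)                               \
                    if (1)                              \
                        goto sketch_finished;           \
                    else                                \
                        sketch_body:

/* The argument is a declaration: v exists only inside the construct. */
int declare_new_variable(void)
{
    int seen = 0;
    SKETCH_DECLARE(int v = 7)
        seen = v;
    return seen;
}

/* The argument is an assignment to a variable already in scope. */
int assign_existing_variable(void)
{
    int v = 0;
    SKETCH_DECLARE(v = 7)
        v += 1;
    return v;       /* still visible, since it was declared outside */
}
```

declare_new_variable() should return 7 and assign_existing_variable() should return 8.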

Another thing in the above code that needs explaining is the distinction between MPP_AFTER and MPP_FINALLY, and why I had to use both in the above definition. Both arrange for code to be run after the suffixed statement terminates, but they have different semantics. Firstly, MPP_AFTER only executes its code snippet if the suffixed statement terminated normally, not by break, whereas MPP_FINALLY executes its statement either way. But the suffixed statement of MPP_AFTER, in the above, is a loop, so any break will stop there. So why wouldn't MPP_FINALLY have done just as well?

The answer is that MPP_FINALLY doesn't just handle break by running some code: it also reissues the break, so that it continues propagating upwards and (in the above example) eventually terminates the loop. That means that MPP_FINALLY expands to a statement prefix which includes a break statement that's not contained in any loop – so it would be illegal C to use MPP_FINALLY in any context where there wasn't a surrounding loop. So I can't use MPP_FINALLY at the top level of my loop construction, even if the break statement in it would never actually be reached.

All of this is documented more fully in comments in the header file itself, along with some additional macros to the ones shown, including ones that absorb a following else clause as discussed in section 3. But the example above should give you an idea of what sort of thing this system can do.

5. Use with coroutines

For my pièce de résistance, here's a mechanism for implementing something very similar to Ruby's ‘iterator method’ mechanism, in which you can define an arbitrary function which is called with a suffixed block of code and can ‘call’ that block, with arguments, anywhere in its own control flow.

If you combine the loop-definition macros described above with my other C preprocessor hack to implement coroutines, you can achieve pretty much the same thing in standard C99!

As an example, let's write some code that generates a sequence of integers: specifically, all those integers which are either a power of 2 or three times a power of 2, in increasing order. If we just wanted to print those numbers, we could use the following snippet of code:

void twothree_up_to(int limit)
{
    int i, tmp;
    for (i = 1; i < limit; i *= 2) {
        printf("%d\n", i); /* a power of 2 */
        if (i > 1) {
            tmp = i + (i >> 1);
            if (tmp < limit)
                printf("%d\n", tmp); /* 3 times a power of 2 */
        }
    }
}

I've deliberately picked an example with slightly fiddly control flow, to show off the technique to the full. It's more convenient to call printf twice in each loop iteration, printing first a power of two and then one-and-a-half times that value, than to fiddle with the loop conditions to arrange exactly one iteration per output number; we also need to allow for the special case that when the power of two is 1 we have to skip the one-and-a-half value, and we must also check the second number printed in each iteration against the provided limit to avoid overrunning by one.

So now let's rewrite that as an ‘iterator’ with more or less Rubyish semantics.

First we must define a set of coroutine macros. For full details of the general technique, see my article ‘Coroutines in C’. Here I'll just observe that the details have to be adjusted from the ones in that article to allow for the state structure being allocated on the stack of the calling function rather than being either dynamically allocated or static:

#define MPCR_BEGIN switch (s->_line) { case 0:
#define MPCR_END(dummyval) s->_line = -1; }
#define MPCR_YIELD(value) do                    \
    {                                           \
        s->_line = __LINE__;                    \
        s->_val = (value);                      \
        return;                                 \
      case __LINE__:;                           \
    } while (0)

These macros expect to be used in a function which takes a parameter called ‘s’, which is a pointer to a structure that contains the coroutine's state. The state structure must in turn contain a member called _line, which tracks the coroutine's next resumption point (and is initialised to zero, and set to the special value -1 to indicate that the coroutine has finished and isn't yielding another value), and another member called _val which is the value passed out of the coroutine to the user's code block in each yield operation.

So now we can rewrite the above function as an iterator using those macros. Of course any local variable in the above function which has to persist across a yield must become an extra field in the state structure, which in the above case means that ‘limit’ and ‘i’ must move but ‘tmp’ is OK as it is:

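struct twothree_state {
    int _line, _val;    /* required by the MPCR_* macros */
    int limit;          /* set by the caller before the first call */
    int i;              /* loop counter; must survive across yields */
};
void twothree_iterator(struct twothree_state *s)
{
    int tmp;            /* never needed after a yield, so can stay local */
    MPCR_BEGIN;
    for (s->i = 1; s->i < s->limit; s->i *= 2) {
        MPCR_YIELD(s->i);               /* the power of two itself */
        if (s->i > 1) {                 /* skip the 1.5x value when i is 1 */
            tmp = s->i + (s->i >> 1);
            if (tmp < s->limit)
                MPCR_YIELD(tmp);        /* 1.5 times the power of two */
        }
    }
    MPCR_END(0);
}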

Now the other end of the mechanism has to be a loop macro which declares an instance of struct twothree_state to keep the iterator's persistent state in, then repeatedly calls the iterator function on that state structure to get an output value, and stops when it terminates. We can build such a macro without any difficulty using the loop construction kit discussed in section 4:

#define TWOTHREE_UP_TO(loopvar, limitval)               \
    MPP_DECLARE(1, struct twothree_state _state)        \
    MPP_BEFORE(2, _state._line = 0;                     \
               _state.limit = limitval;                 \
               twothree_iterator(&_state))              \
    MPP_WHILE(3, _state._line >= 0)                     \
    MPP_BREAK_CATCH(4)                                  \
    MPP_AFTER(5, twothree_iterator(&_state))            \
    MPP_DECLARE(6, loopvar = _state._val)               \
    MPP_BREAK_THROW(4)

(So we have to declare the coroutine's state; set it all up and call the iterator to get the first value; loop until the _line field becomes negative, which happens as a result of reaching MPCR_END and is the signal to terminate the loop; arrange to call the iterator again after executing the loop body; assign the yielded value into the user's specified loop variable; and finally handle break by propagating it past the MPP_AFTER and MPP_DECLARE macros. All of this is more or less the same as the example in section 4.)

After all those definitions, we can now call our loop macro with a statement or block of our choice:

TWOTHREE_UP_TO(int k, 1000)
    printf("%d\n", k);
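
For comparison, the same protocol can be driven by hand, without the construction kit. The sketch below repeats the coroutine macros and iterator from above so that it stands alone; the helper twothree_collect and its parameters are hypothetical names introduced here, and it stores the yielded values in an array rather than printing them, purely so that the result is easy to inspect. The comments indicate which step of TWOTHREE_UP_TO each line corresponds to.

```c
#include <assert.h>
#include <stddef.h>

/* Coroutine macros and iterator, repeated from the text above. */
#define MPCR_BEGIN switch (s->_line) { case 0:
#define MPCR_END(dummyval) s->_line = -1; }
#define MPCR_YIELD(value) do                    \
    {                                           \
        s->_line = __LINE__;                    \
        s->_val = (value);                      \
        return;                                 \
      case __LINE__:;                           \
    } while (0)

struct twothree_state {
    int _line, _val;
    int limit;
    int i;
};

void twothree_iterator(struct twothree_state *s)
{
    int tmp;
    MPCR_BEGIN;
    for (s->i = 1; s->i < s->limit; s->i *= 2) {
        MPCR_YIELD(s->i);
        if (s->i > 1) {
            tmp = s->i + (s->i >> 1);
            if (tmp < s->limit)
                MPCR_YIELD(tmp);
        }
    }
    MPCR_END(0);
}

/* Hypothetical helper: a hand-written stand-in for
 *     TWOTHREE_UP_TO(int k, limit) out[n++] = k;
 * Stores at most max values in out and returns how many were stored. */
size_t twothree_collect(int limit, int *out, size_t max)
{
    struct twothree_state state;        /* declare the state */
    size_t n = 0;

    assert(out != NULL);
    state._line = 0;                    /* set up, fetch first value */
    state.limit = limit;
    twothree_iterator(&state);
    while (state._line >= 0) {          /* stop once the coroutine ends */
        int k = state._val;             /* the user's loop variable */
        if (n == max)
            break;                      /* a plain break suffices here */
        out[n++] = k;                   /* the loop body */
        twothree_iterator(&state);      /* fetch the next value */
    }
    return n;
}
```

With a limit of 1000 and room for at least 19 values, twothree_collect should store 1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512 and 768, which are the values the TWOTHREE_UP_TO loop above prints. Note that in this hand-written form a continue in the body would skip the call that fetches the next value, which is one of the things the macro version takes care of.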

6. Limitations

Of course, whenever you use macros to extend C's syntax, the mechanism comes with a few extra constraints. Only a proper language extension, implemented in the compiler itself, would be able to avoid that.

To begin with, though, here are some things that aren't limitations.

Loops constructed by this mechanism are valid standard C99, or valid standard C++. If you avoid using declarations in for loops (so no MPP_DECLARE, if you're using my construction-kit header) then they're valid C89 as well.

All of these macros are switch/case safe, in the sense that a case label inside the user's block will still be associated with a switch completely outside the loop construction. (That's a consequence of not having used switch in the macros themselves.) Of course, jumping into a loop like that will skip any implicit initialisations and allocations and so on that might be hidden in the loop macro, but nothing in the macros themselves forbids the technique.

All of these macros can handle being followed by either a single statement or a braced block, just like C's built-in loop statements. They're if/else safe (in the sense that you can put an unbraced if…else inside one, or put an unbraced one inside if…else, without causing syntactic confusion), unless of course you've deliberately set them up to eat a following else clause.

Those are the good points. Now for the actual limitations of the technique.

As discussed in the previous sections, the loop-construction macros use goto with labels constructed programmatically using __LINE__. So don't put two loop macros defined using these building blocks on the same source line, or they'll most likely define the same labels twice and cause a compile error.
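
The collision is easy to reproduce in miniature. The following is a hypothetical sketch, not the contents of mp.h: DEMO_LABEL and DEMO_THEN are invented names, and the goto arrangement is just one plausible way for a macro to run some code after the user's statement. What matters is that the label name is built from a caller-supplied id plus __LINE__, so it is unique per source line and no further.

```c
#include <assert.h>

/* Two-level paste, so that __LINE__ is expanded before ## is applied. */
#define DEMO_PASTE(a, b, c) a##b##_##c
#define DEMO_PASTE_X(a, b, c) DEMO_PASTE(a, b, c)

/* Hypothetical label name: unique for each (id, source line) pair. */
#define DEMO_LABEL(id) DEMO_PASTE_X(demo_label_, id, __LINE__)

/* Hypothetical construct: run the following statement, then `after`. */
#define DEMO_THEN(id, after)                                    \
    if (1)                                                      \
        goto DEMO_LABEL(id);                                    \
    else                                                        \
        while (1)                                               \
            if (1) {                                            \
                after;                                          \
                break;                                          \
            } else                                              \
                DEMO_LABEL(id):

int demo_sequence(void)
{
    int log = 0;

    /* Same id both times, but different source lines, so the two
     * expansions define different labels. */
    DEMO_THEN(1, log = log * 10 + 2)
        log = log * 10 + 1;
    assert(log == 12);      /* the body ran first, then the after clause */

    DEMO_THEN(1, log = log * 10 + 4)
        log = log * 10 + 3;

    /* Putting both uses on one source line, as in
     *     DEMO_THEN(1, a) x; DEMO_THEN(1, b) y;
     * would define the same label twice in this function. */
    return log;
}
```

Here demo_sequence() returns 1234. Because the id is fixed inside each loop macro's definition, two uses of the same macro on one line have nothing but __LINE__ to tell their labels apart, and that is the same for both.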

Loops defined using these macros have no way to control the handling of continue. Fortunately, the default handling is probably the right one anyway: in any loop defined using this system, continue will be equivalent to a jump to just before the end of the loop body. (So any post-loop machinery concealed in the loop macro will still be run.)

The mechanism for catching break can only catch break, and won't catch any other kinds of non-local exit from the loop body such as return, goto, longjmp() or exit(). The fact that I called one of my component macros MPP_FINALLY should not mislead you into thinking it's as good as a ‘real’ finally. So if you define a loop by this mechanism which sets up state that has to be cleaned up when the loop finishes, don't write any returns (or anything else on the above list) in the loop body.

The mechanism for accepting an else clause relies on the C syntax rule that every else binds to the nearest unmatched if. This isn't a problem per se, in that no compiler I've ever heard of gets that rule wrong, but unfortunately some compilers give a warning whenever you write code that actually depends on the rule (GCC, for example, reports an ambiguous or ‘dangling’ else as part of -Wparentheses, which -Wall enables). So although in principle else clauses on constructions defined like this are optional, you might find that in practice they're mandatory to avoid those annoying warnings.

And last but not least, if you use this sort of trickery in code you write for your employer, don't be surprised if your next performance review contains a raised eyebrow or two!

7. Downloads

The header file I describe above is available here: mp.h.

You can also download a test program that uses that header file: mptest.c. If compiled normally, that file sets up a number of these loops and test-runs them in various ways; if compiled with the macro EXPECTED defined, it instead compiles to equivalent ‘normal’ C code, so you can check that the two versions give the same output. Compile with C89 defined as well to cut out all the tests that depend on declaring variables in for statements.

Copyright © 2012 Simon Tatham.
This document is OpenContent.
You may copy and use the text under the terms of the OpenContent Licence.
Please send comments and criticism on this article to anakin@pobox.com.