13 Desugaring as a Language Feature
Because we create languages to simplify common tasks, what would a language designed to support desugaring look like? Note that by “look” we don’t mean only its syntax but also its key behavioral properties.
Given that general-purpose languages are often used as a target for desugaring, why don’t they offer desugaring capabilities in the language itself? For instance, this might mean extending a base language with the additional language that is the response to the previous question.
13.1 A First Example
DrRacket has a very useful tool called the Macro Stepper, which shows the step-by-step expansion of programs. You should try all the examples in this chapter using the Macro Stepper. For now, however, you should run them in #lang plai rather than #lang plai-typed.
(let (var val) body)
((lambda (var) body) val)
If this doesn’t sound familiar, now would be a good time to refresh your memory of why this works.
(let (var val) body) -> ((lambda (var) body) val)
(define-syntax my-let-1
  (syntax-rules ()
    [(my-let-1 (var val) body)
     ((lambda (var) body) val)]))
(my-let-1 (3 4) 5)
((lambda (3) 5) 4)
lambda: expected either <id> or `[<id> : <type>]'
for function argument in: 3
This immediately tells us that the desugaring process is straightforward in its function: it doesn’t attempt to guess or be clever, but instead simply rewrites while substituting. The output is an expression that is again subject to desugaring.
As a matter of terminology, this simple form of expression-rewriting is often called a macro, as we mentioned earlier in [REF]. Traditionally this form of desugaring is called macro expansion, though this term is misleading because the output of desugaring can be smaller than the input (though it is usually larger).
(define-syntax my-let-2
  (syntax-rules ()
    [(my-let-2 ([var val] ...) body)
     ((lambda (var ...) body) val ...)]))
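As a quick check of both definitions, the following sketch uses each macro with a well-formed binding. It assumes #lang racket for self-containment (the macros behave the same way in #lang plai), and the names r1 and r2 are purely illustrative.

```racket
#lang racket

;; Single-binding version from the text.
(define-syntax my-let-1
  (syntax-rules ()
    [(my-let-1 (var val) body)
     ((lambda (var) body) val)]))

;; Multi-binding version: the ellipses match zero or more [var val] pairs.
(define-syntax my-let-2
  (syntax-rules ()
    [(my-let-2 ([var val] ...) body)
     ((lambda (var ...) body) val ...)]))

;; Rewritten to ((lambda (x) (+ x 4)) 3)
(define r1 (my-let-1 (x 3) (+ x 4)))

;; Rewritten to ((lambda (x y) (* x y)) 3 4)
(define r2 (my-let-2 ([x 3] [y 4]) (* x y)))
```

Stepping through these uses in the Macro Stepper should show the lambda applications given in the comments.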
13.2 Syntax Transformers as Functions
Earlier we saw that my-let-1 does not even attempt to ensure that the syntax in the identifier position is truly (i.e., syntactically) an identifier. We cannot remedy that with the syntax-rules mechanism, but we can with a much more powerful mechanism called syntax-case. Because syntax-case exhibits many other useful features as well, we’ll introduce it and then grow it gradually.
The notation of syntax-rules, with no explicit parameter name or other “function header”, may not make clear that it is a functional transformation (though the rewriting rule format does allude to this fact).
In desugaring, we specify one atomic function for the entire process. Here, we are actually writing several little functions, one for each kind of new syntactic construct (such as my-let-1), and these pieces are woven together by an invisible function that controls the overall rewriting process. (As a concrete example, it is not inherently clear that the output of a macro is expanded further, though a simple example immediately demonstrates that this is indeed the case.)
Write one or more macros to confirm that the output of a macro is expanded further.
There is one more subtlety. Because the form of a macro looks rather
like Racket code, it is not immediately clear that it “lives in
another world”. In the abstract, it may be helpful to imagine that
the macro definitions are actually written in an entirely different
language that processes only syntax. This simplicity is, however,
misleading. In practice, program transformers—
With that prelude, let’s now introduce syntax-case. We’ll begin by simply rewriting my-let-1 (under the name my-let-3) using this new notation. First, we have to write a header for the definition; notice already the explicit parameter:
(define-syntax (my-let-3 x) <sc-macro-eg-body>)
This binds x to the entire (my-let-3 ...) expression.
As you might imagine, define-syntax simply tells Racket you’re about to define a new macro. It does not pick precisely how you want to implement it, leaving you free to use any mechanism that’s convenient. Earlier we used syntax-rules; now we’re going to use syntax-case. In particular, syntax-case needs to explicitly be given access to the expression to pattern-match:
(syntax-case x () <sc-macro-eg-rule>)
Now we’re ready to express the rewrite we wanted. Previously a rewriting rule had two parts: the structure of the input and the corresponding output. The same holds here. The first (matching the input) is the same as before, but the second (the output) is a little different:
[(my-let-3 (var val) body) #'((lambda (var) body) val)]
Observe the crucial extra characters: #'. Let’s examine what that means.
In syntax-rules, the entire output part simply specifies the structure of the output. In contrast, because syntax-case is laying bare the functional nature of transformation, the output part is in fact an arbitrary expression that may perform any computations it wishes. It must simply evaluate to a piece of syntax.
Syntax is actually a distinct datatype. As with any distinct datatype, it has its own rules for construction. Concretely, we construct syntax values by writing #'; the s-expression that follows it is treated as a syntax value. (In case you were wondering, the x bound in the macro definition above is also of this datatype.)
The syntax constructor, #', enjoys a special property. Inside the output part of the macro, all syntax variables in the input are automatically bound, and replaced on occurrence. As a result, when the expander encounters var in the output, say, it replaces var with the corresponding part of the input expression.
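Putting the header, the syntax-case form, and the rule together gives the complete macro. The sketch below assumes #lang racket; the names result, stx, and the three definitions after it are illustrative additions that show syntax values really are a separate datatype.

```racket
#lang racket

;; The three fragments from the text, assembled into one definition.
(define-syntax (my-let-3 x)
  (syntax-case x ()
    [(my-let-3 (var val) body)
     #'((lambda (var) body) val)]))

;; Rewritten to ((lambda (y) (* y 2)) 10)
(define result (my-let-3 (y 10) (* y 2)))

;; #' builds a syntax value, which is not an ordinary list.
(define stx #'(+ 1 2))
(define stx-is-syntax? (syntax? stx))
(define stx-is-list? (list? stx))

;; syntax->datum strips the syntax wrapper and recovers the s-expression.
(define stx-datum (syntax->datum stx))
```

Here stx is a syntax value wrapping the s-expression (+ 1 2); it answers true to syntax? but is not itself a list.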
Remove the #' and try using the above macro definition. What happens?
So far, syntax-case merely appears to be a more complicated form of syntax-rules: perhaps slightly better in that it more cleanly delineates the functional nature of expansion, and the type of output, but otherwise simply more unwieldy. As we will see, however, it also offers significant power.
syntax-rules can actually be expressed as a macro over syntax-case. Define it.
13.3 Guards
Now we can return to the problem that originally motivated the introduction of syntax-case: ensuring that the binding position of a my-let-3 is syntactically an identifier. For this, you need to know one new feature of syntax-case: each rewriting rule can have two parts (as above), or three. If there are three present, the middle one is treated as a guard: a predicate that must evaluate to true for expansion to proceed rather than signal a syntax error. Especially useful in this context is the predicate identifier?, which determines whether a syntax object is syntactically an identifier (or variable).
Write the guard and rewrite the rule to incorporate it.
(identifier? #'var)
[(my-let-3 (var val) body) (identifier? #'var) #'((lambda (var) body) val)]
Now that you have a guarded rule definition, try to use the macro with a non-identifier in the binding position and see what happens.
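One way to observe the guard without aborting compilation of the whole file is sketched below. It assumes #lang racket and uses convert-syntax-error from the syntax/macro-testing library, which defers an expansion-time syntax error until run time so that it can be caught; the names ok and rejection-message are illustrative.

```racket
#lang racket
(require syntax/macro-testing)

;; my-let-3 with the guard in the middle position of the rule.
(define-syntax (my-let-3 x)
  (syntax-case x ()
    [(my-let-3 (var val) body)
     (identifier? #'var)
     #'((lambda (var) body) val)]))

;; A well-formed use passes the guard and expands as before.
(define ok (my-let-3 (y 3) (+ y 4)))

;; An ill-formed use fails the guard. convert-syntax-error turns the
;; expansion-time error into a run-time exception, which we catch here.
(define rejection-message
  (with-handlers ([exn:fail:syntax? (lambda (e) (exn-message e))])
    (convert-syntax-error (my-let-3 (3 4) 5))
    "no error"))
```

The error is now attributed to my-let-3 itself rather than to the lambda in its expansion, which is the point of adding the guard.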
13.4 Or: A Simple Macro with Many Features
Consider or, which implements disjunction. It is natural, with prefix syntax, to allow or to have an arbitrary number of sub-terms. We expand or into nested conditionals that determine the truth of the expression.
13.4.1 A First Attempt
(define-syntax (my-or-1 x)
  (syntax-case x ()
    [(my-or-1 e0 e1 ...)
     #'(if e0 e0 (my-or-1 e1 ...))]))

> (my-or-1 #f #t)
my-or-1: bad syntax in: (my-or-1)
(if #f #f (my-or-1 #t))
(if #f #f (if #t #t (my-or-1)))
Why is #f the right default?
(define-syntax (my-or-2 x)
  (syntax-case x ()
    [(my-or-2) #'#f]
    [(my-or-2 e0 e1 ...)
     #'(if e0 e0 (my-or-2 e1 ...))]))

(define-syntax (my-or-3 x)
  (syntax-case x ()
    [(my-or-3) #'#f]
    [(my-or-3 e) #'e]
    [(my-or-3 e0 e1 ...)
     #'(if e0 e0 (my-or-3 e1 ...))]))
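The three rules of my-or-3 can be exercised with a few small uses. This is a sketch assuming #lang racket; the names a through d are illustrative.

```racket
#lang racket

(define-syntax (my-or-3 x)
  (syntax-case x ()
    [(my-or-3) #'#f]
    [(my-or-3 e) #'e]
    [(my-or-3 e0 e1 ...)
     #'(if e0 e0 (my-or-3 e1 ...))]))

;; Zero sub-terms: the first rule supplies the default.
(define a (my-or-3))

;; The single-term rule returns the last sub-term itself.
(define b (my-or-3 #f 5))

;; All sub-terms false.
(define c (my-or-3 #f #f))

;; Short-circuiting: the error expression is never evaluated,
;; because the first sub-term is already true.
(define d (my-or-3 7 (error "never reached")))
```

Each recursive use of my-or-3 in the output is itself expanded, so a use with n sub-terms becomes n - 1 nested conditionals.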
13.4.2 Guarding Evaluation
(let ([init #f])
  (my-or-3 (begin (set! init (not init)) init)
           #f))

(let ([init #f])
  (if (begin (set! init (not init)) init)
      (begin (set! init (not init)) init)
      #f))
#'(if e0 e0 ...)
(define-syntax (my-or-4 x)
  (syntax-case x ()
    [(my-or-4) #'#f]
    [(my-or-4 e) #'e]
    [(my-or-4 e0 e1 ...)
     #'(let ([v e0])
         (if v v (my-or-4 e1 ...)))]))
When we repeat the previous example, the one that contained the set!, with my-or-4, the result is #t, as we would have hoped.
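The difference between the two versions can be observed directly by running the same side-effecting use through both. This is a sketch assuming #lang racket; with-or-3 and with-or-4 are illustrative names.

```racket
#lang racket

;; Duplicates e0 in the output, so e0 is evaluated twice when it is true.
(define-syntax (my-or-3 x)
  (syntax-case x ()
    [(my-or-3) #'#f]
    [(my-or-3 e) #'e]
    [(my-or-3 e0 e1 ...)
     #'(if e0 e0 (my-or-3 e1 ...))]))

;; Binds e0 to v first, so e0 is evaluated exactly once.
(define-syntax (my-or-4 x)
  (syntax-case x ()
    [(my-or-4) #'#f]
    [(my-or-4 e) #'e]
    [(my-or-4 e0 e1 ...)
     #'(let ([v e0])
         (if v v (my-or-4 e1 ...)))]))

;; The first sub-term flips init and returns its new value.
(define with-or-3
  (let ([init #f])
    (my-or-3 (begin (set! init (not init)) init) #f)))

(define with-or-4
  (let ([init #f])
    (my-or-4 (begin (set! init (not init)) init) #f)))
```

With my-or-3 the test position flips init to #t and the result position flips it back, so with-or-3 is #f; with my-or-4 the single evaluation yields #t.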
13.4.3 Hygiene
Hopefully now you’re nervous about something else.
What?
(let ([v #t]) (let ([v #f]) (if v v v)))
(let ([v #t]) (or #f v))
(let ([v1 #t]) (or #f v1))
(let ([v1 #t]) (let ([v #f]) v v1))
(let ([v1 #t]) (let ([v2 #f]) v2 v1))
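Because Racket's expander is hygienic, the renaming shown above happens automatically, and this can be checked by using my-or-4 where the programmer has also chosen the name v. This is a sketch assuming #lang racket; user-v-result is an illustrative name.

```racket
#lang racket

(define-syntax (my-or-4 x)
  (syntax-case x ()
    [(my-or-4) #'#f]
    [(my-or-4 e) #'e]
    [(my-or-4 e0 e1 ...)
     #'(let ([v e0])
         (if v v (my-or-4 e1 ...)))]))

;; The programmer's v and the v introduced by the macro are kept apart,
;; so the reference to v in the second sub-term still sees the outer binding.
(define user-v-result
  (let ([v #t])
    (my-or-4 #f v)))
```

A naive textual substitution would instead produce the first expression in this section, in which the inner v shadows the programmer's v and the whole expression evaluates to #f.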
13.5 Identifier Capture
(define os-1
  (object/self-1
   [first (x) (msg self 'second (+ x 1))]
   [second (x) (+ x 1)]))

(define-syntax object/self-1
  (syntax-rules ()
    [(object [mtd-name (var) val] ...)
     (let ([self (lambda (msg-name)
                   (lambda (v) (error 'object "nothing here")))])
       (begin
         (set! self
               (lambda (msg)
                 (case msg
                   [(mtd-name) (lambda (var) val)]
                   ...)))
         self))]))
self: unbound identifier in module in: self
Work through the hygienic expansion process to understand why this error is the expected outcome.
(define os-2
  (object/self-2 self
   [first (x) (msg self 'second (+ x 1))]
   [second (x) (+ x 1)]))

(define-syntax object/self-2
  (syntax-rules ()
    [(object self [mtd-name (var) val] ...)
     (let ([self (lambda (msg-name)
                   (lambda (v) (error 'object "nothing here")))])
       (begin
         (set! self
               (lambda (msg)
                 (case msg
                   [(mtd-name) (lambda (var) val)]
                   ...)))
         self))]))
Work through the expansion of this version and see what’s different.
(define-syntax (object/self-3 x)
  (syntax-case x ()
    [(object [mtd-name (var) val] ...)
     (with-syntax ([self (datum->syntax x 'self)])
       #'(let ([self (lambda (msg-name)
                       (lambda (v) (error 'object "nothing here")))])
           (begin
             (set! self
                   (lambda (msg-name)
                     (case msg-name
                       [(mtd-name) (lambda (var) val)]
                       ...)))
             self)))]))

(define os-3
  (object/self-3
   [first (x) (msg self 'second (+ x 1))]
   [second (x) (+ x 1)]))
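To see the intentionally captured self at work, the object can be sent a message whose method calls back into the same object. This is a sketch assuming #lang racket. The helper msg is not defined in this section; the one-line definition below is an assumption about its shape (look up the method by name, then apply it to the arguments), and answer is an illustrative name.

```racket
#lang racket

;; Assumed helper: fetch the method named m from object o and apply it.
(define (msg o m . a)
  (apply (o m) a))

(define-syntax (object/self-3 x)
  (syntax-case x ()
    [(object [mtd-name (var) val] ...)
     ;; Build an identifier named self that carries the lexical context of
     ;; the macro use, so it binds the self written in the method bodies.
     (with-syntax ([self (datum->syntax x 'self)])
       #'(let ([self (lambda (msg-name)
                       (lambda (v) (error 'object "nothing here")))])
           (begin
             (set! self
                   (lambda (msg-name)
                     (case msg-name
                       [(mtd-name) (lambda (var) val)]
                       ...)))
             self)))]))

(define os-3
  (object/self-3
   [first (x) (msg self 'second (+ x 1))]
   [second (x) (+ x 1)]))

;; first adds 1 and forwards to second through self, which adds 1 again.
(define answer (msg os-3 'first 5))
```

The call through self succeeds here because datum->syntax deliberately breaks hygiene for that one name; every other identifier the macro introduces remains hygienic.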
13.6 Influence on Compiler Design
The use of macros in a language’s definition has an impact on all tools, especially compilers. As a working example, consider let. let has the virtue that it can be compiled efficiently, by just extending the current environment. In contrast, the expansion of let into function application results in a much more expensive operation: the creation of a closure and its application to the argument, achieving effectively the same result but at the cost of more time (and often space).
This would seem to be an argument against using the macro. However, a smart compiler recognizes that this pattern occurs often, and internally converts the left-left-lambda pattern [REF] back into the equivalent of let. This has two advantages. First, it means the language designer can freely use macros to obtain a smaller core language, rather than having to trade that off against the execution cost.
It has a second, much subtler, advantage. Because the compiler
recognizes this pattern, other macros can also exploit it and
obtain the same optimization; they don’t need to contort their output
to insert let terms if the left-left-lambda pattern occurs
naturally, as they would have to do otherwise. For instance, the
left-left-lambda pattern occurs naturally when writing certain kinds
of pattern-matchers, but it would take an extra step to convert this
into a let in the expansion—
13.7 Desugaring in Other Languages
introducing a new identifier (call it i, but be sure to not capture any other i the programmer has already defined, i.e., bind i hygienically!), binding it to an iterator obtained from o, and
creating a (potentially) infinite while loop that repeatedly invokes the .next method of i until the iterator raises the StopIteration exception.