Rilouw.eu

The programming language of your dreams (part 1)

Published

For a long time, my favorite language was Python. It's fairly well designed and useful for any task, from web servers like Django to 3D software like Blender; from neural networks like TensorFlow to cloud computing platforms like OpenStack.

Python offers so much libraries and tools for a developer to play with that you feel you can achieve anything, given the right tool.

What I really enjoyed above all was the community, and the feeling that everyone was according to the same set of standards (it was of course not always the case) and that you could somehow easily agree on what was "Pythonic" and what was not (spoiler alert: in fact, you cannot).

Now that I shifted to using lisp languages (especially Racket), I see my past self as childish and primitive, but it has been a brease passing through Python as a part of my road to becoming a better developer, and alas I may never use Python again for personal projects, I have learned a lot and would still recommend it for anyone wanting to achieve efficiently, quickly and elegantly some IT project.

So what is so good in Racket, that it made me think I finally found the language of my dreams?

Well, there's a lot of features that make Racket an awesome language to work with: parameters, continuations, contracts, syntax-parse... But the feature I want to write about today is by far the one I find really transcendental: #lang (pronounce "hash-lang").

#lang allows writing your own languages. Let's dive into it!


The myth of the universal translator

Sometimes comes a task in your daily job that looks like there's something deeper going on.

Sometimes, you feel like your team would really benefit from a tool that could understand some simple data (XML, JSON, a database, whatever...) and execute some generic behavior depending on a configuration you can tweak.

It' easy, you say: I understand my job and my colleagues' job well, I'm sure I can find a common ground from which I could define generic behaviors, and then customize them with parameters.

Let's say you work in a game development studio. You have multiple teams (gameplay, engine, UI, network, 2D & 3D artists...etc) working together on a final product. You usually want to see what each of those teams are working on, preferably merged together in a bundle that would look as close as possible to what the final result should be.

To achieve that, you need build tools, communication tools, reports, tests, bundlers, a whole suite of custom software and I can certainly assert you have a lot of data files in different formats.

You say: I need a tool that would understand the needs of those teams, and translate them into computer logic so that everything can be compiled together.

I need a custom file format. It needs to be simple, it needs to be adapted to the vocabulary of the teams.

In fact, you could say that you need a kind of "universal translator". Not a translator for natural languages, say French, English, Japanese... But a translator for your teams' languages: Gameplay-language, Artist-language, UI-language.

Let's see if we can design something like that together! We'll try an imaginary UI-language:

XML
<ui>
  <menu id="title_menu">
    <button goto="campaign_menu">Campaign</button>
    <button goto="multiplayer_menu">Multiplayer</button>
    <button goto="credits_menu">Credits</button>
  </menu>
</ui>

Here, the words that bear meaning for your team are ui, menu, button.

But we need to write some software that understand and behave depending on that file.

If you're using a typed language like C# or Java, you're likely to make a UI class, a Menu class, a Button class...etc

Soon you'll start to realize that the more you add classes, the more complexe your software becomes, and more bloated the scope reveals itself.

For example: One of your UI designers needs a way to declare variables for the size of elements, so that some of them can be of a relative size compared to other elements. You start putting references to objects everywhere and end up with a bowl of spaghetti code.

In fact, the more you work on it, the more you are "mapping" software to a human craft, the less you see the end of the tunnel. Obviously because there is no proper end to begin with.

It may take years until you accept the sad truth that what you're trying to do is way out of your scope. You're making a software that permits non-technical users to express their specific needs in the vocabulary of their specific domain, while keeping everything together and doing the Right Thing™ every time, preferably in an automated fashion.

You are making the Universal Translator. The ultimate software that allows non-developers to develop. The ultimate machine that understands humans. Microsoft tried it. Adobe tried it. Pretty much every big IT company tried to make the Universal Translator. All failed.

The bad news is that most programming languages are not suited for programming that kind of software. Because actually, that's not a software you want to make, but a language.

The good news is that there is a name for such a language. It's called a DSL: Domain Specific Language.

DSLs are not a Universal Translator. They are just a way of expressing a domain using the most adapted language.

You will not provide your team software that handles programming for them (we saw this is impossible), you will provide them a language so that they can program themselves, in the language that fits the best how they express their job.

Luckily, there is a family of programming languages that have been specifically designed to create DSLs. It's called lisp.

In the lisp family, there are a lot of different languages. Some designed for science, some designed for learning computer science, some general purposed, and some "language oriented".

Racket is a language oriented language. It's my favorite lisp, and if you followed until now, you're now ready to discover some of the tricks that make this language incredibly powerful in creating domain specific languages.

A bit of lisp history

(You can skip this section if you want by clicking here)

Racket is an implementation of Scheme, which is one of the two major lisp families (the other one being Common Lisp). Both have been around for more than 60 years (you read that well) and mostly differ from each other in deep philosophical questions about whether or not we can implicitely introduce variables in scope without telling the user. This is known as "breaking hygiene" in Scheme, or as "anaphoric awesomeness" in Common Lisp (note that the vocabulary difference says it all about what each community think of this practice).

Scheme says no, you can't break hygiene (except maybe sometimes but don't tell my parents please!), Common Lisp says yes, of course you can, that's an awesome feature (it is). That's about it. There are some other differences but really, unless you are deeply interested in lisp design, you don't care (for now).

Racket says: With great power comes great responsibility. I'm Scheme-based, so let's not break hygiene by default. But let's allow breaking hygiene step by step, by building a framework that prevents things from getting hairy, but still unlock the awesomeness of anaphoric macros.

If I'm writing about Racket today, that's (among other reasons) because I think that's the right approach. The only drawback is that it makes learning Racket macros a bit difficult for beginners. But in the long time, it really pays off.

The only thing you need to remember about this for now is that Racket is a Scheme, and that Scheme is a lisp:

Lisp family
Common LispScheme
CLisp
CMU Common Lisp
GNU Common Lisp
Steel Bank Common Lisp
...
Racket ←
Chez Scheme
GNU Guile
Chicken Scheme
...

This great diversity of languages and implementation is at the same time a benediction (because you get to compare different approaches) and a curse (because it usually needs work to port a library from one lisp to another). But what they all share is an awesome syntax called s-expressions. We'll see how it allows a lot of expressivity and freedom in both programming (as a way to write code) and language/format design (as a way to write data).

Lists and atoms

This is what people have been hearing about the most when talking about lisps: The Dreaded Parentheses. Indeed, there is no exact definition of what a lisp is or is not, but you can usually recognize them at their famous ((((parentheses everywhere)))).

They form s-expressions or "s-exp", which are the basic skeleton from which you write a program in most lisps. They are a tree of lists of lists of lists of lists...etc

It looks like this:

lisp
(always (walk (muddy-forest pine-trees))
  (let mushroom (fetch leafy-forest-ground)
    (when (comestible? mushroom)
      (put mushroom -> tiny-basket))))

You can nearly smell the spicy scent of pines in that piece of code. Half code, half poem, I'd dare say.

I have a quick challenge for you. Read the previous piece of code multiple times. Try to guess what is the meaning, the intention I'm conveying through it. When you think you understand what I want to express with this code, read the following paragraph, which is my English version of what I meant.

Plain English
Walk constantly through a muddy forest of pine trees.
Fetch mushrooms from the leafy ground.
When the mushroom is comestible, put it in your basket.

If you guessed the same meaning from the lisp code, well, I suppose you're starting to realize that parentheses and s-expressions are more than just the syntax of a programming language, but a generic way of expressing anything.

There is no special syntax for functions, variables nor anything you find in a standard programming language. In fact, you just don't care about whether some things are functions, or some variables. You just care about the meaning.

But if you really want a standard definition of the s-exp syntax, it fits in pretty much three lines:

  • Rule #1 - A list is a pair of parentheses.
  • Rule #2 - A list can contain atoms and lists separated by spaces.
  • Rule #3 - Everything that is not blank spaces nor parentheses is an atom.

In our example, always, comestible?, ->, forest, basket, mushroom are all atoms.

Atoms can contain any characters in Racket, even symbols like %, _, -, +, ~, &, @, =, whatever you like.

This is because lists and atoms are the only piece of syntax hardcoded in the language. Everything else is defined by programmers. The if clause, variable definitions, classes if you want classes, structs if you want structs...

The language can be object oriented if you want. Functional if you want. It can be your own language. You are the one defining the rules and the syntax.

This is why the first example I gave is written in a language which is not Racket, nor Scheme, nor Common Lisp. It's just one kind of lisp I just made up to best express my desire of fetching mushrooms in the forest.

This lack of formal syntax allows two things:

  • The language can be whatever you like
  • The language can parse itself as data

The second point is the most important thing to understand. Instead of using some kind of file format for your data, you can actually write your data as code. Because there is no fundamental difference between a piece of code like (send love letters) and a listing of data like (banana apple pear). Both are lists holding three atoms. As a programmer, you are the one defining their meaning.

Therefore, you can write JSON as lisp, HTML as lisp, even Javascript as lisp:

JSON-lisp
(users [
  (
    (name "Jane")
    (surname "Doe")
    (inventory [
      ((name "Life potion") (effect "HP_INC") (amount 10))
    ])
  )
  (
    (name "Hanako")
    (surname "Yamada")
    (inventory [
      ((name "Magic staff") (type "weapon") (damage 4))
    ])
  )
])

NB: I indented it like JSON for the sake of the example, but usually lisp programmers care about their parentheses and don't like leaving them alone on their lines.

HTML-lisp
(html ([lang "en"])
  (head
    (title "My awesome blog about ponies"))
  (body
    (article ([class "first"])
      (p "I like all ponies.")
      (p "But my favorite one is: ")
      (a ([href "/favorite"]
          [class "rainbow-link"])
        "Click here to discover my favorite pony!!"))))

The blog you are reading right now is actually written in this kind of HTML-lisp language.

percentage.js
Javascript-lisp
(function print_percentage (a b)
  (var result null)
  (try
   (set result (* (/ a b) 100))
   (catch (e)
     (console.log "Error: {}" e)
     (return false)))
  (console.log "Percentage is {}%" result)
  (return true))

My point is: you do whatever you like with lisp, and give it the shape you like, regardless of the type of data/code you are working with. This makes it the perfect tool for glueing together different technologies, abstracting away different implementations or formats, or designing new languages for your teams.

For the more savvy of you, if you want the exact technical term that refers to this property of lisp to be at the same time code and data: it's called Homoiconicity.

I can already hear you ask: But what is the point? Javascript is good as it is, I don't need this weird parentheses-everywhere syntax!

Well, the fact that this javascript code is also data means you can edit and modify it on-the-fly. You can create new kinds of control words like do..until, a truly working for-each, some kind of integrated SQL query language (select * from users) that would work on JSON-lisp. You name it. You can do exactly what Babel does, for example: transforming new javascript syntax into old compatible one, making your code more portable.

You can do reflection on your own code:

Sample console session
> (define code (read "percentage.js"))
> (first code)
'function

> (second code)
'print_percentage

> (find 'console.log code)
'((console.log "Error: {}" e)
  (console.log "Percentage is {}%" result))

All your code is editable as a list of atoms. It is normally very difficult to parse and modify Javascript because of its syntax (programmers of linters, minifiers or transpilers can testify), but with s-expressions, everything becomes a lot simpler.

Making languages with Racket

Remember the XML UI-language we wrote earlier so that our UI team can design interfaces? In s-exp, it would look like this:

title-screen.ui
ui-lang
(ui
  (menu ([id "title-menu"])
    (button ([goto "campaign-menu"]) "Campaign")
    (button ([goto "multiplayer-menu"]) "Multiplayer")
    (button ([goto "credits-menu"]) "Credits")))

Let's write a Racket module that can parse and display that!

The first thing to do is to write a new Racket module with the name of our language: ui-lang.rkt.

The bare minimum for a language in Racket looks like this:

ui-lang.rkt
Racket
#lang racket/base
(provide
  (except-out (all-from-out racket/base) #%module-begin)
  (rename-out [module-begin #%module-begin]))

(define-syntax-rule (module-begin expr ...)
  (#%module-begin expr ...))

It's a bit intimidating the first time you see it, but it's actually really simple once you understand that everything shown here is just a smart combination of basic Racket features.

Let's take it line by line:

Line 1
#lang racket/base

This first line means racket/base is the language we are using in the module. Racket provides a lot of different languages. For what we want to do (make a language), the base racket library is enough.

In Racket, a module must always start with a #lang clause, so that your file is recognized as a Racket module, and so that the Racket compiler can know in which language it is written.

Line 2
(provide ...)

The provide clause is the way for modules to export their functions. provide takes a list of identifiers defined in the module (or imported) and exposes them as a public interface, so that when a programmer requires your module with (require "my-module.rkt"), they get the public functions, structs and classes your module implements.

Lines 2 to 4
(provide
  (except-out (all-from-out racket/base) #%module-begin)
  (rename-out [module-begin #%module-begin]))

This clause is the magic that makes your file not a simple Racket module, but a language.

It tells your module to provide all identifiers from racket/base with (all-from-out racket/base) but except-out the #%module-begin function.

In fact, #%module-begin is not really a function, it's a syntax. It's the main wrapper that goes around a Racket module when Racket interprets it. This is a bit like the __init__.py file that gets executed when you import a Python module.

What we are doing here when we say except-out #%module-begin is that we disable the default Racket behavior to replace it with our own.

You can see that replacement happening on line 4: (rename-out [module-begin #%module-begin]). We rename our own module-begin syntax to be the new #%module-begin.

Note: #% is a naming convention for all internal Racket systems. By providing our own version of an internal identifier, we are effectively modifying the language! But only in the context of the language we are building, of course.

Finally come lines 6 and 7:

Lines 6 and 7
(define-syntax-rule (module-begin expr ...)
  (#%module-begin expr ...))

This last piece of code is the definition of our own module-begin syntax.

Right now it does not do much. It accepts an undefined number of expressions expr ... and pass them unchanged to the genuine #%module-begin of Racket (the one we import from racket/base but don't provide back).

We successfully created a language that does nothing but read the content of our file and interpret it as Racket. Which will obviously fail because most of our file has undefined elements: ui, button...etc

But there is something happening nonetheless that is interesting: We did not in any ways had to write parsing logic. We just reused the reader from Racket.

That's where things start to get interesting. We get the full power of Racket at our disposal to implement a software that will manipulate the code, without having to care about parsing it. All the hard work of reading streams of characters, tokenize them, put them in data structures... has been done for us.

All we have to do is write the logic of what the language is supposed to do.

The last step before we can try this is to actually use that language in our ui file. We need to add a #lang clause in our file:

title-screen.ui
ui-lang
#lang s-exp "ui-lang.rkt"
(ui
  (menu ([id "title-menu"])
    (button ([goto "campaign-menu"]) "Campaign")
    (button ([goto "multiplayer-menu"]) "Multiplayer")
    (button ([goto "credits-menu"]) "Credits")))

s-exp is the default s-expression reader from Racket. It takes a Racket module path as argument, reads the next lines as s-expressions, then pass them to #%module-begin inside our lang module.

Let's try this now:

Racket REPL
> (require "title-screen.ui")
; title-screen.ui:2:1: ui: unbound identifier in module
;   in: ui
; [,bt for context]

It works! (developers are the only kind of person who can shout happily when they see an error).

Of course, Racket complains that ui does not exist. But this also means the file got read correctly!

Interestingly, if we tweak a bit our ui-lang module, we can get our file to render as a list by simply quoting it:

Lines 6 and 7
(define-syntax-rule (module-begin expr ...)
  (#%module-begin 'expr ...)) ;; notice the simple quote
Racket REPL
> (require "title-screen.ui")
'(ui
   (menu
    ((id "title-menu"))
    (button ((goto "campaign-menu")) "Campaign")
    (button ((goto "multiplayer-menu")) "Multiplayer")
    (button ((goto "credits-menu")) "Credits")))

In the next article...

In this article, we saw what parentheses and s-expressions really are about in lisp languages, and started to implement a custom language for our data, using Racket as the parser, interpreter and compiler.

In the next article (part 2), we'll see how to convey actual meaning to the language, and transform it into a compiler for an imaginary game console's binary format.

And maybe, in a third part, we'll see how to even get rid of parentheses, and design a more specific syntax for our language, that could look like this:

title-screen.ui
ui
title-menu:
  [Campaign] => campaign-menu
  [Multiplayer] => multiplayer-menu
  [Credits] => credits-menu

Stay tuned, and thanks for reading!

Tags: english, technology, programming, racket, lisp

Back home