Stanley Software Productions

Designing a Programming Language

This is a tutorial on how to design and write parts of a programming language. It assumes that you are a competent programmer. Using a UNIX-like system will help because of the powerful command line and the precise error messages the shells produce when something bombs (which it will).

Last updated: 13.7.06 7:42pm

Designing a programming language is not a small task. It can be simplified by the use of some tricks, but it is by no means simple. Even after you have finished a design and checked it for design problems, you will probably need to write an interpreter and/or a compiler, each with debugging features more advanced than 'Error in program'. If you are still convinced that you would like to design one, I suggest you read on, and go on the journey with me. I am by no means telling you how a language needs to be designed; this is just a rough guide based on my experience with Esotorus X.

The first thing you need to think about is why you want to write the language. If it is for writing an operating system, you are going to have a lot more trouble than if it is simply for Hello World programs. The most common decision by hobbyists who don't really intend their language to be used much is to make an esoteric programming language (an esolang), and make it different from everything else. This tutorial will mainly concentrate on esolangs. There is a List of Esoteric Programming Languages on Wikipedia, and Esolang, a wiki dedicated to esoteric languages, has a more extensive list. When I was designing Esotorus X I had three goals: make it Turing complete, make it relatively simple to understand, and finish it in under a year. Before you start on your language, I suggest you set three goals of your own.

Now that you have your goals, you need to start thinking about the language. Every language needs to have comments, especially an esoteric one. Comments are important as they make it more obvious which piece of code does what. You will need to take a few sheets of A4 and scribble down your ideas. I decided to make my language heap-based. I didn't concentrate on the stack as I don't really understand how it works. I also decided to use symbols, rather than words, to simplify the interpreter. If the interpreter has to look for phrases, it will be more complicated than if it can scan through the program one character at a time, pass each character through a switch statement and carry on.

I had a heap of 4096 'parts'. I thought (and still think) that this was plenty for what my language was intended for. If you are intending to write a language that supports dynamic linking, networking, and interfacing with hardware other than the screen and keyboard, you are likely to need a more advanced system than mine. To create the memory heap, I simply used an array of 'parts', where a part is simply an integer. I also had three pointers, each manipulated with different symbols and each pointing to a place in the heap. The pointers are stored in an array of three integers.
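The memory layout described above could look something like the following in C. This is only a sketch under assumptions: the names (Machine, machine_init, pointer_move, part_at) are made up for illustration, and wrapping a pointer around the ends of the heap is a guess at sensible behaviour, not something Esotorus X is known to do.

```c
#include <stddef.h>

#define HEAP_SIZE 4096    /* number of 'parts' in the heap */
#define NUM_POINTERS 3    /* three pointers into the heap */

/* Hypothetical names; only the sizes and types come from the text. */
typedef struct {
    int heap[HEAP_SIZE];        /* each part is simply an integer */
    int pointers[NUM_POINTERS]; /* each holds an index into heap */
} Machine;

/* Zero every part and point all three pointers at part 0. */
void machine_init(Machine *m)
{
    for (size_t i = 0; i < HEAP_SIZE; i++)
        m->heap[i] = 0;
    for (size_t i = 0; i < NUM_POINTERS; i++)
        m->pointers[i] = 0;
}

/* Move pointer p by delta, wrapping around the ends of the heap
   (the wrapping is an assumption made for this sketch). */
void pointer_move(Machine *m, int p, int delta)
{
    int pos = (m->pointers[p] + delta) % HEAP_SIZE;
    if (pos < 0)
        pos += HEAP_SIZE;
    m->pointers[p] = pos;
}

/* Return the address of the part that pointer p currently refers to. */
int *part_at(Machine *m, int p)
{
    return &m->heap[m->pointers[p]];
}
```

Keeping the heap and the pointers together in one struct means the whole state of a running program can be passed to a function as a single argument.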

That's enough about Esotorus X for now; let's talk about getting something done. For now, get those sheets of paper, and write down what your syntax is going to look like. If you want to use variables, write some pseudo-code about a variable system. If you want to interface with hardware, you are going to have to interface with files and the system APIs. This means you are probably going to have to write some dynamic linking support. See how complicated it gets? Just because you want to print, you need to make a file interfacer and design a dynamic linker. This is why I stayed away from such atrocities. Also note that a language does not need to be able to interface with any hardware at all to be Turing complete. It just needs to be able to express any computation that a Turing machine can perform. If Turing-completeness is all that you are worried about, your language could end up very simple indeed. The compiler may still be difficult, but the language itself can be tiny: OISC (one-instruction set computer) has a single instruction, subleq (subtract and branch if less than or equal to zero), which takes three operands.
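To show how little a Turing-complete language needs, here is a rough sketch of a subleq machine in C. The function name subleq_run, the max_steps guard, and the rule that a jump outside the memory halts the program are all assumptions made for this example; real subleq implementations differ on such details.

```c
#include <stddef.h>

/* Run a subleq program stored in mem[0..len).
   Each instruction is three cells a, b, c meaning:
       mem[b] -= mem[a]; if (mem[b] <= 0) jump to c;
   Assumed convention: a jump target outside the memory halts.
   max_steps guards against programs that never halt.
   Returns the number of instructions executed. */
long subleq_run(int *mem, size_t len, long max_steps)
{
    long steps = 0;
    long pc = 0;
    while (pc >= 0 && (size_t)pc + 2 < len && steps < max_steps) {
        int a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        if (a < 0 || b < 0 || (size_t)a >= len || (size_t)b >= len)
            break;  /* operand out of range: stop */
        mem[b] -= mem[a];
        steps++;
        pc = (mem[b] <= 0) ? c : pc + 3;
    }
    return steps;
}
```

As an example, the memory image {9, 11, 3, 11, 10, 6, 11, 11, -1, 3, 4, 0} holds three instructions followed by the data cells A = 3, B = 4 and a scratch cell Z = 0. Running it adds A to B, leaving 7 in cell 10, and then halts by jumping to -1.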

Now that you have written down your syntax, goals, and some pseudo-code, you can either test the pseudo-code or write a Language Definition. Even if you are the only person who is ever going to use the language, it helps a lot to have a simple resource to go to when you forget how to use it. And you never know, it might get known worldwide, like Brainfuck (which was not designed to be used, but is very well known). A language definition for a hobby language does not need to be too long: just an introduction to how you designed and thought of the language, a paragraph about usage and writing a compiler, any special cases, and then a table of what the commands do, what operands they take, etc. You will definitely need to test your pseudo-code if your language has fairly complex aspects like a high level variable system. By high level, I mean like C and C++. Esotorus X has very low level variables, as you must specify which of the 4096 sections you want your variable to be in. It is also very easy to 'forget' variables by moving the pointer away and not moving it back again. You should now have enough written down to be able to write a preliminary interpreter. The structure of a very simple, single-symbol-per-command language interpreter would be:

Initialise the variables (e.g. memory and pointers)

Read the program a character at a time and pass each character to a run() function containing a switch statement, which performs operations on the memory and the pointers.

And that's it. That's all an interpreter needs to do. Now go to your IDE/Text Editor and write an interpreter for your language. You will need an interpreter even if you eventually write a real compiler, as you don't want to be debugging the compiler, the interpretation, and the test programs all at once. Debugging a program in a language that's already been created is hard enough, as you know. Having to debug three programs at a time is a little much, so just settle for interpreting for now. By the time you get back you should have a fully working interpreter. I will be happy to provide help via email if you get stuck, so don't hesitate to ask. My address is at the bottom of this page.
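If you want something to compare your interpreter against, the two-step structure above might be sketched in C like this. The five command symbols are invented for the example and are not the symbols Esotorus X uses; State, init and interpret are hypothetical names too, and only pointer 0 is used to keep things short.

```c
#include <stdio.h>
#include <string.h>

#define MEM_SIZE 4096
#define NUM_PTRS 3

typedef struct {
    int mem[MEM_SIZE];  /* the memory */
    int ptr[NUM_PTRS];  /* the pointers, stored as indexes into mem */
} State;

/* Step 1: initialise the variables (memory and pointers) to zero. */
void init(State *s)
{
    memset(s, 0, sizeof *s);
}

/* Step 2: execute a single command character.
   The symbols below are made up for this sketch. */
void run(State *s, char c)
{
    switch (c) {
    case '+': s->mem[s->ptr[0]]++; break;
    case '-': s->mem[s->ptr[0]]--; break;
    case '>': s->ptr[0] = (s->ptr[0] + 1) % MEM_SIZE; break;
    case '<': s->ptr[0] = (s->ptr[0] + MEM_SIZE - 1) % MEM_SIZE; break;
    case '.': putchar(s->mem[s->ptr[0]]); break;
    default:  break;  /* any other character is treated as a comment */
    }
}

/* Feed the program to run() one character at a time. */
void interpret(State *s, const char *program)
{
    for (const char *p = program; *p != '\0'; p++)
        run(s, *p);
}
```

Note that this sketch has no loops or jumps, so the toy language it runs is not Turing complete. A command that branches would need run() to be able to change the position in the program, for example by passing it an index into the program text instead of a single character.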

You should now have your interpreter finished, and if it is to be a compiled language, you will be itching to write a compiler. Now, there are three ways of going about this: the easy way, the hard easy way and the right way.

In the easy way, you distribute three things: the source for the interpreter, a compiler for the language that source was written in, and a binary that acts as your 'compiler'. That binary reads a program written in your language, edits the source of the interpreter to add the program to it as a string, and compiles the edited source with the compiler distributed with your language. It should then restore the source of the interpreter to its original state, else something bad could happen next time the user compiles something.
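The fiddly part of the easy way is turning the user's program into a string that the interpreter's language will accept. Below is a sketch of that step in C, assuming the interpreter is also written in C. It uses a variation on the approach described above: instead of editing the interpreter source in place, it produces a separate line of source that defines the program, which could be written to its own file and compiled alongside the interpreter, so nothing has to be restored afterwards. The names append, make_program_source and embedded_program are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Append text to out (capacity cap, current length *len).
   Returns 0 on success, -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *len, const char *text)
{
    size_t n = strlen(text);
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, text, n + 1);  /* copies the terminating NUL too */
    *len += n;
    return 0;
}

/* Build one line of C source that defines the program as a string
   constant, escaping the characters that would break the literal.
   Returns 0 on success, -1 if out is too small. */
int make_program_source(const char *program, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (append(out, cap, &len, "const char embedded_program[] = \"") != 0)
        return -1;
    for (const char *p = program; *p != '\0'; p++) {
        char plain[2] = { *p, '\0' };
        const char *piece = plain;
        switch (*p) {
        case '"':  piece = "\\\""; break;
        case '\\': piece = "\\\\"; break;
        case '\n': piece = "\\n";  break;
        case '\t': piece = "\\t";  break;
        default:   break;  /* other characters are copied unchanged */
        }
        if (append(out, cap, &len, piece) != 0)
            return -1;
    }
    return append(out, cap, &len, "\";\n");
}
```

The interpreter would then declare `extern const char embedded_program[];` and run that string instead of reading a file.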

In the hard easy way, you need to distribute a compiler for language x (which can be any language of your choice), and a converter, which will convert the program written in your language to language x code, then compile it with the compiler for language x. This will result in faster code than the easy way (especially if language x is assembly), but will require more programming skill.
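A converter of this kind can be quite small when every command is a single symbol. The sketch below takes C as language x and translates a made-up five-symbol language (not Esotorus X) into a C program; translate_command and translate are hypothetical names.

```c
#include <stdio.h>

/* Map one command of the source language to a C statement.
   The five symbols are invented for this sketch. Returns NULL for
   any other character, which is treated as a comment. */
const char *translate_command(char c)
{
    switch (c) {
    case '+': return "mem[p]++;";
    case '-': return "mem[p]--;";
    case '>': return "p = (p + 1) % 4096;";
    case '<': return "p = (p + 4095) % 4096;";
    case '.': return "putchar(mem[p]);";
    default:  return NULL;
    }
}

/* Write a complete C program equivalent to `program` to `out`. */
void translate(FILE *out, const char *program)
{
    fputs("#include <stdio.h>\n\n"
          "static int mem[4096];\n\n"
          "int main(void)\n{\n"
          "    int p = 0;\n", out);
    for (const char *s = program; *s != '\0'; s++) {
        const char *stmt = translate_command(*s);
        if (stmt != NULL)
            fprintf(out, "    %s\n", stmt);
    }
    fputs("    return 0;\n}\n", out);
}
```

The generated file is then handed to the compiler for language x. The speed gain over the easy way comes from the fact that the switch statement runs once, at translation time, rather than for every command while the program is running.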

In the right way, you write a program that reads a program written in your language and writes the binary itself. This, if written well, will result in faster binaries than both of the above, but requires even more programming skill.

Which sort of compiler is right for you depends on your programming skill, how much time you have, how much determination you have, and ultimately what the language is for. If it is for Hello World programs and nothing more, it hardly matters what speed it runs at, so it should probably be done the easy way, but if it is for writing operating systems, you will almost certainly need to do it the right way.

Well, that's it for now. If you think anything is missing from this tutorial (which it almost certainly is), feel free to email me and I will respond as punctually as I see fit. The part that should probably be improved is the section on writing the interpreter, as I don't cover more advanced aspects at all. This is because I have never worked with them, so I cannot be sure that what I say will be true.

jamesstanley@bluebottle.com
