C! - system oriented programming

This is the first article of a series, intended as an introduction to C!. More articles presenting the syntax and the inner parts of the compiler will follow.

C! is one of our projects here at LSE (System Lab of EPITA). It is a programming language oriented toward lower-level system programming (kernels or drivers, for example).

We were looking for a modern programming language for kernel programming, and after trying some (D, C++, OCaml …) it appeared that most of them were too oriented toward userland to be used in a kernel programming context.

So we decided to modify and extend C to fit our needs, and quickly found ourselves aiming at a new programming language: C!

Modern languages and kernel programming

Most parts of kernel code are rather classical: data structures, algorithms and a lot of glue. But some crucial aspects require lower-level programming: direct management of memory, talking to specific CPU parts (interrupt management, MMU …), complete control over data layout, bit-by-bit data manipulation …

Thus, to write some kernel code (or a complete kernel) we need a native language with direct access to these kinds of low-level operations. This implies the ability to include ASM code somehow, to manage function calls from ASM, and to build standard functions (so you can have function pointers for the various interrupt mechanisms).
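To make the function-pointer requirement concrete, here is a minimal C sketch of an interrupt dispatch table. Every name in it (NUM_VECTORS, irq_handler_t, register_handler, dispatch_interrupt, timer_handler) is hypothetical and chosen for illustration. A real kernel would call the dispatcher from an architecture-specific assembly entry stub, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 256            /* hypothetical number of interrupt vectors */

/* A handler is an ordinary function, so it can be stored as a pointer. */
typedef void (*irq_handler_t)(unsigned vector);

static irq_handler_t handlers[NUM_VECTORS];
static unsigned timer_ticks;       /* state touched by the example handler */

/* Install handler 'h' for interrupt 'vector'. */
void register_handler(unsigned vector, irq_handler_t h)
{
    assert(vector < NUM_VECTORS);
    handlers[vector] = h;
}

/* Would be called from the assembly entry stub once registers are saved.
   Returns 1 if a handler ran, 0 if the vector has no handler. */
int dispatch_interrupt(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    if (handlers[vector] == NULL)
        return 0;
    handlers[vector](vector);
    return 1;
}

/* Example handler: counts timer interrupts. */
void timer_handler(unsigned vector)
{
    (void)vector;
    timer_ticks++;
}
```

The point is only that the handler must be a plain function with a stable calling convention, so that assembly code can reach it through a pointer.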

And, since you’re not in user-land, you can’t use user-land facilities (standard system libs for example). For most languages this means that you must rewrite memory allocators and tools that come shipped with them (especially for managed memory languages using garbage collection).

Another issue is the binary format: when writing user-land programs, your compiler builds a file suited for the kernel binary loader. On most current Unix systems, your file will respect the ELF format. Of course, you can write an ELF loader in your bootloader (or any part of your booting process for that matter), but since you are managing memory and memory mapping yourself, you can’t rely on the way a program is loaded on your system, and thus the organization of your ELF must reflect these constraints.

Of course, this issue is not language dependent: even with a pure ASM or C kernel, you will have to control the way your linker builds the final binary. But in C (and obviously in ASM) there are no major issues there. The structure of your program will be sufficiently simple that the only important question is: where will I be in memory?

So, what’s wrong with modern languages?

For the most evolved ones, such as languages with transparent memory management and garbage collection, one of the most important problems is to provide a replacement for all aspects of the standard libraries of the system: memory allocator, thread and lock management, etc. And in that case some aspects just can’t be rewritten the same way they are in user-land.

The C++ situation is somewhat both better and worse: in theory there are fewer runtime needs than in most modern languages. The good part is that you can bypass the most problematic elements of C++ (such as RTTI or exceptions) so you don’t have to fight against them. Once you’ve deactivated the problematic features and found what can’t be used without them, you have to provide the runtime elements needed by your code: start-up code, pure virtual fallback, dynamic stack allocation code, dynamic memory allocation for the new and delete operators (for objects and arrays) …

Roaming here and there, you’ll find documentation on how you can write your C++ kernel, but let’s face it: is the required work really worth the pain?

What’s wrong with C

So, if you’re still reading me, this means that you’re partially convinced that using C++ (or D, or OCaml, or … ) is not a good idea for your kernel. But, why not go on with the good old C programming language?

Since it was designed for that job, it is probably the best (or one of the best) fit for it. But, we want more.

Here is a quick list of what we may find wrong or missing in C:

The C syntax contains a lot of ambiguous traps
While the type system of C is basically size based, a lot of types have an ambiguous size (int for example)
Controlling size and signedness of integers is often painful
There is no clean way to provide some form of genericity or polymorphism
There are no typed macros
The type system and most static verification mechanisms are too basic compared to what could be done now
C lacks a namespace (or module) mechanism
While you can do object oriented programming, it is tedious and error prone

In fact, the above list can be divided into two categories:

syntax and base language issues
missing modern features
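As an illustration of the integer size and signedness points in the list above, here is a small C sketch of a classic trap caused by integer promotion. The function names are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Low-level code usually relies on 8-bit bytes; make the assumption explicit. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Naive test: is the bitwise complement of an 8-bit value zero?
   'a' is promoted to int before '~' is applied, so for a == 0xFF the
   result is a negative int with its upper bits set, never 0. */
int complement_is_zero_naive(uint8_t a)
{
    return ~a == 0;
}

/* Correct test: truncate back to 8 bits before comparing. */
int complement_is_zero(uint8_t a)
{
    return (uint8_t)~a == 0;
}
```

Even with a fixed-width type such as uint8_t, the arithmetic silently happens on int, and the programmer has to cast back by hand. This is the kind of friction an explicit size and signedness in the type syntax is meant to reduce.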

Genesis of C!

Once we stated what was wrong with C, I came up with the idea that we could write a simple syntactic front-end to C or a kind of preprocessor, where we would fix most syntax issues. Since we were playing with syntax, we could add some syntactic sugar as well.

We then decided to take a look at object oriented C: smart usage of function pointers and structures lets you build some basic objects. You can even have fully object oriented code. But while it is dead simple to use code with an object oriented design, the code itself is complex, tedious, error prone and most of the time unreadable. So all the gain of OOP on the outer side is lost on the inner side.
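For readers who have never seen it, here is a minimal sketch of that style in plain C: a table of function pointers plays the role of a virtual method table. The shape, rect and square names are hypothetical, and this is not what the C! compiler generates.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* "Virtual method table": one function pointer per method. */
struct shape_vtable {
    int (*area)(const struct shape *self);
};

/* Base "class": only holds the pointer to its method table. */
struct shape {
    const struct shape_vtable *vt;
};

/* Derived "classes": the base must be the first member so that a
   pointer to the derived struct can be converted to a shape pointer. */
struct rect   { struct shape base; int w, h; };
struct square { struct shape base; int side; };

static int rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static int square_area(const struct shape *self)
{
    const struct square *s = (const struct square *)self;
    return s->side * s->side;
}

static const struct shape_vtable rect_vt   = { rect_area };
static const struct shape_vtable square_vt = { square_area };

/* "Constructors": they have to wire the method table by hand. */
void rect_init(struct rect *r, int w, int h)
{
    r->base.vt = &rect_vt;
    r->w = w;
    r->h = h;
}

void square_init(struct square *s, int side)
{
    s->base.vt = &square_vt;
    s->side = side;
}

/* The outer side: one simple call, whatever the concrete type is. */
int shape_area(const struct shape *s)
{
    assert(s != NULL && s->vt != NULL);
    return s->vt->area(s);
}
```

Calling shape_area is trivial, but everything above it (casts, hand-wired tables, layout conventions) is unchecked boilerplate, and a single wrong cast compiles without complaint.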

So why not encapsulate object oriented code in our syntax extension?

But, OOP means typing (Ok, I wanted static typing for OOP.) And thus, we need to write our own type system and type checker.

Finally, from a simple syntax preprocessor, we ended up with a language of its own.

Compiler-to-compiler

Designing and implementing a programming language implies a lot of work: parsing, static analysis (mostly type checking), managing syntactic sugar, and a lot of code transformations in order to produce machine code.

While the syntactic parts and typing are unavoidable, code generation can be shortened somewhat: you can write a frontend for an existing compiler or use a generic backend such as LLVM. But you still need to produce some kind of abstract ASM, a generic machine code that will be transformed into target-specific machine code.

The fact is that a normal compiler will already have done a lot of optimization and smart code transformation before the backend stage. In our case, this means that we should do an important part of the job of a complete C compiler while we are working with code that is mainly C (with a different concrete syntax.)

The last solution (the one we chose) is to produce code for another compiler: in that case all the magic is in the target compiler and we can concentrate our effort on syntax, typing and extensions that can be expressed in the target language.

Based on our previous discussion, you can deduce that we chose to produce C code. All aspects of using C as a target language will be discussed in a future article.

Syntactic Sugar

An interesting aspect of building a high-level language is that we can add new shiny syntax extensions quite simply. We decided to focus on syntax extensions that offer comfort without introducing hidden complexity.

Integers as bit-arrays

In lower-level code, you often manipulate integers a bit at a time, so we decided to add a syntax to do that without manipulating masks and bitwise logical operators.

Thus, any integer value (even signed, but this may change, or trigger a warning) can be used as an array in both left-hand and right-hand position (you can test and assign your integer bit by bit!).

A small example (wait for the next article for full syntax description):

x : int<+32> = 0b001011; // yes, binary value!
t : int<+1>;
t = x[0];
x[0] = x[5];
x[5] = t;
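For comparison, here is roughly what the same bit swap looks like in plain C with shifts and masks. This is a hand-written sketch, not the compiler's output. It assumes that int<+32> is an unsigned 32-bit integer and that index 0 is the least significant bit. The helper names get_bit, set_bit and swap_bits_0_and_5 are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Read bit 'i' of 'x' (assumption: bit 0 is the least significant bit). */
uint32_t get_bit(uint32_t x, unsigned i)
{
    assert(i < 32);
    return (x >> i) & 1u;
}

/* Return 'x' with bit 'i' replaced by the low bit of 'v'. */
uint32_t set_bit(uint32_t x, unsigned i, uint32_t v)
{
    assert(i < 32);
    return (x & ~(UINT32_C(1) << i)) | ((v & 1u) << i);
}

/* Same sequence as the C! example: t = x[0]; x[0] = x[5]; x[5] = t; */
uint32_t swap_bits_0_and_5(uint32_t x)
{
    uint32_t t = get_bit(x, 0);
    x = set_bit(x, 0, get_bit(x, 5));
    x = set_bit(x, 5, t);
    return x;
}
```

Under those assumptions, starting from 0x0B (binary 001011) the swap gives 0x2A (binary 101010).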

Assembly blocks

When writing kernel code, you need assembly code blocks. The syntax provided by gcc is annoying: you have to manage the string correctly yourself (adding newlines and so on).

On the other hand, I don’t want to add a full assembly parser (as in the D compiler, for example). Besides being boring and tedious, it would tie the language to some architectures, and we would have to rewrite the parser for each new architecture we need …

In the end, I found a way to integrate asm blocks without the noise of gcc but keeping it close enough to be able to translate it directly. Of course, this means that you still have to write clobber lists and stuff.

A little example (using a typed macro function):

#cas(val: volatile void**, cmp: volatile void*, exch: volatile void*) : void*
{
  old: volatile void*;
  asm ("=a" (old); "r" (val), "a" (cmp), "r" (exch);)
  {
            lock
            cmpxchg %3, (%1)
  }
  return old;
}
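To show the gcc "noise" this syntax avoids, here is a sketch of how the same compare-and-swap could be written by hand with gcc extended asm. It is an illustration, not the code emitted by the C! compiler. The operand order mirrors the C! block, so %1 and %3 keep the same meaning. The C parameter types, the "memory" and "cc" clobbers and the non-x86 fallback are assumptions added for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Atomically: if (*val == cmp) *val = exch; returns the previous *val.
   Assumes a GCC-compatible compiler. */
void *cas(void **val, void *cmp, void *exch)
{
    void *old;

    assert(val != NULL);
#if defined(__x86_64__) || defined(__i386__)
    /* The assembly is a string: newlines and tabs are managed by hand. */
    __asm__ __volatile__(
        "lock\n\t"
        "cmpxchg %3, (%1)"
        : "=a" (old)                          /* %0: output, in the accumulator */
        : "r" (val), "a" (cmp), "r" (exch)    /* %1, %2, %3: inputs */
        : "memory", "cc");                    /* clobbers (added for this sketch) */
#else
    /* Fallback for other architectures: GCC's atomic builtin. */
    old = __sync_val_compare_and_swap(val, cmp, exch);
#endif
    return old;
}
```

The instruction text is the same as in the C! version. What changes is the string quoting, the explicit line separators and the colon-separated operand sections.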

Macro stuff

Actually, C! has no dedicated preprocessing tools, but we included some syntax to declare code that will become macros rather than functions or variables.

First, you can transform any variable or function declaration into a kind of typed macro by simply adding a sharp in front of the name (see previous example). The generated code will be a traditional C macro with all the “dirty” code needed to manage return, call by value and so on.
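To give an idea of what that "dirty" code involves, here is a hand-written C sketch of a macro that behaves like a typed function: arguments are copied into typed temporaries (call by value) and the result comes back through an explicit destination. It is not the actual output of the C! compiler, and MAX_U32, MAX_NAIVE and largest are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic untyped macro: an argument may be evaluated twice. */
#define MAX_NAIVE(a, b) ((a) > (b) ? (a) : (b))

/* "Typed" macro: each argument is evaluated exactly once into a
   uint32_t temporary, and the result is stored into 'result'. */
#define MAX_U32(result, a, b)                               \
    do {                                                    \
        uint32_t max_a_ = (a);                              \
        uint32_t max_b_ = (b);                              \
        (result) = (max_a_ > max_b_) ? max_a_ : max_b_;     \
    } while (0)

/* Small user of the macro: largest element of a non-empty array. */
uint32_t largest(const uint32_t *v, size_t n)
{
    uint32_t best;
    size_t i;

    assert(v != NULL && n > 0);
    best = v[0];
    for (i = 1; i < n; i++)
        MAX_U32(best, best, v[i]);
    return best;
}
```

With MAX_NAIVE, an argument such as j++ is incremented twice when it is the larger value. With MAX_U32 it is incremented once, as it would be with a real function call.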

The other nice syntax extension is macro classes: a macro class provides methods (in fact macro functions) on non object types. The idea is to define simple and recurring operations on a value without boxing it (next article will provide examples).

Modules

Another missing feature of C is a proper module mechanism. We provide a simple module infrastructure that shares a lot with C++ namespaces, while being far simpler. Basically, every C! file is a module, and referring to symbols from that module requires the module name, as with namespaces. Of course you can also open the module, that is, make every symbol of the module directly available (without the module name).

Namespaces provide a simple way to avoid name specialization: inside the module you can refer to a symbol directly, and outside you use the module name, so no inter-module conflict can happen.
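For contrast, this is the convention a C programmer has to follow by hand when there is no module mechanism: every public symbol carries a prefix so that two "modules" can both offer a push and a pop. The stack and queue below are hypothetical examples, unrelated to the C! code base.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8

/* "Module" stack (LIFO): every symbol is prefixed with stack_. */
struct stack { int items[CAPACITY]; size_t count; };

void stack_init(struct stack *s) { s->count = 0; }

int stack_push(struct stack *s, int v)      /* returns 0 when full */
{
    assert(s != NULL);
    if (s->count == CAPACITY)
        return 0;
    s->items[s->count++] = v;
    return 1;
}

int stack_pop(struct stack *s)
{
    assert(s != NULL && s->count > 0);
    return s->items[--s->count];
}

/* "Module" queue (FIFO): same operation names, different prefix. */
struct queue { int items[CAPACITY]; size_t head, count; };

void queue_init(struct queue *q) { q->head = 0; q->count = 0; }

int queue_push(struct queue *q, int v)      /* returns 0 when full */
{
    assert(q != NULL);
    if (q->count == CAPACITY)
        return 0;
    q->items[(q->head + q->count) % CAPACITY] = v;
    q->count++;
    return 1;
}

int queue_pop(struct queue *q)
{
    int v;

    assert(q != NULL && q->count > 0);
    v = q->items[q->head];
    q->head = (q->head + 1) % CAPACITY;
    q->count--;
    return v;
}
```

Nothing in C enforces the prefix, and the code inside each "module" has to spell it out everywhere too. A module system lets the short names be used inside and adds the qualification only where it is needed.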

What’s next

In the next article of this series I will present the C! syntax, the very basics of the object system and the macro stuff.

The compiler is still in a prototype state: all features described here are working, but some details are still a bit fuzzy and may require you to make some adjustments in the generated code.

As of now, you can clone C! from its LSE git repository, take a look at the examples in the tests directory and begin writing your own code. Unfortunately, we don’t have an automated build procedure yet, so you will have to do it step by step.