Loci Compiler
My interest in compilers emerged a long time ago, as I was (and remain) particularly interested in developing my own language. I was originally going to write a front end for GCC, but the difficulty of doing so turned me away and I decided to write a full compiler that would go all the way from source code to assembly code. As a result, I found myself studying topics such as lexing and parsing (even though I used flex and bison for these), code generation, register allocation, flow analysis and x86 assembly. Later, my interest took me into topics such as the compilation of object orientated and functional languages, garbage collection (in which I have a strong interest), SSA form and polymorphic types (e.g. C++ templates or Java generics).
The first version of the compiler eventually appeared, written in C, and could successfully compile individual files to x86 assembly. However, the internal complexity was immense: there was high coupling and little cohesion, the code was barely split into modules, the compiler was strongly tied to the x86 assembly generator, there was lots of global state and the register allocator was very buggy (I had to spend weeks testing it, during which time I discovered errors in the compiler book I was studying).
At this point I was fairly convinced of the advantages of OOP, so I began writing a new compiler in C++, reworking previous segments of C code into C++ modules. The register allocator in particular benefited from my use of the standard library containers, reducing the number of bugs substantially. However, decoupling the assembly generator proved difficult and, as the learning process goes, I repeatedly found my structure inadequate. As a result, I have decided to write a front end and use LLVM for the back end (or an interpreter in the early development stages). In addition the language itself was poor, like a very weak version of C, in which the compiler automatically detected the types of the variables (which, since it was a statically typed language, could lead to a lot of confusion).
It was at this time I became interested in Haskell and Java, two languages that present quite attractive (but generally oppositely leaning) designs. Though I do like Haskell, I chose to create an object orientated programming language, partly because I felt Haskell served the functional paradigm very well, whereas I perceived many shortcomings of Java. I had long determined that the language should be statically typed and be amenable to compilation, and after much consideration I have designed Loci.
One of the defining aspects of Loci is that everything is an object (like C#, I know) and all objects are subject to the same rules (unlike Java, in which the string + operator is a special case). Loci places an emphasis on interfaces, to such an extent that they are the only way polymorphism can be achieved; ‘extends’ inheritance contributes only to code reuse. As a result, Loci is able to support multiple inheritance easily, although the units being inherited from are called ‘partial classes’. The relationship between interfaces, classes and partial classes is not complicated, but it needs to be (and will be) demonstrated in code. (Currently only interface inheritance is being implemented.)
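The split between polymorphism and code reuse can be sketched in Java, since no Loci syntax is given here. This is only an illustration of the idea, not Loci code: all names (`Shape`, `Describer`, `Square`, `Rectangle`, `ShapeDemo`) are hypothetical, and the helper class used by delegation is an assumed stand-in for a partial class, not a description of how partial classes will actually work.

```java
import java.util.List;

// Polymorphism goes through the interface only: callers see Shape and
// never a concrete class.
interface Shape {
    double area();
    String describe();
}

// Reusable code with no polymorphic role of its own. It is an assumed,
// rough stand-in for a 'partial class': it contributes code, not a type
// that other objects are cast to.
final class Describer {
    String describe(String name, double area) {
        return name + " with area " + area;
    }
}

final class Square implements Shape {
    private final double side;
    private final Describer describer = new Describer();

    Square(double side) { this.side = side; }

    public double area() { return side * side; }

    public String describe() { return describer.describe("Square", area()); }
}

final class Rectangle implements Shape {
    private final double width;
    private final double height;
    private final Describer describer = new Describer();

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    public double area() { return width * height; }

    public String describe() { return describer.describe("Rectangle", area()); }
}

final class ShapeDemo {
    // Works for any Shape; it never needs to know the concrete class.
    static double totalArea(List<Shape> shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Square(2.0), new Rectangle(3.0, 3.0));
        for (Shape shape : shapes) {
            System.out.println(shape.describe());
        }
        System.out.println(totalArea(shapes));
    }
}
```

Here the two concrete classes share code through `Describer` but are related to each other only through `Shape`, which is the only type a caller can treat them as.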
Further defining aspects of Loci include the banishment of global state, marking of mutator methods and mutable objects, no downcasting, no null, no break or continue, a for-each loop, clear naming rules, easy to read import statements and no static methods. However, Loci has many things in common with other OOP languages: unchecked exceptions, packages, garbage collection, operator overloading, generics (unlike Java's, these do not suffer from type erasure) and an immutable string type (like Java strings, except implemented as ropes).
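To show why a rope-based string makes concatenation cheap, here is a minimal Java sketch of an immutable rope. It is an illustration under stated assumptions rather than the Loci implementation: the names `Rope`, `Leaf`, `Concat` and `RopeDemo` are hypothetical, and the sketch does no rebalancing, which a practical rope would need.

```java
// A minimal immutable rope. Concatenation allocates one small node
// instead of copying characters; indexing walks the tree instead.
interface Rope {
    int length();
    char charAt(int index);
    void appendTo(StringBuilder out);

    static Rope of(String text) { return new Leaf(text); }

    // Constant-time concatenation: both operands are shared, not copied.
    default Rope concat(Rope other) { return new Concat(this, other); }

    // Copies the characters out once, only when a flat string is needed.
    default String flatten() {
        StringBuilder out = new StringBuilder(length());
        appendTo(out);
        return out.toString();
    }
}

final class Leaf implements Rope {
    private final String text;

    Leaf(String text) { this.text = text; }

    public int length() { return text.length(); }

    public char charAt(int index) { return text.charAt(index); }

    public void appendTo(StringBuilder out) { out.append(text); }
}

final class Concat implements Rope {
    private final Rope left;
    private final Rope right;
    private final int length;

    Concat(Rope left, Rope right) {
        this.left = left;
        this.right = right;
        this.length = left.length() + right.length();
    }

    public int length() { return length; }

    // Indexing descends into whichever side holds the index, so its cost
    // grows with the depth of the tree.
    public char charAt(int index) {
        int leftLength = left.length();
        return index < leftLength
                ? left.charAt(index)
                : right.charAt(index - leftLength);
    }

    public void appendTo(StringBuilder out) {
        left.appendTo(out);
        right.appendTo(out);
    }
}

final class RopeDemo {
    public static void main(String[] args) {
        Rope greeting = Rope.of("Hello, ").concat(Rope.of("Loci")).concat(Rope.of("!"));
        System.out.println(greeting.flatten()); // prints "Hello, Loci!"
    }
}
```

The trade-off is visible in `charAt`: concatenation no longer copies, but indexing is no longer a single array access, which is why an array-backed string is still useful when fast indexing matters.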
I am gradually building up a collection of sample code to demonstrate the language, which will be uploaded soon.
In more depth:
- The compiler is licensed under the MIT license.
- Like Java but unlike C++, Loci is a completely object orientated language and therefore all methods and state exist within classes.
- Like Java and C++, Loci is designed to be compiled; however, it is not locked into any particular compilation model and can therefore be interpreted, compiled to byte code or compiled to native machine code.
- Everything is an object in Loci, including integers, floating point numbers, booleans and arrays (there is no built-in language syntax for arrays). All the primitive objects are immutable and thus the compiler can reduce these objects to primitive types on the real machine (i.e. no slower than C or Java).
- Loci uses the same idea of references to objects as Java.
- Loci supports pass-by-value. Pass-by-reference can be done by passing a Ref object, but you should almost never treat this as an out parameter – just return a tuple from your function instead.
- Like Java, but unlike C++, Loci is fully garbage collected, although it has no notion of a finalize method called by the garbage collector. Instead, Loci allows the programmer to tag specific classes so that they have a destructor which is called deterministically via reference counting, thus enabling useful patterns such as RAII.
- Loci is statically typed like C++ and Java, and is strongly typed.
- Loci has no concept of built-in type conversions, so conversions usually occur in constructors, unlike Java and C++ where there are built-in conversions between primitive types. However, for a programmer there is little difference and the syntax is almost identical, the only consideration being the lack of implicit type conversion.
- As a result of the lack of type conversions, there is no mapping between the integers 1 or 0 and true or false. If statements and while loops require that the condition is a boolean, and thus if(1){} would produce a compile-time error.
- Loci has interfaces like Java, with the same rules.
- Like Java, Loci doesn’t support multiple inheritance of classes but a class or interface can implement/extend multiple interfaces. However, a class can inherit from multiple partial classes (which themselves can inherit from multiple other partial classes). (This may be implemented, but not any time soon.)
- However, unlike Java, inheritance and implementation are distinct and therefore polymorphism can only be achieved with interfaces. This means that classes cannot be cast to their base classes, but only to their interfaces, and interfaces can be cast to other interfaces they extend.
- Unlike both Java and C++, in Loci a dynamic cast/downcast (cast to a derived class/interface) is not supported.
- Unlike Java, but like C++, Loci supports operator overloading. The ‘==’ operator must be overloaded to have any functionality, since there is no default implementation. The ‘===’ (three equals) operator cannot be overloaded and is equivalent to Java’s ‘==’ operator, for comparing two references for equality.
- While Java and C++ require marking that objects won’t change (like const char *), Loci operates in the opposite way: mutable types and variables must be marked with a dollar sign. This helps to make code more predictable and concise, and allows for compiler optimization.
- Unlike both C++ and Java, there is no null in Loci and every reference must point to a valid instance. Like C# there will be a nullable type. In Loci all local variables must be initialized where they are defined and all instance variables must be initialized in each constructor (and cannot subsequently be modified).
- All instance variables should be accessed through the use of the # symbol:
  localVar = #instanceVar
- The names of types (classes and interfaces) and packages must start with an upper case letter and the names of variables must start with a lower case letter. Underscores are not allowed in any names; however, numbers are. There are compile-time checks to enforce this.
- Like Java, Loci has an immutable string type, however the implementation will be a rope rather than an array of characters, which means that concatenation is considerably faster. An array of characters implementation will exist if support for fast indexing is required.
- Loci supports exceptions and all exceptions are unchecked.
- There are no access specifiers in Loci: all methods are public and all properties/data are private.
- Loci doesn’t support static methods or static variables.
- The source file structure of Loci is very similar to Java, with a package declaration, an import statement and then the definition of classes, interfaces, functions etc. Like Java and unlike C++, there are no header files and no preprocessor.
- There is only one import statement in Loci which is used to import multiple classes/interfaces via a nice syntax (rather than repeatedly writing the package name, you write it once and write all the classes/interfaces you need from it in braces).
- Loci supports generics which operate in a similar way to Java, however there is no type erasure and Loci generics are therefore completely type safe without any dynamic checks.
- Loci does not support the instanceof operator, since this should be handled via interfaces and polymorphism. It will actually be possible to implement something similar via enumerated types and a switch statement (you can attach data of different types to each enumerated value).
- Unlike Java and C++, Loci does not provide a default constructor and every class must define at least one constructor.
- Like Python, Loci only supports one format of the for statement which is a for-each loop, the format being identical to a Java for-each loop.
- Loci may support while loops.
- Loci will support the switch statement for enum types.
- Loci does not have break or continue statements.
- Loci supports standard primitive types, which are mapped to their C++ counterparts: Byte => char, ShortInt => short, Int => int, LongInt => long, Single => float, Double => double
- Loci supports a number of specific integer sizes: Int8, Uint8, Int16, Uint16, Int32, Uint32, Int64, Uint64, Int128 and Uint128 (and perhaps more as processor word sizes increase). Loci also supports a Char type, which is an alias for Uint32 (to accommodate every Unicode character).
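
The operator overloading point in the list above separates value equality (‘==’, which must be defined by the class) from reference equality (‘===’). Since ‘===’ is described as equivalent to Java’s ‘==’, the distinction can be sketched in Java, where `equals` plays roughly the role of an overloaded ‘==’. The names `Point`, `EqualityDemo`, `sameValue` and `sameReference` are hypothetical, and the type test inside `equals` is a Java requirement, not something implied for Loci.

```java
import java.util.Objects;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Value equality, defined by the class itself; roughly the role an
    // overloaded '==' would play in Loci as described above.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point point = (Point) other;
        return x == point.x && y == point.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class EqualityDemo {
    // Reference equality: true only if both names refer to one object.
    // This is Java's '==', which the text equates with Loci's '==='.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // Value equality: delegates to the class's own definition.
    static boolean sameValue(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Point first = new Point(1, 2);
        Point second = new Point(1, 2);
        System.out.println(sameValue(first, second));     // prints "true"
        System.out.println(sameReference(first, second)); // prints "false"
        System.out.println(sameReference(first, first));  // prints "true"
    }
}
```

One difference worth noting: Java gives every class a default `equals` that falls back to reference comparison, whereas in Loci ‘==’ has no default implementation at all.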


