From a0877e662061e16fd8f15be7a350b8920e203ffd Mon Sep 17 00:00:00 2001
From: John Criswell
Amongst other things, LLVM is a platform for compiler writers.
Because of its exceptionally clean and small IR (intermediate
representation), compiler writing with LLVM is much easier than with
-other system. As proof, the author of Stacker wrote the entire
+other systems. As proof, the author of Stacker wrote the entire
compiler (language definition, lexer, parser, code generator, etc.) in
about four days! That's important to know because it shows how quickly
you can get a new
@@ -78,11 +78,11 @@ language up when using LLVM. Furthermore, this was the first
language the author ever created using LLVM. The learning curve is
included in that four days.
The language described here, Stacker, is Forth-like. Programs
-are simple collections of word definitions and the only thing definitions
+are simple collections of word definitions, and the only thing definitions
can do is manipulate a stack or generate I/O. Stacker is not a "real"
-programming language; its very simple. Although it is computationally
+programming language; it's very simple. Although it is computationally
complete, you wouldn't use it for your next big project. However,
-the fact that it is complete, its simple, and it doesn't have
+the fact that it is complete, it's simple, and it doesn't have
a C-like syntax make it useful for demonstration purposes. It shows
that LLVM could be applied to a wide variety of languages.
The basic notions behind stacker is very simple. There's a stack of
@@ -96,11 +96,11 @@ program in Stacker:
: MAIN hello_world ;
This has two "definitions" (Stacker manipulates words, not
functions and words have definitions): MAIN and
-hello_world. The MAIN definition is standard, it
+hello_world. The MAIN definition is standard; it
tells Stacker where to start. Here, MAIN is defined to
simply invoke the word hello_world. The
hello_world definition tells stacker to push the
-"Hello, World!" string onto the stack, print it out
+"Hello, World!" string on to the stack, print it out
(>s), pop it off the stack (DROP), and
finally print a carriage return (CR). Although
hello_world uses the stack, its net effect is null. Well
@@ -124,7 +124,7 @@ learned. Those lessons are described in the following subsections.
Although I knew that LLVM uses a Single Static Assignment (SSA) format,
it wasn't obvious to me how prevalent this idea was in LLVM until I really
started using it. Reading the
-Programmer's Manual and Language Reference
+Programmer's Manual and Language Reference,
I noted that most of the important LLVM IR (Intermediate Representation) C++
classes were derived from the Value class. The full power of that simple
design only became fully understood once I started constructing executable
@@ -200,7 +200,7 @@ should be constructed. In general, here's what I learned:
handle_if
-should they encounter another if/then/else statement and it will just work.
+should they encounter another if/then/else statement, and it will just work.
Note how cleanly this all works out. In particular, the push_back methods on
the BasicBlock's instruction list. These are lists of type
Instruction which also happen to be Values. To create
-the "if" branch we merely instantiate a BranchInst that takes as
+the "if" branch, we merely instantiate a BranchInst that takes as
arguments the blocks to branch to and the condition to branch on. The blocks
act like branch labels! This new BranchInst terminates
the BasicBlock provided as an argument. To give the caller a way
-to keep inserting after calling handle_if we create an "exit" block
+to keep inserting after calling handle_if, we create an "exit" block
which is returned to the caller. Note that the "exit" block is used as the
terminator for both the "then" and the "else" blocks. This guarantees that no
matter what else "handle_if" or "fill_in" does, they end up at the "exit" block.
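
The block wiring described above can be sketched with a toy model. This is only an illustration: ToyBlock, ToyFunction and toy_handle_if are hypothetical names invented here, instructions are kept as plain strings, and nothing below uses the real LLVM classes or reproduces the Stacker compiler's actual handle_if.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy stand-in for a basic block (hypothetical, not llvm::BasicBlock).
struct ToyBlock {
    std::string name;
    std::vector<std::string> insts;  // instructions kept as text
};

// Toy stand-in for a function that owns its blocks (hypothetical).
struct ToyFunction {
    std::deque<ToyBlock> blocks;  // deque keeps element addresses stable on push_back

    ToyBlock* new_block(const std::string& name) {
        blocks.push_back(ToyBlock{name, {}});
        return &blocks.back();
    }
};

// Terminates `current` with a conditional branch to fresh "then" and "else"
// blocks, makes both of them branch to a fresh "exit" block, and returns
// that exit block so the caller can keep appending instructions to it.
ToyBlock* toy_handle_if(ToyFunction& fn, ToyBlock* current,
                        const std::string& cond) {
    ToyBlock* then_bb = fn.new_block("then");
    ToyBlock* else_bb = fn.new_block("else");
    ToyBlock* exit_bb = fn.new_block("exit");

    current->insts.push_back("br " + cond + ", " + then_bb->name + ", " +
                             else_bb->name);
    // A real compiler would fill in the bodies of "then" and "else" here,
    // before adding their terminators.
    then_bb->insts.push_back("br " + exit_bb->name);
    else_bb->insts.push_back("br " + exit_bb->name);
    return exit_bb;
}
```

Because the function hands back the exit block, a caller can pass that block straight into another toy_handle_if call, which is the nesting property the text points out.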
@@ -283,7 +283,7 @@ One of the first things I noticed is the frequent use of the "push_back"
method on the various lists. This is so common that it is worth mentioning.
The "push_back" inserts a value into an STL list, vector, array, etc. at the
end. The method might have also been named "insert_tail" or "append".
-Althought I've used STL quite frequently, my use of push_back wasn't very
+Although I've used STL quite frequently, my use of push_back wasn't very
high in other programs. In LLVM, you'll use it all the time.
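
For readers who have not used it much, here is a minimal standard-library sketch of push_back. The helper name append_one_two_three is made up for this illustration, and the example touches only std::vector and std::list, not any LLVM list type.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Appends the values 1, 2, 3 (in that order) to any container type that
// offers push_back, and returns the filled container.
template <typename Container>
Container append_one_two_three() {
    Container c;
    c.push_back(1);  // each call adds the value at the end
    c.push_back(2);
    c.push_back(3);
    return c;
}
```

Calling append_one_two_three<std::vector<int>>() or append_one_two_three<std::list<int>>() gives a container whose first element is 1 and whose last element is 3, which is why "append" would have been an equally fitting name.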
It took a little getting used to and several rounds of postings to the LLVM
-mail list to wrap my head around this instruction correctly. Even though I had
+mailing list to wrap my head around this instruction correctly. Even though I had
read the Language Reference and Programmer's Manual a couple times each, I still
missed a few very key points:
This means that when you look up an element in the global variable (assuming
-its a struct or array), you must deference the pointer first! For many
+it's a struct or array), you must deference the pointer first! For many
things, this leads to the idiom:
@@ -319,13 +319,13 @@ will run against your grain because you'll naturally think of the global array
variable and the address of its first element as the same. That tripped me up
for a while until I realized that they really do differ .. by type.
Remember that LLVM is a strongly typed language itself. Everything
-has a type. The "type" of the global variable is [24 x int]*. That is, its
+has a type. The "type" of the global variable is [24 x int]*. That is, it's
a pointer to an array of 24 ints. When you dereference that global variable with
a single (0) index, you now have a "[24 x int]" type. Although
the pointer value of the dereferenced global and the address of the zero'th element
in the array will be the same, they differ in their type. The zero'th element has
type "int" while the pointer value has type "[24 x int]".
-Get this one aspect of LLVM right in your head and you'll save yourself
+Get this one aspect of LLVM right in your head, and you'll save yourself
a lot of compiler writing headaches down the road.
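
The same "same address, different type" situation exists in C++, which may make the point easier to picture. The sketch below is an analogy only, not LLVM code: the_array, Array24, whole_array, first_element and same_address are names invented for this illustration.

```cpp
#include <cassert>
#include <type_traits>

using Array24 = int[24];

Array24 the_array;  // stands in for the global array variable

// Pointer to the whole array: the C++ analogue of [24 x int]*.
Array24* whole_array() { return &the_array; }

// Pointer to the zero'th element: the C++ analogue of int*.
int* first_element() { return &the_array[0]; }

// The two pointer types are distinct as far as the type system is concerned.
static_assert(!std::is_same<Array24*, int*>::value,
              "pointer-to-array and pointer-to-element are different types");

// Yet both pointers hold the same address.
bool same_address() {
    return static_cast<void*>(whole_array()) ==
           static_cast<void*>(first_element());
}
```

same_address() returns true even though the static_assert confirms the types differ, mirroring the distinction drawn above between the global array and its first element.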
Linkage types in LLVM can be a little confusing, especially if your compiler
writing mind has affixed very hard concepts to particular words like "weak",
"external", "global", "linkonce", etc. LLVM does not use the precise
-definitions of say ELF or GCC even though they share common terms. To be fair,
+definitions of, say, ELF or GCC, even though they share common terms. To be fair,
the concepts are related and similar but not precisely the same. This can lead
you to think you know what a linkage type represents but in fact it is slightly
different. I recommend you read the
@@ -342,10 +342,10 @@ different. I recommend you read the
carefully. Then, read it again.
Here are some handy tips that I discovered along the way:
Manipulating the stack can be quite hazardous. There is no distinction
given and no checking for the various types of values that can be placed on
the stack. Automatic coercion between types is performed. In many
-cases this is useful. For example, a boolean value placed on the stack
+cases, this is useful. For example, a boolean value placed on the stack
can be interpreted as an integer with good results. However, using a word
that interprets that boolean value as a pointer to a string to print out will
almost always yield a crash. Stacker simply leaves it
@@ -406,9 +406,9 @@ is terminated by a semi-colon.
So, your typical definition will have the form:
: name ... ;
The name is up to you but it must start with a letter and contain
-only letters numbers and underscore. Names are case sensitive and must not be
+only letters, numbers, and underscore. Names are case sensitive and must not be
the same as the name of a built-in word. The ... is replaced by
-the stack manipulting words that you wish define name as.
+the stack manipulating words that you wish to define name as.
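
The naming rule can be written down as a small check. This is a sketch under stated assumptions, not the Stacker lexer: is_valid_word_name is a hypothetical helper, "letter" is assumed to mean an ASCII letter, and the clash with built-in words is deliberately not tested because the list of built-ins is not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if `name` starts with a letter and contains only letters,
// digits and underscores. Does NOT check for clashes with built-in words.
bool is_valid_word_name(const std::string& name) {
    if (name.empty() ||
        !std::isalpha(static_cast<unsigned char>(name[0]))) {
        return false;
    }
    for (char ch : name) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalnum(c) && c != '_') {
            return false;
        }
    }
    return true;
}
```

Under these assumptions hello_world passes the check, while a name that starts with a digit or contains a hyphen does not.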
@@ -429,12 +429,12 @@ a real program.
-There are three kinds of literal values in Stacker. Integer, Strings,
+There are three kinds of literal values in Stacker: Integers, Strings,
and Booleans. In each case, the stack operation is to simply push the
- value onto the stack. So, for example:
+ value on to the stack. So, for example:
42 " is the answer." TRUE
- will push three values onto the stack: the integer 42, the
- string " is the answer." and the boolean TRUE.
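
As a rough picture of what that line does, the sketch below pushes the same three literals onto a toy stack. ToyValue, its helper constructors and push_example are hypothetical names invented here. The kind tag exists only to make the example readable; the document itself says Stacker makes no distinction between the types of values on its stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy tagged value (hypothetical; the tag is for illustration only).
struct ToyValue {
    enum Kind { Integer, String, Boolean } kind;
    long integer;
    std::string text;
    bool flag;
};

ToyValue make_int(long v) { return ToyValue{ToyValue::Integer, v, "", false}; }
ToyValue make_string(const std::string& s) {
    return ToyValue{ToyValue::String, 0, s, false};
}
ToyValue make_bool(bool b) { return ToyValue{ToyValue::Boolean, 0, "", b}; }

// Mirrors the effect of:  42 " is the answer." TRUE
// Each literal is simply pushed, left to right, so TRUE ends up on top.
std::vector<ToyValue> push_example() {
    std::vector<ToyValue> stack;
    stack.push_back(make_int(42));
    stack.push_back(make_string(" is the answer."));
    stack.push_back(make_bool(true));
    return stack;
}
```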
The built-in words of the Stacker language are put in several groups depending on what they do. The groups are as follows:
(push count) WHILE (words...) -- END
The following fully documented program highlights many features of both
the Stacker language and what is possible with LLVM. The program has two modes
of operations. If you provide numeric arguments to the program, it checks to see
-if those arguments are prime numbers, prints out the results. Without any
-aruments, the program prints out any prime numbers it finds between 1 and one
-million (there's a log of them!). The source code comments below tell the
+if those arguments are prime numbers and prints out the results. Without any
+arguments, the program prints out any prime numbers it finds between 1 and one
+million (there's a lot of them!). The source code comments below tell the
remainder of the story.
@@ -1015,7 +1015,7 @@ remainder of the story.
: exit_loop FALSE;
################################################################################
-# This definition tryies an actual division of a candidate prime number. It
+# This definition tries an actual division of a candidate prime number. It
# determines whether the division loop on this candidate should continue or
# not.
# STACK<:
@@ -1075,7 +1075,7 @@ remainder of the story.
# STACK<:
# p - the prime number to check
# STACK>:
-# yn - boolean indiating if its a prime or not
+# yn - boolean indicating if its a prime or not
# p - the prime number checked
################################################################################
: try_harder
@@ -1248,7 +1248,7 @@ remainder of the story.
under the LLVM "projects" directory. You will need to obtain the LLVM sources
to find it (either via anonymous CVS or a tarball. See the
Getting Started document).
-Under the "projects" directory there is a directory named "stacker". That
+Under the "projects" directory there is a directory named "Stacker". That
directory contains everything, as follows: