aboutsummaryrefslogtreecommitdiff
path: root/gcc/rust/checks/errors/borrowck/bir-design-notes.md
blob: e2edc20ffa8b473844567454511cf1e30d57b8bd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
# Borrow-checker IR (BIR) design notes

## Design goals

Rust GCC project aims to use the [Polonius project](https://github.com/rust-lang/polonius) as its borrow-checker.
Polonius operates on a set of [facts](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/facts.rs) about the program to determine
the actual lifetimes of borrows.
As Polonius's primary analysis is location sensitive, the facts are tied to the program's control flow graph (CFG).
Unlike rustc, which has
its
own three address language specific representation called [MIR](https://rustc-dev-guide.rust-lang.org/mir/index.html), gccrs uses gcc's AST based
representation GENERIC.
GIMPLE (the three address IR) is generated from GENERIC inside GCC and can carry no language-specific information.
Therefore, we
need to generate our own representation of the program's CFG.
Since gccrs has no intention of having a MIR-like IR, the BIR is not to be used for
code generation.
Therefore, BIR carries only minimal information that is necessary for the borrow-checker and for BIR debugging.

Since BIR is in fact a dead branch of the compilation pipeline, the only way to verify its generations is through manual inspection.
To have some frame of reference for testing, BIR build and dump are carefully designed to resemble the textual dump of rustc's MIR as much as
possible.
This includes the style of the output, numbering of locals and order of basic blocks (when possible).

## BIR Dump Example

An example program calculating the i-th fibonacci number:

```rust

fn fib(i: usize) -> i32 {
    if i == 0 || i == 1 {
        1
    } else {
        fib(i - 1) + fib(i - 2)
    }
}
```

Here is an example of BIR dump (note: this needs to be updated regularly):

```
fn fib(_1: usize) -> i32 {
    let _0: i32;
    let _2: i32;
    let _3: bool;
    let _4: bool;
    let _5: bool;
    let _6: usize;
    let _7: i32;
    let _8: usize;
    let _9: i32;
    let _10: i32;

    bb0: {
        _4 = Operator(_1, const usize);
        switchInt(_4) -> [bb1, bb2];
    }

    bb1: {
        _3 = const bool;
        goto -> bb3;
    }

    bb2: {
        _5 = Operator(_1, const usize);
        _3 = _5;
        goto -> bb3;
    }

    bb3: {
        switchInt(_3) -> [bb4, bb7];
    }

    bb4: {
        _2 = const i32;
        goto -> bb8;
    }

    bb5: {
        _6 = Operator(_1, const usize);
        _7 = Call(fib)(_6, ) -> [bb6];
    }

    bb6: {
        _8 = Operator(_1, const usize);
        _9 = Call(fib)(_8, ) -> [bb7];
    }

    bb7: {
        _10 = Operator(_7, _9);
        _2 = _10;
        goto -> bb8;
    }

    bb8: {
        _0 = _2;
        return;
    }
}


```

The dump consists of:

- A function header with arguments: `fn fib(_1: usize) -> i32 { ... }`.
- Declaration of locals: `let _0: i32;`, where `_0` is the return value (even if it is of the unit type). Arguments are not listed here, they are
  listed in the function header.
- A list of basic blocks: `bb0: { ... }`. The basic block name is the `bb` prefix followed by a number.
- Each basic block consists of a list of BIR statements. Instruction can be either assigned to a local (place) or be a statement.
  Instructions take locals (places) as arguments.
- Each basic block is terminated with a control flow instruction followed by a list of destinations:
    - `goto -> bb3;` - a goto instruction with a single destination.
    - `switchInt(_3) -> [bb4, bb7];` - a switch instruction with multiple destinations.
    - `return;` - a return instruction with no destinations.
    - `Call(fib)(_6, ) -> [bb6];` - a call instruction with a single destination. This section is prepared for panic handling.

## BIR Structure

BIR structure is defined in `gcc/rust/checks/errors/borrowck/rust-bir.h`. It is heavily inspired by rustc's MIR. The main difference is that BIR
drastically reduces the amount of information carried to only borrow-checking relevant information.

As borrow-checking is performed on each function independently, BIR represents a single function (`struct Function`). A `Function` consists of a list
of basic blocks, list of arguments (for dump only) and place database, which keeps track of locals.

### Basic Blocks

A basic block is identified by its index in the function's basic block list.
It contains a list of BIR statements and a list of successor
basic block indices in CFG.

### BIR Statements

BIR statements are of three categories:

- An assignment of an expression to a local (place).
- A control flow operation (switch, return).
- A special statement (not executable), which carries additional information for borrow-checking (`StorageDead`, `StorageLive`).

#### Expressions

Expressions represent the executable parts of the rust code. Many different Rust contracts are represented by a single expression, as only data (and
lifetime) flow needs to be tracked.

- `InitializerExpr` represents any kind of struct initialization. It can be either explicit (struct expression) or implicit (range expression,
  e.g. `0..=5`).
- `Operator<ARITY>` represents any kind of operation, except the following, where special information is needed either for borrow-checking or for
  better debugging.
- `BorrowExpr` represents a borrow operation.
- `AssignmentExpr` holds a place for an assignment statement (i.e., no operation is done on the place, it is just assigned).
- `CallExpr` represents a function call.
    - For functions, the callable is represented by a constant place (see below). (E.i. all calls use the same constant place.)
    - For closures and function pointers, the callable is represented by a (non-constant) place.

### Places

Places are defined in `gcc/rust/checks/errors/borrowck/rust-bir-place.h`.

Places represent locals (variables), their field, and constants. They are identified by their index (`PlaceId`) in the function's place database. For
better dump correspondence to MIR, constants use a different index range.

Non-constant places are created according to Polonius path [documentation](https://rust-lang.github.io/polonius/rules/atoms.html). The following
grammar describes
possible path elements:

```
Path = Variable
     | Path "." Field // field access
     | Path "[" "]"   // index
     | "*" Path
```

It is important to highlight that different fields are assigned to different places; however, all indices are assigned to the same place.
Also, to match the output of rustc.
In dump, paths contain at most one dereference and are split otherwise.
Same paths always result in the same place.

Variables are identified by `AST` `NodeId`. Fields indexes are taken from `TyTy` types.

Each place holds indices to its next relatives (in the path tree), `TyTy` type, lifetime and information whether the type can be copies or it needs to
be moved. Not that unlike rustc, we copy any time we can (for simplicity), while rustc prefers to move if possible (only a single copy is held).

## BIR Builders

There are multiple builders (visitor classes) for BIR based on what context is needed in them.
provides the entry point that handles function parameters and return values, and it creates the BIR main unit `Function`.
`rust-bir-internal.h` provides abstract builder classes with common helper methods for all builder and for expression builders.
Specific builders are then defined for expressions+statements, lazy boolean expressions, patterns, and struct initialization.