Building a Debugger with Cassette

Lyndon White (@oxinabox)

Research Software Engineer

With Thanks

  • Tim Holy
  • Kristoffer Carlsson
  • Jarrett Revels
  • Keno Fischer
  • Valentin Churavy

📼📖💆‍♂️ What is Magnetic Read Head ?

A Magentic Read Head sits above a cassette tape, and reads the content off, allowing it to be presented to the user.

  • MRH instruments the IR, and uses the standard compiler.
  • Debugger.jl uses JuliaInterpreter.jl

🤔 Why do you want to do this?

  • MRH was started about the same time as Debugger.jl
  • There was no functioning debugger for 1.0
  • Seemed like Cassette+Revise could be used to make a debugger with little effort.

🐁 Codebase sizes:

lines in src including comments and white space

MRH: 855 lines
Only 200 of them are doing deep Cassette stuff

Debugger.jl: 1285 lines
JuliaInterpreter: 3780 lines

🚀📈 Performance:

function summer(A)
   s = zero(eltype(A))
   for a in A
       s += a
   end
   return s
end

⛹️ Performance of debuggers is complicated:

  • MRH has to recompile every method it touchs the first time,
    • run-only-once is common when debugging.
    • recompiling large libraries is huge slow
  • This kind of instrumentation destroys SIMD, and potentially breaks CPU pipelining
    • We may be able to ask LLVM to fix this.

🤪 🕳️ Insane blackholes exist

julia> foo() = Complex.(rand(1,2), rand(1,2)) * rand(Int, 2,1);

julia> @btime foo();
  297.770 ns (9 allocations: 720 bytes)

julia> @btime Debugger.@run foo();
  15.472 ms (46982 allocations: 1.78 MiB)

julia> @time MagneticReadHead.@run foo()
  #==  Sits there compiling for over 30 minutes before I give up ==#

🐋 MRH massively increases the amount of Lowered IR generated

Original Code Lowered Size: $$O(n_{statements})\qquad \qquad \mathrm{eg:}\quad 160$$

MRH Instrumented Size: $$O(n_{slots} \times n_{statements}) \qquad \qquad \mathrm{eg:}\quad 25,230$$

Compile time-complexity is worse than quadratic vs IR length

🚀📈 On Performance:

Debugger.jl: Significant runtime overhead 🏃⏲️
MagneticReadHead.jl: Significant compile-time overhead 💻⏲️

  • But: This is being worked on in both cases
  • We can have nice things, and the best of both worlds.

🐞 What does a debugger need?

  • At points in code, check if we are hitting a breakpoint: should_break(...)
  • if so take some break_action(...)
  • if not keep running as normal

🎷🎻 How do we instrument our code to make a debugger work?

Insert at points:

if should_break(current_location)
    break_action(current_location, current_variables)
end

💔❓ What does should_break(...) do?

  • Check if we are Stepping
  • Check current_location against a set of breakpoint rules
    • e.g. for line number, function, method, module
    • that were defined by set_breakpoint

💔🎥 What does break_action do?

  • Run a REPL, subsituting in the values for the variables
  • @eval things into Main scope,
    • such that set_breakpoint works
  • Handle actions like StepNext by changing debugger state

💔🎥 What does the break_action(...) need ?

  • know where in the code we are. (breadcrumbs)
  • know the current state: all the variables.
  • Some way to interact with them (debug REPL)
  • Ability to set and remove breakpoints/step-in/out (control)

⚡ Julia IR: Lightning Intro

🍰 Julia: layers of representenstation:

  • Source code
  • AST: quote
  • Untyped IR: @code_lowered
  • Typed IR: @code_typed
  • LLVM: @code_llvm
  • ASM: @code_native

🚫⌨️ Untyped IR: this is what we are working with

  • basically a linearization of the AST.
  • Only 1 operation per statement (Nested expressions get broken up)
  • the return values for each statement is accessed as SSAValue(index)
  • Variables ↦ Slots
  • Control-flow ↦ Goto based expressions
  • function names ↦ GlobalRef(mod, func)

📼🚧 How Does Cassette Work?

  • It is not magic, Cassette is not specially baked into the compiler.
  • @generated function can return a Expr or a CodeInfo
  • We return a CodeInfo based on a modified version of one for a function argument. Its a bit like a macro with dynamic scope.
  • This capacity allows on to build AD tools, Mocking tools, Debuggers and more.
    • Contrast: Swift for TensorFlow which is adding AD into the compiler.

⚙️🤝 Manual pass

call_and_print(f, args...) = (println(f, " ", args); f(args...))

@generated function rewritten(f)
    ci = deepcopy(@code_lowered f.instance())
    for ii in eachindex(ci.code)
        if ci.code[ii] isa Expr && ci.code[ii].head==:call
            func = GlobalRef(Main, :call_and_print)
            ci.code[ii] = Expr(:call, func, ci.code[ii].args...)
        end
    end
    return ci
end

⚙️🤝 Result of our manual pass:

julia> foo() = 2*(1+1);
julia> rewritten(foo)
+ (1, 1)
* (2, 2)
4

🔨📼 Rather than doing call_and_print:

function overdub(f, args...)
    println(f, " ", args)
    rewritten(f, args...)
end

This is how Cassette (and IRTools, and Arborist) work.

🐛 Debugger 1

MRH v0.1-like

Overdub functions you want to debug.

🐛💡 concept prototype:

  • Overdub function calls to trigger break action.
  • This will only allow function calls to be intercepted
  • But most expressions in julia are function calls
  • So very expressive.
  • Do not need to touch IR.
In [2]:
using Cassette
Cassette.@context Concept1
function Cassette.overdub(ctx::Concept1, f::typeof(+), args...)
    method = @which f(args...)
    iprintstyled("Breakpont Hit", :red)
    iprintstyled(method, :green);
    iprintstyled("Args: ", args, :blue);
    println("...press enter to continue...")
    #readline()
    
    Cassette.recurse(ctx, f, args...)
end
In [3]:
function foo(a)
    b = a+1
    c= 2b
    return b+c
end
Cassette.recurse(Concept1(), ()->foo(4))
Breakpont Hit
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53
Args: (4, 1)
Breakpont Hit
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53
Args: (5, 10)
...press enter to continue...
...press enter to continue...
Out[3]:
15

💡💪 Generalizing that prototype

  • add setting of breakpoint via @eval
  • add deleting of breakpoints via Base.deletemethod
  • add Stepping via setting a breakpoint on every call
  • Improve UX via showing a REPL with named variables etc
In [4]:
Cassette.@context Concept2
function set_breakpoint(ctx::Concept2, f::F) where F
    @eval function Cassette.overdub(ctx::Concept2, f::$F, args...)
        method = @which f(args...)
        iprintstyled("Breakpont Hit", :red)
        iprintstyled(method, :green);
        iprintstyled("Args: ", args, :blue);
        println("...press enter to continue...")
        #readline()
        if Cassette.canrecurse(ctx, f, args...)
            Cassette.recurse(ctx, f, args...)
        else 
            Cassette.fallback(ctx, f, args...)
        end
    end
end
Out[4]:
set_breakpoint (generic function with 1 method)
In [ ]:
using Revise: get_method, sigex2sigts, get_signature

macro undeclare(expr)
    quote
        sig = get_signature($(Expr(:quote, expr)))
        sigt = only(sigex2sigts(@__MODULE__, sig))
        meth = get_method(sigt)
        if meth == nothing
            @info "Method not found, thus not removed."
        else
            Base.delete_method(meth)
        end
    end
end

function rm_breakpoint(f::F) where F
    @undeclare function Cassette.overdub(ctx::MagneticCtx, fi::$(F), zargs...)
    end
end
In [5]:
bar(x) = x + blarg(x)
blarg(x) = 5

set_breakpoint(Concept2(), blarg)

Cassette.@overdub Concept2() bar(1)
Breakpont Hit
blarg(x) in Main at In[5]:2
Args: (1,)
...press enter to continue...
Out[5]:
6

🧰 Debugger Bits

Lots of bits of a debugger are not too hard.

📖💻🖨➰ A Decent REPL can be written in 50 lines

an ok one can be written in 12

function run_repl(name2var)
    while true
        code_ast = Meta.parse(readline())       # READ
        code_ast == nothing && break
        code_ast = substitute_vars(name2var, code_ast)
        try
            result = eval(code_ast)             # EVAL
            display(result)                     # PRINT
        catch err
            showerror(stdout, err)
        end
    end                                         # LOOP
end

〰🕵️‍♀️ CodeInfo lineinfo and codelocs

Contain all you need to go from Line to IR statement index

function statement_ind2src_linenum(ir, statement_ind)
    code_loc = ir.codelocs[statement_ind]
    return ir.linetable[code_loc].line
end
function src_line2ir_statement_ind(ir, src_line)
    linetable_ind = findlast(ir.linetable) do lineinfo
        lineinfo.line == src_line
    end
    statement_ind = findlast(isequal(linetable_ind), ir.codelocs)
    return statement_ind
end

And/Or CodeTracking.jl

🚶‍♀️🚶‍♂️ Stepping mode needs to be updated every time you enter or exit a function.

It is basically a state machine:

  • On the way is easy:

    • $4$ states: StepIn StepNext StepContinue StepOut
  • Out is harder:

    • $4\times4=16$ states,
    • Before call
    • From within call
  • This can all be controlled from the overdub

We will now have a short break

This is a snake wearing a hat. It is clearly having a break also.
That is everything I know about this snake.

🐜 Debugger 2

MRH v0.3-like

Modify the code directly to instert the statements

🤝 To do this we are going to need to use an IR pass

  • IR pass is the same kind of thing we did at the start.
  • but Cassette gives us some helpers, and handles some of fiddlier things.
  • Your path function gets as its input a Reflection object.
  • It must return a CodeInfo

👓 What do we Know (Reflection input):

  • method (+ signature + static_params)
  • code_info: (copied) We are going to edit this
    • code: Array of linearized expressions (untyped IR)
    • codelocs: maps from IR statement index to entry in lineinfo
    • lineinfo: tells you a position in linenumber + filenam
    • slotnames: Array of names for all the variables

✍️ IR writing Helpers: call_expr

function call_expr(mod::Module, func::Symbol, args...)
    Expr(:call, Expr(:nooverdub, GlobalRef(mod, func)), args...)
end

Without the :nooverdub Cassette will try and recurse into this

🧾 For each statement we want to

In julia:

if (should_break(method, original_statement_index)
    variables_names=Symbol[]
    variables=Any[]
    # Do stuff for each slot to
    # capture all variables that are defined
    #...
    break_action(method, original_statement_index, variables_names, variables)
end
$original_statement

🧾🏁 For each statement we want to

In untyped IR:

call_expr(MagneticReadHead, :should_break, method, orig_ind),
Expr(:gotoifnot, SSAValue(ind), next_statement_ind)),
call_expr(Base, :getindex, GlobalRef(Core, :Symbol)),
call_expr(Base, :getindex, GlobalRef(Core, :Any)),
    # Do stuff for each slot to
    # capture all variables that are defined
    #...

call_expr(MagneticReadHead, :break_action,
    method, orig_ind, SSAValue(ind+2), SSAValue(ind+3)
)
end
original_statement

🎰 For each slot we want to

In julia:

if @isdefined(slotname)
    push!(variable_names, slotname)
    push!(variables, slot)
end

🧾🏁 For each slot we want to

In untyped IR:

Expr(:isdefined, slot) # cur_ind
Expr(:gotoifnot, Core.SSAValue(cur_ind), cur_ind + 4)
call_expr(Base, :push!, names_ssa, QuoteNode(slotname)
call_expr(Base, :push!, values_ssa, slot)
In [45]:
function fun_times()
    y=2
    x=2
    y=x+y
end 
Cassette.@overdub Concept31(metadata=Ref(0), pass=pass31n) fun_times()
Index 1 #self# getfield(Main, Symbol("##74#75"))()
fun_times() in Main at In[45]:2
Index 1 #self# fun_times
Index 11 #self# fun_times
Index 11 y 2
Index 21 #self# fun_times
Index 21 x 2
Index 21 y 2
+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53
Index 1 #self# +
Index 1 x 2
Index 1 y 2
Index 11 #self# +
Index 11 x 2
Index 11 y 2
Index 31 #self# fun_times
Index 31 x 2
Index 31 y 2
Index 41 #self# fun_times
Index 41 x 2
Index 41 y 4
Index 5 #self# getfield(Main, Symbol("##74#75"))()





Out[45]:
4

🙅‍♂️🙅‍♀️ But! you can't just call isdefined whenever you want

function danger19()
    y=2
    function inner()
        h=y
        y=12
        return h
    end
    inner()
end 
Cassette.@overdub Concept31(metadata=Ref(0), pass=pass31n) danger19()

💥🔥 Woot! the optimizer just OOB'ed

🥍📚 Solving that is left as an excercise to the reader

Basically need to make sure variables are declared before you check if they are defined.

Reliable way is to just ban any slot that was touched by NewvarNode.
but then you don't get all the variables.

🤔 How make your instrumenting debugger,

Or Cassette code in general

Not incredibly 🐌 sloooow 🐢

At runtime

MRH v0.3 is literally 4 orders of magnitude faster than v0.2

At runtime

🐆 gotta go fast 🐇

♨️➰ The Hottest of Loops

There is no hot loop as hot as literally every single statement in your code

Almost as hot: Literally every function call in your code

🗄️🖥️ Don't implement your state-machine using dispatch

  • Previously StepNext, Continue were singleton types.
  • working out new state was done via (dynamic) dispatches
  • This needs to be done in every function call ♨️➰
  • Dispatch is fast 🏃. branches on enums is faster 🚀

📠💽 Don't store a running Dict with references to all variables

  • if everytime a variable is assigned to you also assign it into a Dict stored in the Context:
  • Upside: No isdefined stuff.
  • Downside: call setindex every single assignment. Many allocations.

📞📖 Don't call methods to get the Method

  • Calling to methods takes whole microseconds.
  • You get it "free" during a pass.
  • Til then learn to live without.

🚫 👨‍⚕️ investigate @nospecialize

  • This can matter for runtime because sometimes things get compiled again.
  • No sense compiling specialized for every function-type and arg types
  • Instrumentation is always the same, regardless of input type.

∈ 📏 Inline Your Overdubs.

Huge performance gain from

@inline function Cassette.overdub(ctx::HandEvalCtx, f, args...)

❓What can we do about those 🐋 compile times?

MRH massively increases the amount of Lowered IR generated

Original Code Lowered Size: $$O(n_{statements})\qquad \qquad \mathrm{eg:}\quad 160$$

MRH Instrumented Size: $$O(n_{slots} \times n_{statements}) \qquad \qquad \mathrm{eg:}\quad 25,230$$

Compile time-complexity is at worse then quadratic vs IR length

A) Don't instrument every function

  • MRH already has an option for set_uninstrumented!
  • This blocks breakpoints anywhere deeper in the call-stack
  • New Idea: Semi-instrumented. Don't instrument this except to allow children to be instrumented.
  • MRH Instrumented Size: $O(n_{slots} \times n_{statements})$
  • More precisely it is $\approx (4\times n_{slots} + 5) \times n_{statements}$
  • What can we cut:
    • 🚧 building list of slotnames
      • $(n_{slots}+1) \times n_{statements}$
    • ☑️ isdefined before it has been used
    • isdefined after it has been used
      • between both of the above: $\le2 \times n_{slots} \times n_{statements}$

C) Generate-less code by only instrumented each-line once

  • MRH Instrumented Size: $O(n_{slots} \times n_{statements})$
  • More precisely it is $\approx (5 + 4\times n_{slots}) \times n_{statements}$
  • If we instrument per line instread of per statement: $n_{slots} \times n_{lines} + n_{statements}$
  • Easier to reason when debugging at this level anyway
  • Assumes lines follow a good breakup of code.

D) Sortout Compile Caching between julia runs

  • MRH always instruments the same code the same way
  • So we can precompile the instrumentation for all of Base+StdLibs and it is always the same
  • One way is to ship a system-image

🔜 What is next for MagneticReadHead ?

  • Bug squashing
  • Rewrite in IRTools.jl
  • More alignment with Debugger.jl / Apply this kind of tooling into Debugger.jl