Continuing the deviation from the simple expression parser and going further down the rabbit hole of writing a small scripting language, two topics were still outstanding: functions and the Maya implementation. If you haven't yet, read Part 1 and Part 2 first.
Starting with functions, this could really be a standalone topic on lower-level function implementations. Here again, the usual disclaimer: I don't actually know anything about low-level programming, and everything I am writing here is based on my own simplified experiments and interpretation (not necessarily understanding^^) of what I found online.
The basic idea of functions is the same as in higher-level languages: a self-contained piece of code that does something with given data. The questions that arise are: how is that data passed in, and what happens afterwards?
Arguments
Considering the following dummy function foo:
void foo(int x, int y){};
Calling this function requires two arguments and would typically look something like
foo(2, 5)
2 and 5 would then more or less magically appear as x and y inside of foo. Under the hood, though, it looks something like this:
foo: ; the label where to jump to when the function gets called
STORE y ; store the top most value in y
STORE x ; store the top most value in x
PUSH 2 ; put 2 to the stack
PUSH 5 ; put 5 on the stack
CALL foo ; for now, this could also be JMP
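In Python terms, this hand-off can be sketched with a toy stack machine: the caller PUSHes the arguments, the callee STOREs them in reverse order. This is my own reduction for illustration, not the actual VirtualCPU code:

```python
# Toy stack machine: the caller PUSHes, the callee STOREs in reverse.
def run(instructions):
    stack, variables = [], {}
    for op, arg in instructions:
        if op == "PUSH":
            stack.append(arg)             # put a value on the stack
        elif op == "STORE":
            variables[arg] = stack.pop()  # pop the topmost value into a name
    return variables

# caller side: PUSH 2, PUSH 5 -- callee side: STORE y, STORE x
program = [("PUSH", 2), ("PUSH", 5), ("STORE", "y"), ("STORE", "x")]
print(run(program))  # {'y': 5, 'x': 2}
```

Because 5 was pushed last, it is popped first, which is exactly why the callee stores y before x.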
Function Overloading
Different languages offer a variety of features when working with arguments. Some include function overloading, where the same function name can be defined multiple times for different types, for example, multiplying two floating point numbers vs multiplying two vectors. To let the code generator find the appropriate function, the labels have to encode the types. In my implementation, I append the argument types to the name (in assembly, the characters ",", "(", and ")" are not allowed in labels, but a) this is not assembly, and b) this is nicer to look at):
foo(int,int): ; the function label
STORE y ; store the top most value in y
STORE x ; store the new top most value in x
CALL foo(int,int)
and
foo(vec3,vec3): ; the function label
STORE v1 ; store the top most value in v1
STORE v2 ; store the new top most value in v2
CALL foo(vec3,vec3)
The exact CALL target is determined during the code generation:
vec3 P = [1.0, 0.0, 0.0];
int a = 1;
foo(P, P);
foo(a, a);
Results in
...
LOAD P
LOAD P
CALL foo(vec3,vec3)
LOAD a
LOAD a
CALL foo(int,int)
Default Arguments
Another, probably more interesting, feature in terms of implementation is default arguments. The language supports a power operator, so this function is fairly useless in practice, but it's a nice and simple example to look at:
float pow(float base, float exponent=2.0){
    return base^exponent;
};
pow(3.0);
pow(5.0, 3.0);
So far, the generated calls would be pow(float) and pow(float,float). The second label is valid, but the first does not exist yet. Treating this as an overload and copying the entire function code once for each combination of optional arguments would be unnecessary. Instead, I opted for giving each function with optional arguments several entry points:
pow(float): ; when only one value is provided jump here
PUSH 2.0 ; add the value for the optional argument
JMP 2 ; jump to where the main function body starts
pow(float,float): ; when two float values are provided
STORE exponent ; store the top most value
STORE base ; store the new top most value
At this point, it also becomes clear why pushing the arguments in order and popping/storing them in reverse makes more sense than the other way around: if a function expects four input arguments but only one is provided, the remaining three default values can simply be pushed (if they exist, that is!). Also notice that the third line has a JMP instruction, not a CALL.
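Emitting those entry points can be sketched roughly as below. This is my own simplification: I jump to a named body label instead of computing relative JMP offsets, and the parameter tuples (type, name, default) are an assumed representation, not the actual generator's data structure:

```python
def emit_entries(name, params):
    """Emit one entry point per allowed argument count.
    `params` is a list of (type, name, default) tuples; default None
    means the argument is required."""
    types = [t for t, _, _ in params]
    n_required = sum(1 for _, _, d in params if d is None)
    code = []
    # one extra entry point for each omittable trailing default
    for n in range(n_required, len(params)):
        code.append(f"{name}({','.join(types[:n])}):")
        for _, _, default in params[n:]:
            code.append(f"PUSH {default}")  # fill in the missing defaults
        code.append("JMP body")             # hop over the full entry label
    code.append(f"{name}({','.join(types)}):")
    code.append("body:")
    for _, pname, _ in reversed(params):    # store in reverse push order
        code.append(f"STORE {pname}")
    return code

for line in emit_entries("pow", [("float", "base", None),
                                 ("float", "exponent", "2.0")]):
    print(line)
```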
Memory Management
Luckily, all of this is implemented in Python, and I don't have to worry much about the *real* memory management. The VirtualCPU's memory so far was just a simple dict containing the variable names and values. The code in the arguments section will happily run with that memory. But it will also overwrite variable names that exist outside the scope of that particular function, and that's a big issue. Resolving it is easy enough, though. Turning the memory into a list of dicts and only ever operating on the last one has practically no impact on the VirtualCPU. But now, anytime a function gets called, a new dict (the memory for that particular scope) gets added to the list. To inherit variables from the calling scope, that new element can be a copy of the previous last element. And anytime the callee returns, the last element can be popped from the list, leaving the caller's memory completely untouched. Part of why this works is that all my function arguments are passed by value. That worked for the "variable storage" memory, and in hindsight, it may have worked for the stack as well, but I went a different route, one that took more inspiration from the real world.
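The scoped variable storage can be sketched as a list of dicts where only the last one is ever touched (class and method names below are my own, not necessarily the VirtualCPU's actual API):

```python
class ScopedMemory:
    """Variable storage as a stack of scopes; only the last dict is active."""

    def __init__(self):
        self.scopes = [{}]

    def store(self, name, value):
        self.scopes[-1][name] = value

    def load(self, name):
        return self.scopes[-1][name]

    def push_scope(self):
        # inherit by value: the new scope starts as a copy of the caller's
        self.scopes.append(dict(self.scopes[-1]))

    def pop_scope(self):
        # dropping the scope leaves the caller's memory untouched
        self.scopes.pop()

mem = ScopedMemory()
mem.store("x", 1)
mem.push_scope()
mem.store("x", 99)    # shadows the caller's x inside the function
mem.pop_scope()
print(mem.load("x"))  # 1
```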
The stack memory in the VirtualCPU is only accessed by popping the last element or pushing a new element. That means whenever a function gets called and returns, the stack has to have an expected size.
x = 5 + abs(-3) + 2;
PUSH 5 ; expected memory after: [5]
PUSH -3 ; expected memory after: [5, -3]
CALL abs(int) ; expected memory after: [5, 3]
PUSH 2 ; expected memory after: [5, 3, 2]
ADDL ; expected memory after: [5, 5]
ADDL ; expected memory after: [10]
Imagine that after calling the abs function, there was a bunch of garbage on the stack. The ADDL instructions would then add the unexpected values together instead of the desired ones. To prevent this, at some point the stack has to be cleaned of those possible garbage values. In practice, two more registers are used: a stack pointer (SP or ESP), always pointing at the end of the stack, as well as a base pointer (BP or EBP), always pointing to the base/start of the current scope (essentially the value ESP holds when a function call happens). The ESP here is not strictly necessary, as my stack is a list that dynamically grows and shrinks, but physical memory in an actual CPU is just there. There is nothing to append or pop, so the ESP marks up to which point there is "useful" data, and new data can be written after it.
This can be implemented either on the caller or the callee side; I opted for the latter. When entering a function, the current ESP gets copied into the EBP register. Once the function is done with its code, the ESP gets set back to the value of EBP, and any newly pushed value will then override the old, no longer needed data. In VirtualCPU terms, that means simply truncating the stack at EBP. This works great, apart from the fact that the system breaks with nested function calls: the moment another function gets called, the value stored in EBP gets overwritten, as registers are globally unique. The solution here was to store the value of the EBP register on the stack and restore it when returning. That is a lot of stack operations between managing the memory and the parameters, so getting the order right was crucial. On top of that, I haven't talked about return yet.
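That callee-side cleanup can be sketched with the stack as a Python list and EBP as a plain index. This is a simplification of the mechanism described above, not the actual implementation:

```python
stack = []
EBP = 0  # base pointer: index where the current scope's data starts

def enter_function():
    global EBP
    stack.append(EBP)   # save the caller's base pointer (for nested calls)
    EBP = len(stack)    # everything above this index is this call's scratch

def leave_function():
    global EBP
    del stack[EBP:]     # truncate: drop any garbage this call left behind
    EBP = stack.pop()   # restore the caller's base pointer

stack.append(5)          # caller's data
enter_function()
stack.extend([7, 8, 9])  # callee scratch that never got cleaned up
leave_function()
print(stack)             # [5] -- the caller's stack is untouched
```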
Return
Not only does a function need to keep track of where to return to, but generally, it also needs to be able to return data back to the caller. Having discussed the memory management above already, I'll spare you the details of several failed attempts, particularly the ones that tried to sneak a value around the garbage cleanup. The method I landed on was popping the value to be returned into the EAX register. The responsibility of dealing with that return value then falls into the hands of the caller. That means if a function gets called but the return value is ignored, the stack does not get polluted. If the caller expects a return value from the callee, the next instruction is therefore PUSH EAX (loading the EAX register onto the stack). That leads to the final question of how the function knows where to return to.
So far, the difference between JMP and CALL is that the latter creates a new memory scope for the function. The second difference is that CALL also pushes the current PC onto the stack. The callee's first course of action then has to be popping that value (in my case into the EAX register) to make way for storing the input arguments. As the EAX register is globally unique and would be overwritten in nested function calls, as well as when storing the return value, that value gets pushed back onto the stack after the input arguments are stored. All of this might sound complicated, but it's actually a straightforward process that, once implemented, nobody has to think about anymore. I also added a check for when the ESP is smaller than the EBP. In that case, the code tries to access memory it is not supposed to touch, and an access violation error gets raised.
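The CALL / POP EAX / RET dance can be sketched with another toy interpreter (again my own reduction, not the real VirtualCPU; scope handling is omitted to keep the focus on the return address):

```python
def run(program):
    stack, variables, regs = [], {}, {"EAX": None, "PC": 0}
    labels = {a: i for i, (op, a) in enumerate(program) if op == "LABEL"}
    while regs["PC"] < len(program):
        op, arg = program[regs["PC"]]
        regs["PC"] += 1
        if op == "PUSH":
            stack.append(regs[arg] if arg in regs else arg)
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "POP":
            regs[arg] = stack.pop()   # e.g. POP EAX for the return address
        elif op == "CALL":
            stack.append(regs["PC"])  # push the return address
            regs["PC"] = labels[arg]
        elif op == "RET":
            regs["PC"] = stack.pop()  # jump back to the caller
        elif op == "HALT":
            break
        # LABEL is a no-op at runtime
    return variables

program = [
    ("PUSH", 7),
    ("CALL", "store_x"),
    ("HALT", None),
    ("LABEL", "store_x"),
    ("POP", "EAX"),    # save the return address out of the way
    ("STORE", "x"),    # now the argument is on top and can be stored
    ("PUSH", "EAX"),   # put the return address back for RET
    ("RET", None),
]
print(run(program))  # {'x': 7}
```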
I also added two more small features. One is a JMP statement that skips the entire function, as I never defined an entry point and the code starts executing from line 1. The other is an ERROR register that gets incremented in case a function fails to return (or in general by any code that wants to indicate that something may have happened that wasn't supposed to).
To showcase what this looks like in practice, here is a not-so-efficient way to calculate the n-th number of the Fibonacci sequence. If a max value is provided, the result will be the last number in the sequence less than or equal to max, or the n-th number, whichever is smaller.
int fib(int n, int max=-1){
    if (n <= 1){
        return n;
    };
    int fib_p = fib(n - 1, max);
    int fib_pp = fib(n - 2, max);
    if((max != -1) && (max < fib_p + fib_pp)){
        return fib_p;
    };
    return fib_p + fib_pp;
};
int result_fib9 = fib(9);
int result_fib9_max10 = fib(9, 10);
With the VirtualCPU code:
JMP 65; don't execute the function unless it's called
fib(int):
POP EAX; save return address
; --- default args ---
PUSH -1
JMP 3; jump to the args section as EAX is already popped
fib(int,int):
POP EAX; save return address
; --- args ---
STORE max; arg 1 (default=-1)
STORE n; arg 0
; --- function setup code ---
PUSH EAX; push the return address
PUSH EBP; save the previous base pointer
PUSH ESP; push current stack pointer
POP EBP; and save as new base pointer
; --- main body ---
PUSH 0
STORE a
PUSH 0
STORE b
PUSH 0
STORE sum
LOAD n
PUSH 1
LE
JZ 5
LOAD n
POP EAX
RET
JMP 1
PUSH 1
STORE i
LOAD i
LOAD n
LT
LOAD sum
LOAD max
LT
LOAD max
PUSH -1
EQ
LOGIC_OR
LOGIC_AND
JZ 10
LOAD a
LOAD b
ADDL
STORE sum
LOAD b
STORE a
LOAD sum
STORE b
JMP -20
LOAD sum
POP EAX
RET
; --- if func does not return ---
MSG Function fib(int) did not return!
PUSH ERROR
PUSH 1
ADD
POP ERROR
RET
; ----- end function
PUSH 9
CALL fib(int)
PUSH EAX
STORE result_fib9
PUSH 9
PUSH 10
CALL fib(int,int)
PUSH EAX
STORE result_fib9_max10
What is not shown above is CALL_BUILTIN, which calls Python functions in the background for certain features that I could not implement directly. Those are mostly standard functions like ABS(), MIN(), MAX(), or Maya-related functions like STATIC(), EVAL(), or REBUILD().
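A CALL_BUILTIN dispatch can be as simple as a table mapping names to Python callables plus their arity; this sketch is my guess at the shape, not the actual code (and the table contents are illustrative):

```python
# Hypothetical CALL_BUILTIN dispatch table: name -> (arity, callable).
BUILTINS = {
    "ABS": (1, abs),
    "MIN": (2, min),
    "MAX": (2, max),
}

def call_builtin(name, stack, regs):
    arity, func = BUILTINS[name]
    # pop the arguments, reversing the pop order to match the push order
    args = [stack.pop() for _ in range(arity)][::-1]
    regs["EAX"] = func(*args)  # the caller can then PUSH EAX if needed

stack, regs = [3, 10], {"EAX": None}
call_builtin("MIN", stack, regs)
print(regs["EAX"])  # 3
```

Routing the result through EAX keeps builtins consistent with regular functions: the stack ends up clean, and the caller decides whether to PUSH EAX.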
MPU (Maya Processing Unit)
This turned out to be far less fun and much more tedious, as it was mostly reimplementing all the math I already had, but with Maya nodes. Luckily, I was able to borrow some code from my previous attempts.
I did, however, come across an interesting problem once I put attributes into variables. Consider the following code:
attr-float posX = f@pCube1.translateX;
posX = f@locator1.translateX;
It would not be unreasonable to expect that this connects the locator's translateX to the cube's translateX. However, this is a regular assignment, so posX now holds the value f@locator1.translateX. I added a new assignment operator := specifically for connecting attributes.
posX = 10;
posY := 10;
Here, the attribute in posX is set to 10 and posY gets connected to a new node with an integer attribute set to 10. Courtesy of the auto type conversion.
I also replaced the attr-vec4 attribute placeholders with attr-pnt which are essentially the same as attr-vec3, though when multiplying them with matrices, their w component gets interpreted as 1 instead of 0 for attr-vec3. attr-vec2, attr-mat2, and attr-mat3 are implemented as regular vector and matrix attributes but when querying the data, only the required components will be returned.
Another modification I had to make to the replacement system I showed in the first part was recursive formatting. This looks rather ugly, but it now allows accessing multiple attributes in a loop as if there was an array (although there isn't), as well as 0-padded indexing. In the code, it's basically: while "{" in node: node = node.format(**memory)
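That one-liner really is the whole trick. As a self-contained sketch (the memory contents below are illustrative, matching the loc## naming used later):

```python
def recursive_format(node, memory):
    # keep formatting until no placeholders remain, so an escaped
    # placeholder like {{loc{idx}}} resolves over two passes
    while "{" in node:
        node = node.format(**memory)
    return node

memory = {"idx": 3, "loc3": "demo_003_loc"}
# pass 1: "{{loc{idx}}}" -> "{loc3}", pass 2: "{loc3}" -> "demo_003_loc"
print(recursive_format("{{loc{idx}}}.t", memory))  # demo_003_loc.t
```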
Below is the implementation of the parallel transport algorithm I showed in the first part:
from maya import cmds

memory = {
    "num_locs": 5,
    "num_rivets": 10,
    "up_start": [0, 1, 0],
    "DEBUG": 1,
    "curve": "curve",
}
# create the locators spanning the curve
loc = {}
for i in range(memory["num_locs"]):
    loc[f"loc{i}"] = cmds.spaceLocator(name=f"demo_{i:03}_loc")[0]
# create the rivets riding on the curve
riv = {}
for i in range(memory["num_rivets"]):
    riv[f"riv{i}"] = cmds.createNode("joint", name=f"demo_{i:03}_jnt")
memory.update(loc)
memory.update(riv)
# create the curve
cmds.curve(name=memory["curve"], p=memory["num_locs"] * [[0,0,0]], d=2)
source = """
// this variable gets passed in via memory, but the parser doesn't
// know that, so it gets declared here
int num_locs;
// run through the locators and connect them to the curve
// recursive formatting is applied, the memory holds the locators
// under 'loc##'. First, {idx} gets replaced, as python would throw
// an error, the outer {loc##} formatting needs to be escaped: {{loc##}}
// the recursive formatter then resolves the index first and the loc## second
for(int idx=0; idx<num_locs; idx++){
    v@{curve}.cv[{idx}] = v@{{loc{idx}}}.t;
    v@{{loc{idx}}}.t = idx * [5,0,0];
};
// rebuild the curve with a variable number of spans
attr-int spans = 5;
attr-crv crv = REBUILD(crv@{curve}.ws[0], spans);
// set up the rivets
int num_rivets;
// this attribute does not exist but gets created when it is used for
// the first time, as in the line below where debug mode gets turned on
attr-int debug = i@{curve}.debugMode;
debug = 1;
vec3 up_start;
attr-vec3 pos, tgt, aim, up, side;
up = up_start;
float parm;
for(int i=0; i<num_rivets; i++){
    parm = MIN(i / (num_rivets-1.0), 0.999);
    pos = EVAL(crv, parm);
    tgt = EVAL(crv, parm+0.001);
    aim = NORMALIZE(tgt-pos);
    side = NORMALIZE(aim ^ up);
    up = NORMALIZE(side ^ aim);
    m@{{riv{i}}}.opm = MAT4(aim, up, side, pos);
    f@{{riv{i}}}.radius = 0.5;
    // this IF is different from the control flow if(){} as this is a
    // live node connection that allows for changing the value using
    // an attribute and a condition node
    i@{{riv{i}}}.displayLocalAxis = IF(debug, 1, 0);
};
"""
code = Analyzer.parse(source)
CPU.run(instructions=code, debug=False, memory=memory)
This is much cleaner and easier to look at than what I had initially.
Conclusion
Will it replace the Python code in my repo? No. This was mainly a learning project and it gave me many insights and new ideas. Any new Maya setups I have to build, particularly the math-heavy stuff, I might try to implement with this, but I will not go back and replace any meaningful amount of Python code (for now).
Cheers!