No Huddle Offense

"Individual commitment to a group effort-that is what makes a team work, a company work, a society work, a civilization work."

Write your functions in Rust – with Azure & Rocket

May 14th, 2021

Rust is certainly one of the hot programming languages right now. Besides the cool feature set the language offers, companies – like Microsoft, Intel, Google, AWS, Facebook, etc. – embrace it: krustlet, rust-vmm, firecracker, etc. are all written in Rust. Then there is the newly formed Rust foundation, and even in a recent survey by Stack Overflow it comes in on top. So why not program your functions in Rust as well?

Most Cloud providers offer ways to write your function in Rust: AWS has a Rust based runtime for Lambda (using custom AWS Lambda runtimes), and Azure offers it through what they call Azure Functions custom handlers. Both are reasonably similar in usage – basically all you need to do is create an executable that is triggered by the framework when an event arrives. Btw, if you want to run your function on-premises, try the Rust template for OpenFaaS. We’ll look into how to integrate with Azure in the following sections.

Similarly to what is described in the Quickstart – it all starts with creating a new package using cargo:

$ cargo new rusty_function

Within the newly created package we need to edit the dependencies. As the function handler will be called through HTTP, a web framework is needed: rocket is one of the major web frameworks in the Rust ecosystem. Secondly, serde can be used to serialize and deserialize data. The dependencies section of the Cargo.toml file will hence look something like this:

[dependencies]
rocket = "^0.4.7"
rocket_contrib = "^0.4.7"
serde = "1.0"
serde_derive = "1.0"
serde_json = "1.0"

Writing the actual logic of the function is pretty straightforward – in main() start a web server that listens on a specific port, as provided by an environment variable (crate attributes & imports shown for completeness):

#![feature(proc_macro_hygiene, decl_macro)]

#[macro_use]
extern crate rocket;

use std::env;

use rocket::config;

fn main() {
    let port_key = "FUNCTIONS_CUSTOMHANDLER_PORT";
    let port: u16 = match env::var(port_key) {
        Ok(val) => val.parse().expect("Whoops - not an int..."),
        Err(_) => 8080,
    };

    let config = config::Config::build(config::Environment::Staging)
        .address("127.0.0.1")
        .port(port)
        .finalize()
        .unwrap();

    rocket::custom(config)
        .mount("/", routes![index, hello])
        .launch();
}

Notice that two routes are set up in the second to last line of code: one listening to HTTP GET calls on the main index page, and the actual function that implements our handler. The function for handling the request to the “main page” is not strictly necessary – I did notice however, that the Azure Functions framework does make a single HTTP GET call to “/” when the function starts. Implementing the index function is hence straightforward:

#[get("/")]
fn index() -> &'static str { 
    ""
}

Before implementing the logic of the function itself, note that Azure expects the inputs and outputs to be formatted in a certain way – more details can be found here. Let’s focus on the output of the function and make sure it returns a JSON structure similar to this:

{
  "returnvalue": {
    "msg": "hello world"
  },
  "outputs": {
    "res": "ok"
  }
}

Not all of this will actually be returned to the function caller – some of it is used by the Azure Functions framework itself: outputs can contain entries you want to put in an Azure Queue storage, and logs (not shown in the example above) can be used to capture logging information. The actual ReturnValue is later on mapped to a response that is sent to the function caller – more on that later.

We’ll use a set of structs to define the JSON – and use the serde crate to enable serialization:

use serde_derive::Serialize;

#[derive(Serialize)]
struct Response {
    msg: String,
}

#[derive(Serialize)]
struct OutputBody {
    res: String,
}

#[derive(Serialize)]
struct ResponseMessage {
    returnvalue: Response,
    outputs: OutputBody,
}

Obviously, to conform with Rust naming conventions, returnvalue should be called return_value; Azure is looking for a CamelCase formatted ReturnValue, but luckily allows lower case naming too.

Implementing the actual function is straightforward now – notice the declaration of the /RustFunction route. It is configured later on (becoming the name of the function), so the Azure Functions framework knows where to look for the function – more on that in the next steps.

#[post("/RustFunction")]
fn hello() -> json::Json<ResponseMessage> {
    json::Json(ResponseMessage {
        returnvalue: Response{msg: "hello world!".to_string()},
        outputs: OutputBody{res: "ok".to_string()}
    })
}

Make sure you use the nightly toolchain (it’s a rocket requirement atm) using the command rustup override set nightly. Also let’s make sure we get a statically linked binary using MUSL by defining the following in the file .cargo/config:

[target.x86_64-unknown-linux-musl]
linker = "rust-lld"

The code will compile now using cargo build --release --target=x86_64-unknown-linux-musl, and we can run the binary standalone and even trigger the function if we want to. With a simple curl request the function can be tested locally. For the build command above to work you might need to add the musl target to your system using rustup target add x86_64-unknown-linux-musl.

$ curl -X POST localhost:8080/RustFunction
{"returnvalue":{"msg":"hello world"},"outputs":{"res":"ok"}}

Integration with Azure Functions

So far everything has been pretty much generic; to make this function run on Azure some integration steps are required. The files required to do this can be created using the Azure Functions Core Tools or probably with some help of your IDE. The command func new can be used to create the right templates. Make sure to configure it using the Custom Handler and HTTP Trigger – and name the function RustFunction, matching the route defined earlier.

In the file host.json we now configure the field defaultExecutablePath to match our binary name – in our case rusty_function.
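A minimal host.json could hence look roughly like this (a sketch – the exact layout may vary with the version of the Functions host):

{
  "version": "2.0",
  "customHandler": {
    "description": {
      "defaultExecutablePath": "rusty_function"
    }
  }
}

The file RustFunction/function.json should contain the following: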

{
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}

Notice that within this configuration file we can define the HTTP verbs and also define how Azure Functions should handle the return values – note the $return that basically tells the framework to return whatever is set for the field ReturnValue (see JSON structure earlier) to the caller.

At this point in time, the function can be published to Azure or run locally using func start. Notice that when calling the function through the framework we just get the “hello world” message back when testing it with curl:

$ curl -X POST http://localhost:7071/api/RustFunction -d ""
{"msg": "hello world"}

Performance

Finally let’s look at the performance of the function when deployed on Azure. To do so, let’s compare a function written in Python and one written in Rust, both performing the same task: calculating the area of a circle given a radius.

This is not a totally fair comparison – we are comparing apples to oranges a bit: the programming languages and their stacks are different, and their deployment style differs (custom Linux based runtime vs optimized Python runtime). Also note that we are only looking at HTTP triggers (coming with an overhead of serialization and deserialization, running a web framework, etc.) – none of the other goodies that are available. Obviously, hitting a function with a workload generator to look at performance goes a bit against the purpose of functions: the frequency of invocation here is really “high”, and for this kind of call frequency a micro service style deployment might have been way better. Also, some aspects of what influences latencies are not in our control – the internet connection, the framework itself, etc. Anyhow, some numbers are better than no numbers.

Both functions have been deployed in the Azure West Europe region – the following plot shows the latency percentiles for the function calls. And no surprise, Rust does well (but do notice the tail latency – that could really hurt overall performance, especially if you have multiple of these functions in your environment):

[Figure: latency percentiles of the Python and Rust based function calls]

Another benefit of writing our function in Rust: the binary is tiny – just about 5.6M in this case – so you can very easily package it up in a tiny container and run that in Azure, or OpenFaaS, or … . Now it would be great to be able to use other event triggers and concepts like function orchestration (as provided by e.g. Durable Functions) in the future as well.

AI planning algorithms in Rust

June 14th, 2020

Artificial Intelligence (AI) is a hot topic, although it seems that the focus is on Neural Networks & Deep Learning on the software/algorithmic side, while GPUs/TPUs are the #1 topic for hardware. While this is fine, it sometimes feels like other parts of what had been the overall AI domain in the past fall short (see the book Artificial Intelligence: A Modern Approach). Just by looking at the table of contents you can see many more AI topics which are critical but do not get a lot of attention atm. For example planning & reasoning over the insight you have gained – by e.g. using neural networks – is essential to build “semi-cognitive” autonomous systems. Btw I believe planning (the what/how) is as important as figuring out the scheduling (when/where) – for example AlphaGo Zero also uses a planning component (based on Monte Carlo tree search) for figuring out the next move, next to a neural network.

To have an excuse to learn Rust and to get more into planning & reasoning algorithms I started developing this library (warning: this is still a work in progress). The experience with Rust and its ecosystem has been great so far. The Ownership concept takes some time to get used to, and probably only makes total sense after a while. A lot of the other concepts can also be found in other languages, so that makes starting easy.

The following example shows the D* Lite algorithm in action. Assume you have a maze like the one shown in the following figure and the goal is to move a robot from the coordinates (5, 0) to (0, 5). To make things more complicated, we assume a path along the corners of the maze is more “expensive” – so at the start the path a robot could take looks like this:

[Figure: Robot path planning - t_0]

In contrast to A*, D* Lite uses a backward search and enables re-planning – that means that after e.g. the robot has moved two steps (to coordinate (5, 2)), and an obstacle gets removed (coordinate (3, 4) no longer represents a wall) it can re-plan without the need to recalculate all values (saving compute time). This can be seen in the following diagram (both diagrams have been plotted through rustplotlib by using Matplotlib in the background, btw. – this shows how easy it is to integrate Python with Rust.):

[Figure: Robot path planning - t_2]
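To give a rough idea of the mechanics, here is a small self-contained sketch – it uses a plain uniform-cost (Dijkstra style) search as a stand-in, whereas D* Lite would additionally reuse its search state between the two queries instead of starting from scratch; the maze layout and the uniform costs here are made up for illustration:

use std::cmp::Reverse;
use std::collections::BinaryHeap;

const N: usize = 6;

// 0 = free cell, 1 = wall.
fn shortest_path(maze: &[[u8; N]; N], start: (usize, usize), goal: (usize, usize)) -> Option<u32> {
    let mut dist = [[u32::MAX; N]; N];
    let mut heap = BinaryHeap::new();
    dist[start.0][start.1] = 0;
    heap.push(Reverse((0u32, start)));
    while let Some(Reverse((d, (x, y)))) = heap.pop() {
        if (x, y) == goal {
            return Some(d);
        }
        if d > dist[x][y] {
            continue; // stale queue entry
        }
        for (dx, dy) in [(0i32, 1i32), (0, -1), (1, 0), (-1, 0)] {
            let (nx, ny) = (x as i32 + dx, y as i32 + dy);
            if nx < 0 || ny < 0 || nx >= N as i32 || ny >= N as i32 {
                continue;
            }
            let (nx, ny) = (nx as usize, ny as usize);
            if maze[nx][ny] == 1 || d + 1 >= dist[nx][ny] {
                continue;
            }
            dist[nx][ny] = d + 1;
            heap.push(Reverse((d + 1, (nx, ny))));
        }
    }
    None
}

fn main() {
    let mut maze = [[0u8; N]; N];
    for y in 0..5 {
        maze[3][y] = 1; // a wall across the maze, with a gap at (3, 5)...
        maze[1][5 - y] = 1; // ...and another one with a gap at (1, 0)
    }
    println!("plan cost: {:?}", shortest_path(&maze, (5, 0), (0, 5)));
    maze[3][4] = 0; // the obstacle at (3, 4) is removed - re-plan
    println!("plan cost: {:?}", shortest_path(&maze, (5, 0), (0, 5)));
}

Removing the wall cell makes the second plan cheaper – D* Lite arrives at the same result while recalculating only the values affected by the change.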

Why Rust? Over the years I learned some programming languages, some probably gone extinct by now. This includes candidates such as Oberon, Turbo Pascal (anyone remember Turbo Vision?), Elan, and Basic. I used to code in Java & C/C++ for a while – today it is more Python and a little bit of Go. It has been nagging me for a while that I wanted to learn a language that was somehow “revolutionary” and offered something new. For example Java is a great language & has a powerful SDK; Python is easy to use; etc. But Rust finally offered something new – the concepts around its memory management (w/o the need for garbage collection) & the way that it forces you to write clean code are compelling (maybe we can finally battle cargo cult programming). I have looked into Erlang too btw, but the syntax is just a bit too quirky – however the OTP is a very strong concept, and sometimes it feels like Cloud Native Applications + Kubernetes want to be the future OTP 🙂 Microsoft btw also describes some of their motivations to use Rust here.

So Rust is perfect? Probably not (yet) – many tools/libraries in the ecosystem are still young (e.g. grcov works only with the nightly toolchain), and do not offer the same range of capabilities as e.g. in the Python world (still looking for a good machine learning library, for example). Getting to know Rust is a bit painful in the beginning – and one word of advice: do not always follow the tips from the compiler, which it spews out on errors – sit back, think about what the error actually means and fix it accordingly.

Any Book suggestions? So for learning Rust I mostly went with the “official” Rust book and got an older hardcopy of the book (the last programming books I bought before this was on Erlang years ago btw.). I heard the O’Reilly book on Rust is great too btw.

AI reasoning & planning

January 4th, 2020

With the rise of faster compute hardware and acceleration technologies that drove Deep Learning, it is arguable that the AI winters are over. However Artificial Intelligence (AI) is not all about Neural Networks and Deep Learning in my opinion. Even by just looking at the table of contents of the book “AI – A modern approach” by Russell & Norvig it can easily be seen that the learning part is only one piece of the puzzle. The topic of reasoning and planning is equally – if not even more – important.

Arguably if you have learned a lot of cool stuff you still need to be able to reason over that gained knowledge to actually benefit from the learned insights. This is especially true when you want to build autonomic systems. Luckily a lot of progress has been made on the topic of automated planning & reasoning, although it does not necessarily get the same attention as all the neural networks and Deep Learning in general.

To build an autonomous system it is key to use these kinds of techniques, which allow the system to adapt to changes in the context (temporal or spatial changes). I did work a lot on scheduling algorithms in the past to achieve autonomous orchestration, but now believe that planning is an equally important piece. While scheduling tells you where/when to do stuff, planning tells you what/how to do it. The optimal combination of scheduling and planning is hence key for future control planes.

To make this more concrete I spent some time implementing planning algorithms to test concepts. Picture the following: let’s say you have two robot arms, and you just give the control system the goal to move a package from A to B. You want the system to figure out by itself how to move the arms, pick the package up, and move it from A to B. The following diagram shows this:

[Animation: two robot arms moving a package from A to B]

The goal of moving the package from A to B is converted into a plan by looking at the state of the package, which is given by its coordinates. Picking up the package and moving it around hence changes its state. The movement of the robot arms is constrained: while the smallest part of the robot arm can move by 1 degree during each time step, the bigger parts of the arm can move by 2 & 5 degrees respectively.

Based on this, a state graph can be generated, where the nodes of the graph define the state of the package (its position) and the edges represent actions that can be performed to alter those states (e.g. move a part of a robot arm, pick & drop the package, etc.). Obviously not all actions lead towards an optimal solution, so the weights on the edges also define how useful an action can be. On top of that, a heuristic can be used that allows the planning algorithm to find its goal faster. To figure out the steps needed to move the package from A to B, we need to search this state graph; the “lowest cost” path between the start state (package is at location A) and the end state (package is at location B) defines the plan (or the steps on what/how to achieve the goal). For the example above, I used D* Lite.
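To make the state graph part a bit more concrete, here is a minimal sketch (all names and types are made up for illustration) of how the successor states – i.e. the outgoing edges for one arm – could be generated under those movement constraints:

// The joint angles (in degrees) describe one arm's state.
#[derive(Clone, Debug)]
struct ArmState {
    joints: [i32; 3], // from the smallest to the biggest segment
}

// Per time step the segments can move by 1, 2 and 5 degrees respectively.
const STEP: [i32; 3] = [1, 2, 5];

// The edges of the state graph: all states reachable within one time step.
fn successors(state: &ArmState) -> Vec<ArmState> {
    let mut next = Vec::new();
    for (i, &step) in STEP.iter().enumerate() {
        for delta in [-step, step] {
            let mut n = state.clone();
            n.joints[i] = (n.joints[i] + delta).rem_euclid(360);
            next.push(n);
        }
    }
    next
}

fn main() {
    let start = ArmState { joints: [0, 45, 90] };
    for s in successors(&start) {
        println!("{:?}", s);
    }
}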

Now that a plan (or series of steps) is known, we can use traditional scheduling techniques to figure out in which order to perform them. Also note the handover of the package between the robots to move it from A to B – this shows that, especially in distributed systems, coordination is key. More on that will follow in a next blog-post.

Dancing links, algorithm X and the n-queens puzzle

January 20th, 2019

Donald Knuth’s 24th annual Christmas lecture inspired me to implement algorithm X using dancing links myself. To demonstrate how this algorithm works, I chose the n-queens puzzle. This problem has some interesting aspects which show all kinds of facets of the algorithm.

Solving the n-queens puzzle using algorithm X can be achieved as follows. Each queen being placed on a chess board will lead to 4 constraints that need to be cared for:

  1. There can only be one queen in the given row.
  2. There can only be one queen in the given column.
  3. There can only be one queen in the diagonal going from top left to bottom right.
  4. There can only be one queen in the diagonal going from bottom left to top right.

Let’s look at a chess board and determine some identifiers for each constraint that any placement can have (see the youtube video of the lecture at this timestamp). Donald Knuth doesn’t go into too many details on how to solve this, but let’s look at a 4×4 chessboard example:

[Figure: 4×4 chessboard with the constraint identifiers]

Assigning the identifiers can be done in a straightforward way: row based constraints are numbered 0-3; column based constraints get the ids 4-7; diagonals 8-12; reverse diagonals 13-17 for a chessboard of size 4. Note that you can ignore the diagonals of size 1, as they are automatically covered by the row or column the queen would be placed in.

Calculating the identifiers for the constraints is simple. Given a queen positioned at coordinates (x, y) on a chessboard of size N, these are the formulas you can use:

  1. the row constraint equals x
  2. the column constraint equals y + N
  3. the diagonal constraint equals x + y + (2N - 1)
  4. the reverse diagonal constraint equals (N - 1 - x + y) + (4N - 4)

Note that in each constraint identifier calculation we use an offset, so we can represent them all in one space: 0 for the row constraints, N for the column constraints, 2N-1 for the diagonals and 4N-4 for the reverse diagonals. For the diagonal constraints you need to ensure they are within their valid ranges: 2*N <= constraint <= 4*N-4 for the diagonals and 4*N-3 <= constraint <= 6*N-7 for the reverse diagonals (for N = 4 these are the ranges 8-12 and 13-17). If the calculated identifier of a constraint is not within these ranges, you can just drop the constraint – this is the case for queens being placed at corner positions. Placing a queen at coordinates (0,0) will result in 3 constraints: [0, 4, 15]. Placing a queen at coordinates (1,2) will result in 4 constraints: [1, 6, 10, 16].
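For concreteness, here is a small sketch (in Go, matching the implementation below; the helper name is mine) that derives the constraint list for a queen at (x, y) using the formulas and ranges above – it reproduces the examples just given:

package main

import "fmt"

// constraints returns the constraint ids for a queen at (x, y) on an NxN board;
// out-of-range diagonal ids (the size-1 corner diagonals) are dropped.
func constraints(x, y, n int) []int {
	ids := []int{x, y + n} // the row and column constraints always apply
	if d := x + y + (2*n - 1); d >= 2*n && d <= 4*n-4 {
		ids = append(ids, d)
	}
	if r := (n - 1 - x + y) + (4*n - 4); r >= 4*n-3 && r <= 6*n-7 {
		ids = append(ids, r)
	}
	return ids
}

func main() {
	fmt.Println(constraints(0, 0, 4)) // [0 4 15]
	fmt.Println(constraints(1, 2, 4)) // [1 6 10 16]
	fmt.Println(constraints(3, 2, 4)) // [3 6 12 14]
}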

Now that we can calculate the uniquely identifiable constraint ids for each queen that can be placed, we can translate this problem into an exact cover problem. We are looking for a set of placed queens for which all constraints can be satisfied. Let’s look at a matrix representation (for a chessboard of size 4), in which the rows are individual possible placement options of a queen, while the columns represent the constraints. The 1s indicate the constraints that need to be satisfied for the given placement.

(Primary constraints: ids 0-3 = x-coord, ids 4-7 = y-coord. Secondary constraints: ids 8-12 = diagonals, ids 13-17 = r-diagonals.)

                         1 1 1 1 1 1 1 1
 x,y 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
0,0  1       1                     1
0,1  1         1     1               1
0,2  1           1     1               1
0,3  1             1     1
1,0    1     1       1           1
1,1    1       1       1           1
1,2    1         1       1           1
1,3    1           1       1           1
2,0      1   1         1       1
2,1      1     1         1       1
2,2      1       1         1       1
2,3      1         1         1       1
3,0        1 1           1
3,1        1   1           1   1
3,2        1     1           1   1
3,3        1       1               1

Note that the table columns have been grouped into two categories: primary and secondary constraints. This is also a lesser known aspect of the algorithm at hand – primary columns and their constraints must be covered exactly once, while the secondary columns must be covered at most once. Looking at the n-queens puzzle this makes sense: to fill the chessboard with the maximum possible number of queens, each row and column should have a queen, while there might be diagonals which are not blocked by a queen.

Implementing algorithm X using dancing links is reasonably simple – just follow the instructions in Knuth’s paper here. Let’s start with some Go code for the data structures:

type x struct {
	left  *x
	right *x
	up    *x
	down  *x
	col   *column
	data  string
}

type column struct {
	x
	size int
	name string
}

type Matrix struct {
	head      *x
	headers   []column
	solutions []*x
}

Note the little change I made to store a data string in the x object structure. This will be useful when printing the solutions later on. The following function can be used to initialize a new matrix:

func NewMatrix(n_primary, n_secondary int) *Matrix {
	n_cols := n_primary + n_secondary

	matrix := &Matrix{}
	matrix.headers = make([]column, n_cols)
	matrix.solutions = make([]*x, 0, 0)
	matrix.head = &x{}
	matrix.head.left = &matrix.headers[n_primary-1].x
	matrix.head.left.right = matrix.head

	prev := matrix.head
	for i := 0; i < n_cols; i++ {
		column := &matrix.headers[i]
		column.col = column
		column.up = &column.x
		column.down = &column.x
		if i < n_primary {
			// these are must have constraints.
			column.left = prev
			prev.right = &column.x
			prev = &column.x
		} else {
			// as these constraints do not need to be covered - do not weave them in.
			column.left = &column.x
			column.right = &column.x
		}
	}
	return matrix
}

Now that we can instantiate the matrix data structure according to Knuth's description the actual search algorithm can be run. The search function can roughly be split into 3 parts: a) if we found a solution - print it, b) finding the next column to cover and c) backtrack search through the remaining columns:

func (matrix *Matrix) Search(k int) bool {
	j := matrix.head.right
	if j == matrix.head {
		// print the solution.
		fmt.Println("Solution:")
		for u := 0; u < len(matrix.solutions); u += 1 {
			fmt.Println(u, matrix.solutions[u].data)
		}
		return true
	}

	// find column with least items.
	column := j.col
	for j = j.right; j != matrix.head; j = j.right {
		if j.col.size < column.size {
			column = j.col
		}
	}

	// the actual backtrack search.
	cover(column)
	matrix.solutions = append(matrix.solutions, nil)
	for r := column.down; r != &column.x; r = r.down {
		matrix.solutions[k] = r
		for j = r.right; j != r; j = j.right {
			cover(j.col)
		}
		matrix.Search(k + 1)
		for j = r.left; j != r; j = j.left {
			uncover(j.col)
		}
	}

	matrix.solutions = matrix.solutions[:k]
	uncover(column)
	return false
}

Note that when printing the possible solution(s) we use the data field described earlier for the x object. The cover and uncover functions needed for the search function to work are pretty straightforward:

func cover(column *column) {
	column.right.left = column.left
	column.left.right = column.right
	for i := column.down; i != &column.x; i = i.down {
		for j := i.right; j != i; j = j.right {
			j.down.up = j.up
			j.up.down = j.down
			j.col.size--
		}
	}
}

func uncover(column *column) {
	for i := column.up; i != &column.x; i = i.up {
		for j := i.left; j != i; j = j.left {
			j.col.size++
			j.down.up = j
			j.up.down = j
		}
	}
	column.right.left = &column.x
	column.left.right = &column.x
}

Now there is only one little helper function missing, providing an easy way to add the constraints. Note that the data field of the x object data structure is set here as well:

func (matrix *Matrix) AddConstraints(constraints []int, name string) {
	new_x_s := make([]x, len(constraints))

	last := &new_x_s[len(constraints)-1]
	for i, col_id := range constraints {
		// find the column matching the constraint.
		column := &matrix.headers[col_id]
		column.size++

		// new node x.
		x := &new_x_s[i]
		x.up = column.up
		x.down = &column.x
		x.left = last
		x.col = column
		x.up.down = x
		x.down.up = x
		x.data = name

		last.right = x
		last = x
	}
}

Given a list of constraints to add – for example for the queen at position (3,2): [3, 6, 12, 14] – this little helper function will add the new x objects to the matrix and weave them in.

This is basically all there is to tell about this algorithm. Now you can easily solve your (exact) cover problems: just create a new matrix, add the constraints and run the search algorithm (a sketch of such a driver follows below). Doing so for a 4x4 chessboard queens placement problem the code above will give you:

Solution:
0 01
1 13
2 20
3 32
Solution:
0 02
1 10
2 23
3 31
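The driver producing this output could look roughly like this (a sketch, assuming the constraints helper from earlier – a 4x4 board has 8 primary columns for the rows & columns plus 10 secondary ones for the diagonals):

matrix := NewMatrix(8, 10)
for x := 0; x < 4; x++ {
	for y := 0; y < 4; y++ {
		// the data string, e.g. "01", encodes the queen's coordinates.
		matrix.AddConstraints(constraints(x, y, 4), fmt.Sprintf("%d%d", x, y))
	}
}
matrix.Search(0)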

In the future it would be nice to see how dancing links could be used in other algorithms – e.g. graph search algorithms such as A* – or to find better ways to implement it.

Tracing your functions

December 27th, 2018

Note: this is mostly just for proof of concept – not necessarily something you want to do in a production system, but might be useful in a staging/test environment.

Functions in a Serverless environment (Tim Bray has a nice write-up of some of the key aspects of Serverless here. And let’s leave aside some side notes like this one.) like OpenFaaS or OpenWhisk are short lived things that really come and go. But wouldn’t it be useful – regardless of this context – to be able to gain insight on how your functions perform? Hence being able to trace/profile the functions in an environment would be a nice-to-have add-on.

The following diagram shows a high-level overview of what this could look like.

[Figure: high-level overview of the tracing setup]

Once you have the tracing/profiling of your functions in place, FlameScope is a handy tool to visualize them. Here you can see the list of traces for previously executed functions:

[Figure: FlameScope listing the traces of previously executed functions]

Drilling deeper into them you can see the actual FlameGraphs of each function – for example a function calculating pi:

[Figure: FlameGraph of a function calculating pi]

As some functions have a very short lifespan, you will also note – by looking at the heatmaps – that not a lot is going on:

[Figure: FlameScope heatmap of a short-lived function]

I’m not necessarily claiming this is all very useful – especially since some functions are still very short lived, and their traces therefore do not capture a lot of information. However the general concept of all this sounds intriguing to me. Running your function on different platforms for example will result in differences in the FlameGraph. The function shown here calculates Pi and performs some I/O operations for test purposes. The function’s FlameGraph from above looks a bit different when run on a different platform (Xeon vs i5):

[Figure: FlameGraph of the same function on a different platform (Xeon vs i5)]

Multiple runs of the same function on the same platform will result in similar FlameGraphs.

The following sections describe how to enable such a scenario:

Step 1) enabling the environment

Frameworks such as OpenFaaS or OpenWhisk will use docker [cold|warm|hot] containers to host the functions. So one of the initial steps is to enable tracing in those containers. By default some syscalls are blocked – for good reason – in docker [src]. Unless you want to dig deep into OpenFaaS or OpenWhisk and change how they run containers, the easiest way forward is to enable e.g. the perf_event_open system call using seccomp security profiles system wide. To do so start dockerd with the following parameters:

$ sudo dockerd -H unix:///var/run/docker.sock --seccomp-profile /etc/docker/defaults.json

An example security profile can be found here. Just whitelist the right system calls that you will need and store the profile in /etc/docker/defaults.json. For this example we will use perf, hence we need to enable perf_event_open.
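Roughly speaking – assuming you start from Docker’s default profile – the entry to add to its syscalls section would look like this (a sketch; the exact format depends on the Docker version):

{
  "names": [ "perf_event_open" ],
  "action": "SCMP_ACT_ALLOW"
}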

Note: another option (almost more elegant & more secure) would be to trace the functions not from within the container, but on the host system. perf does allow for limiting to a cgroup and hence doing this (through the option --cgroup=docker/…), but this would yet again require some more integration work with your framework. Also note that although perf does not add a lot of overhead for instrumentation, it also does not come “for free” either.

Step 2) plumbing

As we will trace the functions from within their containers, we need to get the data out of the containers. A simple python HTTP service that allows for POSTing traces to it will also store them into files in a directory. As this service can be run in a container itself, this directory can easily be mounted to the container. Now within each container we can simply post the data (in the file trace.perf) to the data collector:

$ curl -X POST http://172.17.0.1:9876/ -H 'X-name: go-func' --data-binary @trace.perf

Step 3) altering the templates

The easiest way to kick off the tracing command perf whenever a function is executed is by altering the template of your framework. For OpenFaaS you can easily pull the templates using:

$ faas-cli template pull

Once they have been pulled the easiest thing to do is to alter the Dockerfile to include a shell script which executes e.g. your perf commands:

#!/bin/sh

perf record -o perf.data -q -F 1004 -p $1 -a -g -- sleep 5
perf script -i perf.data --header > trace.perf
curl -X POST http://172.17.0.1:9876/ -H 'X-name: fibonacci' --data-binary @trace.perf

Note that in this example we use perf to instrument at a frequency of 1004 Hz – just offsetting it from 1000 Hz to make sure we capture everything. It might make sense to tweak the frequency according to your environment – 1000 Hz is already providing a lot of detail.

Also we need to alter the Dockerfiles to a) install the perf tool and b) ensure we can execute it with the user app:

...
RUN apk add curl sudo
RUN apk add linux-tools --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/testing/ --allow-untrusted

# Add non root user
RUN addgroup -S app && adduser app -S -G app
RUN adduser app wheel
RUN echo '%app ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

WORKDIR /home/app/

# ensure we copy the trace script over as well.
COPY trace.sh           .
...

Also make sure that this shell script gets called whenever your function is triggered. For example in the python template alter the index.py file, for e.g. golang edit the main.go file. Within those, just execute the shell script above with the PID of the current process as the first argument.

Step 4) visualize it

FlameGraphs are pretty handy, and a team @ Netflix (including Brendan Gregg) has been busy writing a tool to visualize them. FlameScope can easily be executed using a docker command:

$ docker run --rm -it -v /tmp/stacks:/stacks:ro -p 5000:5000 flamescope

Note: I had to do some minor tweaks to get FlameScope to work. I had to update the Dockerfile to the python3 version of alpine, manually add libmagic (apk add libmagic), and don’t forget to configure FlameScope to pick up the profiles from /stacks in config.py.

Update [2019/01/08] Originally I traced the functions with a frequency of 1004 Hz. This seems to be a bit high. As you can see in the following graph – and read this as a rule of thumb, not necessarily the ground truth – sampling at about 200 Hz will give you the necessary information:

[Figure: traces captured at different sampling frequencies]