September 16, 2021—Why Rust futures are better because they are polled (programming)
Both Rust and C# support asynchronous methods and awaitable expressions.
However, in C# you have to allocate things on the heap and perform dynamic dispatch when you compose awaitable expressions. Using the heap puts pressure on the garbage collector, and dynamic dispatch is opaque to the compiler and puts pressure on the JIT engine; both slow things down.
Rust doesn’t have these problems. Instead, async/await is a zero-cost
abstraction. It’s even possible to use async/await in no_std (a heapless environment).
I’m going to explain how async/await works in both languages, then show how the
fact that Rust’s Future
trait is polled means Rust doesn’t have to allocate
on the heap or perform dynamic dispatch when you compose awaitable expressions
in Rust. I also hope to clear up a common misunderstanding about the nature of
polling the futures in Rust (no, polling doesn’t make them slow or inefficient).
I think explaining why or how to use async/await is out of the scope of this article. In fact, if you aren’t totally comfortable using async/await then this article might not be for you. But if you love it and want to learn a little more about why Rust’s polled futures are great then keep reading.
It’s surprising, but polled futures are better!
Warning: Probably none of the code in this article compiles or works. It was written off the cuff and should only be taken for inspiration.
First, let’s take a look at some example async code in both languages.
In C# you can do this:
using System;
using System.Threading.Tasks;

static class Program
{
    static void Main()
    {
        var task = RunAsync(); // Immediately prints "Just a second..."
        task.GetAwaiter().GetResult(); // Blocks as it delays for one second then prints "Ok!"
    }

    static async Task RunAsync()
    {
        Console.WriteLine("Just a second...");
        await Task.Delay(TimeSpan.FromSeconds(1));
        Console.WriteLine("Ok!");
    }
}
And in Rust you can write this:
use async_std::task;
use futures::executor::block_on;
use std::time::Duration;
async fn run() {
    println!("Just a second...");
    task::sleep(Duration::from_secs(1)).await;
    println!("Ok!");
}

fn main() {
    let future = run(); // Doesn't print anything
    block_on(future); // Blocks as it prints "Just a second...", delays for one second, and then prints "Ok!"
}
Both programs will output “Just a second…”, then nothing will happen for one second, then they’ll both output “Ok!”.
The asynchronous methods in both languages are marked by the async
keyword:
static async Task RunAsync()
// ^^^^^
async fn run()
// ^^^^^
And the awaitable expressions in both languages involve the await
keyword:
await Task.Delay(TimeSpan.FromSeconds(1));
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expression
// ^^^^^ keyword
task::sleep(Duration::from_secs(1)).await;
// ^^^^^^ keyword
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expression
Functionally, C#’s tasks are pretty similar to Rust’s futures. They’re both pretty ergonomic, and they both bring all the benefits of async/await code.
You’ll notice that one difference is that a C# task automatically starts as soon as you create it, and it runs itself, but a future in Rust is inert and has to be ushered along by something else.
Rust’s futures have to be polled.
What?! Yes, you heard right. Polled.
But that’s a good thing. To understand why, we need to do a deep dive into C#.
Let’s turn our attention to C#. I’ll explain how things work in C# and then show why heap allocations and dynamic dispatch are inevitable.
How the await keyword works
C#’s compiler automatically transforms async methods into a state machine. And within an async method you can await awaitable expressions.
For example, when you execute this RunAsync()
method:
async Task RunAsync()
{
Console.WriteLine("Running!");
int value = await GetValueAsync();
if (value % 2 == 0)
{
await DoEvenThingAsync();
}
else
{
object oddResult = await DoOddThingAsync();
Console.WriteLine(oddResult.ToString());
}
}
…then you’re actually executing something like this monstrosity that the compiler automatically generated for you:
Task RunAsync()
{
int value;
object oddResult;
var stateMachine = new StateMachine(
(int step) =>
{
switch (step)
{
case 0:
{
Console.WriteLine("Running!");
var awaitable = GetValueAsync();
var awaiter = awaitable.GetAwaiter();
if (awaiter.IsCompleted)
{
value = awaiter.GetResult();
stateMachine.RunStep(1);
}
else
{
awaiter.OnCompleted(() =>
{
value = awaiter.GetResult();
stateMachine.RunStep(1);
});
}
}
break;
case 1:
{
if (value % 2 == 0)
{
var awaitable = DoEvenThingAsync();
var awaiter = awaitable.GetAwaiter();
if (awaiter.IsCompleted)
{
awaiter.GetResult();
stateMachine.RunStep(2);
}
else
{
awaiter.OnCompleted(() => stateMachine.RunStep(2));
}
}
else
{
var awaitable = DoOddThingAsync();
var awaiter = awaitable.GetAwaiter();
if (awaiter.IsCompleted)
{
oddResult = awaiter.GetResult();
stateMachine.RunStep(3);
}
else
{
awaiter.OnCompleted(() =>
{
oddResult = awaiter.GetResult();
stateMachine.RunStep(3);
});
}
}
}
break;
case 2:
{
stateMachine.Finish();
}
break;
case 3:
{
Console.WriteLine(oddResult.ToString());
stateMachine.Finish();
}
break;
}
}
);
stateMachine.RunStep(0);
return stateMachine.ToTask();
}
Of course the above is just inspirational pseudocode. But can you trace through it and see how it works?
Please don’t get lost. What I want you to see is if you want to get a value out of a C# awaitable expression, you have to do something like this:
Action<T> handleTheResult = ...;
var awaiter = (expression).GetAwaiter();
if (awaiter.IsCompleted)
{
var result = awaiter.GetResult(); // Will return immediately; might throw an exception
handleTheResult(result);
}
else
{
// .GetResult() won't immediately return because the task isn't yet complete
awaiter.OnCompleted(() =>
{
// This code will be executed when the task is completed
var result = awaiter.GetResult(); // _Now_ it will immediately return (or throw an exception)
handleTheResult(result);
});
}
…and the compiler automatically does that for you with a state machine when
you use the await
keyword in an async
context.
If you want to learn how to tell C# to build the async state machine in a specific way then take a look at C#’s task type documentation.
If you look at how the Task<T>
class’s state machine
is constructed
then you’ll see that it’s a bit different than my above pseudocode. Among other
things, it captures the current SynchronizationContext
and ExecutionContext
so that continuations can be executed on them.
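As a quick, hypothetical illustration (FetchAsync and _label are made up here), the captured SynchronizationContext is what lets a continuation land back on a UI thread, and ConfigureAwait(false) is how you opt out of capturing it:

async Task RefreshAsync()
{
    // The awaiter captures the current SynchronizationContext (e.g. a UI context), so this
    // continuation is posted back to the UI thread and can safely touch controls.
    var text = await FetchAsync();
    _label.Text = text;

    // Opting out of the capture: this continuation may run on a thread-pool thread instead.
    var more = await FetchAsync().ConfigureAwait(false);
    Console.WriteLine(more);
}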
The fact that SynchronizationContext
and ExecutionContext
are static state
in C# was significant to the Rust language team when they decided how to design
their Future
trait. Rustaceans don’t like static state. Initial
implementations of futures in Rust relied on thread local storage, but that was
a dead end because thread local storage isn’t available in no_std Rust code.
Okay let’s quit wandering around in these weeds and see if we can make some more progress.
You don’t have to rely only on Task, ValueTask, or any of the other framework-provided awaitable types. You can create your own awaitable type.
Basically it comes down to implementing this interface on a thing:
using System.Runtime.CompilerServices;
public interface IAwaiter<out T> : INotifyCompletion
{
bool IsCompleted { get; }
T GetResult();
}
…then returning that thing from your awaitable type’s GetAwaiter()
method.
That interface doesn’t exist in the .Net framework. You can create it if you want, but any type that has that signature (implement INotifyCompletion, have a bool IsCompleted property, have a T GetResult() method) will work as an awaiter. And any type that returns an awaiter from a GetAwaiter() method can be awaited.
Like this:
class MyAwaitable<T>
{
public MyAwaiter<T> GetAwaiter() => ...;
}
class MyAwaiter<T> : IAwaiter<T>
{
public bool IsCompleted => ...;
public T GetResult() => ...;
public void OnCompleted(Action continuation) => ...;
}
static class Foo
{
public static async Task RunAsync()
{
Action<int> handleTheResult = ...;
MyAwaitable<int> awaitable = ...;
int result = await awaitable; // You can await it!
handleTheResult(result);
}
}
Recall from the previous section how the compiler will create a state machine that essentially does this with your awaitable type:
// Almost the same as the RunAsync method above:
Action<int> handleTheResult = ...;
MyAwaitable<int> awaitable = ...;
var awaiter = awaitable.GetAwaiter();
if (awaiter.IsCompleted)
{
var result = awaiter.GetResult(); // Will return immediately; might throw an exception
handleTheResult(result);
}
else
{
// .GetResult() won't immediately return because the task isn't yet complete
awaiter.OnCompleted(() =>
{
// This code will be executed when the task is completed
var result = awaiter.GetResult(); // _Now_ it will immediately return (or throw an exception)
handleTheResult(result);
});
}
You should now understand how you need to implement your awaitable type. The
awaiter type returned from GetAwaiter()
needs to do these things:
- Return true from IsCompleted when the asynchronous work is completed
- Accept a callback through its OnCompleted method. This callback should be executed when the asynchronous work is completed. Keep in mind a callback won’t be passed to OnCompleted if IsCompleted returned true when it was checked
- Return the result from its GetResult method. Some implementations will choose to make GetResult synchronously block if the asynchronous work isn’t yet complete

The documentation explains how to create an awaitable type that returns nothing/void. Basically, GetResult should return void instead of T. If only C# would let us use void as a generic type parameter!
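To make those requirements concrete, here’s a minimal sketch of a hand-rolled awaitable that you complete yourself. The names ManualAwaitable and ManualAwaiter are mine, and the sketch ignores thread safety and exceptions:

using System;
using System.Runtime.CompilerServices;

public class ManualAwaitable<T>
{
    T _result;
    bool _isCompleted;
    Action _continuation;

    public ManualAwaiter<T> GetAwaiter() => new ManualAwaiter<T>(this);

    // Call this from wherever the asynchronous work finishes.
    public void Complete(T result)
    {
        _result = result;
        _isCompleted = true;
        _continuation?.Invoke();
    }

    internal bool IsCompleted => _isCompleted;
    internal T Result => _result;
    internal void Register(Action continuation) => _continuation = continuation;
}

public readonly struct ManualAwaiter<T> : INotifyCompletion
{
    readonly ManualAwaitable<T> _source;

    public ManualAwaiter(ManualAwaitable<T> source) => _source = source;

    // Requirement 1: report whether the work has finished.
    public bool IsCompleted => _source.IsCompleted;

    // Requirement 2: accept the continuation to run once the work finishes.
    public void OnCompleted(Action continuation) => _source.Register(continuation);

    // Requirement 3: hand back the result (assumed to be called only after completion).
    public T GetResult() => _source.Result;
}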
So how do you compose (as in combine) awaitable expressions? You now have everything you need to answer this question.
As I showed in the “How the await
keyword works” section above, the C#
compiler transforms code like this:
async Task<T> FooAsync()
{
// Synchronous block 1
var a = GetA();
var b = GetB(a);
// await keyword
var c = await GetCAsync(b);
// Synchronous block 2
var d = GetD(c);
var e = GetE(d);
// await keyword
var f = await GetFAsync(e);
return f;
}
…into a state machine.
The state machine has one step for each synchronous code block. The steps are wrapped up as continuations/callbacks which get passed to the various awaiters along the way. The awaiters decide when to execute the callbacks.
You can do the same thing by hand if you wish.
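For example, here’s FooAsync rewritten by hand. The Get* stand-ins below are mine (trivial stubs so the sketch compiles); the point is that each synchronous block becomes a callback handed to the previous awaiter. It also skips the IsCompleted fast path and exception handling that the real state machine adds:

using System;
using System.Threading.Tasks;

static class ByHand
{
    // Trivial stand-ins for the placeholder methods used by FooAsync above.
    static int GetA() => 1;
    static int GetB(int a) => a + 1;
    static Task<int> GetCAsync(int b) => Task.FromResult(b + 1);
    static int GetD(int c) => c + 1;
    static int GetE(int d) => d + 1;
    static Task<int> GetFAsync(int e) => Task.FromResult(e + 1);

    public static Task<int> FooByHand()
    {
        var tcs = new TaskCompletionSource<int>();

        // Synchronous block 1
        var a = GetA();
        var b = GetB(a);

        var awaiterC = GetCAsync(b).GetAwaiter();
        awaiterC.OnCompleted(() => // callback #1: a heap-allocated delegate
        {
            var c = awaiterC.GetResult();

            // Synchronous block 2
            var d = GetD(c);
            var e = GetE(d);

            var awaiterF = GetFAsync(e).GetAwaiter();
            awaiterF.OnCompleted(() => // callback #2
            {
                tcs.SetResult(awaiterF.GetResult());
            });
        });

        return tcs.Task;
    }
}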
So the answer is: in C#, awaitable expressions are composed with callbacks.
You should now see why heap allocations and dynamic dispatch are inevitable in
C#’s async/await system: C# awaitable expressions are composed with callbacks.
These callbacks have the type Action, which is a delegate. To package something (e.g. a step in a state machine) up into an Action, the compiler has to allocate a delegate object on the heap, plus a closure object for whatever state it captures. And invoking an Action is dynamic dispatch: an indirect call through the delegate.
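You can see both costs in a tiny snippet:

using System;

int value = 42;

// The lambda captures `value`, so the compiler generates a closure class, allocates an
// instance of it on the heap, and allocates an Action delegate that points at it.
Action continuation = () => Console.WriteLine(value);

// Invoking the delegate is an indirect call through the delegate object: dynamic dispatch.
continuation();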
What about ValueTask?
I can just hear it now. You’re wondering if this applies to ValueTask. After all, Microsoft created ValueTask expressly so that:
nothing need be allocated [in certain cases]: we can simply initialize this ValueTask<TResult> struct with the TResult and return that.
But keep reading. That only applies to the synchronous case, when the result of an asynchronous method is already known and can be returned synchronously.
In other words, when the awaiter’s IsCompleted
property returns true then its
GetResult
method is immediately/synchronously invoked and the result is
immediately available. There’s no need to instantiate an Action
to pass to its
OnCompleted
method.
But when the result isn’t available synchronously: heap allocation and dynamic dispatch.
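A sketch of the two paths, using a hypothetical cached value (the names here are mine):

using System.Threading.Tasks;

class CachedReader
{
    int? _cached;

    public ValueTask<int> GetValueAsync()
    {
        if (_cached.HasValue)
        {
            // Completed synchronously: the result is wrapped in the ValueTask struct itself,
            // so nothing is allocated on the heap.
            return new ValueTask<int>(_cached.Value);
        }

        // Not completed synchronously: fall back to a real async method, which means a
        // heap-allocated Task plus delegate-based continuations at its await points.
        return new ValueTask<int>(LoadAndCacheAsync());
    }

    async Task<int> LoadAndCacheAsync()
    {
        await Task.Delay(100); // stand-in for real asynchronous work
        _cached = 42;
        return _cached.Value;
    }
}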
Now, ValueTask has a trick up its sleeve. You can create a ValueTask from an IValueTaskSource. That does help minimize heap allocations in certain cases, but it’s kind of complicated to implement that interface. And I think that implementations necessarily come with compromises.
For example: when you create a ValueTask
with an IValueTaskSource
then you
create it
with a token.
And that token is only a 16-bit integer. You can only have so many ValueTasks in play at once for a given IValueTaskSource. And how do you know when a ValueTask should be taken out of play and its token recycled? If I have a ValueTask then what’s to keep me from continuing to call valueTask.GetAwaiter().GetResult() on it over and over? There is nothing built into ValueTask or IValueTaskSource to let you know when the token can be recycled. You can only recycle the tokens if you have full control over the ValueTasks that are created with your IValueTaskSource. That limits the usefulness of this trick.
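If you haven’t seen the trick, here’s a sketch of its shape (it leans on the framework’s ManualResetValueTaskSourceCore helper; the Rent/Complete/Recycle names are mine). Note that nothing stops a caller from holding onto a rented ValueTask after Recycle() has been called:

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

sealed class ReusableSource : IValueTaskSource<int>
{
    ManualResetValueTaskSourceCore<int> _core; // does the bookkeeping, including the token

    // Hand out a ValueTask tied to the current token.
    public ValueTask<int> Rent() => new ValueTask<int>(this, _core.Version);

    public void Complete(int result) => _core.SetResult(result);

    // Only safe once the previously rented ValueTask has been consumed exactly once.
    public void Recycle() => _core.Reset();

    // IValueTaskSource<int> plumbing, all forwarded to the core.
    public int GetResult(short token) => _core.GetResult(token);

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}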
Edit September 29, 2021:
The documentation for ValueTask says:
A ValueTask<TResult> instance may only be awaited once, and consumers may not read Result until the instance has completed. If these limitations are unacceptable, convert the ValueTask<TResult> to a Task<TResult> by calling AsTask.
The following operations should never be performed on a ValueTask<TResult> instance:
- Awaiting the instance multiple times.
- Calling AsTask multiple times.
- Using .Result or .GetAwaiter().GetResult() when the operation hasn’t yet completed, or using them multiple times.
- Using more than one of these techniques to consume the instance.
If you do any of the above, the results are undefined.
Now, I’m pretty sure these limitations do not apply if you create a ValueTask from a Task. But do you see how even ValueTask has specific limitations?

These limitations necessarily exist because ValueTask is a struct (meaning you use value semantics on it and instances probably live on the stack). It’s very easy to create copies of C# structs (just assign the struct to another variable), but changing the state of one does not change the state of the copies. In C# there must be an Action continuation object, and a reference to it must be kept somewhere. Perhaps the ideal design would only let there be a single reference so that once the continuation is executed then that single reference can be discarded (which would be the signal that the asynchronous operation has completed). But it’s impossible to do that in C# with structs because of how easy it is to copy structs and because of how disconnected they are once you do; the developer must either have full control of all instances of the struct, or else he must document a contract (like Microsoft has done for ValueTask above) and hope that people stick to it.
And again, at the end of the day, even with ValueTask
and all its tricks, if
the awaiter returns false
from IsCompleted
then there is going to be at
least one heap allocation and dynamic dispatch.
This is why the Rust language devs say:
We were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so.
Let’s turn our attention to Rust.
I’m less familiar with Rust so I won’t be able to provide as much detail. But I think you’ll still see why Rust futures are better.
How the await keyword works
Actually I don’t entirely know how the await keyword works in Rust. I’m only able to find documentation on what it does.
My best guess is the Rust compiler generates a state machine similar to how the C# compiler does it.
But keep in mind that in Rust, closures are strongly typed, can be statically dispatched, and can live on the stack. So if async blocks are broken up into various steps in a state machine, that state machine and all its steps will be transparent to the compiler. And its steps can be statically dispatched without heap allocations.
However it works, the Rust language devs are clear that async/await is a zero-cost abstraction. So dynamic dispatch and heap allocation are definitely not necessary.
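A rough way to see this for yourself (my sketch, nothing official): building a composed future allocates nothing and produces a single concrete value whose size is known at compile time.

use std::mem::size_of_val;

async fn step_one() -> u32 {
    1
}

async fn step_two(_x: u32) {}

async fn run() {
    let a = step_one().await;
    step_two(a).await;
}

fn main() {
    // Calling run() only constructs the compiler-generated state machine; nothing is
    // allocated on the heap and nothing executes yet. The whole composed future is one
    // concrete type that could happily sit on the stack.
    let future = run();
    println!("state machine size: {} bytes", size_of_val(&future));
}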
In Rust it comes down to implementing
the Future
trait:
pub trait Future {
type Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
For example:
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

struct OneShotTimer {
    completed: bool,
    waker: Option<Waker>,
}

impl Future for OneShotTimer {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // OneShotTimer is Unpin (both fields are), so we can safely reach &mut Self.
        let this = self.get_mut();
        match this.completed {
            true => Poll::Ready(()),
            false => {
                // Remember the waker so whatever completes the timer can wake this task,
                // then report that we aren't done yet.
                this.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}
impl OneShotTimer {
pub fn schedule(...) -> Self {
// Do stuff like wire up the .waker to be notified when the timer goes off
...
}
}
See?
Of course there very well might be heap allocations in scheduling this particular implementation of a timer. But that’s up to you. It’s not forced upon you by the async/await system.
And there are no heap allocations in polling this future. Even when it’s not yet
ready nothing needs to be placed on the heap. Sure there is a clone()
call in
there, but that places something into memory that has already been reserved
within the OneShotTimer
. This OneShotTimer
might happily live on the stack
and you could call poll() in a no_std environment.
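For completeness, here’s one hypothetical way the elided schedule() above could be wired up. The fields move behind an Arc<Mutex<...>> so a background thread can flip completed and invoke the stored waker; those are exactly the kind of heap allocations in scheduling mentioned above, and they’re a choice of this sketch rather than something poll() forces on you:

use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;
use std::time::Duration;

// State shared between the future and the thread that completes it.
struct Shared {
    completed: bool,
    waker: Option<Waker>,
}

pub struct SharedOneShotTimer {
    shared: Arc<Mutex<Shared>>,
}

impl SharedOneShotTimer {
    pub fn schedule(duration: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared { completed: false, waker: None }));
        let for_thread = Arc::clone(&shared);
        thread::spawn(move || {
            thread::sleep(duration);
            let mut state = for_thread.lock().unwrap();
            state.completed = true;
            // Tell the executor that this future is ready to be polled again.
            if let Some(waker) = state.waker.take() {
                waker.wake();
            }
        });
        SharedOneShotTimer { shared }
    }
}

impl Future for SharedOneShotTimer {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut state = self.shared.lock().unwrap();
        if state.completed {
            Poll::Ready(())
        } else {
            state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}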
You don’t have to always continually poll every single future. Rust’s futures tell you when they’re ready to be polled again. And you know which one told you because either you or your task executor gave it a specific Waker, and you did remember to create that Waker in a way that associates it with the currently executing task, right? So you don’t have to poll all the futures.
Here is what the documentation says:
The poll function is not called repeatedly in a tight loop – instead, it should only be called when the future indicates that it is ready to make progress (by calling wake()). If you’re familiar with the poll(2) or select(2) syscalls on Unix it’s worth noting that futures typically do not suffer the same problems of “all wakeups must poll all events”; they are more like epoll(4).
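To make that concrete, here’s a minimal executor sketch of my own (not from the article or any particular library): the Waker just unparks the blocked thread, so the loop sleeps between polls instead of spinning. The single Arc for the waker is one allocation made by the executor, not a cost paid per composed future:

use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        // Wake the executor's thread so it polls again.
        self.0.unpark();
    }
}

fn block_on<F: Future>(future: F) -> F::Output {
    // Pin the future on the stack; no heap allocation for the future itself.
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);

    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Not ready: sleep until something calls our waker's wake().
            Poll::Pending => thread::park(),
        }
    }
}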
Here’s an example of joining two futures into a single future that returns the result of both of them only once they’ve completed:
use std::future::Future;
use std::mem;
use std::pin::Pin;
use std::task::{Context, Poll};

// Holds a still-pending future, its finished result, or nothing (after the result is taken).
enum FutureOrResult<T: Future> {
    Future(T),
    Result(T::Output),
    None,
}

impl<T: Future + Unpin> FutureOrResult<T> {
    // Drive the inner future if it's still pending; report whether a result is now stored.
    fn poll_once(&mut self, cx: &mut Context<'_>) -> bool {
        if let FutureOrResult::Future(future) = self {
            match Pin::new(future).poll(cx) {
                Poll::Ready(result) => *self = FutureOrResult::Result(result),
                Poll::Pending => return false,
            }
        }
        matches!(self, FutureOrResult::Result(_))
    }

    // Hand the stored result out exactly once.
    fn take(&mut self) -> T::Output {
        match mem::replace(self, FutureOrResult::None) {
            FutureOrResult::Result(result) => result,
            _ => panic!("You called poll too many times"),
        }
    }
}

struct Joined<A: Future, B: Future>(FutureOrResult<A>, FutureOrResult<B>);

// The Unpin bounds let us poll each half through a plain `&mut` without any pinning
// gymnastics, which keeps this example free of unsafe code.
impl<A, B> Future for Joined<A, B>
where
    A: Future + Unpin,
    B: Future + Unpin,
    A::Output: Unpin,
    B::Output: Unpin,
{
    type Output = (A::Output, B::Output);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // Poll whichever halves are still pending; finish only once both results are stored.
        let a_ready = this.0.poll_once(cx);
        let b_ready = this.1.poll_once(cx);
        if a_ready && b_ready {
            Poll::Ready((this.0.take(), this.1.take()))
        } else {
            Poll::Pending
        }
    }
}
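And a hypothetical usage, reusing the block_on from the top of the article with two already-ready futures (std::future::ready produces Unpin futures, so they satisfy the bounds above):

use futures::executor::block_on;
use std::future::ready;

fn main() {
    // The whole composition is one plain value sitting on the stack.
    let joined = Joined(
        FutureOrResult::Future(ready(1)),
        FutureOrResult::Future(ready("two")),
    );
    assert_eq!(block_on(joined), (1, "two"));
}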
Maybe there’s a more idiomatic way, I dunno. But do you see any heap allocations anywhere in this composition of two futures?
Do you see how instead of being composed with callbacks and dynamic dispatch, Rust futures are composed with static dispatch?
It’s redundant to say this again, but the Future::poll()
method does not
require heap allocation or dynamic dispatch. Instead it can be statically
dispatched on things on the stack.
What’s to keep you from implementing your own IFuture
interface in C# and
making your own async/await framework?
Almost nothing. Here’s an attempt in C# that I think would get most of the way there. But Rust’s ownership and lifetime system is really excellent, and I think this example suffers without it:
public readonly struct Context
{
// Include a Waker here
}
public readonly struct Maybe<T>
{
public static readonly Maybe<T> None = default;
readonly T _value;
public Maybe(T value)
{
HasValue = true;
_value = value;
}
public bool HasValue { get; }
public T Value => HasValue
? _value
: throw new Exception();
}
public interface IFuture<T>
{
Maybe<T> Poll(in Context context);
}
public struct Joined<TA, TAFuture, TB, TBFuture> : IFuture<(TA, TB)>
where TAFuture : IFuture<TA>
where TBFuture : IFuture<TB>
{
bool _hasFutureA;
TAFuture _futureA;
TA _valueA;
bool _hasFutureB;
TBFuture _futureB;
TB _valueB;
public Joined(TAFuture futureA, TBFuture futureB)
{
_hasFutureA = true;
_hasFutureB = true;
_futureA = futureA;
_futureB = futureB;
_valueA = default!;
_valueB = default!;
}
public Maybe<(TA, TB)> Poll(in Context context)
{
TA valueA = default!;
TB valueB = default!;
var hasBoth = true;
if (_hasFutureA)
{
var pollResult = _futureA.Poll(in context);
if (pollResult.HasValue)
{
valueA = _valueA = pollResult.Value;
_hasFutureA = false;
_futureA = default!;
}
else
{
hasBoth = false;
}
}
else
{
valueA = _valueA;
}
if (_hasFutureB)
{
var pollResult = _futureB.Poll(in context);
if (pollResult.HasValue)
{
valueB = _valueB = pollResult.Value;
_hasFutureB = false;
_futureB = default!;
}
else
{
hasBoth = false;
}
}
else
{
valueB = _valueB;
}
return hasBoth
? new Maybe<(TA, TB)>((valueA, valueB))
: Maybe<(TA, TB)>.None;
}
}
There are neither heap allocations nor dynamic dispatch anywhere in sight.
You would have to be very careful with ownership and lifetimes of these structs,
though. I’m sure havoc would ensue the moment you clone a Joined
struct and
Poll()
‘ed both instances.
But if you walked very carefully then I’m sure it could work.
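For instance, here’s a hypothetical use of it (the Ready stand-in future is mine), with the caveat that a real executor would only call Poll when woken:

using System;

struct Ready : IFuture<int>
{
    readonly int _value;

    public Ready(int value) => _value = value;

    public Maybe<int> Poll(in Context context) => new Maybe<int>(_value);
}

static class Demo
{
    static void Main()
    {
        var joined = new Joined<int, Ready, int, Ready>(new Ready(1), new Ready(2));
        var context = new Context();

        // Both halves are already complete, so a single poll yields the pair.
        var result = joined.Poll(in context);
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "pending");
    }
}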
You can compose Rust futures without heap allocations or dynamic dispatch. That’s not true of C#’s tasks (or any other possible awaitable type).
Rust’s heapless composition power comes from the fact that Rust’s futures are polled. Because they are polled they can be composed with static dispatch instead of callbacks.
Polled futures are great!