Would it work for the go-capnproto2 RPC system to spawn a new goroutine for
loop, avoiding the deadlock. (There would be no "start the work" phase.)
as "this" or "self"). However, method implementations could probably still
enough.
Post by Ross Light
(Sorry for the long response. I did not have time to make it shorter.)
I see your point about how "a call returns happens before the next call
starts" reaches an undesirable state for applications. I had an inkling
that this could be the case, but hadn't fully experimented to see the
results. However, just on terminology, I'm not sure I agree with your
assessment that objects are single-threaded: because returns can come back
in a different order than the calls arrived, this implies some level of
concurrent (but perhaps not parallel) execution.
As for your idea of mapping Cap'n Proto methods to messages on Go
channels: it shuffles the problem around a bit but doesn't escape this
deadlock issue. In fact, the first draft I had of the RPC system used a
lot more channels, but I found it made the control flow hard to reason
about (though it could still be implemented this way). Let me give you enough
background on how Go concurrency works so that we're on the same page.
*Background*
As you already know, goroutines are scheduled on OS threads. When a
goroutine hits a point at which it is blocked on I/O, it will be removed
from the run queue and the OS thread will be reclaimed for another runnable
goroutine. Because the language patches over the typical thread overhead
and makes it easy for the caller to create new goroutines, the practice is
that all functions block (this avoids the function coloring problem
<http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/>,
similar to how every method ought to be a promise in Cap'n Proto RPC).
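To make the "all functions block" convention concrete, here's a tiny standalone example (invented for illustration, nothing to do with go-capnproto2): the same blocking function is used sequentially or concurrently purely at the caller's discretion.

package main

import (
	"fmt"
	"sync"
	"time"
)

// lookup is an ordinary blocking function; there is no separate async variant
// of it (no function "color").
func lookup(name string) string {
	time.Sleep(10 * time.Millisecond) // stand-in for blocking I/O
	return "result for " + name
}

func main() {
	// Sequential use: just call it and block.
	fmt.Println(lookup("a"))

	// Concurrent use: the caller decides, by spawning goroutines; lookup itself
	// is unchanged.
	var wg sync.WaitGroup
	for _, name := range []string{"b", "c", "d"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			fmt.Println(lookup(name))
		}(name)
	}
	wg.Wait()
}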
There are two flavors of channels: buffered and unbuffered. A buffered
channel can be thought of as a type-safe, goroutine-safe FIFO: senders place
items onto the channel and receivers remove them. Importantly, if you try to
send to a buffered channel that is full, the send blocks until an item is
received. An unbuffered channel can thus be thought of as being perpetually
full: every send blocks until a receiver is ready. Most of the time, using
an unbuffered channel is the correct approach, as it applies backpressure in
the most predictable way. There was a great talk at Gophercon last week
about Understanding Channels
<https://about.sourcegraph.com/go/understanding-channels-kavya-joshi>,
but the video's not yet online.
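Here's a tiny standalone illustration of the two flavors (plain Go, nothing Cap'n Proto specific); the key property for what follows is that a send on an unbuffered channel blocks until someone receives.

package main

import "fmt"

func main() {
	// Buffered: sends succeed until the buffer is full, then they block.
	buf := make(chan int, 2)
	buf <- 1
	buf <- 2 // a third send here would block until someone receives

	// Unbuffered: every send blocks until a receiver is ready, which is what
	// applies backpressure and what the deadlocks below hinge on.
	unbuf := make(chan int)
	go func() { unbuf <- 42 }() // would block forever without the receive below
	fmt.Println(<-buf, <-buf, <-unbuf)
}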
*Problem Statement*
The reproduction case is alarmingly simple and common. Alice, in Vat A,
receives a call foo() from Bob, in Vat B. Alice has a reference to Bob
(either through the call or prior calls). foo() is a call that does the following:
1. Call bar() on Bob
2. Wait for the result
3. Return a value based on computation on the result
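In rough Go shape, foo() looks like the sketch below; every name in it (bobClient, barAnswer, and so on) is an invented stand-in, not the generated go-capnproto2 API.

package main

import (
	"context"
	"fmt"
)

// barAnswer and bobClient are invented stand-ins for a call's future and for a
// capability reference to Bob; they are not the real generated API.
type barAnswer struct{ result chan int }

func (a barAnswer) Wait() int { return <-a.result }

type bobClient struct{}

func (bobClient) Bar(ctx context.Context) barAnswer {
	ans := barAnswer{result: make(chan int, 1)}
	go func() { ans.result <- 21 }() // pretend the RPC to Bob eventually returns 21
	return ans
}

type alice struct{ bob bobClient }

// Foo is the straight-line shape of steps 1-3.
func (a alice) Foo(ctx context.Context) int {
	answer := a.bob.Bar(ctx) // 1. call bar() on Bob
	result := answer.Wait()  // 2. wait for the result
	return result * 2        // 3. return a value based on the result
}

func main() {
	fmt.Println(alice{}.Foo(context.Background())) // 42
}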
Now let's talk broadly about implementation constraints of the RPC system
bridging Vat A and Vat B. There are at least two concurrent (but
definitely not parallel) processes at play: receiving messages from the
wire and sending messages on the wire. Both of these mutate the set of 4
(5) tables, so naturally a mutex protects this data structure. When the
receiver process encounters a Call, it must acquire the mutex, start the
work, then release the mutex. The "start the work" piece is what we're
interested in here.
Let's assume, as you say, that each call comes over an unbuffered
channel. Each capability creates a loop where it receives a new call,
optionally does a bit of processing to "start the work", then sends back a
response. If there's an RPC or longer I/O operation, it spawns a goroutine
that sends back the response so as to avoid blocking that loop.
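As a sketch, with invented types standing in for the real call plumbing, that per-capability loop looks roughly like this:

package main

import "fmt"

type call struct {
	params  int
	results chan int // response channel for this particular call
}

// serveCapability is the per-capability loop: receive a call, do the quick
// "start the work" part inline, and hand anything long-running to a goroutine
// so the loop is free to receive the next call.
func serveCapability(calls <-chan call) {
	for c := range calls {
		// "Start the work": cheap bookkeeping only; nothing here may block on
		// other RPCs.
		go func(c call) {
			// Long-running work (I/O, further RPCs) lives here instead.
			c.results <- c.params * 2
		}(c)
	}
}

func main() {
	calls := make(chan call) // unbuffered, as discussed above
	go serveCapability(calls)

	results := make(chan int, 1)
	calls <- call{params: 21, results: results}
	fmt.Println(<-results) // 42
}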
Here's where the language difference comes into play and provides a
footgun that C++ does not have: Step 2 of foo(). In the C++
implementation, long-running work does have a different function "color"
and returns a future, which you should not wait on (and which it is awkward
to wait on) without starting a concurrent process to finish the rest of the
work. I do have futures in the Go implementation, but instead of exposing
a ".then()" method, they expose a ".wait()" method, since this is much more
idiomatic. In Go, it would be quite easy to accidentally run a
long-running operation inside that "start the work" phase. This is fine if
the long-running work doesn't use RPCs, but if you happen to wait for an
RPC to return, then this deadlocks: the RPC connection is still blocked
servicing the "start the work" phase.
With a channel, you postpone the problem to the next time Bob calls foo():
1. Bob calls foo() on Alice
2. Vat A sends the foo() message to Alice's channel
3. Alice receives the foo() message
4. Vat A releases the mutex on the tables and waits for the next message
to come over the wire from Vat B
5. Alice calls bar() on Bob
6. Vat A acquires the mutex on the tables, adds the question entry to the
table, writes the Call to Vat B, then releases the mutex. Vat A gives a
future back to Alice.
7. Alice starts waiting on the future.
8. Bob calls foo() on Alice again. Call this foo'().
9. Vat A attempts to send the foo'() message to Alice's channel. It
blocks, since Alice is not ready to receive from the channel (it's still
processing foo()).
10. Bob is finished with bar() and attempts to write its return to Vat A.
Vat A is now deadlocked: it can't receive the new wire message because it's
waiting for Alice to receive foo'(), but foo'() cannot be received until
foo() receives bar()'s response.
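The whole sequence can be compressed into a standalone program using nothing but plain channels (all names invented; this is the shape of the problem, not go-capnproto2 code). Running it ends with Go's "fatal error: all goroutines are asleep - deadlock!".

package main

type message struct {
	kind  string        // "call" or "return"
	reply chan struct{} // where bar()'s return should be delivered
}

func main() {
	wireBtoA := make(chan message)        // the wire from Vat B, read by one loop
	alice := make(chan message)           // Alice's unbuffered call channel
	barCalled := make(chan chan struct{}) // how "Bob" learns that bar() was called

	// Vat A's receive loop: the only reader of the wire.
	go func() {
		for m := range wireBtoA {
			switch m.kind {
			case "call":
				alice <- m // step 9: blocks while Alice is still handling foo()
			case "return":
				m.reply <- struct{}{} // never reached
			}
		}
	}()

	// Alice: foo() calls bar() on Bob and waits for the result (steps 5-7).
	go func() {
		for range alice {
			barReturn := make(chan struct{})
			barCalled <- barReturn // "send the bar() call to Vat B"
			<-barReturn            // wait for bar() before finishing foo()
		}
	}()

	// Bob / Vat B writes to the wire strictly in order.
	wireBtoA <- message{kind: "call"}                     // step 1: foo()
	barReturn := <-barCalled                              // Bob receives the bar() call
	wireBtoA <- message{kind: "call"}                     // step 8: foo'()
	wireBtoA <- message{kind: "return", reply: barReturn} // step 10: nobody can read it
}

The receive loop is stuck handing foo'() to Alice, Alice is stuck waiting for bar(), and Bob is stuck writing bar()'s return, exactly as in steps 9 and 10.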
*How this surfaces in go-capnproto2*
As mentioned above, the first few revisions of the Go RPC implementation
used channels in this sort of way under the hood to call methods. However,
it did not wait for the call to be acknowledged, which destroys E-Order. I've
since moved to a mutex-based solution (since it reduces the number of
goroutines that the RPC system keeps around and makes it easier to debug),
but fundamentally server.Ack
<https://godoc.org/zombiezen.com/go/capnproto2/server#Ack> is equivalent
to starting a channel receive.
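For a concrete picture, here's a sketch of where Ack has to go. It assumes a hypothetical schema with Alice.foo() and Bob.bar() compiled by capnpc-go, so the alice/bob packages and their generated identifiers below are illustrative; only server.Ack itself is the real API.

package example

// Sketch only: "alice" and "bob" stand for packages generated by capnpc-go
// from a hypothetical schema; the generated names are made up for this example.
import (
	"zombiezen.com/go/capnproto2/server"

	"example.com/gen/alice" // hypothetical generated package
	"example.com/gen/bob"   // hypothetical generated package
)

type aliceServer struct {
	bob bob.Bob // capability reference to Bob
}

func (s *aliceServer) Foo(call alice.Alice_foo) error {
	// Everything before Ack is the "start the work" phase: blocking on another
	// RPC here is the footgun, because the connection can't deliver bar()'s
	// return while it is still waiting on this acknowledgement.
	server.Ack(call.Options)

	// After Ack, this method may block on further RPCs without holding up
	// delivery of later calls to this object.
	res, err := s.bob.Bar(call.Ctx, nil).Struct()
	if err != nil {
		return err
	}
	call.Results.SetValue(res.Value() * 2)
	return nil
}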
With the current go-capnproto2 implementation, you can't even safely start
a call before the server.Ack call (which starts the work), since the
connection's tables mutex is still held at that point; the fix is to
create a temporary sub-mutex that is propagated via the Context. This is
definitely a necessary change. I'm still a bit scared that there could be a
deadlock: Vat A calls to Vat B to service a request from Vat C while Vat A
calls to Vat C to service a call from Vat B. (Note that this is not
3-party, this is just multiple 2-party connections.) I'm not certain about
this however, and would love to be wrong, since the workaround would seem
to be a process-wide mutex on Cap'n Proto RPC work.
However, there's still the looming footgun here: accidental deadlocks
while starting work. I don't see a way around this in general. And it
bears repeating: this is actually a problem in C++ too, but there it is much
more obvious from looking at the code that it's blocking on something it
shouldn't. Whether the Go implementation uses straight-line code or
channels doesn't make too much of a difference here: there's still
straight-line code in the channel-based approach, but the split between the
deferred work and the initial work requires a bit more work for the
application (spawning a goroutine and ensuring a resolve promise method
gets called, versus just returning). Channels versus functions comes down to
a question of API appearance rather than a functional change.
*Mitigation*
The only guaranteed safe way to mitigate these issues is to make the Go
capabilities act serially to preserve E-Order. It could be provided in an
opt-out manner, but I'm not even sure how I would write code that avoids
the recursive mutex lock problem. I'm open to other ideas on how to solve
this.
Thanks for the read,
-Ross
Post by 'Kenton Varda' via Cap'n Proto
Post by Ross Light
So in this old thread
<https://groups.google.com/d/topic/capnproto/4SLfibQPWFE/discussion>,
it's stated that the "call is received" event requires calling into
application code. From an implementation standpoint, this is declaring
that receiving a call in the RPC system is a critical section that involves
crossing the boundary into application code, which may try to acquire the
same mutex (by making a call on the connection). While you can postpone
this problem by adding queueing, I'm already a little nervous about how
much queueing is required by the protocol.
I'd like to suggest that the E model be considered: each capability is a
separate single queue. Instead of "call A is received happens before call
B is received", "call A returns happens before call B starts". The reason
this simplifies implementation is that it prescribes what ought to
happen in the critical section (enqueue or throw an overload exception),
so no application code needs to be invoked in the critical section. This
might not be a problem for the C++ implementation right now, but once
fibers are involved, I think it would become one.
If I understand what you're proposing correctly, then another way of
saying it is: "A call must return a result before the next call on the same
object can begin."
(To be clear, this certainly isn't "the E model". Strictly speaking, in
E, calls don't "return" anything, but they do eventually resolve a promise,
and there's no requirement that that resolution happens before the next
call can proceed.)
I don't think this proposal would work. You're saying that if a method
call foo() wants to allow other methods to be called before foo() produces
a result, then foo() must produce a result via a callback rather than via
return. foo() would need to take, as one of its parameters, a capability
which it calls with the eventual results.
This would, of course, lead to all the usual "callback hell" problems we
see in async I/O. Over time, we've reached a consensus that the right way
to solve "callback hell" is to use promises. Promises allow us to express
the eventual results of an async function as a return value, which is much
more natural than using callbacks. Also, it makes it much clearer how
errors are to be propagated, and makes it harder to accidentally leak
errors.
So the next logical step would be to introduce a notion of promises into
Cap'n Proto interface specs. Let methods return promises instead of raw
values, and then they are free to interleave as necessary.
But then what would happen? Probably, everyone would declare all of their
methods to return promises, to give themselves flexibility to change their
implementation in the future if needed. In fact, there'd be even more
temptation to do this than there is in C++ and JavaScript today, because
the client of a Cap'n Proto interface already has to treat the result as a
promise for latency and pipelining reasons. So, making a method return a
promise would create no new inconvenience on the client side (because
clients already have to deal with promises either way), and it would create
no inconvenience on the server side (because you can return an
immediately-resolved promise basically just as easily as returning a
value). So, everyone would declare every method to return a promise.
The next step, then, would be to say: OK, since everyone is declaring all
returns to be promises anyway, we're just going to say that it's implied.
All methods actually return promises. You don't need to declare it.
And then... we'd be back to *exactly* where we are today.
Today, the right way to think about Cap'n Proto methods is to say that
all methods return promises.
Post by Ross Light
I believe that the same properties can be obtained by pushing this into
interface definitions: if an interface really wants to declare that
operations can happen in parallel, then there can be a root capability that
creates a capability for each operation. Then the RPC subsystem can know
much more about how much work is being scheduled.
I realize this would be a big change, but I can't see a good way to
avoid this problem in any implementation of the RPC protocol that tries to
use a connection concurrently. Effectively, this forces all
implementations to be single-threaded AFAICT. Let me know what you think.
Cap'n Proto technically only requires that each *object* is
single-threaded. It's based on the actor model, which is actually similar
to the CSP model on which Go's concurrency is based. In fact, maybe the
problem is that we're trying to map Cap'n Proto to the wrong idioms in Go.
Imagine this design: Instead of mapping capnp methods to Go functions, we
map them to messages on a Go channel. Each object reads from a channel.
Each message on the channel initiates a call, and specifies a response
channel to which the call results should be sent when they are ready.
This design would achieve E-Order while allowing overlapping calls and
being idiomatic with Go's concurrency model.
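In sketch form (invented types, just to illustrate the idiom, not an actual go-capnproto2 API):

package main

import (
	"fmt"
	"time"
)

type callMsg struct {
	name     string
	work     time.Duration
	response chan<- string // where this call's results are sent when ready
}

// serveObject receives calls strictly in order (E-Order), but lets them
// overlap: each call resolves its own response channel whenever it finishes.
func serveObject(calls <-chan callMsg) {
	for c := range calls {
		go func(c callMsg) {
			time.Sleep(c.work) // stand-in for long-running work or nested RPCs
			c.response <- c.name + " done"
		}(c)
	}
}

func main() {
	calls := make(chan callMsg)
	go serveObject(calls)

	slow := make(chan string, 1)
	fast := make(chan string, 1)
	calls <- callMsg{name: "a", work: 50 * time.Millisecond, response: slow}
	calls <- callMsg{name: "b", work: time.Millisecond, response: fast}

	// "b" typically resolves before "a" even though it was received second.
	fmt.Println(<-fast, "/", <-slow)
}

The object still receives calls strictly in order, so E-Order is preserved, but the two calls overlap and their results arrive whenever they are ready.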
Thoughts?
=====================
On another note -- getting away from specific languages or models -- I
don't think it's practical, in most use cases, to try to utilize multiple
processor cores to service a single connection. (Note I'm avoiding the word
"threads" here since languages like Go blur the meaning of "threads"
between processor cores vs. coroutines. I'm talking about cores, not about
coroutines.)
The problem is, the connection is implicitly a synchronization point. You
can't have multiple cores literally reading from the same connection at the
same time. So you necessarily have to have one core (at a time) servicing
the connection and then passing data off to other cores. But, the cost of
synchronization and data transfer between cores is likely to outweigh the
benefit in most use cases. You'd only get a benefit if one client is making
multiple CPU-heavy calls concurrently, which isn't all that common. In all
other cases you'd lose performance.
Instead, I think a better model is to balance whole connections across
cores. A client can make multiple connections if they want to utilize
multiple cores. When three-party handoff is implemented, the server will
even be able to instruct the client to open additional connections,
transparently to the app. This way the OS kernel actually knows which
thread any particular message is destined for, and directly schedules that
thread when a message arrives. No extra context switches!
This approach works (e.g. using Cap'n Proto C++) today.
-Kenton