Would it work for the go-capnproto2 RPC system to spawn a new goroutine for
loop, avoiding the deadlock. (There would be no "start the work" phase.)
as "this" or "self"). However, method implementations could probably still
enough.
Post by Ross Light
(Sorry for the long response. I did not have time to make it shorter.)
I see your point about how "a call returns happens before the next call
starts" reaches an undesirable state for applications. I had an inkling
that this could be the case, but hadn't fully experimented to see the
results. However, just on terminology, I'm not sure I agree with your
assessment that objects are single-threaded: because returns can come back
in a different order than the calls arrived, this implies some level of
concurrent (but perhaps not parallel) execution.
As for your idea of mapping Cap'n Proto methods to messages on Go
channels: it shuffles the problem around a bit but doesn't escape this
deadlock issue. In fact, the first draft I had of the RPC system used a
lot more channels, but I found it made the control flow hard to reason
about (though it could still be implemented this way). Let me give you enough
background on how Go concurrency works so that we're on the same page.
*Background*
As you already know, goroutines are scheduled on OS threads. When a
goroutine hits a point at which it is blocked on I/O, it will be removed
from the run queue and the OS thread will be reclaimed for another runnable
goroutine. Because the language patches over the typical thread overhead
and makes it easy for the caller to create new goroutines, the practice is
that all functions block (this avoids the function coloring problem
<http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/>,
similar to how every method ought to be a promise in Cap'n Proto RPC).
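To make the "all functions block" convention concrete, here's a tiny standalone example (invented for illustration, nothing to do with go-capnproto2): the same blocking function is used sequentially or concurrently purely at the caller's discretion.

package main

import (
	"fmt"
	"sync"
	"time"
)

// lookup is an ordinary blocking function; there is no separate async variant
// of it (no function "color").
func lookup(name string) string {
	time.Sleep(10 * time.Millisecond) // stand-in for blocking I/O
	return "result for " + name
}

func main() {
	// Sequential use: just call it and block.
	fmt.Println(lookup("a"))

	// Concurrent use: the caller decides, by spawning goroutines; lookup itself
	// is unchanged.
	var wg sync.WaitGroup
	for _, name := range []string{"b", "c", "d"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			fmt.Println(lookup(name))
		}(name)
	}
	wg.Wait()
}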
There are two flavors of channels: buffered and unbuffered. A buffered
channel can be thought of as a type-safe, goroutine-safe FIFO: senders place
items onto the channel and receivers remove them. Importantly, if you try to
send to a buffered channel that is full, the send blocks until an item is
received. An unbuffered channel can thus be thought of as being perpetually
full: every send blocks until a receiver is ready. Most of the time, using
an unbuffered channel is the correct approach, as it applies backpressure in
the most predictable way. There was a great talk at Gophercon last week
about Understanding Channels
<https://about.sourcegraph.com/go/understanding-channels-kavya-joshi>,
but the video's not yet online.
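Here's a tiny standalone illustration of the two flavors (plain Go, nothing Cap'n Proto specific); the key property for what follows is that a send on an unbuffered channel blocks until someone receives.

package main

import "fmt"

func main() {
	// Buffered: sends succeed until the buffer is full, then they block.
	buf := make(chan int, 2)
	buf <- 1
	buf <- 2 // a third send here would block until someone receives

	// Unbuffered: every send blocks until a receiver is ready, which is what
	// applies backpressure and what the deadlocks below hinge on.
	unbuf := make(chan int)
	go func() { unbuf <- 42 }() // would block forever without the receive below
	fmt.Println(<-buf, <-buf, <-unbuf)
}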
*Problem Statement*
The reproduction case is alarmingly simple and common. Alice, in Vat A,
receives a call foo() from Bob, in Vat B. Alice has a reference to Bob
(either through the call or prior calls). foo() is a call that does the following:
1. Call bar() on Bob
2. Wait for the result
3. Return a value based on computation on the result
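In rough Go shape, foo() looks like the sketch below; every name in it (bobClient, barAnswer, and so on) is an invented stand-in, not the generated go-capnproto2 API.

package main

import (
	"context"
	"fmt"
)

// barAnswer and bobClient are invented stand-ins for a call's future and for a
// capability reference to Bob; they are not the real generated API.
type barAnswer struct{ result chan int }

func (a barAnswer) Wait() int { return <-a.result }

type bobClient struct{}

func (bobClient) Bar(ctx context.Context) barAnswer {
	ans := barAnswer{result: make(chan int, 1)}
	go func() { ans.result <- 21 }() // pretend the RPC to Bob eventually returns 21
	return ans
}

type alice struct{ bob bobClient }

// Foo is the straight-line shape of steps 1-3.
func (a alice) Foo(ctx context.Context) int {
	answer := a.bob.Bar(ctx) // 1. call bar() on Bob
	result := answer.Wait()  // 2. wait for the result
	return result * 2        // 3. return a value based on the result
}

func main() {
	fmt.Println(alice{}.Foo(context.Background())) // 42
}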
Now let's talk broadly about implementation constraints of the RPC system
bridging Vat A and Vat B. There are at least two concurrent (but
definitely not parallel) processes at play: receiving messages from the
wire and sending messages on the wire. Both of these mutate the set of 4
(5) tables, so naturally a mutex protects this data structure. When the
receiver process encounters a Call, it must acquire the mutex, start the
work, then release the mutex. The "start the work" piece is what we're
interested in here.
Let's assume, as you say, that each call comes over an unbuffered
channel. Each capability creates a loop where it receives a new call,
optionally does a bit of processing to "start the work", then sends back a
response. If there's an RPC or longer I/O operation, it spawns a goroutine
that sends back the response so as to avoid blocking that loop.
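As a sketch, with invented types standing in for the real call plumbing, that per-capability loop looks roughly like this:

package main

import "fmt"

type call struct {
	params  int
	results chan int // response channel for this particular call
}

// serveCapability is the per-capability loop: receive a call, do the quick
// "start the work" part inline, and hand anything long-running to a goroutine
// so the loop is free to receive the next call.
func serveCapability(calls <-chan call) {
	for c := range calls {
		// "Start the work": cheap bookkeeping only; nothing here may block on
		// other RPCs.
		go func(c call) {
			// Long-running work (I/O, further RPCs) lives here instead.
			c.results <- c.params * 2
		}(c)
	}
}

func main() {
	calls := make(chan call) // unbuffered, as discussed above
	go serveCapability(calls)

	results := make(chan int, 1)
	calls <- call{params: 21, results: results}
	fmt.Println(<-results) // 42
}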
Here's where the language difference comes into play and provides a
footgun that C++ does not have: Step 2 of foo(). In the C++
implementation, long-running work does have a different function "color"
and returns a future, which you should not wait on (and which it is awkward
to wait on) without starting a concurrent process to finish the rest of the
work. I do have futures in the Go implementation, but instead of exposing
a ".then()" method, they expose a ".wait()" method, since this is much more
idiomatic. In Go, it would be quite easy to accidentally run a
long-running operation inside that "start the work" phase. This is fine if
the long-running work doesn't use RPCs, but if you happen to wait for an
RPC to return, then this deadlocks: the RPC connection is still blocked
servicing the "start the work" phase.
With a channel, you postpone the problem to the next time Bob calls foo():
1. Bob calls foo() on Alice
2. Vat A sends the foo() message to Alice's channel
3. Alice receives the foo() message
4. Vat A releases the mutex on the tables and waits for the next message
to come over the wire from Vat B
5. Alice calls bar() on Bob
6. Vat A acquires the mutex on the tables, adds the question entry to the
table, writes the Call to Vat B, then releases the mutex. Vat A gives a
future back to Alice.
7. Alice starts waiting on the future.
8. Bob calls foo() on Alice again. Call this foo'().
9. Vat A attempts to send the foo'() message to Alice's channel. It
blocks, since Alice is not ready to receive from the channel (it's still
processing foo()).
10. Bob is finished with bar() and attempts to write its return to Vat A.
Vat A is now deadlocked: it can't receive the new wire message because it's
waiting for Alice to receive foo'(), but foo'() cannot be received until
foo() receives bar()'s response.
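The whole sequence can be compressed into a standalone program using nothing but plain channels (all names invented; this is the shape of the problem, not go-capnproto2 code). Running it ends with Go's "fatal error: all goroutines are asleep - deadlock!".

package main

type message struct {
	kind  string        // "call" or "return"
	reply chan struct{} // where bar()'s return should be delivered
}

func main() {
	wireBtoA := make(chan message)        // the wire from Vat B, read by one loop
	alice := make(chan message)           // Alice's unbuffered call channel
	barCalled := make(chan chan struct{}) // how "Bob" learns that bar() was called

	// Vat A's receive loop: the only reader of the wire.
	go func() {
		for m := range wireBtoA {
			switch m.kind {
			case "call":
				alice <- m // step 9: blocks while Alice is still handling foo()
			case "return":
				m.reply <- struct{}{} // never reached
			}
		}
	}()

	// Alice: foo() calls bar() on Bob and waits for the result (steps 5-7).
	go func() {
		for range alice {
			barReturn := make(chan struct{})
			barCalled <- barReturn // "send the bar() call to Vat B"
			<-barReturn            // wait for bar() before finishing foo()
		}
	}()

	// Bob / Vat B writes to the wire strictly in order.
	wireBtoA <- message{kind: "call"}                     // step 1: foo()
	barReturn := <-barCalled                              // Bob receives the bar() call
	wireBtoA <- message{kind: "call"}                     // step 8: foo'()
	wireBtoA <- message{kind: "return", reply: barReturn} // step 10: nobody can read it
}

The receive loop is stuck handing foo'() to Alice, Alice is stuck waiting for bar(), and Bob is stuck writing bar()'s return, exactly as in steps 9 and 10.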
*How this surfaces in go-capnproto2*
As mentioned above, the first few revisions of the Go RPC implementation
used channels in this sort of way under the hood to call methods. However,
it did not wait for the call to be acknowledged, which destroys E-Order. I've
since moved to a mutex-based solution (since it reduces the number of
goroutines that the RPC system keeps around and makes it easier to debug),
but fundamentally server.Ack
<https://godoc.org/zombiezen.com/go/capnproto2/server#Ack> is equivalent
to starting a channel receive.
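For a concrete picture, here's a sketch of where Ack has to go. It assumes a hypothetical schema with Alice.foo() and Bob.bar() compiled by capnpc-go, so the alice/bob packages and their generated identifiers below are illustrative; only server.Ack itself is the real API.

package example

// Sketch only: "alice" and "bob" stand for packages generated by capnpc-go
// from a hypothetical schema; the generated names are made up for this example.
import (
	"zombiezen.com/go/capnproto2/server"

	"example.com/gen/alice" // hypothetical generated package
	"example.com/gen/bob"   // hypothetical generated package
)

type aliceServer struct {
	bob bob.Bob // capability reference to Bob
}

func (s *aliceServer) Foo(call alice.Alice_foo) error {
	// Everything before Ack is the "start the work" phase: blocking on another
	// RPC here is the footgun, because the connection can't deliver bar()'s
	// return while it is still waiting on this acknowledgement.
	server.Ack(call.Options)

	// After Ack, this method may block on further RPCs without holding up
	// delivery of later calls to this object.
	res, err := s.bob.Bar(call.Ctx, nil).Struct()
	if err != nil {
		return err
	}
	call.Results.SetValue(res.Value() * 2)
	return nil
}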
With the current go-capnproto2 implementation, you can't even safely start
a call before the server.Ack call (which starts the work), since the
connection's tables mutex is still held at that point; the fix is to
create a temporary sub-mutex that is propagated via the Context. This is
definitely a necessary change. I'm still a bit scared that there could be a
deadlock: Vat A calls to Vat B to service a request from Vat C while Vat A
calls to Vat C to service a call from Vat B. (Note that this is not
3-party, this is just multiple 2-party connections.) I'm not certain about
this however, and would love to be wrong, since the workaround would seem
to be a process-wide mutex on Cap'n Proto RPC work.
However, there's still the looming footgun here: accidental deadlocks
while starting work. I don't see a way around this in general. And it
bears repeating: this is actually a problem in C++ too, but there it is much
more obvious from looking at the code that it's blocking on something it
shouldn't. Whether the Go implementation uses straight-line code or
channels doesn't make too much of a difference here: there's still
straight-line code in the channel-based approach, but the split between the
deferred work and the initial work requires a bit more work for the
application (spawning a goroutine and ensuring a resolve promise method
gets called, versus just returning). Channels versus functions comes down to
a question of API appearance rather than a functional change.
*Mitigation*
The only guaranteed safe way to mitigate these issues is to make the Go
capabilities act serially to preserve E-Order. It could be provided in an
opt-out manner, but I'm not even sure how I would write code that avoids
the recursive mutex lock problem. I'm open to other ideas on how to solve
this.
Thanks for the read,
-Ross
Post by 'Kenton Varda' via Cap'n Proto
Post by Ross Light
So in this old thread
<https://groups.google.com/d/topic/capnproto/4SLfibQPWFE/discussion>,
it's stated that the "call is received" event requires calling into
application code. From an implementation standpoint, this is declaring
that receiving a call in the RPC system is a critical section that involves
crossing the boundary into application code, which may try to acquire the
same mutex (by making a call on the connection). While you can postpone
this problem by adding queueing, I'm already a little nervous about how
much queueing is required by the protocol.
I'd like to suggest that the E model be considered: each capability is a
separate single queue. Instead of "call A is received happens before call
B is received", "call A returns happens before call B starts". The reason
this simplifies implementation is that it prescribes what ought to
happen in the critical section (enqueue or throw an overload exception),
so no application code needs to be invoked in the critical section. This
might not be a problem for the C++ implementation right now, but once
fibers are involved, I think it would become one.
If I understand what you're proposing correctly, then another way of
saying it is: "A call must return a result before the next call on the same
object can begin."
(To be clear, this certainly isn't "the E model". Strictly speaking, in
E, calls don't "return" anything, but they do eventually resolve a promise,
and there's no requirement that that resolution happens before the next
call can proceed.)
I don't think this proposal would work. You're saying that if a method
call foo() wants to allow other methods to be called before foo() produces
a result, then foo() must produce a result via a callback rather than via
return. foo() would need to take, as one of its parameters, a capability
which it calls with the eventual results.
This would, of course, lead to all the usual "callback hell" problems we
see in async I/O. Over time, we've reached a consensus that the right way
to solve "callback hell" is to use promises. Promises allow us to express
the eventual results of an async function as a return value, which is much
more natural than using callbacks. Also, it makes it much clearer how
errors are to be propagated, and makes it harder to accidentally leak
errors.
So the next logical step would be to introduce a notion of promises into
Cap'n Proto interface specs. Let methods return promises instead of raw
values, and then they are free to interleave as necessary.
But then what would happen? Probably, everyone would declare all of their
methods to return promises, to give themselves flexibility to change their
implementation in the future if needed. In fact, there'd be even more
temptation to do this than there is in C++ and JavaScript today, because
the client of a Cap'n Proto interface already has to treat the result as a
promise for latency and pipelining reasons. So, making a method return a
promise would create no new inconvenience on the client side (because
clients already have to deal with promises either way), and it would create
no inconvenience on the server side (because you can return an
immediately-resolved promise basically just as easily as returning a
value). So, everyone would declare every method to return a promise.
The next step, then, would be to say: OK, since everyone is declaring all
returns to be promises anyway, we're just going to say that it's implied.
All methods actually return promises. You don't need to declare it.
And then... we'd be back to *exactly* where we are today.
Today, the right way to think about Cap'n Proto methods is to say that
all methods return promises.
Post by Ross Light
I believe that the same properties can be obtained by pushing this into
interface definitions: if an interface really wants to declare that
operations can happen in parallel, then there can be a root capability that
creates a capability for each operation. Then the RPC subsystem can know
much more about how much work is being scheduled.
I realize this would be a big change, but I can't see a good way to
avoid this problem in any implementation of the RPC protocol that tries to
use a connection concurrently. Effectively, this forces all
implementations to be single-threaded AFAICT. Let me know what you think.
Cap'n Proto technically only requires that each *object* is
single-threaded. It's based on the actor model, which is actually similar
to the CSP model on which Go's concurrency is based. In fact, maybe the
problem is that we're trying to map Cap'n Proto to the wrong idioms in Go.
Imagine this design: Instead of mapping capnp methods to Go functions, we
map them to messages on a Go channel. Each object reads from a channel.
Each message on the channel initiates a call, and specifies a response
channel to which the call results should be sent when they are ready.
This design would achieve E-Order while allowing overlapping calls and
being idiomatic with Go's concurrency model.
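In sketch form (invented types, just to illustrate the idiom, not an actual go-capnproto2 API):

package main

import (
	"fmt"
	"time"
)

type callMsg struct {
	name     string
	work     time.Duration
	response chan<- string // where this call's results are sent when ready
}

// serveObject receives calls strictly in order (E-Order), but lets them
// overlap: each call resolves its own response channel whenever it finishes.
func serveObject(calls <-chan callMsg) {
	for c := range calls {
		go func(c callMsg) {
			time.Sleep(c.work) // stand-in for long-running work or nested RPCs
			c.response <- c.name + " done"
		}(c)
	}
}

func main() {
	calls := make(chan callMsg)
	go serveObject(calls)

	slow := make(chan string, 1)
	fast := make(chan string, 1)
	calls <- callMsg{name: "a", work: 50 * time.Millisecond, response: slow}
	calls <- callMsg{name: "b", work: time.Millisecond, response: fast}

	// "b" typically resolves before "a" even though it was received second.
	fmt.Println(<-fast, "/", <-slow)
}

The object still receives calls strictly in order, so E-Order is preserved, but the two calls overlap and their results arrive whenever they are ready.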
Thoughts?
=====================
On another note -- getting away from specific languages or models -- I
don't think it's practical, in most use cases, to try to utilize multiple
processor cores to service a single connection. (Note I'm avoiding the word
"threads" here since languages like Go blur the meaning of "threads"
between processor cores vs. coroutines. I'm talking about cores, not about
coroutines.)
The problem is, the connection is implicitly a synchronization point. You
can't have multiple cores literally reading from the same connection at the
same time. So you necessarily have to have one core (at a time) servicing
the connection and then passing data off to other cores. But, the cost of
synchronization and data transfer between cores is likely to outweigh the
benefit in most use cases. You'd only get a benefit if one client is making
multiple CPU-heavy calls concurrently, which isn't all that common. In all
other cases you'd lose performance.
Instead, I think a better model is to balance whole connections across
cores. A client can make multiple connections if they want to utilize
multiple cores. When three-party handoff is implemented, the server will
even be able to instruct the client to open additional connections,
transparently to the app. This way the OS kernel actually knows which
thread any particular message is destined for, and directly schedules that
thread when a message arrives. No extra context switches!
This approach works (e.g. using Cap'n Proto C++) today.
-Kenton