Post by 'Kenton Varda' via Cap'n Proto
Post by b***@gmail.com
This makes sense in the general case, but I'm using Cap'n Proto for IPC on a single device, do not need to load balance, etc.
Ohhh, this simplifies things considerably. For local IPC, you can
absolutely rely on the kernel to tell you if the service crashes. You will
always get a DISCONNECTED exception immediately. So you only need to catch
those and retry. You don't need to use timeouts.
Only for Unix sockets, though. For TCP, the first syscall after the disconnect always succeeds, and only then is the RST packet received; the next syscall returns "connection reset". Luckily, Cap'n Proto RPC issues at least two writeMessage() calls to the socket per request (call and finish) and also tries to read responses, so it seems there are enough actual syscalls to detect the disconnect.
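So catch-and-retry could look something like this (a sketch with hypothetical helpers: makeRequest() rebuilds the request against the current client, and reconnect() re-establishes the connection and bootstrap after a crash):

  kj::Promise<void> callWithOneRetry() {
    return makeRequest().send().ignoreResult()
        .catch_([](kj::Exception&& e) -> kj::Promise<void> {
      if (e.getType() != kj::Exception::Type::DISCONNECTED) {
        return kj::mv(e);  // not a crash; propagate the exception
      }
      // App-specific: open a new connection and get a new bootstrap.
      return reconnect().then([]() {
        return makeRequest().send().ignoreResult();  // retry once
      });
    });
  }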
Post by 'Kenton Varda' via Cap'n Proto
Post by b***@gmail.com
There are mentions of persistent capabilities and "sturdy refs" in the docs and sources, but I've not figured out what exactly they are, how to use them, and whether they would be any help at all in this case.
SturdyRefs are sort of a "design pattern". It's not something built into
the library, since implementing them requires deep knowledge of
higher-level details. The idea is that you can call save() on a capability
and receive back some sort of token that you can use to get that capability
again in the future. But, what those tokens should look like and how
exactly you restore them is not specified by Cap'n Proto.
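As a concrete illustration (all names here are hypothetical, not library API; Cap'n Proto does ship a generic Persistent interface in persistent.capnp, but the token format and the restore mechanism are up to the application):

  // Suppose the application schema defines:
  //   interface Restorer {
  //     restore @0 (sturdyRef :Data) -> (cap :Capability);
  //   }
  // and `thing` implements a save() method returning a sturdyRef.
  auto saveResponse = thing.saveRequest().send().wait(waitScope);
  kj::Array<kj::byte> token = kj::heapArray(saveResponse.getSturdyRef());  // copy out
  // ...later, perhaps after a restart, exchange the token for the capability:
  auto restoreReq = restorer.restoreRequest();
  restoreReq.setSturdyRef(token);
  capnp::Capability::Client restored = restoreReq.send().getCap();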
Thanks! This looks useful for security stuff but not a concern for me
currently.
Post by 'Kenton Varda' via Cap'n Proto
Post by b***@gmail.com
To handle disconnects I need to attach catch_() to every request send call, which tries to reconnect (and bootstrap, etc.) on error and repeats the call. I've tried to create a generic call wrapper function, but there is a problem with it: it is impossible to store requests or to send the same request twice, so the only option is to capture the whole request-generating code in a lambda and repeat that. I need to copy all the data into the lambda and rebuild the request every time before sending. This looks kind of awkward and inefficient. You can see an attempt in the code I posted above. Maybe I've missed some lower-level API that could help with that? Does repeating the same request from the application level make sense, or am I trying to do something silly here?
I generally recommend catching the DISCONNECTED exception somewhere
higher-level in your program, not at every single call site. I think this
is where your complexity is coming from. Try to think about what is the
overall operation you are performing (which may consist of a sequence of
multiple calls). Add an exception handler to the overall operation which
restarts the whole thing.
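For example (a sketch; doOverallOperation() is a hypothetical function that rebuilds all of its requests from scratch each time it runs):

  kj::Promise<void> runWithRetry(uint retriesLeft) {
    return doOverallOperation().catch_(
        [retriesLeft](kj::Exception&& e) -> kj::Promise<void> {
      if (e.getType() == kj::Exception::Type::DISCONNECTED && retriesLeft > 0) {
        return runWithRetry(retriesLeft - 1);  // restart the whole operation
      }
      return kj::mv(e);  // anything else propagates
    });
  }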
Thanks for the suggestion! I've thought about this, but my use case is very basic: I'm just gathering some data in one process and calling onSomeEvent(data) or onSomeOtherEvent(data) in the other. I have multiple processes which communicate like that and can potentially crash or restart at any time. Since I need to repeat requests, I've not managed to think of anything better than wrapping the whole request generation in a lambda so far. Maybe for this particular scenario RPC is overkill, but I thought it would be useful later. Better to start with a simpler case.
Post by 'Kenton Varda' via Cap'n Proto
FWIW, here's a utility class from the Sandstorm codebase that helps with this:
https://github.com/sandstorm-io/sandstorm/blob/master/src/sandstorm/util.h#L426-L461
It creates a capability which proxies calls, but when any call fails with DISCONNECTED, it invokes a reconnect callback and blocks subsequent calls until the reconnect completes. However, note that it does not automatically
retry the call which threw DISCONNECTED, since only the application knows
what kinds of calls are safe to retry. So the app still needs to catch and
retry, but at least the exception handler doesn't need to figure out for
itself how to reconnect.
Thank you! This is much closer to the solution I've envisioned. It also demonstrates advanced capability APIs. I've tried to integrate this CapRedirector, and I'm having some problems.
I'm using it like this:

  client = capnp::Capability::Client{kj::heap<CapRedirector>([this]() {
    // Reconnect callback: establish a fresh connection and hand back a new
    // bootstrap capability for CapRedirector to proxy to.
    return networkAddress->connect().then(
        [this](kj::Own<kj::AsyncIoStream>&& stream) {
      KJ_DBG("connected");
      connection = kj::mv(stream);
      twoPartyClient = kj::heap<capnp::TwoPartyClient>(*connection);
      return twoPartyClient->bootstrap();
    });
  })}.castAs<typename T::Client>();
If the server is not restarted immediately after a call fails with the DISCONNECTED exception, the next error will be "Connection refused", which is expected. But even after the server is started again, the next call also fails with the same "Connection refused" exception, so the first call after the restart is also lost. This is because the reconnect attempt happens in the exception handler.
I've tried to adapt this for my use with a queuing capability and reconnection, but I could not figure out how to use CapRedirector's second constructor form for experiments. The main problem is that I have nothing to call setTarget() on after I create a client from it, because the capability client's constructor takes Own<Server>&&, so my old reference becomes invalid. But I need to create the client instance to make requests. Can you advise?
Also, I could not quite figure out what problem the iteration count and dummy ping are solving.
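Or could one keep a plain reference before moving ownership into the client, something like this (assuming the default constructor and setTarget() from the Sandstorm header above; MyInterface is a placeholder)?

  auto redirector = kj::heap<CapRedirector>();
  CapRedirector& ref = *redirector;  // borrow before ownership moves
  auto client = capnp::Capability::Client(kj::mv(redirector))
                    .castAs<MyInterface::Client>();
  // The client (and its copies) keep the object alive, so `ref` stays valid:
  ref.setTarget(newTarget);

I'm not sure whether that's the intended use.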
Post by 'Kenton Varda' via Cap'n Proto
This utility should probably be moved to the Cap'n Proto library at some
point. I think there's also room for some utility code for automatically
retrying an operation on DISCONNECTED exceptions.
-Kenton
Post by b***@gmail.com
Is there some way to restore a broken connection without recreating all objects anew and losing all the state? In theory, after the in-progress calls fail, there should be no leftover data to read or write. The AsyncIoStream is just a wrapper around a file descriptor. Does it matter to the code above that it was replaced? Or is the only option here to never let the Cap'n Proto RPC system see the broken connection, by using something like ZeroMQ sockets underneath?
The hard part is not restoring the connection, but rather restoring the
server-side state. If your server crashed, then all the capabilities you
had on the connection point to objects that no longer exist. You need to
instruct the server on how to rebuild those. Hence you have to start fresh.
Since you said your use case is IPC, the only circumstance where you will
get disconnected is if the server crashed. So, some sort of approach that
allows restoring an existing session wouldn't help, because the server-side
state is gone.
In theory you could design a proxy/membrane that records every call that
returned a capability, so that it can replay them after disconnect in order
to reconstruct the same capabilities. However, whether or not this actually
works depends on the application -- in some cases, replaying the exact same
call sequence may be the wrong thing to do. For example, say the first time
you connected, you made a call to create a new file with some name. On
subsequent connections, you want to re-open the existing file rather than
create a new one. This is why it's not really possible for Cap'n Proto to
automate restoring connections...
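A rough sketch of what such a recording proxy might look like (illustrative names only; nothing like this exists in the library):

  // Record each capability-derivation step so it can be replayed against a
  // fresh bootstrap after a reconnect.
  class ReplayLog {
  public:
    using Step = kj::Function<capnp::Capability::Client(capnp::Capability::Client)>;

    capnp::Capability::Client record(Step step, capnp::Capability::Client bootstrap) {
      auto cap = step(bootstrap);  // derive the capability now...
      steps.add(kj::mv(step));     // ...and remember how it was obtained
      return cap;
    }

    void replay(capnp::Capability::Client freshBootstrap) {
      for (auto& step: steps) {
        step(freshBootstrap);  // re-derive against the new connection, e.g.
      }                        // feeding each result to a CapRedirector
    }

  private:
    kj::Vector<Step> steps;
  };

But again, whether blind replay is correct depends entirely on the application, as in the file-creation example above.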
-Kenton
I hadn't really considered the passed capabilities or server-side state. In my case repeating the last call is not a problem, but in the general case you are absolutely right. Maybe solving it automatically is not feasible, but some additional facilities in the library, such as CapRedirector, would definitely help.