[capnproto] Self-contained structures in C++

Discussion:

Nathan Hourt

2016-03-03 23:34:30 UTC

Hi, all. I'm looking for a way to have a capnp data structure which has
standard value semantics (move, copy, etc), but owns its storage (i.e. is
self-contained), which will be deallocated when the object is destroyed.
Basically, I want it to act like a kj::Array: I can use it as-is, I can
move it around between functions, but when it gets destroyed, its storage
goes with it.

I've thought about something like std::pair<MallocMessageBuilder,
Foo::Builder> where the builder references the MallocMessageBuilder, but
doing that requires separate APIs for self-contained builders vs. normal
builders which reference storage elsewhere. I can easily create my own
solution (one trivial solution would be
std::pair<kj::Maybe<MallocMessageBuilder>, Foo::Builder>), but if kj or
capnp already has something for this, I might as well use that.

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+***@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

Kenton Varda

2016-03-05 01:58:18 UTC

I think you want something like:

template <typename T>
class RcBuilder {
  // Refcounted builder.

public:
  typename T::Builder* operator->() { return &builder; }

  template <typename U>
  RcBuilder<capnp::FromBuilder<U>> child(U subBuilder) {
    // Given `subBuilder` which is a child of this object, return a new
    // RcBuilder wrapper that also holds a refcount.

    return RcBuilder<capnp::FromBuilder<U>>(kj::addRef(*message),
                                            subBuilder);
  }

private:
  class RefcountedMallocMessageBuilder: public kj::Refcounted,
                                        public capnp::MallocMessageBuilder {};

  kj::Own<RefcountedMallocMessageBuilder> message;
  typename T::Builder builder;
};

(Above is incomplete, but you get the idea.)

Note that any reference to any part of the message of course causes the
entire message to remain resident in memory. If that's a problem, you'll
need to copy the sub-object into a new MessageBuilder and wrap that.
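
The ownership pattern described above can be tried without Cap'n Proto at all. What follows is only a minimal sketch under stated assumptions: `std::shared_ptr` stands in for `kj::Own` plus `kj::Refcounted`, a hypothetical `Message` struct stands in for `MallocMessageBuilder`, and `RcView`, `root`, `child` and `ownerCount` are illustrative names, not capnp or kj API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message that owns all of the storage.
struct Message {
  std::vector<int> words;
};

// A view of part of a Message that keeps the whole Message alive.
class RcView {
public:
  // Allocate a fresh message and return a view covering all of it.
  static RcView root(std::size_t size) {
    auto message = std::make_shared<Message>();
    message->words.resize(size);
    return RcView(std::move(message), 0, size);
  }

  // Return a view of a sub-range that shares ownership of the message.
  RcView child(std::size_t offset, std::size_t length) const {
    assert(offset + length <= length_);
    return RcView(message_, begin_ + offset, length);
  }

  int& operator[](std::size_t i) {
    assert(i < length_);
    return message_->words[begin_ + i];
  }

  std::size_t size() const { return length_; }

  // Number of views currently keeping the message alive.
  long ownerCount() const { return message_.use_count(); }

private:
  RcView(std::shared_ptr<Message> message, std::size_t begin, std::size_t length)
      : message_(std::move(message)), begin_(begin), length_(length) {}

  std::shared_ptr<Message> message_;
  std::size_t begin_;
  std::size_t length_;
};
```

A child view obtained from `parent.child(2, 4)` stays valid after `parent` is destroyed, which also illustrates the caveat: the whole message stays resident for as long as any view of it exists.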

-Kenton

Branislav Katreniak

2016-03-11 14:59:41 UTC

Hi Kenton / all

I am evaluating capnproto and I love it! But I am worried about the lack of
self contained classes.

I would like to discuss how to extend capnpc-c++ to generate not only
Reader and Builder classes but also the self contained class.
I will call it Mutable for now.

When a Mutable is constructed from a Reader, a Builder or another Mutable,
all data is copied into one new segment owned by the Mutable.
This makes construction quite fast, without many allocations.
The price is that this initial segment is never released as the Mutable is
modified, but that cost is well bounded.

When a Mutable is modified and the modification needs to allocate memory, it
allocates a new segment for each chunk.
These separate segments can easily be released when they are disowned.

Would it be possible to extend List encoding to hold both size and capacity?

Then Mutable could be just Builder subclass owning MutableMessageBuilder.

Does it make sense?

Brano

Kenton Varda

2016-03-11 19:17:54 UTC

Hi Branislav,

I have actually been planning for a while to add "POCS support" ("Plain Old
C++ Struct"), which is slightly different from what you describe but solves
similar problems.

Currently for a struct Foo you get Foo::Builder and Foo::Reader. The type
`Foo` itself is a "namespace struct"; it only exists to contain the other
names. My plan is to make `Foo` actually be a plain-old C++ struct matching
the declared type. It would have a constructor that copies from a Reader
and a method for copying itself into a Builder.

Using the POCS would of course entail a copy and some allocation, but the
benefit is now you have a data structure that can be used in the ways that
we're all used to using C++. It can even be mutated arbitrarily without the
memory leaking problem. People would likely use POCS in
non-performance-sensitive cases and then could use the zero-copy APIs when
performance really matters.
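
To make the shape of the idea concrete, here is a rough sketch of what such a struct could look like. It is not generated code and not an existing capnp feature: `FooReader` and `FooBuilder` are hypothetical stand-ins for `Foo::Reader` and `Foo::Builder`, the fields `id` and `name` are invented for illustration, and `copyTo` is an assumed name for the "copy itself into a Builder" method.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generated Reader. A real Reader would point
// into message segments instead of holding plain members.
struct FooReader {
  std::int32_t id;
  const char* name;
  std::int32_t getId() const { return id; }
  const char* getName() const { return name; }
};

// Hypothetical stand-in for a generated Builder.
struct FooBuilder {
  std::int32_t id = 0;
  std::string name;
  void setId(std::int32_t value) { id = value; }
  void setName(const std::string& value) { name = value; }
};

// The plain old C++ struct: it owns its data and exposes ordinary fields.
struct Foo {
  std::int32_t id = 0;
  std::string name;

  Foo() = default;

  // Copy everything out of the reader (this is where the allocation happens).
  explicit Foo(const FooReader& reader) : id(reader.getId()) {
    assert(reader.getName() != nullptr);
    name = reader.getName();
  }

  // Copy the current state into a builder for serialization.
  void copyTo(FooBuilder& builder) const {
    builder.setId(id);
    builder.setName(name);
  }
};
```

After `Foo foo(reader);` the struct can be mutated freely (`foo.name += "!";`) and written back with `foo.copyTo(builder);`, at the cost of the copy and the string allocation.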

There is a slight difference between what I'm proposing and what you're
proposing in that a POCS would do more allocation of sub-objects since it's
not using Cap'n Proto format at all, but I think that's the right choice
since when performance matters people can optimize based on the zero-copy
classes.

Thoughts?

-Kenton

Nathan Hourt

2016-03-11 19:36:04 UTC

I like that design. Having a POCS (which easily copies to/from a
builder/reader) which I can use in my frontend code would be more
convenient. The frontend isn't very performance sensitive, and it's a bit
trickier to convince Qt to play nicely with the builders and readers.

Nathan Hourt

*The Truth will set you free*

Branislav Katreniak

2016-03-11 20:41:45 UTC

A POCS class would definitely solve my problem functionally, and if it was
available now, I would likely not even think about this.

I must say that I have had bad experiences with parsing JSON messages into
Variants. There were so many allocations that it became a bottleneck. A POCS
class can be much better, depending on what is in the message. If somebody
encodes Hash<Text, Text>, it is a lot of allocations. But I have a few
reasons against a simple POCS class:
* it will drop any information that is encoded in newer version of
protocol.
* kj::String allocates even small strings on heap. That is costly. Small
size optimization in kj::String would solve this.
* POCS means another chunk of code generated per capnproto class.

My idea can be realized much simpler. If arena can track how much memory is
used and how much is leaked, it can rebuild message from scratch if it
wastes too much memory. That solves the whole problem of leaking. Arena can
also start to reuse memory.

The remaining problem is to find the right moment when the message can be
rebuilt. Arena would need to track all Readers and Builders. Then it is
possible to implement different strategies by providing different
MessageBuilder implementation:
* ensure (at runtime) that no readers exist when a builder is active and
rebuild the message when the writer exits.
* ref counted arena - either thread safe locked or single threaded without
locking. Qt-like implicit sharing (copy on write) is possible.
* Sub-Mutable can reference the parent arena if it is bigger than some ratio
of the whole arena. Otherwise it makes a deep copy.

API wise ... can the Mutable class be a superset of the POCS class? I think so.

But this approach really needs extension to List encoding to hold both size
and capacity.

Is it sane?

Brano

Kenton Varda

2016-03-11 21:06:11 UTC

Post by Branislav Katreniak
POCS class would definitely solve my problem functionally and if it was
available now, I would likely not even think about this.
I must say that I have bad experience with parsing json messages into
Variants. That was so many allocations that it became bottleneck. POCS
class can be much better, depending on what is in the message. If somebody
encodes Hash<Text, Text>, it is a lot of allocations. But I have few
reasons against simple POCS class:
* it will drop any information that is encoded in newer version of
protocol.

FWIW, I would definitely design this such that it preserves "unknown
fields".

Post by Branislav Katreniak
* kj::String allocates even small strings on heap. That is costly. Small
size optimization in kj::String would solve this.

Hmm, "small size optimization" (inlining small strings into the struct)
would change semantics in that any pointers into a kj::String would be
invalidated when the string is moved. We'd probably need to introduce a new
type, but that might not be so bad.
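
The invalidation issue can be shown with a toy string type. This is a minimal sketch, not kj code: `SmallString` is a hypothetical class with a 16-byte inline buffer, and `cStr` is just an illustrative accessor name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical string that stores short values inline and long ones on the heap.
class SmallString {
public:
  explicit SmallString(const char* text) : size_(0) {
    assert(text != nullptr);
    size_ = std::strlen(text);
    if (size_ < sizeof(inline_)) {
      std::memcpy(inline_, text, size_ + 1);
    } else {
      heap_.reset(new char[size_ + 1]);
      std::memcpy(heap_.get(), text, size_ + 1);
    }
  }

  SmallString(SmallString&& other) noexcept
      : size_(other.size_), heap_(std::move(other.heap_)) {
    // Inline contents must be copied byte by byte, so they change address.
    if (!heap_) std::memcpy(inline_, other.inline_, size_ + 1);
    other.size_ = 0;
    other.inline_[0] = '\0';
  }

  // Pointer to the NUL-terminated contents.
  const char* cStr() const { return heap_ ? heap_.get() : inline_; }

private:
  std::size_t size_;
  char inline_[16];
  std::unique_ptr<char[]> heap_;
};
```

A pointer taken from `cStr()` on a heap-backed value still points at the same characters after the string is moved, because only the owning pointer changes hands. For an inline value the characters live inside the object itself, so the moved-to string reports a different address and the old pointer no longer refers to the live contents.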

Post by Branislav Katreniak
* POCS means another chunk of code generated per capnproto class.

That is a real problem, yes.

Post by Branislav Katreniak
My idea can be realized much simpler. If arena can track how much memory
is used and how much is leaked, it can rebuild message from scratch if it
wastes too much memory. That solves the whole problem of leaking. Arena can
also start to reuse memory.
The remaining problem is to find the right moment when the message can be
rebuilt. Arena would need to track all Readers and Builders.

My intuition is that tracking all readers and builders would be way too
expensive, and you'll end up with a net loss here. Currently these classes
are trivially copyable; to track them you'd need every
constructor/destructor call to manipulate a linked list, and you'd need to
store extra pointers.
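
The cost difference can be made visible with type traits. The sketch below is an illustration only: `Arena`, `PlainView` and `TrackedView` are hypothetical types, and a simple counter is used as the cheapest imaginable form of tracking (a linked list, as mentioned above, would cost more per copy).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical arena that wants to know how many views refer to it.
struct Arena {
  int liveViews = 0;
};

// Untracked view: just pointers, so copying is a plain memory copy.
struct PlainView {
  Arena* arena;
  const void* data;
};

// Tracked view: every construction, copy and destruction touches the arena.
class TrackedView {
public:
  explicit TrackedView(Arena& arena) : arena_(&arena) { ++arena_->liveViews; }
  TrackedView(const TrackedView& other) : arena_(other.arena_) { ++arena_->liveViews; }
  TrackedView& operator=(const TrackedView& other) {
    ++other.arena_->liveViews;
    --arena_->liveViews;
    arena_ = other.arena_;
    return *this;
  }
  ~TrackedView() {
    assert(arena_->liveViews > 0);
    --arena_->liveViews;
  }

private:
  Arena* arena_;
};

static_assert(std::is_trivially_copyable<PlainView>::value,
              "the untracked view can be copied with memcpy");
static_assert(!std::is_trivially_copyable<TrackedView>::value,
              "tracking forces user-written copy and destroy logic");
```

Once the copy constructor and destructor are user-provided, the compiler can no longer treat the view as a bag of bytes, which is the overhead being weighed here.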

Post by Branislav Katreniak
Then it is possible to implement different strategies by providing
different MessageBuilder implementation:
* ensure (runtime) that no readers exists when builder is active and
rebuild message when writer exits.
* ref counted arena - either thread safe locked or single threaded without
locking. Qt-like implicit sharing (copy on write) is possible.
* Sub-Mutable can reference parent arena if it is bigger than some ratio
of whole arena. Otherwise it makes deep copy.
API wise ... can Mutable class be superset of POCS class? I think so.
But this approach really needs extension to List encoding to hold both
size and capacity.
Is it sane?

I think what you'll end up with is another API that's maybe somewhat easier
to manipulate than Builders today but still much harder to manipulate than
POCS. I think we will still want the POCS API for maximum ease of use, in
which case I am not sure this in-between API adds a lot of value. Meanwhile
it sounds pretty complex to implement. I'm also not convinced it would
perform better, given the extra bookkeeping needed. E.g. reusing memory
would require writing something very much like malloc(), so I wouldn't
expect it to perform better than malloc(). Doing GC/compaction adds lots of
complexity and may have its own performance issues.

-Kenton

Branislav Katreniak

2016-03-14 08:17:33 UTC

Post by Kenton Varda
I think what you'll end up with is another API that's maybe somewhat
easier to manipulate than Builders today but still much harder to
manipulate than POCS. I think we will still want the POCS API for maximum
ease of use, in which case I am not sure this in-between API adds a lot of
value. Meanwhile it sounds pretty complex to implement. I'm also not
convinced it would perform better, given the extra bookkeeping needed. E.g.
reusing memory would require writing something very much like malloc(), so
I wouldn't expect it to perform better than malloc(). Doing GC/compaction
adds lots of complexity and may have its own performance issues.

You are totally right. It was a stupid idea. Thank you for naming it so
clearly!

Do you already have ideas for the design of the POCS classes?
* What about integration with std classes, or KJ style? Nothrow move
constructors? Disabled copy constructor? Using std::string and std::vector
for strings and lists?
* How do you plan to preserve "unknown fields"? Reuse the capnp encoding as
the internal representation for everything except pointers? That could lead
to high code reuse and fast serialization.

I wonder how capnp is typically used within a bigger project without POCS
classes. Do you use capnp only on the application edge for RPC and hold
the state in a different set of internal classes? Do you do the memory
management explicitly by copying and passing MallocMessageBuilder instances
around?

Kind regards
Brano

Kenton Varda

2016-03-18 20:29:52 UTC

Post by Branislav Katreniak
Do you have already ideas for POCS classes design?
* What about integration with std classes / or KJ style? Nothrow move
constructors? Disabled copy constructor? Using std::string, std::vector for
strings and lists?

Naturally I'd prefer KJ over std. :) Probably text fields will be
kj::String and lists will be kj::Vector.

I'm fine with marking move constructors nothrow (though I think std's
insistence on this is misguided).

I think I would disable the copy constructor, but provide a .clone().
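
As a rough illustration of those three choices together, here is a sketch of a move-only struct with an explicit `clone()`. The type `Person` and its fields are invented for the example, and `std::string` / `std::vector` stand in for `kj::String` / `kj::Vector` only so that the snippet compiles without KJ.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical generated struct: movable, not implicitly copyable.
struct Person {
  std::string name;
  std::vector<std::string> emails;

  Person() = default;

  // Moves are cheap and declared non-throwing.
  Person(Person&&) noexcept = default;
  Person& operator=(Person&&) = default;

  // Implicit copies are disabled so that deep copies are always visible.
  Person(const Person&) = delete;
  Person& operator=(const Person&) = delete;

  // Explicit deep copy.
  Person clone() const {
    Person copy;
    copy.name = name;
    copy.emails = emails;
    return copy;
  }
};

static_assert(!std::is_copy_constructible<Person>::value,
              "copying must go through clone()");
static_assert(std::is_nothrow_move_constructible<Person>::value,
              "move constructor is nothrow");
```

With this shape, `Person b = a;` fails to compile, while `Person b = a.clone();` and `Person c = std::move(a);` both work, so every deep copy is spelled out at the call site.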

Post by Branislav Katreniak
* How do you plan to preserve "unknown fields"? Reuse capnp encoding for
internal representation for everything except pointers? That could lead to
high code reuse and fast serialization.

It would be cool to lay out the data fields such that they can be memcpy()d
from the serialized format, although I was also thinking it would be nice
if you didn't have to go through accessor methods and instead these fields
were simply public member variables. I don't think these two things are
compatible, due to endianness issues. Will have to think about which is
better.

Unknown fields would need to be preserved in separate arrays of bytes and
pointers (for the respective sections). I guess preserving data fields is
tricky if we don't use the memcpy()able layout since they could be
interleaved with the known fields. To preserve a pointer, we'd actually
serialize it and its target in "flat" format and store a word array. We
should of course optimize for the case that any extra fields are zero/null
and so need not be preserved.

Post by Branislav Katreniak
I wonder how capnp is typically used within bigger project without POCS
classes. How you use capn only on application edge for RPC and do you hold
the state in different set of internal classes? Do you do the memory
management explicitly by copying & passing MallocMessageBuilder instances
around?

Well, I'm probably personally the largest-scale user of Cap'n Proto (within
Sandstorm.io). I use a variety of styles depending on the situation. Simple
data (with only a couple fields) is easy to translate into a struct
internally. For complicated data I will sometimes copy into a
MallocMessageBuilder, yes. Though I'd say most of the time our RPC calls
consume their data directly and produce their results directly, without
really copying it elsewhere. Still, the API is wonky and can get
cumbersome, making me personally excited about the POCS solution.

-Kenton

'Geoffrey Romer' via Cap'n Proto

2016-03-18 20:53:06 UTC

Post by Kenton Varda

It would be cool to lay out the data fields such that they can be memcpy()d
from the serialized format, although I was also thinking it would be nice
if you didn't have to go through accessor methods and instead these fields
were simply public member variables. I don't think these two things are
compatible, due to endianness issues. Will have to think about which is
better.

It might be possible to do both at once, by making those public member
variables special types that used little-endian representation internally,
regardless of the host endianness, while providing a native-integer-like
interface through operator overloads etc. See e.g. the Boost.Endian arithmetic
types <http://www.boost.org/doc/libs/1_60_0/libs/endian/doc/arithmetic.html>.
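
For readers unfamiliar with the technique, the following is a minimal hand-rolled sketch of such a type. It is not the Boost.Endian API: `LittleEndian<T>` and its `raw()` accessor are hypothetical names, and it is written for non-bool integer types only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stores an integer as little-endian bytes regardless of host byte order,
// while converting to and from the native integer type implicitly.
template <typename T>
class LittleEndian {
  static_assert(std::is_integral<T>::value, "integer types only");
  using U = typename std::make_unsigned<T>::type;

public:
  LittleEndian() : bytes_{} {}
  LittleEndian(T value) { store(value); }
  LittleEndian& operator=(T value) {
    store(value);
    return *this;
  }

  // Reassemble the native value from the stored bytes.
  operator T() const {
    U result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      result = static_cast<U>(result | (static_cast<U>(bytes_[i]) << (8 * i)));
    }
    return static_cast<T>(result);
  }

  // The bytes exactly as they would appear in a little-endian wire format.
  const unsigned char* raw() const { return bytes_; }

private:
  void store(T value) {
    U bits = static_cast<U>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      bytes_[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
    }
  }

  unsigned char bytes_[sizeof(T)];
};
```

Because the object is nothing but its bytes, a struct made of such members could in principle be filled by memcpy() from serialized data, while code using it still writes `x = 5;` and `int y = x;`.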

Kenton Varda

2016-03-18 22:45:05 UTC

Post by 'Geoffrey Romer' via Cap'n Proto
It might be possible to do both at once, by making those public member
variables special types that used little-endian representation internally,
regardless of the host endianness, while providing a native-integer-like
interface through operator overloads etc. See e.g. the Boost.Endian arithmetic
types
<http://www.boost.org/doc/libs/1_60_0/libs/endian/doc/arithmetic.html>.

Right, proxy types. My beef with them is that they produce surprising
results when combined with type inference.

    auto foo = myStruct.bar;
    // Turns out `foo` now has type WeirdProxyThing<int>.

    int& iref = myStruct.bar;
    // Unexpectedly doesn't work.

    long n = myStruct.bar;
    someFuncOverloadedOnEveryIntType(myStruct.bar);
    // It's hard to make sure that neither of the above lines complains
    // about ambiguity.

At one time I was thinking about proposing a C++ language change that would
allow you to mark a proxy type as "always convert to type T when copying"
-- and hence `foo`'s type above would be inferred as `int`, and ambiguity
is solved by treating the type as `int`. References are still a problem,
though.
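
The first two surprises can be checked mechanically. This is a small sketch with a hypothetical `Proxy<T>` (a plain wrapper with an implicit conversion, standing in for any proxy type); `MyStruct` and `autoDeducesProxy` are illustrative names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical proxy: behaves like T through implicit conversions.
template <typename T>
class Proxy {
public:
  Proxy(T value = T()) : value_(value) {}
  Proxy& operator=(T value) {
    value_ = value;
    return *this;
  }
  operator T() const { return value_; }

private:
  T value_;
};

struct MyStruct {
  Proxy<int> bar;
};

// `auto` copies the proxy object; it does not deduce the underlying int.
inline bool autoDeducesProxy() {
  MyStruct myStruct;
  auto foo = myStruct.bar;
  return std::is_same<decltype(foo), Proxy<int>>::value;
}

// A non-const int reference cannot bind to the proxy.
static_assert(!std::is_convertible<Proxy<int>&, int&>::value,
              "int& cannot refer to a proxied field");

// Converting to a wider integer still works through the int conversion.
static_assert(std::is_convertible<Proxy<int>, long>::value,
              "long n = myStruct.bar; compiles");
```

The overload-ambiguity case depends on which overloads exist, so it is left out of this sketch.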

Another problem with trying to lay out POCS precisely is that boolean
values would become bitfields. Fortunately we don't need any kind of proxy
for booleans (though of course references won't work). Unfortunately the
layout of bitfields differs by compiler, so we'll need some #ifdefs.

Yet another problem is that groups are going to be really weird to
implement while matching Cap'n Proto layout, since a group's fields can be
interleaved with non-grouped fields. The best implementation I can think of
would be "slightly UB":

    struct Outer {
      struct SomeGroup {
        Proxy<int> baz;
        Padding<int> _outer_foo;
        Proxy<int> qux;
        Padding<int> _outer_bar;
        Proxy<int> corge;
      };

      union {
        SomeGroup group;
        struct {
          Padding<int> _group_baz;
          Proxy<int> foo;
          Padding<int> _group_qux;
          Proxy<int> bar;
          Padding<int> _group_corge;
        };
      };
    };

Now you might write code like:

    Outer outer;
    outer.foo = 123;
    outer.group.baz = 234;
    assert(outer.foo == 123);  // UB?

I think this is technically UB because accessing any one field of the union
technically de-initializes all the others.

Then again, assuming we know exactly how the compiler will pack structs is
already a pretty big assumption. Can we assume compilers won't actually
have a problem with the above?

-Kenton

'Geoffrey Romer' via Cap'n Proto

2016-03-18 23:38:12 UTC

Post by Kenton Varda
I think this is technically UB because accessing any one field of the
union technically de-initializes all the others.

Not necessarily. The standard says that "In a standard-layout union with an
active member of struct type T1, it is permitted to read a non-static data
member m of another union member of struct type T2 provided m is part of
the common initial sequence of T1 and T2." So I think you're OK so long as
SomeGroup and your anonymous struct are standard-layout and
layout-compatible (i.e. their "common initial sequence" consists of all
data members), and superficially that seems pretty doable.

Post by Kenton Varda
Then again, assuming we know exactly how the compiler will pack structs is
already a pretty big assumption. Can we assume compilers won't actually
have a problem with the above?

If the struct is standard-layout, its members are guaranteed to be laid out
in the order they're declared, and there's guaranteed to be no padding
before the first member. Padding between members is implementation-defined,
but from what I can tell, in practice it's always the minimum amount of
padding necessary to satisfy alignment requirements. Furthermore, you can
use offsetof() to validate your guesses about the layout of the struct, so
that if you get it wrong you get a build error rather than silent runtime
corruption.
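
A small sketch of that validation idea follows. It simplifies the earlier example on purpose: plain `std::int32_t` members replace the proxy and padding types, the two views are named structs rather than an anonymous struct, and `GroupView`, `OuterView` and `readFoo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Two views of the same three 32-bit slots, with different names per slot.
struct GroupView {
  std::int32_t baz;
  std::int32_t outerFoo;  // slot owned by the enclosing struct
  std::int32_t qux;
};

struct OuterView {
  std::int32_t groupBaz;  // slot owned by the group
  std::int32_t foo;
  std::int32_t groupQux;
};

union Outer {
  GroupView group;
  OuterView outer;
};

// Layout guesses are checked at compile time instead of trusted.
static_assert(std::is_standard_layout<GroupView>::value &&
                  std::is_standard_layout<OuterView>::value,
              "both views must be standard-layout");
static_assert(offsetof(OuterView, foo) == 4, "unexpected padding before foo");
static_assert(offsetof(GroupView, outerFoo) == offsetof(OuterView, foo),
              "views disagree about where the second slot lives");
static_assert(sizeof(GroupView) == sizeof(OuterView), "views differ in size");

// Reads a member of the inactive view; it lies in the common initial sequence.
inline std::int32_t readFoo(const Outer& value) { return value.outer.foo; }
```

If a compiler inserted padding differently, the `static_assert`s would turn the mismatch into a build error, which is the failure mode described above.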

Kenton Varda

2016-03-18 23:41:52 UTC

Oh, of course, things like struct sockaddr already rely on aliasing between
structs in a union, so that has to work. Great! Maybe this will work.

-Kenton

Alexander S.

2016-03-20 04:37:07 UTC

I'm probably missing the point here (and I admittedly know little about
capnproto), but could we simply generate a class with byte arrays for
primary storage and then expose the POCS fields as references (or pointers)
into those arrays?
--ap

Post by Kenton Varda
Oh, of course, things like struct sockaddr already rely on aliasing
between structs in a union, so that has to work. Great! Maybe this will
work.
-Kenton
On Fri, Mar 18, 2016 at 4:38 PM, 'Geoffrey Romer' via Cap'n Proto <

Post by 'Geoffrey Romer' via Cap'n Proto

Post by Kenton Varda
On Fri, Mar 18, 2016 at 1:53 PM, 'Geoffrey Romer' via Cap'n Proto <

Right, proxy types. My beef with them is that they produce surprising
results when combined with type inference.
auto foo = myStruct.bar;
// Turns out `foo` now has type WeirdProxyThing<int>.
int& iref = myStruct.bar;
// Unexpectedly doesn't work.
long n = myStruct.bar;
someFuncOverloadedOnEveryIntType(myStruct.bar);
// It's hard to make sure that neither of the above lines complains
about ambiguity.
At one time I was thinking about proposing a C++ language change that
would allow you to mark a proxy type as "always convert to type T when
copying" -- and hence `foo`'s type above would be inferred as `int`, and
ambiguity is solved by treating the type as `int`. References are still a
problem, though.
Another problem with trying to lay out POCOs precisely is that boolean
values would become bitfields. Fortunately we don't need any kind of proxy
for booleans (though of course references won't work). Unfortunately the
layout of bitfields differs by compiler, so we'll need some #ifdefs.
Yet another problem is that groups are going to be really weird to
implement while matching Cap'n Proto layout, since a group's fields can be
interleaved with non-grouped fields. The best implementation I can think of
is:
struct Outer {
  struct SomeGroup {
    Proxy<int> baz;
    Padding<int> _outer_foo;
    Proxy<int> qux;
    Padding<int> _outer_bar;
    Proxy<int> corge;
  };
  union {
    SomeGroup group;
    struct {
      Padding<int> _group_baz;
      Proxy<int> foo;
      Padding<int> _group_qux;
      Proxy<int> bar;
      Padding<int> _group_corge;
    };
  };
};
Outer outer;
outer.foo = 123;
outer.group.baz = 234;
assert(outer.foo == 123); // UB?
I think this is technically UB because accessing any one field of the
union technically de-initializes all the others.

Not necessarily. The standard says that "In a standard-layout union with
an active member of struct type T1, it is permitted to read a non-static
data member m of another union member of struct type T2 provided m is part
of the common initial sequence of T1 and T2." So I think you're OK so long
as SomeGroup and your anonymous struct are standard-layout and
layout-compatible (i.e. their "common initial sequence" consists of all
data members), and superficially that seems pretty doable.
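
A minimal sketch of the common-initial-sequence rule quoted above, using plain integers instead of proxy types. The struct and member names are made up for the example, and it only reads through the inactive member; it does not settle whether writing through it is allowed:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Two views of the same three words; names are hypothetical.
struct TopLevelView {
  int32_t groupBazPad;
  int32_t foo;
  int32_t groupQuxPad;
};
struct GroupView {
  int32_t baz;
  int32_t outerFooPad;
  int32_t qux;
};

static_assert(std::is_standard_layout<TopLevelView>::value, "must be standard-layout");
static_assert(std::is_standard_layout<GroupView>::value, "must be standard-layout");
static_assert(sizeof(TopLevelView) == sizeof(GroupView), "views must overlap exactly");

union Outer {
  TopLevelView top;
  GroupView group;
};

inline int32_t readFooThroughGroup() {
  Outer outer;
  outer.top = TopLevelView{1, 123, 3};  // `top` is the active member
  // All members are part of the common initial sequence, so this read
  // through the inactive member is the case the quoted rule permits.
  return outer.group.outerFooPad;
}
```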

Post by Kenton Varda
Then again, assuming we know exactly how the compiler will pack structs
is already a pretty big assumption. Can we assume compilers won't actually
have a problem with the above?

If the struct is standard-layout, its members are guaranteed to be laid
out in the order they're declared, and there's guaranteed to be no padding
before the first member. Padding between members is implementation-defined,
but from what I can tell, in practice it's always the minimum amount of
padding necessary to satisfy alignment requirements. Furthermore, you can
use offsetof() to validate your guesses about the layout of the struct, so
that if you get it wrong you get a build error rather than silent runtime
corruption.
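
The offsetof() check suggested above might look like the following sketch. `WireLayout` and its fields are hypothetical; the expected offsets are what minimal alignment padding gives on common compilers, which is exactly the guess the assertions verify at build time:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical struct whose layout is meant to match an 8-byte-word format.
struct WireLayout {
  uint32_t foo;  // expected at byte 0
  uint16_t bar;  // expected at byte 4
  uint16_t baz;  // expected at byte 6
  uint64_t qux;  // expected at byte 8
};

static_assert(std::is_standard_layout<WireLayout>::value, "offsetof needs standard layout");
static_assert(offsetof(WireLayout, foo) == 0, "unexpected padding before foo");
static_assert(offsetof(WireLayout, bar) == 4, "unexpected padding before bar");
static_assert(offsetof(WireLayout, baz) == 6, "unexpected padding before baz");
static_assert(offsetof(WireLayout, qux) == 8, "unexpected padding before qux");
static_assert(sizeof(WireLayout) == 16, "unexpected trailing padding");
```

If a compiler lays the struct out differently, the build fails instead of silently corrupting data.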

Post by Kenton Varda
-Kenton

Kenton Varda

2016-03-20 11:07:46 UTC

Permalink

Post by Alexander S.
I'm probably missing the point here (and I admittedly know little about
capnproto), but could we simply generate a class with byte arrays for
primary storage and then expose the POCS fields as references (or pointers)
into those arrays?

Not really. The pointers would be larger than the data they point to, the
extra indirection would be slow, and copy constructors would have to be
carefully-written since you wouldn't want to end up with pointers into the
wrong object.

Better to go with accessor methods at that point.


Branislav Katreniak

2016-03-21 08:54:45 UTC

Permalink

Post by Kenton Varda
Naturally I'd prefer KJ over std. :) Probably text fields will be
kj::String and lists will be kj::Vector.

I understand that KJ types are a must. But I am curious: would you accept
patches to use std types in POCS classes if this were controlled by a global
define, similar to CAPNP_LITE?

Post by Kenton Varda
* How do you plan to preserve "unknown fields"? Reuse capnp encoding for

Post by Branislav Katreniak
internal representation for everything except pointers? That could lead to
high code reuse and fast serialization.

Getters and setters will make the API more consistent with Reader
and Builder. The prefixes `set` and `get` are great in generated code,
because they move all fields into kind of namespace. No collisions with
language keywords, no collisions with generic methods like `clone()`.

Post by Kenton Varda
I guess preserving data fields is tricky if we don't use the memcpy()able
layout since they could be interleaved with the known fields.

Internal padding is limited to 7 bytes, and the compiler should know where
it is. So this is not a big problem.

What about the API for groups? Can they be pure references into the parent
POCS class without owned memory? If not, I have a hard time imagining how to
effectively reuse the serialized format for a group POCS class. For me, this
looks like a show stopper for the serialized format in POCS classes.

Kind regards
Brano

Branislav Katreniak

2016-03-25 19:46:03 UTC

Permalink

As I am trying to implement a generator for POCS classes, I have a few
questions.

What is a good name for the POCS class in source code? The generated code is
placed in the class that has been called the "outer" class so far. It would be
good to come up with terminology that can also be used in documentation.

What is the API for primitive fields? It is not possible to return a reference
to a primitive type, because it can have a different byte order and it may be
XORed with the default value. It needs either a setter and getter or a proxy
type. I will start simple with a setter and getter. A proxy type can be
introduced later.

uint32_t getNumber() const;
void setNumber(uint32_t);

What is the API for a string field? The minimal approach is intrusive into
the string class:

kj::String& getName();
const kj::String& getName() const { return _name; }

kj::String needs to be extended with a null flag.

Second approach

bool hasName() const;
kj::String& getName();
const kj::String& getName() const;

Non-const getter sets name to non-null. Const getter can return reference
to global instance if hasName() == false. This second approach is
non-intrusive and works also with std::string.
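
A rough sketch of this second approach, using std::string as the message says it also works there. The class name `Person` and the private helper are assumptions made for the example:

```cpp
#include <cassert>
#include <string>

// Hypothetical generated class with one text field called `name`.
class Person {
public:
  bool hasName() const { return nameSet; }

  // Non-const getter: marks the field as non-null and hands out the string.
  std::string& getName() {
    nameSet = true;
    return name;
  }

  // Const getter: returns a shared empty instance while the field is null.
  const std::string& getName() const {
    return nameSet ? name : emptyString();
  }

private:
  static const std::string& emptyString() {
    static const std::string empty;
    return empty;
  }

  bool nameSet = false;
  std::string name;
};
```

Note that merely calling the non-const getter flips hasName() to true, even if nothing is written through the returned reference.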

Can we afford not to support a NULL flag for strings in the POCS class and
encode an empty string as a null string on the wire? That would simplify the
API and the generated code.

Do we need an option to set the NULL flag?

It is also easy to add a convenience setter for consistency with builders and
primitive types.

void setName(kj::String &&);

I believe structs and lists can stick to the same API as strings.

Groups can be just a view into the POCS class without value semantics.
Internally a group will be a pointer / reference to the owning POCS instance.
If the POCS instance is deleted, or if the group sits in a union section that
is invalidated, the group simply points to invalid memory. We could track
group instances from the POCS class and clear the pointers, trading speed for
nice exceptions instead of a crash. But the first version can simply go
without unions and groups.

Any suggestions?

Kind regards
Brano

Kenton Varda

2016-03-25 22:36:50 UTC

Permalink

Hi Brano,

Post by Branislav Katreniak
Getters and setters will make the code the api more consistent with Reader
and Builder. The prefixes `set` and `get` are great in generated code,
because they move all fields into kind of namespace. No collisions with
language keywords, no collisions with generic methods like `clone()`.

That's a good point. Though, we already work around language keyword
collisions by appending a trailing underscore to conflicting names.

Post by Branislav Katreniak
What about API for groups? Can they be pure references into parent POCS
class without owned memory? If not, I have hard time to imagine how to
effectively reuse serialized format for group POCS class. For me, this
looks like a show stopper for serialized format in POCS classes.

I think it would be reasonable for group accessors to return a reference.

For that matter, sub-message accessors would probably return a reference
too, except that presumably there'd be a method you can call to have the
sub-message disowned which would then return Own<T>. The disown method
would not be available for groups.

Post by Branislav Katreniak
As I am trying to implement generator for POCS classes, I have few
questions.

It's cool that you're working on this. Note that I tend to have strong
opinions on APIs, so it might be a good idea to write up a doc or something
with a proposed API before going too far into implementation. :)

Post by Branislav Katreniak
What is good name for POCS class in source code? The generated code is
placed in class that was called "outer" class now. It would be good to come
with terminology that can be used also in documentation.

I think the outer class itself (which currently behaves only as a
namespace) should be the POCS type. This makes the POCS interface very
natural to use, which is its goal in life after all.

Post by Branislav Katreniak
What is api to primitive fields? It is not possible to return reference to
primitive type, because it can have different byte order and it may be
xored with default value. It needs either setter and getter or proxy type.
I start simple with setter and getter. Proxy type can be introduced later.
uint32_t getNumber() const;
void setNumber(uint32_t);

I think it's important to settle on one API now -- I don't want to end up
with multiple ways of doing things.

I think the "plain old fields" approach is a nicer API than accessors if we
can make it work, so the question is: is there any reason we can't make it
work? I think we have to go through all of the features and think about
whether there is an issue.

I just realized: We don't need proxies. We can use regular integer types,
and we say that endianness is translated during the copy between wire
format and POCS. Since almost all CPUs are little-endian, on almost all
CPUs we'll still be able to use a memcpy() -- only on big-endian CPUs will
the translation have to fall back to handling each field. I think this is a
far better plan than using proxy types in POCS since proxies have so many
problems.
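
The idea of translating endianness during the copy can be sketched as follows. `Pocs`, `fromWire`, and the helper names are hypothetical, and the wire struct is assumed to be a single 8-byte word with no default values to XOR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical POCS struct whose in-memory layout mirrors the wire layout.
struct Pocs {
  uint32_t number;
  uint16_t kind;
  uint16_t reserved;
};
static_assert(sizeof(Pocs) == 8, "layout must match the 8-byte wire word");

inline bool hostIsLittleEndian() {
  const uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

inline uint32_t loadLe32(const unsigned char* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

inline uint16_t loadLe16(const unsigned char* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

inline Pocs fromWire(const unsigned char* wire) {
  Pocs out;
  if (hostIsLittleEndian()) {
    std::memcpy(&out, wire, sizeof(out));  // fast path: one bulk copy
  } else {
    out.number = loadLe32(wire);           // slow path: field by field
    out.kind = loadLe16(wire + 4);
    out.reserved = loadLe16(wire + 6);
  }
  return out;
}
```

After the copy, callers see ordinary integer fields and can take references to them, which is what the proxy approach could not offer.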

So let's list out some things:

Primitives:
* Void: C++ does not let you declare zero-width fields, but since voids
don't affect the ultimate encoding we could pull the void fields out to the
beginning or end of the structure.
* Boolean: Use bitfields.
* Integers/Floats: Use regular types.
* Enums: Use C++11 enum classes with uint16_t as the base type. In fact the
enum types we're already generating should work for this.

Pointers: No need for perfect alignment here since we obviously can't
memcpy() them anyway.
* Text: kj::String
* Data: kj::Array<kj::byte>
* List(T): kj::Vector<T>. This implies bool lists will expand to
byte-per-element in POCS format, not bit-per-element as they are on the
wire. That's probably OK; these aren't used very often. (Of course,
std::vector<bool> would keep them as bits but everyone seems to agree that
was a disaster.)
* Structs: kj::Own<T>
* Capabilities: T::Client (I wonder if we should move T::Client's members
into T and make T::Client be an alias for backwards-compatibility?)

Null pointers: I suppose that String, Array, Vector, and Own will all need
to support comparison with nullptr. String and Array do already, but they
are equivalent to comparing with the empty string / array. This is arguably
incorrect, although in practice I'd say programs should avoid
distinguishing between null and empty. If we decide this needs to be
distinguishable, then we probably need to introduce new types here.
Probably, we'd use capnp::Text and capnp::List<T>, which is arguably more
consistent with the rest of the API anyhow.

Groups: Described previously. The whole struct would be an anonymous union
containing an anonymous struct of the top-level fields as well as named
structs for each group, carefully padded to align with each other.

Unions: This is tricky. Possibilities:
- Use a proxy type. Since a union member could itself be a struct, and
there's no way to proxy arbitrary members, we'd probably need to use
operator->(): foo.unionField->member. But then it's hard for us to tell
whether the union member is being accessed for read or for write, which is
important because on read we want to throw an exception if it's not the
active member and on write we want to make it active. We'd probably have to
assume the latter (unless the struct is const).
- Use an accessor method returning a reference. This is similar to the
previous point but instead of foo.unionField->member you'd have
foo.unionField().member. Unclear whether this is more or less confusing.
- Revert all the way to getFoo(), setFoo(), initFoo(). Sadness.

Another issue with Unions is how to handle "which". It seems like this
should be a method too, since we don't want people directly overwriting the
union discriminant.
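
One way to picture the accessor option is the sketch below. `Shape` and its members are invented for the example; it assumes non-const access means "write" and activates the member, while const access throws if the member is not active, and it exposes the discriminant only through which():

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical struct containing a two-member union.
class Shape {
public:
  enum class Which : uint16_t { CIRCLE, SQUARE };

  Shape() : tag(Which::CIRCLE), radius(0) {}

  // The discriminant is readable but cannot be overwritten directly.
  Which which() const { return tag; }

  // Non-const access is assumed to be a write, so it activates the member.
  double& circle() { activate(Which::CIRCLE); return radius; }
  double& square() { activate(Which::SQUARE); return side; }

  // Const access is a read, so it throws if the member is not active.
  const double& circle() const { check(Which::CIRCLE); return radius; }
  const double& square() const { check(Which::SQUARE); return side; }

private:
  void activate(Which w) {
    if (tag != w) {
      tag = w;
      if (w == Which::CIRCLE) radius = 0; else side = 0;
    }
  }
  void check(Which w) const {
    if (tag != w) throw std::logic_error("union member is not active");
  }

  Which tag;
  union {
    double radius;
    double side;
  };
};
```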

Structs containing unions will need to have non-default constructors,
destructors, copy, and assignment to deal with any unioned pointers.

AnyPointer: We can have capnp::AnyPointer be a class with some arbitrary
interface for this. It would probably need to store an encoded Cap'n Proto
blob behind the scenes, which it could decode on-demand.

AnyList/AnyStruct: kj::Own<AnyList>/kj::Own<AnyStruct>, otherwise similar
to AnyPointer.

Generics: I think this is straightforward. We'll need a template typedef
Pointer<T> which expands to:
* Pointer<List<T>> -> kj::Vector<T>
* Pointer<T> | T is a capability -> T::Client
* Pointer<T> | T is a struct -> kj::Own<T>
* Pointer<AnyPointer> -> AnyPointer
* Pointer<Text> -> kj::String
* Pointer<Data> -> kj::Array<kj::byte>

So the most difficult parts are groups (seem to work, but weird) and unions
(API is inconsistent). There is also an open question about whether to use
kj::String, kj::Array, and kj::Vector vs. defining new types capnp::Text,
capnp::Data, and capnp::List<T>. I'm actually starting to lean towards the
latter for consistency's sake. We could design them to mostly reuse the KJ
types under the hood.

I think overall this strategy seems doable.

Post by Branislav Katreniak
What is api to string field? Minimal approach that need is intrusive into
string class
kj::String& getName();
const kj::String& getName() const { return _name; }
kj::String needs to be extend with null field.

FWIW if we use accessors, I'd want to have the same set of accessors that
builders have today, so get, set, etc:

bool hasName() const;
kj::StringPtr getName() const;
void setName(kj::StringPtr);
void setName(kj::String&&);
kj::String disownName();

I would not want methods that return references, since returning a
reference defeats a lot of the purpose of accessors -- it allows someone
external to subsequently change the value of the struct (through the
reference) without any chance for the class to detect this change.


Branislav Katreniak

2016-03-28 20:12:03 UTC

Permalink


Branislav Katreniak

2016-03-31 15:00:49 UTC

Permalink

Post by Kenton Varda
I think the outer class itself (which currently behaves only as a
namespace) should be the POCS type. This makes the POCS interface very
natural to use, which is its goal in life after all.

The outer class is a very nice place. But consider this capnp declaration:

struct A {
bi @0 : B.I;
interface I {
}
}

struct B {
ai @0 : A.I;
interface I {
}
}

How do we compile this into C++ classes? The declaration of class A needs B.I
fully declared first, and the declaration of class B needs A.I fully declared
first. But there is no way to declare a nested class in C++ before its parent
class is declared. My interpretation is that the outer class must remain only
a namespace. That leads to the question of how to name the POCS class.

Kind regards
Brano

Kenton Varda

2016-04-01 23:06:25 UTC

Permalink

Post by Branislav Katreniak
Hi Kenton

Post by Kenton Varda
I would not want methods that return references, since returning a

reference defeats a lot of the purpose of accessors
I fully agree with this. But is there any difference between returning
reference to member variable and publishing member variable directly? I
personally have no experience with proxy types. I prefer to choose method
returning reference against proxy type just because there is less C++ dark
magic involved.
I love your point that byte order and XORing default value is best handled
at conversion to / from reader and builder.
The proposal is nice. I don't really understand ANY pointers in capnproto
yet, so those parts will sink in later. But I have a proposal for unions:
* UnionEnum which() const - like Reader
* bool isFoo() const - like Reader
* FooType& foo() - throws if which is not FOO. Returns reference to foo.
* const FooType& foo() const - throws if which is not FOO. Returns const
reference to foo.
* FooType& initFoo() - sets which to foo, returns reference.
I really dislike to act differently on const and non-const object. That is
hard to think about.

Let's focus on non-unions for now while we keep thinking about this.

I have some crazy ideas forming for how we could make proxies work for
unions that I'd like to play with. It occurs to me that my worry about
struct members was not correct since a struct field would be Own<T> anyway.

Post by Branislav Katreniak

Post by Kenton Varda
I'd say programs should avoid distinguishing between null and empty.

I would like to not give an option to distinguish between NULL and empty
for lists, strings and data. I consider it bad design. If it is important,
it is better to express the difference in separate bool.

Well, using null is common as a way to implement Maybe<T>. Adding a
separate boolean is not great because it can be inconsistent -- I would
actually recommend using a two-member union instead. But either of these
approaches adds overhead, making relying on null pointers attractive.

Arguably, we should extend the language to support an explicit Maybe(T)
type which we could then translate into kj::Maybe. Existing protocols which
rely on null would need to transition over to Maybe(T), but it would be a
backwards-compatible change.

Post by Branislav Katreniak
FWIW if we use accessors, I'd want to have the same set of accessors that
Let's stick to naked public member fields for now. It generates little
code and it is the fastest solution.
Can you, please, make API proposal for conversion from reader and to
builder?

Builders should have field setters from POCS types. MessageBuilder should
also allow POCS as an input to setRoot().

For Reader/Builder -> POCS, two options:

1) Support copy constructor / assignment operator. However, this is a
little weird since the copy would do allocation, and we prefer to avoid
implicit allocation in KJ/capnp code.

2) Have an asNative() or asPocs() method on Reader/Builder which creates
POCS objects. Requires more typing but makes allocation explicit.

Post by Branislav Katreniak
My attack plan is to start with code that has the right API but is
as simple as possible to be correct. Optimizations as phase 2.
I am still curious about your position for an option to use std types in
POCS classes. I understand that you don't want to use it yourself. But
these POCS classes infiltrate much deeper into application logic than
reader and builder. And they will look weird if the team uses std classes
everywhere else. Do you think that kj::Own, kj::String and kj::Vector
provide real benefit for POCS classes over std::unique_ptr, std::string and
std::vector? I believe that it may help capnproto adoption if it plays more
nicely with std code.

I think there is quite a lot wrong with std::string, std::vector, and
std::unique_ptr, which is why I wrote my own versions. For example:
- std::string is designed to support reference counting and copy-on-write,
which is now broadly understood to have been a mistake, but it's impossible
to eliminate the weird specification quirks now.
- std::unique_ptr uses template polymorphism to support custom allocators.
It should have used virtual method dispatch polymorphism. Template
polymorphism means your code must always declare exactly which allocator is
allowed or be templatized itself. This has negative implications for our
POCS types: it would be neat to implement an optimization in which Data,
Text, and List(Primitive) fields inside the message could actually point
back to the original input rather than make a copy, but that requires
control over how they are deallocated.

Actually, thinking about it, I think that our POCS types should use neither
std nor KJ types. Instead, we should have special types which we control:

- Pointer<T> for structs (not kj::Own).
- List<T> for lists (not kj::Vector).
- Text and Data for blobs (not kj::String nor kj::Array).

This way, we can potentially customize these interfaces if desired to
support features like:
- Zero-copy blob references (pointers into the original message reader).
- Lazy parsing, i.e. only parse a sub-struct when first accessed. (We
probably don't actually want this, but it's nice to be able to change our
minds later.)
- Null pointer comparisons.
- Implicit conversions to appropriate std types for convenience (e.g. the
way Text::Reader today can implicitly convert to std::string).

Post by Branislav Katreniak
I think the outer class itself (which currently behaves only as a

Post by Kenton Varda
namespace) should be the POCS type. This makes the POCS interface very
natural to use, which is its goal in life after all.

struct A {
bi @0 : B.I;
interface I {
}
}
struct B {
ai @0 : A.I;
interface I {
}
}
How to compile this into C++ classes? Class A declaration needs B.I fully
declared before. Class B declaration needs A.I fully declared before. But
there is no way to declare nested class in C++ before parent class
declaration. My interpretation is that outer class must stay only as
namespace. That leads to question how to name the POCS class.

Ugh.

This is probably exceedingly rare in practice. It would be sad to make the
API harder for everyone just to cover this one obscure case.

What if we ignore this problem for now, but plan that if it comes up for
real in the future, we will make the code generator resolve the cycle by
injecting a proxy type?

-Kenton

Lee Clagett

2016-04-02 04:44:46 UTC

Permalink

On Fri, 1 Apr 2016 16:06:25 -0700

On Mon, Mar 28, 2016 at 1:12 PM, Branislav Katreniak

Post by Branislav Katreniak
Hi Kenton

[...]

Post by Branislav Katreniak
My attack plan is to start with code that has the right API
but is as simple as possible to be correct. Optimizations as phase
2.
I am still curious about your position for an option to use std
types in POCS classes. I understand that you don't want to use it
yourself. But these POCS classes infiltrate much deeper into
application logic than reader and builder. And they will look weird
if the team uses std classes everywhere else. Do you think that
kj::Own, kj::String and kj::Vector provide real benefit for POCS
classes over std::unique_ptr, std::string and std::vector? I
believe that it may help capnproto adoption if it plays more nicely
with std code.

I think there is quite a lot wrong with std::string, std::vector, and
std::unique_ptr, which is why I wrote my own versions. For example:
- std::string is designed to support reference counting and
copy-on-write, which is now broadly understood to have been a
mistake, but it's impossible to eliminate the weird specification
quirks now.

std::string reference counted and COW implementations are not permitted
in C++11 or newer. Gcc 5.1 has an ABI breakage for this and std::list
complexity changes [0]. Capnproto supports the Gcc 4.x versions, so
the inconsistency does stink.

- std::unique_ptr uses template polymorphism to support custom
allocators. It should have used virtual method dispatch polymorphism.
Template polymorphism means your code must always declare exactly
which allocator is allowed or be templatized itself. This has
negative implications for our POCS types: it would be neat to
implement an optimization in which Data, Text, and List(Primitive)
fields inside the message could actually point back to the original
input rather than make a copy, but that requires control over how
they are deallocated.

The templated deleter allows for the empty-base-class optimization.
unique_ptr with the default deleter has identical size requirements to
a raw pointer. Any type-erased deleter must have storage, and therefore
will always take up more space. The unique_ptr deleter can be made
polymorphic with a simple two-liner:

template<typename T>
using unique_poly_ptr = std::unique_ptr<T, std::function<void(T*)>>;

Although the move constructor of std::function is _not_ `noexcept`, so
resource leaks are possible without a wrapper that uses swap. And a
`make_poly_ptr` function would help prevent resource leaks in case the
std::function initially throws on construction. Halfway to writing an
entirely new unique_ptr anyway. Would be dead-simple if std::function
had `noexcept` moves.
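
For illustration, here is how the two-liner above might be used, along with the size cost it implies. The helper function names are made up for the example, and it does not include the swap-based wrapper or `make_poly_ptr` mentioned in the message:

```cpp
#include <cassert>
#include <functional>
#include <memory>

template <typename T>
using unique_poly_ptr = std::unique_ptr<T, std::function<void(T*)>>;

// Returns how many times the custom deleter ran for one pointer.
inline int deleterCallCount() {
  int calls = 0;
  {
    // The deleter is chosen at run time; the pointer type does not name it.
    unique_poly_ptr<int> p(new int(7), [&calls](int* x) {
      ++calls;
      delete x;
    });
  }  // p goes out of scope here and invokes the deleter
  return calls;
}

// The type-erased deleter needs storage, so the pointer grows.
inline bool polyPtrIsLargerThanRawPointer() {
  return sizeof(unique_poly_ptr<int>) > sizeof(int*);
}
```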

Lee

[0]https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html

Branislav Katreniak

2016-04-06 08:16:29 UTC

Permalink

Post by Kenton Varda
Arguably, we should extend the language to support an explicit Maybe(T)
type which we could then translate into kj::Maybe. Existing protocols which
rely on null would need to transition over to Maybe(T), but it would be a
backwards-compatible change.

It would be great to make it explicit in IDL that given Text / Data can be
optional. If not set, empty text / data can be serialized as null pointer.

If I read the code correctly, kj::Maybe<T> allocates T on the heap, forcing
an extra pointer lookup.

- List<T> for lists (not kj::Vector).
As of now, List<T> cannot be used with forward declared T.

- Text and Data for blobs (not kj::String nor kj::Array).
As of now, it is possible to just make capnp::Text subclass of kj::String
and capnp::Data subclass of kj::Array and it works.

Post by Kenton Varda

Post by Branislav Katreniak
struct A {
bi @0 : B.I;
interface I {
}
}
struct B {
ai @0 : A.I;
interface I {
}
}
How to compile this into C++ classes? Class A declaration needs B.I fully
declared before. Class B declaration needs A.I fully declared before. But
there is no way to declare nested class in C++ before parent class
declaration. My interpretation is that outer class must stay only as
namespace. That leads to question how to name the POCS class.

Ugh.
This is probably exceedingly rare in practice. It would be sad to make the
API harder for everyone just to cover this one obscure case.
What if we ignore this problem for now, but plan that if it comes up for
real in the future, we will make the code generator resolve the cycle by
injecting a proxy type?

It is not that easy to ignore this problem now. It comes up in many places;
I highlighted just one case here.

A POCS type needs, for its declaration:
* a forward declaration of Foo to declare a member of struct type Foo
* a full declaration of Foo::Client to declare a member of interface type Foo
* a forward declaration of enum Foo to declare a member of enum type Foo

This is also a problem.

struct A {
  bx @0 : B.X;
}

struct B {
  ax @0 : A.X;
  struct X {
  }
}

What does it mean to ignore this problem? Do we compile only POCS classes
that use only previously declared types? Will everything else use a proxy
type?

Kind regards
Brano

Kenton Varda

2016-04-08 19:55:52 UTC

Post by Branislav Katreniak
If I read the code correctly, kj::Maybe(T) allocates T on heap, forcing
extra pointer lookup.

No, it doesn't. It uses placement-new.
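
To illustrate what "uses placement-new" means here, a minimal sketch of an optional-like wrapper that keeps the value inside its own footprint. `InlineMaybe` and the demo function are hypothetical names for illustration; kj::Maybe's real implementation differs in detail:

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Hypothetical illustration only, not kj::Maybe.
template <typename T>
class InlineMaybe {
public:
  InlineMaybe() noexcept : engaged(false) {}
  InlineMaybe(const InlineMaybe&) = delete;
  InlineMaybe& operator=(const InlineMaybe&) = delete;
  ~InlineMaybe() { reset(); }

  template <typename... Args>
  T& emplace(Args&&... args) {
    reset();
    // Placement-new constructs T in the wrapper's own storage; no heap use.
    ::new (static_cast<void*>(std::addressof(value)))
        T(std::forward<Args>(args)...);
    engaged = true;
    return value;
  }

  void reset() noexcept {
    if (engaged) {
      value.~T();  // explicit destructor call, the mirror of placement-new
      engaged = false;
    }
  }

  T* get() noexcept { return engaged ? std::addressof(value) : nullptr; }

private:
  union { T value; };  // inline storage; not constructed until emplace()
  bool engaged;
};

// Usage: checks that the stored value lies within the wrapper object itself.
bool demoInlineStorage() {
  InlineMaybe<long> m;
  m.emplace(42L);
  assert(*m.get() == 42L);
  const char* outer = reinterpret_cast<const char*>(&m);
  const char* inner = reinterpret_cast<const char*>(m.get());
  return inner >= outer && inner + sizeof(long) <= outer + sizeof(m);
}
```

Because the value is part of the wrapper object, reading it costs no extra pointer hop.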

- List<T> for lists (not kj::Vector).

Post by Branislav Katreniak
As of now, List<T> cannot be used with forward declared T.

Is this solved by specifying explicitly List<T, Kind::STRUCT>? We could
easily do that in generated code.

- Text and Data for blobs (not kj::String nor kj::Array).

Post by Branislav Katreniak
As of now, it is possible to just make capnp::Text subclass of kj::String
and capnp::Data subclass of kj::Array and it works.

That seems reasonable.

Post by Branislav Katreniak
What does it mean to ignore this problem? Do we compile only POCS classes
that use only previously declared types? Everything else will use proxy
type?

Never mind. Better solution to the problem: Declare all classes at the top
level, then make the inner type names be typedefs. This is what protobuf
does, actually.

struct A;
struct A_X;
...

struct A {
  typedef A_X X;
  ...
};
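
A fuller sketch of how this resolves the A/B cycle from earlier in the thread, assuming hypothetical member names (`value`, `bx`, `ax`) that do not come from any generated code. Every type is declared at the top level, so each outer struct can use the other's "nested" type before that outer struct is defined:

```cpp
#include <cassert>
#include <type_traits>

// Everything is declared at namespace scope, so forward declaration works.
struct A;
struct A_X;
struct B;
struct B_X;

// The formerly nested types are defined first (fields are illustrative).
struct A_X { int value = 0; };
struct B_X { int value = 0; };

struct A {
  typedef A_X X;  // A::X still works as a name
  B_X bx;         // would be B::X, but B is not yet complete here
};

struct B {
  typedef B_X X;
  A_X ax;
};

static_assert(std::is_same<A::X, A_X>::value, "A::X aliases A_X");
static_assert(std::is_same<B::X, B_X>::value, "B::X aliases B_X");

// Usage: both structs hold the other's nested type by value.
int demoCycle() {
  A a;
  B b;
  a.bx.value = 7;
  b.ax.value = 9;
  assert(a.bx.value == 7);
  return a.bx.value + b.ax.value;
}
```

User code can keep writing `A::X`; only the generated header refers to the underscore names.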

-Kenton

Branislav Katreniak

2016-04-11 11:02:52 UTC

Looking at where these POCS types are heading, I would like to step back in
this discussion. Thinking about my requirements, I don't really need POCS
classes. All I need is classes usable for mutable state.

I see two reasons why the current code is not really usable for mutable
state:
1. arena allocations are leaked when memory is released
2. builders have no way to resize lists efficiently. Lists need to be
extended with a concept of capacity

The 1st problem can be solved by introducing a special MallocArena that
uses malloc for each allocation. Builders don't own the memory they point
to, but capnp already has a concept for owning memory for builders and
readers: Orphan. Orphans cannot outlive their arena, but a MallocArena never
goes out of scope. MallocArena introduces complications because it allocates
from a big address space, but that should be workable. A special class and
support methods for Orphan in MallocArena can be introduced.

The 2nd problem requires tweaks to the list layout. It is possible to
restrict this new layout to MallocArena-allocated builders and to let resize
operations assert / always realloc for non-MallocArena builders. But it is
also possible to push this to all builders at the moment the first
reallocation happens. That allows optimizations where replacing a string can
be done in place. In fact, a List of struct (list pointer block C set to 7)
can support capacity by storing the capacity in list pointer block D and the
real size in the content prefix "tag".

Using these classes for purely mutable state will not be as fast as true
POCS classes. But it generates little new code. And the mutable state can
be passed to any existing reader and builder without conversion.

A bit off topic, but since I am talking about generated code size: would it
make sense to make the struct Builder a subclass of the struct Reader? The
reader methods would then be reused in the Builder.

Thoughts?

Kind regards
Brano

Kenton Varda

2016-04-15 03:00:30 UTC

Post by Branislav Katreniak
Looking at where these POCS types are heading, I would like to step back
in this discussion. Thinking about my requirements, I don't really need
POCS classes. All I need is classes usable for mutable state.
1. arena allocations are leaked when memory is released
2. builders have no way to effectively resize lists. Lists need to be
extended with concept of capacity
The 1st problem can be solved by introducing a special MallocArena
that uses malloc for each allocation. Builders don't own the memory they
point to, but capnp already has a concept to own memory for builders and
readers: Orphan. Orphans cannot outlive their arena, but MallocArena never
goes out of scope. MallocArena introduces complications because it allocates
from big address space, but that should be workable. Special class and
support methods for Orphan in MallocArena can be introduced.

A simple start to this is to create a MessageBuilder subclass that always
allocates the minimum size passed to the allocateSegment() method. This
will effectively force every allocation to create a new segment.
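
As a rough, standalone model of that allocation policy (it deliberately does not include the Cap'n Proto headers): in real code this logic would sit in an override of capnp::MessageBuilder::allocateSegment(), which as far as I recall takes a minimum size in words and must return zeroed memory. `Word`, `SegmentView` and `ExactSizeSegmentAllocator` are hypothetical stand-ins:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Word = std::uint64_t;  // stand-in for an 8-byte capnp::word

// Stand-in for the array view a real allocateSegment() would return.
struct SegmentView {
  Word* begin;
  std::size_t words;
};

class ExactSizeSegmentAllocator {
public:
  // Allocates exactly the requested minimum, never more, so the next
  // allocation cannot fit into this segment and forces a new one.
  SegmentView allocateSegment(std::size_t minimumWords) {
    // make_unique<Word[]> value-initializes, so the words start zeroed.
    segments.push_back(std::make_unique<Word[]>(minimumWords));
    return SegmentView{segments.back().get(), minimumWords};
  }

  std::size_t segmentCount() const { return segments.size(); }

private:
  std::vector<std::unique_ptr<Word[]>> segments;  // freed on destruction
};

// Usage: two requests yield two separate, exactly sized segments.
std::size_t demoSegments() {
  ExactSizeSegmentAllocator alloc;
  SegmentView a = alloc.allocateSegment(3);
  SegmentView b = alloc.allocateSegment(5);
  assert(a.words == 3 && b.words == 5 && a.begin != b.begin);
  assert(a.begin[0] == 0 && a.begin[2] == 0);
  return alloc.segmentCount();
}
```

Freeing an individual segment early, as in the tryTruncate() idea, would need extra bookkeeping that this model leaves out.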

You could then implement an optimization where whenever an object is
deleted that happens to be the last object in a segment,
SegmentBuilder::tryTruncate() is used to free the space. This would be a
reasonable optimization to have in general.

Then, as one more optimization: if tryTruncate() truncates the segment to
zero-size, perhaps the whole segment can simply be deleted. And perhaps,
later, when a new segment is allocated, it can replace a previously-deleted
segment instead of being appended to the end.

Now you've solved the arena problem, with only some minor changes that are
reasonable optimizations as-is.

However, this approach will suffer from the fact that all pointers will be
far pointers, which use more space and are slower to dereference.

I'm not sure there's any better option, though. I really don't want Cap'n
Proto to grow a whole internal implementation of malloc() that applies
specifically within a message.

Post by Branislav Katreniak
The 2nd problem requires tweaks in list layout. It is possible to
restrict this new layout for MallocArena allocated builders and to let
resize operations assert / always realloc for non MallocArena builders. But
it is possible to push this to all builders the moment when the first
reallocation happens. It allows optimizations where replacing string can
be done in place. Actually List of struct (list pointer block C set to 7)
can support capacity by storing capacity in list pointer block D and real
size in content prefix "tag".

Perhaps you could exploit the fact that INLINE_COMPOSITE-type lists
separately specify "total words" and "number of elements", where the former
could in fact be much larger than is needed by the latter. You could
over-allocate space and then increase the element count incrementally.
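
To make the two counts concrete, a small decoding sketch based on my reading of the published encoding: in a list pointer, field C is 3 bits at bit 32 and field D is 29 bits at bit 35; in the tag word, the element count occupies the 30 bits after the 2 low bits, followed by 16-bit data and pointer section sizes. The type and function names are hypothetical, and `spareElements` assumes a well-formed list whose total words cover at least the current elements:

```cpp
#include <cstdint>

struct ListPointer {
  std::uint32_t elementSizeCode;  // field C; 7 means inline composite
  std::uint32_t wordCount;        // field D; for C == 7, words excluding tag
};

struct ListTag {
  std::uint32_t elementCount;  // stored where a struct pointer keeps its offset
  std::uint32_t dataWords;     // data section size per element, in words
  std::uint32_t pointerWords;  // pointer section size per element, in words
};

constexpr ListPointer decodeListPointer(std::uint64_t w) {
  return ListPointer{static_cast<std::uint32_t>((w >> 32) & 0x7u),
                     static_cast<std::uint32_t>(w >> 35)};
}

constexpr ListTag decodeListTag(std::uint64_t w) {
  return ListTag{static_cast<std::uint32_t>((w >> 2) & 0x3fffffffu),
                 static_cast<std::uint32_t>((w >> 32) & 0xffffu),
                 static_cast<std::uint32_t>(w >> 48)};
}

// Elements that still fit in the over-allocated space (0 for empty structs).
constexpr std::uint32_t spareElements(ListPointer p, ListTag t) {
  return (t.dataWords + t.pointerWords) == 0
             ? 0u
             : p.wordCount / (t.dataWords + t.pointerWords) - t.elementCount;
}

// Example words: 12 words reserved, 2 elements of 1 data + 2 pointer words.
constexpr std::uint64_t kListWord = 1ull | (7ull << 32) | (12ull << 35);
constexpr std::uint64_t kTagWord = (2ull << 2) | (1ull << 32) | (2ull << 48);

static_assert(decodeListPointer(kListWord).elementSizeCode == 7, "composite");
static_assert(spareElements(decodeListPointer(kListWord),
                            decodeListTag(kTagWord)) == 2,
              "12 / 3 = 4 slots, 2 in use");
```

Growing the list by one element would then only mean bumping the count in the tag, as long as spare slots remain.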

Note that list builders do not keep track of the locations of the pointer
to the list nor the list tag, and I'd rather not add any new fields to this
type as it is supposed to be a pass-by-value type. So, you'll need some
sort of ResizeableList which contains both the list builder and an
AnyPointer::Builder pointing back to the list's pointer. That doesn't seem
too bad, though.

Post by Branislav Katreniak
Using these classes for purely mutable state will not be as fast as true
POCS classes. But it generates little new code. And the mutable state can
be passed to any existing reader and builder without conversion.

Note that if this feature is going to be wholly obsoleted by POCS then that
strongly argues against implementing it at all. I don't want to add
something to the library that turns out to be totally useless a few months
later when we implement POCS. And I do think POCS is likely to get
implemented within a few months -- I've been itching to do it for a while,
and I anticipate having some breathing room in my workload soon.

Post by Branislav Katreniak
A bit off topic, but I am talking about generated code size ... would it
make sense to make struct Builder subclass of struct Reader? The reader
methods would be reused in Builder.

This would be a very large refactoring and I'm pretty sure there's a good
reason I didn't do it that way in the first place, though I don't know the
reason off the top of my head.

Keep in mind that the get() methods of a Builder have slightly different
semantics from those of a Reader -- for struct-typed fields, get() will
initialize the pointer to be non-null.

-Kenton

Branislav Katreniak

2016-04-15 08:07:09 UTC

Post by Kenton Varda
However, this approach will suffer from the fact that all pointers will be
far pointers, which use more space and are slower to dereference.
I'm not sure there's any better option, though. I really don't want Cap'n
Proto to grow a whole internal implementation of malloc() that applies
specifically within a message.

Copying from a Reader can happen into one segment. Further modifications
will lead to one segment per allocation. This looks like a good compromise
to me.

Post by Kenton Varda
Note that list builders do not keep track of the locations of the pointer
to the list nor the list tag, and I'd rather not add any new fields to this
type as it is supposed to be a pass-by-value type. So, you'll need some
sort of ResizeableList which contains both the list builder and an
AnyPointer::Builder pointing back to the list's pointer. That doesn't seem
too bad, though.

It is a good idea to separate List and ResizeableList (like kj::Array and
kj::Vector). ResizeableList can be limited to the case where the list is in
its own segment, so the segment size easily holds the list capacity.

Post by Kenton Varda
Note that if this feature is going to be wholly obsoleted by POCS then
that strongly argues against implementing it at all. I don't want to add
something to the library that turns out to be totally useless a few months
later when we implement POCS. And I do think POCS is likely to get
implemented within a few months -- I've been itching to do it for a while,
and I anticipate having some breathing room in my workload soon.

Great! I am not against POCS classes, I like them. But I realized that POCS
classes are not a good task for me. It is a lot of work, and there is too
much design work involved to create something acceptable to you :)

On the other hand, I have learned my lesson never to count on non-existent
code from a 3rd party. So I am happy to have my own plan of attack. Looking
at current priorities, I am likely to look at adding Promise to the IDL next.

Thank you for your great feedback!

Kind regards
Brano

m***@gmail.com

2018-10-04 04:59:53 UTC

Hi everyone,

Is there any news on this topic? Is anybody working on a PODS implementation?

I'm looking forward to this feature.

Best regards
Thomas

'Kenton Varda' via Cap'n Proto

2018-10-04 19:27:25 UTC

Hi Thomas,

At the moment, no one is actively working on this. It's on my list of
features that I really want to build, but I seem to have too many projects
and not enough time. :/

-Kenton

Branislav Katreniak

2016-07-07 09:29:16 UTC

Hi Kenton

I would like to forward-declare a capnp-generated client class. AFAIK this is
not possible because it is generated as an inner class. But this part of your

Post by Kenton Varda
* Capabilities: T::Client (I wonder if we should move T::Client's members
into T and make T::Client be an alias for backwards-compatibility?)

Are you willing to review & merge this change if I implement it?

Kind regards
Brano

Kenton Varda

2016-07-08 05:21:15 UTC

Hi Brano,

Sorry, I'm uncomfortable carrying out that change without more research /
experimentation.

A change I would be more comfortable with is to declare T_Client as a
top-level class with T::Client then being an alias to T_Client. This is
what Protobuf does with nested types. This way you can then forward-declare
T_Client.

Thoughts?

-Kenton

Branislav Katreniak

2016-07-08 07:09:48 UTC

Hi Kenton.

That works for me and I like it. Moreover, it will be possible to
apply it consistently to all nested classes.

I am likely to look at this in August.

Thank you!
Brano

M. Taha

2016-03-13 02:27:17 UTC

Hi all,
I agree with Kenton. Please consider implementing POCS in the near future
(hopefully in the next release). It would be very helpful in my use case.
About the performance issue: I think it would perform better in some use
cases, like exchanging messages between processes running on the same
machine using shared memory mapping.
The generated code would also be very simple and much more readable and
understandable to a wider audience.
I think it would bring a new usage to the library as a code generator
regardless of serialization and RPC (even lighter than CAPNP_LITE).

Keep up the good work.

r***@gmail.com

2016-04-09 05:45:03 UTC

I would be pleased to see a POCS implementation (reading the Cap'n docs I
was a bit dismayed by the lack of ability to practically use it to define a
schema for your "live" data - it seems like that way you're saving
serialize/deserialize time only hypothetically, while in practice you're
just making it so you're serializing your data into a Cap'n message *manually*
every time you approach a wire / process boundary / storage medium).

On the other hand, POCS support makes it more likely that you end up doing
actual serialize-deserialize operations again, and weakens Cap'n's claim of
being infinitely faster. :)

I'm mostly chiming in here, though, to suggest that if you're going to be
generating code for POCS conversions, it would be nice to make this
something you enable explicitly per file or per message. One of the selling
points of Cap'n is the faster build times and reduced code vs. Protobuf,
and this change reduces that advantage. If you make it only do POCS based
on a flag or annotation of some kind, it makes it possible to keep, eg. RPC
messages in their pure builder/reader implementation, but also have POCS
translators for those structs that you plan to use for internal state.
(Also, perhaps specified per-language - one might want a native struct in
C++, but only need the builder/reader model on the Python side or whatever.)

For bonus points it would be nice if it was possible to use per-field
annotations to specify special alternative conversions. It's always
frustrating with Protobufs to have a map-style list of key-value pairs
where the key isn't a primitive type, so it can't be translated into a
std::map or similar. (Also, eg. distinguishing intent between std::map or
std::multimap.) Being able to specify "include these header files in the
generated POCS code, and use these functions for converting this field"
would neatly cover all such annoying cases. Even better, if it was
implemented with this support then the standard primitive POCS conversion
could be built upon this mechanism, with the standard conversion functions
just being the default values for the conversion annotations.