Discussion:
[capnproto] Considerations for larger, recursive datasets.
s***@gmail.com
2018-05-04 18:08:59 UTC
Hello!

I'm interested in using Cap'n Proto for serializing a large-ish quad tree
data structure that is used for querying geospatial data. I have a few
questions that I was hoping you could help me out with:

- Have you come across any message size limitations or performance
issues with larger data in Cap'n Proto? I'd love to be able to represent
indices on the order of 100 MB+.

- In a post made on 8/1/14 ("Recursive Schemas"), you mentioned that
there can only be a single pointer to other structs -- is this still the
case? I would love to be able to have a pointer to parent nodes so that I
can traverse up through the quad tree.

- How are structs laid out in memory (via arena allocation) when working
with nested structs? Is it based on the order of "init" statements? I'd
like to maintain a breadth-first layout of nested structs in my serialized
output to maintain locality of nodes at a specific depth -- do I just need
to initialize the structs in the order in which I want them laid out in the
serialized representation?

Are there any other considerations I should take into account?

Thank you!!
--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+***@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
'Kenton Varda' via Cap'n Proto
2018-05-04 19:44:27 UTC
Hi Stefan,
Post by s***@gmail.com
Hello!
I'm interested in using Cap'n Proto for serializing a large-ish quad tree
data structure that is used for querying geospatial data. I have a few
- Have you come across any message size limitations or performance
issues with larger data in Cap'n Proto? I'd love to be able to represent
indices on the order of 100 MB+.

As long as you are mmap()ing in the file, the total size of the file will
have no bearing on performance -- the only thing that matters is how much
of the object tree you actually traverse.

Note that mmap() doesn't allow for any userspace compression. If you find
you're spending a lot of time in disk I/O, you may want to enable
compression at the filesystem level.
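
To make the mmap() point concrete, here is a minimal POSIX-only sketch of mapping a file read-only. The `MappedFile` class and its method names are hypothetical helpers written for illustration, not part of Cap'n Proto. The point it shows is that mapping is cheap regardless of file size, and that only the pages actually read are faulted in from disk.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper: maps a whole file read-only (POSIX only).
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const unsigned char*>(p);
    }
    ::close(fd);  // the mapping remains valid after the descriptor is closed
  }

  ~MappedFile() {
    if (data_ != nullptr) ::munmap(const_cast<unsigned char*>(data_), size_);
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  std::size_t size() const { return size_; }

  // Number of whole 8-byte words in the file (Cap'n Proto data is word-aligned).
  std::size_t wordCount() const { return size_ / 8; }

  // Reading a byte touches only the page that contains it.
  unsigned char byteAt(std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

 private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

With the C++ library, the mapped region would then be wrapped in a reader that works over a flat array of words (for example `capnp::FlatArrayMessageReader`), so that following pointers in the message reads only the pages those objects live on.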
Post by s***@gmail.com
- In a post made on 8/1/14 ("Recursive Schemas"), you mentioned that
there can only be a single pointer to other structs -- is this still the
case? I would love to be able to have a pointer to parent nodes so that I
can traverse up through the quad tree.

Sorry, but capnp is still a tree structure, not a graph. You'll need to
remember parent node pointers on a stack as you traverse.
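
A rough sketch of the stack approach, assuming a hypothetical plain C++ `Node` struct standing in for the generated Cap'n Proto reader type (`Node`, `split`, `quadrantOf`, and `descend` are all illustrative names). The descent records every node it passes through, so the parent of any node on the path is simply the previous entry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a quad tree node; it has no parent pointer.
struct Node {
  double minX, minY, maxX, maxY;
  std::array<std::unique_ptr<Node>, 4> children;  // all null for a leaf
  bool isLeaf() const { return !children[0]; }
};

// Quadrant index for (x, y): bit 0 = upper half in x, bit 1 = upper half in y.
inline int quadrantOf(const Node& n, double x, double y) {
  const double midX = (n.minX + n.maxX) / 2;
  const double midY = (n.minY + n.maxY) / 2;
  return (x >= midX ? 1 : 0) | (y >= midY ? 2 : 0);
}

// Splits a leaf into four equal child quadrants.
inline void split(Node& n) {
  assert(n.isLeaf());
  const double midX = (n.minX + n.maxX) / 2;
  const double midY = (n.minY + n.maxY) / 2;
  for (int q = 0; q < 4; ++q) {
    auto child = std::make_unique<Node>();
    child->minX = (q & 1) ? midX : n.minX;
    child->maxX = (q & 1) ? n.maxX : midX;
    child->minY = (q & 2) ? midY : n.minY;
    child->maxY = (q & 2) ? n.maxY : midY;
    n.children[static_cast<std::size_t>(q)] = std::move(child);
  }
}

// Walks from the root to the leaf containing (x, y) and returns the whole
// path. path.back() is the leaf; the entry before it is the leaf's parent.
inline std::vector<const Node*> descend(const Node& root, double x, double y) {
  std::vector<const Node*> path;
  const Node* cur = &root;
  while (cur != nullptr) {
    path.push_back(cur);
    if (cur->isLeaf()) break;
    cur = cur->children[static_cast<std::size_t>(quadrantOf(*cur, x, y))].get();
  }
  return path;
}
```

Walking back up is then a matter of popping entries off the returned vector, which costs one pointer per level of depth rather than one per node in the file.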
Post by s***@gmail.com
- How are structs laid out in memory (via arena allocation) when
working with nested structs? Is it based on the order of "init" statements?
I'd like to maintain a breadth-first layout of nested structs in my
serialized output to maintain locality of nodes at a specific depth -- do I
just need to initialize the structs in the order at which I want them laid
out in the serialized representation?

Yes, they will be ordered in memory in the order in which they were
allocated (which happens when you call "init").
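
To illustrate why allocation order gives a breadth-first layout, here is a toy sketch that does not use Cap'n Proto at all. `BumpArena` is a hypothetical stand-in for an arena that hands out consecutive offsets, and `layoutBreadthFirst` visits an in-memory tree with a queue, "allocating" each node as it is visited. In real code the `allocate` call would be replaced by the corresponding "init" call on a builder.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical in-memory source tree that is about to be serialized.
struct SourceNode {
  int id;
  std::vector<std::unique_ptr<SourceNode>> children;
};

// Appends a new child with the given id and returns a reference to it.
inline SourceNode& addChild(SourceNode& parent, int id) {
  parent.children.push_back(std::make_unique<SourceNode>());
  parent.children.back()->id = id;
  return *parent.children.back();
}

// Toy bump allocator: objects land at increasing offsets in allocation order.
class BumpArena {
 public:
  std::size_t allocate(std::size_t words) {
    const std::size_t offset = next_;
    next_ += words;
    return offset;
  }
  std::size_t used() const { return next_; }

 private:
  std::size_t next_ = 0;
};

struct Placement {
  int id;
  std::size_t depth;
  std::size_t offset;  // in words from the start of the arena
};

// Allocates nodes in breadth-first order, so every node at depth d is placed
// before any node at depth d + 1.
inline std::vector<Placement> layoutBreadthFirst(const SourceNode& root,
                                                 BumpArena& arena,
                                                 std::size_t wordsPerNode) {
  assert(wordsPerNode > 0);
  std::vector<Placement> placements;
  std::queue<std::pair<const SourceNode*, std::size_t>> pending;
  pending.push(std::make_pair(&root, std::size_t{0}));
  while (!pending.empty()) {
    const SourceNode* node = pending.front().first;
    const std::size_t depth = pending.front().second;
    pending.pop();
    placements.push_back(Placement{node->id, depth, arena.allocate(wordsPerNode)});
    for (const auto& child : node->children) {
      pending.push(std::make_pair(child.get(), depth + 1));
    }
  }
  return placements;
}
```

A recursive, depth-first build over the same tree would instead interleave depths, which is the layout the question is trying to avoid.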

Note you may want to tune the constructor parameters to
MallocMessageBuilder to make sure you are allocating large segments, to
avoid fragmentation near the start of the message. Or you may want to
implement your own MessageBuilder subclass.
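
A toy model of why the first segment size matters. This is not the library's code: `simulateSegments` is a hypothetical function, and the growth rule (each new segment is as large as everything allocated so far) is an assumption made for illustration, so the real allocation policy may differ. It also ignores the fact that a single object cannot span segments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes, in 8-byte words, of the segments needed to hold
// totalWords under the ASSUMED policy: the first segment has
// firstSegmentWords, and each later segment is as large as all earlier
// segments combined.
inline std::vector<std::size_t> simulateSegments(std::size_t totalWords,
                                                 std::size_t firstSegmentWords) {
  assert(firstSegmentWords > 0);
  std::vector<std::size_t> segments;
  std::size_t capacity = 0;
  std::size_t nextSize = firstSegmentWords;
  while (capacity < totalWords) {
    segments.push_back(nextSize);
    capacity += nextSize;
    nextSize = capacity;  // assumed doubling heuristic
  }
  return segments;
}
```

Under this model, a 100 MiB message (13,107,200 words) that starts from a 1024-word first segment ends up spread over 15 segments, the earliest of them tiny, while a first segment sized to the whole message keeps everything in one contiguous piece.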

Post by s***@gmail.com
Are there any other considerations I should take into account?

I'm assuming this is a data structure that you'll build once, and then use
many times without modifying it further. If so, it should work well. If you
need to continuously modify, Cap'n Proto isn't very good at that right now.

-Kenton
s***@gmail.com
2018-05-07 18:29:15 UTC
Thanks for the support, Kenton!