'Sebastien Diot' via Cap'n Proto
2017-09-23 17:59:34 UTC
Hi.
Short version:
Can I "drop the last segment", if it only contains orphans?
Long version:
I've got kind of a "mental dead-lock" trying to decide how to store data
for my new project. Currently, I'm considering if I can use Lightning
Memory-Mapped Database (LMDB), together with Cap'n Proto.
I've seen "mmap-friendly mutable storage format" in the RoadMap, but I'm
not sure what it precisely means.
My concept would look about like this:
1) To be able to store multiple "tables" and "indices" and update them in a
transactional way in LMDB, they must all be in the same LMDB file. So, I
would first add a fixed-size prefix (i.e. 2 bytes) to my keys, as a
table/index-ID.
2) Then comes the row/entity ID, say 8 bytes.
3) Then comes the *segment ID* (4 bytes, AFAIK, based on the Cap'n Proto
documentation).
That way, I'm hoping I could spread a multi-segment Cap'n Proto message
over multiple keys in LMDB, and be able to both read and update my data
while limiting the need to make memory copies. I haven't actually tried
this, so it's all theoretical, but it "sounds possible" (correct me if I'm
wrong).
Currently, the main problem I see, is that you cannot append to lists. You
have to replace the list with a bigger one, thereby creating an orphan,
thereby "wasting memory". Or possibly, if that fits your use-case, create a
"linked-list" of the lists, one after the other. But that way you loose
O(1) access to lists elements, as you first have to iterate over the
"linked list", to get at the desired list.
One way for me to map the data of my "largest" entity, and the changes to
it, would be to have a variable-size map (key-value pair), containing
"changes".
I would start by allocating a segment that would be exactly the right size
for all the data, without the changes map.
Then I would add a "change map" with some small start size, for example 64
entries. This would cause the creation of a new segment (and a second
key-value pair in the DB). When the changes map fills, I would make a
temporary copy, make the map an orphan, drop the last segment (which only
contains an orphan), create a new map twice the size, causing the
allocation of a new segment, and copy the temp data back into it.
While the "changes map" would need an occasional copy, the main part of the
message, the first segment, would not, and by "replacing" the last segment,
I would not be "wasting memory", while also not needing to make a full copy
of the message to eliminate space wasted by orphans.
So,
A) Can I "drop the last segment", if it only contains orphans?
B) Is there any obvious reason I could not (effectively) use Cap'n Proto
with a memory-mapped DB, assuming A) is possible, and my "entities" have
maximum one "variable-sized" part?
Regards,
Sebastien Diot
Short version:
Can I "drop the last segment", if it only contains orphans?
Long version:
I've got kind of a "mental dead-lock" trying to decide how to store data
for my new project. Currently, I'm considering if I can use Lightning
Memory-Mapped Database (LMDB), together with Cap'n Proto.
I've seen "mmap-friendly mutable storage format" in the RoadMap, but I'm
not sure what it precisely means.
My concept would look about like this:
1) To be able to store multiple "tables" and "indices" and update them in a
transactional way in LMDB, they must all be in the same LMDB file. So, I
would first add a fixed-size prefix (i.e. 2 bytes) to my keys, as a
table/index-ID.
2) Then comes the row/entity ID, say 8 bytes.
3) Then comes the *segment ID* (4 bytes, AFAIK, based on the Cap'n Proto
documentation).
That way, I'm hoping I could spread a multi-segment Cap'n Proto message
over multiple keys in LMDB, and be able to both read and update my data
while limiting the need to make memory copies. I haven't actually tried
this, so it's all theoretical, but it "sounds possible" (correct me if I'm
wrong).
Currently, the main problem I see, is that you cannot append to lists. You
have to replace the list with a bigger one, thereby creating an orphan,
thereby "wasting memory". Or possibly, if that fits your use-case, create a
"linked-list" of the lists, one after the other. But that way you loose
O(1) access to lists elements, as you first have to iterate over the
"linked list", to get at the desired list.
One way for me to map the data of my "largest" entity, and the changes to
it, would be to have a variable-size map (key-value pair), containing
"changes".
I would start by allocating a segment that would be exactly the right size
for all the data, without the changes map.
Then I would add a "change map" with some small start size, for example 64
entries. This would cause the creation of a new segment (and a second
key-value pair in the DB). When the changes map fills, I would make a
temporary copy, make the map an orphan, drop the last segment (which only
contains an orphan), create a new map twice the size, causing the
allocation of a new segment, and copy the temp data back into it.
While the "changes map" would need an occasional copy, the main part of the
message, the first segment, would not, and by "replacing" the last segment,
I would not be "wasting memory", while also not needing to make a full copy
of the message to eliminate space wasted by orphans.
So,
A) Can I "drop the last segment", if it only contains orphans?
B) Is there any obvious reason I could not (effectively) use Cap'n Proto
with a memory-mapped DB, assuming A) is possible, and my "entities" have
maximum one "variable-sized" part?
Regards,
Sebastien Diot
--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+***@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+***@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.