[capnproto] Random access over network

Discussion:

b***@gmail.com

2018-08-16 17:32:10 UTC

Hi,

I'm investigating using Cap'n Proto as the basis for a format containing a
large collection of r-tree indexed data. The typical access pattern would
be to query the index resulting in a set of nodes in the tree. The
collection of data would be physically clustered on node indices so that
one can efficiently seek and read the data items for the searched node
indexes.

The recommendation for random access has been to simply use mmap, which I
assume would work well in this case but AFAIK it's something that is only
used for files readily available on attached block storage. However, in
this case the full dataset might very well be too large to keep locally
and the preferred access method would be streaming access over network with
the same pattern of random access using index searches.

I'm a C++ novice and I fail to understand if something remotely like this
can be done already with the reference C++ implementation. Indeed, I have
not even been able to understand if it supports sequential streaming access
of a part of a message - it seems assumed that a message is fully read into
RAM, except when using mmap which would then be the only way to partially
read a message (sequential or random). But I do not want to give up yet,
perhaps there is something I'm missing?

Regards,

Björn

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+***@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

'Kenton Varda' via Cap'n Proto

2018-08-16 18:31:28 UTC

Hi Björn,

The easiest way to make this work would be to have the data set live on a
network filesystem (e.g. NFS) or block device (e.g. NBD, iSCSI) which you
can mount on your local system and then use mmap().
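As an illustration of the mmap() route (not part of the original message), here is a minimal sketch assuming a POSIX system and a file that is already reachable through a mounted filesystem. The class name MappedFile and its wordAt() helper are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: a read-only mapping of a whole file. The kernel loads
// pages lazily, so only the parts that are actually touched get read.
class MappedFile {
public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size == 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed or empty file: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    data_ = static_cast<const unsigned char*>(p);
  }
  ~MappedFile() { ::munmap(const_cast<unsigned char*>(data_), size_); }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Copy out the 64-bit word at the given word index (bounds-checked).
  std::uint64_t wordAt(std::size_t index) const {
    if (index >= size_ / 8) throw std::out_of_range("word index");
    std::uint64_t w;
    std::memcpy(&w, data_ + index * 8, sizeof(w));
    return w;
  }

private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

With Cap'n Proto, the mapped bytes would typically be wrapped as a kj::ArrayPtr<const capnp::word> and handed to capnp::FlatArrayMessageReader, assuming the file holds an unpacked message. For very large files the reader's traversal limit (set through capnp::ReaderOptions) would probably need to be raised.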

If mounting a remote filesystem is not an option, it is technically
possible to do everything in userspace instead -- but it's tricky.
Essentially, you can implement a memory mapping entirely in userspace by
writing your own signal handler for SIGSEGV. At startup, you would create
an anonymous memory mapping that is at least the size of your remote file,
and is marked to prohibit reading. When your program attempts to read from
this space, a SIGSEGV signal is raised. In your signal handler, you look at
what address the code was trying to access (from si_addr in the siginfo_t),
you fetch the appropriate page from the remote server, you map that page
into the right place in local memory, and then you mark it as readable. On
return from the signal handler, the code continues on with the newly-mapped
data.
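The steps described above can be sketched roughly as follows. This is an illustration added to the thread, not code from the original message, and it assumes Linux. Every name in the lazy namespace is hypothetical, and fetchPage() copies from an in-memory buffer that merely stands in for the remote server. A real version would have to do network I/O from inside a signal handler and cope with multiple threads, which is where most of the difficulty lies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace lazy {

static const unsigned char* g_remote = nullptr;  // stand-in for the remote file
static std::size_t g_remoteSize = 0;
static unsigned char* g_base = nullptr;          // start of the reserved region
static std::size_t g_mappedSize = 0;
static std::size_t g_pageSize = 0;
static volatile sig_atomic_t g_faults = 0;       // number of pages fetched

// Hypothetical fetch: a real implementation would ask the server for
// the page at `offset`; here we copy from the in-memory stand-in.
static void fetchPage(std::size_t offset, unsigned char* dest) {
  std::size_t n = offset < g_remoteSize ? g_remoteSize - offset : 0;
  if (n > g_pageSize) n = g_pageSize;
  if (n > 0) std::memcpy(dest, g_remote + offset, n);
}

static void onSegv(int, siginfo_t* info, void*) {
  auto addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
  auto base = reinterpret_cast<std::uintptr_t>(g_base);
  if (g_base == nullptr || addr < base || addr >= base + g_mappedSize) {
    // Not our region: restore the default action so that the faulting
    // instruction, once retried, crashes the process as usual.
    signal(SIGSEGV, SIG_DFL);
    return;
  }
  std::size_t offset = (addr - base) / g_pageSize * g_pageSize;
  unsigned char* page = g_base + offset;
  mprotect(page, g_pageSize, PROT_READ | PROT_WRITE);  // open the page for filling
  fetchPage(offset, page);
  mprotect(page, g_pageSize, PROT_READ);               // then leave it readable
  g_faults = g_faults + 1;
  // Returning from the handler retries the faulting read, which now succeeds.
}

// Reserve an inaccessible region covering `size` bytes and install the handler.
const unsigned char* mapRemote(const unsigned char* remote, std::size_t size) {
  g_pageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  g_remote = remote;
  g_remoteSize = size;
  g_mappedSize = (size + g_pageSize - 1) / g_pageSize * g_pageSize;
  void* p = mmap(nullptr, g_mappedSize, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;
  g_base = static_cast<unsigned char*>(p);
  g_faults = 0;

  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = onSegv;
  sa.sa_flags = SA_SIGINFO;  // needed to receive si_addr in the handler
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, nullptr) != 0) return nullptr;
  return g_base;
}

int faultCount() { return static_cast<int>(g_faults); }

}  // namespace lazy
```

Reading any byte through the pointer returned by lazy::mapRemote() triggers at most one fetch per page; later reads of the same page go straight to local memory. Note that mprotect() is not on the POSIX list of async-signal-safe functions, so calling it from a handler relies on platform behaviour.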

This is, of course, pretty advanced systems hacking, and unfortunately I
don't know of a library that does it for you (though I bet one exists...
somewhere).

Otherwise, you need to split your data into smaller pieces that your
application knows how to fetch explicitly as needed...

-Kenton

Björn Harrtell

2018-08-17 07:06:04 UTC

Thanks, this makes it much clearer to me.

Some further discussion of custom SIGSEGV handling can be found at
https://stackoverflow.com/questions/24351359/mmap-for-remote-file.
However, I will likely not go down that rabbit hole. :)

I will instead consider the recommendation to split the problem up into
multiple messages with externally handled framing/indexing.
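One possible shape for such external framing, as a rough sketch under stated assumptions: each tree node's data is stored as its own independently encoded message inside one large remote blob, and a small index maps node ids to byte ranges. The names Extent, RangeFetcher, ChunkStore and makeMemoryFetcher are hypothetical, and the in-memory fetcher only stands in for a real transport (for example an HTTP range request).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Location of one independently encoded message within the remote blob.
struct Extent {
  std::uint64_t offset;
  std::uint64_t length;
};

// Hypothetical transport: returns `length` bytes starting at `offset`.
using RangeFetcher =
    std::function<std::vector<unsigned char>(std::uint64_t, std::uint64_t)>;

// Maps node ids (e.g. r-tree leaf ids) to byte ranges and fetches on demand.
class ChunkStore {
public:
  ChunkStore(std::map<std::uint32_t, Extent> index, RangeFetcher fetch)
      : index_(std::move(index)), fetch_(std::move(fetch)) {}

  // Fetch the bytes of the message stored for `nodeId`.
  std::vector<unsigned char> load(std::uint32_t nodeId) const {
    auto it = index_.find(nodeId);
    if (it == index_.end()) throw std::out_of_range("unknown node id");
    std::vector<unsigned char> bytes = fetch_(it->second.offset, it->second.length);
    if (bytes.size() != it->second.length) throw std::runtime_error("short read");
    return bytes;
  }

private:
  std::map<std::uint32_t, Extent> index_;
  RangeFetcher fetch_;
};

// In-memory stand-in for a remote server, useful for trying the idea out.
RangeFetcher makeMemoryFetcher(std::vector<unsigned char> blob) {
  return [blob = std::move(blob)](std::uint64_t off, std::uint64_t len) {
    if (off > blob.size() || len > blob.size() - off) {
      throw std::out_of_range("range outside blob");
    }
    auto first = blob.begin() + static_cast<std::ptrdiff_t>(off);
    return std::vector<unsigned char>(first, first + static_cast<std::ptrdiff_t>(len));
  };
}
```

Each chunk returned by load() would then be parsed as a complete message on its own, so only the nodes selected by the index search are ever transferred. The index itself could be a small message that is downloaded in full up front.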
