Large Static Graphs

Giraph Out of Core

Presented by Diptanshu
V props, E props and topo kept in separate files.
Partitions can be loaded from disk to memory on the fly, and written back to disk.
If property changes and not topo, only deltas of property loaded from disk
Sticky partitions for random set of partitions to be retained in memory. LRU to keep recently used partitions in memory.
Messages batched and written to disk in vertex ID order. Multiple batches can be in different files.
Compute method called on vertices in same order. Cursor for each file moves sequentially forward, pointing to current/newer vertex being processed. Ensures single pass thru file.

Presented by Diptanshu
Each subgraph fits in mem, but not entire graph in distributed mem
If subgraph in memory, updated async in memory. Else write to disk. Messages read when loading that subgraph from disk.
Allows subgraphs to make async progress for k supersteps.
Inactive vertices need not be loaded into memory for a given superstep after being persisted to disk.
TODO
See CCGRID 2016 paper by Safi
Look at KLA programming model

Presented by Diptanshu
Edge owned only by single vertex. Undirected means two edges.
Direct mutation on edges owned by a vertex: Add, remove, convert from directed to undirected. Visible in same superstep.
Graph mutation requests for vertices/edges not owned by a vertex (add/remove vertex/edge, optionally with properties). Mutations are applied between two ssteps.
sending message to no-existed vertex can create it.
Mutation conflicts resolved using vertex change and vertex resolver.
TODO: What is the specific order in which mutations happen?

Outedges can have custom implementations. Iterable also accessible thru different models, like ArrayList, HashMapEdgeList, ByteArrayList, etc.