Ruihao edited this page Sep 23, 2016 · 2 revisions

Attribute List

A Husky object is the smallest unit of data abstraction in Husky, and it may contain different attributes for different algorithms. For example, to run both PageRank and connected components on the same graph, the Husky object (a vertex) needs at least two attributes: a PR value and a component ID. However, the component ID is irrelevant while running PageRank, and the PR value is irrelevant to the connected-component algorithm. Generally, the more algorithms or analyses we want to perform, the more attributes we need to add to the Husky object. We may therefore end up with a bulky Husky object that carries many attributes irrelevant to any one task. Keeping those irrelevant attributes out of the data a task reads makes better use of the CPU cache and thus improves performance.

In the above example, storing all the attributes in the Husky object is essentially a row-store, where each object is like a "row" that contains all attributes. The attribute list is inspired by the column-store, where attributes are stored in columns, namely attribute lists, so that a task reads only the attributes it needs.
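The difference between the two layouts can be sketched in plain C++, independent of Husky. The names `RowVertex`, `ColumnGraph`, `sum_pr_rows`, and `sum_pr_columns` are hypothetical and only illustrate the idea; this is not how Husky stores its data internally.

```cpp
#include <cassert>
#include <vector>

// Row-store: every vertex carries all attributes, whether a task uses them or not.
struct RowVertex {
    int id;
    double pr;  // PageRank value
    int cid;    // component ID
};

// Column-store: one contiguous array per attribute, indexed by object position.
struct ColumnGraph {
    std::vector<int> ids;
    std::vector<double> pr;
    std::vector<int> cid;
};

// Sums PR values from the row layout; id and cid are pulled through the cache too.
double sum_pr_rows(const std::vector<RowVertex>& rows) {
    double total = 0.0;
    for (const RowVertex& v : rows) total += v.pr;
    return total;
}

// Sums PR values from the column layout; only the pr array is read.
double sum_pr_columns(const ColumnGraph& g) {
    // All columns must describe the same set of objects.
    assert(g.ids.size() == g.pr.size() && g.pr.size() == g.cid.size());
    double total = 0.0;
    for (double value : g.pr) total += value;
    return total;
}
```

Both functions compute the same result; the column version simply never touches the component IDs, which is the property attribute lists are meant to provide.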

Attribute

An attribute can be primitive-typed, e.g., double is sufficient for a PR value. It can also be a class that is composed of more information. For example, you can define a complex attribute type:

class ComplexAttr {
 public:
  ComplexAttr() = default;
  ~ComplexAttr() = default;
  ComplexAttr(bool bVal, double dbVal, const std::string& strVal)
      : isTrue(bVal), someDb(dbVal), someStr(strVal) {}

  bool isTrue;
  double someDb;
  std::string someStr;
};

However, avoid complex attribute types unless they are really necessary, because overusing them makes the column-store degenerate into a row-store.

Now let's see how to create an attribute list.

Create an Attribute List

An attribute list must be created through an object list, since a standalone attribute list does not make sense.

// 1. define object type
class Obj {
   public:
    using KeyT = int;
    KeyT key;
    const KeyT& id() const { return key; }
    Obj() = default;
    explicit Obj(const KeyT& k) : key(k) {}
};

// 2. Create an Object list named "v_list".
auto& objlist = ObjListFactory::create_objlist<Obj>("v_list");

// 3. Create a PR value attribute list named "pr".
auto& prlist = objlist.create_attrlist<double>("pr");

// 4. Create a component id attribute list named "cid".
auto& cidlist = objlist.create_attrlist<int>("cid");

You can access or delete the attribute lists later with the object list.

// get
auto& prlist = objlist.get_attrlist<double>("pr");

// delete
objlist.del_attrlist("pr");

[NOTE] You don't have to worry about globalizing attribute lists, because every attribute list belongs to an object list. As long as the object list is globalized, you are good to go.

Set and Get an Attribute

An attribute can be accessed either through its corresponding object or through that object's index. The public member functions for set are:

// set by index
void set(const size_t idx, AttrT&& attr);  // move
void set(const size_t idx, const AttrT& attr);  // copy

// set by object
void set(const ObjT& obj, AttrT&& attr);  // move
void set(const ObjT& obj, const AttrT& attr);  // copy
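The move and copy overloads only differ for attribute types that are expensive to copy. The following is a minimal standalone sketch of the two index-based overloads, using a hypothetical `ToyStrAttrList` class with `std::string` attributes; it mirrors the signatures above but is not Husky's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical string-valued attribute column with both index-based overloads.
class ToyStrAttrList {
 public:
    explicit ToyStrAttrList(std::size_t n) : data_(n) {}

    // Copy overload: the caller's string stays usable afterwards.
    void set(std::size_t idx, const std::string& attr) {
        assert(idx < data_.size());
        data_[idx] = attr;
    }

    // Move overload: selected for temporaries and std::move'd values.
    void set(std::size_t idx, std::string&& attr) {
        assert(idx < data_.size());
        data_[idx] = std::move(attr);
    }

    const std::string& get(std::size_t idx) const { return data_.at(idx); }

 private:
    std::vector<std::string> data_;
};
```

Calling `labels.set(0, kept)` with a named string picks the copy overload, while `labels.set(1, std::move(scratch))` picks the move overload and leaves `scratch` in a valid but unspecified state.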

The following example shows how to add an object and also set its attribute.

// 1. create an object list
auto& objlist = ObjListFactory::create_objlist<Obj>("v_list");

// 2. create an attribute list
auto& dblist = objlist.create_attrlist<double>("db");

// 3. add an object and get its index
auto idx = objlist.add_object(Obj(0));
dblist.set(idx, 1.0);

[CAUTION] Setting an attribute by object requires the object that is actually stored in the object list. Let's look at an incorrect example first:

// 1. create object list
auto& objlist = ObjListFactory::create_objlist<Obj>("v_list");

// 2. create attribute list
auto& dblist = objlist.create_attrlist<double>("db");

// 3. create a new object and add it to the object list
Obj newobj = Obj(0);
objlist.add_object(newobj);

// 4. WRONG: set by object
dblist.set(newobj, 1.0);  // WRONG!!

The above usage is wrong because newobj itself is not "in" objlist (the list holds its own copy of it), so the call leads to an "index out of range" exception. The set-by-object function is handy in the list_execute context, where the objects stored in the object list are passed as parameters to the lambda function. For example, we can do the following to set the attributes of the objects:

list_execute(objlist, [&](Obj& obj) {
    dblist.set(obj, 0.0);
});
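One way to picture why the failing call above throws is to assume that the list finds an object's index from the object's position inside its own storage. The sketch below models that assumption with a hypothetical `ToyList` class (including the helpers `index_of` and `object_at`); it is an illustration, not Husky's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

struct Obj {
    int key;
};

// Hypothetical stand-in for an object list plus one double attribute column.
class ToyList {
 public:
    // Copies obj into the list's own storage and returns its index.
    std::size_t add_object(const Obj& obj) {
        objs_.push_back(obj);
        attrs_.push_back(0.0);
        assert(objs_.size() == attrs_.size());
        return objs_.size() - 1;
    }

    // Assumed lookup rule: an object's index is its offset inside the storage.
    std::size_t index_of(const Obj& obj) const {
        const Obj* p = &obj;
        const Obj* first = objs_.data();
        const Obj* last = first + objs_.size();
        std::less<const Obj*> before;  // total order, safe for unrelated pointers
        if (before(p, first) || !before(p, last)) {
            throw std::out_of_range("index out of range");
        }
        return static_cast<std::size_t>(p - first);
    }

    void set(std::size_t idx, double value) { attrs_.at(idx) = value; }
    void set(const Obj& obj, double value) { attrs_[index_of(obj)] = value; }
    double get(std::size_t idx) const { return attrs_.at(idx); }
    Obj& object_at(std::size_t idx) { return objs_.at(idx); }

 private:
    std::vector<Obj> objs_;
    std::vector<double> attrs_;
};
```

Under this model, `list.set(newobj, 1.0)` throws for a local `newobj` because its address lies outside the list's storage, whereas `list.set(list.object_at(idx), 1.0)` succeeds because that reference points into the storage.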

The public member functions for get are:

// get by index
AttrT& get(const size_t idx);
// get by object
AttrT& get(const ObjT& obj);
// get by operator[]
AttrT& operator[] (size_t idx);
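Because all three accessors return a non-const reference, the result can be modified in place as well as read. A minimal standalone sketch, assuming a hypothetical `ToyAttrList` template that exposes the same accessor shapes (the bounds-checking behaviour shown here is this sketch's own choice, not something the Husky API is known to guarantee):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute column exposing get(idx) and operator[].
template <typename AttrT>
class ToyAttrList {
 public:
    explicit ToyAttrList(std::size_t n) : data_(n) {}

    // Checked access: throws std::out_of_range on a bad index.
    AttrT& get(std::size_t idx) { return data_.at(idx); }

    // Unchecked access in release builds; asserts in debug builds.
    AttrT& operator[](std::size_t idx) {
        assert(idx < data_.size());
        return data_[idx];
    }

 private:
    std::vector<AttrT> data_;
};

// Adds a contribution to one PR value through the returned reference.
double accumulate_rank(ToyAttrList<double>& pr, std::size_t idx, double contribution) {
    pr[idx] += contribution;
    return pr.get(idx);
}
```

Here `pr[idx] += contribution` updates the stored value directly, with no separate call to set.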