This repository has been archived by the owner on Dec 2, 2020. It is now read-only.

Collector

mbostock edited this page Apr 17, 2012 · 19 revisions


The collector is a server that runs (by default) on port 1080. It receives events from emitters (client programs that you write) and saves them to Cube's Mongo database. The collector also invalidates any cached metrics associated with the incoming events, so that the latest values are always available to evaluator clients.

# /1.0/event/put

Post events to Cube. The endpoint supports both HTTP POST and WebSockets. For HTTP, the body of the request should be a JSON-encoded array of events. For example, to post a single "request" event:

[
  {
    "type": "request",
    "time": "2011-09-12T21:33:12Z",
    "data": {
      "host": "web14",
      "path": "/search",
      "query": {
        "q": "flowers"
      },
      "duration_ms": 241,
      "status": 200,
      "user_agent": "Chrome/13.0.782.112"
    }
  }
]

If the above is in a file events.json, you can use curl to POST the event:

curl -X POST -d @events.json http://localhost:1080/1.0/event/put

If the post is successful, the endpoint returns a status 200 with the body "{}". If the post fails, a status 400 is returned with the body "{error: message}", where message is a description of what went wrong.
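
The same POST can also be issued from a Node emitter. The following is a minimal sketch using only Node's built-in http module. The helper names buildPutRequest and putEvents are hypothetical (they are not part of Cube), and the host and port assume a collector running locally with the default configuration:

```javascript
const http = require("http");

// Build the request options and JSON body for /1.0/event/put.
// The body is always a JSON-encoded array, even for a single event.
function buildPutRequest(events, host, port) {
  const body = JSON.stringify(events);
  return {
    options: {
      host: host || "localhost", // assumed: collector on this machine
      port: port || 1080,        // default collector port
      path: "/1.0/event/put",
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Send the events. callback(null) is called on a 200 response;
// otherwise callback receives an Error carrying the response body.
function putEvents(events, callback) {
  const request = buildPutRequest(events);
  const req = http.request(request.options, function(res) {
    let text = "";
    res.setEncoding("utf8");
    res.on("data", function(chunk) { text += chunk; });
    res.on("end", function() {
      callback(res.statusCode === 200 ? null : new Error(text));
    });
  });
  req.on("error", callback); // connection-level failures
  req.end(request.body);
}
```

An emitter would call putEvents with an array of events and log the error, if any, in the callback.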

For WebSockets, simply send events as messages. Each event should be JSON encoded. For example, to post a single "request" event:

var socket = new WebSocket("ws://localhost:1080/1.0/event/put");

socket.onopen = function() {
  socket.send(JSON.stringify({
    "type": "request",
    "time": "2011-09-12T21:33:12Z",
    "data": {
      "host": "web14",
      "path": "/search",
      "query": {
        "q": "flowers"
      },
      "duration_ms": 241,
      "status": 200,
      "user_agent": "Chrome/13.0.782.112"
    }
  }));
};

If the post is successful, no message is returned. If an error occurs, Cube replies with a JSON error message that you can log. For example:

socket.onmessage = function(message) {
  console.log("error", message.data);
};

# /collectd

See Collectd.

Events

When you send events to Cube, two fields are required:

  • type - a name for grouping events (such as "request"); typically singular, of the form [a-z_][a-zA-Z0-9_]*.
  • time - the event time in ISO 8601 format, UTC.

The type determines the name of the underlying collection in the Mongo database. For example, if you send an event of type "request", then Cube will store the event in the collection "request_events", and store the associated metrics in the collection "request_metrics". These collections and the necessary associated indexes will be created automatically if they do not already exist.
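
The naming rule can be sketched as a small helper. The function collectionNames is hypothetical, and the sketch assumes the type pattern given above is applied strictly:

```javascript
// Pattern for event types, taken from the description of the type field.
const TYPE_PATTERN = /^[a-z_][a-zA-Z0-9_]*$/;

// Return the names of the Mongo collections used for a given event type.
function collectionNames(type) {
  if (!TYPE_PATTERN.test(type)) {
    throw new Error("invalid event type: " + type);
  }
  return {
    events: type + "_events",   // where the events are stored
    metrics: type + "_metrics"  // where the cached metrics are stored
  };
}
```

For the type "request", this yields "request_events" and "request_metrics".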

While it is possible to use a single event type for all events you send to Cube, it's a good idea to use descriptive event types. By storing events in separate collections, you can create custom indexes for those events, and you can control the size of the associated metrics cache. By default, the only index on an events collection is by time ({t: 1}). If you frequently perform queries using a particular data field, then you should add an index to Cube's backing Mongo database. For example, if you frequently query "request" events by the field "path", then create an index on path and time: {"d.path": 1, t: 1}. This will greatly improve the performance of finding events for a particular path within a given time range.
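
One way to create such an index from Node is sketched below. It assumes db is a connected database handle from the official MongoDB Node driver, whose collections provide createIndex; the helper name indexEventField is hypothetical:

```javascript
// Create a compound index on one data field plus time for an event type.
// `db` is assumed to be a connected Db handle from the MongoDB Node driver.
function indexEventField(db, type, field) {
  const keys = {};
  keys["d." + field] = 1; // event data is stored under "d"
  keys.t = 1;             // event time is stored under "t"
  return db.collection(type + "_events").createIndex(keys);
}
```

Calling indexEventField(db, "request", "path") requests the index {"d.path": 1, t: 1} on the request_events collection. The field comes first so that queries for a single path can then scan a contiguous time range.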

Events may also include two optional fields:

  • id - a unique identifier, for replacing existing events.
  • data - a data object, for storing additional event data.

By specifying an id, you allow Cube to replace a previous event with new data. (Note: the new event must have the same time as the old event; otherwise, Cube will only invalidate the metrics associated with the new time.) The data field stores arbitrary JSON that you wish to associate with the event. Typically this is an object that contains a set of key-value pairs; however, you can store any JSON data, such as numbers, strings, booleans, arrays, nested objects, etc. Currently, Cube restricts the property names you can use when storing events: names must be of the form [a-zA-Z_][a-zA-Z0-9_$]*.
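
An emitter can check property names before sending. The sketch below assumes the pattern is meant to allow names of any length (a trailing *), and that the restriction applies to nested objects as well; findInvalidName is a hypothetical helper, not part of Cube:

```javascript
// Assumed pattern: a letter or underscore, followed by any number of
// letters, digits, underscores or dollar signs.
const NAME_PATTERN = /^[a-zA-Z_][a-zA-Z0-9_$]*$/;

// Return the first invalid property name found in value, or null if none.
// Arrays and nested objects are searched recursively.
function findInvalidName(value) {
  if (value === null || typeof value !== "object") return null;
  if (Array.isArray(value)) {
    for (const item of value) {
      const bad = findInvalidName(item);
      if (bad !== null) return bad;
    }
    return null;
  }
  for (const name of Object.keys(value)) {
    if (!NAME_PATTERN.test(name)) return name;
    const bad = findInvalidName(value[name]);
    if (bad !== null) return bad;
  }
  return null;
}
```

For the "request" event shown earlier this returns null, while a data object with a key such as "user-agent" would be reported.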

Internally, events are transformed slightly for more efficient representation. The above "request" event is represented in the Mongo request_events collection as:

{
  "_id" : ObjectId("47cc67093475061e3d95369d"),
  "t": ISODate("2011-09-12T21:33:12Z"),
  "d": {
    "host": "web14",
    "path": "/search",
    "query": {
      "q": "flowers"
    },
    "duration_ms": 241,
    "status": 200,
    "user_agent": "Chrome/13.0.782.112"
  }
}
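
The mapping between the two forms can be sketched as follows. This is an illustration of the transformation, not Cube's actual code; toDocument is a hypothetical name, and storing a supplied id as _id is an assumption based on the document above:

```javascript
// Convert an incoming event to the compact stored form:
// "time" becomes "t" (a Date) and "data" becomes "d".
function toDocument(event) {
  const doc = {t: new Date(event.time), d: event.data};
  // Assumed: a supplied id is used as the Mongo _id, which is what lets a
  // later event with the same id replace this one. Without an id, Mongo
  // assigns an ObjectId on insert.
  if ("id" in event) doc._id = event.id;
  return doc;
}
```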

Configuration

When constructing a cube.server, you may specify a configuration object that controls its behavior. The default configuration is as follows:

{
  "mongo-host": "127.0.0.1",
  "mongo-port": 27017,
  "mongo-database": "cube_development",
  "mongo-username": null,
  "mongo-password": null,
  "http-port": 1080
}

The mongo-host, mongo-port and mongo-database parameters control where the collector saves events, and where it finds metrics to invalidate. If your Mongo database requires authentication, specify the optional mongo-username and mongo-password parameters. The http-port parameter specifies the port the collector listens on.
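
To override only a few parameters, you might build the configuration object from the defaults. The helper below is a hypothetical sketch (collectorConfig is not part of Cube); it rejects unknown keys so that typos such as "http_port" are caught before the server starts:

```javascript
// Defaults as listed above.
const defaults = {
  "mongo-host": "127.0.0.1",
  "mongo-port": 27017,
  "mongo-database": "cube_development",
  "mongo-username": null,
  "mongo-password": null,
  "http-port": 1080
};

// Copy the defaults and apply the given overrides.
function collectorConfig(overrides) {
  const config = Object.assign({}, defaults);
  for (const key of Object.keys(overrides || {})) {
    if (!(key in defaults)) throw new Error("unknown option: " + key);
    config[key] = overrides[key];
  }
  return config;
}
```

For example, collectorConfig({"http-port": 8080}) keeps the default Mongo settings and changes only the listening port. The resulting object is what you would pass when constructing the cube.server.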

Starting and Stopping

To start the Collector:

node bin/collector &

To stop the Collector, bring the process to the foreground and press ^C:

fg
^C

Alternatively, find the process via ps and then kill it:

ps aux | grep -e 'collector' | grep -v grep | awk '{print $2}' | xargs -I {} kill -s INT {}