
Python Protobuf interface

Stephen Fegan edited this page Aug 5, 2016 · 33 revisions

The protobuf structures used in calin are accessible from Python through a SWIG interface. The proto definitions are converted to a SWIG interface file by an extension to the protobuf compiler; the interface file is then compiled by SWIG into C++ and built. The automatically generated interface files can be found in the build tree. For example, the protobuf for the TelescopeEvent structure, defined in proto/iact_data/telescope_event.proto, is translated into a SWIG interface file found under the build directory at proto/iact_data/telescope_event.pb.i.

Packages

Protobuf packages correspond to Python modules.

Enumerations

A protobuf enumeration such as

enum EnumType {
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 2;
}

will produce functions

var_bool = EnumType_IsValid(var_int)
var_string = EnumType_Name(var_int)
[var_bool, value] = EnumType_Parse(var_string)

which respectively:

  • check whether an integer var_int represents a valid enumerated value,
  • convert integer values to the stringified name of the enumerated value, and
  • convert stringified names back to integer values.

Here var_bool is a Python boolean (True or False), var_int is a Python integer and var_string is a Python string. In addition, two package or class constants give the minimum and maximum integer values in the enum:

var_int = EnumType_MIN
var_int = EnumType_MAX

For the example given above we would have,

>>> print(EnumType_MIN)
0
>>> print(EnumType_MAX)
2
>>> print(EnumType_IsValid(1))
True
>>> print(EnumType_IsValid(3))
False
>>> print(EnumType_Name(1))
STARTED
>>> print(EnumType_Name(3))

>>> print(EnumType_Parse('RUNNING'))
[True, 2]
>>> print(EnumType_Parse('BLAHBLAHBLAH'))
[False, 0]

If an enum is defined within a containing message, rather than in the global space, these functions and constants are part of the containing message (i.e. they are class functions and class variables).
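The three functions are often combined to validate a value before using it. The sketch below is a pure-Python stand-in that mimics the behaviour documented above for the example enum, so that it runs without calin installed; it is not the generated code. The dictionaries `_VALUES` and `_NAMES` and the helper `describe` are illustrative names only.

```python
# Pure-Python stand-in for the functions generated for EnumType.
# Illustration only: the real functions are generated by SWIG from the .proto file.
_VALUES = {"UNKNOWN": 0, "STARTED": 1, "RUNNING": 2}
_NAMES = {number: name for name, number in _VALUES.items()}

EnumType_MIN = min(_NAMES)
EnumType_MAX = max(_NAMES)


def EnumType_IsValid(var_int):
    """Return True if var_int is one of the enumerated values."""
    return var_int in _NAMES


def EnumType_Name(var_int):
    """Return the name for var_int, or an empty string if it is not valid."""
    return _NAMES.get(var_int, "")


def EnumType_Parse(var_string):
    """Return [success, value]; value is 0 when the name is unknown."""
    if var_string in _VALUES:
        return [True, _VALUES[var_string]]
    return [False, 0]


def describe(var_int):
    """Hypothetical helper: format an enum value defensively, e.g. for logging."""
    if EnumType_IsValid(var_int):
        return "%s (%d)" % (EnumType_Name(var_int), var_int)
    return "invalid value %d" % var_int


# Round trip: name -> integer -> name
good, value = EnumType_Parse("RUNNING")
if good:
    print(describe(value))   # RUNNING (2)
print(describe(3))           # invalid value 3
```

Checking the boolean returned by EnumType_Parse matters because a failed parse returns 0, which is itself a valid enum value (UNKNOWN) in this example.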

Messages

Protobuf messages produce Python classes that wrap the underlying C++ code. Messages are all derived from the base class google.Message from which they inherit the following member functions:

m.CopyFrom(m_from)
m.MergeFrom(m_from)
var_int = m.SpaceUsed()

var_string = m.DebugString()
var_string = m.ShortDebugString()
var_string = m.GetTypeName()
m.Clear()
var_bool = m.IsInitialized()
var_int = m.ByteSize()
var_bool = m.ParseFromString(var_bytes)
var_bool = m.ParsePartialFromString(var_bytes)
var_bytes = m.SerializeAsString()
var_bytes = m.SerializePartialAsString()

These functions behave as described for the corresponding C++ Message functions on the Google Protobuf documentation site.

Two additional functions are supplied that provide a convenient interface with the protobuf to JSON converter. These are:

var_bool = m.ParseFromJSON(var_string, ignore_unknown_fields=False)
var_string = m.SerializeAsJSON(include_empty_fields=False)

The first function, ParseFromJSON, parses a JSON message in var_string into the protobuf. The optional parameter ignore_unknown_fields instructs the parser to ignore any unknown fields in the JSON message. Missing fields are set to their default values. The second function, SerializeAsJSON, does the reverse, converting the protobuf to a JSON message and returning it. The optional parameter include_empty_fields instructs the converter to emit even empty (singular) fields, i.e. those that have their default values, which are usually suppressed. See the protobuf JSON documentation for more details.
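The effect of the two optional parameters can be sketched with a small stand-in class built on the standard json module. This is not calin's implementation: the class below only imitates the behaviour described above for a message with a single int32 field i, and it assumes that ParseFromJSON returns False when it meets an unknown field that it was not told to ignore. The exact JSON text produced by the real converter may also differ in formatting.

```python
import json


class SimpleMessage:
    """Stand-in for a generated message with one int32 field i (illustration only)."""

    _DEFAULTS = {"i": 0}

    def __init__(self):
        self._fields = dict(self._DEFAULTS)

    def i(self):
        return self._fields["i"]

    def set_i(self, var_int):
        self._fields["i"] = var_int

    def ParseFromJSON(self, var_string, ignore_unknown_fields=False):
        """Fill the message from a JSON string; return False on failure."""
        try:
            data = json.loads(var_string)
        except ValueError:
            return False
        if not isinstance(data, dict):
            return False
        unknown = set(data) - set(self._DEFAULTS)
        if unknown and not ignore_unknown_fields:
            return False  # assumption: unknown fields are an error by default
        # Fields missing from the JSON keep their default values
        self._fields = dict(self._DEFAULTS)
        for name in self._DEFAULTS:
            if name in data:
                self._fields[name] = data[name]
        return True

    def SerializeAsJSON(self, include_empty_fields=False):
        """Emit the fields as JSON, suppressing defaults unless asked not to."""
        out = {name: value for name, value in self._fields.items()
               if include_empty_fields or value != self._DEFAULTS[name]}
        return json.dumps(out)


m = SimpleMessage()
print(m.SerializeAsJSON())                            # default-valued field suppressed
print(m.SerializeAsJSON(include_empty_fields=True))   # field i emitted with value 0

text = '{"i": 5, "extra": 1}'                         # "extra" is not a field of the message
print(m.ParseFromJSON(text))                               # False
print(m.ParseFromJSON(text, ignore_unknown_fields=True))   # True
print(m.i())                                               # 5
```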

Message fields

Access to each field in the Protobuf message is given by Python class member functions that are generated automatically from the definition of the message. The specific Python functions produced depend on the type of field, as described in the sections below. The names of the functions are all based on the name of the field in the .proto definition; for example, a repeated field named telescopes will have a function telescopes_size() which gives the number of entries in the repeated field.

All fields have accessors, setters and functions to clear the data in the field. Functions also provide access to the description and units of the field, if they are defined in the .proto file. For all fields the following functions are provided (assuming the field is named f):

m.clear_f()
m.f_desc()
m.f_units()

The clear_f function clears the value in the field. The f_desc function returns the description string from the .proto file if it is defined, or None if not. The f_units function does the same for the units string from the .proto file.
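Because the function names follow directly from the field name, the description and units can be looked up generically with getattr. The sketch below shows this under stated assumptions: the SimpleMessage class is a stand-in written for illustration (with an invented description string), and field_label is a hypothetical helper, not part of calin.

```python
class SimpleMessage:
    """Stand-in for a generated message class with a field i (illustration only)."""

    def i_desc(self):
        return "Event number"   # invented description string

    def i_units(self):
        return None             # no units defined for this field


def field_label(m, name):
    """Hypothetical helper: build a label such as 'x [m]: Position' for a field."""
    desc = getattr(m, name + "_desc")()
    units = getattr(m, name + "_units")()
    label = name
    if units:
        label += " [%s]" % units
    if desc:
        label += ": " + desc
    return label


print(field_label(SimpleMessage(), "i"))   # i: Event number
```

Testing the return values for truthiness handles both the None case described above and an empty string.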

Singular Numeric, String and Bytes Fields

A message with a simple singular numeric or string field, such as

message SimpleMessage {
  int32 i = 1;
}

will produce the following Python member functions to get, set and clear the field i:

m = SimpleMessage()
var_int = m.i()
m.set_i(var_int)
m.clear_i()
m.i_desc()
m.i_units()

where var_int is a Python variable whose type depends on the Protobuf type of the field. The correspondence between Protobuf and Python types is given in the table below:

Protobuf type                       | Python 3 type
------------------------------------+--------------
bool                                | bool
uint32, sint32, fixed32, sfixed32   | int
uint64, sint64, fixed64, sfixed64   | int
float                               | float
double                              | float
string                              | str
bytes                               | bytes

Singular Enum Fields

A message with an enum field of type EnumType, such as

message SimpleMessage {
  EnumType e = 1;
}

produces Python member functions to get, set, and clear e,

m = SimpleMessage()
var_int = m.e()
m.set_e(var_int)
m.clear_e()
m.e_desc()
m.e_units()

In this case var_int is a Python integer holding the numeric value of the enum.

Singular Message Fields

As in the C++ implementation, embedded message fields work differently from the simple data types above. They do not have traditional setter functions that take a Message as input; instead, a mutable accessor returns a proxy that can be used to manipulate the sub-message.

For singular message fields such as:

message SubMessage {
  int32 i = 1;
}
message SimpleMessage {
  SubMessage sm = 1;
}

the Python code for the SimpleMessage class will have the following member functions:

m = SimpleMessage()
var_bool = m.has_sm()
var_proxy = m.const_sm()
var_proxy = m.mutable_sm()
var_proxy = m.sm()
m.clear_sm()
m.sm_desc()
m.sm_units()

where var_proxy is a Python SWIG proxy for the C++ instance of the SubMessage. This proxy can be used to access the fields of the sub-message; for example, the value of i can be accessed as follows:

sm = m.sm() # Here either: (1) the existing sub-message is accessed or (2) an empty one is created
var_int = sm.i()

The function m.has_sm() tests whether the field sm is set within m. The function m.clear_sm() clears any instance of sm in m. The other three functions provide access to the sub-field. Of these, the recommended accessor is the simple function m.sm(), which provides read and write access to the sub-message. The function m.const_sm() is intended to provide read-only access to the sub-field, but this is not enforced by SWIG/Python, so the user is not prevented from invoking non-const functions on var_proxy. The consequences of doing so on the underlying C++ implementation could be unfortunate; hence use of the const functions is not recommended, and they may be removed. The function m.mutable_sm() is equivalent to m.sm().
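The create-on-access behaviour of m.sm() can be sketched with two stand-in classes. These are written for illustration and are not the SWIG-generated classes; in particular, the sketch assumes that m.has_sm() returns True once m.sm() has created the empty sub-message, as is the case for mutable accessors in the C++ protobuf API.

```python
class SubMessage:
    """Stand-in for the generated SubMessage class (illustration only)."""

    def __init__(self):
        self._i = 0

    def i(self):
        return self._i

    def set_i(self, var_int):
        self._i = var_int


class SimpleMessage:
    """Stand-in for the generated SimpleMessage class (illustration only)."""

    def __init__(self):
        self._sm = None

    def has_sm(self):
        return self._sm is not None

    def sm(self):
        # Return the existing sub-message, or create an empty one on first access
        if self._sm is None:
            self._sm = SubMessage()
        return self._sm

    mutable_sm = sm   # documented as equivalent to sm()

    def clear_sm(self):
        self._sm = None


m = SimpleMessage()
print(m.has_sm())    # False: nothing has been set yet
m.sm().set_i(42)     # the proxy is used to modify the sub-message in place
print(m.has_sm())    # True
print(m.sm().i())    # 42
m.clear_sm()
print(m.has_sm())    # False
```

Since there is no setter that takes a whole sub-message, values are written through the proxy returned by the accessor, as in the m.sm().set_i(42) line.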

Repeated Numeric Fields

A Protobuf field such as,

message SimpleMessage {
  repeated int32 vec_i = 1;
}

can be accessed from Python using the following functions:

m = SimpleMessage()
var_int = m.vec_i_size()
var_int = m.vec_i(index)
m.add_vec_i(var_int)
m.set_vec_i(index, var_int)
array_int = m.vec_i()
array_int = m.vec_i_copy()
array_int = m.vec_i_view()
m.set_vec_i(array_int)
m.clear_vec_i()
m.vec_i_desc()
m.vec_i_units()

These functions behave as follows:

  • m.vec_i_size() returns the number of items in the repeated field.
  • m.vec_i(index) returns the element referred to by index. An assertion is thrown if index is not in the range -m.vec_i_size() <= index < m.vec_i_size().
  • m.add_vec_i(var_int) appends the value of var_int to the field.
  • m.set_vec_i(index, var_int) sets the value of the element referred to by index to var_int. An assertion is thrown if index is not in the range -m.vec_i_size() <= index < m.vec_i_size().
  • m.vec_i() returns a numpy array with a copy of all elements. This is identical to m.vec_i_copy() below.
  • m.vec_i_copy() returns a numpy array with a copy of all elements.
  • m.vec_i_view() returns a numpy array that gives direct access to the elements in the protobuf without copying. This is faster than m.vec_i_copy() but may cause problems if the protobuf is deleted while the array is still in use.
  • m.set_vec_i(array_int) clears any existing elements in the vector and adds all those in array_int, which must be a numpy array or a list.
  • m.clear_vec_i() clears all elements in the vector.
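The distinction between the copy and view accessors can be illustrated with an analogy from the Python standard library. The sketch below does not use calin or numpy: an array.array plays the role of the storage inside the protobuf, a second array plays the role of the result of m.vec_i_copy(), and a memoryview plays the role of the result of m.vec_i_view().

```python
from array import array

# Plays the role of the repeated field's storage inside the protobuf
storage = array("i", [10, 20, 30])

copy = array("i", storage)     # independent copy, analogous to m.vec_i_copy()
view = memoryview(storage)     # shares memory, analogous to m.vec_i_view()

# Change the underlying storage after the copy and view were taken
storage[1] = -2

print(copy[1])   # 20: the copy is unaffected
print(view[1])   # -2: the view sees the change

# A view is only meaningful while the storage it refers to is alive,
# so it is released here before the storage is discarded.
view.release()
del storage
```

The same reasoning suggests preferring the copy accessors unless the cost of copying is significant and the lifetime of the protobuf is known to exceed that of the array.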

Repeated String Fields

A repeated string field produces the same set of functions as for numeric fields. For example a message,

message SimpleMessage {
  repeated string vec_s = 1;
}

generates the following accessors,

m = SimpleMessage()
var_int = m.vec_s_size()
var_string = m.vec_s(index)
m.add_vec_s(var_string)
m.set_vec_s(index, var_string)
list_string = m.vec_s()
m.set_vec_s(list_string)
m.clear_vec_s()
m.vec_s_desc()
m.vec_s_units()

The only difference between them and the numeric accessors described in the previous section is that m.vec_s() returns a list of str (since there is no numpy array type for strings).

Repeated Bytes Fields

A repeated bytes field produces the same set of functions as numeric and string fields, apart from the array/list accessors, which are not generated in the present implementation. For example, a message

message SimpleMessage {
  repeated bytes vec_b = 1;
}

generates the following accessors,

m = SimpleMessage()
var_int = m.vec_b_size()
var_bytes = m.vec_b(index)
m.add_vec_b(var_bytes)
m.set_vec_b(index, var_bytes)
m.clear_vec_b()
m.vec_b_desc()
m.vec_b_units()

Repeated Enum Fields

message SimpleMessage {
  repeated EnumType vec_e = 1;
}

Repeated Message Fields

A repeated sub-message results in single element and vector accessors and setters, and functions to clear and append to the list. For example, the following message,

message SubMessage {
  int32 i = 1;
}
message SimpleMessage {
  repeated SubMessage vec_sm = 1;
}

produces the following set of Python functions:

m = SimpleMessage()
var_int = m.vec_sm_size()
var_proxy = m.vec_sm(index)
var_proxy = m.mutable_vec_sm(index)
var_proxy = m.const_vec_sm(index)
m.add_vec_sm(var_proxy)
m.set_vec_sm(index, var_proxy)
list_proxy = m.vec_sm()
list_proxy = m.mutable_vec_sm()
list_proxy = m.const_vec_sm()
m.set_vec_sm(list_proxy)
m.clear_vec_sm()
m.vec_sm_desc()
m.vec_sm_units()
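A typical use of the single-element accessors is to loop over the sub-messages by index. The sketch below uses stand-in classes written for illustration, not the SWIG-generated ones; only vec_sm_size(), vec_sm(index) and add_vec_sm() are imitated, and the stand-in add_vec_sm simply stores the object it is given, which may differ from how the real function handles its argument. The generator iter_vec_sm is a hypothetical helper.

```python
class SubMessage:
    """Stand-in for the generated SubMessage class (illustration only)."""

    def __init__(self):
        self._i = 0

    def i(self):
        return self._i

    def set_i(self, var_int):
        self._i = var_int


class SimpleMessage:
    """Stand-in imitating three of the repeated-message functions."""

    def __init__(self):
        self._vec_sm = []

    def vec_sm_size(self):
        return len(self._vec_sm)

    def vec_sm(self, index):
        return self._vec_sm[index]

    def add_vec_sm(self, var_proxy):
        # Assumption: the stand-in keeps a reference to the object passed in
        self._vec_sm.append(var_proxy)


def iter_vec_sm(m):
    """Hypothetical helper: yield each sub-message of m.vec_sm in order."""
    for index in range(m.vec_sm_size()):
        yield m.vec_sm(index)


m = SimpleMessage()
for value in (3, 5, 8):
    sm = SubMessage()
    sm.set_i(value)
    m.add_vec_sm(sm)

print([sm.i() for sm in iter_vec_sm(m)])   # [3, 5, 8]
```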