-
Notifications
You must be signed in to change notification settings - Fork 90
Introduction to Template Metaprogramming: Explicit Specialization
Previous: Metafunctions; Next: Variadic Templates
The previous sections set the stage and introduced the concepts needed to write
metafunctions. This section describes the last missing piece of the puzzle
which will allow us to implement if
statements and perform iteration needed
to write truly general metaprograms.
Enter "explicit specialization". The first section of this tutorial described how template types and functions have to be instantiated by providing them with arguments to produce actual types and functions (called "specializations"), which can then be used in the program. This is done implicitly by the compiler using the "blueprint" defined by the template. However, C++ allows us to explicitly instantiate a particular specialization in order to change the blueprint for that particular set of template arguments. An explicit specialization is defined by declaring a function / type template with the same name as the original template, but by providing an empty list of template parameters and listing the arguments for which the template is specialized in angle brackets after the name of the function/type.
For example, the mad
template we used before could be explicitly specialized
for float
and double
values to use the built-in compiler instruction for
combined multiply-add:
[try it]
template <typename T, typename U, typename V>
auto mad(T t, U u, V v) -> decltype(t * u + v) { return t * u + v; }
template <>
float mad<float, float, float>(float t, float u, float v) {
return intrinsic_float_mad(t, u, v);
}
template <>
double mad<double, double, double>(double t, double u, double v) {
return intrinsic_double_mad(t, u, v);
}
int test(double &x, double &y, float &z) {
x = mad(2.3, 1, 2.2); // uses an implicit spec. of the base template
y = mad(2.3, 1.0, 2.2); // uses the explicit double spec.
z = mad(2.3f, 1.0f, 2.2f); // uses the explicit float spec.
}
In the above code the compiler will do the following to find out which implementation to use:
- All overloads (both template, and non-template, but not explicit
specializations!) of the
mad
function are considered and the best match is found according to rules described in [todo.add.ref.to.standard], or less formally on cppreference. - If the best match is a function, that function is called.
- If it is a function template the template parameters are substituted with the deduced arguments, and all explicit specializations of that function template are considered. If one of them matches the substituted arguments, that specialization is instantiated, otherwise the compiler uses the base template to instantiate an implicit specialization.
Explicit specializations are not exclusive to function templates, but work with
type templates as well. The same rules for explicit specialization selection
work with type templates as with function templates (since types cannot
be overloaded, the first step is trivial). For example, the atomic
class
template can be used to implement atomic operations on objects which will be
used from multiple threads. The default implementation has to provide the
possibility of atomically storing, and loading an object, which can be
implemented using mutexes:
[try it]
template <typename T>
class atomic {
public:
atomic() = default;
atomic(T value) : data{value} {} // construction is not atomic
atomic(atomic&) = delete;
T operator=(const T &value) {
std::lock_guard<std::mutex> guard(m);
data = value;
return value;
}
operator T() {
std::lock_guard<std::mutex> guard(m);
auto tmp = data;
return tmp;
}
private:
T data;
std::mutex m;
};
However, for some built-in types (e.g. int
) the hardware has special
instructions which are guaranteed to be atomic, so mutexes do not have to be
used. An explicit specialization can be created to handle this case more
efficiently. In addition, intrinsics for special operations may exist for this
type, so additional members can be added to the specialization (some members
may also be removed, though this is usually not very useful):
[try it]
template <>
class atomic<int> {
public:
atomic() = default;
atomic(int value) : data{value} {}
atomic(atomic&) = delete;
int operator=(int value) {
return data = value;
}
operator int() {
return data;
}
int fetch_add(int value) {
return intrinsic_atomic_add(&data, value);
}
private:
int data;
};
Explicit specialization is the key to creating advanced metafunctions, as they allow the selection of different implementations depending on the input parameters, making control flow possible in metafunctions. For example, it can be useful to figure out if a template argument is of a certain type (e.g. int). A pseudocode for such a metafunction would look somewhat like this:
bool is_int(typename T) {
switch T {
case int: return true;
default: return false;
}
}
Unfortunately, the above is not valid code, but the same can be achieved with
explicit specializations. The base template can be used as the default
case,
and explicit specializations can be used as distinct case
statements:
[try it]
template <typename>
struct is_int {
static constexpr auto value = false;
};
template <>
struct is_int<int> {
static constexpr auto value = true;
};
static_assert(is_int<double>::value == false, "");
static_assert(is_int<int>::value == true, "");
The example can even be simplified by using true_type
and false_type
classes defined previously:
[try it]
template <typename> struct is_int : false_type {};
template <> struct is_int<int> : true_type {};
As soon as control flow and functions are available in a language, iteration can be implemented by creating recursive functions, with stopping conditions expressed with control flow. For example, the factorial function can be implemented as follows:
[try it]
template <int N> struct fact :
integral_constant<long, N * fact<N-1>::value> {};
template <> struct fact<0> :
integral_constant<long, 1> {};
static_assert(fact<3>::value == 6, "");
static_assert(fact<5>::value == 120, "");
Similarly, the compiler can be instructed to generate the Fibonacci sequence:
[try it]
template <int N> struct fib :
integral_constant<long, fib<N-1>::value + fib<N-2>::value> {};
template <> struct fib<0> : integral_constant<long, 0> {};
template <> struct fib<1> : integral_constant<long, 1> {};
static_assert(fib<5>::value == 5, "");
static_assert(fib<6>::value == 8, "");
For simple recursive metafunctions described here, C++11's constexpr
functions
can be used instead, but there are cases (e.g. when iterating through a complex,
recursive type) where this is not the case. Here is how the same can be achieved
with constexpr
functions:
[try it]
constexpr long fact(int n) {
return n == 0 ? 1 : n * fact(n - 1);
}
constexpr long fib(int n) {
return n == 0 ? 0 : n == 1 ? 1 : fib(n-1) + fib(n-2);
}
While (full) explicit specialization allows different implementations for
particular combinations of template parameters, sometimes it is useful to have
a separate implementation for an entire class of template parameters. For
example, the default_delete
and unique_ptr
templates implemented at the
beginning of this tutorial and used in uniqe_ptr
do not work well for
variable size arrays, since delete
will be called instead of delete[]
when
deallocating memory. It is possible to create a specialization for each type
of variable size array separately (e.g. int[]
, float[]
, double[]
), but
this would lead to a large amount of code duplication, and it would never be
possible to cover all cases.
For this reason, C++ also allows "partial" (explicit) specializations which can be used to provide an alternative implementation for an entire class of template arguments. Partial specializations are defined in the same way as full specializations, but the template parameter list does not have to be empty, but are used to provide "wildcards" in the argument list of the specialization. Thus, parameters used in that list are never specified when instantiating the template, but only appear in the list of template arguments after the identifier. The compiler has to be able to automatically deduce their substitutions from the template arguments (in the same way as with function template arguments) in order for the specialization to be valid.
A partial specialization for the implementation of unique_ptr
from the
beginning of this tutorial can be added to properly handle variable-size
arrays:
[try it]
// partial specialization for arrays of default_delete
template <typename T>
struct default_delete<T[]> {
void operator()(T arr[]) {
delete[] arr;
}
};
// partial specialization of uniqe_ptr for arrays
// (to avoid adding another pointer layer to the array)
template <typename T, typename Deleter>
class unique_ptr<T[], Deleter> {
public:
unique_ptr(T resource[] = nullptr, Deleter deleter = Deleter{})
: resource{resource}, deleter{deleter} {}
unique_ptr(const unique_ptr&) = delete;
unique_ptr(unique_ptr&& other)
: resource{std::move(other.resource)},
deleter{std::move(other.deleter)} {}
unique_ptr& operator=(const unique_ptr&) = delete;
unique_ptr& operator=(unique_ptr&& other) {
using std::swap;
swap(other.resource, resource);
swap(other.deleter, deleter);
}
~unique_ptr() {
deleter(resource);
}
T* get() { return resource; }
const T* get() const { return resource; }
private:
T* resource;
Deleter deleter;
};
int main() {
unique_ptr<int[]> arr_ptr(new int[5]); // uses the specialized version with
// correct deleter
}
Partial specialization is not supported for function templates, but only for class templates. However, problems that partial specialization solves with type templates can usually be solved for function templates using a combination of overloading and SFINAE (explained later in this tutorial).
For class templates, the compiler will select the appropriate specialization as follows:
- Arguments are substituted into the parameters of the base template, and default arguments are used for all unspecified parameters.
- Once every template parameter is substituted with an argument, all explicit
specializations (either partial or full) are scanned for a matching
argument list. In case of partal specializations, the parameters of the
specializations are deduced by substituting the arguments of the base
template into the wildcards of the partial specialization. E.g. for a
partial specialization of
foo
with template parameterT
, and wildcardT[]
:if the substitution in the base template wastemplate <typename T> foo {}; template <typename T> foo<T[]> {};
T/int[]
, the substitution in the partial specialization will beT/int
(deduced by substitutingT[]/int[]
). - If no applicable specializations are found, the base template is used to generate an implicit specialization for that substitution.
- Otherwise, the "most specialized" (as defined in [todo.reference.to.standard], or less formally on cppreference) explicit specialization is used (if there are more equaly specialized specializations, the program is ill-formed). If it is a full specialization, that specialization is used. If it is a partial specialization, the template parameters in that specialization are substituted with substitutions from step 2, and an implicit (full) specialization for those arguments is created from the partial specialization template.
Partial specialization can be used to create more complex metafunctions.
For example, the is_same
metafunction that is used for testing throughout
this tutorial can be created to check if two types are the same
(this is also a generalization of the is_int
metafunction that works for
any type):
[try it]
template <typename T, typename U> struct is_same : false_type {};
template <typename T> struct is_same<T, T> : true_type {};
// the implementation of is_int can now be simplified:
template <typename T> struct is_int : is_same<T, int> {};
static_assert(!is_same<int, float>::value, "");
static_assert(is_same<int, int>::value, "");
static_assert(!is_same<int, int&>::value, "");
static_assert(is_int<int>::value, "");
static_assert(!is_int<int&>::value, "");
As a second example, the remove_extent
metafunction removes the last
dimension of the array, if the input is an array, or leaves the input type
unchanged otherwise:
[try it]
template <typename T> struct remove_extent { using type = T; };
// specialization for variable-size arrays
template <typename T> struct remove_extent<T[]> { using type = T; };
// specialization for fixed-size arrays
template <typename T, int N> struct remove_extent<T[N]> { using type = T; };
static_assert(is_same<remove_extent<int[]>::type, int>::value, "");
static_assert(is_same<remove_extent<int[5]>::type, int>::value, "");
// no extent to remove
static_assert(is_same<remove_extent<int>::type, int>::value, "");
static_assert(is_same<remove_extent<int[5][3]>::type, int[3]>::value, "");
static_assert(is_same<remove_extent<int[][3]>::type, int[3]>::value, "");
The third example combines iteration with partial specialization to count the
number of dimensions of an array (and is an example of a metafunction that
returns values, and cannot be implemented using constexpr
functions):
[try it]
template <typename T> struct rank : integral_constant<int, 0> {};
template <typename T> struct rank<T[]> :
integral_constant<int, rank<T>::value + 1> {};
template <typename T, int N> struct rank<T[N]> : rank<T[]> {};
static_assert(rank<int>::value == 0, "");
static_assert(rank<int[]>::value == 1, "");
static_assert(rank<int[3]>::value == 1, "");
static_assert(rank<int[][2][3][5]>::value == 4, "");
Finally, partial specialization can be used to implement compile-time linked lists and operations on them:
[try it]
// template representing the nodes of the list
template <typename, typename = void> struct node {};
// metafunction that adds elements to the beginning of the list
template <typename List, typename Value>
struct push_front {
using type = node<Value, List>;
};
// metafunction that gets the first element of the list
template <typename List> struct front {};
template <typename Element, typename Next>
struct front<node<Element, Next>> {
using type = Element;
};
// metafunction that removes the first element of the list
template <typename List> struct pop_front {};
template <typename Element, typename Next>
struct pop_front<node<Element, Next>> {
using type = Next;
};
// metafunction that computes the size of the list
template <typename> struct size : integral_constant<int, 0> {};
template <typename Element, typename Next> struct size<node<Element, Next>> :
integral_constant<int, size<Next>::value + 1> {};
using single_element = push_front<void, int>::type;
static_assert(is_same<single_element, node<int, void>>::value);
using list1 =
push_front<push_front<single_element, float>::type, double>::type;
static_assert(is_same<list1, node<double, node<float, node<int, void>>>>::value);
static_assert(is_same<front<list1>::type, double>::value);
using list2 = pop_front<list1>::type;
static_assert(is_same<list2, node<float, node<int, void>>>::value);
static_assert(size<single_element>::value == 1);
static_assert(size<list1>::value == 3);
static_assert(size<list2>::value == 2);
The switch statement described earlier provides a way to choose an
implementation depending on a specific combination of template arguments.
However, sometimes it is necessary to select an implementation depending on
some expression being true or false. Usually, this is done using the
if
-else
control flow statement. While there is no such statement in the C++
template language, one can be emulated using partial specialization.
The basic idea is to use a dummy boolean template parameter, and default it to
the condition of the if statement. Then, one branch (usually the if
branch)
is implemented in the base template, and the other one (usually the else
branch) in a partial specialization. The basic structure looks like this:
template </* parameters */, bool = (/* condition */)> struct foo {
/* code to execute if condition evaluates to true */
};
template </* parameters */> struct foo</* parameters */, false> {
/* code to execute if condition evaluates to false */
};
An example of this would be creating a compile-time list of two elements, with elements sorted by their size:
[try it]
// v v - not a closing bracket!
template <typename T, typename U, bool = (sizeof(T) >= sizeof(U))>
struct make_sorted_list {
using type = node<T, node<U>>;
};
template <typename T, typename U>
struct make_sorted_list<T, U, false> {
using type = node<U, node<T>>;
};
static_assert(is_same<
make_sorted_list<double, int>::type,
node<double, node<int>>>::value, "");
static_assert(is_same<
make_sorted_list<double, int>::type,
node<double, node<int>>>::value, "");
Substituting arguments into template parameters may result in invalid code. For example, the substituted type may not have a member being requested, or values of that type may not support the operations being applied to them. The SFINAE principle states that failing to substitute an argument into a parameter does not necessarily cause an error. In case of function templates, the function template giving the error is discarded from the list of candidate overloads, and the next best candidate is considered next. An error only occurs if none of the overloads are valid candidates (or if multiple equally ranked candidates are left at the end). In case of function templates, SFINAE applies to both the function declaration and its definition.
For type templates, there is no overloads, but SFINAE still applies when selecting the best specialization. In this case, however, SFINAE only applies to the declaration, and a substitution failure in the definition does result in an error.
One of the most common uses of SFINAE is to query various properties of types.
For example, to query if a class T
has a member type alias called foo
of
a specific type, a metafunction can be created whose base template returns
false, and a specialization that returns true, but its declaration is only
valid if T
has a member called foo
:
[try it]
template <typename, typename> struct has_foo_of_type : false_type {};
template <typename T> struct has_foo_of_type<T, typename T::foo> : true_type {};
struct X { using foo = int; };
struct Y {};
static_assert(has_foo_of_type<X, int>::value, "");
static_assert(!has_foo_of_type<X, float>::value, "");
static_assert(!has_foo_of_type<Y, int>::value, "");
Checking if T
only has a type member foo
of any type is somewhat more
complicated. Fortunately, the void_type
metafunction finally shows its
usefulness. This function can be used to convert any type into a void type, so
the call to foo
in the specialization only has to be converted into void, and
the default argument for the second parameter set to void.
[try it]
template <typename, typename = void> struct has_foo : false_type {};
template <typename T> struct has_foo<T,
typename void_type<typename T::foo>::type> : true_type {};
struct X { using foo = int; };
struct Y { using foo = float; };
struct Z {};
static_assert(has_foo<X>::value);
static_assert(has_foo<Y>::value);
static_assert(!has_foo<Z>::value);
A similar idea with a dummy template parameter can be used to check if a type
supports a certain operation. For example, to see if +
is supported by a
type, the base template can just be defined as false_type
, and an expression
including the +
operation is inserted via decltype
into the declaration of
the partial specialization.
[try it]
template <typename, typename = void> struct has_add : false_type {};
template <typename T> struct has_add<T, typename void_type<
decltype(declval<T>() + declval<T>())>::type> : true_type {};
struct X {};
static_assert(has_add<int>::value, "");
static_assert(!has_add<X>::value, "");
Even though nested if
-else
can be used to express a multiple choice if
,
excessive nesting makes for less readable and harder to understand code,
especially in template metaprogramming, since each nesting level requires the
definition of a new identifier. Fortunately, a better alternative is possible
using SFINAE. The basic idea is to have each if
branch of the compound
statement expressed as a partial specialization, and cause a substitution
failure if the condition is false. The base template represents the else
branch. To cause a substitution failure, we will define a utility type template
enable_if
. enable_if
will define a type member named type
if its
input is true
, and will not define that type member if its input is false
:
[try it]
template <bool, typename T = void> struct enable_if { using type = T; }
template <typename T> struct enable_if<false, T> {};
using t = enable_if<true>::type; // works
// using t2 = enable_if<false>::type; // fails to compile
Using enable_if
, the general structure of the if
-elseif
-else
statement
looks as follows:
template </* parameters */, typename = void> struct foo {
/* else branch */
};
template </* parameters */> struct foo</* parameters */,
typename enable_if</* condition 1 */>::type> {
/* condition 1 branch */
};
//...
template </* parameters */> struct foo</* parameters */,
typename enable_if</* condition N */>::type> {
/* condition N branch */
};
This more advanced version of the if statement can be used to implement the merge algorithm, which, given two sorted lists, produces a new list containing all the elements of the other two lists in the sorted order (we compare types in the list by their size):
[try it]
template <typename L1, typename L2, typename = void> struct merge {
// nothing to do in the else branch
};
template <typename L1> struct merge<L1, void, void> {
// if the second list is empty, the result is the first list
using type = L1;
};
template <typename L2> struct merge<void, L2, void> {
// if the first list is empty, the result is the second list
using type = L2;
};
template <> struct merge<void, void, void> {
// need case for both list empty to avoid ambiguity
using type = void;
};
template <typename L1, typename L2>
struct merge<L1, L2, typename enable_if<(
sizeof(typename front<L1>::type) <=
sizeof(typename front<L2>::type))>::type> {
// if the first element of first list is smaller than the first element of
// the second list, take the first element of the first list, and merge the
// rest of the lists
using type = node<
typename front<L1>::type,
typename merge<typename pop_front<L1>::type, L2>::type>;
};
template <typename L1, typename L2>
struct merge<L1, L2, typename enable_if<(
sizeof(typename front<L1>::type) >
sizeof(typename front<L2>::type))>::type> {
// if the first element of first list is larger than the first element of
// the second list, take the first element of the second list, and merge
// the rest of the lists
using type = node<
typename front<L2>::type,
typename merge<L1, typename pop_front<L2>::type>::type>;
};
using list1 = node<int[1], node<int[2], node<int[5], node<int[7]>>>>;
using list2 = node<int[2], node<int[3], node<int[6], node<int[6]>>>>;
static_assert(is_same<
merge<list1, list2>::type,
node<int[1], node<int[2], node<int[2], node<int[3], node<int[5],
node<int[6], node<int[6], node<int[7]>>>>>>>>>::value);
Merge is the first step to implementing the merge sort algorithm. The only missing part is the split algorithm, which will split the list in two halves. Split can be implemented as two separate metafunctions, one that only leaves the first half of the values, and the other which removes the first half of the list:
[try it]
template <int K, typename L, typename = void> struct take {};
template <typename L> struct take<0, L, void> { using type = void; };
template <int K, typename First, typename Rest>
struct take<K, node<First, Rest>, typename enable_if<(K > 0)>::type> {
using type = node<First, typename take<K - 1, Rest>::type>;
};
template <int K, typename L, typename = void> struct remove {};
template <typename L> struct remove<0, L, void> { using type = L; };
template <int K, typename First, typename Rest>
struct remove<K, node<First, Rest>, typename enable_if<(K > 0)>::type> {
using type = typename remove<K - 1, Rest>::type;
};
static_assert(is_same<
take<2, node<int, node<float, node<double>>>>::type,
node<int, node<float>>>::value, "");
static_assert(is_same<
remove<2, node<int, node<float, node<double>>>>::type,
node<double>>::value, "");
Finally, with all the components in place, the merge sort algorithm can be implemented as follows:
[try it]
template <typename L, typename = void> struct merge_sort {
// the else branch will be the one with an empty or 1-element list - the
// input list just has to be returned
using type = L;
};
template <typename L>
struct merge_sort<L, typename enable_if<(size<L>::value > 1)>::type> {
// if the list is non-trivial, it has to be split in half, the halves
// sorted recursively, and merged back together
private:
static constexpr auto list_size = size<L>::value;
using first_half = typename merge_sort<
typename take<list_size / 2, L>::type>::type;
using second_half = typename merge_sort<
typename remove<list_size / 2, L>::type>::type;
public:
using type = typename merge<first_half, second_half>::type;
};
using unsorted =
node<int[3], node<int[2], node<int[5], node<int[1], node<int[4]>>>>>;
using sorted =
node<int[1], node<int[2], node<int[3], node<int[4], node<int[5]>>>>>;
static_assert(is_same<merge_sort<unsorted>::type, sorted>::value);
In addition to function and class templates, C++11 introduced type alias templates, which are templates that, when instantiated, produce a type alias:
[try it]
template <typename T> using void_t = typename void_type<T>::type;
static_assert(is_same<void_t<float>, void_type<float>::type>::value, "");
The nice part about type aliases is that even if their arguments are templates,
the compiler still knows that they produce a type, so there is no need to
specify the typename
keyword when using them. In addition, they do allow us
to create a metafunction that returns an already existing type. This seems to
solve the problem with metafunctions that return types, and the type
member
type alias convention: instead of creating a new type template, a type alias
template could be created instead.
Unfortunately, type alias templates do not provide as many features as type
templates, so they cannot always be used to replace type templates.
The major drawback is that they do not support any kind of explicit
specialization (neither full, nor partial), so they cannot be used to
implement anything that requires conditionals or iteration. In addition, older
versions of the standard (before C++14) do not specify how are unused arguments
handled. For example, the following implementation of void_t
is not
guaranteed to trigger SFINAE behavior for its argument on older compilers:
template <typename T> using void_t = void;
Thus, for everything but simple cases, the implementation of the metafunction is still done using type templates, and the resulting function is then wrapped into a type alias template to provide a simpler interface. For example:
[try it]
// simplified interface for enable_if
template <bool B, typename T = void>
using enable_if_t = typename enable_if<B, T>::type;
// simplified interface for merge
template <typename L1, typename L2>
using merge_t = typename merge<L1, L2>::type;
// simplified interface for take
template <int K, typename L>
using take_t = typename take<K, L>::type;
// simplified interface for remove
template <int K, typename L>
using remove_t = typename remove<K, L>::type;
// a simpler version of merge sort using type alias templates
template <typename L, typename = void> struct merge_sort {
// the else branch will be the one with an empty or 1-element list - the
// input list just has to be returned
using type = L;
};
template <typename L>
struct merge_sort<L, enable_if_t<(size<L>::value > 1)>> {
// if the list is non-trivial, it has to be split in half, the halves
// sorted recursively, and merged back together
using type = merge_t<
typename merge_sort<take_t<size<L>::value / 2, L>>::type,
typename merge_sort<remove_t<size<L>::value / 2, L>>::type>;
};
// simplified interface for merge_sort
template <typename L>
using merge_sort_t = typename merge_sort<L>::type;
using unsorted =
node<int[3], node<int[2], node<int[5], node<int[1], node<int[4]>>>>>;
using sorted =
node<int[1], node<int[2], node<int[3], node<int[4], node<int[5]>>>>>;
static_assert(is_same<merge_sort_t<unsorted>, sorted>::value);
It is also possible to have templates that take template parameters
and instantiate the template parameter from within the definition of the
template. For example, this can be used to implement higher-order
metafunctions, like modifying the merge sort algorithm to take a custom
comparator for comparing the values.
For more details about template template
parameters see
cpprerence.
Previous: Metafunctions; Next: Variadic Templates
Tutorial: Building a Poisson Solver
- Getting Started
- Implement: Matrices
- Implement: Solvers
- Optimize: Measuring Performance
- Optimize: Monitoring Progress
- Optimize: More Suitable Matrix Formats
- Optimize: Using a Preconditioner
- Optimize: Using GPUs
- Customize: Loggers
- Customize: Stopping Criterions
- Customize: Matrix Formats
- Customize: Solvers
- Customize: Preconditioners