This guide takes you through writing a simple UDF in a Gradle project. For the API documentation, please refer to the Transport UDFs API. For information about the project in general, please refer to the documentation index.
Add the following to the build.gradle of the Gradle module in which you wish to develop your UDF.
    buildscript {
      repositories {
        mavenCentral()
      }
      dependencies {
        classpath "com.linkedin.transport:transportable-udfs-plugin:+"
      }
    }

    apply plugin: "java"
    apply plugin: "com.linkedin.transport.plugin"

    repositories {
      mavenCentral()
    }
Let's write a UDF to multiply two integers. Paste the following into src/main/java/transport/example/Multiply.java inside your Transport UDF module.
    package transport.example;

    import com.linkedin.transport.api.data.StdInteger;
    import com.linkedin.transport.api.udf.StdUDF2;
    import com.linkedin.transport.api.udf.TopLevelStdUDF;
    import java.util.Arrays;
    import java.util.List;

    public class Multiply extends StdUDF2<StdInteger, StdInteger, StdInteger>
        implements TopLevelStdUDF {

      // Type signatures of the two input parameters, in order.
      @Override
      public List<String> getInputParameterSignatures() {
        return Arrays.asList("integer", "integer");
      }

      // Type signature of the returned value.
      @Override
      public String getOutputParameterSignature() {
        return "integer";
      }

      // Unwraps both inputs, multiplies them, and wraps the product again.
      @Override
      public StdInteger eval(StdInteger first, StdInteger second) {
        return getStdFactory().createInteger(first.get() * second.get());
      }

      @Override
      public String getFunctionName() {
        return "multiply";
      }

      @Override
      public String getFunctionDescription() {
        return "Multiplies two integers";
      }
    }
In the example above, StdInteger is an interface that provides high-level integer operations to its objects. Depending on the engine where this UDF is executed, this interface is implemented differently to deal with the native data types used by that engine. getStdFactory() returns the factory used to create objects that conform to a given data type. StdUDF2 is an abstract class that expresses a UDF taking two parameters; it is parameterized by the UDF input types and the UDF output type.
For more detailed documentation of the API usage, see Transport UDFs API.
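The idea that one eval body can run against several engine-specific implementations can be sketched in plain Java. This is only an illustration of the pattern, not the Transport API: SketchInteger, SketchFactory, BoxedEngineFactory, TextEngineFactory and MultiplySketch are hypothetical names invented here, and the two "engines" are imaginary.

```java
/** Hypothetical stand-in for an engine-neutral integer wrapper (not the Transport API). */
interface SketchInteger {
  int get();
}

/** Hypothetical factory that creates values in one engine's native representation. */
interface SketchFactory {
  SketchInteger createInteger(int value);
}

/** An imaginary engine that stores integers as boxed java.lang.Integer objects. */
final class BoxedEngineFactory implements SketchFactory {
  @Override
  public SketchInteger createInteger(int value) {
    final Integer nativeValue = value;
    return () -> nativeValue;
  }
}

/** Another imaginary engine that stores integers as decimal strings. */
final class TextEngineFactory implements SketchFactory {
  @Override
  public SketchInteger createInteger(int value) {
    final String nativeValue = Integer.toString(value);
    return () -> Integer.parseInt(nativeValue);
  }
}

/** UDF logic written once against the interfaces, mirroring Multiply.eval. */
final class MultiplySketch {
  private final SketchFactory factory;

  MultiplySketch(SketchFactory factory) {
    this.factory = factory;
  }

  SketchInteger eval(SketchInteger first, SketchInteger second) {
    // Only the interface is used here, so the native representation never leaks in.
    return factory.createInteger(first.get() * second.get());
  }
}
```

Constructing MultiplySketch with either factory gives the same product from get(); only the hidden native representation differs, which is roughly the role the text assigns to StdInteger and getStdFactory().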
Run gradle build from the terminal (use ./gradlew build if you are using the Gradle wrapper).
Now you should be able to see the UDF jar as well as platform-specific artifacts being built in the build/libs folder inside the module.
For instructions on how to use these artifacts, see Using Transport UDFs.
Beyond this simple example, Transport UDFs also support the following:
- Complex types (maps, arrays, structs) with generics
  - Transport UDFs can accept and return complex types. For example, the input parameter signature for a UDF that accepts a list of integers would be array(integer). You can also use generic types in place of concrete ones. For example, you can accept a generic type K as input and return an array(K), in which case the type of K will be derived at query compile time.
  - Example: MapFromTwoArraysFunction and StructCreateByNameFunction
- UDF overloading
  - You can define multiple Transport UDFs that share the same name but accept different input parameter signatures using the TopLevelStdUDF interface.
  - Example: NumericAddFunction is the interface that defines the UDF name, which is then shared by its two overloads, NumericAddIntFunction and NumericAddLongFunction.
- Accessing HDFS files in the UDF
  - The Transport UDF API provides a standard way to access and process HDFS files in UDFs (for details, see StdUDF File Processing).
  - Example: One common usage of this feature is to build hash tables (or bitmaps) from files that can then be used as lookup tables inside the UDF. Such usage is demonstrated in FileLookupFunction.
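The lookup-table usage mentioned in the last item can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code of FileLookupFunction: FileLookupSketch is a hypothetical class, it reads a local file with java.nio rather than an HDFS file obtained through the StdUDF File Processing API, and it assumes one key per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of the Transport API): a file-backed membership lookup. */
final class FileLookupSketch {
  private final Set<String> keys = new HashSet<>();

  /** Reads the file once, keeping every non-blank, trimmed line as a key. */
  FileLookupSketch(Path file) throws IOException {
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      String key = line.trim();
      if (!key.isEmpty()) {
        keys.add(key);
      }
    }
  }

  /** Per-row check: a hash lookup with no further file I/O. */
  boolean contains(String candidate) {
    return candidate != null && keys.contains(candidate);
  }
}
```

The point of the design is that the file is read once when the table is built, so each later call to contains only touches the in-memory set.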