Running CoRB

Mads Hansen edited this page Nov 2, 2023 · 2 revisions

There are a variety of ways to execute a CoRB job.

The entry point is the main method in the com.marklogic.developer.corb.Manager class. CoRB requires the MarkLogic XCC JAR on the classpath (preferably the version that corresponds to the MarkLogic server version), which can be downloaded from https://developer.marklogic.com/products/xcc.

Running CoRB as a Gradle task

ml-gradle has a CorbTask that facilitates executing CoRB as a Gradle task. Refer to https://github.com/marklogic-community/ml-gradle/wiki/Corb-and-Gradle for an overview and specific instructions for configuring and executing CoRB as a Gradle task.

Configuring Options

CoRB needs options specified through one or more of the following mechanisms:

  1. Command-line parameters.
  2. Java system properties, e.g. -DXCC-CONNECTION-URI=xcc://user:password@localhost:8202
  3. A properties file on the classpath, specified using -DOPTIONS-FILE=myjob.properties. Relative and full file system paths are also supported.

If an option is specified in more than one place, a command-line parameter takes precedence over a Java system property, which takes precedence over a property from the OPTIONS-FILE properties file.
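
The precedence rule can be illustrated with a minimal sketch. This is not CoRB's own code: the class `OptionResolver` and its `resolve` method are hypothetical names, and the three sources are modeled as a plain map and two `Properties` objects.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration of the precedence rule; not CoRB's implementation. */
final class OptionResolver {

    /**
     * Returns the effective value of an option: command line first,
     * then Java system properties, then the OPTIONS-FILE properties.
     */
    static String resolve(String name,
                          Map<String, String> commandLine,
                          Properties systemProperties,
                          Properties optionsFile) {
        String value = commandLine.get(name);
        if (value == null) {
            value = systemProperties.getProperty(name);
        }
        if (value == null) {
            value = optionsFile.getProperty(name);
        }
        return value; // null when the option is not set anywhere
    }

    public static void main(String[] args) {
        Properties file = new Properties();
        file.setProperty("THREAD-COUNT", "4");
        file.setProperty("URIS-MODULE", "get-uris.xqy");

        Properties system = new Properties();
        system.setProperty("THREAD-COUNT", "10");

        Map<String, String> cli =
                Map.of("XCC-CONNECTION-URI", "xcc://user:password@localhost:8202");

        // The system property overrides the value from the properties file.
        System.out.println(resolve("THREAD-COUNT", cli, system, file));
        // Only the properties file defines this option, so its value is used.
        System.out.println(resolve("URIS-MODULE", cli, system, file));
    }
}
```

In this example THREAD-COUNT resolves to 10 because the system property wins over the properties file, while URIS-MODULE falls through to the file.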

Note: Any or all of the properties can be specified as Java system properties or as key-value pairs in a properties file.

Note: CoRB exit codes: 0 - successful; 0 - nothing to process (the default, see EXIT-CODE-NO-URIS); 1 - initialization or connection error; 2 - execution error.
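
A calling process can act on these exit codes. The sketch below is one possible wrapper, assuming EXIT-CODE-NO-URIS is left at 0; the class `CorbExitCodes` and its `describe` method are hypothetical names, not part of CoRB.

```java
import java.io.IOException;

/** Hypothetical wrapper; the class and method names are illustrative only. */
final class CorbExitCodes {

    /** Maps a CoRB exit code to the meaning listed in the note above. */
    static String describe(int exitCode) {
        switch (exitCode) {
            case 0:
                return "successful (or nothing to process)";
            case 1:
                return "initialization or connection error";
            case 2:
                return "execution error";
            default:
                return "unexpected exit code " + exitCode;
        }
    }

    /** Runs the command passed in args, e.g. the full "java ... Manager" line. */
    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java CorbExitCodes <command> [args...]");
            return;
        }
        // inheritIO() lets the child process write to this process's console.
        Process process = new ProcessBuilder(args).inheritIO().start();
        int exitCode = process.waitFor();
        System.err.println("CoRB finished: " + describe(exitCode));
        System.exit(exitCode); // propagate the code to the caller
    }
}
```

Propagating the original exit code keeps the wrapper transparent to schedulers or scripts that already check it.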

Note: CoRB now supports logging job metrics to the MarkLogic database log and/or as a document in the database.

Usage Examples

Usage 1 - Command line options:

java -server -cp .:marklogic-xcc-11.1.0.jar:marklogic-corb-2.5.6.jar
        com.marklogic.developer.corb.Manager
        XCC-CONNECTION-URI
        [COLLECTION-NAME] [PROCESS-MODULE] [THREAD-COUNT] [URIS-MODULE] [MODULE-ROOT]
          [MODULES-DATABASE] [INSTALL] [PROCESS-TASK] [PRE-BATCH-MODULE] [PRE-BATCH-TASK]
            [POST-BATCH-MODULE] [POST-BATCH-TASK] [EXPORT-FILE-DIR] [EXPORT-FILE-NAME]
              [URIS-FILE]

Usage 2 - Java system properties specifying options:

java -server -cp .:marklogic-xcc-11.1.0.jar:marklogic-corb-2.5.6.jar
        -DXCC-CONNECTION-URI=xcc://user:password@host:port/[database]
        -DPROCESS-MODULE=module-name.xqy -DTHREAD-COUNT=10
        -DURIS-MODULE=get-uris.xqy
        -DPOST-BATCH-MODULE=post-batch.xqy
        -D...
        com.marklogic.developer.corb.Manager

Usage 3 - Properties file specifying options:

java -server -cp .:marklogic-xcc-11.1.0.jar:marklogic-corb-2.5.6.jar
        -DOPTIONS-FILE=job.properties com.marklogic.developer.corb.Manager

CoRB looks for the job.properties file on the classpath.
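
A lookup that accepts either a classpath resource or a file system path might look like the sketch below. The lookup order (classpath first, then file system) is an assumption made for illustration, and `OptionsFileLoader` is a hypothetical class, not CoRB's loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical loader; the lookup order is an assumption, not CoRB's code. */
final class OptionsFileLoader {

    /** Loads a properties file from the classpath, else from the file system. */
    static Properties load(String name) {
        Properties properties = new Properties();
        try (InputStream in = open(name)) {
            properties.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read options file " + name, e);
        }
        return properties;
    }

    private static InputStream open(String name) throws IOException {
        InputStream in = OptionsFileLoader.class.getClassLoader().getResourceAsStream(name);
        if (in != null) {
            return in; // found on the classpath
        }
        // Fall back to a relative or absolute path; fails if the file is missing.
        return Files.newInputStream(Paths.get(name));
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "job.properties";
        Properties properties = load(name);
        for (String key : properties.stringPropertyNames()) {
            System.out.println(key + "=" + properties.getProperty(key));
        }
    }
}
```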

Usage 4 - Combination of a properties file with Java system properties and command-line options:

java -server -cp .:marklogic-xcc-11.1.0.jar:marklogic-corb-2.5.6.jar
        -DOPTIONS-FILE=job.properties -DTHREAD-COUNT=10
        com.marklogic.developer.corb.Manager XCC-CONNECTION-URI
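
When CoRB is launched from another Java program, the same combined command line can be assembled as an argument list. The sketch below only builds the list; `CorbCommand` is a hypothetical helper, and the JAR names are copied from the usage examples above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assembles a CoRB command line as a list. */
final class CorbCommand {

    static final String MAIN_CLASS = "com.marklogic.developer.corb.Manager";

    static List<String> build(List<String> classpath,
                              Map<String, String> systemProperties,
                              List<String> arguments) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-server");
        command.add("-cp");
        // Uses the platform separator (":" on Unix-like systems, ";" on Windows).
        command.add(String.join(File.pathSeparator, classpath));
        for (Map.Entry<String, String> entry : systemProperties.entrySet()) {
            command.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        command.add(MAIN_CLASS);
        command.addAll(arguments); // positional command-line options come last
        return command;
    }

    public static void main(String[] args) {
        Map<String, String> systemProperties = new LinkedHashMap<>();
        systemProperties.put("OPTIONS-FILE", "job.properties");
        systemProperties.put("THREAD-COUNT", "10");

        List<String> command = build(
                Arrays.asList(".", "marklogic-xcc-11.1.0.jar", "marklogic-corb-2.5.6.jar"),
                systemProperties,
                Arrays.asList("xcc://user:password@localhost:8202"));

        // The list can be passed to new ProcessBuilder(command) to start the job.
        System.out.println(String.join(" ", command));
    }
}
```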

Sample job properties

Note: any of the properties below can also be specified as a Java system property (i.e. with the '-D' option).

sample 1 - simple batch job
XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
URIS-MODULE=get-uris.xqy  
PROCESS-MODULE=transform.xqy  
sample 2 - Use username, password, host and port specified separately instead of connection URI
XCC-USERNAME=username   
XCC-PASSWORD=password   
XCC-HOSTNAME=localhost   
XCC-PORT=9999   
XCC-DBNAME=ML-database   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
URIS-MODULE=get-uris.xqy  
PROCESS-MODULE=SampleCorbJob.xqy
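
The separate connection properties in sample 2 correspond to the parts of the xcc://user:password@host:port/[database] form shown in Usage 2. The sketch below shows how such a URI could be composed, with the user name and password percent-encoded so that characters such as '@' do not break the URI. `XccUri` is a hypothetical helper that assumes Java 10 or later; it does not describe how CoRB itself combines the values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not how CoRB itself necessarily combines the values. */
final class XccUri {

    /** Percent-encodes a user name or password for use in a URI. */
    static String encode(String value) {
        // URLEncoder emits "+" for spaces; a URI needs "%20" instead.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String build(String username, String password,
                        String host, int port, String database) {
        return "xcc://" + encode(username) + ":" + encode(password)
                + "@" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        System.out.println(build("username", "password", "localhost", 9999, "ML-database"));
        System.out.println(build("username", "p@ss word", "localhost", 9999, "ML-database"));
    }
}
```
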
sample 3 - simple batch with URIS-FILE (in place of URIS-MODULE)
XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
URIS-FILE=input-uris.csv  
PROCESS-MODULE=SampleCorbJob.xqy  
sample 4 - simple batch with XML-FILE (in place of URIS-MODULE)
XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
XML-FILE=input.xml  
XML-NODE=/rootNode/childNode
URIS-LOADER=com.marklogic.developer.corb.FileUrisXMLLoader
PROCESS-MODULE=SampleCorbJob.xqy  
sample 5 - report, generates a single file with data from processing each URI
XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB
PROCESS-MODULE=get-data-from-document.xqy   
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask   
EXPORT-FILE-NAME=/local/path/to/exportmyfile.csv   
sample 6 - report with header, add the following to sample 5
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask  
EXPORT-FILE-TOP-CONTENT=col1,col2,col3  
sample 7 - dynamic headers, assuming the pre-batch-header.xqy module returns the header row, add the following to sample 5
PRE-BATCH-MODULE=pre-batch-header.xqy  
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask   
sample 8 - pre and post batch hooks
XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
URIS-MODULE=get-uris.xqy  
PROCESS-MODULE=transform.xqy  
PRE-BATCH-MODULE=pre-batch.xqy   
POST-BATCH-MODULE=post-batch.xqy   
sample 9 - adhoc tasks

Adhoc XQuery modules are read from the local filesystem where CoRB runs, rather than from the modules database. Any XQuery module can be adhoc.

XCC-CONNECTION-URI=xcc://user:password@localhost:8202/   
THREAD-COUNT=10  
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB   
URIS-MODULE=get-uris.xqy|ADHOC   
PROCESS-MODULE=SampleCorbJob.xqy|ADHOC   
PRE-BATCH-MODULE=/local/path/to/adhoc-pre-batch.xqy|ADHOC
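
The |ADHOC suffix in these values marks a module as adhoc. A reader of such a property could separate the module path from the marker as sketched below; `ModuleReference` is a hypothetical class, and the exact, case-sensitive match on the suffix is an assumption made for illustration.

```java
/** Hypothetical parser for module values such as "get-uris.xqy|ADHOC". */
final class ModuleReference {

    private static final String ADHOC_SUFFIX = "|ADHOC";

    /** True when the value ends with the |ADHOC marker (exact match assumed). */
    static boolean isAdhoc(String value) {
        return value.trim().endsWith(ADHOC_SUFFIX);
    }

    /** Returns the module path with any |ADHOC marker removed. */
    static String path(String value) {
        String trimmed = value.trim();
        if (trimmed.endsWith(ADHOC_SUFFIX)) {
            return trimmed.substring(0, trimmed.length() - ADHOC_SUFFIX.length());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        String value = "/local/path/to/adhoc-pre-batch.xqy|ADHOC";
        System.out.println(path(value) + " adhoc=" + isAdhoc(value));
    }
}
```
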
sample 10 - jasypt encryption

XCC-CONNECTION-URI, XCC-USERNAME, XCC-PASSWORD, XCC-HOSTNAME, XCC-PORT and/or XCC-DBNAME properties can be encrypted and optionally enclosed by ENC(). If JASYPT-PROPERTIES-FILE is not specified, CoRB assumes the default file name, jasypt.properties.

XCC-CONNECTION-URI=ENC(encrypted_uri)   
DECRYPTER=com.marklogic.developer.corb.JasyptDecrypter  

sample jasypt.properties

jasypt.password=foo   
jasypt.algorithm=PBEWithMD5AndTripleDES  
sample 11 - private key encryption with java keys

XCC-CONNECTION-URI, XCC-USERNAME, XCC-PASSWORD, XCC-HOSTNAME, XCC-PORT and/or XCC-DBNAME properties can be encrypted and optionally enclosed by ENC()

XCC-CONNECTION-URI=encrypted_uri    
DECRYPTER=com.marklogic.developer.corb.PrivateKeyDecrypter  
PRIVATE-KEY-FILE=/path/to/key/private.key  
PRIVATE-KEY-ALGORITHM=RSA  
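
The general idea behind private key encryption can be sketched with the JDK alone: a value is encrypted with the public key, stored in the properties file, and decrypted with the private key at run time. The sketch below is an assumption-laden illustration, not CoRB's PrivateKeyDecrypter: the class `RsaPropertySketch`, the Base64 text encoding, and the in-memory key pair are all choices made for this example, and CoRB's key file format may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Base64;
import javax.crypto.Cipher;

/** Hypothetical RSA round trip; encoding and key handling are assumptions. */
final class RsaPropertySketch {

    static final String ALGORITHM = "RSA";

    static KeyPair generateKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance(ALGORITHM);
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts a short value (RSA limits the input size) and Base64-encodes it. */
    static String encrypt(String plainText, PublicKey publicKey) {
        try {
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            byte[] encrypted = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts a Base64 value, accepting an optional ENC(...) wrapper. */
    static String decrypt(String value, PrivateKey privateKey) {
        String text = value.trim();
        if (text.startsWith("ENC(") && text.endsWith(")")) {
            text = text.substring(4, text.length() - 1);
        }
        try {
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(text));
            return new String(decrypted, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = generateKeyPair();
        String encrypted = encrypt("xcc://user:password@localhost:8202/", keys.getPublic());
        System.out.println("XCC-CONNECTION-URI=ENC(" + encrypted + ")");
        System.out.println(decrypt("ENC(" + encrypted + ")", keys.getPrivate()));
    }
}
```
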
sample 12 - private key encryption with unix keys

XCC-CONNECTION-URI, XCC-USERNAME, XCC-PASSWORD, XCC-HOSTNAME, XCC-PORT and/or XCC-DBNAME properties can be encrypted and optionally enclosed by ENC()

XCC-CONNECTION-URI=encrypted_uri  
DECRYPTER=com.marklogic.developer.corb.PrivateKeyDecrypter  
PRIVATE-KEY-FILE=/path/to/rsa/key/private.pkcs8.key  
sample 13 - JavaScript modules deployed to modules database
MODULE-ROOT=/temp/  
MODULES-DATABASE=MY-Modules-DB  
URIS-MODULE=get-uris.sjs  
PROCESS-MODULE=transform.sjs  
sample 14 - Adhoc JavaScript modules
URIS-MODULE=get-uris.sjs|ADHOC  
PROCESS-MODULE=extract.sjs|ADHOC