Skip to content

Latest commit

 

History

History
148 lines (102 loc) · 3.62 KB

README.md

File metadata and controls

148 lines (102 loc) · 3.62 KB

DataOps


The goal of this project is to provide an extensible Java/Groovy framework for reading and/or writing data, so that disparate data sources can be linked.

It uses convention over configuration (configuration being the fallback).

This library currently provides the ability to:

  • read Excel2013, CSV and FixedWidth files
  • write to any JDBC database connection (comes with a H2 database OOTB)

Distribution

There are no public JARs available yet, simply because I haven't got anywhere to host them. If there's interest, I'll gladly publish the library (to search.maven.org?).


Example Usage: Reading a CSV file into an inmemory database connection

test.csv

Name,Age
Nick,30
Bob,27
Jack,55

example.groovy

import org.dataops.writers.JDBCWriter

def minAge = 29
new JDBCWriter()
        .read('/test.csv')
        .eachRow("select * from data where age > $minAge"){
    println "$it.name is $it.age years old."
}

console output

Nick is 30 years old.
Jack is 55 years old.

console err

INFO: create table data (
    name varchar(2048) ,
    age decimal
)

Example Usage: Reading a CSV file into a custom database connection/schema

example.groovy (snippet)

new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
        .read('/test.txt', [schemaName: 'csv'])
        .eachRow("select * from csv.data where age > $minAge"){
        // do something
}

console err

INFO: create schema if not exists csv
INFO: create table csv.data (
    name varchar(2048) ,
    age decimal
)

Example Usage: Reading files from different locations

example.groovy (snippet)

def db = new JDBCWriter()
db.read('/test.csv')                // read from system
db.read('file:/test.csv')           // read from system (using url syntax)
db.read('classpath:///test.csv')    // read from classpath (using url syntax)
db.read('http://foo/bar/test.csv')  // read from http
// or you can just use a URL
db.read(new URL('http://foo/bar/test.csv'))

Example Usage: Using the DataOps.bat file (GroovySH)

Run bin\DataOps.bat

db = new JDBCWriter()
db.read '/test.csv'
db.rows 'select * from data'

Example Usage: Different JDBC Connection strings.

DataOps comes with H2, MySQL and PostgreSQL drivers OOTB. Below are some examples on how to connect to them:

new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
new JDBCWriter(Sql.newInstance('jdbc:h2:file:/test'))
new JDBCWriter(Sql.newInstance('jdbc:postgresql://localhost:5432/test?user=postgres&password=password'))
new JDBCWriter(Sql.newInstance([
       url: 'jdbc:h2:mem:foobar',
       user: 'sa',
       password: 'sa',
       driver: 'org.h2.Driver']))

TODO

  • DONE - Build executable jar, integrated with Groovy Shell.
  • Ability to configure tableName mappings in options
  • Ability to configure creation of schemas/tables
  • Document "How to write your own Reader/Writer" (extend AbsDataReader/AbsDataWriter and register it.)
  • Test reading files from URLs (e.g. http)
  • Extend reader formats (Read html tables from web sites? Xml? Xls<2013? Other JDBC databases? etc???)
  • Introduce batch mode (pipe data into a table from command line)

References

We make use of:


Bitdeli Badge