The goal of this project is to provide an extensible Java/Groovy framework for reading and/or writing data, so that disparate data sources can be linked.
It uses convention over configuration (configuration being the fallback).
This library currently provides the ability to:
- read Excel2013, CSV and FixedWidth files
- write to any JDBC database connection (comes with a H2 database OOTB)
There are no public JARs available yet, simply because I haven't got anywhere to host them. If there's interest, I'll gladly publish the library (to search.maven.org?).
test.csv
Name,Age
Nick,30
Bob,27
Jack,55
example.groovy
import org.dataops.writers.JDBCWriter
def minAge = 29
new JDBCWriter()
.read('/test.csv')
.eachRow("select * from data where age > $minAge"){
println "$it.name is $it.age years old."
}
console output
Nick is 30 years old.
Jack is 55 years old.
console err
INFO: create table data (
name varchar(2048) ,
age decimal
)
example.groovy (snippet)
new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
.read('/test.txt', [schemaName: 'csv'])
.eachRow("select * from csv.data where age > $minAge"){
// do something
}
console err
INFO: create schema if not exists csv
INFO: create table csv.data (
name varchar(2048) ,
age decimal
)
example.groovy (snippet)
def db = new JDBCWriter()
db.read('/test.csv') // read from system
db.read('file:/test.csv') // read from system (using url syntax)
db.read('classpath:///test.csv') // read from classpath (using url syntax)
db.read('http://foo/bar/test.csv') // read from http
// or you can just use a URL
db.read(new URL('http://foo/bar/test.csv'))
Run bin\DataOps.bat
db = new JDBCWriter()
db.read '/test.csv'
db.rows 'select * from data'
DataOps comes with H2, MySQL and PostgreSQL drivers OOTB. Below are some examples on how to connect to them:
new JDBCWriter(Sql.newInstance('jdbc:h2:mem:test'))
new JDBCWriter(Sql.newInstance('jdbc:h2:file:/test'))
new JDBCWriter(Sql.newInstance('jdbc:postgresql://localhost:5432/test?user=postgres&password=password'))
new JDBCWriter(Sql.newInstance([
url: 'jdbc:h2:mem:foobar',
user: 'sa',
password: 'sa',
driver: 'org.h2.Driver']))
- DONE - Build executable jar, integrated with Groovy Shell.
- Ability to configure tableName mappings in options
- Ability to configure creation of schemas/tables
- Document "How to write your own Reader/Writer" (extend AbsDataReader/AbsDataWriter and register it.)
- Test reading files from URLs (e.g. http)
- Extend reader formats (Read html tables from web sites? Xml? Xls<2013? Other JDBC databases? etc???)
- Introduce batch mode (pipe data into a table from command line)
We make use of:
- Apache POI - for parsing Excel files
- OpenCSV - for parsing CSV files