Merge pull request #6 from civitaspo/v0.1.0
V0.1.0
civitaspo committed Sep 8, 2015
2 parents 91fa8f0 + 519576c commit 2b5247b
Showing 10 changed files with 518 additions and 236 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@
build/
.idea
*.iml
+.ruby-version

13 changes: 12 additions & 1 deletion README.md
@@ -14,6 +14,7 @@ Read files on Hdfs.
- **config** overwrites configuration parameters (hash, default: `{}`)
- **input_path** file path on Hdfs. you can use glob and Date format like `%Y%m%d/%s`.
- **rewind_seconds** When you use Date format in input_path property, the format is executed by using the time which is Now minus this property.
+- **partition** when this is true, partition input files and increase task count. (default: `true`)

## Example

@@ -24,12 +25,13 @@ in:
     - /opt/analytics/etc/hadoop/conf/core-site.xml
     - /opt/analytics/etc/hadoop/conf/hdfs-site.xml
   config:
-    fs.defaultFS: 'hdfs://hdp-nn1:8020'
+    fs.defaultFS: 'hdfs://hadoop-nn1:8020'
     dfs.replication: 1
     fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
     fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
   input_path: /user/embulk/test/%Y-%m-%d/*
   rewind_seconds: 86400
+  partition: true
   decoders:
     - {type: gzip}
   parser:
@@ -50,6 +52,15 @@ in:
- {name: c3, type: long}
```
+## Note
+- the feature of the partition supports only 3 line terminators.
+  - `\n`
+  - `\r`
+  - `\r\n`

+## The Reference Implementation
+- [hito4t/embulk-input-filesplit](https://github.com/hito4t/embulk-input-filesplit)

## Build

```
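
The partition note in the README hunk above limits splitting to `\n`, `\r`, and `\r\n`. As a rough illustration only (this is not code from this commit or from hito4t/embulk-input-filesplit), the sketch below shows one way to cut a byte buffer into ranges that always end just after one of those terminators. `LineAlignedSplitter`, `nextLineStart`, and `split` are hypothetical names, and the sketch assumes an in-memory array rather than an HDFS stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not taken from the plugin: splits a buffer into line-aligned ranges. */
final class LineAlignedSplitter {

    /** Returns the index just past the first line terminator at or after pos, or data.length if none. */
    static int nextLineStart(byte[] data, int pos) {
        for (int i = pos; i < data.length; i++) {
            if (data[i] == '\n') {
                return i + 1;
            }
            if (data[i] == '\r') {
                // Treat "\r\n" as a single terminator so a pair is never cut in half.
                boolean crlf = i + 1 < data.length && data[i + 1] == '\n';
                return crlf ? i + 2 : i + 1;
            }
        }
        return data.length;
    }

    /** Splits data into at most `parts` half-open [start, end) ranges that end on line boundaries. */
    static List<int[]> split(byte[] data, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= parts && start < data.length; i++) {
            // Ideal cut point for an even split; long arithmetic avoids int overflow.
            int target = (int) ((long) data.length * i / parts);
            int end = (i == parts) ? data.length : nextLineStart(data, Math.max(start, target));
            if (end > start) {
                ranges.add(new int[] {start, end});
                start = end;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Mixed terminators: "\n", "\r\n", and a bare "\r".
        byte[] data = "a\nbb\r\nccc\rdd".getBytes(StandardCharsets.US_ASCII);
        for (int[] r : split(data, 3)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each range could then be handed to its own task, which is presumably how enabling `partition` increases the task count. Any other record separator would leave records straddling two ranges under this scheme.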
4 changes: 2 additions & 2 deletions build.gradle
@@ -12,7 +12,7 @@ configurations {
provided
}

-version = "0.0.3"
+version = "0.1.0"

sourceCompatibility = 1.7
targetCompatibility = 1.7
@@ -22,7 +22,7 @@ dependencies {
provided "org.embulk:embulk-core:0.7.0"
// compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
compile 'org.apache.hadoop:hadoop-client:2.6.0'
-compile 'com.google.guava:guava:14.0'
+compile 'com.google.guava:guava:15.0'
testCompile "junit:junit:4.+"
}

2 changes: 1 addition & 1 deletion lib/embulk/input/hdfs.rb
@@ -1,3 +1,3 @@
Embulk::JavaPlugin.register_input(
-"hdfs", "org.embulk.input.HdfsFileInputPlugin",
+"hdfs", "org.embulk.input.hdfs.HdfsFileInputPlugin",
File.expand_path('../../../../classpath', __FILE__))
231 changes: 0 additions & 231 deletions src/main/java/org/embulk/input/HdfsFileInputPlugin.java

This file was deleted.
