Commit

Merge pull request #16 from christianherweg0807/s3_role_by_bucket
S3 role by bucket
cherweg authored Jun 18, 2019
2 parents 41a06c3 + 32b6620 commit 20526f0
Showing 15 changed files with 777 additions and 464 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,13 @@
## 2.0.0
Breaking Changes:
- s3_key_prefix was never functional and will be removed
  Configured S3 paths are treated as regular expressions (unless they match exactly)
- s3_options_by_bucket replaces all s3_* options
  Deprecated options will be merged into the new structure for one release
Changes:
- Refactor plugin structure to be more modular
- Rework threading design
- Introduce s3_options_by_bucket to configure settings (e.g. aws_options_hash or type)
## 1.6.1
- Fix typo in gzip error logging
## 1.6.0
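The CHANGELOG says S3 path settings are matched exactly first and otherwise treated as regular expressions. The following is a minimal Ruby sketch of that lookup order, assuming a flat hash that maps folder names or patterns to settings; `options_for_folder` and the example folder names are hypothetical and not part of the plugin.

```ruby
# Hypothetical helper: resolve per-folder settings where each configured key
# is either an exact folder name or a regular expression (assumed semantics).
def options_for_folder(folder_options, folder)
  # 1. An exact key match always wins.
  return folder_options[folder] if folder_options.key?(folder)

  # 2. Otherwise treat each key as a regex; the first match wins
  #    (Ruby hashes keep insertion order).
  folder_options.each do |pattern, options|
    return options if Regexp.new(pattern).match?(folder)
  end

  # 3. No match: the caller falls back to its defaults.
  nil
end

# Example configuration (made-up folder names).
folder_options = {
  "logs/app"     => { "type" => "app" },
  "^logs/cdn-.*" => { "type" => "cdn" }
}

p options_for_folder(folder_options, "logs/app")     # exact match
p options_for_folder(folder_options, "logs/cdn-eu")  # regex match
p options_for_folder(folder_options, "other")        # no match, nil
```

Trying the exact match first means a plain folder name always resolves to its own entry, even when an earlier regex key would also match it.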
91 changes: 51 additions & 40 deletions Gemfile.lock
@@ -9,51 +9,58 @@ PATH
GEM
remote: https://rubygems.org/
specs:
aws-sdk (2.11.75)
aws-sdk-resources (= 2.11.75)
aws-sdk-core (2.11.75)
aws-eventstream (1.0.2)
aws-sdk (2.11.253)
aws-sdk-resources (= 2.11.253)
aws-sdk-core (2.11.253)
aws-sigv4 (~> 1.0)
jmespath (~> 1.0)
aws-sdk-resources (2.11.75)
aws-sdk-core (= 2.11.75)
aws-sdk-resources (2.11.253)
aws-sdk-core (= 2.11.253)
aws-sdk-v1 (1.67.0)
json (~> 1.4)
nokogiri (~> 1)
aws-sigv4 (1.0.2)
aws-sigv4 (1.1.0)
aws-eventstream (~> 1.0, >= 1.0.2)
chronic_duration (0.10.6)
numerizer (~> 0.1.1)
clamp (0.6.5)
coderay (1.1.2)
concurrent-ruby (1.0.5-java)
concurrent-ruby (1.1.5)
diff-lcs (1.3)
elasticsearch (5.0.4)
elasticsearch-api (= 5.0.4)
elasticsearch-transport (= 5.0.4)
elasticsearch-api (5.0.4)
elasticsearch (5.0.5)
elasticsearch-api (= 5.0.5)
elasticsearch-transport (= 5.0.5)
elasticsearch-api (5.0.5)
multi_json
elasticsearch-transport (5.0.4)
elasticsearch-transport (5.0.5)
faraday
multi_json
faraday (0.13.1)
faraday (0.15.4)
multipart-post (>= 1.2, < 3)
ffi (1.9.18-java)
ffi (1.10.0-java)
filesize (0.0.4)
fivemat (1.3.5)
fivemat (1.3.7)
gem_publisher (1.5.0)
gems (0.8.3)
i18n (0.6.9)
insist (1.0.0)
jar-dependencies (0.3.11)
jar-dependencies (0.4.0)
jmespath (1.4.0)
jrjackson (0.4.4-java)
jrjackson (0.4.7-java)
jruby-openssl (0.9.19-java)
json (1.8.6-java)
kramdown (1.14.0)
logstash-codec-json (3.0.5)
logstash-core-plugin-api (>= 1.60, <= 2.99)
logstash-codec-json_stream (1.0.0)
logstash-codec-line (>= 2.1.0)
logstash-core-plugin-api (>= 1.60, <= 2.99)
logstash-codec-line (3.0.8)
logstash-core-plugin-api (>= 1.60, <= 2.99)
logstash-codec-plain (3.0.6)
logstash-core-plugin-api (>= 1.60, <= 2.99)
logstash-core (5.5.1.snapshot1-java)
logstash-core (5.6.4-java)
chronic_duration (= 0.10.6)
clamp (~> 0.6.5)
concurrent-ruby (~> 1.0, >= 1.0.5)
@@ -62,7 +69,7 @@ GEM
gems (~> 0.8.3)
i18n (= 0.6.9)
jar-dependencies
jrjackson (~> 0.4.0)
jrjackson (~> 0.4.3)
jruby-openssl (= 0.9.19)
manticore (>= 0.5.4, < 1.0.0)
minitar (~> 0.5.4)
@@ -75,9 +82,9 @@ GEM
stud (~> 0.0.19)
thread_safe (~> 0.3.5)
treetop (< 1.5.0)
logstash-core-plugin-api (2.1.27-java)
logstash-core (= 5.5.1.snapshot1)
logstash-devutils (1.3.5-java)
logstash-core-plugin-api (2.1.28-java)
logstash-core (= 5.6.4)
logstash-devutils (1.3.6-java)
fivemat
gem_publisher
insist (= 1.0.0)
@@ -93,13 +100,15 @@ GEM
aws-sdk-v1 (>= 1.61.0)
logstash-codec-plain
logstash-core-plugin-api (>= 1.60, <= 2.99)
manticore (0.6.1-java)
manticore (0.6.4-java)
openssl_pkcs8_pure
method_source (0.8.2)
minitar (0.5.4)
multi_json (1.12.2)
multi_json (1.13.1)
multipart-post (2.0.0)
nokogiri (1.8.3-java)
nokogiri (1.10.2-java)
numerizer (0.1.1)
openssl_pkcs8_pure (0.0.0.2)
polyglot (0.3.5)
pry (0.10.4-java)
coderay (~> 1.1.0)
@@ -108,22 +117,22 @@ GEM
spoon (~> 0.0)
puma (2.16.0-java)
rack (1.6.6)
rack-protection (1.5.3)
rack-protection (1.5.5)
rack
rake (12.2.1)
rspec (3.7.0)
rspec-core (~> 3.7.0)
rspec-expectations (~> 3.7.0)
rspec-mocks (~> 3.7.0)
rspec-core (3.7.0)
rspec-support (~> 3.7.0)
rspec-expectations (3.7.0)
rake (12.3.2)
rspec (3.8.0)
rspec-core (~> 3.8.0)
rspec-expectations (~> 3.8.0)
rspec-mocks (~> 3.8.0)
rspec-core (3.8.0)
rspec-support (~> 3.8.0)
rspec-expectations (3.8.2)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.7.0)
rspec-mocks (3.7.0)
rspec-support (~> 3.8.0)
rspec-mocks (3.8.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.7.0)
rspec-support (3.7.0)
rspec-support (~> 3.8.0)
rspec-support (3.8.0)
rspec-wait (0.0.9)
rspec (>= 3, < 4)
ruby-maven (3.3.12)
@@ -139,7 +148,7 @@ GEM
ffi
stud (0.0.23)
thread_safe (0.3.6-java)
tilt (2.0.8)
tilt (2.0.9)
treetop (1.4.15)
polyglot
polyglot (>= 0.3.1)
@@ -148,8 +157,10 @@ PLATFORMS
java

DEPENDENCIES
logstash-codec-json_stream
logstash-devutils
logstash-input-s3-sns-sqs!

BUNDLED WITH
1.16.2
2.0.1

1 change: 1 addition & 0 deletions fixtures/log-stream.real-formatted

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions lib/logstash/inputs/codec_factory.rb
@@ -0,0 +1,41 @@
# CodecFactory:
# lazy-fetch codec plugins
# (FIXME: is this thread-safe?)
require "logstash/inputs/threadable"

#module LogStash module Inputs class S3SNSSQS < LogStash::Inputs::Threadable
class CodecFactory
def initialize(logger, options)
@logger = logger
@default_codec = options[:default_codec]
@codec_by_folder = options[:codec_by_folder]
@codecs = {
'default' => @default_codec
}
end

def get_codec(record)
codec = find_codec(record)
if @codecs[codec].nil?
@codecs[codec] = get_codec_plugin(codec)
end
@logger.debug("Switching to codec #{codec}") if codec != 'default'
return @codecs[codec]
end

private

def find_codec(record)
bucket, key, folder = record[:bucket], record[:key], record[:folder]
unless @codec_by_folder[bucket].nil?
@logger.debug("trying to find codec for folder #{folder}", :codec => @codec_by_folder[bucket][folder])
return @codec_by_folder[bucket][folder] unless @codec_by_folder[bucket][folder].nil?
end
return 'default'
end

def get_codec_plugin(name, options = {})
LogStash::Plugin.lookup('codec', name).new(options)
end
end
#end;end;end
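`CodecFactory` carries a FIXME asking whether it is thread-safe. As written, `get_codec` checks and fills `@codecs` without a lock, so two input threads can race and build the same codec twice. A minimal sketch of a lazy cache guarded by a `Mutex` follows; `LazyCodecCache` is a hypothetical name, and the block passed to `new` stands in for `LogStash::Plugin.lookup('codec', name).new`. The lock only protects the cache itself: it does not make a single codec instance safe to share between threads.

```ruby
# Hypothetical sketch: a lazily filled codec cache guarded by a Mutex.
class LazyCodecCache
  def initialize(default_codec, codec_by_folder, &builder)
    @codec_by_folder = codec_by_folder || {}
    @builder = builder                       # stand-in for Plugin.lookup(...).new
    @codecs = { "default" => default_codec }
    @mutex = Mutex.new
  end

  # record is a hash with :bucket and :folder keys, as in CodecFactory.
  def get_codec(record)
    name = find_codec(record)
    # Lookup and insert happen under one lock, so concurrent callers asking
    # for the same new codec build it only once.
    @mutex.synchronize { @codecs[name] ||= @builder.call(name) }
  end

  private

  # Hash#dig returns nil (instead of raising) when the bucket has no entry.
  def find_codec(record)
    @codec_by_folder.dig(record[:bucket], record[:folder]) || "default"
  end
end

builds = 0
cache = LazyCodecCache.new(:plain_codec, { "my-bucket" => { "json" => "json" } }) do |name|
  builds += 1                                # only ever runs under the lock
  "codec:#{name}"
end

threads = Array.new(8) { Thread.new { cache.get_codec(bucket: "my-bucket", folder: "json") } }
threads.each(&:join)
```

Eight threads request the same codec at once, yet the builder block runs a single time.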
Binary file removed lib/logstash/inputs/mime/.DS_Store
66 changes: 66 additions & 0 deletions lib/logstash/inputs/s3/client_factory.rb
@@ -0,0 +1,66 @@
# not needed - Mutex is part of core lib:
#require 'thread'
require "logstash/inputs/threadable"


#module LogStash module Inputs class S3SNSSQS < LogStash::Inputs::Threadable
class S3ClientFactory

def initialize(logger, options, aws_options_hash)
@logger = logger
@aws_options_hash = aws_options_hash
# FIXME: region per bucket?
@sts_client = Aws::STS::Client.new(region: options[:aws_region])
# FIXME: options are non-generic (...by_bucket mixes credentials with folder stuff)
@credentials_by_bucket = options[:s3_credentials_by_bucket]
@logger.debug("Credentials by Bucket", :credentials => @credentials_by_bucket)
@default_session_name = options[:s3_role_session_name]
@clients_by_bucket = {}
#@mutexes_by_bucket = {}
@creation_mutex = Mutex.new
end

def get_s3_client(bucket_name)
bucket_symbol = bucket_name.to_sym
@creation_mutex.synchronize do

if @clients_by_bucket[bucket_symbol].nil?
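# copy: merge! below must not mutate the shared defaults, or one bucket's
# credentials would leak into every client created afterwards
options = @aws_options_hash.dup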
unless @credentials_by_bucket[bucket_name].nil?
options.merge!(credentials: get_s3_auth(@credentials_by_bucket[bucket_name]))
end
@clients_by_bucket[bucket_symbol] = Aws::S3::Client.new(options)
@logger.debug("Created a new S3 Client", :bucket_name => bucket_name, :client => @clients_by_bucket[bucket_symbol], :used_options => options)
#@mutexes_by_bucket[bucket_symbol] = Mutex.new
end
end
# to stay thread-safe, call this method with a block:
#   s3_client_factory.get_s3_client(my_s3_bucket) do |s3|
#     ... do stuff with s3 ...
#   end
# FIXME: a per-bucket mutex around the yield would serialize all downloads
# from the same bucket, so it is disabled while this is being tested.
#@mutexes_by_bucket[bucket_symbol].synchronize do
yield @clients_by_bucket[bucket_symbol]
#end
end

private

def get_s3_auth(credentials)
# reminder: these are auto-refreshing!
if credentials.key?('role')
@logger.debug("Assume Role", :role => credentials["role"])
return Aws::AssumeRoleCredentials.new(
client: @sts_client,
role_arn: credentials['role'],
role_session_name: @default_session_name
)
elsif credentials.key?('access_key_id') && credentials.key?('secret_access_key')
@logger.debug("Fetch credentials", :access_key => credentials['access_key_id'])
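# Aws::Credentials.new takes positional arguments, not a hash
return Aws::Credentials.new(credentials['access_key_id'], credentials['secret_access_key'])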
end
end

end # class
#end;end;end
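`S3ClientFactory` caches one client per bucket behind a creation mutex and yields the client outside the lock, so downloads from the same bucket can run concurrently. The sketch below shows the same pattern with copy-on-merge for the per-bucket options, so credentials configured for one bucket never end up in the shared defaults. It is self-contained and hypothetical: `PerBucketClientFactory` and `FakeClient` stand in for the factory and `Aws::S3::Client`, the credentials block stands in for `get_s3_auth`, and the role ARN is a placeholder.

```ruby
# Hypothetical sketch of a per-bucket client cache (not the plugin's code).
class PerBucketClientFactory
  def initialize(base_options, credentials_by_bucket, client_class, &credentials_for)
    # Keep a private, frozen copy so the shared defaults can never be mutated.
    @base_options = base_options.dup.freeze
    @credentials_by_bucket = credentials_by_bucket || {}
    @client_class = client_class
    @credentials_for = credentials_for
    @clients = {}
    @mutex = Mutex.new
  end

  # Creates at most one client per bucket; the lock covers only creation,
  # and the client is yielded outside it so callers can work concurrently.
  def get_client(bucket)
    client = @mutex.synchronize do
      @clients[bucket] ||= @client_class.new(options_for(bucket))
    end
    yield client
  end

  private

  # Hash#merge returns a new hash, so one bucket's credentials cannot leak
  # into the options used for the next bucket.
  def options_for(bucket)
    creds = @credentials_by_bucket[bucket]
    return @base_options.dup if creds.nil?
    @base_options.merge(credentials: @credentials_for.call(creds))
  end
end

# Stand-in for Aws::S3::Client: it only records the options it was built with.
FakeClient = Struct.new(:options)

factory = PerBucketClientFactory.new(
  { region: "eu-central-1" },
  { "bucket-a" => { "role" => "arn:aws:iam::123456789012:role/reader" } },
  FakeClient
) { |creds| "assumed:#{creds['role']}" }   # stand-in for get_s3_auth

factory.get_client("bucket-a") { |s3| p s3.options }
factory.get_client("bucket-b") { |s3| p s3.options }
```

Here `bucket-b` is built with only the region; calling `merge!` on the shared defaults instead would have carried `bucket-a`'s credentials over to it.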
59 changes: 59 additions & 0 deletions lib/logstash/inputs/s3/downloader.rb
@@ -0,0 +1,59 @@
# encoding: utf-8
require 'fileutils'
require 'thread'
require "logstash/inputs/threadable"
#require "logstash/inputs/s3/remote_file"

#module LogStash module Inputs class S3SNSSQS < LogStash::Inputs::Threadable
class S3Downloader

def initialize(logger, stop_semaphore, options)
@logger = logger
@stopped = stop_semaphore
@factory = options[:s3_client_factory]
@delete_on_success = options[:delete_on_success]
end

def copy_s3object_to_disk(record)
# (from the SDK docs) WARNING: yielding data to a block disables retries of
# networking errors, so the object is written to response_target instead.
begin
@factory.get_s3_client(record[:bucket]) do |s3|
response = s3.get_object(
bucket: record[:bucket],
key: record[:key],
response_target: record[:local_file]
)
end
rescue Aws::S3::Errors::ServiceError => e
@logger.error("Unable to download file. Requeuing the message", :error => e, :record => record)
# prevent sqs message deletion
throw :skip_delete
end
throw :skip_delete if stop?
return true
end

def cleanup_local_object(record)
FileUtils.remove_entry_secure(record[:local_file], true) if ::File.exist?(record[:local_file])
rescue Exception => e
@logger.warn("Could not delete file", :file => record[:local_file], :error => e)
end

def cleanup_s3object(record)
return unless @delete_on_success
begin
@factory.get_s3_client(record[:bucket]) do |s3|
s3.delete_object(bucket: record[:bucket], key: record[:key])
end
rescue Exception => e
@logger.warn("Failed to delete s3 object", :record => record, :error => e)
end
end

def stop?
@stopped.value
end

end # class
#end;end;end
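`S3Downloader` signals failure with `throw :skip_delete`, both when the download raises an S3 service error and when the plugin is stopping, so its caller must wrap the work in a matching `catch`. That caller is not part of this diff; the following sketch assumes a hypothetical `handle_message` with stand-in lambdas for downloading, processing and deleting the SQS message.

```ruby
# Hypothetical caller: wrap work that may `throw :skip_delete` in a catch.
def handle_message(record, download:, process:, delete_message:)
  completed = catch(:skip_delete) do
    download.call(record)   # S3Downloader-style step; may throw :skip_delete
    process.call(record)
    true                    # reached only when nothing was thrown
  end
  # A bare `throw :skip_delete` makes catch return nil, so the message is not
  # deleted and becomes visible to consumers again once its visibility
  # timeout expires.
  delete_message.call(record) if completed
  completed == true
end

deleted = []
ok_download   = ->(_record) { true }
fail_download = ->(_record) { throw :skip_delete }
process       = ->(_record) { :processed }
delete        = ->(record) { deleted << record[:key] }

handle_message({ key: "a.log" }, download: ok_download, process: process, delete_message: delete)
handle_message({ key: "b.log" }, download: fail_download, process: process, delete_message: delete)
p deleted
```

Using `catch`/`throw` rather than an exception lets the downloader abort deeply nested work without it being logged or rescued as an error along the way.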