Commit

Merge pull request #128 from jdlubrano/fix/double-printed-headers-when-using-with-first

Fix Double Printed Headers with `first`
jdlubrano authored Apr 18, 2022
2 parents c57e913 + 4d31799 commit 0adad3e
Showing 3 changed files with 44 additions and 1 deletion.
31 changes: 31 additions & 0 deletions README.md
@@ -55,6 +55,37 @@ stream.each_slice(100) do |lines|
end
```

##### Caveats

This library strives to provide streamed data via an `Enumerable` interface.
To remain memory-efficient, however, it makes a new GET request to the remote
URL each time the stream is iterated over. For example,

```ruby
url = 'https://my.remote.file/file.txt'
stream = StreamLines::Reading::Stream.new(url)
do_something_with_first_row(stream.first) # GET request made

stream.each do |line| # same GET request made
  # Do something with the line of data (the line will be a String)
end
```

makes two GET requests. The call to `first` makes a GET request to fetch
the first row of data. The subsequent call to `each` makes the same GET
request. To avoid unnecessary requests, I recommend a slightly different
approach, which may not be intuitive but does make only one network request:

```ruby
url = 'https://my.remote.file/file.txt'
stream = StreamLines::Reading::Stream.new(url)
stream.each_with_index do |line, i|
  do_something_with_first_row(line) if i.zero?
  # Do something with the line of data (the line will be a String)
end
```
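
The difference between the two approaches can be sketched without any network access. The `CountingStream` class below is a hypothetical stand-in, not part of this gem: it assumes only that every call to `each` triggers one fetch, and it counts those calls so the two patterns can be compared.

```ruby
# Hypothetical stand-in for a remote stream; NOT the gem's implementation.
# Each call to #each simulates one GET request by bumping a counter.
class CountingStream
  include Enumerable

  attr_reader :request_count

  def initialize(lines)
    @lines = lines
    @request_count = 0
  end

  def each(&block)
    return enum_for(:each) unless block

    @request_count += 1 # stands in for a new GET request
    @lines.each(&block)
  end
end

# `first` followed by `each`: Enumerable#first calls #each internally,
# so the stream is iterated twice.
two_pass = CountingStream.new(%w[header a b])
two_pass_first = two_pass.first
two_pass.each { |_line| }

# A single pass with `each_with_index`: the stream is iterated once.
one_pass = CountingStream.new(%w[header a b])
one_pass_first = nil
one_pass.each_with_index do |line, i|
  one_pass_first = line if i.zero?
end

puts "first + each:    #{two_pass.request_count} simulated request(s)"
puts "each_with_index: #{one_pass.request_count} simulated request(s)"
```

Both patterns see the same first line; only the number of passes over the source differs.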

##### CSVs

This gem provides first-class support for streaming CSVs from a remote URL.
3 changes: 2 additions & 1 deletion lib/stream_lines/reading/csv.rb
@@ -23,6 +23,7 @@ class CSV
       def initialize(url, **csv_options)
         @url = url
         @csv_options = accepted_csv_options(csv_options)
+        @first_row_headers = @csv_options[:headers] == true

         encoding = @csv_options[:encoding] || Encoding.default_external
         @stream = Stream.new(url, encoding: encoding)
@@ -41,7 +42,7 @@ def each(&block)
       attr_reader :url

       def first_row_headers?
-        @csv_options[:headers] == true
+        @first_row_headers
       end

       def assign_first_row_headers(first_line)
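
The diff does not show the body of `assign_first_row_headers`, so the following is only a guess at why caching the flag helps. It assumes the header option is overwritten with the parsed header row during the first iteration, which would make a later `== true` check fail and cause the header line to be yielded as data. `TinyCsvReader` and its `cache_flag` switch are hypothetical and exist only for this illustration.

```ruby
# Hypothetical miniature reader; NOT the gem's code. It assumes the header
# option is replaced by the parsed header row while iterating.
class TinyCsvReader
  def initialize(lines, headers:, cache_flag:)
    @lines = lines
    @options = { headers: headers }
    @cache_flag = cache_flag
    @first_row_headers = headers == true # computed once, before any mutation
  end

  def each
    @lines.each_with_index do |line, i|
      fields = line.chomp.split(',')
      if i.zero? && first_row_headers?
        @options[:headers] = fields # assumed mutation: no longer `true`
        next
      end
      yield fields
    end
  end

  private

  def first_row_headers?
    @cache_flag ? @first_row_headers : @options[:headers] == true
  end
end

lines = ["foo,bar\n", "1,2\n"]

# Without the cached flag, the second pass treats the header line as data.
uncached = TinyCsvReader.new(lines, headers: true, cache_flag: false)
uncached_rows = []
2.times { uncached.each { |row| uncached_rows << row } }

# With the cached flag, both passes skip the header line.
cached = TinyCsvReader.new(lines, headers: true, cache_flag: true)
cached_rows = []
2.times { cached.each { |row| cached_rows << row } }
```

Under that assumption, the new spec's pattern (`first` followed by `each`) amounts to two passes over the same reader, which is the case the cached flag protects.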
11 changes: 11 additions & 0 deletions spec/reading/csv_spec.rb
@@ -62,6 +62,17 @@
         expect(streamed_rows.map(&:to_h)).to eq([{ 'foo' => '1', 'bar' => '2' },
                                                  { 'foo' => '3', 'bar' => '4' }])
       end
+
+      it 'correctly yields all of the data' do
+        stream = described_class.new(url, headers: true)
+
+        cloud = []
+        cloud << stream.first.headers.to_csv
+        stream.each do |row|
+          cloud << row.fields.to_csv
+        end
+        expect(cloud).to eq ["foo,bar\n", "1,2\n", "3,4\n"]
+      end
     end

     context 'when the headers are provided as an array' do