Partial/incremental decompression of clusters #411
Closed
Changes from 1 commit
13 commits
3dc7700 Introduced zim::IDataStream (veloman-yunkan)
eed7ade Enter BufDataStream (veloman-yunkan)
ff249c7 zim::ReaderDataStreamWrapper (veloman-yunkan)
d344009 Enter embryonic CompressedCluster (veloman-yunkan)
9c46d28 Got rid of read_size() in cluster.cpp (veloman-yunkan)
56505a5 CompressedCluster::blobs_ (veloman-yunkan)
eaf0aed CompressedCluster is fully read via IDataStream (veloman-yunkan)
a9f2fde Enter NonCompressedCluster (veloman-yunkan)
34c1326 Enter DecodedDataStream (veloman-yunkan)
757cd27 Streaming decompression of clusters (veloman-yunkan)
ac857be Lazy/on-demand decompression of clusters (veloman-yunkan)
1c2a7b9 Removed unnecessary indentation (veloman-yunkan)
1ac51bd DecodedDataStreamTest.largeCompressedData (veloman-yunkan)
@@ -0,0 +1,33 @@
/*
 * Copyright (C) 2020 Veloman Yunkan
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * is provided AS IS, WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, and
 * NON-INFRINGEMENT. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 *
 */

#include "idatastream.h"

namespace zim
{

IDataStream::Blob
IDataStream::readBlobImpl(size_t size)
{
  std::shared_ptr<char> buf(new char[size], std::default_delete<char[]>());
  readImpl(buf.get(), size);
  return Blob(buf, size);
}

} // namespace zim
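The default `readBlobImpl()` above allocates with `new char[size]` and hands the buffer to a `std::shared_ptr<char>` together with `std::default_delete<char[]>`, so that the buffer is released with `delete[]` rather than `delete`. The following is a minimal standalone sketch of that ownership pattern; `makeSharedBuffer` and `sharedBufferDemo` are hypothetical names used only for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper (not part of the PR): returns a zero-filled buffer of
// 'size' bytes whose lifetime is managed by a shared_ptr.
std::shared_ptr<char> makeSharedBuffer(std::size_t size)
{
  // A shared_ptr<char> constructed from a raw pointer alone would release it
  // with plain 'delete'. Passing std::default_delete<char[]> makes it use
  // 'delete[]', which matches the array allocation.
  std::shared_ptr<char> buf(new char[size], std::default_delete<char[]>());
  std::memset(buf.get(), 0, size);
  return buf;
}

// Usage: copies share one buffer, which is freed when the last copy is gone.
void sharedBufferDemo()
{
  std::shared_ptr<char> buf = makeSharedBuffer(16);
  // shared_ptr<char> converts implicitly to shared_ptr<const char>, which is
  // the pointer type that IDataStream::Blob stores.
  std::shared_ptr<const char> readOnly = buf;
  assert(readOnly.get() == buf.get());
  assert(buf.use_count() == 2);
}
```

Because the deleter is stored in the control block, the conversion to `std::shared_ptr<const char>` keeps the same `delete[]` behaviour.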
@@ -0,0 +1,116 @@
/*
 * Copyright (C) 2020 Veloman Yunkan
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * is provided AS IS, WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, and
 * NON-INFRINGEMENT. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 *
 */

#ifndef ZIM_IDATASTREAM_H
#define ZIM_IDATASTREAM_H

#include <exception>
#include <memory>

#include "endian_tools.h"

namespace zim
{

// IDataStream is a simple interface for sequential iteration over a stream
// of values of built-in/primitive types and/or opaque binary objects (blobs).
// An example usage:
//
//   void foo(IDataStream& s)
//   {
//     const uint32_t n = s.read<uint32_t>();
//     for(uint32_t i=0; i < n; ++i)
//     {
//       const uint16_t blobSize = s.read<uint16_t>();
//       IDataStream::Blob blob = s.readBlob(blobSize);
//       bar(blob, blobSize);
//     }
//   }
//
class IDataStream
{
public: // types
  class Blob
  {
  private: // types
    typedef std::shared_ptr<const char> DataPtr;

  public: // functions
    Blob(const DataPtr& data, size_t size) : data_(data) , size_(size) {}

    const char* data() const { return data_.get(); }
    size_t size() const { return size_; }

  private: // data
    DataPtr data_;
    size_t size_;
  };

public: // functions
  virtual ~IDataStream() {}

  // Reads a value of the said type from the stream
  //
  // For best portability this function should be used with types of known
  // bit-width (int32_t, uint16_t, etc) rather than builtin types with
  // unknown bit-width (int, unsigned, etc).
  template<typename T> T read();

  // Reads a blob of the specified size from the stream
  Blob readBlob(size_t size);

private: // virtual methods
  // Reads exactly 'nbytes' bytes into the provided buffer 'buf'
  // (which must be at least that big). Throws an exception if
  // more bytes are requested than can be retrieved.
  virtual void readImpl(void* buf, size_t nbytes) = 0;

  // By default a blob is returned as an independent object owning
  // its own buffer. However, the function readBlobImpl() can be
  // overridden so that it returns a blob referring to arbitrary
  // pre-existing memory.
  virtual Blob readBlobImpl(size_t size);
};

////////////////////////////////////////////////////////////////////////////////
// Implementation of IDataStream
////////////////////////////////////////////////////////////////////////////////

// XXX: Assuming that opaque binary data retrieved via 'readImpl()'
// XXX: is encoded in little-endian form.
template<typename T>
inline T
IDataStream::read()
{
  const size_t N = sizeof(T);
  char buf[N];
  readImpl(&buf, N);
  return fromLittleEndian<T>(buf); // XXX: This handles only integral types
}

inline
IDataStream::Blob
IDataStream::readBlob(size_t size)
{
  return readBlobImpl(size);
}

} // namespace zim

#endif // ZIM_IDATASTREAM_H
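The header comments say that `readImpl()` must throw when more bytes are requested than are available, and that `readBlobImpl()` may be overridden to return a blob referring to pre-existing memory. The sketch below illustrates both points with an in-memory stream. It is only an illustration under stated assumptions: `DataStream` is a simplified stand-in for `zim::IDataStream` (with a public-member `Blob` and an inline little-endian decoder instead of `fromLittleEndian`), `MemoryStream` is a hypothetical class, and neither is claimed to match the `BufDataStream` added later in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>

// Simplified stand-in for zim::IDataStream (an assumption made so that this
// sketch compiles on its own; it is not the header from the diff).
class DataStream
{
public:
  struct Blob
  {
    std::shared_ptr<const char> data;
    std::size_t size;
  };

  virtual ~DataStream() {}

  // Decodes an unsigned integer stored in little-endian byte order.
  template<typename T> T read()
  {
    unsigned char buf[sizeof(T)];
    readImpl(buf, sizeof(T));
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
      value = static_cast<T>(value | (static_cast<T>(buf[i]) << (8 * i)));
    return value;
  }

  Blob readBlob(std::size_t size) { return readBlobImpl(size); }

private:
  virtual void readImpl(void* buf, std::size_t nbytes) = 0;

  // Default behaviour: the blob owns a fresh copy of the bytes.
  virtual Blob readBlobImpl(std::size_t size)
  {
    std::shared_ptr<char> buf(new char[size], std::default_delete<char[]>());
    readImpl(buf.get(), size);
    return Blob{buf, size};
  }
};

// Hypothetical stream over a buffer that is already in memory.
class MemoryStream : public DataStream
{
public:
  MemoryStream(std::shared_ptr<const char> data, std::size_t size)
    : data_(std::move(data)), size_(size), pos_(0) {}

private:
  void checkAvailable(std::size_t nbytes) const
  {
    assert(pos_ <= size_);
    if (nbytes > size_ - pos_)
      throw std::runtime_error("MemoryStream: read past end of data");
  }

  void readImpl(void* buf, std::size_t nbytes) override
  {
    checkAvailable(nbytes);
    std::memcpy(buf, data_.get() + pos_, nbytes);
    pos_ += nbytes;
  }

  // Zero-copy override: the aliasing constructor of shared_ptr yields a
  // pointer into the existing buffer that shares ownership with data_.
  Blob readBlobImpl(std::size_t size) override
  {
    checkAvailable(size);
    Blob blob{std::shared_ptr<const char>(data_, data_.get() + pos_), size};
    pos_ += size;
    return blob;
  }

  std::shared_ptr<const char> data_;
  std::size_t size_;
  std::size_t pos_;
};
```

With this override a blob returned by `readBlob()` points directly into the source buffer and keeps it alive, so no bytes are copied; streams that cannot expose their storage can keep the copying default.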
@@ -0,0 +1,58 @@
/*
 * Copyright (C) 2020 Veloman Yunkan
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * is provided AS IS, WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, and
 * NON-INFRINGEMENT. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 *
 */

#include "idatastream.h"

#include "gtest/gtest.h"

namespace
{

using zim::IDataStream;

// Implement the IDataStream interface in the simplest way
class InfiniteZeroStream : public IDataStream
{
  void readImpl(void* buf, size_t nbytes) { memset(buf, 0, nbytes); }
};

// ... and test that it compiles and works as intended

TEST(IDataStream, read)
{
  InfiniteZeroStream izs;
  IDataStream& ids = izs;
  EXPECT_EQ(0, ids.read<int>());
  EXPECT_EQ(0L, ids.read<long>());

  // zim::fromLittleEndian() handles only integer types
  // EXPECT_EQ(0.0, ids.read<double>());
}

TEST(IDataStream, readBlob)
{
  const size_t N = 16;
  const char zerobuf[N] = {0};
  InfiniteZeroStream izs;
  IDataStream& ids = izs;
  const IDataStream::Blob blob = ids.readBlob(N);
  EXPECT_EQ(0, memcmp(blob.data(), zerobuf, N));
}

} // unnamed namespace
Why not use zim::Blob?

My intention is to promote IDataStream::Blob to zim::Blob, so eventually we will use zim::Blob here 😉

This version of a blob implementation has several advantages over the current version of zim::Blob
zim::Blob and zim::Buffer