-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for dictionary encoded vector ingestion #85
Conversation
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
8246c40
to
e136999
Compare
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
e136999
to
7c1f1f8
Compare
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
7c1f1f8
to
64cef9d
Compare
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
64cef9d
to
5b81d73
Compare
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
5b81d73
to
c3adf45
Compare
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
This pull request was exported from Phabricator. Differential Revision: D59863671 |
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
Summary: Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again. Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements. A caveat is that right now, `isDictionaryValidRunLengthEncoded` function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1). Reviewed By: helfman Differential Revision: D59863671
This pull request has been merged in 516efcb. |
Summary:
Allows for already encoded dictionaries to passthrough the writer as opposed to being decoded + deduped again.
Specifically, we want to allow for a "passthrough" without decoding when the schema indicates that we are ingesting an ArrayVector, but the array we find is a dictionaryVector with ArrayVector values. When this is the case, we will check to see if the dictionary is a valid run-length encoding (meaning, it likely has been deduped by upstream). The way in which the vector is written to storage does not change, we will still write the offsets and the ArrayVector elements.
A caveat is that right now,
isDictionaryValidRunLengthEncoded
function doesn't actually verify that there is a duplicate. It's possible that a scenario exists in which a DictionaryVector was passed to the ArrayVector field writer, but the DictionaryVector has not been deduped, but its offsets still technically fulfill a run-length encoding (each offset is greater than or equal to the offset at index - 1).Reviewed By: helfman
Differential Revision: D59863671