
Release 6.1.1 #1365

Merged
merged 7 commits into master
Oct 9, 2024
Conversation

github-actions[bot] (Contributor) commented Oct 9, 2024

Jira ref: PDP-1453

oguzhanunlu and others added 7 commits October 1, 2024 15:29
Iglu Scala Client has a new lookupSchemasUntil function that fetches the list of schemas up to a given schema key.
If we replace the listSchemasLike function with lookupSchemasUntil, RDB Loader will no longer rely on the list endpoint of Iglu Server.

This commit makes the necessary changes to use the new lookupSchemasUntil function instead of listSchemasLike.
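The difference between the two lookups can be sketched roughly as follows. This is a minimal illustration, not the real Iglu Scala Client API: the actual lookupSchemasUntil resolves schemas through the client, and the SchemaKey and helper below are simplified stand-ins.

```scala
object SchemaLookupSketch {
  // Simplified stand-in for Iglu's SchemaKey (SchemaVer: model-revision-addition)
  final case class SchemaKey(vendor: String, name: String, model: Int, revision: Int, addition: Int)

  // Return the ordered schemas of the same model up to (and including) the
  // given key, instead of listing every schema of the model via the
  // Iglu Server list endpoint.
  def lookupSchemasUntil(until: SchemaKey, available: List[SchemaKey]): List[SchemaKey] =
    available
      .filter(k => k.vendor == until.vendor && k.name == until.name && k.model == until.model)
      .sortBy(k => (k.revision, k.addition))
      .takeWhile(k =>
        k.revision < until.revision ||
          (k.revision == until.revision && k.addition <= until.addition))
}
```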
Some of the tests were failing because windows didn't have the expected timestamps.
To solve this, this commit makes the necessary changes to read
window timestamps from the shredded message instead of using hard-coded values.
After starting to use lookupSchemasUntil in fetchSchemasWithSameModel, we only get the
schemas up to the given schema key, for every schema key. Previously, we got all the
schemas of the same schema model.

This changed the behavior when a message contains multiple schema keys for the same schema model:
RDB Loader tried to create the same table multiple times. To solve this problem,
this commit creates the migration only for the max schema key of each schema model.
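The dedup step described above can be sketched like this. Names are illustrative (not the actual RDB Loader code): the idea is simply to keep the highest schema key per (vendor, name, model) so the migration for each table is created once.

```scala
object MaxKeySketch {
  // Simplified stand-in for Iglu's SchemaKey
  final case class SchemaKey(vendor: String, name: String, model: Int, revision: Int, addition: Int)

  // Group keys by schema model and keep only the max key per group, so a
  // message carrying several keys of the same model yields one migration.
  def maxKeyPerModel(keys: List[SchemaKey]): List[SchemaKey] =
    keys
      .groupBy(k => (k.vendor, k.name, k.model))
      .values
      .map(_.maxBy(k => (k.revision, k.addition)))
      .toList
}
```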
We've seen exceptions in Spark executors like:

```
java.lang.NullPointerException: Cannot invoke "scala.collection.mutable.Set.isEmpty()" because the return value of "com.snowplowanalytics.snowplow.rdbloader.transformer.batch.spark.TypesAccumulator.accum()" is null
```

The error is coming from our Spark Accumulator for accumulating Iglu
types. This is similar to [an issue previously seen][1] in Spark's own
`CollectionAccumulator`. That issue [was fixed in Spark][2] by making
the accumulator's internal state non-final, and synchronizing access to
the internal state. So here we make the exact same change to our own
Accumulator.

It is a rare race condition which is hard to reproduce.

[1]: https://issues.apache.org/jira/browse/SPARK-20977
[2]: apache/spark#31540
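The shape of the fix can be sketched as below, mirroring the Spark CollectionAccumulator change: the internal state is a `var` (non-final) and every access is synchronized, so a reset or copy racing with an add never observes a null set. The class and method names here are simplified stand-ins, not the real TypesAccumulator.

```scala
import scala.collection.mutable

// Accumulator sketch with non-final, synchronized internal state.
class SynchronizedSetAccumulator[A] {
  // var, not val: reset() replaces the set rather than mutating a final field
  private var state: mutable.Set[A] = mutable.Set.empty[A]

  def add(a: A): Unit = synchronized { state += a }

  def isZero: Boolean = synchronized { state.isEmpty }

  def reset(): Unit = synchronized { state = mutable.Set.empty[A] }

  // Snapshot as an immutable Set so callers never touch the mutable state
  def value: Set[A] = synchronized { state.toSet }
}
```

Synchronizing every accessor is cheap relative to the work an executor does per record, and it removes the window in which another thread could see partially initialized state.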
@spenes spenes merged commit 0b447df into master Oct 9, 2024
12 checks passed
4 participants