Support for product types #251

Merged (38 commits) on Jul 20, 2022

@jserranohidalgo (Member) commented Jul 14, 2022

Description

This PR extends doric's support for product types (i.e. case classes). The main additions are:

  • Creating literal columns from (non-custom) product types.
  • Deserializing struct columns into their corresponding (non-custom) case class instances.
  • Deserializing columns into custom case class instances (e.g. creating a User instance from a string containing name and age).
  • Accessing fields of user-defined product types (not only of RowColumns).
  • Safe access: referring to a field that does not exist is a compilation error.
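
The custom-type case in the list above (a User built from a string containing name and age) can be pictured with a small type class. This is only a sketch in plain Scala: `Deserializer`, `decodeAll` and the `name#age` encoding are hypothetical names and assumptions chosen for illustration, not doric's actual API.

```scala
object CustomTypeSketch {
  final case class User(name: String, age: Int)

  // Hypothetical type class: how to build an A from its stored representation.
  trait Deserializer[Stored, A] {
    def decode(stored: Stored): A
  }

  object Deserializer {
    // Assumed encoding for illustration only: "name#age" in a single string.
    implicit val userFromString: Deserializer[String, User] =
      new Deserializer[String, User] {
        def decode(stored: String): User = stored.split("#") match {
          case Array(name, age) => User(name, age.toInt)
          case _ =>
            throw new IllegalArgumentException(s"Malformed user: $stored")
        }
      }

    // Decodes every stored value using whichever instance is in implicit scope.
    def decodeAll[Stored, A](rows: Seq[Stored])(
        implicit d: Deserializer[Stored, A]
    ): Seq[A] = rows.map(d.decode)
  }
}
```

Under these assumptions, `CustomTypeSketch.Deserializer.decodeAll[String, CustomTypeSketch.User](Seq("Ana#30"))` yields a sequence with a single `User("Ana", 30)`; the mapping is picked at compile time from the requested target type.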

Some other changes to the LiteralSparkType and SparkType type classes have also been included. For instance:

  • Spark types now include a nullable field. This ensures that the datatype calculated by doric (statically, through implicits) exactly matches the datatype calculated reflectively by Spark.
  • Add implicit Spark type instances for some new Spark datatypes: DateType, TimestampType, ...
  • Literal columns are created through typedLit in order to serialize product types (and combinations thereof) out-of-the-box.
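
To see why a nullable field is needed for the static and reflective datatypes to agree, here is a toy model in plain Scala. Every name in it (`DType`, `SType`, `instance`, and so on) is a hypothetical stand-in, not a doric or Spark class. It only mirrors the convention Spark's reflection follows for Scala types: primitives such as Int are non-nullable, while String, Option and case classes are nullable.

```scala
object NullableSketch {
  // Toy stand-ins for Spark's DataType hierarchy (hypothetical).
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  // Each field carries its name, its type and its nullability.
  final case class StructT(fields: List[(String, DType, Boolean)]) extends DType

  // Type class carrying both the data type and whether values may be null.
  trait SType[A] {
    def dataType: DType
    def nullable: Boolean
  }

  def instance[A](dt: DType, n: Boolean): SType[A] = new SType[A] {
    val dataType: DType = dt
    val nullable: Boolean = n
  }

  implicit val intType: SType[Int] = instance(IntT, n = false)
  implicit val stringType: SType[String] = instance(StringT, n = true)

  // Option[A] keeps the underlying data type but is always nullable.
  implicit def optionType[A](implicit st: SType[A]): SType[Option[A]] =
    instance(st.dataType, n = true)

  final case class User(name: String, age: Int)

  // Written by hand here; a real implementation would derive it generically.
  implicit val userType: SType[User] = instance(
    StructT(List(
      ("name", stringType.dataType, stringType.nullable),
      ("age", intType.dataType, intType.nullable)
    )),
    n = true
  )
}
```

In this model the `age` field of `User` is non-nullable and the `name` field is nullable, which is the distinction a datatype without nullability information cannot express.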

Related Issues

Resolves:

Partially solves:

Related but NOT SOLVED by this PR:

How Has This Been Tested?

Tests for Spark and Literal Spark types have been divided into four major specs:

  • Spark and Literal Spark Types tests. These check that doric's implicit approach and Spark's reflective method give the same datatypes.
  • Custom types tests. Tests related to custom type mappings.
  • Serialization tests. Tests related to the creation of literal columns for Scala values.
  • Deserialization tests. Tests related to the creation of Scala values from DataFrame rows.

@jserranohidalgo jserranohidalgo requested a review from a team as a code owner July 14, 2022 15:00
@github-actions bot added the spark_2.4, spark_3.0, spark_3.1, spark_3.2 and spark_3.3 labels Jul 14, 2022
@codecov bot commented Jul 18, 2022

Codecov Report

Merging #251 (f7a5ce5) into main (888b56a) will increase coverage by 0.01%.
The diff coverage is 97.78%.

@@            Coverage Diff             @@
##             main     #251      +/-   ##
==========================================
+ Coverage   97.28%   97.29%   +0.01%     
==========================================
  Files          58       58              
  Lines        1028     1107      +79     
  Branches       10       10              
==========================================
+ Hits         1000     1077      +77     
- Misses         28       30       +2     
| Flag | Coverage Δ |
|---|---|
| spark-2.4.x | 93.40% <91.85%> (-0.51%) ⬇️ |
| spark-3.0.x | 96.58% <97.78%> (+0.07%) ⬆️ |
| spark-3.1.x | 97.40% <97.78%> (+0.01%) ⬆️ |
| spark-3.2.x | 97.40% <97.78%> (+0.01%) ⬆️ |
| spark-3.3.x | 97.40% <97.78%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| core/src/main/scala/doric/sem/Errors.scala | 92.31% <ø> (ø) |
| core/src/main/scala/doric/types/SparkType.scala | 96.00% <96.88%> (+0.62%) ⬆️ |
| .../src/main/scala/doric/types/LiteralSparkType.scala | 98.65% <98.36%> (-1.35%) ⬇️ |
| core/src/main/scala/doric/DoricColumn.scala | 66.67% <100.00%> (-4.76%) ⬇️ |
| core/src/main/scala/doric/syntax/DStructs.scala | 100.00% <100.00%> (ø) |
| ...c/main/scala/doric/syntax/LiteralConversions.scala | 100.00% <100.00%> (ø) |
| core/src/main/scala/doric/types/NumericType.scala | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 888b56a...f7a5ce5.

@jserranohidalgo (Member, Author) commented

@alfonsorr @eruizalo the two ignored tests that remain will be resolved with ticket #250.
Unless you have any comments, I don't see anything else to add or change on my side.

@alfonsorr alfonsorr merged commit 7ab6876 into hablapps:main Jul 20, 2022