Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow specifying date format for parsing #878

Merged
merged 1 commit into from
Jul 25, 2024
Merged

Conversation

nightscape
Copy link
Owner

@nightscape nightscape commented Jul 25, 2024

PR Type

Enhancement, Documentation


Description

  • Added support for custom date parsing by introducing a dateFormat parameter.
  • Enhanced HeaderDataColumn to handle date parsing using the new parseDate function.
  • Updated ExcelRelation to include a dateParser for custom date formats.
  • Documented the new dateFormat option in the README.

Changes walkthrough 📝

Relevant files
Enhancement
DataColumn.scala
Add support for custom date parsing in DataColumn               

src/main/scala/com/crealytics/spark/excel/DataColumn.scala

  • Added support for parsing date strings into java.sql.Date.
  • Modified HeaderDataColumn to include parseDate function.
  • Enhanced date extraction logic to handle different cell types.
  • +9/-3     
    DefaultSource.scala
    Introduce dateFormat parameter in DefaultSource                   

    src/main/scala/com/crealytics/spark/excel/DefaultSource.scala

    • Added dateFormat parameter to DefaultSource.
    +1/-0     
    ExcelRelation.scala
    Support custom date parsing in ExcelRelation                         

    src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala

  • Added support for dateFormat in ExcelRelation.
  • Implemented dateParser for custom date parsing.
  • Updated HeaderDataColumn instantiation to include dateParser.
  • +12/-1   
    Documentation
    README.md
    Document dateFormat option and minor formatting fixes       

    README.md

  • Documented the new dateFormat option.
  • Minor formatting corrections.
  • +4/-3     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Error Handling
    The handling of CellType.ERROR for DateType cells defaults to a specific date (1970-01-01). Consider if this could lead to data misinterpretation or if a different approach like logging or throwing an exception might be more appropriate.

    SimpleDateFormat Thread Safety
    SimpleDateFormat is not thread-safe. The current implementation in dateParser might lead to issues if used in a multi-threaded environment. Consider using DateTimeFormatter from Java 8 onwards for thread safety.

    Copy link

    github-actions bot commented Jul 25, 2024

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Add error handling for date conversions to prevent runtime exceptions

    Consider handling potential exceptions from DateUtil.getJavaDate(n).getTime and
    parser.parse(stringValue).getTime which might throw an exception if the numeric or
    string values are not valid dates.

    src/main/scala/com/crealytics/spark/excel/DataColumn.scala [120]

    -numericValue.map(n => new Date(DateUtil.getJavaDate(n).getTime))
    -(stringValue: String) => new java.sql.Date(parser.parse(stringValue).getTime)
    +numericValue.flatMap(n => Try(new Date(DateUtil.getJavaDate(n).getTime)).toOption)
    +(stringValue: String) => Try(new java.sql.Date(parser.parse(stringValue).getTime)).getOrElse(defaultDate)
     
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential bug by adding error handling for date conversions, which is crucial for preventing runtime exceptions and ensuring robustness.

    9
    Maintainability
    Use the provided parseDate function for consistency and flexibility in date parsing

    Replace the hardcoded instantiation of java.sql.Date with a call to parseDate to
    maintain consistency and leverage the provided parsing function.

    src/main/scala/com/crealytics/spark/excel/DataColumn.scala [118]

    -case CellType.ERROR => Some(new java.sql.Date(0))
    +case CellType.ERROR => Some(parseDate("1970-01-01"))
     
    Suggestion importance[1-10]: 8

    Why: This suggestion improves maintainability by using the existing parseDate function, ensuring consistency and flexibility in date parsing. It addresses a minor issue but enhances code readability and maintainability.

    8
    Refactor date parsing into a separate method for better code organization

    Encapsulate the date parsing logic into a separate method to improve code
    readability and maintainability.

    src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala [67-73]

    -private val dateParser: String => Date = {
    -  dateFormat
    -    .map { fmt =>
    -      val parser = new SimpleDateFormat(fmt)
    -      (stringValue: String) => new java.sql.Date(parser.parse(stringValue).getTime)
    -    }
    -    .getOrElse((stringValue: String) => Date.valueOf(stringValue))
    +private def parseDate(stringValue: String): Date = {
    +  dateFormat.map { fmt =>
    +    val parser = new SimpleDateFormat(fmt)
    +    Try(new java.sql.Date(parser.parse(stringValue).getTime)).getOrElse(defaultDate)
    +  }.getOrElse(Date.valueOf(stringValue))
     }
    +private val dateParser: String => Date = parseDate
     
    Suggestion importance[1-10]: 8

    Why: This suggestion improves maintainability by encapsulating the date parsing logic into a separate method, enhancing code readability and organization.

    8
    Enhancement
    Enhance date parsing robustness by configuring the SimpleDateFormat parser

    Use a more robust date parsing approach that handles different date formats and
    locales by setting the locale and lenient properties of SimpleDateFormat.

    src/main/scala/com/crealytics/spark/excel/ExcelRelation.scala [70]

     val parser = new SimpleDateFormat(fmt)
    +parser.setLenient(false)
    +parser.setTimeZone(TimeZone.getTimeZone("UTC"))
     
    Suggestion importance[1-10]: 7

    Why: This suggestion improves the robustness of date parsing by setting the locale and lenient properties of SimpleDateFormat. It enhances the code but is not critical.

    7

    @nightscape nightscape merged commit 1d1a644 into main Jul 25, 2024
    40 checks passed
    @nightscape nightscape deleted the parsing-date-format branch July 25, 2024 18:31
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants