Typos in correspondence with TRY #29

Open
Rekyt opened this issue May 24, 2024 · 3 comments

Rekyt commented May 24, 2024

Similarly to #28. Let's look at the correspondence with TRY.

I've performed a similar matching of codes and names in TRY, and found a few typos (see the detailed script below).

  1. Same remark as for GIFT: some APD traits have several matching TRY traits on the same line, e.g., trait_0030810 has two traits matching on GIFT_close.
  2. Names match overall, but some name correspondences are off because TRY silently modified the names of some traits. The names can be updated by matching on TraitID against an up-to-date TRY traits table (downloadable through the TRY website: https://www.try-db.org/de/DnldTraitList.php).
  3. More serious are the non-corresponding codes. Some matches seem to be wrong because of this.
    For example, 'leaf_cell_wall_N_per_cell_wall_dry_mass ' [APD:trait_0001511] is recorded in APD as a close match with 'Leaf cell wall nitrogen (N) per unit cell wall dry mass' [TRY:96]. However, TRY trait 96 corresponds to 'Seed oil content per seed mass'. Given the matched name, it should instead match [TRY:3377]. See the script for more examples.
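
To make points 2 and 3 concrete, here is a minimal base R sketch of the two lookups on toy data. The two TRY rows are the ones quoted in the example above; the data frame `apd_matches` and the columns `name_from_code` and `code_from_name` are hypothetical names chosen for illustration, and the real TRY table has many more rows.

```r
# Toy stand-in for the TRY trait table (only the two traits quoted above)
try_traits <- data.frame(
  TraitID = c(96, 3377),
  Trait = c(
    "Seed oil content per seed mass",
    "Leaf cell wall nitrogen (N) per unit cell wall dry mass"
  )
)

# Toy stand-in for one parsed APD match (hypothetical layout)
apd_matches <- data.frame(
  trait_id = "trait_0001511",
  extracted_trait = "Leaf cell wall nitrogen (N) per unit cell wall dry mass",
  extracted_code = 96
)

# Point 2: look up the name TRY currently attaches to the recorded code
apd_matches$name_from_code <-
  try_traits$Trait[match(apd_matches$extracted_code, try_traits$TraitID)]

# Point 3: look up the code TRY attaches to the recorded name
apd_matches$code_from_name <-
  try_traits$TraitID[match(apd_matches$extracted_trait, try_traits$Trait)]

apd_matches[, c("trait_id", "extracted_code", "name_from_code", "code_from_name")]
```

When the recorded code and name disagree, as here, `name_from_code` shows what the code really refers to and `code_from_name` suggests the likely intended code.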
Matching script
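library(dplyr)  # provides select(), rename(), filter(), mutate(), left_join(), distinct()

# TRY trait list downloaded from the TRY website
try_traits = readr::read_delim("tde2024422162351.txt", skip = 3, col_select = -6)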

apd_try_detailed = tibble::as_tibble(read.csv("APD_traits_input.csv")) |>
  select(identifier:label, starts_with("TRY")) |>
  rename(trait_id = identifier) |>
  tidyr::pivot_longer(
    starts_with("TRY"), names_to = "match_type", values_to = "matched_trait"
  ) |>
  filter(matched_trait != "") |> 
  mutate(
    # Split for traits that have multiple matches on one line
    split_traits = purrr::map(stringr::str_split(matched_trait, ";"), trimws),
    # Extract TRY trait name (text before the bracketed code)
    extracted_trait = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "^(.*)\\s\\[", group = 1)
    ),
    # Extract TRY trait code (number inside "[TRY:...]")
    extracted_code = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "\\[TRY:(.+)\\]", group = 1) |>
        as.numeric()
    )
  ) |>
  tidyr::unnest(split_traits:extracted_code)

apd_try_smaller = apd_try_detailed |>
  # Match names based on trait code
  left_join(
    try_traits |>
      distinct(TraitID, name_matched_on_code = Trait),
    by = c(extracted_code = "TraitID")
  ) |>
  # Match code based on trait name
  left_join(
    try_traits |>
      distinct(code_matched_on_name = TraitID, Trait),
    by = c(extracted_trait = "Trait")
  ) |>
  select(trait, extracted_trait, extracted_code, name_matched_on_code, code_matched_on_name)

## Potentially problematic traits
# non-matching names according to code
apd_try_smaller |>
  filter(extracted_trait != name_matched_on_code)

# non-matching code according to name
apd_try_smaller |>
  filter(extracted_code != code_matched_on_name)
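
One possible caveat with the two filters above: `!=` returns `NA` when either side is missing, and `dplyr::filter()` drops rows where the condition is `NA`, so a code or name that does not exist in the TRY table at all would not be listed. A minimal base R sketch on made-up values (the data frame `checked` and its contents are purely illustrative):

```r
# Toy result of the joins (hypothetical values; NA means no match found in TRY)
checked <- data.frame(
  extracted_trait      = c("Trait A", "Trait B", "Trait C"),
  name_matched_on_code = c("Trait A", "Trait B renamed", NA)
)

# A plain comparison yields NA for the unmatched row ...
differs <- checked$extracted_trait != checked$name_matched_on_code

# ... so count NA explicitly as "needs checking"
needs_check <- is.na(differs) | differs
checked[needs_check, ]
```

The same idea could be written inside `filter()` as `is.na(name_matched_on_code) | extracted_trait != name_matched_on_code`, if listing unmatched rows is wanted.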
ehwenk added the bug label Jul 31, 2024
ehwenk added this to AusTraits Jul 31, 2024
ehwenk moved this to Backlog in AusTraits Jul 31, 2024

ehwenk (Collaborator) commented Aug 23, 2024

Addressed (2) and (3) above with b69f40c

(1) is intentional.

ehwenk added a commit that referenced this issue Aug 23, 2024
While addressing #29 found a few additional matches to TRY traits to add. This is not a comprehensive review of additional trait matches that might exist, but simply adding a few that were apparent.

ehwenk (Collaborator) commented Aug 23, 2024

@Rekyt Can you check if the changes on this branch look good to you? There are ~3 traits that won't match because we have to change ";" to "," in the names.

Thank you for pointing out the inconsistencies, especially those places where we have an incorrect TRY number-name match.

ehwenk moved this from Backlog to In Progress in AusTraits Aug 23, 2024
ehwenk self-assigned this Aug 23, 2024

Rekyt (Author) commented Aug 30, 2024

With the updated APD_traits_input.csv file, the only mismatches I get are the traits you mention, because of the substitution of semicolons by commas, and also because three dots were converted to an actual ellipsis character (…), so it should be fine!
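
If these remaining punctuation differences ever need to be ignored automatically, one option is to normalise names on both sides before comparing. A minimal sketch in base R, where `normalise_name()` is a hypothetical helper and the two example names are made up rather than actual TRY or APD entries:

```r
# Hypothetical helper: harmonise punctuation before comparing trait names
normalise_name <- function(x) {
  x <- gsub("\u2026", "...", x, fixed = TRUE)  # ellipsis character -> three dots
  x <- gsub(";", ",", x, fixed = TRUE)         # semicolon -> comma
  trimws(x)
}

# Made-up names for illustration only
try_name <- "Example trait; subtype ..."
apd_name <- "Example trait, subtype \u2026"

normalise_name(try_name) == normalise_name(apd_name)
```

Applying the helper to both name columns before the join on trait name would let these traits match without editing either table.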

Also, I haven't mentioned it elsewhere, but as you may have guessed, I didn't find any issues with the traits matched on BIEN. It's simpler, of course, because it has only 53 traits.
