Similarly to #28, let's look at the correspondence with TRY.
I've performed a similar matching of codes and names in TRY, and found a few typos (see the detailed script below).
Same remark as for GIFT: some APD traits have several matching traits on the same line for TRY, e.g., trait_0030810 has two traits matching on GIFT_close.
Names match overall, but some name correspondences are off because TRY silently modified the names of some traits. The names can be updated by matching on TraitID in an up-to-date TRY traits table (downloadable from the TRY website: https://www.try-db.org/de/DnldTraitList.php).
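A minimal sketch of that name refresh, in base R on toy data. The two TRY names below are the ones quoted in this issue; the data frame `apd_matches`, its columns, and the stale name in its first row are made up for illustration and are not the real APD layout.

```r
# Toy stand-in for the current TRY trait list (the real one is the TRY download).
try_traits <- data.frame(
  TraitID = c(96, 3377),
  Trait = c(
    "Seed oil content per seed mass",
    "Leaf cell wall nitrogen (N) per unit cell wall dry mass"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical APD-side table: a TRY code plus the TRY name stored in APD.
# The first name is an invented "old" name, used only to show the update.
apd_matches <- data.frame(
  try_code = c(3377, 96),
  try_name = c(
    "Leaf cell wall N per cell wall dry mass",
    "Seed oil content per seed mass"
  ),
  stringsAsFactors = FALSE
)

# Look up the current TRY name for each stored code.
idx <- match(apd_matches$try_code, try_traits$TraitID)
apd_matches$current_name <- try_traits$Trait[idx]

# Flag rows whose stored name differs from the current TRY name.
apd_matches$renamed <- apd_matches$try_name != apd_matches$current_name

# Overwrite stale names; keep the stored name when the code is unknown to TRY.
apd_matches$try_name <- ifelse(
  is.na(apd_matches$current_name),
  apd_matches$try_name,
  apd_matches$current_name
)
```

Rows where `renamed` is `TRUE` are the ones whose name changed on the TRY side. This only makes sense once the codes themselves are trusted, which is the point of the next paragraph.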
More serious are the non-corresponding codes. It seems some matches are wrong because of this.
For example, 'leaf_cell_wall_N_per_cell_wall_dry_mass' [APD:trait_0001511] is listed in APD as having a close match with 'Leaf cell wall nitrogen (N) per unit cell wall dry mass', referenced as [TRY:96]. However, TRY trait 96 corresponds to 'Seed oil content per seed mass'. Given the matched name, it should instead match [TRY:3377]. See the script for more examples of this.
Matching script
# dplyr verbs (select, rename, filter, mutate, left_join, ...) are used unqualified below
library(dplyr)

try_traits = readr::read_delim("tde2024422162351.txt", skip = 3, col_select = -6)

apd_try_detailed = tibble::as_tibble(read.csv("APD_traits_input.csv")) |>
  select(identifier:label, starts_with("TRY")) |>
  rename(trait_id = identifier) |>
  tidyr::pivot_longer(
    starts_with("TRY"), names_to = "match_type", values_to = "matched_trait"
  ) |>
  filter(matched_trait != "") |>
  mutate(
    # Split for traits that have multiple matches on one line
    split_traits = purrr::map(stringr::str_split(matched_trait, ";"), trimws),
    # Extract TRY trait name
    extracted_trait = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "^(.*)\\s\\[", group = 1)
    ),
    # Extract TRY trait code
    extracted_code = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "\\[TRY:(.+)\\]", group = 1) |>
        as.numeric()
    )
  ) |>
  tidyr::unnest(split_traits:extracted_code)

apd_try_smaller = apd_try_detailed |>
  # Match names based on trait code
  left_join(
    try_traits |>
      distinct(TraitID, name_matched_on_code = Trait),
    by = c(extracted_code = "TraitID")
  ) |>
  # Match code based on trait name
  left_join(
    try_traits |>
      distinct(code_matched_on_name = TraitID, Trait),
    by = c(extracted_trait = "Trait")
  ) |>
  select(trait, extracted_trait, extracted_code, name_matched_on_code, code_matched_on_name)

## Potentially problematic traits
# non-matching names according to code
apd_try_smaller |>
  filter(extracted_trait != name_matched_on_code)

# non-matching code according to name
apd_try_smaller |>
  filter(extracted_code != code_matched_on_name)
While addressing #29, I found a few additional matches to TRY traits to add. This is not a comprehensive review of the additional trait matches that might exist; I simply added a few that were apparent.
@Rekyt Can you check if the changes on this branch look good to you? There are ~3 traits that won't match because we have to change ";" to "," in the names.
Thank you for pointing out the inconsistencies, especially those places where we have an incorrect TRY number-name match.
With the updated APD_traits_input.csv file, the only mismatches I find are the traits you mention, caused by the substitution of semicolons by commas and by three dots being converted to an actual ellipsis character (…), so it should be fine!
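For what it's worth, those remaining mismatches could be silenced by normalising both sides before comparing names. A rough sketch in base R, assuming the only differences are `;` versus `,` and `...` versus the ellipsis character; `normalise_name` is a hypothetical helper and the two example strings are invented, not real TRY names.

```r
# Hypothetical helper: bring a trait name to a canonical form before comparison.
normalise_name <- function(x) {
  x <- gsub(";", ",", x, fixed = TRUE)          # semicolons were replaced by commas in APD
  x <- gsub("\u2026", "...", x, fixed = TRUE)   # ellipsis character back to three dots
  trimws(x)
}

# Invented example names that differ only by those two substitutions.
try_side <- "Some trait; measured on leaves..."
apd_side <- "Some trait, measured on leaves\u2026"

# Raw comparison fails, normalised comparison succeeds.
raw_equal <- try_side == apd_side
normalised_equal <- normalise_name(try_side) == normalise_name(apd_side)
```

In the matching script this would amount to applying the same normalisation to `extracted_trait` and `Trait` before the name-based join.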
Also, I haven't mentioned it elsewhere, but as you may have guessed, I didn't find any issues with the traits matched on BIEN. It's simpler, of course, because it has only 53 traits.