Built site for gh-pages

lter · Nov 13, 2024 · c11ee9a
1 parent b6270bc
commit c11ee9a
Showing 10 changed files with 489 additions and 160 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-d43672e4
+32b6215c
diff --git a/mod_data-viz.html b/mod_data-viz.html
diff --git a/mod_data-viz_files/figure-html/demo_all-num-vars_viz-code-real-1.png b/mod_data-viz_files/figure-html/demo_all-num-vars_viz-code-real-1.png
diff --git a/mod_data-viz_files/figure-html/demo_seasons_viz-code-real-1.png b/mod_data-viz_files/figure-html/demo_seasons_viz-code-real-1.png
diff --git a/mod_data-viz_files/figure-html/demo_tax-consist_viz-code-real-1.png b/mod_data-viz_files/figure-html/demo_tax-consist_viz-code-real-1.png
diff --git a/mod_data-viz_files/figure-html/multi-modal-1.png b/mod_data-viz_files/figure-html/multi-modal-1.png
diff --git a/mod_multivar-viz_files/figure-html/nms-ord-1.png b/mod_multivar-viz_files/figure-html/nms-ord-1.png
diff --git a/mod_stats_files/figure-html/mem-explore-graph-1.png b/mod_stats_files/figure-html/mem-explore-graph-1.png
diff --git a/search.json b/search.json
@@ -526,6 +526,17 @@
       "Data Visualization"
     ]
   },
+  {
+    "objectID": "mod_data-viz.html#code-demo-post-harmonization-visualization",
+    "href": "mod_data-viz.html#code-demo-post-harmonization-visualization",
+    "title": "Data Visualization & Exploration",
+    "section": "Code Demo: Post-Harmonization Visualization",
+    "text": "Code Demo: Post-Harmonization Visualization\nAfter harmonizing your data, you’ll want to generate one last set of ‘sanity check’ plots to make sure (1) you have interpreted the metadata correctly, (2) you haven’t made any obvious errors in the harmonization, and (3) your data are ready for analysis. Nothing is less fun than finding out your analytical results are due to an error in the underlying data.\nFor these plots, printing out to multi-page PDFs can be helpful instead of trying to scroll through many pages on the screen.\n\nAdditional Needed Packages\nIf you’d like to follow along with the code chunks included throughout this demo, you’ll need to install the following packages:\n\n## install.packages(\"librarian\")\nlibrarian::shelf(tidyverse, scales, ggforce, slider, viridis)\n\nThe three sets of plots below encompass many of the most common data structures we have encountered in ecological synthesis projects. These include quantitative measurements collected over many sites, taxonomic data collected over many sites, and seasonal time series data.\n\nGraph All Numeric Variables | Taxonomic Consistency | Seasonal Time Series\n\n\nIt can be helpful to visualize all numeric variables in your dataset, grouped by site (or dataset source) to check that the data have been homogenized correctly. As an example, we’ll use a 2019 dataset on lake water quality, chemistry, and zooplankton community composition near the Niwot Ridge LTER. The dataset is a survey of 16 high alpine lakes and has structure similar to one that might be included in a multi-site synthesis. For more information on these data, check out the data package on EDI.\n\n# Read in data\n1green_biochem &lt;- read.csv(file = file.path(\"data\", \"green-lakes_water-chem-zooplank.csv\"))\n\n# Check structure\nstr(green_biochem)\n\n\n1\n\nNote that you could also read in these data directly from EDI. See ~line 31 of this script for a syntax example\n\n\n\n\n'data.frame':   391 obs. 
of  14 variables:\n $ local_site : chr  \"Blue Lake\" \"Blue Lake\" \"Blue Lake\" \"Blue Lake\" ...\n $ location   : chr  \"LAKE\" \"LAKE\" \"LAKE\" \"LAKE\" ...\n $ depth      : num  0 1 2 3 4 5 6 7 8 9 ...\n $ date       : chr  \"2016-07-08\" \"2016-07-08\" \"2016-07-08\" \"2016-07-08\" ...\n $ time       : chr  \"09:11:00\" \"09:13:00\" \"09:14:00\" \"09:16:00\" ...\n $ chl_a      : num  0.521 NA NA NA NA NA NA NA NA NA ...\n $ pH         : num  6.75 6.78 6.72 6.67 6.57 6.55 6.52 6.51 6.48 6.49 ...\n $ temp       : num  2.8 2.8 2.73 2.72 7.72 2.65 2.65 2.65 2.64 2.65 ...\n $ std_conduct: num  8 9 10 9 10 9 9 9 9 9 ...\n $ conduct    : num  4 5 6 6 6 5 5 5 5 6 ...\n $ DO         : num  8.23 8.14 8.14 8.05 8.11 8.07 8.21 8.19 8.17 8.16 ...\n $ sat        : num  60.9 60.1 60.2 59.4 59.8 59.4 60.3 60.3 60.1 60 ...\n $ secchi     : num  6.25 NA NA NA NA NA NA NA NA NA ...\n $ PAR        : num  1472 872 690 530 328 ...\n\n\nOnce we have the data, we can programmatically identify all columns that R knows to be numeric.\n\n# determine which columns are numeric in green_biochem\nnumcols &lt;- green_biochem %&gt;%\n1  dplyr::select(dplyr::where(~ is.numeric(.x) == TRUE)) %&gt;%\n  names(.) %&gt;% \n  sort(.)\n\n# Check that out\n2numcols\n\n\n1\n\nThe tilde (~) is allowing us to evaluate each column against this conditional\n\n2\n\nYou may notice that these columns all have \"num\" next to them in their structure check. 
The scripted method is dramatically faster and more reproducible than writing these names down by hand\n\n\n\n\n [1] \"chl_a\"       \"conduct\"     \"depth\"       \"DO\"          \"PAR\"        \n [6] \"pH\"          \"sat\"         \"secchi\"      \"std_conduct\" \"temp\"       \n\n\nNow that we have our data and a vector of numeric column names, we can generate a multi-page PDF of scatterplots where each page is specific to a numeric variable and each panel within a page shows one site, with date on the X axis.\n\n# Open PDF 'device'\n1grDevices::pdf(file = file.path(\"qc_all_numeric.pdf\"))\n\n# Loop across numeric variables\nfor (var in numcols) {\n  \n  # Create a set of graphs for one variable\n  myplot &lt;- ggplot(green_biochem, aes(x = date, y = .data[[var]])) +\n2    geom_point(alpha = 0.5) +\n    facet_wrap(. ~ local_site)\n  \n  # Print that variable\n  print(myplot)\n}\n\n# Close the device\n3dev.off()\n\n\n1\n\nThis function tells R that the graphs produced by the following code should be saved to a PDF\n\n2\n\nA scatterplot may not be the best tool for your data; adjust appropriately\n\n3\n\nThis function (when used after a ‘device’ function like grDevices::pdf) tells R when to stop adding things to the PDF and actually save it\n\n\n\n\nThe first page of the resulting plot should look something like the following, with each page having the same content but a different variable on the Y axis.\n\n\n\n\n\n\n\n\n\n\n\nTaxonomic time series can be tricky to work with due to inconsistencies in nomenclature and/or sampling effort. In particular, ‘pseudoturnover’, where one species ‘disappears’ with or without the simultaneous ‘appearance’ of another taxon, can indicate true extinctions, changes in species names, or changes in methodology that cause particular taxa not to be detected. 
A second complication is that taxonomic data are often archived as ‘presence-only’, so it is necessary to infer the absences based on sampling methodology and add them to your dataset before analysis.\nWhile there are doubtless many field-collected datasets that have this issue, we’ve elected to simulate data so that we can emphasize the visualization elements of this problem while avoiding the “noise” typical of real data. This simulation is not necessarily vital to the visualization so we’ve left it out of the following demo. However, if that is of interest to you, see this script, in particular ~lines 41 through 80.\nA workflow for establishing taxonomic consistency and plotting the results is included below.\n\n# Read in data\ntaxa_df &lt;- read.csv(file.path(\"data\", \"simulated-taxa-df.csv\"))\n\n# Check structure\nstr(taxa_df)\n\n'data.frame':   1025 obs. of  4 variables:\n $ year : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...\n $ plot : int  1 1 1 1 1 1 1 1 1 1 ...\n $ taxon: chr  \"Taxon_A\" \"Taxon_B\" \"Taxon_C\" \"Taxon_D\" ...\n $ count: int  8 11 7 13 14 15 11 6 9 7 ...\n\n\nFirst, we’ll define units of sampling (year, plot and taxon) and ‘pad out’ the zeros. In this example, we have only added zeros for taxon-plot-year combinations where that taxon is present in at least one year at a given plot. Again, this zero-padding is prerequisite to the visualization but not necessarily part of it, so see ~lines 84-117 of the prep script if that process is of interest.\n\n# Read in data\nwithzeros &lt;- read.csv(file.path(\"data\", \"simulated-taxa-df_with-zeros.csv\"))\n\n# Check structure\n1str(withzeros)\n\n\n1\n\nNotice how there are more rows than the preceding data object and several new zeros in the first few rows?\n\n\n\n\n'data.frame':   1100 obs. 
of  4 variables:\n $ plot : int  2 5 8 1 2 3 4 5 6 7 ...\n $ taxon: chr  \"Taxon_A\" \"Taxon_A\" \"Taxon_A\" \"Taxon_A\" ...\n $ year : int  2019 2019 2013 2010 2010 2010 2010 2010 2010 2010 ...\n $ n    : int  0 0 0 8 8 8 13 6 13 13 ...\n\n\nNow that we have the data in the format we need, we’ll create a plot of species counts over time with zeros filled in. Because there are many plots and it is difficult to see so many panels on the same page, we’ll use the facet_wrap_paginate function from the ggforce package to create a multi-page PDF output.\n\n# Create the plot of species counts over time (with zeros filled in)\nmyplot &lt;- ggplot(withzeros, aes(x = year, y = n, group = plot, color = plot)) +\n  geom_point() +\n  scale_x_continuous(breaks = scales::pretty_breaks()) +\n1  ggforce::facet_wrap_paginate(~ taxon, nrow = 2, ncol = 2)\n\n# Start the PDF output\ngrDevices::pdf(file.path(\"counts_by_taxon_with_zeros.pdf\"),\n               width = 9, height = 5)\n\n# Loop across pages (defined by `ggforce::facet_wrap_paginate`)\nfor (i in seq_len(ggforce::n_pages(myplot))) {\n  \n  page_plot &lt;- myplot + \n      ggforce::facet_wrap_paginate(~taxon, page = i, \n                                   nrow = 2, ncol = 2)\n  \n  print(page_plot)\n}\n\n# Close the PDF output\ndev.off()\n\n\n1\n\nThis allows a faceted graph to spread across more than one page. See ?ggforce::facet_wrap_paginate for details\n\n\n\n\nThe first page of the resulting plot should look something like this:\n\n\n\n\n\n\n\n\n\nNotice how “Taxon_A” is absent from all plots in 2014 whereas “Taxon_B” has extremely high counts in the same year. Often this can signify inconsistent use of taxonomic names over time.\n\n\nFor time series, intra-annual variation can often make data issues difficult to spot. In these cases, it can be helpful to plot each year onto the same figure and compare trends across study years.\nAs an example, we’ll use a 2024 dataset on streamflow near the Niwot Ridge LTER. 
The dataset is a 22 year time-series of daily streamflow. For more information on these data, check out the data package on EDI.\n\n# Read data\n1green_streamflow &lt;- read.csv(file.path(\"data\", \"green-lakes_streamflow.csv\"))\n\n# Check structure\nstr(green_streamflow)\n\n\n1\n\nNote again that you could also read in this data directly from EDI. See ~line 129 of this script for a syntax example\n\n\n\n\n'data.frame':   15451 obs. of  6 variables:\n $ LTER_site  : chr  \"NWT\" \"NWT\" \"NWT\" \"NWT\" ...\n $ local_site : chr  \"gl4\" \"gl4\" \"gl4\" \"gl4\" ...\n $ date       : chr  \"1981-06-12\" \"1981-06-13\" \"1981-06-14\" \"1981-06-15\" ...\n $ discharge  : num  9786 8600 7600 6700 5900 ...\n $ temperature: num  NA NA NA NA NA NA NA NA NA NA ...\n $ notes      : chr  \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" ...\n\n\nLet’s now calculate a moving average encompassing the 5 values before and after each focal value.\n\n# Do necessary wrangling\nstream_data &lt;- green_streamflow %&gt;%\n  # Calculate moving average for each numeric variable\n  dplyr::mutate(dplyr::across(.cols = dplyr::all_of(c(\"discharge\", \"temperature\")),\n                              .fns = ~ slider::slide_dbl(.x = .x, .f = mean,\n                                                         .before = 5, .after = 5),\n                              .names = \"{.col}_move.avg\" )) %&gt;%\n  # Handle date format issues\n  dplyr::mutate(yday = lubridate::yday(date),\n                year = lubridate::year(date))\n\n# Check the structure of that\nstr(stream_data)\n\n'data.frame':   15451 obs. 
of  10 variables:\n $ LTER_site           : chr  \"NWT\" \"NWT\" \"NWT\" \"NWT\" ...\n $ local_site          : chr  \"gl4\" \"gl4\" \"gl4\" \"gl4\" ...\n $ date                : chr  \"1981-06-12\" \"1981-06-13\" \"1981-06-14\" \"1981-06-15\" ...\n $ discharge           : num  9786 8600 7600 6700 5900 ...\n $ temperature         : num  NA NA NA NA NA NA NA NA NA NA ...\n $ notes               : chr  \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" \"flow data estimated from intermittent observations\" ...\n $ discharge_move.avg  : num  7299 6699 5992 5527 5274 ...\n $ temperature_move.avg: num  NA NA NA NA NA NA NA NA NA NA ...\n $ yday                : num  163 164 165 166 167 168 169 170 171 172 ...\n $ year                : num  1981 1981 1981 1981 1981 ...\n\n\nPlot seasonal timeseries of each numeric variable as points with the moving average included as lines\n\n# Start PDF output\ngrDevices::pdf(file = file.path(\"qc_all_numeric_seasonal.pdf\"))\n\n# Loop across variables\nfor (var in c(\"discharge\", \"temperature\")) {\n  \n  # Make the graph\n  myplot &lt;- ggplot(stream_data, aes(x = yday, group = year, color = year)) +\n1    geom_point(aes(y = .data[[var]])) +\n2    geom_line(aes(y = .data[[paste0(var, \"_move.avg\")]])) +\n    viridis::scale_color_viridis()\n  \n  # Print it\n  print(myplot)\n}\n\n# End PDF creation\ndev.off()\n\n\n1\n\nAdd points based on the year\n\n2\n\nAdding lines based on the average\n\n\n\n\nThe resulting figure should look something like this:\n\n\n\n\n\n\n\n\n\nOne of these years is not like the others….",
+    "crumbs": [
+      "Phase II -- Plan",
+      "Data Visualization"
+    ]
+  },
   {
     "objectID": "mod_data-viz.html#multivariate-visualization",
     "href": "mod_data-viz.html#multivariate-visualization",

diff --git a/sitemap.xml b/sitemap.xml
@@ -2,114 +2,114 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://lter.github.io/ssecr/proj_teams.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_next-steps.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_stats.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/instructors.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/policy_ai.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/policy_attendance.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_data-disc.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/policy_conduct.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/CONTRIBUTING.html</loc>
-    <lastmod>2024-11-13T14:45:56.007Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.339Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_data-viz.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_version-control.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/proj_milestones.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_thinking.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/fellows.html</loc>
-    <lastmod>2024-11-13T14:45:56.031Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.371Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_reproducibility.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_project-mgmt.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/policy_pronouns.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_credit.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_facilitation.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_multivar-viz.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_wrangle.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_findings.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_spatial.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_team-sci.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_reports.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/policy_usability.html</loc>
-    <lastmod>2024-11-13T14:45:56.083Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/index.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
   <url>
     <loc>https://lter.github.io/ssecr/mod_interactivity.html</loc>
-    <lastmod>2024-11-13T14:45:56.079Z</lastmod>
+    <lastmod>2024-11-13T15:53:09.419Z</lastmod>
   </url>
 </urlset>