From 976b0476896f1016f738254009028b83763c8894 Mon Sep 17 00:00:00 2001
From: Ty Garber <tyler.garber@dfw.wa.gov>
Date: Mon, 7 Oct 2024 15:06:06 -0700
Subject: [PATCH] data sharing guidance

---
 README.html | 370 ++++++++++++++++++++++++++++++++++++++--------------
 README.md   |  24 ++++
 2 files changed, 294 insertions(+), 100 deletions(-)
diff --git a/README.html b/README.html
index 7ea4812..4969580 100644
--- a/README.html
+++ b/README.html
@@ -364,7 +364,6 @@ <h1>Goals</h1>
 reproducible and approachable.</p>
 <!-- ## General practices -->
 <!-- Todo:
- - CBE: read other R best practices docs, adapt parts that make sense to us.
  - Populate / link directions on some of the suggestions (e.g., git, renv, etc)
 -->
 </div>
@@ -384,6 +383,67 @@ <h2>Overview</h2>
 tutorials and examples to show <em>how</em> to implement some of the
 less obvious practices.</p>
 </div>
+<div id="non-coding" class="section level2">
+<h2>Non-coding</h2>
+<p>The following are important for coding and non-coding projects
+alike.</p>
+<ul>
+<li>When sharing files via email it’s useful to have a consistent,
+informative version naming scheme. We recommend
+<code>filename_date_editorinitials</code>.</li>
+<li>We encounter many issues when incorporating spreadsheet content into
+programming pipelines. Some of this stems from the difference between
+what is easily human-readable and what is easily machine-readable, and
+sometimes it may be appropriate to make the spreadsheets as
+human-readable as possible. However, some practices can make
+spreadsheets more machine-readable without affecting human readability.
+<ul>
+<li>Ensure that headers are consistent in different files that are meant
+to be combined. This includes capitalization and the use of spaces.
+Copy-pasting from a template file is a good way to ensure exactly
+identical headers in this case.</li>
+<li>Ensure that categories have a consistent name in a given column. For
+example, we have encountered data sheets with a “Yes/No” type column in
+which “Yes” is a mix of “Y”, “Yes”, “yes”, “yes ” (with a trailing space), and “yees”. When read
+into R or another language, these will be treated as five different
+categories instead of one. Consider using data validation in Excel to
+constrain user inputs to intended values. This can also be used to
+ensure that fields that should contain numbers do not end up with
+character strings.</li>
+<li>For more suggestions on good spreadsheet practices, see <a href="https://doi.org/10.1080/00031305.2017.1375989">Broman and Woo
+2018</a></li>
+</ul></li>
+</ul>
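+<p>If inconsistent entries have already made it into a spreadsheet, they
+can often be standardized after reading the data into R. The following
+is a minimal sketch using base R; the <code>responses</code> vector and
+the lists of variants are made-up examples and would need to be adapted
+to the entries actually present in your data.</p>
+<pre class="r"><code>## hypothetical example of a messy &quot;Yes/No&quot; column
+responses &lt;- c(&quot;Y&quot;, &quot;Yes&quot;, &quot;yes&quot;, &quot;yes &quot;, &quot;yees&quot;, &quot;No&quot;, &quot;n&quot;)
+## trim stray whitespace and convert to lower case
+cleaned &lt;- tolower(trimws(responses))
+## map each known variant onto one intended value; these variant lists are assumptions
+yes_variants &lt;- c(&quot;y&quot;, &quot;yes&quot;, &quot;yees&quot;)
+no_variants &lt;- c(&quot;n&quot;, &quot;no&quot;)
+standardized &lt;- ifelse(cleaned %in% yes_variants, &quot;Yes&quot;,
+                       ifelse(cleaned %in% no_variants, &quot;No&quot;, NA_character_))
+## any NA flags an entry that needs a manual look
+table(standardized, useNA = &quot;ifany&quot;)</code></pre>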
+</div>
+<div id="data-sharing-guidance" class="section level2">
+<h2>Data Sharing Guidance</h2>
+<p>It is often unclear which types of data can be shared and with whom.
+Generally, sport data can be shared freely with everyone, while
+commercial data has restrictions on who it can be shared with and
+<em>how</em> it can be shared, for example under the Magnuson-Stevens Act (MSA).</p>
+<div id="sport" class="section level3">
+<h3>Sport</h3>
+<p>Sport data can usually be freely shared with the public, although
+there might be restrictions on sharing charter fishing data under the
+MSA.</p>
+</div>
+<div id="commerical" class="section level3">
+<h3>Commercial</h3>
+<p>The MSA has to be considered when sharing commercial data. Often the
+data has to be aggregated in a way that does not identify individual
+fishers. In the co-management realm this is rarely an issue, as much of
+the data is aggregated, but for sharing with the public, guidance should
+be requested through WDFW’s Records Office.</p>
+</div>
+<div id="treaty" class="section level3">
+<h3>Treaty</h3>
+<p>Public requests for treaty data should be directed to the individual
+tribes themselves or WDFW’s Records Office. Tribal data can be shared
+freely with the data’s respective tribe. Sharing one tribe’s data with
+another tribe should be done with caution and with the guidance of NWIFC
+staff.</p>
+</div>
+</div>
 <div id="project-management" class="section level2">
 <h2>Project management</h2>
 <p>For any kind of substantial work involving more than one file, use
@@ -394,10 +454,10 @@ <h2>Project management</h2>
 <p>When developing a document to report results or findings to a general
 user, use Rmarkdown or Quarto to create a report that blends R code with
 explanations and graphics.</p>
-<p>When a script is likely to be re-used (e.g. not a one-off analysis),
-create a commented version with instructions on use. This should be
-stored somewhere accessible. Collin Edwards maintains the
-<code>snippets</code> <a href="https://github.com/FRAMverse/snippets">github repository</a> for
+<p>When a script is likely to be re-used (i.e., not a one-off analysis)
+or is going to be shared, create a commented version with
+instructions on use. This should be stored somewhere accessible. Collin
+Edwards maintains the <code>snippets</code> <a href="https://github.com/FRAMverse/snippets">github repository</a> for
 this kind of thing, or it could live in a Teams folder. If code is
 likely to be useful to the team or others, it may also be appropriate to
 incorporate this code into an R package, or develop a new R package for
@@ -474,23 +534,17 @@ <h4>Outside WDFW</h4>
 be deleted from the 3rd party when the receipt is verified.</p>
 <p>When <code>.zip</code> files are blacklisted by the recipient’s IT
 department, an alternative would be the <code>.7z</code> format from the
-<a href="https://www.7-zip.org/">7-zip</a> software.</p>
+<a href="https://www.7-zip.org/">7-zip</a> software. Sometimes zipped
+files can successfully be emailed if the file name is changed to end in
+something else (e.g., <code>.zap</code>) and instructions are included
+for changing the file name back.</p>
 </div>
 <div id="common-project-directory-structure" class="section level4">
 <h4>Common project directory structure</h4>
 <p>When working across multiple projects, it can be helpful if each
-project has a similar file structure. Ty and Collin will discuss what
-that should be, but good foundation is Ty’s approach, which has a
-<code>data/</code> and a <code>scripts/</code> subfolder. It may be
-helpful to also include a standardized readme with basic information
-(when project was started, what goal was, who was working on it). When
-Collin was working in an academic setting, he had a <a href="https://gist.github.com/cbedwards/7e64215e062c42da54dbd01626ef6a72">code
-snippet</a> that he ran whenever starting a new project, which created
-his standardized folder structure and auto-populated a few key template
-files. We could think about writing something similar.</p>
-<p>Draft file structure? <em>CBE: slightly updated. I like to keep the
-raw data in a separate folder from where cleaned / intermediate data
-files live.</em></p>
+project has a similar file structure. Your needs for individual projects
+may vary, but the following project structure is often a good option (or
+at least a good starting point).</p>
 <pre><code>project_folder
 ├── scripts
 │   ├── data_clean.R # should save to `cleaned_data/`
@@ -506,6 +560,37 @@ <h4>Common project directory structure</h4>
 │   └── some_spreadsheet.xlsx
 ├── .gitignore
 └── project_folder.Rproj</code></pre>
+<p>The idea with this folder structure is that:</p>
+<ul>
+<li>The <code>scripts/</code> folder contains R scripts used in the
+project.</li>
+<li><code>original_data/</code> contains the data files provided for
+this project (but not data files that are generated or cleaned in this
+project).</li>
+<li><code>cleaned_data/</code> contains any data files that are
+generated as part of this project (e.g., by cleaning and integrating data
+from <code>original_data/</code>), which can then be used for subsequent
+analyses in this project. The idea with this separation of data is that
+it makes it easier to ensure that original data files are <em>never</em>
+modified.</li>
+<li><code>figures/</code> contains image objects created as a part of
+this project. Depending on the project, this folder may not be used, but
+sometimes it’s appropriate to generate hundreds of figures
+programmatically (e.g. separate bar plots of fishery impacts for each
+stock).</li>
+<li><code>results/</code> contains non-image objects created as a part
+of this project. For example, if a project synthesizes data and produces
+summary <code>.csv</code> or <code>.xlsx</code> files, they would go in
+<code>results/</code>.</li>
+</ul>
+<p>To streamline giving new projects this folder structure, the
+<code>framrsquared</code> package <a href="https://github.com/FRAMverse/framrsquared">found here</a> has the
+<code>initialize_project()</code> function. By default, this generates
+the folder structure above; optional arguments allow users to specify a
+different folder structure, copy template files for quarto documents,
+and initialize <code>renv</code>.</p>
+<p>To reiterate, this project structure is not mandatory for good
+coding. It’s simply a useful option.</p>
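+<p>For illustration, the folder structure can also be created with a
+few lines of base R. This is only a sketch and is not how
+<code>initialize_project()</code> is implemented; the
+<code>project_dir</code> location is a placeholder that points at a
+temporary directory, so replace it with your own path.</p>
+<pre class="r"><code>## minimal base R sketch of creating the folder structure described above
+project_dir &lt;- file.path(tempdir(), &quot;project_folder&quot;) ## hypothetical location; swap in your own path
+folders &lt;- c(&quot;scripts&quot;, &quot;original_data&quot;, &quot;cleaned_data&quot;, &quot;figures&quot;, &quot;results&quot;)
+for (cur_folder in folders) {
+  ## recursive = TRUE also creates project_dir itself if it does not exist yet
+  dir.create(file.path(project_dir, cur_folder), recursive = TRUE, showWarnings = FALSE)
+}
+list.files(project_dir) ## lists the folders that now exist</code></pre>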
 </div>
 <div id="databases" class="section level4">
 <h4>Databases</h4>
@@ -531,7 +616,8 @@ <h4>Tips</h4>
 <li>using <code>if(interactive())</code> allows you to write code that
 behaves differently when being compiled for a report than when its being
 run interactively. This can be useful when developing parameterized
-reports.</li>
+reports, as the parameters will live in the YAML header, which is not
+run in interactive mode.</li>
 </ul>
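+<p>As a rough sketch of the <code>if(interactive())</code> tip for a
+parameterized report: the parameter name <code>fishery</code> and the
+stand-in value are hypothetical, and the extra
+<code>exists(&quot;params&quot;)</code> check is an added safeguard so the
+chunk also runs in a plain R session where no YAML header has been
+processed.</p>
+<pre class="r"><code>## `params$fishery` is a hypothetical report parameter defined in the YAML header
+if (interactive() || !exists(&quot;params&quot;)) {
+  ## running line by line: no YAML header has been processed,
+  ## so define a stand-in value by hand
+  fishery_plot &lt;- &quot;NT Area 10 Sport&quot;
+} else {
+  ## rendering the report: use the value supplied in the YAML header
+  fishery_plot &lt;- params$fishery
+}</code></pre>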
 </div>
 </div>
@@ -539,6 +625,11 @@ <h4>Tips</h4>
 <div id="r-practices" class="section level2">
 <h2>R Practices</h2>
 <ul>
+<li><p>For maximum compatibility, use dashes rather than spaces or
+underscores in file names. <span class="math inline">\(\LaTeX\)</span>,
+which is sometimes used as a part of Rmarkdown and Quarto documents,
+does not like spaces or underscores. This is most relevant when creating
+image files that may be loaded into reports.</p></li>
 <li><p>Ensure that your code is reproducible by never saving / loading
 the environment. Scripts should include code to read in relevant files,
 and can save key objects for re-use later. In Rstudio, go to
@@ -551,43 +642,37 @@ <h2>R Practices</h2>
 Function calls that require file paths should be relative, such that
 someone with a copy of the project directory can run the script without
 needing to change those file paths.</p></li>
-<li><p>Ensure that figure titles are correct. When copy-pasting
-figure-generation code to make comparable figures for different parts of
-the data (e.g., different stocks or different fisheries), it’s easy to
-accidentally leave old titles in place, leading to confusion. Consider
-using <code>paste()</code> or <code>glue()</code> with variable names or
-even r functions so that the figure title auto-updates
-appropriately.</p></li>
-</ul>
-<pre class="r"><code>## &quot;fragile&quot; version of plotting an mtcar variable; copy-pasting and plotting a second variable requires careful updating of ggtitle()
-dat.plot &lt;- data |&gt; 
-    filter(fishery_title == &quot;NT Area 10 Sport&quot;)
-ggplot(dat.plot, aes(x = stock, y = AEQ))+
-   geom_col()+
-   ggtitle(&quot;Chinook AEQ of NT Area 10 Sport&quot;)+
-   coord_flip()
-   
-## robust version:
-fishery_plot &lt;- &quot;NT Area 10 Sport&quot; ## define the fishery to plot in one place at the top
-dat.plot &lt;- data |&gt; 
-    filter(fishery_title == fishery_plot) ## use variable in our filter function 
-ggplot(dat.plot, aes(x = stock, y = AEQ))+
-   geom_col()+
-   ggtitle(paste(&quot;Chinook AEQ of&quot;, fishery_plot))+ ## use paste and variable name
-   coord_flip()
-   
-## alternative robust version:
-dat.plot &lt;- data |&gt; 
-    filter(fishery_title == &quot;NT Area 10 Sport&quot;)
-ggplot(dat.plot, aes(x = stock, y = AEQ))+
-   geom_col()+
-   ggtitle(paste(&quot;Chinook AEQ of&quot;, dat.plot$fishery_title[1]))+ ## obtain the fishery name directly from dat.plot
-   coord_flip()</code></pre>
-<ul>
-<li>When loading libraries, use <code>library()</code> rather than
+<li><p>When loading libraries, use <code>library()</code> rather than
 <code>require()</code>. Put all library calls at the top of the script,
 so that users immediately encounter errors if they have not yet
-installed relevant libraries.</li>
+installed relevant libraries.</p></li>
+<li><p>To improve transparency, give R scripts a header with your name,
+the date, and a brief explanation of the script’s purpose. To streamline
+this process, consider adding a <code>header</code> <a href="https://rstudio.github.io/rstudio-extensions/rstudio_snippets.html">Rstudio
+snippet</a>. We have a template snippet <a href="https://github.com/FRAMverse/snippets/blob/main/Rstudio/header-snippet.txt">here</a>;
+you can update this with your own name and then add it to your Rstudio’s
+snippets.</p></li>
+<li><p>When running simulations or other code in which the outcomes of a
+run can differ due to randomness, it can be difficult and frustrating
+for others to attempt to replicate your work (or replicate an error).
+One key tool is to use <code>set.seed()</code> at the beginning of a
+script. This will ensure that the randomness is repeated exactly every
+time the script is run. Note that since setting the seed prevents
+alternative random outcomes, it is unwise to do so when developing code,
+as your code will only ever represent one set of random
+outcomes.</p></li>
+<li><p>In rare cases, R packages will work only for 32 bit R or only for
+64 bit R (historically, this was an issue for connecting to databases).
+Code that uses these packages will then only run on some computers,
+severely hampering our transparency and code sharing. Because of this,
+these packages should be avoided whenever reasonable. When there is no
+other option, there should be very clear commenting or documentation
+identifying this issue, so that users know immediately whether or not
+they will be able to run the code. If options exist for both 32 bit
+and 64 bit R but use different functions or syntax, consider supporting
+both architectures by including an <code>if</code> statement;
+<code>.Machine$sizeof.pointer</code> will return 8 in 64-bit R, and 4 in
+32-bit R.</p></li>
 </ul>
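+<p>The last two items can be sketched as follows. The seed value and
+the object names are arbitrary choices for illustration; the
+architecture check only reports which version of R is running, and in
+real code each branch would call the appropriate package.</p>
+<pre class="r"><code>## set.seed() makes random draws repeatable; 2024 is an arbitrary seed
+set.seed(2024)
+first_draw &lt;- runif(3)
+set.seed(2024)
+second_draw &lt;- runif(3)
+identical(first_draw, second_draw) ## TRUE: same seed, same &quot;random&quot; numbers
+
+## branch on architecture: pointers are 8 bytes in 64-bit R and 4 bytes in 32-bit R
+if (.Machine$sizeof.pointer == 8) {
+  r_architecture &lt;- &quot;64-bit&quot;
+} else {
+  r_architecture &lt;- &quot;32-bit&quot;
+}
+message(&quot;Running &quot;, r_architecture, &quot; R&quot;)</code></pre>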
 </div>
 <div id="style-guide" class="section level2">
@@ -652,8 +737,8 @@ <h4>Naming Conventions</h4>
 </table>
 <p>Although Hadley Wickham recommends snake case for R scripting, R has
 no official naming convention. When writing a script a naming convention
-should be chosen and be consistently used throughout the documents
-entirety.</p>
+should be chosen and then used consistently throughout the entire
+document.</p>
 </div>
 <div id="assignment-operators" class="section level4">
 <h4>Assignment Operators</h4>
@@ -662,7 +747,8 @@ <h4>Assignment Operators</h4>
 well as their directional reversals. The vast majority of your
 assignments will either be <code>&lt;-</code> or <code>=</code>,
 although essentially equal, one should be chosen and used exclusively
-throughout the project.</p>
+throughout the project. In Rstudio, [alt][-] is a hotkey to create
+<code>&lt;-</code>.</p>
 </div>
 <div id="pipes" class="section level4">
 <h4>Pipes</h4>
@@ -670,21 +756,10 @@ <h4>Pipes</h4>
 <code>magittr</code> package. The ‘native pipe’ <code>|&gt;</code> was
 introduced in R 4.0. These two perform essentially the same function,
 but with different placeholders which can lead to various errors in
-scripts when mix. One pipe should be used in the document.</p>
-<!-- i think we should allow some flexibility, and go from there, thoughts? -->
-<!--
-=======
-The following are good general practices, but specific style choices are often a matter of taste. Consistency is the most important part -- use the same style throughout your script.
-
-
-
--   Snakecase for variable names. E.g. `chinook_landed_catch`. *Using separators in variable names makes them easier to read. Using periods as separators becomes ambiguous when dealing with S3 methods*
--   Use `<-` for assignment rather than `=`. Always ensure there is a space before and after the assignment operator. *This helps with visually distinguishing the assignment `x <- 10` and the test `x < -10`.*
--   There's not cost to spreading code across more lines. When in doubt, break really long / complex lines into more, shorter lines; create intermediate variables if necessary. When using pipes, put each pipe operation on its own line.
--   We recommend using "Code > Reindent Code" (select all, then Ctrl-I) and "Code > Reformat Code" (select all, then Ctrl-shift-A) to make code easier to read
--   Avoid creating variables that share names with common functions (e.g., use `x_mean = mean(x)` instead of `mean = mean(X)`, and `cur_plot = ggplot(...` instead of  `plot = ggplot(...`).
--   Where possible, use names instead of numbers when indexing named vectors, dataframes, or lists. (e.g., `mtcars$cyl` or `mtcars[, cyl]` rather than `mtcars[, 2]`)
--->
+scripts when mixed. One pipe should be used throughout the document. In
+Rstudio, [ctrl][shift][m] generates a pipe; you can set which type of
+pipe is generated in Tools &gt; Global Options &gt; Code by checking or
+unchecking the “Use native pipe operator…” box.</p>
 </div>
 </div>
 </div>
@@ -695,9 +770,9 @@ <h2>Visualization</h2>
 good practices. The following is technically agnostic to packages, but
 suggestions are centered on ggplot2-based approaches</p>
 <ul>
-<li>Axes should have clear, interpretable labels</li>
-<li>Colors should be easily distinguishable, including by folks with
-common forms of color vision deficiencies.
+<li><p>Axes should have clear, interpretable labels.</p></li>
+<li><p>Colors should be easily distinguishable, including by folks with
+common forms of color vision deficiencies.</p>
 <ul>
 <li>The <code>viridis</code> <a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html">package</a>
 makes it very easy to create high-contrast accessible graphics.</li>
@@ -713,44 +788,81 @@ <h2>Visualization</h2>
 a ggplot2 object under different simulated color vision deficiencies to
 help check accessibility.</li>
 </ul></li>
-<li>Text size should be large enough to read comfortably. When using
-ggplot, this can easily be achieved by using any of the built-in themes
-and including the optional argument <code>base_size</code>. A good
-starting point is <code>theme_bw(base_size = 16)</code>.</li>
-<li>When there is complexity in interpreting a plot, this should be
-included in text associated with the plot. This is easy to do in Quarto
-or Rmarkdown, as we can add caveates or comments right below or above
-the associated R chunk. In quarto reports, an explicit figure caption
-can be added with
-<code>#| fig-cap: &quot;caption contents go here&quot;</code>.</li>
-<li>Sometimes we work with timeseries data using day of year as a
+<li><p>Text size should be large enough for others to read comfortably.
+In our experience, this always means making the font size seemingly too
+large. When using ggplot, this can easily be achieved by using any of
+the built-in themes and including the optional argument
+<code>base_size</code>. A good starting point is
+<code>theme_bw(base_size = 16)</code>.</p></li>
+<li><p>When there is some kind of nuance or complexity in interpreting a
+plot (e.g., an axis label can be misunderstood), this should be included
+in text associated with the plot. This is easy to do in Quarto or
+Rmarkdown, as we can add caveats or comments right below or above the
+associated R chunk. In quarto reports, an explicit figure caption can be
+added with <code>#| fig-cap: &quot;caption contents go here&quot;</code> in the
+associated R chunk.</p></li>
+<li><p>Sometimes we work with timeseries data using day of year as a
 numeric (e.g., converting dates to values from 1 to 365). Plotting
 results on a doy scale makes them difficult to interpret; instead, we
 can use the <a href="https://github.com/FRAMverse/snippets/blob/main/R/doy_2md.R">doy_2md()
 function here</a> to translate back. In it’s simplest form, using this
 function just requires including a
 <code>scale_x_continuous(labels = doy_2md)</code> call in your ggplot
-layers.</li>
+layers.</p></li>
+<li><p>Ensure that figure titles are correct. When copy-pasting
+figure-generation code to make comparable figures for different parts of
+the data (e.g., different stocks or different fisheries), it’s easy to
+accidentally leave old titles in place, leading to confusion. Consider
+using <code>paste()</code> or <code>glue()</code> with variable names or
+even R functions so that the figure title auto-updates
+appropriately.</p></li>
 </ul>
+<pre class="r"><code>## &quot;fragile&quot; version; copy-pasting and plotting a different fishery requires careful updating of ggtitle()
+dat.plot &lt;- data |&gt; 
+    filter(fishery_title == &quot;NT Area 10 Sport&quot;)
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+   geom_col()+
+   ggtitle(&quot;Chinook AEQ of NT Area 10 Sport&quot;)+
+   coord_flip()
+   
+## robust version:
+fishery_plot &lt;- &quot;NT Area 10 Sport&quot; ## define the fishery to plot in one place at the top
+dat.plot &lt;- data |&gt; 
+    filter(fishery_title == fishery_plot) ## use variable in our filter function 
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+   geom_col()+
+   ggtitle(paste(&quot;Chinook AEQ of&quot;, fishery_plot))+ ## use paste and variable name
+   coord_flip()
+   
+## alternative robust version:
+dat.plot &lt;- data |&gt; 
+    filter(fishery_title == &quot;NT Area 10 Sport&quot;)
+ggplot(dat.plot, aes(x = stock, y = AEQ))+
+   geom_col()+
+   ggtitle(paste(&quot;Chinook AEQ of&quot;, dat.plot$fishery_title[1]))+ ## obtain the fishery name directly from dat.plot
+   coord_flip()</code></pre>
 </div>
 <div id="creating-custom-functions" class="section level2">
 <h2>Creating custom functions</h2>
 <ul>
-<li>Functions should have clear names, preferably involving a verb. This
-name should not be the same as common R functions (e.g., don’t create a
-custom plotting function and call it <code>plot</code>)</li>
-<li>functions should not rely on objects in the global environment; if
+<li>Functions should have clear names, preferably based around a verb
+(e.g., <code>make_fishery_plot()</code>, not
+<code>fishery_plot()</code>). This name should not be the same as common
+R functions (e.g., don’t create a custom plotting function and call it
+<code>plot</code>)</li>
+<li>Functions should not rely on objects in the global environment. If
 the function needs an object, ensure that the object is an argument for
 the function.</li>
 <li>Whenever possible, avoid writing functions that rely on
-side-effects, particularly creating new variables in the global
-environment (e.g., with <code>assign()</code>). If you need a function
-to create several objects, have the function return a list of those
-objects. (Note that file manipulation is an obvious exception to the
-general aim to avoid side-effects in functions; functions can read or
-write)</li>
+side-effects, particularly side effects that create new variables in the
+global environment or modify existing variables in the environment
+(e.g., with <code>assign()</code>). If you need a function to create
+several objects, have the function return a list of those objects. (Note
+that file manipulation is a key exception to the general aim to avoid
+side-effects in functions; it is often appropriate to have functions
+read or write files.)</li>
 <li>When writing functions to create graphics, the user has much better
-control if the function creates and returns a gglot object instead of
+control if the function creates and returns a ggplot object instead of
 directly manipulating a graphics window using base R plotting
 functions.</li>
 <li>For longer scripts, consider separating the code into multiple
@@ -769,8 +881,66 @@ <h2>Version control</h2>
 (emailing different versions of a zipped folder back and forth is not a
 good idea). Git and Github are the best tool for this, and Rstudio now
 supports using github to manage projects.</p>
+<div id="setting-up" class="section level3">
+<h3>Setting up</h3>
+<p>Setting up Git and linking it to Rstudio is an involved task, and we
+recommend <a href="https://happygitwithr.com/">Happy Git with R</a> as a
+resource.</p>
+</div>
+<div id="getting-started" class="section level3">
+<h3>Getting started</h3>
+<p>At its simplest, git is a way to keep track of changes, and merge
+different, non-conflicting changes to the same documents. In this sense,
+you can think of it as a mix between Dropbox and a Google document, with
+more to learn but also a lot more control and functionality. We
+recommend Chapter 20 of Happy Git with R for an overview. For simple
+tasks (e.g., working on your own project), a standard workflow is to
+pull (this makes sure your local version of the repository is up to
+date), and work in the repo. At good stopping points or key checkpoints
+in your work (you completed a specific task, or are stopping work on
+this project for the day), add any new files that were created, commit
+the repository, pull the remote to make sure you are up to date, and
+then push. See “Using git in Rstudio” for the terminal commands for this
+workflow.</p>
+</div>
+<div id="using-git-in-rstudio" class="section level3">
+<h3>Using git in Rstudio</h3>
+<p>Once an Rproject is linked to a git repository, Rstudio will have a
+git button in the top menu (near the “go to file/function” field).
+Clicking this button, then “Commit” opens up an interface to create and
+control a git commit, then push it to the remote repo. However, often
+this interface is very slow/laggy when a project has many files. An
+alternative is to manage the commit in the “terminal” tab of the console
+window. Here you can type git commands, which are typically much faster
+for Rstudio to enact. Here is a typical commit process in the terminal,
+with explanations for each step.</p>
+<ol>
+<li><code>git add -A</code> adds any new files to tracking (ignoring files
+covered by <code>.gitignore</code>) so that they are included in the commit.</li>
+<li><code>git commit -a -m &quot;commit message&quot;</code> commits the current
+state of all tracked files. Replace the text in quote marks with an appropriate
+commit message (e.g., “Addressing issue #4, modified initialize_project() function”).</li>
+<li><code>git pull</code> updates your local version with the remote version, in
+case someone else has made changes. If their changes conflict with yours, git
+will ask you to address the conflicts.</li>
+<li><code>git push</code> updates the remote version with your commit and its associated changes.</li>
+</ol>
+</div>
+<div id="branching-forking-pull-requests" class="section level3">
+<h3>Branching, forking, pull requests</h3>
 <p><em>Populate with links, basics of the work flow</em> - branching and
-pull requests - forking and pull requests - happy git with R link</p>
+pull requests - forking and pull requests - Adapt git use example from
+BDS coding practices.</p>
+</div>
+<div id="git-switch" class="section level3">
+<h3>git switch</h3>
+<p>If you have started making changes to a git repository and realize
+before committing the changes that your work should really be on a new
+branch, you can use the following git command to achieve this:</p>
+<p><code>git switch -c newbranchname</code></p>
+<p>where “newbranchname” is replaced by an appropriate name for your new
+branch.</p>
+</div>
 </div>
 <div id="package-development-guidelines" class="section level2">
 <h2>Package development guidelines</h2>
@@ -778,7 +948,7 @@ <h2>Package development guidelines</h2>
 <li>When in doubt, we should have more packages that are small and
 focused.</li>
 <li>Before developing something new, make sure there isn’t an existing
-tool we can use (I’m looking at YOU/ME, Collin)</li>
+tool we can use (I’m looking at YOU-ME, Collin)</li>
 </ul>
 <div id="tips-for-clean-checks" class="section level4">
 <h4>Tips for clean checks</h4>
diff --git a/README.md b/README.md
index 7bb1fbb..4393051 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,30 @@ The following are important for coding and non-coding projects alike.
     - Ensure that headers are consistent in different files that are meant to be combined. This includes capitalization and the use of spaces. Copy-pasting from a template file is a good way to ensure exactly identical headers in this case.
     - Ensure that categories have a consistent name in a given column. For example, we have encountered data sheets with a "Yes/No" type column in which "Yes" is a mix of "Y", "Yes", "yes", "yes ", and "yees". When read into R or another language, these will be treated as five different categories instead of one. Consider using data validation in Excel to constrain user inputs to intended values. This can also be used to ensure that fields that should contain numbers do not end up with character strings.
     - For more suggestions on good spreadsheet practices, see [Broman and Woo 2018](https://doi.org/10.1080/00031305.2017.1375989)
+    
+## Data Sharing Guidance
+It is often unclear which types of data can be shared and with whom.
+Generally, sport data can be shared freely with everyone, while commercial data
+has restrictions on who it can be shared with and *how* it can be shared, for
+example under the Magnuson-Stevens Act (MSA).
+
+### Sport
+Sport data can usually be freely shared with the public, although there might be
+restrictions on sharing charter fishing data under the MSA.
+
+### Commercial
+The MSA has to be considered when sharing commercial data. Often the data has
+to be aggregated in a way that does not identify individual fishers. In the
+co-management realm this is rarely an issue, as much of the data is aggregated,
+but for sharing with the public, guidance should be requested through WDFW's Records Office.
+
+### Treaty
+Public requests for treaty data should be directed to the individual tribes
+themselves or WDFW's Records Office. Tribal data can be shared freely with the
+data's respective tribe. Sharing one tribe's data with another tribe should be
+done with caution and with the guidance of NWIFC staff.
+
+
 
 ## Project management