All data is stored under schemas in the projects database and is accessible by the following programs:
To establish a connection to Redshift in DBeaver, first double-click on the server you wish to connect to. In the example below, I'm connecting to Redshift11_projects. A window will then appear asking for your Username and Password. Your username is your user folder name, prefixed with adrf\. Then click OK. You will now have access to your data stored on the Redshift11_projects server.
When users create tables in their PR (Research Project) or TR (Training Project) schema, the table is initially permissioned to the user only. This is analogous to creating a document or file in your U drive: Only you have access to the newly created table.
If you want to allow all individuals in your project workspace to access the table in the PR/TR schema, you will need to grant permission to the table to the rest of the users who have access to the PR or TR schema.
You can do this by running the following code:
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name TO GROUP db_xxxxxx_rw;
Note: In the above code example, replace schema_name with the pr_ or tr_ schema assigned to your workspace, and replace table_name with the name of the table on which you want to grant access. Also, in the group name db_xxxxxx_rw, replace xxxxxx with your project code: the last six characters of your project-based user name, which will start with either a T or a P.
If you want to allow only a single user on your project to access the table, you will need to grant permission to that user. You can do this by running the following code:
GRANT SELECT, UPDATE, DELETE, INSERT ON TABLE schema_name.table_name TO "IAM:first_name.last_name.project_code";
Note: In the above code example, replace schema_name with the pr_ or tr_ schema assigned to your workspace, and replace table_name with the name of the table on which you want to grant access. Also, in "IAM:first_name.last_name.project_code", update first_name.last_name.project_code with the user name of the person to whom you want to grant access.
In the code examples below, the default DSN is Redshift01_projects_DSN.
proc sql;
connect to odbc as mycon
(datasrc=Redshift01_projects_DSN user=adrf\user.name.project password=password);
select * from connection to mycon
(select * from projects.schema.table);
disconnect from mycon;
quit;
Best practices for loading large amounts of data in R
To ensure R can efficiently manage large amounts of data, please add the following lines of code to your R script before any packages are loaded:

options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx8192m"))
gc()

Redshift R database connectivity changes
When connecting to a Redshift database using R (whether in RStudio or a Jupyter Notebook), we strongly recommend using a JDBC-based connection rather than an ODBC-based connection. Other than how you connect to the database, the rest of your code should remain the same.
Best practices for writing tables to Redshift
When writing tables to Redshift from R, whether from a SQL query or an R data frame, please use the following lines of R code.
To write a table from R to Redshift using a SQL INTO statement, use the function dbSendUpdate():

dbSendUpdate(conn, "SELECT col_1 INTO schema_name.table_name FROM schema_name.old_table_name")
When writing an R data frame to Redshift, use the following code as an example:

qry <- "set search_path to schema_name"
dbSendUpdate(conn, qry)

DBI::dbWriteTable(conn = conn,                     # name of the connection
                  name = "schema_name.table_name", # name of the table to save the data frame to
                  value = df_name,                 # name of the data frame to write to Redshift
                  overwrite = TRUE)                # TRUE to overwrite an existing table, otherwise FALSE

qry <- "GRANT SELECT ON TABLE schema_name.table_name TO group <group_name>;"
dbSendUpdate(conn, qry)

Note: replace schema_name.table_name with the schema and name of the table you wish to write, df_name with the name of the data frame you wish to write to Redshift, and <group_name> with your project's group.
The table below is for connecting to the Redshift11 database:
|  | ODBC Based Connection | JDBC Based Connection (Recommended) |
|---|---|---|
| Libraries | library(odbc) | library(RJDBC) |
| User ID and Password |  | dbusr=Sys.getenv("DBUSER") dbpswd=Sys.getenv("DBPASSWD") |
| Connection | con <- dbConnect(odbc(), "Redshift11_projects_DSN", uid = "adrf\\John.doe.p00002", pwd = 'xxxxxxxxxxxxxx') | conn <- dbConnect(driver, url, dbusr, dbpswd) |
library(RJDBC)

dbusr=Sys.getenv("DBUSER")
dbpswd=Sys.getenv("DBPASSWD")

# Database URL
url <- paste0("jdbc:redshift:iam://adrf-redshift11.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;",
              "loginToRp=urn:amazon:webservices:govcloud;",
              "ssl=true;",
              "AutoCreate=true;",
              "idp_host=adfs.adrf.net;",
              "idp_port=443;",
              "ssl_insecure=true;",
              "plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider")

# Redshift JDBC Driver Setting
driver <- JDBC("com.amazon.redshift.jdbc42.Driver",
               classPath = "C:\\drivers\\redshift_withsdk\\redshift-jdbc42-2.1.0.12\\redshift-jdbc42-2.1.0.12.jar",
               identifier.quote = "`")

conn <- dbConnect(driver, url, dbusr, dbpswd)
For the above code to work, please create a file named .Renviron in your user folder (your user folder is something like U:\John.doe.p00002). The .Renviron file should contain the following:

DBUSER='adrf\John.doe.p00002'
DBPASSWD='xxxxxxxxxxxx'

PLEASE replace the user ID and password with your project-workspace-specific user ID and password. This ensures your ID and password are not stored in your R code, so you can easily share your R code with others without sharing your credentials.
The table below is for connecting to the Redshift01 database:
|  | ODBC Based Connection | JDBC Based Connection (Recommended) |
|---|---|---|
| Libraries | library(odbc) | library(RJDBC) |
| User ID and Password |  | dbusr=Sys.getenv("DBUSER") dbpswd=Sys.getenv("DBPASSWD") |
| Connection | con <- dbConnect(odbc(), "Redshift01_projects_DSN", uid = "adrf\\John.doe.p00002", pwd = 'xxxxxxxxxxxxxx') | conn <- dbConnect(driver, url, dbusr, dbpswd) |
library(RJDBC)

dbusr=Sys.getenv("DBUSER")
dbpswd=Sys.getenv("DBPASSWD")

# Database URL
url <- paste0("jdbc:redshift:iam://adrf-redshift01.cdy8ch2udktk.us-gov-west-1.redshift.amazonaws.com:5439/projects;",
              "loginToRp=urn:amazon:webservices:govcloud;",
              "ssl=true;",
              "AutoCreate=true;",
              "idp_host=adfs.adrf.net;",
              "idp_port=443;",
              "ssl_insecure=true;",
              "plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider")

# Redshift JDBC Driver Setting
driver <- JDBC("com.amazon.redshift.jdbc42.Driver",
               classPath = "C:\\drivers\\redshift_withsdk\\redshift-jdbc42-2.1.0.12\\redshift-jdbc42-2.1.0.12.jar",
               identifier.quote = "`")

conn <- dbConnect(driver, url, dbusr, dbpswd)
For the above code to work, please create a file named .Renviron in your user folder (your user folder is something like U:\John.doe.p00002). The .Renviron file should contain the following:

DBUSER='adrf\John.doe.p00002'
DBPASSWD='xxxxxxxxxxxx'

PLEASE replace the user ID and password with your project-workspace-specific user ID and password. This ensures your ID and password are not stored in your R code, so you can easily share your R code with others without sharing your credentials.
Python:

import pyodbc
import pandas as pd

cnxn = pyodbc.connect('DSN=Redshift01_projects_DSN; UID=adrf\\user.name.project; PWD=password')
df = pd.read_sql("SELECT * FROM projects.schema_name.table_name", cnxn)

Stata:

odbc load, exec("select * from PATH_TO_TABLE") clear user("adrf\user.name.project") password("password") dsn("Redshift01_projects_DSN")
Developing your query. Here's an example workflow to follow when developing a query:

1. Study the column and table metadata, which is accessible via the table definition. Each table definition can be displayed by clicking on the [+] next to the table name.
2. To get a feel for a table's values, SELECT * from the tables you're working with and LIMIT your results (keep the LIMIT applied as you refine your columns), e.g., select * from [table name] LIMIT 1000.
3. Narrow down the columns to the minimal set required to answer your question.
4. Apply any filters to those columns.
5. If you need to aggregate data, aggregate a small number of rows first.
6. Once you have a query returning the results you need, look for sections of the query to save as a Common Table Expression (CTE) to encapsulate that logic, as shown in the sketch after this list.
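As an illustration, here is a minimal sketch of this workflow in R, assuming an existing DBI connection conn; the table schema_name.table_name and the columns col_1 and col_2 are hypothetical names, so adapt them to your own schema.

library(DBI)

# Steps 1-2: peek at the table's values, keeping a LIMIT while exploring
sample_df <- dbGetQuery(conn, "select * from schema_name.table_name LIMIT 1000")

# Steps 3-4: narrow to the minimal columns and apply filters
filtered_df <- dbGetQuery(conn, "
  select col_1, col_2
  from schema_name.table_name
  where col_2 >= 2020
  LIMIT 1000")

# Steps 5-6: aggregate, then encapsulate the reusable logic in a CTE
final_df <- dbGetQuery(conn, "
  with filtered as (
    select col_1, col_2
    from schema_name.table_name
    where col_2 >= 2020
  )
  select col_1, count(*) as n_obs
  from filtered
  group by col_1")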
To provide ADRF users with the ability to draw from sensitive data, results that are exported from the ADRF must meet rigorous standards meant to protect privacy and confidentiality. To ensure that those standards are met, the ADRF Export Review team reviews each request to ensure that it follows formal guidelines that are set by the respective agency providing the data in partnership with the Coleridge Initiative. Prior to moving data into the ADRF from the agency, the Export Review team suggests default guidelines to implement, based on standard statistical approaches in the U.S. government 1, 2 as well as international standards 3, 4, and 5. The Data Steward from the agency supplying the data works with the team to amend these default rules in line with the agency's requirements. If you are unsure about the review guidelines for the data you are using in the ADRF or if you have any questions relating to exports, please reach out to support@coleridgeinitiative.org before submitting an export request.
To learn more about limiting disclosure more generally, please refer to the textbook or view the videos.
The review process can be delayed if the reviewer needs additional information or if the reviewer needs you to make changes to your code or output to meet the ADRF nondisclosure requirements.
The ADRF Export Review process typically involves two main stages:

1. Primary Review: This is an initial, cursory review of your documentation and exports to ensure they do not include micro-data. A primary review can take up to 5 business days, so please plan accordingly when submitting your materials. In cases where the reviewer has questions or requires additional information, the primary review may extend beyond 5 business days.
2. Secondary Review: This is a comprehensive review conducted by an approved Data Steward who has content knowledge of the data permissioned to your workspace. If your submission pertains to multiple data assets, it will require approval by each Data Steward before the material can be exported from the ADRF.
If you've submitted an export request, you can easily check the status of your submission by following these steps:

1. Log into the ADRF.
2. Open the ADRF Export module.
To help you better understand the different stages of the Export Review process, here are the status descriptions you may encounter:
+Your export is currently under primary review. If any issues arise during the primary review, your reviewer will notify you. Upon completion of the primary review, the secondary reviewer(s) will be notified.
+Your export is currently under secondary review. If your submission pertains to multiple data assets, it will require a review by each Data Steward before being approved.
Cell Sizes
Each agency has specific disclosure review guidelines, especially with respect to the minimum allowable cell sizes for tables. Refer to these guidelines when preparing export requests. If you are unsure of what guidelines are in place for the dataset with which you are working in the ADRF, please reach out to support@coleridgeinitiative.org.
For individual-level data, please report the number of observations in each cell. The default rule is to suppress cells with fewer than 10 observations, unless otherwise directed by the guidelines of the agency that provided the data.
If your table includes row or column totals or is dependent on a preceding or subsequent table, reviewers will need to take into account complementary disclosure risks—that is, whether the tables’ totals, or the separate tables when read together, might disclose information about individuals in the data in a way that a single, simpler table would not. Reviewers will work with you by offering guidance on implementing any necessary complementary suppression techniques.
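As a minimal sketch, the default suppression rule above could be applied in R before preparing an export; here df and group_var are hypothetical names for your data frame and grouping column, and your agency's guidelines may set a different threshold.

library(dplyr)

# Tabulate the number of observations in each cell (hypothetical df and group_var)
cell_counts <- df %>%
  count(group_var, name = "n_obs")

# Suppress cells with fewer than 10 observations (the default rule);
# NA marks a suppressed cell in the exported table
cell_counts <- cell_counts %>%
  mutate(n_obs = ifelse(n_obs < 10, NA, n_obs))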
library(tidyverse)
All packages will be installed in your user folder.
To install a specific package version you can specify:
install.packages("remotes")
remotes::install_version("tidyverse", "1.3.2")