diff --git a/docs/dag/kubernetes/backup_disaster_recovery.rst b/docs/dag/kubernetes/backup_disaster_recovery.rst
new file mode 100644
index 000000000..71002a60f
--- /dev/null
+++ b/docs/dag/kubernetes/backup_disaster_recovery.rst
@@ -0,0 +1,79 @@
+.. _backup_disaster_recovery:
+
+****************************
+Backup and Disaster Recovery
+****************************
+
+Protecting application data is one of the fundamental purposes of any storage system. Whether an application is cloud native, 12 factor, a microservice, or any other architecture, its data still needs to be protected for the application to continue to function and remain valuable to the organization.
+
+NetApp's storage platforms provide data protection and recoverability options which vary based on recovery time and acceptable data loss requirements. Trident can provision volumes which take advantage of some of these features. However, a full data protection and recovery strategy should be evaluated for each application with a persistence requirement.
+
+ONTAP snapshots
+===============
+
+Snapshots play an important role by providing point-in-time recovery options for application data, and they are a convenient, quick, and easy way to recover data in most scenarios. It's important to remember, however, that snapshots are not backups by themselves: they will not protect against storage system failure or other catastrophes which make the storage system unavailable.
+
+Using ONTAP snapshots with containers
+-------------------------------------
+
+A backend which has not explicitly set a snapshot policy will use the "``none``" policy. This means that ONTAP will not take any snapshots of the volume automatically. If the storage administrator takes manual snapshots or changes the snapshot policy via the ONTAP management interface, this will not affect Trident operation.
+
+Note that the snapshot directory is hidden by default. This maximizes compatibility for volumes provisioned using the ``ontap-nas`` and ``ontap-nas-economy`` drivers, because some applications, such as MySQL, can experience issues when the directory is visible.
+
+Accessing the snapshot directory
+--------------------------------
+
+The ``.snapshot`` directory is a mechanism which can be enabled when using the ``ontap-nas`` and ``ontap-nas-economy`` drivers to allow applications to recover data from snapshots directly. More information about how to enable access and enable self-service data recovery can be found in `this blog post `_ on `netapp.io `_.
+
+Restoring the snapshots
+-----------------------
+
+A volume can be restored to the state recorded in a previously created snapshot, in order to retrieve lost information, using the ``volume snapshot restore`` ONTAP CLI command. When you restore a snapshot, the restore operation overwrites the existing volume configuration. Any changes made to the data in the volume after the snapshot was created are lost.
+
+.. code-block:: console
+
+ cluster1::*> volume snapshot restore -vserver vs0 -volume vol3 -snapshot vol3_snap_archive
+
+
+
+SolidFire snapshots
+===================
+
+It is possible to back up data on a SolidFire volume by setting a snapshot schedule for the volume. This ensures that snapshots of the volume are taken at the required interval. However, a snapshot schedule cannot be set for a volume through the ``solidfire-san`` driver. It has to be set manually using the Element OS Web UI or Element OS APIs.
+
+In the event of data corruption, choose a particular snapshot and roll back the volume to that snapshot manually using the Element OS Web UI or Element OS APIs. This reverts any changes made to the volume since the snapshot was created.
+
+
+
+Etcd snapshots using ``etcdctl`` command line utility
+=====================================================
+
+
+The ``etcdctl`` command line utility can take a snapshot of an etcd cluster. It can also restore a previously taken snapshot.
+
+etcdctl snapshot backup
+-----------------------
+
+The command ``etcdctl snapshot save /var/etcd/data/snapshot.db`` takes a point-in-time snapshot of the etcd cluster. NetApp recommends using a script to take backups at regular intervals. The command can be run from within the etcd container or directly using the ``kubectl exec`` command. Store the periodic snapshots under the persistent Trident NetApp volume ``/var/etcd/data`` so that snapshots are stored securely and can safely be recovered should the Trident pod be lost. Periodically check the volume to be sure it does not run out of space.
+
+etcdctl snapshot restore
+------------------------
+
+In the event of accidental deletion or corruption of Trident etcd data, choose the appropriate snapshot and restore it using the command ``etcdctl snapshot restore snapshot.db --data-dir /var/etcd/data/etcd-test2 --name etcd1``. Be sure to restore the snapshot to a different folder inside the Trident NetApp volume (shown in the above command as ``/var/etcd/data/etcd-test2``, which is on the mount).
+
+After the restore is complete, uninstall Trident. Do not use the ``-a`` flag during uninstallation. Mount the Trident volume manually on the host and make sure that the current ``member`` folder under ``/var/etcd/data`` is deleted. Copy the ``member`` folder from the restored folder ``/var/etcd/data/etcd-test2`` to ``/var/etcd/data``. After the copy is complete, re-install Trident. Verify that the restore and recovery completed successfully by making sure all the required data is present.
+
+Data replication using ONTAP
+============================
+
+Replicating data can play an important role in protecting against data loss due to storage array failure. Snapshots are a point-in-time recovery which provide a very quick and easy method of recovering data which has been corrupted or accidentally lost as a result of human or technological error. However, they cannot protect against catastrophic failure of the storage array itself.
+Trident is unable to configure replication relationships itself. However, the storage administrator can use ONTAP’s SVM-DR function to automatically replicate volumes to a DR destination. If this method is used to automatically protect Trident provisioned volumes, there are some considerations to take into account.
+
+* A distinct backend should be created for each SVM which has SVM-DR enabled.
+
+* Storage classes should be crafted so as to not select the replicated backends except when desired. This is important to avoid having volumes which do not need the protection of a replication relationship be provisioned onto the backend.
+
+* Application administrators should understand the additional cost and complexity associated with replicating the data and a plan for recovery should be determined before they leverage replication.
+
+* Trident does not automatically detect SVM failures. Therefore, upon a failure, the administrator needs to run the command ``tridentctl backend update`` to trigger Trident's failover to the new backend.
+
diff --git a/docs/dag/kubernetes/concepts_and_definitions.rst b/docs/dag/kubernetes/concepts_and_definitions.rst
new file mode 100644
index 000000000..7992d4581
--- /dev/null
+++ b/docs/dag/kubernetes/concepts_and_definitions.rst
@@ -0,0 +1,132 @@
+.. _concepts_and_definitions:
+
+************************
+Concepts and Definitions
+************************
+
+Kubernetes introduces several new concepts that storage, application and platform administrators should take into consideration. It is essential to understand the capability of each within the context of their use case.
+
+Kubernetes storage concepts
+===========================
+
+The Kubernetes storage paradigm includes several entities which are important to each stage of requesting, consuming, and managing storage for containerized applications. At a high level, Kubernetes uses three types of objects to describe storage, as described below:
+
+Persistent Volume claim
+-----------------------
+
+Persistent volume claims (PVCs) are used by applications to request access to storage resources. At a minimum, this includes two key characteristics:
+
+* Size – The capacity desired by the application component
+* Access mode – This describes the rules for accessing the storage volume. In particular, there are three access modes:
+
+ * Read Write Once (RWO) – only one node is allowed to access the storage volume at a time for read and write access
+ * Read Only Many (ROX) – many nodes may access the storage volume in read-only mode
+ * Read Write Many (RWX) – many nodes may simultaneously read and write to the storage volume
+
+* Optional: Storage class – The storage class to use for this request. See below for storage class information.
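+
+As a minimal sketch of the characteristics above, a PVC manifest might look like the following. The claim name, the requested size, and the storage class name ``example-class`` are illustrative assumptions rather than values taken from this guide.
+
+.. code-block:: yaml
+
+   # Hypothetical PVC: all names and sizes are placeholders
+   apiVersion: v1
+   kind: PersistentVolumeClaim
+   metadata:
+     name: example-claim
+   spec:
+     # Access mode: RWO, ROX, or RWX as described above
+     accessModes:
+       - ReadWriteOnce
+     # Size: the capacity desired by the application component
+     resources:
+       requests:
+         storage: 5Gi
+     # Optional: the storage class to request
+     storageClassName: example-class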
+
+More information about PVCs can be found in the `Kubernetes `_ or `OpenShift `_ documentation.
+
+Persistent Volume
+-----------------
+
+Persistent volumes (PVs) are objects that describe, to Kubernetes, how to connect to a storage device. Kubernetes supports many different types of storage. However, this document covers only NFS and iSCSI devices since NetApp platforms and services support those protocols.
+At a minimum, the PV must contain these parameters:
+
+* The capacity represented by the object, e.g. "5Gi"
+* The access mode, which is the same as for PVCs. However, the supported access modes can depend on the protocol.
+
+ * RWO is supported by all PVs
+ * ROX is supported primarily by file and file-like protocols, e.g. NFS and CephFS. However, some block protocols are supported, such as iSCSI
+ * RWX is supported by file and file-like protocols only, such as NFS
+
+* The protocol, e.g. "iscsi" or "nfs", and additional information needed to access the storage. For example, an NFS PV will need the NFS server and mount path.
+* A reclaim policy that describes the Kubernetes action when the PV is released. There are three options available:
+
+ * Retain, which will mark the volume as waiting for administrator action. The volume cannot be reissued to another PVC.
+ * Recycle, where, after the volume is released, Kubernetes will connect it to a temporary pod and issue a ``rm -rf`` command to clear the data. For the purposes of this document, this is only supported by NFS volumes.
+ * A policy of Delete will cause Kubernetes to delete the PV when it is released. Kubernetes does not, however, delete the storage which was referenced by the PV.
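+
+To make the parameters above concrete, the following is a hedged sketch of a statically defined NFS PV. The PV name, the server address (taken from a documentation-only address range), and the export path are assumptions for illustration.
+
+.. code-block:: yaml
+
+   # Hypothetical NFS PV: server and path are placeholders
+   apiVersion: v1
+   kind: PersistentVolume
+   metadata:
+     name: example-nfs-pv
+   spec:
+     # The capacity represented by the object
+     capacity:
+       storage: 5Gi
+     # NFS is a file protocol, so RWX can be offered
+     accessModes:
+       - ReadWriteMany
+     # Keep the volume for administrator action after release
+     persistentVolumeReclaimPolicy: Retain
+     # Protocol-specific connection information
+     nfs:
+       server: 192.0.2.10
+       path: /example_export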
+
+More information about PVs can be found in the `Kubernetes `_ or `OpenShift `_ documentation.
+
+Storage Class
+-------------
+
+Kubernetes uses the storage class object to describe storage with specific characteristics. An administrator may define several storage classes that each define different storage properties. They are used by the :ref:`PVC ` to provision storage. A storage class may have a provisioner associated with it that will trigger Kubernetes to wait for the volume to be provisioned by the specified provider. In the case of NetApp, the provisioner identifier used is ``netapp.io/trident``.
+
+A storage class object in Kubernetes has only two required fields:
+
+* A name
+* The provisioner. A full list of provisioners can be found in `the documentation `_
+
+The provisioner used may require additional attributes, which will be specific to the provisioner used. Additionally, the storage class may have a reclaim policy and mount options specified which will be applied to all volumes created for that storage class.
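+
+A storage class using the Trident provisioner could be sketched as follows. The class name, the ``backendType`` value, and the mount option are assumptions chosen for illustration; the attributes that are valid in practice depend on the provisioner and the backends configured in your environment.
+
+.. code-block:: yaml
+
+   # Hypothetical storage class: name and parameter values are placeholders
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: example-class
+   # Required: the provisioner identifier
+   provisioner: netapp.io/trident
+   # Provisioner-specific attributes (assumed value)
+   parameters:
+     backendType: "ontap-nas"
+   # Optional: applied to all volumes created for this class
+   reclaimPolicy: Delete
+   mountOptions:
+     - nfsvers=4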
+
+More information about storage classes can be found in the `Kubernetes `_ or `OpenShift `_ documentation.
+
+Kubernetes Compute Concepts
+===========================
+
+In addition to basic storage concepts, like how to request and consume storage as described above, it's important to understand the compute concepts involved in the consumption of storage resources. Kubernetes is a container orchestrator, which means that it will dynamically assign containerized workloads to cluster members according to the resource requirements they have expressed (or defaults, if no explicit request is made).
+
+For more information about what containers are and why they are different, see the `Docker documentation `_.
+
+Pods
+----
+
+`A pod `_ represents one or more containers which are related to each other. Containers which are members of the same pod are co-scheduled to the same node in the cluster. They typically share network and storage resources, though not every container in the pod may access the storage or be publicly accessible via the network.
+
+The smallest granularity of management for Kubernetes compute resources is the pod. It is the atomic unit (smallest unit) of scale and is the consumer of other resources, such as storage.
+
+Services
+--------
+
+A Kubernetes `service `_ acts as an internal load balancer for replicated pods. It enables the scaling of pods while maintaining a consistent service IP address. There are several types of services, which may be reachable only within the cluster with a ClusterIP, or may be exposed to the outside world with a NodePort, LoadBalancer, or ExternalName.
+
+
+Deployments
+-----------
+
+A `deployment `_ is one or more pods which are related to each other and often represent a "service" to a larger application being deployed. The application administrator uses deployments to declare the state of their application component and request that Kubernetes ensure that the state is implemented at all times. This can include several options:
+
+* Pods which should be deployed, including versions, storage, network, and other resource requests
+* Number of replicas of each pod instance
+
+The application administrator then uses the deployment as the interface for managing the application. For example, by increasing or decreasing the number of replicas desired the application can be horizontally scaled in or out. Updating the deployment with a new version of the application pod(s) will trigger Kubernetes to remove existing instances one at a time and redeploy using the new version. Conversely, rolling back to a previous version of the deployment will cause Kubernetes to revert the pods to the previously specified version and configuration.
+
+StatefulSets
+------------
+
+Deployments specify how to scale application components, but that scaling is limited to the pods. When a webserver (which is managed as a Kubernetes deployment) is scaled up, Kubernetes will add more instances of that pod to reach the desired count. It is possible to add PVCs to deployments, but then the PVC is shared by all pod replicas. What if each pod needs unique persistent storage?
+
+`StatefulSets `_ are a special type of deployment where persistent storage is requested along with each replica of the pod(s). The StatefulSet definition includes a template PVC, which is used to request additional storage resources as the application is scaled out. In this case, each replica receives its own volume of storage. This is generally used for stateful applications such as databases.
+
+In order to accomplish the above, StatefulSets provide unique pod names and network identifiers that are persistent across pod restarts. They also allow ordered operations, including startup, scale-up, upgrades, and deletion.
+
+As the number of pod replicas increases, so does the number of PVCs. However, scaling down the application will not result in the PVCs being destroyed, as Kubernetes relies on the application administrator to clean up the PVCs in order to prevent inadvertent data loss.
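+
+The template PVC described above can be sketched as follows. This is an illustrative manifest only: the StatefulSet name, the container image, the mount path, and the storage class name are assumptions. With three replicas, Kubernetes would create one PVC per pod from the template, named ``data-example-db-0`` through ``data-example-db-2``.
+
+.. code-block:: yaml
+
+   # Hypothetical StatefulSet: names, image, and sizes are placeholders
+   apiVersion: apps/v1
+   kind: StatefulSet
+   metadata:
+     name: example-db
+   spec:
+     serviceName: example-db
+     replicas: 3
+     selector:
+       matchLabels:
+         app: example-db
+     template:
+       metadata:
+         labels:
+           app: example-db
+       spec:
+         containers:
+           - name: db
+             image: example.invalid/db:1.0
+             volumeMounts:
+               - name: data
+                 mountPath: /var/lib/data
+     # Template PVC: one claim is created for each replica
+     volumeClaimTemplates:
+       - metadata:
+           name: data
+         spec:
+           accessModes:
+             - ReadWriteOnce
+           storageClassName: example-class
+           resources:
+             requests:
+               storage: 5Gi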
+
+Connecting containers to storage
+================================
+
+When the application submits a PVC requesting storage, the Kubernetes engine will assign a PV which matches, or closely matches, the requirement. If no PV exists which can meet the request expressed in the PVC, then it will wait until a PV has been created which matches the request before making the assignment. If no storage class was assigned, then the Kubernetes administrator would be expected to request a storage resource and introduce a PV. However, the provisioner handles that process automatically when using storage classes.
+
+.. _figDynamicStorageProvisioningProcess:
+
+.. figure:: images/DynamicStorageProvisioningProcess.*
+
+ Kubernetes dynamic storage provisioning process
+
+The storage is not connected to a Kubernetes node within a cluster until the pod has been scheduled. At that time, ``kubelet``, the `agent `_ running on each node that is responsible for managing container instances, mounts the storage to the host according to the information in the PV. When the container(s) in the pod are instantiated on the host, ``kubelet`` mounts the storage devices into the container.
+
+Destroying and creating pods
+============================
+
+It's important to understand that Kubernetes destroys and creates pods (workloads); it does not "move" them in the way that hypervisors live-migrate VMs. When Kubernetes scales down or needs to re-deploy a workload on a different host, the pod and the container(s) on the original host are stopped and destroyed, and the resources are unmounted. The standard mount and instantiate process is then followed wherever in the cluster the same workload is re-deployed as a different pod with a different name, IP address, etc.
+When the application being deployed relies on persistent storage, that storage must be accessible from any Kubernetes node deploying the workload within the cluster. Without a shared storage system available for persistence, the data would be abandoned, and usually deleted, on the source system when the workload is re-deployed elsewhere in the cluster.
+
+To maintain a pod whose name, network identity, and storage remain stable when it is re-created, a StatefulSet must be used as described above. Note that a StatefulSet by itself does not guarantee that the pod is always scheduled to the same node.
+
+Container Storage Interface
+===========================
+
+The Cloud Native Computing Foundation (CNCF) is actively working on a standardized Container Storage Interface (CSI). NetApp is active in the CSI Special Interest Group (SIG). The CSI is meant to be a standard mechanism used by various container orchestrators to expose storage systems to containers. Trident v19.01 with CSI is currently in alpha stage and runs with Kubernetes version <= 1.12. However, today CSI is in very early stages and does not provide the features NetApp's interface provides for Kubernetes. Therefore, NetApp recommends deploying Trident without CSI at this time and waiting until CSI is more mature.
+
diff --git a/docs/dag/kubernetes/deploying_trident.rst b/docs/dag/kubernetes/deploying_trident.rst
new file mode 100644
index 000000000..d5446632d
--- /dev/null
+++ b/docs/dag/kubernetes/deploying_trident.rst
@@ -0,0 +1,142 @@
+.. _deploying_trident:
+
+*****************
+Deploying Trident
+*****************
+
+The guidelines in this section provide recommendations for Trident installation with various Kubernetes configurations and considerations. As with all the other recommendations in this guide, each of these suggestions should be carefully considered to determine if it's appropriate and will provide benefit to your deployment.
+
+Supported Kubernetes cluster architectures
+==========================================
+
+Trident is supported with the following Kubernetes architectures. The installation steps are largely the same for each architecture below, except for the one marked with an asterisk.
+
+ +-----------------------------------------------+-----------+---------------------+
+ | Kubernetes Cluster Architectures | Supported | Normal Installation |
+ +===============================================+===========+=====================+
+ | Single master, compute | Yes | Yes |
+ +-----------------------------------------------+-----------+---------------------+
+ | Multiple master, compute | Yes | Yes |
+ +-----------------------------------------------+-----------+---------------------+
+ | Master, etcd, compute | Yes* | No |
+ +-----------------------------------------------+-----------+---------------------+
+ | Master, infrastructure, compute | Yes | Yes |
+ +-----------------------------------------------+-----------+---------------------+
+
+The architecture marked with an asterisk above has an external production etcd cluster and requires different installation steps for deploying Trident. The :ref:`Trident etcd documentation ` discusses in detail how to freshly deploy Trident on an external etcd cluster. It also describes how to migrate an existing Trident deployment to an external etcd cluster.
+
+Trident installation modes
+==========================
+
+Three ways to install Trident are discussed in this chapter.
+
+**Normal install mode**
+
+Normal installation involves running the ``tridentctl install -n trident`` command which deploys the Trident pod on the Kubernetes cluster. Trident installation is quite a straightforward process. For more information on installation and provisioning of volumes, refer to the :ref:`Deploying documentation `.
+
+**Offline install mode**
+
+In many organizations, production and development environments are secured and restricted, and do not have access to public repositories for pulling and posting images. Such environments only allow pulling images from trusted private repositories.
+In such scenarios, make sure that a private registry instance is available. The Trident and etcd images should then be downloaded from a bastion host with internet access and pushed to the private registry. To install Trident in offline mode, issue the ``tridentctl install -n trident`` command with the ``--etcd-image`` and ``--trident-image`` parameters set to the appropriate image names and locations. For more information on how to install Trident in offline mode, see the blog post `Installing Trident for Kubernetes from a Private Registry `_.
+
+
+**Remote install mode**
+
+Trident can be installed on a Kubernetes cluster from a remote machine. To do a remote install, install the appropriate version of ``kubectl`` on the remote machine from which you will run the ``tridentctl install`` command. Copy the configuration files from the Kubernetes cluster and set the ``KUBECONFIG`` environment variable on the remote machine. Run ``kubectl get nodes`` to verify that you can connect to the required Kubernetes cluster. Complete the Trident deployment from the remote machine using the normal installation steps.
+
+Trident CSI installation
+========================
+The Container Storage Interface (CSI) is a standardized API for container orchestrators to manage storage plugins. CSI is still in its early stages and has not yet integrated much functionality. However, using the early specs, NetApp developed an alpha Trident CSI driver for the Kubernetes CSI beta integration, for testing purposes only. As of Trident 19.01, NetApp recommends not deploying CSI in production environments until CSI becomes more mature. More information regarding CSI can be found in the :ref:`Trident documentation `.
+
+
+Recommendations for all deployments
+===================================
+
+Deploy Trident to a dedicated namespace
+---------------------------------------
+
+`Namespaces `_ provide administrative separation between different applications and are a barrier for resource sharing; for example, a PVC from one namespace cannot be consumed from another. Trident provides PV resources to all namespaces in the Kubernetes cluster and consequently leverages a service account which has elevated privileges.
+
+Additionally, access to the Trident pod may enable a user to access storage system credentials and other sensitive information. It is important to ensure that application users and management applications do not have the ability to access the Trident object definitions or the pods themselves.
+
+Use quotas and range limits to control storage consumption
+----------------------------------------------------------
+
+Kubernetes has two features which, when combined, provide a powerful mechanism for limiting the resource consumption by applications. The `storage quota mechanism `_ allows the administrator to implement global, and storage class specific, capacity and object count consumption limits on a per-namespace basis. Further, using a `range limit `_ will ensure that the PVC requests must be within both a minimum and maximum value before the request is forwarded to the provisioner.
+
+These values are defined on a per-namespace basis, which means that each namespace will need to have values defined which fall in line with their resource requirements. An example of `how to leverage quotas `_ can be found on `netapp.io `_.
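+
+As a hedged sketch of how the two mechanisms combine, the following manifests define a storage quota and a PVC range limit for a single namespace. The namespace, the object names, the storage class name ``example-class``, and all capacities are assumptions for illustration.
+
+.. code-block:: yaml
+
+   # Hypothetical quota: caps total capacity and PVC count in the namespace
+   apiVersion: v1
+   kind: ResourceQuota
+   metadata:
+     name: example-storage-quota
+     namespace: example-namespace
+   spec:
+     hard:
+       # Global limits across all storage classes
+       requests.storage: 100Gi
+       persistentvolumeclaims: "10"
+       # Limit specific to one storage class
+       example-class.storageclass.storage.k8s.io/requests.storage: 50Gi
+   ---
+   # Hypothetical range limit: each PVC request must fall between min and max
+   apiVersion: v1
+   kind: LimitRange
+   metadata:
+     name: example-pvc-limits
+     namespace: example-namespace
+   spec:
+     limits:
+       - type: PersistentVolumeClaim
+         min:
+           storage: 1Gi
+         max:
+           storage: 20Gi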
+
+Use PVC protection to protect in-use resources
+----------------------------------------------
+
+This feature is on by default if running Kubernetes 1.10 or greater. Follow this process only if running Kubernetes 1.9 or earlier.
+
+`Storage object in use protection `_, or simply PVC protection, is a Kubernetes feature which prevents the deletion of PVCs which are in use by a pod and PVs which are bound to a PVC. This is important as it prevents a volume from being destroyed while it's actively being used by an application, which can result in data loss.
+
+Implementing PVC protection is slightly different depending on your host operating system and Kubernetes distribution. In general, the following process can be followed when using CentOS with "vanilla" Kubernetes. However, be sure to follow the documentation for your particular operating system and Kubernetes version.
+
+.. code-block:: console
+
+ # from each master server in the cluster, edit the api server config
+ vi /etc/kubernetes/manifests/kube-apiserver.yaml
+
+ # search for the line under the "kube-apiserver" command stanza which
+ # starts with "- --admission-control" and append the value
+ # "StorageObjectInUseProtection"
+
+ # alternatively, backup the file and use the following sed command
+ cp /etc/kubernetes/manifests/kube-apiserver.yaml \
+ /etc/kubernetes/manifests/kube-apiserver.yaml.orig
+
+ sed -i '/admission-control/ s/$/,StorageObjectInUseProtection/' \
+ /etc/kubernetes/manifests/kube-apiserver.yaml
+
+ # after editing, restart the kube-apiserver service
+ systemctl restart kube-apiserver.service
+
+After adding the admission controller for storage object protection you can verify it's functioning by viewing the details of a PVC and verifying the presence of the finalizer.
+
+.. code-block:: console
+ :emphasize-lines: 11
+
+ [root@kubemaster ~]# kubectl describe pvc pvc-protection
+ Name: pvc-protection
+ Namespace: default
+ StorageClass: performance
+ Status: Bound
+ Volume: default-pvc-protection-3bcc8
+ Labels:
+ Annotations: pv.kubernetes.io/bind-completed=yes
+ pv.kubernetes.io/bound-by-controller=yes
+ volume.beta.kubernetes.io/storage-provisioner=netapp.io/trident
+ Finalizers: [kubernetes.io/pvc-protection]
+ Capacity: 3972844748800m
+ Access Modes: RWO
+ Events:
+ Type Reason Age From Message
+ ---- ------ ---- ---- -------
+ Normal ExternalProvisioning 23s (x2 over 23s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "netapp.io/trident" or manually created by system administrator
+ Normal ProvisioningSuccess 21s netapp.io/trident Kubernetes frontend provisioned a volume and a PV for the PVC
+
+Deploying Trident to OpenShift
+==============================
+
+OpenShift uses Kubernetes for the underlying container orchestrator. Consequently, the same recommendations will apply when using Trident with Kubernetes or OpenShift. However, there are some minor additions when using OpenShift which should be taken into consideration.
+
+Deploy Trident to infrastructure nodes
+--------------------------------------
+
+Trident is a core service to the OpenShift cluster, provisioning and managing the volumes used across all projects. Consideration should be given to deploying Trident to the infrastructure nodes so that it receives the same level of care as other infrastructure services.
+
+To deploy Trident to the infrastructure nodes, the project for Trident must be created by an administrator using the ``oc adm`` command. This prevents the project from inheriting the default node selector, which forces the pod to execute on compute nodes.
+
+.. code-block:: console
+
+ # create the project which Trident will be deployed to using
+ # the non-default node selector
+ oc adm new-project --node-selector="region=infra"
+
+ # deploy Trident using the project name
+ tridentctl install -n
+
+The result of the above command is that any pod deployed to the project will be scheduled to nodes which have the label "``region=infra``". This also removes the default node selector used by other projects, which schedules pods to nodes which have the label "``node-role.kubernetes.io/compute=true``".
diff --git a/docs/dag/kubernetes/images/DynamicStorageProvisioningProcess.png b/docs/dag/kubernetes/images/DynamicStorageProvisioningProcess.png
new file mode 100644
index 000000000..8d9936bcf
Binary files /dev/null and b/docs/dag/kubernetes/images/DynamicStorageProvisioningProcess.png differ
diff --git a/docs/dag/kubernetes/images/MultiInfraCluster.png b/docs/dag/kubernetes/images/MultiInfraCluster.png
new file mode 100644
index 000000000..fba611e25
Binary files /dev/null and b/docs/dag/kubernetes/images/MultiInfraCluster.png differ
diff --git a/docs/dag/kubernetes/images/MultiMasterCluster.png b/docs/dag/kubernetes/images/MultiMasterCluster.png
new file mode 100644
index 000000000..ec8bf23cb
Binary files /dev/null and b/docs/dag/kubernetes/images/MultiMasterCluster.png differ
diff --git a/docs/dag/kubernetes/images/MultiMasterCluster1.png b/docs/dag/kubernetes/images/MultiMasterCluster1.png
new file mode 100644
index 000000000..6e9eae887
Binary files /dev/null and b/docs/dag/kubernetes/images/MultiMasterCluster1.png differ
diff --git a/docs/dag/kubernetes/images/MultietcdCluster.png b/docs/dag/kubernetes/images/MultietcdCluster.png
new file mode 100644
index 000000000..f58ba0ddc
Binary files /dev/null and b/docs/dag/kubernetes/images/MultietcdCluster.png differ
diff --git a/docs/dag/kubernetes/index.rst b/docs/dag/kubernetes/index.rst
new file mode 100644
index 000000000..8b7feafdb
--- /dev/null
+++ b/docs/dag/kubernetes/index.rst
@@ -0,0 +1,19 @@
+#############################
+Design and Architecture Guide
+#############################
+
+.. toctree::
+ :numbered:
+ :maxdepth: 2
+ :caption: Table of Contents:
+
+ introduction
+ concepts_and_definitions
+ netapp_products_integrations
+ kubernetes_cluster_architecture_considerations
+ storage_kubernetes_infrastructure_services
+ storage_configuration_trident
+ deploying_trident
+ integrating_trident
+ backup_disaster_recovery
+ security_recommendations
diff --git a/docs/dag/kubernetes/integrating_trident.rst b/docs/dag/kubernetes/integrating_trident.rst
new file mode 100644
index 000000000..8eaf27b7a
--- /dev/null
+++ b/docs/dag/kubernetes/integrating_trident.rst
@@ -0,0 +1,324 @@
+.. _integrating_trident:
+
+*******************
+Integrating Trident
+*******************
+
+Trident backend design
+======================
+
+ONTAP
+-----
+
+**Choosing a backend driver for ONTAP**
+
+Four different backend drivers are available for ONTAP systems. These drivers are differentiated by the protocol being used and how the volumes are provisioned on the storage system. Therefore, give careful consideration regarding which driver to deploy.
+
+At a high level, if your application has components which need shared storage (multiple pods accessing the same PVC), the NAS-based drivers would be the default choice, while the block-based iSCSI driver meets the needs of non-shared storage. Choose the protocol based on the requirements of the application and the comfort level of the storage and infrastructure teams. Generally speaking, there is little difference between them for most applications, so the decision often comes down to whether or not shared storage (where more than one pod will need simultaneous access) is needed.
+
+The four :ref:`drivers ` for ONTAP backends are listed below:
+
+* ``ontap-nas`` – each PV provisioned is a full ONTAP FlexVolume
+* ``ontap-nas-economy`` – each PV provisioned is a qtree, with up to 200 qtrees per FlexVolume
+* ``ontap-nas-flexgroup`` – each PV provisioned is a full ONTAP FlexGroup, and all aggregates assigned to an SVM are used
+* ``ontap-san`` – each PV provisioned is a LUN within its own FlexVolume
+
+Choosing between the three NFS drivers has ramifications for the features which are made available to the application.
+
+Note that, in the tables below, not all of the capabilities are exposed through Trident. Some must be applied by the storage administrator after provisioning if that functionality is desired. The superscript footnotes distinguish the functionality per feature and driver.
+
+.. table:: ONTAP NAS driver capabilities
+   :align: left
+
+   +-------------------------+---------------+--------+--------------+----------------+--------+---------------+
+   | ONTAP NFS Drivers       | Snapshots     | Clones | Multi-attach | QoS            | Resize | Replication   |
+   +=========================+===============+========+==============+================+========+===============+
+   | ``ontap-nas``           | Yes           | Yes    | Yes          | Yes\ :sup:`2`  | Yes    | Yes\ :sup:`2` |
+   +-------------------------+---------------+--------+--------------+----------------+--------+---------------+
+   | ``ontap-nas-economy``   | Yes\ :sup:`1` | No     | Yes          | Yes\ :sup:`12` | Yes    | Yes\ :sup:`2` |
+   +-------------------------+---------------+--------+--------------+----------------+--------+---------------+
+   | ``ontap-nas-flexgroup`` | Yes           | No     | Yes          | Yes\ :sup:`2`  | Yes    | Yes\ :sup:`2` |
+   +-------------------------+---------------+--------+--------------+----------------+--------+---------------+
+
+
+The SAN driver capabilities are shown below.
+
+.. table:: ONTAP SAN driver capabilities
+   :align: left
+
+   +------------------+-----------+--------+--------------+---------------+---------------+---------------+
+   | ONTAP SAN Driver | Snapshots | Clones | Multi-attach | QoS           | Resize        | Replication   |
+   +==================+===========+========+==============+===============+===============+===============+
+   | ``ontap-san``    | Yes       | Yes    | No           | Yes\ :sup:`2` | Yes\ :sup:`2` | Yes\ :sup:`2` |
+   +------------------+-----------+--------+--------------+---------------+---------------+---------------+
+
+| Footnote for above tables:
+| Yes\ :sup:`1`: Trident managed, but not PV granular
+| Yes\ :sup:`2`: Not Trident managed
+| Yes\ :sup:`12`: Not Trident managed and not PV granular
+
+
+Features that are not PV granular are applied to the entire FlexVolume, so all of the PVs (i.e. qtrees) in that FlexVolume share a common schedule.
+
+As the tables above show, much of the functionality of the ``ontap-nas`` and ``ontap-nas-economy`` drivers is the same. However, since the ``ontap-nas-economy`` driver limits the ability to control the schedule at per-PV granularity, this may affect your disaster recovery and backup planning in particular. For development teams which want to leverage PVC clone functionality on ONTAP storage, this is only possible when using the ``ontap-nas`` or ``ontap-san`` drivers (note that the ``solidfire-san`` driver is also capable of cloning PVCs).
+
+SolidFire Backend Driver
+------------------------
+The ``solidfire-san`` driver, used with the SolidFire platform, helps the admin configure a SolidFire backend for Trident on the basis of QoS limits. If you would like to design your backend to set specific QoS limits on the volumes provisioned by Trident, use the `Type` parameter in the backend file. The admin can also restrict the size of the volumes that can be created on the storage using the `limitVolumeSize` parameter. Currently, SolidFire storage features like volume resize and volume replication are not supported through the ``solidfire-san`` driver. These operations should be done manually through the Element OS Web UI.
+
+.. table:: SolidFire SAN driver capabilities
+   :align: left
+
+   +-------------------+-----------+--------+--------------+------+---------------+---------------+
+   | SolidFire Driver  | Snapshots | Clones | Multi-attach | QoS  | Resize        | Replication   |
+   +===================+===========+========+==============+======+===============+===============+
+   | ``solidfire-san`` | Yes       | Yes    | No           | Yes  | Yes\ :sup:`1` | Yes\ :sup:`1` |
+   +-------------------+-----------+--------+--------------+------+---------------+---------------+
+
+
+| Footnote:
+| Yes\ :sup:`1`: Not Trident managed
+
+Storage Class design
+====================
+
+Individual Storage Classes need to be configured and applied to create a Kubernetes storage class object. This section discusses how to design a storage class for your application.
+
+Storage Class design for specific backend utilization
+-----------------------------------------------------
+
+Filtering can be used within a specific storage class object to determine which storage pool or set of pools are to be used with that specific storage class. Three sets of filters can be set in the Storage Class: `storagePools`, `additionalStoragePools`, and/or `excludeStoragePools`.
+
+The `storagePools` parameter helps restrict storage to the set of pools that match any specified attributes. The `additionalStoragePools` parameter is used to extend the set of pools that Trident will use for provisioning along with the set of pools selected by the attributes and `storagePools` parameters. You can use either parameter alone or both together to make sure that the appropriate set of storage pools is selected.
+
+The `excludeStoragePools` parameter is used to specifically exclude the listed set of pools that match the attributes.
+
+Please refer to :ref:`Trident StorageClass Objects ` on how these parameters are used.
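+
+As a minimal sketch of the pool filters described above, a storage class restricted to particular pools might look like the following. The class name, the backend name (``ontapnas_backend``), and the aggregate names are hypothetical. The ``netapp.io/trident`` provisioner string and the ``backend:pool1,pool2`` value format are assumptions that should be verified against the Trident StorageClass reference for your release.
+
+.. code-block:: yaml
+
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: ontap-selected-pools        # hypothetical class name
+   provisioner: netapp.io/trident      # assumed provisioner string
+   parameters:
+     # Only consider these (hypothetical) aggregates on one backend.
+     storagePools: "ontapnas_backend:aggr1,aggr2"
+     # Never place volumes on this (hypothetical) aggregate.
+     excludeStoragePools: "ontapnas_backend:aggr_archive"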
+
+Storage Class design to emulate QoS policies
+-----------------------------------------------
+
+If you would like to design Storage Classes to emulate Quality of Service policies, create a Storage Class with the `media` attribute set to `hdd` or `ssd`. Based on the `media` attribute in the storage class, Trident will select the appropriate backend that serves `hdd` or `ssd` aggregates and then direct the provisioning of the volumes onto the specific aggregate. For example, we can create a storage class PREMIUM with the `media` attribute set to `ssd`, which could be classified as the PREMIUM QoS policy. We can create another storage class STANDARD with the `media` attribute set to `hdd`, which could be classified as the STANDARD QoS policy. We could also use the `IOPS` attribute in the storage class to redirect provisioning to a SolidFire appliance, which can be defined as a QoS policy.
+
+
+Please refer to :ref:`Trident StorageClass Objects ` on how these parameters can be used.
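+
+The PREMIUM and STANDARD classes described above could be sketched as follows. This is an illustration only: the lowercase class names are chosen because Kubernetes object names must be lowercase, and the ``netapp.io/trident`` provisioner string is an assumption that depends on your Trident release.
+
+.. code-block:: yaml
+
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: premium                     # emulates the PREMIUM QoS policy
+   provisioner: netapp.io/trident      # assumed provisioner string
+   parameters:
+     media: "ssd"                      # select pools backed by SSD aggregates
+   ---
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: standard                    # emulates the STANDARD QoS policy
+   provisioner: netapp.io/trident      # assumed provisioner string
+   parameters:
+     media: "hdd"                      # select pools backed by HDD aggregates
+
+A PVC then picks a tier simply by naming one of these classes in ``storageClassName``.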
+
+Storage Class design to utilize backend based on specific features
+---------------------------------------------------------------------
+
+Storage Classes can be designed to direct volume provisioning on a specific backend where features such as thin and thick provisioning, snapshots, clones and encryption are enabled. To specify which storage to use, create Storage Classes that specify the appropriate backend with the required feature enabled.
+
+Please refer to :ref:`Trident StorageClass Objects ` on how these parameters can be used.
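+
+A feature-based class might be sketched as shown below. The class name is hypothetical, and the attribute names (``provisioningType``, ``snapshots``, ``clones``, ``encryption``) and the provisioner string are assumptions to confirm against the Trident StorageClass reference. Trident will only provision from a pool that offers every requested feature.
+
+.. code-block:: yaml
+
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: thin-encrypted              # hypothetical class name
+   provisioner: netapp.io/trident      # assumed provisioner string
+   parameters:
+     provisioningType: "thin"          # thin rather than thick provisioning
+     snapshots: "true"                 # pool must support snapshots
+     clones: "true"                    # pool must support clones
+     encryption: "true"                # pool must support encryption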
+
+
+PVC characteristics which affect storage provisioning
+=====================================================
+
+Some parameters beyond the requested storage class may affect Trident's provisioning decision process when creating a PVC.
+
+Access mode
+-----------
+
+When requesting storage via a PVC, one of the mandatory fields is the access mode. The mode desired may affect the backend selected to host the storage request.
+
+Trident will attempt to match the storage protocol used with the access mode specified according to the following matrix. This is independent of the underlying storage platform.
+
+.. table:: Protocols used by access modes
+   :align: left
+
+   +-------+---------------+--------------+---------------+
+   |       | ReadWriteOnce | ReadOnlyMany | ReadWriteMany |
+   +=======+===============+==============+===============+
+   | iSCSI | Yes           | Yes          | No            |
+   +-------+---------------+--------------+---------------+
+   | NFS   | Yes           | Yes          | Yes           |
+   +-------+---------------+--------------+---------------+
+
+A request for a ReadWriteMany PVC submitted to a Trident deployment without an NFS backend configured will result in no volume being provisioned. For this reason, the requestor should use the access mode which is appropriate for their application.
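+
+For illustration, a claim that requires shared access could look like the sketch below. The claim name, size, and storage class name are hypothetical. Because the access mode is ``ReadWriteMany``, the matrix above means this claim can only be satisfied by an NFS backend.
+
+.. code-block:: yaml
+
+   apiVersion: v1
+   kind: PersistentVolumeClaim
+   metadata:
+     name: shared-data                 # hypothetical claim name
+   spec:
+     accessModes:
+       - ReadWriteMany                 # more than one pod mounts the volume read-write
+     resources:
+       requests:
+         storage: 10Gi
+     storageClassName: standard        # hypothetical class served by an NFS backend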
+
+Modifying persistent volumes
+============================
+
+Persistent volumes are, with two exceptions, immutable objects in Kubernetes. Once created, only the reclaim policy and the size can be modified. However, this doesn't prevent some aspects of the volume from being modified outside of Kubernetes. This may be desirable in order to customize the volume for specific applications, to ensure that capacity is not accidentally consumed, or simply to move the volume to a different storage controller for any reason.
+
+.. note::
+ Kubernetes in-tree provisioners do not support volume resize operations for NFS or iSCSI PVs at this time. Trident supports expanding NFS volumes. For a list of PV types which support volume resizing refer to the `Kubernetes documentation `_.
+
+The connection details of the PV cannot be modified after creation.
+
+Volume move operations
+----------------------
+
+Storage administrators have the ability to move volumes between aggregates and controllers in the ONTAP cluster non-disruptively to the storage consumer. This operation does not affect Trident or the Kubernetes cluster, as long as the destination aggregate is one to which the SVM used by Trident has access. Importantly, if the aggregate has been newly added to the SVM, the backend will need to be "refreshed" by re-adding it to Trident. This will trigger Trident to reinventory the SVM so that the new aggregate is recognized.
+
+However, moving volumes across backends is not supported. This includes between SVMs in the same cluster, between clusters, or onto a different storage platform (even if that storage system is one which is connected to Trident).
+
+Resizing volumes
+----------------
+To make sure that the Persistent Volumes provisioned by Trident can be resized later, create them from a Persistent Volume Claim that uses a Storage Class which allows volume expansion, i.e. one with the ``allowVolumeExpansion`` attribute set to ``true``. Whenever the Persistent Volume needs to be resized, edit the ``spec.resources.requests.storage`` field in the Persistent Volume Claim to the required volume size and Trident will automatically take care of resizing the volume on ONTAP.
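+
+A minimal sketch of a resizable setup follows. The object names and sizes are hypothetical, and the provisioner string and ``backendType`` parameter are assumptions to check against your Trident release. ``allowVolumeExpansion`` is a standard top-level StorageClass field.
+
+.. code-block:: yaml
+
+   apiVersion: storage.k8s.io/v1
+   kind: StorageClass
+   metadata:
+     name: resizable-nas               # hypothetical class name
+   provisioner: netapp.io/trident      # assumed provisioner string
+   allowVolumeExpansion: true          # permits later PVC size increases
+   parameters:
+     backendType: "ontap-nas"          # assumed parameter; selects an NFS driver
+   ---
+   apiVersion: v1
+   kind: PersistentVolumeClaim
+   metadata:
+     name: app-data                    # hypothetical claim name
+   spec:
+     accessModes:
+       - ReadWriteOnce
+     resources:
+       requests:
+         storage: 20Gi                 # raise this value later to grow the volume
+     storageClassName: resizable-nas
+
+To grow the volume later, only the ``storage`` request in the claim is edited; the storage class itself is left unchanged.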
+
+.. note::
+   1. Trident currently supports resizing NFS PVs only; iSCSI PV resize is not supported.
+   2. Kubernetes, prior to version 1.12, does not support NFS PV resize, as the admission controller may reject PVC size updates. The Trident team has changed Kubernetes to allow such changes starting with Kubernetes 1.12. While we recommend using Kubernetes 1.12, it is still possible to resize NFS PVs for earlier versions of Kubernetes that support resize. This is done by disabling the ``PersistentVolumeClaimResize`` admission plugin when the Kubernetes API server is started.
+
+When to manually provision a volume instead of using Trident
+============================================================
+
+Trident's goal is to be the provisioning engine for all storage consumed by containers. However, we understand that there are scenarios which may still need a manually provisioned PV. Generally speaking, these situations are limited to needing to customize the properties of the underlying storage device in ways which Trident does not support.
+
+There are two ways which the desired settings can be applied:
+
+#. Use the backend configuration, or PVC attributes, to customize the volume properties at provisioning time
+#. After the volume is provisioned, the storage administrator applies configuration to the volume which is bound to the PVC
+
+Option 1 is limited by the volume options which Trident supports, which do not encompass all of the options available. Option 2 may be the only viable solution for fully customizing storage for a particular application. Finally, you can always provision a volume manually and introduce a matching PV outside of Trident if you do not want Trident to manage it for some reason.
+
+If you have requirements to customize volumes in ways which Trident does not support, please let us know using resources on the :ref:`contact_us` page.
+
+Deploying OpenShift services using Trident
+==========================================
+
+The OpenShift value-add cluster services provide important functionality to cluster administrators and the applications being hosted. The storage which these services use can be provisioned using node-local resources; however, this often limits the capacity, performance, recoverability, and sustainability of the service. Leveraging an enterprise storage array to provide capacity to these services can dramatically improve them. As with all applications, the OpenShift and storage administrators should work closely together to determine the best options for each. The Red Hat documentation should be leveraged heavily to determine the requirements and ensure that sizing and performance needs are met.
+
+Registry service
+----------------
+
+Deploying and managing storage for the registry has been documented on `netapp.io `_ in `this blog post `_.
+
+Logging service
+---------------
+
+Like other OpenShift services, the logging service is deployed using Ansible with configuration parameters supplied by the inventory, a.k.a. hosts, file provided to the playbook. There are two installation methods which will be covered: deploying logging during initial OpenShift install and deploying logging after OpenShift has been installed.
+
+.. warning::
+ As of Red Hat OpenShift version 3.9, the official documentation recommends against NFS for the logging service due to concerns around data corruption. This is based on Red Hat testing of their products. ONTAP's NFS server does not have these issues, and can easily back a logging deployment. Ultimately, the choice of protocol for the logging service is up to you, just know that both will work great when using NetApp platforms and there is no reason to avoid NFS if that is your preference.
+
+ If you choose to use NFS with the logging service, you will need to set the Ansible variable ``openshift_enable_unsupported_configurations`` to ``true`` to prevent the installer from failing.
+
+**Getting started**
+
+The logging service can, optionally, be deployed for applications as well as for the core operations of the OpenShift cluster itself. If you choose to deploy operations logging, by specifying the variable ``openshift_logging_use_ops`` as ``true``, two instances of the service will be created. The variables which control the logging instance for operations contain "ops" in them, whereas those for the applications instance do not.
+
+Configuring the Ansible variables according to the deployment method is important in order to ensure that the correct storage is utilized by the underlying services. Let's look at the options for each of the deployment methods.
+
+.. note::
+ The tables below only contain the variables which are relevant for storage configuration as it relates to the logging service. There are many other options found in `the documentation `_ which should be reviewed, configured, and used according to your deployment.
+
+The variables in the below table will result in the Ansible playbook creating a PV and PVC for the logging service using the details provided. This method is significantly less flexible than using the component installation playbook after OpenShift installation, however if you have existing volumes available, it is an option.
+
+.. table:: Logging variables when deploying at OpenShift install time
+   :align: left
+
+   +---------------------------------------------+------------------------------------------------+
+   | Variable                                    | Details                                        |
+   +=============================================+================================================+
+   | ``openshift_logging_storage_kind``          | Set to ``nfs`` to have the installer create an |
+   |                                             | NFS PV for the logging service.                |
+   +---------------------------------------------+------------------------------------------------+
+   | ``openshift_logging_storage_host``          | The hostname or IP address of the NFS host.    |
+   |                                             | This should be set to the data LIF for your    |
+   |                                             | storage virtual machine (SVM).                 |
+   +---------------------------------------------+------------------------------------------------+
+   | ``openshift_logging_storage_nfs_directory`` | The mount path for the NFS export. For         |
+   |                                             | example, if the volume is junctioned as        |
+   |                                             | ``/openshift_logging``, you would use that     |
+   |                                             | path for this variable.                        |
+   +---------------------------------------------+------------------------------------------------+
+   | ``openshift_logging_storage_volume_name``   | The name, e.g. ``pv_ose_logs``, of the PV to   |
+   |                                             | create.                                        |
+   +---------------------------------------------+------------------------------------------------+
+   | ``openshift_logging_storage_volume_size``   | The size of the NFS export, for example        |
+   |                                             | ``100Gi``.                                     |
+   +---------------------------------------------+------------------------------------------------+
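+
+As a sketch, the install-time variables above could be collected in a YAML variables file such as ``group_vars/OSEv3.yml``. The file name and the host address are assumptions for illustration; the same keys can instead be set in the ``[OSEv3:vars]`` section of an INI inventory.
+
+.. code-block:: yaml
+
+   # Hypothetical group_vars/OSEv3.yml fragment; all values are illustrative.
+   openshift_logging_storage_kind: nfs
+   openshift_logging_storage_host: 192.0.2.10            # data LIF of the SVM (example address)
+   openshift_logging_storage_nfs_directory: /openshift_logging
+   openshift_logging_storage_volume_name: pv_ose_logs
+   openshift_logging_storage_volume_size: 100Gi
+   # Needed when NFS backs the logging service (see the warning above).
+   openshift_enable_unsupported_configurations: true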
+
+If your OpenShift cluster is already running, and therefore Trident has been deployed and configured, the installer can use dynamic provisioning to create the volumes. The following variables will need to be configured.
+
+.. table:: Logging variables when deploying after OpenShift install
+   :align: left
+
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | Variable                                            | Details                                                        |
+   +=====================================================+================================================================+
+   | ``openshift_logging_es_pvc_dynamic``                | Set to ``true`` to use dynamically provisioned volumes.        |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_pvc_storage_class_name``     | The name of the storage class which will be used in the PVC.   |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_pvc_size``                   | The size of the volume requested in the PVC.                   |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_pvc_prefix``                 | A prefix for the PVCs used by the logging service.             |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_ops_pvc_dynamic``            | Set to ``true`` to use dynamically provisioned volumes for the |
+   |                                                     | ops logging instance.                                          |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_ops_pvc_storage_class_name`` | The name of the storage class for the ops logging instance.    |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_ops_pvc_size``               | The size of the volume request for the ops instance.           |
+   +-----------------------------------------------------+----------------------------------------------------------------+
+   | ``openshift_logging_es_ops_pvc_prefix``             | A prefix for the ops instance PVCs.                            |
+   +-----------------------------------------------------+----------------------------------------------------------------+
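+
+The dynamic provisioning variables above might be set as in the following sketch, again written as a YAML variables fragment. The storage class name, sizes, and prefixes are hypothetical values. On OpenShift 3.9, take the note below into account before choosing the ``ops`` dynamic setting.
+
+.. code-block:: yaml
+
+   # Hypothetical variables fragment; all values are illustrative.
+   openshift_logging_es_pvc_dynamic: true
+   openshift_logging_es_pvc_storage_class_name: standard   # a Trident-backed class
+   openshift_logging_es_pvc_size: 100Gi
+   openshift_logging_es_pvc_prefix: logging-es
+   # Only needed when operations logging is enabled.
+   openshift_logging_use_ops: true
+   openshift_logging_es_ops_pvc_dynamic: true
+   openshift_logging_es_ops_pvc_storage_class_name: standard
+   openshift_logging_es_ops_pvc_size: 50Gi
+   openshift_logging_es_ops_pvc_prefix: logging-es-ops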
+
+.. note::
+ A bug exists in OpenShift 3.9 which prevents a storage class from being used when the value for ``openshift_logging_es_ops_pvc_dynamic`` is set to ``true``. However, this can be worked around by, counterintuitively, setting the variable to ``false``, which will include the storage class in the PVC.
+
+**Deploy the logging stack**
+
+If you are deploying logging as a part of the initial OpenShift install process, then you only need to follow the standard deployment process. Ansible will configure and deploy the needed services and OpenShift objects so that the service is available as soon as Ansible completes.
+
+However, if you are deploying after the initial installation, the component playbook will need to be used by Ansible. This process may change slightly with different versions of OpenShift, so be sure to read and follow `the documentation `_ for your version.
+
+Metrics service
+---------------
+
+The metrics service provides valuable information to the administrator regarding the status, resource utilization, and availability of the OpenShift cluster. It is also necessary for pod autoscale functionality, and many organizations use data from the metrics service for their chargeback and/or showback applications.
+
+As with the logging service, and OpenShift as a whole, Ansible is used to deploy the metrics service. Also, like the logging service, the metrics service can be deployed during initial setup of the cluster or after it is operational using the component installation method. The following tables contain the variables which are important when configuring persistent storage for the metrics service.
+
+.. note::
+ The tables below only contain the variables which are relevant for storage configuration as it relates to the metrics service. There are many other options found in the documentation which should be reviewed, configured, and used according to your deployment.
+
+.. table:: Metrics variables when deploying at OpenShift install time
+   :align: left
+
+   +---------------------------------------------+-----------------------------------------------------+
+   | Variable                                    | Details                                             |
+   +=============================================+=====================================================+
+   | ``openshift_metrics_storage_kind``          | Set to ``nfs`` to have the installer create an NFS  |
+   |                                             | PV for the metrics service.                         |
+   +---------------------------------------------+-----------------------------------------------------+
+   | ``openshift_metrics_storage_host``          | The hostname or IP address of the NFS host. This    |
+   |                                             | should be set to the data LIF for your SVM.         |
+   +---------------------------------------------+-----------------------------------------------------+
+   | ``openshift_metrics_storage_nfs_directory`` | The mount path for the NFS export. For example, if  |
+   |                                             | the volume is junctioned as ``/openshift_metrics``, |
+   |                                             | you would use that path for this variable.          |
+   +---------------------------------------------+-----------------------------------------------------+
+   | ``openshift_metrics_storage_volume_name``   | The name, e.g. ``pv_ose_metrics``, of the PV to     |
+   |                                             | create.                                             |
+   +---------------------------------------------+-----------------------------------------------------+
+   | ``openshift_metrics_storage_volume_size``   | The size of the NFS export, for example ``100Gi``.  |
+   +---------------------------------------------+-----------------------------------------------------+
+
+If your OpenShift cluster is already running, and therefore Trident has been deployed and configured, the installer can use dynamic provisioning to create the volumes. The following variables will need to be configured.
+
+.. table:: Metrics variables when deploying after OpenShift install
+   :align: left
+
+   +--------------------------------------------------------+-------------------------------------------------------------+
+   | Variable                                               | Details                                                     |
+   +========================================================+=============================================================+
+   | ``openshift_metrics_cassandra_pvc_prefix``             | A prefix to use for the metrics PVCs.                       |
+   +--------------------------------------------------------+-------------------------------------------------------------+
+   | ``openshift_metrics_cassandra_pvc_size``               | The size of the volumes to request.                         |
+   +--------------------------------------------------------+-------------------------------------------------------------+
+   | ``openshift_metrics_cassandra_storage_type``           | The type of storage to use for metrics. This must be set to |
+   |                                                        | ``dynamic`` for Ansible to create PVCs with the appropriate |
+   |                                                        | storage class.                                              |
+   +--------------------------------------------------------+-------------------------------------------------------------+
+   | ``openshift_metrics_cassandra_pvc_storage_class_name`` | The name of the storage class to use.                       |
+   +--------------------------------------------------------+-------------------------------------------------------------+
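+
+A sketch of these settings as a YAML variables fragment is shown below. The storage class name, size, and prefix are hypothetical, and the storage class variable is written with the ``cassandra`` spelling used by the other metrics variables.
+
+.. code-block:: yaml
+
+   # Hypothetical variables fragment; all values are illustrative.
+   openshift_metrics_cassandra_storage_type: dynamic        # required for PVC creation
+   openshift_metrics_cassandra_pvc_storage_class_name: standard
+   openshift_metrics_cassandra_pvc_size: 50Gi
+   openshift_metrics_cassandra_pvc_prefix: metrics-cassandra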
+
+**Deploying the metrics service**
+
+With the appropriate Ansible variables defined in your hosts/inventory file, deploy the service using Ansible. If you are deploying at OpenShift install time, then the PV will be created and used automatically. If you're deploying using the component playbooks, after OpenShift install, then Ansible will create any PVCs which are needed and, after Trident has provisioned storage for them, deploy the service.
+
+The variables above, and the process for deploying, may change with each version of OpenShift. Ensure you review and follow `the deployment guide `_ for your version so that it is configured for your environment.
diff --git a/docs/dag/kubernetes/introduction.rst b/docs/dag/kubernetes/introduction.rst
new file mode 100644
index 000000000..e79cc3eed
--- /dev/null
+++ b/docs/dag/kubernetes/introduction.rst
@@ -0,0 +1,28 @@
+.. _introduction:
+
+************
+Introduction
+************
+
+Containers have quickly become one of the most popular methods of packaging and deploying applications. The ecosystem surrounding the creation, deployment, and management of containerized applications has exploded, resulting in myriad solutions available to customers who simply want to deploy their applications with as little friction as possible.
+
+Application teams love containers due to their ability to decouple the application from the underlying operating system. The ability to create a container on their own laptop, then deploy to a teammate's laptop, their on-premises data center, hyperscalers, and anywhere else means that they can focus their efforts on the application and its code, not on how the underlying operating system and infrastructure are configured.
+
+At the same time, operations teams are only just now seeing the dramatic rise in popularity of containers. Containers are often approached first by developers for personal productivity purposes, which means the infrastructure teams are insulated from or unaware of their use. However, this is changing. Operations teams are now expected to deploy, maintain, and support infrastructures which host containerized applications. In addition, the rise of DevOps is pushing operations teams to understand not just the application, but the deployment method and platform at a much greater depth than ever before.
+
+Fortunately there are robust platforms for hosting containerized applications. Arguably the most popular of those platforms is `Kubernetes `_, an open source `Cloud Native Computing Foundation (CNCF) `_ project, which orchestrates the deployment of containers, including connecting them to network and storage resources as needed.
+
+Deploying an application using containers doesn't change its fundamental resource requirements. Reading, writing, accessing, and storing data doesn't change just because a container technology is now a part of the stack in addition to virtual and/or physical machines.
+
+To facilitate the consumption of storage resources by containerized applications, `NetApp `_ created and released an open source project known as `Trident `_. Trident is a storage orchestrator which integrates with Docker and Kubernetes, as well as platforms built on those technologies, such as `Red Hat OpenShift `_, `Rancher `_, and `IBM Cloud Private `_. The goal of Trident is to make the provisioning, connection, and consumption of storage as transparent and frictionless for applications as possible; while operating within the constraints put forth by the storage administrator.
+
+To achieve this goal, Trident automates the storage management tasks needed to consume storage for the storage administrator, the Kubernetes and Docker administrators, and the application consumers. Trident fills a critical role for storage administrators, who may be feeling pressure from application teams to provide storage resources in ways which have not previously been expected. Modern applications, and just as importantly modern development practices, have changed the storage consumption model, where resources are created, consumed, and destroyed quickly. According to `DataDog `_, containers have a median lifespan of just six days, and those which are deployed using container orchestrators have an even shorter lifespan of just a half day. This is dramatically different from storage resources for traditional applications, which commonly exist for years. Trident is the tool which storage administrators can rely on to safely, within the bounds given to it, provision the storage resources applications need, when they need them, and where they need them.
+
+Target Audience
+===============
+
+This document outlines the design and architecture considerations that should be evaluated when deploying containerized applications with persistence requirements within your organization. Additionally, you can find best practices for configuring Kubernetes and OpenShift with Trident.
+
+It is assumed that you, the reader, have a basic understanding of containers, Kubernetes, and storage prior to reading this document. We will, however, explore and explain some of the concepts which are important to integrating Trident, and through it NetApp's storage platforms and services, with Kubernetes. Unless noted, Kubernetes and OpenShift can be used interchangeably in this document.
+
+As with all best practices, these are suggestions based on the experience and knowledge of the NetApp team. Each should be considered according to your environment and targeted applications.
diff --git a/docs/dag/kubernetes/kubernetes_cluster_architecture_considerations.rst b/docs/dag/kubernetes/kubernetes_cluster_architecture_considerations.rst
new file mode 100644
index 000000000..e2fa5268e
--- /dev/null
+++ b/docs/dag/kubernetes/kubernetes_cluster_architecture_considerations.rst
@@ -0,0 +1,153 @@
+.. _kubernetes_cluster_architecture_considerations:
+
+**************************************************
+Kubernetes Cluster Architecture and Considerations
+**************************************************
+
+Kubernetes is extremely flexible and is capable of being deployed in many different configurations. It supports clusters as small as a single node and as large as a `few thousand `_. It can be deployed using either physical or virtual machines on premises or in the cloud. However, single node deployments are mainly used for testing and are not suitable for production workloads. Also, hyperscalers such as AWS, Google Cloud and Azure abstract some of the initial and basic deployment tasks away. When deploying Kubernetes, there are a number of considerations and decisions to make which can affect the applications and how they consume storage resources.
+
+Cluster concepts and components
+===============================
+
+A Kubernetes cluster typically consists of two types of nodes, each responsible for different aspects of functionality:
+
+* Master nodes – These nodes host the control plane of the cluster and are responsible for, among other things, the API endpoint which users interact with and the scheduling of pods across resources. Typically, these nodes are not used to run application workloads.
+* Compute nodes – Nodes which are responsible for executing workloads for the cluster users.
+
+The cluster has a number of Kubernetes intrinsic services which are deployed in the cluster. Depending on the service type, each service is deployed on only one type of node (master or compute) or on a mixture of node types. Some of these services, such as etcd and DNS, are mandatory for the cluster to be functional, while other services are optional. All of these services are deployed as pods within Kubernetes.
+
+* etcd – etcd is a distributed key-value datastore. It is used heavily by Kubernetes to track the state and manage the resources associated with the cluster.
+* DNS – Kubernetes maintains an internal DNS service to provide local resolution for the applications which have been deployed. This enables inter-pod communication to happen while referencing friendly names instead of internal IP addresses which can change as the container instances are scheduled.
+* API server – Kubernetes deploys the API server to allow interaction between Kubernetes and the outside world. This is deployed on the master node(s).
+* The dashboard – an optional component which provides a graphical interface to the cluster.
+* Monitoring and logging – optional components which can aid with resource reporting.
+
+.. note::
+ We have not discussed Kubernetes container networking, which allows pods to communicate with each other or with hosts outside the cluster. The choice of using an overlay network (e.g. Flannel) or a straight layer 3 solution (e.g. Calico) is out of scope of this document and does not affect access to storage resources by the pods.
+
+Cluster architectures
+=====================
+
+There are three primary Kubernetes cluster architectures. These accommodate various methods of high availability and recoverability of the cluster, its services, and the applications running. Trident is installed the same way regardless of which Kubernetes architecture is chosen.
+
+Master nodes are critical to the operation of the cluster. If no masters are running, or the master nodes are unable to reach a quorum, then the cluster is unable to schedule and execute applications. The master nodes are the control plane for the cluster and consequently there should be special consideration given to their `sizing `_ and quantity.
+
+Compute nodes are, generally speaking, much more disposable. However, extra resources must be built into the compute infrastructure to accommodate any workloads from failed nodes. Compute nodes can be added to and removed from the cluster quickly and easily to accommodate the scale of the applications which are being hosted. This makes it very easy to burst, and reclaim, resources based on real-time application workload.
+
+Single master, compute
+----------------------
+
+This architecture is the easiest to deploy but does not provide high availability of the core management services. In the event the master node is unavailable, no interaction can happen with the cluster until, at a minimum, the Kubernetes API server is returned to service.
+
+This architecture can be useful for testing, qualification, proof-of-concept, and other non-production uses; however, it should never be used for production deployments.
+
+A single node used to host both the master service and the workloads is a variant of this architecture. Using a single-node Kubernetes cluster is useful when testing or experimenting with different concepts and capabilities. However, the limited scale and capacity make it unreasonable for more than very small tests. The Trident :ref:`quick start guide ` outlines the process to instantiate a single-node Kubernetes cluster with Trident that provides full functionality for testing and validation.
+
+Multiple master, compute
+------------------------
+
+Having multiple master nodes ensures that services remain available should master node(s) fail. In order to facilitate availability of master services, they should be deployed in odd numbers (e.g. 3, 5, 7, or 9) so quorum (master node majority) can be maintained should one or more masters fail. In the HA scenario, Kubernetes will maintain a copy of the etcd database on each master, but hold elections for the control plane function leaders ``kube-controller-manager`` and ``kube-scheduler`` to avoid conflicts. The worker nodes can communicate with any master's API server through a load balancer.
+
+ Deploying with multiple masters is the minimum recommended configuration for most production clusters.
+
+.. _figMultiMasterCluster:
+
+.. figure:: images/MultiMasterCluster1.png
+ :align: center
+ :figclass: align-center
+
+ Multiple master architecture
+
+Pros:
+
+* Provides highly available master services, ensuring that the loss of up to (n – 1)/2 master nodes, rounded down, will not affect cluster operations. For example, a three-master cluster tolerates the loss of one master.
+
+Cons:
+
+* More complex initial setup.
+
+Master, etcd, compute
+---------------------
+
+This architecture isolates the etcd cluster from the other master server services. This removes workload from the master servers, enabling them to be sized smaller, and makes their scale out (or in) simpler.
+Deploying a Kubernetes cluster using this model adds a degree of complexity; however, it adds flexibility to the scale, support, and management of the etcd service used by Kubernetes, which may be desirable to some organizations.
+
+.. _figMultietcdCluster:
+
+.. figure:: images/MultietcdCluster.png
+ :align: center
+ :figclass: align-center
+
+ Multiple master, etcd, compute architecture
+
+
+
+
+Pros:
+
+* Provides highly available master services, ensuring that the loss of up to (n – 1)/2 master nodes, rounded down, will not affect cluster operations. For example, a three-master cluster tolerates the loss of one master.
+* Isolating etcd from the other master services reduces workload for master servers.
+* Decoupling etcd from the masters makes etcd administration and protection easier. Independent management allows for different protection and scaling schemes.
+
+Cons:
+
+* More complex initial setup.
+
+
+
+
+
+
+Red Hat OpenShift infrastructure architecture
+---------------------------------------------
+
+In addition to the architectures referenced above, Red Hat's OpenShift introduces the concept of `infrastructure nodes `_. These nodes host cluster services such as log aggregation, metrics collection and reporting, container registry services, and overlay network management and routing.
+
+`Red Hat recommends `_ a minimum of three infrastructure nodes for production deployments. This ensures that the services have resources available and are able to migrate in the event of host maintenance or failure.
+
+This architecture enables the services which are critical to the cluster, e.g. the registry and overlay network routing, to be hosted on dedicated nodes. These dedicated nodes may have additional redundancy, different CPU/RAM requirements, and other low-level differences from compute nodes. This also makes adding and removing compute nodes as needed easier, without needing to worry about core services being affected by a node being evacuated.
+
+.. _figMultiinfraCluster:
+
+.. figure:: images/MultiInfraCluster.png
+ :align: center
+ :figclass: align-center
+
+ OpenShift, Multiple master, infra, compute architecture
+
+
+
+
+An additional option involves separating out the master and etcd roles into different servers in the same way as can be done in Kubernetes. This results in having master, etcd, infrastructure, and compute node roles. Further details, including examples of OpenShift node roles and potential deployment options, can be found in the `Red Hat documentation `_.
+
+
+Choosing an architecture
+========================
+
+Regardless of the architecture that you choose, it's important to understand the ramifications for high availability, scalability, and serviceability of the component services. Be sure to consider the effect on the applications being hosted by the Kubernetes or OpenShift cluster. The architecture of the storage infrastructure supporting the Kubernetes/OpenShift cluster and the hosted applications can also be affected by the chosen cluster architecture, such as where etcd is hosted.
+
+
+Persistent storage for cluster services
+=======================================
+
+Dynamically provisioned persistent storage for the applications is provided using the storage class mechanism, with Trident acting as the interface to the NetApp portfolio. However, as you may have noted above, several critical services are hosted on the master servers, and other cluster-critical services may be hosted on other node types.
+
+Etcd persistent storage
+-----------------------
+
+When Kubernetes etcd is hosted by the master server, it uses local storage. If you instead want to leverage an enterprise storage array for etcd, you must mount a volume to the master node at the correct location prior to Kubernetes deployment. This storage cannot be dynamically provisioned by Trident or any other storage class provisioner as it is needed prior to the Kubernetes cluster being operational.
+This same paradigm holds true if dedicated etcd nodes are being used. Prior to deploying etcd, the volume from the storage system must be mounted to the host’s file system at the location etcd is configured to use.
+Refer to your Kubernetes provider's documentation on where to mount the volume and/or customize the etcd configuration to use non-default storage.
+
+Persistent storage for logging services
+---------------------------------------
+
+Centralized logging for applications deployed to the Kubernetes cluster is an optional service. Depending on how, and when, the service is deployed, it may be possible to dynamically provision its storage using the storage class mechanism.
+Refer to your vendor’s documentation on how to customize the storage for logging services. Additionally, this document discusses Red Hat’s OpenShift logging service best practices in a later chapter.
+
+Metrics and analytics services
+------------------------------
+
+Many third-party metrics and analytics tools are available for monitoring, reporting, and providing status of the applications and cluster. These tools may require persistent storage, often with specific performance characteristics.
+Each vendor has different storage recommendations and deployment methodology. Work with your vendor to identify requirements and, if needed, provision storage from an enterprise array to meet the requirements. This document will discuss the Red Hat OpenShift metrics service in a later chapter.
+
diff --git a/docs/dag/kubernetes/netapp_products_integrations.rst b/docs/dag/kubernetes/netapp_products_integrations.rst
new file mode 100644
index 000000000..e1f13af5e
--- /dev/null
+++ b/docs/dag/kubernetes/netapp_products_integrations.rst
@@ -0,0 +1,47 @@
+.. _netapp_products_integrations:
+
+************************************************
+NetApp Products and Integrations with Kubernetes
+************************************************
+
+The NetApp portfolio of storage products integrates with many different aspects of a Kubernetes cluster, providing advanced storage capabilities which enhance the functionality, capability, performance, and availability of the Kubernetes deployment.
+
+
+Trident
+-------
+
+NetApp Trident is a dynamic storage provisioner for the containers ecosystem. It provides the ability to create storage volumes for containerized applications managed by Docker and Kubernetes. Trident is a fully supported, open source project hosted on `GitHub `_.
+Trident works with the portfolio of NetApp storage platforms to deliver storage on-demand to applications according to policies defined by the administrator. When used with Kubernetes, Trident is deployed using native paradigms and provides persistent storage to all namespaces in the cluster.
+For more information about Trident, visit `ThePub `_.
+
+
+ONTAP
+-----
+
+ONTAP is NetApp’s multiprotocol, unified storage operating system providing advanced data management capabilities for any application. ONTAP systems may have all-flash, hybrid, or all-HDD configurations and offer many different deployment models, including engineered hardware (FAS and AFF), white-box (ONTAP Select), and cloud-only (Cloud Volumes ONTAP). Trident supports all the above mentioned deployment models.
+
+Element OS
+----------
+
+Element OS enables the storage administrator to consolidate workloads by guaranteeing performance and enabling a simplified and streamlined storage footprint. Coupled with a full API to enable automation of all aspects of storage management, Element OS enables storage administrators to do more with less effort.
+
+Trident supports all Element OS clusters. More information can be found at `Element Software `_.
+
+NetApp HCI
+----------
+
+NetApp HCI simplifies the management and scale of the datacenter by automating routine tasks and enabling infrastructure administrators to focus on more important functions.
+
+NetApp HCI is fully supported by Trident, which is able to provision and manage storage devices for containerized applications directly against the underlying HCI storage platform. For more information about NetApp HCI visit `NetApp HCI `_.
+
+SANtricity
+----------
+
+The NetApp E and EF Series storage platforms, using the SANtricity operating system, provide robust storage that is highly available, performant, and capable of delivering storage services for applications at any scale.
+
+Trident is able to create and manage SANtricity volumes across the portfolio of products.
+
+For more information about SANtricity and the storage platforms which use it, see `SANtricity Software `_.
+
+
+
diff --git a/docs/dag/kubernetes/security_recommendations.rst b/docs/dag/kubernetes/security_recommendations.rst
new file mode 100644
index 000000000..1b7ed242c
--- /dev/null
+++ b/docs/dag/kubernetes/security_recommendations.rst
@@ -0,0 +1,50 @@
+.. _security_recommendations:
+
+*************************
+Security Recommendations
+*************************
+
+Run Trident in its own namespace
+---------------------------------
+
+It is important to prevent applications, application admins, users, and management applications from accessing Trident object definitions or the pods to ensure reliable storage and block potential malicious activity. To separate out the other applications and users from Trident, always install Trident in its own Kubernetes namespace. In our :ref:`Installing Trident docs ` we call this namespace ``trident``. Putting Trident in its own namespace ensures that only the Kubernetes administrative personnel have access to the Trident pod and the artifacts (such as backend and CHAP secrets if applicable) stored in Trident's etcd datastore. Allow only administrators access to the ``trident`` namespace and thus access to the ``tridentctl`` application.
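+
+As a minimal sketch of the above (not an official procedure), the namespace can be created ahead of installation if it does not already exist, and access to it can be checked from the command line. The user name ``<non_admin_user>`` is a hypothetical placeholder, and the namespace name ``trident`` is assumed to match the one used during installation.
+
+.. code-block:: console
+
+ # create a dedicated namespace for Trident
+ kubectl create namespace trident
+
+ # check whether a non-administrative user can read pods in that namespace;
+ # the intended answer is "no"
+ kubectl auth can-i get pods --namespace trident --as <non_admin_user>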
+
+etcd container
+--------------
+
+Trident's state is stored in etcd. Etcd contains details regarding the backends, storage classes, and volumes the Trident provisioner creates. Trident's default install method installs etcd as a container in the Trident pod. For security reasons, the etcd container located within the Trident pod is not accessible via its REST interface outside the pod, and only the trident container can access the etcd container. If you choose to use an :ref:`external etcd `, authenticate Trident with the etcd cluster using certificates. This also ensures that all communication between Trident and etcd is encrypted.
+
+CHAP authentication
+-------------------
+
+NetApp recommends deploying bi-directional CHAP to ensure authentication between a host and the SolidFire backend. Trident uses a secret object that includes two CHAP passwords per SolidFire tenant. Kubernetes manages the mapping of Kubernetes tenants to SolidFire tenants. At volume creation time, Trident makes an API call to the SolidFire system to retrieve the secrets if the secret for that tenant doesn’t already exist. Trident then passes the secrets on to Kubernetes. The kubelet located on each node accesses the secrets via the Kubernetes API and uses them to run/enable CHAP between each node accessing the volume and the SolidFire system where the volumes are located.
+
+Since Trident v18.10, SolidFire defaults to using CHAP if the Kubernetes version is >= 1.7 and a Trident access group doesn't exist. Setting ``AccessGroup`` or ``UseCHAP`` in the backend configuration file overrides this behavior. CHAP is guaranteed by setting ``UseCHAP`` to ``true`` in the backend.json file.
+
+
+Storage backend credentials
+---------------------------
+
+Delete backend config files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Deleting the backend config files helps prevent unauthorized users from accessing backend usernames, passwords and other credentials. After the backends are created using the ``tridentctl create backend`` command, make sure that the backend config files are deleted from the node where Trident was installed. Keep a copy of the file in a secure location if the backend file will be updated in the future (e.g. to change passwords).
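+
+The workflow above might look like the following sketch. The file name ``backend.json`` and the namespace ``trident`` are assumptions; substitute the values used in your environment.
+
+.. code-block:: console
+
+ # create the backend from the config file
+ tridentctl create backend -f backend.json -n trident
+
+ # confirm that the backend was created
+ tridentctl get backend -n trident
+
+ # after storing a copy in a secure location (if one is needed), remove the local file
+ rm backend.json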
+
+Encrypt Trident* etcd Volume
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Since Trident stores all its backend credentials in its etcd datastore, it may be possible for unauthorized users to access this information either from etcdctl or directly from the hardware hosting the etcd volume. Therefore, as an additional security measure, enable encryption on the Trident volume.
+
+The appropriate encryption license must be enabled on the backends to encrypt Trident's volume.
+
+**ONTAP Backend**
+
+
+Prior to Trident installation, edit the :ref:`temporary backend.json ` file to include :ref:`encryption `. The volume created for Trident's use during installation will then be encrypted.
+
+Alternatively, encrypt the Trident volume using the ONTAP CLI command ``volume encryption conversion start -vserver SVM_name -volume volume_name``. Verify the status of the conversion operation using the command ``volume encryption conversion show``. Please note that you cannot use the ``volume encryption conversion start`` ONTAP CLI command to start encryption on a SnapLock or FlexGroup volume. For more information on how to set up NetApp Volume Encryption, refer to the `ONTAP NetApp Encryption Power Guide `_.
+
+**Element Backend**
+
+
+On the SolidFire backend, enable encryption for the cluster.
diff --git a/docs/dag/kubernetes/storage_configuration_trident.rst b/docs/dag/kubernetes/storage_configuration_trident.rst
new file mode 100644
index 000000000..b64cb3b62
--- /dev/null
+++ b/docs/dag/kubernetes/storage_configuration_trident.rst
@@ -0,0 +1,202 @@
+.. _storage_configuration_trident:
+
+*********************************
+Storage Configuration for Trident
+*********************************
+
+Each storage platform in NetApp's portfolio has unique capabilities that benefit applications, containerized or not. Trident works with each of the major platforms: ONTAP, Element, and E-Series. No single platform is better suited than the others for all applications and scenarios; however, the needs of the application and the team administering the device should be taken into account when choosing a platform.
+
+The storage administrator(s), Kubernetes administrator(s), and the application team(s) should work with their NetApp team to ensure that the most appropriate platform is selected.
+
+Regardless of which NetApp storage platform you will be connecting to your Kubernetes environment, you should follow the baseline best practices for the host operating system with the protocol that you will be leveraging. Optionally, you may want to consider incorporating application best practices, when available, with backend, storage class, and PVC settings to optimize storage for specific applications.
+
+Some of the best practices guides are listed below; however, refer to the `NetApp Library `_ for the most current versions.
+
+* ONTAP
+
+ * `NFS Best Practice and Implementation Guide `_
+ * `SAN Administration Guide `_ (for iSCSI)
+ * `iSCSI Express Configuration for RHEL `_
+
+* SolidFire
+
+ * `Configuring SolidFire for Linux `_
+
+* E-Series
+
+ * `Installing and Configuring for Linux Express Guide `_
+
+Some example application best practices guides:
+
+* `ONTAP Best Practice Guidelines for MySQL `_
+* `MySQL Best Practices for NetApp SolidFire `_
+* `NetApp SolidFire and Cassandra `_
+* `Oracle Best Practices on NetApp SolidFire `_
+* `PostgreSQL Best Practices on NetApp SolidFire `_
+
+Not all applications will have specific guidelines. It's important to work with your NetApp team and to refer to the `library `_ for the most up-to-date recommendations and guides.
+
+Best practices for configuring ONTAP
+====================================
+
+The following recommendations are guidelines for configuring ONTAP for containerized workloads which consume volumes dynamically provisioned by Trident. Each should be considered and evaluated for appropriateness in your environment.
+
+Use SVM(s) which are dedicated to Trident
+-----------------------------------------
+
+Storage Virtual Machines (SVMs) provide isolation and administrative separation between tenants on an ONTAP system. Dedicating an SVM to applications enables the delegation of privileges and enables applying best practices for limiting resource consumption.
+
+There are several options available for the management of the SVM:
+
+* Provide the cluster management interface in the backend configuration, along with appropriate credentials, and specify the SVM name
+* Create a dedicated management interface for the SVM
+* Share the management role with an NFS data interface
+
+In each case, the interface should be in DNS, and the DNS name should be used when configuring Trident. This helps to facilitate some DR scenarios, for example, SVM-DR without the use of network identity retention.
+
+There is no preference between having a dedicated or shared management LIF for the SVM; however, you should ensure that your network security policies align with the approach you choose. Regardless, the management LIF should be accessible via DNS to facilitate maximum flexibility should `SVM-DR `_ be used in conjunction with Trident.
+
+Limit the maximum volume count
+------------------------------
+
+ONTAP storage systems have a maximum volume count which varies based on the software version and hardware platform. See the `Hardware Universe `_ for your specific platform and ONTAP version to determine exact limits. When the volume count is exhausted, provisioning operations will fail not only for Trident, but for all storage requests.
+
+Trident's ``ontap-nas`` and ``ontap-san`` drivers provision a FlexVolume for each Kubernetes persistent volume (PV) which is created, and the ``ontap-nas-economy`` driver will create approximately one FlexVolume for every 200 PVs. To prevent Trident from consuming all available volumes on the storage system, it is recommended that a limit be placed on the SVM. This can be done from the command line:
+
+.. code-block:: console
+
+ vserver modify -vserver <svm_name> -max-volumes <num_of_volumes>
+
+The value for ``max-volumes`` will vary based on several criteria specific to your environment:
+
+* The number of existing volumes in the ONTAP cluster
+* The number of volumes you expect to provision outside of Trident for other applications
+* The number of persistent volumes expected to be consumed by Kubernetes applications
+
+The ``max-volumes`` value is the total number of volumes provisioned across all nodes in the cluster, not on an individual node. As a result, one cluster node may end up with far more, or fewer, Trident-provisioned volumes than another.
+
+For example, a 2-node ONTAP cluster has the ability to host a maximum of 2000 FlexVolumes. Having the maximum volume count set to 1250 appears very reasonable. However, if only aggregates from one node are assigned to the SVM, or the aggregates assigned from one node are unable to be provisioned against (e.g. due to capacity), then the other node will be the target for all Trident-provisioned volumes. This means that the volume limit may be reached for that node before the ``max-volumes`` value is reached, which impacts both Trident and other volume operations using that node. Avoid this situation by ensuring that aggregates from each node in the cluster are assigned to the SVM used by Trident in equal numbers.
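+
+One way to check for this condition, shown here as a sketch, is to review which aggregates are assigned to the SVM and how its existing volumes are distributed across nodes. ``<svm_name>`` is a placeholder for the name of the SVM used by Trident.
+
+.. code-block:: console
+
+ # show the aggregates assigned to the SVM and its current volume limit
+ vserver show -vserver <svm_name> -fields aggr-list,max-volumes
+
+ # show which node and aggregate host each volume in the SVM
+ volume show -vserver <svm_name> -fields node,aggregate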
+
+In addition to controlling the volume count at the storage array, Kubernetes capabilities should also be leveraged, as explained in the next chapter.
+
+Limit the maximum size of volumes created by the Trident user
+-------------------------------------------------------------
+
+ONTAP can prevent a user from creating a volume above a maximum size, as defined by the administrator. This is implemented using the permissions system and should be applied to the user which Trident uses to authenticate, e.g. ``vsadmin``.
+
+.. code-block:: console
+
+ security login role modify -vserver <svm_name> -role <username> -access all -cmddirname "volume create" -query "-size <=50g"
+
+The above example command will prevent the user from creating a volume larger than 50GiB in size. The value should be modified to suit your applications and the expected size of their volumes.
+
+.. note::
+ This does not apply when using the ``ontap-nas-economy`` driver. The economy driver will create the FlexVolume with a size equal to the first PVC provisioned to that FlexVolume. Subsequent PVCs provisioned to that FlexVolume will result in the volume being resized, which is not subject to the limitation described above.
+
+In addition to controlling the volume size at the storage array, Kubernetes capabilities should also be leveraged, as explained in the next chapter.
+
+Create and use an SVM QoS policy
+--------------------------------
+
+Leveraging an ONTAP QoS policy, applied to the SVM, limits the number of IOPS consumable by the Trident provisioned volumes. This helps to `prevent a bully `_ or out-of-control container from affecting workloads outside of the Trident SVM.
+
+Creating a QoS policy for the SVM can be done in a few steps. Refer to the documentation for your version of ONTAP for the most accurate information. The example below creates a QoS policy which limits the total IOPS available to the SVM to 5000.
+
+.. code-block:: console
+
+ # create the policy group for the SVM
+ qos policy-group create -policy-group <policy_name> -vserver <svm_name> -max-throughput 5000iops
+
+ # assign the policy group to the SVM, note this will not work
+ # if volumes or files in the SVM have existing QoS policies
+ vserver modify -vserver <svm_name> -qos-policy-group <policy_name>
+
+Additionally, if your version of ONTAP supports it, you may consider using a QoS minimum in order to guarantee an amount of throughput to containerized workloads. Adaptive QoS is not compatible with an SVM level policy.
+
+The number of IOPS dedicated to the containerized workloads depends on many aspects. Among other things, these include:
+
+* Other workloads using the storage array. If there are other workloads, not related to the Kubernetes deployment, utilizing the storage resources, then care should be taken to ensure that those workloads are not accidentally adversely impacted.
+* Expected workloads running in containers. If workloads which have high IOPS requirements will be running in containers, then a low QoS policy will result in a bad experience.
+
+It's important to remember that a QoS policy assigned at the SVM level will result in all volumes provisioned to the SVM sharing the same IOPS pool. If one, or a small number, of the containerized applications has a high IOPS requirement it could become a bully to the other containerized workloads. If this is the case, you may want to consider using external automation to assign per-volume QoS policies.
+
+Limit storage resource access to Kubernetes cluster members
+-----------------------------------------------------------
+
+Limiting access to the NFS volumes and iSCSI LUNs created by Trident is a critical component of the security posture for your Kubernetes deployment. Doing so prevents hosts which are not a part of the Kubernetes cluster from accessing the volumes and potentially modifying data unexpectedly.
+
+It's important to understand that namespaces are the logical boundary for resources in Kubernetes. The assumption is that resources in the same namespace are able to be shared, however, importantly, there is no cross-namespace capability. This means that even though PVs are global objects, when bound to a PVC they are only accessible by pods which are in that same namespace. It's critical to ensure that namespaces are used to provide separation when appropriate.
+
+The primary concern for most organizations, with regard to data security in a Kubernetes context, is that a process in a container can access storage mounted to the host, but which is not intended for the container. Simply put, this is not possible. The underlying technology for containers, `namespaces `_, is designed to prevent this type of compromise. However, there is one exception: privileged containers.
+
+A privileged container is one that is run with substantially more host-level permissions than normal. These are not denied by default, so disabling the capability using `pod security policies `_ is very important for preventing this accidental exposure.
+
+For volumes where access is desired from both Kubernetes and external hosts, the storage should be managed in a traditional manner, with the PV introduced by the administrator and not managed by Trident. This ensures that the storage volume is destroyed only when both the Kubernetes and external hosts have disconnected and are no longer using the volume. Additionally, a custom export policy can be applied which enables access from the Kubernetes cluster nodes and targeted servers outside of the Kubernetes cluster.
+
+For deployments which have dedicated infrastructure nodes (e.g. OpenShift), or other nodes which are not schedulable for user applications, separate export policies should be used to further limit access to storage resources. This includes creating an export policy for services which are deployed to those infrastructure nodes, for example the OpenShift Metrics and Logging services, and standard applications which are deployed to non-infrastructure nodes.
+
+Create export policy
+--------------------
+
+Create appropriate export policies for the Storage Virtual Machines. Allow only Kubernetes nodes access to the NFS volumes.
+
+Export policies contain one or more export rules that process each node access request. Use the ``vserver export-policy create`` ONTAP CLI command to create the export policy. Add rules to the export policy using the ``vserver export-policy rule create`` ONTAP CLI command. Together, these commands enable you to restrict which nodes have access to data.
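+
+A minimal sketch of these two commands follows. The placeholders ``<svm_name>``, ``<policy_name>``, and ``<k8s_node_subnet>`` are illustrative, and the rule options shown (NFS only, ``sys`` authentication) are assumptions that should be adjusted to your security requirements.
+
+.. code-block:: console
+
+ # create an empty export policy on the SVM
+ vserver export-policy create -vserver <svm_name> -policyname <policy_name>
+
+ # allow only the Kubernetes node subnet to access volumes using this policy
+ vserver export-policy rule create -vserver <svm_name> -policyname <policy_name> -clientmatch <k8s_node_subnet> -protocol nfs -rorule sys -rwrule sys -superuser sys
+
+An export policy restricts access only for the volumes to which it is assigned.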
+
+Disable ``showmount`` for the application SVM
+---------------------------------------------
+
+The showmount feature enables an NFS client to query the SVM for a list of available NFS exports. A pod deployed to the Kubernetes cluster could issue the ``showmount -e`` command against the data LIF and receive a list of available mounts, including those which it does not have access to. While this isn't, by itself, dangerous or a security compromise, it does provide unnecessary information that could aid an unauthorized user in connecting to an NFS export.
+
+Disabling showmount is an SVM level command:
+
+.. code-block:: console
+
+   vserver nfs modify -vserver <svm_name> -showmount disabled
+
+Use NFSv4 for Trident's etcd when possible
+------------------------------------------
+
+NFSv3 locks are handled by the Network Lock Manager (NLM), a sideband mechanism which does not use the NFS protocol. Therefore, if the server hosting the Trident pod ungracefully leaves the network during a failure scenario (either by a hard reboot or by all access being abruptly severed), the NFS lock is held indefinitely. This results in Trident failure because etcd's volume cannot be mounted from another node.
+
+NFSv4 has session management and locking built into the protocol, and the locks are released automatically when the session times out. In a recovery situation, the Trident pod will be redeployed on another node, mount the volume, and come back up after the v4 locks are automatically released.
+
+
+Best practices for configuring SolidFire
+=========================================
+
+**SolidFire Account**
+
+Create a SolidFire account. Each SolidFire account represents a unique volume owner and receives its own set of Challenge-Handshake Authentication Protocol (CHAP) credentials. You can access volumes assigned to an account either by using the account name and its CHAP credentials or through a volume access group. An account can have up to two thousand volumes assigned to it, but a volume can belong to only one account.
+
+**SolidFire QoS**
+
+Use a QoS policy if you would like to create and save a standardized quality of service setting that can be applied to many volumes.
+
+Quality of Service parameters can be set on a per-volume basis. Performance for each volume can be assured by setting three configurable parameters that define the QoS: Min IOPS, Max IOPS, and Burst IOPS.
+
+The following table shows the possible Min IOPS, Max IOPS, and Burst IOPS values for a 4 KB block size.
+
+ +-------------------+----------------------------------------------------+-----------+---------------+----------------+
+ | IOPS Parameter | Definition | Min value | Default Value | Max Value(4Kb) |
+ +===================+====================================================+===========+===============+================+
+ | Min IOPS | The guaranteed level of performance for a volume.| 50 | 50 | 15000 |
+ +-------------------+----------------------------------------------------+-----------+---------------+----------------+
+ | Max IOPS | Performance will not exceed this limit. | 50 | 15000 | 200,000 |
+ +-------------------+----------------------------------------------------+-----------+---------------+----------------+
+ | Burst IOPS | Maximum IOPS allowed in a short burst scenario. | 50 | 15000 | 200,000 |
+ +-------------------+----------------------------------------------------+-----------+---------------+----------------+
+
+Note: Although the Max IOPS and Burst IOPS can be set as high as 200,000, real-world maximum performance of a volume is limited by cluster usage and per-node performance.
+
+Block size and bandwidth have a direct influence on the number of IOPS. As block sizes increase, the system increases bandwidth to the level necessary to process the larger blocks. As bandwidth increases, the number of IOPS the system is able to attain decreases. For more information on QoS and performance, refer to the `NetApp SolidFire Quality of Service (QoS) `_ Guide.
+
+
+**SolidFire authentication**
+
+Element supports two methods for authentication: CHAP and Volume Access Groups (VAG). CHAP uses the CHAP protocol to authenticate the host to the backend. A Volume Access Group controls access to the volumes which belong to it. NetApp recommends using CHAP for authentication as it's simpler and has no scaling limits.
+
+CHAP authentication (verification that the initiator is the intended volume user) is supported only with account-based access control. If you are using CHAP for authentication, two options are available: unidirectional CHAP and bidirectional CHAP. Unidirectional CHAP authenticates volume access by using the SolidFire account name and initiator secret. Bidirectional CHAP provides the most secure way of authenticating the volume, since the volume authenticates the host through the account name and the initiator secret, and then the host authenticates the volume through the account name and the target secret.
+
+However, if CHAP cannot be enabled and VAGs are required, create the access group and add the host initiators and volumes to it. Each IQN that you add to an access group can access each volume in the group with or without CHAP authentication. If the iSCSI initiator is configured to use CHAP authentication, account-based access control is used. If the iSCSI initiator is not configured to use CHAP authentication, then volume access group access control is used.
+
+For more information on how to set up Volume Access Groups and CHAP authentication, refer to the NetApp HCI Installation and Setup Guide.
diff --git a/docs/dag/kubernetes/storage_kubernetes_infrastructure_services.rst b/docs/dag/kubernetes/storage_kubernetes_infrastructure_services.rst
new file mode 100644
index 000000000..5052a4576
--- /dev/null
+++ b/docs/dag/kubernetes/storage_kubernetes_infrastructure_services.rst
@@ -0,0 +1,146 @@
+.. _storage_kubernetes_infrastructure_services:
+
+**********************************************
+Storage for Kubernetes Infrastructure Services
+**********************************************
+
+Trident focuses on providing persistence to Kubernetes applications, but before you can host those applications, you need to run a healthy, protected Kubernetes cluster. Those clusters are made up of a number of services with their own persistence requirements that need to be considered.
+
+**Node-local container storage, a.k.a. graph driver storage**
+
+One of the often overlooked components of a Kubernetes deployment is the storage which the container instances consume on the Kubernetes cluster nodes, usually referred to as `graph driver storage `_. When a container is instantiated on a node, it consumes capacity and IOPS for many of its operations, because only data which is read from or written to a persistent volume is offloaded to the external storage. If the Kubernetes nodes are expected to host dozens, hundreds, or more containers, this may require a significant amount of temporary capacity and IOPS from the node-local storage.
+
+Even if you don't have a requirement to keep the data, the containers still need enough performance and capacity to execute their application. The Kubernetes administrator and storage administrator should work closely together to determine the requirements for graph storage and ensure adequate performance and capacity is available.
+
+The typical method of augmenting the graph driver storage is to use a block device mounted at the location where container instance storage is located, e.g. ``/var/lib/docker``. Each host operating system used to underpin a Kubernetes deployment has its own method for replacing the graph storage with something more robust than a simple directory on the node. Refer to the documentation from Red Hat, Ubuntu, SUSE, etc. for those instructions.
+
+.. note::
+ Block protocol is specifically recommended for graph storage due to the nature of how the graph drivers work. In particular, they create thin clones, using a variety of methods depending on the driver, of the container image. NFS does not support this functionality and results in a full copy of the container image file system for each instance, resulting in significant performance and capacity implications.
+
+If the Kubernetes nodes are virtualized, this could also be addressed by ensuring that the datastore they reside on meets the performance and capacity needs, however the flexibility of having a separate device, even an additional virtual disk, should be carefully considered. Using a separate device gives the ability to independently control capacity, performance, and data protection to tailor the policies according to needs. Often the capacity and performance needed for graph storage can fluctuate dramatically, however data protection is not necessary.
+
+Persistent storage for cluster services
+=======================================
+
+There are several critical services hosted on the master servers and other cluster critical services which may be hosted on other node types.
+
+Using capacity provisioned from enterprise storage adds several benefits for each service as delineated below:
+
+* Performance – leveraging enterprise storage means being able to provide latency in-line with application needs and controlling performance using QoS policies. This can be used to limit performance for `bullies `_ or ensure performance for applications as needed.
+* Resiliency – in the event of node failure, being able to quickly recover the data and present it to a replacement ensures that the application is minimally affected.
+* Data protection – putting application data onto dedicated volumes allows the data to have its own snapshot, replication, and retention policies.
+* Data security – ensuring that data is encrypted and protected all the way to the disk level is important for meeting many compliance requirements.
+* Scale – using enterprise storage enables deploying fewer instances, with the instance count depending on compute resources, rather than having to increase the total count for resiliency and performance purposes. Data which is protected automatically, and provides adequate performance, means that horizontal scale out doesn't need to compensate for limited performance.
+
+There are some best practices which apply across all of the cluster services, or any application, which should be addressed as well.
+
+* Determining the amount of acceptable data loss, i.e. the `Recovery Point Objective `_ (RPO), is a critical part of deploying a production level system. Having an understanding of this will greatly simplify the decisions around other items described in this section.
+* Cluster service volumes should have a snapshot policy which enables the rapid recovery of data according to your requirements, as defined by the RPO. This enables quick and easy restoration of a service to a point in time by reverting the volume or the ability to recover individual files if needed.
+* Replication can provide both backup and disaster recovery capabilities. Each service should be evaluated and have its recovery plan considered carefully. Using storage replication may provide rapid recovery with a higher RPO, or can provide a starting baseline which expedites restore operations without having to recover all data from other locations.
+
+etcd considerations
+-------------------
+
+You should carefully consider the high availability and data protection policy for the etcd instance used by the Kubernetes master(s). This service is arguably the most critical to the overall functionality, recoverability, and serviceability of the cluster as a whole, so it's imperative that its deployment meets your goals.
+
+The most common method of providing high availability and resiliency to the etcd instance is to horizontally scale the application by having multiple instances across multiple nodes. A minimum of three nodes is recommended.
+
+Kubernetes etcd will, by default, use local storage for its persistence. This holds true whether etcd is co-located with other master services or is hosted on dedicated nodes. To use enterprise storage for etcd a volume must be provisioned and mounted to the node at the correct location prior to deployment. This storage cannot be dynamically provisioned by Trident or any other storage class provisioner as it is needed prior to the Kubernetes cluster being operational.
+
+Refer to your Kubernetes provider's documentation on where to mount the volume and/or customize the etcd configuration to use non-default storage.
+
+logging
+-------
+
+Centralized logging for applications deployed to the Kubernetes cluster is an optional service. Using enterprise storage for logging has the same benefits as with etcd: performance, resiliency, protection, security, and scale.
+
+Depending on how and when the service is deployed, the storage class may define how to dynamically provision storage for the service. Refer to your vendor's documentation on how to customize the storage for logging services. Additionally, this document discusses Red Hat's OpenShift logging service best practices in a later chapter.
+
+metrics
+-------
+
+There are many third-party metrics and analytics services available for monitoring and reporting of the status and health of every aspect of the cluster and application. Many of these require persistent storage, often with specific performance characteristics, for the service in order for it to function as intended.
+
+Architecturally, many of these function similarly: an agent on each node forwards data to a centralized collector which aggregates, analyzes, and displays the data. Similar to the logging service, using enterprise storage allows the aggregation service to move across nodes in the cluster seamlessly and ensures the data is protected in the event of node failure.
+
+Each vendor has different recommendations and deployment methodology. Work with your vendor to identify requirements and, if needed, provision storage from an enterprise array to meet the requirements. This document will discuss the Red Hat OpenShift metrics service in a later chapter.
+
+registry
+--------
+
+The registry is the service with which users and applications will have the most direct interaction. It can also have a dramatic effect on the perceived performance of the Kubernetes cluster as a whole, as slow image push and pull operations can result in lengthy times for tasks which directly affect the developer and application.
+
+Fortunately, the registry is flexible with regard to storage protocol. Keep in mind different protocols have different implications.
+
+* Object storage is the default recommendation and is the simplest to use for Kubernetes deployments which expect to have significant scale or where the images need to be accessed across geographic regions.
+* NFS is a good choice for many deployments as it allows a single repository for the container images while allowing many registry endpoints to front the capacity.
+* Block protocols, such as iSCSI, can be used for registry storage, but they introduce a single point of failure. The block device can only be attached to a single registry node due to the single-writer limitation of the supported filesystems.
+
+Protecting the images stored in the registry will have different priorities for each organization and each application. Registry images are, generally, either cached from upstream registries or pushed to the registry during the application build process. The Recovery Time Objective (RTO) is important to the desired protection scheme because it will affect the recovery process. If RTO is not an issue, then the applications may be able to simply rebuild the container images and push them into a new instance of the registry. If a faster RTO is desired, then a replication policy should be used which adheres to the desired recovery goal.
+
+Design choices and guidelines when using ONTAP
+==============================================
+
+When using ONTAP as the backend storage for containerized applications, with storage dynamically provisioned by Trident, there are several design and implementation considerations which should be addressed prior to deployment.
+
+Storage Virtual Machines
+------------------------
+
+Storage virtual machines (SVMs) are used for administrative delegation within ONTAP. They give the storage administrator the ability to isolate a particular user, group, or application to only having access to resources which they have been specifically granted. When Trident accesses the storage system via an SVM, it is prevented from doing many system level management tasks, providing additional isolation of capabilities for storage provisioning and management tasks.
+
+There are several different ways which SVMs can be leveraged with Trident. Each is explained below. It's important to understand that having multiple Trident deployments, i.e. multiple Kubernetes clusters, does not change the below statements. When an SVM is shared with multiple Trident instances they simply need distinct prefixes defined in the backend configuration files.
+
+**SVM shared with non Trident-managed workloads**
+
+This configuration uses a single SVM, or a small number of SVMs, to host all of the workloads on the cluster and results in the containerized applications being hosted by the same SVM as other, non-containerized, workloads. The shared SVM model is common in organizations where there exist multiple network segments which are isolated and adding additional IP addresses is difficult or impossible.
+
+There is nothing inherently wrong with this configuration, however it is more challenging to apply policies which affect only the container workloads.
+
+**Dedicated SVM for Trident-managed workloads**
+
+Creating an SVM which is used solely by Trident for provisioning and deprovisioning volumes for containerized workloads is the default recommendation from NetApp. This enables the storage administrator to put controls in place to limit the amount of resources which Trident is able to consume.
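+
+One way to apply such controls, shown here only as a sketch with placeholder values, is to cap the number of volumes on the SVM and restrict the aggregates which it may use:
+
+.. code-block:: console
+
+   vserver modify -vserver <svm_name> -max-volumes <volume_limit> -aggr-list <aggregate_1>,<aggregate_2>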
+
+As was noted above, having multiple Kubernetes clusters connect to and consume storage from the same SVM is acceptable; the only change to the Trident configuration should be to :ref:`provide a different prefix `.
+
+When creating backends which connect to the same underlying SVM resources, but have differing features applied, e.g. snapshot policies, using different prefixes is recommended to aid the storage administrator with identifying volumes and ensuring that no confusion ensues as a result.
+
+**Multiple SVMs dedicated to Trident-managed workloads**
+
+You may consider using multiple SVMs with Trident for many different reasons, including isolating applications and resource domains, maintaining strict control over resources, and facilitating multitenancy. It's also worth considering using at least two SVMs with any Kubernetes cluster to isolate persistent storage for cluster services from application storage.
+
+When using multiple SVMs, with one dedicated to cluster services, the goal is to isolate and control the workload in a more flexible way. This is possible because the expectation is that the Kubernetes cluster services SVM will not have dynamic provisioning happening against it in the same manner that the application SVM will. Many of the persistent storage resources needed by the Kubernetes cluster must exist prior to Trident deployment and consequently must be manually provisioned by the storage administrator.
+
+Kubernetes cluster services
+---------------------------
+
+Even for cluster service persistent volumes created by Trident, serious consideration should be given to using per-volume QoS policies, including QoS minimums when possible, and to customizing the volume options for the application. Below are the default recommendations for the cluster services; evaluate your needs and adjust the policies according to your data protection, performance, and availability requirements.
+
+**etcd**
+
+* The default snapshot policy is often adequate for protecting against data corruption and loss, however snapshots are not a backup strategy. Some consideration should be given to increasing the frequency, and decreasing the retention period, for etcd volumes. For example, keeping 24 hourly snapshots or 48 snapshots taken every 30 minutes, but not retaining them for more than one or two days. Since any data loss for etcd can be problematic, having more frequent snapshots makes this scenario easier to recover from.
+* If the disaster recovery plan is to recover the Kubernetes cluster as-is at the destination site, then these volumes should be replicated with SnapMirror or SnapVault.
+* etcd does not have significant IOPS or throughput requirements, however latency can play a critical role in the responsiveness of the Kubernetes API server. Whenever possible the lowest latency storage available should be used.
+* A QoS policy should be leveraged to provide a minimum amount of IOPS to the etcd volume(s). The minimum value will depend on the number of nodes and pods which are deployed to your Kubernetes cluster. Monitoring should be used to verify that the configured policy is adequate and adjusted over time as the Kubernetes cluster expands.
+* The etcd volumes should have their export policy or iGroup limited to only the nodes which are hosting, or could potentially host, etcd instances.
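+
+The snapshot and QoS recommendations above could be sketched with the ONTAP CLI as follows. All names and the IOPS floor are placeholders, the built-in ``hourly`` schedule is assumed to be available, and QoS minimums are assumed to be supported on the platform in use.
+
+.. code-block:: console
+
+   volume snapshot policy create -vserver <svm_name> -policy <snapshot_policy_name> -enabled true -schedule1 hourly -count1 24
+   volume modify -vserver <svm_name> -volume <etcd_volume> -snapshot-policy <snapshot_policy_name>
+   qos policy-group create -policy-group <qos_policy_name> -vserver <svm_name> -min-throughput <min_iops>iops
+   volume modify -vserver <svm_name> -volume <etcd_volume> -qos-policy-group <qos_policy_name>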
+
+**logging**
+
+* Volumes which are providing storage capacity for aggregated logging services need to be protected, however an average RPO is adequate in many instances since logging data is often not critical to recovery. This may differ if your application has strict compliance requirements.
+* Using the default snapshot policy is generally adequate. Optionally, depending on the needs of the administrators, reducing the snapshot policy to one which keeps as few as seven daily snapshots may be acceptable.
+* Logging volumes should be replicated to protect the historical data for use by the application and by administrators, however recovery may be deprioritized for other services.
+* Logging services have widely variable IOPS requirements and read/write patterns. It's important to consider the number of nodes, pods, and other objects in the cluster. Each of these will generate data which needs to be stored, indexed, analyzed, and presented, so a larger cluster may have substantially more data than expected.
+* A QoS policy may be leveraged to provide both a minimum and maximum amount of throughput available. Note that the maximum may need to be adjusted as additional pods are deployed, so close monitoring of the performance should be used to verify that logging is not being adversely affected by storage performance.
+* The volume's export policy or iGroup should be limited to nodes which host the logging service. This will depend on the particular solution used and the chosen configuration. For example, OpenShift's logging service is deployed to the infrastructure nodes.
+
+**metrics**
+
+* The Kubernetes autoscaling feature relies on metrics to determine when scale operations need to occur. Metrics data also often plays a critical role in show-back and charge-back operations, so ensure that your RPO and RTO meet the needs of these functions and of the business as a whole.
+* As the number of cluster nodes and deployed pods increases, so too does the amount of data which is collected and retained by the metrics service. It's important to understand the performance and capacity recommendations provided by the vendor for your metrics service as they can vary dramatically, particularly depending on the amount of time for which the data is retained and the number of metrics which are being monitored.
+* A QoS policy can be used to limit the amount of IOPS or throughput which the metrics service uses, however it is generally not necessary to use a minimum policy.
+* It is recommended to limit the export policy or iGroup to the hosts which the metrics service is executed from. Note that it's important to understand the architecture of your metrics provider. Many have agents which run on all hosts in the cluster, however those will report metrics to a centralized repository for storage and reporting. Only the nodes which host that repository need access.
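+
+A throughput ceiling for a metrics volume could be sketched as follows. The policy group name, the limit, and the volume name are placeholders rather than recommended values.
+
+.. code-block:: console
+
+   qos policy-group create -policy-group <qos_policy_name> -vserver <svm_name> -max-throughput <max_iops>iops
+   volume modify -vserver <svm_name> -volume <metrics_volume> -qos-policy-group <qos_policy_name>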
+
+**registry**
+
+* Using a snapshot policy for the registry data may be valuable for recovering from data corruption or other issues, but it is not strictly necessary. A basic snapshot policy is recommended. Note that individual container images cannot be recovered because they are stored in a hashed manner; only a full volume revert can be used to recover data.
+* The workload for the registry can vary widely, however the general rule is that push operations happen infrequently, while pull operations happen frequently. If a CI/CD pipeline process is used to build, test, and deploy the application(s) this may result in a predictable workload. Alternatively, and even with a CI/CD system in use, the workload can vary based on application scaling requirements, build requirements, and even Kubernetes node add/remove operations. Close monitoring of the workload should be implemented to adjust as necessary.
+* A QoS policy may be implemented to ensure that application instances are still able to pull and deploy new container images regardless of other workloads on the storage system. In the event of a disaster recovery scenario, the registry may have a heavy read workload while applications are instantiated on the destination site. The configured QoS minimum policy will prevent other disaster recovery operations from slowing application deployment.
diff --git a/docs/index.rst b/docs/index.rst
index 47404469d..8c9f6d457 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -19,6 +19,7 @@ Storage Orchestrator for Containers
:caption: Kubernetes
kubernetes/index
+ dag/kubernetes/index
.. toctree::
:caption: Docker
diff --git a/docs/kubernetes/operations/tasks/backends/ontap.rst b/docs/kubernetes/operations/tasks/backends/ontap.rst
index cd5bf1f19..e4fa7f915 100644
--- a/docs/kubernetes/operations/tasks/backends/ontap.rst
+++ b/docs/kubernetes/operations/tasks/backends/ontap.rst
@@ -154,10 +154,10 @@ Parameter Description
 spaceReserve              Space reservation mode; "none" (thin) or "volume" (thick)       "none"
 snapshotPolicy            Snapshot policy to use                                          "none"
 snapshotReserve           Percentage of volume reserved for snapshots                     "0" if snapshotPolicy is "none", else ""
-splitOnClone              Split a clone from its parent upon creation                     false
-encryption                Enable NetApp volume encryption                                 false
+splitOnClone              Split a clone from its parent upon creation                     "false"
+encryption                Enable NetApp volume encryption                                 "false"
 unixPermissions           ontap-nas* only: mode for new volumes                           "777"
-snapshotDir               ontap-nas* only: access to the .snapshot directory              false
+snapshotDir               ontap-nas* only: access to the .snapshot directory              "false"
 exportPolicy              ontap-nas* only: export policy to use                           "default"
 securityStyle             ontap-nas* only: security style for new volumes                 "unix"
========================= =============================================================== ================================================