diff --git a/docs/src/packages.dot b/docs/src/assets/packages.dot
similarity index 100%
rename from docs/src/packages.dot
rename to docs/src/assets/packages.dot
diff --git a/docs/src/packages.png b/docs/src/assets/packages.png
similarity index 100%
rename from docs/src/packages.png
rename to docs/src/assets/packages.png
diff --git a/docs/src/packages_sketchy.png b/docs/src/assets/packages_sketchy.png
similarity index 100%
rename from docs/src/packages_sketchy.png
rename to docs/src/assets/packages_sketchy.png
diff --git a/docs/src/dev-notes/AssemblyStrategies.md b/docs/src/dev-notes/AssemblyStrategies.md
new file mode 100644
index 00000000..7b1eae9e
--- /dev/null
+++ b/docs/src/dev-notes/AssemblyStrategies.md
@@ -0,0 +1,33 @@
+# Assembly strategies
+
+GridapDistributed offers several assembly strategies for distributed linear systems. These strategies modify the ghost layout for the rows and columns of the assembled matrix and vector. Depending on your use case, one strategy may be more convenient than the others.
+
+## SubAssembledRows
+
+!!! info
+    - **Main idea:** Both columns and rows are ghosted, with (potentially) different ghost layouts. Assembly is costly but matrix-vector products are cheap.
+    - **Pros:** Matrix-vector product fills both owned and ghost rows of the output vector. Communication is therefore not required to make the output vector consistent.
+    - **Cons:** Communication is required to assemble the matrix and vector.
+    - **Use cases:** Default assembly strategy.
+
+- Each processor integrates over the **owned cells**, i.e., there are no duplicated cell contributions. However, processors do not hold all the contributions they need to assemble their matrix and vector.
+- 
+
+## FullyAssembledRows
+
+!!! info
+    - **Main idea:** Columns are ghosted, but rows only contain owned indices. Assembly is cheap but matrix-vector products are costly.
+    - **Pros:** Assembly is local, i.e., no communication is required. Column vectors can also be used as row vectors.
+    - **Cons:** Matrix-vector product only fills the owned rows of the output vector. Communication is therefore required to make the output vector consistent.
+    - **Use cases:** This is the strategy used by PETSc. You should also use this strategy if you plan to feed back output row vectors as input column vectors during successive matrix-vector products.
+
+- Each processor integrates over **all its local (owned + ghost) cells**, i.e., contributions for interface cells are duplicated. This implies that each processor has access to **all** the contributions for its **owned dofs** without the need for any communication.
+- Contributions whose row index is not owned by the processor are discarded, while owned rows can be fully assembled without any communication.
+
+## FEConsistentAssembly
+
+!!! info
+    - **Main idea:** Same as `FullyAssembledRows` but the ghost layout for the columns is the same as the original `FESpace` ghost layout.
+    - **Pros:** Assembly is local, i.e., no communication is required. DoF `PVector`s from the `FESpace` can be used as column and row vectors for the matrix (like in serial).
+    - **Cons:** Matrix-vector product only fills the owned rows of the output vector. Communication is therefore required to make the output vector consistent.
+    - **Use cases:** You should use this strategy if you are constantly creating `FEFunction`s with vectors coming from the linear system (and vice versa). This is quite typical for geometric solvers.
diff --git a/docs/src/index.md b/docs/src/index.md
index 9fcca624..ca7e658b 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -8,21 +8,18 @@ Documentation of the `GridapDistributed.jl` library.
 
 ## Introduction
 
-The ever-increasing demand for resolution and accuracy in mathematical models of physical processes governed by systems of Partial Differential Equations (PDEs) 
-can only be addressed using fully-parallel advanced numerical discretization methods and scalable solution methods, thus able to exploit the vast amount of computational resources in state-of-the-art supercomputers. To this end, `GridapDistributed.jl` is a registered software package which provides 
-fully-parallel distributed memory data structures and associated methods
-for the Finite Element (FE) numerical solution of PDEs on parallel computers. Thus, it can be run on multi-core CPU desktop computers at small scales, as well as on HPC clusters and supercomputers at medium/large scales. The data structures in `GridapDistributed.jl` are designed to mirror as far as possible their counterparts in the `Gridap.jl` Julia software package, while implementing/leveraging most of their abstract interfaces. As a result, sequential Julia scripts written in the high-level Application Programming Interface (API) of `Gridap.jl` can be used verbatim up to minor adjustments in a parallel distributed memory context using `GridapDistributed.jl`.
+The ever-increasing demand for resolution and accuracy in mathematical models of physical processes governed by systems of Partial Differential Equations (PDEs) can only be addressed using fully-parallel advanced numerical discretization methods and scalable solution methods, thus able to exploit the vast amount of computational resources in state-of-the-art supercomputers. To this end, `GridapDistributed.jl` is a registered software package which provides fully-parallel distributed memory data structures and associated methods for the Finite Element (FE) numerical solution of PDEs on parallel computers. Thus, it can be run on multi-core CPU desktop computers at small scales, as well as on HPC clusters and supercomputers at medium/large scales. The data structures in `GridapDistributed.jl` are designed to mirror as far as possible their counterparts in the `Gridap.jl` Julia software package, while implementing/leveraging most of their abstract interfaces. As a result, sequential Julia scripts written in the high-level Application Programming Interface (API) of `Gridap.jl` can be used verbatim up to minor adjustments in a parallel distributed memory context using `GridapDistributed.jl`.
 This equips end-users with a tool for the development of simulation codes able to solve real-world application problems on massively parallel supercomputers while using a highly expressive, compact syntax, that resembles mathematical notation. This is indeed one of the main advantages of `GridapDistributed.jl` and a major design goal that we pursue.
 
 In order to scale FE simulations to large core counts, the mesh used to discretize the computational domain on which the PDE is posed must be partitioned (distributed) among the parallel tasks such that each of these only holds a local portion of the global mesh. The same requirement applies to the rest of data structures in the FE simulation pipeline, i.e., FE space, linear system, solvers, data output, etc. The local portion of each task is composed by a set of cells that it owns, i.e., the **local cells** of the task, and a set of off-processor cells (owned by remote processors) which are in touch with its local cells, i.e., the **ghost cells** of the task.
-This overlapped mesh partition is used by `GridapDistributed.jl`, among others, to exchange data among nearest neighbors, and to glue together global Degrees of Freedom (DoFs) which are sitting on the interface among subdomains. Following this design principle, `GridapDistributed.jl` provides scalable parallel data structures and associated methods for simple grid handling (in particular, Cartesian-like meshes of arbitrary-dimensional, topologically n-cube domains), FE spaces setup, and distributed linear system assembly. It is in our future plans to provide highly scalable linear and nonlinear solvers tailored for the FE discretization of PDEs (e.g., linear and nonlinear matrix-free geometric multigrid and domain decomposition preconditioners). In the meantime, however, `GridapDistributed.jl` can be combined with other Julia packages in order to realize the full potential required in real-world applications. These packages and their relation with `GridapDistributed.jl` are overviewed in the next section. 
+This overlapped mesh partition is used by `GridapDistributed.jl`, among others, to exchange data among nearest neighbors, and to glue together global Degrees of Freedom (DoFs) which are sitting on the interface among subdomains. Following this design principle, `GridapDistributed.jl` provides scalable parallel data structures and associated methods for simple grid handling (in particular, Cartesian-like meshes of arbitrary-dimensional, topologically n-cube domains), FE spaces setup, and distributed linear system assembly. It is in our future plans to provide highly scalable linear and nonlinear solvers tailored for the FE discretization of PDEs (e.g., linear and nonlinear matrix-free geometric multigrid and domain decomposition preconditioners). In the meantime, however, `GridapDistributed.jl` can be combined with other Julia packages in order to realize the full potential required in real-world applications. These packages and their relation with `GridapDistributed.jl` are overviewed in the next section.
 
 ## Building blocks and composability
 
 The figure below depicts the relation among `GridapDistributed.jl` and other packages in the Julia package ecosystem. The interaction of `GridapDistributed.jl` and its dependencies is mainly designed with separation of concerns in mind towards high composability and modularity. On the one hand, `Gridap.jl` provides a rich set of abstract types/interfaces suitable for the FE solution of PDEs. It also provides realizations (implementations) of these abstractions tailored to serial/multi-threaded computing environments. `GridapDistributed.jl` **implements** these abstractions for parallel distributed-memory computing environments. To this end, `GridapDistributed.jl` also leverages (**uses**) the serial realizations in `Gridap.jl` and associated methods to handle the local portion on each parallel task. (See arrow labels in the figure below.)  On the other hand, `GridapDistributed.jl` relies on `PartitionedArrays.jl` in order to handle the parallel execution model (e.g., message-passing via the Message Passing Interface (MPI)), global data distribution layout, and communication among tasks. `PartitionedArrays.jl` also provides a parallel implementation of partitioned global linear systems (i.e., linear algebra vectors and sparse matrices) as needed in grid-based numerical simulations. While `PartitionedArrays.jl` is an stand-alone package, segregated from `GridapDistributed.jl`, it was designed with parallel FE packages such as `GridapDistributed.jl` in mind. In any case, `GridapDistributed.jl` is designed so that a different distributed linear algebra library from `PartitionedArrays.jl` might be used as well, as far as it is able to provide the same functionality. 
 
 
-| ![fig:packages](./packages_sketchy.png) |
+| ![fig:packages](./assets/packages_sketchy.png) |
 |:--:|
 |`GridapDistributed.jl` and its relation to other packages in the Julia package ecosystem. In this diagram, each rectangle represents  a Julia package, while the (directed) arrows represent relations (dependencies) among packages. Both the direction of the arrow and the label attached to the arrows are used to denote the nature of the relation. Thus, e.g., `GridapDistributed.jl` depends on `Gridap.jl` and `PartitionedArrays.jl` , and GridapPETSc depends on `Gridap.jl`  and `PartitionedArrays.jl` . Note that, in the diagram, the arrow direction is relevant, e.g., GridapP4est depends on `GridapDistributed.jl` but not conversely.|
 
@@ -43,4 +40,4 @@ Here, one can find a list of resources to get started with this programming lang
 
 * First steps to learn Julia form the [Gridap wiki](https://github.com/gridap/Gridap.jl/wiki/Start-learning-Julia) page.
 * Official webpage [docs.julialang.org](https://docs.julialang.org/)
-* Official list of learning resources [julialang.org/learning](https://julialang.org/learning/)
\ No newline at end of file
+* Official list of learning resources [julialang.org/learning](https://julialang.org/learning/)