---
title: "Documentation"
---

Documentation is an essential part of any software project. It is the way to communicate with potential users and contributors, and to ensure that the project is sustainable in the long term.

# R users

## DESCRIPTION file

For your entire project, you will need a `DESCRIPTION` file, which gathers the project metadata, for instance:

```
Package: mypackage
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.1000
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
Imports: Rpackage1, Rpackage2
```

Here, `Imports` lists the R packages that are needed to run your analysis. Some of these fields can be edited by hand, while others are generated or updated automatically by the `devtools` or `usethis` packages.

## Function documentation: basics

- What is needed in the function documentation?
    1. what your function does
    2. which arguments it takes
    3. what it returns
    4. (optionally) some examples of how to use it

- Here is an example of a `roxygen2` header for a custom `add()` function:
```r
#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @returns A numeric vector.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}
```

- You can add many other tags to your documentation, such as:
    - `@export` to make the function available to users of your package
    - `@importFrom` to import a function from another package
    - `@seealso` to refer to other functions

- Write the function and its documentation together in a `my-function.R` file, stored in the `R/` subdirectory of your package.

- Use `roxygen2` to build the documentation from the header by running the `devtools` function `document()`:
```r
devtools::document()
```
This will generate (or update) your package’s `.Rd` files in the `man/` directory (e.g. `man/add.Rd` for the `add()` function).

## Package documentation

For more integrated documentation of your package, describing its functions, datasets, and other objects, you can use [vignettes](https://r-pkgs.org/vignettes.html), which generate web pages combining code, results, plots, and comments, and [pkgdown](https://r-pkgs.org/website.html) to create a website for your package.

Also see the [CI/CD page](ci_cd.qmd) to automate the publishing of vignettes and websites.


# Python users

## `README.md` file
This is the main documentation file of your project. It is located at the root of the project and should contain a general description of the project, its purpose, and how to use it. It is the first thing users will see when they visit your project on GitHub or GitLab (or wherever you host your code).

Here is a list of things that you should include in your `README.md` file:

- ___Name___ of the project / package. Ideally, it should match the name of the repository.
- ___Badges___: small images that show the status and the quality of your project. They are especially useful if you want to distribute your project / package to users. For example, you can add a badge that shows:
    - the build status of the project: [![CI Build](https://github.com/pandera-dev/pandera/workflows/CI%20Tests/badge.svg?branch=main)](https://github.com/pandera-dev/pandera/actions?query=workflow%3A%22CI+Tests%22+branch%3Amain)
    - the build status of the documentation: [![Documentation Status](https://readthedocs.org/projects/phenomenal/badge/?version=latest)](https://phenomenal.readthedocs.io/en/latest/?badge=latest)
    - the version of the package on PyPI: [![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg)](https://pypi.org/project/pandera/) or on Conda: [![Last version](https://anaconda.org/openalea3/openalea.phenomenal/badges/version.svg)](https://anaconda.org/OpenAlea3/openalea.phenomenal/files)
    - [and many more](https://shields.io/)...
- ___Description___: A short description of the project / package. One to three sentences is generally enough: just enough to give an idea of what the project is about, and generally not too technical.
- ___Installation___: How to install the package. This should include the command to install the package using `pip` or `conda`, and any other dependencies that need to be installed.
- ___Usage___: How to use the package. This should include an example of the most basic use case of the package.
- ___Links___: Links to the documentation, tutorials, the issue tracker, the source code, the license, etc.
- ___Contributing___: How to contribute to the project: how to report bugs, request new features, submit code changes, and set up the development environment.
- ___Citation___: How to cite the project.

## Documentation of API
`API` stands for Application Programming Interface. It refers to the functions, classes, and modules of your package that users call to interact with your code. Documenting the API is essential for users to understand how to use your package.

### Docstrings

#### What is it and how to write it?
In Python, documentation is written in a `docstring`: a string literal, enclosed in triple double quotes (`"""`), that is the first statement in a module, function, class, or method. The docstring should describe what the function does, which arguments it takes and their types (e.g. `str`, `bool`), and what it returns. Python stores it in the object's `__doc__` attribute, where it is read by the built-in `help()` function and by the `pydoc` module to generate documentation.
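As a minimal illustration, the sketch below defines a small hypothetical function, `greet`, and reads its docstring back in the ways mentioned above: through the `__doc__` attribute, through `inspect.getdoc()`, and through `pydoc`, which is what `help()` uses to render its text.

```python
import inspect
import pydoc


# Hypothetical example function, used only to show where the docstring ends up.
def greet(name: str) -> str:
    """Return a greeting for ``name``."""
    return f"Hello, {name}!"


# The docstring is stored on the function object itself.
print(greet.__doc__)

# inspect.getdoc() returns the same text with indentation cleaned up.
print(inspect.getdoc(greet))

# pydoc renders the plain-text help page that help(greet) would display.
print(pydoc.render_doc(greet, renderer=pydoc.plaintext))
```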

You need to consistently write `docstrings` for all the functions, classes, and modules in your package.

There are several conventions for writing `docstrings` in Python. The most common ones are:

- [Google style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
- [NumPy style](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard)
- [reStructuredText](https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html)
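To compare conventions, here is a sketch of an `add` function documented in Google style, loosely following the Napoleon example linked above. The types are taken from the type hints instead of being repeated in the text; this is a common choice, not a requirement of the style.

```python
def add(x: int, y: int) -> int:
    """Add together two numbers.

    Args:
        x: A number.
        y: Another number.

    Returns:
        The sum of ``x`` and ``y``.

    Examples:
        >>> add(1, 1)
        2
    """
    return x + y
```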

#### Example
Here is an example of a function with a `docstring`:
```python
def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Parameters
    ----------
    x : int
        A number.
    y : int
        A number.

    Returns
    -------
    int
        The sum of x and y.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y
```
This function simply adds two numbers together. Its `docstring` provides:

- a description of what the function does
- the inputs / parameters of the function and their types
- the output of the function and its type
- simple examples of how to use the function. Note that these examples can be executed with the `doctest` module, which provides another convenient way to [test the function](testing.qmd#why-do-you-need-tests-). The lines to execute are prefixed with `>>>`, and the expected output follows on the next line.
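A minimal sketch of running these examples with `doctest`: the hypothetical helper `run_doctests` collects the `>>>` examples from a single object's docstring, runs them, and returns the number of failed and attempted examples.

```python
import doctest


def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y


def run_doctests(obj) -> tuple[int, int]:
    """Run the examples found in the docstring of obj; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # The examples are executed with only obj itself in their namespace.
    for test in finder.find(obj, obj.__name__, globs={obj.__name__: obj}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_doctests(add))
```

From the command line, `python -m doctest -v my_module.py` runs all the examples of a module at once (`my_module.py` is a placeholder name).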

> Note:
>
> The arguments in the function definition have "type hints" (e.g. `x: int`). They are not mandatory, but adding them is good practice: they add another layer of documentation, make the code more readable, and help catch bugs early. You can also annotate the return type of the function with `->` (e.g. `-> int`). Python does not enforce type hints at runtime, but a static type checker such as `mypy` can analyze your code and report inconsistent types.
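The short sketch below illustrates both points of the note: the hints are available at runtime through `typing.get_type_hints()`, yet nothing prevents a call with the wrong types. Running `mypy` on a file containing the last call should report an incompatible argument type.

```python
from typing import get_type_hints


def add(x: int, y: int) -> int:
    return x + y


# The hints are stored as metadata on the function object.
print(get_type_hints(add))

# Python does not check them at runtime: passing two strings "works"
# and returns their concatenation, "12". A static checker such as mypy
# would flag this call as an error.
print(add("1", "2"))
```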


For more complex and extensive examples, have a look at the `xarray` package, which has very good API documentation: see [the `Dataset` class documentation](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html#xarray.Dataset) and [the associated docstring](https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L555).

## Tutorials
Tutorials are a great way to show users how to use your package. They can be written as Jupyter notebooks (`.ipynb` files). Here are some great examples of tutorial galleries:

- [xarray](https://docs.xarray.dev/en/stable/gallery.html)
- [geopandas](https://geopandas.org/en/stable/gallery/index.html)
- [scikit-learn](https://scikit-learn.org/stable/auto_examples/index.html)

## `Sphinx` documentation
To organize your documentation and automatically build a table of contents, the API reference, and the tutorials, you can use [`Sphinx`](https://www.sphinx-doc.org/en/master/). It is not the only documentation generator, but it is one of the most popular; another popular framework is [`MkDocs`](https://www.mkdocs.org/).

`Sphinx` generates static websites (plain HTML pages that need no server-side code) from your source files and templates. It is highly customizable with extensions and themes, and can produce documentation in many formats (HTML, PDF, ePub, etc.). It can also be used to document languages other than Python.

You can have a look at the [Sphinx themes gallery](https://sphinx-themes.org/). Popular themes include `PyData`, `Furo`, and `Read the Docs`.
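As a rough starting point, a minimal `docs/conf.py` might look like the sketch below. The project name, author, and theme are placeholder choices; `sphinx.ext.autodoc` builds the API reference from your docstrings, and `sphinx.ext.napoleon` lets it read NumPy- and Google-style docstrings. Running `sphinx-quickstart` generates a fuller version of this file for you.

```python
# docs/conf.py: a minimal Sphinx configuration (placeholder values).
project = "mypackage"
author = "First Last"

extensions = [
    "sphinx.ext.autodoc",   # generate API documentation from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstrings
    "sphinx.ext.viewcode",  # link documented objects to their source code
]

# Theme from the gallery; it must be installed separately (pip install furo).
html_theme = "furo"
```

With this configuration, a directive such as `.. autofunction:: mypackage.add` in a `.rst` page pulls in the docstring of a (hypothetical) `mypackage.add` function, and `sphinx-build -b html docs docs/_build/html` builds the HTML site.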

## Package documentation
### Syntax
`Sphinx` (with its extensions) can handle three types of syntax for the documentation:

- `reStructuredText` (`.rst` files): the native syntax of `Sphinx`, used for many years, but it has lost some popularity to `Markdown` and `MyST`.
- `Markdown` (`.md` files): a very popular syntax for writing documentation (also used in Jupyter notebooks), as it is simple and easy to read. However, plain `Markdown` does not support some documentation features, such as cross-references, custom elements, or colored callout blocks. You need the `myst-parser` extension to use `Markdown` in `Sphinx`.
- `MyST` (`.md` or `.myst` files): an extension of `Markdown` that adds the directives and roles of `reStructuredText` (cross-references, callouts, etc.). It is more powerful than `Markdown` and more readable than `reStructuredText`. You also need the `myst-parser` extension to use `MyST` in `Sphinx`.
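A sketch of the corresponding `conf.py` lines, assuming `myst-parser` is installed (`pip install myst-parser`). The `source_suffix` mapping lets `.rst` and `.md` pages coexist in the same project, and `myst_enable_extensions` turns on optional MyST syntax; here `colon_fence`, which allows `:::` blocks for directives such as callouts.

```python
# docs/conf.py (excerpt): enable Markdown / MyST pages with myst-parser.
extensions = [
    "myst_parser",
]

# Parse .rst files with the native parser and .md files with MyST.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Optional MyST syntax extensions, e.g. ":::" fences for directives.
myst_enable_extensions = ["colon_fence"]
```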

### Building gallery of tutorials
To build a gallery of tutorials, you can use either the [`Sphinx Gallery`](https://sphinx-gallery.github.io/stable/index.html) or the [`nbsphinx`](https://nbsphinx.readthedocs.io/en/latest/) extension. `Sphinx Gallery` is more powerful and generates the gallery from `.py` scripts, while `nbsphinx` is simpler and generates the gallery from `.ipynb` notebooks.
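For `Sphinx Gallery`, the setup is again a few lines in `conf.py`. In this sketch, `../examples` (the folder holding the example scripts) and `auto_examples` (where the generated pages are written) are placeholder paths. With `nbsphinx`, adding `"nbsphinx"` to `extensions` is enough for Sphinx to render the notebooks listed in your table of contents.

```python
# docs/conf.py (excerpt): build a tutorial gallery with sphinx-gallery.
extensions = [
    "sphinx_gallery.gen_gallery",  # pip install sphinx-gallery
]

sphinx_gallery_conf = {
    # Folder containing the example .py scripts (placeholder path).
    "examples_dirs": "../examples",
    # Folder where the generated gallery pages are written.
    "gallery_dirs": "auto_examples",
}

# Alternative for notebook-based tutorials: render .ipynb files with nbsphinx.
# extensions = ["nbsphinx"]
```

By default, `Sphinx Gallery` executes only the scripts whose file names start with `plot`; the `filename_pattern` option changes this.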