```{r head, child="metadata.yaml"}
```
```{r setup, echo = FALSE}
require(knitr)
options(width=60, width.cutoff=60)
opts_chunk$set(tidy=TRUE)
hook_source_def = knit_hooks$get('source')
knit_hooks$set(source = function(x, options){
if (!is.null(options$verbatim) && options$verbatim){
opts = gsub(",\\s*verbatim\\s*=\\s*TRUE\\s*", "", options$params.src)
bef = sprintf('\n\n ```{r %s}\n', opts)
stringr::str_c(bef, paste(knitr:::indent_block(x, " "), collapse = '\n'), "\n ```\n")
} else {
hook_source_def(x, options)
}
})
rinline <- function(code){
sprintf('`r %s`', code)
}
```
```{r numbered_chunk_hook, eval=FALSE, echo=FALSE}
# The current chunk hook may not be the default, and should be processed
# prior to doing the numbering; which should come last.
previous_chunk_hook <- knitr::knit_hooks$get("chunk")
knitr::knit_hooks$set(chunk = function(x, options) {
x <- previous_chunk_hook(x, options)
if (isTRUE(options$number)) {
str <- "{.r .numberLines"
if (!is.null(options$startFrom)) {
str <- paste0(str, " startFrom=\"", options$startFrom, "\"")
}
str <- paste0(str, "}")
x <- gsub("(\\s?[`]{3,})r", paste0("\\1", str), x)
}
return(gsub("(^\n|\n+$)", "", x))
})
```
````{r numbered_chunk_hook, echo=FALSE}
````
# Introduction
This is an introduction to the use of [Markdown](http://daringfireball.net/projects/markdown/)
with embedded R code to create dynamic documents in multiple formats,
e.g. HTML, PDF and Word. This is useful to generate reports (or papers) that contain all
the relevant R code to carry out the analysis and allows for automatic updates to the
document if either the code or the data change. As a result analyses become a lot
easier to reproduce because the code and the presentation of results are closely
linked and figures and tables can be updated automatically.
Traditionally dynamic R documents like this have been (and often still are)
written in LaTeX using either Sweave or, more recently, `knitr`. While LaTeX is a very
powerful tool that allows great control over page layout, the learning curve can be
steep. More importantly, adding LaTeX commands to the text can be distracting and break
the flow of writing and coding (at least for me) and the resulting LaTeX documents
are not very readable. Of course they can be turned into beautiful PDFs but that doesn't
help while editing the text. More recently the use of Markdown has become popular.
Writing Markdown is much easier than LaTeX, thus lowering the entry barrier, and its
emphasis on maintaining readability of the raw text means that both writing and editing
documents is faster than with LaTeX.
Several tools are available to produce dynamic documents in Markdown and convert them
to various output formats. Here we will mainly focus on a combination of two of these,
namely [`knitr`](http://yihui.name/knitr/) and [`pandoc`](http://johnmacfarlane.net/pandoc/).
Much, if not all, of what is needed to create a reproducible analysis is provided by
`knitr`. This R package provides functions that allow the processing of Markdown documents
with embedded R code. The code will be executed and its output, including plots,
can be included in the output. A selection of tutorials and useful examples for
`knitr` can be found on [`knitr`'s homepage](http://yihui.name/knitr/demo/showcase/).
However, when trying to use this to generate publication quality reports the
limitations of the Markdown syntax quickly start to become apparent. The focus on
simplicity and the fact that it was originally designed for authoring web content means
that many of the requirements for scientific writing are not easily met by standard Markdown.
`Pandoc` is a very useful tool that helps to alleviate this problem. It comes with
[its own Markdown dialect](http://johnmacfarlane.net/pandoc/demo/example9/pandocs-markdown.html)
that includes many extensions that fill some of the gaps in the
Markdown syntax, including the ability to use bibliographic databases in a variety of
formats, while trying to retain the text's readability. It also facilitates the
conversion between a large number of
[document formats](http://johnmacfarlane.net/pandoc/diagram.png), providing
great flexibility.
## Code conventions
Throughout this document examples of R code and Markdown formatting will be presented in
code blocks:
```r
message("This is R code")
```
To better show the effect of Markdown examples on the output these will often be
followed by the same text rendered in the output format. To distinguish these examples
from the main text the entire block of raw and converted Markdown will be framed by
horizontal lines.
----
```markdown
This is Markdown text in **bold** and *italics*.
```
This is Markdown text in **bold** and *italics*.
----
In addition to the code examples provided throughout this document the document itself is
written in Markdown with embedded R code and may illustrate additional features.
## Availability
The HTML version of this document is [available online](http://galahad.well.ox.ac.uk/repro/index.html).
The [PDF version](http://galahad.well.ox.ac.uk/repro/repro_example.pdf) is available for download and
the source files are on [GitHub](https://github.com/humburg/reproducible-reports).
## Compiling this document
Creating PDF and HTML output from the R/Markdown source file is a two step process.
First `knitr` is used to execute the R code and produce the corresponding Markdown
output. This can be done either by starting an R session and executing
`knit("example.Rmd")` or from the command line:
```{.bash}
Rscript --slave -e "library(knitr);knit('example.Rmd')"
```
Either way this generates a Markdown file called 'example.md'. This can then be
converted into PDF and HTML files by using the configuration file 'example.pandoc'
by calling the pandoc function from the `knitr` package.
```{.bash}
Rscript --slave -e "library(knitr);pandoc('example.md')"
```
The function automatically locates the configuration file and passes the requested
parameters to `pandoc`.
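
Both steps can also be chained from a single R session. The following is a minimal sketch rather than part of the build setup described here: the wrapper name `compileReport` is hypothetical, and it assumes that `knitr` and `pandoc` are installed and that the configuration file sits next to the Markdown file.

```r
# Hypothetical helper: knit an R/Markdown file, then convert the result
# using the formats listed in the accompanying pandoc configuration file.
compileReport <- function(rmd = "example.Rmd") {
  stopifnot(file.exists(rmd))
  # knit() returns the path of the Markdown file it has written
  md <- knitr::knit(rmd)
  # pandoc() looks for the configuration file belonging to 'md'
  knitr::pandoc(md)
}
```

Calling `compileReport("example.Rmd")` from the directory containing the source file should then produce all configured output formats in one go.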
### Required software
In addition to installations of `knitr` and `pandoc` a few external tools
are required to compile this document.
[R](http://r-project.org/) is required to run `knitr` as well as other R packages
to support additional functionality.
Additional R packages used:
* [animation](http://cran.r-project.org/web/packages/animation/index.html)
(for animated figures)
* [pander](http://cran.r-project.org/web/packages/pander/index.html) (for Markdown
formatting of R objects)
These can be installed via the `install.packages` command from within R.
Animations also require [ffmpeg](https://www.ffmpeg.org/) and either
[ImageMagick](http://www.imagemagick.org/) or
[GraphicsMagick](http://www.graphicsmagick.org/).
As one might expect a working LaTeX tool chain is required to generate
PDF output from LaTeX documents. Several distributions are available
online, including [MiKTeX](http://miktex.org/) and [TeX Live](https://www.tug.org/texlive/).
[Python](https://www.python.org/) ($\geq$ 2.7) is required for the `pandoc`
filters discussed in the later parts of this document. This also requires
the `pandocfilters` Python module, which can be installed via
[pip](https://pypi.python.org/pypi/pip).
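
Before compiling, it may be useful to check which of these prerequisites are already present. The sketch below uses only base R. The executable names are assumptions and may differ between platforms (for example `python3` instead of `python`).

```r
# Report which of the R packages used in this document are installed.
pkgs <- c("knitr", "animation", "pander")
pkgInstalled <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(pkgInstalled)

# Look up external tools on the search path.
# An empty string means that no executable of that name was found.
tools <- c("pandoc", "ffmpeg", "convert", "gm", "pdflatex", "python")
toolPaths <- Sys.which(tools)
print(toolPaths)
```

Anything reported as missing can then be installed before attempting to build the document.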
# Brief Markdown primer
A Markdown formatted file is in essence a plain text file that may contain a number of
formatting marks. It is designed to be easy to write and read in its raw form. Although
it was originally designed as an easier way to write web pages it can be converted to
many other rich text formats.
The purpose of this section is to briefly describe basic elements of Markdown formatting.
More detailed descriptions are available online, e.g. at the official
[Markdown](http://daringfireball.net/projects/markdown/syntax) and
[`pandoc`](http://johnmacfarlane.net/pandoc/README.html) websites.
## Headers, paragraphs and emphasis
The basics of text formatting involve marking of text as headings, structuring
it into paragraphs and highlighting selected words for emphasis. Headings can be
created by underlining them:
```markdown
This is a top level heading
===========================

This is some ordinary text.

This is a second level heading
------------------------------

It is followed by more ordinary text.

### Third level heading

Adding more "#" creates lower level headings
```
Note that "#" and "##" can also be used for first and second level headings.
Paragraphs are created by adding an empty line between two lines of text:
----
```markdown
This is the first paragraph.
Line breaks are generally ignored in formatting.

This is the second paragraph.
If you add two or more spaces to the end of a line
the line break will be preserved in the conversion
to the output format.
```
This is the first paragraph.
Line breaks are generally ignored in formatting.
This is the second paragraph.
If you add two or more spaces to the end of a line
the line break will be preserved in the conversion
to the output format.
----
Several methods of highlighting text are supported:
----
```markdown
Words within a paragraph can be *emphasised*. These are usually rendered in *italics*.
**Strong emphasis** typically results in **bold** text. Instead of * it is also possible
to use _ for emphasis. With pandoc it is also possible to ~~strike out~~ text.
```
Words within a paragraph can be *emphasised*. These are usually rendered in *italics*.
**Strong emphasis** typically results in **bold** text. Instead of * it is also possible
to use _ for emphasis. With pandoc it is also possible to ~~strike out~~ text.
----
## Block elements
### Block quotes
In Markdown quotes can be marked using the same conventions commonly used in email:
----
```markdown
> This text is quoted. A single ">" at the beginning
> of the paragraph is sufficient for the entire paragraph
> to be quoted (but syntax highlighting may not work properly).
>
> > You can also quote other quotes, i.e. block quotes can be nested.
```
> This text is quoted. A single ">" at the beginning
> of the paragraph is sufficient for the entire paragraph
> to be quoted (but syntax highlighting may not work properly).
>
> > You can also quote other quotes, i.e. block quotes can be nested.
----
### Lists
Basic bullet lists can be created by starting a line with a \*:
----
```markdown
* first item
* second item
* third item
```
* first item
* second item
* third item
----
Ordered lists start with numbers
----
```markdown
1. first item
2. second item
3. third item
```
1. first item
2. second item
3. third item
----
but `pandoc` also allows this:
----
```markdown
#. first item
#. second item
#. third item
```
#. first item
#. second item
#. third item
----
There is support for other list types and variations of the basic syntax
in `pandoc`. See the [documentation](http://johnmacfarlane.net/pandoc/README.html#lists)
for more details.
### Tables
Basic Markdown tables are created by lining up the columns and making headers, like so:
----
```markdown
Column 1   Column 2   Column 3
--------   --------   --------
1          10         100
2          20         200
3          30         300

Table: A simple table
```

Column 1     Column 2     Column 3
----------   ----------   ----------
1            10           100
2            20           200
3            30           300

Table: A simple table

----
Just as with lists there are several variations and extensions to this basic syntax
supported by `pandoc`. As usual, details can be found in the
[documentation](http://johnmacfarlane.net/pandoc/README.html#tables).
### Code blocks and inline code
Special blocks to display source code with syntax highlighting can be included by starting
a line with three back ticks, optionally followed by attributes to control aspects of
the highlighting. A block like the one below will be rendered as R code.
----
````
```r
x <- seq(-6,6, by=0.1)
yNorm <- dnorm(x)
yt <- dt(x, df=3)
yCauchy <- dcauchy(x)
plot(x, yNorm, type="l", ylab="Density")
lines(x, yt, col=2)
lines(x, yCauchy, col=4)
legend("topright", legend=c("standard normal", "t (df=3)", "Cauchy"),
col=c(1,2,4), lty=1)
```
````
```r
x <- seq(-6,6, by=0.1)
yNorm <- dnorm(x)
yt <- dt(x, df=3)
yCauchy <- dcauchy(x)
plot(x, yNorm, type="l", ylab="Density")
lines(x, yt, col=2)
lines(x, yCauchy, col=4)
legend("topright", legend=c("standard normal", "t (df=3)", "Cauchy"),
col=c(1,2,4), lty=1)
```
----
Code fragments can also be included inline:
----
```markdown
This is normal text with some R code: `x <- runif(100)`{.r}.
```
This is normal text with some R code: `x <- runif(100)`{.r}.
----
# Using `knitr` for dynamic code blocks
The code blocks we have seen so far are all static, i.e. while they do include
valid source code this code is not interpreted, just displayed. To achieve the aim of
a dynamic document that can be updated automatically if the underlying data or analysis
change we need code blocks that are actually executed. The `knitr` R package does just
that. Let's look again at the R code from the example in the previous section, but this
time we will use a code block that `knitr` will process.
````
```{r distributions, verbatim = TRUE}
x <- seq(-6,6, by=0.1)
yNorm <- dnorm(x)
yt <- dt(x, df=3)
yCauchy <- dcauchy(x)
```
````
In this example syntax highlighting for the R code has been switched off
to better demonstrate how the code chunks are created. Once the code in the above
block has been evaluated we can use it for inline R statements that will be
replaced with the computed values. For example we can do something like this:
----
```markdown
The Normal density was evaluated at `r rinline("length(yNorm)")` points.
```
The Normal density was evaluated at `r length(yNorm)` points.
----
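
To see what the inline statement evaluates to, the same expression can be run in an R session. The grid runs from -6 to 6 in steps of 0.1 and therefore contains 121 points, so the sentence above is rendered as "The Normal density was evaluated at 121 points."

```r
# Same grid as in the 'distributions' chunk
x <- seq(-6, 6, by = 0.1)
yNorm <- dnorm(x)
length(yNorm)  # 121
```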
## Figures
To add a figure with a plot of the data all that is needed is to create the
plot in an R chunk.
```{r distribution_plot, fig.cap="Three related distributions"}
plot(x, yNorm, type="l", ylab="Density")
lines(x, yt, col=2)
lines(x, yCauchy, col=4)
legend("topright", legend=c("standard normal", "t (df=3)", "Cauchy"),
col=c(1,2,4), lty=1)
```
This automatically includes the plot that was generated as a figure in the
final document. It is possible to include a custom caption using the chunk option
`fig.cap`.
## Animations
It is possible to include animations (generated from a series of plots) instead of
a single figure. Look at this slightly more complex version of the previous example:
```{r distributions2, fig.show="animate", fig.cap="Movie of three related distributions"}
x <- seq(-6,6, by=0.1)
yNorm <- dnorm(x)
yCauchy <- dcauchy(x)
par(bg="white")
for(i in 1:20){
plot(x, yNorm, type="l", ylab="Density")
lines(x, dt(x, df=i), col=2)
lines(x, yCauchy, col=4)
legend("topright", legend=c("standard normal", paste0("t (df = ", i, ")"), "Cauchy"),
col=c(1,2,4), lty=1)
}
```
It is possible to generate an animated GIF from this sequence of plots by wrapping the
above code in a function and then calling `saveGIF` from the animation package.
```{r distributions3, results="hide"}
threeDists <- function(df, x=seq(-6,6, by=0.1)){
yNorm <- dnorm(x)
yCauchy <- dcauchy(x)
yt <- dt(x, df=df)
par(bg="white")
plot(x, yNorm, type="l", ylab="Density")
lines(x, yt, col=2)
lines(x, yCauchy, col=4)
legend("topright", legend=c("standard normal", paste0("t (df = ", df, ")"), "Cauchy"),
col=c(1,2,4), lty=1)
}
plotFun <- function(df){
threeDists(df)
animation::ani.pause()
}
animation::saveGIF(lapply(1:20, plotFun), interval=0.5, movie.name="dist3.gif")
```
The code above assumes that ImageMagick is installed. If you are using GraphicsMagick
instead add the option `convert="gm convert"` to the `saveGIF` call.
Since the graphics output of this code is written directly to a file rather than an on-screen
graphics device it will not be automatically included in the Markdown document produced
by `knitr`. It can be included manually using the Markdown syntax for the inclusion
of figures.
```markdown
![Animated GIF of three related distributions](dist3)
```
![Animated GIF of three related distributions](dist3)
Note that this only works for output document formats that support GIFs. As a fallback
we generate a png of the first frame to be included in other formats, e.g. PDF, that
can't display GIFs.
```{r dist3-png, results="hide"}
png("dist3.png")
threeDists(1)
dev.off()
```
Here we make use of `pandoc`'s `--default-image-extension` option to set the default
image format to gif for HTML and docx output and to png for PDF.
### Creating animations on Windows
The procedure for generating animations may fail on Windows^[Thanks to
[reyntjesr](https://github.com/reyntjesr) for reporting this issue and providing
the workaround described here.]. This may be due to a conflict
between ImageMagick's `convert.exe`, which is used to convert from png to gif,
and Windows' own `convert.exe`, which converts between FAT and NTFS file systems.
It may be possible to circumvent this issue by using
[GraphicsMagick](http://www.graphicsmagick.org/) for the conversion instead. To
do this, install GraphicsMagick and replace the call to `saveGIF` above with
```r
animation::saveGIF(lapply(1:20, plotFun), interval=0.5, movie.name="dist3.gif",
convert="gm convert")
```
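
To find out whether this conflict is likely to be the cause, it can help to check which executables R finds on the search path. This is a heuristic sketch using base R only. It assumes that a `convert` located inside the Windows system directory is the file system utility rather than ImageMagick.

```r
# Locate the 'convert' and 'gm' executables that R would find on the PATH.
# An empty string means that no executable of that name was found.
converters <- Sys.which(c("convert", "gm"))
print(converters)

# A path inside the Windows system directory suggests that the file system
# tool, rather than ImageMagick, would be picked up (heuristic only).
shadowed <- grepl("system32", converters[["convert"]], ignore.case = TRUE)
if (shadowed) message("'convert' resolves to the Windows system utility")
```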
An alternative workaround involves bypassing the animation package entirely.
Instead, rename ImageMagick's `convert.exe` to `imConvert.exe`^[Instead of
renaming the file it is also possible to use the full path to ImageMagick's `convert.exe`
in the shell command.], generate one
png file for each frame of the animation and then call `imConvert` manually
to create the animated gif.
```r
png(filename="figure/threeDist%02d.png", width=500, height=500)
lapply(1:20, threeDists)
dev.off()
shell("imConvert -delay 40 figure/threeDist*.png dist3.gif")
```
## Tables
It is often convenient to display the contents of R objects in the final document.
While it is easy to simply display the output of R's `print` function as it would
appear in the R console, the results are not particularly attractive. It
is much more elegant to include proper tables that can be rendered nicely in
the final document. Manually formatting the output as a Markdown table (that pandoc
will then convert to the final output format) can be a daunting task. Fortunately
R functions exist to help with this task. The `knitr` package provides a simple function,
`kable`, that allows automatic formatting of tables. This requires the data to be in a
suitable format (either a `data.frame` or a `matrix`) so some preprocessing may be necessary.
Consider the [iris](http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html)
dataset distributed with R.
```{r iris1, results="asis"}
data(iris)
knitr::kable(head(iris))
```
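
Objects that are neither a `data.frame` nor a `matrix` may need to be converted first. As a small sketch of such preprocessing, the species counts below start out as a `table` object and are turned into a `data.frame` with descriptive column names. The variable name `speciesCounts` and the column names are chosen for illustration only.

```r
# table() returns an object of class "table", not a data.frame
speciesCounts <- table(iris$Species)

# Convert to a data.frame and give the columns descriptive names
speciesCounts <- setNames(as.data.frame(speciesCounts), c("Species", "Count"))

# Format as a Markdown table if knitr is available
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(speciesCounts))
}
```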
A table of summary statistics can be obtained with a little extra effort:
```{r iris2}
irisSummary <- apply(iris[,1:4], 2, function(x) tapply(x, iris$Species, summary))
irisSummary <- lapply(irisSummary, do.call, what=rbind)
```
This produces a list of matrices with summary statistics by iris species:
```{r iris3}
irisSummary
```
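
The structure of this list can also be checked without printing all of it. The sketch below repeats the two lines from above so that it is self-contained and otherwise relies only on base R.

```r
irisSummary <- apply(iris[, 1:4], 2, function(x) tapply(x, iris$Species, summary))
irisSummary <- lapply(irisSummary, do.call, what = rbind)

names(irisSummary)          # one entry per measurement column
sapply(irisSummary, dim)    # rows (species) and columns (statistics) of each entry
rownames(irisSummary[[1]])  # row labels of the first matrix
```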
Each of these can again be displayed as a nicely formatted table using `kable`
but unfortunately information about the column that was summarised will be lost
in the process. For some output formats `kable` supports the use of a `caption`
option, but this doesn't work when producing Markdown, as is the
case here. An alternative is to use the `pander` package to produce output
suitable for further processing with pandoc.
```{r iris4, results="asis"}
suppressPackageStartupMessages(library(pander))
panderOptions('table.split.table', Inf) ## don't split tables
pander(irisSummary[1:2])
```
Alternatively the following code produces somewhat more elegant output at
the expense of a few extra lines of code.
```{r iris5, results="asis"}
for(i in 3:4){
set.caption(sub(".", " ", names(irisSummary)[i], fixed=TRUE))
pander(irisSummary[[i]])
}
```
The functionality provided by `pander` is a lot more powerful than the simple `kable`
function and can handle a wide variety of R objects.
```{r iris6, results="asis"}
pander(t.test(Sepal.Length ~ Species=="setosa", data=iris))
```
# Converting from Markdown to multiple output formats using `knitr`
Once R code chunks have been executed via `knitr` the resulting Markdown document can be
converted to a variety of other formats with the help of `pandoc`. This generally
works well but can require the construction of lengthy command lines. To make things
worse these command lines may differ depending on the desired output format. If more
than one output format is desired this can quickly become tedious. Fortunately `knitr`
includes a function `pandoc` that takes care of the conversion process and
can use a configuration file that lists all the desired options for the desired target
formats. The configuration file used for this document is shown below^[This also
demonstrates another feature of `knitr`: It is possible to
[include external documents](http://yihui.name/knitr/demo/child/) using the `child`
chunk option].
````
```{r configFile, child="example.pandoc"}
```
````
This file contains one block with format specific options for each output format,
starting with `t: <format>`. Note that the first block has no target format specification
and contains options that apply to all output formats. The use of a configuration file
like this makes it easy to manage the (potentially large) number of options required to
achieve the desired output.
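
To illustrate the block structure, the following sketch writes a small configuration to a temporary file and parses it with base R's `read.dcf`, which treats blank lines as record separators. The option names and values are placeholders and not the settings used for this document. Reading the file this way is only meant to show how the blocks are separated.

```r
# Hypothetical configuration with one shared block and two format blocks.
# Option names and values are placeholders for illustration.
configLines <- c(
  "bibliography: references.bib",
  "",
  "t: html",
  "o: report.html",
  "",
  "t: latex",
  "o: report.pdf"
)
configFile <- tempfile(fileext = ".pandoc")
writeLines(configLines, configFile)

# Blank lines separate records, so each block becomes one row.
# Fields that are absent from a block are reported as NA.
config <- read.dcf(configFile)
print(config)
```

The first row has no `t` entry, which corresponds to the block of options shared by all output formats.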
# Preparing manuscripts for publication
Once an analysis has been completed and documented using techniques like the ones
described above it may be desirable to use it as part of a publication without having
to re-write it all. The purpose of this chapter is to investigate how well the authoring
of scientific papers in Markdown is supported by the combination of `knitr` and `pandoc`
and to demonstrate customisations to the default set-up where required.
## Requirements
To be able to produce manuscripts that are suitable for submission to scientific journals
several features are required. This section explores to what extent
the combination of `knitr` and `pandoc` can deliver a publication-ready manuscript and
discusses simple extensions to add or enhance required features.
Features essential for a manuscript intended for submission to a journal are:
* References need to be cited throughout the text and listed at the end in a format
specified by the journal.
* Figures and tables need to be numbered and cross-references to these should be
generated automatically, i.e. the numbers referred to in the text are updated
automatically if the order of figures or tables changes.
* Equations need to be rendered appropriately, numbered and cross-referenced where
required.
* A list of author names and affiliations needs to be displayed as part of the title block.
* It has to be possible to precede the main text with an abstract that may have to be
formatted differently from the body of the manuscript.
* Support for footnotes.
## Document meta information
Documents may contain metadata blocks in [YAML](http://www.yaml.org/) format. These
blocks begin with three dashes `---` and end with either three dashes `---` or three
dots `...`. More than one metadata block can be present in the same document in which
case conflicts caused by duplicate fields will be resolved by retaining the field that
occurred first.
Below is the metadata block used for this document.
````yaml
```{r child="metadata.yaml"}
```
````
Information gathered from metadata blocks is used by `pandoc` to populate metadata
fields in the output document. This can be used to set the title, list of authors
and abstract. Entries may contain (nested) lists and objects but note that the default
templates make assumptions about the structure of specific fields. The author field in
particular is expected to be a simple list or string. For the purpose of preparing
reports or publications it may be convenient to use a richer structure, e.g. a list
containing objects for name, affiliation and contact details. See the chapter on
[custom templates](#custom-templates) for details on how this structured author
information might be used.
## Adding a bibliography
Fortunately adding a list of references as well as citing them throughout the document
is well supported by `pandoc`. References need to be contained in a bibliography
file, which can be in a variety of formats (check the
[pandoc documentation](http://johnmacfarlane.net/pandoc/README.html#citations) for
a list of supported formats). This file needs to be listed in the `bibliography`
entry of the document's meta information. A bibliography consisting of all
references that have been cited throughout the document will be generated by `pandoc` and
added to the end of the document. The bibliography is formatted according to a
format specified in a [CSL](http://citationstyles.org/) style file. A browsable
repository with a large number of different styles is available at
[http://zotero.org/styles](http://zotero.org/styles).
A citation is inserted into the text by adding the corresponding key (consisting of a '@'
followed by the citation's identifier from the database) within square brackets.
For example, `[@smith04]` would add a citation to the article with ID `smith04` to the text
and ensure that the corresponding bibliographic information is listed in the bibliography.
Several variations of this are supported by `pandoc`, see the
[documentation](http://johnmacfarlane.net/pandoc/README.html#citations) for details.
## Better figure and table captions
We already discussed how to generate figures and tables in `knitr` and we have seen
that it is easy to add captions to these. However, so far all the figure and table captions
were plain captions without any numbering. What we would like are figure captions that
start with "Figure", or "Fig.", followed by a number and a colon. There currently is no
`pandoc` mechanism that allows such captions to be generated in multiple output formats but
there is an active and ongoing discussion that may lead to support for this in a future
`pandoc` version. In the meantime we can use R to generate suitable labels when processing
the input document with `knitr`.
The following R function allows us to keep track of figures throughout the document,
create appropriately numbered captions as well as cross-references:
```{r fig_cap, number=TRUE}
figRef <- local({
tag <- numeric()
created <- logical()
used <- logical()
function(label, caption, prefix=getOption("figcap.prefix"), sep=getOption("figcap.sep"),
prefix.highlight=getOption("figcap.prefix.highlight")) {
i <- which(names(tag) == label)
if(length(i) == 0){
i <- length(tag) + 1
tag <<- c(tag, i)
names(tag)[length(tag)] <<- label
used <<- c(used, FALSE)
names(used)[length(used)] <<- label
created <<- c(created, FALSE)
names(created)[length(created)] <<- label
}
if(!missing(caption)){
created[label] <<- TRUE
paste0(prefix.highlight, prefix, " ", i, sep, prefix.highlight, " ", caption)
} else {
used[label] <<- TRUE
paste(prefix, tag[label])
}
}
})
```
To get properly numbered figure captions all arguments to `knitr`'s `fig.cap` chunk
option have to be wrapped in a call to `figRef` with two arguments. The first argument
is the label that should be used to refer to the figure and the second argument is the
actual figure caption. The function is designed to allow some customisation. The prefix,
e.g. 'Figure' or 'Fig.', can be set via the `prefix` argument and the separator to be used
between the number and the caption is set by the `sep` argument. It is also possible
to adjust the formatting of the prefix in the figure caption, e.g. to display it with
strong emphasis. For convenience the desired values can be stored together with other
R options.
Here we are setting the defaults to produce captions of the form
"**Figure N:** caption text".
```{r figOptions}
options(figcap.prefix="Figure", figcap.sep=":", figcap.prefix.highlight="**")
```
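
To make the resulting format concrete, the caption string can be assembled by hand from these options. This is a sketch that mirrors the `paste0` call in the caption branch of `figRef`. The figure number and caption text are made up for illustration.

```r
options(figcap.prefix = "Figure", figcap.sep = ":", figcap.prefix.highlight = "**")

# Assemble a caption in the same way as the caption branch of figRef()
prefix <- getOption("figcap.prefix")
sep <- getOption("figcap.sep")
highlight <- getOption("figcap.prefix.highlight")
caption <- paste0(highlight, prefix, " ", 1, sep, highlight, " ", "An example caption.")
caption  # "**Figure 1:** An example caption."
```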
Calling this function with the label as its sole argument will create a reference while
a call with two arguments (label and caption text) will create the actual figure caption.
Consider the following example:
```{r carDataPlot, verbatim=TRUE, fig.cap=figRef("carData", "Car speed and stopping distances from the 1920s.")}
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
las = 1)
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
```
Now it is possible to refer back to this figure in the text using
\``r rinline("figRef(\"carData\")")`\`: `r figRef("carData")` shows a plot of car speeds
and corresponding stopping distances measured in the 1920s. Note the apparent
non-linearity in the data. The log-scale data shown in `r figRef("carLogData")`
has a more linear appearance.
```{r carLogDataPlot, fig.cap=figRef("carLogData", "Car speed and stopping distances on logarithmic scales.")}
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
las = 1, log = "xy")
lines(lowess(cars$speed, cars$dist, f = 2/3, iter = 3), col = "red")
```
Note how this allows us to refer to figures before they are created. Although forward
references should generally be avoided, this isn't always possible when it comes to figures.
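The following stand-alone sketch illustrates why forward references work. It is a simplified counter in the spirit of `figRef`, not the function used in this document; `makeRef` and `ref` are hypothetical names. Numbers are assigned when a label is first seen, whether that happens in a reference or in a caption.

```r
# Hypothetical stand-alone counter mirroring the idea behind figRef;
# makeRef and its arguments are illustrative names only.
makeRef <- function(prefix) {
  tag <- integer()
  function(label, caption) {
    if (!label %in% names(tag)) {
      # number labels in order of first appearance, cited or captioned
      tag[label] <<- length(tag) + 1L
    }
    if (missing(caption)) {
      paste(prefix, tag[[label]])
    } else {
      paste0("**", prefix, " ", tag[[label]], ":** ", caption)
    }
  }
}

ref <- makeRef("Figure")
first_mention <- ref("later")             # cited before its caption exists
caption_a <- ref("early", "First plot.")
caption_b <- ref("later", "Second plot.")
```

One consequence of this design is that numbering follows the order of first mention rather than the order in which figures appear: in the sketch the figure cited first keeps number 1 even though its caption is created after that of another figure.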
To ensure that all figures mentioned in the text actually exist the following code
can be added to a `knitr` chunk at the end of the document.
```{r finaliser, eval=FALSE, echo=TRUE}
```
`r figRef("missingFigure")` doesn't exist and this generates a warning at the end
of the document.
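One way such a check might look is sketched below. It is not necessarily the code used in this document; `checkRefs` and `toyRef` are hypothetical names, and the sketch assumes only that the reference function keeps named logical vectors `created` and `used` in its enclosing environment, as `figRef` does.

```r
# Hypothetical helper: report labels that were referenced but never captioned.
# Assumes `created` and `used` live in the closure's environment, in the
# same order, as in figRef.
checkRefs <- function(refFun, what = "figure") {
  state <- environment(refFun)
  missing_labels <- names(state$used)[state$used & !state$created]
  if (length(missing_labels) > 0) {
    warning("Referenced ", what, "(s) never created: ",
            paste(missing_labels, collapse = ", "), call. = FALSE)
  }
  invisible(missing_labels)
}

# Toy closure with the same bookkeeping, standing in for figRef
toyRef <- local({
  created <- c(carData = TRUE, missingFigure = FALSE)
  used    <- c(carData = TRUE, missingFigure = TRUE)
  function(label) label
})

unresolved <- suppressWarnings(checkRefs(toyRef))
```

With the definitions from this document, a call such as `checkRefs(figRef)` in the final chunk would be expected to warn about `missingFigure`.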
The same approach can be used to obtain numbered table captions and corresponding
references in the text.
```{r tab_cap, number=TRUE}
tabRef <- local({
  tag <- numeric()      # table numbers, named by label
  created <- logical()  # has a caption been generated for the label?
  used <- logical()     # has the label been referenced in the text?
  function(label, caption, prefix = getOption("tabcap.prefix"),
           sep = getOption("tabcap.sep"),
           prefix.highlight = getOption("tabcap.prefix.highlight")) {
    i <- which(names(tag) == label)
    if (length(i) == 0) {
      # first time this label is seen: assign the next free number
      i <- length(tag) + 1
      tag <<- c(tag, i)
      names(tag)[length(tag)] <<- label
      used <<- c(used, FALSE)
      names(used)[length(used)] <<- label
      created <<- c(created, FALSE)
      names(created)[length(created)] <<- label
    }
    if (!missing(caption)) {
      # two arguments: return the formatted caption
      created[label] <<- TRUE
      paste0(prefix.highlight, prefix, " ", i, sep, prefix.highlight, " ", caption)
    } else {
      # label only: return a cross-reference
      used[label] <<- TRUE
      paste(prefix, tag[label])
    }
  }
})
options(tabcap.prefix="Table", tabcap.sep=":", tabcap.prefix.highlight="**")
```
This can then be combined with the `pander` table generation technique demonstrated
above.
```{r carFitPlot, fig.cap=figRef("carFit", "Polynomial regression fits."), results="asis"}
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
las = 1, xlim = c(0, 25))
d <- seq(0, 25, length.out = 200)
for(degree in 1:4) {
  fm <- lm(dist ~ poly(speed, degree), data = cars)
  assign(paste("cars", degree, sep = "."), fm)
  lines(d, predict(fm, data.frame(speed = d)), col = degree)
}
legend("topleft", legend=1:4, col=1:4, lty=1)
set.caption(tabRef("carFit", "ANOVA table for polynomial regression fits to car speed and stopping distance data"))
pander(anova(cars.1, cars.2, cars.3, cars.4))
```
This approach to figure and table captions has the advantage that it works for any
output format and as such is well suited for situations where multiple output formats
are required. The downside is that it doesn't make use of any cross-referencing facilities
that may be supported by one or several of the output formats. For example, LaTeX has
excellent support for this already and HTML output would benefit from the use of links
to the actual figure or table. When a single output format is used it clearly makes
sense to utilise the features it provides as much as possible. The multi-format
approach presented here could be improved by adding some extra
markup and a [pandoc filter](http://johnmacfarlane.net/pandoc/scripting.html) that
turns this markup into format-specific output.
## Structured author information
By default `pandoc` only supports simple strings (or a list of strings) for the
author field in the metadata block. This means that including information in
addition to author names, e.g. affiliations and addresses, is difficult (but note
that strings are interpreted as markdown so some formatting is possible). To really
support the generation of publication ready documents the use of more structured
author fields is desirable. For this document we use author information of the
following form:
```{.yaml}
author:
  - name: Author Name
    affiliation: 1
address:
  - code: 1
    address: Department, Institution, Street address
```
Other fields could be added to this, e.g. to indicate corresponding authors. For
this additional information to be displayed in the output we need to
extend the default templates accordingly. The required changes
to the HTML and LaTeX templates are discussed [below](#structured-author).
## Equations
There is good support for formula rendering in `pandoc`. Formulas can be written
as TeX formulas between `$` (for inline math) or `$$` (for display math). These
will be rendered in an output format specific way. In some formats, like HTML,
the result depends on the command-line options used.
----
```markdown
This is a familiar bit of inline math: $E=mc^2$.
```
This is a familiar bit of inline math: $E=mc^2$.
----
----
```markdown
Here is an equation in display math mode:
$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
```
Here is an equation in display math mode:
$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
----
While this generally works fairly well it doesn't allow for numbered equations.
A possible workaround is to use `pandoc`'s example list feature for this purpose.
An example list consists of consecutively numbered list elements that don't have to
be placed within the same list, i.e. they can be placed throughout the document.
----
```markdown
(@cauchy) $f(x) = \frac{1}{\pi(1+x^2)}$
The Cauchy distribution (with density given in Eq. (@cauchy)) is a special case
of Student's $t$-distribution (Eq. (@tdist)) with $\nu = 1$.
(@tdist) $$f(t; \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}$$
```
(@cauchy) $f(x) = \frac{1}{\pi(1+x^2)}$
The Cauchy distribution (with density given in Eq. (@cauchy)) is a special case
of Student's $t$-distribution (Eq. (@tdist)) with $\nu = 1$.
(@tdist) $$f(t; \nu) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}\,\left(1+\frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}$$
----
While this solves the basic problem of getting numbered equations it isn't perfect. Equations
are not centred and numbers appear on the left rather than on the right, as is customary.
An additional problem when using display math (as in Eq. (@tdist)) is that the number
and the equation are not lined up properly. It is possible to fix this in HTML output
through the use of appropriate CSS but that doesn't help for other output formats.
----
```markdown
<div class="equation">
(@gamma) $$\Gamma(t) = \int^\infty_0 x^{t-1}e^{-x}dx$$
</div>
```
<div id="gamma" class="equation_css">
(@gamma) $$\Gamma(t) = \int^\infty_0 x^{t-1}e^{-x}dx$$
</div>
----
In the above example the equation is wrapped in a `div` with class "equation". This allows
application of suitable CSS to improve the alignment of the equation.
Horizontal alignment of the formula is relatively straightforward with CSS so it can be
centred without too much difficulty in HTML output. Proper alignment with the automatically
generated number proves to be more difficult. The following JavaScript code does the trick.
````{.javascript .numberLines}
```{r child="include/equation.js"}
```
````
This isn't particularly elegant and only solves the problem for HTML.
For LaTeX the lack of proper equation handling is particularly unsatisfying as
LaTeX has much better support for equations. Again, a
[filter](http://johnmacfarlane.net/pandoc/scripting.html) could be used for additional
processing to produce equations that are better suited to the output format. This should
allow the use of LaTeX equation environments in LaTeX output and could be used
to produce better HTML output as well. This filter can also make use of the `div`
introduced above. See [below](#pandoc-filters) for an example of how this can be achieved.
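As a rough sketch of what such a filter might aim to emit for LaTeX output, the Gamma function example from above could become a numbered equation environment. The label name `eq:gamma` is an assumption chosen for illustration, and `\eqref` requires the `amsmath` package.

```latex
\begin{equation}
  \Gamma(t) = \int^\infty_0 x^{t-1}e^{-x}\,dx
  \label{eq:gamma}
\end{equation}
The Gamma function is defined in Eq.~\eqref{eq:gamma}.
```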
## Footnotes
The use of footnotes is well supported by `pandoc`. The easiest way to add a footnote is
the inline syntax.
----
```markdown
This is regular text^[with a footnote].
```
This is regular text^[with a footnote].
----
It is also possible to use labels to identify a footnote, similar to the way references work.
----
````markdown
When using the reference style a short label is present in the text[^1] and the
actual footnote text is defined elsewhere.
[^1]: This is closer to the appearance of the rendered text in the output but updating
the footnote text is a little bit more work since it may be somewhere else in the document.
The advantage of this format is that it supports multi-block content. It could even
contain a code block if desired.