\documentclass{sbml-paper}
% This creates the box section. We should probably be more clever and use the titlesec package from l3-packages-table. Everything will be reformatted before submission anyway.
\newcounter{mybox}
\newcommand{\mybox}[1]{%
\refstepcounter{mybox}%
\noindent\textbf{\Large \rule{1cm}{0.4pt}~Box~\themybox{}. #1~\hrulefill}%
}%
\makeatletter
\renewcommand\Affilfont{\small}
% \renewcommand\maketitle{\AB@maketitle} % revert \maketitle to its old definition
\renewcommand\AB@affilsepx{\quad\protect\Affilfont} % put affiliations into one line
\hypersetup{
pdfpagemode={UseNone},
pdfcenterwindow={true},
pdfview={FitV},
pdffitwindow={true},
pdfwindowui={false},
pdfstartview={FitV},
pdfnewwindow={false},
pdfdisplaydoctitle={true},
pdfhighlight={/P},
pdflang={en},
unicode={true},
}
\AtBeginDocument{\hypersetup{pdftitle=\@title, pdfsubject={Molecular Systems Biology}}}
\makeatother
% The following stops LaTeX from generating an error for empty
% list/enumeration environments. This is not needed for our paper,
% but some bizarre issue with one of our box environments causes
% latexdiff to create an empty enumerate, and *that* fails when we
% try to run latex on the diff file. So the purpose of the following
% is only to make it possible to run latexdiff later.
%
% This particular solution for empty lists is from user "Ian Thompson"
% posted to https://tex.stackexchange.com/a/43742/8318
\makeatletter
\let\@noitemerr\relax
\makeatother
% ======================================================================
\begin{document}
\title{SBML Level 3: an extensible format for the exchange and reuse of biological models}
\author[1,2,3]{Sarah M. Keating*}
\author[4]{Dagmar Waltemath*}
\author[5]{Matthias K\"{o}nig}
\author[6]{Fengkai Zhang}
\author[7,8,9]{Andreas Dr\"{a}ger}
\author[10,11]{Claudine Chaouiya}
\author[3]{Frank T. Bergmann}
\author[12]{Andrew Finney}
\author[13]{Colin S. Gillespie}
\author[14]{Tom\'{a}\v{s} Helikar}
\author[15]{Stefan Hoops}
\author[2]{Rahuman S. Malik-Sheriff}
\author[16]{Stuart L. Moodie}
\author[17]{Ion I. Moraru}
\author[18]{Chris J. Myers}
\author[19]{Aur\'{e}lien Naldi}
\author[1, 3, 20]{Brett G. Olivier}
\author[3]{Sven Sahle}
\author[21]{James C. Schaff}
\author[1, 22]{Lucian P. Smith}
\author[23]{Maciej J. Swat}
\author[19]{Denis Thieffry}
\author[18]{Leandro Watanabe}
\author[13, 24]{Darren J. Wilkinson}
\author[17]{Michael L. Blinov}
\author[26]{Kimberly Begley}
\author[27]{James R. Faeder}
\author[28]{Harold F. G\'{o}mez}
\author[7, 8]{Thomas M. Hamm}
\author[29]{Yuichiro Inagaki}
\author[30]{Wolfram Liebermeister}
\author[31]{Allyson L. Lister}
\author[32]{Daniel Lucio}
\author[33]{Eric Mjolsness}
\author[34]{Carole J. Proctor}
\author[35, 36, 37]{Karthik Raman}
\author[38]{Nicolas Rodriguez}
\author[39]{Clifford A. Shaffer}
\author[40]{Bruce E. Shapiro}
\author[41]{Joerg Stelling}
\author[42]{Neil Swainston}
\author[43]{Naoki Tanimura}
\author[44]{John Wagner}
\author[6]{Martin Meier-Schellersheim}
\author[22]{Herbert M. Sauro}
\author[45]{Bernhard Palsson}
\author[46]{Hamid Bolouri}
\author[47, 49]{Hiroaki Kitano}
\author[48]{Akira Funahashi}
\author[2]{Henning Hermjakob}
\author[1]{John C. Doyle}
\author[1]{Michael Hucka}
\author[50]{SBML Level~3 Community members}
\affil[1]{Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, US\authorcr}
\affil[2]{European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK\authorcr}
\affil[3]{BioQuant/COS, Heidelberg University, Heidelberg 69120, DE\authorcr}
\affil[4]{Medical Informatics, Institute for Community Health, University Medicine Greifswald, Greifswald, DE\authorcr}
\affil[5]{Institute for Theoretical Biology, Humboldt-University Berlin, Berlin, 10115, DE\authorcr}
\affil[6]{Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, US\authorcr}
\affil[7]{Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, Institute for Biomedical Informatics (IBMI), University of T\"{u}bingen, 72076 T\"{u}bingen, DE\authorcr}
\affil[8]{Department of Computer Science, University of T\"{u}bingen, 72076 T\"{u}bingen, DE\authorcr}
\affil[9]{German Center for Infection Research (DZIF), partner site T\"{u}bingen, DE\authorcr}
\affil[10]{Aix-Marseille Universit\'{e}, CNRS, Centrale Marseille, I2M, Marseille, 13288, FR\authorcr}
\affil[11]{Instituto Gulbenkian de Ci\^{e}ncia, Oeiras, P-2780-156, PT\authorcr}
\affil[12]{ANSYS UK Ltd, UK\authorcr}
\affil[13]{School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK\authorcr}
\affil[14]{Department of Biochemistry, University of Nebraska--Lincoln, Lincoln, Nebraska 68588, US\authorcr}
\affil[15]{Biocomplexity Institute \& Initiative, University of Virginia, Charlottesville, Virginia 22911, US\authorcr}
\affil[16]{Eight Pillars Ltd, 19 Redford Walk, Edinburgh EH13 0AG, UK\authorcr}
\affil[17]{Center for Cell Analysis and Modeling, UConn Health, Farmington, Connecticut 06030, US\authorcr}
\affil[18]{Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112, US\authorcr}
\affil[19]{Institut de Biologie de l'ENS (IBENS), D\'{e}partement de Biologie, \'{E}cole Normale Sup\'{e}rieure, CNRS, INSERM, Universit\'{e} PSL, 75005 Paris, FR\authorcr}
\affil[20]{SysBioLab, AIMMS, Vrije Universiteit Amsterdam, De Boelelaan 1085, NL-1081HV Amsterdam, NL\authorcr}
\affil[21]{Applied BioMath, LLC, Concord, Massachusetts 01742, US\authorcr}
\affil[22]{Department of Bioengineering, University of Washington, Seattle, Washington, US\authorcr}
\affil[23]{Simcyp (a Certara company), UK\authorcr}
\affil[24]{The Alan Turing Institute, British Library, London, NW1 2DB, UK\authorcr}
% \affil[25]{Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT 06030, US\authorcr}
\affil[26]{Consultant, California Institute of Technology, Pasadena, California 91125, US\authorcr}
\affil[27]{\textls[-5]{Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, US}\authorcr}
\affil[28]{Department of Biosystems Science and Engineering, ETH Z\"{u}rich, Mattenstrasse 26, 4058, Basel, CH\authorcr}
\affil[29]{Management \& IT Consulting Division, Mizuho Information \& Research Institute, Inc., 2-3, Kanda-Nishikicho, Chiyoda-ku, Tokyo, 101-8443, JP\authorcr}
\affil[30]{Universit\'{e} Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France\authorcr}
\affil[31]{Oxford e-Research Centre (OeRC), Department of Engineering Science, University of Oxford, Oxford, UK\authorcr}
\affil[32]{College of Sciences, NC State University, Raleigh, North Carolina 27695, US\authorcr}
\affil[33]{Department of Computer Science, University of California, Irvine, California 92697, US\authorcr}
\affil[34]{Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, NE4 5PL, UK\authorcr}
\affil[35]{Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai - 600 036, IN\authorcr}
\affil[36]{Initiative for Biological Systems Engineering (IBSE), IIT Madras, IN\authorcr}
\affil[37]{Robert Bosch Centre for Data Science and Artificial Intelligence (RBC-DSAI), IIT Madras, IN\authorcr}
\affil[38]{The Babraham Institute, Cambridge, CB22 3AT, UK\authorcr}
\affil[39]{Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, US\authorcr}
\affil[40]{Department of Mathematics, California State University, Northridge, California 91325, US\authorcr}
\affil[41]{Department of Biosystems Science and Engineering and SIB Swiss Institute of Bioinformatics, ETH Z\"{u}rich, 4058 Basel, CH\authorcr}
\affil[42]{Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom\authorcr}
\affil[43]{Science Solutions Division, Mizuho Information \& Research Institute, Inc., 2-3, Kanda-Nishikicho, Chiyoda-ku, Tokyo 101-8443, JP\authorcr}
\affil[44]{IBM Research Australia, Melbourne, AU\authorcr}
\affil[45]{Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, US\authorcr}
\affil[46]{Systems Immunology, Benaroya Research Institute at Virginia Mason, 1201 Ninth Avenue, Seattle, Washington 98101, US\authorcr}
\affil[47]{The Systems Biology Institute, Tokyo, JP\authorcr}
\affil[48]{Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa 223-8522, JP\authorcr}
\affil[49]{Okinawa Institute of Science and Technology, Okinawa, JP\authorcr}
\affil[50]{A complete list of members and affiliations appears in the Supplementary Note}
\maketitle
\centerline{*\thinspace{}These authors contributed equally.}
\begin{abstract}
Systems biology has experienced dramatic growth in the number, size and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of SBML (the Systems Biology Markup Language), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models, and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, and new data sources such as single cell measurements and live imaging, have precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.
\vspace*{0.25in}\noindent
\begin{tabular}{@{}lm{3in}@{}}
\textbf{MSB subject category}: & Methods \& Resources \\
\textbf{Keywords}: & computational modeling / interoperability / reproducibility / file format / systems biology\\
\textbf{Running title}: & SBML Level 3\\
\textbf{Abstract word count}: & 174 \\
\textbf{Main body word count}: & 5600 (excluding tables, boxes, acknowledgments)\\
\end{tabular}
\end{abstract}
\clearpage
% ======================================================================
\section*{Introduction}
% ======================================================================
Systems modeling and numerical simulations in biology can be traced to the mid-20\textsuperscript{th} century. Though general theorizing about systems began earlier, the application of systems analysis to biology gained attention in the 1950s thanks to the work of biologists such as von~Bertalanffy and Kacser~\citep{Von_Bertalanffy1950-dy, Kacser1957-ox}. The era of numerical simulation in biology truly began with the landmark works of Chance on enzyme kinetics~\citep{chance1940kinetics}, Hodgkin and Huxley on the molecular basis of neuronal transmission~\citep{hodgkin1952quantitative}, and Turing on the chemical basis of morphogenesis~\citep{turing1990chemical}. Since then, the number and variety of models have grown in all of the life sciences. As precise descriptions of phenomena that can be simulated, analyzed, and compared to experimental data, models provide unique insights that can confirm or refute hypotheses, suggest new experiments, and identify refinements to the models.
The availability of more data, more powerful modeling methods, and dramatically increased computing power led to the rise of systems biology as a compelling research theme around the turn of the millennium~\citep{kitano2000perspectives, ideker2001new}. Though computational models were at first published as printed equations in journal articles, the desire to reuse an ever-increasing number of models called for digital formats that were interoperable between software systems and could be easily exchanged between scientists~\citep[topics of interest as early as the 1960s; cf.][]{Garfinkel1969construction}. This drove efforts to create tool-\emph{independent} ways of representing models that could avoid the potential for human translation errors, be stored in databases, and provide a common starting point for simulations and analyses regardless of the software used~\citep{Lloyd2004-fd, Goddard2001-ix, hucka_2001}. One such effort was SBML, the Systems Biology Markup Language. Its initial design was motivated by discussions to create a ``metabolic model file format'' following a 1999 workshop~\citep[recounted by][]{kell2008the}. A distributed community thereafter discussed ideas that informed work at Caltech in late 1999/early 2000 and led (after a series of public drafts) to the release of the official SBML Level~1 Version~1 specification in March 2001~\citep{hucka_2003}.
While SBML was initially developed to exchange compartmental models of biochemical reaction networks primarily formulated in terms of chemical kinetics~\citep{hucka_2001}, it was always understood that there existed more types of models than the initial version of SBML could represent explicitly. However, seeking community consensus on a limited set of simpler features, which could be readily implemented in software at the time, was deemed a more pragmatic strategy. A deliberate decision was taken to delay the addition of more advanced capabilities to a later time. As a result, SBML has evolved in stages in a community-driven fashion that has benefited from the efforts of many researchers worldwide over two decades. As time passed, the need to support a broader range of model types, modeling frameworks, and research areas became apparent. SBML's success in serving as an interchange format for basic types of models led communities of modelers to ask whether it could be adapted or expanded to support more types. In addition to reaction-diffusion models, alternative modeling frameworks have risen in popularity in the past decade~\citep{Machado2011modelinga}, and researchers have faced interoperability problems between software tools developed for their use. These needs drove a profound change in SBML's structure: a facility to permit layering the core of SBML with new features suited to more types of models, together with a way for individual models to identify which sets of extensions they need for proper interpretation. The release of SBML Level~3~\citep{Hucka2010a} has provided a new foundation to enable the exchange of a greater variety of models in various domains of biology (Figure~\ref{level-3-diagram}).
In the rest of this article, we begin by summarizing SBML's general structure, then describe the modularity introduced in Level~3 and the wide range of modeling formalisms supported by Level~3 packages. We follow that by describing the community aspects of SBML development. We continue with a discussion of SBML's impact on both computational modeling and the modeling community, and finally, we close with a discussion of forthcoming challenges.
\begin{figure}[b]
\centering
\includegraphics[width=\textwidth]{resources/SBML-Level3-v12.pdf}
\caption{SBML Level 3~\citep{Hucka2019systems} consists of a core (center) and specialized SBML Level~3 \emph{packages} (in blue), which provide syntactical constructs to support additional modeling approaches. The packages support new types of modeling (in the gray boxes) needed for large and complex models such as those used in various domains and fields of biology (in the light red boxes). The meanings of SBML package labels such as ``fbc'' are given in \autoref{packages}, with additional package information in Box~1.}
\label{level-3-diagram}
\end{figure}
\clearpage
\newpage
% ======================================================================
\section*{The structure of SBML}
\label{sec:sbml}
% ======================================================================
The core of SBML is focused on encoding models in which entities are located in containers and are acted upon by processes that modify, create or destroy entities. The containers do not need to correspond to physical structures; they can be conceptual or abstract. Additional constructs allow parameters, initial conditions, other variables, and other mathematical relationships to be defined (Figure \ref{fig:examples-sbml}A). In the most common type of model, the ``entities'' are biochemical substances, the ``containers'' are well-mixed and spatially homogeneous, and the ``processes'' are biochemical reactions happening within or between the containers. This originally led to the SBML constructs being named \emph{species}, \emph{compartments}, and \emph{reactions}, respectively (Figure \ref{fig:examples-sbml}B), but these names are historical artifacts and belie the generality of the underlying scheme. Software applications can map the names to other concepts to better suit their purposes. For instance, ``species'' could be mapped to populations of molecules, cells, or even organisms.
Modelers and software developers are encouraged to use SBML's reaction construct to define a model's behavior in preference to formulating the model explicitly as a system of equations. This gives users freedom to convert the model into the final format they prefer---a simpler operation than (for example) inferring a reaction network from a system of differential equations. More importantly, the approach also naturally handles models where reaction kinetics are unknown or unneeded, such as interaction maps, and supports the elaboration of the reaction construct using SBML packages (discussed below). That said, the use of reactions is optional, and SBML provides features sufficient for encoding a large diversity of purely mathematical models, too. Whether using reactions or not, values of model variables and their changes over time may be fixed or determined by mathematical expressions, either before or during simulation, continuously or in response to discrete events, with or without time delays. Units of measurement can be specified for all entities and values; in addition to adding a layer of essential physical knowledge (after all, how else could one interpret whether a time course is in milliseconds or years?), information about units can be used to verify the relationships expressed in a model. Units also facilitate reuse of models and components, interconnection of models, conversion of models between different frameworks, and integration of data with models.
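
As a hedged illustration of how unit information can be used to check a model, consider a hypothetical first-order reaction (not taken from any model discussed here) with rate constant $k$, substrate concentration $[S]$, and compartment volume $V$. Assuming $k$ is given in $\mathrm{s^{-1}}$, $[S]$ in $\mathrm{mol\,l^{-1}}$, and $V$ in litres, the rate expression $v = k\,[S]\,V$ has units
\[
  \mathrm{s^{-1}} \times \mathrm{mol\,l^{-1}} \times \mathrm{l} \;=\; \mathrm{mol\,s^{-1}},
\]
that is, amount of substance per unit time. Omitting the factor $V$ would instead give $\mathrm{mol\,l^{-1}\,s^{-1}}$, a mismatch that software could flag automatically if reaction rates are expected in amount per time.
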
SBML does not dictate which framework must be used to analyze or simulate a model; in fact, it purposefully lacks any explicit way to specify what is done with a model---whether to run simulations or other types of analyses, how to run them, or how to present the results---because externalizing this information enhances model reusability and permits independent innovation in separate but complementary formats. Two methods are most commonly used for time-course simulation: one is numerical integration of differential equations created from the reactions and other relationships affecting model variables, and the other is simulating the time evolution of the model as a stochastic system via algorithms such as the one developed by~\cite{gillespie1977exact}. Alternative approaches are also in use, particularly when a model is enhanced with SBML packages.
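
The deterministic and stochastic interpretations can be sketched in generic notation; the symbols below are illustrative assumptions rather than part of SBML itself. Let $\mathbf{x}(t)$ hold the species amounts, let $\mathbf{N}$ be the stoichiometric matrix derived from the reactions, and let $\mathbf{v}(\mathbf{x},\mathbf{p})$ be the vector of reaction rates with parameters $\mathbf{p}$. In the absence of additional rules or events, the deterministic interpretation integrates
\[
  \frac{d\mathbf{x}}{dt} = \mathbf{N}\,\mathbf{v}(\mathbf{x},\mathbf{p}),
\]
whereas the stochastic interpretation treats each reaction $j$ as a random event with propensity $a_j(\mathbf{x})$, so that the probability that it fires in a short interval $[t, t+dt)$ is approximately $a_j(\mathbf{x})\,dt$, after which $\mathbf{x}$ is updated by the $j$-th column of $\mathbf{N}$. For a single hypothetical reaction $A \rightarrow B$ with rate $k\,x_A$, this gives $dx_A/dt = -k\,x_A$ and $dx_B/dt = k\,x_A$ in the first case, and a propensity $a = k\,x_A$ in the second.
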
Any element of an SBML model can be elaborated using machine-readable metadata as well as human-readable notes. For metadata, two schemes are supported. The first is direct labeling of SBML elements with terms from the Systems Biology Ontology~\citep[SBO;][]{courtot2011controlled}, which allows the mathematical semantics of every element of a model to be precisely specified. The second scheme uses semantic web technologies and provides greater flexibility to capture additional metadata. For instance, a molecular species in a model can be linked to a UniProt entry~\citep{uniprot2017} if it represents a protein, or to a ChEBI entry~\citep{hastings2013chebi} if it represents a simple chemical. Gene Ontology terms~\citep[GO;][]{ashburner2000gene} can be attached to species, compartments, and mathematical elements representing biological processes and functions. Simple provenance data such as identities of creators can be added to facilitate attribution and versioning. To help standardize how annotations are stored, SBML encourages the use of guidelines and resources established for this purpose~\citep{le_novere_2005}. Finally, software tools can also use annotations to encode tool-specific data in their own formats, thus providing a way to capture data that might otherwise be lost. Annotations thereby help enrich the meaning of model components, facilitate the understanding and reuse of models, and help software work with SBML more flexibly~\citep{Neal2019harmonizing}.
The core features described above have been a backbone of SBML ever since Level~2, even as SBML continued to evolve. The development of the modular Level~3, discussed in the next section, provided an opportunity to rethink and redesign a few other rarely-used features. For example, the species \emph{charge} attribute, designed to represent molecular charge, was removed in Level~3 in favor of letting an SBML package introduce more complete support for the relevant concepts.
\begin{figure}[p]
\centering
\includegraphics[width=0.9\textwidth]{resources/SBML_XML_example_v04.pdf}
\caption{A closer look at SBML. (A) Fragments of the global structure of an SBML file. In this example, the use of several SBML packages is declared in the file header. Model elements in the file include the descriptions of model variables, as well as their relationships. Elements of the same type are collected into ``ListOf'' elements; \eg model parameters are in the ListOfParameters element. SBML package elements can refer to elements in the SBML Core as necessary. (B) Model elements are linked through unique identifiers used in the mathematical constructs and the elements describing the reactions, the molecular species, and their localization. The full model for this example is available in BioModels Database~\citep{Malik-Sheriff2020biomodels} as the model with identifier \href{https://identifiers.org/biomodels.db:MODEL1904090001}{MODEL1904090001}.}
\label{fig:examples-sbml}
\end{figure}
\clearpage
\newpage
% ======================================================================
\section*{SBML Level 3's modularity and breadth}\label{sec:modularity}
% ======================================================================
Constant evolution in scientific methods presents challenges for the creation of software tools and standards. One challenge arises because the creation of new standards requires labor, testing, and time. This often causes standardization efforts to lag behind the latest technical developments in a constantly-moving field. A second challenge is that users want support for new methods and standards in software tools, which pressures developers to implement support quickly. Combined with the first challenge, it means that sometimes problems with a standard's definition are not discovered until more developers attempt to use it in different situations, which in turn often means that revisions to a standard are needed after it is published. Finally, another challenge is that software development often takes place under resource constraints (\eg funding and time), limiting the scope of work that software developers can undertake---including, sometimes, limiting how many features of a standard they can support in their software.
The SBML community sought to address these challenges by putting in place certain structural features in SBML's development process. The first is the notion of \emph{Levels}. A Level in SBML is an attempt to provide a given set of features for describing models, with higher Levels providing more powerful features. For example, the ability to express discrete events was added to SBML Level~2 but does not exist in Level~1. SBML Levels are mostly upwardly compatible, in the sense that the vast majority of models encoded in Level $n$ can be translated to Level $n+1$. \emph{Versions} are used to introduce refinements to a given Level to account for realizations that come from real-life use of SBML. Finally, SBML Level~3 introduced an extensible modular architecture consisting of a central set of fixed features (named \emph{SBML Level~3 Core}), and a scheme for adding \emph{packages} that can augment the \emph{Core} by extending existing elements, adding new elements, and adjusting the meaning or scope of elements. A model declares which packages it uses in order to guide its interpretation by software applications. If a software tool detects the presence of packages that it does not support, it may inform users if it cannot work with the model. Together, these three features (Levels, Versions, packages) help address the challenges discussed above: they ease coping with evolution in methods by collecting significant changes into discrete stages (SBML Levels), they help deal with the inevitable need for revisions (Versions within Levels), and they allow developers to limit the feature set they implement (SBML Levels on the one hand, and SBML Level~3 packages on the other).
Packages allow SBML Level~3~\citep{Hucka2019systems} to represent many model types and characteristics in a more natural way than if they had to be shoehorned into SBML Core constructs exclusively. Twelve packages have been proposed to date (\autoref{packages}); seven have been fully developed into consensus specifications and are each used by at least two software implementations (Box~\ref{box:packages}), and another three have draft specifications in use by software tools. New packages can be developed independently, within dedicated communities, at a pace that suits them. This was the case for logical modeling with the CoLoMoTo community~\citep{naldi2015cooperative}, constraint-based modeling within the COBRA community~\citep{Heirendt2019creation}, and rule-based modeling with a community of like-minded software creators~\citep{faeder2009rule, Palmisano2014multistate, zhang2013simmune, boutillier2018kappa}.
Several benefits accrue from leveraging SBML as a starting point rather than creating a new, independent format. One is that it makes clear where common features overlap. Most computational modeling frameworks in the domain of biology share some common concepts---variables that represent characteristics of different kinds of entities, processes that represent interactions between entities, containers/locations, etc.---and reusing SBML Level~3 Core constructs makes the conceptual similarities explicit. This in turn makes interpretation of models easier (no need to learn new terminology) and reuse simpler (no need to translate between independent formats). Another benefit is that the creators of the format can leverage existing features developed for SBML, such as mechanisms for annotations, rather than spend time developing new approaches to achieving the same goals in a new format. This in turn leads to another benefit: the ability to reuse at least some parts of existing software libraries developed for SBML. It also means that a software application may be able to interpret at least \emph{some} fundamental aspects of a model even if the application is not designed to work with a particular SBML Level~3 package, by virtue of understanding SBML Core (and perhaps other packages used by the model). This improves the potential for model reuse, and benefits model creators and software developers alike. Finally, a common foundation simplifies the creation of multiframework models in which some parts of the model use one formalism and other parts use others~\citep[e.g., coupling kinetic models with flux-balance analysis;][]{Watanabe2018dynamic}.
Though this modular approach has benefits, it is not without potential pitfalls. The main risks are fragmentation of the community, and incompatibility of packages due to complex feature dependencies. The SBML community has addressed the former by maintaining communications between package developers; the community processes have such interactions built in. As for the latter, API libraries (see Box~\ref{box:software}) can handle \emph{some} combinations of packages and hide some of the complexity. Still, there remain some combinations of packages that are not fully understood, and it remains for future work to define whether and how they can be combined for use in a single model.
\clearpage
\newpage
\input{packages-table}
\clearpage
\newpage
% **********************************************************************
\mybox{SBML Level~3 packages officially part of the standard}\label{box:packages}
% **********************************************************************
\textbf{Hierarchical Model Composition}~~~~The ``comp'' package~\citep{Smith2015} allows users to build models from other complete models or from model fragments, as a way to manage complexity and construct composite models. ``Submodels'' can be described within the same SBML file or linked from external files. A submodel can act as a template, and the same definition can be reused multiple times in other models to avoid duplication and enable reuse of parts. The ``comp'' package also enables submodels to have explicit interfaces (known as \emph{ports}) for optional black-box encapsulation. Finally, ``comp'' was designed so that a hierarchical model can be converted into a single SBML model that does not use any ``comp'' features, making it readable by software that does not directly support the package. The library libSBML~\citep{bornstein2008libsbml} provides a facility to do this.
\textbf{Flux Balance Constraints}~~~~The ``fbc'' package~\citep{Olivier2018a} provides a means of encoding constraint-based models and optimizations, such as those used in Flux Balance Analysis~\citep{Bordbar2014a}. Constructs in the ``fbc'' package allow for the definition of a list of objectives for minimization or maximization, as well as flux bounds on reactions and gene-reaction mappings. Additional information such as chemical formula and charge enables further model analyses, including calculation of reaction mass balances, electron leaks, or implausible sources of matter.
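As a point of reference, the canonical Flux Balance Analysis problem can be sketched as the linear program below. The symbols are illustrative assumptions rather than notation defined by the ``fbc'' specification: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ the vector of objective coefficients, and $l$ and $u$ the lower and upper flux bounds.
\[
\begin{array}{ll}
\max_{v} & c^{\top} v \\
\mbox{subject to} & S v = 0, \\
 & l \le v \le u.
\end{array}
\]
In this reading, an ``fbc'' objective supplies $c$ and the direction of optimization, the flux bounds supply $l$ and $u$, and $S$ follows from the reaction stoichiometries already expressed in SBML Level~3 Core.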
\textbf{Groups}~~~~The ``groups'' package~\citep{hucka2016sbml} provides constructs to describe conceptual relationships between model elements. Groupings can indicate classification, partonomy, or merely a collection of things; a group's meaning can be specified using semantic annotations. Groups carry no mathematical meaning and cannot influence the mathematical interpretation of an SBML model.
\textbf{Multistate, Multicomponent and Multicompartment Species}~~~~The ``multi'' package~\citep{zhang2018multi} manages the combinatorics produced by entities either composed of multiple components, such as molecular complexes, or that can exist in multiple states, such as proteins with post-translational modifications. With the ``multi'' package, rules can be defined for how reactions depend on the states of the entities and their locations. The package adds syntactic constructs for molecular species types, compartment types, features, binding sites, and bonds. Entire families of molecular complexes sharing certain properties can be defined using patterns created using these constructs.
\textbf{Qualitative Models}~~~~The ``qual'' package~\citep{Chaouiya2015sbml} provides constructs to encode models whose dynamics can be represented by discrete, reachable states connected by state transitions denoting qualitative updates of model elements. Examples include logical regulatory networks (Boolean or multivalued) and Petri nets. The ``qual'' package introduces SBML elements to allow the definition of qualitative species, which are used to associate discrete levels of activities with entity pools, as well as transitions, which define the possible changes between states in the transition graph.
\textbf{Layout and Rendering}~~~~The ``layout''~\citep{Gauges2015} and ``render''~\citep{Bergmann2018sbml} packages extend SBML to allow graphical representations of networks or pathways to be stored within SBML files. The ``layout'' package enables the encoding of positions and sizes of graphical elements such as nodes and lines, while the information about colors, fonts, etc., is defined by the ``render'' package. This separation presents several advantages. For example, applications can offer multiple styles for visualizing the same layout of a network map. Most of the essential aspects of a network diagram can be expressed using just the ``layout'' package, and thus tools do not necessarily have to implement a full graphics environment if they do not need to support customizing a diagram's look-and-feel.
\hrulefill
\newpage
% ======================================================================
\section*{SBML as a community standard}
% ======================================================================
SBML's success can be attributed largely to its community-based development and its consensus-oriented approach. SBML has always been developed through engagement with its user community to achieve goals expressed by that same community. To resolve occasionally conflicting technical demands, a guiding principle has been to seek consensus between different viewpoints and the needs of different groups, to find a middle ground that would be, while perhaps not a perfect solution, an \emph{acceptable} and \emph{usable} one. This attracted the researchers and software developers who constitute SBML's foremost stakeholders. By using SBML in everything from software to textbooks, they helped steer further development toward the real needs of the people using it. This engagement has allowed faster feedback from users to developers, and has helped produce a rich toolkit of software and other resources that facilitate SBML's incorporation into software (Box~\ref{box:software}).
Over the years, the community has designed rules to organize its governance, develop and maintain the specifications, and facilitate collaboration among users. The development of SBML and its Level~3 packages is shepherded by the SBML Editors, a group of community-elected volunteers serving terms of three years who follow a written and public process detailed on the web portal SBML.org\footnote{\url{http://sbml.org/Documents/SBML_Development_Process}}. SBML Editors write or review SBML specification documents, organize discussions and vote on specific technical issues, and enact the decisions of the community. Major proposed changes to the specifications and packages are discussed by the community via the SBML mailing lists\footnote{\url{http://sbml.org/Forums/}} as well as during annual face-to-face meetings.
The community currently comes together twice a year within the context of meetings organized by COMBINE~\citep[the Computational Modeling in Biology Network;][]{Hucka2015promotinga}. \emph{HARMONY} (the Hackathon on Resources for Modeling in Biology) is a codefest that focuses on the development of software, in particular via the development of libraries, tools, and specifications; by contrast, the \emph{COMBINE Forum} meetings focus on the presentation of novel tools and the discussion of proposed features. In addition to these general meetings, special SBML working groups are organized as needed to drive SBML package development. COMBINE's central activity is coordinating and harmonizing standardization in computational biology, and SBML is one of its core standards. FAIRsharing, a broader community network that covers life sciences more comprehensively~\citep{Sansone2019fairsharing}, maintains interconnected and organized collections of resources in many areas, including curated links between SBML and many associated funders, databases, and standards\footnote{\url{https://fairsharing.org/FAIRsharing.9qv71f}}.
\clearpage
\newpage
% **********************************************************************
\mybox{Software infrastructure for SBML}\label{box:software}
% **********************************************************************
\newcommand{\tighturl}[1]{\textls[-35]{\url{#1}}}
\begin{minipage}{\textwidth}
\begin{tabular}{@{}c|c@{}}
\textbf{Application Programming Interface (API)} & \textbf{Test Suite} \\
\begin{tabular}{P{3in}}
Open-source (LGPL) libraries and code generators help read, write, manipulate, validate, and transform SBML. They support all Levels and Versions of SBML, and all Level~3 packages.
\begin{enumerate}
\item LibSBML~\citep{bornstein2008libsbml} (\tighturl{http://sbml.org/Software/libSBML}), written in C++, offers interfaces for C, C++, C\#, Java, JavaScript, MATLAB, Octave, Perl, PHP, Python, R, and Ruby
\item JSBML~\citep{Rodriguez2015} (\tighturl{http://sbml.org/Software/JSBML}) offers a pure Java API
\item Deviser (\tighturl{http://sbml.org/Software/Deviser}) generates libSBML code for rapid package prototyping
\end{enumerate}
\end{tabular}
&
\begin{tabular}{P{3in}}
The SBML Test Suite (\tighturl{http://sbml.org/Software/SBML_Test_Suite})
helps developers implement SBML compatibility and helps users check SBML features supported in software.
\begin{enumerate}
\item Thousands of test cases for
\begin{itemize}
\item Semantic interpretation of models (for both deterministic and stochastic simulation)
\item Syntactic correctness
\end{itemize}
\item A graphical front end enables cases to be filtered by Level/Version and type of test
\item An online database allows results to be uploaded and compared with results from other simulators
\end{enumerate}
\end{tabular}
\\
\hline
\vspace*{0.5ex}\textbf{Validation Facilities} & \textbf{Conversion Facilities} \\
\begin{tabular}{P{3in}}
Validation software can check files for compliance with the definition of SBML, good modeling practices, and consistency of units.
\begin{enumerate}
\item API libraries include built-in validation
\item Online validator has simple user interface (\mbox{\tighturl{http://sbml.org/Facilities/Validator}})
\item Web services support software access
\end{enumerate}
Validation ensures compliance with:
\begin{itemize}
\item SBML syntax
\item SBML validation rules published as part of each accepted SBML specification
\end{itemize}
\end{tabular}
&
\begin{tabular}{P{3in}}
Converters (\tighturl{http://sbml.org/Software/Converters}) can translate some other formats to/from SBML
\begin{enumerate}
\item Conversion tools support format conversions from MATLAB, BioPAX, CellML, XPP, SBtab, and others
\item Online services such as SBFC~\citep{Rodriguez2016systems} convert uploaded files to a variety of formats
\item API libraries provide converters between different SBML Levels/Versions and different SBML constructs
\end{enumerate}
\end{tabular}
\end{tabular}
\begin{minipage}{6.25in}
\hrulefill
\vspace*{-1em}
\begin{center}
\textbf{Software Guide}
\end{center}
\vspace*{-1ex}
A catalog (\tighturl{http://sbml.org/SBML_Software_Guide}) of software applications, libraries and online services known to support SBML---over 290 entries to date
\begin{enumerate}
\item A tabular interface highlights supported SBML features of each software system
\item A list interface displays human-readable summaries of software systems
\item Software can be added to the list upon request
\end{enumerate}
\end{minipage}
\end{minipage}
% ======================================================================
\section*{Impact of SBML}
% ======================================================================
As contributors to developments in methods, software, and standards over the past two decades~\citep{Hucka2015promotinga}, we can attest to SBML's profound impact on the field, both from our own first-hand experiences and from surveys~\citep{Klipp2007systems} that indicate SBML has become a \emph{de facto} standard. The impact is a result of SBML's community-oriented development approach and its design.
The SBML development process has helped shape the field partly by directly involving software developers and modelers. Frequent workshops have provided essential feedback for developers to help them better serve modelers' needs~\citep[e.g.,][]{waltemath2014meeting}. Workshops as well as resources such as the SBML Software Guide (see Box~\ref{box:software}) helped raise awareness of existing tools, which in turn increased their use and the use of SBML. This helped create a culture of sharing models and building on existing work in systems biology~\citep{stanford2015evolution}. It also led to new activities centered on the models themselves, including automatic model generation, analysis of model structures, model retrieval, and integration of models with experimental data~\citep{Draeger2014}. SBML's successful approach to community organization has led other standardization efforts (\eg BioPAX, NeuroML, SBGN, SED-ML) to adopt some of the same approaches; SBML was also a founding member of COMBINE~\citep{Hucka2015promotinga}, discussed above. Some of the primary standardization efforts in COMBINE, such as BioPAX~\citep{Demir2010} and NeuroML~\citep{Gleeson2010}, are more domain-specific than SBML; others, such as CellML~\citep{Lloyd2004-fd}, overlap SBML's primary domains but offer alternative abstractions; and finally, still others such as SBGN~\citep{VanIersel2012}, SBOL~\citep{Roehner2016}, and SED-ML~\citep{waltemath2011reproducible}, are complementary formats.
Before the advent of SBML, it was challenging to exchange models because software tools used incompatible definition schemes. As models increased in size and complexity, manually rewriting them became more difficult, error-prone, and eventually, untenable. The development of SBML has enabled the use of a single model description throughout a project's life cycle even when projects involve heterogeneous software tools (Box~\ref{box:use-cases}). SBML-compatible software tools today allow researchers to use SBML in all aspects of a modeling project, including creation (manual or automated), annotation, comparison, merging, parametrization, simulation/analysis, results comparison, network motif discovery, system identification, omics data integration, visualization, and more. Such use of a standardized format, along with standard annotation schemes~\citep{Neal2019harmonizing} and training in reproducible methods, improves research workflows and is generally recognized as promoting research reproducibility~\citep{waltemath2016modeling}.
The availability of a well-defined format has also facilitated the comparison of software tools to each other. Using SBML-encoded models has become the norm to assess the accuracy of modeling software: initially done manually using models from BioModels Database~\citep{bergmann2008comparing}, now it is more commonly done using the SBML Test Suite (Box~\ref{box:software}). SBML's semantics are defined precisely enough that many simulation systems can produce equivalent results for over 1200 test cases, lending confidence that SBML-based simulations can be reproducible in different software environments.
While chemical kinetics models have been a staple of systems biology, other modeling frameworks exist. These have benefited from efforts to extend Level~3 to better suit their specific characteristics. Even when models could in principle be encoded using core SBML constructs, the use of features explicitly adapted to the needs of a domain can make model interpretation less error-prone and more natural. The former issue was demonstrated vividly when ad hoc methods of encoding genome-scale models led to incorrect interpretations, and a subsequent proposal to use SBML Level~3 ``fbc'' addressed representational inconsistencies that had hindered reproducibility~\citep{Ebrahim2015}. The use of more domain-specific forms of encoding has been preferred by several communities, such as the qualitative and rule-based modeling communities. For example, the quickly adopted package SBML Level~3 ``qual''~\citep{Chaouiya2015sbml} supports software interoperability for qualitative modeling, illustrated by the use of CellNOpt~\citep{terfve2012cellnoptr}, which provides a set of optimal Boolean models that best explains the causal relationships between elements of a signal transduction network and associated data, and the subsequent use of GINsim~\citep{chaouiya2012logical} or Cell Collective~\citep{helikar2012cell} to assess the dynamical properties of these models. Rule-based modeling can represent models that are impossible to express as reaction networks, such as polymerization~\citep{faeder2009rule}, or simply impractical to represent due to the combinatorial number of reactions implied by the rules~\citep{Hlavacek2003complexity}. Storing rule definitions in SBML is now feasible with the ``multi'' package, allowing rule-based modeling tools such as Simmune~\citep{zhang2013simmune} and BioNetGen~\citep{faeder2009rule} to read and write the same model definitions.
SBML has also eased the automated processing of models to the point where they have become just another type of data in the life sciences. SBML is used today as an import/export format by many databases of mathematical models~\citep{Malik-Sheriff2020biomodels, Norsigian2019, Misirli2014composable}, as well as by pathway databases~\citep{caspi2015metacyc, mi_2016, fabregat2017reactome} and reaction databases \citep{ganter2013metanetx, wittig2017sabio}. SBML is the preferred format for model curation in BioModels Database~\citep{Malik-Sheriff2020biomodels}, not only because of its popularity but also because of its provisions to precisely encode and annotate models to support reproducible modeling. SBML is also used to share models by more generic data management platforms such as SEEK~\citep{wolstencroft2016fairdomhub} and comprehensive online simulation environments~\citep[e.g.,][]{Moraru2008virtual, peters2017jws, Weidemann2008sycamore, Lee2009webbased}. Moreover, having an agreed-upon format has facilitated the introduction of better model management strategies. This includes support for tasks such as model storage and retrieval~\citep{Henkel2015combininga}, version control~\citep{Scharm2016algorithm}, and checking quality and validity~\citep{Liebermeister2008validity, Lieven2020memote}. The proliferation of derived models has led to the development of methods to compare model structure and semantic annotations~\citep{Lambusch2018identifying}, culminating in the development of several methods to quantify model similarities~\citep{henkel2016notions} that can then be used to improve the relevance of model searches. Once model elements can be compared, one can align, combine, and merge different models~\citep{krause2010annotation}.
A broader impact of SBML as a \emph{de facto} standard has been the support of publishers and funding agencies. Many journals, aware of the challenges surrounding the reproducibility of scientific results, encourage authors not only to describe their models but also to make their models available in electronic form. \emph{Molecular Systems Biology} was the first supporter of submissions in SBML format (beginning in 2005\footnote{\url{https://www.embo.org/news/press-releases/2005/now-live-molecular-systems-biology-a-first-in-systems-biology-publishing}}\textsuperscript{,}\footnote{\url{https://www.embopress.org/page/journal/17444292/authorguide\#datadeposition}}). Today, most journals still avoid \emph{requiring} a specific format, though some such as the BMC\footnote{\url{https://www.biomedcentral.com/getpublished/writing-resources/additional-files}} and FEBS\footnote{\url{https://onlinelibrary.wiley.com/page/journal/17424658/homepage/ForAuthors.html}} journals do explicitly encourage authors to submit SBML files as supporting material for research where it is relevant. Others, such as Biophysical Journal \citep{nickerson2017introducing}, recommend authors deposit models in repositories such as BioModels Database, which encourages the use of common standard formats such as SBML. Many funding agencies also now have policies related to data sharing, and some program announcements suggested the use of SBML where appropriate\footnote{See, for example, \url{https://grants.nih.gov/grants/guide/pa-files/par-08-023.html}}.
Finally, the continued development of SBML has stimulated collaborative work and the creation of consortia. This has led to better awareness and communication within groups interested in specific modeling frameworks. A good example is the CoLoMoTo effort mentioned above; it was launched by researchers who needed a format to exchange qualitative models between their software tools and developed the Qualitative Modeling package for SBML~\citep{naldi2015cooperative} as the solution. Nevertheless, challenges remain, as discussed in the next section. These will need to be confronted to ensure the longevity of SBML as well as continued developments.
\newpage
% **********************************************************************
\mybox{Examples of SBML use cases}\label{box:use-cases}
% **********************************************************************
SBML's impact on computational systems biology includes its facilitation of collaborative work. In multiple instances, it has precipitated entirely new projects, as illustrated by the examples below.
\textbf{SBML throughout the model life-cycle}~~~~Encoding a model in a standard format such as SBML makes it easier to use different software tools for different purposes, and thus makes it easier to leverage the most suitable tools at different points in a workflow. The following is an example. A signaling pathway can be designed graphically using CellDesigner~\citep{Funahashi2003celldesignera}. The resulting model can then be semi-automatically annotated using the online tool semanticSBML~\citep{krause2010annotation}. Experimental kinetic information can be retrieved in SBML format from the SABIO-Reaction Kinetics database~\citep{wittig2017sabio}. Tools such as COPASI~\citep{hoops2006copasi} and PyBioNetFit~\citep{Mitra2019pybionetfit} provide facilities to estimate parameters and to simulate the model with various algorithms. Other SBML-enabled tools such as Tellurium~\citep{Medley2018tellurium} and PySCeS~\citep{olivier2005modelling} provide capabilities such as identifiability and bifurcation analysis. Each step of the process applied to a model from creation to publication of results---modeling, simulation and analysis---can be documented using notes attached to every model element. The model can even be turned into a publishable document using SBML2\LaTeX~\citep{Draeger2009b}. Finally, the model can be exported from selected modeling tools, together with data and other information all bundled together in COMBINE Archive format~\citep{bergmann2014combine}
and published in model repositories such as BioModels Database~\citep{Malik-Sheriff2020biomodels}.
\textbf{Pipeline for automated model building}~~~~Being able to describe model elements precisely using semantic annotations facilitates the creation of automated pipelines~\citep{Drager2010automating}. Such pipelines can combine existing models with databases of molecular phenotypes or reaction kinetics~\citep{li2010systematic}. They can also generate models \emph{de novo} from data resources, as has been demonstrated by the Path2Models project~\citep{buchel2013path2models}. Path2Models has produced 143,000 fully annotated SBML models for over 2,600 organisms from pathway data. Metabolic pathways were encoded in SBML Level~3 Core while signaling pathways were encoded with the SBML ``qual'' package~\citep{chaouiya2013sbml}. Moreover, constraint-based models of genome-scale reconstructions were provided for each organism. Other pipelines have now been built, including ones that can systematically generate alternative models for different tissue types~\citep{wang2012reconstruction} and patient data~\citep{uhlen2017pathology}, an important step towards personalized medicine.
\textbf{Development, sharing, and re-use of genome-scale models of human metabolism}~~~~Constraint-based modeling approaches such as Flux Balance Analysis and its variants permit the use of whole-genome reconstructions together with experimental molecular phenotypes, in order to predict how mutations or different environments affect metabolism as well as predict drug targets and biomarkers~\citep{obrien2015}. With the availability of genome-scale metabolic reconstructions, the use of metabolic flux models at the same scale has been increasing~\citep{Bordbar2014a}. A recent development in the field has been the curation of consensus metabolic models, in particular for human metabolism~\citep{brunk2018}. Those community efforts rely on SBML for encoding and sharing the models, including annotations, which are crucial to being able to reuse the reconstructions later, and also for visual representation using the Layout~\citep{Gauges2015} and Rendering~\citep{Bergmann2018sbml} packages. The Flux Balance Constraint package~\citep{Olivier2018a} enables encoding of the information required for model optimization and flux calculation. Unambiguous encoding in SBML has been shown to be crucial for interpreting models and precisely computing fluxes~\citep{Ebrahim2015, Ravikrishnan2015critical}, and new validation tools for genome-scale metabolic models have been made available by the larger community~\citep[e.g., MEMOTE;][]{Lieven2020memote}.
\hrulefill
\newpage
% ======================================================================
\section*{Forthcoming challenges}
% ======================================================================
For nearly two decades, SBML has supported mathematical modeling in systems biology by helping to focus the efforts of the community and foster a culture of openness and sharing. The field is evolving rapidly, which presents challenges that the community and SBML must face.
The first challenge is to remain usable in the face of relentless growth in model sizes. One of the drivers of larger size is the rising popularity of genome-scale metabolic models~\citep{Bordbar2014a}, which can be produced semi-automatically~\citep{henry2010high}. Modeling approaches have also been developed to combine the use of several such models~\citep[e.g.,][]{bordbar2011multi}. It is reasonable to expect models of ecosystems to be produced soon (\eg microbiomes and their host). Model sizes will also increase as more models of tissues and organs are exchanged and reused, encouraged by the use of software packages that facilitate this approach, such as the open-source tools CHASTE~\citep{mirams2013chaste} and CompuCell3D~\citep{swat2012multi}. The challenge this presents is how to define, organize, and manage large models. Meeting the challenge will require a combination of novel approaches to model storage~\citep[e.g.,][]{Henkel2015combininga} and comparison~\citep[e.g.,][]{Scharm2016algorithm, Scharm2016comodi}, as well as more effective use of SBML Level~3 features. For example, the SBML Hierarchical Model Composition (``comp'') package~\citep{Smith2015} provides a way to encode models in SBML out of separate building blocks or from preexisting models; this can make larger models easier to structure and maintain, and it is a natural way to construct multiscale models. Similarly, the Arrays package may help to define and structure larger models by allowing models to be defined in a more compact form. % Methods are being developed for the efficient simulation of both SBML packages~\citep{watanabe2014hierarchical, watanabe2016efficient}.
A related challenge concerns human usability of SBML and similar XML-based formats. Though SBML is intended for software, not humans, to use directly, the desire for a text-based or spreadsheet-based equivalent is often voiced~\citep[e.g.,][]{Kirouac2019reproducibility}. Various answers have been developed in the form of text-based notations~\citep[e.g.,][]{Gillespie2006tools, Smith2009antimony} and spreadsheet conventions~\citep[e.g.,][]{Lubitz2016sbtab}, with bidirectional translators for SBML. These formats have undeniable appeal for many users and use cases, even though they do not capture the entirety of SBML (often having limited or missing facilities to express units, annotations, or SBML packages). Their chief drawback is that they become error-prone to use as model size increases. Graphical user interfaces~\citep[GUIs; e.g.,][]{Funahashi2003celldesignera, hoops2006copasi, Moraru2008virtual} can overcome this; software with GUIs can help with the cognitive burden of tracking large numbers of model elements. On the other hand, GUIs can be tedious to use when entering large models, the performance of some software does not scale well with increasing model sizes, and some cannot be controlled programmatically for automation purposes. A middle ground may be domain-specific modeling languages layered on top of programming languages such as Python~\citep[e.g.,][]{olivier2005modelling, Lopez2013programming}. However, these tend to appeal only to users who are comfortable with (or willing to take time to learn) the programming language used as a substrate. Overall, further innovation in this area would be welcome, both to help support SBML Level~3 packages and to help users cope with ever-increasing model sizes.
Because of the diversity of biological phenomena amenable to mathematical modeling, as well as their scales and properties, it is likely that a broad variety of modeling approaches will be added to every researcher's essential toolbox~\citep{Cvijovic2014bridging}. Methods such as multiagent and lattice approaches are coming into wider use to represent evolving cell populations, cell migration, and deformation. Some researchers are experimenting with solutions using existing SBML packages~\citep{watanabe2016efficient, varela2018epilog}. Modeling the development of tissues and organ function may also require combining these approaches with reaction-diffusion models, or multiphysics approaches~\citep{Nickerson2016human}. Population modeling will need to complement traditional instance-based systems if we want to take into account patient variability or information coming from single-cell measurements~\citep{Levin1997mathematical}. The coupling of different approaches within the same simulation experiment is also becoming more frequent. Biomolecular reactions modeled using ODEs, Poisson processes, and Flux Balance Analyses have been coupled in the first whole-cell model~\citep{Karr2015principles}. At the organ level, liver lobules have been modeled using a combination of metabolism and multi-agent models~\citep{schliess2014integrated}. Several approaches mixing modeling of cell mechanical properties and gene regulatory networks or signaling networks have been used to study morphogenesis~\citep[e.g.,][]{tanaka2015lbibcell}. The coupling of different approaches can be done within a single hybrid model, or each model can be simulated using different software and with dynamic synchronization at run time~\citep{mattioni2013integration}. 
Once again, the SBML ``comp'' package can play a role in supporting these approaches, but other methods and software will be needed in the future, as well as better support for coupling models at run time using, for example, SED-ML~\citep{waltemath2011reproducible}.
These developments are arising in a landscape where structural models are sometimes not the central object of study, and instead function as collections of integrated information. An example of this is RECON3D, a comprehensive human metabolic network with metabolite and protein structure information~\citep{brunk2018}. SBML will continue to have a pivotal role here too. When SBML was introduced, the state of modeling workflows and software tools was more primitive and it was natural that a model was self-contained. SBML-encoded models often had predefined parameter values (\eg as initial values for state variables or parameters for mathematical expressions), but today, modelers increasingly want to use the same model with different parameterizations, sometimes with parameter values expressed as distributions, lists, or ranges rather than unique values. A project may also use an ensemble of related models that differ in parameters or in turning some model elements on or off~\citep{kuepfer2007ensemble}. The semantic annotation of SBML elements also has become increasingly important, forming a bedrock for many of the analyses using SBML-encoded models. The growth in size and scope of annotations has recently led the modeling community to propose a standard way of storing annotations in separate linked files~\citep{Neal2019harmonizing}, relying on the COMBINE Archive format~\citep{bergmann2014combine} to bundle everything together. Other formats that can complement SBML have been developed, and further coordination and evolution will undoubtedly happen in the future. As mentioned above, SED-ML is a format that provides a way to encode what to do with a model, which complements SBML and compensates for its lack of features to define procedures. Finally, experimentation in integrating SBML more directly with other formats and data also continues.
For instance, preliminary work has shown that SBML can be enriched with SBOL~\citep{voigt2018sbmlme} to provide models of DNA components' behavior~\citep{Roehner2014a}, and conversely, ongoing work in supporting genome-scale models of metabolism and gene expression~\citep[known as \emph{ME-models}, ][]{Thiele2012multiscale} augments SBML with SBOL to more fully capture models for use with ME-modeling software. Future developments in modeling paradigms may require similar flexibility in how models are represented: some may be best served by implementing new SBML packages, others by extending existing packages, still others by combining SBML with other formats.
Besides the technical challenges, social and cultural challenges also exist for formats such as SBML. One is to continue raising awareness among researchers, software developers, and funders of the existence of SBML and related COMBINE standards. Some may not yet be using SBML simply because they are not aware of it, or of its recent addition of support for many modeling formalisms (Figure~\ref{level-3-diagram}). Raising awareness will require continual education and outreach, especially to students and early-career scientists. Awareness would be aided if journals and reviewers promoted the use of SBML and related formats more strongly in paper submission guidelines. Despite some progress in this area (discussed in the previous section), the lack of stronger demands by journals and reviewers is surely one reason authors are either unaware of software-independent formats or not motivated to publish their models in them.
In addition, usability of standard formats depends crucially on their implementation in software tools, and motivating this work is another challenge for SBML. A pivotal factor for the success of SBML has been the extensive software ecosystem, which provides relatively easy import and export of SBML from popular software systems. However, implementing full SBML compatibility in software is not a simple matter, and problems with compatibility in the software ecosystem can be a significant source of frustration. Improving the software requires continuous investment in tool development.
That, in turn, is related to a final challenge: obtaining and maintaining funding. By virtue of not being a native format of any particular software tool, a format such as SBML may require extra work to define by consensus, and then again for developers to implement in software---and still, it will lag behind the leading edge of research because exchange formats only become important after more than one software system has something to exchange. Funders may wonder whether the resources, time and effort spent on standards development would not be better applied to other goals. However, these costs must be weighed against the costs to a whole research field of \emph{not} having standards---and there are many such costs. To take one example, models in nonstandard formats are more difficult to review, verify, and reuse. Journal reviewers may not have access to the necessary software, or the software may not be well-tested, all of which increase the chances that the published model contains errors. Researchers can spend substantial time attempting to reproduce the results, only to fail. Worse, this is a repeating cost: failures to reproduce models are rarely published or publicized, which means an untold number of researchers may spend time (and research funding) on a futile effort. Funders recognize that too many research results are irreproducible, and have urged community action~\citep[e.g.,][]{Collins2014policy}. The continued development of exchange formats, such as SBML, is a crucial and cost-effective means to enable reproducible research.
% ======================================================================
\section*{Conclusion}
% ======================================================================
SBML and associated software libraries and tools have been instrumental in the growth of systems biology. As modeling and simulation grew in popularity, SBML allowed researchers to exchange and (re)use new models in an open, well-supported, interoperable format. SBML has made possible much of the research pursued by the authors of this article, and also helped us to structure our thoughts about our models and the biology they represent. Today, scientists can build, manipulate, annotate, store, reuse, publish, and connect models to each other and to basic data sources. In effect, SBML has turned models into a kind of data and transformed modeling in biology from an art to an exercise in engineering.
As the field of systems biology continues to grow and address emerging challenges, SBML will grow along with it. This evolution will (as it always has) depend on close cooperation between biologists and software developers. We hope that SBML will continue to be a source of inspiration for many researchers, especially those new to the field. In return, may they help develop the next generation of SBML to support more comprehensive, richer, and more diverse models, and expand the reach of systems modeling towards entire cells, organs, and organisms.
% ======================================================================
\section*{Acknowledgments}
We sincerely thank all current and past SBML users, developers, contributors, supporters, advisors, administrators, and community members. We give special thanks to the following people for contributions and support:
Jim Anderson,
Nadia Anwar,
Gordon Ball,
Duncan B\'{e}renguier,
Upinder Bhalla,
Fr\'{e}d\'{e}ric Y. Bois,
Benjamin Bornstein,
Richard Boys,
Ann Chasson,
Thomas Cokelaer,
Marco Donizelli,
Alexander D\"{o}rr,
Marine Dumousseau (Sivade),
Lisa Falk,
David Fange,
Ed Frank,
Ralph Gauges,
Martin Ginkel,
Nail Gizzatkulov,
Victoria Gor,
Igor Goryanin,
Ryan N. Gutenkunst,
Arnaud Henry,
Stefanie Hoffmann,
Duncan Hull,
Dagmar Iber,
Gael Jalowicki,
Henrik Johansson,
Akiya Jouraku,
Devesh Khandelwal,
Thomas B.~L. Kirkwood,
Victor Kofia,
Benjamin L. Kovitz,
Bryan Kowal,
Andreas Kremling,
Ursula Kummer,
Hiroyuki Kuwahara,
Anuradha Lakshminarayana,
Nicolas Le~Nov\`{e}re,
Thomas S. Ligon,
Adrian Lopez,
Timo Lubitz,
Peter Lyster,
Natalia Maltsev,
Jakob Matthes,
Joanne Matthews,
Tommaso Mazza,
Eric Minch,
Sebastian Nagel,
Maki Nakayama,
Poul M.~F. Nielsen,
German Nudelman,
Anika Oellrich,
Nobuyuki Ohta,
Michel Page,
Victoria Petri,
Ranjit Randhawa,
Veerasamy Ravichandran,
Elisabeth Remy,
Isabel Rojas,
Ursula Rost,
Jan D.~Rudolph,
Takayuki Saito,
Takeshi Sakurada,
Howard Salis,
Maria J. Schilstra,
Marvin Schulz,
Shalin Shah,
Daryl Shanley,
Tom Shimizu,
Jacky Snoep,
Hugh D. Spence,
Yves Sucaet,
Linda Taddeo,
Jose Juan Tapia,
Alex Thomas,
Jannis Uhlendorf,
Martijn P. van~Iersel,
Marc Vass,
Jonathan Webb,
Katja Wengler,
Benjamin Wicks,
Sarala Wimalaratne,
Haoran Yu,
Thomas Zajac,
W. Jim Zheng,
and Jason Zwolak.
The principal authors thank many funding agencies for their support of this work. F.B., A.D., M.H., T.M.H., S.M.K., B.O., and L.S., as well as SBML.org and its online resources, were supported by the National Institute of General Medical Sciences (NIGMS, US), grant \No R01-GM070923 (PI: Hucka). In addition, F.B. has been supported by the Bundesministerium f\"{u}r Bildung und Forschung (BMBF, DE), grant \No de.NBI~ModSim1, 031L0104A (PI: Ursula Kummer). M.L.B. has been supported by NIH (US) grants \No P41-GM103313 and R01-GM095485. A.D. has been supported by infrastructural funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Cluster of Excellence EXC 2124 Controlling Microbes to Fight Infections. A.F. was supported by the Grant-in-Aid for Young Scientists (B), grant \No 21700328 from JSPS KAKENHI (JP) to Keio University. J.F. was supported by National Institutes of Health (NIH, US) grant \No P41-GM103712 to the National Center for Multiscale Modeling of Biological Systems (MMBioS). H.H. was supported by the Biotechnology and Biological Sciences Research Council (BBSRC, UK) ``MultiMod'' project (grant \No BB/N019482/1). T.H. was supported by NIH (US) grant \No 5R35-GM119770-03 to the University of Nebraska--Lincoln. S.H. was supported by NIGMS (US) grant \No R01-GM080219. M.K. was supported by the Federal Ministry of Education and Research (BMBF, DE), research network Systems Medicine of the Liver (LiSyM), grant \No 031L0054, Humboldt-University Berlin (PI: K\"{o}nig). A.L. was supported by the BBSRC (UK) while working at the Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN), Newcastle University. C.M. was supported by the National Science Foundation (NSF, USA) under grants \No CCF-1748200 and CCF-1856740. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. I.M.
was supported by NIH grants \No P41-EB023912 and P41-GM103313. K.R. was supported by the Department of Biotechnology, Government of India (grant \No BT/PR4949/BRB/10/1048/2012). M.M.-S. was supported by the Intramural Research Program of NIAID, NIH (US). R.M.-S. was supported by the BBSRC (UK) ``MultiMod'' project (grant \No BB/N019482/1). B.P. was supported by NIH (US) grant \No GM57089 to the University of California, San Diego, and by the Novo Nordisk Foundation grant \No NNF10CC1016517. H.M.S. was supported by NIGMS (US) grant \No R01-GM123032 (PI: Sauro) and by the National Institute of Biomedical Imaging and Bioengineering (NIBIB, US) grant \No P41-EB023912 (PI: Sauro). J.C.S. was supported by NIGMS (US) grant \No P41-GM103313. M.S. was supported by the DDMoRe program (EU), Innovative Medicines Initiative Joint Undertaking under grant agreement 115156. N.S. was supported by BBSRC (UK) grant ``Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM)'', grant \No BB/M017702/1 (PI: Nigel~S. Scrutton). F.Z. was supported by the Intramural Research Program of NIAID, NIH (US).
We also thank the Google Summer of Code program\footnote{\url{https://summerofcode.withgoogle.com}} for support of SBML software development.
% ======================================================================
\section*{Author contributions}
S.M.K., D.W., M.K., F.Z., A.D., C.C., and M.H. wrote the bulk of the manuscript. Together with F.T.B., A.M.F., C.G., T.H., S.H., R.M.-S., S.M., I.M., C.M., A.N., B.O., S.S., J.C.S., L.S., M.S., D.T., L.W., and D.J.W., they also wrote and/or edited specifications for SBML Level~3 Core and the Level~3 packages. M.L.B., K.B., J.F., H.G., T.M.H., Y.I., W.L., A.L., D.L., E.M., C.P., K.R., N.R., C.S., B.S., J.S., J.C.S., N.S., N.T., and J.W. contributed proposals for SBML Level~3 and/or are past or current members of the SBML Team. M.M.-S., H.M.S., B.P., H.B., H.K., A.F., H.H., J.C.D., and M.H. were principal investigators (or the equivalent, depending on the institution) for grants supporting SBML development.
The SBML community members listed in the supplemental note supported the development of SBML Level~3 through participation in discussions, commentary on specification documents, and/or implementation of SBML-using software.
T.H. has served as a shareholder and/or has consulted for Discovery Collective, Inc.
\clearpage
\bibliographystyle{sbml-msb}
\bibliography{literature}
%\printbibliography
\end{document}
% ======================================================================
% Please leave the following for Emacs users:
% Local Variables:
% mode: latex
% TeX-master: "main"
% End:
% ======================================================================