diff --git a/joss.05562/10.21105.joss.05562.jats b/joss.05562/10.21105.joss.05562.jats new file mode 100644 index 0000000000..f59111f215 --- /dev/null +++ b/joss.05562/10.21105.joss.05562.jats @@ -0,0 +1,407 @@ + + +
+ Journal of Open Source Software (JOSS)
+ ISSN: 2475-9066
+ Publisher: Open Journals
+ Article ID: 5562
+ DOI: 10.21105/joss.05562
+ Title: DARTS: The Data Analysis Remote Treatment Service
+ Author: Emmanuel Farhi (https://orcid.org/0000-0002-8508-8106)
+ Affiliation: Synchrotron SOLEIL, France
+ Published: 31 May 2023
+ Volume 8, Issue 90, Page 5562
+ License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0)
+ Copyright: 2022, The article authors
+ Keywords: remote desktop; data analysis as a service (DAaaS); qemu/kvm; virtualization
+ + Summary

This paper presents the Data Analysis Remote Treatment Service + (DARTS), an open-source remote desktop service that launches on-demand + virtual machines in the cloud and displays them in a browser. The + resulting environments can be used, for example, for scientific data + treatment. DARTS can be deployed and configured within minutes on a + server, and can run any virtual machine. The service is fully + configurable, supports GPU allocation, and is scalable and resilient + within a farm of servers. DARTS is designed around simplicity and + efficiency. It targets laboratories and facilities that wish to + quickly deploy remote data analysis solutions without investing in + complex hypervisor infrastructures. DARTS is operated at Synchrotron + SOLEIL, France, in order to provide a ready-to-use data treatment + service for X-ray experiments.

+
+ + Statement of need +

Synchrotron radiation facilities and other large-scale research + facilities generate increasingly massive and complex amounts of data + due to the nature of their experiments. This trend, referred to as the + “data deluge” + (Wang + et al., 2018), is closely linked to the evolution of + technological building blocks such as detectors, storage, network, and + computing capability.

+

To overcome this challenge, a sensible solution is to provide + suitable software on powerful computers with interactive remote + access, without the need for data transportation. By doing so, + researchers can efficiently access and analyse their data without + requiring expensive local hardware or software. Data analysis is a + vital preliminary step in the production of scientific publications, + which are the actual metric by which research facilities are + evaluated for their societal impact.

+

While the Jupyter ecosystem + (Kluyver + et al., 2016; + Randles + et al., 2017) is now widely used for scientific data analysis, + it still requires users to have basic knowledge of commands and + scripting, and does not allow launching full GUI applications. + Alternatively, a number of commercial solutions exist, such as Amazon + WorkSpaces + (Amazon + Web Services, 2023), FastX + (StarNet + Com. Santa Clara CA, 2023), and NX/NoMachine + (NoMachine, + 2023). Other community-driven software exists, such as the VISA + platform + (Caunt + Stuart, 2021), the ISIS Data Analysis as a Service + (Frazer + Barnsley, 2016), and the CoESRA service + (Guru + et al., 2016), but none of them is both fully open-source and easy + to install and deploy.

+

The Data Analysis Remote Treatment Service (DARTS) is a + lightweight, on-demand, cloud service to instantiate and display + ready-to-use complete scientific software environments.

+
+ + Implementation +

The conceptual design of the Data Analysis Remote Treatment Service + (DARTS) is based on the following sequential steps:

+ + +

Identify a user and computing requirements from a web form + (landing page).

+
+ +

Launch a copy of a master virtual machine.

+
+ +

Display it in a browser.

+
+
+

The DARTS service starts from the landing page, in which a user + enters information (credentials, computing requirements) and selects + one of the available environments (from the + machines.conf file). This information is + collected by the main script + (qemu-web-desktop.pl), which imports the main + configuration config.pl and handles all + service steps (instantiation, monitoring, self-cleaning). A + snapshot of the selected master virtual machine environment is + created to hold user-level changes in the instance. It is then + started and attached to the QEMU embedded VNC server. A start-up + configuration script can be injected via + virt-customize + (Jones, + 2011) to be executed during the boot process. A websocket + exposes the internal VNC port as a URL, which is displayed with noVNC. The + result page is generated with the proper URL for the user to connect. + The performance of the virtualization layer reaches native speed for + both CPU and GPU, as well as for disk and network.
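
As an illustration only, the snapshot, boot-script, and display steps described here can be sketched as command lines assembled in Python. This is not the code of qemu-web-desktop.pl: the function names, file names, memory and CPU defaults, and the /novnc/vnc.html path are assumptions made for the example, and the sketch assumes QEMU's built-in VNC websocket listener where DARTS may rely on a separate websocket proxy. The commands are only built and printed, never executed.

```python
import shlex
from urllib.parse import urlencode

# Hypothetical helpers for illustration; they are not part of DARTS.


def snapshot_cmd(master, snapshot):
    """Build a qemu-img command creating a copy-on-write overlay of the master image."""
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", master, "-F", "qcow2", snapshot]


def firstboot_cmd(snapshot, script):
    """Build a virt-customize command installing a script to run at first boot."""
    return ["virt-customize", "-a", snapshot, "--firstboot", script]


def qemu_cmd(snapshot, display, memory_mb=4096, cpus=2):
    """Build a QEMU command attached to the embedded VNC server.

    The websocket port is set explicitly to 5700 + display, which is
    also QEMU's default when a bare `websocket` option is given.
    """
    return ["qemu-system-x86_64", "-enable-kvm",
            "-m", str(memory_mb), "-smp", str(cpus),
            "-drive", f"file={snapshot},format=qcow2",
            "-vnc", f":{display},websocket={5700 + display}"]


def novnc_url(host, display):
    """Build the noVNC URL pointing the browser at the websocket port.

    The /novnc/vnc.html location is an assumed install path.
    """
    query = urlencode({"host": host, "port": 5700 + display,
                       "autoconnect": "true"})
    return f"http://{host}/novnc/vnc.html?{query}"


if __name__ == "__main__":
    # Assumed file names and host, for demonstration only.
    steps = [
        snapshot_cmd("master.qcow2", "session.qcow2"),
        firstboot_cmd("session.qcow2", "startup.sh"),
        qemu_cmd("session.qcow2", display=1),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
    print(novnc_url("server.example.org", 1))
```

Keeping the overlay separate from the master image means user-level changes stay in the per-session file, so cleaning up an instance amounts to deleting that file.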

+

Relying on a stable software stack (Apache2, Perl, QEMU) with + limited dependencies, DARTS is easy to deploy and operate. In + practice, the only DARTS-related maintenance action consists of adding + or updating the virtual machines. Thanks to this simplicity, DARTS + requires only a fraction of a single staff member's time for its + administration.

+
+ + Research applications +

DARTS is especially suited for small to medium research + laboratories and facilities willing to quickly deploy a remote data + analysis infrastructure, with minimal maintenance.

+

At Synchrotron SOLEIL, the service has been operated + continuously since 2020 for our users on two servers equipped with GPUs + (Farhi, + 2023). Our current production environments are a default Debian + system holding X-ray data treatment software (currently 631 scientific + applications and libraries), a reduced system meant to be distributed + to the users as they leave the facility, and a Windows 10 system with + commercial software. Our Debian images are built automatically via a + set of shell scripts + (Picca + & Farhi, 2022). This choice is meant to minimize our + maintenance effort. These images mount a persistent user folder (also + accessible via a JupyterHub service), as well as the experimental data + storage via NFS, CIFS/Samba, and SSHFS. In addition, information from + the authentication service (LDAP) is used to customize each instance + and install specific files and applications on top of existing master + virtual machines.

+
+ + Author contribution statement +

Conceptualization, coding, development and paper writing by + Emmanuel Farhi.

+
+ + Acknowledgements +

We thank the members of the Data Reduction and Analysis Group at + Synchrotron SOLEIL, and particularly Frédéric-Emmanuel Picca for his + continuous support during the development of this project. We also + thank Roland Mas, from the GNURANDAL company, for the Debian + packaging. This project has received support from the European Union’s + Horizon 2020 research and innovation programme under grant agreement + No 957189 “BIG-MAP” + (Tejs + Vegge, 2020).

+
+ Wang, Chunpeng; Steiner, Ullrich; Sepe, Alessandro (2018). Synchrotron big data science. Small, 14(46), 1802291. https://onlinelibrary.wiley.com/doi/abs/10.1002/smll.201802291. doi:10.1002/smll.201802291
+ Kluyver, Thomas; Ragan-Kelley, Benjamin; Pérez, Fernando; Granger, Brian; Bussonnier, Matthias; Frederic, Jonathan; Kelley, Kyle; Hamrick, Jessica; Grout, Jason; Corlay, Sylvain; Ivanov, Paul; Avila, Damián; Abdalla, Safia; Willing, Carol; Jupyter development team (2016). Jupyter notebooks – a publishing format for reproducible computational workflows. In Loizides, Fernando; Schmidt, Birgit (Eds.), Positioning and power in academic publishing: Players, agents and agendas. IOS Press, pp. 87–90. https://eprints.soton.ac.uk/403913/. doi:10.3233/978-1-61499-649-1-87
+ Randles, Bernadette M.; Pasquetto, Irene V.; Golshan, Milena S.; Borgman, Christine L. (2017). Using the Jupyter notebook as a tool for open science: An empirical study. 2017 ACM/IEEE joint conference on digital libraries (JCDL), pp. 1–2. doi:10.1109/JCDL.2017.7991618
+ Jones, Richard W. M. (2011). Virt-customize. https://www.libguestfs.org/virt-customize.1.html
+ Picca, Frédéric; Farhi, Emmanuel (2022). SOLEIL infra-config. https://gitlab.com/soleil-data-treatment/infra-config
+ Tejs Vegge et al. (2020). BIG-MAP. EU H2020 grant agreement no 957189. https://cordis.europa.eu/project/id/957189. doi:10.3030/957189
+ StarNet Com. Santa Clara CA, USA (2023). StarNet FastX remote linux x windows. https://www.starnet.com/fastx/
+ Amazon Web Services, Inc. (2023). Amazon WorkSpaces. https://aws.amazon.com/fr/workspaces
+ NoMachine (2023). NX/NoMachine remote access for everybody. https://www.nomachine.com/
+ Caunt Stuart et al. (2021). Virtual infrastructure for scientific analysis. https://visa.readthedocs.io
+ Frazer Barnsley; Tom Griffin; Brian Matthews (2016). Building a prototype data analysis as a service: The STFC experience. In Richter, Tobias (Ed.), NOBUGS 2016 proceedings - new opportunities for better user group software. ESS, pp. 23–28. https://isis.analysis.stfc.ac.uk/. doi:10.17199/NOBUGS2016.65
+ Guru, Siddeswara; Hanigan, Ivan C.; Nguyen, Hoang Anh; Burns, Emma; Stein, John; Blanchard, Wade; Lindenmayer, David; Clancy, Tim (2016). Development of a cloud-based platform for reproducible science: A case study of an IUCN red list of ecosystems assessment. Ecological Informatics, 36, 221–230. ISSN 1574-9541. https://www.sciencedirect.com/science/article/pii/S1574954116301182. doi:10.1016/j.ecoinf.2016.08.003
+ Farhi, Emmanuel (2023). DARTS: The data analysis remote treatment service. https://data-analysis.synchrotron-soleil.fr/qemu-web-desktop/