Replication Package for "Function-as-a-Service Performance Evaluation: A Multivocal Literature Review" (CC BY 4.0)

This replication package contains the raw dataset, scripts to produce all plots, and documentation on how to replicate our MLR study on FaaS performance evaluation.

Paper

J. Scheuner and P. Leitner, "Function-as-a-Service Performance Evaluation: A Multivocal Literature Review," Journal of Systems and Software, 2020.


Abstract

Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear but we are currently lacking a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storages as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of experiments presented in the surveyed studies. We conclude with a discussion of gaps in literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.

Citation

@article{scheuner:20-jss,
  author = {Scheuner, Joel and Leitner, Philipp},
  journal = {Journal of Systems and Software},
  doi = {10.1016/j.jss.2020.110708},
  title = {Function-as-a-Service Performance Evaluation: A Multivocal Literature Review},
  year = {2020}
}

Dataset

All extracted data originating from academic and grey literature studies is available as machine-readable CSV (./data/faas_mlr_raw.csv) and human-readable XLSX (./data/faas_mlr_raw.xlsx). The Excel file also contains all 700+ comments with guidance, decision rationales, and extra information. It is configured with a filtered view to display only relevant sources but contains the complete data (i.e., including discussion of sources considered not relevant in our context).
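For quick programmatic checks, the CSV can be read with any standard CSV parser. A minimal Python sketch follows; note that the column names below are invented for illustration and will differ from the real header in ./data/faas_mlr_raw.csv:

```python
import csv
import io

# Hypothetical excerpt mimicking the shape of data/faas_mlr_raw.csv;
# the real column names differ (see the XLSX for the full header).
sample = """id,literature_type,platform,relevant
A01,academic,AWS Lambda,yes
G01,grey,AWS Lambda,yes
A02,academic,Azure Functions,no
"""

rows = list(csv.DictReader(io.StringIO(sample)))
relevant = [r for r in rows if r["relevant"] == "yes"]
print(len(relevant))  # 2
```

Replacing the inline sample with `open("data/faas_mlr_raw.csv")` reads the real dataset the same way.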

Interactive GSheet

The latest version is also available online as an interactive Google spreadsheet (GSheet): https://docs.google.com/spreadsheets/d/1EK9yg9fMZIDybnbi7thsnBx1NdqDmkW86sMygH9r8q8

The following steps describe how to use interactive querying:

  1. Choose Data > Filter views > Save as temporary filter view

     (Screenshot: filter view)

  2. Explore the dataset using the GSheet sort & filter functionality (e.g., discover open-source studies):

     (Screenshot: filtering for open-source studies)

Academic Literature Search Queries

The query_academic directory contains all search results in the *.bib format. The following figure summarizes all sources:

(Figure: MLR process for academic literature)

Manual Search for Academic Literature

Manual Search consists of screening the following related publications:

  • a) J. Kuhlenkamp and S. Werner, “Benchmarking FaaS platforms: Call for community participation,” in 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 189–194, 2018.
  • b) J. Spillner and M. Al-Ameen, “Serverless literature dataset,” 2019.
  • c) V. Yussupov, U. Breitenbücher, F. Leymann, and M. Wurster, “A systematic mapping study on engineering function-as-a-service platforms and tools,” in Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pp. 229–240, 2019.

Database Search

For the Database Search strategy, we use the following search string for all sources:

(serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda

We only consider publications after 2015-01-01 by either configuring the search engine appropriately or adding the following suffix to the search string: AND year>=2015.

Query Motivation
  • (serverless OR faas) find studies in the area of serverless computing and Function-as-a-Service
  • (performance OR benchmark) find studies related to performance and performance benchmarking
  • experiment targets empirical research. We are interested in measurement-based approaches but aim to exclude pure modeling research, FaaS surveys, FaaS feature comparisons, etc. We assume that academic papers mention their research methodology.
  • lambda narrows the search to actual FaaS platforms (i.e., AWS Lambda) or to studies referring to 'lambda functions' (independently of the provider, as used by Oakes et al.). This avoids the large number of false positives from other domains experienced by Yussupov et al.
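The composition of the search string from these keyword groups can be sketched programmatically (illustrative only; the resulting string is what matters):

```python
# Build the database search string from its keyword groups:
# multi-term groups become OR-expressions, single terms stay bare.
groups = [
    ["serverless", "faas"],
    ["performance", "benchmark"],
    ["experiment"],
    ["lambda"],
]
query = " AND ".join(
    f"({' OR '.join(g)})" if len(g) > 1 else g[0] for g in groups
)
print(query)
# (serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda
```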
Query Adaptations

We performed the following adaptations of the search string:

  • Without lambda keyword: Omitting the keyword lambda resulted in too many false positives, with a total of 4805 matches (vs 691). We used an initial training set of 43 publications and found that 100% of them contain the string "lambda" in their fulltext.
  • With double quotes ("): Using double quotes includes only exact string matches and resulted in a total of 376 publications (vs 691), or 357 after duplicate removal. We found that this query is too narrow, as it misses 6 relevant publications that are covered by our chosen search string.
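The training-set check above (100% contain "lambda") boils down to a case-insensitive substring test over the fulltexts. A sketch with made-up snippets standing in for the real publications:

```python
# Made-up fulltext snippets standing in for the 43 training publications.
fulltexts = [
    "We measure cold starts on AWS Lambda and report tail latencies.",
    "Our benchmark deploys lambda functions across two FaaS providers.",
    "Lambda invocation overhead dominates short-running workloads.",
]
share = sum("lambda" in t.lower() for t in fulltexts) / len(fulltexts)
print(f"{share:.0%}")  # 100%
```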
Database Search Engines

We use the advanced query syntax of the following academic research databases:

| ID | Research Database | Advanced Query | Engine Docs |
| --- | --- | --- | --- |
| acm | ACM Digital Library | https://dlnext.acm.org/search/advanced | Sidebar |
| ieee | IEEE Xplore | https://ieeexplore.ieee.org/search/advanced/command | Link |
| wos | ISI Web of Science* | http://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&search_mode=GeneralSearch | Link |
| sd | Science Direct | https://www.sciencedirect.com/search/advanced | Link |
| springer | SpringerLink | https://link.springer.com/search | Link |
| wiley | Wiley InterScience | https://onlinelibrary.wiley.com/search/advanced | PDF |
| scopus | Scopus* | https://www.scopus.com/search/form.uri?display=advanced | Link |

* Requires institutional (e.g., through university VPN) or personal account

Initial Search Details

The following table summarizes the initial search results and provides the exact query string and direct link for all databases. The search was performed on 2019-10-21, and all results are available as ID.bib under ./data/query_academic.

| ID | # | Exact Query String and Link |
| --- | --- | --- |
| acm | 126 | [[All: serverless] OR [All: faas]] AND [[All: performance] OR [All: benchmark]] AND [All: experiment] AND [All: lambda] AND [Publication Date: (01/01/2015 TO *)] |
| ieee | 215 | ((("Full Text & Metadata":serverless) OR ("Full Text & Metadata":faas)) AND (("Full Text & Metadata":performance) OR ("Full Text & Metadata":benchmark)) AND ("Full Text & Metadata":experiment) AND ("Full Text & Metadata":lambda)) |
| wos* | 3 | ALL=((serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda) |
| sd | 35 | (serverless OR faas) AND (performance or benchmark) AND experiment AND lambda |
| springer | 130 | (serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda |
| wiley | 149 | (serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda |
| scopus | 33 | (ALL(serverless) OR ALL(faas)) AND (ALL(performance) OR ALL(benchmark)) AND ALL(experiment) AND ALL(lambda) AND PUBYEAR > 2015 |

* Requires manual steps: 1) copy the query string into the advanced search field 2) add custom year range 2015 - 2019
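To sanity-check the per-database counts, the entries in each exported ID.bib file can be counted by matching BibTeX entry headers. A minimal sketch on an inline sample (the real files live under ./data/query_academic):

```python
import re

# Inline stand-in for an exported ID.bib file.
sample_bib = """
@article{key1, title = {First}}
@inproceedings{key2, title = {Second}}
@misc{key3, title = {Third}}
"""

# One entry per header line starting with @<type>{
count = len(re.findall(r"^@\w+\{", sample_bib, flags=re.MULTILINE))
print(count)  # 3
```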

Export Instructions

The following instructions show how query results from the research databases are exported into *.bib files:

  • acm: 1) choose 100 per page 2) select all 3) export citation 4) choose bibtex
  • ieee: 1) choose 100 per page 2) select all 3) export > citations > bibtex 4) copy/paste into ieee.bib file
  • wos: 1) select page 2) Export > Other file formats 3) Choose Bibtex and author,source,title
  • sd: 1) display 100 per page 2) Export > Export citations to bibtex
  • springer: 1) Download results as CSV 2) Open CSV in Excel and copy all DOIs 3) Paste DOIs into Zotero's "Add item by identifier" 4) Right-click selection and export all to bibtex
    • SpringerLink does not support bibtex export and therefore, we followed a workaround described here. Notice that the import/matching could take a while until the indices and paper counts in the list are updated properly.
  • wiley: 1) Select all 2) Export citations > bibtex 3) repeat for all pages 4) merge all result files
  • scopus: 1) choose 100 per page 2) select all 3) Export > Bibtex
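The merge step for Wiley (one export per results page) amounts to concatenating the files and dropping duplicate citation keys. A sketch on invented inline page exports, assuming one entry per line for simplicity:

```python
import re

# Invented per-page BibTeX exports; adjacent pages may overlap.
pages = [
    "@article{smith2019, title = {A}}\n@article{lee2018, title = {B}}",
    "@article{lee2018, title = {B}}\n@article{kim2017, title = {C}}",
]

seen, merged = set(), []
for page in pages:
    for entry in page.splitlines():
        # Extract the citation key between "@<type>{" and the first comma.
        key = re.match(r"@\w+\{([^,]+),", entry).group(1)
        if key not in seen:
            seen.add(key)
            merged.append(entry)
print(len(merged))  # 3
```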

Grey Literature Search Queries

The query_grey directory contains all search results in the formats *.pdf and *.html. The following figure summarizes all sources:

(Figure: MLR process for grey literature)

For the Web Search strategy, we use the following search string for all sources:

(serverless OR faas) AND (performance OR benchmark)

We perform one Google search with the exact search string used for academic literature and an additional Google search adjusted for more informal grey literature by omitting the keywords experiment and lambda.

Web Search Engines

We use the following web search engines:

| ID | Search Engine | URL |
| --- | --- | --- |
| google | Google Web Search | https://www.google.com/ |
| twitter | Twitter Search | https://twitter.com/ |
| hackernews | Hacker News Algolia Search | https://hn.algolia.com/ |
| reddit | Reddit Search | https://www.reddit.com/search |
| medium | Medium Search | https://medium.com/search |

Web Search Details

The following table summarizes the number of relevant studies and provides the exact query string and direct link for all web searches; the search date for each query is listed in the table, and all results are available under ./data/query_grey. Notice that the numbers of relevant studies are already de-duplicated: we found 18 relevant studies through the google1 search, and the +7 studies from the google2 search only include new, non-duplicate studies. Also notice that, with the exception of Google Search, advanced queries including logical expressions (e.g., OR) are not supported; we therefore manually compose four subqueries to implement an equivalent search string.

| ID | Date | # | Exact Query String and Link |
| --- | --- | --- | --- |
| google1 | 2019/11/26 | 18 | ("serverless" OR "faas") AND ("performance" OR "benchmark") AND "experiment" AND "lambda" after:2015-01-01 |
| google2 | 2019/11/26 | +7 | ("serverless" OR "faas") AND ("performance" OR "benchmark") after:2015-01-01 |
| twitter1 | 2019/12/03 | +2 | faas benchmark |
| twitter2 | 2019/12/03 | +3 | serverless benchmark |
| twitter3 | 2019/12/03 | +0 | faas performance |
| twitter4 | 2019/12/03 | +3 | serverless performance |
| hackernews1 | 2019/12/06 | +0 | faas benchmark |
| hackernews2 | 2019/12/06 | +0 | serverless benchmark |
| hackernews3 | 2019/12/06 | +0 | faas performance |
| hackernews4 | 2019/12/06 | +1 | serverless performance |
| reddit1 | 2019/12/06 | +0 | faas benchmark |
| reddit2 | 2019/12/06 | +0 | serverless benchmark |
| reddit3 | 2019/12/06 | +3 | faas performance |
| reddit4 | 2019/12/06 | +0 | serverless performance |
| medium1 | 2020/02/18 | +0 | faas benchmark |
| medium2 | 2020/02/18 | +2 | serverless benchmark |
| medium3 | 2020/02/18 | +0 | faas performance |
| medium4 | 2020/02/18 | +1 | serverless performance |
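The four subqueries per engine are simply the cross product of the two keyword groups, and summing the de-duplicated increments in the table gives the total number of relevant studies found via web search. As a sketch:

```python
# Cross product of the two keyword groups yields the four subqueries,
# in the same order as the table rows per engine.
subqueries = [
    f"{term} {metric}"
    for metric in ("benchmark", "performance")
    for term in ("faas", "serverless")
]
print(subqueries)
# ['faas benchmark', 'serverless benchmark', 'faas performance', 'serverless performance']

# De-duplicated increments from the table (google1 = 18, then +N per search).
increments = [18, 7, 2, 3, 0, 3, 0, 0, 0, 1, 0, 0, 3, 0, 0, 2, 0, 1]
print(sum(increments))  # 40
```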

Export Instructions

We used non-personalized private search mode through private Google Chrome browser windows wherever possible. Notice that the number of search results for Google Search is only a rough estimate and typically changes (dramatically) when reaching the last page [1]. Therefore, we used double quotes (") for exact matching (i.e., to exclude Google's fuzzy search results) and to achieve more accurate result estimates. Further, Google filters out highly redundant search results by default. For the Google queries, we repeated the search with redundancy filtering disabled and kept both versions (e.g., google2 and google2.2; google4.2 exists but google4 does not, because the results with omitted entries have fewer pages) [2].

We used the Google Chrome export options for PDF and HTML in combination with the following steps:

  • google: 1) Paste link in private browser mode 2) Settings > Search Settings: choose region "United States" and 100 results per page
  • twitter: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
  • hackernews: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
  • reddit: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
  • medium: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear

The authors also saved PDF or HTML files of all relevant articles in case some sources become unavailable. However, we cannot publish these website copies for legal reasons.

[1] Google Support: "The count of the number of search results is incorrect"; Search Engine Land: "Why Google Can't Count Results Properly"

[2] Google Support: "In order to show you the most relevant results, we have omitted some entries"

Plots

An up-to-date R toolchain, preferably with RStudio, is required. Install the packages imported at the top of each *.R file (e.g., install.packages("ggplot2")) from the official CRAN package repository; RStudio automatically detects to-be-installed packages. A dependency installation script is provided under plots/install_dependencies.R.

  1. Open the RStudio project faas_mlr.Rproj (or alternatively set the R working directory to ./plots)

  2. Run a given *.R file to produce the corresponding *.pdf plot. For example, characteristics.R produces characteristics.pdf:

    cd plots
    Rscript characteristics.R

The plots follow the economist color scheme in the ggthemes package.

Dependencies

| Software | Version |
| --- | --- |
| R | 4.2.0 |
| tidyr | 1.1.0.9000 (dev) |
| vctrs | 0.3.2.9000 (dev) |
| dplyr | 1.0.1 (dev) |
| forcats | 0.5.0 |
| ggplot2 | 3.3.2 |
| ggthemes | 4.2.0 |

NOTE: An issue in vctrs caused error messages such as `... is not empty` (2020-06) but was fixed in July 2020; the plots work with the versions installed above (dev versions as of 2020-07-19).