The inaugural conference, R / Pharma 2018, held on August 16th and 17th at Harvard University, attracted representatives from academia, government, and industry.
+
+
+
+
+
+
+
Date
+
+ Aug 16, 2018 1:00 PM — Aug 17, 2018 3:00 PM
+
+
+
+
+
+
+
+
+
+
+
+
+
Event
+
+
+ 2018 Conference
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Location
+
Harvard University
+
+
+
1737 Cambridge St, Cambridge, MA 02138
+
+
+
+
+
+
+
+
+
+
+
+
The R / Pharma conference began as a grass-roots initiative led by data scientists working in
+the pharmaceutical industry to promote the use of R in Pharma, and to establish and share
+best practices. The founding members organized the project as an R Consortium working
+group, and undertook the ambitious task of launching an annual conference envisioned as a
+relatively small, collegial, industry-oriented event with a strong scientific program.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Keynotes (https://bit.ly/2SdvMNd) were presented by Lilliam Rosario of the FDA’s
+Center for Drug Evaluation and Research, Michael Lawrence of Genentech/Roche and the R
+Core Group, and Max Kuhn and Joe Cheng from RStudio.
+Over forty-five talks and workshops were delivered with topics ranging from reproducible research, regulatory constraints and considerations, and package administration
+to scaling R for production. A considerable number of talks emphasized the development
+of production-grade Shiny applications. Although considerably larger than most of the
+Shiny applications developed, Roche’s 500,000 line application, which was presented at
+the conference and subsequently written up in a post (https://bit.ly/2NeXKFJ), reflects a
+common use case.
+
Organizing Committee
+
Melvin Munsaka - AbbVie; Bella Feng and Min Lee - Amgen; Eric Nantz - Eli Lilly; Elena
+Rantou and Paul Schuette - FDA; Elizabeth Hess - Harvard; James Black, Reinhold Koch,
+and Michael Lawrence - Genentech / Roche; Edward Lauzier - Merck; Michael Blanks -
+PPD; Phil Bowsher - RStudio; and Harvey Lieberman - Sanofi.
+
Program Committee
+
Co-chairs for the program committee were Bella Feng - Amgen, John Sims - Pfizer, and Ryan
+Benz - SocalBioinformatics.
+
Program committee leads included: Melvin Munsaka - AbbVie; Robert Engle - Biogen;
+Elena Rantou and Paul Schuette - FDA; Elizabeth Hess - Harvard University; James Black
+and Reinhold Koch - Genentech/Roche; Paulo Bargo - Johnson & Johnson; Eric Nantz - Eli
+Lilly; Edward Lauzier - Merck; Xiao Ni - Novartis; Thomas Tensfeldt - Pfizer; and Harvey
+Lieberman - Sanofi.
Major themes addressed at the conference were Shiny, reproducible research, package administration, scaling R for production, and using R in a regulatory environment.
It’s no secret that there are few industries more competitive than the pharmaceutical
+industry. Big money placed on long-shot bets for block-buster drugs where being
+first makes all the difference means a constant struggle to gain a competitive edge.
+So, you might find it surprising that the inaugural R / Pharma Conference held this
+past August on the Harvard campus in a very classy auditorium was all about collaboration.
+
Some might also find it surprising that data scientists from competitive companies
+would gather to share information, but this is quite common. I have seen it before
+in other competitive industries, for example in IEEE-led standards initiatives,
+where engineers gather to forge a common technology. Not only is there the human
+need to share and learn from peers (and also brag a little), there is a larger
+force at play: a kind of market clearing operation where experts gather to gain
+as much of an advantage as they can by ensuring that no easily exploitable
+arbitrage opportunities remain.
+
It was a surprise, though (and I think a source of general amusement as the
+conference proceeded), that nearly every talk seemed to be about Shiny.
+Looking back, it is clear that it should not have been: 49% of the abstracts
+explicitly mention Shiny. This word cloud was built from the abstract submissions.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Shiny is basically a technology for sharing complex information across multiple
+organizations and stakeholders with different skill sets. Shiny, too, is all
+about collaboration. For a look into the large, production-grade Shiny app,
+bioWARP, see Sebastian Wolf’s recent post.
+
Other major themes addressed at the conference were: reproducible research,
+package administration, scaling R for production, and using R in a regulatory
+environment. This last theme was underscored by a strong FDA presence.
+Lilliam Rosario from the FDA Center for Drug Evaluation & Research delivered
+the opening keynote, in which she addressed the regulatory role of CDER and
+the use of R. FDA speaker Mat Soukup spoke about the need to transcend the
+compartmentalized culture common in medical research, and how open-source tools
+are helpful in working towards this goal. He explicitly noted along the way
+that the FDA does not specify what software may be used. The third FDA speaker,
+Paul Schuette, filled in some details associated with topics raised by Rosario
+and talked about the use of R and Shiny at CDER. Along these same lines,
+Andy Nicholls from GSK conducted a well-attended and very informative workshop
+on The Challenges of Validating R. You can find Andy’s slides here.
+
Other keynote speakers were Max Kuhn, who talked about Modeling in the tidyverse
+(slides here); Joe Cheng, who described how to use Shiny responsibly in pharma
+(slides here); and Michael Lawrence, who spoke about enabling open-source
+analytics in the enterprise.
+
My very biased impression was that R / Pharma was an unqualified success at
+accomplishing the major objectives of bringing together data scientists and
+statisticians working in the pharmaceutical industry, and of presenting a
+high-quality program that explored several issues relating to the production use of
+R in a regulatory environment.
+
The following chart shows that representatives from quite a few pharmaceutical
+companies attended in spite of organization problems that artificially limited
+the overall number of attendees to about 140.
The conference is a relatively small, scientifically and
+industry-oriented, collegial event focused on the use of R in the development of pharmaceuticals.
+
R/Pharma virtual will run Oct 24-26 with workshops the week prior. Registration for conference and workshops is now open.
R/Pharma 2022 will be held as a free virtual conference from November 8-10, 2022 with workshops running the week before.
+
More details on keynote speakers and call-for-talks will follow shortly.
+
R/Pharma 2020 and 2021 were huge successes as virtual events with over 2000 registrations for the 2021 event. We decided to keep the 2022 conference virtual in order to include as many participants as possible. For 2023 we are considering a hybrid event.
R/Pharma 2021 featured three discussion panels, which were held at the end of each conference day. The panels were not recorded and the only way you could hear the amazing discussions was to attend R/Pharma.
+
Validation
+
November 2nd 2021, 13:50 ET
+
+
James Black (Roche, Moderator)
+
Bill Ellis (Novartis) Bill is a senior leader at Novartis, developing the next generation of SCEs for GxP use.
+
Ellis Hughes (GSK) Ellis maintains valtools, an R package that helps to layer validation relevant documentation on top of R packages. Ellis is a known opinion leader on validation at R/Pharma (2019 talk) and (2020 talk).
Andy Nicholls (GlaxoSmithKline) Andy is on the R Validation Hub team, a former lead for Mango’s ValidR, and a respected contributor to the discussion around ‘what is validation’, including co-authoring a whitepaper.
+
Coline Zeballos (Roche) Coline leads the R Strategy in Roche Pharma Informatics, and has worked closely with people across Roche Quality, Data Science and Informatics to develop a validation approach that scales from internal packages to CRAN. Coline is speaking on this topic at R/Pharma 2021.
+
Yilong Zhang (Merck) Yilong is active on valtools and the R Validation Hub. He also co-authored a paper on the validation of internal packages at Merck.
During R/Pharma 2021 we were excited to host 14 free workshops along with the conference. Workshops ran for 2-4 hours and were hosted by members of the community. The workshops are one of the highlights of R/Pharma and we are indebted to the hosts who prepare them, highlighting the community effort that R/Pharma fosters. Over 1000 people attended an R/Pharma workshop in 2021.
+
If you are interested in contributing a workshop in 2022, please reach out to R/Pharma via email (info@rinpharma.com) or our contact page.
+
R/Pharma 2021 Workshops
+
+
+
+
| Workshop | Host(s) |
|---|---|
| Clinical Trials Data Analysis at Roche | Adrian Waddell (Roche) |
| Intro Shiny | Ted Laderas (DNAnexus) |
| Stan: Building a Survival Model From Scratch | Daniel Lee (Generable) |
| Unleash Your Shiny Apps with Javascript | Jean-Philippe Coene (Opifex) and David Granjon (Novartis) |
| A Case-Study Driven Introduction to the Julia Language for R Programmers | Devin Pastoor (Metrum) |
| Building Tidy R Packages | Juliane Manitz (EMD Serono) and Leigh Alexander (SomaLogic) |
| Clinical Tables in gt | Rich Iannone (RStudio) |
| R for Clinical Study Reports and Submission | Yilong Zhang (Merck), Nan Xiao (Merck) and Keaven Anderson (Merck) |
| SafetyGraphics | Jeremy Wildfire (Gilead), Maya Gans (Atorus Research) and Xiao Ni (Sarepta Therapeutics) |
| Python Machine Learning NLP | Kevin Lee (Genpact) |
| tidy-transcriptomics | Stefano Mangiola (Walter and Eliza Hall Institute) and Maria Doyle (Peter MacCallum Cancer Centre) |
| R Admin - RStudio Connect | Kelly O’Briant (RStudio) |
| Deep Learning | Leon Eyrich Jessen (Technical University of Denmark) |
| R Package Validation Framework | Ellis Hughes (Fred Hutch) and Marie Vendettuoli (Fred Hutch) |
R/Pharma is excited to host 20 free workshops! Workshops will run the 2 weeks prior to the gathering and are hosted by members of the community. Workshops are one of the highlights of R/Pharma and we are indebted to the hosts who prepare them.
Calling all open-source developers! Come show off your packages and shiny apps!
+
Do you have an R package or shiny app that you want to showcase at R/Pharma?
+We are excited to announce that R/Pharma 2022 will feature a virtual exposition hall, which will give you the opportunity to showcase an R package or shiny app in a virtual booth. The expo will be open throughout the entire three-day conference, running continuously so that attendees can see your package/app during breaks and downtime.
+
+
You can run a video or slideshow for attendees to see when they click on your booth
+
+
Suggestion - a video of your app in use or a series of screen captures
+
+
+
When you are present at your booth you can video chat with visitors or show a live demo
+
+
Please note - we won’t be including specific times in the schedule for people to come to the Expo
+
You can schedule a demo and invite other attendees (simply announce that you’ll be demoing at a particular time on the conference chat).
+
+
+
Each booth has its own dedicated chat window where attendees can post questions/comments
+
We’ll do all the virtual booth set-up work for you
+
+
Just fill in this online form and we’ll take care of the rest
+
+
+
+
+
+
If you would like to include a virtual booth please fill out this form. Be sure to have the following information available:
+
+
General details of the app/package (name, one-line description and longer description)
+
Two graphics, one small (a hexsticker is perfect) and one large (2:1 aspect ratio works best, a screenshot perhaps?)
+
A video (YouTube, Vimeo or Wistia) or Google Slides to play in the booth. This is what people will see when they come to the booth - it can be a recorded demonstration of the app, slides describing the package, etc.
+
+
+
+
Any questions? Just reach out to us at info@rinpharma.com or through the contact form.
+
More information on booths is available in this short video.
At R/Pharma we know it’s important to take regular breaks during a virtual event. We try to make these breaks fun and are excited to introduce you to three friends of R/Pharma who will be providing some entertainment this year.
+
Rafael De Andrade Moral
+
Rafael De Andrade Moral is a lecturer in Statistics at Maynooth University. He holds a BSc in Biology and a PhD in Statistics from University of São Paulo, Brazil. He really enjoys teaching and doing research related to the development of statistical modelling techniques applied to Ecology, Wildlife Management, Agriculture and Environmental Science. He is also interested in the computational implementation of statistical models, especially in R. He has recently founded the Theoretical and Statistical Ecology Group (https://tsecolgroup.wordpress.com) and since 2020 has been producing musical parodies to teach Statistics (http://tinyurl.com/rafamoral). In his spare time he enjoys playing guitar and writing music, as well as performing magic tricks. Last, but not least, he loves German Shepherd dogs!
+
Lodge McCammon
+
Lodge McCammon is a humorist, instructional designer, author, musician, and international education consultant. He works with school districts, universities, nonprofits, and businesses (e.g., Discovery Education, Microsoft, SAS, The Coca-Cola Company). He has given his unique keynote speeches at events like the Midwest Education Technology Conference, the University of the West Indies Open Lecture, and Amazon’s Series on Remote Learning. Lodge also creates custom content for meetings and events like the R/Pharma Conference.
+Learn more at https://lodgemccammon.com/
+
Mike Smith
+
Mike K. Smith is the lead of the R Centre of Excellence at Pfizer. He trained as a statistician, and after a period working in the Pharmacometrics modelling and simulation group he returned to the Statistics organisation. He describes himself as a professional geek at work, and this label works equally well for his musical output. Mike likes to use randomness and probability in making music, which can be labelled as generative ambient music. You can think of it as soundtracks and textures for films and pictures that don’t exist. He makes music as MikeKSmith and his released music can be found on Bandcamp at https://mikeksmith.bandcamp.com/ with “work in progress” on SoundCloud at https://soundcloud.com/mikeksmith.
When you enter Hopin you’ll be in the reception area. This is one of five main areas within the conference platform, each of which is described briefly below.
+
Reception
+
The page you see when you first enter the conference. It’s a place where the conference organizers can push important announcements and where you’ll be able to see the schedule.
+
Stage
+
This is the place where all of the talks happen. Head over to the stage to listen to the speakers and watch the panels.
+
Sessions
+
Sessions is an area for smaller talks, round-tables and discussions with up to 50 people on camera and an audience of up to 5000. Not only can we schedule sessions throughout the conference, but you can create them too. We’ve left sessions open for impromptu meetings. Anyone can create a session by just clicking on Create Session, filling in a few details and adding an optional picture.
+Sessions can be open, meaning anyone can go on camera, or moderated, meaning that you can manage who goes on camera. Anyone and everyone can watch sessions. Want to follow up with a speaker? Invite them to a session over a virtual coffee!
+
People
+
Here you can find people at the conference by searching for, and connecting with, other attendees. You can also try speed-networking where you’ll be randomly paired with another conference attendee for three minutes and both go on camera. It’s a great way to meet new people, and we all have something in common! Please consider trying it out - you’ll find that everyone at R/Pharma is very nice!
+
Expo
+
The expo is our virtual exhibition hall. R/Pharma does not work with sponsors and we promote open source so the exhibition hall is not a traditional one. We’re using this space to give people a chance to showcase their R packages and shiny apps. When you enter a booth you can see a video or set of slides highlighting a package or app.
+If the author is present in the booth they can interact with you and even show a live demo. The expo will be open throughout the entire three days of the conference - come check it out.
+If you would like to showcase your package or app, more details are in this blog post.
+
R/Pharma aims to provide amazing workshops and presentations within a highly social atmosphere. This is clearly more challenging in a virtual environment. We hope you enjoy the conference and have the chance to make some new connections.
R/Pharma 2021 experimented with panel discussions at the end of each conference day and we are delighted to announce that we’ll be bringing them back for R/Pharma 2022.
+
The panels will not be recorded. The only way you can hear the amazing discussions is to attend R/Pharma 2022.
+
Building an Open Source Community
+
November 8th 2022, 13:20 ET
+
+
Rachel Dempsey (RStudio, Moderator)
+
Paulo Bargo (Janssen)
+
Melody Brown (LA DPH)
+
Guillaume Desachy (AZ)
+
Regis James (Regeneron)
+
Mike Smith (Pfizer)
+
+
Governing Business critical Pan-Company Collaborative R Packages
+
November 9th 2022, 13:40 ET
+
+
James Black (Roche, Moderator)
+
Keaven Anderson (Merck)
+
Michael Rimler (GSK)
+
Daniel Sabanes-Bove (Roche)
+
Mike Stackhouse (Atorus)
+
+
Finding A Way: Career Stories from 4 Senior Level Women in Data Science
+
November 10th 2022, 13:40 ET
+
+
Michelle Johnson (Metrum Research Group, Moderator)
We are excited to announce that the R/Pharma 2022 conference and workshop recordings are now available for viewing! If you were not able to attend the conference live, or wish to re-watch the talks, you can find all of the videos on the R/Pharma YouTube Channel with the following playlists:
The conference was a huge success with an amazing lineup of speakers, including the co-creator of the R language Robert Gentleman!
+
+
+
+
+
+
That’s not all … Like in previous years, the R/Pharma Conference offered a terrific lineup of workshops covering many innovative approaches to using R and other open source software in analysis pipelines and application development. You can view the workshop recordings on the R/Pharma 2022 Workshop Playlist. Be sure to view the description of the workshop video for resources shared during the workshops.
+
R/Pharma would not be possible without the talented members of our organizing and program committees, workshop instructors, presenters, and the life-sciences community. We hope you enjoy the recordings and look forward to our events in 2023!
There is much excitement for R/Pharma 2023! Please find below the plan for the gathering!
+
+Based on the 2022 conference feedback, it is clear people are keen to get back together in person, as we did at Harvard in 2018 and 2019. At the same time, people very much like our virtual gathering and its accessibility to people across the world.
+
Our 2023 plan is to have in-person and virtual components, as detailed below!
+
In-Person
+
R/Pharma is excited to announce that we are partnering with Posit to host our in-person program at posit::conf(2023), happening at the Hyatt Regency Chicago from Sunday, September 17 through Wednesday, September 20, 2023.
+
The in-person R/Pharma program at posit::conf(2023) will focus on the future of drug development using open source. The collaboration will include two activities specific to the pharmaceutical industry: (1) the “R/Pharma Roundtable Summit” for program leaders and people leading Open Source initiatives, and (2) the “Leveraging And Contributing To The Pharmaverse For Clinical Trial Reporting In R” workshop for data professionals.
Over the last five years, we’ve seen explosive growth in the use of R and other open-source technologies across drug development, with an increasing focus on open-source projects and pan-company collaboration. During the flagship event, the R/Pharma Roundtable Summit, facilitators will foster in-person discussions with industry leaders on key topics (reproducibility, submissions, scalability, etc.) and on open-source tools for next-generation submissions. R Validation Hub lead Doug Kelkhoff (Roche) will lead a 90-minute workshop on a new validated repository for GxP R use. For the remainder of the day, R/Pharma and Posit will facilitate talks and discussions around open-source drug development topics relevant to participating attendees. You can find a sample agenda for the R/Pharma Roundtable Summit here:
Thomas Neitmann, Associate Director Data Science at Denali Therapeutics, and Pawel Rucki, Chief Engineer at Roche, will teach a pharma-focused workshop titled “Leveraging And Contributing To The Pharmaverse For Clinical Trial Reporting In R”. This workshop will introduce participants to the pharmaverse, a collection of open source R packages that provide the next generation backbone for clinical trial reporting. In the workshop, participants will also create ADaM datasets and prepare tables, figures, and interactive shiny apps. You can find a sample agenda for the workshop here:
All R/Pharma program participants are encouraged to stay for the general conference happening September 19 - 20. The general conference will include two days of talks on various topics, high-profile keynotes, and opportunities to network with over 1,000 data professionals. Posit will release the full general conference schedule closer to the date. You can learn more about the conference at posit.co/conference.
+
+Posit would like to hear from you: submit a talk for posit::conf! If you would like to submit a Pharma talk for posit::conf(2023), please find the information on the Posit blog. To submit a talk, you’ll need to create a 60-second video that introduces you and your proposed topic. In the video, you should tell us who you are, why your topic is important, and what the attendees will take away from it. Submission closes at midnight on April 7 anywhere on Earth, and Posit will communicate decisions by early May.
+
Virtual
+
+Our free virtual gathering will be on October 24, 25 and 26. Virtual registration will open later this summer. This year we will feature 3 days of content, broken out as below:
+
Code/Technical Talks:
+
Day 1 will be R/Pharma, featuring only R talks.
+Day 2 will be Open Source in Pharma, featuring talks about R, Python, Julia, Stan, Javascript, and other open-source languages.
+
Impact/Advancement Talks:
+
+Day 3 will be Open Source Advancing Healthcare & Drug Development, featuring content about the impact of Open Source to advance healthcare and drug development using tools such as machine learning, AI, deep learning, and statistics. These topics will highlight advancements in areas such as pharma, drug development, healthcare, medical devices, vaccines, precision medicine, personalized healthcare, Real-World Data and more! Our 3rd day will be shorter than the first 2.
+
+The workshops will run the week before, October 16 - 20. The style and format will be very similar to 2020-2022. We plan to offer free Digital Credentials for people who attend and complete the workshops.
We’re excited to announce that the call for talks for Virtual R/Pharma 2023 is now open!
+
+R/Pharma 2023, the conference on all things Open Source in Pharma, will take place October 24th-26th 2023, preceded by Training Days the week prior. The conference will be fully virtual. An in-person portion of R/Pharma will be at posit::conf(2023) on Sept 18th. More information, including a link to booking R/Pharma in-person, can be found here: https://posit.co/blog/register-for-r-pharma-at-positconf2023/
+
+We are particularly interested in talks from people who cannot usually make it to the in-person event, or are newer to conference speaking. R/Pharma committee members are offering free speaker coaching: as long as you have an interesting R in pharma idea and are willing to put in some work, we’ll help you develop a great talk. Talks are short (10-20 minutes), high-energy presentations that give you the chance to talk about an interesting project that you’ve tackled with R in pharma. Short talks, or demos of your R/Python code, R/Stan/Julia packages, and shiny apps are great options.
+
This year we will feature content broken out as below:
+
Code/Technical Talks:
+
Day 1 will be R/Pharma, featuring mostly R talks.
+Day 2 will be Open Source in Pharma, featuring talks about R, Python, Julia, Stan, Javascript and other open source languages.
+
Impact/Advancement Talks:
+
+Day 3 will be Open Source Advancing Healthcare & Drug Development, featuring content about the impact of Open Source to advance healthcare and drug development using tools such as machine learning, AI, deep learning, and statistics. These topics will highlight advancements in areas such as pharma, drug development, healthcare, medical devices, vaccines, precision medicine, personalized healthcare, Real-World Data and more! Our 3rd day will be shorter than the first 2.
+
We are particularly interested in submissions that have one or more of these qualities:
+
+
Examples of how shiny is used for interactive reporting for late-stage work
+
Showcase the use of Open Source for drug discovery and development
+
Using Open Source AI in Pharma
+
Applications from smaller pharma companies / biotechs
+
Expand the use of R in pharma to reach new domains and audiences
+
Combining R with other languages and tools, like Python, Julia, Tensorflow etc.
+
Reporting/Communication using R Markdown, Quarto, Shiny, ggplot2, or something else altogether
+
Discussions on how to teach R/data science effectively in pharma
+
Talks on administering R in pharma and tackling scaling, packages, cloud, HPC etc.
+
+
Applications close August 11th.
+
To apply, please submit the form found at this link with your title and abstract that introduces you and your proposed topic.
+
+To ask a question, please use the form on our contact page to have your question routed to the organising committee’s Slack channel, or email us at info@rinpharma.com.
We’re just a couple of days out from the R/Pharma round tables!
+
+With the help of an advisory board spanning Amgen, BeiGene, Genentech, GSK, The Janssen Pharmaceutical Companies of Johnson & Johnson, Novartis, Pfizer, The Prostate Cancer Clinical Trials Consortium (PCCTC), Roche and Posit PBC, the following topics are now on the agenda for our round tables (our attempt at paraphrasing these nuanced topics):
+
+
Can we stop validating R packages internally, and build a pan-industry process?
+
How can we help the path to interactive regulatory submissions?
+
What’s needed to be more confident depending on externally governed packages?
+
Can we combine our learnings on making a case for OS contributions?
+
Where is the potential for impact with LLMs/AA/AI for both drug development, and data scientist efficiency?
+
Where are we as data scientists and SCE developers on multi-modal, multi-disciplinary, drug development?
+
+
and probably the topic we are most excited about (especially with Roche and others attending, to discuss its SCE for CT/RWE and PHC) is…
+
+
What is a modern workflow? What are the key capabilities of a modern SCE? What can we learn from each other enabling diversity without compromising our SCE architecture - from legacy codebases/languages and their workflows built around mounted drives and unix permissions, through to multi-modal analyses querying multiple data stores via tokens as secrets, mixing not just open source languages, but different dockerfiles in a single analysis.
+
+
70 representatives from 42 companies will come together to tackle these and other questions.
+
Review the agenda below, or visit the discussion link to see the proposed topics:
R/Pharma is excited to announce our keynote speakers for 2023.
+
Dionne Price
+
Our Impact in the Evolving Data Landscape
+Data sources and the volume of data available for driving discovery and informing decisions have substantially increased over time. This increase has resulted in an evolving data and regulatory landscape ripe for the expertise of statisticians and data scientists. Statisticians and data scientists must play a key role to ensure the appropriate use of data and soundness of conclusions reached from analyses of the data. In this talk, we will explore the landscape identifying challenges and opportunities and highlight our contributions and impact.
+
Bio
+
+Dionne Price is the Deputy Director of the Office of Biostatistics in the Office of Translational Sciences, Center for Drug Evaluation and Research, FDA. In this role, Dr. Price provides leadership to statisticians involved in the development and application of methodology used in the regulation of drug products. She currently leads cross-cutting, collaborative efforts across FDA to advance and facilitate the use of innovative trial designs in pharmaceutical drug development. Dr. Price received her MS in Biostatistics from the University of North Carolina at Chapel Hill and a PhD in Biostatistics from Emory University. Dr. Price is an active member of the American Statistical Association (ASA) and the Eastern North American Region of the International Biometric Society. She is a Fellow of the ASA and the 2023 President of the ASA.
+
James Black
+
The importance of the SCE in enabling our shift from proprietary programming to open-source data science
+Historically building a great SCE for clinical reporting involved selecting a vendor, integrating their product, and supporting a single proprietary language. The shift to report clinical trials using R has had a much broader impact than just swapping out a language, with it also catalysing the adoption of data science in statistical programming. For the team building the latest generation of SCEs, this has led to a complex eco-system of dynamic dependencies to enable reproducible research, the need to adapt to a much faster pace of development of the tools used, and facilitated bringing different elements of evidence generation like trial design, and real world evidence, to co-exist with statistical programming. During this talk, we’ll discuss this evolution, the underlying tensions we continue to tackle aspiring to balance innovation against business continuity, and the critical role SCE architecture plays facilitating a shift to data science.
+
Bio
+
James Black gained his PhD from Cambridge University, with a thesis focussed on analysing the effect of randomising type 2 diabetes patients to different CVD risk management guidelines. After grad school he joined Roche’s real world data team, where he was a key driver in their rapid shift from using SAS and file shares to R, databases and git. Currently he leads Insights Engineering in Roche Pharma Product Development, and is the Product Owner for the PHC/RWE and Clinical Trial Reporting Scientific Computing Environments. He is also involved in Open Source and industry collaborations, sitting on the board of Open Source in Pharma, the R Consortium Pharma Oversight working group, PhUSE’s SCE Council and is the Product Development representative for Roche’s internal Open and Inner Source office.
+
Daniel Sabanés Bové
+
Why we Need to Improve Software Engineering in Biostatistics - A Call to Action
+Programming is ubiquitous in applied biostatistics, and most statisticians know a programming language such as R - yet software engineering is still neglected as a skill and undervalued as a profession in pharmaceutical statistics. Why is this a problem? Importantly, we run the risk of wrong decisions when relying on code that we wrote ourselves without any code review by other statisticians. When handing over undocumented code to successors or other teams, we cannot be sure that they can even use it, let alone maintain it in the future. Also, whether they can reproduce results we produced earlier is a matter of luck. If we later need to add features to our code, and don’t have sufficient tests in place, we will undoubtedly introduce bugs and alter the program behavior without knowing it. Finally, if we need to implement new statistical methods for analyses submitted to regulators, we need to have appropriate software validation pipelines in place, which will demand well developed and tested code. What can we do about it? First and foremost, we must become aware of the problem. Second, we need to take software engineering seriously, starting from education in basic software engineering skills - across schools, universities, and during the work life. Establishing dedicated software engineering teams within academic institutions and companies can be a key factor for the establishment of good software engineering practices and catalyze improvements across research projects. Providing attractive career paths is important for the retention of talent. Finally, collaboration between software developers from different organizations is key to harness open-source software efficiently and optimally, while building trusted solutions. We illustrate the potential with examples of successful projects.
+
Bio
+
Daniel Sabanés Bové studied Statistics at LMU Munich and obtained his PhD at the University of Zurich for his research work on Bayesian model selection. He started his career in 2013 at Roche as a biostatistician, then worked at Google as a data scientist from 2018 to 2020 before rejoining Roche. He is currently leading the Statistical Engineering team in Roche Pharma Product Development that works on productionizing R packages, Shiny modules and how-to templates for data scientists. Daniel is co-author of multiple R packages published on CRAN and Bioconductor, as well as the book “Likelihood and Bayesian Inference: With Applications in Biology and Medicine”, and is currently co-chairing openstatsware, a working group focusing on Software Engineering in biostatistics.
R/Pharma is excited to host 17 free workshops! Workshops will run October 16th-20th, 23rd and 27th and are hosted by members of the community. Workshops are always one of the highlights of R/Pharma and we are indebted to the hosts who prepare them.
We are excited to announce R/Pharma on-demand talks and the Expo.
+
On-Demand Content
+
For R/Pharma 2023 we are trying something new! This year we will be Premiering talks on our YouTube channel before and after the standard conference hours. This will give attendees a chance to engage with content beyond the standard conference time of 10AM to 2PM EDT. We hope this great content will provide more options for conference engagement and non-EDT inclusivity. We hope to explore more with this effort in 2024.
+
During the Premiere, attendees can post comments and interact through the chat. The videos will also be available after the talks finish.
+
If you wish to experience these great talks live, the videos will be Premiered on our YouTube channel (https://www.youtube.com/c/RinPharma) at the times listed below. Talks are also collected in this playlist.
The R/Pharma Expo is a virtual exhibition hall at the R/Pharma conference. This year we will be using the Expo to highlight useful resources for R practitioners just starting out in this space or those who are more experienced. Check out the Expo at the conference to learn about {admiral}, Posit cloud, R Consortium working groups and more!
What’s your policy on creating a safe place for attendees?
+
R/Pharma is dedicated to providing a harassment-free conference experience
+for everyone regardless of gender, sexual orientation, disability or any
+feature that distinguishes human beings. For more information, please
+see the R Consortium code of conduct.
R/Pharma is deliberately a smaller conference in terms of attendance, in order to maximize opportunities for direct interaction with speakers. In 2018 and 2019 the conference was held in the Tsai Auditorium at the Center for Government and International Studies, Harvard University and capped at 150 attendees. The conference was run as a free event. Invitations were based on committee membership and advisory support, speaker acceptance, academic/student status, and diversity.
+
The conference has been run as a single track, with keynotes ranging from renowned industry practitioners to key R developers to leading academics. In addition there were a number of pre-conference workshops and full-length presentations as well as a number of shorter, highly energetic lightning talks.
+
In 2020 we took a hiatus from holding the meeting at Harvard and ran a virtual conference. Over 1100 people registered for the three-day virtual event, which was preceded by a week of 11 workshops.
+
Following the success of the 2020 event, the organizing committee decided to hold a virtual event for the 2021 meeting. 2000 people registered for the three-day event, which included three panel discussion groups on the areas of validation, training and leadership, and scaling Shiny. In addition, we held 14 workshops around the conference dates.
+
In 2022 we will be running a virtual event again.
+
Our entire event is a community-led effort and 100% volunteer run. The event is vendor neutral and very much an academic conference. Harvard has been very helpful in hosting the event.
+
The R/Pharma conference is an event focused on the use of R in the development of pharmaceuticals. The conference covers topics including reproducible research, regulatory compliance and validation, safety monitoring, clinical trials, drug discovery, research & development, PK/PD/pharmacometrics, genomics, diagnostics, immunogenicity and more. All are discussed within the context of using R as a primary tool within the drug development process. The conference showcases the current use of R that is helping to drive biomedical research, drug discovery & development, and clinical initiatives. (Note that topics related to the use of R in hospitals/clinics for patient care by clinicians, doctors, and researchers are the focus of R/Medicine.)
+
R/Pharma is dedicated to providing a harassment-free conference experience for everyone regardless of gender, sexual orientation, disability or any feature that distinguishes human beings.
+
+ Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry …
+
+
+
+ Statisticians and programmers using multiple software systems (e.g., SAS, R, Python) often encounter differences in analysis results, requiring further exploration and justification. Investigating these discrepancies can be time-consuming, especially …
+
+
+
+ At Idorsia, we have developed a large R Shiny application supporting a metadata-driven approach to shell and output creation. We will demonstrate some key features of the app and discuss how this approach led to huge efficiency gains when delivering …
+
+
+
+ Thousands of outputs (tables, graphs and listings) may need to be generated each year for filing, external publications, internal read outs and other activities in a pharmaceutical company. Although most of these outputs could be produced …
+
+
+
+ Real-world data are increasingly used to complement evidence from clinical trials. However, missing data are a major statistical challenge when the underlying missingness mechanisms are unknown, e.g., to adjust for confounding. This talk introduces …
+
+
+
+ Validating open-source R packages has been a hot topic over the past few years. This talk focuses on MetrumRG's updated process and tooling for validating first party R packages, that is R packages that we develop in-house, almost all of which are …
+
+
+
+ R and Bioconductor are important tools supporting scientific workflows across early Research and Development at Roche/Genentech. We have a broad R user community, which includes Data Scientists, Software Developers and consumers of Data Products …
+
+
+
+ Predictive modeling is a powerful tool, which amongst other things can be applied for prioritising drug candidates. Limiting the search space needed for target exploration can reduce costs markedly, partly eliminating lab time and expensive kits. …
+
+
+
+ Julia is a modern programming language that provides the ease of use of R with the speed of C++. Julia has been in development for over 11 years. Research on Julia originated at MIT in 2009. Julia is powered by multiple dispatch - a generalization of …
+
+
+
+ The use of open-source R is evolving in drug discovery, research and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. The ability to produce tables, listings and figures (TLFs) in …
+
+
+
+ Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that …
+
+
+
+ Genetically modified organisms (GMOs) and cell lines are widely used models to estimate the efficacy of drugs and understand mechanism of actions in biopharmaceutical research. As part of characterising these models, DNA sequencing technology and …
+
+
+
+ Identification of subgroups with increased or decreased treatment effect is a challenging topic with several traps and pitfalls. In this project, we would like to establish good practices for subgroup identification, by building a simulation platform …
+
+
+
+ Statistical graphics play an important role in exploratory data analysis, model checking and diagnostics. The lineup protocol (Buja et al., 2009) enables statistical significance testing using visualizations, bridging the gap between exploratory and …
+
+
+
+ The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports that describe the results of data analyses. There should be a clear and …
+
+
+
+ Introduction to the X-Omics Platform (XOP), a digital biomarker research platform for bioinformaticians and other scientists at Merck KGaA. XOP is a validated system for storing, processing, and analyzing "omics" data, including RNASeq, DNASeq …
+
+
+
+ The development of laboratory developed tests (LDTs) and in vitro diagnostics (IVDs) requires the execution of studies to determine the analytical performance of the assay. Examples of analytical studies include limit of detection, intermediate …
+
+
+
+ Non-compartmental pharmacokinetic analysis (NCA) is used in the characterization of drug absorption, distribution and elimination in the body. Software that implements NCA is available from commercial and non-commercial, open-source, sources. …
+
+
+
+ In recent years, R Shiny apps have gained considerable momentum and have been utilized to develop many useful dashboards and user interfaces (UI) that allow non-programmers access to innovative tools. Due to the ease of development of Shiny apps …
+
+
+
+ The visR project for effective graphics in drug development: visR is an open collaborative effort to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a …
+
+
+
+ The pharmaceutical industry has witnessed a growing interest in open source languages such as R and Python as an alternative to SAS for many activities related to clinical research. Hop on board for a whistle-stop tour of our efforts within GSK …
+
+
+
+ The current paradigm for analyzing clinical trial data is cumbersome: it is an inefficient, slow, and expensive process. Several rounds of iterations between the main programmer and the validation programmer are usually needed to thoroughly explore …
+
+
+
+ In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, these tables can be broken down into a few categories: counting for event-based variables or categories, shifting …
+
+
+
+ Supporting data-driven decisions in the planning of clinical trials during the current pandemic involves extensive integration of heterogenous data sources, sophisticated predictive modelling, and custom visualization to communicate the predictions …
+
+
+
+ Effective visual communication is a core competency for pharmacometricians, statisticians, and, more generally, any quantitative scientist. It is essential in every step of a quantitative workflow, from scoping to execution and communicating results …
+
+
+
+ As stated in my 2018 R/Pharma presentation "Becoming Bilingual in SAS and R", I believe in problem-solving using different data science tools. This talk is about my team's efforts at using different data science tools (SAS, R and Python) to harmonize …
+
+
+
+ The crisis of opioid abuse and overdose in the United States has involved unprecedented levels of opioid prescriptions and opioid-related mortality. Greater understanding of current trends in prescription opioid utilization may help prevent new cases …
+
+
+
+ MMRMs are often used as the primary analysis of continuous endpoints in longitudinal clinical trials (see e.g. Mallinckrodt et al., 2008). Essentially, an MMRM is a specific linear mixed effects model that includes (at least) an interaction of …
+
+
+
+ Medical oversight during a clinical trial is an extensive and time-consuming process. To safeguard patient safety, medical monitors need to review and explore raw safety data interactively, using standard visualizations as well as specific analyses …
+
+
+
+ metashiny is an R package that provides a point-and-click interface to quickly design, prototype, and deploy essential Shiny applications without having to write one single line of R code. The core idea behind metashiny is to parametrize Shiny …
+
+
+
+ Validation of the R statistical package has become a hot topic since 2015, when the FDA issued the Statistical Software Clarifying Statement, stating officially that no specific software is required for submissions, and that any tool can be used if …
+
+
+
+ In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention). Housed within Fred Hutch, SCHARP is an instrumental partner in the …
+
+
+
+ The development of a streamlined data-aggregation methodology utilizing the statistical programming language R is described. The centralization of high-throughput experimentation data enabled the use of statistics and data exploration methods within …
+
+
+
+ The Interactive Safety Graphics (ISG) workstream of the ASA-DIA Biopharm Safety Working Group is excited to introduce the safetyGraphics package, an interactive framework for evaluating clinical trial safety in R using a flexible data pipeline. Our …
+
+
+
+ The primary objective of the presentation is to share insights on democratizing powerful natural language processing tools like I2E Linguamatics and open source R and Shiny. The talk will focus on how we can leverage I2E Python SDK natural language …
+
+
+
+ Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal …
+
+
+
+ The data wrangling and manipulation capabilities in R make it perfectly suited for transforming raw clinical database data into structured, submission-ready CDISC datasets. By extensively using the dplyr, tidyr, and other packages in the tidyverse we …
+
+
+
+ In vivo studies are crucial to the discovery and development of novel drugs and are conducted for proof-of-concept validation, FDA applications and to support clinical trials. Appropriate study design, data analyses and interpretation are essential …
+
+
+
+ Purpose: To establish a gold-standard methodology for accurately extracting progression-free survival (PFS) following Diffuse Large B-Cell Lymphoma (DLBCL) treatment using real-world electronic healthcare record (EHR) data. Results: We produced an R …
+
+
+
+ The installation of a cohort of R packages can constitute a challenge; especially considering different dependency types, package versions, overlapping namespaces and varying risks assigned to each of the packages. At the same time, the number of R …
+
+
+
+ Start browsing through R tutorials online and it won't take long to stumble across a read.csv statement. CSV files serve well for detached, static analyses. They tend to fail, however, when tasked with storing large, dynamic data sets being accessed …
+
+
+
+ In the early phases of clinical development, the future of a compound depends on more than just the result of a hypothesis test on a single endpoint, in a single phase 2 study. We think a lot about how design choices affect immediate outcomes. GSK's …
+
+
+
+ Objectives: Demonstrate an interactive and dynamic visualization tool, ModViz POP, for simulating ordinary differential equations based PK/PD models with variability. Methods: ModViz POP has a built-in PKPD ODE library of models based on the …
+
+
+
+ Physiologically based pharmacokinetic (PBPK) models are used extensively in drug development to address a number of problems. However, most PBPK applications have limited knowledge sharing impact because they are implemented in closed, proprietary …
+
+
+
+ Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where others can quickly explore different variables, parameter values, models/algorithms, etc. Although the …
+
+
+
+ Scientists in drug discovery research utilize a wide variety of instrumentation and techniques to advance their research. While instrumentation vendors often provide software tools to deal with data wrangling and visualization, a simple collection of …
+
+
+
+ We know that adopting documentation, testing, and version control mechanisms are important for creating a culture of reproducibility in data science. But once you've embraced some basic development best practices, what comes next? What does it take …
+
+
+
+ We are amidst a data revolution. In just the past 5 years, the cost of sequencing a human genome has gone down approximately 10-fold. This development moves equally fast within areas such as mass spectrometry, in vitro immuno-peptide screening a.o. This …
+
+
+
+ Bayesian model-based dose-escalation designs, including one and two parameter logistic regression models, have meanwhile proven themselves in Phase I dose-escalation trials (Iasonos and O'Quigley, 2014 [1]). Compared to rule/algorithm-based designs …
+
+
+
+ In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within …
+
+
+
+ The past few years have shown vast improvements in workflows for reproducible and distributable research within the R ecosystem. At satRday Chicago everyone in the audience said they used R Markdown, however only one person raised their hand when …
+
+
+
+ We define and illustrate a "deep visualization" paradigm for the analysis of a relatively large and complex clinical database for psoriasis (PSO) and psoriatic arthritis (PsA). This paradigm supports a growing number of machine learning and …
+
+
+
+ In a large organization, collaboration faces many obstacles. Groups may inadvertently reinvent functionality and expend redundant effort. Siloing may impede aggregation and comparison of results. Analysts may not be aware of potential collaborators. …
+
+
+
+ Developing Shiny applications that meet design goals, easily deploy to multiple platforms, and contain easily maintainable components (all while adhering to best practices) is typically a difficult endeavor. Until recently, there has not been a tool …
+
+
+
+ R is the dominant language in modern quantitative science; however, it is still not widely used in the pharma industry. In this talk I will share learnings in building an internal R user community in a large global organization, via efforts including …
+
+
+
+ Introduction: As pharmacometricians, we sometimes jump into complex modeling before thoroughly exploring our data. This can happen due to tight timelines, lack of ready-to-use graphic tools or enthusiasm for complex models. Exploratory plots can help …
+
+
+
+ Our bioinformatics team is relied upon to quickly generate information to drive business decisions, allocate resources, and develop predictive models. As such, we constantly strive to streamline our work and create efficiencies when possible. To this …
+
+
+
+ The standardization of nonclinical study data by the Clinical Data Interchange Standards Consortium (CDISC) via the Standard for Exchange of Nonclinical Data (SEND) has created an opportunity for the collaborative development and use of open source …
+
+
+
+ R Shiny apps allow for dynamic, interactive, real-time integration of knowledge within a drug-development program to support decision making. Here, an R Shiny app was used to explore the pharmacokinetic and pharmacodynamic effects of different dosing …
+
+
+
+ Machine learning workflows can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, …
+
+
+
+ nlmixr is a free and open source R package for fitting nonlinear pharmacokinetic (PK), pharmacodynamic (PD), joint PK/PD and quantitative systems pharmacology (QSP) mixed-effects models. Currently, nlmixr is capable of fitting both traditional …
+
+
+
+ The R shiny-based nest framework (previously named teal) has been proven valuable in exploratory settings and supporting strategic decision meetings. To allow more clinical studies to be able to adopt this agile framework in a wider range, we've …
+
+
+
+ Content delivery in preparation for filing a clinical study report requires robust tooling for quickly and reproducibly compiling analysis of study data. Traditionally, this reproducibility has stemmed from one-time, rigorous validation of a …
+
+
+
+ Providing a Study Data Reviewer's Guide for Clinical Data to accompany the SDTM datasets, define.xml, and annotated CRF in a submission gives additional information to help the FDA review team. The guide is traditionally authored using MS Word - a …
+
+
+
+ The gsDesign package for group sequential design is widely used with 30k downloads. The package was originally written in 2007 with substantial documentation and RUnit testing created before 2010. A Shiny interface was created to make the package …
+
+
+
+ Research reproducibility has been heatedly discussed in recent years. Some authors have pointed out that a large portion of published research findings is incorrect and/or irreproducible. Some state that the medical literature is as reliable as …
+
+
+
+ At R in Pharma 2018, I gave a workshop and a presentation on analyzing clinical trials data with R. Since then much has happened at Roche/Genentech with regard to analyzing clinical trials data with R: our R-based projects got funded in order to …
+
+
+
+ GlaxoSmithKline is searching for new oncology drug targets. We have CRISPR knockout data for many cancer cell lines and many genes. For these same cell lines, we also have genomic data --somatic mutations, copy number variants, and gene expression. …
+
+
+
+ Determination of bioequivalence (BE), a crucial part of the evaluation of generic drugs, may depend on clinical endpoint studies, pharmacokinetic (PK) studies of bioavailability, and in vitro tests, among others. Additionally, in reviewing …
+
+
+
+ As the Pharmaceutical sector boosts its interactions with regulatory agencies using R programming as one key instrument for drug development submissions, we face a dilemma that several members of statistics and statistical programming teams are not …
+
+
+
+ In the pharmaceutical industry, personalized patient care is about having access to traditional and new data sources including comprehensive diagnostic data, sensor data, real-world data, etc., applying traditional and advanced analytics like machine …
+
+
+
+ Creating datasets and tables, listings and graphs (TLGs) for analyzing clinical trials data with R, such that in the final stage the code, datasets and TLGs can be submitted to the health authorities, is a multifaceted problem. We have been working …
+
+
+
+ During drug development, pharmacometric models are often built to characterize and understand drug efficacy and safety. Simulations based on these models can assist drug development and quantitative decision making. However, computation can be …
+
+
+
+ Precision medicine typically refers to the development of drugs and other interventions for individual patients. But how do you assess efficacy and make predictions in this extreme small data regime? The Bayesian framework is ideal for this type of …
+
+
+
+ The R-based ecosystem, and its open-source methods for data manipulation, modeling and interpretation, is key for effective and reproducible research. This is certainly true in experiments relying on quantitative mass spectrometry. This relatively …
+
+
+
+ Recent advances in the Shiny ecosystem boost the scale and scope of serious enterprise-wide web applications. More specifically, it is entirely possible to utilize key features of Shiny Server Professional and additional R packages such as shinyjs, …
+
+
+
+ Recruitment models for clinical trials are notoriously difficult to build due to many complex factors within a study. With input from experienced practitioners, we have built an interactive tool to allow individuals to build complex recruitment …
+
+
+
+ Interactive web graphics are a popular and convenient medium for conveying information. However, web graphics are rarely used during the initial exploratory phase of a data analysis, largely due to the lack of tools for seamless iteration between data …
+
+
+
+ Shiny is a popular R package that lets users develop interactive web applications using just R code. The ease of use and downstream boost in productivity mean that working with Shiny can kick off a rapid request-implementation-inspiration-request …
+
+
+
+ Despite the explosive growth and adoption of R globally, concerns over how to qualify and administrate R continue to echo in discussions about use in regulated environments. In this talk, I'll discuss how to bridge the conceptual tenets of …
+
+
+
+ The interaction between the Major Histocompatibility Complex type I (MHC-I), a peptide and the T-cell receptor (TCR) (MHCIpTCR) is a key determinant of immune response elicitation and therefore of paramount importance in infectious- and autoimmune …
+
+
+
+ Since its foundation in 2004, Metrum Research Group has relied on R as the core technology and central framework for all of the company’s biomedical modeling and simulation (M&S) service activities, spanning more than 475 projects with 150+ different …
+
+
+
+ For the Pharma Company: How many times have you made a graph and gotten an email back saying "Can we change the axes?" or "Can we change the symbols?" or "I really need to look at the graph before I can tell you what I want". It would be much more …
+
+
+
+ R is a very powerful tool for performing statistical programming, but has had a lower uptake in the life sciences when compared to SAS. As a result, many of the packages created for R are not focused on the type of tasks Statistical Programmers do. …
+
+
+
+ The pharmaceutical industry depends on accurate and reproducible data science for both preclinical and clinical analysis. Unfortunately, often an analysis cannot be reproduced and therefore its computational methodology and merit are unknown. Often, …
+
+
+
+ R Shiny has revolutionized the way statisticians and analysts distribute analytic results and research methods. We can easily build interactive web tools that enhance data visualization and facilitate data and information sharing. Shiny apps can …
+
+
+
+ The first challenge in validating an analytic tool for the pharmaceutical industry is that, despite a formal FDA definition, there is still no cross-industry agreement on what 'validation' really means with respect to an analytic tool. AIMS …
+
+
+
+ The drake package is a general-purpose workflow manager for data-driven tasks in R, with applications in the pharmaceutical industry ranging from tailored medicine to clinical trial simulation and beyond. Drake rebuilds intermediate data objects when …
+
+
+
+ The United States Food and Drug Administration (FDA) requires that clinical trial data be submitted in the Study Data Tabulation Model (SDTM) standard format. The process of developing SDTM involves mapping captured raw data to their correspondent …
+
+
+
+ In 3 years Real World Data Science Analytics in Roche/Genentech transitioned from a small team of former clinical trial programmers supporting a real world evidence team to become the largest department within the Personalised Healthcare (PHC) Centre …
+
+
+
+ When it comes to analytics of data collected in medical research, today’s culture is compartmentalized – not only across institutions, but even within institutions. Such a culture stagnates analytical development and limits the ability to fully …
+
+
+
+ Shiny is a package for turning analyses written in R into interactive web applications. This capability has obvious applications in pharma, as it lets R users build interactive apps for their collaborators to explore models or results, or to automate …
+
+
+
+ Next-generation sequencing (NGS), phage display technology and high throughput capacities enable biologists in drug discovery to characterize antibodies (Abs) based on their HCDR3 sequences and further group them into families before moving to …
+
+
+
+ In this talk, I will speak about my personal journey of learning R and transforming from a clinical study statistical programmer to a SAS/R bilingual, as well as my journey of leading the R initiative in Amgen’s Global Statistical Programming …
+
+
+
+ Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and trust results in order to confidently extend them, even when the results are their own. We …
+
+
+
+ Decision analysis balancing both data analytics and human gut feeling is critical in designing efficient routes to synthesize new, complex small molecules. This challenge is faced by any organization seeking to deliver modern pharmaceutical compounds …
+
+
+
+ Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal …
+
+
+
+ Since 2021, the FDA and the NIH have increased citations and notifications for non-compliance with required results reporting on ClinicalTrials.gov. Many studies still do not submit results to ClinicalTrials.gov; some do not publish results after 3 …
+
+
+
+ Emerging diseases like COVID-19 pose dual threats to public health and the economy. Understanding protein-protein interactions (PPIs) between viral and host proteins is crucial for antiviral therapies and studying pathogen replication. However, …
+
+
+
+ Product quality plays a vital role in the success of Biotech/Pharmaceutical organizations, and the accurate classification of quality risks is crucial to ensure the delivery of high-quality products. However, the current practice of assigning risk …
+
+
+
+ torch is an R port of PyTorch, a scientific computing library that enables fast and easy creation and training of deep learning models. In this talk, you will learn about the latest features and developments in torch, such as luz, a higher level …
+
+
+
+ Streamlining clinical trial output workflows is a key challenge for clinical studies. Our project leverages Python to link the planned analysis stored in a Google Sheet LoPO (List of clinical study Planned Outputs) to the study scripts that generate …
+
+
+
+ How does a risk-averse Pharma Biostatistics organization with 800+ people switch from using proprietary software to using R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the …
+
+
+
+ The use of R in submissions to healthcare regulators presents challenges as the quality of packages must be ensured, and evidence of this quality must be readily available. The Regulatory R Package Repository Working Group aims to tackle these issues …
+
+
+
+ The success of a bacterial drug discovery program can be no greater than the phylogenetic diversity and capacity of those bacteria in the library to produce specialized metabolites (SM). However, the methods used to create bacterial strain libraries …
+
+
+
+ The dramatic increase of R in the computational, analytics, and data science areas has led to some innovative techniques in recent years for interactive analytics. This rate of change presents challenges for IT organizations to keep up and to …
+
+
+
+ Safety and efficacy data in clinical trials are mostly analyzed separately. However, the treatment of life-threatening diseases such as cancer especially requires a good understanding of benefit and associated risks to make an informed therapy …
+
+
+
+ Failure to thoroughly review discrepancies and deviations in drug manufacturing is consistently one of the top citations in FDA inspectional observations. Learn how a leading biotechnology organization successfully replaced an inefficient, manual …
+
+
+
+ R is pretty good at backwards compatibility, but reproducing an analysis, even given the script and data, can still be a challenge as packages, R, and math libraries keep evolving. www.rocker-project.org offers, among other things, version-stable R in Docker …
+
+
+
+ REAP (R-Shiny Exploratory Analysis Platform) was developed by the Modeling and Simulation group within the Clinical Pharmacology department at Genentech, Inc., to support exploratory analyses of clinical data. REAP is a web-based, user-friendly, tool …
+
+
+
+ The rOpenSci project is a non-profit initiative founded as a grassroots effort in 2011. We have evolved into a truly global community of researchers and data scientists who are R users and developers from a wide range of disciplines. rOpenSci …
+
+
+
+ R has become a prominent data science tool, empowered by a fast-growing modern R eco-system. At Novartis, Shiny and markdown have gained a lot of popularity in analyzing, visualizing and reporting of clinical trial data. Traditional report analysis …
+
+
+
+ bioWARP (biostatistical Web-Applications and R Procedures) is a Shiny application enabling employees at Roche Diagnostics to create validated reports for regulatory authorities submissions. bioWARP enables people using advanced statistical methods, …
+
+
+
+ A physiologically-based mathematical model was developed as a series of ordinary differential equations to describe compositional changes (in fat and fat-free mass, FM & FFM) due to metabolizable energy exchanges in babies from birth to 2 years in …
+
+
+
+ The Data Science team in Pfizer’s Vaccine Research and Development division (VRD) creates and maintains validated applications used during high-throughput clinical testing that enable advanced analytic and reporting requirements. SAS has long been …
+
+
+
+ The United States Food and Drug Administration (FDA) uses a variety of statistical software packages for review and research. This presentation will focus on the uses of R in the Center for Drug Evaluation and Research (CDER), including graphics for …
+
+
+
+ Clinical development requires quick access to live trial data to address safety questions and evaluate data quality. Historically, teams have resorted to Excel to manually populate patient profiles, despite human error limitations, inefficiency, and …
+
+
+
+ In 2022, our team announced the first release of the tfrmt package, providing clinical programmers with the novel ability to create tables without data. With its metadata-driven engine applied to the emerging industry standard of Analysis Results …
+
+
+
+ The Pharmaceutical industry is moving towards open-source tools with companies adopting R/Shiny to revolutionize their processes in clinical reporting, drug development, and translational research, among other areas. One of the initiatives of this …
+
+
+
+ Data monitoring to ensure patient safety is an important process in clinical trials. An independent data monitoring committee (DMC) reviews safety data periodically to interpret findings and assess various safety signals. Sponsors typically provide …
+
+
+
+ In this talk, we would like to introduce openstatsware, an official working group of the American Statistical Association (ASA) Biopharmaceutical Section. The working group has a primary objective to engineer R packages that implement important …
+
+
+
+ Pharmacokinetic (PK) analysis data programming poses some unique challenges. For example, both dosing and concentration records are included, and nominal and actual relative time variables that reference the first dose, the previous dose, or the …
+
+
+
+ Finding the right dose is a critical step in pharmaceutical drug development. There have been various statistical methodology developments for the design and analysis of clinical studies. In particular, MCP-Mod (Multiple Comparisons Procedure - …
+
+
+
+ Computationally-intensive workflows exist in the design, analysis, and simulation in all phases of clinical studies. The runtime of such workflows can be significantly longer than a Shiny app can practicably handle. In such situations, binding the …
+
+
+
+ Addressing estimands in clinical studies involves handling of intercurrent events and often multiple imputation methods are applied to handle missing data. In Novo Nordisk more and more programming tasks are done in R, but still multiple imputation …
+
+
+
+ The Pharmaceutical industry is adopting new tools and technologies, putting pressure on individuals to learn many new skills in a short period of time. In order to both promote these new ways of working, and to assist those adopting it, at Genentech …
+
+
+
+ Continuous integration (CI) and continuous delivery (CD) are playing a pivotal role in ensuring that R projects in Pharma meet the highest quality standards. Particular focus is placed on ensuring that packages are fit for purpose both on internal …
+
+
+
+ gtreg internally leverages gtsummary to streamline production for regulatory tables in clinical research. There are three functions to assist with adverse event reporting tbl_ae_count(), tbl_ae(), and tbl_ae_focus(); tbl_ae_count() tabulates all AEs …
+
+
+
+ Nowadays R is the talk of everyone in the pharmaceutical industry. A lot is being said about statistical programming (CDISC datasets, TFLs) with R and addressing validation issues. The most important players embraced R in various areas of their …
+
+
+
+ When working with big data sources, such as medical claims data, the process of data review and quality control (QC) can be both complex and tedious. R and RMarkdown have become common tools for data analytics and report writing in the pharmaceutical …
+
+
+
+ RStudio is one of the most commonly used integrated development environments (IDEs) for R. In this talk we present RStudio on Amazon SageMaker, a fully managed RStudio IDE on AWS cloud. We also walk through a Health Care and Life Sciences use …
+
+
+
+ Recently there have been a lot of new developments for modeling in the tidyverse. This talk will show off tools for censored regression, an interface to clustering, and how to use the h2o.ai platform for optimizing/fitting models.
+
+
+
+ The job of a data scientist working on a clinical trial team in the pharmaceutical industry is to provide the most accurate analysis possible in order to enable valid insights from the data. Ensuring data quality is extremely hard work and there are …
+
+
+
+ It is relatively simple to create a powerful visualization app using shiny, but what if you need to change your data wrangling process or wish to build a different output? How easy is it to provide this flexibility without having to rewrite the …
+
+
+
+ So you've started writing custom JavaScript for your Shiny app... but where do you put all this code?! Organizing JS files to be sourced within one another can be really hard to navigate from within a Shiny application. In this talk I’ll cover what …
+
+
+
+ Back in 2020, Atorus had the initial release of the R package Tplyr, which was built to simplify the creation of clinical summary tables. Now in 2022, new updates and enhancements have been added to Tplyr to give the user more, particularly in the …
+
+
+
+ As more and more companies move their compute environments into the cloud, the steps needed to ensure that their software suite and newfound infrastructure are FDA compliant change accordingly. In this talk, we will examine the requirements for an …
+
+
+
+ On Nov 22nd, 2021, the R Consortium R Submissions Working Group successfully submitted an R-based test submission package through the FDA eCTD gateway. The submission package has been received by the FDA staff who were able to reproduce the numerical …
+
+
+
+ Sarepta deployed RStudio Team for modern data analytics. One major hurdle we faced was how to serve data to Connect/Workbench securely in the backend. Partnering with Atorus, Sarepta solved this challenge using Box cloud storage as a secure data …
+
+
+
+ Over the past year, we’ve designed a process that is meant to mimic public package publishing as closely as possible, where packages are automatically assessed by a series of checks which may prompt manual revision should the automated processing …
+
+
+
+ Increasingly biostatisticians in pharma companies would like to use R on a daily basis, e.g. the growing number of participants in R/Pharma conferences is one metric showing this trend. As R programs replace proprietary software in this regulated …
+
+
+
+ Drug safety data present many challenges with regard to curation, analysis, interpretation, and reporting. Safety endpoints have high variability and are multidimensional and interrelated, which points to a need to identify novel approaches to …
+
+
+
+ A prespecified adaptive plan involves automating the analysis of interim clinical trial data and adjusting elements of the trial in response. In implementing these plans, we experience random highs and lows in the data, adjacent doses of a drug with …
+
+
+
+ This invited talk will describe the current landscape of CDISC initiatives and collaborations. CDISC currently has a portfolio of innovative industry initiatives that include new standards as well as open-source software projects that are part of the …
+
+
+
+ The presentation will introduce the transition project in which a whole department of 150+ SAS programmers completely moved from SAS to open-source programming. The whole department switched from SAS Studio to R Pro Server, Windows Server to AWS …
+
+
+
+ Metrum Research Group (MetrumRG) has developed a suite of open-source R packages for pharmacometric analyses that can be used independently, or seamlessly integrated into a larger R-based ecosystem. To showcase this ecosystem, we used the popular …
+
+
+
+ If we could predict a patient's future risk of developing illnesses such as depression or lung cancer in the next three years, then we could potentially intervene and improve the patient's future health. The PatientLevelPrediction R package provides …
+
+
+
+ In the past years, the pharma industry has seen a true paradigm shift in its use of R. Up until recently, one had to choose between R and SAS. Today, most statisticians are trained in both languages. With this in mind, at AstraZeneca we built on the …
+
+
+
+ Tables no longer just live in flat PDFs and reports, but should be able to go from apps to PDFs and Word documents with ease. To have the flexibility to do this we need to separate the analysis from the formatting. Additionally, in the pharmaceutical …
+
+
+
+ In the final stage of a clinical study, a number of tables and figures are prepared, typically using SAS, for reporting the results of the study in a clinical study report. Before the clinical study report is finalized a thorough interpretation of …
+
+
+
+ For data science teams, data preparation takes substantial investment of time, data science expertise and subject matter proficiency. However, as the name implies, data preparation is typically viewed merely as a means to an end, encouraging creation …
+
+
+
+ In recent years, R users' understanding of Shiny has greatly increased but so have client expectations. While one of Shiny’s greatest strengths is that it allows producing web applications solely from R code, meeting client’s more delicate …
+
+
+
+ In this talk, we will discuss an infrastructure-free R package exchange and distribution system. The components include pkglite for compact package representations, cleanslate for portable R environments, and pkglink for runtime dependency …
+
+
+
+ In recent years late stage Pharma has begun to transition from a consumer of open source, and a sporadic creator, to a heavily invested collaborator on open source tools like R packages. In this short talk, James will discuss our recent focus on open …
+
+
+
+ Statistical programming of summary tables is a well-established task within the clinical world. In the last few years, the pharmaceutical industry has seen several new packages emerge to support these activities, including the Atorus package Tplyr. …
+
+
+
+ Since its first release over eight years ago, the R community has progressively created amazing web-based applications with the Shiny package. In practically every R conference or user meetup, we see amazing examples of how Shiny is changing the …
+
+
+
+ Motivated by the rapid rise in clinical data exploration, there is an increasing need to utilize interactive graphical displays using Shiny apps. To date, the development and deployment of study apps have required specialized knowledge and …
+
+
+
+ RNA-seq transcriptome analysis workflows often generate the essential information (data and results) distributed among a variety of different tabular files and formats, e.g. raw and normalized expression values, results of differential gene …
+
+
+
+ Terms like "digitalization", "machine learning (ML)" or "artificial intelligence (AI)" are more than just buzzwords these days. Databases are analyzed worldwide with modern algorithms and entire industries are making data-driven decisions at an even …
+
+
+
+ R and Python compose the fundamental tools used by data scientists across industries including pharma and biotech. With a rich set of analytical packages in both language domains, analysts who are able to work with both possess a significantly larger …
+
+
+
+ Drug repositioning is an area of growing interest in drug development that can accelerate the discovery of new treatment options to benefit patients worldwide. Briefly, drug repositioning refers to the systematic investigation of a novel disease …
+
+
+
+ (The) Operation (formerly known as) Warp Speed is a joint venture between pharma and government to bring COVID-19 vaccines to market at unprecedented speed. A key tenet of the program is to generate the data needed to establish correlates of vaccine …
+
+
+
+ In this talk, we would like to provide updates on the four biopharmaceutical industry focused R consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical …
+
+
+
+ Tidymodels has begun to create tools for modeling event time data. This will include methods for fitting, resampling, and characterizing models with censored outcomes. This talk will describe our design goals, show some syntax for modeling, and …
+
+
+
+ In this short talk I will present a few packages that can be used inside a package testing framework to help increase the overall quality of a package. The main focus will be static R code analysis tools such as the well-known codetools …
+
+
+
+ Detailed exploration of large transcriptomics datasets, increasingly available at single-cell resolution, is a time-consuming task which often requires the complementary skill sets of data analysts and experimental scientists to complete analyses and …
+
+
+
+ In the safety analysis of clinical trials, the forest plot plays an important role. Currently, most forest plots are static, which makes them hard to read for the Data Monitoring Committee (DMC). In this project, we propose an R package - …
+
+
+
+ R package validation has been on all our minds since the pharmaceutical industry started moving away from SAS to R for its statistical analysis and regulatory submissions. Opting for open source programming requires revisiting our way of validating code, …
+
+
+
+ The CDISC-SEND data standard has created new opportunities for collaborative development of open-source software solutions to facilitate cross-study analyses of toxicology study data. A public private partnership between BioCelerate and FDA/CDER was …
+
+
+
+ Like many other companies, Merck KGaA/EMD Serono has embarked on their journey to enable the use of R for regulatory submissions. Following the framework introduced by the R validation hub (Nicholls et al., 2020), we started to develop an algorithm to …
+
+
+
+ We ALL have a tendency to solve problems with solutions that may be far from optimal. How does this tendency shape our Scientific Software Architecture? What are the long-term consequences of that? What pushes us towards sub-optimal solutions? …
+
+
+
+ How do you roll out R to hundreds of colleagues, ensuring that the version you're providing is tested, qualified and well managed? Is it possible to ensure that everyone is using the same version of R and packages? How do you account for differences …
+
+
+
+ Routinely-collected healthcare databases generated from insurance claims and electronic health records have tremendous potential to provide information on the real-world effectiveness and safety of medical products. However, unmeasured confounding …
+
+
+
+ Programming is ubiquitous in applied biostatistics, and most statisticians know a programming language such as R - yet software engineering is still neglected as a skill and undervalued as a profession in pharmaceutical statistics. Why is this a …
+
+
+
+ Historically building a great SCE for clinical reporting involved selecting a vendor, integrating their product, and supporting a single proprietary language. The shift to report clinical trials using R has had a much broader impact than just …
+
+
+
+ Data sources and the volume of data available for driving discovery and informing decisions have substantially increased over time. This increase has resulted in an evolving data and regulatory landscape ripe for the expertise of statisticians and …
+
+
+
+ I will talk about some of the challenges that are now arising in BioTech. There are larger, more informative but much more complex, data sets available and being developed. While these hold great promise they add complexity to an already fragile …
+
+
+
+ Roche/Genentech, GSK, Atorus and J&J/Janssen have initiated a collaboration called pharmaverse to bring together a curated subset of open-source R packages to enable clinical reporting (from CRF to eSubmission). Where gaps are identified, new …
+
+
+
+ In this talk, we will be discussing an architecturally and bioinformatically multi-layered integrative multiomic approach to the development of target hypotheses. Scientists work to help pharmaceutical companies advance towards the identification of …
+
+
+
+ The gt package is a table preparation package for R which makes the presentation of tabular data fairly easy and also has the power to customize tables should you need it. The package has been in continuous development at RStudio for over three years and …
+
+
+
+ The Beatles rose to music fame in the 1960's and became a worldwide phenomenon. With millions of screaming fans and selling over 600 million records, they are often cited as one of the most influential rock bands in history. One reason for their fame …
+
+
+
+ Visual representations of data inform how machine learning practitioners think, understand, and decide. Before charts are ever used for outward communication about a ML system, they are used by the system designers and operators themselves as a tool …
+
+
+
+ With recent technological advances and availability of new data sources, we are experiencing exciting changes to the human medical product regulatory landscape. While these new areas have created challenges, they also present opportunities. This …
+
+
+
+ Even though a model prediction can be made, there are times when it should be taken with some skepticism. For example, if a new data point is substantially different from the training set, its predicted value may be suspect. In chemistry, it is not …
+
+
+
+ The open-source analytics community is driving innovation in precompetitive spaces like statistical methodology, reproducibility approaches, visualization techniques, and scaling strategies. The diverse and rapidly evolving ecosystem of open-source …
+
+
+
+ The tidyverse (tidyverse.org) is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. The packages primarily consist of tools for data ingest, …
+
+
+
+ Lilliam will be presenting a perspective on what the office of computational science is doing to support regulatory review for safety assessments. She will explore the concept of collaborations and sharing to support process and transparency, along …
+
+
+
+ After a brief introduction to mrgsolve (https://mrgsolve.github.io), we will discuss concepts and applications for using the package in R to simulate from pharmacokinetic (PK) and physiologically-based PK (PBPK) models, estimate parameters given a …
+
+
+
+ Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for …
+
+
+
+ In recent years, R users' understanding of Shiny has greatly increased but so have client expectations. While one of Shiny's greatest strengths is that it allows producing web applications solely from R code, meeting client's more delicate …
+
+
+
+ Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that …
+
+
+
+ A four-hour workshop that will take you on a tour of how to get from data to manuscript using R Markdown. You'll learn: the basics of Markdown and knitr; how to add tables for different outputs; workflows for working with data; how to include and …
+
+
+
+ This is a 3-hour workshop on Stan (https://mc-stan.org). The overall goal of the workshop will be to make the best use of time to answer as many Stan-related questions as possible. The level of the workshop will be intermediate to advanced, but anyone …
+
+
+
+ In this workshop we will walk through an implementation of the R Validation Hub's white paper, A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure (https://www.pharmar.org/white-paper/). The workshop will explore …
+
+
+
+ In this workshop we will present how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each …
+
+
+
+ This workshop is introductory and open to everyone assuming basic R/Data Science skills. Please note, the workshop is very hands-on oriented, so expect to get your fingers dirty! The aim will be an introduction to ANNs in R. ANNs form the basic unit …
+
+
+
+ We will be presenting an overview of the interoperability between Python and R for the R user community at R/Pharma 2020. This workshop will highlight how statistical programmers can leverage the power of both R and Python in their daily processes. …
+
+
The past few years have shown vast improvements in workflows for reproducible and distributable research within the R ecosystem. At satRday Chicago, everyone in the audience said they used R Markdown; however, only one person raised their hand when asked if they could associate their reports back to the code version that generated them. Since continuous integration is quickly becoming commonplace in the R community, continuous deployment (CD) is a logical and easy step to add to your workflow to enhance reproducibility. I will demo associating an R Markdown report with the code version that produced it and automating the build and release of both executable and cloud-based Shiny apps. Finally, an announcement of the electricShine package for creating Electron-based Shiny apps will highlight the power of using CD with production-level Shiny apps.
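The idea of tying a report to the code version that produced it can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the speaker's workflow: the helper names `current_commit` and `stamp_report` and the footer wording are assumptions. The only external call is `git rev-parse --short HEAD`, which prints the abbreviated hash of the checked-out commit.

```python
import subprocess


def current_commit(default="unknown"):
    """Return the short hash of the checked-out commit, or `default`
    when git is unavailable or this is not a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip() or default


def stamp_report(body, commit=None):
    """Append a footer naming the code version (hypothetical footer format)."""
    commit = commit or current_commit()
    return f"{body}\n\nGenerated from code version: {commit}\n"


# In a CI/CD job the hash is usually already known and can be passed in.
report = stamp_report("Summary of results", commit="abc1234")
```

In a continuous deployment setting the same stamp would typically be written at build time, so every released report or app carries the hash it was built from.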
From playing in the backyard to designing one: Shiny transforms study designs, data analyses and statistical thinking of oncology in vivo group at Janssen
In vivo studies are crucial to the discovery and development of novel drugs and are conducted for proof-of-concept validation, for FDA applications, and to support clinical trials. Appropriate study design, data analysis, and interpretation are essential in providing knowledge about drug efficacy and safety within a living organism. With drug discovery science moving forward at an ever-accelerating rate, data analysis software is not always capable of offering an appropriate toolset. In the absence of a proper tool, oncology in vivo scientists at Janssen R&D needed a comprehensive analysis platform to conduct appropriate and efficient analyses of in vivo data and to ensure the quality and speed of decision-making. The INVIVOLDA Shiny application was developed to fill this gap. INVIVOLDA offers interactive and animated graphics for data exploration and a powerful linear mixed-effects modeling framework for longitudinal data analysis. With implemented decision trees and statistical report generation, it streamlines statistical analyses of in vivo longitudinal data. INVIVOLDA's success led to more requests for Shiny applications for the analysis and design of experiments in the oncology in vivo group. Multiple statistical trainings were subsequently conducted to educate biologists on the statistical methods implemented in the Shiny applications. Once completed, a comprehensive framework of Shiny apps will enhance statistical knowledge and thinking, transform the way experiments are designed and analyzed, and ensure traceable and reproducible research and efficient decision-making in the oncology in vivo group at Janssen.
Objectives: Demonstrate an interactive and dynamic visualization tool, ModViz POP, for simulating ordinary differential equation (ODE) based PK/PD models with variability.
Methods: ModViz POP has a built-in PK/PD ODE library of models based on the compartmental nomenclature for simulating standard IV bolus, infusion, and first-order absorption scenarios. It also gives the user the ability to plug in a model from a local directory to quickly simulate a model of interest. Users can also simulate from a project library, which serves as a repository of final PK/PD models developed by individual project teams. Beyond the PK/PD models, it can handle complex QSP models and PBPK models equally well. Enhanced R packages, HTML/CSS, and LaTeX were used in combination with Shiny and provided an elegant and powerful programming framework for turning models into a web application with dynamic visualization and automated report writing. The user interface consists of several key inputs for performing the simulations. Tabbed navigation allows the user to visualize the plots, input parameters, derived values, and equations. It provides the ability to download the underlying model, plots, simulated data, or a comprehensive report consisting of all the key inputs and outputs of the simulations. The Help button provides a link to documentation with detailed instructions on the different components of the interface. The interface also includes advanced features where users can overlay external data on simulated data, set a certain simulation scenario as a reference, or carry out sensitivity-analysis-based simulations.
Conclusions: This easy-to-use interface can serve as a valuable tool for teams to explore and evaluate potential scenarios and thus facilitate collaborative decision-making in the drug discovery and development paradigm.
References: Kyle T. Baron et al. mrgsolve: Simulate from ODE-Based Population PK/PD and Systems Pharmacology Models. 2017.
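To make "ODE-based PK models with variability" concrete, here is a minimal sketch of a one-compartment model with first-order absorption, integrated with a fixed-step Runge-Kutta scheme and repeated over subjects with log-normal variability on clearance and volume. It is a Python stand-in for illustration only, not ModViz POP or mrgsolve code; the function names, parameter values, and the choice of integrator are all assumptions.

```python
import math
import random


def one_compartment_oral(dose, ka, cl, v, t_end=24.0, dt=0.01):
    """Integrate dGut/dt = -ka*Gut and dCent/dt = ka*Gut - (cl/v)*Cent
    with classical RK4. Returns a list of (time, concentration) pairs."""
    ke = cl / v  # elimination rate constant

    def deriv(gut, cent):
        return -ka * gut, ka * gut - ke * cent

    gut, cent = dose, 0.0
    profile = [(0.0, 0.0)]
    for i in range(int(round(t_end / dt))):
        k1 = deriv(gut, cent)
        k2 = deriv(gut + 0.5 * dt * k1[0], cent + 0.5 * dt * k1[1])
        k3 = deriv(gut + 0.5 * dt * k2[0], cent + 0.5 * dt * k2[1])
        k4 = deriv(gut + dt * k3[0], cent + dt * k3[1])
        gut += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        cent += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        profile.append(((i + 1) * dt, cent / v))
    return profile


def simulate_population(n, dose, ka, cl, v, omega=0.3, seed=1):
    """Simulate n subjects; cl and v vary log-normally with SD `omega`
    on the log scale (an assumed variability model)."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n):
        cl_i = cl * math.exp(rng.gauss(0.0, omega))
        v_i = v * math.exp(rng.gauss(0.0, omega))
        profiles.append(one_compartment_oral(dose, ka, cl_i, v_i))
    return profiles


# Hypothetical typical values: 100 mg dose, ka = 1 /h, CL = 5 L/h, V = 50 L.
typical = one_compartment_oral(100.0, 1.0, 5.0, 50.0)
population = simulate_population(3, 100.0, 1.0, 5.0, 50.0)
```

For this simple model a closed-form solution exists, which makes it a convenient check on the integrator before moving to models that have none.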
The Interactive Safety Graphics (ISG) workstream of the ASA-DIA Biopharm Safety Working Group is excited to introduce the safetyGraphics package, an interactive framework for evaluating clinical trial safety in R using a flexible data pipeline. Our group seeks to modernize clinical trial safety monitoring by building tools for data exploration and reporting in a highly collaborative open source environment. At present, our team includes clinical and technical representatives from the pharmaceutical industry, academia, and the FDA, and additional contributors are always welcome. The current release of the safetyGraphics R package includes graphics related to drug-induced liver injury. The R package is paired with an in-depth clinical workflow for monitoring liver function created by expert clinicians based on medical literature. safetyGraphics features interactive visualizations built using htmlwidgets, a Shiny application, and the ability to export a fully reproducible instance of the charts with associated source code. To ensure quality and accuracy, the package includes more than 300 unit tests, and it has been vetted through a beta testing process that included feedback from more than 20 clinicians and analysts. The Shiny application can easily be extended to include new charts or applied to other disease areas due to its modular design and generalized charting framework. Several companies have adapted the tool for their own use, leading to interesting discussions and paving the way for enhancements, which demonstrates the power of open source and community collaboration.
Start browsing through R tutorials online and it won’t take long to stumble across a read.csv statement. CSV files serve well for detached, static analyses. They tend to fail, however, when tasked with storing large, dynamic data sets being accessed interactively by globally dispersed, concurrent users. Often the go-to in this situation is a traditional relational database management system such as Oracle or MySQL, but there are other options! This talk will review the various back-ends, along with potential considerations, that Merck’s Digital Proactive Process Analytics group has implemented to support various Shiny applications, dashboards, and automated data analysis pipelines.
The installation of a cohort of R packages can be a challenge, especially considering different dependency types, package versions, overlapping namespaces, and varying risks assigned to each of the packages. At the same time, the number of R packages to be installed grows exponentially with each new package added. Their complex dependencies may create conflicts. In this context, the R admin is often confronted with a cohort of packages without knowing the package of interest. We use statistical analysis techniques from the field of complex network analysis to shed light on the non-trivial dependency structures of package cohorts. Furthermore, we simplify the network graph to find improved installation sequences for pre-selected cohorts of R packages. We reduce large package cohorts to a sufficient shortlist of packages, whose installation automatically pulls in other packages via dependencies without causing conflicts. The build time of a library may be greatly reduced. As a byproduct, we generate a graph of the build based on the exact dependency tree and actual versions used, for auditing and change control in regulated workflows. This strategy also allows for the identification of high-risk packages and their importance in the dependency tree.
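The two graph ideas in this abstract, an installation sequence that respects dependencies and a shortlist whose installation pulls in everything else, can be sketched on a toy dependency graph. This is an illustrative Python sketch rather than the authors' method: the cohort and its package names are made up, and the shortlist rule (keep only packages that nothing else in the cohort depends on) is a simplifying assumption that ignores versions and conflicts.

```python
from graphlib import TopologicalSorter

# Hypothetical cohort: each package maps to the set of packages it depends on.
COHORT = {
    "report": {"plots", "tables"},
    "plots": {"core"},
    "tables": {"core"},
    "core": set(),
}


def install_order(deps):
    """Return a sequence in which every package comes after its dependencies.
    Raises graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(deps).static_order())


def shortlist(deps):
    """Return the packages that no other cohort member depends on.
    Installing these pulls in the remaining packages as dependencies."""
    required = {dep for dep_set in deps.values() for dep in dep_set}
    return sorted(pkg for pkg in deps if pkg not in required)


order = install_order(COHORT)
roots = shortlist(COHORT)
```

On this toy cohort the shortlist contains only `report`, since the other three packages are reached through its dependencies. A real implementation would also need to carry version constraints and risk scores on the graph.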
Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal inference may improve analysis of these studies by taking advantage of the wealth of information measured in claims. In order to evaluate the performance of these methods as applied to claims-based studies, we use a combination of real data examples and plasmode simulation, implemented in the R package ‘plasmode’, which creates realistic simulated datasets based on a real cohort study. In this talk, I will give an overview of our progress so far and what is left to be done.
We are amidst a data revolution. In just the past 5 years, the cost of sequencing a human genome has gone down approximately 10-fold. Development is moving equally fast in areas such as mass spectrometry and in vitro immuno-peptide screening, among others. This facilitates the search for biomarkers, biologics, therapeutics, etc., but also redefines the requirements for storing, accessing and working with data, as well as the skillset of bio data scientists. In this talk I will present tidysq, an R package that aims to extend the Tidyverse framework to include (tidy) bio data science / bioinformatics. tidysq will be presented in the context of the current status of ML-driven (neo)epitope prediction within cancer immunotherapy.
bioWARP (biostatistical Web-Applications and R Procedures) is a Shiny application enabling employees at Roche Diagnostics to create validated reports for submissions to regulatory authorities. bioWARP enables people who cannot program in R to use advanced statistical methods. It connects the validated R packages developed at Roche to an easy-to-use and elegant user interface. Its modular environment can host an unlimited number of such interfaces. bioWARP now consists of tools for reporting reference ranges, equality by linear regression, precision by variance component analysis and homogeneity by equivalence tests developed in-house. bioWARP’s most important feature is the ability to move all statistical evaluations right into PDF reports. These are validated and can be used directly for submission to regulatory authorities. We call bioWARP the “largest shiny application in the world”, as it already consists of 16 tools, has over 100,000 lines of code and more than 500 buttons and interaction items, and is growing and growing and growing.
Even though a model prediction can be made, there are times when it should be taken with some skepticism. For example, if a new data point is substantially different from the training set, its predicted value may be suspect. In chemistry, it is not uncommon to create an “applicability domain” model that measures the amount of potential extrapolation from the training set. The applicable package will be used to demonstrate different methods for measuring how much a new data point is an extrapolation from the original data (if at all).
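One simple way to quantify extrapolation can be sketched in base R. It uses Mahalanobis distance to the training centroid as a stand-in technique and does not use the applicable package or claim to mirror its methods; the simulated descriptors and the 95th-percentile cutoff are assumptions.

```r
set.seed(1)

# Simulated training descriptors (assumption: two numeric predictors).
train <- cbind(x1 = rnorm(200), x2 = rnorm(200))

center <- colMeans(train)
covmat <- cov(train)

# Squared Mahalanobis distance of each training point to the training centroid.
d_train <- mahalanobis(train, center, covmat)

# Hypothetical cutoff: the 95th percentile of the training distances.
cutoff <- unname(quantile(d_train, 0.95))

# Two new samples: one near the training data, one far away from it.
new_points <- rbind(inside = c(0, 0), outside = c(6, -6))
d_new <- mahalanobis(new_points, center, covmat)

# TRUE means the prediction for that sample would be an extrapolation.
extrapolating <- d_new > cutoff
```

Predictions for samples flagged in `extrapolating` would be reported with a warning rather than suppressed, leaving the judgment to the analyst.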
The development of a streamlined data-aggregation methodology utilizing the statistical programming language R is described. The centralization of high-throughput experimentation data enabled the use of statistics and data exploration methods within R to accelerate the identification and optimization of chemical reactions.
Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where others can quickly explore different variables, parameter values, models/algorithms, etc. Although the interactivity is great for many reasons, once an interesting result is found, it’s more difficult to prove the correctness of the result since (1) the result can only be (easily) reproduced via the Shiny app and (2) the relevant domain logic which produced the result is obscured by Shiny’s reactive logic. The R package shinymeta provides tools for capturing and exporting domain logic for execution outside of a Shiny runtime (so that others can reproduce Shiny-based result(s) from a new R session).
Scientists in drug discovery research utilize a wide variety of instrumentation and techniques to advance their research. While instrumentation vendors often provide software tools to deal with data wrangling and visualization, a simple collection of isolated tools often fails to address the scale and overall scope of the data analysis encountered in discovery research. This can lead to lost productivity as scientists work to process, collate and summarize their work across various informatics systems. Shiny has seen increasing use in Pharma clinical research. Shiny apps are also transforming discovery research by putting powerful data wrangling, visualization and reporting tools into the hands of bench scientists. To solve the aforementioned problems in drug discovery research, we developed a suite of tools using Shiny to automate data wrangling, integration, visualization and reporting. Examples will be presented in areas such as automated signal processing and batch analysis of electrocardiogram data, collection and visualization of biomarker and physiology measurements from pharmacology studies, and access to a high-performance computing cluster for non-experts. These Shiny tools have shown value by increasing productivity and accelerating scientific research.
Bayesian model-based dose-escalation designs, including one- and two-parameter logistic regression models, have by now proven themselves in Phase I dose-escalation trials (Iasonos and O’Quigley, 2014 [1]). Compared to rule/algorithm-based designs such as the 3+3 design, model-based designs have the advantage of being more flexible in choosing the target toxicity rate and cohort size. Bayesian modeling allows prior knowledge of the drug (e.g. from animal tox studies, data from comparator drugs or data from other studies) to be combined with the observed data from the current trial. The model-based approach accounts for uncertainty, optimizes dose recommendations (balancing risk versus benefit for patients) and allows for dose de-escalation as well as dose re-escalation. Recent research has shown that model-based approaches are more reliable in estimating the maximum tolerated dose (MTD) and allocate fewer patients to ineffective or excessively toxic doses (Jaki et al., 2013 [2]). Because of their higher complexity, model-based designs are still viewed critically by clinicians (Le Tourneau et al., 2012 [3]); thus the classical 3+3 design is still widely implemented due to its simplicity and transparency. Conaway and Petroni, 2019 [4] have highlighted that the choice of design in early phase affects the outcome of the drug development process, and therefore more attention should be paid to early-stage designs. In this talk we will showcase how we bring Bayesian dose-escalation models closer to non-statisticians by means of R. We will discuss how we plan, implement and communicate the dose-escalation design to the clinicians. Moreover, we show how we present our results to the safety monitoring committee (SMC) and how we support the SMC in dose-escalation decisions. We use a simple data example to illustrate the proposed methodology. All of this will be discussed in the context of using R as the primary tool.
[1] Iasonos, A. and O’Quigley, J. (2014). Adaptive Dose-Finding Studies: A Review of Model-Guided Phase I Clinical Trials. Journal of Clinical Oncology 32(23).
[2] Jaki, T., Clive, S. and Weir, C.J. (2013). Principles of dose finding studies in cancer: a comparison of trial designs. Cancer Chemother Pharmacol 71:1107-1114.
[3] Le Tourneau, C., Gan, H.K., Razak, A.R.A. and Paoletti, X. (2012). Efficiency of New Dose Escalation Designs in Dose-Finding Phase I Trials of Molecularly Targeted Agents. PLoS ONE 7(12): e51039.
[4] Conaway, M.R. and Petroni, G.R. (2019). The impact of early phase trial design in the drug development process. Clinical Cancer Research.
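A two-parameter Bayesian logistic dose-toxicity model of the kind described in this abstract can be sketched with a grid approximation in base R. This is an illustrative sketch only, not the presenters' implementation: the trial data, the priors, the reference dose, the 0.33 toxicity target and the 0.25 overdose threshold are all assumed values.

```r
# Hypothetical trial data: dose levels (mg), patients treated, toxicities seen.
doses <- c(10, 20, 40, 80)
n     <- c(3, 3, 3, 0)
tox   <- c(0, 0, 1, 0)
ref_dose <- 40  # reference dose used for centring (assumption)

# Model: logit(p_tox) = a + exp(lb) * log(dose / ref_dose), so the slope is > 0.
grid <- expand.grid(a  = seq(-6, 4, length.out = 121),
                    lb = seq(-3, 2, length.out = 81))

# Weakly informative normal priors (assumed values).
log_prior <- dnorm(grid$a, mean = -1, sd = 2, log = TRUE) +
  dnorm(grid$lb, mean = 0, sd = 1, log = TRUE)

# Toxicity probability at every grid point (rows) and dose level (columns).
p_tox <- plogis(outer(grid$a, rep(1, length(doses))) +
                  outer(exp(grid$lb), log(doses / ref_dose)))

# Binomial log-likelihood summed over dose levels.
log_lik <- rowSums(sweep(log(p_tox), 2, tox, `*`) +
                     sweep(log1p(-p_tox), 2, n - tox, `*`))

# Normalised posterior weights on the grid.
post <- exp(log_prior + log_lik - max(log_prior + log_lik))
post <- post / sum(post)

# Posterior mean toxicity and posterior probability of exceeding the target.
target <- 0.33
post_mean  <- colSums(p_tox * post)
p_overdose <- colSums((p_tox > target) * post)

# Escalation rule (assumption): highest dose with overdose probability < 0.25.
ok <- which(p_overdose < 0.25)
recommended <- if (length(ok) > 0) doses[max(ok)] else NA
```

Tabulating `post_mean` and `p_overdose` by dose gives the kind of summary that can be shown to clinicians alongside the recommended next dose.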
Physiologically based pharmacokinetic (PBPK) models are used extensively in drug development to address a number of problems. However, most PBPK applications have limited knowledge-sharing impact because they are implemented in closed, proprietary software. Much of the physiologic data and knowledge required for these models is publicly available or available in the pre-competitive space. To this end, we’ve engaged in the development of open science PBPK models, using R as the scaffolding for this work. In particular, our group has developed the mrgsolve R package, which utilizes Rcpp to compile models of systems of ordinary differential equations. One example is the development of a PBPK model to predict maternal/fetal exposures throughout pregnancy for drugs that are primarily metabolized by liver CYP450 enzymes. This model aims to utilize a quantitative understanding of the physiological and biochemical changes that occur throughout pregnancy to inform clinical pharmacology decisions where clinical trials cannot. The model was validated against the observed data of 5 different drugs: midazolam, metoprolol, caffeine, nevirapine, and artemether. A series of local sensitivity analyses followed by parameter optimization, using the mrgsolve and nloptr R packages, further improved model predictions. The developed maternal-fetal PBPK model in its flexible open-source implementation provides a transparent, platform-independent, and reproducible system for model-informed decision support while developing exposure-based dosing recommendations in maternal/fetal patient populations.
In the early phases of clinical development, the future of a compound depends on more than just the result of a hypothesis test on a single endpoint in a single phase 2 study. We think a lot about how design choices affect immediate outcomes. GSK’s Quantitative Decision Making (QDM) framework focusses on the question, “How do we design our study in order to increase the chances that it will deliver data that will allow us to decide whether the drug should continue in development, or stop?” The QDM framework has been developed in R and takes advantage of the Biostatistics HPC environment, running thousands of hypothetical scenarios in close to real time. The initiative is changing the way we plan and deliver clinical trials. Thanks to a Shiny front end, statisticians are able to walk clinical teams through key trial design decisions in order to estimate the Probability of Success, a key component of the QDM framework. This presentation will cover the core QDM concepts and present the key communication outputs created to support the process.
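A probability-of-success calculation of this general kind can be sketched by simulation in base R. This is not GSK's QDM framework; the prior on the treatment effect, the sample size, the outcome standard deviation and the go/no-go rule below are all assumed for illustration.

```r
set.seed(42)

n_sims    <- 5000  # number of hypothetical scenarios
n_per_arm <- 60    # patients per arm (assumed)
sigma     <- 1     # outcome standard deviation, assumed known

# Assumed prior belief about the true treatment difference.
prior_mean <- 0.3
prior_sd   <- 0.2
true_effect <- rnorm(n_sims, mean = prior_mean, sd = prior_sd)

# Sampling distribution of the observed difference in means.
se <- sigma * sqrt(2 / n_per_arm)
observed <- rnorm(n_sims, mean = true_effect, sd = se)

# Hypothetical decision rule: continue when the one-sided p-value is < 0.05.
go <- pnorm(observed / se, lower.tail = FALSE) < 0.05
prob_success <- mean(go)

# Closed-form cross-check, available because everything here is normal.
analytic <- pnorm((qnorm(0.95) * se - prior_mean) / sqrt(prior_sd^2 + se^2),
                  lower.tail = FALSE)
```

Re-running the sketch over a range of `n_per_arm` values shows how a design choice shifts the probability of success, which is the kind of trade-off such a framework is meant to make visible.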
The data wrangling and manipulation capabilities in R make it perfectly suited for transforming raw clinical database data into structured, submission-ready CDISC datasets. By extensively using dplyr, tidyr, and other packages in the tidyverse, we create datasets that are ready for 1. pharmacokinetic analysis (done in other software), 2. generation of tables, listings, and figures (TLFs) and 3. submission to the FDA. I’ll also go over report-ready TLF generation in R and how we’ve had great success in producing beautiful and easy-to-read TLFs for our analysts and clients by using the ggplot2 and officer packages.
R is fairly good at backwards compatibility, but reproducing an analysis can still be a challenge, even given the script and data, because packages, R itself, and math libraries keep evolving. The Rocker project (www.rocker-project.org) offers, among other things, version-stable R in Docker images. A small example will show how this allows an analysis to be executed with the highest reproducibility in any Docker runtime environment. Such environments are offered by all major commercial cloud providers and can also be installed on premises.
In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP’s work involves analyzing experimental biomarkers and endpoints which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding, has proven to be important for reproducibility and scalability. SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R’s packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together and approach the task from a software development view. Once the code has been developed, the package is validated according to procedures that comply with 21 CFR Part 11 and leverage software development life cycle (SDLC) methodology. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials.
We know that adopting documentation, testing, and version control mechanisms is important for creating a culture of reproducibility in data science. But once you’ve embraced some basic development best practices, what comes next? What does it take to feel confident that our data products will make it to production? This talk will cover case studies in how I work with R users at various organizations to bridge the gaps that form between development and production. I’ll cover reasons why CI/CD tools can enhance reproducibility for R and data science, showcase practical examples like automated testing and push-based application deployment, and point to simple resources for getting started with these tools in a number of different environments.
Purpose: To establish a gold-standard methodology for accurately extracting progression-free survival (PFS) following Diffuse Large B-Cell Lymphoma (DLBCL) treatment using real-world electronic healthcare record (EHR) data.
Results: We produced an R Shiny application which can capture, annotate, and transform unstructured EHR data into structured data - specifically, treatment lines, cycles, and response criteria with corresponding dates - ready for analysis of PFS. An annotation schema for capturing real-world data was also developed. Mapping of common phrases used by clinicians in real-world practice to response criteria resulted in a dictionary of these phrases.
The primary objective of the presentation is to share insights on democratizing powerful natural language processing tools like Linguamatics I2E together with open-source R and Shiny. The talk will focus on how we can leverage the I2E Python SDK to perform natural language processing and visualize text mining results with R and Shiny. We will present our R Shiny platform, called pharmine, and several use cases that we developed for mining biomedical data.
This workshop is introductory and open to everyone, assuming basic R/Data Science skills. Please note, the workshop is very hands-on, so expect to get your fingers dirty! The aim is an introduction to artificial neural networks (ANNs) in R. ANNs form the basic unit of deep learning and are immensely powerful in predictive modelling, but not without pitfalls. In this workshop, we will work on conceptually understanding what an ANN is, how we train an ANN and how predictions are subsequently made. We will also touch upon parameters, hyper-parameters and how to handle data, all in the context of model over-fitting. All of the aforementioned will be done using TensorFlow via Keras for R.
We will be presenting an overview of the interoperability between Python and R for the R user community at R/Pharma 2020. This workshop will highlight how statistical programmers can leverage the power of both R and Python in their daily processes. Participants will get hands on experience working with some of the best aspects of both R and Python, and how these two languages can work together within R Markdown.
In this workshop we will present how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions. We can achieve this for bulk RNA sequencing data with the tidybulk, tidyHeatmap and tidyverse packages. We will also touch on packages for tidy single-cell transcriptional analyses. These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data.
Recommended pre-requisites:
- Basic knowledge of RStudio
- Some familiarity with tidyverse syntax
- Background reading: Introduction to R for Biologists
In this workshop we will walk through an implementation of the R Validation Hub’s white paper, A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure (https://www.pharmar.org/white-paper/). The workshop will explore two core themes:
1. R Packages Risk Assessment
2. Testing
In part 1, we will use a small set of pre-selected R packages to see how the R Validation Hub’s Risk Assessment Application and the riskmetric R package can be used to create risk assessment reports for an R package.
In part 2, we will discuss how testing can be used to reduce the risk for those packages with high risk. In particular, we will discuss the testing philosophy with respect to software validation and demonstrate how the ‘testthat’ package can be used to perform the necessary steps to test traceability requirements.
Prior knowledge of the basic structure of R packages is required for the second part of this workshop.
A four-hour workshop that will take you on a tour of how to get from data to manuscript using R Markdown. You’ll learn:
- The basics of Markdown and knitr
- How to add tables for different outputs
- Workflows for working with data
- How to include and style graphics
Safety and efficacy data in clinical trials are mostly analyzed separately. However, the treatment of life-threatening diseases such as cancer in particular requires a good understanding of benefit and associated risks in order to make an informed therapy decision for an individual patient. Recently approved immunotherapeutic drugs in oncology are associated with potential side effects such as immune-related hypothyroidism, rash and colitis. There is some biological reasoning that the occurrence of immune-related adverse events and their management may compromise the drug response. On the other hand, it has been observed that patients responding to treatment might face a higher likelihood of adverse drug reactions. A multi-state model is able to explore these hypotheses and offers insights into potential associations while addressing some of the methodological challenges. For example, a time-dependent approach is necessary to accommodate the fact that safety and efficacy events can occur throughout the treatment. Moreover, longer treatment duration can simultaneously affect the likelihood of efficacy and safety events, thereby introducing immortal time bias. The multi-state model is able to disentangle this spurious correlation. We present an approach for analysis and exemplify the methodology with simulated data.
This is a 3-hour workshop on Stan (https://mc-stan.org). The overall goal of the workshop will be to make the best use of time to answer as many Stan-related questions as possible. The level of the workshop will be intermediate to advanced, but anyone is welcome to join. The workshop will be taught by Daniel Lee. Daniel is one of the original Stan developers (started in 2011). He’s been involved in the whole stack: language, CmdStan, RStan, PyStan, continuous integration, setting up the forums, StanCon, and more. He’s had a lot of experience with debugging computational issues, the crossover between statistical models and computation, understanding how all the pieces fit together, and knowing a lot of different ways to accomplish the same thing in the Stan language.
The format of this workshop won’t be a straight online lecture. I’m personally tired of Zoom meetings; I don’t think the intro course I teach works in this format. Instead, we’ll have a blend of an instructor-led example, a masterclass, and an AMA. Please come with questions or ask them as we go along. Here’s a rough plan (but we can deviate from this):
1. Brief introduction to Stan. Goal: understand what Stan is, what the inferences are (and why they’re different), and agree on terminology.
2. Walk through an example (survival or PK/PD model). Show differences between posterior distributions and point estimates. Maybe discuss quality of MCMC sampling.
3. One-on-one with a participant. Walk through their problem, attempt at a solution, walk through the different modeling choices we’re making, how to structure simulated data, etc.
4. One-on-one with another participant.
5. Questions / Wrap up.
Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that actually run. The package learns how your pipeline fits together, skips costly runtime for steps that are already up to date, runs the rest with optional implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the output matches the underlying code and data. In other words, the package saves time while increasing our ability to trust the conclusions of the research. targets surpasses the most burdensome permanent limitations of its predecessor, drake, to achieve greater efficiency and provide a safer, smoother, friendlier user experience. This hands-on workshop teaches targets using a realistic case study. Participants begin with the R implementation of a machine learning project, convert the workflow into a targets-powered pipeline, and efficiently maintain the output as the code and data change. R proficiency: intermediate and above required.
In recent years, R users' understanding of Shiny has greatly increased, but so have client expectations. While one of Shiny’s greatest strengths is that it allows producing web applications solely from R code, meeting clients’ more delicate expectations will often involve going beyond R code and working with HTML, CSS, and JavaScript. We recognize that R developers tend not to be familiar with the latter, as they generally do not have a significant background in web development; these technologies may therefore appear daunting at first.
This workshop aims to put attendees at ease with inviting those web technologies into their Shiny applications so they can exceed clients’ expectations. The workshop will comprise three parts.
Part 1 homes in on the development of a new template on top of Shiny with the htmltools package. Workshop attendees will have the opportunity to collaborate with the RinteRface team on the shinybulma project (https://github.com/RinteRface/shinybulma).
Part 2 delves into bi-directional communication in Shiny: how the R server communicates with the front end and vice versa, and how the input/output system works.
Part 3 ends the workshop by exposing the lesser-known functions/methods that are nevertheless likely to help you in your Shiny journey!
Prerequisites for the workshop:
- Be proficient with Shiny
- Basic knowledge of R6
- Be proficient with package development
- JavaScript/CSS skills may help but are not mandatory
After a brief introduction to mrgsolve (https://mrgsolve.github.io), we will discuss concepts and applications for using the package in R to simulate from pharmacokinetic (PK) and physiologically-based PK (PBPK) models, estimate parameters given a model and data, and visualize simulation results with a Shiny app. We will establish a basic framework for running optimization in R and work through hands-on examples using different optimizers, including local and global search algorithms. Building on this framework, we will also illustrate related workflows including global and local sensitivity analysis. Finally, we will develop and deploy a Shiny app using RStudio Connect, allowing non-modeling stakeholders to interact with the model and optimization results.
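The "estimate parameters given a model and data" step can be sketched in base R with a local optimizer. To stay self-contained, the sketch swaps in a closed-form one-compartment IV bolus model in place of an mrgsolve model; the dose, sampling times, "true" parameter values and noise level are all assumptions.

```r
set.seed(7)

dose  <- 100                         # mg, IV bolus (assumed)
times <- c(0.5, 1, 2, 4, 8, 12, 24)  # sampling times in hours (assumed)

# One-compartment model: concentration after an IV bolus.
# par[1] = log clearance (L/h), par[2] = log volume of distribution (L).
conc_model <- function(par, t) {
  cl <- exp(par[1])
  v  <- exp(par[2])
  dose / v * exp(-cl / v * t)
}

# Simulated observations from assumed "true" values CL = 2 L/h and V = 20 L,
# with multiplicative log-normal residual error.
truth <- c(log(2), log(20))
obs <- conc_model(truth, times) * exp(rnorm(length(times), sd = 0.05))

# Objective function: sum of squared residuals on the log scale.
objective <- function(par) {
  sum((log(obs) - log(conc_model(par, times)))^2)
}

# Local search with Nelder-Mead from deliberately poor starting values.
fit <- optim(c(log(1), log(10)), objective, method = "Nelder-Mead")
estimates <- setNames(exp(fit$par), c("CL", "V"))
```

Estimating on the log scale keeps both parameters positive without needing box constraints, and the same objective function can be handed to a global optimizer to compare results.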
Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems? Do you need a flexible framework for handling text data that allows you to engage in tasks from exploratory data analysis to supervised predictive modeling? This tutorial is geared toward an R user with intermediate familiarity with R, RStudio, the basics of regression and classification modeling, and tidyverse packages such as dplyr and ggplot2. This person is comfortable with the main functions from dplyr and ggplot2 and is now ready to learn how to analyze and model text using tidy data principles. This R user has some experience with statistical modeling (such as using lm() and glm()) for prediction and classification and wants to learn how to build models with text.
With recent technological advances and availability of new data sources, we are experiencing exciting changes to the human medical product regulatory landscape. While these new areas have created challenges, they also present opportunities. This presentation will review the various FDA initiatives in Real World Data/Real World Evidence, Complex Innovative Designs, digital health technology, and COVID-19 and highlight the areas for which statisticians play a key strategic role in efficiently addressing current and future medical product development and evaluation challenges. Statisticians need to be effective leaders and communicators in interdisciplinary teams.
Medical oversight during a clinical trial is an extensive and time-consuming process. To safeguard patient safety, medical monitors need to review and explore raw safety data interactively, using standard visualizations as well as specific analyses tailored to the disease and the clinical study. The creation of semi-automated reports in R could facilitate this operation. The reports include interactive visualizations (with the plotly package) and interactive descriptive statistics tables and listings (with the DT package) for safety review of the patients. Template reports (based on R Markdown) incorporating standard analyses are integrated within an R package. The reports are set up via YAML configuration files to allow non-R users to customize the report for their specific study. Such a report is created from datasets in the CDISC standard SDTM or ADaM format, and delivered in the form of linked, self-contained HTML pages. The creation of the report documentation (in the R package) and the validation of the input parameters in the config files are automated using the JSON Schema format. The medical oversight tool is integrated with functionalities to generate patient profiles and CSR-ready in-text tables, and enables comparison of results between multiple interim data batches delivered in the course of the clinical trial. The tool will be demonstrated on a publicly available dataset.
Mixed models for repeated measures (MMRMs) are often used as the primary analysis of continuous endpoints in longitudinal clinical trials (see e.g. Mallinckrodt et al., 2008). Essentially, an MMRM is a specific linear mixed effects model that includes (at least) an interaction of treatment arm and categorical visit variables as fixed effects. The covariance structure of the residuals can have different forms, and often an unstructured (i.e. saturated parametrization) covariance matrix is preferred. This structure can be represented by random effects in the mixed model. All of this has typically been implemented in proprietary software such as SAS, whose PROC MIXED routine is generally seen as a gold standard for mixed models. However, this does not allow the use of interactive web applications to explore the clinical study data in a flexible way. Furthermore, fitting such proprietary software into workflows such as automatic document generation is not convenient. Therefore, we wanted to implement MMRM in R. Several challenges had to be solved, such as finding the right R packages for this purpose. We finally settled on lme4 in combination with lmerTest, which could match results in SAS up to numerical precision. Convergence of estimates can be an issue, and multiple optimization algorithms are therefore tried in parallel to enhance robustness. Extracting the covariance matrix estimate from lme4 results was solved, as was finding model fit statistics that match SAS results. We use our own rtables package to produce tables and ggplot2 for plots. We developed a Shiny module in our internal framework for exploratory web applications. Further validation in the coming months will allow us to use the R implementation for regulatory purposes, with greater flexibility and efficiency than before.
In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention). Housed within Fred Hutch, SCHARP is an instrumental partner in the research and clinical trials surrounding HIV prevention and vaccine development. Part of SCHARP’s work involves analyzing experimental biomarkers and endpoints which change as the experimental question, analysis methods, antigens measured, and assays evolve. Maintaining a validated code base that is rigid in its output format, but flexible enough to cater to a variety of inputs with minimal custom coding, has proven to be important for reproducibility and scalability. SCHARP has developed several key steps in the creation, validation, and documentation of R packages that take advantage of R’s packaging functionality. First, the programming team works with leadership to define specifications and lay out a roadmap of the package at the functional level. Next, statistical programmers work together to develop the package, taking advantage of the rich R ecosystem of packages for development such as roxygen2, devtools, usethis, and testthat. Once the code has been developed, the package is validated to ensure it passes all specifications using a combination of testthat and rmarkdown. Finally, the package is made available for use across the team on live data. These procedures set up a framework for validating assay processing packages that furthers the ability of Fred Hutch to provide world-class support for our clinical trials.
metashiny is an R package that provides a point-and-click interface to quickly design, prototype, and deploy essential Shiny applications without having to write a single line of R code. The core idea behind metashiny is to parametrize Shiny modules, which are reusable units of Shiny logic with their own namespace. Instead of modifying a module to fit various analytical needs, metashiny strives to build a module template that encompasses a wide range of popular Shiny logic, then uses a “meta”-Shiny interface to collect user requirements and customize the Shiny modules using these inputs as parameters. The customized Shiny modules are embedded in the “meta”-Shiny for preview, may be downloaded as a self-contained, functioning Shiny directory and may be deployed to a Shiny Server with minimal configuration. metashiny may be very useful in the initial design phase of Shiny products. Finally, an important feature for non-R users is that it eliminates the need to learn Shiny code and the R environment, thus enabling analytical colleagues from all backgrounds to explore the fantastic power of Shiny.
Validation of the R statistical package has become a hot topic since 2015, when the FDA issued the Statistical Software Clarifying Statement, stating officially that no specific software is required for submissions, and that any tool can be used provided it is reliable and documented appropriately. It instantly attracted the attention of the pharmaceutical industry. A number of companies independently made individual attempts to fulfil validation requirements and bring R into controlled environments. In addition, the combined efforts of the biggest pharmaceutical companies resulted in the launch of the R Validation Hub project. While most of the initiatives seem to focus on documentation and package quality assessment, relying on the results of unit tests delivered “as is” by the authors of R packages, we at 2KMM CRO set different priorities, driven by the importance of doing exhaustive numerical validation first. Without that, there is a risk that all the efforts on documentation and quality assurance will pertain to routines whose results differ from those obtained with other trusted software in a way that cannot be adequately justified. While we do not dispute the importance of documentation and early unit testing, we believe that numerical validation, going far beyond running those tests, is mandatory to achieve a satisfactory level of reliability. We would like to share our findings in this area, including the choice of reference input data and results used during the validation, sources of discrepancies between R and other software, and the interpretation and acceptance of the results.
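The basic shape of a numerical validation check can be sketched in base R. This is an illustration under stated assumptions rather than 2KMM's procedure: the reference values are recomputed here from first principles, whereas in a real validation they would be exported from other trusted software, and the tolerance is an assumed value that would need to be justified and documented.

```r
# Example input: paired measurements from R's built-in 'sleep' data set.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# Result from the routine under validation.
res <- t.test(x, y, paired = TRUE)

# Independent reference computed from first principles; in practice these
# values would come from other trusted software.
d <- x - y
ref_t <- mean(d) / (sd(d) / sqrt(length(d)))
ref_p <- 2 * pt(-abs(ref_t), df = length(d) - 1)

# Compare within an explicit tolerance (assumed value).
tol <- 1e-10
checks <- c(
  statistic = isTRUE(all.equal(unname(res$statistic), ref_t, tolerance = tol)),
  p_value   = isTRUE(all.equal(res$p.value, ref_p, tolerance = tol))
)
```

Any `FALSE` entry in `checks` marks a discrepancy that would have to be explained, for example by different defaults or numerical algorithms, before the routine is accepted.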
The crisis of opioid abuse and overdose in the United States has involved unprecedented levels of opioid prescriptions and opioid-related mortality. Greater understanding of current trends in prescription opioid utilization may help prevent new cases of abuse, addiction, and overdose. The U.S. Food and Drug Administration (FDA, the Agency) is expanding its capacity for proactive pharmacovigilance of drug abuse, in addition to other drug safety signals. In post-market safety surveillance, pharmacy dispensing data provide valuable insights to the Agency for oversight of drug utilization. The drug dispensing data include the number of product dispensings aggregated over a time frame (e.g., months) by geographical location (e.g., states, core-based statistical areas). One promising approach to enhancing pharmacovigilance with these data is data enrichment: geographically referenced public data sources covering detailed demographic, socioeconomic, and healthcare service information can be overlaid on proprietary, nationally projected data for prescription drug dispensing. Our project, funded by the Center for Drug Evaluation and Research (CDER) Safety Research Interest Group (SRIG) program, seeks to develop a data analysis pipeline and software for generating real-world evidence (RWE) that will monitor changes in prescription opioid use and guide proactive pharmacovigilance of drug abuse. The software will provide tools to augment proprietary, nationally projected data for prescription drug dispensing with other geographically referenced, publicly available demographic, socioeconomic, or healthcare service data. The software will generate RWE, including user-interactive data visualization, spatio-temporal modeling, and machine learning for identifying factors potentially associated with drug utilization, misuse, and abuse.
In this talk, I will speak about my personal journey of learning R and transforming from a clinical study statistical programmer into a SAS/R bilingual, as well as my journey of leading the R initiative in Amgen’s Global Statistical Programming Department and the Amgen R meetup, working with IS, statisticians, quality, LMS, and external partners. I will conclude by talking about the areas of challenge and the direction of R for statistical programming in a regulated environment, and proposals for R in Pharma collaboration.
As stated in my 2018 R/Pharma presentation “Becoming Bilingual in SAS and R”, I believe in problem-solving using different data science tools. This talk is about my team’s efforts to use different data science tools (SAS, R, and Python) to harmonize data from 10+ clinical studies and build a robust and automated data mart that will eventually integrate biomarker data from clinical studies and real-world data (RWD). (1) The SAS data dictionary and ODS are used first, for two reasons. Firstly, ADaM datasets are in sas7bdat format. Secondly, the data dictionary and ODS are powerful tools for which R and Python have no well-established equivalent package. (2) R is used for its visualization power, for Shiny, and for RStudio’s reticulate tools for integrating Python into R projects. (3) Python is used for its fuzzywuzzy package and potentially the NLTK package. In this project we are particularly pleased and impressed by RStudio’s work on seamlessly integrating Python tools into R projects. This project showcases the use case of combining the three programming languages in the clinical data integration space. It also provides a proof of concept (POC) for integrating Kite internal data with external data and RWD. It is also forward-looking in the sense that it prepares us to deal with the wearable device data that innovative technology and precision medicine will bring into the oncology treatment scene.
Visual representations of data inform how machine learning practitioners think, understand, and decide. Before charts are ever used for outward communication about a ML system, they are used by the system designers and operators themselves as a tool to make better modeling choices. Practitioners use visualization, from very familiar statistical graphics to creative and less standard plots, at the points of most important human decisions when other ways to validate those decisions can be difficult. Visualization approaches are used to understand both the data that serves as input for machine learning and the models that practitioners create. In this talk, learn about the process of building a ML model in the real world, how and when practitioners use visualization to make more effective choices, and considerations for ML visualization tooling.
Supporting data-driven decisions in the planning of clinical trials during the current pandemic involves extensive integration of heterogeneous data sources, sophisticated predictive modelling, and custom visualization to communicate the predictions to decision makers. We used R to rapidly deliver end-to-end planning tools for GSK in this difficult time. We built a pipeline to integrate, clean, and, crucially, test a variety of internal and external datasets. These data then fed into a patient recruitment model and, finally, into a SQL-powered Shiny app for interactive visualizations. The creation of the planning tool required bringing together statisticians, data scientists, and clinical operations in an intense collaboration, powered by R.
Effective visual communication is a core competency for pharmacometricians, statisticians, and, more generally, any quantitative scientist. It is essential in every step of a quantitative workflow, from scoping to execution and communicating results and conclusions. With this competency, we can better understand data and influence decisions toward appropriate actions. Without it, we can fool ourselves and others and pave the way to wrong conclusions and actions. The goal of this talk is to convey this competency through three laws of effective visual communication for the quantitative scientist: have a clear purpose, show the data clearly, and make the message obvious.
The visR project for effective graphics in drug development: visR is an open collaborative effort to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a user-friendly, fit-for-purpose, open-source package to simplify the use of good graphical principles for effective visual communication of typical analyses of interventional and observational data encountered in clinical drug development.
A physiologically-based mathematical model was developed as a series of ordinary differential equations to describe compositional changes (in fat and fat-free mass, FM & FFM) due to metabolizable energy exchanges in babies from birth to 2 years in low-to-middle income countries [1]. The objective of this work was to identify potential biomarkers for future intervention studies, identify when to intervene to protect and/or rescue growth in individuals suffering from malnutrition, and identify which of these individuals would be more or less likely to respond to a nutritional intervention. A translation of this model (155 parameters and 26 compartments) using R and the open-source mrgsolve package [2] provided an efficient platform for multi-parameter optimization, as required during additional model development and for subsequent simulations. For comparison, a simulation with viral and bacterial infections (no interventions) that took 8.62 seconds in the R/mrgsolve implementation required 226 seconds in Matlab. Model translation to R also enabled simulations with a Shiny app, allowing users to simulate individual infant phenotypes and infection events and visualize growth and energy levels over time, relative to healthy (WHO) standards. The model currently also includes a relatively simple implementation of persistent antibiotic therapy, with the potential to include drug exposure-related effects, i.e., through a pharmacokinetic (PK) model, to describe the effects of antiviral or antibiotic therapy. The challenge to this development is the scarcity of available data describing this therapy in malnourished children that would be needed for model calibration. Further development of the model includes linking to other systems models, such as mother-fetus energy exchange or PBPK mother-fetus models, to enable simulations of growth beginning at gestation.
The pharmaceutical industry has witnessed a growing interest in open source languages such as R and Python as an alternative to SAS for many activities related to clinical research. Hop on board for a whistle-stop tour of our efforts within GSK Biostatistics to integrate R programming into the clinical reporting pipeline. Hear how our journey started, where we are now, and what challenges and opportunities lie ahead.
The development of laboratory developed tests (LDTs) and in vitro diagnostics (IVDs) requires the execution of studies to determine the analytical performance of the assay. Examples of analytical studies include limit of detection, intermediate precision, and stability studies. These studies often require similar analyses to be repeated multiple times on replicates or different sample types. The results of these analyses need to be stored in data structures that are easily accessible to the lead analyst as well as to additional team members responsible for validating the work. Nested data frames are a powerful and flexible data structure that is well suited to these requirements. This talk will show how storing all of the steps of an analysis pipeline in a nested data frame allows analysts to use the well-established functionality of the tidyverse family of packages for efficient analysis and summarization of the data. It will also discuss how nested data frames are well suited for reproducibility and traceability, which are vital to documenting analytical performance. Reproducibility is often achieved by writing R notebooks in an environment that maintains package version consistency (e.g., Docker, RStudio Server). Using nested data frames as the underlying data structure within these frameworks provides a transparent and modular method for storing the results of an existing analysis and providing easily accessible data for downstream analysis.
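As an illustration of the nested data frame idea described above, here is a minimal sketch, assuming the dplyr, tidyr, and purrr packages are installed. The simulated precision-study data, the column names, and the choice of summaries are all hypothetical and are not taken from the talk.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical precision-study data: 3 sample types x 2 operators x 5 replicates.
set.seed(42)
true_means <- c(low = 10, mid = 50, high = 100)
assay_data <- expand.grid(
  sample_type = names(true_means),
  operator    = c("A", "B"),
  replicate   = 1:5,
  stringsAsFactors = FALSE
)
assay_data$signal <- rnorm(
  nrow(assay_data),
  mean = unname(true_means[assay_data$sample_type]),
  sd   = 2
)

# One row per sample type; the raw data, the fitted model, and the
# summaries all live side by side in the same nested data frame.
nested <- assay_data %>%
  nest(data = c(operator, replicate, signal)) %>%
  mutate(
    fit         = map(data, ~ lm(signal ~ operator, data = .x)),
    mean_signal = map_dbl(data, ~ mean(.x$signal)),
    cv_pct      = map_dbl(data, ~ 100 * sd(.x$signal) / mean(.x$signal))
  )

# Downstream users can pull out just the summary columns ...
summary_table <- nested %>% select(sample_type, mean_signal, cv_pct)
print(summary_table)

# ... or go back to the raw data behind any row for traceability.
raw_low <- nested$data[[which(nested$sample_type == "low")]]
```

Because each intermediate result is a column, a reviewer can inspect the data, model object, and summary for a given sample type without rerunning the analysis.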
In recent years, R Shiny apps have gained considerable momentum and have been used to develop many useful dashboards and user interfaces (UI) that give non-programmers access to innovative tools. Because Shiny apps are easy to develop and complex examples are scarce, R developers often create a new Shiny app in a single app.R file that contains both the UI and server code. As a project grows and capabilities expand in the app, a common practice is to separate the code into two files, one for the server object and one for the UI object. While these approaches may suffice for simple applications, they can lead a developer or team of developers down a path to an application with many lines of code (e.g., 15,000+) in a single file that can be extremely difficult to debug, test, maintain, or expand. This approach can also lead to a mixture of UI/server-related code and complex computational code in the same files. In this talk, I will present the PREP (Packages fRom tEmPlates) package, which was created to help teams streamline development of R Shiny apps and R packages using an approach that follows software development best practices. The PREP package adds new project types to RStudio to help streamline new project creation and development. There are three PREP project type options: 1) a Shiny app as a package, 2) a Shiny app, or 3) an R package that is set up with the testthat unit testing framework and is intended to contain all the complex computational functionality. Both Shiny app options are organized using modules, with a consistent default theme, the ability to switch between color theme options, and example code for commonly implemented tasks. By developing the complex computations in the R package and the Shiny app as separate projects, teams can make better use of each person’s skill set and simplify testing, resulting in a more robust final product.
By developing the Shiny app with modules, teams can avoid extremely long single files and share customized controls across different pages, which also makes it much easier to use source control tools such as GitHub. In addition, the PREP package includes functions to add new tabs and modules to the Shiny app and to create new functions with testing set up in the computational package, avoiding the multiple steps of creating files for new functions and tests. PREP is designed to be used by new package/Shiny developers and is highly customizable for expert users, without adding a dependency to your final product.
Non-compartmental pharmacokinetic analysis (NCA) is used in the characterization of drug absorption, distribution, and elimination in the body. Software that implements NCA is available from commercial and non-commercial, open-source sources. openNCA is a desktop application with enterprise capabilities, developed in-house at Pfizer Inc, designed to provide a PK bioanalysis result repository as well as NCA computation routines. The system is built with modern technologies including JavaScript/TypeScript, Angular, Electron, Elasticsearch, ModeShape, Splunk, Docker, and a substantial R code base that implements system functions, configuration, analysis, reporting, and user-defined functionality. openNCA capabilities include:
- Repository/Library/Metadata stores
- Data Loading/Merging/Validation
- Integration with Clinical Trial operational data
- Integration with Patient Information Management System
- Data Access controls
- Data Transformation
- NCA Analysis
- RMarkdown and LaTeX Reporting
- Shiny Apps
- Quality Control
- Workflow, Data, Transformation and Analysis Lineage
- Navigation and Search
- Reporting Event management
- Publishing/Data Sharing
Design considerations for openNCA include reproducibility, security/integrity, extensibility, discoverability, and traceability. Extensibility is a cornerstone characteristic that is enabled through extensive use of R scripts and Shiny apps to configure the system functions. The openNCA computation engine R package (https://github.com/tensfeldt/openNCA) for NCA analyses enables some unique capabilities, forms one module of the system, and is open-sourced under the MIT license.
openNCA, both the R-driven application and the NCA computation R package, provides an example of an industrial application of R and represents the in-kind contribution from Pfizer Inc to the initial prototype project of the Pharmaceutical Open Source Software Consortium (POSSC, https://www.possc.org/) to promote industrial support for open-source software development and innovation for the Clinical Pharmacology and Pharmacometrics discipline.
The current paradigm for analyzing clinical trial data is cumbersome: it is an inefficient, slow, and expensive process. Several rounds of iteration between the main programmer and the validation programmer are usually needed to thoroughly explore the data. Furthermore, clinicians and statisticians often would like to explore the data themselves but lack a robust and flexible platform to carry out this data exploration. For instance, they may need to inspect an endpoint for patients with certain genetic markers, to analyze biomarker data, or to create table summaries. To meet these needs, we created tidyCDISC, an open-source Shiny application that can be used to generate custom tables, statistics, and figures. The application has three modules: a drag-and-drop table generator, a graphical population explorer, and a patient history explorer. We’ve taken a modularized approach to our package to ensure the application can be easily expanded to include further analyses and figures. By sharing our application as an open-source solution, we hope to help other scientists with similar problems as well as promote collaboration in the pharmaceutical industry.
In the pharmaceutical industry, a great deal of the data presented in the outputs we create is very similar. Most of these tables can be broken down into a few categories: counting for event-based variables or categories, shifting to describe changes in state, and descriptive statistics to summarize continuous variables. Many of the tables that go into a clinical submission, at least when considering safety outputs, are made up of a combination of these approaches. Consider a demographics table. When you look at the table, you can begin breaking the output down into smaller, redundant components. These components can be viewed as ‘layers’, and the table as a whole is constructed by stacking those layers. Tplyr uses this concept to provide an intuitive framework for building clinical safety summaries.
Next-generation sequencing (NGS), phage display technology, and high-throughput capacities enable biologists in drug discovery to characterize antibodies (Abs) based on their HCDR3 sequences and further group them into families before moving to the hit-to-lead stage of drug discovery and development. This enables diversification of the Ab portfolio and ensures backup options if an Ab candidate fails. However, there was no method or software available in-house to support Ab discovery with the capacity to apply biophysical rules to classify the sequences. The Shiny app “Group My Abs” was developed to apply biophysical properties for Ab characterization to the NGS data. Several multiple sequence alignment algorithms implemented in the app enable sequence comparability. A method was developed to evaluate differences between comparable sequences and subsequently classify sequences into families. The app provides custom-made, interactive data visualization, enables refined Ab classification in a mathematical manner, considerably increases efficiency, and ensures reproducibility. All of this decreases bias and enables informed decision making during the hit-to-lead stage in biologics drug discovery.
The Beatles rose to music fame in the 1960s and became a worldwide phenomenon. With millions of screaming fans and over 600 million records sold, they are often cited as one of the most influential rock bands in history. One reason for their fame was their ability to communicate in the middle of songs without using words and without missing a single beat. This led me to consider some of the best collaborations I have been a part of, which are those where the team is in complete alignment and information flows easily from one team member to another. As analysts, it is our job to enhance the ability of teams to communicate with the best tool at our disposal: graphics. Just like Paul and John, our graphics need to communicate without speaking to convey information and help teams make critical decisions about clinical trials. Novartis leverages the potential of R Shiny to develop interactive tools that engage users to explore their clinical trial data with ease. Although several programs have been impacted by this technology, the goal of reaching the entire drug development portfolio is still a work in progress. This talk will describe our experiences with R Shiny with some examples. Finally, it should be stressed that creating effective Shiny apps requires thought, as well as adherence to strong graphical principles. In this vein, we will provide and describe our Graphical Principles Cheat Sheet(TM), which covers many aspects and considerations one should follow when devising either static or dynamic graphics.
Julia is a modern programming language that provides the ease of use of R with the speed of C++. Julia has been in development for over 11 years. Research on Julia originated at MIT in 2009. Julia is powered by multiple dispatch, a generalization of both object-oriented programming and functional programming. Julia’s multiple dispatch makes it easy to write programs at a high level of abstraction while simultaneously getting high performance. This has led to Julia being used by over 10,000 companies and 1,500 universities worldwide. Pumas, developed in Julia, integrates mechanistic pharmacometric models with Scientific Machine Learning and neural networks. In a recent case study, we demonstrated a 175x speedup for a QSP workload. Pumas is designed for every type of analysis scientists perform throughout the drug development lifecycle in one seamless environment. Building on Julia’s parallel capabilities, Pumas uses distributed computing and GPUs and runs in the cloud through the JuliaHub platform. These workflows leverage Julia’s database, statistics, and visualization functionality in a single package.
Genetically modified organisms (GMOs) and cell lines are widely used models to estimate the efficacy of drugs and understand mechanisms of action in biopharmaceutical research. As part of characterising these models, DNA sequencing technology and bioinformatics analyses are used systematically to study their genomes. As a result, large volumes of data are generated and various algorithms are applied to analyse these data, which introduces the challenge of representing all findings in an informative and concise manner. Scientific visualisation can be used to facilitate the explanation of complex genomic editing events such as integration events, deletions, insertions, etc. However, current visualization tools tend to focus on numerical data, ignoring the need to visualise editing events on a larger yet biologically relevant scale. Thus, we have developed gmoviz, an R package designed to extend traditional bioinformatics workflows used for genomic characterization with powerful visualization capabilities based on the Circos plotting framework. The circular layout used in gmoviz’s plots enables users to succinctly display genome-wide information about complex genomic editing events along with contextual biological information to improve the interpretation of findings. The gmoviz package has been developed using the many features of the Bioconductor ecosystem in order to support several genomic file formats and to seamlessly generate publication-quality figures. Finally, a complex transgenic mouse model, which harbours human gene knock-in, gene knock-outs, segmental insertion, deletion, and concatemerisation events, has been used to illustrate the functionality of gmoviz.
The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports that describe the results of data analyses. There should be a clear and automatic path from data and code to the final report. R Markdown is ideal for this as it is a system for combining code and text into a single document. It is also an efficient, user-friendly tool for producing reports that do not need constant updating. RStudio is often used in the pharmaceutical and healthcare industries for analysis and data visualization, and R Markdown can also be leveraged for creating reports and datasets for submission to regulatory agencies. This paper presents an RStudio program that demonstrates how to use R Markdown to generate a statistical table showing adverse events (AE) by system organ class (or preferred term) and severity grade, along with text that explains the table. Collecting AE data and analyzing AEs is a common and critical part of clinical trials. A well-developed reporting system, such as one generated with R Markdown, provides a solid foundation and an efficient approach towards a better understanding of what the data represent.
Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that actually run. The targets package learns how your pipeline fits together, skips costly runtime for steps that are already up to date, runs the rest with optional implicit parallel computing, abstracts files as R objects, and shows tangible evidence that the output matches the underlying code and data. In other words, the package saves time while increasing our ability to trust the conclusions of the research. In addition, it surpasses the most burdensome permanent limitations of its predecessor, drake, to achieve greater efficiency and provide a safer, smoother, friendlier user experience. This talk debuts targets with an example COVID-19 clinical trial simulation study.
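To make the skip-what-is-up-to-date behaviour concrete, here is a minimal sketch of a targets pipeline, assuming the targets package is installed. The `simulate_trial()` helper, the target names, and the toy linear model are hypothetical stand-ins and are not the COVID-19 simulation study from the talk.

```r
library(targets)

# Work in a scratch directory so the sketch leaves the current project untouched.
pipeline_dir <- file.path(tempdir(), "targets-sketch")
dir.create(pipeline_dir, showWarnings = FALSE)
old_wd <- setwd(pipeline_dir)

# tar_script() writes the pipeline definition to _targets.R.
tar_script({
  library(targets)
  # Hypothetical helper: simulate a two-arm trial with a continuous outcome.
  simulate_trial <- function(n, effect) {
    arm <- rep(c("control", "treatment"), each = n / 2)
    data.frame(
      arm     = arm,
      outcome = rnorm(n, mean = ifelse(arm == "treatment", effect, 0))
    )
  }
  list(
    tar_target(n_patients, 200),
    tar_target(trial_data, simulate_trial(n_patients, effect = 0.3)),
    tar_target(model_fit, lm(outcome ~ arm, data = trial_data)),
    tar_target(effect_estimate, unname(coef(model_fit)[2]))
  )
}, ask = FALSE)

tar_make()                            # first run: builds every target
outdated_after_run <- tar_outdated()  # targets that would need to rerun now
tar_make()                            # second run: up-to-date targets are skipped
estimate <- tar_read(effect_estimate) # results are read back as R objects

setwd(old_wd)
```

Editing `simulate_trial()` or `n_patients` in `_targets.R` and calling `tar_make()` again would rerun only the targets downstream of the change.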
R and Bioconductor are important tools supporting scientific workflows across early Research and Development at Roche/Genentech. We have a broad R user community, which includes data scientists, software developers, and consumers of data products developed with R. The presentation will explain the guiding principles behind the creation and management of computational environments for Research. The first part will show how we provide shared R environments, which enable result reproducibility and provide access to custom compute resources for interactive data analysis workflows. The second part will demonstrate how we create corresponding environments for software development, including a brief overview of the tooling and infrastructure that streamline the development, testing, and deployment of R packages and Shiny applications.
Routinely-collected healthcare databases generated from insurance claims and electronic health records have tremendous potential to provide information on the real-world effectiveness and safety of medical products. However, unmeasured confounding stemming from non-randomized treatments and poorly measured comorbidities remains the greatest obstacle to utilizing these data sources for real-world evidence generation. To reduce unmeasured confounding, data-driven algorithms can be used to leverage the large volume of information in healthcare databases to identify proxy variables for confounders that are either unknown to the investigator or not directly measured in these data sources (proxy confounder adjustment). Evidence has shown that data-driven algorithms for proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent explosion in the development of data-driven methods for high-dimensional proxy confounder adjustment. In this talk, I will discuss recent advancements in data-driven methods for high-dimensional proxy confounder adjustment and their implementation within the R computing environment. I will discuss challenges in assessing the validity of alternative analytic choices to tailor analyses to the given study to improve validity and robustness when estimating treatment effects in healthcare databases.
The dramatic increase in the use of R in the computational, analytics, and data science areas has led to some innovative techniques in recent years for interactive analytics. This rate of change makes it challenging for IT organizations to keep up and to maintain their software stacks for scientists in regulated and non-regulated environments. Techniques and best practices for managing updates and use cases will be presented from an R admin’s perspective, using a combination of open-source and professional tools.
The use of open-source R is evolving in drug discovery, research, and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. The ability to produce tables, listings, and figures (TLFs) in customized rich text format (RTF) using R is crucial to enhancing the workflow of using Microsoft Word to assemble analysis results. We developed an R package, r2rtf, that standardizes the approach to generating highly customized TLFs in RTF format. The r2rtf package provides flexibility to customize table appearance for the table title, subtitle, column header, footnote, and data source. The table size, border type, color, and line width can be adjusted in each cell, as well as column width, row height, text format, font size, text color, alignment, etc. The control of the format can be row- or column-vectorized by leveraging vectorization in R. Furthermore, r2rtf provides pagination, section grouping, and concatenation of multiple tables for complicated table layouts. In this paper, we provide an overview of the r2rtf workflow with examples for both required and optional easy-to-use functions. Code examples are provided to create customized RTF tables and figures with highlighted features. The open-source r2rtf R package is available at https://github.com/Merck/r2rtf.
Identification of subgroups with increased or decreased treatment effect is a challenging topic with several traps and pitfalls. In this project, we would like to establish good practices for subgroup identification, by building a simulation platform that allows for assessment and comparison of different quantitative subgroup identification strategies. Based on that we would like to provide guidance on different technical approaches. In addition, we would like to provide guidance on a recommended workflow for subgroup identification efforts to ensure best practices are used.
Statistical graphics play an important role in exploratory data analysis, model checking, and diagnostics. The lineup protocol (Buja et al., 2009) enables statistical significance testing using visualizations, bridging the gap between exploratory and inferential statistics. We created an R Shiny app that helps the user generate these lineups by using preloaded examples or by uploading their own data. The user can then act as a human judge, selecting the plot they think shows the real data and seeing whether the choice is correct. A correct choice is taken as evidence that the real plot is significantly different from the “null” plots. The app also calculates the “see”-value based on the selections made by multiple independent users, which can be used to decide statistical significance. The app supports different types of analysis using continuous, binary, or time-to-event responses and continuous or categorical predictors.
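The lineup idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the app's implementation: the simulated data, the permutation-based null plots, the lineup size of 20, and the helper names `draw_lineup()` and `lineup_p_value()` are all hypothetical, and the binomial tail probability is only one simple way to combine the choices of several independent viewers.

```r
set.seed(2009)

# Hypothetical data with a real association between x and y.
n <- 50
x <- rnorm(n)
y <- 0.8 * x + rnorm(n)

# Build a lineup of m panels: one shows the real data, the rest show
# "null" data in which y is permuted to break any x-y association.
m <- 20
real_position <- sample(m, 1)
lineup <- lapply(seq_len(m), function(i) {
  data.frame(x = x, y = if (i == real_position) y else sample(y))
})

# Draw the panels in a 4 x 5 grid; the viewer picks the one that stands out.
draw_lineup <- function(panels) {
  op <- par(mfrow = c(4, 5), mar = c(1, 1, 2, 1))
  on.exit(par(op))
  for (i in seq_along(panels)) {
    plot(panels[[i]]$x, panels[[i]]$y, main = i, axes = FALSE, xlab = "", ylab = "")
    box()
  }
}
if (interactive()) draw_lineup(lineup)

# Under the null, each independent viewer picks the real panel with
# probability 1/m, so the number of correct picks is Binomial(n_viewers, 1/m).
# The p-value is the chance of seeing at least n_correct correct picks.
lineup_p_value <- function(n_correct, n_viewers, m) {
  pbinom(n_correct - 1, size = n_viewers, prob = 1 / m, lower.tail = FALSE)
}

p_single <- lineup_p_value(n_correct = 1, n_viewers = 1, m = m)
p_panel  <- lineup_p_value(n_correct = 3, n_viewers = 5, m = m)
```

With a single viewer and 20 panels, a correct pick corresponds to a p-value of 1/20, which is why lineups of size 20 are a common choice.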
Predictive modeling is a powerful tool which, amongst other things, can be applied to prioritising drug candidates. Limiting the search space needed for target exploration can reduce costs markedly, partly by eliminating lab time and expensive kits. Predictive modeling is, however, not without pitfalls. In this short talk, I will present a (fictive) data science case story, outlining one major challenge in predictive modeling and demonstrating how to address it.
Introduction to the X-Omics Platform (XOP), a digital biomarker research platform for bioinformaticians and other scientists at Merck KGaA. XOP is a validated system for storing, processing, and analyzing “omics” data, including RNASeq, DNASeq (whole-exome and whole-genome), digital pathology datasets, and eventually proteomics and other data types.
The tidyverse (tidyverse.org) is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. The packages primarily consist of tools for data ingest, manipulation, and visualization. In the last year or so, RStudio and others have been creating a set of packages focused on the modeling process. In this talk, we will introduce the tidyverse and illustrate these new tools, whose goals are to simplify the modeling process, encourage empirical validation and good methodology, and enable a wider variety of approaches.
The gt package is a table preparation package for R that makes the presentation of tabular data fairly easy and also has the power to customize tables should you need it. The package has been in continuous development at RStudio for over three years, so it’s now a good time to reflect on what’s been done, where we are, and how we can make this package and table publication even better in the future. I’ll provide some background and insight on the decisions made when starting the gt project. We’ll go over what gt is capable of doing today. It’s a very useful package at this point in time and I’d like to talk about some of the features that make it great. I’ll provide some examples of tables that can be made right now. We want to make even more investments in table generation tooling so, to this end, we are working on a new project that aims to make table rendering easier through software that focuses entirely on rendering (i.e., writing to HTML, LaTeX, and RTF) an intermediate representation (IR) of a table. With this in place, authors of table packages won’t have to focus on writing code that renders tables to different formats. Instead, the authors of Pharma-focused table packages could concentrate their efforts on making their APIs even better.
R package validation has been on all our minds since the pharmaceutical industry started moving away from SAS to R for its statistical analysis and regulatory submissions. Opting for open source programming requires us to revisit our way of validating code, both internally and in a cross-Pharma effort when it comes to CRAN. Roche will present its approach to R package validation and share some material for you to apply.
Like many other companies, Merck KGaA/EMD Serono has embarked on its journey to enable the use of R for regulatory submissions. Following the framework introduced by the R validation hub (Nicholls et al., 2020), we started to develop an algorithm to qualify a CRAN package as a Merck standard package. In a nutshell, if an R package passes the installation qualification and successfully executes available tests, the package will be made available to the user. Then, an automated risk assessment of R packages is performed based on the test coverage score (more is better) and the riskmetric score generated from the meta-information (smaller is better). If pre-defined thresholds are fulfilled, the package is qualified as a Merck standard package; otherwise an explicit (manual) risk assessment is needed. In this presentation, we introduce our pathway to a risk-based assessment of R packages at Merck. We provide relevant details on the statistical analysis which led to the definition of thresholds supporting a robust classification of CRAN packages as Merck standard packages. We want to inspire other companies and seek feedback from the community.
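The decision flow described in this abstract can be sketched as follows. This is only an illustration written in Python, not the presenters' implementation: the names `PackageCheck` and `classify`, the status labels, and both threshold values are hypothetical, since the abstract does not state them.

```python
from dataclasses import dataclass


@dataclass
class PackageCheck:
    """Inputs to the qualification decision for one CRAN package."""
    installs: bool      # installation qualification passed
    tests_pass: bool    # the package's available tests ran successfully
    coverage: float     # test coverage score in [0, 1]; more is better
    risk_score: float   # riskmetric-style score in [0, 1]; smaller is better


def classify(pkg, min_coverage=0.8, max_risk=0.25):
    """Return a status label; both thresholds are hypothetical placeholders."""
    # Gate 1: the package must install and run its own tests.
    if not (pkg.installs and pkg.tests_pass):
        return "not available"
    # Gate 2: automated risk assessment against pre-defined thresholds.
    if pkg.coverage >= min_coverage and pkg.risk_score <= max_risk:
        return "standard package"
    # Anything else falls back to an explicit, manual review.
    return "manual risk assessment"
```

Keeping the thresholds as parameters reflects the abstract's point that they are derived from a separate statistical analysis rather than fixed in the algorithm.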
In the safety analysis of clinical trials, the forest plot plays an important role. Currently, most forest plots are static, which makes them hard to read for the Data Monitoring Committee (DMC). In this project, we propose an R package, forestly, to produce interactive forest plots. There are five interactive features. First, users can apply filters to select the adverse event (AE) category of interest, for example, serious AEs, drug-related AEs, AEs with toxicity grade 3-5, etc. Second, the filtered AE summary tables can be downloaded as .rtf files. Third, users can drill down to subject-level details by clicking the downward triangle button. Fourth, labels can be revealed by hovering the mouse over a point. Fifth, search bars are embedded for users to quickly find an AE of interest.
In this short talk I will present a few packages that can be used inside a package testing framework to help increase the overall quality of a package. The main focus will be static R code analysis tools such as the well-known codetools or lintr, as well as less popular packages such as prefixer. For each of them, I am going to give a short introduction, present its configuration capabilities, and show how to use it within the testthat framework.
R has become a prominent data science tool, empowered by a fast-growing modern R ecosystem. At Novartis, Shiny and markdown have gained a lot of popularity for analyzing, visualizing and reporting clinical trial data. The traditional report analysis plan (RAP) process was designed to create static tables, figures and listings. In this talk, I will use ShinyRAP (a shiny app) to illustrate a novel framework/workflow for planning and executing both pre-specified and ad-hoc analyses, as well as building dynamic/interactive reports through R/Shiny/Markdown. The app features efficient and organized programming through meta-data and shiny modules, and dynamic display of results via multi-select, grouping and searching, etc. Although motivated by a clinical trial data context, this framework can also be used for other types of data.
Detailed exploration of large transcriptomics datasets, increasingly available at single-cell resolution, is a time-consuming task which often requires the complementary skill sets of data analysts and experimental scientists to complete analyses and interpretation in an efficient manner. The iSEE (Interactive SummarizedExperiment Explorer) R/Bioconductor software package (https://bioconductor.org/packages/iSEE/), built on the shiny R framework, provides a general-purpose graphical interface for exploring any rectangular dataset with additional sample and feature annotations, such as single-cell RNA-seq data. Users can create, configure, and interact with the iSEE interface, enabling quick iterations of data visualization. This facilitates generation of new scientific hypotheses and insights into biological phenomena, and empowers a wide range of researchers to explore their data in depth. iSEE also guarantees the reproducibility of the analysis, by reporting the code generating all the output elements as well as the layout and configuration of the user interface. The combination of interactivity and reproducibility makes iSEE an ideal candidate to bridge and complement the expertise of researchers, who are able to design flexible, accessible, and robust dashboards that can also be directly shared and deployed in collaborative contexts - connecting large data collections to broad audiences, thus further increasing the value of generated research data.
We ALL have a tendency to solve problems with solutions that may be far from optimal. How does this tendency shape our Scientific Software Architecture? What are the long-term consequences of that? What pushes us towards sub-optimal solutions? What prevents us from reaching the optimal ones? Are there better solutions that we are missing? How could we make sure we do not miss potentially superior solutions? How could those superior solutions help us achieve our mission in a more efficient way? I will try to answer those questions in the context of an exemplary scientific software architecture which evolves over time, and with the help of recently published outcomes of problem-solving experiments.
How do you roll out R to hundreds of colleagues, ensuring that the version you’re providing is tested, qualified and well managed? Is it possible to ensure that everyone is using the same version of R and packages? How do you account for differences between departments and groups in which R packages they need? How do you minimise effort so that you’re not perpetually in the process of qualifying and deploying R and packages? How do you balance the need to be flexible and allow colleagues to try new R packages and methods, versus having R under strict version and change control to ensure that their results are submission ready and reproducible in the long term? In this talk we’ll discuss how we’re trying to achieve this at Pfizer.
The CDISC-SEND data standard has created new opportunities for collaborative development of open-source software solutions to facilitate cross-study analyses of toxicology study data. A public-private partnership between BioCelerate and FDA/CDER was established in part to develop and publicize novel methods of extracting value from SEND datasets. As part of this work, in collaboration with PHUSE, an R package, sendigR, has been developed to enable end users to easily construct a relational database from any collection of SEND datasets and then query that database to perform cross-study analyses. The package includes an R Shiny application with a graphical user interface, allowing users who are not familiar with the R programming language to perform cross-study analysis. Experienced R programmers, on the other hand, will be able to integrate the package functions into their own custom scripts/packages and potentially contribute improvements to the functionality of sendigR.
In this talk, we will be discussing an architecturally and bioinformatically multi-layered integrative multiomic approach to the development of target hypotheses. Scientists work to help pharmaceutical companies advance towards the identification of potent therapeutics on a daily basis. In some scenarios, biological scientists can develop therapeutic tools without a specific target in mind. In this case, they would like to generate a list of potential targets for their tools, within a given set of parameters for the delivery. However, combing through all of the appropriate databases to find these targets that have the appropriate molecular biology characteristics, viable mouse models that recapitulate the human disease phenotypes, and pathologies in the tissues of interest, to generate this list is very difficult to perform manually. This work requires making recursive decisions from the present wealth of biological literature and its data at scale. Such decision-making is a herculean task that requires the simultaneous propagated joins of annotated entity catalogs (genes, knockout mice, diseases, structured vocabulary terms, etc.) and, orthogonally, recursive filtration of hierarchical associations between those entities and controlled biomedical vocabularies. To streamline and accelerate this process, we used public data repositories (Uniprot, National Center for Biotechnology Information, International Mouse Phenotyping Consortium, Online Mendelian Inheritance in Man), ontologies (Gene Ontology, Mammalian Phenotype Ontology, Human Phenotype Ontology), and their multi-species (mouse, human) entity annotations to populate and index a MySQL relational database and a Neo4j graph database with their descriptive and relational properties. 
We then built an API (application programming interface) via the plumber package for R to dynamically generate optimized SQL and Neo4j Cypher queries that interact with the MySQL database, via the RMariaDB package for R, and the Neo4j graph database, via the neo4r package for R, to fuse data across the ingested biomedical repository data and use the yielded results to generate parseable JSON objects. Finally, we built a user-friendly shiny app for constructing and submitting queries via the API, parsing the JSON API outputs, and providing interactive network visualizations of the queries via the VisNet package for R, in-depth explanations of how the results were generated, and links to external resources for further relevant scientific data. We delivered this app to fellow scientist collaborators via RStudio Connect, enabling these biologists to, within milliseconds, leverage high-dimensional, multi-species relationships to identify potential targets.
Tidymodels has begun to create tools for modeling event time data. This will include methods for fitting, resampling, and characterizing models with censored outcomes. This talk will describe our design goals, show some syntax for modeling, and describe subsequent additions to tidymodels.
Decision analysis balancing both data analytics and human gut feeling is critical in designing efficient routes to synthesize new, complex small molecules. This challenge is faced by any organization seeking to deliver modern pharmaceutical compounds to patients in a prompt manner. In this presentation, we highlight the incorporation of data science approaches using R to develop metrics that aid in the development process: current complexity, risk quantification, and process efficiency forecasting. Current complexity is a metric established from human insights that assesses a molecule’s complexity in the context of capability, tracking the ‘current’ complexity of a given molecule over time and enabling the quantitative assessment of a new route or process. Risk quantification utilizes a Bayesian framework to quantify risk from real data and operational patterns, at both the project and portfolio level, for assessing the delivery risk of early candidate nomination assets in areas such as FTE resource modeling. Process efficiency can be estimated with a predictive analytics framework capable of quantifying the probable efficiency of a proposed synthesis or benchmarking the outcome performance of the developed process, thereby minimizing the environmental impact of pharmaceutical production. These strategies have been effectively used to aid the decision-making processes for pharmaceutical R&D.
Drug repositioning is an area of growing interest in drug development that can accelerate the discovery of new treatment options to benefit patients worldwide. Briefly, drug repositioning refers to the systematic investigation of a novel disease indication for a drug molecule. Drug repositioning can be accelerated using various tools and technologies, including intelligent dashboards, data integration and human-in-the-loop machine learning. A typical drug repositioning investigation generates a large amount of data that often needs to be linked and interpreted using a visual grammar familiar to the various scientific groups leading drug repositioning investigations. We developed OneView, a shiny app that enables seamless integration, computing and visualization to accelerate drug repositioning investigations. As in many clinical and pre-clinical projects, the problem that OneView tries to solve is to connect biologists and clinicians with the data in a meaningful way. The core data behind the dashboard are from an analysis comparing transcriptomic signatures of drug molecules with hundreds of disease transcriptomic signatures, creating connections between a compound and diseases based on an inverse correlation between the transcriptomic signatures. To fully understand the significance of the relationships, OneView provides a dynamic dashboard enabling scientists to filter/search within the data, follow connections through multiple datasets, and view meaningful interactive visualizations. We have incorporated additional data from several internal knowledge repositories to find further evidence to substantiate potential links between a compound and a disease. From a technical aspect, the most challenging part has been visualizing the data in the best way. A lot of the interesting information is in the standard connections of different elements in the data - such as common genes in multiple mappings between compound and disease signatures. 
In many cases, network plots were too busy to display those connections meaningfully. Instead, UpSet plots were found to be the best way to visualize interactions between multiple sets. While several packages implement UpSet plots in R, none of them allowed for interactive visualizations. To allow interaction with the visualization and further drilling down into the data by selecting bars in the graph, we implemented our own version of UpSet plots using the JavaScript library D3.
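To make concrete what an UpSet plot displays, the sketch below computes the quantity drawn as bars: the number of elements belonging to each exact combination of sets. It is a minimal illustration in Python, unrelated to the OneView code base; the function name `upset_counts`, the mapping names, and the gene lists are all hypothetical.

```python
from collections import Counter


def upset_counts(sets):
    """Count elements by the exact combination of sets they belong to.

    `sets` maps a set name to a collection of elements. Each element is
    counted once, under the sorted tuple of all set names that contain it.
    """
    membership = {}
    for name, members in sets.items():
        for item in set(members):
            membership.setdefault(item, []).append(name)
    return Counter(tuple(sorted(names)) for names in membership.values())


# Hypothetical gene sets from three compound-disease mappings.
mappings = {
    "mapping_a": {"TP53", "EGFR", "MYC"},
    "mapping_b": {"EGFR", "MYC", "KRAS"},
    "mapping_c": {"MYC", "BRCA1"},
}
counts = upset_counts(mappings)
for combo, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(" & ".join(combo), n)
```

Selecting a bar in an interactive plot then amounts to looking up the elements behind one of these combinations, which is the drill-down behaviour the abstract describes.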
RNA-seq transcriptome analysis workflows often generate the essential information (data and results) distributed among a variety of different tabular files and formats, e.g. raw and normalized expression values, results of differential gene expression analysis, or functional enrichment analysis. The efficient interpretation of the results can be hampered due to this fragmentation, and the same can happen even when providing static analysis reports. We developed the GeneTonic package (https://bioconductor.org/packages/GeneTonic/), containing a Shiny application which provides an efficient and interactive solution to combine the results of RNA-seq analysis. GeneTonic assists users in the identification of relevant functional patterns, as well as their contextualization in the data and results at hand, with interactivity (to make the analysis simple and accessible) and reproducibility (via RMarkdown reports) to simplify the integration of all components and communication of results. With GeneTonic, researchers can generate a variety of visualizations, including bird’s eye perspective summaries (with interactive bipartite gene-geneset graphs or enrichment maps) as well as detailed information and visualizations of individual genes and gene-sets. These can be further inspected via drill-down actions that display additional content in specific elements of the user interface, streamlining analysis, interpretation, and knowledge extraction of transcriptome data for a broad spectrum of collaborating scientists. (https://doi.org/10.1101/2021.05.19.444862)
In this talk, we would like to provide updates on the four biopharmaceutical industry focused R Consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical industry, with complementary scopes. We would also like to call for volunteers for these working groups (they are open to everyone). R-based submission pilots to FDA: provide example R-submission materials to the public and identify potential gaps in R-based submissions - Presenter: Ning Leng (Roche). R tables for regulatory reporting: develop packages and white papers for generating tables in R to fulfill regulatory requirements - Presenter: Adrian Waddell (Roche). R certificates: R trainings and certification for the SAS->R transition - Presenter: Kate Ostbye (SCHARP). R adoption series: a series of webinars focusing on adoption of R - Presenter: Andy Nicholls (GSK).
Terms like “digitalization”, “machine learning (ML)” or “artificial intelligence (AI)” are more than just buzzwords these days. Databases are analyzed worldwide with modern algorithms and entire industries are making data-driven decisions at an ever faster pace. In Pharma, it is not enough to get the prediction (the what). The model must also explain how it came to the prediction (the why). ML models can only be debugged and audited when they can be interpreted, which then allows for fairness, robustness and trust. Presently, however, the amount, complexity, variety, and speed of clinical data runs the risk of leaving us knowing less about our compounds than regulatory bodies. While the capabilities of ML and AI have received much attention, their role in clinical development has now moved from the theoretical to the practical application stage. Using industrialized ML/AI tools, we can detect clinically relevant, highly complex safety/efficacy signals that are not identifiable via classical approaches that force hypotheses on the data. By deriving the best hypothesis given the data, ML is currently the best available methodology to create holistic mathematical models of complex (biological) systems using all available data and variables while complementing findings from classical approaches. We, the Biomarker & Data Insight Group at Bayer, have developed an MLAI pipeline in R. Our MLAI pipeline comprises four core modules (data preprocessing, modeling / hyperparameter tuning, higher order interaction analysis and reporting) using most of the available data of late phase trials covering standard endpoint types (time-to-event, class and continuous). Each core module has its own internal R package integrating several R packages (e.g. tidyverse, tidymodels, mlr3, iml, Rmarkdown, Shiny,…). The pipeline is an industrialized, mature and validated software product with continuous delivery and continuous deployment. 
Something special about this pipeline is that we have made the effort to open the “black box” using explainable AI. With these extra tools, we can better understand why a certain variable is relevant for the prediction, reveal the nature of its relationship (monotonic or non-monotonic) with the outcome, and make the ML results more understandable and meaningful for clinicians.
R and Python are the fundamental tools used by data scientists across industries including pharma and biotech. With a rich set of analytical packages in both language domains, analysts who are able to work with both possess a significantly larger selection of tools in their toolbox compared to single language analysts. To consolidate these camps, the reticulate package has played a fundamental and critical role in enabling the direct use of Python from the R console. Additionally, integration of Python capabilities into the RStudio IDE allows a single point of access to both languages and their integration. Once a Python module or class is imported, however, accessing methods and attributes from R requires the usage of the $ operator in a way that is not completely consistent with typical R code and creates challenges for integration of objects or models developed in both languages. The result can become a mixture of R-esque and Python-like code that can resemble two different language structures, despite the efforts to combine them. In order to provide analysts an environment in which Python modules and classes can be used as though they were R-native objects, SomaLogic developed the PyR package. This package consists of a set of Python classes that wrap Python objects and a set of S3 methods providing wrappers to those imported classes. A model object hierarchy defining the expected interfaces for the Python components provides an overall architecture enabling introduction of new Python capabilities in a way that appears to the user to be native R code.
(The) Operation (formerly known as) Warp Speed is a joint venture between pharma and government to bring COVID-19 vaccines to market at unprecedented speed. A key tenet of the program is to generate the data needed to establish correlates of vaccine protection – immune responses that predict the level of protective efficacy of the vaccines. Our team was tasked with designing an analysis plan and the code needed to analyze the data and produce results that answered these key questions. However, lacking full FDA approval of their products, some vaccine manufacturers were highly protective of their data. Thus, our team was faced with the challenge of building an analysis pipeline capable of analyzing data that we have never seen, on servers that we do not have access to, all under the extreme time pressure associated with COVID vaccine development. In this talk, I will describe the R-based set of tools that we used to achieve this goal and some lessons learned along the way.
Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new methods that combine ideas from machine learning and causal inference may improve analysis of these studies by taking advantage of the wealth of information measured in claims. In order to evaluate the performance of these methods as applied to claims-based studies, we use a combination of real data examples and plasmode simulation, implemented in the R package ‘plasmode’, which creates realistic simulated datasets based on a real cohort study. In this talk, I will give an overview of our progress so far and what is left to be done.
Motivated by the rapid rise in clinical data exploration, there is an increasing need to utilize interactive graphical displays using Shiny apps. To date, the development and deployment of study apps have required specialized knowledge and considerable effort. However, the similarity across domains and endpoints in clinical studies motivated us to build a comprehensive framework that scales shiny app creation across the portfolio. The harmonized Datapipeline framework democratizes shiny app creation. It enables non-technical associates to create and deploy professional shiny applications quickly. It also empowers shiny developers to build reusable shiny modules that may be easily shared in a plug-and-play manner, ultimately accelerating future application development.
In recent years, R users' understanding of Shiny has greatly increased, but so have client expectations. While one of Shiny’s greatest strengths is that it allows producing web applications solely from R code, meeting clients’ more delicate expectations will often involve going beyond R code and working with HTML, CSS, and JavaScript. We recognize that R developers tend not to be familiar with the latter, as they generally do not have a significant background in web development, so these technologies may appear daunting at first. In this talk, I’ll present my journey toward the creation of the RinteRface organization, powering many Shiny extensions like bs4Dash or shinyMobile, as well as the work in progress “Outstanding user interfaces with Shiny” book (https://divadnojnarg.github.io/outstanding-shiny-ui/), exposing some keys to designing amazing user experiences.
In recent years late stage Pharma has begun to transition from a consumer of open source, and a sporadic creator, to a heavily invested collaborator on open source tools like R packages. In this short talk, James will discuss our recent focus on open source collaboration in post-competitive tools, and some important lessons we’ve learned.
Since the first release of Shiny over eight years ago, the R community has progressively created amazing web-based applications with the package. In practically every R conference or user meetup, we see amazing examples of how Shiny is changing the landscape of data exploration and sophisticated production pipelines. Have you ever wondered just how these amazing applications came to be, and what tools you can use to carve your own Shiny journey? In this talk I will shine a spotlight on the latest books addressing best practices in application development, and how they opened tremendous possibilities in both my professional and open-source Shiny projects.
Failure to thoroughly review discrepancies and deviations in drug manufacturing is consistently one of the top citations in FDA inspectional observations. Learn how a leading biotechnology organization successfully replaced an inefficient, manual inspection process with a genealogy visualization and inspection solution to optimize drug manufacturing quality control. This session will cover implementation approaches and lessons learned in data mapping, technology selection, visualization development, and predictive model generation.
For data science teams, data preparation takes a substantial investment of time, data science expertise and subject matter proficiency. However, as the name implies, data preparation is typically viewed merely as a means to an end, encouraging creation of expensive but often single-use and fragile elements in data analysis workflows. Rather than seeing data preparation as an obstacle to be removed, we propose a framework that recognizes the time and expertise invested in data preparation and seeks to maximize the value that can be derived from it. Viewing analysis-ready data as a multi-purpose, modularly built product that should lend itself to collaborative development and maintenance, the framework of Data-as-a-Product (DaaP) aims to remove barriers to version tracking and collaborative data development and maintenance. Specifically, the framework, which is entirely implemented in R, enables joint code and data versioning based on git, standardizes metadata capture, tracks R packages used, and encourages best practices such as adherence to functional programming and use of data testing. Collectively, the patterns established by the DaaP framework can help data science teams transition from developing expensive, single-use “wrangled” datasets to building maintainable, version-controlled, and extendable data products that could serve as reliable components of their data analysis workflows.
In the final stage of a clinical study, a number of tables and figures are prepared, typically using SAS, for reporting the results of the study in a clinical study report. Before the clinical study report is finalized, a thorough interpretation of the results is needed, and results are discussed with stakeholders and management. To facilitate this, a PowerPoint presentation is prepared containing tables and figures with study results. A modest approach is to manually copy-paste the tables and figures from the clinical study report into the slides. However, due to the manual work, this approach is both time consuming and prone to errors. Rmarkdown offers the possibility to render PowerPoint presentations and can thus be used as an alternative approach to automatically generate slide decks for communication of clinical study results, and we have been working with this idea at Novo Nordisk. The idea is to use the data that underlie tables and figures prepared for the clinical study report and, with a minor programming effort, create outputs that are embedded into a PowerPoint presentation. We will discuss the possibilities of this automatic approach and the issues encountered when implementing it to prepare for communication of a clinical study.
In this talk, we will discuss an infrastructure-free R package exchange and distribution system. The components include pkglite for compact package representations, cleanslate for portable R environments, and pkglink for runtime dependency resolution. We will also discuss its potential applications in reproducible research and submissions.
Statistical programming of summary tables is a well-established task within the clinical world. In the last few years, the pharmaceutical industry has seen several new packages emerge to support these activities, including the Atorus package Tplyr. Much of the focus has been on creating static tables, be it in HTML, or more classically RTF. With the visualization power available within R and JavaScript, the natural next step is to enhance interactive capabilities. This presentation will demonstrate how Tplyr takes this next step. Using metadata captured by Tplyr, we will show how a user can hit rewind and drill into the source data that led to a summary data point.
REAP (R-Shiny Exploratory Analysis Platform) was developed by the Modeling and Simulation group within the Clinical Pharmacology department at Genentech, Inc., to support exploratory analyses of clinical data. REAP is a web-based, user-friendly tool providing standard methods and outputs for conducting typical analyses within a clinical pharmacology group. With REAP, a clinical pharmacologist or pharmacometrician can perform Exposure-Response, dose linearity, and concentration-corrected QT analyses, PKPD simulations, NONMEM data quality checks, and PK graphic analyses without writing code. Results can be used to enhance scientific understanding of the relationship between exposure, response, and the PK characteristics of the molecule. In this talk, I will demonstrate how REAP can be used to perform dose linearity and Exposure-Response analyses.
This invited talk will describe the current landscape of CDISC initiatives and collaborations. CDISC currently has a portfolio of innovative industry initiatives that include new standards as well as open-source software projects that are part of the CDISC Open-Source Alliance (COSA). This talk will highlight many of the innovative initiatives reshaping standards-based data flows including transparent and metadata rich JSON transfer files, biomedical concepts, conformance rules, study design, analysis results standards, and RWD.
If we could predict a patient’s future risk of developing illnesses such as depression or lung cancer in the next three years, then we could potentially intervene and improve the patient’s future health. The PatientLevelPrediction R package provides a standardized analytic framework for developing diagnostic and prognostic models using observational healthcare data (e.g., electronic healthcare data and insurance claims data). It utilizes the OMOP common data model, a standard data structure, to enable rapid but reliable model development. The package contains a library of binary classifiers and survival models (with R, Python, C++ and Java backends) for users to select but also enables the flexibility of writing custom supervised learning methods. In addition, the package contains a suite of recommended performance metrics and visualizations. In this talk we will demonstrate how to use the package to develop and internally validate data-driven models and then show how the standardized approach makes large-scale external validation possible. We will also illustrate the built-in shiny app that enables interactive visualization of multiple models.
A prespecified adaptive plan involves automating the analysis of interim clinical trial data and adjusting elements of the trial in response. In implementing these plans, we experience random highs and lows in the data, adjacent doses of a drug with drastically different results, and lots and lots of uncertainty. To facilitate training in adaptive trials, newcomers need to see data as it might accumulate within a trial and attempt to make design decisions based on that data. To this end we have created ANTICS, a free public R/Shiny-based tool that guides a user through a single adaptive trial. ANTICS has modules for dose escalation, dose finding, enrichment, and staged/seamless designs. Repeated plays of ANTICS introduce the idea of simulation and emphasize how the same rules can produce different results when faced with random data. A scoring system guides the decision making, emphasizing real world incentives such as getting a correct arm into phase 3 or penalizing players for running a phase 3 in a poor arm. We hope it is a valuable resource for anyone beginning an exploration of adaptive trials.
Tables no longer live only in flat PDFs and reports; they should be able to go from apps to PDFs and Word documents with ease. To have the flexibility to do this we need to separate the analysis from the formatting. Additionally, in the pharmaceutical industry our tables need to be able to change their format. Many journals have different requirements on how to present p-values and on other styling questions. So, we built a package devoted to the formatting of tables based on semi-structured analysis results datasets. This package is unique because it allows users to create tables without data, which helps build mock shells, a key part of table creation within pharma. Additionally, we wanted to be able to layer formats on top of each other, as most tables in the industry are built to an internal standard, with minor tweaks around the edges. The code to create the shells can then be reused for the final tables, which saves time and brings table formatting off the critical path. Through the utilization of analysis results datasets and layerable, data-independent formatting, we’ve been able to create a package to meet the specific needs of our business.
The presentation will introduce the transition project in which a whole department of 150+ SAS programmers moved completely from SAS to open-source programming. The department switched from SAS Studio to R Pro Server and from Windows servers to an AWS cloud computing environment, and its SAS programmers became R/Python programmers. The presentation will also discuss the challenges of the project, such as inexperience in open-source programming, a new analytic platform, and change management. It will introduce how the transition-support team, executive leadership and SAS programmers overcame the challenges together during the transition. It will also discuss the differences between SAS and open-source languages and programming, and it will show some examples of the conversion of SAS code to R/Python code. Finally, it will close with the benefits of the open-source programming culture and the lessons learned from the transition from SAS to open-source programming.
In recent years, the pharma industry has seen a true paradigm shift in its use of R. Up until recently, one had to choose between R and SAS. Today, most statisticians are trained in both languages. With this in mind, at AstraZeneca we built on the growing interest in R, at every stage of drug development and also company-wide. Since April 2021, we have launched several internal initiatives aimed at federating the community of R users within AstraZeneca. We started by stealing with pride a public initiative, TidyTuesdays, and making it our very own, calling it #azTidyTuesday. On a monthly basis, we promote a public dataset to the community of AstraZeneca R users. This is done by aligning the #azTidyTuesday editions with one AZ value or with an ongoing event (Pride Month, COP26). We also put in place monthly Lunch & Learns, interviews of R users and blog posts. And in early 2022, we organized the first AstraZeneca R Conference. While building this community, we tried many things. Some worked well from the beginning, some required improvements. But all of these initiatives bore fruit, as the number of R community members has seen a five-fold increase since launch. More than the numbers, the vibrancy of the community is what makes us proud.
The rOpenSci project is a non-profit initiative founded as a grassroots effort in 2011. We have evolved into a truly global community of researchers and data scientists who are R users and developers from a wide range of disciplines. rOpenSci advocates for a culture of open and reproducible research. We do this by creating technical infrastructure in the form of carefully vetted, staff- and community-contributed R software tools that lower barriers to working with scientific data sources on the web. We have developed a highly successful model for peer review of scientific software that provides transparent, constructive and collegial review of R packages. Our community is our best asset. We are building social infrastructure in the form of a welcoming and diverse community. rOpenSci.org hosts blog posts by authors and reviewers of onboarded packages to share both functionality and lessons learned; we promote these on social media to bring their work to a wider audience. Our discussion forum, community calls and annual hackathon-flavored unconference are designed to share best practices and to build a trust network for the often challenging discussions about doing research more reproducibly.
Drug safety data present many challenges with regard to curation, analysis, interpretation, and reporting. Safety endpoints have high variability and are multidimensional and interrelated, which points to a need to identify novel approaches to explore, analyze, and present these data in a meaningful and insightful way. Visual analytics presents an alternative to the traditional tabular outputs for exploring, assessing, and reporting safety data, and presents an opportunity to enhance and facilitate evaluation of drug safety and to help convey multiple pieces of information concisely and more effectively than tables. Graphical depictions of safety data can play a big role in facilitating communication of safety results with different stakeholders, including regulators, investigators, and data monitoring committees. Visual analytics blends data visualization, statistical, and data mining techniques to create visualization modalities that help users make sense of safety data, with emphasis on how to complement computation and visualization to perform effective and meaningful analyses. Importantly, it is critical to develop readily available tools for stakeholders to use for visual analytics of drug safety data. The tools must take into account considerations revolving around structured assessment driven by safety questions of interest and should offer an appropriate user interface. In this talk, we shall discuss one such tool being developed through a joint collaboration of the ASA, PHUSE, and FDA.
Metrum Research Group (MetrumRG) has developed a suite of open-source R packages for pharmacometric analyses that can be used independently, or seamlessly integrated into a larger R-based ecosystem. To showcase this ecosystem, we used the popular Quarto framework to create a publicly available website with 20+ articles demonstrating how our ecosystem can be used to streamline a wide range of tasks typically required in a population pharmacokinetic (PK) analysis. The site is accompanied by a public GitHub repository containing example code for each task covered in the articles. This talk will give an overview of the ecosystem and articles, with a focus on traceability and reproducibility. We will also touch on our methods for doing open-source development in a validated environment, to facilitate analyses suitable for regulatory submission.
Roche/Genentech, GSK, Atorus and J&J/Janssen have initiated a collaboration called pharmaverse to bring together a curated subset of open-source R packages to enable clinical reporting (from CRF to eSubmission). Where gaps are identified, new collaborative development teams can be formed across companies to build solutions fit for industry adoption. Any individual or organisation is able to join our community and contribute. Our ultimate aim is to reduce duplication of efforts and gain increased harmonization in the way we work across the industry, so that collectively we can bring medicines to patients faster. This talk is intended to introduce data scientists across the industry to this concept and the benefits of open source collaborations. We will share what is currently available under the pharmaverse and give short demos, along with stories and learnings we’ve had on this journey so far. Example R packages include admiral (a toolkit for ADaM generation), rtables & Tplyr (for TLG creation), teal (for Shiny apps), and many more.
On Nov 22nd, 2021, the R Consortium R Submissions Working Group successfully submitted an R-based test submission package through the FDA eCTD gateway. The submission package was received by FDA staff, who were able to reproduce the numerical results. This was an example submission package following eCTD specifications, which included a proprietary R package, R scripts for analysis, an R-based analysis data reviewer guide, and other required eCTD components. To our knowledge, this is the first publicly available R-based, or open-source-language-based, FDA submission to the eCTD gateway for CDER. We hope this submission package and our learnings can serve as a good reference for future R-based regulatory submissions.
So you’ve started writing custom JavaScript for your Shiny app… but where do you put all this code?! Organizing JS files to be sourced within one another can be really hard to navigate from within a Shiny application. In this talk I’ll cover what bundling for JS means, and how to do it in the context of Shiny. This talk should empower you to not only write code in JavaScript but bring modern tooling to your app that will allow for organization and cleaner code.
It is relatively simple to create a powerful visualization app using shiny, but what if you need to change your data wrangling process or wish to build a different output? How easy is it to provide this flexibility without having to rewrite the underlying code? This presentation will highlight shiny frameworks - apps that are adaptable and extensible. We shall illustrate how separation of data, application and presentation promotes flexible content and how interpreters can build app components at runtime.
Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and trust results in order to confidently extend them, even when the results are their own. We present the GREX framework, which facilitates this iterative process via the principled management of computational results. GREX combines robust storage, provenance tracking, automated annotation, discoverability, and reproducibility of computational results. We will discuss both the underlying conceptual work and the reference implementation of our framework.
Increasingly, biostatisticians in pharma companies would like to use R on a daily basis; the growing number of participants in R/Pharma conferences is one metric showing this trend. As R programs replace proprietary software in this regulated industry, the requirements for quality and reliability increase, e.g. validation of packages might be required. More importantly, programs used in multiple projects or over a longer time period with changing biostatisticians need to be maintainable, intuitive and reliable. Getting this done takes experience and time. In particular, it is challenging to do this as a side project next to other daily biostatistics work and without the right tools in place. Therefore, forming a dedicated team that works closely with the applied biostatisticians, methods experts and IT professionals can help. We share our Statistical Engineering team experience at Roche, covering agile ways of working, resourcing, product prioritization, and open source collaborations. We believe that taking the Research Software Engineering aspects seriously and adopting an Open Source mindset is key to using R in Pharma sustainably and reliably.
Sarepta deployed RStudio Team for modern data analytics. One major hurdle we faced was how to serve data to Connect/Workbench securely in the backend. Partnering with Atorus, Sarepta solved this challenge using Box cloud storage as a secure data backend through the Box API, JWT tokens, and the Box Python SDK. RStudio Team and Box are seamlessly integrated with Okta SSO and JWT token authentication, providing two-layer secure authentication that leverages Box frontend permission management. Inspired by the boxr package, we developed usebox for R users and leveraged reticulate to call the Box Python SDK. We illustrate usebox through a real Shiny app, Patient Listing Generator, powered by the open-source package subpat. We believe this real-world harmonized use of R, Python, Shiny, and Box offers insights to the community and organizations facing similar challenges.
Back in 2020, Atorus made the initial release of the R package Tplyr, which was built to simplify the creation of clinical summary tables. Now in 2022, new updates and enhancements have been added to Tplyr to give the user more, particularly in the area of Shiny. In this presentation we will discuss how Tplyr collects metadata that provides traceability to every result it derives. Furthermore, these metadata features are externalized, allowing users to extend Tplyr’s metadata or even build their own, which enables all of these features to extend beyond Tplyr itself. Ultimately, when paired with Shiny, users can utilize Tplyr to build click-through tables where a reviewer can click on a result and immediately view the subset of data that was used to derive it.
Over the past year, we’ve designed a process that is meant to mimic public package publishing as closely as possible, where packages are automatically assessed by a series of checks which may prompt manual revision should the automated processing discover any gaps. This process is automated where possible, and (internally) transparent, with an interface to our internal continuous integration tooling to make feedback as visible and directly actionable as possible. We’d like to share our approach, with special emphasis on the core design philosophies that we’ve set out to adhere to: building a process that stays close to open source tools and contributing back where we can (github.com/Genentech/rd2markdown, github.com/Genentech/covtracer). Beyond what we’ve developed internally, we’d like to set this work next to recent efforts by the R Validation Hub and offer a vision for a shared, public repository that can support transparent quality assessment. By doing this, we’d like to set the stage for further feedback and broader participation to ensure that the future of R package ecosystems is more immediately actionable and accessible across institutions and regulators.
As more and more companies move their compute environments into the cloud, the steps needed to ensure that their software suite and newfound infrastructure are FDA compliant change accordingly. In this talk, we will examine the requirements for an FDA validated architecture in the context of R. This is a potentially tricky process depending on what the user requirements are, and since R is open-source, extra care must be taken to ensure validation and qualification of the entire cloud R ecosystem. What do validation and qualification entail, and what are the pitfalls that stakeholders need to be mindful of? This presentation will demonstrate at a high level the steps required to be FDA compliant for a validated R GxP system in the cloud.
The job of a data scientist working on a clinical trial team in the pharmaceutical industry is to provide the most accurate analysis possible in order to enable valid insights from the data. Ensuring data quality is extremely hard work, and there are teams of people at clinical trial sites, vendor companies, and within the sponsor institution all working to identify and resolve data issues in order to help make datasets analysis-ready. Before performing an important analysis, a data scientist may want a way to reassure themselves about the quality of their data and identify any important issues that have slipped through the cracks. sdtmchecks is a simple, easy-to-use, open source R package to help identify analysis-impacting and actionable data quality issues in SDTM datasets. This talk will touch on this package’s crowd-sourced development history at Roche/Genentech as an accessible way for non-R coders to get initial, practical experience with R, its current use at the company within a Shiny app, as well as its future potential as an open-source tool publicly available for cross-industry collaboration.
I will talk about some of the challenges that are now arising in BioTech. There are larger, more informative but much more complex data sets available and being developed. While these hold great promise, they add complexity to an already fragile analytic environment. I will divide the challenges into three groups (software, data technologies, and analytics) and provide some examples of how we can increase sharing and reduce complexity. I will also argue that the funding of data resources and technologies, whether government or private, needs to adapt to the changing landscape, and provide some suggestions of how that could happen.
Recently there have been a lot of new developments for modeling in the tidyverse. This talk will show off tools for censored regression, an interface to clustering, and how to use the h2o.ai platform for optimizing/fitting models.
Addressing estimands in clinical studies involves handling of intercurrent events, and multiple imputation methods are often applied to handle missing data. At Novo Nordisk, more and more programming tasks are done in R, but multiple imputation is still conducted using SAS. We look into the possibilities and obstacles of conducting multiple imputation using R and SAS interchangeably. The aim is to make it possible for statisticians to choose between SAS and R based either on personal preference or on which tool is best suited for the specific task. More specifically, the approach has been to simulate data and to compare multiple imputation results from SAS (PROC MI and PROC MIANALYZE) to results from R (the mice package).
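Comparing multiple imputation results across tools ultimately means comparing pooled estimates. As a minimal, language-neutral sketch (written in Python, not the authors' SAS or R code), the pooling step that such comparisons rest on can be expressed with Rubin's rules. The function name `pool_rubin` and the input values are hypothetical and serve only as illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)   # average of the m point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical results from m = 3 imputed datasets (illustration only).
imputed_estimates = [1.0, 2.0, 3.0]
imputed_variances = [0.5, 0.5, 0.5]
pooled_estimate, total_variance = pool_rubin(imputed_estimates, imputed_variances)
print(pooled_estimate, total_variance)
```

Feeding the same per-imputation estimates from each tool into one pooling routine of this kind is one way to separate differences in the imputation step from differences in the pooling step.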
RStudio is one of the most commonly used integrated development environments (IDEs) for R. In this talk we present RStudio on Amazon SageMaker, a fully managed RStudio IDE on the AWS cloud. We also walk through a Health Care and Life Sciences use case for carrying out analysis and building a machine learning model, showcasing how RStudio on SageMaker can help streamline the data processing, model building and model deployment process for data scientists and machine learning engineers.
Nowadays everyone in the pharmaceutical industry is talking about R. A lot is being said about statistical programming (CDISC datasets, TFLs) with R and addressing validation issues. The most important players have embraced R in various areas of their activity and share their experience, develop game-changing tools, and propose solutions. But the market also consists of small CROs, like ours, fully based on R, harnessing it in all areas of our statistical and programming work, from the trial design, through creation of datasets, conducting planned and ad-hoc statistical analyses, and generation of the TFLs, to the validation of both the outcomes and the R software itself. In the last 5 years of daily work with R, we have gained solid hands-on experience. Now we would like to share our story: the problems we faced, the measures we tried and the solutions that have proven successful. We believe in the importance of sharing experience from diverse organizations, as small CROs have their own specificity, needs and possibilities in terms of budget and people. This raises awareness of differentiated challenges and needs, which is so important in working out effective, broad-based ways to make R a worthy competitor to the current industry standards.
Computationally intensive workflows exist in the design, analysis, and simulation of all phases of clinical studies. The runtime of such workflows can be significantly longer than a Shiny app can practicably handle. In such situations, binding the execution of computational tasks to active browser sessions can be detrimental to the user experience, making both the tasks and the web application less reliable. We present an R-based solution to accommodate long-running workflows with asynchronous execution of background tasks. By utilizing functions from the shiny, httr, plumber, and pins packages, one can construct scalable web applications capable of asynchronous processing that cover non-blocking task creation, execution, status query, and results retrieval. A Bayesian dose finding automation example will be used to illustrate how the proposed framework allows users to initiate long-running tasks remotely and review the results later without utilizing resources on their local devices.
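The four operations named above (task creation, execution, status query, and results retrieval) can be sketched in a few lines. This is a simplified Python illustration, not the presenters' R solution: an in-process thread pool stands in for the remote API and stored results, and `TaskStore` and `slow_square` are hypothetical names.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class TaskStore:
    """Hypothetical stand-in for a background task service."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks = {}

    def create(self, fn, *args):
        # Submitting returns immediately, so the caller is never blocked.
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = self._pool.submit(fn, *args)
        return task_id

    def status(self, task_id):
        future = self._tasks[task_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "finished"

    def result(self, task_id, timeout=None):
        # Blocks only when the caller explicitly asks for the result.
        return self._tasks[task_id].result(timeout=timeout)


def slow_square(x):
    time.sleep(0.1)  # placeholder for a long-running computation
    return x * x


store = TaskStore()
task_id = store.create(slow_square, 7)     # create: returns a task id at once
value = store.result(task_id, timeout=5)   # retrieve: wait for the result
final_status = store.status(task_id)       # query: state after completion
print(task_id, final_status, value)
```

In a web application the task identifier would be handed back to the browser session, so the user can close the page and query the status or fetch the result later.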
Continuous integration (CI) and continuous delivery (CD) are playing a pivotal role in ensuring that R projects in Pharma meet the highest quality standards. Particular focus is placed on ensuring that packages are fit for purpose on internal systems and also meet the various requirements for CRAN/Bioconductor. In this talk, we will discuss best practices that were adopted in building developer-friendly and efficient CI/CD pipelines and the impact that these pipelines have had in the open source Pharma community and at Roche/Genentech. Three case studies of packages and pipelines will be discussed, one on a beginner level and two on an advanced level. The first will cover CI/CD workflows for the admiral R package, presented from the perspective of a newcomer to CI/CD. The second and third will cover the NEST framework and the rbmi R package, with more advanced discussions presented by experienced CI/CD developers.
gtreg internally leverages gtsummary to streamline the production of regulatory tables in clinical research. There are three functions to assist with adverse event reporting: tbl_ae_count(), tbl_ae(), and tbl_ae_focus(). tbl_ae_count() tabulates all AEs observed, whereas both tbl_ae() and tbl_ae_focus() count a single AE per subject by maximum grade. Furthermore, tbl_reg_summary() produces standard data summary tables often used in regulatory submissions, and tbl_listing() enables formatted, grouped printing of raw AE listings. All functions are highly customizable to make your regulatory reporting a breeze!
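The difference between the two counting rules is easy to miss, so here is a small illustration. It is a plain Python sketch of the rules as described in the abstract; it does not call gtreg, and the records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical AE records: (subject, AE term, grade).
records = [
    ("S1", "Headache", 1),
    ("S1", "Headache", 3),
    ("S1", "Nausea", 2),
    ("S2", "Headache", 2),
    ("S2", "Headache", 2),
]


def count_all_events(rows):
    """Count every observed AE record by term and grade."""
    return Counter((term, grade) for _, term, grade in rows)


def count_max_grade_per_subject(rows):
    """Count each subject once per AE term, at that subject's maximum grade."""
    worst = {}
    for subject, term, grade in rows:
        key = (subject, term)
        worst[key] = max(grade, worst.get(key, grade))
    return Counter((term, grade) for (_, term), grade in worst.items())


all_events = count_all_events(records)
by_subject = count_max_grade_per_subject(records)
print(all_events)
print(by_subject)
```

With these records the event-level count sees five headache and nausea records, while the subject-level count keeps only three entries: subject S1's grade 1 headache is absorbed by the grade 3 headache, and subject S2's repeated grade 2 headache is counted once.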
When working with big data sources, such as medical claims data, the process of data review and quality control (QC) can be both complex and tedious. R and RMarkdown have become common tools for data analytics and report writing in the pharmaceutical space. RMarkdown’s ability to integrate the code used to query the data with the results and their visualization provides a more powerful interface in which the output can contain both the process and the results, unifying two procedures that are separate in other approaches to the QC process. We present our analytical QC pipeline process, which can be developed at the start of a new project and marries the process of deeply understanding the underlying data in the early stages with the development of an ongoing pipeline of reports and QC procedures that can be automated and run in the future when the data is refreshed. The reduced timeline of this process greatly increases the speed with which updated data can be ingested accurately and confidently, and results in key stakeholders having quick access to the most up-to-date information in downstream processes and decision making.
The Pharmaceutical industry is adopting new tools and technologies, putting pressure on individuals to learn many new skills in a short period of time. In order both to promote these new ways of working and to assist those adopting them, at Genentech we are building a new Coursera specialisation. In this talk I will share details of what this specialisation will cover and when you can expect it to be available. I will discuss the different aspects of data science we chose to focus on, and how we are going to promote the use of pharmaverse tools for clinical reporting.
Data sources and the volume of data available for driving discovery and informing decisions have substantially increased over time. This increase has resulted in an evolving data and regulatory landscape ripe for the expertise of statisticians and data scientists. Statisticians and data scientists must play a key role to ensure the appropriate use of data and soundness of conclusions reached from analyses of the data. In this talk, we will explore the landscape identifying challenges and opportunities and highlight our contributions and impact.
Clinical development requires quick access to live trial data to address safety questions and evaluate data quality. Historically, teams have resorted to Excel to manually populate patient profiles, despite its susceptibility to human error, inefficiency, and lack of reproducibility. Other solutions offer singular views that require repetitive programming and CDISC SDTM/ADaM dependencies. These options rarely provide quality information at a fast turnaround pace. To solve these issues, we developed ctpatprofile, a modular Shiny app framework for on-demand, user-generated patient profiles using live, raw EDC and central lab data. Building on Sarepta’s 2022 usebox talk, ctpatprofile features the Box Python SDK for data ingestion, an approachable 1-click UI, and exports to standard outputs for communication. We will compare trade-offs of using EDC raw data vs derived data and emphasize the use case of starting from raw data, a common scenario in Biotech or early-stage clinical trials. We will show technical innovations like flexible YAML configurations and parallelized R Markdown PDF rendering for enhanced user experience. We hope the information shared will help Pharmas/Biotechs to explore creating patient profiles from raw EDC data.
Data monitoring to ensure patient safety is an important process in clinical trials. An independent data monitoring committee (DMC) reviews safety data periodically to interpret findings and assess various safety signals. Sponsors typically provide static reports that include tables, listings, and figures (TLFs) to help with this assessment. However, these static reports may not offer quick access to information for specific data points, requiring DMC members to navigate multiple pages or request additional details from the sponsor. To support the DMC members’ review process, we developed an end-to-end process and tools to create interactive reports with drill-down options using open-source R packages. Using ADaM data as input, these tools can generate interactive reports that the DMC can use to get easy access to additional details on demand, resulting in efficient review and enhanced user experience.
Pharmacokinetic (PK) analysis data programming poses some unique challenges. For example, both dosing and concentration records are included, and nominal and actual relative time variables that reference the first dose, the previous dose, or the next upcoming dose are required. Modelling data may require a number of numeric covariates. CDISC ADaM submission standards for Non-compartmental Analysis (ADNCA) data have recently been published, and new standards for Population PK (ADPPK) are forthcoming. admiral is an open-source R package for creating CDISC ADaM data. It can be used effectively to create both types of PK analysis data. Additional tools from other Pharmaverse packages such as metacore and metatools can be used to simplify the workflow. I will discuss some of the challenges of Pharmacokinetic data programming and show some of the solutions developed in admiral and the Pharmaverse.
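To make the relative time variables concrete, the sketch below derives, for each concentration sample, the time since the first dose, the time since the previous dose, and the time until the next dose. It is a plain Python illustration under simplifying assumptions (times already expressed in hours, doses sorted, a sample taken exactly at a dose time is referenced to that dose); it does not use admiral, and all names and values are hypothetical.

```python
from bisect import bisect_right

# Hypothetical dosing and sampling times in hours since study start.
dose_times = [0.0, 24.0, 48.0]
sample_times = [1.0, 23.5, 25.0, 50.0]


def relative_times(sample_time, doses):
    """Return times relative to the first, previous, and next dose."""
    idx = bisect_right(doses, sample_time)  # number of doses at or before the sample
    if idx == 0:
        raise ValueError("sample taken before the first dose")
    return {
        "since_first_dose": sample_time - doses[0],
        "since_previous_dose": sample_time - doses[idx - 1],
        # No upcoming dose exists after the last one.
        "until_next_dose": doses[idx] - sample_time if idx < len(doses) else None,
    }


derived = [relative_times(t, dose_times) for t in sample_times]
for t, row in zip(sample_times, derived):
    print(t, row)
```

The same calculation applies to nominal and actual times alike; only the input time columns differ.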
Finding the right dose is a critical step in pharmaceutical drug development. There have been various statistical methodology developments for the design and analysis of clinical studies. In particular, MCP-Mod (Multiple Comparisons Procedure - Modelling) has gained increasing popularity and has received wide recognition from the regulatory agencies (e.g., EMA in 2014 and FDA in 2016) for the design and analysis of Phase 2 studies. Based on this methodology, the R package DoseFinding provides functions for both design and analysis of dose-finding experiments. Novartis conducted a survey on how to help associates easily access the tools to use MCP-Mod for the design of Phase 2b studies. As a result, an R Shiny app was developed that focuses on the design of Phase 2b studies based on MCP-Mod. The app handles continuous, binary and count endpoints, and allows computations based on multiple choices of study settings simultaneously.
The Pharmaceutical industry is moving towards open-source tools, with companies adopting R/Shiny to revolutionize their processes in clinical reporting, drug development, and translational research, among other areas. One of the initiatives of this global effort focuses on improving the practices of R-based clinical trial regulatory submission. The “R Submission Working Group” is exploring the use of R/Shiny applications for regulatory agencies to interactively review the data and analysis, complementing the static and extensive documentation. We are going to take a look at the teal-based Shiny application that is being used as a proof-of-concept in FDA pilots and present two formats for packaging the application that aim to allow reviewers to access the application seamlessly, without the overhead of setting up a complex local R environment and installing all the packages and system dependencies. The packaging formats include Podman, an open-source alternative to Docker, and webR, which allows R and Shiny to run exclusively in the browser without any system dependencies. webR is still under heavy development, and it is an exciting opportunity that is made possible by the innovation of an open-source global community.
In this talk, we would like to introduce openstatsware, an official working group of the American Statistical Association (ASA) Biopharmaceutical Section. The working group has a primary objective to engineer R packages that implement important statistical methods, with current focus on mixed models for repeated measures (MMRM) for both frequentist and Bayesian inference, and health technology assessment (HTA). The secondary objective is to develop and disseminate best practices for engineering high-quality open-source statistical software, for which we have given workshops in different countries/cities and made a YouTube video series for educational purposes. We would also like to introduce the R package mmrm for implementing MMRM, which was developed as a cross-company collaboration via openstatsware. A critical advantage of mmrm over existing implementations is that it is faster and converges more reliably. It also provides a comprehensive set of features: users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjusted degrees of freedom, extract the least squares means estimates using the emmeans package, and use tidymodels for easy model fitting. We aim to establish mmrm as a new standard for fitting MMRM.
In 2022, our team announced the first release of the tfrmt package, providing clinical programmers with the novel ability to create tables without data. With its metadata-driven engine applied to the emerging industry standard of Analysis Results Data (ARD), tfrmt stands out among an abundance of other R-based table-making utilities. As we entered 2023, we sought to extend the capabilities of tfrmt to a wider user base, beyond teams working solely in R. In this presentation, I will share our efforts to bridge the gap between tfrmt and its range of potential users. This includes the development of the companion Shiny app, tfrmtbuilder, which simplifies the learning process for newcomers and enables advanced customizations for non-programmers. Additionally, tfrmt’s new ability to generate and consume language-agnostic metadata is crucial for templating and reuse. Attendees will learn about our approach and discover how to incorporate tfrmt, along with its recent enhancements, into their unique workflows.
Historically, building a great SCE for clinical reporting involved selecting a vendor, integrating their product, and supporting a single proprietary language. The shift to reporting clinical trials using R has had a much broader impact than just swapping out a language, also catalysing the adoption of data science in statistical programming. For the teams building the latest generation of SCEs, this has led to a complex ecosystem of dynamic dependencies to enable reproducible research, a need to adapt to a much faster pace of development of the tools used, and an opportunity to bring different elements of evidence generation, like trial design and real-world evidence, together with statistical programming. During this talk, we’ll discuss this evolution, the underlying tensions we continue to tackle as we aspire to balance innovation against business continuity, and the critical role SCE architecture plays in facilitating a shift to data science.
torch is an R port of PyTorch, a scientific computing library that enables fast and easy creation and training of deep learning models. In this talk, you will learn about the latest features and developments in torch, such as luz, a higher level interface that simplifies your model training code, and vetiver, a new integration that allows you to deploy your torch models with just a few lines of code. You will also see how torch works well with other R packages and tools to enhance your data science workflow. Whether you are new to torch or already an experienced user, this talk will show you how torch can help you tackle your data science challenges and inspire you to build your own models.
Shiny is a package for turning analyses written in R into interactive web applications. This capability has obvious applications in pharma, as it lets R users build interactive apps for their collaborators to explore models or results, or to automate workflows. However, the interactivity of Shiny apps is a double-edged sword, as it introduces challenges to the traceability and reproducibility of your analysis. To use interactive applications in pharma responsibly, these challenges must be addressed. In this talk, I’ll look at some of the tools and techniques you can use in Shiny to deal with these challenges head-on.
The use of R in submissions to healthcare regulators presents challenges as the quality of packages must be ensured, and evidence of this quality must be readily available. The Regulatory R Package Repository Working Group aims to tackle these issues by identifying and prototyping a technical framework that supports a transparent, open, dynamic, and cross-industry approach to creating and maintaining a repository of R packages, complete with evidence of their quality and assessment criteria. This initiative aims to streamline in-house validation processes, facilitate burden-sharing of validation efforts, improve package quality through transparent, open peer review, and minimize risks associated with using public R packages for analyses submitted to regulatory bodies. Over the past few months, the group has recruited key representatives, including validation managers and regulatory authorities, conducted stakeholder interviews, and explored product concepts. This talk will present the key findings from the stakeholder interviews, discuss the product concepts generated thus far, and outline the future steps for the Regulatory R Package Repository Working Group.
Since 2021, the FDA and the NIH have increased citations and notifications for non-compliance with required results reporting on ClinicalTrials.gov. Many studies still do not submit results to ClinicalTrials.gov; some have not published results 3 years after study completion. Institutions are limited by a system that provides useful data but requires additional steps to plan future actions. Some institutions develop procedures that are not reproducible elsewhere. Private companies have developed software to monitor compliance, at a cost. To date, there is a lack of low-cost solutions to help institutions remain compliant. The Clinical Trials Dashboard aims to increase compliance by making the tracking of registration and results status simple, transparent, and reproducible. A user uploads the CSV files downloaded from the Protocol Registration and Results System, and the dashboard merges, aggregates, and flags studies for review. Downloads include a retrospective report as displayed in the dashboard; prospective results, including NIH-defined clinical trials, due in the next quarter; and a file with parsed contact information to plan compliance activity.
How does a risk-averse Pharma Biostatistics organization with 800+ people switch from using proprietary software to using R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the transition to using R for its clinical trial data analysis in 2020 and now uses R for our regulatory-reviewed outputs. The AccelerateR Team, an agile pod of R experts and data scientists, rotates through GSK Biostatistics study teams, sitting side by side with them to answer questions and mentor during this transition. We will share our experience from AccelerateR and how other organizations can use our learnings to scale R from pilots to full enterprise adoption and contribute to open-source industry R packages.
Streamlining clinical trial output workflows is a key challenge for clinical studies. Our project leverages Python to link the planned analysis stored in a Google Sheet LoPO (List of clinical study Planned Outputs) to the study scripts that generate SDTM/ADaM/TLG outputs. We also employ Snakemake, a powerful workflow management system, to automate the creation of an execution plan that can then orchestrate the generation of output files from the processed data, using parallel computing. To simplify the data collection process, we have created a Google Sheets add-on that allows statistical programming analysts to input clinical study information directly. Using the in-production LoPO tool as a case study, we will present learnings that have shaped our current best practices on writing, versioning, testing, and deploying a Python package as a critical component of our clinical reporting workflow.
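As a rough illustration of what an execution plan is, the sketch below orders a set of outputs so that each one runs only after the datasets it depends on, and groups independent tasks into batches that could run in parallel. It uses Python's standard `graphlib` module rather than Snakemake itself, and the output names and dependencies are invented for the example; none of them come from the LoPO tool.

```python
from graphlib import TopologicalSorter

# Hypothetical planned outputs mapped to the inputs they depend on.
planned_outputs = {
    "sdtm_dm": set(),
    "sdtm_ae": set(),
    "adam_adsl": {"sdtm_dm"},
    "adam_adae": {"sdtm_ae", "adam_adsl"},
    "tlg_ae_summary": {"adam_adae"},
    "tlg_demographics": {"adam_adsl"},
}


def execution_batches(dependencies):
    """Group tasks into ordered batches; tasks within a batch are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(planned_outputs)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

A workflow manager adds much more on top of this ordering (file timestamps, retries, cluster execution), but the dependency graph is the part that a list of planned outputs can supply.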
Emerging diseases like COVID-19 pose dual threats to public health and the economy. Understanding protein-protein interactions (PPIs) between viral and host proteins is crucial for antiviral therapies and studying pathogen replication. However, experimental techniques have limitations and machine learning models primarily focus on sequence-derived features, neglecting semantic information and necessitating effective encoding schemes. To address these challenges, we present DeProViR, a deep-learning framework for predicting virus-human interactions using amino acid sequences. DeProViR incorporates a Siamese-like neural network that combines convolutional and bidirectional LSTM networks to capture contextual information. Using GloVe embeddings, DeProViR seamlessly integrates semantic associations, enhancing PPI prediction. This innovative framework overcomes limitations in feature engineering and encoding scheme dependence. DeProViR provides an efficient solution for predicting host-virus interactions, facilitating therapy development, and advancing our understanding of diseases.
Product quality plays a vital role in the success of Biotech/Pharmaceutical organizations, and the accurate classification of quality risks is crucial to ensure the delivery of high-quality products. However, the current practice of assigning risk levels (Critical, Major, and Minor) to self-reported quality issues (QIs) suffers from subjectivity and noise, leading to unreliable risk assessments. To address this limitation, this study aims to develop a web-based application that leverages Natural Language Processing (NLP) algorithms to infer the risk level based on the description of the issue (free text data). In this work, we propose a novel data-driven framework for classifying risk levels, which integrates state-of-the-art deep neural network (DNN) models with ensemble learning concepts. By utilizing the power of NLP techniques, our framework enables the automatic discovery and analysis of quality risks. Through extensive numeric experimentation, we demonstrate the effectiveness of our approach with proper performance metrics. The research findings presented in this work shed light on the potential of NLP in uncovering quality risks and offer valuable insights to practitioners in the pharmaceutical industry. Also, this study contributes to the growing body of knowledge in the field of risk management and highlights the importance of utilizing NLP algorithms for quality assurance in the biotech/pharmaceutical domain. Regarding our technology stack, we utilize Python, Streamlit, and Posit Connect for the development and deployment of the model.
Interactive web graphics are a popular and convenient medium for conveying information. However, web graphics are rarely used during the initial exploratory phase of a data analysis, largely due to the lack of tools for seamless iteration between data manipulation, modeling, and visualization. As we’ve known for several decades, interactive graphics can augment exploratory analysis, but are only practical when we can iterate quickly. This talk demonstrates how to use the R packages plotly and dashR to rapidly produce interactive web graphics and applications that augment data exploration in addition to being easily distributed.
Programming is ubiquitous in applied biostatistics, and most statisticians know a programming language such as R, yet software engineering is still neglected as a skill and undervalued as a profession in pharmaceutical statistics. Why is this a problem? Importantly, we run the risk of wrong decisions when relying on code that we wrote ourselves without any code review by other statisticians. When handing over undocumented code to successors or other teams, we cannot be sure that they can even use it, let alone maintain it in the future. Also, whether they can reproduce results we produced earlier is a matter of luck. If we later need to add features to our code, and don’t have sufficient tests in place, we will undoubtedly introduce bugs and alter the program behavior without knowing it. Finally, if we need to implement new statistical methods for analyses submitted to regulators, we need to have appropriate software validation pipelines in place, which will demand well-developed and tested code. What can we do about it? First and foremost, we must become aware of the problem. Second, we need to take software engineering seriously, starting from education in basic software engineering skills across schools, universities, and during working life. Establishing dedicated software engineering teams within academic institutions and companies can be a key factor for the establishment of good software engineering practices and can catalyze improvements across research projects. Providing attractive career paths is important for the retention of talent. Finally, collaboration between software developers from different organizations is key to harnessing open-source software efficiently and optimally, while building trusted solutions. We illustrate the potential with examples of successful projects.
Comparing Analysis Method Implementations in Software (CAMIS): An open source repository to document differences in statistical methodology across software
Statisticians and programmers using multiple software systems (e.g., SAS, R, Python) often encounter differences in analysis results, requiring further exploration and justification. Investigating these discrepancies can be time-consuming, especially when documentation doesn’t fully explain the software’s approach. Reasons for discrepancies may include differences in statistical methods, options, convergence algorithms, and rounding methods. Usually, neither software is incorrect, but they operate differently. As statisticians increasingly use multiple software packages, identifying reasons for differences becomes crucial. Comparing Analysis Method Implementations in Software (CAMIS) is a collaboration between PHUSE, the R Validation Hub, PSI AIMS, and the R Consortium. The project investigates differences and similarities between SAS and R, storing code, case studies, results, differences, and findings in an open-source GitHub repository. This talk will discuss the project’s future roadmap and how you can contribute. By encouraging open-source collaboration, the project aims to become the go-to repository for statisticians and programmers to reference.
Thousands of outputs (tables, graphs, and listings) may need to be generated each year for filings, external publications, internal readouts, and other activities in a pharmaceutical company. Although most of these outputs could be produced from existing code with trial-specific adjustments, this process is still labor-intensive and requires good data and coding knowledge. Therefore, in this proof-of-concept project, we explored the potential of large language model (LLM)-based frameworks to develop R code that produces the outputs from ADaM datasets. GPT-4 Code Interpreter with uploaded supporting files (template code, variable dictionary, and function manuals) demonstrated good potential for completing the following tasks from a user’s natural language requests: 1) selecting the fit-for-purpose template code; 2) searching the variable dictionary and proposing variables to use; 3) modifying the template code to filter patients and update the output contents. This shows promising prospects for LLMs as an assistant for future output generation, which will significantly reduce the labor required and lower the barrier of data and coding knowledge.
A Pivotal Year of R Pilots for Shiny Application Submissions to the FDA / Celebrating a Milestone: R Tables for Regulatory Submission Working Group’s eBook on Clinical Data Table Generation
Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry sponsors have widely adopted Shiny as part of their analytics and reporting toolset, a relatively unexplored frontier has been the inclusion of a Shiny application inside a clinical submission package to regulatory agencies such as the FDA. The R Consortium R Submissions Working Group has continued the positive momentum of previous submission pilots to achieve substantial progress in this domain. In this talk, we will share the development journey of the working group’s successful Pilot 2 submission of a Shiny application to the FDA, along with the progress on the use of novel technologies such as Linux containers and WebAssembly to bundle a Shiny application into a self-contained package, facilitating a smoother process of both transferring and executing the application.
The R Consortium’s R Tables for Regulatory Submissions (RTRS) Working Group has released the first edition of [Tables in Clinical Trials with R](https://rconsortium.github.io/rtrs-wg/) as a free and openly accessible ebook. The book contributes to the development of a theory of displaying tabular information by identifying a small number of table archetypes that may be used to generate the most common tables employed in clinical submissions. Chapters in the book demonstrate how these tables may be rendered in different R packages, including flextable, gt, rtables (with and without tern), tables, tfrmt, and tidytlg. All tables are generated from CDISC-compliant data. Comparing the code showcases the robustness of R for aggregating and displaying tabular information and illuminates the flexibility and design tradeoffs of the various R packages.
The talk will discuss the motivation for the book, present the idea of table archetypes, show some representative tables, and make the case for R as a superb language for analyzing clinical trial data. The RTRS working group expects Tables in Clinical Trials with R to become a primary resource for clinical programming teams.
Recent advances in the Shiny ecosystem boost the scale and scope of serious enterprise-wide web applications. More specifically, it is entirely possible to utilize key features of Shiny Server Professional and additional R packages such as shinyjs, DT, and batchtools to build Shiny applications that support session management, high-performance computing, and reproducibility in a friendly and logical interface. Additionally, the shinytest package enables a robust workflow for developing applications efficiently, as well as being an important component for automating a validation testing framework. In this talk, I will share examples of key features and lessons learned in creating a technically powerful Shiny application that integrates these pieces.
Real-world data are increasingly used to complement evidence from clinical trials. However, missing data are a major statistical challenge, e.g., when adjusting for confounding, if the underlying missingness mechanisms are unknown. This talk introduces the smdi R package, which aims to streamline routine missing data investigations of partially observed confounders based on a suite of three group diagnostics. The structural missingness assumptions were recently validated in a simulation study and are characterized through M-graphs of realistic relationships between a partially observed confounder and its association with an exposure, outcome, and other fully observed covariates. Aiming to differentiate between missingness mechanisms, the package implements three group diagnostics to 1) compare distributions between patients with and without the partially observed confounder, 2) assess the ability to predict missingness based on observed covariates, and 3) examine whether missingness is associated with the outcome under study. As a result, combining all group diagnostics can give guidance on how the underlying missingness for partially observed confounders could be characterized and approached in downstream analyses.
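To make the first group diagnostic more concrete, here is a minimal sketch of one way to compare a fully observed covariate between patients with and without an observed confounder value, using an absolute standardized mean difference. This is an illustration in plain Python, not the smdi API; the patient records, the variable names, and the choice of a pooled standard deviation are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical records: age is fully observed, the biomarker only partially.
patients = [
    {"age": 50, "biomarker": 1.1},
    {"age": 60, "biomarker": 0.8},
    {"age": 70, "biomarker": 1.4},
    {"age": 65, "biomarker": None},
    {"age": 75, "biomarker": None},
    {"age": 85, "biomarker": None},
]

# Split the observed covariate by whether the confounder was measured.
age_observed = [p["age"] for p in patients if p["biomarker"] is not None]
age_missing = [p["age"] for p in patients if p["biomarker"] is None]

smd = standardized_mean_difference(age_observed, age_missing)
print(f"standardized mean difference in age: {smd:.2f}")
```

A large difference suggests that patients with missing values differ systematically on observed characteristics, which is the kind of signal the diagnostics are meant to surface.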
Validating open-source R packages has been a hot topic over the past few years. This talk focuses on MetrumRG’s updated process and tooling for validating first-party R packages, that is, R packages that we develop in-house, almost all of which are also open-source. This has been the fruit of interdisciplinary collaboration within Metrum, and has also benefited tremendously from the work of folks in the R Validation Hub and other cross-industry working groups. The result is a streamlined process for our developers, a better experience for our scientific users, and improved validation documentation for our Quality team.
At Idorsia, we have developed a large R Shiny application supporting a metadata-driven approach to shell and output creation. We will demonstrate some key features of the app and discuss how this approach led to huge efficiency gains when delivering results from a Phase III trial earlier this year. Additionally, we will discuss the value of the metadata collected within the app, and highlight the reuse of this in driving further Shiny apps supporting data review and exploration around the specified outputs. Our hope is to show the versatility of Shiny for this work, particularly when coupled with well-designed metadata management.
The interaction between the Major Histocompatibility Complex type I (MHC-I), a peptide, and the T-cell receptor (TCR) (MHCIpTCR) is a key determinant of immune response elicitation and therefore of paramount importance in infectious and autoimmune diseases and cancer. Current state-of-the-art models developed by our group can model MHCIp interactions with great precision. Using data from VDJdb and IEDB, we created an ensemble of convolutional neural networks, which to the best of our knowledge is the world’s first sequence-based model capable of capturing the entire MHCIpTCR system. Due to limited data, however, we can currently only model the interaction between the CDR3 region of the TCR’s beta chain with HLA-A*0201 and 3 peptides. As the model framework is easily extendable, we will increase the breadth and thus improve the model as soon as more data become available. Using the current model and an independent test set, we obtained AUC = 0.747. TensorFlow is an open-source software library for neural network models made by Google. Recently, RStudio released Keras, an API for accessing TensorFlow in R. Keras enables fast experimentation: being able to go from idea to result with the least possible delay is key to doing good research.
Creating datasets and tables, listings and graphs (TLGs) for analyzing clinical trials data with R, such that in the final stage the code, datasets and TLGs can be submitted to the health authorities, is a multifaceted problem. We have been working on a number of R packages to create an R-based analysis environment that can be used for exploratory and regulatory analysis of clinical trials data. These projects include table creation (open source: http://github.com/Roche/rtables); random data generation; querying CDISC standards; TLG creation; a pipeline for specifying and producing data and TLG deliverables (with logs, automation, titles and footnotes, etc.); and a modular Shiny-based exploratory framework that provides dynamic encodings, variable-based filtering, and R-code generation for the displayed outputs. The maturity of these projects varies, but the workflow and analysis environment as a whole can be demonstrated nicely. In this talk, we would like to generate interest in collaboration in order to make these projects more general, with the final goal of open-sourcing some of them.
Despite the explosive growth and adoption of R globally, concerns over how to qualify and administer R continue to echo in discussions about its use in regulated environments. In this talk, I’ll discuss how to bridge the conceptual tenets of reproducibility, traceability, and accuracy to robust, yet agile, implementations such that an organization can maintain validated systems without imposing the shackles found in traditional validated environments. Furthermore, I will discuss a number of design elements specific to the open-source R ecosystem, such as using packages from CRAN and GitHub, and cover how to embrace these while responsibly managing risk in enterprise environments.
When it comes to analytics of data collected in medical research, today’s culture is compartmentalized – not only across institutions, but even within institutions. Such a culture stagnates analytical development and limits the ability to fully master the data thereby reducing the effectiveness in communicating clinical information to stakeholders. A unified culture can exist – statisticians, programmers, and clinicians need to speak to each other; regulatory agencies, pharmaceutical companies, and academics need to speak to each other. Once everyone comes together to discuss how the medical research data should be collected, interrogated and presented; analytics can be developed and shared within and across institutions. From an analytics perspective, nothing new needs to be developed, solutions are already available – many of them free. We just need to come together and start talking. So let’s talk.
R is a very powerful tool for performing statistical programming, but has had a lower uptake in the life sciences when compared to SAS. As a result, many of the packages created for R are not focused on the type of tasks Statistical Programmers do. In this talk, I introduce several packages and in-house training we are developing to aid regulatory outputs. The R packages include rcompare, a package to allow comparison of datasets, analogous to proc compare in SAS, and r4spa, which allows outputs to be in the correct format for production. Each package solves a problem particular to the life sciences, and is intended to improve uptake of R within the industry. Like the R packages, the training is focused on providing examples of actual work, so that users of the training will be able to immediately apply their knowledge.
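For readers unfamiliar with what a dataset comparison in the style of proc compare reports, the sketch below shows the basic idea: match rows on a key, list rows present on only one side, and list cell-level differences. It is a plain Python illustration, not the rcompare interface; the function name, the report layout, and the sample records are assumptions made for the example.

```python
def compare_datasets(base, compare, key):
    """Report rows missing from either side and cell-level value differences."""
    base_rows = {row[key]: row for row in base}
    compare_rows = {row[key]: row for row in compare}
    report = {
        "only_in_base": sorted(base_rows.keys() - compare_rows.keys()),
        "only_in_compare": sorted(compare_rows.keys() - base_rows.keys()),
        "differences": [],
    }
    # Compare every column of the rows that exist in both datasets.
    for row_key in sorted(base_rows.keys() & compare_rows.keys()):
        left, right = base_rows[row_key], compare_rows[row_key]
        for column in sorted(left.keys() | right.keys()):
            if left.get(column) != right.get(column):
                report["differences"].append(
                    (row_key, column, left.get(column), right.get(column))
                )
    return report


# Hypothetical production and QC copies of a small subject-level dataset.
production = [
    {"USUBJID": "001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "002", "AGE": 51, "SEX": "M"},
    {"USUBJID": "003", "AGE": 47, "SEX": "F"},
]
qc = [
    {"USUBJID": "001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "002", "AGE": 52, "SEX": "M"},
    {"USUBJID": "004", "AGE": 29, "SEX": "M"},
]

report = compare_datasets(production, qc, key="USUBJID")
print(report)
```

A production-grade comparison would also need numeric tolerances and attribute checks (types, labels, lengths), which this sketch leaves out.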
During drug development, pharmacometric models are often built to characterize and understand drug efficacy and safety. Simulations based on these models can assist drug development and quantitative decision making. However, computation can be time-consuming and communication with the project team may not be productive. Shiny applications are developed as simulation tools that allow rapid real-time simulations based on user-selected inputs and dynamic visualization of the results. They provide easy access for individuals with no specific background in modeling and simulation. In the talk, I will present some case studies where a Shiny application was used to perform simulations and facilitate communication with decision makers.
For the Pharma Company: How many times have you made a graph and gotten an email back saying “Can we change the axes?” or “Can we change the symbols?” or “I really need to look at the graph before I can tell you what I want”? It would be much more efficient for your customers to explore the data and the visualizations themselves in an easy-to-use way, which can free you up to work on the myriad other tasks you have to do. Rather than requiring you to learn another programming language, the shiny package uses the R code you already know to create interactive visualizations with a small bit of additional learning. This talk will go over example dashboards for such data as adverse events, labs, and primary endpoints that will aid data managers, statisticians, clinical people, and… even you.
The drake package is a general-purpose workflow manager for data-driven tasks in R, with applications in the pharmaceutical industry ranging from tailored medicine to clinical trial simulation and beyond. Drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every runthrough starts from scratch, and completed workflows have tangible evidence of reproducibility. Drake is more scalable than knitr, more thorough than memoization, and more R-focused than other pipeline toolkits such as GNU Make, remake, and snakemake.
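The core idea of skipping work that is already up to date can be sketched in a few lines. The example below is written in Python and is not drake's implementation: it fingerprints only the inputs of each target, so it is closer to memoization than to drake's fuller dependency tracking. The `Pipeline` class, the target names, and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(value):
    """Stable hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Pipeline:
    def __init__(self):
        self.cache = {}  # target name -> (input fingerprint, result)
        self.log = []    # names of targets that were actually rebuilt

    def make(self, name, func, *inputs):
        """Rebuild a target only if the fingerprint of its inputs changed."""
        key = fingerprint(inputs)
        cached = self.cache.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # up to date, skip the work
        result = func(*inputs)
        self.cache[name] = (key, result)
        self.log.append(name)
        return result


def drop_missing(rows):
    return [value for value in rows if value is not None]


def run(pipeline, raw):
    cleaned = pipeline.make("cleaned", drop_missing, raw)
    return pipeline.make("total", sum, cleaned)


pipeline = Pipeline()
first = run(pipeline, [1, None, 2])         # builds both targets
second = run(pipeline, [1, None, 2])        # nothing changed, nothing rebuilt
third = run(pipeline, [1, None, 2, None])   # raw data changed, cleaned did not
print(pipeline.log)
```

In the third run only the `cleaned` target is rebuilt: its output is identical to before, so the downstream `total` target is skipped.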
The open-source analytics community is driving innovation in precompetitive spaces like statistical methodology, reproducibility approaches, visualization techniques, and scaling strategies. The diverse and rapidly evolving ecosystem of open-source tools and standards stands in contrast with the disposition of the enterprise towards stability, standardization, and reliability. This talk will present the policies and frameworks we have developed at Genentech to enable internal scientists to responsibly leverage open-source tools and to participate in the community process through their own contributions.
In 3 years Real World Data Science Analytics in Roche/Genentech transitioned from a small team of former clinical trial programmers supporting a real world evidence team to become the largest department within the Personalised Healthcare (PHC) Centre of Excellence. This transition was driven by industry-wide acknowledgement of the growing importance of leveraging analytics to support PHC, but this change necessitated radical changes in workflows and competencies. To adapt to this change, the team has moved from using a single proprietary software (SAS) to becoming an open source-focused, R based but increasingly programming language agnostic, department. A core driver of this transition was the development of an internal suite of R packages that handled markdown templates and database access through to wrappers for common plots and documenting git hashes of all code used. Bringing this diverse set of tools into a coherent eco-system is a meta-package modelled on the tidyverse.
In the pharmaceutical industry, personalized patient care is about having access to traditional and new data sources, including comprehensive diagnostic data, sensor data, and real-world data; applying traditional and advanced analytics like machine learning to create meaningful insights; and then realizing value from those insights for smarter and more efficient research and development (R&D), improved patient access, and personalized patient care. Biomarker research is a key component of the PHC Strategy, complementing efforts to access high-dimensional genomics data and to conduct appropriate analyses using tools that differ from those used for well-established clinical trials. This paper, from the perspective of R&D, describes the close collaboration between the China Oncology Biomarker Data group (OBD China) and Product Development Biometrics (PDB), from sample collection and lab processing in biomarker stand-alone studies to meaningful results, with analyses mainly conducted in R. This collaboration makes it possible to prioritize molecule development, inform the design of specific trials, and identify R&D opportunities for regional diseases.
Shiny is a popular R package that lets users develop interactive web applications using just R code. The ease of use and downstream boost in productivity mean that working with Shiny can kick off a rapid request-implementation-inspiration-request cycle. Designing your applications with an eye toward future expansion can save time and reduce human error in the long term.
R Shiny has revolutionized the way statisticians and analysts distribute analytic results and research methods. We can easily build interactive web tools that enhance data visualization and facilitate data and information sharing. Shiny apps can empower non-statisticians to explore and visualize their data or perform their own analyses with methods we develop. Harnessing this power, R users have developed Shiny apps for visualizing clinical trials and pharmaceutical data, as well as applications that aid in study design and analysis. I will present examples of how Shiny can be used in many stages of the drug development process and discuss the challenges as well as benefits of incorporating these tools in pharmaceutical workflows.
Since its foundation in 2004, Metrum Research Group has relied on R as the core technology and central framework for all of the company’s biomedical modeling and simulation (M&S) service activities, spanning more than 475 projects with 150+ different sponsors. Projects include pharmacokinetic-pharmacodynamic modeling, quantitative systems pharmacology models, simulation-based trial design evaluations, disease progression and patient population modeling, model-based meta analysis of competitor data, model-based comparative effectiveness assessments, and data management activities, etc., all within a regulated environment. Analyses were conducted in R or via other software tools which are managed via R scripts, functions, or packages. Key deliverables of M&S projects are routinely provided as R packages or interactive simulation applications, driven by R (and R Shiny). R has also been an essential component of Metrum’s vision for Open Science in biomedical M&S, allowing for accessibility and reproducibility of platform models developed for multiple disease areas.
The R-based ecosystem, and its open-source methods for data manipulation, modeling and interpretation, is key for effective and reproducible research. This is certainly true in experiments relying on quantitative mass spectrometry. This relatively new and rapidly evolving field must overcome many sources of unwanted variation. It has many unsolved challenges, both in the appropriate use of the existing methods and tools, and in developing methods that address specialized problems. This talk will illustrate our R-based efforts to promote sound statistical practice, and build a community of competent practitioners. First, we will present Cardinal, a comprehensive tool for quantitative mass spectrometry-based imaging, as well as MSstats, a general but flexible framework for mass spectrometry-based proteomics. We will highlight the importance of these tools for pharmaceutical research in an example of statistical characterization of therapeutic protein modifications. Second, we will detail our efforts of building a community of competent users through a world-wide series of short courses, intended for experimentalists and computational scientists alike.
Precision medicine typically refers to the development of drugs and other interventions for individual patients. But how do you assess efficacy and make predictions in this extreme small data regime? The Bayesian framework is ideal for this type of inference as it allows us to combine population and personal effects in a principled way and make predictions for both groups and individuals. The inferences are further improved when we introduce mechanistically inspired components into the modeling framework. I’ll talk about building pharma models in the small data regime and how we use Stan (a statistical modeling language for Bayesian inference) with R for analysis.
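The idea of combining population and personal effects can be illustrated with the simplest analytic case. The sketch below is not the speaker's Stan model; it assumes a normal population prior and normal measurement noise with known standard deviations, and the function name and arguments are hypothetical. Under those assumptions the patient-level estimate is a precision-weighted average of the population mean and the patient's own data.

```python
def shrinkage_estimate(patient_obs, pop_mean, pop_sd, noise_sd):
    """Posterior mean and sd of one patient's effect (normal-normal model).

    Assumes: effect ~ Normal(pop_mean, pop_sd) across the population and
    each observation ~ Normal(effect, noise_sd), both sds known.
    """
    n = len(patient_obs)
    prior_precision = 1.0 / pop_sd ** 2      # weight of the population information
    data_precision = n / noise_sd ** 2       # weight of the patient's own data
    post_var = 1.0 / (prior_precision + data_precision)
    patient_mean = sum(patient_obs) / n if n else 0.0
    post_mean = post_var * (prior_precision * pop_mean
                            + data_precision * patient_mean)
    return post_mean, post_var ** 0.5


# Two observations pull the estimate from the population mean (0.0)
# toward the patient's own average (3.0), but not all the way.
mean_two_obs, sd_two_obs = shrinkage_estimate([2.0, 4.0], 0.0, 1.0, 1.0)

# With no observations the estimate falls back to the population prior.
mean_no_obs, sd_no_obs = shrinkage_estimate([], 0.0, 1.0, 1.0)
```

With very few observations per patient the estimate stays close to the population value, which is what makes this kind of model usable in the small data regime; a full Stan model generalizes the same idea to unknown variances and mechanistic components.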
Recruitment models for clinical trials are notoriously difficult to build due to many complex factors within a study. With input from experienced practitioners, we have built an interactive tool to allow individuals to build complex recruitment models using the R/Shiny framework. The Tool Enhance R, our platform for study modeling, was ported from an Excel-based tool to the R/Shiny platform to increase model development speed, expand capability and drive transparency into model development. The tool allows users to specify critical model attributes (e.g. country site distribution, recruitment/activation rates, country-specific vacations), and provides instantaneous feedback on the effect that changes have on a model’s probability of success. Using the RStudio Connect platform, we are able to grant multi-level access to users through a single web interface. Model development is tracked by exporting results to a SharePoint site and logging versions for future review/auditing. This gives significant levels of transparency on how a model was created and evolved over time. For web analytics, we used Piwik, an internal web analytics platform, to monitor how users navigate through the platform and identify browsing behavior. The application was built upon the Shiny Dashboard framework and leverages many visualization packages, including Plotly, Timevis, ggplot2 and many more. Many challenges arose in its development, from controlling over-zealous user clicks causing out-of-control execution, to integrating service account execution of apps to facilitate centralized data control. This project pushed the limits of what the R/Shiny platform is capable of and demonstrates how data scientists can build useful solutions.
The United States Food and Drug Administration (FDA) requires that clinical trial data be submitted in the Study Data Tabulation Model (SDTM) standard format. The process of developing SDTM involves mapping captured raw data to their corresponding SDTM domains based on rules and conditions set by the Clinical Data Interchange Standards Consortium (CDISC). SDTM data is further used for building the Analysis Data Model (ADaM), which is used for clinical trial statistical analysis. Mistakes in the mapping process are common due to the complexity of the process; issues that are missed may potentially affect the clinical trial result. Therefore, it is essential to preserve the quality of the SDTM data. Currently, the main tool for checking SDTM conformance is Pinnacle21 (formerly known as OpenCDISC). Notably, there are usability and viability checks that are not included in Pinnacle21. This work describes the creation of an R Shiny app to supplement the Pinnacle21 checks. This interactive app applies various CDISC-compliant SDTM data validation checks, and provides the user with a comprehensive report on possible inconsistencies in the data. The app allows programmers to proactively find data mapping errors. In addition, it is straightforward to use and can save tremendous amounts of time. This app’s audience also extends beyond the programming community and covers other individuals who have an interest in data quality, particularly Data Managers and individuals in Clinical Science and Clinical Operations.
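As a rough illustration of what rule-based SDTM checks of this kind might look like, here is a minimal sketch. It is not the app described above, which is written in R Shiny; the function name, the small subset of required variables, and the three checks are all assumptions chosen for the example, and dates are assumed to be complete ISO 8601 strings so that they can be compared as text.

```python
# Illustrative subset of required variables per domain (an assumption,
# not a complete rendering of the SDTM Implementation Guide).
REQUIRED = {
    "DM": ["STUDYID", "DOMAIN", "USUBJID"],
    "AE": ["STUDYID", "DOMAIN", "USUBJID", "AETERM"],
}


def check_domain(domain, rows, known_subjects=None):
    """Return a list of (domain, row number, message) findings."""
    findings = []
    for i, row in enumerate(rows, start=1):
        # Check 1: required variables must be present and non-empty.
        for var in REQUIRED.get(domain, []):
            if not row.get(var):
                findings.append((domain, i, f"missing required variable {var}"))
        # Check 2: every subject must also appear in the DM domain.
        subject = row.get("USUBJID")
        if known_subjects is not None and subject and subject not in known_subjects:
            findings.append((domain, i, f"USUBJID {subject} not found in DM"))
        # Check 3: an adverse event cannot end before it starts.
        start, end = row.get("AESTDTC"), row.get("AEENDTC")
        if domain == "AE" and start and end and len(start) == len(end) and start > end:
            findings.append((domain, i, "AEENDTC is earlier than AESTDTC"))
    return findings


dm = [{"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "S1-001"}]
ae = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AETERM": "HEADACHE",
     "AESTDTC": "2018-03-02", "AEENDTC": "2018-03-01"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-002", "AETERM": "",
     "AESTDTC": "2018-03-02", "AEENDTC": "2018-03-05"},
]
subjects = {row["USUBJID"] for row in dm}
findings = check_domain("DM", dm) + check_domain("AE", ae, subjects)
for finding in findings:
    print(finding)
```

Keeping each check as a small, independent rule that appends to a shared findings list makes it easy to add checks over time and to render the result as a report.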
The pharmaceutical industry depends on accurate and reproducible data science for both preclinical and clinical analysis. Unfortunately, an analysis often cannot be reproduced, and therefore its computational methodology and merit are unknown. Often, the data, code, or description of computational methods is not maintained. In order to implement good practices of reproducible computational research, the leadership of the company must invest time and resources into planning, training, ensuring adoption of common practices and tools, implementing documentation systems, encouraging discipline on the individual and group level, creating incentives, and requiring accountability. At Eisai, we have developed a working system for reproducible computational research that is enabled by leadership, technology, and culture. With regard to technology, we primarily use R Markdown and R Notebooks on an RStudio server used by all our analysts. The RStudio server is maintained by an administrator who installs packages for all users, creating a common package environment that ensures that code can be rerun in the future. Data and code are stored in a shared network drive and version control is accomplished by using Git. A wiki that is editable by all analysts is used to organize all analyses (tracked with unique analysis IDs) and provides links to code and results. With regard to culture, the leadership has promoted the values of quality and reproducibility. When yearly objectives are set, the performance criteria include the creation of analysis documents (e.g. R Markdown reports), use of version control, and organization of data on shared network drives. Setting aside time for wiki documentation in the midst of high demands from project teams is helped by having periodic “documentation day” parties.
To verify reproducibility, we have implemented “witnessing”: once the analysis is finished, it is reviewed by an independent team member who officially signs off on the work, stating that the reproducibility criteria have been met. Our success in implementing reproducible computational research can serve as a model for other companies to use. Here we have provided a model based on leadership, technology, and culture.
The first challenge in validating an analytic tool for the pharmaceutical industry is that, despite a formal FDA definition, there is still no cross-industry agreement on what ‘validation’ really means with respect to an analytic tool. AIMS (Application and Implementation of Methodologies in Statistics), a Special Interest Group within PSI, has been attempting to answer this question with respect to R. In doing so we recently received approval from the R Consortium for an online R package validation repository and are now looking to formalise some early definitions. In this presentation I will walk through some of the challenges that we have identified thus far and outline what we’re hoping to achieve with the platform.
Lilliam will be presenting a perspective on what the Office of Computational Science is doing to support regulatory review for safety assessments. She will explore the concept of collaborations and sharing to support process and transparency, along with a perspective on the use of R.
The United States Food and Drug Administration (FDA) uses a variety of statistical software packages for review and research. This presentation will focus on the uses of R in the Center for Drug Evaluation and Research (CDER), including graphics for labels, Bayesian designs and analyses, simulations, machine learning, data quality and data integrity efforts, as well as interactive visualizations using R Shiny. Some of the challenges with using R will be discussed, as well as advantages of using R to collaborate with colleagues in industry and academe through Cooperative Research and Development Agreements (CRADAs), Broad Agency Announcements (BAAs), and working groups associated with professional societies (ASA, DIA, PhUSE).
Determination of bioequivalence (BE), a crucial part of the evaluation of generic drugs, may depend on clinical endpoint studies, pharmacokinetic (PK) studies of bioavailability, and in vitro tests, among others. Additionally, in reviewing Abbreviated New Drug Applications (ANDAs), FDA reviewers often analyze safety studies and perform various kinds of simulations. A growing, vibrant group of statisticians in the Office of Biostatistics, CDER/FDA has adopted R both for their routine tasks and to address numerous scientific questions that are received in the form of internal consults. During the past 5 years, we have used R to run power simulations; generate the distribution of certain statistics of interest; assess the similarity of, and cluster, amino-acid sequences, as well as derive the distribution of the molecular weight of such sequences of a certain length; and determine the validity of data sets categorized for genotoxicity. The R package SABE was developed to accompany a new statistical test, used to assess BE of topical dermatological products when data for evaluation come from the In-Vitro Permeation Test (IVPT) [1]. BE tests consider comparisons between a Test (usually generic) and a Reference (RLD) product under a replicate study design. A function that assesses BE of a Test and a Reference formulation uses a mixed scaled criterion for the PK metrics AUC (Area Under the Curve) and Cmax (maximum concentration).
Our bioinformatics team is relied upon to quickly generate information to drive business decisions, allocate resources, and develop predictive models. As such, we constantly strive to streamline our work and create efficiencies when possible. To this end, we have developed a set of tools utilizing R packages, project templates, and parameterized R Markdown reports that enables a semi-automated, standardized modeling work flow. These tools have largely increased our output in generating informative metrics, improved our ability to reproduce our results, and empowered us to scale our team in a way that supports our company goals.
The gsDesign package for group sequential design is widely used, with >30k downloads. The package was originally written in 2007, with substantial documentation and RUnit testing created before 2010. A Shiny interface was created in about 2015 to make the package more approachable. Recent efforts have focused on updating the package to use roxygen2, pkgdown, covr/covrpage and testthat, as well as changing vignettes from Sweave to R Markdown. The learning curve for this modernization will be discussed, as well as usage in a regulated environment.
nlmixr is a free and open source R package for fitting nonlinear pharmacokinetic (PK), pharmacodynamic (PD), joint PK/PD and quantitative systems pharmacology (QSP) mixed-effects models. Currently, nlmixr is capable of fitting both traditional compartmental PK models as well as more complex models implemented using ordinary differential equations (ODEs). It is under intensive development and has succeeded in attracting extensive attention and a willingness to make contributions from the pharmaceutical modeling community. We believe that, over time, it will become a capable, credible alternative to commercial software tools, such as NONMEM, Monolix, and Phoenix NLME.
Developing Shiny applications that meet design goals, easily deploy to multiple platforms, and contain easily maintainable components (all while adhering to best practices) is typically a difficult endeavor. Until recently, there has not been a tool addressing the optimal development workflow and construction of Shiny apps. The golem package by ThinkR offers an opinionated framework for creating a Shiny app as a package, with usethis-like functionality to add a diverse set of capabilities. In this presentation, I will share how golem enables a robust standard for Shiny development and how it magically brought a dormant application back to life.
Introduction
As pharmacometricians, we sometimes jump into complex modeling before thoroughly exploring our data. This can happen due to tight timelines, lack of ready-to-use graphic tools or enthusiasm for complex models. Exploratory plots can help to uncover useful insights in the data and identify aspects to be explored further through modeling or in future studies. Exploratory plots can even quickly answer questions without the need of a complex model, improving our efficiency and providing timely impact on project strategy. The Exploratory Graphics (xGx) tool is an open-source R-based tool, freely available on GitHub [1]. Intuitively organized by datatype and driven by analysis questions, the tool aims to encourage a question-based approach to data exploration focusing on the key questions relevant to dose-exposure-response analyses.
Objectives
- Facilitate the purposeful exploration of PKPD data
- Encourage a question-based approach to data exploration, focusing on dose-exposure-response relationships
- Provide a teaching tool for people new to PKPD analysis
Methods
PK (single and multiple ascending dose), and PD (continuous, time-to-event, categorical, count, and ordinal) data were simulated and formatted according to a typical PKPD modeling dataset format. Lists of key questions relevant to dose-exposure-response exploration were compiled, and exploratory plots were generated to answer each question. The graphs were created following good graphics principles to ensure quality and consistency in our graphical communications [2].
Results
Examples of the key analysis questions include:
- Provide an overview of the data
  - What type of data is it (e.g. continuous, binary, categorical)?
  - How many doses?
  - What is the range of doses explored?
  - For PK data, how many potential compartments are observed?
  - Is the exposure dose-proportional?
  - Is there evidence of nonlinearity in clearance?
- Assess the variability
  - How large is the between subject variability compared to between dose separation?
  - Can any of the between subject variability be attributed to any covariates?
  - Are there any patterns in the within subject variability (e.g. circadian rhythms, seasonal effects, food effects, underlying disease progression)?
- Assess the dose/exposure-response relationship
  - Is there evidence of a correlation between dose/exposure and response?
  - Is the relationship positive or negative?
  - Is there a plateau or maximal effect in the observed dose/exposure range?
  - Is there evidence of a delay between exposure and response?
For each datatype in the simulated dataset, plots were generated to answer these key questions. The plots along with the code to produce them were compiled into a user friendly interface. The tool is intuitively organized by datatype and driven by the analysis questions. Since the graphs were generated based on a typical modeling dataset format and hosted online, they can be easily accessed and applied to new projects.
Conclusion
Exploratory plots were generated, built around typical key questions particularly relevant to dose-exposure-response exploration and compiled into a user friendly interface. The Exploratory Graphics (xGx) tool can help underscore the role of purposeful data exploration for quantitative scientists. Through a question-based approach, xGx helps uncover useful insights that can be revealed without complex modeling and identify aspects of the data that may be explored further.
References
[1] Margolskee, A., Khanshan, F., Stein, A., Ho, Y., and Looby, M. (2019) Exploratory Graphics (xGx). Pharmacometrics, Novartis Institutes for Biomedical Research, Cambridge. (Available from https://opensource.nibr.com/xgx/)
[2] Margolskee, A., Baillie, M., Magnusson, B., Jones, J. and Vandemeulebroecke, M. (2018) Graphics principles cheat sheet. Biostatistical Sciences and Pharmacometrics, Novartis Institutes for Biomedical Research, Cambridge. (Available from https://graphicsprinciples.github.io/)
The standardization of nonclinical study data by the Clinical Data Interchange Standards Consortium (CDISC) via the Standard for Exchange of Nonclinical Data (SEND) has created an opportunity for the collaborative development and use of open source software solutions to analyze and visualize toxicology study data. Shiny is an open source R package that facilitates the development of user-friendly, web-based applications. The Pharmaceutical Users Software Exchange (PhUSE) consortium has provided a platform for stakeholders throughout the pharmaceutical industry to collaboratively build and share tools, e.g. R Shiny applications, to enhance the effectiveness and efficiency of drug development. The modeling of standard repeat-dose toxicology study endpoints (e.g. body weights, clinical signs, clinical pathology, histopathology, toxicokinetics) in SEND has created new opportunities for dynamic, interactive visualization of study data above and beyond the static tables and figures typically included in study reports. For example, clinical pathology data from nonclinical toxicology studies can be difficult to digest when presented as group means in data tables, due to the large number of potentially correlated analytes collected across treatment groups, sexes, and potentially multiple timepoints. An R Shiny application has been developed to allow end users to comprehensively examine these datasets, using a variety of analytical and visualization methods, with relative ease. The application is publicly hosted on shinyapps.io, and the source code can be found on the PhUSE GitHub website.
As the pharmaceutical sector boosts its interactions with regulatory agencies using R programming as one key instrument for drug development submissions, we face a dilemma: several members of statistics and statistical programming teams are not currently advanced R programmers. For many years SAS has been a powerful tool in the data analysis repertoire of pharma statisticians. However, the recent development of automation capabilities such as R Markdown and R/Shiny has created a new avenue to expedite access to consumable information in the form of reports, presentations or interactive graphics that can be produced efficiently and in a standard format for all phases of a drug development or submission process. At Janssen we aim to improve literacy in R programming and achieve nearly 100% adoption by statistics and statistical programming teams in the coming 2-3 years. To achieve this goal, we are leveraging all types of training formats, from online training, to in-house instructor-led seminars, to one-on-one mentoring. One of the key methods we have been developing is the use of RStudio Cloud as a platform for internal crowd-led hands-on workshops where statisticians/programmers are taught to solve real on-the-job problems ranging from visualization to automated reports. In this presentation we will discuss our experience creating this program and share lessons learned, mistakes and successes.
The Data Science team in Pfizer’s Vaccine Research and Development division (VRD) creates and maintains validated applications used during high-throughput clinical testing that enable advanced analytic and reporting requirements. SAS has long been the de facto standard for analyzing data in a regulated GxP environment. Web deployment of these applications has been the best approach, and Pfizer VRD has developed several mid-tier applications in Java that submit batch SAS processes on a High Performance Computing grid. Pfizer VRD’s high-level approach is the same across different assay platforms: data are pulled from a combination of electronic files and Oracle databases and analyzed, results are written back to an Oracle database, and electronic output files are made in various formats (e.g. PDF). The regulated nature of Pfizer VRD’s work and the difficulty in deploying R-based applications over the web have previously been an impediment to the use of R, but new tools such as RStudio’s Shiny Server Pro have helped us overcome those challenges. This presentation focuses on a comparison of the architecture used to deploy our SAS applications and the infrastructure required to deploy R-based applications to meet GxP requirements. Real-life examples will be provided to illustrate the usefulness of this platform in a regulated laboratory environment.
At R in Pharma 2018, I gave a workshop and a presentation on analyzing clinical trials data with R. Since then, much has happened at Roche/Genentech with regard to analyzing clinical trials data with R: our R-based projects got funded in order to extend our R tools and make them production ready. We have created an R developer team that uses the SCRUM framework to work on our R tools. We also have subject-matter expert teams that translate the business knowledge into documentation and R code. Much effort has gone into setting up a CI/CD environment and defining a GitHub workflow that enables our teams to collaborate efficiently and to keep the code quality high. In this presentation, I will first introduce the team structures, workflows, and CI/CD environment and give some updates on the software development side. Finally, I will conclude with a brief update on our discussions around open sourcing and collaboration within the pharma industry and use this opportunity to start a conversation with the audience.
The R Shiny-based nest framework (previously named teal) has proven valuable in exploratory settings and in supporting strategic decision meetings. To allow a wider range of clinical studies to adopt this agile framework, we’ve developed an R package, osprey, and its accompanying R Shiny modules package, teal.osprey, for the summarization, analysis and visualization of safety and early efficacy data. At their current development stage, the packages provide standard safety tables and several plots, covering adverse events, disposition, tumor burden, response data and a few other domains. The packages inherit the code reproducibility and interactive dynamic filtering functionality, allowing seamless integration with the previously existing modules in the nest framework.
Providing a Study Data Reviewer’s Guide for Clinical Data to accompany the SDTM datasets, define.xml, and annotated CRF in a submission gives additional information to help the FDA review team. The guide is traditionally authored using MS Word, a 100% manual and labor-intensive process with its inherent shortcomings often exposed and aggravated during the usually frenzied sponsor submission process. R offers a more efficient solution with greater reproducibility: programmatic document generation facilitated by Shiny and R Markdown. Shiny not only manages R Markdown knitting but gives the sponsor staff, who oftentimes are unfamiliar with R, the ability to quickly leverage R with just a crash course in Markdown. An example of applying Shiny and R Markdown to generate the Study Data Reviewer’s Guide for Clinical Data will be presented.
In a large organization, collaboration faces many obstacles. Groups may inadvertently reinvent functionality and expend redundant effort. Siloing may impede aggregation and comparison of results. Analysts may not be aware of potential collaborators. However, a shared computational analysis environment, supported by centrally developed infrastructure and well-defined policies, enables discoverability, facilitates reuse, promotes communication between analysts, and improves comparability of results. We will present how we are pursuing this vision at Genentech.
R is the dominant language in modern quantitative science; however, it is still not widely used in the pharma industry. In this talk I will share learnings from building an internal R user community in a large global organization, via efforts including cataloging existing work and coordinating R adoption pilots and trainings. In addition, I will share our experiences and challenges in building a streamlined workflow with an automated writing component to enhance efficiency and reproducibility in a recent health authority interaction, toward our mission of bringing therapies to patients faster.
Content delivery in preparation for filing a clinical study report requires robust tooling for quickly and reproducibly compiling analysis of study data. Traditionally, this reproducibility has stemmed from one-time, rigorous validation of a development environment and analytic workflow. More recently, this paradigm has shifted to match modern software development principles, transitioning toward continuous monitoring of software validation and quality. I’ll share our developing perspectives on validation and reproducibility, driven by a need to leverage open source tools. This vision leans on open source software such as R and its package ecosystems, publicly maintained containerized environments like the rocker project, and cross-industry risk assessment via the R Validation Hub. By treating analysis as a software process in the content pipeline transforming raw data into analytic results, we can take advantage of the continuous deployment workflows prevalent in the software development world to shorten our filing timelines, while simultaneously delivering a more reproducible product to our health authority partners.
Research reproducibility has been heatedly discussed in recent years. Some authors have pointed out that a large portion of published research findings is incorrect and/or irreproducible. Others state that the medical literature is as reliable as expected. Regardless of the true state of the research literature, we can agree that reproducibility is of greatest importance to biomarker detection, including in the context of drug discovery and development. Research reproducibility may become a challenge in the current research community because of the interdisciplinary nature of modern studies and latent gaps between the advancement of technology and the ability to analyze the resulting data. In this presentation, we will describe the reproducibility challenges of biomarker detection in liquid biopsies, which hold great promise in disease diagnosis, treatment, and prevention. Our study shows that proper development and use of R packages can make a significant contribution to reproducible, high-quality, and cutting-edge research in nucleic acid biomarker detection, such as detection of novel circulating microRNA biomarkers.
The success of a bacterial drug discovery program can be no greater than the phylogenetic diversity and capacity of those bacteria in the library to produce specialized metabolites (SM). However, the methods used to create bacterial strain libraries have seen little innovation in nearly 80 years. Current practice relies entirely on colony morphology and/or 16S rRNA gene sequencing analysis to decide which isolated strains to retain for addition to a drug discovery library. However, these practices create inefficient libraries plagued with a high degree of taxonomic and chemical redundancy by relying on physical characteristics that have limited correlation with strains’ SM, the foundation of drug discovery. Therefore, the development of a platform to rapidly prioritize unknown bacterial strains based on phylogeny and SM would greatly increase the efficiency of the front-end of microbial drug discovery. Our lab has recently developed such a platform, called IDBac, which uses in situ matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to analyze protein and specialized metabolite spectra of single bacterial colonies. Utilizing R and Shiny, alongside state-of-the-art packages and techniques in MALDI processing and data visualization, we created a stand-alone executable program for MALDI-TOF MS bacterial analysis. Using unsupervised learning methods and visualizations we have demonstrated IDBac’s capabilities by creating protein and specialized metabolite MS profiles, generating protein MS hierarchical groupings that accurately mirrored phylogenetic groupings and further distinguishing isolates based on inter- and intra-species differences in specialized metabolite production. With the ease of use of modern MALDI instrumentation and interactive, intuitive data exploration, IDBac can rapidly profile up to 384 bacteria in 4 hours. 
To our knowledge, IDBac is the first attempt to couple in situ MS analyses of protein content and specialized metabolite production, and it will give laboratories with access to a MALDI-TOF MS the ability to rapidly create more efficient libraries for their drug discovery programs.
R Shiny apps allow for dynamic, interactive, real-time integration of knowledge within a drug-development program to support decision making. Here, an R Shiny app was used to explore the pharmacokinetic and pharmacodynamic effects of different dosing regimens of the anti-IL-17 human mAb Cosentyx® (secukinumab) in pediatric patients. Secukinumab has been studied and approved to treat psoriasis in adult patients. Models which describe the dose-exposure-response relationships in adults (Lee et al., Clin Pharmacol Ther, 2019 and FDA, Medical Reviews BLA 125504, 2015) were used in the mrgsolve simulation package to explore these relationships in pediatric patients. The prior adult knowledge, used in conjunction with the computational infrastructure leveraged through R, the Shiny app, mrgsolve, and Rcpp, allows researchers to explore various dosing regimens in a difficult-to-study patient population. The tools and approaches described here have been routinely used to support regulatory interactions (e.g. PIP) involving pediatric dosing.
GlaxoSmithKline is searching for new oncology drug targets. We have CRISPR knockout data for many cancer cell lines and many genes. For these same cell lines, we also have genomic data: somatic mutations, copy number variants, and gene expression. We use machine learning (random forests) to find predictive relationships between genomic features and cell line growth under knockout. Then we use GLASSES, a Shiny app, to share the results with biologists. GLASSES lets scientists interactively explore key relationships and discover novel cancer vulnerabilities.
Machine learning workflows can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the drake R package. drake resolves the dependency structure of your analysis pipeline, skips tasks that are already up to date, executes the rest with optional distributed computing, and organizes the output so you rarely have to think about data files. This talk demonstrates a deep learning project with drake-powered automation.
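The core idea described here, resolving dependencies and skipping targets that are already up to date, can be sketched in a few lines. This is a conceptual illustration, not drake's API or implementation: the `make` function, the plan format, and the fingerprinting scheme below are hypothetical, and the sketch does not handle cyclic plans, data files, or distributed execution.

```python
import hashlib


def _fingerprint(func, dep_prints):
    """Hash a step's compiled code together with its upstream fingerprints."""
    code = func.__code__
    h = hashlib.sha256()
    h.update(code.co_code)
    h.update(repr(code.co_consts).encode())
    h.update(repr(code.co_names).encode())
    for p in dep_prints:
        h.update(p.encode())
    return h.hexdigest()


def make(plan, cache):
    """Build every target in `plan`, skipping those already up to date.

    plan:  dict of target name -> (function, list of dependency names)
    cache: dict of target name -> (fingerprint, value), reused across calls
    Returns the names of the targets that were actually rebuilt, in order.
    """
    rebuilt, prints = [], {}

    def build(name):
        if name in prints:                      # already visited in this run
            return
        func, deps = plan[name]
        for dep in deps:                        # dependencies are built first
            build(dep)
        fp = _fingerprint(func, [prints[d] for d in deps])
        prints[name] = fp
        if name in cache and cache[name][0] == fp:
            return                              # up to date: skip the work
        cache[name] = (fp, func(*[cache[d][1] for d in deps]))
        rebuilt.append(name)

    for name in plan:
        build(name)
    return rebuilt


def load():
    return [1, 2, 3, 4]

def fit(data):
    return sum(data) / len(data)

def fit_v2(data):
    return max(data)

def report(model):
    return f"summary = {model:.2f}"


plan = {"report": (report, ["model"]), "model": (fit, ["data"]), "data": (load, [])}
cache = {}
first = make(plan, cache)      # everything is built once
second = make(plan, cache)     # nothing changed, so nothing is rebuilt
plan["model"] = (fit_v2, ["data"])
third = make(plan, cache)      # only the changed step and its dependents rerun
```

Because a target's fingerprint includes the fingerprints of everything upstream, changing one step invalidates exactly that step and its downstream targets, while untouched upstream results are reused from the cache.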
We define and illustrate a “deep visualization” paradigm for the analysis of a relatively large and complex clinical database for psoriasis (PSO) and psoriatic arthritis (PsA). This paradigm supports a growing number of machine learning and exploratory analyses, and it provides a framework for Shiny applications and dashboards used to communicate results with internal and external clinicians. Our R platform implements a “whole-patient” data view including omics, imaging, and hundreds of anatomical assessments (scores) on multiple tissues, such as skin, joints, bones, entheses, etc. The package makes extensive use of anatomical metadata objects implemented as reference classes (Chambers, 2016), both for computing over anatomical structures and for visualizing disease state both at specific anatomical locations and at the patient-level. We present examples including visualization of bone and joint structural damage assessment scores, clustering of patients according to their disease trajectories, and association of pain to clinical endpoints over time.
+
+ Nowadays R is the talk of everyone in the pharmaceutical industry. A lot is being said about statistical programming (CDISC datasets, TFLs) with R and addressing validation issues. The most important players embraced R in various areas of their …
+
+
+
+ Validation of the R statistical package has become a hot topic since 2015, when the FDA issued the Statistical Software Clarifying Statement, stating officially that no specific software is required for submissions, and that any tool can be used if …
+
+
+
+ We ALL have a tendency to solve problems with solutions that may be far from optimal. How does this tendency shape our Scientific Software Architecture? What are the long-term consequences of that? What pushes us towards sub-optimal solutions? …
+
+
+
+ Drug safety data present many challenges with regard to curation, analysis, interpretation, and reporting. Safety endpoints have high variability, are multidimensional, and are interrelated, which points to a need to identify novel approaches to …
+
+
+
+ The primary objective of the presentation is to share insights on democratizing powerful natural language processing tools like Linguamatics I2E and open source R and Shiny. The talk will focus on how we can leverage the I2E Python SDK natural language …
+
+
+
+ In this talk, I will speak about my personal journey of learning R and transforming from a clinical study statistical programmer to a SAS/R bilingual, as well as my journey of leading the R initiative in Amgen’s Global Statistical Programming …
+
+
+
+ The Pharmaceutical industry is moving towards open-source tools with companies adopting R/Shiny to revolutionize their processes in clinical reporting, drug development, and translational research, among other areas. One of the initiatives of this …
+
+
+
+ Drug repositioning is an area of growing interest in drug development that can accelerate the discovery of new treatment options to benefit patients worldwide. Briefly, drug repositioning refers to the systematic investigation of a novel disease …
+
+
+
+ In the past years, the pharma industry has seen a true paradigm shift in its use of R. Up until recently, one had to choose between R and SAS. Today, most statisticians are trained in both languages. With this in mind, at AstraZeneca we built on the …
+
+
+
+ In the pharmaceutical industry, a great deal of the data presented in the outputs we create are very similar. For the most part, these tables can be broken down into a few categories: counting for event-based variables or categories, shifting …
+
+
+
+ So you've started writing custom JavaScript for your Shiny app... but where do you put all this code?! Organizing JS files to be sourced within one another can be really hard to navigate from within a Shiny application. In this talk I’ll cover what …
+
+
+
+ Back in 2020, Atorus had the initial release of the R package Tplyr, which was built to simplify the creation of clinical summary tables. Now in 2022, new updates and enhancements have been added to Tplyr to give the user more, particularly in the …
+
+
+
+ Sarepta deployed RStudio Team for modern data analytics. One major hurdle we faced was how to serve data to Connect/Workbench securely in the backend. Partnering with Atorus, Sarepta solved this challenge using Box cloud storage as a secure data …
+
+
+
+ Statistical programming of summary tables is a well-established task within the clinical world. In the last few years, the pharmaceutical industry has seen several new packages emerge to support these activities, including the Atorus package Tplyr. …
+
+
+
+ We will be presenting an overview of the interoperability between Python and R for the R user community at R/Pharma 2020. This workshop will highlight how statistical programmers can leverage the power of both R and Python in their daily processes. …
+
+
+
+ RStudio is one of the most commonly used integrated development environments (IDEs) for R. In this talk we present RStudio on Amazon SageMaker, a fully managed RStudio IDE on the AWS cloud. We also walk through a Health Care and Life Sciences use …
+
+
+
+ Terms like "digitalization", "machine learning (ML)" or "artificial intelligence (AI)" are more than just buzzwords these days. Databases are analyzed worldwide with modern algorithms and entire industries are making data-driven decisions at an even …
+
+
+
+ A prespecified adaptive plan involves automating the analysis of interim clinical trial data and adjusting elements of the trial in response. In implementing these plans, we experience random highs and lows in the data, adjacent doses of a drug with …
+
+
+
+ Product quality plays a vital role in the success of Biotech/Pharmaceutical organizations, and the accurate classification of quality risks is crucial to ensure the delivery of high-quality products. However, the current practice of assigning risk …
+
+
+
+ The current paradigm for analyzing clinical trial data is cumbersome: it is an inefficient, slow, and expensive process. Several rounds of iterations between the main programmer and the validation programmer are usually needed to thoroughly explore …
+
+
+
+ In this workshop we will walk through an implementation of the R Validation Hub's white paper, "A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure" (https://www.pharmar.org/white-paper/). The workshop will explore …
+
+
+
+ Shiny is a popular R package that lets users develop interactive web applications using just R code. The ease of use and downstream boost in productivity mean that working with Shiny can kick off a rapid request-implementation-inspiration-request …
+
+
+
+ For the Pharma Company How many times have you made a graph and gotten an email back saying "Can we change the axes?" or "Can we change the symbols?" or "I really need to look at the graph before I can tell you what I want". It would be much more …
+
+
+
+ For data science teams, data preparation takes substantial investment of time, data science expertise and subject matter proficiency. However, as the name implies, data preparation is typically viewed merely as a means to an end, encouraging creation …
+
+
+
+ The development of a streamlined data-aggregation methodology utilizing the statistical programming language R is described. The centralization of high-throughput experimentation data enabled the use of statistics and data exploration methods within …
+
+
+
+ Decision analysis balancing both data analytics and human gut feeling is critical in designing efficient routes to synthesize new, complex small molecules. This challenge is faced by any organization seeking to deliver modern pharmaceutical compounds …
+
+
+
+ Failure to thoroughly review discrepancies and deviations in drug manufacturing is consistently one of the top citations in FDA inspectional observations. Learn how a leading biotechnology organization successfully replaced an inefficient, manual …
+
+
+
+ This invited talk will describe the current landscape of CDISC initiatives and collaborations. CDISC currently has a portfolio of innovative industry initiatives that include new standards as well as open-source software projects that are part of the …
+
+
+
+ Recruitment models for clinical trials are notoriously difficult to build due to many complex factors within a study. With input from experienced practitioners, we have built an interactive tool to allow individuals to build complex recruitment …
+
+
+
+ Genetically modified organisms (GMOs) and cell lines are widely used models to estimate the efficacy of drugs and understand mechanisms of action in biopharmaceutical research. As part of characterising these models, DNA sequencing technology and …
+
+
+
+ The current paradigm for analyzing clinical trial data is cumbersome: it is an inefficient, slow, and expensive process. Several rounds of iterations between the main programmer and the validation programmer are usually needed to thoroughly explore …
+
+
+
+ The installation of a cohort of R packages can constitute a challenge; especially considering different dependency types, package versions, overlapping namespaces and varying risks assigned to each of the packages. At the same time, the number of R …
+
+
+
+ Safety and efficacy data in clinical trials are mostly analyzed separately. However, the treatment of life-threatening diseases such as cancer in particular requires a good understanding of benefit and associated risks to make an informed therapy …
+
+
+
+ The pharmaceutical industry depends on accurate and reproducible data science for both preclinical and clinical analysis. Unfortunately, often an analysis cannot be reproduced and therefore its computational methodology and merit are unknown. Often, …
+
+
+
+ Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry …
+
+
+
+ Since the first release of the Shiny package over eight years ago, the R community has progressively created amazing web-based applications with it. In practically every R conference or user meetup, we see amazing examples of how Shiny is changing the …
+
+
+
+ Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that …
+
+
+
+ Data science can be slow. A single round of statistical computation can take several minutes, hours, or even days to complete. The targets R package keeps results up to date and reproducible while minimizing the number of expensive tasks that …
+
+
+
+ Developing Shiny applications that meet design goals, easily deploy to multiple platforms, and contain easily maintainable components (all while adhering to best practices) is typically a difficult endeavor. Until recently, there has not been a tool …
+
+
+
+ Machine learning workflows can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, …
+
+
+
+ Recent advances in the Shiny ecosystem boost the scale and scope of serious enterprise-wide web applications. More specifically, it is entirely possible to utilize key features of Shiny Server Professional and additional R packages such as shinyjs, …
+
+
+
+ The drake package is a general-purpose workflow manager for data-driven tasks in R, with applications in the pharmaceutical industry ranging from tailored medicine to clinical trial simulation and beyond. Drake rebuilds intermediate data objects when …
+
+
+
+ The use of R in submissions to healthcare regulators presents challenges as the quality of packages must be ensured, and evidence of this quality must be readily available. The Regulatory R Package Repository Working Group aims to tackle these issues …
+
+
+
+ Like many other companies, Merck KGaA/EMD Serono has embarked on their journey to enable the use of R for regulatory submissions. Following the framework introduced by the R validation hub (Nicholls et al., 2020), we started to develop an algorithm to …
+
+
+
+ (The) Operation (formerly known as) Warp Speed is a joint venture between pharma and government to bring COVID-19 vaccines to market at unprecedented speed. A key tenet of the program is to generate the data needed to establish correlates of vaccine …
+
+
+
+ Data sources and the volume of data available for driving discovery and informing decisions have substantially increased over time. This increase has resulted in an evolving data and regulatory landscape ripe for the expertise of statisticians and …
+
+
+
+ On Nov 22nd, 2021, the R Consortium R Submissions Working Group successfully submitted an R-based test submission package through the FDA eCTD gateway. The submission package has been received by the FDA staff who were able to reproduce the numerical …
+
+
+
+ The CDISC-SEND data standard has created new opportunities for collaborative development of open-source software solutions to facilitate cross-study analyses of toxicology study data. A public private partnership between BioCelerate and FDA/CDER was …
+
+
+
+ With recent technological advances and availability of new data sources, we are experiencing exciting changes to the human medical product regulatory landscape. While these new areas have created challenges, they also present opportunities. This …
+
+
+
+ The crisis of opioid abuse and overdose in the United States has involved unprecedented levels of opioid prescriptions and opioid-related mortality. Greater understanding of current trends in prescription opioid utilization may help prevent new cases …
+
+
+
+ The standardization of nonclinical study data by the Clinical Data Interchange Standards Consortium (CDISC) via the Standard for Exchange of Nonclinical Data (SEND) has created an opportunity for the collaborative development and use of open source …
+
+
+
+ Research reproducibility has been heatedly discussed in recent years. Some authors have pointed out that a large portion of published research findings is incorrect and/or irreproducible. Some state that the medical literature is as reliable as …
+
+
+
+ Determination of bioequivalence (BE), a crucial part of the evaluation of generic drugs, may depend on clinical endpoint studies, pharmacokinetic (PK) studies of bioavailability, and in vitro tests, among others. Additionally, in reviewing …
+
+
+
+ When it comes to analytics of data collected in medical research, today’s culture is compartmentalized – not only across institutions, but even within institutions. Such a culture stagnates analytical development and limits the ability to fully …
+
+
+
+ Lilliam will be presenting a perspective on what the office of computational science is doing to support regulatory review for safety assessments. She will explore the concept of collaborations and sharing to support process and transparency, along …
+
+
+
+ The United States Food and Drug Administration (FDA) uses a variety of statistical software packages for review and research. This presentation will focus on the uses of R in the Center for Drug Evaluation and Research (CDER), including graphics for …
+
+
+
+ In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention). Housed within Fred Hutch, SCHARP is an instrumental partner in the …
+
+
+
+ In this talk I will discuss the steps that have been created for validating internally generated R packages at SCHARP (Statistical Center for HIV/AIDS Research and Prevention) and the lessons learned while creating packages as a team. Housed within …
+
+
+
+ The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports that describe the results of data analyses. There should be a clear and …
+
+
+
+ Detailed exploration of large transcriptomics datasets, increasingly available at single-cell resolution, is a time-consuming task which often requires the complementary skill sets of data analysts and experimental scientists to complete analyses and …
+
+
+
+ Medical oversight during a clinical trial is an extensive and time-consuming process. To safeguard patient safety, medical monitors need to review and explore raw safety data interactively, using standard visualizations as well as specific analyses …
+
+
+
+ Thousands of outputs (tables, graphs and listings) may need to be generated each year for filing, external publications, internal read outs and other activities in a pharmaceutical company. Although most of these outputs could be produced …
+
+
+
+ Continuous integration (CI) and continuous delivery (CD) are playing a pivotal role in ensuring that R projects in Pharma meet the highest quality standards. Particular focus is placed on ensuring that packages are fit for purpose both on internal …
+
+
+
+ The job of a data scientist working on a clinical trial team in the pharmaceutical industry is to provide the most accurate analysis possible in order to enable valid insights from the data. Ensuring data quality is extremely hard work and there are …
+
+
+
+ This is a 3-hour workshop on Stan (https://mc-stan.org). The overall goal of the workshop will be to make the best use of time to answer as many Stan-related questions as possible. The level of the workshop will be intermediate to advanced, but anyone …
+
+
+
+ Precision medicine typically refers to the development of drugs and other interventions for individual patients. But how do you assess efficacy and make predictions in this extreme small data regime? The Bayesian framework is ideal for this type of …
+
+
+
+ The presentation will introduce the transition project in which a whole department of 150+ SAS programmers moved completely from SAS to open-source programming. The whole department switched from SAS Studio to R Pro Server, Windows Server to AWS …
+
+
+
+ In this talk, we would like to introduce openstatsware, an official working group of the American Statistical Association (ASA) Biopharmaceutical Section. The working group has a primary objective to engineer R packages that implement important …
+
+
+
+ In recent years, R Shiny apps have gained considerable momentum and have been utilized to develop many useful dashboards and user interfaces (UI) that allow non-programmers access to innovative tools. Due to the ease of development of Shiny apps …
+
+
+
+ The visR project for effective graphics in drug development: visR is an open collaborative effort to develop solutions for effective visual communication with a focus on reporting medical and clinical data. The aim of the collaboration is to develop a …
+
+
+
+ The development of laboratory developed tests (LDTs) and in vitro diagnostics (IVDs) requires the execution of studies to determine the analytical performance of the assay. Examples of analytical studies include limit of detection, intermediate …
+
+
+
+ Supporting data-driven decisions in the planning of clinical trials during the current pandemic involves extensive integration of heterogenous data sources, sophisticated predictive modelling, and custom visualization to communicate the predictions …
+
+
+
+ In this workshop we will walk through an implementation of the R Validation Hub's white paper, "A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure" (https://www.pharmar.org/white-paper/). The workshop will explore …
+
+
+
+ In the early phases of clinical development, the future of a compound depends on more than just the result of a hypothesis test on a single endpoint in a single phase 2 study. We think a lot about how design choices affect immediate outcomes. GSK's …
+
+
+
+ GlaxoSmithKline is searching for new oncology drug targets. We have CRISPR knockout data for many cancer cell lines and many genes. For these same cell lines, we also have genomic data: somatic mutations, copy number variants, and gene expression. …
+
+
+
+ The first challenge in validating an analytic tool for the pharmaceutical industry is that, despite a formal FDA definition, there is still no cross-industry agreement on what 'validation' really means with respect to an analytic tool. AIMS …
+
+
+
+ Statisticians and programmers using multiple software systems (e.g., SAS, R, Python) often encounter differences in analysis results, requiring further exploration and justification. Investigating these discrepancies can be time-consuming, especially …
+
+
+
+ How does a risk-averse Pharma Biostatistics organization with 800+ people switch from using proprietary software to using R and other open-source tools for delivering clinical trial submissions? First slowly, then all at once. GSK started the …
+
+
+
+ In 2022, our team announced the first release of the tfrmt package, providing clinical programmers with the novel ability to create tables without data. With its metadata-driven engine applied to the emerging industry standard of Analysis Results …
+
+
+
+ Continuous integration (CI) and continuous delivery (CD) are playing a pivotal role in ensuring that R projects in Pharma meet the highest quality standards. Particular focus is placed on ensuring that packages are fit for purpose both on internal …
+
+
+
+ Tables no longer just live in flat PDFs and reports, but should be able to go from apps to PDFs and Word documents with ease. To have the flexibility to do this we need to separate the analysis from the formatting. Additionally, in the pharmaceutical …
+
+
+
+ In this talk, we would like to provide updates on the four biopharmaceutical industry focused R consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical …
+
+
+
+ Real-world data are increasingly used to complement evidence from clinical trials. However, missing data are a major statistical challenge when the underlying missingness mechanisms are unknown, e.g., to adjust for confounding. This talk introduces …
+
+
+
+ I will talk about some of the challenges that are now arising in BioTech. There are larger, more informative but much more complex, data sets available and being developed. While these hold great promise they add complexity to an already fragile …
+
+
+
+ Routinely-collected healthcare databases generated from insurance claims and electronic health records have tremendous potential to provide information on the real-world effectiveness and safety of medical products. However, unmeasured confounding …
+
+
+
+ Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal …
+
+
+
+ Cohort studies of treatments developed from healthcare claims often have hundreds of thousands of patients and up to several thousand measured covariates. Therefore, new causal inference methods that combine ideas from machine learning and causal …
+
+
+
+ At Idorsia, we have developed a large R Shiny application supporting a metadata-driven approach to shell and output creation. We will demonstrate some key features of the app and discuss how this approach led to huge efficiency gains when delivering …
+
+
+
+ RNA-seq transcriptome analysis workflows often generate the essential information (data and results) distributed among a variety of different tabular files and formats, e.g. raw and normalized expression values, results of differential gene …
+
+
+
+ In vivo studies are crucial to the discovery and development of novel drugs and are conducted for proof-of-concept validation, FDA applications and to support clinical trials. Appropriate study design, data analyses and interpretation are essential …
+
+
+
+ As the Pharmaceutical sector boosts its interactions with regulatory agencies using R programming as one key instrument for drug development submissions, we face a dilemma that several members of statistics and statistical programming teams are not …
+
+
+
+ Next-generation sequencing (NGS), phage display technology and high throughput capacities enable biologists in drug discovery to characterize antibodies (Abs) based on their HCDR3 sequences and further group them into families before moving to …
+
+
+
+ The use of open-source R is evolving in drug discovery, research and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. The ability to produce tables, listings and figures (TLFs) in …
+
+
+
+ When working with big data sources, such as medical claims data, the process of data review and quality control (QC) can be both complex and tedious. R and RMarkdown have become common tools for data analytics and report writing in the pharmaceutical …
+
+
+
+ If we could predict a patient's future risk of developing illnesses such as depression or lung cancer in the next three years, then we could potentially intervene and improve the patient's future health. The PatientLevelPrediction R package provides …
+
+
+
+ Julia is a modern programming language that provides the ease of use of R with the speed of C++. Julia has been in development for over 11 years. Research on Julia originated at MIT in 2009. Julia is powered by multiple dispatch - a generalization of …
+
+
+
+ As stated in my 2018 R/Pharma presentation "Becoming Bilingual in SAS and R", I believe in problem-solving using different data science tools. This talk is about my team's efforts at using different data science tools (SAS, R, and Python) to harmonize …
+
+
+
+ gtreg internally leverages gtsummary to streamline production for regulatory tables in clinical research. There are three functions to assist with adverse event reporting: tbl_ae_count(), tbl_ae(), and tbl_ae_focus(). tbl_ae_count() tabulates all AEs …
+
+
+
+ Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry …
+
+
+
+ Data monitoring to ensure patient safety is an important process in clinical trials. An independent data monitoring committee (DMC) reviews safety data periodically to interpret findings and assess various safety signals. Sponsors typically provide …
+
+
+
+ Computationally-intensive workflows exist in the design, analysis, and simulation in all phases of clinical studies. The runtime of such workflows can be significantly longer than a Shiny app can practicably handle. In such situations, binding the …
+
+
+
+ In this talk, we will discuss an infrastructure-free R package exchange and distribution system. The components include pkglite for compact package representations, cleanslate for portable R environments, and pkglink for runtime dependency …
+
+
+
+ In the safety analysis of clinical trials, the forest plot plays an important role. Currently, most of the forest plots are static, which makes them non-reader-friendly to Data Monitoring Committee (DMC). In this project, we propose an R package - …
+
+
+
+ The use of open-source R is evolving in drug discovery, research and development for study design, data analysis, visualization, and report generation in the pharmaceutical industry. The ability to produce tables, listings and figures (TLFs) in …
+
+
+
+ Introduction to the X-Omics Platform (XOP), a digital biomarker research platform for bioinformaticians and other scientists at Merck KGaA. XOP is a validated system for storing, processing, and analyzing "omics" data, including RNASeq, DNASeq …
+
+
+
+ Start browsing through R tutorials online and it won't take long to stumble across a read.csv statement. CSV files serve well for detached, static analyses. They tend to fail, however, when tasked with storing large, dynamic data sets being accessed …
+
+
+
+ Objectives: Demonstrate an interactive and dynamic visualization tool, ModViz POP, for simulating ordinary differential equation-based PK/PD models with variability. Methods: ModViz POP has a built-in PKPD ODE library of models based on the …
+
+
+
+ Bayesian model-based dose-escalation designs, including one and two parameter logistic regression models, have meanwhile proven themselves in Phase I dose-escalation trials (Iasonos and O'Quigley, 2014 [1]). Compared to rule/algorithm-based designs …
+
+
+
+ The gsDesign package for group sequential design is widely used with 30k downloads. The package was originally written in 2007 with substantial documentation and Runit testing created before 2010. A Shiny interface was created to make the package …
+
+
+
+ The dramatic increase of R in the computational, analytics, and data science areas has led to some innovative techniques in recent years for interactive analytics. This rate of change presents challenges for IT organizations to keep up and to …
+
+
+
+ Validating open-source R packages has been a hot topic over the past few years. This talk focuses on MetrumRG's updated process and tooling for validating first party R packages, that is R packages that we develop in-house, almost all of which are …
+
+
+
+ Metrum Research Group (MetrumRG) has developed a suite of open-source R packages for pharmacometric analyses that can be used independently, or seamlessly integrated into a larger R-based ecosystem. To showcase this ecosystem, we used the popular …
+
+
+
+ After a brief introduction to mrgsolve (https://mrgsolve.github.io), we will discuss concepts and applications for using the package in R to simulate from pharmacokinetic (PK) and physiologically-based PK (PBPK) models, estimate parameters given a …
+
+
+
+ Physiologically based pharmacokinetic (PBPK) models are used extensively in drug development to address a number of problems. However, most PBPK applications have limited knowledge sharing impact because they are implemented in closed, proprietary …
+
+
+
+ R Shiny apps allow for dynamic, interactive, real-time integration of knowledge within a drug-development program to support decision making. Here, an R Shiny app was used to explore the pharmacokinetic and pharmacodynamic effects of different dosing …
+
+
+
+ During drug development, pharmacometric models are often built to characterize and understand drug efficacy and safety. Simulations based on these models can assist drug development and quantitative decision making. However, computation can be …
+
+
+
+ Despite the explosive growth and adoption of R globally, concerns over how to qualify and administer R continue to echo in discussions about use in regulated environments. In this talk, I'll discuss how to bridge the conceptual tenets of …
+
+
+
+ Since its foundation in 2004, Metrum Research Group has relied on R as the core technology and central framework for all of the company’s biomedical modeling and simulation (M&S) service activities, spanning more than 475 projects with 150+ different …
+
+
+
+ A physiologically-based mathematical model was developed as a series of ordinary differential equations to describe compositional changes (in fat and fat-free mass, FM & FFM) due to metabolizable energy exchanges in babies from birth to 2 years in …
+
+
+
+ Statistical graphics play an important role in exploratory data analysis, model checking and diagnostics. The lineup protocol (Buja et al., 2009) enables statistical significance testing using visualizations, bridging the gap between exploratory and …
+
+
+
+ Pharmacokinetic (PK) analysis data programming poses some unique challenges. For example, both dosing and concentration records are included, and nominal and actual relative time variables that reference the first dose, the previous dose, or the …
+
+
+
+ The R-based ecosystem, and its open-source methods for data manipulation, modeling and interpretation, is key for effective and reproducible research. This is certainly true in experiments relying on quantitative mass spectrometry. This relatively …
+
+
+
+ Finding the right dose is a critical step in pharmaceutical drug development. There have been various statistical methodology developments for the design and analysis of clinical studies. In particular, MCP-Mod (Multiple Comparisons Procedure - …
+
+
+
+ It is relatively simple to create a powerful visualization app using shiny, but what if you need to change your data wrangling process or wish to build a different output? How easy is it to provide this flexibility without having to rewrite the …
+
+
+
+ In recent years, R users' understanding of Shiny has greatly increased but so have client expectations. While one of Shiny’s greatest strengths is that it allows producing web applications solely from R code, meeting clients’ more delicate …
+
+
+
+ Motivated by the rapid rise in clinical data exploration, there is an increasing need to utilize interactive graphical displays using Shiny apps. To date, the development and deployment of study apps have required specialized knowledge and …
+
+
+
+ Identification of subgroups with increased or decreased treatment effect is a challenging topic with several traps and pitfalls. In this project, we would like to establish good practices for subgroup identification, by building a simulation platform …
+
+
+
+ Statistical graphics play an important role in exploratory data analysis, model checking and diagnostics. The lineup protocol (Buja et al. 2009) enables statistical significance testing using visualizations, bridging the gap between exploratory and …
+
+
+
+ The Beatles rose to music fame in the 1960s and became a worldwide phenomenon. With millions of screaming fans and over 600 million records sold, they are often cited as one of the most influential rock bands in history. One reason for their fame …
+
+
+
+ Effective visual communication is a core competency for pharmacometricians, statisticians, and, more generally, any quantitative scientist. It is essential in every step of a quantitative workflow, from scoping to execution and communicating results …
+
+
+
+ metashiny is an R package that provides a point-and-click interface to quickly design, prototype, and deploy essential Shiny applications without having to write one single line of R code. The core idea behind metashiny is to parametrize Shiny …
+
+
+
+ In recent years, R users' understanding of Shiny has greatly increased but so have client expectations. While one of Shiny's greatest strengths is that it allows producing web applications solely from R code, meeting client's more delicate …
+
+
+
+ Scientists in drug discovery research utilize a wide variety of instrumentation and techniques to advance their research. While instrumentation vendors often provide software tools to deal with data wrangling and visualization, a simple collection of …
+
+
+
+ We define and illustrate a "deep visualization" paradigm for the analysis of a relatively large and complex clinical database for psoriasis (PSO) and psoriatic arthritis (PsA). This paradigm supports a growing number of machine learning and …
+
+
+
+ Introduction: As pharmacometricians, we sometimes jump into complex modeling before thoroughly exploring our data. This can happen due to tight timelines, lack of ready-to-use graphic tools or enthusiasm for complex models. Exploratory plots can help …
+
+
+
+ nlmixr is a free and open source R package for fitting nonlinear pharmacokinetic (PK), pharmacodynamic (PD), joint PK/PD and quantitative systems pharmacology (QSP) mixed-effects models. Currently, nlmixr is capable of fitting both traditional …
+
+
+
+ R has become a prominent data science tool, empowered by a fast-growing modern R ecosystem. At Novartis, Shiny and R Markdown have gained a lot of popularity for analyzing, visualizing and reporting clinical trial data. Traditional report analysis …
+
+
+
+ Addressing estimands in clinical studies involves handling intercurrent events, and multiple imputation methods are often applied to handle missing data. At Novo Nordisk, more and more programming tasks are done in R, but still multiple imputation …
+
+
+
+ In the final stage of a clinical study, a number of tables and figures are prepared, typically using SAS, for reporting the results of the study in a clinical study report. Before the clinical study report is finalized a thorough interpretation of …
+
+
+
+ The data wrangling and manipulation capabilities in R make it perfectly suited for transforming raw clinical database data into structured, submission-ready CDISC datasets. By extensively using the dplyr, tidyr, and other packages in the tidyverse we …
+
+
+
+ Medical oversight during a clinical trial is an extensive and time-consuming process. To safeguard patient safety, medical monitors need to review and explore raw safety data interactively, using standard visualizations as well as specific analyses …
+
+
+
+ R Shiny has revolutionized the way statisticians and analysts distribute analytic results and research methods. We can easily build interactive web tools that enhance data visualization and facilitate data and information sharing. Shiny apps can …
+
+
+
+ In this workshop we will present how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each …
+
+
+
+ Drug safety data present many challenges with regard to curation, analysis, interpretation, and reporting. Safety endpoints have high variability and are multidimensional and interrelated, which points to a need to identify novel approaches to …
+
+
+
+ How do you roll out R to hundreds of colleagues, ensuring that the version you're providing is tested, qualified and well managed? Is it possible to ensure that everyone is using the same version of R and packages? How do you account for differences …
+
+
+
+ Non-compartmental pharmacokinetic analysis (NCA) is used in the characterization of drug absorption, distribution and elimination in the body. Software that implements NCA is available from both commercial and non-commercial, open-source sources. …
+
+
+
+ The Data Science team in Pfizer’s Vaccine Research and Development division (VRD) creates and maintains validated applications used during high-throughput clinical testing that enable advanced analytic and reporting requirements. SAS has long been …
+
+
+
+ Within the life sciences industry, Shiny has enabled tremendous innovations to produce web interfaces as frontends to sophisticated analyses, dynamic visualizations, and automation of clinical reporting across drug development. While industry …
+
+
+
+ torch is an R port of PyTorch, a scientific computing library that enables fast and easy creation and training of deep learning models. In this talk, you will learn about the latest features and developments in torch, such as luz, a higher level …
+
+
+
+ Emerging diseases like COVID-19 pose dual threats to public health and the economy. Understanding protein-protein interactions (PPIs) between viral and host proteins is crucial for antiviral therapies and studying pathogen replication. However, …
+
+
+
+ As more and more companies move their compute environments into the cloud, the steps needed to ensure that their software suite and newfound infrastructure are FDA compliant change accordingly. In this talk, we will examine the requirements for an …
+
+
+
+ Julia is a modern programming language that provides the ease of use of R with the speed of C++. Julia has been in development for over 11 years. Research on Julia originated at MIT in 2009. Julia is powered by multiple dispatch - a generalization of …
+
+
+
+ In this talk, we would like to provide updates on the four biopharmaceutical industry focused R consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical …
+
+
+
+ In this talk, we will be discussing an architecturally and bioinformatically multi-layered integrative multiomic approach to the development of target hypotheses. Scientists work to help pharmaceutical companies advance towards the identification of …
+
+
+
+ The Interactive Safety Graphics (ISG) workstream of the ASA-DIA Biopharm Safety Working Group is excited to introduce the safetyGraphics package, an interactive framework for evaluating clinical trial safety in R using a flexible data pipeline. Our …
+
+
+
+ R and Bioconductor are important tools supporting scientific workflows across early Research and Development at Roche/Genentech. We have a broad R user community, which includes Data Scientists, Software Developers and consumers of Data Products …
+
+
+
+ MMRMs are often used as the primary analysis of continuous endpoints in longitudinal clinical trials (see e.g. Mallinckrodt et al., 2008). Essentially, an MMRM is a specific linear mixed effects model that includes (at least) an interaction of …
+
+
+
+ In a large organization, collaboration faces many obstacles. Groups may inadvertently reinvent functionality and expend redundant effort. Siloing may impede aggregation and comparison of results. Analysts may not be aware of potential collaborators. …
+
+
+
+ R is the dominant language in modern quantitative science; however, it is still not widely used in the pharma industry. In this talk I will share learnings from building an internal R user community in a large global organization, via efforts including …
+
+
+
+ The R shiny-based nest framework (previously named teal) has been proven valuable in exploratory settings and supporting strategic decision meetings. To allow more clinical studies to be able to adopt this agile framework in a wider range, we've …
+
+
+
+ Content delivery in preparation for filing a clinical study report requires robust tooling for quickly and reproducibly compiling analysis of study data. Traditionally, this reproducibility has stemmed from one-time, rigorous validation of a …
+
+
+
+ At R in Pharma 2018, I gave a workshop and a presentation on analyzing clinical trials data with R. Since then, much has happened at Roche/Genentech with regard to analyzing clinical trials data with R: our R-based projects got funded in order to …
+
+
+
+ In the pharmaceutical industry, personalized patient care is about having access to traditional and new data sources including comprehensive diagnostic data, sensor data, real-world data, etc., applying traditional and advanced analytics like machine …
+
+
+
+ Creating datasets and tables, listings and graphs (TLGs) for analyzing clinical trials data with R, such that in the final stage the code, datasets and TLGs can be submitted to the health authorities, is a multifaceted problem. We have been working …
+
+
+
+ The open-source analytics community is driving innovation in precompetitive spaces like statistical methodology, reproducibility approaches, visualization techniques, and scaling strategies. The diverse and rapidly evolving ecosystem of open-source …
+
+
+
+ R is a very powerful tool for performing statistical programming, but has had a lower uptake in the life sciences when compared to SAS. As a result, many of the packages created for R are not focused on the type of tasks Statistical Programmers do. …
+
+
+
+ The United States Food and Drug Administration (FDA) requires that clinical trial data be submitted in the Study Data Tabulation Model (SDTM) standard format. The process of developing SDTM involves mapping captured raw data to their corresponding …
+
+
+
+ In 3 years Real World Data Science Analytics in Roche/Genentech transitioned from a small team of former clinical trial programmers supporting a real world evidence team to become the largest department within the Personalised Healthcare (PHC) Centre …
+
+
+
+ Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and trust results in order to confidently extend them, even when the results are their own. We …
+
+
+
+ R is pretty good at backwards compatibility, but reproducing an analysis, even given the script and data, can still be a challenge as packages, R, and math libraries keep evolving. www.rocker-project.org offers, among other things, version-stable R in Docker …
+
+
+
+ REAP (R-Shiny Exploratory Analysis Platform) was developed by the Modeling and Simulation group within the Clinical Pharmacology department at Genentech, Inc., to support exploratory analyses of clinical data. REAP is a web-based, user-friendly, tool …
+
+
+
+ bioWARP (biostatistical Web-Applications and R Procedures) is a Shiny application enabling employees at Roche Diagnostics to create validated reports for regulatory authorities submissions. bioWARP enables people using advanced statistical methods, …
+
+
+
+ Programming is ubiquitous in applied biostatistics, and most statisticians know a programming language such as R - yet software engineering is still neglected as a skill and undervalued as a profession in pharmaceutical statistics. Why is this a …
+
+
+
+ Streamlining clinical trial output workflows is a key challenge for clinical studies. Our project leverages Python to link the planned analysis stored in a Google Sheet LoPO (List of clinical study Planned Outputs) to the study scripts that generate …
+
+
+
+ Historically, building a great SCE for clinical reporting involved selecting a vendor, integrating their product, and supporting a single proprietary language. The shift to reporting clinical trials using R has had a much broader impact than just …
+
+
+
+ The use of R in submissions to healthcare regulators presents challenges as the quality of packages must be ensured, and evidence of this quality must be readily available. The Regulatory R Package Repository Working Group aims to tackle these issues …
+
+
+
+ The Pharmaceutical industry is adopting new tools and technologies, putting pressure on individuals to learn many new skills in a short period of time. In order to both promote these new ways of working, and to assist those adopting it, at Genentech …
+
+
+
+ Continuous integration (CI) and continuous delivery (CD) are playing a pivotal role in ensuring that R projects in Pharma meet the highest quality standards. Particular focus is placed on ensuring that packages are fit for purpose both on internal …
+
+
+
+ Roche/Genentech, GSK, Atorus and J&J/Janssen have initiated a collaboration called pharmaverse to bring together a curated subset of open-source R packages to enable clinical reporting (from CRF to eSubmission). Where gaps are identified, new …
+
+
+
+ On Nov 22nd, 2021, the R Consortium R Submissions Working Group successfully submitted an R-based test submission package through the FDA eCTD gateway. The submission package has been received by the FDA staff who were able to reproduce the numerical …
+
+
+
+ Over the past year, we’ve designed a process that is meant to mimic public package publishing as closely as possible, where packages are automatically assessed by a series of checks which may prompt manual revision should the automated processing …
+
+
+
+ Increasingly, biostatisticians in pharma companies would like to use R on a daily basis; the growing number of participants in R/Pharma conferences is one metric showing this trend. As R programs replace proprietary software in this regulated …
+
+
+
+ In recent years late stage Pharma has begun to transition from a consumer of open source, and a sporadic creator, to a heavily invested collaborator on open source tools like R packages. In this short talk, James will discuss our recent focus on open …
+
+
+
+ In this talk, we would like to provide updates on the four biopharmaceutical industry focused R consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical …
+
+
+
+ In this short talk I will present a few packages that can be used inside a package testing framework to help increase the overall quality of a package. The main focus will be static R code analysis tools such as the well-known codetools …
+
+
+
+ R package validation has been on all our minds since the pharmaceutical industry started moving away from SAS to R for its statistical analysis and regulatory submissions. Opting for open source programming requires revisiting our way of validating code, …
+
+
+
+ The rOpenSci project is a non-profit initiative founded as a grassroots effort in 2011. We have evolved into a truly global community of researchers and data scientists who are R users and developers from a wide range of disciplines. rOpenSci …
+
+
+
+ Recently there have been a lot of new developments for modeling in the tidyverse. This talk will show off tools for censored regression, an interface to clustering, and how to use the h2o.ai platform for optimizing/fitting models.
+
+
+
+ Tidymodels has begun to create tools for modeling event time data. This will include methods for fitting, resampling, and characterizing models with censored outcomes. This talk will describe our design goals, show some syntax for modeling, and …
+
+
+
+ The gt package is a table preparation package for R which makes the presentation of tabular data fairly easy and also has the power to customize tables should you need it. The package has been in continuous development at RStudio for over three years and …
+
+
+
+ Visual representations of data inform how machine learning practitioners think, understand, and decide. Before charts are ever used for outward communication about a ML system, they are used by the system designers and operators themselves as a tool …
+
+
+
+ The pharmaceutical industry has witnessed a growing interest in open source languages such as R and Python as an alternative to SAS for many activities related to clinical research. Hop on board for a whistle-stop tour of our efforts within GSK …
+
+
+
+ A four-hour workshop that will take you on a tour of how to get from data to manuscript using R Markdown. You'll learn: the basics of Markdown and knitr; how to add tables for different outputs; workflows for working with data; how to include and …
+
+
+
+ Shiny makes it easy to take domain logic from an existing R script and wrap some reactive logic around it to produce an interactive webpage where others can quickly explore different variables, parameter values, models/algorithms, etc. Although the …
+
+
+
+ We know that adopting documentation, testing, and version control mechanisms are important for creating a culture of reproducibility in data science. But once you've embraced some basic development best practices, what comes next? What does it take …
+
+
+
+ Even though a model prediction can be made, there are times when it should be taken with some skepticism. For example, if a new data point is substantially different from the training set, its predicted value may be suspect. In chemistry, it is not …
+
+
+
+ Interactive web graphics are a popular and convenient medium for conveying information. However, web graphics are rarely used during the initial exploratory phase of a data analysis, largely due to the lack of tools for seamless iteration between data …
+
+
+
+ Shiny is a package for turning analyses written in R into interactive web applications. This capability has obvious applications in pharma, as it lets R users build interactive apps for their collaborators to explore models or results, or to automate …
+
+
+
+ The tidyverse (tidyverse.org) is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. The packages primarily consist of tools for data ingest, …
+
+
+
+ Providing a Study Data Reviewer's Guide for Clinical Data to accompany the SDTM datasets, define.xml, and annotated CRF in a submission gives additional information to help the FDA review team. The guide is traditionally authored using MS Word - a …
+
+
+
+ Clinical development requires quick access to live trial data to address safety questions and evaluate data quality. Historically, teams have resorted to Excel to manually populate patient profiles, despite human error limitations, inefficiency, and …
+
+
+
+ Sarepta deployed RStudio Team for modern data analytics. One major hurdle we faced was how to serve data to Connect/Workbench securely in the backend. Partnering with Atorus, Sarepta solved this challenge using Box cloud storage as a secure data …
+
+
+
+ In this talk, we would like to provide updates on the four biopharmaceutical industry focused R consortium cross-industry working groups. These working groups have a similar overall objective to support the use of R within the biopharmaceutical …
+
+
+
+ R and Python compose the fundamental tools used by data scientists across industries including pharma and biotech. With a rich set of analytical packages in both language domains, analysts who are able to work with both possess a significantly larger …
+
+
+
+ Our bioinformatics team is relied upon to quickly generate information to drive business decisions, allocate resources, and develop predictive models. As such, we constantly strive to streamline our work and create efficiencies when possible. To this …
+
+
+
+ Predictive modeling is a powerful tool which, amongst other things, can be applied to prioritising drug candidates. Limiting the search space needed for target exploration can reduce costs markedly, partly by eliminating lab time and expensive kits. …
+
+
+
+ This workshop is introductory and open to everyone assuming basic R/Data Science skills. Please note, the workshop is very hands-on oriented, so expect to get your fingers dirty! The aim will be an introduction to ANNs in R. ANNs form the basic unit …
+
+
+
+ We are amidst a data revolution. In just the past 5 years, the cost of sequencing a human genome has gone down approximately 10-fold. Development is moving equally fast in areas such as mass spectrometry and in vitro immuno-peptide screening, among others. This …
+
+
+
+ The interaction between the Major Histocompatibility Complex type I (MHC-I), a peptide and the T-cell receptor (TCR) (MHCIpTCR) is a key determinant of immune response elicitation and therefore of paramount importance in infectious- and autoimmune …
+
+
+
+ The past few years have shown vast improvements in workflows for reproducible and distributable research within the R ecosystem. At satRday Chicago everyone in the audience said they used R Markdown; however, only one person raised their hand when …
+
+
+
+ The success of a bacterial drug discovery program can be no greater than the phylogenetic diversity and capacity of those bacteria in the library to produce specialized metabolites (SM). However, the methods used to create bacterial strain libraries …
+
+
+
+ Purpose: To establish a gold-standard methodology for accurately extracting progression-free survival (PFS) following Diffuse Large B-Cell Lymphoma (DLBCL) treatment using real-world electronic healthcare record (EHR) data. Results: We produced an R …
+
+
+
+ Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for …
+
+
+
+ In this workshop we will present how to perform analysis of RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each …
+
+
+
+ In recent years, R users' understanding of Shiny has greatly increased but so have client expectations. While one of Shiny's greatest strengths is that it allows producing web applications solely from R code, meeting client's more delicate …
+
+
+
+ Since 2021, the FDA and the NIH have increased citations and notifications for non-compliance with required results reporting on ClinicalTrials.gov. Many studies still do not submit results to ClinicalTrials.gov; some do not publish results after 3 …
+
+
This is a free event brought to you by passionate volunteers.
+
By signing up, you agree that R/Pharma assumes no responsibility, and that any costs, annoyances or catastrophic events incurred due to attending this conference are yours and yours alone.
+
We’ll take some photos and audio recordings of the talks. Please don’t take recordings of other participants without their permission. Please let us know if you prefer not to have your talk recorded. While participating in the group activities, your photo may be taken and posted on the group website. It is your responsibility to inform others if you do not wish to be photographed and to inform us if you would like any of your photos removed.
+
Please don’t share content at R/Pharma unless you have permission to do so.
+
This is a free event and invites/tickets cannot be transferred or exchanged without permission of the Organizing Committee. In the unlikely event that R in Pharma is cancelled or postponed due to circumstances beyond our control, we cannot be held responsible for any costs incurred by the event attendee.
+
Speakers are subject to change and we do not guarantee that any specific speakers or participants will appear at the event. However, we will always try to find a suitable replacement if one of our key speakers or participants is unable to attend. Views expressed by speakers at the event may not be the views of R in Pharma.
+
Event materials are provided on an “as is” basis and R in Pharma makes no warranty regarding the accuracy or completeness of those materials.
+
+
Diversity:
+
R/Pharma is dedicated to providing a harassment-free conference experience for everyone regardless of gender, sexual orientation, disability or any feature that distinguishes human beings.
+
Policy Statement of the R in Pharma Conference:
+
The purpose of the R in Pharma conference is to foster discussion on the use of R in the pharmaceutical industry. R in Pharma is not, and hopefully never will be, a commercially oriented meeting. As such, we do not accept administrative responsibility for distributing commercial materials, nor do we allow vendors to participate in the meeting in any role other than as scientists/practitioners interested in learning or discussing the application of R in Pharma. All attendees are requested to adhere to the spirit of this policy. Please don’t spam attendees.
+
Terms
+
R/Pharma is a free event and comes with ABSOLUTELY NO WARRANTY. YOUR PARTICIPATION IS AT YOUR SOLE AND EXCLUSIVE RISK. By accepting the invitation to attend and registering a ticket, you freely and voluntarily assume all risks. You also agree not to hold organizers responsible for their negligence in connection with the conference or Content or events.
+
R/Pharma assumes no responsibility for any costs (e.g. travel) in any circumstance, including if the conference is cancelled or named speakers are unable to attend.
+
You agree to indemnify, defend, and hold R/Pharma and all parties and its subsidiaries, affiliates, officers, directors, agents, co-branders or other partners, employees, and representatives harmless from and against any and all claims, damages, losses, costs or expenses (including reasonable attorneys’ fees and disbursements) which arise directly or indirectly out of or from (i) your breach of this Agreement, (ii) any allegation that any materials that you submit to R/Pharma or present (e.g., conference, github, website) infringe or otherwise violate the copyright, trade secret, trademark or other intellectual property rights of a third party, and (iii) your access or use of the Sites and/or the Services.
+
NOTWITHSTANDING ANYTHING TO THE CONTRARY HEREIN, TO THE FULLEST EXTENT ALLOWED BY LAW, YOU AGREE TO WAIVE, DISCHARGE CLAIMS, RELEASE ALL PARTIES FROM ALL LIABILITY AND INDEMNIFY AND HOLD HARMLESS ORGANIZERS, HOSTS, SUBSIDIARIES, AFFILIATES, OFFICERS, AGENTS, VOLUNTEERS AND OTHER PARTNERS AND EMPLOYEES, FROM ANY AND ALL LIABILITY ON ACCOUNT OF, OR IN ANY WAY RESULTING FROM INJURIES AND DAMAGES IN ANY WAY CONNECTED WITH ANY EVENTS OR ACTIVITIES.
+
Notes
+
No confidential information should be disclosed at meetings and no warranty of confidentiality is made by the event organizers. Audio Recordings of events may be taken by event organizers and general content may be used in discussion postings or meeting highlights on our website, github or used elsewhere. Attendees of events acknowledge this content may be shared and release the event organizers to use this content freely. Audio or video recordings of events by attendees are prohibited. We will not share information if you request we do not do so.
+
Cloud Software Terms
+
Please note that Hopin, RStudio Cloud and Eventbrite are third-party Cloud services.
+
The R/Pharma conference runs on the Hopin virtual event platform. Hopin terms and conditions can be found here: https://hopin.com/legal-policy-center.
+
We expect you to provide your full name and email address for our events, and you will have to register with the Eventbrite and Hopin services as a personal consumer. Eventbrite will ask you to create an account with them using at least your first name, last name and your email address. Their terms and conditions state (in summary) that by doing so “[paragraph 17.1] you acknowledge and agree that you grant them full right and licence to use this information for the purpose of operating their Services (including Eventbrite’s promotional and marketing services).” Please also note that Eventbrite is a company registered in the United States, and the data you provide to them is most likely to be kept on servers outside the European Union. If you prefer not to use the services provided by Eventbrite to register for a workshop, please contact us directly at info@rinpharma.com with the details required for registration, and we will process your registration, provided the request is made while the workshop is still open.
R/Pharma is excited to present a total of 16 workshops this year, hosted by members of our community. Zoom links will be sent to workshop attendees a couple of days before the workshop.
R/Pharma is an amazing community and all of these workshops are put on by volunteers at no cost. If you would like to contribute to a future workshop please reach out to us through the contact page.
+
+
+
+
+
+
+
Oct 25 09:00-12:00 ET
+
+
Clinical Trials Data Analysis at Roche
+
Hosted by Adrian Waddell (Roche)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Oct 25 14:00-16:00 ET
+
+
Intro Shiny
+
Hosted by Ted Laderas (DNANexus)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Oct 26 09:00-12:00 ET
+
+
Stan: Building a survival model from scratch
+
Hosted by Daniel Lee (Generable)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 26 16:00-18:00 ET
+
+
Unleash your Shiny apps with JavaScript
+
Hosted by Jean-Philippe Coene (Opifex) & David Granjon (Novartis)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 27 09:00-12:00 ET
+
+
A case-study driven introduction to the Julia language for R programmers
+
Hosted by Devin Pastoor (Metrum Research Group)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 27 13:00-16:00 ET
+
+
Building Tidy R Packages
+
Hosted by Juliane Manitz (EMD Serono) & Leigh Alexander (SomaLogic)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Oct 28 10:00-12:30 ET
+
+
Clinical Tables in gt
+
Hosted by Rich Iannone (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 28 13:30-17:00 ET
+
+
R for Clinical Study Reports and Submission
+
Hosted by Yilong Zhang (Merck), Nan Xiao (Merck) & Keaven Anderson (Merck)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Oct 29 09:00-12:00 ET
+
+
SafetyGraphics
+
Hosted by Jeremy Wildfire (Gilead), Maya Gans (Atorus Research) & Xiao Ni (Sarepta Therapeutics)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Oct 29 15:00-17:00 ET
+
+
Python Machine Learning NLP
+
Hosted by Kevin Lee (Genpact)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 01 08:00-11:00 ET
+
+
tidy-transcriptomics
+
Hosted by Stefano Mangiola and Maria Doyle
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 01 13:00-16:00 ET
+
+
R Admin - RStudio Connect
+
Hosted by Kelly O'Briant (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 05 09:00-13:00 ET
+
+
Deep Learning
+
Hosted by Leon Eyrich Jessen (Technical University of Denmark)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 05 10:00-13:00 ET
+
+
R Package Validation Framework
+
Hosted by Ellis Hughes (Fred Hutch) & Marie Vendettuoli (Fred Hutch)

R/Pharma is excited to present a total of 16 workshops this year, hosted by members of our community. Zoom links will be sent to attendees a couple of days before each workshop.
R/Pharma is an amazing community, and all of these workshops are put on by volunteers at no cost. If you would like to contribute to a future workshop, please reach out to us through the contact page.
+
+
+
+
+
+
+
Oct 25 10:00-13:00 ET
+
+
Introduction to Julia
+
Hosted by Jose Storopoli (Univ Sao Paulo)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 25 13:00-15:00 ET
+
+
Shiny for Python
+
Hosted by Ryan Johnson (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 26 09:00-13:00 ET
+
+
Drill Down Summary Tables in Shiny with Tplyr
+
Hosted by Mike Stackhouse (Atorus)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 26 13:00-16:00 ET
+
+
Creating Submission-Quality Clinical Trial Reporting Tables in R With rtables
+
Hosted by Adrian Waddell and Gabe Becker (Roche)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 27 08:00-11:00 ET
+
+
How to build Shiny testing architecture
+
Hosted by Marcin Dubel (Appsilon)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 27 12:00-15:00 ET
+
+
How to use pointblank to understand, validate, and document your data
+
Hosted by Rich Iannone (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 28 09:00-12:00 ET
+
+
Using R to derive robust insights from real-world health care data
+
Hosted by Nathaniel Phillips (Plinth)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 28 13:00-16:00 ET
+
+
Coding Typical Clinical Processes Using R for the Entry Level Programmer
+
Hosted by Brian Varney (Experis)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 28 13:00-16:00 ET
+
+
Introduction to Observable - Companion Visualization Tool for Your Analysis
+
Hosted by Michael Freeman and Allison Horst (Observable)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 31 09:00-12:00 ET
+
+
Building a Data Science Community at Your Pharma
+
Hosted by Rachael Dempsey (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 31 10:00-14:00 ET
+
+
Clinical Reporting in R (2-day workshop: Oct 31 and Nov 1)
+
Hosted by Christina Fillmore (GSK), Ellis Hughes (GSK) and Thomas Neitmann (Roche)
+
+ Workshop Filled
+
+
+
+
+
+
+
+
+
Nov 01 10:00-13:00 ET
+
+
Admin - Clinical Environment Management
+
Hosted by Satish Murthy (J&J) and Devin Pastoor (RStudio)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 01 12:00-15:00 ET
+
+
Advanced Stan Workshop
+
Hosted by Daniel Lee (Syclik)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 02 09:00-12:00 ET
+
+
Introducing {gtreg}: an R package to produce regulatory tables for clinical research
+
Hosted by Shannon Pileggi and Daniel Sjoberg (Memorial Sloan Kettering Cancer Center)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 02 09:30-11:30 ET
+
+
Introduction to {shinyValidator}
+
Hosted by David Granjon (Novartis)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 03 06:00-08:00 ET
+
+
A hands-on introduction to Quarto
+
Hosted by Julia Mueller (University of Freiburg)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 03 09:00-12:00 ET
+
+
Building Production-Quality Shiny Applications
+
Hosted by Eric Nantz (Eli Lilly)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 03 13:00-16:00 ET
+
+
How to use a complete JavaScript toolchain in your Shiny development
+
Hosted by David Hall (Novartis)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 04 10:00-13:00 ET
+
+
An intro to CI/CD for R packages
+
Hosted by Dinakar Kulkarni (Roche) and Ben Straub (GSK)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Nov 04 14:00-17:00 ET
+
+
Graph Neural Networks in Drug Discovery: Opportunities and Solutions
+
Hosted by Zichen Wang and Vassilis N. Ioannidis (Amazon)

R/Pharma is excited to present a total of 16 workshops this year, hosted by members of our community. Zoom links will be sent to attendees a couple of days before each workshop.
R/Pharma is an amazing community, and all of these workshops are put on by volunteers at no cost. If you would like to contribute to a future workshop, please reach out to us through the contact page.
+
+
+
+
+
+
+
Oct 16 09:00-13:00 ET
+
+
Using pointblank Package to Ensure Maximal Data Quality: Pfizer Case Study
+
Hosted by Rich Iannone (Posit)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 16 12:00-14:00 ET
+
+
Building a Survival Model in Stan
+
Hosted by Daniel Lee (Bayesian Ops)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 16 14:00-16:00 ET
+
+
Observable Plot
+
Hosted by Allison Horst and Michael Freeman (Observable)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 17 10:00-14:00 ET
+
+
Version Control with Git & RStudio – Handy Tips for Use & How to Resolve Common Issues
+
Hosted by Alexandra Lauer (Merck KGaA, Darmstadt, Germany), Christina Fillmore (GSK), Irene Vassallo (Incyte) and Lyn Taylor (Parexel)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 17 14:00-16:00 ET
+
+
Visual Studio Code for Pharma: An Introduction
+
Hosted by Megan Chiang (Procogia)
+
+ Registration Closed
+
+
+
+
+
+
+
+
+
Oct 18 09:00-11:00 ET
+
+
Introduction to Machine Learning with {tidymodels}