Google Summer of Code 2021 Ideas
This page contains ideas for Google Summer of Code (GSoC) 2021.
Ruby 3.0 was released in December 2020 and is an exciting major upgrade of a great programming language! Ruby is used in industry and science. Ruby is a great language for learning and boosting your productivity. Writing code in Ruby is fun. Joining the SciRuby Google Summer of Code will make you a better programmer.
Our projects aim to make Ruby a better environment for science. Even though Python has become the most prominent language for science, we think that Ruby is the superior language and makes students (and mentors!) better programmers. We encourage diversity, and the SciRuby project consists of friendly people from many backgrounds and nationalities. SciRuby is a growth organization: students become mentors and mentors become org admins (including this year's Udit Gulati), and almost every year our GSoC students have gone on to win additional awards and funding for their projects. We have a code of conduct, which can be found here.
With the new GSoC format, projects are limited to about 180 hours of work. We have taken care to adjust tasks, but it is often hard to predict how long something will take. During GSoC we'll adjust the program accordingly. Mentors are reachable through the mailing list of the GNU Guix project. Also, feel free to contact us individually.
- Pangenomes for Ruby
- NMatrix/NumRuby projects
- Making daru-view independent
- Improvements & Enhancement in Daru
- Ruby with machine learning Rust
- Technical Analysis with Ruby
- Ruby and the common workflow language (CWL)
- Binding SciRuby against HPC Rust libraries for artificial intelligence, linear algebra etc.
- Apply gaming technology with the pangenome explorer using graphics on GPU
- Ruby for scientific publications
- Improve Ruby wrapper for SymEngine
- Ruby wrapper for Shogun - Machine learning library
You can join us in the #sciruby channel on chat.freenode.net or via our mailing list.
IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our program. We are happy to hear from you all!
We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea of what features are needed. These projects are not carved in stone; we can still adapt them to your ideas. Note that the new GSoC is much shorter than in previous years, so projects should fit the timeline.
You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.
In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com to sign up) and we can help you.
See also:
The main SciRuby landing page on GitHub mostly holds the stable versions of the SciRuby gems, but developers and contributors should work on the very latest (bleeding edge) repositories to make sure that changes can be committed without conflicts arising.
Try reading Finding The SciRuby Development Repositories on Github if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.
Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/
Have a look and feel free to ask if you have any questions.
Guidelines for mentors to submit projects:
- Specify the name of your project as a heading.
- Write a paragraph or two with further details.
- Write a small 'Skills' section detailing the skills that the student must possess to complete the project.
- Write down your own GitHub handle and contact details in a 'Mentor Details' section over which the student can contact you.
- If anyone else wants to co-mentor a project, please specify your details along with the mentor's details.
Usually C-extensions are written for speed. Rust is a safe alternative that can reach comparable speeds and has high level abstractions for multi-core programming.
The student will work on pangenome functionalities, optimize them, document them and provide a path for similar exercises that can be done by others. If you want to know more about pangenomes and why they are at the cutting edge of research in (human) genetics: watch the talk by Erik Garrison.
Skills: Interest in multi-languages, high performance computing, C, Rust etc.
Difficulty: Advanced (indeed)
Mentor: @pjotrp, @george-githinji, @chfi, @ekg
NumRuby is a successor of NMatrix. NumRuby is a linear algebra library for Ruby that is highly performance oriented.
- Add serialization support.
- Make slicing return a view instead of copying data.
- Fix broadcasting.
- Implement random engine.
- Release NumRuby gem.
- Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
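The difference between a view and a copy in the slicing task can be illustrated in plain Ruby. This is only a minimal sketch: `VectorView` and its methods are hypothetical names invented here and are not part of NumRuby. A view keeps a reference to the parent's storage plus an offset and a length, so slicing allocates no new element storage and writes are visible through the parent.

```ruby
# Hypothetical illustration only: this class is not part of NumRuby.
# A view shares the parent's storage and remembers an offset and a length.
class VectorView
  attr_reader :length

  def initialize(storage, offset, length)
    @storage = storage   # shared with the parent, never duplicated
    @offset  = offset
    @length  = length
  end

  def [](i)
    raise IndexError, "index #{i} out of bounds" unless (0...@length).cover?(i)
    @storage[@offset + i]
  end

  def []=(i, value)
    raise IndexError, "index #{i} out of bounds" unless (0...@length).cover?(i)
    @storage[@offset + i] = value
  end

  # Slicing a view yields another view over the same storage.
  def slice(start, len)
    raise IndexError, "slice out of bounds" if start < 0 || len < 0 || start + len > @length
    VectorView.new(@storage, @offset + start, len)
  end

  # An explicit copy is still available when independent data is needed.
  def to_a
    @storage[@offset, @length]
  end
end

data   = [10, 20, 30, 40, 50]
whole  = VectorView.new(data, 0, data.length)
middle = whole.slice(1, 3)   # covers 20, 30, 40 without copying
middle[0] = 99               # writes through to the shared storage
```

After the last line, `data[1]` is `99` because `middle` never owned its own elements. A real implementation would do the same bookkeeping (offset, shape, strides) in C over a shared buffer.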
Currently, NumRuby uses OpenBLAS for matrix, vector products. A user should also be able to use other BLAS implementations such as Intel MKL.
- Decouple NumRuby code from OpenBLAS.
- Implement generic code for BLAS library interface.
- Make sure that the library works as expected with different BLAS implementations, and write tests for this.
- Write benchmarking code for the same.
- Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
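One possible shape for the generic BLAS interface is a small registry of interchangeable backends. The sketch below is an assumption about how the decoupling could look, not NumRuby's design: `BlasBackend`, `ReferenceBackend`, `register` and `fetch` are all hypothetical names, and the pure-Ruby backend merely stands in for a C binding to OpenBLAS or Intel MKL.

```ruby
# Hypothetical sketch: none of these names come from NumRuby.
module BlasBackend
  @registry = {}

  class << self
    # Register a backend object under a symbolic name.
    def register(name, backend)
      @registry[name] = backend
    end

    # Look up a backend; fail loudly if it was never registered.
    def fetch(name)
      @registry.fetch(name) { raise ArgumentError, "unknown BLAS backend: #{name}" }
    end
  end
end

# Pure-Ruby reference backend. A real backend would call into OpenBLAS,
# Intel MKL, or another implementation through a C extension instead.
class ReferenceBackend
  # Dot product of two equally sized vectors.
  def dot(x, y)
    raise ArgumentError, "length mismatch" unless x.length == y.length
    x.zip(y).sum { |a, b| a * b }
  end

  # Matrix product of row-major nested arrays: (m x k) * (k x n).
  def gemm(a, b)
    raise ArgumentError, "shape mismatch" unless a.first.length == b.length
    columns = b.transpose
    a.map { |row| columns.map { |col| dot(row, col) } }
  end
end

BlasBackend.register(:reference, ReferenceBackend.new)

backend = BlasBackend.fetch(:reference)
product = backend.gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because callers only depend on `dot` and `gemm`, the same test suite and benchmark code can be run against every registered backend, which is what the testing and benchmarking items above ask for.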
NumRuby is for dense matrix computation. For sparse matrix computation, we have Ruby-Sparse. Ruby-Sparse is a relatively new project with a lot of potential. Sparse matrices are well suited to graph algorithms. We currently don't have any support for graph algorithms, and it would be quite useful to have this in the library itself.
- Read and understand the most used graph algorithms. Come up with efficient implementations for these algorithms.
- Implement the most basic graph algorithms for CSR/CSC and DIA sparse types.
- Implement some of the frequently used advanced graph algorithms for CSR only.
- Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
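To show why CSR suits graph algorithms, here is a minimal breadth-first search over a CSR adjacency matrix. It is a sketch under stated assumptions: plain Ruby arrays named `row_ptr` and `col_idx` stand in for the matrix, and `bfs_distances` is a hypothetical helper, not part of the Ruby-Sparse API.

```ruby
# Hypothetical sketch: plain arrays stand in for a CSR matrix; this is not
# the Ruby-Sparse API.
# Adjacency matrix of a directed graph with edges 0->1, 0->2, 1->3, 2->3:
# row_ptr[i]...row_ptr[i + 1] indexes the neighbours of vertex i in col_idx.
row_ptr = [0, 2, 3, 4, 4]
col_idx = [1, 2, 3, 3]

# Breadth-first search returning the hop distance from source to every
# vertex (nil for vertices that cannot be reached).
def bfs_distances(row_ptr, col_idx, source)
  vertex_count = row_ptr.length - 1
  distance = Array.new(vertex_count)
  distance[source] = 0
  queue = [source]

  until queue.empty?
    vertex = queue.shift
    (row_ptr[vertex]...row_ptr[vertex + 1]).each do |k|
      neighbour = col_idx[k]
      next unless distance[neighbour].nil?
      distance[neighbour] = distance[vertex] + 1
      queue.push(neighbour)
    end
  end

  distance
end

distances = bfs_distances(row_ptr, col_idx, 0)
```

The neighbours of a vertex are one contiguous slice of `col_idx`, so each edge is visited once and no dense row is ever scanned.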
Block Compressed Row (BSR) is a sparse matrix format that has recently been used quite frequently in scientific work and has therefore been implemented in most of the major sparse libraries.
- Provide BSR sparse matrix support.
- Implement efficient conversion with dense matrix libraries like NumRuby and Numo-NArray.
- Implement efficient conversion with other sparse implementations (CSR, CSC, DIA).
- Mentors: Prasun Anand(@prasunanand), Udit Gulati(@uditgulati)
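As a rough picture of the dense-to-BSR conversion, the sketch below splits a dense nested array into fixed-size blocks and stores only the blocks that contain a non-zero value. The function name `dense_to_bsr` and the returned hash keys (`data`, `indices`, `indptr`) are assumptions chosen for this example, not an existing Ruby-Sparse interface.

```ruby
# Hypothetical sketch: converts a dense nested array into BSR-style arrays.
# Returns block values, block column indices, and block row pointers.
def dense_to_bsr(dense, block_rows, block_cols)
  rows = dense.length
  cols = dense.first.length
  unless (rows % block_rows).zero? && (cols % block_cols).zero?
    raise ArgumentError, "matrix shape must be a multiple of the block shape"
  end

  data = []      # one block_rows x block_cols nested array per stored block
  indices = []   # block column index of each stored block
  indptr = [0]   # indptr[i]...indptr[i + 1] spans the blocks of block row i

  (0...rows).step(block_rows) do |r|
    (0...cols).step(block_cols) do |c|
      block = dense[r, block_rows].map { |row| row[c, block_cols] }
      next if block.flatten.all?(&:zero?)   # all-zero blocks are not stored
      data << block
      indices << c / block_cols
    end
    indptr << data.length
  end

  { data: data, indices: indices, indptr: indptr }
end

dense = [
  [1, 2, 0, 0],
  [3, 4, 0, 0],
  [0, 0, 0, 5],
  [0, 0, 6, 0]
]
bsr = dense_to_bsr(dense, 2, 2)
```

With 2x2 blocks, only two of the four blocks are stored here. Note that a stored block keeps its explicit zeros, which is the trade-off BSR makes in exchange for dense, cache-friendly block arithmetic.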
Learn the basics of daru-view from sciruby/blog or daru-view/wiki.
Daru (Data Analysis in RUby) is a library for analysis, manipulation and visualization of data. daru-view is for easy and interactive plotting in web applications and IRuby notebooks. It can work in frameworks like Rails, Sinatra and Nanoc, and hopefully in others too.
It is a plugin gem for Data Analysis in RUby (Daru) for visualisation of data.
Currently daru-view depends on lazy_high_charts and googlevisualr, over which SciRuby doesn't have any control. The main problems we have solved so far are:
- daru dataframe or vector compatible plotting gem.
- a gem that can work smoothly in any Ruby web application framework, IRuby notebook as well as terminal.
So now it is time to become independent, because:
- We don't have much control over these gems, and we will keep adding new features directly from the official HighCharts and Google Charts sites.
- We have extended (overloaded and overridden) most of the methods from lazy_high_charts and googlevisualr, either to make them compatible with IRuby notebooks and all Ruby frameworks or to add chart features already present in HighCharts and Google Charts.
- daru-view should be able to handle future chart types with little or no modification of the codebase.
You can find more details on the wiki page 'Making daru-view independent'. Along with this, we also want to consider the new ideas written on the Idea wiki page:
- daru-view/wiki/ideas
- Discussion in sciruby mailing thread
- Shekhar's blog post: GSoC 2017
- GSoC 2018 Progress Report
- GSoC 2018 discussion
- Skills: Basic knowledge of Ruby, Design pattern and Design Principles, Javascript and Ruby web application frameworks.
- Mentors: Shekhar (@Shekharrajak), Sameer (@v0dro)
- Difficulty: Moderate.
daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby. It has various features, such as:
- Flexible and intuitive API for manipulation and analysis of data.
- Easy plotting, statistics and arithmetic.
- Easy splitting, aggregation and grouping of data.
- Quickly reducing data with pivot tables for a quick data summary, and so on.
You can find most of the examples here.
While it has many methods for data wrangling, it is slow for a lot of use cases (check out these benchmarks). This task will involve figuring out the slow areas of daru and porting them either to Rubex, which is a language for writing C extensions for Ruby, or to a plain Ruby C extension.
- The student needs to benchmark various daru methods and check how Ruby C bindings can deliver a significant performance boost.
- List out features that are essential for data science and not present in daru currently.
- How can we improve the performance using parallel programming in Ruby?
- How can we remove the visualization and I/O APIs from daru and use the daru-view and daru-io plugin gems instead?
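For the benchmarking task, Ruby's standard `Benchmark` module is a reasonable starting point. The sketch below is only an illustration of the method: plain arrays stand in for a daru vector so that it runs without any gems, and the two `grouped_sum_*` functions are hypothetical candidates, not daru methods. In practice the reports would wrap the daru calls under investigation.

```ruby
require "benchmark"

# Hypothetical workload: a column of 200_000 integers grouped by key. Plain
# arrays stand in for a daru vector so the sketch runs without any gems.
keys   = Array.new(200_000) { |i| i % 100 }
values = Array.new(200_000) { |i| i }

# Candidate A: build the groups first, then sum each group.
def grouped_sum_via_group_by(keys, values)
  keys.zip(values)
      .group_by(&:first)
      .transform_values { |pairs| pairs.sum { |_, v| v } }
end

# Candidate B: accumulate the sums in a single pass.
def grouped_sum_single_pass(keys, values)
  sums = Hash.new(0)
  keys.each_with_index { |key, i| sums[key] += values[i] }
  sums
end

# Check that both candidates agree before timing them.
raise "results differ" unless grouped_sum_via_group_by(keys, values) ==
                              grouped_sum_single_pass(keys, values)

# bmbm runs a rehearsal pass first to reduce warm-up and GC noise.
Benchmark.bmbm do |bench|
  bench.report("group_by")    { 5.times { grouped_sum_via_group_by(keys, values) } }
  bench.report("single pass") { 5.times { grouped_sum_single_pass(keys, values) } }
end
```

Checking that the candidates return the same result before timing them avoids comparing a fast but wrong implementation against a slow but correct one.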
Why this project is important:
- SciRuby is planning a powerful and fast machine learning gem that will be completely compatible with the daru and nmatrix gems. So we have to make daru faster and more powerful accordingly. We need to find a solution using nmatrix as well.
- If we want to improve Ruby for data science usage, we have to keep daru's features and its API up to date with current needs.
- We already have plugin gems for visualization and I/O operations, which are stable and functional. So we may now think about removing these from daru and using daru-io and daru-view instead.
Other tasks
Related links
More about daru
Skills: Experience in data analysis | Experience in Ruby and C | General understanding of how compilers work | Understanding of good benchmarking practices
Difficulty: Advanced
Mentor: @v0dro, Shekhar (@Shekharrajak)
Interested students should take a look at https://github.com/rivella50/talib-ruby . We are going to build on top of it.
Skills required: Maths, Statistics, Finance, C and Ruby
Difficulty: Medium
Mentor: @prasunanand, @uditgulati
CWL is a specification for building pipelines of tools. The configuration is in YAML. We would like to create a Ruby DSL that can generate these YAML definitions so we have an elegant way of deploying workflows on compute clusters. CWL is used, for example, in COVID-19 PubSeq.
The student will create a number of pre-agreed functionalities, optimize them, document them and provide a path for similar exercises that can be done by others.
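To make the idea of a Ruby DSL that emits CWL concrete, here is a minimal sketch using only the standard `yaml` library. The DSL itself is an assumption: `CommandLineToolBuilder`, `command_line_tool`, `input`, `stdout` and `to_cwl` are names invented for this example. Only the emitted field names (`cwlVersion`, `class`, `baseCommand`, `inputs`, `inputBinding`, `stdout`, `outputs`) follow the CWL specification.

```ruby
require "yaml"

# Hypothetical DSL: CommandLineToolBuilder and its methods are invented for
# this sketch; only the emitted field names follow the CWL specification.
class CommandLineToolBuilder
  def initialize(base_command)
    @doc = {
      "cwlVersion"  => "v1.2",
      "class"       => "CommandLineTool",
      "baseCommand" => base_command,
      "inputs"      => {},
      "outputs"     => {}
    }
  end

  # Declare an input parameter bound to a command line position.
  def input(name, type:, position:)
    @doc["inputs"][name.to_s] = {
      "type"         => type,
      "inputBinding" => { "position" => position }
    }
  end

  # Redirect the tool's standard output to a file and expose it as an output.
  def stdout(name, file:)
    @doc["stdout"] = file
    @doc["outputs"][name.to_s] = { "type" => "stdout" }
  end

  # Serialize the collected description as YAML text.
  def to_cwl
    YAML.dump(@doc)
  end
end

# Evaluate the block against a fresh builder and return the YAML text.
def command_line_tool(base_command, &block)
  builder = CommandLineToolBuilder.new(base_command)
  builder.instance_eval(&block)
  builder.to_cwl
end

cwl = command_line_tool("echo") do
  input :message, type: "string", position: 1
  stdout :out, file: "output.txt"
end

puts cwl
```

Using `instance_eval` keeps the tool description free of receiver noise, at the cost of hiding the caller's `self` inside the block. A real DSL would also need workflows, steps and validation against the CWL schema.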
Skills: Interest in DSLs, workflows, parallel computing.
Difficulty: Average
Mentor: @pjotrp, @george-githinji, @mr-c
Usually C-extensions are written for speed. Rust is a safe alternative that can reach comparable speeds and has high level abstractions for multi-core programming. In SciRuby we love all languages that start with the letter R.
The student will bind a number of pre-agreed functionalities, optimize them, document them and provide a path for similar exercises that can be done by others. Software deployment of mixed languages often proves difficult.
Skills: Interest in multi-languages, high performance computing, C, Rust etc.
Difficulty: Advanced (indeed)
Mentor: @pjotrp, @george-githinji, @chfi
~1min video showing a yeast pangenome: https://youtu.be/TOJZeeCqatk
gfaestus is a Vulkan-accelerated GFA visualization tool for pangenomes. Help add Ruby bindings and Rust functionality for the pangenome explorer. Check out this online video
Vulkan is a low-overhead, cross-platform 3D graphics and computing API. Vulkan targets high-performance realtime 3D graphics applications such as video games and interactive media across all platforms. Compared to OpenGL, Direct3D 11 and Metal, Vulkan is intended to offer higher performance and more balanced CPU/GPU usage.
Code is at https://github.com/chfi/gfaestus
Skills: Interest in multi-languages, GPU graphics, FPS, C, Rust etc.
Difficulty: Advanced (indeed)
Mentor: @chfi, @ekg, @pjotrp and others on matrix/element pangenome groups
The backend for the Journal of Open Source Software (JOSS) is written in Ruby and makes full use of the GitHub API. The full workflow is based on the GitHub issue tracker. In this project we want to refactor the source code and make it flexible so that it can target multiple backends and be built on a fully free software stack. Ruby is ideal for web programming, and this work is embedded in development happening for JOSS. With this publication-oriented software we also target other journals, such as BiohackrXiv. For the existing code base, see https://github.com/openjournals/ (the Whedon and whedon-api repositories).
Skills: Interest in Ruby, web programming, Github API and scientific publishing
Difficulty: Moderate
Mentor: @pjotrp, @ktym, @arfon, members of @openjournals
A project started by the SymPy organisation, SymEngine is a standalone fast C++ symbolic manipulation library.
It solves mathematical problems the same way a human does, but way more quickly and precisely. The motivation for SymEngine is to develop the Computer Algebra System once in C++ and then use it from other languages rather than doing the same thing all over again for each language that it is required in.
The project for Ruby bindings has already been set up at symengine.rb. A few things that the project involves are:
- Extending the C interface of SymEngine library.
- Wrapping up the C interface for Ruby using Ruby C API, including error handling.
- Designing the Ruby interface.
- Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
- Integrating the gem with existing gems like gmp, mpfr and mpc.
- Making the installation of the symengine gem easier.
You can find the same idea in SymPy Idea-list here
Important links:
- GSoC 2016 report
- GSoC 2015 work
Recommended skills: You should be comfortable with C/C++ and familiar with Ruby. Refer to the wiki to get started.
Mentors: Co-mentor @Shekharrajak and @pjotrp
Shogun is an open-source machine learning library that offers a wide range of efficient and unified machine learning methods. It is written in C++ and provides a Ruby wrapper as well.
We plan to make it compatible with SciRuby's data science gems, such as daru, daru-io and daru-view, nmatrix, rubyplot, distribution, statsample and any others that are useful for data science projects.
Ongoing discussion is happening here: #4814.
SciRuby and Shogun team will be collaborating to make it happen.
Potential mentor: @prasunanand Co-mentor: @shekharrajak
If you have a completely different idea in mind, first start a discussion thread about it on the mailing list. The SciRuby team will surely look into it, and the idea may be improved during the discussion so that it can be selected for the GSoC period.
The best project for you is one you are interested in and are knowledgeable about. That way, you will be the most successful and productive in your project and have the most fun doing it, while we will be the most confident in your commitment and your ability to complete it.
Please use the idea template below to propose ideas:
Idea
(project idea, how it will help Ruby community and future of the project)
Current status of the idea
(Describe the work that has been done and timeline)
Involved Software and technology
Difficulty
(Advanced, Intermediate, or Beginner and any specific comments on the difficulty)
Skills and Knowledge required
(Any prerequisite knowledge or approach needed)