Have you ever wished you could read the awesome stuff on geeksforgeeks.org on your
iPad? Or on your Kindle or in the Kindle App? Well, now you can. Look under the goodies directory,
and then go do the world some good with your algorithmic prowess ;)
Here's what the books look like in the iBooks App and the Kindle App on my iPad. They haven't been tested on an actual Kindle device yet.
Right now the following books are available under goodies:
- Tree Source: here
- Graph Source: here
- Array Source: here
- Recursion Source: here
- Backtracking Source: here
- Linked List Source: here
- Math Problems Source: here
- Greedy Algorithm Source: here
- Pattern Matching Source: here
- Divide and Conquer Source: here
- Dynamic Programming Source: here
- Advanced Data Structure Source: here
Want to create a book from the geeksforgeeks site yourself? No problem. You'll just need some tools to get started: apart from Python 2,
you'll need the following.
Scrapy is used to download webpages from geeksforgeeks. Its crawling rules make that super easy.
Install it with pip install scrapy.
So now you have the html files locally. But those files contain lots of stuff you don't want; you only want... the goodies. No problem. Check out boilerpipy: it removes the unnecessary parts, such as the header and the comments, leaving you with only the article itself. It works much like Pocket or Readability, which you might be familiar with.
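boilerpipy has its own extraction logic, and the sketch below neither uses nor reproduces its API. It's just a crude, standard-library-only illustration of the idea (Python 3): walk the HTML and keep the visible text, skipping everything nested inside tags that usually hold page chrome. The set of skipped tags is an assumption, not a description of how geeksforgeeks pages are actually laid out, and it won't catch things like a comment section that lives in an ordinary div.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "header", "footer", "nav", "aside"}


class NaiveArticleExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # HTML comments go to handle_comment (a no-op by default), so they
        # never reach this method and are dropped automatically.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the non-boilerplate text of an html page, one chunk per line."""
    parser = NaiveArticleExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)
```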
Pandoc is just super. Here it's used to convert html or markdown files to epub files, but it can do much more. For example, it's also super easy to generate pdf
versions of the books if you want. You should definitely check it out.
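If you'd rather drive pandoc from Python than from the shell, a minimal sketch (Python 3, assuming the pandoc binary is on your PATH) looks like this. pandoc picks the input and output formats from the file extensions, so the same helper covers html or markdown input and epub or pdf output. The helper names are hypothetical, not part of this project.

```python
import subprocess


def pandoc_command(source, output, toc=True):
    """Build a pandoc command line; formats are inferred from the extensions."""
    cmd = ["pandoc", source, "-o", output]
    if toc:
        cmd.append("--toc")  # ask pandoc to include a table of contents
    return cmd


def convert(source, output, toc=True):
    """Run pandoc; check=True raises CalledProcessError if pandoc fails."""
    subprocess.run(pandoc_command(source, output, toc), check=True)
```

For example, convert("book.md", "book.epub") builds an epub, and convert("book.md", "book.pdf") builds a pdf, which additionally needs a LaTeX engine installed.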
You'll need kindlegen to generate mobi files so you can read the books on your beloved Kindle or in the Kindle App. Download it from Amazon's site and install it.
Then just run kindlegen awesome.epub and it'll give you a file called awesome.mobi. Awesome, right?
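The same step from Python might look like the sketch below (Python 3). It assumes kindlegen is on your PATH and writes the mobi next to the epub with the same base name, as in the awesome.epub example. Rather than relying on kindlegen's exit status, it simply checks that the expected file showed up. The function names are hypothetical.

```python
import os
import subprocess


def mobi_path(epub_path):
    """Expected output: same directory and base name, .mobi extension."""
    root, _ = os.path.splitext(epub_path)
    return root + ".mobi"


def epub_to_mobi(epub_path):
    """Run kindlegen on epub_path and return the path of the mobi file."""
    subprocess.run(["kindlegen", epub_path])
    out = mobi_path(epub_path)
    # Trust the file on disk rather than the exit status.
    if not os.path.exists(out):
        raise RuntimeError("kindlegen did not produce " + out)
    return out
```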
Go to the geeksforgeeks subdirectory and run a command like scrapy crawl geeksforgeeks -a category=category -a name=name.
For example, running scrapy crawl geeksforgeeks -a category=tag -a name=pattern-searching
will crawl starting from the page http://www.geeksforgeeks.org/tag/pattern-searching/.
category and name are the two arguments our spider takes. On geeksforgeeks, articles can be organized either by tag
or by category. Specify which of the two it is, plus its name, and Scrapy will do the rest for you.
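Scrapy hands every -a key=value option to the spider's constructor as a keyword argument. The sketch below shows how category and name could be turned into the start page. It's a plain Python class (the name GeeksSpiderArgs is hypothetical) so that it runs without Scrapy; the real spider would subclass one of Scrapy's spider classes and add the crawling rules.

```python
BASE_URL = "http://www.geeksforgeeks.org"


class GeeksSpiderArgs(object):
    """Hypothetical stand-in for the spider's __init__ (no Scrapy needed).

    Scrapy would call it as GeeksSpiderArgs(category="tag",
    name="pattern-searching") for the example command.
    """

    def __init__(self, category=None, name=None, **kwargs):
        # Any other -a options end up in kwargs and are ignored here.
        if category not in ("tag", "category"):
            raise ValueError("category must be 'tag' or 'category'")
        if not name:
            raise ValueError("a name such as 'pattern-searching' is required")
        self.category = category
        self.name = name
        # Matches the URL pattern http://www.geeksforgeeks.org/tag/pattern-searching/
        self.start_urls = ["%s/%s/%s/" % (BASE_URL, category, name)]
```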
Following the example above, now go into the makethebook
subdirectory, where you should find a directory called pattern-searching.
Now run python generate_book.py pattern-searching. It will first clean the html files, then concatenate the cleaned files into one, then use pandoc
to create an epub file from the markdown file. Finally, a mobi file is created using kindlegen.
Yay! Done!
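generate_book.py's actual code isn't reproduced here, but its concatenation step can be sketched roughly as follows (Python 3). The *.md pattern, the sorted filename order and the blank-line separators are all assumptions made for illustration, and concatenate is a hypothetical helper.

```python
import glob
import os


def concatenate(src_dir, out_path, pattern="*.md"):
    """Join the files in src_dir matching pattern into a single out_path.

    Files are joined in sorted filename order with a blank line between
    them, so each article starts a new block in the combined document.
    Returns the list of input files that were used.
    """
    parts = sorted(glob.glob(os.path.join(src_dir, pattern)))
    with open(out_path, "w", encoding="utf-8") as out:
        for i, path in enumerate(parts):
            if i:
                out.write("\n\n")  # blank line between articles
            with open(path, encoding="utf-8") as f:
                out.write(f.read().strip())
        out.write("\n")
    return parts
```

Worth knowing as a design alternative: pandoc itself concatenates multiple input files (with blank lines between them) before parsing, so passing all the cleaned files straight to pandoc would also work.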
The encoding isn't handled well yet. Once in a while you'll spot some gibberish (mostly caused by ‘ and ’). While it won't affect your understanding much, it's quite annoying.
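Assuming the gibberish is the classic mix-up where UTF-8 text gets decoded as Windows-1252 (so ’ shows up as â€™; the exact symptoms aren't documented here), a small clean-up pass like this sketch (Python 3) could help. Both functions are hypothetical, not part of the project.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Windows-1252 (e.g. 'â€™' -> '’').

    If the text doesn't round-trip cleanly (it wasn't this kind of
    mojibake), it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Map curly single quotes (U+2018, U+2019) to a plain ASCII apostrophe.
STRAIGHTEN = {0x2018: "'", 0x2019: "'"}


def straighten_quotes(text):
    """Replace ‘ and ’ with ' so no later encoding step can mangle them."""
    return text.translate(STRAIGHTEN)
```

fix_mojibake only repairs text that round-trips cleanly; text that mixes mojibake with correctly decoded characters typically comes back unchanged. straighten_quotes is meant for the other end of the pipeline: run early, on correctly decoded text, it leaves no curly quotes for a later step to garble.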
I've only worked on this project for a few days since I had the idea, so it has huge room for improvement. It's also the first time I've used Scrapy
and pandoc.
You can contribute in many ways. Besides contributing code to this project, you are more than welcome to contribute in the following ways.
Every tag or category on geeksforgeeks can be turned into a book, so you're welcome to add more books.
The style for generating epub books lives under the styles
subdirectory. Feel free to submit your own style sheets.
You can also make and submit cover images for the books so that pandoc
can use them when generating epub
files. Right now the books don't have any.
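Cover support would most likely come from pandoc's --epub-cover-image option. The sketch below (Python 3) adds that option only when a cover file actually exists. The helper name is hypothetical, and the covers/<book>.png layout in the usage note is an assumed convention, since the project doesn't have covers yet.

```python
import os


def epub_command(markdown_path, epub_path, cover_path=None):
    """Build a pandoc command line for an epub, with an optional cover image."""
    cmd = ["pandoc", markdown_path, "-o", epub_path]
    # Only pass the option when the image is really there; otherwise pandoc
    # would likely complain about a missing file.
    if cover_path and os.path.isfile(cover_path):
        cmd.append("--epub-cover-image=" + cover_path)
    return cmd
```

For example, epub_command("pattern-searching.md", "pattern-searching.epub", "covers/pattern-searching.png") adds the cover if that file exists and quietly builds a cover-less epub otherwise.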
The content on Geeksforgeeks is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.5 India. See the license here.
The code in this project is licensed under Apache License, Version 2.0. See the license here.
Jing Zhou, gnijuohz at gmail.com