CVPR Accepted Papers Viewer

NOW LIVE FOR 2020!

The main goal of these scripts is to build a page that displays the accepted papers for CVPR 2020 in a way that is easier for humans to parse (see: https://mattdeitke.com/CVPR-2020). Below is an example of what this repository will display, and following that is what CVPR open access currently shows.

In particular, the scripts cluster papers by latent Dirichlet allocation (LDA) topics, create thumbnail images from the first 8 pages of each PDF, extract the abstracts, let you copy a BibTeX entry, link to the paper and supplementary material, and more. The scripts use Python 3.7 and should work for any past or future CVPR conference, unless the open access site changes how it displays its pages. They can also be modified to support another conference.
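As a rough illustration of what clustering by LDA topics can mean, the sketch below scores one paper against each topic using a topic-word probability matrix and picks the best match. It is a minimal sketch on toy data, not the code in this repository: the function name score_topics, the matrix layout (one row per topic, one column per vocabulary word), and the toy numbers are all assumptions made for the example.

```python
import math


def score_topics(phi, word_ids):
    """Return a normalized score per topic for one paper.

    phi      -- list of rows, one per topic; phi[k][w] is the probability
                of vocabulary word w under topic k (assumed layout)
    word_ids -- vocabulary indices of the words that occur in the paper
    """
    # Sum log-probabilities of the paper's words under each topic.
    # The small constant avoids log(0) for words a topic never emits.
    log_scores = [
        sum(math.log(row[w] + 1e-12) for w in word_ids) for row in phi
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data: 2 topics over a 3-word vocabulary (hypothetical values).
toy_phi = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
paper_word_ids = [0, 0, 1]  # the paper uses word 0 twice and word 1 once

scores = score_topics(toy_phi, paper_word_ids)
best_topic = scores.index(max(scores))
print("topic scores:", scores)
print("best topic:", best_topic)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once a paper contributes thousands of words rather than three.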

Installation

1. Clone this repository: git clone https://github.com/mattdeitke/CVPR2019
2. Save the HTML of the page that lists the accepted papers. For CVPR 2020, that is http://openaccess.thecvf.com/CVPR2020.py.
3. Run download_pdfs.py to download all the PDFs into the content/ folder. (Note: you will need to point the script from cvpr2020oar.html to wherever you saved the HTML file in step 2.)
4. Run getabstracts.py to generate the abstract files inside the abstracts/ folder.
5. Install ImageMagick, either with sudo apt-get install imagemagick or with another supported method such as brew install imagemagick.
6. Run pdftowordcloud.py to generate the top words for each paper. The output is saved in topwords.p.
7. Run pdftothumbs.py to generate tiny thumbnails for all papers. The outputs are saved in the thumbs/ folder.
8. Run scrape.py to generate the paper ID, title, and author list for each paper by scraping the cvpr2020oar.html page.
9. Run makecorpus.py to create the allpapers.txt file, which contains all papers (one per row).
10. Run python lda.py -f allpapers.txt -k 7 --alpha=0.5 --beta=0.5 -i 100. This generates a pickle file called ldaphi.p that contains the LDA word distribution matrix. Thanks to this nice LDA code by @shuyo! It requires the nltk and numpy libraries. This example uses 7 categories; to try a different number of categories, you would need to change the cvprnice_template.html file a bit.
11. Finally, run generatenicelda.py to create the index.html page.
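
The script steps above can be chained in a small driver, sketched below. This is a hedged sketch rather than part of the repository: the helper names build_commands and run_pipeline are hypothetical, and it assumes every script runs without arguments from the repository root (only lda.py takes the flags quoted above), that the HTML page has already been saved, and that ImageMagick is already installed.

```python
import subprocess
import sys

# Script order taken from the installation steps; each script is assumed
# to take no command-line arguments.
PIPELINE_SCRIPTS = [
    "download_pdfs.py",
    "getabstracts.py",
    "pdftowordcloud.py",
    "pdftothumbs.py",
    "scrape.py",
    "makecorpus.py",
]

# Arguments for the LDA step, copied from the installation steps.
LDA_ARGS = [
    "lda.py", "-f", "allpapers.txt", "-k", "7",
    "--alpha=0.5", "--beta=0.5", "-i", "100",
]


def build_commands(python=sys.executable):
    """Return the full list of commands, in the order they should run."""
    commands = [[python, script] for script in PIPELINE_SCRIPTS]
    commands.append([python] + LDA_ARGS)
    commands.append([python, "generatenicelda.py"])
    return commands


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(cmd, check=True)


# To run the whole pipeline from the repository root:
# run_pipeline(build_commands())
```

Stopping at the first failure is deliberate, since each later step reads files written by an earlier one (for example, lda.py reads the allpapers.txt produced by makecorpus.py).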

Acknowledgements

Big thanks to @karpathy for his NeurIPS preview and arXiv Sanity Preserver, which this repository builds on! Thanks also to @tholman for creating a more modern GitHub Corners and to @shuyo for the LDA code! Finally, thanks to the people at CVPR for openly publishing all of their accepted papers!

Licence

MIT License

Divs1159/CVPR-Accepted-Papers-Viewer
