
Support Automatic Differentiation #398

Open
NickSeagull opened this issue Oct 6, 2017 · 29 comments

@NickSeagull

NickSeagull commented Oct 6, 2017

If we had AD, Accelerate would be ready for creating Deep Learning frameworks on top of it.

I've found a very nice post explaining a way of implementing it in Rust and Python.

Implementing it inside Accelerate itself would be ideal, as every Accelerate program could then be passed through AD.

The source of the obsolete rad package by Edward Kmett is quite simple, so reimplementing something like it might be straightforward. (http://hackage.haskell.org/package/rad-0.1.6.3/docs/src/Numeric-RAD.html)
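
For a sense of how little machinery the core idea needs, here is a minimal sketch of forward-mode AD via dual numbers (note that rad itself is reverse-mode, and diffDual below is my own illustrative name; this is only the simplest flavour of the technique):

-- A dual number pairs a value with its derivative.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)  -- product rule
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs    (Dual x dx)    = Dual (abs x) (dx * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Differentiate f at x by seeding the tangent component with 1.
diffDual :: (Dual -> Dual) -> Double -> Double
diffDual f x = tangent (f (Dual x 1))

-- diffDual (\x -> x * x + 3) 4  ==  8.0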

@JonathanFraser

Have you checked out the ad package on Hackage?

import qualified Prelude as P

import Numeric.AD                                  as AD
import Data.Array.Accelerate                       as Acc
import qualified Data.Array.Accelerate.LLVM.PTX    as GPU
import qualified Data.Array.Accelerate.LLVM.Native as CPU  -- alternative backend: CPU.run

-- The array [1, 2, .., length]
range :: Exp Int -> Acc (Array DIM1 Double)
range length = generate (index1 length) (\ix -> fromIntegral (unindex1 ix + 1))

-- length sample points spread evenly over (0, 2*pi]
value :: Exp Int -> Acc (Array DIM1 Double)
value length = Acc.map (\v -> 2 * pi * v / fromIntegral length) (range length)

-- AD.diff differentiates the scalar function cos; this typechecks because
-- Exp Double has Num/Floating instances, so ad builds the derivative
-- directly into the Accelerate expression.
derivative :: Exp Int -> Acc (Array DIM1 Double)
derivative length = Acc.map (AD.diff cos) (value length)

res :: Array DIM1 Double
res = GPU.run (derivative 100)

main :: P.IO ()
main = P.putStrLn (P.show res)

This code appears to run correctly and successfully take the derivative.

@NickSeagull
Author

I thought that AD didn't work with Accelerate. This is very interesting, thanks @JonathanFraser 😄

@alex404

alex404 commented Jan 22, 2018

I thought I should point out that the ad library won't work with Accelerate in general. The above example only works because the differentiation is applied directly to a simple scalar function (cos), and not to a function that involves Accelerate itself. In general, to take the gradient of a function over some container with ad, that container has to be an instance of Traversable, and its elements have to belong to a numeric typeclass (e.g. RealFloat). As far as I can tell, this rules out combining Accelerate and ad in general.
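
To make the constraint concrete, here is a small example of my own using ad's grad over a plain list; an Acc (Array DIM1 Double) is an embedded array computation rather than a Traversable container of numbers, so it cannot take the list's place here:

import Numeric.AD (grad)

-- A scalar function of any Traversable container of numeric inputs
sumOfSquares :: Num a => [a] -> a
sumOfSquares = sum . map (^ 2)

main :: IO ()
main = print (grad sumOfSquares [1, 2, 3 :: Double])  -- prints [2.0,4.0,6.0]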

I've developed a pretty large neural network library in Haskell for my scientific work, and I've written my own linear algebra libraries so that I can use ad. I'm eagerly awaiting AD in Accelerate so I can throw out my own implementation and make use of GPU computation. Unfortunately I don't have time to develop this on my own, but perhaps it would be a good proposal for a Google/Haskell Summer of Code?

@tmcdonell
Member

I would be happy to help mentor a student with the Accelerate-related aspects, but would need a co-mentor who understands the AD-related aspects.

I think the risk here (as a short-term project) is that there are an unknown number of design decisions required.

As far as I understand, the ad package is not currently set up to use unboxed vectors / BLAS routines. If it were, we would have a pretty solid model to follow; as it is, the first problem this project faces is one of design rather than implementation, and I have no idea where that problem falls on the scale of trivial to impossible. People familiar with the ad package or similar libraries surely have a better idea of this, and I'd love to hear from them.

@alex404

alex404 commented Jan 23, 2018

So I reached out to Edward Kmett about this, and he said he'd be willing to contribute to the AD side of things on such a project. He didn't say so explicitly, but his email seemed to imply that he thinks it's a reasonable project, and he had some thoughts on the kinds of challenges that would need to be faced.

So anyway, that seems quite promising! I'm not quite sure how to proceed, because I'm only tangentially involved, but I'll happily help however I can (I have been using Haskell for 10 years, although all for my own work). Unfortunately, I just checked the actual GSoC deadline and it's actually today, so I'm not sure whether that means this project is already locked out for this year. Any thoughts?

@tmcdonell
Member

I just checked and it looks like the organisation applications have closed already (at least in my time zone), which is unfortunate ): It means we don't have an urgent deadline at least.

I think it would still be a good idea to write the proposal down. Depending on how things go, the community may run the Haskell Summer of Code again this year, or somebody interested might find the proposal and wish to work on it because it is a cool idea (master's / undergrad thesis students?).

Do you want to start writing down some ideas for that?

@alex404

alex404 commented Jan 24, 2018

So I wrote to the Haskell committee to see if they still recommended submitting a proposal, and apparently the "organization deadline" is much earlier than the actual project deadline. So even though any proposal we submit won't be part of selling Haskell to GSoC, it could still end up being part of the program. That seems like good news all around!

It seems like the proposals are quite short and are simply written in Markdown. Over the next couple of days I'll fork the Summer of Haskell page and write up a draft, then link it here so that you and anyone else can contribute. Does that sound like a good plan?

@tmcdonell
Member

Awesome, sounds like a good plan! 👍

@alex404

alex404 commented Jan 25, 2018

Alright, done and done. The repo is here. I just wrote up something quite preliminary, so edit it however you wish, and I'll go over it again too.

I think more specific details about what the project entails would be good. I also listed myself as a mentor; although I can't contribute as much to the core of the project, I can at least help when it comes to applications (i.e. deep learning) and benchmarking.

@tmcdonell
Member

Great! I will add some more to it soon, but it is looking good already.

I think it is good that you put yourself as a mentor; having a person grounded in the applications for the work is always useful (:

@alex404

alex404 commented Feb 8, 2018

Hey, Edward Kmett added some info on AD to the project proposal. Would you mind filling it out with Accelerate side stuff? I'll edit it once you're done and run it by you guys again.

@tmcdonell
Member

@ekmett re alex404/summer-of-haskell@668a05a: I think I've (quite recently) implemented all the branchy code for Data.Complex and Linear.Quaternion already (except for this unhandleable case, which throws an error), unless I misunderstand what you meant?

@tmcdonell
Member

tmcdonell commented Feb 9, 2018

So I think one way to structure this project would be:

  1. Start work on solving Vectored AD (ekmett/ad#2), possibly for just a subset of the functionality in ad (a rough sketch of the idea follows below).
  2. Given that API, it should be easier to implement the instances where the containers of AD variables are Accelerate arrays.

Depending on how the student progresses we could decide to stop at / focus on part 1 only, as providing a BLAS-backed implementation of ad (e.g. via hmatrix) would still be of great benefit to the community. Part 2 then is the extension to enable a GPU-based implementation as well.
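
As a rough sketch of what "vectored" could mean here (invented types, not the actual design under discussion in ekmett/ad#2): the reverse-mode tape would record whole-vector operations instead of scalar ones, so the backward pass can dispatch each node to a BLAS routine, or, in part 2, to an Accelerate kernel:

-- Hypothetical sketch only: a reverse-mode tape over vector operations.
type VarId = Int

data TapeOp
  = DotV   VarId VarId    -- dot product of two tracked vectors
  | ScaleV Double VarId   -- scalar times a tracked vector
  | AddV   VarId VarId    -- elementwise sum of two tracked vectors

-- The backward pass walks the tape in reverse, propagating whole-vector
-- adjoints, so each step is one bulk operation rather than many scalar ones.
newtype Tape = Tape [TapeOp]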

What do you think?

@alex404

alex404 commented Feb 9, 2018

Addressing ekmett/ad#2 definitely sounds like a good way forward. My only concern is that if we only apply this solution to implementing AD in hmatrix, then we will have changed the apparent goals of the project rather significantly, and AD + GPU computation is a "sexier" project than AD + BLAS/LAPACK.

I suppose you're saying, though, that we can condition our choice of target library on how the project evolves, which sounds like a good strategy. It's hard for me to see why hmatrix would provide a simpler target than going straight for Accelerate, though. Could you explain that concisely, or does it just boil down to Accelerate being far more complicated than some BLAS/LAPACK bindings? Presumably this is something we'd want to write up in the proposal.

@tmcdonell
Member

Yes, I think the hmatrix option is just a good backup, in case (1) takes too long, or (2) encounters some unforeseen problem that there isn't enough time to solve, etc.; it gives the student a second avenue to get some cool benchmarks/results. It is probably not necessary to mention this backup plan explicitly in the proposal; as you said, we can make that decision as the project evolves.

@tmcdonell
Member

Okay, I added what we've been discussing to the proposal.

@tmcdonell
Member

The more I think about it, the more this two-step proposal seems too roundabout. It might be better to just say "we can use ad as inspiration for how the library might be designed (it is comprehensive, well written, etc.) and implement something similar from scratch, but backed by Accelerate instead", rather than trying to bolt it onto the existing package (which sometimes works, e.g. linear-accelerate, but I have my doubts this time...)

@alex404

alex404 commented Feb 10, 2018

Yah, I think what you're saying is right, and of course we can always change our minds about how exactly to proceed if/when the project actually happens. My understanding of AD is that it is more subtle than complicated, and need not take a lot of raw code to implement. With ad as a template (and ekmett's expertise), I think it's not overly ambitious to develop an implementation of AD for Accelerate as a summer project.

@ekmett

ekmett commented Feb 10, 2018

@tmcdonell Yeah, it looks like you already took the hard road there. Then the same general idea applies. I have no particular attachment to doing this in ad vs. as a separate package. I think, as the current linear-accelerate code shows, it's probably easier to draw inspiration than to build directly on top. Feel free to adjust the language of the proposal accordingly.

@tmcdonell
Member

Haskell.org has been accepted into GSoC this year https://summerofcode.withgoogle.com/organizations/5706672807346176/

@alex404

alex404 commented Feb 13, 2018

That's great news!
How should we proceed now? Do you want to rework your contribution now that we're changing strategies? Based on our discussion I can also rework the draft myself if that would be easier, and then you (both) can have another look.

@tmcdonell
Member

I updated the proposal a bit based on what we have been discussing, if you could take a look at it again that would be great!

@alex404

alex404 commented Feb 14, 2018

Okay, I had another go at the draft. I've written it up as presenting two strategies (based on the suggestions of @ekmett), one of which may be easier but less ideal, with the final choice left to be made based on the experience of the student. Apologies if I butchered any of the technical details; I'm trying to get it into a form readable by people not immediately familiar with accelerate + ad.

@tmcdonell
Member

great! looks good to me 👍

@alex404

alex404 commented Feb 23, 2018

Okay, our idea is now online at the Summer of Haskell. Any thoughts on how we should find a student? Should we try to advertise, or just wait for some enterprising student to get in touch?

@mstksg

mstksg commented Apr 4, 2018

Not sure if it is relevant, but in case it is helpful, I'm just leaving a note here about my backprop library. It's a re-tooling of the ad API to allow for heterogeneous backprop, and so it works with unboxed vectors, linear algebra results, etc. (with the help of gradient-providing libraries like hmatrix-backprop); I have been looking into how best to adapt it for accelerate.
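
The user-facing API is small; a minimal sketch, assuming backprop's gradBP and its numeric instances for BVar:

import Numeric.Backprop (gradBP)

main :: IO ()
main = print (gradBP (\x -> x ^ 2 + 3 * x) (5 :: Double))
-- prints 13.0, the derivative of x^2 + 3x at x = 5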

And, as someone who has been in these trenches for a little over a year now, I'd be happy to answer any questions or offer any help I can :)

@abhishekmaha23

Hey, so I read the report of the successful GSoC project, and I'd like to take it further. Is the code merged into the library? What's the current state of things? I'm building a toolkit as a personal project using Haskell and this would help a lot.

@dschrempf

May I ask about the status of this project? I am interested in performing automatic differentiation of functions of Storable vectors (I need an algorithm exported by hmatrix). I think I could replace those parts with vector-matrix multiplications using only Accelerate. Thanks for your help!

@tomsmeding
Member

@dschrempf First, make sure that what you want is actually possible to do using Accelerate. Accelerate has its own notion of arrays; there is accelerate-io-vector, which @tmcdonell may be able to tell you more about if you're having trouble (see also the Accelerate gitter). Interoperability may be nontrivial or easy enough, depending on exactly how your application is structured.


If Accelerate is a good choice for your application, however:

I have a fork of Accelerate that implements reverse-mode automatic differentiation on a significant subset of the language: https://github.com/tomsmeding/accelerate/tree/no-explode

The supported subset of the language excludes loops (while/awhile), permute, stencil and the scan variants; there are also some restrictions on fold (only over Float, and no array indexing in the combination function). The rest should work, if I'm not forgetting anything. The API is in Data.Array.Accelerate.ReverseAD. This was work for my master's thesis. Some of these restrictions are hard to lift (e.g. while loops); some just take a bit of work that I haven't put in yet (e.g. some of the fold restrictions).
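
Purely as a hypothetical illustration of the shape of that API (gradientA below is an invented name and signature, not the fork's actual export; check Data.Array.Accelerate.ReverseAD for the real one), differentiating a scalar-valued array function would look something like:

import Data.Array.Accelerate
import qualified Prelude as P  -- suppress the implicit Prelude; its map clashes with Accelerate's

-- A scalar-valued loss over a vector; fold over Float sits inside the
-- supported subset described above.
loss :: Acc (Vector Float) -> Acc (Scalar Float)
loss xs = fold (+) 0 (map (\x -> x * x) xs)

-- Hypothetical reverse-mode entry point (invented signature):
-- gradientA :: (Acc (Vector Float) -> Acc (Scalar Float))
--           -> Acc (Vector Float) -> Acc (Vector Float)
-- gradients = gradientA loss inputs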

This isn't merged into Accelerate proper yet because there are some rough edges, and because I'm currently working (with my PhD advisor) on a different method of doing AD on Accelerate-like languages that has the potential to give a cleaner and more verifiable implementation.

Because it is a fork, I know it may be hard to use. Nevertheless, should you decide to try it, I'd love to hear whether it works for you!
