A benchmark of random-fu (replacing mwc-random) #323

Draft · wants to merge 2 commits into master

Conversation

idontgetoutmuch
Member

No description provided.

@idontgetoutmuch
Member Author

Annoyingly, I have had to comment out some of the existing benchmarks. I tried

./Speed-Fu --match prefix Normal
benchmarking Normal single sample monad bayes
time                 172.3 μs   (172.1 μs .. 172.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 172.8 μs   (172.6 μs .. 173.1 μs)
std dev              807.4 ns   (512.3 ns .. 1.369 μs)

but if I uncomment the other tests then

./Speed-Fu --match prefix Normal
Error: none of the specified names matches a benchmark

even though

./Speed-Fu --list | grep Normal
Normal single sample monad bayes
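
A possible workaround (just a sketch, not part of this PR) would be to group the normal benchmarks under a bgroup, so that they all share a "Normal/" prefix and --match prefix Normal keeps matching however many other benchmarks are compiled in. Here someNormalAction is a made-up stand-in for the real sampling code:

import Criterion.Main (bench, bgroup, defaultMain, nfIO)

-- Sketch only: stand-in for the real sampling action.
someNormalAction :: IO Double
someNormalAction = pure 0

main :: IO ()
main = defaultMain
  [ bgroup "Normal"
      [ bench "single sample monad bayes" $ nfIO someNormalAction ]
    -- other groups can be added alongside; prefix matching still finds "Normal/..."
  ]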

@idontgetoutmuch
Member Author

So with random-fu I get

benchmarking Normal single sample monad bayes
time                 171.3 μs   (171.1 μs .. 171.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 171.8 μs   (171.5 μs .. 172.2 μs)
std dev              1.028 μs   (566.3 ns .. 1.746 μs)

but with mwc-random I get

benchmarking Normal single sample monad bayes
time                 304.9 μs   (304.3 μs .. 305.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 305.1 μs   (304.7 μs .. 306.0 μs)
std dev              2.089 μs   (1.312 μs .. 3.365 μs)

@Shimuuar I don't know why this would be. The implementation in random-fu is a bit opaque and uses a different algorithm (not Doornik's modified ziggurat method, as far as I can see). I will try other distributions. This is just for information; no need to do anything.

@turion
Collaborator

turion commented Oct 18, 2023

Instead of editing in place, could you copy the file Sampler/Strict.hs to Sampler/StrictFu.hs? Then you would have two sampler types, both of which you can import in the benchmark and test against each other.

@idontgetoutmuch
Member Author

idontgetoutmuch commented Oct 19, 2023

Well this is mysterious. When I run both benchmarks in one program I get

cabal build speed-bench
speed-bench --match prefix Normal

Removing: speed-samples.csv
Removing: raw.dat
benchmarking Normal single sample monad bayes
time                 21.92 μs   (21.85 μs .. 22.02 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 21.94 μs   (21.90 μs .. 22.03 μs)
std dev              183.7 ns   (126.4 ns .. 300.8 ns)

benchmarking Normal single sample monad bayes fu
time                 112.4 μs   (112.3 μs .. 112.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 112.4 μs   (112.3 μs .. 112.6 μs)
std dev              412.1 ns   (273.1 ns .. 674.6 ns)

I wonder what the benchmark is actually measuring.
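
One quick sanity check (a rough sketch, not something in this PR; timeOnce and sampleAction are invented names) would be to time a single run of the benchmarked action outside criterion and see whether ~22 μs is plausible for 1000 normal draws:

import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Time one execution of the IO action that the benchmark passes to nfIO.
-- getCPUTime returns picoseconds, hence the division by 1e6 to get μs.
timeOnce :: IO Double -> IO ()
timeOnce sampleAction = do
  start <- getCPUTime
  x     <- sampleAction
  end   <- getCPUTime
  printf "result = %g, elapsed = %.1f μs\n" x (fromIntegral (end - start) / 1e6 :: Double)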

Comment on lines +142 to +147
normalBenchmarks = [ bench "Normal single sample monad bayes" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Normal single sample monad bayes fu" $ nfIO $ do
                       FU.sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                            return $ sum xs)
Collaborator

You are calling polymorphic code (the inner do blocks) at particular types. This means that, in addition to the actual computation, a type class dictionary lookup is also performed, which can have a performance impact. Maybe this can be improved by adding a SPECIALISE pragma to the MonadDistribution instance definitions of both sampler modules. See https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/pragmas.html#specialize-pragma.

I imagine a change like this:

  uniform a b = SamplerT (ReaderT $ uniformRM (a, b))
  {-# SPECIALISE uniform :: Double -> Double -> SamplerIO Double #-}

This is for the MWC sampler, and a similar pragma can be added to the random-fu sampler. Hopefully, both will be faster and more comparable then.

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)

  uniform a b = SamplerT (ReaderT $ RF.runRVar $ RF.doubleUniform a b)
Collaborator

Suggested change:
-  uniform a b = SamplerT (ReaderT $ RF.runRVar $ RF.doubleUniform a b)
+  uniform a b = SamplerT (ReaderT $ RF.runRVar $ RF.doubleUniform a b)
+  {-# SPECIALISE uniform :: Double -> Double -> SamplerIO Double #-}

@idontgetoutmuch
Member Author

@turion I haven't tried your suggestions yet; instead I went back to basics. I have

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)

normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

and get

benchmarking Normal fu
time                 172.2 μs   (172.1 μs .. 172.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 172.6 μs   (172.5 μs .. 173.1 μs)
std dev              828.2 ns   (175.5 ns .. 1.717 μs)

benchmarking Normal mwc
time                 294.5 μs   (294.1 μs .. 295.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 294.6 μs   (294.3 μs .. 295.1 μs)
std dev              1.295 μs   (856.7 ns .. 1.894 μs)

So now I need to add something like a MonadDistribution class (which somehow seems to reverse the performance results).

@idontgetoutmuch
Member Author

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)


normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random

-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
    -- | Draw from a uniform distribution.
  random ::
    -- | \(\sim \mathcal{U}(0, 1)\)
    m Double

  -- | Draw from a normal distribution.
  normal ::
    -- | mean μ
    Double ->
    -- | standard deviation σ
    Double ->
    -- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
    m Double
  normal m s = draw (normalDistr m s)

-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)

-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
  normal m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)

instance StatefulGen g m => MonadDistribution (SamplerU g m) where
  random = SamplerU (ReaderT uniformDouble01M)
  normal m s = SamplerU (ReaderT (MWC.normal m s))

-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x

sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x

-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m

sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   , bench "Normal fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Normal mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

gives this

./FuVsMwc 
benchmarking Normal fu
time                 170.5 μs   (170.3 μs .. 170.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 170.8 μs   (170.7 μs .. 171.0 μs)
std dev              532.9 ns   (350.6 ns .. 729.4 ns)

benchmarking Normal mwc
time                 294.1 μs   (293.9 μs .. 294.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 294.5 μs   (294.2 μs .. 295.1 μs)
std dev              1.335 μs   (754.2 ns .. 2.170 μs)

benchmarking Normal fu via MonadDistribution
time                 173.3 μs   (173.0 μs .. 173.8 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 173.6 μs   (173.2 μs .. 174.1 μs)
std dev              1.425 μs   (902.5 ns .. 2.208 μs)

benchmarking Normal mwc via MonadDistribution
time                 302.1 μs   (301.8 μs .. 302.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 303.1 μs   (302.5 μs .. 304.7 μs)
std dev              3.053 μs   (1.805 μs .. 5.216 μs)

So it seems adding the type class adds almost no overhead, and the results remain consistent (with random-fu being faster than mwc for normal samples).

@turion
Collaborator

turion commented Nov 6, 2023

If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.

The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.

Again, if you add INLINE and SPECIALISE pragmas, you should be able to get rid of the type class overhead again.
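
For reference, a minimal sketch of that cross-module set-up (the module name and helper below are invented for illustration): the polymorphic sampling code lives in its own module, marked INLINABLE so its unfolding is exported, with a SPECIALISE pragma for the concrete IO generator:

{-# LANGUAGE ImportQualifiedPost #-}

-- Hypothetical library module; a separately compiled benchmark would import it.
module SamplerFu (normalFu) where

import Control.Monad.Reader (ReaderT (..))
import Data.Random qualified as RF
import System.Random.Stateful (IOGenM, StatefulGen, StdGen)

-- Polymorphic in the generator; INLINABLE puts the unfolding in the
-- interface file so downstream modules can specialise it themselves.
normalFu :: StatefulGen g m => Double -> Double -> ReaderT g m Double
normalFu m s = ReaderT (RF.runRVar (RF.normal m s))
{-# INLINABLE normalFu #-}

-- A ready-made monomorphic copy for the common IO case, so user code compiled
-- in another module does not pay for dictionary lookups.
{-# SPECIALISE normalFu :: Double -> Double -> ReaderT (IOGenM StdGen) IO Double #-}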

@turion
Collaborator

turion commented Nov 6, 2023

I did not check my statements against Core, so it would be worth looking at the optimised Core first to make sure that this is what is actually happening.
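
One way to do that (a suggestion only): compile the benchmark with something like ghc -O1 -ddump-simpl -dsuppress-all benchmark/Speed.hs and check whether a MonadDistribution dictionary is still being passed at the benchmarked call sites.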

@idontgetoutmuch
Member Author

> If you add a type class to the same module, the type class dictionary lookup will be optimized away by GHC. For it to have a performance impact, you have to put it into a separate module, as is done in the library.
>
> The compilation unit for GHC is a single module. This means that modules/files are optimized independently. In consequence, you will not get realistic benchmarks if you put the type class code in the same file as the benchmark. A benchmark has to be written like any user code in order to be realistic, i.e. use the library.
>
> Again, if you add INLINE and SPECIALISE pragmas, you should be able to get rid of the type class overhead again.

I am sure you are right, but the discrepancy seems to be caused by the difference between building with ghc -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs and with cabal build speed-bench (plain ghc defaults to -O0, whereas cabal builds with -O1 by default).

@idontgetoutmuch
Member Author

With ghc -O1 -isrc -imodels -XBlockArguments -XOverloadedStrings -fforce-recomp benchmark/Speed.hs I get consistency and mwc beats random-fu. Now maybe it is time to specialise and inline?

@idontgetoutmuch
Member Author

I tried

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = uniform
  normal m s = normalSpec m s

uniform :: (StatefulGen g m, RF.Distribution RF.StdUniform a) => SamplerT g m a
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.stdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}

normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}

but it made no difference.
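
Another thing that might be worth trying (a sketch only, not tested here) is a SPECIALIZE instance pragma, which asks GHC to build a monomorphic copy of the whole instance for the concrete sampler type rather than specialising method by method:

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = SamplerT (ReaderT $ RF.runRVar RF.stdUniform)
  normal m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
  {-# SPECIALISE instance MonadDistribution (SamplerT (IOGenM StdGen) IO) #-}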

@idontgetoutmuch
Member Author

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import Data.Random.Distribution.Uniform qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (IOGenM, mkStdGen, newIOGenM, StdGen)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (replicateM)
import Control.Monad.Reader (MonadIO, ReaderT (..))
import Statistics.Distribution.Normal (normalDistr)
import Statistics.Distribution (ContDistr (quantile))
import System.Random.Stateful (StatefulGen, uniformDouble01M)


normalFu :: IO Double
normalFu = newIOGenM (mkStdGen 1729) >>= (RF.runRVar $ RF.normal 0.0 1.0)

normalMwc :: IO Double
normalMwc = newIOGenM (mkStdGen 1729) >>= MWC.normal 0.0 1.0

-- | Draw from a continuous distribution using the inverse cumulative density
-- function.
draw :: (ContDistr d, MonadDistribution m) => d -> m Double
draw d = fmap (quantile d) random

-- | Monads that can draw random variables.
class (Monad m) => MonadDistribution m where
    -- | Draw from a uniform distribution.
  random ::
    -- | \(\sim \mathcal{U}(0, 1)\)
    m Double

  -- | Draw from a normal distribution.
  normal ::
    -- | mean μ
    Double ->
    -- | standard deviation σ
    Double ->
    -- | \(\sim \mathcal{N}(\mu, \sigma^2)\)
    m Double
  normal m s = draw (normalDistr m s)

-- | The sampling interpretation of a probabilistic program
-- Here m is typically IO or ST
newtype SamplerT g m a = SamplerT {runSamplerT :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)
newtype SamplerU g m a = SamplerU {runSamplerU :: ReaderT g m a} deriving (Functor, Applicative, Monad, MonadIO)

-- | convenient type synonym to show specializations of SamplerT
-- to particular pairs of monad and RNG
type SamplerIO = SamplerT (IOGenM StdGen) IO
type SamplerIP = SamplerU (IOGenM StdGen) IO

instance StatefulGen g m => MonadDistribution (SamplerT g m) where
  random = uniform
  normal m s = normalSpec m s

uniform :: StatefulGen g m => SamplerT g m Double
uniform = SamplerT (ReaderT $ RF.runRVar $ RF.doubleStdUniform)
{-# SPECIALISE uniform :: SamplerIO Double #-}

normalSpec :: (StatefulGen g m, RF.Distribution RF.Normal a) => a -> a -> SamplerT g m a
normalSpec m s = SamplerT (ReaderT $ RF.runRVar $ RF.normal m s)
{-# SPECIALISE normalSpec :: Double -> Double -> SamplerIO Double #-}

instance StatefulGen g m => MonadDistribution (SamplerU g m) where
  random = SamplerU (ReaderT uniformDouble01M)
  normal m s = SamplerU (ReaderT (MWC.normal m s))

-- | Run the sampler with a fixed random seed
sampleIOfixed :: SamplerIO a -> IO a
sampleIOfixed x = newIOGenM (mkStdGen 1729) >>= sampleWith x

sampleIOfixee :: SamplerIP a -> IO a
sampleIOfixee x = newIOGenM (mkStdGen 1729) >>= sampleWiti x

-- | Sample with a random number generator of your choice e.g. the one
-- from `System.Random`.
sampleWith :: SamplerT g m a -> g -> m a
sampleWith (SamplerT m) = runReaderT m

sampleWiti :: SamplerU g m a -> g -> m a
sampleWiti (SamplerU m) = runReaderT m

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Normal fu" $ nfIO $ do
                       xs <- replicateM 1000 normalFu
                       return $ sum xs
                   , bench "Normal mwc" $ nfIO $ do
                       xs <- replicateM 1000 normalMwc
                       return $ sum xs
                   , bench "Normal fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Normal mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ normal 0.0 1.0
                                         return $ sum xs)
                   , bench "Uniform fu via MonadDistribution" $ nfIO $ do
                       sampleIOfixed (do xs <- replicateM 1000 $ random
                                         return $ sum xs)
                   , bench "Uniform mwc via MonadDistribution" $ nfIO $ do
                       sampleIOfixee (do xs <- replicateM 1000 $ random
                                         return $ sum xs)
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

gives

naive-standalone
benchmarking Normal fu
time                 123.8 μs   (123.4 μs .. 124.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 123.8 μs   (123.5 μs .. 124.2 μs)
std dev              1.272 μs   (1.082 μs .. 1.524 μs)

benchmarking Normal mwc
time                 22.09 μs   (21.81 μs .. 22.49 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 21.98 μs   (21.82 μs .. 22.23 μs)
std dev              664.2 ns   (500.6 ns .. 816.0 ns)
variance introduced by outliers: 33% (moderately inflated)

benchmarking Normal fu via MonadDistribution
time                 121.6 μs   (119.5 μs .. 123.2 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 117.3 μs   (116.0 μs .. 118.8 μs)
std dev              4.790 μs   (4.188 μs .. 5.198 μs)
variance introduced by outliers: 41% (moderately inflated)

benchmarking Normal mwc via MonadDistribution
time                 23.91 μs   (23.54 μs .. 24.15 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 23.48 μs   (23.22 μs .. 23.74 μs)
std dev              869.4 ns   (832.5 ns .. 915.3 ns)
variance introduced by outliers: 43% (moderately inflated)

benchmarking Uniform fu via MonadDistribution
time                 95.03 μs   (94.54 μs .. 95.31 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 91.65 μs   (90.58 μs .. 92.75 μs)
std dev              3.602 μs   (3.424 μs .. 3.777 μs)
variance introduced by outliers: 41% (moderately inflated)

benchmarking Uniform mwc via MonadDistribution
time                 9.928 μs   (9.809 μs .. 10.08 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 10.02 μs   (9.922 μs .. 10.12 μs)
std dev              346.9 ns   (316.6 ns .. 364.0 ns)
variance introduced by outliers: 42% (moderately inflated)
