operator() is horrendously slow #75

Open
shiozaki opened this issue Feb 9, 2015 · 8 comments

Comments

@shiozaki
Contributor

shiozaki commented Feb 9, 2015

I know this is not a bug, but it has to be reported. The MP2 assembly step becomes much slower if I use BTAS's native operator().

(I previously wrote 50-100, but it seems that was not quite right. It is slow, though.)

@evaleev
Member

evaleev commented Feb 9, 2015 via email

@shiozaki
Contributor Author

shiozaki commented Feb 9, 2015

See above. I will measure the timing more accurately.

@shiozaki
Contributor Author

shiozaki commented Feb 9, 2015

The BTAS code I am talking about is the following, which does not seem to be written for efficiency. The timing follows (it takes a while, as the entire BAGEL has to be recompiled).

      /// accesses element using its index, given as a pack of integers
      template<typename index0, typename... _args>
      typename std::enable_if<std::is_integral<index0>::value, const_reference>::type
      operator() (const index0& first, const _args&... rest) const
      {
        typedef typename common_signed_type<index0, typename index_type::value_type>::type ctype;
        auto indexv = {static_cast<ctype>(first), static_cast<ctype>(rest)...};
        index_type index = array_adaptor<index_type>::construct(indexv.size());
        std::copy(std::begin(indexv), std::end(indexv), std::begin(index));
        return storage_[ range_.ordinal(index) ];
      }

@shiozaki
Contributor Author

shiozaki commented Feb 9, 2015

Here is the code in BAGEL; I am talking about the loop in the middle (again, the timing follows):

    for (int n = 0; n != nloop; ++n) {
      // take care of data. The communication should be hidden
      if (n+ncache < nloop)
        cache.block(n+ncache, n-1);

      const int i = get<0>(cache.task(n));
      const int j = get<1>(cache.task(n));
      if (i < 0 || j < 0) continue;
      cache.data_wait(n);

      shared_ptr<const Matrix> iblock = cache(i);
      shared_ptr<const Matrix> jblock = cache(j);
      const Matrix mat(*iblock % *jblock); // V
      Matrix mat2 = mat; // 2T-T^t
      if (i != j) {
        mat2 *= 2.0;
        mat2 -= *mat.transpose();
      }   
      Matrix mat3 = mat; // T

      for (int a = 0; a != nvirt; ++a) {
        for (int b = 0; b != nvirt; ++b) {
          const double denom = -eig(a+nocc)+eig(i)-eig(b+nocc)+eig(j);
          mat2(b,a) /= denom;
          mat3(b,a) /= denom;
        }   
      }   

      const double fac = i == j ? 1.0 : 2.0;
      ecorr += mat.dot_product(mat2) * fac;
      dmp2->add_block(2.0, nocca, nocca, nvirt, nvirt, mat2 % mat3);
      if (i != j)
        dmp2->add_block(2.0, nocca, nocca, nvirt, nvirt, mat2 ^ mat3);
    }   

@evaleev
Member

evaleev commented Feb 9, 2015

It is most probably due to dynamic memory allocation in index construction. When you know the number of dimensions, parametrize the tensor by std::array; I do this when using BTAS with Libint.

@shiozaki
Contributor Author

shiozaki commented Feb 9, 2015

Sorry, this could be partly my fault. I will figure it out (also, I should default to std::array now).

@evaleev
Member

evaleev commented Feb 9, 2015

We definitely need an alias for a constexpr number of dimensions... perhaps something like this:

    /// Tensor with const number of dimensions
    template <typename _T,
              size_t _N,
              CBLAS_ORDER _Order = CblasRowMajor,
              class _Storage = btas::DEFAULT::storage<_T>,
              class = typename std::enable_if<std::is_same<_T, typename _Storage::value_type>::value>::type
              >
    using TensorNd = Tensor<_T,
                            RangeNd<_Order, std::array<long, _N>, btas::BoxOrdinal<_Order, std::array<long, _N>>>,
                            _Storage
                            >;

@shiozaki
Contributor Author

shiozaki commented Feb 9, 2015

View::begin() seems very slow, which was perhaps responsible for my problem. Just for your info.
