Performance #16

Open
sberryman opened this issue Feb 16, 2018 · 5 comments

Comments

@sberryman

I've been trying to run the flow on some 4K video and I'm not getting anywhere near the performance you reported in the paper.

Oddly enough, with the reference implementation, changing DISOpticalFlow::PRESET_ULTRAFAST to DISOpticalFlow::PRESET_FAST produces flow at roughly 450-480 ms per frame. With op_point=3 in this project I get great flow results, but it reports TIME (O.Flow Run-Time ) (ms): 3293.45. With the default op_point=2 it runs much faster, at TIME (O.Flow Run-Time ) (ms): 122.223.

I also see you started a video branch. Have you implemented that and just not pushed it to GitHub, by any chance?

@AshwinSekar
Collaborator

What kind of GPU/CPU setup are you using? The video branch is still in the works; it was a test to grab and process video frame by frame and render it in real time.

@sberryman
Author

Output for op_point=2 and op_point=3
https://gist.github.com/sberryman/b613ba3146878f12fc56c5876c194e40

The only change I made was to write an image instead of the .flo file. Based on what I saw in the code, that shouldn't affect any of the timing.
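One way to double-check that is to time only the flow call and keep the image or .flo write outside the measured region. Below is a minimal sketch, assuming a hypothetical helper `time_ms` (not part of FlowOnTheGo or OpenCV) and plain wall-clock timing. For GPU work, the callable would also need to wait for the device to finish (for example with cudaDeviceSynchronize) before returning, since kernel launches are asynchronous.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (an assumption for illustration, not from the repo):
// runs `work` once and returns the elapsed wall-clock time in milliseconds.
// Only what happens inside `work` is measured, so file output done by the
// caller after time_ms() returns does not count toward the result.
double time_ms(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping just the flow computation in the callable and writing the result afterwards would show whether the output format has any effect on the reported run-time.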

@sberryman
Author

On a side note, I had to remove #include <arm_neon.h> in refine_variational.cpp and FDF1.0.1/image.cpp.

I also had to comment out some lines in CMakeLists.txt so that the correct Eigen3 include directory was found, and switch to VECTOR_WIDTH=1, in order to get it to compile.
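For anyone hitting the same build problem on a non-ARM machine, an alternative to deleting the include is to guard it. This is a minimal sketch, not the repository's code: it assumes the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is available, as GCC and Clang do, and `HAVE_NEON` and `sum_floats` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Pull in NEON intrinsics only when the target actually supports them.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical example function: sums n floats, using NEON for blocks of
// four when available and a scalar loop for the remainder (or for the
// whole array on targets without NEON).
float sum_floats(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float total = 0.0f;
    std::size_t i = 0;
#if HAVE_NEON
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(data + i));
    }
    float lanes[4];
    vst1q_f32(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

With a guard like this the same source builds on both x86 and ARM, which fits the VECTOR_WIDTH=1 scalar fallback mentioned above.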

@sberryman
Author

Reference

PRESET_ULTRAFAST

Duration: ~140 ms
[image: flow_ultrafast_140ms]

PRESET_FAST

Duration: ~434 ms
[image: flow_fast_434ms]

PRESET_MEDIUM

Duration: ~899 ms
[image: flow_medium_899ms]

FlowOnTheGo

op_point=3

Duration: ~3306 ms
[image: flow_oppoint_3]

op_point=2

Duration: ~121 ms
[image: flow_oppoint_2]

op_point=1 takes ~1 ms and the output is pretty much empty, and op_point=4 results in a CUDA error: CUDA error at /root/FlowOnTheGo/src/kernels/flowUtil.cu:533 code=77(cudaErrorIllegalAddress) "cudaHostGetDevicePointer(&a11c1, a11->c1, 0)"

@sberryman
Author

FYI, this was all done using the master branch. After looking through optimize_refine I see you have quite a few optimizations there.
