======updated in year 2013=====
This version just tidies the API, using struct clmat to wrap the 3 parameters; I think this will reduce the chance of mismatched parameters.
Since OpenFOAM needs to update the boundary conditions and rebuild the matrix A (to solve Ax=b) over multiple runs, it is not as efficient as a pure Ax=b sparse matrix problem.
I did this piece of work just to practice OpenCL and to understand the OpenFOAM source code better while I was job hunting during the writing-up stage of my PhD.
However, I have other responsibilities in my new job. I need to learn more EEE material, so I cannot update the code to OpenFOAM 2.x. Since the matrix-solving code is mature, it should work with OpenFOAM 2.x, with or without minor tuning.
Sorry about this.
However, I have uploaded the code to sourceforge.net; I hope it may help somebody else make a leap in this direction: OpenCL4OpenFoam
clUtils (a C library for vector and sparse matrix multiplication) has been extracted from clFoam v0.1 and has been tested with a MinGW32 build on Windows 7.
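For readers new to the idea, here is a minimal sketch in C of wrapping the three arrays of a CSR (compressed sparse row) matrix in one struct and multiplying it by a vector. This is NOT the real clmat definition or the clUtils API: the names csr_sketch and csr_spmv are hypothetical, and it is only an assumption that the three wrapped parameters are the CSR value, column-index and row-offset arrays. It runs on the CPU and is meant only to show the data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper (not the actual clmat): bundles the three CSR arrays
   so they cannot be passed in the wrong order or mixed between matrices. */
typedef struct {
    size_t n_rows;
    const size_t *row_ptr;  /* n_rows + 1 offsets into col_idx / values */
    const size_t *col_idx;  /* column index of each stored entry */
    const float  *values;   /* stored entries, single precision */
} csr_sketch;

/* y = A * x for a CSR matrix A; x needs one entry per column of A,
   y needs one entry per row of A. */
void csr_spmv(const csr_sketch *A, const float *x, float *y)
{
    assert(A && x && y);
    for (size_t i = 0; i < A->n_rows; ++i) {
        float sum = 0.0f;
        /* walk the stored entries of row i */
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->values[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

With one struct argument instead of three loose pointers, a call looks like csr_spmv(&A, x, y), which is the kind of mismatch protection the tidied API is aiming for.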
First announcement on CFDonline
Until now, single-precision clFoam has been tested on an ATI 5650M GPU and an NVIDIA Tesla C2050. On the Tesla C2050 it is slightly slower than the CPU for the 160000-cell cavity case over 4 time steps (clPCG). (See profilingDatasheet.xls in profiling data/ for details.)
160000 cells on the redqueen cluster of Manchester University, single precision (SP), DIC preconditioner only
AMD Qcore CPU, 2700 MHz
ExecutionTime = 31.93 s ClockTime = 32 s
Tesla C2050, clPCG only, no interface update
ExecutionTime = 33.85 s ClockTime = 44 s
Tesla C2050, clPCG only, with interface update
ExecutionTime = 39.95 s ClockTime = 55 s
The OpenCL solver is still promising, as it is a new technology with plenty of room for improvement.
There is still quite a lot of work to do; any advice on improving efficiency is appreciated. Furthermore, there are bound to be some errors in the manual; DO send me an email so I can correct them.
Thanks very much
1. Project Layout
# file system structure of the project generated by command:
There are 3 projects (subfolders) in clFoam:
clUtils/ basic vector and csrMatrix operations written by the author
Tested and profiled on AMD_STREAM_SDK, SP on GPU and DP on CPU
clFoam/ clPCG and clPBiCG solvers based on clUtils/
Tested and profiled on AMD_STREAM_SDK, single precision on GPU
vclFoam/ a wrapper to call the ViennaCL BLAS solver
Not finished, there is a bug
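To show what a PCG-type solver built on CSR vector and matrix operations does, here is a rough CPU-only sketch in C of the preconditioned conjugate gradient iteration for Ax=b. It is not the clPCG source: the function names are hypothetical, it uses double precision, and it swaps in a simple Jacobi (diagonal) preconditioner instead of the DIC preconditioner used in the profiling runs. It assumes A is symmetric positive definite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* y = A * x, with A stored in CSR form (row_ptr, col_idx, val) */
static void spmv(size_t n, const size_t *row_ptr, const size_t *col_idx,
                 const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}

static double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Solve A x = b with Jacobi-preconditioned CG (hypothetical helper).
   inv_diag[i] must hold 1 / A[i][i]. x is the initial guess on entry and
   the solution on exit. Stops when ||r|| <= tol or after max_iter steps.
   Returns the iteration count, or -1 if allocation fails. */
int pcg_jacobi(size_t n, const size_t *row_ptr, const size_t *col_idx,
               const double *val, const double *inv_diag,
               const double *b, double *x, double tol, int max_iter)
{
    assert(row_ptr && col_idx && val && inv_diag && b && x);
    if (n == 0)
        return 0;
    double *work = malloc(4 * n * sizeof *work);
    if (!work)
        return -1;
    double *r = work, *z = work + n, *p = work + 2 * n, *Ap = work + 3 * n;

    spmv(n, row_ptr, col_idx, val, x, Ap);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];        /* initial residual */
        z[i] = inv_diag[i] * r[i];  /* apply preconditioner */
        p[i] = z[i];                /* first search direction */
    }
    double rz = dot(n, r, z);
    int it = 0;
    while (it < max_iter && dot(n, r, r) > tol * tol) {
        spmv(n, row_ptr, col_idx, val, p, Ap);
        double pAp = dot(n, p, Ap);
        if (pAp <= 0.0)
            break;                  /* breakdown: A not SPD or p vanished */
        double alpha = rz / pAp;
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            z[i] = inv_diag[i] * r[i];
        }
        double rz_new = dot(n, r, z);
        double beta = rz_new / rz;
        for (size_t i = 0; i < n; ++i)
            p[i] = z[i] + beta * p[i];
        rz = rz_new;
        ++it;
    }
    free(work);
    return it;
}
```

Each iteration costs one sparse matrix-vector product plus a few dot products and vector updates, which is why those basic operations are the part worth moving to the GPU. For the small system A = [[4,1],[1,3]], b = [1,2], the result should be close to x = [1/11, 7/11].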
# other resources included
doc/ some useful documents, tutorials, install manuals
bin/ some bash scripts
SpeedITOFPlugin1.1/ downloaded from the SpeedIT toolkit website and edited for SP support
(1) clUtils: single precision works for both AMD and NVIDIA GPUs
double precision passed the test on OpenCL via GPU
double precision on CUDA 3.1 fails with “OUT_OF_RESOURCE”
double precision does NOT work properly on Tesla C2050 with CUDA 3.1
(2) clFoam is usable only in single precision on GPU, with clPCG and clPBiCG
(see profilingDatasheet.xls in profiling data/ for details)
Double precision should work but is still buggy.
I did not have hardware at hand for debugging, only ssh access to the remote cluster, which had not been upgraded to CUDA 4.0
(3) vclFoam is not usable at all.
As vclFoam will probably not be faster than clFoam, I did not spend much time on that plugin
clFoam requires the following:
* A recent C++ compiler (e.g. gcc 4.x.x); GCC > 4.4 is needed!
* OpenFOAM 1.7.X
* OpenCL: for accessing GPUs (shared library and include files)
For AMD GPUs, install the AMD_STREAM_SDK
SEE installation guide:
For NVIDIA GPUs, install CUDA_SDK and CUDA_TOOLKIT
SEE installation guide:
* uBLAS (shipped with the Boost libraries)
#sudo apt-get install libboost-dev
* ViennaCL 1.1 headers have been put into vclFoam;
the installation tutorials are in separate files:
4. Authors and Contact
June 01 2011