User:WillWare/GPUs and CUDA

The PyCUDA author does his development on an NVIDIA GTX 280 video board. These can be found in the $230 to $250 range. The specs are pretty impressive: the board has 240 cores on it.

Laptops are available with the 280M and 260M (the M suffix marks the mobile parts), the 260M being more affordable. The big brand in this space is Alienware.

CUDA and PyCUDA

CUDA is NVIDIA's programming environment for its GPUs. It appears to be a fairly standard C environment with library support for their hardware. It is available for Linux and, according to NVIDIA, will probably work (no promises) on most distributions.

http://www.nvidia.com/object/cuda_get.html

PyCUDA is a Python API for CUDA. When you need to write C code, you write it as a string passed to a function, and it gets built if needed.

Here is an example of embedding some C code into your Python code.

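import numpy
import pycuda.autoinit            # creates a CUDA context on import
import pycuda.driver as drv
# Newer PyCUDA releases keep SourceModule in pycuda.compiler
from pycuda.compiler import SourceModule

# The kernel is plain CUDA C, passed in as a string and compiled on demand
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
# One block of 400 threads, one thread per array element
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1))
# Should print an array of zeros
print(dest-a*b)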
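The kernel above indexes with threadIdx.x alone, so it only covers arrays that fit in a single block. For larger arrays the usual CUDA pattern is to launch several blocks and compute a global index from the block index, the block size, and the thread index. The following is a CPU-only sketch of that index arithmetic in plain Python and NumPy, not PyCUDA code; the function name multiply_them_emulated and the sizes used are made up for illustration.

```python
import numpy as np

def multiply_them_emulated(a, b, block_dim, grid_dim):
    """CPU emulation of an elementwise-multiply kernel on a 1-D grid of blocks."""
    n = a.size
    dest = np.zeros_like(a)
    for block_idx in range(grid_dim):            # plays the role of blockIdx.x
        for thread_idx in range(block_dim):      # plays the role of threadIdx.x
            # Global element index, as a kernel would compute it from
            # blockIdx.x * blockDim.x + threadIdx.x
            i = block_idx * block_dim + thread_idx
            if i < n:                            # the last block may overhang the array
                dest[i] = a[i] * b[i]
    return dest

n = 1000
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim      # ceiling division
rng = np.random.default_rng(0)
a = rng.standard_normal(n).astype(np.float32)
b = rng.standard_normal(n).astype(np.float32)
dest = multiply_them_emulated(a, b, block_dim, grid_dim)
print(grid_dim, np.abs(dest - a * b).max())
```

On a real GPU the two loops disappear: every (block, thread) pair runs the loop body concurrently, which is why the bounds check matters when the array length is not a multiple of the block size.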

I've been tinkering with CUDA on an Ubuntu system. I needed to "sudo apt-get install g++" to make it work. Here's the makefile I'm using for one of the exercises.

.SUFFIXES: .cu .go

.cu.go:
       nvcc -deviceemu -g $(@:.go=.cu) -o $@

TARGETS=reverseArray_multiblock.go \
       reverseArray_multiblock_fast.go \
       reverseArray_singleblock.go

all: $(TARGETS)

clean:
       rm -f $(TARGETS)

Remember to turn the leading spaces on the command lines into tabs so the makefile works.

Getting CUDA working on Slackware is easier, because Slackware nowadays installs everything. I ran some demos from the SDK, including a simulation of smoke and a very scalable picture of a Julia set. Quite cool, and they give you the option of comparing GPU performance to CPU performance, and of course it was like night and day.

One thing that could be really cool with CUDA would be to do something really complicated and interesting with a force-feedback joystick. Just the sense of smoothness in the GPU real-time calculation makes me think that some very cool stuff could be done there.

http://www.simprojects.nl/hacking_a_force_feedback_stick.htm
http://www.linuxquestions.org/questions/programming-9/force-feedback-joysticks-199033/
http://libff.wiki.sourceforge.net/ -- I think this is the official Linux support for force-feedback devices
http://wiki.freegamedev.net/index.php/Force_Feedback

Scaling to a cluster

One challenging thing about spreading a simulation over a cluster is that, unless you use clever algorithms like DPMTA, you pretty much need cluster-wide communication on every time step. My impression is that PAL is primarily mechanical and does not yet have any provisions for a multipole algorithm. But the API for adding DPMTA to another program is pretty simple, so maybe this isn't that big a deal.
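To see where the cluster-wide communication comes from, here is a minimal sketch of the direct all-pairs approach that multipole methods are meant to avoid. It assumes a plain inverse-square interaction with units and constants left out; the function name direct_forces is hypothetical and this is not how PAL or DPMTA work internally.

```python
import numpy as np

def direct_forces(pos, charge):
    """All-pairs inverse-square forces, O(N^2); units and constants omitted."""
    # delta[i, j] is the vector from particle j to particle i
    delta = pos[:, None, :] - pos[None, :, :]
    dist2 = (delta ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, 1.0)        # avoid dividing by zero when i == j
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)       # a particle exerts no force on itself
    qq = charge[:, None] * charge[None, :]
    # Force on particle i is the sum over every other particle j
    return ((qq * inv_d3)[:, :, None] * delta).sum(axis=1)

rng = np.random.default_rng(1)
pos = rng.random((64, 3))
charge = rng.choice([-1.0, 1.0], size=64)
forces = direct_forces(pos, charge)
print(forces.shape)
```

Every row of the result depends on every row of pos. If the particles are divided among cluster nodes, each node therefore needs the current positions held by all the other nodes before it can finish a single time step.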

A cluster might not offer an advantage. Consider that only a tiny fraction of the entire simulation will be a high-res molecular dynamics (MD) simulation, but that fraction will consume almost all the CPU cycles, and it will need much smaller time steps. If you put the MD on one node, then the other nodes are idle almost all the time. So a cluster works only if you can spread the MD over the cluster, NAMD-style, and still get it to play nice with the lumped models.
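The idle-node argument can be put into rough numbers. The sketch below is an idealized back-of-the-envelope model that ignores communication cost entirely; the 95% MD share and the 8-node cluster are made-up figures, and the function names are hypothetical.

```python
def speedup_md_on_one_node(md_fraction, nodes):
    """Speedup over one node when all MD work sits on a single node.

    md_fraction is the share of total CPU work spent in the MD region.
    The remaining work is split evenly over the other nodes.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes")
    md_time = md_fraction
    rest_time = (1.0 - md_fraction) / (nodes - 1)
    # The step finishes only when the slowest node finishes
    return 1.0 / max(md_time, rest_time)

def idle_fraction_of_other_nodes(md_fraction, nodes):
    """Share of each step the non-MD nodes spend waiting on the MD node."""
    md_time = md_fraction
    rest_time = (1.0 - md_fraction) / (nodes - 1)
    return max(0.0, 1.0 - rest_time / md_time)

def speedup_md_spread(nodes):
    """Best case when the MD work itself is spread evenly over all nodes."""
    return float(nodes)

md_fraction = 0.95   # assumed: MD takes 95% of the cycles
nodes = 8            # assumed cluster size
print(speedup_md_on_one_node(md_fraction, nodes))
print(idle_fraction_of_other_nodes(md_fraction, nodes))
print(speedup_md_spread(nodes))
```

With these assumed figures, confining the MD to one node gives a speedup of only about 1.05 while the other seven nodes wait for more than 99% of each step, against an ideal speedup of 8 if the MD is spread out. Real communication overhead would lower the spread-out figure.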

The mid-resolution model is a lumped model that updates with short time steps. This is necessary in order to interact correctly with the nearby high-res model. So the parts of the system that get the mid-resolution treatment are chosen by spatial proximity to the high-res model.
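Choosing by spatial proximity could look something like the sketch below. It assumes each lumped element and each high-res atom has a 3-D position and that a single cutoff distance decides the matter; the function name choose_resolution, the "low" label for everything else, and the cutoff value are all hypothetical.

```python
import numpy as np

def choose_resolution(lumped_pos, highres_pos, cutoff):
    """Label a lumped element 'mid' if it lies within cutoff of any high-res atom."""
    # delta[i, j] is the vector from high-res atom j to lumped element i
    delta = lumped_pos[:, None, :] - highres_pos[None, :, :]
    nearest = np.sqrt((delta ** 2).sum(axis=-1)).min(axis=1)
    return np.where(nearest <= cutoff, "mid", "low")

lumped_pos = np.array([[0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [5.0, 0.0, 0.0]])
highres_pos = np.array([[0.5, 0.0, 0.0]])
labels = choose_resolution(lumped_pos, highres_pos, cutoff=1.0)
print(labels)
```

Elements labeled "mid" would be stepped with the short time step so they stay in sync with the high-res region, and the rest could keep a longer step. As the high-res region moves, the labels would need to be recomputed.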

References

NVIDIA stuff

Wikipedia links: CUDA, Molecular modeling on GPU, GPGPU

Science stuff

http://www.ks.uiuc.edu/Publications/Papers/PDF/STON2009/STON2009.pdf -- High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs
http://www.hpccommunity.org/f55/clusters-produce-25-open-hpc-applications-591/ -- Clusters That Produce: 25 Open HPC Applications -- not about GPUs at all, but interesting info about HPC apps
http://mtzweb.stanford.edu/research/gpu/ -- Quantum Chemistry on GPU
http://en.wikipedia.org/wiki/Hartree%E2%80%93Fock
http://en.wikipedia.org/wiki/Basis_set_(chemistry)
http://en.wikipedia.org/wiki/Molecular_orbital
