The CUDA scalable parallel programming model provides readily understood abstractions that free programmers to focus on efficient parallel algorithms. It expresses both fine-grained and coarse-grained parallelism through a hierarchy of thread groups, shared memory, and barrier synchronization, while the programmer writes sequential C code for a single thread.
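As a minimal sketch of those abstractions, the hypothetical kernel below (the name `blockSum` and its parameters are illustrative, not from the talk) sums each thread block's slice of an array: each thread runs the same sequential C code, threads within a block cooperate through shared memory, and `__syncthreads()` provides the barrier.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: each block reduces its slice of `in` to one
// partial sum in `blockResults`, one thread per input element.
__global__ void blockSum(const float *in, float *blockResults, int n)
{
    extern __shared__ float partial[];        // per-block shared memory
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;  // global thread index

    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                          // barrier: all loads visible

    // Tree reduction within the block (fine-grained parallelism).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)                             // one result per block
        blockResults[blockIdx.x] = partial[0];
}
```

A launch such as `blockSum<<<numBlocks, 256, 256 * sizeof(float)>>>(d_in, d_out, n)` then supplies the coarse-grained parallelism: independent blocks are scheduled across however many processor cores the GPU provides, which is how such programs scale transparently.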
Since CUDA was released in 2007, developers have written scalable parallel programs for a wide range of applications, including computational chemistry, sparse matrix solvers, sorting, searching, and physics models. These applications scale transparently to hundreds of processor cores and thousands of concurrent threads.
NVIDIA GPUs with the new Tesla unified graphics and computing architecture run CUDA programs, and are widely available in laptops, PCs, workstations, and servers. The Tesla architecture is massively multithreaded and scales to over one hundred processor cores.
Slides:
Download slides for this talk in PDF format.
About the speaker:
John Nickolls is director of architecture at NVIDIA for GPU computing. He was previously with Broadcom, Silicon Spice, Sun Microsystems, and was a co-founder of MasPar Computer. His interests include parallel processing systems, languages, and architectures. Nickolls has a BS in electrical engineering and computer science from the University of Illinois, and MS and PhD degrees in electrical engineering from Stanford University.
Contact information:
John Nickolls
NVIDIA
2701 San Tomas Expressway
Santa Clara, CA 95050
jnickolls@nvidia.com