Trends in Computer Architecture

  • Topic: Parallel computing, Graphics processing unit, Pixel shader
  • Published : August 1, 2010

NVIDIA TESLA: A UNIFIED GRAPHICS AND COMPUTING ARCHITECTURE
To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.


Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym
NVIDIA

The modern 3D graphics processing unit (GPU) has evolved from a fixed-function graphics pipeline to a programmable parallel processor with computing power exceeding that of multicore CPUs. Traditional graphics pipelines consist of separate programmable stages of vertex processors executing vertex shader programs and pixel-fragment processors executing pixel shader programs. (Montrym and Moreton provide additional background on the traditional graphics processor architecture [1].) NVIDIA's Tesla architecture, introduced in November 2006 in the GeForce 8800 GPU, unifies the vertex and pixel processors and extends them, enabling high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA [2–4]) parallel programming model and development tools. The Tesla unified graphics and computing architecture is available in a scalable family of GeForce 8-series GPUs and Quadro GPUs for laptops, desktops, workstations, and servers. It also provides the processing architecture for the Tesla GPU computing platforms introduced in 2007 for high-performance computing.

In this article, we discuss the requirements that drove the unified graphics and parallel computing processor architecture, describe the Tesla architecture, and explain how it is enabling widespread deployment of parallel computing and graphics applications.

The road to unification

The first GPU was the GeForce 256, introduced in 1999. It contained a fixed-function 32-bit floating-point vertex transform and lighting processor and a fixed-function integer pixel-fragment pipeline, which were programmed with OpenGL and the Microsoft DX7 API [5]. In 2001, the GeForce 3 introduced the first programmable vertex processor executing vertex shaders, along with a configurable 32-bit floating-point fragment pipeline, programmed with DX8 [5] and OpenGL [6]. The Radeon 9700, introduced in 2002, featured a programmable 24-bit floating-point pixel-fragment processor programmed with DX9 and OpenGL [7, 8]. The GeForce FX added 32-bit floating-point pixel-fragment processors. The XBox 360 introduced an early unified GPU in 2005, allowing vertices and pixels to execute on the same processor [9].

0272-1732/08/$20.00 © 2008 IEEE. Published by the IEEE Computer Society.

HOT CHIPS 19

Vertex processors operate on the vertices of primitives such as points, lines, and triangles. Typical operations include transforming coordinates into screen space, which are then fed to the setup unit and the rasterizer, and setting up lighting and texture parameters to be used by the pixel-fragment processors. Pixel-fragment processors operate on rasterizer output, which fills the interior of primitives, along with the interpolated parameters. Vertex and...
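
As a rough illustration of the coordinate transformation that a vertex processor performs, the following C sketch multiplies a vertex position by a 4x4 matrix and then maps the result to pixel coordinates. It is not taken from the article: the names Vec4, Mat4, mat4_mul_vec4, clip_to_screen, and vertex_to_screen are hypothetical, and the conventions used (row-major matrices, normalized coordinates in [-1, 1], screen y axis pointing down) are assumptions chosen for the example.

```c
#include <assert.h>

/* Hypothetical types for illustration; not from the article. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;   /* assumed row-major: m[row][col] */

/* Multiply a 4x4 matrix by a column vector (e.g. a combined
   model-view-projection matrix applied to an object-space vertex). */
static Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v)
{
    Vec4 r;
    r.x = a->m[0][0] * v.x + a->m[0][1] * v.y + a->m[0][2] * v.z + a->m[0][3] * v.w;
    r.y = a->m[1][0] * v.x + a->m[1][1] * v.y + a->m[1][2] * v.z + a->m[1][3] * v.w;
    r.z = a->m[2][0] * v.x + a->m[2][1] * v.y + a->m[2][2] * v.z + a->m[2][3] * v.w;
    r.w = a->m[3][0] * v.x + a->m[3][1] * v.y + a->m[3][2] * v.z + a->m[3][3] * v.w;
    return r;
}

/* Map a clip-space position to screen space: perspective divide,
   then scale from the assumed [-1, 1] range to a width x height viewport.
   The y axis is flipped so that y grows downward (an assumed convention). */
static Vec4 clip_to_screen(Vec4 clip, float width, float height)
{
    Vec4 s;
    float inv_w;

    assert(clip.w != 0.0f);   /* the divide below is undefined for w == 0 */
    inv_w = 1.0f / clip.w;

    s.x = (clip.x * inv_w * 0.5f + 0.5f) * width;
    s.y = (1.0f - (clip.y * inv_w * 0.5f + 0.5f)) * height;
    s.z = clip.z * inv_w;     /* depth after the perspective divide */
    s.w = inv_w;              /* 1/w is commonly kept for interpolation */
    return s;
}

/* Both steps together: object-space vertex in, screen-space position out. */
static Vec4 vertex_to_screen(const Mat4 *mvp, Vec4 v, float width, float height)
{
    return clip_to_screen(mat4_mul_vec4(mvp, v), width, height);
}
```

With an identity matrix and a 640 x 480 viewport, a vertex at the origin of the normalized range lands at the center of the screen under these assumptions. In a real pipeline the screen-space result would go on to the setup unit and rasterizer, as the paragraph above describes.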