
『Simplified Chinese Edition』 Programming Massively Parallel Processors (English Edition, 4th Edition), by Wen-mei W. Hwu (USA), David B. Kirk (USA), and Izzat El Hajj (Lebanon)

Store catalog no.: 4097760
Category: Simplified Chinese → Mainland China Books → Computers/Networks → Programming
Authors: Wen-mei W. Hwu (USA), David B. Kirk (USA), Izzat El Hajj (Lebanon)
ISBN: 9787111774716
Publisher: China Machine Press (机械工业出版社)
Publication date: 2025-03-01

Pages/word count: /
Format: 16mo; Binding: paperback

Price: HK$141.9


Editorial Recommendation:
Key updates in the 4th edition:
- New material on CUDA, including newer libraries such as cuDNN.
- New chapters on common parallel patterns (stencil, reduction, sorting), along with thorough updates to the earlier pattern chapters (convolution, histogram, sparse matrix, graph traversal, deep learning).
- A new chapter devoted to GPU architecture, with examples from newer architectures such as Ampere.
- Refined discussion of problem-decomposition strategies and performance, including a new optimization checklist.
About the Book:
This book is concise, intuitive, and practical, with an emphasis on computational thinking and parallel programming skills. It is organized in four parts. Part I introduces the fundamentals of heterogeneous parallel programming, including data parallelism, GPU architecture, CUDA programming, and performance optimization. Part II covers parallel patterns, including convolution, stencil, parallel histogram, reduction, prefix sum, and merge. Part III covers advanced patterns and applications, including sorting, sparse matrix computation, graph traversal, deep learning, iterative MRI reconstruction, electrostatic potential maps, and computational thinking. Part IV covers advanced programming practice, including programming heterogeneous computing clusters and CUDA dynamic parallelism. The book is suitable for students in computer-related programs at colleges and universities as well as for practitioners in parallel computing.
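To give a flavor of the CUDA C style covered in Part I (this sketch is our own illustration, not code from the book; the kernel name and launch configuration are arbitrary choices), a complete vector-addition program follows the pattern of allocating device memory, copying data over, launching a grid of threads, and copying the result back:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one element of C = A + B.
__global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the last block may have extra threads
        C[i] = A[i] + B[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host allocations and initialization.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device allocations and host-to-device copies.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements (ceiling division).
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAddKernel<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %.1f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

The boundary check inside the kernel and the ceiling-division launch configuration are exactly the idioms the book's early chapters (Sections 2.3-2.6) develop in detail.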
About the Authors:
Wen-mei W. Hwu is a Distinguished Research Scientist and Senior Director of Research at NVIDIA, Professor Emeritus at the University of Illinois at Urbana-Champaign, and Chief Scientist of its Parallel Computing Institute. For his outstanding contributions to compiler design, computer architecture, microarchitecture, and parallel computing, he is an IEEE Fellow and an ACM Fellow and has received numerous awards, including the ACM-IEEE CS Eckert-Mauchly Award, the ACM Grace Murray Hopper Award, and the ACM SIGARCH Maurice Wilkes Award. He holds a PhD in computer science from the University of California, Berkeley.
David B. Kirk is a member of the US National Academy of Engineering, an NVIDIA Fellow, and a former Chief Scientist of NVIDIA. In 2002 he received the ACM SIGGRAPH Computer Graphics Achievement Award for his outstanding contributions to bringing high-performance computer graphics systems to the mass market. He holds a PhD in computer science from the California Institute of Technology.
Izzat El Hajj is an Assistant Professor in the Department of Computer Science at the American University of Beirut. His research focuses on application acceleration and programming support for emerging parallel processors and memory technologies, particularly GPUs and processing-in-memory. He holds a PhD in electrical and computer engineering from the University of Illinois at Urbana-Champaign.
Table of Contents
Foreword
Preface 
Acknowledgments 
CHAPTER 1 Introduction 1
1.1 Heterogeneous parallel computing 3
1.2 Why more speed or parallelism 7
1.3 Speeding up real applications 9
1.4 Challenges in parallel programming 11
1.5 Related parallel programming interfaces 13
1.6 Overarching goals 14
1.7 Organization of the book 15
References 19
Part I Fundamental Concepts
CHAPTER 2 Heterogeneous data parallel computing 23
With special contribution from David Luebke
2.1 Data parallelism 23
2.2 CUDA C program structure 27
2.3 A vector addition kernel 28
2.4 Device global memory and data transfer 31
2.5 Kernel functions and threading 35
2.6 Calling kernel functions 40
2.7 Compilation 42
2.8 Summary 43
Exercises 44
References 46
CHAPTER 3 Multidimensional grids and data 47
3.1 Multidimensional grid organization 47
3.2 Mapping threads to multidimensional data 51
3.3 Image blur: a more complex kernel 58
3.4 Matrix multiplication 62
3.5 Summary 66
Exercises 67
CHAPTER 4 Compute architecture and scheduling 69
4.1 Architecture of a modern GPU 70
4.2 Block scheduling 70
4.3 Synchronization and transparent scalability 71
4.4 Warps and SIMD hardware 74
4.5 Control divergence 79
4.6 Warp scheduling and latency tolerance 83
4.7 Resource partitioning and occupancy 85
4.8 Querying device properties 87
4.9 Summary 90
Exercises 90
References 92
CHAPTER 5 Memory architecture and data locality 93
5.1 Importance of memory access efficiency 94
5.2 CUDA memory types 96
5.3 Tiling for reduced memory traffic 103
5.4 A tiled matrix multiplication kernel 107
5.5 Boundary checks 112
5.6 Impact of memory usage on occupancy 115
5.7 Summary 118
Exercises 119
CHAPTER 6 Performance considerations 123
6.1 Memory coalescing 124
6.2 Hiding memory latency 133
6.3 Thread coarsening 138
6.4 A checklist of optimizations 141
6.5 Knowing your computation’s bottleneck 145
6.6 Summary 146
Exercises 146
References 147
Part II Parallel Patterns
CHAPTER 7 Convolution
An introduction to constant memory and caching 151
7.1 Background 152
7.2 Parallel convolution: a basic algorithm 156
7.3 Constant memory and caching 159
7.4 Tiled convolution with halo cells 163
7.5 Tiled convolution using caches for halo cells 168
7.6 Summary 170
Exercises 171
CHAPTER 8 Stencil 173
8.1 Background 174
8.2 Parallel stencil: a basic algorithm 178
8.3 Shared memory tiling for stencil sweep 179
8.4 Thread coarsening 183
8.5 Register tiling 186
8.6 Summary 188
Exercises 188
CHAPTER 9 Parallel histogram 191
9.1 Background 192
9.2 Atomic operations and a basic histogram kernel 194
9.3 Latency and throughput of atomic operations 198
9.4 Privatization 200
9.5 Coarsening 203
9.6 Aggregation 206
9.7 Summary 208
Exercises 209
References 210
CHAPTER 10 Reduction
And minimizing divergence 211
10.1 Background 211
10.2 Reduction trees 213
10.3 A simple reduction kernel 217
10.4 Minimizing control divergence 219
10.5 Minimizing memory divergence 223
10.6 Minimizing global memory accesses
Sample Content
Preface
We are proud to introduce to you the fourth edition of Programming Massively Parallel Processors: A Hands-on Approach.
Mass-market computing systems that combine multicore CPUs and many-thread GPUs have brought terascale computing to laptops and exascale computing to clusters. Armed with such computing power, we are at the dawn of the widespread use of computational experiments in the science, engineering, medical, and business disciplines. We are also witnessing the wide adoption of GPU computing in key industry vertical markets, such as finance, e-commerce, oil and gas, and manufacturing. Breakthroughs in these disciplines will be achieved by using computational experiments that are of unprecedented levels of scale, accuracy, safety, controllability, and observability. This book provides a critical ingredient for this vision: teaching parallel programming to millions of graduate and undergraduate students so that computational thinking and parallel programming skills will become as pervasive as calculus skills.
The primary target audience of this book consists of graduate and undergraduate students in all science and engineering disciplines in which computational thinking and parallel programming skills are needed to achieve breakthroughs. The book has also been used successfully by industry professional developers who need to refresh their parallel computing skills and keep up to date with the ever-increasing speed of technology evolution. These professional developers work in fields such as machine learning, network security, autonomous vehicles, computational finance, data analytics, cognitive computing, mechanical engineering, civil engineering, electrical engineering, bioengineering, physics, chemistry, astronomy, and geography, and they use computation to advance their fields. Thus these developers are both experts in their domains and programmers. The book takes the approach of teaching parallel programming by building up an intuitive understanding of the techniques. We assume that the reader has at least some basic C programming experience. We use CUDA C, a parallel programming environment that is supported on NVIDIA GPUs. There are more than 1 billion of these processors in the hands of consumers and professionals, and more than 400,000 programmers are actively using CUDA. The applications that you will develop as part of your learning experience will be runnable by a very large user community.
Since the third edition came out in 2016, we have received numerous comments from our readers and instructors. Many of them told us about the existing features they value. Others gave us ideas about how we should expand the book's contents to make it even more valuable. Furthermore, the hardware and software for heterogeneous parallel computing have advanced tremendously since 2016. In the hardware arena, three more generations of GPU computing architectures, namely, Volta, Turing, and Ampere, have been introduced since the third edition. In the software domain, CUDA 9 through CUDA 11 have allowed programmers to access new hardware and system features. New algorithms have also been developed. Accordingly, we added four new chapters and rewrote a substantial number of the existing chapters.
The four newly added chapters include one new foundational chapter, namely, Chapter 4 (Compute Architecture and Scheduling), and three new parallel patterns and applications chapters: Chapter 8 (Stencil), Chapter 10 (Reduction and Minimizing Divergence), and Chapter 13 (Sorting). Our motivation for adding these chapters is as follows:
- Chapter 4 (Compute Architecture and Scheduling): In the previous edition the discussions on architecture and scheduling considerations were scattered across multiple chapters. In this edition, Chapter 4 consolidates these discussions into one focused chapter that serves as a centralized reference for readers who are particularly interested in this topic.
- Chapter 8 (Stencil): In the previous edition the stencil pat

 

 

megBook.com.hk
Copyright © 2013 - 2025 (香港)大書城有限公司  All Rights Reserved.