<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Tutorial &mdash; PyCUDA 2014.1 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '2014.1',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="PyCUDA 2014.1 documentation" href="index.html" />
    <link rel="next" title="Device Interface" href="driver.html" />
    <link rel="prev" title="Installation" href="install.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="driver.html" title="Device Interface"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="install.html" title="Installation"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">PyCUDA 2014.1 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="tutorial">
<h1>Tutorial<a class="headerlink" href="#tutorial" title="Permalink to this headline">¶</a></h1>
<div class="section" id="getting-started">
<h2>Getting started<a class="headerlink" href="#getting-started" title="Permalink to this headline">¶</a></h2>
<p>Before you can use PyCuda, you have to import and initialize it:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pycuda.driver</span> <span class="kn">as</span> <span class="nn">cuda</span>
<span class="kn">import</span> <span class="nn">pycuda.autoinit</span>
<span class="kn">from</span> <span class="nn">pycuda.compiler</span> <span class="kn">import</span> <span class="n">SourceModule</span>
</pre></div>
</div>
<p>Note that you do not <em>have</em> to use <a class="reference internal" href="util.html#module-pycuda.autoinit" title="pycuda.autoinit"><tt class="xref py py-mod docutils literal"><span class="pre">pycuda.autoinit</span></tt></a>&#8211;
initialization, context creation, and cleanup can also be performed
manually, if desired.</p>
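<p>For reference, here is a minimal sketch of doing the same by hand; you are then
responsible for releasing the context yourself when you are done:</p>
<div class="highlight-python"><div class="highlight"><pre>import pycuda.driver as cuda

cuda.init()                 # initialize the driver API
dev = cuda.Device(0)        # pick the first GPU
ctx = dev.make_context()    # create a context on it and make it current

# ... use the device here ...

ctx.pop()                   # deactivate and release the context
</pre></div>
</div>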
</div>
<div class="section" id="transferring-data">
<h2>Transferring Data<a class="headerlink" href="#transferring-data" title="Permalink to this headline">¶</a></h2>
<p>The next step in most programs is to transfer data onto the device.
In PyCuda, you will mostly transfer data from <a class="reference external" href="http://docs.scipy.org/doc/numpy/reference/index.html#module-numpy" title="(in NumPy v1.9)"><tt class="xref py py-mod docutils literal"><span class="pre">numpy</span></tt></a> arrays
on the host. (But indeed, everything that satisfies the Python buffer
interface will work, even a <a class="reference external" href="http://docs.python.org/dev/library/stdtypes.html#str" title="(in Python v3.5)"><tt class="xref py py-class docutils literal"><span class="pre">str</span></tt></a>.) Let&#8217;s make a 4x4 array
of random numbers:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">numpy</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span>
</pre></div>
</div>
<p>But wait&#8211;<em>a</em> consists of double precision numbers, while most nVidia
devices only support single precision:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="n">a</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
</pre></div>
</div>
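<p>If you are not sure whether your card handles doubles at all, you can ask it;
this little sketch assumes the device chosen by <tt class="docutils literal"><span class="pre">pycuda.autoinit</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre>import pycuda.autoinit

# double precision requires compute capability 1.3 or newer
print pycuda.autoinit.device.compute_capability()
</pre></div>
</div>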
<p>Finally, we need somewhere to transfer data to, so we need to
allocate memory on the device:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">a_gpu</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">mem_alloc</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">nbytes</span><span class="p">)</span>
</pre></div>
</div>
<p>As a last step, we need to transfer the data to the GPU:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span class="n">a_gpu</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span>
</pre></div>
</div>
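<p>As an aside, <tt class="xref py py-func docutils literal"><span class="pre">pycuda.driver.to_device()</span></tt> rolls the allocation and the
host-to-device copy into one call, so the last two steps can also be written as:</p>
<div class="highlight-python"><div class="highlight"><pre>a_gpu = cuda.to_device(a)   # allocates a.nbytes on the device and copies a into it
</pre></div>
</div>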
</div>
<div class="section" id="executing-a-kernel">
<h2>Executing a Kernel<a class="headerlink" href="#executing-a-kernel" title="Permalink to this headline">¶</a></h2>
<p>For this tutorial, we&#8217;ll stick to something simple: We will write code to
double each entry in <em>a_gpu</em>. To this end, we write the corresponding CUDA C
code, and feed it into the constructor of a
<a class="reference internal" href="driver.html#pycuda.compiler.SourceModule" title="pycuda.compiler.SourceModule"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.compiler.SourceModule</span></tt></a>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">SourceModule</span><span class="p">(</span><span class="s">&quot;&quot;&quot;</span>
<span class="s">  __global__ void doublify(float *a)</span>
<span class="s">  {</span>
<span class="s">    int idx = threadIdx.x + threadIdx.y*4;</span>
<span class="s">    a[idx] *= 2;</span>
<span class="s">  }</span>
<span class="s">  &quot;&quot;&quot;</span><span class="p">)</span>
</pre></div>
</div>
<p>If there aren&#8217;t any errors, the code is now compiled and loaded onto the
device. We find a reference to our <a class="reference internal" href="driver.html#pycuda.driver.Function" title="pycuda.driver.Function"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.Function</span></tt></a> and call
it, specifying <em>a_gpu</em> as the argument, and a block size of 4x4:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">func</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">get_function</span><span class="p">(</span><span class="s">&quot;doublify&quot;</span><span class="p">)</span>
<span class="n">func</span><span class="p">(</span><span class="n">a_gpu</span><span class="p">,</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
</pre></div>
</div>
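<p>Our 4x4 array fits into a single block of 4x4x1 threads, so no grid of blocks is
needed here. For larger data you would also pass a <em>grid</em> argument and take
<cite>blockIdx</cite> into account in the kernel; purely as a sketch:</p>
<div class="highlight-python"><div class="highlight"><pre>func(a_gpu, block=(4, 4, 1), grid=(1, 1))   # a 1x1 grid of blocks, same effect as above
</pre></div>
</div>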
<p>Finally, we fetch the data back from the GPU and display it, together with the
original <em>a</em>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">a_doubled</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">empty_like</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_dtoh</span><span class="p">(</span><span class="n">a_doubled</span><span class="p">,</span> <span class="n">a_gpu</span><span class="p">)</span>
<span class="k">print</span> <span class="n">a_doubled</span>
<span class="k">print</span> <span class="n">a</span>
</pre></div>
</div>
<p>This will print something like this:</p>
<div class="highlight-python"><div class="highlight"><pre>[[ 0.51360393  1.40589952  2.25009012  3.02563429]
 [-0.75841576 -1.18757617  2.72269917  3.12156057]
 [ 0.28826082 -2.92448163  1.21624792  2.86353827]
 [ 1.57651746  0.63500965  2.21570683 -0.44537592]]
[[ 0.25680196  0.70294976  1.12504506  1.51281714]
 [-0.37920788 -0.59378809  1.36134958  1.56078029]
 [ 0.14413041 -1.46224082  0.60812396  1.43176913]
 [ 0.78825873  0.31750482  1.10785341 -0.22268796]]
</pre></div>
</div>
<p>It worked! That completes our walkthrough. Thankfully, PyCuda takes
over from here and does all the cleanup for you, so you&#8217;re done.
Stick around for some bonus material in the next section, though.</p>
<p>(You can find the code for this demo as <tt class="file docutils literal"><span class="pre">examples/demo.py</span></tt> in the PyCuda
source distribution.)</p>
<div class="section" id="shortcuts-for-explicit-memory-copies">
<h3>Shortcuts for Explicit Memory Copies<a class="headerlink" href="#shortcuts-for-explicit-memory-copies" title="Permalink to this headline">¶</a></h3>
<p>The <a class="reference internal" href="driver.html#pycuda.driver.In" title="pycuda.driver.In"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.In</span></tt></a>, <a class="reference internal" href="driver.html#pycuda.driver.Out" title="pycuda.driver.Out"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.Out</span></tt></a>, and
<a class="reference internal" href="driver.html#pycuda.driver.InOut" title="pycuda.driver.InOut"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.InOut</span></tt></a> argument handlers can simplify some of the memory
transfers. For example, instead of creating <em>a_gpu</em>, if replacing <em>a</em> is fine,
the following code can be used:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">func</span><span class="p">(</span><span class="n">cuda</span><span class="o">.</span><span class="n">InOut</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div>
</div>
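<p><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.InOut</span></tt> copies <em>a</em> to the device before the call and copies
the result back into <em>a</em> afterwards, so the whole round trip shrinks to (a sketch reusing
the <em>func</em> from above):</p>
<div class="highlight-python"><div class="highlight"><pre>a = numpy.random.randn(4, 4).astype(numpy.float32)
func(cuda.InOut(a), block=(4, 4, 1))
print a   # a now holds the doubled values
</pre></div>
</div>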
</div>
<div class="section" id="prepared-invocations">
<h3>Prepared Invocations<a class="headerlink" href="#prepared-invocations" title="Permalink to this headline">¶</a></h3>
<p>Function invocation using the built-in <a class="reference internal" href="driver.html#pycuda.driver.Function.__call__" title="pycuda.driver.Function.__call__"><tt class="xref py py-meth docutils literal"><span class="pre">pycuda.driver.Function.__call__()</span></tt></a>
method incurs overhead for type identification (see <a class="reference internal" href="driver.html#reference-doc"><em>Device Interface</em></a>). To
achieve the same effect as above without this overhead, the function is bound
to argument types (as designated by Python&#8217;s standard library <a class="reference external" href="http://docs.python.org/dev/library/struct.html#module-struct" title="(in Python v3.5)"><tt class="xref py py-mod docutils literal"><span class="pre">struct</span></tt></a>
module), and then called. This also avoids having to assign explicit argument
sizes using the <cite>numpy.number</cite> classes:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">func</span><span class="o">.</span><span class="n">prepare</span><span class="p">(</span><span class="s">&quot;P&quot;</span><span class="p">,</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">func</span><span class="o">.</span><span class="n">prepared_call</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">a_gpu</span><span class="p">)</span>
</pre></div>
</div>
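<p>(In later PyCuda releases the block size is passed to <cite>prepared_call()</cite> rather
than to <cite>prepare()</cite>; if the call above complains about the <em>block</em> keyword, a sketch of
that form is:)</p>
<div class="highlight-python"><div class="highlight"><pre>func.prepare("P")
func.prepared_call((1, 1), (4, 4, 1), a_gpu)
</pre></div>
</div>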
</div>
</div>
<div class="section" id="bonus-abstracting-away-the-complications">
<h2>Bonus: Abstracting Away the Complications<a class="headerlink" href="#bonus-abstracting-away-the-complications" title="Permalink to this headline">¶</a></h2>
<p>Using a <a class="reference internal" href="array.html#pycuda.gpuarray.GPUArray" title="pycuda.gpuarray.GPUArray"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.gpuarray.GPUArray</span></tt></a>, the same effect can be
achieved with much less writing:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pycuda.gpuarray</span> <span class="kn">as</span> <span class="nn">gpuarray</span>
<span class="kn">import</span> <span class="nn">pycuda.driver</span> <span class="kn">as</span> <span class="nn">cuda</span>
<span class="kn">import</span> <span class="nn">pycuda.autoinit</span>
<span class="kn">import</span> <span class="nn">numpy</span>

<span class="n">a_gpu</span> <span class="o">=</span> <span class="n">gpuarray</span><span class="o">.</span><span class="n">to_gpu</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">))</span>
<span class="n">a_doubled</span> <span class="o">=</span> <span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">a_gpu</span><span class="p">)</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="k">print</span> <span class="n">a_doubled</span>
<span class="k">print</span> <span class="n">a_gpu</span>
</pre></div>
</div>
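<p><tt class="xref py py-class docutils literal"><span class="pre">GPUArray</span></tt> instances also support the usual elementwise arithmetic
directly on the device, and helpers such as <tt class="docutils literal"><span class="pre">gpuarray.zeros()</span></tt> mirror their
numpy counterparts; a small sketch:</p>
<div class="highlight-python"><div class="highlight"><pre>b_gpu = gpuarray.zeros((4, 4), dtype=numpy.float32)   # all-zero array living on the GPU
c_gpu = a_gpu + 2*b_gpu                               # elementwise, stays on the device
print c_gpu.get()                                     # copy the result back and print it
</pre></div>
</div>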
</div>
<div class="section" id="advanced-topics">
<h2>Advanced Topics<a class="headerlink" href="#advanced-topics" title="Permalink to this headline">¶</a></h2>
<div class="section" id="structures">
<h3>Structures<a class="headerlink" href="#structures" title="Permalink to this headline">¶</a></h3>
<p>(contributed by Nicholas Tung, find the code in <tt class="file docutils literal"><span class="pre">examples/demo_struct.py</span></tt>)</p>
<p>Suppose we have the following structure, for doubling a number of variable
length arrays:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">SourceModule</span><span class="p">(</span><span class="s">&quot;&quot;&quot;</span>
<span class="s">    struct DoubleOperation {</span>
<span class="s">        int datalen, __padding; // so 64-bit ptrs can be aligned</span>
<span class="s">        float *ptr;</span>
<span class="s">    };</span>

<span class="s">    __global__ void double_array(DoubleOperation *a) {</span>
<span class="s">        a = &amp;a[blockIdx.x];</span>
<span class="s">        for (int idx = threadIdx.x; idx &lt; a-&gt;datalen; idx += blockDim.x) {</span>
<span class="s">            a-&gt;ptr[idx] *= 2;</span>
<span class="s">        }</span>
<span class="s">    }</span>
<span class="s">    &quot;&quot;&quot;</span><span class="p">)</span>
</pre></div>
</div>
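<p>The <cite>__padding</cite> member is there so that the pointer starts on an 8-byte
boundary; assuming a 64-bit platform, one <cite>DoubleOperation</cite> therefore occupies 16 bytes:</p>
<div class="highlight-python"><div class="highlight"><pre># assumed layout of one DoubleOperation (64-bit device pointers):
#   offset 0: int   datalen    (4 bytes)
#   offset 4: int   __padding  (4 bytes, keeps the pointer aligned)
#   offset 8: float *ptr       (8 bytes)
# which matches mem_size = 8 + numpy.intp(0).nbytes in the wrapper class below
</pre></div>
</div>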
<p>Each block in the grid (see CUDA documentation) will double one of the arrays.
The <cite>for</cite> loop allows more data elements than threads to be doubled,
though it is not efficient if one can guarantee that there will be a sufficient
number of threads. Next, a wrapper class for the structure is created, and
two arrays are instantiated:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">class</span> <span class="nc">DoubleOpStruct</span><span class="p">:</span>
    <span class="n">mem_size</span> <span class="o">=</span> <span class="mi">8</span> <span class="o">+</span> <span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">nbytes</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">array</span><span class="p">,</span> <span class="n">struct_arr_ptr</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">to_device</span><span class="p">(</span><span class="n">array</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dtype</span> <span class="o">=</span> <span class="n">array</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">array</span><span class="o">.</span><span class="n">dtype</span>
        <span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">struct_arr_ptr</span><span class="p">),</span> <span class="n">numpy</span><span class="o">.</span><span class="n">int32</span><span class="p">(</span><span class="n">array</span><span class="o">.</span><span class="n">size</span><span class="p">))</span>
        <span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">struct_arr_ptr</span><span class="p">)</span> <span class="o">+</span> <span class="mi">8</span><span class="p">,</span> <span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">)))</span>
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">cuda</span><span class="o">.</span><span class="n">from_device</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dtype</span><span class="p">))</span>

<span class="n">struct_arr</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">mem_alloc</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">DoubleOpStruct</span><span class="o">.</span><span class="n">mem_size</span><span class="p">)</span>
<span class="n">do2_ptr</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">struct_arr</span><span class="p">)</span> <span class="o">+</span> <span class="n">DoubleOpStruct</span><span class="o">.</span><span class="n">mem_size</span>

<span class="n">array1</span> <span class="o">=</span> <span class="n">DoubleOpStruct</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">),</span> <span class="n">struct_arr</span><span class="p">)</span>
<span class="n">array2</span> <span class="o">=</span> <span class="n">DoubleOpStruct</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">),</span> <span class="n">do2_ptr</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">&quot;original arrays&quot;</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span> <span class="n">array2</span><span class="p">)</span>
</pre></div>
</div>
<p>This code uses the <a class="reference internal" href="driver.html#pycuda.driver.to_device" title="pycuda.driver.to_device"><tt class="xref py py-func docutils literal"><span class="pre">pycuda.driver.to_device()</span></tt></a> and
<a class="reference internal" href="driver.html#pycuda.driver.from_device" title="pycuda.driver.from_device"><tt class="xref py py-func docutils literal"><span class="pre">pycuda.driver.from_device()</span></tt></a> functions to allocate and copy values, and
demonstrates how offsets to an allocated block of memory can be used. Finally,
the code can be executed; the following demonstrates doubling both arrays, then
only the second:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">func</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">get_function</span><span class="p">(</span><span class="s">&quot;double_array&quot;</span><span class="p">)</span>
<span class="n">func</span><span class="p">(</span><span class="n">struct_arr</span><span class="p">,</span> <span class="n">block</span> <span class="o">=</span> <span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">grid</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">&quot;doubled arrays&quot;</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span> <span class="n">array2</span><span class="p">)</span>

<span class="n">func</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="n">do2_ptr</span><span class="p">),</span> <span class="n">block</span> <span class="o">=</span> <span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">grid</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">&quot;doubled second only&quot;</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span> <span class="n">array2</span><span class="p">,</span> <span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">)</span>
</pre></div>
</div>
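<p>Since each launch simply doubles the arrays it is pointed at, the values you
should see are:</p>
<div class="highlight-python"><div class="highlight"><pre># after the first call :  array1 -&gt; [ 2.  4.  6.]   array2 -&gt; [ 0.  8.]
# after the second call:  array1 -&gt; [ 2.  4.  6.]   array2 -&gt; [ 0. 16.]
</pre></div>
</div>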
</div>
</div>
<div class="section" id="where-to-go-from-here">
<h2>Where to go from here<a class="headerlink" href="#where-to-go-from-here" title="Permalink to this headline">¶</a></h2>
<p>Once you feel sufficiently familiar with the basics, feel free to dig into the
<a class="reference internal" href="driver.html#reference-doc"><em>Device Interface</em></a>. For more examples, check the in the <tt class="file docutils literal"><span class="pre">examples/</span></tt>
subdirectory of the distribution.  This folder also contains several benchmarks
to see the difference between GPU and CPU based calculations. As a reference for
how stuff is done, PyCuda&#8217;s test suite in the <tt class="file docutils literal"><span class="pre">test/</span></tt> subdirectory of the
distribution may also be of help.</p>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Tutorial</a><ul>
<li><a class="reference internal" href="#getting-started">Getting started</a></li>
<li><a class="reference internal" href="#transferring-data">Transferring Data</a></li>
<li><a class="reference internal" href="#executing-a-kernel">Executing a Kernel</a><ul>
<li><a class="reference internal" href="#shortcuts-for-explicit-memory-copies">Shortcuts for Explicit Memory Copies</a></li>
<li><a class="reference internal" href="#prepared-invocations">Prepared Invocations</a></li>
</ul>
</li>
<li><a class="reference internal" href="#bonus-abstracting-away-the-complications">Bonus: Abstracting Away the Complications</a></li>
<li><a class="reference internal" href="#advanced-topics">Advanced Topics</a><ul>
<li><a class="reference internal" href="#structures">Structures</a></li>
</ul>
</li>
<li><a class="reference internal" href="#where-to-go-from-here">Where to go from here</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="install.html"
                        title="previous chapter">Installation</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="driver.html"
                        title="next chapter">Device Interface</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="_sources/tutorial.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
<div id="searchbox" style="display: none">
  <h3>Quick search</h3>
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="driver.html" title="Device Interface"
             >next</a> |</li>
        <li class="right" >
          <a href="install.html" title="Installation"
             >previous</a> |</li>
        <li><a href="index.html">PyCUDA 2014.1 documentation</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer">
        &copy; Copyright 2008, Andreas Kloeckner.
      Last updated on Jan 22, 2015.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2.3.
    </div>
  </body>
</html>