Sophie

Sophie

distrib > Mageia > 5 > x86_64 > media > core-updates > by-pkgid > c044f82ec6193fba7e13c97913613b07 > files > 982

ipython-doc-2.3.0-2.3.mga5.noarch.rpm

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Parallel examples &mdash; IPython 2.3.0 documentation</title>
    
    <link rel="stylesheet" href="../_static/default.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '2.3.0',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <link rel="top" title="IPython 2.3.0 documentation" href="../index.html" />
    <link rel="up" title="Using IPython for parallel computing" href="index.html" />
    <link rel="next" title="DAG Dependencies" href="dag_dependencies.html" />
    <link rel="prev" title="Getting started with Windows HPC Server 2008" href="parallel_winhpc.html" /> 
  </head>
  <body>

<div style="background-color: white; text-align: left; padding: 10px 10px 15px 15px">
<a href="http://ipython.org/"><img src="../_static/logo.png" border="0" alt="IPython Documentation"/></a>
</div>

    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="dag_dependencies.html" title="DAG Dependencies"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="parallel_winhpc.html" title="Getting started with Windows HPC Server 2008"
             accesskey="P">previous</a> |</li>
        <li><a href="http://ipython.org">home</a>|&nbsp;</li>
        <li><a href="../search.html">search</a>|&nbsp;</li>
       <li><a href="../index.html">documentation </a> &raquo;</li>

          <li><a href="index.html" accesskey="U">Using IPython for parallel computing</a> &raquo;</li> 
      </ul>
    </div>

      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Parallel examples</a><ul>
<li><a class="reference internal" href="#million-digits-of-pi">150 million digits of pi</a><ul>
<li><a class="reference internal" href="#serial-calculation">Serial calculation</a></li>
<li><a class="reference internal" href="#parallel-calculation">Parallel calculation</a></li>
</ul>
</li>
<li><a class="reference internal" href="#conclusion">Conclusion</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="parallel_winhpc.html"
                        title="previous chapter">Getting started with Windows HPC Server 2008</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="dag_dependencies.html"
                        title="next chapter">DAG Dependencies</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="../_sources/parallel/parallel_demos.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
<div id="searchbox" style="display: none">
  <h3>Quick search</h3>
    <form class="search" action="../search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="parallel-examples">
<span id="id1"></span><h1>Parallel examples<a class="headerlink" href="#parallel-examples" title="Permalink to this headline">¶</a></h1>
<p>In this section we describe two more involved examples of using an IPython
cluster to perform a parallel computation. We will be doing some plotting,
so we start IPython with matplotlib integration by typing:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">ipython</span> <span class="o">--</span><span class="n">matplotlib</span>
</pre></div>
</div>
<p>at the system command line.
Or you can enable matplotlib integration at any point with:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [1]: </span><span class="o">%</span><span class="k">matplotlib</span>
</pre></div>
</div>
<div class="section" id="million-digits-of-pi">
<h2>150 million digits of pi<a class="headerlink" href="#million-digits-of-pi" title="Permalink to this headline">¶</a></h2>
<p>In this example we would like to study the distribution of digits in the
number pi (in base 10). While it is not known if pi is a normal number (a
number is normal in base 10 if 0-9 occur with equal likelihood) numerical
investigations suggest that it is. We will begin with a serial calculation on
10,000 digits of pi and then perform a parallel calculation involving 150
million digits.</p>
<p>In both the serial and parallel calculation we will be using functions defined
in the <tt class="file docutils literal"><span class="pre">pidigits.py</span></tt> file, which is available in the
<tt class="file docutils literal"><span class="pre">examples/parallel</span></tt> directory of the IPython source distribution.
These functions provide basic facilities for working with the digits of pi and
can be loaded into IPython by putting <tt class="file docutils literal"><span class="pre">pidigits.py</span></tt> in your current
working directory and then doing:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [1]: </span><span class="n">run</span> <span class="n">pidigits</span><span class="o">.</span><span class="n">py</span>
</pre></div>
</div>
<div class="section" id="serial-calculation">
<h3>Serial calculation<a class="headerlink" href="#serial-calculation" title="Permalink to this headline">¶</a></h3>
<p>For the serial calculation, we will use <a class="reference external" href="http://www.sympy.org">SymPy</a> to
calculate 10,000 digits of pi and then look at the frequencies of the digits
0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
SymPy is capable of calculating many more digits of pi, our purpose here is to
set the stage for the much larger parallel calculation.</p>
<p>In this example, we use two functions from <tt class="file docutils literal"><span class="pre">pidigits.py</span></tt>:
<tt class="xref py py-func docutils literal"><span class="pre">one_digit_freqs()</span></tt> (which calculates how many times each digit occurs)
and <tt class="xref py py-func docutils literal"><span class="pre">plot_one_digit_freqs()</span></tt> (which uses Matplotlib to plot the result).
Here is an interactive IPython session that uses these functions with
SymPy:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [7]: </span><span class="kn">import</span> <span class="nn">sympy</span>

<span class="gp">In [8]: </span><span class="n">pi</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">pi</span><span class="o">.</span><span class="n">evalf</span><span class="p">(</span><span class="mi">40</span><span class="p">)</span>

<span class="gp">In [9]: </span><span class="n">pi</span>
<span class="gh">Out[9]: </span><span class="go">3.141592653589793238462643383279502884197</span>

<span class="gp">In [10]: </span><span class="n">pi</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">pi</span><span class="o">.</span><span class="n">evalf</span><span class="p">(</span><span class="mi">10000</span><span class="p">)</span>

<span class="gp">In [11]: </span><span class="n">digits</span> <span class="o">=</span> <span class="p">(</span><span class="n">d</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="nb">str</span><span class="p">(</span><span class="n">pi</span><span class="p">)[</span><span class="mi">2</span><span class="p">:])</span>  <span class="c"># create a sequence of digits</span>

<span class="gp">In [13]: </span><span class="n">freqs</span> <span class="o">=</span> <span class="n">one_digit_freqs</span><span class="p">(</span><span class="n">digits</span><span class="p">)</span>

<span class="gp">In [14]: </span><span class="n">plot_one_digit_freqs</span><span class="p">(</span><span class="n">freqs</span><span class="p">)</span>
<span class="gh">Out[14]: </span><span class="go">[&lt;matplotlib.lines.Line2D object at 0x18a55290&gt;]</span>
</pre></div>
</div>
<p>The resulting plot of the single digit counts shows that each digit occurs
approximately 1,000 times, but that with only 10,000 digits the
statistical fluctuations are still rather large:</p>
<img alt="../_images/single_digits.png" src="../_images/single_digits.png" />
<p>It is clear that to reduce the relative fluctuations in the counts, we need
to look at many more digits of pi. That brings us to the parallel calculation.</p>
</div>
<div class="section" id="parallel-calculation">
<h3>Parallel calculation<a class="headerlink" href="#parallel-calculation" title="Permalink to this headline">¶</a></h3>
<p>Calculating many digits of pi is a challenging computational problem in itself.
Because we want to focus on the distribution of digits in this example, we
will use pre-computed digit of pi from the website of Professor Yasumasa
Kanada at the University of Tokyo (<a class="reference external" href="http://www.super-computing.org">http://www.super-computing.org</a>). These
digits come in a set of text files (<a class="reference external" href="ftp://pi.super-computing.org/.2/pi200m/">ftp://pi.super-computing.org/.2/pi200m/</a>)
that each have 10 million digits of pi.</p>
<p>For the parallel calculation, we have copied these files to the local hard
drives of the compute nodes. A total of 15 of these files will be used, for a
total of 150 million digits of pi. To make things a little more interesting we
will calculate the frequencies of all 2 digits sequences (00-99) and then plot
the result using a 2D matrix in Matplotlib.</p>
<p>The overall idea of the calculation is simple: each IPython engine will
compute the two digit counts for the digits in a single file. Then in a final
step the counts from each engine will be added up. To perform this
calculation, we will need two top-level functions from <tt class="file docutils literal"><span class="pre">pidigits.py</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre>    <span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    d = txt_file_to_digits(filename)</span>
<span class="sd">    freqs = one_digit_freqs(d)</span>
<span class="sd">    return freqs</span>

<span class="sd">def compute_two_digit_freqs(filename):</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">Read</span> <span class="n">digits</span> <span class="n">of</span> <span class="n">pi</span> <span class="kn">from</span> <span class="nn">a</span> <span class="nn">file</span> <span class="nn">and</span> <span class="nn">compute</span> <span class="nn">the</span> <span class="mi">2</span> <span class="n">digit</span> <span class="n">frequencies</span><span class="o">.</span>
    <span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    d = txt_file_to_digits(filename)</span>
<span class="sd">    freqs = two_digit_freqs(d)</span>
<span class="sd">    return freqs</span>

<span class="sd">def reduce_freqs(freqlist):</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">Add</span> <span class="n">up</span> <span class="n">a</span> <span class="nb">list</span> <span class="n">of</span> <span class="n">freq</span> <span class="n">counts</span> <span class="n">to</span> <span class="n">get</span> <span class="n">the</span> <span class="n">total</span> <span class="n">counts</span><span class="o">.</span>
</pre></div>
</div>
<p>We will also use the <tt class="xref py py-func docutils literal"><span class="pre">plot_two_digit_freqs()</span></tt> function to plot the
results. The code to run this calculation in parallel is contained in
<tt class="file docutils literal"><span class="pre">examples/parallel/parallelpi.py</span></tt>. This code can be run in parallel
using IPython by following these steps:</p>
<ol class="arabic simple">
<li>Use <strong class="command">ipcluster</strong> to start 15 engines. We used 16 cores of an SGE linux
cluster (1 controller + 15 engines).</li>
<li>With the file <tt class="file docutils literal"><span class="pre">parallelpi.py</span></tt> in your current working directory, open
up IPython, enable matplotlib, and type <tt class="docutils literal"><span class="pre">run</span> <span class="pre">parallelpi.py</span></tt>.  This will download
the pi files via ftp the first time you run it, if they are not
present in the Engines&#8217; working directory.</li>
</ol>
<p>When run on our 16 cores, we observe a speedup of 14.2x. This is slightly
less than linear scaling (16x) because the controller is also running on one of
the cores.</p>
<p>To emphasize the interactive nature of IPython, we now show how the
calculation can also be run by simply typing the commands from
<tt class="file docutils literal"><span class="pre">parallelpi.py</span></tt> interactively into IPython:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [1]: </span><span class="kn">from</span> <span class="nn">IPython.parallel</span> <span class="kn">import</span> <span class="n">Client</span>

<span class="go"># The Client allows us to use the engines interactively.</span>
<span class="go"># We simply pass Client the name of the cluster profile we</span>
<span class="go"># are using.</span>
<span class="gp">In [2]: </span><span class="n">c</span> <span class="o">=</span> <span class="n">Client</span><span class="p">(</span><span class="n">profile</span><span class="o">=</span><span class="s">&#39;mycluster&#39;</span><span class="p">)</span>
<span class="gp">In [3]: </span><span class="n">v</span> <span class="o">=</span> <span class="n">c</span><span class="p">[:]</span>

<span class="gp">In [3]: </span><span class="n">c</span><span class="o">.</span><span class="n">ids</span>
<span class="gh">Out[3]: </span><span class="go">[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]</span>

<span class="gp">In [4]: </span><span class="n">run</span> <span class="n">pidigits</span><span class="o">.</span><span class="n">py</span>

<span class="gp">In [5]: </span><span class="n">filestring</span> <span class="o">=</span> <span class="s">&#39;pi200m.ascii.</span><span class="si">%(i)02d</span><span class="s">of20&#39;</span>

<span class="go"># Create the list of files to process.</span>
<span class="gp">In [6]: </span><span class="n">files</span> <span class="o">=</span> <span class="p">[</span><span class="n">filestring</span> <span class="o">%</span> <span class="p">{</span><span class="s">&#39;i&#39;</span><span class="p">:</span><span class="n">i</span><span class="p">}</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">16</span><span class="p">)]</span>

<span class="gp">In [7]: </span><span class="n">files</span>
<span class="gh">Out[7]:</span>
<span class="go">[&#39;pi200m.ascii.01of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.02of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.03of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.04of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.05of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.06of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.07of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.08of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.09of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.10of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.11of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.12of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.13of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.14of20&#39;,</span>
<span class="go"> &#39;pi200m.ascii.15of20&#39;]</span>

<span class="go"># download the data files if they don&#39;t already exist:</span>
<span class="gp">In [8]: </span><span class="n">v</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">fetch_pi_file</span><span class="p">,</span> <span class="n">files</span><span class="p">)</span>

<span class="go"># This is the parallel calculation using the Client.map method</span>
<span class="go"># which applies compute_two_digit_freqs to each file in files in parallel.</span>
<span class="gp">In [9]: </span><span class="n">freqs_all</span> <span class="o">=</span> <span class="n">v</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">compute_two_digit_freqs</span><span class="p">,</span> <span class="n">files</span><span class="p">)</span>

<span class="go"># Add up the frequencies from each engine.</span>
<span class="gp">In [10]: </span><span class="n">freqs</span> <span class="o">=</span> <span class="n">reduce_freqs</span><span class="p">(</span><span class="n">freqs_all</span><span class="p">)</span>

<span class="gp">In [11]: </span><span class="n">plot_two_digit_freqs</span><span class="p">(</span><span class="n">freqs</span><span class="p">)</span>
<span class="gh">Out[11]: </span><span class="go">&lt;matplotlib.image.AxesImage object at 0x18beb110&gt;</span>

<span class="gp">In [12]: </span><span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s">&#39;2 digit counts of 150m digits of pi&#39;</span><span class="p">)</span>
<span class="gh">Out[12]: </span><span class="go">&lt;matplotlib.text.Text object at 0x18d1f9b0&gt;</span>
</pre></div>
</div>
<p>The resulting plot generated by Matplotlib is shown below. The colors indicate
which two digit sequences are more (red) or less (blue) likely to occur in the
first 150 million digits of pi. We clearly see that the sequence &#8220;41&#8221; is
most likely and that &#8220;06&#8221; and &#8220;07&#8221; are least likely. Further analysis would
show that the relative size of the statistical fluctuations have decreased
compared to the 10,000 digit calculation.</p>
<img alt="../_images/two_digit_counts.png" src="../_images/two_digit_counts.png" />
</div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion<a class="headerlink" href="#conclusion" title="Permalink to this headline">¶</a></h2>
<p>To conclude these examples, we summarize the key features of IPython&#8217;s
parallel architecture that have been demonstrated:</p>
<ul class="simple">
<li>Serial code can be parallelized often with only a few extra lines of code.
We have used the <tt class="xref py py-class docutils literal"><span class="pre">DirectView</span></tt> and <tt class="xref py py-class docutils literal"><span class="pre">LoadBalancedView</span></tt> classes
for this purpose.</li>
<li>The resulting parallel code can be run without ever leaving the IPython&#8217;s
interactive shell.</li>
<li>Any data computed in parallel can be explored interactively through
visualization or further numerical calculations.</li>
<li>We have run these examples on a cluster running RHEL 5 and Sun GridEngine.
IPython&#8217;s built in support for SGE (and other batch systems) makes it easy
to get started with IPython&#8217;s parallel capabilities.</li>
</ul>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="dag_dependencies.html" title="DAG Dependencies"
             >next</a> |</li>
        <li class="right" >
          <a href="parallel_winhpc.html" title="Getting started with Windows HPC Server 2008"
             >previous</a> |</li>
        <li><a href="http://ipython.org">home</a>|&nbsp;</li>
        <li><a href="../search.html">search</a>|&nbsp;</li>
       <li><a href="../index.html">documentation </a> &raquo;</li>

          <li><a href="index.html" >Using IPython for parallel computing</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer">
        &copy; Copyright The IPython Development Team.
      Last updated on Sep 02, 2015.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2.3.
    </div>
  </body>
</html>