Jekyll2017-05-03T23:20:40+00:00https://boorstat.github.io/boorstatboorstat — simply about useless.
Russian Writers CSV and some graphs2017-03-25T00:00:00+00:002017-03-25T00:00:00+00:00https://boorstat.github.io/lit/russian/2017/03/25/russian-writers-csv<p><img src="/images/lit/russian/writers-life.png" alt="Russian Writers Graph" /></p>
<p>I have always been curious how different Russian writers relate to each other in time.<br />
For example, could Dostoyevsky have met Gogol at all?<br />
We are going to create a CSV with brief info, such as years of life, for all Russian writers.<br />
I did not overthink it and simply parsed the <a href="https://en.wikipedia.org/wiki/List_of_Russian-language_writers">relevant Wikipedia page</a>, many thanks to it:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
<span class="n">WIKI_RUS_WRITERS_URL</span> <span class="o">=</span> <span class="s">'https://en.wikipedia.org/wiki/List_of_Russian-language_writers'</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">WIKI_RUS_WRITERS_URL</span><span class="p">)</span><span class="o">.</span><span class="n">content</span><span class="p">,</span> <span class="s">'lxml'</span><span class="p">)</span>
</code></pre>
</div>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">re</span>
<span class="n">all_lis</span> <span class="o">=</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">'div'</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="s">'mw-content-text'</span><span class="p">)</span><span class="o">.</span><span class="n">find_all</span><span class="p">(</span><span class="s">'li'</span><span class="p">)</span>
<span class="n">lis</span> <span class="o">=</span> <span class="p">[</span><span class="n">li</span><span class="o">.</span><span class="n">text</span> <span class="k">for</span> <span class="n">li</span> <span class="ow">in</span> <span class="n">all_lis</span> <span class="k">if</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s">r'> </span><span class="err">\</span><span class="s">('</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">li</span><span class="p">))]</span>
<span class="k">def</span> <span class="nf">extract_fields</span><span class="p">(</span><span class="n">li</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">li</span><span class="o">.</span><span class="n">partition</span><span class="p">(</span><span class="s">'('</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
    <span class="n">years</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s">r'</span><span class="err">\</span><span class="s">((.+?</span><span class="err">\</span><span class="s">))'</span><span class="p">,</span> <span class="n">li</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">years</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[0-9]{4}'</span><span class="p">,</span> <span class="n">years</span><span class="p">)</span>
    <span class="n">birth</span><span class="p">,</span> <span class="n">death</span> <span class="o">=</span> <span class="s">''</span><span class="p">,</span> <span class="s">''</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">years</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">birth</span> <span class="o">=</span> <span class="n">years</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">years</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">birth</span><span class="p">,</span> <span class="n">death</span> <span class="o">=</span> <span class="n">years</span>
    <span class="n">descr</span> <span class="o">=</span> <span class="n">li</span><span class="o">.</span><span class="n">partition</span><span class="p">(</span><span class="s">')'</span><span class="p">)[</span><span class="mi">2</span><span class="p">]</span>
    <span class="n">descr_parts</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">r', (?=[A-Z])'</span><span class="p">,</span> <span class="n">descr</span><span class="p">)</span>
    <span class="n">descr</span> <span class="o">=</span> <span class="n">descr_parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">strip</span><span class="p">(</span><span class="s">',.;'</span><span class="p">)</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
    <span class="n">works</span> <span class="o">=</span> <span class="s">''</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">descr_parts</span><span class="p">)</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">works</span> <span class="o">=</span> <span class="s">','</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">descr_parts</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
    <span class="k">return</span> <span class="n">name</span><span class="p">,</span> <span class="n">birth</span><span class="p">,</span> <span class="n">death</span><span class="p">,</span> <span class="n">descr</span><span class="p">,</span> <span class="n">works</span>
<span class="n">lis</span> <span class="o">=</span> <span class="p">[</span><span class="n">extract_fields</span><span class="p">(</span><span class="n">li</span><span class="p">)</span> <span class="k">for</span> <span class="n">li</span> <span class="ow">in</span> <span class="n">lis</span><span class="p">]</span>
<span class="n">lis</span> <span class="o">=</span> <span class="p">[</span><span class="n">li</span> <span class="k">for</span> <span class="n">li</span> <span class="ow">in</span> <span class="n">lis</span> <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="n">li</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">''</span> <span class="ow">and</span> <span class="n">li</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="s">''</span><span class="p">)]</span>
<span class="n">columns</span> <span class="o">=</span> <span class="p">(</span><span class="s">'name'</span><span class="p">,</span> <span class="s">'birth_year'</span><span class="p">,</span> <span class="s">'death_year'</span><span class="p">,</span> <span class="s">'about'</span><span class="p">,</span> <span class="s">'works'</span><span class="p">)</span>
<span class="n">separator</span> <span class="o">=</span> <span class="s">';'</span>
<span class="k">def</span> <span class="nf">save_to</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="n">s</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">';'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">columns</span><span class="p">))</span>
    <span class="n">s</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">li</span> <span class="ow">in</span> <span class="n">lis</span><span class="p">:</span>
        <span class="n">s</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">';'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">li</span><span class="p">))</span>
        <span class="n">s</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">StringIO</span>
<span class="n">csv</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">()</span>
<span class="n">save_to</span><span class="p">(</span><span class="n">csv</span><span class="p">)</span>
<span class="c"># # or local file</span>
<span class="c"># with open('russian-writers.csv', 'w') as f:</span>
<span class="c">#     save_to(f)</span>
<span class="k">print</span><span class="p">(</span><span class="n">csv</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()[:</span><span class="mi">1000</span><span class="p">],</span> <span class="s">'...'</span><span class="p">)</span>
</code></pre>
</div>
<div class="highlighter-rouge"><pre class="highlight"><code>name;birth_year;death_year;about;works
Alexander Ablesimov;1742;1783;opera librettist, poet, dramatist, satirist and journalist;
Fyodor Abramov;1920;1983;novelist and short story writer;Two Winters and Three Summers
Grigory Adamov;1886;1945;science fiction writer;The Mystery of the Two Oceans
Georgy Adamovich;1892;1972;poet, critic, memoirist, tanslator;
Alexander Afanasyev;1826;1871;folklorist who recorded and published over 600 Russian folktales and fairytales;Russian Fairy Tales
Alexander Afanasyev-Chuzhbinsky;1816;1875;poet, writer, ethnographer and translator;
Alexander Afinogenov;1904;1941;playwright;A Far Place
M. Ageyev;1898;1973;pseudonymous writer;Cocain Romance
Chinghiz Aitmatov;1928;2008;;Kyrgyz novelist and short story writer,Jamilya,The Day Lasts More Than a Hundred Years
David Aizman;1869;1922;;Russian-Jewish writer and playwright
Bella Akhmadulina;1937;2010;modern poet;The String
Anna Akhmatova;1889;1966;acmeist poet;Requiem
Ivan Aksakov;1823;1886;journalist, slavophile ...
</code></pre>
</div>
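<p>To see how the field extraction behaves, here is a minimal, self-contained sketch of the same steps applied to two entries. The input strings are assumptions about how the list items looked, reconstructed from the CSV rows above rather than copied from Wikipedia, and <code>extract_fields_sketch</code> is a hypothetical name used only for this illustration.</p>

```python
import re


def extract_fields_sketch(li):
    """Split one list entry into (name, birth, death, descr, works)."""
    # Everything before the first '(' is the name.
    name = li.partition('(')[0].strip()
    # Four-digit years found inside the first parenthesised group.
    years = re.findall(r'[0-9]{4}', re.search(r'\((.+?)\)', li).group(0))
    birth, death = '', ''
    if len(years) == 1:
        birth = years[0]
    elif len(years) == 2:
        birth, death = years
    # The text after the first ')' holds the description and the works.
    # It is split wherever ', ' is followed by an uppercase letter.
    descr_parts = re.split(r', (?=[A-Z])', li.partition(')')[2])
    descr = descr_parts[0].strip().strip(',.;').strip()
    works = ','.join(descr_parts[1:])
    return name, birth, death, descr, works


# Assumed input format (hypothetical strings, not fetched from Wikipedia).
abramov = ('Fyodor Abramov (1920-1983), novelist and short story writer, '
           'Two Winters and Three Summers')
aizman = 'David Aizman (1869-1922), Russian-Jewish writer and playwright'

print(extract_fields_sketch(abramov))
print(extract_fields_sketch(aizman))
```

<p>The second entry shows why a few rows in the CSV have an empty <code>about</code> column: when the description itself starts with a capital letter, the split moves it into <code>works</code>.</p>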
<p><a href="https://raw.githubusercontent.com/boorstat/boorstat-files/master/lit/russian/russian-writers.csv">Russian Writers CSV</a><br />
<a href="https://github.com/boorstat/boorstat-files/blob/master/lit/russian/russian-writers.csv">The same file on GitHub</a></p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">plotly.plotly</span> <span class="kn">as</span> <span class="nn">py</span>
<span class="kn">import</span> <span class="nn">plotly.graph_objs</span> <span class="kn">as</span> <span class="nn">go</span>
<span class="n">RUS_WITERS_CSV</span> <span class="o">=</span> <span class="s">'https://raw.githubusercontent.com/boorstat/boorstat-files/master/lit/russian/russian-writers.csv'</span>
</code></pre>
</div>
<p>Loading the CSV as a pandas DataFrame:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">RUS_WITERS_CSV</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">),</span> <span class="n">index_col</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">';'</span><span class="p">)</span>
</code></pre>
</div>
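<p>A note for readers on a newer pandas: <code>DataFrame.from_csv</code> was deprecated and later removed, so the call above may not run there. A minimal sketch of the same load with <code>pd.read_csv</code> follows. It reads a small inline sample instead of the real file; the third row is hypothetical (not real data) and is there only to show what happens when the death year is missing.</p>

```python
from io import StringIO

import pandas as pd

# Two rows copied from the CSV preview above plus one hypothetical row
# (not real data) for a writer without a death year.
SAMPLE_CSV = (
    'name;birth_year;death_year;about;works\n'
    'Fyodor Abramov;1920;1983;novelist and short story writer;'
    'Two Winters and Three Summers\n'
    'Grigory Adamov;1886;1945;science fiction writer;'
    'The Mystery of the Two Oceans\n'
    'Hypothetical Writer;1950;;poet;\n'
)

# read_csv takes the same separator argument as the from_csv call above.
df_sample = pd.read_csv(StringIO(SAMPLE_CSV), sep=';')

print(df_sample.dtypes)
print(df_sample[['name', 'birth_year', 'death_year']])
```

<p>The missing death year is read as <code>NaN</code>, which turns the whole <code>death_year</code> column into floats. That is why the plotting code further down wraps the years in <code>int()</code> and checks <code>math.isnan()</code>.</p>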
<p>If you have a data frame, you have a graph :)</p>
<p>First, a prettier multi-color Gantt-style graph,<br />
which is also much easier to code.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">plotly.figure_factory</span> <span class="kn">as</span> <span class="nn">FF</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="n">FAMOUS_RUS_WRITERS</span> <span class="o">=</span> <span class="p">[</span>
    <span class="s">'Leo Tolstoy'</span><span class="p">,</span>
    <span class="s">'Fyodor Dostoyevsky'</span><span class="p">,</span>
    <span class="s">'Mikhail Bulgakov'</span><span class="p">,</span>
    <span class="s">'Aleksandr Solzhenitsyn'</span><span class="p">,</span>
    <span class="s">'Alexander Pushkin'</span><span class="p">,</span>
    <span class="s">'Ivan Turgenev'</span><span class="p">,</span>
    <span class="s">'Anton Chekhov'</span><span class="p">,</span>
    <span class="s">'Alexander Blok'</span><span class="p">,</span>
    <span class="s">'Ivan Bunin'</span><span class="p">,</span>
    <span class="s">'Marina Tsvetaeva'</span><span class="p">,</span>
    <span class="s">'Nikolai Gogol'</span><span class="p">,</span>
    <span class="s">'Mikhail Lermontov'</span><span class="p">,</span>
    <span class="s">'Maxim Gorky'</span><span class="p">,</span>
    <span class="s">'Boris Pasternak'</span><span class="p">,</span>
    <span class="s">'Vladimir Mayakovsky'</span><span class="p">,</span>
    <span class="s">'Ivan Goncharov'</span><span class="p">,</span>
    <span class="s">'Nikolai Leskov'</span><span class="p">,</span>
    <span class="s">'Mikhail Saltykov-Shchedrin'</span><span class="p">,</span>
    <span class="s">'Sergei Yesenin'</span><span class="p">,</span>
    <span class="s">'Isaak Babel'</span><span class="p">,</span>
    <span class="s">'Andrei Bely'</span><span class="p">,</span>
    <span class="s">'Ivan Krylov'</span><span class="p">,</span>
    <span class="s">'Osip Mandelstam'</span><span class="p">,</span>
    <span class="s">'Mikhail Sholokhov'</span><span class="p">,</span>
    <span class="s">'Anna Akhmatova'</span><span class="p">,</span>
    <span class="s">'Nikolay Nekrasov'</span><span class="p">,</span>
<span class="p">]</span>
<span class="n">df_gantt</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'name'</span><span class="p">,</span> <span class="s">'birth_year'</span><span class="p">,</span> <span class="s">'death_year'</span><span class="p">]]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">df_gantt</span><span class="o">.</span><span class="n">rename</span><span class="p">(</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">{</span>
        <span class="s">'name'</span><span class="p">:</span> <span class="s">'Task'</span><span class="p">,</span>
        <span class="s">'birth_year'</span><span class="p">:</span> <span class="s">'Start'</span><span class="p">,</span>
        <span class="s">'death_year'</span><span class="p">:</span> <span class="s">'Finish'</span><span class="p">},</span>
    <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_gantt</span> <span class="o">=</span> <span class="n">df_gantt</span><span class="p">[</span><span class="n">df_gantt</span><span class="p">[</span><span class="s">'Task'</span><span class="p">]</span><span class="o">.</span><span class="nb">map</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="nb">any</span><span class="p">([</span><span class="n">w</span> <span class="ow">in</span> <span class="n">v</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">FAMOUS_RUS_WRITERS</span><span class="p">]))]</span>
<span class="n">df_gantt_birth</span> <span class="o">=</span> <span class="n">df_gantt</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">df_gantt_birth</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s">'Start'</span><span class="p">],</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_gantt_birth</span><span class="o">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_gantt_birth</span><span class="p">[</span><span class="s">'Task'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_birth</span><span class="p">[[</span><span class="s">'Task'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_gantt_birth</span><span class="p">[</span><span class="s">'Start'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_birth</span><span class="p">[[</span><span class="s">'Start'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{}-12-31'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]))),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_gantt_birth</span><span class="p">[</span><span class="s">'Finish'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_birth</span><span class="p">[[</span><span class="s">'Finish'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{}-12-31'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]))),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">FF</span><span class="o">.</span><span class="n">create_gantt</span><span class="p">(</span>
    <span class="n">df_gantt_birth</span><span class="p">,</span> <span class="n">showgrid_x</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">showgrid_y</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">title</span><span class="o">=</span><span class="s">'Famous Russian Writers Years of Life sorted by Birth'</span><span class="p">)</span>
<span class="n">py</span><span class="o">.</span><span class="n">iplot</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s">'russian-writers-years-of-life-start-sorted'</span><span class="p">,</span> <span class="n">world_readable</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre>
</div>
<iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="https://plot.ly/~boorstat/32.embed" height="600px" width="900px"></iframe>
<p>And the same list of writers, sorted by length of life:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">df_gantt_len</span> <span class="o">=</span> <span class="n">df_gantt</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">df_gantt_len</span><span class="p">[</span><span class="s">'life_len'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_len</span><span class="p">[[</span><span class="s">'Start'</span><span class="p">,</span> <span class="s">'Finish'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">])),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_gantt_len</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s">'life_len'</span><span class="p">],</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_gantt_len</span><span class="o">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df_gantt_len</span><span class="p">[</span><span class="s">'Task'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_len</span><span class="p">[[</span><span class="s">'Task'</span><span class="p">,</span> <span class="s">'life_len'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{name}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="nb">len</span><span class="o">=</span><span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_gantt_len</span><span class="p">[</span><span class="s">'Start'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_len</span><span class="p">[[</span><span class="s">'Start'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{}-12-31'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]))),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_gantt_len</span><span class="p">[</span><span class="s">'Finish'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_gantt_len</span><span class="p">[[</span><span class="s">'Finish'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
    <span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{}-12-31'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">math</span><span class="o">.</span><span class="n">isnan</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="k">else</span> <span class="s">'2020'</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">FF</span><span class="o">.</span><span class="n">create_gantt</span><span class="p">(</span>
    <span class="n">df_gantt_len</span><span class="p">,</span> <span class="n">showgrid_x</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">showgrid_y</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">title</span><span class="o">=</span><span class="s">'Famous Russian Writers Years of Life sorted by Length of Life'</span><span class="p">)</span>
<span class="n">py</span><span class="o">.</span><span class="n">iplot</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s">'russian-writers-years-of-life-len-sorted'</span><span class="p">,</span> <span class="n">world_readable</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre>
</div>
<iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="https://plot.ly/~boorstat/34.embed" height="600px" width="900px"></iframe>
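<p>The charts answer the opening question visually. As a rough sketch, the same check can be done in a few lines of code. The years below are typed in by hand rather than read from the CSV, <code>lifetimes_overlap</code> is a hypothetical helper name, and the comparison works at year granularity only, so it says nothing about whether two people actually met.</p>

```python
def lifetimes_overlap(a, b):
    """Return the (start, end) years both writers were alive, or None."""
    start = max(a[0], b[0])  # the later of the two birth years
    end = min(a[1], b[1])    # the earlier of the two death years
    return (start, end) if start <= end else None


# (birth_year, death_year), typed in by hand for this illustration.
gogol = (1809, 1852)
dostoyevsky = (1821, 1881)
pushkin = (1799, 1837)
chekhov = (1860, 1904)

print(lifetimes_overlap(gogol, dostoyevsky))
print(lifetimes_overlap(pushkin, chekhov))
```

<p>By this check Gogol and Dostoyevsky shared the years from 1821 to 1852, while Pushkin and Chekhov did not overlap at all.</p>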
<p>Note the “EDIT CHART” button at the bottom of each graph.<br />
It is a handy button, many thanks to <a href="https://plot.ly">Plotly</a>.<br />
Click it and the Online Graph Maker opens.<br /></p>
<p>However, it does not seem possible to get a data grid that can be filtered by writer in the Online Graph Maker with the current <a href="https://github.com/plotly/plotly.py/blob/v2.0.0/plotly/figure_factory/_gantt.py#L582">create_gantt()</a> implementation.<br />
That is why we are going to create the next graph.<br />
It is a little more complicated, but it can be filtered out of the box through the Plotly Online Graph Maker.</p>
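<p>The chart that follows relies on a small trick: each writer gets two stacked horizontal bar segments, an invisible one that runs up to the birth year and a visible one whose length is the lifespan. Here is a minimal sketch of that arithmetic without any plotting; <code>stacked_segments</code> is a hypothetical helper name, and Gogol's years are typed in by hand.</p>

```python
def stacked_segments(birth_year, death_year):
    """Return (offset, length) for one horizontal stacked bar."""
    offset = birth_year               # invisible segment: 0 .. birth_year
    length = death_year - birth_year  # visible segment stacked after it
    return offset, length


offset, length = stacked_segments(1809, 1852)
# The visible segment ends where the two stacked parts add up.
print(offset, length, offset + length)
```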
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">df_bars</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">df_bars</span><span class="p">[</span><span class="s">'birth_year'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_bars</span><span class="p">[[</span><span class="s">'birth_year'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_bars</span><span class="p">[</span><span class="s">'death_year'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_bars</span><span class="p">[[</span><span class="s">'death_year'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="nb">int</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">math</span><span class="o">.</span><span class="n">isnan</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="k">else</span> <span class="mi">2020</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">df_bars</span><span class="p">[</span><span class="s">'life_len'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df_bars</span><span class="p">[[</span><span class="s">'birth_year'</span><span class="p">,</span> <span class="s">'death_year'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">offsets</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Bar</span><span class="p">(</span>
<span class="n">y</span><span class="o">=</span><span class="n">df_bars</span><span class="p">[</span><span class="s">'name'</span><span class="p">],</span>
<span class="n">x</span><span class="o">=</span><span class="n">df_bars</span><span class="p">[</span><span class="s">'birth_year'</span><span class="p">],</span>
<span class="n">name</span><span class="o">=</span><span class="s">'birth'</span><span class="p">,</span>
<span class="n">orientation</span> <span class="o">=</span> <span class="s">'h'</span><span class="p">,</span>
<span class="n">opacity</span><span class="o">=</span><span class="mi">0</span>
<span class="p">)</span>
<span class="n">lifes</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Bar</span><span class="p">(</span>
<span class="n">y</span><span class="o">=</span><span class="n">df_bars</span><span class="p">[</span><span class="s">'name'</span><span class="p">],</span>
<span class="n">x</span><span class="o">=</span><span class="n">df_bars</span><span class="p">[</span><span class="s">'life_len'</span><span class="p">],</span>
<span class="n">name</span><span class="o">=</span><span class="s">'life len'</span><span class="p">,</span>
<span class="n">orientation</span> <span class="o">=</span> <span class="s">'h'</span><span class="p">,</span>
<span class="c"># the hover label goes in `text`; `hoverinfo` only selects which fields are shown</span>
<span class="n">text</span><span class="o">=</span><span class="n">df_bars</span><span class="p">[[</span><span class="s">'name'</span><span class="p">,</span> <span class="s">'death_year'</span><span class="p">]]</span><span class="o">.</span><span class="nb">apply</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">v</span><span class="p">:</span> <span class="s">'{name} ({year})'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">v</span><span class="p">[</span><span class="s">'name'</span><span class="p">],</span> <span class="n">year</span><span class="o">=</span><span class="n">v</span><span class="p">[</span><span class="s">'death_year'</span><span class="p">]),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span>
<span class="n">hoverinfo</span><span class="o">=</span><span class="s">'text'</span>
<span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">offsets</span><span class="p">,</span> <span class="n">lifes</span><span class="p">]</span>
<span class="n">layout</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Layout</span><span class="p">(</span>
<span class="n">barmode</span><span class="o">=</span><span class="s">'stack'</span><span class="p">,</span>
<span class="n">showlegend</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">margin</span><span class="o">=</span><span class="p">{</span><span class="s">'l'</span><span class="p">:</span> <span class="mi">200</span><span class="p">},</span>
<span class="n">xaxis</span><span class="o">=</span><span class="p">{</span>
<span class="s">'autorange'</span><span class="p">:</span> <span class="bp">False</span><span class="p">,</span>
<span class="s">'range'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1650</span><span class="p">,</span> <span class="mi">2020</span><span class="p">]}</span>
<span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Figure</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">layout</span><span class="p">)</span>
<span class="n">py</span><span class="o">.</span><span class="n">iplot</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s">'russian-writers-life-bars'</span><span class="p">)</span>
</code></pre>
</div>
<iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="https://plot.ly/~boorstat/36.embed" height="525px" width="100%"></iframe>
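<p>The chart above relies on a stacking trick: each writer gets an invisible bar from zero to the birth year and a visible bar whose length is the life span. Below is a minimal sketch of that arithmetic without pandas or Plotly. The function name <code>life_segments</code> and the tuple layout are assumptions made for illustration; they are not part of the original script.</p>

```python
def life_segments(writers):
    """Return (name, offset, length) triples for a stacked horizontal bar chart.

    `writers` is a hypothetical list of (name, birth_year, death_year) tuples.
    The offset segment runs from 0 to the birth year and is meant to be drawn
    fully transparent; the length segment is the visible life span.
    """
    segments = []
    for name, birth_year, death_year in writers:
        segments.append((name, birth_year, death_year - birth_year))
    return segments


# Well-known dates, used here only as sample input.
sample = [
    ('Nikolai Gogol', 1809, 1852),
    ('Fyodor Dostoyevsky', 1821, 1881),
]

for name, offset, length in life_segments(sample):
    # The visible bar starts at `offset` and ends at `offset + length`.
    print(name, offset, offset + length)
```

<p>Two writers could have met only if their visible bars overlap on the year axis, which is what the chart lets you check at a glance.</p>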
<p>Click “EDIT CHART” at the bottom of the graph.<br />
After about 10 seconds of loading you should see something like this:</p>
<p><a href="https://boorstat.github.io/images/plolty-online-graph-maker/after-open.png"><img src="https://boorstat.github.io/images/plolty-online-graph-maker/after-open.png" /></a></p>
<p>Select “Filter” in the menu and click the “+ Filter” button.<br />
Fill in the fields like this:</p>
<p><a href="https://boorstat.github.io/images/plolty-online-graph-maker/after-filter-added.png"><img src="https://boorstat.github.io/images/plolty-online-graph-maker/after-filter-added.png" /></a></p>
<p>Try experimenting with the list of writers shown on the graph.<br />
You can then export the result as an image, as data (such as JSON), as code (Python, Node.js and others) or even as HTML.<br />
These features require a Plot.ly account, though.<br />
Once logged in, return to Graph Maker and click the “Save” button.<br />
The saved graph then appears in Your Files, and any file there can be exported to the format you need.</p>
<p><a href="https://boorstat.github.io/images/plolty-online-graph-maker/export-dlg-preview.png"><img src="https://boorstat.github.io/images/plolty-online-graph-maker/export-dlg-preview.png" /></a></p>
<p>Hope you enjoyed it!</p>Pushkin’s Duels CSV and graph2017-03-11T00:00:00+00:002017-03-11T00:00:00+00:00https://boorstat.github.io/pushkin/duels/2017/03/11/pushkin-duels-csv<p><img src="https://boorstat.github.io/images/lit/pushkin/duels-csv.png" alt="Pushkin Duels CSV" /></p>
<p>More than 25 of Pushkin’s duels are known.<br />
Let’s make a CSV file containing the details of these duels.<br />
Here is its full content:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span>
<span class="n">DUELS_CSV_URL</span> <span class="o">=</span> <span class="s">'https://raw.githubusercontent.com/boorstat/boorstat-files/master/lit/pushkin/duels.csv'</span>
<span class="k">print</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">DUELS_CSV_URL</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
</code></pre>
</div>
<div class="highlighter-rouge"><pre class="highlight"><code>year,opponent_name,opponent_descr,cause,pushkin_shot,opponent_shot
1816,Paul Hannibal,uncle,during a ball Paul lugged away Pushkin’s girlfriend miss Loshakova,0,0
1817,Pyotr Kaverin,friend,Kaverin’s facetious poem,0,0
1819,Kondratiy Ryleev,poet,Ryleev told a joke about Pushkin at a high society gathering,0,0
1819,Wilhelm Kiichelbecker,friend,funny verses about Küchelbecker,0,1
1819,Modest Korf,Ministry of justice worker,Pushkin’s drunk manservant pestered Korf’s servant who finally beat Pushkin’s servant up,0,0
1819,Denisevich,Major,Pushkin behaved provocatively in theater: he yelled at actors,0,0
1820,Orlov Fedor and Alexey Alexeev,,Orlov and Alexeev reprimanded Pushkin for being drunk and trying to play pool,0,0
1821,Deguilly,French military officer,a quarrel under unclear circumstances,0,0
1822,Semyon Starov,lieutenant colonel,a conflict occurred because of a restaurant orchestra at a casino where both indulged in gambling,1,1
1822,Ivan Lanov,65-year-old state councilor,a quarrel during a holiday dinner,0,0
1822,Todor Balsh,Moldavian nobleman,Balsh’s wife Maria responded to Pushkin’s question in an impolite manner,1,1
1822,Skartla Pruncul,Bessarabian landowner,they were seconds at someone else’s duel and could not agree upon the rules of the duel,0,0
1822,Seweryn Potocki,Active Privy Councillor,discussion about serfdom at the dinner table,0,0
1822,Rutkowski,captain,Pushkin did not believe that a hailstone can weigh up to 3 pounds (which is possible) and made fun of the retired captain,0,0
1822,Inglezi,Chisinau tycoon,Pushkin coveted his wife (a gypsy woman Ludmila Shekora),0,0
1832,Alexander Zubov,General Staff warrant officer,Pushkin had caught Zubov on cheating during a game of cards,0,1
1823,Ivan Rousseau,young writer,Pushkin’s personal dislike for this person,0,0
1826,Nikolay Turgenev,one of the leaders of the Union of Welfare and a member of the Northern Society,Tugrenev did not approve of Pushkin’s poem,0,0
1827,Vladimir Solomirskiy,artillery officer,the officer’s female friend Sofia to whom Pushkin was personally attracted,0,0
1828,Alexander Golitsyn,Minister of Education,Pushkin wrote a bold epigram so the Minister arranged a rough interrogation,0,0
1828,Lagrenée,French Embassy Secretary in St.Petersburg,an unknown girl at a ball,0,0
1829,Mr. Hvostov,Foreign Office worker,Hvostov was dissatisfied by Pushkin’s epigrams,0,0
1836,Nikolay Repin,,Repin was dissatisfied with Pushkin’s poems about him,0,0
1836,Semyon Hlustin,Foreign Office worker,Hlustin did not approve of Pushkin’s poetry,0,0
1836,Vladimir Sollogub,minor Russian writer,Sologub’s unflattering remarks about the poet’s wife Natalia,0,0
1836,George d’Anthès,French officer,an anonymous letter which stated that Pushkin’s wife had been cheating on her husband with d’Anthès,0,0
1837,George d’Anthès,French officer,an anonymous letter which stated that Pushkin’s wife had been cheating on her husband with d’Anthès,1,1
</code></pre>
</div>
<p><a href="https://raw.githubusercontent.com/boorstat/boorstat-files/master/lit/pushkin/duels.csv">Pushkin’s duels csv</a><br /></p>
<p>Sources for this data:<br />
<a href="https://rinatim.com/2016/12/03/alexander-pushkins-duels/">https://rinatim.com/2016/12/03/alexander-pushkins-duels/</a><br />
<a href="http://d-push.net">http://d-push.net</a><br />
<a href="https://ru.wikipedia.org/wiki/Пушкин,_Александр_Сергеевич">https://ru.wikipedia.org/wiki/Пушкин,_Александр_Сергеевич</a><br /></p>
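<p>Before bringing in pandas, the file can also be read with the standard library alone. The sketch below parses three rows copied from the CSV above and counts the duels in which at least one shot was fired. The helper name <code>load_duels</code> is an assumption made for this example.</p>

```python
import csv
from io import StringIO

# Header and three rows copied from the CSV above.
SAMPLE_CSV = """year,opponent_name,opponent_descr,cause,pushkin_shot,opponent_shot
1821,Deguilly,French military officer,a quarrel under unclear circumstances,0,0
1822,Semyon Starov,lieutenant colonel,a conflict occurred because of a restaurant orchestra at a casino where both indulged in gambling,1,1
1822,Ivan Lanov,65-year-old state councilor,a quarrel during a holiday dinner,0,0
"""


def load_duels(text):
    """Parse the CSV text into dicts, converting the numeric columns to int."""
    duels = []
    for row in csv.DictReader(StringIO(text)):
        for column in ('year', 'pushkin_shot', 'opponent_shot'):
            row[column] = int(row[column])
        duels.append(row)
    return duels


duels = load_duels(SAMPLE_CSV)
# A duel counts as "fired" when either side actually shot.
fired = [d for d in duels if d['pushkin_shot'] or d['opponent_shot']]
print(len(duels), 'duels,', len(fired), 'with at least one shot')
```

<p>The last two columns are plain 0/1 flags, so they can be used directly as conditions once converted to integers.</p>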
<p>And now let’s use it to plot how many of Pushkin’s duels actually led to shots being fired.<br />
We start with the imports:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">StringIO</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">plotly.plotly</span> <span class="kn">as</span> <span class="nn">py</span>
<span class="kn">import</span> <span class="nn">plotly.graph_objs</span> <span class="kn">as</span> <span class="nn">go</span>
</code></pre>
</div>
<p>Load the CSV into a pandas DataFrame:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">DUELS_CSV_URL</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">))</span>
<span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</code></pre>
</div>
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>year</th>
<th>opponent_name</th>
<th>opponent_descr</th>
<th>cause</th>
<th>pushkin_shot</th>
<th>opponent_shot</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1816</td>
<td>Paul Hannibal</td>
<td>uncle</td>
<td>during a ball Paul lugged away Pushkin’s girlf...</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>1</th>
<td>1817</td>
<td>Pyotr Kaverin</td>
<td>friend</td>
<td>Kaverin’s facetious poem</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>2</th>
<td>1819</td>
<td>Kondratiy Ryleev</td>
<td>poet</td>
<td>Ryleev told a joke about Pushkin at a high soc...</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>3</th>
<td>1819</td>
<td>Wilhelm Kiichelbecker</td>
<td>friend</td>
<td>funny verses about Küchelbecker</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<th>4</th>
<td>1819</td>
<td>Modest Korf</td>
<td>Ministry of justice worker</td>
<td>Pushkin’s drunk manservant pestered Korf’s ser...</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
</div>
<p>Looks very exciting to me :)<br />
Time to plot a histogram of Pushkin’s duels.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">life_years</span> <span class="o">=</span> <span class="p">{</span><span class="s">'start'</span><span class="p">:</span> <span class="mi">1799</span><span class="p">,</span> <span class="s">'end'</span><span class="p">:</span> <span class="mi">1837</span><span class="p">,</span> <span class="s">'size'</span><span class="p">:</span> <span class="mi">1</span><span class="p">}</span>
<span class="n">nobody_shot_data</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Histogram</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">'pushkin_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">'opponent_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)][</span><span class="s">'year'</span><span class="p">],</span>
<span class="n">xbins</span><span class="o">=</span><span class="n">life_years</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'Nobody'</span>
<span class="p">)</span>
<span class="n">only_opponent_shot_data</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Histogram</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">'pushkin_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">'opponent_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)][</span><span class="s">'year'</span><span class="p">],</span>
<span class="n">xbins</span><span class="o">=</span><span class="n">life_years</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'Only opponent'</span>
<span class="p">)</span>
<span class="n">both_shot_data</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Histogram</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">'pushkin_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">'opponent_shot'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)][</span><span class="s">'year'</span><span class="p">],</span>
<span class="n">xbins</span><span class="o">=</span><span class="n">life_years</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">'Both shot'</span>
<span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">only_opponent_shot_data</span><span class="p">,</span> <span class="n">nobody_shot_data</span><span class="p">,</span> <span class="n">both_shot_data</span><span class="p">]</span>
<span class="n">layout</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Layout</span><span class="p">(</span><span class="n">barmode</span><span class="o">=</span><span class="s">'stack'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Pushkin's Duels Histogram"</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Figure</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">layout</span><span class="p">)</span>
<span class="n">py</span><span class="o">.</span><span class="n">iplot</span><span class="p">(</span><span class="n">fig</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s">'pushkin-duels'</span><span class="p">)</span>
</code></pre>
</div>
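<p>The three traces above split the duels by who fired and let Plotly do the per-year binning. The same grouping can be sketched with the standard library, which may help when checking the histogram by hand. The function names and the “Only Pushkin” label are assumptions for this example; the histogram above has no such trace because the data contains no such duel.</p>

```python
from collections import Counter


def outcome(pushkin_shot, opponent_shot):
    """Label a duel with the trace names used in the histogram."""
    if pushkin_shot and opponent_shot:
        return 'Both shot'
    if opponent_shot:
        return 'Only opponent'
    if pushkin_shot:
        return 'Only Pushkin'  # hypothetical label, not a trace in the chart
    return 'Nobody'


def count_by_year(rows):
    """Count duels per (year, outcome); rows are (year, pushkin_shot, opponent_shot)."""
    return Counter((year, outcome(p, o)) for year, p, o in rows)


# The four 1819 rows of the CSV, reduced to (year, pushkin_shot, opponent_shot).
rows_1819 = [(1819, 0, 0), (1819, 0, 1), (1819, 0, 0), (1819, 0, 0)]

for (year, label), count in sorted(count_by_year(rows_1819).items()):
    print(year, label, count)
```

<p>Each (year, label) count corresponds to the height of one coloured segment in the stacked histogram.</p>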
<iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="https://plot.ly/~boorstat/12.embed" height="525px" width="100%"></iframe>Visualised Dostoyevsky Idiot characters activity rate2017-03-05T00:00:00+00:002017-03-05T00:00:00+00:00https://boorstat.github.io/dostoyevsky/idiot/2017/03/05/dostoyevsky-idiot-characters<p>In the <a href="/dostoyevsky/idiot/2017/01/08/dostoyevsky-idiot-python-object.html">previous post</a> we generated <a href="https://github.com/boorstat/boorstat-files/raw/master/lit/dostoevsky/idiot.json">JSON</a> from the <a href="https://github.com/boorstat/boorstat-files/raw/master/lit/dostoevsky/The_Idiot.txt">text of The Idiot</a>.</p>
<p>In this post we’re going to use that data to visualise how active each character is from chapter to chapter.<br />
First of all we need a dict that maps each character to a list of the names, written as regular expressions, by which the text refers to them:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">CHARACTERS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'Prince Myshkin'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Lev Nikolayevich'</span><span class="p">,</span> <span class="s">'Lef Nicolayevitch'</span><span class="p">,</span> <span class="s">'Myshkin'</span><span class="p">,</span> <span class="s">r'prince(?! S</span><span class="err">\</span><span class="s">.)'</span><span class="p">],</span>
<span class="s">'Nastasya Philipovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Nastasia Philipovna'</span><span class="p">,</span> <span class="s">'Barashkova'</span><span class="p">],</span>
<span class="s">'Parfyon Semyonovich Rogozhin'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Parfyon'</span><span class="p">,</span> <span class="s">'Rogozhin'</span><span class="p">,</span> <span class="s">'Rogojin'</span><span class="p">],</span>
<span class="s">'General Ivan Fyodorovich Yepanchin'</span><span class="p">:</span> <span class="p">[</span><span class="s">'general'</span><span class="p">,</span> <span class="s">'Ivan Fyodorovich'</span><span class="p">],</span>
<span class="s">'Elizaveta Prokofyevna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Elizabetha'</span><span class="p">,</span> <span class="s">'Prokofievna'</span><span class="p">,</span> <span class="s">r'Mrs</span><span class="err">\</span><span class="s">. Epanchin'</span><span class="p">],</span>
<span class="s">'Alexandra Ivanovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Alexandra'</span><span class="p">],</span>
<span class="s">'Adelaida Ivanovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Adelaida'</span><span class="p">],</span>
<span class="s">'Aglaya Ivanovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Aglaya'</span><span class="p">],</span>
<span class="s">'General Ardalion Alexandrovich Ivolgin'</span><span class="p">:</span> <span class="p">[</span><span class="s">'general'</span><span class="p">,</span> <span class="s">'Ivolgin'</span><span class="p">,</span> <span class="s">'Ardalion'</span><span class="p">],</span>
<span class="s">'Nina Alexandrovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Nina'</span><span class="p">],</span>
<span class="s">'Gavrila Ardalionovich'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Gavrila'</span><span class="p">,</span> <span class="s">'Ganya'</span><span class="p">,</span> <span class="s">'Ganechka'</span><span class="p">,</span> <span class="s">'Ganka'</span><span class="p">],</span>
<span class="s">'Varvara Ardalionovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Varvara'</span><span class="p">],</span>
<span class="s">'Lukyan Timofeevich Lebedev'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Lukyan'</span><span class="p">,</span> <span class="s">'Lebedeff'</span><span class="p">],</span>
<span class="s">'Vera Lukyanovna'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Vera'</span><span class="p">],</span>
<span class="s">'Ippolit Terentyev'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Ippolit'</span><span class="p">],</span>
<span class="s">'Ivan Petrovich Ptitsyn'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Ivan Petrovich'</span><span class="p">,</span> <span class="s">'Ptitsin'</span><span class="p">],</span>
<span class="s">'Evgeny Pavlovich Radomsky'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Pavlovitch'</span><span class="p">,</span> <span class="s">'Radomski'</span><span class="p">],</span>
<span class="s">'Prince S.'</span><span class="p">:</span> <span class="p">[</span><span class="s">'prince S.'</span><span class="p">],</span>
<span class="s">'Afanasy Ivanovich Totsky'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Afanasy Ivanovitch'</span><span class="p">,</span> <span class="s">'Totski'</span><span class="p">],</span>
<span class="s">'Ferdyshchenko'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Ferdishenko'</span><span class="p">],</span>
<span class="s">'Keller'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Keller'</span><span class="p">],</span>
<span class="s">'Antip Burdovsky'</span><span class="p">:</span> <span class="p">[</span><span class="s">'Antip'</span><span class="p">,</span> <span class="s">'Burdovsky'</span><span class="p">]</span>
<span class="p">}</span>
</code></pre>
</div>
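<p>The entry for Prince Myshkin uses a negative lookahead, <code>(?! S\.)</code>, so that mentions of “prince S.” are not credited to him. Here is a small sketch of how that pattern behaves. The sentence is made up, and the escaped pattern for Prince S. is an assumption for illustration (the dict above uses an unescaped dot).</p>

```python
import re

PRINCE_MYSHKIN = r'prince(?! S\.)'  # copied from CHARACTERS
PRINCE_S = r'prince S\.'            # escaped variant, assumed for this example

# Made-up sentence with two plain "prince" mentions and one "prince S."
text = 'The prince smiled. Then prince S. bowed to the prince.'

myshkin_count = len(re.findall(PRINCE_MYSHKIN, text))
prince_s_count = len(re.findall(PRINCE_S, text))
print(myshkin_count, prince_s_count)
```

<p>The lookahead rejects the middle occurrence, so Myshkin gets two matches and Prince S. gets one. Note that the patterns are case-sensitive as written, so a capitalised “Prince” would not be counted.</p>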
<p><a href="https://plot.ly">Plotly</a> and its Python API are used to visualise the data at the final stage.<br />
So we need a few imports for it:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">plotly.plotly</span> <span class="kn">as</span> <span class="nn">py</span>
<span class="kn">import</span> <span class="nn">plotly.graph_objs</span> <span class="kn">as</span> <span class="nn">go</span>
</code></pre>
</div>
<p>We also need to import our own Python package, python-boorstat.<br />
It can be installed easily with pip.<br />
Read <a href="/setup/">how</a>.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">boorstat.lit.dostoyevsky.idiot</span> <span class="kn">import</span> <span class="n">idiot</span>
</code></pre>
</div>
<p>This is the top level of our visualization script:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">roman</span> <span class="o">=</span> <span class="n">idiot</span><span class="o">.</span><span class="n">from_json</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">parse_parts</span><span class="p">(</span><span class="n">roman</span><span class="p">)</span>
<span class="n">traces</span> <span class="o">=</span> <span class="n">prepare_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">plot</span><span class="p">(</span><span class="n">traces</span><span class="p">)</span>
</code></pre>
</div>
<p>Final plot function:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot</span><span class="p">(</span><span class="n">traces</span><span class="p">):</span>
    <span class="n">layout</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Layout</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">'Idiot Characters'</span><span class="p">)</span>
    <span class="n">fig</span> <span class="o">=</span> <span class="n">go</span><span class="o">.</span><span class="n">Figure</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">traces</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">layout</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">py</span><span class="o">.</span><span class="n">iplot</span><span class="p">(</span>
        <span class="n">fig</span><span class="p">,</span>
        <span class="n">filename</span><span class="o">=</span><span class="s">'idiot-characters'</span><span class="p">,</span>
        <span class="n">sharing</span><span class="o">=</span><span class="s">'public'</span><span class="p">)</span>
</code></pre>
</div>
<p>A couple of functions that walk through the parts and chapters of The Idiot and compute the character rates:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">re</span>


<span class="k">def</span> <span class="nf">rate_characters</span><span class="p">(</span><span class="n">chapter</span><span class="p">):</span>
    <span class="c"># count the matches of every alias pattern in the chapter text</span>
    <span class="n">characters</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">char</span><span class="p">,</span> <span class="n">regexps</span> <span class="ow">in</span> <span class="n">CHARACTERS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">characters</span><span class="p">[</span><span class="n">char</span><span class="p">]</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="n">regex</span><span class="p">,</span> <span class="n">chapter</span><span class="p">[</span><span class="s">'text'</span><span class="p">],</span> <span class="n">re</span><span class="o">.</span><span class="n">U</span><span class="p">))</span> <span class="k">for</span> <span class="n">regex</span> <span class="ow">in</span> <span class="n">regexps</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">characters</span>


<span class="k">def</span> <span class="nf">parse_parts</span><span class="p">(</span><span class="n">roman</span><span class="p">):</span>
    <span class="n">data</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">part</span> <span class="ow">in</span> <span class="n">roman</span><span class="p">[</span><span class="s">'parts'</span><span class="p">]:</span>
        <span class="k">for</span> <span class="n">chapter</span> <span class="ow">in</span> <span class="n">part</span><span class="p">[</span><span class="s">'chapters'</span><span class="p">]:</span>
            <span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
                <span class="s">'chapter'</span><span class="p">:</span> <span class="s">'{} - {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">part</span><span class="p">[</span><span class="s">'title'</span><span class="p">],</span> <span class="n">chapter</span><span class="p">[</span><span class="s">'title'</span><span class="p">]),</span>
                <span class="s">'rates'</span><span class="p">:</span> <span class="n">rate_characters</span><span class="p">(</span><span class="n">chapter</span><span class="p">)})</span>
    <span class="k">return</span> <span class="n">data</span>
</code></pre>
</div>
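<p>To see what these two functions produce, here is a self-contained sketch that runs the same counting on a toy novel. The alias dict is a cut-down stand-in for <code>CHARACTERS</code>, the text is made up, and the name <code>count_mentions</code> is an assumption for this example.</p>

```python
import re

ALIASES = {  # cut-down, hypothetical stand-in for CHARACTERS
    'Aglaya Ivanovna': ['Aglaya'],
    'Parfyon Semyonovich Rogozhin': ['Parfyon', 'Rogozhin', 'Rogojin'],
}

toy_roman = {  # made-up text in the shape parse_parts expects
    'parts': [{
        'title': 'Part I',
        'chapters': [
            {'title': 'Chapter 1', 'text': 'Rogojin laughed. Parfyon Rogojin left.'},
            {'title': 'Chapter 2', 'text': 'Aglaya wrote to Rogojin.'},
        ],
    }],
}


def count_mentions(text, aliases=ALIASES):
    """Sum the regex matches over every alias of every character."""
    return {
        name: sum(len(re.findall(pattern, text)) for pattern in patterns)
        for name, patterns in aliases.items()
    }


for part in toy_roman['parts']:
    for chapter in part['chapters']:
        label = '{} - {}'.format(part['title'], chapter['title'])
        print(label, count_mentions(chapter['text']))
```

<p>One thing this makes visible: “Parfyon Rogojin” matches two aliases, so a single full-name mention adds two to the rate. That is a property of summing over aliases, and it is worth keeping in mind when reading the chart.</p>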
<p>And a larger chunk of code that converts the native Python data into Plotly objects ready for plotting:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">prepare_data</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
    <span class="c"># one empty filled-area trace per character</span>
    <span class="n">traces</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">character</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s">'rates'</span><span class="p">])):</span>
        <span class="n">traces</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">go</span><span class="o">.</span><span class="n">Scatter</span><span class="p">(</span>
            <span class="n">x</span><span class="o">=</span><span class="p">[],</span>
            <span class="n">y</span><span class="o">=</span><span class="p">[],</span>
            <span class="n">text</span><span class="o">=</span><span class="p">[],</span>
            <span class="n">fill</span><span class="o">=</span><span class="s">'tonexty'</span><span class="p">,</span>
            <span class="n">mode</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span>
            <span class="n">line</span><span class="o">=</span><span class="p">{</span><span class="s">'shape'</span><span class="p">:</span> <span class="s">'spline'</span><span class="p">},</span>
            <span class="n">hoverinfo</span><span class="o">=</span><span class="s">'text'</span><span class="p">,</span>
            <span class="n">name</span><span class="o">=</span><span class="n">character</span><span class="p">))</span>
    <span class="c"># running total per chapter, so each trace is stacked on the previous ones</span>
    <span class="n">sums</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">trace</span> <span class="ow">in</span> <span class="n">traces</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">chapter</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
            <span class="n">sums</span><span class="p">[</span><span class="n">chapter</span><span class="p">[</span><span class="s">'chapter'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">sums</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">chapter</span><span class="p">[</span><span class="s">'chapter'</span><span class="p">],</span> <span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="n">chapter</span><span class="p">[</span><span class="s">'rates'</span><span class="p">][</span><span class="n">trace</span><span class="p">[</span><span class="s">'name'</span><span class="p">]]</span>
            <span class="n">trace</span><span class="p">[</span><span class="s">'x'</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">chapter</span><span class="p">[</span><span class="s">'chapter'</span><span class="p">])</span>
            <span class="n">trace</span><span class="p">[</span><span class="s">'y'</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sums</span><span class="p">[</span><span class="n">chapter</span><span class="p">[</span><span class="s">'chapter'</span><span class="p">]])</span>
            <span class="n">trace</span><span class="p">[</span><span class="s">'text'</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
                <span class="s">'{} - {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">chapter</span><span class="p">[</span><span class="s">'rates'</span><span class="p">][</span><span class="n">trace</span><span class="p">[</span><span class="s">'name'</span><span class="p">]],</span> <span class="n">trace</span><span class="p">[</span><span class="s">'name'</span><span class="p">])</span>
                <span class="k">if</span> <span class="n">chapter</span><span class="p">[</span><span class="s">'rates'</span><span class="p">][</span><span class="n">trace</span><span class="p">[</span><span class="s">'name'</span><span class="p">]]</span> <span class="k">else</span> <span class="s">''</span><span class="p">)</span>
    <span class="c"># follow every trace with a white copy of the same line</span>
    <span class="n">traces_with_liners</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">trace</span> <span class="ow">in</span> <span class="n">traces</span><span class="p">:</span>
        <span class="n">traces_with_liners</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">trace</span><span class="p">)</span>
        <span class="n">traces_with_liners</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">go</span><span class="o">.</span><span class="n">Scatter</span><span class="p">(</span>
            <span class="n">x</span><span class="o">=</span><span class="n">trace</span><span class="p">[</span><span class="s">'x'</span><span class="p">],</span>
            <span class="n">y</span><span class="o">=</span><span class="n">trace</span><span class="p">[</span><span class="s">'y'</span><span class="p">],</span>
            <span class="n">fill</span><span class="o">=</span><span class="s">'tonexty'</span><span class="p">,</span>
            <span class="n">showlegend</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
            <span class="n">line</span><span class="o">=</span><span class="p">{</span><span class="s">'shape'</span><span class="p">:</span> <span class="s">'spline'</span><span class="p">},</span>
            <span class="n">hoverinfo</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span>
            <span class="n">mode</span><span class="o">=</span><span class="s">'none'</span><span class="p">,</span>
            <span class="n">fillcolor</span><span class="o">=</span><span class="s">'#ffffff'</span>
        <span class="p">))</span>
    <span class="k">return</span> <span class="n">traces_with_liners</span>
</code></pre>
</div>
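<p>The stacking done with <code class="highlighter-rouge">sums</code> above can be tried in isolation. This is only a minimal sketch, not the post’s code: the helper name <code class="highlighter-rouge">stack_series</code> and the sample counts are made up for illustration, and it loops over series first rather than chapters first, which gives the same totals.</p>

```python
def stack_series(series_by_name, xs):
    """Return cumulative (stacked) y-values for each named series.

    Series are stacked in insertion order, so each one sits on top of
    the running total of all the series before it.
    """
    sums = {}  # running total per x value, like `sums` in the function above
    stacked = {}
    for name, values in series_by_name.items():
        ys = []
        for x, value in zip(xs, values):
            sums[x] = sums.get(x, 0) + value
            ys.append(sums[x])
        stacked[name] = ys
    return stacked


# Made-up mention counts for two characters over three chapters.
chapters = [1, 2, 3]
rates = {'Myshkin': [3, 0, 2], 'Rogozhin': [1, 4, 0]}
stacked = stack_series(rates, chapters)
print(stacked)
```

<p>Each series is drawn at its cumulative height, so the gap between two neighbouring lines is the character’s own value for that chapter.</p>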
<p>Returning to the code we started with, let’s see the resulting plot:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">roman</span> <span class="o">=</span> <span class="n">idiot</span><span class="o">.</span><span class="n">from_json</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">parse_parts</span><span class="p">(</span><span class="n">roman</span><span class="p">)</span>
<span class="n">traces</span> <span class="o">=</span> <span class="n">prepare_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">plot</span><span class="p">(</span><span class="n">traces</span><span class="p">)</span>
</code></pre>
</div>
<iframe id="igraph" scrolling="no" style="border:none;" seamless="seamless" src="https://plot.ly/~boorstat/10.embed" height="525px" width="800px"></iframe>
<p>Click a character in the legend to see where it appears on the graph.</p>In the previous post we generated JSON from the text of The Idiot.Dostoyevsky Idiot — python object2017-01-08T00:00:00+00:002017-01-08T00:00:00+00:00https://boorstat.github.io/dostoyevsky/idiot/2017/01/08/dostoyevsky-idiot-python-object<p>We’re going to jsonify and objectify Fyodor Dostoyevsky’s great novel, The Idiot.<br />
We’ll end up with well-structured data ready for further experiments.</p>
<p><img src="https://boorstat.github.io/images/dostoyevsky-idiot-object.jpg" alt="Dostoyevsky Idiot JSON" /></p>
<p>We start from the <a href="https://github.com/boorstat/boorstat-files/raw/master/lit/dostoevsky/The_Idiot.txt">text of “The Idiot”</a>.<br />
This is the top-level function that objectifies it:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">re</span>

<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">roman</span>

<span class="c"># TEXT_URL is a module-level constant defined elsewhere (not shown here)</span>
<span class="k">def</span> <span class="nf">objectify_idiot</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="n">TEXT_URL</span><span class="p">):</span>
    <span class="n">text</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
    <span class="n">idiot</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'title'</span><span class="p">:</span> <span class="s">'The Idiot'</span><span class="p">,</span>
        <span class="s">'author'</span><span class="p">:</span> <span class="s">'Fyodor Dostoyevsky'</span><span class="p">,</span>
        <span class="s">'text'</span><span class="p">:</span> <span class="n">text</span><span class="p">}</span>
    <span class="c"># Count down from IV to I so that 'PART I' cannot match the start of 'PART II'</span>
    <span class="n">part_seps</span> <span class="o">=</span> <span class="p">[</span><span class="s">'PART {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">roman</span><span class="o">.</span><span class="n">toRoman</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)]</span>
    <span class="n">part_seps</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s">'Copyright'</span><span class="p">)</span>
    <span class="n">parts</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">part_seps</span><span class="p">),</span> <span class="n">text</span><span class="p">)</span>
    <span class="c"># The last chunk is the copyright notice; the first is whatever precedes PART I</span>
    <span class="n">idiot</span><span class="p">[</span><span class="s">'copyright'</span><span class="p">]</span> <span class="o">=</span> <span class="n">parts</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
    <span class="n">parts</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]]</span>
    <span class="n">idiot</span><span class="p">[</span><span class="s">'parts'</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">objectify_part</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">parts</span><span class="p">)]</span>
    <span class="k">return</span> <span class="n">idiot</span>
</code></pre>
</div>
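<p>The order of the separators in the function above matters, because a regex alternation takes the first alternative that matches. Here is a small sketch on a made-up string (not the real text) with the numerals written out by hand, so it needs only the standard library:</p>

```python
import re

# Made-up miniature "book"; the real text is far longer.
text = ('Preface PART I first PART II second '
        'PART III third PART IV fourth Copyright notice')

# Longest numerals first, the order range(4, 0, -1) produces.
good_seps = ['PART IV', 'PART III', 'PART II', 'PART I', 'Copyright']
good = [p.strip() for p in re.split('|'.join(good_seps), text)]

# Ascending order: 'PART I' matches the start of 'PART II', 'PART III', 'PART IV'.
bad_seps = ['PART I', 'PART II', 'PART III', 'PART IV', 'Copyright']
bad = [p.strip() for p in re.split('|'.join(bad_seps), text)]

print(good)  # ['Preface', 'first', 'second', 'third', 'fourth', 'notice']
print(bad)   # ['Preface', 'first', 'I second', 'II third', 'V fourth', 'notice']
```

<p>With the descending order, the chunk before the first separator and the chunk after <code class="highlighter-rouge">Copyright</code> are the only ones that are not parts, which is why the function keeps <code class="highlighter-rouge">parts[1:-1]</code>.</p>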
<p>The implementation of <code class="highlighter-rouge">objectify_part()</code> and the other low-level functions <a href="https://github.com/boorstat/python-boorstat/blob/master/boorstat/lit/dostoyevsky/idiot/idiot.py">can be found here</a>.<br />
Run <code class="highlighter-rouge">pip install git+https://github.com/boorstat/python-boorstat.git</code> to install the Python package with ready-to-use functions:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">boorstat.lit.dostoyevsky.idiot</span> <span class="kn">import</span> <span class="n">idiot</span>
<span class="n">idiot_as_dict</span> <span class="o">=</span> <span class="n">idiot</span><span class="o">.</span><span class="n">objectify_idiot</span><span class="p">()</span>
<span class="n">idiot_as_dict_from_pregenerated_json</span> <span class="o">=</span> <span class="n">idiot</span><span class="o">.</span><span class="n">from_json</span><span class="p">()</span>
</code></pre>
</div>
<p>The final structure can be seen in the <a href="https://github.com/boorstat/boorstat-files/raw/master/lit/dostoevsky/idiot.json">pregenerated JSON</a>.<br />
This debugger screenshot also shows the hierarchy inside the “The Idiot” object:</p>
<p><img src="https://boorstat.github.io/images/dostoyevsky-idiot-object-structure.png" alt="Dostoyevsky Idiot Python object structure" /></p>
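<p>Because the object is built from plain dicts, lists and strings, it survives a JSON round trip unchanged. The sketch below is only an illustration: the top-level keys come from <code class="highlighter-rouge">objectify_idiot()</code>, but the keys inside each part (<code class="highlighter-rouge">number</code>, <code class="highlighter-rouge">chapters</code>, <code class="highlighter-rouge">text</code>) are hypothetical and may not match what <code class="highlighter-rouge">objectify_part()</code> actually returns.</p>

```python
import json

idiot = {
    'title': 'The Idiot',
    'author': 'Fyodor Dostoyevsky',
    # Hypothetical shape of one part; the real keys may differ.
    'parts': [
        {'number': 1, 'chapters': [{'number': 1, 'text': '...'}]},
    ],
}

# Serialize to a JSON string and read it back.
dumped = json.dumps(idiot, ensure_ascii=False, indent=2)
restored = json.loads(dumped)

print(restored == idiot)
```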