Getting started with ``mpl-probscale`` ====================================== Installation ------------ ``mpl-probscale`` is developed on Python 3.6. It is also tested on Python 3.4, 3.5, and even 2.7 (for the time being). From conda ~~~~~~~~~~ Official releases of ``mpl-probscale`` can be found on conda-forge: ``conda install --channel=conda-forge mpl-probscale`` Fairly recent builds of the development verions are available on my channel: ``conda install --channel=conda-forge mpl-probscale`` From PyPI ~~~~~~~~~ Official source releases are also available on PyPI ``pip install probscale`` From source ~~~~~~~~~~~ ``mpl-probscale`` is a pure python package. It should be fairly trivial to install from source on any platform. To do that, download or clone from `github `__, unzip the archive if necessary then do: :: cd mpl-probscale # or wherever the setup.py got placed pip install . I recommend ``pip install .`` over ``python setup.py install`` for `reasons I don't fully understand `__. .. code:: python %matplotlib inline .. code:: python import warnings warnings.simplefilter('ignore') import numpy from matplotlib import pyplot from scipy import stats import seaborn clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'} seaborn.set(style='ticks', context='talk', color_codes=True, rc=clear_bkgd) Background ---------- Built-in matplotlib scales ~~~~~~~~~~~~~~~~~~~~~~~~~~ To the casual user, you can set matplotlib scales to either "linear" or "log" (logarithmic). There are others (e.g., logit, symlog), but I haven't seen them too much in the wild. Linear scales are the default: .. code:: python fig, ax = pyplot.subplots() seaborn.despine(fig=fig) .. image:: getting_started_files/output_4_0.png Logarithmic scales can work well when your data cover several orders of magnitude and don't have to be in base 10. .. code:: python fig, (ax1, ax2) = pyplot.subplots(nrows=2, figsize=(8,3)) ax1.set_xscale('log') ax1.set_xlim(left=1e-3, right=1e3) ax1.set_xlabel("Base 10") ax1.set_yticks([]) ax2.set_xscale('log', basex=2) ax2.set_xlim(left=2**-3, right=2**3) ax2.set_xlabel("Base 2") ax2.set_yticks([]) seaborn.despine(fig=fig, left=True) .. image:: getting_started_files/output_6_0.png Probabilty Scales ~~~~~~~~~~~~~~~~~ ``mpl-probscale`` lets you use probability scales. All you need to do is import it. Before importing, there is no probability scale available in matplotlib: .. code:: python try: fig, ax = pyplot.subplots() ax.set_xscale('prob') except ValueError as e: pyplot.close(fig) print(e) .. parsed-literal:: Unknown scale type 'prob' To access probability scales, simply import the ``probscale`` module. .. code:: python import probscale fig, ax = pyplot.subplots(figsize=(8, 3)) ax.set_xscale('prob') ax.set_xlim(left=0.5, right=99.5) ax.set_xlabel('Normal probability scale (%)') seaborn.despine(fig=fig) .. image:: getting_started_files/output_11_0.png Probability scales default to the standard normal distribution (note that the formatting is a percentage-based probability) You can even use different probability distributions, though it can be tricky. You have to pass a frozen distribution from either `scipy.stats `__ or `paramnormal `__ to the ``dist`` kwarg in ``ax.set_[x|y]scale``. Here's a standard normal scale with two different beta scales and a linear scale for comparison. .. code:: python fig, (ax1, ax2, ax3, ax4) = pyplot.subplots(figsize=(9, 5), nrows=4) for ax in [ax1, ax2, ax3, ax4]: ax.set_xlim(left=2, right=98) ax.set_yticks([]) ax1.set_xscale('prob') ax1.set_xlabel('Normal probability scale, as percents') beta1 = stats.beta(a=3, b=2) ax2.set_xscale('prob', dist=beta1) ax2.set_xlabel('Beta probability scale (α=3, β=2)') beta2 = stats.beta(a=2, b=7) ax3.set_xscale('prob', dist=beta2) ax3.set_xlabel('Beta probability scale (α=2, β=7)') ax4.set_xticks(ax1.get_xticks()[12:-12]) ax4.set_xlabel('Linear scale (for reference)') seaborn.despine(fig=fig, left=True) .. image:: getting_started_files/output_13_0.png Ready-made probability plots ---------------------------- ``mpl-probscale`` ships with a small ``viz`` module that can help you make a probability plot of a sample. With only the sample data, ``probscale.probplot`` will create a figure, compute the plotting position and non-exceedance probabilities, and plot everything: .. code:: python numpy.random.seed(0) sample = numpy.random.normal(loc=4, scale=2, size=37) fig = probscale.probplot(sample) seaborn.despine(fig=fig) .. image:: getting_started_files/output_15_0.png You should specify the matplotlib axes on which the plot should occur if you want to customize the plot using matplotlib commands directly: .. code:: python fig, ax = pyplot.subplots(figsize=(7, 3)) probscale.probplot(sample, ax=ax) ax.set_ylabel('Normal Values') ax.set_xlabel('Non-exceedance probability') ax.set_xlim(left=1, right=99) seaborn.despine(fig=fig) .. image:: getting_started_files/output_17_0.png Lots of other options are directly accessible from the ``probplot`` function signature. .. code:: python fig, ax = pyplot.subplots(figsize=(3, 7)) numpy.random.seed(0) new_sample = numpy.random.lognormal(mean=2.0, sigma=0.75, size=37) probscale.probplot( new_sample, ax=ax, probax='y', # flip the plot datascale='log', # scale of the non-probability axis bestfit=True, # draw a best-fit line estimate_ci=True, datalabel='Lognormal Values', # labels and markers... problabel='Non-exceedance probability', scatter_kws=dict(marker='d', zorder=2, mew=1.25, mec='w', markersize=10), line_kws=dict(color='0.17', linewidth=2.5, zorder=0, alpha=0.75), ) ax.set_ylim(bottom=1, top=99) seaborn.despine(fig=fig) .. image:: getting_started_files/output_19_0.png Percentile and Quanitile plots ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For convenience, you can do percetile and quantile plots with the same function. .. note:: The percentile and probability axes are plotted against the same values. The difference is only that "percentiles" are plotted on a linear scale. .. code:: python fig, (ax1, ax2, ax3) = pyplot.subplots(nrows=3, figsize=(8, 7)) probscale.probplot(sample, ax=ax1, plottype='pp', problabel='Percentiles') probscale.probplot(sample, ax=ax2, plottype='qq', problabel='Quantiles') probscale.probplot(sample, ax=ax3, plottype='prob', problabel='Probabilities') ax2.set_xlim(left=-2.5, right=2.5) ax3.set_xlim(left=0.5, right=99.5) fig.tight_layout() seaborn.despine(fig=fig) .. image:: getting_started_files/output_22_0.png Working with seaborn ``FacetGrids`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Good news, everyone. The ``probplot`` function generally works as expected with `FacetGrids `__. .. code:: python plot = ( seaborn.load_dataset("tips") .assign(pct=lambda df: 100 * df['tip'] / df['total_bill']) .pipe(seaborn.FacetGrid, hue='sex', col='time', row='smoker', margin_titles=True, aspect=1., size=4) .map(probscale.probplot, 'pct', bestfit=True, scatter_kws=dict(alpha=0.75), probax='y') .add_legend() .set_ylabels('Non-Exceedance Probabilty') .set_xlabels('Tips as percent of total bill') .set(ylim=(0.5, 99.5), xlim=(0, 100)) ) .. image:: getting_started_files/output_24_0.png