Getting started with mpl-probscale

Installation

mpl-probscale is developed on Python 3.6. It is also tested on Python 3.4, 3.5, and even 2.7 (for the time being).

From conda

Official releases of mpl-probscale can be found on conda-forge:

conda install --channel=conda-forge mpl-probscale

Fairly recent builds of the development versions are available on my channel:

conda install --channel=conda-forge mpl-probscale

From PyPI

Official source releases are also available on PyPI:

pip install probscale

From source

mpl-probscale is a pure Python package, so it should be fairly trivial to install from source on any platform. To do that, download or clone the repository from GitHub, unzip the archive if necessary, then do:

cd mpl-probscale # or wherever the setup.py got placed
pip install .

I recommend pip install . over python setup.py install for reasons I don’t fully understand.

%matplotlib inline
import warnings
warnings.simplefilter('ignore')

import numpy
from matplotlib import pyplot
from scipy import stats
import seaborn

clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
seaborn.set(style='ticks', context='talk', color_codes=True, rc=clear_bkgd)

Background

Built-in matplotlib scales

As a casual user, you will mostly set matplotlib scales to either “linear” or “log” (logarithmic). There are others (e.g., logit, symlog), but I haven’t seen them too much in the wild.
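If you are curious which scale names your own matplotlib installation has registered, the short sketch below lists them. It only uses `matplotlib.scale.get_scale_names`; the exact list depends on the installed matplotlib version, so no output is shown here.

```python
from matplotlib import scale as mscale

# Names that ax.set_xscale / ax.set_yscale will accept right now.
scale_names = sorted(mscale.get_scale_names())
print(scale_names)
```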

Linear scales are the default:

fig, ax = pyplot.subplots()
seaborn.despine(fig=fig)
../_images/output_4_01.png

Logarithmic scales can work well when your data cover several orders of magnitude. They don’t have to be in base 10.

fig, (ax1, ax2) = pyplot.subplots(nrows=2, figsize=(8,3))
ax1.set_xscale('log')
ax1.set_xlim(left=1e-3, right=1e3)
ax1.set_xlabel("Base 10")
ax1.set_yticks([])

ax2.set_xscale('log', base=2)  # on matplotlib older than 3.3, use basex=2 instead
ax2.set_xlim(left=2**-3, right=2**3)
ax2.set_xlabel("Base 2")
ax2.set_yticks([])

seaborn.despine(fig=fig, left=True)
../_images/output_6_01.png

Probability Scales

mpl-probscale lets you use probability scales. All you need to do is import it.

Before importing, there is no probability scale available in matplotlib:

try:
    fig, ax = pyplot.subplots()
    ax.set_xscale('prob')
except ValueError as e:
    pyplot.close(fig)
    print(e)
Unknown scale type 'prob'

To access probability scales, simply import the probscale module.

import probscale
fig, ax = pyplot.subplots(figsize=(8, 3))
ax.set_xscale('prob')
ax.set_xlim(left=0.5, right=99.5)
ax.set_xlabel('Normal probability scale (%)')
seaborn.despine(fig=fig)
../_images/output_11_01.png

Probability scales default to the standard normal distribution. Note that the axis values are formatted as percentage-based probabilities.
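To build some intuition for what a normal probability scale does to an axis, here is a minimal sketch that uses only the standard library. It is not the library's code: the helper `axis_position` is a hypothetical name, and the sketch simply assumes that a value given in percent is placed at the corresponding standard normal quantile.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def axis_position(percent):
    """Map a probability in percent (0 < percent < 100) to a standard normal quantile.

    Hypothetical helper for illustration only.
    """
    return std_normal.inv_cdf(percent / 100.0)


# Equal steps in probability are not equal steps along the axis:
# the tails are stretched and the middle is compressed.
for pct in (1, 10, 50, 90, 99):
    print(f"{pct:>3}% -> {axis_position(pct):+.3f}")
```

Under that assumption, the distance between 90% and 99% is much larger than the distance between 51% and 60%, which matches the wide tick spacing near the ends of the axis.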

You can even use different probability distributions, though it can be tricky. You have to pass a frozen distribution from either scipy.stats or paramnormal to the dist kwarg in ax.set_[x|y]scale.
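In case “frozen distribution” is unfamiliar, the sketch below shows what freezing means in scipy.stats. The parameters are fixed once, and the resulting object exposes `cdf` and `ppf` (the inverse CDF) without needing the parameters again. That a scale relies on exactly these two methods is my assumption here, not something shown above.

```python
from scipy import stats

# "Freezing" fixes the shape parameters up front.
beta_dist = stats.beta(a=3, b=2)

median = beta_dist.ppf(0.5)         # inverse CDF: value with 50% of the mass below it
round_trip = beta_dist.cdf(median)  # CDF of that value, which should be ~0.5 again
print(median, round_trip)
```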

Here’s a standard normal scale with two different beta scales and a linear scale for comparison.

fig, (ax1, ax2, ax3, ax4) = pyplot.subplots(figsize=(9, 5), nrows=4)

for ax in [ax1, ax2, ax3, ax4]:
    ax.set_xlim(left=2, right=98)
    ax.set_yticks([])

ax1.set_xscale('prob')
ax1.set_xlabel('Normal probability scale, as percents')

beta1 = stats.beta(a=3, b=2)
ax2.set_xscale('prob', dist=beta1)
ax2.set_xlabel('Beta probability scale (α=3, β=2)')

beta2 = stats.beta(a=2, b=7)
ax3.set_xscale('prob', dist=beta2)
ax3.set_xlabel('Beta probability scale (α=2, β=7)')

ax4.set_xticks(ax1.get_xticks()[12:-12])
ax4.set_xlabel('Linear scale (for reference)')

seaborn.despine(fig=fig, left=True)
../_images/output_13_01.png

Ready-made probability plots

mpl-probscale ships with a small viz module that can help you make a probability plot of a sample.

With only the sample data, probscale.probplot will create a figure, compute the plotting positions (the non-exceedance probabilities), and plot everything:
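For readers who have not met plotting positions before, here is a rough sketch of the idea using only the standard library. The function name `plotting_positions` is hypothetical, and the constants alpha = beta = 0.4 (often called Cunnane positions) are an assumption for illustration; check the probplot documentation for what the library actually uses.

```python
def plotting_positions(values, alpha=0.4, beta=0.4):
    """Return (positions, sorted_values) for a sample.

    Each position estimates the non-exceedance probability of the
    value with the same rank: (rank - alpha) / (n + 1 - alpha - beta).
    Hypothetical helper; alpha and beta are assumed defaults.
    """
    ordered = sorted(values)
    n = len(ordered)
    positions = [
        (rank - alpha) / (n + 1.0 - alpha - beta)
        for rank in range(1, n + 1)
    ]
    return positions, ordered


example = [4.1, 2.7, 6.3, 3.9, 5.2]  # made-up data
positions, ordered = plotting_positions(example)
for p, v in zip(positions, ordered):
    print(f"{v:4.1f} -> {100 * p:5.1f}%")
```

The positions always fall strictly between 0 and 1, which matters because a probability scale cannot place 0% or 100% at a finite location.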

numpy.random.seed(0)
sample = numpy.random.normal(loc=4, scale=2, size=37)

fig = probscale.probplot(sample)
seaborn.despine(fig=fig)
../_images/output_15_0.png

You should specify the matplotlib axes on which the plot should occur if you want to customize the plot using matplotlib commands directly:

fig, ax = pyplot.subplots(figsize=(7, 3))

probscale.probplot(sample, ax=ax)

ax.set_ylabel('Normal Values')
ax.set_xlabel('Non-exceedance probability')
ax.set_xlim(left=1, right=99)
seaborn.despine(fig=fig)
../_images/output_17_0.png

Lots of other options are directly accessible from the probplot function signature.

fig, ax = pyplot.subplots(figsize=(3, 7))

numpy.random.seed(0)
new_sample = numpy.random.lognormal(mean=2.0, sigma=0.75, size=37)

probscale.probplot(
    new_sample,
    ax=ax,
    probax='y', # flip the plot
    datascale='log', # scale of the non-probability axis
    bestfit=True, # draw a best-fit line
    estimate_ci=True,
    datalabel='Lognormal Values',  # labels and markers...
    problabel='Non-exceedance probability',
    scatter_kws=dict(marker='d', zorder=2, mew=1.25, mec='w', markersize=10),
    line_kws=dict(color='0.17', linewidth=2.5, zorder=0, alpha=0.75),
)

ax.set_ylim(bottom=1, top=99)
seaborn.despine(fig=fig)
../_images/output_19_0.png

Percentile and Quantile plots

For convenience, you can do percentile and quantile plots with the same function.

Note

The percentile and probability axes are plotted against the same values. The difference is only that “percentiles” are plotted on a linear scale.

fig, (ax1, ax2, ax3) = pyplot.subplots(nrows=3, figsize=(8, 7))

probscale.probplot(sample, ax=ax1, plottype='pp', problabel='Percentiles')
probscale.probplot(sample, ax=ax2, plottype='qq', problabel='Quantiles')
probscale.probplot(sample, ax=ax3, plottype='prob', problabel='Probabilities')

ax2.set_xlim(left=-2.5, right=2.5)
ax3.set_xlim(left=0.5, right=99.5)
fig.tight_layout()
seaborn.despine(fig=fig)
../_images/output_22_01.png
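To make the note above a little more concrete, the following sketch computes the x-coordinates that the percentile and quantile variants would use for the same set of plotting positions. The positions are made up, and the mapping to standard normal quantiles is an assumption about the default behavior rather than a copy of the library's implementation.

```python
from statistics import NormalDist

positions = [0.05, 0.25, 0.50, 0.75, 0.95]  # made-up plotting positions

# Percentile plot: the positions as percents, drawn on a linear axis.
percentiles = [100 * p for p in positions]

# Quantile plot: the same positions pushed through the inverse normal CDF.
quantiles = [NormalDist().inv_cdf(p) for p in positions]

for pct, q in zip(percentiles, quantiles):
    print(f"{pct:5.1f}%  ->  z = {q:+.3f}")
```

A probability plot shows the same percent values as the percentile plot, but spaces them along the axis the way the quantiles are spaced.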

Working with seaborn FacetGrids

Good news, everyone. The probplot function generally works as expected with FacetGrids.

plot = (
    seaborn.load_dataset("tips")
        .assign(pct=lambda df: 100 * df['tip'] / df['total_bill'])
        .pipe(seaborn.FacetGrid, hue='sex', col='time', row='smoker', margin_titles=True, aspect=1., height=4)  # `size=4` on seaborn older than 0.9
        .map(probscale.probplot, 'pct', bestfit=True, scatter_kws=dict(alpha=0.75), probax='y')
        .add_legend()
        .set_ylabels('Non-Exceedance Probability')
        .set_xlabels('Tips as percent of total bill')
        .set(ylim=(0.5, 99.5), xlim=(0, 100))
)
../_images/output_24_0.png