Interactive large plot with Vaex

I am using Python 3.8 on Windows 10 and trying to make a plot with about 700M points for sound wave analysis. In the question "Interactive large plot with ~20 million sample points and gigabytes of data", Vaex was highly recommended.

I am trying to use examples from the Vaex tutorial, but the graph does not appear, and I could not find a good example on the Internet.

import vaex
import numpy as np
df = vaex.example()
df.plot1d(df.x, limits='99.7%');

The Vaex documentation doesn't mention that pyplot.show() has to be called to display the figure. Also, plot1d plots a histogram. How can I plot just the connected points?

There are 2 best solutions below

Short answer: Vaex cannot directly create line plots. The next best option is an x/y scatter plot, which is possible, but not straightforward.

Vaex has no built-in method intended for drawing a scatter or line plot of a very large data set. Its performance boost comes when plotting aggregate statistics such as heatmaps and histograms. Vaex is very fast at computing the heatmap, but it then sends the rasterized heatmap to matplotlib for plotting. That way, matplotlib only has to deal with the heatmap and not with the millions of data points (a Vaex dev explains this in https://github.com/vaexio/vaex/issues/653#issuecomment-604906650). For a basic line/scatter plot, there is no prior calculation that Vaex can do to reduce the amount of data that is sent to matplotlib, so matplotlib must deal with all the millions of points. Vaex provides df.scatter(df.x, df.y), but it is just a convenience wrapper for plt.scatter() from matplotlib, which doesn't handle large data sets.
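To illustrate why aggregation helps, here is a minimal sketch in plain NumPy (not Vaex's actual implementation). The helper name binned_counts and the chunked input are assumptions made for the example. It shows that the array handed to the plotting library has a fixed size, no matter how many samples were binned.

```python
import numpy as np

def binned_counts(chunks, lo, hi, shape=64):
    """Accumulate a fixed-size histogram over an iterable of 1D arrays.

    Hypothetical helper for illustration; Vaex does this internally
    (and much faster) when you call df.count(binby=...).
    """
    counts = np.zeros(shape, dtype=np.int64)
    for chunk in chunks:
        # Each chunk is reduced to `shape` numbers before being discarded.
        c, _ = np.histogram(chunk, bins=shape, range=(lo, hi))
        counts += c
    return counts

# Synthetic stand-in for a large signal, processed in 10 chunks.
rng = np.random.default_rng(0)
chunks = [rng.normal(size=100_000) for _ in range(10)]

counts = binned_counts(chunks, -4.0, 4.0, shape=64)
print(counts.shape)  # only 64 values reach the plotting library
```

The plotting cost then depends on shape, not on the number of samples, which is the property a plain line or scatter plot lacks.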

In "Interactive large plot with ~20 million sample points and gigabytes of data", the scatter plot is generated by using df.plot_widget(df.x, df.y) from Vaex in a creative way. df.plot_widget(df.x, df.y) actually generates a 2D heatmap with x and y as the bins, and the count of data points in each bin determines the plot color. This is why the outlier is black while the rest of the data is white when the answerer runs df.plot_widget(df.x, df.y, f='log', shape=128, backend='bqplot'). If you want to do the same, note that df.plot_widget() is now deprecated and you should use df.viz.heatmap() instead. I recently described how to use a similar approach to generate more advanced scatter plots with markers and multiple data series: https://github.com/vaexio/vaex/issues/2391. I recommend checking it out to gain an understanding of the implementation.
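The idea of a count heatmap standing in for a scatter plot can be sketched with NumPy alone. This is an illustration under assumptions, not the Vaex code path: log_count_grid is a made-up name, the data is synthetic, and log1p is used in place of a plain log so that empty bins stay finite.

```python
import numpy as np

def log_count_grid(x, y, limits, shape=128):
    """Bin (x, y) points on a shape x shape grid and log-scale the counts.

    Hypothetical helper; log1p is an assumption chosen so that
    empty bins map to 0 instead of -inf.
    """
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=limits)
    return np.log1p(counts)

# Synthetic data standing in for the real x/y columns.
rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = rng.normal(size=200_000)

grid = log_count_grid(x, y, limits=[[-5, 5], [-5, 5]], shape=128)
print(grid.shape)
```

The resulting grid can be drawn as an image. With np.histogram2d the first axis is x, so the grid is usually transposed and drawn with the origin at the bottom when passed to an image-plotting function.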

Note that this way of plotting is not interactive at the outset, because matplotlib only has the rasterized heatmap and has no way of updating it when you zoom, pan or resize. Interactivity is only added by having Vaex recalculate the heatmap whenever you zoom, pan or resize. Vaex implements this for its interactive widgets in Jupyter notebooks, but as far as I know, it has no support for interactivity outside of Jupyter notebooks. Interactivity can be added manually, but it is a hassle. I describe how to do it in https://github.com/vaexio/vaex/issues/2391.
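The recalculation step can be sketched as follows. This is a NumPy-only illustration, not the implementation from the linked issue: heatmap_for_view is a hypothetical function, and in a real setup something like it would have to be called from the plotting library's axis-limit-change callback.

```python
import numpy as np

def heatmap_for_view(x, y, xlim, ylim, shape=128):
    """Recompute the count grid for the currently visible region.

    Hypothetical helper: the grid always has shape x shape cells,
    so zooming in raises the resolution per data unit.
    """
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=[xlim, ylim])
    return counts

# Synthetic data standing in for the real x/y columns.
rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
y = rng.normal(size=200_000)

full = heatmap_for_view(x, y, (-5, 5), (-5, 5))    # initial view
zoomed = heatmap_for_view(x, y, (-1, 1), (-1, 1))  # after zooming in
print(full.shape, zoomed.shape)
```

Both grids have the same number of cells, but the zoomed one covers a smaller region, so the image sharpens after each zoom instead of showing enlarged pixels.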

0
On

I am pretty sure that the vaex documentation explains that the (now deprecated) method .plot1d(...) is a wrapper around matplotlib plotting routines.

If you would like to create custom plots using the binned data, you can take this approach, which I also found in their docs:

import vaex
import numpy as np
import matplotlib.pyplot as plt

# Load example data
df = vaex.example()

# Do the binning yourself
counts = df.count(binby=df.x, shape=64, limits='99.7%')

# Take care of the x-axis
limits = df.limits_percentage(df.x, percentage=99.7)
xvals = np.linspace(limits[0], limits[1], num=64)

# Create your custom plot via matplotlib, plotly or your favorite tool
plt.plot(xvals, counts, marker='o', ms=5)
plt.show()