Successfully pretty printing pandas.Series subclass with more than 60 elements


This is likely an easy fix, but I don't know how to do it.

I have extended the pandas.Series class so that it can contain datasets for my research. Here's the code that I've written so far:

import pandas as pd
import numpy as np
from allantools import oadev


class Tombstone(pd.Series):
    """An extension of ``pandas.Series``, which contains raw data from a
    tombstone test.

    Parameters
    ----------
    data : array-like of floats
        The raw data measured in volts from a lock-in amplifier. If no scale
        factor is provided, this data is presumed to be in units of °/h.
    rate : float
        The sampling rate in Hz
    start : float
        The unix time stamp of the start of the run. Used to create the index
        of the Tombstone object. This can be calculated by
        running ``time.time()`` or similar. If no value is passed, the index
        of the Tombstone object will be in hours since start.
    scale_factor : float
        The conversion factor between the lock-in amplifier voltage and deg/h,
        expressed in deg/h/V.

    Attributes
    ----------
    adev : 2-tuple of arrays of floats
        The Allan deviation in °/h as a 2-tuple. The first element is an
        array of floats representing the integration times. The second
        element is an array of floats representing the Allan deviations.
    noise : float
        The calculated angular random walk in units of °/√h taken from the
        1-Hz point on the Allan deviation curve.
    arw : float
        Alias for ``noise``.
    drift : float
        The minimum Allan deviation in units of °/h.
    """

    def __init__(self, data, rate, start=None, scale_factor=0, *args, **kwargs):

        if start:
            date_index = pd.date_range(
                start=start*1e9, periods=len(data),
                freq='%.3g ms' % (1000/rate), tz='UTC')
            date_index = date_index.tz_convert('America/Los_Angeles')
        else:
            date_index = np.arange(len(data))/60/60/rate
        super().__init__(data, date_index)
        if scale_factor:
            self.name = 'voltage'
        else:
            self.name = 'rotation'
        self.rate = rate

    @property
    def _constructor(self):
        return Tombstone

    @property
    def adev(self):
        tau, dev, _, _ = oadev(np.array(self), rate=self.rate,
                               data_type='freq')
        return tau, dev

    @property
    def noise(self):
        _, dev, _, _ = oadev(np.array(self), rate=self.rate, data_type='freq')
        return dev[0]/60

    # alias
    arw = noise

    @property
    def drift(self):
        tau, dev, _, _ = oadev(np.array(self), rate=self.rate,
                               data_type='freq')
        return min(dev)

I can run this in a Jupyter notebook:

>>> t = Tombstone(np.random.rand(60), rate=10)
>>> t
0.000000    0.497036
0.000028    0.860914
0.000056    0.626183
0.000083    0.537434
0.000111    0.451693
...

The output of the last statement shows the pandas.Series as expected.
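
As a quick check on the index shown above, the hours-since-start values follow from dividing the sample number by 3600 s/h and by the sampling rate. A minimal sketch, assuming the same rate of 10 Hz used in the example:

```python
import numpy as np

rate = 10  # sampling rate in Hz, as in the example above

# Same expression as in Tombstone.__init__ when no start time is given:
# sample number -> seconds -> hours since the start of the run.
hours = np.arange(5) / 60 / 60 / rate

print(np.round(hours, 6))
```

Rounded to six decimals, these match the first five index labels in the printed Series.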

But when I pass 61 elements to the constructor, I get an error:

>>> t = Tombstone(np.random.rand(61), rate=10)
>>> t
TypeError: cannot concatenate a non-NDFrame object

Even with large datasets, I can still run commands without problem:

>>> from matplotlib.pyplot import loglog, show
>>> t = Tombstone(np.random.rand(10000), rate=10)
>>> t.noise
>>> loglog(*t.adev); show()

But I always get an error when I ask Jupyter notebook to pretty print t.

2017-09-13 Update

After poking through the stack trace, it seems that the problem occurs when pandas tries to concatenate the first few elements and the last few elements with an ellipsis in between. Running the code below reproduces the last few lines of the stack trace:

>>> pd.concat(t.iloc[10:], t.iloc[:-10])

TypeError                                 Traceback (most recent call last)
<ipython-input-12-86a3d2f95e07> in <module>()
----> 1 pd.concat(t.iloc[10:], t.iloc[:-10])

/Users/wheelerj/miniconda3/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
   1332                        keys=keys, levels=levels, names=names,
   1333                        verify_integrity=verify_integrity,
-> 1334                        copy=copy)
   1335     return op.get_result()
   1336 

/Users/wheelerj/miniconda3/lib/python3.5/site-packages/pandas/tools/merge.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
   1389         for obj in objs:
   1390             if not isinstance(obj, NDFrame):
-> 1391                 raise TypeError("cannot concatenate a non-NDFrame object")
   1392 
   1393             # consolidate

TypeError: cannot concatenate a non-NDFrame object
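
One detail worth checking in this reproduction: according to the signature shown in the traceback, pd.concat takes the objects to join as a single first argument (objs), and the second positional argument is axis. The call above therefore iterates over the elements of t.iloc[10:], which are plain floats rather than NDFrame objects, so it may be raising this error for a different reason than the notebook repr does. A minimal sketch with a plain pd.Series (not the Tombstone class), passing the two slices as a list:

```python
import numpy as np
import pandas as pd

# A plain Series stands in for the Tombstone object here (an assumption
# made to keep the sketch independent of the subclass).
s = pd.Series(np.arange(100.0))

# pd.concat expects one sequence of pandas objects as its first argument.
joined = pd.concat([s.iloc[:5], s.iloc[-5:]])

print(len(joined))
print(list(joined.index))
```

If the same list form still fails with a Tombstone instance, the slices themselves are the likelier culprit.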

There are 2 best solutions below

I found a fix, which should work in my case. I still think there is a way to solve it by representing the slices as NDFrame objects. Maybe someone else on SO can figure that out.

If I override the __repr__ method inside my Tombstone class,

def __repr__(self):
    ret = 'Tombstone('
    ret += 'rate=%.3g' % self.rate
    # etc...
    ret += ')'
    return ret

I can run the following:

>>> t = Tombstone(np.random.rand(61), rate=10)
>>> t
Tombstone(rate=10)
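
A possible variant of this workaround, if the usual Series listing is still wanted, is to build a plain pd.Series from the values and index inside __repr__ and let pandas format that. This is only a sketch: LabeledSeries is a hypothetical stand-in for Tombstone, and the header line is an arbitrary choice.

```python
import numpy as np
import pandas as pd


class LabeledSeries(pd.Series):
    """Hypothetical stand-in for Tombstone, used only to show the idea."""

    def __repr__(self):
        # Copy values and index into a plain Series so that pandas'
        # truncation logic never has to slice or concatenate the subclass.
        plain = pd.Series(self.values, index=self.index, name=self.name)
        return "LabeledSeries\n" + repr(plain)


text = repr(LabeledSeries(np.arange(100.0)))
print(text)
```

The trade-off is that the printed object is a copy, so this hides the underlying slicing problem rather than fixing it.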

I think the problem is in your call to super().__init__(). pd.Series.__init__() has a number of additional arguments that you aren't passing through. In my case I was getting the fastpath parameter set, but not handling it.

If I tweak your __init__() like this, it seems to work:

def __init__(self, data=None, index=None, rate=None, start=None, scale_factor=0, *args, **kwargs):
    # Build an index only when a rate is given and no index was passed;
    # otherwise use the index exactly as supplied (e.g. by pandas itself).
    if index is None and rate is not None:
        if start:
            date_index = pd.date_range(
                start=start*1e9, periods=len(data),
                freq='%.3g ms' % (1000/rate), tz='UTC')
            date_index = date_index.tz_convert('America/Los_Angeles')
        else:
            date_index = np.arange(len(data))/60/60/rate
    else:
        date_index = index
    # Forward any remaining arguments (such as fastpath) to pd.Series.
    super().__init__(data, date_index, *args, **kwargs)
    if scale_factor:
        self.name = 'voltage'
    else:
        self.name = 'rotation'
    self.rate = rate

You need to ensure that take and indexing through iloc return objects of your type (Tombstone in this case).
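
One way to check this is with a stripped-down subclass. The sketch below is not the Tombstone class itself: RatedSeries is a hypothetical name, and it assumes the pattern from the pandas subclassing documentation, where _constructor returns the subclass and custom attributes are listed in _metadata so that pandas copies them onto derived objects. Exact behavior can differ between pandas versions.

```python
import numpy as np
import pandas as pd


class RatedSeries(pd.Series):
    """Hypothetical minimal subclass carrying one extra attribute."""

    # Names listed here are propagated by pandas to results such as slices.
    _metadata = ["rate"]

    def __init__(self, data=None, index=None, rate=None, **kwargs):
        # Keep data and index in the positions pd.Series expects, and pass
        # through whatever else pandas supplies when it rebuilds the object.
        super().__init__(data, index=index, **kwargs)
        self.rate = rate

    @property
    def _constructor(self):
        return RatedSeries


s = RatedSeries(np.arange(100.0), rate=10)

head = s.iloc[:5]                      # slicing goes through _constructor
print(type(head).__name__, head.rate)

text = repr(s)                         # more than 60 rows, so pandas truncates
print(text)
```

If the slice comes back as the subclass and the repr of a long instance prints without error, the constructor signature is compatible with how pandas calls it internally.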