I'm trying to retrieve data in Python from a FITS file (opened with astropy) in which the data for many objects, each with a unique ID, have been collated into one continuous record.
By this I mean the table data in the .fits file has shape: (25603520,)
I would like to get all the data from the rows where the ID matches a preselected value. E.g. for the .fits file below, I would like to get all five columns' worth of data for the rows whose first column value is: 'NGTSJ225342.6+015412'.
FITS_rec([('NGTSJ225342.6+015412', 2457881.8909375 , 85.15793 , 10.182051, 0),
('NGTSJ225342.6+015412', 2457881.89107639, 109.891716, 10.210967, 0),
('NGTSJ225342.6+015412', 2457881.89122685, 87.59581 , 10.136151, 0),
...,
('NGTSJ225330.3+012025', 2458082.58070602, nan, nan, 0),
('NGTSJ225330.3+012025', 2458082.58085648, nan, nan, 0),
('NGTSJ225330.3+012025', 2458082.58099537, nan, nan, 0)],
dtype=(numpy.record, [('SOURCE_ID', 'S20'), ('HJD', '>f8'), ('SYSFLUX', '>f4'), ('FLUX_ERR', '>f4'), ('FLAG', '>i4')]))
Ideally I would like to do this with a quick query: the files in my real dataset are so large that I would like to avoid vstacking and using where statements if I can.
Things I have tried:
f[1].data['SOURCE_ID'=='NGTSJ225445.7+014958']
... returns:
FITS_rec([], shape=(0, 25603520),
dtype=(numpy.record, [('SOURCE_ID', 'S20'), ('HJD', '>f8'), ('SYSFLUX', '>f4'), ('FLUX_ERR', '>f4'), ('FLAG', '>i4')]))
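A likely reason for the empty result is that Python evaluates the string comparison inside the brackets first, so the data is indexed with the plain boolean False rather than with a column name. The sketch below reproduces the same shape pattern on a small NumPy array; `arr` is a made-up stand-in, not the real table.

```python
import numpy as np

# The comparison of the two string literals happens before any indexing.
key = 'SOURCE_ID' == 'NGTSJ225445.7+014958'
print(key)  # False

# Indexing an array with a scalar False adds a leading axis of length 0,
# which matches the shape=(0, 25603520) seen in the output above.
arr = np.arange(5)  # stand-in for the 1-D table data
empty = arr[key]
print(empty.shape)  # (0, 5)
```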
and...
f[1].data['SYSFLUX'][np.array(f[1].data['SOURCE_ID']) == 'NGTSJ225342.6+015412']
...works, but seems non-optimal.
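For reference, the same boolean mask can be applied to the record as a whole instead of to a single column, which returns every field of the matching rows. This is only a minimal sketch on a three-row stand-in built with plain NumPy from rows shown above; because the stand-in is an ordinary structured array rather than a FITS_rec, the ID is compared as bytes here, which is an assumption of the sketch.

```python
import numpy as np

# Stand-in for f[1].data, using the dtype and a few rows from the output above.
dtype = [('SOURCE_ID', 'S20'), ('HJD', '>f8'), ('SYSFLUX', '>f4'),
         ('FLUX_ERR', '>f4'), ('FLAG', '>i4')]
data = np.array([
    (b'NGTSJ225342.6+015412', 2457881.8909375, 85.15793, 10.182051, 0),
    (b'NGTSJ225342.6+015412', 2457881.89107639, 109.891716, 10.210967, 0),
    (b'NGTSJ225330.3+012025', 2458082.58070602, np.nan, np.nan, 0),
], dtype=dtype)

# One boolean per row, True where the ID matches.
mask = data['SOURCE_ID'] == b'NGTSJ225342.6+015412'

# Indexing the whole record with the mask keeps all five columns.
rows = data[mask]
print(rows.shape)        # (2,)
print(rows.dtype.names)
```

This still scans the full SOURCE_ID column once, so it is a tidier form of the approach above rather than a faster one.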
Related package imports:
from astropy.io import fits
Apologies for not being able to provide a more concise and reproducible test script for this. I'm actually not sure how to create a 1D, list-like .fits record with a mix of strings, floats and integers like the file I'm working with.
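One possible way to mock up such a record is a NumPy structured array with the same dtype as the output above. The helper name `make_test_records`, the random values and the number of rows per ID are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Same field layout as the dtype printed for the real file.
DTYPE = [('SOURCE_ID', 'S20'), ('HJD', '>f8'), ('SYSFLUX', '>f4'),
         ('FLUX_ERR', '>f4'), ('FLAG', '>i4')]

def make_test_records(source_ids, n_points, seed=0):
    """Hypothetical helper: 1-D record array with n_points rows per ID."""
    rng = np.random.default_rng(seed)
    n_rows = len(source_ids) * n_points
    records = np.zeros(n_rows, dtype=DTYPE)
    # Repeat each ID n_points times so rows for one object are contiguous.
    records['SOURCE_ID'] = np.repeat(np.array(source_ids, dtype='S20'), n_points)
    # Made-up values, only roughly in the range of the real data.
    records['HJD'] = 2457881.0 + rng.uniform(0.0, 200.0, size=n_rows)
    records['SYSFLUX'] = rng.normal(100.0, 10.0, size=n_rows)
    records['FLUX_ERR'] = rng.uniform(9.0, 11.0, size=n_rows)
    # FLAG is left at 0, as in the rows shown above.
    return records

records = make_test_records(
    ['NGTSJ225342.6+015412', 'NGTSJ225330.3+012025'], n_points=4)
print(records.shape)  # (8,)
```

An array like this should be accepted by astropy's `fits.BinTableHDU(data=records)` if an actual .fits file is needed, although that step is not exercised in the sketch.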