Python script becomes unresponsive at 100% CPU usage (single core)


I am trying to make a measurement with the CAEN DT5742 16-channel digitizer using the CAENPy library, which is basically just a wrapper around the actual CAENDigitizer library.

My program scans an area with a laser using stepper motors and reads out the data coming from an analog readout board via the digitizer.

It has been working reasonably well, but I noticed that my program randomly becomes unresponsive, with two of its processes (the program uses multiprocessing) each drawing 100% CPU on a single core.

The code I use for acquiring data with the digitizer:

    def read_and_save_events(self, max_num_events: int = 1):
        """Reads a specified number of events from the digitizer.

        Arguments
        ---------
        max_num_events: int, default 1
            Number of events to read.

        Returns
        -------
        nevts: int
            Number of events read.
        """
        nevts: int = 0
        data = []
        retries = 0
        while retries < MAX_RETRIES:
            retries += 1
            try:
                with self.device:
                    self.log.info("Reading %d events...", max_num_events)
                    while nevts < max_num_events:
                        time.sleep(0.05)
                        waveforms = self.get_waveforms()
                        current_nevts = len(waveforms)
                        nevts += current_nevts
                        data += waveforms
                        self.log.info(
                            "Read %d out of %d events...", nevts, max_num_events
                        )
                break
            except RuntimeError:
                self.log.error("Encountered error during read. Retrying...")
                self.hard_reset(self._device_id)
                self.close()
                self.device = CAEN_DT5742_Digitizer(self._device_id)
                self.init()
                time.sleep(RETRY_TIMEOUT)
        else:
            self.log.error("Too many retries, aborting read...")

        if self._save_path is None:
            self.log.warning("No save path specified, waveforms not saved!")
            return 0

        # Disentangle data and save to file
        df = pd.DataFrame(data)
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        data_file = os.path.join(self._save_path, f"waveforms_{timestamp}.h5")
        self.curr_savefile = data_file

        with pd.HDFStore(data_file, "w") as store:
            for channel in df.columns:
                channel_df = []
                for eventid, event in enumerate(df[channel]):
                    event_df = pd.DataFrame(event)
                    for column in event_df.columns:
                        col = pd.Series(
                            event_df[column].values,
                            name=f"{eventid}_{column.split()[0]}",
                        )
                        channel_df.append(col)
                channel_df = pd.concat(channel_df, axis=1)
                store.put(channel, channel_df)

        # Report how many events were read, as promised in the docstring
        return nevts
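
One defensive change to the inner polling loop above is to give it a deadline, so that a digitizer that keeps returning no events cannot keep the loop running forever. This is only a minimal sketch: `poll_until`, `fetch`, `timeout_s` and `poll_interval_s` are hypothetical names, not part of CAENPy, and it only helps if the read call returns at all (it cannot interrupt a call that blocks inside the library).

```python
import time
from typing import Callable, List


def poll_until(
    fetch: Callable[[], list],
    target: int,
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.05,
) -> List:
    """Collect items from fetch() until `target` items arrive or the deadline passes.

    `fetch` is a stand-in for something like self.get_waveforms (an assumption).
    """
    # time.monotonic() is not affected by system clock adjustments
    deadline = time.monotonic() + timeout_s
    collected: list = []
    while len(collected) < target:
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"got {len(collected)} of {target} events within {timeout_s} s"
            )
        time.sleep(poll_interval_s)
        collected += fetch()
    return collected
```

In the method above, `fetch` would be `self.get_waveforms`, and raising `RuntimeError` instead of `TimeoutError` would send a stalled read into the existing retry branch.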

I profiled the program using py-spy and got the attached call stack for one of the two processes stuck at 100% CPU. The time appears to be spent in the library's _GetNumEvents method.

My question: can I even fix this bug on my side? If not, how would I monitor my program so that it detects this error state and recovers from it?
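
To make the monitoring idea concrete, here is a minimal sketch of a watchdog, assuming the acquisition can be started as its own child process (for example one script run per scan point). `run_with_watchdog` is a hypothetical helper; it relies only on the standard library's `subprocess.run`, which kills the child and raises `TimeoutExpired` when the timeout elapses.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd as a child process.

    Returns True on a clean exit (return code 0), False if the child
    exceeded timeout_s or exited with an error.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for a hung acquisition: a child that spins at 100% CPU
    hung = [sys.executable, "-c", "while True: pass"]
    if not run_with_watchdog(hung, timeout_s=2.0):
        print("acquisition hung or failed; reset the digitizer and retry")
```

Because the hang seems to happen inside a library call, killing the whole process from outside is the only approach here that does not depend on that call returning. The digitizer would probably still need a reset (as in `hard_reset` above) before the next attempt.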
