Better way of using QFileSystemWatcher to import data files as they are generated?


I am using PyQt4 to write an application that monitors a folder and imports data (contained in sub-folders) as it is collected, so that analysis can run while the data is still being generated.

Essentially, the data collection software creates a folder for each recording and writes several files into it: different .xml files that contain the software's configuration parameters, plus the actual data files, saved as .csv files.

I've written a little test application (below) that watches a folder, looks for newly added subfolders, and then watches those subfolders for file changes so that the data can be imported. However, I'm not sure I'm going about this the best way. In particular, because multiple files are added, I have to sift out which of the additions are the files I actually need (generally just one of the .xml files and one of the data files; the rest are extraneous).
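
One way to reduce the sifting is to stop caring which signal fired and, on every change, simply re-list the folder and pick out the files of interest. The sketch below assumes every relevant file follows the naming convention visible in the example output (Test-001_Cycle00001_VoltageRecording_001.xml / .csv); the regex and the function name latest_cycle_files are hypothetical, not part of the acquisition software.

```python
import os
import re

# Assumed naming convention, taken from the example output:
#   Test-001_Cycle00001_VoltageRecording_001.xml / .csv
CYCLE_RE = re.compile(r"_Cycle(\d+)_VoltageRecording_\d+\.(xml|csv)$", re.IGNORECASE)


def latest_cycle_files(folder):
    """Return (cycle, xml_path, csv_path) for the highest-numbered cycle.

    Either path is None if that file has not appeared yet.
    Returns None if no VoltageRecording files are present at all.
    """
    cycles = {}  # cycle number -> {"xml": path, "csv": path}
    for name in os.listdir(folder):
        match = CYCLE_RE.search(name)
        if match is None:
            continue  # other .xml files, temporary files, etc.
        cycle = int(match.group(1))
        ext = match.group(2).lower()
        cycles.setdefault(cycle, {})[ext] = os.path.join(folder, name)
    if not cycles:
        return None
    latest = max(cycles)
    files = cycles[latest]
    return latest, files.get("xml"), files.get("csv")
```

Because the function only looks at what is on disk, calling it from the directoryChanged slot gives the same answer however many times the signal fires, so no flag is needed to carry state between calls. It does not, on its own, tell you whether the .csv has finished being written.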

Also, and this is probably the bigger issue: the data file starts out as a different, temporary file, which the data acquisition software converts into the corresponding .csv file at the end of the recording (the temporary file is then deleted). The directoryChanged signal emitted when the .csv file is generated arrives after the conversion, but sometimes before the file has been entirely written (this is where things get a bit murky). In other words, if I import the data when that signal is emitted, sometimes I get the entire data set and other times only a portion of it (e.g. the first 200 rows). If I wait for the next directoryChanged signal, however, I get the whole data set. I'm not sure how to wait specifically for that next signal. I could use a counter (the same way I'm using the "switch" attribute to flip between True and False), but that seems rather clunky, just as my current implementation with "switch" already does.
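
A common workaround for half-written files (a swapped-in technique, not something the acquisition software documents) is to treat the .csv as finished only once its size has stopped changing for a few consecutive polls. This minimal sketch assumes the writer never pauses longer than roughly interval * checks seconds in the middle of a file; wait_until_stable and its parameters are hypothetical names, and the sleep/clock arguments exist only so it can be tested without real delays.

```python
import os
import time


def wait_until_stable(path, interval=0.5, checks=3, timeout=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Block until `path` exists and its size has matched the previous poll
    `checks` times in a row (polls spaced `interval` seconds apart).

    Returns True once the file looks complete, or False if `timeout`
    seconds pass first.
    """
    deadline = clock() + timeout
    last_size = None
    unchanged = 0
    while clock() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = None  # not created yet (or briefly inaccessible)
        if size is not None and size == last_size:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            unchanged = 0  # file is missing or still growing: start over
        last_size = size
        sleep(interval)
    return False
```

Because it blocks, calling it directly from a Qt slot would freeze the GUI while it polls; in the PyQt4 application it would belong in a worker thread, or the same size comparison could be driven by a repeating QTimer instead of sleep().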

That may not be clear, so I've provided some example output below. "Import Data" marks the point where my import function would run; that is where I sometimes get the full data set and other times only a portion of it.

I'm just wondering whether there is a better way to go about this, as this was my first pass (read: guess) at it.

Code:

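import os
from glob import glob

from PyQt4 import QtCore, QtGui

import pxml  # assumed import for the XML helper that provides parse_vr() (module not shown)


class FileWatcher(QtGui.QWidget):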

    def __init__(self):
        QtGui.QWidget.__init__(self)

        self.resize(160, 160)

        self.data_folder = None
        self.current_xml = None
        self.vr_csv = None
        self.watch_path = []
        self.watch_folder = None
        self.switch = False
        self.counter = 1

        layout = QtGui.QVBoxLayout(self)

        self.start_btn = QtGui.QPushButton("Start")
        self.start_btn.clicked.connect(self.start_watching)

        self.stop_btn = QtGui.QPushButton("Stop")
        self.stop_btn.clicked.connect(self.stop_watching)

        layout.addWidget(self.start_btn)
        layout.addWidget(self.stop_btn)

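    def start_watching(self):
        folder = QtGui.QFileDialog.getExistingDirectory(self)
        if not folder:
            return  # dialog was cancelled
        self.watch_path.append(folder)
        # Watch the top-level folder; the acquisition software adds a new
        # sub-folder to it for each recording.
        self.watch_folder = QtCore.QFileSystemWatcher(self.watch_path)
        self.watch_folder.directoryChanged.connect(self.directory_changed)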

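    def stop_watching(self):
        # Remove the watched paths explicitly so no further signals arrive,
        # then reset all state.
        for watcher in (self.watch_folder, self.data_folder):
            if watcher is not None and watcher.directories():
                watcher.removePaths(watcher.directories())
        self.data_folder = None
        self.current_xml = None
        self.vr_csv = None
        self.watch_path = []
        self.watch_folder = None
        self.switch = False
        self.counter = 1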

    def data_folder_changed(self, path):

        print("data_folder_changed: %s" % self.counter)
        self.counter += 1

        # os.path.join instead of a hard-coded "\\" keeps this portable;
        # sorted() makes "[-1] is the latest cycle" hold regardless of glob order.
        xml_files = sorted(glob(os.path.join(path, "*VoltageRecording*.xml")))
        csv_files = sorted(glob(os.path.join(path, "*VoltageRecording*.csv")))

        if xml_files:
            if self.current_xml != xml_files[-1]:
                # A new cycle's .xml has appeared: read which .csv it refers to.
                self.current_xml = xml_files[-1]
                print(os.path.basename(self.current_xml))

                vals = pxml.parse_vr(self.current_xml)
                self.vr_csv = os.path.join(path, vals['voltage recording file'] + '.csv')
                print(os.path.basename(self.vr_csv))

        if csv_files:
            if self.vr_csv == csv_files[-1]:
                print("switch: %s" % self.switch)

                # Skip the first signal after the expected .csv appears (it may
                # still be partially written) and import on the next one.
                if self.switch and self.vr_csv is not None:
                    print("Import Data")
                    print(os.path.basename(self.vr_csv))
                    self.switch = False
                    self.vr_csv = None

                else:
                    self.switch = True

        print()

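    def directory_changed(self, path):
        # Treat the alphabetically last entry as the sub-folder the acquisition
        # software just created; do nothing if the folder is empty.
        entries = sorted(glob(os.path.join(path, "*")))
        if not entries:
            return
        data_folder_path = [entries[-1], ]
        print("directory_changed")
        print(data_folder_path)
        print()

        # Rebinding self.data_folder drops the previous watcher, so only the
        # newest sub-folder is watched.
        self.data_folder = QtCore.QFileSystemWatcher(data_folder_path)
        self.data_folder.directoryChanged.connect(self.data_folder_changed)


if __name__ == "__main__":
    import sys

    app = QtGui.QApplication(sys.argv)
    window = FileWatcher()
    window.show()
    sys.exit(app.exec_())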

Example Output (3 cycles, so there would be 3 data files imported):

directory_changed
['C:\\Users\\User\\Desktop\\Test_watch\\Test-001']

data_folder_changed: 1

data_folder_changed: 2

data_folder_changed: 3
Test-001_Cycle00001_VoltageRecording_001.xml
Test-001_Cycle00001_VoltageRecording_001.csv

data_folder_changed: 4

data_folder_changed: 5

data_folder_changed: 6
switch: False

data_folder_changed: 7
switch: True
Import Data
Test-001_Cycle00001_VoltageRecording_001.csv

data_folder_changed: 8

data_folder_changed: 9
Test-001_Cycle00002_VoltageRecording_001.xml
Test-001_Cycle00002_VoltageRecording_001.csv

data_folder_changed: 10

data_folder_changed: 11

data_folder_changed: 12
switch: False

data_folder_changed: 13
switch: True
Import Data
Test-001_Cycle00002_VoltageRecording_001.csv

data_folder_changed: 14
Test-001_Cycle00003_VoltageRecording_001.xml
Test-001_Cycle00003_VoltageRecording_001.csv

data_folder_changed: 15

data_folder_changed: 16

data_folder_changed: 17

data_folder_changed: 18
switch: False

data_folder_changed: 19
switch: True
Import Data
Test-001_Cycle00003_VoltageRecording_001.csv

data_folder_changed: 20
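
The example output shows six or seven data_folder_changed calls per cycle. Instead of counting them with "switch" or a counter, a quiet-period debounce collapses each burst into a single "settled" event. In Qt this is usually written with a single-shot QTimer: calling start() on a running QTimer restarts it, so a slot connected to directoryChanged that calls timer.start(), plus an import routine connected to the timer's timeout signal, runs the import once per burst. The plain-Python sketch below shows the same logic so it can be tested without Qt; the class name QuietPeriod and the 2-second default are hypothetical.

```python
import time


class QuietPeriod:
    """Collapse a burst of change notifications into one 'settled' event.

    Call notify() from every directoryChanged callback; settled() returns
    True once `quiet` seconds have passed with no new notification.
    """

    def __init__(self, quiet=2.0, clock=time.monotonic):
        self.quiet = quiet
        self.clock = clock
        self.last_event = None  # time of the most recent notify()

    def notify(self):
        # Each new signal pushes the "settled" moment further back.
        self.last_event = self.clock()

    def settled(self):
        if self.last_event is None:
            return False
        if self.clock() - self.last_event >= self.quiet:
            self.last_event = None  # report each burst only once
            return True
        return False
```

A quiet period is a heuristic: if the acquisition software pauses mid-write for longer than `quiet` seconds, the import can still see a partial file, so the value needs tuning against real recordings.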

Answer:

There is a Python library called watchdog that, by the looks of it, provides more detailed events you can use to tell more accurately what kind of event occurred. I've used it in a couple of projects without problems, one of them using PySide (for reference).
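
A minimal sketch of that approach, assuming the third-party watchdog package is installed (pip install watchdog). The handler name VoltageCsvHandler, the helper start_observer, and the queue hand-off are illustrative. watchdog calls handlers on its own observer thread, so in a PyQt or PySide application the handler should only pass paths on (here through a queue; a queued Qt signal would also work) rather than touch widgets. on_moved is handled in case the conversion from the temporary file is implemented as a rename. Note that a creation event can still fire before the file is fully written, so the event type alone does not solve the partial-file problem.

```python
import fnmatch
import os

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

PATTERN = "*VoltageRecording*.csv"  # same pattern the question's code globs for


class VoltageCsvHandler(FileSystemEventHandler):
    """Put the path of every new VoltageRecording .csv onto a queue."""

    def __init__(self, out_queue):
        super().__init__()
        self.out_queue = out_queue

    def _maybe_put(self, path):
        if fnmatch.fnmatch(os.path.basename(path), PATTERN):
            self.out_queue.put(path)

    def on_created(self, event):
        if not event.is_directory:
            self._maybe_put(event.src_path)

    def on_moved(self, event):
        # Covers software that writes a temp file and then renames it.
        if not event.is_directory:
            self._maybe_put(event.dest_path)


def start_observer(root, out_queue):
    """Watch `root` and all of its sub-folders; returns the running observer."""
    observer = Observer()
    observer.schedule(VoltageCsvHandler(out_queue), root, recursive=True)
    observer.start()
    return observer
```

With recursive=True a single observer covers the new sub-folders as well, so there is no need for a second watcher per recording. Call observer.stop() followed by observer.join() to shut it down.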