How to refresh dataframes in real time using pandasGUI (without 'remove' and 'add' functions)


I'm importing a table from PostgreSQL every 5 seconds and loading it into a DataFrame. The columns stay the same, but the contents are different every time. So far the only way I've managed to update the GUI is to delete the old DataFrame and then add the new one, using "store.remove_dataframe" and "store.add_dataframe" from pandasGUI. This works, but it takes too long. I'm aware of a function called refresh, but I don't know whether it actually refreshes the DataFrame, or whether it's any faster than removing and adding one. Any thoughts?

Versions: Python 3.8.5, pandasGUI 0.2.9, pandas 1.1.3


PandasGUI creator here.

> So far the only way I've managed to update the GUI is to delete the old DataFrame and then add the new one, using "store.remove_dataframe" and "store.add_dataframe" from pandasGUI. This works, but it takes too long. I'm aware of a function called refresh, but I don't know whether it actually refreshes the DataFrame, or whether it's any faster than removing and adding one.

There currently isn't a faster alternative to add_dataframe and remove_dataframe. The refresh method is just a convenience method that calls both of them for every variable in your namespace whose name matches a DataFrame in PandasGUI. The full code for that method is below.

    # Replace all GUI DataFrames with the current DataFrame of the same name from the scope show was called
    def refresh(self):
        callers_local_vars = self.caller_stack.f_locals.items()
        refreshed_names = []
        for var_name, var_val in callers_local_vars:
            for ix, name in enumerate([pgdf.name for pgdf in self.store.data]):
                if var_name == name:
                    none_found_flag = False
                    self.store.remove_dataframe(var_name)
                    self.store.add_dataframe(var_val, name=var_name)
                    refreshed_names.append(var_name)

        if not refreshed_names:
            print("No matching DataFrames found to refresh")
        else:
            print(f"Refreshed {', '.join(refreshed_names)}")
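
If only one table changes on each poll, the same remove-then-add pair can be applied to just that name instead of scanning the caller's variables. This is a minimal sketch, not part of PandasGUI: `replace_dataframe` is a hypothetical helper, and it assumes `store` exposes `data`, `remove_dataframe` and `add_dataframe` as they are used in refresh above. It does the same work per DataFrame, so it should not be expected to be faster per call.

```python
def replace_dataframe(store, df, name):
    """Swap the DataFrame shown under `name` for `df` (hypothetical helper).

    Assumes `store` has a `data` list of objects with a `.name` attribute,
    plus remove_dataframe(name) and add_dataframe(df, name=...), mirroring
    how refresh uses them in the 0.2.9 source quoted above.
    """
    # Only remove when the name is already registered in the GUI.
    if any(pgdf.name == name for pgdf in store.data):
        store.remove_dataframe(name)
    # Register the new contents under the same name.
    store.add_dataframe(df, name=name)
```

Usage would look something like `replace_dataframe(gui.store, new_df, "my_table")`, assuming `gui` is the object returned by `show(...)` and `"my_table"` is the name the DataFrame was registered under.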

There are probably ways to speed up add_dataframe, but that just hasn't been a priority. You could open an issue requesting it so it gets some visibility. I think most of the runtime for large datasets is often spent calculating statistics, so maybe I could add a flag to the settings to skip that.
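
If you do open an issue, it may help to include how long each call takes on your data. Here is a small sketch using only the standard library. `timed_call` is a hypothetical helper name, not something PandasGUI provides.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `_, secs = timed_call(gui.store.add_dataframe, df, name="my_table")` would time a single add, assuming `gui` is the object returned by `show(...)`. Timing remove_dataframe the same way would show which half of the pair dominates.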

By the way, for questions specific to a package like PandasGUI, it's probably better to open an issue or discussion thread on the GitHub project instead. That way you can be sure someone knowledgeable about the project will see it. And since the project is still at version 0.x.y, its API is subject to change, so building up a knowledge base of Stack Overflow answers isn't as useful because things might get outdated.