My problem is that I have a dataframe which has null values, but these null values are filled with another column of the same data frame, then I would like to know how to take that column and put the information of the other column to fill the missing data. I'm using deepnote link: https://deepnote.com
For example:
Column A | Column B |
---|---|
Cell 1 | Cell 2 |
NULL | Cell 4 |
My desired output:
Column A |
---|
Cell 1 |
Cell 4 |
I think it should be with sub queries and using some WHERE, any ideas?
It is not 100% clear which direction you need your solution to go, so I am offering two alternatives which I think should get you going.
Pandas way
You seem to be working with Pandas dataframes. The usual way to work with Pandas dataframes is to use Pandas builtin functions. In this case, there is literally a function for filling null values, it's called fillna. We can use it to fill values from another column like this:
This will make your
df_clean
look like you needDataframe SQL way
You mentioned "queries" and "where" in your question which seems you might be playing with some combination of Python and SQL world. Enter DuckDB world which supports exactly this, in Deepnote we call these Dataframe SQLs.
You can query e.g. CSV files directly from these Dataframe SQL blocks, but you can also use a previously defined Dataframe.
In order to fill the null values like you are requesting, we can use standard SQL querying and a function called coalesce as Paul correctly pointed out.
This will also create what you need in SQL world. In Deepnote, specifically, this will also give you a Dataframe.
Feel free to check out my project in Deepnote with these examples, and go ahead and duplicate it if you want to iterate on the code a bit. There is also plenty more alternatives, if you're in a real SQL database and want to update existing columns, you would use
update
statement. And if you are in a pure Python, this is of course also possible in a loop or using lambda functions.