Get a list of distinct values from one column in dynamic data where another column equals specific text?

65 Views Asked by At

I'm using SQL and Python to automatically populate a template Excel file with data each morning. In the Excel file, using the below table as an example, I need to get a distinct list of codes from the Code column, but only if the New column has a value of Y. This is an automated report that goes out each morning, so I can't do any manual data manipulation.

Existing Data Sample:

| New | Code |
| --- | ---- |
| N   | RTB  |
| N   | RTB  |
| Y   | RTB  |
| N   | SCR  |
| N   | SCR  |
| Y   | SCR  |
| Y   | SCR  |
| N   | SLD  |
| N   | SLD  |
| N   | SLD  |
| N   | RTS  |
| N   | RTS  |
| Y   | RTS  |
| Y   | RTS  |
| N   | CC   |
| N   | CC   |

There are five codes listed in the above table, but only three have a value of Y in the New column. I would like the following output:


| Code |
| ---- |
| RTB  |
| SCR  |
| RTS  |  

Creating a pivot table and listing by New then by Code creates exactly what I need, but as I said above this has to be completely automated so the pivot table wouldn't work. I just can't figure out the right formula.

This formula gets me close, but using the above example, it would list all five codes: =IFERROR(INDEX(DATA!$B$2:$B$9999, MATCH(0, INDEX(COUNTIF($A$1:A1, DATA!$B$2:$B$9999),0,0), 0))), "") I don't know how to tell it to filter by the New column as well.

If there's an easier way to do this in Python or SQL then I'm good with that as well. I'm creating dataframes from SQL queries and writing them to excel to get the dynamic data mentioned above.

Thank you!

2

There are 2 best solutions below

1
J_H On

Pandas is happy to filter and uniquify those for you.

>>> import pandas as pd
>>> df = pd.DataFrame({'New': list('NNYNNYY'), 'Code': 'RTB RTB RTB SCR SCR SCR RTB'.split()})
>>> sorted(df[df.New == 'Y'].Code.unique())
['RTB', 'SCR']

Feel free to use a method like .to_csv() or .to_sql() to send those reporting results to the place where you need them.

1
T N On

It appears that Excel now has functions for this. Try:

=UNIQUE(FILTER(Data!B:B, Data!A:A = "Y"))

Results:

| New | Code | Code |
| --- | ---- | ---- |
| N   | RTB  | RTB  |
| N   | RTB  | SCR  |
| Y   | RTB  | RTS  |
| N   | SCR  |      |
| N   | SCR  |      |
| Y   | SCR  |      |
| Y   | SCR  |      |
| N   | SLD  |      |
| N   | SLD  |      |
| N   | SLD  |      |
| N   | RTS  |      |
| N   | RTS  |      |
| Y   | RTS  |      |
| Y   | RTS  |      |
| N   | CC   |      |
| N   | CC   |      |

I am not sure what version introduced these relatively new functions. I found them referenced in this article, dated December 2018. Seems a magnitude simpler than all the match, index, and relative referencing that the old techniques needed.