I'm using SQL and Python to automatically populate a template Excel file with data each morning. In the Excel file, using the below table as an example, I need to get a distinct list of codes from the Code column, but only if the New column has a value of Y. This is an automated report that goes out each morning, so I can't do any manual data manipulation.
Existing Data Sample:
| New | Code |
| --- | ---- |
| N | RTB |
| N | RTB |
| Y | RTB |
| N | SCR |
| N | SCR |
| Y | SCR |
| Y | SCR |
| N | SLD |
| N | SLD |
| N | SLD |
| N | RTS |
| N | RTS |
| Y | RTS |
| Y | RTS |
| N | CC |
| N | CC |
There are five codes listed in the above table, but only three have a value of Y in the New column. I would like the following output:
| Code |
| ---- |
| RTB |
| SCR |
| RTS |
Creating a pivot table and listing by New then by Code creates exactly what I need, but as I said above this has to be completely automated so the pivot table wouldn't work. I just can't figure out the right formula.
This formula gets me close, but using the above example, it would list all five codes:
=IFERROR(INDEX(DATA!$B$2:$B$9999, MATCH(0, INDEX(COUNTIF($A$1:A1, DATA!$B$2:$B$9999),0,0), 0))), "") I don't know how to tell it to filter by the New column as well.
If there's an easier way to do this in Python or SQL then I'm good with that as well. I'm creating dataframes from SQL queries and writing them to excel to get the dynamic data mentioned above.
Thank you!
Pandas is happy to filter and uniquify those for you.
Feel free to use a method like .to_csv() or .to_sql() to send those reporting results to the place where you need them.