I have been tasked with trying to format and display data via PHP and SQL from a CSV spreadsheet to get rid of duplicate and incorrect values. I already have the code written that imports the CSV to the database without any issues, but my problem is trying to figure out how to write a query to do what I need as the GROUP BY is not working for my needs.
My database server type is Percona Server.
Query I have tried, and is to the extent of my knowledge of SQL for grouping data:
SELECT ourso,sodte,invnr,indat,itnsa,cusnr,txam1,txcd1,txbod,state,invam,itnbr,itdsc,bzip,szip
FROM mytable
GROUP BY ourso,sodte,invnr,indat,itnsa,cusnr,txam1,txcd1,txbod,state,invam,itnbr,itdsc,bzip,szip
Sample Data (What my table columns look like with some sample data)
| id | ourso | sodte | invnr | indat | comno | shpnr | city | itnsa | linam | extca | cusnr | txam1 | bcity | txcd1 | txbod | state | invam | itnbr | itdsc | bzip | szip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0TYXA | 2020-01-08 | 762819 | 2020-01-12 | 1 | 0 | CHARLOTTE | 8075.48 | 8075.48 | 6957.90 | 622000 | 0.00 | CHARLOTTE | 1 | NC0060 | NC | 2740.59 | IT-1923 | ITEM INFO | 28208 | 28208 |
| 2 | 0TYXA | 2020-01-08 | 762819 | 2020-01-12 | 1 | 0 | CHARLOTTE | 8075.48 | 8075.48 | 6957.90 | 622000 | 0.00 | CHARLOTTE | 1 | NC0060 | NC | 10021.21 | IT-1920 | ITEM INFO | 28208 | 28208 |
| 3 | 0TYXA | 2020-01-08 | 762819 | 2020-01-12 | 1 | 0 | CHARLOTTE | 1149.98 | 1149.98 | 996.79 | 622000 | 0.00 | CHARLOTTE | 1 | NC0060 | NC | 10021.21 | IT-1923 | ITEM INFO | 28208 | 28208 |
| 4 | 0TYXA | 2020-01-08 | 762819 | 2020-01-12 | 1 | 0 | CHARLOTTE | 1149.98 | 1149.98 | 996.79 | 622000 | 0.00 | CHARLOTTE | 1 | NC0060 | NC | 10021.21 | IT-1920 | ITEM INFO | 28208 | 28208 |
In the sample above, I have 4 rows, but the itnsa/linam/extca columns are being duplicated. the values for id 1 and 4 are correct for their respective column "itnbr", but 2 and 3 are displaying values for the wrong items.
Is there a way to write a SELECT statement that will group together same "invnr" rows, but make the itnsa/linam/extca only show the correct value?
The spreadsheet is over 60,000 rows with these errors throughout the entire spreadsheet because of how it exported (we have no control over this), and have switched to new software, but this data is from the old software.
Any recommendations would be welcome, as I have not been able to find an answer via google (I have also explored the use of DISTINCT, but it is not working in how I understand to use it).