How to isolate duplicate subvalues of a group by SQL statement

57 Views Asked by At

I have been tasked with trying to format and display data via PHP and SQL from a CSV spreadsheet to get rid of duplicate and incorrect values. I already have the code written that imports the CSV to the database without any issues, but my problem is trying to figure out how to write a query to do what I need as the GROUP BY is not working for my needs.

My database server type is Percona Server.

Query I have tried, and is to the extent of my knowledge of SQL for grouping data:

SELECT ourso,sodte,invnr,indat,itnsa,cusnr,txam1,txcd1,txbod,state,invam,itnbr,itdsc,bzip,szip 
FROM mytable 
GROUP BY ourso,sodte,invnr,indat,itnsa,cusnr,txam1,txcd1,txbod,state,invam,itnbr,itdsc,bzip,szip

Sample Data (What my table columns look like with some sample data)

id ourso sodte invnr indat comno shpnr city itnsa linam extca cusnr txam1 bcity txcd1 txbod state invam itnbr itdsc bzip szip
1 0TYXA 2020-01-08 762819 2020-01-12 1 0 CHARLOTTE 8075.48 8075.48 6957.90 622000 0.00 CHARLOTTE 1 NC0060 NC 2740.59 IT-1923 ITEM INFO 28208 28208
2 0TYXA 2020-01-08 762819 2020-01-12 1 0 CHARLOTTE 8075.48 8075.48 6957.90 622000 0.00 CHARLOTTE 1 NC0060 NC 10021.21 IT-1920 ITEM INFO 28208 28208
3 0TYXA 2020-01-08 762819 2020-01-12 1 0 CHARLOTTE 1149.98 1149.98 996.79 622000 0.00 CHARLOTTE 1 NC0060 NC 10021.21 IT-1923 ITEM INFO 28208 28208
4 0TYXA 2020-01-08 762819 2020-01-12 1 0 CHARLOTTE 1149.98 1149.98 996.79 622000 0.00 CHARLOTTE 1 NC0060 NC 10021.21 IT-1920 ITEM INFO 28208 28208

In the sample above, I have 4 rows, but the itnsa/linam/extca columns are being duplicated. the values for id 1 and 4 are correct for their respective column "itnbr", but 2 and 3 are displaying values for the wrong items.

Is there a way to write a SELECT statement that will group together same "invnr" rows, but make the itnsa/linam/extca only show the correct value?

The spreadsheet is over 60,000 rows with these errors throughout the entire spreadsheet because of how it exported (we have no control over this), and have switched to new software, but this data is from the old software.

Any recommendations would be welcome, as I have not been able to find an answer via google (I have also explored the use of DISTINCT, but it is not working in how I understand to use it).

0

There are 0 best solutions below