Python heapq providing unexpected output


I have a simple Python script that reads all rows of a CSV file and heap-sorts them by the second element of each row. Here is the function that reads the file:

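import csv
import heapq

def readProcesses():
    processes = []
    with open('Project1/Processes/sample.csv', 'r', newline='') as file:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            # Key each entry on column 1 (still a string at this point)
            heapq.heappush(processes, (row[1], row))
    return processes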

And here is how I am printing the data:

processes = readProcesses()
while processes:
    print(heapq.heappop(processes))

The initial output for the data is:

('10016', ['90', '10016', '8070'])
('10136', ['0', '10136', '11315'])
('10461', ['83', '10461', '79576'])
('10969', ['206', '10969', '52071'])
('2997', ['58', '2997', '12935'])
('3666', ['108', '3666', '98952'])
('3946', ['109', '3946', '22268'])
('4083', ['236', '4083', '81516'])
('4233', ['182', '4233', '28817'])
('4395', ['64', '4395', '94292'])
('4420', ['51', '4420', '52133'])

Essentially, all values over 10,000 appear at the top of the heap, ahead of the smaller values. The rest of the values are in the expected order. I've tried calling heapq.heapify after adding all of the processes, but it has no effect.
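A minimal sketch of why heapify makes no difference, using a few keys copied from the output above and kept as strings (as in the original code): heapify arranges the list with the same < comparison that heappush uses, so draining either heap yields the same lexicographic order. The helper name drain is hypothetical, introduced only for this illustration.

```python
import heapq

# Keys copied from the question's output; still strings, as in the original code
keys = ['10016', '2997', '4420', '10969', '3666']

def drain(heap):
    """Hypothetical helper: pop every item off a heap, returning them in pop order."""
    return [heapq.heappop(heap) for _ in range(len(heap))]

# Build one heap with repeated heappush calls...
pushed = []
for k in keys:
    heapq.heappush(pushed, k)

# ...and another by calling heapify on a copy of the list
heapified = list(keys)
heapq.heapify(heapified)

print(drain(pushed))     # ['10016', '10969', '2997', '3666', '4420']
print(drain(heapified))  # ['10016', '10969', '2997', '3666', '4420']
```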

Answer:

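The first element of each tuple in your heap is a string, not an integer. Thus, lexicographical (character-by-character) comparison is used rather than a comparison of the numbers the strings represent. For example, '10016' sorts before '2997' because its first character, '1', is less than '2'.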
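A quick check of the two comparison rules, using values taken from the question's output:

```python
# Strings compare character by character: '1' < '2' decides it immediately
print('10016' < '2997')            # True
# Integers compare by numeric value
print(int('10016') < int('2997'))  # False

keys = ['10016', '2997', '4420']
print(sorted(keys))           # ['10016', '2997', '4420']  (lexicographic)
print(sorted(keys, key=int))  # ['2997', '4420', '10016']  (numeric)
```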

If you want to compare based on the numerical values that the strings represent, change

heapq.heappush(processes,(row[1],row))

to

heapq.heappush(processes,(int(row[1]),row))
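A self-contained sketch of the fix: the rows below are copied from the question's output and read through io.StringIO in place of Project1/Processes/sample.csv (an assumption made only so the example runs anywhere), and read_processes is a hypothetical variant of readProcesses that takes an already-open file.

```python
import csv
import heapq
import io

# Rows copied from the question's output; stands in for sample.csv (assumption)
SAMPLE_CSV = """90,10016,8070
0,10136,11315
58,2997,12935
108,3666,98952
51,4420,52133
"""

def read_processes(f):
    """Hypothetical helper: build a min-heap keyed on the integer value of column 1."""
    reader = csv.reader(f, delimiter=',')
    processes = []
    for row in reader:
        # int() makes the heap compare numbers instead of strings
        heapq.heappush(processes, (int(row[1]), row))
    return processes

processes = read_processes(io.StringIO(SAMPLE_CSV))
ordered = []
while processes:
    key, row = heapq.heappop(processes)
    ordered.append(key)
    print(key, row)
# Keys come out as 2997, 3666, 4420, 10016, 10136
```

If two rows share the same column-1 value, tuple comparison falls back to comparing the row lists themselves, which works here because every row is a list of strings.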