I'm working on the problem below and have posted my code. My question is: my current implementation uses a sort in Python - sorted(sorted_freq, reverse=True). I referred to another implementation that uses a max heap (http://www.geeksforgeeks.org/rearrange-a-string-so-that-all-same-characters-become-at-least-d-distance-away/). I think they have the same time complexity, O(n log n) (if I calculated wrong, please feel free to correct me). So I'm wondering: is there any performance benefit to using a heap rather than a sort?
My code is written in Python 2.7.
Problem:
Given a string and a positive integer d. Some characters may be repeated in the given string. Rearrange characters of the given string such that the same characters become d distance away from each other. Note that there can be many possible rearrangements, the output should be one of the possible rearrangements. If no such arrangement is possible, that should also be reported. Expected time complexity is O(n) where n is length of input string.
Examples:
Input: "abb", d = 2
Output: "bab"
Input: "aacbbc", d = 3
Output: "abcabc"
Input: "geeksforgeeks", d = 3
Output: "egkegkesfesor"
Input: "aaa", d = 2
Output: Cannot be rearranged
Source code:
from collections import defaultdict

def rearrange(source, distance):
    freq = defaultdict(int)
    for c in source:
        freq[c] += 1
    sorted_freq = []
    for k, v in freq.items():
        sorted_freq.append((v, k))
    sorted_freq = sorted(sorted_freq, reverse=True)
    result = [0] * len(source)
    for i, (v, k) in enumerate(sorted_freq):
        index = i
        while index < len(result) and result[index] != 0:
            index += 1
        if index == len(result):
            return None
        count = v
        while count > 0:
            result[index] = k
            index += distance
            count -= 1
            if index >= len(source) and count > 0:
                return None
    return result

if __name__ == "__main__":
    print rearrange('abb', 2)
    print rearrange('aacbbc', 3)
    print rearrange('geeksforgeeks', 3)
    print rearrange('aaa', 2)
The time complexity of the best algorithm presented at the link is O(n + m log m), where m is the number of unique characters in the input string. As mentioned there, m is bounded by the size of the alphabet (a fixed constant), so the m log m term is effectively constant and the overall time complexity can be considered O(n). There's no difference in asymptotic complexity between sorting the frequencies with an O(m log m) sorting algorithm and using a heap.
Note that your implementation has time complexity O(nm), since you initialize index with i and then scan forward from there on every iteration of the outer loop. An alternative implementation using Counter instead of defaultdict fixes that issue: keep a single forward-only pointer to the next free slot, so the total scanning work across all characters is O(n).
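The original code block from this answer appears to have been lost from the post; the following is a sketch of the idea rather than the exact code. It sorts characters by descending frequency via Counter.most_common and advances a single free-slot pointer that never moves backwards, so slot scanning totals O(n) across all characters.

```python
from collections import Counter

def rearrange(source, distance):
    """Greedily place the most frequent characters first, each at
    stride `distance`, starting from the first free slot."""
    result = [None] * len(source)
    free = 0  # next candidate slot; only ever moves forward
    for char, count in Counter(source).most_common():
        # advance to the first unfilled slot
        while free < len(result) and result[free] is not None:
            free += 1
        index = free
        for _ in range(count):
            if index >= len(result) or result[index] is not None:
                return None  # cannot place all copies d apart
            result[index] = char
            index += distance
    return ''.join(result)
```

Because characters are placed in order of descending frequency, a later character's strided placements never collide with an earlier character's slots, so the only failure mode is running past the end of the string.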
Update: most_common uses heapq.nlargest under the hood, which amounts to a heapsort when n equals the length of the given iterable, so in practice there's no real difference. The results of course depend on the size of the data and the number of unique characters.
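To see this for yourself, here is a rough illustration (not the answer's original benchmark) of timing sorted() against heapq.nlargest on the same data; with n equal to the full length, both produce the same descending ordering:

```python
import heapq
import random
import timeit

# Degenerate-style input: many values, so the n == len(data) case matters.
data = [random.randint(0, 10**6) for _ in range(10**5)]

t_sorted = timeit.timeit(lambda: sorted(data, reverse=True), number=10)
t_heap = timeit.timeit(lambda: heapq.nlargest(len(data), data), number=10)

print("sorted:   %.3fs" % t_sorted)
print("nlargest: %.3fs" % t_heap)
```

Both calls return the elements in the same (descending) order; only the constant factors differ.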