Python dictionary sorting Anagram of a string

938 Views Asked by At

I have the following question and I found this Permutation of string as substring of another, but this is using C++, I am kind of confused applying to python.

Given two strings s and t, determine whether some anagram of t is a substring of s. For example: if s = "udacity" and t = "ad", then the function returns True. Your function definition should look like: question1(s, t) and return a boolean True or False.

So I answered this question but they want me to use dictionaries instead of sorting string. The reviewer saying that;

We can first compile a dictionary of counts for t and check with every possible consecutive substring sets in s. If any set is anagram of t, then we return True, else False. Comparing counts of all characters will can be done in constant time since there are only limited amount of characters to check. Looping through all possible consecutive substrings will take worst case O(len(s)). Therefore, the time complexity of this algorithm is O(len(s)). space complexity is O(1) although we are creating a dictionary because we can have at most 26 characters and thus it is bounded.

Could you guys please help how I can use dictionaries in my solution.

Here is my solution;

# Check if s1 and s2 are anagram to each other
def anagram_check(s1, s2):
    # sorted returns a new list and compare
    return sorted(s1) == sorted(s2)

# Check if anagram of t is a substring of s
def question1(s, t):
    for i in range(len(s) - len(t) + 1):
        if anagram_check(s[i: i+len(t)], t):
            return True
    return False

def main():
    print question1("udacity", "city")

if __name__ == '__main__':
    main()  

'''
Test Case 1: question1("udacity", "city") -- True
Test Case 2: question1("udacity", "ud") -- True
Test Case 3: question1("udacity", "ljljl") -- False
'''

Any help is appreciated. Thank you,

2

There are 2 best solutions below

17
On BEST ANSWER

A pure python solution for getting an object which corresponds to how many of that char in the alphabet is in a string (t)

Using the function chr() you can convert an int to its corresponding ascii value, so you can easily work from 97 to 123 and use chr() to get that value of the alphabet.

So if you have a string say:

t = "abracadabra"

then you can do a for-loop like:

dt = {}
for c in range(97, 123):
   dt[chr(c)] = t.count(chr(c))

this worked for this part of the solution giving back the result of:

{'k': 0, 'v': 0, 'a': 5, 'z': 0, 'n': 0, 't': 0, 'm': 0, 'q': 0, 'f': 0, 'x': 0, 'e': 0, 'r': 2, 'b': 2, 'i': 0, 'l': 0, 'h': 0, 'c': 1, 'u': 0, 'j': 0, 'p': 0, 's': 0, 'y': 0, 'o': 0, 'd': 1, 'w': 0, 'g': 0}

A different solution?

Comments are welcome, but why is storing in a dict necessary? using count(), can you not simply compare the counts for each char in t, to the count of that char in s? If the count of that char in t is greater than in s return False else True.

Something along the lines of:

def question1(s, t):
   for c in range(97, 123):
      if t.count(chr(c)) > s.count(chr(c)):
         return False
   return True

which gives results:

>>> question1("udacity", "city")
True
>>> question1("udacity", "ud")
True
>>> question1("udacity", "ljljl")
False

If a dict is necessary...

If it is, then just create two as above and go through each key...

def question1(s, t):
   ds = {}
   dt = {}
   for c in range(97, 123):
      ds[chr(c)] = s.count(chr(c))
      dt[chr(c)] = t.count(chr(c))
   for c in range(97, 123):
      if dt[chr(c)] > ds[chr(c)]:
         return False
   return True

Update

The above answers ONLY CHECK FOR SUBSEQUENCES NOT SUBSTRING anagrams. As maraca explained to me in the comments, there is a distinction between the two and your example makes that clear.

Using the sliding window idea (by slicing the string), the code below should work for substrings:

def question1(s, t):
   dt = {}
   for c in range(97, 123):
      dt[chr(c)] = t.count(chr(c))
   for i in range(len(s) - len(t) + 1):
      contains = True
      for c in range(97, 123):
         if dt[chr(c)] > s[i:i+len(t)].count(chr(c)):
            contains = False
            break
      if contains:
         return True
   return False

The code above does work for ALL cases and utilizes a dictionary to speed up the calculations correctly :)

2
On
import collections
print collections.Counter("google")

Counter({'o': 2, 'g': 2, 'e': 1, 'l': 1})