How do I measure the periodicity (or frequency) of a list of values?

1.4k Views Asked by At

Let's say I have a list of values: [0, 10, 20, 10, 0, 10, 20, 10, 0, ...]

Clearly there's periodicity. We see that there is a cycle every 5 entries. I want to measure the average periodicity, or the average number of entries it takes to complete a cycle, within the list above.

This seems similar to measuring autoocorrelation but I don't know where to begin to get some sort of measure of the "frequency" or "periodicity", aka how fast a cycle is completed.

5

There are 5 best solutions below

11
On

Initiate an empty list

interval = []

and use a recursive function, like so:

def check_for_interval(interval,list):

    ## step 1: add first list element into your interval
    interval.append(list[0])

    ## step 2: remove that element from your list
    list.pop(0)

    ## step 3: get the current content of your interval, plus the next
    ## element, and check if the concatenated content appears another time     
    ## in the source list.

    ## first, make sure you got only strings in your list, for join to work
    str_interval = []
    for y in interval:
      str_interval.append(str(y))

    ## attach the next element, which now is the first one of the list
    ## because you popped the "new" first one above
    str_interval.append(str(list[0]))

    ## next, concatenate the list content as string, like so:
    current_interval = ",".join(str_interval)

    ## now, transform the current remaining list (except the "new" first
    ## element cause added in your test string above) into a string of the 
    ## exact same structure (elements separated by commas)
    str_test = []
    list_test = list[1:]
    for z in list_test:
      str_test.append(str(z))

    ## next,concatenate the list content as string, like so:
    remaining_elements = ",".join(str_test)

    ## finally, check if the current_interval is present inside the
    ## remaining_elements. If yes
    if remaining_elements.find(current_interval) != -1:

        ## check if the amount of remaining elements is equal to the amount
        ## of elements constituting the interval -1 at the moment, OR if the
        ## current_interval is found in the remaining elements, its
        ## starting index is equal to 0, and the len of str_test is a pair
        ## entire multiple of str_interval
        check_list_test = remaining_elements.split(",")
        check_list_interval = current_interval.split(",")

        if (len(str_interval) == len(str_test)) or (remaining_elements.find(current_interval) == 0 and len(str_test) % len(str_interval) == 0 and (len(str_test) / len(str_interval)) % 2 == 0 and (len(check_list_test) / len(check_list_interval)) * check_list_interval == check_list_test):

            ## If yes, attach the "new" first element of the list to the interval
            ## (that last element was included in str_interval, but is not yet
            ## present in interval)
            interval.append(list[0])

            ## and print the output
            print("your interval is: " + str(interval))
        else:

            ## otherwise, call the function recursively
            check_for_interval(interval,list)

    else:
        ## when the current interval is not found in the remaining elements,
        ## and the source list has been fully iterated (str_test's length
        ## == 0), this also means that we've found our interval
        if len(str_test) == 0:

            ## add the last list element into the interval
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:

            ## In all other cases, again call the function recursively
            check_for_interval(interval,list)

OPTIMIZED CODE ONLY, WITHOUT COMMENTS

def int_to_str_list(source):
    new_str_list = []
    for z in source:
        new_str_list.append(str(z))
    return new_str_list

def check_for_interval(interval,list):
    interval.append(list[0])
    list.pop(0)
    str_interval = int_to_str_list(interval)
    str_interval.append(str(list[0]))
    current_interval = ",".join(str_interval)
    str_test = int_to_str_list(list[1:])
    remaining_elements = ",".join(str_test)
    str_exam = remaining_elements.find(current_interval)
    if str_exam != -1:
        interval_size = len(str_interval)
        remaining_size = len(str_test)
        rem_div_inter = remaining_size / interval_size
        if (interval_size == remaining_size) or (str_exam == 0 and remaining_size % interval_size == 0 and rem_div_inter % 2 == 0 and rem_div_inter * str_interval == str_test):
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:
            check_for_interval(interval,list)
    else:
        if len(str_test) == 0 :
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:
            check_for_interval(interval,list)

To do what you want, simply run your function after initiating []

interval = []
check_for_interval(interval,list)

should work for pretty much any case, delivering you the interval as output.

0
On

The "pure" average periodicity will necessarily be equivalent to the length of the list divided by the count of occurrences of an element.

We can also account for first and last appearances, and use that in our calculation, although this may affect you calculation in ways that you might not want:

from collections import Counter
values = [0, 10, 20, 10, 0, 10, 20, 10, 0]

counts = Counter(values)

periodicities = dict()

r_values = values[::-1]
for k, v in counts.items():
    print(r_values.index(k), values.index(k))
    periodicities[k] = (len(values) - r_values.index(k) - values.index(k) + 1) / v

print(periodicities)

Result:

{
  0: 3.3333333333333335,
  10: 2.0,
  20: 3.0
}
0
On

Minimal version:

a=[0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20]
n=len(a)

# The idea is to compare the repeated subset of the array with the original array
# while keeping the sizes equal

periods = [i for i in range(2,n//2+1) if a[:i]*(n//i)==a[:n - n % i]]

print('Min period=',periods[0], '\n',a[:periods[0]])

Output:

Min period: 4 
 [0, 10, 20, 10]

for-loop version:

Here is the same idea with for-loop just to make it more clear:

a=[0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20]
n = len(a)
periods=[]
for i in range(2, n // 2 + 1): # cycle's max length = 1/2 of sequence 
  m = n // i 
  word = a[:i]
  repeated_word = [a[:i]*m][0]
  same_size_array = a[:len(repeated_word)]
  isCycle = repeated_word == same_size_array
  if isCycle:
    periods.append(i)

  print(
      '%s-char word\t' % i,word,
      '\nRepeated word\t',repeated_word,
      '\nSame size array\t',same_size_array,
      '\nEqual(a Cycle)?\t',isCycle
      ,'\n'
      )
  
period = periods[0] # shortest cycle
print('Min period:',period,'\n',a[:period])

Output (long version):

2-char word  [0, 10] 
Repeated word    [0, 10, 0, 10, 0, 10, 0, 10, 0, 10] 
Same size array  [0, 10, 20, 10, 0, 10, 20, 10, 0, 10] 
Equal(a Cycle)?  False 

3-char word  [0, 10, 20] 
Repeated word    [0, 10, 20, 0, 10, 20, 0, 10, 20] 
Same size array  [0, 10, 20, 10, 0, 10, 20, 10, 0] 
Equal(a Cycle)?  False 

4-char word  [0, 10, 20, 10] 
Repeated word    [0, 10, 20, 10, 0, 10, 20, 10] 
Same size array  [0, 10, 20, 10, 0, 10, 20, 10] 
Equal(a Cycle)?  True 

5-char word  [0, 10, 20, 10, 0] 
Repeated word    [0, 10, 20, 10, 0, 0, 10, 20, 10, 0] 
Same size array  [0, 10, 20, 10, 0, 10, 20, 10, 0, 10] 
Equal(a Cycle)?  False 

Min period: 4 
 [0, 10, 20, 10]
0
On

Note: I'm assuming you're referring to exact periodicity rather than some measure of autocorrelation. E.g., [1, 5, 8, 1, 5, 8.0000000001] would have a period of 6 rather than 3.

This is by no means optimal, but in a pinch anyone can brute force a solution that looks something like the following.

def period(L):
    n = len(L)
    for i in range(1, n):
        if n%i:
            # save a little work if `i` isn't a factor of `n`
            continue

        if all(L[k:k+i]==L[:i] for k in range(0, n, i)):
            # `L` is just `L[:i]*x` for some `x`, and since `i` is
            # increasing this must be the smallest `i` where that
            # is true
            return i
    
    # if no factor suffices, the smallest generator is the entire list
    return n

With a little more effort we can get linear performance rather than quadratic. Optimizing it further is left as an exercise for somebody who isn't me.

def period(L):
    if not L:
        return 0
    
    guess = 1
    for i, x in enumerate(L[1:], 1):
        if x != L[i%guess]:
            """
            We know for certain the period is not `guess`. Moreover, if we've
            gotten this far we've ruled out every option less than `guess`.
            Additionally, no multiple of `guess` suffices because the fact
            that `L` consists of repetitions of width `guess` up till now means
            that `i%(t*guess)!=x` for any `t` so that `t*guess<i`. Interestingly,
            that's the precisely the observation required to conclude
            `guess >= i+1`; there is some positive integer multiple of `guess`
            so that `L[:i+1]` consists of a copy of that width and some number
            of elements that are identical to that copy up to and excluding `x`.
            Since `L[:i+1]` has that structure, no width between that multiple
            of `guess` and `i` can generate `L`. Hence, the answer is at least
            `i+1`.
            """
            guess = i+1
            while len(L)%guess:
                """
                Additionally, the answer must be a factor of `len(L)`. This
                might superficially look quadratic because of the loop-in-a
                -loop aspect to it, but since `1<=guess<=len(L)` and `guess`
                is always increasing we can't possibly run this code more
                than some linear number of times.
                """
                guess += 1
            """
            If we've gotten this far, `guess` is a factor of `L`, and it is
            exactly as wide as all the elements we've seen so far. If we
            continue iterating through `L` and find that it is just a bunch
            of copies of this initial segment then we'll be done. Otherwise,
            we'll find ourselves in this same if-statement and reset our
            `guess` again.
            """
    return guess

If you want all periods, then those are simply every multiple of the minimum period which are also factors of the total length. Supposing you have a way to compute the prime factorization or all positive factors (including 1 and the integer itself) of a positive integer, the following routine can get you those. Actually finding the factors of an integer is probably out of scope and is well-answered elsewhere.

def all_periods(minimum_period, collection_size):
    p, n = minimum_period, collection_size
    if p==0:
        yield = 0
        return
    for f in positive_factors(n / p):
        yield f * p
5
On

Here is one way to approach this problem. Basically, you iterate from 2 to len(lst)//2 + 1 and check if the first n elements matches every next n elements, return n if true. If no match is found, return len(lst)

def get_periodicity(lst):
    t = len(lst)
    for n in range(2, t//2 + 1):
        for p in range(1, t//n):
            if lst[:n] != lst[n*p:n*p+n]:
                break
        else:
            rem = t%n
            if not rem or lst[-rem:] == lst[:rem]:
                return n
    else:
        return t

Tests

>>> get_periodicity([0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20])
4
>>> get_periodicity([1,1,2,1,1,2,1,1,2,1,1,2])
3
>>> get_periodicity([1,1,2,1,1,2,1,1,2,1,1,2,3])
13