Interrogating the results of the Markov simulation - Help and feedback highly appreciated

117 Views Asked by At

I have built a Markov chain with which I can simulate the daily routine of people (activity patterns). Each simulation day is divided into 144-time steps and the person can carry out one of fourteen activities. Those are: Away - work (1) Away - leisure (2) Away - shopping (3) Sleeping (4) Cooking (5) Using dishwasher (6) Doing laundry (7) Vacuuming (8) Watching TV (9) Using PC (10) Personal hygiene (11) Ironing (12) Listening to music (13) Other (14)

I calculated the transition probabilities for the Markov chain from a data set containing a total of 226 diaries of persons who documented their activities in a 10-minute interval. The data set can be found here: https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0

I have now simulated 4000 days with the Markov chain and want to validate the results with the original data set.

For this purpose, I analyse the results for the activities individually and calculate the expected value as well as the 95% confidence interval for each time step. Then I check if the mean value for the activity from the original dataset lies within this interval and calculate the number of all points that do not lie within the upper or lower threshold of the confidence interval.

However, for each simulation I get deviations of different heights, sometimes in the 1% range, sometimes more than 10%.

I wonder now how this can be possible and whether it can be possible at all.

I have simulated 4x 4000 days and the results are:

Graphical results for first simulation

Results for the simulation

The first part of the code is here (sorry, but its too long):

# import of libraries

# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.

data= data(Dataframe vom dataset see link above)

# in case activity is not observed in time step t, return 0 instead divide by 0

def a1(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        return 0

#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))

#iterate through each time step and calculate probabilities for each activity (from state j to state m)

#save matrix as 3D-array
h=data
for i in range(0, 143):
    for j in range(1, 15):
        for m in range(1, 15):
            a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
            matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)

# now calculate starting probabilities for time step 0 (initial start activity)

matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
    for j in range(1, 15):
        a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
        matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)


# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)


# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS


# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]

# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]


def activity_forecast(simulation_days):
    # activitypattern is an empty array that stores the sequence of states in the simulation.
    activitypattern = []
    for m in range(0,simulation_days):
        # for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated

        # pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday). 

        # Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)

        pro = einp_wi_week

        #Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
        activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])



        # create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array

        activityList = [activityToday]

        #since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
        for i in range(0,143):
            #create sequence of activities
            if activityToday == "1":
                change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
                if change == "11":
                    activityList.append("1")
                    pass
                elif change == "12":
                    activityToday = "2"
                    activityList.append("2")
                elif change == "13":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "14":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "15":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "16":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "17":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "18":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "19":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "0110":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "0111":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "0112":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "0113":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            elif activityToday == "2":
                change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])           
                if change == "22":
                    activityList.append("2")
                    pass
                elif change == "21":
                    activityToday = "1"
                    activityList.append("1")
                elif change == "23":
                    activityToday = "3"
                    activityList.append("3")
                elif change == "24":
                    activityToday = "4"
                    activityList.append("4")
                elif change == "25":
                    activityToday = "5"
                    activityList.append("5")
                elif change == "26":
                    activityToday = "6"
                    activityList.append("6")
                elif change == "27":
                    activityToday = "7"
                    activityList.append("7")
                elif change == "28":
                    activityToday = "8"
                    activityList.append("8")
                elif change == "29":
                    activityToday = "9"
                    activityList.append("9")
                elif change == "210":
                    activityToday = "10"
                    activityList.append("10")
                elif change == "211":
                    activityToday = "11"
                    activityList.append("11")
                elif change == "212":
                    activityToday = "12"
                    activityList.append("12")
                elif change == "213":
                    activityToday = "13"
                    activityList.append("13")
                else:
                    activityToday = "14"
                    activityList.append("14")
            [code is too long - code needs to be written for each activity (1-14)]

        activitypattern.append(activityList)
    # save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
    df=pd.DataFrame(activitypattern)
    df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)

#call function
activity_forecast(4000)

I am happy for any advice/feedback.

Thanks, Felix

1

There are 1 best solutions below

0
On

Here is more code :)

# Activity patterns have been saved

# Now, use original data and calculate the probability for each time step

k=data
matrix_original_one_person_wi_week= np.empty(shape=(14, 144))
for i in range (0,144):
    for j in range (1,15):
        a=a1(len(k[(k.iloc[:,i]==j)]),len(k.iloc[:,i]))
        matrix_original_one_person_wi_week[j-1,i]=a

#create dataframe
matrix_df_activities_original_one_person_winter_week=pd.DataFrame(matrix_original_one_person_wi_week)
print(matrix_df_activities_original_one_person_winter_week)


# Activity profiles have been generated for [4000] simulation days and stored as csv file.

# loading data
activities_one_person_winter_week_4000_days=pd.read_csv("XXX", delimiter=";")

# load timesteps csv (10-min-resolution array) to create array with 10-min time steps as x-axis
timesteps = pd.read_csv("XXX", delimiter=";", header=None)
timesteps_array = timesteps[0].values.tolist()

# now transform to dataframe
df_activities_one_person_winter_week_4000_days=pd.DataFrame(activities_one_person_winter_week_4000_days)

# upper and lower threshold for the 95% confidence interval are stored within a single array.
confidence_upper = []
confidence_lower = []

# now iterate through each time step to get the mean value and the upper and lower threshold

# this is done for each activity

# for each column of the dataframe, change values of activities to 1 (activity observed) or 0 (other activity observed)

# for changing the activity for which the values should be calculated, just change the the first value within np.where (a==)

for i in range(0,144):
    a = df_activities_one_person_winter_week_4000_days.iloc[:,i]

    #look for specifc value (first number) and replace it with (1) the rest ist (0), activity starts with 1, not zero!
    count = np.where(a == 1, 1, 0)
    mean = count.mean()

    #confidence interval is 0.95
    confidence = 0.95

    # number of observations
    n = len(count)

    #standard error
    std_err = stats.sem(count)

    # calculate value for lower and upper threshold
    threshold_value = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)

    # add / subtract threshold value from mean value mean
    upper_threshold = mean + threshold_value
    lower_threshold = mean - threshold_value

    #append values to array
    confidence_upper.append(upper_threshold*100)
    confidence_lower.append(lower_threshold*100)

#now we need to calculate the mean value for each activity for each time step from the original data

# just change the first value in iloc for the activity you want to have the probabilities for (its always "a" (activity number defined in np.where) - 1)

original_data_value=np.array(matrix_df_activities_original_one_person_winter_week.iloc[0,:])   




# Now count the frequency how many data points of the original distributiona are outside of the confidence interval

# first multiply value by 100 in order to compare (since other arrays are in XX% format)
#original_data_value_percentage=[]
#for i in range(0,144):
    #a=original_data_value[i]
    #original_data_value_percentage.append(a)
#original_data_value_percentage

# now create array where all values that are lower/upper of confidence interval are stored
count_frequency=[]

#iterate through each time step
for i in range(0,144):
    if (original_data_value[i]*100)<confidence_lower[i]:
        count_frequency.append(original_data_value[i]*100)
        print("lower",i, (original_data_value[i]*100),confidence_lower[i])
    elif (original_data_value[i]*100)>confidence_upper[i]:
        count_frequency.append(original_data_value[i]*100)
        print("upper", i, (original_data_value[i]*100),confidence_upper[i])
    else:
        pass

print((len(count_frequency)/144)*100)