I have built a Markov chain with which I can simulate the daily routine of people (activity patterns). Each simulation day is divided into 144-time steps and the person can carry out one of fourteen activities. Those are: Away - work (1) Away - leisure (2) Away - shopping (3) Sleeping (4) Cooking (5) Using dishwasher (6) Doing laundry (7) Vacuuming (8) Watching TV (9) Using PC (10) Personal hygiene (11) Ironing (12) Listening to music (13) Other (14)
I calculated the transition probabilities for the Markov chain from a data set containing a total of 226 diaries of persons who documented their activities in a 10-minute interval. The data set can be found here: https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0
I have now simulated 4000 days with the Markov chain and want to validate the results with the original data set.
For this purpose, I analyse the results for the activities individually and calculate the expected value as well as the 95% confidence interval for each time step. Then I check if the mean value for the activity from the original dataset lies within this interval and calculate the number of all points that do not lie within the upper or lower threshold of the confidence interval.
However, for each simulation I get deviations of different heights, sometimes in the 1% range, sometimes more than 10%.
I wonder now how this can be possible and whether it can be possible at all.
I have simulated 4x 4000 days and the results are:
The first part of the code is here (sorry, but its too long):
# import of libraries
# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.
data= data(Dataframe vom dataset see link above)
# in case activity is not observed in time step t, return 0 instead divide by 0
def a1(x, y):
try:
return x / y
except ZeroDivisionError:
return 0
#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))
#iterate through each time step and calculate probabilities for each activity (from state j to state m)
#save matrix as 3D-array
h=data
for i in range(0, 143):
for j in range(1, 15):
for m in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)
# now calculate starting probabilities for time step 0 (initial start activity)
matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
for j in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)
# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)
# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS
# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]
# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]
def activity_forecast(simulation_days):
# activitypattern is an empty array that stores the sequence of states in the simulation.
activitypattern = []
for m in range(0,simulation_days):
# for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated
# pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday).
# Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)
pro = einp_wi_week
#Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])
# create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array
activityList = [activityToday]
#since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
for i in range(0,143):
#create sequence of activities
if activityToday == "1":
change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
if change == "11":
activityList.append("1")
pass
elif change == "12":
activityToday = "2"
activityList.append("2")
elif change == "13":
activityToday = "3"
activityList.append("3")
elif change == "14":
activityToday = "4"
activityList.append("4")
elif change == "15":
activityToday = "5"
activityList.append("5")
elif change == "16":
activityToday = "6"
activityList.append("6")
elif change == "17":
activityToday = "7"
activityList.append("7")
elif change == "18":
activityToday = "8"
activityList.append("8")
elif change == "19":
activityToday = "9"
activityList.append("9")
elif change == "0110":
activityToday = "10"
activityList.append("10")
elif change == "0111":
activityToday = "11"
activityList.append("11")
elif change == "0112":
activityToday = "12"
activityList.append("12")
elif change == "0113":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
elif activityToday == "2":
change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])
if change == "22":
activityList.append("2")
pass
elif change == "21":
activityToday = "1"
activityList.append("1")
elif change == "23":
activityToday = "3"
activityList.append("3")
elif change == "24":
activityToday = "4"
activityList.append("4")
elif change == "25":
activityToday = "5"
activityList.append("5")
elif change == "26":
activityToday = "6"
activityList.append("6")
elif change == "27":
activityToday = "7"
activityList.append("7")
elif change == "28":
activityToday = "8"
activityList.append("8")
elif change == "29":
activityToday = "9"
activityList.append("9")
elif change == "210":
activityToday = "10"
activityList.append("10")
elif change == "211":
activityToday = "11"
activityList.append("11")
elif change == "212":
activityToday = "12"
activityList.append("12")
elif change == "213":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
[code is too long - code needs to be written for each activity (1-14)]
activitypattern.append(activityList)
# save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
df=pd.DataFrame(activitypattern)
df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)
#call function
activity_forecast(4000)
I am happy for any advice/feedback.
Thanks, Felix
Here is more code :)