How to calculate the Zipf exponent in R?

The generalized Zipf's law states that, if we rank a collection of n objects in non-increasing order of size, the product of a power of the rank and the size of each object is constant throughout the collection, i.e.

$$ r^{\alpha} \, z_r = \text{constant} $$

where r is the rank, z_r is the size of the r-th object, and alpha is Zipf's parameter.

I would like to estimate the exponent of the power law that describes my data, i.e. Zipf's parameter (exponent). My data are the following:

> dput(df)
structure(list(x = c(1.06936486607035, 1.3232662468642, 1.57716762765805, 
1.83106900845189, 2.08497038924574, 2.33887177003959, 2.59277315083344, 
2.84667453162729, 3.10057591242114, 3.35447729321498, 3.60837867400883, 
3.86228005480268, 4.11618143559653, 4.37008281639038, 4.62398419718422, 
4.87788557797807, 5.13178695877192, 5.38568833956577, 5.63958972035962, 
5.89349110115347, 6.14739248194731, 6.40129386274116, 6.65519524353501, 
6.90909662432886, 7.16299800512271, 7.41689938591655, 7.6708007667104, 
7.92470214750425, 8.1786035282981, 8.43250490909195, 8.6864062898858, 
8.94030767067964, 9.19420905147349, 9.44811043226734, 9.70201181306119, 
9.95591319385504, 10.2098145746489, 10.4637159554427, 10.7176173362366, 
10.9715187170304, 11.2254200978243, 11.4793214786181, 11.733222859412, 
11.9871242402058, 12.2410256209997, 12.4949270017935, 12.7488283825874, 
13.0027297633812, 13.2566311441751, 13.5105325249689, 13.7644339057628, 
14.0183352865566, 14.2722366673505, 14.5261380481443, 14.7800394289382, 
15.033940809732, 15.2878421905258, 15.5417435713197, 15.7956449521135, 
16.0495463329074, 16.3034477137012, 16.5573490944951, 16.8112504752889, 
17.0651518560828, 17.3190532368766, 17.5729546176705, 17.8268559984643, 
18.0807573792582, 18.334658760052, 18.5885601408459, 18.8424615216397, 
19.0963629024336, 19.3502642832274, 19.6041656640213, 19.8580670448151, 
20.111968425609, 20.3658698064028, 20.6197711871967, 20.8736725679905, 
21.1275739487844, 21.3814753295782, 21.6353767103721, 21.8892780911659, 
22.1431794719598, 22.3970808527536, 22.6509822335474, 22.9048836143413, 
23.1587849951351, 23.412686375929, 23.6665877567228, 23.9204891375167, 
24.1743905183105, 24.4282918991044, 24.6821932798982, 24.9360946606921
), y = c(-2.97228886692625, -2.95440976170107, -2.93928459279152, 
-2.92685672250007, -2.91707897563357, -2.91054871731668, -2.90679861996743, 
-2.90554785065139, -2.90675006309313, -2.91036572966993, -2.91696470816554, 
-2.92597057051316, -2.93718053039632, -2.95054999876795, -2.96603736909913, 
-2.98406085693689, -3.00405379487858, -3.02588740495999, -3.04950848046858, 
-3.07486235427239, -3.10210692287855, -3.13082120061712, -3.16091945841148, 
-3.19233074728207, -3.2249788128355, -3.25869455640463, -3.29332682179158, 
-3.32879100108009, -3.36499680219032, -3.40182490231023, -3.43885206667123, 
-3.47620809318544, -3.51379996912, -3.55153068719991, -3.58922700390204, 
-3.62638735300239, -3.66333893302836, -3.70000206447245, -3.73629766644354, 
-3.77202286263472, -3.80675861286812, -3.84092561898948, -3.87447795824782, 
-3.90737403696004, -3.93941841852314, -3.97035436941129, -4.00059707307105, 
-4.03013619484928, -4.05896597861451, -4.0869186255246, -4.11388702659758, 
-4.14021632809744, -4.16591776523316, -4.19100561781447, -4.21534097913824, 
-4.23891623603497, -4.26199110836985, -4.2845881782092, -4.30673264230625, 
-4.32831990641116, -4.34940948783001, -4.37019317123469, -4.39070825537989, 
-4.41099561113224, -4.43100956980122, -4.45086204575797, -4.47069657988101, 
-4.49057219155847, -4.51055207455314, -4.53067990556232, -4.55108865901739, 
-4.57188026595389, -4.59313206537618, -4.61492481492843, -4.63739606090163, 
-4.6606424296565, -4.68472597512036, -4.70972674413206, -4.73572647040504, 
-4.76290834240927, -4.79128541380379, -4.820888083087, -4.85177836410324, 
-4.88401718152641, -4.91772243314579, -4.95283585285162, -4.98936501364757, 
-5.02733146115584, -5.0667505747377, -5.10751161205962, -5.14958042252788, 
-5.19293189917589, -5.23752155483956, -5.28329086087404, -5.3297251199846
)), row.names = c(NA, -95L), class = c("tbl_df", "tbl", "data.frame"
))

They result from a kernel density estimate of the degree distribution of a network (i.e. the x axis is the degree and the y axis is the logarithm of the number of nodes with that degree).

How can I estimate Zipf's exponent from this dataset?

There is 1 best solution below

You can check out the gamlss package, which provides functions to fit a Zipf distribution (and several variants of it).

https://cran.r-project.org/web/packages/gamlss/gamlss.pdf (pg. 47)

# install.packages('gamlss')
library(gamlss)

gamlss(
   formula = ...,
   data = ...,
   family = ZIPF(mu.link = 'log')
)

But I don't really understand your data. You said the x axis is the degree, so why are the values not integers? Is your network a valued network?

Also, you said the y axis is the logarithm of the number of nodes. Since all y values are negative, that would imply node counts between 0 and 1, which doesn't really make sense.
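
If the data can be brought into rank/size form, the definition in the question can also be fitted directly: taking logs of "rank^alpha times size = constant" gives a straight line in log-log space whose slope is minus alpha. Below is a minimal base-R sketch of that idea on synthetic data. The values `alpha_true`, `const`, the noise level and all variable names are assumptions made up for illustration; they are not taken from the question's data.

```r
# Synthetic rank-size data that roughly follows r^alpha * z_r = constant.
# All numbers here are hypothetical and only serve to illustrate the fit.
set.seed(1)
alpha_true <- 1.2                      # exponent used to generate the data
const      <- 500                      # constant on the right-hand side
r <- 1:200                             # ranks
z <- const * r^(-alpha_true) * exp(rnorm(length(r), sd = 0.05))  # noisy sizes

# log(z_r) = log(const) - alpha * log(r), so alpha is minus the slope
# of an ordinary least-squares line fitted in log-log space.
fit <- lm(log(z) ~ log(r))

alpha_hat <- -unname(coef(fit)["log(r)"])         # estimated Zipf exponent
const_hat <- exp(unname(coef(fit)["(Intercept)"]))  # estimated constant

alpha_hat
```

For the data frame in the question, where `y` is already on a log scale, the analogous call would be `lm(y ~ log(x), data = df)`. Whether that slope can be read as a Zipf exponent depends on the open questions above about what `x` and `y` actually represent, and a least-squares fit in log-log space is only a rough estimator compared with fitting the distribution itself.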