Sparse matrix from list in R


Hi, I have a file with the following structure:

    > df
    LATITUDE1 LONGITUDE1 LATITUDE2 LONGITUDE2   X   V    Y   W  Cell1  Cell2
1      -71.2       -180   -71.344     178.97 -72 -72 -180 178 -26100 -25742
2      -71.0       -180   -71.300     177.70 -71 -72 -180 177 -25740 -25743
3      -70.8       -180   -71.300     177.70 -71 -72 -180 177 -25740 -25743
4      -70.6       -180   -71.444     174.30 -71 -72 -180 174 -25740 -25746
5      -70.4       -180   -71.040     175.76 -71 -72 -180 175 -25740 -25745
6      -70.2       -180   -70.499     176.33 -71 -71 -180 176 -25740 -25384
7      -70.0       -180   -70.350     177.03 -70 -71 -180 177 -25380 -25383
8      -69.8       -180   -70.995     176.40 -70 -71 -180 176 -25380 -25384
9      -69.6       -180   -71.309     171.87 -70 -72 -180 171 -25380 -25749
10     -69.4       -180   -71.015     171.42 -70 -72 -180 171 -25380 -25749

I have some R-code that summarizes non-zero transition probabilities from Cell1-levels to Cell2-levels:

counts <- by(df, df$Cell1, function(d) c(table(d$Cell2)/nrow(d)))

> counts
df$Cell1: -26100
-25742 -25743 -25746 -25745 -25384 -25383 -25749 
     1      0      0      0      0      0      0 
------------------------------------------------------------ 
df$Cell1: -25740
-25742 -25743 -25746 -25745 -25384 -25383 -25749 
   0.0    0.4    0.2    0.2    0.2    0.0    0.0 
------------------------------------------------------------ 
df$Cell1: -25380
-25742 -25743 -25746 -25745 -25384 -25383 -25749 
  0.00   0.00   0.00   0.00   0.25   0.25   0.50 

I would like to build a sparse matrix of transition probabilities (zero and non-zero) from this list. Since my list elements are of unequal length, this is rather difficult. I have tried do.call, but the result is not acceptable, since I would have to look up every Cell level "manually" and determine whether or not it should be zero.

> do.call(rbind, counts)
-25746 -25745 -25743 -25384
-26100    1.0   1.00   1.00    1.0
-25740    0.2   0.20   0.40    0.2
-25380    0.5   0.25   0.25    0.5

Thank you.

EDIT: Using akrin's code below, I get a matrix of the form:

do.call(rbind, counts)
       -25742 -25743 -25746 -25745 -25384 -25383 -25749
-26100      1    0.0    0.0    0.0   0.00   0.00    0.0
-25740      0    0.4    0.2    0.2   0.20   0.00    0.0
-25380      0    0.0    0.0    0.0   0.25   0.25    0.5

I am expecting results of the form

    A    B    C    D
A  aa    0   ac    0
B  ba   bb    0   bd
C   0   cb    0    0
D   0   db    0    0

There is 1 best solution below


The table function creates one entry per level when given factors.
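As a minimal illustration of that behaviour, the sketch below uses a hypothetical toy vector `x` (not part of the question's data) and compares `table()` with and without explicit factor levels:

```r
# Hypothetical toy data: the value "b" never occurs in x.
x <- c("a", "a", "c")

# Without factor levels, table() only reports the values it actually sees.
seen <- table(x)

# With explicit levels, every level gets an entry, including zero counts.
full <- table(factor(x, levels = c("a", "b", "c")))

print(names(seen))      # "a" "c"
print(as.vector(full))  # 2 0 1
```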

If I understood correctly, this is what you want:

df <- read.table(text="    LATITUDE1 LONGITUDE1 LATITUDE2 LONGITUDE2   X   V    Y   W  Cell1  Cell2
1      -71.2       -180   -71.344     178.97 -72 -72 -180 178 -26100 -25742
2      -71.0       -180   -71.300     177.70 -71 -72 -180 177 -25740 -25743
3      -70.8       -180   -71.300     177.70 -71 -72 -180 177 -25740 -25743
4      -70.6       -180   -71.444     174.30 -71 -72 -180 174 -25740 -25746
5      -70.4       -180   -71.040     175.76 -71 -72 -180 175 -25740 -25745
6      -70.2       -180   -70.499     176.33 -71 -71 -180 176 -25740 -25384
7      -70.0       -180   -70.350     177.03 -70 -71 -180 177 -25380 -25383
8      -69.8       -180   -70.995     176.40 -70 -71 -180 176 -25380 -25384
9      -69.6       -180   -71.309     171.87 -70 -72 -180 171 -25380 -25749
10     -69.4       -180   -71.015     171.42 -70 -72 -180 171 -25380 -25749")

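# Use one shared set of levels so that rows and columns cover the same cells.
levels <- unique(c(df$Cell1, df$Cell2))
df$Cell1 <- factor(df$Cell1, levels=levels)
df$Cell2 <- factor(df$Cell2, levels=levels)
# Cross-tabulate: one row and one column per level, zeros included.
t <- table(df$Cell1, df$Cell2)

library(Matrix)
mat <- Matrix(t, sparse=TRUE)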

This yields:

> t

         -26100 -25740 -25380 -25742 -25743 -25746 -25745 -25384 -25383 -25749
  -26100      0      0      0      1      0      0      0      0      0      0
  -25740      0      0      0      0      2      1      1      1      0      0
  -25380      0      0      0      0      0      0      0      1      1      2
  -25742      0      0      0      0      0      0      0      0      0      0
  -25743      0      0      0      0      0      0      0      0      0      0
  -25746      0      0      0      0      0      0      0      0      0      0
  -25745      0      0      0      0      0      0      0      0      0      0
  -25384      0      0      0      0      0      0      0      0      0      0
  -25383      0      0      0      0      0      0      0      0      0      0
  -25749      0      0      0      0      0      0      0      0      0      0

If you know that the cells lie in a fixed range, e.g. between -30000 and 30000, you can simply set levels=-30000:30000.
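With a wide range of levels, a dense table can become very large (about 60,000 levels per side means roughly 3.6 billion cells), so it may be preferable to skip the dense step altogether. The following is a sketch of one possible alternative, not the method used above: it builds the sparse count matrix directly with `sparseMatrix()` from the Matrix package. The vectors `cell1` and `cell2` are copied from the question's data, and the range in `cells` is an assumption chosen to keep the example small.

```r
library(Matrix)

# Cell pairs copied from the question's data (Cell1 and Cell2 columns).
cell1 <- c(-26100, -25740, -25740, -25740, -25740, -25740,
           -25380, -25380, -25380, -25380)
cell2 <- c(-25742, -25743, -25743, -25746, -25745, -25384,
           -25383, -25384, -25749, -25749)

# Assumed range of cell ids; much narrower than -30000:30000 for the demo.
cells <- -26100:-25380

# match() maps each cell id to its row/column position in `cells`.
i <- match(cell1, cells)
j <- match(cell2, cells)

# sparseMatrix() adds up the x values of duplicated (i, j) pairs,
# so one unit per observation yields transition counts.
counts <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                       dims = c(length(cells), length(cells)),
                       dimnames = list(as.character(cells),
                                       as.character(cells)))

print(counts["-25740", "-25743"])  # 2
```

Only the observed transitions are stored, so memory use grows with the number of distinct cell pairs rather than with the square of the number of levels.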

EDIT: If you want the probabilities, just normalize the rows or use prop.table to do it.

t <- prop.table(table(df$Cell1, df$Cell2), margin=1)

But you end up with NaN (from 0/0) on the rows with no entries. You should normalize the rows yourself or, if you prefer the quick and dirty way, set t[is.nan(t)] <- 0.

After converting the result with Matrix(t, sparse=TRUE) again, you end up with:

> mat
10 x 10 sparse Matrix of class "dtCMatrix"
   [[ suppressing 10 column names ‘-26100’, ‘-25740’, ‘-25380’ ... ]]

-26100 . . . 1 .   .   .   .    .    .  
-25740 . . . . 0.4 0.2 0.2 0.20 .    .  
-25380 . . . . .   .   .   0.25 0.25 0.5
-25742 . . . . .   .   .   .    .    .  
-25743 . . . . .   .   .   .    .    .  
-25746 . . . . .   .   .   .    .    .  
-25745 . . . . .   .   .   .    .    .  
-25384 . . . . .   .   .   .    .    .  
-25383 . . . . .   .   .   .    .    .  
-25749 . . . . .   .   .   .    .    .
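The suggestion to normalize the rows yourself could look roughly like the sketch below. It uses a hypothetical 3-state count matrix (the states "a", "b", "c" are made up for illustration) in which state "c" has no outgoing transitions, and it divides each row by its sum while leaving empty rows at zero, so no NaN is ever produced.

```r
# Hypothetical transition counts; state "c" has no outgoing transitions.
counts <- matrix(c(0, 2, 2,
                   1, 0, 0,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

rs <- rowSums(counts)

# Dividing a matrix by a vector of length nrow() works row by row.
# Empty rows are divided by 1 instead of 0, so they stay all zero.
probs <- counts / ifelse(rs == 0, 1, rs)

print(probs)
```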