Split an ETS table in Erlang


I have a course assignment that uses the MapReduce algorithm, so I built an ETS table in Erlang from a large data file and would like to work on it concurrently. The table turned out to be very big, and I would like to know whether I can split it into a few smaller tables so that I can search them concurrently using MapReduce. Is there any way to split one big table into sub-tables? Thanks.


There are 2 best solutions below

0
On

You can already search an ETS table concurrently without splitting it. Creating the table with the read_concurrency option optimizes it for concurrent readers:

http://www.erlang.org/doc/man/ets.html#new_2_read_concurrency

If the table is large, I would recommend using a selective match pattern with ets:select/2 to reduce the amount of data each search has to return: http://www.erlang.org/doc/man/ets.html#select-2
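As a rough illustration of both suggestions, here is a minimal sketch that creates one table with read_concurrency enabled and lets several processes search it at the same time with ets:select/2. The module name, the table contents, and the helper functions new_table/1 and search/2 are made up for this example; only the ets calls are real.

```erlang
-module(ets_concurrent_search).
-export([demo/0, search/2]).

%% Build one table tuned for concurrent reads and fill it with
%% {Key, Value} pairs (hypothetical sample data: Value = Key rem 10).
new_table(N) ->
    Tab = ets:new(big_table, [set, public, {read_concurrency, true}]),
    true = ets:insert(Tab, [{K, K rem 10} || K <- lists:seq(1, N)]),
    Tab.

%% Return the keys whose value equals Wanted. The match specification
%% filters inside ETS, so only matching keys are copied to the caller.
search(Tab, Wanted) ->
    MatchSpec = [{{'$1', '$2'},
                  [{'=:=', '$2', {const, Wanted}}],
                  ['$1']}],
    ets:select(Tab, MatchSpec).

%% "Map" step: one process per wanted value searches the same table.
%% "Reduce" step: the parent collects how many keys each search found.
demo() ->
    Tab = new_table(1000),
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), V, search(Tab, V)} end)
            || V <- [0, 1, 2]],
    Results = [receive {Pid, V, Keys} -> {V, length(Keys)} end
               || Pid <- Pids],
    true = ets:delete(Tab),
    Results.
```

Calling ets_concurrent_search:demo() returns one {Value, Count} pair per searching process. Because every process reads the same table, no splitting step is needed before the searches can run in parallel.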

0
On

I worked on an intranet app that had to keep most of its data in RAM, so I wrote a stable caching library that abstracts the ETS mechanisms. In this library I create two worker gen_servers whose job is to create, own, and expose functions for ETS tables. I named them cache1 and cache2. Each one takes over the other's tables if its peer runs into a problem, so the tables survive the crash of their owner. Get the application: http://www.4shared.com/zip/z_VgKLpa/cache-10.html Just unzip it, use the Emake file to recompile it, and then put it into your Erlang lib directory.

To see how it works, here is a shell interaction.

F:\programming work\cache-1.0>erl -pa ebin
Eshell V5.9  (abort with ^G)
1> application:start(cache).
ok
2> rd(student,{name,age,sex}).
student
3> cache_server:new(student,set,2).
ok
4> cache_server:write(#student{name = "Muzaaya Joshua",
                        sex = "Male",age = (2012 - 1987) }).
ok
5> cache_server:write(student,[#student{name = "Joe",sex = "Male"},
                #student{name = "Mike",sex = "Male"}]).
ok
6> cache_server:read({student,"Muzaaya Joshua"}).
[#student{name = "Muzaaya Joshua",age = 25,sex = "Male"}]
7> cache_server:read({student,"Joe"}).
[#student{name = "Joe",age = undefined,sex = "Male"}]
8> cache_server:get_tables().
[{cache1,[student]},{cache2,[]}]
9> rd(class,{class,no_of_students}).
class
10> cache_server:get_tables().
[{cache1,[student]},{cache2,[]}]
11> cache_server:new(class,set,2).
ok
12> cache_server:get_tables().
[{cache1,[student]},{cache2,[class]}]
13> cache_server:write(class,[
        #class{class = "Primary " ++ integer_to_list(N),
        no_of_students = random:uniform(50)} || N <- lists:seq(1,7)])
.
ok
14> cache_server:read({class,"Primary 6"}).
[#class{class = "Primary 6",no_of_students = 30}]
15> cache_server:delete({class,"Primary 2"}).
ok
16> cache_server:get_cache_state().
[{server_state,cache1,1,[student]},
 {server_state,cache2,1,[class]}]
17> rd(food,{name,type,value}).
food
18> cache_server:new(food,set,2).
ok
19> cache_server:write(food,[#food{name = "Orange",
                        type = "fruit",value = "Vitamin C"}]).
ok
20> cache_server:get_cache_state().
[{server_state,cache1,2,[food,student]},
 {server_state,cache2,1,[class]}]
21>
Now, to understand the importance of ets:give_away/3, let's see what happens when either cache1 or cache2 crashes. Remember that the current server state (which shows the current owner of each table) is:
21> cache_server:get_cache_state().
[{server_state,cache1,2,[food,student]},
 {server_state,cache2,1,[class]}]
22>
Let me crash cache1 and see what happens.
22> gen_server:cast(cache1,stop).
ok
        Cache Server: cache2 has taken over table: food from server: cache1
23>
        Cache Server: cache2 has taken over table: student from server: cache1
23> cache_server:get_cache_state().
[{server_state,cache1,0,[]},
 {server_state,cache2,3,[student,food,class]}]
24>
And likewise the other one:
24> gen_server:cast(cache2,stop).
ok
        Cache Server: cache1 has taken over table: student from server: cache2
25>
        Cache Server: cache1 has taken over table: food from server: cache2
25>
        Cache Server: cache1 has taken over table: class from server: cache2
25> cache_server:get_cache_state().
[{server_state,cache1,3,[class,food,student]},
 {server_state,cache2,0,[]}]
26>
That's it! You could use the concepts in the source code to create something of your own. The ETS tables created by that library are public and named, so you can access them directly using ETS functions.
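For anyone who wants to try the ownership hand-over without the library, here is a minimal sketch of ets:give_away/3 on its own. It is not the library's code: the module name, the sample row, and the handover gift term are made up for this example, and a real implementation would do this from inside its gen_servers.

```erlang
-module(ets_handover).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% Hypothetical second owner: it waits for the transfer message that
    %% ets:give_away/3 sends, reports what it sees, and stays alive until
    %% told to stop (the table is deleted when its owner exits).
    NewOwner = spawn(fun() ->
        receive
            {'ETS-TRANSFER', Tab, FromPid, GiftData} ->
                IsOwner = ets:info(Tab, owner) =:= self(),
                Rows = ets:lookup(Tab, joe),
                Parent ! {taken_over, IsOwner, FromPid, GiftData, Rows},
                receive stop -> ok end
        end
    end),
    %% The calling process creates the table and is its first owner.
    Tab = ets:new(student, [set, public]),
    true = ets:insert(Tab, {joe, "Male"}),
    %% Hand the table over; the data stays in place, only the owner changes.
    true = ets:give_away(Tab, NewOwner, handover),
    receive
        {taken_over, IsOwner, From, Gift, Rows} ->
            NewOwner ! stop,
            {IsOwner, From =:= self(), Gift, Rows}
    after 1000 ->
        timeout
    end.
```

The new owner receives an 'ETS-TRANSFER' message containing the table, the previous owner's pid, and the gift term, after which it owns the table and can still read every row that was inserted before the hand-over.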