I have an assignment in a course on the MapReduce algorithm, so I built an ETS table in Erlang from a big data file, and I would like to work on it concurrently. The table turned out to be very big, and I would like to know whether there is a way to split it into a few smaller tables so that I can search it concurrently using MapReduce. Is there any way to split one big table into sub-tables? Thanks.
Split ETS in Erlang
Asked by Alon Rolnik

I have worked on an intranet app in which I had to keep things in RAM most of the time. I created a stable caching library which helped me abstract the ETS mechanisms. In this library, I create two worker gen_servers whose job is to create, own and expose methods for ETS tables. I named them cache1 and cache2. These two keep transferring ownership to each other in a redundant fashion, so that the tables survive if one of them runs into a problem. Get the application: http://www.4shared.com/zip/z_VgKLpa/cache-10.html
Just unzip it, use the Emakefile to re-compile it, and then put it into your Erlang lib directory.
To see how it works, here is a shell interaction.
    F:\programming work\cache-1.0>erl -pa ebin
    Eshell V5.9 (abort with ^G)
    1> application:start(cache).
    ok
    2> rd(student,{name,age,sex}).
    student
    3> cache_server:new(student,set,2).
    ok
    4> cache_server:write(#student{name = "Muzaaya Joshua", sex = "Male",age = (2012 - 1987) }).
    ok
    5> cache_server:write(student,[#student{name = "Joe",sex = "Male"}, #student{name = "Mike",sex = "Male"}]).
    ok
    6> cache_server:read({student,"Muzaaya Joshua"}).
    [#student{name = "Muzaaya Joshua",age = 25,sex = "Male"}]
    7> cache_server:read({student,"Joe"}).
    [#student{name = "Joe",age = undefined,sex = "Male"}]
    8> cache_server:get_tables().
    [{cache1,[student]},{cache2,[]}]
    9> rd(class,{class,no_of_students}).
    class
    10> cache_server:get_tables().
    [{cache1,[student]},{cache2,[]}]
    11> cache_server:new(class,set,2).
    ok
    12> cache_server:get_tables().
    [{cache1,[student]},{cache2,[class]}]
    13> cache_server:write(class,[ #class{class = "Primary " ++ integer_to_list(N), no_of_students = random:uniform(50)} || N <- lists:seq(1,7)]) .
    ok
    14> cache_server:read({class,"Primary 6"}).
    [#class{class = "Primary 6",no_of_students = 30}]
    15> cache_server:delete({class,"Primary 2"}).
    ok
    16> cache_server:get_cache_state().
    [{server_state,cache1,1,[student]},
     {server_state,cache2,1,[class]}]
    17> rd(food,{name,type,value}).
    food
    18> cache_server:new(food,set,2).
    ok
    19> cache_server:write(food,[#food{name = "Orange", type = "fruit",value = "Vitamin C"}]).
    ok
    20> cache_server:get_cache_state().
    [{server_state,cache1,2,[food,student]},
     {server_state,cache2,1,[class]}]
    21>

Now, to understand the importance of ets:give_away/3, let's see what happens when either cache1 or cache2 crashes. Remember that the current server state (which shows the current owner of each table) is:

    21> cache_server:get_cache_state().
    [{server_state,cache1,2,[food,student]},
     {server_state,cache2,1,[class]}]
    22>

Let me crash cache1 and see what happens.

    22> gen_server:cast(cache1,stop).
    ok
    Cache Server: cache2 has taken over table: food from server: cache1
    23>
    Cache Server: cache2 has taken over table: student from server: cache1
    23> cache_server:get_cache_state().
    [{server_state,cache1,0,[]},
     {server_state,cache2,3,[student,food,class]}]
    24>

And likewise the other one:

    24> gen_server:cast(cache2,stop).
    ok
    Cache Server: cache1 has taken over table: student from server: cache2
    25>
    Cache Server: cache1 has taken over table: food from server: cache2
    25>
    Cache Server: cache1 has taken over table: class from server: cache2
    25> cache_server:get_cache_state().
    [{server_state,cache1,3,[class,food,student]},
     {server_state,cache2,0,[]}]
    26>

That's it! You could use the concepts in the source code to create something of your own. The ETS tables created by that library are public and named, so you can directly access them using ETS functions.
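The ownership hand-over that the transcript relies on can be sketched in isolation. This is not the code of the cache library above, only a minimal illustration of ets:give_away/3 under assumed names: the module give_away_sketch, the function demo/0 and the gift term handover are hypothetical.

```erlang
-module(give_away_sketch).
-export([demo/0]).

%% Hypothetical demo: a short-lived process creates a table and hands it
%% to the caller with ets:give_away/3 just before it exits.
demo() ->
    Parent = self(),
    spawn(fun() ->
        Tab = ets:new(student, [set, public]),
        true = ets:insert(Tab, {"Joe", "Male"}),
        %% The new owner receives {'ETS-TRANSFER', Tab, FromPid, GiftData}.
        true = ets:give_away(Tab, Parent, handover)
    end),
    receive
        {'ETS-TRANSFER', Tab, _From, handover} ->
            %% The table outlives the process that created it.
            Owner = ets:info(Tab, owner),
            Rows = ets:lookup(Tab, "Joe"),
            true = ets:delete(Tab),
            {Owner =:= self(), Rows}
    after 1000 ->
        timeout
    end.
```

Calling give_away_sketch:demo() from the shell should show that the caller now owns the table and can still read the row inserted by the process that has since exited.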
You can already search an ETS table concurrently, without any need to split it:
http://www.erlang.org/doc/man/ets.html#new_2_read_concurrency
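As a rough sketch of what that could look like in a map-and-reduce style, the example below creates one public table with the read_concurrency option and lets several processes each read a different key range, with the caller combining the partial results. The module name, the table contents and the four key ranges are assumptions made up for the illustration.

```erlang
-module(concurrent_scan_sketch).
-export([demo/0]).

%% Hypothetical demo: one shared table, four reader processes ("map"),
%% and a final sum in the calling process ("reduce").
demo() ->
    Tab = ets:new(big_table, [set, public, {read_concurrency, true}]),
    true = ets:insert(Tab, [{N, N * N} || N <- lists:seq(1, 1000)]),
    Parent = self(),
    Ranges = [{1, 250}, {251, 500}, {501, 750}, {751, 1000}],
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    %% Each worker reads only the keys in its own range.
                    Part = lists:sum([V || K <- lists:seq(Lo, Hi),
                                           {_, V} <- ets:lookup(Tab, K)]),
                    Parent ! {Ref, Part}
                end),
                Ref
            end || {Lo, Hi} <- Ranges],
    %% Collect one partial result per worker, then combine them.
    Total = lists:sum([receive {Ref, Part} -> Part end || Ref <- Refs]),
    true = ets:delete(Tab),
    Total.
```

The table is never split here; the work is partitioned by key range instead, and all workers read the same table at once.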
If the table is large, I would recommend you use a good match pattern to help reduce the search size: http://www.erlang.org/doc/man/ets.html#select-2
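A minimal sketch of such a match pattern with ets:select/2 follows. The tuple layout {Name, Age, Sex}, the sample rows and the module name are assumptions for illustration only; the match specification keeps rows whose age is above 20 and returns just the name.

```erlang
-module(select_sketch).
-export([demo/0]).

%% Hypothetical demo of filtering inside ETS instead of copying every row out.
demo() ->
    Tab = ets:new(student, [set, public]),
    true = ets:insert(Tab, [{"Joe", 19, "Male"},
                            {"Ann", 25, "Female"},
                            {"Mike", 31, "Male"}]),
    %% Head: bind Name to '$1' and Age to '$2', ignore the third field.
    %% Guard: Age > 20. Body: return only the name.
    MatchSpec = [{{'$1', '$2', '_'}, [{'>', '$2', 20}], ['$1']}],
    %% A set table has no defined order, so sort for a stable result.
    Names = lists:sort(ets:select(Tab, MatchSpec)),
    true = ets:delete(Tab),
    Names.
```

Because the guard is evaluated inside ETS, only the matching names are copied into the calling process.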