In "Ruby - Compare two Enumerators elegantly", it was said:
"The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with length of input params"
I had a look at the implementation of Enumerable#zip in YARV, and saw:
    static VALUE
    enum_zip(int argc, VALUE *argv, VALUE obj)
    {
        int i;
        ID conv;
        NODE *memo;
        VALUE result = Qnil;
        VALUE args = rb_ary_new4(argc, argv);
        int allary = TRUE;
        argv = RARRAY_PTR(args);
        for (i=0; i<argc; i++) {
            VALUE ary = rb_check_array_type(argv[i]);
            if (NIL_P(ary)) {
                allary = FALSE;
                break;
            }
            argv[i] = ary;
        }
        if (!allary) {
            CONST_ID(conv, "to_enum");
            for (i=0; i<argc; i++) {
                argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
            }
        }
        if (!rb_block_given_p()) {
            result = rb_ary_new();
        }
        /* use NODE_DOT2 as memo(v, v, -) */
        memo = rb_node_newnode(NODE_DOT2, result, args, 0);
        rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);
        return result;
    }
Am I understanding the following bits correctly?
Check whether all of the arguments are arrays, and if so, replace some indirect reference to the array with a direct reference
    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }
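One way to check this reading from Ruby is Array.try_convert, which returns an array for array-like objects (converting through to_ary) and nil for everything else. This is only a sketch of the observable behaviour, not of the C code itself, and the ArrayLike class is hypothetical, introduced purely for illustration:

```ruby
# Hypothetical class: not an Array, but convertible through to_ary.
class ArrayLike
  def to_ary
    [1, 2, 3]
  end
end

real_array = Array.try_convert([4, 5])          # already an Array    => [4, 5]
converted  = Array.try_convert(ArrayLike.new)   # goes through to_ary => [1, 2, 3]
not_array  = Array.try_convert(1..3)            # Range has no to_ary => nil
```

A nil result here corresponds to the NIL_P(ary) branch, which is the case that turns allary off.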
If they aren't all arrays, create an enumerator instead
    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }
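In Ruby terms, the rb_funcall line appears to amount to calling to_enum(:each) on every argument. A rough sketch of that step follows; wrap_in_enumerators is a hypothetical helper name, not something that exists in Ruby or in enum.c:

```ruby
# Hypothetical helper: wrap every argument in an external enumerator
# over its each method, as the loop above seems to do.
def wrap_in_enumerators(args)
  args.map { |arg| arg.to_enum(:each) }
end

enums  = wrap_in_enumerators([[1, 2], 3..4])
first  = enums.map(&:next)   # one element pulled from each => [1, 3]
second = enums.map(&:next)   # the next element from each   => [2, 4]
```

Each wrapped argument is an Enumerator, so elements are pulled one at a time with next, and next raises StopIteration once an argument is exhausted.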
Create an array of arrays only if a block isn't given
    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }
If everything is an array, use zip_ary, otherwise use zip_i, and call a block on each set of values
    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);
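Which of the two C callbacks runs is not visible from Ruby, but the results of both paths can be compared. The sketch below assumes that an Array argument takes the zip_ary path and an Enumerator argument takes the zip_i path, as the source above suggests. zip_pairs is a hypothetical helper, and the receiver is a Range so that Enumerable#zip is used (Array defines its own zip):

```ruby
# Hypothetical helper: collect every tuple the block receives.
def zip_pairs(receiver, other)
  pairs = []
  receiver.zip(other) { |a, b| pairs << [a, b] }
  pairs
end

all_arrays = zip_pairs(1..3, [10, 20])        # => [[1, 10], [2, 20], [3, nil]]
with_enum  = zip_pairs(1..3, [10, 20].each)   # => [[1, 10], [2, 20], [3, nil]]
```

The block is called once per element of the receiver, and an argument that runs out early contributes nil in both cases.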
Return an array of arrays if no block is given, else return nil (Qnil)?
        return result;
    }
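The two return values are easy to confirm from Ruby. A minimal check, again using a Range receiver so that Enumerable#zip is the method being called:

```ruby
# Without a block, zip builds and returns the array of arrays.
without_block = (1..3).zip(4..6)                   # => [[1, 4], [2, 5], [3, 6]]

# With a block, each tuple is yielded and zip itself returns nil.
with_block = (1..3).zip(4..6) { |pair| pair }      # => nil
```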
I'll be using 1.9.2-p0 as that's what I have on hand.
The rb_check_array_type function looks like this:
And rb_check_convert_type looks like this:
Note the convert_type call. This looks a lot like a C version of Array.try_convert, and try_convert just happens to look like this:
So, yes, the first loop is looking for anything in argv that is not an array and setting the allary flag to FALSE if it finds such a thing.
In enum.c, we see this:
So id_each is an internal reference for the Ruby each iterator method. And in vm_eval.c, we have this:
So this:
Is calling to_enum (with, essentially, the default argument) on whatever is in argv[i].
So, the end result of the first for loop and the if block is that argv is either full of arrays or full of enumerators rather than possibly being a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip function will wrap arrays in enumerators (which is essentially free or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (prefer arrays over enumerators); I'll leave that as an exercise for the reader or historians.
The next part:
Creates a new empty array and leaves it in result if zip is being called without a block. And here we should note what zip returns:
If there is a block, then there is nothing to return and result can stay as Qnil; if there isn't a block, then we need an array in result so that an array can be returned.
From parse.c, we see that NODE_DOT2 is a double-dot range, but it looks like they're just using the new node as a simple three-element struct; rb_node_newnode just allocates an object, sets some bits, and assigns three values in a struct:
nd_set_type is just a bit-fiddling macro. Now we have memo as just a three-element struct. This use of NODE_DOT2 appears to be a convenient kludge.
The rb_block_call function appears to be the core internal iterator. And we see our friend id_each again, so we'll be doing an each iteration. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result (or passed to the block, if there is one). The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.
At this point we've done the zipping and we either have the array of arrays in result (if there was no block) or we have Qnil in result (if there was a block).
Executive Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls will only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.
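The summary can be checked with a source that never ends: if the block form expanded its argument into an array first, the call below would never return. This is a sketch under that assumption only; counting_source and zip_first_two are hypothetical helpers written for this illustration:

```ruby
# Hypothetical helper: an endless counter that logs every value it produces.
def counting_source(log)
  Enumerator.new do |yielder|
    n = 0
    loop do
      n += 1
      log << n        # record the value before handing it out
      yielder << n
    end
  end
end

# Hypothetical helper: zip a short range against the endless source,
# stopping after two tuples, and report what was seen and what was pulled.
def zip_first_two
  log  = []
  seen = []
  (1..5).zip(counting_source(log)) do |a, b|
    seen << [a, b]
    break if seen.size == 2
  end
  [seen, log]
end

seen, log = zip_first_two
# seen => [[1, 1], [2, 2]]
# log  => [1, 2]   (only two values were ever pulled from the source)
```

The log shows that the enumerator was advanced one element per tuple, which is consistent with the block form never materialising the argument as an array.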