In "Ruby - Compare two Enumerators elegantly", it was said:
"The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with length of input params"
I had a look at the implementation of Enumerable#zip in YARV, and saw:
    static VALUE
    enum_zip(int argc, VALUE *argv, VALUE obj)
    {
        int i;
        ID conv;
        NODE *memo;
        VALUE result = Qnil;
        VALUE args = rb_ary_new4(argc, argv);
        int allary = TRUE;

        argv = RARRAY_PTR(args);
        for (i=0; i<argc; i++) {
            VALUE ary = rb_check_array_type(argv[i]);
            if (NIL_P(ary)) {
                allary = FALSE;
                break;
            }
            argv[i] = ary;
        }
        if (!allary) {
            CONST_ID(conv, "to_enum");
            for (i=0; i<argc; i++) {
                argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
            }
        }
        if (!rb_block_given_p()) {
            result = rb_ary_new();
        }
        /* use NODE_DOT2 as memo(v, v, -) */
        memo = rb_node_newnode(NODE_DOT2, result, args, 0);
        rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

        return result;
    }
Am I understanding the following bits correctly?
Check whether all of the arguments are arrays, and if so, replace some indirect reference to the array with a direct reference
    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }
If they aren't all arrays, create an enumerator instead
    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }
Create an array of arrays only if a block isn't given
    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }
If everything is an array, use zip_ary, otherwise use zip_i, and call a block on each set of values
    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);
Return an array of arrays if no block is given, else return nil (Qnil)?
        return result;
    }
I'll be using 1.9.2-p0 as that's what I have on hand.
The rb_check_array_type function is a thin wrapper around rb_check_convert_type, asking for a conversion to Array via to_ary. Note that rb_check_convert_type in turn makes a convert_type call. This looks a lot like a C version of Array.try_convert, and the C implementation of try_convert does indeed just call rb_check_array_type. So, yes, the first loop is looking for anything in argv that is not an array (or convertible to one with to_ary) and clearing the allary flag if it finds such a thing.

In enum.c, id_each is set up as an internal reference for the Ruby each iterator method. And rb_funcall, from vm_eval.c, calls the named method on a receiver with the given number of arguments. So the rb_funcall(argv[i], conv, 1, ID2SYM(id_each)) line is calling to_enum (with, essentially, the default argument) on whatever is in argv[i].

So, the end result of the first for and if blocks is that argv is either full of arrays or full of enumerators rather than possibly being a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip function will wrap arrays in enumerators (which is essentially free or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (prefer arrays over enumerators); I'll leave that as an exercise for the reader or historians.

The next part, the rb_block_given_p check, creates a new empty array and leaves it in result if zip is being called without a block. And here we should note what zip returns: an array of arrays when there is no block, nil when there is one. If there is a block, then there is nothing to return and result can stay as Qnil; if there isn't a block, then we need an array in result so that an array can be returned.

From parse.c, we see that NODE_DOT2 is a double-dot range but it looks like they're just using the new node as a simple three element struct; rb_node_newnode just allocates an object, sets some bits (nd_set_type is just a bit fiddling macro), and assigns three values in a struct. So memo is just a three element struct, and this use of NODE_DOT2 appears to be a convenient kludge.

The rb_block_call function appears to be the core internal iterator. And we see our friend id_each again so we'll be doing an each iteration. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result. The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.

At this point we've done the zipping and we either have the array of arrays in result (if there was no block) or we have Qnil in result (if there was a block).

Executive Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls will only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.
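
To see the summary in action from the Ruby side, here is a minimal sketch. The naturals enumerator and the variable names are made up for illustration, and I'm assuming a reasonably recent MRI rather than 1.9.2-p0 specifically. The argument is an infinite enumerator, so it could never be expanded into an array; zip only pulls as many values from it as the receiver supplies.

```ruby
# Hypothetical example data: an infinite enumerator that could never be
# expanded into an array.
naturals = Enumerator.new do |yielder|
  n = 0
  loop { yielder << (n += 1) }
end

# Range has no zip of its own, so this goes through Enumerable#zip.
# Block form: each tuple is handed to the block and zip returns nil.
seen = []
block_result = (1..3).zip(naturals) { |a, b| seen << [a, b] }

# Non-block form: the array of arrays has to be built as the return value.
array_result = (1..3).zip(naturals)

p seen          # [[1, 1], [2, 2], [3, 3]]
p block_result  # nil
p array_result  # [[1, 1], [2, 2], [3, 3]]
```

In the block form the only arrays that should exist are the small per-iteration tuples passed to the block; the array of arrays appears only when you ask for it by leaving the block off.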