Note: See the updates at the bottom of this article.
First, a little teaser (in the D programming language):
// testInputVisitor.d // Requires DMD compiler v2.059 or up import std.algorithm; import std.range; import std.stdio; import libInputVisitor; struct Foo { string[] data = ["hello", "world"]; void visit(InputVisitor!(Foo, string) v) { v.yield("a"); v.yield("b"); foreach(str; data) v.yield(str); } void visit(InputVisitor!(Foo, int) v) { v.yield(1); v.yield(2); v.yield(3); } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.inputVisitor!string) writeln(item); // Prints: 1, 2, 3 foreach(item; foo.inputVisitor!int) writeln(item); // It's a range! Prints: 10, 30 auto myRange = foo.inputVisitor!int; foreach(item; myRange.filter!( x => x!=2 )().map!( x => x*10 )()) writeln(item); }
Rewinding back to the start, here's our original problem: We want to iterate over a collection.
Ever since the D1 days, D has had foreach's parter, opApply, as the standard way to do this:
// testOpApply.d import std.stdio; struct Foo { string[] data = ["hello", "world"]; int opApply(int delegate(string) dg) { int result=0; result = dg("a"); if(result) return result; result = dg("b"); if(result) return result; foreach(str; data) { result = dg(str); if(result) break; } return result; } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo) writeln(item); }
The logic within opApply is mostly natural and straightforward, but it is complicated a bit by dealing with that result variable.
Later on, D2 came along and introduced ranges - D's clean, powerful answer to C++/STL's iterators. The concept of ranges and their motivations and benefits are well explained in Andrei Alexandrescu's classic article On Iteration (and further information is in this article), so I won't go into that.
Here's an input range version of the above examples:
// testInputRange.d import std.stdio; struct Foo { string[] data = ["hello", "world"]; @property auto inputRange() { struct FooRange { private Foo foo; private size_t index=0; @property bool empty() { return index >= foo.data.length + 2; } @property string front() { if(index == 0) return "a"; if(index == 1) return "b"; return foo.data[index-2]; } void popFront() { index++; } } return FooRange(this); } } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.inputRange) writeln(item); }
Unfortunately, that's more complex than the old opApply style, but it provides some very good benefits, particularly when used together with Phobos's std.algorithm and std.range.
But if all you need is basic foreach-style input range capabilities, that can be a lot of boilerplate. Plus, it doesn't lend itself to complex internal logic quite as naturally as opApply does. And opApply itself could stand to be easier to write.
If you've looked carefully, you may have noticed that opApply is essentially a poor man's coroutine. That's why the flow of logic is more natural in opApply than in the range version. For those unfamiliar with coroutines, see this old-but-excellent set of slides and the accompanying video. Here's the upshot: Coroutines are like a combination of traditional iterators and the visitor pattern - combining the best of both worlds.
D offers coroutines via Fiber. Looking at the docs, one of the two main ways to use Fiber is by subclassing it. Hmm...a class...easy coroutine iteration...What if we designed a Fiber subclass that was also a range?
As it turns out, fiber coroutines and input ranges make a fantastic combination that combines the power and flexibility of an input range with opApply's natural flow of logic, but without the boilerplate of opApply's messy result variable. You've already seen in the first example how easy it is to use this InputVisitor, as I call it. The source is very short, and available in the file libInputVisitor.d.
There is one small enhancement we can make to the original example, though. Compared to the opApply version, the foreach when using InputVisitor is a little more verbose:
foreach(item; foo.inputVisitor!string)
We can trim that down by adding one member to the Foo struct:
... struct Foo { ... @property auto visitor() { return this.inputVisitor!string; } ... } void main() { Foo foo; // Prints: a, b, hello, world foreach(item; foo.visitor) writeln(item); ... }
Another member can also be added to handle the int version.
The downside of this is that it doesn't lend itself very well - if even at all - to any ranges more powerful than the basic input range. Execution of a function doesn't flow in reverse, so bidirectional ranges are out. And certainly no random access. Forward range might be possible in theory, but it would depend on being able to clone a halted fiber - I don't know whether that's possible.
Beware, I haven't done any benchmarks on this, so I have no idea how much of an overhead this InputVisitor might have over the opApply or input range versions. Could be a lot, could be minimal - I don't know.
UPDATE (2012-05-01): Changed libInputVisitor license to WTFPL. For something this trivial, even the super-permissive zlib/libpng license is overkill.
UPDATE (2012-05-10): This D newsgroup thread revealed that the InputVisitor approach does in fact have a non-trivial performance overhead. Oh well :(
But Adam Ruppe pointed out a nice trick for improving the syntax of the opApply version. His approach, used like mixin(yield("someVar"));, isn't quite as nice looking as v.yield(someVar); and doesn't result in an actual range (you can only foreach over it). But when you just need a basic generator, I think it provides the best balance between syntax and speed.
4 comments for "Combine Coroutines and Input Ranges for Dead-Simple D Iteration"
There are some good comments on this over on the D newsgroup:
http://forum.dlang.org/thread/jno6o5$qtb$1@digitalmars.com
This is really cool, I browsed on over from that recent thread about how VB/C# has iterators ("yield return") but D doesn't. While Fibers may not be super fast, they are definitely more powerful than C#/VB "yield". Also, I recently compared stack switching to async/await, and I'm convinced that Fibers are a slightly more powerful abstraction:
http://qscribble.blogspot.ca/2012/07/asyncawait-vs-stack-switching.html
Since async/await is THE new feature of C# 5.0, the fact that you can already accomplish the very same things in D, albeit with a minor performance hit, seems like a good excuse to celebrate. The only problem is, someday when .NET becomes a standard back-end for D, that back-end will not support fibers (that's why C# needs async/await in the first place).
However, I have a bone to pick with this library. The caller in main() must call "foo.inputVisitor" instead of "foo.visit" which I found very confusing at first. Even more confusing, it relies on calling an "eponymous template" via UFCS. I mean, I'm somewhat used to seeing UFCS being used to call plain-old functions, but in this case I had to re-read section 7.5.1 of TDPL to refresh my memory about "eponymous templates"... good thing InputVisitor is a simple module, or I would have been totally lost. Might I suggest the following simpler definition instead?
InputVisitor!(Obj, Elem) inputVisitor(Elem, Obj)(Obj obj)
{
return new InputVisitor!(Obj, Elem)(obj);
}
To clearly connect the producer and consumer, I wonder if there is some way that the consumer (call site) could use the same name as the producer (callee)...
Lastly, I wonder why fiber-switching is slowish. I thought I'd try to understand the process in a debugger, but I can't seem to step into standard library functions... you wouldn't happen to know how I could get a debug build of Phobos that I could step into with VisualD?
"Since async/await is THE new feature of C# 5.0"
Interesting. I've been pretty out of the C# loop for a while (uhh, no puns intended), so this is the first I've heard of async/await.
"The caller in main() must call "foo.inputVisitor" instead of "foo.visit" which I found very confusing at first."
Yea, I don't know of any way to get around that though. There needs to be the one function "visit" to yield the values, but then you still need something else ("inputVisitor") to actually return the range.
I guess maybe you could overload the functions, name them both "visit", but then you have two "visit" functions that do very different things. Plus you'd be calling visit with no arguments, but then it executes a "visit" *with* one param. And you'd have to remember not to directly call the "visit" that takes an arg. So that could be confusing, too. (But maybe less confusing?)
"Lastly, I wonder why fiber-switching is slowish."
My understanding is it's because it has to save state, which is slower then merely incrementing a counter as in an ordinary for loop, or calling/returning-from a function as with opApply or non-fiberous ranges.
"you wouldn't happen to know how I could get a debug build of Phobos that I could step into with VisualD?"
You should be able to recompile DMD/Phobos/Druntime all in debug with DVM ( https://bitbucket.org/doob/dvm ):
>dvm compile --debug [path to either git or release sources of dmd/phobos/druntime]
I don't whether that's enough to step through it in VisualD though, I haven't tried. Although actually, on some platforms (forget what) that might not do *all* three DMD/Phobos/Druntime in debug, because some of the makefiles are kinda shitty/inconsistent and don't offer a debug target.
Without DVM, recompiling druntime/phobos is pretty easy. First make sure your PATH is set up so "make" invokes the make from DMC. And if it's a git clone of phobos/druntime, then do:
>cd druntime
>make -fwin32.mak DFLAGS=flags-to-pass-to-dmd
>cd ..\phobos
>make -fwin32.mak DRUNTIME=..\druntime DFLAGS=flags-to-pass-to-dmd
>copy phobos.lib {wherever}
The testInputVisitor is now on GitHub and DUB:
GitHub: https://github.com/Abscissa/libInputVisitor
DUB: http://code.dlang.org/packages/libinputvisitor