Sunday, July 6, 2008

So what happened (the multi-key CouchDB question)?

So, if any of you have been following along with my fun ride upon the cozy cushions of CouchDB, you'll be happy to hear there is hope (well, some; I had to pick up a book on Erlang).

Anyhow, if you follow the link here, you will see that there was a list of proposed fixes for my desire to pass multiple keys to CouchDB instead of just one. Granted, yes, it could get out of hand: you could pass in 1,000,000 keys. But so what? Either way you're going to have to query for them, whether that's one query after another a billion times over or somehow lumping them all together.

So as you can see there were a few options:

1. We could execute our query once per key and lump the results together. The downside is that I really don't want Ruby to have to merge all of those results; I can only imagine the paging for it would get incredibly *blah*. Plus, the memory usage for Ruby to merge all of those DM::Collection objects together would likely be the equivalent of a Chihuahua passing a peach seed. Not pretty.

2. We could use full-text search via Lucene, which CouchDB already has implemented. I suppose the only downside of that would be...??? I'll need to think of something; it's really not a bad idea, and a downside may come to me later. The extent of my full-text search experience has been with MySQL and... oh, hold on, I keep laughing. It sucked. Bad. I really hope CouchDB's is better. I'm sure it is; I haven't seen anything so far that is crap!

3. We (read: me {Bradford}) hack at the CouchDB internals and allow views to accept multiple keys for comparison rather than just one. I'd like to see this happen via something to the effect of:

A single-key query URL:
http://.../_view/foo/bars?key="mykey"

A multi-key query URL:
http://.../_view/foo/bars?key[]="mykey1"&key[]="mykey2"
OR
http://.../_view/foo/bars?keys=["mykey1","mykey2","mykey3"]

I'd like to keep it along the lines of form-control arrays (key[]); that way you could just analyze the parameter itself to see what it is (an array or a string) and do the appropriate thing with it. Adding a separate keys parameter, on the other hand, might mean its handling has to be repeated throughout the code (not likely, but the former makes the most sense to me).
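The URL styles proposed in option 3 could be handled along these lines. This is a minimal Python sketch, not CouchDB code: `key[]` and `keys` are only the proposals above, and every function name here is hypothetical. It follows CouchDB's convention that a view's `key` parameter is a JSON value (hence the quotes in `key="mykey"`), and it shows the "look at the parameter and see what it is" idea by normalizing all three forms to a plain list of keys.

```python
import json
from urllib.parse import parse_qs, urlencode


def single_key_query(key):
    # View keys are JSON values, so a string key keeps its quotes: key="mykey"
    return urlencode({"key": json.dumps(key)})


def array_style_query(keys):
    # Proposed (hypothetical) form-array style: key[]="k1"&key[]="k2"
    return urlencode([("key[]", json.dumps(k)) for k in keys])


def keys_style_query(keys):
    # Proposed (hypothetical) single-parameter style: keys=["k1","k2"]
    return urlencode({"keys": json.dumps(keys)})


def requested_keys(query):
    """Normalize any of the three query styles to a list of keys."""
    params = parse_qs(query)  # decodes names and values, e.g. 'key%5B%5D' -> 'key[]'
    if "key[]" in params:
        # Repeated form-array parameter: each value is its own JSON key
        return [json.loads(v) for v in params["key[]"]]
    if "keys" in params:
        # One parameter holding a JSON array of keys
        return json.loads(params["keys"][0])
    if "key" in params:
        # Plain single-key request
        return [json.loads(params["key"][0])]
    return []


if __name__ == "__main__":
    print(single_key_query("mykey"))
    print(requested_keys(array_style_query(["mykey1", "mykey2"])))
```

Either multi-key style ends up as the same list, so the view code behind it would only need one code path; the difference is purely in how the request spells it.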
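To make the cost of the first option concrete, here is a minimal Python sketch of querying once per key and lumping the results together on the client. The in-memory `VIEW` rows and the `fetch_rows` function are hypothetical stand-ins for real view requests, and the sort assumes plain string keys (CouchDB's own view collation is richer than Python's ordering). What it illustrates: every row for every key has to be fetched and held in memory before the first page can be cut.

```python
from typing import Callable, Dict, List

# A view row, shaped like {"id": ..., "key": ..., "value": ...}
Row = Dict[str, object]


def query_each_key(keys: List[str], fetch_rows: Callable[[str], List[Row]]) -> List[Row]:
    """Option 1: run the view query once per key and lump the results together."""
    merged: List[Row] = []
    for key in keys:
        merged.extend(fetch_rows(key))  # one round trip per key
    # View rows come back sorted by key (ties broken by document id),
    # so after merging, the client has to restore that order itself.
    merged.sort(key=lambda row: (row["key"], row["id"]))
    return merged


def page(rows: List[Row], page_number: int, per_page: int) -> List[Row]:
    """Paging only works once all rows for all keys are already in memory."""
    start = page_number * per_page
    return rows[start:start + per_page]


# Hypothetical in-memory stand-in for a view; a real client would make HTTP requests.
VIEW: List[Row] = [
    {"id": "1", "key": "a", "value": None},
    {"id": "2", "key": "b", "value": None},
    {"id": "3", "key": "a", "value": None},
    {"id": "4", "key": "c", "value": None},
]


def fetch_rows(key: str) -> List[Row]:
    return [row for row in VIEW if row["key"] == key]


if __name__ == "__main__":
    rows = query_each_key(["b", "a"], fetch_rows)
    print([row["id"] for row in page(rows, 0, 2)])  # ['1', '3']
    print([row["id"] for row in page(rows, 1, 2)])  # ['2']
```

With a handful of keys this is harmless; with the 1,000,000-key case it means a million round trips and one giant merged list before page one exists, which is the Chihuahua-and-peach-seed problem in code form.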

I'd like to extend a warm welcome to anyone willing to walk the lines of CouchDB's internals with me in order to implement such functionality. I promise to keep you in line as long as you return the favor. As I said, I'm JUST NOW learning Erlang and am horribly underqualified, but hey, I really feel strongly about this functionality, so what the hell, right?

Bring on the opinions!

1 comment:

JanL said...

Heya Bradford,
Maybe the -dev list or #couchdb are better places to ask for help :-)

I might be able to give you some hints about where to start.

Cheers
Jan
--