Friday, July 25, 2008

CouchDBXapr

OK, so I've been working on a Xapian implementation in Ruby for CouchDB to feed off of. Here's a quick note for anyone attempting the same.

1. On OS X, everything works great: the bindings install to their proper places. All is well with the world.

2. However, in production, on a Gentoo server, running make check reveals that libxapian.so can't be found.

So, to remedy this (and to save anyone else hunting down this fine, obscure error), run make install...

You'll then notice that your Ruby scripts fail on anything that even mentions the word 'xapian'.

Then, simply do the following:

cd /usr/local/lib
cp libxapian.so.15 /usr/lib64/

Et voilà!

You are now ready for some full-text goodness and, if you're like me, some much-needed rest.
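If you're not sure where make install actually put the library, a small script can tell you before you start copying files around. Here's a minimal Ruby sketch; the helper name find_shared_libs and the list of candidate directories are my own assumptions (adjust them for your system), not anything provided by Xapian or its bindings.

```ruby
# Hypothetical helper: list every copy of a shared library (e.g. libxapian.so.15)
# found in a set of candidate directories. The directory list is an assumption;
# adjust it for your system.
DEFAULT_LIB_DIRS = %w[/usr/local/lib /usr/local/lib64 /usr/lib /usr/lib64].freeze

def find_shared_libs(basename, dirs = DEFAULT_LIB_DIRS)
  dirs.flat_map do |dir|
    # Match the bare name plus any versioned suffix:
    # libxapian.so, libxapian.so.15, libxapian.so.15.0.0, ...
    # Dir.glob simply returns [] for directories that don't exist.
    Dir.glob(File.join(dir, "#{basename}.so*"))
  end.sort
end

if __FILE__ == $PROGRAM_NAME
  hits = find_shared_libs('libxapian')
  if hits.empty?
    puts 'libxapian not found in any candidate directory'
  else
    hits.each { |path| puts path }
  end
end
```

Run directly, it prints every libxapian.so* it finds. In a setup like the Gentoo one above, you'd expect it to show up under /usr/local/lib but not /usr/lib64 until you copy it over.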

Thursday, July 10, 2008

Integrating with UrlZen

For my Twitter-like application (mumbl, not yet released), I've decided to let users shorten the long URLs in their mumbls with UrlZen as they type them.

Here's the Ruby implementation; I'll post a follow-up with the jQuery portion. Basically, to get around the browser's cross-site (same-origin) restrictions, the page calls a URL on my own server, which then calls UrlZen.

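require 'mechanize'
require 'cgi'
agent = WWW::Mechanize.new
@url = params[:url]
# Escape the long URL so any & or ? inside it isn't read as one of UrlZen's own parameters,
# then scrape the shortened URL out of the <strong> inside <p class="short_url">
@new_url = agent.get("http://urlzen.com/shorten/?autocopy=true&url=#{CGI.escape(@url)}").search("//p[@class='short_url']//strong").inner_html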
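Because this endpoint forwards whatever the browser sends, it's worth building UrlZen's request URL carefully: the long URL has to be escaped so an & or ? inside it isn't mistaken for part of UrlZen's query string, and it's sensible to refuse anything that isn't an http(s) URL. Here's a minimal sketch using Ruby's standard uri library. The helper name urlzen_request_url and the http/https-only check are my assumptions; the endpoint and the autocopy parameter come from the snippet above.

```ruby
require 'uri'

# Endpoint taken from the Mechanize snippet above.
URLZEN_ENDPOINT = 'http://urlzen.com/shorten/'.freeze

# Hypothetical helper: returns the UrlZen request URL for long_url,
# or nil when long_url is not a parseable http/https URL.
def urlzen_request_url(long_url)
  uri = URI.parse(long_url.to_s.strip)
  # Assumption: only proxy web URLs, so the endpoint can't be used to relay junk.
  return nil unless %w[http https].include?(uri.scheme)

  # encode_www_form percent-escapes each value (":" "/" "?" "=" "&" included).
  query = URI.encode_www_form([['autocopy', 'true'], ['url', uri.to_s]])
  "#{URLZEN_ENDPOINT}?#{query}"
rescue URI::InvalidURIError
  nil
end
```

In the controller action you could then hand urlzen_request_url(params[:url]) to agent.get and return an error to the jQuery side whenever it comes back nil.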

Sunday, July 6, 2008

So what happened (the multi-key CouchDB question)?

So, if any of you have been following along with my fun ride upon the cozy cushions of CouchDB, you'll be happy to hear there is hope (well, some; I had to pick up a book on Erlang).

Anyhow, if you follow the link here, you'll see a list of proposed fixes for my wish to pass multiple keys to a CouchDB view instead of just one. Granted, it could get out of hand: you could pass in 1,000,000 keys. But so what! Either way you have to query for them, whether that's one query after another a million times or one request that somehow lumps them together.

So, as you can see, there were a few options:

1. We could run the query once per key and lump the results together. The downside is that I really don't want Ruby to have to merge all of those results; I can only imagine the paging would get incredibly *blah*. Plus, the memory Ruby would need to merge all of those DM::Collection objects would likely be the equivalent of a Chihuahua passing a peach seed. Not pretty.

2. We could use full-text search via Lucene, which CouchDB already implements. I suppose the only downside would be...??? I'll need to think of something; it's really not a bad idea, and a downside may come to me later. The extent of my full-text search experience has been with MySQL and... oh, hold on, I keep laughing. It sucked. Bad. I really hope CouchDB's is better. I'm sure it is; I haven't seen anything in CouchDB so far that is crap!

3. We (read: me, Bradford) hack at the CouchDB internals so that views accept multiple keys for comparison rather than just one. I'd like to see this happen via something to the effect of:

A single-key query URL:
http://.../_view/foo/bars?key="mykey"

A multi-key query URL:
http://.../_view/foo/bars?key[]="mykey1"&key[]="mykey2"
OR
http://.../_view/foo/bars?keys=["mykey1","mykey2","mykey3"]

I'd prefer something along the lines of form-control arrays (key[]); that way you could just inspect the parameter itself to see whether it's an array or a string and handle it accordingly. Adding a separate keys parameter, by contrast, might mean repeating the key-handling logic throughout the code (not likely, but the former makes the most sense to me).
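To make the key[] versus keys comparison concrete, here's a minimal Ruby sketch of the normalization step: whichever form arrives, the handler ends up with one array of keys. It only illustrates the parameter logic in Ruby, not CouchDB's Erlang internals; the function name requested_keys is hypothetical, and it assumes each value is JSON, like the existing ?key= parameter.

```ruby
require 'json'
require 'uri'

# Hypothetical sketch: accept any of
#   key="a"      key[]="a"&key[]="b"      keys=["a","b"]
# and normalize them to a single Ruby array of keys (in request order).
# Assumption: every value is JSON-encoded, as with CouchDB's ?key= parameter.
def requested_keys(query_string)
  keys = []
  URI.decode_www_form(query_string).each do |name, value|
    case name
    when 'key', 'key[]'
      # One JSON value per occurrence; a lone key= is just a one-element list.
      keys << JSON.parse(value)
    when 'keys'
      parsed = JSON.parse(value)
      raise ArgumentError, 'keys must be a JSON array' unless parsed.is_a?(Array)
      keys.concat(parsed)
    end
    # Any other parameter (limit, descending, ...) is ignored here.
  end
  keys
end
```

For example, requested_keys('key[]="mykey1"&key[]="mykey2"') and requested_keys('keys=["mykey1","mykey2"]') both return ["mykey1", "mykey2"], so everything after this step only ever deals with an array.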

I'd like to extend a warm welcome to anyone willing to walk through CouchDB's internals with me to implement this functionality; I promise to keep you in line as long as you return the favor. As I said, I'm JUST NOW learning Erlang and am horribly underqualified, but hey, I feel strongly about this functionality, so what the hell, right?

Bring on the opinions!

Thursday, July 3, 2008

It's all about document design with CouchDB

Hello everyone. I've been stalking the mailing list for a while and thought this might be worthy of a post, since I was asked to solve it and couldn't!

Let's take the classic blog example document with the following fields/values:

{
  "_id": "1f2fc3955b91aed5e7369f0b0ba8214e",
  "_rev": "1226709986",
  "Author": "Bradford",
  "Type": "Post",
  "Body": "Just mentioning this for a sample blog post.",
  "PostedDate": "2008-07-02T23:22:12-04:00",
  "Subject": "My Fine Blog Post",
  "Tags": ["octopus","hockey","squidward","bradford","recreation"]
}

Next, I'd like to find every blog post that contains ANY of the following tags: ["octopus","hockey"]. Generally speaking, this isn't so bad. We could write a simple view:

function (doc) {
  // Field names are case-sensitive: this document stores its tags under "Tags"
  if (doc.Type == 'Post' && doc.Tags) {
    for (var i = 0; i < doc.Tags.length; i++) {
      emit(doc.Tags[i], doc);
    }
  }
}

That gives us each of our tags as a key, yeah? But we can only query it one key at a time. So how does one go about supplying a range or array (not sure what we'd call it here) of keys to search on? http://...?key=["octopus","hockey"] maybe? I'm unsure of the plan of attack for such a thing. Maybe I'm just going about it in the wrong direction. Any thoughts?
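One workaround is to do one lookup per tag and merge the results on the client. Here's a minimal in-memory Ruby sketch of what that merge has to do; tag_view_rows and posts_with_any_tag are hypothetical names, and a plain array stands in for both the view index and the HTTP calls. Note the de-duplication step: a post tagged both octopus and hockey is emitted once per tag, so without it that post would come back twice.

```ruby
# In-memory stand-in for the view above (hypothetical, for illustration):
# emit one [tag, doc] row per tag of every Post, sorted by key the way a
# view index is ordered (ties broken by _id).
def tag_view_rows(docs)
  rows = docs.flat_map do |doc|
    next [] unless doc['Type'] == 'Post'
    Array(doc['Tags']).map { |tag| [tag, doc] }
  end
  rows.sort_by { |tag, doc| [tag, doc['_id']] }
end

# One lookup per requested key, then merge, keeping each post only once
# even if it matched several of the requested tags.
def posts_with_any_tag(rows, wanted_tags)
  matches = wanted_tags.flat_map do |tag|
    rows.select { |key, _doc| key == tag }.map { |_key, doc| doc }
  end
  matches.uniq { |doc| doc['_id'] }
end
```

With the sample post above, posts_with_any_tag(tag_view_rows(docs), ["octopus", "hockey"]) returns that post once, even though the view emits a row for it under both tags.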