Wednesday, June 9, 2010

Dynamically Aggregated, Custom-Filtered Deal Alert Feed

I originally wrote this for the Pipes message board, but it has a 4,000-character cutoff. That's right; this is going to take a long time, so you may want to get some snacks. (Edit: I see now that the old forums have been completely wiped and replaced, so it's a good thing I posted the article here.)

I hate Best Buy. Like a lot of folks, I've been a big fan of gadgets and computer equipment for around 15 years. But since it's an expensive hobby, I just can't pay anything close to retail for the things I want. Somehow, Best Buy has gone from my ideal browsing place to transcending MSRP into the yeah-you'll-pay-that-much-and-like-it stratosphere. As a result, I can't last more than a few minutes in there anymore.

So, when I first found Yahoo! Pipes, I immediately knew what I wanted to do with it: make a big aggregated deals feed. Apparently, everyone else had the same bright idea, since there seem to be approximately 27 million pipes that are just deal feeds.

However, the allure of aggregated feeds quickly fades as some of the most common problems rear their heads. Before a new deals pipe is even finished, it's being ignored by everyone, including its owner. In short, here are the biggest problems I've seen with my own attempts at making aggregated deals feeds:
Deluge of Content (Too many deals to parse if you don't check it every half hour)
Dead-pooled Content (Deal feeds notoriously disappear into 404s, so maintaining your list of feeds becomes too much hassle, and eventually you're left with a trickle of deals, if you're even paying attention to them anymore.)
Diverse Content (You can't find anything you're specifically interested in when you're drowning in everything else.)
Dynamic Interests (Even if you did it right and filtered your deals, the filters need to be frequently swapped.)
Duplicate Content (A lot of resharing of the same deals)

I've run into these problems in many different forms when trying to make deal pipes for myself. I had basically given up on using Pipes to solve the deals problem, not because it was impossible, but because it always ended in frustrating output.

My solution at the time was to make a Google Custom Search engine for the deal sites I knew of, which at least let me specify what I was looking for. I wouldn't get notified of anything in Reader, but at least I could narrow the results down to what I was shopping for. I sent what I called the Strictly Deals search to a few friends in hopes that they might find it useful and more or less left it alone.

A couple of years later, I was discussing a new product with my friend, and he turned to searching for deals on it. Instead of opening my custom search engine, he opened up his Google Reader tab and entered the search term into the box there.

"Why aren't you using the Strictly Deals engine I made?" I asked.

He replied that he needed to know how recently a deal was posted. At that time, you couldn't filter Google results to the last 24 hours or the last week the way you can now (the option is now hidden in the Custom Search view).

My friend had been collecting deals feeds in a Google Reader folder for years, and he never read any of them. He would simply select that folder and Mark All as Read, then do searches for specific items. Since Google Reader sorts the posts by date, that solved the timeliness problem he was concerned about. However, it still required proactive searching to find the deals.

So, when my friend asked me if Pipes could take his existing Google Reader "deals" feeds folder and filter it by specific keywords, I thought, "I'd never thought of that, but that's a great idea."

The benefits over my previous attempts were obvious. With Google Reader as the deal feed list, subscribing to new feeds was easy and intuitive. Further, the pipe would allow passive monitoring instead of active searching. I confirmed that I could pull out a specific folder as a feed and parse those posts to filter them.

I set about creating the pipes that would process the items and determine whether any of the specified keywords were present.

As for how to specify the keywords, I first turned to a familiar method I’d used in other pipes: a text input module that takes the user’s keywords as a comma-delimited string. But thinking it through a bit, I realized that method would undermine the passive nature of the feed. If he were to buy one of the items, he would want to remove it from his list. To do that, he’d have to change the input on the pipe’s main page and resubscribe to a new feed. There had to be a better way.

Besides being a gadgeteer, I’m also an avid spreadsheet user; I even manage my pipes to-do lists through Google Docs. So, naturally, my next idea was to put the list of search terms into a Google Spreadsheet. Just like with the deals folder, the spreadsheet would have to be public for Pipes to grab it, but it was otherwise a cakewalk. 

I iterated over each search term to check whether it could be found in any of the feed items, then output only the matches. I also prepended the matched term to each item's title, so it was clear why that item appeared. My pipe was finished, and I delivered it to my friend.
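To make that matching step concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the pipe itself: the `Item` type and `filter_items` function are hypothetical, and a case-insensitive substring test stands in for the pipe's regex-based matching.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    description: str
    link: str


def filter_items(items: list[Item], terms: list[str]) -> list[Item]:
    """Return only items mentioning a search term, with that term prepended to the title."""
    matches = []
    for term in terms:                       # one pass over the feed per search term
        needle = term.lower()
        for item in items:
            text = f"{item.title} {item.description}".lower()
            if needle in text:
                # Tag the title so it's obvious why the item made it through.
                matches.append(Item(f"[{term}] {item.title}", item.description, item.link))
    return matches


items = [
    Item("Acme 24in monitor $129", "Free shipping", "http://example.com/1"),
    Item("HDMI cables", "10-pack", "http://example.com/2"),
]
for hit in filter_items(items, ["monitor", "ssd"]):
    print(hit.title)   # [monitor] Acme 24in monitor $129
```

An item that matches several terms comes out once per term, which is one source of the duplicates discussed further down.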

While working out some bugs in my new pipe and adding search-engine features (excluding items with certain words via the minus sign, OR logic for matching one term or another, and exact phrases via quotation marks), I set out to make it useful for my own deals, too.
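The search syntax described above can be approximated in a few lines of Python. This is a sketch under assumptions, not the pipe's actual logic: `parse_query`, `contains`, and `matches` are hypothetical names; terms are ANDed together, `OR` joins the terms on either side into a group of alternatives, a leading minus excludes a word or phrase, and matching is case-insensitive on whole words.

```python
import re

# A token is either an optionally negated "quoted phrase" or a run of non-spaces.
TOKEN = re.compile(r'-?"[^"]+"|\S+')


def parse_query(query: str) -> tuple[list[list[str]], list[str]]:
    """Split a query into required OR-groups and excluded words/phrases."""
    required: list[list[str]] = []  # every group must match; any member of a group will do
    excluded: list[str] = []        # none of these may appear
    pending_or = False
    for token in TOKEN.findall(query):
        if token == "OR":
            pending_or = True
            continue
        negate = token.startswith("-") and len(token) > 1
        if negate:
            token = token[1:]
        word = token.strip('"').lower()
        if negate:
            excluded.append(word)
        elif pending_or and required:
            required[-1].append(word)   # add as an alternative to the previous group
        else:
            required.append([word])
        pending_or = False
    return required, excluded


def contains(text: str, word: str) -> bool:
    """Whole-word (or whole-phrase) match; text is expected to be lowercase."""
    return re.search(r"(?<!\w)" + re.escape(word) + r"(?!\w)", text) is not None


def matches(query: str, text: str) -> bool:
    required, excluded = parse_query(query)
    text = text.lower()
    if any(contains(text, word) for word in excluded):
        return False
    return all(any(contains(text, word) for word in group) for group in required)


print(parse_query('laptop -refurbished "solid state" OR ssd'))
# ([['laptop'], ['solid state', 'ssd']], ['refurbished'])
```

Whole-word matching means `ssd` will not match "SSDs"; a plain substring test would, at the cost of more false positives.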

I don’t maintain a folder full of deals feeds like my friend does, so I knew I’d have to find a new way to provide the source. Since the Google Reader folder feed is simply a normal feed, I first tried adding feed URLs through a URL input module, alongside the Google Reader ID I was using. This worked, but the only way to handle multiple feeds was to wrap it in another pipe. I wanted something more user-friendly so that non-Pipes users could take advantage of it.

I was looking at complex things like OPML files when it hit me: if Google Docs was convenient for maintaining lists of keywords and phrases, why not keep the feed URLs in a Google Spreadsheet as well? And while I was at it, why not put the search terms list and the feeds list into one doc on separate sheets?
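If each sheet is exported as CSV, reading the two lists back out is straightforward. In this Python sketch, the column headers ("Feed URL", "Search Term") and the `read_column` helper are assumptions, and fetching the CSV is left out; the sample strings stand in for the exported sheets. A CSV export escapes quotation marks inside a cell by doubling them, so a phrase query arrives wrapped in extra quotes and the `csv` module restores it.

```python
import csv
import io


def read_column(csv_text: str, header: str) -> list[str]:
    """Return the non-blank values under `header` in a CSV export of one sheet."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(header) or "").strip()
        if value:  # skip cells left blank after deleting an entry
            values.append(value)
    return values


feeds_sheet = """Feed URL
http://example.com/deals.rss

http://example.org/hot-deals.xml
"""
# The cell "solid state" OR ssd, as it appears in a CSV export (quotes doubled).
terms_sheet = 'Search Term\n"""solid state"" OR ssd"\nmonitor -refurbished\n'

feed_urls = read_column(feeds_sheet, "Feed URL")
search_terms = read_column(terms_sheet, "Search Term")
print(feed_urls)     # ['http://example.com/deals.rss', 'http://example.org/hot-deals.xml']
print(search_terms)  # ['"solid state" OR ssd', 'monitor -refurbished']
```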

That turned out to be the sweet spot of usability for me, since it didn't require fiddling with any Pipes aggregation, and most people understand how to edit a spreadsheet, whether for feeds or for search terms.

To polish the experience off, I added an instruction sheet that explains what the user needs to do (such as making a copy of the Doc and setting it to Public) and generates the feed URL to paste into their feed reader.
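The URL-generation step boils down to gluing the user's spreadsheet key onto the pipe's run URL. Here is a Python sketch of that idea; the endpoint, the pipe ID, and the parameter names (`_id`, `_render`, `key`) are placeholders modeled on how Pipes run URLs were commonly shared, not values confirmed from the original pipe.

```python
from urllib.parse import urlencode

# Placeholders (assumptions, not values from the original pipe).
PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "YOUR_PIPE_ID"


def feed_url(spreadsheet_key: str) -> str:
    """Build the RSS URL a user pastes into their feed reader."""
    params = {"_id": PIPE_ID, "_render": "rss", "key": spreadsheet_key}
    return PIPE_RUN_URL + "?" + urlencode(params)


print(feed_url("abc123"))
# http://pipes.yahoo.com/pipes/pipe.run?_id=YOUR_PIPE_ID&_render=rss&key=abc123
```

Percent-encoding via `urlencode` keeps an unusual key from breaking the query string.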

We've been happily using the pipe for several months now, and I've had some other people who are interested in deals try it out. There have been no complaints about the functionality, and I think it solves 4 of the 5 problems that aggregated feeds typically have. The Deluge of Content is solved by only permitting items that match what you want to see. Diverse Content and Dead-pooled Content are solved by the ability to freely add or remove the feeds you want. Your Dynamic Interests are satisfied by the easy-to-use spreadsheet system that lets you edit, add, and remove search terms with a powerful Google-like keyword parser.

That leaves just Duplicate Content. Unfortunately, while I do remove duplicate matches that share the same URL, if your source feeds frequently mirror each other’s deals, you’re going to see multiple variations of the same content. There may be better ways to do fuzzy matching on that, and I've considered some (such as regex-matching the article's price against a stored price for that search term), but ultimately the time saved would be minuscule, because I find myself just mentally skipping the few dupes I get as I’m scanning my deals feed.
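Both de-duplication ideas fit in a short Python sketch. `dedupe_by_link` mirrors what the pipe is described as doing (drop matches whose URL was already seen); `dedupe_by_price` is the fuzzier price-based approach mentioned above, which the pipe does not implement, keyed on the search term plus the first dollar amount in the title. The item shape (a dict with `link` and `title`) and the price pattern are assumptions.

```python
import re

# First dollar amount in a title, e.g. "$1,299.99" -> "1,299.99" (assumed format).
PRICE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")


def dedupe_by_link(items: list[dict]) -> list[dict]:
    """Keep the first item for each URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for item in items:
        if item["link"] not in seen:
            seen.add(item["link"])
            unique.append(item)
    return unique


def dedupe_by_price(matches: list[tuple[str, dict]]) -> list[tuple[str, dict]]:
    """Hypothetical: treat matches for the same term at the same price as dupes."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for term, item in matches:
        found = PRICE.search(item["title"])
        if found is None:
            kept.append((term, item))   # no price to compare, so keep it
            continue
        key = (term, found.group(1))
        if key not in seen:
            seen.add(key)
            kept.append((term, item))
    return kept


items = [
    {"link": "http://a.example/1", "title": "Samsung SSD $99"},
    {"link": "http://a.example/1", "title": "Samsung SSD $99"},
    {"link": "http://b.example/7", "title": "Hot deal: Samsung SSD for $99"},
]
print(len(dedupe_by_link(items)))                         # 2
print(len(dedupe_by_price([("ssd", i) for i in items])))  # 1
```

The price key is a blunt instrument: two different products at the same price for the same term would collapse into one, which is part of why the extra complexity may not pay off.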

It’s worth noting that the pipe has some practical limitations. For instance, the way I’m checking for terms involves regex, which Yahoo! Pipes can't really handle on large quantities of text. That might change in the future, but luckily, deals feeds don’t usually wax poetic (except woot.com). A too-long item won’t break the whole pipe; it just won’t match. Otherwise, there is nothing stopping anyone from using this for any other purpose, not just deals feeds. If you have any cool uses, let me know.

So, I don’t have to worry about Best Buy’s high prices (even though they should be criminal). I know a good deal when I see one now.

If you want to try out the Custom Alert Feed pipe, access it through the spreadsheet interface (make a copy, so you can edit the sheets):

If you’re curious about the pipe(s) behind it (there are nine), look here:

Lastly, if you happen to be interested in the Strictly Deals custom search I made on Google, it’s here (the “&tbs=qdr:d” is what filters the results for the last 24 hours and puts a drop-down on top of the results page so you can modify the date range):
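For reference, the date filter is just a query-string parameter. This Python sketch builds such a URL; the Custom Search results endpoint and the `cx` engine ID are placeholders (assumptions), and only the `tbs=qdr:d` value comes from the text above.

```python
from urllib.parse import urlencode

# Assumed endpoint and engine ID; substitute your own Custom Search engine's values.
CSE_RESULTS_URL = "https://www.google.com/cse"
CSE_ID = "YOUR_ENGINE_ID"


def last_day_search_url(query: str) -> str:
    """Search URL restricted to the last 24 hours via tbs=qdr:d."""
    params = {"cx": CSE_ID, "q": query, "tbs": "qdr:d"}
    # safe=":" leaves the colon in qdr:d unescaped, matching the form shown above.
    return CSE_RESULTS_URL + "?" + urlencode(params, safe=":")


print(last_day_search_url("solid state drive"))
# https://www.google.com/cse?cx=YOUR_ENGINE_ID&q=solid+state+drive&tbs=qdr:d
```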