Last week at SMX West a Google rep presented on URL structures (can somebody get me those slides?). One part of her presentation was about maverick URLs, in particular the long and complicated URLs you sometimes see on the Interwebs. For example, this one was used in her presentation:
http://shop.ebay.com/items/_W0QQ_nkwZipodQQ_armrsZ1QQ_fromZR40QQ_mdoZ
The URL without the W0QQ encoding would look something like this:
http://shop.ebay.com/?_from=R40&_trksid=m38.l1313&_nkw=ipod&_sacat=See-All-Categories
If you search for the keyword W0QQ, the top result is a forum post from 2006:
…sometimes non-coders pose the darnedest questions.
My housemate today asked me why “all the URLs on shopping sites suddenly have W0QQ at the end of them?”
To this, I had to say “I have absolutely no clue” and, curiosity peaked, I am afraid to say twenty minutes later I still have absolutely no clue. You can’t really google it — you’ll be saturated with results. Those few places I found it embedded in code were dead ends.
So if you’re bored and up for a worthy challenge, try this one out.
What the heck is w0qq?
Funny that somebody was already asking this question in 2006. The answer is quite simple:
W0 demarcates the start of the data and QQ is a delimiter (think about it, can you imagine an instance where a word or a piece of data would have two capital Qs next to each other?).
In 2004 search engines were not smart enough to read dynamic URLs. URLs with a lot of parameters, such as the ones shopping sites use to determine sort order or other aspects of a product search, were especially hard to get indexed. Replacing the query-string delimiters like & or ? with static delimiters was one technique back in the day to make a dynamic URL look static so the search engines would crawl it.
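To make the delimiter idea concrete, here is a minimal Python sketch that turns the W0QQ part of the example URL back into ordinary key/value parameters. It is not eBay's actual code. The function names are made up for illustration, and the assumption that a capital Z separates each key from its value is inferred only from the example URL above. Real values that contain a Z or QQ would presumably need some escaping that this sketch does not handle.

```python
from urllib.parse import urlencode

START_MARKER = "W0QQ"       # "W0" start-of-data marker plus the first "QQ" delimiter
PAIR_DELIMITER = "QQ"       # separates one key/value pair from the next
KEY_VALUE_SEPARATOR = "Z"   # assumption: inferred from the example URL only


def decode_w0qq(path):
    """Return the (key, value) pairs that follow the W0QQ marker in a URL path."""
    _, marker, data = path.partition(START_MARKER)
    if not marker:
        return []  # no W0QQ section in this path
    pairs = []
    for token in data.split(PAIR_DELIMITER):
        if not token:
            continue
        # Split on the first "Z" only; everything after it is the value.
        key, _, value = token.partition(KEY_VALUE_SEPARATOR)
        pairs.append((key, value))
    return pairs


def to_query_string(path):
    """Rebuild a conventional ?key=value&... query string from a W0QQ path."""
    return "?" + urlencode(decode_w0qq(path))


if __name__ == "__main__":
    example = "/items/_W0QQ_nkwZipodQQ_armrsZ1QQ_fromZR40QQ_mdoZ"
    print(decode_w0qq(example))
    print(to_query_string(example))
```

Run against the example path, the sketch recovers _nkw=ipod, _armrs=1, _from=R40 and an empty _mdo, which is the same kind of data the dynamic URL carries after the ? sign.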
Now fast forward to 2009: search engines have become much smarter and are now able to understand dynamic URLs with parameters much better. Last week they even announced their new canonical tag to help website owners avoid duplicate content issues when it comes to sort order.
When you already have URLs which include the W0QQ, it becomes a priority to get these removed. As the W0QQ can hurt your business rather than help you, the machine of the big company should start cranking out cleaner URLs.
However, changing something in the existing infrastructure of a HUGE site can be difficult. A project needs to be booked, and it competes for resources with other projects which might have a higher expected ROI. Before you know it, the project gets funded only to roll out somewhere in Q4 2010, due to the lack of clarity on the impact the bad URLs might have.
I’m all for “prettifying the web” and will work hard to get the W0QQ legacy fixed. If only I had some more guidance on what the impact is when the W0QQ is not removed.
BTW: I’ve been trying to get the W0QQ.com domain, which appears to be free but still sits with Tucows. We have great ideas for T-shirts etc. W0QQ is a legacy of the old Web 1.0 which should have a place in Internet history!
I think you got the slides now 😉
Here is the link again:
http://googlewebmastercentral.blogspot.com/2009/08/optimize-your-crawling-indexing.html
Cheers
Charles