Google crawling pages disallowed by robots.txt

posted by Brian Search

A few weeks ago, we were completing the development of a tracking application for a website.  Basically, this tracking application exists on dynamically generated pages of the website that have the following structure:

www.mydomain.com/products/track/

It’s basically just a little tool that logs the user’s IP address and the item they clicked on, and then automatically redirects them to a vendor that sells that product. The user never even knows they’ve visited the page. To them, it’s a seamless transition from the item they clicked on to the vendor’s website. It’s there to help us track user behavior, learn how to make the website better, and keep the vendors who are paying us for those referrals honest.

Now, obviously there’s no reason for a search engine to index these pages. There’s no useful content there at all. So, we used robots.txt to tell the search engines that there is no reason to look at those pages.

What is robots.txt?

Robots.txt is simply a file that can be placed on a website to notify automated “crawlers” that there are certain parts of the site that should not be visited. It makes use of the robots exclusion protocol. When an automated crawler (such as Googlebot) visits a website, it looks at the robots.txt file to see if there are any pages it should not visit. Not all automated crawlers pay attention to robots.txt, but the major search engines claim that they do.
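To make that concrete, here is a minimal sketch in Python of how a well-behaved crawler might consult a robots.txt file, using the standard library’s urllib.robotparser module. The rules shown are only an assumption about what a file blocking our tracking pages could look like, and the item path /products/track/123 is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask every crawler to stay out of
# the tracking pages. The real file is not shown in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/track/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A tracking page (made-up item id) versus an ordinary page on the site.
track_allowed = is_allowed(
    ROBOTS_TXT, "Googlebot", "http://www.mydomain.com/products/track/123"
)
other_allowed = is_allowed(
    ROBOTS_TXT, "Googlebot", "http://www.mydomain.com/products/"
)
```

With rules like these, a compliant crawler would skip the tracking page and remain free to fetch everything else. Keep in mind that the file is only a request; nothing forces a crawler to honor it.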

Using robots.txt, we told the crawlers not to visit any pages in the “track” folder.  That worked out really well.  A few weeks later, we decided that if a user rolled over a link and saw the word “track”, they might get spooked and wouldn’t want to click on that particular link.  People don’t really like the idea of being tracked.  So, we decided to change the structure of the tracking application to the following:

 www.mydomain.com/products/buy

This naming convention seemed much more innocuous and was in line with what the user was trying to do. We updated the robots.txt file to reflect the new structure and uploaded everything.

Here comes Googlebot!

Much to our surprise, a few hours later, we started to see a lot of clicks coming from the same IP address. Thinking I had a rogue Chinese robot on my hands (that sounds silly but it has happened before), I looked up the IP address. Lo and behold, it belonged to Google!
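For anyone who wants to do the same kind of lookup, here is a rough Python sketch of a reverse DNS check. It assumes the approach Google documents for verifying Googlebot: look up the hostname for the IP address, check that it falls under googlebot.com or google.com, then look the hostname up again to confirm it points back to the same IP. The function names are hypothetical, and the second function needs network access to run.

```python
import socket

# Domains assumed for Google's crawler hostnames; check Google's current
# documentation before relying on this list.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def hostname_is_google(hostname: str) -> bool:
    """Return True if hostname is one of the assumed Google domains or a subdomain."""
    host = hostname.rstrip(".").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in GOOGLE_CRAWLER_DOMAINS
    )


def ip_belongs_to_google(ip_address: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        if not hostname_is_google(hostname):
            return False
        # The forward lookup guards against someone faking the reverse record.
        _name, _aliases, forward_addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
    return ip_address in forward_addresses
```

The domain check compares whole labels, so a lookalike such as evilgooglebot.com does not pass.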

Throughout the day, I watched as Googlebot clicked on item after item, roughly once every 2 minutes. I rechecked my robots.txt file. It should have been blocking this activity. I logged into my Google Webmaster Tools account and found the problem:

Google downloads robots.txt about once every 24 hours.

This particular website’s robots.txt file had been downloaded earlier in the morning, before we made our changes. The protocol is an exclusion protocol: anything that is not explicitly disallowed may be crawled. Since the new pages were not listed in the copy of the file Google had cached, they were fair game. A few hours later, Googlebot called in reinforcements. The website was now getting hit by two different Google IP addresses, roughly once every hour. Unfortunately, they didn’t bring their credit cards. They kept going until about 1:00 am the next morning, when the new robots.txt file was finally downloaded and cached. In total, Google crawled and indexed a little over 1000 pages of content that was blocked using robots.txt.
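The gap between the cached copy and the updated copy can be sketched in a few lines of Python with the standard library’s urllib.robotparser. Both sets of rules below are assumptions about what the two versions of the file might have contained, and /products/buy/123 is a made-up page.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the copy Google had cached that morning:
# it only knows about the old folder.
CACHED_ROBOTS_TXT = """\
User-agent: *
Disallow: /products/track/
"""

# Assumed contents of the updated file uploaded later that day.
UPDATED_ROBOTS_TXT = """\
User-agent: *
Disallow: /products/buy
"""


def may_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


new_page = "http://www.mydomain.com/products/buy/123"  # hypothetical item page

# What a crawler decides depends on which copy of the file it is holding.
allowed_by_cached_copy = may_crawl(CACHED_ROBOTS_TXT, new_page)
allowed_by_updated_copy = may_crawl(UPDATED_ROBOTS_TXT, new_page)
```

Under these assumptions the cached copy permits the new page and the updated copy does not, which matches what we saw in the logs until the fresh file was picked up.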

The funny thing is that these pages were actually indexed. I searched and found them a week later. They were all indexed with the content of the landing pages on the vendors’ sites. So, we inadvertently pulled off a decent-sized cloaking operation, something that is expressly against Google’s quality guidelines. I sweated it for a while, but there don’t seem to be any negative effects on the site’s rankings.

So, the lesson is that if you’re going to upload pages that you don’t want a search engine to crawl, you should disallow those pages in the robots.txt file and make that file available at least 24 hours before you upload the actual files to the website. If you have a Google Webmaster Tools account, it’d be a good idea to log in and see which version of the robots.txt file is in Google’s cache.
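As a rough illustration of that rule of thumb, here is a small Python sketch that works out the earliest time it would be reasonable to upload the new pages. The 24-hour window is the figure we observed, not a guarantee, and the function names and timestamps are made up.

```python
from datetime import datetime, timedelta

# Assumption taken from this post: robots.txt is re-downloaded
# about once every 24 hours.
ROBOTS_REFRESH_WINDOW = timedelta(hours=24)


def earliest_safe_launch(
    robots_uploaded_at: datetime, window: timedelta = ROBOTS_REFRESH_WINDOW
) -> datetime:
    """Earliest time the new pages should go live after robots.txt changes."""
    return robots_uploaded_at + window


def is_safe_to_launch(
    robots_uploaded_at: datetime,
    now: datetime,
    window: timedelta = ROBOTS_REFRESH_WINDOW,
) -> bool:
    """True once the full refresh window has passed since the upload."""
    return now >= earliest_safe_launch(robots_uploaded_at, window)


# Made-up example: robots.txt uploaded at 9:00 am on January 1.
robots_uploaded_at = datetime(2024, 1, 1, 9, 0)
launch_time = earliest_safe_launch(robots_uploaded_at)
too_early = is_safe_to_launch(robots_uploaded_at, datetime(2024, 1, 1, 15, 0))
```

Even after the window has passed, it is still worth confirming in your webmaster account which version of the file has actually been cached.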

I thought the saga was over, but a few days later a few of the pages were crawled by Googlebot again.  In this case, it was only about 5 pages, so it may have been a small bug in the system, or perhaps even a Google employee hand checking things.  In any case, the pages are still in the index.

The Art of Creating a Logo

posted by kelly Marketing

There are lots of things to consider when creating a logo to represent a client. A successful logo must reflect the client’s business, represent the values and personality of the company and, of course, look great. At R-Design we appreciate the important role creating the perfect logo plays in effectively building a brand. We try to follow our logo design process diligently, and would like to take you on a “tour” of a routine logo design recently done by our team.

One of Raffi’s favorite marketing theories is something she picked up at a seminar given by the head of marketing for Starbucks. I won’t take the time to tell the whole story here, but the nuts and bolts of his theory are these:

* Think of several words that describe your company, what makes it unique and what you stand for (the speaker suggested 5).
* Make sure that every piece of your marketing reflects ALL of these words.
* This will ensure that your marketing efforts will present a cohesive brand identity.

We try to incorporate this thinking into creating logos and other identity pieces for our clients. This leads us to step one of our logo process.

STEP ONE: Who is the Client?
As mentioned above, a successful logo needs to do more than simply look pretty. It needs to tell consumers who you are and what you do. So our first step is to find out just who we are creating the logo for. During our initial meeting we will get answers from the client about who they are, what they stand for and what they need people to know about them. Quite often we do this by helping the client come up with their “5 words”.

PROJECT: Create a logo for a small new company in MA, “Little Mangos”.
Little Mangos is an in-home cooking service that comes to your kitchen and prepares all-natural baby food (no preservatives) which can be frozen and used over the course of several months. So what is Little Mangos all about? INFANTS & PARENTS. HEALTH CONSCIOUS. NATURAL.

STEP TWO: Who is the Audience?
Once we know who you are, we need to find out who your customer is. Who is your target audience? What is important to them? How can you make their life easier? What are their fears? We explored this with Little Mangos by considering who will be interested in the product (health conscious parents) and who will be willing/able to afford the specialty service (higher income families). So we determined that their niche market is wealthy new parents who are concerned with health issues. In many cases they are families with both parents working, since otherwise they would be cooking this natural baby food themselves. Or possibly they are busy at-home moms who do not know how to cook or cannot find the time to cook. These are the people we need to appeal to.

STEP THREE: Client’s Design Ideas
Once we have established the “WHO” we move on to the “WHAT”. What is the name of your company? What is your product or service? Do you have any particular imagery in mind? How about a tag line that you would like incorporated into the logo? What colors best represent your industry?

The comments we received from Little Mangos were, “My company is called ‘Little Mangos’. My idea is to make a Mango out of the letter ‘O’ for part of the logo.” A phone call narrowed down color choices to shades of orange, green and brown. These colors are all natural, and are the colors found in mangos. Armed with a good idea of the image they wanted to project and some thoughts on what they would like the logo to look like, we move on to step four.

STEP FOUR: Design Rough Drafts
A designer will spend time coming up with a number of different design ideas. The amount of time this takes, and the number of comps we come up with, will depend on how much specific direction we are given by the client. The first-round ideas for Little Mangos can be seen below. These were then sent to the client for feedback, with a note that a more detailed drawing of a mango could be added later.

Round One Logo Comps

STEP FIVE: Early Revisions
The round one feedback from the client was that she was most drawn to the last one in row one (with the mango O), but would like to see it with some different fonts. Our office favorites were the ones in the middle row, enclosed in the oval.

Once the client has chosen a general design (or a couple) that they like, we will fine tune those specific comps based on the feedback we get from the client and each other. The early revisions on the Little Mangos designs are below.

First Round Revisions

STEP SIX: Final Revisions
Once a specific design of the logo is chosen, we do the final tweaking of colors, positioning and fonts. Of the early revisions seen above, the client preferred the layout of the top left, but the font of the top right. The final comps are shown here, with a new idea thrown in.

Round Three Revisions

STEP SEVEN: Final Logo Selection
A phone conversation brought the discussion to specific items within the logo choices that the client liked and did not like. The Little Mangos final logo choices can be seen below. The final logo selected by the client is the one on the top.

Final Logo Selection

Upon completion of the logo design, the client should be left with a logo that they love and are excited to put on everything. Make sure you have a vector version of your logo so it can be resized and still look great on printed material (like business cards), packaging (like baby food jar labels) and billboards. After all, a logo is the first impression that people get of a company!