
Setting up continuous integration (CI) to Azure Websites is easy and convenient. Whether you deploy from Git or Visual Studio Online, a staging deployment slot makes it simple to verify code before it reaches production, so that a bad check-in doesn’t affect live users. The problem is that search engine bots are greedy and will start crawling your staging site, which you definitely don’t want, because real users may then end up there. There are a few easy options to address this, and they aren’t very well documented at the moment.

The first trick is getting the name of the site you’re on, which you can do with the following:

Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME");

If you’re on the staging site (assuming the slot is named staging), this will return a value ending in -staging.azurewebsites.net. So now you need a place to check for this.
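As a minimal sketch of that check, the comparison can be pulled into a small helper. The SlotDetector class and its method names are hypothetical, and the suffix assumes the deployment slot is named staging.

```csharp
using System;

// Hypothetical helper; the class and method names are illustrative only.
public static class SlotDetector
{
    // Assumption: the deployment slot is named "staging".
    private const string StagingSuffix = "-staging.azurewebsites.net";

    // True when the given host name ends with the staging slot suffix.
    public static bool IsStagingHost(string hostName)
    {
        return !string.IsNullOrEmpty(hostName)
            && hostName.EndsWith(StagingSuffix, StringComparison.OrdinalIgnoreCase);
    }

    // Applies the same check to the WEBSITE_HOSTNAME environment variable.
    public static bool IsStagingSlot()
    {
        return IsStagingHost(Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME"));
    }
}
```

Keeping the string comparison separate from the environment lookup makes it easy to try the check against sample host names without deploying anything.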

I chose to do this with an HTTP Module, partly because I’m old school and partly because I want this check to happen before it gets into any ASP.NET work that ultimately won’t be necessary. The key bits are:

string CurrentEnv = Environment.GetEnvironmentVariable("WEBSITE_HOSTNAME");
if (!string.IsNullOrEmpty(CurrentEnv) && CurrentEnv.ToLower().EndsWith("-staging.azurewebsites.net") && IsBot()) // Redirect to main site
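The decision itself can be sketched without any ASP.NET types, which makes it easier to reason about. This is not the code from the repo: the StagingBotRedirect class is hypothetical, the user-agent substring match is only an assumed stand-in for IsBot(), and deriving the production host by dropping "-staging" from the host name is likewise an assumption.

```csharp
using System;

// Hypothetical helper showing one way to make the redirect decision.
public static class StagingBotRedirect
{
    // Assumption: the deployment slot is named "staging".
    private const string StagingSuffix = "-staging.azurewebsites.net";

    // Assumption: a simple substring match on the user agent stands in for IsBot().
    private static readonly string[] BotMarkers = { "googlebot", "bingbot", "slurp" };

    public static bool IsBot(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent)) return false;
        foreach (string marker in BotMarkers)
        {
            if (userAgent.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0) return true;
        }
        return false;
    }

    // Returns true and sets redirectUrl when a bot is hitting the staging host.
    public static bool TryGetRedirectUrl(string hostName, string userAgent,
                                         string pathAndQuery, out string redirectUrl)
    {
        redirectUrl = string.Empty;
        if (string.IsNullOrEmpty(hostName)
            || !hostName.EndsWith(StagingSuffix, StringComparison.OrdinalIgnoreCase)
            || !IsBot(userAgent))
        {
            return false;
        }

        // Assumption: the production host is the staging host without "-staging".
        string siteName = hostName.Substring(0, hostName.Length - StagingSuffix.Length);
        redirectUrl = "http://" + siteName + ".azurewebsites.net" + pathAndQuery;
        return true;
    }
}
```

In an HTTP module, a BeginRequest handler could pass WEBSITE_HOSTNAME, Request.UserAgent and the requested path to TryGetRedirectUrl, then call Response.Redirect with the result when it returns true.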

There’s a second trick to get this working: if you release this and swap slots at this point, the staging value will stick. Go into the latest Azure portal (portal.azure.com at the time of writing) and add an app setting to both the production and staging slots, making sure you check the Slot Setting box. This forces the site to restart before it swaps to production. Without a sticky slot setting, the site doesn’t restart and continues to get the staging value back from WEBSITE_HOSTNAME while in production. Name your sticky app setting whatever you like; it only matters that it is there. You can also set it via PowerShell, if that’s more your speed.

To see this in action, use something that allows you to change your user agent to Googlebot and visit: http://stagingbotredirector-staging.azurewebsites.net/
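If you would rather script the check than change a browser setting, a rough sketch with HttpClient might look like the following. The BotRedirectCheck class is hypothetical, and the user-agent string is only an example in the style Googlebot sends. Automatic redirects are turned off so the redirect response itself can be inspected.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for seeing how a site answers a crawler user agent.
public static class BotRedirectCheck
{
    // Example user-agent string in the style Googlebot sends.
    public const string GooglebotUserAgent =
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

    // Builds a GET request that identifies itself with the crawler user agent.
    public static HttpRequestMessage BuildRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("User-Agent", GooglebotUserAgent);
        return request;
    }

    // Sends the request and prints the status code and any Location header.
    public static async Task PrintResponseAsync(string url)
    {
        // Do not follow redirects, so the redirect response stays visible.
        using (var handler = new HttpClientHandler { AllowAutoRedirect = false })
        using (var client = new HttpClient(handler))
        using (var request = BuildRequest(url))
        using (var response = await client.SendAsync(request))
        {
            Console.WriteLine((int)response.StatusCode + " " + response.Headers.Location);
        }
    }
}
```

Calling BotRedirectCheck.PrintResponseAsync with the staging URL should show a redirect status and a Location header pointing at the main site if the module is working.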

For the full code, visit the GitHub repo for the site above. And if you do use the HTTP module method, be sure to register the module in web.config, in the modules section under system.webServer.

Sources:

1. See the Deployment Slot App Settings/… section

2. Detecting Honest Web Crawlers