Friday, February 10, 2012

A Strange BotNet that Processes Javascript? Signature: Windows / Internet Explorer, New Unique Visitors, Direct Navigation to 1 of 3 pages (evenly divided in thirds), 100% Bounce Rate

I noticed an odd direct traffic spike in Google Analytics.
A fairly large site I monitor started seeing some suspicious traffic around Jan. 26, 2012. Google Analytics reported a direct traffic spike that I initially attributed to a media mention. When I probed further, though, I noticed some oddities. The traffic had an unusually high bounce rate (over 90%), and the spike was limited to Internet Explorer users (all versions). The strange traffic was reported as all new visitors and was limited to three pages on the site (the main page and two pages that are off the beaten path). I set up an advanced filter for this signature, and the bounce rate was nearly 100%.

Is my site being attacked by a botnet?
At this point I began thinking it was some sort of botnet attack. According to Google Analytics, the visits matching the signature were spread out geographically, although they came primarily from Canada. Since it was just a thousand or so visits, I thought it would go away on its own. Several days later it mushroomed to 40,000 unique visits matching the signature, this time coming primarily from the US but still geographically spread out, with no dominant network source. Another oddity: the three pages being visited had a neat one-third access distribution across each one. How do 150,000 random people achieve that by chance?
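One way to put a number on the "neat one third" observation is to compute each page's share of the matching visits and see how far it strays from a perfectly even split. This is only a minimal sketch: the function names are my own, and the input is assumed to be a plain list of requested paths exported from the logs or from an analytics report.

```python
from collections import Counter


def page_shares(paths):
    """Return each path's fraction of the total visits."""
    counts = Counter(paths)
    total = sum(counts.values())
    return {path: count / total for path, count in counts.items()}


def max_deviation_from_even(shares):
    """Largest absolute gap between any page's share and a perfectly even split."""
    even = 1 / len(shares)
    return max(abs(share - even) for share in shares.values())
```

Organic traffic usually favors the main page heavily, so a deviation close to zero across the main page and two obscure pages would be consistent with something picking URLs from a fixed list.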

But, do botnets process Javascript?
Now I am second-guessing the botnet assessment. From what I've read so far, botnets don't process JavaScript. If it were a traditional botnet, why would Google Analytics' (and Quantcast's) JavaScript tracking code be triggered?


Why would a Denial of Service attack limit itself to one request?
On the denial of service attack line of thinking, I find it odd that each computer makes only one request.

Is this a new website attack?
I currently have more questions than answers. The attack / odd traffic continues. As I have not been able to find anyone else with this issue, I thought I would start a post about it. Please chime in if you have thoughts, ideas, or are experiencing the same thing. I will post back as I learn more.

---2/13/12 Update---
Traffic matching the signature is starting to subside. I've done some analysis of raw access-log files and discovered a few things:
1. IPs made more than a single request. Several sample IPs I explored made 20+ page requests a day distributed across the three URLs. Each one had an empty referrer. They must not have kept cookies, because Google Analytics saw each request as a new visit.
2. Whatever was making the page requests also sent GET requests for css, js, and graphic files referenced in the page.
3. I saw two oddities in the logs when exploring IPs making requests that matched the signature:

  • One IP that made a number of calls to the pages had, mixed in among mostly empty referrers, this suspicious referrer: http://92zvns0kany7-zitmd.com/ (DO NOT VISIT THIS SITE) followed by a long string of numbers / characters. I did a whois lookup and found it was registered to Club Freedom, Yamir Jayantilal in India. The IP it is hosted on is in Latvia. When I did a Google search on the name server "cnmsn.com" I discovered this site that keeps a log of URLs associated with malicious activity. I didn't find the specific URL, but I did find a series of similar URLs (e.g. wcrb8t2r06ufigd.com DO NOT VISIT THIS SITE) listed as "malware calls home." These domains were registered to the same person / club as the one I noted in the logs.
  • Another abnormal series of entries for an IP that made calls to the same pages also had entries for GET requests to: /crossdomain.xml and /text/javascript 
I am not sure how these two relate to the rest of the requests, because I am not finding other IPs with the same oddities. On the other hand, I am wondering if this could be some type of malware-related activity. Are malware-infected computers attempting to phone home to my website? I've compared the three pages to the file versions stored in our Git repository and am not finding any discrepancies.
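For anyone who wants to run the same kind of check on their own access logs, here is a rough sketch of the analysis described above. It is not the exact process I used. It assumes the Apache/Nginx "combined" log format, and the three page paths are hypothetical placeholders for the pages matching the signature.

```python
import re
from collections import defaultdict

# Assumes the Apache/Nginx "combined" log format; adjust for other formats.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical stand-ins for the three pages that matched the signature.
SIGNATURE_PATHS = {"/", "/page-a", "/page-b"}
# The two unusual requests noted in the logs.
ODD_PATHS = {"/crossdomain.xml", "/text/javascript"}


def summarize(lines, signature_paths=SIGNATURE_PATHS):
    """Group signature-page requests by IP and collect oddities per IP."""
    summary = defaultdict(
        lambda: {"empty_referrer_hits": 0, "referrers": set(), "odd_paths": set()}
    )
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not parse
        path = match["path"].split("?", 1)[0]  # ignore any query string
        referrer = match["referrer"]
        if path in signature_paths:
            entry = summary[match["ip"]]
            if referrer in ("", "-"):
                entry["empty_referrer_hits"] += 1
            else:
                entry["referrers"].add(referrer)  # worth a manual look
        elif path in ODD_PATHS:
            summary[match["ip"]]["odd_paths"].add(path)
    return dict(summary)
```

Passing an open log file to `summarize` and sorting the result by `empty_referrer_hits` surfaces the IPs making repeated requests with an empty referrer. Any IP with a non-empty `referrers` or `odd_paths` set is a candidate for closer inspection.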


---2/19/12 Update---
Just when I thought I could post that the non-human traffic had officially disappeared... it returned. This time it is just hitting one of the three pages it previously hit. I added a graph above from Google Analytics showing just the traffic that matches the signature of the odd traffic.

---4/21/12 Update---
An update on the Unidentified Non-Human Web Traffic (AKA: Zombie Robots, Cyborg Attack). I wish I could report the traffic has gone away, but it just won't die. Here is the latest screenshot from Google Analytics:


It appears to be somewhat cyclical. Every three to four weeks there is a new three-day rise in traffic that matches the pattern, followed by a gradual drop-off and a lull.

Other people are reporting the same odd traffic now:
One interesting solution others have used to protect advertisers from inflated impressions is to wait to load ads until some type of user activity, such as a mouse move, is detected via JavaScript.

We started capturing headers for traffic that matched the fingerprint of the pattern. Here is a sample of a few (cookie info and host removed) :

Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US;q=0.5
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Accept-Encoding: gzip, deflate
Host: www.##########.com
Connection: Keep-Alive
Cookie: #############

Accept: */*
Accept-Language: it-ch
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Accept-Encoding: gzip, deflate
Host: www.##########.com
Connection: Keep-Alive
Cookie: #############

Accept: */*
Accept-Language: en-ca
UA-CPU: x86
Accept-Encoding: gzip, deflate
Cookie: #############
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
Host: www.###########.com
Connection: Keep-Alive

Common traits are that the headers are fairly simple and contain no cookies aside from the one we set as part of the traffic control process we've been using to identify and roadblock this strange traffic. Normal users tend to have a string of GA-related cookies show up in the header info. I am no HTTP header expert. If you are and see something odd about these headers, let me know!
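To make the fingerprint concrete, here is a minimal sketch of a check along these lines. It is not our production code. It assumes the request headers arrive as a plain dict keyed exactly as shown in the samples, that the classic Google Analytics cookies all start with `__utm` (as `__utma`, `__utmb`, `__utmc`, and `__utmz` do), and that the control cookie is named `traffic_check`, which is a made-up name for illustration.

```python
def cookie_names(cookie_header):
    """Extract the cookie names from a raw Cookie header value."""
    names = set()
    for part in cookie_header.split(";"):
        name, _, _ = part.strip().partition("=")
        if name:
            names.add(name)
    return names


def matches_fingerprint(headers, control_cookie="traffic_check"):
    """Heuristic: IE user agent, no referrer, and no cookies except the control one.

    `control_cookie` is a hypothetical name; substitute whatever cookie
    your own traffic control process sets.
    """
    user_agent = headers.get("User-Agent", "")
    names = cookie_names(headers.get("Cookie", ""))
    has_ga_cookie = any(name.startswith("__utm") for name in names)
    only_control_cookie = names <= {control_cookie}
    return (
        "MSIE" in user_agent
        and "Referer" not in headers
        and not has_ga_cookie
        and only_control_cookie
    )
```

A first-time human visitor on Internet Explorer who types the URL directly would also match, so a check like this is better suited to routing traffic to an interstitial page than to blocking it outright.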

I wish there were an expert to turn to for unraveling this mystery. It seems that big names like Google and Microsoft, as well as security / anti-virus companies and major ad networks, would want to be on top of this. Unfortunately, it is very difficult to reach upper-level people at these companies who would recognize that this merits some review. Something is flying under the radar that has the potential to morph into something far worse than a traffic nuisance. Since it started, we've had 3/4 of a million unique visitors that match the fingerprint, all of which fool traffic analysis programs into thinking they are human requests. Again, most of these are unique IP addresses spread out geographically and from a variety of networks. It is impossible to block this at the network or IP level.

Hoping for better news to report in the next update.

6 comments:

  1. Same thing is happening to us. If you have any updates, I'd love to hear where you're at with this -- have you been able to distinguish this traffic from normal traffic?

    http://stkywll.com/2012/03/02/annoying-cyborgs-attach-distort-analytics/

  2. Thanks for the comment, Matt, and for linking me to others experiencing this. Nice to know I am not the only one dealing with this. I must not be crazy!

    Two and a half months later, we are still seeing this odd traffic. It shifted to landing on just one page. It seems to be slowly going away. We are down to 3-5K visits a day from the peak of 33K.

    Aside from setting up a page to intercept the traffic and qualify it for the purpose of protecting traffic quality for our ads, we didn't figure out anything else to do. We are hoping that if we ride it out long enough it will eventually go away.

    If you want to remove it from Analytics, one way to do that would be to show a "We are experiencing unusual traffic volume" notice page to visitors that match the profile. Don't include the GA tag on the page. Provide a link to continue to the site for the few humans who see the page. This keeps the traffic from being logged. The trouble is that if you don't log it, you won't know when it is gone. Another option would be to trigger an event that you can use as a filter.

    1. I'm facing the same issue

      Would it be possible for you to share the code you're using to intercept and qualify the traffic?

  3. Guys,

    Here's the solution we ended up implementing. It relies on the fact that the bad traffic appears to be devoid of DOM events.

    http://stkywll.com/2012/04/27/annoying-robots-a-solution-for-google-analytics/

  4. Hi

    We've started seeing the same thing... it's generating about 3-5k visits per day, and most of it is from 3 locations in the US (Emeryville, Clearwater, Tempe).
    Traffic from all 3 locations started appearing on 31 August 2012.
    Each visit loads only 1 page, and has a session duration of zero seconds and a 100% bounce rate.
    The pages visited are all over our site - it almost looks like the site is being crawled, but each visit only loads 1 page.
    The traffic is indicated as Direct in Google Analytics, but if I add Domain as a secondary dimension it shows as 'msn.com'.
    All of it is Windows, all has a screen res of 1024x768, all uses Mozilla 5 compatible UA.
    We thought it might be Pingdom or something like that but have stopped all those services and the traffic persists.
    We did launch an updated GA tag on the same day that we started seeing the weird zombie direct traffic.
    The tag update connects DoubleClick with GA to enable remarketing list creation via GA.
    http://support.google.com/analytics/bin/answer.py?hl=en&answer=2444872
    We do not run Adsense on our site at all.
    It would be interesting to know whether other people affected by this issue run the new tag or have Adsense displayed on their account.

  5. Have had similar issues. In one case it was designed to show up in analytics reports with their link which was designed to get the admin to click on the link to go to their site. That was ONE. The others have been generating spikes in traffic that LOOK like they could be upticks in real traffic, but they all come from the same cluster of sources. Luckily, I do not do Adsense but could see how this might mess some folks up who do.
