I noticed an odd direct traffic spike in Google Analytics.
A fairly large site I monitor started seeing some suspicious traffic around Jan. 26, 2012. Google Analytics reported a direct traffic spike that I initially attributed to a media mention. When I probed further, though, I noticed some oddities. The traffic had an unusually high bounce rate (over 90%), and the spike was limited to Internet Explorer users (all versions). The strange traffic was reported as all new visitors and was limited to three pages on the site (the main page and two pages that are off the beaten path). I set up an advanced filter for this signature, and the bounce rate was nearly 100%.
Is my site being attacked by a botnet?
At this point I began thinking it was some sort of botnet attack. The visits matching the signature were spread out geographically according to Google Analytics, although they were primarily coming from Canada. Since it was just a thousand or so visits, I thought it would go away on its own. Several days later it mushroomed to 40,000 unique visits matching the signature, this time primarily coming from the US, still spread out geographically with no dominant network source. Another oddity: the three pages being visited each received a neat one-third share of the requests. How do 150,000 random people achieve that by chance?
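To put a number on the "neat one third" observation, here is a minimal sketch that measures how far a set of per-page hit counts strays from a perfectly even split. The function names are my own, and any counts fed into it would have to come from an export of the filtered Google Analytics report; nothing here is real data from the site.

```python
def page_shares(hits_by_page):
    """Return each page's fraction of the total hits."""
    total = sum(hits_by_page.values())
    if total == 0:
        return {}
    return {page: hits / total for page, hits in hits_by_page.items()}


def max_deviation_from_even(hits_by_page):
    """Largest gap between any page's share and a perfectly even split.

    A value near 0 means the pages are hit almost equally, which is
    unusual for human traffic spread over a home page and two obscure pages.
    """
    shares = page_shares(hits_by_page)
    if not shares:
        return 0.0
    even = 1 / len(shares)
    return max(abs(share - even) for share in shares.values())
```

Comparing this value for the suspect segment against the same three pages in normal traffic would show whether the even split is really out of character.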
But do botnets process JavaScript?
Now I am second-guessing the botnet assessment. From what I've read so far, botnets don't process JavaScript. If it were a traditional botnet, why would the Google Analytics (and Quantcast) JavaScript tracking code be triggered?
Why would a Denial of Service attack limit itself to one request?
On the denial of service attack line of thinking, I find it odd that each computer makes only one request.
Is this a new website attack?
I currently have more questions than answers. The attack / odd traffic continues. As I have not been able to find anyone else with this issue I thought I would start a post about it. Please chime in if you have thoughts, ideas, or are experiencing the same thing. I will post back as I learn more.
---2/13/12 Update---
Traffic matching the signature is starting to subside. I've done some analysis of raw access-log files and discovered a few things:
1. IPs made more than a single request. Several sample IPs I explored made 20+ page requests a day distributed across the three URLs. Each one had an empty referrer. They must not have stored cookies, because Google Analytics saw each request as a new visit.
2. Whatever was making the page requests also sent GET requests for the CSS, JS, and graphic files referenced in the page.
3. I saw two oddities in the logs when exploring IPs making requests that matched the signature:
- One IP that made a number of calls to the pages had, mixed in among mostly empty referrers, this suspicious referrer: http://92zvns0kany7-zitmd.com/ (DO NOT VISIT THIS SITE) followed by a long string of numbers and characters. I did a WHOIS lookup and found it was registered to Club Freedom, Yamir Jayantilal in India. The IP it is hosted on is in Latvia. When I did a Google search on the name server "cnmsn.com" I discovered a site that keeps a log of URLs associated with malicious activity. I didn't find the specific URL, but I did find a series of similar URLs (e.g. wcrb8t2r06ufigd.com DO NOT VISIT THIS SITE) listed as "malware calls home." These domains were registered to the same person / club as the one I noted in the logs.
- Another abnormal series of entries for an IP that made calls to the same pages also had entries for GET requests to: /crossdomain.xml and /text/javascript
I am not sure how these two relate to the rest of the requests, because I am not finding other IPs with the same oddities. On the other hand, I am wondering if this could be some type of malware-related activity. Are malware-infected computers attempting to phone home to my website? I've compared the three pages to the file versions stored in our Git repository and am not finding any discrepancies.
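For anyone who wants to run the same kind of check on their own logs, here is a minimal sketch of counting signature-matching requests per IP. It assumes the server writes the Apache/Nginx "combined" log format, and the paths in TARGET_PATHS are hypothetical placeholders for the three affected pages. It is not the exact process I used, just one way to script it.

```python
import re
from collections import Counter

# Assumption: logs are in Apache/Nginx "combined" format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical placeholders for the three pages being hit.
TARGET_PATHS = {"/", "/page-a", "/page-b"}


def matches_signature(entry):
    """True for a GET of a target page from an IE user agent with no referrer."""
    return (
        entry["method"] == "GET"
        and entry["path"] in TARGET_PATHS
        and entry["referrer"] in ("", "-")  # an empty referrer is logged as "-"
        and "MSIE" in entry["agent"]
    )


def requests_per_ip(lines):
    """Count signature-matching requests per client IP."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and matches_signature(match.groupdict()):
            counts[match.group("ip")] += 1
    return counts
```

Passing an open log file to requests_per_ip and then calling most_common(20) on the result lists the busiest IPs, which is a quick way to spot the ones making 20+ requests a day.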
---2/19/12 Update---
Just when I thought I could post that the non-human traffic had officially disappeared... it returned. This time it is just hitting one of the three pages it previously hit. I added a graph above from Google Analytics showing just the traffic that matches the signature of the odd traffic.
---4/21/12 Update---
An update on the Unidentified Non-Human Web Traffic (AKA: Zombie Robots, Cyborg Attack). I wish I could report the traffic has gone away, but it just won't die. Here is the latest screenshot from Google Analytics:
It appears to be somewhat cyclical. Every three to four weeks there is a new three-day rise in traffic that matches the pattern, followed by a gradual drop-off and a lull.
Other people are reporting the same odd traffic now:
One interesting solution others have used to protect advertisers from inflated impressions is to wait to load ads until some type of user activity, such as a mouse move, is detected via JavaScript.
We started capturing headers for traffic that matched the fingerprint of the pattern. Here is a sample of a few (cookie info and host removed):
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US;q=0.5
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Accept-Encoding: gzip, deflate
Host: www.##########.com
Connection: Keep-Alive
Cookie: #############

Accept: */*
Accept-Language: it-ch
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Accept-Encoding: gzip, deflate
Host: www.##########.com
Connection: Keep-Alive
Cookie: #############

Accept: */*
Accept-Language: en-ca
UA-CPU: x86
Accept-Encoding: gzip, deflate
Cookie: #############
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
Host: www.###########.com
Connection: Keep-Alive
Common traits are that the headers are fairly simple and contain no cookies aside from the one we set as part of the traffic control process we've been using to identify and roadblock this strange traffic. Normal users tend to have a string of GA-related cookies show up in the header info. I am no HTTP header expert. If you are and see something odd about these headers, let me know!
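The missing-GA-cookies trait is easy to test for in code. Here is a minimal sketch, assuming the request headers are available as a plain dict and that the site uses the classic ga.js tracker, which sets cookies named __utma, __utmb, __utmc and __utmz. The function names are my own, not part of any library.

```python
# Cookie names set by the classic ga.js Google Analytics tracker.
GA_COOKIE_NAMES = {"__utma", "__utmb", "__utmc", "__utmz"}


def cookie_names(cookie_header):
    """Return the set of cookie names found in a raw Cookie header value."""
    names = set()
    for pair in cookie_header.split(";"):
        name = pair.split("=", 1)[0].strip()
        if name:
            names.add(name)
    return names


def lacks_ga_cookies(headers):
    """True when the request carries none of the Google Analytics cookies."""
    present = cookie_names(headers.get("Cookie", ""))
    return present.isdisjoint(GA_COOKIE_NAMES)
```

Keep in mind that a genuine first-time visitor also arrives without GA cookies, so this is only one signal to combine with the rest of the fingerprint, not a verdict on its own.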
I wish there were some expert to turn to for unraveling this mystery. It seems some big names like Google and Microsoft, as well as security / anti-virus companies and major ad networks, would want to be on top of this. Unfortunately it is very difficult to reach upper-level people at these companies who would recognize that this merits some review. Something is flying under the radar that has the potential to morph into something far worse than a traffic nuisance. Since it started we've had three quarters of a million unique visitors that match the fingerprint, all of which fool traffic analysis programs into thinking they are human requests. Again, most of these are unique IP addresses spread out geographically and from a variety of networks. It is impossible to block this at the network or IP level.
Hoping for better news to report in the next update.