Where’s all the time going – CPU time that is?

Posted on 9 Oct ’12 by Earl

©Meandering Passage - Earl Moore Photography
Among the rocks, mountain stream, SR 215, NC

 

This website/blog is possessed!

That’s about the only answer I have left after trying for five days to discover why it’s using so much CPU time on my shared host. And of course my hosting provider, rightly so, wants me to find that answer, or else they will have to limit resources or pull the plug on this site.

So, Meandering Passage is running bare bones just now. There’s not even a contact form available at the moment.

I’m down to only two WordPress plugins, Akismet Anti-SPAM & WP Super Cache;
I’ve stopped the internal WP-CRON from running on each page load and now it’s manually scheduled to run once every six hours; and
I’ve changed back to the previous theme, which I know ran before with no problem.

The only suspicious thing I’ve been able to find after reviewing the site stats was a web crawler, “80legs,” generating a disproportionate amount of hits and bandwidth. I’ve blocked it in my robots.txt file but it may still be crawling the site. I’ve exhausted what little I know of this “stuff.” I’m out of ideas.
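For readers who want to do the same kind of digging, here is a minimal Python sketch that counts hits per user agent in a web server access log. It assumes the Apache "combined" log format, where the user agent is the last quoted field on each line. The sample lines, IP addresses, and crawler names are made up for illustration, and `count_user_agents` is a hypothetical helper, not something from this site or its host.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the user agent
# is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; a real log would be read from the host's log file.
sample = [
    '203.0.113.5 - - [08/Oct/2012:10:00:01 -0400] "GET / HTTP/1.1" 200 5120 "-" "ExampleCrawler/1.0"',
    '203.0.113.5 - - [08/Oct/2012:10:00:02 -0400] "GET /about/ HTTP/1.1" 200 2048 "-" "ExampleCrawler/1.0"',
    '198.51.100.7 - - [08/Oct/2012:10:00:03 -0400] "GET / HTTP/1.1" 200 5120 "-" "ExampleBrowser/15.0"',
]

# Print the busiest user agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

A user agent that dominates the top of this list without being a known search engine is a reasonable first suspect, though hit counts alone do not prove where the CPU time is going.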

My host, ICDSoft, and their support staff have been top notch in supporting and working with me on this problem.

ICDSoft Support just notified me that they made a change in my .htaccess file to deny/block the “80legs” bot from my site.

Now we wait a couple of days and see what happens, to see if Meandering Passage remains “among the rocks” as well.

Update: 10/11/2012:

After two days it’s clear the “80legs” web crawler was the culprit behind my site’s high CPU usage (3x over the limit). I don’t know why “80legs” was crawling my site at such a high rate. I don’t believe it was triggered by anything I knowingly did, and if that’s the case, this could happen to almost any site. In my opinion, even if “80legs” is a distributed web crawler as they claim, it doesn’t react in a timely and responsible manner to instructions in the “robots.txt” file. Here’s someone else talking about this issue.
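As a rough illustration of why robots.txt only works when the crawler cooperates, the Python sketch below shows the check a well-behaved crawler is expected to make before fetching a page. The rules and the names "ExampleBot" and "OtherBot" are placeholders, not the actual contents of this site's robots.txt or the real token any particular crawler answers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: shut out one crawler, allow everyone else.
rules = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler asks this question before every request.
print(parser.can_fetch("ExampleBot", "/some-post/"))  # False: told to stay out
print(parser.can_fetch("OtherBot", "/some-post/"))    # True: no restriction
```

Nothing on the server enforces that answer. A crawler that skips the check, or caches an old copy of the file, keeps fetching pages, which is why a server-side block can succeed where robots.txt does not.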

Below is a screenshot of the CPU time usage chart.  The web-crawler was blocked on Oct 9th resulting in a marked difference.

What Others Are Saying

  1. Paul 9 Oct ’12 at 10:51 am

    Good luck with this, Earl. I hope that they figure out what the heck is going on, soon.

  2. Earl 9 Oct ’12 at 11:09 am

    Thanks, Paul. Nothing I’ve done to this point has had any noticeable effect…which I’m interpreting to mean my internal blog was/is using only a small percentage of the total CPU time. I do think my genuine efforts and robust communications with the support staff have kept them from pulling the plug; they know I’m trying everything I can at this end. It takes a day for the stats to show any results, and waiting’s the hard part.

    I’m hoping the step support just took to block that dang “80legs” web crawler will solve this mystery.

  3. Tom Dills 9 Oct ’12 at 12:37 pm

    I sure am glad you know what all that stuff means! Although I do have Akismet on my own blog, so I know what that is. I guess that’s why you do that stuff for a living. :)

    • Earl 9 Oct ’12 at 4:15 pm

      Tom, some of this “stuff” in regards to WordPress and hosting installations I’ve only learned about in the last 5 days. Like the on-line detective work you wrote about recently, it’s amazing what you can find with some applicable key word searches. :-)

  4. ken bello 9 Oct ’12 at 4:51 pm

    I’m glad you put your best man on this job, Earl. Here’s hoping you get this straightened out soon.

    • Earl 9 Oct ’12 at 11:09 pm

      Ken, I hope it’s straightened out soon too…I grow weary of this. Thanks!

  5. Monte Stevens 9 Oct ’12 at 10:10 pm

    You’re above my head and talking a foreign language to me. What the heck is a web crawler with 80 legs supposed to be doing? Sounds like a big centipede on the loose. Good luck with it, Earl!

    • Earl 9 Oct ’12 at 11:13 pm

      Monte, well, it’s just what it sounds like: it crawls from link to link on the web, recording and/or downloading information the same as all the other bots or “spiders” do, including those from Google and Microsoft. Only this 80legs one seems to overload and over-visit, much like an unwelcome guest.

  6. NR | ExP 10 Oct ’12 at 9:37 am

    I had similar issues about a year ago with another WordPress site that I used to run for someone. I found out that there was a vulnerability in one of the light box plug-ins being used for WordPress that allowed outsiders to inject malicious code. The code used the server to run some scripts, causing a lot of excess CPU usage. Luckily, I was able to get an old backup of the site from before the code injection, did a clean install of WP with a new SQL DB, and restored from the clean backup. Just for good measure, I also changed the installation folder in the root directory and added some very robust security settings:

    * locked .htaccess
    * changed admin login link
    * removed error message on the log-in page
    * 5 try site lock out
    * all versions # removed from pub view
    * removed “Admin”

    I don’t assume to know that security is an issue here, but certainly with some of the holes that are associated with WP it’s definitely worth locking up the site, if you haven’t already.

    • Earl 10 Oct ’12 at 10:27 am

      Hi NR,

      Thanks for the info and it’s good to hear from you again. A couple of years ago someone tried to hack this site and at that time I implemented many of the security options you listed. During this episode I did check for the light box vulnerability and even had an outside source scan my site for any additional malware or malicious scripts — nothing found.

      Good news is that after blocking that strange “80legs” web crawler/spider, yesterday’s CPU and DB resource time used fell well back into safe limits. I’ve activated a few of my “needed” plugins and I’ll be monitoring it closely for a few more days, but I think that was it. I don’t know why that particular search spider was hitting my site so hard, but blocking it in the robots.txt file didn’t stop it. The difficult part was that it was almost invisible unless you dug into the logs and stats.

  7. Michael Aulia @CravingTech.com 18 Oct ’12 at 12:35 am

    My blog has been experiencing a crazy CPU spike this past month as well and the hosting support points it to Google bots (and possibly others as well)

    So you tried to block them in robots.txt to no avail but had success with .htaccess? Can you open your .htaccess file and tell us the line? This way we can implement it on our end as well.

    • Earl 18 Oct ’12 at 6:59 am

      Hi Michael…I would be careful about blocking the Google bots, as you’ll lose placement in Google searches and therefore traffic. I would check for the “80legs” spider as it seems to be a real abuser. In any case, below is what was added to my .htaccess file, at the very end, to block the 80legs bot.

      SetEnvIfNoCase User-Agent "80legs" badBot
      Deny from env=badBot

      I can’t say if this would work on your host or not.
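For anyone curious what those two .htaccess lines do, here is a rough Python sketch of the same logic: a case-insensitive search for "80legs" anywhere in the User-Agent header, with matching requests refused. This is only a model of the rule, not how Apache implements it. `status_for` is a hypothetical helper and both user-agent strings are made up for the example.

```python
import re

# Models: SetEnvIfNoCase User-Agent "80legs" badBot
# A case-insensitive regex search anywhere in the User-Agent header.
BAD_BOT = re.compile(r"80legs", re.IGNORECASE)


def status_for(user_agent):
    """Return 403 for a flagged user agent, 200 otherwise (hypothetical helper).

    Models: Deny from env=badBot
    """
    return 403 if BAD_BOT.search(user_agent or "") else 200


# Made-up user-agent strings for illustration.
print(status_for("SomeCrawler/1.0 (+http://www.80LEGS.com/)"))   # 403
print(status_for("Mozilla/5.0 (Windows NT 6.1) Firefox/15.0"))   # 200
```

Unlike robots.txt, this check runs on the server for every request, so it does not depend on the crawler choosing to obey it.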

    • Michael Aulia @CravingTech.com 20 Oct ’12 at 10:17 pm

      Yeah, I was assured that it wouldn’t change anything in terms of ranking, SEO, and how fast my new content gets indexed, but I’m still a bit skeptical.

      The hosting support did find 2 specific bots (not 80legs), so I blocked their IPs in cPanel. I haven’t received any suspension notice yet (it’s only been a few days though). We’ll see :) thanks
