A Self-Written Website Monitoring Program

Submitted by James Qi on June 5, 2017 - 14:59
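  Every so often I find that sitemap.xml has gone missing from one of our Drupal sites. Most likely the cron run deleted the old sitemap.xml and then failed to regenerate it, because the site holds so much data that the mysql process was killed or some other timeout was hit. We administrators rarely check whether the sitemap exists, but search engines request it over and over, so the missing-file error can go unnoticed for quite a long time. (Fix: in the Sitemap configuration, under ADVANCED SETTINGS, tick "Disable cron generation of sitemap files.", or run drush vset xmlsitemap_disable_cron_regeneration 1.)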
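  Where running drush on every site is inconvenient, the same variable could probably also be pinned in settings.php, because Drupal 6 and 7 let entries in the $conf array override stored variables. A minimal sketch, assuming the variable name in the drush command above is the one the XML sitemap module actually reads:

```php
// Fragment for sites/default/settings.php (Drupal 6/7), not a standalone script.
// Assumption: the module reads this variable through variable_get(), as the
// drush vset command above implies.
$conf['xmlsitemap_disable_cron_regeneration'] = 1;
```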

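  Today it occurred to me to use the PHP inspection program I wrote earlier for this check. In my blog I found the requirements I had noted down, 《网站监控程序的开发需求》 (Development Requirements for a Website Monitoring Program) and 《关于网站及服务器状态监控的笔记》 (Notes on Monitoring Website and Server Status), both from 2013. I then wrote the program myself in 2014. It is far from polished, but it is basically usable and has been patrolling our sites for several years, so this time I am simply posting the source code as well: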

<?php
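/*
* Website monitoring program
* James Qi
*
* Usage: php main.php arg1 arg2
* arg1: required, point_id, the monitoring point; it selects which site groups this run checks
* arg2: optional, page_name, the page to inspect, e.g. sitemap.xml or robots.txt; defaults to empty (home page) when omitted
*
* Change notes:
* November 2013: drafted the initial requirements; a colleague did a first implementation with a web interface that covered part of the inspection, but it still had problems and never went into production, so we used free external monitoring services in the meantime
* December 2014: implemented the basic site loop, page checks and email alarms; the database tables exist but are not used yet; for now settings and runs are handled by this PHP program alone, with no web interface
*
* To-do list:
* 1. Split into sub-programs: main.php, groups.inc.php, group.inc.php, site.inc.php, page.inc.php, robots.php, sitemap.php, etc.;
* 2. Record results in the database and consider adding a web interface;
* 3. Review the list of sites to inspect more carefully and add missing ones;
* 4. Inspection modes: whether to check the mobile version, multiple languages, the home page, category pages, list pages, inner pages, etc.;
* 5. Smart detection and so on; see http://jamesqi.com/node/442 《网站监控程序的开发需求》 (Development Requirements for a Website Monitoring Program).
*/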

if (isset($argv[1])) {
    $point_id = $argv[1];
    log_print("point_id = $point_id\n");
} else {
    print "no arg, please add point_id to run this program like this: 'php main.php point_id'.\n";
    exit;
}

if (isset($argv[2])) {
    $page_name = $argv[2];
    log_print("page_name = $page_name\n");
} else {
    $page_name = '';
}

//system settings

$should_include = 'Drupal 7,Drupal 6,Drupal,nokia_mobile,MediaWiki 1.23.5,MediaWiki 1.16.0,MediaWiki,sitemap,robots';
$should_not_include = 'Fatal error,Warning:,Notice:,403 Forbidden,Apache 2 Test Page,undefined,Error,';
$should_not_include .= 'Too many connections,PDOException,file_exists(),mysql_connect,Cannot contact the database server,A database query error has occurred,This wiki has a problem,Error:,HTTP request,Internal error,';
$should_not_include .= '报错:,出错:,警告:';
//$alarm = TRUE;
//$email = 'myemail@example.com';
$timeout = 60;//response-time threshold in seconds; an alarm is raised when it is exceeded
$retry = 3;//maximum number of retries
$delay = 30;//delay between retries, in seconds

$log = '';

log_print("should_include = $should_include\n");
log_print("should_not_include = $should_not_include\n");
log_print("timeout = $timeout, retry = $retry, delay = $delay\n");

$count_group = 0;
$count_site = 0;
$count_page = 0;
$count_retry = 0;
$count_error = 0;
$count_header = 0;
$count_body = 0;
$time_header = 0;
$time_body = 0;

//monitor points settings

$point = array();

$point['china_0'] = array(
    'point_id' => 'china_0',
    'point_name' => 'China 0',
    'groups' => 'mingluji_gongshang,mingluji_hangye',
    'email' => 'myemail@example.com',
    'check_mode' => 'full'
);

$point['china_1'] = array(
    'point_id' => 'china_1',
    'point_name' => 'China 1',
    'groups' => 'jamesqi,jamesqi_wiki',
    'email' => 'myemail@example.com',
    'check_mode' => 'full'
);

$point['foreign_0'] = array(
    'point_id' => 'foreign_0',
    'point_name' => 'Foreign 0',
    'groups' => 'en18daonet,bizdirlib_usa',
    'email' => 'myemail@example.com',
    'check_mode' => 'full'
);

$point['foreign_1'] = array(
    'point_id' => 'foreign_1',
    'point_name' => 'Foreign 1',
    'groups' => 'chahaoba_wiki_language,youbianku_wiki_language',
    'email' => 'myemail@example.com',
    'check_mode' => 'full'
);

$point['test'] = array(
    'point_id' => 'test',
    'point_name' => 'Test',
    'groups' => 'test1,test2',
    'email' => 'myemail@example.com',
    'check_mode' => 'full'
);

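if (!isset($point[$point_id])) {
    //an unknown point_id would otherwise only produce undefined-index notices
    print "unknown point_id '$point_id', valid values: ".implode(', ', array_keys($point))."\n";
    exit(1);
}
$groups = $point[$point_id]['groups'];
$email = $point[$point_id]['email'];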

$line = "groups = $groups, email = $email\n";
log_print($line);

//website group settings

$group = array();

$group['mingluji_gongshang'] = array(
  'group_id' => 'mingluji_gongshang',
  'group_name' => 'mingluji new gongshang',
  'base_url' => 'gongshang.mingluji.com',
  'sub_sites' => ',beijing,tianjin,hebei,neimenggu,shanxi,shanghai,anhui,jiangsu,zhejiang',
  'sub_mode' => 'sub_path',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'gongshang.mingluji.com',
  'example_site' => 'gongshang.mingluji.com/xizang',
  'check_mode' => 'full'
);

$group['mingluji_hangye'] = array(
  'group_id' => 'mingluji_hangye',
  'group_name' => 'mingluji hangye',
  'base_url' => 'hangye.mingluji.com',
  'sub_sites' => ',anquan,bangongwenjiao,baojian,baozhuang,caiwu,canyinyule,dianchi',
  'sub_mode' => 'sub_path',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'ditu.mingluji.com',
  'example_site' => 'ditu.mingluji.com/xizang',
  'check_mode' => 'full'
);

$group['jamesqi'] = array(
  'group_id' => 'jamesqi',
  'group_name' => 'jamesqi',
  'base_url' => 'jamesqi.com',
  'sub_sites' => ',m,jiapu',
  'sub_mode' => 'sub_domain',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'jamesqi.com',
  'example_site' => 'jamesqi.com',
  'check_mode' => 'full'
);

$group['jamesqi_wiki'] = array(
  'group_id' => 'jamesqi_wiki',
  'group_name' => 'jamesqi wiki sites',
  'base_url' => 'jamesqi.com',
  'sub_sites' => 'www,mobile',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-cn',
  'home_site' => 'www.jamesqi.com',
  'example_site' => 'www.jamesqi.com',
  'check_mode' => 'full'
);

$group['en18daonet'] = array(
  'group_id' => 'en18daonet',
  'group_name' => 'en18daonet wiki',
  'base_url' => '18dao.net',
  'sub_sites' => 'en',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => 'm.',
  'languages' => 'en',
  'home_site' => 'en.18dao.net',
  'example_site' => 'en.18dao.net',
  'check_mode' => 'full'
);

$group['bizdirlib_usa'] = array(
  'group_id' => 'bizdirlib_usa',
  'group_name' => 'bizdirlib_usa',
  'base_url' => 'bizdirlib.com',
  'sub_sites' => 'usa,ak,al,ar,az,ca,co,ct,dc,de,fl,ga,hi,ia,id,il,in',
  'sub_mode' => 'sub_domain',
  'cms' => 'Drupal 6',
  'mobile_mode' => '/m',
  'languages' => '49',
  'home_site' => 'usa.bizdirlib.com',
  'example_site' => 'ak.bizdirlib.com',
  'check_mode' => 'full'
);

$group['chahaoba_wiki_language'] = array(
  'group_id' => 'chahaoba_wiki_language',
  'group_name' => 'chahaoba_wiki_language',
  'base_url' => 'chahaoba.com',
  'sub_sites' => 'ar,de,en,es,fr,it,ja,ko,pt,ru',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => '.m.',
  'languages' => '10',
  'home_site' => 'en.chahaoba.com',
  'example_site' => 'en.chahaoba.com',
  'check_mode' => 'full'
);

$group['youbianku_wiki_language'] = array(
  'group_id' => 'youbianku_wiki_language',
  'group_name' => 'youbianku_wiki_language',
  'base_url' => 'youbianku.com',
  'sub_sites' => 'ar,de,en,es,fr,it,ja,ko,pt,ru',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => '.m.',
  'languages' => '10',
  'home_site' => 'en.youbianku.com',
  'example_site' => 'en.youbianku.com',
  'check_mode' => 'full'
);

//print_r($group);

//functions

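function sub_path($base_url, $sub_sites) {
    $sites = '';
    foreach (explode(',',$sub_sites) as $value) {
        $value = trim($value);
        if ( $value == '' ) {
            //an empty entry means the base site itself; no trailing slash,
            //so "$site/$page_name" does not end up with a double slash
            $sites .= ",$base_url";
        } else {
            $sites .= ",$base_url/$value";
        }
    }
    $sites = substr($sites,1);
    return $sites;
}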

function sub_domain($base_url, $sub_sites) {
    $sites = '';
    foreach (explode(',',$sub_sites) as $value) {
        $value = trim($value);
        if ( $value =='') {
            $sites .= ",$base_url";
        } else {
            $sites .= ",$value.$base_url";
        }
    }
    $sites = substr($sites,1);
    return $sites;
}

function check_sites($sites) {
  foreach (explode(',',$sites) as $value) {
    $value = trim($value);
    //print "site =  $value\n";
    check_site($value);
  }
}

function check_site($site) {
    global $log;
    global $count_site;
    global $page_name;
    $count_site ++;
    $line = "------------------------------------\n count_site = $count_site, site = $site\n";
    log_print($line);
    if ($page_name != '') {
        $page = "$site/$page_name";
    } else {
        $page = $site;
    }
    check_page($page);
}

function check_page($page) {
    global $log;
    global $count_page;
    global $retry;
    global $delay;
    global $count_retry;
    global $count_error;
    global $count_header;
    global $count_body;
    global $time_header;
    global $time_body;
    $count_page ++;
    $url = "http://$page";
    for ( $count_retry = 0; $count_retry <= $retry; $count_retry ++ ) {
        $time_start = microtime(TRUE);
        $header = get_headers($url, 1);
        $time_end = microtime(TRUE);
        $time_long = $time_end - $time_start;
        $time_long = round($time_long,3);
        $count_header++;
        $time_header = $time_header + $time_long;
        if ( $header === FALSE ) {
            //request failed: log it and wait before the next attempt
            $line = "get headers FALSE, time = $time_long seconds, count_retry = $count_retry\n";
            log_print($line);
            if ( $count_retry < $retry ) {
                sleep ( $delay );//no need to wait after the last attempt
            }
        } else {
            //only measure and print the headers once we actually have an array
            $line = 'header length = '.strlen( implode( ',', $header ) )." bytes, time = $time_long seconds\n";
            log_print($line);
            print_r ($header);
            break;
        }
    }
    if ( $header === FALSE ) {
        $line = "page = $page, code = get headers FALSE, count_retry = $count_retry\n";
        $count_error++;
        log_print($line);
        alarm ($page,"get headers FALSE, retry $count_retry times","get headers FALSE, retry $count_retry times");
    } elseif ( preg_match('#^HTTP/\S+ 200\b#', $header[0]) || ( isset($header[1]) && preg_match('#^HTTP/\S+ 200\b#', $header[1]) ) ) {
        $time_start = microtime(TRUE);
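        $content = file_get_contents($url);
        if ( $content === FALSE ) {
            $content = '';//download failed; the checks below will then raise the alarm
        }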
        //$header = $http_response_header;
        $time_end = microtime(TRUE);
        $time_long = $time_end - $time_start;
        $time_long = round($time_long,3);
        $count_body++;
        $time_body = $time_body + $time_long;
        $line = 'content length = '.strlen($content)." bytes, time = $time_long seconds\n";
        log_print($line);
        $title = get_title($content);
        log_print("title = $title\n");
        global $timeout;
        if ($time_long > $timeout) {
            alarm ($page,"time too long: $time_long > $timeout",array_to_string($header));
        }
        if (!should_include($content)) {
            alarm ($page,'should_include not found',$content);
        } else {
            $line = "$page should_include found\n";
            log_print($line);
        }
        if (!should_not_include($content)) {
            alarm ($page,'should_not_include found',$content);
        } else {
            $line = "$page should_not_include not found\n";
            log_print($line);
        }
    } else {
        $line = "page = $page, code = not 200 OK\n";
        $count_error++;
        log_print($line);
        print_r($header);
        alarm ($page,$header[0],array_to_string($header));
    }
}

function should_include($content) {
    global $should_include;
    global $count_error;
    $token = strtok($should_include, ',');
    while ($token !== false) {
        $token = trim($token);
        print "token=$token,";
        if ( strpos($content,$token) === false ) {//strict test: a match at offset 0 also counts as found
            $token = strtok(',');
        } else {
            log_print("found = $token\n");
            return TRUE;//found
        }
    }
    print "\n";
    $count_error++;
    return FALSE;//not found
}

function should_not_include($content) {
    global $should_not_include;
    global $line;//?
    global $count_error;
    $token = strtok($should_not_include, ',');
    while ($token !== false) {
        $token = trim($token);
        print "token=$token,";
        if ( strpos($content,$token) === false ) {//strict test: a match at offset 0 also counts as found
            $token = strtok(',');
        } else {
            $line = "found = $token\n";
            log_print($line);
            $count_error++;
            return FALSE;//found
        }
    }
    print "\n";
    return TRUE;//not found
}

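function get_title($content) {
    $start = strpos( $content, '<title>' );
    $end = strpos( $content, '</title>' );
    if ( $start === false || $end === false || $end < $start ) {
        return '';//no usable <title> element, e.g. for sitemap.xml or robots.txt
    }
    $title = substr($content, $start + 7, $end - $start - 7 );
    return $title;
}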

function alarm($page,$code,$message) {
    global $log;
    global $email;
    $alarm = "ALARM: page = $page, code = $code\n";
    log_print($alarm);
    //print_r($header);
    $mailbox = $email;
    $subject = trim($alarm);//a mail subject must not contain a line break
    sendmail($mailbox, $subject, $message);
}

function sendmail($mailbox, $subject, $message) {
    mail($mailbox, $subject, $message);
}

function log_print($string) {
    global $log;
    $log .= $string;
    print $string;
}

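function array_to_string($array) {
    $string = '';
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            //get_headers($url, 1) returns an array for headers that occur more than once
            $value = implode(' | ', $value);
        }
        $string .= "[$key] => $value,\n";
    }
    return $string;
}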

//main loop

$time_from = time();

//print "groups = $groups\n";
$groups_array = explode( ',', $groups );
//print_r ($groups_array);

foreach ( $groups_array as $value ) {
//print "here!\n";
//print array_key_exists( $value, $group );
    if ( !array_key_exists( $value, $group ) ) {
        $line = "group $value not found\n";
        log_print($line);
    } else {
        $series = $group[$value];
        $group_id = $series['group_id'];
        $sub_mode = $series['sub_mode'];
        $base_url = $series['base_url'];
        $sub_sites = $series['sub_sites'];
        $count_group ++;
        $line = "===========================================\n";
        $line .= "count_group = $count_group, group_id = $group_id, sub_mode = $sub_mode, base_url = $base_url, sub_sites = $sub_sites\n"; 
        log_print($line);
        switch ($sub_mode) {
        case 'sub_path':
            $sites = sub_path( $base_url, $sub_sites );
            break;
        case 'sub_domain':
            $sites = sub_domain( $base_url, $sub_sites );
            break;
        default:
            log_print("sub_mode $sub_mode error\n");
            continue 2;//skip this group; $sites would otherwise be undefined or stale
        }

        $line = "sites = $sites\n";
        log_print($line);
        check_sites($sites);
    }
}

$time_to = time();

$time_total = $time_to - $time_from;
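$time_average = $count_site > 0 ? round($time_total/$count_site,3) : 0;//guard against division by zero
$time_total = gmdate('H:i:s', $time_total);//gmstrftime() is deprecated as of PHP 8.1

$time_header_average = $count_header > 0 ? round($time_header/$count_header,3) : 0;
$time_body_average = $count_body > 0 ? round($time_body/$count_body,3) : 0;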

$mailbox = $email;

$line = "===========================================\n";
log_print($line);

$subject = "point $point_id: $count_group groups $count_site sites $count_error errors";
$subject .= ", total: $time_total average: $time_average seconds, header $time_header_average / body $time_body_average";
log_print("$subject\n");

$line = "from ".date('Y-m-d H:i:s',$time_from)." to ".date('Y-m-d H:i:s',$time_to)."\n";
$line .= "time_header/count_header = $time_header/$count_header, time_body/count_body = $time_body/$count_body\n";
log_print($line);

$message = $log;
sendmail($mailbox, $subject, $message);
print "$log";

?>
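
  check_page chooses between its branches based on the status line(s) that get_headers returns. As a sketch of how that decision could be isolated and tested without network access, the hypothetical helper below (final_status_code is my name for illustration, not part of main.php) extracts the numeric code of the last response. It assumes the array shape of get_headers($url, 1): status lines sit under integer keys, one per response in a redirect chain.

```php
<?php
// Hypothetical helper, not part of the original main.php.
// Returns the HTTP status code of the last response found in a
// get_headers() result, or 0 when no status line is present.
function final_status_code($headers) {
    if (!is_array($headers)) {
        return 0; // get_headers() returned FALSE
    }
    $code = 0;
    foreach ($headers as $key => $value) {
        // Status lines are stored under integer keys; named headers are skipped.
        if (is_int($key) && is_string($value)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $value, $m)) {
            $code = (int) $m[1]; // keep overwriting, so the last response wins
        }
    }
    return $code;
}
```

  With a helper like this, the 200 test becomes a plain integer comparison, and a redirect followed by a 200 is handled the same way as a direct 200.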

  Then add scheduled runs to /etc/crontab on Linux. On a server in China, to check the domestic sites:

00 01 * * * root php /path/main.php china_0
30 04 * * * root php /path/main.php china_1

  On a server abroad, to check the overseas sites:

00 02 * * * root php /path/main.php foreign_0
30 05 * * * root php /path/main.php foreign_1

  This way the inspection runs automatically in the early hours every day and emails the summary to the designated mailbox.

  The entries above check each site's home page. To check sitemap.xml instead, add a line like this to /etc/crontab:

00 01 * * * root php /path/main.php china_0 sitemap.xml

  Later, main.php could also be modified specifically for checking sitemap.xml.
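
  As one possible direction for such a sitemap-specific change, the sketch below validates the downloaded body as XML instead of searching it for keywords. The function name check_sitemap_content and its return convention (an empty string means OK) are assumptions for illustration, and it relies on PHP's DOM extension being available.

```php
<?php
// Hypothetical helper, not part of the original main.php.
// Returns '' when $content looks like a usable sitemap, else a short reason.
function check_sitemap_content($content) {
    if (!is_string($content) || trim($content) === '') {
        return 'empty response';
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $ok = $doc->loadXML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return 'not well-formed XML';
    }
    // A sitemap file has <urlset> as root, a sitemap index has <sitemapindex>.
    $root = $doc->documentElement->localName;
    if ($root !== 'urlset' && $root !== 'sitemapindex') {
        return "unexpected root element: $root";
    }
    // Both kinds list their targets in <loc> elements.
    if ($doc->getElementsByTagName('loc')->length === 0) {
        return 'no <loc> entries';
    }
    return '';
}
```

  In check_page, a non-empty return value could be passed to alarm() as the code whenever $page_name is sitemap.xml, in place of the should_include / should_not_include tests.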

 
