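Every so often I find that sitemap.xml has gone missing on one of our Drupal site series. Most likely the cron run deleted the old sitemap.xml, and regenerating the new one then failed, either because the site holds so much data that the MySQL process was killed or because of some other timeout. We administrators rarely check whether the sitemap exists, while search engines request it again and again, so the file can stay missing and keep returning errors for quite a long time. (Workaround: in the Sitemap configuration, under ADVANCED SETTINGS, tick "Disable cron generation of sitemap files.", or run drush vset xmlsitemap_disable_cron_regeneration 1.)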
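Today it occurred to me to check for this with the PHP inspection program I wrote some time ago. On this blog I found the requirements I had written down earlier, "Development Requirements for a Website Monitoring Program" (《網站監控程序的開發需求》) and "Notes on Monitoring Website and Server Status" (《關于網站及服務器狀态監控的筆記》), both from 2013. I wrote the program myself in 2014. It is far from polished, but it is basically usable and has been inspecting our sites for several years, so this time I am simply posting the source code as well: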
<?php
/*
 * Website monitoring program
 * James Qi
 *
 * Usage: php main.php arg1 arg2
 * arg1: required, point_id, the monitoring point; it determines which site series this run checks
 * arg2: optional, page_name, the page to inspect, e.g. sitemap.xml or robots.txt; defaults to empty
 *
 * Notes:
 * November 2013: drafted the initial requirements; a colleague did a first implementation with a web UI covering part of the checks, but problems remained and it never went into production, so we used a free external monitoring service first
 * December 2014: implemented the basic site loop, page check and e-mail alarm; the database tables exist but are not used yet; for now configuration and execution are done in PHP only, with no web UI
 *
 * To do:
 * 1. Split into sub-programs: main.php, groups.inc.php, group.inc.php, site.inc.php, page.inc.php, robots.php, sitemap.php, etc.;
 * 2. Record to the database; consider a web UI;
 * 3. Review the list of sites to inspect more carefully and add to it;
 * 4. Check modes: whether to check the mobile version, other languages, the home page, category pages, list pages, content pages, etc.;
 * 5. Smart detection and so on; see http://jamesqi.com/node/442 "Development Requirements for a Website Monitoring Program".
 */

// Initialise the log buffer first, because log_print() appends to it.
$log = '';

if (isset($argv[1])) {
  $point_id = $argv[1];
  log_print("point_id = $point_id\n");
}
else {
  print "no arg, please add point_id to run this program like this: 'php main.php point_id'.\n";
  exit;
}

if (isset($argv[2])) {
  $page_name = $argv[2];
  log_print("page_name = $page_name\n");
}
else {
  $page_name = '';
}

//system settings
$should_include = 'Drupal 7,Drupal 6,Drupal,nokia_mobile,MediaWiki 1.23.5,MediaWiki 1.16.0,MediaWiki,sitemap,robots';
$should_not_include = 'Fatal error,Warning:,Notice:,403 Forbidden,Apache 2 Test Page,undefined,Error,';
$should_not_include .= 'Too many connections,PDOException,file_exists(),mysql_connect,Cannot contact the database server,A database query error has occurred,This wiki has a problem,Error:,HTTP request,Internal error,';
$should_not_include .= '報錯:,出錯:,警告:';
//$alarm = TRUE;
//$email = 'myemail@example.com';
$timeout = 60; // response-time threshold in seconds; an alarm is raised above it
$retry = 3;    // maximum number of retries
$delay = 30;   // delay between retries, in seconds

log_print("should_include = $should_include\n");
log_print("should_not_include = $should_not_include\n");
log_print("timeout = $timeout, retry = $retry, delay = $delay\n");

$count_group = 0;
$count_site = 0;
$count_page = 0;
$count_retry = 0;
$count_error = 0;
$count_header = 0;
$count_body = 0;
$time_header = 0;
$time_body = 0;

//monitor points settings
$point = array();
$point['china_0'] = array(
  'point_id' => 'china_0',
  'point_name' => 'China 0',
  'groups' => 'mingluji_gongshang,mingluji_hangye',
  'email' => 'myemail@example.com',
  'check_mode' => 'full'
);
$point['china_1'] = array(
  'point_id' => 'china_1',
  'point_name' => 'China 1',
  'groups' => 'jamesqi,jamesqi_wiki',
  'email' => 'myemail@example.com',
  'check_mode' => 'full'
);
$point['foreign_0'] = array(
  'point_id' => 'foreign_0',
  'point_name' => 'Foreign 0',
  'groups' => 'en18daonet,bizdirlib_usa',
  'email' => 'myemail@example.com',
  'check_mode' => 'full'
);
$point['foreign_1'] = array(
  'point_id' => 'foreign_1',
  'point_name' => 'Foreign 1',
  'groups' => 'chahaoba_wiki_language,youbianku_wiki_language',
  'email' => 'myemail@example.com',
  'check_mode' => 'full'
);
$point['test'] = array(
  'point_id' => 'test',
  'point_name' => 'Test',
  'groups' => 'test1,test2',
  'email' => 'myemail@example.com',
  'check_mode' => 'full'
);

// Stop early on an unknown point_id instead of reading an undefined index.
if (!isset($point[$point_id])) {
  print "point_id $point_id not found\n";
  exit;
}
$groups = $point[$point_id]['groups'];
$email = $point[$point_id]['email'];
$line = "groups = $groups, email = $email\n";
log_print($line);

//website group settings
$group = array();
$group['mingluji_gongshang'] = array(
  'group_id' => 'mingluji_gongshang',
  'group_name' => 'mingluji new gongshang',
  'base_url' => 'gongshang.mingluji.com',
  'sub_sites' => ',beijing,tianjin,hebei,neimenggu,shanxi,shanghai,anhui,jiangsu,zhejiang',
  'sub_mode' => 'sub_path',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'gongshang.mingluji.com',
  'example_site' => 'gongshang.mingluji.com/xizang',
  'check_mode' => 'full'
);
$group['mingluji_hangye'] = array(
  'group_id' => 'mingluji_hangye',
  'group_name' => 'mingluji hangye',
  'base_url' => 'hangye.mingluji.com',
  'sub_sites' => ',anquan,bangongwenjiao,baojian,baozhuang,caiwu,canyinyule,dianchi',
  'sub_mode' => 'sub_path',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'ditu.mingluji.com',
  'example_site' => 'ditu.mingluji.com/xizang',
  'check_mode' => 'full'
);
$group['jamesqi'] = array(
  'group_id' => 'jamesqi',
  'group_name' => 'jamesqi',
  'base_url' => 'jamesqi.com',
  'sub_sites' => ',m,jiapu',
  'sub_mode' => 'sub_domain',
  'cms' => 'Drupal 7',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-hans',
  'home_site' => 'jamesqi.com',
  'example_site' => 'jamesqi.com',
  'check_mode' => 'full'
);
$group['jamesqi_wiki'] = array(
  'group_id' => 'jamesqi_wiki',
  'group_name' => 'jamesqi wiki sites',
  'base_url' => 'jamesqi.com',
  'sub_sites' => 'www,mobile',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => 'sub_domain',
  'languages' => 'zh-cn',
  'home_site' => 'www.jamesqi.com',
  'example_site' => 'www.jamesqi.com',
  'check_mode' => 'full'
);
$group['en18daonet'] = array(
  'group_id' => 'en18daonet',
  'group_name' => 'en18daonet wiki',
  'base_url' => '18dao.net',
  'sub_sites' => 'en',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => 'm.',
  'languages' => 'en',
  'home_site' => 'en.18dao.net',
  'example_site' => 'en.18dao.net',
  'check_mode' => 'full'
);
$group['bizdirlib_usa'] = array(
  'group_id' => 'bizdirlib_usa',
  'group_name' => 'bizdirlib_usa',
  'base_url' => 'bizdirlib.com',
  'sub_sites' => 'usa,ak,al,ar,az,ca,co,ct,dc,de,fl,ga,hi,ia,id,il,in',
  'sub_mode' => 'sub_domain',
  'cms' => 'Drupal 6',
  'mobile_mode' => '/m',
  'languages' => '49',
  'home_site' => 'usa.bizdirlib.com',
  'example_site' => 'ak.bizdirlib.com',
  'check_mode' => 'full'
);
$group['chahaoba_wiki_language'] = array(
  'group_id' => 'chahaoba_wiki_language',
  'group_name' => 'chahaoba_wiki_language',
  'base_url' => 'chahaoba.com',
  'sub_sites' => 'ar,de,en,es,fr,it,ja,ko,pt,ru',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => '.m.',
  'languages' => '10',
  'home_site' => 'en.chahaoba.com',
  'example_site' => 'en.chahaoba.com',
  'check_mode' => 'full'
);
$group['youbianku_wiki_language'] = array(
  'group_id' => 'youbianku_wiki_language',
  'group_name' => 'youbianku_wiki_language',
  'base_url' => 'youbianku.com',
  'sub_sites' => 'ar,de,en,es,fr,it,ja,ko,pt,ru',
  'sub_mode' => 'sub_domain',
  'cms' => 'MediaWiki',
  'mobile_mode' => '.m.',
  'languages' => '10',
  'home_site' => 'en.youbianku.com',
  'example_site' => 'en.youbianku.com',
  'check_mode' => 'full'
);
//print_r($group);

//functions

// Build "base/sub" site names for groups whose sub-sites live in sub-paths.
function sub_path($base_url, $sub_sites) {
  $sites = '';
  foreach (explode(',', $sub_sites) as $value) {
    $value = trim($value);
    $sites .= ",$base_url/$value";
  }
  $sites = substr($sites, 1);
  return $sites;
}

// Build "sub.base" site names; an empty sub-site means the bare domain.
function sub_domain($base_url, $sub_sites) {
  $sites = '';
  foreach (explode(',', $sub_sites) as $value) {
    $value = trim($value);
    if ($value == '') {
      $sites .= ",$base_url";
    }
    else {
      $sites .= ",$value.$base_url";
    }
  }
  $sites = substr($sites, 1);
  return $sites;
}

function check_sites($sites) {
  foreach (explode(',', $sites) as $value) {
    $value = trim($value);
    //print "site = $value\n";
    check_site($value);
  }
}

function check_site($site) {
  global $count_site, $page_name;
  $count_site++;
  $line = "------------------------------------\n count_site = $count_site, site = $site\n";
  log_print($line);
  if ($page_name != '') {
    $page = "$site/$page_name";
  }
  else {
    $page = $site;
  }
  check_page($page);
}

function check_page($page) {
  global $count_page, $retry, $delay, $count_retry, $count_error;
  global $count_header, $count_body, $time_header, $time_body, $timeout;
  $count_page++;
  $url = "http://$page";
  for ($count_retry = 0; $count_retry <= $retry; $count_retry++) {
    $time_start = microtime(TRUE);
    $header = get_headers($url, 1);
    $time_end = microtime(TRUE);
    $time_long = round($time_end - $time_start, 3);
    $count_header++;
    $time_header = $time_header + $time_long;
    // get_headers() returns FALSE on failure, so only measure a real array.
    $header_length = is_array($header) ? strlen(array_to_string($header)) : 0;
    log_print("header length = $header_length bytes, time = $time_long seconds\n");
    print_r($header);
    if ($header == FALSE) {
      log_print("count_retry = $count_retry\n");
      sleep($delay);
    }
    else {
      break;
    }
  }
  // After a redirect the final status line is at index 1; either index may be missing.
  $ok_lines = array('HTTP/1.1 200 OK', 'HTTP/1.0 200 OK');
  $line0 = isset($header[0]) ? $header[0] : '';
  $line1 = isset($header[1]) ? $header[1] : '';
  if ($header == FALSE) {
    log_print("page = $page, code = get headers FALSE, count_retry = $count_retry\n");
    print_r($header);
    alarm($page, "get headers FALSE, retry $count_retry times", "get headers FALSE, retry $count_retry times");
  }
  elseif (in_array($line0, $ok_lines, TRUE) || in_array($line1, $ok_lines, TRUE)) {
    $time_start = microtime(TRUE);
    $content = file_get_contents($url);
    //$header = $http_response_header;
    $time_end = microtime(TRUE);
    $time_long = round($time_end - $time_start, 3);
    $count_body++;
    $time_body = $time_body + $time_long;
    log_print('content length = ' . strlen($content) . " bytes, time = $time_long seconds\n");
    $title = get_title($content);
    log_print("title = $title\n");
    if ($time_long > $timeout) {
      alarm($page, "time too long: $time_long > $timeout", array_to_string($header));
    }
    if (!should_include($content)) {
      alarm($page, 'should_include not found', $content);
    }
    else {
      log_print("$page should_include found\n");
    }
    if (!should_not_include($content)) {
      alarm($page, 'should_not_include found', $content);
    }
    else {
      log_print("$page should_not_include not found\n");
    }
  }
  else {
    $count_error++;
    log_print("page = $page, code = not 200 OK\n");
    print_r($header);
    alarm($page, $line0, array_to_string($header));
  }
}

// TRUE if at least one of the expected keywords appears in the page.
function should_include($content) {
  global $should_include, $count_error;
  $token = strtok($should_include, ',');
  while ($token !== false) {
    $token = trim($token);
    print "token=$token,";
    // Strict comparison: a match at position 0 must still count as found.
    if (strpos($content, $token) === false) {
      $token = strtok(',');
    }
    else {
      log_print("found = $token\n");
      return TRUE; //found
    }
  }
  print "\n";
  $count_error++;
  return FALSE; //not found
}

// TRUE if none of the error keywords appears in the page.
function should_not_include($content) {
  global $should_not_include, $count_error;
  $token = strtok($should_not_include, ',');
  while ($token !== false) {
    $token = trim($token);
    print "token=$token,";
    if (strpos($content, $token) === false) {
      $token = strtok(',');
    }
    else {
      log_print("found = $token\n");
      $count_error++;
      return FALSE; //found
    }
  }
  print "\n";
  return TRUE; //not found
}

function get_title($content) {
  $start = strpos($content, '<title>');
  $end = strpos($content, '</title>');
  // Pages without a complete <title> element yield an empty title.
  if ($start === false || $end === false) {
    return '';
  }
  return substr($content, $start + 7, $end - $start - 7);
}

function alarm($page, $code, $message) {
  global $email;
  $alarm = "ALARM: page = $page, code = $code\n";
  log_print($alarm);
  //print_r($header);
  // A mail subject must be a single line, so drop the trailing newline.
  sendmail($email, trim($alarm), $message);
}

function sendmail($mailbox, $subject, $message) {
  mail($mailbox, $subject, $message);
}

function log_print($string) {
  global $log;
  $log .= $string;
  print $string;
}

function array_to_string($array) {
  $string = '';
  foreach ($array as $key => $value) {
    // With get_headers($url, 1) a repeated header arrives as an array.
    if (is_array($value)) {
      $value = implode(' | ', $value);
    }
    $string .= "[$key] => $value,\n";
  }
  return $string;
}

//main loop
$time_from = time();
//print "groups = $groups\n";
$groups_array = explode(',', $groups);
//print_r ($groups_array);
foreach ($groups_array as $value) {
  if (!array_key_exists($value, $group)) {
    log_print("group $value not found\n");
  }
  else {
    $series = $group[$value];
    $group_id = $series['group_id'];
    $sub_mode = $series['sub_mode'];
    $base_url = $series['base_url'];
    $sub_sites = $series['sub_sites'];
    $count_group++;
    $line = "===========================================\n";
    $line .= "count_group = $count_group, group_id = $group_id, sub_mode = $sub_mode, base_url = $base_url, sub_sites = $sub_sites\n";
    log_print($line);
    switch ($sub_mode) {
      case 'sub_path':
        $sites = sub_path($base_url, $sub_sites);
        break;
      case 'sub_domain':
        $sites = sub_domain($base_url, $sub_sites);
        break;
      default:
        $sites = '';
        log_print("sub_mode $sub_mode error\n");
    }
    log_print("sites = $sites\n");
    // Skip groups with an unknown sub_mode instead of checking an empty site.
    if ($sites !== '') {
      check_sites($sites);
    }
  }
}
$time_to = time();
$time_total = $time_to - $time_from;
// Guard the averages against division by zero when nothing was checked.
$time_average = $count_site ? round($time_total / $count_site, 3) : 0;
$time_total = gmdate('H:i:s', $time_total);
$time_header_average = $count_header ? round($time_header / $count_header, 3) : 0;
$time_body_average = $count_body ? round($time_body / $count_body, 3) : 0;
$mailbox = $email;
log_print("===========================================\n");
$subject = "point $point_id: $count_group groups $count_site sites $count_error errors";
$subject .= ", total: $time_total average: $time_average seconds, header $time_header_average / body $time_body_average";
log_print("$subject\n");
$line = "from " . date('Y-m-d H:i:s', $time_from) . " to " . date('Y-m-d H:i:s', $time_to) . "\n";
$line .= "time_header/count_header = $time_header/$count_header, time_body/count_body = $time_body/$count_body\n";
log_print($line);
$message = $log;
sendmail($mailbox, $subject, $message);
print "$log";
?>
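The program above treats a page as healthy only when the status line is literally "HTTP/1.1 200 OK" or "HTTP/1.0 200 OK". A more tolerant check would parse the numeric status code instead. The following is only a sketch of that idea: status_code_from_line() and final_status_code() are hypothetical helper names that do not exist in main.php, and the sketch assumes the headers come from get_headers($url, 1), where status lines sit under integer keys.

```php
<?php
// Hypothetical helpers, not part of the original main.php.

// Extract the numeric status code from a status line such as
// "HTTP/1.1 200 OK" or "HTTP/2 200"; return null if it does not parse.
function status_code_from_line($status_line) {
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', (string) $status_line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// With get_headers($url, 1), each redirect hop leaves its own status line
// under the integer keys 0, 1, 2, ...; the last one is the final response.
function final_status_code($headers) {
    if (!is_array($headers)) {
        return null; // get_headers() returned FALSE
    }
    $code = null;
    foreach ($headers as $key => $value) {
        if (is_int($key) && is_string($value)) {
            $parsed = status_code_from_line($value);
            if ($parsed !== null) {
                $code = $parsed; // keep overwriting so the last status line wins
            }
        }
    }
    return $code;
}
```

In check_page() the condition could then become final_status_code($header) === 200, which would also accept servers that answer with a different protocol version or reason phrase.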
Then add scheduled runs to /etc/crontab on Linux. On the server in China, which checks the sites hosted in China:
00 01 * * * root php /path/main.php china_0
30 04 * * * root php /path/main.php china_1
On the overseas server, which checks the sites hosted abroad:
00 02 * * * root php /path/main.php foreign_0
30 05 * * * root php /path/main.php foreign_1
This way the inspection runs automatically in the early hours every day and e-mails the summary to the designated mailbox.
The entries above check each site's home page. To check sitemap.xml instead, add an entry like this to /etc/crontab:
00 01 * * * root php /path/main.php china_0 sitemap.xml
Later on, main.php could also be modified to add checks aimed specifically at sitemap.xml.
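As one possible direction for such a modification, the downloaded body could be tested for the sitemap root element and a non-zero number of <loc> entries, rather than only for the keyword "sitemap". This is a minimal sketch under that assumption: count_sitemap_entries() is a hypothetical name, not an existing function in main.php, and it uses simple pattern matching instead of a full XML parser.

```php
<?php
// Hypothetical helper, not part of the original main.php.
// Returns the number of non-empty <loc> entries when $content looks like a
// sitemap (<urlset>) or a sitemap index (<sitemapindex>), or FALSE when the
// body is not a sitemap at all (failed download, HTML error page, ...).
function count_sitemap_entries($content) {
    if (!is_string($content)) {
        return false; // file_get_contents() returns FALSE on failure
    }
    // The root element must be <urlset ...> or <sitemapindex ...>.
    if (!preg_match('#<(urlset|sitemapindex)[\s>]#', $content)) {
        return false;
    }
    // Count <loc> elements that actually contain a URL.
    return preg_match_all('#<loc>\s*[^<\s][^<]*</loc>#', $content);
}
```

check_page() could call this when $page_name is sitemap.xml and raise an alarm if the result is FALSE or 0, which would catch both a missing file and an empty one left behind by a failed regeneration.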