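I just wrote up a post, 《自己编写的网站监控程序》 ("A Website Monitoring Program I Wrote Myself"), describing a program that can run fairly complex checks across several series of websites; setting its second parameter to sitemap.xml makes it check sitemaps.
However, I noticed that I had earlier written a simpler sitemap.xml checker, monitor_xmlsitemap.php, so here is its PHP source code as well: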
<?php
// Fetch one sitemap URL and report its size, fetch time, and type.
function check($host) {
    // Root-element markers: a sitemap index file vs. a plain URL set.
    $keyword_index = 'sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
    $keyword_urlset = 'urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
    $url = "https://$host";
    $time_start = microtime(true);
    print "----------\n";
    print "url = $url\n";
    // file_get_contents() returns false when the fetch fails.
    $content = file_get_contents($url);
    $time_end = microtime(true);
    $length = ($content === false) ? 0 : strlen($content);
    print 'content length = '.$length."\n";
    $time_long = $time_end - $time_start;
    print "time = $time_long s\n";
    // strpos() returns 0 for a match at offset 0, so compare strictly with false.
    if ($content === false) {
        print "error: fetch failed!\n";
    } elseif (strpos($content, $keyword_index) !== false) {
        print "index file\n";
    } elseif (strpos($content, $keyword_urlset) !== false) {
        print "urlset file\n";
    } else {
        print "error: not sitemap file!\n";
    }
    print "\n";
}
print "monitor start\n";
// Hosts to check; each one is combined with $uri below.
$sites = array(
    'www.jamesqi.com',
    'jamesqi.com',
    'www.youbianku.cn',
    'w.youbianku.cn',
    'wyoubianku.cn',
    'www.baidu.com',
    'www.google.com'
);
//print_r($sites);
$uri = '/sitemap.xml';
foreach ($sites as $site) {
    // Despite the name, $host holds host plus path, e.g. www.jamesqi.com/sitemap.xml.
    $host = "$site$uri";
    //print "host = $host\n";
    check($host);
}
print "monitor end\n";
?>
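For each site, the output reports the URL, the content length in bytes, the fetch time, and the kind of content (index file / URL-set file / error content). The script is meant to be run by hand on its own.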
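The content-type check above relies on substring matching, which depends on the exact attribute order and quoting in the root tag. As a rough alternative sketch, the same three-way classification could be done by parsing the XML and looking at the root element. The function name classify_sitemap and its return values are hypothetical and not part of the original script, and the sketch assumes the SimpleXML extension is available.

```php
<?php
// Hypothetical helper, not part of monitor_xmlsitemap.php:
// classify sitemap content by its XML root element.
// Returns 'index', 'urlset', or 'error'.
function classify_sitemap(string $content): string {
    $sitemap_ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    // Collect libxml errors internally so malformed input only yields false.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return 'error'; // not well-formed XML (e.g. an HTML page)
    }
    // The root element must be in the sitemap namespace.
    if (!in_array($sitemap_ns, $xml->getNamespaces(), true)) {
        return 'error';
    }
    switch ($xml->getName()) {
        case 'sitemapindex':
            return 'index';
        case 'urlset':
            return 'urlset';
        default:
            return 'error';
    }
}
```

Inside check(), the fetched $content could be passed to this function in place of the strpos() calls. Parsing costs more than a substring search on large sitemaps, so the original approach may still be preferable for a quick manual check.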