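I just posted "A Website Monitoring Program I Wrote Myself", which can run fairly complex inspections across several series of websites; setting its second parameter to sitemap.xml makes it check each site's sitemap.
However, I noticed that I had earlier written a simpler sitemap.xml checker, monitor_xmlsitemap.php, so here is its PHP source code as well: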
<?php
// Fetch one sitemap URL and report its length, fetch time, and file type.
function check($host) {
  //$keyword = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
  // Root-element signatures of the two sitemap file types.
  $keyword_index = 'sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
  $keyword_urlset = 'urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
  $url = "https://$host";
  $time_start = microtime(true);
  print "----------\n";
  print "url = $url\n";
  $content = file_get_contents($url);
  $time_end = microtime(true);
  $time_long = $time_end - $time_start;
  if ($content === false) {
    // file_get_contents() returns false on failure (DNS error, timeout, HTTP error).
    $content = '';
  }
  print 'content length = '.strlen($content)."\n";
  print "time = $time_long s\n";
  // strpos() returns 0 for a match at offset 0, so compare strictly against false.
  if (strpos($content, $keyword_index) !== false) {
    print "index file\n";
  } elseif (strpos($content, $keyword_urlset) !== false) {
    print "urlset file\n";
  } else {
    print "error: not sitemap file!\n";
  }
  print "\n";
}

print "monitor start\n";
$sites = array(
  'www.jamesqi.com',
  'jamesqi.com',
  'www.youbianku.cn',
  'w.youbianku.cn',
  'wyoubianku.cn',
  'www.baidu.com',
  'www.google.com'
);
//print_r($sites);
$uri = '/sitemap.xml';
foreach ($sites as $site) {
  // Host name plus path, e.g. www.jamesqi.com/sitemap.xml
  $host = "$site$uri";
  //print "host = $host\n";
  check($host);
}
print "monitor end\n";
?>
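For each site, the output reports the URL, the content length in bytes, the fetch time, and the content type (index file, urlset file, or error). The script is meant to be run by hand on its own.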
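The script above decides the file type by searching for an exact substring, so a sitemap that uses single quotes or puts another attribute before xmlns would be reported as an error. As a rough sketch of one alternative, the check could parse the XML and look at the root element instead. The function name classify_sitemap, the constant SITEMAP_NS, and the return values 'index', 'urlset', and 'error' are made up for this illustration and are not part of monitor_xmlsitemap.php; the sketch also assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper, not part of the original monitor_xmlsitemap.php.
// Assumes the DOM extension (DOMDocument) is enabled.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Returns 'index', 'urlset', or 'error' based on the XML root element.
function classify_sitemap(string $content): string
{
    // loadXML() rejects an empty string, so handle that case up front.
    if ($content === '') {
        return 'error';
    }
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok || $doc->documentElement === null) {
        return 'error';
    }
    $root = $doc->documentElement;
    // The root element must live in the sitemap namespace.
    if ($root->namespaceURI !== SITEMAP_NS) {
        return 'error';
    }
    switch ($root->localName) {
        case 'sitemapindex':
            return 'index';
        case 'urlset':
            return 'urlset';
        default:
            return 'error';
    }
}
```

Inside check(), the two strpos() tests could then be replaced by a single call such as classify_sitemap($content), with the printed message chosen from the returned value. The trade-off is that an HTML error page or truncated download is rejected as malformed XML rather than simply failing to match a keyword.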