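Recently I've been adding Baidu MIP (Mobile Instant Page, a mobile web page accelerator) versions to our websites. After modifying a site, we first check it with the MIP Validator and the Preview tool. If there are no major problems, we can simply wait for Baiduspider to crawl it. We can also submit the MIP versions proactively in Baidu Webmaster Tools, so that Baiduspider learns about them sooner and more completely.
After logging in to Baidu Webmaster Tools, pick an existing site (a site that isn't listed yet must first be verified and added) and open the menu "移动专区 → MIP引入" (Mobile Zone → MIP Import). You first have to accept the agreement "《百度MIP资源接入内容责任书》" (Baidu MIP Resource Access Content Responsibility Statement). You then see two options, "手动提交" (manual submission) and "主动推送(实时)" (active push, real-time). Manual submission accepts 20 MIP page links at a time, while active push submits links programmatically: up to 10,000 MIP page links per day, with no more than 2,000 per run.
We used PHP: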
<?php
$urls = array(
    'https://xunren.longren.com/?mip',
    'https://xunren.longren.com/node?mip',
);
$api = 'http://data.zz.baidu.com/urls?site=xunren.longren.com&token=xxxxxx&type=mip';
$ch = curl_init();
$options = array(
    CURLOPT_URL => $api,
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS => implode("\n", $urls),
    CURLOPT_HTTPHEADER => array('Content-Type: text/plain'),
);
curl_setopt_array($ch, $options);
$result = curl_exec($ch);
echo $result;
?>
The output was:
{"remain":9998,"success":2}
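To build the URL list, copy the contents of sitemap.xml into a text file and do the necessary find-and-replace so that each line holds one MIP URL. Then split the list into suitably sized chunks (at most 2,000 URLs each), paste each chunk into the program above, and run it.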
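The manual sitemap-to-list step can also be scripted. The following is a minimal PHP sketch, not the program we actually used: it assumes a plain sitemaps.org urlset file whose loc entries are the canonical page URLs, and the function names sitemap_to_mip_urls() and mip_batches() are made up for this example.

```php
<?php
// Hypothetical helpers (not part of the original scripts).
// Assumption: a standard sitemaps.org <urlset> sitemap; the "?mip" suffix
// follows the Drupal convention used in this post.

/** Return every <loc> URL of the sitemap with the MIP suffix appended. */
function sitemap_to_mip_urls(string $xml, string $suffix = '?mip'): array
{
    $sitemap = new SimpleXMLElement($xml);
    $urls = array();
    // The sitemap uses a default (unprefixed) namespace, which SimpleXML
    // matches with plain property access.
    foreach ($sitemap->url as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc !== '') {
            $urls[] = $loc . $suffix;
        }
    }
    return $urls;
}

/** Split the URL list into push-sized batches (at most 2000 URLs per run). */
function mip_batches(array $urls, int $size = 2000): array
{
    return array_chunk($urls, $size);
}
```

A sitemap index, which lists sitemap entries rather than url entries, would need each child sitemap fetched and parsed first; the sketch does not handle that case.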
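Some time ago I added a MIP version to my personal blog. Today the indexed and validated counts in Baidu Webmaster Tools both look normal, but impressions and clicks are still very low. I'll keep watching.
Update, June 23, 2017: Baidu's rate limit on MIP submissions is a real nuisance, especially for our sites with huge numbers of pages: a site with several hundred thousand pages takes dozens of days to submit, which costs a lot of manual effort. Yesterday I wrote a program that reads a prepared list of links from a file and submits one portion of it. Scheduled through Linux cron, it only needs to be set up once; after that it automatically submits 5 batches, 10,000 URLs in total, every day, and all that's left to do by hand is check that the log file looks normal. The current submission result looks like this: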
{"remain":4996000,"success":2000,"success_mip":2000,"remain_mip":6000}
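Since the response is JSON, a run can be checked programmatically rather than by eye. A hedged sketch: parse_push_result() is a hypothetical helper, and it relies only on the fields visible in the sample responses in this post (success, remain, success_mip, remain_mip); it assumes nothing about the format of error responses.

```php
<?php
// Hypothetical helper: decide whether a push batch was fully accepted.
// Field names come from the sample responses in this post only.
function parse_push_result(string $body, int $expected): array
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        return array('ok' => false, 'reason' => 'response is not a JSON object');
    }
    // MIP pushes report success_mip; fall back to success when it is absent.
    $accepted = $data['success_mip'] ?? $data['success'] ?? 0;
    return array(
        'ok' => $accepted === $expected,
        'accepted' => $accepted,
        // Remaining daily quota, again preferring the MIP-specific field.
        'remain' => $data['remain_mip'] ?? $data['remain'] ?? null,
    );
}
```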
If you have many sites, you can put one schedule file per site in the /etc/cron.d directory, for example xihanhanxicidian.cron.txt:
10 17 25 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 0001 2000
15 17 25 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 2001 2000
20 17 25 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 4001 2000
25 17 25 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 6001 2000
30 17 25 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 8001 2000
30 8 26 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 10001 2000
35 8 26 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 12001 2000
40 8 26 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 14001 2000
45 8 26 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 16001 2000
50 8 26 8 * root php /root/mip/submit_mip.php xihanhanxi.18dao.cn /root/mip/mip_xihanhanxi.18dao.cn.txt 18001 2000
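Typing these lines by hand becomes error-prone once there are dozens of batches. The sketch below generates them; cron_lines() is a hypothetical helper, and its defaults (2000 URLs per run, 5 runs per day 5 minutes apart, so 10,000 URLs per day) are assumptions chosen to match the quota and schedule described above.

```php
<?php
// Hypothetical generator for /etc/cron.d lines like the ones above.
// Assumptions: 2000 URLs per run, 5 runs per day, 5 minutes between runs,
// and the /root/mip paths used in this post.
function cron_lines(string $domain, string $dataFile, int $total, string $firstRun,
                    string $script = '/root/mip/submit_mip.php', int $batch = 2000,
                    int $runsPerDay = 5, int $gapMinutes = 5): array
{
    $lines = array();
    $runs = (int) ceil($total / $batch);
    $start = new DateTimeImmutable($firstRun);
    for ($i = 0; $i < $runs; $i++) {
        $dayIndex = intdiv($i, $runsPerDay);   // which day this run falls on
        $slot = $i % $runsPerDay;              // which run within that day
        $at = $start->modify("+$dayIndex day")
                    ->modify('+' . ($slot * $gapMinutes) . ' minutes');
        $offset = $i * $batch + 1;             // 1-based start line for submit_mip.php
        // Fields: minute hour day-of-month month day-of-week user command
        $lines[] = sprintf('%d %d %d %d * root php %s %s %s %d %d',
            (int) $at->format('i'), (int) $at->format('G'),
            (int) $at->format('j'), (int) $at->format('n'),
            $script, $domain, $dataFile, $offset, $batch);
    }
    return $lines;
}
```

Each generated line pins a minute, hour, day and month, so it fires only once a year; delete the file after the batches have run. When writing the output to /etc/cron.d, end the file with a newline, since cron may ignore a last line that lacks one.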
Contents of submit_mip.php:
<?php
/*
* submit mip links to baidu zhanzhang
* james qi 2017-6-22
* command line: php submit_mip.php $site_domain $file_name $start_number $length_number
* for example: php submit_mip.php jamesqi.com jamesqi.com.txt 10001 2000
*/
ini_set('memory_limit','1024M');
/*
print "
command line: php submit_mip.php $site_domain $file_name $start_number $length_number
for example: php submit_mip.php jamesqi.com jamesqi.com.txt 10001 2000
";
*/
if ( isset( $argv[1] ) ) {
$site_domain = $argv[1];
} else {
print "please provide mip links site domain (for example: jamesqi.com) arg\n";
exit;
}
if ( isset( $argv[2] ) ) {
$file_name = $argv[2];
} else {
print "please provide mip links file name (for example: jamesqi.com.txt) arg\n";
exit;
}
if ( isset( $argv[3] ) ) {
$start_number = $argv[3];
} else {
print "please provide mip links start number (for example: 10001) arg\n";
exit;
}
if ( isset( $argv[4] ) ) {
$length_number = $argv[4];
} else {
print "please provide mip links length number (for example: 2000) arg\n";
exit;
}
$log_file_name = "$file_name.log";
$log_file = fopen("$log_file_name", "a");
$date_time = "--------\n";
$date_time .= date("Y-m-d").' ';
$date_time .= date("h:i:sa");
$date_time .= "\n";
$args = "site_domain=$site_domain, file_name=$file_name, start_number=$start_number, length_number=$length_number\n";
$command = '/alidata/server/php/bin/php '.$argv[0].' '.$argv[1].' '.$argv[2].' '.$argv[3].' '.$argv[4]."\n";
print $date_time;
print $args;
print $command;
fwrite($log_file, $date_time);
fwrite($log_file, $args);
fwrite($log_file, $command);
$content = file_get_contents($file_name);
$array = explode("\r\n", $content);
$count = count($array);
if ($count <= 1) {
$array = explode("\n", $content);
$count = count($array);
if ($count <= 1) {
print "exit, count = $count\n";
exit;
}
}
print "count = $count\n";
fwrite($log_file, "count = $count\n");
/*
$urls = array(
'http://www.example.com/1.html',
'http://www.example.com/2.html',
);
*/
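// pick the requested slice of lines (start_number is 1-based)
$array_length = array_slice($array, (int)$start_number - 1, (int)$length_number);
//print_r($array_length);
$urls = array();
foreach ($array_length as $value) {
$value = trim($value);
if ($value === '') {
continue; // skip blank lines, e.g. the trailing newline at the end of the file
}
// the Drupal MIP version is the original URL plus the ?mip suffix
$urls[] = $value."?mip";
}
if (empty($urls)) {
print "exit, no urls in this range\n";
fwrite($log_file, "exit, no urls in this range\n");
fclose($log_file);
exit;
}
// drop anything before "https" in the first URL (e.g. a UTF-8 BOM at the start of the file)
$pos = strpos($urls[0], 'https');
if ($pos !== false) {
$urls[0] = substr($urls[0], $pos);
}
//print_r($urls);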
$api = 'http://data.zz.baidu.com/urls?site=https://'.$site_domain.'&token=your_token&type=mip';
print "api=$api\n";
fwrite($log_file, "api=$api\n");
$ch = curl_init();
$options = array(
CURLOPT_URL => $api,
CURLOPT_POST => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POSTFIELDS => implode("\n", $urls),
CURLOPT_HTTPHEADER => array('Content-Type: text/plain'),
);
curl_setopt_array($ch, $options);
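$result = curl_exec($ch);
// curl_exec() returns false on a transport error; log the reason instead of an empty result
if ($result === false) {
$result = 'curl error: '.curl_error($ch);
}
print "result = $result\n";
fwrite($log_file, "result = $result\n");
fclose($log_file);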
?>
A sample from the data file mip_xihanhanxi.18dao.cn.txt:
https://xinhuazidian.18dao.cn/zidian/%E8%82%A6
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%A6
https://xinhuazidian.18dao.cn/zidian/%E8%82%A7
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%A7
https://xinhuazidian.18dao.cn/zidian/%E8%82%A8
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%A8
https://xinhuazidian.18dao.cn/zidian/%E8%82%A9
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%A9
https://xinhuazidian.18dao.cn/zidian/%E4%B8%8B
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E4%B8%8B
https://xinhuazidian.18dao.cn/zidian/%E4%BA%80
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E4%BA%80
https://xinhuazidian.18dao.cn/zidian/%E5%8C%A2
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E5%8C%A2
https://xinhuazidian.18dao.cn/zidian/%E8%82%AA
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%AA
https://xinhuazidian.18dao.cn/zidian/%E8%82%AB
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%AB
https://xinhuazidian.18dao.cn/zidian/%E8%82%AC
https://xinhuazidian.18dao.cn/zh-hant/zidian/%E8%82%AC
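Each run appends to a log file named after the data file with .log added (for example mip_xihanhanxi.18dao.cn.txt.log). Check it to see the API's response and whether the submission succeeded.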
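Reading the log by eye gets tedious across many sites. As an illustrative sketch, failed_results() (a hypothetical helper) scans a log in the format written by submit_mip.php and returns every result line that is not a JSON object with a positive success count; for MIP pushes it looks at success_mip whenever that field is present, as in the sample response shown earlier.

```php
<?php
// Hypothetical log checker for the "result = ..." lines written by submit_mip.php.
function failed_results(string $log): array
{
    $failed = array();
    foreach (preg_split('/\R/', $log) as $line) {
        if (strpos($line, 'result = ') !== 0) {
            continue; // only the API reply lines matter here
        }
        $data = json_decode(substr($line, strlen('result = ')), true);
        // Prefer success_mip (MIP pushes); fall back to success otherwise.
        $ok = is_array($data) && (($data['success_mip'] ?? $data['success'] ?? 0) > 0);
        if (!$ok) {
            $failed[] = $line; // empty reply, curl error text, or nothing accepted
        }
    }
    return $failed;
}
```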
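Update, August 28, 2017: We used the method above for more than two months and finished submitting the MIP URLs of quite a few sites, yet strangely there was hardly any traffic from MIP. Last week a colleague found the cause: when writing the submission program, I had typed the suffix ?mip as ?amp, so of course none of the pages could pass validation. We also hadn't been checking the submission feedback regularly, and unlike Google, which automatically discovers AMP pages and sends an email when it finds problems with them, Baidu gave no warning. As a result, every link submitted over those two months was wrong. Sigh, a lot of wasted time! After fixing the program, I had to resubmit the correct MIP URLs.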
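A cheap safeguard against this kind of typo is to check each batch before sending it. The sketch below is an example under stated assumptions, not part of our scripts: check_mip_batch() is hypothetical, it assumes https URLs (which our sites use), and the 2000-URL limit is the per-push limit mentioned at the start of this post. For subdomain-style MIP sites, where no suffix is added, pass an empty suffix.

```php
<?php
// Hypothetical pre-submission check: returns a list of problems (empty = OK).
function check_mip_batch(array $urls, string $suffix = '?mip', int $max = 2000): array
{
    $errors = array();
    if (count($urls) > $max) {
        $errors[] = 'batch has ' . count($urls) . " URLs, limit is $max";
    }
    foreach ($urls as $i => $url) {
        if (strpos($url, 'https://') !== 0) {
            $errors[] = "line $i does not start with https://: $url";
        } elseif ($suffix !== '' && substr($url, -strlen($suffix)) !== $suffix) {
            // catches mistakes such as "?amp" where "?mip" was intended
            $errors[] = "line $i does not end with $suffix: $url";
        }
    }
    return $errors;
}
```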
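On our Drupal sites, the MIP version of a page is the original URL with a ?mip suffix, while for the MediaWiki sites we set up a separate second- or third-level domain. For example, the MIP version of the web page "https://www.jamesqi.com/首页" is "https://mip.jamesqi.com/首页". For batch submission, change submit_mip.php so that it submits the URLs without adding a suffix and save it as submit_subdomain.php. The URLs to submit can be taken from https://www.jamesqi.com/sitemap.xml. The commonly used namespaces (as numbered by the ns: magic word) are: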
ns:-2 - ns:Media
ns:-1 - ns:Special
ns:0 - main
ns:1 - ns:Talk
ns:2 - ns:User
ns:3 - ns:User talk
ns:4 - ns:Project
ns:5 - ns:Project talk
ns:6 - ns:File
ns:7 - ns:File talk
ns:8 - ns:MediaWiki
ns:9 - ns:MediaWiki talk
ns:10 - ns:Template
ns:11 - ns:Template talk
ns:12 - ns:Help
ns:13 - ns:Help talk
ns:14 - ns:Category
ns:15 - ns:Category talk
The sitemap links we need to copy:
https://www.jamesqi.com/sitemap-jamesqi_www-jingle-NS_0-0.xml
https://www.jamesqi.com/sitemap-jamesqi_www-jingle-NS_4-0.xml
https://www.jamesqi.com/sitemap-jamesqi_www-jingle-NS_6-0.xml
https://www.jamesqi.com/sitemap-jamesqi_www-jingle-NS_12-0.xml
https://www.jamesqi.com/sitemap-jamesqi_www-jingle-NS_14-0.xml
Tidy up and merge these lists, replace the subdomain with mip.jamesqi.com, and save the result as the text file mip.jamesqi.com.txt. Sample contents:
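The merge-and-replace step can be scripted as well. In this sketch, to_mip_subdomain() is a hypothetical helper that takes the URL lists extracted from the individual namespace sitemaps, keeps only www.jamesqi.com URLs, swaps the host for mip.jamesqi.com while leaving the percent-encoded path untouched, and removes duplicates.

```php
<?php
// Hypothetical helper: merge per-namespace URL lists and move them to the MIP subdomain.
// Assumption: every wanted line is an https URL on the $from host; others are dropped.
function to_mip_subdomain(array $lists, string $from = 'www.jamesqi.com',
                          string $to = 'mip.jamesqi.com'): array
{
    $prefix = "https://$from/";
    $out = array();
    foreach ($lists as $list) {
        foreach ($list as $url) {
            $url = trim($url);
            if (strpos($url, $prefix) !== 0) {
                continue;
            }
            // Only the host changes; the encoded path is kept byte for byte.
            $out["https://$to/" . substr($url, strlen($prefix))] = true;
        }
    }
    return array_keys($out); // array keys de-duplicate URLs found in several sitemaps
}
```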
https://mip.jamesqi.com/027.cn%E7%9A%84%E5%AD%90%E5%9F%9F%E5%90%8D%E8%A2%AB%E7%99%BE%E5%BA%A6%E8%A7%A3%E5%B0%81%E4%BA%86
https://mip.jamesqi.com/027%E5%8D%9A%E5%AE%A2%E5%B0%86%E5%8D%87%E7%BA%A7%E4%B8%BA%E4%B8%AA%E4%BA%BA%E9%97%A8%E6%88%B7%EF%BC%8C%E5%8A%9F%E8%83%BD%E5%85%88%E7%9D%B9%E4%B8%BA%E5%BF%AB%EF%BC%81
https://mip.jamesqi.com/%E5%88%86%E7%B1%BB:%E9%BB%84%E9%A1%B5
https://mip.jamesqi.com/%E5%88%86%E7%B1%BB:%E9%BC%A0%E6%A0%87
https://mip.jamesqi.com/%E5%88%86%E7%B1%BB:%E9%BD%90%E8%BE%BE%E5%86%85
https://mip.jamesqi.com/%E5%88%86%E7%B1%BB:%E9%BE%99%E4%BA%BA
Then run:
php /root/mip/submit_subdomain.php mip.jamesqi.com /root/mip/mip.jamesqi.com.txt 0001 2000
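When there are more than 10,000 URLs to submit, the submission has to be spread over several days; schedule it with cron as described above. Put a schedule file in /etc/cron.d, for example mip.jamesqi.com.cron.txt: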
10 17 25 8 * root php /root/mip/submit_subdomain.php mip.jamesqi.com /root/mip/mip.jamesqi.com.txt 0001 2000
15 17 25 8 * root php /root/mip/submit_subdomain.php mip.jamesqi.com /root/mip/mip.jamesqi.com.txt 2001 2000