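A few days ago, while submitting sitemaps for our family of sites, I found that submitting hundreds of sitemaps by hand (multiple languages, plus mobile versions) was just too painful. Endless click, copy and paste makes you drowsy, and by my estimate it would take at least several dozen hours of extremely tedious work.
I knew that the Google Webmaster Data API could be used to submit sitemaps in bulk, but I didn't know how to use it. Some searching turned up no lead, so I shelved the idea. Later I tried submitting through robots.txt, but in my experience this only worked for a sitemap in the site's root directory (for example http://che.postcodebase.com/sitemap.xml); sitemaps under a subdirectory (or sub-path), for example http://che.postcodebase.com/m/ar/sitemap.xml, were not recognized.
Two days ago I found an article, "Google WebMaster API (PHP) for submitting dynamic sitemaps", which was the closest thing to the solution we needed, but the download link mentioned in it was dead, so I didn't take it further. Today I resolved to get it solved no matter what. I asked a programmer colleague for advice, then debugged and modified the code line by line; when I ran into a PHP configuration problem I switched to another server to debug, and in the end it worked!
I have a love-hate relationship with that article. I'm grateful that its example showed a way to do this in PHP, but some of the configuration was not explained clearly and the program contained errors, which cost me a lot of extra time. Here are the key points: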
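- The Zend_GData download location has moved; the link in that article no longer works.
- After downloading Zend_GData, there is no need to download the full Zend Framework separately.
- Zend only works if its library is on the include_path. Either edit php.ini or add this to the program: ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
- $sitemap-location should be $sitemap_location, and it must be assigned first, set to the URL of the sitemap to submit.
- The program contains two stray pieces of erroneous code, an extra <entry...> and an extra </entry>.
- In $result=$fdata->post($xml,"https://www.google.com/webmasters/tools/feeds/http://yoursite the second http:// is not URL-encoded, and it has to be.
- The PHP runtime must support SSL so that it can reach Google Accounts over https to obtain authorization.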
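Two of the points above, the URL-encoded site address inside the feed URL and the SSL requirement, can be checked before touching the API. The following is only a minimal sketch; the function names sitemaps_feed_url and https_available are my own and are not part of Zend_GData.

```php
<?php
// Hypothetical helper (not part of Zend_GData): build the sitemaps feed URL
// for a site, percent-encoding the site URL that is embedded in the path.
function sitemaps_feed_url($site_url)
{
    // rawurlencode() turns "http://" into "http%3A%2F%2F" and "/" into "%2F".
    return 'https://www.google.com/webmasters/tools/feeds/'
        . rawurlencode($site_url) . '/sitemaps/';
}

// Rough check that this PHP build can talk https at all.
function https_available()
{
    return extension_loaded('openssl')
        && in_array('https', stream_get_wrappers(), true);
}

echo sitemaps_feed_url('http://cyp.postcodebase.com/'), "\n";
echo https_available() ? "https: ok\n" : "https: missing (enable openssl)\n";
```

rawurlencode() leaves dots unencoded, whereas the feed URL in my script writes them as %2E. The two forms are equivalent percent-encodings, so this is a difference in spelling rather than in meaning.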
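At least it works now. In a short while I successfully submitted several hundred sitemaps for many sites, so the hours spent were not wasted! Finally, here is the code I used: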
<?php
// Make the Zend_GData library loadable.
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
echo "load start<br />\n";
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Provide Google Account Information
$email = 'abc@gmail.com';
$passwd = '123';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
    exit(1); // no authenticated client, so there is nothing to submit with
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
    exit(1);
}
echo "try end<br />\n";

$sitemap_location = 'http://cyp.postcodebase.com/sitemap.xml';
add_sitemap($sitemap_location, $client);

// Submit one sitemap by POSTing an Atom entry to the site's sitemaps feed.
function add_sitemap($sitemap_location, $client) {
    $xml  = '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:wt="http://schemas.google.com/webmasters/tools/2007"><id>' . $sitemap_location . '</id>';
    $xml .= "<category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/webmasters/tools/2007#sitemap-regular'/><wt:sitemap-type>WEB</wt:sitemap-type></entry>";
    $fdata = new Zend_Gdata($client);
    echo "post start<br />\n";
    // The site URL inside the feed path is URL-encoded and must be the site that owns the sitemap.
    $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/http%3A%2F%2Fcyp%2Epostcodebase%2Ecom%2F/sitemaps/", null, "application/atom+xml");
    echo "$sitemap_location<br />\n";
    echo "post end<br />\n";
}
?>
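I left in a few echo debugging statements to make it easy to see what is happening. This PHP program (/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/submit.php) can be run from the command line, or by opening its URL on the server hawk726 in a browser; the result is the same.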
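For a batch run, the list of sitemap URLs can be generated rather than typed. This is a rough sketch under an assumption: each site has a root sitemap plus one mobile sitemap per language at /m/<lang>/sitemap.xml, following the two example URLs earlier in this post. The function name build_sitemap_urls and the language list are made up for illustration.

```php
<?php
// Sketch: list every sitemap URL for a set of subdomains.
// Assumed layout: /sitemap.xml plus /m/<lang>/sitemap.xml per language.
function build_sitemap_urls(array $subdomains, array $langs, $domain)
{
    $urls = array();
    foreach ($subdomains as $sub) {
        $base = "http://$sub.$domain";
        $urls[] = "$base/sitemap.xml";
        foreach ($langs as $lang) {
            $urls[] = "$base/m/$lang/sitemap.xml";
        }
    }
    return $urls;
}

$urls = build_sitemap_urls(array('che', 'cyp'), array('ar', 'en'), 'postcodebase.com');
foreach ($urls as $url) {
    echo $url, "\n";
    // Each URL would then be passed to add_sitemap(). Note that add_sitemap()
    // as posted hard-codes the feed URL of one site, so for several sites the
    // encoded site URL would have to be passed in as well.
}
```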
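April 2013 update: new sites can also be added and verified through the API. I only got adding to work; verification did not succeed. The program for adding a site: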
<?php
//echo phpinfo();
//include_path='/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/library';
//include_path='';
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
echo "load start<br />\n";
require_once 'Zend/Loader.php';
//$dirs="/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata";
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Subdomains to add; extend this array for a batch run.
//$subdomain="jpn";
$subdomain_array = array("afg");

// Provide Google Account Information
$email = 'email@gmail.com';
$passwd = 'password';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
    exit(1); // no authenticated client, so nothing can be added
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
    exit(1);
}
echo "try end<br />\n";

foreach ($subdomain_array as $subdomain) {
    $post_address = "https://www.google.com/webmasters/tools/feeds/sites/";
    echo "post_address: $post_address<br />\n";
    $site_url = "http://$subdomain.bizdirlib.com/";
    add_site($site_url, $client, $post_address);
}

// Add one site by POSTing an Atom entry whose content src is the site URL.
function add_site($site_url, $client, $post_address) {
    $xml  = "<atom:entry xmlns:atom='http://www.w3.org/2005/Atom'>";
    $xml .= '<atom:content src="' . $site_url . '" />';
    $xml .= "</atom:entry>";
    $fdata = new Zend_Gdata($client);
    //echo "post start<br />\n";
    echo "site_url: $site_url<br />\n";
    $result = $fdata->post($xml, $post_address, null, "application/atom+xml");
    //echo "post end<br />\n";
}
?>
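Adding a site uses the same POST approach as submitting a sitemap. Verification needs PUT, and fetching the site list needs GET. I haven't found examples of PUT or GET yet and I'm not familiar with them, so I'll set this aside for now.
Update, September 22, 2013: to handle verification, changing the geolocation and similar tasks, I struggled with PUT for a long time without success. After searching everywhere I finally found a very good piece of code (Webmaster Tools, or WebmasterTools.php). WebmasterTools.php now lives on hawk726 under /var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/: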
<?php
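class WebmasterTools {
    // Key/value pairs returned by ClientLogin (the 'Auth' token is used below).
    public $auth = array();

    // Log in on construction. (__construct replaces the old PHP 4 style
    // constructor, which PHP 8 no longer treats as a constructor.)
    function __construct($username, $password) {
        $this->_Login($username, $password);
    }

    // Send one HTTP request and return the response body (false on failure).
    function _Http($method, $url, $contentType, $content='') {
        $method = strtoupper($method);
        $opts = array('http' =>
            array(
                'method' => $method,
                'protocol_version' => 1.0,
                'header' => 'Content-type: ' . $contentType .
                    (isset($this->auth) && isset($this->auth['Auth']) ? "\nAuthorization: GoogleLogin auth=" . $this->auth['Auth'] : '' ) .
                    "\nContent-Length: " . strlen($content),
                'content' => $content
            )
        );
        $context = stream_context_create($opts);
        // @ hides warnings, so a failed request only shows up as a false result.
        $result = @file_get_contents($url, false, $context);
        return $result;
    }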
    // Authenticate against ClientLogin and keep the returned tokens.
    function _Login($username, $password, $service='sitemaps') {
        $postdata = http_build_query(
            array('accountType' => 'GOOGLE',
                'Email' => $username,
                'Passwd' => $password,
                'source' => 'WebmasterTools-Class',
                'service'=> $service)
        );
        $login = $this->_Http('POST', 'https://www.google.com/accounts/ClientLogin', 'application/x-www-form-urlencoded', $postdata);
        // The response holds one "Key=value" pair per line.
        $lines = explode("\n", (string) $login);
        $data = array();
        foreach ($lines as $line) {
            // Skip blank lines and split on the first "=" only.
            if (strpos($line, '=') === false) continue;
            list($var, $value) = explode('=', $line, 2);
            $data[$var] = trim($value);
        }
        $this->auth = $data;
    }
    // Concatenate the direct text children of a DOM node.
    function _GetText($node) {
        $text = '';
        for ($i=0; $i < $node->childNodes->length; $i++) {
            $child = $node->childNodes->item($i);
            if ($child->nodeType==XML_TEXT_NODE)
                $text .= $child->wholeText;
        }
        return $text;
    }

    // array_elements_in has the set of tags we should treat as arrays
    // because they may repeat.
    function _ElementToArray($node, $array_elements_in = array()) {
        $row = array();
        $array_elements = array();
        foreach ($array_elements_in as $array_element)
            $array_elements[$array_element] = true;
        for ($i=0; $i < $node->childNodes->length; $i++) {
            $item = $node->childNodes->item($i);
            if (!isset($item->tagName)) continue;
            $children = $this->_ElementToArray($item, $array_elements_in);
            if (count($children) > 0) {
                $value = $children;
            } else {
                $value = $this->_GetText($item);
            }
            if (isset($array_elements[$item->tagName])) {
                if (!isset($row[$item->tagName])) $row[$item->tagName] = array();
                $row[$item->tagName][] = $value;
            } else
                $row[$item->tagName] = $value;
        }
        return $row;
    }
    // Call one Webmaster Tools feed URL. For POST/PUT an Atom entry is built
    // from $site and $params; the XML response is returned as a nested array.
    function _callWMT($method, $url, $site='', $params = array(), $array_elements_in = array()) {
        $method = strtolower($method);
        $site = "http://$site/";
        $url = str_replace('{site}', urlencode($site), $url);
        $xml = '';
        if ($method=='post' || $method=='put') {
            $doc = new DOMDocument('1.0', 'utf-8');
            $root = $doc->createElementNS("http://www.w3.org/2005/Atom", 'atom:entry' );
            if (count($params) > 0) {
                $root->setAttributeNS('http://www.w3.org/2000/xmlns/','xmlns:wt','http://schemas.google.com/webmasters/tools/2007');
            }
            $doc->appendChild($root);
            $element = $doc->createElement('atom:id', $site);
            $root->appendChild($element);
            if (count($params) > 0) {
                $element = $doc->createElement('atom:category');
                $element->setAttribute('scheme','http://schemas.google.com/g/2005#kind');
                $element->setAttribute('term','http://schemas.google.com/webmasters/tools/2007#site-info');
                $root->appendChild($element);
            } else {
                $element = $doc->createElement('atom:content');
                $element->setAttribute('src',$site);
                $root->appendChild($element);
            }
            foreach ($params as $tag => $value) {
                if (is_array($value)) {
                    // '_value' is the element text; every other key becomes an attribute.
                    $element = $doc->createElement("wt:$tag", $value['_value']);
                    foreach ($value as $att => $attValue) {
                        if ($att=='_value') continue;
                        $element->setAttribute($att, $attValue);
                    }
                } else {
                    $element = $doc->createElement("wt:$tag", $value);
                }
                // Append in both cases; the array case was never attached before.
                $root->appendChild($element);
            }
            $xml = $doc->saveXML();
        }
        $body = $this->_Http($method, $url, "application/atom+xml", $xml);
        if ($body!='') {
            $doc = new DOMDocument();
            $success = $doc->loadXML($body);
            return $this->_ElementToArray($doc, $array_elements_in);
        } else {
            return false;
        }
    }
    function createSite($site) {
        $this->_callWMT('post', 'https://www.google.com/webmasters/tools/feeds/sites/', $site);
        // Google does send Content-Length back and get_contents fails so we get the site again !
        return $this->getSite($site);
    }

    function deleteSite($site) {
        return $this->_callWMT('delete', 'https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
    }

    function setGeoLocation($site, $location) {
        return $this->_callWMT('put',"https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('geolocation' => $location));
    }

    function setPreferredDomain($site, $domain='') {
        if ($domain=='') $domain = $site;
        return $this->_callWMT('put',"https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('preferred-domain' => $domain));
    }

    function getSite($site) {
        $entries = $this->_callWMT('get','https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
        return $entries;
    }

    // Return all sites of the account, keyed by host name.
    function getSites() {
        $rawSites = $this->_callWMT('get','https://www.google.com/webmasters/tools/feeds/sites','',array(),array('entry'));
        $sites = array();
        foreach ($rawSites['feed']['entry'] as $entry) {
            $site = explode('/', $entry['title']);
            $site = $site[2];
            $sites[$site] = $entry;
        }
        return $sites;
    }
    // Verify a site with the HTML-page method. If $location (the web root)
    // is given, the verification file is written there first.
    function verifySite($site, $location = '') {
        $entry = $this->getSite($site);
        $vm = $entry['entry']['wt:verification-method'];
        if ($location!='')
            file_put_contents("$location/$vm", $vm);
        return $this->_callWMT('put',"https://www.google.com/webmasters/tools/feeds/sites/{site}", $site,
            array('verification-method' =>
                array('_value' => $vm,
                    'type' => 'htmlpage',
                    'in-use' => 'true',
                    'file-content' => "google-site-verification: $vm"
                )
            ));
    }
}
// Manual smoke test: runs every operation once against a real account.
function ut_WebmasterTools ($username, $password, $website, $location) {
    $wt = new WebmasterTools($username, $password);
    echo "Get Site\n";
    print_r($wt->getSite($website));
    echo "Delete Site\n";
    print_r($wt->deleteSite($website));
    echo "Create Site\n";
    print_r($wt->createSite($website));
    echo "Verify Site\n";
    print_r($wt->verifySite($website));
    echo "Set Location\n";
    print_r($wt->setGeoLocation($website, $location));
}
?>
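This code is very convenient to use, and it doesn't seem to need the complicated Zend GData library at all, so no special environment setup is required. I have already used it to change the geolocation of more than 200 sites.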
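A batch run over many sites can be a simple loop around setGeoLocation(). The sketch below is not the exact script I ran; set_geolocations is a hypothetical helper, and the geolocation value is assumed to be a two-letter country code.

```php
<?php
// Sketch: apply setGeoLocation() to many sites and collect the failures.
// $wt is expected to be a WebmasterTools instance from the class above;
// $targets maps host name => country code (assumed two-letter codes).
function set_geolocations($wt, array $targets)
{
    $failed = array();
    foreach ($targets as $site => $country) {
        // _callWMT() returns false when the response body is empty.
        if ($wt->setGeoLocation($site, $country) === false) {
            $failed[] = $site;
        }
    }
    return $failed;
}

// Usage (needs real credentials and network access):
// $wt = new WebmasterTools('email@gmail.com', 'password');
// $failed = set_geolocations($wt, array('afg.bizdirlib.com' => 'AF'));
// print_r($failed);
```

Collecting the failed hosts instead of stopping at the first error makes it easy to rerun only the sites that did not go through.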
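Also, the Google Webmaster API limits the number of sites you can add to 1000. The limit applies only to the API; if you add sites by hand, you can go beyond 1000.
Update, August 21, 2014: moved to this directory on the usloft4065 server: /var/www/html/yellowpage.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/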
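When a site list may cross that limit, it can be split up front into the part that still fits through the API and the part left for manual adding. A small sketch; split_by_api_limit is a made-up helper, and the default of 1000 is simply the limit described above.

```php
<?php
// Sketch: split a site list so that at most $limit sites go through the API.
// $already_added is the number of sites the account already holds.
function split_by_api_limit(array $sites, $already_added = 0, $limit = 1000)
{
    $room = max(0, $limit - $already_added);
    return array(
        'api'    => array_slice($sites, 0, $room),  // can be added via the API
        'manual' => array_slice($sites, $room),     // must be added by hand
    );
}
```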
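Comment 1
The Google Webmaster Data API has no command for removing URLs
We have some erroneous URLs that now return 404, yet Google Webmaster Tools still lists them as crawl errors. I would like to submit them for removal in bulk, but I haven't found a way to remove URLs in the Webmaster Data API, so they can only be submitted by hand one at a time.
That said, this doesn't seem to be a big problem. I expect Google will gradually drop these bad URLs on its own and stop crawling them and reporting errors.
Follow-up: 301 redirects have since largely solved this. Redirecting the bad URLs to the correct ones is probably the most suitable approach.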
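For reference, a 301 redirect from a dead path to its replacement can be as small as a lookup table. This is only an illustrative sketch, not how our sites actually implement it; the paths and the function name redirect_target are invented.

```php
<?php
// Sketch: look up the 301 target for a dead URL path (null if none).
function redirect_target($path, array $map)
{
    return isset($map[$path]) ? $map[$path] : null;
}

// Invented example mapping: old path => correct path.
$map = array('/old-page' => '/new-page');

// Only act when serving a web request, not when run from the command line.
if (PHP_SAPI !== 'cli' && isset($_SERVER['REQUEST_URI'])) {
    $path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $target = redirect_target($path, $map);
    if ($target !== null) {
        // 301 tells crawlers the move is permanent.
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```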