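A few days ago, while submitting sitemaps for our family of websites, I found that submitting more than a hundred sitemaps, covering multiple languages and mobile versions, was painfully tedious. The endless click, copy, and paste made me drowsy, and I estimated that this mind-numbing work would take at least several dozen hours.
I had learned that the Google Webmaster Data API can submit sitemaps in batches, but I did not know how to use it, and the material I looked up gave me no starting point, so I set it aside. Later I tried submitting through robots.txt, but in my tests that approach only picked up the sitemap in the site's root directory (for example http://che.postcodebase.com/sitemap.xml) and did not recognize sitemaps in subdirectories, or sub-paths (for example http://che.postcodebase.com/m/ar/sitemap.xml).
Two days ago I found an article, "Google WebMaster API (PHP) for submitting dynamic sitemaps", that came closest to what we needed, but the download link it mentions was dead, so I went no further. Today I resolved to get it working no matter what. I asked a programmer colleague for advice, then debugged and modified the code line by line, switched to another server when I ran into a PHP configuration problem, and finally succeeded!
I have a love-hate relationship with that article. I am grateful that its example showed a way to do this in PHP, but some of the configuration is not explained clearly and the program contains errors, which cost me a lot of extra time. Here are the key points: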
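- The Zend_GData download has moved; the address given in that article no longer works.
- After downloading Zend_GData, there is no need to download the full Zend Framework separately.
- Zend needs its library on include_path. Either edit php.ini or add this to the program: ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
- $sitemap-location should be $sitemap_location, and it must be assigned first, set to the URL of the sitemap being submitted.
- The program contains two stray pieces of erroneous code, an extra <entry...> and an extra </entry>.
- In $result=$fdata->post($xml,"https://www.google.com/webmasters/tools/feeds/http://yoursite the second http:// is not URL-encoded; it has to go through urlencode.
- The PHP runtime must support SSL so that it can reach Google Accounts over HTTPS to obtain authorization.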
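The include_path, urlencode, and SSL points above can be sketched in a few lines. This is only a minimal sketch under my own assumptions, not code from the original article: the helper name build_sitemaps_feed_url and the library path are placeholders. Note that urlencode leaves dots as they are, while the listing in this post writes them as %2E; both forms decode to the same URL.

```php
<?php
// Hypothetical helper (not part of Zend_Gdata): builds the sitemaps feed URL
// for a site, URL-encoding the site URL that is embedded in the path.
function build_sitemaps_feed_url($site_url)
{
    return 'https://www.google.com/webmasters/tools/feeds/'
        . urlencode($site_url) . '/sitemaps/';
}

// Prepend the ZendGdata library directory to include_path.
// The path is a placeholder; point it at your own ZendGdata-1.12.0/library.
$zend_library = '/path/to/ZendGdata-1.12.0/library';
set_include_path($zend_library . PATH_SEPARATOR . get_include_path());

// HTTPS requests to Google need the openssl extension.
if (!extension_loaded('openssl')) {
    echo "Warning: the openssl extension is not loaded; HTTPS requests will fail.\n";
}

echo build_sitemaps_feed_url('http://www.example.com/'), "\n";
```

Using set_include_path with PATH_SEPARATOR instead of a hard-coded ".:" prefix keeps the existing include_path entries and also works where the separator is not a colon.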
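In the end it works: in a short while I submitted several hundred sitemaps for many sites, so those few hours were not wasted! Finally, here is the code I used: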
<?php
// Make the ZendGdata library loadable.
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");

echo "load start<br />\n";
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Provide Google Account Information
$email = 'abc@gmail.com';
$passwd = '123';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
    exit(1); // without a client there is nothing to post
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
    exit(1);
}
echo "try end<br />\n";

// The sitemap to submit.
$sitemap_location = 'http://cyp.postcodebase.com/sitemap.xml';
add_sitemap($sitemap_location, $client);

// Posts one Atom entry describing the sitemap to the site's sitemaps feed.
function add_sitemap($sitemap_location, $client)
{
    $xml = '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:wt="http://schemas.google.com/webmasters/tools/2007"><id>' . $sitemap_location . '</id>';
    $xml .= "<category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/webmasters/tools/2007#sitemap-regular'/><wt:sitemap-type>WEB</wt:sitemap-type></entry>";
    $fdata = new Zend_Gdata($client);
    echo "post start<br />\n";
    // The site URL inside the feed path is URL-encoded.
    $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/http%3A%2F%2Fcyp%2Epostcodebase%2Ecom%2F/sitemaps/", null, "application/atom+xml");
    echo "$sitemap_location<br />\n";
    echo "post end<br />\n";
}
?>
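I added a few echo statements for debugging so that progress is easy to follow. This PHP program (/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/submit.php) can be run from the command line, or by opening its URL on the server hawk726 in a browser; the result is the same.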
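Submitting hundreds of sitemaps means running the same post for many URLs. The following is a rough sketch of how the list of sitemap URLs and the Atom entries could be generated in a loop. The helper names sitemap_urls and sitemap_entry_xml are my own, and the sub-path list ('' for the root, 'm', 'm/ar') is an assumed layout based on the example URLs earlier in this post, so adjust it to the paths your sites really have.

```php
<?php
// Hypothetical helper: list the sitemap URLs for one site, given the
// sub-paths (language / mobile variants) that each carry a sitemap.xml.
function sitemap_urls($site_url, array $sub_paths)
{
    $urls = array();
    foreach ($sub_paths as $path) {
        // '' means the site root; 'm/ar' becomes http://host/m/ar/sitemap.xml
        $prefix = ($path === '') ? '' : trim($path, '/') . '/';
        $urls[] = rtrim($site_url, '/') . '/' . $prefix . 'sitemap.xml';
    }
    return $urls;
}

// Builds the same Atom entry as add_sitemap(), escaping the URL for XML.
function sitemap_entry_xml($sitemap_location)
{
    return '<entry xmlns="http://www.w3.org/2005/Atom"'
        . ' xmlns:wt="http://schemas.google.com/webmasters/tools/2007">'
        . '<id>' . htmlspecialchars($sitemap_location, ENT_QUOTES, 'UTF-8') . '</id>'
        . "<category scheme='http://schemas.google.com/g/2005#kind'"
        . " term='http://schemas.google.com/webmasters/tools/2007#sitemap-regular'/>"
        . '<wt:sitemap-type>WEB</wt:sitemap-type></entry>';
}

// Assumed layout: root, mobile, and one mobile language path.
foreach (sitemap_urls('http://che.postcodebase.com/', array('', 'm', 'm/ar')) as $url) {
    echo $url, "\n";
    $xml = sitemap_entry_xml($url);
    // With a logged-in client, the post would go here, for example:
    // $fdata->post($xml, $feed_url, null, 'application/atom+xml');
}
```

Keeping the URL list and the XML building in small functions makes it easy to print and check everything before any request is sent to Google.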
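April 2013 update: new sites can also be added and verified through the API. I only managed to add sites; verification did not succeed. The program for adding sites: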
<?php
//echo phpinfo();
//include_path='/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/library';
//include_path='';
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");

echo "load start<br />\n";
require_once 'Zend/Loader.php';
//$dirs="/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata";
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Subdomains of the sites to add.
//$subdomain="jpn";
$subdomain_array = array("afg");

// Provide Google Account Information
$email = 'email@gmail.com';
$passwd = 'password';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
    exit(1); // without a client there is nothing to post
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
    exit(1);
}
echo "try end<br />\n";

foreach ($subdomain_array as $subdomain) {
    $post_address = "https://www.google.com/webmasters/tools/feeds/sites/";
    echo "post_address: $post_address<br />\n";
    $site_url = "http://$subdomain.bizdirlib.com/";
    add_site($site_url, $client, $post_address);
}

// Posts one Atom entry whose content src is the site URL to the sites feed.
function add_site($site_url, $client, $post_address)
{
    $xml = "<atom:entry xmlns:atom='http://www.w3.org/2005/Atom'>";
    $xml .= '<atom:content src="' . $site_url . '" />';
    $xml .= "</atom:entry>";
    $fdata = new Zend_Gdata($client);
    //echo "post start<br />\n";
    echo "site_url: $site_url<br />\n";
    $result = $fdata->post($xml, $post_address, null, "application/atom+xml");
    //echo "post end<br />\n";
}
?>
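Adding a site uses the same POST method as submitting a sitemap. Verification requires PUT, and fetching the site list requires GET. I have not found examples of PUT or GET yet and am not familiar with them, so I will leave that for now.
September 22, 2013 update: to handle verification, changing geolocation, and similar tasks, I struggled with PUT for a long time without success. After searching everywhere I finally found a very good piece of code (Webmaster Tools, or WebmasterTools.php). WebmasterTools.php is now on hawk726 under /var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/: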
<?php
class WebmasterTools
{
    // Key/value pairs returned by ClientLogin (SID, LSID, Auth).
    public $auth = array();

    // Named __construct so that it also runs as the constructor on PHP 7 and 8.
    function __construct($username, $password)
    {
        $this->_Login($username, $password);
    }

    // Sends one HTTP request through a stream context and returns the body.
    function _Http($method, $url, $contentType, $content = '')
    {
        $method = strtoupper($method);
        $opts = array('http' => array(
            'method' => $method,
            'protocol_version' => 1.0,
            'header' => 'Content-type: ' . $contentType .
                (isset($this->auth) && isset($this->auth['Auth'])
                    ? "\r\nAuthorization: GoogleLogin auth=" . $this->auth['Auth']
                    : '') .
                "\r\nContent-Length: " . strlen($content),
            'content' => $content
        ));
        $context = stream_context_create($opts);
        $result = @file_get_contents($url, false, $context);
        return $result;
    }

    // Logs in through ClientLogin and stores the returned tokens.
    function _Login($username, $password, $service = 'sitemaps')
    {
        $postdata = http_build_query(array(
            'accountType' => 'GOOGLE',
            'Email' => $username,
            'Passwd' => $password,
            'source' => 'WebmasterTools-Class',
            'service' => $service
        ));
        $login = $this->_Http('POST', 'https://www.google.com/accounts/ClientLogin', 'application/x-www-form-urlencoded', $postdata);
        $lines = explode("\n", $login);
        $data = array();
        foreach ($lines as $line) {
            // Skip blank lines; split on the first '=' only.
            if (strpos($line, '=') === false) continue;
            list($var, $value) = explode('=', trim($line), 2);
            $data[$var] = $value;
        }
        $this->auth = $data;
    }

    // Concatenates the text nodes directly under $node.
    function _GetText($node)
    {
        $text = '';
        for ($i = 0; $i < $node->childNodes->length; $i++) {
            $child = $node->childNodes->item($i);
            if ($child->nodeType == XML_TEXT_NODE) $text .= $child->wholeText;
        }
        return $text;
    }

    // $array_elements_in lists the tags that should be collected as arrays
    // because they may repeat.
    function _ElementToArray($node, $array_elements_in = array())
    {
        $row = array();
        $array_elements = array();
        foreach ($array_elements_in as $array_element) $array_elements[$array_element] = true;
        for ($i = 0; $i < $node->childNodes->length; $i++) {
            $item = $node->childNodes->item($i);
            if (!isset($item->tagName)) continue;
            $children = $this->_ElementToArray($item, $array_elements_in);
            if (count($children) > 0) {
                $value = $children;
            } else {
                $value = $this->_GetText($item);
            }
            if (isset($array_elements[$item->tagName])) {
                if (!isset($row[$item->tagName])) $row[$item->tagName] = array();
                $row[$item->tagName][] = $value;
            } else {
                $row[$item->tagName] = $value;
            }
        }
        return $row;
    }

    // Builds the Atom entry (for POST and PUT), sends the request, and
    // returns the parsed response, or false when the body is empty.
    function _callWMT($method, $url, $site = '', $params = array(), $array_elements_in = array())
    {
        $method = strtolower($method);
        $site = "http://$site/";
        $url = str_replace('{site}', urlencode($site), $url);
        $xml = '';
        if ($method == 'post' || $method == 'put') {
            $doc = new DOMDocument('1.0', 'utf-8');
            $root = $doc->createElementNS("http://www.w3.org/2005/Atom", 'atom:entry');
            if (count($params) > 0) {
                $root->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:wt', 'http://schemas.google.com/webmasters/tools/2007');
            }
            $doc->appendChild($root);
            $element = $doc->createElement('atom:id', $site);
            $root->appendChild($element);
            if (count($params) > 0) {
                $element = $doc->createElement('atom:category');
                $element->setAttribute('scheme', 'http://schemas.google.com/g/2005#kind');
                $element->setAttribute('term', 'http://schemas.google.com/webmasters/tools/2007#site-info');
                $root->appendChild($element);
            } else {
                $element = $doc->createElement('atom:content');
                $element->setAttribute('src', $site);
                $root->appendChild($element);
            }
            foreach ($params as $tag => $value) {
                if (is_array($value)) {
                    // '_value' is the element text; every other key is an attribute.
                    $element = $doc->createElement("wt:$tag", $value['_value']);
                    foreach ($value as $att => $att_value) {
                        if ($att == '_value') continue;
                        $element->setAttribute($att, $att_value);
                    }
                } else {
                    $element = $doc->createElement("wt:$tag", $value);
                }
                // Append every element, including those carrying attributes.
                $root->appendChild($element);
            }
            $xml = $doc->saveXML();
        }
        $body = $this->_Http($method, $url, "application/atom+xml", $xml);
        if ($body != '') {
            $doc = new DOMDocument();
            $success = $doc->loadXML($body);
            return $this->_ElementToArray($doc, $array_elements_in);
        } else {
            return false;
        }
    }

    function createSite($site)
    {
        $this->_callWMT('post', 'https://www.google.com/webmasters/tools/feeds/sites/', $site);
        // file_get_contents fails on the POST response (a Content-Length
        // problem), so fetch the site again.
        return $this->getSite($site);
    }

    function deleteSite($site)
    {
        return $this->_callWMT('delete', 'https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
    }

    function setGeoLocation($site, $location)
    {
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('geolocation' => $location));
    }

    function setPreferredDomain($site, $domain = '')
    {
        if ($domain == '') $domain = $site;
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('preferred-domain' => $domain));
    }

    function getSite($site)
    {
        $entries = $this->_callWMT('get', 'https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
        return $entries;
    }

    // Returns all sites of the account, keyed by host name.
    function getSites()
    {
        $rawSites = $this->_callWMT('get', 'https://www.google.com/webmasters/tools/feeds/sites', '', array(), array('entry'));
        $sites = array();
        foreach ($rawSites['feed']['entry'] as $entry) {
            $site = explode('/', $entry['title']);
            $site = $site[2];
            $sites[$site] = $entry;
        }
        return $sites;
    }

    function verifySite($site, $location = '')
    {
        $entry = $this->getSite($site);
        $vm = $entry['entry']['wt:verification-method'];
        if ($location != '') file_put_contents("$location/$vm", $vm);
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site,
            array('verification-method' => array(
                '_value' => $vm,
                'type' => 'htmlpage',
                'in-use' => 'true',
                'file-content' => "google-site-verification: $vm"
            )));
    }
}

// Walks through every operation once for a single site.
function ut_WebmasterTools($username, $password, $website, $location)
{
    $wt = new WebmasterTools($username, $password);
    echo "Get Site\n";
    print_r($wt->getSite($website));
    echo "Delete Site\n";
    print_r($wt->deleteSite($website));
    echo "Create Site\n";
    print_r($wt->createSite($website));
    echo "Verify Site\n";
    print_r($wt->verifySite($website));
    echo "Set Location\n";
    print_r($wt->setGeoLocation($website, $location));
}
?>
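This code is very convenient to use, and it does not appear to need Zend GData at all, so no special environment setup is required. I have already used it to change the geolocation of more than 200 sites.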
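Changing the geolocation of a couple of hundred sites comes down to calling setGeoLocation() in a loop. Below is a small sketch of how that could look, not the exact script I ran. The function name set_geolocation_for_sites is my own, the site-to-country map is a placeholder, and I am assuming the location value is a two-letter country code. Sites are given as bare host names because _callWMT() adds the http:// prefix and trailing slash itself.

```php
<?php
// Applies setGeoLocation() to each site and collects the sites that failed.
// $wt is expected to be a WebmasterTools instance (the class listed above);
// _callWMT() returns false on an empty response, which is treated as failure.
function set_geolocation_for_sites($wt, array $site_locations)
{
    $failed = array();
    foreach ($site_locations as $site => $location) {
        $result = $wt->setGeoLocation($site, $location);
        if ($result === false) {
            $failed[] = $site;
        }
    }
    return $failed;
}

// Usage sketch (credentials and the site => country map are placeholders):
// require_once 'WebmasterTools.php';
// $wt = new WebmasterTools('email@gmail.com', 'password');
// $failed = set_geolocation_for_sites($wt, array(
//     'afg.bizdirlib.com' => 'AF',
//     'jpn.bizdirlib.com' => 'JP',
// ));
// print_r($failed);
```

Returning the list of failed sites makes it easy to rerun only those after a network error or a login problem.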
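Also, the Google Webmaster API limits the number of sites that can be added to 1000. This limit applies only to the API; sites added by hand can go beyond 1000.
August 21, 2014 update: moved to this directory on the usloft4065 server: /var/www/html/yellowpage.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/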
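Comment 1
The Google Webmaster Data API has no command for removing URLs
We have some bad URLs that already return 404, yet Google Webmaster Tools still reports them as crawl errors. I would like to submit removals in bulk, but so far I have not found a way to remove URLs in the Webmaster Data API, so they can only be submitted by hand one at a time.
Still, this does not seem to be a big problem. I expect Google will gradually drop these bad URLs on its own and stop crawling them and reporting the errors.
Update: I later largely solved this with 301 redirects, sending the bad URLs to the correct ones. This is probably the most suitable approach.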
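For reference, a 301 redirect can be issued from PHP with header(). The following is only an illustrative sketch, not the setup actually used on our sites: the function names, the example paths, and the lookup-table approach are all assumptions, and in practice a web server rule or a CMS redirect module may be the better fit.

```php
<?php
// Hypothetical lookup: returns the correct URL for a bad path, or null.
function redirect_target($request_path, array $redirect_map)
{
    return isset($redirect_map[$request_path]) ? $redirect_map[$request_path] : null;
}

function send_permanent_redirect($target_url)
{
    // The third argument sets the HTTP status code to 301 Moved Permanently.
    header('Location: ' . $target_url, true, 301);
}

// Placeholder map from bad paths to their correct URLs.
$redirect_map = array(
    '/old/wrong-page' => 'http://www.example.com/correct-page',
);

// In a web request the path comes from REQUEST_URI; on the command line
// it is not set, so nothing is redirected.
$path = '';
if (isset($_SERVER['REQUEST_URI'])) {
    $parsed = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $path = is_string($parsed) ? $parsed : '';
}

$target = redirect_target($path, $redirect_map);
if ($target !== null) {
    send_permanent_redirect($target);
    exit;
}
```

A permanent redirect tells crawlers that the old address has moved for good, which is why it suits URLs that were wrong from the start.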