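A few days ago, while submitting sitemaps for our family of websites, I found that submitting hundreds of sitemaps, covering multiple languages and the mobile versions, was truly painful. Endless click + copy & paste makes you drowsy, and by my estimate this mind-numbing work would take at least several dozen hours.
I knew that the Google Webmaster Data API could be used to submit sitemaps in bulk, but I did not know how to use it. I looked through some material without finding a way in, so I shelved the idea. Later I tried submitting through robots.txt, but in my attempts this only worked for the sitemap in the site's root directory (for example http://che.postcodebase.com/sitemap.xml); sitemaps under a subdirectory (or sub-path), such as http://che.postcodebase.com/m/ar/sitemap.xml, were not recognized.
Two days ago I found an article, "Google WebMaster API (PHP) for submitting dynamic sitemaps", which came closest to the solution we needed, but the download link it mentioned was dead, so I did not take it further. Today I resolved to get this solved no matter what. I asked a programmer colleague for advice, then debugged and modified the code line by line, and when I ran into a PHP configuration problem I switched to another server to continue debugging. In the end it worked!
I have a love-hate relationship with that article. I am grateful that its example showed a way to do this in PHP, but some of the configuration was not explained clearly and the program contained errors, which cost me a lot of extra time. Here are the key points: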
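- The Zend_GData download location has moved; the one given in that article is no longer valid;
- After downloading Zend_GData, there is no need to download the full Zend Framework separately;
- Zend only works if its library is on the include_path; either edit php.ini or add ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library"); to the program;
- $sitemap-location should be $sitemap_location, and it must be assigned first, set to the URL of the sitemap to submit;
- The program contains two stray <entry...> and </entry> fragments that are errors;
- In $result=$fdata->post($xml,"https://www.google.com/webmasters/tools/feeds/http://yoursite the second http:// has not been URL-encoded;
- The PHP runtime must support SSL so that it can reach the Google account service over https to obtain authorization.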
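The include_path, URL-encoding and SSL points above can be sketched as follows. This is a minimal sketch, not code from the original article: the helper names `sitemaps_feed_url()` and `check_environment()` are hypothetical, and the library path is a placeholder for wherever ZendGdata was unpacked.

```php
<?php
// Hypothetical helper: build the sitemaps feed URL for a site,
// percent-encoding the embedded site URL (the second "http://").
function sitemaps_feed_url($site_url)
{
    return 'https://www.google.com/webmasters/tools/feeds/'
        . rawurlencode($site_url) . '/sitemaps/';
}

// Hypothetical helper: list problems that would stop the script early.
function check_environment($library_dir)
{
    $problems = array();
    if (!extension_loaded('openssl')) {
        $problems[] = 'PHP has no SSL (openssl) support';
    }
    if (!is_file($library_dir . '/Zend/Loader.php')) {
        $problems[] = "Zend/Loader.php not found under $library_dir";
    }
    return $problems;
}

// Append the library directory instead of overwriting include_path.
$library_dir = '/path/to/ZendGdata-1.12.0/library'; // placeholder path
set_include_path(get_include_path() . PATH_SEPARATOR . $library_dir);

echo sitemaps_feed_url('http://cyp.postcodebase.com/'), "\n";
foreach (check_environment($library_dir) as $problem) {
    echo "warning: $problem\n";
}
```

Appending with `set_include_path()` rather than replacing the value keeps any other libraries on the path reachable; whether that matters depends on the server.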
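Anyway, it works now. In no time I had successfully submitted hundreds of sitemaps for many sites, so the hours spent were not wasted! Finally, here is the code I used: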
<?php
// Make the ZendGdata library loadable.
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
echo "load start<br />\n";
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Provide Google Account Information
$email = 'abc@gmail.com';
$passwd = '123';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
$client = null;
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
}
echo "try end<br />\n";

// Without a logged-in client there is nothing to post.
if ($client === null) {
    exit("login failed<br />\n");
}

$sitemap_location = 'http://cyp.postcodebase.com/sitemap.xml';
add_sitemap($sitemap_location, $client);

// Post one sitemap entry to the (URL-encoded) site's sitemaps feed.
function add_sitemap($sitemap_location, $client)
{
    $xml = '<entry xmlns="http://www.w3.org/2005/Atom" xmlns:wt="http://schemas.google.com/webmasters/tools/2007"><id>' . $sitemap_location . '</id>';
    $xml .= "<category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/webmasters/tools/2007#sitemap-regular'/><wt:sitemap-type>WEB</wt:sitemap-type></entry>";
    $fdata = new Zend_Gdata($client);
    echo "post start<br />\n";
    $result = $fdata->post($xml, "https://www.google.com/webmasters/tools/feeds/http%3A%2F%2Fcyp%2Epostcodebase%2Ecom%2F/sitemaps/", null, "application/atom+xml");
    echo "$sitemap_location<br />\n";
    echo "post end<br />\n";
}
?>
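I added a few echo debugging statements to make it easy to follow what is happening. This PHP program (/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/submit.php) can be run from the command line, or by opening its URL on server hawk726 in a browser; the result is the same.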
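Since the goal was to submit hundreds of sitemaps, the single call above would normally be wrapped in a loop. A minimal sketch of the non-network part follows: it lists the sitemap URLs for one site and builds the Atom entry for each. The function names and the language list are hypothetical, and the URL layout (`/sitemap.xml`, `/m/<lang>/sitemap.xml`) is only inferred from the two example URLs earlier in this post.

```php
<?php
// Hypothetical helper: list the sitemap URLs of one site.
// The layout is an assumption based on the example URLs in this post.
function sitemap_urls($site_url, array $mobile_langs)
{
    $base = rtrim($site_url, '/');
    $urls = array($base . '/sitemap.xml');
    foreach ($mobile_langs as $lang) {
        $urls[] = $base . '/m/' . $lang . '/sitemap.xml';
    }
    return $urls;
}

// Build the Atom entry for one sitemap, escaping the URL for XML.
function sitemap_entry_xml($sitemap_location)
{
    $id = htmlspecialchars($sitemap_location, ENT_QUOTES, 'UTF-8');
    return '<entry xmlns="http://www.w3.org/2005/Atom"'
        . ' xmlns:wt="http://schemas.google.com/webmasters/tools/2007">'
        . '<id>' . $id . '</id>'
        . "<category scheme='http://schemas.google.com/g/2005#kind'"
        . " term='http://schemas.google.com/webmasters/tools/2007#sitemap-regular'/>"
        . '<wt:sitemap-type>WEB</wt:sitemap-type></entry>';
}

foreach (sitemap_urls('http://che.postcodebase.com/', array('ar')) as $url) {
    echo $url, "\n";
    $xml = sitemap_entry_xml($url);
    // Each $xml would then be posted to the site's sitemaps feed,
    // for example with Zend_Gdata's post() as in the script above.
}
```

Escaping the URL with `htmlspecialchars()` is a precaution for sitemap URLs that contain `&`; the original script concatenates the URL directly.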
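April 2013 update: new sites can also be added and verified through the API. I only managed to add sites; verification did not succeed. The program for adding: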
<?php
//echo phpinfo();
//include_path='/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/library';
ini_set("include_path", ".:/var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/library");
echo "load start<br />\n";
require_once 'Zend/Loader.php';
//$dirs="/usr/local/apache2/htdocs/drupal7.postcodebase.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata";
Zend_Loader::loadClass('Zend_Gdata');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');
Zend_Loader::loadClass('Zend_Gdata_Gapps');
echo "load end<br />\n";

// Subdomains of bizdirlib.com to add as sites.
//$subdomain="jpn";
$subdomain_array = array("afg");

// Provide Google Account Information
$email = 'email@gmail.com';
$passwd = 'password';
$service = 'sitemaps';

// Try to connect
echo "try start<br />\n";
$client = null;
try {
    $client = Zend_Gdata_ClientLogin::getHttpClient($email, $passwd, $service);
} catch (Zend_Gdata_App_CaptchaRequiredException $cre) {
    echo 'URL of CAPTCHA image: ' . $cre->getCaptchaUrl() . "\n";
    echo 'Token ID: ' . $cre->getCaptchaToken() . "\n";
} catch (Zend_Gdata_App_AuthException $ae) {
    echo 'Problem authenticating: ' . $ae->getMessage() . "\n";
}
echo "try end<br />\n";

// Without a logged-in client there is nothing to post.
if ($client === null) {
    exit("login failed<br />\n");
}

foreach ($subdomain_array as $subdomain) {
    $post_address = "https://www.google.com/webmasters/tools/feeds/sites/";
    echo "post_address: $post_address<br />\n";
    $site_url = "http://$subdomain.bizdirlib.com/";
    add_site($site_url, $client, $post_address);
}

// Post an Atom entry whose content src is the site to add.
function add_site($site_url, $client, $post_address)
{
    $xml = "<atom:entry xmlns:atom='http://www.w3.org/2005/Atom'>";
    $xml .= '<atom:content src="' . $site_url . '" />';
    $xml .= "</atom:entry>";
    $fdata = new Zend_Gdata($client);
    //echo "post start<br />\n";
    echo "site_url: $site_url<br />\n";
    $result = $fdata->post($xml, $post_address, null, "application/atom+xml");
    //echo "post end<br />\n";
}
?>
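Adding a site uses the same POST method as submitting a sitemap. Verification requires PUT, and fetching the list requires GET. I have not found examples for PUT or GET yet and I am not familiar with them, so I will set this aside for now.
September 22, 2013 update: to handle verification, changing the geolocation and similar tasks, I struggled with PUT for a long time without success. After searching everywhere I finally found a very good piece of code (Webmaster Tools, or WebmasterTools.php). WebmasterTools.php now lives on hawk726 under /var/www/html/drupal7.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/: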
<?php
class WebmasterTools
{
    // Key/value pairs returned by ClientLogin (the token is under 'Auth').
    public $auth = array();

    // Log in as soon as the object is created.
    function __construct($username, $password)
    {
        $this->_Login($username, $password);
    }

    // Send one HTTP request and return the response body (false on failure).
    function _Http($method, $url, $contentType, $content = '')
    {
        $method = strtoupper($method);
        $header = 'Content-type: ' . $contentType;
        if (isset($this->auth['Auth'])) {
            $header .= "\r\nAuthorization: GoogleLogin auth=" . $this->auth['Auth'];
        }
        $header .= "\r\nContent-Length: " . strlen($content);
        $opts = array('http' => array(
            'method' => $method,
            'protocol_version' => 1.0,
            'header' => $header,
            'content' => $content,
        ));
        $context = stream_context_create($opts);
        $result = @file_get_contents($url, false, $context);
        return $result;
    }

    function _Login($username, $password, $service = 'sitemaps')
    {
        $postdata = http_build_query(array(
            'accountType' => 'GOOGLE',
            'Email' => $username,
            'Passwd' => $password,
            'source' => 'WebmasterTools-Class',
            'service' => $service,
        ));
        $login = $this->_Http('POST', 'https://www.google.com/accounts/ClientLogin', 'application/x-www-form-urlencoded', $postdata);
        $lines = explode("\n", $login);
        $data = array();
        foreach ($lines as $line) {
            // Skip blank lines, which have no key=value pair.
            if (strpos($line, '=') === false) continue;
            list($var, $value) = explode('=', $line, 2);
            $data[$var] = $value;
        }
        $this->auth = $data;
    }

    // Concatenate the direct text children of a node.
    function _GetText($node)
    {
        $text = '';
        for ($i = 0; $i < $node->childNodes->length; $i++) {
            $child = $node->childNodes->item($i);
            if ($child->nodeType == XML_TEXT_NODE) $text .= $child->wholeText;
        }
        return $text;
    }

    // array_elements_in has the set of tags we should treat as arrays
    // because they may repeat.
    function _ElementToArray($node, $array_elements_in = array())
    {
        $row = array();
        $array_elements = array();
        foreach ($array_elements_in as $array_element) $array_elements[$array_element] = true;
        for ($i = 0; $i < $node->childNodes->length; $i++) {
            $item = $node->childNodes->item($i);
            if (!isset($item->tagName)) continue;
            $children = $this->_ElementToArray($item, $array_elements_in);
            if (count($children) > 0) {
                $value = $children;
            } else {
                $value = $this->_GetText($item);
            }
            if (isset($array_elements[$item->tagName])) {
                if (!isset($row[$item->tagName])) $row[$item->tagName] = array();
                $row[$item->tagName][] = $value;
            } else {
                $row[$item->tagName] = $value;
            }
        }
        return $row;
    }

    // Build the Atom entry (for POST/PUT), send the request, parse the reply.
    function _callWMT($method, $url, $site = '', $params = array(), $array_elements_in = array())
    {
        $method = strtolower($method);
        $site = "http://$site/";
        $url = str_replace('{site}', urlencode($site), $url);
        $xml = '';
        if ($method == 'post' || $method == 'put') {
            $doc = new DOMDocument('1.0', 'utf-8');
            $root = $doc->createElementNS("http://www.w3.org/2005/Atom", 'atom:entry');
            if (count($params) > 0) {
                $root->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:wt', 'http://schemas.google.com/webmasters/tools/2007');
            }
            $doc->appendChild($root);
            $element = $doc->createElement('atom:id', $site);
            $root->appendChild($element);
            if (count($params) > 0) {
                $element = $doc->createElement('atom:category');
                $element->setAttribute('scheme', 'http://schemas.google.com/g/2005#kind');
                $element->setAttribute('term', 'http://schemas.google.com/webmasters/tools/2007#site-info');
                $root->appendChild($element);
            } else {
                $element = $doc->createElement('atom:content');
                $element->setAttribute('src', $site);
                $root->appendChild($element);
            }
            foreach ($params as $tag => $value) {
                if (is_array($value)) {
                    $element = $doc->createElement("wt:$tag", $value['_value']);
                    foreach ($value as $att => $attValue) {
                        if ($att == '_value') continue;
                        // Fixed: the pasted code set the literal attribute 'att' to 'value'.
                        $element->setAttribute($att, $attValue);
                    }
                } else {
                    $element = $doc->createElement("wt:$tag", $value);
                }
                // Fixed: append in both branches; the pasted code only appended scalar values.
                $root->appendChild($element);
            }
            $xml = $doc->saveXML();
        }
        $body = $this->_Http($method, $url, "application/atom+xml", $xml);
        if ($body != '') {
            $doc = new DOMDocument();
            $success = $doc->loadXML($body);
            return $this->_ElementToArray($doc, $array_elements_in);
        } else {
            return false;
        }
    }

    function createSite($site)
    {
        $this->_callWMT('post', 'https://www.google.com/webmasters/tools/feeds/sites/', $site);
        // Google does send Content-Length back and get_contents fails so we get the site again !
        return $this->getSite($site);
    }

    function deleteSite($site)
    {
        return $this->_callWMT('delete', 'https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
    }

    function setGeoLocation($site, $location)
    {
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('geolocation' => $location));
    }

    function setPreferredDomain($site, $domain = '')
    {
        if ($domain == '') $domain = $site;
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array('preferred-domain' => $domain));
    }

    function getSite($site)
    {
        $entries = $this->_callWMT('get', 'https://www.google.com/webmasters/tools/feeds/sites/{site}', $site);
        return $entries;
    }

    // Return all sites of the account, keyed by host name.
    function getSites()
    {
        $rawSites = $this->_callWMT('get', 'https://www.google.com/webmasters/tools/feeds/sites', '', array(), array('entry'));
        $sites = array();
        foreach ($rawSites['feed']['entry'] as $entry) {
            $site = explode('/', $entry['title']);
            $site = $site[2];
            $sites[$site] = $entry;
        }
        return $sites;
    }

    // Optionally write the verification file into $location, then request verification.
    function verifySite($site, $location = '')
    {
        $entry = $this->getSite($site);
        $vm = $entry['entry']['wt:verification-method'];
        if ($location != '') file_put_contents("$location/$vm", $vm);
        return $this->_callWMT('put', "https://www.google.com/webmasters/tools/feeds/sites/{site}", $site, array(
            'verification-method' => array(
                '_value' => $vm,
                'type' => 'htmlpage',
                'in-use' => 'true',
                'file-content' => "google-site-verification: $vm",
            ),
        ));
    }
}

// Exercise every operation once against a single site.
function ut_WebmasterTools($username, $password, $website, $location)
{
    $wt = new WebmasterTools($username, $password);
    echo "Get Site\n";
    print_r($wt->getSite($website));
    echo "Delete Site\n";
    print_r($wt->deleteSite($website));
    echo "Create Site\n";
    print_r($wt->createSite($website));
    echo "Verify Site\n";
    print_r($wt->verifySite($website));
    echo "Set Location\n";
    print_r($wt->setGeoLocation($website, $location));
}
?>
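This code is very convenient to use, and it does not seem to need anything as complicated as Zend GData, so no special environment setup is required. I have already used it to change the geolocation of more than 200 sites.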
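Changing the geolocation of many sites then comes down to a loop over the class's `setGeoLocation()` method. The sketch below is an illustration under assumptions: `set_geolocation_for_sites()` is a hypothetical helper, the host names and the `'AF'` location value are placeholders, and a stub stands in for `WebmasterTools` so the example runs without contacting Google.

```php
<?php
// Hypothetical helper: call setGeoLocation() for each site and collect failures.
// $wt is expected to be a WebmasterTools instance (or any object with that method).
function set_geolocation_for_sites($wt, array $sites, $location)
{
    $failed = array();
    foreach ($sites as $site) {
        // The class returns false when the response body is empty.
        if ($wt->setGeoLocation($site, $location) === false) {
            $failed[] = $site;
        }
    }
    return $failed;
}

// Stub used only for this demonstration; it fails for one host on purpose.
class FakeWebmasterTools
{
    function setGeoLocation($site, $location)
    {
        return $site === 'bad.example.com' ? false : array('entry' => array());
    }
}

$failed = set_geolocation_for_sites(
    new FakeWebmasterTools(),
    array('afg.example.com', 'bad.example.com'),
    'AF' // placeholder value; the expected format is an assumption
);
print_r($failed);
```

Collecting the failed hosts makes it easy to rerun only those sites instead of the whole list.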
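Also, the Google Webmaster API limits the number of sites you can add to no more than 1000, but this limit applies only to the API; if you add sites manually, you can go beyond 1000.
August 21, 2014 update: moved to this directory on server usloft4065: /var/www/html/yellowpage.bizdirlib.com/sites/all/will-delete/ZendGdata-1.12.0/demos/Zend/Gdata/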
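Comment 1
The Google Webmaster Data API has no command for removing URLs
We have some wrong URLs that now return 404, yet Google Webmaster Tools still reports them as crawl errors. I would like to submit them for removal in bulk, but so far I have not found a way to remove URLs through the Webmaster Data API, so they have to be submitted by hand one at a time.
That said, this does not seem to be a big problem. I expect Google will gradually drop these wrong URLs on its own and stop crawling them and reporting errors.
Update: later, 301 redirects largely solved this by sending the wrong URLs to the correct ones. This is probably the most suitable approach.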
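As an illustration of the 301 approach, here is a minimal PHP sketch. It is an assumption about how such a redirect could be done, not the setup actually used on these sites (redirects are often configured in the web server instead); the lookup table and the `redirect_target()` helper are hypothetical.

```php
<?php
// Hypothetical lookup: return the correct URL for a wrong path, or null.
function redirect_target($path, array $map)
{
    return isset($map[$path]) ? $map[$path] : null;
}

$map = array(
    '/old-page' => 'http://www.example.com/new-page', // placeholder entry
);

// Use the requested path, or a sample path when run from the command line.
$path = isset($_SERVER['REQUEST_URI'])
    ? (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH)
    : '/old-page';

$target = redirect_target($path, $map);
if ($target !== null && PHP_SAPI !== 'cli') {
    header('Location: ' . $target, true, 301); // permanent redirect
    exit;
}
echo $target === null ? "no redirect\n" : "would redirect to $target\n";
```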