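We have always considered it important to submit sitemaps to search engines. MediaWiki, which we used before, ships with its own sitemap generator, and Drupal has a dedicated contributed module, XML Sitemap.
However, this Drupal module can only generate sitemap entries for nodes, users, taxonomy terms, menus and the like. Custom URLs can also be added by hand, but there is no way to add the pages that Views generates in bulk. This used to be a minor issue, because the main pages of a site were node pages or taxonomy pages. Once we adopted the approach of "importing data directly into Drupal and using the database", however, almost all of a site's main pages are generated by Views, and at that point the xmlsitemap module is of little help.
Last year, when we first tried importing data directly into database tables in Drupal, we already had this problem in mind and had some ideas. One was to let Views itself produce the sitemap data, for example as JSON, and have an external sitemap.php program fetch and render it. That felt a bit cumbersome, though; querying the database directly with SQL avoids having to build dedicated Views.
Recently, while building the Taiwanese dictionaries 《國語小字典》, 《國語小辭典》, 《國語大辭典》 and 《成語辭典》 with the new approach, we took the time to study this properly and found a fairly good solution: create a sitemap.php in the site's root directory, and use rewrite rules in .htaccess to set the URLs under which the sitemaps are served: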
RewriteBase /

# robots.txt
RewriteCond %{REQUEST_URI} ^\/robots\.txt$
RewriteRule ^(.*)$ /robots.php [L]

# sitemap_xxx.xml
RewriteCond %{HTTP_HOST} ^zidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(danzi|bushou|bihua|bushoubihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]

RewriteCond %{HTTP_HOST} ^cidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]

RewriteCond %{HTTP_HOST} ^dacidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]

RewriteCond %{HTTP_HOST} ^chengyu\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(chengyu|shouzipinyin|shouzibihuashu|shouzi|weizi)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
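When several sites are served from one directory (a multi-site setup), a different set of sitemaps can be configured for each site, as shown above.
The source code of sitemap.php follows; some notes are written directly in the comments: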
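The RewriteCond patterns above effectively act as a whitelist of sitemap names per host. As a rough illustration, the same check could be repeated on the PHP side in case sitemap.php is ever reached by another route. This is only a sketch: the function name `sitemap_name_from_uri` and the `$allowed` list are hypothetical and are not part of the original script, which extracts the name with strpos() and substr().

```php
<?php
// Hypothetical helper (not in the original sitemap.php): extract the map
// name from a request URI such as "/sitemap_zici.xml?page=2" and accept it
// only if it appears in a whitelist.
function sitemap_name_from_uri($request_uri, array $allowed)
{
    // Keep only the path part, dropping any query string.
    $path = parse_url($request_uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // Same shape as the RewriteCond pattern: /sitemap_<name>.xml
    if (!preg_match('#^/sitemap_([a-z]+)\.xml$#', $path, $m)) {
        return null;
    }
    return in_array($m[1], $allowed, true) ? $m[1] : null;
}

// Assumed whitelist for cidian.18dao.net, copied from the rewrite rule.
$allowed = array('zici', 'bushou', 'bihua');
var_dump(sitemap_name_from_uri('/sitemap_zici.xml?page=2', $allowed));
var_dump(sitemap_name_from_uri('/sitemap_other.xml', $allowed));
```

Returning null for anything unexpected lets the caller stop early instead of looking up an undefined array key.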
<?php
/*
 * sitemap.php
 * James Qi
 * 2018-1-11
 * example: https://cidian.18dao.net/sitemap_zici.xml?page=2
 * modify .htaccess, example:
    RewriteCond %{HTTP_HOST} ^cidian\.18dao\.net$
    RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
    RewriteRule ^(.*)$ /sitemap.php [L]
 * modify robots.txt, example:
    Sitemap: https://cidian.18dao.net/sitemap_zici.xml
    Sitemap: https://cidian.18dao.net/sitemap_bushou.xml
    Sitemap: https://cidian.18dao.net/sitemap_bihua.xml
 */

# Set the sitemap's Content-Type to application/xml, as the format expects.
# If it is not set, PHP sends its default Content-Type (text/html unless
# configured otherwise); that may still work, but it is not correct.
header("Content-Type: application/xml");

# PHP settings
ini_set('memory_limit', '512M');
ini_set('display_errors', 'On');
error_reporting(E_ALL);

# Read the parameters from the URL
$http_host = $_SERVER['HTTP_HOST'];       // example: cidian.18dao.net
$request_uri = $_SERVER['REQUEST_URI'];   // example: /sitemap_zici.xml?page=2
$query_string = $_SERVER['QUERY_STRING']; // example: page=2

# Extract the sitemap name from the URL
$map_start = strpos($request_uri, '_'); // example: 8
$map_end = strpos($request_uri, '.');   // example: 13
$map = substr($request_uri, $map_start + 1, $map_end - $map_start - 1); // example: zici
if ($map == 'sitemap') {
    print "no sitemap";
    exit;
}

# Database used by each site
$database_array = array(
    'zidian.18dao.net' => 'net_18dao_zidian',
    'cidian.18dao.net' => 'net_18dao_cidian',
    'dacidian.18dao.net' => 'net_18dao_dacidian',
    'chengyu.18dao.net' => 'net_18dao_chengyu',
);
$database = $database_array[$http_host]; // example: 'net_18dao_cidian'

# Default sitemap parameters; an individual sitemap can override them
$lastmod_default = '2018-01-13T00:00:00Z';
$priority_default = '0.5';      // 0.0 - 1.0
$changefreq_default = 'weekly'; // always, hourly, daily, weekly, monthly, yearly, never
$path_default = $map;           // by default the path equals the map name, example: 'zici'
$url_per_page_default = 10000;  // the sitemap protocol allows at most 50000 URLs per sitemap

# Parameters of each sitemap,
# as a two-dimensional array keyed by http_host and map name.
# Required: table, field
# Optional (defaults are used when omitted): lastmod, priority, changefreq, url_per_page, path
$sitemap_array = array(
    'zidian.18dao.net' => array(
        'danzi' => array(
            'table' => 'dict_mini',
            'field' => 'danzi',
        ),
        'bushou' => array(
            'table' => 'dict_mini',
            'field' => 'bushou',
            'priority' => '0.6',
        ),
        'bihua' => array(
            'table' => 'dict_mini',
            'field' => 'danzibihua',
            'priority' => '0.6',
        ),
        'bushoubihua' => array(
            'table' => 'dict_mini',
            'field' => 'bushoubihua',
            'priority' => '0.6',
        ),
    ),
    'cidian.18dao.net' => array(
        'zici' => array(
            'table' => 'dict_concised',
            'field' => 'ziciming',
        ),
        'bushou' => array(
            'table' => 'dict_concised',
            'field' => 'bushou',
            'priority' => '0.6',
        ),
        'bihua' => array(
            'table' => 'dict_concised',
            'field' => 'zongbihuashu',
            'priority' => '0.6',
        ),
    ),
    'dacidian.18dao.net' => array(
        'zici' => array(
            'table' => 'dict_revised',
            'field' => 'ziciming',
        ),
        'bushou' => array(
            'table' => 'dict_revised',
            'field' => 'bushouzi',
            'priority' => '0.6',
        ),
        'bihua' => array(
            'table' => 'dict_revised',
            'field' => 'zongbihuashu',
            'priority' => '0.6',
        ),
    ),
    'chengyu.18dao.net' => array(
        'chengyu' => array(
            'table' => 'dict_idioms',
            'field' => 'chengyu',
        ),
        'shouzipinyin' => array(
            'table' => 'dict_idioms',
            'field' => 'shouzipinyin',
            'priority' => '0.6',
        ),
        'shouzibihuashu' => array(
            'table' => 'dict_idioms',
            'field' => 'shouzibihuashu',
            'priority' => '0.6',
        ),
        'shouzi' => array(
            'table' => 'dict_idioms',
            'field' => 'shouzi',
            'priority' => '0.7',
        ),
        'weizi' => array(
            'table' => 'dict_idioms',
            'field' => 'weizi',
            'priority' => '0.4',
            'changefreq' => 'monthly',
            'lastmod' => '2018-01-13T10:00:00Z',
        ),
    ),
);

# Resolve the actual parameters of the requested sitemap from the definitions above
$table = $sitemap_array[$http_host][$map]['table']; // example: 'dict_concised'
$field = $sitemap_array[$http_host][$map]['field']; // example: 'ziciming'
if (isset($sitemap_array[$http_host][$map]['priority'])) {
    $priority = $sitemap_array[$http_host][$map]['priority'];
} else {
    $priority = $priority_default;
}
if (isset($sitemap_array[$http_host][$map]['changefreq'])) {
    $changefreq = $sitemap_array[$http_host][$map]['changefreq'];
} else {
    $changefreq = $changefreq_default;
}
if (isset($sitemap_array[$http_host][$map]['lastmod'])) {
    $lastmod = $sitemap_array[$http_host][$map]['lastmod'];
} else {
    $lastmod = $lastmod_default;
}
if (isset($sitemap_array[$http_host][$map]['url_per_page'])) {
    $url_per_page = $sitemap_array[$http_host][$map]['url_per_page'];
} else {
    $url_per_page = $url_per_page_default; // default: 10000
}
if (isset($sitemap_array[$http_host][$map]['path'])) {
    $path = $sitemap_array[$http_host][$map]['path'];
} else {
    $path = $path_default; // example: 'zici'
}

# Database server connection parameters
$serverName = '***';
$userName = '***';
$password = '***';

# Connect to the database (mysqli error functions, not the old mysql_error())
$link = mysqli_connect($serverName, $userName, $password)
    or die("unable to connect to MySQL server: " . mysqli_connect_error());
mysqli_select_db($link, $database)
    or die("unable to select database '$database': " . mysqli_error($link));

# Prepare the output
$output = '';
if ($query_string == NULL) {
    // index file or single file, example:
    // https://cidian.18dao.net/sitemap_zici.xml or https://cidian.18dao.net/sitemap_bihua.xml
    $sql = "SELECT DISTINCT $field FROM $table WHERE not $field like '%gif%' and not $field like '%jpg%' and not $field = ''";
    $result = mysqli_query($link, $sql);
    $num_rows = $result->num_rows;
    $pages = ceil($num_rows / $url_per_page);
    if ($pages == 1) { // single file, example: https://cidian.18dao.net/sitemap_bihua.xml
        $output .= map_start();
        while ($row = $result->fetch_array()) {
            $value = $row[0];
            $value_urlencode = urlencode($value);
            // Skip values containing '&'; the strict === also catches '&' at position 0
            if (strpos($value, '&') === false) {
                $output .= "<url>";
                $output .= "<loc>https://$http_host/$path/$value_urlencode</loc>";
                $output .= "<lastmod>$lastmod</lastmod>";
                $output .= "<changefreq>$changefreq</changefreq>";
                $output .= "<priority>$priority</priority>";
                $output .= "</url>\n";
            }
        }
        $output .= map_end();
    } else { // index file, example: https://cidian.18dao.net/sitemap_zici.xml
        $output .= index_start();
        for ($i = 1; $i <= $pages; $i++) {
            $output .= "\t<sitemap>\n";
            $output .= "\t\t<loc>https://$http_host$request_uri?page=$i</loc>\n";
            $output .= "\t\t<lastmod>$lastmod</lastmod>\n";
            $output .= "\t</sitemap>\n";
        }
        $output .= index_end();
    }
} else { // paged file, example: https://cidian.18dao.net/sitemap_zici.xml?page=2
    // Cast to int so that the query string cannot inject SQL; pages start at 1
    $page = max(1, (int) substr($query_string, 5)); // example: 2
    $offset = ($page - 1) * $url_per_page;
    $limit = $url_per_page;
    $sql = "SELECT DISTINCT $field FROM $table WHERE not $field like '%gif%' and not $field like '%jpg%' and not $field = '' LIMIT $limit OFFSET $offset";
    $result = mysqli_query($link, $sql);
    $output .= map_start();
    while ($row = $result->fetch_array()) {
        $value = $row[0];
        $value_urlencode = urlencode($value);
        if (strpos($value, '&') === false) {
            $output .= "<url>";
            $output .= "<loc>https://$http_host/$path/$value_urlencode</loc>";
            $output .= "<lastmod>$lastmod</lastmod>";
            $output .= "<changefreq>$changefreq</changefreq>";
            $output .= "<priority>$priority</priority>";
            $output .= "</url>\n";
        }
    }
    $output .= map_end();
}

# Print the output
print $output;

# Opening and closing markup of a sitemap and of a sitemap index
function map_start() {
    $http_host = $_SERVER['HTTP_HOST'];
    $output = '<?xml version="1.0" encoding="UTF-8"?>'."\n";
    $output .= '<?xml-stylesheet type="text/xsl" href="https://'.$http_host.'/sitemap.xsl"?>'."\n";
    $output .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n";
    return $output;
}

function map_end() {
    $output = "</urlset>\n";
    return $output;
}

function index_start() {
    $http_host = $_SERVER['HTTP_HOST'];
    $output = '<?xml version="1.0" encoding="UTF-8"?>'."\n";
    $output .= '<?xml-stylesheet type="text/xsl" href="https://'.$http_host.'/sitemap.xsl"?>'."\n";
    $output .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n";
    return $output;
}

function index_end() {
    $output = "</sitemapindex>\n";
    return $output;
}
?>
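The paging in the script comes down to a little arithmetic: the number of pages is ceil(rows / url_per_page), and page p starts at offset (p - 1) * url_per_page. Below is a minimal sketch of that arithmetic, together with one possible way to build a `<url>` entry. All three function names are hypothetical; the original script inlines this logic. Note also a deliberate difference: the script uses urlencode() and skips values containing '&', whereas this sketch swaps in rawurlencode() plus XML escaping so that such values can be kept.

```php
<?php
// Hypothetical helpers; the names are illustrative and not from sitemap.php.

// Number of sitemap pages needed for $num_rows URLs.
function sitemap_page_count($num_rows, $url_per_page)
{
    return (int) ceil($num_rows / $url_per_page);
}

// LIMIT/OFFSET window for a 1-based page number.
function sitemap_page_window($page, $url_per_page)
{
    $page = max(1, (int) $page); // treat anything below 1 as page 1
    return array(
        'limit' => $url_per_page,
        'offset' => ($page - 1) * $url_per_page,
    );
}

// One <url> entry, with the value percent-encoded and every field XML-escaped.
function sitemap_url_entry($host, $path, $value, $lastmod, $changefreq, $priority)
{
    $esc = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };
    $loc = "https://$host/$path/" . rawurlencode($value);
    return '<url>'
        . '<loc>' . $esc($loc) . '</loc>'
        . '<lastmod>' . $esc($lastmod) . '</lastmod>'
        . '<changefreq>' . $esc($changefreq) . '</changefreq>'
        . '<priority>' . $esc($priority) . '</priority>'
        . "</url>\n";
}

// Example: 25000 rows at 10000 URLs per page.
echo sitemap_page_count(25000, 10000), "\n";
print_r(sitemap_page_window(2, 10000));
echo sitemap_url_entry('cidian.18dao.net', 'zici', 'A&B',
    '2018-01-13T00:00:00Z', 'weekly', '0.5');
```

With the default of 10000 URLs per page, a table only gets a sitemap index once it holds more than 10000 distinct values; smaller tables are served as a single sitemap file.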
In addition, add lines like the following to robots.txt so that search engines can discover the sitemaps:
# sitemap start
Sitemap: https://zidian.18dao.net/sitemap.xml
Sitemap: https://zidian.18dao.net/rss.xml
Sitemap: https://zidian.18dao.net/sitemap_danzi.xml
Sitemap: https://zidian.18dao.net/sitemap_bushou.xml
Sitemap: https://zidian.18dao.net/sitemap_bihua.xml
Sitemap: https://zidian.18dao.net/sitemap_bushoubihua.xml
# sitemap end
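The .htaccess rules route /robots.txt to a robots.php that is not shown in this post. As a rough idea of how such a script might emit the Sitemap lines for each host, here is a minimal sketch. The function name `robots_sitemap_lines`, the `$maps_by_host` array, and the overall structure are assumptions made for illustration, not the actual robots.php; the fixed sitemap.xml and rss.xml lines from the listing above are left out.

```php
<?php
// Hypothetical sketch, not the author's robots.php: build the "Sitemap:"
// block of robots.txt for one host from a list of map names.
function robots_sitemap_lines($host, array $maps_by_host)
{
    $lines = array('# sitemap start');
    // Unknown hosts simply get an empty block.
    $maps = isset($maps_by_host[$host]) ? $maps_by_host[$host] : array();
    foreach ($maps as $map) {
        $lines[] = "Sitemap: https://$host/sitemap_$map.xml";
    }
    $lines[] = '# sitemap end';
    return implode("\n", $lines) . "\n";
}

// Assumed map names, copied from the rewrite rules for two of the sites.
$maps_by_host = array(
    'zidian.18dao.net' => array('danzi', 'bushou', 'bihua', 'bushoubihua'),
    'cidian.18dao.net' => array('zici', 'bushou', 'bihua'),
);

// In a real robots.php the host would presumably come from
// $_SERVER['HTTP_HOST']; a literal is used here to keep the sketch runnable.
echo robots_sitemap_lines('zidian.18dao.net', $maps_by_host);
```

Generating the lines from the same list of map names that drives the rewrite rules would keep robots.txt and the served sitemaps from drifting apart.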
Finally, submit the sitemaps proactively in Google Search Console, Baidu Webmaster Tools (百度站長平台) and similar services.