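We have always treated submitting sitemaps to search engines as important. MediaWiki, which we used before, ships with its own sitemap generator, and Drupal has a dedicated third-party module for this, XML Sitemap.
However, this Drupal module can only generate sitemap entries for nodes, users, taxonomy terms, menus and the like. Custom URLs can be added by hand, but there is no way to bulk-add the pages that Views generates. This used not to matter much, because a site's main pages were node pages or taxonomy pages. Once we adopted the approach of importing data directly into database tables and using them in Drupal, though, most of a site's main pages are generated by Views, and the xmlsitemap module is of little use.
Last year, when we first tried importing data directly into database tables in Drupal, we already thought about this problem and had some ideas. Views itself could produce the sitemap data, for example as JSON, which an external sitemap.php would then fetch and render. That felt somewhat cumbersome, though; querying the database directly with SQL avoids having to build dedicated Views.
Recently, while building the Taiwan 《國語小字典》, 《國語小辭典》, 《國語大辭典》 and 《成語辭典》 sites with the new approach, we set aside time to look into this and found a fairly good solution: create a sitemap.php in the site's root directory, and use rewrite rules in .htaccess to define the URLs under which the sitemaps are served: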
RewriteBase /
# robots.txt
RewriteCond %{REQUEST_URI} ^\/robots\.txt$
RewriteRule ^(.*)$ /robots.php [L]
# sitemap_xxx.xml
RewriteCond %{HTTP_HOST} ^zidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(danzi|bushou|bihua|bushoubihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
RewriteCond %{HTTP_HOST} ^cidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
RewriteCond %{HTTP_HOST} ^dacidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
RewriteCond %{HTTP_HOST} ^chengyu\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(chengyu|shouzipinyin|shouzibihuashu|shouzi|weizi)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
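In a multi-site setup that shares one directory, each site can be given its own set of sitemaps as shown above.
The source code of sitemap.php follows, with some notes written directly in the comments: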
<?php
/*
* sitemap.php
* James Qi
* 2018-1-11
* example: https://cidian.18dao.net/sitemap_zici.xml?page=2
* modify .htaccess, example:
RewriteCond %{HTTP_HOST} ^cidian\.18dao\.net$
RewriteCond %{REQUEST_URI} ^\/sitemap_(zici|bushou|bihua)\.xml$
RewriteRule ^(.*)$ /sitemap.php [L]
* modify robots.txt, example:
Sitemap: https://cidian.18dao.net/sitemap_zici.xml
Sitemap: https://cidian.18dao.net/sitemap_bushou.xml
Sitemap: https://cidian.18dao.net/sitemap_bihua.xml
*/
# Set the sitemap Content-Type to application/xml, as the standard expects
# If it is not set, PHP sends its default type (normally text/html); that still works, but it is not standard
header("Content-Type: application/xml");
# Set PHP parameters
ini_set('memory_limit','512M');
ini_set('display_errors', 'On');
error_reporting(E_ALL);
# Read the parameters from the request URL
$http_host = $_SERVER['HTTP_HOST'];//example:cidian.18dao.net
$request_uri = $_SERVER['REQUEST_URI'];//example:/sitemap_zici.xml?page=2
$query_string = $_SERVER['QUERY_STRING'];//example:page=2
# Get the sitemap name from the URL
$map_start = strpos($request_uri,'_');//example: 8
$map_end = strpos($request_uri,'.');//example: 13
$map = substr($request_uri, $map_start + 1, $map_end - $map_start - 1);//example: zici
if ($map == 'sitemap') {
print "no sitemap";
exit;
}
# Map each site to its database
$database_array = array(
'zidian.18dao.net' => 'net_18dao_zidian',
'cidian.18dao.net' => 'net_18dao_cidian',
'dacidian.18dao.net' => 'net_18dao_dacidian',
'chengyu.18dao.net' => 'net_18dao_chengyu',
);
$database = $database_array[$http_host];//example: 'net_18dao_cidian';
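# Default sitemap parameters; an individual sitemap definition may override them
$lastmod_default = '2018-01-13T00:00:00Z';
$priority_default = '0.5';//0.0 - 1.0
$changefreq_default = 'weekly';//always, hourly, daily, weekly, monthly, yearly, never
$path_default = $map;//by default the path is the same as the map name//example:'zici';
$url_per_page_default = 10000;//the sitemap protocol allows at most 50000 URLs per sitemap file
# Define the parameters of each sitemap,
# keyed by http_host and map name in a two-dimensional array
# Required: table, field
# Optional (the defaults are used when omitted): lastmod, priority, changefreq, url_per_page, path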
$sitemap_array = array(
'zidian.18dao.net' => array(
'danzi' => array(
'table' => 'dict_mini',
'field' => 'danzi',
),
'bushou' => array(
'table' => 'dict_mini',
'field' => 'bushou',
'priority' => '0.6',
),
'bihua' => array(
'table' => 'dict_mini',
'field' => 'danzibihua',
'priority' => '0.6',
),
'bushoubihua' => array(
'table' => 'dict_mini',
'field' => 'bushoubihua',
'priority' => '0.6',
),
),
'cidian.18dao.net' => array(
'zici' => array(
'table' => 'dict_concised',
'field' => 'ziciming',
),
'bushou' => array(
'table' => 'dict_concised',
'field' => 'bushou',
'priority' => '0.6',
),
'bihua' => array(
'table' => 'dict_concised',
'field' => 'zongbihuashu',
'priority' => '0.6',
),
),
'dacidian.18dao.net' => array(
'zici' => array(
'table' => 'dict_revised',
'field' => 'ziciming',
),
'bushou' => array(
'table' => 'dict_revised',
'field' => 'bushouzi',
'priority' => '0.6',
),
'bihua' => array(
'table' => 'dict_revised',
'field' => 'zongbihuashu',
'priority' => '0.6',
),
),
'chengyu.18dao.net' => array(
'chengyu' => array(
'table' => 'dict_idioms',
'field' => 'chengyu',
),
'shouzipinyin' => array(
'table' => 'dict_idioms',
'field' => 'shouzipinyin',
'priority' => '0.6',
),
'shouzibihuashu' => array(
'table' => 'dict_idioms',
'field' => 'shouzibihuashu',
'priority' => '0.6',
),
'shouzi' => array(
'table' => 'dict_idioms',
'field' => 'shouzi',
'priority' => '0.7',
),
'weizi' => array(
'table' => 'dict_idioms',
'field' => 'weizi',
'priority' => '0.4',
'changefreq' => 'monthly',
'lastmod' => '2018-01-13T10:00:00Z',
),
),
);
# Look up the actual sitemap parameters from the definitions above
$table = $sitemap_array[$http_host][$map]['table'];//example:'dict_concised';
$field = $sitemap_array[$http_host][$map]['field'];//example:'ziciming';
if (isset($sitemap_array[$http_host][$map]['priority'])) {
$priority = $sitemap_array[$http_host][$map]['priority'];
} else {
$priority = $priority_default;
}
if (isset($sitemap_array[$http_host][$map]['changefreq'])) {
$changefreq = $sitemap_array[$http_host][$map]['changefreq'];
} else {
$changefreq = $changefreq_default;
}
if (isset($sitemap_array[$http_host][$map]['lastmod'])) {
$lastmod = $sitemap_array[$http_host][$map]['lastmod'];
} else {
$lastmod = $lastmod_default;
}
if (isset($sitemap_array[$http_host][$map]['url_per_page'])) {
$url_per_page = $sitemap_array[$http_host][$map]['url_per_page'];
} else {
$url_per_page = $url_per_page_default;//default:10000
}
if (isset($sitemap_array[$http_host][$map]['path'])) {
$path = $sitemap_array[$http_host][$map]['path'];
} else {
$path = $path_default;//example:'zici';
}
# Database server connection parameters
$serverName = '***';
$userName = '***';
$password = '***';
# Connect to the database
$link = mysqli_connect("$serverName","$userName","$password")
or die("unable to connect to mysql server: " . mysqli_connect_error());
mysqli_select_db($link,"$database")
or die("unable to select database '$database': " . mysqli_error($link));
# Prepare the output
$output = '';
if ($query_string == NULL) { //index file or single file, example: https://cidian.18dao.net/sitemap_zici.xml or https://cidian.18dao.net/sitemap_bihua.xml
$sql = "SELECT DISTINCT $field FROM $table WHERE not $field like '%gif%' and not $field like '%jpg%' and not $field = ''";
$result = mysqli_query($link,$sql);
$num_rows = $result->num_rows;
$pages = ceil($num_rows / $url_per_page);
if ($pages == 1) {//single file, example: https://cidian.18dao.net/sitemap_bihua.xml
$output .= map_start();
while ($row = $result->fetch_array()) {
$value = $row[0];
$value_urlencode = urlencode($value);
if (strpos($value,'&') === false) {//strict comparison, so a value starting with '&' (position 0) is also skipped
$output .= "<url>";
$output .= "<loc>https://$http_host/$path/$value_urlencode</loc>";
$output .= "<lastmod>$lastmod</lastmod>";
$output .= "<changefreq>$changefreq</changefreq>";
$output .= "<priority>$priority</priority>";
$output .= "</url>\n";
}
}
$output .= map_end();
} else {//index file, example: https://cidian.18dao.net/sitemap_zici.xml
$output .= index_start();
for ($i=1; $i<=$pages; $i++) {
$output .= "\t<sitemap>\n";
$output .= "\t\t<loc>https://$http_host$request_uri?page=$i</loc>\n";
$output .= "\t\t<lastmod>$lastmod</lastmod>\n";
$output .= "\t</sitemap>\n";
}
$output .= index_end();
}
} else {//paged file, example: https://cidian.18dao.net/sitemap_zici.xml?page=2
$page = max(1, (int) substr($query_string,5)); //example: 2; cast to int so a malformed page value cannot reach the SQL
$offset = ($page - 1) * $url_per_page;
$limit = $url_per_page;
$sql = "SELECT DISTINCT $field FROM $table WHERE not $field like '%gif%' and not $field like '%jpg%' and not $field = '' LIMIT $limit OFFSET $offset";
$result = mysqli_query($link,$sql);
$output .= map_start();
while ($row = $result->fetch_array()) {
$value = $row[0];
$value_urlencode = urlencode($value);
if (strpos($value,'&') === false) {//strict comparison, so a value starting with '&' (position 0) is also skipped
$output .= "<url>";
$output .= "<loc>https://$http_host/$path/$value_urlencode</loc>";
$output .= "<lastmod>$lastmod</lastmod>";
$output .= "<changefreq>$changefreq</changefreq>";
$output .= "<priority>$priority</priority>";
$output .= "</url>\n";
}
}
$output .= map_end();
}
# Print the output
print $output;
# Define the opening and closing markup for sitemap and index files
function map_start() {
$http_host = $_SERVER['HTTP_HOST'];
$output = '<?xml version="1.0" encoding="UTF-8"?>'."\n";
$output .= '<?xml-stylesheet type="text/xsl" href="https://'.$http_host.'/sitemap.xsl"?>'."\n";
$output .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n";
return $output;
}
function map_end() {
$output = "</urlset>\n";
return $output;
}
function index_start() {
$http_host = $_SERVER['HTTP_HOST'];
$output = '<?xml version="1.0" encoding="UTF-8"?>'."\n";
$output .= '<?xml-stylesheet type="text/xsl" href="https://'.$http_host.'/sitemap.xsl"?>'."\n";
$output .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'."\n";
return $output;
}
function index_end() {
$output = "</sitemapindex>\n";
return $output;
}
?>
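The indentation of the program above is a bit off, which makes it harder to read; I will tidy it up later. In addition, add lines like the following to robots.txt so that search engines can discover the sitemaps: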
# sitemap start
Sitemap: https://zidian.18dao.net/sitemap.xml
Sitemap: https://zidian.18dao.net/rss.xml
Sitemap: https://zidian.18dao.net/sitemap_danzi.xml
Sitemap: https://zidian.18dao.net/sitemap_bushou.xml
Sitemap: https://zidian.18dao.net/sitemap_bihua.xml
Sitemap: https://zidian.18dao.net/sitemap_bushoubihua.xml
# sitemap end
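The .htaccess rules earlier send /robots.txt to a robots.php script, which this post does not show. The following is only a rough sketch of how such a script might emit the per-site Sitemap lines. The function name build_robots_txt and the $sitemaps_by_host array are assumptions made for illustration, not the actual script, and any crawl rules the real file contains are left out.

```php
<?php
// Hypothetical sketch of a robots.php; the real script is not shown in the post.
// Sitemap names per host, assumed to mirror the .htaccess rules.
$sitemaps_by_host = array(
    'zidian.18dao.net' => array('danzi', 'bushou', 'bihua', 'bushoubihua'),
    'cidian.18dao.net' => array('zici', 'bushou', 'bihua'),
);

// Build the sitemap section of robots.txt for one host.
// Unknown hosts get an empty section, so the Host header is never echoed blindly.
function build_robots_txt($host, array $sitemaps_by_host) {
    $lines = array('# sitemap start');
    $maps = isset($sitemaps_by_host[$host]) ? $sitemaps_by_host[$host] : array();
    foreach ($maps as $map) {
        $lines[] = 'Sitemap: https://' . $host . '/sitemap_' . $map . '.xml';
    }
    $lines[] = '# sitemap end';
    return implode("\n", $lines) . "\n";
}

// Only emit output when handling a web request, not when run from the command line.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain; charset=UTF-8');
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
    print build_robots_txt($host, $sitemaps_by_host);
}
```

Keeping the host-to-sitemap list in one array means a new site only needs one extra entry, much like $sitemap_array in sitemap.php.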
It is also worth submitting the sitemaps proactively through Google Search Console, Baidu Webmaster Tools (百度站長平台) and similar services.
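One detail of sitemap.php is that it skips any value containing '&' instead of escaping it. A possible alternative, sketched here under assumptions, is to percent-encode the value as a path segment and then XML-escape the whole URL, so that such entries could be kept. The helper name sitemap_url_entry is hypothetical and not part of the original script, and whether such URLs actually resolve on the Drupal side would still need checking.

```php
<?php
// Hypothetical helper, not part of the original sitemap.php.
// Returns one <url> element for a sitemap file.
function sitemap_url_entry($host, $path, $value, $lastmod, $changefreq, $priority) {
    // rawurlencode() percent-encodes the value for use as a path segment
    // (for example a space becomes %20 and '&' becomes %26).
    $loc = 'https://' . $host . '/' . $path . '/' . rawurlencode($value);
    // Escape any XML special characters that remain in the URL.
    $loc = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return '<url>'
        . '<loc>' . $loc . '</loc>'
        . '<lastmod>' . $lastmod . '</lastmod>'
        . '<changefreq>' . $changefreq . '</changefreq>'
        . '<priority>' . $priority . '</priority>'
        . "</url>\n";
}

// Example call with a value that the original script would have skipped.
print sitemap_url_entry('cidian.18dao.net', 'zici', 'a&b', '2018-01-13T00:00:00Z', 'weekly', '0.5');
```

A helper like this could also replace the two duplicated loops in sitemap.php, since the single-file and paged branches build their entries the same way.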