
A PHP Program for Categorizing Pages in Drupal 7

Submitted by James Qi on June 14, 2016 - 09:38

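  I recently needed to write a program that reads a text field from a page (node) on a Drupal site, processes and matches its content, and then assigns the page to a term in a taxonomy vocabulary. Back when I first used Drupal 6 I wrote a similar program for categorization; see the post "Drupal中让Node归类的PHP程序" (A PHP Program for Categorizing Nodes in Drupal). Since moving to Drupal 7, the vast majority of categorization has happened automatically when a site is created and its data imported, using a Term reference field with the Autocomplete term widget (tagging). In some cases, though, the data was imported into a plain text field and a PHP program was run afterwards to do the categorization. The Drupal 7 program differs somewhat from the Drupal 6 one. I did not blog about it at the time, and digging up the old programs later was a chore, so I am recording it now. A sample program follows: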

<?php
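$province="anhui";//hard-coded here; it could also be passed in as a command-line argument
$offset=(int)$argv[1];//cast to int because the value is interpolated into the SQL below
$limit=(int)$argv[2];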

$_SERVER['HTTP_HOST'] = "ditu.mingluji.com.$province";//for a subdirectory-style site, HTTP_HOST is written like this
$_SERVER['SCRIPT_NAME'] = "/ditu_category.php";
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';

$drupal_path = '/usr/local/apache2/htdocs/ditu.mingluji.com/';
chdir($drupal_path);

define('DRUPAL_ROOT', $drupal_path);

require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$node_type = "poi";

//$sql = "SELECT node.nid FROM {node} WHERE node.type = '$node_type' LIMIT $limit OFFSET $offset";
$sql = "SELECT node.nid FROM {node} LEFT JOIN {field_data_field_category} field_data_field_category ON node.nid = field_data_field_category.entity_id WHERE node.type = '$node_type' AND (field_data_field_category.field_category_tid IS NULL ) LIMIT $limit OFFSET $offset";//the LEFT JOIN excludes pages that are already categorized, so they are not processed again
//echo $_SERVER['HTTP_HOST'];
//echo "\n";
//echo $sql;
//echo "\n";
$result = db_query($sql);
//print_r ($result);
//echo "\n";
$count=0;
$count_added=0;
$count_not_add=0;
$count_adding=0;
$vid=taxonomy_vocabulary_machine_name_load('category')->vid;//get the vid

while ($anode = $result->fetch()) {
$count++;
/*
print_r ($anode);
echo "\n";
*/
$nid=$anode->nid;
$entity=node_load($nid);
//print_r ($node);
//read the field values into variables

$classification = $entity->field_classification['und'][0]['value'];//read the text field
print "nid=$nid,classification=$classification\n";
$category=$classification;//transform the text field content
$category=preg_replace("/\[\d{1,3},/",'',$category);
$category=str_replace('[','',$category);
$category=str_replace(']','',$category);
$category=str_replace('<\/font>','',$category);
$category=str_replace('委,办,局','委/办/局',$category);
$array=explode(',',$category);
$i=0;
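foreach ($array as $name) {//process each category name in turn
  $term=taxonomy_get_term_by_name($name,'category');//look only in the 'category' vocabulary
  //print_r ($term);
  $tid=key($term);
  if (empty($term)) {//if the term does not exist, create it first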
    $term = new stdClass();
    $term->name = $name;
    $term->vid = $vid;
    taxonomy_term_save($term);
    $tid=$term->tid;
  }
    print "i=$i,name=$name,tid=$tid\n";
  $entity->field_category['und'][$i]['tid']=$tid;//set the tid this page is categorized under
  $i++;
}
//print $category;
//$entity->field_category['und'][0]['value'] = $category;
//$entity->field_classification['und'][0]['value'] = "classification:$classification category:$category";
node_save($entity);//save
}

echo "\n------------------------\n";
print "Done!\n";
print "count=$count\n";
print "count_added=$count_added\n";
print "count_adding=$count_adding\n";
print "count_not_add=$count_not_add\n";

?>
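
The string clean-up in the middle of the loop is the part most likely to change from site to site, so it can help to pull it into a function that runs without Drupal. The following is a minimal sketch, not the original code: the function name `classification_to_names()` and the sample input are assumptions for illustration, and the trimming and empty-name filtering are additions that the program above does not have.

```php
<?php
// Hypothetical helper: turns the raw text field into a list of term names,
// mirroring the preg_replace/str_replace/explode steps of the program above.
function classification_to_names($classification) {
    $category = preg_replace('/\[\d{1,3},/', '', $classification); // drop "[12," style prefixes
    $category = str_replace(array('[', ']'), '', $category);       // drop any remaining brackets
    $category = str_replace('<\/font>', '', $category);            // literal "<\/font>" residue, as in the original
    $category = str_replace('委,办,局', '委/办/局', $category);      // keep this phrase as one name
    // Assumption (not in the original): trim names and skip empty ones,
    // so that a stray comma cannot create a term with an empty name.
    $names = array();
    foreach (explode(',', $category) as $name) {
        $name = trim($name);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Assumed sample value; real field contents may look different.
print_r(classification_to_names('[12,Food],[7,Hotel],'));
```

With the clean-up isolated like this, a new site's rules can be tried against a few sample strings before running the full batch against the database.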

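  The program above was written in October 2014 to categorize a series of map sites. An earlier pair of programs, written in June 2013, categorized a series of national business-directory sites by region and by industry. They are shown below, starting with the region program: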

<?php
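//Start of the header comment
/*

General-purpose region categorization program

How to run:

1. SSH into the server hosting the site: 69.64.43.200
2. Change to the directory containing this program: cd /root/drupal7.bizdirlib.com-php
3. Upload the file that maps categories to regions (explained below), e.g. dza.txt
4. Run this program with 3 arguments (explained below), e.g. php refresh_category_area.php dza 0 10000
5. Watch the on-screen output during and after the run for progress and statistics (explained below)
6. Optionally add a "test area (test industry)" display in Views to see what is still uncategorized, spot new patterns, update dza.txt, upload it, and run again

Arguments:

Argument 1: country code, i.e. the first part of the site's domain name; for dza.bizdirlib.com the country code is 'dza'
Argument 2: starting offset, i.e. the index at which processing starts; normally 0, meaning from the beginning. Note that this is not a node id but an offset into the pages still to be processed
Argument 3: limit, i.e. the number of pages to process. Use 1 or 10 when debugging; real runs can use 10000 or more, but generally no more than 50,000, otherwise PHP may run out of memory and abort. If more than 50,000 pages need processing, run the program several times, 50,000 at a time

Mapping file:
The region mapping file is named dza.txt, where dza is the first part of dza.bizdirlib.com. Its content looks like this: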

"Algiers","Draria"
"Algiers","Bordj El Kiffan"
"Algiers","Ouled Fayet"
"Algiers","Beni Messous"
"Algiers","Sidi M'hamed"
"Algiers","El Biar"
"Algiers","EL Marsa"
"Algiers","Ain Benian"
"Algiers","Hamma Anassers"
"Algiers","Zeralda"
"Algiers","Bab El Oued"
"Algiers","Baraki"
"Sétif","Setif"
"Tizi Ouzou","Freha"
"Tizi Ouzou","Tigzirt"
"Tipaza","Cherchell"
"Mascara","Tighenif"
"Tlemcen","Chetouane"
"Bejaia","Akbou"
"Bouira","Lakhdaria"
"Ouargla","Touggourt"
"Djasr Kassentina","Djasr Kassentina"
"Hussein Dey","Hussein Dey"
"M'sila","M'sila"
"Beijing","Beijing"
"Hubei","Hubei"
"Hubei","Wuhan"
"Hubei","Shiyan"

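Each line has two parts: the part before the comma is the "category" and the part after it is the matching "keyword"; whenever the keyword is found in the address field, the page is assigned to that category.
Note: save dza.txt as UTF-8 with Unix line endings, otherwise non-English characters will be garbled

Statistics:
count //total pages handled in this run
count_added //pages that already had a category before this run
count_not_add //pages that did not receive a category in this run
count_adding //pages that received a category in this run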

*/
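//End of the header comment

//Start of the definitions section: the country code, offset and limit come from the command line; the switch below maps the country code to the country's English name

$country_code=$argv[1];//country code
$offset=(int)$argv[2];//starting offset (cast to int because it is interpolated into SQL)
$limit=(int)$argv[3];//limit (cast to int for the same reason)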

switch ($country_code) {
case "ae":
  $country_name="United Arab Emirates";
  break;
case "are":
  $country_name="United Arab Emirates";
  break;
case "afg":
  $country_name="Afghanistan";
  break;
case "arm":
  $country_name="Armenia";
  break;
case "cn":
  $country_name="China";
  break;
case "chn":
  $country_name="China";
  break;
case "hkg":
  $country_name="Hong Kong";
  break;
case "ind":
  $country_name="India";
  break;
case "idn":
  $country_name="Indonesia";
  break;
case "mys":
  $country_name="Malaysia";
  break;
case "aze":
  $country_name="Azerbaijan";
  break;
case "bhr":
  $country_name="Bahrain";
  break;
case "bgd":
  $country_name="Bangladesh";
  break;
case "btn":
  $country_name="Bhutan";
  break;
case "brn":
  $country_name="Brunei";
  break;
case "khm":
  $country_name="Cambodia";
  break;
case "irn":
  $country_name="Iran";
  break;
case "irq":
  $country_name="Iraq";
  break;
case "jpn":
  $country_name="Japan";
  break;
case "jor":
  $country_name="Jordan";
  break;
case "kaz":
  $country_name="Kazakhstan";
  break;
case "kwt":
  $country_name="Kuwait";
  break;
case "kgz":
  $country_name="Kyrgyzstan";
  break;
case "lao":
  $country_name="Laos";
  break;
case "lbn":
  $country_name="Lebanon";
  break;
case "mac":
  $country_name="Macau";
  break;
case "mdv":
  $country_name="Maldives";
  break;
case "mmr":
  $country_name="Myanmar";
  break;
case "npl":
  $country_name="Nepal";
  break;
case "omn":
  $country_name="Oman";
  break;
case "pak":
  $country_name="Pakistan";
  break;
case "pse":
  $country_name="Palestine";
  break;
case "phl":
  $country_name="Philippines";
  break;
case "qat":
  $country_name="Qatar";
  break;
case "sau":
  $country_name="Saudi Arabia";
  break;
case "sg":
  $country_name="Singapore";
  break;
case "sgp":
  $country_name="Singapore";
  break;
case "lka":
  $country_name="Sri lanka";
  break;
case "syr":
  $country_name="Syria";
  break;
case "tw":
  $country_name="Taiwan";
  break;
case "twn":
  $country_name="Taiwan";
  break;
case "tjk":
  $country_name="Tajikistan";
  break;
case "tha":
  $country_name="Thailand";
  break;
case "uzb":
  $country_name="Uzbekistan";
  break;
case "vnm":
  $country_name="Vietnam";
  break;
case "yem":
  $country_name="Yemen";
  break;
case "kor":
  $country_name="South Korea";
  break;
case "au":
  $country_name="Australia";
  break;
case "aus":
  $country_name="Australia";
  break;
case "nz":
  $country_name="New Zealand";
  break;
case "nzl":
  $country_name="New Zealand";
  break;
case "fji":
  $country_name="Fiji";
  break;
case "png":
  $country_name="Papua New Guinea";
  break;
case "wsm":
  $country_name="Samoa";
  break;
case "alaska":
  $country_name="Alaska";
  break;
case "abw":
  $country_name="Aruba";
  break;
case "canada":
  $country_name="Canada";
  break;
case "can":
  $country_name="Canada";
  break;
case "bhs":
  $country_name="Bahamas";
  break;
case "brb":
  $country_name="Barbados";
  break;
case "bmu":
  $country_name="Bermuda";
  break;
case "cym":
  $country_name="Cayman Islands";
  break;
case "cub":
  $country_name="Cuba";
  break;
case "dom":
  $country_name="Dominican Republic";
  break;
case "grd":
  $country_name="Grenada";
  break;
case "gtm":
  $country_name="Guatemala";
  break;
case "hti":
  $country_name="Haiti";
  break;
case "jam":
  $country_name="Jamaica";
  break;
case "pan":
  $country_name="Panama";
  break;
case "mex":
  $country_name="Mexico";
  break;
case "tto":
  $country_name="Trinidad and Tobago";
  break;
case "vir":
  $country_name="Virgin Islands US";
  break;
case "unitedstates":
  $country_name="United States";
  break;
case "dza":
  $country_name="Algeria";
  break;
case "ago":
  $country_name="Angola";
  break;
case "ben":
  $country_name="Benin";
  break;
case "bfa":
  $country_name="Burkina Faso";
  break;
case "bdi":
  $country_name="Burundi";
  break;
case "cmr":
  $country_name="Cameroon";
  break;
case "tcd":
  $country_name="Chad";
  break;
case "cog":
  $country_name="Congo";
  break;
case "dji":
  $country_name="Djibouti";
  break;
case "egy":
  $country_name="Egypt";
  break;
case "gha":
  $country_name="Ghana";
  break;
case "ken":
  $country_name="Kenya";
  break;
case "mdg":
  $country_name="Madagascar";
  break;
case "mli":
  $country_name="Mali";
  break;
case "mar":
  $country_name="Morocco";
  break;
case "nga":
  $country_name="Nigeria";
  break;
case "sdn":
  $country_name="Sudan";
  break;
case "zaf":
  $country_name="South Africa";
  break;
case "tza":
  $country_name="Tanzania";
  break;
case "eth":
  $country_name="Ethiopia";
  break;
case "lby":
  $country_name="Libya";
  break;
case "and":
  $country_name="Andorra";
  break;
case "at":
  $country_name="Austria";
  break;
case "aut":
  $country_name="Austria";
  break;
case "be":
  $country_name="Belgium";
  break;
case "bel":
  $country_name="Belgium";
  break;
case "deu":
  $country_name="Germany";
  break;
case "it":
  $country_name="Italy";
  break;
case "ita":
  $country_name="Italy";
  break;
case "nld":
  $country_name="Netherlands";
  break;
case "blr":
  $country_name="Belarus";
  break;
case "bgr":
  $country_name="Bulgaria";
  break;
case "hrv":
  $country_name="Croatia";
  break;
case "cyp":
  $country_name="Cyprus";
  break;
case "cze":
  $country_name="Czech";
  break;
case "dnk":
  $country_name="Denmark";
  break;
case "est":
  $country_name="Estonia";
  break;
case "fin":
  $country_name="Finland";
  break;
case "fr":
  $country_name="France";
  break;
case "fra":
  $country_name="France";
  break;
case "geo":
  $country_name="Georgia";
  break;
case "grc":
  $country_name="Greece";
  break;
case "hun":
  $country_name="Hungary";
  break;
case "isl":
  $country_name="Iceland";
  break;
case "irl":
  $country_name="Ireland";
  break;
case "lva":
  $country_name="Latvia";
  break;
case "lie":
  $country_name="Liechtenstein";
  break;
case "ltu":
  $country_name="Lithuania";
  break;
case "lux":
  $country_name="Luxembourg";
  break;
case "mlt":
  $country_name="Malta";
  break;
case "mda":
  $country_name="Moldova";
  break;
case "mco":
  $country_name="Monaco";
  break;
case "nor":
  $country_name="Norway";
  break;
case "pol":
  $country_name="Poland";
  break;
case "prt":
  $country_name="Portugal";
  break;
case "rus":
  $country_name="Russia";
  break;
case "srb":
  $country_name="Serbia";
  break;
case "svk":
  $country_name="Slovakia";
  break;
case "svn":
  $country_name="Slovenia";
  break;
case "swe":
  $country_name="Sweden";
  break;
case "ch":
  $country_name="Switzerland";
  break;
case "che":
  $country_name="Switzerland";
  break;
case "tur":
  $country_name="Turkey";
  break;
case "ukr":
  $country_name="Ukraine";
  break;
case "unitedkingdom":
  $country_name="United Kingdom";
  break;
case "gb":
  $country_name="United Kingdom";
  break;
case "gbr":
  $country_name="United Kingdom";
  break;
case "es":
  $country_name="Spain";
  break;
case "esp":
  $country_name="Spain";
  break;
case "rou":
  $country_name="Romania";
  break;
case "mkd":
  $country_name="Macedonia";
  break;
case "aia":
  $country_name="Anguilla";
  break;
case "arg":
  $country_name="Argentina";
  break;
case "bol":
  $country_name="Bolivia";
  break;
case "bra":
  $country_name="Brazil";
  break;
case "chl":
  $country_name="Chile";
  break;
case "col":
  $country_name="Colombia";
  break;
case "cuw":
  $country_name="Curacao, Netherlands Antilles";
  break;
case "ecu":
  $country_name="Ecuador";
  break;
case "slv":
  $country_name="El Salvador";
  break;
case "glp":
  $country_name="Guadeloupe French";
  break;
case "guy":
  $country_name="Guyana";
  break;
case "hnd":
  $country_name="Honduras";
  break;
case "nic":
  $country_name="Nicaragua";
  break;
case "per":
  $country_name="Peru";
  break;
case "pri":
  $country_name="Puerto Rico";
  break;
case "sxm":
  $country_name="Sint Maarten (Dutch)";
  break;
case "sur":
  $country_name="Suriname";
  break;
case "ven":
  $country_name="Venezuela";
  break;
case "mtq":
  $country_name="Martinique French";
  break;
case "cri":
  $country_name="Costa Rica";
  break;
default:
  $country_name="Country Name";
}

//End of the definitions section; nothing below needs to be changed
//print_r($area_array);

//print "offset=$offset\n";
//print "limit=$limit\n";

//print "country_name=$country_name\n";
//print "country_code=$country_code\n";

$file="$country_code.txt";
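$fp=fopen($file,"r");//open the file read-only
if ($fp===false) { die("cannot open $file\n"); }//stop here, because the loop below would fail on an invalid handle
$count_line=0;
$file_array=array();
while(!(feof($fp)))
{
    $text=fgets($fp);//read one line from the file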
    $text=str_replace("\n",'',$text);
    $text=str_replace("\r",'',$text);
//    print "text=$text\n";
    if ($text!='') {
        $len=strpos($text,'","');
        $term=substr($text,1,$len-1);
        $area=substr($text,$len+3,-1);
        $file_array[$count_line]['term']=$term;
        $file_array[$count_line]['area']=$area;
//        print "file_array[$count_line]['term']=".$file_array[$count_line]['term']."\n";
//        print "file_array[$count_line]['area']=".$file_array[$count_line]['area']."\n";
    }
//    print_r ($file_array);
//$vocalbulary->vid='area';
/*
    $term_area = taxonomy_get_term_by_name ($term);
    if ($term_area==NULL) { //if the category does not exist, create it
        $term_object->vid='area';
        $term_object->name=$term;
        taxonomy_term_save($term_object);
        $term_area = taxonomy_get_term_by_name ($term);
        print "term '$term' saved\n";
    } else {
        print "term '$term' exist\n";
    }
*/
    $count_line++;
}
//    print_r ($file_array);

$_SERVER['HTTP_HOST'] = "$country_code.bizdirlib.com";
$_SERVER['SCRIPT_NAME'] = "/refresh_category_area.php";
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';

$drupal_path = '/var/www/html/drupal7.bizdirlib.com/';
chdir($drupal_path);

define('DRUPAL_ROOT', $drupal_path);

#require_once './includes/bootstrap.inc';
require_once DRUPAL_ROOT.'includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

foreach ($file_array as $file) {
    //print "key=$key,term=$term\n";
    $term=$file['term'];
    $area=$file['area'];
        print "term='$term',area='$area'\n";

$term_area = taxonomy_get_term_by_name ($term);
//    print_r ($term_area);

    if ($term_area==NULL) { //if the category does not exist, create it
        $term_object=new stdClass();
        $term_object->vid=2;
        $term_object->name=$term;
//        print_r ($term_object);
        taxonomy_term_save($term_object);
        $term_area = taxonomy_get_term_by_name ($term);
        print "term '$term' saved\n";
    } else {
        print "term '$term' exist\n";
    }
}

$node_type = "country";
//$limit = 10000;
//$offset = 0;
//$sql = "SELECT node.nid FROM {node} WHERE node.type = '$node_type' LIMIT $limit OFFSET $offset";
$sql = "SELECT node.nid FROM {node} LEFT JOIN {field_data_field_area} field_data_field_area ON node.nid = field_data_field_area.entity_id WHERE node.type = '$node_type' AND (field_data_field_area.field_area_tid IS NULL ) LIMIT $limit OFFSET $offset"; //exclude nodes that are already categorized
//print "sql=$sql\n";
$result = db_query($sql);

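$count=0; //total pages handled in this run
$count_added=0; //pages that already had a category before this run
$count_not_add=0; //pages that did not receive a category in this run
$count_adding=0; //pages that received a category in this run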

while ($anode = $result->fetch()) {
    $count++;
/*
print_r ($anode);
echo '<br>\n';
*/
    $node=node_load($anode->nid);
print "\n node $anode->nid \n";
/*
print_r ($node);
echo '<br>';
*/
//read the field values into variables
//basic information; the field mapping is accurate

    $address=$node->field_address['und'][0]['value'];
    //$phone=$node->field_phone['und'][0]['value'];
    $area=$node->field_area['und'][0]['tid'];
    //$industry=$node->field_industry['und'][0]['tid'];

    $address=str_ireplace($country_name,"",$address); //strip the country name, case-insensitively
//$address=str_ireplace("other text to strip","",$address); //strip other text too, if needed
//print "address=$address\n";

//$tids=array();
//$terms=array("Bouzareah","Ben Aknoun");
    $i=0;
$adding=false;
    foreach ($file_array as $file) {
        //print "key=$key,term=$term\n";
        $term=$file['term'];
        $area=$file['area'];
//        print "term='$term',area='$area'\n";

    $term_area = taxonomy_get_term_by_name ($term);
//    print_r ($term_area);
/*
    if ($term_area==NULL) { //if the category does not exist, create it
        $term_object->vid='area';
        $term_object->name=$term;
        taxonomy_term_save($term_object);
        $term_area = taxonomy_get_term_by_name ($term);
        print "term '$term' saved\n";
    } else {
        print "term '$term' exist\n";
    }
*/
        $tid_area=key($term_area);
//        print "tid_area=$tid_area\n";
//print "stristr $address $area ".stristr($address,$area)."\n";
        if (stristr($address,$area)!==false) { //keyword found in the address
            $node->field_area['und'][$i]['tid']=$tid_area;
            $i++;
            node_save($node);
            $count_adding++;
            echo "country_code=$country_code,count=$count,count_adding=$count_adding \n";
            $adding=true;
            break;
        }
    } //end foreach
//print "count_adding=$count_adding\n";
    if (!$adding) {
        $count_not_add++;
        print "country_code=$country_code,count=$count,count_not_add=$count_not_add,address=$address\n";
    }

} //end while

//end of the run; print the statistics below

print "\n------------------------\n";
print "country_code=$country_code\n";
print "Done!\n";
print "count=$count\n";
print "count_adding=$count_adding\n";
//print "count_added=$count_added\n";
print "count_not_add=$count_not_add\n";
?>
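
The core of the region program is a first-match lookup: each mapping line gives a category and a keyword, and the first keyword found in the address decides the category. Here is a minimal sketch of just that logic, runnable without Drupal. The function names `parse_mapping_line()` and `match_category()` and the sample address are hypothetical, and `str_getcsv()` is swapped in for the `strpos()`/`substr()` slicing used in the program above.

```php
<?php
// Hypothetical helper: parse one line such as "Algiers","Draria" into
// array('term' => 'Algiers', 'area' => 'Draria'); returns NULL for blank or malformed lines.
function parse_mapping_line($line) {
    $line = trim($line);
    if ($line === '') {
        return NULL;
    }
    // Separator, enclosure and escape character are passed explicitly.
    $fields = str_getcsv($line, ',', '"', '\\');
    if (count($fields) !== 2) {
        return NULL;
    }
    return array('term' => $fields[0], 'area' => $fields[1]);
}

// Hypothetical helper: return the category of the first keyword found in the
// address (case-insensitive, like stristr() in the program above), or NULL.
function match_category($address, array $mapping) {
    foreach ($mapping as $row) {
        if ($row['area'] !== '' && stripos($address, $row['area']) !== false) {
            return $row['term'];
        }
    }
    return NULL;
}

$lines = array('"Algiers","Draria"', '"Tizi Ouzou","Freha"', '');
$mapping = array_values(array_filter(array_map('parse_mapping_line', $lines)));

echo match_category('12 Rue Example, FREHA', $mapping), "\n";
var_dump(match_category('Unknown place', $mapping));
```

Because the first match wins, both here and in the program above, more specific keywords should appear earlier in the mapping file than broader ones.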

  Industry categorization:

<?php
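//Start of the header comment
/*

General-purpose industry categorization program

How to run:

1. SSH into the server hosting the site: 69.64.43.200
2. Change to the directory containing this program: cd /root/drupal7.bizdirlib.com-php
3. Upload the file that maps categories to industries (explained below), e.g. mdg-industry.txt
4. Run this program with 3 arguments (explained below), e.g. php refresh_category_industry.php mdg 0 10000
5. Watch the on-screen output during and after the run for progress and statistics (explained below)
6. Optionally add a "test area (test industry)" display in Views to see what is still uncategorized, spot new patterns, update mdg-industry.txt, upload it, and run again

Arguments:

Argument 1: country code, i.e. the first part of the site's domain name; for dza.bizdirlib.com the country code is 'dza'
Argument 2: starting offset, i.e. the index at which processing starts; normally 0, meaning from the beginning. Note that this is not a node id but an offset into the pages still to be processed
Argument 3: limit, i.e. the number of pages to process. Use 1 or 10 when debugging; real runs can use 10000 or more, but generally no more than 50,000, otherwise PHP may run out of memory and abort. If more than 50,000 pages need processing, run the program several times, 50,000 at a time

Mapping file:
The industry mapping file is named mdg-industry.txt, where mdg is the first part of mdg.bizdirlib.com. Its content looks like this: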

ABATTOIRS and VIANDE EN GROS
ADDUCTION D'EAU and VRD
ADMINISTRATIONS
AEROPORTS
AEROPORTS,SECURITE AERIENNE
AGENCEMENT and DECORATION
AGENCES DE PRESSE and D'INFORMATION
AGENCES DE PUBLICITE and DE COMMUNICATION

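Each line contains one industry.
Note: save mdg-industry.txt as UTF-8 with Unix line endings, otherwise non-English characters will be garbled

Statistics:
count //total pages handled in this run
count_added //pages that already had a category before this run
count_not_add //pages that did not receive a category in this run
count_adding //pages that received a category in this run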

*/
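//End of the header comment

//Start of the definitions section: the country code, offset and limit come from the command line

$country_code=$argv[1];//country code
$offset=(int)$argv[2];//starting offset (cast to int because it is interpolated into SQL)
$limit=(int)$argv[3];//limit (cast to int for the same reason)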

//End of the definitions section; nothing below needs to be changed
//print_r($industry_array);

//print "offset=$offset\n";
//print "limit=$limit\n";

//print "country_name=$country_name\n";
//print "country_code=$country_code\n";

$file="$country_code-industry.txt";
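$fp=fopen($file,"r");//open the file read-only
if ($fp===false) { die("cannot open $file\n"); }//stop here, because the loop below would fail on an invalid handle
$count_line=0;
$file_array=array();
while(!(feof($fp)))
{
    $text=fgets($fp);//read one line from the file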
    $text=str_replace("\n",'',$text);
    $text=str_replace("\r",'',$text);
//    print "text=$text\n";
    if ($text!='') {
//        $len=strpos($text,'","');
//        $term=substr($text,1,$len-1);
//        $industry=substr($text,$len+3,-1);
        $term=$text;
        $industry=$text;
        $file_array[$count_line]['term']=$term;
        $file_array[$count_line]['industry']=$industry;
//        print "file_array[$count_line]['term']=".$file_array[$count_line]['term']."\n";
//        print "file_array[$count_line]['industry']=".$file_array[$count_line]['industry']."\n";
    }
//    print_r ($file_array);
//$vocalbulary->vid='industry';
    $count_line++;
}
//    print_r ($file_array);

$_SERVER['HTTP_HOST'] = "$country_code.bizdirlib.com";
$_SERVER['SCRIPT_NAME'] = "/refresh_category_industry.php";
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';

$drupal_path = '/var/www/html/drupal7.bizdirlib.com/';
chdir($drupal_path);

define('DRUPAL_ROOT', $drupal_path);

#require_once './includes/bootstrap.inc';
require_once DRUPAL_ROOT.'includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

foreach ($file_array as $file) {
    //print "key=$key,term=$term\n";
    $term=$file['term'];
    $industry=$file['industry'];
//    print "term='$term',industry='$industry'\n";

$term_industry = taxonomy_get_term_by_name ($term);
//    print_r ($term_industry);

    if ($term_industry==NULL) { //if the category does not exist, create it
        $term_object=new stdClass();
        $term_object->vid=3;
        $term_object->name=$term;
//        print_r ($term_object);
        taxonomy_term_save($term_object);
        $term_industry = taxonomy_get_term_by_name ($term);
        print "term '$term' saved\n";
    } else {
        print "term '$term' exist\n";
    }
}

$node_type = "country";
//$limit = 10000;
//$offset = 0;
//$sql = "SELECT node.nid FROM {node} WHERE node.type = '$node_type' LIMIT $limit OFFSET $offset";
$sql = "SELECT node.nid FROM {node} LEFT JOIN {field_data_field_industry} field_data_field_industry ON node.nid = field_data_field_industry.entity_id WHERE node.type = '$node_type' AND (field_data_field_industry.field_industry_tid IS NULL ) LIMIT $limit OFFSET $offset"; //exclude nodes that are already categorized
//print "sql=$sql\n";
$result = db_query($sql);

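$count=0; //total pages handled in this run
$count_added=0; //pages that already had a category before this run
$count_not_add=0; //pages that did not receive a category in this run
$count_adding=0; //pages that received a category in this run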

while ($anode = $result->fetch()) {
    $count++;
/*
print_r ($anode);
echo '<br>\n';
*/
    $node=node_load($anode->nid);
print "\n node $anode->nid \n";
/*
print_r ($node);
echo '<br>';
*/
//read the field values into variables
//basic information; the field mapping is accurate

    $category=$node->field_category_activities['und'][0]['value'];
    //$phone=$node->field_phone['und'][0]['value'];
    //$area=$node->field_area['und'][0]['tid'];
    $industry=$node->field_industry['und'][0]['tid'];

//    $address=str_ireplace($country_name,"",$address); //strip the country name, case-insensitively
//$address=str_ireplace("other text to strip","",$address); //strip other text too, if needed
//print "address=$address\n";

//$tids=array();
//$terms=array("Bouzareah","Ben Aknoun");
    $i=0;
    $adding=false;
    $category_count=substr_count($category,"^^");
    //print "category1=$category1 \n";
//    print "category_count=$category_count,category=$category,category1=$category1\n";
    if ($category==NULL) {
//        print "country_code=$country_code,count=$count,count_not_add=$count_not_add,category=$category \n";
/*
    } elseif ($category1==$category) {
        $term_industry=taxonomy_get_term_by_name ($category1);
        if ($term_industry!==NULL) {
            $tid_industry=key($term_industry);
            $node->field_industry['und'][$i]['tid']=$tid_industry;
            $i++;
            node_save($node);
            $count_adding++;
            echo "country_code=$country_code,count=$count,count_adding=$count_adding \n";
            $adding=true;
        }
*/
    } else {
        $count_adding++;
        for ($i=0;$i<=$category_count;$i++) {
            if ($i==0) {
                $category1=strtok($category,"^^");//note: strtok() treats "^^" as a set of delimiter characters, so this is the same as "^"
            } else {
                $category1=strtok("^^");
            }
            $term_industry=taxonomy_get_term_by_name ($category1);
            if ($term_industry!==array()) {
                $tid_industry=key($term_industry);
                $node->field_industry['und'][$i]['tid']=$tid_industry;
                //echo "country_code=$country_code,count=$count,count_adding=$count_adding,i=$i \n";
                echo "country_code=$country_code,count=$count,count_adding=$count_adding,i=$i,category1=$category1 \n";
                $adding=true;
            } else {
                $count_adding--;
                break;
            }
        }
                node_save($node);
    }
    if (!$adding) {
        $count_not_add++;
        print "country_code=$country_code,count=$count,count_not_add=$count_not_add \n";
        //print "country_code=$country_code,count=$count,count_not_add=$count_not_add,category=$category \n";
    }

} //end while

//end of the run; print the statistics below

print "\n------------------------\n";
print "country_code=$country_code\n";
print "Done!\n";
print "count=$count\n";
print "count_adding=$count_adding\n";
//print "count_added=$count_added\n";
print "count_not_add=$count_not_add\n";
?>
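
One detail of the loop above is easy to miss: `strtok()` treats its second argument as a set of single-character delimiters, so `"^^"` behaves the same as `"^"`, whereas `substr_count()` counts the two-character sequence. If a category name ever contains a single `^`, the two disagree. Below is a minimal sketch of an alternative using `explode()`, which splits on the whole `"^^"` sequence. The function name `split_categories()` and the sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: split a "^^"-separated field into trimmed, non-empty names.
function split_categories($raw) {
    $names = array();
    foreach (explode('^^', (string) $raw) as $name) {
        $name = trim($name);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

print_r(split_categories('AEROPORTS^^ADMINISTRATIONS'));
// A single ^ inside a name is kept, unlike with strtok($raw, "^^"):
print_r(split_categories('A^B^^C'));
```

Returning a plain array also lets the caller loop with `foreach` instead of tracking a token count separately.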

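  That is the bulk of the programs. Each site needs its own adjustments in practice, especially the checks and processing applied to the text field after it is read. The code above is also messy: debugging statements used along the way were simply commented out and left in place, so treat it as a reference only.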
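
One adjustment worth considering for all three scripts: the offset and limit come from the command line and end up inside the SQL string, so it is safer to validate them as non-negative integers first. A minimal sketch follows; the function name `parse_offset_limit()` is hypothetical, and the 50,000 cap simply reflects the batch-size advice in the program header comments.

```php
<?php
// Hypothetical helper: validate offset and limit taken from the command line.
// Throws InvalidArgumentException on anything that is not a plain non-negative integer.
function parse_offset_limit($offset, $limit, $max_limit = 50000) {
    foreach (array('offset' => $offset, 'limit' => $limit) as $label => $value) {
        if (!is_string($value) || !ctype_digit($value)) {
            throw new InvalidArgumentException("$label must be a non-negative integer");
        }
    }
    $offset = (int) $offset;
    $limit = (int) $limit;
    if ($limit < 1 || $limit > $max_limit) {
        throw new InvalidArgumentException("limit must be between 1 and $max_limit");
    }
    return array($offset, $limit);
}

list($offset, $limit) = parse_offset_limit('0', '10000');
echo "offset=$offset, limit=$limit\n";
```

In the scripts this would be called with the corresponding `$argv` entries. Drupal 7 also provides `db_query_range()`, which takes the offset and count as separate parameters rather than as part of the SQL text.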
