Full-Text Search Engine Solr Series: Integrating MySQL and MongoDB
Published: 2019-06-28


MySQL

  1. Copy mysql-connector-java-5.1.25-bin.jar into E:\solr-4.8.0\example\solr-webapp\webapp\WEB-INF\lib.
  2. Edit E:\solr-4.8.0\example\solr\collection1\conf\solrconfig.xml and register the data import handler:
    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>
  3. Load the DataImportHandler library. In the same solrconfig.xml, add
    <lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar"/>
     before the existing line
    <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
  4. Create E:\solr-4.8.0\example\solr\collection1\conf\data-config.xml, specifying the MySQL database URL, the user name and password, and the table to index:
    <?xml version="1.0" encoding="UTF-8"?>
    <dataConfig>
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost:3306/django_blog"
                  user="root"
                  password=""/>
      <document name="blog">
        <entity name="blog_blog" pk="id"
                query="select id,title,content from blog_blog"
                deltaImportQuery="select id,title,content from blog_blog where ID='${dataimporter.delta.id}'"
                deltaQuery="select id from blog_blog where add_time > '${dataimporter.last_index_time}'"
                deletedPkQuery="select id from blog_blog where id=0">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="content" name="content"/>
        </entity>
      </document>
    </dataConfig>
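    • query: the SQL statement used for the initial import into the index.
      • When the table is very large (tens of millions of rows, say), it cannot be indexed in a single pass and has to be imported in batches. In that case the query takes two request parameters: ${dataimporter.request.length} and ${dataimporter.request.offset}.
      • query="select id,title,content from blog_blog limit ${dataimporter.request.length} offset ${dataimporter.request.offset}"
      • Request: http://localhost:8983/solr/collection2/dataimport?command=full-import&commit=true&clean=false&offset=0&length=10000
    • deltaImportQuery: fetches, by ID, the single row that needs to be indexed.
    • deltaQuery: the SQL statement for delta (incremental) indexing; it returns the IDs of the rows that need to be re-indexed.
    • deletedPkQuery: returns the IDs of the documents that should be deleted from the index.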
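
The batched full import described above can be scripted. The following is a minimal sketch rather than part of the original setup: it only builds the request URLs for successive batches. The function name batch_import_urls, the total row count, and the batch size are assumptions made for illustration; the endpoint and parameters follow the request shown above.

```python
from urllib.parse import urlencode


def batch_import_urls(base_url, total_rows, batch_size):
    """Build one full-import URL per batch of rows.

    base_url is the DataImportHandler endpoint (taken from the request
    shown in the article). total_rows and batch_size are assumed values
    that the caller has to supply.
    """
    urls = []
    for offset in range(0, total_rows, batch_size):
        params = {
            "command": "full-import",
            "commit": "true",
            # clean=false keeps the documents indexed by earlier batches
            "clean": "false",
            "offset": offset,
            "length": batch_size,
        }
        urls.append(base_url + "?" + urlencode(params))
    return urls
```

Each URL would then be requested in turn, for example with urllib.request.urlopen. DataImportHandler runs an import in the background, so a script would normally check command=status before sending the next batch.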
  5. Define a field for each database column by editing E:\solr-4.8.0\example\solr\collection1\conf\schema.xml:
    <!-- mysql -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_cn" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="content" type="text_cn" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <!-- mysql -->
  6. Configure the delta index update file.
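
A delta import, which relies on the deltaQuery and deltaImportQuery defined in data-config.xml, is requested through the same /dataimport handler with command=delta-import. The following is a minimal sketch of such a request and not the update file this step refers to. The function names are hypothetical, and the endpoint http://localhost:8983/solr/collection1/dataimport is an assumption based on the paths used in this article.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def delta_import_url(base_url):
    """Return the URL that asks DataImportHandler to run a delta import."""
    params = {"command": "delta-import", "commit": "true"}
    return base_url + "?" + urlencode(params)


def trigger_delta_import(base_url, timeout=10):
    """Send the delta-import request and return the raw response body.

    Requires a running Solr instance; base_url is an assumed endpoint such as
    http://localhost:8983/solr/collection1/dataimport.
    """
    with urlopen(delta_import_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling trigger_delta_import on a schedule (a cron job or a Windows scheduled task, for instance) is one way to keep the index close to the database.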

References:

MongoDB

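    1. Install mongo-connector, preferably manually from source:
      git clone https://github.com/10gen-labs/mongo-connector.git
      cd mongo-connector
      # Before installing, edit mongo_connector/constants.py and set DEFAULT_COMMIT_INTERVAL = 0
      python setup.py install

      By default, mongo-connector does not commit automatically. Setting the interval to 0 turns on automatic commits; without it, changes in the MongoDB database are not reflected in the index right away. There may be a command-line option that controls this as well, but I have not found one yet.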

    2. Configure schema.xml: add the MongoDB fields that need to be indexed to the schema.xml file:
      <?xml version="1.0" encoding="UTF-8"?>
      <schema name="example" version="1.5">
        <field name="_version_" type="long" indexed="true" stored="true"/>
        <field name="_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
        <field name="body" type="string" indexed="true" stored="true"/>
        <field name="title" type="string" indexed="true" stored="true" multiValued="true"/>
        <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
        <uniqueKey>_id</uniqueKey>
        <defaultSearchField>title</defaultSearchField>
        <solrQueryParser defaultOperator="OR"/>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
        <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldType>
      </schema>
    3. Start mongod:
      mongod --replSet myDevReplSet --smallfiles

      Then initialize the replica set from the mongo shell: rs.initiate()

    4. Start mongo-connector:
      E:\Users\liuzhijun\workspace\mongo-connector\mongo_connector\doc_managers>mongo-connector -m localhost:27017 -t http://localhost:8983/solr/collection2 -n s_soccer.person -u id -d ./solr_doc_manager.py
      • -m: address of the mongod server
      • -t: URL of the Solr server
      • -n: MongoDB namespace to watch, in the form database.collection; separate multiple namespaces with commas
      • -u: the unique key
      • -d: the doc manager file that processes the documents

      Note: MongoDB normally uses _id as the unique key, while Solr typically uses id. If nothing is done about this mismatch, indexing will fail. There are two ways to handle it:

      1. Pass --unique-key=id to mongo-connector, so that Mongo Connector maps _id to id.
      2. In schema.xml, replace
        <uniqueKey>id</uniqueKey>

        with

        <uniqueKey>_id</uniqueKey>

        and also define an _id field:

        <field name="_id" type="string" indexed="true" stored="true" />

      If mongo-connector reports the following error at startup:
        2014-06-18 12:30:36,648 - ERROR - OplogThread: Last entry no longer in oplog cannot recover! Collection(Database(MongoClient('localhost', 27017), u'local'), u'oplog.rs')

        clear the contents of E:\Users\liuzhijun\workspace\mongo-connector\mongo_connector\doc_managers\config.txt, delete the files in the index directory, and restart.
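
The unique-key mapping behind the first option can be pictured with a few lines of Python. This is an illustration of the idea only, not mongo-connector's actual implementation; the function name rename_unique_key and the sample document are hypothetical.

```python
def rename_unique_key(doc, unique_key="id"):
    """Return a copy of a MongoDB document with _id stored under unique_key."""
    renamed = dict(doc)  # shallow copy so the caller's document is untouched
    if unique_key != "_id" and "_id" in renamed:
        # Store the key as a string, matching a Solr field of type "string"
        renamed[unique_key] = str(renamed.pop("_id"))
    return renamed
```

With the second option (uniqueKey set to _id in schema.xml) no such renaming is needed, which is why the schema above defines an _id field.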

    5. Test
      Any data changes in MongoDB are now synchronized to Solr.
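
One way to check the synchronization is to query Solr after changing a document in MongoDB and look at the hit count. The sketch below assumes the collection2 core used in the mongo-connector command above and Solr's standard JSON response layout; the function names and the sample query are hypothetical.

```python
import json
from urllib.parse import urlencode


def select_url(solr_url, query):
    """Build a Solr /select URL that returns JSON."""
    return solr_url + "/select?" + urlencode({"q": query, "wt": "json"})


def num_found(response_text):
    """Extract the hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]
```

Fetching select_url("http://localhost:8983/solr/collection2", "title:solr") with urllib.request.urlopen and passing the body to num_found before and after an insert in MongoDB should show the count change once the connector has committed.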

Reposted from: http://eezml.baihongyu.com/
