
Instant and Incremental Full-Text Search Engine

Author: xihuyu
Posted: 2016-05-18 14:19:44

Description

libibase is a library for building search engines that index instantly and incrementally.

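hibase is a complete search engine system built on the inverted-index library libibase, with libsbase as its underlying communication library.

Incremental, online, real-time indexing, with real-time updates of int/long/double fields

BM25 ranking algorithm

Chinese phrase search with contextual proximity (position) annotation

Multi-field search

Result grouping (group)

Data risk/security filtering (preprocessing)

Custom cache duration

Custom scoring base

int/long/double range filtering

int/long bit operations (masking, filtering)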
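
To make the ranking and filtering features above more concrete, here is a minimal Python sketch of Okapi BM25 scoring combined with an int range filter and a bit-mask filter. It is not taken from libibase: the function names, the parameters k1 and b, the tuple layout of the filters, and the rule that a document passes the mask filter when any masked bit is set are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score tokenized documents against query terms with Okapi BM25.

    docs maps doc_id -> list of tokens. k1 and b are the usual BM25
    tuning parameters (values here are common defaults, not libibase's).
    """
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[doc_id] = score
    return scores


def passes_filters(fields, int_range=None, bit_mask=None):
    """Hypothetical numeric filters applied to one document's fields.

    int_range is (field_name, low, high), inclusive on both ends.
    bit_mask is (field_name, mask); the document passes when at least
    one masked bit is set (an assumed semantics).
    """
    if int_range is not None:
        name, low, high = int_range
        if not (low <= fields[name] <= high):
            return False
    if bit_mask is not None:
        name, mask = bit_mask
        if (fields[name] & mask) == 0:
            return False
    return True
```

A caller would typically score first and then drop the documents whose numeric fields fail the filters, or filter first to avoid scoring documents that cannot be returned.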

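Program components

hibase consists mainly of hidocd, himasterd, hindexd, hiqparserd, and hiqdocd:

hidocd: processes data, builds the forward index, and distributes it;

himasterd: merge node for distributed search;

hindexd: index/search node;

hiqparserd: query parser node, which converts a query into the IQUERY structure needed for search;

hiqdocd: document summary node.

hiqdocd, hiqparserd, and hindexd are in fact one program that performs different functions depending on its configuration. hindexd can serve searches on a single node, and it is also responsible for receiving documents from hidocd and building the inverted index.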
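
The idea of one program taking on different roles depending on its configuration can be sketched as follows in Python. This only illustrates the pattern; the "role" configuration key, the handler functions, and their return strings are hypothetical and do not reflect the actual hibase configuration format.

```python
def run_index(request):
    # Stand-in for the index/search role.
    return f"index/search: {request}"


def run_qparser(request):
    # Stand-in for the query parser role.
    return f"parse query: {request}"


def run_qdoc(request):
    # Stand-in for the document summary role.
    return f"build summary: {request}"


# One binary, three roles: the configuration picks the handler.
ROLES = {
    "hindexd": run_index,
    "hiqparserd": run_qparser,
    "hiqdocd": run_qdoc,
}


def make_node(config):
    """Return the request handler selected by config["role"] (hypothetical key)."""
    role = config["role"]
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return ROLES[role]
```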

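Indexing process

Data generation: for the file format, see the {FHEADER} structure in ibase.h. hidocd reads the generated file sequentially, converts it to forward-index format, and stores it in a local db (db.c);

Data distribution: hidocd provides an HTTP management interface through which data nodes can be added; it then distributes data to the corresponding hiqparserd, hiqdocd, and hindexd nodes according to the task;

Data indexing: hindexd receives the data and builds the local inverted index.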
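
As a rough illustration of what incremental, real-time indexing means for the last step, the following Python sketch keeps an in-memory inverted index in which a document becomes searchable as soon as it is added and numeric fields can be updated in place. The class and method names are hypothetical; the on-disk layout and data structures used by libibase are not described on this page and are not modeled here.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy in-memory inverted index with immediate visibility of updates."""

    def __init__(self):
        # term -> {doc_id: [token positions]}
        self.postings = defaultdict(dict)
        # doc_id -> {field name: numeric value}
        self.fields = {}

    def add(self, doc_id, tokens, fields=None):
        # Re-adding a document replaces its previous version.
        self.remove(doc_id)
        for position, term in enumerate(tokens):
            self.postings[term].setdefault(doc_id, []).append(position)
        self.fields[doc_id] = dict(fields or {})

    def remove(self, doc_id):
        for term in list(self.postings):
            self.postings[term].pop(doc_id, None)
            if not self.postings[term]:
                del self.postings[term]
        self.fields.pop(doc_id, None)

    def update_field(self, doc_id, name, value):
        # Numeric fields change without touching the postings.
        self.fields[doc_id][name] = value

    def search(self, term):
        return sorted(self.postings.get(term, {}))
```

Storing token positions alongside each posting is what makes proximity and phrase matching possible later, at the cost of a larger index.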

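Search process

The requester passes its parameters through the HTTP interface provided by himasterd;

himasterd performs the key => id conversion for this request and then sends the request string to hiqparserd;

hiqparserd processes the request parameters, tokenizes the query, converts it into an IQUERY structure, and returns it to himasterd;

himasterd forwards the IQUERY structure to hindexd;

hindexd searches locally according to the IQUERY and returns the top-K results as {IRES}+{IRECORD};

himasterd merge-sorts the results returned by hindexd and caches them if requested (the cache duration can be set in the request);

himasterd sends a request to hiqdocd according to the from and count parameters;

hiqdocd generates dynamic summaries for the request and returns them in JSON format;

himasterd returns the result in JSON format (the request can specify whether summaries are needed).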
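
The merge, cache, and paging steps of this flow can be sketched in Python as below. This is a simplified model under stated assumptions: index nodes are plain callables returning (score, doc_id) pairs, query parsing and summary generation are left out, and the class name, parameter names (start, count, top_k, cache_ttl), and JSON field names are hypothetical rather than the actual himasterd interface.

```python
import heapq
import json
import time


class MergeNode:
    """Toy merge node: fan out, merge top-K, cache, then page the results."""

    def __init__(self, index_nodes, clock=time.monotonic):
        # Each index node is a callable: query -> list of (score, doc_id).
        self.index_nodes = index_nodes
        self.clock = clock
        # (query, top_k) -> (expiry time, merged results)
        self.cache = {}

    def merged(self, query, top_k, cache_ttl):
        now = self.clock()
        hit = self.cache.get((query, top_k))
        if hit and hit[0] > now:
            return hit[1]
        candidates = []
        for node in self.index_nodes:
            candidates.extend(node(query))
        # Highest scores first, limited to the global top-K.
        results = heapq.nlargest(top_k, candidates)
        if cache_ttl > 0:
            self.cache[(query, top_k)] = (now + cache_ttl, results)
        return results

    def search(self, query, start=0, count=10, top_k=100, cache_ttl=0):
        results = self.merged(query, top_k, cache_ttl)
        page = results[start:start + count]
        return json.dumps({
            "total": len(results),
            "hits": [{"id": doc_id, "score": score} for score, doc_id in page],
        })
```

Caching the merged list rather than a single page means later pages of the same query can be served without contacting the index nodes again until the cache entry expires.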

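Installation and configuration

Basic communication libraries: libevbase, libsbase http://code.google.com/p/sbase/downloads/list

Inverted-index library: libibase http://code.google.com/p/libibase/downloads/list

Search system: hibase http://code.google.com/p/libibase/downloads/list

Download the latest versions in the order listed above and build each one: ./configure --prefix=/usr && make

Prebuilt rpms (CentOS 5.5) are also available for reference;

For questions, contact me by mail/MSN: sounos@gmail.com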


File list

Name                                              Size       Date
01.96 kB
bigtable-osdi06.pdf                               216.03 kB  2007-08-07 18:12
gfs-sosp2003.pdf                                  269.47 kB  2007-08-07 18:12
google_clusters.pdf                               146.96 kB  2005-11-21 22:16
google搜索原理论文                                75.50 kB   2012-05-11 04:46
Hybrid.pdf                                        219.57 kB  2010-10-08 15:57
Incremental.pdf                                   1.31 MB    2010-10-08 16:01
Information_Extraction.pdf                        2.35 MB    2010-04-06 02:03
Inverted.pdf                                      928.95 kB  2010-10-08 16:01
irbookonlinereading.pdf                           6.58 MB    2009-04-01 00:34
mapreduce-osdi04.pdf                              186.24 kB  2007-08-07 18:15
mapreduce.pdf                                     1.12 MB    2009-05-05 03:51
SearchEngineDesignandImplementation.pdf           7.58 MB    2012-05-11 20:26
Searching.the.Web.pdf                             1.67 MB    2012-09-06 04:24
The                                               1.17 MB    2009-03-23 10:23
The_C_Programming_Language_Ritchie_kernighan.pdf  1.21 MB    2009-03-24 00:07
Top783.19 kB                                                 2009-01-16 06:22
writeos-1.0-2.pdf                                 3.61 MB    2008-11-29 03:10
指针经验总结(经典%2C非常详细).pdf                 214.41 kB  2008-08-22 03:29
检索参数.pdf                                      66.20 kB   2012-05-16 21:21
...

Comments

zzc21321321
2017-10-25

Great work. Building on an inverted index can greatly improve search efficiency; this is a valuable reference.


Download: Instant and Incremental Full-Text Search Engine (23.40 MB)
