A web crawler program developed with C#
2016-08-23
Shotsearch is a web crawler program developed with C#. Its kernel includes the crawler, storage, web page post-processing, index generation, and other modules. While crawling, you can flexibly define a variety of rules to filter URLs. A built-in queue of pending URLs can grow without limit, and you can pause or stop crawling at any time. Crawled pages can be stored in blocks split by time or by a specified size (a custom large-file system). For page processing there is a built-in rule engine that extracts or filters text with regular expressions you write, then stores the useful information in a database (through a general database interface) or generates an index (Lucene, XML, and Hubble.NET are supported). A Chinese word segmentation module that supports Lucene is built in. The program is based on Quartz.NET: each step (crawling, processing, index generation) is a job, and jobs can be flexibly combined and extended through XML. Multiple interfaces are built in, so developers can develop and replace any module at any time according to their own needs.
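The rule-filtered URL queue and the regular-expression extraction described above can be pictured with a short sketch. It is written in Python for brevity and is not taken from the Shotsearch source (which is C#); the names `Frontier`, `extract_title`, and `TITLE_RULE`, and the example URLs and patterns, are all hypothetical.

```python
import re
from collections import deque


class Frontier:
    """Unbounded FIFO queue of pending URLs, gated by regex allow/deny rules."""

    def __init__(self, allow, deny=()):
        self.allow = [re.compile(p) for p in allow]
        self.deny = [re.compile(p) for p in deny]
        self.pending = deque()   # no maxlen, so it grows as links are found
        self.seen = set()        # avoids queuing the same URL twice
        self.paused = False      # set to True to pause crawling

    def accepts(self, url):
        # A URL must match no deny rule and at least one allow rule.
        if any(rule.search(url) for rule in self.deny):
            return False
        return any(rule.search(url) for rule in self.allow)

    def add(self, url):
        # Queue the URL only if it is new and passes the rules.
        if url not in self.seen and self.accepts(url):
            self.seen.add(url)
            self.pending.append(url)
            return True
        return False

    def next_url(self):
        # Returns None while paused or when nothing is left to crawl.
        if self.paused or not self.pending:
            return None
        return self.pending.popleft()


# Hypothetical extraction rule: pull the page title with a regular expression.
TITLE_RULE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    match = TITLE_RULE.search(html)
    return match.group(1).strip() if match else None
```

A crawler loop under these assumptions would call `add` for every link it discovers and `next_url` to pick the next page, so pausing only requires setting `paused` and the queue contents survive until crawling resumes.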
c#
crawler
network
program
development
Related Source Codes
No. 186: DX0110- Source code for community propert
No. 219: DX0149- Source code for community propert
Verification code identification
CSV data analysis tool
Source code of hospital medical record information