Profile
No self-introduction
codes (2)
A web crawler program developed in C#
no vote
Shotsearch is a web crawler program developed in C#. Its core includes crawling, storage, web page post-processing, index generation, and more. During crawling, you can flexibly define a variety of rules and filter URLs. A built-in follow-up queue module can grow without bound, and crawling can be paused or stopped at any time. Crawled pages can be stored in blocks split by time period or by a specified size (a custom large-file system). For page processing, a built-in rule engine lets you flexibly extract or filter text by writing regular expressions, then store the useful information in a database (through a general database interface) or generate an index (Lucene, XML, and Hubble.NET are supported). A Chinese word segmentation module that supports Lucene is built in. Scheduling is based on Quartz.NET: each step (crawling, processing, index generation) is a job, and jobs can be flexibly combined and extended via XML. Multiple interfaces are built in, so developers can develop and replace any module at any time to suit their own needs.
scq0123
2016-08-23
0
1
Multithreaded web crawler
no vote
NCrawler is a simple and highly efficient multithreaded web crawler. It is developed in C# and built around a pipeline-based processor. It includes HTML, text, PDF, and IFilter document processors, plus language detection (Google). Pipeline steps can easily be added to extract, use, and modify information.
scq0123
2013-10-23
0
1