Semantic text extraction in web pages
2016-08-23
Application background
We wrote this code as part of our minor project for the Semantic Web Technologies course at our college.
It is a very basic attempt to remove advertisements from a web page and show only the relevant text. We strip ads, Flash content, JavaScript, and similar elements, and keep only the text for display. The code is written in Python because its many libraries reduce the amount of code the programmer has to write.
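The stripping step described above can be illustrated with a minimal sketch. This is not the project's actual code: it assumes only the Python standard library (`html.parser`), and the names `SKIP_TAGS`, `TextExtractor`, and `extract_text` are hypothetical, chosen for illustration. It drops the contents of script, style, and plugin containers and keeps the remaining text; it does not try to decide which of the remaining text is relevant.

```python
from html.parser import HTMLParser

# Assumed list of container tags whose contents are never readable text.
# Void elements such as <embed> are left out on purpose: they have no end
# tag, so counting them would leave the skip depth stuck above zero.
SKIP_TAGS = {"script", "style", "noscript", "object", "iframe"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped container
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `extract_text(page_source)` on a downloaded page returns its text with scripts, styles, and embedded objects removed. Navigation links and text-based ads would still come through, so a real filter needs an additional relevance step on top of this.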
Key Technology
The Web has become the largest information source, with billions of pages. However, a web page usually contains content that is irrelevant to its main topic. For example,
there are many multimedia advertising segments, unnecessary images, and navigation links
in Web pages. These parts can seriously harm Web data mining, distract users from the main
topic, and influence PageRank. There are some existing approaches to discover informative
python
web page
extraction
text
semantics