Converting PDF to Text in C#
2016-08-23
Translated by maninwest@Codeforge. Author: Dan Letecky @CodeProject
There are three main ways to extract text from PDF files in .NET:
Microsoft IFilter interface and Adobe IFilter implementation
iTextSharp
PDFBox
None of these PDF parsing solutions is perfect. We will discuss all these methods below.
1. Parsing PDF using Adobe PDF IFilter
To parse PDF files using the IFilter interface, you need the following:
Windows 2000 or later
Adobe Acrobat or Reader 7.0.5+ (or the standalone Adobe PDF IFilter [adobe.com])
IFilter COM wrapper class [dotlucene.net]
Sample code:
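using IFilter;

// ...

// Returns the plain text of the PDF at `path`, using the Adobe PDF IFilter
// through the COM wrapper's DefaultParser.
public static string ExtractTextFromPdf(string path)
{
    return DefaultParser.Extract(path);
}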
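The sample converts one file per call. If you need to convert a whole folder, the same call can be wrapped in a small loop. The following is a minimal sketch, not part of the original sample project: PdfBatchConverter and ConvertFolder are hypothetical names, and the extractor is passed in as a Func<string, string> so the sketch compiles without the IFilter wrapper. We assume DefaultParser.Extract can be plugged in directly, since in the sample above it takes a file path and returns a string.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the IFilter wrapper or the sample project):
// batch-converts the PDF files in one folder using a caller-supplied extractor.
public static class PdfBatchConverter
{
    // Writes a .txt file next to every .pdf file directly inside `folder`
    // (subfolders are not searched) and returns how many files were converted.
    // `extractText` maps a PDF path to its text; assumption: the wrapper's
    // DefaultParser.Extract can be passed here because it has that shape.
    public static int ConvertFolder(string folder, Func<string, string> extractText)
    {
        if (folder == null) throw new ArgumentNullException(nameof(folder));
        if (extractText == null) throw new ArgumentNullException(nameof(extractText));

        int converted = 0;

        // GetFiles returns a snapshot array, so the .txt files created below
        // are not picked up mid-loop. The extension check is case-insensitive
        // so "Report.PDF" is also handled on case-sensitive file systems.
        foreach (string path in Directory.GetFiles(folder))
        {
            if (!string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase))
                continue;

            string text = extractText(path);
            string txtPath = Path.ChangeExtension(path, ".txt");

            // File.WriteAllText(string, string) writes UTF-8 without a BOM.
            File.WriteAllText(txtPath, text);
            converted++;
        }

        return converted;
    }
}
```

With the wrapper referenced, a call might look like int count = PdfBatchConverter.ConvertFolder(@"C:\docs", DefaultParser.Extract); where the folder path is only a placeholder. Passing the extractor as a delegate also makes it easy to swap in a different PDF parser later without changing the loop.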
Download a sample project:
Parsing PDF Files using IFilter [squarepdf.net]