I'm using PDFSharp to read a PDF document created in Crystal Reports.
I referenced the PDFSharp libraries from a VB.NET project.
As the code below shows, I load the text from the page's content elements into a textbox.
Code:
' Requires at the top of the file: Imports PdfSharp.Pdf and Imports PdfSharp.Pdf.IO
Dim SourceFileName As String = "D:\vbpROJECTS\firstpagetest.pdf"
Dim Inputdocument As PdfDocument = PdfReader.Open(SourceFileName, PdfDocumentOpenMode.Import)
Dim TextExtract As String
'MsgBox(Inputdocument.Pages(0).Contents.Elements.Count)
' Pages is zero-based, so Pages(1) is the second page of the document
TextExtract = Inputdocument.Pages(1).Contents.Elements.GetDictionary(0).Stream.ToString()
TextBox1.Clear()
TextBox1.Text = TextExtract
The textbox is then populated with the following text:
Td
(.*/$%%&) Tj
0 -220 Td
(.8*.%%&) Tj
0 -220 Td
(.8*7%%&) Tj
0 -220 Td
(.8*7%%') Tj
0 -220 Td
(.8*;%%&) Tj
0 -220 Td
(.8*-%%&) Tj
0 -220 Td
(.8*-%%') Tj
0 -220 Td
(.8*-%%@) Tj
0 -220 Td
(.8*-%%B) Tj
The actual text is the string in parentheses between each Td and Tj operator.
Tj means "show text" according to Adobe's PDF specification (see chapter 9).
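To work with just those strings, something like the following might do. It is only a rough sketch: ExtractTjStrings is a made-up helper name, not part of PDFSharp, and the regular expression assumes the strings contain no escaped or nested parentheses.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TjStringSketch
    ' Hypothetical helper (not a PDFSharp API): returns the text inside
    ' each "( ... ) Tj" pair found in a content stream string.
    ' Assumption: no escaped or nested parentheses inside the string operand.
    Function ExtractTjStrings(content As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(content, "\(([^()]*)\)\s*Tj")
            results.Add(m.Groups(1).Value)
        Next
        Return results
    End Function

    Sub Main()
        ' Two lines copied from the textbox output shown above
        Dim sample As String = String.Join(Environment.NewLine,
            "(.*/$%%&) Tj", "0 -220 Td", "(.8*.%%&) Tj")
        For Each s As String In ExtractTjStrings(sample)
            Console.WriteLine(s)
        Next
    End Sub
End Module
```

In the real program the input would be the TextExtract string rather than the hard-coded sample.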
If I copy the markup into Excel, column 1 is the actual text you see when the file is open in Adobe Reader, and next to it is the strange encoding (strange to me).

There's a pattern!
D = .
A = *
U = /
T = $
0 = %
1 = &
The double %% gave it away.
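Applying that pattern in code would look roughly like this. It is a sketch only: it uses just the six pairs I have worked out so far, DecodeWithObservedMap is a made-up name, and I am assuming (without knowing) that the same substitution holds for the rest of the document. Any character not in the table comes out as "?".

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module ObservedMapSketch
    ' Encoded character -> visible character.
    ' Contains only the six pairs observed so far; it is not a complete table.
    Private ReadOnly ObservedMap As New Dictionary(Of Char, Char) From {
        {"."c, "D"c}, {"*"c, "A"c}, {"/"c, "U"c},
        {"$"c, "T"c}, {"%"c, "0"c}, {"&"c, "1"c}
    }

    ' Hypothetical helper: replaces each known encoded character with its
    ' visible counterpart and marks unknown characters with "?".
    Function DecodeWithObservedMap(encoded As String) As String
        Dim sb As New StringBuilder()
        For Each ch As Char In encoded
            Dim plain As Char
            If ObservedMap.TryGetValue(ch, plain) Then
                sb.Append(plain)
            Else
                sb.Append("?"c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(DecodeWithObservedMap(".*/$%%&")) ' prints DAUT001
        Console.WriteLine(DecodeWithObservedMap(".8*.%%&")) ' prints D?AD001 (8 is not in the table yet)
    End Sub
End Module
```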
The reason I'm asking is that I need to find specific text in a PDF document and then split the document up. I'm familiar with how to do all the splitting and creating of new documents.
This only happens when I read a PDF created in Crystal Reports. If I create a Word document and save it as PDF, the text in the parentheses between Td and Tj is clear text.
Is anyone familiar with this?
I don't want to use iTextSharp.