Solarion
Hey guys. I am busy experimenting with using C# to read Microsoft Word documents. It is related to a task I may or may not have coming up around July.
So I will not be able to use Interop on the server itself, for security reasons. I have tried all day with Open XML and also with DOCX, but so far nothing has achieved what I want.
I have a Word document. In this document is a table which has 12 rows and a single column. Each row has some text, and its text is numbered 1, 2, 3... up to 12 respectively. So 1.xxxx, 2.xxxx etc.
Now here is the catch. The number is auto-generated by Microsoft Word. It only displays when the document is opened with Microsoft Word. If I use a C# console app to read in this document using either Open XML or DOCX, the numbers are gone, and the text in each row is all I see when displayed in the console. I've tried this with all of the free packages, all with the same result.
After much digging and head scratching, it turns out that these numbers are not actually stored in a Word document as text, but as formatting elements. I have tried with both packages to read these formatting elements, to no avail. They simply do not have this capability.
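For anyone following along, here is a rough sketch of what "formatting elements" means in practice. A .docx is a zip container, and a list paragraph in word/document.xml carries a w:numPr element (with w:numId and w:ilvl) that points into word/numbering.xml, where the start value and the label pattern (for example "%1.") live. The sketch below rebuilds the labels by counting paragraphs per list. It uses only System.IO.Compression and System.Xml.Linq, not the Open XML SDK or DOCX. The names ListNumberSketch, LoadPart, ReadNumberedText and FindLevel are made up for illustration. It assumes plain decimal numbering, and it ignores w:numFmt, level overrides, restarts, multi-level patterns such as "%1.%2.", and numbering inherited from a paragraph style, so treat it as a starting point and not a complete solution.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, named for illustration only.
public static class ListNumberSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Loads one XML part (e.g. "word/document.xml") out of the .docx zip container.
    public static XDocument LoadPart(string docxPath, string partName)
    {
        using ZipArchive zip = ZipFile.OpenRead(docxPath);
        ZipArchiveEntry entry = zip.GetEntry(partName)
            ?? throw new FileNotFoundException($"Part not found: {partName}");
        using Stream stream = entry.Open();
        return XDocument.Load(stream);
    }

    // Returns the text of every paragraph, prefixed with a reconstructed
    // list label when the paragraph carries w:numPr directly.
    // Assumption: decimal numbering only; w:numFmt is not inspected.
    public static List<string> ReadNumberedText(XDocument document, XDocument numbering)
    {
        var counters = new Dictionary<string, int>(); // key: "numId:ilvl"
        var lines = new List<string>();

        foreach (XElement p in document.Descendants(W + "p"))
        {
            string text = string.Concat(p.Descendants(W + "t").Select(t => t.Value));
            var numPr = p.Element(W + "pPr")?.Element(W + "numPr");
            var numId = numPr?.Element(W + "numId")?.Attribute(W + "val")?.Value;

            // numId "0" means numbering is switched off for this paragraph.
            if (numPr == null || numId == null || numId == "0")
            {
                lines.Add(text);
                continue;
            }

            string ilvl = numPr.Element(W + "ilvl")?.Attribute(W + "val")?.Value ?? "0";
            string placeholder = "%" + (int.Parse(ilvl) + 1);

            XElement? lvl = FindLevel(numbering, numId, ilvl);
            int start = (int?)lvl?.Element(W + "start")?.Attribute(W + "val") ?? 1;
            string pattern = lvl?.Element(W + "lvlText")?.Attribute(W + "val")?.Value
                ?? placeholder + ".";

            // First paragraph of a list gets the start value, later ones count up.
            string key = numId + ":" + ilvl;
            int n = counters.TryGetValue(key, out int prev) ? prev + 1 : start;
            counters[key] = n;

            lines.Add(pattern.Replace(placeholder, n.ToString()) + " " + text);
        }
        return lines;
    }

    // Follows w:num -> w:abstractNumId -> w:abstractNum -> w:lvl in numbering.xml.
    private static XElement? FindLevel(XDocument numbering, string numId, string ilvl)
    {
        var abstractId = numbering.Descendants(W + "num")
            .FirstOrDefault(n => (string?)n.Attribute(W + "numId") == numId)
            ?.Element(W + "abstractNumId")?.Attribute(W + "val")?.Value;
        if (abstractId == null) return null;

        return numbering.Descendants(W + "abstractNum")
            .Where(a => (string?)a.Attribute(W + "abstractNumId") == abstractId)
            .Elements(W + "lvl")
            .FirstOrDefault(l => (string?)l.Attribute(W + "ilvl") == ilvl);
    }
}
```

To try it, load "word/document.xml" and "word/numbering.xml" with LoadPart and pass both to ReadNumberedText. Paragraphs inside table cells are picked up too, since the loop walks all w:p descendants. If the labels still come out missing, the numbering is probably attached through a paragraph style in word/styles.xml, which this sketch does not look at.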
I tried converting the document to text, html, even alien, but nothing helps. So at this point I am concluding that this project may not actually meet a feasibility analysis, which is the point it's at now. It's not a big project, not at all. But it is one of those things which could come in handy if accomplished. Which may be wishful thinking.
TL;DR: Well, hopefully you read at least a little bit, but essentially I just want to know if any of you have come across having to do this in the past?