Word question

tcofran

I know this is a long shot.....

I have a lot of documents (.doc and .docx) containing coordinates for sites. The information is in the same place in every document. I need (want) to plot the sites in GIS. Is there any software that can extract the info from the documents in batch and add it to a spreadsheet?

Thanks

T
 
Can you give an example of how the word doc is formatted?
 
The coordinates are easier - standard format that can be picked up using wildcard characters. You could probably create a macro that manually searches the document for those strings that contain coordinates and copy them across to a text file. The blue box at the top is trickier.

Trying to think of a way to do this without having to write any code.
:unsure:
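
For anyone who does end up writing code: here is a minimal C# sketch of the same pattern-matching idea, using a regular expression instead of Word's wildcard search. The coordinate format is an assumption (signed decimal degrees such as "-25.7461, 28.1881"), because the sample document is not shown in the thread. The class name CoordinateFinder and the sample text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class CoordinateFinder
{
    // ASSUMED format: two signed decimal numbers separated by a comma or
    // semicolon, e.g. "-25.7461, 28.1881". Adjust the pattern to the real documents.
    // The lookbehind stops a match from starting in the middle of a longer number.
    static readonly Regex Pair = new Regex(
        @"(?<![\d.])(?<lat>-?\d{1,2}\.\d+)\s*[,;]\s*(?<lon>-?\d{1,3}\.\d+)");

    public static List<(double Lat, double Lon)> Find(string text)
    {
        var found = new List<(double Lat, double Lon)>();
        foreach (Match m in Pair.Matches(text))
        {
            double lat = double.Parse(m.Groups["lat"].Value, CultureInfo.InvariantCulture);
            double lon = double.Parse(m.Groups["lon"].Value, CultureInfo.InvariantCulture);

            // Discard pairs that cannot be a valid latitude/longitude.
            if (Math.Abs(lat) <= 90 && Math.Abs(lon) <= 180)
                found.Add((lat, lon));
        }
        return found;
    }

    static void Main()
    {
        // Hypothetical sample text standing in for the body of one document.
        foreach (var (lat, lon) in Find("Site A: -25.7461, 28.1881"))
            Console.WriteLine(lat.ToString(CultureInfo.InvariantCulture) + "\t" +
                              lon.ToString(CultureInfo.InvariantCulture));
    }
}
```

If the documents use degrees, minutes and seconds instead, only the pattern and the parsing step would need to change.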
 
@saor, the blue box at the top only needs the site description, if that makes it easier.... Should have changed the highlighting
 
No...not sure how to do this without writing a VB script to automate the process.
 
Looks like a table.
Tables from Word can be copied and pasted into Excel and retain the table layout.
It is then simple to get the cells that contain the data i.e. B4, D15 and F15
 
Sure, but how do you automate that process and parse the data for what sounds like a lot of long documents?
 
A simple macro can delete sets of rows and move cells to a designated order.
i.e. delete rows 1 to 3, 5 to 13 and 15 to 20
then move cell A2 to A1, D2 to B1 and F2 to C1
delete row 2
Wash, rinse, repeat
 
Thanks for the effort guys, but ja, it is about 600 documents......
 
Look at this:

 
Can you send me an example of a 1 or 2 page doc? The data can be horse poo; it just needs to represent the format.
@Steamy Tom, please send me a PM with your e-mail address. I can't message you on here.

Thanks Dairyfarmer, will have a look
 
For the interest of those who offered other advice here, or for anyone struggling with this in future... I am not a C# dev, so don't chastise me :p

Code:
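using System;
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace wordExtractor
{
    class Program
    {
        static void Main(string[] args)
        {
            var output = new StringBuilder();

            // Process every .docx file in the folder that holds the executable.
            // Legacy .doc files are not Open XML and must be converted to .docx first.
            string[] fileArray = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.docx");

            foreach (var file in fileArray)
            {
                try
                {
                    // Open read-only (second argument false); nothing in the document is changed.
                    using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
                    {
                        // Find the first table in the document.
                        Table table = doc.MainDocumentPart.Document.Body.Elements<Table>().First();

                        // ElementAt is zero-based: row index 6 is the 7th row, cell index 1 the 2nd cell.
                        TableRow row = table.Elements<TableRow>().ElementAt(6);
                        TableCell cell = row.Elements<TableCell>().ElementAt(1);

                        TableRow row2 = table.Elements<TableRow>().ElementAt(17);
                        TableCell cell2 = row2.Elements<TableCell>().ElementAt(3);
                        TableCell cell3 = row2.Elements<TableCell>().ElementAt(5);

                        // Write the line only once all three cells were found,
                        // so a failure never leaves a half-written line behind.
                        output.Append(cell.InnerText).Append('\t')
                              .Append(cell2.InnerText).Append('\t')
                              .Append(cell3.InnerText).Append('\n');
                    }
                }
                catch (Exception)
                {
                    // No table, a missing row or cell, or an unreadable file: note it and carry on.
                    output.Append("issue with file: ").Append(file).Append('\n');
                }
            }

            // One tab-separated line per document, written next to the executable.
            string destPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "results.txt");
            File.WriteAllText(destPath, output.ToString());
        }
    }
}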
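
The results.txt written above is tab-separated, so it should open directly in a spreadsheet. If the GIS import wants a CSV with a header row instead, a rough sketch of the conversion could look like the following. The class name ResultsToCsv, the output name results.csv and the column names are all hypothetical; rename the columns to whatever the three cells actually hold.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ResultsToCsv
{
    // Usual CSV quoting: wrap the field in double quotes and double any embedded quotes.
    static string Quote(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Convert tab-separated lines to CSV lines, skipping blank lines and
    // the "issue with file: ..." entries written for documents that failed.
    public static List<string> ToCsvLines(IEnumerable<string> lines)
    {
        // HYPOTHETICAL header; the thread does not name the columns.
        var csv = new List<string> { "description,coord1,coord2" };
        foreach (var line in lines)
        {
            if (line.Length == 0 || line.StartsWith("issue with file: ", StringComparison.Ordinal))
                continue;
            csv.Add(string.Join(",", line.Split('\t').Select(f => Quote(f))));
        }
        return csv;
    }

    static void Main()
    {
        string dir = AppDomain.CurrentDomain.BaseDirectory;
        string[] lines = File.ReadAllLines(Path.Combine(dir, "results.txt"));
        File.WriteAllLines(Path.Combine(dir, "results.csv"), ToCsvLines(lines));
    }
}
```

The failed documents are dropped from the CSV on purpose, so it is worth checking results.txt for "issue with file" lines before trusting the plot.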
 