Telemetry data download and storage (FTP and web services)

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
All of it - How sensitive is this?

Mind mailing me the code? If not, post a sample of how you are parsing and which validation is failing.


PHP:
<?php

function libxml_display_error($error)
{
    $return = "<br/>\n";
    switch ($error->level) {
        case LIBXML_ERR_WARNING:
            $return .= "<b>Warning $error->code</b>: ";
            break;
        case LIBXML_ERR_ERROR:
            $return .= "<b>Error $error->code</b>: ";
            break;
        case LIBXML_ERR_FATAL:
            $return .= "<b>Fatal Error $error->code</b>: ";
            break;
    }
    $return .= trim($error->message);
    if ($error->file) {
        $return .= " in <b>$error->file</b>";
    }
    $return .= " on line <b>$error->line</b>\n";

    return $return;
}

function libxml_display_errors() {
    $errors = libxml_get_errors();
    foreach ($errors as $error) {
        print libxml_display_error($error);
    }
    libxml_clear_errors();
}

// Enable user error handling
libxml_use_internal_errors(true);

$xml = new DOMDocument(); 
$xml->load('example.xml'); 

if (!$xml->schemaValidate('example.xsd')) {
    print '<b>DOMDocument::schemaValidate() Generated Errors!</b>';
    libxml_display_errors();
}

?>

https://secure.php.net/manual/en/ref.libxml.php

Yup, I was recoding based on ^ for the error handling, but when I saw the type of errors in the file... it gave me pause, and I went back to the original files on the source FTP server because I thought my FTP transfer was generating these. The files on the FTP server were the same.

The issue is that if data integrity is bad, you are missing essential trip information. You can't have blank spots on a trip. You are also missing data records. This can only be fixed at source.

For example, one XML file had 4 records in 1 hour. That's for 4 vehicles. These are moving vehicles... I know there was more data for that hour than 4 dots on a map.
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
Make sure you defined the correct types - that is normally the issue (at least in my case)

Code:
  <panic>Off</panic>
    <fu <record>
    <assetid>fab98f46-259e-4fa4-b31f-d3d1d43c1c32</assetid>

What data type would the "fu" be above?
The node name doesn't even show fully; it doesn't close that tag with ">", display the contents, or have an ending tag </fu>. The record doesn't even end with </record>.
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
[)roi(];18733824 said:
Is it possible to post a complete test file, specifically one that demonstrates the parsing problems?
I am posting from my phone; I will post some of these when I am on a computer.

The only way I see to make this work:

- Parse as a normal text file
- Split into an array(?) at each <record> start tag
- Parse the parts of each record, searching first for the </record> tag (complete record), then for the last </tagname> tag
- Filter out the totally invalid records; keep the incomplete ones that have enough data to at least create a GPS position
- If complete, check the "column" names against the columns in the database and create them as required (I was doing this in XML via XSLT before. It had worked well; I have 33 columns already created)
- Import the records, either by imploding the array to create XML / JSON, or by just looping through and adding to the db
- Add the filename to the import db table with a status of completed or completed with errors, and the number of valid rows

Whatever the case, it's a rewrite on many levels... so I will wait for the developers to redeem themselves first.
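
The first few steps of that list (split at each <record>, keep well-formed fields, sort records into complete / partial / invalid) could be sketched roughly as below. This is only a sketch under assumptions: the function name salvage_records() and the field names latitude and longitude are made up for illustration (only assetid, panic and the truncated <fu appear in the real files shown here), and it assumes flat records with no nested tags.

```php
<?php
// Hypothetical helper: salvage records from malformed telemetry text.
// The $required field names are assumptions, not taken from the real feed.
function salvage_records(string $text, array $required = ['assetid', 'latitude', 'longitude']): array
{
    $result = ['complete' => [], 'partial' => [], 'rejected' => 0];

    // Split at each <record> start tag; drop whatever precedes the first record.
    $chunks = preg_split('/<record>/', $text);
    array_shift($chunks);

    foreach ($chunks as $chunk) {
        // A record is complete only if its closing tag survived.
        $complete = strpos($chunk, '</record>') !== false;

        // Keep only well-formed <name>value</name> pairs.
        // Truncated tags such as "<fu" never match and are silently skipped.
        preg_match_all('/<([A-Za-z_][\w.-]*)>([^<]*)<\/\1>/', $chunk, $pairs, PREG_SET_ORDER);
        $fields = [];
        foreach ($pairs as $pair) {
            $fields[$pair[1]] = trim($pair[2]);
        }

        if (array_diff($required, array_keys($fields))) {
            // Not enough data to place the vehicle on a map: count and discard.
            $result['rejected']++;
        } elseif ($complete) {
            $result['complete'][] = $fields;
        } else {
            $result['partial'][] = $fields;
        }
    }

    return $result;
}
```

The counts of complete, partial and rejected records are what would feed the "completed" or "completed with errors" status in the import table.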
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
A big FU from the xml file.


That's most likely the error. If I get an XML file like that, I kick it and flag it as corrupt.
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
It's a tough one really, but invalid XML is invalid.

You could always try a SAX parser instead of a DOM parser.

But this could happen with any data that must follow conventions.

A JSON string missing a comma or a colon will also just explode with no warning.
Yup. The first 20 files I played with had no errors and were imported successfully into the db. Files that were processed would not get processed again, as they had been logged in the db as droid suggested.

I threw the code at a different client's data, and all these errors started popping up:

----Intermission----

And that's where I am sitting.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Will review once you've shared.
Ps. All the failing examples and at least 1 example that you consider ok.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
There's unfortunately no consistency in the malformed XML files; if there were, we could possibly have used 'tidy' to make the adjustment, but that's certainly not the case.

  • XML header tag & <reports> tag missing
  • Incomplete tag <fu (likely fuellevel)
  • <driv incomplete tag (likely drivername)
  • Missing closing tag </record>
  • <alt incomplete tag (likely altitude)
Clearly there's an integrity issue with the routines that generate the XML files on the host. Naturally the bug needs to be fixed at source, but I still suggest at least some alteration to your workflow:
  1. Separate the FTP mirroring folder and the 'inbox' folder. Basically, the process that evaluates the XML files against your import metadata would copy any new / unprocessed files from the 'mirror' folder to the 'inbox' folder. From this 'inbox' folder your import process will then try to parse the file.
    • If it parses without error, then the file is moved to a 'completed / processed' folder.
    • If it fails with malformation issues, then it should be moved to an 'error' folder.
All files in the 'error' folder will need to be manually corrected; trying to write code around inconsistent malformation is risky, and it is far better to fix the bug(s) at source. Once the files have been manually fixed (missing tags added / closed, invalid tags removed, ...), you simply move them back into the 'inbox' folder to try the processing again. Naturally your import metadata needs to be aware that a file was received from the 'mirror' folder but has a parsing error, so it doesn't keep trying to copy / parse these malformed files.
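
As a rough illustration of the inbox / processed / error routing described above, something along these lines could work. It is a sketch, not a finished importer: the function name process_inbox(), the folder arguments and the $import callback are placeholders of my own, and the metadata bookkeeping is reduced to a returned status array.

```php
<?php
// Hypothetical router: parse every XML file in the inbox and move it
// to the processed or error folder depending on whether it is well-formed.
function process_inbox(string $inbox, string $processed, string $error, callable $import): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $status = [];

    foreach (glob($inbox . '/*.xml') ?: [] as $path) {
        $name = basename($path);
        $doc = new DOMDocument();

        if ($doc->load($path)) {
            // Well-formed: hand the document to the importer, then archive it.
            $import($doc);
            rename($path, "$processed/$name");
            $status[$name] = 'completed';
        } else {
            // Malformed: park it for manual correction.
            rename($path, "$error/$name");
            $status[$name] = 'error';
        }
        libxml_clear_errors();
    }

    // The caller would record these statuses in the import metadata table
    // so that files in the error state are not copied from the mirror again.
    return $status;
}
```

A manually repaired file dropped back into the inbox folder would simply be picked up on the next run.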


/Edit: had the malformation been less severe, e.g. a missing closing tag, then you could simply have included 'tidy' with the right mix of settings, for example:
PHP:
$xmlString = @file_get_contents($filepath);
$tidy = new tidy();
$tidyString = $tidy->repairString($xmlString, array("input-xml" => 1));
$xml = simplexml_load_string($tidyString);
...
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
Thanks, will have a look
 

gkm

Expert Member
Joined
May 10, 2005
Messages
1,519
[)roi(];18707694 said:
As opposed to? the use of FTP?
Is this what you're roughly referring to?
View attachment 402412

If yes, then that's really going to depend on the source server, hardly useful if it's solid state, plus pushing onto the queue at the source end implies an ability to introduce changes on the source server. @imranpanji would have to confirm if that's possible / agreeable.

Plus how do you envisage dealing with the .open file, i.e. no fixed trigger point aside from the hourly completion / state change? You could push a copy onto the queue at a timed interval, but then we probably need to question the merits of this over FTP.

Yes, this is roughly what I mean. As long as imranpanji can at least do code changes to the source server, which he seems to be able to do, he can write all the updates to a remote queue, say SQS or whatever (I am just picking SQS, since that is the one I have the most recent experience with), rather than a file, which can then be picked up by the target system to apply the updates to the target database.

Something like: (source system) -> (SQS queue) -> (SQS poller) -> (Target database)

The source system writes single <record></record> entries to SQS (via HTTPS calls). The SQS poller on the target system then polls this queue for new messages and, when there are messages on the queue, dequeues them, parses them and writes them to the target DB. All with code that can be trivially cut and pasted from the internet for one's favourite language.

My opinion is this is much simpler than all the nasty edge cases around shipping changes out via FTP files, many of which have already been encountered in this thread.
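
To make the poller side of that pipeline a bit more concrete, here is a minimal sketch with the queue and database hidden behind callables. No real SQS calls are shown: $receive, $delete and $store are hypothetical stand-ins for whatever queue client and DB layer is actually used, and deleting malformed messages instead of retrying them is a simplification.

```php
<?php
// Hypothetical poller: drains a queue of single <record> messages.
// $receive() returns a list of ['id' => ..., 'body' => ...] arrays (empty when drained),
// $delete($id) acknowledges a message, $store($fields) writes one row to the target DB.
function drain_queue(callable $receive, callable $delete, callable $store): array
{
    libxml_use_internal_errors(true); // no warnings for malformed messages
    $counts = ['stored' => 0, 'rejected' => 0];

    while ($messages = $receive()) {
        foreach ($messages as $message) {
            $record = simplexml_load_string($message['body']);

            if ($record === false) {
                // Malformed record: count it; a real system might park it elsewhere.
                $counts['rejected']++;
            } else {
                // Flatten the record's child elements into a name => value row.
                $fields = [];
                foreach ($record->children() as $name => $value) {
                    $fields[$name] = (string) $value;
                }
                $store($fields);
                $counts['stored']++;
            }

            // Acknowledge the message so it is not delivered again.
            $delete($message['id']);
        }
    }

    libxml_clear_errors();
    return $counts;
}
```

Because each message carries exactly one record, a corrupt record costs one row rather than a whole hourly file.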
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
Yes, this is roughly what I mean. As long as imranpanji can at least do code changes to the source server, which he seems to be able to do

Cannot do anything to source server. Only thing I can do is ftp the data across
 

gkm

Expert Member
Joined
May 10, 2005
Messages
1,519
Cannot do anything to source server. Only thing I can do is ftp the data across

Sorry, I do not want to read the whole thread again, but I thought there was talk somewhere about making changes to the source server to make the writing of the files to be FTP-ed more atomic to avoid corrupted records. In my opinion, if you cannot do anything on the source side at least in terms of code changes, you are in for a world of pain in the long run. Do you not maybe have some kind of interaction with the source side to ask them to write the stuff for you to SQS, rather than files?
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
My ftp mirror script is working just fine. We have moved on from there. CRC checks confirm that everything is all ok
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Not sure where there was reference to atomicity, but the unpatterned corruption of his telemetry data could certainly be related to badly behaving threads. As for the world of pain re FTP; that's doubtful; I've built quite a lot of transactional solutions off FTP that e.g. have linked up with SAP for consignment stock sales.
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
I tried to give this a go in C. Gosh was that a colossal fck up.

The more I C, the less I see.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
I tried to give this a go in C. Gosh was that a colossal fck up.

The more I C, the less I see.
Lol... It's tough going from a dynamically typed language (PHP, JavaScript) to a statically typed one (C). The other way around is easy peasy.

Ps. If you think that's tough, you should try Rust. ;)
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
Give what a go exactly?

FTP server + XML parser

[)roi(];18766834 said:
Lol... It's tough going from a dynamically typed language (PHP, JavaScript) to a statically typed one (C). The other way around is easy peasy.

Ps. If you think that's tough, you should try Rust. ;)

Yeah, no, look, this was a world of butthurt. Lots to learn, though I quickly saw common principles. I will need to get a good programming book and start doing thought experiments to think the right way.

Programming is beautiful, Programming is life.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Yeah, no, look, this was a world of butthurt. Lots to learn, though I quickly saw common principles. I will need to get a good programming book and start doing thought experiments to think the right way.
Start with the basics and work on from there; best to forget everything PHP or JavaScript -- they're only going to be a hindrance.
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
I will ignore droid on this. Whatever allows me to belt out working, decent apps in minimal time is where I would go.

Yes, I started with machine code and assembler in 1993... I am not going back to that. If I can say "loadxml(filename)" and it works, that's where I'm gonna go.
 