Telemetry data download and storage (FTP and web services)

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
We are using a SaaS platform for our system that allows us to export the telemetry data created by our devices via FTP or a REST API.

I need to download this telemetry regularly and store it in a database for the following reasons:

1. Backing up the telemetry data

2. Running custom queries on the data for reports that are not available on the SaaS platform

3. Allowing me to develop alternative UIs and mobile apps for that telemetry data, better suited to the particular clients whose assets are producing it.

To speed this up I need to:

a. Start downloading the data ASAP. That implies using FTP. Unfortunately, there are a few disadvantages to that, but that's where I will start, and at least build an adequate data store to start looking at 3 above

b. Store the data in a database

c. Extract and summarise the data for reports and UI display

d. Connect to the REST API and collect the data from there instead. This data will be up to date, as opposed to the FTP data, and will also contain all the records and fields (read: columns)
 

Hamish McPanji

a. FTP download

Advantages:
1. I can choose which columns of the telemetry I receive, so no surprises when having to add data to the database
2. I can choose XML or CSV data
3. It's FTP. Transferring the files on a schedule should be straightforward

Disadvantages:
1. There might be data that is missed, as some columns are only populated for certain items. There is different equipment sending data, so I cannot ascertain in advance which columns will be populated and which won't. Some of that missed data might be crucial for future use
2. The data is not fully up to date (discussed in the next post), so it will not be very useful for a mobile app, as the mobile user will want the latest telemetry
3. It's a bit old school, which I actually like. But I really need to move on from that.

This is the method I have decided on to begin this process.
 

Hamish McPanji

a. FTP Download

Typical File Listing on FTP server

telemetry_20161118120000.xml
telemetry_20161118130000.xml
telemetry_20161118140000.xml
telemetry_20161118150000.xml
telemetry_20161118160000.xml
telemetry_20161118170000.xml
telemetry_20161118180000.xml
telemetry_20161118190000.xml
telemetry_20161118200000.xml.open


Every hour, a file is created that holds the most recent data. The file naming strategy is based on the date and time:

Current file: telemetry_yyyymmddhhmmss.xml.open
When the hour concludes, the file is renamed to: telemetry_yyyymmddhhmmss.xml
The file remains there for 30 days, after which it is automatically deleted. Or it can be removed earlier (preferred) to conserve space and keep things neat
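Based on that naming scheme, the closed and open files can be told apart and timestamped with a small helper. A sketch only; the function name is mine:

```python
from datetime import datetime

def classify(filename):
    """Return (timestamp, is_open) for a telemetry file name,
    or None if the name doesn't match the expected pattern."""
    is_open = filename.endswith(".open")
    base = filename[:-len(".open")] if is_open else filename
    if not (base.startswith("telemetry_") and base.endswith(".xml")):
        return None
    stamp = base[len("telemetry_"):-len(".xml")]  # yyyymmddhhmmss
    try:
        ts = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    except ValueError:
        return None
    return ts, is_open

# classify("telemetry_20161118200000.xml.open") -> (datetime for 20:00, True)
```

The hourly sync job would pick up names where is_open is False; the frequent job watches the single name where it is True.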
 

Hamish McPanji

a. FTP Download

Typical File contents of XML file

<?xml version="1.0" encoding="UTF-8" ?>
<records>
<record>
<assetid>fab98f46-259e-4fa4-b31f-d3d1d43c1c32</assetid>
<assetname>Skais XTRAIL</assetname>
<date>11/18/2016 7:12:22 PM</date>
<lat>-4.34504222222222</lat>
<lon>15.2937077777778</lon>
<altitude>298.2</altitude>
<active>False</active>
<speed>0</speed>
<heading>0</heading>
<odometer>96034.76</odometer>
<address>Rana Apartment</address>
<ac></ac>
<cargodoor>Closed</cargodoor>
<panic>Off</panic>
<fuellevel></fuellevel>
<obdfuellevel></obdfuellevel>
<rpm></rpm>
<obdrpm></obdrpm>
<compartment1></compartment1>
<compartment2></compartment2>
<fuellevel></fuellevel>
<oilpressure></oilpressure>
<enginetemperature></enginetemperature>
</record>
<record>
<assetid>fab98f46-259e-4fa4-b31f-d3d1d43c1c32</assetid>
<assetname>Skais XTRAIL</assetname>
<date>11/18/2016 7:27:20 PM</date>
<lat>-4.34504222222222</lat>
<lon>15.2937077777778</lon>
<altitude>298.2</altitude>
<active>False</active>
<speed>0</speed>
<heading>0</heading>
<odometer>96034.76</odometer>
<address>Rana Apartment</address>
<ac></ac>
<cargodoor>Closed</cargodoor>
<panic>Off</panic>
<fuellevel></fuellevel>
<obdfuellevel></obdfuellevel>
<rpm></rpm>
<obdrpm></obdrpm>
<compartment1></compartment1>
<compartment2></compartment2>
<fuellevel></fuellevel>
<oilpressure></oilpressure>
<enginetemperature></enginetemperature>
</record>
</records>


These are just 2 records from an XML file that has been transferred out via FTP.
Key things to note:

assetid is a unique identifier for the item that is sending the telemetry. In each XML file there will be multiple records, potentially from multiple assets.
There are other unique identifiers that might be exposed in one of the columns, but those would not be the key data for the main data table. At this stage, I just want to import the data as is.
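For reference, parsing records like the ones above into dicts (with empty elements mapped to None, since many columns are unpopulated) could look like this standard-library sketch; the function name is mine:

```python
import xml.etree.ElementTree as ET

def parse_records(xml_text):
    """Parse a <records> document into a list of dicts.
    Empty elements (e.g. <fuellevel></fuellevel>) become None.
    Note: if a tag repeats within a record (the sample file has
    <fuellevel> twice), the last occurrence wins here."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall("record"):
        rows.append({f.tag: (f.text if f.text else None) for f in record})
    return rows

sample = """<?xml version="1.0" encoding="UTF-8" ?>
<records>
  <record>
    <assetid>fab98f46-259e-4fa4-b31f-d3d1d43c1c32</assetid>
    <assetname>Skais XTRAIL</assetname>
    <speed>0</speed>
    <fuellevel></fuellevel>
  </record>
</records>"""

rows = parse_records(sample)
# rows[0]["assetid"] -> 'fab98f46-259e-4fa4-b31f-d3d1d43c1c32'
# rows[0]["fuellevel"] -> None
```

Keeping values as strings (or None) at this stage matches the "import the data as is" goal; typing can come later.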
 

Hamish McPanji

Question 1 - File transfer tool / Scripting language

I have root access (and cron) available on 2 Linux servers with static IP addresses.
I want to ensure that I am able to:

1. Transfer the XML files over, ensuring that the transfer is 100% intact even in the case of a disconnection from the server. This will be done hourly, and will have to be repeated immediately if there is a failure
2. Transfer the xml.open file every couple of minutes as new records are written to it. This is the livest data I will have, and will be written to a different table until the complete output file is produced on the hour

What is the best direction to go in this regard?

I will throw out a few options, and will work with whatever tool/language people think is most adequate for this task. Please suggest if I am going in the wrong direction or there are alternatives I have missed:

  • Cron/Bash/Wget
  • Cron/Bash/Lftp
  • Cron/Bash/rsync
  • Cron/Python
  • Cron/PHP
  • Linux service
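As a sketch of the Cron/Python option (host, credentials and paths below are placeholders), the hourly job could list the remote directory and fetch only the closed files it doesn't already have. Downloading to a temp name and renaming afterwards is what keeps a dropped connection from leaving a half-written file in place:

```python
import os
from ftplib import FTP

def select_new_files(remote_names, already_have):
    """Pick closed .xml files not yet downloaded
    (.xml.open is handled by a separate, more frequent job)."""
    return sorted(n for n in remote_names
                  if n.endswith(".xml") and n not in already_have)

def sync_closed_files(host, user, password, local_dir):
    # Placeholder credentials/paths; intended to be run from cron hourly.
    ftp = FTP(host)
    ftp.login(user, password)
    have = set(os.listdir(local_dir))
    for name in select_new_files(ftp.nlst(), have):
        tmp = os.path.join(local_dir, name + ".part")
        with open(tmp, "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
        # Rename only after a complete download, so a disconnection
        # never leaves a partial .xml where the importer can see it.
        os.rename(tmp, os.path.join(local_dir, name))
    ftp.quit()
```

A retry on failure is then just running the same job again: already-downloaded names are skipped, and any .part leftovers get overwritten.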
 

Hamish McPanji

[)roi(];18695048 said:
Sounds simple enough, what do you need help with?

Am trying to go through this as step-by-step as possible.
I am very rusty at everything at the moment, and out of date with a lot of stuff.
So my structured approach on this thing is to ensure that I am not missing anything.

The end goal is the REST API, but I need to work my way up to that... relearning as I go
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
The Rest API sync is probably going to present as much challenges as FTP; basically in both cases you have to figure out the best way for diffing the most recent entries to avoid duplication (assuming there is no globally unique transactionid to use as a key).

FTP: you'd want to check on the uniqueness of the file names; as this can be used as metadata to validate if a particular file contents had been processed, coupled with e.g. file size and creation time stamp; basically diffing could also be retriggered if these ever change.

Rest API: Biggest challenge is probably how the API supports segmentation of the data; for efficiency you certainly wouldn't want the API to return all records, rather only the last segment (guess this is similar to the FTP open file).

Also think about what needs to happen when the app stops and restarts; exactly what process are you going to use to avoid missing data? FTP sounds easy in this regard: file metadata matchup. Rest API is unknown; but I guess you'll flesh that one out.

Also, how many hosts are you going to connect to (FTP or Rest)? I built something similar quite a long time ago; it similarly had to collect transactional, temperature & gauging data from ~6,000 site controllers every ~10 minutes. Using a single Linux box it was possible to hit most of that in under a minute. The per-controller snapshot was quite a bit larger than what you've shared.

Anyway I'll check in later once you've gathered your thoughts some more.
 

Hamish McPanji

[)roi(];18695072 said:
The Rest API sync is probably going to present as much challenges as FTP; basically in both cases you have to figure out the best way for diffing the most recent entries to avoid duplication (assuming there is no globally unique transactionid to use as a key).

FTP: you'd want to check on the uniqueness of the file names; as this can be used as metadata to validate if a particular file contents had been processed, coupled with e.g. file size and creation time stamp; basically diffing could also be retriggered if these ever change.

File names are unique, based on datetime. A new file is created hourly. The new data file has a ".xml.open" extension, and is technically the only file in the folder that changes. The previous hour's data doesn't change, so it can be transferred as soon as it is renamed to a ".xml" extension when the hour passes.
 

[)roi(]

File names are unique, based on datetime. A new file is created hourly. The new data file has a ".xml.open" extension, and is technically the only file in the folder that changes. The previous hour's data doesn't change, so it can be transferred as soon as it is renamed to a ".xml" extension when the hour passes.
Good so with FTP you could maintain file metadata; basically to know what you've processed. So the only part that requires diffing is the open file.
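One simple way to diff the open file, assuming the feed only ever appends records within the hour: remember how many records were loaded last time, and insert only the unseen tail. A sketch; the function name is mine:

```python
def new_records(records, loaded_count):
    """Given the full list of records parsed from the .open file and
    the number already imported, return (unseen_tail, new_count).
    Assumes records are append-only within the hour."""
    if loaded_count > len(records):
        # Fewer records than before means the file was rotated
        # (a new hour started); start counting from scratch.
        loaded_count = 0
    return records[loaded_count:], len(records)

# First pass: nothing loaded yet, all 3 records come back.
tail, count = new_records(["r1", "r2", "r3"], 0)
# Next pass: two more appeared; only r4 and r5 come back.
tail, count = new_records(["r1", "r2", "r3", "r4", "r5"], count)
```

The stored count per file acts as the same kind of metadata as the closed-file names: it records what has already been processed.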
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
To read the XML I would have gone:


PHP:
<?php
// Check if the file telemetry_yyyymmddhhmmss.xml contains an XML document with a root element
// and at least an element /[root]/node.

if (file_exists('telemetry_20161118190000.xml')) {
    $xml = simplexml_load_file('telemetry_20161118190000.xml');
 
    print_r($xml);
} else {
    exit('Failed to open telemetry_20161118190000.xml.');
}
?>

The output should be:

PHP:
SimpleXMLElement Object
(
  [node] => Example node
  ...
)

PHP:
//simplexml_load_file — Interprets an XML file into an object
 

Hamish McPanji

To read the XML I would have gone:


Ok, will get around to that shortly tiger. Am still looking for a bulletproof FTP transfer method
 

Hamish McPanji

I will play around a little bit and report back.

Am seriously looking at lftp : http://lftp.tech/
The parameter in question is the mirror command

http://linux.overshoot.tv/wiki/networking/lftp_backup_and_mirroring
https://www.cyberciti.biz/faq/lftp-mirror-example/

It will do all the things I want it to:

Transfer the files, newer or non-existing only
Won't delete older files I have transferred earlier
Can get its batch commands from a text file
Can work unattended
Is fast, and supports parallel transfers

I will make sure I am using the correct parameters and then cron it for every 2-3 minutes
Tested using

mirror --only-newer / /home/telemetrydata

Reading the files and comparing timestamps is taking time though, so I think I will have to make a point of deleting some of the files from the source server
 

[)roi(]

Am seriously looking at lftp : http://lftp.tech/
The parameter in question is the mirror command


Don't know lftp, but it's a fairly easy component to replace; many options are available, and failing that it's not all that difficult to roll your own. You'll still have to manage the file metadata in the import process + the open file will still require transactional diffing (sifting out what you've loaded, what changed, what needs to be added) -- the design of this from the DB side is probably something to consider; it could be beneficial to have a separate table for the open recordset, and then once finalised either move the records across, or just simply reprocess the closed file (as per normal) and delete all the records in the open recordset.

Basically: you'd have two separate transactional tables: complete transactions vs. open / pending transactions -- concatenating the results of these two separate queries is simple (SQL UNION).
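As a minimal sketch of that two-table layout (table and column names here are illustrative, using SQLite for demonstration), the UNION ALL concatenates finalised and pending records in one result:

```python
import sqlite3

# Two transactional tables: archive (closed .xml files) and
# live (the .open file). Names and columns are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE telemetry_archive (assetid TEXT, date TEXT, speed REAL);
    CREATE TABLE telemetry_live    (assetid TEXT, date TEXT, speed REAL);
    INSERT INTO telemetry_archive VALUES ('asset-1', '2016-11-18 18:55', 42.0);
    INSERT INTO telemetry_live    VALUES ('asset-1', '2016-11-18 19:05', 37.5);
""")

# One query over both finalised and pending records.
rows = db.execute("""
    SELECT assetid, date, speed FROM telemetry_archive
    UNION ALL
    SELECT assetid, date, speed FROM telemetry_live
    ORDER BY date
""").fetchall()
# rows -> [('asset-1', '2016-11-18 18:55', 42.0),
#          ('asset-1', '2016-11-18 19:05', 37.5)]
```

UNION ALL (rather than plain UNION) is the cheaper choice here, since the two tables should never hold the same row at the same time.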
 

Hamish McPanji

[)roi(];18697998 said:
Basically: you'd have two separate transactional tables: complete transactions vs. open / pending transactions -- concatenating the results of these two separate queries is simple (SQL UNION).
My transfer process is running well.
I have one process to sync all the files, and another process for the .open file. They run every 5 minutes. One just checks for new files to transfer; the other updates the .open file if it has changed.

2 tables. One for live data from the .open file, and one for archived data from the .xml files. As soon as I receive the .xml file, I use a single transaction to import it into the archived data and drop the contents from the live table if it is the same. Is the easiest way of doing this to add a column to the tables with the name of the file the data was imported from?

The .open file will always have fewer records than the matching .xml file.

I need to draw some diagrams for the flow, otherwise this will be a bust!
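That import-and-drop step (one transaction: load the closed file into the archive, then clear the matching rows from the live table, keyed on a filename column) could be sketched like this in sqlite3; table and column names are mine:

```python
import sqlite3

def import_closed_file(db, filename, records):
    """Atomically move an hour's data from live to archive.
    `records` is a list of (assetid, date) tuples parsed from
    `filename`; each archived row carries its source file name, and
    the live rows (tagged with filename + '.open') are dropped in
    the same transaction."""
    with db:  # one transaction: both statements commit, or neither
        db.executemany(
            "INSERT INTO archive (assetid, date, filename) VALUES (?, ?, ?)",
            [(a, d, filename) for a, d in records])
        db.execute("DELETE FROM live WHERE filename = ?",
                   (filename + ".open",))

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE archive (assetid TEXT, date TEXT, filename TEXT);
    CREATE TABLE live    (assetid TEXT, date TEXT, filename TEXT);
    INSERT INTO live VALUES
        ('asset-1', '19:05', 'telemetry_20161118190000.xml.open');
""")
import_closed_file(db, "telemetry_20161118190000.xml",
                   [("asset-1", "19:05"), ("asset-1", "19:20")])
# archive now holds 2 rows; live is empty.
```

If the process dies mid-import, the transaction rolls back and the closed file can simply be reprocessed on the next run.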
 

Hamish McPanji

b. Store the data in a database

I now have a series of XML files on my server

The ones named telemetry_YYYYMMDDHHMMSS.xml will not change and will be imported into an archive table.
The ones named telemetry_YYYYMMDDHHMMSS.xml.open, which change every 5 minutes or so, will be imported into a live or temporary table.

I want to focus on the archive table first, as it is easier to implement.

  • The XML file has a number of columns. At minimum: Asset ID, Asset name, Date, Latitude, Longitude, Altitude, Active
  • The number of columns and their names might change as records come in
  • Not all records will have all columns.
  • I want to add a column called "FileName" or something like that, identifying which file the records were imported from. This might also be achievable via metadata, but the column seems simplest to me
 

Hamish McPanji

b. Store the data in a database

Importing the XML file Processes


  1. Loop through the folder for XML files ending in ".xml"
  2. Open each XML file, and start by checking the integrity of the records
  3. Check if the columns already exist in the database
  4. If all columns exist, jump to step 6
  5. If columns don't exist, create the columns required
  6. Add a column called "FileName" to each record, with the filename of the XML file as its value
  7. Import the data
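Steps 3-5 (creating columns on the fly) can be sketched with sqlite3: PRAGMA table_info lists the existing columns, and ALTER TABLE adds any that are missing. Table and column names below are illustrative:

```python
import sqlite3

def ensure_columns(db, table, record):
    """Add any columns present in `record` but missing from `table`
    (steps 3-5 of the import process). Everything is TEXT for now,
    matching the import-as-is approach."""
    existing = {row[1] for row in db.execute(f"PRAGMA table_info({table})")}
    for col in record:
        if col not in existing:
            db.execute(f'ALTER TABLE {table} ADD COLUMN "{col}" TEXT')

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE archive (assetid TEXT, FileName TEXT)")

# A parsed record introduces two columns the table doesn't have yet.
record = {"assetid": "fab98f46", "speed": "0", "fuellevel": None}
ensure_columns(db, "archive", record)
cols = [row[1] for row in db.execute("PRAGMA table_info(archive)")]
# cols -> ['assetid', 'FileName', 'speed', 'fuellevel']
```

Quoting the column name in the ALTER TABLE statement guards against feed columns that collide with SQL keywords; the values themselves should still go through parameter placeholders at insert time.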
 

Hamish McPanji

Question 2

What tools do I use for this?

Programming Options:

Scheduled Python script
Scheduled PHP script
Scheduled Database import and transformation (used to do this yonks ago with MSSQL)

Database options

MySQL
Postgres
SQlite?
Other

Which route do I go? As it's all a relearning process for me, I am open to suggestions
 

Hamish McPanji

Update:

Thinking of using an XSLT transformation in PHP to reshape the data into the correct format for the import. I used to do this extensively about 10 years ago in Visual Studio, but I don't remember whether you can add a column with data during a transformation.

Or just add the FileName field using the XML DOM and populate it before importing into the DB.

Am I over thinking this?
 