Telemetry data download and storage (FTP and web services)

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Update:

Thinking of using an XSLT transformation in PHP to alter the data so it's formatted correctly for the import. I used to do this extensively about 10 years ago in Visual Studio, but I don't remember whether you can add a column with data during a transformation.

Or just add the Filename field using the XML DOM and populate it before importing it into the DB.

Am I overthinking this?
Don't you have a full list of the fields? Writing code to alter structure on import is risky.
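For reference, the DOM route mentioned above could look roughly like this: a minimal sketch, assuming the telemetry records are <record> elements (a placeholder name; adjust to the real schema).

Code:
<?php
// Minimal sketch of the DOM approach: add a <Filename> element to each
// record before import. <record> is a placeholder element name.
$file = '/path/to/telemetry.xml';

$doc = new DOMDocument();
$doc->load($file);

foreach ($doc->getElementsByTagName('record') as $record) {
    // Create the new element and populate it with the source filename
    $record->appendChild($doc->createElement('Filename', basename($file)));
}

$doc->save($file);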
 

Pho3nix

The Legend
Joined
Jul 31, 2009
Messages
30,589
Question 2

What tools do I use for this?

Programming Options:

Scheduled Python script
Scheduled PHP script
Scheduled Database import and transformation (used to do this yonks ago with MSSQL)

Database options

MySQL
Postgres
SQLite?
Other

Which route do I go? As it's all a relearning process for me, I'm open to suggestions.

Will look at this when I get home.
Usually I'd go with a DB import script/job. Python might be easier and cheaper.

/sub'd
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Question 2

What tools do I use for this?

Programming Options:

Scheduled Python script
Scheduled PHP script
Scheduled Database import and transformation (used to do this yonks ago with MSSQL)

Database options

MySQL
Postgres
SQLite?
Other

Which route do I go? As it's all a relearning process for me, I'm open to suggestions.

Stick with the languages for the import; if you must avoid that (for any reason), then maybe look into an ETL tool. For example, Tibco has an open-source (community) ETL where you design the flow in a GUI and it generates Java code. https://www.jaspersoft.com/data-integration

Databases: stick with the more flexible options: MySQL, Postgres, ...
 

Pho3nix

The Legend
Joined
Jul 31, 2009
Messages
30,589
[)roi(];18700168 said:
Stick with the languages for the import; if you must avoid that (for any reason), then maybe look into an ETL tool. For example, Tibco has an open-source (community) ETL where you design the flow in a GUI and it generates Java code. https://www.jaspersoft.com/data-integration

Databases: stick with the more flexible options: MySQL, Postgres, ...

Why would you recommend a language vs an ETL? Just curious.
Never heard of Jasper, will give it a gander :)

As for Postgres.. hate the POS.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Why would you recommend a language vs an ETL? Just curious.
Never heard of Jasper, will give it a gander :)

As for Postgres.. hate the POS.
It's a lot more like Microsoft's SSIS. Jasper ETL is almost the same product as Talend Open Studio (built from the same source). Talend is however the original source owner (if memory serves me correctly), so you can also download the community version from Talend. I haven't used it for a few years, so I can't tell you if they're maintaining different forks.

In any case it's probably better to just use the one from Talend. http://www.talend.com/products/talend-open-studio

As for its utility: it's a very powerful product, connecting to many hosts / databases / file formats, with almost complete flexibility over the extract, transform and load process, including notifications, ... The code produced is fairly readable and performs really well; it's fast, and you not only have options to multi-thread the solution, but also to run it across multiple host nodes.
 
Last edited:

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
[)roi(];18700168 said:
Stick with the languages for the import; if you must avoid that (for any reason), then maybe look into an ETL tool. For example, Tibco has an open-source (community) ETL where you design the flow in a GUI and it generates Java code. https://www.jaspersoft.com/data-integration

Databases: stick with the more flexible options: MySQL, Postgres, ...

So I will look at PHP/Python with MySQL/Postgres.

Am leaning towards PHP, but testing on the command line is a bit irritating tbh. My code will likely contain minimal error checking initially, as both languages are new to me. Will post code as soon as I have something importing into a test DB.
 

biometrics

Honorary Master
Joined
Aug 7, 2003
Messages
71,858
So I will look at PHP/Python with MySQL/Postgres.

Am leaning towards PHP, but testing on the command line is a bit irritating tbh. My code will likely contain minimal error checking initially, as both languages are new to me. Will post code as soon as I have something importing into a test DB.

I'm late to the party, what have you figured out and what not?
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
I'm late to the party, what have you figured out and what not?
Have been in charge of my son for the day, so haven't moved forward much today.

The FTP transfer is sorted enough. I've figured out how to read and print a directory listing in PHP.

Next stage will be to look at the XML code above (posted by Thor) to read the files one by one as XML and post the data within them to a database table. I need to cycle through the XML file, check which items are columns, look at the database schema to see whether a column already exists, and add it if it doesn't, before importing the data.
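For what it's worth, a rough sketch of that import step; the table name telemetry, the <record> element and MySQL via PDO are all assumptions:

Code:
<?php
// Rough sketch: read one XML file, compare its fields to the table's columns,
// add any missing columns, then insert the rows.
// Table name 'telemetry' and the <record> element are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=telemetry_db', 'user', 'pass');
$existing = $pdo->query('SHOW COLUMNS FROM telemetry')->fetchAll(PDO::FETCH_COLUMN);

$xml = simplexml_load_file('/data/incoming/example.xml');

foreach ($xml->record as $record) {
    $row = [];
    foreach ($record->children() as $field) {
        $name = $field->getName();
        if (!in_array($name, $existing, true)) {
            // Add the missing column as TEXT for now; tighten the type later
            $pdo->exec("ALTER TABLE telemetry ADD COLUMN `$name` TEXT");
            $existing[] = $name;
        }
        $row[$name] = (string) $field;
    }
    $cols  = '`' . implode('`, `', array_keys($row)) . '`';
    $marks = implode(', ', array_fill(0, count($row), '?'));
    $pdo->prepare("INSERT INTO telemetry ($cols) VALUES ($marks)")
        ->execute(array_values($row));
}

Column names coming straight out of the XML would obviously need sanitising before being used in an ALTER TABLE.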
 
Last edited:

biometrics

Honorary Master
Joined
Aug 7, 2003
Messages
71,858
The FTP transfer is sorted enough. I've figured out how to read and print a directory listing in PHP.

Not to disappoint, but you need to have FTP retries and resume before you sign off on FTP. ;)

You mentioned earlier that you have access to an HTTP API; that may be easier to implement than a complete FTP transfer solution, especially if the files are small.
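Something like this would do for the HTTP route, with a basic retry; the URL and paths are just placeholders:

Code:
<?php
// Hypothetical sketch: pull one file over HTTP with a few retries.
// The URL and destination path stand in for the real web service.
function fetchWithRetry(string $url, string $dest, int $attempts = 3): bool
{
    for ($i = 1; $i <= $attempts; $i++) {
        $data = @file_get_contents($url);
        if ($data !== false) {
            file_put_contents($dest, $data);
            return true;
        }
        sleep(5 * $i); // simple backoff before the next attempt
    }
    return false;
}

fetchWithRetry('https://example.com/telemetry/2017-01-01_00.xml',
               '/data/incoming/2017-01-01_00.xml');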
 
Last edited:

_kabal_

Executive Member
Joined
Oct 24, 2005
Messages
5,922
He already said that he is using lftp, which automatically handles the mirroring and retry functionality.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Why would you recommend a language vs an ETL? Just curious.
Overlooked this question.

In short, it's difficult to make any clear decision without an assessment of the project; but in this case it's @imranpanji's project / exercise... He needs to weigh up the pros / cons of either route.

Naturally the programming route offers a great learning experience if that's the objective, yet in the case of Talend ETL you can design the flow, run it, and once it's working properly, switch over and generate the source code for review (i.e. potentially a valuable way to learn some new techniques).
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
Does his app interface with lftp? How does it know when it is finished with a download?
First step is to FTP the data across to my dev server before it gets deleted. Essentially a backup. There is typically 30 days of data stored on the FTP server. Each hourly file is about 1 MB in size.

The lftp script runs every 5 minutes and mirrors across whatever new files there are (.XML and .open), and then mirrors across the one .open file that actually changes... but only if it has changed. It writes what it has done to a log file (as well as errors). So typically every hour the log file will contain one new .XML transfer and many .open transfers.
 

_kabal_

Executive Member
Joined
Oct 24, 2005
Messages
5,922
Does his app interface with lftp? How does it know when it is finished with a download?

I would assume lftp can most likely use a copy-then-move strategy, either using a temp folder or a temporary additional file extension.

If that is the case, new files in a folder can be considered fully downloaded.

The .open file does, however, make things a little more complicated.

@OP - using the file system as a state machine can be an effective way to run a simple transformation pipeline: watch a folder (input channel adapter) -> process a file (filters, splitters, service activators, etc.) -> move to a folder (output channel adapter).
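A minimal sketch of that pattern in PHP; the folder names and the processing step are placeholders:

Code:
<?php
// Poll an input folder, process each file, then move it to an output folder.
// The move marks the file as done; anything left behind gets retried next run.
$in  = '/data/incoming';
$out = '/data/processed';

foreach (glob($in . '/*.xml') as $path) {
    // Stand-in for the real parse-and-import step: just validate the XML here
    if (simplexml_load_file($path) !== false) {
        rename($path, $out . '/' . basename($path));
    }
}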
 
Last edited:

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
@OP - using the file system as a state machine can be an effective way to run a simple transformation pipeline: watch a folder (input channel adapter) -> process a file (filters, splitters, service activators, etc.) -> move to a folder (output channel adapter).

That's actually a very good idea. My only concern is that each stage implies a delay. Since there is already a delay of X minutes from the FTP server, I want as few stages as possible to process the files into the DB. Is there a way of monitoring a folder for changes in Linux?

Edit: I already use the watch command for directory changes
 
Last edited:

_kabal_

Executive Member
Joined
Oct 24, 2005
Messages
5,922
Yes, you can use inotify (inode notify).

I have very limited experience with it, however. I used it previously to attempt to solve a known performance issue with Vagrant/VirtualBox shared folders from Windows machines with many files. This didn't work for my specific use case (serving PHP from a shared folder) and I ended up switching to Samba shares.
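For completeness, a sketch of watching a folder from PHP, assuming the PECL inotify extension is installed:

Code:
<?php
// Sketch of watching a folder with inotify from PHP.
// Requires the PECL inotify extension (pecl install inotify).
$fd = inotify_init();

// Fire when a file has been fully written or moved into the folder
inotify_add_watch($fd, '/data/incoming', IN_CLOSE_WRITE | IN_MOVED_TO);

while (true) {
    $events = inotify_read($fd); // blocks until something happens
    foreach ($events as $event) {
        echo "New file ready: {$event['name']}\n";
        // hand the file over to the import step here
    }
}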
 
Last edited:

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
That's actually a very good idea. My only concern is that each stage implies a delay. Since there is already a delay of X minutes from the FTP server, I want as few stages as possible to process the files into the DB. Is there a way of monitoring a folder for changes in Linux?

Edit: I already use the watch command for directory changes
For directory changes I would have a cron job and something similar to this

http://www.franzone.com/2008/06/05/php-script-to-monitor-ftp-directory-changes/
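Along the same lines as that script, roughly; host, credentials and paths are placeholders:

Code:
<?php
// Rough sketch: list the remote FTP directory, diff it against the previous
// run's listing, and report anything new. Run it from cron.
$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'user', 'pass');
ftp_pasv($conn, true);

$current  = ftp_nlist($conn, '/telemetry');
$previous = is_file('/tmp/ftp_listing.json')
    ? json_decode(file_get_contents('/tmp/ftp_listing.json'), true)
    : [];

foreach (array_diff($current, $previous) as $file) {
    echo "New remote file: $file\n"; // queue it for download here
}

file_put_contents('/tmp/ftp_listing.json', json_encode($current));
ftp_close($conn);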
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
For directory changes I would have a cron job and something similar to this

http://www.franzone.com/2008/06/05/php-script-to-monitor-ftp-directory-changes/

Am not concerned about the remote FTP site. I can cron the current mirroring to every minute if I want. My method takes less than a minute to transfer the new files in any case. The remote folder has about 720 files in it, so even with lftp, when I tried to do even a timestamp comparison it was taking a lot of time.

But I get your point, and thanks for the link; am sure I will find a use for it at some stage.
 

Hamish McPanji

Honorary Master
Joined
Oct 29, 2009
Messages
42,084
In PHP, how would I concatenate this string:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="assetname">
 <xsl:copy-of select="."/>

with this:

Code:
<Company>$Companyname</Company>
      <file>$Filename</file>
   </xsl:template>
</xsl:stylesheet>


Assume the items with $Companyname and $Filename are variables whose values I want to be in the string.
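For reference, one way would be a heredoc, which interpolates the variables straight into the string; a minimal sketch with placeholder values:

Code:
<?php
// Build the full stylesheet as one string; $Companyname and $Filename are
// interpolated directly inside the heredoc. Values below are placeholders.
$Companyname = 'Acme';
$Filename    = 'telemetry_01.xml';

$xsl = <<<XSL
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="assetname">
      <xsl:copy-of select="."/>
      <Company>$Companyname</Company>
      <file>$Filename</file>
   </xsl:template>
</xsl:stylesheet>
XSL;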
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
In PHP, how would I concatenate this string:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="assetname">
 <xsl:copy-of select="."/>

with this:

Code:
<Company>$Companyname</Company>
      <file>$Filename</file>
   </xsl:template>
</xsl:stylesheet>


Assume the items with $Companyname and $Filename are variables whose values I want to be in the string.

Personally I would save the two as separate XML files and then merge the two.

master_datestamp_temp.xml

concatenate_datestamp_temp.xml

ppap the thing into master_datestamp.xml

https://stackoverflow.com/questions/20372216/combine-multiple-xml-into-one
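If you do go the separate-files route, plain string concatenation is enough here, since the two halves only form valid XML once joined; a minimal sketch using the file names above:

Code:
<?php
// Join the two temp fragments into the final stylesheet.
// Neither half is well-formed XML on its own, so a plain string concat is fine.
$master = file_get_contents('master_datestamp_temp.xml');
$tail   = file_get_contents('concatenate_datestamp_temp.xml');

file_put_contents('master_datestamp.xml', $master . $tail);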
 