[HELP] JSON scraping

eternaloptimist

Well-Known Member
Joined
Jul 10, 2013
Messages
175
Hi all,
I need to scrape all this data: json every 30 minutes and save it into a single file (JSON, CSV, etc.). What's the best way to do it? I'm looking for libraries, packages, etc. that I can use. Python preferably, but Java or JavaScript would be OK too.
Thanks in advance.
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
file_get_contents

json_decode

foreach loop
 

Hamster

Resident Rodent
Joined
Aug 22, 2006
Messages
42,920
Wait, do you want to store the data as is in a json file or first parse it and store some kind of result?
 

eternaloptimist

Well-Known Member
Joined
Jul 10, 2013
Messages
175
Thanks for the replies. The data will end up in a SQL database. I need to get data every 30 minutes for about a fortnight to a month. I'm hoping to have this running on an AWS, DigitalOcean, or similar instance.
 

Hamster

Resident Rodent
Joined
Aug 22, 2006
Messages
42,920
OK, but the data you linked to is already in JSON format. So if you just want to dump that JSON to a file and process all of it later, a shell script that runs wget and writes the result to [current date time].json should do the trick. Or a Python script as mentioned above.

You then schedule it with Cron to run every 30 minutes and Bob's your uncle.
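
For anyone who wants the Python route, here is a minimal sketch of the "dump it to a timestamped file" idea, standard library only. The function names, the snapshots directory, and the FEED_URL environment variable are all made up for illustration; the real URL is whatever the OP linked to.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def snapshot_path(out_dir, now):
    """Build a file name such as <out_dir>/2024-01-31T12-30-00Z.json."""
    return Path(out_dir) / now.strftime("%Y-%m-%dT%H-%M-%SZ.json")


def save_snapshot(raw, out_dir, now=None):
    """Write the raw response bytes to a timestamped file and return its path."""
    now = now or datetime.now(timezone.utc)
    json.loads(raw)  # raises ValueError if the response is not valid JSON
    path = snapshot_path(out_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return path


def fetch(url, timeout=30):
    """Download the body of the given URL as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # FEED_URL is a hypothetical environment variable holding the feed address.
    url = os.environ.get("FEED_URL")
    if url:
        print(save_snapshot(fetch(url), "snapshots"))
```

Assuming the script is saved as /home/user/fetch_snapshot.py (a placeholder path), a crontab entry along these lines would run it every 30 minutes:

*/30 * * * * cd /home/user && FEED_URL=https://example.com/data.json /usr/bin/python3 fetch_snapshot.py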

If you want to parse the data on the fly and push it into the relevant database tables every 30 minutes, go with Python and cron. Python has a built-in json library, but it'll most likely just deserialize the data into a dictionary structure.

EDIT: Check out the answer: http://stackoverflow.com/questions/12965203/how-to-get-json-from-webpage-into-python-script
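
A rough sketch of the parse-and-insert variant, in case it helps. Big assumptions here: sqlite3 stands in for whatever SQL database you end up using, and the payload is assumed to be a JSON array of objects with hypothetical "id" and "value" keys. The table name and columns are invented too, so adjust everything to the real feed.

```python
import json
import sqlite3


def store_records(conn, payload, fetched_at):
    """Parse a JSON document and insert one row per record.

    Assumes the payload is a JSON array of objects with hypothetical
    "id" and "value" keys; the real feed will need its own mapping.
    """
    records = json.loads(payload)  # list of dicts for a JSON array of objects
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "fetched_at TEXT NOT NULL, item_id TEXT NOT NULL, value REAL, "
        "PRIMARY KEY (fetched_at, item_id))"
    )
    rows = [(fetched_at, str(r["id"]), r.get("value")) for r in records]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Something like store_records(sqlite3.connect("readings.db"), raw_json, timestamp) would then be called once per cron run. The composite primary key means a rerun for the same timestamp overwrites rows instead of duplicating them.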
 

rorz0r

Executive Member
Joined
Feb 10, 2006
Messages
7,968
Throwing this out there: an Azure Logic App. If you need a bit more control you can write an Azure Function in JavaScript, but you could just as easily do that on AWS Lambda. It just depends which works better with what you need it for.
 