RSS and MySQL

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
Right, this is what I have in the foreach loop:

PHP:
$unique_hash = hash('sha256', $title . $link);

$statement = $db->prepare("INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash` ) SELECT * FROM (SELECT :title, :description, :link, :date, :unique_hash) AS tmp WHERE NOT EXISTS ( SELECT `unique_hash` FROM market_feed WHERE unique_hash= ':unique_hash') LIMIT 1;");
$statement->execute(array(':title' => $title, ':description' => $description, ':link' => $link, ':date' => $date, ':unique_hash' => $unique_hash));

This is what it looks like in the DB (I dropped everything first so it was a clean start with an empty table):

I have made the hash a CHAR type with a length of 64

2016-08-10_14-57-16.jpg

However if I run the SQL query again it adds the duplicates:

2016-08-10_15-03-48.jpg
 

MagicDude4Eva

Banned
Joined
Apr 2, 2008
Messages
6,479
I am a bit lost.

From a performance perspective you'd want to store the hash as a binary or numeric value rather than a varchar string: numeric/binary keys are more compact and faster to compare. Probably not an issue in your example if you don't deal with millions of records.

In your pursuit of knowledge, look up perfect hashing (i.e. a hash function that produces no collisions). A DB such as MySQL will allow the generation/storage of binary/numeric hashes, and with many DBs you can generate the primary key from other columns via a trigger. All of the above is quite simple and a bit of reading/Googling will sort you out :whistling:
 

Nod

Honorary Master
Joined
Jul 22, 2005
Messages
10,057
As a test, change/add the unique_hash as a primary key, then you change your insert query to:
Code:
INSERT IGNORE INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`) VALUES (:title, :description, :link, :date, :unique_hash)
This will create a warning about the duplicate key, skip the insert for that record, and continue.
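For reference, INSERT IGNORE only skips a row when a UNIQUE or PRIMARY KEY constraint rejects it, so the key suggested above is essential. A sketch (table/column names taken from the thread, index name is my own):

```sql
-- Add a unique constraint so duplicate hashes are rejected at the DB level
ALTER TABLE market_feed ADD UNIQUE KEY uq_unique_hash (unique_hash);

-- With the constraint in place, a duplicate produces a warning instead of a new row
INSERT IGNORE INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`)
VALUES (:title, :description, :link, :date, :unique_hash);
```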
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
From a performance perspective you'd want to store the hash as a binary or numeric value rather than a varchar string: numeric/binary keys are more compact and faster to compare. Probably not an issue in your example if you don't deal with millions of records.

In your pursuit of knowledge, look up perfect hashing (i.e. a hash function that produces no collisions). A DB such as MySQL will allow the generation/storage of binary/numeric hashes, and with many DBs you can generate the primary key from other columns via a trigger. All of the above is quite simple and a bit of reading/Googling will sort you out :whistling:

The hash is -> f6926caf3d8c489de012d2192539c926579cf3e27423ddff9311b53451580f66

So I thought CHAR would be my only hope, since the hash contains both numerals and letters.

OR do you mean: hash it -> f6926caf3d8c489de012d2192539c926579cf3e27423ddff9311b53451580f66, then convert that hex string to binary and store the binary in the DB?
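If you did want the more compact storage described above, one common approach (a sketch, not something anyone in the thread has tested) is to store the 64-character hex digest as 32 raw bytes:

```sql
-- BINARY(32) holds the raw SHA-256 digest: half the size of CHAR(64)
ALTER TABLE market_feed MODIFY unique_hash BINARY(32);

-- UNHEX() converts the hex string produced by PHP's hash() into raw bytes
INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`)
VALUES (:title, :description, :link, :date, UNHEX(:unique_hash));

-- Lookups must apply the same conversion
SELECT * FROM market_feed WHERE unique_hash = UNHEX(:unique_hash);
```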
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
As a test, change/add the unique_hash as a primary key, then you change your insert query to:
Code:
INSERT IGNORE INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`) VALUES (:title, :description, :link, :date, :unique_hash)
This will create a warning about the duplicate key, skip the insert for that record, and continue.

HOLY SMOLLY !!!!!!!

That works!!!!!!!!!!!!!!

OMG OMG OMG.

Edit:

Where does the warning go? What I mean is, if this is a cron job nothing will be echoed, so will the warning just go into the void and I can forget about it?
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Are you wanting to keep a history of News24 articles?
If not, then surely you should just delete everything and start anew each cycle.

As for the implementation: include the delete-all command and the new inserts in the same transaction; that way, if anything fails, you can simply roll back the transaction, i.e. the table will be left in its original state, the state it was in before you started the transaction.

Here's some information of PDO transactions;
https://www.html5andbeyond.com/using-database-transactions-pdo-code-example/
http://php.net/manual/en/pdo.begintransaction.php

Note:
Some databases, including MySQL, automatically issue an implicit COMMIT when a database definition language (DDL) statement such as DROP TABLE or CREATE TABLE is issued within a transaction. The implicit COMMIT will prevent you from rolling back any other changes within the transaction boundary.
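Concretely, that means clearing the table with DELETE (plain DML, which can be rolled back) rather than TRUNCATE TABLE (treated as DDL in MySQL, so it commits implicitly). A sketch of one refresh cycle, using the thread's table:

```sql
START TRANSACTION;

-- DELETE is transactional; TRUNCATE TABLE would trigger an implicit commit
DELETE FROM market_feed;

-- Re-insert the freshly fetched feed items (one row per article)
INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`)
VALUES (:title, :description, :link, :date, :unique_hash);

-- COMMIT only once every insert succeeded; on any failure, ROLLBACK instead
COMMIT;
```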
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
[)roi(];18123969 said:
Are you wanting to keep a history of News24 articles?
if not then surely you should just delete everything and start anew each cycle.

As for implementation of this; include the delete all command and new inserts in the same transaction; that way if anything fails you could simply rollback the transaction i.e. Table will be returned to original state before the failure.

Here's some information of PDO transactions; http://php.net/manual/en/pdo.begintransaction.php

That is actually a good point same as I do with the stock prices.

Since all that happens is

Get articles from news 24
Save to database
Via cron

Then
create a XML feed also in the cron

Then as a visitor visits the page
Display it on main website

The forum then also periodically polls the XML feed and creates a new thread when new items are added to it.

So it's definitely a good idea to just mimic what I do with the stocks. This was a much-needed exercise though, since I have learned a hell of a lot, and for the time being I can keep it as it is. It works 110% now guys!!!
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Just don't forget to use transactions; it ensures that your table is always left in a usable state (with records)
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
[)roi(];18124123 said:
Just don't forget to use transactions; it ensures that your table is always left in a usable state (with records)

Oops! I do not know what this is, or whether I use it already; I assume not?

I assume it's something like this? (Was wondering what this is)

PHP:
try {
    // First of all, let's begin a transaction
    $db->beginTransaction();

    // A set of queries; if one fails, an exception should be thrown
    // (NB: PDO only throws if PDO::ATTR_ERRMODE is set to PDO::ERRMODE_EXCEPTION)
    $db->query('first query');
    $db->query('second query');
    $db->query('third query');

    // If we arrive here, it means that no exception was thrown
    // i.e. no query has failed, and we can commit the transaction
    $db->commit();
} catch (Exception $e) {
    // An exception has been thrown
    // We must roll back the transaction
    $db->rollback();
}
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Oops!.. I do not know what this is, or if I use it already, I assume not?

I assume it's something like this? (Was wondering what this is)

PHP:
try {
    // First of all, let's begin a transaction
    $db->beginTransaction();

    // A set of queries; if one fails, an exception should be thrown
    $db->query('first query');
    $db->query('second query');
    $db->query('third query');

    // If we arrive here, it means that no exception was thrown
    // i.e. no query has failed, and we can commit the transaction
    $db->commit();
} catch (Exception $e) {
    // An exception has been thrown
    // We must rollback the transaction
    $db->rollback();
}
Exactly; as your quoted example says, "if one fails, an exception should be thrown" -- as you can see, the action to take on failure is "$db->rollback();", i.e. return the database to the state prior to "$db->beginTransaction();".

Transactions are typically used to maintain integrity during a sequence of changes (usually more than one). For example, in your case:
1. You would start by deleting all the current records.
2. You would then retrieve the News24 RSS and generate/execute the inserts.

When something fails you'd still want to maintain integrity: you usually wouldn't want step 1 to take effect if step 2 failed, as that would leave you with an empty table. Transactional integrity can also be maintained across tables in the case of interlinked data, meaning that if the inserts to table 1 fail, you wouldn't want the related inserts to table 2 to succeed.
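Sketched in SQL, that cross-table case might look like this (the orders/order_items tables and values are hypothetical, purely for illustration):

```sql
START TRANSACTION;

INSERT INTO orders (id, customer_id) VALUES (42, 7);

-- If this second insert fails, ROLLBACK undoes the first one too,
-- so an order can never exist without its items (or vice versa)
INSERT INTO order_items (order_id, sku) VALUES (42, 'ABC-1');

COMMIT;
```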

As mentioned, be careful with DDL (data definition language) statements, as these often don't work with transactions: with MySQL, a DDL statement causes an immediate implicit commit, meaning transactional integrity cannot be maintained across it.
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
[)roi(];18125037 said:
Exactly; as your quoted example says, "if one fails, an exception should be thrown" -- as you can see, the action to take on failure is "$db->rollback();", i.e. return the database to the state prior to "$db->beginTransaction();".

Transactions are typically used to maintain integrity during a sequence of changes (usually more than one). For example, in your case:
1. You would start by deleting all the current records.
2. You would then retrieve the News24 RSS and generate/execute the inserts.

When something fails you'd still want to maintain integrity: you usually wouldn't want step 1 to take effect if step 2 failed, as that would leave you with an empty table. Transactional integrity can also be maintained across tables in the case of interlinked data, meaning that if the inserts to table 1 fail, you wouldn't want the related inserts to table 2 to succeed.

As mentioned, be careful with DDL (data definition language) statements, as these often don't work with transactions: with MySQL, a DDL statement causes an immediate implicit commit, meaning transactional integrity cannot be maintained across it.

This is pretty nifty!!!!

I wish I could thank you guys personally. I avoided anything DB-related like the plague, as I just never thought I would be able to wrap my head around it. WELL, I was wrong; thanks to everyone, this little attempt of mine turned out to be one incredible learning experience!!

I am not afraid of the DB anymore.
 

_kabal_

Executive Member
Joined
Oct 24, 2005
Messages
5,923
As a test, change/add the unique_hash as a primary key, then you change your insert query to:
Code:
INSERT IGNORE INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`) VALUES (:title, :description, :link, :date, :unique_hash)
This will create a warning about the duplicate key, skip the insert for that record, and continue.

This might be a feature of the language, but it still seems like a hack, and I would recommend against doing it.

It is basically the equivalent of doing the following, which is a very bad practice:

PHP:
$db = connectToDb();
try {
    $db->executeSql(SOME_SQL_QUERY);
} catch (Exception $exception) {
    if ($exception instanceof SomeDuplicateKeyException) {
        // swallow the exception
    } else {
        throw $exception;
    }
}
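If you'd rather avoid IGNORE's blanket suppression, one common alternative (a sketch, not something suggested in the thread) is an explicit no-op upsert, which handles only the duplicate-key case while letting other errors still surface:

```sql
-- ON DUPLICATE KEY UPDATE fires only for duplicate-key conflicts;
-- the self-assignment makes the update a no-op, so the row is simply skipped
INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash`)
VALUES (:title, :description, :link, :date, :unique_hash)
ON DUPLICATE KEY UPDATE unique_hash = unique_hash;
```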
 

_kabal_

Executive Member
Joined
Oct 24, 2005
Messages
5,923
Is the reason your INSERT ... WHERE NOT EXISTS isn't working the quoting around the final hash? Inside quotes, ':unique_hash' is a literal string rather than a bound parameter, so the NOT EXISTS subquery never finds a match.

PHP:
$statement = $db->prepare("INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash` ) SELECT * FROM (SELECT :title, :description, :link, :date, :unique_hash) AS tmp WHERE NOT EXISTS ( SELECT `unique_hash` FROM market_feed WHERE unique_hash= ':unique_hash') LIMIT 1;");

Should this not be:

PHP:
$statement = $db->prepare("INSERT INTO market_feed (`title`, `description`, `link`, `date`, `unique_hash` ) SELECT :title, :description, :link, :date, :unique_hash FROM DUAL WHERE NOT EXISTS ( SELECT `unique_hash` FROM market_feed WHERE unique_hash = :unique_hash) LIMIT 1;");
 