SQL - Retrieving work

Marcell1992

Well-Known Member
Joined
Aug 7, 2018
Messages
289
Reaction score
59
Good day,

I in no way have any experience with SQL, but I'm trying to assist my company in coming up with a solution to an issue we have that negatively impacts our TAT considerably. Our IT team is *censored* (Don't want to be fired, lol)

So the issue is. We have multiple sites around South Africa that scans work to a server in PDF or Alternatiff format, if that stuff matters. We then have data capturers that retrieve this work from a queue in a first scanned first serve basis, it is evident based on errors we receive(for other non-related issues) that IT is using SQL. Anyway, the issue is that when the server gets overloaded, data capturers will frequently receive the same form (multiple users receive the same form) as well as that form getting stuck in the queue and you have to backdoor remove it.

Is there a workaround to this, perhaps some SQL coding you can apply that when the server is overloaded, the server will use a different method to distribute work? Or apply precautionary methods to not let the form be retrieved by multiple users at the same time?
 
I would invest time in looking at a system designed specifically to queue jobs.. RabbitMQ and AWS SQS come to mind..
 
Not sure I understand the problem, Using a surrogate key? use a different sequence range for the different sites?
 
...
We have multiple sites around South Africa that scans work to a server in PDF or Alternatiff format, if that stuff matters. We then have data capturers that retrieve this work from a queue in a first scanned first serve basis....
There is insufficient detail to provide you any meaningful advise; extra clarity will help, for example:
  • "scans work to a server" -- is this a separate server with just a database file/path reference index or is this as a database BLOB.
  • "data capturers that retrieve this work from a queue" -- what is the queue; a file/path reference table in a database, queuing software or something else?
  • A diagram of the workflow and servers involved would certainly help, including pointing out where the performance issues are.
 
hope you're not using a timestamp as the key.
 
hope you're not using a timestamp as the key.
I don't think OP would know if they were. Could also be that client sends pdf to server on capture, server overloaded so no repsonse to say received or something, so client re-sends.

Difficult to say anything without knowing more.
 
Getting stuck or delivering the same message multiple times shouldn’t be an error case due to an overloaded server. It sounds more like a fundamental concurrency error in the design.
Could very well be a transactional race issue and a BLOB would only heighten the occurrence under load -- insufficient detail to be sure.
 
Possibly the “queue” doesn’t know when a user has assigned the work to themselves. Users should only be able to select an item that is not already assigned. The only issue should be more than one session trying to grab an open item. Row level locking by the DBMS should be managing that. If anything, users should be plagued by “record in use” type messages in a system which has performance issues.

One way around that is to rather have server assign work based on a set of rules, or have team leader assign work. This approach will have its own issues though.
 
A queue that takes the Job Id is one solution. Or you can take advantage of the DB to enforce that a job is only assigned once. E.g. (pseudo-code)

table JOB
(
ID bigint PRIMARY KEY,
IS_ASSIGNED bool
)

table ASSIGNMENT
(
JOB_ID bigint PRIMARY KEY,
ASSIGNED_TO_USER_ID bigint
)

C#:
public void AssignJob(long jobId,long userId)
{
  try
  {
    StartTransaction();

    update JOB set IS_ASSIGNED = true where ID = :jobId;
    insert into ASSIGNMENT (JOB_ID,ASSIGNED_USER_ID) values (:jobId,:userId);

    Commit();
  }
  catch
  {
    Rollback();
    throw;
  }
}

You are taking advantage of the fact that the DB engine will enforce uniqueness on the JOB_ID on the ASSIGNMENT table. The insert and the update is atomic, so they will always fail or succeed together. If the action is successful then the job was assigned, otherwise it wasn't.
 
That has nothing to do with sql, it has to do with whatever app that is supplying your data capturers. SQL is a database, what you are talking about is a distributed work model by a program.

Not sure if the system giving the work is Intranet, but we log onto our browsers and go to an Intranet site to retrieve the work.

I don't think OP would know if they were. Could also be that client sends pdf to server on capture, server overloaded so no response to say received or something, so client re-sends
Difficult to say anything without knowing more.

We don't work on a time stamp. Sounds like it will be a bad idea, but what if you use time stamp down to the milliseconds? We have probably around 100 sites scanning work.

There is insufficient detail to provide you any meaningful advise; extra clarity will help, for example:
  • "scans work to a server" -- is this a separate server with just a database file/path reference index or is this as a database BLOB.
  • "data capturers that retrieve this work from a queue" -- what is the queue; a file/path reference table in a database, queuing software or something else?
  • A diagram of the workflow and servers involved would certainly help, including pointing out where the performance issues are.

1. It's one server. The work gets scanned to it, then via Intranet you view the form scanned and if you complete it, it is still on the server, just accessible from a different way, and due to this, our server is set up to run backups once a day to copy it over to another server to mitigate the issue I mentioned.
2. Not sure, my developer tool for IE (which is the browser we use) is not working for some reason.
3. I'll get back on that one and ask IT to explain what is happening.

Edit: In the Java script of IE on the Intranet, what would I need to look for to see what code is being run to do all the work? I see there's reference to www.w3.org, but not sure if that is relevant or correct.
 
Last edited:
A queue that takes the Job Id is one solution. Or you can take advantage of the DB to enforce that a job is only assigned once. E.g. (pseudo-code)
...
You are taking advantage of the fact that the DB engine will enforce uniqueness on the JOB_ID on the ASSIGNMENT table. The insert and the update is atomic, so they will always fail or succeed together. If the action is successful then the job was assigned, otherwise it wasn't.
Transactions don't inherently prevent all race conditions.
 
It's one server. The work gets scanned to it, then via Intranet you view the form scanned and if you complete it, it is still on the server, just accessible from a different way, and due to this, our server is set up to run backups once a day to copy it over to another server to mitigate the issue I mentioned.
2. Not sure, my developer tool for IE (which is the browser we use) is not working for some reason.
3. I'll get back on that one and ask IT to explain what is happening.
Ok, so singular server; but also check if the scanned image is stored in the database (i.e. as a BLOB), or stored in a directory / folder.
 
Is this a new issue, or has it always been there?
 
Transactions don't inherently prevent all race conditions.
In which condition wont it? I have used this mechanism in a telco application to enforce uniqueness where we handled tens of million events an hour. Never had an issue
 
Top
Sign up to the MyBroadband newsletter
X