MySQL Query Help

DrJohnZoidberg · Oct 23, 2013

I am trying to write an efficient query in MySQL to compare financial figures between two periods of different years.

For instance I want to compare our sales figures during the following periods:

October 1, 2013 to October 22, 2013
October 1, 2012 to October 22, 2012
October 1, 2011 to October 22, 2011
etc
etc

Unfortunately my SQL skills are much to be desired and looks like my first attempt has made an impossible query:

Code:

SELECT brand, date, transfers_in
FROM financial_main
WHERE (MONTH(date) BETWEEN MONTH((now() - INTERVAL 1 MONTH)) AND MONTH(now()))
AND (DAY(date) BETWEEN 1 AND DAY((SELECT MAX(date) from financial_main LIMIT 1)))

It should check what the latest date is in the financial_main table then use that as the cut off date for the other years too.

Help

DrJohnZoidberg · Oct 23, 2013

I don't need it to do any more than just pull the relevant data, I have a separate php function that serves it as JSON.

DrJohnZoidberg · Oct 23, 2013

Okay, got it to work but it's still pretty slow. How can I make this faster?

Code:

SELECT DATE, brand, transfers_in
FROM financial_main
WHERE (
MONTH( DATE ) 
BETWEEN MONTH( (
NOW( ) )
)
AND MONTH( NOW( ) )
)
AND DAY( DATE ) <= DAY( (

SELECT MAX( DATE ) 
FROM financial_main
LIMIT 1 )
)

As far as I understand it is checking to see what the last date is on every record, how can I get it to just check it once?

EDIT: Oops, removed an interval that wasn't supposed to be there.

shauntir · Oct 23, 2013

Rule number 1 of sql querying optimization: don't convert your search argument to a different type.

So

Code:

MONTH( DATE )

should not be done. The engine has to convert the data before searching and will stop any indexes from being used.

I'll have a look at your query later this evening when I get home and see if I could lend a hand. Please post any indexes that you have on this table as well.

Also, I think you may be better suited to create a stored procedure since you will have access to variables etc.

DrJohnZoidberg · Oct 23, 2013

shauntir said:
Rule number 1 of sql querying optimization: don't convert your search argument to a different type.

So

Code:

MONTH( DATE )

should not be done. The engine has to convert the data before searching and will stop any indexes from being used.

I'll have a look at your query later this evening when I get home and see if I could lend a hand. Please post any indexes that you have on this table as well.

Also, I think you may be better suited to create a stored procedure since you will have access to variables etc.

Thanks for the reply. I managed to get it working ok now, but I really need to learn how to optimise my queries. I am creating these as views in MySQL.

Darko · Oct 23, 2013

Since you have a php script that is returning this data as JSON, why don't you just run your 3 simple queries above, one for each year, then use the php script to run the calculations and display the differences?

nand · Oct 23, 2013

Ok, so reading your query:

Code:

WHERE
(Month of date numeric) is between (n and n-1, where n = current month numeric)
AND
(Day of date numeric) between (1 and highest day numberic)

This seems... :erm: weird. But then again, I have no idea of what's within the `financial_main` table.
From context, I gather that:

Code:

financial_main (a logbook)
===============
brand          - A distributor / supplier title
date            - A specific date, either captured daily, weekly or monthly 
transfers_in  - A number

...Sample Data...
Title | 2012-10-11 | 5
Title | 2011-10-11 | 5
Title | 2010-10-11 | 5
Title | 2009-10-11 | 5

However, and again I'm possibly wrong with this, you actually want aggregated data for each "brand" over a certain time period - not the individual rows themselves? Either way, could you possibly let us in on your data structures and some sample data?

zippy · Oct 24, 2013

DrJohnZoidberg said:
Thanks for the reply. I managed to get it working ok now, but I really need to learn how to optimise my queries. I am creating these as views in MySQL.

You need to have indexes and make sure your query uses them. The problem is you are exclusively using functions in the where clause. I don't know that much about MySQL, but Oracle has a feature called function-based indexes which solve this. Check your MySQL documentation if there is such a thing.

DrJohnZoidberg · Oct 24, 2013

zippy said:
You need to have indexes and make sure your query uses them. The problem is you are exclusively using functions in the where clause. I don't know that much about MySQL, but Oracle has a feature called function-based indexes which solve this. Check your MySQL documentation if there is such a thing.

Thanks, I have indexed fields but I'm not using them in this query - I'll check what I can do here.

zippy · Oct 25, 2013

DrJohnZoidberg said:
Thanks, I have indexed fields but I'm not using them in this query - I'll check what I can do here.

If you can't create function based indexes in MySQL, then you can create another column on the table which will reflect the result of the function you are using. So if have a date '17-OCT-2013' in the current column, then the new column will have 'OCT-2013'. You then create and index in the new column. Then in your where clause you refer to the new column without a function around it. Ofc, you need to make sure this column is always synced when the main column changes. You can do this with an insert/update trigger.

We used to do this on Oracle before we got function based indexes

Edit: a warning about function based indexes. If you are doing very large batch inserts, and I mean large, then these indexes can be a killer on your batch insert jobs.

Waansin · Oct 26, 2013

This subquery:

Code:

AND DAY( DATE ) <= DAY( (
    SELECT MAX( DATE ) 
    FROM financial_main
    LIMIT 1 )
)

I imagine it executing on every row in the main query. Every time. You would be better off querying for the MAX( DATE ) from financial_main once before the main query and using its maximum date as a bound parameter.

I'm a little confused as to your stored dates. Are you storing them as dates, strings, ints? Pasting some of your table's data and schema would help.

[)roi(] · Oct 27, 2013

It would help to see the DDL for financial_main, some detail of the indexes on this table, and a short extract of your output.

Re optimisation:

Have you run your query with the EXPLAIN keyword? It'll help you to understand how the query is being executed, and possibly point you towards a way to optimise the execution time. (Also share this output for analysis)
Btw MySQL can optimise aggregate functions like MIN and MAX as long as the columns used are indexed (EXPLAIN will help you identify which ones).

For the remainder of your query, you could concatenate the data for the other periods by using either a:

UNION (simpler query to compile and read, but potentially slower)
Custom column to categorise the results into the various periods (more complex and difficult to read query, but potentially faster) -- making use of "SELECT CASE WHEN <date> BETWEEN <start period x> AND <end period x> THEN 'Period x' "
There are alternatives to using the singular MONTH, DAY, YEAR functions that could offer further optimisation.

Stickler for detail:
Comparing the current month to previous years can be fairly meaningless (partial month vs. full month) unless of course the query is only being run on the last day of the current month.

DrJohnZoidberg · Oct 27, 2013

This is what my final query looks like:

Code:

SELECT  `financial_main`.`date` AS  `Date` ,  `financial_main`.`brand` AS  `Brand` , SUM(  `financial_main`.`field1` ) AS  `SumField1` 
FROM  `financial_main` 
WHERE (
(
MONTH(  `financial_main`.`date` ) = MONTH( NOW( ) )
)
AND (
DAYOFMONTH(  `financial_main`.`date` ) <= DAYOFMONTH( (

SELECT MAX(  `financial_main`.`date` ) 
FROM  `financial_main` 
LIMIT 1 )
)
)
)
GROUP BY YEAR(  `financial_main`.`date` ) ,  `financial_main`.`brand`

None of the columns I'm using are indexed, there is only one indexed column in my table and that is the primary key column. Is it advisable to have the date column index too?

The query runs very quickly but there aren't too many rows in the table (around 17K), I do have other tables with over 2 - 3 million rows that's why I'm asking about optimisations.

[)roi(] · Oct 27, 2013

Now worries re the rest of my explanation; assumed you would also prefer transposition of the result.

Run the query with EXPLAIN; the output will show more detail for possible optimisation; especially re the use of MONTH / DAYOFMONTH / YEAR functions.

The EXPLAIN statement is used to obtain more information on how MySQL has executed your statement: as opposed to just comparing time. The output of EXPLAIN makes it easier to narrow down the best place for an index.

DrJohnZoidberg · Oct 27, 2013

This is all I get from the EXPLAIN output:

Code:

+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | PRIMARY     | financial_main | ALL  | NULL          | NULL | NULL    | NULL | 16480 | Using where; Using temporary; Using filesort |
|  2 | SUBQUERY    | financial_main | ALL  | NULL          | NULL | NULL    | NULL | 16480 |                                              |
+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

[)roi(] · Oct 27, 2013

DrJohnZoidberg said:

This is all I get from the EXPLAIN output:

Code:

+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | PRIMARY     | financial_main | ALL  | NULL          | NULL | NULL    | NULL | 16480 | Using where; Using temporary; Using filesort |
|  2 | SUBQUERY    | financial_main | ALL  | NULL          | NULL | NULL    | NULL | 16480 |                                              |
+----+-------------+----------------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

...and that's exactly as it should be

Primary is the outer most SELECT.
SUBQUERY is your first SELECT in the SUBQUERY.
Table is as the name implies the table used by the query.
Type is the type of the join; ALL implies no joins.
Possible_keys shows whether any relevant indexes were available (NULL = none)
Key indicates which index was chosen to run the query (NULL = none)
Key_len is the length of the chosen key, and allow you to determine how many parts of a multi part key is used (NULL = none)
Ref shows which columns or constants are compared to the index (NULL - none)
Rows is an estimate of the Row to be examined.
Filtered indicates an estimated percentage of table rows that will be filtered by the table condition (is only available with EXPLAIN EXTENDED).
Extra contains additional information about how MySQL resolves the query. (is only available with EXPLAIN EXTENDED).

Ok, let's try to make some sense of your data:

Basically it indicates we have a Primary and Subquery, and each query is estimated to run through a total of 16480 records; as you have already confirmed this is all the records in the financial_main table. So from this we can immediately garner that no optimisation is being achieved. (Btw If you had 2 million records, your SQL would force all 2 million records to be examined twice)
The NULL value in the of Possible_keys and Key columns confirms that no optimisation was used or is possible.

Now let's get down to why EXPLAIN shows that no optimisation is possible:

MAX - As I have indicated before MAX / MIN can be optimised through indexing (as long as MAX is being done against a table column and not a calculated value); In this case you are using an actual column "MAX( `financial_main`.`date` ) " so we could assume it's probably not the problem.
As someone else mentioned previously, your date functions: "MONTH( `financial_main`.`date` ) = MONTH( NOW( ) )" and "DAYOFMONTH( `financial_main`.`date` ) <= DAYOFMONTH( (SELECT MAX( `financial_main`.`date` ) FROM `financial_main`LIMIT 1 )" void the ability for any optimisations; why you ask? Because the comparison is being done against a calculated value, and until we process a record and make the calculation, we are unable to evaluate if there is a match or not; hence every row must be examined, and index optimisation is not possible.

So in the end; if you want to optimise your query your have to learn how to write the same query differently; basically it must be rewritten to evaluate a match against one of your table's columns as opposed to a calculated value; once this is done you can apply indexing to reduce the number of rows that the query needs to evaluate (i.e. some level of optimisation is achieved when we don't have to examine every row in the table)

DrJohnZoidberg · Oct 27, 2013

[)roi(];11430963 said:
...and that's exactly as it should be

Primary is the outer most SELECT.

SUBQUERY is your first SELECT in the SUBQUERY.

Table is as the name implies the table used by the query.

Type is the type of the join; ALL implies no joins.

Possible_keys shows whether any relevant indexes were available (NULL = none)

Key indicates which index was chosen to run the query (NULL = none)

Key_len is the length of the chosen key, and allow you to determine how many parts of a multi part key is used (NULL = none)

Ref shows which columns or constants are compared to the index (NULL - none)

Rows is an estimate of the Row to be examined.

Filtered indicates an estimated percentage of table rows that will be filtered by the table condition (is only available with EXPLAIN EXTENDED).

Extra contains additional information about how MySQL resolves the query. (is only available with EXPLAIN EXTENDED).

Ok, let's try to make some sense of your data:

Basically it indicates we have a Primary and Subquery, and each query is estimated to run through a total of 16480 records; as you have already confirmed this is all the records in the financial_main table. So from this we can immediately garner that no optimisation is being achieved. (Btw If you had 2 million records, your SQL would force all 2 million records to be examined twice)

The NULL value in the of Possible_keys and Key columns confirms that no optimisation was used or is possible.

Now let's get down to why EXPLAIN shows that no optimisation is possible:

MAX - As I have indicated before MAX / MIN can be optimised through indexing (as long as MAX is being done against a table column and not a calculated value); In this case you are using an actual column "MAX( `financial_main`.`date` ) " so we could assume it's probably not the problem.

As someone else mentioned previously, your date functions: "MONTH( `financial_main`.`date` ) = MONTH( NOW( ) )" and "DAYOFMONTH( `financial_main`.`date` ) <= DAYOFMONTH( (SELECT MAX( `financial_main`.`date` ) FROM `financial_main`LIMIT 1 )" void the ability for any optimisations; why you ask? Because the comparison is being done against a calculated value, and until we process a record and make the calculation, we are unable to evaluate if there is a match or not; hence every row must be examined, and index optimisation is not possible.

So in the end; if you want to optimise your query your have to learn how to write the same query differently; basically it must be rewritten to evaluate a match against one of your table's columns as opposed to a calculated value; once this is done you can apply indexing to reduce the number of rows that the query needs to evaluate (i.e. some level of optimisation is achieved when we don't have to examine every row in the table)

Wow, thanks for the awesome response. I should probably just look doing a database basics course to save myself the trouble of just winging it

Do you know of any good online resources for MySQL tutorials, lessons, etc?

[)roi(] · Oct 27, 2013

Ok, so now onto a solution; you need to rewrite your query so that the primary part of the query is evaluated against one of your table's columns and not a calculated value. For example:
`financial_main`.`date` BETWEEN <start> AND <end>.

I believe it's possibly to achieve this in either:

a UNION query: perform calculations only on the 1 side of the comparison (see example above); then repeat the query for each annual term, ultimately merge the results into a single dataset with UNION statement.
A more complex, non UNION solution (and potentially faster). By using a SUBQUERY as the primary data source as opposed to directly off financial_main; the subquery essentially returns the same columns as financial_main, except that it also includes a custom column that categorizes the data according to period using (SELECT CASE WHEN `financial_main`.`date` BETWEEN <calcStartPeriod> AND <calcEndPeriod> THEN <periodname> END AS `PeriodColumn`), the resulting dataset is then aggregated by grouping on the new `PeriodColumn`.
alternatively build a stored procedure to do something similar.

[)roi(] · Oct 27, 2013

DrJohnZoidberg said:
Wow, thanks for the awesome response. I should probably just look doing a database basics course to save myself the trouble of just winging it Do you know of any good online resources for MySQL tutorials, lessons, etc?

Unfortunately not...
But I did find a stackoverflow example along the lines of what I was trying to explain (hope it helps):
http://stackoverflow.com/questions/4973368/optimizing-mysql-query-for-group-by-with-date-functions

Just remember to keep 1 side (left side) of your WHERE or SELECT CASE WHEN referencing a column in your table (a column that can be indexed for optimization); + best to try to avoid comparing 2 calculations.

Here's a few date helpers:

1st day of current month: DATE_SUB(NOW(),INTERVAL DAYOFMONTH(NOW()) - 1 DAY)
Last day of current month: LAST_DAY(NOW())
Same date, previous year: DATE_SUB(NOW(), INTERVAL 1 YEAR)

GoB · Oct 29, 2013

For me the lazy approach, is to think of what you could JOIN together to get to your result / the parameters needed to filter on for the following join.

I would try something like this first. If it didn't work then I would blame the DBMS and go for coffee.

SELECT years.brand, years.year, SUM(financial_main.field1) FROM
(SELECT fm.brand, YEAR(fm.date) AS year FROM financial_main AS fm GROUP BY fm.brand, YEAR(fm.date)) AS years
INNER JOIN
financial_main ON financial_main.brand = years.brand
AND financial_main.date BETWEEN
str_to_date( concat( years.year ) , '-', month( curdate( ) ) , '-1' ) , '%Y-%m-%d' )
AND
str_to_date( concat( years.year ) , '-', month( curdate( ) ) , '-', day( curdate( ) ) ) , '%Y-%m-%d' )
GROUP BY years.brand, years.year

Join the MyBroadband community

Get started

MySQL Query Help

Honorary Master

Honorary Master

Honorary Master

Well-Known Member

Honorary Master

Senior Member

Senior Member

Honorary Master

Honorary Master

Honorary Master

Well-Known Member

Executive Member

Honorary Master

Executive Member

Honorary Master

Executive Member

Honorary Master

Executive Member

Executive Member

Expert Member