Need database advice

DrJohnZoidberg

Honorary Master
Joined
Jul 24, 2006
Messages
27,988
Reaction score
7,446
Location
Table View
I'm back once again in dire need of the MyBB collective mind :)

I'm trying to figure out how to best store some information in a MySQL database.

The scenario is that I have a table with customer data like name, email, signup dates, etc. The problem comes in because the web front-end allows admins to also create custom data fields which can differ between users as each user can be part of a separate list. So for one customer list we may have 3 custom data fields for things like date of last purchase, total purchases and country while another list may only has 1 custom field.

TL/DR :: If you already know what I want go to next TL/DR :D

The way it is currently handled is in the lists table there is a column called custom_fields which handles the custom field header information, like the custom field name and the data type (can be date, text or numeric). This is all stored in the one field and the data is given a delimiter to allow the system to decipher it again, so a typical field will look like this:

Code:
Last Purchase:Date%s%Total Purchases:Numeric%s%Country:Text

So each custom field is delimited by %s% and then each custom fields type is delimited by a colon
Code:
(:)
.

Then in the customer table there is also a column which matches this delimited format but it contains the actual data for that customer, so it would look like:

Code:
1021327200%s%500.42%s%South Africa

This works great as a simple measure but it is very inefficient when you try and search through this data as it first has to be broken up and then calculations must be done on them (currently in php). It works fine for small lists but as soon as you start hitting 100k record it noticeably starts slowing down.

TL/DR Continuation...

My question is how do store these variable custom fields in database tables so I can do MySQL queries instead of having to hack the data to pieces in php?

If the custom fields could only be one data type it wouldn't be a problem, the issue is that there are 3 different data types (well date and numeric could be stored in the same type but not the text) possible and to achieve my blissful goal I need them to be stored as the correct MySQL data types.

Any useful input is appreciated.
 
Last edited:
1 to many link to a metadata table:

CarMetadata
CarID | key | value

I use this quite a bit in our production system for lookup tables (branches, offices, barcodes etc) and you can always cast the data in SQL if need be.

Another way is to save the data as an XML datatype and use xpath in your SQL queries (DB2 allows this, not sure about MySQL)
 
1 to many link to a metadata table:

CarMetadata
CarID | key | value

I use this quite a bit in our production system for lookup tables (branches, offices, barcodes etc) and you can always cast the data in SQL if need be.

Another way is to save the data as an XML datatype and use xpath in your SQL queries (DB2 allows this, not sure about MySQL)

Thanks, I'll do some reading so long :)
 
For splitting the info at mysql level , rather than using php, there are some database functions that can be used in your queries

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

The ones of interest would be the substring ones, as there is no split function

Still think saving them in one delimited string is not as efficient as splitting them up into key values in another table, performance being the major drawback. If you need the data split then store it that way.
 
Maybe you could create a table where you define your custom data fields (name, datatype name, description).

The custom data field table is linked out to a table that associates a custom data field, a client and the value for that particular client. I suppose you could create a nullable column in this table for each datatype you need and store the value like that, or only use one varchar to store all of them (I don't work in mysql).

The custom data field table can be linked out to linking table that associates a "list" table with one or more custom data fields. This is so that it would be easy to determine what custom fields are associated with the list so that to correct entries can be inserted for new clients based on the selected list.

All this might not perform that great and adds complexity.
 
If you really need to query the data:

Customer table
ID

Custom_fields table
ID
Name
Data type

Customer_custom_field_values
Customer_id
Custom_field_id
Value

Otherwise, just have a field on the customer with json/xml. MySQL will soon support queries on Json fields
 
Last edited:
This sounds like a typical EAV schema -> http://en.wikipedia.org/wiki/Entity–attribute–value_model

It can get a little messy and it has its disadvantages:
http://stackoverflow.com/questions/4481672/is-eav-hybrid-a-bad-database-design-choice
http://programmers.stackexchange.com/questions/93124/eav-is-it-really-bad-in-all-scenarios

An alternative is to store a document, something like a json object, inside a single column per user. This is similar to what you're doing, but without the need for a delimiter.
 
Customer table
ID

Custom_fields table
ID
Name
Data type

Customer_custom_field_values
Customer_id
Custom_field_id
Value

That's exactly where I'm stuck. The Value field in the customer_custom_field_values is only of one type... Which means I would probably have to store everything as varchar and still have to do conversions for the numeric values. Probably still a lot faster than the delimited method but not ideal.
 
This sounds like a typical EAV schema -> http://en.wikipedia.org/wiki/Entity–attribute–value_model

It can get a little messy and it has its disadvantages:
http://stackoverflow.com/questions/4481672/is-eav-hybrid-a-bad-database-design-choice
http://programmers.stackexchange.com/questions/93124/eav-is-it-really-bad-in-all-scenarios

An alternative is to store a document, something like a json object, inside a single column per user. This is similar to what you're doing, but without the need for a delimiter.

Aaah. I think this is what I'm looking for, I'll check this out.

The json plan actually sounds like a very good idea though, I'm sure processing json would be a lot quicker than using php string functions.

Thanks!
 
Aaah. I think this is what I'm looking for, I'll check this out.

The json plan actually sounds like a very good idea though, I'm sure processing json would be a lot quicker than using php string functions.

Thanks!

Just consider whether you'll ever need to search/sort/filter on any of these key-value pairs within that json/delimited string. If that's the case, it may be worth looking at EAV. If however these operations are limited and would mostly be done during offline processing batch jobs or reporting, you could always push this data through a pre-processor which spits out a new table/temporary table, then perform your operations on that table.

It's worth putting a little thought into this, as the wrong design pattern could come back and bite you :) The good thing is it's relatively easy to to move from EAV -> JSON or JSON -> EAV or any other store.
 
Or you can add/edit table fields as needed if you want to get complex :whistle:

I have a similar issue and actually have to split the individual items on the receipt to a table for the different items, trying to avoid having to delimit it
 
I'd use XML if time is on your side. You can create xml schemas that will accept your xml data in a valid format or reject it. I know you can do this in sql server.

Basically, you could have 3 xml schemas to accept your 3 different value types. Before it is inseted into the xml field, you validate it against your schema.

Then you would use xpath to query the data.
 
If you really need to query the data:

Customer table
ID

Custom_fields table
ID
Name
Data type

Customer_custom_field_values
Customer_id
Custom_field_id
Value

Otherwise, just have a field on the customer with json/xml. MySQL will soon support queries on Json fields

I'd second this approach.

Unless you want to change to mongodb and then save it all as json in an object..?
 
I'd second this approach.

Unless you want to change to mongodb and then save it all as json in an object..?

I explained I can't do it like this because of the different data types. I have however considered the Mongo route, but that is a massive undertaking to move the entire app over.
 
If you really need to query the data:

Customer table
ID

Custom_fields table
ID
Name
Data type

Customer_custom_field_values
Customer_id
Custom_field_id
Value

Otherwise, just have a field on the customer with json/xml. MySQL will soon support queries on Json fields

I'd second this approach.

Unless you want to change to mongodb and then save it all as json in an object..?

Actually, going back to this... If were to store all the values as say VARCHAR, and then called my queries like (pseudo code):

Code:
SELECT customers.name, customers.email, field_data.value FROM customers, field_data WHERE CAST(field_data.values, UNSIGNED) = php_var

This could work, but I have no idea how speed would be affected by casting the variable each time?

Thoughts on this route?
 
LOL someone brought this up as an idea recently - ability to add x amount of custom fields.I told them to go fly a kite.

I love having this power,MUHUHAHAHAHA!
 
Actually, going back to this... If were to store all the values as say VARCHAR, and then called my queries like (pseudo code):

Code:
SELECT customers.name, customers.email, field_data.value FROM customers, field_data WHERE CAST(field_data.values, UNSIGNED) = php_var

This could work, but I have no idea how speed would be affected by casting the variable each time?

Thoughts on this route?

I think it really depends on the actual performance requirements. It's all well and good to say that CAST inserts a performance hit, but how much does this hit really affect you.

Especially for INT's.
I havent tried, but I am pretty sure there is not going to be much difference between
SELECT CAST('-123' AS SIGNED INTEGER);
and
SELECT -123

That is obviously a super simple use case, and I am unsure of the actual performance when using WHERES or JOIN ON.

best to run some EXPLAIN's to get some metrics.

is 50ms vs 100ms really important to you. its also worth nothing, that a 50ms vs 100ms doesn't necessarily automatically translate into a 1sec vs 2sec, 10sec vs 20sec, etc

live performance vs report performance is also a consideration. maybe you are only really using the data in reports, and then performance can in some cases even be ignored
 
Last edited:
I think it really depends on the actual performance requirements. It's all well and good to say that CAST inserts a performance hit, but how much does this hit really affect you.

Especially for INT's.
I havent tried, but I am pretty sure there is not going to be much difference between
SELECT CAST('-123' AS SIGNED INTEGER);
and
SELECT -123

That is obviously a super simple use case, and I am unsure of the actual performance when using WHERES or JOIN ON.

best to run some EXPLAIN's to get some metrics.

is 50ms vs 100ms really important to you. its also worth nothing, that a 50ms vs 100ms doesn't necessarily automatically translate into a 1sec vs 2sec, 10sec vs 20sec, etc

live performance vs report performance is also a consideration. maybe you are only really using the data in reports, and then performance can in some cases even be ignored

I guess I will just have to test it. I'm quite positive though it would increase performance drastically compared to how it currently being done and would scale a lot better. This is all I really need so will definitely check out this avenue. Thanks for the tips so far, there is a lot for me to go through.
 
another spanner to consider :)

allowed values, eg for select boxes, multiselect listboxes, non boolean checkbox groups, etc....
 
Top
Sign up to the MyBroadband newsletter
X