Database design (MySQL)

thesadguy

Active Member
Joined
Nov 29, 2008
Messages
70
Reaction score
0
I have 2 tables, entity_A and entity_B.

entity_A has a field that links to one or more records in entityB.

entity_B has only a handful of records (say 20), while entity_A has a few thousand.

What is the best way to design for this? It doesn't seem to make sense to have entity_B have an entity_A_id field, as multiple records in entity_A would require linkage to the same entity_B record. In this case, entity_B would be many times larger than entity_A, with a LOT of redundant information (as I say, a few thousand records instead of just 20).

Currently, the system maintains an object for each entity_A record. The object is serialised and stored in the database. When it needs to be retrieved, it is unserialised. The entity_B table, is only used for ease-of-use for the user; displaying a form with a list of entity_B records and an "Add" button.

The big problem with this approach is that the entity_B data is not easily searchable.

Any ideas?
 
You will need a mapping table (Table C) if table_a needs to reference more than one row from table_b

example:


Table A:
ref
name
...

Table B:
ref
name
...

Table C:
ref_a
ref_b


SQL: (Psudo)

SELECT a.*, b.* FROM table_c JOIN table_a JOIN table_b WHERE table_a.ref=table_c.ref_a AND table_b.ref=table_c.ref_b;
 
Last edited:
BTW: Just remember that disk space is cheap and DBs are highly optimised apps. Use the basics and never reinvent the wheel.

Mapping tables can also grow but never be afraid of their size. Also use foreign key constraints on your tables. It makes it more bullet proof and easier to understand what you are doing.

After working with DBs everyday for the last 3yrs I've accepted this.
 
You will need a mapping table (Table C) if table_a needs to reference more than one row from table_b

How would that look?

Couldn't I just as easily do what I described above, have table_b have multiple records, one for each of table_a's records' entity_b entries? Then table_b would be very large, but it would work.

Having a table_c to do the mapping would surely require a record for each mapping of a table_b record to a table_a record? Or am I missing the point?
 
"In this case, entity_B would be many times larger than entity_A" not quite true.
Multiple entity A records can reference the exact same record in entity B, no need to duplicate the data in entity B at all.

Ofcourse, assuming I understand you correctly.

Real world example: ent_B contains 'Cities' and the user can add new Cities

ent_A contains addresses, like street, house numberm, suburb and ofcource the city referenced in ent_B

Does that sort of describe the way the tables are used ?
If so, then the design is fine, as long as data is not duplicated in ent_B, because thats just not neccessary.
(I don't know MySQL that well, I know oracle, SQL server very well. Does MySQL have primary keys or unique indexes ? )
 
But as I say, having 20 records vs ~3 x size(table_a)... doesn't make sense.
 
But as I say, having 20 records vs ~3 x size(table_a)... doesn't make sense.

Don't fight it.

Say you have 1000 entries in table_a (unique rows) and each need to reference 3 (or even 20) rows in table_b

Your mapping table will have 3000 rows then.

Also dbs tend to grow. Its a good habit to learn mapping tables now. You will always use them in the future.
 
Sorry, can't divulge the ACTUAL usage of the tables, but a correct analogy follows.

2 tables: People and Cities

People contains a few thousand people, and one of the fields is "cities_lived_in"
Cities contains only about 20 cities.

Clearly some people would have lived in more than 1 city.

It's guaranteed that every Person has lived in AT LEAST one of the Cities.
 
*** UPDATED TO BE MORE RELEVANT TO YOUR QUESTION ***

Table A: rows 3
=====
Piet
Koos
Jan
...


Table B: rows 4
=====
Cape Town
Jhb
Durban
Pofadder
...


Table C: 6 rows

Table_A_ref
Table_B_ref
When_Lived
=====
Piet - Cape Town - 2009
Koos - Cape Town -2009
Koos - Jhb -2001
Jan - Pofadder - 2009
Jan - Cape Town - 2008
Jan - Jhb - 2007


* SELECT WILL GIVE YOU *

Piet, ..., Cape Town, ..., 2009
Koos, ..., Cape Town, ..., 2009
Koos, ..., Jhb, ..., 2001
Jan, ..., Pofadder, ..., 2009
Jan, ..., Cape Town, ..., 2008
Jan, ..., Jhb, ..., 2007
 
Last edited:
Sorry, can't divulge the ACTUAL usage of the tables, but a correct analogy follows.

2 tables: People and Cities

People contains a few thousand people, and one of the fields is "cities_lived_in"
Cities contains only about 20 cities.

Clearly some people would have lived in more than 1 city.

It's guaranteed that every Person has lived in AT LEAST one of the Cities.

OK

This is a textbook mapping table example. Any SQL beginners book will resolve it with my recommendation.

See the above post for a psudo layout
 
Sorry, can't divulge the ACTUAL usage of the tables, but a correct analogy follows.

2 tables: People and Cities

People contains a few thousand people, and one of the fields is "cities_lived_in"
Cities contains only about 20 cities.

Clearly some people would have lived in more than 1 city.

It's guaranteed that every Person has lived in AT LEAST one of the Cities.

That analogy/scenario implies using a mapping table, unless you only use the people table to track the cities they've lived in and don't mind duplicating the people for each city
 
Right, got it ta.

As the others have said, linking table is best practice.

PARENT CHILD SETUP:

PEOPLE
- PERSONID (primary key)
- NAME
- bla bla bla


CITIES
- CITY_ID
- NAME
- BLABLABLA

PEOPLE_CITIES
- PERSONID (primary key, and foreign key to PEOPLE )
- CITY_ID (primary key, and foreign key to CITIES )
-


Above, as others said, perfect for your application.
Sorry, can't divulge the ACTUAL usage of the tables, but a correct analogy follows.

2 tables: People and Cities

People contains a few thousand people, and one of the fields is "cities_lived_in"
Cities contains only about 20 cities.

Clearly some people would have lived in more than 1 city.

It's guaranteed that every Person has lived in AT LEAST one of the Cities.
 
*** UPDATED TO BE MORE RELEVANT TO YOUR QUESTION ***

Table A: rows 3
=====
Piet
Koos
Jan
...


Table B: rows 4
=====
Cape Town
Jhb
Durban
Pofadder
...


Table C: 6 rows

Table_A_ref
Table_B_ref
When_Lived
=====
Piet - Cape Town - 2009
Koos - Cape Town -2009
Koos - Jhb -2001
Jan - Pofadder - 2009
Jan - Cape Town - 2008
Jan - Jhb - 2007


* SELECT WILL GIVE YOU *

Piet, ..., Cape Town, ..., 2009
Koos, ..., Cape Town, ..., 2009
Koos, ..., Jhb, ..., 2001
Jan, ..., Pofadder, ..., 2009
Jan, ..., Cape Town, ..., 2008
Jan, ..., Jhb, ..., 2007

++
 
Ok. If you guys say so, I trust you. Just didn't seem like the elegant solution I was looking for.

So you reckon I should have a table that contains all unique people, another table that contains all unique cities, and then a third table that contains the mappings between them? Bearing in mind, that we have quite a few fields that we need to reference in this way?

Basically have an extra mapping table for each of these relationships?

This kind of thing isn't frowned upon?
 
Oh, and how would I go about designing this to be user-expandable?

Currently, the user is able to add a new "entity_X" table and populate it with unique records. Then (possibly many) of these entity_X records need to be linked to various People records. This is easily accomplished using the abovementioned object/serialise/unserialise method. But it seems that using the mapping table would require 2 new tables to be built as a new entity table is added, and I would probably need to keep a third extra table to describe to the system how they all relate?
 
Without knowing what exactly you're trying to do, we can only help as so far as the information you provide to us.

Don't be afraid to use more tables. In the end it is better to query multiple smaller tables, than one big table. Also, for indexes your rows should be as unique as possible, and having one big table will make this less likely.
 
Ok, so, while it doesn't seem that elegant, it is best to

1) Have a separate table for each piece of unique information (i.e. people table with fields for personal information)
2) Have a separate table for each type of data we're trying to extract

?
 
Try smothing like that for data driven field additions

PERSON(ID,Name)
1,Gregg
2,Bob

CITY(id,Name)
1,Cape Town
2,Durban

PERSON_CITY(ID,Person_ID,City_ID)
1,1,1
2,1,2
3,2,1



DATA_TYPE(ID,Type )
1,Street
2,Surburb


PERSON_CITY_DATA(PersonCity_ID,DataType_ID,DataPointValue)
1,1,MainRoad
1,2,Seapoint
2,1,Gimpy Street



PERSON_CITY Table could be left out depeding on how the data would be used.
 
Last edited:
If you have multiple pieces of data that would be referenced like that, you can still use only one physical linking table ;)

Simply add a 'link type' to the table, obviously part of the PK, (we do this in our systems btw)

e.g.

ENTITY_LINK
- SOURCE ID
- TARGET ID
- LINK_TYPE


Unfortunately, you will lose some strict referential integrity if the source/target objects are from various different tables (e.g. Cities and Suburbs all referenced by TARGET_ID)

EDIT: Example
SOURCEID,TARGETID,LINKTYPE
Jan, Pretoria, 1
Jan, Jhb, 1
Jan, Koffie, 2
Jan, Tee, 2
etc...
 
Top
Sign up to the MyBroadband newsletter
X