Data warehousing - Best Practise

louisp

Senior Member
Joined
May 23, 2004
Messages
662
Reaction score
1
Location
.
Hi,

I'm evaluating a data warehouse team.

This team designed and implemented a solution that extracts line of business data monthly and then stores it in a fact-dimensional model.

Thus far the structure seems very sound and they stuck to the Kimball approach.

It is now almost 9 months after implementation and I found a few problems.
Due to system issues, some of the months had incomplete loads. There is thus missing data in the warehouse.

If you were to do a transaction count per month, some months would show zero when it shouldn't due to the incomplete loads.

What is best practise?

Is this allowed?
Should there not be complete data with every load?

Please point me to any reading you might have on this subject.
 
thats why i prefer the Linux approach. Pref using mySQL with gLUSTER storage blocks. Its fast, its redundant, its low level, and no application code can fck it up. my 2c. Sorry cant help.
 
Each month's data must be balanced against the system generating the input, the corresponding financial systems, etc.This balancing should form part of the system design, and is crucial for system accuracy.

(From my experience CEO's take a dim view of of IT who, when asked for last month's turnover, respond with the all-too-common: 'According to our accounting system R2 million, according to our sales system R3 million, and according to our data warehouse R50 ... :) )

You could perform some form of averaging, but it would probably skew the results even more due to seasonal trends, public holidays, bad weather ... :(

Bottom line: if you are looking for accuracy - and a data warehouse that you can trust - ditch the lot & start again.
 
Each month's data must be balanced against the system generating the input, the corresponding financial systems, etc.This balancing should form part of the system design, and is crucial for system accuracy.

(From my experience CEO's take a dim view of of IT who, when asked for last month's turnover, respond with the all-too-common: 'According to our accounting system R2 million, according to our sales system R3 million, and according to our data warehouse R50 ... :) )

You could perform some form of averaging, but it would probably skew the results even more due to seasonal trends, public holidays, bad weather ... :(

Bottom line: if you are looking for accuracy - and a data warehouse that you can trust - ditch the lot & start again.

OK thank you, from your reply I gather that is not so uncommon but I don't have to accept it.
 
It is uncommon - from people who know what they are doing. They would never offer a data warehouse based on incomplete data.

OTOH, if the team has simply read part of a book on star schemas then the results are ... well, let's say, just what you have ... :)

There are occasions when accurate data is hard to come by and one can only approximate. But these are few and far between, and rarely - if ever - appear in a business context (deliberate fraud excepted)

If you worked for me - and you accepted the results that you are describing - you would be leading the queue for the door ... !!
 
Last edited:
The problem with having a data warehouse in SA is our marvelous phone lines. And in some cases the inaccuracy of the data is not related to the actual data warehouse but to where it's pulling the data from.

For instance, I made x ammount of sales this month but I use the system that was built for me wrong, IE place this data in the wrong place, then your data won't be accurate.

Working in the fast food industry I see this happening a lot, where a shop stores their stock take in GRV for instance. Sometimes it's out of the developers hands. It's impossible to account for every single strategy. You never know what someone will try with your software. You create something that should work for only food, and then they use that software in a hardware store. Just my opinion. :)
 
Top
Sign up to the MyBroadband newsletter
X