SQL Server Replication Problem

retromodcoza

Senior Member
Joined
Mar 4, 2006
Messages
528
Reaction score
452
Hi Guys... I need a favour because I'm stuck.

Googling and hours of diagnosis have brought me closer , but I have a specific problem I can't get past. It might be because I don't fully understand how replication works.

I have a laptop. It has SQL Server 2012 Standard 64bit. (I'll call this local)
I have a server. It has the same SQL server version.

The database on the laptop is exactly the same as the database on the server. Every day , new data enters the laptop database via a script. This can't happen on the server for a number of irrelevant reasons.

So , once the data is on the laptop and some calculations are done on the new data , it needs to be added to the server database. However , the tables which hold this data are huge (+-2GB) . As a result , I need to sync just the changes to the server. (I'm on a DSL line at 1Mbps upload).

I tried the data compare feature inside Visual Studio 2012 data tools , and this worked for the smaller tables. However , Visual Studio crashes when trying to compare the larger tables , so I discounted this option.

I then moved on to replication. What a nightmare. After solving a million errors...I managed to get the server as the publisher , and the laptop as the subscriber in a pull merge replication. Running the replication fails. It throws an error :

The schema script 'schema.sch' could not be propagated to the subscriber.

Further investigation reveals :

Message: The process could not read file '\20140208174443\DailyShareData_2.sch' due to OS error 53.

OS error 53 means a network error where the laptop cannot access the servers file.

This is where it all falls apart , as I do not understand why this would be necessary and seems to defeat the point of replication.

Please could anyone with experience on this shed some light on it , or if not , could I call you on the phone and ask a bunch of questions? Perhaps PM me your tel no? I know exactly what needs to be asked and won't waste your time.

Again , I have Googled this extensively , so I'm begging you not to point me to vague links. If I have missed information that needs to be posted please put it in and I'll provide it.

Cheers
 
I think replication is a bit of an overkill, and without meaning any insult, think you need to rethink your architecture. It would benefit having the processing done on the server itself rather than laptop to server, especially considering you're on a 1mbps line. Both in terms of sanity and speed, not to mention if your laptop or yourself gets hit by a bus, the company can continue working without having to resort to selling their souls to get their systems back up and running.

However, I think SSIS will be welcome in this situation. It has several nice features and should do what you want very easily.
 
I've worked with replication back in the days of SQL 7 and not so much since, so if I refer to options that may have changed or are no longer available, please forgive me. Also, the passage of time may have clouded my memory a bit, but I'll try my best. I can't help you with the specific error, but I can explain how replication works in broad terms.

If I understand your post correctly, new data is only added to your laptop and not to the server. This means your laptop should be the publisher and the server the subscriber. Publishers publish data for replication and subscribers subscribe to new data. Think of a newspaper. The newspaper gets published daily and the subscribers get a copy thrown in a muddy puddle in their front yard. The flow of information is from the publisher to the subscriber.

Because data flows only one way in your example, a merge replication would be inefficient. You say "pull merge" and I'm not familiar with this. I assume it's an option that has been added since the olden days. In SQL 7 you got push, pull or merge.
Merge means that data changes flows both ways. If data will be changed on both publisher and subscriber side in a specific table, this is what you need. It is also the the type of replication with the most niggles.
Pull means that the subscriber does the work. It connects to the publisher and pulls any changes that happened on the publisher to itself at the specified interval. This is set up on the subscriber.
Push means that the publisher does the work. Changes are pushed to the subscriber at the specified interval. This is set up on the publisher.
If you only need to replicate once a day, don't set the interval to "continuous" or whatever it is called now. Set your interval to correspond to your needs.

Another thing you should know about is a snapshot. If you generate a snapshot, it copies your entire table from the publisher to the subscriber, overwriting the subscriber table with the data of the publisher, whether it is in step or not. Use with caution, they are heavy on bandwidth.

Lastly, your databases need to be exactly the same. I'm not referring to indexes, triggers or even table layouts (I think you can replicate only specific fields of a table if you so desire, but I've never done this, best to keep the tables identical and publish the whole thing), I mean the encoding you use and those type of settings. You can actually take a back-up of your laptop database and restore it on your server as a new database and if you used the wrong encoding during restore, your replication will fail. I would actually hazard a guess and say that may be your problem. Your databases may look the same from your point of view, but the underlying settings may be different.

So the TL;DR version: For your problem, as I understand it (and in SQL 7 terms), your laptop must be the publisher and your server the subscriber. Set up a pull replication on the server. If you prefer, you can rather do a push from the laptop. Both will get the job done, but it is probably best to do some reading on the scenarios best suited to push and pull. I always used the machine with most free processing time to do the work.

Hope this helps in some way. And I really hope things have not changed so much since I last touched replication that I'm actually confusing you more now than helping :)
 
Hi Guys

Many thanks for taking the time to respond guys. I probably should have included what I tried and failed in the thread.

Replication may very well be overkill. However , I'm battling to come up with another solution. Rethinking the architecture is an idea , but limited by available funds. This is my own project , and the server is a VPS. If I had the cash I would host in a local colo or haul fibre onto our premises and host the servers myself. Running the calculations on the server causes two problems - one it probably violates the VPS policy to inflate CPU usage for too long (Which I am less concerned about) - and the other is that the calculations take a very long time unless SSD's are used due to temporary table writes in some of the calculations. While the server has SSD caching...I have no idea how long the calcs will take , and even if they dont take too long , I can't check the output before making it live.

SSIS may be a solution , but my knowledge of it is limited. I will go retake a look at it and see if I can come up with something. Half the problem is getting the actual data up on to the server - as uploading backup files timeout erratically. I have tried FTP where the file uploading continues on network failure but it only restarts from scratch. Not good on a 15hr upload. To that end , I ran out of solutions

@Mister44. I would make the laptop the publisher , but it is behind a NAT and cant be seen by the server. Thus , push for me is kinda out - but youre right - that would be the most correct way of doing it. The server as a publisher is OK because the entire table is downloaded once to the subscriber and using pull merge changes that are made on the subscriber are sent to the publisher. Its a two way sync... according to Microsoft documentation and my own internal testing.

I will check encoding....

And...I'm totally OK with the publisher rewriting the entire table in the beginning for the sync. The bandwidth is plentiful in this direction.

Ill do some digging here. Many thanks guys.
 
I reckon SSIS will be your saviour

However, who cares if the VPS needs to do some work, that's what it's there for :)

The only reason it'll go to disk is because there's not enough RAM. But again, it's a VPS, and unless you're hosting with someone like Hostgator who will actively disable your account because of high load for a few minutes, I sincerely doubt that it will be a problem.

It sounds like you haven't even tested the performance of the VPS to handle the calculations, so I suggest you do that first before spending another few hours on trying to make it work in the way you do. It will be easy enough to create a staging area before you go live with it, but then you have to ask yourself, how much failure is there expected with the process that will affect live data and why you can't just run the process on the VPS and make it live all in one go after fixing potential hiccups/errors
 
I think I agree with the posters recommending a change in your process (working with big data myself, I know how much of a pain it can be to get data from one place to another).

What sort of calculations are you doing and what is the output thereof? Are you updating tables or are you generating new rows to be inserted into a table?
 
Top
Sign up to the MyBroadband newsletter
X