Thursday, September 4, 2014

SOLVED - Virtual Domain Controller Time is Wrong - Hyper-V

I recently ran into quite a few problems with Hyper-V, and found the following steps very useful for troubleshooting.  Question #6 specifically fixed our issue with virtual domain controllers being about 300 seconds ahead of actual time.

There is a lot of confusion about how time synchronization works in Hyper-V – so I wanted to take the time to sit down and write up all the details. 
There are actually multiple problems that exist around keeping time inside of virtual machines – and Hyper-V tackles these problems in different ways.
Problem #1 – Running virtual machines lose track of time.
While all computers contain a hardware clock (called the RTC – or real-time clock) most operating systems do not rely on this clock.  Instead they read the time from this clock once (when they boot) and then they use their own internal routines to calculate how much time has passed.
The problem is that these internal routines make assumptions about how the underlying hardware behaves (how frequently interrupts are delivered, etc…) and these assumptions do not account for the fact that things are different inside a virtual machine.  The fact that multiple virtual machines need to be scheduled to run on the same physical hardware invariably results in minor differences in these underlying systems.  The net result of this is that time appears to drift inside of virtual machines.
UPDATE 11/22: One thing that you should be aware of here: the rate at which the time in a virtual machine drifts is affected by the total system load of the Hyper-V server.  More virtual machines doing more stuff means time drifts faster.
In order to deal with time drift in a virtual machine – you need to have some process that regularly gets the real time from a trusted source and updates the time in a virtual machine.
Hyper-V provides the time synchronization integration services to do this for you.  The way it does this is by getting time readings from the management operating system and sending them over to the guest operating system.  Once inside the guest operating system, these time readings are delivered to the Windows time keeping infrastructure in the form of a Windows time provider (you can read more about this here: http://msdn.microsoft.com/en-us/library/bb608215.aspx).  These time samples are correctly adjusted for any time zone difference between the management operating system and the guest operating system.
Problem #2 – Saved virtual machines / snapshots have the wrong time when they are restored.
When we restore a virtual machine from a saved state or from a snapshot, we put back together the memory and run state of the guest operating system to exactly match what it was when the saved state / snapshot was taken.  This includes the time calculated by the guest operating system.  So if the snapshot was taken one month ago, the time and date will report that it is still one month ago.
Interestingly enough, at this point in time we will be reporting the correct (with some caveats) time in the system's RTC.  But unfortunately the guest operating system has no idea that anything significant has happened, so it does not know to go and check the RTC and instead continues with its own internally calculated time.
To deal with this the Hyper-V time synchronization integration service detects whenever it has come back from a saved state or snapshot, and corrects the time.  It does this by issuing a time change request through the normal user mode interfaces provided by Windows.  The effect of this is that it looks just like the user sat down and changed the time manually.  This method also correctly adjusts for time zone differences between the management operating system and the guest operating system.
Problem #3 – There is no correct “RTC value” when a virtual machine is started
As I have mentioned, physical computers have an RTC that operating systems look at when they first boot to get the time.  This real-time clock is backed by a small battery (you have probably seen the battery yourself if you have ever pulled apart a computer).  Unfortunately virtual machines do not have any “batteries”.  When a virtual machine is turned off there is no component that keeps track of time for it.  Instead, whenever you start a virtual machine we take the time from the management operating system and put this into the real-time clock of the virtual machine.
This is done without the use of the Hyper-V time synchronization integration services (it happens long before the integration services have loaded).
The downside of this approach is that this does not take into account any potential time zone differences between the management operating system and the guest operating system.  The reason for this is that “time zones” are a construct of the software that runs in a virtual machine – and is not communicated to the virtual hardware in any way.  So – in short – when we start a virtual machine there is no way for us to know what time zone the guest operating system believes it is in.
One partial mitigation we have for this issue is that when the Hyper-V time synchronization component loads for the first time – it does an initial user mode set of the time to ensure that the time gets corrected as quickly as possible (using the same technique as discussed in problem #2).
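To make the time zone issue from Problem #3 concrete, here is a minimal Python sketch of the arithmetic.  It is not how Hyper-V is implemented.  The function name and the example offsets (a host at UTC-8, a guest configured for UTC-5) are made up for illustration, and it assumes the guest treats the RTC value as local time in its own zone.

```python
from datetime import datetime, timedelta, timezone


def guest_clock_error_at_boot(host_utc, host_offset_hours, guest_offset_hours):
    """Return (guest's belief minus real time) right after boot, as a timedelta."""
    host_tz = timezone(timedelta(hours=host_offset_hours))
    guest_tz = timezone(timedelta(hours=guest_offset_hours))

    # The host writes its local wall-clock time into the virtual RTC.
    # The RTC carries no time zone information, so the zone is dropped here.
    rtc_value = host_utc.astimezone(host_tz).replace(tzinfo=None)

    # Assumption: the guest reads the RTC as local time in its own zone.
    guest_belief = rtc_value.replace(tzinfo=guest_tz)

    return guest_belief.astimezone(timezone.utc) - host_utc


# Hypothetical example: host in UTC-8, guest configured for UTC-5.
real_time = datetime(2014, 9, 4, 20, 0, tzinfo=timezone.utc)
error = guest_clock_error_at_boot(real_time, -8, -5)
print(error.total_seconds() / 3600)  # -3.0 (guest starts three hours behind)
```

In this example the guest starts three hours behind and stays that way until something time zone aware, such as the integration service, corrects it.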
So now that you understand how this all works – let’s discuss some common issues and questions around virtual machines and time synchronization.
Question #1 – I have a virtual machine that is configured for a different time zone to the management operating system.  Should I disable the time synchronization component of Hyper-V?
No, no, no, no, no, no, no.  And I say again – no.  As I have mentioned above – all time synchronization that is done by the Hyper-V time synchronization integration service is time zone aware.  If you disable the Hyper-V time synchronization integration service you will disable all the time synchronization aspects of Hyper-V that are time zone aware – and only leave the initial RTC synchronization active – which is not time zone aware.
This means that your virtual machines will go from “booting in the wrong time zone and then being corrected as soon as the Hyper-V time synchronization integration service loads” to “booting in the wrong time zone and staying in the wrong time zone”.
Question #2 – Is there any way that I can stop Hyper-V from putting the wrong time in the RTC at boot?
In short; no.  We need to put something in there – and that is the best thing that we have to work with.
Question #3 – Can’t you use UTC time in the RTC so that the correct time is established when the virtual machine boots?
UTC (which is the computer techy way of saying GMT) time in the RTC would solve this problem nicely, with only one catch:  Windows does not support UTC time in the BIOS (Linux does).  So while this would solve the problem for our Linux running user base, the fact of the matter is that most of our users run Windows, and this would not work for them.
Question #4 – What about if I am using a different time synchronization source (e.g. domain time or a remote time server)?
Hyper-V time synchronization was designed to “get along well” with other time synchronization sources.  You should not need to disable Hyper-V time synchronization in order to use a different time synchronization source – as long as it goes through the Windows time synchronization infrastructure.
In fact, if you are running a Domain Controller inside a virtual machine I would recommend that you leave Hyper-V time synchronization enabled but that you also set up an external time source.  You can do this by going to this KB article: http://support.microsoft.com/kb/816042 and following the steps outlined in the “Configuring the Windows Time service to use an external time source” section.
UPDATE 11/22: I should have mentioned: since virtual machines tend to lose time much faster than physical computers, you need to configure any external time source to be checked frequently.  Once every 15 minutes is a good place to start.
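As a rough sketch of what “once every 15 minutes” looks like in practice: the Windows Time NTP client takes its polling interval in seconds from the SpecialPollInterval registry value, so 15 minutes is 900.  The Python helper below is hypothetical and only builds the command line.  It does not run anything.  It also assumes your NTP peers are configured with the 0x1 flag, since the KB article says the special poll interval only takes effect for those peers.

```python
# Registry key used by the Windows Time NTP client.
NTP_CLIENT_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient"


def poll_interval_command(minutes):
    """Build (but do not run) a reg command that sets the poll interval."""
    seconds = int(minutes * 60)  # SpecialPollInterval is expressed in seconds
    return (
        f"reg add {NTP_CLIENT_KEY} "
        f"/v SpecialPollInterval /t REG_DWORD /d {seconds} /f"
    )


print(poll_interval_command(15))
```

You would run the resulting command from an administrative command prompt inside the virtual machine and then restart the Windows Time service so the change is picked up.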
Question #5 – How can I check what time source is being used by Windows inside of a virtual machine?
This is easy to do.  Just open an administrative command prompt and run “w32tm /query /source”.  If you are synchronizing with a remote computer – its name should be listed.  If you are using the Hyper-V time synchronization integration service you should see the following output:
[image: w32tm output showing the Hyper-V time synchronization provider as the time source]
If you see this output:
[image: w32tm output showing that no time synchronization source is in use]
It means that there is no time synchronization going on for this virtual machine.  This is a very bad thing – as time will drift inside of the virtual machine.
Question #6 – Wait a minute!  My virtual machine should be synchronizing to the domain (or an external server) – but when I run that command it tells me that the Hyper-V time synchronization provider is being used!  How do I fix this!
I do not know why this happens – but sometimes it happens.  The first thing that you should do is to check that your domain does have a correctly configured authoritative time source.  There have been a small number of times when I have seen this problem being caused by the lack of an authoritative time source.
Alternatively – you can “partially disable” Hyper-V time synchronization.  The reason why I say “partially disable” is that you do not want to turn off the aspects of Hyper-V time synchronization that fix the time after a virtual machine has booted for the first time, or after the virtual machine comes back from a saved state.  No other time synchronization source can address these scenarios elegantly.
Luckily, there is a way to leave this functionality intact but still ensure that the day to day time synchronization is conducted by an external time source.  The key trick here is that it is possible to disable the Hyper-V time synchronization provider in the Windows time synchronization infrastructure while still leaving the service running and enabled under Hyper-V.
To do this you will need to log into the virtual machine, open an administrative command prompt and run the following commands:
reg add HKLM\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider /v Enabled /t reg_dword /d 0
This command stops W32Time from using the Hyper-V time synchronization integration service for moment-to-moment synchronization.  Remember from earlier in this post that we do not go through the Windows time synchronization infrastructure to correct the time in the event of virtual machine boot / restore from saved state or snapshot.  So those operations are unaffected.
w32tm /config /syncfromflags:DOMHIER /update
This command tells Windows to go and look for the best time source in the domain hierarchy.  If you want to use an external time server instead you can use the commands found here: http://technet.microsoft.com/en-us/library/cc784553(WS.10).aspx
net stop w32time & net start w32time
w32tm /resync /force
These two commands just “kick the Windows time service” to make sure the settings changes take effect immediately.
w32tm /query /source
This final command should confirm that everything is working as expected.
When you run these commands you should see something like this:
[image: command prompt output from running the commands above]
Question #7 – I have a virtual machine that has gotten ahead of time, and it never gets corrected back to the correct time.  What is going on here?
As a general rule of thumb, when time drifts inside a virtual machine it runs slower than in the real world, and the time falls behind.  We will always detect and correct this.
However, in the past, we have had reports of software problems caused when the Hyper-V time synchronization integration service decides to adjust the time back – because it believes the virtual machine is ahead of time.  To deal with this (rare) issue – we put logic in our integration service that will not change the time if the virtual machine is more than 5 seconds ahead of the physical computer.
UPDATE 11/22: I was asked how having the virtual machine in a different time zone to the Hyper-V server would affect this.  The short answer is that it does not.  The 5 second check is done after we have done the necessary time zone translation.
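The rule described in Question #7 can be sketched in a few lines of Python.  This is only an illustration of the decision as described above, not the actual integration service code.  The function and constant names are made up, and both timestamps are assumed to have already been translated into the same time zone.

```python
from datetime import datetime, timedelta

# Threshold described above: a guest more than 5 seconds ahead is left alone.
MAX_AHEAD = timedelta(seconds=5)


def should_correct_guest_time(guest_time, host_time):
    """Return True if the guest clock would be corrected under the 5 second rule."""
    ahead_by = guest_time - host_time  # negative when the guest is behind
    return ahead_by <= MAX_AHEAD


host = datetime(2014, 9, 4, 12, 0, 0)

# Guest 10 minutes behind: corrected.
print(should_correct_guest_time(host - timedelta(minutes=10), host))  # True

# Guest 300 seconds ahead: never corrected by the integration service.
print(should_correct_guest_time(host + timedelta(seconds=300), host))  # False
```

This is why a virtual domain controller that is 300 seconds ahead stays ahead until some other time source pulls it back.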
Question #8 – When should I disable the Hyper-V time synchronization service (either in the virtual machine settings, or inside the guest operating system)?
Never.
There are definitely times when you will want to augment the functionality of the Hyper-V time integration services with a remote time source (be it a domain source or an external time server) but the only way to get the best experience around virtual machine boot / restore operations is to leave the Hyper-V time integration services enabled.

Wednesday, March 26, 2014

DNS, Replication, and Intersite Trust Issues

Are you seeing a lot of strange replication and trust issues between domains in your company? We recently started migrating users from a company we purchased into our corp domain.  Everything appeared to be going fine, and then one day it all went to hell.

The trust between our domains went down, group policies went all out of sync, and we got lots of "can't contact a logon server" errors.

I can't say for sure any one of these caused the others, but it was chaos for a few days.

Here are the main things we fixed, and what you can look at if any of this sounds familiar to you.

Domain Trust: This was going up and down for us.  It would hold steady for a week, then go down and come back up.  The trust itself doesn't throw any big obvious errors unless you are looking for them, so double-check your trusts and validate them to make sure everything is up.

DNS:  Usually the root of most issues.  In our case we had switched from conditional forwarders to secondary zones at some point.  Make sure you remember to allow zone transfers in the source domain, or this won't work.  Also, specifically if you are using Server 2008 and pushing to Server 2012, make sure you check your _msdcs zone.  We had to call Microsoft about this one, and it solved a ton of our issues.  The _msdcs zone is not automatically created, and it has its own zone transfer check box.  So you have to create it and make sure you allow transfers.

DFS Replication: At one point, one of our servers was shut down and ended up with a "dirty" replication database.  This halted all replication on the domain, as Server 2012 does not have automatic recovery enabled.  Check your event history under DFS Replication.  You should see a warning along with the full command that needs to be run in order to force replication to start again.

Friday, June 7, 2013

Syncing a Distro Group with Office 365

When you go to build a distribution group that is going to sync back to your Active Directory, there are a few things to watch out for that will stop your group from syncing correctly.  If you are just building them in the cloud, then you don't have to worry about this.

You build out your distro just like you would a user with two key differences:


  1. You will not add a proxy address for the email as you would with a user.  Manually adding this will confuse the sync and cause it to not work.  
  2. You will need to add a display name attribute.  Office 365 requires a display name be present in order to sync, and it will not be happy until you manually add one. 


If you are unsure about where to add these attributes, they can be found on the Attribute Editor tab in Active Directory Users and Computers, as long as you have the Advanced Features view option enabled.

Buffalo Terastation NAS - Hard Drive Replacement


Recently my TeraStation NAS had a hard drive failure, and I was stuck for days trying to fix it.  If you have been using the Buffalo NAS for a while, you are probably aware that it's pretty much awful.  The documentation is usually hard to find, and sparse.  What little information is there is frequently outdated, or just plain wrong.

This leaves users with Google and a lot of forum crawling to fix issues.

After doing quite a bit of this myself, I have combined the solutions that worked for me along with what seemed to help other people and put them all here.  So, for your convenience:

The LEDs 

I never really found good information about these, but from what I gathered they reflect what you think they should.  Green is good, red is bad.  When one of my drives failed, its LED was solid red.  When I replaced it and the NAS was trying to do something, it was blinking red.

One thing I did find online that you should watch for is that if a drive fails, depending on your array setup, it may push all your other drives over the capacity warning level.  This will cause more drives to show a solid red light, making you think they are bad as well, when in reality they are just handling the data from the failed drive and are almost full.

The Replacement Drive

The instructions I found online said to simply remove the bad drive, swap in the new one, and hold down function for ~3 seconds until you hear a beep.  If you are lucky this will work for you.  I tried this first, and the LED started to blink, but then nothing happened.  The tiny display on the NAS didn't give me any information other than that a drive had failed, and the web GUI didn't mention that anything was happening.  It can take upwards of 24 hours to rebuild the array, so I let it blink away until the next day, but nothing happened.

This leads me back to the hard drive itself.  There are a few things that caused issues for people, the first of which is the replacement drive being partitioned or otherwise having data on it already.  This is bad and can stop the NAS from using the drive.

To fix this, hook the drive up to a computer, remove any existing partitions and data, and make sure it's a clean hard drive.  While you are doing this, make sure the drive is set to be dynamic, not basic.  Mine was set to basic, which turned out to be the core of my issue.

Another thing to look for is the jumper on the drive depending on which brand you have.  A number of people had to use a jumper on their drive set to Cable Select.  Their old drive didn't have it set that way and worked, but this fixed the issue for them.

After all this I put my drive back into the system and held down function for ~3 seconds, and the lights started to blink again.  This time, however, when I logged into the web GUI, I got an alert saying that it was changing the array, with a progress counter.

The System

Overall the web GUI seems to be the place to go to make sure something is happening.  While this is not clearly documented anywhere, the flashing lights don't ever seem to report anything beyond the fact that a disk is broken.  When you have correctly started a rebuild of your disk, the web GUI will update.

One last thing to watch for with regard to the GUI itself: you may need to do a cache refresh to see the latest settings / notifications.  Your shares will still be available while the system rebuilds itself, but it will slow down.

It all sounds pretty simple, but the risk of breaking the temperamental system was unnerving to me, especially when combined with very little useful information about how to resolve the issue and not knowing what feedback to expect from the system when things were happening.

Wednesday, August 1, 2012

Invalid Boot.ini - SOLVED



My favorite thing when driving into work is "the following servers are offline" alert messages!

Two of the drives in my RAID 5 array failed at the same time, which left the server unable to rebuild the C: drive.  Luckily, we keep backups of all the important information, but the server still had to have its OS reinstalled.  Everything went super, or so I thought.  I reinstalled Server 2003 and rebooted; then I saw the evil error...

Invalid Boot.ini
booting from C:\windows
NTDETECT Failed

After tinkering around for quite some time, I resorted to googling, which was chaotic and filled with tons of strange advice ranging from expanding the ntdetect.com file from the install disc to rewriting a custom boot.ini file based on the partition.  The 3rd line of the error seems to change from person to person, and doesn't seem to be the root of the issue.  Something is most likely wrong with your boot.ini, which is causing the rest of the problems.

What worked for me

Simply reinstalling the operating system didn't work, and left some trace of the previous OS causing the issue I was having. I had to go back through the install process, and instead of just reinstalling, I deleted the entire partition and then remade it while installing.  This worked perfectly for me, and was quite simple.

What else you can try

Totally wiping the partitions may not be an option for everyone.  If the above doesn't work for you, I would try some of these solutions.  If you didn't know this already, boot.ini is a hidden system file that should be in the root of your C: drive.  You need to show hidden files/folders and display hidden system files in order to see it.

Grab your install disc

I would start by grabbing your install disc.  You will need it to access the recovery console, which is the heart of pretty much all these fixes.

Next I would try running a chkdsk.  It's a long shot, but if it works it's easy!  Just load the recovery console and do:  chkdsk /r

If that doesn't work and you have another server that has the same setup, you can try copying that system's boot.ini file and using it on the server you are having issues with.

If you are still unable to get the system up and running I would google some more.  There are a few other fixes you can try, but they can get a bit complex and other sites have done a good job of describing them.