These are the Updates You Are Looking For

This article was originally posted September 2009 on my old blog blogs.technet.com/ad which I handed over to Alex Simons years ago. I wrote it to help Active Directory admins gain a better understanding of how AD replication works so they could catch problems or fine-tune AD update convergence. Of course, the reference to Microsoft Operations Manager (MOM) is a dead giveaway to the fine aging of the article.

I think the information in this article is still relevant for AD admins now, but it is also incredibly useful information for security architects, consultants, and analysts.

As I mentioned in my last article, Active Directory continues to be a target for bad actors. The security industry therefore concentrates endpoint security on the domain members (workstations, servers) and domain controllers in the environment. And rightly so! AD is where an attacker will be trying to glean info about an environment to begin lateral movement and possible elevation to the cloud.

Why not simply review the directory itself rather than the audit or other event logs? AD replication metadata can only be overwritten by later updates, so even if log data is missing, that forensic thread is still there to unravel.

The directory itself may hold evidence of what has been updated. Each update of an attribute requires detailed tracking to ensure that the entire directory (all domain controllers) gets the update. That tracking is metadata describing which attribute(s) were changed, on which DC, when the change occurred, and the attribute’s version, which is incremented each time an update is applied. This data alone can provide key insights, but it gains solidity when combined with secure channel or audit information. Add in data from other sources and a clearer, more holistic picture can emerge.

Which data? Attributes like unicodePwd (passwords), servicePrincipalName (Kerberos SPNs), and member (group membership). Bad actors could change a password, add SPNs, add users to groups, and do other things that would leave a trail of forensic information in the AD replication metadata. And these techniques can track more than AD replication convergence: a savvy security practitioner can track individual updates to specific attributes on specific AD objects.
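If you want to look at that metadata directly, the ActiveDirectory PowerShell module (Windows Server 2012 and later) exposes it through Get-ADReplicationAttributeMetadata. Here is a minimal sketch; the object DN and DC name are hypothetical placeholders:

Import-Module ActiveDirectory

# Hypothetical object and DC, used for illustration only.
$dn = 'CN=SomeUser,OU=Accounting,DC=treyresearch,DC=com'

# -ShowAllLinkedValues includes linked-value metadata such as group member entries.
Get-ADReplicationAttributeMetadata -Object $dn -Server 'server17' -ShowAllLinkedValues |
    Where-Object { $_.AttributeName -in 'unicodePwd','servicePrincipalName','member' } |
    Select-Object AttributeName, Version, LastOriginatingChangeTime,
                  LastOriginatingChangeDirectoryServerIdentity

Version and LastOriginatingChangeTime are exactly the version and timestamp pieces of the tracking metadata described above.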

This article can help you follow that trail. Read below to find the updates you are looking for.

In this blog post we’re going to go over a few techniques that are a bit old school but will come in handy for understanding how things work, even if you ultimately use a great monitoring suite like MOM. Now, there are great articles here and here that describe good general ways to start checking your AD replication, and the information in those articles still applies. In this post we’re going to go a bit past and to the side of them, though.

Before we go further, we need to go over USN high-water marks and up-to-dateness vectors and how they are used. In my experience these are the two most confusing data points in tracking Active Directory replication updates.

Of course, USNs are Update Sequence Numbers: an ever-increasing counter assigned to updates, unique to each domain controller. As updates are received from peer replicas, or originate at the domain controller itself, the next USN in the series is assigned to signify that update. In other words, USNs are local numbers on each DC. However, those local USNs are tracked by peer domain controllers, which look at the most recent (highest) USN they have seen from a partner in order to decide whether some of those updates need to be replicated in. Updates that are not needed can be discarded…which is what propagation dampening is.
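You can see a DC’s current USN counter for yourself: each DC publishes it as highestCommittedUSN on its rootDSE. A quick sketch, assuming the ActiveDirectory module (the server name comes from the scenario later in this post):

# Read the current top USN straight off a DC's rootDSE.
(Get-ADRootDSE -Server 'server17').highestCommittedUSN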

A recent supportability article had excellent explanations of up-to-dateness vector and high-water mark which I’m pasting below:

For each directory partition that a destination domain controller stores, USNs are used to track the latest originating update that a domain controller has received from each source replication partner, as well as the status of every other domain controller that stores a replica of the directory partition. When a domain controller is restored after a failure, it queries its replication partners for changes with USNs that are greater than the USN of the last change that the domain controller received from each partner before the time of the backup.

The following two replication values contain USNs. Source and destination domain controllers use them to filter updates that the destination domain controller requires.

  1. Up-to-dateness vector A value that the destination domain controller maintains for tracking the originating updates that are received from all source domain controllers. When a destination domain controller requests changes for a directory partition, it provides its up-to-dateness vector to the source domain controller. The source domain controller then uses this value to reduce the set of attributes that it sends to the destination domain controller. The source domain controller sends its up-to-dateness vector to the destination at the completion of a successful replication cycle.
  2. High water mark Also known as the direct up-to-dateness vector. A value that the destination domain controller maintains to keep track of the most recent changes that it has received from a specific source domain controller for an object in a specific partition. The high-water mark prevents the source domain controller from sending out changes that are already recorded by the destination domain controller.

Let’s dig in with a scenario where you are the admin and you suspect a replication backlog at some AD sites. In this situation we have anecdotal complaints from our help desk: users created in New York take an hour, or occasionally even days, before they show up on DCs in the Los Angeles site. Although it’s sometimes wise to take help desk reports with a grain of salt, this isn’t something you want to ignore.

We have three sites, Los Angeles, Kansas City, and New York, and we have DCs in each site. For the question at hand, we need to figure out whether there is, in fact, a replication backlog and, if so, how big it is. Repadmin.exe, the Swiss Army knife of AD replication tools, would be the first tool to use (repadmin /showrepl * /csv, that is). However, it is entirely possible to have a backlog of updates between two replicas and not see constant or even intermittent errors from them if they are replicating, albeit slowly.
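Since the /csv output pipes nicely into PowerShell, here is a minimal sketch that sweeps every DC and surfaces any replication links reporting failures. The column names are taken from the repadmin CSV header, so verify them against your own output:

# Sweep all DCs and list replication links that are reporting failures.
repadmin /showrepl * /csv | ConvertFrom-Csv |
    Where-Object { [int]$_.'Number of Failures' -gt 0 } |
    Select-Object 'Source DSA', 'Destination DSA', 'Naming Context',
                  'Number of Failures', 'Last Failure Time' |
    Format-Table -AutoSize

A clean result here does not rule out a backlog, which is exactly why the vectors below matter.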

Now let’s see why the USN high-water mark and up-to-dateness vector are important in tracking updates by using the command repadmin /showutdvec <hostname> <distinguished name of naming context>. To understand what is happening between the three DCs (Server15 in LA, Server17 in KC, and Server12 in NY) we will need to run the showutdvec command once against each server and then examine the results.

Ran on or against Server15:

LosAngeles\server15 @ USN 16531174 @ Time 2009-09-21 13:54:45

KansasCity\server17 @ USN 35282103 @ Time 2009-09-17 12:51:15

NewYork\server12 @ USN 1581572 @ Time 2009-09-21 13:54:39

Ran on or against Server17:

LosAngeles\server15 @ USN 16531174 @ Time 2009-09-21 13:54:45

KansasCity\server17 @ USN 36483665 @ Time 2009-09-21 10:54:41

NewYork\server12 @ USN 1581572 @ Time 2009-09-21 13:54:39

Ran on or against Server12:

LosAngeles\server15 @ USN 16531174 @ Time 2009-09-21 13:54:45

KansasCity\server17 @ USN 35295102 @ Time 2009-09-18 07:03:08

NewYork\server12 @ USN 1581572 @ Time 2009-09-21 13:54:39

Let’s take KC and NY and compare them:

KC LOCALLY: server17 @ USN 36483665

NEW YORK: server17 @ USN 35282103

Now subtract what NY believes KC has from what KC has as its own high-water mark:

36483665 minus 35282103 = 1201562

So, there is a difference of 1,201,562 updates between what the Kansas City server named Server17 has and what its peers think it has.

This tells us that Server17 has received (from some other DC not listed above) or originated approximately 1.2 million updates and that the LA and New York servers have not processed those updates yet. This also tells us that the KC DC Server17 is receiving inbound updates from the other two sites just fine.

That suggests a replication backlog, since the up-to-dateness vector entries for Server17 (those USN numbers above) which the LA and NY servers have retained for tracking locally are lower than the USN high-water mark on the KC server itself.
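If you would rather not eyeball those numbers, here is a small sketch that parses showutdvec output in the format shown above and computes the delta; the server and partition names come from our scenario:

# Parse 'repadmin /showutdvec' lines of the form:
#   Site\DC @ USN <number> @ Time <timestamp>
$nc = 'DC=treyresearch,DC=com'
function Get-UtdVector ($dc) {
    repadmin /showutdvec $dc $nc | ForEach-Object {
        if ($_ -match '^(?<dsa>\S+\\\S+) @ USN\s+(?<usn>\d+)') {
            [pscustomobject]@{ Dsa = $Matches['dsa']; Usn = [long]$Matches['usn'] }
        }
    }
}

# KC's view of itself versus what NY has recorded for KC.
$kcLocal = (Get-UtdVector 'server17' | Where-Object { $_.Dsa -like '*\server17' }).Usn
$nyView  = (Get-UtdVector 'server12' | Where-Object { $_.Dsa -like '*\server17' }).Usn
"Updates NY has not yet seen from KC: {0}" -f ($kcLocal - $nyView)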

Are all of these updates ones that NY and LA actually need? Perhaps not-it simply depends on the nature of the updates.

More than likely, propagation dampening will occur as the replicas try to process the updates from KC. Propagation dampening is the routine which assesses whether a received update is needed by the local domain controller or not; if the update is not needed, it is discarded. For those unneeded updates you would see an event like the one below (following a similar event ID 1240) if you have your NTDS diagnostic logging for Replication Events turned up:

9/20/2009 10:35:30 AM Replication 1239 Server15

Internal event: The attribute of the following object was not sent to the following directory service because its up-to-dateness vector indicates that the change is redundant.

Attribute:9030e (samaccountname)

Object:<distinguishedname of object>

Object GUID:d8frg570-73f1-4781-9b82-f4345255b68u

directory service GUID:9fbfdgdf66-3e75-4542-b3e7-2akjkj776b
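If you need to turn that logging up, the Replication Events level lives under the NTDS Diagnostics registry key. A quick sketch to run on the DC (0 is the default level, 5 is the most verbose); remember to set it back when you are done:

# Raise NTDS diagnostic logging for replication events.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics'
Set-ItemProperty -Path $key -Name '5 Replication Events' -Value 5

# ...and return it to the default afterward:
# Set-ItemProperty -Path $key -Name '5 Replication Events' -Value 0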

That leads us to the question of how to find out more about what those updates are.

To do that we can issue an LDAP query against KC’s DC Server17 for all of the objects that have a recently changed uSNChanged attribute. First we take the USN high-water mark for the given partition from our showutdvec command above and subtract a number from it in order to display the most recent updates on that DC. In our scenario that is 36483665, and we will subtract 1000 in order to query for roughly the most recent 1000 updates. (If you prefer PowerShell to LDP, see the sketch after these steps.)

  1. Open LDP.EXE.
  2. From the Connection menu select Connect and then press OK in the Connect dialogue that appears.
  3. From the Connection menu select Bind and then press OK in the Bind dialogue that appears.
  4. Next, click on the Browse menu and select Search.
  5. Enter the partition’s distinguished name in the BaseDN field (DC=<partname>,DC=com).
  6. Paste the following in the filter field: (usnchanged>=36482665)
  7. Select Subtree search.
  8. Click on Options and change the size limit to 5000.
  9. Still in Options add the following to the Attributes list (each entry separated by semicolon) to those already present: usnchanged;whenchanged
  10. Then click Run.
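As promised, the same search can be issued with Get-ADObject instead of LDP. A minimal sketch, assuming the ActiveDirectory module and the high-water mark from our scenario:

Import-Module ActiveDirectory

# Query Server17 for the objects behind its most recent ~1000 USNs.
$highWater = 36483665
Get-ADObject -Server 'server17' -SearchBase 'DC=treyresearch,DC=com' `
    -LDAPFilter "(uSNChanged>=$($highWater - 1000))" `
    -Properties uSNChanged, whenChanged |
    Sort-Object uSNChanged -Descending |
    Select-Object DistinguishedName, uSNChanged, whenChanged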

And here is a sample of our result set:

>> Dn: CN=Test134417,OU=Accounting,DC=treyresearch,DC=com

4> objectClass: top; person; organizationalPerson; user;

1> cn: Test134417;

1> distinguishedName: CN=Test134417,OU=Accounting,DC=treyresearch,DC=com;

1> whenChanged: 09/13/2009 15:11:26 Central Standard Time;

1> uSNChanged: 36483650;

1> name: Test134417;

1> canonicalName: treyresearch.com/Accounting/Test134417;

>> Dn: CN=Test134418,OU=Accounting,DC=treyresearch,DC=com

4> objectClass: top; person; organizationalPerson; user;

1> cn: Test134418;

1> distinguishedName: CN=Test134418,OU=Accounting,DC=treyresearch,DC=com;

1> whenChanged: 09/13/2009 15:11:26 Central Standard Time;

1> uSNChanged: 36483649;

1> name: Test134418;

1> canonicalName: treyresearch.com/Accounting/Test134418;

In this case, after a large sampling of all the most recent updates to occur on the KC DC, we see that someone or something is creating users named Test<number> in the Accounting OU.

Is it some provisioning software that the accounting department uses? A migration from another directory? What if the objects were of some other type, something unique enough to be immediately understood?

These are all questions that you can apply to a concern like this once you have an idea about those updates you are looking for.

A Day at the SPA

This blog post was originally published July 9, 2007, on the TechNet blog https://blogs.technet.com/ad. It is worth noting that the capability discussed in this article is now available in the Windows Performance Toolkit, and the toolkit also allows for customization of the performance analyzers.

Ah, there’s nothing like the stop-everything, our-company-has-come-to-a-complete-halt emergency call we sometimes get where the domain controllers have slowed to a figurative crawl, with nearly all other business likewise emulating a glacier owing to logon and application failures and the like.

If you’ve had that happen to one of your domain controllers, then you are nodding your head now and feeling some relief that you are reading about it and not experiencing that issue right this moment.

The question for this post is: what do you do when that waking nightmare happens (other than considering where you can hide where your boss can’t find you)?

Well, you use my favorite, and the guest of honor for this post: Server Performance Advisor. Otherwise known as SPA.

 Think of SPA as a distilled and concentrated version of the Perfmon data you might review in this scenario.    Answers to your questions are boiled down to what you need to know; things that are not relevant to Active Directory performance aren’t gathered, collated or mentioned.  SPA may not tell you the cause of the problem in every case, but it will tell you where to look to find that cause. 

So, I’ve talked about the generalities of SPA, now let’s delve into the specifics.  Well, not all of them, but an overview and the highlights that will be most useful to you.

SPA’s AD data collector is comprised of sections called Performance Advice, Active Directory, Application Tables, CPU, Network, Disk, Memory, Tuning Parameters, and General Information.

Before you reach all the hard data in those sections, though, SPA gives you a summary at the top of the report.  It’ll look something like this:

Summary
CPU Usage (%)                 2
Top Process Group CPU%        lsass.exe                          98
CPU% Top Activity             SamEnumUsersInDom                   5
Top Client CPU%               Rockyroaddc01.icecream.dairy.org    5
Top Disk by IO Rate (IO/sec)  0                                  11

Performance Advice is self-explanatory and is one of the big benefits of SPA over other performance data tools.  It’s a synopsis of the more common bottlenecks, with an assessment of whether they are a problem in your case.  Very helpful.  It looks at CPU, Network, Memory, and Disk I/O and gives a percentage of overall utilization, its judgment on whether the performance seen is idle, normal, or a problem, and a short detail sentence that may tell you more.

The Active Directory portion gives good, collated data and some hard numbers on AD-specific counters.  These are most useful if you already understand what that domain controller’s baseline performance counters are; in other words, what the normal numbers would be for that domain controller based on the role it has and the services it provides day to day.  SPA is most often used when a sudden problem has occurred, though, and that is not the moment to be establishing a baseline.
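A baseline is easy to capture ahead of time, though. Here is a minimal sketch using the built-in Get-Counter cmdlet with a few representative counters; treat the counter list and the sample window as starting points to adjust for your environment:

# Sample a few DC health counters every 5 seconds for 5 minutes and save them.
$counters = @(
    '\Processor(_Total)\% Processor Time',
    '\NTDS\LDAP Searches/sec',
    '\NTDS\DRA Inbound Values Total/sec'
)
Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 60 |
    Export-Counter -Path 'C:\PerfLogs\dc-baseline.blg' -FileFormat BLG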

The collated data in the Active Directory section includes a listing of the clients with the most CPU usage for LDAP searches.  Client names are resolved to FQDNs, and there is a separate area that gives the results of those searches.

AD has indices for fast searches, and those indices can get hammered sometimes.  The Application Tables section gives data on how those indices are used.  This information can be used to refine queries being issued to the database, for example if an application’s queries traverse too many entries to get a result; it can also suggest that you need to index something new, or that you need to examine and perhaps fix your database using ntdsutil.exe.

The CPU portion gives a good snapshot of the busiest processes running on the server during the data gathering.  Typically this would show LSASS.EXE as the busiest on a domain controller, but not always, particularly in situations where the domain controller has multiple jobs (file server, or application server of some kind, perhaps).  Generally speaking, having a domain controller be just a domain controller is a good thing.

Note: If Idle has the highest CPU percentage, then you may want to make sure you gathered data while the problem was actually occurring.

The Network section is one of the most commonly useful ones.  Among other things, it summarizes the TCP and UDP client inbound and outbound traffic by computer.  It also tells you which processes on the local server were handling that traffic.  Good stuff which can give a “smoking gun” for some issues.  The remaining data in the Network section is also useful, but we have to draw the line somewhere or this becomes less of a blog post and more like training.

The Disk and Memory sections will provide very useful data, more so if you have a baseline for that system to tell you what is out of the ordinary for it.

SPA is a free download from our site and installs as a new program group.  Here’s where you can get it (install does not require a reboot):

http://www.microsoft.com/downloads/details.aspx?familyid=09115420-8c9d-46b9-a9a5-9bffcd237da2&displaylang=en

A few other things to discuss regarding SPA.

· It requires Server 2003 to run.

· As I stated above, when you have a problem is the worst time to establish a baseline.

· The duration of the test can be altered depending on your issue.  The default time set for it is 300 seconds (5 minutes).  Keep in mind that if you gather data a great deal longer than the duration of the problem then you run the risk of averaging out the data and making it not useful for troubleshooting.

· In the same way that there are ADAM performance counters, SPA has an ADAM data collector.

· The latest version (above) includes an executable that can kick this off from a command line, and which can be run remotely via PsExec or similar.

· Server 2008 Perfmon will include SPA-like data collectors…or so I hear.

· SPA will not necessarily be the only thing you do, but it’s a great starting place to figure out the problem.

See?  A day at the SPA can really take the edge off of a stop-everything, our-company-has-come-to-a-complete-halt emergency kind of day.  Very relaxing indeed.

The Road to Take

This article introduces a new blog which will help you choose the best technology roads to take with Identity and Security technologies.

A lot of what we end up doing in the information technology field is not really about the technology itself but rather choosing the right figurative road to take with respect to technology. Those who are successful in IT know how to bring together the needs of an organization with what is available and figure out which is the right technology to meet those needs.

This paradigm can be applied to most software and IT industry roles.

Product managers are an example of this. Software product managers work with their customers to determine the things the software needs, the engineering teams to see what is possible and practical, and ultimately with the business planning and marketing to ensure that the approach will meet the business needs.

IT architects are another example. They are given requirements from their business leaders and internal customers and work with internal and perhaps external partners to implement the desired solutions which comprise the enterprise architecture.

On a smaller and more iterative scale, consultants and engineers are presented with a problem or problems which will be solved by information technology, and they must choose the right solution and implement it.

It’s all about determining the right path and taking the organization down it.

Of course, determining the right road to take is the hardest thing to do. Deriving what is needed is only the first step. Gaining enough technical expertise and experience so that the right solution can be selected and implemented is the next step. This is why people who are the best at choosing the right technology road to take are often the people who implemented similar solutions before.

I have had extraordinary experiences helping organizations choose the right information technology roads to take: from being a field consultant early in my career, to providing Security and Identity solutions at Microsoft, to developing the Microsoft engineering delivery organizations, to building new Azure AD features. I have been blessed by the opportunity to lead and contribute in so many companies and institutions, in all parts of the globe, over the years.

In this blog I will recount some of those experiences but also provide my insights, repost old blog posts from my AD blog circa 2006-2014 along with updates and commentaries, and provide PowerShell code I wrote which was useful in Security and Identity scenarios with the hope it will help others too. I’ll also discuss technical leadership and my thoughts on technology trends.

We may not be going to the same destination but hit the road with me. It’ll be an interesting walk.