A while back I made a joke. It was hilarious.
Because it was accurate.
Why? Because domain trusts are complicated. Here's how they work. https://t.co/LjVvOcJlpk

— Steve Syfuhs (@SteveSyfuhs) November 24, 2020

Twitter warning: Like all good things this is mostly correct, with a few details fuzzier than others for reasons: a) details are hard on twitter; b) details are fudged for greater clarity; c) maybe I'm just dumb.

To understand trusts we have to understand domains. In Windows-land domains are logical groupings of things like users and resources like computers and services. These things are grouped together by a name -- the domain name: foo.name.com.

These domain names can be whatever you want. They can mirror real DNS names registered publicly, or they can be internal-only and represent just your own stuff.

The things within the domain can only ever belong to a single domain. UserA belongs to domain.com. The computer myserver$ belongs to just domain.com. Both of these things can also exist in niamod.com, but they're different entities. Totally unrelated.

A user in a domain can logically access resources in the same domain. That is, they exist within the same security boundary (we'll come back to this). The user and resource both trust the domain, so when the domain says the user can access the resource, the resource listens.

The thing that dictates these security rules is the domain controller. It is the arbiter of access control. In Kerberos-land it is the key distribution center: A bit about Kerberos (syfuhs.net)

So far all of this is pretty simple (in the sense that I'm lying to you and it's actually quite complicated, but you know, details) because everything is self-contained within a single domain. It's easy enough to reason about.

However, it gets a little more interesting when users in one domain need to access resources in another domain. 

How do? Through a domain trust.

A domain trust is an agreement between two domains where domain B is willing to allow users in domain A to access resources in domain B. In effect to act like a member of domain B.

Trusts only work in a single direction. Domain B trusts domain A. If a user in B tries to access a resource in domain A, domain A will block it.

But that doesn't mean you can't have multiple trusts between the same two domains.

B trusts A. That's one trust. Users in A can access resources in B.

A trusts B. That's another trust. Users in B can access resources in A.

There are only two domains and each trust runs in one direction, so in total there can only ever be two trusts between them.

However, there's nothing to stop a single domain from trusting multiple domains.

B trusts A.

B trusts C.

Users in C can access resources in B, but not in A.

Users in B cannot access resources in A or C.

Users in A can access resources in B, but not in C.
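The direction rule is easy to get backwards, so here is a minimal sketch of it. The set-of-pairs model and the name can_access are made up for illustration; real trusts carry a lot more state than this.

```python
# Each trust is a (trusting, trusted) pair: "B trusts A" is ("B", "A").
# This model and these names are illustrative assumptions, not a Windows API.
TRUSTS = {("B", "A"), ("B", "C")}


def can_access(user_domain: str, resource_domain: str, trusts=TRUSTS) -> bool:
    """True if a user in user_domain can reach resources in resource_domain
    using only direct (non-transitive) trusts."""
    if user_domain == resource_domain:
        return True  # same domain, same security boundary
    # Access flows opposite to trust: the resource's domain must trust
    # the user's domain.
    return (resource_domain, user_domain) in trusts


for user, resource in [("A", "B"), ("C", "B"), ("A", "C"), ("B", "A"), ("B", "C")]:
    print(f"user in {user} -> resource in {resource}: {can_access(user, resource)}")
```

The only pairs that come back True are users in A or C reaching resources in B, because B is the only domain doing any trusting.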

And trusts can be transitive. Meaning a user in domain A can access resources in domain C, by way of domain B.

B trusts A.

C trusts B.

A => B => C.
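Transitivity is really just walking a graph. A rough sketch, assuming every trust in the set is transitive (in real life that is a per-trust setting) and using a hypothetical trust_path helper:

```python
from collections import deque

# (trusting, trusted) pairs: B trusts A, C trusts B.
# Illustrative model only; every trust here is assumed to be transitive.
TRUSTS = {("B", "A"), ("C", "B")}


def trust_path(user_domain, resource_domain, trusts=TRUSTS):
    """Breadth-first search for a chain of domains leading from the user's
    domain to the resource's domain. Returns the chain, or None."""
    queue = deque([[user_domain]])
    seen = {user_domain}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == resource_domain:
            return path
        # From `current` a user can move on to any domain that trusts it.
        for trusting, trusted in sorted(trusts):
            if trusted == current and trusting not in seen:
                seen.add(trusting)
                queue.append(path + [trusting])
    return None


print(trust_path("A", "C"))
print(trust_path("C", "A"))
```

A user in A gets a path through B to C. Going the other way there is no path at all, because nothing trusts in that direction.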

This is all well and good, but how does it actually work using the protocols Windows understands? In this case Kerberos and NTLM. Let's set the stage.

You have a user Alice in domain A. You have an SMB file share \\partnerstuff\ in domain B. A domain trust exists between the two domains such that domain B trusts domain A. In other words users in A can access stuff in B.

The user types \\partnerstuff\ into explorer and the SMB stack lights up. It connects to that server and the server asks it to authenticate. This is just plain old SSO so far: How Windows Single Sign-On Works (syfuhs.net)

The client will attempt Kerberos first and it connects to the domain controller and asks for a ticket to cifs/partnerstuff. Here the fun begins.

A domain controller has a list of all the service principals in its domain. In this case the service principal doesn't exist in this domain. It exists in another domain. What does the domain controller do? Well, it consults its list of domain trusts.

The domain controller checks the SPN and looks for a suffix. Suppose the user typed in \\partnerstuff.domainb.com. The DC can infer from the name which domain most likely oversees this resource: domainb. However, the user just typed in \\partnerstuff. Hmmmm.

So it...guesses. Well, in the sense that it has a list of domains, and they've optionally been configured in an order, and it grabs the first one from the list.
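Roughly, the decision looks like the sketch below. This is my own simplification: guess_realm is a hypothetical name, and the real DC logic has more inputs than a suffix and an ordered list.

```python
def guess_realm(spn, trusted_domains):
    """Pick the domain a referral should point at.

    spn:             e.g. "cifs/partnerstuff.domainb.com" or "cifs/partnerstuff"
    trusted_domains: DNS names of trusted domains, in their configured order
    (Hypothetical helper; the inputs are assumptions for illustration.)
    """
    # Everything after the first "/" is the host part of the SPN.
    host = spn.partition("/")[2].lower()

    # Fully qualified name: the suffix tells us which domain owns it.
    for domain in trusted_domains:
        if host.endswith("." + domain.lower()):
            return domain

    # Short name: no suffix to go on, so take the first domain in the list.
    return trusted_domains[0] if trusted_domains else None


order = ["domainc.com", "domainb.com"]
print(guess_realm("cifs/partnerstuff.domainb.com", order))
print(guess_realm("cifs/partnerstuff", order))
```

The fully qualified name lands on domainb.com no matter how the list is ordered. The short name lands on whatever happens to be first, which is where the guessing comes in.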

Now the domain controller has the domain it thinks the resource belongs to so it creates a referral ticket to that domain. A referral ticket is just a plain old service ticket, except the resource is actually another domain instead of the file share.

Remember the security boundary thing. Domain A can't issue tickets to resources in domain B. However, domain A can issue a referral to domain B. When the domain trust was created, a secret key was shared between both domains. Domain A encrypts the referral ticket to that secret.

The domain controller has generated the referral and returns it to the client. The client looks at the response and says "waaaaaait, this isn't for the thing I asked for". The referral is instead for krbtgt/domainb.com.

The client is aware of this special service name [format]. It knows this is a referral. It knows it needs to use this special ticket to get the real ticket it wants, and it now knows what domain oversees this resource.

Since the client knows this is a special ticket, it knows it can do something weird, like use it in place of a TGT. In this case the client now has two TGTs: one to its own realm domaina.com, and now a referral TGT to domainb.com.

Though the special-ness of this ticket is a bit overstated. The client received the ticket and it stuck it in the cache. The client then decided it needed to make a TGS-REQ to a KDC in domainb so it looked for krbtgt/domainb.com in the cache, and oh hey, there's a ticket.

Anyway, the client has been given a hint: domaina doesn't know this resource, but here's a ticket to domainb, go ask them.

So the client makes a TGS-REQ to domainb, using the TGT it now has for domainb, and asks for cifs/partnerstuff.

The KDC receives this request and looks at the TGT. It's for krbtgt/domainb.com -- good good, but it's issued by realm domaina.com -- uhhhhh, what? This is a special hint to the KDC to go check the list of trusts it knows about.

So the KDC finds the domaina trust and gets the secret that was previously shared. The KDC decrypts the ticket. The KDC then looks for the cifs/partnerstuff SPN and finds it. Woohoo! The KDC generates a service ticket and returns it to the client.
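Here is a toy version of that check. Big assumption: an HMAC tag stands in for the real ticket encryption, and the ticket is a plain dict. The names issue_referral and accept_referral are invented for this sketch and are not part of any Kerberos library.

```python
import hashlib
import hmac
import json
from typing import Optional


def issue_referral(client: str, issuing_realm: str, target_realm: str,
                   trust_key: bytes) -> dict:
    """Domain A's side: build a 'ticket' for krbtgt/<target> and protect it
    with the secret shared when the trust was created."""
    body = {"client": client, "issuer": issuing_realm,
            "sname": f"krbtgt/{target_realm}"}
    blob = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(trust_key, blob, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def accept_referral(ticket: dict, my_realm: str,
                    trust_keys: dict) -> Optional[str]:
    """Domain B's side: return the client name if the referral checks out."""
    body = ticket["body"]
    if body["sname"] != f"krbtgt/{my_realm}":
        return None  # not addressed to this realm
    # Issued by another realm: go find the matching trust secret.
    key = trust_keys.get(body["issuer"])
    if key is None:
        return None  # no trust with the issuer
    blob = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, blob, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        return None  # wrong secret or tampered ticket
    return body["client"]


secret = b"shared-when-the-trust-was-created"
referral = issue_referral("alice", "domaina.com", "domainb.com", secret)
print(accept_referral(referral, "domainb.com", {"domaina.com": secret}))
```

Only a KDC holding the same trust secret can validate the referral, which is the whole point of sharing that key when the trust is set up.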

The client now has a service ticket and it hands it off to the SMB stack. The SMB stack fires it off to the remote server and down the SSO rabbit hole we go.

But then I also said the trusts can be transitive. What happens when the SMB server is actually in domain C? The same exact thing, except repeated from B to C.

A returns a referral to B. B looks up the SPN and says "pfft, no idea, maybe try C" and returns a referral to C. C says "oh yeah, I got you" and returns a service ticket.
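The client side of that chase is basically a loop. A sketch with pretend KDCs: the KDCS table, tgs_req, chase, and the max_hops value are all invented here to show the shape of it.

```python
# Hypothetical toy KDCs: each realm knows its own SPNs and where to refer
# everything else. None of this is a real API.
KDCS = {
    "domaina.com": {"spns": set(), "refer_to": "domainb.com"},
    "domainb.com": {"spns": set(), "refer_to": "domainc.com"},
    "domainc.com": {"spns": {"cifs/partnerstuff"}, "refer_to": None},
}


def tgs_req(realm, spn):
    """Pretend TGS-REQ: a service ticket, a referral, or an error."""
    kdc = KDCS[realm]
    if spn in kdc["spns"]:
        return ("service_ticket", realm)
    if kdc["refer_to"] is not None:
        return ("referral", kdc["refer_to"])
    return ("error", realm)


def chase(home_realm, spn, max_hops=10):
    """Follow referrals until some realm issues the service ticket.
    Returns the list of realms asked, or None if the chase fails."""
    realm = home_realm
    asked = [realm]
    for _ in range(max_hops):
        kind, value = tgs_req(realm, spn)
        if kind == "service_ticket":
            return asked
        if kind == "error":
            return None
        realm = value  # go ask the realm named in the referral
        asked.append(realm)
    return None  # too many hops, give up


print(chase("domaina.com", "cifs/partnerstuff"))
```

For cifs/partnerstuff the client ends up asking domaina.com, then domainb.com, then domainc.com, which finally has the SPN.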

"But Steve" you say "this is just for domains. What's the deal with these forest things?" Hooboy. Okay. Deep breath. Forests.

Forests are hierarchical collections of domains. In Windows-land a domain MUST belong to a forest. It may be a forest of one, but there will always be a forest. It's kind of like a tree. The forest root is a domain itself.

Now suppose you have a forest corp.company.com. In this forest you have two domains: childa and childb. Their names are childa.corp.company.com and childb.corp.company.com.

Forests provide this special structure to the domains by way of trusts. Childa and childb trust corp. By virtue of the transitive property childa trusts childb and childb trusts childa.

Forests also provide another form of security boundary. To understand this we have to look at how authorization works across trusts.

Remember in the SSO thread where I said authorization is based on this Privilege Attribute Certificate thing? The PAC. That still applies to trusts. How Windows Single Sign-On Works (syfuhs.net)

The PAC contains a list of group memberships in the form of Security Identifiers -- SIDs. The SID is a globally unique identifier of the group or user. They're of the form S-1-{authority}-{domain}-{RID}.

The domain portion is the SID of the domain itself, and the RID is the relative identifier of the user/group in the domain. The RID is guaranteed unique within a domain, and the SID is guaranteed unique globally.
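In string form that split is just "everything before the last dash" and "the last number". A small sketch; split_sid is a made-up helper and the SID value is only an example, not one from a real environment.

```python
def split_sid(sid: str):
    """Split a domain-relative SID string into (domain SID, RID).

    Only meaningful for SIDs issued by a domain; this sketch does not
    special-case well-known SIDs that have no domain portion.
    """
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID string: {sid!r}")
    domain_sid = "-".join(parts[:-1])  # identifies the issuing domain
    rid = int(parts[-1])               # unique only within that domain
    return domain_sid, rid


# Illustrative SID; the numbers are an example only.
print(split_sid("S-1-5-21-1004336348-1177238915-682003330-512"))
```

Two principals from the same domain share the domain part and differ only in the RID.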

So when the KDC of your domain issues a referral ticket, it includes your PAC with your SIDs. The domain receiving the referral examines your SIDs and filters out SIDs that shouldn't belong in it.

What does that mean? SIDs that shouldn't belong? Well, a user in a forest might be a member of groups from a whole bunch of domains in the forest. That identity gets projected to all resources within the forest through this PAC.

But also across forests. You can have trusts between forests. The corp forest domain trusts childa, and the partner forest domain trusts the corp forest.

childa <= corp <= partner.

So a user in childa can access resources in partner, despite being an entirely different forest.

However, forests are bigger security boundaries. When you project your identity through the PAC to the other forest, the other forest is going to filter out anything that might be dangerous.

Principally that means any SID that has the domain portion matching the SID of the partner forest. The forest will accept 

S-1-{corp}-123

But it'll block

S-1-{partner}-456

This is because the partner forest only trusts corp to project identities from corp.

Why would corp ever provide a SID for a forest outside its own security boundary? It wouldn't, unless it were evil and wanted to get access to resources it shouldn't normally have. Hence the filtering.
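A stripped-down sketch of that one rule, assuming the PAC is just a list of SID strings. Real SID filtering has many more rules than this; filter_sids and the SID values are invented for illustration.

```python
def filter_sids(pac_sids, local_domain_sids):
    """Drop any incoming SID whose domain portion belongs to the receiving
    forest. A trusted forest has no business asserting those.
    (Hypothetical helper showing a single rule only.)"""
    kept = []
    for sid in pac_sids:
        domain_part = sid.rsplit("-", 1)[0]  # everything except the RID
        if domain_part in local_domain_sids:
            continue  # claims to be one of ours: filtered
        kept.append(sid)
    return kept


# Made-up domain SIDs for the two forests.
CORP = "S-1-5-21-111-222-333"
PARTNER = "S-1-5-21-444-555-666"

incoming = [f"{CORP}-123", f"{PARTNER}-456"]
print(filter_sids(incoming, {PARTNER}))
```

Run from the partner forest's point of view, the corp SID survives and the one claiming to come from partner is dropped.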

Plus, forest trusts are NOT transitive. You can't get referrals from foresta to forestb to forestc to forestd. It won't work. It'll be blocked.

This is why we generally refer to forests as the real security boundary. Domains will do some SID filtering for sanity reasons, but you can still project a false identity with SIDs from the target domain.

Now, if forests are separate collections of domains, how do domain controllers know to issue referrals to these entirely unrelated named things?

We're back to that hint or guess step. If the SPN is fully qualified, the DC grabs the rightmost portion of the name and compares that to the Top Level Names list of a trust. These TLNs say "I can (probably) issue tickets to this resource if the rightmost portion ends in my TLN".

My forest is corp.company.com and I have a trust to ext.partner.com. The TLN on the trust is ext.partner.com. The rightmost portion of cifs/partnerstuff.ext.partner.com is ext.partner.com, so go use that trust. Easy.
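The TLN comparison can be sketched like this. Two assumptions of mine: matching happens on whole DNS labels, and the longest matching TLN wins when more than one trust matches. match_tln and the trust names are hypothetical.

```python
def match_tln(spn, tlns_by_trust):
    """Return the name of the trust whose Top Level Name matches the
    rightmost portion of the SPN's host, or None.

    tlns_by_trust: {"trust name": ["tln1", "tln2", ...]}
    Longest-match-wins is an assumption made for this sketch.
    """
    host = spn.partition("/")[2].lower().rstrip(".")
    best = None  # (trust name, matched TLN)
    for trust, tlns in tlns_by_trust.items():
        for tln in tlns:
            tln = tln.lower()
            # Match on label boundaries so "notext.partner.com" does not
            # accidentally match "ext.partner.com".
            if host == tln or host.endswith("." + tln):
                if best is None or len(tln) > len(best[1]):
                    best = (trust, tln)
    return best[0] if best else None


trusts = {"partner-forest": ["ext.partner.com"]}
print(match_tln("cifs/partnerstuff.ext.partner.com", trusts))
print(match_tln("cifs/partnerstuff", trusts))
```

The fully qualified SPN picks the partner trust. The short name matches nothing, so the DC has no routing hint to work with.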

But again, people often just type \\partnerstuff. We're back to guessing. The KDC has this thing called Forest Search Order. It basically says "if you can't get a ticket from our domain then try all these other forests in this order until you get a ticket".

You can configure FSO on the KDC or the client. Both of them are really just hints. The KDC is either telling the client "eeeehhh, maybe try this one", or the client is asking the KDC "eeeeeeh, maybe I should try this one?"

Eventually a referral is chased enough that it finds a domain that'll issue the ticket. In large environments this can be kind of a pain. Normally whenever the client needed this ticket it'd have to start from the beginning and chase it down every time.

Thankfully Windows clients have this thing called the SPN cache. It basically acts as a shortcut. When the client finally receives the ticket it requested so many hops before, the realm of the final ticket is logged with the SPN.

The next time a service ticket is requested for that SPN, the cache is consulted. The client then knows it can skip all the referral chasing and just go directly to the domain that has the SPN. The client already has the referral TGT in the cache, so no extra work.
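Conceptually the SPN cache is just a map from SPN to realm. A minimal sketch, where SpnCache and realm_to_ask are hypothetical names and the real cache is certainly smarter about expiry and the like:

```python
class SpnCache:
    """Toy SPN cache: remembers which realm finally issued the ticket."""

    def __init__(self):
        self._realm_by_spn = {}

    def remember(self, spn, realm):
        self._realm_by_spn[spn.lower()] = realm

    def lookup(self, spn):
        return self._realm_by_spn.get(spn.lower())


def realm_to_ask(spn, home_realm, cache):
    """Start at the cached realm if we have one, otherwise start at home
    and chase referrals the long way."""
    return cache.lookup(spn) or home_realm


cache = SpnCache()
print(realm_to_ask("cifs/partnerstuff", "domaina.com", cache))  # nothing cached yet

# After the first successful chase, log the realm that issued the ticket.
cache.remember("cifs/partnerstuff", "domainc.com")
print(realm_to_ask("cifs/partnerstuff", "domaina.com", cache))  # skip straight there
```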

But that initial chase might still be expensive. It's going to take a while if you have to hop through a bunch of domains and the KDCs for those domains are over slow networks or just very far away. It adds up.

This is also kind of a pain because there are only so many hops you can go through before the client gives up. This is more of a safety thing so it doesn't get stuck in a cycle chasing the same referrals over and over again. I think Windows allows up to 25 referrals?

Give or take. I can't be bothered to find it in the code. It's not like anyone has that complex of a forest environment anyway.

But anyway, every hop still counts. The DCs are sometimes very clever. Sometimes they know exactly what domain a resource is in, but can't necessarily get you directly to it. This is common in child domain => forest a => forest b chasing.

The DC can analyze this graph and provide a shortcut hint. It can't provide a referral directly to the final domain, but it can provide a hint that says when you get to domain B explicitly request a referral to domain C.

Before I wrap this up I wanted to touch on how NTLM fits into all this. Ugggggh.

It basically works the exact opposite way.

Instead of the client asking the KDC for a ticket, the client connects to the target resource and provides a nonce. The target server forwards that off to its own DC, and the DC checks the nonce.

The DC can't process the nonce because it's from another domain, so the DC finds a DC in the other domain and asks the other domain to process it. If that other domain can't process it, that domain finds another domain, and so on until it finds the appropriate domain.

This does have the useful property that the client isn't particularly chatty, but the connection to the target is held open for quite a while. On the other hand as the client you also have no idea WTF is going on behind the scenes.
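The server-side forwarding can be sketched like so. Loud caveat: this is NOT the real NTLM computation. An HMAC over the nonce stands in for whatever proof the client supplies, and DomainController, validate, and neighbors are all invented to show only the pass-it-along shape.

```python
import hashlib
import hmac


class DomainController:
    """Toy DC that either checks a proof itself or forwards it along."""

    def __init__(self, domain, secrets, neighbors=()):
        self.domain = domain
        self.secrets = secrets            # user name -> secret bytes
        self.neighbors = list(neighbors)  # DCs this DC can forward to

    def validate(self, user_domain, user, nonce, proof, seen=None):
        seen = set() if seen is None else seen
        if self.domain in seen:
            return False  # already asked this domain, avoid loops
        seen.add(self.domain)

        if user_domain == self.domain:
            # Our user: we hold the secret, so we can check the proof.
            secret = self.secrets.get(user)
            if secret is None:
                return False
            expected = hmac.new(secret, nonce, hashlib.sha256).digest()
            return hmac.compare_digest(expected, proof)

        # Not our user: hand it to another domain and wait for the answer.
        return any(dc.validate(user_domain, user, nonce, proof, seen)
                   for dc in self.neighbors)


dc_a = DomainController("A", {"alice": b"alice-secret"})
dc_b = DomainController("B", {}, neighbors=[dc_a])
dc_c = DomainController("C", {}, neighbors=[dc_b])

nonce = b"some-nonce"
proof = hmac.new(b"alice-secret", nonce, hashlib.sha256).digest()
print(dc_c.validate("A", "alice", nonce, proof))
```

The target server's DC in C never sees Alice's secret. It just keeps the request open while B and then A do the work, which is why the client has no idea what is going on behind the scenes.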

Anyway, here's Bruce not pleased at all with having to listen to me explain all this.

[Image: EnnnqmaVcAAorlw.jpg]