What is Security Assertion Markup Language (SAML) and how does it work?

January 10, 2023

Author: Stytch Team

Welcome back to B2B Auth School. Our mission is to help B2B companies uplevel their understanding and implementation of user authentication technologies. Our first series of posts is dedicated to single sign on (SSO). This article is lesson six in that series.

Lesson one | Introducing B2B Auth School
Lesson two | Organization tenancy: the foundation of SSO and B2B data models
Lesson three | What is single sign on?
Lesson four | SSO protocols: SAML vs OIDC
Lesson five | What is OpenID Connect (OIDC)?
Lesson six | What is Security Assertion Markup Language (SAML) and how does it work?
Lesson seven | Choosing a B2B auth provider

In the previous lesson, we took a close look at OpenID Connect or OIDC – one of the two most popular protocols for handling single sign on (SSO). We looked at the origin of OIDC, its close relation to OAuth, and how that authorization protocol was built upon to create what is now quickly becoming a preferred standard for identity claims for federated, enterprise SSO. 

Today we’re going to look at the other most popular SSO protocol, Security Assertion Markup Language, or SAML. We’ll look at:

  • The history of this legacy standard
  • Its steps and flow
  • Its key components and anatomy
  • Common footguns and challenges
  • Benefits and significance for B2B auth 

Let’s get started!

What is security assertion markup language, and why do we have it?

Like OIDC, SAML is a protocol for exchanging authentication and authorization data between parties (if you want a refresher on protocols, check out the previous blog post). Its most common usage is for single sign on. Unlike OIDC, which is built largely with JWTs and claims, SAML is built with Extensible Markup Language (XML) and what are called SAML assertions.

The origins of the security assertion markup language date back to the beginning of this millennium, when it was developed to overcome certain shortcomings in the lightweight directory access protocol (LDAP), an even older directory protocol that was popular for authentication on on-prem systems in the 1990s. Before SAML, LDAP was the de facto authentication protocol, and it excelled when users and servers all shared the same network and building. But as companies expanded and the internet grew in prominence, the need emerged to perform user authentication and authorization across networks, directories, and applications. SAML filled that gap.

SAML 1.0 was adopted as a standard by the Organization for the Advancement of Structured Information Standards (OASIS) in 2002, for the purpose of cross-directory security assertions and authentication. SAML 2.0 was ratified in 2005 and remains the current version of the protocol. Today, SAML’s primary use case is B2B identity and access management, namely enabling SSO.

How does SAML work?

Like OIDC, SAML authentication works through a series of browser redirects, with data conveyed along with those redirects. To start, let’s look at the main components.

Key components of SAML

Unlike OIDC, which inherited most of its terms and operations from OAuth, SAML authentication uses very similar terms to those used to describe single sign on more generally. If you read our overview of SSO, most of these terms and components will seem familiar. 

  • Service Provider (SP) is the B2B app or website that users attempt to log into. In SAML, the service provider receives user identities from the identity provider via SAML assertions and uses them to control access to its resources.
  • Identity Provider (IdP) is the service that maintains user identities and issues SAML responses for the service provider (e.g. Okta, OneLogin).
  • Auth provider is a third-party or in-house auth solution that coordinates auth flows between identity providers and service providers using SAML assertions.
  • B2B customers are customers of the service provider (usually companies or businesses) who wish to log into the service provider’s application with SSO and use an identity provider for identity management.
  • Member describes the end user who is part of the B2B customer’s organization within the service provider’s app. They are the person or machine attempting to authenticate (or “gain user access”) to resources within the service provider using their identity provider.

SAML flow: step-by-step

To see the big picture, let’s first walk through the SSO login flow that occurs between the SAML key components. 

Note: for this blog post, we’ll focus on a service provider-initiated (SP-initiated) SAML authentication flow, as it’s the most comparable to OIDC. BUT, readers should note that unlike OIDC, SAML authentication also allows for identity provider-initiated (IdP-initiated) single sign on. This looks a little different, and comes with specific security vulnerabilities that we touch on later. And if you want a quick refresher on IdP-initiated SSO, check out our overview of SSO.

Now, back to that (service provider-initiated) SAML flow: 

  1. A member of the B2B customer’s organization needs to log in to the service provider, which has an option for SSO login via their auth provider.
  2. The service provider redirects to the auth provider to start the SSO login flow with parameters such as the member’s organization data and redirect URLs.
  3. The auth provider generates the SAML request for authentication.
  4. The auth provider binds the SAML request to a transport protocol, an HTTP POST for example, and redirects the member to the identity provider.
  5. The identity provider receives the SAML request and asks the member to authenticate. 
  6. Once authenticated, the identity provider sends a SAML response and redirects the member back to the auth provider with that response.
  7. The auth provider parses the SAML response to validate the security assertions and signature within the XML. We’ll cover what these elements are in a later section. 
  8. After successfully validating the SAML response, the auth provider redirects the authenticated member back to the service provider’s application.
  9. The member is logged in to the service provider’s application.
  10. Success!
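Step 4 is where the SAML request actually leaves the auth provider. Under the HTTP POST binding, the request XML is base64-encoded into a hidden form field named SAMLRequest (with an optional RelayState value), and the browser auto-submits that form to the identity provider. The sketch below assumes a hypothetical helper, build_post_binding_form, and placeholder URLs; it illustrates the binding rather than any particular provider’s implementation.

```python
import base64
import html


def build_post_binding_form(authn_request_xml: str, idp_sso_url: str, relay_state: str = "") -> str:
    """Wrap a SAML AuthnRequest in an auto-submitting HTML form (HTTP POST binding sketch)."""
    # The POST binding carries the XML base64-encoded; the output alphabet is safe in an attribute.
    encoded = base64.b64encode(authn_request_xml.encode("utf-8")).decode("ascii")
    fields = [f'<input type="hidden" name="SAMLRequest" value="{encoded}"/>']
    if relay_state:
        # RelayState is echoed back by the IdP, e.g. to remember where the member started.
        safe_state = html.escape(relay_state, quote=True)
        fields.append(f'<input type="hidden" name="RelayState" value="{safe_state}"/>')
    return (
        '<html><body onload="document.forms[0].submit()">'
        f'<form method="post" action="{html.escape(idp_sso_url, quote=True)}">'
        + "".join(fields)
        + "</form></body></html>"
    )
```

The HTTP-Redirect binding instead DEFLATE-compresses the XML and carries it in a URL query parameter; which binding to use generally comes from the identity provider’s metadata.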

Compared to OIDC, the SAML authentication flow is rather straightforward. The auth provider quarterbacks the redirects and exchanges between the service provider and identity provider in a defined order. The key difference is that the auth provider sends, receives, and validates data in the form of SAML requests and SAML responses.
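Part of quarterbacking the exchanges is remembering what was sent. One common approach in SP-initiated flows is to record the ID of each outgoing AuthnRequest and accept a SAML response only if its InResponseTo attribute matches an outstanding request, consuming that ID once. The class and method names below are hypothetical, the five-minute lifetime is an arbitrary illustrative choice, and the in-memory dict stands in for whatever shared store a real deployment would need.

```python
import secrets
import time
from typing import Dict, Optional


class PendingRequests:
    """In-memory record of outstanding AuthnRequest IDs (a sketch; use a shared store in production)."""

    def __init__(self, ttl_seconds: int = 300) -> None:
        self.ttl = ttl_seconds
        self._pending: Dict[str, float] = {}

    def new_request_id(self) -> str:
        # SAML IDs are xs:ID values, which may not start with a digit, hence the "_" prefix.
        request_id = "_" + secrets.token_hex(16)
        self._pending[request_id] = time.time()
        return request_id

    def consume(self, in_response_to: Optional[str]) -> bool:
        """True only for a known, unexpired request ID; each ID is accepted at most once."""
        if not in_response_to:
            return False  # no InResponseTo: an unsolicited (IdP-initiated) response
        issued_at = self._pending.pop(in_response_to, None)
        return issued_at is not None and time.time() - issued_at <= self.ttl
```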

Unfortunately, that’s where the simplicity ends. 

SAML messages – data elements

The notoriously difficult parts of SAML stem from how its data is structured in XML. So to fully understand SAML, we need to study its XML anatomy as defined by the SAML 2.0 specs. SAML has several specs (Core, Bindings, Profiles, and Metadata), but for this article we’ll primarily focus on Core.

It’s worth noting that XML, like SAML, has been around for decades. Its maturity as a language and technology means that web services still use it in production today, mostly in legacy applications that rely on older protocols like SOAP or XML-RPC. Ultimately, XML has aged quite poorly in today’s REST API landscape, due to its verbose and delicate format compared to modern data formats like JSON.

For SSO login, SAML messages come in two forms: SAML requests and SAML responses. The main difference between them is the direction in which they are sent. A SAML request is sent from the auth provider to the identity provider to request information about a user, while the identity provider sends a SAML response to the auth provider to provide the requested information. 

However, they both have an envelope-like data model with these four core data elements:

  • SAML request or SAML response – the top level element that defines what type of SAML message the XML document is.
  • Signature – the digital signature, generated with public-private key cryptography, that can be used to verify the authenticity and integrity of the SAML message. It generally contains both the signature value and the public certificate, encoded in base64. Depending on your security needs, a signature can be inserted at the SAML assertion level as well. We won’t fully cover signatures in this post, but we encourage you to read more about them on page 64 of the SAML 2.0 spec.
  • SAML assertions – the data statements within the SAML request or response that describe the auth provider, identity provider, service provider, or any other component in the SAML flow. This is where most of the data about the SAML authentication flow itself resides. 
  • Attributes – the assertions specific to the user and its properties such as email, name, birthday, phone number, roles, etc. 
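To make the envelope model concrete, the sketch below parses a SAML message with Python’s standard xml.etree.ElementTree and reports which of the four core elements it contains. The function name and the shape of the returned dictionary are illustrative assumptions; real messages would also need the parser hardening and signature checks discussed later in this post.

```python
import xml.etree.ElementTree as ET

NS = {
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def dissect_saml_message(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    # ElementTree exposes the root tag as "{namespace}LocalName", e.g. "{...protocol}Response".
    message_type = root.tag.split("}")[-1]
    return {
        "message_type": message_type,
        # Signatures may sit on the message or inside an assertion, so search the whole tree.
        "signature_count": len(root.findall(".//ds:Signature", NS)),
        "assertion_count": len(root.findall(".//saml:Assertion", NS)),
        "attribute_names": [a.get("Name") for a in root.findall(".//saml:Attribute", NS)],
    }
```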

SAML messages can be tens or even hundreds of kilobytes in size. But remember, the vast majority of SAML syntax will layer into one of these four data elements. To demonstrate, let’s dissect specific examples of a SAML request and SAML response.

SAML messages – the SAML request

Created by the auth provider and sent to the identity provider, the SAML request is a request for authentication with attached data properties. Among these properties may be information that relates to the service provider, identity provider, auth provider, B2B customer, or how the data needs to be delivered and formatted. 

Note that in this article we won’t be going deep into XML semantics or schema (those could be their own article – stay tuned!). But if you want more specific information on SAML data schema or namespaces (mentioned below) check out page 11 in the SAML specs.

From top to bottom, we can deconstruct the XML into its most important parts and summarize their meaning:  

  • AuthnRequest – the top level element that represents the whole SAML message.

    • xmlns:samlp is an attribute that communicates two main things:

      • Namespaces are an XML-specific method for disambiguating between different element names. Any element prefixed with a given namespace prefix belongs to that namespace. In this case, xmlns:samlp establishes the samlp prefix, so any element name prefixed with samlp, like the “samlp:NameIDPolicy” element, is part of that same namespace (elements with other prefixes, like “saml:Issuer”, belong to a different namespace).
      • protocol-related: the namespace value, urn:oasis:names:tc:SAML:2.0:protocol, contains the word “protocol”, which marks samlp elements as SAML protocol messages. Assertion elements, by contrast, conventionally use the saml prefix and the urn:oasis:names:tc:SAML:2.0:assertion namespace.
    • ProtocolBinding tells the identity provider which binding to use when it returns the SAML response; here, an HTTP POST back to the auth provider.
    • Destination is the identity provider’s endpoint where the request is directed.
    • AssertionConsumerServiceURL is the auth provider’s callback endpoint the identity provider should send the ensuing SAML response to.
  • Issuer – the element that identifies the entity that generated the request, i.e. the entity ID the identity provider knows the requester by (https://identityprovider.com/123456)
  • NameIDPolicy – the element that specifies the format of the member or user ID the identity provider should return. In this case, the auth provider asks for an email address.

To quickly paraphrase: the auth provider is sending an authentication request to the specified destination, identifying itself by its entity ID, and asking the identity provider to return an email address in a response sent via HTTP POST to the callback URL.

Congrats, you’re now successfully reading SAML. 
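Seen from the other side, here is a sketch that builds an AuthnRequest with the same parts described above, using only the standard library. The function name, parameter names, and every URL passed to it are placeholders; a production request would also follow the identity provider’s requirements, for example around signing.

```python
import secrets
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("samlp", SAMLP)
ET.register_namespace("saml", SAML)


def build_authn_request(destination: str, acs_url: str, issuer_entity_id: str) -> str:
    request = ET.Element(
        f"{{{SAMLP}}}AuthnRequest",
        {
            "ID": "_" + secrets.token_hex(16),  # xs:ID values may not start with a digit
            "Version": "2.0",
            "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "Destination": destination,  # the identity provider's SSO endpoint
            "AssertionConsumerServiceURL": acs_url,  # where the IdP should send the response
            # Binding the IdP should use when returning the SAML response
            "ProtocolBinding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST",
        },
    )
    # Issuer carries the entity ID of whoever generated this request
    ET.SubElement(request, f"{{{SAML}}}Issuer").text = issuer_entity_id
    # Ask for the member's email address as their NameID
    ET.SubElement(
        request,
        f"{{{SAMLP}}}NameIDPolicy",
        {"Format": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress", "AllowCreate": "true"},
    )
    return ET.tostring(request, encoding="unicode")
```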

SAML messages – the SAML response

The SAML response is a much longer and nastier web of XML, because it contains much more information: all the requested data about the user and the authentication flow, and usually more. Created by the identity provider, the SAML response is the most important artifact, since it carries the assertions that make SSO work.

We’ll have to break the full SAML response into three parts.

SAML response – part one

Part one should look familiar. Like the SAML request, the SAML response has a top level element, called Response, with attributes like xmlns and Destination, but with updated values; for example, the Destination attribute now specifies the auth provider’s callback URL that will parse and validate the SAML response.
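On the receiving end, the auth provider gets the response as a base64-encoded SAMLResponse form field. A minimal sketch of the first checks on part one: decode it, confirm it is a Response addressed to our own callback URL, and confirm the top-level status is Success. The function name and the expected_destination parameter are assumptions for illustration, and signature validation is deliberately left out.

```python
import base64
import xml.etree.ElementTree as ET

NS = {"samlp": "urn:oasis:names:tc:SAML:2.0:protocol"}
SUCCESS = "urn:oasis:names:tc:SAML:2.0:status:Success"


def check_response_envelope(saml_response_b64: str, expected_destination: str) -> ET.Element:
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    if root.tag != f"{{{NS['samlp']}}}Response":
        raise ValueError("not a SAML Response")
    # This sketch requires Destination and rejects responses meant for another endpoint.
    if root.get("Destination") != expected_destination:
        raise ValueError("unexpected Destination")
    status = root.find("samlp:Status/samlp:StatusCode", NS)
    if status is None or status.get("Value") != SUCCESS:
        raise ValueError("identity provider did not report success")
    return root
```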

SAML response – part two

Part two of the SAML response is where all the relevant assertions start to come in. These assertions start to give context about the user and how they authenticated with the identity provider. You’ll see the Assertion element that wraps all the data statements inside, which include: 

  • Subject – the user or member who is trying to authenticate via SSO. Notice how the user’s NameID is an email address (your_user@email.com), the specific format requested in the SAML request from before.
  • Conditions – the requirements and rules for the SAML authentication flow. For example, the NotBefore and NotOnOrAfter attributes define the time window in which the assertion is valid.
  • AuthnStatement – information about how the identity provider authenticated the user. Judging by its string value, the user logged in with a password.
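The Conditions time window is easy to get subtly wrong. The sketch below assumes UTC timestamps in the form most identity providers emit (e.g. 2023-01-10T12:00:00Z) and allows a small clock skew between servers; the function name and the 60-second default are illustrative choices, not values taken from the spec.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Optional

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def _parse_instant(value: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def conditions_are_current(assertion: ET.Element, now: Optional[datetime] = None,
                           skew: timedelta = timedelta(seconds=60)) -> bool:
    now = now or datetime.now(timezone.utc)
    conditions = assertion.find("saml:Conditions", SAML_NS)
    if conditions is None:
        return False  # this sketch treats a missing Conditions element as invalid
    not_before = conditions.get("NotBefore")
    not_on_or_after = conditions.get("NotOnOrAfter")
    if not_before and now + skew < _parse_instant(not_before):
        return False  # assertion is not valid yet
    if not_on_or_after and now - skew >= _parse_instant(not_on_or_after):
        return False  # assertion has expired
    return True
```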

SAML response – part three

And finally, part three contains all the assertions about the user’s identity and profile. These user assertions are called attributes. Common attributes about a user include but are not limited to name, phone, email, location, etc. In our example, you’ll see an Attribute element for: 

  • The user’s name: Ada Lovelace
  • The user’s phone number: +1 (123) 456-7890
  • The two groups the user belongs to: Group One and Group Two

These attributes can provide a robust profile of a user’s identity, because attributes are customizable: assertions can include about as much information and metadata as an IT admin cares to stuff in there. But be mindful that every identity provider has its own, generally configurable, set of user attributes, which means the auth provider needs to know how to interpret them.
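Because attribute names vary by identity provider, an auth provider typically keeps a per-connection mapping from the IdP’s attribute names to its own fields. The sketch below collects every Attribute into a name-to-values dictionary (attributes can be multi-valued, like the two groups above) and then applies such a mapping. The function names and the attribute names in the test mapping are hypothetical; real names depend entirely on how each identity provider is configured.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}


def extract_attributes(assertion: ET.Element) -> Dict[str, List[str]]:
    attributes: Dict[str, List[str]] = {}
    for attr in assertion.findall(".//saml:AttributeStatement/saml:Attribute", SAML_NS):
        values = [v.text or "" for v in attr.findall("saml:AttributeValue", SAML_NS)]
        # Multi-valued attributes (e.g. group memberships) keep every value.
        attributes.setdefault(attr.get("Name"), []).extend(values)
    return attributes


def map_attributes(raw: Dict[str, List[str]], mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Rename IdP-specific attribute names to the app's own field names; unmapped ones are dropped."""
    return {ours: raw[theirs] for ours, theirs in mapping.items() if theirs in raw}
```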

Adding all three parts together, the SAML response gives a full picture of the authentication flow and the user’s identity. But with multiple assertions and custom attributes, the length and complexity of the XML multiply very quickly. The auth provider must be able to parse and validate all the different variations of SAML syntax and assertions, which is easier said than done.

Common footguns & complications

While XML makes SAML seemingly endlessly extensible, it also comes with its fair share of footguns and challenges. The hive-mind that is the internet likes to give XML and/or SAML a bad rap for this, but we think it’s more helpful to think of these as challenges to be aware of rather than outright drawbacks. SAML is an incredibly powerful tool, but like any widely-adopted or long-existing standard, it needs to be approached with due consideration and thoroughness to get the most out of it.

SAML’s flexible XML structure that makes it so powerful also makes it vulnerable to footguns and cyberattacks if implemented poorly or without proper guardrails. Some of these include issues like buffer overflow attacks that target the XML parser; others include canonicalization issues, XML signature wrapping attacks, or attacks on XML encryption like adaptive chosen ciphertext attacks.  
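XML signature wrapping is worth a concrete illustration, because it exploits a gap between the code that verifies a signature and the code that reads the data. An attacker keeps the originally signed assertion somewhere in the document, so the signature still verifies, and adds a forged assertion where the application looks for user data. A partial defense, sketched below under the assumption that the identity provider signs the assertion (not only the outer response), is to insist on exactly one assertion and require that the signature’s Reference URI points at that assertion’s ID. The function name is hypothetical, and this is only a structural pre-check: it does not verify the signature, which still needs a maintained XML signature library.

```python
import xml.etree.ElementTree as ET

NS = {
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def signed_assertion_or_raise(response: ET.Element) -> ET.Element:
    """Structural pre-check against signature wrapping; does NOT verify the signature."""
    assertions = response.findall(".//saml:Assertion", NS)
    # Extra assertions anywhere in the document are a classic sign of a wrapping attempt.
    if len(assertions) != 1:
        raise ValueError(f"expected exactly one Assertion, found {len(assertions)}")
    assertion = assertions[0]
    assertion_id = assertion.get("ID")
    reference = assertion.find("ds:Signature/ds:SignedInfo/ds:Reference", NS)
    if not assertion_id or reference is None:
        raise ValueError("assertion has no ID or no enveloped signature")
    # The signature must point at this exact assertion ("#" + its ID).
    if reference.get("URI") != f"#{assertion_id}":
        raise ValueError("signature does not reference the assertion being used")
    return assertion
```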

To protect your product against these vulnerabilities, it’s important to:

1. Choose your XML parser carefully.

Many XML libraries have unpatched vulnerabilities, and many are not regularly maintained when new ones are found. This makes the XML parser a critical juncture for the security and efficiency of your SSO process.
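One inexpensive layer of defense is to refuse any document that contains a DTD: legitimate SAML messages have no need for one, and DTDs are what enable entity-expansion (“billion laughs”) and external-entity attacks. The sketch below uses the standard library’s expat bindings as a pre-scan before building the tree; the names are hypothetical, and in Python specifically the third-party defusedxml package is a commonly used, more complete alternative.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class UnsafeXMLError(ValueError):
    """Raised when a SAML message contains constructs it should never need."""


def _reject(*_args) -> None:
    raise UnsafeXMLError("DTDs and entity declarations are not allowed in SAML messages")


def parse_saml_xml(data: bytes) -> ET.Element:
    # Pre-scan with expat and abort as soon as a DOCTYPE or entity declaration appears.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject
    scanner.EntityDeclHandler = _reject
    scanner.Parse(data, True)
    # Only documents that passed the scan are turned into an element tree.
    return ET.fromstring(data)
```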

2. Review XML best practices regularly. 

Especially if you are building your SAML / SSO solution in house, make sure your developers spend the time reviewing XML specific bugs and issues. Most developers who are more familiar or experienced with other languages will not intuit all of XML’s quirks and requirements. The devil is in the details with this one. 

3. Offer IdP-initiated SSO with due caution and care.

IdP-initiated SSO flows are more vulnerable than SP-initiated flows, and they are only possible with SAML. Because the auth provider doesn’t initiate the request in this case, it has no outstanding request to match the response against, which makes it easier for attackers to inject or replay responses that appear to come from the identity provider. In spite of these vulnerabilities, larger B2B enterprise customers often request IdP-initiated SSO for the convenience of their users. To keep those customers safe, B2B companies that permit IdP-initiated SSO in their app should research and protect against these vulnerabilities thoroughly.
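Because an IdP-initiated response has no InResponseTo value to check, one commonly recommended mitigation is to accept each assertion ID only once while it is still valid, so a captured response cannot simply be replayed. Below is a minimal in-memory sketch; the class name is hypothetical, a real deployment would need a store shared across servers, and it complements rather than replaces signature and Conditions validation.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class AssertionReplayCache:
    """Remembers assertion IDs until their NotOnOrAfter time so each can be used only once."""

    def __init__(self) -> None:
        self._seen: Dict[str, datetime] = {}

    def accept(self, assertion_id: str, not_on_or_after: datetime,
               now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        # Forget IDs whose assertions have expired; a Conditions check would reject those anyway.
        self._seen = {i: exp for i, exp in self._seen.items() if exp > now}
        if assertion_id in self._seen:
            return False  # replay: this assertion was already used
        self._seen[assertion_id] = not_on_or_after
        return True
```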

4. Stay current with the latest news and events about SAML vulnerabilities. 

The online community continually pokes holes in SAML and experiments with its vulnerabilities, in part because it is such an important cornerstone of B2B authentication and security. Because of this, you should never treat SAML as one-and-done, and should constantly be on the lookout for emerging news stories and vulnerabilities as they are discovered. For extra reading, check out engineer Ioannis Kakavas’ deep dive on a SAML assertion-related vulnerability he found in Microsoft Office 365. It’s a perfect demonstration of how this legacy protocol continues to get broken and improved as time goes on.

Why SAML for B2B auth

So if there is so much literature on the challenges of working with security assertion markup language, why would your B2B app use it? Is it really worth the headache?

We’d say yes, but with an important caveat. 

SAML is definitely worth offering as a B2B company. There are no signs of it going away any time soon, despite OIDC’s rising popularity, and the flexibility of its assertions to carry loads of detailed information remains highly desirable, especially for bigger enterprise clients with custom security and identity management needs.

But just because SAML is worth offering doesn’t necessarily mean it’s worth building in house, and it also doesn’t mean every auth provider that offers it is created equal. At Stytch, whenever we’re evaluating a vendor (or even our own product features), we ask two critical questions:

  • Is this going to accelerate a developer’s ability to do their best work (and thus improve / optimize our product), or will it introduce roadblocks or complications that will slow them down and/or divert their attention?
  • Will this investment scale with us as we grow? Or is this a temporary solution that we will have to reevaluate and/or replace after we scale?

When talking with customers, we too often see amazing and promising B2B companies try to build SAML SSO in-house, or go with a provider that just checks a box but doesn’t offer the flexibility or features the B2B team will need after hitting certain milestones. Both choices end up draining in-house engineering time and talent far more than the company wanted or anticipated, and then require even more time to rip and replace.

So if you’re thinking about how to uplevel your B2B product with SSO (OIDC or SAML), we’d love to talk to you about future-proofing your product with best-in-class SSO from Stytch. 
