
Implement authentication. #15

Closed
campb303 opened this issue Aug 6, 2020 · 22 comments
Labels
feature-request Request for functionality that has not already been implemented

Comments

@campb303
Collaborator

campb303 commented Aug 6, 2020

webqueue2 does not have any form of authentication at this time. The original webqueue had HTTP protocol level authentication at the browser level that looks like this:

HTTP Auth in Chrome

This type of authentication cannot be accessed by screen readers or password managers. To address this, one of two options should be chosen and moved forward with:

  • Application level authentication like most web apps have with a login form or
  • Purdue CAS aka BoilerKey for a unified interface and two factor authentication
@campb303
Collaborator Author

campb303 commented Aug 6, 2020

After the first webqueue2 demo, Dave expressed interest in moving forward with CAS due to a need for two factor authentication. This should be followed up on.

@campb303 campb303 added api tooling Related to tools and utilities for the management of the project feature-request Request for functionality that has not already been implemented labels Aug 7, 2020
@benf

benf commented Aug 7, 2020

https://www.purdue.edu/securepurdue/identity-access/authentication-options.php

SAML and CAS both support BoilerKey.

@campb303 campb303 added this to the v1 milestone Sep 14, 2020
@campb303
Collaborator Author

Initial research suggests that using ITaP's SAML interface to BoilerKey may be the easiest method of two factor authentication. More experimentation is needed. Direct ActiveDirectory authentication is also an option. I need to ask @seth what he uses to interact with ActiveDirectory to see how that works under the hood.

@campb303
Collaborator Author

@seth authenticates against BoilerAD directly at boilerad.purdue.edu.

@campb303
Collaborator Author

campb303 commented Nov 4, 2020

For our beta release we need authentication. For the short term this can be achieved by authenticating with a shared username and password and authorizing with JWTs. Later implementations of authentication will be 2FA via SAML or BoilerKey.

The authentication/authorization workflow should work as follows:

  1. The user visits the site for the first time and enters the shared username and password.
  2. The login form hashes the password using SHA 256, never storing the plain text password.
  3. The login form sends the username and hashed password to the API.
{
  "username": form.username,
  "password": sha256(form.password)
}
  4. The API receives the username and hashed password. The hashed password is combined with a salt then rehashed using SHA 256.
username = request.json.username
password = sha256(request.json.password + salt)
  5. If the resulting hash matches the stored hash, the API issues a new JWT for the user and sends it back to the client.
  6. The client stores the JWT in a cookie. When subsequent requests are made, the JWT is also sent.
  7. The API attempts to decode the JWT and verify its signature. If the token is valid and has not expired, the user is presumed valid and the request is served.
  8. When the client wishes to log out, the JWT is removed from cookies.
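
The hashing and comparison steps above can be sketched in Python as follows. This is a minimal illustration rather than the API's actual code: the function names are hypothetical, and it assumes the salt and the stored hash are plain strings held by the API. The comparison uses hmac.compare_digest so that it runs in constant time.

```python
import hashlib
import hmac


def client_side_hash(password: str) -> str:
    """What the login form sends: the SHA-256 hex digest of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(received_hash: str, salt: str) -> str:
    """What the API compares: SHA-256 of the received hash combined with a salt."""
    return hashlib.sha256((received_hash + salt).encode("utf-8")).hexdigest()


def password_matches(received_hash: str, salt: str, stored_hash: str) -> bool:
    """Compare in constant time so timing does not reveal how much matched."""
    return hmac.compare_digest(salted_hash(received_hash, salt), stored_hash)
```

Note that with this scheme the client-side hash effectively becomes the password on the wire, so the login request still needs to travel over HTTPS.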

webqueue2 Authentication Diagram


To achieve these the following things need to happen in this order:

Implement .env File Support In The API

JWTs require a secret key for signing. This should not be stored in version control. Instead, the JWT secret key and other sensitive values can be stored in .env files that are ignored by version control.

dotenv is a Python package that will load entries from .env files into the os.environ namespace. These variables can then be got and set with os.environ.get(key) and os.environ[key] = value respectively.
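
As a rough sketch of how the API might read the secret once it is in the environment, the helper below fails loudly when the value is missing instead of silently signing tokens with an empty key. The variable name JWT_SECRET_KEY and the function name are assumptions made for illustration; neither is defined anywhere in this issue.

```python
import os


def get_jwt_secret(environ=os.environ) -> str:
    """Return the JWT signing secret, raising if it is not configured.

    JWT_SECRET_KEY is a hypothetical variable name used for illustration.
    """
    secret = environ.get("JWT_SECRET_KEY", "")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY is not set; add it to the .env file")
    return secret
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real process environment.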

Implement JWT support in the API

We are currently using Flask and FlaskRESTful to create our API.

Within this environment we can add JWT support via a low level library like pyjwt and use an authentication decorator as shown in this YouTube video.

Important: This video suggests storing the JWT private key directly in source code. This is insecure as it would expose the key via version control. The key should be stored in an environment variable file outside of version control with separate keys for development and production.

The generated JWT is sent back to the client as a dictionary with a key of token and a value of the JWT:

{ "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c" }
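
To make the three dot-separated parts of a token like the one above less opaque, here is a standard-library-only sketch of HS256 signing and verification. It is meant to illustrate what a library such as pyjwt does internally, not to replace one; the function names are hypothetical and only the exp claim is checked.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_token(payload: dict, secret: str) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Verify the signature and expiry, then return the payload."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    signing_input = f"{parts[0]}.{parts[1]}"
    expected = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(parts[2])):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(parts[1]))
    current_time = time.time() if now is None else now
    if "exp" in payload and current_time >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The payload is only encoded, not encrypted, so anything placed in it is readable by whoever holds the token.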

Flask specific JWT libraries could be considered such as Flask JWT Extended or Flask-Praetorian.

Note: This article suggests that there should be two types of tokens:

  • Access Token: used for short term resource authorization and stored in memory for the running application.
  • Refresh Token: used for the long term ability to get more access tokens and stored in a cookie with the flags httpOnly, secure, and SameSite=strict set.

At this time, it appears that a short term access token alone will be sufficient, but refresh tokens should be implemented at a later date.

Build A Login Page

We are currently using react-router to control which items get displayed in the frontend. There are plans to implement cookie based queue storage using react-cookie.

With JWT support in the API, we can retrieve a JWT by sending a valid username and SHA 256 hashed password. This can be achieved by a controlled login form sending the authentication data then storing the JWT in a cookie. It's important that the cookie path be set to root / so that the JWT is sent back to the server on every request.

Make react-router Authentication Aware

We are currently using react-router to control which items get displayed in the frontend.

react-router needs to be made aware of authentication so that the following things happen:

  • If a user does not have an unexpired JWT, redirect to the login page.
  • If a user logs out, they'll be redirected to the login page.
  • If a user has left their page open past their JWT expiration date, redirect to the login page on next page load.

This can be achieved by:

  • Creating a custom PrivateRoute component extending react-router's Route to manage authentication status.
  • Creating a custom context manager and hooks to make authentication data available without prop drilling.

Both of these are discussed in this article.

@campb303
Collaborator Author

campb303 commented Nov 5, 2020

Support for .env files has been added to the API through the dotenv package. This is how it works:

Getting and (temporarily) setting environment variables during runtime is already available within Python scripts via the os module's environ mapping like this:

import os

# Get the USER environment variable
print( os.environ.get("USER") ) # campb303

# Set the LOL_BUTTS environment variable to nyan-cat
# Setting this variable does not persist outside of this script
os.environ["LOL_BUTTS"] = "nyan-cat"
print( os.environ.get("LOL_BUTTS") ) # nyan-cat

To set and load our own environment variables, we can create a file (typically named .env) in the same directory with key value pairs then, in our script, import the dotenv module and run its load_dotenv() function.

# ./.env
LOL_BUTTS=nyan-cat
test=123

# ./script.py
import os, dotenv

# Load environment variables from .env
dotenv.load_dotenv()

# Get the LOL_BUTTS environment variable
print( os.environ.get("LOL_BUTTS") ) # nyan-cat

# Get the test environment variable
print( os.environ.get("test") ) # 123

dotenv's load_dotenv() function can be customized as detailed on the project's GitHub page. For our purposes, the default behavior should suffice.

Any part of our Python codebase can utilize .env files by importing the dotenv module and running its load_dotenv() function.

@campb303
Collaborator Author

Full JWT support has been added to the API. There are two types of tokens:

  • Access Tokens: valid for 15 minutes and used to send and receive data to and from the API.
  • Refresh Tokens: valid for 30 days and used to get new access tokens.

JWT usage depends on the API endpoint. There are three types of API endpoints:

  • Public: These endpoints are not authenticated and can be accessed by anyone.
    Example: /login
  • Access Token Restricted: These endpoints require a non-expired access token to be present. Refresh tokens will not be accepted.
    Example: /api/get_queues
  • Refresh Token Restricted: These endpoints require a non-expired refresh token to be present. Access tokens will not be accepted.
    Example: /tokens/refresh
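
The endpoint rules above can be summarized as a small decision function. This is only a sketch: it assumes each decoded token carries a type claim ("access" or "refresh") and an exp claim, and the endpoint kind names and function name are hypothetical labels rather than anything from the API's code.

```python
from typing import Optional

# Lifetimes stated above, in seconds
ACCESS_TOKEN_LIFETIME = 15 * 60
REFRESH_TOKEN_LIFETIME = 30 * 24 * 60 * 60

# Which token type each restricted endpoint kind accepts
REQUIRED_TOKEN_TYPE = {
    "access_restricted": "access",
    "refresh_restricted": "refresh",
}


def token_is_accepted(endpoint_kind: str, claims: Optional[dict], now: float) -> bool:
    """Decide whether a request may proceed for the given endpoint kind."""
    if endpoint_kind == "public":
        return True  # public endpoints need no token at all
    if claims is None:
        return False  # restricted endpoints always need a token
    right_type = claims.get("type") == REQUIRED_TOKEN_TYPE[endpoint_kind]
    not_expired = now < claims.get("exp", 0)
    return right_type and not_expired
```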

Interacting with the API using a JWT based workflow would work like this:

First Time Login (Authentication):

Send a POST request to the /login endpoint with a JSON body containing a username and password:

fetch(
    "/login", 
    {
        method: "POST", 
        headers: { 'Content-Type': 'application/json'},
        // Serialize the credentials; `username` and `password` come from the login form
        body: JSON.stringify({
            username: username, 
            password: password
        })
    }
);

The API will validate the username and password then return an access token in the body of the response.

{
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2MDQ3Nzk2MDUsIm5iZiI6MTYwNDc3OTYwNSwianRpIjoiY2EzMWMxMzAtNDU5OC00MTBkLThhNzEtM2JmMzhkODc5OTljIiwiZXhwIjoxNjA0NzgwNTA1LCJzdWIiOiJ3cTJCZXRhIiwiZnJlc2giOmZhbHNlLCJ0eXBlIjoiYWNjZXNzIiwiY3NyZiI6IjYzZTBjOWI3LTY5NzgtNDIwMy1iMjVmLTNhMTIzNWU3YzAzMSJ9.RsuZe0C8CMZvNlcLHRptWXwmj0RGNhW6YYylNbRMjco"
}

The API will also attach two cookies to the response:

  • refresh_token_cookie: containing the refresh token used to get new access tokens. This is an HttpOnly cookie, meaning the client cannot read it but the API will receive it on subsequent requests.
  • csrf_refresh_token: containing a validation string used to confirm the refresh token. This must be manually sent back to the API as a header named X-CSRF-TOKEN when getting new access tokens.

Refreshing Access Tokens

After the first login, new access tokens should be retrieved before the old access token expires to avoid errors.

Send a POST request to the /tokens/refresh endpoint with a header named X-CSRF-TOKEN containing the value of the csrf_refresh_token cookie:

fetch(
    "/tokens/refresh", 
    {
        method: "POST", 
        headers: {"X-CSRF-TOKEN": "8a32d817-8f77-42d1-94f2-e6876cb436a4"}
    }
);

The API will validate the refresh token with the CSRF string and return a new access token:

{
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2MDQ3Nzk2MDUsIm5iZiI6MTYwNDc3OTYwNSwianRpIjoiY2EzMWMxMzAtNDU5OC00MTBkLThhNzEtM2JmMzhkODc5OTljIiwiZXhwIjoxNjA0NzgwNTA1LCJzdWIiOiJ3cTJCZXRhIiwiZnJlc2giOmZhbHNlLCJ0eXBlIjoiYWNjZXNzIiwiY3NyZiI6IjYzZTBjOWI3LTY5NzgtNDIwMy1iMjVmLTNhMTIzNWU3YzAzMSJ9.RsuZe0C8CMZvNlcLHRptWXwmj0RGNhW6YYylNbRMjco"
}
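
For scripts or tests that talk to the API outside of a browser, the same refresh request can be sketched in Python with the standard library. A browser attaches refresh_token_cookie automatically; a script has to send it explicitly. The function name and the base URL passed to it are hypothetical, and the request is only built here, not sent.

```python
import urllib.request


def build_refresh_request(base_url: str, refresh_token: str, csrf_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /tokens/refresh."""
    return urllib.request.Request(
        url=f"{base_url}/tokens/refresh",
        method="POST",
        headers={
            # A browser sends this cookie on its own; a script must attach it
            "Cookie": f"refresh_token_cookie={refresh_token}",
            # Value copied from the csrf_refresh_token cookie
            "X-CSRF-TOKEN": csrf_token,
        },
    )
```

Sending it would be a matter of passing the request to urllib.request.urlopen() and reading access_token from the JSON response body.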

Access Token Usage (Authorization)

When interacting with access token restricted endpoints, an unexpired access token must be sent in an authorization header:

fetch(
    "/ce/100", 
    {
        method: "POST", 
        headers: {"Authorization": "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2MDQ3Nzk2MDUsIm5iZiI6MTYwNDc3OTYwNSwianRpIjoiY2EzMWMxMzAtNDU5OC00MTBkLThhNzEtM2JmMzhkODc5OTljIiwiZXhwIjoxNjA0NzgwNTA1LCJzdWIiOiJ3cTJCZXRhIiwiZnJlc2giOmZhbHNlLCJ0eXBlIjoiYWNjZXNzIiwiY3NyZiI6IjYzZTBjOWI3LTY5NzgtNDIwMy1iMjVmLTNhMTIzNWU3YzAzMSJ9.RsuZe0C8CMZvNlcLHRptWXwmj0RGNhW6YYylNbRMjco"}
    }
);

The API will validate the access token and respond with the requested data:

{
    "queue": "ce",
    "number": 100,
    "lastUpdated": "2020-09-28T13:26:00-0400",
    "headers": [ ],
    "content": [ ],
    "isLocked": "ce 100 is locked by knewell using qvi",
    "userEmail": "campb303@purdue.edu",
    "userName": "Justin Campbell",
    "userAlias": "campb303",
    "assignedTo": "campb303",
    "subject": "Beepboop",
    "status": "Dont Delete",
    "priority": "",
    "department": "",
    "building": "",
    "dateReceived": "2020-06-23T13:25:51-0400"
}

Renewing Refresh Tokens

When a refresh token expires, another login must happen.

@campb303
Collaborator Author

campb303 commented Nov 12, 2020

Managing authentication within the frontend using state variables would require extraneous prop drilling (passing state variables through intermediate components for access down the tree) and would be brittle to refactor, since every prop reference would need to be changed. A better way to manage this would be with a React Context.

React Context allows data to be arbitrarily passed through a component tree without prop drilling. Contexts are made of three pieces:

  • A Provider component that wraps a subtree, making the values passed to it available.
  • A Consumer component that can exist anywhere in a wrapped subtree to access the provider values.
  • The useContext() hook that accesses the values provided by a Provider inside of a Consumer.

Making use of a context requires three steps:

  1. Create and place a Provider
  2. Create and place a Consumer
  3. Reference the provided data in a consumer with useContext

(Codevolution on YouTube has a three part video series that visualizes the use of a context well. See Part 1, Part 2 and Part 3)

Example:

import React, { createContext } from "react";

// Create the context once, outside of any component, so it is not recreated on every render
const AuthContext = createContext(false);

// Create components for the component tree; each one renders its children
const CompA = ({ children }) => <div>{children}</div>;
const CompB = ({ children }) => <div>{children}</div>;
const CompC = ({ children }) => <div>{children}</div>;

// Placeholder components standing in for the two authentication states
const AdminArea = () => <p>Admin Area</p>;
const Login = () => <p>Login</p>;

export default function App(){
    return (
        // Use the AuthContext provider with a `true` value to represent a user being logged in
        <AuthContext.Provider value={true}>
            <CompA>
                <CompB>
                    <CompC>
                        {/* Use the AuthContext consumer to reference the value */}
                        <AuthContext.Consumer>
                            {
                                // To use the value inside a consumer we must use render prop style rendering
                                // See: https://reactjs.org/docs/render-props.html
                                (isLoggedIn) => isLoggedIn ? <AdminArea /> : <Login />
                            }
                        </AuthContext.Consumer>
                    </CompC>
                </CompB>
            </CompA>
        </AuthContext.Provider>
    );
}

In the example above we created an AuthContext and used its Provider to pass a value down a component tree to be used inside its Consumer. As this article points out, there are ways to make this more convenient and allow for complicated logic to be controlled by the context. Using suggestions from this article, we can create helper components from the context Provider and Consumer as well as utilize custom hooks that wrap useContext() to create a more performant solution with easier granular access, all without prop drilling.

This is the method we'll use for managing authentication in the frontend.

@campb303
Collaborator Author

A timed event to refresh the access token needs to be implemented.
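
The timing for such an event can be derived from the access token's exp claim. The sketch below shows the calculation in Python for consistency with the API examples; the frontend would do the equivalent in JavaScript. The function name and the 60 second safety margin are assumptions, not values taken from the project.

```python
import base64
import json


def seconds_until_refresh(access_token: str, now: float, margin: float = 60.0) -> float:
    """Return how long to wait before refreshing, never less than zero.

    The payload is read without verifying the signature. That is acceptable
    for scheduling on the client but must never be used to trust a token.
    """
    payload_b64 = access_token.split(".")[1]
    # Restore the base64 padding that JWTs strip
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return max(0.0, claims["exp"] - margin - now)
```

With a 15 minute access token and a 60 second margin, a freshly issued token would be refreshed after 14 minutes.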

@campb303 campb303 added the high-priority Needs immediate extra focus label Nov 25, 2020
@campb303
Collaborator Author

campb303 commented Dec 1, 2020

Fixed in #130 . Closing.

@campb303 campb303 closed this as completed Dec 1, 2020
@campb303
Collaborator Author

ITaP's authentication options page says that I2A2 based systems have been deprecated since Jan 2019. The current webqueue authenticates against I2A2.

The other three options are:

  • CAS: An implementation of Apereo CAS. Provides Single Sign On w/ other Purdue apps and supports 2FA w/ BoilerKey. Works using cookie based ticketing similar to the JWTs already in place. Requires group approval.
  • SAML: Does everything CAS does with a higher level detail and extra checks.
  • LDAP: Only recommended if CAS/SAML are not options. Requires group approval.

At this point, it seems SAML is the best option moving forward as it provides all features and is recommended in place of the other two. More information on how it can be used is needed.

@campb303 campb303 reopened this Dec 14, 2020
@campb303 campb303 added the question Something that requires more information before moving forward label Dec 14, 2020
@campb303
Collaborator Author

campb303 commented Dec 14, 2020

Notes on how Shibboleth/SAML works:


What is Shibboleth/SAML?

Security Assertion Markup Language (SAML) is used to exchange authorization and authentication information between an Identity Provider (IdP) and Service Provider (SP).

An IdP is any source of truth for users and info about them. One popular IdP is Shibboleth. An SP is typically some application or service, like a banking website or Google Drive.

The use of SAML with an IdP and SP presents two particular benefits:

  • The need for each service provider to maintain its own store of users and info about them can be replaced with a single source of truth.
  • Users can use the single source of truth as Single Sign On (SSO) to login once and use many services. For example, when you log into Gmail, you can then visit other Google products like YouTube and Drive without logging in again.

How Does Shibboleth/SAML Work?

There are generally three components in play when using Shibboleth/SAML for authentication and/or authorization:

  • User Agent (UA): Generally the user's web browser.
  • Identity Provider (IdP): The source of truth for users and info about them. This could be something like Active Directory or a general LDAP server.
  • Service Provider (SP): The application or service requesting authentication and/or authorization for and/or of a user. This could be something like a web app or Electron based app.

There are two common workflows for Shibboleth/SAML based authentication/authorization routines:

  • IdP-Init: where the process is started by the Identity Provider.
    Example: one.purdue.edu
  • SP-Init: where the process is started by the Service Provider.
    Example: Brightspace

For webqueue2, we'll focus on the SP-Init process where webqueue2 is the service provider and Purdue's SAML interface is the Identity Provider. It works like this:

  1. User Agent attempts to access Service Provider.
  2. Service Provider redirects User Agent to Identity Provider for authentication.
  3. User Agent sends authentication data to Identity Provider and attempts authentication.
  4. If authentication is successful, Identity Provider generates a SAML Assertion for trust.
  5. Identity Provider sends the SAML Assertion to User Agent.
  6. User Agent forwards SAML Assertion to Service Provider to establish trust and start the session.

Shibboleth_SAML SP-Init Workflow


@campb303
Collaborator Author

After a talk w/ Sundeep, the decision was made to fall back to I2A2. We'll move webqueue2 to CAS or SAML if/when I2A2 dies.

@campb303
Collaborator Author

@benf mentioned LDAP as a possibility. @seth is already using LDAP for his applications without issue. @kellyst confirmed that anyone on the domain can interact with LDAP without issue. @sundeep approved the new direction.

Proceeding with LDAP testing using my own account. If it works, we should create an Active Directory account for webqueue2 to do auth independent of a user's account.

@campb303
Collaborator Author

Initial testing using EasyAD was inconclusive. I can't seem to log in while providing valid credentials for my own account. However, there isn't much logging available on my side.

I asked @seth about how he authenticates and the configuration seems similar, though there's a possibility that I need to make everything lowercase.

I've also reached out to Sean Kelly to double-check configuration settings as he is more familiar with BoilerAD than I am.

@campb303
Collaborator Author

campb303 commented Jan 6, 2021

EasyAD requires the following packages to be installed on the host machine:

  • libsasl2-dev
  • python3-dev
  • python3-pip
  • libldap2-dev
  • libssl-dev

With these packages installed on Ubuntu 18.04 and while connected to the WebVPN, I was able to successfully authenticate against BoilerAD. (With myself, @kellyst and @seth looking at the code, we're not sure why both the AD Server and AD Domain need to be boilerad.purdue.edu but that is what works.)

from easyad import EasyAD

config = dict(AD_SERVER="boilerad.purdue.edu",
              AD_DOMAIN="boilerad.purdue.edu")

ad = EasyAD(config)

user = ad.authenticate_user("campb303", password_redacted, json_safe=True)

Next steps are to:

  • Reduce info collected during authentication to make it faster. By default, a large amount of information is collected which makes lookup time slow but there are ways to reduce what is looked up.
  • Get @seth and @kellyst to look at the code for third party approval before requesting that dependencies get installed on templeton.
  • Get dependencies installed on templeton.
  • Integrate this logic into the API's login process.

@campb303
Collaborator Author

campb303 commented Jan 7, 2021

As @seth pointed out, EasyAD's authenticate_user function eventually calls its own search function when looking up users. The default LDAP filter for a search is quite flexible but inefficient, making user lookup take about 8 seconds.

By removing every attribute from the filter except sAMAccountName, the LDAP display name that corresponds to the career account username, user lookup is less flexible but about 400 times faster at <0.02 seconds. Since we're only looking users up by their career account username, the loss in flexibility should be fine.

# Default EasyAD Filter String
# Takes about 8 seconds for user lookup
filter_string = "(&(objectClass=user)(|(userPrincipalName={0})(sAMAccountName={0})(uid={0})(mail={0})" \
                "(distinguishedName={0})(proxyAddresses=SMTP:{0})))".format(escape_filter_chars(user_string))

# Optimized EasyAD Filter String:
# Takes about 0.02 seconds for user lookup
filter_string = "(&(objectClass=user)(|(sAMAccountName={0})))".format(escape_filter_chars(user_string))
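
Both filter strings depend on escape_filter_chars to keep user input from changing the meaning of the filter. As an illustration of what that escaping protects against, here is a simplified standard-library stand-in that follows the RFC 4515 escaping rules. The function names are hypothetical; real code should keep using python-ldap's escape_filter_chars.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special in an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(char, char) for char in value)


def build_user_filter(username: str) -> str:
    """Build the optimized filter that matches on sAMAccountName only."""
    return "(&(objectClass=user)(|(sAMAccountName={0})))".format(escape_filter_value(username))
```

Without the escaping, a username such as `*)(uid=*` would inject extra clauses into the filter instead of being treated as a literal value.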

There is no way to override the default filter string when using EasyAD's authenticate_user function but I can use the search function directly and override the filter string. This has the added benefit of binding to but not authenticating against AD meaning we can see if the password is valid without the user being logged in.


@seth also pointed out that the use of a Python equivalent to C#'s secure string would reduce the risk of plaintext passwords getting dumped in case of the Python interpreter crashing while a password is in memory. This should be looked into.


I also spoke with @kellyst and @sundeep about security implications of arbitrary user logins via webqueue2. We don't want administrative accounts to bind or authenticate to avoid potential unintended privilege escalation.

Next Steps

  • Look into Python equivalent of a secure string (or another way to avoid dumping plaintext passwords)
  • Add check to deny admin accounts (accounts that end with 'adm')
  • Set meeting w/ Curtis, Sundeep, Sean and/or Seth and myself to test authentication code on templeton.

@campb303
Collaborator Author

campb303 commented Jan 7, 2021

After further conversation with @cs and @kellyst, the logic for preventing admin account login has switched away from using a regex in the API to block usernames ending in 'adm' or 'admx'.

Instead, @kellyst created a new Active Directory group 00000227-ECN-webqueue that contains three groups:

  • ECN Staff
  • ECN Students
  • and a Misc group

This group cannot contain usernames that end in adm or admx, so it performs the same check but enforces it on Active Directory's side instead of webqueue2's.

@campb303
Collaborator Author

campb303 commented Jan 8, 2021

Auth code is ready. Need to check with @cs on getting dependencies installed on templeton.

from easyad import EasyAD
from ldap.filter import escape_filter_chars
from ldap import INVALID_CREDENTIALS as LDAP_INVALID_CREDENTIALS

def user_is_valid(username: str, password: str) -> bool:
    """Checks if user is valid and in webqueue2 login group.

    Args:
        username (str): Career account username.
        password (str): Career account passphrase.

    Returns:
        bool: True if user is valid, otherwise False.
    """

    # Check for empty arguments
    if (username == "" or password == ""):
        return False

    # Initialize EasyAD
    config = {
        "AD_SERVER": "boilerad.purdue.edu",
        "AD_DOMAIN": "boilerad.purdue.edu"
    }
    ad = EasyAD(config)

    # Prepare search criteria for Active Directory
    # Escape the username once so it cannot alter the LDAP filter
    escaped_username = escape_filter_chars(username)
    credentials = {
        "username": escaped_username,
        "password": password
    }
    attributes = [ 'cn', "memberOf" ]
    filter_string = f'(&(objectClass=user)(|(sAMAccountName={escaped_username})))'

    # Do user search
    try:
        results = ad.search(credentials=credentials, attributes=attributes, filter_string=filter_string)
    # pylint says this is an error but it works so ¯\_(ツ)_/¯
    except LDAP_INVALID_CREDENTIALS:
        return False

    # A successful bind that finds no matching user is treated as invalid
    if not results:
        return False
    user = results[0]

    # Isolate group names
    # Example:
    #   'CN=00000227-ECNStuds,OU=BoilerADGroups,DC=BoilerAD,DC=Purdue,DC=edu' becomes
    #   '00000227-ECNStuds'
    user_groups = [ group.split(',')[0].split('=')[1] for group in user["memberOf"] ]

    # Check group membership
    webqueue_login_group = "00000227-ECN-webqueue"
    if webqueue_login_group not in user_groups:
        return False

    return True
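
The group check at the end of the function can be exercised without an Active Directory connection. The sketch below pulls that step into two small helpers with hypothetical names, using the same split logic as above. It assumes group names do not contain escaped commas, which this simple parsing would not handle.

```python
from typing import Iterable, List

WEBQUEUE_LOGIN_GROUP = "00000227-ECN-webqueue"


def group_names(member_of: Iterable[str]) -> List[str]:
    """Return the CN value from each distinguished name in memberOf."""
    return [dn.split(",")[0].split("=")[1] for dn in member_of]


def in_login_group(member_of: Iterable[str]) -> bool:
    """True if any of the user's groups is the webqueue2 login group."""
    return WEBQUEUE_LOGIN_GROUP in group_names(member_of)
```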

@campb303
Collaborator Author

campb303 commented Jan 8, 2021

Request sent to software group.

@campb303
Collaborator Author

A screenshot of the ARS config for the previously mentioned AD groups.

Screen Shot 2021-01-08 at 9 39 50 AM

@campb303 campb303 removed frontend high-priority Needs immediate extra focus question Something that requires more information before moving forward tooling Related to tools and utilities for the management of the project labels Feb 5, 2021
@campb303 campb303 self-assigned this Feb 5, 2021
@campb303
Collaborator Author

campb303 commented Feb 5, 2021

Full Active Directory login support has been implemented in the API using EasyAD. This required building a custom version of PyLDAP that didn't require unused SASL libraries and automating this process in the venv-manager as described in this comment.

This is now being tracked in #169 . Closing.
