Implement support for webqueue2 on templeton #162

Closed
campb303 opened this issue Jan 13, 2021 · 11 comments
@campb303
Collaborator

Following a conversation today between @sundeep, @cs and me, some architectural changes will need to be made to webqueue2.

A core reason for rewriting webqueue2 is to address accumulated technical debt and produce a system that meets today's needs. This conflicts with the current state of ECN's web hosting infrastructure on templeton: webqueue2 has a WSGI-based API server, but templeton can only serve static files and CGI scripts. To move forward, support for CGI-based API interaction will need to be added.

This appears to be possible at the cost of performance and system resources. At its core, the webqueue2 API works via Python Flask which supports running as a CGI application according to the project's documentation. This would allow us to run the API within the constraints of templeton but there still needs to be a proxy from the Apache instance to the API.

The proxy may be possible via a rewrite rule using .htaccess files. These files can control the Apache config per directory and the rewrite engine can act as a proxy to a local port. This would require mod_rewrite to be enabled and while it isn't loaded with the current configuration, it is listed as a loadable module. I need to check with @cs about loading mod_rewrite.

More testing is needed but this may prove to be the best option available.

@campb303 campb303 added the api, tooling, question and high-priority labels Jan 13, 2021
@campb303 campb303 added this to the v1-readonly milestone Jan 13, 2021
@campb303 campb303 self-assigned this Jan 13, 2021
@campb303
Collaborator Author

I have a successful proof of concept for using .htaccess files to reverse proxy connections to the API.

The modules needed for this include:

  • mod_rewrite
  • mod_proxy

I can confirm that those modules are present by checking the loaded modules with httpd -M (ellipses denote that parts of the output have been removed for brevity):

campb303@templeton [~]
$ httpd -M
Loaded Modules:
 ...
 rewrite_module (shared)
 ...
 proxy_http_module (shared)
 ...
Syntax OK
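
The same check can be scripted. The sketch below is a hypothetical helper (the function name and sample text are illustrative, not part of webqueue2) that scans `httpd -M` style output for required modules. It assumes module lines look like the ones in the excerpt above and, by default, looks only for the two module names that excerpt shows.

```python
# Hypothetical helper, not part of webqueue2: report which required Apache
# modules are absent from `httpd -M` style output.
REQUIRED_MODULES = ("rewrite_module", "proxy_http_module")


def missing_modules(httpd_m_output, required=REQUIRED_MODULES):
    """Return the required module names that do not appear in the output."""
    loaded = set()
    for line in httpd_m_output.splitlines():
        parts = line.split()
        # Module lines look like " rewrite_module (shared)".
        if len(parts) == 2 and parts[1] in ("(shared)", "(static)"):
            loaded.add(parts[0])
    return [name for name in required if name not in loaded]


# Sample text mirroring the trimmed excerpt above; a real run would capture
# the output of `httpd -M` instead.
SAMPLE = """Loaded Modules:
 rewrite_module (shared)
 proxy_http_module (shared)
Syntax OK
"""

if __name__ == "__main__":
    print(missing_modules(SAMPLE))
```

An empty list means every module in `required` was found; any other module names can be passed through the `required` argument.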

With these modules loaded, I can create a .htaccess file in the directory I'd like to host webqueue2 from that:

  • Enables the rewrite module
  • Adds a rewrite rule to reverse proxy requests from ${HOSTNAME}/api/${REQUEST} to ${API}/${REQUEST}

For example, if I wanted to host webqueue2 from https://engineering.purdue.edu/webqueue/q2 I would create the file /web/groups/qweb/public_html/wq2/.htaccess and add the following lines:

# Enable the rewrite module
RewriteEngine On

# Reverse proxy all requests to engineering.purdue.edu/webqueue/q2/api/ to w2vm4 running the API
# See mod_rewrite docs: https://httpd.apache.org/docs/current/mod/mod_rewrite.html
# See RewriteRule and its P (proxy) flag docs: https://httpd.apache.org/docs/current/rewrite/flags.html#flag_p
# 128.46.154.134 is the IP for w2vm4
RewriteRule ^api/(.*)$ http://128.46.154.134:5000/api/$1 [P]
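
To make the mapping concrete, here is a small Python sketch of what the rule does to a request path. It is only an illustration of the pattern and substitution, not how Apache implements it; the function name is hypothetical. It relies on the fact that in a .htaccess file the pattern is matched against the path with the directory prefix already stripped, so a request for /webqueue/q2/api/ce/100 is seen as api/ce/100.

```python
import re

# Pattern and target copied from the RewriteRule above.
PATTERN = re.compile(r"^api/(.*)$")
BACKEND = "http://128.46.154.134:5000/api/"


def proxied_url(relative_path):
    """Return the backend URL for a per-directory path, or None when the
    rule does not match and Apache would serve the request itself."""
    match = PATTERN.match(relative_path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to the first capture group.
    return BACKEND + match.group(1)


if __name__ == "__main__":
    for path in ("api/ce/100", "index.html"):
        print(path, "->", proxied_url(path))
```

Paths that do not start with api/ fall through, which is why index.html is still served from the directory.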

At this point, the directory structure looks like this:

/web/groups/qweb/public_html/
└── wq2
    ├── .htaccess
    └── index.html        # The file to be served for non-API requests

Assuming the index.html file simply contains a greeting, if I were to visit https://engineering.purdue.edu/webqueue/q2 I would see something like:

[Screenshot: Screen Shot 2021-01-13 at 4 34 43 PM]

But if I request a valid API endpoint, like the item CE 100, I would see something like:

[Screenshot: Screen Shot 2021-01-13 at 4 35 00 PM]

With this solution in place, it would be possible to host the webqueue2 frontend as one server on Templeton and host the API as another server on Templeton with the reverse proxy pointing to the API server. This has the benefit of requiring no further modification to the API and the drawback of needing an automatable mechanism to start, stop and restart the API.

Further research needs to be done to see how running the API as a CGI script affects it.

@campb303
Collaborator Author

Using the Flask CGI handler I am able to run the API as a CGI script with very few changes. By combining the reverse proxy abilities previously mentioned and adding .htaccess options to enable CGI script execution, all technical limitations can be overcome.

Assuming I wanted to serve webqueue2 from https://engineering.purdue.edu/webqueue/q2 which corresponds to /web/groups/qweb/public_html/webqueue/q2, I first place the needed files in the same directory:

  • .env
  • ECNQueue.py
  • api.py
  • requirements.txt

Now I need to create a Python virtual environment for the API to run from by running:

campb303@templeton [~]
$ python3 -m venv venv

This creates the venv folder, which contains an activation script and lets me install the Python packages listed in the requirements.txt file. To set this up I run:

# Activate the Python virtual environment
source venv/bin/activate

# Update to the latest version of pip
pip install --upgrade pip

# Install the project dependencies
pip install -r requirements.txt

# Deactivate the Python virtual environment
deactivate

Now with a virtual environment and the needed Python packages, I can modify the API module api.py to be run as a CGI script. First I import the CGI handler then I replace the Flask app.run() with a CGI handler run and pass the Flask app as the argument:

# api.py
...
from wsgiref.handlers import CGIHandler
...
if __name__ == "__main__":
    CGIHandler().run(app)
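
For anyone who wants to see what the handler emits without Apache or Flask in the loop, here is a minimal sketch using only the standard library. The stand-in `app` and the `run_once` helper are hypothetical; in api.py the Flask app plays the role of `app`. It uses `BaseCGIHandler`, the wsgiref class that takes its streams and environment explicitly, whereas `CGIHandler` reads them from the process (sys.stdin, sys.stdout, os.environ).

```python
import io
from wsgiref.handlers import BaseCGIHandler


def app(environ, start_response):
    """Stand-in WSGI app; in api.py this role is played by the Flask app."""
    body = ("path=" + environ.get("PATH_INFO", "")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def run_once(wsgi_app, cgi_environ):
    """Handle one CGI-style request and return the raw bytes written."""
    stdout = io.BytesIO()
    handler = BaseCGIHandler(io.BytesIO(), stdout, io.StringIO(), cgi_environ)
    handler.run(wsgi_app)
    return stdout.getvalue()


if __name__ == "__main__":
    raw = run_once(app, {"REQUEST_METHOD": "GET", "PATH_INFO": "/api/ce/100"})
    print(raw.decode("iso-8859-1"))
```

The response starts with a `Status:` line rather than an HTTP status line, which is the CGI convention the web server translates for the client.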

After this I change the name of the API module from api.py to api.cgi:

mv api.py api.cgi

With the API prepared to run as a CGI script and the virtual environment created, I can use an .htaccess file to configure a reverse proxy for the api/ endpoint by creating the file /web/groups/qweb/public_html/wq2/.htaccess and adding the following lines:

# Add CGI file handler
AddHandler cgi-script .cgi

# Allow CGI scripts to be run
Options +ExecCGI

# Enable the rewrite module
RewriteEngine On

# Reverse proxy all requests to the API to the API CGI script
# See mod_rewrite docs: https://httpd.apache.org/docs/current/mod/mod_rewrite.html
RewriteRule ^api/(.*)$ /web/groups/qweb/public_html/q2-cgi-test/api.cgi/api/$1 [L]

At this point the directory structure looks like this:

.
├── .env
├── .htaccess
├── ECNQueue.py
├── api.cgi
├── index.html
├── requirements.txt
└── venv

I have some questions about this setup that need to be tested moving forward:

  • How does the reverse proxy affect cookies both secure and otherwise? Is there any more configuration needed to preserve that functionality? (This can be tested by building and deploying the frontend.)
  • How much slower is the CGI script than the WSGI server? (This can be measured with time stamps before and after the CGI handler is called.)
  • The Apache error logs produce errors like: [Wed Jan 13 22:56:02 2021] [error] [client 76.179.18.240] client denied by server configuration: /web/groups/api. What does this mean? How can I rewrite the rewrite rules to avoid these errors?

@campb303
Collaborator Author

An initial test of loading both the frontend and the backend with the previously mentioned .htaccess configuration and CGI handler was unsuccessful but the fixes needed are possible as mentioned in this article. Those fixes are:

  • Modify the frontend router with a basename to automatically prepend the subdirectory path.
  • Modify fetch requests with the same basename so that API calls aren't made against the top level domain.
  • Modify the API so that all endpoints are prefaced with api/ so that the .htaccess rewrite rule will work.
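
The path logic behind the first two fixes can be sketched as follows. The frontend itself is JavaScript; this Python version only illustrates the idea, and `with_basename` is a hypothetical name rather than anything in the webqueue2 code.

```python
def with_basename(basename, path):
    """Join a deployment subdirectory and an app-relative path into one
    absolute URL path, tolerating stray or missing slashes."""
    parts = [part.strip("/") for part in (basename, path)]
    return "/" + "/".join(part for part in parts if part)


if __name__ == "__main__":
    # Served from a subdirectory, API calls must carry the prefix.
    print(with_basename("/webqueue/q2", "/api/ce"))
    # Served from the site root, the path is unchanged.
    print(with_basename("", "/api/ce"))
```

Using one basename value for both the router and the fetch calls keeps the two from drifting apart when the deployment directory changes.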

@campb303
Collaborator Author

Another consideration to keep in mind is that once the routing changes in the frontend and CGI modifications for the backend are made, the development environments of w2vm[1-5] will need to be modified to allow for testing with similar configurations to templeton.

@campb303
Collaborator Author

campb303 commented Jan 14, 2021

In response to the previously asked question:

How much slower is the CGI script than the WSGI server? (This can be measured with time stamps before and after the CGI handler is called.)

When measuring the time difference between running the API as a CGI script vs. WSGI application I found:

  • Retrieving a queue as a CGI script is 2.68 times slower than the WSGI application.
  • Retrieving an item as a CGI script is 3.27 times slower than the WSGI application.

Tests were run by loading a page for the API as a CGI script and a page for the API as a WSGI application then defining timing functions in the browser's developer tools. These timing functions measure the time from making a call to the API to resolving the resultant JSON. (For simplicity, JWT authorization was temporarily disabled.) The timing functions were run 5 times in quick succession and the times were averaged before being compared.

Below are the timing functions run. Note that the path used in the fetch call differed between the CGI and WSGI tests because of environmental differences but should not affect the execution time:

let timedQueueResolve = async () => {
    let start = performance.now();
    let ceQueue = await fetch("/api/ce");
    let ceQueueJson = await ceQueue.json();
    let end = performance.now();
    console.log(`Fetching the CE queue took about ${(end - start)/1000} seconds.`);
}

let timedItemResolve = async () => {
    let start = performance.now();
    let ceQueue = await fetch("/api/ce/1");
    let ceQueueJson = await ceQueue.json();
    let end = performance.now();
    console.log(`Fetching CE 1 item took about ${(end - start)/1000} seconds.`);
}

The results were as follows:

# Retrieving CE queue via WSGI (avg 1.421536):
Fetching the CE queue took about 1.4349899999797344 seconds.
Fetching the CE queue took about 1.4041049999650568 seconds.
Fetching the CE queue took about 1.4088099999935366 seconds.
Fetching the CE queue took about 1.4366900000022724 seconds.
Fetching the CE queue took about 1.423084999958519 seconds.

# Retrieving CE queue via CGI (avg 3.809544):
Fetching the CE queue took about 3.865899999975227 seconds.
Fetching the CE queue took about 3.674745000025723 seconds.
Fetching the CE queue took about 3.524184999987483 seconds.
Fetching the CE queue took about 3.6421099999570288 seconds.
Fetching the CE queue took about 4.340780000027735 seconds.

# Retrieving CE 1 item via WSGI (avg 0.604347):
Fetching CE 1 item took about 0.6257249999907799 seconds.
Fetching CE 1 item took about 0.5444949999800883 seconds.
Fetching CE 1 item took about 0.5278749999706633 seconds.
Fetching CE 1 item took about 0.6280999999726191 seconds.
Fetching CE 1 item took about 0.6955399999860674 seconds.


# Retrieving CE 1 item via CGI (avg 1.973271):
Fetching CE 1 item took about 1.8826700000208803 seconds.
Fetching CE 1 item took about 1.5634200000204146 seconds.
Fetching CE 1 item took about 2.095244999974966 seconds.
Fetching CE 1 item took about 1.9756899999920279 seconds.
Fetching CE 1 item took about 2.3493299999972805 seconds.
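
For reference, the averages and slowdown factors quoted above can be recomputed from the raw timings with a short script. The values below are copied from the results and rounded to six decimal places; the variable and function names are illustrative.

```python
from statistics import mean

# Timings in seconds, copied from the results above (rounded to 6 places).
wsgi_queue = [1.434990, 1.404105, 1.408810, 1.436690, 1.423085]
cgi_queue = [3.865900, 3.674745, 3.524185, 3.642110, 4.340780]
wsgi_item = [0.625725, 0.544495, 0.527875, 0.628100, 0.695540]
cgi_item = [1.882670, 1.563420, 2.095245, 1.975690, 2.349330]


def slowdown(cgi_times, wsgi_times):
    """Ratio of the mean CGI time to the mean WSGI time."""
    return mean(cgi_times) / mean(wsgi_times)


if __name__ == "__main__":
    print(f"queue: {slowdown(cgi_queue, wsgi_queue):.2f}x slower as CGI")
    print(f"item:  {slowdown(cgi_item, wsgi_item):.2f}x slower as CGI")
```

With only five samples per case, taken in quick succession from a browser, these ratios are best read as rough indications rather than precise benchmarks.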

@campb303
Collaborator Author

In response to the previously asked question:

The Apache error logs produce errors like: [Wed Jan 13 22:56:02 2021] [error] [client 76.179.18.240] client denied by server configuration: /web/groups/api. What does this mean? How can I rewrite the rewrite rules to avoid these errors?

It appears that this issue was caused when the rewrite rule from the .htaccess file was using a file system path to the CGI script as opposed to a URI relative to the TLD. The errors have stopped with the following changes:

# .htaccess

# DOES NOT WORK
# Rewrite using a file system path
RewriteRule ^api/(.*)$ /web/groups/qweb/public_html/q2/api.cgi/api/$1 [L]

# WORKS
# Rewrite using a URI relative to the TLD:
RewriteRule ^api/(.*)$ /webqueue/q2/api.cgi/api/$1 [L]

@campb303
Collaborator Author

I talked to @sundeep and proposed running the API as a separate web entity, with API requests made to the webqueue2 frontend reverse proxied to the WSGI server. He signed off on that.

I need to get this running to show @cs that it can work with a cron job and an init script but no larger changes to management tools or established process.

@campb303
Collaborator Author

campb303 commented Jan 22, 2021

Previously I had thought that the EasyAD library required libsasl because its GitHub page noted it needed to be installed. This is not the case. libsasl is an optional dependency for OpenLDAP and likely listed for install with EasyAD for a "batteries included" installation.

I'd attempted to install EasyAD within a virtual environment on one of the development machines before without success and instead of reading through the error message I assumed it was a lack of libsasl. Upon further inspection, what was missing was the C header file for the BER encoder/decoder for OpenLDAP. To solve the issue, we would only need to install libldap-dev on the development machines.

Knowing that OpenLDAP was already installed on templeton, I assumed that these header files were installed and tried to run the existing EasyAD implementation directly on templeton. It worked!

In short, Curtis was right that we do not need libsasl and I was wrong to assume it was the issue. In reality we need OpenLDAP dev files installed on the development machines and those files are already installed on templeton. Everything else in this thread still applies for .htaccess proxy and CGI scripts. I still think it's best to proceed with the API as a WSGI application and proxy connections to it.

See also: Customizing the Install of PyLDAP

@campb303 campb303 changed the title from "Implement support for CGI" to "Implement support for API on templeton" Jan 26, 2021
@campb303
Collaborator Author

Curtis responded to my request for the OpenLDAP dev files to be installed on webqueue2 development machines saying those were already there. He was right as shown below:

campb303@w2vm4 [~]
$ dpkg-query -l | grep -i ldap2-dev
ii  libldap2-dev:amd64                            2.4.45+dfsg-1ubuntu1.8                           amd64        OpenLDAP development libraries

While trying to reproduce the BER encoder/decoder error I found that I had been experimenting on Pier -- not the webqueue2 development machines. I then attempted to install pyldap in a virtual environment on one of the dev machines and got an error that SASL headers were not found:

...
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DHAVE_SASL -DHAVE_TLS -DHAVE_LIBLDAP_R -DHAVE_LIBLDAP_R -DLDAPMODULE_VERSION=3.3.1 -DLDAPMODULE_AUTHOR=python-ldap project -DLDAPMODULE_LICENSE=Python style -IModules -I/home/pier/e/campb303/error-repoduce/include -I/usr/include/python3.6m -c Modules/LDAPObject.c -o build/temp.linux-x86_64-3.6/Modules/LDAPObject.o
    Modules/LDAPObject.c:16:10: fatal error: sasl/sasl.h: No such file or directory
     #include <sasl/sasl.h>
              ^~~~~~~~~~~~~
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/pier/e/campb303/error-repoduce/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-bbst1134/python-ldap_29f55b471d414dcc97f77ad4af5e0eba/setup.py'"'"'; __file__='"'"'/tmp/pip-install-bbst1134/python-ldap_29f55b471d414dcc97f77ad4af5e0eba/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-qvcw2d_m/install-record.txt --single-version-externally-managed --compile --install-headers /home/pier/e/campb303/error-repoduce/include/site/python3.6/python-ldap Check the logs for full command output.

Knowing that pyldap can be built from source without SASL support, I did so successfully with the following steps:

  1. Download and unarchive the latest pyldap source from GitHub (v3.3.1):
wget https://github.com/python-ldap/python-ldap/archive/python-ldap-3.3.1.zip
unzip python-ldap-3.3.1.zip
# Remove the archive.
rm python-ldap-3.3.1.zip
# Enter the source code directory.
cd python-ldap-python-ldap-3.3.1/
  2. Edit the defines setting in the setup.cfg file to remove SASL support:
nano setup.cfg
# setup.cfg

#defines = HAVE_SASL HAVE_TLS HAVE_LIBLDAP_R
defines = HAVE_TLS HAVE_LIBLDAP_R

Note: Simply omitting the sasl2 entry from the libs setting as suggested in the pyldap guide on installing from source did not work because the defines setting still passes -DHAVE_SASL to the compiler (visible in the gcc command in the error output above).

  3. Create and activate a Python virtual environment for isolation:
python3 -m venv __venv__
source __venv__/bin/activate
  4. Build and install pyldap:
python setup.py build
python setup.py install
  5. Update pip and install EasyAD:
pip install --upgrade pip
pip install easyad

At this point I copied the Active Directory auth code and tested it with success. I'll need to automate the building of pyldap from source without SASL support and confirm this process works on templeton. I believe this should be a workable solution.
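
As a starting point for that automation, the setup.cfg edit could be scripted roughly as below. This is a sketch under assumptions: `strip_sasl_define` is a hypothetical helper, it only touches an uncommented defines line like the one shown above, and it has not been run against the real python-ldap setup.cfg.

```python
def strip_sasl_define(cfg_text):
    """Drop HAVE_SASL from the `defines` line of a setup.cfg and leave
    every other line, including commented-out ones, untouched."""
    out = []
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "defines":
            kept = [name for name in value.split() if name != "HAVE_SASL"]
            line = "defines = " + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Line taken from the manual edit above.
    before = "defines = HAVE_SASL HAVE_TLS HAVE_LIBLDAP_R\n"
    print(strip_sasl_define(before), end="")
```

A build script could read setup.cfg, pass its contents through this function, and write the result back before running the build and install steps.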

@campb303
Collaborator Author

Building pyldap without SASL support and running the auth code on templeton both work fine.

@campb303 campb303 changed the title from "Implement support for API on templeton" to "Implement support for webqueue2 on templeton" Jan 27, 2021
@campb303
Collaborator Author

campb303 commented Jan 27, 2021

@campb303 campb303 added this to To do in v1.0 Mar 8, 2021
v1.0 automation moved this from To do to Done Mar 15, 2021