
Package Structure #20

Closed
benne238 opened this issue Mar 29, 2021 · 18 comments · Fixed by #32

@benne238
Collaborator

Currently, there are only two modules that make up webqueue2_api:

  1. ECNQueue.py
  2. api.py

However, ECNQueue.py and api.py both have several moving parts that should be separated out into their own packages and/or modules.

ECNQueue.py has 3 distinct parts:

  1. The item class which stores the basic python structure of an item
  2. The queue class, which is a collection of multiple items
  3. The parser, which interacts with the items stored on pier

api.py has 5 distinct parts:

  1. Logging in, which allows an authorized user to access webqueue2
  2. Generating refresh tokens, which allow an authorized user to stay logged in for a longer period of time
  3. Getting the JSON representation of an item (from ECNQueue.py)
  4. Getting the JSON representation of a queue (from ECNQueue.py)
  5. Getting the JSON representation of all the queues and the number of items in each queue (from ECNQueue.py)

As a preliminary structure, it might make sense to make ECNQueue and api sub-packages of webqueue2_api, with their respective functions outlined above, broken into their own modules:

├─ setup.py
│
├───webqueue2_api
│   ├───api
│   │   ├───__init__.py
│   │   ├───login.py
│   │   ├───token_refresh.py
│   │   ├───get_item.py
│   │   ├───get_queue.py
│   │   └───get_queue_list.py
│   │     
│   └───webqueueapi
│       ├───__init__.py
│       ├───Item.py
│       ├───parser.py
│       └───Queue.py
│
└─ __init__.py
@benne238 benne238 self-assigned this Mar 29, 2021
@benne238 benne238 added the enhancement Request for a change to existing functionality label Mar 29, 2021
@campb303 campb303 added the high-priority Needs immediate extra focus label Mar 29, 2021
@campb303 campb303 added this to To do in v1.0 via automation Mar 29, 2021
@campb303 campb303 added this to the production-ready milestone Mar 29, 2021
@campb303
Collaborator

Some changes:

@campb303
Collaborator

[Image: PXL_20210331_183344547]

@campb303
Collaborator

campb303 commented Apr 1, 2021

Passing Config Options via Environment Variables

In order to keep ECNQueue separate from the API and let the end user configure ECNQueue without directly editing code, we need a way to pass config options from a management script to library code. We can do this using a combination of the python-dotenv library and package-level __init__.py files.

Because a package's __init__.py file runs on import inside the importing process, any relative file path it uses resolves against that process's current working directory (normally the directory the script or interpreter was started from), not against the package directory.

For example:
If we have a package called stork with a single __init__.py file inside it like this:

stork/
└── __init__.py

and the __init__.py references a file with a relative path like this:

# __init__.py

# A made up function
load_file("file.config")

then if the package is imported from a script called run.py, file.config needs to exist in the directory run.py is started from. This is the same directory as run.py when the script is launched from its own directory.

Because relative paths resolve this way, we can load environment variables from files next to a management script.
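The path behavior described above can be sketched as follows. This is a minimal illustration, not code from webqueue2: find_config and find_packaged_file are hypothetical helpers. The first resolves a name against the current working directory, the way a relative path passed to open() is resolved; the second shows how a package could instead anchor a path to its own location.

```python
import os
import tempfile
from pathlib import Path


def find_config(name: str = "file.config") -> Path:
    """Resolve a relative name against the current working directory,
    which is how a relative path passed to open() is resolved.
    """
    return Path.cwd() / name


def find_packaged_file(module_file: str, name: str) -> Path:
    """Alternative: anchor a path to the directory holding a module
    (pass the module's __file__) instead of the working directory.
    """
    return Path(module_file).resolve().parent / name


def demo() -> bool:
    """Create file.config in a temporary directory and look it up from there."""
    previous = Path.cwd()
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "file.config").write_text("KEY=value\n")
        os.chdir(tmp)
        try:
            return find_config().exists()
        finally:
            os.chdir(previous)  # restore the original working directory
```

Calling demo() finds the file only because the working directory was changed to the directory that holds it, which mirrors starting run.py from the directory that contains file.config.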

@campb303
Collaborator

campb303 commented Apr 1, 2021

Suggested Package Structure:

.
├── ECNQueue
│   ├── Item.py                         # Item Class
│   ├── Queue.py                        # Queue Class
│   ├── __init__.py                     # Load Environment, Store Globals (queue_dir, queues_to_ignore), re-export utils
│   ├── parser                          # Split current parser code
│   │   └── __init__.py
│   └── utils.py                        # Utility functions
└── api
    ├── __init__.py                     # Load Environment, Initialize Flask and API objects, Register Resources
    ├── __main__.py                     # argparse, start|stop|restart for api via gunicorn
    ├── auth.py                         # user_is_valid
    └── resources                       # Stores resources
        ├── __init__.py
        ├── item.py
        ├── login.py
        ├── queue.py
        ├── queue_list.py
        └── refresh_access_token.py

@benne238
Collaborator Author

benne238 commented Apr 3, 2021

Logger

We need to implement a logger in the top-level __init__.py so that it is accessible throughout the entire package. Due to time constraints, this will not be implemented tonight.
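One way a package-wide logger could look, using only the standard logging module. This is a sketch under assumptions: the logger name "webqueue2_api", the configure_logger helper and the format string are illustrative choices, not something the thread specifies.

```python
import logging

PACKAGE_LOGGER_NAME = "webqueue2_api"  # assumed to match the package name


def configure_logger(verbose: bool = False) -> logging.Logger:
    """Create (or fetch) the package-level logger.

    Intended to be called once from the top-level __init__.py.
    """
    logger = logging.getLogger(PACKAGE_LOGGER_NAME)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


package_logger = configure_logger(verbose=True)

# A sub-module would normally call logging.getLogger(__name__); the dotted name
# makes it a child of the package logger, so it inherits the level and handlers.
api_logger = logging.getLogger(PACKAGE_LOGGER_NAME + ".api")
```

Because loggers are looked up by name, sub-modules do not need to import the logger object itself; asking for a dotted child name is enough.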

@benne238
Collaborator Author

benne238 commented Apr 3, 2021

Custom exceptions

Several configuration values need to be set correctly. To report problems cleanly, it makes the most sense to define custom exception classes for the various errors that can occur when setting the api configuration values.

exceptions.py

class exceptionName(Exception):
    pass

caller.py

from exceptions import exceptionName
raise exceptionName("ope! It done broke")

The exceptions.py script creates an exception class, and the caller.py raises that exception. The resulting output will be this:

Traceback (most recent call last):
  File "caller.py", line 2, in <module>
    raise exceptionName("ope! It done broke")
exceptions.exceptionName: ope! It done broke
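Extending the example above, a small exception hierarchy lets callers catch every configuration problem with one except clause while still telling the cases apart. The class names and the validate_environment helper are hypothetical; only the dev and prod environment values come from this thread.

```python
class ConfigurationError(Exception):
    """Base class for every configuration problem (hypothetical name)."""


class MissingConfigValueError(ConfigurationError):
    """Raised when a required value was not provided."""

    def __init__(self, key: str):
        self.key = key
        super().__init__(f"required configuration value '{key}' is not set")


class InvalidConfigValueError(ConfigurationError):
    """Raised when a value was provided but is not acceptable."""

    def __init__(self, key: str, value: object, reason: str):
        self.key = key
        self.value = value
        super().__init__(f"invalid value {value!r} for '{key}': {reason}")


def validate_environment(value):
    """Accept only the two environments named in this thread: dev and prod."""
    if value is None:
        raise MissingConfigValueError("ENVIRONMENT")
    if value not in ("dev", "prod"):
        raise InvalidConfigValueError("ENVIRONMENT", value, "expected 'dev' or 'prod'")
    return value
```

A caller can wrap startup in a single `except ConfigurationError` block and print the message, since both subclasses share the base class.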

@benne238
Collaborator Author

benne238 commented Apr 4, 2021

configparser

configparser replaces our previous use of os.environ to read environment variables. We now have a structured config file that passes configuration to ECNQueue, webqueue2-api, and the Logger. The config file is named webqueue2-api.cfg and should be located next to the wrapper script:

[webqueue2_api]
JWT_SECRET_KEY = sshhhhhhh, its a secret!
ENVIRONMENT = dev

[ECNQueue]
QUEUES_TO_IGNORE = ["archives", "drafts", "inbox", "coral"]
QUEUE_DIRECTORY = /home/pier/e/queue/Mail

[Logger]
LOGGER_OUT_FILE = /tmp
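A minimal sketch of reading a file with this layout through the standard configparser module. The load_settings helper, the returned key names and the placeholder secret are assumptions made for illustration; the section and option names follow the config file shown above.

```python
import configparser
import json

# Same layout as the config file above, with a placeholder secret
SAMPLE_CONFIG = """
[webqueue2_api]
JWT_SECRET_KEY = example-secret
ENVIRONMENT = dev

[ECNQueue]
QUEUES_TO_IGNORE = ["archives", "drafts", "inbox", "coral"]
QUEUE_DIRECTORY = /home/pier/e/queue/Mail

[Logger]
LOGGER_OUT_FILE = /tmp
"""


def load_settings(text: str) -> dict:
    """Parse config text into a flat dictionary of settings."""
    parser = configparser.ConfigParser()
    # parser.read("webqueue2-api.cfg") would load the real file instead
    parser.read_string(text)
    return {
        "jwt_secret_key": parser.get("webqueue2_api", "JWT_SECRET_KEY"),
        "environment": parser.get("webqueue2_api", "ENVIRONMENT"),
        # configparser returns every value as a string, so the list is decoded as JSON
        "queues_to_ignore": json.loads(parser.get("ECNQueue", "QUEUES_TO_IGNORE")),
        "queue_directory": parser.get("ECNQueue", "QUEUE_DIRECTORY"),
        "logger_out_file": parser.get("Logger", "LOGGER_OUT_FILE"),
    }


settings = load_settings(SAMPLE_CONFIG)
```

Note that configparser rejects a file that declares the same section header twice when running in its default strict mode, so each section should appear only once.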

@benne238
Collaborator Author

benne238 commented Apr 11, 2021

A note on import statements in the webqueue2_api.api package

__main__.py, in context of the api sub-package, contains a simple script to run the api:

__main__.py:

from .__init__ import app

app.run()

To run the api via the __main__.py script, the python3 command can be used to run the sub-package directly:

python3 -m webqueue2_api.api

It is also possible to run the api with a wrapper script that works the same way; the big difference is that the wrapper script can contain any other code the user wishes to place in it:

wrapper.py:

from webqueue2_api.api import app

app.run()

Note that the first time a package is imported, its __init__.py runs, even when only a submodule or a single name is imported from it rather than the package itself.

The __main__.py script, by contrast, runs on import only if it is explicitly imported, like this:

wrapper.py:

from webqueue2_api.api import __main__

Note: since __main__ is being imported from webqueue2_api.api, the __init__.py scripts in both the webqueue2_api directory and the webqueue2_api/api directory will run.

In all three of the examples above, the output will be exactly the same and resemble this:

 * Serving Flask app "webqueue2_api.api.__init__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Issues with "duplicate" imports

With many different ways to start the api and many import statements running before any other code is executed, it is possible to end up running the api twice. For example, the following script runs the api twice:

wrapper.py

from webqueue2_api.api import app
from webqueue2_api.api import __main__
app.run()

output:

 * Serving Flask app "webqueue2_api.api.__init__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
^C * Serving Flask app "webqueue2_api.api.__init__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

In addition, __init__.py must not import __main__.py. If it did, then starting the api with python3 -m webqueue2_api.api or through a wrapper script would run __init__.py first, which would import and therefore run __main__.py. When the user exits that first server, the remaining code in __init__.py would execute and the main script would then run a second time, producing output similar to that shown above.
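A common Python idiom for avoiding this kind of double start is to guard the entry point with a check on __name__, so that importing __main__.py has no side effects. The sketch below is not from the webqueue2 code base: it builds a throwaway package called demo_pkg (a hypothetical name) in a temporary directory and imports it, recording which parts actually ran.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny package whose __main__.py guards its entry point."""
    package = root / "demo_pkg"
    package.mkdir()
    (package / "__init__.py").write_text("events = ['init ran']\n")
    (package / "__main__.py").write_text(
        "from . import events\n"
        "\n"
        "def main():\n"
        "    events.append('main ran')\n"
        "\n"
        "# True under 'python3 -m demo_pkg', False when this file is imported\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")  # runs __init__.py
        importlib.import_module("demo_pkg.__main__")  # guard keeps main() from running
        events = list(demo_pkg.events)
    finally:
        sys.path.remove(tmp)
```

With the guard in place, only __init__.py leaves a trace on import, while running the package with python3 -m still calls main().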

@benne238
Collaborator Author

How it works right now

At this point in time, it is possible to start the api with gunicorn from the command line.
BUT
There are significant limitations at this time: command line arguments passed via python3 -m webqueue2_api {args here} start-api are completely ignored, except for the verbosity flag -v.

It is possible to stop the api if it is running by running: python3 -m webqueue2_api stop-api

To effectively run the api with any desired configurations, you must create a config file that resembles this:
webqueue2-api.cfg:

[webqueue2_api]
JWT_SECRET_KEY = secretstuffhere
ENVIRONMENT = dev

[ECNQueue]
QUEUES_TO_IGNORE = ["archives", "drafts", "inbox", "coral"]
QUEUE_DIRECTORY = /home/pier/e/queue/Mail

[Logger]
LOGGER_OUT_FILE = /tmp/webqueue2_api.log

Note: make sure to name the file webqueue2-api.cfg

Then, run the following command in the same directory as the config file that was just created: python3 -m webqueue2_api start-api

If configuration options are omitted from the command line, the api falls back to checking the current directory for a file named webqueue2-api.cfg.

Following the above steps will create an api with the desired configurations.

Next steps

  • Argparse arguments are essentially ignored, and I have to figure out why.
  • I do not know how to start the api from a wrapper script, or whether it can be done at this point in time.
  • Gunicorn currently runs in the background due to the --daemon flag, as seen in the __main__.py module. However, this means logger output goes only to the log file and not to the screen.
  • Make the code prettier. There are a ton of moving parts, scripts referencing other scripts, and a global config file that stores information. All of this needs to be smoothed out because the current readability of the code leaves a lot to be desired.

@benne238
Collaborator Author

Big changes in the api:

The largest change in the api is the addition of a file that contains all of the possible configurations for the different areas of the api, including the logger, the api, and ECNQueue. However, this is currently being changed so that functions in the respective packages are called explicitly, instead of delaying when each package's __init__ file runs and having it pull values directly from the global config file.

One of the other large changes is that gunicorn references another script called start.py. Gunicorn is run in a subprocess, which loads part of the package with no knowledge of the already-running Python script, meaning changes made before gunicorn is called do not take effect. In its current implementation, the gunicorn subprocess calls a function in the start.py module, which returns a Flask object. While this is not optimal, it is the best way I have found to use gunicorn to run the api.
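The arrangement described above could look roughly like this. It is a sketch under several assumptions: the factory name create_app, the WEBQUEUE2_ environment variable prefix and the bind address are all hypothetical, and the "module:function()" target relies on gunicorn's documented application-factory form. Settings are handed to the subprocess through its environment because the child process shares no Python state with the parent.

```python
import os
import subprocess


def build_gunicorn_command(bind: str = "127.0.0.1:5000", daemon: bool = True) -> list:
    """Assemble the gunicorn argv.

    "webqueue2_api.start:create_app()" assumes a hypothetical factory
    function named create_app in the start.py module.
    """
    command = ["gunicorn", "--bind", bind]
    if daemon:
        command.append("--daemon")  # run in the background
    command.append("webqueue2_api.start:create_app()")
    return command


def build_child_environment(settings: dict) -> dict:
    """Copy the current environment and add settings as WEBQUEUE2_* variables
    (a hypothetical naming scheme) for the factory to read in the subprocess.
    """
    env = dict(os.environ)
    for key, value in settings.items():
        env["WEBQUEUE2_" + key.upper()] = str(value)
    return env


def start_api(settings: dict) -> subprocess.CompletedProcess:
    """Launch gunicorn; not called here because it needs gunicorn installed."""
    return subprocess.run(
        build_gunicorn_command(),
        env=build_child_environment(settings),
        check=True,
    )
```

Keeping the command and environment builders separate from start_api makes them easy to inspect without launching a server.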

@campb303 campb303 removed high-priority Needs immediate extra focus overdue labels Apr 28, 2021
@benne238
Collaborator Author

How to Use the API

It is possible to interact with the api in one of two ways: via the command line or via a wrapper script.

Command Line

Syntax:

python3 -m webqueue2_api {flags} {start-api, stop-api, restart-api}

Flags:

| Flag | Description | Default |
| --- | --- | --- |
| -j | Specifies the JWT_SECRET_KEY to use when starting the api | A random 16 character string |
| -d | Specifies where the queue directory is located | /home/pier/e/queue/Mail |
| -l | Specifies what file log output should go to | /tmp/webqueue2_api.cfg |
| -v | Specifies if the api should output debug messages to the terminal | False |
| -e | Specifies the environment (either prod or dev) and sets JWT_COOKIE_SECURE to True or False, respectively | prod |
| -i | Specifies which queues to ignore | ['archives', 'drafts', 'inbox', 'coral'] |
| -c | Specifies the config file to pull all of the above configurations from | |
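The flags above map naturally onto argparse. The following is a sketch rather than the project's actual parser: the dest names are borrowed from the wrapper script keyword arguments used elsewhere in this thread, the comma-separated format for -i is an assumption, and every value option defaults to None so that a later step can tell "not given" apart from an explicit value.

```python
import argparse


def comma_separated(value: str) -> list:
    """Turn "archives,drafts" into ["archives", "drafts"] (assumed input format)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="webqueue2_api")
    parser.add_argument("-j", dest="jwt_secret_key", default=None)
    parser.add_argument("-d", dest="queue_dir", default=None)
    parser.add_argument("-l", dest="log_file", default=None)
    parser.add_argument("-v", dest="verbose", action="store_true")
    parser.add_argument("-e", dest="environment", choices=["prod", "dev"], default=None)
    parser.add_argument("-i", dest="queues_to_ignore", type=comma_separated, default=None)
    parser.add_argument("-c", dest="config_file", default=None)
    # The action to perform comes after the flags
    parser.add_argument("command", choices=["start-api", "stop-api", "restart-api"])
    return parser


args = build_parser().parse_args(
    ["-v", "-e", "dev", "-i", "archives,drafts", "start-api"]
)
```

Leaving defaults out of the parser keeps the documented defaults in one place, to be applied only after flags and the config file have been consulted.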

Wrapper Script

Syntax:

# wrapper.py
from webqueue2_api import __main__

__main__.startApi(
    config_file="/path/to/config",
    log_file="/path/to/store/logs",
    jwt_secret_key="the secret key",
    environment="dev",  # or "prod"
    queue_dir="/path/to/queue/directory",
    queues_to_ignore=["queues", "to", "ignore"],
    verbose=True,  # or False
)


__main__.stopApi()

The Config File

The config file is an alternate way to specify all of the options that can be used with the api. It must use the following syntax:

[webqueue2_api]
JWT_SECRET_KEY = secretkey
ENVIRONMENT = dev

[ECNQueue]
QUEUES_TO_IGNORE = ["archives", "drafts", "inbox", "coral"]
QUEUE_DIRECTORY = /home/pier/e/queue/Mail

[Logger]
LOGGER_OUT_FILE = /tmp/webqueue2_api.log

The config file can store any and all arguments (except the config_file option) and can be used instead of, or in conjunction with, command line arguments or keyword arguments passed in a wrapper script.

The order of precedence is as follows:

  1. All flags passed on the command line or via keyword arguments are applied.
  2. If a valid path to a config file was provided, or a config file called webqueue2-api.cfg exists in the directory the api is started from, any flag/kwarg that was not specified or was specified incorrectly is set to the value given in the config file for that argument.
  3. If the provided config file path is not valid, the config file contains a syntax error, there is no webqueue2-api.cfg in the current directory, or any combination of the three, the default value for the argument is applied.
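The order of precedence above can be sketched as a small merge function. This is an illustration, not the project's implementation: resolve_options and the dictionary shapes are hypothetical, and the default values are taken from the flags table in this comment.

```python
# Defaults as documented in the flags table
DEFAULTS = {
    "environment": "prod",
    "queue_dir": "/home/pier/e/queue/Mail",
    "queues_to_ignore": ["archives", "drafts", "inbox", "coral"],
    "verbose": False,
}


def resolve_options(explicit: dict, from_file: dict) -> dict:
    """Pick each option from flags/kwargs first, then the config file, then defaults.

    A value of None (or a missing key) means "not specified" at that level.
    """
    resolved = {}
    for key, default in DEFAULTS.items():
        if explicit.get(key) is not None:
            resolved[key] = explicit[key]
        elif from_file.get(key) is not None:
            resolved[key] = from_file[key]
        else:
            resolved[key] = default
    return resolved


options = resolve_options(
    explicit={"environment": "dev", "queue_dir": None},
    from_file={"environment": "prod", "queue_dir": "/tmp/queues"},
)
```

Here environment comes from the explicit arguments, queue_dir from the config file, and the remaining options from the defaults. An unreadable or missing config file would simply be passed in as an empty dictionary.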

@campb303 campb303 linked a pull request May 6, 2021 that will close this issue
@campb303 campb303 modified the milestones: production-ready, write-access May 17, 2021
@campb303 campb303 added the high-priority Needs immediate extra focus label May 17, 2021
@campb303 campb303 assigned campb303 and unassigned benne238 May 24, 2021
@campb303
Collaborator

Work for this is now being tracked in the refactor-to-module-layout branch.

@campb303
Collaborator

The project layout is currently monolithic and difficult to maintain. It's the result of adding features as needed without the foresight of package

webqueue2-api/
├── ECNQueue.py     # Parser and data classes (Item/Queue)
├── README.md       # Intro to the project, appears on GitHub
├── __init__.py     # Package identifier
├── api.py          # API and Authentication (gunicorn and EasyAD)
├── docs            # Documentation
└── setup.py        # Package setup script

To facilitate code splitting and a scalable structure, the layout will be modified to match the following:

webqueue2-api/
├── README.md               # Intro to the project, appears on GitHub
├── docs                    # Documentation
├── setup.py                # Package setup script
└── src                     # Contains all packages/modules
    └── webqueue2-api       # The new source code root for the package
        └── __init__.py

This new structure presents several benefits:

  1. Prevents importing code from the root directory, forcing the installed package to be used and replicating the end user experience.
  2. Future packages can be added to the project by placing them in the src/ directory, allowing for expandability.
  3. Project utilities can now be distinctly separated from project source code just by being outside the src/ directory.

@campb303
Collaborator

"webqueue2-api" contains a hyphen which is against package naming conventions. Following suit with GitHub, the package will be renamed to "webqueue2api".

@campb303
Collaborator

By default, importing a symbol from a module in a package requires a fully qualified namespace. For example, with the following package structure:

animals/
├── __init__.py
└── dog.py       # Contains Dog class

To access the Dog class in the dog module within the animals package I would have to do one of the following:

# Fully qualified namespace
# (the import statement can only name packages and modules, not classes)
import animals.dog
animals.dog.Dog.bark()
# Aliased fully qualified namespace
import animals.dog as dog
dog.Dog.bark()
# Importing the symbol from its fully qualified module
from animals.dog import Dog
Dog.bark()

This introduces redundancy in imports, making code harder to read, and it requires an end user to know how the package itself is organized.

Following guidance from "What's init for me?", the package will follow "The Convenience Store" model for module imports. In this model, symbols like values, functions and classes are imported into __init__.py and re-exported to allow for top level access by end users. For example, if we modify the __init__.py in the animals package to include:

from .dog import Dog

Then the end user could access the Dog class like this:

from animals import Dog
Dog.bark()

This will require new features to be explicitly exported in the __init__.py script in the future; however, it simplifies the usage of the package and hides the internal tools.

@campb303
Collaborator

campb303 commented Jun 9, 2021

The previous ECNQueue module containing Item and Queue class definitions as well as utility functions like loadQueues() has been refactored to a parser sub-package of webqueue2api. These definitions are now available via:

  • webqueue2api.parser.Item
  • webqueue2api.parser.Queue
  • webqueue2api.parser.loadQueues

These same symbols have also been made available at the top level of webqueue2api:

  • webqueue2api.Item
  • webqueue2api.Queue
  • webqueue2api.loadQueues

This allows for a workflow like this:

import webqueue2api

all_queues = webqueue2api.loadQueues()
ce_queue = webqueue2api.Queue("ce")
aae_1 = webqueue2api.Item("aae", 1)

Which is equivalent to:

from webqueue2api import loadQueues, Queue, Item

all_queues = loadQueues()
ce_queue = Queue("ce")
aae_1 = Item("aae", 1)

Next steps are to implement global configuration using Dataclasses.

@campb303
Collaborator

campb303 commented Jun 14, 2021

Configuration options for webqueue2api are now being managed using package level dataclass objects. Dataclasses were introduced in Python 3.7. Because templeton runs Python 3.6, we now need to use the dataclasses backport package for Python 3.6.

Dataclasses are an extension of standard classes: they accept instance variable definitions with types, as well as instance methods, and then generate an __init__ function for the instance variables in the order they were defined, e.g.

from dataclasses import dataclass, field
import pathlib

@dataclass
class Configuration:
    queue_directory: pathlib.Path = pathlib.Path("/home/pier/e/queue/Mail/")
    # Mutable defaults such as lists must be supplied through default_factory;
    # a bare list default raises ValueError when the class is defined
    queues_to_ignore: list = field(default_factory=lambda: ["archives", "drafts", "inbox", "coral"])

The __init__ function for the above dataclass would be created automatically and behave roughly like this:

def __init__(self, queue_directory: pathlib.Path = pathlib.Path("/home/pier/e/queue/Mail/"), queues_to_ignore: list = None):
    self.queue_directory = queue_directory
    # A fresh list is built for every instance that does not pass its own
    self.queues_to_ignore = ["archives", "drafts", "inbox", "coral"] if queues_to_ignore is None else queues_to_ignore

An instance of this configuration could be made like this:

# Default Values
config = Configuration()
# Override Values
config = Configuration(queue_directory=pathlib.Path("/path/somewhere/else"))

For each sub-package, a config.py will be added with a dataclass definition called Configuration and an instance of the dataclass called config. The config symbol can then be imported by other modules to share configuration details e.g.:

# webqueue2api/parser/Item.py
from .config import config

def load_queues():
    # iterdir() yields one Path per entry in the queue directory
    for folder in config.queue_directory.iterdir():
        if folder.name not in config.queues_to_ignore:
            Queue(folder.name)

For parent and sibling modules, the config symbol will be imported to each package's __init__.py and subsequently made available via package.config.value e.g. parser.config.queue_directory.

At the top level package, webqueue2api, a master config object will be composed to allow for hierarchical config access like this:

import webqueue2api
webqueue2api.config.parser.queue_directory
webqueue2api.config.api.jwt_secret_key

When the package is imported, the default config values are loaded. To allow overrides from a config file or from module execution flags, the instance variable values can be replaced at runtime.
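The hierarchical config and runtime overrides described above could be composed along these lines. This is a sketch with assumed names: ParserConfiguration, ApiConfiguration, apply_overrides and the empty jwt_secret_key default are illustrative, while the attribute names follow the access paths shown in this comment.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParserConfiguration:
    queue_directory: Path = Path("/home/pier/e/queue/Mail/")
    queues_to_ignore: list = field(
        default_factory=lambda: ["archives", "drafts", "inbox", "coral"]
    )


@dataclass
class ApiConfiguration:
    jwt_secret_key: str = ""  # placeholder default, an assumption for this sketch
    environment: str = "prod"


@dataclass
class Configuration:
    """Master config composed from one section per sub-package."""

    parser: ParserConfiguration = field(default_factory=ParserConfiguration)
    api: ApiConfiguration = field(default_factory=ApiConfiguration)


config = Configuration()  # defaults are loaded when the package is imported


def apply_overrides(section, overrides: dict) -> None:
    """Replace instance variables at runtime, rejecting unknown option names."""
    for name, value in overrides.items():
        if not hasattr(section, name):
            raise AttributeError(f"unknown config option: {name}")
        setattr(section, name, value)


apply_overrides(config.parser, {"queue_directory": Path("/path/somewhere/else")})
apply_overrides(config.api, {"environment": "dev"})
```

Because every module imports the same config instance, an override applied once at startup is visible everywhere that reads from it afterwards.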

Further Reading: https://tech.preferred.jp/en/blog/working-with-configuration-in-python/

@campb303 campb303 linked a pull request Jun 21, 2021 that will close this issue
@campb303
Collaborator

Package structure was updated with #32
