
Implement section parsing for item body. #2

Closed
campb303 opened this issue Aug 6, 2020 · 7 comments
Labels: enhancement, tooling

Comments

@campb303
Collaborator

campb303 commented Aug 6, 2020

Section Parsing Definitions

Introduction

The body of an item is made up of seven distinct actions. Below are descriptions of each action: its purpose, behavior, metadata, and delimiters.

Directory Information

Information about the user such as alias, phone number and office location. This only appears once right after the headers and right before the initial message. This only occurs if the item is submitted through the Trouble Reporting page.

Fields

| Key | Value |
| --- | --- |
| type | directory_information |
| Name | The real name of the sender. |
| Login | The career account alias of the sender. |
| Computer | The computer the item is related to. Formatting may vary. |
| Location | Where the computer is located. |
| Email | The email address of the sender. |
| Phone | The phone number of the sender. |
| Office | The office location of the sender. |
| UNIX Dir | The home directory for the user on non-Windows systems. |
| Zero Dir | The home directory for the user via Active Directory. |
| User ECNDB | Link to the sender's username report in ECNDB. |
| Host ECNDB | Link to the computer report in ECNDB. |
| Subject | The subject of the email sent to the queue. |

Delimiters

  • Start: The second line after the first newline followed by a tab \n\t.
  • End: The first non-empty line after the start that begins with whitespace then "Subject:"

Plain Text Example

\tName: Jerry L Guerrero\n
       Login: jerry\n
    Computer: x-ee27å0bpc1 (128.46.164.29)\n
    Location: EE 270B\n
       Email: jerry@purdue.edu\n
       Phone: \n
      Office: \n
    UNIX Dir: /home/pier/c/jerry\n
    Zero Dir: U=\\\\pier.ecn.purdue.edu\\jerry\n
  User ECNDB: http://eng.purdue.edu/jump/bcafa8\n
  Host ECNDB: http://eng.purdue.edu/jump/2dbd461    \n
     Subject: Win7 to Win10 Migration List - kevin\n

Parsed Example

{
    "type": "directory_information",
    "Name": "Jerry L Guerrero",
    "Login": "jerry",
    "Computer": "x-ee27å0bpc1 (128.46.164.29)",
    "Location": "EE 270B",
    "Email": "jerry@purdue.edu",
    "Phone": "",
    "Office": "",
    "UNIX Dir": "/home/pier/c/jerry",
    "Zero Dir": "U=\\\\pier.ecn.purdue.edu\\jerry",
    "User ECNDB": "http://eng.purdue.edu/jump/bcafa8",
    "Host ECNDB": "http://eng.purdue.edu/jump/2dbd461",
    "Subject": "Win7 to Win10 Migration List - kevin"
}

Initial Message

The body of the email the item originated from. This usually appears directly after the headers unless directory information is present.

Fields

| Key | Value |
| --- | --- |
| type | initial_message |
| datetime | ISO 8601 formatted datetime string. |
| from_name | The sender's real name. Formatting may vary. This can be empty. |
| from_email | The sender's email address. |
| to | A list of name(s) and email(s) of people this message was sent to. |
| cc | A list of name(s) and email(s) of people who were CC'd. This can be empty. |
| subject | The subject of the initial message. |
| content | The content of the message as a list of strings. |

Delimiters

  • Start: First newline after directory information if present, otherwise first newline.
  • End: Beginning of another delimiter if present, otherwise end of file.

Plain Text Example

I need some help with something.

Parsed Example

{
    "type": "initial_message",
    "datetime": "2020-09-11T01:26:45+00:00",
    "from_name": "Justin Campbell",
    "from_email": "campb303@purdue.edu",
    "to": [
        { "name": "John Doe", "email": "johndoe@example.com" }
    ],
    "cc": [
        { "name": "", "email": "janesmith@example.com" }
    ],
    "subject": "(maps to item.subject)",
    "content": [
        "I need some help with something.\n"
    ]
}
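
As a rough illustration of how the to and cc lists above could be built, the sketch below uses `email.utils.getaddresses` from the Python standard library. The function name `parse_recipients` is hypothetical and is not taken from ECNQueue.

```python
from email.utils import getaddresses


def parse_recipients(header_value: str) -> list:
    """Turn a raw To/CC header value into a list of name/email dictionaries."""
    recipients = []
    for name, email in getaddresses([header_value]):
        if email:  # skip entries that carry no address at all
            recipients.append({"name": name, "email": email})
    return recipients


cc = parse_recipients("John Doe <johndoe@example.com>, janesmith@example.com")
```

A recipient without a display name keeps an empty "name", which matches the cc entry in the parsed example.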

Edit

Information added by someone at ECN, usually for internal use and/or communication. This can occur anywhere in the item after the initial message.

Fields

| Key | Value |
| --- | --- |
| type | edit |
| datetime | ISO 8601 formatted datetime string. |
| by | The career account alias of the person who added the edit. |
| content | The content of the edit as a list of strings. |

Delimiters

  • Start: Line starting with *** Edited
  • End: Beginning of another delimiter if present, otherwise end of file.

Plain Text Example

*** Edited by: knewell at: 04/22/20 16:39:51 ***
This is related to another item. I need to do X next.

Parsed Example

{
    "type": "edit",
    "datetime": "2020-04-22T16:39:51",
    "by": "knewell",
    "content": [
        "This is related to another item. I need to do X next.\n"
    ]
}
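
One possible way to read the `***` start delimiter is a single regular expression that pulls out the action, the alias, and the raw timestamp. This is only a sketch: the pattern, the `ACTION_TYPES` mapping, and the function name `parse_action_header` are assumptions for illustration, and the same pattern is written to also accept the Status and Reply To User delimiters defined in this post.

```python
import re

# Hypothetical pattern for the "***" delimiter lines described in this issue.
ACTION_HEADER = re.compile(
    r"^\*\*\* (?P<action>Edited|Status updated|Replied) "
    r"by: (?P<by>\S+) at: (?P<when>.+?) \*\*\*\s*$"
)

# Assumed mapping from delimiter wording to the "type" values in this spec.
ACTION_TYPES = {
    "Edited": "edit",
    "Status updated": "status",
    "Replied": "reply_to_user",
}


def parse_action_header(line: str):
    """Return type, by and the raw timestamp, or None if the line is not a delimiter."""
    match = ACTION_HEADER.match(line)
    if match is None:
        return None
    return {
        "type": ACTION_TYPES[match.group("action")],
        "by": match.group("by"),
        "when": match.group("when"),
    }


header = parse_action_header("*** Edited by: knewell at: 04/22/20 16:39:51 ***")
```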

Status

A short message about the progress of the item. This can occur anywhere in the item after the initial message.

Fields

| Key | Value |
| --- | --- |
| type | status |
| datetime | ISO 8601 formatted datetime string. |
| by | The career account alias of the person who updated the status. |
| content | The content of the status as a list of strings. |

Delimiters

  • Start: Line starting with *** Status
  • End: Beginning of another delimiter if present, otherwise end of file.

Plain Text Example

*** Status updated by: knewell at: 4/23/2020 10:35:47 ***
Doing X thing.

Parsed Example

{
    "type": "status",
    "datetime": "2020-04-23T10:35:47",
    "by": "knewell",
    "content": [
        "Doing X thing."
    ]
}
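
The delimiter timestamps in the examples come in two shapes: a two-digit year for edits (04/22/20) and a four-digit year for status updates (4/23/2020). A minimal sketch of normalizing both to ISO 8601 follows. The format list covers only the shapes shown in this post, and `to_iso8601` is a hypothetical helper name.

```python
from datetime import datetime

# Formats observed in the examples in this issue; real items may contain others.
KNOWN_FORMATS = ("%m/%d/%y %H:%M:%S", "%m/%d/%Y %H:%M:%S")


def to_iso8601(raw: str) -> str:
    """Convert a delimiter timestamp to an ISO 8601 string without a UTC offset."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized timestamp: {raw!r}")


status_time = to_iso8601("4/23/2020 10:35:47")
```

Raising on an unknown format, rather than guessing, leaves room for the caller to record the problem however it prefers.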

Assignment

Assigning the item to someone. This does not occur in the body of the item. It is tracked in the headers using three different entries:

  • Assigned-To-Updated-By: the career account alias of the person who updated the assignment
  • Assigned-To-Updated-Time: the time the assignment was updated
  • Assigned-To: the career account alias of the person the item was assigned to

Fields

| Key | Value |
| --- | --- |
| type | assignment |
| datetime | ISO 8601 formatted datetime string. |
| by | The career account alias of the person who changed the assignment. |
| to | The career account alias of the person who the item was assigned to. |

Delimiters

N/A

Plain Text Example

Assigned-To: campb303
Assigned-To-Updated-Time: Tue, 23 Jun 2020 13:27:00 EDT
Assigned-To-Updated-By: harley

Parsed Example

{
    "type": "assignment",
    "datetime": "2020-06-23T13:27:00",
    "by": "harley",
    "to": "campb303"
}
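
Because the assignment lives in the headers, it can be assembled from a header lookup instead of being parsed from the body. The sketch below assumes the headers are already available as a plain dictionary and uses `email.utils.parsedate_to_datetime` for the timestamp. `build_assignment` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def build_assignment(headers: dict) -> dict:
    """Build an assignment section from the three Assigned-To headers."""
    updated = parsedate_to_datetime(headers["Assigned-To-Updated-Time"])
    return {
        "type": "assignment",
        # The UTC offset is dropped here only to match the parsed example;
        # keeping it may well be preferable.
        "datetime": updated.replace(tzinfo=None).isoformat(),
        "by": headers["Assigned-To-Updated-By"],
        "to": headers["Assigned-To"],
    }


assignment = build_assignment({
    "Assigned-To": "campb303",
    "Assigned-To-Updated-Time": "Tue, 23 Jun 2020 13:27:00 EDT",
    "Assigned-To-Updated-By": "harley",
})
```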

Reply To User

A message from ECN to the user and/or related parties. This can occur anywhere in the item after the initial message.

Fields

| Key | Value |
| --- | --- |
| type | reply_to_user |
| datetime | ISO 8601 formatted datetime string. |
| by | The career account alias of the person who sent the reply. |
| content | The content of the message as a list of strings. |

Delimiters

  • Start: Line starting with *** Replied
  • End: Beginning of another delimiter if present, otherwise end of file.

Plain Text Example

*** Replied by: ewhile at: 05/08/20 09:21:43 ***
Sascha,

Chicken kevin biltong, flank jowl prosciutto shoulder meatball meatloaf sirloin.

Ethan White
ECN

Parsed Example

{
    "type": "reply_to_user",
    "datetime": "2020-05-08T09:21:43",
    "by": "ewhile",
    "content": [
        "Sascha,\n",
        "\n",
        "Chicken kevin biltong, flank jowl prosciutto shoulder meatball meatloaf sirloin.\n",
        "\n",
        "Ethan White\n",
        "ECN"
    ]
}

Reply from User

A message from the user and/or related parties. This is only found after two or more items have been merged together. This can occur anywhere in the item after the initial message.

Fields

| Key | Value |
| --- | --- |
| type | reply_from_user |
| datetime | ISO 8601 formatted datetime string. |
| from_name | The sender's real name. Formatting may vary. This can be empty. |
| from_email | The sender's email address. |
| cc | A list of name(s) and email(s) of people who were CC'd. This can be empty. |
| headers | A list of headers from the reply, each a dictionary with type and content keys. |
| subject | The subject of the reply. |
| content | The content of the message as a list of strings. |

Delimiters:

  • Start: Line starting with ===
  • End: Line starting with ====

Plain Text Example

=== Additional information supplied by user ===

Subject: RE: New Computer Deploy
From: "Reckowsky, Michael J." <mreckowsky@purdue.edu>
Date: Fri, 8 May 2020 13:57:17 +0000

Ethan,

Biltong beef ribs doner chuck, pork chop jowl salami cow filet mignon pork.

Mike
===============================================

Parsed Example

{
    "type": "reply_from_user",
    "datetime": "2020-05-08T13:57:17+00:00",
    "from_name": "Reckowsky, Michael J.",
    "from_email": "mreckowsky@purdue.edu",
    "cc": [
        { "name": "John Doe", "email": "johndoe@example.com" },
        { "name": "", "email": "janesmith@example.com" }
    ],
    "headers" : [
        {
            "type": "Subject", 
            "content": "RE: New Computer Deploy"
        },
        {
            "type": "From", 
            "content": "\"Reckowsky, Michael J.\" <mreckowsky@purdue.edu>"
        },
        {
            "type": "Date", 
            "content": "Fri, 8 May 2020 13:57:17 +0000"
        }
    ],
    "subject": "RE: New Computer Deploy",
    "content": [
        "Ethan,\n",
        "\n",
        "Biltong beef ribs doner chuck, pork chop jowl salami cow filet mignon pork.\n",
        "\n",
        "Mike\n"
    ]
}
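
To make the mapping from the plain text to the parsed fields concrete, here is a minimal sketch that takes the lines between the two === delimiters and fills in most of the fields. It is an illustration under assumptions, not the ECNQueue implementation: `parse_reply_from_user` is a hypothetical name, folded (multi-line) headers are not handled, cc is left out, and the content is returned without trimming blank lines.

```python
from email.utils import parseaddr, parsedate_to_datetime


def parse_reply_from_user(lines: list) -> dict:
    """Parse the lines between the === start and end delimiters (delimiters excluded)."""
    # Skip blank lines that precede the header block.
    index = 0
    while index < len(lines) and not lines[index].strip():
        index += 1

    # Header lines run until the next blank line.
    headers = []
    while index < len(lines) and lines[index].strip():
        key, _, value = lines[index].partition(":")
        headers.append({"type": key.strip(), "content": value.strip()})
        index += 1

    lookup = {header["type"]: header["content"] for header in headers}
    from_name, from_email = parseaddr(lookup.get("From", ""))
    date = lookup.get("Date")

    return {
        "type": "reply_from_user",
        "datetime": parsedate_to_datetime(date).isoformat() if date else "",
        "from_name": from_name,
        "from_email": from_email,
        "headers": headers,
        "subject": lookup.get("Subject", ""),
        "content": lines[index:],  # untrimmed: still starts with the blank separator line
    }
```

Splitting each header on its first colon keeps values such as "RE: New Computer Deploy" and the Date timestamp intact.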

Parse Error

An error caused by a malformed delimiter or nested delimiters. Parse errors only occur if a delimiter is incorrectly formatted or a delimiter is nested in a reply-from-user.

Fields

| Key | Value |
| --- | --- |
| type | parse_error |
| datetime | ISO 8601 formatted datetime string. |
| file_path | Full path of the item with the error. |
| expected | Description of what the parser was expecting. |
| got | Line that caused the parse error. |
| line_num | The line number in the item that caused the parse error. |

Plain Text Example

(item aae2 in qsnapshot)
=== Additional information supplied by user ===

Subject: RE: Help with hardware upgrades
From: "Ezra, Kristopher L" <kris@purdue.edu>
Date: Wed, 5 Feb 2020 18:11:58 +0000

If it makes no difference between windows and linux for the fileserver I'd
rather service a linux machine.

Considering the switches,  i could do 2 8s and 2 16s.  Two of the switches
I'm replacing already service 9 connections and I'd rather not daisy chain.
Is there something driving the price here? I see gigabit switches from
tplink on amazon right now for $50.  I dont need managed switches or
anything fancy.

Kris
*** Replied by: emuffley at: 02/05/20 13:22:02 ***

Kris,

Thank you on the server operating question.  We will kick that off to our linux folks for discussion.

No daisy chain, agreed.  These switches are unmanaged.

For the workstations, are you wanting Windows, Linux or a mix?


Eric Muffley

Systems Engineer, Engineering Computer Network

===============================================

Parsed Example

{
  "type": "parse_error",
  "datetime": "2020-10-16T10:44:45",
  "file_path": "/home/pier/e/benne238/webqueue2/q-snapshot/aae/2",
  "expected": "Did not encounter a reply-from-user ending delimiter",
  "got": "Kris",
  "line_num": 468
}
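
The nested-delimiter case above could be detected roughly as follows. This sketch makes several assumptions: the function name `find_nested_delimiter` is hypothetical, "got" and "line_num" are taken to refer to the last line before the nested delimiter (which is how 'Kris' appears in the parsed example), and the datetime is simply the time the error was recorded.

```python
from datetime import datetime

REPLY_START = "=== Additional information supplied by user ==="


def find_nested_delimiter(lines: list, file_path: str):
    """Return a parse_error dict if a *** delimiter appears inside a reply-from-user, else None."""
    inside_reply = False
    for line_num, line in enumerate(lines, start=1):
        if line.startswith(REPLY_START):
            inside_reply = True
        elif inside_reply and line.startswith("===="):
            inside_reply = False  # proper ending delimiter reached
        elif inside_reply and line.startswith("***"):
            return {
                "type": "parse_error",
                "datetime": datetime.now().isoformat(timespec="seconds"),
                "file_path": file_path,
                "expected": "Did not encounter a reply-from-user ending delimiter",
                # Assumption: report the line just before the nested delimiter.
                "got": lines[line_num - 2].strip(),
                "line_num": line_num - 1,
            }
    return None
```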
@benf

benf commented Aug 7, 2020

The directory information is only present if the Trouble Report form was used to submit the item.

@campb303 campb303 added the tooling and enhancement labels Aug 7, 2020
@campb303
Collaborator Author

campb303 commented Aug 7, 2020

@benf Useful information! I'll update my original post to add that logic.

@campb303 campb303 added this to the v1 milestone Sep 14, 2020
@campb303
Collaborator Author

The JSON spec for this has been updated both inline and in this file.

@campb303 campb303 reopened this Sep 20, 2020
@campb303
Collaborator Author

The current implementation is good enough for tomorrow's presentation but, moving forward, the delimiter information and newlines before/after the content should be removed from the content entry in the dictionary.


Edit
Example: (from ce 8)

    {
      "type": "edit",
      "by": "remender",
      "datetime": "2020-03-11T09:42:52",
      "content": [
        "*** Edited by: remender at: 03/11/20 09:42:52 ***\n",
        "\n",
        "ST: 88QVQC2\n",
        "\n",
        "\n",
        "\n"
      ]
    },

This should be:

    {
      "type": "edit",
      "by": "remender",
      "datetime": "2020-03-11T09:42:52",
      "content": [
        "ST: 88QVQC2\n"
      ]
    },

Status
Example: (from ce 8)

    {
      "type": "status",
      "by": "remender",
      "datetime": "2020-03-11T09:26:19",
      "content": [
        "*** Status updated by: remender at: 3/11/2020 09:26:19 ***\n",
        "waiting for reply\n",
        "\n"
      ]
    },

This should be:

    {
      "type": "status",
      "by": "remender",
      "datetime": "2020-03-11T09:26:19",
      "content": [
        "waiting for reply\n"
      ]
    },

Status
See #54


Reply from ECN
Example: (from ce 8)

    {
      "type": "replyToUser",
      "by": "remender",
      "datetime": "2020-03-11T09:25:59",
      "content": [
        "*** Replied by: remender at: 03/11/20 09:25:59 ***\n",
        "\n",
        "Hi Yen-Fang,\n",
        "\n",
        "We can look into getting a quote for a replacement battery. Could you send us the Service Tag of the system?\n",
        "\n",
        "Josh Remender\n",
        "ECN\n",
        "\n"
      ]
    },

This should be:

    {
      "type": "replyToUser",
      "by": "remender",
      "datetime": "2020-03-11T09:25:59",
      "content": [
        "Hi Yen-Fang,\n",
        "\n",
        "We can look into getting a quote for a replacement battery. Could you send us the Service Tag of the system?\n",
        "\n",
        "Josh Remender\n",
        "ECN\n"
      ]
    },

Reply from User
Example: (from ce 8)

    {
      "type": "replyFromUser",
      "datetime": "2020-03-11T13:39:02+0000",
      "subject": "Re: Lab laptop swollen battery ",
      "userName": "Yen-Fang Su",
      "userEmail": "su177@purdue.edu",
      "content": [
        "=== Additional information supplied by user ===\n",
        "\n",
        "Subject: Re: Lab laptop swollen battery \n",
        "From: Yen-Fang Su <su177@purdue.edu>\n",
        "Date: Wed, 11 Mar 2020 13:39:02 +0000\n",
        "X-ECN-Queue-Original-Path: /home/pier/e/queue/Attachments/inbox/2020-03-11/111-original.txt\n",
        "X-ECN-Queue-Original-URL: https://engineering.purdue.edu/webqueue/Attachments/inbox/2020-03-11/111-original.txt\n",
        "X-ECN-Queue-Attachment-1-URL: https://engineering.purdue.edu/webqueue/Attachments/inbox/2020-03-11/111-attachment-1.jpg\n",
        "X-ECN-Queue-Attachment-1-Path: /home/pier/e/queue/Attachments/inbox/2020-03-11/111-attachment-1.jpg\n",
        "\n",
        "Hi Josh,\n",
        "\n",
        "Service tag:\n",
        "\n",
        "\n",
        "===============================================\n",
        "\n"
      ],
      "ccRecipients": [
        
      ]
    }

This should be:

    {
      "type": "replyFromUser",
      "datetime": "2020-03-11T13:39:02+0000",
      "subject": "Re: Lab laptop swollen battery ",
      "userName": "Yen-Fang Su",
      "userEmail": "su177@purdue.edu",
      "content": [
        "Hi Josh,\n",
        "\n",
        "Service tag:\n"
      ],
      "ccRecipients": [
        
      ]
    },

@benne238
Collaborator

ECNQueue dedicates a dictionary to the Directory section of an Item, with keys that are parsed directly from the directory information within an item. Instead of this dictionary format:

{
    "type": "directoryInformation",
    
    // An array of lines with non-printable characters.
    // Example from aae 1
    "content": [
        "\n",
        "\tName: Jerry L Guerrero\n",
        "       Login: jerry\n",
        "    Computer: x-ee27å0bpc1 (128.46.164.29)\n",
        "    Location: EE 270B\n",
        "       Email: jerry@purdue.edu\n",
        "       Phone: \n",
        "      Office: \n",
        "    UNIX Dir: /home/pier/c/jerry\n",
        "    Zero Dir: U=\\\\pier.ecn.purdue.edu\\jerry\n",
        "  User ECNDB: http://eng.purdue.edu/jump/bcafa8\n",
        "  Host ECNDB: http://eng.purdue.edu/jump/2dbd461    \n",
        "     Subject: Win7 to Win10 Migration List - kevin\n"
    ]
}

The script will now dedicate a key to each field in content, so the output reflects this structure:

// Example from aae 1
{
    "type": "directoryInformation",
    "Name": "Jerry L Guerrero",
    "Login": "jerry",
    "Computer": "x-ee27å0bpc1 (128.46.164.29)",
    "Location": "EE 270B",
    "Email": "jerry@purdue.edu",
    "Phone": "",
    "Office": "",
    "UNIX Dir": "/home/pier/c/jerry",
    "Zero Dir": "U=\\\\pier.ecn.purdue.edu\\jerry",
    "User ECNDB": "http://eng.purdue.edu/jump/bcafa8",
    "Host ECNDB": "http://eng.purdue.edu/jump/2dbd461",
    "Subject": "Win7 to Win10 Migration List - kevin"
}

Keys are not predetermined, however, so any key except "Name" can be included or omitted without affecting the directory section parsing.

Currently, the script parses through raw directory information that looks like this:

\tName: Jerry L Guerrero\n
       Login: jerry\n
    Computer: x-ee27å0bpc1 (128.46.164.29)\n
    Location: EE 270B\n
       Email: jerry@purdue.edu\n
       Phone: \n
      Office: \n
    UNIX Dir: /home/pier/c/jerry\n
    Zero Dir: U=\\\\pier.ecn.purdue.edu\\jerry\n
  User ECNDB: http://eng.purdue.edu/jump/bcafa8\n
  Host ECNDB: http://eng.purdue.edu/jump/2dbd461    \n
     Subject: Win7 to Win10 Migration List - kevin\n

The script strips leading and trailing whitespace (tabs, spaces, and newlines) from each line, then splits the line at the colon into two substrings. These form a key/value pair that is added to the directory information dictionary, as seen in the current JSON output above.
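
The strip-and-split step could look something like the sketch below. It is not the code in ECNQueue.py: the function name `parse_directory_information` is hypothetical, the type value follows the spelling used in the original post, and splitting on the first colon only is an assumption made so that URL values keep their own colons.

```python
def parse_directory_information(lines: list) -> dict:
    """Build the directory information dictionary from raw "Key: value" lines."""
    directory = {"type": "directory_information"}
    for line in lines:
        stripped = line.strip()  # removes tabs, spaces and newlines at both ends
        if not stripped:
            continue
        # Assumption: split on the first colon only, so "http://..." stays whole.
        key, separator, value = stripped.partition(":")
        if not separator:
            continue  # not a "Key: value" line
        directory[key.strip()] = value.strip()
    return directory


directory = parse_directory_information([
    "\n",
    "\tName: Jerry L Guerrero\n",
    "       Phone: \n",
    "  Host ECNDB: http://eng.purdue.edu/jump/2dbd461    \n",
])
```

Because keys are taken from the text itself, fields that are missing from an item are simply absent from the result.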

@benne238
Collaborator

benne238 commented Oct 2, 2020

def __getFormattedMessageContent(self, messageContent: list) -> list:

This helper function was created within the ECNQueue.py script. It takes a list of lines and repeatedly checks whether the first or last line is a newline or a delimiter, removing that line until neither the first nor the last line is a newline or a delimiter. The function then returns the list, stripped of leading and trailing delimiters and newlines.
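
The body of the helper is not shown in this thread, so the following is only a sketch of the behavior described, written as a standalone function with a snake_case name. The `DELIMITER_PREFIXES` tuple is an assumption based on the start delimiters listed in the original post.

```python
# Assumed delimiter prefixes, taken from the delimiter lists in this issue.
DELIMITER_PREFIXES = ("*** Edited", "*** Status", "*** Replied", "===")


def get_formatted_message_content(message_content: list) -> list:
    """Trim leading and trailing newlines and delimiter lines from a section's content."""

    def removable(line: str) -> bool:
        return line.strip() == "" or line.startswith(DELIMITER_PREFIXES)

    start, end = 0, len(message_content)
    while start < end and removable(message_content[start]):
        start += 1
    while end > start and removable(message_content[end - 1]):
        end -= 1
    # Blank lines between the first and last kept lines are preserved.
    return message_content[start:end]


edit_content = get_formatted_message_content([
    "*** Edited by: remender at: 03/11/20 09:42:52 ***\n",
    "\n",
    "ST: 88QVQC2\n",
    "\n",
    "\n",
    "\n",
])
```

Slicing once at the end avoids repeatedly popping from the front of the list, and the input list is left unmodified.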

@campb303
Collaborator Author

Error parsing is detailed in #58 and now merges with this. All parsing discussion should continue in this thread.
