
Parsing Nested Replies from Users #58

Closed
benne238 opened this issue Sep 25, 2020 · 5 comments
Labels
bug An issue that results in webqueue2 breaking tooling Related to tools and utilities for the management of the project

Comments

@benne238
Collaborator

Item 11 in the aae queue within q-snapshot contains the following reply-from-user section:

=== Additional information supplied by user ===

Subject: Re: Computer does not have Adobe Acrobat (CHAF201PC10.ECN)
From: Andrew W Head <head13@purdue.edu>
Date: Mon, 9 Mar 2020 19:03:54 +0000
X-ECN-Queue-Original-Path: /home/pier/e/queue/Attachments/inbox/2020-03-09/413-original.txt
X-ECN-Queue-Original-URL: https://engineering.purdue.edu/webqueue/Attachments/inbox/2020-03-09/413-original.txt

Hi there,

Oh no, that's not good. I need a program with text recognition for pdfs.
Can Sumatra do that? And if not, what other options do I have for that?

As for migrating my PC to Windows 10, this Thursday to Friday would work.
Thanks.

Sincerely,
Andrew
*** Replied by: kevin at: 03/09/20 16:43:39 ***

Andrew,

Sumatra can not do text recognition for pdfs. We can migrate you to Windows 10
and then install Adobe Acrobat. Before we can migrate your PC please make sure
any data you need is backed up. Also provide us a list of any other software
that you need installed back to your PC.
Thanks,

Kevin Hurley

ECN Systems Engineer 

*** Status updated by: kevin at: 3/9/2020 16:48:33 ***
waiting for reply

=== Additional information supplied by user ===

Subject: Re: Computer does not have Adobe Acrobat (CHAF201PC10.ECN)
From: Andrew W Head <head13@purdue.edu>
Date: Fri, 13 Mar 2020 14:16:30 +0000
X-ECN-Queue-Original-Path: /home/pier/e/queue/Attachments/inbox/2020-03-13/133-original.txt
X-ECN-Queue-Original-URL: https://engineering.purdue.edu/webqueue/Attachments/inbox/2020-03-13/133-original.txt

Hi there,

Just following up on this. I need a program with text recognition for pdfs.
Can Sumatra do that? And if not, what other options do I have for that?
Thanks.

Sincerely,
Andrew

===============================================


===============================================

The section delimiters nested within the reply from user cause the current ECNQueue script to interpret them as separate sections rather than as part of the reply-from-user message.
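To illustrate the failure mode, here is a minimal, hypothetical Python sketch of a splitter that opens a new section at every reply-from-user start delimiter without tracking nesting. It is not the ECNQueue.py code; `naive_split` and `REPLY_START` are illustrative names.

```python
# Hypothetical illustration of the bug: a splitter that treats every
# reply-from-user start delimiter as the beginning of a new top-level section.
REPLY_START = "=== Additional information supplied by user ==="


def naive_split(lines: list[str]) -> list[list[str]]:
    """Split item lines into sections at each reply-from-user start delimiter."""
    sections: list[list[str]] = [[]]
    for line in lines:
        if line.strip() == REPLY_START:
            sections.append([])  # no nesting awareness: always start a new section
        sections[-1].append(line)
    return sections


item = [
    "Initial message",
    REPLY_START,
    "first reply",
    REPLY_START,  # nested reply inside the first reply
    "nested reply",
    "=" * 47,
    "=" * 47,
]
print(len(naive_split(item)))  # 3: the nested reply became its own section
```

With a nested reply, this splitter yields one more section than the item really has, and the first reply loses everything after the nested start delimiter.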

@benne238 benne238 added tooling Related to tools and utilities for the management of the project bug An issue that results in webqueue2 breaking labels Sep 25, 2020
@benne238 benne238 self-assigned this Sep 25, 2020
@benne238
Collaborator Author

All content in a reply from user is now parsed as part of that reply's message, even if the reply contains other delimiters, including nested replies from the user. This approach relies on the reply-from-user ending delimiter, ===============================================. If an ending delimiter is missing, every delimiter below the reply from user is parsed as part of the reply's message, even if it is not actually part of the message.
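A minimal Python sketch of the nesting-aware approach, assuming the ending delimiter is any non-empty line made up only of `=` characters. `extract_reply`, `is_reply_end`, and `REPLY_START` are hypothetical names, not the actual ECNQueue.py functions. A depth counter is raised by each start delimiter and lowered by each ending delimiter, so the parent reply closes only when the depth returns to zero.

```python
REPLY_START = "=== Additional information supplied by user ==="


def is_reply_end(line: str) -> bool:
    # Assumption: an ending delimiter is a non-empty line of only '=' characters.
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"="}


def extract_reply(lines: list[str], start: int) -> tuple[list[str], int]:
    """Collect the reply-from-user that begins at lines[start].

    Returns (content, end_index). Nested sections stay inside the parent.
    If the parent is never closed, everything to the end of the item is
    swallowed and end_index == len(lines).
    """
    if lines[start].strip() != REPLY_START:
        raise ValueError(f"line {start + 1} is not a reply-from-user start delimiter")
    depth = 1
    content: list[str] = []
    for i in range(start + 1, len(lines)):
        if lines[i].strip() == REPLY_START:
            depth += 1  # a nested reply opens
        elif is_reply_end(lines[i]):
            depth -= 1  # some reply closes
            if depth == 0:
                return content, i  # this ending delimiter closes the parent
        content.append(lines[i])
    return content, len(lines)  # parent never closed: the rest of the item is absorbed
```

The last line is the weakness described above: one missing ending delimiter keeps the depth above zero, so later sections end up in the parent's content.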

=== Additional information supplied by user ===

Subject: 
From:
Date: 
X-ECN-Queue-Original-Path:
X-ECN-Queue-Original-URL: 

reply message content

*** Replied by: username at: mm/dd/yy hh:mm:ss ***

reply from ecn message content

=== Additional information supplied by user ===

Subject:
From:
Date:
X-ECN-Queue-Original-Path:
X-ECN-Queue-Original-URL:

reply from user message content

===============================================


===============================================


*** Status updated by: username at: mm/dd/yyyy hh:mm:ss ***
status update message

In the example item above, a reply from the user is nested within another reply from the user. The parent reply-from-user is stored in a dictionary, and all of the nested delimiters inside it are stored as message content of the parent reply-from-user. However, if any reply-from-user ending delimiter is missing, the status update (which is not nested) and everything after it will be parsed as part of the parent reply-from-user's message content.
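One way to surface the missing-delimiter case instead of silently absorbing the rest of the item is to pair delimiters with a stack. This is a hypothetical Python sketch (`unclosed_reply_starts` and `is_reply_end` are illustrative names) that reports the line numbers of reply-from-user start delimiters that are never closed. It assumes an ending delimiter is any non-empty line consisting only of `=` characters.

```python
REPLY_START = "=== Additional information supplied by user ==="


def is_reply_end(line: str) -> bool:
    # Assumption: an ending delimiter is a non-empty line of only '=' characters.
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"="}


def unclosed_reply_starts(lines: list[str]) -> list[int]:
    """Return 1-based line numbers of reply-from-user starts that are never closed."""
    open_starts: list[int] = []
    for num, line in enumerate(lines, start=1):
        if line.strip() == REPLY_START:
            open_starts.append(num)
        elif is_reply_end(line) and open_starts:
            open_starts.pop()  # close the most recently opened reply
    return open_starts  # anything still open was never closed
```

Stray ending delimiters with no open reply are ignored here; a stricter parser could report those as errors too.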

@campb303
Collaborator

Rather than trying to work around improperly merged items, we will revert to parsing chronologically. While parsing chronologically, if we encounter unexpected syntax, we will stop parsing and insert an error section describing the error and the line number where it occurred.
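A minimal Python sketch of chronological parsing that stops at the first malformed header. It handles only a hypothetical subset of the syntax (edit and status-update headers, in the shapes used by this issue's example items), so any other `***` line, including `*** Replied by: ...`, is reported as an error. `parse_chronologically`, `HEADER`, and `SECTION_TYPES` are illustrative names, and timestamps are kept as raw strings.

```python
import re
from datetime import datetime, timezone

# Hypothetical subset of the item syntax: only edit and status-update headers.
HEADER = re.compile(r"^\*\*\* (Edited|Status updated) by: (\S+) at: (.+?) \*\*\*$")
SECTION_TYPES = {"Edited": "edit", "Status updated": "status"}


def parse_chronologically(lines: list[str]) -> list[dict]:
    """Parse an item top to bottom, stopping at the first malformed header."""
    sections: list[dict] = [{"type": "initialMessage", "content": ""}]
    for num, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("***"):
            match = HEADER.match(line.rstrip())
            if match is None:
                # Unexpected syntax: record the line number and stop parsing.
                sections.append({
                    "type": "parseError",
                    "datetime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                    "content": f"Parsing error on line {num}: unexpected syntax {line!r}",
                })
                return sections
            action, user, stamp = match.groups()
            sections.append({
                "type": SECTION_TYPES[action],
                "datetime": stamp,  # raw timestamp; conversion is out of scope here
                "by": user,
                "content": "",
            })
        else:
            sections[-1]["content"] += line + "\n"  # body text of the current section
    return sections
```

Because parsing stops at the first error, everything after the malformed line is left unparsed rather than being guessed at.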

This behavior is similar to how Python reports errors and to the expressive diagnostics of the Clang C compiler.

Example of Clang error messages: [image: Clang error messages]


Example: Properly formatted item
Example Item

I need help with my computer.

*** Edited by: knewell at: 04/22/20 16:39:51 ***
They're computer is arms2106pc12

*** Status updated by: knewell at: 4/23/2020 10:35:47 ***
Computer is online again

Example Item Parsed

[
    {
        "type": "initialMessage",
        "datetime": "2020-04-23T09:35:47Z",
        "userName": "",
        "userEmail": "",
        "ccRecipients": [ ],
        "content": "I need help with my computer.\n"
    },
    {
        "type": "edit",
        "datetime": "2020-04-22T16:39:51Z",
        "by": "knewell",
        "content": "They're computer is arms2106pc12\n"
    },
    {
        "type": "status",
        "datetime": "2020-04-23T10:35:47Z",
        "by": "knewell",
        "content": "Computer is online again\n"
    }
]
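The `datetime` fields in the parsed output above can be derived from the header timestamps. A Python sketch, assuming the two formats seen in the example item (two-digit years for edits, four-digit years for status updates); `to_iso` and `FORMATS` are hypothetical names. The trailing `Z` only mirrors the example output: the queue's timestamps may actually be local time, in which case a real UTC offset should be used.

```python
from datetime import datetime

# Assumed formats, taken from the example item:
#   edits use a two-digit year:           "04/22/20 16:39:51"
#   status updates use a four-digit year: "4/23/2020 10:35:47"
FORMATS = ("%m/%d/%y %H:%M:%S", "%m/%d/%Y %H:%M:%S")


def to_iso(stamp: str) -> str:
    """Convert a header timestamp to the ISO 8601 form used in the parsed output."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp.strip(), fmt).strftime("%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date string: {stamp!r}")


print(to_iso("04/22/20 16:39:51"))   # 2020-04-22T16:39:51Z
print(to_iso("4/23/2020 10:35:47"))  # 2020-04-23T10:35:47Z
```

Raising on an unknown format gives the chronological parser a natural point to emit a parse error instead of guessing.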

Example: Improperly formatted item
Example Item

I need help with my computer.

*** Edited by: knewell at: 04/22/20 16:39:51 ***
They're computer is arms2106pc12

*** Status updated by: knewell at: 
4/23/2020 10:35:47 ***
Computer is online again

Example Item Parsed

[
    {
        "type": "initialMessage",
        "datetime": "2020-04-23T09:35:47Z",
        "userName": "",
        "userEmail": "",
        "ccRecipients": [ ],
        "content": "I need help with my computer.\n"
    },
    {
        "type": "parseError",
        "datetime": "2020-09-30T17:51:10Z",
        "content": "Parsing error at 6:35. Expected date string but got '\\n'\n*** Status updated by: knewell at: \n                                   ^"
    }
]
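The error content above follows the Clang pattern of location, offending line, and a caret marking the column. A hypothetical Python helper, `format_parse_error`, that builds such a message, assuming a 0-based column so the caret is preceded by exactly `column` spaces.

```python
def format_parse_error(line: str, line_num: int, column: int, expected: str) -> str:
    """Build a Clang-style message: location, the offending line, and a caret."""
    # Past the end of the line, the next character would be the newline.
    got = line[column] if column < len(line) else "\n"
    return (
        f"Parsing error at {line_num}:{column}. Expected {expected} but got {got!r}\n"
        f"{line}\n"
        f"{' ' * column}^"
    )


print(format_parse_error("*** Status updated by: knewell at: ", 6, 35, "date string"))
```

For the malformed status line in the example, this reproduces the location `6:35`, reports `'\n'` as the character found, and places the caret just past `at: `.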

@benne238
Collaborator Author

benne238 commented Oct 2, 2020

Currently, when a nested delimiter is encountered, the only dictionary returned is formatted like this one from Item 11 in the aae queue:

{
    "type": "parseError",
    "datetime": "2020-10-02T10:59:57",
    "content": "Nested delimiter encountered on line 131:\n\t *** Replied by: kevin at: 03/09/20 16:43:39 ***\n"
}

While this isn't the best way to store this information, it does successfully identify nested delimiters as well as the line number associated with the error.
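A Python sketch of how a reply body could be scanned for nested delimiters to produce a dictionary in the format shown above. `check_reply_body`, `REPLY_START`, and `OTHER_DELIMITER` are hypothetical names, and the `*** ... by: ... at: ... ***` regular expression is an assumption about what counts as a delimiter.

```python
import re
from datetime import datetime
from typing import Optional

REPLY_START = "=== Additional information supplied by user ==="
# Assumed shape of the other section delimiters (edits, replies, status updates).
OTHER_DELIMITER = re.compile(r"^\*\*\* .+ by: .+ at: .+ \*\*\*$")


def check_reply_body(body: list[str], first_line_num: int) -> Optional[dict]:
    """Return a parseError dict for the first delimiter nested in a reply, else None.

    `first_line_num` is the 1-based item line number of body[0].
    """
    for offset, line in enumerate(body):
        stripped = line.strip()
        if stripped == REPLY_START or OTHER_DELIMITER.match(stripped):
            return {
                "type": "parseError",
                "datetime": datetime.now().isoformat(timespec="seconds"),
                "content": f"Nested delimiter encountered on line "
                           f"{first_line_num + offset}:\n\t {stripped}\n",
            }
    return None  # no nested delimiter found
```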

@campb303 campb303 closed this as completed Oct 5, 2020
@campb303 campb303 reopened this Oct 5, 2020
@benne238
Collaborator Author

benne238 commented Oct 6, 2020

 def __errorParsing(self, line: str, lineNum: int, lineColumn: int, errorMessage: str) -> dict:

This helper function was added to ECNQueue.py to give parse-error sections a unified format whenever an error is encountered while parsing an item. A single item can contain many kinds of parsing errors (missing information, altered delimiter format, nested delimiters, missing ending delimiters, etc.). The function returns a dictionary containing the error message supplied by the caller for the specific error, along with the line and column where the error occurred, giving a uniform way to report malformed items to the user.

Key          Value
type         "parse_error"
datetime     The time the error was encountered
content[0]   Error message describing the issue, followed by its location as line:column
content[1]   The item line that caused the parsing error
{
  "type": "parse_error",
  "datetime": "2020-10-06T15:38:40-0500",
  "content": [
   "Encountered Nested delimiter at 128:0",
   "*** Status updated by: username at: 4/28/2020 14:21:42 ***"
  ]
}
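A standalone Python sketch of a helper that produces the fields documented above. It is written as a module-level function named `error_parsing` for illustration only; the real `__errorParsing` is a method in ECNQueue.py and may differ. The message format `"<error message> at <line>:<column>"` is inferred from the example output.

```python
from datetime import datetime


def error_parsing(line: str, line_num: int, line_column: int, error_message: str) -> dict:
    """Sketch of a parse_error section builder mirroring the documented fields."""
    return {
        "type": "parse_error",
        # Local time with a numeric UTC offset, e.g. "2020-10-06T15:38:40-0500".
        "datetime": datetime.now().astimezone().strftime("%Y-%m-%dT%H:%M:%S%z"),
        "content": [
            f"{error_message} at {line_num}:{line_column}",  # message plus line:column
            line,  # the item line that caused the error
        ],
    }


error_parsing("*** Status updated by: username at: 4/28/2020 14:21:42 ***",
              128, 0, "Encountered Nested delimiter")
```

Returning the offending line separately from the message keeps the location machine-readable while still letting the UI show the raw text.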

@campb303 campb303 added this to the v1 milestone Oct 9, 2020
@campb303
Collaborator

campb303 commented Oct 12, 2020

This issue has changed from being about nested replies to a more generic parsing feature. Parsing is already being tracked in another issue, so this issue will be referenced in #2 and closed.
