diff --git a/CryoREAD/.gitignore b/CryoREAD/.gitignore
new file mode 100644
index 0000000..9f11b75
--- /dev/null
+++ b/CryoREAD/.gitignore
@@ -0,0 +1 @@
+.idea/
diff --git a/CryoREAD/CryoREAD.ipynb b/CryoREAD/CryoREAD.ipynb
new file mode 100644
index 0000000..0eeb0f5
--- /dev/null
+++ b/CryoREAD/CryoREAD.ipynb
@@ -0,0 +1,514 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "private_outputs": true,
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "accelerator": "GPU",
+ "gpuClass": "standard"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# CryoREAD: De novo structure modeling for nucleic acids in cryo-EM maps using deep learning\n"
+ ],
+ "metadata": {
+ "id": "dHUEherpXUYZ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " \n",
+ "\n",
+ "Cryo_READ is a computational tool using deep learning to automatically build full DNA/RNA atomic structure from cryo-EM map. \n",
+ "\n",
+ "Copyright (C) 2022 Xiao Wang, Genki Terashi, Daisuke Kihara, and Purdue University.\n",
+ "\n",
+ "License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)\n",
+ "\n",
+ "Contact: Daisuke Kihara (dkihara@purdue.edu).\n",
+ "\n",
+ "For technical problems or questions, please reach to Xiao Wang (wang3702@purdue.edu).\n",
+ "\n",
+ "**We strongly suggest to use Google Chrome for CryoREAD Colab version. Other browsers such as Safari may raise errors when uploading or downloading files.**\n",
+ "\n",
+ "If you are using other browsers, disabling tracking protection may help resolve the errors when uploading or downloading files.\n",
+ "\n",
+ "For more details, see **Instructions** of the notebook and checkout the **[CryoREAD GitHub](https://github.com/kiharalab/CryoREAD)**. If you use CryoREAD, please cite it: **Citation**."
+ ],
+ "metadata": {
+ "id": "gdmYnKJkXbvF"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "#Overall Protocol\n",
+ "1) Structure Detection by deep neural network Cryo-READ networks; \n",
+ "2) Tracing backbone according to detections; \n",
+ "3) Fragment-based nucleotide assignment; \n",
+ "4) Full atomic structure modeling. \n",
+ "\n",
+ "\n",
+ "
\n",
+ "
\n",
+ "
"
+ ],
+ "metadata": {
+ "id": "Otu-D0c7YMwu"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Instructions \n",
+ "## Tutorial ppt [PPT](https://github.com/kiharalab/CryoREAD/blob/main/CryoREAD_tutorial.pptx)\n",
+ "## Steps\n",
+ "1. Connect to a gpu machine by clicking the right top button **\"connect\"** in the notebook, then we can run DAQ under GPU support.\n",
+ "2. Click the left running button in Install Dependencies to install dependencies.\n",
+ "3. Upload your cryo-EM maps in mrc/map format by clicking the left running button in Upload Cryo EM maps. If you want to use our example,then choose the box **use_author_example**.
\n",
+ "Here we suggest user to upload a cryo-EM map with **spacing 1** to save the running time.
\n",
+ "Here is a simple instructions to do that via [ChimeraX](https://www.rbvi.ucsf.edu/chimerax/): \n",
+ "```\n",
+ "1 open your map via chimeraX.\n",
+ "2 In the bottom command line to type command: vol resample #1 spacing 1.0\n",
+ "3 In ChimeraX, click \"save\", then choose \"MRC density map(*.mrc)\" in \"Files of type\", then in \"Map\" choose the resampled map, finally specify the file name and path to save.\n",
+ "4 Then you can use the resampled map to upload\n",
+ "```\n",
+ "4. (Optional) Upload your fasta file storing sequence information by clicking the left running button in Upload Fasta.
\n",
+ "We suggest to use the following style:\n",
+ "```\n",
+ ">[chain_id1]\n",
+ "sequence_info1\n",
+ ">[chain_id2]\n",
+ "sequence_info2\n",
+ "```\n",
+ "such as\n",
+ "```\n",
+ ">A\n",
+ "CUGACAUACUUGUUCCACUCUAGCAGCACGUAAAUAUUGGCGUAGUGAAAUAUAUAUUAAACACCAAUAUUACUGUGCUGCUUUAGUGUGACAGGGAUACAGCAA\n",
+ "```\n",
+ "Meanwhile, standard way in PDB database should also be supported.\n",
+ "\n",
+ "5. Specify the Parameters in Parameters. Either you modified or not, click the left running button to set it.\n",
+ "\n",
+ "6. Running CryoREAD by by clicking the left running button in Run CryoREAD.\n",
+ "\n",
+ "7. (Optional) Click the left running button in Download to download the zip files. If you choose to generate structures by either using sequence or not, the output will be saved in pdb format. You can easily open it in **COOT** to do further refinement based on your expertise. For simple visualization, you can also use pymol to check the outputted structure. If you only choose to generate CryoREAD detections, you can check the predicted probabilities by\n",
+ "\n",
+ "8. Visualize structure online by clicking the left running button in Visualization\n",
+ "\n",
+ "**Result in zip file**\n",
+ "1. A PDB file with final structure.\n",
+ "2. A PDB file saved naive structure from CryoREAD without refinement.\n",
+ "3. Detection map with probability values of different classes: phosphate, sugar, base, protein; base-A, base-U/T, base-C, base-G.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "mkLP5iPpZv5S"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Run CryoREAD Online"
+ ],
+ "metadata": {
+ "id": "bfnZFRxhfKtt"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title Install dependencies \n",
+ "#@markdown Please make sure the notebook is already connected to **GPU**, DAQ needs GPU support to run.
\n",
+ "#@markdown Click the right top button **\"connect\"**, then the notebook will automatically connect to a gpu machine\n",
+ "\n",
+ "%cd /content\n",
+ "!pip install biopython ortools==9.4.1874\n",
+ "!pip install mrcfile==1.2.0\n",
+ "!pip install numpy>=1.19.4\n",
+ "!pip install numba>=0.52.0\n",
+ "!pip install torch>=1.6.0\n",
+ "!pip install scipy>=1.6.0\n",
+ "!pip install tqdm\n",
+ "!pip install progress\n",
+ "!pip install numba-progress\n",
+ "!pip install py3Dmol\n",
+ "!rm -rf CryoREAD\n",
+ "!git clone https://github.com/kiharalab/CryoREAD --quiet\n",
+ "%cd CryoREAD"
+ ],
+ "metadata": {
+ "id": "TZjoGFVZfbq6",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title Input cryo-EM map \n",
+ "#@markdown **Please make sure the cryo-EM map is 3D cryo-EM map with the same format in EMDB.**\n",
+ "#@markdown
Here we suggest user to upload a cryo-EM map with **spacing 1** to save the running time. Detailed instructions with ChimeraX is ChimeraX resampling\n",
+ "#@markdown
**Support file format: .mrc, .mrc.gz, .map, .map.gz**\n",
+ "from google.colab import files\n",
+ "import os\n",
+ "import os.path\n",
+ "import re\n",
+ "import hashlib\n",
+ "import random\n",
+ "import string\n",
+ "\n",
+ "rand_letters = string.ascii_lowercase\n",
+ "rand_letters = ''.join(random.choice(rand_letters) for i in range(20))\n",
+ "#@markdown Instead of uploading, you can also specify the link here to automatically download maps from EMDB and other servers.\n",
+ "#@markdown Example: https://files.wwpdb.org/pub/emdb/structures/EMD-21051/map/emd_21051.map.gz\n",
+ "download_link = '' #@param {type:\"string\"}\n",
+ "#@markdown ```If you want to use author's example, just select the following box.```\n",
+ "if download_link!='':\n",
+ " root_dir = os.getcwd()\n",
+ " upload_dir = os.path.join(root_dir,rand_letters)\n",
+ " if not os.path.exists(upload_dir):\n",
+ " os.mkdir(upload_dir)\n",
+ " os.chdir(upload_dir)\n",
+ " os.system(\"wget %s\"%download_link)\n",
+ " parse_link=download_link.split(\"/\")[-1]\n",
+ " map_input_path = os.path.join(upload_dir,parse_link)\n",
+ " os.chdir(root_dir)\n",
+ " fasta_input_path = None\n",
+ "else:\n",
+ " use_author_example = True #@param {type:\"boolean\"}\n",
+ " if not use_author_example:\n",
+ " os.chdir(\"/content/CryoREAD\")\n",
+ " root_dir = os.getcwd()\n",
+ " upload_dir = os.path.join(root_dir,rand_letters)\n",
+ " if not os.path.exists(upload_dir):\n",
+ " os.mkdir(upload_dir)\n",
+ " os.chdir(upload_dir)\n",
+ " map_input = files.upload()\n",
+ " for fn in map_input.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(map_input[fn])))\n",
+ " map_input_path = os.path.abspath(fn)\n",
+ " print(\"Map save to %s\"%map_input_path)\n",
+ " os.chdir(root_dir)\n",
+ " fasta_input_path = None\n",
+ " else:\n",
+ " map_input_path = os.path.join(os.getcwd(),\"example\")\n",
+ " map_input_path = os.path.join(map_input_path,\"21051.mrc\")\n",
+ " fasta_input_path = os.path.join(os.getcwd(),\"example\")\n",
+ " fasta_input_path = os.path.join(fasta_input_path,\"21051.fasta\")\n",
+ " print(\"Autho Example is selected!\",map_input_path)"
+ ],
+ "metadata": {
+ "id": "kNmT6IyGjYd4",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title (Optional) Input Fasta File \n",
+ "#@markdown If you choose to use author's example, please **skip** this.
\n",
+ "#@markdown If your **sequence length** is longer than **500**, please ignore this step and just use no sequence mode since free-version colab can only run 2-3 hours per day with 2 CPUs.\n",
+ "#@markdown
Otherwise, please upload your fasta file.\n",
+ "#@markdown
**Support file format: .fasta**\n",
+ "\n",
+ "\n",
+ "from google.colab import files\n",
+ "import os\n",
+ "import os.path\n",
+ "import re\n",
+ "import hashlib\n",
+ "import random\n",
+ "import string\n",
+ "\n",
+ "rand_letters = string.ascii_lowercase\n",
+ "rand_letters = ''.join(random.choice(rand_letters) for i in range(20))\n",
+ "if use_author_example:\n",
+ " print(\"you have chosen to use author's example, you can not upload map files any more.\")\n",
+ "else:\n",
+ " root_dir = os.getcwd()\n",
+ " upload_dir = os.path.join(root_dir,rand_letters)\n",
+ " if not os.path.exists(upload_dir):\n",
+ " os.mkdir(upload_dir)\n",
+ " os.chdir(upload_dir)\n",
+ " fasta_input = files.upload()\n",
+ " for fn in fasta_input.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(fasta_input[fn])))\n",
+ " fasta_input_path = os.path.abspath(fn)\n",
+ " print(\"Fasta file save to %s\"%fasta_input_path)\n",
+ " os.chdir(root_dir)\n"
+ ],
+ "metadata": {
+ "id": "oYnMrYURkW7d",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "\n",
+ "#@title Specify Parameters \n",
+ "contour = 0.6 #@param {type:\"number\"}\n",
+ "#@markdown ```author (in EMDB) recommended contour level for the input map. Using contour level will not have any impact on the result, but can reduce the computation time by ignoring uninterested regions. ```\n",
+ "#@markdown
```If you are not sure the contour level, just use 0.```\n",
+ "#@markdown
```default:0. Suggested Range: [0,author_contour]```\n",
+ "stride = 32 #@param {type:\"number\"}\n",
+ "#@markdown Detailed explanation can be seen: [stride_definition](https://deepai.org/machine-learning-glossary-and-terms/stride)
\n",
+ "#@markdown ```stride step for scanning the cryo-EM map with a box size of 64. Increasing the stride can reduce the computation time but may lead to unreliable result. ```
``` default stride: 16(integer). Suggested values: [16,32,48].```\n",
+ "#@markdown
**If your job encounter disk space limit of colab, please increase stride to 32 or 48 to save time.**\n",
+ "\n",
+ "detection_only = 0 #@param {type:\"number\"}\n",
+ "#@markdown ```If you only want to get the predictions from CryoREAD, please change it to 1. Otherwise, leave it as 0.```\n",
+ "#@markdown ```You can also set it to 1 if you find whole process cannot be finished in colab in its limited time.```\n",
+ "use_sequence = 0 #@param {type:\"number\"}\n",
+ "#@markdown ```use sequence information to further refine base assignment or not. Default: 0. Because fragment-base assignment takes long time to finish, therefore we set it as 0 by default. However, for sequence length less than 500, it may be possible to finish in colab (2-3 hour limit per day). If you are confident your job can finish in colab, you can set it as 1 for better base assignment. ```\n",
+ "# resolution =3.7 #@param {type:\"number\"}\n",
+ "# #@markdown ```Resolution of Cryo-EM maps. Required for last step refinement. If you specify 0 here, we will skip refinement step.```\n",
+ "resolution = 0"
+ ],
+ "metadata": {
+ "id": "ufb_iHDAnWz_",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title Run CryoREAD \n",
+ "#@markdown Please allow 5min-2hours to get the output, since 3D input processing and inferencing takes some time.\n",
+ "#@markdown
Our running time is directly correlated to the size of the map.\n",
+ "#@markdown
If your map is too big, please run locally with our github code. If you don't have GPU resources, please make contact with us and we are happy to run it for you.\n",
+ "%cd /content/CryoREAD\n",
+ "!git pull origin main #make sure up to date\n",
+ "if detection_only:\n",
+ " command_line = \"python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 --prediction_only --stride=%d\"%(map_input_path,contour*0.5,stride)\n",
+ "elif use_sequence:\n",
+ " if resolution==0:\n",
+ " command_line = \"python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 -P=%s --rule_soft=0 --stride=%d\"%(map_input_path,contour*0.5,fasta_input_path,stride)\n",
+ " else:\n",
+ " command_line = \"python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 -P=%s --rule_soft=0 --resolution=%f --refine --colab --stride=%d\"%(map_input_path,contour*0.5,fasta_input_path,resolution,stride)\n",
+ "else:\n",
+ " if resolution==0:\n",
+ " command_line = \"python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 --no_seqinfo --stride=%d\"%(map_input_path,contour*0.5,stride)\n",
+ " else:\n",
+ " command_line = \"python main.py --mode=0 -F=%s -M=best_model --contour=%f --gpu=0 --batch_size=8 --no_seqinfo --resolution=%f --refine --colab --stride=%d\"%(map_input_path,contour*0.5,resolution,stride)\n",
+ "!echo $command_line\n",
+ "!$command_line\n",
+ "!echo \"INFO : CryoREAD Done\""
+ ],
+ "metadata": {
+ "id": "SMNz2SuPrkcR",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title Download Output \n",
+ "#@markdown The pdb file of predicted structure and detection probability map of CryoREAD will be compressed and downloaded. You can visualize your structure by Pymol and may further refine it by loading it into COOT.\n",
+ "from google.colab import files\n",
+ "import os, tarfile\n",
+ "import shutil\n",
+ "import zipfile\n",
+ "\n",
+ "zip_format = True #@param {type:\"boolean\"}\n",
+ "#@markdown If you want to download tar.gz format file, please not choose **zip_format** box.\n",
+ "map_name = os.path.split(map_input_path)[1].replace(\".mrc\", \"\")\n",
+ "map_name = map_name.replace(\".map\", \"\")\n",
+ "map_name = map_name.replace(\".gz\", \"\")\n",
+ "map_name = map_name.replace(\"(\",\"\").replace(\")\",\"\")\n",
+ "download_path = os.path.join(os.getcwd(),\"Predict_Result\")\n",
+ "user_download_path = os.path.join(download_path,map_name)\n",
+ "detection_download_path = os.path.join(user_download_path,\"2nd_stage_detection\")\n",
+ "tmp_download_dir = os.path.join(os.getcwd(),\"tmp\")\n",
+ "if not os.path.exists(tmp_download_dir):\n",
+ " os.mkdir(tmp_download_dir)\n",
+ "os.system(\"rm \"+str(tmp_download_dir)+\"/*\")\n",
+ "#get detection maps\n",
+ "for item in os.listdir(detection_download_path):\n",
+ " if \".mrc\" in item and \"chain\" in item:\n",
+ " shutil.copy(os.path.join(detection_download_path,item),os.path.join(tmp_download_dir,item))\n",
+ "if not detection_only:\n",
+ " #get structures\n",
+ " structure_download_path = user_download_path #os.path.join(user_download_path,\"Output\")\n",
+ " # if use_sequence:\n",
+ " # structure_download_path = os.path.join(structure_download_path,\"Output_Structure\")\n",
+ " # else:\n",
+ " # structure_download_path = os.path.join(structure_download_path,\"Output_Structure_noseq\")\n",
+ " for item in os.listdir(structure_download_path):\n",
+ " if \".pdb\" in item:\n",
+ " shutil.copy(os.path.join(structure_download_path,item),os.path.join(tmp_download_dir,item))\n",
+ "if zip_format:\n",
+ " tar_path = os.path.join(download_path,map_name+\"_cryoread.zip\")\n",
+ "else:\n",
+ " tar_path = os.path.join(download_path,map_name+\"_cryoread.tar.gz\")\n",
+ "def zip_file(tar_path,src_dir):\n",
+ " zip_name = tar_path\n",
+ " z = zipfile.ZipFile(zip_name,'w',zipfile.ZIP_DEFLATED)\n",
+ " for dirpath, dirnames, filenames in os.walk(src_dir):\n",
+ " fpath = dirpath.replace(src_dir,'')\n",
+ " fpath = fpath and fpath + os.sep or ''\n",
+ " for filename in filenames:\n",
+ " z.write(os.path.join(dirpath, filename),fpath+filename)\n",
+ " print ('==Compress Success!==',filename)\n",
+ " z.close()\n",
+ "\n",
+ "def make_targz(output_filename, source_dir):\n",
+ " \"\"\"\n",
+ " :param output_filename:\n",
+ " :param source_dir:\n",
+ " :return: bool\n",
+ " \"\"\"\n",
+ " try:\n",
+ " with tarfile.open(output_filename, \"w:gz\") as tar:\n",
+ " tar.add(source_dir, arcname=os.path.basename(source_dir))\n",
+ "\n",
+ " return True\n",
+ " except Exception as e:\n",
+ " print(e)\n",
+ " return False\n",
+ "if zip_format:\n",
+ " zip_file(tar_path,tmp_download_dir)\n",
+ "else:\n",
+ " make_targz(tar_path,tmp_download_dir)\n",
+ "files.download(tar_path)\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "Ab17KMMH25ru"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#@title CryoREAD Predicted Structure Visualization (3D) \n",
+ "#@markdown Limited by redistribution constraints, the structure here is not refined and may include atom clashes. If you want better structures, please use our server for full services: https://em.kiharalab.org/algorithm/CryoREAD.\n",
+ "#@markdown
Please **skip** this step if you choose detection_only.\n",
+ "#@markdown
To check the structure positions in map, please download the structure and visualize in coot, chimera or pymol to compare against the input map.\n",
+ "map_name = os.path.split(map_input_path)[1].replace(\".mrc\", \"\")\n",
+ "map_name = map_name.replace(\".map\", \"\")\n",
+ "map_name = map_name.replace(\".gz\", \"\")\n",
+ "map_name = map_name.replace(\"(\",\"\").replace(\")\",\"\")\n",
+ "download_path = os.path.join(os.getcwd(),\"Predict_Result\")\n",
+ "user_download_path = os.path.join(download_path,map_name)\n",
+ "# structure_download_path = os.path.join(user_download_path,\"graph_atomic_modeling\")\n",
+ "# if use_sequence:\n",
+ "# structure_download_path = os.path.join(structure_download_path,\"Output_Structure\")\n",
+ "# else:\n",
+ "# structure_download_path = os.path.join(structure_download_path,\"Output_Structure_noseq\")\n",
+ "# listfiles=[]\n",
+ "# for item in os.listdir(structure_download_path):\n",
+ "# if \".pdb\" in item:\n",
+ "# listfiles.append(os.path.join(structure_download_path,item))\n",
+ "# final_pdb_path=None\n",
+ "# if len(listfiles)==0:\n",
+ "# print(\"no pdb detected in the prediction directory\",structure_download_path)\n",
+ "# elif len(listfiles)==1:\n",
+ "# final_pdb_path=listfiles[0]\n",
+ "# else:\n",
+ "# for x in listfiles:\n",
+ "# if \"Refine\" in x:\n",
+ "# final_pdb_path=x\n",
+ "# if final_pdb_path is None:\n",
+ "# final_pdb_path=listfiles[0]\n",
+ "structure_download_path = os.path.join(user_download_path,\"Output\")\n",
+ "final_pdb_path = os.path.join(structure_download_path,\"Refine_cycle3.pdb\")\n",
+ "if not os.path.exists(final_pdb_path):\n",
+ " final_pdb_path = os.path.join(structure_download_path,\"Refine_cycle2.pdb\")\n",
+ " if not os.path.exists(final_pdb_path):\n",
+ " final_pdb_path = os.path.join(structure_download_path,\"Refine_cycle1.pdb\")\n",
+ " if not os.path.exists(final_pdb_path):\n",
+ " final_pdb_path = os.path.join(user_download_path,\"CryoREAD_norefine.pdb\")\n",
+ " else:\n",
+ " listfiles=[x for x in os.listdir(structure_download_path) if \".pdb\" in x]\n",
+ " if len(listfiles)!=0:\n",
+ " final_pdb_path = os.path.join(structure_download_path,listfiles[0])\n",
+ " else:\n",
+ " print(\"we do not find any pdb output in %s\"%structure_download_path)\n",
+ " #return\n",
+ "print(\"visualize %s\"%final_pdb_path)\n",
+ "import py3Dmol\n",
+ "def show_pdb(output_pdb_path):\n",
+ " view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)\n",
+ " view.addModel(open(output_pdb_path,'r').read(),'pdb')\n",
+ " #view.setStyle({\"style\":\"sticks\"})\n",
+ " view.setStyle({'stick':{}})\n",
+ " #view.setStyle({'cartoon': {'spectrum': {'prop':'b','min':-1,'max':1}}})\n",
+ " view.zoomTo()\n",
+ " return view\n",
+ "if final_pdb_path is not None:\n",
+ " show_pdb(final_pdb_path).show()\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "K4cZPeW279Zx"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Citation: \n",
+ "\n",
+ "Xiao Wang, Genki Terashi & Daisuke Kihara. De novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods, 2023.\n",
+ "Paper\n",
+ "```\n",
+ "@article{xiao2022CryoREAD, \n",
+ " title={De novo structure modeling for nucleic acids in cryo-EM maps using deep learning}, \n",
+ " author={Xiao Wang, Genki Terashi, and Daisuke Kihara}, \n",
+ " journal={Nature Methods}, \n",
+ " year={2023} \n",
+ "} \n",
+ "```"
+ ],
+ "metadata": {
+ "id": "KvmAjHl2n_jB"
+ }
+ }
+ ]
+}
\ No newline at end of file
diff --git a/CryoREAD/CryoREAD_tutorial.pptx b/CryoREAD/CryoREAD_tutorial.pptx
new file mode 100644
index 0000000..5ac8cbb
Binary files /dev/null and b/CryoREAD/CryoREAD_tutorial.pptx differ
diff --git a/CryoREAD/README.md b/CryoREAD/README.md
new file mode 100644
index 0000000..1468699
--- /dev/null
+++ b/CryoREAD/README.md
@@ -0,0 +1,372 @@
+# CryoREAD
+
+CryoREAD is a computational tool that uses deep learning to automatically build a full DNA/RNA atomic structure from a cryo-EM map.
+
+Copyright (C) 2022 Xiao Wang, Genki Terashi, Daisuke Kihara, and Purdue University.
+
+License: GPL v3. (If you are interested in a different license, for example, for commercial use, please contact us.)
+
+Contact: Daisuke Kihara (dkihara@purdue.edu)
+
+For technical problems or questions, please reach out to Xiao Wang (wang3702@purdue.edu).
+
+## Citation:
+
+Xiao Wang, Genki Terashi & Daisuke Kihara. De novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods, 2023.
+[https://www.nature.com/articles/s41592-023-02032-5](https://www.nature.com/articles/s41592-023-02032-5)
+```
+@article{wang2023CryoREAD,
+ title={De novo structure modeling for nucleic acids in cryo-EM maps using deep learning},
+  author={Wang, Xiao and Terashi, Genki and Kihara, Daisuke},
+ journal={Nature Methods},
+ year={2023}
+}
+```
+
+## News
+Apr 2024: CryoREAD now includes a new model supporting DNA/RNA structure modeling for input maps at 5-10 Å resolution. The model is trained on maps at 5-10 Å resolution and is used automatically once the input resolution falls in that range.
+
+# Online Platform:
+
+## Server (Recommended): https://em.kiharalab.org/algorithm/CryoREAD
+
+We have three publicly available platforms that offer essentially the same functionality.
+Input: cryo-EM map + sequence file (optional). Output: modeled structure. The input and output are the same across all platforms.
+
+
+### Google Colab: https://bit.ly/CryoREAD
+
+
+ Step-by-step instructions are available. Limited by the redistribution constraints of Coot and Phenix, the structure here is not refined and may include atom clashes. If you want a better structure, please use our [server](https://em.kiharalab.org/algorithm/CryoREAD) or GitHub. For free users, colab has a 4-hour running time limit and may not work for large structures (>=1000 nucleotides).
+
+
+
+### Local installation with source code at Github
+
+Full code is available here, and it is easier for users to modify it to develop their own tools.
+It provides two additional features:
+1. Detection output: this option outputs the probability values of detected phosphate, sugar, base, and base types, computed by deep learning, in the map, for the user's reference.
+2. Refinement pipeline: structures from other sources can be refined against the specified EM map.
+
+
+### Project website: https://kiharalab.org/emsuites
+### Detailed pipeline instructions can be found at https://kiharalab.org/emsuites/cryoread.php
+### CryoREAD algorithm video (20 minutes): https://www.youtube.com/watch?v=p7Bpou2vL6o
+### For benchmarking purposes, please check the [eval_code](https://github.com/kiharalab/CryoREAD#predicted-structure-evaluation) for predicted structure evaluation.
+
+## Introduction
+
+CryoREAD is a computational tool that uses deep learning to automatically build a full DNA/RNA atomic structure from a cryo-EM map.
+DNA and RNA play fundamental roles in various cellular processes, where the three-dimensional (3D) structure provides critical information for understanding the molecular mechanisms of their functions. Although an increasing number of structures of nucleic acids and their complexes with proteins are determined by cryogenic electron microscopy (cryo-EM), structure modeling for DNA and RNA is still often challenging, particularly when the map is determined at sub-atomic resolution. Moreover, computational methods for nucleic acid structure modeling are sparse.
+
+Here, we developed CryoREAD, a deep learning-based, fully automated de novo DNA/RNA atomic structure modeling method. CryoREAD identifies phosphate, sugar, and base positions in a cryo-EM map using deep learning, which are then traced and modeled into a 3D structure. When tested on cryo-EM maps determined at 2.0 to 5.0 Å resolution, CryoREAD built substantially more accurate models than existing methods. We have further applied the method to cryo-EM maps of biomolecular complexes in SARS-CoV-2.
+
+
+## Overall Protocol
+
+
+1) Structure detection by the CryoREAD deep neural networks;
+2) Backbone tracing according to the detections;
+3) Fragment-based nucleotide assignment;
+4) Full atomic structure modeling.
+
+## System Requirements
+CPU: >=8 cores
+Memory (RAM): >=50 GB. For maps with more than 3,000 nucleotides, more than 200 GB of memory is needed if the sequence is provided.
+GPU: any CUDA-capable GPU with at least 12 GB of memory.
+A GPU is required for CryoREAD; no CPU version is available since it would be too slow.
+
+## Pre-required software
+### Required
+Python 3 : https://www.python.org/downloads/
+Phenix: https://phenix-online.org/documentation/install-setup-run.html
+Coot: https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
+### Optional
+Pymol (for map visualization): https://pymol.org/2/
+Chimera (for map visualization): https://www.cgl.ucsf.edu/chimera/download.html
+
+## Installation
+### 1. [`Install git`](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
+### 2. Clone the repository in your computer
+```
+git clone https://github.com/kiharalab/CryoREAD.git && cd CryoREAD
+```
+
+### 3. Build dependencies.
+You have two options to install the dependencies on your computer:
+#### 3.1 Install with anaconda (Recommended)
+##### 3.1.1 [`Install anaconda`](https://www.anaconda.com/download).
+##### 3.1.2 Install dependencies in the command line
+Make sure you are in the CryoREAD directory and then run
+```
+conda env create -f environment.yml
+```
+Each time you want to run this software, simply activate the environment with
+```
+conda activate CryoREAD
+conda deactivate  # run this when you want to exit the environment
+```
+
+
+#### 3.2 Install with pip and python (Not Suggested).
+##### 3.2.1 [`Install pip`](https://pip.pypa.io/en/stable/installing/).
+##### 3.2.2 Install dependencies in the command line.
+```
+pip3 install -r requirements.txt --user
+```
+If you encounter any errors, you can install each library one by one:
+```
+pip3 install biopython
+pip3 install numpy
+pip3 install numba
+pip3 install scipy
+pip3 install ortools
+pip3 install mrcfile
+pip3 install torch==1.6.0
+```
+
+
+
+### 4. Verify the pre-installed software
+To verify that Phenix is correctly installed for the final refinement step, please run
+```
+phenix.real_space_refine -h
+```
+To verify that Coot is correctly installed for the final refinement step, please run
+```commandline
+coot
+```
+If these print the help information, the refinement step of our program is supported.
+**If not, please remove the --refine flag from all commands; CryoREAD will then output the structure without refinement.**
+
+
+
+
+
+
+
+# Usage
+
+### Command
+
+Command Parameters
+
+```
+usage: main.py [-h] [-F F] [-M M] [-P P] --mode MODE [--contour CONTOUR] [--stride STRIDE] [--box_size BOX_SIZE] [--gpu GPU] [--batch_size BATCH_SIZE] [-f F] [-m M]
+ [-g G] [-k K] [-R R] [--rule_soft RULE_SOFT] [--frag_size FRAG_SIZE] [--frag_stride FRAG_STRIDE] [--top_select TOP_SELECT] [--resolution RESOLUTION]
+ [--num_workers NUM_WORKERS] [--prediction_only PREDICTION_ONLY] [--no_seqinfo NO_SEQINFO]
+
+optional arguments:
+ -h, --help show this help message and exit
+ -F F Input map file path. (str)
+ -M M Pre-trained model path. (str) Default value: "best_model". If you want to reproduce the results in our paper, you can specify "best_model_paper". Here the default path is the new model trained on the entire dataset.
+ -P P Optional fasta sequence file path. (str)
+ --mode MODE Control Mode for program: 0: cryo_READ structure modeling. Required parameter. (Integer), Default value: 0
+ --contour CONTOUR Contour level for input map, suggested 0.5*[author_contour]. (Float), Default value: 0.0
+ --stride STRIDE Stride for scanning of deep learning model. (Integer), Default value: 16.
+ --box_size BOX_SIZE Input box size for deep learning model. (Integer), Default value: 64
+ --gpu GPU Specify the gpu we will use. (str), Default value: None.
+ --batch_size BATCH_SIZE
+ Batch size for inference of network. (Integer), Default value: 4.
+ -f F Filter for representative points, for LDPs, removing points' normalized density<=-f (Float), Default value: 0.05
+ -m M After meanshifting merge points distance<[float]. (Float), Default value: 2.0.
+ -g G Bandwidth of the Gaussian filter, (Float), Default value: 3.0.
+ -k K Always keep edges where d
+
+
+
+### Build an atomic structure without sequence information
+
+ DNA/RNA structure modeling without FASTA sequence
+
+```
+python3 main.py --mode=0 -F=[Map_Path] -M=[Model_Path] --contour=[half_contour_level] --gpu=[GPU_ID] --batch_size=[batch_size] --resolution=[Map_Resolution] --no_seqinfo --refine
+```
+[Map_Path] is the path of the experimental cryo-EM map,
+[Model_Path] is the path of our pre-trained deep learning model (if you want to reproduce the results in our paper, you can specify "best_model_paper"; the default path is the new model trained on the entire dataset),
+[half_contour_level] is 0.5 * contour_level (suggested by the author) to remove outside regions and save processing time,
+[GPU_ID] specifies the gpu used for inference,
+[batch_size] is the number of examples per batch in the inference (we used 8 with a 24GB GPU),
+[Map_Resolution] is the resolution of the deposited map.
+
+"--refine" should be removed if you cannot install Phenix/Coot correctly; without refinement, some nucleotides may not satisfy geometry and chemical constraints.
+
+The automatically built atomic structure is saved in [Predict_Result/(map-name)/CryoREAD_noseq.pdb] (or [Predict_Result/(map-name)/CryoREAD_norefine.pdb] if you do not add the --refine param) in pdb format. You can also add ```--output=[your_directory]``` to specify the output directory; the output will then be saved in [output_dir/CryoREAD_noseq.pdb].
+
+#### Example Command:
+```
+python3 main.py --mode=0 -F=example/21051.mrc -M=best_model --contour=0.3 --gpu=0 --batch_size=4 --resolution=3.7 --no_seqinfo --refine
+```
+
+
+
+### Build an atomic structure with sequence information
+
+
+ DNA/RNA structure modeling with FASTA sequence
+
+```
+python3 main.py --mode=0 -F=[Map_Path] -M=[Model_Path] -P=[Fasta_Path] --contour=[half_contour_level] --gpu=[GPU_ID] --batch_size=[batch_size] --rule_soft=[assignment_rule] --resolution=[Map_Resolution] --refine --thread=[num_threads]
+```
+[Map_Path] is the path of the experimental cryo-EM map,
+[Model_Path] is the path of our pre-trained deep learning model (if you want to reproduce the results in our paper, you can specify "best_model_paper"; the default path is the new model trained on the entire dataset),
+[Fasta_Path] is the path of the input fasta file with the sequence information,
+[half_contour_level] is 0.5 * contour_level (suggested by the author) to remove outside regions and save processing time,
+[GPU_ID] specifies the gpu used for inference,
+[batch_size] is the number of examples per batch in the inference (we used 8 with a 24GB GPU),
+[rule_soft] specifies the assignment rule; the default is 0, which uses the strict assignment assembling rule,
+[Map_Resolution] is the resolution of the deposited map,
+[num_threads] specifies the number of CPUs used for fragment-based sequence assignment.
+
+"--refine" should be removed if you cannot install Phenix/Coot correctly; without refinement, some nucleotides may not satisfy geometry and chemical constraints.
+
+
+
+#### Example Command:
+```
+python3 main.py --mode=0 -F=example/21051.mrc -M=best_model -P=example/21051.fasta --contour=0.3 --gpu=0 --batch_size=4 --rule_soft=0 --resolution=3.7 --refine --thread 4
+```
+The automatically built atomic structure is saved in [Predict_Result/(map-name)/CryoREAD.pdb] in pdb format (or [Predict_Result/(map-name)/CryoREAD_norefine.pdb] if you do not add the --refine param). Modeled structures without sequence information are also saved as [Predict_Result/(map-name)/Output/CryoREAD_noseq.pdb] (without refinement). Meanwhile, structures considering only the sequence information without connecting gap regions are saved in [Predict_Result/(map-name)/Output/CryoREAD_seqonly.pdb] (without refinement) for reference.
+Please adjust --thread based on the number of available CPUs (more is better).
+
+
+
+### Structure Information Detection by CryoREAD
+
+
+CryoREAD detection (if you only want to check detection by deep learning)
+
+```
+python3 main.py --mode=0 -F=[Map_Path] -M=[Model_Path] --contour=[half_contour_level] --gpu=[GPU_ID] --batch_size=[batch_size] --prediction_only
+```
+[Map_Path] is the path of the experimental cryo-EM map,
+[Model_Path] is the path of our pre-trained deep learning model (if you want to reproduce the results in our paper, you can specify "best_model_paper"; the default path is the new model trained on the entire dataset),
+[half_contour_level] is 0.5 * contour_level (suggested by the author) to remove outside regions and save processing time,
+[GPU_ID] specifies the gpu used for inference,
+[batch_size] is the number of examples per batch in the inference (we used 8 with a 24GB GPU).
+
+The predicted probability maps are saved in [Predict_Result/(map_name)/2nd_stage_detection] in mrc format. This includes 8 mrc files corresponding to the 8 different classes.
+
+#### Example Command:
+```
+python3 main.py --mode=0 -F=example/21051.mrc -M=best_model --contour=0.3 --gpu=0 --batch_size=4 --prediction_only
+```
+
+
+### Structure refinement
+
+
+Structure Refinement Pipeline in CryoREAD (if you only want to refine your structure)
+
+The full refinement pipeline involving Phenix and coot is also available for refinement-only purposes.
+```
+python3 main.py --mode=1 -F=[input_structure_pdb] -M=[input_map_path] -P=[output_dir] --resolution=[resolution]
+```
+This refinement pipeline works for any given structure (not limited to DNA/RNA) and a corresponding map.
+[input_structure_pdb] is the path of the input structure in pdb format,
+[input_map_path] is the path of the input map,
+[output_dir] is the directory you specify to save the outputs of the refinement process; the final output Refine_cycle3.pdb will be generated in this directory,
+[resolution] is the resolution of the deposited map.
+
+#### Example Command:
+```
+python3 main.py --mode=1 -F=example/6v5b_drna.pdb -M=example/21051.mrc -P=refine_test --resolution=3.7
+```
+This will refine the input structure according to density and output the refined structure in [refine_test] directory.
+
+
+
+### Predicted Structure Evaluation
+
+
+Structure Evaluation in CryoREAD (if you only want to compare your predicted structure with native structure)
+
+We provide the evaluation pipeline used in CryoREAD, which uses 5 Å as the cutoff. Different from Phenix, we match nucleotides based on the average distance of the backbone atoms of nucleotides instead of only using P atoms, which is less stable and accurate.
+
+To use our evaluation pipeline, please use
+```
+python3 main.py --mode=2 -F=predicted.cif[.pdb] -M=target.cif[.pdb]
+```
+Here -F takes the predicted pdb/cif file and -M takes the native pdb/cif file. An example output is as follows:
+```
+****************************************************************************************************
+Atom Coverage: 0.940 Atom Precision: 0.872
+Sequence Recall(Match): 0.600 Sequence Precision(Match): 0.562
+Sequence Recall: 0.564 Sequence Precision: 0.490
+RMSD: 3.010
+****************************************************************************************************
+```
+All the reported metrics in our paper can be calculated.
+
+Atom Coverage is computed as the fraction of sugar atoms (C1′, C2′, O2′ (RNA only), C3′, O3′, C4′, O4′, C5′) and phosphate atoms (P, OP1, OP2, O5′, OP3) that were closer than 5 Å to a matched sugar or phosphate node for each nucleotide, which was then averaged over all the nucleotides in the map.
+Atom Precision is computed as the fraction of nodes within 5 Å of the corresponding atoms of their closest nucleotides.
+
+A cutoff of 5 Å was used because it is shorter than the average distance between adjacent phosphate atoms, which is 6.0 Å.
+
+Sequence recall is computed by identifying a nucleotide in the model that corresponds to each nucleotide in the reference structure. This identification is done by assigning the nucleotide in the model that has the closest average atom distance to each nucleotide in the reference structure. Then, it is determined whether the bases are identical. Finally, the fraction of identical bases over all the bases in the reference structure is computed.
+
+Sequence recall (match) only considers nucleotides in the reference structure that have a corresponding nucleotide in the model (an average atom pair distance of less than 5 Å).
+
+Sequence precision is defined as the fraction of the identical bases over all the bases in the model.
+
+Sequence precision (match) only considers nucleotides in the model structure that have a corresponding nucleotide in the reference (an average atom pair distance of less than 5 Å).
+
+RMSD is calculated between backbone atoms of nucleotides in the model structure that have a corresponding nucleotide in the reference.
+
+For a comprehensive assessment, it is important to use multiple metrics to evaluate the accuracy of DNA/RNA structure modeling. Each metric provides different aspects of the modeling performance.
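+
+For intuition, the following sketch (ours, not the actual eval_code; the function name `coverage_and_precision` and the toy data are illustrative assumptions) shows how such a distance-cutoff matching between reference and model backbone atoms can be computed with a KD-tree:
+```
+import numpy as np
+from scipy.spatial import cKDTree
+
+CUTOFF = 5.0  # Angstrom; shorter than the ~6.0 A distance between adjacent P atoms
+
+def coverage_and_precision(ref_atoms, model_atoms, cutoff=CUTOFF):
+    """ref_atoms: (N,3) reference coords; model_atoms: (M,3) model coords."""
+    # coverage: fraction of reference atoms with a model atom within the cutoff
+    d_ref, _ = cKDTree(model_atoms).query(ref_atoms, k=1)
+    # precision: fraction of model atoms with a reference atom within the cutoff
+    d_model, _ = cKDTree(ref_atoms).query(model_atoms, k=1)
+    return float(np.mean(d_ref <= cutoff)), float(np.mean(d_model <= cutoff))
+
+rng = np.random.default_rng(0)
+ref = rng.uniform(0.0, 50.0, size=(100, 3))         # toy "reference" atoms
+model = ref + rng.normal(0.0, 2.0, size=ref.shape)  # noisy toy "model" atoms
+cov, prec = coverage_and_precision(ref, model)
+print("Atom Coverage: %.3f  Atom Precision: %.3f" % (cov, prec))
+```
+The real pipeline additionally groups atoms by nucleotide, averages per nucleotide, and matches nucleotides by average backbone-atom distance as described above.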
+
+
+
+## Example
+
+
+
+### Input File
+Cryo-EM map with mrc format.
+(Optional) Sequence information with fasta format.
+Our example input can be found [here](https://github.com/kiharalab/CryoREAD/tree/main/example)
+
+### Output File
+1 *.mrc: mrc files saving the probabilities detected by our deep learning model.
+2 *.pdb: a PDB file that stores the atomic DNA/RNA structure built by our method.
+Our example output can be found [here](https://kiharalab.org/emsuites/cryoread/output_21051.tar.gz). All the intermediate results are also kept there.
+
diff --git a/CryoREAD/atomic/A.pdb b/CryoREAD/atomic/A.pdb
new file mode 100644
index 0000000..869b19c
--- /dev/null
+++ b/CryoREAD/atomic/A.pdb
@@ -0,0 +1,25 @@
+ATOM 1 OP3 A A 1 2.890 -0.774 -6.042 0.00 0.00 O
+ATOM 2 P A A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 A A 1 2.388 1.557 -5.217 0.00 0.00 O
+ATOM 4 OP2 A A 1 0.572 0.372 -6.507 0.00 0.00 O
+ATOM 5 O5' A A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' A A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' A A 1 -0.346 0.080 -2.267 0.00 0.00 C
+ATOM 8 O4' A A 1 0.691 -0.016 -1.267 0.00 0.00 O
+ATOM 9 C3' A A 1 -1.350 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' A A 1 -2.690 0.727 -2.016 0.00 0.00 O
+ATOM 11 C2' A A 1 -1.119 1.051 -0.171 0.00 0.00 C
+ATOM 12 O2' A A 1 -2.310 0.638 0.502 0.00 0.00 O
+ATOM 13 C1' A A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N9 A A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C8 A A 1 2.020 1.180 0.943 0.00 0.00 C
+ATOM 16 N7 A A 1 2.598 1.330 2.099 0.00 0.00 N
+ATOM 17 C5 A A 1 1.898 0.659 3.044 0.00 0.00 C
+ATOM 18 C6 A A 1 2.045 0.458 4.427 0.00 0.00 C
+ATOM 19 N6 A A 1 3.099 1.031 5.117 0.00 0.00 N
+ATOM 20 N1 A A 1 1.146 -0.289 5.058 0.00 0.00 N
+ATOM 21 C2 A A 1 0.138 -0.839 4.407 0.00 0.00 C
+ATOM 22 N3 A A 1 -0.037 -0.684 3.112 0.00 0.00 N
+ATOM 23 C4 A A 1 0.811 0.047 2.397 0.00 0.00 C
+TER 24 A A 1
+END
diff --git a/CryoREAD/atomic/C.pdb b/CryoREAD/atomic/C.pdb
new file mode 100644
index 0000000..c523792
--- /dev/null
+++ b/CryoREAD/atomic/C.pdb
@@ -0,0 +1,23 @@
+ATOM 1 OP3 C A 1 2.890 -0.774 -6.042 0.00 0.00 O
+ATOM 2 P C A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 C A 1 2.388 1.558 -5.216 0.00 0.00 O
+ATOM 4 OP2 C A 1 0.571 0.372 -6.507 0.00 0.00 O
+ATOM 5 O5' C A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' C A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' C A 1 -0.347 0.080 -2.268 0.00 0.00 C
+ATOM 8 O4' C A 1 0.691 -0.017 -1.268 0.00 0.00 O
+ATOM 9 C3' C A 1 -1.350 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' C A 1 -2.690 0.726 -2.017 0.00 0.00 O
+ATOM 11 C2' C A 1 -1.118 1.051 -0.170 0.00 0.00 C
+ATOM 12 O2' C A 1 -2.310 0.639 0.502 0.00 0.00 O
+ATOM 13 C1' C A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N1 C A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C2 C A 1 1.561 1.571 1.001 0.00 0.00 C
+ATOM 16 O2 C A 1 1.381 2.297 0.037 0.00 0.00 O
+ATOM 17 N3 C A 1 2.398 1.958 1.963 0.00 0.00 N
+ATOM 18 C4 C A 1 2.613 1.188 3.022 0.00 0.00 C
+ATOM 19 N4 C A 1 3.479 1.595 4.009 0.00 0.00 N
+ATOM 20 C5 C A 1 1.952 -0.053 3.126 0.00 0.00 C
+ATOM 21 C6 C A 1 1.100 -0.428 2.143 0.00 0.00 C
+TER 22 C A 1
+END
diff --git a/CryoREAD/atomic/DA.pdb b/CryoREAD/atomic/DA.pdb
new file mode 100644
index 0000000..06f45cc
--- /dev/null
+++ b/CryoREAD/atomic/DA.pdb
@@ -0,0 +1,24 @@
+ATOM 1 OP3 A A 1 2.890 -0.774 -6.042 0.00 0.00 O
+ATOM 2 P A A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 A A 1 2.388 1.557 -5.217 0.00 0.00 O
+ATOM 4 OP2 A A 1 0.572 0.372 -6.507 0.00 0.00 O
+ATOM 5 O5' A A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' A A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' A A 1 -0.346 0.080 -2.267 0.00 0.00 C
+ATOM 8 O4' A A 1 0.691 -0.016 -1.267 0.00 0.00 O
+ATOM 9 C3' A A 1 -1.350 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' A A 1 -2.690 0.727 -2.016 0.00 0.00 O
+ATOM 11 C2' A A 1 -1.119 1.051 -0.171 0.00 0.00 C
+ATOM 13 C1' A A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N9 A A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C8 A A 1 2.020 1.180 0.943 0.00 0.00 C
+ATOM 16 N7 A A 1 2.598 1.330 2.099 0.00 0.00 N
+ATOM 17 C5 A A 1 1.898 0.659 3.044 0.00 0.00 C
+ATOM 18 C6 A A 1 2.045 0.458 4.427 0.00 0.00 C
+ATOM 19 N6 A A 1 3.099 1.031 5.117 0.00 0.00 N
+ATOM 20 N1 A A 1 1.146 -0.289 5.058 0.00 0.00 N
+ATOM 21 C2 A A 1 0.138 -0.839 4.407 0.00 0.00 C
+ATOM 22 N3 A A 1 -0.037 -0.684 3.112 0.00 0.00 N
+ATOM 23 C4 A A 1 0.811 0.047 2.397 0.00 0.00 C
+TER 24 A A 1
+END
diff --git a/CryoREAD/atomic/DC.pdb b/CryoREAD/atomic/DC.pdb
new file mode 100644
index 0000000..b8bd71f
--- /dev/null
+++ b/CryoREAD/atomic/DC.pdb
@@ -0,0 +1,22 @@
+ATOM 1 OP3 C A 1 2.890 -0.774 -6.042 0.00 0.00 O
+ATOM 2 P C A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 C A 1 2.388 1.558 -5.216 0.00 0.00 O
+ATOM 4 OP2 C A 1 0.571 0.372 -6.507 0.00 0.00 O
+ATOM 5 O5' C A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' C A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' C A 1 -0.347 0.080 -2.268 0.00 0.00 C
+ATOM 8 O4' C A 1 0.691 -0.017 -1.268 0.00 0.00 O
+ATOM 9 C3' C A 1 -1.350 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' C A 1 -2.690 0.726 -2.017 0.00 0.00 O
+ATOM 11 C2' C A 1 -1.118 1.051 -0.170 0.00 0.00 C
+ATOM 13 C1' C A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N1 C A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C2 C A 1 1.561 1.571 1.001 0.00 0.00 C
+ATOM 16 O2 C A 1 1.381 2.297 0.037 0.00 0.00 O
+ATOM 17 N3 C A 1 2.398 1.958 1.963 0.00 0.00 N
+ATOM 18 C4 C A 1 2.613 1.188 3.022 0.00 0.00 C
+ATOM 19 N4 C A 1 3.479 1.595 4.009 0.00 0.00 N
+ATOM 20 C5 C A 1 1.952 -0.053 3.126 0.00 0.00 C
+ATOM 21 C6 C A 1 1.100 -0.428 2.143 0.00 0.00 C
+TER 22 C A 1
+END
diff --git a/CryoREAD/atomic/DG.pdb b/CryoREAD/atomic/DG.pdb
new file mode 100644
index 0000000..3f4f02d
--- /dev/null
+++ b/CryoREAD/atomic/DG.pdb
@@ -0,0 +1,25 @@
+ATOM 1 OP3 G A 1 2.890 -0.774 -6.041 0.00 0.00 O
+ATOM 2 P G A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 G A 1 2.387 1.557 -5.218 0.00 0.00 O
+ATOM 4 OP2 G A 1 0.571 0.373 -6.507 0.00 0.00 O
+ATOM 5 O5' G A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' G A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' G A 1 -0.346 0.080 -2.267 0.00 0.00 C
+ATOM 8 O4' G A 1 0.691 -0.016 -1.267 0.00 0.00 O
+ATOM 9 C3' G A 1 -1.349 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' G A 1 -2.690 0.727 -2.017 0.00 0.00 O
+ATOM 11 C2' G A 1 -1.118 1.051 -0.170 0.00 0.00 C
+ATOM 13 C1' G A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N9 G A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C8 G A 1 2.021 1.181 0.942 0.00 0.00 C
+ATOM 16 N7 G A 1 2.598 1.330 2.099 0.00 0.00 N
+ATOM 17 C5 G A 1 1.896 0.651 3.040 0.00 0.00 C
+ATOM 18 C6 G A 1 2.047 0.459 4.433 0.00 0.00 C
+ATOM 19 O6 G A 1 2.978 0.966 5.036 0.00 0.00 O
+ATOM 20 N1 G A 1 1.131 -0.295 5.079 0.00 0.00 N
+ATOM 21 C2 G A 1 0.100 -0.862 4.393 0.00 0.00 C
+ATOM 22 N2 G A 1 -0.811 -1.629 5.075 0.00 0.00 N
+ATOM 23 N3 G A 1 -0.052 -0.691 3.100 0.00 0.00 N
+ATOM 24 C4 G A 1 0.812 0.048 2.394 0.00 0.00 C
+TER 25 G A 1
+END
diff --git a/CryoREAD/atomic/DT.pdb b/CryoREAD/atomic/DT.pdb
new file mode 100644
index 0000000..8121105
--- /dev/null
+++ b/CryoREAD/atomic/DT.pdb
@@ -0,0 +1,23 @@
+ATOM 1 OP3 U A 1 -3.912 -2.311 1.636 0.00 0.00 O
+ATOM 2 P U A 1 -3.968 -1.665 3.118 0.00 0.00 P
+ATOM 3 OP1 U A 1 -4.406 -2.599 4.208 0.00 0.00 O
+ATOM 4 OP2 U A 1 -4.901 -0.360 2.920 0.00 0.00 O
+ATOM 5 O5' U A 1 -2.493 -1.028 3.315 0.00 0.00 O
+ATOM 6 C5' U A 1 -2.005 -0.136 2.327 0.00 0.00 C
+ATOM 7 C4' U A 1 -0.611 0.328 2.728 0.00 0.00 C
+ATOM 8 O4' U A 1 0.247 -0.829 2.764 0.00 0.00 O
+ATOM 9 C3' U A 1 0.008 1.286 1.720 0.00 0.00 C
+ATOM 10 O3' U A 1 0.965 2.121 2.368 0.00 0.00 O
+ATOM 11 C2' U A 1 0.710 0.360 0.754 0.00 0.00 C
+ATOM 13 C1' U A 1 1.157 -0.778 1.657 0.00 0.00 C
+ATOM 14 N1 U A 1 1.164 -2.047 0.989 0.00 0.00 N
+ATOM 15 C2 U A 1 2.333 -2.544 0.374 0.00 0.00 C
+ATOM 16 O2 U A 1 3.410 -1.945 0.363 0.00 0.00 O
+ATOM 17 N3 U A 1 2.194 -3.793 -0.240 0.00 0.00 N
+ATOM 18 C4 U A 1 1.047 -4.570 -0.300 0.00 0.00 C
+ATOM 19 O4 U A 1 0.995 -5.663 -0.857 0.00 0.00 O
+ATOM 20 C5 U A 1 -0.143 -3.980 0.369 0.00 0.00 C
+ATOM 21 C7 U A 1 -1.420 -4.757 0.347 0.00 0.00 C
+ATOM 22 C6 U A 1 -0.013 -2.784 0.958 0.00 0.00 C
+TER 23 U A 1
+END
diff --git a/CryoREAD/atomic/G.pdb b/CryoREAD/atomic/G.pdb
new file mode 100644
index 0000000..727bdd0
--- /dev/null
+++ b/CryoREAD/atomic/G.pdb
@@ -0,0 +1,26 @@
+ATOM 1 OP3 G A 1 2.890 -0.774 -6.041 0.00 0.00 O
+ATOM 2 P G A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 G A 1 2.387 1.557 -5.218 0.00 0.00 O
+ATOM 4 OP2 G A 1 0.571 0.373 -6.507 0.00 0.00 O
+ATOM 5 O5' G A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' G A 1 0.235 0.576 -3.592 0.00 0.00 C
+ATOM 7 C4' G A 1 -0.346 0.080 -2.267 0.00 0.00 C
+ATOM 8 O4' G A 1 0.691 -0.016 -1.267 0.00 0.00 O
+ATOM 9 C3' G A 1 -1.349 1.106 -1.698 0.00 0.00 C
+ATOM 10 O3' G A 1 -2.690 0.727 -2.017 0.00 0.00 O
+ATOM 11 C2' G A 1 -1.118 1.051 -0.170 0.00 0.00 C
+ATOM 12 O2' G A 1 -2.310 0.638 0.502 0.00 0.00 O
+ATOM 13 C1' G A 1 0.000 -0.000 -0.000 0.00 0.00 C
+ATOM 14 N9 G A 1 0.913 0.396 1.074 0.00 0.00 N
+ATOM 15 C8 G A 1 2.021 1.181 0.942 0.00 0.00 C
+ATOM 16 N7 G A 1 2.598 1.330 2.099 0.00 0.00 N
+ATOM 17 C5 G A 1 1.896 0.651 3.040 0.00 0.00 C
+ATOM 18 C6 G A 1 2.047 0.459 4.433 0.00 0.00 C
+ATOM 19 O6 G A 1 2.978 0.966 5.036 0.00 0.00 O
+ATOM 20 N1 G A 1 1.131 -0.295 5.079 0.00 0.00 N
+ATOM 21 C2 G A 1 0.100 -0.862 4.393 0.00 0.00 C
+ATOM 22 N2 G A 1 -0.811 -1.629 5.075 0.00 0.00 N
+ATOM 23 N3 G A 1 -0.052 -0.691 3.100 0.00 0.00 N
+ATOM 24 C4 G A 1 0.812 0.048 2.394 0.00 0.00 C
+TER 25 G A 1
+END
diff --git a/CryoREAD/atomic/U.pdb b/CryoREAD/atomic/U.pdb
new file mode 100644
index 0000000..dd2e4ca
--- /dev/null
+++ b/CryoREAD/atomic/U.pdb
@@ -0,0 +1,23 @@
+ATOM 1 OP3 U A 1 2.891 -0.774 -6.042 0.00 0.00 O
+ATOM 2 P U A 1 1.779 0.230 -5.452 0.00 0.00 P
+ATOM 3 OP1 U A 1 2.389 1.557 -5.217 0.00 0.00 O
+ATOM 4 OP2 U A 1 0.573 0.372 -6.508 0.00 0.00 O
+ATOM 5 O5' U A 1 1.211 -0.353 -4.063 0.00 0.00 O
+ATOM 6 C5' U A 1 0.235 0.577 -3.593 0.00 0.00 C
+ATOM 7 C4' U A 1 -0.348 0.080 -2.270 0.00 0.00 C
+ATOM 8 O4' U A 1 0.690 -0.016 -1.267 0.00 0.00 O
+ATOM 9 C3' U A 1 -1.352 1.104 -1.697 0.00 0.00 C
+ATOM 10 O3' U A 1 -2.692 0.721 -2.011 0.00 0.00 O
+ATOM 11 C2' U A 1 -1.118 1.052 -0.172 0.00 0.00 C
+ATOM 12 O2' U A 1 -2.308 0.639 0.504 0.00 0.00 O
+ATOM 13 C1' U A 1 -0.001 -0.001 0.001 0.00 0.00 C
+ATOM 14 N1 U A 1 0.913 0.397 1.076 0.00 0.00 N
+ATOM 15 C2 U A 1 1.587 1.555 0.981 0.00 0.00 C
+ATOM 16 O2 U A 1 1.429 2.264 0.007 0.00 0.00 O
+ATOM 17 N3 U A 1 2.429 1.947 1.957 0.00 0.00 N
+ATOM 18 C4 U A 1 2.616 1.171 3.045 0.00 0.00 C
+ATOM 19 O4 U A 1 3.381 1.523 3.925 0.00 0.00 O
+ATOM 20 C5 U A 1 1.915 -0.053 3.153 0.00 0.00 C
+ATOM 21 C6 U A 1 1.073 -0.418 2.163 0.00 0.00 C
+TER 22 U A 1
+END
diff --git a/CryoREAD/atomic/__init__.py b/CryoREAD/atomic/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/atomic/geo_atomic_modeling.py b/CryoREAD/atomic/geo_atomic_modeling.py
new file mode 100644
index 0000000..59b63be
--- /dev/null
+++ b/CryoREAD/atomic/geo_atomic_modeling.py
@@ -0,0 +1,288 @@
+from graph.LDP_ops import permute_point_coord_to_global_coord,permute_global_coord_to_point_coord
+
+
+_base_atoms = ["OP3","P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
+residue_atoms = {
+ 'A': ["N9", "C8", "N7", "C5", "C6", "N6", "N1", "C2", "N3", "C4"],
+ 'G': ["N9", "C8", "N7", "C5", "C6", "O6", "N1", "C2", "N2", "N3", "C4"],
+ 'C': ["N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6"],
+ 'U': ["N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6"],
+}
+_pho_atoms=["OP3", "P", "OP1", "OP2", "O5'",]
+_sugar_atoms = [ "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
+_backbone_atoms=['P',"O5'","C5'", "C4'","C3'", "O3'" ]
+import numpy as np
+def calculate_point_distance(a, b):
+    dist = np.linalg.norm(np.array(a) - np.array(b))
+    return dist
+def calculate_cosine_angle(sugar_location,point_pho_location,point_next_pho_location):
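+    # law of cosines: returns the cosine of the angle at vertex point_pho_location
+    # in the triangle (sugar_location, point_pho_location, point_next_pho_location)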
+ dist1 = calculate_point_distance(sugar_location,point_pho_location)
+ dist2 = calculate_point_distance(point_pho_location,point_next_pho_location)
+ dist3 = calculate_point_distance(sugar_location,point_next_pho_location)
+ cos_value = (dist1**2+dist2**2-dist3**2)/(2*dist1*dist2)
+ return cos_value
+def read_refer_dict(pdb_path):
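+    """Parse a template nucleotide PDB (fixed-column format) and precompute the
+    geometry used to place its base relative to the backbone: returns the
+    atom-name -> coordinate dict, the backbone atom coordinates, the ratio for
+    projecting the base center onto the P1->P2 axis, the perpendicular
+    base-center direction, and the P -> base-center direction."""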
+ pho_origin_dict = {}
+ sugar_origin_dict = {}
+ base_origin_dict = {}
+ base_location_list=[]
+ all_location_dict ={}
+ with open(pdb_path,'r') as file:
+ for line in file:
+ if len(line)>4 and line[:4]=="ATOM":
+ atom_name=line[13:16]
+ atom_name = atom_name.replace(" ","")
+ x = float(line[30:38])
+ y = float(line[38:46])
+ z = float(line[46:54])
+ location = [x,y,z]
+ if atom_name in _pho_atoms:
+ pho_origin_dict[atom_name]=location
+ elif atom_name in _sugar_atoms:
+ sugar_origin_dict[atom_name]=location
+ else:
+ base_origin_dict[atom_name]=location
+ base_location_list.append(location)
+ all_location_dict[atom_name]=location
+ backbone_location_list =[]
+ for atom_name in _backbone_atoms:
+ if atom_name in pho_origin_dict:
+ backbone_location_list.append(pho_origin_dict[atom_name])
+ elif atom_name in sugar_origin_dict:
+ backbone_location_list.append(sugar_origin_dict[atom_name])
+ #calculate direction vector
+ phosphate_location = np.array(pho_origin_dict['P'])
+
+ backbone_location_list = np.array(backbone_location_list)
+ base_center_location = np.array(base_location_list)
+ base_center_location = np.mean(base_center_location,axis=0)
+ bpp_cos_values = calculate_cosine_angle(base_center_location,backbone_location_list[0],backbone_location_list[-1])
+ bp1_distance = calculate_point_distance(base_center_location,backbone_location_list[0])
+ p1_p2_distance = calculate_point_distance(backbone_location_list[0],backbone_location_list[-1])
+ p1_p2_vector = backbone_location_list[-1]-backbone_location_list[0]
+ orthognal_location = backbone_location_list[0]+p1_p2_vector*(bpp_cos_values*bp1_distance)/p1_p2_distance
+ p1p2_to_base_direction = base_center_location-orthognal_location
+ orthognal_ratio = (bpp_cos_values*bp1_distance)/p1_p2_distance
+ p1_to_base_direction = base_center_location- phosphate_location
+ return all_location_dict, backbone_location_list,orthognal_ratio,p1p2_to_base_direction,p1_to_base_direction
+standard_location_dict={}
+standard_location_dict['A']=read_refer_dict("atomic/A.pdb")
+standard_location_dict['C']=read_refer_dict("atomic/C.pdb")
+standard_location_dict['G']=read_refer_dict("atomic/G.pdb")
+standard_location_dict['U']=read_refer_dict("atomic/U.pdb")
+standard_location_dict['DA']=read_refer_dict("atomic/DA.pdb")
+standard_location_dict['DC']=read_refer_dict("atomic/DC.pdb")
+standard_location_dict['DG']=read_refer_dict("atomic/DG.pdb")
+standard_location_dict['DT']=read_refer_dict("atomic/DT.pdb")
+
+
+def read_refer_dict_sugar(pdb_path):
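+    """Sugar-centered analogue of read_refer_dict: precomputes the base-placement
+    geometry relative to the sugar-center -> next-sugar axis instead of P1->P2."""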
+ pho_origin_dict = {}
+ sugar_origin_dict = {}
+ base_origin_dict = {}
+ base_location_list=[]
+ all_location_dict ={}
+ with open(pdb_path,'r') as file:
+ for line in file:
+ if len(line)>4 and line[:4]=="ATOM":
+ atom_name=line[13:16]
+ atom_name = atom_name.replace(" ","")
+ x = float(line[30:38])
+ y = float(line[38:46])
+ z = float(line[46:54])
+ location = [x,y,z]
+ if atom_name in _pho_atoms:
+ pho_origin_dict[atom_name]=location
+ elif atom_name in _sugar_atoms:
+ sugar_origin_dict[atom_name]=location
+ else:
+ base_origin_dict[atom_name]=location
+ base_location_list.append(location)
+ all_location_dict[atom_name]=location
+ backbone_location_list =[]
+ for atom_name in _backbone_atoms:
+ if atom_name in pho_origin_dict:
+ backbone_location_list.append(pho_origin_dict[atom_name])
+ elif atom_name in sugar_origin_dict:
+ backbone_location_list.append(sugar_origin_dict[atom_name])
+ backbone_location_list = np.array(backbone_location_list)
+ sugar_location_list =[]
+ for atom_name in sugar_origin_dict:
+ sugar_location_list.append(sugar_origin_dict[atom_name])
+ sugar_location_list = np.array(sugar_location_list)
+ sugar_location_list = np.mean(sugar_location_list,axis=0)
+ base_center_location = np.array(base_location_list)
+ base_center_location = np.mean(base_center_location,axis=0)
+ p1_p2_vector = backbone_location_list[-1]-backbone_location_list[0]
+ next_sugar_location_list = sugar_location_list+p1_p2_vector
+
+
+ bss_cos_values = calculate_cosine_angle(base_center_location,sugar_location_list,next_sugar_location_list)
+ bs1_distance = calculate_point_distance(base_center_location,sugar_location_list)
+ s1_s2_distance = calculate_point_distance(sugar_location_list,next_sugar_location_list)
+ orthognal_location = sugar_location_list+p1_p2_vector*(bss_cos_values*bs1_distance)/s1_s2_distance
+ s1s2_to_base_direction = base_center_location-orthognal_location
+ orthognal_ratio = (bss_cos_values*bs1_distance)/s1_s2_distance
+ s1_to_base_direction = base_center_location- sugar_location_list
+ return all_location_dict, [sugar_location_list,next_sugar_location_list],orthognal_ratio,s1s2_to_base_direction,s1_to_base_direction
+
+
+sugar_standard_location_dict={}
+sugar_standard_location_dict['A']=read_refer_dict_sugar("atomic/A.pdb")
+sugar_standard_location_dict['C']=read_refer_dict_sugar("atomic/C.pdb")
+sugar_standard_location_dict['G']=read_refer_dict_sugar("atomic/G.pdb")
+sugar_standard_location_dict['U']=read_refer_dict_sugar("atomic/U.pdb")
+sugar_standard_location_dict['DA']=read_refer_dict_sugar("atomic/DA.pdb")
+sugar_standard_location_dict['DC']=read_refer_dict_sugar("atomic/DC.pdb")
+sugar_standard_location_dict['DG']=read_refer_dict_sugar("atomic/DG.pdb")
+sugar_standard_location_dict['DT']=read_refer_dict_sugar("atomic/DT.pdb")
+
+
+
+def rigid_transform_3D(A, B):
+ assert len(A) == len(B)
+
+ N = A.shape[0] # total points
+ centroid_A = np.mean(A, axis=0)
+ centroid_B = np.mean(B, axis=0)
+
+ # centre the points
+ AA = A - np.tile(centroid_A, (N, 1))
+ BB = B - np.tile(centroid_B, (N, 1))
+
+ H = np.matmul(np.transpose(AA),BB)
+ U, S, Vt = np.linalg.svd(H)
+ R = np.matmul(Vt.T, U.T)
+
+ # special reflection case
+ if np.linalg.det(R) < 0:
+ print("Reflection detected")
+ Vt[2, :] *= -1
+ R = np.matmul(Vt.T,U.T)
+
+ t = -np.matmul(R, centroid_A) + centroid_B
+ err = B - np.matmul(A,R.T) - t.reshape([1, 3])
+ return R, t,err
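+
+# rigid_transform_3D is the standard SVD (Kabsch) solver: it returns R, t that
+# best map point set A onto B in the least-squares sense (b ~ R a + t).
+# Illustrative self-check (not part of the pipeline):
+#   A = np.random.rand(4, 3)
+#   theta = 0.3
+#   R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
+#                  [np.sin(theta),  np.cos(theta), 0.0],
+#                  [0.0, 0.0, 1.0]])
+#   B = np.matmul(A, R0.T) + np.array([1.0, 2.0, 3.0])
+#   R, t, err = rigid_transform_3D(A, B)   # R ~ R0, t ~ [1, 2, 3], err ~ 0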
+
+
+def build_base_atomic_nuc_rigid(input_pho_position,next_pho_location, cur_base_location,nuc_type,map_info_list):
+ point_pho_location = permute_global_coord_to_point_coord(input_pho_position,map_info_list)
+ revert_back_pho_location = permute_point_coord_to_global_coord(point_pho_location,map_info_list)
+ for k in range(3):
+ assert abs(revert_back_pho_location[k]-input_pho_position[k])<=0.01
+ point_next_pho_location = permute_global_coord_to_point_coord(next_pho_location,map_info_list)
+ point_pho_location = np.array(point_pho_location)
+ point_next_pho_location = np.array(point_next_pho_location)
+ point_base_location = permute_global_coord_to_point_coord(cur_base_location,map_info_list)
+ cos_base_p1_p2 = calculate_cosine_angle(point_base_location,point_pho_location,point_next_pho_location)
+ base_p1_distance = calculate_point_distance(point_base_location,point_pho_location)
+ p1_p2_distance = calculate_point_distance(point_pho_location,point_next_pho_location)
+ orthognal_location = point_pho_location+(point_next_pho_location-point_pho_location)*(cos_base_p1_p2*base_p1_distance)/p1_p2_distance
+ p1p2_base_orthognal_direction = point_base_location - orthognal_location
+
+ current_standard_info = standard_location_dict[nuc_type]
+ all_location_dict,standard_backbone_location_list, standard_orthognal_ratio,standard_p1p2_to_base_direction,standard_p1_base_direction = current_standard_info
+ p1_p2_select_orthognal_location = point_pho_location + standard_orthognal_ratio*(point_next_pho_location-point_pho_location)
+ base_direction_adjust_ratio = calculate_point_distance([0,0,0],standard_p1p2_to_base_direction)/calculate_point_distance([0,0,0],p1p2_base_orthognal_direction)
+ p1_p2_adjust_base_location = p1_p2_select_orthognal_location+p1p2_base_orthognal_direction*base_direction_adjust_ratio
+
+ standard_align_vector = []
+ #standard_align_vector.append([0,0,0])
+ #backbone coord
+ #for k in range(len(standard_backbone_location_list)):
+ # standard_align_vector.append(standard_backbone_location_list[k]-standard_backbone_location_list[0])
+ #add base coord
+
+ #standard_align_vector.append(standard_backbone_location_list[-1]-standard_backbone_location_list[0])
+ #standard_align_vector.append(standard_p1_base_direction)
+ standard_align_vector.append(standard_backbone_location_list[0])
+ standard_align_vector.append(standard_backbone_location_list[-1])
+ standard_align_vector.append(standard_p1_base_direction+standard_backbone_location_list[0])
+ for k in range(len(standard_align_vector)):
+ standard_align_vector[k]=standard_align_vector[k]-standard_backbone_location_list[0]
+
+ standard_align_vector = np.array(standard_align_vector)
+
+
+ #build current vector
+ current_bpp_vector = []#[[0,0,0]]
+# divide_number = len(standard_backbone_location_list)-1
+# p1_p2_direction = point_next_pho_location-point_pho_location
+# for k in range(1,divide_number+1):
+# current_bpp_vector.append(p1_p2_direction*k/divide_number)
+# #add base coord
+# # current_bpp_vector.append(point_next_pho_location-point_pho_location)
+# current_bpp_vector.append(p1_p2_adjust_base_location-point_pho_location)
+ divide_number = len(standard_backbone_location_list)
+ point_final_o_location = point_pho_location+(divide_number-1)/divide_number*(point_next_pho_location-point_pho_location)
+ current_bpp_vector.append(point_pho_location)
+ current_bpp_vector.append(point_final_o_location)
+ current_bpp_vector.append(p1_p2_adjust_base_location)
+ for k in range(len(current_bpp_vector)):
+ current_bpp_vector[k]=current_bpp_vector[k]-point_pho_location
+ current_bpp_vector = np.array(current_bpp_vector)
+
+ rotation,translation,err = rigid_transform_3D(standard_align_vector,current_bpp_vector)
+ all_locations = np.array(list(all_location_dict.values()))
+ output_locations = np.matmul(all_locations, rotation.T) + translation.reshape([1, 3])
+ search_key_list = list(all_location_dict.keys())
+ pho_location_index = search_key_list.index("P")
+ cur_pho_location = output_locations[pho_location_index]
+    align_pho_shift = point_pho_location-cur_pho_location
+    output_locations = output_locations+align_pho_shift
+ output_locations = [permute_point_coord_to_global_coord(output_locations[k],map_info_list) for k in range(len(output_locations))]
+ return search_key_list,output_locations
+
+
+def build_base_atomic_nuc_rigid_sugar(input_pho_position,next_pho_location, cur_base_location,nuc_type,map_info_list):
+ point_pho_location = permute_global_coord_to_point_coord(input_pho_position,map_info_list)
+ revert_back_pho_location = permute_point_coord_to_global_coord(point_pho_location,map_info_list)
+ for k in range(3):
+        assert abs(revert_back_pho_location[k]-input_pho_position[k])<=0.01#avoid exact float comparison
+ point_next_pho_location = permute_global_coord_to_point_coord(next_pho_location,map_info_list)
+ point_pho_location = np.array(point_pho_location)
+ point_next_pho_location = np.array(point_next_pho_location)
+ point_base_location = permute_global_coord_to_point_coord(cur_base_location,map_info_list)
+ cos_base_p1_p2 = calculate_cosine_angle(point_base_location,point_pho_location,point_next_pho_location)
+ base_p1_distance = calculate_point_distance(point_base_location,point_pho_location)
+ p1_p2_distance = calculate_point_distance(point_pho_location,point_next_pho_location)
+ orthognal_location = point_pho_location+(point_next_pho_location-point_pho_location)*(cos_base_p1_p2*base_p1_distance)/p1_p2_distance
+    #this is actually the sugar-line-to-base direction (the axis runs between sugars)
+ p1p2_base_orthognal_direction = point_base_location - orthognal_location
+ current_standard_info = sugar_standard_location_dict[nuc_type]
+ all_location_dict,standard_backbone_location_list, standard_orthognal_ratio,standard_s1s2_to_base_direction,standard_s1_base_direction = current_standard_info
+
+ s1_s2_select_orthognal_location = point_pho_location + standard_orthognal_ratio*(point_next_pho_location-point_pho_location)
+ base_direction_adjust_ratio = calculate_point_distance([0,0,0],standard_s1s2_to_base_direction)/calculate_point_distance([0,0,0],p1p2_base_orthognal_direction)
+ s1_s2_adjust_base_location = s1_s2_select_orthognal_location+p1p2_base_orthognal_direction*base_direction_adjust_ratio
+
+ standard_align_vector = []
+ standard_align_vector.append(standard_backbone_location_list[0])
+ standard_align_vector.append(standard_backbone_location_list[-1])
+ standard_align_vector.append(standard_s1_base_direction+standard_backbone_location_list[0])
+ for k in range(len(standard_align_vector)):
+ standard_align_vector[k]=standard_align_vector[k]-standard_backbone_location_list[0]
+
+ standard_align_vector = np.array(standard_align_vector)
+
+
+ #build current vector
+ current_bpp_vector = []#[[0,0,0]]
+
+ point_final_o_location = point_next_pho_location
+ current_bpp_vector.append(point_pho_location)
+ current_bpp_vector.append(point_final_o_location)
+ current_bpp_vector.append(s1_s2_adjust_base_location)
+ for k in range(len(current_bpp_vector)):
+ current_bpp_vector[k]=current_bpp_vector[k]-point_pho_location
+ current_bpp_vector = np.array(current_bpp_vector)
+
+ rotation,translation,err = rigid_transform_3D(standard_align_vector,current_bpp_vector)
+ all_locations = np.array(list(all_location_dict.values()))
+ output_locations = np.matmul(all_locations, rotation.T) + translation.reshape([1, 3])
+ search_key_list = list(all_location_dict.keys())
+ pho_location_index = search_key_list.index("P")
+ cur_pho_location = output_locations[pho_location_index]
+    align_pho_shift = point_pho_location-cur_pho_location
+    output_locations = output_locations+align_pho_shift
+ output_locations = [permute_point_coord_to_global_coord(output_locations[k],map_info_list) for k in range(len(output_locations))]
+ return search_key_list,output_locations
diff --git a/CryoREAD/atomic/io_utils.py b/CryoREAD/atomic/io_utils.py
new file mode 100644
index 0000000..11f3c4a
--- /dev/null
+++ b/CryoREAD/atomic/io_utils.py
@@ -0,0 +1,199 @@
+from collections import defaultdict
+import numpy as np
+import os
+from scipy.spatial.distance import cdist
+from atomic.geo_atomic_modeling import build_base_atomic_nuc_rigid,build_base_atomic_nuc_rigid_sugar
+
+def define_base_location_list(frag_info_list,refer_base_location):
+ all_frag_coordinate= []
+ for i in range(len(frag_info_list)):
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score =frag_info_list[i]
+ if atom_name=="C4'":
+ all_frag_coordinate.append(cur_atom_position)
+ all_refer_coordinate = []
+ all_refer_key = list(refer_base_location.keys())
+ for key in all_refer_key:
+ split_key = key.split(",")
+ cur_coord=[]
+ for k in range(3):
+ cur_coord.append(float(split_key[k]))
+ all_refer_coordinate.append(cur_coord)
+ all_refer_coordinate = np.array(all_refer_coordinate)
+ refer_distance = cdist(all_frag_coordinate,all_refer_coordinate)
+ refer_close = np.argmin(refer_distance,axis=1)
+ refer_base_location_list = []
+ for k in range(len(all_frag_coordinate)):
+ close_key = int(refer_close[k])
+ cur_distance = refer_distance[k,close_key]
+        assert cur_distance<1#should be the same coord used as the dictionary key
+ search_key = all_refer_key[close_key]
+ refer_base_location_list.append(refer_base_location[search_key])
+ return refer_base_location_list
+
+
+def Write_Atomic_Fraginfo_cif(name,frag_info_list,refer_base_location,save_path,DNA_Label,map_info_list):
+ Natm =1
+ pdb_save_path = save_path[:-4]+".pdb"
+ with open(pdb_save_path,"w") as file:
+ file.write("")
+ refer_sugar_base_location = define_base_location_list(frag_info_list,refer_base_location)
+ with open(save_path,'w') as file:
+ line = 'data_%s\n'%name
+ line += "#\nloop_\n_atom_site.group_PDB\n_atom_site.id\n_atom_site.type_symbol\n" \
+ "_atom_site.label_atom_id\n_atom_site.label_alt_id\n_atom_site.label_comp_id\n"\
+ "_atom_site.label_asym_id\n_atom_site.label_entity_id\n_atom_site.label_seq_id\n"\
+ "_atom_site.pdbx_PDB_ins_code\n_atom_site.Cartn_x\n_atom_site.Cartn_y\n_atom_site.Cartn_z\n"\
+ "_atom_site.occupancy\n_atom_site.B_iso_or_equiv\n_atom_site.auth_seq_id\n_atom_site.auth_asym_id\n"\
+ "_atom_site.pdbx_PDB_model_num\n"
+ file.write(line)
+ count_track = 0
+ #extract sugar and phosphate_pairs
+        Sugar_Refer_Dict={}#maps nuc_id to its phosphate record
+ Sugar_Only_Frag_List=[]
+ for i in range(len(frag_info_list)):
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score =frag_info_list[i]
+ if atom_name=="C4'":
+ Sugar_Only_Frag_List.append(frag_info_list[i])
+ else:
+ Sugar_Refer_Dict[current_seq_index]=frag_info_list[i]
+ count_use_pho=0
+        for k in range(len(Sugar_Only_Frag_List)):
+            use_pho_model=0
+            chain_id,current_seq_index,atom_name, cur_sugar_position,nuc_type,avg_score = Sugar_Only_Frag_List[k]
+            cur_base_location = refer_sugar_base_location[k]
+            if k+1>=len(Sugar_Only_Frag_List):
+                continue#the last nucleotide has no successor to orient against
+            next_sugar_position = Sugar_Only_Frag_List[k+1][3]
+            #NOTE: the successor/phosphate pairing below is reconstructed from how
+            #next_pho_position and next_sugar_position are used in the build calls
+            if current_seq_index in Sugar_Refer_Dict:
+                chain_id, current_seq_index,atom_name2, cur_pho_position,nuc_type,avg_score = Sugar_Refer_Dict[current_seq_index]
+                next_seq_index = Sugar_Only_Frag_List[k+1][1]
+                if next_seq_index in Sugar_Refer_Dict:
+                    next_pho_position = Sugar_Refer_Dict[next_seq_index][3]
+                    use_pho_model=1
+            if use_pho_model>0:
+                print("use pho situation %d"%use_pho_model)
+                count_use_pho+=1
+                #assign structures based on the phosphate pair
+                atom_list,coordinate_list = build_base_atomic_nuc_rigid(cur_pho_position,next_pho_position, cur_base_location,nuc_type,map_info_list)
+            else:
+                atom_list,coordinate_list = build_base_atomic_nuc_rigid_sugar(cur_sugar_position,next_sugar_position, cur_base_location,nuc_type,map_info_list)
+ # except:
+ # atom_list = ["C4'"]
+ # coordinate_list = [cur_sugar_position]
+ # print("svd solution failed for all atom building for this residue")
+            for j in range(len(atom_list)):
+                line = ""
+                line += "ATOM %-10d C %-3s . %-3s %-2s . %-10d . " % (Natm, atom_list[j], nuc_type, chain_id, current_seq_index)
+                line += "%-8.3f %-8.3f %-8.3f %-6.2f %-6.8f %-10d %-2s 1 \n" % (
+                    coordinate_list[j][0], coordinate_list[j][1], coordinate_list[j][2], 1.0,avg_score,Natm,chain_id)
+                file.write(line)
+                with open(pdb_save_path,"a+") as wfile:
+                    line=""
+                    if len(chain_id)>2:
+                        chain_id=chain_id[:2]#for saving purposes
+                    line += "ATOM%7d %-4s %3s%2s%4d    " % (Natm, atom_list[j],nuc_type, chain_id,current_seq_index)
+                    # tmp_dens=(point.merged_cd_dens[i,3]-dens_min)/(dens_max-dens_min)
+                    line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f\n" % (
+                        coordinate_list[j][0], coordinate_list[j][1], coordinate_list[j][2], 1.0, avg_score)
+                    wfile.write(line)
+
+ Natm += 1
+ print("we have %d/%d atomic build upon pho atoms"%(count_use_pho,len(Sugar_Only_Frag_List)))
diff --git a/CryoREAD/best_model/stage1_network.pth b/CryoREAD/best_model/stage1_network.pth
new file mode 100644
index 0000000..035c4a7
Binary files /dev/null and b/CryoREAD/best_model/stage1_network.pth differ
diff --git a/CryoREAD/best_model/stage2_network.pth b/CryoREAD/best_model/stage2_network.pth
new file mode 100644
index 0000000..14cb697
Binary files /dev/null and b/CryoREAD/best_model/stage2_network.pth differ
diff --git a/CryoREAD/best_model_0_3A/stage1_network.pth b/CryoREAD/best_model_0_3A/stage1_network.pth
new file mode 100644
index 0000000..60b94ad
Binary files /dev/null and b/CryoREAD/best_model_0_3A/stage1_network.pth differ
diff --git a/CryoREAD/best_model_0_3A/stage2_network.pth b/CryoREAD/best_model_0_3A/stage2_network.pth
new file mode 100644
index 0000000..50fd9b8
Binary files /dev/null and b/CryoREAD/best_model_0_3A/stage2_network.pth differ
diff --git a/CryoREAD/best_model_5_10A/stage1_network.pth b/CryoREAD/best_model_5_10A/stage1_network.pth
new file mode 100644
index 0000000..b0f15bb
Binary files /dev/null and b/CryoREAD/best_model_5_10A/stage1_network.pth differ
diff --git a/CryoREAD/best_model_5_10A/stage2_network.pth b/CryoREAD/best_model_5_10A/stage2_network.pth
new file mode 100644
index 0000000..3e2d9d6
Binary files /dev/null and b/CryoREAD/best_model_5_10A/stage2_network.pth differ
diff --git a/CryoREAD/best_model_paper/stage1_network.pth b/CryoREAD/best_model_paper/stage1_network.pth
new file mode 100644
index 0000000..99946f4
Binary files /dev/null and b/CryoREAD/best_model_paper/stage1_network.pth differ
diff --git a/CryoREAD/best_model_paper/stage2_network.pth b/CryoREAD/best_model_paper/stage2_network.pth
new file mode 100644
index 0000000..94343e7
Binary files /dev/null and b/CryoREAD/best_model_paper/stage2_network.pth differ
diff --git a/CryoREAD/coot/__init__.py b/CryoREAD/coot/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/coot/all_atom_refine.py b/CryoREAD/coot/all_atom_refine.py
new file mode 100644
index 0000000..12abd8c
--- /dev/null
+++ b/CryoREAD/coot/all_atom_refine.py
@@ -0,0 +1,247 @@
+import os
+# Real-space refinement helpers for nucleic-acid models. These functions are
+# executed inside Coot (see coot_refine_structure.py), which provides the
+# scripting API used here: read_pdb, handle_read_ccp4_map, refine_residues,
+# AutoAccept, write_pdb_file, etc.
+
+
+def real_space_atom_refine(input_map,input_pdb,contour):
+ output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ for chain_id in chain_ids(structure):
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+ residue_list.append( [chain_id,res_no,ins_code])
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_refine_output(input_map,input_pdb,output_pdb,contour):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ for chain_id in chain_ids(structure):
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+ residue_list.append( [chain_id,res_no,ins_code])
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_fragmentrefine_output(input_map,input_pdb,output_pdb,contour):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ #begin_flag=True
+ prev_residue =-1
+ prev_chain="GG"
+ for chain_id in chain_ids(structure):
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+
+
+
+
+ if chain_id==prev_chain and abs(res_no-prev_residue)==1:
+ residue_list.append([chain_id,res_no,ins_code])
+ prev_residue=res_no
+ prev_chain=chain_id
+ else:
+ if len(residue_list)!=0:
+ print("refine structures: ",residue_list)
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ residue_list=[]
+ residue_list.append([chain_id,res_no,ins_code])
+ prev_residue=res_no
+ prev_chain=chain_id
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ if len(residue_list)!=0:
+ print("refine structures: ",residue_list)
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_residuerefine_output(input_map,input_pdb,output_pdb,contour):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ #begin_flag=True
+ prev_residue =-1
+ prev_chain="GG"
+ for chain_id in chain_ids(structure):
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+
+
+ with AutoAccept():
+ refine_residues(structure,[[chain_id,res_no,ins_code]])
+
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_3residuerefine_output(input_map,input_pdb,output_pdb,contour):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+ all_list=[]
+ residue_list=[]
+ #begin_flag=True
+ prev_residue =-1
+ prev_chain="GG"
+ for chain_id in chain_ids(structure):
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+
+
+
+
+ if chain_id==prev_chain and abs(res_no-prev_residue)==1:
+ residue_list.append([chain_id,res_no,ins_code])
+ prev_residue=res_no
+ prev_chain=chain_id
+ else:
+ if len(residue_list)!=0:
+ print("collecting structures: ",residue_list)
+ all_list.append(residue_list)
+ residue_list=[]
+ residue_list.append([chain_id,res_no,ins_code])
+ prev_residue=res_no
+ prev_chain=chain_id
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ if len(residue_list)!=0:
+ print("collecting structures: ",residue_list)
+ all_list.append(residue_list)
+
+ for residue_list in all_list:
+ if len(residue_list)<=3:
+ print("refine structures: ",residue_list)
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ else:
+
+            for k in range(0,len(residue_list)-2):#windows of 3; -2 so the last residue is covered
+ print("refine structures: ",residue_list[k:k+3])
+ with AutoAccept():
+ refine_residues(structure,residue_list[k:k+3])
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_Bchainrefine_output(input_map,input_pdb,output_pdb,contour):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ for chain_id in chain_ids(structure):
+ if chain_id!='B':
+ continue
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+ residue_list.append( [chain_id,res_no,ins_code])
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
+
+def real_space_atom_range_refine_output(input_map,input_pdb,output_pdb,contour,refine_index):
+ #output_pdb = input_pdb[:-4]+"_cootrefine.pdb"
+ refine_start, refine_end = refine_index
+ structure = read_pdb(input_pdb)
+ map_imol = handle_read_ccp4_map(input_map, 0)
+ set_contour_level_absolute(map_imol,contour)
+ res_step=1
+
+ residue_list=[]
+ for chain_id in chain_ids(structure):
+
+ n_residues = chain_n_residues(chain_id,structure)
+ print "There are %(a)i residues in chain %(b)s" % {"a":n_residues,"b":chain_id}
+
+ for serial_number in range(0, n_residues, res_step):
+ res_name = resname_from_serial_number(structure, chain_id, serial_number)
+ res_no = seqnum_from_serial_number(structure, chain_id, serial_number)
+ ins_code = insertion_code_from_serial_number(structure, chain_id, serial_number)
+ if int(res_no)>=refine_start and int(res_no)<=refine_end:
+ residue_list.append( [chain_id,res_no,ins_code])
+ # active_atom = active_residue()
+ # centred_residue = active_atom[1:4]
+ print("refine list",residue_list)
+ with AutoAccept():
+ refine_residues(structure,residue_list)
+ write_pdb_file(structure, output_pdb)
+ close_molecule(structure)
+ close_molecule(map_imol)
+ coot_real_exit(0)
diff --git a/CryoREAD/coot/coot_refine_structure.py b/CryoREAD/coot/coot_refine_structure.py
new file mode 100644
index 0000000..0beb30a
--- /dev/null
+++ b/CryoREAD/coot/coot_refine_structure.py
@@ -0,0 +1,31 @@
+import mrcfile
+import numpy as np
+import os
+import shutil
+def coot_refine_structure(input_cif_path,input_map_path,output_cif_path,coot_software="coot"):
+ """
+ input and output must be pdb format
+ """
+ output_cif_path = os.path.abspath(output_cif_path)
+ work_dir = os.path.split(output_cif_path)[0]
+ script_path= os.path.join(os.getcwd(),"coot")
+ script_path = os.path.join(script_path,"all_atom_refine.py")
+ new_script_path = os.path.join(work_dir,"all_atom_refine.py")
+ shutil.copy(script_path,new_script_path)
+ #determine contour
+ with mrcfile.open(input_map_path,permissive=True) as mrc:
+ data=mrc.data
+ min_value= np.min(data[data>0])
+ script_path = os.path.join(work_dir,"coot_refine.py")
+ with open(script_path,'w') as file:
+ #file.write('from all_atom_refine import real_space_atom_refine_output\n')
+ with open(new_script_path,'r') as rfile:
+ for line in rfile:
+ file.write(line)
+ file.write("\n")
+ file.write('real_space_atom_residuerefine_output("%s","%s","%s",%f)'%(input_map_path,input_cif_path,output_cif_path,min_value))
+
+ root_dir=os.getcwd()
+ os.chdir(work_dir)
+ os.system("%s --no-graphics --script coot_refine.py"%coot_software)
+ os.chdir(root_dir)
diff --git a/CryoREAD/data_processing/DRNA_dataset.py b/CryoREAD/data_processing/DRNA_dataset.py
new file mode 100644
index 0000000..9fc8511
--- /dev/null
+++ b/CryoREAD/data_processing/DRNA_dataset.py
@@ -0,0 +1,57 @@
+
+import os
+import numpy as np
+import torch
+import torch.utils.data
+import random
+
+class Single_Dataset(torch.utils.data.Dataset):
+ def __init__(self, data_path, search_key="input_"):
+ """
+ :param data_path: training data path
+ :param training_id: specify the id that will be used for training
+ """
+ listfiles = [x for x in os.listdir(data_path) if search_key in x]
+ listfiles.sort()
+ self.input_path = []
+ self.id_list = []
+ for x in listfiles:
+ self.input_path.append(os.path.join(data_path,x))
+ cur_id = int(x.replace(search_key,"").replace(".npy",""))
+ self.id_list.append(cur_id)
+
+ def __len__(self):
+ return len(self.input_path)
+
+ def __getitem__(self, idx):
+ inputfile = self.input_path[idx]
+ cur_id = self.id_list[idx]
+ input = np.load(inputfile)
+ input = input[np.newaxis, :]
+ input = torch.from_numpy(np.array(input, np.float32, copy=True))
+
+ return input,cur_id
+
+
+class Single_Dataset2(torch.utils.data.Dataset):
+ def __init__(self, data_path, search_key="input_"):
+ """
+ :param data_path: training data path
+ :param training_id: specify the id that will be used for training
+ """
+ listfiles = [x for x in os.listdir(data_path) if search_key in x]
+ listfiles.sort()
+ self.input_path = []
+ self.id_list = []
+ for x in listfiles:
+ self.input_path.append(os.path.join(data_path,x))
+ cur_id = int(x.replace(search_key,"").replace(".npy",""))
+ self.id_list.append(cur_id)
+
+ def __len__(self):
+ return len(self.input_path)
+
+ def __getitem__(self, idx):
+ inputfile = self.input_path[idx]
+ cur_id = self.id_list[idx]
+ input = np.load(inputfile)
+ input = torch.from_numpy(np.array(input, np.float32, copy=True))
+ return input,cur_id
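+
+
+# Usage sketch (illustrative only): both datasets are consumed through a
+# standard DataLoader at inference time, e.g.
+#   from torch.utils.data import DataLoader
+#   dataset = Single_Dataset("path/to/patches", search_key="input_")
+#   loader = DataLoader(dataset, batch_size=8, shuffle=False)
+#   for batch, ids in loader:
+#       ...  # batch: [B, 1, D, H, W] float32 patches; ids: integer patch indices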
diff --git a/CryoREAD/data_processing/Gen_MaskDRNA_map.py b/CryoREAD/data_processing/Gen_MaskDRNA_map.py
new file mode 100644
index 0000000..5c16284
--- /dev/null
+++ b/CryoREAD/data_processing/Gen_MaskDRNA_map.py
@@ -0,0 +1,25 @@
+from data_processing.map_utils import save_dens_map
+import mrcfile
+import numpy as np
+def Gen_MaskDRNA_map(chain_prob,cur_map_path,save_map_path,contour,threshold=0.6):
+ sp_prob = chain_prob[0]+chain_prob[1]
+ base_prob = chain_prob[-1]
+ protein_prob = chain_prob[-2]
+ with mrcfile.open(cur_map_path,permissive=True) as mrc:
+ dens_data=np.array(mrc.data)
+ dens_data[protein_prob>=threshold]=0
+ drna_predictions= sp_prob+base_prob
+ dens_data[drna_predictions<=0.1]=0
+ dens_data[dens_data<=contour]=0
+ #then save the new density data
+ save_dens_map(save_map_path,dens_data, cur_map_path)
+
+
+def Gen_MaskProtein_map(chain_prob,cur_map_path,save_map_path,contour,threshold=0.3):
+ protein_prob = chain_prob[-2]
+ with mrcfile.open(cur_map_path,permissive=True) as mrc:
+ dens_data=np.array(mrc.data)
+ dens_data[protein_prob<=threshold]=0
+ dens_data[dens_data<=contour]=0
+ #then save the new density data
+ save_dens_map(save_map_path,dens_data, cur_map_path)
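+
+
+# Both functions assume the chain_prob channel layout used elsewhere in
+# CryoREAD: channels 0 and 1 together give the sugar/phosphate probability,
+# channel -2 the protein probability, and channel -1 the base probability.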
diff --git a/CryoREAD/data_processing/Resize_Map.py b/CryoREAD/data_processing/Resize_Map.py
new file mode 100644
index 0000000..ec244b3
--- /dev/null
+++ b/CryoREAD/data_processing/Resize_Map.py
@@ -0,0 +1,518 @@
+import os
+import mrcfile
+import numpy as np
+import torch
+import torch.nn.functional as F
+from multiprocessing import Pool, Lock
+from multiprocessing.sharedctypes import Value, Array
+
+# global size,data,iterator,s
+from numba import jit
+
+# https://github.com/jglaser/interp3d
+import shutil
+from numba_progress import ProgressBar
+
+
+@jit(nopython=True, nogil=True)
+def interpolate_fast(data, data_new, size, iterator1, iterator2, iterator3, prev_voxel_size, progress_proxy):
+
+ for i in range(1, iterator1, 1):
+ progress_proxy.update(1)
+ # if(i%1==0):
+ # prefix="interpolating"
+ # display_size=60
+ # x = int(display_size*i/iterator1)
+ # line=["#" for k in range(x)]
+ # #line="".join(line)
+ # line1=["." for k in range(size-x)]
+ # #line1="".join(line1)
+ # print(prefix, "[",line,line1,"] (", i, iterator1,")",
+ # end='\r', flush=True)
+ # print("Finished",i,iterator1)
+ for j in range(1, iterator2, 1):
+ for k in range(1, iterator3, 1):
+ count = [int(i / prev_voxel_size), int(j / prev_voxel_size), int(k / prev_voxel_size)]
+ e1 = count[0] + 1
+ e2 = count[1] + 1
+ e3 = count[2] + 1
+
+ if count[0] >= size[0] - 1: # or count[1]>=size[1]-1 or count[2]>=size[2]-1 ):
+ # print(count)
+ e1 = count[0]
+ continue
+ if count[1] >= size[1] - 1:
+ e2 = count[1]
+ continue
+ if count[2] >= size[2] - 1:
+ e3 = count[2]
+ continue
+ diff1 = [i - count[0] * prev_voxel_size, j - count[1] * prev_voxel_size, k - count[2] * prev_voxel_size]
+ diff2 = [e1 * prev_voxel_size - i, e2 * prev_voxel_size - j, e3 * prev_voxel_size - k]
+ # print(diff)
+ val1 = data[count[0], count[1], count[2]]
+ val2 = data[e1, count[1], count[2]]
+ val3 = data[e1, e2, count[2]]
+ val4 = data[count[0], e2, count[2]]
+ val5 = data[count[0], count[1], e3]
+ val6 = data[e1, count[1], e3]
+ val7 = data[e1, e2, e3]
+ val8 = data[count[0], e2, e3]
+ # val = (val1 + diff[0] * (val2 - val1) + diff[1] * (val4 - val1) + diff[2] * (val5 - val1) + diff[0] *
+ # diff[1] * (val1 - val2 + val3 - val4) + diff[0] * diff[2] * (val1 - val2 - val5 + val6) + diff[
+ # 1] * diff[2] * (
+ # val1 - val4 - val5 + val8) + diff[0] * diff[1] * diff[2] * (
+ # val8 - val7 + val6 - val5 + val4 - val3 + val2 - val1))
+ u1 = diff1[0]
+ u2 = diff2[0]
+ v1 = diff1[1]
+ v2 = diff2[1]
+ w1 = diff1[2]
+ w2 = diff2[2]
+ val = (
+ (
+ w2 * (v1 * (u1 * val3 + u2 * val4) + v2 * (u1 * val2 + u2 * val1))
+ + w1 * (v1 * (u1 * val7 + u2 * val8) + v2 * (u1 * val6 + u2 * val5))
+ )
+ / (w1 + w2)
+ / (v1 + v2)
+ / (u1 + u2)
+ )
+ data_new[i, j, k] = val
+ return data_new # np.float32(data_new)
+
+
+@jit(nopython=True, nogil=True)
+def interpolate_fast_general(data, data_new, size, iterator1, iterator2, iterator3, prev_voxel_size1, prev_voxel_size2, prev_voxel_size3):
+ for i in range(1, iterator1, 1):
+ # if(i%10==0):
+ # print("Finished",i,iterator1)
+ for j in range(1, iterator2, 1):
+ for k in range(1, iterator3, 1):
+ count = [int(i / prev_voxel_size1), int(j / prev_voxel_size2), int(k / prev_voxel_size3)]
+ e1 = count[0] + 1
+ e2 = count[1] + 1
+ e3 = count[2] + 1
+
+ if count[0] >= size[0] - 1: # or count[1]>=size[1]-1 or count[2]>=size[2]-1 ):
+ # print(count)
+ e1 = count[0]
+ continue
+ if count[1] >= size[1] - 1:
+ e2 = count[1]
+ continue
+ if count[2] >= size[2] - 1:
+ e3 = count[2]
+ continue
+ diff1 = [i - count[0] * prev_voxel_size1, j - count[1] * prev_voxel_size2, k - count[2] * prev_voxel_size3]
+ diff2 = [e1 * prev_voxel_size1 - i, e2 * prev_voxel_size2 - j, e3 * prev_voxel_size3 - k]
+ # print(diff)
+ val1 = data[count[0], count[1], count[2]]
+ val2 = data[e1, count[1], count[2]]
+ val3 = data[e1, e2, count[2]]
+ val4 = data[count[0], e2, count[2]]
+ val5 = data[count[0], count[1], e3]
+ val6 = data[e1, count[1], e3]
+ val7 = data[e1, e2, e3]
+ val8 = data[count[0], e2, e3]
+ # val = (val1 + diff[0] * (val2 - val1) + diff[1] * (val4 - val1) + diff[2] * (val5 - val1) + diff[0] *
+ # diff[1] * (val1 - val2 + val3 - val4) + diff[0] * diff[2] * (val1 - val2 - val5 + val6) + diff[
+ # 1] * diff[2] * (
+ # val1 - val4 - val5 + val8) + diff[0] * diff[1] * diff[2] * (
+ # val8 - val7 + val6 - val5 + val4 - val3 + val2 - val1))
+ u1 = diff1[0]
+ u2 = diff2[0]
+ v1 = diff1[1]
+ v2 = diff2[1]
+ w1 = diff1[2]
+ w2 = diff2[2]
+ val = (
+ (
+ w2 * (v1 * (u1 * val3 + u2 * val4) + v2 * (u1 * val2 + u2 * val1))
+ + w1 * (v1 * (u1 * val7 + u2 * val8) + v2 * (u1 * val6 + u2 * val5))
+ )
+ / (w1 + w2)
+ / (v1 + v2)
+ / (u1 + u2)
+ )
+ data_new[i, j, k] = val
+ return data_new # np.float32(data_new)
+
+
+def interpolate_slow(data, data_new, size, iterator1, iterator2, iterator3, prev_voxel_size, progress_proxy):
+ for i in range(1, iterator1, 1):
+ progress_proxy.update(1)
+ # if(i%10==0):
+ # print("Finished",i,iterator1)
+ for j in range(1, iterator2, 1):
+ for k in range(1, iterator3, 1):
+ count = [int(i / prev_voxel_size), int(j / prev_voxel_size), int(k / prev_voxel_size)]
+ e1 = count[0] + 1
+ e2 = count[1] + 1
+ e3 = count[2] + 1
+
+ if count[0] >= size[0] - 1: # or count[1]>=size[1]-1 or count[2]>=size[2]-1 ):
+ # print(count)
+ e1 = count[0]
+ continue
+ if count[1] >= size[1] - 1:
+ e2 = count[1]
+ continue
+ if count[2] >= size[2] - 1:
+ e3 = count[2]
+ continue
+ diff1 = [i - count[0] * prev_voxel_size, j - count[1] * prev_voxel_size, k - count[2] * prev_voxel_size]
+ diff2 = [e1 * prev_voxel_size - i, e2 * prev_voxel_size - j, e3 * prev_voxel_size - k]
+ # print(diff)
+ val1 = data[count[0], count[1], count[2]]
+ val2 = data[e1, count[1], count[2]]
+ val3 = data[e1, e2, count[2]]
+ val4 = data[count[0], e2, count[2]]
+ val5 = data[count[0], count[1], e3]
+ val6 = data[e1, count[1], e3]
+ val7 = data[e1, e2, e3]
+ val8 = data[count[0], e2, e3]
+ # val = (val1 + diff[0] * (val2 - val1) + diff[1] * (val4 - val1) + diff[2] * (val5 - val1) + diff[0] *
+ # diff[1] * (val1 - val2 + val3 - val4) + diff[0] * diff[2] * (val1 - val2 - val5 + val6) + diff[
+ # 1] * diff[2] * (
+ # val1 - val4 - val5 + val8) + diff[0] * diff[1] * diff[2] * (
+ # val8 - val7 + val6 - val5 + val4 - val3 + val2 - val1))
+ u1 = diff1[0]
+ u2 = diff2[0]
+ v1 = diff1[1]
+ v2 = diff2[1]
+ w1 = diff1[2]
+ w2 = diff2[2]
+ val = (
+ (
+ w2 * (v1 * (u1 * val3 + u2 * val4) + v2 * (u1 * val2 + u2 * val1))
+ + w1 * (v1 * (u1 * val7 + u2 * val8) + v2 * (u1 * val6 + u2 * val5))
+ )
+ / (w1 + w2)
+ / (v1 + v2)
+ / (u1 + u2)
+ )
+ data_new[i, j, k] = val
+ return data_new
+
+
+def Reform_Map_Voxel_Final(map_path, new_map_path):
+ from scipy.interpolate import RegularGridInterpolator
+
+ if not os.path.exists(new_map_path):
+ with mrcfile.open(map_path, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ prev_voxel_size_x = float(prev_voxel_size["x"])
+ prev_voxel_size_y = float(prev_voxel_size["y"])
+ prev_voxel_size_z = float(prev_voxel_size["z"])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = (
+ mrc.header.nx,
+ mrc.header.ny,
+ mrc.header.nz,
+ mrc.header.nxstart,
+ mrc.header.nystart,
+ mrc.header.nzstart,
+ mrc.header.mx,
+ mrc.header.my,
+ mrc.header.mz,
+ )
+ orig = mrc.header.origin
+ print("Origin:", orig)
+ print("Previous voxel size:", prev_voxel_size)
+ print("nx, ny, nz", nx, ny, nz)
+ print("nxs,nys,nzs", nxs, nys, nzs)
+ print("mx,my,mz", mx, my, mz)
+ data = mrc.data
+ data = np.swapaxes(data, 0, 2)
+ size = np.shape(data)
+ x = np.arange(size[0])
+ y = np.arange(size[1])
+ z = np.arange(size[2])
+ my_interpolating_function = RegularGridInterpolator((x, y, z), data)
+ it_val_x = int(np.floor(size[0] * prev_voxel_size_x))
+ it_val_y = int(np.floor(size[1] * prev_voxel_size_y))
+ it_val_z = int(np.floor(size[2] * prev_voxel_size_z))
+ print("Previouse size:", size, " Current map size:", [it_val_x, it_val_y, it_val_z])
+ data_new = np.zeros([it_val_x, it_val_y, it_val_z])
+ # from ops.progressbar import progressbar
+ from progress.bar import Bar
+
+ bar = Bar("Preparing Input: ", max=int(it_val_x))
+ for i in range(it_val_x): # progressbar(range(it_val_x), prefix="", size=60):
+ # if i%10==0:
+ # print("Resizing finished %d/%d"%(i,it_val_x))
+ bar.next()
+ for j in range(it_val_y):
+ for k in range(it_val_z):
+ if i / prev_voxel_size_x >= size[0] - 1:
+ x_query = size[0] - 1
+ else:
+ x_query = i / prev_voxel_size_x
+
+ if j / prev_voxel_size_y >= size[1] - 1:
+ y_query = size[1] - 1
+ else:
+ y_query = j / prev_voxel_size_y
+ if k / prev_voxel_size_z >= size[2] - 1:
+ z_query = size[2] - 1
+ else:
+ z_query = k / prev_voxel_size_z
+ current_query = np.array([x_query, y_query, z_query])
+ current_value = float(my_interpolating_function(current_query))
+ data_new[i, j, k] = current_value
+ bar.finish()
+ data_new = np.swapaxes(data_new, 0, 2)
+ data_new = np.float32(data_new)
+ mrc_new = mrcfile.new(new_map_path, data=data_new, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ mrc_new.header.nxstart = nxs * prev_voxel_size_x
+ mrc_new.header.nystart = nys * prev_voxel_size_y
+ mrc_new.header.nzstart = nzs * prev_voxel_size_z
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ mrc_new.header.origin = orig
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ del data
+ del data_new
+ return new_map_path
+
+
+def Reform_Map_Voxel(map_path, new_map_path):
+ if not os.path.exists(new_map_path):
+ with mrcfile.open(map_path, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ # assert len(prev_voxel_size)==3
+
+ if not (prev_voxel_size["x"] == prev_voxel_size["y"] and prev_voxel_size["x"] == prev_voxel_size["z"]):
+ print(
+ "Grid size of different axis is different, please specify --resize=1 in command line to call another slow process to deal with it!"
+ )
+ exit(1)
+ prev_voxel_size = float(prev_voxel_size["x"])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = (
+ mrc.header.nx,
+ mrc.header.ny,
+ mrc.header.nz,
+ mrc.header.nxstart,
+ mrc.header.nystart,
+ mrc.header.nzstart,
+ mrc.header.mx,
+ mrc.header.my,
+ mrc.header.mz,
+ )
+ orig = mrc.header.origin
+ print("Origin:", orig)
+ print("Previous voxel size:", prev_voxel_size)
+ data = mrc.data
+ data = np.swapaxes(data, 0, 2)
+ size = np.shape(data)
+ if prev_voxel_size == 1:
+ shutil.copy(map_path, new_map_path)
+ return new_map_path
+            if prev_voxel_size < 1:
+                print(
+                    "Voxel size is smaller than 1; please specify --resize=1 on the command line to use the slower resampling path!"
+                )
+                exit(1)
+ it_val1 = int(np.floor(size[0] * prev_voxel_size))
+ it_val2 = int(np.floor(size[1] * prev_voxel_size))
+ it_val3 = int(np.floor(size[2] * prev_voxel_size))
+ print("Previouse size:", size, " Current map size:", it_val1, it_val2, it_val3)
+ data_new = np.zeros([it_val1, it_val2, it_val3])
+ data_new[0, 0, 0] = data[0, 0, 0]
+ data_new[it_val1 - 1, it_val2 - 1, it_val3 - 1] = data[size[0] - 1, size[1] - 1, size[2] - 1]
+ # iterator = Value('i', it_val)
+ # s = Value('d', float(prev_voxel_size))
+ # pool = Pool(3)
+ # out_1d = pool.map(interpolate,enumerate(np.reshape(data_new, (iterator.value * iterator.value * iterator.value,))))
+ # data_new = np.array(out_1d).reshape(iterator.value, iterator.value, iterator.value)
+ try:
+ with ProgressBar(total=it_val1) as progress:
+ data_new = interpolate_fast(data, data_new, size, it_val1, it_val2, it_val3, prev_voxel_size, progress)
+            except Exception:
+ data_new = np.zeros([it_val1, it_val2, it_val3])
+ data_new[0, 0, 0] = data[0, 0, 0]
+ data_new[it_val1 - 1, it_val2 - 1, it_val3 - 1] = data[size[0] - 1, size[1] - 1, size[2] - 1]
+ with ProgressBar(total=it_val1) as progress:
+ data_new = interpolate_slow(data, data_new, size, it_val1, it_val2, it_val3, prev_voxel_size, progress)
+ data_new = np.swapaxes(data_new, 0, 2)
+ data_new = np.float32(data_new)
+ mrc_new = mrcfile.new(new_map_path, data=data_new, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ mrc_new.header.nxstart = nxs * prev_voxel_size
+ mrc_new.header.nystart = nys * prev_voxel_size
+ mrc_new.header.nzstart = nzs * prev_voxel_size
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ mrc_new.header.origin = orig
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ del data
+ del data_new
+ # del out_1d
+ return new_map_path
+
+
+def Resize_Map(input_map_path, new_map_path):
+ try:
+ my_reform_1a(input_map_path, new_map_path, use_gpu=True)
+    except Exception:
+        print("GPU resampling failed, falling back to CPU")
+ my_reform_1a(input_map_path, new_map_path, use_gpu=False)
+ # try:
+ # Reform_Map_Voxel(input_map_path, new_map_path)
+ # except:
+ # try:
+ # Reform_Map_Voxel_Final(input_map_path, new_map_path)
+ # except:
+ # exit()
+ return new_map_path
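+
+# Resize_Map resamples the map onto a 1 Angstrom grid via my_reform_1a,
+# trying the GPU path first and falling back to CPU; the numba-based
+# Reform_Map_Voxel* routines above are kept only as a commented-out fallback.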
+
+
+def my_reform_1a(input_mrc, output_mrc, use_gpu=False):
+
+    # "A and B" yields only one context manager; enter both explicitly
+    with torch.no_grad(), torch.cuda.amp.autocast(enabled=use_gpu, dtype=torch.float16):
+
+ with mrcfile.open(input_mrc, permissive=True) as orig_map:
+ voxel_size = np.array([orig_map.voxel_size.x, orig_map.voxel_size.y, orig_map.voxel_size.z])
+
+ # orig_data = torch.from_numpy(orig_map.data.copy().transpose((2, 1, 0))).unsqueeze(0).unsqueeze(0)
+ orig_data = torch.from_numpy(orig_map.data.copy()).unsqueeze(0).unsqueeze(0)
+
+ orig_data = orig_data.cuda() if use_gpu else orig_data
+
+ print("Previous shape: ", orig_data.shape)
+ # orig = np.array([orig_map.header.origin.x, orig_map.header.origin.y, orig_map.header.origin.z])
+
+ new_grid_size = np.floor(np.array(orig_data.shape[2:]) * voxel_size).astype(np.int32)
+
+            # Build the sampling grid in F.grid_sample's normalized coordinates:
+            # output index i samples input position i / voxel_size, mapped to
+            # [-1, 1] via x_norm = (i / voxel_size) / (n_in - 1) * 2 - 1.
+ new_grid = torch.stack(
+ torch.meshgrid(
+ torch.arange(0, new_grid_size[2], device="cuda" if use_gpu else "cpu") / voxel_size[2] / (orig_data.shape[4] - 1) * 2
+ - 1,
+ torch.arange(0, new_grid_size[1], device="cuda" if use_gpu else "cpu") / voxel_size[1] / (orig_data.shape[3] - 1) * 2
+ - 1,
+ torch.arange(0, new_grid_size[0], device="cuda" if use_gpu else "cpu") / voxel_size[0] / (orig_data.shape[2] - 1) * 2
+ - 1,
+ indexing="ij",
+ ),
+ dim=-1,
+ )
+
+ new_grid = new_grid.unsqueeze(0)
+ new_data = F.grid_sample(orig_data, new_grid, mode="bilinear", align_corners=True).cpu().numpy()[0, 0]
+
+ new_voxel_size = np.array((1.0, 1.0, 1.0))
+ print("Real voxel size: ", new_voxel_size)
+ print("New shape: ", new_data.shape)
+
+ new_data = new_data.transpose((2, 1, 0))
+
+ with mrcfile.new(output_mrc, data=new_data.astype(np.float32), overwrite=True) as mrc:
+ vox_sizes = mrc.voxel_size
+ vox_sizes.flags.writeable = True
+ vox_sizes.x = new_voxel_size[0]
+ vox_sizes.y = new_voxel_size[1]
+ vox_sizes.z = new_voxel_size[2]
+ mrc.voxel_size = vox_sizes
+ mrc.update_header_from_data()
+ mrc.header.nxstart = orig_map.header.nxstart * voxel_size[0]
+ mrc.header.nystart = orig_map.header.nystart * voxel_size[1]
+ mrc.header.nzstart = orig_map.header.nzstart * voxel_size[2]
+ mrc.header.origin = orig_map.header.origin
+ mrc.header.mapc = orig_map.header.mapc
+ mrc.header.mapr = orig_map.header.mapr
+ mrc.header.maps = orig_map.header.maps
+ mrc.update_header_stats()
+ mrc.flush()
+
+
+if __name__ == "__main__":
+ import sys
+
+ def progressbar(it, prefix="", size=60, out=sys.stdout): # Python3.3+
+ count = len(it)
+
+ def show(j):
+ x = int(size * j / count)
+ print("{}[{}{}] {}/{}".format(prefix, "#" * x, "." * (size - x), j, count), end="\r", file=out, flush=True)
+
+ show(0)
+ for i, item in enumerate(it):
+ yield item
+ show(i + 1)
+ print("\n", flush=True, file=out)
+
+ from numba_progress import ProgressBar
+
+ data_new = np.zeros([200, 200, 200])
+ data = np.zeros([100, 100, 100])
+ size = [100, 100, 100]
+ it_val1 = 200
+ it_val2 = 200
+ it_val3 = 200
+ prev_voxel_size = 0.5
+ from scipy.interpolate import RegularGridInterpolator
+
+ with ProgressBar(total=it_val1) as progress:
+ interpolate_fast(data, data_new, size, it_val1, it_val2, it_val3, prev_voxel_size, progress)
+ with ProgressBar(total=it_val1) as progress:
+ data_new2 = interpolate_slow(data, data_new, size, it_val1, it_val2, it_val3, prev_voxel_size, progress)
+ # from ops.progressbar import progressbar
+ it_val_x = it_val1
+ it_val_y = it_val2
+ it_val_z = it_val3
+ prev_voxel_size_x = prev_voxel_size
+ prev_voxel_size_y = prev_voxel_size
+ prev_voxel_size_z = prev_voxel_size
+ x = np.arange(size[0])
+ y = np.arange(size[1])
+ z = np.arange(size[2])
+ my_interpolating_function = RegularGridInterpolator((x, y, z), data)
+ from progress.bar import Bar
+
+ bar = Bar("Preparing Input: ", max=int(it_val_x))
+ for i in range(it_val_x): # progressbar(range(it_val_x), prefix="", size=60):
+ # if i%10==0:
+ # print("Resizing finished %d/%d"%(i,it_val_x))
+ bar.next()
+ for j in range(it_val_y):
+ for k in range(it_val_z):
+ if i / prev_voxel_size_x >= size[0] - 1:
+ x_query = size[0] - 1
+ else:
+ x_query = i / prev_voxel_size_x
+
+ if j / prev_voxel_size_y >= size[1] - 1:
+ y_query = size[1] - 1
+ else:
+ y_query = j / prev_voxel_size_y
+ if k / prev_voxel_size_z >= size[2] - 1:
+ z_query = size[2] - 1
+ else:
+ z_query = k / prev_voxel_size_z
+ current_query = np.array([x_query, y_query, z_query])
+ current_value = float(my_interpolating_function(current_query))
+ data_new[i, j, k] = current_value
+
+ bar.finish()
diff --git a/CryoREAD/data_processing/Unify_Map.py b/CryoREAD/data_processing/Unify_Map.py
new file mode 100644
index 0000000..c9c8450
--- /dev/null
+++ b/CryoREAD/data_processing/Unify_Map.py
@@ -0,0 +1,49 @@
+from pathlib import Path
+
+import mrcfile
+import numpy as np
+
+
+def Unify_Map(input_map_path, new_map_path):
+ # Read MRC file
+ mrc = mrcfile.open(input_map_path, permissive=True)
+
+ data = mrc.data.copy()
+ voxel_size = np.asarray(mrc.voxel_size.tolist(), dtype=np.float32)
+ origin = np.array(mrc.header.origin.tolist(), dtype=np.float32)
+ nstart = np.asarray([mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart], dtype=np.float32)
+ cella = np.array(mrc.header.cella.tolist(), dtype=np.float32)
+ mapcrs = np.asarray([mrc.header.mapc, mrc.header.mapr, mrc.header.maps], dtype=int)
+    if np.sum(nstart) == 0:
+        mrc.close()
+        return input_map_path
+    mrc.print_header()
+    mrc.close()
+
+ # Reorder
+ sort = np.asarray([0, 1, 2], dtype=np.int64)
+ for i in range(3):
+ sort[mapcrs[i] - 1] = i
+ nstart = np.asarray([nstart[i] for i in sort])
+ data = np.transpose(data, axes=2 - sort[::-1])
+
+ # Move offsets from nstart to origin
+ origin = origin + nstart * voxel_size
+
+ # Save the unified map
+ mrc_new = mrcfile.new(new_map_path, data=data, overwrite=True)
+ (mrc_new.header.origin.x, mrc_new.header.origin.y, mrc_new.header.origin.z) = origin
+ (mrc_new.header.cella.x, mrc_new.header.cella.y, mrc_new.header.cella.z) = cella
+ mrc_new.update_header_stats()
+ mrc_new.print_header()
+ mrc_new.close()
+
+ # Validate the new MRC file
+ mrcfile.validate(new_map_path)
+ return new_map_path
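+
+
+# Unify_Map reorders the data axes according to mapc/mapr/maps and folds the
+# nstart offsets into the origin (origin += nstart * voxel_size), so downstream
+# code can ignore nstart entirely.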
+
+
+if __name__ == "__main__":
+ input_map_path = Path('../example/21051.mrc')
+ new_map_path = Path('../example/21051_unified.mrc')
+ new_map_path = Unify_Map(input_map_path, new_map_path)
+ print(f"New map path is {new_map_path}")
diff --git a/CryoREAD/data_processing/__init__.py b/CryoREAD/data_processing/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/data_processing/format_pdb.py b/CryoREAD/data_processing/format_pdb.py
new file mode 100644
index 0000000..5789446
--- /dev/null
+++ b/CryoREAD/data_processing/format_pdb.py
@@ -0,0 +1,177 @@
+#write a standard pdb output to be used for phenix refinement
+from Bio.PDB import MMCIFParser
+from Bio.PDB import PDBParser
+from collections import defaultdict
+import os
+#follow standard order
+_base_atoms_rna = ["P", "C5'","O5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
+_base_atoms_dna = ["P", "C5'","O5'", "C4'", "O4'", "C3'", "O3'", "C2'", "C1'"]
+_base_atoms_protein = ['N',"CA","C","O"]
+residue_atoms_order = {
+ 'A': ["N1", "C2", "N3", "C4","C5","C6", "N6", "N7", "C8","N9" ],
+ 'G': ["N1", "C2", "N2", "N3", "C4","C5", "C6", "O6","N7", "C8","N9"],
+ 'C': ["N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6"],
+ 'U': ["N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6"],
+ 'T': ["N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6",'C7'],
+ 'DA': ["N1", "C2", "N3", "C4","C5","C6", "N6", "N7", "C8","N9"],
+ 'DG': ["N1", "C2", "N2", "N3", "C4","C5", "C6", "O6","N7", "C8","N9"],
+ 'DC': ["N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6"],
+ 'DT':["N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6",'C7'],
+ "ALA": ["CB"],
+ "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
+ "ASN": ["CB", "CG", "OD1", "ND2"],
+ "ASP": ["CB", "CG", "OD1", "OD2"],
+ "CYS": ["CB", "SG"],
+ "GLU": ["CB", "CG", "CD", "OE1", "OE2"],
+ "GLN": ["CB", "CG", "CD", "OE1", "NE2"],
+ "GLY": [],
+ "HIS": ["CB", "CG", "ND1", "CD2", "CE1", "NE2"],
+ "ILE": ["CB", "CG1", "CG2", "CD1"],
+ "LEU": ["CB", "CG", "CD1", "CD2"],
+ "LYS": ["CB", "CG", "CD", "CE", "NZ"],
+ "MET": ["CB", "CG", "SD", "CE"],
+ "PHE": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
+ "PRO": ["CB", "CG", "CD"],
+ "SER": ["CB", "OG"],
+ "THR": ["CB", "OG1", "CG2"],
+ "TRP": ["CB", "CG", "CD1", "CD2", "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"],
+ "TYR": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"],
+ "VAL": ["CB", "CG1", "CG2"]
+}
+_base_atoms_end = ["OP1","OP2","OP3"]
+map_dict={"P":"P","C5'":"C","O5'":"O","C4'":"C","O4'":"O","C3'":"C","O3'":"O","C2'":"C","O2'":"O","C1'":"C",
+ "N1":"N","C2":"C","O2":"O","N3":"N","C4":"C","N4":"N","C5":"C","C6":"C","O4":"O","N9":"N","C8":"C",
+ "N7":"N","N6":"N","O6":"O","OP1":"O","OP2":"O1-","OP3":"O","C7":"C","N2":"N","N":"N","CA":"C","O":"O","C":"C"}
+for key in residue_atoms_order:
+ tmp_list=residue_atoms_order[key]
+ for atom_name in tmp_list:
+ map_dict[atom_name]=atom_name[0]
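+
+# map_dict supplies the element symbol appended to each ATOM record; atoms not
+# listed explicitly fall back to the first character of their name.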
+
+def write_res_info(wfile,current_atom_info,atomid,nucid):
+ if nucid>9999:
+ nucid=9999
+ if "C4'" in current_atom_info:
+ res_name = current_atom_info["C4'"][3]
+ elif "CA" in current_atom_info:
+ res_name = current_atom_info["CA"][3]
+    else:
+        #no C4'/CA anchor found; keep counters unchanged
+        return atomid,nucid
+    if res_name not in residue_atoms_order:
+        print("unrecognized residue info: ",current_atom_info)
+        return atomid,nucid
+ # res_name=residue.get_resname().replace(" ","")
+ # current_atom_info={} #[atom_name]:[information]
+ # for atom in residue.get_list():
+ # atom_name = atom.get_fullname().replace(" ","")
+ # atom_coord = atom.get_coord()
+ # format_coord = []
+ # atom_coord = str(atom_coord)
+ # atom_coord=atom_coord.replace("[","")
+ # atom_coord=atom_coord.replace("]","")
+ # atom_coord_split = atom_coord.split()
+ # for k in range(3):
+ # format_coord.append(float(atom_coord_split[k]))
+ # current_atom_info[atom_name]=format_coord
+ if "CA" in current_atom_info:
+ _base_atoms_order = _base_atoms_protein
+ elif "D" in res_name:
+ _base_atoms_order = _base_atoms_dna
+ else:
+ _base_atoms_order = _base_atoms_rna
+ for atom_name in _base_atoms_order:
+ if atomid>99999:
+ atomid=99999
+ if atom_name in current_atom_info:
+ format_coord=current_atom_info[atom_name]
+ assert format_coord[3]==res_name
+ line=""
+ line += "ATOM%7d %-4s %3s%2s%4d " % (atomid, atom_name,res_name, format_coord[4],nucid)
+ line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f" % (format_coord[0],format_coord[1],format_coord[2], 1.0, 0)
+ line+=" "*11
+ line+=map_dict[atom_name]+"\n"
+ wfile.write(line)
+ atomid+=1
+ current_base_list = residue_atoms_order[res_name]
+ for atom_name in current_base_list:
+ if atomid>99999:
+ atomid=99999
+ if atom_name in current_atom_info:
+ format_coord=current_atom_info[atom_name]
+ assert format_coord[3]==res_name
+ line=""
+ line += "ATOM%7d %-4s %3s%2s%4d " % (atomid, atom_name,res_name, format_coord[4],nucid)
+ line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f" % (format_coord[0],format_coord[1],format_coord[2], 1.0, 0)
+ line+=" "*11
+ line+=map_dict[atom_name]+"\n"
+ wfile.write(line)
+ atomid+=1
+ for atom_name in _base_atoms_end:
+ if atomid>99999:
+ atomid=99999
+ if atom_name in current_atom_info:
+ format_coord=current_atom_info[atom_name]
+ assert format_coord[3]==res_name
+ line=""
+ line += "ATOM%7d %-4s %3s%2s%4d " % (atomid, atom_name,res_name, format_coord[4],nucid)
+ line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f" % (format_coord[0],format_coord[1],format_coord[2], 1.0, 0)
+ line+=" "*11
+ line+=map_dict[atom_name]+"\n"
+ wfile.write(line)
+ atomid+=1
+ nucid+=1
+ return atomid,nucid
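+
+# Atoms are emitted in a fixed intra-residue order -- backbone atoms, then the
+# residue_atoms_order table, then OP1/OP2/OP3 -- so refinement tools such as
+# phenix see a consistent ordering.
+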
+def format_pdb(input_pdb_path,output_pdb_path,DNA_label=False):
+ rna2dna_dict={"A":"DA","U":"DT","C":"DC","G":"DG","T":"DT",
+ "DT":"DT","DA":"DA","DC":"DC","DG":"DG"}
+ #parser = PDBParser()
+ #structure = parser.get_structure("input",input_pdb_path)
+ with open(output_pdb_path,"w") as wfile:
+ atomid=1
+ nucid=1
+ prev_resid=0
+ keep_info_dict={}
+ with open(input_pdb_path,'r') as rfile:
+ for read_line in rfile:
+ if (read_line.startswith('ATOM')):
+ chain_name = read_line[21]
+ atom_name = read_line[12:16].replace(" ","")
+ x=float(read_line[30:38])
+ y=float(read_line[38:46])
+                    z=float(read_line[46:54])#z occupies columns 47-54 in the PDB format
+ resi=int(read_line[22:26])
+ res_name = read_line[17:20].replace(" ","")
+ if DNA_label:
+ res_name = rna2dna_dict[res_name]
+ if resi!=prev_resid:
+ if len(keep_info_dict)>0:
+ atomid,nucid =write_res_info(wfile,keep_info_dict,atomid,nucid)
+ keep_info_dict={}
+ keep_info_dict[atom_name]=[x,y,z,res_name,chain_name]
+ prev_resid=resi
+ else:
+ keep_info_dict[atom_name]=[x,y,z,res_name,chain_name]
+        #flush the final residue collected by the loop above
+        if len(keep_info_dict)>0:
+            atomid,nucid = write_res_info(wfile,keep_info_dict,atomid,nucid)
+    #for model in structure.get_list():
+    #    for chain in model.get_list():
+    #chain_id = chain.get_id()
+
+
+    return output_pdb_path
+
+def remove_op3_pdb(input_pdb_path,output_pdb_path):
+ with open(input_pdb_path,'r') as rfile:
+ with open(output_pdb_path,'w') as wfile:
+ for line in rfile:
+ if len(line)>4 and line[:4]=="ATOM":
+ atom_name = line[12:16].replace(" ","")
+ if atom_name!="OP3":
+ wfile.write(line)
+ else:
+ wfile.write(line)
+
+if __name__ == "__main__":
+ import sys
+ format_pdb(sys.argv[1],sys.argv[2])
+
+
+
+
diff --git a/CryoREAD/data_processing/map_utils.py b/CryoREAD/data_processing/map_utils.py
new file mode 100644
index 0000000..5a8a81e
--- /dev/null
+++ b/CryoREAD/data_processing/map_utils.py
@@ -0,0 +1,315 @@
+
+from ops.os_operation import mkdir
+import os
+import mrcfile
+import numpy as np
+
+def permute_ns_coord_to_pdb(input_coord,mapc,mapr,maps):
+ """
+    :param input_coord: [x,y,z] coord from pdb
+    :param mapc: MRC header mapc (axis stored along columns)
+    :param mapr: MRC header mapr (axis stored along rows)
+    :param maps: MRC header maps (axis stored along sections)
+    :return: the coordinate permuted into the map's axis order
+    """
+    if mapc==1 and mapr==2 and maps==3:
+        out_x = input_coord[0]#out_x corresponds to section
+        out_y = input_coord[1]#out_y corresponds to row
+        out_z = input_coord[2]#out_z corresponds to column
+ elif mapc==1 and mapr==3 and maps==2:
+ out_x = input_coord[0]
+ out_y = input_coord[2]
+ out_z = input_coord[1]
+ elif mapc == 2 and mapr == 1 and maps == 3:
+ out_x = input_coord[1]
+ out_y = input_coord[0]
+ out_z = input_coord[2]
+
+ elif mapc == 2 and mapr == 3 and maps == 1:
+ out_x = input_coord[2]
+ out_y = input_coord[0]
+ out_z = input_coord[1]
+ elif mapc == 3 and mapr == 1 and maps == 2:
+ out_x = input_coord[1]
+ out_y = input_coord[2]
+ out_z = input_coord[0]
+ elif mapc == 3 and mapr == 2 and maps == 1:
+ out_x = input_coord[2]
+ out_y = input_coord[1]
+ out_z = input_coord[0]
+    else:
+        print("Unsupported mapc/mapr/maps axis order:",mapc,mapr,maps)
+        exit(1)
+ return [out_x, out_y, out_z]
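+
+# Example: with mapc,mapr,maps = (3,2,1) an input [x,y,z] is returned as [z,y,x].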
+
+def save_dens_map(save_map_path,new_dens,
+ origin_map_path):
+
+ with mrcfile.open(origin_map_path, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ prev_voxel_size_x = float(prev_voxel_size['x'])
+ prev_voxel_size_y = float(prev_voxel_size['y'])
+ prev_voxel_size_z = float(prev_voxel_size['z'])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = \
+ mrc.header.nx, mrc.header.ny, mrc.header.nz, \
+ mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart, \
+ mrc.header.mx, mrc.header.my, mrc.header.mz
+ orig = mrc.header.origin
+ print("Origin:", orig)
+ print("Previous voxel size:", prev_voxel_size)
+ print("nx, ny, nz", nx, ny, nz)
+ print("nxs,nys,nzs", nxs, nys, nzs)
+ print("mx,my,mz", mx, my, mz)
+
+ data_new = np.float32(new_dens)
+ mrc_new = mrcfile.new(save_map_path, data=data_new, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ mrc_new.header.nxstart = nxs * prev_voxel_size_x
+ mrc_new.header.nystart = nys * prev_voxel_size_y
+ mrc_new.header.nzstart = nzs * prev_voxel_size_z
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ mrc_new.header.origin = orig
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ del data_new
+def increase_map_density(input_path,output_path,add_contour):
+ add_contour=abs(add_contour)
+ with mrcfile.open(input_path,permissive=True) as mrc:
+ data=mrc.data
+ data=np.float32(data)
+ data = data+add_contour
+ save_dens_map(output_path,data,input_path)
+ return output_path
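+#Usage sketch (hypothetical file names): shift every voxel up by the absolute
+#contour value so a map whose recommended contour is negative can afterwards
+#be thresholded at a positive level:
+#increase_map_density("input.mrc", "shifted.mrc", add_contour=-0.02)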
+def save_predict_specific_map(save_map_path,specific_class,prediction_array,
+ origin_map_path,label_only=False):
+ prediction = np.array(prediction_array)
+ #prediction[prediction!=specific_class]=0
+ if label_only:
+ prediction[prediction!=specific_class]=0
+ else:
+ prediction = prediction[specific_class]
+ with mrcfile.open(origin_map_path, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ prev_voxel_size_x = float(prev_voxel_size['x'])
+ prev_voxel_size_y = float(prev_voxel_size['y'])
+ prev_voxel_size_z = float(prev_voxel_size['z'])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = \
+ mrc.header.nx, mrc.header.ny, mrc.header.nz, \
+ mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart, \
+ mrc.header.mx, mrc.header.my, mrc.header.mz
+ orig = mrc.header.origin
+ print("Origin:", orig)
+ print("Previous voxel size:", prev_voxel_size)
+ print("nx, ny, nz", nx, ny, nz)
+ print("nxs,nys,nzs", nxs, nys, nzs)
+ print("mx,my,mz", mx, my, mz)
+
+ data_new = np.float32(prediction)
+ mrc_new = mrcfile.new(save_map_path, data=data_new, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ mrc_new.header.nxstart = nxs * prev_voxel_size_x
+ mrc_new.header.nystart = nys * prev_voxel_size_y
+ mrc_new.header.nzstart = nzs * prev_voxel_size_z
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ mrc_new.header.origin = orig
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ del data_new
+
+#finding the maximum density used to normalize
+def find_top_density(map_data,threshold):
+ use_density=map_data[map_data>0]
+
+ hist,bin_edges=np.histogram(use_density, bins=200)
+ #print(hist)
+ log_hist = [np.log(x) if x>0 else 0 for x in hist]
+ sum_cutoff=np.sum(log_hist)*threshold
+ cumulative=0
+ for j in range(len(log_hist)):
+ cumulative+=log_hist[j]
+ if cumulative>sum_cutoff:
+ return bin_edges[j]
+ return bin_edges[-1]
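+#find_top_density scans a log-scaled histogram of the positive voxel values
+#and returns the first bin edge at which the cumulative log-count exceeds the
+#given fraction `threshold` of the total; a threshold close to 1 therefore
+#yields a near-maximum density value to normalize against.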
+
+def automate_contour(map_data):
+ use_density=map_data[map_data>1e-6]
+ #hist,bin_edges=np.histogram(use_density, bins=1000)
+ sorted_array = np.sort(use_density)
+ select_index=int(len(sorted_array)/1000) #usually 0
+ return sorted_array[select_index]
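+#automate_contour sorts the positive voxel values and returns the value at
+#index len/1000 (roughly the 0.1th percentile) as a conservative default
+#contour level when the user does not supply one.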
+def process_map_data(input_map_path):
+ with mrcfile.open(input_map_path, permissive=True) as mrc:
+ orig = mrc.header.origin
+ orig = str(orig)
+ orig = orig.replace("(","")
+ orig = orig.replace(")", "")
+ orig = orig.split(",")
+ new_origin = []
+ for k in range(3):
+ new_origin.append(float(orig[k]))
+ print("Origin:", new_origin)
+ data = mrc.data
+ mapc = mrc.header.mapc
+ mapr = mrc.header.mapr
+ maps = mrc.header.maps
+ print("detected mode mapc %d, mapr %d, maps %d" % (mapc, mapr, maps))
+ nxstart, nystart, nzstart = mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart
+ #mapc to x, mapr to y maps to z
+ return data, mapc, mapr, maps,new_origin,nxstart,nystart,nzstart
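+#Usage sketch (hypothetical file name): the returned axis codes are what the
+#permute_* helpers in this module need to map voxel indices to pdb xyz order:
+#data, mapc, mapr, maps, origin, nxs, nys, nzs = process_map_data("in.mrc")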
+
+def segment_map(input_map,output_map,contour=0):
+ """
+ segment meaningful region of a map
+ :param input_map:
+ :param output_map:
+ :return:
+ generate a new small size map
+ """
+ with mrcfile.open(input_map, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ prev_voxel_size_x = float(prev_voxel_size['x'])
+ prev_voxel_size_y = float(prev_voxel_size['y'])
+ prev_voxel_size_z = float(prev_voxel_size['z'])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = \
+ mrc.header.nx, mrc.header.ny, mrc.header.nz, \
+ mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart, \
+ mrc.header.mx, mrc.header.my, mrc.header.mz
+ orig = mrc.header.origin
+ #check the useful density in the input
+ input_data = mrc.data
+ useful_index = np.argwhere(input_data>contour)
+ min_x = int(np.min(useful_index[:,0]))
+ max_x = int(np.max(useful_index[:,0]))
+ min_y = int(np.min(useful_index[:,1]))
+ max_y = int(np.max(useful_index[:,1]))
+ min_z = int(np.min(useful_index[:,2]))
+ max_z = int(np.max(useful_index[:,2]))
+ mapc = mrc.header.mapc
+ mapr = mrc.header.mapr
+ maps = mrc.header.maps
+ new_data = input_data[min_x:max_x+1,min_y:max_y+1,min_z:max_z+1]#include the boundary voxels
+ shift_start = permute_map_coord_to_pdb([min_x,min_y,min_z],mapc,mapr,maps)
+ origin = np.array(mrc.header.origin.tolist(), dtype=np.float32)
+ origin = np.array(origin)+np.array(shift_start)
+ mrc_new = mrcfile.new(output_map, data=new_data, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ # mrc_new.header.nx = int(max_x-min_x)
+ # mrc_new.header.ny = int(max_y-min_y)
+ # mrc_new.header.nz = int(max_z-min_z)
+ mrc_new.header.nxstart = nxs * prev_voxel_size_x
+ mrc_new.header.nystart = nys * prev_voxel_size_y
+ mrc_new.header.nzstart = nzs * prev_voxel_size_z
+ # mrc_new.header.mx = int(max_x-min_x)
+ # mrc_new.header.my = int(max_y-min_y)
+ # mrc_new.header.mz = int(max_z-min_z)
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ # mrc_new.header.cella.x = int(max_x-min_x)
+ # mrc_new.header.cella.y = int(max_y-min_y)
+ # mrc_new.header.cella.z = int(max_z-min_z)
+ #mrc_new.header.origin = origin
+ (mrc_new.header.origin.x, mrc_new.header.origin.y, mrc_new.header.origin.z) = origin
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ return output_map
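+#Usage sketch (hypothetical file names): crop the map to the bounding box of
+#voxels above the contour; the origin is shifted by the cropped offset so the
+#segmented map stays aligned with the original:
+#segment_map("full.mrc", "segmented.mrc", contour=0.01)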
+def permute_map_coord_to_pdb(input_coord,mapc,mapr,maps):
+ """
+ :param input_coord: [x,y,z] coord from pdb
+ :param mapc:
+ :param mapr:
+ :param maps:
+ :return:
+ """
+ if mapc==1 and mapr==2 and maps==3:
+ out_x = input_coord[2]#out_x corresponds to section
+ out_y = input_coord[1]#out_y corresponds to row
+ out_z = input_coord[0]#out_z corresponds to column
+ elif mapc==1 and mapr==3 and maps==2:
+ out_x = input_coord[2]
+ out_y = input_coord[0]
+ out_z = input_coord[1]
+ elif mapc == 2 and mapr == 1 and maps == 3:
+ out_x = input_coord[1]
+ out_y = input_coord[2]
+ out_z = input_coord[0]
+
+ elif mapc == 2 and mapr == 3 and maps == 1:
+ out_x = input_coord[0]
+ out_y = input_coord[2]
+ out_z = input_coord[1]
+ elif mapc == 3 and mapr == 1 and maps == 2:
+ out_x = input_coord[1]
+ out_y = input_coord[0]
+ out_z = input_coord[2]
+ elif mapc == 3 and mapr == 2 and maps == 1:
+ out_x = input_coord[0]
+ out_y = input_coord[1]
+ out_z = input_coord[2]
+ else:
+ print("unsupported axis order: mapc=%d mapr=%d maps=%d"%(mapc,mapr,maps))
+ exit()
+ return out_x, out_y, out_z
+
+def permute_pdb_coord_to_map(input_coord,mapc,mapr,maps):
+ """
+ :param input_coord: [x,y,z] coord from pdb
+ :param mapc:
+ :param mapr:
+ :param maps:
+ :return:
+ """
+ if mapc==1 and mapr==2 and maps==3:
+ out_x = input_coord[2]#out_x corresponds to section
+ out_y = input_coord[1]#out_y corresponds to row
+ out_z = input_coord[0]#out_z corresponds to column
+ elif mapc==1 and mapr==3 and maps==2:
+ out_x = input_coord[1]
+ out_y = input_coord[2]
+ out_z = input_coord[0]
+ elif mapc == 2 and mapr == 1 and maps == 3:
+ out_x = input_coord[2]
+ out_y = input_coord[0]
+ out_z = input_coord[1]
+
+ elif mapc == 2 and mapr == 3 and maps == 1:
+ out_x = input_coord[0]
+ out_y = input_coord[2]
+ out_z = input_coord[1]
+ elif mapc == 3 and mapr == 1 and maps == 2:
+ out_x = input_coord[1]
+ out_y = input_coord[0]
+ out_z = input_coord[2]
+ elif mapc == 3 and mapr == 2 and maps == 1:
+ out_x = input_coord[0]
+ out_y = input_coord[1]
+ out_z = input_coord[2]
+ else:
+ print("unsupported axis order: mapc=%d mapr=%d maps=%d"%(mapc,mapr,maps))
+ exit()
+ return out_x, out_y, out_z
diff --git a/CryoREAD/environment.yml b/CryoREAD/environment.yml
new file mode 100644
index 0000000..8b41d10
--- /dev/null
+++ b/CryoREAD/environment.yml
@@ -0,0 +1,22 @@
+name: CryoREAD
+channels:
+ - pytorch
+ - nvidia
+ - conda-forge
+ - anaconda
+ - defaults
+dependencies:
+ - cudatoolkit=11.1.74
+ - pip=21.1.3
+ - python=3.8.10
+ - pytorch=1.9.0
+ - pip:
+ - biopython
+ - numba==0.57.1
+ - numpy==1.24.4
+ - scipy==1.10.1
+ - tqdm==4.65.0
+ - ortools==9.4.1874
+ - mrcfile
+ - progress
+ - numba-progress
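+# Standard conda workflow to build and activate this environment:
+#   conda env create -f environment.yml
+#   conda activate CryoREAD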
diff --git a/CryoREAD/evaluation/__init__.py b/CryoREAD/evaluation/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/evaluation/evaluate_structure.py b/CryoREAD/evaluation/evaluate_structure.py
new file mode 100644
index 0000000..f48cd57
--- /dev/null
+++ b/CryoREAD/evaluation/evaluate_structure.py
@@ -0,0 +1,373 @@
+#make this file self-consistent without too many dependencies
+from collections import defaultdict
+import numpy as np
+from scipy.spatial.distance import cdist
+def cif2dict(input_cif_path,filter_list=None):
+ """
+ input_cif_path: input cif file path
+ return:
+ a dictionary in this format: [nuc_id][atom_id]:[coordinates]
+ """
+ begin_check=False
+ block_list=[]
+ with open(input_cif_path,'r') as rfile:
+ for line in rfile:
+ if "loop_" in line:
+ begin_check=True
+ continue
+
+ if begin_check and "_atom_site" in line:
+ block_list.append(line.strip("\n").replace(" ",""))
+ continue
+ if begin_check and "_atom_site" not in line:
+ begin_check=False
+ atom_ids = block_list.index('_atom_site.id')
+ try:
+ seq_ids = block_list.index('_atom_site.label_seq_id')
+ except:
+ seq_ids = block_list.index('_atom_site.auth_seq_id')
+ try:
+ chain_ids = block_list.index("_atom_site.auth_asym_id")
+ except:
+ chain_ids = block_list.index("_atom_site.label_asym_id")
+ atom_type_ids = block_list.index("_atom_site.label_atom_id")
+ res_name_ids = block_list.index("_atom_site.label_comp_id")
+ x_ids = block_list.index("_atom_site.Cartn_x")
+ y_ids = block_list.index("_atom_site.Cartn_y")
+ z_ids = block_list.index("_atom_site.Cartn_z")
+ structure_dict =defaultdict(dict)#[nuc_id][atom_name]: [x, y, z, base_label]
+
+ map_dict={"A":0,"U":1,"T":1,"C":2,"G":3,"DA":0,"DU":1,"DT":1,"DC":2,"DG":3}
+ total_nuc_id=0
+ track_nuc_id=-1000
+ with open(input_cif_path,'r') as rfile:
+ for line in rfile:
+ if line.startswith("ATOM"):
+ split_info=line.strip("\n").split()
+ current_atom_name = split_info[atom_type_ids].replace(" ","")
+ current_res_index = int(split_info[seq_ids])
+ current_res_name = split_info[res_name_ids]
+ current_x = float(split_info[x_ids])
+ current_y = float(split_info[y_ids])
+ current_z = float(split_info[z_ids])
+ if current_res_name not in map_dict:
+ continue
+ pred_label = map_dict[current_res_name]
+
+ if filter_list is not None and current_atom_name not in filter_list:
+ continue
+ if track_nuc_id!=current_res_index:
+ track_nuc_id=current_res_index
+ total_nuc_id+=1
+
+ structure_dict[total_nuc_id][current_atom_name]=[current_x,current_y,current_z,pred_label]
+
+
+ return structure_dict
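+#Note: cif2dict resolves column indices from the _atom_site loop header rather
+#than assuming fixed positions, so it tolerates mmCIF files whose columns are
+#ordered differently or that use auth_* in place of label_* tags.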
+
+def pdb2dict(input_pdb_path,filter_list=None):
+ """
+ input_pdb_path: input pdb file path
+ return:
+ a dictionary in this format: [nuc_id][atom_id]:[coordinates]
+
+ """
+ structure_dict =defaultdict(dict)#[chain_id][nuc_id]:[coordinates, nuc_type,nuc_score]
+ map_dict={"A":0,"U":1,"T":1,"C":2,"G":3,"DA":0,"DU":1,"DT":1,"DC":2,"DG":3}
+ total_nuc_id=0
+ track_nuc_id=-1000
+ with open(input_pdb_path,'r') as rfile:
+ for line in rfile:
+ if line.startswith("ATOM"):
+ chain_id = line[21]
+ atom_name = line[12:16]
+ x=float(line[30:38])
+ y=float(line[38:46])
+ z=float(line[46:54])
+ nuc_id=int(line[22:26])
+ resn = line[17:20]
+ coordinates = [x,y,z]
+ nuc_type = resn#split_result[5]
+ nuc_type = nuc_type.replace(" ","")
+ if nuc_type not in map_dict:
+ continue
+ pred_label = map_dict[nuc_type]
+ atom_name = atom_name.replace(" ","")
+ if filter_list is not None and atom_name not in filter_list:
+ continue
+ if track_nuc_id!=nuc_id:
+ track_nuc_id=nuc_id
+ total_nuc_id+=1
+
+ structure_dict[total_nuc_id][atom_name]=[x,y,z,pred_label]
+
+
+ return structure_dict
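+#Note: PDB ATOM records are fixed-width, so pdb2dict slices columns
+#(31-38/39-46/47-54 for x/y/z, 23-26 for the residue number, 1-based) instead
+#of splitting on whitespace, which breaks when adjacent fields run together.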
+def calcudate_atomwise_distmat(query_dict,target_dict):
+ """
+ query_dict: dictionary of predicted structure
+ target_dict: dictionary of native structure
+ return:
+ distance_matrix: K*M*N matrix, K is the number of atom types,
+ M is the number of nucleotides in query pdb, N is the number of nucleotides in target pdb
+ """
+ #first get atom list
+ query_keys=list(query_dict.keys())
+ target_keys = list(target_dict.keys())
+ tmp_key = target_keys[0]
+ atom_list = list(target_dict[tmp_key].keys())
+ distance_matrix = np.zeros([len(atom_list),len(query_keys),len(target_keys)]) #K*M*N: one M*N distance matrix per atom type
+ count_matrix = np.zeros([len(query_keys),len(target_keys)]) #M*N matrix counting the shared atoms for each pair of nucleotides
+ for select_atom in atom_list:
+ query_atom_list=[]
+ missing_query_index=[]
+ for k,nuc_id in enumerate(query_keys):
+ if select_atom not in query_dict[nuc_id]:
+ print("atom %s not found in query pdb"%select_atom)
+ query_atom_list.append([999999,999999,999999])#sentinel coordinate marking a missing atom
+ missing_query_index.append(k)
+ else:
+ query_atom_list.append(query_dict[nuc_id][select_atom][:3])
+ target_atom_list=[]
+ missing_target_index=[]
+ for k,nuc_id in enumerate(target_keys):
+ if select_atom not in target_dict[nuc_id]:
+ print("atom %s not found in query pdb"%select_atom)
+ target_atom_list.append([999999,999999,999999])
+ missing_target_index.append(k)
+ else:
+ target_atom_list.append(target_dict[nuc_id][select_atom][:3])
+ query_atom_list=np.array(query_atom_list)
+ target_atom_list=np.array(target_atom_list)
+ current_dist_matrix = cdist(query_atom_list,target_atom_list) #M*N distance matrix for current atom type
+ for k in range(len(missing_query_index)):
+ current_dist_matrix[missing_query_index[k],:]=0
+ for k in range(len(missing_target_index)):
+ current_dist_matrix[:,missing_target_index[k]]=0
+ distance_matrix[atom_list.index(select_atom),:,:]=current_dist_matrix
+
+ tmp_count_matrix = np.ones([len(query_keys),len(target_keys)])
+ for k in range(len(missing_query_index)):
+ tmp_count_matrix[missing_query_index[k],:]=0
+ for k in range(len(missing_target_index)):
+ tmp_count_matrix[:,missing_target_index[k]]=0
+ count_matrix+=tmp_count_matrix
+ #check zero in count matrix
+ zero_index = np.where(count_matrix==0)
+ if len(zero_index[0])>0:
+ print("Some atoms are missing in the query/target pdb/cif, please check the input file")
+ exit()
+
+ return distance_matrix
+
+def calculate_eval_metric(query_dict,target_dict,distance_matrix,distance_atom_matrix,cutoff,max_cutoff=10.0):
+ """
+ query_dict: dictionary of predicted structure
+ target_dict: dictionary of native structure
+ distance_matrix: M*N matrix, M is the number of nucleotides in query pdb, N is the number of nucleotides in target pdb
+ distance_atom_matrix: K*M*N matrix, K is the number of atom types, M is the number of nucleotides in query pdb, N is the number of nucleotides in target pdb
+ cutoff: distance cutoff for evaluation
+ max_cutoff: distance cutoff used for precision; query nucleotides with no match within this distance (10A by default) are excluded from the calculation
+ return:
+ atom_coverage,atom_precision,sequence_match,sequence_match_prec,sequence_recall,sequence_prec,RMSD
+ """
+
+ atom_track="P"
+ #first find the closest pair of nucleotides
+ query_keys=list(query_dict.keys())
+ target_keys = list(target_dict.keys())
+ #calculate atom coverage and precision
+ #calculate RMSD and the denominator for the sequence-match metrics
+ visit_target_set=set()
+ visit_query_set=set()
+ all_possible_match_dict = defaultdict(list)
+ RMSD=[]
+ for j,target_key in enumerate(target_keys):
+ cur_dist = distance_matrix[:,j]
+ current_query_index = np.argwhere(cur_dist<=cutoff)
+
+ if len(current_query_index)>0:
+ tmp_min_dist=max_cutoff
+ for tmp_index in current_query_index:
+ tmp_index = int(tmp_index)
+ visit_query_set.add(tmp_index)
+ all_possible_match_dict[j].append(tmp_index)
+ tmp_distance = np.mean(distance_atom_matrix[:,tmp_index,j])
+ if tmp_distance<tmp_min_dist:
+ tmp_min_dist=tmp_distance
+ visit_target_set.add(j)
+ RMSD.append(tmp_min_dist)
+ #count the query nucleotides that match anything within max_cutoff
+ total_use_query=0
+ for k in range(len(query_keys)):
+ if np.min(distance_matrix[k,:])>max_cutoff:#this is because CryoREAD sometimes models regions that the EM map includes but the user did not model
+ continue
+ total_use_query+=1
+ RMSD= np.mean(np.array(RMSD))
+
+ visit_atom_query_set = visit_query_set
+ visit_atom_target_set = visit_target_set
+ atom_precision = len(visit_query_set)/total_use_query
+ atom_coverage = len(visit_target_set)/len(target_keys)
+ #get the atom coverage and precision
+ overall_match_dict={} #match_dict[query_nuc_id]=target_nuc_id
+ visit_query_set=set()#indicate whether a query nucleotide is already matched
+ visit_target_set=set()#indicate whether a target nucleotide is already matched
+ #first check the closest pair of nucleotides
+ for j,target_key in enumerate(target_keys):
+ cur_dist = distance_matrix[:,j]
+ current_query_index = int(np.argmin(cur_dist))
+ query_key = query_keys[current_query_index]
+ match_dist = cur_dist[current_query_index]
+ if match_dist>cutoff:
+ continue
+ if current_query_index in visit_query_set or j in visit_target_set:
+ continue
+ try:
+ current_query_base = query_dict[query_key][atom_track][3]
+ current_target_base = target_dict[target_key][atom_track][3]
+ except:
+ #for some cases they missed P atoms
+ current_query_base = query_dict[query_key]["C4'"][3]
+ current_target_base = target_dict[target_key]["C4'"][3]
+ if current_query_base!=current_target_base:
+ continue
+ #1 predicted base can only be mapped to one native base
+ overall_match_dict[query_key]=target_key
+ visit_query_set.add(current_query_index)
+ visit_target_set.add(j)
+ #then check all remaining target nucleotides that did not find a match
+ for j,target_key in enumerate(target_keys):
+ if j in visit_target_set:
+ continue
+ cur_dist = distance_matrix[:,j]
+ try:
+ current_target_base = target_dict[target_key][atom_track][3]
+ except:
+ #for some cases they missed P atoms
+ #print(target_dict[target_key])
+ current_target_base = target_dict[target_key]["C4'"][3]
+ current_query_index = np.argwhere(cur_dist<=cutoff)
+ if len(current_query_index)>0:
+ for tmp_index in current_query_index:
+ tmp_index = int(tmp_index)
+ if tmp_index in visit_query_set:
+ continue
+ query_key = query_keys[tmp_index]
+ try:
+ current_query_base = query_dict[query_key][atom_track][3]
+ except:
+ current_query_base = query_dict[query_key]["C4'"][3]
+
+ if current_query_base==current_target_base:
+ match_dist = cur_dist[tmp_index]
+ #further check whether this matched query has an even closer target that also has the correct base type
+ cur_dist2 = distance_matrix[tmp_index,:]
+ current_tmp_target_index = np.argwhere(cur_dist2<=match_dist)
+ if len(current_tmp_target_index)==0:
+ overall_match_dict[query_key]=target_key
+ visit_query_set.add(tmp_index)
+ visit_target_set.add(j)
+ break
+ else:
+ for tmp_target_index in current_tmp_target_index:
+ tmp_target_index = int(tmp_target_index)
+ if tmp_target_index in visit_target_set:
+ continue
+ tmp_target_key = target_keys[tmp_target_index]
+ try:
+ judge_flag=query_dict[query_key][atom_track][3]==target_dict[tmp_target_key][atom_track][3]
+ except:
+ judge_flag=query_dict[query_key]["C4'"][3]==target_dict[tmp_target_key]["C4'"][3]
+ if judge_flag:
+ overall_match_dict[query_key]=tmp_target_key
+ visit_query_set.add(tmp_index)
+ visit_target_set.add(tmp_target_index)
+ break
+ if query_key not in overall_match_dict:
+ overall_match_dict[query_key]=target_key
+ visit_query_set.add(tmp_index)
+ visit_target_set.add(j)
+ break
+ total_match = len(overall_match_dict)
+ sequence_match = total_match/len(visit_atom_target_set)
+ sequence_match_prec = total_match/len(visit_atom_query_set)
+ sequence_recall = total_match/len(target_keys)
+ sequence_prec = total_match/total_use_query
+ #calculate rmsd based on the matched nucleotides
+ return atom_coverage,atom_precision,sequence_match,sequence_match_prec,sequence_recall,sequence_prec,RMSD
+def calcudate_center_distmat(query_dict,target_dict):
+ """
+ query_dict: dictionary of predicted structure
+ target_dict: dictionary of native structure
+ return:
+ distance_matrix: M*N matrix, M is the number of nucleotides in query pdb, N is the number of nucleotides in target pdb
+ """
+ #first get atom list
+ query_keys=list(query_dict.keys())
+ target_keys = list(target_dict.keys())
+ query_center={}
+ key_center={}
+ for key in query_keys:
+ #use only x,y,z (index 3 holds the base label) so the label does not distort the distances
+ query_center[key]=np.mean(np.array([v[:3] for v in query_dict[key].values()],dtype=float),axis=0)
+ for key in target_keys:
+ key_center[key]=np.mean(np.array([v[:3] for v in target_dict[key].values()],dtype=float),axis=0)
+ distance_matrix = cdist(np.array(list(query_center.values()),dtype=float),np.array(list(key_center.values()),dtype=float))
+ return distance_matrix
+def evaluate_structure(query_pdb,target_pdb,cutoff=5.0,max_cutoff=10.0):
+
+ """
+ query_pdb: predicted pdb file
+ target_pdb: native pdb file
+ cutoff: distance cutoff for evaluation
+ max_cutoff: distance cutoff used for precision; query nucleotides with no match within this distance (10A by default) are excluded from the calculation
+ """
+ _pho_atoms=["OP3", "P", "OP1", "OP2", "O5'",]
+ _sugar_atoms = [ "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
+
+ filter_list = _pho_atoms+_sugar_atoms+["N1"] #backbone atoms plus the base N1 used for evaluation
+ if query_pdb.endswith(".cif"):
+ query_dict = cif2dict(query_pdb,filter_list)
+ elif query_pdb.endswith(".pdb"):
+ query_dict = pdb2dict(query_pdb,filter_list)
+ else:
+ print("format of query file %s not supported"%query_pdb)
+ exit()
+ print("Input query pdb includes %d nucleotides"%len(query_dict))
+ if len(query_dict)==0:
+ print("No nucleotide found in the query pdb/cif file")
+ exit()
+ #read the target pdb file
+ if target_pdb.endswith(".cif"):
+ target_dict = cif2dict(target_pdb,filter_list)
+ elif target_pdb.endswith(".pdb"):
+ target_dict = pdb2dict(target_pdb,filter_list)
+ else:
+ print("format of target file %s not supported"%target_pdb)
+ exit()
+ print("Input target pdb includes %d nucleotides"%len(target_dict))
+ if len(target_dict)==0:
+ print("No nucleotide found in the target pdb/cif file")
+ exit()
+ #calculate the distance between the two structures: N*M distance matrix, this is the average distance of corresponding atoms between the two structures
+ distance_matrix = calcudate_center_distmat(query_dict,target_dict) #this is for match
+ distance_atom_matrix = calcudate_atomwise_distmat(query_dict,target_dict)#this is for calculating the atom-wise RMSD
+ #calculate atom coverage, atom precision, sequence recall(match), sequence precision(match), sequence recall, sequence precision, RMSD
+ atom_coverage,atom_precision,sequence_match,sequence_match_prec,\
+ sequence_recall,sequence_prec,RMSD = calculate_eval_metric(query_dict,target_dict,
+ distance_matrix,distance_atom_matrix,
+ cutoff,
+ max_cutoff=max_cutoff)
+ print("*"*100)
+ print("Atom Coverage: %.3f"%atom_coverage+" Atom Precision: %.3f"%atom_precision)
+ print("Sequence Recall(Match): %.3f"%sequence_match+" Sequence Precision(Match): %.3f"%sequence_match_prec)
+ print("Sequence Recall: %.3f"%sequence_recall+" Sequence Precision: %.3f"%sequence_prec)
+ print("RMSD: %.3f"%RMSD)
+ print("*"*100)
diff --git a/CryoREAD/example/21051.fasta b/CryoREAD/example/21051.fasta
new file mode 100644
index 0000000..90b63ed
--- /dev/null
+++ b/CryoREAD/example/21051.fasta
@@ -0,0 +1,2 @@
+>21051_3
+GACAUACUUGUUCCACUCUAGCAGCACGUAAAUAUUGGCACCAAUAUUACUGUGCUGCUUUAGUGUGACAGGGAUACA
diff --git a/CryoREAD/example/21051.mrc b/CryoREAD/example/21051.mrc
new file mode 100644
index 0000000..708d362
Binary files /dev/null and b/CryoREAD/example/21051.mrc differ
diff --git a/CryoREAD/example/6v5b_drna.pdb b/CryoREAD/example/6v5b_drna.pdb
new file mode 100644
index 0000000..ac3600c
--- /dev/null
+++ b/CryoREAD/example/6v5b_drna.pdb
@@ -0,0 +1,1680 @@
+COMPND MOL_ID: 1;
+COMPND 2 MOLECULE: RNA (78-MER);
+COMPND 3 CHAIN: D;
+COMPND 4 ENGINEERED: YES
+SOURCE MOL_ID: 1
+REMARK 2
+REMARK 2 RESOLUTION. NOT APPLICABLE.
+REMARK 4
+REMARK 4 NULL COMPLIES WITH FORMAT V. 3.30, 13-JUL-11
+REMARK 250
+REMARK 250 EXPERIMENTAL DETAILS
+REMARK 250 EXPERIMENT TYPE : NULL
+REMARK 250 DATE OF DATA COLLECTION : NULL
+REMARK 250
+REMARK 250 REMARK: NULL
+SEQRES 1 D 78 G A C A U A C U U G U U C
+SEQRES 2 D 78 C A C U C U A G C A G C A
+SEQRES 3 D 78 C G U A A A U A U U G G C
+SEQRES 4 D 78 A C C A A U A U U A C U G
+SEQRES 5 D 78 U G C U G C U U U A G U G
+SEQRES 6 D 78 U G A C A G G G A U A C A
+CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1
+ORIGX1 1.000000 0.000000 0.000000 0.00000
+ORIGX2 0.000000 1.000000 0.000000 0.00000
+ORIGX3 0.000000 0.000000 1.000000 0.00000
+ATOM 1 P G D 3 141.405 105.694 92.037 1.00214.73 P
+ATOM 2 OP1 G D 3 142.763 105.113 91.898 1.00214.73 O
+ATOM 3 OP2 G D 3 140.218 104.815 91.892 1.00214.73 O
+ATOM 4 O5' G D 3 141.324 106.418 93.454 1.00214.73 O
+ATOM 5 C5' G D 3 142.449 106.443 94.321 1.00214.73 C
+ATOM 6 C4' G D 3 143.084 107.810 94.359 1.00214.73 C
+ATOM 7 O4' G D 3 143.067 108.389 93.028 1.00214.73 O
+ATOM 8 C3' G D 3 142.385 108.849 95.220 1.00214.73 C
+ATOM 9 O3' G D 3 142.711 108.744 96.595 1.00214.73 O
+ATOM 10 C2' G D 3 142.837 110.157 94.590 1.00214.73 C
+ATOM 11 O2' G D 3 144.161 110.473 94.992 1.00214.73 O
+ATOM 12 C1' G D 3 142.859 109.783 93.112 1.00214.73 C
+ATOM 13 N9 G D 3 141.586 110.108 92.439 1.00214.73 N
+ATOM 14 C8 G D 3 140.635 109.219 92.003 1.00214.73 C
+ATOM 15 N7 G D 3 139.612 109.796 91.437 1.00214.73 N
+ATOM 16 C5 G D 3 139.906 111.147 91.502 1.00214.73 C
+ATOM 17 C6 G D 3 139.165 112.263 91.051 1.00214.73 C
+ATOM 18 O6 G D 3 138.072 112.283 90.479 1.00214.73 O
+ATOM 19 N1 G D 3 139.826 113.453 91.310 1.00214.73 N
+ATOM 20 C2 G D 3 141.041 113.558 91.929 1.00214.73 C
+ATOM 21 N2 G D 3 141.504 114.803 92.092 1.00214.73 N
+ATOM 22 N3 G D 3 141.743 112.527 92.362 1.00214.73 N
+ATOM 23 C4 G D 3 141.120 111.358 92.115 1.00214.73 C
+ATOM 24 P A D 4 141.652 109.200 97.713 1.00205.49 P
+ATOM 25 OP1 A D 4 142.111 108.703 99.033 1.00205.49 O
+ATOM 26 OP2 A D 4 140.299 108.841 97.221 1.00205.49 O
+ATOM 27 O5' A D 4 141.763 110.788 97.731 1.00205.49 O
+ATOM 28 C5' A D 4 142.980 111.434 98.072 1.00205.49 C
+ATOM 29 C4' A D 4 142.926 112.900 97.731 1.00205.49 C
+ATOM 30 O4' A D 4 142.690 113.062 96.309 1.00205.49 O
+ATOM 31 C3' A D 4 141.798 113.680 98.385 1.00205.49 C
+ATOM 32 O3' A D 4 142.113 114.081 99.704 1.00205.49 O
+ATOM 33 C2' A D 4 141.594 114.845 97.427 1.00205.49 C
+ATOM 34 O2' A D 4 142.588 115.837 97.634 1.00205.49 O
+ATOM 35 C1' A D 4 141.857 114.178 96.080 1.00205.49 C
+ATOM 36 N9 A D 4 140.624 113.705 95.425 1.00205.49 N
+ATOM 37 C8 A D 4 140.228 112.404 95.263 1.00205.49 C
+ATOM 38 N7 A D 4 139.096 112.259 94.627 1.00205.49 N
+ATOM 39 C5 A D 4 138.723 113.561 94.353 1.00205.49 C
+ATOM 40 C6 A D 4 137.611 114.093 93.696 1.00205.49 C
+ATOM 41 N6 A D 4 136.644 113.332 93.190 1.00205.49 N
+ATOM 42 N1 A D 4 137.525 115.434 93.581 1.00205.49 N
+ATOM 43 C2 A D 4 138.504 116.188 94.093 1.00205.49 C
+ATOM 44 N3 A D 4 139.601 115.800 94.734 1.00205.49 N
+ATOM 45 C4 A D 4 139.650 114.464 94.832 1.00205.49 C
+ATOM 46 P C D 5 141.186 113.617 100.928 1.00184.62 P
+ATOM 47 OP1 C D 5 142.069 113.202 102.043 1.00184.62 O
+ATOM 48 OP2 C D 5 140.165 112.679 100.401 1.00184.62 O
+ATOM 49 O5' C D 5 140.456 114.956 101.370 1.00184.62 O
+ATOM 50 C5' C D 5 141.197 116.151 101.529 1.00184.62 C
+ATOM 51 C4' C D 5 140.466 117.326 100.940 1.00184.62 C
+ATOM 52 O4' C D 5 140.166 117.067 99.545 1.00184.62 O
+ATOM 53 C3' C D 5 139.115 117.634 101.555 1.00184.62 C
+ATOM 54 O3' C D 5 139.223 118.385 102.750 1.00184.62 O
+ATOM 55 C2' C D 5 138.407 118.385 100.442 1.00184.62 C
+ATOM 56 O2' C D 5 138.829 119.737 100.416 1.00184.62 O
+ATOM 57 C1' C D 5 138.952 117.691 99.193 1.00184.62 C
+ATOM 58 N1 C D 5 138.027 116.673 98.645 1.00184.62 N
+ATOM 59 C2 C D 5 136.908 117.059 97.900 1.00184.62 C
+ATOM 60 O2 C D 5 136.669 118.254 97.698 1.00184.62 O
+ATOM 61 N3 C D 5 136.086 116.113 97.405 1.00184.62 N
+ATOM 62 C4 C D 5 136.345 114.829 97.619 1.00184.62 C
+ATOM 63 N4 C D 5 135.506 113.930 97.110 1.00184.62 N
+ATOM 64 C5 C D 5 137.475 114.402 98.362 1.00184.62 C
+ATOM 65 C6 C D 5 138.279 115.349 98.846 1.00184.62 C
+ATOM 66 P A D 6 138.178 118.158 103.945 1.00162.68 P
+ATOM 67 OP1 A D 6 138.488 119.118 105.029 1.00162.68 O
+ATOM 68 OP2 A D 6 138.105 116.708 104.237 1.00162.68 O
+ATOM 69 O5' A D 6 136.784 118.582 103.318 1.00162.68 O
+ATOM 70 C5' A D 6 135.584 117.979 103.768 1.00162.68 C
+ATOM 71 C4' A D 6 134.452 118.242 102.813 1.00162.68 C
+ATOM 72 O4' A D 6 134.714 117.593 101.546 1.00162.68 O
+ATOM 73 C3' A D 6 133.105 117.695 103.227 1.00162.68 C
+ATOM 74 O3' A D 6 132.465 118.519 104.172 1.00162.68 O
+ATOM 75 C2' A D 6 132.373 117.599 101.903 1.00162.68 C
+ATOM 76 O2' A D 6 131.949 118.888 101.494 1.00162.68 O
+ATOM 77 C1' A D 6 133.500 117.165 100.971 1.00162.68 C
+ATOM 78 N9 A D 6 133.554 115.705 100.779 1.00162.68 N
+ATOM 79 C8 A D 6 134.005 114.720 101.618 1.00162.68 C
+ATOM 80 N7 A D 6 133.914 113.513 101.120 1.00162.68 N
+ATOM 81 C5 A D 6 133.372 113.725 99.869 1.00162.68 C
+ATOM 82 C6 A D 6 133.019 112.866 98.826 1.00162.68 C
+ATOM 83 N6 A D 6 133.172 111.548 98.866 1.00162.68 N
+ATOM 84 N1 A D 6 132.500 113.403 97.706 1.00162.68 N
+ATOM 85 C2 A D 6 132.340 114.726 97.639 1.00162.68 C
+ATOM 86 N3 A D 6 132.635 115.637 98.557 1.00162.68 N
+ATOM 87 C4 A D 6 133.147 115.063 99.651 1.00162.68 C
+ATOM 88 P U D 7 132.040 117.896 105.579 1.00154.12 P
+ATOM 89 OP1 U D 7 132.518 118.771 106.673 1.00154.12 O
+ATOM 90 OP2 U D 7 132.461 116.480 105.543 1.00154.12 O
+ATOM 91 O5' U D 7 130.456 117.957 105.537 1.00154.12 O
+ATOM 92 C5' U D 7 129.791 119.131 105.102 1.00154.12 C
+ATOM 93 C4' U D 7 128.415 119.201 105.694 1.00154.12 C
+ATOM 94 O4' U D 7 128.289 118.118 106.645 1.00154.12 O
+ATOM 95 C3' U D 7 128.137 120.463 106.494 1.00154.12 C
+ATOM 96 O3' U D 7 127.607 121.495 105.687 1.00154.12 O
+ATOM 97 C2' U D 7 127.189 119.995 107.581 1.00154.12 C
+ATOM 98 O2' U D 7 125.862 119.951 107.091 1.00154.12 O
+ATOM 99 C1' U D 7 127.663 118.564 107.817 1.00154.12 C
+ATOM 100 N1 U D 7 128.637 118.456 108.924 1.00154.12 N
+ATOM 101 C2 U D 7 128.116 118.364 110.184 1.00154.12 C
+ATOM 102 O2 U D 7 126.924 118.382 110.410 1.00154.12 O
+ATOM 103 N3 U D 7 129.031 118.253 111.185 1.00154.12 N
+ATOM 104 C4 U D 7 130.387 118.227 111.046 1.00154.12 C
+ATOM 105 O4 U D 7 131.065 118.124 112.051 1.00154.12 O
+ATOM 106 C5 U D 7 130.871 118.313 109.716 1.00154.12 C
+ATOM 107 C6 U D 7 129.995 118.414 108.729 1.00154.12 C
+ATOM 108 P A D 8 127.845 123.029 106.079 1.00162.36 P
+ATOM 109 OP1 A D 8 128.762 123.066 107.238 1.00162.36 O
+ATOM 110 OP2 A D 8 126.514 123.665 106.188 1.00162.36 O
+ATOM 111 O5' A D 8 128.596 123.647 104.818 1.00162.36 O
+ATOM 112 C5' A D 8 129.469 122.856 104.018 1.00162.36 C
+ATOM 113 C4' A D 8 130.903 123.289 104.166 1.00162.36 C
+ATOM 114 O4' A D 8 131.765 122.376 103.438 1.00162.36 O
+ATOM 115 C3' A D 8 131.216 124.674 103.625 1.00162.36 C
+ATOM 116 O3' A D 8 132.267 125.222 104.396 1.00162.36 O
+ATOM 117 C2' A D 8 131.745 124.374 102.239 1.00162.36 C
+ATOM 118 O2' A D 8 132.569 125.381 101.705 1.00162.36 O
+ATOM 119 C1' A D 8 132.524 123.100 102.495 1.00162.36 C
+ATOM 120 N9 A D 8 132.683 122.280 101.305 1.00162.36 N
+ATOM 121 C8 A D 8 131.753 121.980 100.350 1.00162.36 C
+ATOM 122 N7 A D 8 132.215 121.220 99.393 1.00162.36 N
+ATOM 123 C5 A D 8 133.535 121.021 99.754 1.00162.36 C
+ATOM 124 C6 A D 8 134.581 120.305 99.167 1.00162.36 C
+ATOM 125 N6 A D 8 134.470 119.616 98.029 1.00162.36 N
+ATOM 126 N1 A D 8 135.772 120.323 99.797 1.00162.36 N
+ATOM 127 C2 A D 8 135.910 121.004 100.939 1.00162.36 C
+ATOM 128 N3 A D 8 134.998 121.715 101.585 1.00162.36 N
+ATOM 129 C4 A D 8 133.832 121.669 100.931 1.00162.36 C
+ATOM 130 P C D 9 131.911 125.888 105.798 1.00139.38 P
+ATOM 131 OP1 C D 9 132.405 124.986 106.866 1.00139.38 O
+ATOM 132 OP2 C D 9 130.474 126.227 105.722 1.00139.38 O
+ATOM 133 O5' C D 9 132.723 127.255 105.802 1.00139.38 O
+ATOM 134 C5' C D 9 132.243 128.379 106.518 1.00139.38 C
+ATOM 135 C4' C D 9 132.212 129.610 105.658 1.00139.38 C
+ATOM 136 O4' C D 9 131.567 129.303 104.398 1.00139.38 O
+ATOM 137 C3' C D 9 131.410 130.764 106.224 1.00139.38 C
+ATOM 138 O3' C D 9 132.173 131.552 107.113 1.00139.38 O
+ATOM 139 C2' C D 9 130.990 131.526 104.987 1.00139.38 C
+ATOM 140 O2' C D 9 132.051 132.356 104.557 1.00139.38 O
+ATOM 141 C1' C D 9 130.806 130.403 103.964 1.00139.38 C
+ATOM 142 N1 C D 9 129.401 129.955 103.797 1.00139.38 N
+ATOM 143 C2 C D 9 128.355 130.765 103.328 1.00139.38 C
+ATOM 144 O2 C D 9 128.508 131.957 103.054 1.00139.38 O
+ATOM 145 N3 C D 9 127.119 130.251 103.208 1.00139.38 N
+ATOM 146 C4 C D 9 126.895 128.982 103.490 1.00139.38 C
+ATOM 147 N4 C D 9 125.659 128.525 103.343 1.00139.38 N
+ATOM 148 C5 C D 9 127.920 128.123 103.935 1.00139.38 C
+ATOM 149 C6 C D 9 129.134 128.650 104.059 1.00139.38 C
+ATOM 150 P U D 10 131.535 132.049 108.493 1.00134.04 P
+ATOM 151 OP1 U D 10 132.554 132.809 109.242 1.00134.04 O
+ATOM 152 OP2 U D 10 130.878 130.894 109.130 1.00134.04 O
+ATOM 153 O5' U D 10 130.406 133.065 108.043 1.00134.04 O
+ATOM 154 C5' U D 10 130.753 134.281 107.414 1.00134.04 C
+ATOM 155 C4' U D 10 129.532 134.992 106.907 1.00134.04 C
+ATOM 156 O4' U D 10 128.895 134.204 105.877 1.00134.04 O
+ATOM 157 C3' U D 10 128.439 135.214 107.928 1.00134.04 C
+ATOM 158 O3' U D 10 128.699 136.347 108.730 1.00134.04 O
+ATOM 159 C2' U D 10 127.204 135.367 107.060 1.00134.04 C
+ATOM 160 O2' U D 10 127.149 136.677 106.527 1.00134.04 O
+ATOM 161 C1' U D 10 127.503 134.407 105.913 1.00134.04 C
+ATOM 162 N1 U D 10 126.841 133.092 106.044 1.00134.04 N
+ATOM 163 C2 U D 10 125.492 132.949 105.790 1.00134.04 C
+ATOM 164 O2 U D 10 124.748 133.857 105.491 1.00134.04 O
+ATOM 165 N3 U D 10 125.007 131.681 105.923 1.00134.04 N
+ATOM 166 C4 U D 10 125.725 130.560 106.252 1.00134.04 C
+ATOM 167 O4 U D 10 125.161 129.477 106.331 1.00134.04 O
+ATOM 168 C5 U D 10 127.106 130.780 106.482 1.00134.04 C
+ATOM 169 C6 U D 10 127.600 132.006 106.366 1.00134.04 C
+ATOM 170 P U D 11 128.280 136.362 110.273 1.00129.03 P
+ATOM 171 OP1 U D 11 129.014 137.445 110.956 1.00129.03 O
+ATOM 172 OP2 U D 11 128.359 134.981 110.790 1.00129.03 O
+ATOM 173 O5' U D 11 126.766 136.817 110.228 1.00129.03 O
+ATOM 174 C5' U D 11 126.404 137.980 109.512 1.00129.03 C
+ATOM 175 C4' U D 11 124.925 138.019 109.271 1.00129.03 C
+ATOM 176 O4' U D 11 124.538 136.984 108.346 1.00129.03 O
+ATOM 177 C3' U D 11 124.070 137.761 110.485 1.00129.03 C
+ATOM 178 O3' U D 11 123.958 138.927 111.268 1.00129.03 O
+ATOM 179 C2' U D 11 122.751 137.293 109.882 1.00129.03 C
+ATOM 180 O2' U D 11 121.956 138.406 109.520 1.00129.03 O
+ATOM 181 C1' U D 11 123.211 136.592 108.600 1.00129.03 C
+ATOM 182 N1 U D 11 123.164 135.120 108.659 1.00129.03 N
+ATOM 183 C2 U D 11 121.988 134.464 108.400 1.00129.03 C
+ATOM 184 O2 U D 11 120.958 135.043 108.150 1.00129.03 O
+ATOM 185 N3 U D 11 122.072 133.101 108.464 1.00129.03 N
+ATOM 186 C4 U D 11 123.190 132.357 108.734 1.00129.03 C
+ATOM 187 O4 U D 11 123.135 131.135 108.760 1.00129.03 O
+ATOM 188 C5 U D 11 124.365 133.111 108.972 1.00129.03 C
+ATOM 189 C6 U D 11 124.307 134.432 108.918 1.00129.03 C
+ATOM 190 P G D 12 123.457 138.842 112.780 1.00128.15 P
+ATOM 191 OP1 G D 12 123.943 140.037 113.496 1.00128.15 O
+ATOM 192 OP2 G D 12 123.772 137.503 113.313 1.00128.15 O
+ATOM 193 O5' G D 12 121.885 138.963 112.639 1.00128.15 O
+ATOM 194 C5' G D 12 121.042 138.543 113.687 1.00128.15 C
+ATOM 195 C4' G D 12 119.731 138.044 113.154 1.00128.15 C
+ATOM 196 O4' G D 12 119.958 137.217 111.991 1.00128.15 O
+ATOM 197 C3' G D 12 118.947 137.158 114.097 1.00128.15 C
+ATOM 198 O3' G D 12 118.184 137.917 115.007 1.00128.15 O
+ATOM 199 C2' G D 12 118.088 136.333 113.154 1.00128.15 C
+ATOM 200 O2' G D 12 116.954 137.079 112.749 1.00128.15 O
+ATOM 201 C1' G D 12 119.003 136.183 111.944 1.00128.15 C
+ATOM 202 N9 G D 12 119.710 134.891 111.876 1.00128.15 N
+ATOM 203 C8 G D 12 121.060 134.786 111.704 1.00128.15 C
+ATOM 204 N7 G D 12 121.490 133.567 111.627 1.00128.15 N
+ATOM 205 C5 G D 12 120.344 132.819 111.739 1.00128.15 C
+ATOM 206 C6 G D 12 120.213 131.429 111.731 1.00128.15 C
+ATOM 207 O6 G D 12 121.103 130.597 111.604 1.00128.15 O
+ATOM 208 N1 G D 12 118.903 131.023 111.859 1.00128.15 N
+ATOM 209 C2 G D 12 117.852 131.875 111.997 1.00128.15 C
+ATOM 210 N2 G D 12 116.646 131.314 112.119 1.00128.15 N
+ATOM 211 N3 G D 12 117.961 133.189 112.016 1.00128.15 N
+ATOM 212 C4 G D 12 119.231 133.601 111.880 1.00128.15 C
+ATOM 213 P U D 13 118.745 138.233 116.471 1.00137.94 P
+ATOM 214 OP1 U D 13 118.731 139.701 116.661 1.00137.94 O
+ATOM 215 OP2 U D 13 120.011 137.492 116.659 1.00137.94 O
+ATOM 216 O5' U D 13 117.654 137.583 117.421 1.00137.94 O
+ATOM 217 C5' U D 13 117.741 136.217 117.771 1.00137.94 C
+ATOM 218 C4' U D 13 116.566 135.422 117.270 1.00137.94 C
+ATOM 219 O4' U D 13 116.836 134.949 115.922 1.00137.94 O
+ATOM 220 C3' U D 13 116.303 134.160 118.062 1.00137.94 C
+ATOM 221 O3' U D 13 115.514 134.404 119.205 1.00137.94 O
+ATOM 222 C2' U D 13 115.642 133.244 117.048 1.00137.94 C
+ATOM 223 O2' U D 13 114.281 133.595 116.874 1.00137.94 O
+ATOM 224 C1' U D 13 116.392 133.616 115.773 1.00137.94 C
+ATOM 225 N1 U D 13 117.584 132.764 115.503 1.00137.94 N
+ATOM 226 C2 U D 13 117.556 131.380 115.333 1.00137.94 C
+ATOM 227 O2 U D 13 116.570 130.673 115.409 1.00137.94 O
+ATOM 228 N3 U D 13 118.778 130.805 115.070 1.00137.94 N
+ATOM 229 C4 U D 13 119.987 131.448 114.953 1.00137.94 C
+ATOM 230 O4 U D 13 121.016 130.833 114.707 1.00137.94 O
+ATOM 231 C5 U D 13 119.938 132.848 115.115 1.00137.94 C
+ATOM 232 C6 U D 13 118.776 133.424 115.361 1.00137.94 C
+ATOM 233 P U D 14 116.122 134.134 120.660 1.00140.32 P
+ATOM 234 OP1 U D 14 115.141 134.620 121.654 1.00140.32 O
+ATOM 235 OP2 U D 14 117.506 134.656 120.689 1.00140.32 O
+ATOM 236 O5' U D 14 116.169 132.548 120.750 1.00140.32 O
+ATOM 237 C5' U D 14 114.995 131.786 120.527 1.00140.32 C
+ATOM 238 C4' U D 14 115.319 130.369 120.145 1.00140.32 C
+ATOM 239 O4' U D 14 116.004 130.331 118.867 1.00140.32 O
+ATOM 240 C3' U D 14 116.259 129.638 121.077 1.00140.32 C
+ATOM 241 O3' U D 14 115.614 129.184 122.247 1.00140.32 O
+ATOM 242 C2' U D 14 116.786 128.523 120.190 1.00140.32 C
+ATOM 243 O2' U D 14 115.814 127.500 120.063 1.00140.32 O
+ATOM 244 C1' U D 14 116.898 129.239 118.841 1.00140.32 C
+ATOM 245 N1 U D 14 118.263 129.746 118.575 1.00140.32 N
+ATOM 246 C2 U D 14 119.237 128.863 118.157 1.00140.32 C
+ATOM 247 O2 U D 14 119.057 127.673 117.994 1.00140.32 O
+ATOM 248 N3 U D 14 120.465 129.413 117.932 1.00140.32 N
+ATOM 249 C4 U D 14 120.820 130.727 118.074 1.00140.32 C
+ATOM 250 O4 U D 14 121.970 131.074 117.829 1.00140.32 O
+ATOM 251 C5 U D 14 119.766 131.580 118.496 1.00140.32 C
+ATOM 252 C6 U D 14 118.558 131.075 118.721 1.00140.32 C
+ATOM 253 P C D 15 116.436 129.056 123.616 1.00134.58 P
+ATOM 254 OP1 C D 15 115.588 129.574 124.709 1.00134.58 O
+ATOM 255 OP2 C D 15 117.778 129.643 123.407 1.00134.58 O
+ATOM 256 O5' C D 15 116.574 127.485 123.807 1.00134.58 O
+ATOM 257 C5' C D 15 115.455 126.647 123.571 1.00134.58 C
+ATOM 258 C4' C D 15 115.833 125.190 123.553 1.00134.58 C
+ATOM 259 O4' C D 15 116.412 124.831 122.270 1.00134.58 O
+ATOM 260 C3' C D 15 116.882 124.761 124.556 1.00134.58 C
+ATOM 261 O3' C D 15 116.361 124.583 125.859 1.00134.58 O
+ATOM 262 C2' C D 15 117.406 123.479 123.940 1.00134.58 C
+ATOM 263 O2' C D 15 116.495 122.417 124.161 1.00134.58 O
+ATOM 264 C1' C D 15 117.379 123.820 122.450 1.00134.58 C
+ATOM 265 N1 C D 15 118.681 124.322 121.952 1.00134.58 N
+ATOM 266 C2 C D 15 119.821 123.516 121.884 1.00134.58 C
+ATOM 267 O2 C D 15 119.827 122.336 122.247 1.00134.58 O
+ATOM 268 N3 C D 15 120.963 124.041 121.426 1.00134.58 N
+ATOM 269 C4 C D 15 121.012 125.290 121.012 1.00134.58 C
+ATOM 270 N4 C D 15 122.181 125.733 120.575 1.00134.58 N
+ATOM 271 C5 C D 15 119.887 126.141 121.050 1.00134.58 C
+ATOM 272 C6 C D 15 118.760 125.610 121.519 1.00134.58 C
+ATOM 273 P C D 16 117.191 125.096 127.132 1.00126.93 P
+ATOM 274 OP1 C D 16 116.432 124.752 128.354 1.00126.93 O
+ATOM 275 OP2 C D 16 117.565 126.507 126.885 1.00126.93 O
+ATOM 276 O5' C D 16 118.509 124.204 127.121 1.00126.93 O
+ATOM 277 C5' C D 16 118.448 122.809 127.378 1.00126.93 C
+ATOM 278 C4' C D 16 119.745 122.132 127.026 1.00126.93 C
+ATOM 279 O4' C D 16 120.092 122.418 125.649 1.00126.93 O
+ATOM 280 C3' C D 16 120.948 122.593 127.819 1.00126.93 C
+ATOM 281 O3' C D 16 121.043 121.914 129.052 1.00126.93 O
+ATOM 282 C2' C D 16 122.112 122.301 126.889 1.00126.93 C
+ATOM 283 O2' C D 16 122.497 120.943 126.989 1.00126.93 O
+ATOM 284 C1' C D 16 121.489 122.529 125.512 1.00126.93 C
+ATOM 285 N1 C D 16 121.793 123.860 124.940 1.00126.93 N
+ATOM 286 C2 C D 16 123.057 124.193 124.451 1.00126.93 C
+ATOM 287 O2 C D 16 123.977 123.374 124.523 1.00126.93 O
+ATOM 288 N3 C D 16 123.263 125.420 123.941 1.00126.93 N
+ATOM 289 C4 C D 16 122.269 126.284 123.879 1.00126.93 C
+ATOM 290 N4 C D 16 122.493 127.482 123.359 1.00126.93 N
+ATOM 291 C5 C D 16 120.976 125.972 124.341 1.00126.93 C
+ATOM 292 C6 C D 16 120.788 124.764 124.851 1.00126.93 C
+ATOM 293 P A D 17 122.022 122.456 130.193 1.00125.92 P
+ATOM 294 OP1 A D 17 121.604 121.897 131.495 1.00125.92 O
+ATOM 295 OP2 A D 17 122.136 123.920 130.035 1.00125.92 O
+ATOM 296 O5' A D 17 123.422 121.814 129.821 1.00125.92 O
+ATOM 297 C5' A D 17 124.601 122.337 130.388 1.00125.92 C
+ATOM 298 C4' A D 17 125.776 122.210 129.462 1.00125.92 C
+ATOM 299 O4' A D 17 125.401 122.543 128.103 1.00125.92 O
+ATOM 300 C3' A D 17 126.909 123.155 129.774 1.00125.92 C
+ATOM 301 O3' A D 17 127.713 122.667 130.819 1.00125.92 O
+ATOM 302 C2' A D 17 127.634 123.268 128.447 1.00125.92 C
+ATOM 303 O2' A D 17 128.459 122.137 128.240 1.00125.92 O
+ATOM 304 C1' A D 17 126.472 123.201 127.460 1.00125.92 C
+ATOM 305 N9 A D 17 125.998 124.531 127.038 1.00125.92 N
+ATOM 306 C8 A D 17 124.703 124.943 127.147 1.00125.92 C
+ATOM 307 N7 A D 17 124.472 126.144 126.698 1.00125.92 N
+ATOM 308 C5 A D 17 125.710 126.555 126.268 1.00125.92 C
+ATOM 309 C6 A D 17 126.128 127.752 125.696 1.00125.92 C
+ATOM 310 N6 A D 17 125.298 128.760 125.465 1.00125.92 N
+ATOM 311 N1 A D 17 127.419 127.895 125.359 1.00125.92 N
+ATOM 312 C2 A D 17 128.239 126.878 125.612 1.00125.92 C
+ATOM 313 N3 A D 17 127.958 125.690 126.137 1.00125.92 N
+ATOM 314 C4 A D 17 126.664 125.583 126.454 1.00125.92 C
+ATOM 315 P C D 18 127.904 123.542 132.139 1.00129.97 P
+ATOM 316 OP1 C D 18 127.975 122.619 133.294 1.00129.97 O
+ATOM 317 OP2 C D 18 126.883 124.609 132.119 1.00129.97 O
+ATOM 318 O5' C D 18 129.333 124.184 131.914 1.00129.97 O
+ATOM 319 C5' C D 18 130.304 123.464 131.179 1.00129.97 C
+ATOM 320 C4' C D 18 131.191 124.385 130.397 1.00129.97 C
+ATOM 321 O4' C D 18 130.511 124.869 129.215 1.00129.97 O
+ATOM 322 C3' C D 18 131.601 125.647 131.114 1.00129.97 C
+ATOM 323 O3' C D 18 132.636 125.424 132.044 1.00129.97 O
+ATOM 324 C2' C D 18 132.001 126.552 129.967 1.00129.97 C
+ATOM 325 O2' C D 18 133.292 126.205 129.500 1.00129.97 O
+ATOM 326 C1' C D 18 130.986 126.158 128.895 1.00129.97 C
+ATOM 327 N1 C D 18 129.834 127.086 128.786 1.00129.97 N
+ATOM 328 C2 C D 18 129.956 128.351 128.193 1.00129.97 C
+ATOM 329 O2 C D 18 131.052 128.728 127.771 1.00129.97 O
+ATOM 330 N3 C D 18 128.875 129.149 128.089 1.00129.97 N
+ATOM 331 C4 C D 18 127.707 128.726 128.541 1.00129.97 C
+ATOM 332 N4 C D 18 126.661 129.534 128.427 1.00129.97 N
+ATOM 333 C5 C D 18 127.543 127.455 129.135 1.00129.97 C
+ATOM 334 C6 C D 18 128.618 126.676 129.227 1.00129.97 C
+ATOM 335 P U D 19 132.611 126.183 133.451 1.00141.19 P
+ATOM 336 OP1 U D 19 133.561 125.490 134.348 1.00141.19 O
+ATOM 337 OP2 U D 19 131.198 126.340 133.860 1.00141.19 O
+ATOM 338 O5' U D 19 133.207 127.615 133.109 1.00141.19 O
+ATOM 339 C5' U D 19 134.503 127.725 132.548 1.00141.19 C
+ATOM 340 C4' U D 19 134.707 129.045 131.857 1.00141.19 C
+ATOM 341 O4' U D 19 133.787 129.179 130.750 1.00141.19 O
+ATOM 342 C3' U D 19 134.454 130.283 132.689 1.00141.19 C
+ATOM 343 O3' U D 19 135.543 130.590 133.532 1.00141.19 O
+ATOM 344 C2' U D 19 134.196 131.342 131.630 1.00141.19 C
+ATOM 345 O2' U D 19 135.416 131.803 131.081 1.00141.19 O
+ATOM 346 C1' U D 19 133.469 130.537 130.556 1.00141.19 C
+ATOM 347 N1 U D 19 132.002 130.702 130.608 1.00141.19 N
+ATOM 348 C2 U D 19 131.454 131.809 130.014 1.00141.19 C
+ATOM 349 O2 U D 19 132.113 132.663 129.464 1.00141.19 O
+ATOM 350 N3 U D 19 130.097 131.902 130.101 1.00141.19 N
+ATOM 351 C4 U D 19 129.237 131.008 130.680 1.00141.19 C
+ATOM 352 O4 U D 19 128.030 131.225 130.667 1.00141.19 O
+ATOM 353 C5 U D 19 129.872 129.883 131.264 1.00141.19 C
+ATOM 354 C6 U D 19 131.197 129.777 131.208 1.00141.19 C
+ATOM 355 P C D 20 135.285 131.276 134.953 1.00153.40 P
+ATOM 356 OP1 C D 20 136.577 131.306 135.673 1.00153.40 O
+ATOM 357 OP2 C D 20 134.112 130.615 135.567 1.00153.40 O
+ATOM 358 O5' C D 20 134.898 132.776 134.590 1.00153.40 O
+ATOM 359 C5' C D 20 135.859 133.648 134.016 1.00153.40 C
+ATOM 360 C4' C D 20 135.273 135.001 133.717 1.00153.40 C
+ATOM 361 O4' C D 20 134.241 134.871 132.711 1.00153.40 O
+ATOM 362 C3' C D 20 134.584 135.681 134.884 1.00153.40 C
+ATOM 363 O3' C D 20 135.504 136.377 135.710 1.00153.40 O
+ATOM 364 C2' C D 20 133.578 136.595 134.204 1.00153.40 C
+ATOM 365 O2' C D 20 134.209 137.792 133.788 1.00153.40 O
+ATOM 366 C1' C D 20 133.209 135.795 132.953 1.00153.40 C
+ATOM 367 N1 C D 20 131.932 135.050 133.061 1.00153.40 N
+ATOM 368 C2 C D 20 130.683 135.677 133.054 1.00153.40 C
+ATOM 369 O2 C D 20 130.589 136.908 133.007 1.00153.40 O
+ATOM 370 N3 C D 20 129.564 134.927 133.135 1.00153.40 N
+ATOM 371 C4 C D 20 129.646 133.604 133.189 1.00153.40 C
+ATOM 372 N4 C D 20 128.521 132.900 133.260 1.00153.40 N
+ATOM 373 C5 C D 20 130.891 132.935 133.173 1.00153.40 C
+ATOM 374 C6 C D 20 131.982 133.694 133.101 1.00153.40 C
+ATOM 375 P U D 21 135.177 136.632 137.260 1.00159.88 P
+ATOM 376 OP1 U D 21 136.376 137.137 137.965 1.00159.88 O
+ATOM 377 OP2 U D 21 134.479 135.436 137.774 1.00159.88 O
+ATOM 378 O5' U D 21 134.158 137.846 137.232 1.00159.88 O
+ATOM 379 C5' U D 21 134.587 139.119 136.789 1.00159.88 C
+ATOM 380 C4' U D 21 133.458 140.108 136.807 1.00159.88 C
+ATOM 381 O4' U D 21 132.459 139.738 135.829 1.00159.88 O
+ATOM 382 C3' U D 21 132.683 140.197 138.104 1.00159.88 C
+ATOM 383 O3' U D 21 133.338 140.998 139.071 1.00159.88 O
+ATOM 384 C2' U D 21 131.356 140.778 137.651 1.00159.88 C
+ATOM 385 O2' U D 21 131.467 142.180 137.481 1.00159.88 O
+ATOM 386 C1' U D 21 131.188 140.147 136.272 1.00159.88 C
+ATOM 387 N1 U D 21 130.274 138.985 136.270 1.00159.88 N
+ATOM 388 C2 U D 21 128.916 139.216 136.300 1.00159.88 C
+ATOM 389 O2 U D 21 128.421 140.321 136.336 1.00159.88 O
+ATOM 390 N3 U D 21 128.125 138.106 136.269 1.00159.88 N
+ATOM 391 C4 U D 21 128.545 136.805 136.216 1.00159.88 C
+ATOM 392 O4 U D 21 127.711 135.906 136.200 1.00159.88 O
+ATOM 393 C5 U D 21 129.960 136.643 136.190 1.00159.88 C
+ATOM 394 C6 U D 21 130.757 137.711 136.208 1.00159.88 C
+ATOM 395 P A D 22 133.163 140.694 140.637 1.00165.66 P
+ATOM 396 OP1 A D 22 134.022 141.638 141.387 1.00165.66 O
+ATOM 397 OP2 A D 22 133.325 139.238 140.841 1.00165.66 O
+ATOM 398 O5' A D 22 131.657 141.102 140.946 1.00165.66 O
+ATOM 399 C5' A D 22 131.281 142.468 140.956 1.00165.66 C
+ATOM 400 C4' A D 22 129.786 142.648 140.910 1.00165.66 C
+ATOM 401 O4' A D 22 129.225 141.962 139.767 1.00165.66 O
+ATOM 402 C3' A D 22 129.006 142.093 142.081 1.00165.66 C
+ATOM 403 O3' A D 22 129.052 142.944 143.206 1.00165.66 O
+ATOM 404 C2' A D 22 127.612 141.947 141.498 1.00165.66 C
+ATOM 405 O2' A D 22 126.965 143.207 141.447 1.00165.66 O
+ATOM 406 C1' A D 22 127.923 141.518 140.068 1.00165.66 C
+ATOM 407 N9 A D 22 127.862 140.056 139.904 1.00165.66 N
+ATOM 408 C8 A D 22 128.897 139.167 139.824 1.00165.66 C
+ATOM 409 N7 A D 22 128.513 137.923 139.675 1.00165.66 N
+ATOM 410 C5 A D 22 127.136 138.004 139.659 1.00165.66 C
+ATOM 411 C6 A D 22 126.131 137.042 139.531 1.00165.66 C
+ATOM 412 N6 A D 22 126.371 135.745 139.389 1.00165.66 N
+ATOM 413 N1 A D 22 124.850 137.450 139.553 1.00165.66 N
+ATOM 414 C2 A D 22 124.602 138.753 139.693 1.00165.66 C
+ATOM 415 N3 A D 22 125.463 139.754 139.825 1.00165.66 N
+ATOM 416 C4 A D 22 126.723 139.308 139.799 1.00165.66 C
+ATOM 417 P G D 23 129.128 142.318 144.677 1.00175.77 P
+ATOM 418 OP1 G D 23 129.789 143.301 145.567 1.00175.77 O
+ATOM 419 OP2 G D 23 129.670 140.946 144.541 1.00175.77 O
+ATOM 420 O5' G D 23 127.606 142.185 145.115 1.00175.77 O
+ATOM 421 C5' G D 23 126.943 140.937 145.037 1.00175.77 C
+ATOM 422 C4' G D 23 125.523 141.096 144.565 1.00175.77 C
+ATOM 423 O4' G D 23 125.355 140.436 143.289 1.00175.77 O
+ATOM 424 C3' G D 23 124.469 140.468 145.452 1.00175.77 C
+ATOM 425 O3' G D 23 124.115 141.316 146.525 1.00175.77 O
+ATOM 426 C2' G D 23 123.324 140.210 144.487 1.00175.77 C
+ATOM 427 O2' G D 23 122.588 141.402 144.268 1.00175.77 O
+ATOM 428 C1' G D 23 124.072 139.868 143.202 1.00175.77 C
+ATOM 429 N9 G D 23 124.234 138.421 142.974 1.00175.77 N
+ATOM 430 C8 G D 23 125.437 137.778 142.866 1.00175.77 C
+ATOM 431 N7 G D 23 125.325 136.505 142.635 1.00175.77 N
+ATOM 432 C5 G D 23 123.962 136.298 142.573 1.00175.77 C
+ATOM 433 C6 G D 23 123.240 135.107 142.346 1.00175.77 C
+ATOM 434 O6 G D 23 123.679 133.974 142.151 1.00175.77 O
+ATOM 435 N1 G D 23 121.876 135.319 142.356 1.00175.77 N
+ATOM 436 C2 G D 23 121.286 136.532 142.567 1.00175.77 C
+ATOM 437 N2 G D 23 119.953 136.557 142.540 1.00175.77 N
+ATOM 438 N3 G D 23 121.943 137.654 142.778 1.00175.77 N
+ATOM 439 C4 G D 23 123.272 137.468 142.765 1.00175.77 C
+ATOM 440 P C D 24 124.042 140.739 148.016 1.00178.32 P
+ATOM 441 OP1 C D 24 124.202 141.871 148.959 1.00178.32 O
+ATOM 442 OP2 C D 24 124.954 139.576 148.098 1.00178.32 O
+ATOM 443 O5' C D 24 122.549 140.212 148.133 1.00178.32 O
+ATOM 444 C5' C D 24 121.478 140.986 147.627 1.00178.32 C
+ATOM 445 C4' C D 24 120.283 140.124 147.340 1.00178.32 C
+ATOM 446 O4' C D 24 120.504 139.346 146.137 1.00178.32 O
+ATOM 447 C3' C D 24 119.979 139.083 148.394 1.00178.32 C
+ATOM 448 O3' C D 24 119.296 139.628 149.504 1.00178.32 O
+ATOM 449 C2' C D 24 119.175 138.050 147.619 1.00178.32 C
+ATOM 450 O2' C D 24 117.829 138.470 147.475 1.00178.32 O
+ATOM 451 C1' C D 24 119.845 138.103 146.244 1.00178.32 C
+ATOM 452 N1 C D 24 120.832 137.018 146.035 1.00178.32 N
+ATOM 453 C2 C D 24 120.407 135.733 145.697 1.00178.32 C
+ATOM 454 O2 C D 24 119.196 135.504 145.601 1.00178.32 O
+ATOM 455 N3 C D 24 121.329 134.768 145.507 1.00178.32 N
+ATOM 456 C4 C D 24 122.622 135.047 145.626 1.00178.32 C
+ATOM 457 N4 C D 24 123.509 134.078 145.430 1.00178.32 N
+ATOM 458 C5 C D 24 123.084 136.343 145.954 1.00178.32 C
+ATOM 459 C6 C D 24 122.161 137.281 146.142 1.00178.32 C
+ATOM 460 P A D 25 119.481 138.973 150.954 1.00179.52 P
+ATOM 461 OP1 A D 25 119.458 140.056 151.963 1.00179.52 O
+ATOM 462 OP2 A D 25 120.642 138.052 150.901 1.00179.52 O
+ATOM 463 O5' A D 25 118.167 138.097 151.131 1.00179.52 O
+ATOM 464 C5' A D 25 118.235 136.809 151.712 1.00179.52 C
+ATOM 465 C4' A D 25 117.395 135.821 150.951 1.00179.52 C
+ATOM 466 O4' A D 25 117.822 135.771 149.565 1.00179.52 O
+ATOM 467 C3' A D 25 117.506 134.387 151.423 1.00179.52 C
+ATOM 468 O3' A D 25 116.688 134.124 152.543 1.00179.52 O
+ATOM 469 C2' A D 25 117.119 133.601 150.185 1.00179.52 C
+ATOM 470 O2' A D 25 115.715 133.641 149.994 1.00179.52 O
+ATOM 471 C1' A D 25 117.769 134.443 149.093 1.00179.52 C
+ATOM 472 N9 A D 25 119.152 134.020 148.794 1.00179.52 N
+ATOM 473 C8 A D 25 120.235 134.855 148.822 1.00179.52 C
+ATOM 474 N7 A D 25 121.366 134.286 148.509 1.00179.52 N
+ATOM 475 C5 A D 25 121.005 132.980 148.259 1.00179.52 C
+ATOM 476 C6 A D 25 121.760 131.866 147.887 1.00179.52 C
+ATOM 477 N6 A D 25 123.072 131.925 147.696 1.00179.52 N
+ATOM 478 N1 A D 25 121.133 130.687 147.711 1.00179.52 N
+ATOM 479 C2 A D 25 119.811 130.643 147.893 1.00179.52 C
+ATOM 480 N3 A D 25 118.993 131.628 148.253 1.00179.52 N
+ATOM 481 C4 A D 25 119.649 132.790 148.418 1.00179.52 C
+ATOM 482 P G D 26 117.361 133.749 153.948 1.00177.11 P
+ATOM 483 OP1 G D 26 116.878 134.707 154.970 1.00177.11 O
+ATOM 484 OP2 G D 26 118.815 133.567 153.723 1.00177.11 O
+ATOM 485 O5' G D 26 116.747 132.324 154.274 1.00177.11 O
+ATOM 486 C5' G D 26 115.738 131.780 153.444 1.00177.11 C
+ATOM 487 C4' G D 26 116.161 130.447 152.900 1.00177.11 C
+ATOM 488 O4' G D 26 116.934 130.616 151.691 1.00177.11 O
+ATOM 489 C3' G D 26 117.074 129.642 153.796 1.00177.11 C
+ATOM 490 O3' G D 26 116.372 128.990 154.830 1.00177.11 O
+ATOM 491 C2' G D 26 117.733 128.686 152.818 1.00177.11 C
+ATOM 492 O2' G D 26 116.865 127.605 152.524 1.00177.11 O
+ATOM 493 C1' G D 26 117.853 129.552 151.570 1.00177.11 C
+ATOM 494 N9 G D 26 119.209 130.088 151.353 1.00177.11 N
+ATOM 495 C8 G D 26 119.597 131.397 151.410 1.00177.11 C
+ATOM 496 N7 G D 26 120.854 131.572 151.130 1.00177.11 N
+ATOM 497 C5 G D 26 121.324 130.301 150.878 1.00177.11 C
+ATOM 498 C6 G D 26 122.613 129.869 150.515 1.00177.11 C
+ATOM 499 O6 G D 26 123.625 130.548 150.361 1.00177.11 O
+ATOM 500 N1 G D 26 122.675 128.499 150.343 1.00177.11 N
+ATOM 501 C2 G D 26 121.611 127.657 150.495 1.00177.11 C
+ATOM 502 N2 G D 26 121.843 126.359 150.282 1.00177.11 N
+ATOM 503 N3 G D 26 120.397 128.047 150.825 1.00177.11 N
+ATOM 504 C4 G D 26 120.323 129.375 150.996 1.00177.11 C
+ATOM 505 P C D 27 117.057 128.808 156.266 1.00177.19 P
+ATOM 506 OP1 C D 27 115.983 128.539 157.248 1.00177.19 O
+ATOM 507 OP2 C D 27 118.002 129.932 156.468 1.00177.19 O
+ATOM 508 O5' C D 27 117.916 127.482 156.109 1.00177.19 O
+ATOM 509 C5' C D 27 117.340 126.310 155.559 1.00177.19 C
+ATOM 510 C4' C D 27 118.410 125.360 155.104 1.00177.19 C
+ATOM 511 O4' C D 27 119.124 125.909 153.971 1.00177.19 O
+ATOM 512 C3' C D 27 119.495 125.098 156.122 1.00177.19 C
+ATOM 513 O3' C D 27 119.095 124.142 157.082 1.00177.19 O
+ATOM 514 C2' C D 27 120.659 124.638 155.262 1.00177.19 C
+ATOM 515 O2' C D 27 120.514 123.268 154.937 1.00177.19 O
+ATOM 516 C1' C D 27 120.458 125.456 153.986 1.00177.19 C
+ATOM 517 N1 C D 27 121.376 126.612 153.870 1.00177.19 N
+ATOM 518 C2 C D 27 122.694 126.417 153.447 1.00177.19 C
+ATOM 519 O2 C D 27 123.080 125.274 153.191 1.00177.19 O
+ATOM 520 N3 C D 27 123.518 127.473 153.322 1.00177.19 N
+ATOM 521 C4 C D 27 123.069 128.688 153.596 1.00177.19 C
+ATOM 522 N4 C D 27 123.906 129.711 153.464 1.00177.19 N
+ATOM 523 C5 C D 27 121.739 128.921 154.023 1.00177.19 C
+ATOM 524 C6 C D 27 120.936 127.865 154.138 1.00177.19 C
+ATOM 525 P A D 28 119.331 124.425 158.640 1.00172.56 P
+ATOM 526 OP1 A D 28 118.468 123.501 159.412 1.00172.56 O
+ATOM 527 OP2 A D 28 119.237 125.887 158.862 1.00172.56 O
+ATOM 528 O5' A D 28 120.837 123.979 158.866 1.00172.56 O
+ATOM 529 C5' A D 28 121.281 122.723 158.388 1.00172.56 C
+ATOM 530 C4' A D 28 122.769 122.715 158.181 1.00172.56 C
+ATOM 531 O4' A D 28 123.122 123.525 157.044 1.00172.56 O
+ATOM 532 C3' A D 28 123.583 123.289 159.318 1.00172.56 C
+ATOM 533 O3' A D 28 123.777 122.321 160.328 1.00172.56 O
+ATOM 534 C2' A D 28 124.875 123.729 158.639 1.00172.56 C
+ATOM 535 O2' A D 28 125.811 122.666 158.608 1.00172.56 O
+ATOM 536 C1' A D 28 124.417 124.035 157.209 1.00172.56 C
+ATOM 537 N9 A D 28 124.428 125.470 156.888 1.00172.56 N
+ATOM 538 C8 A D 28 123.408 126.380 156.864 1.00172.56 C
+ATOM 539 N7 A D 28 123.800 127.584 156.517 1.00172.56 N
+ATOM 540 C5 A D 28 125.159 127.446 156.286 1.00172.56 C
+ATOM 541 C6 A D 28 126.179 128.328 155.894 1.00172.56 C
+ATOM 542 N6 A D 28 125.994 129.616 155.642 1.00172.56 N
+ATOM 543 N1 A D 28 127.434 127.854 155.772 1.00172.56 N
+ATOM 544 C2 A D 28 127.647 126.564 156.009 1.00172.56 C
+ATOM 545 N3 A D 28 126.778 125.637 156.387 1.00172.56 N
+ATOM 546 C4 A D 28 125.547 126.148 156.507 1.00172.56 C
+ATOM 547 P C D 29 124.452 122.718 161.719 1.00171.35 P
+ATOM 548 OP1 C D 29 123.979 121.771 162.754 1.00171.35 O
+ATOM 549 OP2 C D 29 124.298 124.177 161.911 1.00171.35 O
+ATOM 550 O5' C D 29 125.988 122.424 161.478 1.00171.35 O
+ATOM 551 C5' C D 29 126.965 123.304 161.987 1.00171.35 C
+ATOM 552 C4' C D 29 128.162 123.352 161.089 1.00171.35 C
+ATOM 553 O4' C D 29 127.852 124.070 159.875 1.00171.35 O
+ATOM 554 C3' C D 29 129.372 124.055 161.661 1.00171.35 C
+ATOM 555 O3' C D 29 130.155 123.161 162.419 1.00171.35 O
+ATOM 556 C2' C D 29 130.110 124.558 160.433 1.00171.35 C
+ATOM 557 O2' C D 29 130.987 123.565 159.936 1.00171.35 O
+ATOM 558 C1' C D 29 128.980 124.778 159.425 1.00171.35 C
+ATOM 559 N1 C D 29 128.589 126.198 159.231 1.00171.35 N
+ATOM 560 C2 C D 29 129.443 127.248 158.821 1.00171.35 C
+ATOM 561 O2 C D 29 130.653 127.071 158.616 1.00171.35 O
+ATOM 562 N3 C D 29 128.926 128.487 158.666 1.00171.35 N
+ATOM 563 C4 C D 29 127.636 128.709 158.874 1.00171.35 C
+ATOM 564 N4 C D 29 127.161 129.937 158.709 1.00171.35 N
+ATOM 565 C5 C D 29 126.752 127.678 159.267 1.00171.35 C
+ATOM 566 C6 C D 29 127.270 126.464 159.429 1.00171.35 C
+ATOM 567 P G D 30 130.648 123.558 163.884 1.00189.31 P
+ATOM 568 OP1 G D 30 131.132 122.318 164.527 1.00189.31 O
+ATOM 569 OP2 G D 30 129.578 124.353 164.529 1.00189.31 O
+ATOM 570 O5' G D 30 131.902 124.493 163.616 1.00189.31 O
+ATOM 571 C5' G D 30 132.938 124.086 162.741 1.00189.31 C
+ATOM 572 C4' G D 30 133.755 125.269 162.311 1.00189.31 C
+ATOM 573 O4' G D 30 133.005 126.068 161.373 1.00189.31 O
+ATOM 574 C3' G D 30 134.103 126.225 163.431 1.00189.31 C
+ATOM 575 O3' G D 30 135.261 125.797 164.113 1.00189.31 O
+ATOM 576 C2' G D 30 134.278 127.563 162.723 1.00189.31 C
+ATOM 577 O2' G D 30 135.602 127.689 162.233 1.00189.31 O
+ATOM 578 C1' G D 30 133.332 127.427 161.528 1.00189.31 C
+ATOM 579 N9 G D 30 132.082 128.202 161.632 1.00189.31 N
+ATOM 580 C8 G D 30 130.830 127.675 161.757 1.00189.31 C
+ATOM 581 N7 G D 30 129.878 128.560 161.768 1.00189.31 N
+ATOM 582 C5 G D 30 130.546 129.757 161.639 1.00189.31 C
+ATOM 583 C6 G D 30 130.036 131.069 161.597 1.00189.31 C
+ATOM 584 O6 G D 30 128.863 131.434 161.677 1.00189.31 O
+ATOM 585 N1 G D 30 131.049 132.000 161.458 1.00189.31 N
+ATOM 586 C2 G D 30 132.379 131.692 161.367 1.00189.31 C
+ATOM 587 N2 G D 30 133.214 132.726 161.235 1.00189.31 N
+ATOM 588 N3 G D 30 132.869 130.470 161.403 1.00189.31 N
+ATOM 589 C4 G D 30 131.900 129.557 161.544 1.00189.31 C
+ATOM 590 P U D 31 135.518 126.249 165.621 1.00204.60 P
+ATOM 591 OP1 U D 31 136.456 125.284 166.238 1.00204.60 O
+ATOM 592 OP2 U D 31 134.197 126.488 166.246 1.00204.60 O
+ATOM 593 O5' U D 31 136.263 127.644 165.465 1.00204.60 O
+ATOM 594 C5' U D 31 136.372 128.531 166.562 1.00204.60 C
+ATOM 595 C4' U D 31 136.571 129.947 166.097 1.00204.60 C
+ATOM 596 O4' U D 31 135.754 130.198 164.926 1.00204.60 O
+ATOM 597 C3' U D 31 136.150 131.023 167.078 1.00204.60 C
+ATOM 598 O3' U D 31 137.131 131.268 168.068 1.00204.60 O
+ATOM 599 C2' U D 31 135.888 132.210 166.169 1.00204.60 C
+ATOM 600 O2' U D 31 137.111 132.799 165.766 1.00204.60 O
+ATOM 601 C1' U D 31 135.285 131.528 164.944 1.00204.60 C
+ATOM 602 N1 U D 31 133.801 131.506 164.919 1.00204.60 N
+ATOM 603 C2 U D 31 133.031 132.660 164.849 1.00204.60 C
+ATOM 604 O2 U D 31 133.452 133.802 164.848 1.00204.60 O
+ATOM 605 N3 U D 31 131.679 132.452 164.793 1.00204.60 N
+ATOM 606 C4 U D 31 131.016 131.250 164.781 1.00204.60 C
+ATOM 607 O4 U D 31 129.790 131.229 164.725 1.00204.60 O
+ATOM 608 C5 U D 31 131.866 130.113 164.840 1.00204.60 C
+ATOM 609 C6 U D 31 133.184 130.282 164.889 1.00204.60 C
+ATOM 610 P A D 32 136.876 132.351 169.227 1.00239.28 P
+ATOM 611 OP1 A D 32 137.684 131.934 170.399 1.00239.28 O
+ATOM 612 OP2 A D 32 135.421 132.592 169.396 1.00239.28 O
+ATOM 613 O5' A D 32 137.529 133.680 168.651 1.00239.28 O
+ATOM 614 C5' A D 32 138.905 133.718 168.312 1.00239.28 C
+ATOM 615 C4' A D 32 139.346 135.121 168.003 1.00239.28 C
+ATOM 616 O4' A D 32 138.997 135.470 166.633 1.00239.28 O
+ATOM 617 C3' A D 32 138.677 136.206 168.822 1.00239.28 C
+ATOM 618 O3' A D 32 139.169 136.334 170.141 1.00239.28 O
+ATOM 619 C2' A D 32 138.889 137.427 167.951 1.00239.28 C
+ATOM 620 O2' A D 32 140.247 137.839 167.993 1.00239.28 O
+ATOM 621 C1' A D 32 138.605 136.830 166.574 1.00239.28 C
+ATOM 622 N9 A D 32 137.156 136.868 166.296 1.00239.28 N
+ATOM 623 C8 A D 32 136.340 135.785 166.101 1.00239.28 C
+ATOM 624 N7 A D 32 135.084 136.090 165.904 1.00239.28 N
+ATOM 625 C5 A D 32 135.067 137.473 165.986 1.00239.28 C
+ATOM 626 C6 A D 32 134.030 138.414 165.859 1.00239.28 C
+ATOM 627 N6 A D 32 132.763 138.079 165.627 1.00239.28 N
+ATOM 628 N1 A D 32 134.326 139.726 165.999 1.00239.28 N
+ATOM 629 C2 A D 32 135.593 140.074 166.240 1.00239.28 C
+ATOM 630 N3 A D 32 136.653 139.281 166.369 1.00239.28 N
+ATOM 631 C4 A D 32 136.331 137.976 166.235 1.00239.28 C
+ATOM 632 P A D 33 138.144 136.223 171.373 1.00233.78 P
+ATOM 633 OP1 A D 33 138.807 136.777 172.580 1.00233.78 O
+ATOM 634 OP2 A D 33 137.608 134.839 171.383 1.00233.78 O
+ATOM 635 O5' A D 33 136.945 137.197 170.980 1.00233.78 O
+ATOM 636 C5' A D 33 137.080 138.607 171.089 1.00233.78 C
+ATOM 637 C4' A D 33 135.852 139.326 170.592 1.00233.78 C
+ATOM 638 O4' A D 33 135.360 138.702 169.378 1.00233.78 O
+ATOM 639 C3' A D 33 134.651 139.316 171.521 1.00233.78 C
+ATOM 640 O3' A D 33 134.750 140.286 172.549 1.00233.78 O
+ATOM 641 C2' A D 33 133.491 139.567 170.569 1.00233.78 C
+ATOM 642 O2' A D 33 133.380 140.948 170.265 1.00233.78 O
+ATOM 643 C1' A D 33 133.954 138.829 169.310 1.00233.78 C
+ATOM 644 N9 A D 33 133.349 137.485 169.153 1.00233.78 N
+ATOM 645 C8 A D 33 134.033 136.306 168.982 1.00233.78 C
+ATOM 646 N7 A D 33 133.272 135.250 168.851 1.00233.78 N
+ATOM 647 C5 A D 33 131.989 135.763 168.929 1.00233.78 C
+ATOM 648 C6 A D 33 130.728 135.151 168.863 1.00233.78 C
+ATOM 649 N6 A D 33 130.557 133.842 168.690 1.00233.78 N
+ATOM 650 N1 A D 33 129.628 135.928 168.973 1.00233.78 N
+ATOM 651 C2 A D 33 129.794 137.245 169.125 1.00233.78 C
+ATOM 652 N3 A D 33 130.929 137.930 169.215 1.00233.78 N
+ATOM 653 C4 A D 33 132.011 137.134 169.107 1.00233.78 C
+ATOM 654 P A D 34 134.347 139.912 174.057 1.00233.71 P
+ATOM 655 OP1 A D 34 135.029 140.864 174.965 1.00233.71 O
+ATOM 656 OP2 A D 34 134.545 138.454 174.229 1.00233.71 O
+ATOM 657 O5' A D 34 132.786 140.213 174.131 1.00233.71 O
+ATOM 658 C5' A D 34 132.296 141.545 174.107 1.00233.71 C
+ATOM 659 C4' A D 34 130.803 141.570 173.911 1.00233.71 C
+ATOM 660 O4' A D 34 130.473 140.901 172.668 1.00233.71 O
+ATOM 661 C3' A D 34 129.995 140.831 174.967 1.00233.71 C
+ATOM 662 O3' A D 34 129.731 141.633 176.102 1.00233.71 O
+ATOM 663 C2' A D 34 128.737 140.423 174.215 1.00233.71 C
+ATOM 664 O2' A D 34 127.826 141.507 174.138 1.00233.71 O
+ATOM 665 C1' A D 34 129.288 140.153 172.818 1.00233.71 C
+ATOM 666 N9 A D 34 129.615 138.733 172.602 1.00233.71 N
+ATOM 667 C8 A D 34 130.859 138.226 172.357 1.00233.71 C
+ATOM 668 N7 A D 34 130.898 136.936 172.179 1.00233.71 N
+ATOM 669 C5 A D 34 129.579 136.565 172.336 1.00233.71 C
+ATOM 670 C6 A D 34 128.952 135.324 172.253 1.00233.71 C
+ATOM 671 N6 A D 34 129.603 134.189 172.017 1.00233.71 N
+ATOM 672 N1 A D 34 127.623 135.280 172.447 1.00233.71 N
+ATOM 673 C2 A D 34 126.963 136.415 172.689 1.00233.71 C
+ATOM 674 N3 A D 34 127.443 137.649 172.776 1.00233.71 N
+ATOM 675 C4 A D 34 128.769 137.657 172.577 1.00233.71 C
+ATOM 676 P U D 35 129.617 140.967 177.558 1.00235.59 P
+ATOM 677 OP1 U D 35 129.375 142.060 178.529 1.00235.59 O
+ATOM 678 OP2 U D 35 130.770 140.050 177.735 1.00235.59 O
+ATOM 679 O5' U D 35 128.286 140.097 177.490 1.00235.59 O
+ATOM 680 C5' U D 35 127.011 140.719 177.510 1.00235.59 C
+ATOM 681 C4' U D 35 125.909 139.695 177.547 1.00235.59 C
+ATOM 682 O4' U D 35 125.939 138.896 176.335 1.00235.59 O
+ATOM 683 C3' U D 35 126.000 138.666 178.662 1.00235.59 C
+ATOM 684 O3' U D 35 125.523 139.143 179.906 1.00235.59 O
+ATOM 685 C2' U D 35 125.188 137.506 178.110 1.00235.59 C
+ATOM 686 O2' U D 35 123.799 137.740 178.280 1.00235.59 O
+ATOM 687 C1' U D 35 125.517 137.577 176.619 1.00235.59 C
+ATOM 688 N1 U D 35 126.589 136.630 176.234 1.00235.59 N
+ATOM 689 C2 U D 35 126.286 135.284 176.102 1.00235.59 C
+ATOM 690 O2 U D 35 125.192 134.793 176.302 1.00235.59 O
+ATOM 691 N3 U D 35 127.329 134.484 175.749 1.00235.59 N
+ATOM 692 C4 U D 35 128.616 134.879 175.500 1.00235.59 C
+ATOM 693 O4 U D 35 129.457 134.048 175.175 1.00235.59 O
+ATOM 694 C5 U D 35 128.853 136.276 175.646 1.00235.59 C
+ATOM 695 C6 U D 35 127.858 137.086 175.993 1.00235.59 C
+ATOM 696 P A D 36 126.156 138.581 181.269 1.00223.84 P
+ATOM 697 OP1 A D 36 125.564 139.334 182.400 1.00223.84 O
+ATOM 698 OP2 A D 36 127.626 138.537 181.095 1.00223.84 O
+ATOM 699 O5' A D 36 125.629 137.082 181.359 1.00223.84 O
+ATOM 700 C5' A D 36 124.269 136.812 181.650 1.00223.84 C
+ATOM 701 C4' A D 36 123.925 135.367 181.401 1.00223.84 C
+ATOM 702 O4' A D 36 124.237 135.016 180.033 1.00223.84 O
+ATOM 703 C3' A D 36 124.689 134.347 182.225 1.00223.84 C
+ATOM 704 O3' A D 36 124.147 134.184 183.518 1.00223.84 O
+ATOM 705 C2' A D 36 124.588 133.091 181.378 1.00223.84 C
+ATOM 706 O2' A D 36 123.336 132.456 181.576 1.00223.84 O
+ATOM 707 C1' A D 36 124.636 133.666 179.964 1.00223.84 C
+ATOM 708 N9 A D 36 125.994 133.605 179.413 1.00223.84 N
+ATOM 709 C8 A D 36 126.884 134.620 179.214 1.00223.84 C
+ATOM 710 N7 A D 36 128.017 134.219 178.702 1.00223.84 N
+ATOM 711 C5 A D 36 127.856 132.851 178.589 1.00223.84 C
+ATOM 712 C6 A D 36 128.688 131.828 178.131 1.00223.84 C
+ATOM 713 N6 A D 36 129.917 132.047 177.680 1.00223.84 N
+ATOM 714 N1 A D 36 128.224 130.558 178.151 1.00223.84 N
+ATOM 715 C2 A D 36 126.990 130.340 178.607 1.00223.84 C
+ATOM 716 N3 A D 36 126.114 131.220 179.064 1.00223.84 N
+ATOM 717 C4 A D 36 126.616 132.461 179.024 1.00223.84 C
+ATOM 718 P U D 37 124.878 133.234 184.580 1.00232.98 P
+ATOM 719 OP1 U D 37 124.526 133.723 185.932 1.00232.98 O
+ATOM 720 OP2 U D 37 126.301 133.094 184.191 1.00232.98 O
+ATOM 721 O5' U D 37 124.191 131.816 184.378 1.00232.98 O
+ATOM 722 C5' U D 37 124.543 130.734 185.220 1.00232.98 C
+ATOM 723 C4' U D 37 124.647 129.440 184.459 1.00232.98 C
+ATOM 724 O4' U D 37 125.010 129.698 183.080 1.00232.98 O
+ATOM 725 C3' U D 37 125.713 128.484 184.961 1.00232.98 C
+ATOM 726 O3' U D 37 125.256 127.706 186.050 1.00232.98 O
+ATOM 727 C2' U D 37 126.037 127.650 183.733 1.00232.98 C
+ATOM 728 O2' U D 37 125.105 126.592 183.587 1.00232.98 O
+ATOM 729 C1' U D 37 125.827 128.657 182.592 1.00232.98 C
+ATOM 730 N1 U D 37 127.086 129.259 182.091 1.00232.98 N
+ATOM 731 C2 U D 37 128.105 128.505 181.513 1.00232.98 C
+ATOM 732 O2 U D 37 128.138 127.295 181.367 1.00232.98 O
+ATOM 733 N3 U D 37 129.190 129.226 181.107 1.00232.98 N
+ATOM 734 C4 U D 37 129.360 130.587 181.183 1.00232.98 C
+ATOM 735 O4 U D 37 130.399 131.088 180.761 1.00232.98 O
+ATOM 736 C5 U D 37 128.269 131.299 181.762 1.00232.98 C
+ATOM 737 C6 U D 37 127.199 130.626 182.174 1.00232.98 C
+ATOM 738 P U D 38 126.259 127.298 187.232 1.00240.75 P
+ATOM 739 OP1 U D 38 125.483 126.555 188.256 1.00240.75 O
+ATOM 740 OP2 U D 38 127.039 128.499 187.613 1.00240.75 O
+ATOM 741 O5' U D 38 127.243 126.252 186.554 1.00240.75 O
+ATOM 742 C5' U D 38 126.812 124.931 186.283 1.00240.75 C
+ATOM 743 C4' U D 38 127.903 124.134 185.625 1.00240.75 C
+ATOM 744 O4' U D 38 128.269 124.760 184.369 1.00240.75 O
+ATOM 745 C3' U D 38 129.214 124.049 186.388 1.00240.75 C
+ATOM 746 O3' U D 38 129.202 123.054 187.395 1.00240.75 O
+ATOM 747 C2' U D 38 130.225 123.772 185.287 1.00240.75 C
+ATOM 748 O2' U D 38 130.231 122.396 184.946 1.00240.75 O
+ATOM 749 C1' U D 38 129.643 124.563 184.116 1.00240.75 C
+ATOM 750 N1 U D 38 130.291 125.880 183.937 1.00240.75 N
+ATOM 751 C2 U D 38 131.562 125.946 183.385 1.00240.75 C
+ATOM 752 O2 U D 38 132.226 124.990 183.028 1.00240.75 O
+ATOM 753 N3 U D 38 132.066 127.203 183.249 1.00240.75 N
+ATOM 754 C4 U D 38 131.441 128.376 183.598 1.00240.75 C
+ATOM 755 O4 U D 38 132.014 129.446 183.421 1.00240.75 O
+ATOM 756 C5 U D 38 130.140 128.227 184.160 1.00240.75 C
+ATOM 757 C6 U D 38 129.616 127.016 184.297 1.00240.75 C
+ATOM 758 P G D 39 130.172 123.182 188.669 1.00243.39 P
+ATOM 759 OP1 G D 39 129.788 122.134 189.645 1.00243.39 O
+ATOM 760 OP2 G D 39 130.205 124.609 189.071 1.00243.39 O
+ATOM 761 O5' G D 39 131.610 122.793 188.111 1.00243.39 O
+ATOM 762 C5' G D 39 131.877 121.473 187.674 1.00243.39 C
+ATOM 763 C4' G D 39 133.173 121.394 186.914 1.00243.39 C
+ATOM 764 O4' G D 39 133.145 122.300 185.785 1.00243.39 O
+ATOM 765 C3' G D 39 134.419 121.800 187.676 1.00243.39 C
+ATOM 766 O3' G D 39 134.898 120.765 188.514 1.00243.39 O
+ATOM 767 C2' G D 39 135.385 122.164 186.560 1.00243.39 C
+ATOM 768 O2' G D 39 135.966 120.998 186.003 1.00243.39 O
+ATOM 769 C1' G D 39 134.444 122.781 185.525 1.00243.39 C
+ATOM 770 N9 G D 39 134.426 124.248 185.610 1.00243.39 N
+ATOM 771 C8 G D 39 133.398 125.015 186.079 1.00243.39 C
+ATOM 772 N7 G D 39 133.654 126.287 186.041 1.00243.39 N
+ATOM 773 C5 G D 39 134.931 126.355 185.521 1.00243.39 C
+ATOM 774 C6 G D 39 135.737 127.478 185.259 1.00243.39 C
+ATOM 775 O6 G D 39 135.466 128.666 185.433 1.00243.39 O
+ATOM 776 N1 G D 39 136.963 127.126 184.724 1.00243.39 N
+ATOM 777 C2 G D 39 137.378 125.844 184.495 1.00243.39 C
+ATOM 778 N2 G D 39 138.600 125.703 183.974 1.00243.39 N
+ATOM 779 N3 G D 39 136.630 124.780 184.734 1.00243.39 N
+ATOM 780 C4 G D 39 135.428 125.108 185.245 1.00243.39 C
+ATOM 781 P G D 40 135.876 121.103 189.741 1.00253.39 P
+ATOM 782 OP1 G D 40 135.876 119.947 190.669 1.00253.39 O
+ATOM 783 OP2 G D 40 135.540 122.460 190.236 1.00253.39 O
+ATOM 784 O5' G D 40 137.320 121.169 189.081 1.00253.39 O
+ATOM 785 C5' G D 40 138.460 121.459 189.872 1.00253.39 C
+ATOM 786 C4' G D 40 139.667 121.718 189.014 1.00253.39 C
+ATOM 787 O4' G D 40 139.221 122.191 187.714 1.00253.39 O
+ATOM 788 C3' G D 40 140.597 122.813 189.521 1.00253.39 C
+ATOM 789 O3' G D 40 141.565 122.351 190.445 1.00253.39 O
+ATOM 790 C2' G D 40 141.191 123.366 188.241 1.00253.39 C
+ATOM 791 O2' G D 40 142.180 122.485 187.735 1.00253.39 O
+ATOM 792 C1' G D 40 139.982 123.305 187.319 1.00253.39 C
+ATOM 793 N9 G D 40 139.107 124.482 187.478 1.00253.39 N
+ATOM 794 C8 G D 40 137.771 124.390 187.776 1.00253.39 C
+ATOM 795 N7 G D 40 137.166 125.530 187.878 1.00253.39 N
+ATOM 796 C5 G D 40 138.171 126.442 187.634 1.00253.39 C
+ATOM 797 C6 G D 40 138.093 127.845 187.608 1.00253.39 C
+ATOM 798 O6 G D 40 137.106 128.548 187.807 1.00253.39 O
+ATOM 799 N1 G D 40 139.316 128.422 187.319 1.00253.39 N
+ATOM 800 C2 G D 40 140.475 127.738 187.076 1.00253.39 C
+ATOM 801 N2 G D 40 141.547 128.507 186.832 1.00253.39 N
+ATOM 802 N3 G D 40 140.565 126.421 187.097 1.00253.39 N
+ATOM 803 C4 G D 40 139.379 125.830 187.377 1.00253.39 C
+ATOM 804 P C D 41 142.448 123.409 191.276 1.00258.12 P
+ATOM 805 OP1 C D 41 143.692 123.672 190.510 1.00258.12 O
+ATOM 806 OP2 C D 41 142.551 122.925 192.675 1.00258.12 O
+ATOM 807 O5' C D 41 141.567 124.739 191.259 1.00258.12 O
+ATOM 808 C5' C D 41 141.731 125.747 192.252 1.00258.12 C
+ATOM 809 C4' C D 41 141.408 127.121 191.718 1.00258.12 C
+ATOM 810 O4' C D 41 141.419 128.073 192.817 1.00258.12 O
+ATOM 811 C3' C D 41 142.398 127.658 190.690 1.00258.12 C
+ATOM 812 O3' C D 41 141.713 128.533 189.797 1.00258.12 O
+ATOM 813 C2' C D 41 143.352 128.472 191.553 1.00258.12 C
+ATOM 814 O2' C D 41 144.067 129.469 190.858 1.00258.12 O
+ATOM 815 C1' C D 41 142.392 129.072 192.571 1.00258.12 C
+ATOM 816 N1 C D 41 143.022 129.433 193.848 1.00258.12 N
+ATOM 817 C2 C D 41 143.505 130.730 194.003 1.00258.12 C
+ATOM 818 O2 C D 41 143.395 131.520 193.058 1.00258.12 O
+ATOM 819 N3 C D 41 144.085 131.085 195.168 1.00258.12 N
+ATOM 820 C4 C D 41 144.184 130.200 196.158 1.00258.12 C
+ATOM 821 N4 C D 41 144.763 130.596 197.293 1.00258.12 N
+ATOM 822 C5 C D 41 143.696 128.869 196.030 1.00258.12 C
+ATOM 823 C6 C D 41 143.126 128.533 194.868 1.00258.12 C
+ATOM 824 P A D 63 140.012 132.348 191.814 1.00240.02 P
+ATOM 825 OP1 A D 63 140.601 132.282 193.172 1.00240.02 O
+ATOM 826 OP2 A D 63 138.537 132.391 191.667 1.00240.02 O
+ATOM 827 O5' A D 63 140.621 133.620 191.077 1.00240.02 O
+ATOM 828 C5' A D 63 140.475 134.927 191.623 1.00240.02 C
+ATOM 829 C4' A D 63 139.313 135.662 191.004 1.00240.02 C
+ATOM 830 O4' A D 63 138.084 135.242 191.654 1.00240.02 O
+ATOM 831 C3' A D 63 139.359 137.178 191.127 1.00240.02 C
+ATOM 832 O3' A D 63 138.697 137.755 190.000 1.00240.02 O
+ATOM 833 C2' A D 63 138.540 137.427 192.388 1.00240.02 C
+ATOM 834 O2' A D 63 137.984 138.719 192.475 1.00240.02 O
+ATOM 835 C1' A D 63 137.462 136.355 192.266 1.00240.02 C
+ATOM 836 N9 A D 63 136.884 135.918 193.543 1.00240.02 N
+ATOM 837 C8 A D 63 135.711 136.373 194.092 1.00240.02 C
+ATOM 838 N7 A D 63 135.407 135.827 195.242 1.00240.02 N
+ATOM 839 C5 A D 63 136.452 134.950 195.474 1.00240.02 C
+ATOM 840 C6 A D 63 136.721 134.066 196.535 1.00240.02 C
+ATOM 841 N6 A D 63 135.925 133.921 197.597 1.00240.02 N
+ATOM 842 N1 A D 63 137.849 133.325 196.476 1.00240.02 N
+ATOM 843 C2 A D 63 138.652 133.465 195.416 1.00240.02 C
+ATOM 844 N3 A D 63 138.505 134.261 194.361 1.00240.02 N
+ATOM 845 C4 A D 63 137.376 134.992 194.443 1.00240.02 C
+ATOM 846 P C D 64 139.545 138.296 188.744 1.00247.30 P
+ATOM 847 OP1 C D 64 140.691 139.074 189.271 1.00247.30 O
+ATOM 848 OP2 C D 64 138.611 138.919 187.775 1.00247.30 O
+ATOM 849 O5' C D 64 140.121 136.982 188.056 1.00247.30 O
+ATOM 850 C5' C D 64 140.964 137.067 186.916 1.00247.30 C
+ATOM 851 C4' C D 64 141.936 135.917 186.881 1.00247.30 C
+ATOM 852 O4' C D 64 141.719 135.082 188.046 1.00247.30 O
+ATOM 853 C3' C D 64 141.795 134.967 185.701 1.00247.30 C
+ATOM 854 O3' C D 64 142.497 135.423 184.559 1.00247.30 O
+ATOM 855 C2' C D 64 142.323 133.646 186.252 1.00247.30 C
+ATOM 856 O2' C D 64 143.738 133.595 186.172 1.00247.30 O
+ATOM 857 C1' C D 64 141.921 133.725 187.721 1.00247.30 C
+ATOM 858 N1 C D 64 140.681 132.980 188.022 1.00247.30 N
+ATOM 859 C2 C D 64 140.715 131.604 188.229 1.00247.30 C
+ATOM 860 O2 C D 64 141.753 130.950 188.115 1.00247.30 O
+ATOM 861 N3 C D 64 139.569 130.960 188.509 1.00247.30 N
+ATOM 862 C4 C D 64 138.431 131.634 188.613 1.00247.30 C
+ATOM 863 N4 C D 64 137.325 130.962 188.903 1.00247.30 N
+ATOM 864 C5 C D 64 138.362 133.034 188.429 1.00247.30 C
+ATOM 865 C6 C D 64 139.500 133.648 188.130 1.00247.30 C
+ATOM 866 P C D 65 142.276 134.732 183.126 1.00250.03 P
+ATOM 867 OP1 C D 65 142.806 135.652 182.090 1.00250.03 O
+ATOM 868 OP2 C D 65 140.870 134.277 183.052 1.00250.03 O
+ATOM 869 O5' C D 65 143.202 133.438 183.168 1.00250.03 O
+ATOM 870 C5' C D 65 143.743 132.886 181.983 1.00250.03 C
+ATOM 871 C4' C D 65 143.534 131.396 181.921 1.00250.03 C
+ATOM 872 O4' C D 65 143.088 130.894 183.206 1.00250.03 O
+ATOM 873 C3' C D 65 142.467 130.925 180.953 1.00250.03 C
+ATOM 874 O3' C D 65 142.936 130.879 179.619 1.00250.03 O
+ATOM 875 C2' C D 65 142.086 129.560 181.509 1.00250.03 C
+ATOM 876 O2' C D 65 143.016 128.572 181.094 1.00250.03 O
+ATOM 877 C1' C D 65 142.246 129.776 183.016 1.00250.03 C
+ATOM 878 N1 C D 65 140.960 130.018 183.711 1.00250.03 N
+ATOM 879 C2 C D 65 140.066 128.967 183.972 1.00250.03 C
+ATOM 880 O2 C D 65 140.342 127.816 183.606 1.00250.03 O
+ATOM 881 N3 C D 65 138.917 129.239 184.625 1.00250.03 N
+ATOM 882 C4 C D 65 138.646 130.485 185.013 1.00250.03 C
+ATOM 883 N4 C D 65 137.500 130.702 185.654 1.00250.03 N
+ATOM 884 C5 C D 65 139.535 131.565 184.773 1.00250.03 C
+ATOM 885 C6 C D 65 140.665 131.280 184.127 1.00250.03 C
+ATOM 886 P A D 66 141.924 130.567 178.416 1.00241.35 P
+ATOM 887 OP1 A D 66 142.467 131.198 177.190 1.00241.35 O
+ATOM 888 OP2 A D 66 140.549 130.894 178.866 1.00241.35 O
+ATOM 889 O5' A D 66 142.008 128.988 178.242 1.00241.35 O
+ATOM 890 C5' A D 66 140.980 128.285 177.567 1.00241.35 C
+ATOM 891 C4' A D 66 140.601 127.027 178.301 1.00241.35 C
+ATOM 892 O4' A D 66 140.462 127.301 179.719 1.00241.35 O
+ATOM 893 C3' A D 66 139.271 126.411 177.906 1.00241.35 C
+ATOM 894 O3' A D 66 139.367 125.605 176.746 1.00241.35 O
+ATOM 895 C2' A D 66 138.889 125.623 179.146 1.00241.35 C
+ATOM 896 O2' A D 66 139.610 124.404 179.187 1.00241.35 O
+ATOM 897 C1' A D 66 139.415 126.526 180.261 1.00241.35 C
+ATOM 898 N9 A D 66 138.385 127.447 180.777 1.00241.35 N
+ATOM 899 C8 A D 66 138.545 128.806 180.852 1.00241.35 C
+ATOM 900 N7 A D 66 137.515 129.437 181.339 1.00241.35 N
+ATOM 901 C5 A D 66 136.616 128.423 181.608 1.00241.35 C
+ATOM 902 C6 A D 66 135.325 128.457 182.137 1.00241.35 C
+ATOM 903 N6 A D 66 134.738 129.592 182.496 1.00241.35 N
+ATOM 904 N1 A D 66 134.659 127.297 182.299 1.00241.35 N
+ATOM 905 C2 A D 66 135.255 126.155 181.952 1.00241.35 C
+ATOM 906 N3 A D 66 136.476 126.005 181.432 1.00241.35 N
+ATOM 907 C4 A D 66 137.122 127.183 181.279 1.00241.35 C
+ATOM 908 P A D 67 138.152 125.548 175.697 1.00230.34 P
+ATOM 909 OP1 A D 67 138.566 124.703 174.552 1.00230.34 O
+ATOM 910 OP2 A D 67 137.675 126.932 175.469 1.00230.34 O
+ATOM 911 O5' A D 67 137.005 124.765 176.469 1.00230.34 O
+ATOM 912 C5' A D 67 136.918 123.352 176.405 1.00230.34 C
+ATOM 913 C4' A D 67 135.608 122.876 176.968 1.00230.34 C
+ATOM 914 O4' A D 67 135.474 123.357 178.330 1.00230.34 O
+ATOM 915 C3' A D 67 134.364 123.400 176.269 1.00230.34 C
+ATOM 916 O3' A D 67 134.021 122.657 175.116 1.00230.34 O
+ATOM 917 C2' A D 67 133.315 123.326 177.362 1.00230.34 C
+ATOM 918 O2' A D 67 132.865 121.992 177.522 1.00230.34 O
+ATOM 919 C1' A D 67 134.134 123.712 178.587 1.00230.34 C
+ATOM 920 N9 A D 67 134.077 125.162 178.831 1.00230.34 N
+ATOM 921 C8 A D 67 135.069 126.081 178.618 1.00230.34 C
+ATOM 922 N7 A D 67 134.727 127.304 178.933 1.00230.34 N
+ATOM 923 C5 A D 67 133.416 127.177 179.355 1.00230.34 C
+ATOM 924 C6 A D 67 132.478 128.102 179.820 1.00230.34 C
+ATOM 925 N6 A D 67 132.730 129.403 179.930 1.00230.34 N
+ATOM 926 N1 A D 67 131.253 127.658 180.162 1.00230.34 N
+ATOM 927 C2 A D 67 130.988 126.356 180.052 1.00230.34 C
+ATOM 928 N3 A D 67 131.795 125.386 179.636 1.00230.34 N
+ATOM 929 C4 A D 67 133.003 125.865 179.304 1.00230.34 C
+ATOM 930 P U D 68 133.236 123.365 173.910 1.00226.91 P
+ATOM 931 OP1 U D 68 133.164 122.407 172.785 1.00226.91 O
+ATOM 932 OP2 U D 68 133.882 124.679 173.692 1.00226.91 O
+ATOM 933 O5' U D 68 131.752 123.576 174.461 1.00226.91 O
+ATOM 934 C5' U D 68 130.902 122.465 174.706 1.00226.91 C
+ATOM 935 C4' U D 68 129.635 122.861 175.424 1.00226.91 C
+ATOM 936 O4' U D 68 129.947 123.641 176.607 1.00226.91 O
+ATOM 937 C3' U D 68 128.677 123.742 174.645 1.00226.91 C
+ATOM 938 O3' U D 68 127.865 123.009 173.751 1.00226.91 O
+ATOM 939 C2' U D 68 127.875 124.423 175.744 1.00226.91 C
+ATOM 940 O2' U D 68 126.851 123.564 176.220 1.00226.91 O
+ATOM 941 C1' U D 68 128.924 124.587 176.845 1.00226.91 C
+ATOM 942 N1 U D 68 129.524 125.944 176.884 1.00226.91 N
+ATOM 943 C2 U D 68 128.795 127.073 177.245 1.00226.91 C
+ATOM 944 O2 U D 68 127.613 127.090 177.533 1.00226.91 O
+ATOM 945 N3 U D 68 129.508 128.242 177.240 1.00226.91 N
+ATOM 946 C4 U D 68 130.834 128.410 176.943 1.00226.91 C
+ATOM 947 O4 U D 68 131.350 129.523 176.986 1.00226.91 O
+ATOM 948 C5 U D 68 131.510 127.212 176.593 1.00226.91 C
+ATOM 949 C6 U D 68 130.854 126.058 176.585 1.00226.91 C
+ATOM 950 P A D 69 126.936 123.785 172.699 1.00233.76 P
+ATOM 951 OP1 A D 69 126.694 122.879 171.553 1.00233.76 O
+ATOM 952 OP2 A D 69 127.545 125.112 172.460 1.00233.76 O
+ATOM 953 O5' A D 69 125.557 124.019 173.457 1.00233.76 O
+ATOM 954 C5' A D 69 124.423 124.466 172.736 1.00233.76 C
+ATOM 955 C4' A D 69 123.682 125.559 173.460 1.00233.76 C
+ATOM 956 O4' A D 69 124.447 126.018 174.600 1.00233.76 O
+ATOM 957 C3' A D 69 123.423 126.816 172.648 1.00233.76 C
+ATOM 958 O3' A D 69 122.284 126.696 171.815 1.00233.76 O
+ATOM 959 C2' A D 69 123.271 127.890 173.714 1.00233.76 C
+ATOM 960 O2' A D 69 121.943 127.923 174.207 1.00233.76 O
+ATOM 961 C1' A D 69 124.216 127.392 174.813 1.00233.76 C
+ATOM 962 N9 A D 69 125.507 128.098 174.788 1.00233.76 N
+ATOM 963 C8 A D 69 126.736 127.552 174.537 1.00233.76 C
+ATOM 964 N7 A D 69 127.717 128.415 174.572 1.00233.76 N
+ATOM 965 C5 A D 69 127.081 129.607 174.866 1.00233.76 C
+ATOM 966 C6 A D 69 127.563 130.907 175.042 1.00233.76 C
+ATOM 967 N6 A D 69 128.851 131.209 174.937 1.00233.76 N
+ATOM 968 N1 A D 69 126.687 131.890 175.326 1.00233.76 N
+ATOM 969 C2 A D 69 125.394 131.585 175.430 1.00233.76 C
+ATOM 970 N3 A D 69 124.818 130.396 175.286 1.00233.76 N
+ATOM 971 C4 A D 69 125.721 129.437 175.004 1.00233.76 C
+ATOM 972 P U D 70 122.198 127.509 170.433 1.00227.44 P
+ATOM 973 OP1 U D 70 120.956 127.093 169.742 1.00227.44 O
+ATOM 974 OP2 U D 70 123.504 127.373 169.744 1.00227.44 O
+ATOM 975 O5' U D 70 121.997 129.025 170.873 1.00227.44 O
+ATOM 976 C5' U D 70 120.758 129.471 171.398 1.00227.44 C
+ATOM 977 C4' U D 70 120.837 130.914 171.819 1.00227.44 C
+ATOM 978 O4' U D 70 121.907 131.076 172.780 1.00227.44 O
+ATOM 979 C3' U D 70 121.169 131.908 170.717 1.00227.44 C
+ATOM 980 O3' U D 70 120.026 132.304 169.981 1.00227.44 O
+ATOM 981 C2' U D 70 121.801 133.061 171.477 1.00227.44 C
+ATOM 982 O2' U D 70 120.807 133.916 172.016 1.00227.44 O
+ATOM 983 C1' U D 70 122.503 132.344 172.627 1.00227.44 C
+ATOM 984 N1 U D 70 123.949 132.167 172.393 1.00227.44 N
+ATOM 985 C2 U D 70 124.776 133.261 172.519 1.00227.44 C
+ATOM 986 O2 U D 70 124.408 134.385 172.795 1.00227.44 O
+ATOM 987 N3 U D 70 126.097 133.022 172.306 1.00227.44 N
+ATOM 988 C4 U D 70 126.671 131.820 171.987 1.00227.44 C
+ATOM 989 O4 U D 70 127.885 131.760 171.826 1.00227.44 O
+ATOM 990 C5 U D 70 125.757 130.734 171.882 1.00227.44 C
+ATOM 991 C6 U D 70 124.458 130.938 172.086 1.00227.44 C
+ATOM 992 P U D 71 120.153 132.749 168.443 1.00220.79 P
+ATOM 993 OP1 U D 71 118.801 133.101 167.946 1.00220.79 O
+ATOM 994 OP2 U D 71 120.967 131.722 167.750 1.00220.79 O
+ATOM 995 O5' U D 71 120.975 134.109 168.482 1.00220.79 O
+ATOM 996 C5' U D 71 120.366 135.305 168.936 1.00220.79 C
+ATOM 997 C4' U D 71 121.388 136.391 169.116 1.00220.79 C
+ATOM 998 O4' U D 71 122.449 135.934 169.988 1.00220.79 O
+ATOM 999 C3' U D 71 122.114 136.826 167.860 1.00220.79 C
+ATOM 1000 O3' U D 71 121.357 137.739 167.090 1.00220.79 O
+ATOM 1001 C2' U D 71 123.395 137.433 168.406 1.00220.79 C
+ATOM 1002 O2' U D 71 123.180 138.774 168.812 1.00220.79 O
+ATOM 1003 C1' U D 71 123.656 136.578 169.645 1.00220.79 C
+ATOM 1004 N1 U D 71 124.717 135.568 169.432 1.00220.79 N
+ATOM 1005 C2 U D 71 126.042 135.986 169.410 1.00220.79 C
+ATOM 1006 O2 U D 71 126.413 137.138 169.549 1.00220.79 O
+ATOM 1007 N3 U D 71 126.966 134.999 169.221 1.00220.79 N
+ATOM 1008 C4 U D 71 126.708 133.658 169.056 1.00220.79 C
+ATOM 1009 O4 U D 71 127.649 132.882 168.892 1.00220.79 O
+ATOM 1010 C5 U D 71 125.323 133.303 169.095 1.00220.79 C
+ATOM 1011 C6 U D 71 124.398 134.246 169.276 1.00220.79 C
+ATOM 1012 P A D 72 121.524 137.781 165.496 1.00212.92 P
+ATOM 1013 OP1 A D 72 120.518 138.722 164.948 1.00212.92 O
+ATOM 1014 OP2 A D 72 121.578 136.382 165.012 1.00212.92 O
+ATOM 1015 O5' A D 72 122.952 138.442 165.272 1.00212.92 O
+ATOM 1016 C5' A D 72 123.192 139.792 165.628 1.00212.92 C
+ATOM 1017 C4' A D 72 124.664 140.091 165.597 1.00212.92 C
+ATOM 1018 O4' A D 72 125.356 139.238 166.538 1.00212.92 O
+ATOM 1019 C3' A D 72 125.348 139.812 164.276 1.00212.92 C
+ATOM 1020 O3' A D 72 125.174 140.869 163.359 1.00212.92 O
+ATOM 1021 C2' A D 72 126.796 139.591 164.685 1.00212.92 C
+ATOM 1022 O2' A D 72 127.456 140.832 164.873 1.00212.92 O
+ATOM 1023 C1' A D 72 126.634 138.909 166.041 1.00212.92 C
+ATOM 1024 N9 A D 72 126.748 137.444 165.951 1.00212.92 N
+ATOM 1025 C8 A D 72 125.743 136.515 165.955 1.00212.92 C
+ATOM 1026 N7 A D 72 126.167 135.281 165.872 1.00212.92 N
+ATOM 1027 C5 A D 72 127.541 135.415 165.812 1.00212.92 C
+ATOM 1028 C6 A D 72 128.579 134.481 165.713 1.00212.92 C
+ATOM 1029 N6 A D 72 128.371 133.169 165.651 1.00212.92 N
+ATOM 1030 N1 A D 72 129.847 134.936 165.669 1.00212.92 N
+ATOM 1031 C2 A D 72 130.056 136.253 165.729 1.00212.92 C
+ATOM 1032 N3 A D 72 129.161 137.229 165.820 1.00212.92 N
+ATOM 1033 C4 A D 72 127.914 136.737 165.857 1.00212.92 C
+ATOM 1034 P C D 73 124.081 140.739 162.196 1.00212.96 P
+ATOM 1035 OP1 C D 73 123.224 141.947 162.249 1.00212.96 O
+ATOM 1036 OP2 C D 73 123.457 139.399 162.310 1.00212.96 O
+ATOM 1037 O5' C D 73 124.943 140.794 160.860 1.00212.96 O
+ATOM 1038 C5' C D 73 125.699 141.950 160.547 1.00212.96 C
+ATOM 1039 C4' C D 73 127.176 141.666 160.568 1.00212.96 C
+ATOM 1040 O4' C D 73 127.507 140.872 161.733 1.00212.96 O
+ATOM 1041 C3' C D 73 127.711 140.847 159.410 1.00212.96 C
+ATOM 1042 O3' C D 73 127.909 141.612 158.238 1.00212.96 O
+ATOM 1043 C2' C D 73 128.993 140.259 159.985 1.00212.96 C
+ATOM 1044 O2' C D 73 130.052 141.200 159.925 1.00212.96 O
+ATOM 1045 C1' C D 73 128.612 140.044 161.451 1.00212.96 C
+ATOM 1046 N1 C D 73 128.281 138.629 161.762 1.00212.96 N
+ATOM 1047 C2 C D 73 129.256 137.872 162.413 1.00212.96 C
+ATOM 1048 O2 C D 73 130.332 138.413 162.690 1.00212.96 O
+ATOM 1049 N3 C D 73 129.018 136.576 162.715 1.00212.96 N
+ATOM 1050 C4 C D 73 127.852 136.022 162.410 1.00212.96 C
+ATOM 1051 N4 C D 73 127.658 134.742 162.715 1.00212.96 N
+ATOM 1052 C5 C D 73 126.838 136.757 161.743 1.00212.96 C
+ATOM 1053 C6 C D 73 127.089 138.037 161.447 1.00212.96 C
+ATOM 1054 P U D 74 128.289 140.883 156.861 1.00210.03 P
+ATOM 1055 OP1 U D 74 127.813 141.720 155.732 1.00210.03 O
+ATOM 1056 OP2 U D 74 127.864 139.464 156.952 1.00210.03 O
+ATOM 1057 O5' U D 74 129.877 140.901 156.852 1.00210.03 O
+ATOM 1058 C5' U D 74 130.598 139.915 156.141 1.00210.03 C
+ATOM 1059 C4' U D 74 131.886 139.561 156.833 1.00210.03 C
+ATOM 1060 O4' U D 74 131.639 139.234 158.228 1.00210.03 O
+ATOM 1061 C3' U D 74 132.590 138.335 156.287 1.00210.03 C
+ATOM 1062 O3' U D 74 133.335 138.613 155.117 1.00210.03 O
+ATOM 1063 C2' U D 74 133.444 137.886 157.462 1.00210.03 C
+ATOM 1064 O2' U D 74 134.622 138.672 157.549 1.00210.03 O
+ATOM 1065 C1' U D 74 132.538 138.227 158.653 1.00210.03 C
+ATOM 1066 N1 U D 74 131.750 137.066 159.155 1.00210.03 N
+ATOM 1067 C2 U D 74 132.337 135.926 159.708 1.00210.03 C
+ATOM 1068 O2 U D 74 133.534 135.707 159.806 1.00210.03 O
+ATOM 1069 N3 U D 74 131.450 134.956 160.111 1.00210.03 N
+ATOM 1070 C4 U D 74 130.076 134.995 160.080 1.00210.03 C
+ATOM 1071 O4 U D 74 129.431 134.037 160.501 1.00210.03 O
+ATOM 1072 C5 U D 74 129.548 136.193 159.520 1.00210.03 C
+ATOM 1073 C6 U D 74 130.381 137.147 159.110 1.00210.03 C
+ATOM 1074 P G D 75 133.115 137.721 153.803 1.00191.15 P
+ATOM 1075 OP1 G D 75 133.853 138.353 152.684 1.00191.15 O
+ATOM 1076 OP2 G D 75 131.660 137.459 153.674 1.00191.15 O
+ATOM 1077 O5' G D 75 133.844 136.352 154.152 1.00191.15 O
+ATOM 1078 C5' G D 75 135.217 136.338 154.509 1.00191.15 C
+ATOM 1079 C4' G D 75 135.588 135.057 155.204 1.00191.15 C
+ATOM 1080 O4' G D 75 134.939 134.987 156.496 1.00191.15 O
+ATOM 1081 C3' G D 75 135.155 133.781 154.509 1.00191.15 C
+ATOM 1082 O3' G D 75 136.038 133.409 153.472 1.00191.15 O
+ATOM 1083 C2' G D 75 135.122 132.779 155.649 1.00191.15 C
+ATOM 1084 O2' G D 75 136.435 132.334 155.949 1.00191.15 O
+ATOM 1085 C1' G D 75 134.636 133.644 156.805 1.00191.15 C
+ATOM 1086 N9 G D 75 133.187 133.526 157.032 1.00191.15 N
+ATOM 1087 C8 G D 75 132.260 134.508 156.839 1.00191.15 C
+ATOM 1088 N7 G D 75 131.053 134.139 157.152 1.00191.15 N
+ATOM 1089 C5 G D 75 131.194 132.826 157.558 1.00191.15 C
+ATOM 1090 C6 G D 75 130.223 131.905 157.998 1.00191.15 C
+ATOM 1091 O6 G D 75 129.011 132.076 158.118 1.00191.15 O
+ATOM 1092 N1 G D 75 130.778 130.679 158.309 1.00191.15 N
+ATOM 1093 C2 G D 75 132.106 130.387 158.206 1.00191.15 C
+ATOM 1094 N2 G D 75 132.463 129.149 158.541 1.00191.15 N
+ATOM 1095 N3 G D 75 133.026 131.236 157.800 1.00191.15 N
+ATOM 1096 C4 G D 75 132.503 132.431 157.492 1.00191.15 C
+ATOM 1097 P U D 76 135.475 132.710 152.145 1.00179.78 P
+ATOM 1098 OP1 U D 76 136.648 132.501 151.267 1.00179.78 O
+ATOM 1099 OP2 U D 76 134.302 133.478 151.660 1.00179.78 O
+ATOM 1100 O5' U D 76 134.981 131.280 152.639 1.00179.78 O
+ATOM 1101 C5' U D 76 135.917 130.316 153.088 1.00179.78 C
+ATOM 1102 C4' U D 76 135.236 129.089 153.626 1.00179.78 C
+ATOM 1103 O4' U D 76 134.386 129.423 154.746 1.00179.78 O
+ATOM 1104 C3' U D 76 134.305 128.384 152.668 1.00179.78 C
+ATOM 1105 O3' U D 76 135.009 127.581 151.747 1.00179.78 O
+ATOM 1106 C2' U D 76 133.420 127.580 153.603 1.00179.78 C
+ATOM 1107 O2' U D 76 134.084 126.398 154.010 1.00179.78 O
+ATOM 1108 C1' U D 76 133.317 128.507 154.814 1.00179.78 C
+ATOM 1109 N1 U D 76 132.040 129.247 154.864 1.00179.78 N
+ATOM 1110 C2 U D 76 130.924 128.584 155.329 1.00179.78 C
+ATOM 1111 O2 U D 76 130.927 127.427 155.690 1.00179.78 O
+ATOM 1112 N3 U D 76 129.775 129.314 155.366 1.00179.78 N
+ATOM 1113 C4 U D 76 129.632 130.621 154.991 1.00179.78 C
+ATOM 1114 O4 U D 76 128.530 131.154 155.079 1.00179.78 O
+ATOM 1115 C5 U D 76 130.830 131.244 154.528 1.00179.78 C
+ATOM 1116 C6 U D 76 131.969 130.554 154.484 1.00179.78 C
+ATOM 1117 P G D 77 134.707 127.702 150.179 1.00167.79 P
+ATOM 1118 OP1 G D 77 135.978 128.031 149.496 1.00167.79 O
+ATOM 1119 OP2 G D 77 133.502 128.545 150.005 1.00167.79 O
+ATOM 1120 O5' G D 77 134.315 126.225 149.763 1.00167.79 O
+ATOM 1121 C5' G D 77 134.239 125.209 150.744 1.00167.79 C
+ATOM 1122 C4' G D 77 132.890 124.552 150.727 1.00167.79 C
+ATOM 1123 O4' G D 77 132.074 125.033 151.817 1.00167.79 O
+ATOM 1124 C3' G D 77 132.047 124.828 149.504 1.00167.79 C
+ATOM 1125 O3' G D 77 132.430 124.037 148.398 1.00167.79 O
+ATOM 1126 C2' G D 77 130.644 124.521 149.996 1.00167.79 C
+ATOM 1127 O2' G D 77 130.387 123.130 149.915 1.00167.79 O
+ATOM 1128 C1' G D 77 130.715 124.922 151.464 1.00167.79 C
+ATOM 1129 N9 G D 77 130.009 126.184 151.737 1.00167.79 N
+ATOM 1130 C8 G D 77 130.476 127.466 151.676 1.00167.79 C
+ATOM 1131 N7 G D 77 129.568 128.347 151.994 1.00167.79 N
+ATOM 1132 C5 G D 77 128.439 127.598 152.271 1.00167.79 C
+ATOM 1133 C6 G D 77 127.143 127.994 152.664 1.00167.79 C
+ATOM 1134 O6 G D 77 126.714 129.131 152.863 1.00167.79 O
+ATOM 1135 N1 G D 77 126.302 126.914 152.845 1.00167.79 N
+ATOM 1136 C2 G D 77 126.668 125.613 152.662 1.00167.79 C
+ATOM 1137 N2 G D 77 125.729 124.690 152.877 1.00167.79 N
+ATOM 1138 N3 G D 77 127.870 125.228 152.290 1.00167.79 N
+ATOM 1139 C4 G D 77 128.697 126.266 152.112 1.00167.79 C
+ATOM 1140 P C D 78 131.707 124.225 146.979 1.00162.77 P
+ATOM 1141 OP1 C D 78 132.628 123.739 145.927 1.00162.77 O
+ATOM 1142 OP2 C D 78 131.168 125.601 146.896 1.00162.77 O
+ATOM 1143 O5' C D 78 130.466 123.237 147.042 1.00162.77 O
+ATOM 1144 C5' C D 78 129.244 123.595 146.429 1.00162.77 C
+ATOM 1145 C4' C D 78 128.062 123.070 147.190 1.00162.77 C
+ATOM 1146 O4' C D 78 127.999 123.693 148.491 1.00162.77 O
+ATOM 1147 C3' C D 78 126.731 123.394 146.563 1.00162.77 C
+ATOM 1148 O3' C D 78 126.396 122.453 145.574 1.00162.77 O
+ATOM 1149 C2' C D 78 125.782 123.374 147.744 1.00162.77 C
+ATOM 1150 O2' C D 78 125.448 122.037 148.072 1.00162.77 O
+ATOM 1151 C1' C D 78 126.664 123.937 148.854 1.00162.77 C
+ATOM 1152 N1 C D 78 126.504 125.399 149.070 1.00162.77 N
+ATOM 1153 C2 C D 78 125.357 125.970 149.644 1.00162.77 C
+ATOM 1154 O2 C D 78 124.401 125.267 149.979 1.00162.77 O
+ATOM 1155 N3 C D 78 125.299 127.307 149.821 1.00162.77 N
+ATOM 1156 C4 C D 78 126.319 128.073 149.465 1.00162.77 C
+ATOM 1157 N4 C D 78 126.234 129.386 149.655 1.00162.77 N
+ATOM 1158 C5 C D 78 127.499 127.535 148.906 1.00162.77 C
+ATOM 1159 C6 C D 78 127.542 126.217 148.740 1.00162.77 C
+ATOM 1160 P U D 79 126.113 122.934 144.080 1.00163.57 P
+ATOM 1161 OP1 U D 79 126.568 121.846 143.179 1.00163.57 O
+ATOM 1162 OP2 U D 79 126.664 124.299 143.920 1.00163.57 O
+ATOM 1163 O5' U D 79 124.529 123.033 144.044 1.00163.57 O
+ATOM 1164 C5' U D 79 123.747 122.175 144.860 1.00163.57 C
+ATOM 1165 C4' U D 79 122.441 122.817 145.219 1.00163.57 C
+ATOM 1166 O4' U D 79 122.576 123.628 146.409 1.00163.57 O
+ATOM 1167 C3' U D 79 121.902 123.775 144.187 1.00163.57 C
+ATOM 1168 O3' U D 79 121.285 123.090 143.121 1.00163.57 O
+ATOM 1169 C2' U D 79 120.950 124.635 145.001 1.00163.57 C
+ATOM 1170 O2' U D 79 119.719 123.960 145.188 1.00163.57 O
+ATOM 1171 C1' U D 79 121.670 124.706 146.350 1.00163.57 C
+ATOM 1172 N1 U D 79 122.418 125.967 146.557 1.00163.57 N
+ATOM 1173 C2 U D 79 121.741 127.094 146.968 1.00163.57 C
+ATOM 1174 O2 U D 79 120.543 127.112 147.143 1.00163.57 O
+ATOM 1175 N3 U D 79 122.510 128.213 147.154 1.00163.57 N
+ATOM 1176 C4 U D 79 123.869 128.304 146.987 1.00163.57 C
+ATOM 1177 O4 U D 79 124.451 129.365 147.187 1.00163.57 O
+ATOM 1178 C5 U D 79 124.496 127.098 146.575 1.00163.57 C
+ATOM 1179 C6 U D 79 123.770 126.001 146.386 1.00163.57 C
+ATOM 1180 P G D 80 120.873 123.880 141.798 1.00172.07 P
+ATOM 1181 OP1 G D 80 120.890 122.939 140.658 1.00172.07 O
+ATOM 1182 OP2 G D 80 121.679 125.118 141.739 1.00172.07 O
+ATOM 1183 O5' G D 80 119.365 124.279 142.078 1.00172.07 O
+ATOM 1184 C5' G D 80 118.773 125.361 141.392 1.00172.07 C
+ATOM 1185 C4' G D 80 117.800 126.094 142.271 1.00172.07 C
+ATOM 1186 O4' G D 80 118.403 126.366 143.559 1.00172.07 O
+ATOM 1187 C3' G D 80 117.384 127.454 141.755 1.00172.07 C
+ATOM 1188 O3' G D 80 116.338 127.355 140.814 1.00172.07 O
+ATOM 1189 C2' G D 80 116.994 128.191 143.023 1.00172.07 C
+ATOM 1190 O2' G D 80 115.707 127.781 143.451 1.00172.07 O
+ATOM 1191 C1' G D 80 118.014 127.641 144.014 1.00172.07 C
+ATOM 1192 N9 G D 80 119.233 128.462 144.120 1.00172.07 N
+ATOM 1193 C8 G D 80 120.484 127.927 144.005 1.00172.07 C
+ATOM 1194 N7 G D 80 121.450 128.773 144.152 1.00172.07 N
+ATOM 1195 C5 G D 80 120.791 129.955 144.381 1.00172.07 C
+ATOM 1196 C6 G D 80 121.349 131.219 144.609 1.00172.07 C
+ATOM 1197 O6 G D 80 122.542 131.516 144.636 1.00172.07 O
+ATOM 1198 N1 G D 80 120.369 132.170 144.798 1.00172.07 N
+ATOM 1199 C2 G D 80 119.024 131.927 144.779 1.00172.07 C
+ATOM 1200 N2 G D 80 118.229 132.978 144.990 1.00172.07 N
+ATOM 1201 N3 G D 80 118.489 130.742 144.570 1.00172.07 N
+ATOM 1202 C4 G D 80 119.424 129.797 144.383 1.00172.07 C
+ATOM 1203 P C D 81 116.624 127.669 139.271 1.00182.40 P
+ATOM 1204 OP1 C D 81 115.798 126.757 138.447 1.00182.40 O
+ATOM 1205 OP2 C D 81 118.092 127.707 139.076 1.00182.40 O
+ATOM 1206 O5' C D 81 116.064 129.144 139.102 1.00182.40 O
+ATOM 1207 C5' C D 81 114.911 129.546 139.820 1.00182.40 C
+ATOM 1208 C4' C D 81 115.007 130.983 140.254 1.00182.40 C
+ATOM 1209 O4' C D 81 115.904 131.109 141.387 1.00182.40 O
+ATOM 1210 C3' C D 81 115.582 131.939 139.231 1.00182.40 C
+ATOM 1211 O3' C D 81 114.645 132.320 138.244 1.00182.40 O
+ATOM 1212 C2' C D 81 116.049 133.096 140.094 1.00182.40 C
+ATOM 1213 O2' C D 81 114.954 133.919 140.454 1.00182.40 O
+ATOM 1214 C1' C D 81 116.539 132.370 141.349 1.00182.40 C
+ATOM 1215 N1 C D 81 118.010 132.176 141.381 1.00182.40 N
+ATOM 1216 C2 C D 81 118.868 133.246 141.677 1.00182.40 C
+ATOM 1217 O2 C D 81 118.389 134.365 141.896 1.00182.40 O
+ATOM 1218 N3 C D 81 120.199 133.036 141.722 1.00182.40 N
+ATOM 1219 C4 C D 81 120.682 131.827 141.477 1.00182.40 C
+ATOM 1220 N4 C D 81 122.001 131.659 141.524 1.00182.40 N
+ATOM 1221 C5 C D 81 119.843 130.724 141.183 1.00182.40 C
+ATOM 1222 C6 C D 81 118.530 130.939 141.156 1.00182.40 C
+ATOM 1223 P U D 82 115.115 132.490 136.720 1.00181.36 P
+ATOM 1224 OP1 U D 82 113.931 132.349 135.841 1.00181.36 O
+ATOM 1225 OP2 U D 82 116.298 131.622 136.516 1.00181.36 O
+ATOM 1226 O5' U D 82 115.620 133.994 136.639 1.00181.36 O
+ATOM 1227 C5' U D 82 117.002 134.293 136.719 1.00181.36 C
+ATOM 1228 C4' U D 82 117.244 135.552 137.505 1.00181.36 C
+ATOM 1229 O4' U D 82 118.018 135.248 138.692 1.00181.36 O
+ATOM 1230 C3' U D 82 118.056 136.611 136.788 1.00181.36 C
+ATOM 1231 O3' U D 82 117.256 137.409 135.937 1.00181.36 O
+ATOM 1232 C2' U D 82 118.675 137.396 137.931 1.00181.36 C
+ATOM 1233 O2' U D 82 117.742 138.326 138.454 1.00181.36 O
+ATOM 1234 C1' U D 82 118.903 136.304 138.978 1.00181.36 C
+ATOM 1235 N1 U D 82 120.280 135.750 139.004 1.00181.36 N
+ATOM 1236 C2 U D 82 121.407 136.515 139.281 1.00181.36 C
+ATOM 1237 O2 U D 82 121.435 137.712 139.492 1.00181.36 O
+ATOM 1238 N3 U D 82 122.590 135.831 139.289 1.00181.36 N
+ATOM 1239 C4 U D 82 122.773 134.486 139.090 1.00181.36 C
+ATOM 1240 O4 U D 82 123.908 134.019 139.127 1.00181.36 O
+ATOM 1241 C5 U D 82 121.578 133.760 138.826 1.00181.36 C
+ATOM 1242 C6 U D 82 120.413 134.400 138.806 1.00181.36 C
+ATOM 1243 P U D 83 117.665 137.610 134.400 1.00175.26 P
+ATOM 1244 OP1 U D 83 116.607 138.411 133.742 1.00175.26 O
+ATOM 1245 OP2 U D 83 118.051 136.288 133.851 1.00175.26 O
+ATOM 1246 O5' U D 83 118.963 138.522 134.473 1.00175.26 O
+ATOM 1247 C5' U D 83 118.921 139.783 135.116 1.00175.26 C
+ATOM 1248 C4' U D 83 120.306 140.259 135.441 1.00175.26 C
+ATOM 1249 O4' U D 83 120.897 139.420 136.463 1.00175.26 O
+ATOM 1250 C3' U D 83 121.288 140.179 134.296 1.00175.26 C
+ATOM 1251 O3' U D 83 121.148 141.258 133.400 1.00175.26 O
+ATOM 1252 C2' U D 83 122.628 140.144 135.007 1.00175.26 C
+ATOM 1253 O2' U D 83 123.009 141.450 135.406 1.00175.26 O
+ATOM 1254 C1' U D 83 122.291 139.336 136.261 1.00175.26 C
+ATOM 1255 N1 U D 83 122.672 137.910 136.154 1.00175.26 N
+ATOM 1256 C2 U D 83 124.001 137.554 136.285 1.00175.26 C
+ATOM 1257 O2 U D 83 124.911 138.340 136.470 1.00175.26 O
+ATOM 1258 N3 U D 83 124.257 136.218 136.186 1.00175.26 N
+ATOM 1259 C4 U D 83 123.345 135.219 135.974 1.00175.26 C
+ATOM 1260 O4 U D 83 123.736 134.057 135.916 1.00175.26 O
+ATOM 1261 C5 U D 83 121.994 135.659 135.868 1.00175.26 C
+ATOM 1262 C6 U D 83 121.712 136.956 135.954 1.00175.26 C
+ATOM 1263 P U D 84 121.454 141.037 131.847 1.00169.99 P
+ATOM 1264 OP1 U D 84 120.694 142.050 131.076 1.00169.99 O
+ATOM 1265 OP2 U D 84 121.281 139.595 131.554 1.00169.99 O
+ATOM 1266 O5' U D 84 122.995 141.392 131.732 1.00169.99 O
+ATOM 1267 C5' U D 84 123.492 142.571 132.334 1.00169.99 C
+ATOM 1268 C4' U D 84 124.988 142.533 132.442 1.00169.99 C
+ATOM 1269 O4' U D 84 125.396 141.592 133.467 1.00169.99 O
+ATOM 1270 C3' U D 84 125.718 142.049 131.210 1.00169.99 C
+ATOM 1271 O3' U D 84 125.789 143.023 130.192 1.00169.99 O
+ATOM 1272 C2' U D 84 127.063 141.637 131.778 1.00169.99 C
+ATOM 1273 O2' U D 84 127.861 142.780 132.037 1.00169.99 O
+ATOM 1274 C1' U D 84 126.645 141.030 133.120 1.00169.99 C
+ATOM 1275 N1 U D 84 126.513 139.556 133.058 1.00169.99 N
+ATOM 1276 C2 U D 84 127.662 138.802 133.142 1.00169.99 C
+ATOM 1277 O2 U D 84 128.772 139.280 133.258 1.00169.99 O
+ATOM 1278 N3 U D 84 127.484 137.451 133.078 1.00169.99 N
+ATOM 1279 C4 U D 84 126.293 136.786 132.947 1.00169.99 C
+ATOM 1280 O4 U D 84 126.294 135.561 132.904 1.00169.99 O
+ATOM 1281 C5 U D 84 125.145 137.624 132.871 1.00169.99 C
+ATOM 1282 C6 U D 84 125.295 138.944 132.929 1.00169.99 C
+ATOM 1283 P A D 85 125.055 142.745 128.796 1.00160.95 P
+ATOM 1284 OP1 A D 85 124.637 144.046 128.226 1.00160.95 O
+ATOM 1285 OP2 A D 85 124.055 141.674 129.012 1.00160.95 O
+ATOM 1286 O5' A D 85 126.212 142.162 127.882 1.00160.95 O
+ATOM 1287 C5' A D 85 127.368 142.939 127.633 1.00160.95 C
+ATOM 1288 C4' A D 85 128.608 142.090 127.594 1.00160.95 C
+ATOM 1289 O4' A D 85 128.731 141.332 128.819 1.00160.95 O
+ATOM 1290 C3' A D 85 128.659 141.037 126.508 1.00160.95 C
+ATOM 1291 O3' A D 85 129.037 141.578 125.259 1.00160.95 O
+ATOM 1292 C2' A D 85 129.674 140.050 127.060 1.00160.95 C
+ATOM 1293 O2' A D 85 130.988 140.535 126.846 1.00160.95 O
+ATOM 1294 C1' A D 85 129.383 140.111 128.556 1.00160.95 C
+ATOM 1295 N9 A D 85 128.536 139.002 129.030 1.00160.95 N
+ATOM 1296 C8 A D 85 127.224 139.054 129.417 1.00160.95 C
+ATOM 1297 N7 A D 85 126.739 137.909 129.825 1.00160.95 N
+ATOM 1298 C5 A D 85 127.808 137.048 129.695 1.00160.95 C
+ATOM 1299 C6 A D 85 127.946 135.686 129.956 1.00160.95 C
+ATOM 1300 N6 A D 85 126.955 134.935 130.420 1.00160.95 N
+ATOM 1301 N1 A D 85 129.143 135.117 129.727 1.00160.95 N
+ATOM 1302 C2 A D 85 130.136 135.879 129.261 1.00160.95 C
+ATOM 1303 N3 A D 85 130.126 137.174 128.973 1.00160.95 N
+ATOM 1304 C4 A D 85 128.921 137.702 129.216 1.00160.95 C
+ATOM 1305 P G D 86 128.341 141.056 123.915 1.00149.79 P
+ATOM 1306 OP1 G D 86 129.165 141.509 122.772 1.00149.79 O
+ATOM 1307 OP2 G D 86 126.903 141.396 123.973 1.00149.79 O
+ATOM 1308 O5' G D 86 128.464 139.476 124.019 1.00149.79 O
+ATOM 1309 C5' G D 86 129.044 138.721 122.970 1.00149.79 C
+ATOM 1310 C4' G D 86 130.248 137.954 123.448 1.00149.79 C
+ATOM 1311 O4' G D 86 130.136 137.688 124.866 1.00149.79 O
+ATOM 1312 C3' G D 86 130.438 136.586 122.828 1.00149.79 C
+ATOM 1313 O3' G D 86 131.071 136.664 121.570 1.00149.79 O
+ATOM 1314 C2' G D 86 131.262 135.853 123.873 1.00149.79 C
+ATOM 1315 O2' G D 86 132.623 136.232 123.771 1.00149.79 O
+ATOM 1316 C1' G D 86 130.707 136.439 125.171 1.00149.79 C
+ATOM 1317 N9 G D 86 129.667 135.607 125.807 1.00149.79 N
+ATOM 1318 C8 G D 86 128.404 136.046 126.101 1.00149.79 C
+ATOM 1319 N7 G D 86 127.658 135.159 126.683 1.00149.79 N
+ATOM 1320 C5 G D 86 128.486 134.066 126.789 1.00149.79 C
+ATOM 1321 C6 G D 86 128.221 132.807 127.349 1.00149.79 C
+ATOM 1322 O6 G D 86 127.174 132.401 127.847 1.00149.79 O
+ATOM 1323 N1 G D 86 129.305 131.959 127.258 1.00149.79 N
+ATOM 1324 C2 G D 86 130.510 132.309 126.732 1.00149.79 C
+ATOM 1325 N2 G D 86 131.454 131.366 126.746 1.00149.79 N
+ATOM 1326 N3 G D 86 130.778 133.485 126.208 1.00149.79 N
+ATOM 1327 C4 G D 86 129.728 134.317 126.270 1.00149.79 C
+ATOM 1328 P U D 87 130.409 135.948 120.304 1.00132.65 P
+ATOM 1329 OP1 U D 87 131.008 136.544 119.087 1.00132.65 O
+ATOM 1330 OP2 U D 87 128.941 135.955 120.487 1.00132.65 O
+ATOM 1331 O5' U D 87 130.921 134.451 120.428 1.00132.65 O
+ATOM 1332 C5' U D 87 132.302 134.193 120.575 1.00132.65 C
+ATOM 1333 C4' U D 87 132.557 132.860 121.218 1.00132.65 C
+ATOM 1334 O4' U D 87 132.054 132.849 122.574 1.00132.65 O
+ATOM 1335 C3' U D 87 131.887 131.667 120.576 1.00132.65 C
+ATOM 1336 O3' U D 87 132.590 131.211 119.439 1.00132.65 O
+ATOM 1337 C2' U D 87 131.886 130.654 121.703 1.00132.65 C
+ATOM 1338 O2' U D 87 133.147 130.024 121.793 1.00132.65 O
+ATOM 1339 C1' U D 87 131.708 131.534 122.936 1.00132.65 C
+ATOM 1340 N1 U D 87 130.328 131.512 123.456 1.00132.65 N
+ATOM 1341 C2 U D 87 129.894 130.401 124.145 1.00132.65 C
+ATOM 1342 O2 U D 87 130.583 129.431 124.370 1.00132.65 O
+ATOM 1343 N3 U D 87 128.613 130.457 124.607 1.00132.65 N
+ATOM 1344 C4 U D 87 127.736 131.494 124.428 1.00132.65 C
+ATOM 1345 O4 U D 87 126.607 131.418 124.898 1.00132.65 O
+ATOM 1346 C5 U D 87 128.255 132.605 123.710 1.00132.65 C
+ATOM 1347 C6 U D 87 129.504 132.578 123.259 1.00132.65 C
+ATOM 1348 P G D 88 131.850 130.375 118.289 1.00125.42 P
+ATOM 1349 OP1 G D 88 132.824 130.211 117.192 1.00125.42 O
+ATOM 1350 OP2 G D 88 130.554 131.022 117.996 1.00125.42 O
+ATOM 1351 O5' G D 88 131.607 128.935 118.929 1.00125.42 O
+ATOM 1352 C5' G D 88 132.702 128.074 119.191 1.00125.42 C
+ATOM 1353 C4' G D 88 132.351 126.957 120.142 1.00125.42 C
+ATOM 1354 O4' G D 88 131.723 127.466 121.339 1.00125.42 O
+ATOM 1355 C3' G D 88 131.358 125.943 119.642 1.00125.42 C
+ATOM 1356 O3' G D 88 131.930 125.037 118.755 1.00125.42 O
+ATOM 1357 C2' G D 88 130.881 125.302 120.920 1.00125.42 C
+ATOM 1358 O2' G D 88 131.881 124.442 121.434 1.00125.42 O
+ATOM 1359 C1' G D 88 130.800 126.516 121.827 1.00125.42 C
+ATOM 1360 N9 G D 88 129.462 127.119 121.812 1.00125.42 N
+ATOM 1361 C8 G D 88 129.160 128.371 121.366 1.00125.42 C
+ATOM 1362 N7 G D 88 127.903 128.662 121.478 1.00125.42 N
+ATOM 1363 C5 G D 88 127.358 127.528 122.033 1.00125.42 C
+ATOM 1364 C6 G D 88 126.030 127.272 122.381 1.00125.42 C
+ATOM 1365 O6 G D 88 125.065 128.019 122.253 1.00125.42 O
+ATOM 1366 N1 G D 88 125.874 126.014 122.920 1.00125.42 N
+ATOM 1367 C2 G D 88 126.880 125.116 123.096 1.00125.42 C
+ATOM 1368 N2 G D 88 126.542 123.946 123.631 1.00125.42 N
+ATOM 1369 N3 G D 88 128.130 125.342 122.773 1.00125.42 N
+ATOM 1370 C4 G D 88 128.295 126.563 122.247 1.00125.42 C
+ATOM 1371 P U D 89 131.531 125.186 117.234 1.00122.34 P
+ATOM 1372 OP1 U D 89 132.796 125.209 116.487 1.00122.34 O
+ATOM 1373 OP2 U D 89 130.631 126.352 117.161 1.00122.34 O
+ATOM 1374 O5' U D 89 130.713 123.863 116.933 1.00122.34 O
+ATOM 1375 C5' U D 89 131.336 122.615 117.136 1.00122.34 C
+ATOM 1376 C4' U D 89 130.460 121.649 117.877 1.00122.34 C
+ATOM 1377 O4' U D 89 129.944 122.238 119.088 1.00122.34 O
+ATOM 1378 C3' U D 89 129.224 121.180 117.157 1.00122.34 C
+ATOM 1379 O3' U D 89 129.523 120.202 116.189 1.00122.34 O
+ATOM 1380 C2' U D 89 128.381 120.643 118.296 1.00122.34 C
+ATOM 1381 O2' U D 89 128.845 119.360 118.661 1.00122.34 O
+ATOM 1382 C1' U D 89 128.736 121.605 119.430 1.00122.34 C
+ATOM 1383 N1 U D 89 127.692 122.621 119.654 1.00122.34 N
+ATOM 1384 C2 U D 89 126.576 122.267 120.371 1.00122.34 C
+ATOM 1385 O2 U D 89 126.393 121.166 120.837 1.00122.34 O
+ATOM 1386 N3 U D 89 125.655 123.251 120.537 1.00122.34 N
+ATOM 1387 C4 U D 89 125.732 124.530 120.067 1.00122.34 C
+ATOM 1388 O4 U D 89 124.818 125.307 120.298 1.00122.34 O
+ATOM 1389 C5 U D 89 126.916 124.828 119.349 1.00122.34 C
+ATOM 1390 C6 U D 89 127.834 123.883 119.168 1.00122.34 C
+ATOM 1391 P G D 90 128.409 119.780 115.129 1.00127.32 P
+ATOM 1392 OP1 G D 90 127.680 118.627 115.683 1.00127.32 O
+ATOM 1393 OP2 G D 90 129.004 119.673 113.782 1.00127.32 O
+ATOM 1394 O5' G D 90 127.419 121.014 115.146 1.00127.32 O
+ATOM 1395 C5' G D 90 126.174 120.945 114.482 1.00127.32 C
+ATOM 1396 C4' G D 90 125.052 120.664 115.442 1.00127.32 C
+ATOM 1397 O4' G D 90 125.101 121.577 116.561 1.00127.32 O
+ATOM 1398 C3' G D 90 123.665 120.844 114.874 1.00127.32 C
+ATOM 1399 O3' G D 90 123.258 119.694 114.171 1.00127.32 O
+ATOM 1400 C2' G D 90 122.819 121.103 116.108 1.00127.32 C
+ATOM 1401 O2' G D 90 122.463 119.881 116.723 1.00127.32 O
+ATOM 1402 C1' G D 90 123.798 121.827 117.031 1.00127.32 C
+ATOM 1403 N9 G D 90 123.592 123.283 117.098 1.00127.32 N
+ATOM 1404 C8 G D 90 124.594 124.191 116.947 1.00127.32 C
+ATOM 1405 N7 G D 90 124.209 125.421 117.065 1.00127.32 N
+ATOM 1406 C5 G D 90 122.861 125.321 117.321 1.00127.32 C
+ATOM 1407 C6 G D 90 121.911 126.343 117.528 1.00127.32 C
+ATOM 1408 O6 G D 90 122.075 127.564 117.541 1.00127.32 O
+ATOM 1409 N1 G D 90 120.653 125.836 117.770 1.00127.32 N
+ATOM 1410 C2 G D 90 120.353 124.510 117.781 1.00127.32 C
+ATOM 1411 N2 G D 90 119.078 124.205 118.015 1.00127.32 N
+ATOM 1412 N3 G D 90 121.222 123.549 117.590 1.00127.32 N
+ATOM 1413 C4 G D 90 122.459 124.012 117.360 1.00127.32 C
+ATOM 1414 P A D 91 122.684 119.812 112.686 1.00125.62 P
+ATOM 1415 OP1 A D 91 122.643 118.425 112.176 1.00125.62 O
+ATOM 1416 OP2 A D 91 123.453 120.840 111.955 1.00125.62 O
+ATOM 1417 O5' A D 91 121.204 120.358 112.893 1.00125.62 O
+ATOM 1418 C5' A D 91 120.309 119.681 113.757 1.00125.62 C
+ATOM 1419 C4' A D 91 119.066 120.483 114.019 1.00125.62 C
+ATOM 1420 O4' A D 91 119.325 121.517 114.990 1.00125.62 O
+ATOM 1421 C3' A D 91 118.506 121.216 112.826 1.00125.62 C
+ATOM 1422 O3' A D 91 117.704 120.354 112.054 1.00125.62 O
+ATOM 1423 C2' A D 91 117.719 122.357 113.455 1.00125.62 C
+ATOM 1424 O2' A D 91 116.424 121.914 113.815 1.00125.62 O
+ATOM 1425 C1' A D 91 118.503 122.629 114.736 1.00125.62 C
+ATOM 1426 N9 A D 91 119.361 123.817 114.669 1.00125.62 N
+ATOM 1427 C8 A D 91 120.712 123.783 114.509 1.00125.62 C
+ATOM 1428 N7 A D 91 121.275 124.956 114.519 1.00125.62 N
+ATOM 1429 C5 A D 91 120.217 125.811 114.690 1.00125.62 C
+ATOM 1430 C6 A D 91 120.173 127.191 114.794 1.00125.62 C
+ATOM 1431 N6 A D 91 121.264 127.930 114.697 1.00125.62 N
+ATOM 1432 N1 A D 91 118.987 127.791 114.970 1.00125.62 N
+ATOM 1433 C2 A D 91 117.903 127.011 115.062 1.00125.62 C
+ATOM 1434 N3 A D 91 117.823 125.682 114.985 1.00125.62 N
+ATOM 1435 C4 A D 91 119.029 125.137 114.804 1.00125.62 C
+ATOM 1436 P C D 92 117.301 120.752 110.564 1.00125.68 P
+ATOM 1437 OP1 C D 92 116.890 119.514 109.866 1.00125.68 O
+ATOM 1438 OP2 C D 92 118.397 121.579 110.018 1.00125.68 O
+ATOM 1439 O5' C D 92 116.023 121.679 110.749 1.00125.68 O
+ATOM 1440 C5' C D 92 115.760 122.730 109.838 1.00125.68 C
+ATOM 1441 C4' C D 92 114.971 123.843 110.472 1.00125.68 C
+ATOM 1442 O4' C D 92 115.656 124.326 111.658 1.00125.68 O
+ATOM 1443 C3' C D 92 114.822 125.077 109.612 1.00125.68 C
+ATOM 1444 O3' C D 92 113.788 124.945 108.664 1.00125.68 O
+ATOM 1445 C2' C D 92 114.577 126.167 110.634 1.00125.68 C
+ATOM 1446 O2' C D 92 113.245 126.097 111.110 1.00125.68 O
+ATOM 1447 C1' C D 92 115.500 125.726 111.767 1.00125.68 C
+ATOM 1448 N1 C D 92 116.848 126.347 111.716 1.00125.68 N
+ATOM 1449 C2 C D 92 117.105 127.723 111.882 1.00125.68 C
+ATOM 1450 O2 C D 92 116.189 128.532 112.073 1.00125.68 O
+ATOM 1451 N3 C D 92 118.388 128.155 111.840 1.00125.68 N
+ATOM 1452 C4 C D 92 119.383 127.304 111.640 1.00125.68 C
+ATOM 1453 N4 C D 92 120.633 127.750 111.605 1.00125.68 N
+ATOM 1454 C5 C D 92 119.159 125.923 111.482 1.00125.68 C
+ATOM 1455 C6 C D 92 117.903 125.507 111.539 1.00125.68 C
+ATOM 1456 P A D 93 114.150 124.955 107.107 1.00129.75 P
+ATOM 1457 OP1 A D 93 113.057 124.272 106.378 1.00129.75 O
+ATOM 1458 OP2 A D 93 115.540 124.462 106.975 1.00129.75 O
+ATOM 1459 O5' A D 93 114.123 126.499 106.731 1.00129.75 O
+ATOM 1460 C5' A D 93 112.943 127.254 106.931 1.00129.75 C
+ATOM 1461 C4' A D 93 113.230 128.725 107.046 1.00129.75 C
+ATOM 1462 O4' A D 93 113.945 129.007 108.270 1.00129.75 O
+ATOM 1463 C3' A D 93 114.111 129.313 105.969 1.00129.75 C
+ATOM 1464 O3' A D 93 113.396 129.569 104.781 1.00129.75 O
+ATOM 1465 C2' A D 93 114.635 130.572 106.631 1.00129.75 C
+ATOM 1466 O2' A D 93 113.644 131.581 106.607 1.00129.75 O
+ATOM 1467 C1' A D 93 114.803 130.107 108.075 1.00129.75 C
+ATOM 1468 N9 A D 93 116.179 129.679 108.353 1.00129.75 N
+ATOM 1469 C8 A D 93 116.629 128.413 108.600 1.00129.75 C
+ATOM 1470 N7 A D 93 117.915 128.348 108.816 1.00129.75 N
+ATOM 1471 C5 A D 93 118.320 129.659 108.705 1.00129.75 C
+ATOM 1472 C6 A D 93 119.561 130.264 108.808 1.00129.75 C
+ATOM 1473 N6 A D 93 120.657 129.582 109.091 1.00129.75 N
+ATOM 1474 N1 A D 93 119.656 131.598 108.652 1.00129.75 N
+ATOM 1475 C2 A D 93 118.540 132.280 108.411 1.00129.75 C
+ATOM 1476 N3 A D 93 117.311 131.812 108.257 1.00129.75 N
+ATOM 1477 C4 A D 93 117.271 130.489 108.423 1.00129.75 C
+ATOM 1478 P G D 94 114.153 129.509 103.374 1.00137.28 P
+ATOM 1479 OP1 G D 94 113.132 129.424 102.302 1.00137.28 O
+ATOM 1480 OP2 G D 94 115.191 128.457 103.489 1.00137.28 O
+ATOM 1481 O5' G D 94 114.880 130.921 103.275 1.00137.28 O
+ATOM 1482 C5' G D 94 114.130 132.122 103.312 1.00137.28 C
+ATOM 1483 C4' G D 94 115.025 133.330 103.286 1.00137.28 C
+ATOM 1484 O4' G D 94 115.735 133.454 104.541 1.00137.28 O
+ATOM 1485 C3' G D 94 116.126 133.308 102.250 1.00137.28 C
+ATOM 1486 O3' G D 94 115.665 133.663 100.965 1.00137.28 O
+ATOM 1487 C2' G D 94 117.124 134.295 102.821 1.00137.28 C
+ATOM 1488 O2' G D 94 116.672 135.618 102.601 1.00137.28 O
+ATOM 1489 C1' G D 94 117.012 134.003 104.313 1.00137.28 C
+ATOM 1490 N9 G D 94 118.021 133.036 104.765 1.00137.28 N
+ATOM 1491 C8 G D 94 117.798 131.738 105.128 1.00137.28 C
+ATOM 1492 N7 G D 94 118.882 131.115 105.492 1.00137.28 N
+ATOM 1493 C5 G D 94 119.878 132.065 105.361 1.00137.28 C
+ATOM 1494 C6 G D 94 121.262 131.977 105.620 1.00137.28 C
+ATOM 1495 O6 G D 94 121.925 131.021 106.020 1.00137.28 O
+ATOM 1496 N1 G D 94 121.909 133.162 105.354 1.00137.28 N
+ATOM 1497 C2 G D 94 121.290 134.284 104.907 1.00137.28 C
+ATOM 1498 N2 G D 94 122.093 135.326 104.714 1.00137.28 N
+ATOM 1499 N3 G D 94 120.003 134.395 104.667 1.00137.28 N
+ATOM 1500 C4 G D 94 119.360 133.253 104.920 1.00137.28 C
+ATOM 1501 P G D 95 116.256 132.917 99.679 1.00138.92 P
+ATOM 1502 OP1 G D 95 115.542 133.433 98.489 1.00138.92 O
+ATOM 1503 OP2 G D 95 116.257 131.464 99.968 1.00138.92 O
+ATOM 1504 O5' G D 95 117.761 133.427 99.604 1.00138.92 O
+ATOM 1505 C5' G D 95 118.050 134.706 99.072 1.00138.92 C
+ATOM 1506 C4' G D 95 119.424 135.187 99.455 1.00138.92 C
+ATOM 1507 O4' G D 95 119.692 134.904 100.847 1.00138.92 O
+ATOM 1508 C3' G D 95 120.592 134.562 98.726 1.00138.92 C
+ATOM 1509 O3' G D 95 120.786 135.134 97.450 1.00138.92 O
+ATOM 1510 C2' G D 95 121.747 134.836 99.672 1.00138.92 C
+ATOM 1511 O2' G D 95 122.190 136.170 99.522 1.00138.92 O
+ATOM 1512 C1' G D 95 121.073 134.706 101.035 1.00138.92 C
+ATOM 1513 N9 G D 95 121.279 133.376 101.618 1.00138.92 N
+ATOM 1514 C8 G D 95 120.303 132.448 101.828 1.00138.92 C
+ATOM 1515 N7 G D 95 120.754 131.347 102.347 1.00138.92 N
+ATOM 1516 C5 G D 95 122.108 131.569 102.476 1.00138.92 C
+ATOM 1517 C6 G D 95 123.101 130.718 102.987 1.00138.92 C
+ATOM 1518 O6 G D 95 122.968 129.576 103.426 1.00138.92 O
+ATOM 1519 N1 G D 95 124.347 131.309 102.955 1.00138.92 N
+ATOM 1520 C2 G D 95 124.591 132.565 102.491 1.00138.92 C
+ATOM 1521 N2 G D 95 125.865 132.953 102.546 1.00138.92 N
+ATOM 1522 N3 G D 95 123.670 133.380 102.022 1.00138.92 N
+ATOM 1523 C4 G D 95 122.454 132.816 102.039 1.00138.92 C
+ATOM 1524 P G D 96 121.449 134.277 96.271 1.00138.54 P
+ATOM 1525 OP1 G D 96 121.279 135.046 95.019 1.00138.54 O
+ATOM 1526 OP2 G D 96 120.931 132.894 96.356 1.00138.54 O
+ATOM 1527 O5' G D 96 123.000 134.266 96.625 1.00138.54 O
+ATOM 1528 C5' G D 96 123.743 135.472 96.632 1.00138.54 C
+ATOM 1529 C4' G D 96 125.077 135.304 97.313 1.00138.54 C
+ATOM 1530 O4' G D 96 124.901 134.768 98.649 1.00138.54 O
+ATOM 1531 C3' G D 96 126.039 134.336 96.658 1.00138.54 C
+ATOM 1532 O3' G D 96 126.710 134.909 95.555 1.00138.54 O
+ATOM 1533 C2' G D 96 126.967 133.978 97.805 1.00138.54 C
+ATOM 1534 O2' G D 96 127.882 135.034 98.034 1.00138.54 O
+ATOM 1535 C1' G D 96 125.991 133.935 98.979 1.00138.54 C
+ATOM 1536 N9 G D 96 125.471 132.578 99.230 1.00138.54 N
+ATOM 1537 C8 G D 96 124.150 132.240 99.157 1.00138.54 C
+ATOM 1538 N7 G D 96 123.910 130.994 99.426 1.00138.54 N
+ATOM 1539 C5 G D 96 125.155 130.470 99.679 1.00138.54 C
+ATOM 1540 C6 G D 96 125.501 129.152 100.014 1.00138.54 C
+ATOM 1541 O6 G D 96 124.751 128.188 100.149 1.00138.54 O
+ATOM 1542 N1 G D 96 126.857 129.001 100.202 1.00138.54 N
+ATOM 1543 C2 G D 96 127.761 130.008 100.074 1.00138.54 C
+ATOM 1544 N2 G D 96 129.027 129.652 100.296 1.00138.54 N
+ATOM 1545 N3 G D 96 127.454 131.254 99.762 1.00138.54 N
+ATOM 1546 C4 G D 96 126.134 131.423 99.579 1.00138.54 C
+ATOM 1547 P A D 97 126.977 134.033 94.244 1.00143.82 P
+ATOM 1548 OP1 A D 97 127.380 134.926 93.138 1.00143.82 O
+ATOM 1549 OP2 A D 97 125.811 133.142 94.072 1.00143.82 O
+ATOM 1550 O5' A D 97 128.237 133.150 94.632 1.00143.82 O
+ATOM 1551 C5' A D 97 129.473 133.758 94.969 1.00143.82 C
+ATOM 1552 C4' A D 97 130.453 132.733 95.466 1.00143.82 C
+ATOM 1553 O4' A D 97 129.947 132.154 96.689 1.00143.82 O
+ATOM 1554 C3' A D 97 130.670 131.552 94.536 1.00143.82 C
+ATOM 1555 O3' A D 97 131.700 131.809 93.603 1.00143.82 O
+ATOM 1556 C2' A D 97 131.003 130.411 95.482 1.00143.82 C
+ATOM 1557 O2' A D 97 132.372 130.458 95.840 1.00143.82 O
+ATOM 1558 C1' A D 97 130.183 130.770 96.711 1.00143.82 C
+ATOM 1559 N9 A D 97 128.874 130.102 96.782 1.00143.82 N
+ATOM 1560 C8 A D 97 127.668 130.731 96.671 1.00143.82 C
+ATOM 1561 N7 A D 97 126.633 129.953 96.823 1.00143.82 N
+ATOM 1562 C5 A D 97 127.201 128.725 97.053 1.00143.82 C
+ATOM 1563 C6 A D 97 126.629 127.482 97.294 1.00143.82 C
+ATOM 1564 N6 A D 97 125.315 127.289 97.322 1.00143.82 N
+ATOM 1565 N1 A D 97 127.459 126.442 97.488 1.00143.82 N
+ATOM 1566 C2 A D 97 128.780 126.652 97.453 1.00143.82 C
+ATOM 1567 N3 A D 97 129.437 127.787 97.241 1.00143.82 N
+ATOM 1568 C4 A D 97 128.579 128.794 97.048 1.00143.82 C
+ATOM 1569 P U D 98 131.475 131.516 92.046 1.00145.88 P
+ATOM 1570 OP1 U D 98 132.491 132.271 91.280 1.00145.88 O
+ATOM 1571 OP2 U D 98 130.038 131.707 91.759 1.00145.88 O
+ATOM 1572 O5' U D 98 131.796 129.969 91.900 1.00145.88 O
+ATOM 1573 C5' U D 98 132.976 129.420 92.456 1.00145.88 C
+ATOM 1574 C4' U D 98 132.826 127.938 92.651 1.00145.88 C
+ATOM 1575 O4' U D 98 131.935 127.676 93.758 1.00145.88 O
+ATOM 1576 C3' U D 98 132.198 127.200 91.489 1.00145.88 C
+ATOM 1577 O3' U D 98 133.148 126.910 90.485 1.00145.88 O
+ATOM 1578 C2' U D 98 131.608 125.958 92.147 1.00145.88 C
+ATOM 1579 O2' U D 98 132.603 124.964 92.320 1.00145.88 O
+ATOM 1580 C1' U D 98 131.223 126.483 93.530 1.00145.88 C
+ATOM 1581 N1 U D 98 129.778 126.749 93.682 1.00145.88 N
+ATOM 1582 C2 U D 98 128.988 125.699 94.073 1.00145.88 C
+ATOM 1583 O2 U D 98 129.432 124.587 94.264 1.00145.88 O
+ATOM 1584 N3 U D 98 127.663 125.992 94.228 1.00145.88 N
+ATOM 1585 C4 U D 98 127.065 127.211 94.035 1.00145.88 C
+ATOM 1586 O4 U D 98 125.858 127.327 94.214 1.00145.88 O
+ATOM 1587 C5 U D 98 127.950 128.253 93.646 1.00145.88 C
+ATOM 1588 C6 U D 98 129.246 127.991 93.487 1.00145.88 C
+ATOM 1589 P A D 99 132.731 126.951 88.939 1.00174.16 P
+ATOM 1590 OP1 A D 99 133.664 127.856 88.225 1.00174.16 O
+ATOM 1591 OP2 A D 99 131.272 127.203 88.871 1.00174.16 O
+ATOM 1592 O5' A D 99 133.005 125.463 88.451 1.00174.16 O
+ATOM 1593 C5' A D 99 133.695 124.548 89.286 1.00174.16 C
+ATOM 1594 C4' A D 99 133.320 123.131 88.958 1.00174.16 C
+ATOM 1595 O4' A D 99 132.482 123.119 87.780 1.00174.16 O
+ATOM 1596 C3' A D 99 134.478 122.218 88.605 1.00174.16 C
+ATOM 1597 O3' A D 99 135.091 121.675 89.753 1.00174.16 O
+ATOM 1598 C2' A D 99 133.829 121.164 87.722 1.00174.16 C
+ATOM 1599 O2' A D 99 133.194 120.175 88.513 1.00174.16 O
+ATOM 1600 C1' A D 99 132.752 121.975 87.008 1.00174.16 C
+ATOM 1601 N9 A D 99 133.150 122.387 85.652 1.00174.16 N
+ATOM 1602 C8 A D 99 132.843 121.716 84.501 1.00174.16 C
+ATOM 1603 N7 A D 99 133.302 122.276 83.415 1.00174.16 N
+ATOM 1604 C5 A D 99 133.957 123.394 83.886 1.00174.16 C
+ATOM 1605 C6 A D 99 134.658 124.408 83.228 1.00174.16 C
+ATOM 1606 N6 A D 99 134.818 124.453 81.907 1.00174.16 N
+ATOM 1607 N1 A D 99 135.201 125.387 83.977 1.00174.16 N
+ATOM 1608 C2 A D 99 135.041 125.343 85.301 1.00174.16 C
+ATOM 1609 N3 A D 99 134.407 124.441 86.026 1.00174.16 N
+ATOM 1610 C4 A D 99 133.879 123.481 85.260 1.00174.16 C
+ATOM 1611 P C D 100 136.680 121.494 89.799 1.00204.27 P
+ATOM 1612 OP1 C D 100 137.022 120.622 90.949 1.00204.27 O
+ATOM 1613 OP2 C D 100 137.296 122.833 89.682 1.00204.27 O
+ATOM 1614 O5' C D 100 137.009 120.693 88.465 1.00204.27 O
+ATOM 1615 C5' C D 100 138.152 121.018 87.688 1.00204.27 C
+ATOM 1616 C4' C D 100 139.359 120.242 88.138 1.00204.27 C
+ATOM 1617 O4' C D 100 138.989 119.382 89.250 1.00204.27 O
+ATOM 1618 C3' C D 100 139.955 119.325 87.083 1.00204.27 C
+ATOM 1619 O3' C D 100 141.346 119.184 87.355 1.00204.27 O
+ATOM 1620 C2' C D 100 139.249 118.009 87.373 1.00204.27 C
+ATOM 1621 O2' C D 100 139.912 116.862 86.885 1.00204.27 O
+ATOM 1622 C1' C D 100 139.196 118.031 88.896 1.00204.27 C
+ATOM 1623 N1 C D 100 138.105 117.236 89.472 1.00204.27 N
+ATOM 1624 C2 C D 100 138.393 115.987 90.020 1.00204.27 C
+ATOM 1625 O2 C D 100 139.560 115.574 90.005 1.00204.27 O
+ATOM 1626 N3 C D 100 137.395 115.251 90.548 1.00204.27 N
+ATOM 1627 C4 C D 100 136.154 115.731 90.547 1.00204.27 C
+ATOM 1628 N4 C D 100 135.190 114.981 91.078 1.00204.27 N
+ATOM 1629 C5 C D 100 135.832 117.001 89.998 1.00204.27 C
+ATOM 1630 C6 C D 100 136.831 117.713 89.477 1.00204.27 C
+ATOM 1631 P A D 101 142.385 118.913 86.166 1.00212.63 P
+ATOM 1632 OP1 A D 101 143.039 120.200 85.840 1.00212.63 O
+ATOM 1633 OP2 A D 101 141.686 118.157 85.099 1.00212.63 O
+ATOM 1634 O5' A D 101 143.476 117.963 86.830 1.00212.63 O
+ATOM 1635 C5' A D 101 143.102 116.979 87.781 1.00212.63 C
+ATOM 1636 C4' A D 101 143.837 115.688 87.544 1.00212.63 C
+ATOM 1637 O4' A D 101 143.373 114.675 88.470 1.00212.63 O
+ATOM 1638 C3' A D 101 143.636 115.054 86.181 1.00212.63 C
+ATOM 1639 O3' A D 101 144.429 115.652 85.172 1.00212.63 O
+ATOM 1640 C2' A D 101 143.980 113.596 86.443 1.00212.63 C
+ATOM 1641 O2' A D 101 145.387 113.414 86.490 1.00212.63 O
+ATOM 1642 C1' A D 101 143.422 113.406 87.853 1.00212.63 C
+ATOM 1643 N9 A D 101 142.065 112.836 87.830 1.00212.63 N
+ATOM 1644 C8 A D 101 140.878 113.466 88.103 1.00212.63 C
+ATOM 1645 N7 A D 101 139.834 112.683 87.989 1.00212.63 N
+ATOM 1646 C5 A D 101 140.372 111.457 87.621 1.00212.63 C
+ATOM 1647 C6 A D 101 139.798 110.202 87.343 1.00212.63 C
+ATOM 1648 N6 A D 101 138.487 109.960 87.400 1.00212.63 N
+ATOM 1649 N1 A D 101 140.623 109.185 87.000 1.00212.63 N
+ATOM 1650 C2 A D 101 141.941 109.414 86.939 1.00212.63 C
+ATOM 1651 N3 A D 101 142.596 110.548 87.178 1.00212.63 N
+ATOM 1652 C4 A D 101 141.747 111.538 87.516 1.00212.63 C
+TER 1653 A D 101
+MASTER 10 0 0 0 0 0 0 3 1652 1 0 6
+END
diff --git a/CryoREAD/graph/Assign_ops.py b/CryoREAD/graph/Assign_ops.py
new file mode 100644
index 0000000..85ab89c
--- /dev/null
+++ b/CryoREAD/graph/Assign_ops.py
@@ -0,0 +1,200 @@
+
+from scipy.spatial.distance import cdist
+from ops.math_calcuation import calculate_distance,calculate_cosine_value
+import numpy as np
+def Assign_PhoLDP_Sugarpath(Path_ID_List,sugar_point,pho_point,cut_off_length=7):
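+    """
+    Project phosphate LDP points onto each traced sugar path.
+
+    For every consecutive sugar pair (s1, s2), the phosphate p minimizing
+    d(s1,p)+d(s2,p) is accepted only when all angles of the triangle (s1, s2, p)
+    are at most 90 degrees, i.e. p projects onto the segment between the sugars.
+    Returns, per path, the forward phosphate alignment, the reverse alignment,
+    and the list of assigned phosphate ids; -1 marks unassigned positions.
+    """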
+ sugar_coodinate = sugar_point.merged_cd_dens[:,:3]
+ pho_coordinate = pho_point.merged_cd_dens[:,:3]
+ distance_array = cdist(sugar_coodinate,pho_coordinate)
+ Path_P_align_list=[]
+ Path_P_reverse_align_list=[]
+ Path_Pho_ID_List=[]
+
+ for cur_path_list in Path_ID_List:
+ current_length = len(cur_path_list)
+        tmp_point_list = np.zeros([current_length,2])-1 #-1 indicates no phosphate assigned yet
+ for k in range(current_length-1):
+ node_id1 = int(cur_path_list[k])
+ node_id2 = int(cur_path_list[k+1])
+ distance_node1 = distance_array[node_id1]
+ distance_node2 = distance_array[node_id2]
+ combine_distance = distance_node1+distance_node2
+ nearby_index = int(np.argmin(combine_distance))
+ minimum_distance_now = combine_distance[nearby_index]
+ if minimum_distance_now>cut_off_length*2:
+                continue
+
+ location_node1 = sugar_coodinate[node_id1]
+ location_node2 = sugar_coodinate[node_id2]
+ location_pho_now = pho_coordinate[nearby_index]
+
+ s1_s2_edge = calculate_distance(location_node1,location_node2)
+ s1_p_edge = distance_node1[nearby_index]
+ s2_p_edge = distance_node2[nearby_index]
+
+ cosine_s2_s1_p = calculate_cosine_value(s1_s2_edge,s1_p_edge,s2_p_edge)
+ cosine_s1_s2_p = calculate_cosine_value(s1_s2_edge,s2_p_edge,s1_p_edge)
+ cosine_s1_p_s2 = calculate_cosine_value(s2_p_edge,s1_p_edge,s1_s2_edge)
+ if cosine_s2_s1_p<0 or cosine_s1_p_s2<0 or cosine_s1_s2_p<0:
+ continue
+            #all angles <= 90 degrees, so the phosphate projects onto the sugar-sugar segment
+            tmp_point_list[k,1]= nearby_index #ending phosphate pointer for this sugar node
+            tmp_point_list[k+1,0]=nearby_index #starting phosphate pointer for the following sugar node
+        #assign the preceding pho node for the first sugar node; the last node's successor is handled below
+ begining_node = cur_path_list[0]
+ distance_begin_node = distance_array[begining_node]
+ distance_next_node = distance_array[int(cur_path_list[1])]
+ location_node1 = sugar_coodinate[begining_node]
+ location_node2 = sugar_coodinate[int(cur_path_list[1])]
+ possible_candidate_index = np.argwhere(distance_begin_node<=cut_off_length)
+ for candidate_index in possible_candidate_index:
+ s1_s2_edge = calculate_distance(location_node1,location_node2)
+ s1_p_edge = distance_begin_node[candidate_index]
+ s2_p_edge = distance_next_node[candidate_index]
+ cosine_s2_s1_p = calculate_cosine_value(s1_s2_edge,s1_p_edge,s2_p_edge)
+ if cosine_s2_s1_p<0:
+ tmp_point_list[0,0]=candidate_index
+ break
+ #assign the pho node for the last sugar node in the path
+ ending_node = cur_path_list[-1]
+ distance_end_node = distance_array[ending_node]
+ distance_prev_node = distance_array[int(cur_path_list[-2])]
+ location_node1 = sugar_coodinate[ending_node]
+ location_node2 = sugar_coodinate[int(cur_path_list[-2])]
+ possible_candidate_index = np.argwhere(distance_end_node<=cut_off_length)
+
+ for candidate_index in possible_candidate_index:
+ s1_s2_edge = calculate_distance(location_node1,location_node2)
+ s1_p_edge = distance_end_node[candidate_index]
+ s2_p_edge = distance_prev_node[candidate_index]
+ cosine_s2_s1_p = calculate_cosine_value(s1_s2_edge,s1_p_edge,s2_p_edge)
+ if cosine_s2_s1_p<0:
+ tmp_point_list[-1,1]=candidate_index
+ break
+
+        #propagate the assigned phosphate pointers along the path as a reward for the allowed sugar points
+        #pass 1: from beginning to end, assign each sugar node's starting phosphate
+ cur_align_list1=[]
+ cur_id_list = []
+ cur_starting_point=tmp_point_list[0,0]
+ for k in range(len(tmp_point_list)):
+ if tmp_point_list[k,0]==-1:
+ node_id1 = int(cur_path_list[k])
+ distance_node1 = distance_array[node_id1]
+ cur_distance = distance_node1[int(cur_starting_point)]
+ if cur_distance<=cut_off_length:
+ cur_align_list1.append(cur_starting_point)
+ else:
+ cur_align_list1.append(-1)
+ else:
+ cur_starting_point=tmp_point_list[k,0]
+ cur_align_list1.append(cur_starting_point)
+ cur_id_list.append(cur_starting_point)
+ cur_id_list.append(tmp_point_list[-1,1])
+ Path_Pho_ID_List.append(cur_id_list)
+        #pass 2: from end to beginning, assign each sugar node's ending phosphate
+ cur_reverse_align_list =[]
+ cur_starting_point=tmp_point_list[-1,1]
+ for k in range(len(tmp_point_list)-1, -1, -1):
+ if tmp_point_list[k,1]==-1:
+ node_id1 = int(cur_path_list[k])
+ distance_node1 = distance_array[node_id1]
+ cur_distance = distance_node1[int(cur_starting_point)]
+ if cur_distance<=cut_off_length:
+ cur_reverse_align_list.append(cur_starting_point)
+ else:
+ cur_reverse_align_list.append(-1)
+ else:
+ cur_starting_point=tmp_point_list[k,1]
+ cur_reverse_align_list.append(cur_starting_point)
+ cur_reverse_align_list=cur_reverse_align_list[::-1]
+ assert len(cur_align_list1)==len(cur_reverse_align_list) and len(cur_align_list1)==len(cur_path_list)
+ Path_P_align_list.append(cur_align_list1)
+ Path_P_reverse_align_list.append(cur_reverse_align_list)
+ return Path_P_align_list,Path_P_reverse_align_list,Path_Pho_ID_List
+def Assign_Base_Main_Path_sugar(Path_ID_List,pho_coordinate,Base_LDP_List,base_prob_array,cut_off_length=10):
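+    """
+    Attach a base probability vector (A/UT/C/G) to every node of every path.
+
+    Each node takes the probabilities at the closest base LDP point; matches
+    farther than cut_off_length are kept but down-weighted by 0.5. Despite the
+    parameter name, pho_coordinate receives the sugar node coordinates in the
+    Build_Unet_Graph caller.
+    """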
+
+ Base_Coord_List = []
+ for base_ldp in Base_LDP_List:
+ if len(base_ldp.merged_cd_dens)>0:
+ base_coord = base_ldp.merged_cd_dens[:,:3]
+ Base_Coord_List.append(base_coord)
+ Base_Coord_List = np.concatenate(Base_Coord_List,axis=0)
+
+ distance_array = cdist(pho_coordinate,Base_Coord_List)
+    All_Base_Assign_List =[] #list of lists: one base-probability vector per path node
+ count_out =0
+ count_total=0
+ for cur_path_list in Path_ID_List:
+ current_length = len(cur_path_list)
+ count_total+=current_length
+ tmp_list = []
+ for k in range(current_length):
+ node_id1 = int(cur_path_list[k])
+ distance_node1 = distance_array[node_id1]
+ nearby_index = np.argmin(distance_node1)
+ minimum_distance_now = distance_node1[nearby_index]
+ cur_base_coord = Base_Coord_List[int(nearby_index)]
+ current_tmp_prob = base_prob_array[:,int(cur_base_coord[0]),int(cur_base_coord[1]),int(cur_base_coord[2])]
+ if minimum_distance_now>cut_off_length:
+ count_out+=1
+ tmp_list.append(current_tmp_prob*0.5)
+ continue
+ else:
+ tmp_list.append(current_tmp_prob)
+ All_Base_Assign_List.append(tmp_list)
+    print("we have %d/%d base assignments outside the safe range: %.3f"%(count_out,count_total,count_out/count_total))
+ return All_Base_Assign_List
+
+
+def Assign_Base_Main_Path(Path_ID_List,pho_coordinate,Base_LDP_List,base_prob_array,cut_off_length=10,reverse_flag=False,return_dict=False):
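+    """
+    Attach base probability vectors to the phosphate nodes along each path.
+
+    For every consecutive node pair, the base LDP minimizing the summed distance
+    supplies the probabilities; distant matches are down-weighted by 0.5. With
+    reverse_flag the path is traversed in reverse before assignment. When
+    return_dict is set, a node_id -> probability dictionary is also returned.
+    """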
+
+ Base_Coord_List = []
+ for base_ldp in Base_LDP_List:
+ if len(base_ldp.merged_cd_dens)>0:
+ base_coord = base_ldp.merged_cd_dens[:,:3]
+ Base_Coord_List.append(base_coord)
+ Base_Coord_List = np.concatenate(Base_Coord_List,axis=0)
+
+ distance_array = cdist(pho_coordinate,Base_Coord_List)
+    All_Base_Assign_List =[] #list of lists: one base-probability vector per path node
+    Assign_Refer_Dict = {} #maps node_id -> base probability vector
+ count_out =0
+ count_total=0
+ for cur_path_list in Path_ID_List:
+ current_length = len(cur_path_list)
+ count_total+=current_length
+ if reverse_flag is True:
+ cur_path_list = cur_path_list[::-1]
+ tmp_list = []
+ for k in range(current_length-1):
+ node_id1 = int(cur_path_list[k])
+ node_id2 = int(cur_path_list[k+1])
+ distance_node1 = distance_array[node_id1]
+ distance_node2 = distance_array[node_id2]
+ combine_distance = distance_node1+distance_node2
+ nearby_index = np.argmin(combine_distance)
+ minimum_distance_now = combine_distance[nearby_index]
+ cur_base_coord = Base_Coord_List[int(nearby_index)]
+ current_tmp_prob = base_prob_array[:,int(cur_base_coord[0]),int(cur_base_coord[1]),int(cur_base_coord[2])]
+ if minimum_distance_now>cut_off_length*2:
+ count_out+=1
+ tmp_list.append(current_tmp_prob*0.5)
+ Assign_Refer_Dict[node_id1]=current_tmp_prob*0.5
+ continue
+ else:
+ tmp_list.append(current_tmp_prob)
+ Assign_Refer_Dict[node_id1]=current_tmp_prob
+ if reverse_flag:
+ tmp_list = tmp_list[::-1]
+ All_Base_Assign_List.append(tmp_list)
+    print("we have %d/%d base assignments outside the safe range: %.3f"%(count_out,count_total,count_out/count_total))
+ if return_dict:
+ return All_Base_Assign_List,Assign_Refer_Dict
+ else:
+ return All_Base_Assign_List
diff --git a/CryoREAD/graph/Build_Unet_Graph.py b/CryoREAD/graph/Build_Unet_Graph.py
new file mode 100644
index 0000000..66bf440
--- /dev/null
+++ b/CryoREAD/graph/Build_Unet_Graph.py
@@ -0,0 +1,429 @@
+import shutil
+import mrcfile
+from data_processing.map_utils import process_map_data
+import numpy as np
+from structure.MRC import MRC
+from structure.Tree import Tree
+from graph.LDP_ops import build_LDP,Build_Base_LDP,Build_Baseall_LDP,prepare_all_sugar_location
+from graph.Graph_ops import construct_graph
+import os
+from ops.os_operation import mkdir
+from graph.path_utils import collect_all_searched_path
+from ops.fasta_utils import read_fasta
+from graph.Assign_ops import Assign_Base_Main_Path_sugar,Assign_PhoLDP_Sugarpath,Assign_Base_Main_Path
+from graph.visualize_utils import Visualize_Path_Base,Visualize_LDP_Path
+import pickle
+from graph.geo_utils import Match_Sugar_Base_Location
+from graph.DP_ops import greedy_assign_PS
+from graph.assemble_ops import build_collision_table,solve_assignment
+from graph.assignment_ext import Extend_Solve_Assignment_SP_support
+from graph.reassign_ops import reassign_basedonfrag,merge_assign_geo_seq
+
+from data_processing.format_pdb import format_pdb,remove_op3_pdb
+from data_processing.Gen_MaskDRNA_map import Gen_MaskDRNA_map,Gen_MaskProtein_map
+from graph.geo_structure_modeling import Build_Atomic_Structure
+
+def refine_structure_global(init_pdb_path,format_pdb_path,root_save_path,frag_collect_dir,
+ chain_prob,origin_map_path,refined_pdb_path,params,DNA_label=False):
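+    """
+    Refine an initial CryoREAD model against a DNA/RNA-masked density map.
+
+    Pipeline: reformat the pdb with format_pdb, generate the mask map, then
+    alternate phenix.real_space_refine and coot refinement for up to three
+    cycles, falling back to the best available intermediate structure whenever
+    a refinement step fails.
+    """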
+ try:
+        #pymol-based formatting is disabled here to drop the pymol dependency; format_pdb replaces it
+ #os.system("pymol -cq ops/save_formated_pdb.py "+str(init_pdb_path)+" "+str(format_pdb_path))
+ format_pdb(init_pdb_path,format_pdb_path)
+ # 5.4.0 prepare the mask map
+ mask_map_path = os.path.join(root_save_path,"mask_map.mrc")
+ Gen_MaskDRNA_map(chain_prob,origin_map_path,mask_map_path,params['contour'],threshold=0.6)
+ #run coot+phenix refinement
+ output_dir = os.path.join(root_save_path,"Output")
+ mkdir(output_dir)
+ refine0_pdb_path = os.path.join(output_dir,"Refine_cycle0.pdb")
+ shutil.copy(format_pdb_path,refine0_pdb_path)
+ # 5.4 build final atomic structure with phenix.real_space_refine
+ if params['colab']:
+ os.system('cd %s; /content/phenix/phenix-1.20.1-4487/build/bin/phenix.real_space_refine %s %s '
+ 'resolution=%.4f output.suffix="_phenix_refine" skip_map_model_overlap_check=True'%(frag_collect_dir,format_pdb_path,mask_map_path,params['resolution']))
+ else:
+ os.system('cd %s; phenix.real_space_refine %s %s resolution=%.4f output.suffix="_phenix_refine" skip_map_model_overlap_check=True'%(frag_collect_dir,format_pdb_path,mask_map_path,params['resolution']))
+ gen_pdb_path = format_pdb_path[:-4]+"_phenix_refine_000.pdb"
+ count_check=0
+ while not os.path.exists(gen_pdb_path) and count_check<5:
+ gen_pdb_path = format_pdb_path[:-4]+"_phenix_refine_00%d.pdb"%(count_check+1)
+ count_check+=1
+ if not os.path.exists(gen_pdb_path):
+ print("please check final non-refined atomic structure in %s"%refine0_pdb_path)
+ return
+ try:
+ #run coot+phenix refinement
+
+ refine1_pdb_path = os.path.join(output_dir,"Refine_cycle1.pdb")
+ remove_op3_pdb(gen_pdb_path,refine1_pdb_path)
+ #do coot refinement
+ refine2_pdb_path = os.path.join(output_dir,"Refine_cycle2.pdb")
+ from coot.coot_refine_structure import coot_refine_structure
+ if params['colab']:
+ coot_software="/content/coot/bin/coot"
+ else:
+ coot_software="coot"
+ coot_refine_structure(refine1_pdb_path,mask_map_path,refine2_pdb_path,coot_software)
+ refine3_pdb_path = os.path.join(output_dir,"Refine_cycle3.pdb")
+ if params['colab']:
+ os.system('cd %s; /content/phenix/phenix-1.20.1-4487/build/bin/phenix.real_space_refine %s %s '
+                      'resolution=%.4f output.suffix="_phenix_refine" skip_map_model_overlap_check=True'%(output_dir,refine2_pdb_path,mask_map_path,params['resolution']))
+ else:
+ os.system('cd %s; phenix.real_space_refine %s %s resolution=%.4f '
+ 'output.suffix="_phenix_refine" skip_map_model_overlap_check=True'%(output_dir,refine2_pdb_path,mask_map_path,params['resolution']))
+ phenix_final_pdb = refine2_pdb_path[:-4]+"_phenix_refine_000.pdb"
+ count_check=0
+ while not os.path.exists(phenix_final_pdb) and count_check<5:
+ phenix_final_pdb = refine2_pdb_path[:-4]+"_phenix_refine_00%d.pdb"%(count_check+1)
+ count_check+=1
+ if os.path.exists(phenix_final_pdb):
+ #shutil.move(phenix_final_pdb,refine3_pdb_path)
+ format_pdb(phenix_final_pdb,refine3_pdb_path,DNA_label)
+ print("please check final refined atomic structure in %s"%refine3_pdb_path)
+ print("You can also check other refined output here %s"%output_dir)
+ else:
+ print("please check final refined atomic structure (pdb format) in this directory %s"%output_dir)
+ except:
+ #Final_Assemble_20_2_20_formated_phenix_refine_000.pdb
+ print("refinement failed!")
+ if os.path.exists(gen_pdb_path):
+ #shutil.copy(gen_pdb_path,refined_pdb_path)
+ format_pdb(gen_pdb_path,refined_pdb_path,DNA_label)
+ print("please check final refined atomic structure in %s"%refined_pdb_path)
+ else:
+ print("please check final refined atomic structure in this directory %s"%frag_collect_dir)
+
+
+ except Exception as e:
+        print("Refinement failed with the following error:",e)
+
+def Build_Unet_Graph(origin_map_path,chain_prob_path,fasta_path,save_path,
+ gaussian_bandwidth,dcut,rdcut, params):
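+    """
+    Run the full CryoREAD modeling stage on top of the network predictions.
+
+    Steps: build phosphate/sugar LDP points from the probability maps, construct
+    the sugar graph and search candidate backbone paths, assign base
+    probabilities, run geometry-only dynamic programming (plus sequence-aware DP
+    when a fasta file is provided), assemble non-conflicting fragments, and
+    build and optionally refine the final atomic structure.
+    """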
+ root_save_path = os.path.split(save_path)[0]
+ pho_prob_threshold=0.1#make sure all necessary edges are connected
+ sugar_prob_threshold = 0.1
+ base_prob_threshold = 0.25
+    #0.1 read map information
+ map_data, mapc, mapr, maps, origin, nxstart, nystart, nzstart = process_map_data(origin_map_path)
+ map_info_list=[mapc, mapr, maps, origin, nxstart, nystart, nzstart]
+ chain_class =8
+ #["sugar", "phosphate","A","UT","C","G","protein","base"]
+ #0.2 read sequence information
+ if fasta_path is not None and os.path.exists(fasta_path) and os.path.getsize(fasta_path)>0:
+ chain_dict,DNA_Label= read_fasta(input_fasta_path=fasta_path,dna_check=True)
+ #DNA_Label = read_dna_label(chain_dict)
+        print("we have %d chains in the provided fasta file"%(len(chain_dict)))
+ else:
+ chain_dict = None
+ DNA_Label=False#default processing as RNA
+
+
+ chain_prob = np.load(chain_prob_path)#[sugar,phosphate,A,UT,C,G,protein,base]
+ input_mrc = MRC(origin_map_path, gaussian_bandwidth)
+
+
+ #1. chain tracing
+ sp_prob = chain_prob[0]+chain_prob[1]
+ pho_prob = chain_prob[1]
+ sugar_prob = chain_prob[0]
+
+ input_mrc.upsampling_pho_prob(pho_prob,threshold=pho_prob_threshold,filter_array=None)
+ input_mrc.upsampling_sugar_prob(sugar_prob,threshold=sugar_prob_threshold,filter_array=None)
+
+ #1.1 LDP construction based on probability map
+ pho_point_path = os.path.join(save_path,"pho_LDP")
+ mkdir(pho_point_path)
+ pho_point= build_LDP(input_mrc,input_mrc.pho_dens, input_mrc.pho_Nact,origin_map_path,pho_point_path,"pho",params,map_info_list)
+ sugar_point_path = os.path.join(save_path,"sugar_LDP")
+ mkdir(sugar_point_path)
+ sugar_point = build_LDP(input_mrc,input_mrc.sugar_dens,input_mrc.sugar_Nact,origin_map_path,sugar_point_path,"sugar",params,map_info_list)
+
+
+ #1.2 graph construction: edge constructions
+ #pho_graph,pho_coordinate_list,pho_edge_pairs,pho_edge_d_dens = construct_graph(input_mrc,input_mrc.pho_dens, pho_point,chain_prob[1],chain_prob[-1],save_path,"pho_graph",map_info_list,params,prob_threshold=pho_prob_threshold,extend=True)
+ sugar_graph,sugar_coordinate_list,sugar_edge_pairs,sugar_edge_d_dens = construct_graph(input_mrc,input_mrc.sugar_dens,sugar_point,chain_prob[0],chain_prob[-1],save_path,"sugar_graph",map_info_list,params,prob_threshold=sugar_prob_threshold,extend=True)
+
+    #edge weight is density distance/probability, since we prefer low distance and high probability
+    #the search is too heavy to run in one pass, so the graph can be divided into several subgraphs
+
+ #2. Path searching with sugar graphs
+ #2.1 divide subgraphs
+ #mst = Tree(params)
+ #sort edge, finding connections, label mst_label
+ #connect_cid = mst.Setup_Connection(sugar_graph)
+ #update with more relaxed subgraph construction, fully based on distance constraints
+ #connect_cid = mst.Setup_Relaxed_Connection(sugar_graph)
+ #subgraphs = sugar_graph.build_subgraph(connect_cid)
+ #print("in total we have %d graphs"%len(subgraphs))
+    #use a single subgraph covering all nodes to allow more searches and avoid failures
+ subgraphs = [[k for k in range(sugar_graph.Nnode)]]
+
+ #2.2 collect all possible paths
+ sugar_search_path = os.path.join(save_path,"sugar_search")
+ mkdir(sugar_search_path)
+ All_Path_List = collect_all_searched_path(subgraphs, sugar_point,sugar_graph,sugar_search_path,map_info_list,params)
+ print("in total we collected %d path candidates"%len(All_Path_List))
+
+ #3 Base Assignment
+ #3.1 Cluster representative points of base LDPs
+ #['A','UT','C','G','base'] ldps in the list
+ base_save_path = os.path.join(save_path,"base_LDP")
+ Base_LDP_List=Build_Base_LDP(input_mrc,chain_prob,base_prob_threshold,origin_map_path,params,map_info_list,base_save_path,filter_type=0)
+ base_point= Build_Baseall_LDP(input_mrc,chain_prob,base_prob_threshold,origin_map_path,params,map_info_list,base_save_path)
+ Base_LDP_List.append(base_point)
+
+ #3.2 collect all possible paths
+ All_Base_Path_List_sugar = Assign_Base_Main_Path_sugar(All_Path_List,sugar_point.merged_cd_dens[:,:3],
+ Base_LDP_List,chain_prob[2:6],cut_off_length=10)
+ Visualize_Path_Base(os.path.join(save_path,"base_assignsugar_combine"),All_Path_List,
+ All_Base_Path_List_sugar,sugar_point.merged_cd_dens,
+ map_info_list,DNA_Label,sugar_visual=True)
+
+    #3.3 define the assignment correlation between sugar points and base ldp points
+ base_refer_path = os.path.join(save_path,"base_location_refer_dict.pkl")
+ if not os.path.exists(base_refer_path):
+ sugar_base_match_dict = Match_Sugar_Base_Location(All_Path_List,sugar_point,Base_LDP_List,map_info_list)
+ with open(base_refer_path, 'wb') as handle:
+ pickle.dump(sugar_base_match_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ else:
+ with open(base_refer_path, 'rb') as handle:
+ sugar_base_match_dict= pickle.load(handle)
+
+
+    #4 DP assignment for the fragments of the searched paths, using dp with the known sequence information
+ top_select = params['top_select']
+ #mean value should be mean of all maps in reasonable regions.
+ frag_size = params['frag_size']
+ ldp_size = frag_size#int(frag_size*1.5)
+ checking_stride = params['frag_stride']
+ save_seq_path = os.path.join(save_path,"Output_Structure_seq")
+ save_path = os.path.join(save_path,"Output_Structure_noseq")
+ mkdir(save_path)
+
+    #4.1 align each phosphate to a sugar to build the combined DP
+    #project pho ldp locations onto the sugar path so each sugar node knows its previous and next phosphate
+    #for each sugar node, point its prev pointer to one pho and its next pointer to another pho
+ Path_P_align_list,Path_P_reverse_align_list,Path_Pho_ID_List= Assign_PhoLDP_Sugarpath(All_Path_List,sugar_point,pho_point)
+ #visualize the pho pdbs
+ for k,path in enumerate(Path_Pho_ID_List):
+ Visualize_LDP_Path(os.path.join(save_path,"support_p_%d.cif"%k),"support_p_%d"%k,path,pho_point.merged_cd_dens[:,:3],map_info_list)
+
+ # 4.2 match pho-based base prob to each pho positions
+ Pho_Prob_Refer_List,Pho_Prob_Refer_Dict = Assign_Base_Main_Path(Path_Pho_ID_List,pho_point.merged_cd_dens[:,:3], Base_LDP_List,chain_prob[2:6],cut_off_length=10,return_dict=True)
+
+ Pho_Prob_Refer_List_Reverse,Pho_Prob_Refer_Reverse_Dict = Assign_Base_Main_Path(Path_Pho_ID_List,pho_point.merged_cd_dens[:,:3], Base_LDP_List,chain_prob[2:6],cut_off_length=10,reverse_flag=True,return_dict=True)
+
+ #4.3 DP assign for each fragments in the searched paths by applying dp for known sequences information.
+
+    print("applying only geometry constraints for dynamic programming")
+ greedy_save_path = os.path.join(save_path,"DP_geo_match")
+ mkdir(greedy_save_path)
+ from graph.DP_geo_ops import greedy_assign_geo
+ overall_dict,frag_location_dict=greedy_assign_geo(All_Base_Path_List_sugar,All_Path_List,
+ Path_P_align_list,Path_P_reverse_align_list,Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,
+ sugar_point,map_info_list,greedy_save_path)
+
+ #4.4 build collision table for assemble overlapped fragments
+ frag_save_path = os.path.join(save_path,"AssembleFactory_geo")
+ mkdir(frag_save_path)
+ # 5 Final Atomic Structure Modeling
+
+ # 5.1 prepare atom positions by paths
+ sugar_path_based_location = prepare_all_sugar_location(All_Path_List,sugar_point,map_info_list)
+
+
+ # 5.2 build initial atomic structure
+
+ frag_collect_dir = os.path.join(frag_save_path,"atomic_geo")
+ mkdir(frag_collect_dir)
+ #from graph.geo_structure_modeling import Build_Atomic_Structure
+ Build_Atomic_Structure(overall_dict,
+ sugar_path_based_location, frag_collect_dir,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,sugar_base_match_dict)
+ init_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_geo.pdb")
+ format_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_geo_formated.pdb")
+ refined_pdb_path = os.path.join(save_path,"Final_Refinedgeo.pdb")
+ if params['no_seqinfo'] or chain_dict is None:
+ if not params['no_seqinfo'] and chain_dict is None:
+            print("!!!parsing the fasta input failed, we cannot output structures that use sequence assignment!!!")
+        # 5.3 reformat the pdb for phenix refinement (the last column in the pdb file indicates the atom type)
+ if params['refine']:
+ refine_structure_global(init_pdb_path,format_pdb_path,root_save_path,frag_collect_dir,
+ chain_prob,origin_map_path,refined_pdb_path,params,DNA_label=DNA_Label)
+ final_pdb_path = os.path.join(root_save_path,"CryoREAD.pdb")
+ from ops.os_operation import collect_refine_pdb
+ collect_refine_pdb(os.path.join(root_save_path,"Output"),refined_pdb_path,final_pdb_path)
+ else:
+ #output_dir = os.path.join(root_save_path,"Output")
+ #mkdir(output_dir)
+ nonrefined_pdb_path = os.path.join(root_save_path,"CryoREAD_norefine.pdb")
+ #shutil.copy(init_pdb_path,nonrefined_pdb_path)
+ format_pdb(init_pdb_path,nonrefined_pdb_path,DNA_Label)
+ print("please check final output atomic structure in %s"%nonrefined_pdb_path)
+ return
+ else:
+ format_pdb(init_pdb_path,format_pdb_path,DNA_Label)
+ nonrefined_pdb_path = os.path.join(root_save_path,"CryoREAD_noseq.pdb")
+ shutil.copy(format_pdb_path,nonrefined_pdb_path)
+ overall_geo_dict= overall_dict
+ frag_geo_location_dict = frag_location_dict
+ save_path = save_seq_path
+ mkdir(save_path)
+ greedy_save_path = os.path.join(save_path,"DP_search_"+str(ldp_size)+"_"+str(checking_stride)+"_"+str(top_select))
+ mkdir(greedy_save_path)
+    # 6.1 greedy assignment based on geometry as well as the sequence information
+ if chain_dict is not None:
+ if params['thread']==1:
+ overall_dict,frag_location_dict=greedy_assign_PS(All_Base_Path_List_sugar,All_Path_List,
+ Path_P_align_list,Path_P_reverse_align_list,Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,
+ chain_prob,ldp_size,sugar_point,map_info_list,chain_dict,greedy_save_path,top_select,checking_stride)
+ else:
+ from graph.DP_ops import greedy_assign_PS_effective
+ overall_dict,frag_location_dict=greedy_assign_PS_effective(All_Base_Path_List_sugar,All_Path_List,
+ Path_P_align_list,Path_P_reverse_align_list,Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,
+ chain_prob,ldp_size,sugar_point,map_info_list,chain_dict,greedy_save_path,top_select,checking_stride,
+ num_cpus=params['thread'])
+ else:
+        print("!!!parsing the fasta input failed, we cannot output structures that use sequence assignment!!!")
+ exit()
+
+ # 6.2 Assemble fragments assignment together
+
+ if params['rule_soft']==1:
+ frag_save_path = os.path.join(save_path,"AssembleFactory_"+str(ldp_size)+"_"+str(checking_stride)+"_"+str(top_select))
+ else:
+        #impose strict rules for assembling
+ frag_save_path = os.path.join(save_path,"AssembleFactory_strict_"+str(ldp_size)+"_"+str(checking_stride)+"_"+str(top_select))
+ mkdir(frag_save_path)
+ cur_final_assemble_path = os.path.join(frag_save_path,"assemble_frag_%d_%d_%d_soft%d.txt"%(ldp_size,checking_stride,top_select,params['rule_soft']))
+
+ cur_collision_path = os.path.join(frag_save_path,"Collision_Table_%d_%d_%d_soft_%d.npy"%(ldp_size,checking_stride,top_select,params['rule_soft']))
+ collision_table,order_key_index,order_chain_index,key_order_index=build_collision_table(All_Base_Path_List_sugar, checking_stride,
+ ldp_size,overall_dict,cur_collision_path,params['rule_soft'])
+
+    # 6.3 solve the assembly from the pre-built collision table
+ time_use = 3600*(len(sugar_point.merged_cd_dens)/1000)
+ time_use = min(time_use,3600*10)
+ if not os.path.exists(cur_final_assemble_path):
+
+ solve_frag_combine_list = solve_assignment(collision_table,order_key_index,
+ order_chain_index,overall_dict,time_use=time_use)
+ if len(solve_frag_combine_list)==0 and params['rule_soft']==1:
+ print("no possible solution for assembling")
+ print("please make contact with the developer for further help!")
+ #return
+ if len(solve_frag_combine_list)==0 and params['rule_soft']==0:
+ print("no possible solution for assembling with hard rules")
+            print("we will retry the assignment via soft rules")
+ frag_save_path = os.path.join(save_path,"AssembleFactory_"+str(ldp_size)+"_"+str(checking_stride)+"_"+str(top_select))
+ mkdir(frag_save_path)
+ params['rule_soft']=1
+ cur_final_assemble_path = os.path.join(frag_save_path,"assemble_frag_%d_%d_%d_soft%d.txt"%(ldp_size,checking_stride,top_select,params['rule_soft']))
+
+ cur_collision_path = os.path.join(frag_save_path,"Collision_Table_%d_%d_%d_soft_%d.npy"%(ldp_size,checking_stride,top_select,params['rule_soft']))
+ collision_table,order_key_index,order_chain_index,key_order_index=build_collision_table(All_Base_Path_List_sugar, checking_stride,
+ ldp_size,overall_dict,cur_collision_path,params['rule_soft'])
+ solve_frag_combine_list = solve_assignment(collision_table,order_key_index,
+ order_chain_index,overall_dict,time_use=time_use)
+ if len(solve_frag_combine_list)==0:
+ print("no possible solution for assembling after trying strict rules and soft rules")
+ print("please make contact with the developer for further help!")
+ #return
+ else:
+ np.savetxt(cur_final_assemble_path,np.array(solve_frag_combine_list))
+ else:
+ solve_frag_combine_list = np.loadtxt(cur_final_assemble_path)
+        print("finished loading previously solved fragment combinations")
+
+ #if we do not find any possible solution, we will directly call refine for the CryoREAD_noseq.pdb
+ if len(solve_frag_combine_list)==0:
+ print("*"*100)
+        print("no solution found, using the noseq version as the final output!")
+ print("*"*100)
+ final_pdb_path = os.path.join(root_save_path,"CryoREAD.pdb")
+ if not params['refine']:
+
+ with open(final_pdb_path,"w") as f:
+                f.write("#CryoREAD no_seq pdb, sequence information is not used because the assembly was unsolvable!\n")
+ with open(nonrefined_pdb_path,"r") as f2:
+ for line in f2:
+ f.write(line)
+ else:
+ from graph.refine_structure import refine_structure
+ output_dir = os.path.join(root_save_path,"Output")
+ os.makedirs(output_dir,exist_ok=True)
+ refine_structure(nonrefined_pdb_path,origin_map_path,output_dir,params)
+ from ops.os_operation import collect_refine_pdb
+ refined_pdb_path = os.path.join(root_save_path,"CryoREAD_tmp.pdb")
+ collect_refine_pdb(output_dir,nonrefined_pdb_path,refined_pdb_path)
+ with open(final_pdb_path,"w") as f:
+                f.write("#CryoREAD no_seq pdb, sequence information is not used because the assembly was unsolvable!\n")
+ with open(refined_pdb_path,"r") as f2:
+ for line in f2:
+ f.write(line)
+
+ return
+
+    # 7.3 reassign the overlapped regions
+ frag_collect_dir = os.path.join(frag_save_path,"atomic_reassign_%d_%d_%d"%(ldp_size,checking_stride,top_select))
+ mkdir(frag_collect_dir)
+ overall_reassign_dict = reassign_basedonfrag(solve_frag_combine_list,order_key_index,order_chain_index,overall_dict,
+ sugar_path_based_location, ldp_size,frag_collect_dir,checking_stride,top_select,chain_dict,
+ All_Base_Path_List_sugar,All_Path_List,
+ Path_P_align_list,Path_P_reverse_align_list,Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict)
+
+ # build a sequence-based only structure here
+ from graph.structure_modeling import Build_Atomic_Model_nonoverlap_frag
+ frag_collect_dir = os.path.join(frag_save_path,"atomic_seq")
+ mkdir(frag_collect_dir)
+ Extra_Added_Assign_Dict=Extend_Solve_Assignment_SP_support(frag_save_path,All_Path_List,solve_frag_combine_list,
+ order_key_index,order_chain_index,overall_dict,define_ldp_size=ldp_size)
+    print("We added %d assignments with collisions to fill those gaps"%(len(Extra_Added_Assign_Dict)))
+ Build_Atomic_Model_nonoverlap_frag(overall_reassign_dict,
+ sugar_path_based_location, ldp_size,frag_collect_dir,checking_stride,top_select,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,
+ Extra_Added_Assign_Dict,sugar_base_match_dict,DNA_Label)
+ init_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_%d_%d_%d.pdb"%(ldp_size,checking_stride,top_select))
+ format_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_%d_%d_%d_formated.pdb"%(ldp_size,checking_stride,top_select))
+ format_pdb(init_pdb_path,format_pdb_path,DNA_Label)
+ output_dir = os.path.join(root_save_path,"Output")
+ mkdir(output_dir)
+ nonrefined_pdb_path = os.path.join(output_dir,"CryoREAD_seqonly.pdb")
+ shutil.copy(format_pdb_path,nonrefined_pdb_path)
+
+ # 7.4 merge dp assignment and the geo assignment
+ final_assign_dict = merge_assign_geo_seq(overall_geo_dict,overall_reassign_dict)
+ # 7.5 build atomic structure
+ frag_collect_dir = os.path.join(frag_save_path,"atomic_all")
+ mkdir(frag_collect_dir)
+
+ Build_Atomic_Structure(final_assign_dict,
+ sugar_path_based_location, frag_collect_dir,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,sugar_base_match_dict)
+    # 7.6 reformat the pdb for phenix refinement (the last column in the pdb file indicates the atom type)
+ init_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_geo.pdb")
+ format_pdb_path = os.path.join(frag_collect_dir,"Final_Assemble_seq_formated.pdb")
+ refined_pdb_path = os.path.join(save_path,"Final_Refinedseq.pdb")
+ if params['refine']:
+ refine_structure_global(init_pdb_path,format_pdb_path,root_save_path,frag_collect_dir,
+ chain_prob,origin_map_path,refined_pdb_path,params,DNA_label=DNA_Label)
+ final_pdb_path = os.path.join(root_save_path,"CryoREAD.pdb")
+ from ops.os_operation import collect_refine_pdb
+ collect_refine_pdb(os.path.join(root_save_path,"Output"),refined_pdb_path,final_pdb_path)
+ else:
+ #output_dir = os.path.join(root_save_path,"Output")
+ #mkdir(output_dir)
+ nonrefined_pdb_path = os.path.join(root_save_path,"CryoREAD_norefine.pdb")
+ #shutil.copy(init_pdb_path,nonrefined_pdb_path)
+ format_pdb(init_pdb_path,nonrefined_pdb_path,DNA_Label)
+ print("please check final output atomic structure in %s"%nonrefined_pdb_path)
+
+
+
diff --git a/CryoREAD/graph/DP_geo_ops.py b/CryoREAD/graph/DP_geo_ops.py
new file mode 100644
index 0000000..879996b
--- /dev/null
+++ b/CryoREAD/graph/DP_geo_ops.py
@@ -0,0 +1,192 @@
+
+from collections import defaultdict
+import numpy as np
+from data_processing.map_utils import permute_ns_coord_to_pdb,process_map_data,permute_pdb_coord_to_map,permute_map_coord_to_pdb
+from ops.math_calcuation import calculate_distance
+import os
+from ops.os_operation import mkdir
+import pickle
+from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+from numba import jit
+
+def Calculate_Distance_array(fragment_ldp_location):
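+    """
+    Pairwise Euclidean distance matrix of the fragment LDP locations
+    (symmetric, zero diagonal); equivalent to scipy's cdist(loc, loc).
+    """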
+ distance_array =np.zeros([len(fragment_ldp_location),len(fragment_ldp_location)])
+ for i in range(len(fragment_ldp_location)):
+ for j in range(i+1,len(fragment_ldp_location)):
+ distance_array[i,j] = calculate_distance(fragment_ldp_location[i],fragment_ldp_location[j])
+ distance_array[j,i]= distance_array[i,j]
+ return distance_array
+@jit(nogil=True,nopython=True)
+def global_align_score(ldp_gap_penalty,match_matrix,
+ pointer, scratch,ldp_distance_array, ldp_prev_connect):
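+    """
+    One-dimensional dynamic program over an ordered ldp path (no sequence).
+
+    Each node is either kept (pointer 1) or skipped (pointer 0). Keeping a node
+    whose distance to the last kept node falls outside the 4-10 A window around
+    the ~7 A nucleotide spacing pays a quadratic penalty; skipping is free only
+    when the flanking nodes can still bridge the gap. ldp_prev_connect records
+    the last kept node so distances are measured between real neighbors.
+    """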
+ low_cut_off_distance=4
+ high_cut_off_distance=10
+ mean_distance=7
+ #two operations for pointer: 1 use, 0 skip
+ for i in range(len(ldp_distance_array)):
+ if i==0:
+ pointer[i]=1
+ scratch[i]=match_matrix[i]
+            ldp_prev_connect[i]=-1 #-1 means this node is kept (it is its own last connected node)
+ else:
+ #1 choose
+ prev_neighbor = int(ldp_prev_connect[i-1]) if ldp_prev_connect[i-1]!=-1 else i-1
+ #choose_best = scratch[prev_neighbor]+match_matrix[i]
+ cur_distance = ldp_distance_array[prev_neighbor,i]
+
+ if cur_distance<=low_cut_off_distance or cur_distance>=high_cut_off_distance:
+ current_best= scratch[i-1]+match_matrix[i]-ldp_gap_penalty*abs(cur_distance-mean_distance)**2#ldp_gap
+ else:
+ current_best= scratch[i-1]+match_matrix[i]
+
+ #2 skip this node
+ reasonable_skip_ldp=0
+            if i<len(ldp_distance_array)-1 and cur_distance+ldp_distance_array[i,i+1]<=high_cut_off_distance:
+                #the flanking nodes can still bridge the gap, so skipping this node is penalty-free
+                reasonable_skip_ldp=1
+            if reasonable_skip_ldp:
+                left_cur = scratch[i-1]
+            else:
+                left_cur = scratch[i-1]-ldp_gap_penalty*abs(cur_distance-mean_distance)**2
+            if current_best>=left_cur:
+ scratch[i]=current_best
+ pointer[i]=1
+ ldp_prev_connect[i]=-1
+ else:
+ scratch[i]= left_cur
+ pointer[i]=0
+ ldp_prev_connect[i]=prev_neighbor
+ return scratch,pointer,ldp_prev_connect
+
+def dynamic_assign_geo(updated_base_list,ldp_gap_penalty,fragment_distance_array,save_path=None):
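+    """
+    Score one path with the geometry-only DP and decode which nodes to keep.
+
+    Returns the best score and a string in which every kept node contributes
+    the letter of its argmax base probability and every skipped node '-'.
+    """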
+ map_dict={0:"A",1:"T",2:"C",3:"G"}
+    match_matrix = np.max(updated_base_list,axis=1) #per-node best base probability
+ if save_path is not None:
+ score_path = os.path.join(save_path,"match_score.txt")
+ np.savetxt(score_path,match_matrix)
+ pointer = np.zeros(len(updated_base_list))#indicates the operations at position i
+ scratch = np.zeros(len(updated_base_list))
+ ldp_prev_connect = np.zeros(len(updated_base_list))-1
+ scratch,pointer,ldp_prev_connect=global_align_score(ldp_gap_penalty,match_matrix,
+ pointer, scratch,fragment_distance_array, ldp_prev_connect)
+ if save_path is not None:
+ score_path = os.path.join(save_path,"optimal_score.txt")
+ np.savetxt(score_path,scratch)
+ score_path = os.path.join(save_path,"optimal_direction.txt")
+        np.savetxt(score_path,pointer) #verified correct
+ score_path = os.path.join(save_path,"ldp_prev_connect.txt")
+ np.savetxt(score_path,ldp_prev_connect)
+ input_seq_line=""
+ for i in range(len(updated_base_list)):
+ choice=int(np.argmax(updated_base_list[i]))
+ input_seq_line += map_dict[choice]
+    #select every node whose ldp_prev_connect is -1, i.e. every node the DP chose to keep
+ max_score = np.max(scratch)
+ match_matrix = np.zeros(len(updated_base_list))
+ match_matrix[ldp_prev_connect==-1]=1
+ if save_path is not None:
+ score_path = os.path.join(save_path,"match_seq.txt")
+ match_seq=""
+ with open(score_path,'w') as file:
+ file.write(input_seq_line+"\n")
+ for k in range(len(match_matrix)):
+ if match_matrix[k]==1:
+ file.write(input_seq_line[k])
+ match_seq +=input_seq_line[k]
+ else:
+ file.write("-")
+ match_seq+="-"
+ file.write("\n")
+ else:
+ match_seq=""
+ for k in range(len(match_matrix)):
+ if match_matrix[k]==1:
+ match_seq +=input_seq_line[k]
+ else:
+ match_seq+="-"
+ return max_score,match_seq
+
+
+
+
+
+
+
+def greedy_assign_geo(All_Base_Path_List_sugar,All_Path_List_sugar,Path_P_align_list,Path_P_reverse_align_list,
+ Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,
+ sugar_point,map_info_list,greedy_save_path):
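+    """
+    Geometry-only node selection for every searched path (no sequence needed).
+
+    Per node, the sugar-based base probabilities and the two phosphate-based
+    probabilities are summed; each path is scored in both directions and the
+    better direction, score and matched sequence are stored per path id.
+    Results are pickled so repeated runs reuse the cache.
+    """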
+ cur_frag_path = os.path.join(greedy_save_path,"All_fragment_geo.pkl")
+ cur_frag_path2 = os.path.join(greedy_save_path,"Location_fragment_geo.pkl")
+ if os.path.exists(cur_frag_path) and os.path.exists(cur_frag_path2):
+ with open(cur_frag_path, 'rb') as handle:
+ overall_dict= pickle.load(handle)
+ with open(cur_frag_path2, 'rb') as handle:
+ frag_location_dict = pickle.load(handle)
+ return overall_dict,frag_location_dict
+ merged_cd_dens = sugar_point.merged_cd_dens
+ ldp_proper_range = [4,10]
+ overall_dict={}
+ frag_location_dict = {}
+
+ ldp_gap_penalty = 10 # only applied if not reasonable skip
+ for path_id,cur_path_list in enumerate(All_Base_Path_List_sugar):
+ cur_path_save_path = os.path.join(greedy_save_path,"path_%d"%path_id)
+ mkdir(cur_path_save_path)
+ current_path = All_Path_List_sugar[path_id]
+ current_pho_align = Path_P_align_list[path_id]
+ current_pho_align_reverse = Path_P_reverse_align_list[path_id]
+ current_length = len(cur_path_list)
+ current_base_list = All_Base_Path_List_sugar[path_id]
+ updated_base_list = [np.array(current_base_list[j]) for j in range(len(current_base_list))]
+ updated_base_list = np.array(updated_base_list)
+ #update the reward matrix based on the pho updates
+ pho_order_base_prob_list =[np.array(Pho_Prob_Refer_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Dict else np.zeros(4) for k in current_pho_align]
+ pho_reverse_base_prob_list = [np.array(Pho_Prob_Refer_Reverse_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Reverse_Dict else np.zeros(4) for k in current_pho_align_reverse]
+ assert len(pho_order_base_prob_list)==len(updated_base_list) and len(pho_order_base_prob_list)==len(pho_reverse_base_prob_list)
+ updated_base_score_list = [updated_base_list[j]+pho_order_base_prob_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ updated_base_score_list = np.array(updated_base_score_list)
+ study_ldp_base_list = updated_base_score_list
+
+ #study_ldp_base_list_reverse = updated_base_score_reverse_list[start_ldp_index:end_ldp_index]
+ current_location_list = [merged_cd_dens[int(kk)] for kk in current_path]
+ dp_save_path = os.path.join(cur_path_save_path,"path_order")
+ mkdir(dp_save_path)
+ fragment_ldp_location= Convert_LDPcoord_To_Reallocation(current_location_list, map_info_list)
+ fragment_distance_array = Calculate_Distance_array(fragment_ldp_location)
+ max_score, match_seq_line = dynamic_assign_geo(study_ldp_base_list,ldp_gap_penalty,
+ fragment_distance_array, dp_save_path)
+ dp_save_path = os.path.join(cur_path_save_path,"path_reverse_order")
+ mkdir(dp_save_path)
+ max_score2, match_seq_line2 = dynamic_assign_geo(study_ldp_base_list[::-1],ldp_gap_penalty,
+ fragment_distance_array[::-1,::-1], dp_save_path)
+ new_dict={}
+        if max_score>max_score2:
+            new_dict['direction']=1
+        else:
+            #the reverse direction scored higher: keep its score and restore the original order
+            max_score = max_score2
+            match_seq_line = match_seq_line2[::-1]
+            new_dict['direction']=-1
+        new_dict['score']=max_score
+ new_dict['match_seq']=match_seq_line
+ overall_dict[path_id]=new_dict
+ frag_location_dict[path_id]=fragment_ldp_location
+    print("geometry-based dp finished selecting ldps for atomic modeling")
+ with open(cur_frag_path, 'wb') as handle:
+ pickle.dump(overall_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ with open(cur_frag_path2, 'wb') as handle:
+ pickle.dump(frag_location_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ return overall_dict,frag_location_dict
+
diff --git a/CryoREAD/graph/DP_ops.py b/CryoREAD/graph/DP_ops.py
new file mode 100644
index 0000000..900d67a
--- /dev/null
+++ b/CryoREAD/graph/DP_ops.py
@@ -0,0 +1,537 @@
+from collections import defaultdict
+import numpy as np
+from data_processing.map_utils import permute_ns_coord_to_pdb,process_map_data,permute_pdb_coord_to_map,permute_map_coord_to_pdb
+from ops.math_calcuation import calculate_distance
+import os
+from ops.os_operation import mkdir
+import pickle
+from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+from graph.visualize_utils import Visualize_assign_DPbase
+
+def Calculate_Distance_array(fragment_ldp_location):
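+    """
+    Pairwise Euclidean distance matrix of the fragment LDP locations
+    (symmetric, zero diagonal); equivalent to scipy's cdist(loc, loc).
+    """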
+ distance_array =np.zeros([len(fragment_ldp_location),len(fragment_ldp_location)])
+ for i in range(len(fragment_ldp_location)):
+ for j in range(i+1,len(fragment_ldp_location)):
+ distance_array[i,j] = calculate_distance(fragment_ldp_location[i],fragment_ldp_location[j])
+ distance_array[j,i]= distance_array[i,j]
+ return distance_array
+from numba import jit
+@jit(nogil=True,nopython=True)
+def global_align_score(ldp_gap,seq_gap, matrix, ldp_list,
+seq_list, pointer, scratch,seq_gap_count,ldp_gap_count,
+ldp_distance_array,ldp_prev_connect,seq_gap_total_limit=2):
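+    """
+    Global alignment DP between an ldp path (rows) and a chain sequence (columns).
+
+    Moves: diagonal (pointer 999999) matches ldp i to residue j, paying a
+    quadratic penalty when the distance to the last kept ldp leaves the 4-10 A
+    window; up (pointer 1) skips a residue for seq_gap, escalating beyond
+    seq_gap_total_limit skips; left (pointer -1) skips an ldp, free when its
+    neighbors can still bridge the gap. ldp_prev_connect tracks the last kept
+    ldp per cell so distance penalties use real neighbors.
+    """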
+ low_cut_off_distance=4
+ high_cut_off_distance=10
+ mean_distance=7
+ for i in range(1,len(ldp_list)+1):
+ for j in range(1,len(seq_list)+1):
+ reasonable_skip_ldp=0
+ if i==1:
+ current_best= scratch[i-1,j-1]+matrix[i,j]
+ else:
+ prev_neighbor = int(ldp_prev_connect[i-1,j-1]) if ldp_prev_connect[i-1,j-1]!=-1 else i-1
+ cur_distance = ldp_distance_array[prev_neighbor,i]
+
+ if cur_distance<=low_cut_off_distance or cur_distance>=high_cut_off_distance:
+ current_best= scratch[i-1,j-1]+matrix[i,j]-ldp_gap*abs(cur_distance-mean_distance)**2#ldp_gap
+ else:
+ current_best= scratch[i-1,j-1]+matrix[i,j]
+
+ #left_best= 0
+ #left_best_step=0
+ if i==1:
+ left_cur = scratch[i-1,j]
+ else:
+ prev_neighbor = int(ldp_prev_connect[i-1,j]) if ldp_prev_connect[i-1,j]!=-1 else i-1
+ cur_distance = ldp_distance_array[prev_neighbor,i]
+                if i!=len(ldp_list) and cur_distance+ldp_distance_array[i,i+1]<=high_cut_off_distance:
+                    #the flanking ldps can still bridge the gap, so skipping this one is reasonable
+                    reasonable_skip_ldp=1
+ if reasonable_skip_ldp:
+ left_cur = scratch[i-1,j]
+ else:
+ left_cur = scratch[i-1,j]-ldp_gap*abs(cur_distance-mean_distance)**2#ldp_gap
+ left_best_step=1
+ # up_best =0
+ # up_best_step=0
+ if seq_gap_count[i,j-1]<=seq_gap_total_limit:
+ up_cur = scratch[i,j-1]-seq_gap#-seq_gap*(seq_gap_count[i,j-1]+1)#include the penalty for adding this
+ else:
+ up_cur = scratch[i,j-1]-seq_gap*(seq_gap_count[i,j-1]+1)#keep all possible combinations
+ #up_cur = scratch[i,j-1]-seq_gap
+ #up_cur=0#do not allow any skip for the sequences
+ up_best_step=1
+ #must take some operations
+ final_best = max(current_best,left_cur,up_cur)#max(0,current_best,left_cur,up_cur)
+ scratch[i,j]=final_best
+ if final_best==up_cur:
+ pointer[i,j]=up_best_step
+ seq_gap_count[i,j] = 1+seq_gap_count[i,j-1]
+ ldp_prev_connect[i,j] = ldp_prev_connect[i,j-1]
+ ldp_gap_count[i,j]= ldp_gap_count[i,j-1]
+
+ elif final_best==left_cur:
+ pointer[i,j]=-left_best_step
+ seq_gap_count[i,j] = seq_gap_count[i-1,j]
+ if ldp_prev_connect[i-1,j]!=-1:
+ ldp_prev_connect[i,j] = ldp_prev_connect[i-1,j]
+ else:
+ ldp_prev_connect[i,j] = i-1
+ if reasonable_skip_ldp:
+ ldp_gap_count[i,j]=ldp_gap_count[i-1,j]
+ else:
+ ldp_gap_count[i,j]=ldp_gap_count[i-1,j]+1
+
+ else:
+                pointer[i,j]=999999 #indicates a diagonal (match) move
+ seq_gap_count[i,j] = seq_gap_count[i-1,j-1]
+ ldp_gap_count[i,j]= ldp_gap_count[i-1,j-1]
+
+ return scratch,pointer,seq_gap_count,ldp_prev_connect,ldp_gap_count
+
+
+def dynamic_assign_multi(updated_base_list,current_chain,
+ ldp_gap_penalty,seq_gap_penalty,fragment_distance_array,save_path=None):
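+    """
+    Align one ldp fragment against one candidate chain with the global DP.
+
+    Builds a (fragment+1) x (chain+1) match matrix from the per-node base
+    probabilities (x100), runs global_align_score, then traces back every end
+    column whose final score passes the cutoff. Returns parallel lists of
+    scores, matched sequences ('-' for skipped ldps) and [start, end] chain
+    intervals, optionally dumping intermediate matrices to save_path.
+    """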
+ map_dict={0:"A",1:"T",2:"C",3:"G"}
+ #build M*N matrix
+ match_matrix = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+ for i in range(len(updated_base_list)):
+ current_prob = updated_base_list[i]
+ for j in range(len(current_chain)):
+ cur_label=int(current_chain[j])
+            match_matrix[i+1,j+1]=current_prob[cur_label]*100 #scale probabilities so a good match clearly outweighs the gap penalties
+ if save_path is not None:
+ score_path = os.path.join(save_path,"match_score.txt")
+ np.savetxt(score_path,match_matrix)
+ match_score = match_matrix
+ pointer = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+ scratch = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+    #initialize the scratch matrix
+ for k in range(1,len(updated_base_list)+1):
+ scratch[k,0]=-999999#do not allow open gap
+ #for k in range(1,len(current_chain)+1):
+ # scratch[0,k]=-seq_gap_penalty*k
+
+ fragment_final_distance = np.zeros([len(fragment_distance_array)+1,len(fragment_distance_array)+1])
+ fragment_final_distance[1:,1:]=fragment_distance_array
+ fragment_final_distance[0,:]=20
+ fragment_final_distance[:,0]=20
+ #ldp_gap_accumulation = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+ seq_gap_accumulation = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+ ldp_gap_accumulation = np.zeros([len(updated_base_list)+1,len(current_chain)+1])
+ #init gap count
+ #for k in range(1,len(current_chain)+1):
+ # seq_gap_accumulation[0,k]=k
+ for k in range(1,len(updated_base_list)+1):
+ ldp_gap_accumulation[k,0]=k
+ seq_gap_total_limit = int(len(updated_base_list)*0.15)
+ ldp_prev_connect = np.zeros([len(updated_base_list)+1,len(current_chain)+1])-1
+    ldp_prev_connect[0]=0 #row 0 points back to the start so distance penalties keep accumulating and open gaps are avoided
+ # for k in range(1,len(updated_base_list)+1):
+ # ldp_prev_connect[k,:]=k-1
+ scratch,pointer,seq_gap_count,ldp_prev_connect,ldp_gap_count=global_align_score(ldp_gap_penalty,seq_gap_penalty, match_matrix,
+ np.arange(len(updated_base_list)), np.arange(len(current_chain)),
+ pointer, scratch,seq_gap_accumulation, ldp_gap_accumulation,fragment_final_distance, ldp_prev_connect,seq_gap_total_limit)
+ if save_path is not None:
+ score_path = os.path.join(save_path,"optimal_score.txt")
+ np.savetxt(score_path,scratch)
+ score_path = os.path.join(save_path,"optimal_direction.txt")
+        np.savetxt(score_path,pointer) #verified correct
+ score_path = os.path.join(save_path,"sequence_miss_count.txt")
+ np.savetxt(score_path,seq_gap_count)#verified correct
+ score_path = os.path.join(save_path,"ldp_prev_connect.txt")
+ np.savetxt(score_path,ldp_prev_connect)
+ score_path = os.path.join(save_path,"ldp_distance_array.txt")
+ np.savetxt(score_path,fragment_final_distance)
+ score_path = os.path.join(save_path,"ldp_gap_count.txt")
+ np.savetxt(score_path,ldp_gap_count)
+ input_seq_line=""
+ for i in range(len(updated_base_list)):
+ choice=int(np.argmax(updated_base_list[i]))
+ input_seq_line += map_dict[choice]
+    #trace back every candidate end position whose score passes the cutoff
+    #for any score below the cutoff we directly fall back to the network predictions
+ max_score_candidate_index = np.argwhere(scratch[len(updated_base_list)]>-1000)
+ max_score_list = []
+ match_seq_list = []
+ match_interval_list =[]
+ count_match=0
+ for candidate_index in max_score_candidate_index:
+ match_matrix = np.zeros(len(updated_base_list))-1
+ candidate_index =int(candidate_index)
+ max_index = candidate_index
+ max_index = [len(updated_base_list),max_index]
+ check_pointer = pointer[max_index[0],max_index[1]]
+ cur_x=int(max_index[0])
+ cur_y=int(max_index[1])
+ while check_pointer!=0:
+ if check_pointer==999999:
+ match_matrix[cur_x-1]=cur_y-1
+ current_begin_index = cur_y-1
+ cur_x -=1
+ cur_y -=1
+ elif check_pointer>0:
+ cur_y -= int(check_pointer)
+ elif check_pointer<0:
+ cur_x -= int(abs(check_pointer))
+ #match_matrix[cur_x-1]=cur_y-1
+ check_pointer = pointer[cur_x,cur_y]
+ if save_path is not None:
+ score_path = os.path.join(save_path,"match_result%d.txt"%count_match)
+ np.savetxt(score_path,match_matrix)
+ match_seq_line = ""
+ count_replace =0
+ for i in range(len(match_matrix)):
+ if match_matrix[i]==-1:
+ match_seq_line+="-"
+ else:
+ point_index = int(match_matrix[i])
+                #if the assigned base's predicted probability is low (<=0.25), count it as a forced replacement
+ match_label = int(current_chain[point_index])
+ current_ldp_score = updated_base_list[i]
+ current_ldp_prob = current_ldp_score[match_label]
+
+ match_seq_line+=map_dict[match_label]
+ if current_ldp_prob<=0.25:
+ count_replace+=1
+
+ current_score = float(scratch[len(updated_base_list),candidate_index])
+ max_score_list.append(current_score)
+ match_seq_list.append(match_seq_line)
+ match_interval_list.append([current_begin_index,candidate_index])
+ count_match+=1
+ all_seq_line = ""
+ for i in range(len(current_chain)):
+ match_label = int(current_chain[i])
+ all_seq_line += map_dict[match_label]
+ if save_path is not None:
+ score_path = os.path.join(save_path,"match_seq.txt")
+ with open(score_path,'w') as file:
+ file.write(input_seq_line+"\n")
+ for j in range(len(match_seq_list)):
+ current_interval = match_interval_list[j]
+ file.write(match_seq_list[j]+"\t"+"%.2f"%max_score_list[j]+
+ "\t%d,%d\n"%(current_interval[0],current_interval[1]))
+ file.write(all_seq_line+"\n")
+
+ return max_score_list,match_seq_list,match_interval_list
+
+def greedy_assign_PS(All_Base_Path_List_sugar,All_Path_List_sugar,Path_P_align_list,Path_P_reverse_align_list,
+ Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,chain_prob,
+ ldp_size,sugar_point,map_info_list,chain_dict,greedy_save_path,top_select,checking_stride):
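+    """
+    Sequence-aware fragment assignment over all searched paths.
+
+    Windows of ldp_size nodes (stride checking_stride) are aligned against
+    every chain in chain_dict in both directions, and the top_select scoring
+    alignments per window are kept, each recording its matched sequence, score,
+    chain interval, chain id and direction. Results are pickled keyed by
+    (ldp_size, checking_stride, top_select) so reruns load the cache.
+    """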
+ cur_frag_path = os.path.join(greedy_save_path,"All_fragment_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ cur_frag_path2 = os.path.join(greedy_save_path,"Location_fragment_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ if os.path.exists(cur_frag_path) and os.path.exists(cur_frag_path2):
+ with open(cur_frag_path, 'rb') as handle:
+ overall_dict= pickle.load(handle)
+ with open(cur_frag_path2, 'rb') as handle:
+ frag_location_dict = pickle.load(handle)
+ return overall_dict,frag_location_dict
+ base_region_prob = chain_prob[2:6]
+ merged_cd_dens = sugar_point.merged_cd_dens
+
+ ldp_proper_range = [4,10]
+ overall_score=0
+ final_check_dir_list=[]
+ count_all_combination=0
+ count_use_combination=0
+ ldp_gap_penalty = 10 # only applied if not reasonable skip
+    seq_gap_penalty = 25 # applied whenever we need to skip a nucleotide
+ overall_dict=defaultdict(list)#use path id staring index to design overlap
+ frag_location_dict = {}
+
+ for path_id,cur_path_list in enumerate(All_Base_Path_List_sugar):
+ current_path = All_Path_List_sugar[path_id]
+ current_pho_align = Path_P_align_list[path_id]
+ current_pho_align_reverse = Path_P_reverse_align_list[path_id]
+ current_length = len(cur_path_list)
+ current_base_list = All_Base_Path_List_sugar[path_id]
+ updated_base_list = [np.array(current_base_list[j]) for j in range(len(current_base_list))]
+ updated_base_list = np.array(updated_base_list)
+ #update the reward matrix based on the pho updates
+ pho_order_base_prob_list =[np.array(Pho_Prob_Refer_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Dict else np.zeros(4) for k in current_pho_align]
+ pho_reverse_base_prob_list = [np.array(Pho_Prob_Refer_Reverse_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Reverse_Dict else np.zeros(4) for k in current_pho_align_reverse]
+
+ assert len(pho_order_base_prob_list)==len(updated_base_list) and len(pho_order_base_prob_list)==len(pho_reverse_base_prob_list)
+ updated_base_score_list = [updated_base_list[j]+pho_order_base_prob_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ updated_base_score_list = np.array(updated_base_score_list)
+ #calculate the reverse score
+ #updated_base_score_reverse_list = [updated_base_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ #updated_base_score_reverse_list = np.array(updated_base_score_reverse_list)
+
+ cur_path_save_path = os.path.join(greedy_save_path,"path_%d"%path_id)
+ mkdir(cur_path_save_path)
+
+ start_ldp_index =0
+ count_path_frag = 0
+ while start_ldp_index0:
+ os.system("rm "+path_save_path+"/*")
+ #clear this directory to avoid previous results
+ final_check_dir_list.append("Collecttop_%d_%d"%(path_id,start_ldp_index))
+ path_assign_score_list=np.array(path_assign_score_list)
+ path_score_index = np.argsort(path_assign_score_list)
+ path_score_index = path_score_index[::-1]#descending order, highest score first
+ rank_id = 1
+ final_path_assign_collection_list =[]
+ final_path_assign_score_list =[]
+ final_path_assign_interval_list =[]
+ final_path_chain_list=[]
+ final_path_direction_list=[]
+ final_path_chainlength_list=[]
+ for select_index in path_score_index[:top_select]:
+ print("for fragment starting %d, select %d as one of top"%(count_path_frag,select_index))
+ current_assign_match_seq = path_assign_collection_list[select_index]
+ Visualize_assign_DPbase(path_save_path,"fragment_%d"%rank_id,
+ current_path[start_ldp_index:end_ldp_index],current_assign_match_seq,
+ current_base_list[start_ldp_index:end_ldp_index], sugar_point.merged_cd_dens,map_info_list)
+ final_path_assign_collection_list.append(path_assign_collection_list[select_index])
+ final_path_assign_score_list.append(path_assign_score_list[select_index])
+ final_path_assign_interval_list.append(path_assign_interval_list[select_index])
+ final_path_chain_list.append(path_chain_list[select_index])
+ final_path_direction_list.append(path_direction_list[select_index])
+ final_path_chainlength_list.append(path_chainlength_list[select_index])
+ rank_id+=1
+ count_use_combination+=min(top_select,len(path_score_index))
+ for kk in range(len(final_path_assign_collection_list)):
+ new_dict={}
+ new_dict['match_seq']=final_path_assign_collection_list[kk]
+ new_dict['score']=final_path_assign_score_list[kk]
+ new_dict['interval']=final_path_assign_interval_list[kk]
+ new_dict['chain']=final_path_chain_list[kk]
+ new_dict['direction']=final_path_direction_list[kk]
+ # new_dict['ldp_location'] = fragment_ldp_location[kk]
+ new_dict['chain_length'] = final_path_chainlength_list[kk]
+ overall_dict["%d_%d"%(path_id,start_ldp_index)].append(new_dict)
+ frag_location_dict["%d_%d"%(path_id,start_ldp_index)]=fragment_ldp_location
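+ #keys follow the "<path_id>_<start_ldp_index>" convention (e.g. "3_40"), so the
+ #assembly stage can later tell which windows overlap on the same path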
+ start_ldp_index+=checking_stride #alternative stride: int(frag_size*0.2)
+ count_path_frag+=1
+ if len(path_assign_score_list)>0:
+ overall_score += np.max(path_assign_score_list)
+ print("in total we selected %d/%d possible combinations"%(count_use_combination, count_all_combination))
+ with open(cur_frag_path, 'wb') as handle:
+ pickle.dump(overall_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ with open(cur_frag_path2, 'wb') as handle:
+ pickle.dump(frag_location_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ return overall_dict,frag_location_dict
+
+def distributed_dp_calcu(chain_dict,cur_path_save_path,start_ldp_index,
+ study_ldp_base_list,ldp_gap_penalty,seq_gap_penalty,fragment_distance_array,
+ path_id,top_select):
+ path_assign_collection_list=[]
+ path_assign_score_list =[]
+ path_assign_interval_list =[]
+ path_chain_list=[]
+ path_direction_list=[]
+ path_chainlength_list=[]
+ for current_chain_id in chain_dict:
+ current_chain = chain_dict[current_chain_id]
+ current_reverse_chain = current_chain[::-1]#reversed copy so the chain can also be matched from end to beginning
+ dp_save_path = os.path.join(cur_path_save_path,"path_starting_%d_chain_%s"%(start_ldp_index,current_chain_id))
+ mkdir(dp_save_path)
+ max_score, match_seq_line,match_seq_interval = dynamic_assign_multi(study_ldp_base_list,current_chain,
+ ldp_gap_penalty,seq_gap_penalty,fragment_distance_array, None)
+ path_assign_collection_list.extend(match_seq_line)
+ path_assign_score_list.extend(max_score)
+ path_assign_interval_list.extend(match_seq_interval)
+ path_chain_list.extend([current_chain_id]*len(max_score))
+ path_direction_list.extend([1]*len(max_score))
+ path_chainlength_list.extend([len(current_chain)]*len(max_score))
+ #delete all the temporary dirs
+ if os.path.exists(dp_save_path):
+ os.system("rm -r "+dp_save_path)
+ dp_save_path = os.path.join(cur_path_save_path,"rpath_starting_%d_chain_%s"%(start_ldp_index,current_chain_id))
+ mkdir(dp_save_path)
+ max_score, match_seq_line,match_seq_interval = dynamic_assign_multi(study_ldp_base_list,current_reverse_chain,
+ ldp_gap_penalty,seq_gap_penalty,fragment_distance_array, None)
+ path_assign_collection_list.extend(match_seq_line)
+ path_assign_score_list.extend(max_score)
+ path_assign_interval_list.extend(match_seq_interval)
+ path_chain_list.extend([current_chain_id]*len(max_score))
+ path_direction_list.extend([-1]*len(max_score))
+ path_chainlength_list.extend([len(current_chain)]*len(max_score))
+ #delete all the temporary dirs
+ if os.path.exists(dp_save_path):
+ os.system("rm -r "+dp_save_path)
+
+ path_save_path = os.path.join(cur_path_save_path,"Collecttop_%d_%d"%(path_id,start_ldp_index))
+ mkdir(path_save_path)
+ if len(os.listdir(path_save_path))>0:
+ os.system("rm -r "+path_save_path)
+ #clear this directory to avoid previous results
+
+ path_assign_score_list=np.array(path_assign_score_list)
+ path_score_index = np.argsort(path_assign_score_list)
+ path_score_index = path_score_index[::-1]#descending order, highest score first
+ rank_id = 1
+ final_path_assign_collection_list =[]
+ final_path_assign_score_list =[]
+ final_path_assign_interval_list =[]
+ final_path_chain_list=[]
+ final_path_direction_list=[]
+ final_path_chainlength_list=[]
+ for select_index in path_score_index[:top_select]:
+ current_assign_match_seq = path_assign_collection_list[select_index]
+ final_path_assign_collection_list.append(path_assign_collection_list[select_index])
+ final_path_assign_score_list.append(path_assign_score_list[select_index])
+ final_path_assign_interval_list.append(path_assign_interval_list[select_index])
+ final_path_chain_list.append(path_chain_list[select_index])
+ final_path_direction_list.append(path_direction_list[select_index])
+ final_path_chainlength_list.append(path_chainlength_list[select_index])
+ rank_id+=1
+ if os.path.exists(path_save_path):
+ os.system("rm -r "+path_save_path)
+ return start_ldp_index,final_path_assign_collection_list,final_path_assign_score_list,\
+ final_path_assign_interval_list,final_path_chain_list,final_path_direction_list,final_path_chainlength_list
+
+
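+#greedy_assign_PS_effective parallelizes the per-window loop of greedy_assign_PS: each
+#window is handed to the distributed_dp_calcu worker above, presumably dispatched
+#through the multiprocessing Pool created inside the function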
+def greedy_assign_PS_effective(All_Base_Path_List_sugar,All_Path_List_sugar,Path_P_align_list,Path_P_reverse_align_list,
+ Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,chain_prob,
+ ldp_size,sugar_point,map_info_list,chain_dict,greedy_save_path,top_select,checking_stride,num_cpus=128):
+
+ cur_frag_path = os.path.join(greedy_save_path,"All_fragment_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ cur_frag_path2 = os.path.join(greedy_save_path,"Location_fragment_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ if os.path.exists(cur_frag_path) and os.path.exists(cur_frag_path2):
+ with open(cur_frag_path, 'rb') as handle:
+ overall_dict= pickle.load(handle)
+ with open(cur_frag_path2, 'rb') as handle:
+ frag_location_dict = pickle.load(handle)
+ return overall_dict,frag_location_dict
+ base_region_prob = chain_prob[2:6]
+ merged_cd_dens = sugar_point.merged_cd_dens
+
+ ldp_proper_range = [4,10]
+ overall_score=0
+ #final_check_dir_list=[]
+ count_all_combination=0
+ count_use_combination=0
+ ldp_gap_penalty = 10 # only applied when skipping an LDP point is not reasonable
+ seq_gap_penalty = 25 # applied whenever we have to skip a nucleotide in the sequence
+ overall_dict=defaultdict(list)#keyed by "<path_id>_<start_ldp_index>" so overlapping windows can be told apart
+ frag_location_dict = {}
+
+
+ for path_id,cur_path_list in enumerate(All_Base_Path_List_sugar):
+ current_path = All_Path_List_sugar[path_id]
+ current_pho_align = Path_P_align_list[path_id]
+ current_pho_align_reverse = Path_P_reverse_align_list[path_id]
+ current_length = len(cur_path_list)
+ current_base_list = All_Base_Path_List_sugar[path_id]
+ updated_base_list = [np.array(current_base_list[j]) for j in range(len(current_base_list))]
+ updated_base_list = np.array(updated_base_list)
+ #update the reward matrix based on the pho updates
+ pho_order_base_prob_list =[np.array(Pho_Prob_Refer_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Dict else np.zeros(4) for k in current_pho_align]
+ pho_reverse_base_prob_list = [np.array(Pho_Prob_Refer_Reverse_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Reverse_Dict else np.zeros(4) for k in current_pho_align_reverse]
+
+ assert len(pho_order_base_prob_list)==len(updated_base_list) and len(pho_order_base_prob_list)==len(pho_reverse_base_prob_list)
+ updated_base_score_list = [updated_base_list[j]+pho_order_base_prob_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ updated_base_score_list = np.array(updated_base_score_list)
+ #calculate the reverse score
+ #updated_base_score_reverse_list = [updated_base_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ #updated_base_score_reverse_list = np.array(updated_base_score_reverse_list)
+
+ cur_path_save_path = os.path.join(greedy_save_path,"path_%d"%path_id)
+ mkdir(cur_path_save_path)
+ from multiprocessing import Pool
+ p= Pool(num_cpus)
+
+ start_ldp_index =0
+ count_path_frag = 0
+ Res_List=[]
+ while start_ldp_index=0.5:
+ maintain_label=False
+ break
+ if maintain_label:
+ Remain_ID.append(i)
+ print("%d/%d edges are remained after prob map checking"%(len(Remain_ID),len(graph_edge)))
+ return Remain_ID
+
+def verify_edge_criteria(location1,location2,base_prob,sp_prob,prob_threshold):
+ divide_check =5
+ direction = location2 - location1
+ #sample divide_check+1 evenly spaced points along the edge; if any breaks the rules below, drop the edge
+ maintain_label=True
+ for k in range(0,divide_check+1):
+ current_location = location1+direction*k/divide_check
+ x = int(current_location[0])
+ y = int(current_location[1])
+ z = int(current_location[2])
+ current_sp_prob = sp_prob[x,y,z]
+ current_base_prob = base_prob[x,y,z]
+ if current_sp_prob<=prob_threshold:
+ maintain_label=False
+ break
+ if current_base_prob>=0.5:
+ maintain_label=False
+ return False
+
+ return maintain_label
+
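+#verify_edge_criteria samples divide_check+1 evenly spaced points (both endpoints
+#included); an edge survives only if every sample keeps sp_prob above prob_threshold
+#and base_prob below 0.5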
+def extend_graph_edge_connect(pho_graph,merged_cd_dens,sp_prob,base_prob,tmp_save_path,cutoff,prob_threshold=0.1):
+ node_pair_path = os.path.join(tmp_save_path,"Edge_extend_info.txt")
+
+ if not os.path.exists(node_pair_path):
+ distance_array = cdist(merged_cd_dens,merged_cd_dens)
+ #get all the connected information
+ connect_info=set()
+ graph_edge = pho_graph.edge
+ for i in range(len(graph_edge)):
+ id1= graph_edge[i].id1
+ id2 = graph_edge[i].id2
+ min_id = min(id1,id2)
+ max_id = max(id1,id2)
+ connect_info.add("%d_%d"%(min_id,max_id))
+
+ Add_Info=[]#[id1,id2,distance]
+ for k in range(len(distance_array)):
+ current_distance = distance_array[k]
+ connect_index_list = np.argwhere(current_distance=cut_off_length:
+ select_lists.append(tmp_list)
+ tmp_list = [coordinate_list[k]]
+ if len(tmp_list)>1:
+ select_lists.append(tmp_list)
+ return select_lists
+def Prune_Selected_Path(tmp_save_path,listfiles,merged_cd_dens,drna_graph,map_info_list,ext_name,cut_off_length):
+ split_save_path = os.path.join(tmp_save_path,"Prune")
+ mkdir(split_save_path)
+ count_iter=0
+ All_Node_Path_List=[]
+ for j,item in enumerate(listfiles):
+ cur_cif_path = os.path.join(tmp_save_path,item)
+ #extract the coordinates
+ extract_coord_list = Extract_CIF_coord(cur_cif_path) #order indicates connections
+ if len(extract_coord_list)<=2:
+ continue
+ #identify the corresponding node id
+ node_id_list = Identify_Trace_Node(merged_cd_dens,extract_coord_list,map_info_list)
+ All_Node_Path_List.append(node_id_list)
+ #first visualize the selected node
+ coordinate_list, edge_pairs = Pick_Graph_coord(node_id_list, merged_cd_dens,
+ drna_graph.edge, drna_graph.Nnode, map_info_list)
+
+ visualize_graph(split_save_path,ext_name+"_graph%d"%j,coordinate_list,edge_pairs)
+ select_path_list = Filter_Path_With_Edge(coordinate_list,edge_pairs,cut_off_length)
+ print("%d node path find %d sub path"%(len(coordinate_list),len(select_path_list)))
+ for select_path in select_path_list:
+ visualize_graph(split_save_path,ext_name+"_path%d"%count_iter,select_path,None)
+ count_iter+=1
+ return All_Node_Path_List
+
+def construct_graph(input_mrc,input_density,pho_point,sp_prob,base_prob,save_path,ext_name,map_info_list,params,prob_threshold=0.1,extend=True):
+ #construct the graph to build edges for further processing.
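+ #pipeline: set up the initial LDP graph, prune edges against the predicted probability
+ #maps, optionally extend with extra edges within params['R'], then recompute edge densities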
+ pho_graph = Main_Graph(params)
+ tmp_save_path = os.path.join(save_path,ext_name)
+ mkdir(tmp_save_path)
+ pho_graph.setup_graph(pho_point, tmp_save_path)
+ coordinate_list, edge_pairs = Pick_Graph_coord(list(np.arange(pho_graph.Nnode)), pho_point.merged_cd_dens,
+ pho_graph.edge, pho_graph.Nnode, map_info_list)
+ visualize_graph(tmp_save_path,ext_name,coordinate_list,edge_pairs)
+ #1.3 prune graph edges by prediction map
+ #rule 1: phosphate_prob<=prob_threshold
+ #rule 2: do not cross base regions >0.5
+ edge_prune_path = os.path.join(tmp_save_path,"edge_remain_id.pkl")
+ if not os.path.exists(edge_prune_path):
+ Remain_ID_List= Prune_Graph_Edge(pho_graph,pho_point.merged_cd_dens,sp_prob,base_prob,prob_threshold=prob_threshold)
+ with open(edge_prune_path, 'wb') as handle:
+ pickle.dump(Remain_ID_List, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ else:
+ with open(edge_prune_path, 'rb') as handle:
+ Remain_ID_List= pickle.load(handle)
+ pho_graph.edge = [edge for k,edge in enumerate(pho_graph.edge) if k in Remain_ID_List]
+ coordinate_list, edge_pairs = Pick_Graph_coord(list(np.arange(pho_graph.Nnode)), pho_point.merged_cd_dens,
+ pho_graph.edge, pho_graph.Nnode, map_info_list)
+ visualize_graph(tmp_save_path,ext_name+"_prune",coordinate_list,edge_pairs)
+ #add extra edges: connect every node pair within params['R'] whose segment never crosses a base region (base_prob>0.5) and keeps phosphate_prob>=threshold
+ if extend:
+ extend_graph_edge_connect(pho_graph,pho_point.merged_cd_dens,sp_prob, base_prob,tmp_save_path,params['R'],prob_threshold)
+ coordinate_list, edge_pairs = Pick_Graph_coord(list(np.arange(pho_graph.Nnode)), pho_point.merged_cd_dens,
+ pho_graph.edge, pho_graph.Nnode, map_info_list)
+ visualize_graph(tmp_save_path,ext_name+"_extend",coordinate_list,edge_pairs)
+ pho_graph.Ne = len(pho_graph.edge)
+ edge_d_dens= pho_graph.set_edge_dens(tmp_save_path,input_mrc,pho_point,input_density)
+ return pho_graph,coordinate_list,edge_pairs,edge_d_dens
diff --git a/CryoREAD/graph/LDP_ops.py b/CryoREAD/graph/LDP_ops.py
new file mode 100644
index 0000000..1d168c5
--- /dev/null
+++ b/CryoREAD/graph/LDP_ops.py
@@ -0,0 +1,187 @@
+
+import os
+import numpy as np
+from structure.Points import Points
+from data_processing.map_utils import permute_ns_coord_to_pdb,permute_map_coord_to_pdb,permute_pdb_coord_to_map
+from graph.io_utils import save_LDP_map
+from graph.visualize_utils import Show_Graph_Connect,Show_Bfactor_cif
+from scipy.spatial.distance import cdist
+def Extract_LDP_coord(merged_cd_dens, mapc, mapr, maps, origin, nxstart, nystart, nzstart):
+ #1st filter out all the coordinates in this cluster
+ nstart = [nxstart,nystart,nzstart]
+ nstart = permute_ns_coord_to_pdb(nstart,mapc,mapr,maps)
+ new_origin = [origin[k]+nstart[k] for k in range(3)]
+ All_location = []
+ for node_id in range(len(merged_cd_dens)):
+ node_id = int(node_id)
+ location = merged_cd_dens[node_id,:3]
+ location = permute_map_coord_to_pdb(location,mapc,mapr,maps)
+ All_location.append([location[k]+new_origin[k] for k in range(3)])
+ return All_location
+def Convert_LDPcoord_To_Reallocation(Coordinate_List, map_info_list):
+ #1st filter out all the coordinates in this cluster
+ mapc, mapr, maps, origin, nxstart, nystart, nzstart = map_info_list
+ nstart = [nxstart,nystart,nzstart]
+ nstart = permute_ns_coord_to_pdb(nstart,mapc,mapr,maps)
+ new_origin = [origin[k]+nstart[k] for k in range(3)]
+ All_location = []
+ for node_id in range(len(Coordinate_List)):
+ node_id = int(node_id)
+ location =Coordinate_List[node_id]
+ location = permute_map_coord_to_pdb(location,mapc,mapr,maps)
+ All_location.append([location[k]+new_origin[k] for k in range(3)])
+ return All_location
+
+def calculate_merge_point_density(point,prob_array):
+ before_merge_data = point.merged_data #[merge_to_id, x, y, z, density, merge_status]; the 1st column is the id of the LDP point this grid point was merged into
+ after_merge_data = point.merged_cd_dens
+ Output_Prob = np.zeros(len(after_merge_data))
+ count_isolate=0
+ for k in range(len(before_merge_data)):
+ merge_to_id,x,y,z,density,merge_status=before_merge_data[k]
+ x,y,z = int(x),int(y),int(z)
+ if merge_to_id==k and merge_status==-1:
+ count_isolate+=1
+ continue
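+ #follow the merge chain: merge_status==-1 means "look up merge_to_id next"; the
+ #first non-negative merge_status is the index of the final merged LDP point that
+ #accumulates this grid point's probability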
+ while merge_status==-1:
+ merge_to_id = int(merge_to_id)
+ merge_to_id,_,_,_,_,merge_status=before_merge_data[merge_to_id]
+ final_id = int(merge_status)
+ Output_Prob[final_id]+=prob_array[x,y,z]
+ point.merge_prob = Output_Prob
+ print("in total %d isolated grid points"%count_isolate)
+ Output_Prob = np.zeros(len(after_merge_data))
+ for k in range(len(after_merge_data)):
+ x,y,z,_ = after_merge_data[k]
+ x,y,z = int(x),int(y),int(z)
+ Output_Prob[k]=prob_array[x,y,z]
+ point.point_prob = Output_Prob
+ return point
+
+
+def build_LDP(input_mrc,sugar_density, sugar_Nact,origin_map_path,save_path,ext_name,params,map_info_list,relax_LDP=False):
+ #relax_LDP constrains the mean shift to a limited move distance and length
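+ #build_LDP overview: mean-shift the predicted density into local dense points, merge
+ #nearby points into LDPs, then export the result as .mrc/.pdb/.cif files for inspection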
+ mean_shift_path = os.path.join(save_path, ext_name+'_mean_shift')
+ sugar_point = Points(params, sugar_Nact)
+ if relax_LDP:
+ if not os.path.exists(mean_shift_path + '_cd_relax.txt') \
+ or not os.path.exists(mean_shift_path + '_dens_rleax.txt'):
+ input_mrc.general_mean_shift(sugar_density,sugar_point, mean_shift_path,constriant=True)
+ else:
+ input_mrc.load_general_mean_shift(sugar_density,sugar_point, mean_shift_path,constriant=True)
+ else:
+ if not os.path.exists(mean_shift_path + '_cd.txt') \
+ or not os.path.exists(mean_shift_path + '_dens.txt'):
+ input_mrc.general_mean_shift(sugar_density,sugar_point, mean_shift_path)
+ else:
+ input_mrc.load_general_mean_shift(sugar_density,sugar_point, mean_shift_path)
+ #recover the raw density values (undo the division by Nori)
+ sugar_point.recover_density()
+ sugar_point_path = os.path.join(save_path, ext_name+'_point.txt')
+ # columns: init_id, x, y, z, density, merged_to_id
+ # the x,y,z here can be used to assign a detailed probability to each LDP point
+ if not os.path.exists(sugar_point_path):
+ sugar_point.Merge_point(input_mrc, sugar_point_path) # You will get a merged point file here.
+ else:
+ sugar_point.load_merge(input_mrc, sugar_point_path)
+ merged_cd_dens = sugar_point.merged_cd_dens#np.loadtxt(sugar_point_path[:-4] + 'onlymerged.txt')
+ if len(merged_cd_dens)>0:
+ LDP_save_path = os.path.join(save_path,ext_name+"_LDP.mrc")
+ save_LDP_map(LDP_save_path, merged_cd_dens, origin_map_path)
+ mapc, mapr, maps, origin, nxstart, nystart, nzstart = map_info_list
+ All_location = Extract_LDP_coord(merged_cd_dens,mapc, mapr, maps, origin, nxstart, nystart, nzstart)
+ graph_path = os.path.join(save_path, ext_name+"_LDP.pdb")
+ Show_Graph_Connect(All_location, [], graph_path)
+ #plot in b factor
+ ldp_prob_path = os.path.join(save_path, ext_name+"_LDPdens.cif")
+ Show_Bfactor_cif(ext_name+"_dens",All_location,ldp_prob_path,sugar_point.merged_cd_dens[:,3])
+ #Get each LDP's sum probability values from its neighbors
+ sugar_point = calculate_merge_point_density(sugar_point,sugar_density)
+
+ #empirically, the LDP density correlates very well with the real phosphate positions
+ return sugar_point
+from ops.os_operation import mkdir
+def Build_Base_LDP(input_mrc,chain_prob,base_prob_threshold,origin_map_path,params,map_info_list,save_path,filter_type=0):
+ mkdir(save_path)
+ base_name_list=['A','UT','C','G']
+ base_region_detection = chain_prob[-1]
+ no_base_region = base_region_detectionldp_intersection_starting_index:
+ return True
+ else:
+ return False
+
+def overlap_size(interval1,interval2):
+ ldp_intersection_starting_index = max(interval1[0],interval2[0])
+ ldp_intersection_ending_index = min(interval1[1],interval2[1])
+ if ldp_intersection_ending_index>ldp_intersection_starting_index:
+ return ldp_intersection_ending_index-ldp_intersection_starting_index
+ else:
+ return 0
+
+
+def calculate_identity(match_align1_seq,match_align2_seq):
+ count=0
+ assert len(match_align1_seq)==len(match_align2_seq)
+ for k in range(len(match_align1_seq)):
+ if match_align1_seq[k]==match_align2_seq[k]:
+ count+=1
+ return count/len(match_align1_seq)
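+#illustrative examples: overlap_size([0,10],[5,20]) == 5 (the shared span 5..10), and
+#calculate_identity("AUGC","AUGG") == 0.75 for two equal-length aligned strings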
+def build_collision_table(All_Base_Assign_List,checking_stride,ldp_size,
+ overall_dict,collision_save_path,soft_rule=1,identity_cutoff=0.8,
+ load_collision=True,overlap_cutoff=0.2):
+ #build n*n collision table for the assembling
+ order_key_index={}#map current order to original key
+ order_chain_index= {}#index of the chain candidate it corresponds to
+ key_order_index ={} #map original key to current order
+ order_index = 0
+ count_total_frag=0
+ #1. collect all relationship of chain
+ to_location_idx ={}
+ for k,cur_path_list in enumerate(All_Base_Assign_List):
+ current_base_list = All_Base_Assign_List[k]
+ start_ldp_index =0
+ while start_ldp_index=max_seqfrag_length*overlap_cutoff:
+ collision_table[i,j]=1
+ collision_table[j,i]=1
+ count_seq_collision1+=1
+ continue
+ else:
+ #if it's same direction and have overlap, then avoid assign
+ if overlap_size(match_intersect_interval1, match_intersect_interval2)>=max_seqfrag_length*overlap_cutoff:
+ collision_table[i,j]=1
+ collision_table[j,i]=1
+ count_seq_collision2 +=1
+
+ continue
+
+ count_seq_acceptable+=1
+ if i%10==0:
+ print("collision table finished %d/%d"%(i,len(collision_table)))
+ print("-"*30)
+ print("ldp overlap collision report:")
+ print("1. different chain or different direction:%d"%count_ldp_collision1)
+ print("2. no overlap in sequence:%d"%count_ldp_collision2)
+ print("3. matched region sequence no overlap:%d"%count_ldp_collision3)
+ print("4. matched region sequence low identity:%d"%count_ldp_collision4)
+ print("# Total acceptable pass ldp overlap check: %d"%count_ldp_acceptable)
+
+ print("-"*30)
+ print("seq overlap collision report:")
+ print("1. interaction of different direction assignment:%d"%count_seq_collision1)
+ print("2. interaction of same direction assignment:%d"%count_seq_collision2)
+ print("3. nearby assignment did not satisfy distance constraint:%d"%count_seq_collision3)
+ print("# Total acceptable pass for seq in same chain:%d"%count_seq_acceptable)
+ print("-"*30)
+ np.save(collision_save_path,collision_table)
+ return collision_table,order_key_index,order_chain_index,key_order_index
+
+
+from ortools.sat.python import cp_model
+from ortools.linear_solver import pywraplp
+#from ortools.init import pywrapinit
+import numpy as np
+def prepare_score_array(N_order, order_key_index,order_chain_index,overall_dict):
+ score_array = np.zeros(N_order)
+ for k in range(N_order):
+ current_key1 = order_key_index[k]
+ current_chain_candidate1 = order_chain_index[k]
+ seq_info = overall_dict[current_key1][current_chain_candidate1]
+ current_score = seq_info['score']
+ score_array[k]=int(current_score*100)#keep 0.01 precision
+ return score_array
+
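+#note: scores are multiplied by 100 and truncated to ints above because CP-SAT only
+#optimizes integer objectives; this keeps 0.01 precision of the raw fragment scores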
+import time
+def solve_assignment(collision_table,order_key_index,order_chain_index,overall_dict,time_use=3600):
+ model = cp_model.CpModel()
+
+ x = []
+ for i in range(len(collision_table)):
+ x.append(model.NewBoolVar(f'x[{i}]'))
+
+ print('#Adding constraints')
+ #collision can't be true, if current assignment is determined
+ for i in range(len(collision_table)):
+ model.Add(sum(x[j] for j in range(len(collision_table)) if collision_table[i][j]==1)==0).OnlyEnforceIf(x[i])
+
+ score_array=prepare_score_array(len(collision_table), order_key_index,order_chain_index,overall_dict)
+ score_array = score_array.astype(int)
+ print('#Setting Objective Score')
+ # Objective
+ objective_terms = []
+ #[fid,rawsco,zsco,ali,tabu]
+ for i in range(len(collision_table)):
+ objective_terms.append(score_array[i]* x[i]) #Sum of Raw Scores
+ model.Maximize(sum(objective_terms))
+
+
+ print('#Start Solving...')
+ # Solve
+ print("Time limit is set to %d seconds"%time_use)
+ solver = cp_model.CpSolver()
+ solver.parameters.max_time_in_seconds = time_use
+ solver.parameters.num_search_workers = 8
+ solver.parameters.log_search_progress = True
+ solution_printer = cp_model.ObjectiveSolutionPrinter()
+ time1 = time.time()
+ status = solver.SolveWithSolutionCallback(model, solution_printer)
+ time2 = time.time()
+ print("Time used for solving:",time2-time1)
+ """
+ %unignore operations_research::MPSolver::ResultStatus;
+ %unignore operations_research::MPSolver::OPTIMAL; value 0
+ %unignore operations_research::MPSolver::FEASIBLE; value 1 // No unit test
+ %unignore operations_research::MPSolver::INFEASIBLE; value 2
+ %unignore operations_research::MPSolver::UNBOUNDED; value 3 // No unit test
+ %unignore operations_research::MPSolver::ABNORMAL; value 4
+ %unignore operations_research::MPSolver::NOT_SOLVED; value 5 // No unit test
+
+ OPTIMAL = _pywraplp.Solver_OPTIMAL
+ r optimal
+ FEASIBLE = _pywraplp.Solver_FEASIBLE
+ r feasible, or stopped by limit.
+ INFEASIBLE = _pywraplp.Solver_INFEASIBLE
+ r proven infeasible.
+ UNBOUNDED = _pywraplp.Solver_UNBOUNDED
+ r proven unbounded.
+ ABNORMAL = _pywraplp.Solver_ABNORMAL
+ r abnormal, i.e., error of some kind.
+ NOT_SOLVED = _pywraplp.Solver_NOT_SOLVED
+ r not been solved yet.
+ """
+ print("current status:",status)
+ results=[]
+ if status == pywraplp.Solver.OPTIMAL or status==pywraplp.Solver.FEASIBLE:
+ try:
+ for i in range(len(collision_table)):
+ if solver.BooleanValue(x[i]):
+ #print(i,'ID=',idtbl[i]) #order->ID
+ results.append(i)
+ except:
+ print("solution status is optimal or feasible, but error raised. ",results)
+ results=[]
+
+ # elif status == pywraplp.Solver.INFEASIBLE or status == pywraplp.Solver.UNBOUNDED:
+ # print("current status:",status)
+ # print("no optimal or feasible solution found for assembling, use temporary solution for final results.")
+ # try:
+ # for i in range(len(collision_table)):
+ # if solver.BooleanValue(x[i]):
+ # #print(i,'ID=',idtbl[i]) #order->ID
+ # results.append(i)
+ # except:
+ # return []#indicate falure
+ else:
+ print("*"*100)
+ print("current status:",status)
+ print('No solution found for assembling, please contact the developer! You can also check CryoREAD_noseq.pdb as a temporary result.')
+ print("*"*100)
+ return results
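+#minimal usage sketch (hypothetical 3-fragment case where only fragments 0 and 2 collide):
+# collision = np.array([[0,0,1],[0,0,0],[1,0,0]])
+# chosen = solve_assignment(collision, order_key_index, order_chain_index, overall_dict)
+#the solver maximizes the summed fragment scores over all pairwise non-colliding subsets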
diff --git a/CryoREAD/graph/assignment_ext.py b/CryoREAD/graph/assignment_ext.py
new file mode 100644
index 0000000..0f0f296
--- /dev/null
+++ b/CryoREAD/graph/assignment_ext.py
@@ -0,0 +1,144 @@
+import numpy as np
+import os
+
+import pickle
+from collections import defaultdict
+def identify_last_match(match_seq):
+ match_index=0
+ for k in range(len(match_seq)):
+ if match_seq[k]=="-":
+ continue
+ match_index=k
+ return match_index
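+#e.g. identify_last_match("AU--G-") == 4, the index of the last non-gap character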
+def reorganize_missing_list(missing_list,define_ldp_size):
+ final_missing_list = []
+ for k in range(len(missing_list)):
+ start,end = missing_list[k]
+ if end-start<=define_ldp_size:
+ final_missing_list.append([start,end])
+ else:
+ gap_range=end-start
+ # divide_num = int(np.ceil(gap_range/define_ldp_size))
+ # small_gap = int(gap_range/divide_num)
+ # print("gap design %d, we need to go through %d frags"%(small_gap,divide_num))
+ # print("previous grap %d/%d"%(start,end))
+ # for kk in range(divide_num):
+ # if kk!=divide_num-1:
+ # final_missing_list.append([start,start+small_gap-1])
+ # start+=small_gap
+ # else:
+ # final_missing_list.append([start,end])
+ max_check_length= define_ldp_size-1
+ while startgap_limit:
+ Missing_Range_Dict[ldp_path_id].append([0,starting_index])
+ else:
+ cur_starting_index,cur_ending_index = identified_ldp_frag_list[k]
+ next_starting_index,next_ending_index = identified_ldp_frag_list[k+1]
+ if next_starting_index-cur_ending_index<=gap_limit:
+ continue
+ current_match_info = current_path_match_info[k]#find the last matched residue index and use it as the starting reference
+ cur_last_real_match_idx = identify_last_match(current_match_info['match_seq'])+cur_starting_index
+ Missing_Range_Dict[ldp_path_id].append([cur_last_real_match_idx,next_starting_index])
+ total_ldp_list_length = len(cur_sugar_ldp_path_list)
+ final_ending_frag_index = identified_ldp_frag_list[-1][1]
+ if total_ldp_list_length-final_ending_frag_index>gap_limit:
+ current_match_info = current_path_match_info[-1]#find the last matched residue index and use it as the starting reference
+ cur_last_real_match_idx = identify_last_match(current_match_info['match_seq'])+identified_ldp_frag_list[-1][0]
+ Missing_Range_Dict[ldp_path_id].append([cur_last_real_match_idx,total_ldp_list_length])
+ Path_Assign_Dict=defaultdict(dict)
+ #3 fill the missing alignment region simply use dp results.
+ for ldp_path_id in range(len(All_Path_List_sugar)):
+ cur_sugar_ldp_path_list = All_Path_List_sugar[ldp_path_id]
+ missing_list = Missing_Range_Dict[ldp_path_id]
+ missing_list = reorganize_missing_list(missing_list,define_ldp_size)#split long gaps: in strict mode some missing regions are even bigger than define_ldp_size
+ for k in range(len(missing_list)):
+ fill_starting_index, fill_ending_index = missing_list[k]
+ current_key = "%d_%d"%(ldp_path_id,int(fill_starting_index))
+
+ if len(cur_sugar_ldp_path_list)-fill_starting_index<=gap_limit:
+ continue
+ current_seq_info = overall_dict[current_key]
+ iter_check=0
+ while len(current_seq_info)==0 and iter_check<=5:
+ fill_starting_index-=1
+ current_key = "%d_%d"%(ldp_path_id,int(fill_starting_index))
+ current_seq_info = overall_dict[current_key]
+ iter_check+=1
+ if iter_check>5:
+ #search another direction
+ iter_check=0
+ while len(current_seq_info)==0 and iter_check<=5:
+ fill_starting_index+=1
+ current_key = "%d_%d"%(ldp_path_id,int(fill_starting_index))
+ current_seq_info = overall_dict[current_key]
+ iter_check+=1
+ if iter_check>5:
+ continue
+ print("extend for %d-%d"%(fill_starting_index,fill_ending_index))
+ current_seq_info = overall_dict[current_key][0]#use top 1 as match candidate
+ print("before info:",current_seq_info)
+ remove_match_seq = current_seq_info['match_seq'][(fill_ending_index-fill_starting_index):]
+ current_seq_info['match_seq']=current_seq_info['match_seq'][:fill_ending_index-fill_starting_index]
+
+ count_remove_match_num = len([1 for tmp_match_id in remove_match_seq if tmp_match_id!="-"])
+
+ current_seq_info['interval'][1]=current_seq_info['interval'][1]-count_remove_match_num
+ print("previous info:",current_seq_info)
+ Path_Assign_Dict[ldp_path_id][fill_starting_index]= current_seq_info
+ with open(extend_support_path, 'wb') as handle:
+ pickle.dump(Path_Assign_Dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+
+ return Path_Assign_Dict
+
+
+
+
+
diff --git a/CryoREAD/graph/geo_structure_modeling.py b/CryoREAD/graph/geo_structure_modeling.py
new file mode 100644
index 0000000..aeb564a
--- /dev/null
+++ b/CryoREAD/graph/geo_structure_modeling.py
@@ -0,0 +1,120 @@
+
+from collections import defaultdict
+import numpy as np
+import os
+from ops.os_operation import mkdir
+from graph.io_utils import append_cif_info
+from atomic.io_utils import Write_Atomic_Fraginfo_cif
+from graph.structure_utils import clean_pho_assign_info_list
+from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+from ops.cif_utils import cif2pdb
+def build_cif_PS(current_path_dict,path_id,ldp_sugar_location,
+ Path_P_align_list,pho_point,map_info_list):
+ pho_merged_cd = pho_point.merged_cd_dens[:,:3]
+ current_seq_info = current_path_dict
+ match_seq = current_seq_info['match_seq']
+ cur_score = current_seq_info['score']
+ current_ldp_location = ldp_sugar_location
+ current_pho_ldp_align_info = Path_P_align_list
+ current_pho_ldp_location = [pho_merged_cd[int(kk)] for kk in current_pho_ldp_align_info ]
+ current_pho_ldp_location = Convert_LDPcoord_To_Reallocation(current_pho_ldp_location , map_info_list)
+ #start to ensemble all the information
+ fragment_info_list = []
+ current_seq_index = 1
+ if path_id>=62:
+ path_id= path_id%62
+ chain_dict={}
+ for k in range(26):
+ chain_dict[k]=chr(65+k)
+ for k in range(26):
+ chain_dict[26+k]=chr(97+k)
+ for k in range(10):
+ chain_dict[52+k]="%d"%k
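+ #62 chain symbols in total: 0-25 -> 'A'-'Z', 26-51 -> 'a'-'z', 52-61 -> '0'-'9';
+ #path ids beyond 61 wrap around via the modulo above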
+ align_length= len([x for x in match_seq if x!="-"])
+ avg_score = cur_score/align_length
+ chain_id = chain_dict[int(path_id)]
+ for k in range(len(match_seq)):
+ if match_seq[k]=="-":
+ continue
+ else:
+ cur_pho_tmp_position = current_pho_ldp_location[k]
+ if current_pho_ldp_align_info[k]>=0:#make sure it has assigned P
+ fragment_info_list.append([chain_id,current_seq_index+1,'P',cur_pho_tmp_position,match_seq[k],avg_score])
+ cur_resi_position = current_ldp_location[k]
+ fragment_info_list.append([chain_id,current_seq_index+1,"C4'",cur_resi_position,match_seq[k],avg_score])
+
+ current_seq_index+=1
+
+ return fragment_info_list,cur_score
+
+
+
+def build_atomic_fragment_cluster_cif_SP(Path_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ save_dir,pho_point,map_info_list,refer_base_location,ext_name="geo"):
+ overall_score=0
+ check_file_list=[]
+ for ldp_path_id in Path_Assign_Dict.keys():
+ cur_ldp_sugar_location= all_sugar_location[ldp_path_id]
+ cur_pho_ldp_nodeid_list = Path_P_align_list[ldp_path_id]
+ cur_pho_ldp_nodeid_reverse_list = Path_P_reverse_align_list[ldp_path_id]
+ current_path_dict = Path_Assign_Dict[ldp_path_id]
+ if current_path_dict['direction']==1:
+ input_pho_nodeid = cur_pho_ldp_nodeid_list
+ else:
+ input_pho_nodeid = cur_pho_ldp_nodeid_reverse_list
+ tmp_save_path = os.path.join(save_dir,"%s_%d_path.cif"%(ext_name, ldp_path_id))
+
+ fragment_info_list,frag_score = build_cif_PS(current_path_dict,ldp_path_id, cur_ldp_sugar_location,
+ input_pho_nodeid,pho_point,map_info_list)
+ overall_score+=frag_score
+ if len(fragment_info_list)<=2:
+ #not enough to build an atomic model, skip it.
+ continue
+ #clean pho locations in case we have one pho assigned to 2 sugars in the path
+ fragment_info_list,_ = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list)
+ fragment_info_list,further_flag = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list,round=2)
+ if further_flag:
+ fragment_info_list,_ = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list,round=2)
+ if len(fragment_info_list)<=2:
+ #not enough to build an atomic model, skip it.
+ continue
+ Write_Atomic_Fraginfo_cif("%s_%d_cluster"%(ext_name,ldp_path_id),fragment_info_list,refer_base_location,tmp_save_path,False,map_info_list)
+ check_file_list.append(tmp_save_path)
+
+ return check_file_list,overall_score
+
+def Build_Atomic_Structure(overall_dict,
+ all_sugar_location, save_dir,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,refer_base_location):
+ frag_dir = os.path.join(save_dir,"Path_Atomic_Frags")
+ mkdir(frag_dir)
+
+ check_file_list,overall_score = build_atomic_fragment_cluster_cif_SP(overall_dict,
+ all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ frag_dir,pho_point,map_info_list,refer_base_location)
+
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_geo.cif")
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ for item in check_file_list:
+ #cur_frag_path = os.path.join(save_dir,item)
+ cur_frag_name = os.path.split(item)[1]
+ cur_entry_id = cur_frag_name.replace(".cif","")
+ append_cif_info(cur_entry_id,item,fragment_all_path)
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_geo.pdb")
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ pdb_file_list=[]
+ for item in check_file_list:
+ pdb_file_name = item.replace(".cif",".pdb")
+ cif2pdb(item,pdb_file_name)
+ pdb_file_list.append(pdb_file_name)
+ with open(fragment_all_path,'a+') as wfile:
+ for item in pdb_file_list:
+ with open(item,'r') as rfile:
+ for line in rfile:
+ wfile.write(line)
+
+
diff --git a/CryoREAD/graph/geo_utils.py b/CryoREAD/graph/geo_utils.py
new file mode 100644
index 0000000..80ef9e9
--- /dev/null
+++ b/CryoREAD/graph/geo_utils.py
@@ -0,0 +1,41 @@
+
+from scipy.spatial.distance import cdist
+import numpy as np
+from collections import defaultdict
+
+from graph.LDP_ops import permute_point_coord_to_global_coord
+
+
+
+def Match_Sugar_Base_Location(Path_ID_List,sugar_point,Base_LDP_List,map_info_list):
+ #a dict that matches [global_sugar_coord]:[global_assigned_base_coord]
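+ #each sugar node is paired with its nearest base LDP point (argmin over the cdist
+ #matrix); dictionary keys are the sugar's global coordinates formatted to 4 decimals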
+ sugar_coordinate = sugar_point.merged_cd_dens[:,:3]
+
+ Base_Coord_List = []
+ for base_ldp in Base_LDP_List:
+ if len(base_ldp.merged_cd_dens)>0:
+ base_coord = base_ldp.merged_cd_dens[:,:3]
+ Base_Coord_List.append(base_coord)
+ Base_Coord_List = np.concatenate(Base_Coord_List,axis=0)
+ sb_distance = cdist(sugar_coordinate,Base_Coord_List)
+ Base_refer_dict={}
+ for cur_path_list in Path_ID_List:
+ current_length = len(cur_path_list)
+ for k in range(current_length):
+ node_id1 = int(cur_path_list[k])
+
+ node_id1_sugar_location = sugar_coordinate[node_id1]
+ check_location = ""
+ for kk in range(3):
+ check_location+="%.4f,"%node_id1_sugar_location[kk]
+ sb_distance_node1 = sb_distance[node_id1]
+ nearby_index = np.argmin(sb_distance_node1)
+ cur_base_location = Base_Coord_List[nearby_index]
+ global_sugar_coord = permute_point_coord_to_global_coord(node_id1_sugar_location,map_info_list)
+ new_key = ""
+ for k in range(3):
+ new_key+="%.4f,"%(global_sugar_coord[k])
+ global_base_coord = permute_point_coord_to_global_coord(cur_base_location,map_info_list)
+ Base_refer_dict[new_key]=global_base_coord
+ return Base_refer_dict
diff --git a/CryoREAD/graph/io_utils.py b/CryoREAD/graph/io_utils.py
new file mode 100644
index 0000000..574d775
--- /dev/null
+++ b/CryoREAD/graph/io_utils.py
@@ -0,0 +1,60 @@
+
+import mrcfile
+import numpy as np
+def save_LDP_map(save_map_path,LDP_array,origin_map_path):
+
+ with mrcfile.open(origin_map_path, permissive=True) as mrc:
+ prev_voxel_size = mrc.voxel_size
+ prev_voxel_size_x = float(prev_voxel_size['x'])
+ prev_voxel_size_y = float(prev_voxel_size['y'])
+ prev_voxel_size_z = float(prev_voxel_size['z'])
+ nx, ny, nz, nxs, nys, nzs, mx, my, mz = \
+ mrc.header.nx, mrc.header.ny, mrc.header.nz, \
+ mrc.header.nxstart, mrc.header.nystart, mrc.header.nzstart, \
+ mrc.header.mx, mrc.header.my, mrc.header.mz
+ orig = mrc.header.origin
+ print("Origin:", orig)
+ print("Previous voxel size:", prev_voxel_size)
+ print("nx, ny, nz", nx, ny, nz)
+ print("nxs,nys,nzs", nxs, nys, nzs)
+ print("mx,my,mz", mx, my, mz)
+ data= mrc.data
+ prediction = np.zeros(data.shape)
+ for k in range(len(LDP_array)):
+ x,y,z,density = LDP_array[k]
+ x, y, z = int(x), int(y),int(z)
+ prediction[x,y,z]=density
+
+ data_new = np.float32(prediction)
+ mrc_new = mrcfile.new(save_map_path, data=data_new, overwrite=True)
+ vsize = mrc_new.voxel_size
+ vsize.flags.writeable = True
+ vsize.x = 1.0
+ vsize.y = 1.0
+ vsize.z = 1.0
+ mrc_new.voxel_size = vsize
+ mrc_new.update_header_from_data()
+ mrc_new.header.nxstart = nxs * prev_voxel_size_x
+ mrc_new.header.nystart = nys * prev_voxel_size_y
+ mrc_new.header.nzstart = nzs * prev_voxel_size_z
+ mrc_new.header.mapc = mrc.header.mapc
+ mrc_new.header.mapr = mrc.header.mapr
+ mrc_new.header.maps = mrc.header.maps
+ mrc_new.header.origin = orig
+ mrc_new.update_header_stats()
+ mrc.print_header()
+ mrc_new.print_header()
+ mrc_new.close()
+ del data_new
+
+
+def append_cif_info(cur_entry_id,cur_frag_path,fragment_all_path):
+ with open(fragment_all_path,'a+') as wfile:
+ with open(cur_frag_path,'r') as rfile:
+ wfile.write("#\n")
+ wfile.write("data_"+str(cur_entry_id)+"\n")
+ wfile.write("_entry.id "+str(cur_entry_id)+"\n")
+ wfile.write("#\n")
+ for j,line in enumerate(rfile):
+ if j>=2:
+ wfile.write(line)
diff --git a/CryoREAD/graph/ortool_path_ops.py b/CryoREAD/graph/ortool_path_ops.py
new file mode 100644
index 0000000..d0c317c
--- /dev/null
+++ b/CryoREAD/graph/ortool_path_ops.py
@@ -0,0 +1,164 @@
+
+
+from ortools.constraint_solver import routing_enums_pb2
+from ortools.constraint_solver import pywrapcp
+import numpy as np
+import os
+from graph.visualize_utils import visualize_graph
+
+def create_ortool_data_model(coordinate_list,edge_pairs,edge_density, num_vehicles=20,cutoff_length=10,relax_choice=True):
+ """Stores the data for the problem."""
+ max_Value = 99999999
+ adj_matrix = np.ones([len(coordinate_list)+1,len(coordinate_list)+1])*max_Value#effectively infinite cost, so unlisted connections are avoided
+ #adj matrix must be an integer matrix
+ for k in range(len(edge_pairs)):
+ node1, node2 = edge_pairs[k]
+ current_prob = edge_density[k]
+ #edge-pair node ids already start at 1 (index 0 is reserved for the pseudo depot)
+ adj_matrix[node1,node2]= int(current_prob*100)
+ adj_matrix[node2,node1]= int(current_prob*100)
+
+ print("number of edges in the data: ",len(np.argwhere(adj_matrix!=max_Value)))
+ print("distance mean %f"%np.mean(adj_matrix[adj_matrix!=max_Value]))
+
+ #adj_matrix[adj_matrix==99999]=sys.maxsize
+ adj_matrix[0,:]=0#node 0 is a zero-cost pseudo depot where every route starts and ends
+ adj_matrix[:,0]=0
+ adj_matrix[np.arange(1,len(coordinate_list)+1),np.arange(1,len(coordinate_list)+1)]=0
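+ #with the zero-cost depot, every vehicle may start and end its route anywhere among
+ #the real nodes, whose indices are shifted up by one relative to coordinate_list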
+
+
+
+ #add fallback edges between the remaining node pairs, with heavily penalized costs
+ if relax_choice:#necessary so the solver can still find useful candidates
+ max_distance = np.max(adj_matrix[adj_matrix!=max_Value])
+ for i in range(1,len(coordinate_list)+1):
+ for j in range(1,len(coordinate_list)+1):
+ if adj_matrix[i,j]==max_Value:
+ tmp_coord1 = coordinate_list[int(i)-1]
+ tmp_coord2 = coordinate_list[int(j)-1]
+ cur_distance = 0
+ for kk in range(3):
+ cur_distance+=(tmp_coord1[kk]-tmp_coord2[kk])**2
+ cur_distance = np.sqrt(cur_distance)
+ if cur_distance>=cutoff_length:
+ adj_matrix[i,j] = int(2*max_distance+cur_distance*200)#big penalty for them
+ else:
+ adj_matrix[i,j] = int(max_distance+cur_distance*100)
+ print("updated distance mean %f"%np.mean(adj_matrix[adj_matrix!=max_Value]))
+ data = {}
+ data['distance_matrix'] = adj_matrix#final_adj_matrix
+ data['num_vehicles'] = num_vehicles
+ print("number vehicles: ",num_vehicles,"adj matrix shape: ",len(adj_matrix))
+ data['depot'] = 0
+ return data
+
+def print_solution_vrp(data, manager, routing, solution):
+ """Prints solution on console."""
+ print(f'Objective: {solution.ObjectiveValue()}')
+ max_route_distance = 0
+ All_Route_List = []
+ for vehicle_id in range(data['num_vehicles']):
+ cur_Route_List =[]
+ index = routing.Start(vehicle_id)
+ plan_output = 'Route for vehicle {}:\n'.format(vehicle_id)
+ route_distance = 0
+ while not routing.IsEnd(index):
+ plan_output += ' {} -> '.format(manager.IndexToNode(index))
+ previous_index = index
+ cur_Route_List.append(manager.IndexToNode(index))
+ index = solution.Value(routing.NextVar(index))
+ route_distance += routing.GetArcCostForVehicle(
+ previous_index, index, vehicle_id)
+ plan_output += '{}\n'.format(manager.IndexToNode(index))
+ cur_Route_List.append(manager.IndexToNode(index))
+ plan_output += 'Distance of the route: {}m\n'.format(route_distance)
+ print(plan_output)
+ All_Route_List.append(cur_Route_List)
+ max_route_distance = max(route_distance, max_route_distance)
+ print('Maximum of the route distances: {}m'.format(max_route_distance))
+ return All_Route_List
+
+def build_travel_path(coordinate_list,travel_list):
+ final_coord =[]
+ final_pair = []
+ for k in range(len(travel_list)):
+ travel_id = int(travel_list[k])
+ if travel_id==0:
+ continue
+ travel_id -=1
+ final_coord.append(coordinate_list[travel_id])
+ for k in range(len(final_coord)-1):
+ final_pair.append([k+1,k+2])
+ return final_coord,final_pair
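+#e.g. travel_list [0, 3, 1, 0] (depot, node 3, node 1, depot) yields the coordinates of
+#nodes 2 and 0 after the -1 shift, connected by the single edge pair [1, 2]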
+def ortools_build_path(save_path,coordinate_list,edge_pairs,edge_density,cutoff_length,relax_choice=True):
+
+ max_Value = 99999999
+
+ #number_vehicles = int(np.ceil(len(coordinate_list) / 100))
+ number_vehicles = int(np.ceil(len(coordinate_list) / 20))#dividing by 100 did not work well for very big structures with the new model
+ if number_vehicles<=5:
+ number_vehicles=5
+ ortool_data = create_ortool_data_model(coordinate_list, edge_pairs, edge_density, number_vehicles,cutoff_length,relax_choice=relax_choice)
+ distance_path = os.path.join(save_path, "distance.npy" )
+ np.save(distance_path, np.array(ortool_data['distance_matrix']))
+ distance_matrix = ortool_data['distance_matrix']
+ drop_penalty= int(2*100*np.max(edge_density)+cutoff_length*200)#int(np.max(distance_matrix[distance_matrix!=max_Value]))
+ print("drop penalty %d"%drop_penalty)
+ # Create the routing index manager.
+ manager = pywrapcp.RoutingIndexManager(len(ortool_data['distance_matrix']),
+ ortool_data['num_vehicles'], ortool_data['depot'])
+ routing = pywrapcp.RoutingModel(manager)
+
+ # for vehicle_id in range(ortool_data['num_vehicles']):
+ # routing.ConsiderEmptyRouteCostsForVehicle(True, vehicle_id)
+
+ def distance_callback(from_index, to_index):
+ """Returns the distance between the two nodes."""
+ # Convert from routing variable Index to distance matrix NodeIndex.
+ from_node = manager.IndexToNode(from_index)
+ to_node = manager.IndexToNode(to_index)
+ return ortool_data['distance_matrix'][from_node][to_node]
+ # Create Routing Model.
+
+ print("start routing")
+
+ transit_callback_index = routing.RegisterTransitCallback(distance_callback)
+
+ # Define cost of each arc.
+ routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
+ print("add capacity")
+ # add dimension for vrp program
+ dimension_name = 'Distance'
+ routing.AddDimension(
+ transit_callback_index,
+ 0, # no slack
+ max_Value, # vehicle maximum travel distance
+ True, # start cumul to zero
+ dimension_name)
+ print("add penalty")
+ # Allow to drop nodes.
+ penalty = drop_penalty
+ for node in range(1, len(ortool_data['distance_matrix'])):
+ routing.AddDisjunction([manager.NodeToIndex(node)], penalty)
+ # Setting first solution heuristic.
+ print("add params")
+ search_parameters = pywrapcp.DefaultRoutingSearchParameters()
+ search_parameters.first_solution_strategy = (
+ routing_enums_pb2.FirstSolutionStrategy.AUTOMATIC)
+ # search_parameters.first_solution_strategy = (
+ # routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
+ search_parameters.local_search_metaheuristic = (
+ routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH)
+ search_parameters.time_limit.seconds = max(3600,int(1000*len(coordinate_list)/400)) # allow up to a few hours of search
+ search_parameters.solution_limit = 1000
+ search_parameters.log_search = True
+ # Solve the problem.
+ solution = routing.SolveWithParameters(search_parameters)
+ if solution is None:
+ print("routing solver returned no solution within the limits")
+ return
+ travel_list = print_solution_vrp(ortool_data, manager, routing, solution)
+ for j, cur_travel_list in enumerate(travel_list):
+ travel_coord, travel_edge = build_travel_path(coordinate_list, cur_travel_list)
+ visualize_graph(save_path,"travel_path%d"%j,travel_coord,travel_edge)
+ tmp_save_path = os.path.join(save_path,"travel_record_%d.txt"%j)
+ with open(tmp_save_path,'w') as file:
+ for travel_id in cur_travel_list:
+ file.write("%d\n"%(travel_id-1))
diff --git a/CryoREAD/graph/path_utils.py b/CryoREAD/graph/path_utils.py
new file mode 100644
index 0000000..5004906
--- /dev/null
+++ b/CryoREAD/graph/path_utils.py
@@ -0,0 +1,40 @@
+from ops.os_operation import mkdir
+import pickle
+import os
+from graph.Graph_ops import Pick_Graph_coord,Filter_subgraph_edge_density,Prune_Selected_Path
+from graph.visualize_utils import visualize_graph
+from graph.ortool_path_ops import ortools_build_path
+
+def collect_all_searched_path(subgraphs, pho_point,pho_graph,search_dir,map_info_list,params):
+
+
+ all_path_file = os.path.join(search_dir,"All_path.pkl")
+ if os.path.exists(all_path_file):
+ with open(all_path_file, 'rb') as handle:
+ All_Path_List= pickle.load(handle)
+ return All_Path_List
+
+ All_Path_List = []
+ for cluster_id, subgraph in enumerate(subgraphs):
+ if len(subgraph)<5:
+ continue
+ subgraph_path = os.path.join(search_dir,"sub_graph%d"%cluster_id)
+ mkdir(subgraph_path)
+ pho_coordinate_list, pho_edge_pairs = Pick_Graph_coord(subgraph, pho_point.merged_cd_dens,
+ pho_graph.edge, pho_graph.Nnode, map_info_list)
+ visualize_graph(subgraph_path,"pho_graph%d"%cluster_id,pho_coordinate_list,pho_edge_pairs)
+ pho_edge_d_dens = Filter_subgraph_edge_density(subgraph,pho_graph.edge)
+ assert len(pho_edge_d_dens)==len(pho_edge_pairs)
+ listfiles = [x for x in os.listdir(subgraph_path) if ".cif" in x and 'path' in x]
+ listfiles.sort()
+ if len(listfiles)<1:
+ ortools_build_path(subgraph_path,pho_coordinate_list,pho_edge_pairs,pho_edge_d_dens,params['R'],relax_choice=True)
+ listfiles = [x for x in os.listdir(subgraph_path) if ".cif" in x and 'path' in x]
+ listfiles.sort()
+ Path_ID_List=Prune_Selected_Path(subgraph_path,listfiles,pho_point.merged_cd_dens,
+ pho_graph,map_info_list,"pho_prune",params['R'])
+ for item in Path_ID_List:
+ All_Path_List.append(item)
+ with open(all_path_file, 'wb') as handle:
+ pickle.dump(All_Path_List, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ return All_Path_List
diff --git a/CryoREAD/graph/reassign_ops.py b/CryoREAD/graph/reassign_ops.py
new file mode 100644
index 0000000..d1b1bf3
--- /dev/null
+++ b/CryoREAD/graph/reassign_ops.py
@@ -0,0 +1,177 @@
+
+
+from collections import defaultdict
+import numpy as np
+import os
+from ops.os_operation import mkdir
+from graph.DP_ops import Calculate_Distance_array,dynamic_assign_multi
+from graph.structure_modeling import build_clusters_advanced
+import pickle
+
+
+def reassign_basedonfrag(solve_frag_combine_list,order_key_index,order_chain_index,overall_dict,
+ all_sugar_location, ldp_size,save_dir,checking_stride,top_select,chain_dict,
+ All_Base_Path_List_sugar,All_Path_List_sugar,Path_P_align_list,Path_P_reverse_align_list,
+ Pho_Prob_Refer_Dict,Pho_Prob_Refer_Reverse_Dict,):
+ cur_frag_path = os.path.join(save_dir,"Fragment_reassign_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ if os.path.exists(cur_frag_path):
+ with open(cur_frag_path, 'rb') as handle:
+ Path_Assign_Dict= pickle.load(handle)
+
+ return Path_Assign_Dict
+ Path_Assign_Dict=defaultdict(dict)#[path_id][assignment starting_index]:[seq_info]
+ ldp_gap_penalty = 10 # only applied when skipping an LDP point is not reasonable
+ seq_gap_penalty = 25 # applied whenever we have to skip a nucleotide in the sequence
+ for order_index in solve_frag_combine_list:
+ order_index = int(order_index)
+ current_key1 = order_key_index[order_index]
+ current_chain_candidate1 = order_chain_index[order_index]
+ split_key1 = current_key1.split("_")
+ ldp_path1= int(split_key1[0])
+ ldp_starting_index1 = int(split_key1[1])
+ current_seq_info1 = overall_dict[current_key1][current_chain_candidate1]
+ Path_Assign_Dict[ldp_path1][ldp_starting_index1]=current_seq_info1
+ old_frag_path = os.path.join(save_dir,"Fragment_assemble_%d_%d_%d.pkl"%(ldp_size,checking_stride,top_select))
+ with open(old_frag_path, 'wb') as handle:
+ pickle.dump(Path_Assign_Dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+
+
+ Path_ReAssign_Dict=defaultdict(dict)
+ for ldp_path_id in Path_Assign_Dict.keys():
+ cur_ldp_sugar_location= all_sugar_location[ldp_path_id]
+ current_path_dict = Path_Assign_Dict[ldp_path_id]
+ all_starting_index_keys = [int(x) for x in current_path_dict.keys()]#should already be in order
+ all_seq_assign_length = [len(current_path_dict[tmp_path_id]['match_seq']) for tmp_path_id in current_path_dict]
+ assert len(all_starting_index_keys)==len(np.unique(all_starting_index_keys))
+ all_starting_index_keys = np.array(all_starting_index_keys)
+ all_seq_assign_length = np.array(all_seq_assign_length)
+ sorted_indexes = np.argsort(all_starting_index_keys)
+ all_starting_index_keys = all_starting_index_keys[sorted_indexes]
+ all_seq_assign_length = all_seq_assign_length[sorted_indexes]
+ interval_clusters = build_clusters_advanced(all_starting_index_keys,all_seq_assign_length)
+ current_path = All_Path_List_sugar[ldp_path_id]
+ current_pho_align = Path_P_align_list[ldp_path_id]
+ current_pho_align_reverse = Path_P_reverse_align_list[ldp_path_id]
+
+ current_base_list = All_Base_Path_List_sugar[ldp_path_id]
+ updated_base_list = [np.array(current_base_list[j]) for j in range(len(current_base_list))]
+ updated_base_list = np.array(updated_base_list)
+ #update the reward matrix based on the pho updates
+ pho_order_base_prob_list =[np.array(Pho_Prob_Refer_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Dict else np.zeros(4) for k in current_pho_align]
+ pho_reverse_base_prob_list = [np.array(Pho_Prob_Refer_Reverse_Dict[int(k)]) if k!=-1 and int(k) in Pho_Prob_Refer_Reverse_Dict else np.zeros(4) for k in current_pho_align_reverse]
+
+ assert len(pho_order_base_prob_list)==len(updated_base_list) and len(pho_order_base_prob_list)==len(pho_reverse_base_prob_list)
+ updated_base_score_list = [updated_base_list[j]+pho_order_base_prob_list[j]+pho_reverse_base_prob_list[j] for j in range(len(current_base_list))]
+ updated_base_score_list = np.array(updated_base_score_list)
+ for cluster_id in interval_clusters:
+ current_cluster_id_list = interval_clusters[cluster_id]
+ current_cluster_id_list.sort()
+
+ begin_index = current_cluster_id_list[0]
+ if len(current_cluster_id_list)==1:
+ #a non-overlapping fragment is simply kept as is.
+ Path_ReAssign_Dict[ldp_path_id][begin_index]=current_path_dict[begin_index]
+ continue
+ #cur_ldp_size = len(current_path_dict[current_cluster_id_list[-1]]['match_seq'])
+ #end_index = current_cluster_id_list[-1]+cur_ldp_size
+ #check all the matches to get the final extended region
+ end_index=-1
+ for k in range(len(current_cluster_id_list)):
+ cur_ldp_size = len(current_path_dict[current_cluster_id_list[k]]['match_seq'])
+ cur_end_index = current_cluster_id_list[k]+cur_ldp_size
+ if end_index<=cur_end_index:
+ end_index=cur_end_index
+
+ start_ldp_index=begin_index
+ end_ldp_index=end_index
+ fragment_ldp_location = cur_ldp_sugar_location[start_ldp_index:end_ldp_index]
+ #begin_index..end_index delimit the LDP points needed for the current DP re-run
+
+ #next step: find the sequence segment to assign
+ current_seq_info = current_path_dict[current_cluster_id_list[0]]
+ overall_direction = current_seq_info['direction']
+ overall_chain = current_seq_info['chain']
+ overall_chain_length = current_seq_info['chain_length']
+ init_interval = current_path_dict[current_cluster_id_list[0]]['interval']
+ for path_id in current_cluster_id_list:
+ current_interval = current_path_dict[path_id]['interval']
+ if current_interval[0]<=init_interval[0]:
+ init_interval[0]=current_interval[0]
+ if current_interval[1]>=init_interval[1]:
+ init_interval[1]=current_interval[1]
+ print("current interval:",init_interval)
+
+ current_chain_sequence = chain_dict[overall_chain]
+ if overall_direction==-1:
+ current_chain_sequence = current_chain_sequence[::-1]
+ begin_seq_index = max(0,int(init_interval[0]-ldp_size/2))
+ end_seq_index = min(overall_chain_length,int(init_interval[1]+ldp_size/2))
+ current_chain_sequence = current_chain_sequence[begin_seq_index:end_seq_index]
+
+
+
+
+ #then the fragment information to calculate dp
+
+ study_ldp_base_list = updated_base_score_list[start_ldp_index:end_ldp_index]
+ dp_save_path = os.path.join(save_dir,"path%d_starting%d_chain_%s"%(ldp_path_id,start_ldp_index,overall_chain))
+ mkdir(dp_save_path)
+ fragment_distance_array = Calculate_Distance_array(fragment_ldp_location)
+
+ max_score, match_seq_line,match_seq_interval = dynamic_assign_multi(study_ldp_base_list,current_chain_sequence,
+ ldp_gap_penalty,seq_gap_penalty,fragment_distance_array, dp_save_path)
+ path_assign_score_list = np.array(max_score)
+ path_score_index = np.argsort(path_assign_score_list)
+ path_score_index = path_score_index[::-1]#from bigger to smaller
+ for select_index in path_score_index[:1]:
+
+ current_score = max_score[select_index]
+ current_match_seq = match_seq_line[select_index]
+ current_match_seq_interval = match_seq_interval[select_index]
+
+
+ new_dict={}
+ new_dict['match_seq']=current_match_seq
+ new_dict['score']=current_score
+ new_dict['interval']= [begin_seq_index+current_match_seq_interval[0],begin_seq_index+current_match_seq_interval[1]]
+ new_dict['chain']=overall_chain
+ new_dict['direction']=overall_direction
+ new_dict['chain_length'] = overall_chain_length
+ Path_ReAssign_Dict[ldp_path_id][start_ldp_index]= new_dict
+ with open(cur_frag_path, 'wb') as handle:
+ pickle.dump(Path_ReAssign_Dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
+ return Path_ReAssign_Dict
+
+
+def merge_assign_geo_seq(overall_geo_dict,overall_seq_dict):
+ Final_Dict=defaultdict(dict)
+ for ldp_path_id in overall_geo_dict.keys():
+ current_dict = overall_geo_dict[ldp_path_id]
+ current_direction = current_dict['direction']
+ current_match_seq = current_dict['match_seq']
+ #read sequence assignment information
+ current_seqpath_dict = overall_seq_dict[ldp_path_id]
+ all_starting_index_keys = [int(x) for x in current_seqpath_dict.keys()]#should already be in order
+ all_seq_assign_length = [len(current_seqpath_dict[tmp_path_id]['match_seq']) for tmp_path_id in current_seqpath_dict]
+ assert len(all_starting_index_keys)==len(np.unique(all_starting_index_keys))
+ all_starting_index_keys = np.array(all_starting_index_keys)
+ all_seq_assign_length = np.array(all_seq_assign_length)
+ sorted_indexes = np.argsort(all_starting_index_keys)
+ all_starting_index_keys = all_starting_index_keys[sorted_indexes]
+ all_seq_assign_length = all_seq_assign_length[sorted_indexes]
+
+
+ for starting_index in all_starting_index_keys:
+ current_seq_info = current_seqpath_dict[int(starting_index)]
+ seqassign_match_seq = current_seq_info['match_seq']
+ #regardless of direction (0 or 1) we always assign from the left, so the pho/sugar locations never need to be switched back
+ #cur_direction = current_seq_info['direction']
+ prev_seq = current_match_seq
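+ #splicing example (illustrative): match_seq "AAAAAA" with "UCG" reassigned at starting_index 2 -> "AAUCGA"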
+ current_match_seq = current_match_seq[:starting_index]+seqassign_match_seq+current_match_seq[starting_index+len(seqassign_match_seq):]
+ print("reassign %d index of %s to %s"%(starting_index,prev_seq,current_match_seq))
+
+ print("final seq+geo sequence %s"%current_match_seq)
+ current_dict['match_seq']=current_match_seq
+ Final_Dict[ldp_path_id]=current_dict
+ return Final_Dict
diff --git a/CryoREAD/graph/refine_structure.py b/CryoREAD/graph/refine_structure.py
new file mode 100644
index 0000000..77eb025
--- /dev/null
+++ b/CryoREAD/graph/refine_structure.py
@@ -0,0 +1,52 @@
+import os
+from ops.os_operation import mkdir
+import shutil
+from data_processing.format_pdb import format_pdb,remove_op3_pdb
+def refine_structure(input_pdb,input_map,output_dir,params):
+ mkdir(output_dir)
+ assert input_pdb[-4:]==".pdb"
+ format_pdb_path = os.path.join(output_dir,"input_format.pdb")
+ format_pdb(input_pdb,format_pdb_path)
+ if "resolution" not in params:
+ params['resolution'] =5
+ print("no resolutino input detected, use 5A as default resolution for refinement!")
+ os.system('cd %s; phenix.real_space_refine %s %s resolution=%.4f '
+ 'output.suffix="_phenix_refine" '
+ 'skip_map_model_overlap_check=True'%(output_dir,
+ format_pdb_path,input_map,params['resolution']))
+ gen_pdb_path = format_pdb_path[:-4]+"_phenix_refine_000.pdb"
+ count_check=0
+ while not os.path.exists(gen_pdb_path) and count_check<5:
+ gen_pdb_path = format_pdb_path[:-4]+"_phenix_refine_00%d.pdb"%(count_check+1)
+ count_check+=1
+ if not os.path.exists(gen_pdb_path):
+ print("1st round phenix refinement failed!")
+ return
+ refine1_pdb_path = os.path.join(output_dir,"Refine_cycle1.pdb")
+ shutil.copy(gen_pdb_path,refine1_pdb_path)
+
+ refine2_pdb_path = os.path.join(output_dir,"Refine_cycle2.pdb")
+ from coot.coot_refine_structure import coot_refine_structure
+ coot_software="coot"
+ coot_refine_structure(refine1_pdb_path,input_map,refine2_pdb_path,coot_software)
+ if not os.path.exists(refine2_pdb_path):
+ print("2nd round coot refinement failed!")
+ return
+
+ refine3_pdb_path = os.path.join(output_dir,"Refine_cycle3.pdb")
+ os.system('cd %s; phenix.real_space_refine %s %s resolution=%.4f '
+ 'output.suffix="_phenix_refine" '
+ 'skip_map_model_overlap_check=True'%(output_dir,refine2_pdb_path,input_map,params['resolution']))
+ phenix_final_pdb = refine2_pdb_path[:-4]+"_phenix_refine_000.pdb"
+ count_check=0
+ while not os.path.exists(phenix_final_pdb) and count_check<5:
+ phenix_final_pdb = refine2_pdb_path[:-4]+"_phenix_refine_00%d.pdb"%(count_check+1)
+ count_check+=1
+ if not os.path.exists(phenix_final_pdb):
+ print("3rd round phenix refinement failed!")
+ return
+ shutil.move(phenix_final_pdb,refine3_pdb_path)
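+
+#minimal usage sketch (assumed inputs; phenix.real_space_refine and coot must be on PATH):
+#  refine_structure("model.pdb", "map.mrc", "refine_out", {"resolution": 3.5})
+#the final refined model is written as Refine_cycle3.pdb under the output directory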
+
+
+
+
diff --git a/CryoREAD/graph/structure_modeling.py b/CryoREAD/graph/structure_modeling.py
new file mode 100644
index 0000000..ccc73a4
--- /dev/null
+++ b/CryoREAD/graph/structure_modeling.py
@@ -0,0 +1,177 @@
+
+from collections import defaultdict
+import numpy as np
+import os
+from ops.os_operation import mkdir
+from graph.io_utils import append_cif_info
+from ops.cif_utils import cif2pdb
+from graph.structure_utils import build_noclusters_extra,build_clusters,\
+ build_clusters_advanced,merge_cluster_cif_PS,clean_pho_assign_info_list
+from atomic.io_utils import Write_Atomic_Fraginfo_cif
+
+
+def build_atomic_fragment_cluster_cif_SP(Path_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ ldp_size,save_dir,pho_point,map_info_list,refer_base_location,DNA_Label,ext_name="path"):
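+ #cluster the fragment assignments along each path, merge each cluster into one fragment,
+ #and write one cif per cluster; returns (list of written cif paths, summed fragment score)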
+ overall_score=0
+ check_file_list=[]
+
+ for ldp_path_id in Path_Assign_Dict.keys():
+ cur_ldp_sugar_location= all_sugar_location[ldp_path_id]
+ cur_pho_ldp_nodeid_list = Path_P_align_list[ldp_path_id]
+ cur_pho_ldp_nodeid_reverse_list = Path_P_reverse_align_list[ldp_path_id]
+ current_path_dict = Path_Assign_Dict[ldp_path_id]
+ all_starting_index_keys = [int(x) for x in current_path_dict.keys()]#should already be in order
+ all_seq_assign_length = [len(current_path_dict[tmp_path_id]['match_seq']) for tmp_path_id in current_path_dict]
+ assert len(all_starting_index_keys)==len(np.unique(all_starting_index_keys))
+ all_starting_index_keys = np.array(all_starting_index_keys)
+ all_seq_assign_length = np.array(all_seq_assign_length)
+ sorted_indexes = np.argsort(all_starting_index_keys)
+ all_starting_index_keys = all_starting_index_keys[sorted_indexes]
+ all_seq_assign_length = all_seq_assign_length[sorted_indexes]
+ if 'extra' in ext_name:
+ interval_clusters = build_noclusters_extra(all_starting_index_keys,all_seq_assign_length)
+ else:
+ interval_clusters = build_clusters_advanced(all_starting_index_keys,all_seq_assign_length)#overlapping regions are merged into one fragment; non-overlapping ones become separate fragments
+ print("we clustered %d clusters out of %d fragments"%(len(interval_clusters),len(all_starting_index_keys)))
+ for cluster_id in interval_clusters:
+ current_cluster_id_list = interval_clusters[cluster_id]
+ tmp_save_path = os.path.join(save_dir,"%s_%d_cluster_%d.cif"%(ext_name, ldp_path_id,cluster_id))
+ tmp_verify_path = os.path.join(save_dir,"%s_%d_cluster_%d_verify.txt"%(ext_name,ldp_path_id,cluster_id))
+ fragment_info_list,frag_score = merge_cluster_cif_PS(current_path_dict,current_cluster_id_list,cur_ldp_sugar_location,
+ tmp_verify_path,cur_pho_ldp_nodeid_list,cur_pho_ldp_nodeid_reverse_list,pho_point,map_info_list)
+ overall_score+=frag_score
+ if len(fragment_info_list)<=2:
+ #not enough to build an atomic model, skip it.
+ continue
+ #clean pho locations in case we have one pho assigned to 2 sugars in the path
+ fragment_info_list,_ = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list)
+ fragment_info_list,further_flag = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list,round=2)
+ if further_flag:
+ fragment_info_list,_ = clean_pho_assign_info_list(fragment_info_list,pho_point.merged_cd_dens[:,:3],map_info_list,round=2)
+ if len(fragment_info_list)<=2:
+ #not enough to build an atomic model, skip it.
+ continue
+ Write_Atomic_Fraginfo_cif("%s_%d_cluster_%d"%(ext_name,ldp_path_id,cluster_id),fragment_info_list,refer_base_location,tmp_save_path,DNA_Label,map_info_list)
+ check_file_list.append(tmp_save_path)
+
+ return check_file_list,overall_score
+
+def Build_Atomic_Structure(solve_frag_combine_list,order_key_index,order_chain_index,overall_dict,
+ all_sugar_location, ldp_size,save_dir,checking_stride,top_select,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,Extra_Added_Assign_Dict,refer_base_location,DNA_Label):
+ Path_Assign_Dict=defaultdict(dict)#[path_id][assignment starting_index]:[seq_info]
+ for order_index in solve_frag_combine_list:
+ order_index = int(order_index)
+ current_key1 = order_key_index[order_index]
+ current_chain_candidate1 = order_chain_index[order_index]
+ split_key1 = current_key1.split("_")
+ ldp_path1= int(split_key1[0])
+ ldp_starting_index1 = int(split_key1[1])
+ current_seq_info1 = overall_dict[current_key1][current_chain_candidate1]
+ Path_Assign_Dict[ldp_path1][ldp_starting_index1]=current_seq_info1
+ #save all non-overlap in one dir, while extra in another dir.
+ non_overlap_dir = os.path.join(save_dir,"CP_SAT_frags")
+ mkdir(non_overlap_dir)
+ check_file_list,overall_score1 = build_atomic_fragment_cluster_cif_SP(Path_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ ldp_size,non_overlap_dir,pho_point,map_info_list,refer_base_location,DNA_Label,ext_name="path")
+
+
+ overlap_dir = os.path.join(save_dir,"extra_support_frags")
+ mkdir(overlap_dir)
+ check_file_list2,overall_score2 = build_atomic_fragment_cluster_cif_SP(Extra_Added_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ ldp_size,overlap_dir,pho_point,map_info_list,refer_base_location,DNA_Label,ext_name="extra")
+
+ check_file_list = check_file_list+check_file_list2
+
+ overall_score = overall_score1+overall_score2
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_%d_%d_%d.cif"%(ldp_size,checking_stride,top_select))
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ for item in check_file_list:
+ #cur_frag_path = os.path.join(save_dir,item)
+ cur_frag_name = os.path.split(item)[1]
+ cur_entry_id = cur_frag_name.replace(".cif","")
+ append_cif_info(cur_entry_id,item,fragment_all_path)
+
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_%d_%d_%d.pdb"%(ldp_size,checking_stride,top_select))
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ Natm=1
+ Nres=0
+ for item in check_file_list:
+ cur_pdb_name = item[:-4]+".pdb"
+ prev_resi = None
+ with open(cur_pdb_name,'r') as rfile:
+ with open(fragment_all_path,"a+") as wfile:
+ for line in rfile:
+ #advance modifying the residue id to build
+
+ if (line.startswith('ATOM')):
+ #chain_id, current_seq_index,atom_name2, cur_pho_position,nuc_type,avg_score
+ chain_name = line[21]
+ atom_name = line[12:16]
+ x=float(line[30:38])
+ y=float(line[38:46])
+ z=float(line[46:55])
+ resi=int(line[22:26])
+ score = float(line[60:68])
+ resn = line[17:20]
+ if resi!=prev_resi:
+ Nres+=1
+ prev_resi=resi
+ line=""
+ line += "ATOM%7d %-4s %3s%2s%4d " % (Natm, atom_name,resn, chain_name,Nres)
+ line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f\n" % (x,y,z, 1.0, score)
+ wfile.write(line)
+
+ Natm+=1
+ #wfile.write(line)
+ else:
+ wfile.write(line)
+
+
+
+def Build_Atomic_Model_nonoverlap_frag(Path_Assign_Dict,
+ all_sugar_location, ldp_size,save_dir,checking_stride,top_select,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list,
+ Extra_Added_Assign_Dict,refer_base_location,DNA_Label):
+ non_overlap_dir = os.path.join(save_dir,"CP_SAT_frags")
+ mkdir(non_overlap_dir)
+ check_file_list,overall_score1 = build_atomic_fragment_cluster_cif_SP(Path_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ ldp_size,non_overlap_dir,pho_point,map_info_list,refer_base_location,DNA_Label,ext_name="extra")
+
+
+ overlap_dir = os.path.join(save_dir,"extra_support_frags")
+ mkdir(overlap_dir)
+ check_file_list2,overall_score2 = build_atomic_fragment_cluster_cif_SP(Extra_Added_Assign_Dict,all_sugar_location,Path_P_align_list,Path_P_reverse_align_list,
+ ldp_size,overlap_dir,pho_point,map_info_list,refer_base_location,DNA_Label,ext_name="extra2")
+
+ check_file_list = check_file_list+check_file_list2
+
+ overall_score = overall_score1+overall_score2
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_%d_%d_%d.cif"%(ldp_size,checking_stride,top_select))
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ for item in check_file_list:
+ #cur_frag_path = os.path.join(save_dir,item)
+ cur_frag_name = os.path.split(item)[1]
+ cur_entry_id = cur_frag_name.replace(".cif","")
+ append_cif_info(cur_entry_id,item,fragment_all_path)
+
+ fragment_all_path = os.path.join(save_dir,"Final_Assemble_%d_%d_%d.pdb"%(ldp_size,checking_stride,top_select))
+ with open(fragment_all_path,'w') as file:
+ file.write("#score: %f\n"%overall_score)
+ check_file_list.sort()
+ pdb_file_list=[]
+ for item in check_file_list:
+ pdb_file_name = item.replace(".cif",".pdb")
+ cif2pdb(item,pdb_file_name)
+ pdb_file_list.append(pdb_file_name)
+ with open(fragment_all_path,'a+') as wfile:
+ for item in pdb_file_list:
+ with open(item,'r') as rfile:
+ for line in rfile:
+ wfile.write(line)
diff --git a/CryoREAD/graph/structure_utils.py b/CryoREAD/graph/structure_utils.py
new file mode 100644
index 0000000..bcc9e82
--- /dev/null
+++ b/CryoREAD/graph/structure_utils.py
@@ -0,0 +1,476 @@
+from collections import defaultdict
+import numpy as np
+import os
+from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+from ops.os_operation import mkdir
+from scipy.spatial.distance import cdist
+from ops.math_calcuation import calculate_distance
+import random
+
+
+
+def build_clusters(all_starting_index_keys,ldp_size):
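+ #e.g. (illustrative) starting indexes [0,5,30] with ldp_size 20 give {0:[0,5], 1:[30]}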
+ all_starting_index_keys.sort()
+ cluster_dict=defaultdict(list)
+ cluster_id=0
+ prev_index = all_starting_index_keys[0]
+ cluster_dict[cluster_id].append(prev_index)
+ for k in range(1,len(all_starting_index_keys)):
+ current_index = all_starting_index_keys[k]
+ if prev_index+ldp_size>current_index:
+ cluster_dict[cluster_id].append(current_index)
+ else:
+ cluster_id+=1
+ cluster_dict[cluster_id].append(current_index)
+ prev_index = current_index
+ return cluster_dict
+
+def build_clusters_advanced(all_starting_index_keys,all_seq_indexes):
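+ #same as build_clusters but each fragment carries its own length;
+ #e.g. (illustrative) starts [0,5,30] with lengths [10,10,10] give {0:[0,5], 1:[30]}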
+ all_starting_index_keys.sort()
+ cluster_dict=defaultdict(list)
+ cluster_id=0
+ prev_index = all_starting_index_keys[0]
+ cluster_dict[cluster_id].append(prev_index)
+ for k in range(1,len(all_starting_index_keys)):
+ current_index = all_starting_index_keys[k]
+ prev_seq_length = all_seq_indexes[k-1]
+ if prev_index+prev_seq_length>current_index:
+ cluster_dict[cluster_id].append(current_index)
+ else:
+ cluster_id+=1
+ cluster_dict[cluster_id].append(current_index)
+ prev_index = current_index
+ return cluster_dict
+def build_noclusters_extra(all_starting_index_keys,all_seq_indexes):
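+ #no merging at all: every fragment becomes its own single-member cluster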
+ cluster_id=0
+ cluster_dict=defaultdict(list)
+ for k in range(len(all_starting_index_keys)):
+ current_index = all_starting_index_keys[k]
+ cluster_dict[cluster_id].append(current_index)
+ cluster_id+=1
+ return cluster_dict
+def verify_chain_check(current_path_dict,current_cluster_id_list):
+ current_seq_info = current_path_dict[current_cluster_id_list[0]]
+ chain_id = current_seq_info['chain']
+ cur_direction = current_seq_info['direction']
+ cur_chain_length = current_seq_info['chain_length']
+ for k in range(1,len(current_cluster_id_list)):
+ current_seq_info = current_path_dict[current_cluster_id_list[k]]
+ if current_seq_info['chain']!=chain_id:
+ return False
+ if current_seq_info['direction']!=cur_direction:
+ return False
+ if current_seq_info['chain_length']!=cur_chain_length:
+ return False
+ return True
+
+def merge_cluster_cif_PS(current_path_dict,current_cluster_id_list,ldp_sugar_location,tmp_verify_path,
+ Path_P_align_list,Path_P_reverse_align_list,pho_point,map_info_list):
+ pho_merged_cd = pho_point.merged_cd_dens[:,:3]
+ if len(current_cluster_id_list)==1:
+ current_seq_info = current_path_dict[current_cluster_id_list[0]]
+ starting_index = int(current_cluster_id_list[0])
+ match_seq = current_seq_info['match_seq']
+ chain_id = current_seq_info['chain']
+ cur_score = current_seq_info['score']
+ align_length= len([x for x in match_seq if x!="-"])
+ avg_score = cur_score/align_length
+ cur_interval = current_seq_info['interval']
+ cur_direction = current_seq_info['direction']
+ cur_chain_length = current_seq_info['chain_length']
+ cur_ldp_size = len(match_seq)
+ current_ldp_location = ldp_sugar_location[starting_index:starting_index+cur_ldp_size]
+ current_pho_ldp_location = Path_P_align_list[starting_index:starting_index+cur_ldp_size]
+ if cur_direction==-1:
+ #adjust interval and write from end to begin
+ cur_interval = [cur_chain_length-cur_interval[1]-1,cur_chain_length-cur_interval[0]]#reverse chain to correct direction
+ match_seq = match_seq[::-1]
+ current_ldp_location = current_ldp_location[::-1]
+ current_pho_ldp_location = Path_P_reverse_align_list[starting_index:starting_index+cur_ldp_size]
+ current_pho_ldp_location = current_pho_ldp_location[::-1]
+ #now current_pho_ldp_location are only id list
+ current_pho_ldp_align_info = current_pho_ldp_location
+ current_pho_ldp_location = [pho_merged_cd[int(kk)] for kk in current_pho_ldp_location ]
+ current_pho_ldp_location = Convert_LDPcoord_To_Reallocation(current_pho_ldp_location , map_info_list)
+ #start to ensemble all the information
+ fragment_info_list = []
+ current_seq_index = cur_interval[0]
+ for k in range(len(match_seq)):
+ if match_seq[k]=="-":
+ continue
+ else:
+ cur_pho_tmp_position = current_pho_ldp_location[k]
+ if current_pho_ldp_align_info[k]>=0:#make sure it has assigned P
+ fragment_info_list.append([chain_id,current_seq_index+1,'P',cur_pho_tmp_position,match_seq[k],avg_score])
+ cur_resi_position = current_ldp_location[k]
+ fragment_info_list.append([chain_id,current_seq_index+1,"C4'",cur_resi_position,match_seq[k],avg_score])
+
+ current_seq_index+=1
+
+ return fragment_info_list,cur_score
+ print("starting merge fragement process")
+ current_cluster_id_list.sort()
+ begin_index = current_cluster_id_list[0]
+ #cur_ldp_size = len(current_path_dict[current_cluster_id_list[-1]]['match_seq'])
+ #end_index = current_cluster_id_list[-1]+cur_ldp_size
+ #check all the matches to get the final extended region
+ end_index=-1
+ for k in range(len(current_cluster_id_list)):
+ cur_ldp_size = len(current_path_dict[current_cluster_id_list[k]]['match_seq'])
+ cur_end_index = current_cluster_id_list[k]+cur_ldp_size
+ if end_index<=cur_end_index:
+ end_index=cur_end_index
+
+ overall_range=[begin_index,end_index]
+ print("current range:",overall_range)
+ assert verify_chain_check(current_path_dict,current_cluster_id_list)#assure same chain, same direction, same chain_length
+ current_seq_info = current_path_dict[current_cluster_id_list[0]]
+ overall_direction = current_seq_info['direction']
+ overall_chain = current_seq_info['chain']
+ overall_chain_length = current_seq_info['chain_length']
+ #1st fill all the match seq into the begin_index to ending index
+ #the rule: the higher-scoring fragment wins, which avoids double assignment
+ overall_match_seq = " "*(end_index-begin_index)
+ overall_score_list = [0]*(end_index-begin_index)
+ overall_residue_id_list = [-1]*(end_index-begin_index)
+ #build a score dict first
+ score_dict ={}
+ score_list=[]
+ for path_id in current_cluster_id_list:
+ current_score = current_path_dict[path_id]['score']
+ #add a tiny random offset so equal scores still map to distinct dictionary keys
+ rand_score = random.random()*1e-4
+ score_dict["%.10f"%(current_score+rand_score)]=path_id #starting index
+ score_list.append(current_score+rand_score)
+ #write low scores first so higher scores automatically overwrite them
+ score_sort = np.argsort(score_list)
+
+ for k in range(len(score_sort)):
+ sort_index = int(score_sort[k])
+ cur_score = score_list[sort_index]
+ current_path_id = score_dict["%.10f"%cur_score]
+ shift_pos = int(current_path_id-begin_index)
+ cur_match_seq = current_path_dict[current_path_id]['match_seq']
+ cur_ldp_size = len(cur_match_seq)#in case it's short for some near tail regions
+ overall_match_seq=overall_match_seq[:shift_pos]+cur_match_seq+overall_match_seq[(shift_pos+cur_ldp_size):]
+ cur_avg_score = cur_score/len([x for x in cur_match_seq if x!="-"])
+ for x in range(shift_pos,shift_pos+cur_ldp_size):
+ overall_score_list[x]=cur_avg_score
+ cur_interval = current_path_dict[current_path_id]['interval']
+ current_seq_index = cur_interval[0]
+ for kk in range(len(cur_match_seq)):
+ if cur_match_seq[kk]=="-":
+ continue
+ else:
+ overall_residue_id_list[shift_pos+kk]=current_seq_index
+ current_seq_index+=1
+
+ current_ldp_location = ldp_sugar_location[begin_index:end_index]
+ current_pho_ldp_location = Path_P_align_list[begin_index:end_index]
+
+ if overall_direction==-1:
+ overall_match_seq=overall_match_seq[::-1]
+ current_ldp_location = current_ldp_location[::-1]
+ overall_residue_id_list =overall_residue_id_list[::-1]
+ overall_score_list = overall_score_list[::-1]
+ #then change the residue id
+ overall_residue_id_list=[overall_chain_length-x-1 if x!=-1 else -1 for x in overall_residue_id_list]
+ current_pho_ldp_location = Path_P_reverse_align_list[begin_index:end_index]
+ current_pho_ldp_location = current_pho_ldp_location[::-1]
+ current_pho_ldp_align_info = current_pho_ldp_location
+ current_pho_ldp_location = [pho_merged_cd[int(kk)] for kk in current_pho_ldp_location ]
+ current_pho_ldp_location = Convert_LDPcoord_To_Reallocation(current_pho_ldp_location , map_info_list)
+ fragment_info_list = []
+ overall_score=0
+ for k in range(len(overall_match_seq)):
+ if overall_match_seq[k]=="-":
+ continue
+ else:
+ cur_location = current_ldp_location[k]
+ cur_pho_location = current_pho_ldp_location[k]
+ cur_score = overall_score_list[k]
+ cur_residue = overall_residue_id_list[k]+1#original ids start from 0; shift to 1-based residue numbering
+ #make sure the assignment is one of A,U,C,G,T
+ assert overall_match_seq[k] in ['A','U','C','G','T']
+ overall_score +=cur_score
+ assert cur_residue!=-1
+ if current_pho_ldp_align_info[k]>=0:#avoid non-assign regions always give a wrong point
+ fragment_info_list.append([overall_chain,cur_residue,"P",cur_pho_location,overall_match_seq[k],cur_score])
+ fragment_info_list.append([overall_chain,cur_residue,"C4'",cur_location,overall_match_seq[k],cur_score])
+
+ #write a verification record of the overall output and residue ids
+ #purpose: verify that the fragment combination is correct
+ line_list =[""]*len(overall_match_seq)
+ #first write the overall residue type and overall
+ for k in range(len(overall_match_seq)):
+ line = line_list[k]
+ line += overall_match_seq[k]+"\t"
+ line += str(overall_residue_id_list[k]+1)+"\t"
+ line += "%.2f\t"%overall_score_list[k]
+ line_list[k]=line
+ #iteratively write different seq
+ for path_id in current_cluster_id_list:
+ starting_index = int(path_id)
+ shift_pos = int(starting_index-begin_index)
+ current_seq_info = current_path_dict[path_id]
+ match_seq = current_seq_info['match_seq']
+ cur_ldp_size = len(match_seq)
+ cur_useful_size = len([x for x in match_seq if x!="-"])
+ cur_interval = current_seq_info['interval']
+ cur_direction = current_seq_info['direction']
+ cur_chain_length = current_seq_info['chain_length']
+ if cur_direction==-1:
+ #adjust interval and write from end to begin
+ cur_interval = [cur_chain_length-cur_interval[0]-cur_useful_size,cur_chain_length-cur_interval[0]]#reverse chain to correct direction
+ match_seq = match_seq[::-1]
+ shift_pos = end_index-begin_index-shift_pos-cur_ldp_size#previously measured from the end; after the swap it is measured from the start
+ current_seq_index = cur_interval[0]
+ cur_score = current_seq_info['score']
+ align_length= len([x for x in match_seq if x!="-"])
+ avg_score = cur_score/align_length
+ for k in range(len(overall_match_seq)):
+ line = line_list[k]
+ if k>=shift_pos and k<shift_pos+cur_ldp_size:
+ #restored from context: record this fragment's character (and 1-based seq id) in its column
+ line += match_seq[k-shift_pos]+"\t"
+ if match_seq[k-shift_pos]!="-":
+ line += str(current_seq_index+1)+"\t"
+ current_seq_index+=1
+ else:
+ line += "-\t"
+ else:
+ line += " \t \t"
+ line_list[k]=line
+ #after all fragments are appended, dump the verification table
+ with open(tmp_verify_path,'w') as wfile:
+ for line in line_list:
+ wfile.write(line+"\n")
+ return fragment_info_list,overall_score
+
+def clean_pho_assign_info_list(frag_info_list,pho_ldp_location,map_info_list,round=1,cutoff=10):
+ #signature restored from the call sites; the defaults round=1 and cutoff=10 (Angstrom) are assumptions
+ #index the fragment info by residue id: sugar (C4') records and P records separately
+ Sugar_Record_Dict={}
+ Pho_Record_Dict={}
+ Sugar_Position_List=[]
+ Pho_Position_List=[]
+ Pho_Seqid_list=[]
+ Sugar_IDdefine_dict={}
+ for info in frag_info_list:
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score = info
+ if atom_name=="C4'":
+ Sugar_IDdefine_dict[int(current_seq_index)]=len(Sugar_Position_List)
+ Sugar_Record_Dict[current_seq_index]=info
+ Sugar_Position_List.append(cur_atom_position)
+ else:
+ Pho_Record_Dict[current_seq_index]=info
+ Pho_Position_List.append(cur_atom_position)
+ Pho_Seqid_list.append(current_seq_index)
+ if len(Pho_Position_List)>0:
+ distance_array = cdist(Pho_Position_List,Pho_Position_List)
+ Remove_SeqID_set = set()#once in this set,should update pho location to match
+ Sugar_ID_list = list(Sugar_Record_Dict.keys())
+ remove_pair_list=set()
+ for k in range(len(distance_array)):
+ current_check_distance = distance_array[k]
+ current_seqid = Pho_Seqid_list[k]
+ selected_index = np.argwhere(current_check_distance<=0.1)
+ for tmp_index in selected_index:
+ if tmp_index==k:
+ continue
+ next_seqid = Pho_Seqid_list[int(tmp_index)]
+ Remove_SeqID_set.add(next_seqid)
+ Remove_SeqID_set.add(current_seqid)
+ if round==2:
+ min_seq_id = min(current_seqid,next_seqid)
+ max_seq_id = max(current_seqid,next_seqid)
+ remove_pair_list.add("%d_%d"%(min_seq_id,max_seq_id))
+ else:
+ #no P assigned yet; assign one now from the closest pho LDP point
+ distance_array = cdist(Sugar_Position_List,pho_ldp_location)
+ final_frag_info_list =[]
+ for i in range(len(frag_info_list)):
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score =frag_info_list[i]
+ if atom_name=="C4'":
+ final_frag_info_list.append(frag_info_list[i])
+
+ sugar_index= int(Sugar_IDdefine_dict[int(current_seq_index)])
+ if sugar_index<1:
+ continue
+ cur_sp_distance1= distance_array[sugar_index-1]
+ cur_sp_distance2= distance_array[sugar_index]
+ sp_distance_cur = cur_sp_distance1+cur_sp_distance2
+ close_location_index = int(np.argmin(sp_distance_cur))
+ if cur_sp_distance1[close_location_index]>cutoff or cur_sp_distance2[close_location_index]>cutoff:
+ continue
+ select_pho_location = pho_ldp_location[close_location_index]
+ cur_info = [chain_id,current_seq_index,'P', select_pho_location,nuc_type,avg_score]
+ print("previous loc ",cur_atom_position,"updated loc: ",select_pho_location)
+ final_frag_info_list.append(cur_info)
+ return final_frag_info_list,True
+ print("duplicate use of %d/%d pho positions in atomic"%(len(Remove_SeqID_set),len(Sugar_ID_list)))
+ if round==1:
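+ #round 1: any P that is duplicated, unassigned, or too far from one of its flanking
+ #sugars is reassigned to the nearest pho LDP point (subject to the distance cutoff)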
+ #check whether the P is too far from either of its two flanking sugars
+ for i in range(1,len(Sugar_ID_list)):
+ current_seq_index = Sugar_ID_list[i-1]
+ next_seq_index = Sugar_ID_list[i]
+ _,_,_, cur_sugar_position,_,_ =Sugar_Record_Dict[current_seq_index]
+ _,_,_, next_sugar_position,_,_ =Sugar_Record_Dict[next_seq_index]
+ if next_seq_index not in Pho_Record_Dict:
+ Remove_SeqID_set.add(next_seq_index)
+ continue
+ _,_,_, next_pho_position,_,_ =Pho_Record_Dict[next_seq_index]
+ s1_p_distance = calculate_distance(cur_sugar_position,next_pho_position)
+ s2_p_distance = calculate_distance(next_sugar_position,next_pho_position)
+ if s1_p_distance>=cutoff or s2_p_distance>=cutoff:
+ Remove_SeqID_set.add(next_seq_index)
+ #recalculate the distances
+ print("revising %d/%d pho positions in the atomic model"%(len(Remove_SeqID_set),len(Sugar_ID_list)))
+ print(Remove_SeqID_set)
+ distance_array = cdist(Sugar_Position_List,pho_ldp_location)
+ final_frag_info_list =[]
+ for i in range(len(frag_info_list)):
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score =frag_info_list[i]
+ if atom_name=="C4'":
+ final_frag_info_list.append(frag_info_list[i])
+ continue
+
+ if current_seq_index not in Remove_SeqID_set:
+ final_frag_info_list.append(frag_info_list[i])
+ continue
+ if i==0:
+ final_frag_info_list.append(frag_info_list[i])
+ continue
+ sugar_index= int(Sugar_IDdefine_dict[int(current_seq_index)])
+ cur_sp_distance1= distance_array[sugar_index-1]
+ cur_sp_distance2= distance_array[sugar_index]
+ sp_distance_cur = cur_sp_distance1+cur_sp_distance2
+ close_location_index = int(np.argmin(sp_distance_cur))
+ if cur_sp_distance1[close_location_index]>cutoff or cur_sp_distance2[close_location_index]>cutoff:
+ continue
+ select_pho_location = pho_ldp_location[close_location_index]
+ cur_info = [chain_id,current_seq_index,atom_name, select_pho_location,nuc_type,avg_score]
+ print("previous loc ",cur_atom_position,"updated loc: ",select_pho_location)
+ final_frag_info_list.append(cur_info)
+
+ else:
+ #second pass: if two P assignments still collide after round 1, keep only one of the pair
+ distance_array = cdist(Sugar_Position_List,Pho_Position_List)
+ Remove_SeqID_set=set()#pick one out of 2 duplicated phos
+ for pair in remove_pair_list:
+ remove_seqlist = pair.split("_")
+ prev_seqid = int(remove_seqlist[0])
+ follow_seqid = int(remove_seqlist[1])
+
+ prev_pho_index = Pho_Seqid_list.index(prev_seqid)
+ follow_pho_index = Pho_Seqid_list.index(follow_seqid)
+ if prev_pho_index==0:
+ #priority: keep the beginning pho
+ Remove_SeqID_set.add(follow_seqid)
+ continue
+ sugar_index= int(Sugar_IDdefine_dict[int(prev_seqid)])
+ cur_sp_distance1= distance_array[sugar_index-1]
+ cur_sp_distance2= distance_array[sugar_index]
+ prev_sp_distance = cur_sp_distance1[prev_pho_index]+cur_sp_distance2[prev_pho_index]
+ sugar_index= int(Sugar_IDdefine_dict[int(follow_seqid)])
+ cur_sp_distance1= distance_array[sugar_index-1]
+ cur_sp_distance2= distance_array[sugar_index]
+ follow_sp_distance = cur_sp_distance1[follow_pho_index]+cur_sp_distance2[follow_pho_index]
+
+ if prev_sp_distance<=follow_sp_distance:
+ Remove_SeqID_set.add(follow_seqid)
+ else:
+ Remove_SeqID_set.add(prev_seqid)
+ print("round 2 we still put %d points into clear set"%(len(Remove_SeqID_set)))
+ final_frag_info_list =[]
+ for i in range(len(frag_info_list)):
+ chain_id,current_seq_index,atom_name, cur_atom_position,nuc_type,avg_score =frag_info_list[i]
+ if atom_name=="C4'":
+ final_frag_info_list.append(frag_info_list[i])
+ continue
+
+ if current_seq_index not in Remove_SeqID_set:
+ final_frag_info_list.append(frag_info_list[i])
+ continue
+ print("clearning changed %d to %d frag info"%(len(frag_info_list),len(final_frag_info_list)))
+
+
+ return final_frag_info_list,False
+
diff --git a/CryoREAD/graph/visualize_utils.py b/CryoREAD/graph/visualize_utils.py
new file mode 100644
index 0000000..b5ba785
--- /dev/null
+++ b/CryoREAD/graph/visualize_utils.py
@@ -0,0 +1,204 @@
+import os
+import numpy as np
+from ops.os_operation import mkdir
+
+
+def Show_Graph_Connect(coord_list,edge_list,save_path):
+ Natm =1
+ with open(save_path,'w') as file:
+ file.write('MODEL\n')
+ for i in range(len(coord_list)):
+ line = ''
+ tmp=coord_list[i]
+ tmp_chain='A'
+ line += "ATOM%7d %3s %3s%2s%4d " % (Natm, "CA ", "ALA", " " + tmp_chain, 1)
+ line += "%8.3f%8.3f%8.3f%6.2f%6.8f\n" % (
+ tmp[0], tmp[1], tmp[2], 1.0, 1.0)
+ Natm += 1
+ file.write(line)
+ for i in range(len(edge_list)):
+ nid1,nid2=edge_list[i]
+ line = "BOND %d %d\n" % (nid1,nid2)
+ file.write(line)
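+#note: BOND records are not part of the standard PDB format; they are written here for viewers/scripts that read explicit bond lists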
+
+def Show_Bfactor_cif(name,coord_list,save_path,base_prob):
+ Natm =1
+ chain_dict ={0:"A",1:"B",2:"C",3:"D",4:"E",5:"F",6:"G",7:"H"}
+ with open(save_path,'w') as file:
+ line = 'data_%s\n'%name
+ line += "#\nloop_\n_atom_site.group_PDB\n_atom_site.id\n_atom_site.type_symbol\n" \
+ "_atom_site.label_atom_id\n_atom_site.label_alt_id\n_atom_site.label_comp_id\n"\
+ "_atom_site.label_asym_id\n_atom_site.label_entity_id\n_atom_site.label_seq_id\n"\
+ "_atom_site.pdbx_PDB_ins_code\n_atom_site.Cartn_x\n_atom_site.Cartn_y\n_atom_site.Cartn_z\n"\
+ "_atom_site.occupancy\n_atom_site.B_iso_or_equiv\n_atom_site.auth_seq_id\n_atom_site.auth_asym_id\n"\
+ "_atom_site.pdbx_PDB_model_num\n"
+ file.write(line)
+ for i in range(len(coord_list)):
+ tmp=coord_list[i]
+ current_prob = base_prob[i]
+ tmp_chain="A"
+ line =""
+ line += "ATOM %-10d C %-3s . %-3s %-2s . %-10d . " % (Natm, "CA ", "ALA", " " + tmp_chain, Natm)
+ line += "%-8.3f %-8.3f %-8.3f %-6.2f %-6.8f %-10d %-2s 1 \n" % (
+ tmp[0], tmp[1], tmp[2], 1.0, current_prob,Natm,tmp_chain)
+ Natm += 1
+ file.write(line)
+def Show_Coord_cif(name,coord_list,save_path):
+ Natm =1
+ with open(save_path,'w') as file:
+ line = 'data_%s\n'%name
+ line += "#\nloop_\n_atom_site.group_PDB\n_atom_site.id\n_atom_site.type_symbol\n" \
+ "_atom_site.label_atom_id\n_atom_site.label_alt_id\n_atom_site.label_comp_id\n"\
+ "_atom_site.label_asym_id\n_atom_site.label_entity_id\n_atom_site.label_seq_id\n"\
+ "_atom_site.pdbx_PDB_ins_code\n_atom_site.Cartn_x\n_atom_site.Cartn_y\n_atom_site.Cartn_z\n"\
+ "_atom_site.occupancy\n_atom_site.B_iso_or_equiv\n_atom_site.auth_seq_id\n_atom_site.auth_asym_id\n"\
+ "_atom_site.pdbx_PDB_model_num\n"
+ file.write(line)
+ for i in range(len(coord_list)):
+ tmp=coord_list[i]
+ tmp_chain='A'
+ line =""
+ line += "ATOM %-10d C %-3s . %-3s %-2s . %-10d . " % (Natm, "CA ", "ALA", " " + tmp_chain, Natm)
+ line += "%-8.3f %-8.3f %-8.3f %-6.2f %-6.8f %-10d %-2s 1 \n" % (
+ tmp[0], tmp[1], tmp[2], 1.0, 1.0,Natm,tmp_chain)
+ Natm += 1
+ file.write(line)
+def visualize_graph(save_path,ext_name, coordinate_list, edge_pairs):
+ graph_path = os.path.join(save_path, ext_name+".cif")
+ Show_Coord_cif( ext_name,coordinate_list,graph_path)
+ if edge_pairs is None:
+ return
+ graph_path = os.path.join(save_path, ext_name+".pml")
+ with open(graph_path,'w') as file:
+ file.write("set connect_mode=1\n")
+ file.write("load %s, TRACE\n"%(ext_name+".cif"))
+ for k in range(len(edge_pairs)):
+ id1,id2= edge_pairs[k]
+ file.write("bond resi %d and TRACE, resi %d and TRACE\n"%(id1,id2))
+ file.write("hide everything\n")
+ file.write("show sticks, TRACE\n")
+ file.write("set connect_mode=0\n")
+def Show_BaseCoord_cif(name,coord_list,save_path,base_label,base_prob):
+ Natm =1
+ chain_dict ={0:"A",1:"U",2:"C",3:"G"}
+ with open(save_path,'w') as file:
+ line = 'data_%s\n'%name
+ line += "#\nloop_\n_atom_site.group_PDB\n_atom_site.id\n_atom_site.type_symbol\n" \
+ "_atom_site.label_atom_id\n_atom_site.label_alt_id\n_atom_site.label_comp_id\n"\
+ "_atom_site.label_asym_id\n_atom_site.label_entity_id\n_atom_site.label_seq_id\n"\
+ "_atom_site.pdbx_PDB_ins_code\n_atom_site.Cartn_x\n_atom_site.Cartn_y\n_atom_site.Cartn_z\n"\
+ "_atom_site.occupancy\n_atom_site.B_iso_or_equiv\n_atom_site.auth_seq_id\n_atom_site.auth_asym_id\n"\
+ "_atom_site.pdbx_PDB_model_num\n"
+ file.write(line)
+ for i in range(len(coord_list)):
+ tmp=coord_list[i]
+ current_label = int(base_label[i])
+ current_prob = base_prob[i]
+ tmp_chain=chain_dict[current_label]
+ line =""
+ line += "ATOM %-10d C %-3s . %-3s %-2s . %-10d . " % (Natm, "CA ", "ALA", " " + tmp_chain, Natm)
+ line += "%-8.3f %-8.3f %-8.3f %-6.2f %-6.8f %-10d %-2s 1 \n" % (
+ tmp[0], tmp[1], tmp[2], 1.0, current_prob,Natm,tmp_chain)
+ Natm += 1
+ file.write(line)
+
+def visualize_base(save_path,ext_name,coordinate_list,base_prediction):
+ #base prediction will be [N,4] array each indicates the probability.
+ path_path = os.path.join(save_path, ext_name+".cif")
+ base_label = np.argmax(base_prediction,axis=1)
+ base_prob = np.max(base_prediction,axis=1)
+ Show_BaseCoord_cif(ext_name,coordinate_list,path_path,base_label,base_prob)
+
+def Visualize_Path_Base(tmp_save_path,Path_ID_List,All_Base_Assign_List,merged_cd_dens,
+ map_info_list,DNA_Label,sugar_visual=False):
+ from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+ mkdir(tmp_save_path)
+ All_Info_List=[]
+ for k,cur_path_list in enumerate(Path_ID_List):
+ #cur_save_path = os.path.join(tmp_save_path,"basepath%d.pdb"%k)
+ current_length = len(cur_path_list)
+ current_base_list = All_Base_Assign_List[k]
+ current_base_list = np.array(current_base_list)
+ if not sugar_visual:
+ assert current_length-1==len(current_base_list)
+ current_location_list = [merged_cd_dens[int(kk)] for kk in cur_path_list]
+ coordinate_list= Convert_LDPcoord_To_Reallocation(current_location_list, map_info_list)
+
+
+ #pick the intermediate node as a location to visualize
+ if not sugar_visual:
+
+ new_coord_list = []
+ for j in range(len(coordinate_list)-1):
+ cur_coord = coordinate_list[j]
+ next_coord = coordinate_list[j+1]
+ new_coord =[(cur_coord[kk]+next_coord[kk])/2 for kk in range(3)]
+ new_coord_list.append(new_coord)
+ iter_range = len(coordinate_list)-1
+ else:
+ new_coord_list = coordinate_list
+ iter_range = len(coordinate_list)
+ for j in range(iter_range):
+ cur_coord = coordinate_list[j]
+ cur_label = int(np.argmax(current_base_list[j]))
+ cur_prob = np.max(current_base_list[j])
+ All_Info_List.append(cur_coord+[cur_label]+[cur_prob])
+
+ visualize_base(tmp_save_path,"basepath_%d"%k,new_coord_list,current_base_list)
+ map_dict={0:"A",1:"U",2:"C",3:"G"}
+ pdb_all_pho_path=os.path.join(tmp_save_path,"base_assign.pdb")
+ with open(pdb_all_pho_path,"w") as wfile:
+ Natm=1
+ for info in All_Info_List:
+ line =""
+ nuc_type = map_dict[info[3]]
+ if nuc_type=='T' and DNA_Label is False:
+ nuc_type='U'
+ if nuc_type=='U' and DNA_Label is True:
+ nuc_type='T'
+ if DNA_Label is True:
+ nuc_type ="D"+nuc_type
+ if Natm>9999:
+ line += "ATOM%7d %-3s %3s%2s%4d " % (Natm, "P", nuc_type, "A",9999)
+ else:
+ line += "ATOM%7d %-3s %3s%2s%4d " % (Natm, "P", nuc_type, "A",Natm)
+
+ line = line + "%8.3f%8.3f%8.3f%6.2f%6.2f\n" % (
+ info[0], info[1], info[2], 1.0, info[4])
+ wfile.write(line)
+
+ Natm+=1
+
+
+def Visualize_LDP_Path(save_path,ext_name,Path_ID_List,merged_cd_dens,map_info_list):
+ from graph.LDP_ops import Convert_LDPcoord_To_Reallocation
+ current_location_list = [merged_cd_dens[int(kk)] for kk in Path_ID_List]
+ current_coord_list = Convert_LDPcoord_To_Reallocation(current_location_list, map_info_list)
+ Show_Coord_cif(ext_name,current_coord_list,save_path)
+
+
+def Visualize_assign_DPbase(save_dir,ext_name,path_node_id_list,base_assign_list,
+ path_base_prob_list, merged_cd_dens,map_info_list):
+ from graph.LDP_ops import Convert_LDPcoord_To_Reallocation#to avoid circular import
+ current_location_list = [merged_cd_dens[int(kk)] for kk in path_node_id_list]
+ coordinate_list= Convert_LDPcoord_To_Reallocation(current_location_list, map_info_list)
+ new_coord_list=[]
+ map_dict={"A":0,"T":1,"C":2,"G":3}
+ for j in range(len(coordinate_list)-1):
+ cur_coord = coordinate_list[j]
+ next_coord = coordinate_list[j+1]
+ new_coord =[(cur_coord[kk]+next_coord[kk])/2 for kk in range(3)]
+ new_coord_list.append(new_coord)
+ final_coord_list=[]
+ base_label=[]
+ base_prob_list=[]
+ for j in range(len(new_coord_list)):
+ current_chr = base_assign_list[j]
+ if current_chr=="-":
+ continue
+ current_label= map_dict[current_chr]
+ final_coord_list.append(new_coord_list[j])
+ base_label.append(current_label)
+ base_prob_list.append(path_base_prob_list[j][current_label])
+ tmp_save_path = os.path.join(save_dir,ext_name+".cif")
+ Show_BaseCoord_cif(ext_name,final_coord_list,tmp_save_path,base_label,base_prob_list)
diff --git a/CryoREAD/main.py b/CryoREAD/main.py
new file mode 100644
index 0000000..70492fd
--- /dev/null
+++ b/CryoREAD/main.py
@@ -0,0 +1,166 @@
+import os
+import mrcfile
+import numpy as np
+from ops.argparser import argparser
+from ops.os_operation import mkdir
+import time
+
+
+def init_save_path(origin_map_path):
+ save_path = os.path.join(os.getcwd(), "Predict_Result")
+ mkdir(save_path)
+ map_name = os.path.split(origin_map_path)[1].replace(".mrc", "")
+ map_name = map_name.replace(".map", "")
+ map_name = map_name.replace("(", "").replace(")", "")
+ save_path = os.path.join(save_path, map_name)
+ mkdir(save_path)
+ return save_path, map_name
+
+
+if __name__ == "__main__":
+ params = argparser()
+ if params["mode"] == 0:
+ # configure the running path
+ running_dir = os.path.dirname(os.path.abspath(__file__))
+
+ gpu_id = params["gpu"]
+ if gpu_id is not None:
+ os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
+ cur_map_path = os.path.abspath(params["F"])
+ # process the map path if it's ending with .gz
+ if ".gz" == cur_map_path[-3:]:
+ from ops.os_operation import unzip_gz
+
+ cur_map_path = unzip_gz(cur_map_path)
+ model_dir = os.path.abspath(params["M"])
+ if params["resolution"] < 3 or params["resolution"] > 5:
+ if params["resolution"] < 3:
+ model_dir_candidate = os.path.join(running_dir, "best_model_0_3A")
+ else:
+ model_dir_candidate = os.path.join(running_dir, "best_model_5_10A")
+ model_1st_stage_path = os.path.join(model_dir_candidate, "stage1_network.pth")
+ model_2nd_stage_path = os.path.join(model_dir_candidate, "stage2_network.pth")
+ if os.path.exists(model_1st_stage_path) and os.path.exists(model_2nd_stage_path):
+ model_dir = model_dir_candidate
+ print("with resolution %f" % params["resolution"])
+ print("use resolution model in %s" % model_dir)
+ if params["resolution"] > 10:
+ print("!!!Warning!!!Please provide map at resolution lower than 10A to run!")
+ exit()
+ if params["prediction_only"] or params["no_seqinfo"]:
+ fasta_path = None
+ else:
+ fasta_path = os.path.abspath(params["P"])
+ if params["output"] is None:
+ save_path, map_name = init_save_path(cur_map_path)
+ else:
+ save_path = params["output"]
+ map_name = "input"
+ mkdir(save_path)
+ os.chdir(running_dir)
+ from data_processing.Unify_Map import Unify_Map
+
+ cur_map_path = Unify_Map(cur_map_path, os.path.join(save_path, map_name + "_unified.mrc"))
+ from data_processing.Resize_Map import Resize_Map
+
+ cur_map_path = Resize_Map(cur_map_path, os.path.join(save_path, map_name + ".mrc"))
+ if params["contour"] < 0:
+ # change contour level to 0 and increase all the density
+ from data_processing.map_utils import increase_map_density
+
+ cur_map_path = increase_map_density(cur_map_path, os.path.join(save_path, map_name + "_increase.mrc"), params["contour"])
+ params["contour"] = 0
+ # segment map to remove those useless regions
+ from data_processing.map_utils import segment_map
+
+ cur_new_map_path = os.path.join(save_path, map_name + "_segment.mrc")
+ cur_map_path = segment_map(cur_map_path, cur_new_map_path, contour=0) # save the final prediction prob array space
+
+ # do a pre check to notify user errors for contour
+ with mrcfile.open(cur_map_path, permissive=True) as mrc:
+ data = mrc.data
+ if np.max(data) <= params["contour"]:
+ print("!!!Warning!!!Please provide contour level lower than maximum density value to run!")
+ # exit()
+ from data_processing.map_utils import automate_contour
+
+ params["contour"] = automate_contour(data)
+ print("!!!Warning!!!reset contour level: %f" % params["contour"])
+
+ from predict.predict_1st_stage import Predict_1st_Stage
+
+ # 1st stage cascade prediction
+ model_1st_stage_path = os.path.join(model_dir, "stage1_network.pth")
+ save_path_1st_stage = os.path.join(save_path, "1st_stage_detection")
+ mkdir(save_path_1st_stage)
+ Predict_1st_Stage(
+ cur_map_path,
+ model_1st_stage_path,
+ save_path_1st_stage,
+ params["box_size"],
+ params["stride"],
+ params["batch_size"],
+ params["contour"],
+ params,
+ )
+ # 2nd stage refine prediction
+ from predict.predict_2nd_stage import Predict_2nd_Stage
+
+ model_2nd_stage_path = os.path.join(model_dir, "stage2_network.pth")
+ save_path_2nd_stage = os.path.join(save_path, "2nd_stage_detection")
+ mkdir(save_path_2nd_stage)
+ Predict_2nd_Stage(
+ cur_map_path,
+ save_path_1st_stage,
+ model_2nd_stage_path,
+ save_path_2nd_stage,
+ params["box_size"],
+ params["stride"],
+ params["batch_size"],
+ params["contour"],
+ params,
+ )
+ # 0 gen protein map for other programs to run
+ cur_predict_path = os.path.join(save_path_2nd_stage, "Input")
+ chain_predict_path = os.path.join(cur_predict_path, "chain_predictprob.npy")
+ chain_prob = np.load(chain_predict_path) # [sugar,phosphate,A,UT,C,G,protein,base]
+ mask_map_path = os.path.join(save_path, "mask_protein.mrc")
+ from data_processing.Gen_MaskDRNA_map import Gen_MaskProtein_map
+
+ Gen_MaskProtein_map(chain_prob, cur_map_path, mask_map_path, params["contour"], threshold=0.3)
+ if params["prediction_only"]:
+ print("Our prediction results are saved in %s with mrc format for visualization check." % save_path_2nd_stage)
+ exit()
+ # graph based atomic structure modeling
+ gaussian_bandwidth = params["g"] # use 3
+ dcut = params["m"] # after meanshifting merge points distance<[float]
+ rdcut = params["f"] # remove ldp threshold 0.01
+ from graph.Build_Unet_Graph import Build_Unet_Graph
+
+ graph_save_path = os.path.join(save_path, "graph_atomic_modeling")
+ mkdir(graph_save_path)
+
+ Build_Unet_Graph(cur_map_path, chain_predict_path, fasta_path, graph_save_path, gaussian_bandwidth, dcut, rdcut, params)
+ elif params["mode"] == 1:
+
+ # structure refinement pipeline
+ input_pdb = os.path.abspath(params["F"])
+ input_map = os.path.abspath(params["M"])
+ output_dir = os.path.abspath(params["P"])
+ running_dir = os.path.dirname(os.path.abspath(__file__))
+ os.chdir(running_dir)
+ # resolution param also needed
+ from graph.refine_structure import refine_structure
+
+ refine_structure(input_pdb, input_map, output_dir, params)
+
+ elif params["mode"] == 2:
+ # add evaluation script for predicted structure and native structure
+ # python3 main.py --mode=2 -F=predicted.cif[.pdb] -M=target.cif[.pdb]
+ query_pdb = os.path.abspath(params["F"]) # predicted pdb/cif file
+ target_pdb = os.path.abspath(params["M"]) # native pdb/cif file
+ cutoff = 5.0
+ # output all the metrics reported in CryoREAD's paper
+ from evaluation.evaluate_structure import evaluate_structure
+
+ evaluate_structure(query_pdb, target_pdb, cutoff)
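+
+# example invocations (flags assumed from the params keys used above):
+#   python3 main.py --mode=0 -F=map.mrc -M=model_dir -P=seq.fasta --resolution=3.5 --contour=0.2
+#   python3 main.py --mode=1 -F=model.pdb -M=map.mrc -P=refine_out --resolution=3.5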
diff --git a/CryoREAD/model/Cascade_Unet.py b/CryoREAD/model/Cascade_Unet.py
new file mode 100644
index 0000000..02d8fd8
--- /dev/null
+++ b/CryoREAD/model/Cascade_Unet.py
@@ -0,0 +1,308 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from model.Unet_Layers import unetConv3d
+from model.init_weights import init_weights
+class Small_UNet_3Plus_DeepSup(nn.Module):
+ def __init__(self, in_channels=3, n_classes=1, feature_scale=4, is_deconv=True, is_batchnorm=True):
+ super(Small_UNet_3Plus_DeepSup, self).__init__()
+ self.is_deconv = is_deconv
+ self.in_channels = in_channels
+ self.is_batchnorm = is_batchnorm
+ self.feature_scale = feature_scale
+ #small unet
+ filters = [64, 128, 256]
+ ## -------------Encoder--------------
+ self.conv1 = unetConv3d(self.in_channels, filters[0], self.is_batchnorm)
+ self.maxpool1 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv2 = unetConv3d(filters[0], filters[1], self.is_batchnorm)
+ self.maxpool2 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv3 = unetConv3d(filters[1], filters[2], self.is_batchnorm)
+
+ ## -------------Decoder--------------
+ self.CatChannels = filters[0]
+ self.CatBlocks = 3
+ self.UpChannels = self.CatChannels * self.CatBlocks
+
+ #stage 2d
+ self.h1_PT_hd2 = nn.MaxPool3d(2, 2, ceil_mode=True)
+ self.h1_PT_hd2_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_PT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_PT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.h2_Cat_hd2_conv = nn.Conv3d(filters[1], self.CatChannels, 3, padding=1)
+ self.h2_Cat_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h2_Cat_hd2_relu = nn.ReLU(inplace=True)
+
+ self.hd3_UT_hd2 = nn.Upsample(scale_factor=2, mode='trilinear') # 14*14
+ self.hd3_UT_hd2_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.conv2d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn2d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu2d_1 = nn.ReLU(inplace=True)
+
+ #stage 1
+ # h1->320*320, hd1->320*320, Concatenation
+ self.h1_Cat_hd1_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_Cat_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_Cat_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd2->160*160, hd1->320*320, Upsample 2 times
+ self.hd2_UT_hd1 = nn.Upsample(scale_factor=2, mode='trilinear') # 14*14
+ self.hd2_UT_hd1_conv = nn.Conv3d(self.UpChannels, self.CatChannels, 3, padding=1)
+ self.hd2_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd2_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd3->80*80, hd1->320*320, Upsample 4 times
+ self.hd3_UT_hd1 = nn.Upsample(scale_factor=4, mode='trilinear') # 14*14
+ self.hd3_UT_hd1_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1)
+ self.conv1d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn1d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu1d_1 = nn.ReLU(inplace=True)
+ #final process
+
+ self.upscore3 = nn.Upsample(scale_factor=4, mode='trilinear')
+ self.upscore2 = nn.Upsample(scale_factor=2, mode='trilinear')
+
+ # DeepSup
+ self.outconv1 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv2 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv3 = nn.Conv3d(filters[2], n_classes, 3, padding=1)
+
+
+
+
+ # initialise weights
+ for m in self.modules():
+ if isinstance(m, nn.Conv3d):
+ init_weights(m, init_type='kaiming')
+ elif isinstance(m, nn.BatchNorm3d):
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs):
+ ## -------------Encoder-------------
+ h1 = self.conv1(inputs) # h1->320*320*64
+
+ h2 = self.maxpool1(h1)
+ h2 = self.conv2(h2) # h2->160*160*128
+
+ h3 = self.maxpool2(h2)
+ hd3 = self.conv3(h3) # h3->80*80*256
+
+ ## -------------Decoder-------------
+ #stage 2:
+ h1_PT_hd2 = self.h1_PT_hd2_relu(self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))))
+ h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2)))
+ hd3_UT_hd2 = self.hd3_UT_hd2_relu(self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))))
+ hd2 = self.relu2d_1(self.bn2d_1(self.conv2d_1(
+ torch.cat((h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2), 1)))) # hd2 -> UpChannels-channel feature map at 1/2 resolution
+
+ #stage 1:
+
+ h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1)))
+ hd2_UT_hd1 = self.hd2_UT_hd1_relu(self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))))
+ hd3_UT_hd1 = self.hd3_UT_hd1_relu(self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))))
+ hd1 = self.relu1d_1(self.bn1d_1(self.conv1d_1(
+ torch.cat((h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1), 1)))) # hd1->320*320*UpChannels
+
+ d3 = self.outconv3(hd3)
+ d3 = self.upscore3(d3) # upsample 4x back to input resolution
+
+ d2 = self.outconv2(hd2)
+ d2 = self.upscore2(d2) # upsample 2x back to input resolution
+
+ d1 = self.outconv1(hd1) # 256
+ # return F.sigmoid(d1), F.sigmoid(d2), F.sigmoid(d3), F.sigmoid(d4), F.sigmoid(d5)
+ # sigmoid layer is included in the loss
+ # This loss combines a Sigmoid layer and the BCELoss in one single class.
+ # This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as,
+ # by combining the operations into one layer, we take advantage of
+ # the log-sum-exp trick for numerical stability.
+ return [d1, d2, d3],[hd1,hd2,hd3]
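+
+#shape sketch (assuming a cubic input of side 64): h1 is 64-channel at full resolution,
+#hd2 is UpChannels(=192)-channel at 1/2, hd3 is 256-channel at 1/4; d1/d2/d3 are all
+#upsampled back to full resolution with n_classes channels for deep supervision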
+
+class Base_Unet(nn.Module):
+ def __init__(self, in_channels=3, n_classes=1, feature_scale=4, is_deconv=True, is_batchnorm=True):
+ super(Base_Unet, self).__init__()
+ self.is_deconv = is_deconv
+ self.in_channels = in_channels
+ self.is_batchnorm = is_batchnorm
+ self.feature_scale = feature_scale
+ #small unet
+ filters = [64, 128, 256]
+ self.CatChannels = filters[0]
+ self.CatBlocks = 3
+ self.UpChannels = self.CatChannels * self.CatBlocks
+ ## -------------Encoder--------------
+ self.conv1 = unetConv3d(self.in_channels+self.UpChannels, filters[0], self.is_batchnorm)
+ self.maxpool1 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv2 = unetConv3d(filters[0]+self.UpChannels, filters[1], self.is_batchnorm)
+ self.maxpool2 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv3 = unetConv3d(filters[1]+filters[2], filters[2], self.is_batchnorm)
+
+ ## -------------Decoder--------------
+
+
+ #stage 2d
+ self.h1_PT_hd2 = nn.MaxPool3d(2, 2, ceil_mode=True)
+ self.h1_PT_hd2_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_PT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_PT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.h2_Cat_hd2_conv = nn.Conv3d(filters[1], self.CatChannels, 3, padding=1)
+ self.h2_Cat_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h2_Cat_hd2_relu = nn.ReLU(inplace=True)
+
+ self.hd3_UT_hd2 = nn.Upsample(scale_factor=2, mode='trilinear') # 14*14
+ self.hd3_UT_hd2_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.conv2d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn2d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu2d_1 = nn.ReLU(inplace=True)
+
+ #stage 1
+ # h1->320*320, hd1->320*320, Concatenation
+ self.h1_Cat_hd1_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_Cat_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_Cat_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd2->160*160, hd1->320*320, Upsample 2 times
+ self.hd2_UT_hd1 = nn.Upsample(scale_factor=2, mode='trilinear') # 14*14
+ self.hd2_UT_hd1_conv = nn.Conv3d(self.UpChannels, self.CatChannels, 3, padding=1)
+ self.hd2_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd2_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd3->80*80, hd1->320*320, Upsample 4 times
+ self.hd3_UT_hd1 = nn.Upsample(scale_factor=4, mode='trilinear') # 14*14
+ self.hd3_UT_hd1_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1)
+ self.conv1d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn1d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu1d_1 = nn.ReLU(inplace=True)
+ #final process
+
+ self.upscore3 = nn.Upsample(scale_factor=4, mode='trilinear')
+ self.upscore2 = nn.Upsample(scale_factor=2, mode='trilinear')
+
+ # DeepSup
+ self.outconv1 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv2 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv3 = nn.Conv3d(filters[2], n_classes, 3, padding=1)
+
+
+
+
+ # initialise weights
+ for m in self.modules():
+ if isinstance(m, nn.Conv3d):
+ init_weights(m, init_type='kaiming')
+ elif isinstance(m, nn.BatchNorm3d):
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs,hidden_inputs):
+ ## -------------Encoder-------------
+ inputs1= torch.cat([inputs,hidden_inputs[0]],dim=1)
+ h1 = self.conv1(inputs1) # h1->320*320*64
+
+ h2 = self.maxpool1(h1)
+
+ h2_input = torch.cat([h2,hidden_inputs[1]],dim=1)
+ h2 = self.conv2(h2_input) # h2->160*160*128
+
+ h3 = self.maxpool2(h2)
+
+ h3_input = torch.cat([h3,hidden_inputs[2]],dim=1)
+ hd3 = self.conv3(h3_input) # h3->80*80*256
+
+ ## -------------Decoder-------------
+ #stage 2:
+ h1_PT_hd2 = self.h1_PT_hd2_relu(self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))))
+ h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2)))
+ hd3_UT_hd2 = self.hd3_UT_hd2_relu(self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))))
+ hd2 = self.relu2d_1(self.bn2d_1(self.conv2d_1(
+ torch.cat((h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2), 1)))) # hd2 -> UpChannels-channel feature map at 1/2 resolution
+
+ #stage 1:
+
+ h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1)))
+ hd2_UT_hd1 = self.hd2_UT_hd1_relu(self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))))
+ hd3_UT_hd1 = self.hd3_UT_hd1_relu(self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))))
+ hd1 = self.relu1d_1(self.bn1d_1(self.conv1d_1(
+ torch.cat((h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1), 1)))) # hd1->320*320*UpChannels
+
+ d3 = self.outconv3(hd3)
+ d3 = self.upscore3(d3) # upsample 4x back to input resolution
+
+ d2 = self.outconv2(hd2)
+ d2 = self.upscore2(d2) # upsample 2x back to input resolution
+
+ d1 = self.outconv1(hd1) # 256
+ # return F.sigmoid(d1), F.sigmoid(d2), F.sigmoid(d3), F.sigmoid(d4), F.sigmoid(d5)
+ # sigmoid layer is included in the loss
+ # This loss combines a Sigmoid layer and the BCELoss in one single class.
+ # This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as,
+ # by combining the operations into one layer, we take advantage of
+ # the log-sum-exp trick for numerical stability.
+ return [d1, d2, d3]#channel size: up_channels,up_channels, filters[2]
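+
+#note: hidden_inputs must be the [hd1,hd2,hd3] list returned by Small_UNet_3Plus_DeepSup:
+#192 channels at full resolution, 192 at 1/2, and 256 at 1/4, matching the widened encoder convs above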
+
+
+
+
+class Cascade_Unet(nn.Module):
+ def __init__(self, in_channels=1, n_classes1=4,n_classes2=4, feature_scale=4, is_deconv=True, is_batchnorm=True):
+ super(Cascade_Unet, self).__init__()
+ self.is_deconv = is_deconv
+ self.in_channels = in_channels
+ self.is_batchnorm = is_batchnorm
+ self.feature_scale = feature_scale
+ #channel sizes of different levels
+ filters = [64, 128, 256]
+ self.chain_net= Small_UNet_3Plus_DeepSup(in_channels=in_channels, n_classes=n_classes1,\
+ feature_scale=feature_scale, is_deconv=is_deconv, is_batchnorm=is_batchnorm)
+ self.base_net = Base_Unet(in_channels=in_channels, n_classes=n_classes2,\
+ feature_scale=feature_scale, is_deconv=is_deconv, is_batchnorm=is_batchnorm)
+
+
+ def forward(self, inputs):
+ chain_output, hidden_input =self.chain_net(inputs)
+ base_output = self.base_net(inputs,hidden_input)
+ return chain_output,base_output
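+# Usage sketch (illustrative, not from the source; class counts and box size follow the
+# prediction pipeline defaults: 1 density channel, 4 chain classes, 5 base classes):
+#   net = Cascade_Unet(in_channels=1, n_classes1=4, n_classes2=5)
+#   chain_out, base_out = net(torch.zeros(1, 1, 64, 64, 64))
+#   # each output is a list of deep-supervision logits [d1, d2, d3]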
+
+
+class Cascade_maskUnet(nn.Module):
+ def __init__(self, in_channels=1, n_classes1=4,n_classes2=4, feature_scale=4, is_deconv=True, is_batchnorm=True):
+ super(Cascade_maskUnet, self).__init__()
+ self.is_deconv = is_deconv
+ self.in_channels = in_channels
+ self.is_batchnorm = is_batchnorm
+ self.feature_scale = feature_scale
+ #channel sizes of different levels
+ filters = [64, 128, 256]
+ self.chain_net= Small_UNet_3Plus_DeepSup(in_channels=in_channels, n_classes=n_classes1,\
+ feature_scale=feature_scale, is_deconv=is_deconv, is_batchnorm=is_batchnorm)
+ self.base_net = Base_Unet(in_channels=in_channels, n_classes=n_classes2,\
+ feature_scale=feature_scale, is_deconv=is_deconv, is_batchnorm=is_batchnorm)
+
+
+ def forward(self, inputs):
+ chain_output, hidden_input =self.chain_net(inputs)
+        base_region_output = chain_output[0][:,2:3]#slice 2:3 rather than 2 to keep the channel axis
+        base_output = self.base_net(inputs,hidden_input)
+        for k in range(len(base_output)):
+            base_output[k] = base_region_output*base_output[k]#mask base outputs by the detected base region
+        #chain_output[0][:,2:3] should also be enforced to be correct
+ return chain_output,base_output
diff --git a/CryoREAD/model/Small_Unet_3Plus_DeepSup.py b/CryoREAD/model/Small_Unet_3Plus_DeepSup.py
new file mode 100644
index 0000000..f8e928f
--- /dev/null
+++ b/CryoREAD/model/Small_Unet_3Plus_DeepSup.py
@@ -0,0 +1,134 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from model.Unet_Layers import unetConv3d
+from model.init_weights import init_weights
+
+'''
+ Small UNet 3+ with deep supervision
+'''
+
+class Small_UNet_3Plus_DeepSup(nn.Module):
+ def __init__(self, in_channels=3, n_classes=1, feature_scale=4, is_deconv=True, is_batchnorm=True):
+ super(Small_UNet_3Plus_DeepSup, self).__init__()
+ self.is_deconv = is_deconv
+ self.in_channels = in_channels
+ self.is_batchnorm = is_batchnorm
+ self.feature_scale = feature_scale
+ #small unet
+ filters = [64, 128, 256]
+ ## -------------Encoder--------------
+ self.conv1 = unetConv3d(self.in_channels, filters[0], self.is_batchnorm)
+ self.maxpool1 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv2 = unetConv3d(filters[0], filters[1], self.is_batchnorm)
+ self.maxpool2 = nn.MaxPool3d(kernel_size=2)
+
+ self.conv3 = unetConv3d(filters[1], filters[2], self.is_batchnorm)
+
+ ## -------------Decoder--------------
+ self.CatChannels = filters[0]
+ self.CatBlocks = 3
+ self.UpChannels = self.CatChannels * self.CatBlocks
+
+ #stage 2d
+ self.h1_PT_hd2 = nn.MaxPool3d(2, 2, ceil_mode=True)
+ self.h1_PT_hd2_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_PT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_PT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.h2_Cat_hd2_conv = nn.Conv3d(filters[1], self.CatChannels, 3, padding=1)
+ self.h2_Cat_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h2_Cat_hd2_relu = nn.ReLU(inplace=True)
+
+        self.hd3_UT_hd2 = nn.Upsample(scale_factor=2, mode='trilinear') # 2x trilinear upsampling
+ self.hd3_UT_hd2_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd2_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd2_relu = nn.ReLU(inplace=True)
+
+ self.conv2d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn2d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu2d_1 = nn.ReLU(inplace=True)
+
+ #stage 1
+ # h1->320*320, hd1->320*320, Concatenation
+ self.h1_Cat_hd1_conv = nn.Conv3d(filters[0], self.CatChannels, 3, padding=1)
+ self.h1_Cat_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.h1_Cat_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd2->160*160, hd1->320*320, Upsample 2 times
+        self.hd2_UT_hd1 = nn.Upsample(scale_factor=2, mode='trilinear') # 2x trilinear upsampling
+ self.hd2_UT_hd1_conv = nn.Conv3d(self.UpChannels, self.CatChannels, 3, padding=1)
+ self.hd2_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd2_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # hd3->80*80, hd1->320*320, Upsample 4 times
+        self.hd3_UT_hd1 = nn.Upsample(scale_factor=4, mode='trilinear') # 4x trilinear upsampling
+ self.hd3_UT_hd1_conv = nn.Conv3d(filters[2], self.CatChannels, 3, padding=1)
+ self.hd3_UT_hd1_bn = nn.BatchNorm3d(self.CatChannels)
+ self.hd3_UT_hd1_relu = nn.ReLU(inplace=True)
+
+ # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1)
+ self.conv1d_1 = nn.Conv3d(self.UpChannels, self.UpChannels, 3, padding=1) # 16
+ self.bn1d_1 = nn.BatchNorm3d(self.UpChannels)
+ self.relu1d_1 = nn.ReLU(inplace=True)
+ #final process
+
+ self.upscore3 = nn.Upsample(scale_factor=4, mode='trilinear')
+ self.upscore2 = nn.Upsample(scale_factor=2, mode='trilinear')
+
+ # DeepSup
+ self.outconv1 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv2 = nn.Conv3d(self.UpChannels, n_classes, 3, padding=1)
+ self.outconv3 = nn.Conv3d(filters[2], n_classes, 3, padding=1)
+
+
+
+
+ # initialise weights
+ for m in self.modules():
+ if isinstance(m, nn.Conv3d):
+ init_weights(m, init_type='kaiming')
+ elif isinstance(m, nn.BatchNorm3d):
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs):
+ ## -------------Encoder-------------
+ h1 = self.conv1(inputs) # h1->320*320*64
+
+ h2 = self.maxpool1(h1)
+ h2 = self.conv2(h2) # h2->160*160*128
+
+ h3 = self.maxpool2(h2)
+ hd3 = self.conv3(h3) # h3->80*80*256
+
+ ## -------------Decoder-------------
+ #stage 2:
+ h1_PT_hd2 = self.h1_PT_hd2_relu(self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))))
+ h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2)))
+ hd3_UT_hd2 = self.hd3_UT_hd2_relu(self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))))
+ hd2 = self.relu2d_1(self.bn2d_1(self.conv2d_1(
+ torch.cat((h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2), 1)))) # hd4->40*40*UpChannels
+
+ #stage 1:
+
+ h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1)))
+ hd2_UT_hd1 = self.hd2_UT_hd1_relu(self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))))
+ hd3_UT_hd1 = self.hd3_UT_hd1_relu(self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))))
+ hd1 = self.relu1d_1(self.bn1d_1(self.conv1d_1(
+ torch.cat((h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1), 1)))) # hd1->320*320*UpChannels
+
+ d3 = self.outconv3(hd3)
+ d3 = self.upscore3(d3) # 64->256
+
+ d2 = self.outconv2(hd2)
+ d2 = self.upscore2(d2) # 128->256
+
+ d1 = self.outconv1(hd1) # 256
+ # return F.sigmoid(d1), F.sigmoid(d2), F.sigmoid(d3), F.sigmoid(d4), F.sigmoid(d5)
+ # sigmoid layer is included in the loss
+ # This loss combines a Sigmoid layer and the BCELoss in one single class.
+ # This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as,
+ # by combining the operations into one layer, we take advantage of
+ # the log-sum-exp trick for numerical stability.
+ return [d1, d2, d3]
diff --git a/CryoREAD/model/Unet_Layers.py b/CryoREAD/model/Unet_Layers.py
new file mode 100644
index 0000000..d82e613
--- /dev/null
+++ b/CryoREAD/model/Unet_Layers.py
@@ -0,0 +1,101 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from model.init_weights import init_weights
+
+class unetConv3d(nn.Module):
+ def __init__(self, in_size, out_size, is_batchnorm, n=2, ks=3, stride=1, padding=1):
+ """
+ :param in_size: input channel size
+ :param out_size: output channel size
+ :param is_batchnorm: apply batch norm or not
+ :param n: number of convolution layers
+        :param ks: kernel size
+        :param stride: stride size
+        :param padding: padding size
+ """
+ super(unetConv3d, self).__init__()
+ self.n = n
+ self.ks = ks
+ self.stride = stride
+ self.padding = padding
+ s = stride
+ p = padding
+ if is_batchnorm:
+ for i in range(1, n + 1):
+ conv = nn.Sequential(nn.Conv3d(in_size, out_size, ks, s, p),
+ nn.BatchNorm3d(out_size),
+ nn.ReLU(inplace=True), )
+ setattr(self, 'conv%d' % i, conv)
+ in_size = out_size
+
+ else:
+ for i in range(1, n + 1):
+ conv = nn.Sequential(nn.Conv3d(in_size, out_size, ks, s, p),
+ nn.ReLU(inplace=True), )
+ setattr(self, 'conv%d' % i, conv)
+ in_size = out_size
+
+ # initialise the blocks
+ for m in self.children():
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs):
+ x = inputs
+ for i in range(1, self.n + 1):
+ conv = getattr(self, 'conv%d' % i)
+ x = conv(x)
+ return x
+
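+# Usage sketch (hypothetical shapes): a stack of n conv->BN->ReLU blocks that preserves
+# the spatial size (3x3x3 kernels, stride 1, padding 1).
+#   block = unetConv3d(1, 64, is_batchnorm=True)
+#   out = block(torch.zeros(2, 1, 64, 64, 64))  # -> (2, 64, 64, 64, 64)
+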
+class unetUp(nn.Module):
+ def __init__(self, in_size, out_size, is_deconv, n_concat=2):
+ super(unetUp, self).__init__()
+ # self.conv = unetConv2(in_size + (n_concat - 2) * out_size, out_size, False)
+ self.conv = unetConv3d(out_size*2, out_size, False)
+ if is_deconv:
+ self.up = nn.ConvTranspose3d(in_size, out_size, kernel_size=4, stride=2, padding=1)
+ else:
+            self.up = nn.Upsample(scale_factor=2, mode='trilinear') #'bilinear' only supports 4-D inputs; 3-D volumes need trilinear
+
+ # initialise the blocks
+ for m in self.children():
+ if m.__class__.__name__.find('unetConv3d') != -1: continue
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs0, *input):
+ # print(self.n_concat)
+ # print(input)
+ outputs0 = self.up(inputs0)
+ for i in range(len(input)):
+ outputs0 = torch.cat([outputs0, input[i]], 1)
+ return self.conv(outputs0)
+
+class unetUp_origin(nn.Module):
+ def __init__(self, in_size, out_size, is_deconv, n_concat=2):
+ """
+ :param in_size: input channel size
+ :param out_size: output channel size
+ :param is_deconv: deconv bool flag
+        :param n_concat: number of concatenations from other levels
+ """
+ super(unetUp_origin, self).__init__()
+ # self.conv = unetConv2(out_size*2, out_size, False)
+ if is_deconv:
+ self.conv = unetConv3d(in_size + (n_concat - 2) * out_size, out_size, False)
+ self.up = nn.ConvTranspose3d(in_size, out_size, kernel_size=4, stride=2, padding=1)
+ else:
+ self.conv = unetConv3d(in_size + (n_concat - 2) * out_size, out_size, False)
+            self.up = nn.Upsample(scale_factor=2, mode='trilinear')#'bilinear' only supports 4-D inputs; 3-D volumes need trilinear
+
+ # initialise the blocks
+ for m in self.children():
+ if m.__class__.__name__.find('unetConv3d') != -1: continue
+ init_weights(m, init_type='kaiming')
+
+ def forward(self, inputs0, *input):
+ # print(self.n_concat)
+ # print(input)
+ outputs0 = self.up(inputs0)
+ for i in range(len(input)):
+ outputs0 = torch.cat([outputs0, input[i]], 1)
+ return self.conv(outputs0)
diff --git a/CryoREAD/model/__init__.py b/CryoREAD/model/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/model/init_weights.py b/CryoREAD/model/init_weights.py
new file mode 100644
index 0000000..8cdfd65
--- /dev/null
+++ b/CryoREAD/model/init_weights.py
@@ -0,0 +1,65 @@
+
+import torch
+import torch.nn as nn
+from torch.nn import init
+
+def weights_init_normal(m):
+ classname = m.__class__.__name__
+ #print(classname)
+ if classname.find('Conv') != -1:
+ init.normal_(m.weight.data, 0.0, 0.02)
+ elif classname.find('Linear') != -1:
+ init.normal_(m.weight.data, 0.0, 0.02)
+ elif classname.find('BatchNorm') != -1:
+ init.normal_(m.weight.data, 1.0, 0.02)
+ init.constant_(m.bias.data, 0.0)
+
+
+def weights_init_xavier(m):
+ classname = m.__class__.__name__
+ #print(classname)
+ if classname.find('Conv') != -1:
+ init.xavier_normal_(m.weight.data, gain=1)
+ elif classname.find('Linear') != -1:
+ init.xavier_normal_(m.weight.data, gain=1)
+ elif classname.find('BatchNorm') != -1:
+ init.normal_(m.weight.data, 1.0, 0.02)
+ init.constant_(m.bias.data, 0.0)
+
+
+def weights_init_kaiming(m):
+ classname = m.__class__.__name__
+ #print(classname)
+ if classname.find('Conv') != -1:
+ init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
+ elif classname.find('Linear') != -1:
+ init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
+ elif classname.find('BatchNorm') != -1:
+ init.normal_(m.weight.data, 1.0, 0.02)
+ init.constant_(m.bias.data, 0.0)
+
+
+def weights_init_orthogonal(m):
+ classname = m.__class__.__name__
+ #print(classname)
+ if classname.find('Conv') != -1:
+ init.orthogonal_(m.weight.data, gain=1)
+ elif classname.find('Linear') != -1:
+ init.orthogonal_(m.weight.data, gain=1)
+ elif classname.find('BatchNorm') != -1:
+ init.normal_(m.weight.data, 1.0, 0.02)
+ init.constant_(m.bias.data, 0.0)
+
+
+def init_weights(net, init_type='normal'):
+ #print('initialization method [%s]' % init_type)
+ if init_type == 'normal':
+ net.apply(weights_init_normal)
+ elif init_type == 'xavier':
+ net.apply(weights_init_xavier)
+ elif init_type == 'kaiming':
+ net.apply(weights_init_kaiming)
+ elif init_type == 'orthogonal':
+ net.apply(weights_init_orthogonal)
+ else:
+ raise NotImplementedError('initialization method [%s] is not implemented' % init_type)
\ No newline at end of file
diff --git a/CryoREAD/ops/Logger.py b/CryoREAD/ops/Logger.py
new file mode 100644
index 0000000..cfb0b8f
--- /dev/null
+++ b/CryoREAD/ops/Logger.py
@@ -0,0 +1,48 @@
+import math
+import shutil
+import os
+
+class AverageMeter(object):
+ """Computes and stores the average and current value"""
+ def __init__(self, name, fmt=':f'):
+ self.name = name
+ self.fmt = fmt
+ self.reset()
+
+ def reset(self):
+ self.val = 0
+ self.avg = 0
+ self.sum = 0
+ self.count = 0
+
+ def update(self, val, n=1):
+ self.val = val
+ self.sum += val * n
+ self.count += n
+ self.avg = self.sum / self.count
+
+ def __str__(self):
+ fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
+ return fmtstr.format(**self.__dict__)
+
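+# Usage sketch (hypothetical values):
+#   meter = AverageMeter('loss', ':6.3f')
+#   meter.update(0.52, n=8)  # running sum weighted by the batch size n
+#   str(meter)               # -> 'loss  0.520 ( 0.520)'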
+
+class ProgressMeter(object):
+ def __init__(self, num_batches, meters, prefix=""):
+ self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
+ self.meters = meters
+ self.prefix = prefix
+
+ def display(self, batch):
+ entries = [self.prefix + self.batch_fmtstr.format(batch)]
+ entries += [str(meter) for meter in self.meters]
+ print('\t'.join(entries))
+
+ def _get_batch_fmtstr(self, num_batches):
+        num_digits = len(str(num_batches))
+ fmt = '{:' + str(num_digits) + 'd}'
+ return '[' + fmt + '/' + fmt.format(num_batches) + ']'
+
+    def write_record(self,batch,filename):
+ entries = [self.prefix + self.batch_fmtstr.format(batch)]
+ entries += [str(meter) for meter in self.meters]
+ with open(filename,"a+") as file:
+ file.write('\t'.join(entries)+"\n")
\ No newline at end of file
diff --git a/CryoREAD/ops/__init__.py b/CryoREAD/ops/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/ops/acc_graph.py b/CryoREAD/ops/acc_graph.py
new file mode 100644
index 0000000..1b1884e
--- /dev/null
+++ b/CryoREAD/ops/acc_graph.py
@@ -0,0 +1,354 @@
+from numba import jit
+import numpy as np
+import math
+
+@jit(nogil=True,nopython=True)
+def acc_new_adj_matrix(Nori,merged_data,adj,origrid,Nnode):
+ Ne=0
+ abanden=0
+ check_set = set()
+ for ii in range(Nori):
+ m1 = int(merged_data[ii, 0])
+ merged_id1 = int(merged_data[m1, 5])
+ if ii % 10000 == 0:
+ print(ii)
+ if merged_id1 == -1:
+ continue
+ for jj in range(ii + 1, Nori):
+ m2 = int(merged_data[jj, 0])
+ if m1 == m2:
+ continue
+
+ merged_id2 = int(merged_data[m2, 5])
+            if merged_id2 == -1 or merged_id1 == merged_id2: # skip very low-density points (stock=0) that were never merged into another point
+ continue
+ if merged_id1*Nnode+merged_id2 in check_set:
+ continue
+ adjunct_label = False
+ for kk in range(3):
+ if (origrid[ii][kk] - origrid[jj][kk]) ** 2 > 1:
+                    adjunct_label = True # grid points are not adjacent, so this pair cannot form an edge
+ break
+ if adjunct_label:
+ abanden += 1
+ continue
+ check_set.add(merged_id1 * Nnode + merged_id2)
+ adj[Ne] = merged_id1*Nnode+merged_id2
+
+ Ne += 1 # count for effective edges
+ return adj,Ne
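+# Note: each accepted edge is packed into one integer key merged_id1*Nnode+merged_id2;
+# a sketch to decode it back, assuming the same Nnode is at hand:
+#   id1, id2 = divmod(int(adj[e]), Nnode)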
+@jit(nogil=True,nopython=True)
+def acc_adj_matrix(Nori,merged_data,adj,origrid):
+ if True:
+ Ne=0
+ abanden=0
+ #count_useful=0
+ for ii in range(Nori):#original point before merging
+ m1=int(merged_data[ii,0])
+ merged_id1=int(merged_data[m1,5])
+ if ii%10000==0:
+ print(ii)
+ if merged_id1==-1:
+ continue
+ for jj in range(ii+1,Nori):
+ m2=int(merged_data[jj,0])
+ if m1==m2:
+ continue
+
+ merged_id2=int(merged_data[m2,5])
+ if adj[merged_id1,merged_id2]:
+ continue
+                if merged_id2==-1 or merged_id1==merged_id2:#skip very low-density points (stock=0) that were never merged into another point
+ continue
+
+ adjunct_label=False
+ for kk in range(3):
+ if (origrid[ii][kk]-origrid[jj][kk])**2>1:
+                    adjunct_label=True#grid points are not adjacent, so this pair cannot form an edge
+ break
+ if adjunct_label:
+ abanden+=1
+ continue
+ adj[merged_id1][merged_id2]=1
+ adj[merged_id2][merged_id1]=1
+ Ne+=1#count for effective edges
+
+ return adj,Ne
+
+@jit(nogil=True,nopython=True)
+def acc_edge_prob(Ne,id1,id2,merged_cd_dens,dens,edge_d,fsiv,fmaxd,xdim,ydim,zdim,mrc_dense):
+ if True:
+ for ii in range(Ne):
+ if ii%1000==0:
+ print(ii)
+ v1=int(id1[ii])
+ v2=int(id2[ii])
+ vec=np.zeros(3)
+ for kk in range(3):
+ vec[kk]=merged_cd_dens[v2,kk]-merged_cd_dens[v1,kk]
+ MinDens=merged_cd_dens[v1,3]
+ cd1=np.zeros(3)
+ for jj in range(11):
+ for kk in range(3):
+ cd1[kk]=merged_cd_dens[v1,kk]+vec[kk]*0.1*jj
+ tmp_dens=mean_shift_pos(cd1,fsiv,fmaxd,xdim,ydim,zdim,mrc_dense)
+                if tmp_dens<MinDens:
+                    MinDens=tmp_dens
+            edge_d[ii]=MinDens
+    return edge_d
+
+@jit(nogil=True,nopython=True)
+def mean_shift_pos(pos,fsiv,fmaxd,xdim,ydim,zdim,dens):
+    #Gaussian-weighted density around pos, same kernel as carry_shift in acc_mean_shift.py
+    stp=np.zeros(3)
+    endp=np.zeros(3)
+    for j in range(3):
+        stp[j]=int(pos[j]-fmaxd)
+        if stp[j]<0:
+            stp[j]=0
+        endp[j]=int(pos[j]+fmaxd+1)
+    if endp[0]>=xdim:
+        endp[0]=xdim
+    if endp[1]>=ydim:
+        endp[1]=ydim
+    if endp[2]>=zdim:
+        endp[2]=zdim
+    dtotal=0
+    for xp in range(int(stp[0]),int(endp[0])):
+        rx=float(xp-pos[0])**2
+        for yp in range(int(stp[1]),int(endp[1])):
+            ry=float(yp-pos[1])**2
+            for zp in range(int(stp[2]),int(endp[2])):
+                rz=float(zp-pos[2])**2
+                d2=rx+ry+rz
+                v=math.exp(-1.5*d2*fsiv)*dens[xp,yp,zp]
+                dtotal+=v
+    return dtotal
+@jit(nogil=True,nopython=True)
+def acc_local_mst(cid,Nnode,Ne,id1,id2,merged_cd_dens,d2cut,local_label):
+ """
+ :param cid:
+ :param Nnode:
+ :param Ne:
+ :param id1:
+ :param id2:
+ :param merged_cd_dens:
+ :param d2cut:
+ :param local_label:
+ :return:
+    purpose: if some node lies within sqrt(d2cut) of both endpoints of an edge, mark that edge with local_label=1
+ """
+ if True:
+
+ count=0
+ for i in range(Nnode):
+ vec=np.zeros(3)
+ #if i%100==0:
+ # print(i)
+ for j in range(Nnode):
+ cid[j]=j
+ for k in range(Ne):
+ v1=int(id1[k])
+ v2=int(id2[k])
+ if cid[v1]==cid[v2]:
+ continue
+ #dist
+ d2=0
+ for l in range(3):
+ vec[l]=merged_cd_dens[i,l]-merged_cd_dens[v1,l]
+ d2+=vec[l]**2
+ if d2>d2cut:
+ continue
+ d2=0
+ for l in range(3):
+ vec[l]=merged_cd_dens[i,l]-merged_cd_dens[v2,l]
+ d2+=vec[l]**2
+ if d2>d2cut:
+ continue
+ local_label[k]=1
+ count+=1
+ tmpid=cid[v2]
+ for l in range(Nnode):
+ if cid[l]==tmpid:#Update cid
+ cid[l]=cid[v1]
+ #print(count)
+ return local_label,count
+
+@jit(nogil=True,nopython=True)
+def acc_get_density_cut_twographextra(check_list,merged_data,adj,density_record,origrid,Nnode,
+ cutoff_index,origrid_cutoff_index):
+ """
+    :param check_list: indices of the grid points to consider
+ :param merged_data: an array with format # init_id, x,y,z,density, merged_to_id
+ :param adj: used to save the adjacency info
+ :param density_record: used to save the edge density information
+ :param origrid: origin grid array
+    :param Nnode: number of merged nodes remaining
+    :param cutoff_index: boundary index between the two node sets
+    :return:
+    note: only adds internal connections within the phosphate graph or within the sugar graph
+ """
+ Ne = 0
+ abanden = 0
+ for ii in check_list:
+ ii = int(ii)
+ m1 = int(merged_data[ii, 0])#original id
+        if ii<origrid_cutoff_index:
+            merged_id1 = int(merged_data[m1, 5])#link to merged point id
+        else:
+            merged_id1 = int(merged_data[m1+origrid_cutoff_index, 5])
+        if ii>=origrid_cutoff_index:
+ merged_id1 +=cutoff_index
+ if ii % 10000 == 0:
+ print(ii)
+ if merged_id1 == -1:#Merged node, ignore
+ continue
+ for jj in check_list:
+ jj = int(jj)
+ if ii==jj:
+ continue
+ m2 = int(merged_data[jj, 0])
+            if ii<origrid_cutoff_index and jj>=origrid_cutoff_index:
+ continue
+            if ii>=origrid_cutoff_index and jj<origrid_cutoff_index:
+                continue
+            if jj>=origrid_cutoff_index:
+ merged_id2 = int(merged_data[m2+origrid_cutoff_index, 5])
+ else:
+ merged_id2 = int(merged_data[m2, 5])
+ if merged_id2 == -1:
+ continue
+
+ if jj>=origrid_cutoff_index:
+ merged_id2+=cutoff_index#map to real merge id
+ if merged_id1==merged_id2:
+ continue
+ adjunct_label = False
+ for kk in range(3):
+ if (origrid[ii][kk] - origrid[jj][kk]) ** 2 > 1:
+                    adjunct_label = True # grid points are not adjacent, so this pair cannot form an edge
+ break
+ if adjunct_label:
+ abanden += 1
+ continue
+            adj[Ne] = merged_id1 * Nnode + merged_id2#redundant: each connection is saved twice, once per direction
+            density_record[Ne]=min(merged_data[ii, 4],merged_data[jj, 4])#if either endpoint disappears, this grid pair no longer supports the connection
+ Ne += 1 # count for effective edges
+ return adj, Ne,density_record
+
+
+
+
+@jit(nogil=True,nopython=True)
+def acc_get_density_cut_twograph(Nori,merged_data,adj,density_record,origrid,Nnode,
+ cutoff_index,origrid_cutoff_index):
+ """
+ :param Nori: number of grid points that can be used
+ :param merged_data: an array with format # init_id, x,y,z,density, merged_to_id
+ :param adj: used to save the adjacency info
+ :param density_record: used to save the edge density information
+ :param origrid: origin grid array
+    :param Nnode: number of merged nodes remaining
+    :param cutoff_index: boundary index between the two node sets
+ :return:
+ """
+ Ne = 0
+ abanden = 0
+ for ii in range(origrid_cutoff_index):
+ m1 = int(merged_data[ii, 0])#original id
+ merged_id1 = int(merged_data[m1, 5])#link to merged point id
+
+ if ii % 10000 == 0:
+ print(ii)
+ if merged_id1 == -1:#Merged node, ignore
+ continue
+ for jj in range(origrid_cutoff_index, Nori):
+ m2 = int(merged_data[jj, 0])
+ merged_id2 = int(merged_data[m2+origrid_cutoff_index, 5])
+ if merged_id2 == -1:
+ continue
+ merged_id2+=cutoff_index#map to real merge id
+ assert merged_id1 != merged_id2#it should be impossible to be equal
+ adjunct_label = False
+ for kk in range(3):
+ if (origrid[ii][kk] - origrid[jj][kk]) ** 2 > 1:
+                    adjunct_label = True # grid points are not adjacent, so this pair cannot form an edge
+ break
+ if adjunct_label:
+ abanden += 1
+ continue
+            adj[Ne] = merged_id1 * Nnode + merged_id2#redundant: each connection is saved twice, once per direction
+            density_record[Ne]=min(merged_data[ii, 4],merged_data[jj, 4])#if either endpoint disappears, this grid pair no longer supports the connection
+ Ne += 1 # count for effective edges
+ return adj, Ne,density_record
+
+@jit(nogil=True,nopython=True)
+def acc_get_density_cut(Nori,merged_data,adj,density_record,origrid,Nnode):
+ """
+ :param Nori: number of grid points that can be used
+ :param merged_data: an array with format # init_id, x,y,z,density, merged_to_id
+ :param adj: used to save the adjacency info
+ :param density_record: used to save the edge density information
+ :param origrid: origin grid array
+    :param Nnode: number of merged nodes remaining
+ :return:
+ """
+ Ne = 0
+ abanden = 0
+ for ii in range(Nori):
+ m1 = int(merged_data[ii, 0])#original id
+ merged_id1 = int(merged_data[m1, 5])#link to merged point id
+ if ii % 10000 == 0:
+ print(ii)
+ if merged_id1 == -1:#Merged node, ignore
+ continue
+ for jj in range(ii + 1, Nori):
+ m2 = int(merged_data[jj, 0])
+ if m1 == m2:
+ continue
+
+ merged_id2 = int(merged_data[m2, 5])
+            if merged_id2 == -1 or merged_id1 == merged_id2: # skip very low-density points (stock=0) that were never merged into another point
+ continue
+
+ adjunct_label = False
+ for kk in range(3):
+ if (origrid[ii][kk] - origrid[jj][kk]) ** 2 > 1:
+                    adjunct_label = True # grid points are not adjacent, so this pair cannot form an edge
+ break
+ if adjunct_label:
+ abanden += 1
+ continue
+            #make sure the initial points have connections
+            adj[Ne] = merged_id1 * Nnode + merged_id2#redundant: each connection is saved twice, once per direction
+            density_record[Ne]=min(merged_data[ii, 4],merged_data[jj, 4])#if either endpoint disappears, this grid pair no longer supports the connection
+ Ne += 1 # count for effective edges
+ return adj, Ne,density_record
diff --git a/CryoREAD/ops/acc_local_mst.py b/CryoREAD/ops/acc_local_mst.py
new file mode 100644
index 0000000..dabb9b3
--- /dev/null
+++ b/CryoREAD/ops/acc_local_mst.py
@@ -0,0 +1,53 @@
+from numba import jit
+import numpy as np
+import math
+
+@jit(nogil=True,nopython=True)
+def acc_local_mst(cid,Nnode,Ne,id1,id2,merged_cd_dens,d2cut,local_label):
+ """
+ :param cid:
+ :param Nnode:
+ :param Ne:
+ :param id1:
+ :param id2:
+ :param merged_cd_dens:
+ :param d2cut:
+ :param local_label:
+ :return:
+    purpose: if some node lies within sqrt(d2cut) of both endpoints of an edge, mark that edge with local_label=1
+ """
+ if True:
+
+ count=0
+ for i in range(Nnode):
+ vec=np.zeros(3)
+ #if i%100==0:
+ # print(i)
+ for j in range(Nnode):
+ cid[j]=j
+ for k in range(Ne):
+ v1=int(id1[k])
+ v2=int(id2[k])
+ if cid[v1]==cid[v2]:
+ continue
+ #dist
+ d2=0
+ for l in range(3):
+ vec[l]=merged_cd_dens[i,l]-merged_cd_dens[v1,l]
+ d2+=vec[l]**2
+ if d2>d2cut:
+ continue
+ d2=0
+ for l in range(3):
+ vec[l]=merged_cd_dens[i,l]-merged_cd_dens[v2,l]
+ d2+=vec[l]**2
+ if d2>d2cut:
+ continue
+ local_label[k]=1
+ count+=1
+ tmpid=cid[v2]
+ for l in range(Nnode):
+ if cid[l]==tmpid:#Update cid
+ cid[l]=cid[v1]
+ #print(count)
+ return local_label,count
diff --git a/CryoREAD/ops/acc_mean_shift.py b/CryoREAD/ops/acc_mean_shift.py
new file mode 100644
index 0000000..11ee87f
--- /dev/null
+++ b/CryoREAD/ops/acc_mean_shift.py
@@ -0,0 +1,126 @@
+from numba import jit
+import numpy as np
+import math
+@jit(nogil=True,nopython=True)
+def carry_shift(point_cd,cnt,fmaxd,fsiv,xdim,ydim,zdim,dens):
+ if True:
+ point_dens=np.zeros(cnt)
+ for i in range(cnt):
+ if i%1000==0:
+ print("mean shift",i,"/",cnt)
+ #print(i)
+ #print('start shifting for '+str(i))
+ pos=np.zeros(3)
+ for j in range(3):
+ pos[j]=point_cd[i][j]
+ if True:
+ while True:
+ stp=np.zeros(3)
+ endp=np.zeros(3)
+ for j in range(3):
+ stp[j]=int(pos[j]-fmaxd)
+ if stp[j]<0:
+ stp[j]=0
+ endp[j]=int(pos[j]+fmaxd+1)
+ if endp[0]>=xdim:
+ endp[0]=xdim
+ if endp[1]>=ydim:
+ endp[1]=ydim
+ if endp[2]>=zdim:
+ endp[2]=zdim
+ dtotal=0
+ pos2=np.zeros(3)
+ for xp in range(int(stp[0]),int(endp[0])):
+ rx=float(xp-pos[0])**2
+ for yp in range(int(stp[1]),int(endp[1])):
+ ry=float(yp-pos[1])**2
+ for zp in range(int(stp[2]),int(endp[2])):
+ rz=float(zp-pos[2])**2
+ d2=rx+ry+rz#relative distance square
+ v=np.exp(-1.5*d2*fsiv)*dens[xp,yp,zp]#This is the bottom part of the equation, where pos represents y, (xp,yp,zp) represents xi
+ dtotal+=v
+ if v>0:
+ pos2[0]+=v*(float)(xp)#pos2 is for the top part of the equation
+ pos2[1]+=v*(float)(yp)
+ pos2[2]+=v*(float)(zp)
+ if dtotal==0:
+ break
+ rd=1.00/float(dtotal)
+ tempcd=np.zeros(3)
+ for j in range(3):
+ pos2[j]*=rd#Now we get the equation result
+ tempcd[j]=pos[j]-pos2[j]
+ pos[j]=pos2[j]#Prepare for iteration
+                check_d=tempcd[0]**2+tempcd[1]**2+tempcd[2]**2#iterate until the shift becomes negligible
+ if check_d<0.001:
+ break
+
+ for j in range(3):
+
+ point_cd[i][j]=pos[j]
+ point_dens[i]=dtotal/cnt
+ return point_cd,point_dens
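+# The loop above is Gaussian-kernel mean shift; in the notation of the inline comments,
+# each update computes
+#   y_new = sum_i w_i * x_i / sum_i w_i,  with  w_i = exp(-1.5 * fsiv * ||x_i - y||^2) * dens[x_i]
+# and iteration stops once ||y_new - y||^2 < 0.001.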
+
+@jit(nogil=True,nopython=True)
+def carry_shift_limit(point_cd,cnt,fmaxd,fsiv,xdim,ydim,zdim,dens,move_limit=3):
+ if True:
+ point_dens=np.zeros(cnt)
+ for i in range(cnt):
+ if i%1000==0:
+ print(i)
+ #print(i)
+ #print('start shifting for '+str(i))
+ pos=np.zeros(3)
+ for j in range(3):
+ pos[j]=point_cd[i][j]
+ if True:
+ while True:
+ stp=np.zeros(3)
+ endp=np.zeros(3)
+ for j in range(3):
+ stp[j]=int(pos[j]-fmaxd)
+ if stp[j]<0:
+ stp[j]=0
+ endp[j]=int(pos[j]+fmaxd+1)
+ if endp[0]>=xdim:
+ endp[0]=xdim
+ if endp[1]>=ydim:
+ endp[1]=ydim
+ if endp[2]>=zdim:
+ endp[2]=zdim
+ dtotal=0
+ pos2=np.zeros(3)
+ for xp in range(int(stp[0]),int(endp[0])):
+ rx=float(xp-pos[0])**2
+ for yp in range(int(stp[1]),int(endp[1])):
+ ry=float(yp-pos[1])**2
+ for zp in range(int(stp[2]),int(endp[2])):
+ rz=float(zp-pos[2])**2
+ d2=rx+ry+rz#relative distance square
+ v=np.exp(-1.5*d2*fsiv)*dens[xp,yp,zp]#This is the bottom part of the equation, where pos represents y, (xp,yp,zp) represents xi
+ dtotal+=v
+ if v>0:
+ pos2[0]+=v*(float)(xp)#pos2 is for the top part of the equation
+ pos2[1]+=v*(float)(yp)
+ pos2[2]+=v*(float)(zp)
+ if dtotal==0:
+ break
+ rd=1.00/float(dtotal)
+ tempcd=np.zeros(3)
+ current_move = 0
+ for j in range(3):
+ pos2[j]*=rd#Now we get the equation result
+ tempcd[j]=pos[j]-pos2[j]
+ current_move += (pos2[j]-point_cd[i][j])**2
+ pos[j]=pos2[j]#Prepare for iteration
+                check_d=tempcd[0]**2+tempcd[1]**2+tempcd[2]**2#iterate until the shift becomes negligible
+ if current_move>=move_limit**2:#limit further moving to avoid missing base predictions in some regions
+ break
+ if check_d<0.001:
+ break
+
+ for j in range(3):
+
+ point_cd[i][j]=pos[j]
+ point_dens[i]=dtotal/cnt
+ return point_cd,point_dens
diff --git a/CryoREAD/ops/acc_merge_point.py b/CryoREAD/ops/acc_merge_point.py
new file mode 100644
index 0000000..ee45c46
--- /dev/null
+++ b/CryoREAD/ops/acc_merge_point.py
@@ -0,0 +1,38 @@
+
+from numba import jit
+import numpy as np
+import math
+@jit(nogil=True,nopython=True)
+def acc_merge_point(Ncd,dens,dmin,rv_range,rdcut,stock,cd,d2cut,member):
+ if True:
+ for i in range(Ncd-1):
+ if i%10000==0:
+ print(i)
+ tmp=np.zeros(3)
+ if (dens[i]-dmin)*rv_range < rdcut:
+ stock[i]=0#Label the small density parts as unused parts
+ if stock[i]==0:
+ continue
+ for j in range(i+1,Ncd):
+ if stock[j]==0:
+ continue
+ d2=0
+ for k in range(3):
+ tmp[k]=cd[i][k]-cd[j][k]
+ d2+=tmp[k]**2
+                if d2>=d2cut:
+                    continue
+                if dens[i]>dens[j]:
+ stock[j]=0
+ member[j]=i
+ else:
+ stock[i]=0
+ member[i]=j
+                    break#jump out of the inner loop, since i has been merged
+    #Update member data: point each descendant directly at its original root (merged) point
+ for i in range(Ncd):
+ now=int(member[i])
+        while now!=member[now]:#follow the merge chain up to the root (merged) point
+ now=int(member[now])
+ member[i]=now
+ return stock,member
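+
+if __name__ == "__main__":
+    # Smoke-test sketch on random points; the thresholds mirror the argparser defaults
+    # (-f 0.05 density filter, -m 2.0 merge distance) and are assumptions, not fixed values.
+    Ncd = 100
+    cd = np.random.rand(Ncd, 3) * 20
+    dens = np.random.rand(Ncd)
+    stock = np.ones(Ncd)
+    member = np.arange(Ncd)
+    stock, member = acc_merge_point(Ncd, dens, dens.min(), 1.0 / (dens.max() - dens.min()),
+                                    0.05, stock, cd, 2.0 ** 2, member)
+    print("kept", int(stock.sum()), "of", Ncd, "points")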
diff --git a/CryoREAD/ops/argparser.py b/CryoREAD/ops/argparser.py
new file mode 100644
index 0000000..83f2286
--- /dev/null
+++ b/CryoREAD/ops/argparser.py
@@ -0,0 +1,79 @@
+#
+# Copyright (C) 2020 Xiao Wang
+# Email:xiaowang20140001@gmail.com wang3702@purdue.edu
+#
+
+import argparse
+
+
+def argparser():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("-F", type=str, help="Input map file path. (str)")
+ parser.add_argument("-M", type=str, default="best_model", help='Pre-trained model path. (str) Default value: "best_model"')
+ parser.add_argument("-P", type=str, help="Optional fasta sequence file path. (str) ")
+ parser.add_argument("--output", type=str, help="Output directory")
+ parser.add_argument(
+ "--mode",
+ type=int,
+ required=True,
+ help="Control Mode for program: 0: cryo_READ structure modeling. Required parameter. (Integer), Default value: 0",
+ )
+ parser.add_argument(
+ "--contour", type=float, default=0, help="Contour level for input map, suggested 0.5*[author_contour]. (Float), Default value: 0.0"
+ )
+ parser.add_argument("--stride", type=int, default=32, help="Stride for scanning of deep learning model. (Integer), Default value: 32.")
+ parser.add_argument("--box_size", type=int, default=64, help="Input box size for deep learning model. (Integer), Default value: 64")
+ parser.add_argument("--gpu", type=str, default=None, help="Specify the gpu we will use. (str), Default value: None.")
+ parser.add_argument("--batch_size", type=int, default=8, help="Batch size for inference of network. (Integer), Default value: 8.")
+ parser.add_argument(
+ "-f",
+ type=float,
+ default=0.05,
+ help="Filter for representative points, for LDPs, " "removing points' normalized density<=-f (Float), Default value: 0.05",
+ ) #
+ parser.add_argument(
+ "-m", type=float, default=2.0, help="After meanshifting merge points distance<[float]. (Float), Default value: 2.0. "
+ ) # merge distance
+ parser.add_argument(
+ "-g", type=float, default=3.0, help="Bandwidth of the Gaussian filter, (Float), Default value: 3.0."
+ ) # gaussian filter
+ parser.add_argument("-k", type=float, default=0.5, help="Always keep edges where d4 and line[:4]=="ATOM":
+ split_info=line.strip("\n").split()
+ current_chain = split_info[chain_ids]
+ if current_chain in tmp_chain_list:
+ tmp_chain_list.remove(current_chain)
+ print("remain waiting assign chain number %d"%len(tmp_chain_list))
+ chain_map_dict = {}
+ with open(input_cif_path,'r') as rfile:
+ with open(final_pdb_path,'w') as wfile:
+
+ for line in rfile:
+ if len(line)>4 and line[:4]=="ATOM":
+ split_info=line.strip("\n").split()
+ current_chain = split_info[chain_ids]
+ current_atom_index = int(split_info[atom_ids])
+ current_atom_name = split_info[atom_type_ids]
+ current_res_index = int(split_info[seq_ids])
+ current_res_name = split_info[res_name_ids]
+ current_x = float(split_info[x_ids])
+ current_y = float(split_info[y_ids])
+ current_z = float(split_info[z_ids])
+ if len(current_chain)>1: #replace with a temporary id
+ if current_chain in chain_map_dict:
+ current_chain = chain_map_dict[current_chain]
+ else:
+ remain_select_list=[x for x in tmp_chain_list if x not in list(chain_map_dict.values())]
+ chain_map_dict[current_chain]=remain_select_list[0]
+ current_chain = chain_map_dict[current_chain]
+ if current_res_index>9999:
+ current_res_index=9999
+ if current_atom_index>9999999:
+ current_atom_index=9999999
+ wline=""
+ wline += "ATOM%7d %-4s %3s%2s%4d " % (current_atom_index, current_atom_name,
+ current_res_name, current_chain,current_res_index)
+ wline = wline + "%8.3f%8.3f%8.3f%6.2f\n" % (current_x,current_y,current_z, 1.0)
+ wfile.write(wline)
\ No newline at end of file
diff --git a/CryoREAD/ops/fasta_utils.py b/CryoREAD/ops/fasta_utils.py
new file mode 100644
index 0000000..6928a8c
--- /dev/null
+++ b/CryoREAD/ops/fasta_utils.py
@@ -0,0 +1,64 @@
+
+from collections import defaultdict
+def read_fasta(input_fasta_path,dna_check=False):
+ #format should be
+ #>chain_id
+ #sequence
+ dna_rna_set = {"A":0, "U":1, "T":1, "C":2, "G":3}
+ chain_dict=defaultdict(list)#key: chain, value: nuc sequence
+ current_id=None
+
+ tmp_chain_list=[chr(i) for i in range(ord('A'), ord('Z') + 1)] # uppercase letters
+ tmp_chain_list.extend([chr(i) for i in range(ord('a'), ord('z') + 1)]) # lowercase letters
+ read_chain=False
+ dna_label=True
+ with open(input_fasta_path,'r') as file:
+ for line in file:
+ if line[0]==">":
+ current_id = line.strip("\n")
+ current_id = current_id.replace(">","")
+ if "|" in current_id:
+ current_id = current_id.split("|")[0]
+ if "_" in current_id:
+ current_id = current_id.split("_")[1]
+ read_chain=True
+ elif len(line.strip("\n").replace(" ",""))>0:
+ if current_id is None or read_chain==False or len(current_id)!=1:
+ visit_set=list(chain_dict.keys())
+ for tmp_chain in tmp_chain_list:
+ if tmp_chain not in visit_set:
+ current_id=tmp_chain
+ break
+
+ line=line.strip("\n").replace(" ","").replace("N","").replace("X","")
+            #quick check to skip protein sequences mixed into the fasta
+            dna_seq_flag=True
+            count_portion=0
+            for item in line:
+                if item in dna_rna_set:
+                    count_portion+=1
+            count_portion = count_portion/len(line)
+            if count_portion<=0.9:
+                dna_seq_flag=False
+ if dna_seq_flag:
+ for item in line:
+ if item in dna_rna_set:
+ if item=="U":
+ dna_label=False
+ chain_dict[current_id].append(dna_rna_set[item])
+ print("read chain info from fasta:",chain_dict)
+ if dna_check:
+ return chain_dict,dna_label
+ else:
+ return chain_dict
+
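+# Usage sketch with a hypothetical two-line fasta file:
+#   >A
+#   AUGC
+# read_fasta(path, dna_check=True) would return ({'A': [0, 1, 3, 2]}, False):
+# U maps to 1 and also marks the sequence as RNA (dna_label=False).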
+# def read_dna_label(chain_dict):
+# dna_label=False
+# for key in chain_dict:
+# seq = chain_dict[key]
+# for item in key:
+# if item=="T":
+# dna_label=True
+# return dna_label
+
+
diff --git a/CryoREAD/ops/math_calcuation.py b/CryoREAD/ops/math_calcuation.py
new file mode 100644
index 0000000..b180d43
--- /dev/null
+++ b/CryoREAD/ops/math_calcuation.py
@@ -0,0 +1,15 @@
+
+
+import numpy as np
+
+def calculate_distance(x,y):
+ distance=0
+ for k in range(3):
+ distance+=(x[k]-y[k])**2
+ distance = np.sqrt(distance)
+ return distance
+
+def calculate_cosine_value(edge1,edge2,edge3):
+ nominator = edge1**2+edge2**2-edge3**2
+ denominator = 2*edge1*edge2
+ return nominator/denominator
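+
+# calculate_cosine_value is the law of cosines solved for the angle opposite edge3:
+#   cos(theta) = (edge1^2 + edge2^2 - edge3^2) / (2 * edge1 * edge2)
+if __name__ == "__main__":
+    # sanity-check sketch: a 3-4-5 triangle is right-angled
+    print(calculate_distance((0, 0, 0), (3, 4, 0)))  # 5.0
+    print(calculate_cosine_value(3, 4, 5))           # 0.0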
diff --git a/CryoREAD/ops/os_operation.py b/CryoREAD/ops/os_operation.py
new file mode 100644
index 0000000..a0e060a
--- /dev/null
+++ b/CryoREAD/ops/os_operation.py
@@ -0,0 +1,43 @@
+import os
+def mkdir(path):
+ path=path.strip()
+ path=path.rstrip("\\")
+ isExists=os.path.exists(path)
+ if not isExists:
+ print (path+" created")
+ os.makedirs(path)
+ return True
+ else:
+ print (path+' existed')
+ return False
+def execCmd(cmd):
+ r = os.popen(cmd)
+ text = r.read()
+ r.close()
+ return text
+
+import gzip
+def unzip_gz(file_path):
+ new_path = file_path.replace(".gz","")
+ g_file = gzip.GzipFile(file_path)
+ with open(new_path,"wb+") as file:
+ file.write(g_file.read())
+ g_file.close()
+ return new_path
+import shutil
+def collect_refine_pdb(save_dir,last_pdb_path,final_pdb_path):
+ pdb_file=os.path.join(save_dir,"Refine_cycle3.pdb")
+ if os.path.exists(pdb_file):
+ shutil.copy(pdb_file,final_pdb_path)
+ else:
+ pdb_file=os.path.join(save_dir,"Refine_cycle2.pdb")
+ if os.path.exists(pdb_file):
+ shutil.copy(pdb_file,final_pdb_path)
+ else:
+ pdb_file=os.path.join(save_dir,"Refine_cycle1.pdb")
+ if os.path.exists(pdb_file):
+ shutil.copy(pdb_file,final_pdb_path)
+ else:
+ if os.path.exists(last_pdb_path):
+ shutil.copy(last_pdb_path,final_pdb_path)
+
diff --git a/CryoREAD/ops/progressbar.py b/CryoREAD/ops/progressbar.py
new file mode 100644
index 0000000..89c3a00
--- /dev/null
+++ b/CryoREAD/ops/progressbar.py
@@ -0,0 +1,12 @@
+import sys
+def progressbar(it, prefix="", size=60, out=sys.stdout): # Python3.3+
+ count = len(it)
+ def show(j):
+ x = int(size*j/count)
+ print("{}[{}{}] {}/{}".format(prefix, "#"*x, "."*(size-x), j, count),
+ end='\r', file=out, flush=True)
+ show(0)
+ for i, item in enumerate(it):
+ yield item
+ show(i+1)
+ print("\n", flush=True, file=out)
diff --git a/CryoREAD/ops/save_formated_pdb.py b/CryoREAD/ops/save_formated_pdb.py
new file mode 100644
index 0000000..7d6d755
--- /dev/null
+++ b/CryoREAD/ops/save_formated_pdb.py
@@ -0,0 +1,12 @@
+from pymol import cmd
+import sys
+import os
+
+#usage: pymol -cq save_formated_pdb.py [input_pdb_path] [save_path]
+mobile_name = sys.argv[3]
+save_path = sys.argv[4]
+mobile_name = os.path.abspath(mobile_name)
+mobile_state=os.path.split(mobile_name)[1][:-4]
+cmd.load(mobile_name,mobile_state)
+#save_path = mobile_name[:-4]+"_formated.pdb"
+cmd.save(save_path,mobile_state)
diff --git a/CryoREAD/predict/__init__.py b/CryoREAD/predict/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/CryoREAD/predict/predict_1st_stage.py b/CryoREAD/predict/predict_1st_stage.py
new file mode 100644
index 0000000..ef886bd
--- /dev/null
+++ b/CryoREAD/predict/predict_1st_stage.py
@@ -0,0 +1,82 @@
+
+
+from ops.os_operation import mkdir
+
+import os
+import mrcfile
+import numpy as np
+from data_processing.map_utils import save_predict_specific_map,permute_ns_coord_to_pdb,find_top_density
+from predict.unet_detect_map_cascad import unet_detect_map_cascad
+
+
+def Predict_1st_Stage(input_map_path,model_path,save_path,
+ voxel_size,stride,batch_size,contour,params):
+ #check if prediction exists, if it exists, then skip
+ cur_predict_path = os.path.join(save_path,"Input")
+ chain_predict_path = os.path.join(cur_predict_path,"chain_predictprob.npy")
+ if os.path.exists(chain_predict_path) and os.path.getsize(chain_predict_path)>=1000:
+ print("stage 1 prediction has been generated and saved in %s"%chain_predict_path)
+ return
+ with mrcfile.open(input_map_path, permissive=True) as map_mrc:
+ #normalize data
+ map_data = np.array(map_mrc.data)
+ # get the value serve as 1 in normalization
+ map_data[map_data < 0] = 0
+ print("map density range: %f %f"%(0,np.max(map_data)))
+ percentile_98 = find_top_density(map_data,0.98)
+
+ print("map hist log percentage 98: ",percentile_98)
+ map_data[map_data > percentile_98] = percentile_98
+ min_value = np.min(map_data)
+ max_value = np.max(map_data)
+ map_data = (map_data-min_value)/(max_value-min_value)
+ nxstart, nystart, nzstart = map_mrc.header.nxstart, \
+ map_mrc.header.nystart, \
+ map_mrc.header.nzstart
+ orig = map_mrc.header.origin
+ orig = str(orig)
+ orig = orig.replace("(", "")
+ orig = orig.replace(")", "")
+ orig = orig.split(",")
+ nstart = [nxstart, nystart, nzstart]
+ mapc = map_mrc.header.mapc
+ mapr = map_mrc.header.mapr
+ maps = map_mrc.header.maps
+ print("detected mode mapc %d, mapr %d, maps %d" % (mapc, mapr, maps))
+ nstart = permute_ns_coord_to_pdb(nstart, mapc, mapr, maps)
+ new_origin = []
+ for k in range(3):
+ new_origin.append(float(orig[k]) + float(nstart[k]))
+
+ print("Origin:", new_origin)
+ train_save_path = os.path.join(save_path,"Input")
+ mkdir(train_save_path)
+ #adjust the contour level by the maximum value
+ print("given contour %f"%contour)
+ contour = contour/percentile_98
+ print("revised contour %f"%contour)
+ detection_chain,detection_base = unet_detect_map_cascad(map_data,model_path,
+ voxel_size,stride,batch_size,
+ train_save_path,contour,params)
+ chain_label_list = ["sugar", "phosphate","base","protein",]
+ for k,chain_name in enumerate(chain_label_list):
+ cur_map_path = os.path.join(save_path, "chain_" + str(chain_name) + "_prob.mrc")
+ save_predict_specific_map(cur_map_path, k , detection_chain, input_map_path,label_only=False)
+
+ base_label_list = ["A", "UT","C","G",]
+ for k,base_name in enumerate(base_label_list):
+ cur_map_path = os.path.join(save_path, "base_" + str(base_name) + "_prob.mrc")
+ save_predict_specific_map(cur_map_path, k , detection_base, input_map_path,label_only=False)
+
+    #save extended base detection results
+ base_region_detection = detection_chain[2]>0.5
+ base_detect_compare = np.argmax(detection_base,axis=0)
+ base_detect_compare[base_region_detection<=0]=-1
+
+ for k,base_name in enumerate(base_label_list):
+ #current_base= detection_base[k]>0.5
+ current_win_base = base_detect_compare==k
+ #combine_detection = current_base|current_win_base
+ cur_map_path = os.path.join(save_path, "base_" + str(base_name) + "_win.mrc")
+ save_predict_specific_map(cur_map_path, 1 ,current_win_base, input_map_path,label_only=True)
+
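+# The map normalization performed above, as a standalone sketch:
+#   p98 = find_top_density(map_data, 0.98)      # robust upper bound (98th percentile)
+#   map_data = np.clip(map_data, 0, p98) / p98  # zero negatives, clip, scale to [0, 1]
+#   contour = contour / p98                     # rescale the given contour to the same unit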
diff --git a/CryoREAD/predict/predict_2nd_stage.py b/CryoREAD/predict/predict_2nd_stage.py
new file mode 100644
index 0000000..ebddaa8
--- /dev/null
+++ b/CryoREAD/predict/predict_2nd_stage.py
@@ -0,0 +1,73 @@
+from ops.os_operation import mkdir
+import os
+import mrcfile
+import numpy as np
+from data_processing.map_utils import permute_ns_coord_to_pdb,save_predict_specific_map,find_top_density
+from predict.unet_detect_map_refine import unet_detect_map_refine
+def Predict_2nd_Stage(input_map_path,prob_dir,model_path,save_path,
+ voxel_size,stride,batch_size,contour,params):
+ #check if prediction exists, if it exists, then skip
+ cur_predict_path = os.path.join(save_path,"Input")
+ chain_predict_path = os.path.join(cur_predict_path,"chain_predictprob.npy")
+ if os.path.exists(chain_predict_path) and os.path.getsize(chain_predict_path)>=1000:
+ print("stage 2 prediction has been generated and saved in %s"%chain_predict_path)
+ return
+
+ #prob_dir: cascad 1st stage output directory
+ cur_predict_path = os.path.join(prob_dir,"Input")
+ chain_predict_path = os.path.join(cur_predict_path,"chain_predictprob.npy")
+ base_predict_path = os.path.join(cur_predict_path,"base_predictprob.npy")
+ base_predict_prob = np.load(base_predict_path)
+ chain_predict_prob = np.load(chain_predict_path)
+ #use map and contour information to filter background regions to save computation
+ with mrcfile.open(input_map_path, permissive=True) as map_mrc:
+ #normalize data
+ map_data = np.array(map_mrc.data)
+ # get the value serve as 1 in normalization
+ map_data[map_data < 0] = 0
+ print("map density range: %f %f"%(0,np.max(map_data)))
+ percentile_98 = find_top_density(map_data,0.98)
+
+ print("map hist log percentage 98: ",percentile_98)
+ map_data[map_data > percentile_98] = percentile_98
+ min_value = np.min(map_data)
+ max_value = np.max(map_data)
+ map_data = (map_data-min_value)/(max_value-min_value)
+ nxstart, nystart, nzstart = map_mrc.header.nxstart, \
+ map_mrc.header.nystart, \
+ map_mrc.header.nzstart
+ orig = map_mrc.header.origin
+ orig = str(orig)
+ orig = orig.replace("(", "")
+ orig = orig.replace(")", "")
+ orig = orig.split(",")
+ nstart = [nxstart, nystart, nzstart]
+ mapc = map_mrc.header.mapc
+ mapr = map_mrc.header.mapr
+ maps = map_mrc.header.maps
+ print("detected mode mapc %d, mapr %d, maps %d" % (mapc, mapr, maps))
+ nstart = permute_ns_coord_to_pdb(nstart, mapc, mapr, maps)
+ new_origin = []
+ for k in range(3):
+ new_origin.append(float(orig[k]) + float(nstart[k]))
+
+ print("Origin:", new_origin)
+ train_save_path = os.path.join(save_path,"Input")
+ mkdir(train_save_path)
+ #adjust the contour level by the maximum value
+ print("given contour %f"%contour)
+ contour = contour/percentile_98
+ print("revised contour %f"%contour)
+ detection_all = unet_detect_map_refine(map_data,chain_predict_prob,base_predict_prob, model_path,
+ voxel_size,stride,batch_size,
+ train_save_path,contour,params)
+
+ chain_label_list = ["sugar", "phosphate","A","UT","C","G","protein","base"]
+ for k,chain_name in enumerate(chain_label_list):
+ cur_map_path = os.path.join(save_path, "chain_" + str(chain_name) + "_prob.mrc")
+ save_predict_specific_map(cur_map_path, k , detection_all, input_map_path,label_only=False)
+
+
+
+
+
diff --git a/CryoREAD/predict/unet_detect_map_cascad.py b/CryoREAD/predict/unet_detect_map_cascad.py
new file mode 100644
index 0000000..1e4f28b
--- /dev/null
+++ b/CryoREAD/predict/unet_detect_map_cascad.py
@@ -0,0 +1,213 @@
+
+import numpy as np
+import os
+import datetime
+import time
+import torch
+import torch.nn as nn
+from ops.Logger import AverageMeter,ProgressMeter
+from data_processing.DRNA_dataset import Single_Dataset
+from model.Cascade_Unet import Cascade_Unet
+
+
+def gen_input_data(map_data,voxel_size,stride,contour,train_save_path):
+ scan_x, scan_y, scan_z = map_data.shape
+ count_voxel = 0
+ count_iter=0
+ Coord_Voxel = []
+ from progress.bar import Bar
+ bar = Bar('Preparing Input: ', max=int(np.ceil(scan_x/stride)*np.ceil(scan_y/stride)*np.ceil(scan_z/stride)))
+
+
+ for x in range(0, scan_x, stride):
+ x_end = min(x + voxel_size, scan_x)
+ for y in range(0, scan_y, stride):
+ y_end = min(y + voxel_size, scan_y)
+ for z in range(0, scan_z, stride):
+ count_iter+=1
+ bar.next()
+ #print("1st stage: %.4f percent scanning finished"%(count_iter*100/(scan_x*scan_y*scan_z/(stride**3))),"location %d %d %d"%(x,y,z))
+ z_end = min(z + voxel_size, scan_z)
+ if x_end < scan_x:
+ x_start = x
+ else:
+ x_start = x_end - voxel_size
+
+ if x_start<0:
+ x_start=0
+ if y_end < scan_y:
+ y_start = y
+ else:
+ y_start = y_end - voxel_size
+
+ if y_start<0:
+ y_start=0
+ if z_end < scan_z:
+ z_start = z
+ else:
+ z_start = z_end - voxel_size
+
+ if z_start<0:
+ z_start=0
+ #already normalized
+ segment_map_voxel = np.zeros([voxel_size,voxel_size,voxel_size])
+ segment_map_voxel[:x_end-x_start,:y_end-y_start,:z_end-z_start]=map_data[x_start:x_end, y_start:y_end, z_start:z_end]
+ if contour<=0:
+ meaningful_density_count = len(np.argwhere(segment_map_voxel>0))
+ meaningful_density_ratio = meaningful_density_count/float(voxel_size**3)
+ if meaningful_density_ratio<=0.001:
+ #print("no meaningful density ratio %f in current scanned box, skip it!"%meaningful_density_ratio)
+ continue
+ else:
+ meaningful_density_count = len(np.argwhere(segment_map_voxel > contour))
+ meaningful_density_ratio = meaningful_density_count / float(voxel_size ** 3)
+ if meaningful_density_ratio <= 0.001:
+ # print("no meaningful density ratio in current scanned box, skip it!")
+ continue
+ cur_path = os.path.join(train_save_path,"input_"+str(count_voxel)+".npy")
+ np.save(cur_path,segment_map_voxel)
+ Coord_Voxel.append([x_start,y_start,z_start])
+ count_voxel+=1
+ bar.finish()
+ Coord_Voxel = np.array(Coord_Voxel)
+ coord_path = os.path.join(train_save_path,"Coord.npy")
+ np.save(coord_path,Coord_Voxel)
+ print("In total we prepared %d boxes as input"%(len(Coord_Voxel)))
+ return Coord_Voxel
+
+
+def make_predictions(test_loader,model,Coord_Voxel,voxel_size,overall_shape,num_classes,base_classes):
+ avg_meters = {'data_time': AverageMeter('data_time'),
+ 'train_time': AverageMeter('train_time')}
+ progress = ProgressMeter(
+ len(test_loader),
+ [avg_meters['data_time'],
+ avg_meters['train_time'],
+ ],
+ prefix="#Eval:")
+ model.eval()
+ end_time = time.time()
+ scan_x, scan_y, scan_z = overall_shape
+ Prediction_Matrix = np.zeros([num_classes,overall_shape[0],overall_shape[1],overall_shape[2]])
+ Base_Matrix = np.zeros([base_classes,overall_shape[0],overall_shape[1],overall_shape[2]])
+ Count_Matrix = np.zeros(overall_shape)
+ #average for overlap regions
+ with torch.no_grad():
+ for batch_idx, data in enumerate(test_loader):
+ # input, atom_target, nuc_target = data
+ input, cur_index = data
+ #print(input.shape)
+ avg_meters['data_time'].update(time.time() - end_time, input.size(0))
+ cur_id = cur_index.detach().cpu().numpy()#test_loader.dataset.id_list[cur_index.detach().numpy()]
+ input = input.cuda()
+ chain_outputs, base_outputs = model(input)
+ final_output = torch.sigmoid(chain_outputs[0]).detach().cpu().numpy()
+ final_base = torch.sigmoid(base_outputs[0]).detach().cpu().numpy()
+
+ avg_meters['train_time'].update(time.time() - end_time, input.size(0))
+ progress.display(batch_idx)
+ for k in range(len(cur_id)):
+ tmp_index = cur_id[k]
+ x_start, y_start, z_start = Coord_Voxel[int(tmp_index)]
+ x_end,y_end,z_end = x_start+voxel_size,y_start+voxel_size,z_start+voxel_size
+ if x_end < scan_x:
+ x_start = x_start
+ else:
+ x_end = scan_x
+ x_start = x_end - voxel_size
+ if x_start<0:
+ x_start=0
+ if y_end < scan_y:
+ y_start = y_start
+ else:
+ y_end = scan_y
+ y_start = y_end - voxel_size
+ if y_start<0:
+ y_start=0
+ if z_end < scan_z:
+ z_start = z_start
+ else:
+ z_end=scan_z
+ z_start = z_end - voxel_size
+ if z_start<0:
+ z_start=0
+ #print(final_output[k].shape)
+ #print(Prediction_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end].shape)
+
+ Prediction_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end] += final_output[k][:,:x_end-x_start,:y_end-y_start,:z_end-z_start]
+ Base_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end] += final_base[k][:,:x_end-x_start,:y_end-y_start,:z_end-z_start]
+ Count_Matrix[x_start:x_end,y_start:y_end,z_start:z_end]+=1
+ if batch_idx%1000==0:
+ for j in range(num_classes):
+ count_positive = len(np.argwhere(Prediction_Matrix[j]>=0.5))
+ print("%d classes already detected %d voxels"%(j,count_positive))
+ end_time = time.time()
+ Prediction_Matrix = Prediction_Matrix/Count_Matrix
+ #replace nan with 0
+ Prediction_Matrix[np.isnan(Prediction_Matrix)] = 0
+ Prediction_Label = np.argmax(Prediction_Matrix,axis=0)
+
+ Base_Matrix = Base_Matrix/Count_Matrix
+ Base_Matrix[np.isnan(Base_Matrix)] = 0
+ #New_Base_Matrix = np.zeros([base_classes-1,overall_shape[0],overall_shape[1],overall_shape[2]])
+ New_Base_Matrix = Base_Matrix[:-1]
+ New_Base_Matrix[1] = np.maximum(Base_Matrix[1],Base_Matrix[-1])#merge u and t predictions
+
+ Base_Label = np.argmax(New_Base_Matrix,axis=0)
+ return Prediction_Matrix,Prediction_Label,New_Base_Matrix,Base_Label
+
+
+
+def unet_detect_map_cascad(map_data,resume_model_path,voxel_size,
+ stride,batch_size,train_save_path,contour,params):
+
+ coord_path = os.path.join(train_save_path, "Coord.npy")
+ if os.path.exists(coord_path):
+ Coord_Voxel = np.load(coord_path)
+ else:
+ Coord_Voxel = gen_input_data(map_data,voxel_size, stride, contour,train_save_path)
+ overall_shape = map_data.shape
+ test_dataset = Single_Dataset(train_save_path,"input_")
+ test_loader = torch.utils.data.DataLoader(
+ test_dataset,
+ batch_size=batch_size,
+ shuffle=False,
+ pin_memory=True,
+ num_workers=params['num_workers'],
+ drop_last=False)
+ chain_class = 4
+ base_class = 5
+    model = Cascade_Unet(in_channels=1,#density channel only in the 1st stage
+ n_classes1=chain_class,
+ n_classes2= base_class,
+ feature_scale=4,
+ is_deconv=True,
+ is_batchnorm=True)
+
+ model = model.cuda()
+ model = nn.DataParallel(model, device_ids=None)
+ state_dict = torch.load(resume_model_path)
+ msg = model.load_state_dict(state_dict['state_dict'])
+ print("model loading: ",msg)
+ cur_prob_path = os.path.join(train_save_path, "chain_predictprob.npy")
+ cur_label_path = os.path.join(train_save_path, "chain_predict.npy")
+ cur_baseprob_path = os.path.join(train_save_path, "base_predictprob.npy")
+ cur_baselabel_path = os.path.join(train_save_path, "base_predict.npy")
+ if os.path.exists(cur_prob_path) and os.path.exists(cur_label_path):
+ Prediction_Matrix =np.load(cur_prob_path)
+ Prediction_Label = np.load(cur_label_path)
+ Base_Matrix = np.load(cur_baseprob_path)
+ Base_Label = np.load(cur_baselabel_path)
+ else:
+ Prediction_Matrix,Prediction_Label,Base_Matrix,Base_Label = make_predictions(test_loader,model,Coord_Voxel,
+ voxel_size,overall_shape,
+ chain_class,base_class)
+
+ np.save(cur_prob_path,Prediction_Matrix)
+ np.save(cur_label_path, Prediction_Label)
+ np.save(cur_baseprob_path,Base_Matrix)
+ np.save(cur_baselabel_path,Base_Label)
+ #save disk space for the generated input boxes
+ os.system("rm "+train_save_path+"/input*")
+ return Prediction_Matrix,Base_Matrix
+
diff --git a/CryoREAD/predict/unet_detect_map_refine.py b/CryoREAD/predict/unet_detect_map_refine.py
new file mode 100644
index 0000000..c1881e8
--- /dev/null
+++ b/CryoREAD/predict/unet_detect_map_refine.py
@@ -0,0 +1,216 @@
+
+import numpy as np
+import os
+import datetime
+import time
+import torch
+import torch.nn as nn
+from ops.Logger import AverageMeter,ProgressMeter
+from data_processing.DRNA_dataset import Single_Dataset2
+from model.Small_Unet_3Plus_DeepSup import Small_UNet_3Plus_DeepSup
+def gen_input_data(map_data,chain_prob,base_prob,voxel_size,stride,contour,train_save_path):
+ from progress.bar import Bar
+ scan_x, scan_y, scan_z = map_data.shape
+ chain_classes =len(chain_prob)
+ base_classes = len(base_prob)
+ count_voxel = 0
+ count_iter=0
+ Coord_Voxel = []
+ bar = Bar('Preparing Input: ', max=int(np.ceil(scan_x/stride)*np.ceil(scan_y/stride)*np.ceil(scan_z/stride)))
+
+ for x in range(0, scan_x, stride):
+ x_end = min(x + voxel_size, scan_x)
+ for y in range(0, scan_y, stride):
+ y_end = min(y + voxel_size, scan_y)
+ for z in range(0, scan_z, stride):
+ bar.next()
+ count_iter+=1
+ z_end = min(z + voxel_size, scan_z)
+ if x_end < scan_x:
+ x_start = x
+ else:
+ x_start = x_end - voxel_size
+ if x_start<0:
+ x_start=0
+ if y_end < scan_y:
+ y_start = y
+ else:
+ y_start = y_end - voxel_size
+ if y_start<0:
+ y_start=0
+ if z_end < scan_z:
+ z_start = z
+ else:
+ z_start = z_end - voxel_size
+ if z_start<0:
+ z_start=0
+ #already normalized
+ segment_map_voxel = map_data[x_start:x_end, y_start:y_end, z_start:z_end]
+ if contour<=0:
+ meaningful_density_count = len(np.argwhere(segment_map_voxel>0))
+ meaningful_density_ratio = meaningful_density_count/float(voxel_size**3)
+ if meaningful_density_ratio<=0.001:
+ # print("meaningful density ratio %f of current box, skip it!"%meaningful_density_ratio)
+ continue
+ else:
+ meaningful_density_count = len(np.argwhere(segment_map_voxel > contour))
+ meaningful_density_ratio = meaningful_density_count / float(voxel_size ** 3)
+ if meaningful_density_ratio <= 0.001:
+ # print("no meaningful density ratio of current box, skip it!")
+ continue
+ segment_input_voxel = np.zeros([chain_classes+base_classes, voxel_size,voxel_size,voxel_size])
+ segment_input_voxel[:chain_classes,:x_end-x_start,:y_end-y_start,:z_end-z_start]=chain_prob[:,x_start:x_end, y_start:y_end, z_start:z_end]
+ segment_input_voxel[chain_classes:,:x_end-x_start,:y_end-y_start,:z_end-z_start]=base_prob[:,x_start:x_end, y_start:y_end, z_start:z_end]
+ #check values in segment input_voxel
+ #different classes >=0.5 number should be bigger than 0.001
+ count_meaningful = (segment_input_voxel>0.5).sum()
+ meaningful_density_ratio = count_meaningful/ float(voxel_size ** 3)
+ if meaningful_density_ratio <= 0.001:
+ #print("no meaningful predictions of current box in 1st stage, skip it!")
+ continue
+
+
+ cur_path = os.path.join(train_save_path,"input_"+str(count_voxel)+".npy")
+ np.save(cur_path,segment_input_voxel)
+ Coord_Voxel.append([x_start,y_start,z_start])
+ count_voxel+=1
+ #print("2nd stage: %.2f percent scanning finished"%(count_iter/(scan_x*scan_y*scan_z/(stride**3))))
+ bar.finish()
+ Coord_Voxel = np.array(Coord_Voxel)
+ coord_path = os.path.join(train_save_path,"Coord.npy")
+ np.save(coord_path,Coord_Voxel)
+ print("in 2nd stage, in total we have %d boxes"%len(Coord_Voxel))
+ return Coord_Voxel
+
+import gc
+def make_predictions(test_loader,model,Coord_Voxel,voxel_size,overall_shape,num_classes,run_type=0):
+ avg_meters = {'data_time': AverageMeter('data_time'),
+ 'train_time': AverageMeter('train_time')}
+ progress = ProgressMeter(
+ len(test_loader),
+ [avg_meters['data_time'],
+ avg_meters['train_time'],
+ ],
+ prefix="#Eval:")
+ model.eval()
+ end_time = time.time()
+ scan_x, scan_y, scan_z = overall_shape
+ Prediction_Matrix = np.zeros([num_classes,overall_shape[0],overall_shape[1],overall_shape[2]])
+ #Count_Matrix = np.zeros(overall_shape)
+ with torch.no_grad():
+ for batch_idx, data in enumerate(test_loader):
+ # input, atom_target, nuc_target = data
+ input, cur_index = data
+ #print(input.shape)
+ avg_meters['data_time'].update(time.time() - end_time, input.size(0))
+ cur_id = cur_index.detach().cpu().numpy()#test_loader.dataset.id_list[cur_index.detach().numpy()]
+ input = input.cuda()
+ outputs = model(input)
+ if run_type==2:
+ final_output = torch.softmax(torch.sigmoid(outputs[0]),dim=1).detach().cpu().numpy()
+ elif run_type==1:
+ final_output = torch.sigmoid(outputs[0]).detach().cpu().numpy()
+ else:
+ final_output = torch.softmax(outputs[0],dim=1).detach().cpu().numpy()
+ avg_meters['train_time'].update(time.time() - end_time, input.size(0))
+ progress.display(batch_idx)
+ for k in range(len(cur_id)):
+ tmp_index = cur_id[k]
+ x_start, y_start, z_start = Coord_Voxel[int(tmp_index)]
+ x_end,y_end,z_end = x_start+voxel_size,y_start+voxel_size,z_start+voxel_size
+                if x_end >= scan_x:
+                    x_end = scan_x
+                    x_start = max(x_end - voxel_size, 0)
+                if y_end >= scan_y:
+                    y_end = scan_y
+                    y_start = max(y_end - voxel_size, 0)
+                if z_end >= scan_z:
+                    z_end = scan_z
+                    z_start = max(z_end - voxel_size, 0)
+ #print(final_output[k].shape)
+ #print(Prediction_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end].shape)
+ #pred_label=np.argmax(final_output[k],axis=1)
+ #count_positive= len(np.argwhere(pred_label!=0))
+ #print("%d example with %d positive predictions"%(k,count_positive))
+ Prediction_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end] =np.maximum(Prediction_Matrix[:,x_start:x_end,y_start:y_end,z_start:z_end],final_output[k][:,:x_end-x_start,:y_end-y_start,:z_end-z_start])
+
+ #Count_Matrix[x_start:x_end,y_start:y_end,z_start:z_end]+=1
+ if batch_idx%1000==0:
+ for j in range(num_classes):
+ count_positive = len(np.argwhere(Prediction_Matrix[j]>=0.5))
+ print("%d classes already detected %d voxels"%(j,count_positive))
+ end_time = time.time()
+ del final_output
+ #del pred_label
+ del outputs
+ del input
+ gc.collect()
+ #Prediction_Matrix = Prediction_Matrix/Count_Matrix
+ #replace nan with 0
+ Prediction_Matrix[np.isnan(Prediction_Matrix)] = 0
+ Prediction_Label = np.argmax(Prediction_Matrix,axis=0)
+
+
+ return Prediction_Matrix,Prediction_Label
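+
+def _overlap_merge_sketch():
+    # Illustrative sketch only, not called by the pipeline: overlapping box
+    # predictions are merged voxel-wise with np.maximum in make_predictions,
+    # so each voxel keeps its most confident prediction across all boxes
+    # that cover it.
+    canvas = np.zeros([1, 8])
+    box_a = np.full([1, 6], 0.4)  # covers voxels 0..5
+    box_b = np.full([1, 6], 0.7)  # covers voxels 2..7
+    canvas[:, 0:6] = np.maximum(canvas[:, 0:6], box_a)
+    canvas[:, 2:8] = np.maximum(canvas[:, 2:8], box_b)
+    return canvas  # [[0.4, 0.4, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]]
+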
+def unet_detect_map_refine(map_data,chain_prob,base_prob,resume_model_path,voxel_size,
+ stride,batch_size,train_save_path,contour,params):
+ coord_path = os.path.join(train_save_path, "Coord.npy")
+ if os.path.exists(coord_path):
+ Coord_Voxel = np.load(coord_path)
+ else:
+ Coord_Voxel = gen_input_data(map_data,chain_prob,base_prob,voxel_size, stride, contour,train_save_path)
+ overall_shape = map_data.shape
+ test_dataset = Single_Dataset2(train_save_path,"input_")
+ test_loader = torch.utils.data.DataLoader(
+ test_dataset,
+ batch_size=batch_size,
+ pin_memory=True,
+ shuffle=False,
+ num_workers=params['num_workers'],
+ drop_last=False)
+ chain_class = len(chain_prob)
+ base_class = len(base_prob)
+ output_classes = chain_class+base_class
+ model = Small_UNet_3Plus_DeepSup(in_channels=chain_class+base_class,
+ n_classes=output_classes,
+ feature_scale=4,
+ is_deconv=True,
+ is_batchnorm=True)
+ model = model.cuda()
+ model = nn.DataParallel(model, device_ids=None)
+ state_dict = torch.load(resume_model_path)
+ msg = model.load_state_dict(state_dict['state_dict'])
+ print("model loading: ",msg)
+ cur_prob_path = os.path.join(train_save_path, "chain_predictprob.npy")
+ cur_label_path = os.path.join(train_save_path, "chain_predict.npy")
+ if os.path.exists(cur_prob_path) and os.path.exists(cur_label_path):
+ Prediction_Matrix =np.load(cur_prob_path)
+ Prediction_Label = np.load(cur_label_path)
+
+ else:
+ Prediction_Matrix,Prediction_Label = make_predictions(test_loader,model,Coord_Voxel,
+ voxel_size,overall_shape,
+                                                              output_classes,run_type=1)  # output must be sigmoid-activated
+
+ np.save(cur_prob_path,Prediction_Matrix)
+ np.save(cur_label_path, Prediction_Label)
+ return Prediction_Matrix
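+
+# A minimal usage sketch (the paths and params below are hypothetical, not
+# part of the pipeline):
+#   params = {'num_workers': 4}
+#   prob = unet_detect_map_refine(map_data, chain_prob, base_prob,
+#                                 "refine_model.pth.tar", voxel_size=64,
+#                                 stride=32, batch_size=4,
+#                                 train_save_path="2nd_stage_output",
+#                                 contour=0, params=params)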
+
+
+
+
+
diff --git a/CryoREAD/requirements.txt b/CryoREAD/requirements.txt
new file mode 100644
index 0000000..f82bd5e
--- /dev/null
+++ b/CryoREAD/requirements.txt
@@ -0,0 +1,10 @@
+biopython
+numpy
+numba
+scipy
+ortools==9.4.1874
+mrcfile
+torch==1.6.0
+progress
+numba-progress
+tqdm
diff --git a/CryoREAD/structure/Edge.py b/CryoREAD/structure/Edge.py
new file mode 100644
index 0000000..1077c85
--- /dev/null
+++ b/CryoREAD/structure/Edge.py
@@ -0,0 +1,21 @@
+class Edge(object):
+ def __init__(self):
+ self.d=0
+ self.dens=0#Density
+ self.id1=0#Node id
+ self.id2=0
+ self.eid=0#edge id
+        self.mst_label=False  # whether the edge is part of the MST
+        self.local_label=False  # whether the edge lies in a local region:
+        # if some node is within an acceptable range of both of this edge's nodes, the edge is marked local_label=True
+        self.keep_label=False  # edges guaranteed to stay on the tree are marked keep_label=True
+ def copy_edge(self,in_edge):
+ self.d=in_edge.d
+ self.dens=in_edge.dens
+ self.id1=in_edge.id1
+ self.id2=in_edge.id2
+ self.eid=in_edge.eid
+ self.mst_label=in_edge.mst_label
+ self.local_label=in_edge.local_label
+ self.keep_label=in_edge.keep_label
+
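+# Example (illustrative only): copy_edge duplicates every field, so an edge
+# can be snapshotted before its MST labels are mutated:
+#   e1 = Edge(); e1.d = 1.5; e1.id1, e1.id2 = 3, 7
+#   e2 = Edge(); e2.copy_edge(e1)  # e2 now mirrors e1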
diff --git a/CryoREAD/structure/MRC.py b/CryoREAD/structure/MRC.py
new file mode 100644
index 0000000..638b21e
--- /dev/null
+++ b/CryoREAD/structure/MRC.py
@@ -0,0 +1,536 @@
+
+import mrcfile
+import numpy as np
+import math
+from ops.acc_mean_shift import carry_shift,carry_shift_limit
+class MRC(object):
+ def __init__(self,file_name,gaussian_bandwidth,contour=0):
+ """
+ :param file_name: #File name of the mrc file
+ :param params: parameter config
+ """
+ self.file_name=file_name
+ #self.params=params
+ self.gaussian_bandwidth=gaussian_bandwidth
+ self.read_mrc(self.file_name,contour)
+ self.print_info()
+
+ def read_mrc(self, file_name=None,contour=0):
+ if file_name is not None:
+ self.filename = file_name
+ with mrcfile.open(self.file_name, permissive=True) as mrc:
+ # First read the header of mrc
+ self.nx, self.ny, self.nz = mrc.header.nx, mrc.header.ny, mrc.header.nz # number of columns/rows/sections in 3 dimensions
+ self.mode = mrc.header.mode # The recording mode in the mrc file
+ """
+ array (slow axis)
+ location:13-16 MODE
+ 0 8-bit signed integer (range -128 to 127)
+ 1 16-bit signed integer
+ 2 32-bit signed real
+ 3 transform : complex 16-bit integers
+ 4 transform : complex 32-bit reals
+ 6 16-bit unsigned integer
+ """
+ self.nxstart = mrc.header.nxstart # location of first column in unit cell
+ self.nystart = mrc.header.nystart # location of first row in unit cell
+ self.nzstart = mrc.header.nzstart # location of first section in unit cell
+ self.mx, self.my, self.mz = mrc.header.mx, mrc.header.my, mrc.header.mz # sampling along X,Y,Z axis of unit cell
+ self.xlen, self.ylen, self.zlen = mrc.header.cella.x, mrc.header.cella.y, mrc.header.cella.z # cell dimensions in angstroms
+ self.alpha, self.beta, self.gamma = mrc.header.cellb.alpha, mrc.header.cellb.beta, mrc.header.cellb.gamma # Cell angles in degrees
+ self.mapc, self.mapr, self.maps = mrc.header.mapc, mrc.header.mapr, mrc.header.maps # axis corresponds to column/row/section, 1 represents X axis,2 is Y, 3 is Z
+            self.dmin, self.dmax, self.dmean = mrc.header.dmin, mrc.header.dmax, mrc.header.dmean # minimum/maximum/mean density value
+ self.ispg = mrc.header.ispg # space group number
+ # explanation for ispg:Spacegroup 0 implies a 2D image or image stack. For crystallography, ISPG represents the actual spacegroup. For single volumes from EM/ET, the spacegroup should be 1. For volume stacks, we adopt the convention that ISPG is the spacegroup number + 400, which in EM/ET will typically be 401.
+ self.nsymbt = mrc.header.nsymbt # NSYMBT specifies the size of the extended header in bytes, whether it contains symmetry records (as in the original format definition) or any other kind of additional metadata.
+ self.originx, self.originy, self.originz = mrc.header.origin.x, mrc.header.origin.y, mrc.header.origin.z
+ """
+ For transforms (Mode 3 or 4), ORIGIN is the phase origin of the transformed image in pixels, e.g. as used in helical processing of the MRC package. For a transform of a padded image, this value corresponds to the pixel position in the padded image of the center of the unpadded image.
+ For other modes, ORIGIN specifies the real space location of a subvolume taken from a larger volume. In the (2-dimensional) example shown above, the header of the map containing the subvolume (red rectangle) would contain ORIGIN = 100, 120 to specify its position with respect to the original volume (assuming the original volume has its own ORIGIN set to 0, 0).
+ """
+ self.dens = np.array(mrc.data) # Dense data from the mrc file
+ # self.dens.flags.writeable=True
+ self.shape = self.dens.shape
+ self.NumVoxels = self.nx * self.ny * self.nz
+ self.widthx = self.xlen / self.mx
+ self.widthy = self.ylen / self.my
+ self.widthz = self.zlen / self.mz # length of each sampling on different axis
+ self.ordermode = 0
+ if self.mapc == 1 and self.mapr == 2 and self.maps == 3:
+ self.ordermode = 1
+ self.xdim = self.nx
+ self.ydim = self.ny
+ self.zdim = self.nz
+ if self.mapc == 1 and self.mapr == 3 and self.maps == 2:
+ self.ordermode = 2
+ self.xdim = self.nx
+ self.ydim = self.nz
+ self.zdim = self.ny
+ if self.mapc == 2 and self.mapr == 1 and self.maps == 3:
+ self.ordermode = 3
+ self.xdim = self.ny
+ self.ydim = self.nx
+ self.zdim = self.nz
+ if self.mapc == 2 and self.mapr == 3 and self.maps == 1:
+ self.ordermode = 4
+ self.xdim = self.nz
+ self.ydim = self.nx
+ self.zdim = self.ny
+ if self.mapc == 3 and self.mapr == 1 and self.maps == 2:
+ self.ordermode = 5
+ self.xdim = self.ny
+ self.ydim = self.nz
+ self.zdim = self.nx
+ if self.mapc == 3 and self.mapr == 2 and self.maps == 1:
+ self.ordermode = 6
+ self.xdim = self.nz
+ self.ydim = self.ny
+ self.zdim = self.nx
+        # remove negative density values
+        self.dens[self.dens < 0] = 0
+
+    def upsampling_chainprob(self, chain_prob, threshold):
+        # Replace the density with the predicted chain probability for voxels
+        # passing the threshold (the method name and signature are assumed
+        # from the surviving usage below; the original definition line was
+        # garbled).
+        self.dens.flags.writeable = True
+        cnt_act = 0
+        density = np.zeros(self.dens.shape)
+        for i in range(self.shape[0]):
+            for j in range(self.shape[1]):
+                for k in range(self.shape[2]):
+                    if chain_prob[i, j, k] >= threshold:
+ density[i, j, k] = chain_prob[i, j, k]
+ cnt_act += 1
+ self.dens = density
+ self.Nact = cnt_act
+
+ print("dens shape now")
+ print(self.shape)
+ print("x dim now %d,ydim now %d, zdim now %d" % (self.xdim, self.ydim, self.zdim))
+ print("mrc file mode %d" % self.ordermode)
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("in total useful percentage %.6f" % (self.Nact / total))
+ if self.ordermode == 1:
+ self.dens = self.dens.swapaxes(0, 2)
+
+ elif self.ordermode == 2:
+ self.dens = self.dens.swapaxes(0, 1)
+ self.dens = self.dens.swapaxes(0, 2)
+
+ elif self.ordermode == 3:
+ self.dens = self.dens.swapaxes(0, 1)
+ self.dens = self.dens.swapaxes(1, 2)
+ elif self.ordermode == 4:
+
+ self.dens = self.dens.swapaxes(0, 1)
+ elif self.ordermode == 5:
+ self.dens = self.dens.swapaxes(1, 2)
+        elif self.ordermode == 6:
+            print('order mode 6: data already in x,y,z order, no swap needed')
+        else:
+            print('invalid ordermode %d in mrc file' % self.ordermode)
+            exit()
+ print("dens shape now")
+ print(self.dens.shape)
+ self.shape = self.dens.shape
+ print("x dim now %d" % self.xdim)
+ print("after normalizing dens min %.4f max %.4f" % (np.min(self.dens), np.max(self.dens)))
+ return True
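+
+    # Axis-order sketch (illustrative): mrcfile arrays are indexed
+    # [sections, rows, columns]; for ordermode 1 (mapc=1, mapr=2, maps=3)
+    # that is [z, y, x], and a single swapaxes(0, 2) yields [x, y, z]:
+    #   dens = np.zeros((5, 4, 3))  # nz=5, ny=4, nx=3
+    #   dens.swapaxes(0, 2).shape   # -> (3, 4, 5)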
+
+ def upsampling_chain_prob(self,chain_prob,threshold=0.01):
+
+ main_chain_prob = chain_prob[1]
+ main_chain_prob[main_chain_prob<=threshold]=0#only keep our main chain prediction
+ main_chain_prob[self.dens<=0]=0
+ main_chain_prob[np.isnan(main_chain_prob)]=0
+ self.chain_dens = main_chain_prob
+ self.chain_Nact = len(np.argwhere(main_chain_prob!=0))
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("in total chain useful percentage %.6f" % (self.chain_Nact / total))
+ print("dens shape now")
+ print(self.dens.shape)
+ self.shape = self.dens.shape
+ print("after normalizing chain dens min %.4f max %.4f" % (np.min(self.chain_dens), np.max(self.chain_dens)))
+ return True
+ def upsampling_sugar_prob(self,main_chain_prob,threshold=0.01,filter_array=None):
+
+ #main_chain_prob = nuc_prob[1]
+ main_chain_prob[main_chain_prob<=threshold]=0#only keep our main chain prediction
+ main_chain_prob[self.dens<=0]=0
+ main_chain_prob[np.isnan(main_chain_prob)]=0
+ if filter_array is not None:
+ main_chain_prob[filter_array<=0]=0
+ self.sugar_dens = main_chain_prob
+ self.sugar_Nact = len(np.argwhere(main_chain_prob!=0))
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("useful points: ",self.sugar_Nact)
+ print("in total chain useful percentage %.6f" % (self.sugar_Nact / total))
+ print("dens shape now")
+ print(self.dens.shape)
+ self.shape = self.dens.shape
+ print("after normalizing sugar dens min %.4f max %.4f" % (np.min(self.sugar_dens), np.max(self.sugar_dens)))
+ return True
+
+ def upsampling_pho_prob(self,main_chain_prob,threshold=0.01,filter_array=None):
+
+ #main_chain_prob = nuc_prob[2]
+ main_chain_prob[main_chain_prob<=threshold]=0#only keep our main chain prediction
+ main_chain_prob[self.dens<=0]=0
+ main_chain_prob[np.isnan(main_chain_prob)]=0
+ if filter_array is not None:
+ main_chain_prob[filter_array<=0]=0
+ self.pho_dens = main_chain_prob
+ self.pho_Nact = len(np.argwhere(main_chain_prob!=0))
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("useful points: ",self.pho_Nact)
+ print("in total chain useful percentage %.6f" % (self.pho_Nact / total))
+ print("after normalizing pho dens min %.4f max %.4f" % (np.min(self.pho_dens), np.max(self.pho_dens)))
+ return True
+ def upsampling_specify_prob(self,specify_name,main_chain_prob,threshold=0.01,filter_array=None):
+
+ #main_chain_prob = nuc_prob[2]
+ main_chain_prob[main_chain_prob<=threshold]=0#only keep our main chain prediction
+ main_chain_prob[self.dens<=0]=0
+ main_chain_prob[np.isnan(main_chain_prob)]=0
+ if filter_array is not None:
+ main_chain_prob[filter_array<=0]=0
+ names = self.__dict__
+ names['%s_dens'%specify_name] = main_chain_prob
+ names["%s_Nact"%specify_name] = len(np.argwhere(main_chain_prob!=0))
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("useful points: ",names["%s_Nact"%specify_name])
+ print("in total chain useful percentage %.6f" % (names["%s_Nact"%specify_name] / total))
+ print("after normalizing pho dens min %.4f max %.4f" % (np.min(names['%s_dens'%specify_name] ), np.max(names['%s_dens'%specify_name] )))
+ return True
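+
+    # Example (illustrative): upsampling_specify_prob("base", prob) stores
+    # the thresholded probabilities as self.base_dens and their count as
+    # self.base_Nact through self.__dict__, generalizing the sugar/pho
+    # variants above.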
+
+ def upsampling_patom_prob(self,main_chain_prob,threshold=0.01,filter_array=None):
+
+
+ main_chain_prob[main_chain_prob<=threshold]=0#only keep our main chain prediction
+ main_chain_prob[self.dens<=0]=0
+ if filter_array is not None:
+ main_chain_prob[filter_array<=0]=0
+ self.patom_dens = main_chain_prob
+ self.patom_Nact = len(np.argwhere(main_chain_prob!=0))
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("useful points: ",self.patom_Nact)
+ print("in total chain useful percentage %.6f" % (self.patom_Nact / total))
+ print("after normalizing pho dens min %.4f max %.4f" % (np.min(self.patom_dens), np.max(self.patom_dens)))
+ return True
+
+ def upsampling_based_prob(self,prob_dict,threshold):
+ Name_list = ["atom", "nuc", "chain",'base']
+ self.dens.flags.writeable = True
+ cnt_act = 0
+ self.dens[self.dens<0]=0
+ #then normalize the original density
+ norm_density = self.dens/np.max(self.dens)
+
+ density = np.zeros(self.dens.shape)
+ chain_prob = prob_dict['chain']
+ atom_prob = prob_dict['atom']
+ nuc_prob = prob_dict['nuc']
+ base_prob = prob_dict['base']
+ for i in range(self.shape[0]):
+ for j in range(self.shape[1]):
+ for k in range(self.shape[2]):
+ if self.dens[i, j, k] < 0:
+ continue
+ #must ignore the side chain predictions
+ if chain_prob[0,i,j,k]>=threshold:
+ #density[i, j, k] = chain_prob[0,i,j,k]
+ density[i,j,k] = chain_prob[0,i,j,k]#np.sum(nuc_prob[:4,i,j,k]) #only keep the prob of bases
+ #norm_density[i,j,k]+np.sum(chain_prob[:2,i,j,k])\
+ #+np.sum(atom_prob[:4,i,j,k])+np.sum(nuc_prob[:6,i,j,k])
+ cnt_act += 1
+ # density[chain_prob[0]>=threshold] = norm_density + chain_prob[0]+chain_prob[1]+\
+ # atom_prob[0]+atom_prob[1]+atom_prob[2] +atom_prob[3]+\
+ # nuc_prob[0]+nuc_prob[1]+nuc_prob[2]+nuc_prob[3]+nuc_prob[4]+nuc_prob[5]
+ self.dens = density
+ self.Nact = cnt_act
+
+ print("dens shape now")
+ print(self.shape)
+ print("x dim now %d,ydim now %d, zdim now %d" % (self.xdim, self.ydim, self.zdim))
+ print("mrc file mode %d" % self.ordermode)
+ total = self.shape[0] * self.shape[1] * self.shape[2]
+ print("in total useful percentage %.6f" % (self.Nact / total))
+ if self.ordermode == 1:
+ self.dens = self.dens.swapaxes(0, 2)
+
+ elif self.ordermode == 2:
+ self.dens = self.dens.swapaxes(0, 1)
+ self.dens = self.dens.swapaxes(0, 2)
+
+ elif self.ordermode == 3:
+ self.dens = self.dens.swapaxes(0, 1)
+ self.dens = self.dens.swapaxes(1, 2)
+ elif self.ordermode == 4:
+
+ self.dens = self.dens.swapaxes(0, 1)
+ elif self.ordermode == 5:
+ self.dens = self.dens.swapaxes(1, 2)
+        elif self.ordermode == 6:
+            print('order mode 6: data already in x,y,z order, no swap needed')
+        else:
+            print('invalid ordermode %d in mrc file' % self.ordermode)
+            exit()
+ print("dens shape now")
+ print(self.dens.shape)
+ self.shape = self.dens.shape
+ print("x dim now %d" % self.xdim)
+ print("after normalizing dens min %.4f max %.4f" % (np.min(self.dens), np.max(self.dens)))
+ return True
+ def upsampling(self, map_t):
+ """
+ removing points that prob=self.params['R']:
+ print("skip edges with distance %f"%d)
+ continue
+ self.edge[Ne].d = d
+ self.edge[Ne].id1 = int(i)
+ self.edge[Ne].id2 = int(j)
+ Ne += 1
+        print('sorting edge for MST preparation: %d edges' % Ne)
+ self.Ne = Ne
+ #label = self.sort_edge() # sort edge in ascending order by the distance, prepare for later use
+ #if not label:
+ # print('sorting edge did not work, please update code')
+ # return None, None
+ all_edge_info = np.zeros([self.Ne, 3])
+ for i in range(self.Ne):
+ all_edge_info[i, 0] = self.edge[i].d
+ all_edge_info[i, 1] = self.edge[i].id1
+ all_edge_info[i, 2] = self.edge[i].id2
+ np.savetxt(edge_info_path, all_edge_info)
+ return all_edge_info
+
+ def sort_edge(self):
+        #collect edge distances into a list, then sort the edges by ascending distance
+ distance_list = []
+ for i in range(self.Ne):
+ tmp_distance = self.edge[i].d
+ distance_list.append(tmp_distance)
+ distance_list =np.array(distance_list)
+ sort_indexes = np.argsort(distance_list)
+ final_edge_list = []
+ for k in range(len(sort_indexes)):
+ cur_index = int(sort_indexes[k])
+ final_edge_list.append(self.edge[cur_index])
+ self.edge = final_edge_list
+
+ # for i in range(self.Ne):
+ # tmp_edge1 = self.edge[i]
+ # for j in range(i + 1, self.Ne):
+ # tmp_edge2 = self.edge[j]
+ # if tmp_edge1.d > tmp_edge2.d:
+ # tmp_edge = Edge()
+ # tmp_edge.copy_edge(tmp_edge1)
+ # tmp_edge1.copy_edge(tmp_edge2)
+ # tmp_edge2.copy_edge(tmp_edge)
+ # self.edge[j] = tmp_edge2
+ # self.edge[i] = tmp_edge1
+ # check the ascending order satisfied or not
+        for i in range(self.Ne - 1):
+            if self.edge[i + 1].d < self.edge[i].d:
+                print('sorting edge did not work: edges are not in ascending order')
+                return False
+        return True
+
+
+class Points(object):
+    # Reconstructed bridge: the original class header and the mean-shift
+    # merging code (built on carry_shift imported above) were garbled, so the
+    # class and method names here are assumed; only the tail of the merge
+    # step survives below.
+    def merge_point(self):
+        # merged_cd_dens stores x, y, z, density for every merged point;
+        # restore the 2-D layout when merging leaves only a single point.
+        if len(self.merged_cd_dens.shape) == 1:
+            if len(self.merged_cd_dens) > 0:
+ self.merged_cd_dens = self.merged_cd_dens[np.newaxis,:]#very rare case for consideration
+ self.Nmerge = len(self.merged_cd_dens)
+
+ print('merging finishing with %d left' % self.Nmerge)
+ self.normalize()
+
+ def normalize(self):
+ self.merged_cd_dens[:, 3] = (self.merged_cd_dens[:, 3] - np.min(self.merged_data[:, 4])) / (
+ np.max(self.merged_data[:, 4]) - np.min(self.merged_data[:, 4]))+0.01#for 0 happen, which will raise problems in edge density calculation
+ # self.min_dens=np.min(self.merged_data[:,4])
+ # self.max_dens=np.max(self.merged_data[:,4])
+ self.merged_data[:, 4] = (self.merged_data[:, 4] - np.min(self.merged_data[:, 4])) / (
+ np.max(self.merged_data[:, 4]) - np.min(self.merged_data[:, 4]))
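+
+    # Example (illustrative): densities [2, 4, 6] min-max scale to
+    # [0, 0.5, 1]; the merged copy gets an extra +0.01 -> [0.01, 0.51, 1.01]
+    # so no merged point carries exactly zero density, which would break the
+    # edge-density computation downstream.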
+
+ def clean_isolate_data(self,graph):
+ self.mask = np.zeros(self.Nori)
+ tmp_mask = np.zeros(self.Nori)
+ for i in range(graph.Ne):
+ if graph.edge[i].mst_label == False:
+ continue
+ v1 = graph.edge[i].id1
+ v2 = graph.edge[i].id2
+ tmp_mask[v1] = 1.00 # Member
+ tmp_mask[v2] = 1.00 # Member of the tree
+ for ii in range(self.Nori):
+ m1 = int(self.merged_data[ii, 0])
+ merged_id = int(self.merged_data[m1, 5]) # get the point id after merged
+ if merged_id == -1:
+ # print('not merged single point')
+ continue
+ self.mask[ii] = tmp_mask[merged_id] # mark this original point that it is in edge or not
+ print("We have %d/%d isolated points"%(len(np.argwhere(self.mask==0)),self.Nori))
+
+
+
+
+
diff --git a/CryoREAD/structure/Tree.py b/CryoREAD/structure/Tree.py
new file mode 100644
index 0000000..7c96b15
--- /dev/null
+++ b/CryoREAD/structure/Tree.py
@@ -0,0 +1,247 @@
+
+import numpy as np
+from structure.Node import Node
+import os
+class Tree(object):
+ def __init__(self,params):
+ self.params=params
+ self.len=0#tree length
+ self.bf_len=0
+ self.Nnode=0
+ self.node=[]#node list
+ self.Ntotal=0#number of nodes
+ self.Etotal=0#number of edges
+ self.Ne=0#Number of edges in tree
+ self.St=0#Start point
+ self.Ed=0
+ #self.ActE=[]#Active edge list:label True or False
+ #self.ActN=[]#Active node list:label True or False
+ self.Nstock=0
+ self.Lpath=0#path length
+ self.Nadd=0
+ self.Ncut=0
+ self.score=0
+ self.Nmv=0#Records the move list length
+
+ def Inittree(self,label):
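+        # Following the original convention: returns True when init cannot
+        # proceed (counts not yet set) and False on success.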
+ if label:
+ if self.Nnode==0 or self.Etotal==0 or self.Ne==0:
+ print("Not necessary init for the tree")
+ return True
+
+
+ self.AddTbl=np.zeros(self.Etotal)
+ self.CutTbl=np.zeros(self.Etotal)
+ self.stock=np.zeros(self.Nnode)
+
+ self.cid=np.zeros(self.Nnode)
+ self.cost=np.zeros(self.Nnode)
+ #self.node=[]
+ self.mv=[]
+ #for i in range(self.Ne**2):
+ # tmp_Move=Move()
+ # self.mv.append(tmp_Move)
+ self.Path=np.zeros(self.Ne)
+
+ for i in range(self.Nnode):
+ tmp_node=Node()
+ tmp_node.N=0
+ self.node.append(tmp_node)
+ self.ActN=np.zeros(self.Nnode)
+ self.nextv=np.zeros(self.Nnode)
+ self.cost=np.zeros(self.Nnode)
+ self.ActE=np.zeros(self.Etotal)
+ self.MaskN = np.zeros(self.Nnode)
+ self.UsedN = np.zeros(self.Nnode)
+ self.Path = np.zeros(self.Ne+1)
+ return False
+ def CopyTree(self,tree_in,label):
+ self.len=tree_in.len
+ self.Nnode=tree_in.Nnode
+ self.Ne=tree_in.Ne
+ self.Ntotal=tree_in.Ntotal
+ self.Etotal=tree_in.Etotal
+ self.St=tree_in.St
+ self.Ed=tree_in.Ed
+ self.Nadd=tree_in.Nadd
+ self.Ncut=tree_in.Ncut
+ self.score=tree_in.score
+ self.Lpath=tree_in.Lpath
+ self.Nmv=tree_in.Nmv
+ self.ActE=np.zeros(self.Etotal)
+ self.MaskN = np.zeros(tree_in.Nnode)
+ self.UsedN = np.zeros(tree_in.Nnode)
+ if label==True:
+ #self.node=[]
+ #for i in range(self.Nnode):
+ # tmp_node=Node()
+ # self.node.append(tmp_node)
+
+ self.AddTbl=np.zeros(self.Etotal)
+ self.CutTbl=np.zeros(self.Etotal)
+ self.stock=np.zeros(self.Nnode)
+
+ self.cid=np.zeros(self.Nnode)
+
+ self.node=[]
+ self.mv=[]
+ #for i in range(self.Etotal**2):
+ # tmp_Move=Move()
+ # self.mv.append(tmp_Move)
+ self.cost=np.zeros(self.Nnode)
+ self.ActN=np.zeros(self.Nnode)
+ self.Path=np.zeros(self.Ne)
+ self.nextv=np.zeros(self.Nnode)
+ for i in range(self.Lpath):
+ self.Path[i]=tree_in.Path[i]
+ for i in range(self.Etotal):
+ self.ActE[i]=tree_in.ActE[i]
+ for i in range(self.Nnode):
+ self.cost[i]=tree_in.cost[i]
+ self.ActN[i]=tree_in.ActN[i]
+ self.nextv[i]=tree_in.nextv[i]
+ def MoveTree(self,graph,cut_id,add_id):
+
+ #Update len
+ self.len=self.len - graph.edge[cut_id].d + graph.edge[add_id].d
+ #Update ActiveE
+ self.ActE[cut_id]=0
+ self.ActE[add_id]=1
+
+ def Setup_Connection(self,graph):
+ Nmin_ldp = 10
+ label = graph.sort_edge() # sort edge in ascending order by the distance
+ if not label:
+ print('1st sorting edge did not work, please update code')
+ return True
+ # Clean trees
+
+ MaxCid = 0
+ Nt = 0 # id for tree
+ tree = []
+ print('finding mst')
+ for i in range(graph.Ne):
+ v1 = graph.edge[i].id1
+ v2 = graph.edge[i].id2
+ if graph.cid[v1] == graph.cid[v2]:
+                #both endpoints are already in the same connected component
+ continue
+ tree.append(graph.edge[i])
+ tmp_cid = graph.cid[v2]
+ Nt += 1
+ if MaxCid < tmp_cid:
+ MaxCid = tmp_cid
+            # When an accepted edge merges two components, every node carrying
+            # the absorbed component id (cid) is relabeled with the surviving
+            # component id, so cid acts as a union-find label; keeping the
+            # cids consistent is what prevents loops in the minimum spanning
+            # tree.
+ for j in range(Nt):
+ #current edge is also in the checking loop
+ if graph.cid[tree[j].id1] == tmp_cid:
+                    graph.cid[tree[j].id1] = graph.cid[v1]  # relies on keeping id1 < id2
+                if graph.cid[tree[j].id2] == tmp_cid:
+                    graph.cid[tree[j].id2] = graph.cid[v1]
+        # Count the members of every component and pick one large enough to
+        # build the MST on (the counting lines were garbled; the selection
+        # criterion here is assumed from the surviving checks below).
+        Ncid = np.zeros(graph.Nnode)
+        UseCid = -1
+        for i in range(graph.Nnode):
+            Ncid[int(graph.cid[i])] += 1
+        for i in range(graph.Nnode):
+            if Ncid[i] >= Nmin_ldp and (UseCid == -1 or Ncid[i] > Ncid[UseCid]):
+                UseCid = i
+            if Ncid[i] > 0:
+                print("CID %d N= %d\n" % (i, Ncid[i]))
+        if UseCid == -1:  # no component is big enough to construct the MST
+            return True  # break the running process
+
+ Ntmp = 0
+ for i in range(graph.Ne):
+ if Ncid[int(graph.cid[graph.edge[i].id1])]>=Nmin_ldp:
+ graph.edge[Ntmp] = graph.edge[i]
+ Ntmp += 1
+ graph.Ne = Ntmp # tree edges
+ # Sort the edge
+ graph.edge = graph.edge[0:graph.Ne] # Only keep the edges in the graph
+
+        print('finished building a simple connected graph')
+ label = graph.sort_edge() # sort edge in ascending order by the distance
+ if not label:
+ print('2nd sorting edge did not work, please update code')
+ return True
+        print('finished sorting edges of the connected graph')
+ Nt = 0
+ for i in range(graph.Nnode):
+ graph.cid[i] = i # Reset the cid
+ for i in range(graph.Ne):
+ v1 = graph.edge[i].id1
+ v2 = graph.edge[i].id2
+ graph.edge[i].mst_label = False
+ graph.edge[i].local_label = False
+ graph.edge[i].eid = i
+
+ if graph.cid[v1] == graph.cid[v2]:
+ continue
+ tree[Nt] = graph.edge[i]
+ tmp_cid = graph.cid[v2]
+ graph.edge[i].mst_label = True#Used in MST
+ Nt += 1
+ if MaxCid < tmp_cid:
+ MaxCid = tmp_cid
+ for j in range(Nt):
+ if graph.cid[tree[j].id1] == tmp_cid:
+ graph.cid[tree[j].id1] = graph.cid[v1]
+ if graph.cid[tree[j].id2] == tmp_cid:
+ graph.cid[tree[j].id2] = graph.cid[v1]
+ graph.Nt = Nt
+ graph.tree = tree
+ print('cleaning tree finished')
+ print('after cleaning Nt=%d, Ne=%d' % (graph.Nt, graph.Ne))
+ return graph.cid
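+
+    # Connectivity sketch (illustrative): an edge is accepted only when its
+    # endpoints carry different component ids, and acceptance relabels the
+    # absorbed component, e.g. cid = [0, 1, 2] plus edge (0, 1) gives
+    # cid = [0, 0, 2]; this is Kruskal's algorithm with a linear union step.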
+ # Clean isolated dens data from mrc
+
+
+ def build_local_mst(self,graph,point, d2cut,save_path):
+ # local MST
+ print('local MST building')
+ edge_path = os.path.join(save_path, 'edge.txt')
+ if os.path.exists(edge_path):
+ tmp_edge_data=np.loadtxt(edge_path)
+ if len(tmp_edge_data) == graph.Ne:
+ for ii in range(graph.Ne):
+ graph.edge[ii].local_label = tmp_edge_data[ii, 6]
+ else:
+ graph.calcu_local_label(point, d2cut)
+ else:
+ graph.calcu_local_label(point, d2cut)
+ # End local MST
+ def setup_tree(self,graph):
+ self.Nnode = graph.Nnode
+ self.Ne = graph.Nt
+ self.Etotal = graph.Ne
+ print('after finishing building graph, we get Etotal=%d, Ne=%d' % (self.Etotal, self.Ne))
+ if self.Inittree(True):
+ return True
+ for i in range(graph.Ne):
+ if graph.edge[i].mst_label == False:
+ continue
+ id = graph.edge[i].id1
+ eid = graph.edge[i].eid
+ self.len += graph.edge[i].d
+ self.St = id
+ self.ActE[eid] = 1 # set edge's active label
+ self.Lpath += 1
+ print('MST len:%f' % self.len)
+ return False
+
+
+
+
+
diff --git a/CryoREAD/structure/__init__.py b/CryoREAD/structure/__init__.py
new file mode 100644
index 0000000..e69de29