Programmatic usage of NeoFox#

NeoFox provides an Application Programming Interface (API) that enables its integration into other applications. This API relies heavily on Protocol Buffers data models, which provide placeholder objects to store the required data while enabling different representations, data manipulation, validation and normalization. We use the Protocol Buffers data models to generate Python code automatically and to implement validation and normalization around them. Because Protocol Buffers is technology agnostic, this may also facilitate integration with third-party applications not necessarily implemented in Python (see https://developers.google.com/protocol-buffers). The API is tightly integrated with the Python data analysis library Pandas (see https://pandas.pydata.org/).

Here we show:

  • how to create new model objects

  • how to import/export these objects into different representations

  • how to manipulate them

  • how to validate and normalize the data

Finally, we show how to run NeoFox programmatically; you may want to skip to this part for a quick grasp of the API usage.

[1]:
import neofox
neofox.VERSION
[1]:
'0.6.1'

Neoantigens#

The neoantigen is the central piece of information that NeoFox handles; all output annotations refer to a neoantigen. A neoantigen is formed by two subentities, transcript and mutation, plus some additional attributes. Here we show how to create a neoantigen, transform it into different representations and validate it.

Create a neoantigen#

Create a neoantigen candidate:

[5]:
from neofox.model.factories import NeoantigenFactory

# create a neoantigen candidate using the factory
neoantigen = NeoantigenFactory.build_neoantigen(
    mutated_xmer="DEVLGEPSQDILVTDQTRLEATISPET",
    wild_type_xmer="DEVLGEPSQDILVIDQTRLEATISPET",
    patient_identifier="P123",
    gene="VCAN",
    rna_expression=0.519506894,
    rna_variant_allele_frequency=0.857142857,
    dna_variant_allele_frequency=0.294573643,
    my_custom_annotation="add any custom annotation as additional fields with any name"
)

Representation into different formats#

Data that conforms to the NeoFox data models can be represented in different formats. Here we show how to transform the data between several formats: JSON, Python dictionaries, Protocol Buffers binary representations, Pandas dataframes and tabular representations in files. This enables data import and export and adds flexibility to the integration with other tools.

What is shown here is applicable to all entities in NeoFox data models.

These objects can be easily transformed into JSON:

[6]:
print(neoantigen.to_json(indent=2))
{
  "patientIdentifier": "P123",
  "gene": "VCAN",
  "mutation": {
    "position": [
      14
    ],
    "wildTypeXmer": "DEVLGEPSQDILVIDQTRLEATISPET",
    "mutatedXmer": "DEVLGEPSQDILVTDQTRLEATISPET"
  },
  "rnaExpression": 0.519506894,
  "imputedGeneExpression": null,
  "dnaVariantAlleleFrequency": 0.294573643,
  "rnaVariantAlleleFrequency": 0.857142857,
  "externalAnnotations": [
    {
      "name": "my_custom_annotation",
      "value": "add any custom annotation as additional fields with any name"
    }
  ]
}

They can also be transformed into a Python native dictionary:

[7]:
neoantigen.to_dict()
[7]:
{'patientIdentifier': 'P123',
 'gene': 'VCAN',
 'mutation': {'position': [14],
  'wildTypeXmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutatedXmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'rnaExpression': 0.519506894,
 'imputedGeneExpression': None,
 'dnaVariantAlleleFrequency': 0.294573643,
 'rnaVariantAlleleFrequency': 0.857142857,
 'externalAnnotations': [{'name': 'my_custom_annotation',
   'value': 'add any custom annotation as additional fields with any name'}]}

They can also be serialized into the Protocol Buffers binary format, which is more compact for storing the data or sending it over the wire:

[8]:
neoantigen.SerializeToString()
[8]:
b'\n\x04P123\x12\x04VCAN\x1a=\n\x01\x0e\x12\x1bDEVLGEPSQDILVIDQTRLEATISPET\x1a\x1bDEVLGEPSQDILVTDQTRLEATISPET%g\xfe\x04?5[\xd2\x96>=\xb7m[?JT\n\x14my_custom_annotation\x12<add any custom annotation as additional fields with any name'
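To give a feel for why the binary form is compact, the sketch below decodes the first two fields of the byte string shown above using only the standard library. It relies on the general Protocol Buffers wire format (a tag byte holding the field number and wire type, followed by a length-prefixed payload). The helper name `read_length_delimited` is hypothetical, and it assumes single-byte tags and lengths, which is enough for this prefix but not for arbitrary messages.

```python
def read_length_delimited(buffer: bytes, offset: int):
    """Read one length-delimited field (wire type 2) starting at offset.

    Simplification: assumes the tag and the length each fit in a single byte.
    """
    tag = buffer[offset]
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:
        raise ValueError("only length-delimited fields are handled in this sketch")
    length = buffer[offset + 1]
    start = offset + 2
    payload = buffer[start:start + length]
    return field_number, payload, start + length


# prefix of the serialized neoantigen shown above
prefix = b"\n\x04P123\x12\x04VCAN"

fields = []
offset = 0
while offset < len(prefix):
    field_number, payload, offset = read_length_delimited(prefix, offset)
    fields.append((field_number, payload.decode("ascii")))

print(fields)
```

Only field numbers are stored in the binary, not field names, which is one reason this representation is smaller than the JSON shown earlier.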

Integration with Pandas#

NeoFox integrates with the Python library for data analysis Pandas (see https://pandas.pydata.org/). A single object can be transformed into a Pandas Series and a list of objects can be transformed into a Pandas DataFrame. Pandas provides functionality to persist these tabular representations to files that can be stored and imported into other environments, for instance R.

What is shown here is applicable to all entities in NeoFox data models.

[9]:
import pandas as pd
from neofox.model.conversion import ModelConverter

Transform a list of mutations into a Pandas DataFrame:

[10]:
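from neofox.model.neoantigen import Mutation
# `mutation` is not defined in earlier cells; build it from the xmers used above
mutation = Mutation(
    wild_type_xmer="DEVLGEPSQDILVIDQTRLEATISPET",
    mutated_xmer="DEVLGEPSQDILVTDQTRLEATISPET")
mutation2 = Mutation(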
    wild_type_xmer="AAAAAAAAAAAAAAAAAAAAAAAAAAA",
    mutated_xmer="AAAAAAAAAAAAAGAAAAAAAAAAAAA")
mutations_df = ModelConverter.objects2dataframe([mutation, mutation2])
mutations_df
[10]:
   position                 wildTypeXmer                  mutatedXmer
0        []  DEVLGEPSQDILVIDQTRLEATISPET  DEVLGEPSQDILVTDQTRLEATISPET
1        []  AAAAAAAAAAAAAAAAAAAAAAAAAAA  AAAAAAAAAAAAAGAAAAAAAAAAAAA

Persist any Pandas object into a file:

[11]:
mutations_df.to_csv("data/my_mutations.csv", sep="\t", index=False)

And read it back:

[12]:
mutations_df2 = pd.read_csv("data/my_mutations.csv", sep="\t")
mutations = []
for _, row in mutations_df2.iterrows():
    mutations.append(Mutation().from_dict(row.to_dict()))
mutations
[12]:
[Mutation(position='[]', wild_type_xmer='DEVLGEPSQDILVIDQTRLEATISPET', mutated_xmer='DEVLGEPSQDILVTDQTRLEATISPET'),
 Mutation(position='[]', wild_type_xmer='AAAAAAAAAAAAAAAAAAAAAAAAAAA', mutated_xmer='AAAAAAAAAAAAAGAAAAAAAAAAAAA')]
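Note that in the output above `position` comes back as the string `'[]'` rather than a list, since a tab-separated file stores every value as text. Below is a minimal sketch of one way to restore list-valued columns using only the standard library. The helper `parse_list_column` and the inline sample data are illustrative assumptions, not part of the NeoFox API.

```python
import ast
import csv
import io


def parse_list_column(row: dict, column: str) -> dict:
    """Return a copy of row where the text in `column` is parsed into a list."""
    parsed = dict(row)
    # literal_eval only evaluates Python literals such as "[]" or "[14]"
    value = ast.literal_eval(row[column])
    if not isinstance(value, list):
        raise ValueError("expected a list literal in column {}".format(column))
    parsed[column] = value
    return parsed


# a small tab-separated table shaped like the file written above
text = (
    "position\twildTypeXmer\tmutatedXmer\n"
    "[]\tDEVLGEPSQDILVIDQTRLEATISPET\tDEVLGEPSQDILVTDQTRLEATISPET\n"
    "[14]\tDEVLGEPSQDILVIDQTRLEATISPET\tDEVLGEPSQDILVTDQTRLEATISPET\n"
)
reader = csv.DictReader(io.StringIO(text), delimiter="\t")
rows = [parse_list_column(row, "position") for row in reader]
print([row["position"] for row in rows])
```

The resulting dictionaries could then be passed to `from_dict` in the same way as in the cell above.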

In some cases you may be handling nested objects, for instance a neoantigen. The nesting is flattened into the DataFrame by concatenating field names with a dot, e.g. mutation.wild_type_xmer. In order to read the flattened data back into the nested models we need to add an intermediate step.

[13]:
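# the flattened dictionary
# (neoantigen_series is assumed to be a Pandas Series built from a neoantigen beforehand,
# e.g. with ModelConverter.object2series as used later in this document)
neoantigen_series.to_dict()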
[13]:
{'patient_identifier': 'P123',
 'gene': 'VCAN',
 'rna_expression': 0.519506894,
 'imputed_gene_expression': 0.0,
 'dna_variant_allele_frequency': 0.294573643,
 'rna_variant_allele_frequency': 0.857142857,
 'external_annotations': [],
 'mutation.position': [],
 'mutation.wild_type_xmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
 'mutation.mutated_xmer': 'DEVLGEPSQDILVTDQTRLEATISPET',
 'neofox_annotations.annotations': [],
 'neofox_annotations.annotator': '',
 'neofox_annotations.annotator_version': '',
 'neofox_annotations.timestamp': '',
 'neofox_annotations.resources_hash': ''}
[14]:
# the nested dictionary
ModelConverter._flat_dict2nested_dict(flat_dict=neoantigen_series.to_dict())
[14]:
{'patient_identifier': 'P123',
 'gene': 'VCAN',
 'rna_expression': 0.519506894,
 'imputed_gene_expression': 0.0,
 'dna_variant_allele_frequency': 0.294573643,
 'rna_variant_allele_frequency': 0.857142857,
 'external_annotations': [],
 'mutation': {'position': [],
  'wild_type_xmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutated_xmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'neofox_annotations': {'annotations': [],
  'annotator': '',
  'annotator_version': '',
  'timestamp': '',
  'resources_hash': ''}}
[15]:
# we can load the nested dictionary into a nested model object
from neofox.model.neoantigen import Neoantigen

Neoantigen().from_dict(ModelConverter._flat_dict2nested_dict(flat_dict=neoantigen_series.to_dict()))
[15]:
Neoantigen(patient_identifier='P123', gene='VCAN', mutation=Mutation(position=[], wild_type_xmer='DEVLGEPSQDILVIDQTRLEATISPET', mutated_xmer='DEVLGEPSQDILVTDQTRLEATISPET'), rna_expression=0.519506894, imputed_gene_expression=0.0, dna_variant_allele_frequency=0.294573643, rna_variant_allele_frequency=0.857142857, neofox_annotations=NeoantigenAnnotations(annotations=[], annotator='', annotator_version='', timestamp='', resources_hash=''), external_annotations=[])
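The intermediate step can be pictured as follows. This is a standalone sketch of the idea of un-flattening dotted keys, not the code behind `ModelConverter._flat_dict2nested_dict`; the function names `flat_to_nested` and `nested_to_flat` are hypothetical.

```python
def flat_to_nested(flat: dict) -> dict:
    """Turn keys like 'mutation.wild_type_xmer' into nested dictionaries."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for parent in parents:
            # create the intermediate dictionary on first use
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


def nested_to_flat(nested: dict, prefix: str = "") -> dict:
    """Inverse operation: join nested keys with a dot."""
    flat = {}
    for key, value in nested.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(nested_to_flat(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


flat = {
    "patient_identifier": "P123",
    "mutation.wild_type_xmer": "DEVLGEPSQDILVIDQTRLEATISPET",
    "mutation.mutated_xmer": "DEVLGEPSQDILVTDQTRLEATISPET",
}
nested = flat_to_nested(flat)
print(nested)
```

In this sketch, flattening and un-flattening are inverses as long as no field name itself contains a dot and no nested dictionary is empty.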

Data validation#

The quality and cleanliness of data is of great importance to enable an effective data analysis and make the data machine readable. Clean data means that the data is valid and that it is in a normal and homogeneous form. The use of controlled vocabularies helps to represent knowledge in a standardised way. This is a domain-specific task: although it can be assisted by the right tools, such as Pandas in Python or tidyverse in R, it requires domain expertise to perform. NeoFox provides this domain expertise out of the box with its validation and normalization layers on top of its data models.

[11]:
from neofox.model.validation import ModelValidator
from neofox.exceptions import NeofoxDataValidationException

The data validation checks for missing required fields and shows relevant messages.

[15]:
from neofox.model.neoantigen import Neoantigen, Mutation

try:
    ModelValidator.validate_neoantigen(neoantigen=Neoantigen())
except NeofoxDataValidationException as e:
    print("Error message: {}".format(e))
[E 220208 11:32:29 validation:86] {}
Error message: A patient identifier is missing. Please provide patientIdentifier in the input file

It also performs more domain-specific validations, such as checking that amino acids are valid according to the IUPAC standard amino acid representation.

[18]:
try:
    NeoantigenFactory.build_neoantigen(
        patient_identifier="12345",
        gene="VCAN",
        mutated_xmer="123456AAAAAAAAAAAAAA", # wrong aminoacid representation
        wild_type_xmer="123456GAAAAAAAAAAAAA")
except NeofoxDataValidationException as e:
    print("Error message: {}".format(e))
[E 220208 11:59:28 validation:86] {
       "patientIdentifier": "12345",
       "gene": "VCAN",
       "mutation": {
          "position": [
             7
          ],
          "wildTypeXmer": "123456GAAAAAAAAAAAAA",
          "mutatedXmer": "123456AAAAAAAAAAAAAA"
       },
       "rnaExpression": null,
       "imputedGeneExpression": null,
       "dnaVariantAlleleFrequency": null,
       "rnaVariantAlleleFrequency": null
    }
Error message: Non existing aminoacid 1
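The kind of check involved can be sketched with the standard library alone. The alphabet below (the 20 standard one-letter amino acid codes) and the helper name `first_invalid_residue` are assumptions made for illustration; the set of codes NeoFox actually accepts may differ.

```python
# assumption: only the 20 standard residues are considered valid in this sketch
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def first_invalid_residue(sequence: str):
    """Return the first character that is not a standard amino acid, or None."""
    for residue in sequence.upper():
        if residue not in STANDARD_AMINO_ACIDS:
            return residue
    return None


invalid = first_invalid_residue("123456AAAAAAAAAAAAAA")
valid = first_invalid_residue("DEVLGEPSQDILVTDQTRLEATISPET")
print(invalid, valid)
```

For the invalid sequence used above, the first offending character is `1`, which is the character reported in the error message.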

The data normalization layer ensures the amino acid representation is normalized into one-letter IUPAC codes.

[19]:
valid_neoantigen = NeoantigenFactory.build_neoantigen(
    patient_identifier="12345",
    wild_type_xmer="AAAAAAAAAAAAA",
    mutated_xmer="aaaaaGaaaaa")

print(valid_neoantigen.mutation.to_json(indent=2))
{
  "position": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "wildTypeXmer": "AAAAAAAAAAAAA",
  "mutatedXmer": "AAAAAGAAAAA"
}

After validation a unique neoantigen identifier is generated. This identifier is a hash of the normalized neoantigen representation, so two different representations of the same neoantigen share the same identifier after normalization.

[20]:
validated_neoantigen = ModelValidator.validate_neoantigen(neoantigen=neoantigen)
print(validated_neoantigen.to_json(indent=2))
{
  "patientIdentifier": "P123",
  "gene": "VCAN",
  "mutation": {
    "position": [
      14
    ],
    "wildTypeXmer": "DEVLGEPSQDILVIDQTRLEATISPET",
    "mutatedXmer": "DEVLGEPSQDILVTDQTRLEATISPET"
  },
  "rnaExpression": 0.519506894,
  "dnaVariantAlleleFrequency": 0.294573643,
  "rnaVariantAlleleFrequency": 0.857142857
}
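The idea of deriving an identifier from the normalized representation can be sketched as below. The hashing scheme here (SHA-256 over a few upper-cased fields) and the function name `sketch_identifier` are assumptions chosen for illustration; this document does not specify which hash function or which fields NeoFox actually uses.

```python
import hashlib


def sketch_identifier(patient_identifier: str, wild_type_xmer: str, mutated_xmer: str) -> str:
    """Hash a normalized text form so equivalent inputs get the same identifier."""
    normalized = "|".join([
        patient_identifier.strip(),
        wild_type_xmer.strip().upper(),
        mutated_xmer.strip().upper(),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


id_upper = sketch_identifier(
    "P123", "DEVLGEPSQDILVIDQTRLEATISPET", "DEVLGEPSQDILVTDQTRLEATISPET")
id_lower = sketch_identifier(
    "P123", "devlgepsqdilvidqtrleatispet", "devlgepsqdilvtdqtrleatispet")
print(id_upper == id_lower)
```

Because normalization happens before hashing, the lower-case and upper-case inputs map to the same identifier, while any change in the sequences yields a different one.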

Patients#

The neoantigen annotation process needs some context information, in particular some data about the individual in whom the somatic mutation creating this neoantigen took place. This information mainly includes the HLA types of the patient, which are needed to compute the binding of the potential neoepitopes.

Parse MHC I alleles into a normal representation#

The main complexity in the patient model is the representation of the MHC I and MHC II alleles present in the patient. HLA alleles are typically represented using the nomenclature defined at http://hla.alleles.org, but in practice the community represents HLA alleles with some flexibility. NeoFox aims at normalizing the different HLA representations into a controlled representation agreeing with the HLA nomenclature. NeoFox only supports the classic MHC genes and, although the provided HLA type is kept internally, it only works with the first 4 digits.

There are specific functions in NeoFox to parse a list of non-normalized HLA alleles into a normalized representation. Due to the heterogeneous representations of alleles, we use the IPD-IMGT/HLA database to normalize ambiguous alleles (e.g. B15228 => HLA-B*15:228 and DPB110401 => HLA-DPB1*104:01).

Furthermore, the zygosity of each HLA gene is inferred.

[21]:
from neofox.references.references import ReferenceFolder
import os

os.environ["NEOFOX_REFERENCE_FOLDER"] = "/neofox_install/reference_data"
reference_folder = ReferenceFolder()
hla_database = reference_folder.get_mhc_database()
[I 210928 12:18:03 references:342] Reference genome folder: /neofox_install/reference_data
[I 210928 12:18:03 references:343] Resources
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_human.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_human.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db/Homo_sapiens.fa
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb/IEDB_homo_sapiens.fasta
[I 210928 12:18:03 references:345] /neofox_install/reference_data/hla_database_allele_list.csv
[22]:
# by default it loads the references for Homo sapiens and hence HLA, for mouse run:
h2_database = ReferenceFolder(organism='mouse').get_mhc_database()
[I 210928 12:18:03 references:342] Reference genome folder: /neofox_install/reference_data
[I 210928 12:18:03 references:343] Resources
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_mice.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_mice.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db/Mus_musculus.fa
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb/IEDB_mus_musculus.fasta
[I 210928 12:18:03 references:345] /neofox_install/reference_data/h2_database_allele_list.csv

Parse a list of MHC I alleles. The data validation will ensure that the data is valid and it will infer the zygosity of the different genes. The data normalization layer will normalize the HLA representation into the valid HLA nomenclature including the first 4 digits. Different representations of the same allele will be matched after normalization.

[23]:
mhc1 = ModelConverter.parse_mhc1_alleles(
    ["HLA-A*01:01:02:03N", "HLA-A*01:02:02:03N", "B15228", "HLA-B*15:228:02:04N", "C03_163"],
    mhc_database=hla_database)
ModelConverter.objects2dataframe(mhc1)
[23]:
   name      zygosity                                            alleles
0     A  HETEROZYGOUS  [{'fullName': 'HLA-A*01:01:02:03N', 'name': 'H...
1     B    HOMOZYGOUS  [{'fullName': 'HLA-B*15:228', 'name': 'HLA-B*1...
2     C    HEMIZYGOUS  [{'fullName': 'HLA-C*03:163', 'name': 'HLA-C*0...
[24]:
ModelConverter.objects2dataframe(mhc1[0].alleles + mhc1[1].alleles + mhc1[2].alleles)
[24]:
             fullName          name  gene  group  protein
0  HLA-A*01:01:02:03N   HLA-A*01:01     A     01       01
1  HLA-A*01:02:02:03N   HLA-A*01:02     A     01       02
2        HLA-B*15:228  HLA-B*15:228     B     15      228
3        HLA-C*03:163  HLA-C*03:163     C     03      163
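The reduction of a full allele name to its first two fields (gene, group and protein) can be sketched with a regular expression. The pattern and the function `normalize_allele` below are simplified assumptions covering only classic MHC I genes written in standard nomenclature; compact forms such as `B15228` need a lookup against the allele database and are not handled here.

```python
import re

# assumption: a simplified pattern for classic MHC I genes in standard nomenclature
HLA_PATTERN = re.compile(
    r"^HLA-(?P<gene>[ABC])\*(?P<group>\d{2,3}):(?P<protein>\d{2,3})(?::\d{2,3})*[A-Z]?$")


def normalize_allele(full_name: str) -> dict:
    """Split an allele name into the fields shown in the table above."""
    match = HLA_PATTERN.match(full_name)
    if match is None:
        raise ValueError("Allele does not match the expected pattern: {}".format(full_name))
    gene, group, protein = match.group("gene", "group", "protein")
    return {
        "fullName": full_name,
        "name": "HLA-{}*{}:{}".format(gene, group, protein),
        "gene": gene,
        "group": group,
        "protein": protein,
    }


allele = normalize_allele("HLA-A*01:01:02:03N")
print(allele["name"])
```

Any further fields and the expression suffix (here `:02:03N`) are kept in `fullName` but dropped from `name`, mirroring the two columns in the table.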

Validation and normalization of MHC alleles#

The data validation layer checks that the provided allele representations are valid.

[25]:
try:
    ModelConverter.parse_mhc1_alleles(["HLA-W*01:01:02:03N"], mhc_database=hla_database)  # bad gene W
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))
Error message: Allele does not match HLA allele pattern HLA-W*01:01:02:03N
[26]:
try:
    ModelConverter.parse_mhc1_alleles(["HLA-A*first:second:02:03N"], mhc_database=hla_database)  # bad allele representation
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))
Error message: Allele does not match HLA allele pattern HLA-A*first:second:02:03N
[27]:
try:
    ModelConverter.parse_mhc1_alleles(["HLA-A*01:02:02:03N", "HLA-A*01:03:02:03N", "HLA-A*01:04:02:03N"], mhc_database=hla_database)  # wrong number of alleles
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))
Error message: More than 2 alleles for gene A

A warning message will be shown for non existing HLA alleles.

[28]:
ModelConverter.parse_mhc1_alleles(["HLA-B*01:02:02:03N", "HLA-C*01:02"], mhc_database=hla_database)
[W 210928 12:18:03 mhc_parser:159] Allele HLA-B*01:02:02:03N does not exist in the HLA database
[28]:
[Mhc1(name=<Mhc1Name.A: 0>, zygosity=<Zygosity.LOSS: 3>, alleles=[]),
 Mhc1(name=<Mhc1Name.B: 1>, zygosity=<Zygosity.HEMIZYGOUS: 2>, alleles=[MhcAllele(full_name='HLA-B*01:02:02:03N', name='HLA-B*01:02', gene='B', group='01', protein='02')]),
 Mhc1(name=<Mhc1Name.C: 2>, zygosity=<Zygosity.HEMIZYGOUS: 2>, alleles=[MhcAllele(full_name='HLA-C*01:02', name='HLA-C*01:02', gene='C', group='01', protein='02')])]
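The zygosity labels seen in the outputs above appear to follow from the number of alleles provided per gene. The sketch below encodes the rules as read off those outputs (no allele: LOSS, one: HEMIZYGOUS, two identical: HOMOZYGOUS, two different: HETEROZYGOUS, more than two: error). The function `infer_zygosity` is hypothetical and expects allele names that are already normalized.

```python
from collections import defaultdict


def infer_zygosity(allele_names, genes=("A", "B", "C")):
    """Map each gene to a zygosity label from names like 'HLA-A*01:01'."""
    by_gene = defaultdict(list)
    for name in allele_names:
        # 'HLA-A*01:01' -> 'A'
        gene = name.split("-")[1].split("*")[0]
        by_gene[gene].append(name)

    result = {}
    for gene in genes:
        alleles = by_gene[gene]
        if len(alleles) > 2:
            raise ValueError("More than 2 alleles for gene {}".format(gene))
        if len(alleles) == 0:
            result[gene] = "LOSS"
        elif len(alleles) == 1:
            result[gene] = "HEMIZYGOUS"
        elif alleles[0] == alleles[1]:
            result[gene] = "HOMOZYGOUS"
        else:
            result[gene] = "HETEROZYGOUS"
    return result


zygosity = infer_zygosity(
    ["HLA-A*01:01", "HLA-A*01:02", "HLA-B*15:228", "HLA-B*15:228", "HLA-C*03:163"])
print(zygosity)
```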

Parse MHC II alleles into a normal representation#

The model for MHC II alleles is more complex as we need to reflect all combinations of alpha and beta chains, but the data validation and normalization provided by NeoFox is fundamentally the same.

Parse a list of MHC II alleles:

[29]:
mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04", "HLA-DPB1*01:01", "HLA-DPB1*01:01",
                                          "HLA-DQA1*01:01", "HLA-DQA1*01:01", "HLA-DQB1*02:01", "HLA-DQB1*02:01",
                                          "HLA-DRB1*01:01", "HLA-DRB1*01:01"], mhc_database=hla_database)

An MHC II gene with homozygous alpha and beta chains has a single isoform, as in the DQ gene below:

[30]:
mhc2[1].to_dict()
[30]:
{'name': 'DQ',
 'genes': [{'name': 'DQA1',
   'alleles': [{'fullName': 'HLA-DQA1*01:01',
     'name': 'HLA-DQA1*01:01',
     'gene': 'DQA1',
     'group': '01',
     'protein': '01'}]},
  {'name': 'DQB1',
   'alleles': [{'fullName': 'HLA-DQB1*02:01',
     'name': 'HLA-DQB1*02:01',
     'gene': 'DQB1',
     'group': '02',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DQA1*01:01-DQB1*02:01',
   'alphaChain': {'fullName': 'HLA-DQA1*01:01',
    'name': 'HLA-DQA1*01:01',
    'gene': 'DQA1',
    'group': '01',
    'protein': '01'},
   'betaChain': {'fullName': 'HLA-DQB1*02:01',
    'name': 'HLA-DQB1*02:01',
    'gene': 'DQB1',
    'group': '02',
    'protein': '01'}}]}

The MHC II DR gene is a special case: no alpha chain is represented because it is not variable, so the isoform is named after the beta chain (DRB1) alone.

[31]:
mhc2[2].to_dict()
[31]:
{'genes': [{'alleles': [{'fullName': 'HLA-DRB1*01:01',
     'name': 'HLA-DRB1*01:01',
     'gene': 'DRB1',
     'group': '01',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DRB1*01:01',
   'betaChain': {'fullName': 'HLA-DRB1*01:01',
    'name': 'HLA-DRB1*01:01',
    'gene': 'DRB1',
    'group': '01',
    'protein': '01'}}]}

An MHC II gene with a heterozygous alpha chain and a homozygous beta chain has two isoforms, as in the DP gene below:

[32]:
mhc2[0].to_dict()
[32]:
{'name': 'DP',
 'genes': [{'name': 'DPA1',
   'zygosity': 'HETEROZYGOUS',
   'alleles': [{'fullName': 'HLA-DPA1*01:03',
     'name': 'HLA-DPA1*01:03',
     'gene': 'DPA1',
     'group': '01',
     'protein': '03'},
    {'fullName': 'HLA-DPA1*01:04',
     'name': 'HLA-DPA1*01:04',
     'gene': 'DPA1',
     'group': '01',
     'protein': '04'}]},
  {'name': 'DPB1',
   'alleles': [{'fullName': 'HLA-DPB1*01:01',
     'name': 'HLA-DPB1*01:01',
     'gene': 'DPB1',
     'group': '01',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DPA1*01:03-DPB1*01:01',
   'alphaChain': {'fullName': 'HLA-DPA1*01:03',
    'name': 'HLA-DPA1*01:03',
    'gene': 'DPA1',
    'group': '01',
    'protein': '03'},
   'betaChain': {'fullName': 'HLA-DPB1*01:01',
    'name': 'HLA-DPB1*01:01',
    'gene': 'DPB1',
    'group': '01',
    'protein': '01'}},
  {'name': 'HLA-DPA1*01:04-DPB1*01:01',
   'alphaChain': {'fullName': 'HLA-DPA1*01:04',
    'name': 'HLA-DPA1*01:04',
    'gene': 'DPA1',
    'group': '01',
    'protein': '04'},
   'betaChain': {'fullName': 'HLA-DPB1*01:01',
    'name': 'HLA-DPB1*01:01',
    'gene': 'DPB1',
    'group': '01',
    'protein': '01'}}]}

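Beware that incomplete MHC II molecules missing one of the chains will have no isoforms and thus no binding will be computed on them. In the case below only DPA1 alleles are provided, so the beta chain allele for the DP gene is missing; the DQ gene shown in the output has lost both chains and likewise has no isoforms.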

[33]:
mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04"], mhc_database=hla_database)
mhc2[1].to_dict()
[33]:
{'name': 'DQ',
 'genes': [{'name': 'DQA1', 'zygosity': 'LOSS'},
  {'name': 'DQB1', 'zygosity': 'LOSS'}]}
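The relation between chains and isoforms can be sketched as a Cartesian product of the distinct alpha and beta alleles. This is an illustration of the combinatorics only: the function `build_isoform_names` is hypothetical and does not cover the DR special case, where only the beta chain is represented.

```python
from itertools import product


def build_isoform_names(alpha_alleles, beta_alleles):
    """Combine every distinct alpha allele with every distinct beta allele."""
    # dict.fromkeys removes duplicates (homozygous genes) while keeping order
    alphas = list(dict.fromkeys(alpha_alleles))
    betas = list(dict.fromkeys(beta_alleles))
    return [
        "{}-{}".format(alpha, beta.replace("HLA-", "", 1))
        for alpha, beta in product(alphas, betas)
    ]


complete = build_isoform_names(
    ["HLA-DPA1*01:03", "HLA-DPA1*01:04"], ["HLA-DPB1*01:01", "HLA-DPB1*01:01"])
incomplete = build_isoform_names(["HLA-DPA1*01:03", "HLA-DPA1*01:04"], [])
print(complete)
print(incomplete)
```

With a heterozygous alpha chain and a homozygous beta chain the product has two elements; with a missing chain it is empty, which matches the absence of isoforms described above.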

Create a patient#

[34]:
from neofox.model.neoantigen import Patient


mhc1 = ModelConverter.parse_mhc1_alleles(["HLA-A*01:01:02:03N", "HLA-A*01:02:02:03N",
                                          "HLA-B*15:01:02:03N", "HLA-B*15:01:02:04N",
                                          "HLA-C*03:02"], mhc_database=hla_database)
mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04", "HLA-DPB1*01:01", "HLA-DPB1*01:01",
                                          "HLA-DQA1*01:01", "HLA-DQA1*01:01", "HLA-DQB1*02:01", "HLA-DQB1*02:01",
                                          "HLA-DRB1*01:01", "HLA-DRB1*01:01"], mhc_database=hla_database)
patient = Patient(
    identifier="P123",
    is_rna_available=True,
    tumor_type="NSCLC",
    mhc1=mhc1,
    mhc2=mhc2
)
ModelConverter.object2series(patient)
[34]:
identifier                                                       P123
is_rna_available                                                 True
tumor_type                                                      NSCLC
mhc1                [{'name': 'A', 'zygosity': 'HETEROZYGOUS', 'al...
mhc2                [{'name': 'DP', 'genes': [{'name': 'DPA1', 'zy...
Name: 0, dtype: object

Validate a patient#

[35]:
validated_patient = ModelValidator.validate_patient(patient)

A patient requires an identifier. MHC I and MHC II are optional: if one or the other is not available, the output annotations are adapted accordingly.

[36]:
try:
    ModelValidator.validate_patient(Patient())  # missing patient identifier
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))
[E 210928 12:18:04 validation:117] {}
Error message: A patient identifier is missing
[37]:
patient_without_mhc2 = ModelValidator.validate_patient(Patient(identifier="12345", mhc1=mhc1))
[38]:
patient_without_mhc1 = ModelValidator.validate_patient(Patient(identifier="12345", mhc2=mhc2))

Run Neofox#

Parse input data from a file#

Although we could create the data objects manually as shown above, for convenience it is useful to store the data in tabular format. Here we show how to parse the neoantigens and patients from tabular files.

The tabular file for neoantigens should look as follows:

[39]:
pd.read_csv("data/test_model_file.txt", sep="\t")
[39]:
      gene  transcript_identifier         mutation.mutatedXmer        mutation.wildTypeXmer  patientIdentifier
0     VCAN             uc003kii.3  DEVLGEPSQDILVTDQTRLEATISPET  DEVLGEPSQDILVIDQTRLEATISPET               Pt27
1    DCST2             uc001fgm.3  RTNLLAALHRSVRWRAADQGHRSAFLV  RTNLLAALHRSVRRRAADQGHRSAFLV               Pt24
2     NRAS             uc009wgu.3    MTEYKLVVVGACGVGKSALTIQLIQ    MTEYKLVVVGAGGVGKSALTIQLIQ               Pt28
3   CEP350             uc001gnt.3  QTDSSSSDMQACSKDKAKISLGSSIDS  QTDSSSSDMQACSQDKAKISLGSSIDS               Pt63
4   CPPED1             uc002dca.4  DRAIPLVLVSGNHYIGNTPTAETVEEF  DRAIPLVLVSGNHDIGNTPTAETVEEF               Pt77
5  CXorf26             uc004ecl.1  YNKAVYISVQDKEEEKGVNNGGEKRAD  YNKAVYISVQDKEGEKGVNNGGEKRAD              Pt117
6   IGSF9B             uc001qgx.4  ASTHLTVIGTSPHVPGSVRVQVSMTTA  ASTHLTVIGTSPHAPGSVRVQVSMTTA              Pt110
7  HEATR5A             uc001wrf.4  TRRDEKSHPFTNPQWATRVFAAECVCR  TRRDEKSHPFTNPRWATRVFAAECVCR               Pt26
8   CHRDL2             uc001ovh.3  ARPDMFCLFHGKRHFPGESWHPYLEPQ  ARPDMFCLFHGKRYFPGESWHPYLEPQ               Pt77

There is a specific function to parse an input file into a list of neoantigens. Any additional column not matching a field in the neoantigen model, in this case transcript_identifier, will be parsed into the external annotations. When executed from the command line interface, NeoFox adds these external annotations to the output together with the new annotations.

[40]:
neoantigens = ModelConverter.parse_neoantigens_file("data/test_model_file.txt")
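The handling of additional columns described above can be pictured with a small standard-library sketch. The set `MODEL_COLUMNS` is an illustrative subset taken from the table shown earlier, and `split_row` is a hypothetical helper; neither reflects how `parse_neoantigens_file` is implemented.

```python
import csv
import io

# assumption: an illustrative subset of model columns, taken from the table above
MODEL_COLUMNS = {"gene", "mutation.mutatedXmer", "mutation.wildTypeXmer", "patientIdentifier"}


def split_row(row: dict):
    """Separate model fields from extra columns kept as name/value annotations."""
    model_fields = {k: v for k, v in row.items() if k in MODEL_COLUMNS}
    external = [{"name": k, "value": v} for k, v in row.items() if k not in MODEL_COLUMNS]
    return model_fields, external


text = (
    "gene\ttranscript_identifier\tmutation.mutatedXmer\tmutation.wildTypeXmer\tpatientIdentifier\n"
    "VCAN\tuc003kii.3\tDEVLGEPSQDILVTDQTRLEATISPET\tDEVLGEPSQDILVIDQTRLEATISPET\tPt27\n"
)
for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
    model_fields, external = split_row(row)
    print(external)
```

The name/value pairs have the same shape as the `externalAnnotations` entries shown in the JSON representation earlier in this document.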

The tabular file for patients should look as follows:

[41]:
pd.read_csv("data/test_patient_file.txt", sep="\t")
[41]:
   identifier                                        mhcIAlleles                                       mhcIIAlleles  isRnaAvailable  tumorType
0        Pt27  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
1        Pt24  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
2        Pt28  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
3        Pt63  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
4        Pt77  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
5       Pt117  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
6       Pt110  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC
7        Pt26  HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...  HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...            True       HNSC

Parse the patients into the model objects as follows:

[42]:
patients = ModelConverter.parse_patients_file("data/test_patient_file.txt", mhc_database=hla_database)

Annotate your neoantigens#

Running NeoFox requires configuring it through a number of environment variables; this is described in detail elsewhere in the documentation. This configuration can also be provided through a file passed into the NeoFox class in the field configuration_file.
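Before launching a run it can help to check that the environment is complete. The sketch below uses the variable names that appear in this document's example; treating all of them as mandatory is an assumption, and the helper `missing_variables` is hypothetical rather than part of NeoFox.

```python
import os

# variable names as used in this document's example; whether each one is
# strictly required is an assumption of this sketch
REQUIRED_VARIABLES = [
    "NEOFOX_REFERENCE_FOLDER", "NEOFOX_RSCRIPT", "NEOFOX_BLASTP", "NEOFOX_NETMHCPAN",
    "NEOFOX_NETMHC2PAN", "NEOFOX_MIXMHCPRED", "NEOFOX_MIXMHC2PRED", "NEOFOX_PRIME",
]


def missing_variables(environment=None):
    """Return the names from REQUIRED_VARIABLES that are unset or empty."""
    environment = os.environ if environment is None else environment
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]


missing = missing_variables()
if missing:
    print("Missing configuration: {}".format(", ".join(missing)))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.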

[43]:
from neofox.neofox import NeoFox
import os
[44]:
os.environ["NEOFOX_REFERENCE_FOLDER"] = "/neofox_install/reference_data/"
os.environ["NEOFOX_RSCRIPT"] = "/usr/bin/Rscript"
os.environ["NEOFOX_BLASTP"] = "/neofox_install/ncbi-blast-2.10.1+/bin/blastp"
os.environ["NEOFOX_NETMHCPAN"] = "/neofox_install/netMHCpan-4.1/netMHCpan"
os.environ["NEOFOX_NETMHC2PAN"] = "/neofox_install/netMHCIIpan-4.0/netMHCIIpan"
os.environ["NEOFOX_MIXMHCPRED"] = "/neofox_install/MixMHCpred-2.1/MixMHCpred"
os.environ["NEOFOX_MIXMHC2PRED"] = "/neofox_install/MixMHC2pred-1.2/MixMHC2pred_unix"
os.environ["NEOFOX_PRIME"] = "/neofox_install/PRIME-master/PRIME"
annotations = NeoFox(neoantigens=neoantigens, patients=patients, num_cpus=4).get_annotations()
[I 210928 12:18:06 references:342] Reference genome folder: /neofox_install/reference_data/
[I 210928 12:18:06 references:343] Resources
[I 210928 12:18:06 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_human.txt
[I 210928 12:18:06 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_human.txt
[I 210928 12:18:06 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:06 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:06 references:345] /neofox_install/reference_data/proteome_db/Homo_sapiens.fa
[I 210928 12:18:06 references:345] /neofox_install/reference_data/iedb/IEDB_homo_sapiens.fasta
[I 210928 12:18:06 references:345] /neofox_install/reference_data/hla_database_allele_list.csv
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at VCAN:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 8.8404052357
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at DCST2:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.1283784886
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at NRAS:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 14.0097794749
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CEP350:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 4.1881530572
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CPPED1:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 3.2718222656
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CXorf26:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 13.5743362176
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at IGSF9B:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.0771477047
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at HEATR5A:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 2.8020526973
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CHRDL2:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.3346201976
[I 210928 12:18:06 neofox:127] Data loaded
[I 210928 12:18:06 neofox:199] Starting NeoFox annotations...
[I 210928 12:20:18 neofox:246] Elapsed time for annotating 9 neoantigens 130 seconds

NeoFox returns a list of annotations for each neoantigen. These are stored in an object called NeoantigenAnnotations, which contains the corresponding neoantigen identifier, the annotator (i.e. neofox), the annotator version, a timestamp and finally a list of the annotations.

[45]:
annotations[0].to_dict()
[45]:
{'patientIdentifier': 'Pt27',
 'gene': 'VCAN',
 'mutation': {'position': [14],
  'wildTypeXmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutatedXmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'rnaExpression': 8.8404052357,
 'imputedGeneExpression': 8.8404052357,
 'dnaVariantAlleleFrequency': None,
 'rnaVariantAlleleFrequency': None,
 'neofoxAnnotations': {'annotations': [{'name': 'Best_rank_MHCI_score',
    'value': '3.906'},
   {'name': 'Best_rank_MHCI_score_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_rank_MHCI_score_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_score', 'value': '2982.7'},
   {'name': 'Best_affinity_MHCI_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_affinity_MHCI_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_9mer_score', 'value': '3.906'},
   {'name': 'Best_rank_MHCI_9mer_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_rank_MHCI_9mer_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_score', 'value': '2982.7'},
   {'name': 'Best_affinity_MHCI_9mer_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_affinity_MHCI_score_WT', 'value': '9467.9'},
   {'name': 'Best_affinity_MHCI_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_affinity_MHCI_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_score_WT', 'value': '8.277'},
   {'name': 'Best_rank_MHCI_score_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_rank_MHCI_score_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_9mer_score_WT', 'value': '8.277'},
   {'name': 'Best_rank_MHCI_9mer_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_rank_MHCI_9mer_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_score_WT', 'value': '9467.9'},
   {'name': 'Best_affinity_MHCI_9mer_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Generator_rate_MHCI', 'value': '0'},
   {'name': 'Generator_rate_CDN_MHCI', 'value': '0'},
   {'name': 'Generator_rate_ADN_MHCI', 'value': '0'},
   {'name': 'PHBR_I', 'value': '6.2707'},
   {'name': 'Best_affinity_MHCI_9mer_position_mutation', 'value': '2'},
   {'name': 'Best_affinity_MHCI_9mer_anchor_mutated', 'value': '1'},
   {'name': 'Best_rank_MHCII_score', 'value': '3.26'},
   {'name': 'Best_rank_MHCII_score_epitope', 'value': 'SQDILVTDQTRLEAT'},
   {'name': 'Best_rank_MHCII_score_allele',
    'value': 'HLA-DQA1*03:01-DQB1*03:02'},
   {'name': 'Best_affinity_MHCII_score', 'value': '1103.5'},
   {'name': 'Best_affinity_MHCII_epitope', 'value': 'QDILVTDQTRLEATI'},
   {'name': 'Best_affinity_MHCII_allele', 'value': 'HLA-DRB1*08:01'},
   {'name': 'Best_rank_MHCII_score_WT', 'value': '6.14'},
   {'name': 'Best_rank_MHCII_score_epitope_WT', 'value': 'SQDILVIDQTRLEAT'},
   {'name': 'Best_rank_MHCII_score_allele_WT',
    'value': 'HLA-DQA1*03:01-DQB1*03:02'},
   {'name': 'Best_affinity_MHCII_score_WT', 'value': '562.6'},
   {'name': 'Best_affinity_MHCII_epitope_WT', 'value': 'QDILVIDQTRLEATI'},
   {'name': 'Best_affinity_MHCII_allele_WT', 'value': 'HLA-DRB1*08:01'},
   {'name': 'PHBR_II', 'value': '8.8958'},
   {'name': 'Generator_rate_MHCII', 'value': '0'},
   {'name': 'Generator_rate_CDN_MHCII', 'value': '0'},
   {'name': 'Generator_rate_ADN_MHCII', 'value': '0'},
   {'name': 'MixMHCpred_best_peptide', 'value': 'VTDQTRLEA'},
   {'name': 'MixMHCpred_best_score', 'value': '-0.09792'},
   {'name': 'MixMHCpred_best_rank', 'value': '10'},
   {'name': 'MixMHCpred_best_allele', 'value': 'HLA-A*29:02'},
   {'name': 'PRIME_best_peptide', 'value': 'LVTDQTRLE'},
   {'name': 'PRIME_best_score', 'value': '0.16349'},
   {'name': 'PRIME_best_rank', 'value': '5'},
   {'name': 'PRIME_best_allele', 'value': 'HLA-A*29:02'},
   {'name': 'MixMHC2pred_best_peptide', 'value': 'DEVLGEPSQDILVT'},
   {'name': 'MixMHC2pred_best_rank', 'value': '3.06'},
   {'name': 'MixMHC2pred_best_allele', 'value': 'HLA-DPA1*01:03-DPB1*04:01'},
   {'name': 'Expression_mutated_transcript', 'value': 'NA'},
   {'name': 'mutation_not_found_in_proteome', 'value': '1'},
   {'name': 'Amplitude_MHCI_affinity_9mer', 'value': '0.82656'},
   {'name': 'Amplitude_MHCI_affinity', 'value': '0.82656'},
   {'name': 'Amplitude_MHCII_rank', 'value': '1.8834'},
   {'name': 'Pathogensimiliarity_MHCI_9mer', 'value': '0'},
   {'name': 'Recognition_Potential_MHCI_9mer', 'value': '0'},
   {'name': 'Pathogensimiliarity_MHCII', 'value': '0'},
   {'name': 'DAI_MHCI_affinity', 'value': '6485.2'},
   {'name': 'CDN_MHCI', 'value': '0'},
   {'name': 'ADN_MHCI', 'value': '0'},
   {'name': 'CDN_MHCII', 'value': '0'},
   {'name': 'ADN_MHCII', 'value': '0'},
   {'name': 'Tcell_predictor_score', 'value': '0.5094749478464053'},
   {'name': 'Improved_Binder_MHCI', 'value': '1'},
   {'name': 'Selfsimilarity_MHCII', 'value': '0.9615321336133709'},
   {'name': 'Selfsimilarity_MHCI', 'value': '0.9746163344293891'},
   {'name': 'Selfsimilarity_MHCI_conserved_binder', 'value': 'NA'},
   {'name': 'Number_of_mismatches_MCHI', 'value': '1'},
   {'name': 'Priority_score', 'value': 'NA'},
   {'name': 'Neoag_immunogenicity', 'value': '142.3854'},
   {'name': 'IEDB_Immunogenicity_MHCI', 'value': '0.0263'},
   {'name': 'IEDB_Immunogenicity_MHCII', 'value': '0.24619'},
   {'name': 'Dissimilarity_MHCI', 'value': '0'},
   {'name': 'Dissimilarity_MHCII', 'value': '0'},
   {'name': 'vaxrank_binding_score', 'value': '0'},
   {'name': 'vaxrank_total_score', 'value': 'NA'},
   {'name': 'Hex_alignment_score_MHCI', 'value': '146'},
   {'name': 'Hex_alignment_score_MHCII', 'value': '374'}],
  'annotator': 'NeoFox',
  'annotatorVersion': '0.5.4.dev3',
  'timestamp': '20210928121942034880'},
 'externalAnnotations': [{'name': 'transcript_identifier',
   'value': 'uc003kii.3'}]}
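
The annotations in the dictionary above are a flat list of name/value pairs in which every value is a string and missing values are written as 'NA'. The following is a minimal sketch, not part of the NeoFox API, of turning such a list into a lookup with parsed numbers. The helper name `annotations_to_lookup` is hypothetical, and the input is a short excerpt copied from the output above.

```python
def annotations_to_lookup(annotations):
    """Map annotation name -> value (hypothetical helper, not a NeoFox function).

    'NA' becomes None, numeric strings become floats, everything else stays a string.
    """
    lookup = {}
    for item in annotations:
        name, value = item["name"], item["value"]
        if value == "NA":
            lookup[name] = None
        elif value[:1].isalpha():
            # alleles and peptides start with a letter; keep them as text
            lookup[name] = value
        else:
            try:
                lookup[name] = float(value)
            except ValueError:
                lookup[name] = value
    return lookup


# excerpt copied from the output above
example = [
    {"name": "Best_affinity_MHCI_score", "value": "2982.7"},
    {"name": "Best_affinity_MHCI_score_WT", "value": "9467.9"},
    {"name": "Best_affinity_MHCI_allele", "value": "HLA-C*16:01"},
    {"name": "Priority_score", "value": "NA"},
]
lookup = annotations_to_lookup(example)

# difference between wild type and mutated affinity for this record
difference = round(
    lookup["Best_affinity_MHCI_score_WT"] - lookup["Best_affinity_MHCI_score"], 1
)
```

For this record the difference equals the DAI_MHCI_affinity value of 6485.2 reported above.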

Transform the annotations into a data frame#

[46]:
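# flatten the annotated neoantigens into a Pandas data frame:
# one row per neoantigen, one column per attribute or annotation
annotations_table = ModelConverter.annotations2table(neoantigens=annotations)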
annotations_table.head(10)
[46]:
patientIdentifier gene mutation.mutatedXmer mutation.wildTypeXmer mutation.position dnaVariantAlleleFrequency rnaVariantAlleleFrequency rnaExpression imputedGeneExpression ADN_MHCI ... Priority_score Recognition_Potential_MHCI_9mer Selfsimilarity_MHCI Selfsimilarity_MHCII Selfsimilarity_MHCI_conserved_binder Tcell_predictor_score mutation_not_found_in_proteome transcript_identifier vaxrank_binding_score vaxrank_total_score
0 Pt27 VCAN DEVLGEPSQDILVTDQTRLEATISPET DEVLGEPSQDILVIDQTRLEATISPET 14 NA NA 8.840405 8.840405 0 ... NA 0 0.9746163344293891 0.9615321336133709 NA 0.5094749478464053 1 uc003kii.3 0 NA
1 Pt24 DCST2 RTNLLAALHRSVRWRAADQGHRSAFLV RTNLLAALHRSVRRRAADQGHRSAFLV 14 NA NA 0.128378 0.128378 0 ... NA 0 0.9421875787623097 0.9391684897220461 0.9421875787623097 0.27191401693602235 1 uc001fgm.3 0.38942 NA
2 Pt28 NRAS MTEYKLVVVGACGVGKSALTIQLIQ MTEYKLVVVGAGGVGKSALTIQLIQ 12 NA NA 14.009779 14.009779 0 ... NA 0 0.9330521460001094 0.9341826178609157 0.9330521460001094 0.5068878716790075 1 uc009wgu.3 1.4787 NA
3 Pt63 CEP350 QTDSSSSDMQACSKDKAKISLGSSIDS QTDSSSSDMQACSQDKAKISLGSSIDS 14 NA NA 4.188153 4.188153 0 ... NA 0 0.9822504402167844 0.9861912144506167 0.9822504402167844 0.3367698236194417 1 uc001gnt.3 0.23049 NA
4 Pt77 CPPED1 DRAIPLVLVSGNHYIGNTPTAETVEEF DRAIPLVLVSGNHDIGNTPTAETVEEF 14 NA NA 3.271822 3.271822 1 ... NA 0 0.9716393071320848 0.9451458345555184 NA 0.5720939101479084 1 uc002dca.4 0.8453 NA
5 Pt117 CXorf26 YNKAVYISVQDKEEEKGVNNGGEKRAD YNKAVYISVQDKEGEKGVNNGGEKRAD 14 NA NA 13.574336 13.574336 0 ... NA 0 0.9542175575949939 0.9697523262453144 0.9542175575949939 0.4886563384775356 1 uc004ecl.1 0 NA
6 Pt110 IGSF9B ASTHLTVIGTSPHVPGSVRVQVSMTTA ASTHLTVIGTSPHAPGSVRVQVSMTTA 14 NA NA 0.077148 0.077148 0 ... NA 0 0.9874590500904996 0.9803010405217453 0.9874590500904996 0.09919940654277996 1 uc001qgx.4 3.6385 NA
7 Pt26 HEATR5A TRRDEKSHPFTNPQWATRVFAAECVCR TRRDEKSHPFTNPRWATRVFAAECVCR 14 NA NA 2.802053 2.802053 0 ... NA 0 0.9614920042660836 0.9596541446763491 0.9614920042660836 0.3374084622586067 1 uc001wrf.4 3.1392 NA
8 Pt77 CHRDL2 ARPDMFCLFHGKRHFPGESWHPYLEPQ ARPDMFCLFHGKRYFPGESWHPYLEPQ 14 NA NA 0.334620 0.334620 0 ... NA 0 0.9782021524274523 0.9391077888953303 0.9782021524274523 0.49744103834505293 1 uc001ovh.3 0.91677 NA

9 rows × 96 columns
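
In the table above the annotation columns appear to hold strings, with 'NA' marking missing values, so they usually need converting before sorting or filtering. A minimal sketch, assuming that string encoding; the toy data frame stands in for `annotations_table` and reuses three rows of values from the output above.

```python
import pandas as pd

# toy stand-in for annotations_table, values copied from the rows above
table = pd.DataFrame({
    "patientIdentifier": ["Pt27", "Pt24", "Pt28"],
    "gene": ["VCAN", "DCST2", "NRAS"],
    "Selfsimilarity_MHCI_conserved_binder": [
        "NA", "0.9421875787623097", "0.9330521460001094"],
    "vaxrank_binding_score": ["0", "0.38942", "1.4787"],
})

numeric_columns = ["Selfsimilarity_MHCI_conserved_binder", "vaxrank_binding_score"]

# errors="coerce" turns anything that cannot be parsed, including "NA", into NaN
table[numeric_columns] = table[numeric_columns].apply(pd.to_numeric, errors="coerce")

# rank the candidates by one score, highest first
ranked = table.sort_values("vaxrank_binding_score", ascending=False)
```

Which columns to convert is a choice left to the caller here; identifiers, alleles and peptide sequences should stay as text.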

[47]:
# reshape the table into long format: one row per neoantigen and annotation
(
    annotations_table
    .set_index(['patientIdentifier', 'mutation.mutatedXmer'])
    .stack()
    .reset_index()
    .head(60)
)
[47]:
patientIdentifier mutation.mutatedXmer level_2 0
0 Pt27 DEVLGEPSQDILVTDQTRLEATISPET gene VCAN
1 Pt27 DEVLGEPSQDILVTDQTRLEATISPET mutation.wildTypeXmer DEVLGEPSQDILVIDQTRLEATISPET
2 Pt27 DEVLGEPSQDILVTDQTRLEATISPET mutation.position 14
3 Pt27 DEVLGEPSQDILVTDQTRLEATISPET dnaVariantAlleleFrequency NA
4 Pt27 DEVLGEPSQDILVTDQTRLEATISPET rnaVariantAlleleFrequency NA
5 Pt27 DEVLGEPSQDILVTDQTRLEATISPET rnaExpression 8.84041
6 Pt27 DEVLGEPSQDILVTDQTRLEATISPET imputedGeneExpression 8.84041
7 Pt27 DEVLGEPSQDILVTDQTRLEATISPET ADN_MHCI 0
8 Pt27 DEVLGEPSQDILVTDQTRLEATISPET ADN_MHCII 0
9 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Amplitude_MHCII_rank 1.8834
10 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Amplitude_MHCI_affinity 0.82656
11 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Amplitude_MHCI_affinity_9mer 0.82656
12 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_allele HLA-DRB1*08:01
13 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_allele_WT HLA-DRB1*08:01
14 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_epitope QDILVTDQTRLEATI
15 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_epitope_WT QDILVIDQTRLEATI
16 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_score 1103.5
17 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCII_score_WT 562.6
18 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_allele HLA-C*16:01
19 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_allele_WT HLA-C*16:01
20 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_anchor_mutated 1
21 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_epitope VTDQTRLEA
22 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_epitope_WT VIDQTRLEA
23 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_position_mutation 2
24 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_score 2982.7
25 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_9mer_score_WT 9467.9
26 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_allele HLA-C*16:01
27 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_allele_WT HLA-C*16:01
28 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_epitope VTDQTRLEA
29 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_epitope_WT VIDQTRLEA
30 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_score 2982.7
31 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_affinity_MHCI_score_WT 9467.9
32 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score 3.26
33 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score_WT 6.14
34 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score_allele HLA-DQA1*03:01-DQB1*03:02
35 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score_allele_WT HLA-DQA1*03:01-DQB1*03:02
36 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score_epitope SQDILVTDQTRLEAT
37 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCII_score_epitope_WT SQDILVIDQTRLEAT
38 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_allele HLA-C*16:01
39 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_allele_WT HLA-C*16:01
40 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_epitope VTDQTRLEA
41 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_epitope_WT VIDQTRLEA
42 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_score 3.906
43 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_9mer_score_WT 8.277
44 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score 3.906
45 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score_WT 8.277
46 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score_allele HLA-C*16:01
47 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score_allele_WT HLA-C*16:01
48 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score_epitope VTDQTRLEA
49 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Best_rank_MHCI_score_epitope_WT VIDQTRLEA
50 Pt27 DEVLGEPSQDILVTDQTRLEATISPET CDN_MHCI 0
51 Pt27 DEVLGEPSQDILVTDQTRLEATISPET CDN_MHCII 0
52 Pt27 DEVLGEPSQDILVTDQTRLEATISPET DAI_MHCI_affinity 6485.2
53 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Dissimilarity_MHCI 0
54 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Dissimilarity_MHCII 0
55 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Expression_mutated_transcript NA
56 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Generator_rate_ADN_MHCI 0
57 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Generator_rate_ADN_MHCII 0
58 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Generator_rate_CDN_MHCI 0
59 Pt27 DEVLGEPSQDILVTDQTRLEATISPET Generator_rate_CDN_MHCII 0
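
The long table above carries the default column names `level_2` and `0`. As an alternative sketch, using `melt` instead of the `stack` call shown above, the same reshaping can name those columns explicitly. The toy data frame and the column names `annotation` and `value` are assumptions made for illustration.

```python
import pandas as pd

# toy stand-in for annotations_table, values copied from the output above
wide = pd.DataFrame({
    "patientIdentifier": ["Pt27", "Pt24"],
    "mutation.mutatedXmer": [
        "DEVLGEPSQDILVTDQTRLEATISPET", "RTNLLAALHRSVRWRAADQGHRSAFLV"],
    "gene": ["VCAN", "DCST2"],
    "ADN_MHCI": ["0", "0"],
})

# one row per (neoantigen, annotation) pair, with explicit column names
long_table = wide.melt(
    id_vars=["patientIdentifier", "mutation.mutatedXmer"],
    var_name="annotation",
    value_name="value",
)

# all annotations of a single patient
pt27 = long_table[long_table["patientIdentifier"] == "Pt27"]
```

Note that `melt` orders the rows by annotation rather than by neoantigen, so the row order differs from the stacked output above.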