Programmatic usage of NeoFox#

NeoFox provides an application programming interface (API) that enables its integration into other applications. This API relies heavily on Protocol Buffers data models, which provide placeholder objects to store the required data while enabling different representations, data manipulation, validation and normalization. We use the Protocol Buffers data models to generate Python code automatically and to implement validation and normalization around them. Because Protocol Buffers is technology agnostic, this may also facilitate the integration with third-party applications not necessarily implemented in Python (see https://developers.google.com/protocol-buffers). The API is tightly integrated with the Python data analysis library Pandas (see https://pandas.pydata.org/).

Here we show:

how to create new model objects
how to import/export these objects into different representations
how to manipulate them
how to validate and normalize the data

Finally, we show how to run NeoFox programmatically. You may want to skip to that part for a quick grasp of the API usage.

[1]:

import neofox
neofox.VERSION

[1]:

'0.6.1'

Neoantigens#

The neoantigen is the central piece of information that NeoFox handles; all output annotations refer to a neoantigen. A neoantigen is formed by a mutation subentity plus some additional attributes, such as the gene and the patient identifier. Here we show how to create a neoantigen, transform it into different representations and validate it.

Create a neoantigen#

Create a neoantigen candidate:

[5]:

from neofox.model.factories import NeoantigenFactory

# create a neoantigen candidate using the factory
neoantigen = NeoantigenFactory.build_neoantigen(
    mutated_xmer="DEVLGEPSQDILVTDQTRLEATISPET",
    wild_type_xmer="DEVLGEPSQDILVIDQTRLEATISPET",
    patient_identifier="P123",
    gene="VCAN",
    rna_expression=0.519506894,
    rna_variant_allele_frequency=0.857142857,
    dna_variant_allele_frequency=0.294573643,
    my_custom_annotation="add any custom annotation as additional fields with any name"
)

Representation into different formats#

Data conforming to the NeoFox data models can be represented in different formats. Here we show how to transform the data between several formats: JSON, Python dictionaries, Protocol Buffers binary representations, Pandas dataframes and tabular files. This enables data import and export and adds flexibility to the integration with other tools.

What is shown here is applicable to all entities in NeoFox data models.

These objects can be easily transformed into JSON:

[6]:

print(neoantigen.to_json(indent=2))

{
  "patientIdentifier": "P123",
  "gene": "VCAN",
  "mutation": {
    "position": [
      14
    ],
    "wildTypeXmer": "DEVLGEPSQDILVIDQTRLEATISPET",
    "mutatedXmer": "DEVLGEPSQDILVTDQTRLEATISPET"
  },
  "rnaExpression": 0.519506894,
  "imputedGeneExpression": null,
  "dnaVariantAlleleFrequency": 0.294573643,
  "rnaVariantAlleleFrequency": 0.857142857,
  "externalAnnotations": [
    {
      "name": "my_custom_annotation",
      "value": "add any custom annotation as additional fields with any name"
    }
  ]
}

They can also be transformed into a Python native dictionary:

[7]:

neoantigen.to_dict()

[7]:

{'patientIdentifier': 'P123',
 'gene': 'VCAN',
 'mutation': {'position': [14],
  'wildTypeXmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutatedXmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'rnaExpression': 0.519506894,
 'imputedGeneExpression': None,
 'dnaVariantAlleleFrequency': 0.294573643,
 'rnaVariantAlleleFrequency': 0.857142857,
 'externalAnnotations': [{'name': 'my_custom_annotation',
   'value': 'add any custom annotation as additional fields with any name'}]}

They can also be serialized into the Protocol Buffers binary format, which is more compact for storing the data or sending it over the wire:

[8]:

neoantigen.SerializeToString()

[8]:

b'\n\x04P123\x12\x04VCAN\x1a=\n\x01\x0e\x12\x1bDEVLGEPSQDILVIDQTRLEATISPET\x1a\x1bDEVLGEPSQDILVTDQTRLEATISPET%g\xfe\x04?5[\xd2\x96>=\xb7m[?JT\n\x14my_custom_annotation\x12<add any custom annotation as additional fields with any name'
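To make the binary representation less opaque, the following sketch decodes a few top-level fields by hand using only the Python standard library. It relies on the general Protocol Buffers wire format: a varint tag holding the field number and wire type, length-delimited payloads for strings, and 4 little-endian bytes for 32-bit floats. The helper names `read_varint` and `decode_fields` are hypothetical and not part of NeoFox. The fragment is taken from the bytes above with the nested mutation left out, and the meaning of each field number is inferred from the values rather than from the schema.

```python
import struct


def read_varint(data, pos):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(data):
    """Decode top-level fields; this sketch supports wire types 0, 2 and 5 only."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (strings, nested messages)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        elif wire_type == 5:  # 32 bits, little-endian, read here as a float
            value = struct.unpack("<f", data[pos:pos + 4])[0]
            pos += 4
        else:
            raise ValueError("Unsupported wire type {}".format(wire_type))
        fields.append((field_number, value))
    return fields


# patient identifier, gene and the first 32-bit float of the bytes shown above
fragment = b"\n\x04P123\x12\x04VCAN%g\xfe\x04?"
decoded = decode_fields(fragment)
print(decoded)
```

The float is stored with 32-bit precision, so the decoded value only approximates the 0.519506894 that was passed as `rna_expression`.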

Integration with Pandas#

NeoFox integrates with Pandas, the Python library for data analysis (see https://pandas.pydata.org/). A single object can be transformed into a Pandas Series and a list of objects can be transformed into a Pandas DataFrame. Pandas provides functionality to persist these tabular representations to files that can be stored and imported into other environments, for instance R.

What is shown here is applicable to all entities in NeoFox data models.

[9]:

import pandas as pd
from neofox.model.conversion import ModelConverter

Transform a list of mutations into a Pandas DataFrame:

[10]:

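from neofox.model.neoantigen import Mutation

mutation = Mutation(wild_type_xmer="DEVLGEPSQDILVIDQTRLEATISPET", mutated_xmer="DEVLGEPSQDILVTDQTRLEATISPET")
mutation2 = Mutation(
    wild_type_xmer="AAAAAAAAAAAAAAAAAAAAAAAAAAA",
    mutated_xmer="AAAAAAAAAAAAAGAAAAAAAAAAAAA")
# build one DataFrame row per mutation
mutations_df = ModelConverter.objects2dataframe([mutation, mutation2])
mutations_df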

[10]:

	position	wildTypeXmer	mutatedXmer
0	[]	DEVLGEPSQDILVIDQTRLEATISPET	DEVLGEPSQDILVTDQTRLEATISPET
1	[]	AAAAAAAAAAAAAAAAAAAAAAAAAAA	AAAAAAAAAAAAAGAAAAAAAAAAAAA

Persist any Pandas object into a file:

[11]:

mutations_df.to_csv("data/my_mutations.csv", sep="\t", index=False)

And read it back:

[12]:

mutations_df2 = pd.read_csv("data/my_mutations.csv", sep="\t")
mutations = []
for _, row in mutations_df2.iterrows():
    mutations.append(Mutation().from_dict(row.to_dict()))
mutations

[12]:

[Mutation(position='[]', wild_type_xmer='DEVLGEPSQDILVIDQTRLEATISPET', mutated_xmer='DEVLGEPSQDILVTDQTRLEATISPET'),
 Mutation(position='[]', wild_type_xmer='AAAAAAAAAAAAAAAAAAAAAAAAAAA', mutated_xmer='AAAAAAAAAAAAAGAAAAAAAAAAAAA')]
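Note that after the round trip through a text file the position field comes back as the string '[]' rather than as a list. One possible way to restore list-valued columns, sketched here with the standard library only, is to evaluate the text with `ast.literal_eval`. The table content is illustrative and held in memory; this is not necessarily how NeoFox handles such columns internally.

```python
import ast
import csv
import io

# a small tab-separated table like the one written above (illustrative values)
tsv = (
    "position\twildTypeXmer\tmutatedXmer\n"
    "[14]\tDEVLGEPSQDILVIDQTRLEATISPET\tDEVLGEPSQDILVTDQTRLEATISPET\n"
    "[]\tAAAAAAAAAAAAAAAAAAAAAAAAAAA\tAAAAAAAAAAAAAGAAAAAAAAAAAAA\n"
)

rows = []
for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    # literal_eval only accepts Python literals, so "[14]" becomes the list [14]
    row["position"] = ast.literal_eval(row["position"])
    rows.append(row)

print([row["position"] for row in rows])
```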

In some cases you may be handling nested objects, for instance a neoantigen. The nesting is flattened into the DataFrame by concatenating field names with a dot, e.g. mutation.wild_type_xmer. To read the flattened data back into the nested models we need to add an intermediate step.

[13]:

# the flattened dictionary
# (neoantigen_series is assumed to be the result of ModelConverter.object2series on a Neoantigen)
neoantigen_series.to_dict()

[13]:

{'patient_identifier': 'P123',
 'gene': 'VCAN',
 'rna_expression': 0.519506894,
 'imputed_gene_expression': 0.0,
 'dna_variant_allele_frequency': 0.294573643,
 'rna_variant_allele_frequency': 0.857142857,
 'external_annotations': [],
 'mutation.position': [],
 'mutation.wild_type_xmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
 'mutation.mutated_xmer': 'DEVLGEPSQDILVTDQTRLEATISPET',
 'neofox_annotations.annotations': [],
 'neofox_annotations.annotator': '',
 'neofox_annotations.annotator_version': '',
 'neofox_annotations.timestamp': '',
 'neofox_annotations.resources_hash': ''}

[14]:

# the nested dictionary
ModelConverter._flat_dict2nested_dict(flat_dict=neoantigen_series.to_dict())

[14]:

{'patient_identifier': 'P123',
 'gene': 'VCAN',
 'rna_expression': 0.519506894,
 'imputed_gene_expression': 0.0,
 'dna_variant_allele_frequency': 0.294573643,
 'rna_variant_allele_frequency': 0.857142857,
 'external_annotations': [],
 'mutation': {'position': [],
  'wild_type_xmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutated_xmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'neofox_annotations': {'annotations': [],
  'annotator': '',
  'annotator_version': '',
  'timestamp': '',
  'resources_hash': ''}}
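The intermediate step can be illustrated with a small standalone sketch. The functions `flat_to_nested` and `nested_to_flat` below are hypothetical helpers written for this example; they only mimic the dot-separated convention described above and are not the NeoFox implementation.

```python
def flat_to_nested(flat_dict, separator="."):
    """Rebuild nested dictionaries from keys such as 'mutation.position'."""
    nested = {}
    for key, value in flat_dict.items():
        *parents, leaf = key.split(separator)
        target = nested
        for parent in parents:
            # create the intermediate dictionary the first time it is seen
            target = target.setdefault(parent, {})
        target[leaf] = value
    return nested


def nested_to_flat(nested_dict, prefix="", separator="."):
    """Flatten nested dictionaries by joining field names with the separator."""
    flat = {}
    for key, value in nested_dict.items():
        name = prefix + separator + key if prefix else key
        if isinstance(value, dict):
            flat.update(nested_to_flat(value, name, separator))
        else:
            flat[name] = value
    return flat


flat = {
    "patient_identifier": "P123",
    "mutation.position": [],
    "mutation.wild_type_xmer": "DEVLGEPSQDILVIDQTRLEATISPET",
}
nested = flat_to_nested(flat)
print(nested)
```

Applying `nested_to_flat` to the result gives back the original flat dictionary, which is the property that makes the tabular representation reversible.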

[15]:

# we can load the nested dictionary into a nested model object
from neofox.model.neoantigen import Neoantigen

Neoantigen().from_dict(ModelConverter._flat_dict2nested_dict(flat_dict=neoantigen_series.to_dict()))

[15]:

Neoantigen(patient_identifier='P123', gene='VCAN', mutation=Mutation(position=[], wild_type_xmer='DEVLGEPSQDILVIDQTRLEATISPET', mutated_xmer='DEVLGEPSQDILVTDQTRLEATISPET'), rna_expression=0.519506894, imputed_gene_expression=0.0, dna_variant_allele_frequency=0.294573643, rna_variant_allele_frequency=0.857142857, neofox_annotations=NeoantigenAnnotations(annotations=[], annotator='', annotator_version='', timestamp='', resources_hash=''), external_annotations=[])

Data validation#

The quality and cleanliness of data are of great importance to enable effective data analysis and to make the data machine readable. Clean data means that the data is valid and in a normal, homogeneous form. The use of controlled vocabularies helps to represent knowledge in a standardised way. Cleaning data is a domain-specific task: although it can be assisted with the right tools, such as Pandas in Python or tidyverse in R, it requires domain expertise. NeoFox provides this domain expertise out of the box with its validation and normalization layers on top of its data models.

[11]:

from neofox.model.validation import ModelValidator
from neofox.exceptions import NeofoxDataValidationException

The data validation checks for missing required fields and shows relevant messages.

[15]:

from neofox.model.neoantigen import Neoantigen, Mutation

try:
    ModelValidator.validate_neoantigen(neoantigen=Neoantigen())
except NeofoxDataValidationException as e:
    print("Error message: {}".format(e))

[E 220208 11:32:29 validation:86] {}

Error message: A patient identifier is missing. Please provide patientIdentifier in the input file

It also performs more domain-specific validations, such as checking that amino acids are valid according to the IUPAC standard amino acid representation.

[18]:

try:
    NeoantigenFactory.build_neoantigen(
        patient_identifier="12345",
        gene="VCAN",
        mutated_xmer="123456AAAAAAAAAAAAAA", # wrong aminoacid representation
        wild_type_xmer="123456GAAAAAAAAAAAAA")
except NeofoxDataValidationException as e:
    print("Error message: {}".format(e))

[E 220208 11:59:28 validation:86] {
       "patientIdentifier": "12345",
       "gene": "VCAN",
       "mutation": {
          "position": [
             7
          ],
          "wildTypeXmer": "123456GAAAAAAAAAAAAA",
          "mutatedXmer": "123456AAAAAAAAAAAAAA"
       },
       "rnaExpression": null,
       "imputedGeneExpression": null,
       "dnaVariantAlleleFrequency": null,
       "rnaVariantAlleleFrequency": null
    }

Error message: Non existing aminoacid 1
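A check of this kind can be sketched in a few lines. The function `check_amino_acids` is a hypothetical helper that accepts only the 20 standard amino acids in one-letter IUPAC code; the exact set of codes accepted by NeoFox is an assumption here and may be broader.

```python
# the 20 standard amino acids in one-letter IUPAC code
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_amino_acids(sequence):
    """Return the upper-cased sequence or raise ValueError on the first invalid character."""
    for character in sequence:
        if character.upper() not in STANDARD_AMINO_ACIDS:
            raise ValueError("Non existing aminoacid {}".format(character))
    return sequence.upper()


try:
    check_amino_acids("123456AAAAAAAAAAAAAA")
except ValueError as error:
    print("Error message: {}".format(error))
```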

The data normalization layer ensures that the amino acid representation is normalized into one-letter IUPAC codes.

[19]:

valid_neoantigen = NeoantigenFactory.build_neoantigen(
    patient_identifier="12345",
    wild_type_xmer="AAAAAAAAAAAAA",
    mutated_xmer="aaaaaGaaaaa")

print(valid_neoantigen.mutation.to_json(indent=2))

{
  "position": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "wildTypeXmer": "AAAAAAAAAAAAA",
  "mutatedXmer": "AAAAAGAAAAA"
}
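For two sequences of the same length, the position field can be understood as the 1-based positions at which the mutated sequence differs from the wild type. The sketch below computes this for the neoantigen created at the beginning; `mutated_positions` is a hypothetical helper, and it deliberately does not cover sequences of different lengths such as the pair in the cell above, for which NeoFox applies its own logic.

```python
def mutated_positions(wild_type_xmer, mutated_xmer):
    """Return the 1-based positions where two equal-length sequences differ."""
    wild_type = wild_type_xmer.upper()
    mutated = mutated_xmer.upper()
    if len(wild_type) != len(mutated):
        raise ValueError("This sketch only compares sequences of equal length")
    return [
        index
        for index, (wt, mut) in enumerate(zip(wild_type, mutated), start=1)
        if wt != mut
    ]


positions = mutated_positions(
    "DEVLGEPSQDILVIDQTRLEATISPET",
    "devlgepsqdilvtdqtrleatispet",  # lower case is normalized before comparing
)
print(positions)
```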

After validation a unique neoantigen identifier is generated. This identifier is a hash of the normalized neoantigen representation, so two different representations of the same neoantigen share the same identifier after normalization.

[20]:

validated_neoantigen = ModelValidator.validate_neoantigen(neoantigen=neoantigen)
print(validated_neoantigen.to_json(indent=2))

{
  "patientIdentifier": "P123",
  "gene": "VCAN",
  "mutation": {
    "position": [
      14
    ],
    "wildTypeXmer": "DEVLGEPSQDILVIDQTRLEATISPET",
    "mutatedXmer": "DEVLGEPSQDILVTDQTRLEATISPET"
  },
  "rnaExpression": 0.519506894,
  "dnaVariantAlleleFrequency": 0.294573643,
  "rnaVariantAlleleFrequency": 0.857142857
}
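The idea of deriving an identifier from the normalized representation can be sketched as follows. The choice of fields, the JSON serialization and the SHA-256 hash are all assumptions made for illustration; the actual scheme used by NeoFox is not described here and may differ.

```python
import hashlib
import json


def normalized_identifier(patient_identifier, wild_type_xmer, mutated_xmer):
    """Hash a canonical JSON form of the normalized fields (hypothetical scheme)."""
    normalized = {
        "patientIdentifier": patient_identifier.strip(),
        "wildTypeXmer": wild_type_xmer.strip().upper(),
        "mutatedXmer": mutated_xmer.strip().upper(),
    }
    # sort_keys makes the serialization independent of the insertion order
    canonical = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


id_upper = normalized_identifier("P123", "DEVLGEPSQDILVIDQTRLEATISPET", "DEVLGEPSQDILVTDQTRLEATISPET")
id_lower = normalized_identifier("P123", "devlgepsqdilvidqtrleatispet", "devlgepsqdilvtdqtrleatispet")
print(id_upper == id_lower)
```

Because the hash is computed after normalization, the upper-case and lower-case inputs lead to the same identifier.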

Patients#

The neoantigen annotation process needs some context information, in particular some data about the individual in whom the somatic mutation creating this neoantigen took place. This information mainly includes the HLA types of the patient, which are needed to compute the binding of the potential neoepitopes.

Parse MHC I alleles into a normal representation#

The main complexity in the patient model is the representation of the MHC I and MHC II alleles present in the patient. The HLA alleles are typically represented using the nomenclature defined at http://hla.alleles.org, but in practice the community represents HLA alleles with some flexibility. NeoFox aims to normalize the different HLA representations into a controlled representation that agrees with the HLA nomenclature. NeoFox only supports the classic MHC genes and, although the provided HLA type is kept internally, it only works with the first 4 digits.

NeoFox has specific functions to parse a list of non-normalized HLA alleles into a normalized representation. Because allele representations are heterogeneous, we use the IPD-IMGT/HLA database to normalize ambiguous alleles (e.g. B15228 => HLA-B*15:228 and DPB110401 => HLA-DPB1*104:01).

Furthermore, the zygosity of each HLA gene is inferred.

[21]:

from neofox.references.references import ReferenceFolder
import os

os.environ["NEOFOX_REFERENCE_FOLDER"] = "/neofox_install/reference_data"
reference_folder = ReferenceFolder()
hla_database = reference_folder.get_mhc_database()

[I 210928 12:18:03 references:342] Reference genome folder: /neofox_install/reference_data
[I 210928 12:18:03 references:343] Resources
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_human.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_human.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db/Homo_sapiens.fa
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb/IEDB_homo_sapiens.fasta
[I 210928 12:18:03 references:345] /neofox_install/reference_data/hla_database_allele_list.csv

[22]:

# by default it loads the references for Homo sapiens and hence HLA, for mouse run:
h2_database = ReferenceFolder(organism='mouse').get_mhc_database()

[I 210928 12:18:03 references:342] Reference genome folder: /neofox_install/reference_data
[I 210928 12:18:03 references:343] Resources
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_mice.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_mice.txt
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:03 references:345] /neofox_install/reference_data/proteome_db/Mus_musculus.fa
[I 210928 12:18:03 references:345] /neofox_install/reference_data/iedb/IEDB_mus_musculus.fasta
[I 210928 12:18:03 references:345] /neofox_install/reference_data/h2_database_allele_list.csv

Parse a list of MHC I alleles. The data validation will ensure that the data is valid and it will infer the zygosity of the different genes. The data normalization layer will normalize the HLA representation into the valid HLA nomenclature including the first 4 digits. Different representations of the same allele will be matched after normalization.

[23]:

mhc1 = ModelConverter.parse_mhc1_alleles(
    ["HLA-A*01:01:02:03N", "HLA-A*01:02:02:03N", "B15228", "HLA-B*15:228:02:04N", "C03_163"],
    mhc_database=hla_database)
ModelConverter.objects2dataframe(mhc1)

[23]:

	name	zygosity	alleles
0	A	HETEROZYGOUS	[{'fullName': 'HLA-A*01:01:02:03N', 'name': 'H...
1	B	HOMOZYGOUS	[{'fullName': 'HLA-B*15:228', 'name': 'HLA-B*1...
2	C	HEMIZYGOUS	[{'fullName': 'HLA-C*03:163', 'name': 'HLA-C*0...

[24]:

ModelConverter.objects2dataframe(mhc1[0].alleles + mhc1[1].alleles + mhc1[2].alleles)

[24]:

	fullName	name	gene	group	protein
0	HLA-A*01:01:02:03N	HLA-A*01:01	A	01	01
1	HLA-A*01:02:02:03N	HLA-A*01:02	A	01	02
2	HLA-B*15:228	HLA-B*15:228	B	15	228
3	HLA-C*03:163	HLA-C*03:163	C	03	163
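The reduction to the first 4 digits shown in the table above can be approximated with a regular expression. This is a simplified sketch: `parse_mhc1_allele` and `HLA_PATTERN` are hypothetical names, only the classic genes A, B and C in colon-separated notation are handled, and ambiguous names such as B15228 are rejected because resolving them requires the HLA database.

```python
import re

# gene, allele group and protein; further fields and the expression suffix are ignored
HLA_PATTERN = re.compile(r"^(?:HLA-)?([ABC])\*?(\d{2}):(\d{2,3})(?::\d+)*[A-Z]?$")


def parse_mhc1_allele(allele):
    """Return the first two fields of a classic MHC I allele as a dictionary."""
    match = HLA_PATTERN.match(allele)
    if match is None:
        raise ValueError("Allele does not match HLA allele pattern {}".format(allele))
    gene, group, protein = match.groups()
    return {
        "fullName": allele,
        "name": "HLA-{}*{}:{}".format(gene, group, protein),
        "gene": gene,
        "group": group,
        "protein": protein,
    }


for allele in ["HLA-A*01:01:02:03N", "HLA-B*15:228:02:04N", "HLA-W*01:01:02:03N"]:
    try:
        print(parse_mhc1_allele(allele)["name"])
    except ValueError as error:
        print("Error message: {}".format(error))
```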

Validation and normalization of MHC alleles#

The data validation layer checks that the provided allele representations are valid.

[25]:

try:
    ModelConverter.parse_mhc1_alleles(["HLA-W*01:01:02:03N"], mhc_database=hla_database)  # bad gene W
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))

Error message: Allele does not match HLA allele pattern HLA-W*01:01:02:03N

[26]:

try:
    ModelConverter.parse_mhc1_alleles(["HLA-A*first:second:02:03N"], mhc_database=hla_database)  # bad allele representation
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))

Error message: Allele does not match HLA allele pattern HLA-A*first:second:02:03N

[27]:

try:
    ModelConverter.parse_mhc1_alleles(["HLA-A*01:02:02:03N", "HLA-A*01:03:02:03N", "HLA-A*01:04:02:03N"], mhc_database=hla_database)  # wrong number of alleles
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))

Error message: More than 2 alleles for gene A

A warning message is shown for HLA alleles that do not exist in the HLA database.

[28]:

ModelConverter.parse_mhc1_alleles(["HLA-B*01:02:02:03N", "HLA-C*01:02"], mhc_database=hla_database)

[W 210928 12:18:03 mhc_parser:159] Allele HLA-B*01:02:02:03N does not exist in the HLA database

[28]:

[Mhc1(name=<Mhc1Name.A: 0>, zygosity=<Zygosity.LOSS: 3>, alleles=[]),
 Mhc1(name=<Mhc1Name.B: 1>, zygosity=<Zygosity.HEMIZYGOUS: 2>, alleles=[MhcAllele(full_name='HLA-B*01:02:02:03N', name='HLA-B*01:02', gene='B', group='01', protein='02')]),
 Mhc1(name=<Mhc1Name.C: 2>, zygosity=<Zygosity.HEMIZYGOUS: 2>, alleles=[MhcAllele(full_name='HLA-C*01:02', name='HLA-C*01:02', gene='C', group='01', protein='02')])]
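The zygosity values seen in the outputs above are consistent with a simple rule on the number of normalized alleles per gene. The sketch below states that rule explicitly; `infer_zygosity` is a hypothetical helper derived from the examples in this section, not the NeoFox code.

```python
def infer_zygosity(allele_names):
    """Infer the zygosity of one gene from its normalized allele names."""
    if len(allele_names) > 2:
        raise ValueError("More than 2 alleles for this gene")
    if len(allele_names) == 0:
        return "LOSS"  # no allele provided for the gene
    if len(allele_names) == 1:
        return "HEMIZYGOUS"
    if allele_names[0] == allele_names[1]:
        return "HOMOZYGOUS"  # the same allele was provided twice
    return "HETEROZYGOUS"


print(infer_zygosity(["HLA-A*01:01", "HLA-A*01:02"]))
print(infer_zygosity(["HLA-B*15:228", "HLA-B*15:228"]))
print(infer_zygosity(["HLA-C*03:163"]))
print(infer_zygosity([]))
```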

Parse MHC II alleles into a normal representation#

The model for MHC II alleles is more complex as we need to reflect all combinations of alpha and beta chains, but the data validation and normalization provided by NeoFox is fundamentally the same.

Parse a list of MHC II alleles:

[29]:

mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04", "HLA-DPB1*01:01", "HLA-DPB1*01:01",
                                          "HLA-DQA1*01:01", "HLA-DQA1*01:01", "HLA-DQB1*02:01", "HLA-DQB1*02:01",
                                          "HLA-DRB1*01:01", "HLA-DRB1*01:01"], mhc_database=hla_database)

An MHC II gene with homozygous alpha and beta chains has a single isoform:

[30]:

mhc2[1].to_dict()

[30]:

{'name': 'DQ',
 'genes': [{'name': 'DQA1',
   'alleles': [{'fullName': 'HLA-DQA1*01:01',
     'name': 'HLA-DQA1*01:01',
     'gene': 'DQA1',
     'group': '01',
     'protein': '01'}]},
  {'name': 'DQB1',
   'alleles': [{'fullName': 'HLA-DQB1*02:01',
     'name': 'HLA-DQB1*02:01',
     'gene': 'DQB1',
     'group': '02',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DQA1*01:01-DQB1*02:01',
   'alphaChain': {'fullName': 'HLA-DQA1*01:01',
    'name': 'HLA-DQA1*01:01',
    'gene': 'DQA1',
    'group': '01',
    'protein': '01'},
   'betaChain': {'fullName': 'HLA-DQB1*02:01',
    'name': 'HLA-DQB1*02:01',
    'gene': 'DQB1',
    'group': '02',
    'protein': '01'}}]}

The MHC II DRB gene is a special case with no alpha chain represented, as this chain is not variable:

[31]:

mhc2[2].to_dict()

[31]:

{'genes': [{'alleles': [{'fullName': 'HLA-DRB1*01:01',
     'name': 'HLA-DRB1*01:01',
     'gene': 'DRB1',
     'group': '01',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DRB1*01:01',
   'betaChain': {'fullName': 'HLA-DRB1*01:01',
    'name': 'HLA-DRB1*01:01',
    'gene': 'DRB1',
    'group': '01',
    'protein': '01'}}]}

An MHC II gene with a heterozygous alpha chain and a homozygous beta chain has two isoforms:

[32]:

mhc2[0].to_dict()

[32]:

{'name': 'DP',
 'genes': [{'name': 'DPA1',
   'zygosity': 'HETEROZYGOUS',
   'alleles': [{'fullName': 'HLA-DPA1*01:03',
     'name': 'HLA-DPA1*01:03',
     'gene': 'DPA1',
     'group': '01',
     'protein': '03'},
    {'fullName': 'HLA-DPA1*01:04',
     'name': 'HLA-DPA1*01:04',
     'gene': 'DPA1',
     'group': '01',
     'protein': '04'}]},
  {'name': 'DPB1',
   'alleles': [{'fullName': 'HLA-DPB1*01:01',
     'name': 'HLA-DPB1*01:01',
     'gene': 'DPB1',
     'group': '01',
     'protein': '01'}]}],
 'isoforms': [{'name': 'HLA-DPA1*01:03-DPB1*01:01',
   'alphaChain': {'fullName': 'HLA-DPA1*01:03',
    'name': 'HLA-DPA1*01:03',
    'gene': 'DPA1',
    'group': '01',
    'protein': '03'},
   'betaChain': {'fullName': 'HLA-DPB1*01:01',
    'name': 'HLA-DPB1*01:01',
    'gene': 'DPB1',
    'group': '01',
    'protein': '01'}},
  {'name': 'HLA-DPA1*01:04-DPB1*01:01',
   'alphaChain': {'fullName': 'HLA-DPA1*01:04',
    'name': 'HLA-DPA1*01:04',
    'gene': 'DPA1',
    'group': '01',
    'protein': '04'},
   'betaChain': {'fullName': 'HLA-DPB1*01:01',
    'name': 'HLA-DPB1*01:01',
    'gene': 'DPB1',
    'group': '01',
    'protein': '01'}}]}

Beware that incomplete MHC II molecules missing one of the chains will have no isoforms and thus no binding will be computed on them. In the case below the beta chain allele for the DP gene is missing.

[33]:

mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04"], mhc_database=hla_database)
mhc2[1].to_dict()

[33]:

{'name': 'DQ',
 'genes': [{'name': 'DQA1', 'zygosity': 'LOSS'},
  {'name': 'DQB1', 'zygosity': 'LOSS'}]}
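The way isoforms follow from the alleles of the alpha and beta chains can be sketched as a Cartesian product of the distinct alleles of each chain. `build_isoforms` is a hypothetical helper for the DP and DQ genes only; the DRB special case without an alpha chain is not covered.

```python
from itertools import product


def build_isoforms(alpha_alleles, beta_alleles):
    """Pair every distinct alpha chain allele with every distinct beta chain allele."""
    # dict.fromkeys removes duplicates (homozygous genes) while keeping the order
    alphas = list(dict.fromkeys(alpha_alleles))
    betas = list(dict.fromkeys(beta_alleles))
    return [
        "{}-{}".format(alpha, beta.replace("HLA-", "", 1))
        for alpha, beta in product(alphas, betas)
    ]


# heterozygous alpha chain and homozygous beta chain: two isoforms
print(build_isoforms(["HLA-DPA1*01:03", "HLA-DPA1*01:04"], ["HLA-DPB1*01:01", "HLA-DPB1*01:01"]))
# missing beta chain: no isoforms
print(build_isoforms(["HLA-DPA1*01:03", "HLA-DPA1*01:04"], []))
```

An empty list on either side makes the product empty, which matches the behaviour described above for incomplete molecules.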

Create a patient#

[34]:

from neofox.model.neoantigen import Patient


mhc1 = ModelConverter.parse_mhc1_alleles(["HLA-A*01:01:02:03N", "HLA-A*01:02:02:03N",
                                          "HLA-B*15:01:02:03N", "HLA-B*15:01:02:04N",
                                          "HLA-C*03:02"], mhc_database=hla_database)
mhc2 = ModelConverter.parse_mhc2_alleles(["HLA-DPA1*01:03", "HLA-DPA1*01:04", "HLA-DPB1*01:01", "HLA-DPB1*01:01",
                                          "HLA-DQA1*01:01", "HLA-DQA1*01:01", "HLA-DQB1*02:01", "HLA-DQB1*02:01",
                                          "HLA-DRB1*01:01", "HLA-DRB1*01:01"], mhc_database=hla_database)
patient = Patient(
    identifier="P123",
    is_rna_available=True,
    tumor_type="NSCLC",
    mhc1=mhc1,
    mhc2=mhc2
)
ModelConverter.object2series(patient)

[34]:

identifier                                                       P123
is_rna_available                                                 True
tumor_type                                                      NSCLC
mhc1                [{'name': 'A', 'zygosity': 'HETEROZYGOUS', 'al...
mhc2                [{'name': 'DP', 'genes': [{'name': 'DPA1', 'zy...
Name: 0, dtype: object

Validate a patient#

[35]:

validated_patient = ModelValidator.validate_patient(patient)

A patient requires an identifier. MHC I and MHC II are both optional: if one or the other is not available, the output annotations are adapted accordingly.

[36]:

try:
    ModelValidator.validate_patient(Patient())  # missing patient identifier
except NeofoxDataValidationException as e:
    print ("Error message: {}".format(e))

[E 210928 12:18:04 validation:117] {}

Error message: A patient identifier is missing

[37]:

patient_without_mhc2 = ModelValidator.validate_patient(Patient(identifier="12345", mhc1=mhc1))

[38]:

patient_without_mhc1 = ModelValidator.validate_patient(Patient(identifier="12345", mhc2=mhc2))

Run Neofox#

Parse input data from a file#

Although we could create the data objects manually as shown above, for convenience it is useful to store the data in tabular format. Here we show how to parse the neoantigens and patients from tabular files.

The tabular file for neoantigens should look as follows:

[39]:

pd.read_csv("data/test_model_file.txt", sep="\t")

[39]:

	gene	transcript_identifier	mutation.mutatedXmer	mutation.wildTypeXmer	patientIdentifier
0	VCAN	uc003kii.3	DEVLGEPSQDILVTDQTRLEATISPET	DEVLGEPSQDILVIDQTRLEATISPET	Pt27
1	DCST2	uc001fgm.3	RTNLLAALHRSVRWRAADQGHRSAFLV	RTNLLAALHRSVRRRAADQGHRSAFLV	Pt24
2	NRAS	uc009wgu.3	MTEYKLVVVGACGVGKSALTIQLIQ	MTEYKLVVVGAGGVGKSALTIQLIQ	Pt28
3	CEP350	uc001gnt.3	QTDSSSSDMQACSKDKAKISLGSSIDS	QTDSSSSDMQACSQDKAKISLGSSIDS	Pt63
4	CPPED1	uc002dca.4	DRAIPLVLVSGNHYIGNTPTAETVEEF	DRAIPLVLVSGNHDIGNTPTAETVEEF	Pt77
5	CXorf26	uc004ecl.1	YNKAVYISVQDKEEEKGVNNGGEKRAD	YNKAVYISVQDKEGEKGVNNGGEKRAD	Pt117
6	IGSF9B	uc001qgx.4	ASTHLTVIGTSPHVPGSVRVQVSMTTA	ASTHLTVIGTSPHAPGSVRVQVSMTTA	Pt110
7	HEATR5A	uc001wrf.4	TRRDEKSHPFTNPQWATRVFAAECVCR	TRRDEKSHPFTNPRWATRVFAAECVCR	Pt26
8	CHRDL2	uc001ovh.3	ARPDMFCLFHGKRHFPGESWHPYLEPQ	ARPDMFCLFHGKRYFPGESWHPYLEPQ	Pt77

There is a specific function to parse an input file into a list of neoantigens. Any additional column not matching a field in the neoantigen model, in this case transcript_identifier, will be parsed into the external annotations. When executed from the command line interface, NeoFox adds these external annotations to the output together with the new annotations.

[40]:

neoantigens = ModelConverter.parse_neoantigens_file("data/test_model_file.txt")
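The treatment of additional columns can be illustrated without NeoFox. In this sketch the set `MODEL_FIELDS` only lists the model columns visible in the table above (the real model has more fields), and `split_row` is a hypothetical helper that separates them from everything else.

```python
import csv
import io

# column names taken from the table above; treated here as the known model fields
MODEL_FIELDS = {"gene", "mutation.mutatedXmer", "mutation.wildTypeXmer", "patientIdentifier"}


def split_row(row):
    """Separate known model fields from additional columns (external annotations)."""
    model = {key: value for key, value in row.items() if key in MODEL_FIELDS}
    external = [
        {"name": key, "value": value}
        for key, value in row.items()
        if key not in MODEL_FIELDS
    ]
    return model, external


tsv = (
    "gene\ttranscript_identifier\tmutation.mutatedXmer\tmutation.wildTypeXmer\tpatientIdentifier\n"
    "VCAN\tuc003kii.3\tDEVLGEPSQDILVTDQTRLEATISPET\tDEVLGEPSQDILVIDQTRLEATISPET\tPt27\n"
)
for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    model, external = split_row(row)
    print(external)
```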

The tabular file for patients should look as follows:

[41]:

pd.read_csv("data/test_patient_file.txt", sep="\t")

[41]:

	identifier	mhcIAlleles	mhcIIAlleles	isRnaAvailable	tumorType
0	Pt27	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
1	Pt24	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
2	Pt28	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
3	Pt63	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
4	Pt77	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
5	Pt117	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
6	Pt110	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC
7	Pt26	HLA-A*03:01,HLA-A*29:02,HLA-B*07:02,HLA-B*44:0...	HLA-DRB1*04:02,HLA-DRB1*08:01,HLA-DQA1*03:01,H...	True	HNSC

Parse the patients into the model objects as follows:

[42]:

patients = ModelConverter.parse_patients_file("data/test_patient_file.txt", mhc_database=hla_database)

Annotate your neoantigens#

Running NeoFox requires its configuration through a number of environment variables; this is described in detail elsewhere in the documentation. This configuration can also be provided through a file passed to the NeoFox class in the field configuration_file.

[43]:

from neofox.neofox import NeoFox
import os

[44]:

os.environ["NEOFOX_REFERENCE_FOLDER"] = "/neofox_install/reference_data/"
os.environ["NEOFOX_RSCRIPT"] = "/usr/bin/Rscript"
os.environ["NEOFOX_BLASTP"] = "/neofox_install/ncbi-blast-2.10.1+/bin/blastp"
os.environ["NEOFOX_NETMHCPAN"] = "/neofox_install/netMHCpan-4.1/netMHCpan"
os.environ["NEOFOX_NETMHC2PAN"] = "/neofox_install/netMHCIIpan-4.0/netMHCIIpan"
os.environ["NEOFOX_MIXMHCPRED"] = "/neofox_install/MixMHCpred-2.1/MixMHCpred"
os.environ["NEOFOX_MIXMHC2PRED"] = "/neofox_install/MixMHC2pred-1.2/MixMHC2pred_unix"
os.environ["NEOFOX_PRIME"] = "/neofox_install/PRIME-master/PRIME"
annotations = NeoFox(neoantigens=neoantigens, patients=patients, num_cpus=4).get_annotations()

[I 210928 12:18:06 references:342] Reference genome folder: /neofox_install/reference_data/
[I 210928 12:18:06 references:343] Resources
[I 210928 12:18:06 references:345] /neofox_install/reference_data/netmhc2pan_available_alleles_human.txt
[I 210928 12:18:06 references:345] /neofox_install/reference_data/netmhcpan_available_alleles_human.txt
[I 210928 12:18:06 references:345] /neofox_install/reference_data/iedb
[I 210928 12:18:06 references:345] /neofox_install/reference_data/proteome_db
[I 210928 12:18:06 references:345] /neofox_install/reference_data/proteome_db/Homo_sapiens.fa
[I 210928 12:18:06 references:345] /neofox_install/reference_data/iedb/IEDB_homo_sapiens.fasta
[I 210928 12:18:06 references:345] /neofox_install/reference_data/hla_database_allele_list.csv
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at VCAN:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 8.8404052357
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at DCST2:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.1283784886
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at NRAS:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 14.0097794749
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CEP350:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 4.1881530572
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CPPED1:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 3.2718222656
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CXorf26:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 13.5743362176
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at IGSF9B:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.0771477047
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at HEATR5A:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 2.8020526973
[I 210928 12:18:06 expression_imputation:71] Fetching the gene expression at CHRDL2:10
[I 210928 12:18:06 expression_imputation:77] Fetched a gene expression of 0.3346201976
[I 210928 12:18:06 neofox:127] Data loaded
[I 210928 12:18:06 neofox:199] Starting NeoFox annotations...
[I 210928 12:20:18 neofox:246] Elapsed time for annotating 9 neoantigens 130 seconds
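Since a run takes a while, it can be convenient to check the configuration before starting it. The sketch below lists which of the environment variables used in the cell above are unset; `missing_configuration` is a hypothetical helper, and treating every one of these variables as required is an assumption, as some tools may be optional.

```python
import os

# variable names as used in the cell above
REQUIRED_VARIABLES = [
    "NEOFOX_REFERENCE_FOLDER",
    "NEOFOX_RSCRIPT",
    "NEOFOX_BLASTP",
    "NEOFOX_NETMHCPAN",
    "NEOFOX_NETMHC2PAN",
    "NEOFOX_MIXMHCPRED",
    "NEOFOX_MIXMHC2PRED",
    "NEOFOX_PRIME",
]


def missing_configuration(environment=None):
    """Return the names of the variables that are unset or empty."""
    environment = os.environ if environment is None else environment
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]


# an incomplete configuration passed explicitly instead of reading os.environ
example = {"NEOFOX_REFERENCE_FOLDER": "/neofox_install/reference_data/"}
print(missing_configuration(example))
```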

NeoFox returns a list of annotations for each neoantigen. These are stored in an object called NeoantigenAnnotations, which contains the corresponding neoantigen identifier, the annotator (i.e. neofox), the annotator version, a timestamp and finally a list of the annotations.

[45]:

annotations[0].to_dict()

[45]:

{'patientIdentifier': 'Pt27',
 'gene': 'VCAN',
 'mutation': {'position': [14],
  'wildTypeXmer': 'DEVLGEPSQDILVIDQTRLEATISPET',
  'mutatedXmer': 'DEVLGEPSQDILVTDQTRLEATISPET'},
 'rnaExpression': 8.8404052357,
 'imputedGeneExpression': 8.8404052357,
 'dnaVariantAlleleFrequency': None,
 'rnaVariantAlleleFrequency': None,
 'neofoxAnnotations': {'annotations': [{'name': 'Best_rank_MHCI_score',
    'value': '3.906'},
   {'name': 'Best_rank_MHCI_score_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_rank_MHCI_score_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_score', 'value': '2982.7'},
   {'name': 'Best_affinity_MHCI_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_affinity_MHCI_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_9mer_score', 'value': '3.906'},
   {'name': 'Best_rank_MHCI_9mer_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_rank_MHCI_9mer_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_score', 'value': '2982.7'},
   {'name': 'Best_affinity_MHCI_9mer_allele', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_epitope', 'value': 'VTDQTRLEA'},
   {'name': 'Best_affinity_MHCI_score_WT', 'value': '9467.9'},
   {'name': 'Best_affinity_MHCI_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_affinity_MHCI_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_score_WT', 'value': '8.277'},
   {'name': 'Best_rank_MHCI_score_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_rank_MHCI_score_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_rank_MHCI_9mer_score_WT', 'value': '8.277'},
   {'name': 'Best_rank_MHCI_9mer_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Best_rank_MHCI_9mer_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_score_WT', 'value': '9467.9'},
   {'name': 'Best_affinity_MHCI_9mer_allele_WT', 'value': 'HLA-C*16:01'},
   {'name': 'Best_affinity_MHCI_9mer_epitope_WT', 'value': 'VIDQTRLEA'},
   {'name': 'Generator_rate_MHCI', 'value': '0'},
   {'name': 'Generator_rate_CDN_MHCI', 'value': '0'},
   {'name': 'Generator_rate_ADN_MHCI', 'value': '0'},
   {'name': 'PHBR_I', 'value': '6.2707'},
   {'name': 'Best_affinity_MHCI_9mer_position_mutation', 'value': '2'},
   {'name': 'Best_affinity_MHCI_9mer_anchor_mutated', 'value': '1'},
   {'name': 'Best_rank_MHCII_score', 'value': '3.26'},
   {'name': 'Best_rank_MHCII_score_epitope', 'value': 'SQDILVTDQTRLEAT'},
   {'name': 'Best_rank_MHCII_score_allele',
    'value': 'HLA-DQA1*03:01-DQB1*03:02'},
   {'name': 'Best_affinity_MHCII_score', 'value': '1103.5'},
   {'name': 'Best_affinity_MHCII_epitope', 'value': 'QDILVTDQTRLEATI'},
   {'name': 'Best_affinity_MHCII_allele', 'value': 'HLA-DRB1*08:01'},
   {'name': 'Best_rank_MHCII_score_WT', 'value': '6.14'},
   {'name': 'Best_rank_MHCII_score_epitope_WT', 'value': 'SQDILVIDQTRLEAT'},
   {'name': 'Best_rank_MHCII_score_allele_WT',
    'value': 'HLA-DQA1*03:01-DQB1*03:02'},
   {'name': 'Best_affinity_MHCII_score_WT', 'value': '562.6'},
   {'name': 'Best_affinity_MHCII_epitope_WT', 'value': 'QDILVIDQTRLEATI'},
   {'name': 'Best_affinity_MHCII_allele_WT', 'value': 'HLA-DRB1*08:01'},
   {'name': 'PHBR_II', 'value': '8.8958'},
   {'name': 'Generator_rate_MHCII', 'value': '0'},
   {'name': 'Generator_rate_CDN_MHCII', 'value': '0'},
   {'name': 'Generator_rate_ADN_MHCII', 'value': '0'},
   {'name': 'MixMHCpred_best_peptide', 'value': 'VTDQTRLEA'},
   {'name': 'MixMHCpred_best_score', 'value': '-0.09792'},
   {'name': 'MixMHCpred_best_rank', 'value': '10'},
   {'name': 'MixMHCpred_best_allele', 'value': 'HLA-A*29:02'},
   {'name': 'PRIME_best_peptide', 'value': 'LVTDQTRLE'},
   {'name': 'PRIME_best_score', 'value': '0.16349'},
   {'name': 'PRIME_best_rank', 'value': '5'},
   {'name': 'PRIME_best_allele', 'value': 'HLA-A*29:02'},
   {'name': 'MixMHC2pred_best_peptide', 'value': 'DEVLGEPSQDILVT'},
   {'name': 'MixMHC2pred_best_rank', 'value': '3.06'},
   {'name': 'MixMHC2pred_best_allele', 'value': 'HLA-DPA1*01:03-DPB1*04:01'},
   {'name': 'Expression_mutated_transcript', 'value': 'NA'},
   {'name': 'mutation_not_found_in_proteome', 'value': '1'},
   {'name': 'Amplitude_MHCI_affinity_9mer', 'value': '0.82656'},
   {'name': 'Amplitude_MHCI_affinity', 'value': '0.82656'},
   {'name': 'Amplitude_MHCII_rank', 'value': '1.8834'},
   {'name': 'Pathogensimiliarity_MHCI_9mer', 'value': '0'},
   {'name': 'Recognition_Potential_MHCI_9mer', 'value': '0'},
   {'name': 'Pathogensimiliarity_MHCII', 'value': '0'},
   {'name': 'DAI_MHCI_affinity', 'value': '6485.2'},
   {'name': 'CDN_MHCI', 'value': '0'},
   {'name': 'ADN_MHCI', 'value': '0'},
   {'name': 'CDN_MHCII', 'value': '0'},
   {'name': 'ADN_MHCII', 'value': '0'},
   {'name': 'Tcell_predictor_score', 'value': '0.5094749478464053'},
   {'name': 'Improved_Binder_MHCI', 'value': '1'},
   {'name': 'Selfsimilarity_MHCII', 'value': '0.9615321336133709'},
   {'name': 'Selfsimilarity_MHCI', 'value': '0.9746163344293891'},
   {'name': 'Selfsimilarity_MHCI_conserved_binder', 'value': 'NA'},
   {'name': 'Number_of_mismatches_MCHI', 'value': '1'},
   {'name': 'Priority_score', 'value': 'NA'},
   {'name': 'Neoag_immunogenicity', 'value': '142.3854'},
   {'name': 'IEDB_Immunogenicity_MHCI', 'value': '0.0263'},
   {'name': 'IEDB_Immunogenicity_MHCII', 'value': '0.24619'},
   {'name': 'Dissimilarity_MHCI', 'value': '0'},
   {'name': 'Dissimilarity_MHCII', 'value': '0'},
   {'name': 'vaxrank_binding_score', 'value': '0'},
   {'name': 'vaxrank_total_score', 'value': 'NA'},
   {'name': 'Hex_alignment_score_MHCI', 'value': '146'},
   {'name': 'Hex_alignment_score_MHCII', 'value': '374'}],
  'annotator': 'NeoFox',
  'annotatorVersion': '0.5.4.dev3',
  'timestamp': '20210928121942034880'},
 'externalAnnotations': [{'name': 'transcript_identifier',
   'value': 'uc003kii.3'}]}
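
In the dictionary representation above every annotation is a name/value pair, and every value is a string, with missing values written as 'NA'. When working with a single neoantigen outside of Pandas it can be convenient to index those pairs by name. The following is a minimal sketch in plain Python. The helper `annotations_to_dict` is hypothetical and not part of the NeoFox API, and the example entries are copied from the output above.

```python
from typing import Dict, List, Union

Value = Union[float, str, None]


def annotations_to_dict(annotations: List[Dict[str, str]]) -> Dict[str, Value]:
    """Index name/value pairs by name (hypothetical helper, not part of NeoFox).

    Numeric strings are parsed into floats, 'NA' becomes None and any
    other string (epitopes, allele names) is kept as is.
    """
    result: Dict[str, Value] = {}
    for entry in annotations:
        raw = entry["value"]
        if raw == "NA":
            result[entry["name"]] = None
            continue
        try:
            result[entry["name"]] = float(raw)
        except ValueError:
            # not a number, e.g. 'VTDQTRLEA' or 'HLA-C*16:01'
            result[entry["name"]] = raw
    return result


# a few entries copied from the output above
example = [
    {"name": "Best_rank_MHCI_score_epitope", "value": "VTDQTRLEA"},
    {"name": "Best_affinity_MHCI_score", "value": "2982.7"},
    {"name": "Best_affinity_MHCI_score_WT", "value": "9467.9"},
    {"name": "Priority_score", "value": "NA"},
]
lookup = annotations_to_dict(example)
print(lookup["Best_affinity_MHCI_score"], lookup["Priority_score"])
```

With such a lookup, the mutated and wild type affinities can be compared directly as numbers instead of as strings.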

Transform the annotations into a data frame#

[46]:

annotations_table = ModelConverter.annotations2table(neoantigens=annotations)
annotations_table.head(10)

[46]:

	patientIdentifier	gene	mutation.mutatedXmer	mutation.wildTypeXmer	mutation.position	dnaVariantAlleleFrequency	rnaVariantAlleleFrequency	rnaExpression	imputedGeneExpression	ADN_MHCI	...	Priority_score	Selfsimilarity_MHCI	Selfsimilarity_MHCII	Selfsimilarity_MHCI_conserved_binder	Tcell_predictor_score	mutation_not_found_in_proteome	transcript_identifier	vaxrank_binding_score	vaxrank_total_score
0	Pt27	VCAN	DEVLGEPSQDILVTDQTRLEATISPET	DEVLGEPSQDILVIDQTRLEATISPET	14	NA	NA	8.840405	8.840405	0	...	NA	0.9746163344293891	0.9615321336133709	NA	0.5094749478464053	1	uc003kii.3	0	NA
1	Pt24	DCST2	RTNLLAALHRSVRWRAADQGHRSAFLV	RTNLLAALHRSVRRRAADQGHRSAFLV	14	NA	NA	0.128378	0.128378	0	...	NA	0.9421875787623097	0.9391684897220461	0.9421875787623097	0.27191401693602235	1	uc001fgm.3	0.38942	NA
2	Pt28	NRAS	MTEYKLVVVGACGVGKSALTIQLIQ	MTEYKLVVVGAGGVGKSALTIQLIQ	12	NA	NA	14.009779	14.009779	0	...	NA	0.9330521460001094	0.9341826178609157	0.9330521460001094	0.5068878716790075	1	uc009wgu.3	1.4787	NA
3	Pt63	CEP350	QTDSSSSDMQACSKDKAKISLGSSIDS	QTDSSSSDMQACSQDKAKISLGSSIDS	14	NA	NA	4.188153	4.188153	0	...	NA	0.9822504402167844	0.9861912144506167	0.9822504402167844	0.3367698236194417	1	uc001gnt.3	0.23049	NA
4	Pt77	CPPED1	DRAIPLVLVSGNHYIGNTPTAETVEEF	DRAIPLVLVSGNHDIGNTPTAETVEEF	14	NA	NA	3.271822	3.271822	1	...	NA	0.9716393071320848	0.9451458345555184	NA	0.5720939101479084	1	uc002dca.4	0.8453	NA
5	Pt117	CXorf26	YNKAVYISVQDKEEEKGVNNGGEKRAD	YNKAVYISVQDKEGEKGVNNGGEKRAD	14	NA	NA	13.574336	13.574336	0	...	NA	0.9542175575949939	0.9697523262453144	0.9542175575949939	0.4886563384775356	1	uc004ecl.1	0	NA
6	Pt110	IGSF9B	ASTHLTVIGTSPHVPGSVRVQVSMTTA	ASTHLTVIGTSPHAPGSVRVQVSMTTA	14	NA	NA	0.077148	0.077148	0	...	NA	0.9874590500904996	0.9803010405217453	0.9874590500904996	0.09919940654277996	1	uc001qgx.4	3.6385	NA
7	Pt26	HEATR5A	TRRDEKSHPFTNPQWATRVFAAECVCR	TRRDEKSHPFTNPRWATRVFAAECVCR	14	NA	NA	2.802053	2.802053	0	...	NA	0.9614920042660836	0.9596541446763491	0.9614920042660836	0.3374084622586067	1	uc001wrf.4	3.1392	NA
8	Pt77	CHRDL2	ARPDMFCLFHGKRHFPGESWHPYLEPQ	ARPDMFCLFHGKRYFPGESWHPYLEPQ	14	NA	NA	0.334620	0.334620	0	...	NA	0.9782021524274523	0.9391077888953303	0.9782021524274523	0.49744103834505293	1	uc001ovh.3	0.91677	NA

9 rows × 96 columns
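
The annotation columns in the table above appear to hold strings, with 'NA' marking missing values, so they may need converting before any numeric filtering. The sketch below assumes that is the case and uses a hypothetical miniature table with values copied from the rows above instead of the real `annotations_table`. The 0.5 cut-off is arbitrary and only for illustration.

```python
import pandas as pd

# hypothetical miniature of the annotations table, values copied from the rows above
table = pd.DataFrame({
    "patientIdentifier": ["Pt27", "Pt24", "Pt28"],
    "gene": ["VCAN", "DCST2", "NRAS"],
    "Tcell_predictor_score": [
        "0.5094749478464053",
        "0.27191401693602235",
        "0.5068878716790075",
    ],
    "vaxrank_total_score": ["NA", "NA", "NA"],
})

numeric_columns = ["Tcell_predictor_score", "vaxrank_total_score"]
# errors="coerce" turns non-numeric strings such as "NA" into NaN
table[numeric_columns] = table[numeric_columns].apply(pd.to_numeric, errors="coerce")

# arbitrary cut-off, chosen only to illustrate numeric filtering
selected = table[table["Tcell_predictor_score"] > 0.5]
print(selected[["patientIdentifier", "gene", "Tcell_predictor_score"]])
```

The same pattern should carry over to the full table by listing the annotation columns of interest in `numeric_columns`.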

[47]:

annotations_table.set_index(['patientIdentifier', 'mutation.mutatedXmer']).stack().reset_index().head(60)

[47]:

	patientIdentifier	mutation.mutatedXmer	level_2	0
0	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	gene	VCAN
1	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	mutation.wildTypeXmer	DEVLGEPSQDILVIDQTRLEATISPET
2	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	mutation.position	14
3	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	dnaVariantAlleleFrequency	NA
4	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	rnaVariantAlleleFrequency	NA
5	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	rnaExpression	8.84041
6	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	imputedGeneExpression	8.84041
7	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	ADN_MHCI	0
8	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	ADN_MHCII	0
9	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Amplitude_MHCII_rank	1.8834
10	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Amplitude_MHCI_affinity	0.82656
11	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Amplitude_MHCI_affinity_9mer	0.82656
12	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_allele	HLA-DRB1*08:01
13	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_allele_WT	HLA-DRB1*08:01
14	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_epitope	QDILVTDQTRLEATI
15	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_epitope_WT	QDILVIDQTRLEATI
16	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_score	1103.5
17	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCII_score_WT	562.6
18	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_allele	HLA-C*16:01
19	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_allele_WT	HLA-C*16:01
20	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_anchor_mutated	1
21	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_epitope	VTDQTRLEA
22	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_epitope_WT	VIDQTRLEA
23	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_position_mutation	2
24	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_score	2982.7
25	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_9mer_score_WT	9467.9
26	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_allele	HLA-C*16:01
27	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_allele_WT	HLA-C*16:01
28	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_epitope	VTDQTRLEA
29	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_epitope_WT	VIDQTRLEA
30	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_score	2982.7
31	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_affinity_MHCI_score_WT	9467.9
32	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score	3.26
33	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score_WT	6.14
34	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score_allele	HLA-DQA1*03:01-DQB1*03:02
35	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score_allele_WT	HLA-DQA1*03:01-DQB1*03:02
36	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score_epitope	SQDILVTDQTRLEAT
37	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCII_score_epitope_WT	SQDILVIDQTRLEAT
38	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_allele	HLA-C*16:01
39	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_allele_WT	HLA-C*16:01
40	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_epitope	VTDQTRLEA
41	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_epitope_WT	VIDQTRLEA
42	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_score	3.906
43	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_9mer_score_WT	8.277
44	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score	3.906
45	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score_WT	8.277
46	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score_allele	HLA-C*16:01
47	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score_allele_WT	HLA-C*16:01
48	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score_epitope	VTDQTRLEA
49	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Best_rank_MHCI_score_epitope_WT	VIDQTRLEA
50	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	CDN_MHCI	0
51	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	CDN_MHCII	0
52	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	DAI_MHCI_affinity	6485.2
53	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Dissimilarity_MHCI	0
54	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Dissimilarity_MHCII	0
55	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Expression_mutated_transcript	NA
56	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Generator_rate_ADN_MHCI	0
57	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Generator_rate_ADN_MHCII	0
58	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Generator_rate_CDN_MHCI	0
59	Pt27	DEVLGEPSQDILVTDQTRLEATISPET	Generator_rate_CDN_MHCII	0
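
The long table above carries the default column names `level_2` and `0` that result from stacking and resetting the index. As an alternative sketch, `DataFrame.melt` produces the same kind of long format while letting us name those two columns explicitly. The miniature table below is hypothetical, with values copied from the first two rows of the annotations table, and the column names `annotation` and `value` are our own choice.

```python
import pandas as pd

# hypothetical miniature of the wide annotations table
wide = pd.DataFrame({
    "patientIdentifier": ["Pt27", "Pt24"],
    "mutation.mutatedXmer": [
        "DEVLGEPSQDILVTDQTRLEATISPET",
        "RTNLLAALHRSVRWRAADQGHRSAFLV",
    ],
    "gene": ["VCAN", "DCST2"],
    "mutation.position": ["14", "14"],
    "ADN_MHCI": ["0", "0"],
})

# one row per (neoantigen, annotation) pair, with explicit column names
long = wide.melt(
    id_vars=["patientIdentifier", "mutation.mutatedXmer"],
    var_name="annotation",
    value_name="value",
)
# melt goes column by column, so sort to keep each neoantigen's rows together
long = long.sort_values(["patientIdentifier", "annotation"]).reset_index(drop=True)
print(long)
```

A long table of this shape is convenient for grouping or plotting by annotation name.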