Web API And Programmatic Access

🧬 JSON Formatted Results - https://api.predictprotein.org


History

The API was started to supply PredictProtein result data to the MolArt JavaScript plugin (written by David Hoksza at the LCSB), and it has been designed to be easy to extend.


Retrieve a result set for a particular UniProt ID, accession number, or PredictProtein sequence hash:

GET /v1/results/{id|ppc_hash} - where {id} is a UniProt ID or accession number, and {ppc_hash} is a PredictProtein sequence hash. One of the two is mandatory.
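
As a minimal sketch, the request above could be issued with the Python standard library. The helper names results_url and fetch_results are hypothetical, and the response is assumed to be JSON, as the section heading suggests.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://api.predictprotein.org/v1/results"


def results_url(identifier: str) -> str:
    """Build the GET URL for a UniProt ID, accession number, or sequence hash."""
    # Percent-encode the identifier so it is safe to use as a path segment.
    return f"{BASE_URL}/{quote(identifier, safe='')}"


def fetch_results(identifier: str, timeout: float = 30.0):
    """Fetch and decode the result set (performs a network request)."""
    with urlopen(results_url(identifier), timeout=timeout) as response:
        # Assumption: the body is JSON; json.load reads the bytes from the response.
        return json.load(response)
```

Calling fetch_results("Q9Z1D9") would request https://api.predictprotein.org/v1/results/Q9Z1D9.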


By default, all results available through this API are returned for the specified identifier.
You may, however, specify additional options in the GET request to refine and/or restructure the returned results:


filter - returns only the specified feature.

Valid filter values:

  • CONSERVATION_(CONSEQ)
  • DISORDERED_REGION_(META-DISORDER)
  • DISULFIDE_BOND_(DISULFIND)
  • DNA_BINDING_(PRONA)
  • PROTEIN_BINDING_(PRONA)
  • RELATIVE_B-VALUE_(PROF-BVAL)
  • RNA_BINDING_(PRONA)
  • SECONDARY_STRUCTURE_(REPROF)
  • SOLVENT_ACCESSIBILITY_(REPROF)
  • TOPOLOGY_(TMSEG)

Example: https://api.predictprotein.org/v1/results/Q9Z1D9?filter=CONSERVATION_(CONSEQ)
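
The filter values contain parentheses, so a client may want to validate the value and let a URL library encode it. The sketch below uses a hypothetical helper, filtered_url, and assumes the server accepts the percent-encoded form of the parentheses (standard URL decoding), which the source does not state explicitly.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.predictprotein.org/v1/results"

# Filter values copied from the list above.
VALID_FILTERS = {
    "CONSERVATION_(CONSEQ)",
    "DISORDERED_REGION_(META-DISORDER)",
    "DISULFIDE_BOND_(DISULFIND)",
    "DNA_BINDING_(PRONA)",
    "PROTEIN_BINDING_(PRONA)",
    "RELATIVE_B-VALUE_(PROF-BVAL)",
    "RNA_BINDING_(PRONA)",
    "SECONDARY_STRUCTURE_(REPROF)",
    "SOLVENT_ACCESSIBILITY_(REPROF)",
    "TOPOLOGY_(TMSEG)",
}


def filtered_url(identifier: str, feature: str) -> str:
    """Build a results URL restricted to a single feature."""
    if feature not in VALID_FILTERS:
        raise ValueError(f"unknown filter value: {feature!r}")
    # urlencode percent-encodes "(" and ")" as %28 and %29.
    return f"{BASE_URL}/{identifier}?{urlencode({'filter': feature})}"
```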


format - adjusts the default data format for a particular purpose.

Valid format values:

  • protvista - rearranges the data and field relationships so that the ProtVista visualisation tool can consume the data directly.

Example: https://api.predictprotein.org/v1/results/Q9Z1D9?filter=CONSERVATION_(CONSEQ)&format=protvista


method - returns data for a specific method.

Valid method values:

  • consurf
  • disulfind
  • mdisorder
  • norsnet
  • profbval
  • prona
  • reprof
  • tmseg

Example: https://api.predictprotein.org/v1/results/Q9Z1D9?method=norsnet
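
The three optional parameters can be assembled into one query string. This is a sketch only: query_string is a hypothetical helper, and the source shows filter combined with format but does not say whether method may be combined with the others, so the code does not enforce any rule about that.

```python
from urllib.parse import urlencode

# Values copied from the lists above.
VALID_FORMATS = {"protvista"}
VALID_METHODS = {
    "consurf", "disulfind", "mdisorder", "norsnet",
    "profbval", "prona", "reprof", "tmseg",
}


def query_string(feature_filter=None, data_format=None, method=None) -> str:
    """Return the query string for whichever optional parameters were given."""
    if data_format is not None and data_format not in VALID_FORMATS:
        raise ValueError(f"unknown format value: {data_format!r}")
    if method is not None and method not in VALID_METHODS:
        raise ValueError(f"unknown method value: {method!r}")
    options = {"filter": feature_filter, "format": data_format, "method": method}
    # Drop options that were not supplied; dicts preserve insertion order.
    return urlencode({key: value for key, value in options.items() if value is not None})
```

For instance, query_string(method="norsnet") yields the query part of the example URL above.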


Retrieve a result set for an amino-acid sequence, supplied in FASTA format or as a plain sequence inside a valid (stringified) JSON payload:

POST /v1/results - where JSON payload contains:

{
    "protein": {
        "sequence": "<FASTA-format or plain sequence>"
    }
}                
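
A minimal sketch of preparing this POST request in Python follows. The helper name build_post_request is hypothetical, and setting the Content-Type: application/json header is an assumption for this endpoint, since the source does not mention it here.

```python
import json
from urllib.request import Request

RESULTS_URL = "https://api.predictprotein.org/v1/results"


def build_post_request(sequence: str) -> Request:
    """Wrap a FASTA-format or plain sequence in the JSON payload shown above."""
    body = json.dumps({"protein": {"sequence": sequence}}).encode("utf-8")
    return Request(
        RESULTS_URL,
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The prepared request can be sent with urllib.request.urlopen(build_post_request(sequence)).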

🧬 Result File Information and Retrieval Access - https://predictprotein.org/api


Description

Because we realize that the scientific community would benefit from access to the more than 50 files generated for each submitted amino-acid sequence, we created this API, which gives all users access to all result files for any already-computed protein sequence.

PredictProtein runs a series of methods, developed and contributed by various researchers. Each method generates at least one result/output file and may or may not depend on another method in the PredictProtein pipeline. A method that is unable to generate results returns no data and therefore creates no output file. Once all of the methods in the pipeline have run, the result set is ready to be translated into the visualizations that you see on predictprotein.org.

Some request parameters are mutually exclusive: only one parameter of such a group may be used in a request, never in combination with the others in that group.

PPC Cache Result File Information and Data Download

GET | POST /ppc_fetch - Combined with mandatory and optional parameters, this endpoint lets you:

  • Check whether any files, and which ones, were produced by one or more specified methods
  • Get a listing of all files produced for a particular amino-acid sequence or unique PredictProtein sequence hash identifier in JSON format
  • Download a particular result file in its original format, or a group of result files compressed in a zip archive

When using POST, pass the same parameters described here as keys of a JSON object in the request body, and make sure that at least the Content-Type: application/json HTTP header is set.

For example, the following POST request to https://predictprotein.org/api/ppc_fetch checks whether, and which, files exist for the specified sequence for the methods "coils", "reprof", and "loctree3" (written as "lc3" in the request):

{
    "action": "has",
    "sequence": "MAAGSGVVPPPLGAGLCTVKVEEDSPGNQESSGSGDWQNPETSRKQFRQLRYQEVAGPEEALSRLWELCRRWLRPELLSKEQIMELLVLEQFLTILPQELQAYVRDHSPESGEEAAALARTLQRALDRASPQGFMTFKDVAESLTWEEWEQLAAARKGFCEESTKDAGSTVVPGLETRTVNTDVILKQEILKEAEPQAWLQEVSQGMVPALTKCGDPSEDWEEKLPKAAVLLQLQGSEEQGRTAIPLLIGVSREERDSKNNESENSGSSVLGQHIQTAEGLGTNSQCGDDHKQGFHVKCHSVKPHSSVDSAVGLLETQRQFQEDKPYKCDSCEKGFRQRSDLFKHQRIHTGEKPYQCQECGKRFSQSAALVKHQRTHTGEKPYACPECGECFRQSSHLSRHQRTHASEKYYKCEECGEIVHVSSLFRHQRLHRGERPYKCGDCEKSFRQRSDLFKHQRTHTGEKPYACVVCGRRFSQSATLIKHQRTHTGEKPYKCFQCGERFRQSTHLVRHQRIHQNSVS",
    "method": "coils,reprof,lc3"
}                    

As a GET request, the URL for this example would be:

https://predictprotein.org/api/ppc_fetch?action=has&sequence=MAAGSGVVPPPLGAGLCTVKVEEDSPGNQESSGSGDWQNPETSRKQFRQLRYQEVAGPEEALSRLWELCRRWLRPELLSKEQIMELLVLEQFLTILPQELQAYVRDHSPESGEEAAALARTLQRALDRASPQGFMTFKDVAESLTWEEWEQLAAARKGFCEESTKDAGSTVVPGLETRTVNTDVILKQEILKEAEPQAWLQEVSQGMVPALTKCGDPSEDWEEKLPKAAVLLQLQGSEEQGRTAIPLLIGVSREERDSKNNESENSGSSVLGQHIQTAEGLGTNSQCGDDHKQGFHVKCHSVKPHSSVDSAVGLLETQRQFQEDKPYKCDSCEKGFRQRSDLFKHQRIHTGEKPYQCQECGKRFSQSAALVKHQRTHTGEKPYACPECGECFRQSSHLSRHQRTHASEKYYKCEECGEIVHVSSLFRHQRLHRGERPYKCGDCEKSFRQRSDLFKHQRTHTGEKPYACVVCGRRFSQSATLIKHQRTHTGEKPYKCFQCGERFRQSTHLVRHQRIHQNSVS&method=coils,reprof,lc3
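
Because GET and POST take the same parameters, one dictionary can feed both request styles. The sketch below uses the hypothetical helper names as_get_url and as_post_request; the parameter names and the Content-Type header come from the text above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

PPC_FETCH_URL = "https://predictprotein.org/api/ppc_fetch"


def as_get_url(params: dict) -> str:
    """Encode the parameters as a query string for a GET request."""
    # safe="," leaves the comma-separated method list unescaped, as in the URL above.
    return f"{PPC_FETCH_URL}?{urlencode(params, safe=',')}"


def as_post_request(params: dict) -> Request:
    """Encode the same parameters as a JSON body for a POST request."""
    return Request(
        PPC_FETCH_URL,
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For long sequences the POST form avoids very long URLs, which is a practical reason to prefer it.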

sequence OR hash - identifier to search for in the PredictProtein cache (mandatory, mutually exclusive).
  • hash - PredictProtein computed 40-character hash for an amino-acid sequence
  • sequence - a valid amino-acid sequence
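
A client can enforce this rule before sending anything. In the sketch below, identifier_param is a hypothetical helper; it checks only what the text states (exactly one identifier, and a 40-character hash) and does not attempt to validate the amino-acid alphabet.

```python
def identifier_param(sequence=None, hash_=None) -> dict:
    """Return the single identifier parameter, enforcing mutual exclusivity."""
    # Exactly one of the two must be supplied.
    if (sequence is None) == (hash_ is None):
        raise ValueError("specify exactly one of 'sequence' or 'hash'")
    if hash_ is not None:
        if len(hash_) != 40:
            raise ValueError("a PredictProtein hash has 40 characters")
        return {"hash": hash_}
    return {"sequence": sequence}
```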

action - specifies the service the server should perform for the specified hash or sequence (mandatory).

Valid action values:

  • has - returns a listing, in JSON format, of available file(s) either for:
    • the entire result set, if only parameter sequence or hash is specified
    • the files produced for one or more methods, using the optional method parameter
    • one particular file of interest, if the optional parameter file is used
  • get - returns computed data in either its original or zipped format, depending on additional options specified:
    • the entire result set, in a zipped archive, if only sequence or hash is specified
    • the files produced for one or more methods, in a zipped archive, using the optional method parameter
    • one particular file of interest, in its original raw format, when using the optional parameter file
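
The list above implies a small decision table for what a client should expect back. The following sketch restates it in code; expected_response and its return labels ("json", "zip", "raw") are hypothetical names chosen for illustration.

```python
def expected_response(action: str, method=None, file=None) -> str:
    """Return the kind of response described above for a given request."""
    if action == "has":
        # "has" always returns a JSON listing, whatever optional parameter is used.
        return "json"
    if action == "get":
        # A single file comes back raw; everything else is a zipped archive.
        return "raw" if file is not None else "zip"
    raise ValueError(f"unknown action: {action!r}")
```

A caller could use the label to decide whether to parse the body as JSON, open it as a zip archive, or write it to disk unchanged.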

method OR file - identifier to search for in the PredictProtein cache (optional, mutually exclusive).

Valid method values, of which one or more, separated by commas, may be specified:

  • coils
  • conseq
  • disulfind
  • hmmer
  • hssp
  • loctree3
  • mdisorder
  • mmseqs2
  • mstudent
  • norsnet
  • norsp
  • phdhtm
  • predictnls
  • profacc
  • profbval
  • profdisis
  • profisis
  • profsec
  • proftmb
  • prona
  • prosite_scan
  • psiblast
  • psic
  • reprof
  • seg
  • somena
  • tmhmm
  • tmseg

Data returned using the method parameter will be a compressed zipped archive containing those files created by the method(s) specified.
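
Once the archive bytes have been downloaded, they can be inspected in memory with Python's zipfile module. This is a sketch with hypothetical helper names; the layout of member names inside the archive is not documented in the source, so the code makes no assumption about it.

```python
import io
import zipfile


def list_archive(data: bytes) -> list:
    """Return the member names of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def extract_archive(data: bytes, target_dir: str) -> None:
    """Unpack all members of the archive into target_dir."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target_dir)
```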

Valid file values, of which only one may be specified:

  • query.arch.lc3
  • query.arch.lc3.pb
  • query.arch.lc3.svm
  • query.bact.lc3
  • query.bact.lc3.pb
  • query.bact.lc3.svm
  • query.blastPsiAli.gz
  • query.blastPsiMat
  • query.blastPsiRdb
  • query.blastpSwissM8
  • query.chk
  • query.clustalngz
  • query.coils
  • query.coils_raw
  • query.consurf.grades
  • query.consurf.html
  • query.disis
  • query.disulfinder
  • query.euka.lc3
  • query.euka.lc3.pb
  • query.euka.lc3.svm
  • query.fasta
  • query.hmm2pfam
  • query.hmm3pfam
  • query.hmm3pfamDomTbl
  • query.hmm3pfamTbl
  • query.hsspPsiFil.gz
  • query.in
  • query.isis
  • query.mdisorder
  • query.metastudent.BPO.txt
  • query.metastudent.CCO.txt
  • query.metastudent.MFO.txt
  • query.mmseqs2AliPdb
  • query.mmseqs2AliUref
  • query.nls
  • query.nlsDat
  • query.nlsSum
  • query.nors
  • query.norsnet
  • query.phdPred
  • query.phdRdb
  • query.prof1Rdb
  • query.profAscii
  • query.profb4snap
  • query.profbval
  • query.profRdb
  • query.proftmb
  • query.proftmbdat
  • query.prona
  • query.prosite
  • query.psic
  • query.reprof
  • query.segNorm
  • query.somena
  • query.sumNors
  • query.tmhmm
  • query.tmseg

Data returned using the file parameter will be the raw output file created by the specified method.
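
Since method and file are mutually exclusive, a client can build the optional part of the request in one place. In this sketch, optional_selector is a hypothetical helper; it joins several methods with commas, as described above, and accepts at most one file.

```python
def optional_selector(methods=None, file=None) -> dict:
    """Return the optional 'method' or 'file' parameter, never both."""
    if methods and file:
        raise ValueError("'method' and 'file' are mutually exclusive")
    if file:
        return {"file": file}
    if methods:
        # One or more method names, separated by commas.
        return {"method": ",".join(methods)}
    # Neither given: the request applies to the entire result set.
    return {}
```

The returned dictionary can be merged with the mandatory action and sequence or hash parameters before the request is sent.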

🧬 bio_embeddings API Access - https://embeddings.predictprotein.org/api


Description

bio_embeddings retrieves protein embeddings from protein sequences using a pipeline that:

  • embeds sequences into matrix-representations (per-amino-acid) or vector-representations (per-sequence) that can be used to train learning models or for analytical purposes
  • projects per-sequence embeddings into lower dimensional representations using UMAP or t-SNE (for lightweight data handling and visualizations)
  • visualizes low dimensional sets of per-sequence embeddings onto 2D and 3D interactive plots (with and without annotations)
  • extracts annotations from per-sequence and per-amino-acid embeddings using supervised (when available) and unsupervised approaches (e.g. by network analysis)

In PredictProtein, bio_embeddings is also used for providing secondary structure predictions.

For details on using this API, refer to the bio_embeddings API documentation.
