tutorial

The TFmodeller program scans a protein sequence P against a library of protein-DNA complexes and, if good templates are found, builds comparative models of P bound to DNA. These models give an idea of the P-DNA interface, its evolution and the DNA sequences that P might recognise. This tutorial explains how to use it in these sections: input data, output, how it works, and performance analysis.

input data


1. fully automatic mode

To run TFmodeller you need the FASTA-formatted amino acid sequence of one or more proteins known or suspected to bind DNA, such as the FNR transcription factor of E. coli:
>P0A9E5|FNR_ECOLI Fumarate and nitrate reduction regulato...
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIGSGHHPSFAQALETSM
VCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLSRR
FAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDALA
QLAGHTRNVA
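Before pasting a sequence it can be useful to check that it is well-formed FASTA and contains only amino acid letters. The sketch below is a minimal illustration, not TFmodeller's own input check: the names `read_fasta`, `invalid_residues` and `STANDARD_AA` are hypothetical, and it assumes only the 20 standard one-letter codes are acceptable.

```python
# Minimal FASTA reader plus a check for non-standard amino acid letters.
# A sketch only: the server's own input validation is not described in this tutorial.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def read_fasta(text):
    """Return a list of (header, sequence) tuples; lines before the first '>' are ignored."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def invalid_residues(sequence):
    """Return the characters of `sequence` that are not one of the 20 standard amino acids."""
    return set(sequence) - STANDARD_AA

FNR_EXAMPLE = """>P0A9E5|FNR_ECOLI
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIGSGHHPSFAQALETSM
VCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLSRR
FAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDALA
QLAGHTRNVA
"""

for header, seq in read_fasta(FNR_EXAMPLE):
    bad = invalid_residues(seq)
    print(header, len(seq), "residues;", "ok" if not bad else f"unexpected: {sorted(bad)}")
```

Several records in the same text are returned as separate tuples, which matches the idea of submitting one or more proteins.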

Once you paste the protein sequence and type your email address, TFmodeller scans the sequence with PSI-BLAST against a weekly updated library of protein-DNA complexes. Each match found in this search is regarded as a template, and the BLAST alignment is then used to drive the building of comparative models of the input sequence in complex with DNA. This fully automated process builds monomeric complexes and is a useful first approach. However, many transcription factors bind DNA as multimeric complexes; you can model these using the user template + alignment mode.
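In comparative modelling, the alignment decides which template residue each query residue is modelled on. The sketch below is plain Python, not TFmodeller's code: it derives that correspondence from the two gapped rows of a pairwise alignment such as a BLAST hit. The function name `residue_map` and its start-numbering arguments are illustrative assumptions.

```python
def residue_map(query_aln, template_aln, query_start=1, template_start=1):
    """Map query residue numbers to the template residue numbers they are aligned with.

    query_aln / template_aln: the two rows of a pairwise alignment ('-' marks a gap).
    *_start: residue number of the first non-gap residue in each row.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned rows must have the same length")
    mapping = {}
    q, t = query_start, template_start
    for a, b in zip(query_aln, template_aln):
        if a != "-" and b != "-":
            mapping[q] = t  # this query residue would be modelled on this template residue
        if a != "-":
            q += 1
        if b != "-":
            t += 1
    return mapping

print(residue_map("AC-DE", "A-GDE", query_start=10, template_start=100))
```

Query residues facing a template gap get no entry, which is why poorly covered regions of the query are not modelled from the template.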

2. user template mode

Often you will already have an idea of which template is best for building the model, perhaps after checking interface similarity with the 3D-footprint database search form or after reading a paper. In that case, save the template coordinates in PDB format and upload them at the bottom of the input form. Typically, a PDB template file looks like this:
HEADER    GENE-REGULATORY PROTEIN                 12-AUG-91   1CGP
JRNL        AUTH   S.C.SCHULTZ,G.C.SHIELDS,T.A.STEITZ             
JRNL        TITL   CRYSTAL STRUCTURE OF A CAP-DNA COMPLEX: THE DNA
JRNL        TITL 2 IS BENT BY 90 DEGREES                          
JRNL        REF    SCIENCE                       V. 253  1001 1991
JRNL        REFN   ASTM SCIEAS  US ISSN 0036-8075                 
REMARK   (many remarks may follow...)
REMARK   2 RESOLUTION. 3.0  ANGSTROMS.                            
ATOM      1  N   PRO A   9      32.555  55.928  33.201  1.00 82.62
ATOM      2  CA  PRO A   9      31.300  56.105  32.474  1.00 82.25
ATOM      3  C   PRO A   9      30.441  54.837  32.272  1.00 81.70
ATOM      4  O   PRO A   9      30.717  53.724  32.761  1.00 80.50
ATOM      5  CB  PRO A   9      31.739  56.735  31.148  1.00 81.60
...
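Because ATOM records use fixed columns (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26, insertion code in column 27), the protein sequence of each chain can be read directly from a template file like the one above. A minimal sketch with hypothetical names, assuming only the 20 standard amino acids are wanted: nucleotides such as DA or DG and HETATM records (including modified residues) are skipped, which may differ from what TFmodeller does internally.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter protein sequence} built from ATOM records."""
    seqs, seen = {}, set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # HETATM, HEADER, REMARK, ... are ignored
        res_name = line[17:20].strip()            # columns 18-20
        chain = line[21]                          # column 22
        res_id = (chain, line[22:26], line[26])   # residue number + insertion code
        if res_name not in THREE_TO_ONE or res_id in seen:
            continue  # skip DNA nucleotides and atoms of residues already counted
        seen.add(res_id)
        seqs[chain] = seqs.get(chain, "") + THREE_TO_ONE[res_name]
    return seqs

atom = "ATOM      1  N   PRO A   9      32.555  55.928  33.201  1.00 82.62"
print(chain_sequences([atom, atom]))
```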

In this mode, TFmodeller extracts the protein sequence contained in the PDB template and tries to align it to the query sequence using BLAST2SEQ. If the resulting alignment is good enough (in terms of coverage and sequence identity), a comparative model of the protein-DNA complex is built.
To summarize, in this mode you need to: 1) paste the protein sequence of your query and 2) upload the PDB coordinates of your chosen template.
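Coverage and sequence identity can be defined in several ways, and the cut-offs TFmodeller applies are not stated in this tutorial. The sketch below assumes identity is computed over aligned residue pairs and coverage as the fraction of query residues paired with a template residue; `MIN_IDENTITY` and `MIN_COVERAGE` are hypothetical values chosen only for illustration.

```python
# Hypothetical acceptance thresholds, for illustration only.
MIN_IDENTITY = 20.0   # percent identity over aligned pairs
MIN_COVERAGE = 0.5    # fraction of query residues aligned to the template

def identity_and_coverage(query_aln, template_aln, query_length):
    """Return (percent identity over aligned pairs, fraction of query residues aligned)."""
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs), len(pairs) / query_length

def good_enough(query_aln, template_aln, query_length):
    """Apply the hypothetical thresholds above to one pairwise alignment."""
    identity, coverage = identity_and_coverage(query_aln, template_aln, query_length)
    return identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE

print(identity_and_coverage("ACDE-", "AC-EF", 4))  # (100.0, 0.75)
```

BLAST reports identity over the whole alignment length including gaps, so its %ID can be lower than the value computed here for the same alignment.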

3. user template + alignment mode

TFmodeller also accepts custom alignments of the query and template amino acid sequences. This can be useful when you are not satisfied with the automatic alignment. The alignment must also be in FASTA format, with the query sequence on top, followed by the template sequence. Headers are ignored:
 
>sp|P0A9E5|FNR_ECOLI monomer
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIG--SGHHPSFAQALET
SMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLS
RRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDA
LAQLAGHTRNVA
>template 1CGP chain A
---------------------------PTLEWFLSHCHIHKYP----------SKSTLIH
QGEKAETLYYIVKGSVAVLIKDEEGKEMILSYLNQGDFIGELGLFEEGQERSAWVRAKTA
CEVAEISYKKFRQLIQVNPDILMRLSAQMARRLQVTSEKVGNLAFLDVTGRIAQTLLNLA
K-QPDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILKMLEDQNLISAHGKTIVV-----
------------
Note that the server requires the template residues in the alignment (ignoring gaps) to match exactly the protein sequence in the uploaded PDB coordinate file.
To use the server in this mode you need to: 1) paste the protein sequence of your query, 2) upload the PDB coordinates of your chosen template and 3) upload the alignment in FASTA format.
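Before uploading a custom alignment you can check it against the format described above: exactly two sequences, query first, aligned rows of equal length, and template residues identical to those in the PDB file. A sketch with a hypothetical function name; `pdb_sequence` is assumed to be the one-letter sequence of the template chain(s), read from the ATOM records in chain order.

```python
def check_user_alignment(records, pdb_sequence):
    """Return a list of problems with a query/template FASTA alignment (empty if none).

    records: [(header, aligned_sequence), ...] with the query first and the template second.
    Headers are not inspected, since the server ignores them.
    pdb_sequence: one-letter sequence of the template protein chain(s) in the PDB file.
    """
    if len(records) != 2:
        return [f"expected 2 sequences, found {len(records)}"]
    (_, query_aln), (_, template_aln) = records
    problems = []
    if len(query_aln) != len(template_aln):
        problems.append("aligned rows differ in length")
    if template_aln.replace("-", "") != pdb_sequence:
        problems.append("template residues do not match the PDB sequence")
    return problems

print(check_user_alignment([("query", "AC-D"), ("template", "ACGD")], "ACGD"))  # []
```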

4. modelling a multimeric complex

TFmodeller can also build multimeric models, in which two or more protein chains bind to the same DNA molecule. This requires a multimeric template, either extracted from the PDB or generated with symmetry matrices, as explained here. As an example, we will model a dimer of FNR, the protein introduced earlier, which is known to function as a dimer. First we obtain the sequence of the FNR dimer by concatenating two copies of its sequence, which we paste into the sequence window (for a heterodimer, concatenate the two different sequences instead):
>sp|P0A9E5|FNR_ECOLI dimer
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIGSGHHPSFAQALET
SMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLS
RRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDA
LAQLAGHTRNVA
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIGSGHHPSFAQALET
SMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLS
RRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDA
LAQLAGHTRNVA
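Building the multimer query amounts to joining the chain sequences in the order you intend to align them to the template chains and writing the result as a single FASTA record. A sketch: the function name `multimer_query` and the 60-residue line width are choices made here, not server requirements.

```python
def multimer_query(header, chain_sequences, width=60):
    """Return one FASTA record holding the chain sequences joined in the given order."""
    # Gap characters are dropped: the query is an unaligned amino acid sequence.
    sequence = "".join(seq.replace("-", "") for seq in chain_sequences)
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"

fnr = "MIPEKRIIRR"  # first residues of FNR only, to keep the toy example short
print(multimer_query("FNR homodimer (toy)", [fnr, fnr], width=12))
```

For a homodimer the same sequence is passed twice; for a heterodimer, pass the two different sequences in template chain order.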
Then we need to align the FNR dimer to the dimeric PDB template, which contains two concatenated protein chains, A and B, and save the alignment in a FASTA-formatted text file (check the PDB file here):
>sp|P0A9E5|FNR_ECOLI dimer
MIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIERKKPIQKGQTLFK
AGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIG--SGHHPSFAQALET
SMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNAEERLAAFIYNLS
RRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAVKGKYITIENNDA
LAQLAGHTRNVAMIPEKRIIRRIQSGGCAIHCQDCSISQLCIPFTLNEHELDQLDNIIER
KKPIQKGQTLFKAGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDLVGFDAIG--S
GHHPSFAQALETSMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQDMILLLSKKNA
EERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLGRFQKSGMLAV
KGKYITIENNDALAQLAGHTRNVA
>template 1CGP chains A,B
---------------------------PTLEWFLSHCHIHKYP----------SKSTLIH
QGEKAETLYYIVKGSVAVLIKDEEGKEMILSYLNQGDFIGELGLFEEGQERSAWVRAKTA
CEVAEISYKKFRQLIQVNPDILMRLSAQMARRLQVTSEKVGNLAFLDVTGRIAQTLLNLA
K-QPDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILKMLEDQNLISAHGKTIVV-----
--------PTLEWFLSHCHIHKYPSK----------------------------------
-------STLIHQGEKAETLYYIVKGSVAVLIKDEEGKEMILSYLNQGDFIGELGLFEEG
QERSAWVRAKTACEVAEISYKKFRQLIQVNPDILMRLSAQMARRLQVTSEKVGNLAFLDV
TGRIAQTLLNLAK-QPDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILKMLEDQNLISA
HGKTIVV-----------------
Finally, as explained above, you need to: 1) paste the protein sequence of your query, 2) upload the PDB coordinates of your chosen template and 3) upload the alignment in FASTA format.
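One way to assemble a dimer alignment like the one above is to align each query copy to one template chain separately and then join the rows chain by chain (A, then B), so that the query row and the template row keep the same length. A sketch, assuming the per-chain alignments already exist; `concatenate_alignments` and `to_fasta` are hypothetical helpers and the rows in the example are toy data.

```python
def concatenate_alignments(chain_alignments):
    """Join per-chain (query_row, template_row) pairs into one two-row alignment.

    Pairs must be given in template chain order (e.g. chain A, then chain B).
    """
    query_rows, template_rows = [], []
    for index, (query_row, template_row) in enumerate(chain_alignments):
        if len(query_row) != len(template_row):
            raise ValueError(f"chain alignment {index}: rows differ in length")
        query_rows.append(query_row)
        template_rows.append(template_row)
    return "".join(query_rows), "".join(template_rows)

def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"

query_row, template_row = concatenate_alignments([("AC-D", "ACGD"), ("-ACD", "GACD")])
print(to_fasta([("query dimer", query_row), ("template chains A,B", template_row)]))
```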


output

After successfully receiving the submission data, TFmodeller starts the modelling job. Results are always emailed to the user, but you can also wait for them to appear in your browser. Results look like this:
# sequence library: /home1/tfmodell/db/dna_complexes_PDB.fas (Sun Mar  4 10:44:13 2007)

> P0A9E5_FNR_ECOLI_980419 number of comparative complexes = 1


_Matrix of homologous interface contacts:
_ stats: contacts=13 Nring=5 specif=0.38 entropy=0.67
_ PDBs:  1:1zrf_A,2:1cf7_A,3:1qbj_A,4:1sfu_A,5:2heo_A,6:1je8_A,7:1zlk_A,
_ PDBs:  8:1b8i_B,9:1tc3_C,10:2h27_A,11:1k61_A,12:2glo_A,13:2hdd_A,14:1rio_H,
_ -lnE:  1:25.1,2:8.4,3:8.1,4:8.1,5:7.1,6:6.0,7:5.6,8:5.5,9:5.3,10:5.2,
_ -lnE:  11:5.2,12:5.0,13:5.0,14:4.7,
_        1 2 3 4 5 6 7 8 91011121314
0196 R  ------------------YG-------- 1.33
0197 G* QT--------------HG---------- 0.88
0206 T* ST--------------SG----HC--TT 0.89
0207 V* RG--------------RG----RG--RG 0.67
0208 E* ECRG------SCKG--HG----RG--EC 0.91
0209 T  --RG------TATG----TG--QA--RT 1.02
0211 S  --YC------KTKA--RTRG--QG--RC 0.71
0212 R* RG--RG----VCNGNT--SGNC--IAQT 1.65
0213 L  ----------HTYA-------------- 0.60
0215 G  ----YGYGYGKG------FTST--KG-- 0.66
0216 R  --------------NA--RGNA--NA-- 0.59
0219 K  --------------------RG------ 0.66
0220 S  --------------RG------------ 0.37


model 1zrf_A 203 DNACOMPLEX resol=2.10 %ID=21 e-value=3e-56
_query    LDQLDNIIERKKPIQKGQTLFKAGDELKSLYAIRSGTIKSYTITEQGDEQITGFHLAGDL
_template LEWFLSHCHIHKYPSKS-TLIHQGEKAETLYYIVKGSVAVLIKDEEGKEMILSYLNQGDF
_contacts ................. ..........................................
_
_query    VGFDAIGS--GHHPSFAQALETSMVCEIPFETLDDLSGKMPNLRQQMMRLMSGEIKGDQD
_template IGELGLFEEGQERSAWVRAKTACEVAEISYKKFRQLIQVNPDILMRLSAQMARRLQVTSE
_contacts ........  ..................................................
_
_query    MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG
_template KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK
_contacts ........................ ................*........***...*...
_
_query    RFQKSGMLAVKGKYITI
_template MLEDQNLISAHGKTIVV
_contacts .................
_
_stats: 5/5 aligned contacting residues, 2/5 conserved
_modelled protein-DNA interface (N-ring contacts):
_
_1.00          0.23   1.00                        
_R0212A        T0206A E0208A                      
_G      A      t      C      G      c      a      
_:      :      :      :      :      :      :      
_       T      a      G      C      g      t      
_                            E0208A V0207A V0207A 
_                            1.00   0.00   0.00   
_
_template reference: A.A.NAPOLI et al. J.MOL.BIOL. V. 357 173 2006 
_template_info: INDIRECT READOUT OF DNA SEQUENCE AT THE PRIMARY-KINK 
_template_info: SITE IN THE CAP-DNA COMPLEX: RECOGNITION OF PYRIMIDINE-PURINE 
_template_info: AND PURINE-PURINE STEPS. 
_PDB model file P0A9E5_FNR_ECOLI_980419-1zrf_A.pdb

_compressed PDB models file P0A9E5_FNR_ECOLI_980419_compressed_models.tgz
# TFmodeller : emailing results for P0A9E5_FNR_ECOLI_980419
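The `_stats` line of each model can be reproduced from the aligned `_query`, `_template` and `_contacts` rows, assuming that `*` in the `_contacts` row marks a template residue in contact with DNA and that a contact counts as conserved when the query and template residues are identical. With these assumptions, the third alignment block of the example (the only one containing `*` marks) gives 5 contacts, 5 aligned and 2 conserved, matching `5/5 aligned contacting residues, 2/5 conserved`. The function name is hypothetical.

```python
def contact_stats(query_aln, template_aln, contacts):
    """Return (contacts marked '*', of which aligned to a query residue, of which identical)."""
    total = aligned = conserved = 0
    for q, t, c in zip(query_aln, template_aln, contacts):
        if c != "*":
            continue
        total += 1
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                conserved += 1
    return total, aligned, conserved

query = "MILLLSKKNAEERLAAFIYNLSRRFAQRGFSPREFRLTMTRGDIGNYLGLTVETISRLLG"
template = "KVGNLAFLDVTGRIAQTLLNLAKQ-PDAMTHPDGMQIKITRQEIGQIVGCSRETVGRILK"
contacts = "." * 24 + " " + "." * 16 + "*" + "." * 8 + "***" + "..." + "*" + "..."
print(contact_stats(query, template, contacts))  # (5, 5, 2)
```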

how does it work?

The figure shows a flow chart of TFmodeller with all the steps involved in a modelling job.

Performance analysis

It is important to acknowledge key observations that affect the value of results generated by TFmodeller: