MEMSAT Usage Notes
==================

Program and documentation is Copyright (C) 1993 David T. Jones, all rights
reserved.

All Trademarks and Registered Names are acknowledged in this document.


Introduction
============

The enclosed program implements a new method for the prediction of the
secondary structure and topology of all-helix integral membrane proteins
based on the recognition of topological models. The method employs a set of
statistical tables (log likelihoods) compiled from well-characterized
membrane protein data, and a novel dynamic programming algorithm to
recognize membrane topology models by expectation maximization. The
statistical tables show definite biases towards certain amino acid species
on the inside, middle and outside of a cellular membrane.

The method is described in the following reference:

"Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) Biochemistry.
33:3038-3049."


Running MEMSAT
==============

After compilation, MEMSAT may be run by simply typing 'MEMSAT' with no
arguments, in which case it will display the available command-line options.

*IMPORTANT* Make sure that the parameter files memb_m.dat and memb_s.dat are
in your current directory when you run MEMSAT - unless you have set the
environment variable MEMSAT_DIR to the correct location of the files.

MEMSAT takes as input an amino acid sequence in most popular sequence file
formats (including GCG, PIR and Fasta). The simplest possible command would
be something like:

MEMSAT protein.pep

where protein.pep is an amino acid sequence file. Alternatively, aspects of
MEMSAT's operation may be modified using the following command line options:

-hnnn = set maximum number of helices (default 20)

This controls the maximum number of helices that will be considered.

-dnnn = set minimum sequence length per helix (default 28)

This sets the minimum amount of sequence required for each predicted
transmembrane helix. If the length of the sequence is l then MEMSAT will
permit at most l/28 helices to be predicted (assuming the default value of
28). For the 260-residue example below, this gives a limit of 9 helices.

-lnnn = set minimum loop length (default 6)

This sets the minimum length for a "loop" between two transmembrane helices
or at the N- or C-termini.

-mnnn = set minimum length of helix (default 17)
-xnnn = set maximum length of helix (default 25)

The above options simply set the range of helix lengths that will be
considered. The default minimum of 17 is atypically short, and a more
general minimum of 19 might be preferred.

-snnn = set helix score cutoff (default 100)

If MEMSAT appears to be underpredicting the number of helices in your
sequence, then this parameter may be reduced from the default of 100.
Likewise, if it is overpredicting, then this parameter may be increased.
This value can be set to any integer (positive or negative), but generally
should be in the range -5000 to 5000.

Command line examples:

MEMSAT protein.pep
MEMSAT -m19 -x24 -l4 t7recep.seq
MEMSAT -h8 someseq.fasta

NOTE that there must be no spaces between the option letter and the value.


Example Results
===============

In this example MEMSAT is used to predict the secondary structure and
topology of bacterial archaerhodopsin. The input file (in compact PIR
format) is as follows:

>>>>BACA_HALS1
ARCHAERHODOPSIN PRECURSOR (AR). 5/92
MDPIALTAAVGADLLGDGRPETLWLGIGTLLMLIGTFYFIVKGWGVTDKEAREYYSITILVPGIASAAYLSM
FFGIGLTEVQVGSEMLDIYYARYADWLFTTPLLLLDLALLAKVDRVSIGTLVGVDALMIVTGLVGALSHTPL
ARYTWWLFSTICMIVVLYFLATSLRAAAKERGPEVASTFNTLTALVLVLWTAYPILWIIGTEGAGVVGLGIE
TLLFMVLDVTAKVGFGFILLRSRAILGDTEAPEPSAGAEASAAD*

The following output is produced (the comments in square brackets are not
part of the actual program output):

ARCHAERHODOPSIN PRECURSOR (AR). 5/92
260 residues read from file.

[Check that the number of residues matches the number in the file - a
discrepancy indicates that MEMSAT has not read the sequence correctly.
The intermediate output that follows shows the progress of MEMSAT's
analysis of different topologies. Due to the way the algorithm works, it
considers the single helix topologies first.]

Helix 1 from 22 (in) to 40 (out) : 3176

[This summarises one of the evaluated topologies. The final number is the
helix score - a value > 2000 indicates a strongly predicted helix; a value
< 500 indicates a weakly predicted helix. These values have been scaled up
by a factor of 1000 to convert them to integers - this is done to speed up
the program.]

++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO-------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------------------------------

[This is a simple string representation of the predicted structure and
topology.
A '-' indicates an outside loop residue, '+' inside loop, 'O' outside
helix cap, 'X' helix middle and 'I' inside helix cap]

Score = -11.2050000

Helix 1 from 148 (out) to 164 (in) : 3939
------------------------------------------------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII+++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++
Score = -8.0150000

[The overall score for the above topology - a value of -100000 indicates
that the topology violates one of MEMSAT's constraints such as minimum
helix score]

Helix 1 from 6 (in) to 22 (out) : -3776
Helix 2 from 29 (out) to 45 (in) : 1175
Helix 3 from 53 (in) to 77 (out) : 3353
Helix 4 from 95 (out) to 111 (in) : 882
Helix 5 from 119 (in) to 138 (out) : 2021
Helix 6 from 148 (out) to 164 (in) : 3939
Helix 7 from 178 (in) to 194 (out) : 1568
Helix 8 from 201 (out) to 221 (in) : 129
Helix 9 from 228 (in) to 244 (out) : -1554
++++++IIIIXXXXXXXXXOOOO------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII+++++++++++++IIIIXXXXXXXXXOOOO------OOOOXXXXXXXXXXX
XXIIII++++++IIIIXXXXXXXXXOOOO---------------
Score = -100000.0000000

Helix 1 from 22 (out) to 40 (in) : 3815
Helix 2 from 53 (in) to 77 (out) : 3353
Helix 3 from 95 (out) to 111 (in) : 882
Helix 4 from 119 (in) to 138 (out) : 2021
Helix 5 from 148 (out) to 164 (in) : 3939
Helix 6 from 178 (in) to 194 (out) : 1568
Helix 7 from 201 (out) to 221 (in) : 129
Helix 8 from 228 (in) to 244 (out) : -1554
----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII+++++++++++++IIIIXXXXXXXXXOOOO------OOOOXXXXXXXXXXX
XXIIII++++++IIIIXXXXXXXXXOOOO---------------
Score = -100000.0000000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 56 (out) to 77 (in) : 3202
Helix 3 from 94 (in) to 112 (out) : 737
Helix 4 from 119 (out) to 138 (in) : 2158
Helix 5 from 146 (in) to 167 (out) : 3309
Helix 6 from 178 (out) to 202 (in) : 2665
Helix 7 from 209 (in) to 233 (out) : 835
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO---------------OOOOXXXXXXXXXXXX
XXIIII++++++++++++++++IIIIXXXXXXXXXXXOOOO------OOOOXXXXXXXXXXXXIIII+++++
++IIIIXXXXXXXXXXXXXXOOOO----------OOOOXXXXXXXXXXXXXXXXXIIII++++++IIIIXXX
XXXXXXXXXXXXXXOOOO--------------------------
Score = 18.5380000

Helix 1 from 22 (out) to 40 (in) : 3815
Helix 2 from 53 (in) to 77 (out) : 3353
Helix 3 from 95 (out) to 111 (in) : 882
Helix 4 from 119 (in) to 138 (out) : 2021
Helix 5 from 148 (out) to 164 (in) : 3939
Helix 6 from 185 (in) to 204 (out) : 3106
----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
--------------------------------------------
Score = 25.2460000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 56 (out) to 77 (in) : 3202
Helix 3 from 117 (in) to 138 (out) : 2172
Helix 4 from 148 (out) to 164 (in) : 3939
Helix 5 from 185 (in) to 204 (out) : 3106
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO---------------OOOOXXXXXXXXXXXX
XXIIII+++++++++++++++++++++++++++++++++++++++IIIIXXXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
--------------------------------------------
Score = 15.7350000

Helix 1 from 22 (out) to 40 (in) : 3815
Helix 2 from 53 (in) to 77 (out) : 3353
Helix 3 from 148 (out) to 164 (in) : 3939
Helix 4 from 185 (in) to 204 (out) : 3106
----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
--------------------------------------------
Score = 16.5840000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 148 (out) to 164 (in) : 3939
Helix 3 from 185 (in) to 204 (out) : 3106
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO-------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
--------------------------------------------
Score = 5.9440000

Helix 1 from 148 (out) to 164 (in) : 3939
Helix 2 from 185 (in) to 204 (out) : 3106
------------------------------------------------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
--------------------------------------------
Score = 1.4670000

Helix 1 from 6 (out) to 22 (in) : -4082
Helix 2 from 29 (in) to 45 (out) : 1608
Helix 3 from 53 (out) to 69 (in) : 2478
Helix 4 from 76 (in) to 92 (out) : -3470
Helix 5 from 99 (out) to 115 (in) : 347
Helix 6 from 122 (in) to 138 (out) : 2058
Helix 7 from 148 (out) to 164 (in) : 3939
Helix 8 from 185 (in) to 204 (out) : 3106
Helix 9 from 216 (out) to 235 (in) : 1402
------OOOOXXXXXXXXXIIII++++++IIIIXXXXXXXXXOOOO-------OOOOXXXXXXXXXIIII++
++++IIIIXXXXXXXXXOOOO------OOOOXXXXXXXXXIIII++++++IIIIXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = -100000.0000000

Helix 1 from 6 (in) to 22 (out) : -3776
Helix 2 from 29 (out) to 45 (in) : 1175
Helix 3 from 53 (in) to 77 (out) : 3353
Helix 4 from 95 (out) to 111 (in) : 882
Helix 5 from 119 (in) to 138 (out) : 2021
Helix 6 from 148 (out) to 164 (in) : 3939
Helix 7 from 185 (in) to 204 (out) : 3106
Helix 8 from 216 (out) to 235 (in) : 1402
++++++IIIIXXXXXXXXXOOOO------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = -100000.0000000

Helix 1 from 22 (out) to 40 (in) : 3815
Helix 2 from 53 (in) to 77 (out) : 3353
Helix 3 from 95 (out) to 111 (in) : 882
Helix 4 from 119 (in) to 138 (out) : 2021
Helix 5 from 148 (out) to 164 (in) : 3939
Helix 6 from 185 (in) to 204 (out) : 3106
Helix 7 from 216 (out) to 235 (in) : 1402
----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = 31.8460000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 56 (out) to 77 (in) : 3202
Helix 3 from 117 (in) to 138 (out) : 2172
Helix 4 from 148 (out) to 164 (in) : 3939
Helix 5 from 185 (in) to 204 (out) : 3106
Helix 6 from 216 (out) to 235 (in) : 1402
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO---------------OOOOXXXXXXXXXXXX
XXIIII+++++++++++++++++++++++++++++++++++++++IIIIXXXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = 22.3350000

Helix 1 from 22 (out) to 40 (in) : 3815
Helix 2 from 53 (in) to 77 (out) : 3353
Helix 3 from 148 (out) to 164 (in) : 3939
Helix 4 from 185 (in) to 204 (out) : 3106
Helix 5 from 216 (out) to 235 (in) : 1402
----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = 23.1840000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 148 (out) to 164 (in) : 3939
Helix 3 from 185 (in) to 204 (out) : 3106
Helix 4 from 216 (out) to 235 (in) : 1402
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO-------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = 12.5440000

Helix 1 from 148 (out) to 164 (in) : 3939
Helix 2 from 185 (in) to 204 (out) : 3106
Helix 3 from 216 (out) to 235 (in) : 1402
------------------------------------------------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
Score = 8.0670000

Helix 1 from 22 (in) to 40 (out) : 3176
Helix 2 from 148 (out) to 164 (in) : 3939
++++++++++++++++++++++IIIIXXXXXXXXXXXOOOO-------------------------------
------------------------------------------------------------------------
----OOOOXXXXXXXXXIIII+++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++
Score = -3.5380000

1 helices (+) : Score = -11.205
1 helices (-) : Score = -8.015
2 helices (+) : Score = -3.538
2 helices (-) : Score = 1.467
3 helices (+) : Score = 5.944
3 helices (-) : Score = 8.067
4 helices (+) : Score = 12.544
4 helices (-) : Score = 16.584
5 helices (+) : Score = 15.735
5 helices (-) : Score = 23.184
6 helices (+) : Score = 22.335
6 helices (-) : Score = 25.246
7 helices (+) : Score = 18.538
7 helices (-) : Score = 31.846
8 helices (+) : Score = -100000
8 helices (-) : Score = -100000
9 helices (+) : Score = -100000
9 helices (-) : Score = -100000

[The scores for all of the considered topologies are listed before finally
outputting the highest scoring one. A '+' indicates that the N-terminus is
inside, a '-' that it is outside.]

FINAL PREDICTION
================
1: (out) 23-41 (3.81)   [ *THESE* scores have NOT been multiplied by 1000. ]
2: 54-78 (3.35)
3: 96-112 (0.88)
4: 120-139 (2.02)
5: 149-165 (3.94)
6: 186-205 (3.11)
7: 217-236 (1.40)

[Residue numbers in the final prediction count from 1, so each is one
higher than the corresponding position in the intermediate output above.
The last two lines output are the ASCII representation of the highest
scoring topology and the input sequence.]

----------------------OOOOXXXXXXXXXXXIIII++++++++++++IIIIXXXXXXXXXXXXXXX
XXOOOO-----------------OOOOXXXXXXXXXIIII+++++++IIIIXXXXXXXXXXXXOOOO-----
----OOOOXXXXXXXXXIIII++++++++++++++++++++IIIIXXXXXXXXXXXXOOOO-----------
OOOOXXXXXXXXXXXXIIII++++++++++++++++++++++++
MDPIALTAAVGADLLGDGRPETLWLGIGTLLMLIGTFYFIVKGWGVTDKEAREYYSITILVPGIASAAYLSM
FFGIGLTEVQVGSEMLDIYYARYADWLFTTPLLLLDLALLAKVDRVSIGTLVGVDALMIVTGLVGALSHTPL
ARYTWWLFSTICMIVVLYFLATSLRAAAKERGPEVASTFNTLTALVLVLWTAYPILWIIGTEGAGVVGLGIE
TLLFMVLDVTAKVGFGFILLRSRAILGDTEAPEPSAGAEASAAD

[The precise results you get may differ from the above depending on the
options you set and improvements to the parameter files]


FINALLY
=======

If you need assistance in getting MEMSAT working, then contact the author at
the following address (E-mail is preferred):

Dr David T. Jones
E-mail: jones@bsm.bioc.ucl.ac.uk

Biomolecular Structure and Modelling Unit
Department of Biochemistry and Molecular Biology
University College
Gower Street
London
WC1E 6BT

Tel. +44 71 387 7050 x3879
Fax. +44 71 380 7193
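

Reading the Topology String by Script (illustrative sketch)
===========================================================

The topology string described under Example Results can be turned back into
helix ranges with a few lines of script. The Python sketch below is not part
of MEMSAT; the function name helix_segments is hypothetical, and the sketch
assumes only the character meanings given in the example ('O' and 'I' for
helix caps, 'X' for helix middle, '-' and '+' for loops) and that neighbouring
helices are always separated by at least one loop character.

```python
import re


def helix_segments(topology):
    """Return (start, end, n_terminal_side) for each helix in a topology string.

    Positions are 1-based and inclusive, as in the FINAL PREDICTION listing.
    helix_segments is a hypothetical helper, not a MEMSAT function.
    """
    # Tolerate a string pasted with its line breaks still in place.
    topology = "".join(topology.split())
    segments = []
    # A helix is an unbroken run of cap ('O', 'I') and middle ('X') characters.
    for match in re.finditer(r"[OXI]+", topology):
        # The first cap character shows which side the helix starts on.
        side = "out" if match.group()[0] == "O" else "in"
        segments.append((match.start() + 1, match.end(), side))
    return segments


# The first 53 characters of the final topology in the example:
# 22 outside-loop residues, a 19-residue helix, 12 inside-loop residues.
example = "-" * 22 + "OOOO" + "X" * 11 + "IIII" + "+" * 12
print(helix_segments(example))  # [(23, 41, 'out')]
```

The single segment found agrees with the first entry of the FINAL PREDICTION
listing, "1: (out) 23-41". Helix scores are not recoverable from the string
and would have to be read from the listing itself.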