Greedy Motif Search

Nov 12, 2014

—

Having spent some time trying to grasp the underlying concept of the Greedy Motif Search problem in chapter 3 of Bioinformatics Algorithms (Part 1) I hoped to cement my understanding and perhaps even make life a little easier for others by attempting to explain the algorithm step by step below.

I will try to provide an overview of the algorithm as well as addressing each section of the pseudo code provided in the course:

Pseudo Code

   GREEDYMOTIFSEARCH(Dna, k, t)
        BestMotifs ← motif matrix formed by first k-mers in each string
                      from Dna
        for each k-mer Motif in the first string from Dna
            Motif₁ ← Motif
            for i = 2 to t
                form Profile from motifs Motif₁, …, Motif_{i - 1}
                Motif_i ← Profile-most probable k-mer in the i-th string
                          in Dna
            Motifs ← (Motif₁, …, Motif_t)
            if Score(Motifs) < Score(BestMotifs)
                BestMotifs ← Motifs
        output BestMotifs

I haven’t gone into any detail for creating profiles from k-mers, finding profile-most probable k-mers or scoring as I found these topics were relatively clear in the course text. Furthermore I haven’t posted my code for obvious reasons.

Overview

The basic idea of the greedy motif search algorithm is to find the set of motifs across a number of DNA sequences that match each other most closely. To do this we:

Run through each possible k-mer in our first dna string (illustrated for the first three 3-mers in orange below),
Identify the best matches for this initial k-mer within each of the following dna strings (using our profile-most probable function, illustrated below in green) thus creating a set of motifs at each step
Score each set of motifs to find and return the best scoring set.

For example the first three iterations in the main loop if k = 3 (k being the length of our k-mers, or in this case 3-mers) will give us the following motif sets and scores:

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 7

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 5

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 4

The best motif set in the first three iterations, as illustrated above, is therefore the third which scores 4, the best overall score happens at the seventh iteration:

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 2

The Code

BestMotifs ← motif matrix formed by first k-mers in each string from Dna

First we initialise best_motifs as a list featuring the first k-mer from each dna string.

 GGCGTTCAGGCA
 AAGAATCAGTCA
 CAAGGAGTTCGC
 CACGTCAATCAC
 CAATAATATTCG

So using our example DNA list above we get:

best_motifs = [‘GGC‘, ‘AAG‘, ‘CAA‘, ‘CAC‘, ‘CAA‘]

This list will be used to compare against other motif lists that we will build later in the algorithm.

for each k-mer Motif in the first string from Dna

Motif₁ ← Motif

In our outer loop we will be iterating through each k-mer in the first dna string (in this case ‘GGCGTTCAGGCA‘ ).

At the beginning of each of these iterations we will first initialise a new list called motif and then add the respective k-mer as its first entry

First iteration:

GGCGTTCAGGCA

So on the first iterations the motif list will be initialised as motif= [‘GGC‘]

Second-iteration

GGCGTTCAGGCA

On the second iteration the motif list will be initialised as motif = [‘GCG‘]

And so on…

Within each iteration of the outer loop we will be repeating the following instructions within our ‘inner loop’ below:

for i = 2 to t

form Profile from motifs Motif₁, …, Motif_{i – 1}

Motif_i ← Profile-most probable k-mer in the i-th string in Dna

Motifs ← (Motif₁, …, Motif_t)

Our inner loop effectively iterates through the second dna string the final dna string (or rather dna[1] to dna[t]). At each step we will add the most likely candidate k-mer from the current dna string to our current motif list.

As an example we will run through the first iteration of our main loop for which we initialised our motif list as [‘GGC‘] above:

 GGCGTTCAGGCA
 AAGAATCAGTCA
 CAAGGAGTTCGC
 CACGTCAATCAC
 CAATAATATTCG

To tie things together I have tried to colour code the current k-mer from the first dna string as orange, the current dna string as blue and the best matches from previously analysed dna strings as green.

Please note that we iterate through dna[1] to dna[t] within the inner loop which excludes dna[0]. dna[0] is used to create the first item in our motif list upon each iteration of our outer loop as described above. This ‘seed’ motif is then compared to each of the other dna strings. If we included dna[0] in our inner loop we would be comparing it to itself at each iteration.

Create a profile from all motifs in our list

We only have one item in our motif list when we first start our loop which results in the profile:

	G	G	C
A	0	0	0
C	0	0	1.0
G	1.0	1.0	0
T	0	0	0

And so our first profile might be represented in a form similar to:

profile = [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 1.0, 0.0, 0.0)]

Find Profile-most probable motif in dna[i] using the current profile

We can now run our profile-most probable function passing the profile above and dna[i] as arguments. In the first iteration of our inner loop we will be using the second dna string; ‘AAGAATCAGTCA‘ and the profile; [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 1.0, 0.0, 0.0)] which returns the motif AAG

Add the resulting motif to motif list

We now add the resulting k-mer to our motif list:

motif= [‘GGC‘, ‘AAG‘]

Rinse and repeat using the next dna string and the updated motif list:

 GGCGTTCAGGCA
 AAGAATCAGTCA
 CAAGGAGTTCGC
 CACGTCAATCAC
 CAATAATATTCG

Again we create a profile from all motifs in our list

	G	G	C
	A	A	G
A	0.5	0.5	0
C	0	0	0.5
G	0.5	0.5	0.5
T	0	0	0

Which we might represent in a form similar to:

profile =[(0.5, 0.0, 0.5, 0.0), (0.5, 0.0, 0.5, 0.0), (0.0, 0.5, 0.5, 0.0)]

And again we find the Profile-most probable motif in dna[i] using the current profile

Using dna[i]; ‘CAAGGAGTTCGC‘ and the profile; [(0.5, 0.0, 0.5, 0.0), (0.5, 0.0, 0.5, 0.0), (0.0, 0.5, 0.5, 0.0)] which returns the motif AAG which results in the updated motif list:

motifs = [‘GGC‘, ‘AAG‘, ‘AAG‘]

And so on…

We carry on this way until we reach the last dna string, dna[t]. In this case our motif list will look something like this:

motif = [‘GGC‘, ‘AAG‘, ‘AAG‘, ‘CAC‘, ‘CAA‘]

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG

ifScore(Motifs) < Score(BestMotifs) BestMotifs ← Motifs outputBestMotifs

Finally we need to compare our current motif list with our best motif list to see which has the lower score. In the case above motif [‘GGC‘, ‘AAG‘, ‘AAG‘, ‘CAC‘, ‘CAA‘] scores 7 compared to best_motif which scores 6. As such best_motif is not updated on this iteration.

I have duplicated the results for the first three iterations below now that we have some additional context.

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 7

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 5

GGCGTTCAGGCA 
AAGAATCAGTCA 
CAAGGAGTTCGC 
CACGTCAATCAC 
CAATAATATTCG
Score = 4

This entire process is then repeated with the next k-mer in dna[0] and so on until you have reached the last k-mer in the first dna string.

In each case the motif list is emptied and initialised with the current k-mer from the first dna string. As such the last motif list would be initialised as motif = ”GCA”.

Finishing off

Hopefully the above details will give you a good idea how to implement the algorithm.

If you do have any trouble it might be worth attempting to code the inner loop first to recreate the first motif set (starting with ‘GGC‘). Once you have that in hand it should be fairly straight forward to iterate through the remaining k-mers and keep track of the best scoring motif set.

And I think that covers everything… I really hope that this has been helpful! Please let me know if I have missed anything (literally or figuratively) as this is as much an education for myself as anybody else.

Edit: A few people have kindly asked if they can donate a tiny bit towards hosting costs and so I have tentatively added a donate (via paypal) button below. Thanks 🙂

[wpedon id=554]

Comments

79 responses to “Greedy Motif Search”

Adam Peterson

November 12, 2014

You can’t imagine how helpful this was for me. Turns out I just needed to fix one thing- but I’m not sure how much longer it would’ve taken me to fix that if I hadn’t walked through it with your guidance.

Much obliged

Reply
Patricia Ricky Pearl

November 24, 2014

Oh, is that how to implement the pseudocode? Yes, I’ll have to try that.
At least now I understand! 🙂

Reply
K Baker

November 25, 2014

I concur with Adam, this is a great walkthrough, the pseudocode is a little opaque as written (at least to me), and I have now found two small mistakes I had made, one related to this algorithm, and one to the profile most probable, without this page showing the interim data I would still be struggling, so thank you so much for your guidance, I’m sure there was more information on the fora, but I found your link there and this was just what I needed, a second mind to help me out.

Reply
Anders

November 26, 2014

Very helpful, thank you. I think you have an error, though. When demonstrating the scoring (orange and green) in the beginning, you always use AAG on the second line, but I’m pretty sure GGC would be followed by GTC, GCG would be followed by AAG, and CGT would be followed by AGT on the second line.

Reply
1. Anders
  
  November 27, 2014
  
  Nevermind. The reason it select AAG is because all the probabilities in the second string are 0, ie none of them exactly match the first motif.
  
  Reply
  1. MrGraeme
    
    November 27, 2014
    
    Hey Anders,
    Yeah, that bothered me for ages as well…
    Thanks for the feedback!
    
    Reply
    1. Paul
      
      October 3, 2015
      
      I thought that in the class he talked about adding a small value, like one, to everything in order to make more similar strings appear even when they aren’t perfect matches.
      
      Reply
Patricia R. Pearl

December 2, 2014

“Finally we need to compare our current motif list with our best motif list to see which has the lower score. In the case above motif [‘GGC‘, ‘AAG‘, ‘AAG‘, ‘CAC‘, ‘CAA‘] scores 7 compared to best_motif which scores 6. As such best_motif is not updated on this iteration.”

I understand your description until the phrase ‘scores 7’.
How did you determine that score?

Reply
Paul Jennings

May 3, 2015

A very heartfelt thanks for clearing this mess up the best way possible. Quite unfortunate the instructors did not do such a marvelous job with my being ready to drop the course as a result.

Reply
1. kayode Fashae
  
  January 3, 2022
  
  Kudos to the Instructors all the same! A big thanks to MrGraeme for complimenting the Instructors! The duo did an excellent job.
  
  Reply
PVA

May 3, 2015

On the second iteration the motif list will be initialised as motif = [‘GGC‘]

Is there a typo ?

Should it rather be :-
On the second iteration the motif list will be initialised as motif = [‘GCG‘]

Reply
1. MrGraeme
  
  May 6, 2015
  
  Whoops… I’ll check that out after work if I get a chance…
  Thanks for the feedback!
  
  Reply
  1. MJ
    
    May 11, 2015
    
    Also: dna[t] should be dna[t-1] everywhere
    
    Reply
    1. MrGraeme
      
      May 11, 2015
      
      Hmm, O.K., I’ll need a moment to check the logic here…
      
      Reply
2. Adrian
  
  May 10, 2015
  
  Yeah is a typo.
  best_motifs[0] should be ‘GGC‘
  
  Reply
  1. Adrian
    
    May 10, 2015
    
    Oh sorry I misunderstood. I dunno. I’m at another part of text maybe
    
    Reply
3. MrGraeme
  
  May 11, 2015
  
  Ok, fixed GCC / GGC typo… Thanks very much!
  
  Reply
Jessica

May 6, 2015

This was incredibly helpful! Thank you for putting this together!!

Reply
Eric

May 9, 2015

I’m struggling with how you identify AAG as the best match in string 2 (AAGAATCAGTCA) during the first iteration (i.e. with kmer GGC in string 1). If I start with a profile of (0010)(0010)(0100) for GGC, then I won’t choose AAG as the best match, since not a single base matches. If instead I make a profile matrix using my best motifs, which have been arbitrarily initialized to the first three letters of each DNA string, then I think that, for example, the third kmer in the second string, GAA, should fit the profile better than AAG. My profile matrix from the initial best motifs is:
(0.2 0.6 0.2 0)(0.8 0 0.2 0)(0.4 0.4 0.2 0)
So if I compare AAG and GAA, in the first position they’re both 0.2, so equally likely. In the second position they’re the A, so equally likely. But in the third position, G has 0.4 and A has 0.2, so GAA is more likely than AAG.

Up to this point, I’ve gotten everything correct in the code challenges and quizzes, but this one has really stumped me. Any help would be much appreciated.

Thanks.

Eric

Reply
1. Eric
  
  May 12, 2015
  
  I figured out my error – AAG is the most likely match to GGC in the first iteration because, although it has probability 0, all the others do too (when you multiply probabilities of the individual bases), so the first one is chosen.
  
  Thanks so very much, Mr. Graeme, for your step by step explanation of this problem. I definitely wouldn’t have solved it without your explanation (and barely did even with this site!).
  
  Best wishes,
  
  Eric
  
  Reply
  1. MrGraeme
    
    May 12, 2015
    
    Hey Eric,
    That’s great, it took me ages to get my head around that too,
    Cheers
    
    Reply
    1. Gaby
      
      October 27, 2015
      
      Hi!
      According to Eric’s explanation that means that As long as I get a probability of 0 I will take the first k-mer?
      Thank you.
      Greetings.
      
      Reply
Alex

May 10, 2015

This was really helpful. I’m not sure I would have been able to solve this with just the description in the text. Thanks for the clear explanation.

Reply
Adrian

May 10, 2015

There’s a mispelling in:
best_motifs = [‘GCC‘, ‘AAG‘, ‘CAA‘, ‘CAC‘, ‘CAA‘]
since the first kmer from first string is “GGC” (as stated in other part of the text).

Reply
Amir

May 11, 2015

This post was my main guide to solve this problem, definitely without this useful guide I was unable to overcome !
Thanks a lot.

Reply
fred

June 25, 2015

This is very helpful Thanks.
However there is still a minor problem here:
Second-iteration

GGCGTTCAGGCA
On the second iteration the motif list will be initialised as motif = [‘GGC‘]
SHOULD BE
On the second iteration the motif list will be initialised as motif = [‘GCG‘]

Reply
Khaled Elaish

July 29, 2015

Thank you!

Reply
Alaina

August 16, 2015

WOW. I commented over on Stepic, but this was SO useful to me. I was feeling pretty dumb at not being able to get it, but your walkthrough made it clear. Thanks so much for taking the time to write this!

Reply
1. MrGraeme
  
  August 16, 2015
  
  Hey! Glad it helped!
  
  Reply
Michelle

September 20, 2015

Thank you for this

Reply
Omar

September 21, 2015

great explanation, would never have made it without your help

Reply
Gene

September 22, 2015

Thank you so much for the explanation. As a biochemist with little programming experience I am playing catch-up on reading and applying pseudo code.

One question: when the first inner loop is complete and the motif list is emptied and initialised with the current k-mer from the first dna string, does the profile also reset to match the current kmer? (for instance, at initialization “GCA” in the second iteration, would it start out with [(0, 0, 1, 0), (0, 1, 0,0),(1,0,0,0)] rather can carrying forward the last profile at the end of the previous iteration?

Again, thank you.

Reply
1. Philipp
  
  January 9, 2016
  
  Yes it does also reset the profile and start anew just as you pointed out!
  
  Reply
Jakob

September 24, 2015

Nice. The initial visualization was what I needed to finally grasp this. Thanks a lot!

Reply
Charles

October 2, 2015

It seems I am stuck on the simple stuff. How are you arriving at the scores? Aren’t these the sums of Hamming distance between the first piece (in orange) to the subsequent most probable? I would appreciate someone going through the tabulation step by step. Thanks.

Reply
1. Michael Singer
  
  August 17, 2016
  
  To get the scores, first get the consensus of the array strings from motifs (and then separately from best_motifs).
  
  So if the motifs are the following:
  GTTCAGGCA
  AATCAGTCA
  CGAGTTCGC
  GTCAATCAC
  TAATATTCG
  Score = 7
  
  The consensus string (most common) is AAG and the score is the number of differences from that string. You get AAG from checking each Character from the 5 kmers. So position 0 has G, A, A, C, C [A most common] and position 1 has G, A, A, A, A [A most common] and position 2 has C, G, G, C, A [G most common] … AAG.
  
  GGC – AAG: 3 differences
  AAG – AAG: 0 differences
  AAG – AAG: 0 differences
  CAC – AAG: 2 differences
  CAA – AAG: 2 differences
  
  7 differences total
  
  Reply
  1. Michael Singer
    
    August 17, 2016
    
    It modified my formatting on the first listing. Let me try again.
    
    So if the motifs are the following:
    -GGC-GTTCAGGCA
    -AAG-AATCAGTCA
    C-AAG-GAGTTCGC
    -CAC-GTCAATCAC
    -CAA-TAATATTCG
    Score = 7
    
    Reply
Graeme Benstead-Hume

October 13, 2015

Sorry I haven’t replied to any questions recently! I have just started PhD. and along with looking after my very new son I haven’t had a chance to refresh myself on everything to answer suitably.

If any one else understands the problem clearly from the post it would be fantastic if you could answer any questions that come up either here or with a link to the stepic discussions page if you want more fame!

Cheers, Graeme

Reply
Ruslan V

September 27, 2016

Thank you, friend. It’s really very clear, understandable post. If there were not you, I’m afraid I’ll never understand this algorithm, at least so good.

Reply
LuisRamirez

October 20, 2016

Thanks for the explanation!

Reply
Chang Zhen

November 23, 2016

你真是太牛了！
Thank you from China.

Reply
Troy Sincomb

January 19, 2017

You are amazing O_o
This should be in the book.

Reply
1. MrGraeme
  
  January 20, 2017
  
  Very kind 🙂
  
  Glad you found it useful!
  
  Reply
carolynzy

January 28, 2017

Really thank you for make this much clearer for me. Saved me tons of time!

Reply
Flo

January 30, 2017

Hello ! I´m stucked.
I don´t understand how is it possible that given: “dna string; ‘AAGAATCAGTCA‘ and the profile; [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 1.0, 0.0, 0.0)]” it will return the motif AAG”.
I think I´m loosing something here. How could that profile return AAG ?

Thanks for your response!

Reply
Stevica Asak

March 5, 2017

Thank you.

Reply
Valentina

March 6, 2017

Thank you, it’s super clear 🙂

Reply
Benjamin Chang

April 5, 2017

Thank you so very much

Reply
Samuel Cole

April 9, 2017

Thanks for your help!

as soon as I catch up with the learning Ill make sure to keep an eye out for any articles you publish : D

Reply
Gwen Nicodemus

May 24, 2017

Thank you. I kept going through the text and videos and I could not figure out what ‘t’ was. If it’s passed in as a variable, it must be important, but what was it? You actually mentioned t in the pseudocode and description.

So, much thank yous.

Your explanation is beautiful.

Reply
Billy

July 21, 2017

Very helpful ! Thank you so much. You’ve really made my life easier 🙂

Reply
Juan Esteban Arango

January 22, 2018

Awesome post you made. Great resource to explain that algorithm. A lot of people got stuck there, and this for sure helped a lot!

Reply
JB

January 29, 2018

Thank you very much! It helped me a lot 🙂

Reply
Duha Alasiri

February 23, 2018

thank you soooo much
this really help me understand the algorithm

Reply
Brian W

May 27, 2018

for the third iteration, there are two modes (‘AAG’ and ‘AAT’) both appearing two times. However using ‘AAT’ gives you a lower score, so this should be indicated in the notes maybe? I am still figuring out how to do this. Maybe I will just go through doing hamming distance by doing all of the unique(strings) in the list and take lowest score which will be fast for the example but not when the list is much longer.

Reply
Мнгновенные казино вебмани

September 6, 2018

В ваших обзорах содержится большое
количество нужной информации. Я в любом случае добавлю ваш ресурс в избранное, чтобы
заходить на него ежедневно!!!
Я убежден, что таким образом буду узнавать много нового и интересного !

Спасибо огромное за следующие посты,
и удачи!

Reply
Brian A Salkas

November 18, 2018

I still do get how to profile got created in the first place…

Reply
1. Brian A Salkas
  
  November 25, 2018
  
  * I meant to say, I have no idea how you are scoring your matrix in the first place, where does the score 7 come from in your fist example?
  
  Reply
Brian A Salkas

November 28, 2018

thank you for the very clear explanation, I now understand how to the algorithm works,
I just have one question, how does the scoring work? Why does the first kmer get a score of 7, and the second a score of 5?

Reply
pic_achu

December 16, 2018

Thanks a lot

Reply
Elden Humphrey

January 22, 2019

Жаль, что вы не принимаете пожертвования…
Я бы с
удовольствием внес определенную сумму на счет этого
замечательного веб ресурса!!! Я убежден, что добавлю ваш канал
RSS в свою учетную запись Google … Я с нетерпением жду
появления новых постов, и отслеживаю их
в
моей группе в Facebook! Вскоре будем
обсуждать!

Reply
Richard Zipper

March 1, 2019

Thanks. Very helpful

Reply
Tanvir Saini

March 25, 2019

THANK YOU SO SO SO MUCH!!!

Reply
Zahra

July 2, 2019

Thank’s a lot!
It really helped me.

Reply
John Blaise

August 31, 2019

Under the section:

“Find Profile-most probable motif in dna[i] using the current profile”

we have the k-mer “GGC” and apply its profile to the string:

“AAGAATCAGTCA”

which is shown to yield AAG as our best match.

Why is AAG the best match? It matches only 0.0’s with the profile matrix from GGC. Something like “AGA” or “GAA” would at least have one character correctly overlap.

Correction: I just went through the code myself to verify why AAG is selected.

What actually happens is that our profile comparison is derived from the Pr function. The Pr function multiplies all the probabilities of each character’s match in the profile matrix together. If any character in the k-mer is matched to a 0.0 in the profile matrix even a single time, the other probabilities get multiplied by this 0.0, and the total probability returned is 0. This means AAG, where all 3 characters match to 0.0, returns a total probability of zero; AGA and GAA, which have one character match to 1.0, nevertheless has its 2 other characters matched to 0.0, resulting in the same total probability of zero. In other words, AAG, AGA, and GAA, are all weighted as the same, a zero, even though AGA and GAA in fact more closely match the input k-mer of GGC. The script only sees zeroes, and so it takes the first string by default: AAG.

The *only way* to return a non-zero probability, or a “match”, is if all 3 characters match the profile matrix. Then there are no 0.0’s being multiplied into the total probability, and we end up in a non-zero total probability. This is why the later dna strings input will find exactly matching k-mers. For example, the next iteration also finds AAG. It finds AAG because the previous iteration returned AAG, which added this combination of characters to the profile matrix, so the next AAG will exactly match it.

I don’t know if it’s better this way to throw away partial matches. It reduces false positives, but probably converges more slowly since we are just going to be returning the first few characters in the string until one matches perfectly. Then again, that is our goal, to find perfect matches. The main flaw is that if our seed string doesn’t contain exactly the optimal pattern in nature, then we’ll never find the optimal pattern due to the following strings being doomed to return a zero or low probability when matched against the profile matrix built-up from a seed that doesn’t contain the correct k-mer. For example, if our seed string, dna[0], doesn’t contain the correct pattern, but the next string, dna[1], does contain it, then we will never see this in our script. The script will match dna[1] to the profile matrix from dna[0] and simply return the first few characters or a different false pattern. The correct pattern would actually have a probability of 0.0 here.

Thanks for the informative post with the time put in for diagrams. It helped me understand the flawed angle the homework set had adopted.

Reply
1. MrGraeme
  
  June 9, 2020
  
  Thanks for updating your comment to describe your line of thinking! Too be honest I haven’t looked at the problem for a while and so it was helpful for me too!
  
  Reply
2. Hashim Mushtasin Reza
  
  July 2, 2020
  
  @John Blaise, thank you so much. You cannot believe the amount of time I spent scratching my head over this. Truly glad I came across your comment.
  Also, thank you too, MrGraeme, for this detailed explanation.
  
  Reply
HARSHIT JAITLY

April 27, 2020

Man, I love you, kudos _/\_ for the explanation, the world needs more people like you, more power to you

Reply
Firas Khatib

June 9, 2020

Yay, your site is back up!
Thank you so much, I use it for my classes 🙂

Reply
1. MrGraeme
  
  June 9, 2020
  
  Ha! Yes! Sorry! I was a bit busy and forgot to pay my hosting company!
  
  Reply
Mansi Marwah

June 22, 2020

Indeed very well explained . Had it not been for this explanation , I would have taken days to understand this .Thanks a lot .

Reply
Andrea Q

June 25, 2020

Wow! Thank you!! I was really lost with that problem.

Reply
Raul Munoz Hernandez

July 19, 2020

Thank you!
Your explanation was really useful to understand this algorithm!
Gracias!

Reply
Suyash Gupta

August 29, 2020

This should be on Stepik text instead. They have a habit of making things complicated unnecessarily.
Thanks a lot man!

Reply
1. MrGraeme
  
  August 29, 2020
  
  Thanks Suyash!
  I’m glad you found this helpful! I really like Stepik’s content though in this case I think the use of colour in the explaination makes it easier to follow.
  Cheers. Graeme
  
  Reply
  1. kayode Fashae
    
    January 3, 2022
    
    Yeah! Stepit challenges you which is the essence of learning. Determination is key in learning. Some of the questions took me days to solve and when solved I was better off at next challenges. Even MrGraeme alluded to this indirectly from his posting. Thanks Stepit for producing the likes of MrGraeme. Thanks to MrGraeme for the brilliant and excellent explanation!
    
    Reply
Arun A

February 24, 2021

This was very helpful indeed. One question : If we were to change sequence of dna strings (as an example move GGCGTTCAGGCA to end from first), our set of solution motif will be completely different – am I correct ?

Reply
Sheen

January 8, 2022

Hi! Since nobody asked this in the comments, i feel like i may be the only one who is missing the point but would still highly appreciate if somebody could clear the doubt..
How are we writing the code for the profile? Since till now in the course we are being provided with the profile of probabilities every time, so we haven’t had to write a code for the building the profile on our own..
I am very new to programming and would appreciate if somebody could help!

Reply
Sheen

January 8, 2022

Hi! Since nobody asked this in the comments, i feel like i may be the only one who is missing the point but would still highly appreciate if somebody could clear the doubt..
How are we writing the code for the profile? Since till now in the course we are being provided with the profile of probabilities every time, so we haven’t had to write a code for the building the profile on our own..
I am very new to programming and would appreciate if somebody could help

Reply

Greedy Motif Search

Comments

79 responses to “Greedy Motif Search”

Leave a Reply Cancel reply