Yeah, I got help earlier from elsewhere because you were gone. They told me to get rid of the "== True" in the "in repeat_sequences == True" part, and that fixed my problem. So far, the script is going great! There are still some things I need to change, though.
First, in the file that I'm going to fully run the code through (well, this file is the test file, and it's only 100,000 letters), there are 50 letters per line, and there's a space after each set of 50. If the script doesn't read it properly, the sequence is going to come out with \n's everywhere. Wait, I just looked at your code, redyugi, and there's that .strip("\n") part. Will that make it work for the entire .txt file?
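Here's a minimal sketch of how I picture the cleanup step, in case it helps. The function name clean_sequence and the sample lines are made up for illustration and aren't from your code. My worry is that .strip("\n") only removes the newline, so the space after each set of 50 would stay in; plain .strip() should take care of both.

```python
def clean_sequence(lines):
    """Join lines into one string, dropping newlines and surrounding spaces."""
    # str.strip() with no argument removes spaces, tabs and newlines from both
    # ends of each line; strip("\n") would leave the trailing space in place.
    return "".join(line.strip() for line in lines)


# Made-up sample lines shaped like the file: letters, a space, then a newline.
sample = ["ACGTA \n", "CGTAC \n", "GG\n"]
print(clean_sequence(sample))
```

Since an open file hands back its lines one by one, the same function should work if you pass it the file object instead of a list.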
Also, I need to add an option for whether or not to import an already existing dictionary, considering there are 23 full files I need to run the program through. Why 23? Humans have 23 pairs of chromosomes. :P But this should be pretty easy.
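A rough sketch of what I have in mind for the dictionary option, assuming the dictionary maps k-mer strings to counts and that saving it as JSON between runs is fine. The names load_counts and save_counts and the file path are hypothetical, not part of the existing script.

```python
import json
import os


def load_counts(path=None):
    """Return a previously saved dictionary, or an empty one to start fresh."""
    # No path given, or nothing saved there yet: start with an empty dict.
    if path and os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {}


def save_counts(counts, path):
    """Write the dictionary to disk so the next file's run can import it."""
    with open(path, "w") as handle:
        json.dump(counts, handle)
```

The idea would be to call load_counts with the saved file before each of the 23 runs and save_counts afterwards, so the counts carry over from one file to the next.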
Also, about your "while read_kmer:" loop: wouldn't that mean that if the length of the k-mer is 5 and you're 4 characters away from the end, it'll still try to run the code?
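To show what I mean, here's a small sketch of a loop that stops at the last full-length k-mer. I'm assuming read_kmer comes from slicing the sequence, and I may be misreading your code. In Python a slice that runs past the end just comes back shorter instead of raising an error, so a plain truthiness check would stay true until the slice is completely empty. The name iter_kmers is made up for this example.

```python
def iter_kmers(sequence, k):
    """Yield every full-length k-mer in sequence, left to right."""
    # The last valid start index is len(sequence) - k, so the range stops
    # before any window that would run off the end of the string.
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


print(list(iter_kmers("ACGTAC", 5)))  # ['ACGTA', 'CGTAC']
```

If the sequence is shorter than k, the range is empty and nothing is yielded, so no short k-mers should end up in the dictionary.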