RenameFastaSequences

From Bio.scipy.org


I was asked to rename all the sequences in a file full of FASTA-formatted sequences. The new names would be built from a basename and a running index, e.g. bar1, bar2, bar3, ...

Doing this by hand would take a graduate student several hours, and there's no easy way to do it with search and replace in an editor, because each new name needs a different number. A few lines of Python make short work of it.

Here's the Python code; save it to a file called "rename.py":

# rename fasta sequences in file according to user input
# Copyright (c) 2007, Humberto Ortiz Zuazaga

import sys

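# sys.argv[0] is the script name; the two arguments follow it
filename, basename = sys.argv[1:]

# write the renamed sequences to "<filename>.new"; the input file is left untouched
with open(filename) as infile, open('%s.new' % filename, 'w') as outfile:
    count = 1
    for line in infile:
        if line.startswith('>'):
            # replace the whole header line with the new name
            outfile.write('>%s%d\n' % (basename, count))
            count += 1
        else:
            # sequence lines are copied unchanged
            outfile.write(line)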

To rename the sequences in a file "foo" to bar1, bar2, bar3, ..., run the command below. The result is written to a new file, "foo.new", and "foo" itself is left unchanged:

python rename.py foo bar
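
To see what the renaming does without touching any files, here is a minimal sketch of the same loop wrapped in a function and run on a few lines held in memory. The function name rename_records, its start parameter, and the sample sequences are made up for illustration; they are not part of rename.py.

```python
def rename_records(lines, basename, start=1):
    """Yield the given lines with each FASTA header replaced by basename + index.

    rename_records and its 'start' parameter are hypothetical helpers for this
    example only.
    """
    count = start
    for line in lines:
        if line.startswith('>'):
            # The whole header line, including any description, is replaced.
            yield '>%s%d\n' % (basename, count)
            count += 1
        else:
            # Sequence lines pass through unchanged.
            yield line


# Made-up sample input: two records, the first with a description.
sample = [
    '>seqA some description\n',
    'ACGT\n',
    'TTGA\n',
    '>seqB\n',
    'GGCC\n',
]

renamed = list(rename_records(sample, 'bar'))
print(''.join(renamed), end='')
# prints:
# >bar1
# ACGT
# TTGA
# >bar2
# GGCC
```

Note that anything after the old name on a header line (here "some description") is dropped along with the name, so keep the original file if you need that information later.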