How to remove parts of a file in Python?
I have a file named a.txt that looks like this:
```
I am the first line
I am the second.
There may be more lines here.

I am below a blank line.
I am the line.
There are more lines here.
```
Now I want to delete the content above the empty line (including the empty line itself). What is a Pythonic way to do this?
Basically, you cannot delete data from the beginning of a file in place, so you have to write the rest to a new file.
I think the Pythonic way looks like this:
```python
# get an iterator over the lines in the file:
with open("input.txt", 'rt') as lines:
    # while the line is not empty, drop it
    for line in lines:
        if not line.strip():
            break
    # now lines is at the point after the first paragraph,
    # so write out everything from here
    with open("output.txt", 'wt') as out:
        out.writelines(lines)
```
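The snippet above writes the result to a separate output.txt. If you want a.txt itself to end up without its first paragraph, one common pattern (not part of the original answer) is to write to a temporary file in the same directory and then swap it in with `os.replace`. This is a minimal sketch; the helper name `drop_first_paragraph` is hypothetical, and it assumes the file is text in the default encoding:

```python
import os
import shutil
import tempfile


def drop_first_paragraph(path):
    """Remove everything up to and including the first blank line of `path`.

    Hypothetical helper: the remaining lines go to a temporary file in the
    same directory, which then replaces the original via os.replace().
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        # fdopen first, so the temp descriptor is closed even if open() fails.
        with os.fdopen(fd, 'wt') as out, open(path, 'rt') as lines:
            # Skip lines until the first blank (or whitespace-only) line.
            for line in lines:
                if not line.strip():
                    break
            # The file iterator resumes right after the blank line.
            out.writelines(lines)
        # mkstemp creates the file with restrictive permissions; copy the original's.
        shutil.copymode(path, tmp_path)
        # Both files are closed at this point, so the swap also works on Windows.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Keeping the temporary file in the same directory keeps `os.replace` on one filesystem, where the rename is atomic on POSIX systems.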
Here are some simpler versions of this, without `with`, for older Python versions:
```python
lines = open("input.txt", 'rt')
for line in lines:
    if not line.strip():
        break
open("output.txt", 'wt').writelines(lines)
```
And here is a very straightforward version that just splits the file at the first empty line:
```python
# first, read everything from the old file
text = open("input.txt", 'rt').read()
# split it at the first empty line ("\n\n")
first, rest = text.split('\n\n', 1)
# make a new file and write the rest
open("output.txt", 'wt').write(rest)
```
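Note that this can be quite fragile. For example, Windows often uses `\r\n` as the line ending, so an empty line becomes `\r\n\r\n` (although in Python 3, text mode translates `\r\n` to `\n` on reading by default). A "blank" line that contains spaces is not matched either. But often you know that the file format uses only one kind of line ending, so this may be good enough.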
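One way to make the split less fragile is to let a regular expression find the blank line, accepting `\r\n` endings and whitespace-only lines, and to handle the case where there is no blank line at all (where the `first, rest = ...` unpacking would raise `ValueError`). A minimal sketch; the name `rest_after_first_blank_line` is hypothetical, and returning an empty string when there is no blank line is an assumption about what you want:

```python
import re

# A blank line: either at the very start of the text or after a line break,
# optionally containing spaces/tabs, terminated by \n or \r\n.
BLANK_LINE = re.compile(r'(?:^|\r?\n)[ \t]*\r?\n')


def rest_after_first_blank_line(text):
    """Return the text after the first blank line, or '' if there is none."""
    parts = BLANK_LINE.split(text, maxsplit=1)
    # split() returns a single element when the pattern never matches.
    return parts[1] if len(parts) == 2 else ''
```

The `\r?` parts mainly matter when newline translation is turned off, for example when the file is opened with `open(..., newline='')`.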
The fileinput module (from the standard library) is handy for this kind of thing. It sets everything up so you can act as if you were editing a file "in place":
```python
import fileinput
import sys

fileobj = iter(fileinput.input(['a.txt'], inplace=True))
# iterate through the file until you find an empty line.
for line in fileobj:
    if not line.strip():
        break
# Iterators (like `fileobj`) pick up where they left off.
# Starting a new for-loop saves you one `if` statement and boolean variable.
# With inplace=True, whatever is written to stdout ends up in a.txt.
for line in fileobj:
    sys.stdout.write(line)
```
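The same `fileinput` idea can be wrapped in a reusable function. This is a minimal sketch; the name `strip_first_paragraph_inplace` is hypothetical. It uses the module's real `backup` parameter, which keeps a copy of the original file with that extension appended, and the fact that `FileInput` objects can be used as context managers. As with the loop above, a file without any blank line ends up empty:

```python
import fileinput
import sys


def strip_first_paragraph_inplace(path, backup=''):
    """Drop everything up to and including the first blank line of `path`.

    Hypothetical wrapper around the fileinput approach; pass e.g.
    backup='.bak' to keep a copy of the original as `path + '.bak'`.
    """
    with fileinput.input(files=[path], inplace=True, backup=backup) as lines:
        # While inplace=True, sys.stdout is redirected into the file,
        # so lines that are not written are effectively deleted.
        for line in lines:
            if not line.strip():
                break
        # The FileInput object is its own iterator, so this loop resumes
        # after the blank line and copies the rest back.
        for line in lines:
            sys.stdout.write(line)
```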
Any idea how big the file will be?
You can read the file into memory:
```python
f = open('your_file', 'r')
lines = f.readlines()
```
which reads the file line by line and stores the lines in a list.
Then close the file and reopen it with 'w':
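```python
f.close()
f = open('your_file', 'w')
for line in lines:
    if your_if_here:  # placeholder: your own condition for keeping a line
        f.write(line)
f.close()  # close the file so everything is flushed to disk
```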
This overwrites the current file, and you choose which lines from the list to write back. This is probably not a good idea if the file is large, as the entire file must fit in memory. On the other hand, there is no need to create a second file for the output.
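Applied to the question, the condition becomes "keep only the lines after the first blank line". This is a minimal sketch of the read-into-memory approach; the helper name `drop_through_first_blank_line` is hypothetical, leaving the file unchanged when it has no blank line is an assumption, and `with` blocks close both handles (flushing the output) automatically:

```python
def drop_through_first_blank_line(path):
    """Rewrite `path` in place, keeping only the lines after the first blank line."""
    # Read the whole file into memory as a list of lines.
    with open(path, 'r') as f:
        lines = f.readlines()

    # Index of the first blank (or whitespace-only) line, or None if there is none.
    cut = next((i for i, line in enumerate(lines) if not line.strip()), None)
    # Assumption: without a blank line, leave the file untouched.
    if cut is None:
        return

    # Reopening with 'w' truncates the file; write back only what we keep.
    with open(path, 'w') as f:
        f.writelines(lines[cut + 1:])
```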