Python 3: extract a line between two lines in a txt file

I am new to Python. I am trying to extract one line ("concluded that our disclosure controls were effective as of") from a txt file ("infile.txt"). The file is relatively large, and I need to search for this line only within one specific section (between "ITEM&nbsp;9A" and "ITEM&nbsp;9B"). An example of such a section:

</A>ITEM&nbsp;9A. CONTROLS AND PROCEDURES. </B></FONT></P> <P STYLE="margin-top:6px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Evaluation of Disclosure Controls and Procedures </B></FONT> STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">Under the supervision and with the participation of our management, including our Chief Executive Officer and Chief Financial Officer, we conducted an evaluation of the effectiveness of our disclosure controls and procedures (as defined in Rules 13a-15(e) and 15d-15(e) under the Securities Exchange Act of 1934, as amended (Exchange Act)), as of the end of the period covered by this Annual Report on Form 10-K. Management recognizes that any controls and procedures, no matter how well designed and operated, can provide only reasonable assurance of achieving their objectives and management necessarily applies its judgment in evaluating the cost-benefit relationship of possible controls and procedures. Based on such evaluation, our Chief Executive Officer and Chief Financial Officer concluded that our disclosure controls and procedures were effective as of September&nbsp;28, 2012. </FONT></P> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Management&#146;s Annual Report on Internal Control over Financial Reporting </B></FONT> <P STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">This Annual Report does not include a report of management&#146;s assessment regarding internal control over financial reporting or an attestation report of the company&#146;s registered public accounting firm due to a transition period established by rules of the Securities and Exchange Commission for newly public companies. 
</FONT> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Changes in Internal Control over Financial Reporting </B></FONT></P> <P STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">There were no changes in our internal control over financial reporting (as defined in Rule&nbsp;13a-15(f) under the Exchange Act) during the quarter ended September&nbsp;28, 2012, that have materially affected, or are reasonably likely to materially affect, our internal control over financial reporting. </FONT> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B><A NAME="tx431171_16"></A>ITEM&nbsp;9B. OTHER INFORMATION.

      

If the section contains the target string (in the section above, it is in the middle), I would like to write "1" to a separate file, "output.csv"; if not, write "not found". The starting point of the section is not always at the beginning of a line. I'm sorry, but I couldn't figure out how to start. I'm using Python 3.6.

Thank you in advance!



2 answers


You can use regular expressions to extract text between a given opener and closer:

import re

opener = re.escape(r"ITEM&nbsp;9A")
closer = re.escape(r"ITEM&nbsp;9B")

      

You can iterate over the extracts with re.finditer, and then filter them for the target string using the in operator (inputstring is assumed to hold the contents of the file):



target_string = "concluded that our disclosure controls were effective as of"
for mo in re.finditer(opener + '(.*?)' + closer, inputstring, re.DOTALL):
    extract = mo.group(1)
    if target_string in extract:
        ...
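
To connect the loop above to the file and to the CSV output the question asks for, here is a minimal sketch. It is only one possible way to do this, not a definitive solution. The function names section_has_target and write_result are hypothetical, the UTF-8 encoding is an assumption about the input file, and the target string is copied from the snippet above. Note that the sample section in the question reads "disclosure controls and procedures were effective as of", so the target may need adjusting to match the real filings.

```python
import csv
import re

# Markers and target string as given in the question and the snippet above.
OPENER = re.escape("ITEM&nbsp;9A")
CLOSER = re.escape("ITEM&nbsp;9B")
TARGET = "concluded that our disclosure controls were effective as of"

# Non-greedy (.*?) stops at the first closer; DOTALL lets "." match newlines,
# so the section may start and end anywhere within a line.
SECTION_RE = re.compile(OPENER + r"(.*?)" + CLOSER, re.DOTALL)


def section_has_target(text, target=TARGET):
    """Return True if any 9A..9B section in text contains target."""
    return any(target in mo.group(1) for mo in SECTION_RE.finditer(text))


def write_result(in_path, out_path):
    """Hypothetical helper: write "1" or "not found" to out_path."""
    # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
    with open(in_path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    result = "1" if section_has_target(text) else "not found"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([result])
    return result
```

With the file names from the question, the call would be write_result("infile.txt", "output.csv"). Reading the whole file at once keeps the regex simple; for a file too large for memory, a line-by-line scan that tracks whether it is inside the section would be needed instead.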

      

Hope this is enough to get you started :-)



You can use re.findall:



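import re

# string_data_from_file is assumed to hold the contents of infile.txt.
# Note: this pattern only captures the heading text between
# "ITEM&nbsp;9A. " and the next </B>; it does not check the target string.
the_data = re.findall(r"</A>ITEM&nbsp;9A\. (.*?)</B>", string_data_from_file)

if len(the_data) > 0:
    print("1")
else:
    print("Not found")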

      
