-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathsystem-convert.py
More file actions
41 lines (31 loc) · 1.43 KB
/
system-convert.py
File metadata and controls
41 lines (31 loc) · 1.43 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#Here's the update so far. I found a process in python that allows me to generate a text file from the PDF. I am now working on parsing the PDF files by extracting the name of the asset and the margin price separately (the text extraction is very crude at this point). I need to work on making sure the right numbers are added to the array, and to find a way to exclude the page numbers that come up. Then I will create methods to save numbers and compare them.
import subprocess
def createTextFromPDF(pdfPath):
retcode = subprocess.call(["/usr/bin/pdftotext", pdfPath])
#if __name__ == "__main__":
# createTextFromPDF('/home/dkurk/Documents/dkurk/sample.pdf')
file = open("sample.txt").readlines()
[s.strip() for s in file]
companies = []
margins = []
lines = []
for line in file:
line.rstrip('\n')
if line !='\n' and "RISK-BASED" not in line and line != "PRODUCT GROUP\n" and line != "OFFSET GROUP\n":
if "CLASS GROUP" in line:
companies.append(line)
elif "USIDX" in line:
companies.append(line)
elif "PRODUCT GROUP" in line:
companies.append(line)
elif "GROSS MARGIN" in line:
companies.append(line)
elif line[0].isdigit() and '/' not in line and "83824" not in line:
margins.append(line)
#print len(companies)
#print len(margins)
print companies
print margins
#tuples = zip(companies, margins)
#for a in tuples:
# print a