  • ■ ■ ■ ■ ■
    requirements.txt
    1 1  elementpath
    2 2  enum
    3 3  csv
    4  - 
     4 +requests==2.22.0
     5 +beautifulsoup4==4.8.1
  • ■ ■ ■ ■ ■ ■
    sanctions/README.md
    1  -## Folder with Sanctions lists parsers
    2  - 
    3  -`source` directory has raw sanction lists (e.g UN and OFAC)
    4  - 
    5  -`parsed` directory contains parced data
    6  - 
    7  -### How to use
    8  - 
    9  -UN sanctions
    10  - 
    11  -```
    12  -python sanctions/un_parser.py -i "/sanctions/source/un.xml" -o "/sanctions/parsed/un_parsed.csv"
    13  -```
    14  -If doesnt work, try the absolute path
  • ■ ■ ■ ■ ■ ■
    sanctions_and_peps/README.md
     1 +## Folder with Sanctions and PEP lists parsers
     2 + 
     3 +`source` directory contains the raw sanctions lists (e.g. UN and OFAC)
     4 + 
     5 +`parsed` directory contains the parsed data
     6 + 
     7 +### How to use
     8 + 
     9 +UN sanctions:
     10 + 
     11 +```
     12 +python sanctions_and_peps/un_parser.py -i "/sanctions_and_peps/source/un.xml" -o "/sanctions_and_peps/parsed/un_parsed.csv"
     13 +```
     14 + 
     15 +RU BL PEPs:
     16 + 
     17 +The data are scraped from
     18 + 
     19 +```
     20 +python sanctions_and_peps/ru_bl_peps_parser.py -o /sanctions_and_peps/parsed/ru_bl_peps_parsed.csv
     21 +```
     22 + 
     23 +PS! If it doesn't work, try the absolute path
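The "try the absolute path" advice can be sketched in Python as follows. This is a minimal illustration, not part of the commit: the helper names `to_absolute` and `build_command` and the `repo_root` parameter are hypothetical, and it assumes (without confirmation from the README) that the leading `/` in the documented paths is what makes the relative form fail.

```python
import sys
from pathlib import Path


def to_absolute(path_text, repo_root="."):
    """Resolve a README-style path against the repository root.

    Assumption: a leading "/" in the README paths means "relative to
    the repository", so it is stripped before joining.
    """
    relative = path_text.lstrip("/")
    return Path(repo_root).resolve() / relative


def build_command(out_path, repo_root="."):
    """Build the RU BL PEPs parser command line using absolute paths only."""
    script = to_absolute("sanctions_and_peps/ru_bl_peps_parser.py", repo_root)
    out = to_absolute(out_path, repo_root)
    return [sys.executable, str(script), "-o", str(out)]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run to execute it.
    print(build_command("/sanctions_and_peps/parsed/ru_bl_peps_parsed.csv"))
```

Run from the repository root, this prints a command whose script and output arguments are both absolute, which avoids depending on the current working directory.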
  • sanctions_and_peps/parsed/ru_bl_peps_parsed.csv
    Diff is too large to be displayed.
  • sanctions/parsed/un_parsed.csv sanctions_and_peps/parsed/un_parsed.csv
    Content is identical
  • ■ ■ ■ ■ ■ ■
    sanctions_and_peps/ru_bl_peps_parser.py
     1 +import requests
     2 +import re
     3 +import csv
     4 +import argparse
     5 +
     6 +from bs4 import BeautifulSoup
     7 +
     8 +
     9 +NAME_PARSER = re.compile(r"\((.*?)\)")
     10 +
     11 +
     12 +def parse_args():
     13 +    parser = argparse.ArgumentParser()
     14 +    parser.add_argument("-o", "--out", type=str, required=True)
     15 +    return parser.parse_args()
     16 +
     17 +
     18 +def parse_name(compound_name):
     19 +    found_name_en = NAME_PARSER.findall(compound_name)
     20 +    if found_name_en:
     21 +        name_en = found_name_en[0].strip()
     22 +    else:
     23 +        name_en = None
     24 +    name_ru = compound_name.split("(")[0].strip()
     25 +    return name_en, name_ru
     26 +
     27 +
     28 +def main():
     29 +    args = parse_args()
     30 +
     31 +    url = "https://rupep.org/en/persons_list/"
     32 +
     33 +    html_text = requests.get(url).text
     34 +    soup = BeautifulSoup(html_text, "html.parser")
     35 +
     36 +    data = soup.find("table", "everything quicksilver_target").findAll("tr")
     37 +
     38 +    header = [
     39 +        "NAME_EN",
     40 +        "NAME_RU",
     41 +        "DOB",
     42 +        "TAXPAYER_NUM",
     43 +        "CATEGORY",
     44 +        "LAST_POSITION_EN",
     45 +        "LAST_POSITION_RU",
     46 +    ]
     47 +
     48 +    output = []
     49 +    for record in data[1:]:
     50 +        parsed_row = [None] * len(header)
     51 +        for i, column in enumerate(record.findAll("td")):
     52 +            column = column.text.strip()
     53 +            # parsed names
     54 +            if i == 0:
     55 +                name_en, name_ru = parse_name(column)
     56 +                parsed_row[0] = name_en
     57 +                parsed_row[1] = name_ru
     58 +            # DOB
     59 +            elif i == 1:
     60 +                if column:
     61 +                    parsed_row[2] = column
     62 +            # taxpayer num
     63 +            elif i == 2:
     64 +                if column:
     65 +                    parsed_row[3] = column
     66 +            # category
     67 +            elif i == 3:
     68 +                parsed_row[4] = column
     69 +            # parsed position names (indices 5 and 6; index 4 holds CATEGORY)
     70 +            elif i == 4:
     71 +                if column:
     72 +                    name_en, name_ru = parse_name(column)
     73 +                    parsed_row[5] = name_en
     74 +                    parsed_row[6] = name_ru
     75 +
     76 +        output.append(parsed_row)
     77 +
     78 +    with open(args.out, "w", newline="", encoding="utf-8") as f:
     79 +        writer = csv.writer(f)
     80 +        writer.writerow(header)
     81 +        for row in output:
     82 +            writer.writerow(row)
     83 +
     84 +
     85 +if __name__ == "__main__":
     86 +    main()
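The row-building logic of `ru_bl_peps_parser.py` can be exercised offline with a sketch like the one below. It reuses `NAME_PARSER` and `parse_name` from the file above. Everything else is an assumption made for illustration: the `cells_to_row` and `rows_to_csv` helpers, the use of `csv.DictWriter` in place of positional indices, and the sample record, which is invented and not taken from rupep.org.

```python
import csv
import io
import re

NAME_PARSER = re.compile(r"\((.*?)\)")

FIELDS = [
    "NAME_EN", "NAME_RU", "DOB", "TAXPAYER_NUM",
    "CATEGORY", "LAST_POSITION_EN", "LAST_POSITION_RU",
]


def parse_name(compound_name):
    """Split 'text (text in parentheses)' as the parser above does."""
    found = NAME_PARSER.findall(compound_name)
    name_en = found[0].strip() if found else None
    name_ru = compound_name.split("(")[0].strip()
    return name_en, name_ru


def cells_to_row(cells):
    """Map the five stripped <td> texts of one table row to named fields."""
    padded = list(cells) + [""] * (5 - len(cells))
    name_en, name_ru = parse_name(padded[0])
    pos_en, pos_ru = parse_name(padded[4]) if padded[4] else (None, None)
    return {
        "NAME_EN": name_en,
        "NAME_RU": name_ru,
        "DOB": padded[1] or None,
        "TAXPAYER_NUM": padded[2] or None,
        "CATEGORY": padded[3],
        "LAST_POSITION_EN": pos_en,
        "LAST_POSITION_RU": pos_ru,
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text; None values become empty cells."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record, for illustration only.
SAMPLE_CELLS = ["Иванов Иван (Ivanov Ivan)", "01.01.1970", "",
                "Domestic PEP", "Министр (Minister)"]

if __name__ == "__main__":
    print(rows_to_csv([cells_to_row(SAMPLE_CELLS)]))
```

Keying each value by column name rather than list position makes it harder for a position field to overwrite `CATEGORY`, and the header row can never drift out of step with the data rows.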
  • sanctions/source/un.xml sanctions_and_peps/source/un.xml
    Content is identical
  • sanctions/un_parser.py sanctions_and_peps/un_parser.py
    Content is identical