  • .streamlit/config.toml
    1 1  [theme]
    2 2  base="light"
     3 +primaryColor="#F63333"
  • README.md
    skipped 8 lines
    9 9   
    10 10  ## Tool Description
    11 11   
    12  -Financial crime journalists need to dig through complex corporate ownership databases (i.e. databases of companies and the people/companies that control those companies) in order to find potentially interesting people/companies related to financial crime. They face several problems along the way:
    13  -1. It is difficult to search across multiple publicly-available databases (UK Companies House, ICIJ Leaks, VK)
    14  -2. There are multiple ‘risk signatures’ associated with criminal activity (e.g. Cyclical or long-chain ownership, links to sanctions, etc) and different journalists prioritise different kinds of signatures in their investigation.
    15  -3. It is hard to prioritise which corporate ownership structures are more ‘risky’ than others
    16  -4. It is hard to see the visualise corporate ownership with different risk signals
     12 +Financial crime journalists need to dig through complex corporate ownership databases (i.e. databases of companies and the people/companies that control those companies) in order to find associations to criminal activity. They face several problems along the way:
     13 +1. It is difficult to search across multiple publicly-available databases (UK Companies House, Sanction lists, ICIJ Leaks, VK)
     14 +2. There are multiple ‘risk signatures’ associated with criminal activity (e.g. Cyclical or long-chain ownership, links to sanctions, etc) and different journalists prioritise different kinds of signatures in their investigation
     15 +3. The number of corporate networks is overwhelming, and so it is hard to prioritise which corporate ownership structures are more ‘risky’ than others
    17 16   
    18  -Corporate Risk Miner is a web app which evaluates different risk signatures of financial crime applied to the UK Companies House (UKCH) corporate ownership networks. These risk signatures include:
    19  -* Cyclic ownership: (to explain.....)
     17 +451 Corporate Risk Miner allows a user to navigate over different corporate ownership networks extracted from UK Companies House (UKCH) to identify and visualise those exhibiting risk signatures associated with financial crime. Example risk signatures include:
     18 +* Cyclic ownership: Circular company ownership (e.g. Company A owns Company B which owns Company C which owns Company A)
    20 19  * Long-chain ownership: Long chains of corporate ownership (e.g. Person A controls company A. Company A is an officer for Company B. Company B is an officer of company C. etc)
    21  -* Links to tax havens: Corporate networks which involve companies/people associated with tax haven jurisdictions
    22  -* Multi-jurisdictionness: Corporate networsk which span many jurisdictions
    23  -* Presence of proxy directors: Proxy directors are individual people who are registered as a company director but who are likely never involved in the running of the business. These people are often directors for many companies.
     20 +* Links to tax havens: Corporate networks which involve companies/people associated with tax haven or secrecy jurisdictions
     21 +* Presence of proxy directors: Proxy directors are individual people who are registered as a company director on paper but who are likely never involved in the running of the business.
    24 22  * Links to sanctioned entities: Official sanctioned people or companies, from sources such as the UN Sanctions List.
    25 23  * Links to politically-exposed persons (PEPs)
    26 24  * Links to disqualified directors
    27 25   
    28  -The user can customise the relative 'importance' of each risk signature for their search. For example one user may rate 'cyclic ownership' as a less important feature than 'association with tax havens' in flagging up potentially dodgy corporate networks. One the user chooses their signature preferences, the app generates a **risk score** associated with each corporate network and displays the structure of those networks with the highest risk scores.
     26 +The user can customise the relative importance of each risk signature for their search. The app then computes a **total risk score** for each corporate network in UKCH, and displays the details of the highest-risk networks. The user can export these results as a .csv file for later viewing.
    29 27   
    30 28  ## Installation
    31 29   
    skipped 27 lines
    59 57  - Any limitations of the current implementation of the tool
    60 58  - Motivation for design/architecture decisions
    61 59   
     60 +### Limitations
     61 +* Limited to cliques of ??? hop distance owing to space limitation
     62 +* Cyclicity calculation assumes an undirected graph to save computational time. This could be improved by taking into account specific directions of ownership.
     63 +* Entity resolution for company/people entities could be improved
     64 +* Graph visualisation for large corporate networks can be too cluttered to be useful.
     65 + 
     66 +### Potential next steps
     67 +* Expand to corporate ownership databases outside of the UK, for example using OpenCorporates data.
     68 +* Incorporate more external data sources identifying criminal or potentially-criminal activity for companies and people.
     69 +*
     70 + 
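The README text above says the app combines user-weighted risk signatures into a total risk score per network, but the diff does not show how that score is computed. The following is only a minimal sketch of one plausible approach, a weighted average. It assumes each signature score is already scaled to [0, 1] and that each weight is a slider value divided by `SLIDER_MAX`, as in `app.py`. The function name, signal names, and example values are hypothetical.

```python
from typing import Dict


def total_risk_score(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-signature scores (assumed to lie in [0, 1])."""
    # Signatures without a weight contribute nothing to the score.
    total_weight = sum(weights.get(name, 0.0) for name in signals)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights.get(name, 0.0) for name, score in signals.items())
    return weighted / total_weight


# Hypothetical per-network signature scores.
networks = {
    "network_a": {"cyclicity": 0.8, "long_chain": 0.2, "tax_haven": 1.0},
    "network_b": {"cyclicity": 0.0, "long_chain": 0.5, "tax_haven": 0.0},
}
# Hypothetical weights, e.g. slider value / SLIDER_MAX.
weights = {"cyclicity": 0.5, "long_chain": 0.5, "tax_haven": 1.0}

# Rank networks from highest to lowest total risk score.
ranked = sorted(
    networks,
    key=lambda name: total_risk_score(networks[name], weights),
    reverse=True,
)
print(ranked)
```

Dividing by the sum of weights keeps the score in [0, 1] regardless of how many sliders the user turns up, which makes scores comparable between searches.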
  • app/Untitled-1.ipynb
    1  -{
    2  - "cells": [
    3  - {
    4  - "cell_type": "code",
    5  - "execution_count": 1,
    6  - "metadata": {},
    7  - "outputs": [],
    8  - "source": [
    9  - "import pandas as pd\n",
    10  - "import polars as pl"
    11  - ]
    12  - },
    13  - {
    14  - "cell_type": "code",
    15  - "execution_count": null,
    16  - "metadata": {},
    17  - "outputs": [],
    18  - "source": [
    19  - "aws s3 cp s3://ca-amt-df-playground-1-sagemaker-notebook-bucket-eu-west-1/hackathlon/nodes.parquet/ . --recursive "
    20  - ]
    21  - },
    22  - {
    23  - "cell_type": "code",
    24  - "execution_count": 13,
    25  - "metadata": {},
    26  - "outputs": [
    27  - {
    28  - "data": {
    29  - "text/html": [
    30  - "<div>\n",
    31  - "<style scoped>\n",
    32  - " .dataframe tbody tr th:only-of-type {\n",
    33  - " vertical-align: middle;\n",
    34  - " }\n",
    35  - "\n",
    36  - " .dataframe tbody tr th {\n",
    37  - " vertical-align: top;\n",
    38  - " }\n",
    39  - "\n",
    40  - " .dataframe thead th {\n",
    41  - " text-align: right;\n",
    42  - " }\n",
    43  - "</style>\n",
    44  - "<table border=\"1\" class=\"dataframe\">\n",
    45  - " <thead>\n",
    46  - " <tr style=\"text-align: right;\">\n",
    47  - " <th></th>\n",
    48  - " <th>cyclicity</th>\n",
    49  - " <th>node_num</th>\n",
    50  - " </tr>\n",
    51  - " <tr>\n",
    52  - " <th>network_id</th>\n",
    53  - " <th></th>\n",
    54  - " <th></th>\n",
    55  - " </tr>\n",
    56  - " </thead>\n",
    57  - " <tbody>\n",
    58  - " <tr>\n",
    59  - " <th>28587319354</th>\n",
    60  - " <td>0.000000</td>\n",
    61  - " <td>9</td>\n",
    62  - " </tr>\n",
    63  - " <tr>\n",
    64  - " <th>19180640338</th>\n",
    65  - " <td>0.000000</td>\n",
    66  - " <td>7</td>\n",
    67  - " </tr>\n",
    68  - " <tr>\n",
    69  - " <th>29711988418</th>\n",
    70  - " <td>0.000000</td>\n",
    71  - " <td>9</td>\n",
    72  - " </tr>\n",
    73  - " <tr>\n",
    74  - " <th>30146913753</th>\n",
    75  - " <td>0.000000</td>\n",
    76  - " <td>17</td>\n",
    77  - " </tr>\n",
    78  - " <tr>\n",
    79  - " <th>41943095593</th>\n",
    80  - " <td>0.833333</td>\n",
    81  - " <td>18</td>\n",
    82  - " </tr>\n",
    83  - " <tr>\n",
    84  - " <th>...</th>\n",
    85  - " <td>...</td>\n",
    86  - " <td>...</td>\n",
    87  - " </tr>\n",
    88  - " <tr>\n",
    89  - " <th>10546100446</th>\n",
    90  - " <td>0.400000</td>\n",
    91  - " <td>5</td>\n",
    92  - " </tr>\n",
    93  - " <tr>\n",
    94  - " <th>12286972756</th>\n",
    95  - " <td>0.000000</td>\n",
    96  - " <td>7</td>\n",
    97  - " </tr>\n",
    98  - " <tr>\n",
    99  - " <th>20667544820</th>\n",
    100  - " <td>0.100000</td>\n",
    101  - " <td>10</td>\n",
    102  - " </tr>\n",
    103  - " <tr>\n",
    104  - " <th>6838088944</th>\n",
    105  - " <td>0.000000</td>\n",
    106  - " <td>3</td>\n",
    107  - " </tr>\n",
    108  - " <tr>\n",
    109  - " <th>22044908124</th>\n",
    110  - " <td>0.272727</td>\n",
    111  - " <td>11</td>\n",
    112  - " </tr>\n",
    113  - " </tbody>\n",
    114  - "</table>\n",
    115  - "<p>100000 rows × 2 columns</p>\n",
    116  - "</div>"
    117  - ],
    118  - "text/plain": [
    119  - " cyclicity node_num\n",
    120  - "network_id \n",
    121  - "28587319354 0.000000 9\n",
    122  - "19180640338 0.000000 7\n",
    123  - "29711988418 0.000000 9\n",
    124  - "30146913753 0.000000 17\n",
    125  - "41943095593 0.833333 18\n",
    126  - "... ... ...\n",
    127  - "10546100446 0.400000 5\n",
    128  - "12286972756 0.000000 7\n",
    129  - "20667544820 0.100000 10\n",
    130  - "6838088944 0.000000 3\n",
    131  - "22044908124 0.272727 11\n",
    132  - "\n",
    133  - "[100000 rows x 2 columns]"
    134  - ]
    135  - },
    136  - "execution_count": 13,
    137  - "metadata": {},
    138  - "output_type": "execute_result"
    139  - }
    140  - ],
    141  - "source": [
    142  - "pd.read_parquet(\"./data/network.parquet\").set_index(\"network_id\")"
    143  - ]
    144  - }
    145  - ],
    146  - "metadata": {
    147  - "kernelspec": {
    148  - "display_name": "Python 3.10.1 ('.venv': venv)",
    149  - "language": "python",
    150  - "name": "python3"
    151  - },
    152  - "language_info": {
    153  - "codemirror_mode": {
    154  - "name": "ipython",
    155  - "version": 3
    156  - },
    157  - "file_extension": ".py",
    158  - "mimetype": "text/x-python",
    159  - "name": "python",
    160  - "nbconvert_exporter": "python",
    161  - "pygments_lexer": "ipython3",
    162  - "version": "3.10.1"
    163  - },
    164  - "orig_nbformat": 4,
    165  - "vscode": {
    166  - "interpreter": {
    167  - "hash": "7df5df506cb5a387f46ba54efbbd2d65ccf2196d092f81edeb09eadb2dc38463"
    168  - }
    169  - }
    170  - },
    171  - "nbformat": 4,
    172  - "nbformat_minor": 2
    173  -}
    174  - 
  • app/__pycache__/utils.cpython-310.pyc
    Binary file.
  • app/__pycache__/utils.cpython-38.pyc
    Binary file.
  • app/app.py
    1 1  import json
     2 +import pandas as pd
    2 3  import streamlit as st
    3 4  from streamlit_agraph import agraph, Config
    4 5  from utils import (
    skipped 14 lines
    19 20  SLIDER_DEFAULT = 50
    20 21  DEFAULT_NUM_SUBGRAPHS_TO_SHOW = 3
    21 22  GRAPH_PLOT_HEIGHT_PX = 400
    22  -GRAPH_SIZE_RENDER_LIMIT = 30
     23 +GRAPH_SIZE_RENDER_LIMIT = 40
    23 24  subgraphs = get_subgraph_df()
    24 25   
    25 26  with st.sidebar:
    26  - st.title("Corporate risks")
     27 + st.title("451 Corporate Risk Miner")
    27 28   
    28 29   weight_chains = (
    29 30   st.slider(
    skipped 1 lines
    31 32   min_value=SLIDER_MIN,
    32 33   max_value=SLIDER_MAX,
    33 34   value=SLIDER_DEFAULT,
     35 + disabled=True,
    34 36   )
    35 37   / SLIDER_MAX
    36 38   )
    skipped 12 lines
    49 51   min_value=SLIDER_MIN,
    50 52   max_value=SLIDER_MAX,
    51 53   value=SLIDER_DEFAULT,
     54 + disabled=True,
    52 55   )
    53 56   / SLIDER_MAX
    54 57   )
    skipped 3 lines
    58 61   min_value=SLIDER_MIN,
    59 62   max_value=SLIDER_MAX,
    60 63   value=SLIDER_DEFAULT,
     64 + disabled=True,
    61 65   )
    62 66   / SLIDER_MAX
    63 67   )
    skipped 3 lines
    67 71   min_value=SLIDER_MIN,
    68 72   max_value=SLIDER_MAX,
    69 73   value=SLIDER_DEFAULT,
     74 + disabled=True,
    70 75   )
    71 76   / SLIDER_MAX
    72 77   )
    skipped 3 lines
    76 81   min_value=SLIDER_MIN,
    77 82   max_value=SLIDER_MAX,
    78 83   value=SLIDER_DEFAULT,
     84 + disabled=True,
    79 85   )
    80 86   / SLIDER_MAX
    81 87   )
    82  - # custom_names_a = st.multiselect(
    83  - # label="Custom persons of interest",
    84  - # options=nodes["node_id"],
    85  - # default=None,
    86  - # )
    87  - custom_names_b = st.file_uploader(label="Custom persons of interest", type="csv")
     88 + 
     89 + custom_names = st.file_uploader(
     90 + label="Custom persons/companies of interest", type="csv"
     91 + )
     92 + 
     93 + if custom_names:
     94 + custom_names = pd.read_csv(custom_names, header=None)[0].tolist()
     95 + st.write(custom_names)
    88 96   
    89 97   go = st.button("Go")
    90 98   
    skipped 42 lines
    133 141   config=Config(
    134 142   width=round(1080 / num_subgraphs_to_display),
    135 143   height=GRAPH_PLOT_HEIGHT_PX,
     144 + nodeHighlightBehavior=True,
     145 + highlightColor="#F7A7A6",
     146 + directed=True,
     147 + collapsible=True,
    136 148   ),
    137 149   )
    138 150   else:
    139 151   st.error("Subgraph is too large to render")
    140 152   
    141  - st.write(nodes_selected)
    142  - # # Build markdown strings for representing metadata
    143  - # markdown_strings = build_markdown_strings_for_node(nodes_selected)
     153 + # Build markdown strings for representing metadata
     154 + markdown_strings = build_markdown_strings_for_node(nodes_selected)
    144 155   
    145  - # st.markdown(":busts_in_silhouette: **People**")
    146  - # for p in markdown_strings["people"]:
    147  - # if ("SANCTIONED" in p) or ("PEP" in p):
    148  - # st.markdown(p)
    149  - # else:
    150  - # st.markdown(p)
     156 + st.markdown(":busts_in_silhouette: **People**")
     157 + for p in markdown_strings["people"]:
     158 + st.markdown(p)
    151 159   
    152  - # st.markdown(":office: **Companies**")
    153  - # for c in markdown_strings["companies"]:
    154  - # if ("SANCTIONED" in c) or ("PEP" in c):
    155  - # st.markdown(c)
    156  - # else:
    157  - # st.markdown(c)
     160 + st.markdown(":office: **Companies**")
     161 + for c in markdown_strings["companies"]:
     162 + st.markdown(c)
    158 163   
  • app/utils.py
    skipped 3 lines
    4 4  import pandas as pd
    5 5   
    6 6  NODE_COLOUR_NON_DODGY = "#72EF77"
    7  -NODE_COLOUR_DODGY = "#EF7272"
     7 +NODE_COLOUR_DODGY = "#F63333"
    8 8  NODE_IMAGE_PERSON = "http://i.ibb.co/LrY3tfw/747376.png" # https://www.flaticon.com/free-icon/user_747376
    9 9  NODE_IMAGE_COMPANY = "http://i.ibb.co/fx6r1dZ/4812244.png" # https://www.flaticon.com/free-icon/company_4812244
    10 10   
    skipped 52 lines
    63 63   node_objects.append(
    64 64   Node(
    65 65   id=row["node_id"],
    66  - label=row["node_id"].split("|")[0],
    67  - size=30,
     66 + label="\n".join(row["node_id"].split("|")[0].split(" ")),
     67 + size=20,
    68 68   # color=NODE_COLOUR_DODGY
    69 69   # if (row["pep"] > 0 or row["sanction"] > 0)
    70 70   # else NODE_COLOUR_NON_DODGY,
    71  - # image=NODE_IMAGE_PERSON
     71 + image=NODE_IMAGE_PERSON,
    72 72   # if row["is_person"] == 1
    73 73   # else NODE_IMAGE_COMPANY,
    74  - # shape="circularImage",
    75  - shape="circle",
     74 + shape="circularImage",
    76 75   )
    77 76   )
    78 77   
    skipped 1 lines
    80 79   edge_objects.append(
    81 80   Edge(
    82 81   source=row["source"],
    83  - label=row["type"],
     82 + # label=row["type"][0],
    84 83   target=row["target"],
    85 84   )
    86 85   )
    skipped 9 lines
    96 95   markdown_strings["people"] = []
    97 96   
    98 97   for _, row in nodes_selected.iterrows():
    99  - node_metadata = json.loads(row["node_metadata"])
    100  - node_sanctions = (
    101  - "" if row["sanction"] == 0 else f"! SANCTIONED: {row['sanction_metadata']}"
    102  - )
    103  - node_pep = "" if row["pep"] == 0 else f"! PEP: {row['pep_metadata']}"
     98 + node_metadata = {
     99 + "name": row["node_id"],
     100 + "is_proxy": row["proxy_dir"],
     101 + "is_person": True,
     102 + }
    104 103   
    105  - if row["is_person"] == 1:
    106  - node_title = f"{node_metadata['name']} [{node_metadata['nationality']}/{node_metadata['yob']}/{node_metadata['mob']}]"
     104 + # node_metadata = json.loads(row["node_metadata"])
     105 + # node_sanctions = (
     106 + # "" if row["sanction"] == 0 else f"! SANCTIONED: {row['sanction_metadata']}"
     107 + # )
     108 + # node_pep = "" if row["pep"] == 0 else f"! PEP: {row['pep_metadata']}"
     109 + 
     110 + node_sanctions = ""
     111 + node_pep = ""
     112 + 
     113 + if node_metadata["is_person"]:
     114 + # node_title = f"{node_metadata['name']} [{node_metadata['nationality']}/{node_metadata['yob']}/{node_metadata['mob']}]"
     115 + node_title = f"{node_metadata['name']}"
    107 116   key = "people"
    108 117   else:
    109  - node_title = f"{node_metadata['name']} [{row['jur']}/{node_metadata['reg']}/{node_metadata['address']}]"
     118 + # node_title = f"{node_metadata['name']} [{row['jur']}/{node_metadata['reg']}/{node_metadata['address']}]"
     119 + node_title = f"{node_metadata['name']}"
    110 120   key = "companies"
    111 121   
    112 122   markdown_strings[key].append(
    skipped 7 lines
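The node label change in `utils.py` keeps the part of `node_id` before the first `|` and puts each word on its own line so that long names fit inside a circular node. A small sketch of that transformation as a standalone helper follows. The helper name and the example `node_id` layout are assumptions. Unlike the diff, which uses `split(" ")`, this version uses `split()` so that repeated spaces do not produce empty label lines.

```python
def wrap_node_label(node_id: str) -> str:
    """Build a multi-line label from the name part of a '|'-delimited node id."""
    name = node_id.split("|")[0]
    # split() with no argument collapses runs of whitespace.
    return "\n".join(name.split())


# Hypothetical node id: name, then other fields separated by '|'.
print(wrap_node_label("JOHN  SMITH|GB|1970"))
```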