  • ■ ■ ■ ■ ■
    README.md
    1  -# Project name
     1 +# 451 Corporate Risk Miner
    2 2   
    3 3  ## Team Members
    4  -This section is a list of team members, and possibly links to GitHub/GitLab/LinkedIn/personal blog pages for members.
     4 +Elena Dulskyte [linkedin](https://www.linkedin.com/in/elena-dulskyte-50b83aa2/)
     5 +Marko Sahan [github](http://github.com/sahanmar) [linkedin](https://www.linkedin.com/in/msahan/)
     6 +Peter Zatka-Haas [github](http://github.com/peterzh) [linkedin](https://www.linkedin.com/in/peterzatkahaas)
    5 7   
    6 8  ## Tool Description
    7  -This sections discusses the purpose and motivation for the tool, and how it addresses a tool need you've identified.
     9 + 
     10 +Financial crime journalists need to dig through complex corporate ownership databases (i.e. databases of companies and the people/companies that control those companies) in order to find potentially interesting people/companies related to financial crime. They face several problems along the way:
     11 +1. It is difficult to search across multiple publicly-available databases (UK Companies House, ICIJ Leaks, VK)
     12 +2. There are multiple ‘risk signatures’ associated with criminal activity (e.g. cyclic or long-chain ownership, links to sanctions, etc.), and different journalists prioritise different kinds of signatures in their investigations.
     13 +3. It is hard to prioritise which corporate ownership structures are more ‘risky’ than others.
     14 +4. It is hard to visualise corporate ownership together with the different risk signals.
     15 + 
     16 +Corporate Risk Miner is a web app which allows users to evaluate different risk signatures of financial crime applied to the UK Companies House (UKCH) corporate ownership database. These risk signatures include:
     17 +* Cyclic ownership: (to explain.....)
     18 +* Long-chain ownership: Long chains of corporate ownership (e.g. Person A controls company A. Company A is an officer for Company B. Company B is an officer of company C. etc)
     19 +* Links to tax havens: Corporate networks which involve companies/people associated with tax haven jurisdictions
     20 +* Multi-jurisdictionness: Corporate networks which span many jurisdictions
     21 +* Presence of proxy directors: Proxy directors are individual people who are registered as a company director but who are likely never involved in the running of the business. These people are often directors for many companies.
     22 +* Links to sanctioned entities: Official sanctioned people or companies, from sources such as the UN Sanctions List.
     23 +* Links to politically-exposed persons (PEPs)
     24 +* Links to disqualified directors
     25 + 
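As an illustration of the long-chain signature listed above, here is a minimal sketch that measures the longest chain of control in a small edge list. It is not the project's implementation: the function name `longest_chain` and the `(controller, controlled)` tuple format are assumptions made for this example, and the sketch assumes the ownership graph is acyclic (a cyclic network would need separate handling).

```python
from functools import lru_cache


def longest_chain(edges):
    """Length (in edges) of the longest control chain in an acyclic edge list.

    `edges` is a list of hypothetical (controller, controlled) pairs.
    """
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)
        children.setdefault(target, [])

    @lru_cache(maxsize=None)
    def depth(node):
        # Longest chain that starts at `node`; 0 if it controls nothing.
        return max((1 + depth(child) for child in children[node]), default=0)

    return max((depth(node) for node in children), default=0)


# The README example: Person A -> Company A -> Company B -> Company C.
edges = [
    ("Person A", "Company A"),
    ("Company A", "Company B"),
    ("Company B", "Company C"),
    ("Person B", "Company C"),
]
print(longest_chain(edges))  # 3
```

A network could then be flagged when this length exceeds some threshold chosen by the investigator.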
     26 +The user can customise the relative 'importance' of each risk signature for their search. For example, one user may rate 'cyclic ownership' as a less important feature than 'association with tax havens' when flagging potentially dodgy corporate networks. Once the user chooses their signature preferences, the app generates a **risk score** for each corporate network and displays the structure of the networks with the highest risk scores.
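One way such a weighted risk score could be computed is sketched below. This is a hedged illustration rather than the app's actual scoring code: the helper name `weighted_risk_score`, the `weights` dictionary and the toy data are assumptions. The column names `cyclicity`, `num_sanctions` and `num_peps` are borrowed from the repository's `utils.py`. Each signal is divided by its maximum across networks, multiplied by the user's weight, and summed.

```python
import pandas as pd


def weighted_risk_score(networks: pd.DataFrame, weights: dict) -> pd.DataFrame:
    """Return a copy of `networks` sorted by a weighted sum of max-normalised signals."""
    out = networks.copy()
    total = pd.Series(0.0, index=out.index)
    for column, weight in weights.items():
        column_max = out[column].max()
        if column_max > 0:  # skip signals that are zero everywhere to avoid 0/0
            total = total + weight * out[column] / column_max
    out["total_risk"] = total
    return out.sort_values(by="total_risk", ascending=False)


# Toy data: three networks with made-up signal values.
networks = pd.DataFrame(
    {
        "cyclicity": [0.0, 0.5, 1.0],
        "num_sanctions": [2, 0, 0],
        "num_peps": [0, 0, 0],
    },
    index=pd.Index(["a", "b", "c"], name="network_id"),
)
ranked = weighted_risk_score(
    networks, {"cyclicity": 0.2, "num_sanctions": 0.9, "num_peps": 0.5}
)
print(ranked)
```

With these weights, network `a` ranks first because the sanctions signal dominates. Raising the `cyclicity` weight would move `c` up the list.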
    8 27   
    9 28  ## Installation
    10  -This section includes detailed instructions for installing the tool, including any terminal commands that need to be executed and dependencies that need to be installed. Instructions should be understandable by non-technical users (e.g. someone who knows how to open a terminal and run commands, but isn't necessarily a programmer), for example:
    11 29   
    12 30  1. Make sure you have Python version 3.8 or greater installed
    13 31   
    14 32  2. Download the tool's repository using the command:
     33 +```
     34 +git clone https://github.com/sahanmar/451
     35 +```
    15 36   
    16  - git clone https://github.com/bellingcat/hackathon-submission-template.git
     37 +3. Move to the tool's directory and install its dependencies
     38 +```
     39 +cd 451
     40 +pip install -r requirements.txt
     41 +```
    17 42   
    18  -3. Move to the tool's directory and install the tool
     43 +4. Start the Streamlit app
     44 +```
     45 +streamlit run app/app.py
     46 +```
    19 47   
    20  - cd hackathon-submission-template
    21  - pip install .
     48 +5. On your web browser, load [http://localhost:8501](http://localhost:8501)
    22 49   
    23 50  ## Usage
    24  -This sections includes detailed instructions for using the tool. If the tool has a command-line interface, include common commands and arguments, and some examples of commands and a description of the expected output. If the tool has a graphical user interface or a browser interface, include screenshots and describe a common workflow.
     51 + 
     52 +TBD
    25 53   
    26 54  ## Additional Information
    27 55  This section includes any additional information that you want to mention about the tool, including:
    skipped 4 lines
  • ■ ■ ■ ■ ■ ■
    app/Untitled-1.ipynb
     1 +{
     2 + "cells": [
     3 + {
     4 + "cell_type": "code",
     5 + "execution_count": 1,
     6 + "metadata": {},
     7 + "outputs": [],
     8 + "source": [
     9 + "import pandas as pd\n",
     10 + "import polars as pl"
     11 + ]
     12 + },
     13 + {
     14 + "cell_type": "code",
     15 + "execution_count": null,
     16 + "metadata": {},
     17 + "outputs": [],
     18 + "source": [
     19 + "!aws s3 cp s3://ca-amt-df-playground-1-sagemaker-notebook-bucket-eu-west-1/hackathlon/nodes.parquet/ . --recursive"
     20 + ]
     21 + },
     22 + {
     23 + "cell_type": "code",
     24 + "execution_count": 13,
     25 + "metadata": {},
     26 + "outputs": [
     27 + {
     28 + "data": {
     29 + "text/html": [
     30 + "<div>\n",
     31 + "<style scoped>\n",
     32 + " .dataframe tbody tr th:only-of-type {\n",
     33 + " vertical-align: middle;\n",
     34 + " }\n",
     35 + "\n",
     36 + " .dataframe tbody tr th {\n",
     37 + " vertical-align: top;\n",
     38 + " }\n",
     39 + "\n",
     40 + " .dataframe thead th {\n",
     41 + " text-align: right;\n",
     42 + " }\n",
     43 + "</style>\n",
     44 + "<table border=\"1\" class=\"dataframe\">\n",
     45 + " <thead>\n",
     46 + " <tr style=\"text-align: right;\">\n",
     47 + " <th></th>\n",
     48 + " <th>cyclicity</th>\n",
     49 + " <th>node_num</th>\n",
     50 + " </tr>\n",
     51 + " <tr>\n",
     52 + " <th>network_id</th>\n",
     53 + " <th></th>\n",
     54 + " <th></th>\n",
     55 + " </tr>\n",
     56 + " </thead>\n",
     57 + " <tbody>\n",
     58 + " <tr>\n",
     59 + " <th>28587319354</th>\n",
     60 + " <td>0.000000</td>\n",
     61 + " <td>9</td>\n",
     62 + " </tr>\n",
     63 + " <tr>\n",
     64 + " <th>19180640338</th>\n",
     65 + " <td>0.000000</td>\n",
     66 + " <td>7</td>\n",
     67 + " </tr>\n",
     68 + " <tr>\n",
     69 + " <th>29711988418</th>\n",
     70 + " <td>0.000000</td>\n",
     71 + " <td>9</td>\n",
     72 + " </tr>\n",
     73 + " <tr>\n",
     74 + " <th>30146913753</th>\n",
     75 + " <td>0.000000</td>\n",
     76 + " <td>17</td>\n",
     77 + " </tr>\n",
     78 + " <tr>\n",
     79 + " <th>41943095593</th>\n",
     80 + " <td>0.833333</td>\n",
     81 + " <td>18</td>\n",
     82 + " </tr>\n",
     83 + " <tr>\n",
     84 + " <th>...</th>\n",
     85 + " <td>...</td>\n",
     86 + " <td>...</td>\n",
     87 + " </tr>\n",
     88 + " <tr>\n",
     89 + " <th>10546100446</th>\n",
     90 + " <td>0.400000</td>\n",
     91 + " <td>5</td>\n",
     92 + " </tr>\n",
     93 + " <tr>\n",
     94 + " <th>12286972756</th>\n",
     95 + " <td>0.000000</td>\n",
     96 + " <td>7</td>\n",
     97 + " </tr>\n",
     98 + " <tr>\n",
     99 + " <th>20667544820</th>\n",
     100 + " <td>0.100000</td>\n",
     101 + " <td>10</td>\n",
     102 + " </tr>\n",
     103 + " <tr>\n",
     104 + " <th>6838088944</th>\n",
     105 + " <td>0.000000</td>\n",
     106 + " <td>3</td>\n",
     107 + " </tr>\n",
     108 + " <tr>\n",
     109 + " <th>22044908124</th>\n",
     110 + " <td>0.272727</td>\n",
     111 + " <td>11</td>\n",
     112 + " </tr>\n",
     113 + " </tbody>\n",
     114 + "</table>\n",
     115 + "<p>100000 rows × 2 columns</p>\n",
     116 + "</div>"
     117 + ],
     118 + "text/plain": [
     119 + " cyclicity node_num\n",
     120 + "network_id \n",
     121 + "28587319354 0.000000 9\n",
     122 + "19180640338 0.000000 7\n",
     123 + "29711988418 0.000000 9\n",
     124 + "30146913753 0.000000 17\n",
     125 + "41943095593 0.833333 18\n",
     126 + "... ... ...\n",
     127 + "10546100446 0.400000 5\n",
     128 + "12286972756 0.000000 7\n",
     129 + "20667544820 0.100000 10\n",
     130 + "6838088944 0.000000 3\n",
     131 + "22044908124 0.272727 11\n",
     132 + "\n",
     133 + "[100000 rows x 2 columns]"
     134 + ]
     135 + },
     136 + "execution_count": 13,
     137 + "metadata": {},
     138 + "output_type": "execute_result"
     139 + }
     140 + ],
     141 + "source": [
     142 + "pd.read_parquet(\"./data/network.parquet\").set_index(\"network_id\")"
     143 + ]
     144 + }
     145 + ],
     146 + "metadata": {
     147 + "kernelspec": {
     148 + "display_name": "Python 3.10.1 ('.venv': venv)",
     149 + "language": "python",
     150 + "name": "python3"
     151 + },
     152 + "language_info": {
     153 + "codemirror_mode": {
     154 + "name": "ipython",
     155 + "version": 3
     156 + },
     157 + "file_extension": ".py",
     158 + "mimetype": "text/x-python",
     159 + "name": "python",
     160 + "nbconvert_exporter": "python",
     161 + "pygments_lexer": "ipython3",
     162 + "version": "3.10.1"
     163 + },
     164 + "orig_nbformat": 4,
     165 + "vscode": {
     166 + "interpreter": {
     167 + "hash": "7df5df506cb5a387f46ba54efbbd2d65ccf2196d092f81edeb09eadb2dc38463"
     168 + }
     169 + }
     170 + },
     171 + "nbformat": 4,
     172 + "nbformat_minor": 2
     173 +}
     174 + 
  • app/__pycache__/utils.cpython-310.pyc
    Binary file.
  • app/__pycache__/utils.cpython-38.pyc
    Binary file.
  • ■ ■ ■ ■ ■ ■
    app/app.py
     1 +import json
     2 +import streamlit as st
     3 +from streamlit_agraph import agraph, Config
     4 +from utils import (
     5 +    build_agraph_components,
     6 +    get_subgraph_nodes_df,
     7 +    get_subgraph_df,
     8 +    get_subgraph_edges_df,
     9 +    get_subgraph_with_risk_score,
     10 +    build_markdown_strings_for_node,
     11 +)
     12 + 
     13 + 
     14 +st.set_page_config(layout="wide")
     15 + 
     16 + 
     17 +SLIDER_MIN = 0
     18 +SLIDER_MAX = 100
     19 +SLIDER_DEFAULT = 50
     20 +DEFAULT_NUM_SUBGRAPHS_TO_SHOW = 3
     21 +GRAPH_PLOT_HEIGHT_PX = 400
     22 +GRAPH_SIZE_RENDER_LIMIT = 30
     23 +subgraphs = get_subgraph_df()
     24 + 
     25 +with st.sidebar:
     26 +    st.title("Corporate risks")
     27 + 
     28 +    weight_chains = (
     29 +        st.slider(
     30 +            "Long ownership chains",
     31 +            min_value=SLIDER_MIN,
     32 +            max_value=SLIDER_MAX,
     33 +            value=SLIDER_DEFAULT,
     34 +        )
     35 +        / SLIDER_MAX
     36 +    )
     37 +    weight_cyclic = (
     38 +        st.slider(
     39 +            "Cyclic ownership",
     40 +            min_value=SLIDER_MIN,
     41 +            max_value=SLIDER_MAX,
     42 +            value=SLIDER_DEFAULT,
     43 +        )
     44 +        / SLIDER_MAX
     45 +    )
     46 +    weight_psc_haven = (
     47 +        st.slider(
     48 +            "Persons of significant control associated with tax havens",
     49 +            min_value=SLIDER_MIN,
     50 +            max_value=SLIDER_MAX,
     51 +            value=SLIDER_DEFAULT,
     52 +        )
     53 +        / SLIDER_MAX
     54 +    )
     55 +    weight_pep = (
     56 +        st.slider(
     57 +            "Officers/PSCs are politically exposed",
     58 +            min_value=SLIDER_MIN,
     59 +            max_value=SLIDER_MAX,
     60 +            value=SLIDER_DEFAULT,
     61 +        )
     62 +        / SLIDER_MAX
     63 +    )
     64 +    weight_sanctions = (
     65 +        st.slider(
     66 +            "Officers/PSCs/Companies are sanctioned",
     67 +            min_value=SLIDER_MIN,
     68 +            max_value=SLIDER_MAX,
     69 +            value=SLIDER_DEFAULT,
     70 +        )
     71 +        / SLIDER_MAX
     72 +    )
     73 +    weight_disqualified = (
     74 +        st.slider(
     75 +            "Officers are disqualified directors",
     76 +            min_value=SLIDER_MIN,
     77 +            max_value=SLIDER_MAX,
     78 +            value=SLIDER_DEFAULT,
     79 +        )
     80 +        / SLIDER_MAX
     81 +    )
     82 +    # custom_names_a = st.multiselect(
     83 +    #     label="Custom persons of interest",
     84 +    #     options=nodes["node_id"],
     85 +    #     default=None,
     86 +    # )
     87 +    custom_names_b = st.file_uploader(label="Custom persons of interest", type="csv")
     88 + 
     89 +    go = st.button("Go")
     90 + 
     91 + 
     92 +with st.container():
     93 + 
     94 +    subgraph_with_risk_scores = get_subgraph_with_risk_score(
     95 +        subgraphs,
     96 +        weight_chains=weight_chains,
     97 +        weight_cyclic=weight_cyclic,
     98 +        weight_psc_haven=weight_psc_haven,
     99 +        weight_pep=weight_pep,
     100 +        weight_sanctions=weight_sanctions,
     101 +        weight_disqualified=weight_disqualified,
     102 +    )
     103 + 
     104 +    st.dataframe(data=subgraph_with_risk_scores, use_container_width=True)
     105 + 
     106 +    selected_subgraph_hashes = st.multiselect(
     107 +        label="Select corporate network(s) to explore",
     108 +        options=list(subgraph_with_risk_scores.index),
     109 +        default=list(
     110 +            subgraph_with_risk_scores.head(DEFAULT_NUM_SUBGRAPHS_TO_SHOW).index
     111 +        ),
     112 +    )
     113 + 
     114 + 
     115 +with st.container():
     116 +    num_subgraphs_to_display = len(selected_subgraph_hashes)
     117 + 
     118 +    if num_subgraphs_to_display > 0:
     119 +        cols = st.columns(num_subgraphs_to_display)
     120 + 
     121 +        for c, subgraph_hash in enumerate(selected_subgraph_hashes):
     122 +            nodes_selected = get_subgraph_nodes_df(subgraph_hash)
     123 +            edges_selected = get_subgraph_edges_df(subgraph_hash)
     124 + 
     125 +            with cols[c]:
     126 +                if len(nodes_selected) < GRAPH_SIZE_RENDER_LIMIT:
     127 +                    (node_objects, edge_objects) = build_agraph_components(
     128 +                        nodes_selected, edges_selected
     129 +                    )
     130 +                    agraph(
     131 +                        nodes=node_objects,
     132 +                        edges=edge_objects,
     133 +                        config=Config(
     134 +                            width=round(1080 / num_subgraphs_to_display),
     135 +                            height=GRAPH_PLOT_HEIGHT_PX,
     136 +                        ),
     137 +                    )
     138 +                else:
     139 +                    st.error("Subgraph is too large to render")
     140 + 
     141 +                st.write(nodes_selected)
     142 +                # # Build markdown strings for representing metadata
     143 +                # markdown_strings = build_markdown_strings_for_node(nodes_selected)
     144 + 
     145 +                # st.markdown(":busts_in_silhouette: **People**")
     146 +                # for p in markdown_strings["people"]:
     147 +                #     if ("SANCTIONED" in p) or ("PEP" in p):
     148 +                #         st.markdown(p)
     149 +                #     else:
     150 +                #         st.markdown(p)
     151 + 
     152 +                # st.markdown(":office: **Companies**")
     153 +                # for c in markdown_strings["companies"]:
     154 +                #     if ("SANCTIONED" in c) or ("PEP" in c):
     155 +                #         st.markdown(c)
     156 +                #     else:
     157 +                #         st.markdown(c)
     158 + 
  • ■ ■ ■ ■ ■ ■
    app/utils.py
     1 +import streamlit as st
     2 +from streamlit_agraph import Node, Edge
     3 +import json
     4 +import pandas as pd
     5 + 
     6 +NODE_COLOUR_NON_DODGY = "#72EF77"
     7 +NODE_COLOUR_DODGY = "#EF7272"
     8 +NODE_IMAGE_PERSON = "http://i.ibb.co/LrY3tfw/747376.png" # https://www.flaticon.com/free-icon/user_747376
     9 +NODE_IMAGE_COMPANY = "http://i.ibb.co/fx6r1dZ/4812244.png" # https://www.flaticon.com/free-icon/company_4812244
     10 + 
     11 + 
     12 +@st.cache()
     13 +def get_subgraph_df():
     14 +    return pd.read_parquet("./data/network.parquet", engine="pyarrow").set_index(
     15 +        "network_id"
     16 +    )
     17 + 
     18 + 
     19 +@st.cache()
     20 +def get_subgraph_nodes_df(subgraph_hash):
     21 +    return pd.read_parquet(
     22 +        "./data/nodes.parquet",
     23 +        filters=[[("subgraph_hash", "=", subgraph_hash)]],
     24 +        engine="pyarrow",
     25 +    )
     26 + 
     27 + 
     28 +@st.cache()
     29 +def get_subgraph_edges_df(subgraph_hash):
     30 +    return pd.read_parquet(
     31 +        "./data/edges.parquet",
     32 +        filters=[[("subgraph_hash", "=", subgraph_hash)]],
     33 +        engine="pyarrow",
     34 +    )
     35 + 
     36 + 
     37 +def get_subgraph_with_risk_score(
     38 +    subgraph_table,
     39 +    weight_chains,
     40 +    weight_cyclic,
     41 +    weight_psc_haven,
     42 +    weight_pep,
     43 +    weight_sanctions,
     44 +    weight_disqualified,
     45 +):
     46 + 
     47 +    out = subgraph_table.copy()
     48 +    out["total_risk"] = out["cyclicity"] * weight_cyclic / out["cyclicity"].max()
     49 +    return out.sort_values(by="total_risk", ascending=False)
     50 + 
     51 + 
     52 +def build_agraph_components(
     53 +    nodes,
     54 +    edges,
     55 +):
     56 +    """Create agraph object from node and edge list"""
     57 + 
     58 +    node_objects = []
     59 +    edge_objects = []
     60 + 
     61 +    for _, row in nodes.iterrows():
     62 +        # node_metadata = json.loads(row["node_metadata"])
     63 +        node_objects.append(
     64 +            Node(
     65 +                id=row["node_id"],
     66 +                label=row["node_id"].split("|")[0],
     67 +                size=30,
     68 +                # color=NODE_COLOUR_DODGY
     69 +                # if (row["pep"] > 0 or row["sanction"] > 0)
     70 +                # else NODE_COLOUR_NON_DODGY,
     71 +                # image=NODE_IMAGE_PERSON
     72 +                # if row["is_person"] == 1
     73 +                # else NODE_IMAGE_COMPANY,
     74 +                # shape="circularImage",
     75 +                shape="circle",
     76 +            )
     77 +        )
     78 + 
     79 +    for _, row in edges.iterrows():
     80 +        edge_objects.append(
     81 +            Edge(
     82 +                source=row["source"],
     83 +                label=row["type"],
     84 +                target=row["target"],
     85 +            )
     86 +        )
     87 + 
     88 +    return (node_objects, edge_objects)
     89 + 
     90 + 
     91 +def build_markdown_strings_for_node(nodes_selected):
     92 +    """Separate into People and Company strings"""
     93 + 
     94 +    markdown_strings = dict()
     95 +    markdown_strings["companies"] = []
     96 +    markdown_strings["people"] = []
     97 + 
     98 +    for _, row in nodes_selected.iterrows():
     99 +        node_metadata = json.loads(row["node_metadata"])
     100 +        node_sanctions = (
     101 +            "" if row["sanction"] == 0 else f"! SANCTIONED: {row['sanction_metadata']}"
     102 +        )
     103 +        node_pep = "" if row["pep"] == 0 else f"! PEP: {row['pep_metadata']}"
     104 + 
     105 +        if row["is_person"] == 1:
     106 +            node_title = f"{node_metadata['name']} [{node_metadata['nationality']}/{node_metadata['yob']}/{node_metadata['mob']}]"
     107 +            key = "people"
     108 +        else:
     109 +            node_title = f"{node_metadata['name']} [{row['jur']}/{node_metadata['reg']}/{node_metadata['address']}]"
     110 +            key = "companies"
     111 + 
     112 +        markdown_strings[key].append(
     113 +            "\n".join(
     114 +                [x for x in ["```", node_title, node_pep, node_sanctions] if len(x) > 0]
     115 +            )
     116 +        )
     117 + 
     118 +    return markdown_strings
     119 + 
  • ■ ■ ■ ■ ■ ■
    src/app.py
    1  -import streamlit as st
    2  -from streamlit_agraph import agraph, Config
    3  -from utils import (
    4  - build_agraph_components,
    5  - get_edges_df,
    6  - get_subgraph_df,
    7  - get_nodes_df,
    8  - get_subgraph_with_risk_score,
    9  -)
    10  - 
    11  - 
    12  -st.set_page_config(layout="wide")
    13  - 
    14  - 
    15  -SLIDER_MIN = 0
    16  -SLIDER_MAX = 100
    17  -SLIDER_DEFAULT = 50
    18  -DEFAULT_NUM_SUBGRAPHS_TO_SHOW = 3
    19  - 
    20  -nodes = get_nodes_df()
    21  -edges = get_edges_df()
    22  -subgraphs = get_subgraph_df()
    23  - 
    24  -with st.sidebar:
    25  - st.title("Corporate risks")
    26  - 
    27  - weight_chains = (
    28  - st.slider(
    29  - "Long ownership chains",
    30  - min_value=SLIDER_MIN,
    31  - max_value=SLIDER_MAX,
    32  - value=SLIDER_DEFAULT,
    33  - )
    34  - / SLIDER_MAX
    35  - )
    36  - weight_cyclic = (
    37  - st.slider(
    38  - "Cyclic ownership",
    39  - min_value=SLIDER_MIN,
    40  - max_value=SLIDER_MAX,
    41  - value=SLIDER_DEFAULT,
    42  - )
    43  - / SLIDER_MAX
    44  - )
    45  - weight_psc_haven = (
    46  - st.slider(
    47  - "Persons of significant control associated with tax havens",
    48  - min_value=SLIDER_MIN,
    49  - max_value=SLIDER_MAX,
    50  - value=SLIDER_DEFAULT,
    51  - )
    52  - / SLIDER_MAX
    53  - )
    54  - weight_pep = (
    55  - st.slider(
    56  - "Officers/PSCs are politically exposed",
    57  - min_value=SLIDER_MIN,
    58  - max_value=SLIDER_MAX,
    59  - value=SLIDER_DEFAULT,
    60  - )
    61  - / SLIDER_MAX
    62  - )
    63  - weight_sanctions = (
    64  - st.slider(
    65  - "Officers/PSCs/Companies are sanctioned",
    66  - min_value=SLIDER_MIN,
    67  - max_value=SLIDER_MAX,
    68  - value=SLIDER_DEFAULT,
    69  - )
    70  - / SLIDER_MAX
    71  - )
    72  - weight_disqualified = (
    73  - st.slider(
    74  - "Officers are disqualified directors",
    75  - min_value=SLIDER_MIN,
    76  - max_value=SLIDER_MAX,
    77  - value=SLIDER_DEFAULT,
    78  - )
    79  - / SLIDER_MAX
    80  - )
    81  - custom_names_a = st.multiselect(
    82  - label="Custom persons of interest",
    83  - options=nodes["node_id"],
    84  - default=None,
    85  - )
    86  - custom_names_b = st.file_uploader(label="Custom persons of interest", type="csv")
    87  - 
    88  - go = st.button("Go")
    89  - 
    90  - 
    91  -with st.container():
    92  - 
    93  - subgraph_with_risk_scores = get_subgraph_with_risk_score(
    94  - subgraphs,
    95  - weight_chains=weight_chains,
    96  - weight_cyclic=weight_cyclic,
    97  - weight_psc_haven=weight_psc_haven,
    98  - weight_pep=weight_pep,
    99  - weight_sanctions=weight_sanctions,
    100  - weight_disqualified=weight_disqualified,
    101  - )
    102  - 
    103  - st.dataframe(data=subgraph_with_risk_scores, use_container_width=True)
    104  - 
    105  - selected_subgraph_hashes = st.multiselect(
    106  - label="Select corporate network(s) to explore",
    107  - options=list(subgraph_with_risk_scores.index),
    108  - default=list(
    109  - subgraph_with_risk_scores.head(DEFAULT_NUM_SUBGRAPHS_TO_SHOW).index
    110  - ),
    111  - )
    112  - 
    113  - 
    114  -with st.container():
    115  - num_subgraphs_to_display = len(selected_subgraph_hashes)
    116  - cols = st.columns(num_subgraphs_to_display)
    117  - 
    118  - for c, subgraph_hash in enumerate(selected_subgraph_hashes):
    119  - nodes_selected = nodes.loc[nodes["subgraph_hash"] == subgraph_hash]
    120  - edges_selected = edges.loc[edges["subgraph_hash"] == subgraph_hash]
    121  - 
    122  - with cols[c]:
    123  - (node_objects, edge_objects) = build_agraph_components(
    124  - nodes_selected, edges_selected
    125  - )
    126  - agraph(
    127  - nodes=node_objects,
    128  - edges=edge_objects,
    129  - config=Config(
    130  - width=round(1080 / num_subgraphs_to_display),
    131  - height=200,
    132  - ),
    133  - )
    134  - 
    135  - st.markdown("*People*")
    136  - st.dataframe(
    137  - nodes_selected.query("is_person == 1"),
    138  - use_container_width=True,
    139  - )
    140  - 
    141  - st.markdown("*Companies*")
    142  - st.dataframe(
    143  - nodes_selected.query("is_person == 0"),
    144  - use_container_width=True,
    145  - )
    146  - 
  • ■ ■ ■ ■ ■ ■
    src/utils.py
    1  -from curses import use_default_colors
    2  -import streamlit as st
    3  -from streamlit_agraph import Node, Edge
    4  -import json
    5  -import pandas as pd
    6  - 
    7  -NODE_COLOUR_NON_DODGY = "#72EF77"
    8  -NODE_COLOUR_DODGY = "#EF7272"
    9  -NODE_IMAGE_PERSON = "http://i.ibb.co/LrY3tfw/747376.png" # https://www.flaticon.com/free-icon/user_747376
    10  -NODE_IMAGE_COMPANY = "http://i.ibb.co/fx6r1dZ/4812244.png" # https://www.flaticon.com/free-icon/company_4812244
    11  - 
    12  -# @st.cache()
    13  -def get_subgraph_df():
    14  - return pd.read_csv("./data/subgraphs.csv", index_col="subgraph_hash")
    15  - 
    16  - 
    17  -# @st.cache()
    18  -def get_nodes_df():
    19  - return pd.read_csv("./data/nodes.csv")
    20  - 
    21  - 
    22  -# @st.cache()
    23  -def get_edges_df():
    24  - return pd.read_csv("./data/edges.csv")
    25  - 
    26  - 
    27  -def get_subgraph_with_risk_score(
    28  - subgraph_table,
    29  - weight_chains,
    30  - weight_cyclic,
    31  - weight_psc_haven,
    32  - weight_pep,
    33  - weight_sanctions,
    34  - weight_disqualified,
    35  -):
    36  - 
    37  - out = subgraph_table.copy()
    38  - out["total_risk"] = (
    39  - (out["cyclicity"] * weight_cyclic / out["cyclicity"].max())
    40  - + (
    41  - out["multi_jurisdiction"]
    42  - * weight_psc_haven
    43  - / out["multi_jurisdiction"].max()
    44  - )
    45  - + (out["num_sanctions"] * weight_sanctions / out["num_sanctions"].max())
    46  - + (out["num_peps"] * weight_pep / out["num_peps"].max())
    47  - )
    48  - return out.sort_values(by="total_risk", ascending=False)
    49  - 
    50  - 
    51  -def build_agraph_components(
    52  - nodes,
    53  - edges,
    54  -):
    55  - """Create agraph object from node and edge list"""
    56  - 
    57  - node_objects = []
    58  - edge_objects = []
    59  - 
    60  - for _, row in nodes.iterrows():
    61  - node_metadata = json.loads(row["node_metadata"])
    62  - node_objects.append(
    63  - Node(
    64  - id=row["node_id"],
    65  - label=node_metadata["name"],
    66  - size=25,
    67  - color=NODE_COLOUR_DODGY
    68  - if (row["pep"] > 0 or row["sanction"] > 0)
    69  - else NODE_COLOUR_NON_DODGY,
    70  - image=NODE_IMAGE_PERSON
    71  - if row["is_person"] == 1
    72  - else NODE_IMAGE_COMPANY,
    73  - shape="circularImage",
    74  - )
    75  - )
    76  - 
    77  - for _, row in edges.iterrows():
    78  - edge_objects.append(
    79  - Edge(
    80  - source=row["source"],
    81  - label=row["type"],
    82  - target=row["target"],
    83  - )
    84  - )
    85  - 
    86  - return (node_objects, edge_objects)
    87  - 