STRLCPY/451-CorporateRiskMiner

■ ■ ■ ■ ■ ■

README.md

1		-	# Project name
	1	+	# 451 Corporate Risk Miner
2	2
3	3		## Team Members
4		-	This section is a list of team members, and possibly links to GitHub/GitLab/LinkedIn/personal blog pages for members.
	4	+	* Elena Dulskyte [linkedin](https://www.linkedin.com/in/elena-dulskyte-50b83aa2/)
	5	+	* Marko Sahan [github](http://github.com/sahanmar) [linkedin](https://www.linkedin.com/in/msahan/)
	6	+	* Peter Zatka-Haas [github](http://github.com/peterzh) [linkedin](https://www.linkedin.com/in/peterzatkahaas)
5	7
6	8		## Tool Description
7		-	This sections discusses the purpose and motivation for the tool, and how it addresses a tool need you've identified.
	9	+
	10	+	Financial crime journalists need to dig through complex corporate ownership databases (i.e. databases of companies and the people/companies that control those companies) in order to find potentially interesting people/companies related to financial crime. They face several problems along the way:
	11	+	1. It is difficult to search across multiple publicly available databases (UK Companies House, ICIJ Leaks, VK).
	12	+	2. There are multiple ‘risk signatures’ associated with criminal activity (e.g. cyclical or long-chain ownership, links to sanctions, etc.), and different journalists prioritise different kinds of signatures in their investigations.
	13	+	3. It is hard to prioritise which corporate ownership structures are more ‘risky’ than others.
	14	+	4. It is hard to visualise corporate ownership structures together with their different risk signals.
	15	+
	16	+	Corporate Risk Miner is a web app which allows users to evaluate different risk signatures of financial crime applied to the UK Companies House (UKCH) corporate ownership database. These risk signatures include:
	17	+	* Cyclic ownership: (to explain.....)
	18	+	* Long-chain ownership: Long chains of corporate ownership (e.g. Person A controls Company A, Company A is an officer of Company B, Company B is an officer of Company C, etc.)
	19	+	* Links to tax havens: Corporate networks which involve companies/people associated with tax haven jurisdictions
	20	+	* Multi-jurisdictionness: Corporate networks which span many jurisdictions
	21	+	* Presence of proxy directors: Proxy directors are individual people who are registered as a company director but who are likely never involved in the running of the business. These people are often directors for many companies.
	22	+	* Links to sanctioned entities: Officially sanctioned people or companies, from sources such as the UN Sanctions List.
	23	+	* Links to politically-exposed persons (PEPs)
	24	+	* Links to disqualified directors
	25	+
	26	+	The user can customise the relative 'importance' of each risk signature for their search. For example, one user may rate 'cyclic ownership' as a less important feature than 'association with tax havens' when flagging potentially dodgy corporate networks. Once the user chooses their signature preferences, the app generates a risk score for each corporate network and displays the structure of the networks with the highest risk scores.
8	27
9	28		## Installation
10		-	This section includes detailed instructions for installing the tool, including any terminal commands that need to be executed and dependencies that need to be installed. Instructions should be understandable by non-technical users (e.g. someone who knows how to open a terminal and run commands, but isn't necessarily a programmer), for example:
11	29
12	30		1. Make sure you have Python version 3.8 or greater installed
13	31
14	32		2. Download the tool's repository using the command:
	33	+	```
	34	+	git clone https://github.com/sahanmar/451
	35	+	```
15	36
16		-	git clone https://github.com/bellingcat/hackathon-submission-template.git
	37	+	3. Move to the tool's directory and install the tool
	38	+	```
	39	+	cd 451
	40	+	pip install -r requirements.txt
	41	+	```
17	42
18		-	3. Move to the tool's directory and install the tool
	43	+	4. Start the streamlit app
	44	+	```
	45	+	streamlit run app/app.py
	46	+	```
19	47
20		-	cd hackathon-submission-template
21		-	pip install .
	48	+	5. In your web browser, load [http://localhost:8501](http://localhost:8501)
22	49
23	50		## Usage
24		-	This sections includes detailed instructions for using the tool. If the tool has a command-line interface, include common commands and arguments, and some examples of commands and a description of the expected output. If the tool has a graphical user interface or a browser interface, include screenshots and describe a common workflow.
	51	+
	52	+	TBD
25	53
26	54		## Additional Information
27	55		This section includes any additional information that you want to mention about the tool, including:
		skipped 4 lines
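
The README describes the long-chain ownership signature only in words. As a rough illustration, and not the project's actual implementation, the sketch below measures the longest ownership chain in a small network. The function name `longest_chain_length` and the `(controller, controlled)` edge format are assumptions made for this example. It explores simple paths exhaustively, which is only practical for small networks.

```python
from collections import defaultdict


def longest_chain_length(edges):
    """Return the number of edges in the longest simple ownership chain.

    `edges` is an iterable of (controller, controlled) pairs.
    This is a hypothetical helper, not part of the repository.
    """
    children = defaultdict(list)
    nodes = set()
    for source, target in edges:
        children[source].append(target)
        nodes.update((source, target))

    def walk(node, on_path):
        # Longest chain starting at `node` that never revisits a node,
        # so cyclic ownership cannot cause infinite recursion.
        best = 0
        for child in children[node]:
            if child not in on_path:
                best = max(best, 1 + walk(child, on_path | {child}))
        return best

    return max((walk(node, {node}) for node in nodes), default=0)


# The example chain from the README: Person A -> Company A -> Company B -> Company C
edges = [
    ("Person A", "Company A"),
    ("Company A", "Company B"),
    ("Company B", "Company C"),
]
print(longest_chain_length(edges))  # 3
```

A network could then be flagged when this length exceeds a threshold chosen by the user. What threshold is appropriate is not stated in the README.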

■ ■ ■ ■ ■ ■

app/Untitled-1.ipynb

1	+	{
2	+	"cells": [
3	+	{
4	+	"cell_type": "code",
5	+	"execution_count": 1,
6	+	"metadata": {},
7	+	"outputs": [],
8	+	"source": [
9	+	"import pandas as pd\n",
10	+	"import polars as pl"
11	+	]
12	+	},
13	+	{
14	+	"cell_type": "code",
15	+	"execution_count": null,
16	+	"metadata": {},
17	+	"outputs": [],
18	+	"source": [
19	+	"!aws s3 cp s3://ca-amt-df-playground-1-sagemaker-notebook-bucket-eu-west-1/hackathlon/nodes.parquet/ . --recursive"
20	+	]
21	+	},
22	+	{
23	+	"cell_type": "code",
24	+	"execution_count": 13,
25	+	"metadata": {},
26	+	"outputs": [
27	+	{
28	+	"data": {
29	+	"text/html": [
30	+	"<div>\n",
31	+	"<style scoped>\n",
32	+	" .dataframe tbody tr th:only-of-type {\n",
33	+	" vertical-align: middle;\n",
34	+	" }\n",
35	+	"\n",
36	+	" .dataframe tbody tr th {\n",
37	+	" vertical-align: top;\n",
38	+	" }\n",
39	+	"\n",
40	+	" .dataframe thead th {\n",
41	+	" text-align: right;\n",
42	+	" }\n",
43	+	"</style>\n",
44	+	"<table border=\"1\" class=\"dataframe\">\n",
45	+	" <thead>\n",
46	+	" <tr style=\"text-align: right;\">\n",
47	+	" <th></th>\n",
48	+	" <th>cyclicity</th>\n",
49	+	" <th>node_num</th>\n",
50	+	" </tr>\n",
51	+	" <tr>\n",
52	+	" <th>network_id</th>\n",
53	+	" <th></th>\n",
54	+	" <th></th>\n",
55	+	" </tr>\n",
56	+	" </thead>\n",
57	+	" <tbody>\n",
58	+	" <tr>\n",
59	+	" <th>28587319354</th>\n",
60	+	" <td>0.000000</td>\n",
61	+	" <td>9</td>\n",
62	+	" </tr>\n",
63	+	" <tr>\n",
64	+	" <th>19180640338</th>\n",
65	+	" <td>0.000000</td>\n",
66	+	" <td>7</td>\n",
67	+	" </tr>\n",
68	+	" <tr>\n",
69	+	" <th>29711988418</th>\n",
70	+	" <td>0.000000</td>\n",
71	+	" <td>9</td>\n",
72	+	" </tr>\n",
73	+	" <tr>\n",
74	+	" <th>30146913753</th>\n",
75	+	" <td>0.000000</td>\n",
76	+	" <td>17</td>\n",
77	+	" </tr>\n",
78	+	" <tr>\n",
79	+	" <th>41943095593</th>\n",
80	+	" <td>0.833333</td>\n",
81	+	" <td>18</td>\n",
82	+	" </tr>\n",
83	+	" <tr>\n",
84	+	" <th>...</th>\n",
85	+	" <td>...</td>\n",
86	+	" <td>...</td>\n",
87	+	" </tr>\n",
88	+	" <tr>\n",
89	+	" <th>10546100446</th>\n",
90	+	" <td>0.400000</td>\n",
91	+	" <td>5</td>\n",
92	+	" </tr>\n",
93	+	" <tr>\n",
94	+	" <th>12286972756</th>\n",
95	+	" <td>0.000000</td>\n",
96	+	" <td>7</td>\n",
97	+	" </tr>\n",
98	+	" <tr>\n",
99	+	" <th>20667544820</th>\n",
100	+	" <td>0.100000</td>\n",
101	+	" <td>10</td>\n",
102	+	" </tr>\n",
103	+	" <tr>\n",
104	+	" <th>6838088944</th>\n",
105	+	" <td>0.000000</td>\n",
106	+	" <td>3</td>\n",
107	+	" </tr>\n",
108	+	" <tr>\n",
109	+	" <th>22044908124</th>\n",
110	+	" <td>0.272727</td>\n",
111	+	" <td>11</td>\n",
112	+	" </tr>\n",
113	+	" </tbody>\n",
114	+	"</table>\n",
115	+	"<p>100000 rows × 2 columns</p>\n",
116	+	"</div>"
117	+	],
118	+	"text/plain": [
119	+	" cyclicity node_num\n",
120	+	"network_id \n",
121	+	"28587319354 0.000000 9\n",
122	+	"19180640338 0.000000 7\n",
123	+	"29711988418 0.000000 9\n",
124	+	"30146913753 0.000000 17\n",
125	+	"41943095593 0.833333 18\n",
126	+	"... ... ...\n",
127	+	"10546100446 0.400000 5\n",
128	+	"12286972756 0.000000 7\n",
129	+	"20667544820 0.100000 10\n",
130	+	"6838088944 0.000000 3\n",
131	+	"22044908124 0.272727 11\n",
132	+	"\n",
133	+	"[100000 rows x 2 columns]"
134	+	]
135	+	},
136	+	"execution_count": 13,
137	+	"metadata": {},
138	+	"output_type": "execute_result"
139	+	}
140	+	],
141	+	"source": [
142	+	"pd.read_parquet(\"./data/network.parquet\").set_index(\"network_id\")"
143	+	]
144	+	}
145	+	],
146	+	"metadata": {
147	+	"kernelspec": {
148	+	"display_name": "Python 3.10.1 ('.venv': venv)",
149	+	"language": "python",
150	+	"name": "python3"
151	+	},
152	+	"language_info": {
153	+	"codemirror_mode": {
154	+	"name": "ipython",
155	+	"version": 3
156	+	},
157	+	"file_extension": ".py",
158	+	"mimetype": "text/x-python",
159	+	"name": "python",
160	+	"nbconvert_exporter": "python",
161	+	"pygments_lexer": "ipython3",
162	+	"version": "3.10.1"
163	+	},
164	+	"orig_nbformat": 4,
165	+	"vscode": {
166	+	"interpreter": {
167	+	"hash": "7df5df506cb5a387f46ba54efbbd2d65ccf2196d092f81edeb09eadb2dc38463"
168	+	}
169	+	}
170	+	},
171	+	"nbformat": 4,
172	+	"nbformat_minor": 2
173	+	}
174	+

app/__pycache__/utils.cpython-310.pyc

Binary file.

app/__pycache__/utils.cpython-38.pyc

Binary file.

■ ■ ■ ■ ■ ■

app/app.py

1	+	import json
2	+	import streamlit as st
3	+	from streamlit_agraph import agraph, Config
4	+	from utils import (
5	+	    build_agraph_components,
6	+	    get_subgraph_nodes_df,
7	+	    get_subgraph_df,
8	+	    get_subgraph_edges_df,
9	+	    get_subgraph_with_risk_score,
10	+	    build_markdown_strings_for_node,
11	+	)
12	+
13	+
14	+	st.set_page_config(layout="wide")
15	+
16	+
17	+	SLIDER_MIN = 0
18	+	SLIDER_MAX = 100
19	+	SLIDER_DEFAULT = 50
20	+	DEFAULT_NUM_SUBGRAPHS_TO_SHOW = 3
21	+	GRAPH_PLOT_HEIGHT_PX = 400
22	+	GRAPH_SIZE_RENDER_LIMIT = 30
23	+	subgraphs = get_subgraph_df()
24	+
25	+	with st.sidebar:
26	+	    st.title("Corporate risks")
27	+
28	+	    weight_chains = (
29	+	        st.slider(
30	+	            "Long ownership chains",
31	+	            min_value=SLIDER_MIN,
32	+	            max_value=SLIDER_MAX,
33	+	            value=SLIDER_DEFAULT,
34	+	        )
35	+	        / SLIDER_MAX
36	+	    )
37	+	    weight_cyclic = (
38	+	        st.slider(
39	+	            "Cyclic ownership",
40	+	            min_value=SLIDER_MIN,
41	+	            max_value=SLIDER_MAX,
42	+	            value=SLIDER_DEFAULT,
43	+	        )
44	+	        / SLIDER_MAX
45	+	    )
46	+	    weight_psc_haven = (
47	+	        st.slider(
48	+	            "Persons of significant control associated with tax havens",
49	+	            min_value=SLIDER_MIN,
50	+	            max_value=SLIDER_MAX,
51	+	            value=SLIDER_DEFAULT,
52	+	        )
53	+	        / SLIDER_MAX
54	+	    )
55	+	    weight_pep = (
56	+	        st.slider(
57	+	            "Officers/PSCs are politically exposed",
58	+	            min_value=SLIDER_MIN,
59	+	            max_value=SLIDER_MAX,
60	+	            value=SLIDER_DEFAULT,
61	+	        )
62	+	        / SLIDER_MAX
63	+	    )
64	+	    weight_sanctions = (
65	+	        st.slider(
66	+	            "Officers/PSCs/Companies are sanctioned",
67	+	            min_value=SLIDER_MIN,
68	+	            max_value=SLIDER_MAX,
69	+	            value=SLIDER_DEFAULT,
70	+	        )
71	+	        / SLIDER_MAX
72	+	    )
73	+	    weight_disqualified = (
74	+	        st.slider(
75	+	            "Officers are disqualified directors",
76	+	            min_value=SLIDER_MIN,
77	+	            max_value=SLIDER_MAX,
78	+	            value=SLIDER_DEFAULT,
79	+	        )
80	+	        / SLIDER_MAX
81	+	    )
82	+	    # custom_names_a = st.multiselect(
83	+	    #     label="Custom persons of interest",
84	+	    #     options=nodes["node_id"],
85	+	    #     default=None,
86	+	    # )
87	+	    custom_names_b = st.file_uploader(label="Custom persons of interest", type="csv")
88	+
89	+	    go = st.button("Go")
90	+
91	+
92	+	with st.container():
93	+
94	+	    subgraph_with_risk_scores = get_subgraph_with_risk_score(
95	+	        subgraphs,
96	+	        weight_chains=weight_chains,
97	+	        weight_cyclic=weight_cyclic,
98	+	        weight_psc_haven=weight_psc_haven,
99	+	        weight_pep=weight_pep,
100	+	        weight_sanctions=weight_sanctions,
101	+	        weight_disqualified=weight_disqualified,
102	+	    )
103	+
104	+	    st.dataframe(data=subgraph_with_risk_scores, use_container_width=True)
105	+
106	+	    selected_subgraph_hashes = st.multiselect(
107	+	        label="Select corporate network(s) to explore",
108	+	        options=list(subgraph_with_risk_scores.index),
109	+	        default=list(
110	+	            subgraph_with_risk_scores.head(DEFAULT_NUM_SUBGRAPHS_TO_SHOW).index
111	+	        ),
112	+	    )
113	+
114	+
115	+	with st.container():
116	+	    num_subgraphs_to_display = len(selected_subgraph_hashes)
117	+
118	+	    if num_subgraphs_to_display > 0:
119	+	        cols = st.columns(num_subgraphs_to_display)
120	+
121	+	        for c, subgraph_hash in enumerate(selected_subgraph_hashes):
122	+	            nodes_selected = get_subgraph_nodes_df(subgraph_hash)
123	+	            edges_selected = get_subgraph_edges_df(subgraph_hash)
124	+
125	+	            with cols[c]:
126	+	                if len(nodes_selected) < GRAPH_SIZE_RENDER_LIMIT:
127	+	                    (node_objects, edge_objects) = build_agraph_components(
128	+	                        nodes_selected, edges_selected
129	+	                    )
130	+	                    agraph(
131	+	                        nodes=node_objects,
132	+	                        edges=edge_objects,
133	+	                        config=Config(
134	+	                            width=round(1080 / num_subgraphs_to_display),
135	+	                            height=GRAPH_PLOT_HEIGHT_PX,
136	+	                        ),
137	+	                    )
138	+	                else:
139	+	                    st.error("Subgraph is too large to render")
140	+
141	+	                st.write(nodes_selected)
142	+	                # # Build markdown strings for representing metadata
143	+	                # markdown_strings = build_markdown_strings_for_node(nodes_selected)
144	+
145	+	                # st.markdown(":busts_in_silhouette: People")
146	+	                # for p in markdown_strings["people"]:
147	+	                #     if ("SANCTIONED" in p) or ("PEP" in p):
148	+	                #         st.markdown(p)
149	+	                #     else:
150	+	                #         st.markdown(p)
151	+
152	+	                # st.markdown(":office: Companies")
153	+	                # for c in markdown_strings["companies"]:
154	+	                #     if ("SANCTIONED" in c) or ("PEP" in c):
155	+	                #         st.markdown(c)
156	+	                #     else:
157	+	                #         st.markdown(c)
158	+

■ ■ ■ ■ ■ ■

app/utils.py

1	+	import streamlit as st
2	+	from streamlit_agraph import Node, Edge
3	+	import json
4	+	import pandas as pd
5	+
6	+	NODE_COLOUR_NON_DODGY = "#72EF77"
7	+	NODE_COLOUR_DODGY = "#EF7272"
8	+	NODE_IMAGE_PERSON = "http://i.ibb.co/LrY3tfw/747376.png" # https://www.flaticon.com/free-icon/user_747376
9	+	NODE_IMAGE_COMPANY = "http://i.ibb.co/fx6r1dZ/4812244.png" # https://www.flaticon.com/free-icon/company_4812244
10	+
11	+
12	+	@st.cache()
13	+	def get_subgraph_df():
14	+	    return pd.read_parquet("./data/network.parquet", engine="pyarrow").set_index(
15	+	        "network_id"
16	+	    )
17	+
18	+
19	+	@st.cache()
20	+	def get_subgraph_nodes_df(subgraph_hash):
21	+	    return pd.read_parquet(
22	+	        "./data/nodes.parquet",
23	+	        filters=[[("subgraph_hash", "=", subgraph_hash)]],
24	+	        engine="pyarrow",
25	+	    )
26	+
27	+
28	+	@st.cache()
29	+	def get_subgraph_edges_df(subgraph_hash):
30	+	    return pd.read_parquet(
31	+	        "./data/edges.parquet",
32	+	        filters=[[("subgraph_hash", "=", subgraph_hash)]],
33	+	        engine="pyarrow",
34	+	    )
35	+
36	+
37	+	def get_subgraph_with_risk_score(
38	+	    subgraph_table,
39	+	    weight_chains,
40	+	    weight_cyclic,
41	+	    weight_psc_haven,
42	+	    weight_pep,
43	+	    weight_sanctions,
44	+	    weight_disqualified,
45	+	):
46	+
47	+	    out = subgraph_table.copy()
48	+	    out["total_risk"] = out["cyclicity"] * weight_cyclic / out["cyclicity"].max()
49	+	    return out.sort_values(by="total_risk", ascending=False)
50	+
51	+
52	+	def build_agraph_components(
53	+	    nodes,
54	+	    edges,
55	+	):
56	+	    """Create agraph object from node and edge list"""
57	+
58	+	    node_objects = []
59	+	    edge_objects = []
60	+
61	+	    for _, row in nodes.iterrows():
62	+	        # node_metadata = json.loads(row["node_metadata"])
63	+	        node_objects.append(
64	+	            Node(
65	+	                id=row["node_id"],
66	+	                label=row["node_id"].split("|")[0],
67	+	                size=30,
68	+	                # color=NODE_COLOUR_DODGY
69	+	                # if (row["pep"] > 0 or row["sanction"] > 0)
70	+	                # else NODE_COLOUR_NON_DODGY,
71	+	                # image=NODE_IMAGE_PERSON
72	+	                # if row["is_person"] == 1
73	+	                # else NODE_IMAGE_COMPANY,
74	+	                # shape="circularImage",
75	+	                shape="circle",
76	+	            )
77	+	        )
78	+
79	+	    for _, row in edges.iterrows():
80	+	        edge_objects.append(
81	+	            Edge(
82	+	                source=row["source"],
83	+	                label=row["type"],
84	+	                target=row["target"],
85	+	            )
86	+	        )
87	+
88	+	    return (node_objects, edge_objects)
89	+
90	+
91	+	def build_markdown_strings_for_node(nodes_selected):
92	+	    """Separate into People and Company strings"""
93	+
94	+	    markdown_strings = dict()
95	+	    markdown_strings["companies"] = []
96	+	    markdown_strings["people"] = []
97	+
98	+	    for _, row in nodes_selected.iterrows():
99	+	        node_metadata = json.loads(row["node_metadata"])
100	+	        node_sanctions = (
101	+	            "" if row["sanction"] == 0 else f"! SANCTIONED: {row['sanction_metadata']}"
102	+	        )
103	+	        node_pep = "" if row["pep"] == 0 else f"! PEP: {row['pep_metadata']}"
104	+
105	+	        if row["is_person"] == 1:
106	+	            node_title = f"{node_metadata['name']} [{node_metadata['nationality']}/{node_metadata['yob']}/{node_metadata['mob']}]"
107	+	            key = "people"
108	+	        else:
109	+	            node_title = f"{node_metadata['name']} [{row['jur']}/{node_metadata['reg']}/{node_metadata['address']}]"
110	+	            key = "companies"
111	+
112	+	        markdown_strings[key].append(
113	+	            "\n".join(
114	+	                [x for x in ["```", node_title, node_pep, node_sanctions] if len(x) > 0]
115	+	            )
116	+	        )
117	+
118	+	    return markdown_strings
119	+
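
In this revision, `get_subgraph_with_risk_score` accepts six weights but uses only `weight_cyclic`, and it divides by the column maximum, which yields NaN when every `cyclicity` value is zero. The dependency-free sketch below shows one way the weighted, max-normalised sum could cover several signals and guard against a zero maximum. The function name `total_risk` and the plain-dict data layout are assumptions for illustration. The app itself works on pandas DataFrames, and the signal names are borrowed from the columns used in the removed `src/utils.py`.

```python
def total_risk(signals, weights):
    """Weighted sum of max-normalised risk signals for each network.

    `signals` maps network_id -> {signal_name: raw value};
    `weights` maps signal_name -> weight in [0, 1].
    Hypothetical helper, not part of the repository.
    """
    # Largest raw value of each signal, used to scale that signal to [0, 1].
    maxima = {
        name: max((row.get(name, 0) for row in signals.values()), default=0)
        for name in weights
    }
    scores = {}
    for network_id, row in signals.items():
        scores[network_id] = sum(
            weight * row.get(name, 0) / maxima[name]
            for name, weight in weights.items()
            if maxima[name] > 0  # skip signals that are zero everywhere
        )
    # Highest-risk networks first, mirroring the sort in utils.py.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


signals = {
    "n1": {"cyclicity": 0.5, "num_peps": 0},
    "n2": {"cyclicity": 1.0, "num_peps": 2},
    "n3": {"cyclicity": 0.0, "num_peps": 1},
}
weights = {"cyclicity": 0.5, "num_peps": 1.0}
result = total_risk(signals, weights)
print(list(result))  # ['n2', 'n3', 'n1']
```

Skipping an all-zero signal keeps the score finite. Whether that is the desired behaviour for the app is a design choice the diff does not settle.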

■ ■ ■ ■ ■ ■

src/app.py

1	-	import streamlit as st
2	-	from streamlit_agraph import agraph, Config
3	-	from utils import (
4	-	    build_agraph_components,
5	-	    get_edges_df,
6	-	    get_subgraph_df,
7	-	    get_nodes_df,
8	-	    get_subgraph_with_risk_score,
9	-	)
10	-
11	-
12	-	st.set_page_config(layout="wide")
13	-
14	-
15	-	SLIDER_MIN = 0
16	-	SLIDER_MAX = 100
17	-	SLIDER_DEFAULT = 50
18	-	DEFAULT_NUM_SUBGRAPHS_TO_SHOW = 3
19	-
20	-	nodes = get_nodes_df()
21	-	edges = get_edges_df()
22	-	subgraphs = get_subgraph_df()
23	-
24	-	with st.sidebar:
25	-	    st.title("Corporate risks")
26	-
27	-	    weight_chains = (
28	-	        st.slider(
29	-	            "Long ownership chains",
30	-	            min_value=SLIDER_MIN,
31	-	            max_value=SLIDER_MAX,
32	-	            value=SLIDER_DEFAULT,
33	-	        )
34	-	        / SLIDER_MAX
35	-	    )
36	-	    weight_cyclic = (
37	-	        st.slider(
38	-	            "Cyclic ownership",
39	-	            min_value=SLIDER_MIN,
40	-	            max_value=SLIDER_MAX,
41	-	            value=SLIDER_DEFAULT,
42	-	        )
43	-	        / SLIDER_MAX
44	-	    )
45	-	    weight_psc_haven = (
46	-	        st.slider(
47	-	            "Persons of significant control associated with tax havens",
48	-	            min_value=SLIDER_MIN,
49	-	            max_value=SLIDER_MAX,
50	-	            value=SLIDER_DEFAULT,
51	-	        )
52	-	        / SLIDER_MAX
53	-	    )
54	-	    weight_pep = (
55	-	        st.slider(
56	-	            "Officers/PSCs are politically exposed",
57	-	            min_value=SLIDER_MIN,
58	-	            max_value=SLIDER_MAX,
59	-	            value=SLIDER_DEFAULT,
60	-	        )
61	-	        / SLIDER_MAX
62	-	    )
63	-	    weight_sanctions = (
64	-	        st.slider(
65	-	            "Officers/PSCs/Companies are sanctioned",
66	-	            min_value=SLIDER_MIN,
67	-	            max_value=SLIDER_MAX,
68	-	            value=SLIDER_DEFAULT,
69	-	        )
70	-	        / SLIDER_MAX
71	-	    )
72	-	    weight_disqualified = (
73	-	        st.slider(
74	-	            "Officers are disqualified directors",
75	-	            min_value=SLIDER_MIN,
76	-	            max_value=SLIDER_MAX,
77	-	            value=SLIDER_DEFAULT,
78	-	        )
79	-	        / SLIDER_MAX
80	-	    )
81	-	    custom_names_a = st.multiselect(
82	-	        label="Custom persons of interest",
83	-	        options=nodes["node_id"],
84	-	        default=None,
85	-	    )
86	-	    custom_names_b = st.file_uploader(label="Custom persons of interest", type="csv")
87	-
88	-	    go = st.button("Go")
89	-
90	-
91	-	with st.container():
92	-
93	-	    subgraph_with_risk_scores = get_subgraph_with_risk_score(
94	-	        subgraphs,
95	-	        weight_chains=weight_chains,
96	-	        weight_cyclic=weight_cyclic,
97	-	        weight_psc_haven=weight_psc_haven,
98	-	        weight_pep=weight_pep,
99	-	        weight_sanctions=weight_sanctions,
100	-	        weight_disqualified=weight_disqualified,
101	-	    )
102	-
103	-	    st.dataframe(data=subgraph_with_risk_scores, use_container_width=True)
104	-
105	-	    selected_subgraph_hashes = st.multiselect(
106	-	        label="Select corporate network(s) to explore",
107	-	        options=list(subgraph_with_risk_scores.index),
108	-	        default=list(
109	-	            subgraph_with_risk_scores.head(DEFAULT_NUM_SUBGRAPHS_TO_SHOW).index
110	-	        ),
111	-	    )
112	-
113	-
114	-	with st.container():
115	-	    num_subgraphs_to_display = len(selected_subgraph_hashes)
116	-	    cols = st.columns(num_subgraphs_to_display)
117	-
118	-	    for c, subgraph_hash in enumerate(selected_subgraph_hashes):
119	-	        nodes_selected = nodes.loc[nodes["subgraph_hash"] == subgraph_hash]
120	-	        edges_selected = edges.loc[edges["subgraph_hash"] == subgraph_hash]
121	-
122	-	        with cols[c]:
123	-	            (node_objects, edge_objects) = build_agraph_components(
124	-	                nodes_selected, edges_selected
125	-	            )
126	-	            agraph(
127	-	                nodes=node_objects,
128	-	                edges=edge_objects,
129	-	                config=Config(
130	-	                    width=round(1080 / num_subgraphs_to_display),
131	-	                    height=200,
132	-	                ),
133	-	            )
134	-
135	-	            st.markdown("People")
136	-	            st.dataframe(
137	-	                nodes_selected.query("is_person == 1"),
138	-	                use_container_width=True,
139	-	            )
140	-
141	-	            st.markdown("Companies")
142	-	            st.dataframe(
143	-	                nodes_selected.query("is_person == 0"),
144	-	                use_container_width=True,
145	-	            )
146	-

■ ■ ■ ■ ■ ■

src/utils.py

1	-	from curses import use_default_colors
2	-	import streamlit as st
3	-	from streamlit_agraph import Node, Edge
4	-	import json
5	-	import pandas as pd
6	-
7	-	NODE_COLOUR_NON_DODGY = "#72EF77"
8	-	NODE_COLOUR_DODGY = "#EF7272"
9	-	NODE_IMAGE_PERSON = "http://i.ibb.co/LrY3tfw/747376.png" # https://www.flaticon.com/free-icon/user_747376
10	-	NODE_IMAGE_COMPANY = "http://i.ibb.co/fx6r1dZ/4812244.png" # https://www.flaticon.com/free-icon/company_4812244
11	-
12	-	# @st.cache()
13	-	def get_subgraph_df():
14	-	    return pd.read_csv("./data/subgraphs.csv", index_col="subgraph_hash")
15	-
16	-
17	-	# @st.cache()
18	-	def get_nodes_df():
19	-	    return pd.read_csv("./data/nodes.csv")
20	-
21	-
22	-	# @st.cache()
23	-	def get_edges_df():
24	-	    return pd.read_csv("./data/edges.csv")
25	-
26	-
27	-	def get_subgraph_with_risk_score(
28	-	    subgraph_table,
29	-	    weight_chains,
30	-	    weight_cyclic,
31	-	    weight_psc_haven,
32	-	    weight_pep,
33	-	    weight_sanctions,
34	-	    weight_disqualified,
35	-	):
36	-
37	-	    out = subgraph_table.copy()
38	-	    out["total_risk"] = (
39	-	        (out["cyclicity"] * weight_cyclic / out["cyclicity"].max())
40	-	        + (
41	-	            out["multi_jurisdiction"]
42	-	            * weight_psc_haven
43	-	            / out["multi_jurisdiction"].max()
44	-	        )
45	-	        + (out["num_sanctions"] * weight_sanctions / out["num_sanctions"].max())
46	-	        + (out["num_peps"] * weight_pep / out["num_peps"].max())
47	-	    )
48	-	    return out.sort_values(by="total_risk", ascending=False)
49	-
50	-
51	-	def build_agraph_components(
52	-	    nodes,
53	-	    edges,
54	-	):
55	-	    """Create agraph object from node and edge list"""
56	-
57	-	    node_objects = []
58	-	    edge_objects = []
59	-
60	-	    for _, row in nodes.iterrows():
61	-	        node_metadata = json.loads(row["node_metadata"])
62	-	        node_objects.append(
63	-	            Node(
64	-	                id=row["node_id"],
65	-	                label=node_metadata["name"],
66	-	                size=25,
67	-	                color=NODE_COLOUR_DODGY
68	-	                if (row["pep"] > 0 or row["sanction"] > 0)
69	-	                else NODE_COLOUR_NON_DODGY,
70	-	                image=NODE_IMAGE_PERSON
71	-	                if row["is_person"] == 1
72	-	                else NODE_IMAGE_COMPANY,
73	-	                shape="circularImage",
74	-	            )
75	-	        )
76	-
77	-	    for _, row in edges.iterrows():
78	-	        edge_objects.append(
79	-	            Edge(
80	-	                source=row["source"],
81	-	                label=row["type"],
82	-	                target=row["target"],
83	-	            )
84	-	        )
85	-
86	-	    return (node_objects, edge_objects)
87	-

Added to readme; renamed app folder