{
|
||
"nbformat": 4,
|
||
"nbformat_minor": 0,
|
||
"metadata": {
|
||
"colab": {
|
||
"name": "α-ICON: an application to UK PSC data",
|
||
"provenance": [],
|
||
"collapsed_sections": [],
|
||
"toc_visible": true
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"name": "python"
|
||
}
|
||
},
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "Cifd3_bJhpUo"
|
||
},
|
||
"source": [
|
||
"## Introduction\n",
|
||
"This notebook demonstrates α-ICON (Indirect Control in Onion-like networks) --- an algorithm to identify ultimate controlling entities in corporate networks. We provide a self-contained application as a companion to [our paper](https://arxiv.org/abs/2109.07181) and [repository](https://github.com/eusporg/alphaicon).\n",
|
||
"\n",
|
||
"We will be working with the data from the UK's People with Significant Control register with 4.2 million companies and 4 million of their holders as of August, 2021.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "W2vhy5c6jmBi"
|
||
},
|
||
"source": [
|
||
"## Data loading & import\n",
|
||
"All data pre-processing of the [PSC snapshot](http://download.companieshouse.gov.uk/en_pscdata.html) is done in the [repository](https://github.com/eusporg/alphaicon) (`code/data_preparation/uk`). The resulting data is stored in a [public folder](https://drive.google.com/drive/folders/10Tq-b4BVsG3gmq2JVa026Nilzj8eojNB) on Google Drive.\n",
|
||
"\n",
|
||
"We will be working with two files:\n",
|
||
"\n",
|
||
"* `output/uk/uk_organisations_participants_2021_long_2aug21.zip` --- an archived \n",
|
||
"CSV with company ID-participant ID mapping from the PSC data and the respective equity shares.\n",
|
||
"* `output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.zip` --- an archived CSV with company ID-participant ID mapping from the PSC data and their Direct Power Indices ([Mizuno, Doi, and Kurizaki (2020)](https://doi.org/10.1371/journal.pone.0237862)). \n",
|
||
"\n",
|
||
"\n",
|
||
" \n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"id": "FhocHBOyvbsu",
|
||
"outputId": "c6f1b1a0-ec59-49a0-abf0-1a3e08a10aab"
|
||
},
|
||
"source": [
|
||
"# Download and unarchive the data files from the Google Drive public link\n",
|
||
"!pip install gdown\n",
|
||
"\n",
|
||
"!gdown https://drive.google.com/uc?id=1rpi5FEPrKfx9vIwDpr_mfK6L971rrtL7 \n",
|
||
"!unzip uk_organisations_participants_2021_long_2aug21.zip\n",
|
||
" \n",
|
||
"!gdown https://drive.google.com/uc?id=1UBsF3RBMvjF7dBb1PG-wXhEBv3whoMLG\n",
|
||
"!unzip uk_organisations_participants_2021_long_7sep21_dpi_10000iter.zip\n",
|
||
" \n",
|
||
"import pandas as pd\n",
|
||
"import scipy\n",
|
||
"from os.path import join\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"import scipy.sparse as sp\n",
|
||
"from scipy.sparse.linalg import eigs\n",
|
||
"import numpy as np\n",
|
||
"from itertools import combinations\n",
|
||
"import tqdm\n",
|
||
"import networkx as nx\n",
|
||
"import gc"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"Requirement already satisfied: gdown in /usr/local/lib/python3.7/dist-packages (3.6.4)\n",
|
||
"Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from gdown) (1.15.0)\n",
|
||
"Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from gdown) (4.62.0)\n",
|
||
"Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from gdown) (2.23.0)\n",
|
||
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->gdown) (2.10)\n",
|
||
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->gdown) (3.0.4)\n",
|
||
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->gdown) (1.24.3)\n",
|
||
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->gdown) (2021.5.30)\n",
|
||
"Downloading...\n",
|
||
"From: https://drive.google.com/uc?id=1rpi5FEPrKfx9vIwDpr_mfK6L971rrtL7\n",
|
||
"To: /content/uk_organisations_participants_2021_long_2aug21.zip\n",
|
||
"74.3MB [00:00, 179MB/s]\n",
|
||
"Archive: uk_organisations_participants_2021_long_2aug21.zip\n",
|
||
" inflating: uk_organisations_participants_2021_long_2aug21.csv \n",
|
||
"Downloading...\n",
|
||
"From: https://drive.google.com/uc?id=1UBsF3RBMvjF7dBb1PG-wXhEBv3whoMLG\n",
|
||
"To: /content/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.zip\n",
|
||
"74.7MB [00:00, 121MB/s] \n",
|
||
"Archive: uk_organisations_participants_2021_long_7sep21_dpi_10000iter.zip\n",
|
||
" inflating: uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv \n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "7GgWACrsmkM3"
|
||
},
|
||
"source": [
|
||
"### Import without downloading the data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "6dTPRzNvMLzP"
|
||
},
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"import scipy\n",
|
||
"from os.path import join\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"import scipy.sparse as sp\n",
|
||
"from scipy.sparse.linalg import eigs\n",
|
||
"import numpy as np\n",
|
||
"from itertools import combinations\n",
|
||
"import tqdm\n",
|
||
"import networkx as nx\n",
|
||
"import gc"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "879Esu8aV7VY"
|
||
},
|
||
"source": [
|
||
"# Data processing"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "ZiKLqZKfvOSX"
|
||
},
|
||
"source": [
|
||
"DPI = 1 # use dpi data\n",
|
||
"SH_MODE = 1 # only SH nodes are considered final holders\n",
|
||
"LEVELS = 20 # this parameter should be equal or exceed the number of \"onion layers\" in data. 20 is a bit of overkill for safety"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "pzZlT_OlUNcD"
|
||
},
|
||
"source": [
|
||
"if DPI:\n",
|
||
" name = 'uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv'\n",
|
||
" data = pd.read_csv(name, engine='python', dtype={'participant': str, 'entity': str})\n",
|
||
"else:\n",
|
||
" name = 'uk_organisations_participants_2021_long_2aug21.csv'\n",
|
||
" data = pd.read_csv(name, engine='python', dtype={'participant_id': str, 'company_number': str})"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "hGJzvLfvmtGZ"
|
||
},
|
||
"source": [
|
||
"### Primary processing"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "n_H7C2N_QZ-4",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/",
|
||
"height": 203
|
||
},
|
||
"outputId": "4fb79d1f-2ea3-4673-ba4f-758c7ca98dca"
|
||
},
|
||
"source": [
|
||
"data = data.dropna()\n",
|
||
"\n",
|
||
"if DPI:\n",
|
||
" data = data.astype({'dpi': float, 'participant':str, 'entity':str})\n",
|
||
" data = data.rename(columns={\"entity\": \"organisation_inn\",\n",
|
||
" 'participant': 'participant_id',\n",
|
||
" 'dpi': 'equity_share'})\n",
|
||
" \n",
|
||
" # columns are renamed for consistency with further code\n",
|
||
" data=data[['organisation_inn','participant_id','equity_share']]\n",
|
||
"\n",
|
||
"else:\n",
|
||
" data = data.astype({'equity_share': float, 'participant_id':str, 'company_number':str})\n",
|
||
" data = data.rename(columns={\"company_number\": \"organisation_inn\"})\n",
|
||
"\n",
|
||
"data = data[data['equity_share'] > 0]\n",
|
||
"data = data[data.participant_id != data.organisation_inn]\n",
|
||
"\n",
|
||
"# normalization of in-edge weights to 1\n",
|
||
"gdata = data.groupby('organisation_inn').sum().reset_index()\n",
|
||
"dict_companies = dict(gdata.values)\n",
|
||
"data['equity_share'] = data['equity_share']/np.array([dict_companies[num] for num in data['organisation_inn']])\n",
|
||
"\n",
|
||
"# finding SH and ST nodes\n",
|
||
"data['super_holder']=~pd.Series(data.participant_id).isin(data.organisation_inn)\n",
|
||
"data['super_target']=~pd.Series(data.organisation_inn).isin(data.participant_id)\n",
|
||
"\n",
|
||
"\n",
|
||
"data.head()"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "execute_result",
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>organisation_inn</th>\n",
|
||
" <th>participant_id</th>\n",
|
||
" <th>equity_share</th>\n",
|
||
" <th>super_holder</th>\n",
|
||
" <th>super_target</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>00000133</td>\n",
|
||
" <td>THE PENINSULAR AND ORIENTAL STEAM NAVIGATION C...</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>00000140</td>\n",
|
||
" <td>NICHOLLS & CLARKE LIMITED$NA$NA</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>00000295</td>\n",
|
||
" <td>COLIN$NA$WELLS$1967$2</td>\n",
|
||
" <td>0.4997</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>00000295</td>\n",
|
||
" <td>MOIRA$RUTH$SLEIGHT$1959$2</td>\n",
|
||
" <td>0.5003</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>00000371</td>\n",
|
||
" <td>DAVID$JOHN$ROWLAND$1945$6</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" organisation_inn ... super_target\n",
|
||
"0 00000133 ... True\n",
|
||
"1 00000140 ... True\n",
|
||
"2 00000295 ... True\n",
|
||
"3 00000295 ... True\n",
|
||
"4 00000371 ... True\n",
|
||
"\n",
|
||
"[5 rows x 5 columns]"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"execution_count": 5
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "XuR1-NTHm72S"
|
||
},
|
||
"source": [
|
||
"### Edges analysis"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "ZlCoK5TUW75I",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "1a8e1a01-ddae-435d-ae0d-15a81e8ea11a"
|
||
},
|
||
"source": [
|
||
"e = len(data)\n",
|
||
"e1 = len(data[(data['super_holder'] == True) & (data['super_target'] == True)])\n",
|
||
"e2 = len(data[(data['super_holder'] == True) & (data['super_target'] == False)])\n",
|
||
"e3 = len(data[(data['super_holder'] == False) & (data['super_target'] == True)])\n",
|
||
"e4 = len(data[(data['super_holder'] == False) & (data['super_target'] == False)])\n",
|
||
"print('total edges:', e)\n",
|
||
"print('SH -> ST edges', e1, '({0:.2f}%)'.format(e1/e*100))\n",
|
||
"print('SH -> ~ST edges', e2, '({0:.2f}%)'.format(e2/e*100))\n",
|
||
"print('~SH -> ST edges', e3, '({0:.2f}%)'.format(e3/e*100))\n",
|
||
"print('~SH -> ~ST edges', e4, '({0:.2f}%)'.format(e4/e*100))"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"total edges: 5096560\n",
|
||
"SH -> ST edges 4642420 (91.09%)\n",
|
||
"SH -> ~ST edges 151849 (2.98%)\n",
|
||
"~SH -> ST edges 256996 (5.04%)\n",
|
||
"~SH -> ~ST edges 45295 (0.89%)\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "w9wQh6jRWHOM"
|
||
},
|
||
"source": [
|
||
"# Network analysis"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "5tiXjWcWnAuv"
|
||
},
|
||
"source": [
|
||
"### Core creation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "gvnRCxRAdppC",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "4acd7f4c-fb1d-473d-8d04-c1d3c087e394"
|
||
},
|
||
"source": [
|
||
"print('pruning SH and ST nodes of the external layer...')\n",
|
||
"rdata = data.loc[(data['super_holder'] == False) & (data['super_target'] == False)]\n",
|
||
"edges_left = len(rdata)\n",
|
||
"print('layer 1:', edges_left, 'edges left')\n",
|
||
"\n",
|
||
"crdata = rdata.copy()\n",
|
||
"\n",
|
||
"for i in range(1, LEVELS):\n",
|
||
" is_sh = ~pd.Series(rdata.participant_id).isin(rdata.organisation_inn)\n",
|
||
" current_unique_sh = pd.Series(rdata.loc[is_sh == True].participant_id.value_counts().keys())\n",
|
||
" print('current SH', len(current_unique_sh))\n",
|
||
" curr_sh_prop_name = 'is_level_' + str(i) + '_SH'\n",
|
||
" crdata[curr_sh_prop_name] = pd.Series(crdata.participant_id).isin(current_unique_sh)\n",
|
||
"\n",
|
||
" is_st = ~pd.Series(rdata.organisation_inn).isin(rdata.participant_id)\n",
|
||
" current_unique_st = pd.Series(rdata.loc[is_st == True].organisation_inn.value_counts().keys())\n",
|
||
" print('current ST', len(current_unique_st))\n",
|
||
" curr_st_prop_name = 'is_level_' + str(i) + '_ST'\n",
|
||
" crdata[curr_st_prop_name] = pd.Series(crdata.organisation_inn).isin(current_unique_st)\n",
|
||
"\n",
|
||
" rdata = rdata.loc[(is_sh == False) & (is_st == False)]\n",
|
||
" print('layer {}: {} edges left'.format(i+1, len(rdata)))\n",
|
||
"\n",
|
||
"data_core = rdata.copy()\n",
|
||
"core_holders = rdata.participant_id.value_counts()\n",
|
||
"print('unique core holders:', len(core_holders))\n",
|
||
"core_targets = rdata.organisation_inn.value_counts()\n",
|
||
"print('unique core targets:', len(core_targets))"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"pruning SH and ST nodes of the external layer...\n",
|
||
"layer 1: 45295 edges left\n",
|
||
"current SH 17777\n",
|
||
"current ST 28628\n",
|
||
"layer 2: 9102 edges left\n",
|
||
"current SH 2467\n",
|
||
"current ST 3981\n",
|
||
"layer 3: 3489 edges left\n",
|
||
"current SH 874\n",
|
||
"current ST 1220\n",
|
||
"layer 4: 1626 edges left\n",
|
||
"current SH 348\n",
|
||
"current ST 416\n",
|
||
"layer 5: 964 edges left\n",
|
||
"current SH 149\n",
|
||
"current ST 167\n",
|
||
"layer 6: 685 edges left\n",
|
||
"current SH 65\n",
|
||
"current ST 67\n",
|
||
"layer 7: 578 edges left\n",
|
||
"current SH 29\n",
|
||
"current ST 27\n",
|
||
"layer 8: 527 edges left\n",
|
||
"current SH 14\n",
|
||
"current ST 14\n",
|
||
"layer 9: 505 edges left\n",
|
||
"current SH 3\n",
|
||
"current ST 3\n",
|
||
"layer 10: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 11: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 12: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 13: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 14: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 15: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 16: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 17: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 18: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 19: 502 edges left\n",
|
||
"current SH 0\n",
|
||
"current ST 0\n",
|
||
"layer 20: 502 edges left\n",
|
||
"unique core holders: 498\n",
|
||
"unique core targets: 498\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "ycGL0vJwnIBG"
|
||
},
|
||
"source": [
|
||
"### Classification of the nodes"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "Pq37803BGnPf",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "84c7e51d-1d23-4f78-ff69-310013ae7526"
|
||
},
|
||
"source": [
|
||
"super_holders, sh_counts=np.unique(data[data['super_holder']==True].participant_id, return_counts=True)\n",
|
||
"super_targets, st_counts=np.unique(data[data['super_target']==True].organisation_inn, return_counts=True)\n",
|
||
"print('SH', len(super_holders))\n",
|
||
"print('ST', len(super_targets))\n",
|
||
"\n",
|
||
"# core members\n",
|
||
"core_inn = np.array(data_core.participant_id.value_counts().index)\n",
|
||
"print('Core', len(core_inn))\n",
|
||
"\n",
|
||
"# intermediaries\n",
|
||
"not_super_holders=list(data[(data['super_holder']==False)].participant_id)\n",
|
||
"not_super_targets=list(data[(data['super_target']==False)].organisation_inn)\n",
|
||
"not_super=np.array(list(set(not_super_holders+not_super_targets)))\n",
|
||
"inter = not_super[np.isin(not_super, core_inn)==False]\n",
|
||
"print('Intermediaries', len(inter))\n",
|
||
"\n",
|
||
"\n",
|
||
"# create a dataframe\n",
|
||
"firms_sh = pd.DataFrame({'company_number/id': super_holders, 'type': np.array(['SH']*len(super_holders))})\n",
|
||
"firms_st = pd.DataFrame({'company_number/id': super_targets, 'type': np.array(['ST']*len(super_targets))})\n",
|
||
"firms_core = pd.DataFrame({'company_number/id': core_inn, 'type': np.array(['C']*len(core_inn))})\n",
|
||
"firms_inter = pd.DataFrame({'company_number/id': inter, 'type': np.array(['I']*len(inter))})\n",
|
||
"\n",
|
||
"dst = pd.concat([firms_sh, firms_st, firms_core, firms_inter]).reset_index().drop(['index'], axis=1)\n",
|
||
"\n",
|
||
"assert len(list(set(list(set(data.participant_id))+list(set(data.organisation_inn)))))==len(dst)\n",
|
||
"\n",
|
||
"dst.to_csv('dst_british.csv', encoding='utf-8-sig')\n"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"SH 3770439\n",
|
||
"ST 4047325\n",
|
||
"Core 498\n",
|
||
"Intermediaries 151803\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "RUwpwzHunNDo"
|
||
},
|
||
"source": [
|
||
"### Isolates detection"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"id": "kqq_uMv2J-nT",
|
||
"outputId": "23ce817b-bf9f-45ea-acbb-bdcd80e7604f"
|
||
},
|
||
"source": [
|
||
"all_isolates = {i: [] for i in range(1, LEVELS)}\n",
|
||
"\n",
|
||
"data_without_sh = data.loc[(data['super_holder'] == False)]\n",
|
||
"new_sh = ~pd.Series(data_without_sh.participant_id).isin(data_without_sh.organisation_inn)\n",
|
||
"new_unique_sh = data_without_sh.loc[new_sh == True].participant_id.value_counts().keys().values\n",
|
||
"#print('new unique sh', len(new_unique_sh))\n",
|
||
"\n",
|
||
"data_without_st = data.loc[(data['super_target'] == False)]\n",
|
||
"new_st = ~pd.Series(data_without_st.organisation_inn).isin(data_without_st.participant_id)\n",
|
||
"new_unique_st = data_without_st.loc[new_st == True].organisation_inn.value_counts().keys().values\n",
|
||
"#print('new unique st', len(new_unique_st))\n",
|
||
"\n",
|
||
"isolates = list(set(new_unique_sh).intersection(set(new_unique_st)))\n",
|
||
"print('isolates of layer 1:', len(isolates))\n",
|
||
"all_isolates[1] = isolates\n",
|
||
"\n",
|
||
"\n",
|
||
"rdata = data.loc[(data['super_holder'] == False) & (data['super_target'] == False)]\n",
|
||
"for i in range(1, LEVELS):\n",
|
||
" is_sh = ~pd.Series(rdata.participant_id).isin(rdata.organisation_inn)\n",
|
||
" is_st = ~pd.Series(rdata.organisation_inn).isin(rdata.participant_id)\n",
|
||
" rdata_without_sh = rdata.loc[is_sh == False]\n",
|
||
" rdata_without_st = rdata.loc[is_st == False]\n",
|
||
"\n",
|
||
" new_sh = ~pd.Series(rdata_without_sh.participant_id).isin(rdata_without_sh.organisation_inn)\n",
|
||
" new_st = ~pd.Series(rdata_without_st.organisation_inn).isin(rdata_without_st.participant_id)\n",
|
||
"\n",
|
||
" new_unique_sh = rdata_without_sh.loc[new_sh == True].participant_id.value_counts().keys().values\n",
|
||
" new_unique_st = rdata_without_st.loc[new_st == True].organisation_inn.value_counts().keys().values\n",
|
||
"\n",
|
||
" isolates = list(set(new_unique_sh).intersection(set(new_unique_st)))\n",
|
||
" print('isolates of layer {}: {}'.format(i+1, len(isolates)))\n",
|
||
" all_isolates[i+1] = isolates\n",
|
||
"\n",
|
||
" rdata = rdata.loc[(is_sh == False) & (is_st == False)]"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"isolates of layer 1: 91167\n",
|
||
"isolates of layer 2: 3338\n",
|
||
"isolates of layer 3: 635\n",
|
||
"isolates of layer 4: 271\n",
|
||
"isolates of layer 5: 77\n",
|
||
"isolates of layer 6: 43\n",
|
||
"isolates of layer 7: 11\n",
|
||
"isolates of layer 8: 7\n",
|
||
"isolates of layer 9: 5\n",
|
||
"isolates of layer 10: 0\n",
|
||
"isolates of layer 11: 0\n",
|
||
"isolates of layer 12: 0\n",
|
||
"isolates of layer 13: 0\n",
|
||
"isolates of layer 14: 0\n",
|
||
"isolates of layer 15: 0\n",
|
||
"isolates of layer 16: 0\n",
|
||
"isolates of layer 17: 0\n",
|
||
"isolates of layer 18: 0\n",
|
||
"isolates of layer 19: 0\n",
|
||
"isolates of layer 20: 0\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "ycHsNTtvuLmh"
|
||
},
|
||
"source": [
|
||
"coredata = crdata.copy()\n",
|
||
"coredata['SH_level'] = np.nan\n",
|
||
"coredata['ST_level'] = np.nan\n",
|
||
"\n",
|
||
"drop_cols = []\n",
|
||
"for i in range(1,LEVELS):\n",
|
||
" curr_sh_prop_name = 'is_level_' + str(i) + '_SH'\n",
|
||
" curr_st_prop_name = 'is_level_' + str(i) + '_ST'\n",
|
||
" coredata.loc[coredata[curr_sh_prop_name] == True, 'SH_level'] = i+1\n",
|
||
" coredata.loc[coredata[curr_st_prop_name] == True, 'ST_level'] = i+1\n",
|
||
" drop_cols.extend([curr_sh_prop_name, curr_st_prop_name])\n",
|
||
"\n",
|
||
"coredata.drop(columns=drop_cols, inplace = True)"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/",
|
||
"height": 417
|
||
},
|
||
"id": "k7O3gIGWE7pc",
|
||
"outputId": "ef7fc413-cb65-462e-90f6-e08d61308062"
|
||
},
|
||
"source": [
|
||
"coredata"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "execute_result",
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>organisation_inn</th>\n",
|
||
" <th>participant_id</th>\n",
|
||
" <th>equity_share</th>\n",
|
||
" <th>super_holder</th>\n",
|
||
" <th>super_target</th>\n",
|
||
" <th>SH_level</th>\n",
|
||
" <th>ST_level</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>00000529</td>\n",
|
||
" <td>05995030</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>00000866</td>\n",
|
||
" <td>05253545</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>11</th>\n",
|
||
" <td>00000950</td>\n",
|
||
" <td>03526047</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>13</th>\n",
|
||
" <td>00001160</td>\n",
|
||
" <td>06452679</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>4.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>19</th>\n",
|
||
" <td>00001419</td>\n",
|
||
" <td>05282342</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5256244</th>\n",
|
||
" <td>SO307293</td>\n",
|
||
" <td>SC299736</td>\n",
|
||
" <td>0.4969</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>4.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5256255</th>\n",
|
||
" <td>SO307300</td>\n",
|
||
" <td>03942880</td>\n",
|
||
" <td>0.5069</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5256256</th>\n",
|
||
" <td>SO307300</td>\n",
|
||
" <td>11677818</td>\n",
|
||
" <td>0.4931</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5256265</th>\n",
|
||
" <td>SO307305</td>\n",
|
||
" <td>09431213</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5256338</th>\n",
|
||
" <td>ZC000195</td>\n",
|
||
" <td>01612178</td>\n",
|
||
" <td>1.0000</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>45295 rows × 7 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" organisation_inn participant_id ... SH_level ST_level\n",
|
||
"7 00000529 05995030 ... 3.0 NaN\n",
|
||
"10 00000866 05253545 ... NaN 2.0\n",
|
||
"11 00000950 03526047 ... 2.0 2.0\n",
|
||
"13 00001160 06452679 ... 4.0 2.0\n",
|
||
"19 00001419 05282342 ... 3.0 NaN\n",
|
||
"... ... ... ... ... ...\n",
|
||
"5256244 SO307293 SC299736 ... 4.0 2.0\n",
|
||
"5256255 SO307300 03942880 ... 2.0 2.0\n",
|
||
"5256256 SO307300 11677818 ... 2.0 2.0\n",
|
||
"5256265 SO307305 09431213 ... 3.0 2.0\n",
|
||
"5256338 ZC000195 01612178 ... NaN 3.0\n",
|
||
"\n",
|
||
"[45295 rows x 7 columns]"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"execution_count": 11
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "MkNwGFp14WXE"
|
||
},
|
||
"source": [
|
||
"# Full adjacency matrix calculation\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"id": "1MsJj0bWdYIn",
|
||
"outputId": "aa88a7c1-b556-443f-b63c-f9cbc066c7ff"
|
||
},
|
||
"source": [
|
||
"all_holders=data.participant_id.value_counts()\n",
|
||
"all_targets=data.organisation_inn.value_counts()\n",
|
||
"all_nodes = list(set(all_holders.keys()) | set(all_targets.keys()))\n",
|
||
"sorted(all_nodes)\n",
|
||
"node_inds = dict(zip(all_nodes, range(len(all_nodes))))\n",
|
||
"reverse_node_inds = dict(zip(range(len(all_nodes)), all_nodes))\n",
|
||
"print('all nodes:', len(all_nodes))\n"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"all nodes: 7970065\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "VJjcD-cunrX0"
|
||
},
|
||
"source": [
|
||
"### Transitive ownership calculation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "JR6mpUUn3U9r"
|
||
},
|
||
"source": [
|
||
"def get_trans_ownership(alpha):\n",
|
||
"\n",
|
||
" print('Started calculations for alpha = {}'.format(alpha))\n",
|
||
" print('Preparing adjacency matrix...')\n",
|
||
" ADJ = sp.lil_matrix((len(all_nodes), len(all_nodes)), dtype = np.float32)\n",
|
||
"\n",
|
||
" print('non-zero elements:', ADJ.nnz)\n",
|
||
" # -------------------------- 1 - Superholders --------------------------------\n",
|
||
" print('Processing intermediate layers superholders...')\n",
|
||
" unique_curr_lvl_sh = {}\n",
|
||
" for i in range(1, LEVELS):\n",
|
||
" big_core_nodes_of_curr_level = coredata[coredata['SH_level'] == i].participant_id.value_counts().keys().values\n",
|
||
" isolates_of_curr_level = all_isolates[i]\n",
|
||
" all_sh_of_curr_level = np.concatenate((big_core_nodes_of_curr_level, isolates_of_curr_level))\n",
|
||
" unique_curr_lvl_sh[i] = all_sh_of_curr_level\n",
|
||
" \n",
|
||
" edge_ends_at_curr_lvl_sh = {i: pd.Series(data.organisation_inn).isin(unique_curr_lvl_sh[i]) for i in range(1,LEVELS)}\n",
|
||
" edges_of_interest_inter_sh = {i: data[edge_ends_at_curr_lvl_sh[i] == True] for i in range(1,LEVELS)}\n",
|
||
"\n",
|
||
" for lvl in range(1, LEVELS):\n",
|
||
" #print('processing layer {} super holders'.format(lvl))\n",
|
||
" edges = edges_of_interest_inter_sh[lvl]\n",
|
||
" shares = edges.equity_share.values\n",
|
||
" holders = edges.participant_id.values\n",
|
||
" curr_lvl_super_holders = edges.organisation_inn.values\n",
|
||
" \n",
|
||
" holders_pos_in_adj = np.array([node_inds[h] for h in holders])\n",
|
||
" curr_lvl_sh_pos_in_adj = np.array([node_inds[sh] for sh in curr_lvl_super_holders])\n",
|
||
"\n",
|
||
" target_inds_from_prev_level = np.array([])\n",
|
||
" holder_inds_from_prev_level = np.array([])\n",
|
||
" shares_from_prev_level = np.array([])\n",
|
||
"\n",
|
||
" # no need to define zero-level SH since they have zero ownership vectors\n",
|
||
" target_inds_from_prev_level, holder_inds_from_prev_level = ADJ[holders_pos_in_adj, :].nonzero()\n",
|
||
" shares_from_prev_level = ADJ[holders_pos_in_adj[target_inds_from_prev_level],\n",
|
||
" holder_inds_from_prev_level].A[0]\n",
|
||
" \n",
|
||
" for i, share in enumerate(shares):\n",
|
||
" ADJ[curr_lvl_sh_pos_in_adj[i], holders_pos_in_adj[i]] = share\n",
|
||
"\n",
|
||
" for i, prev_share in enumerate(shares_from_prev_level):\n",
|
||
" ADJ[curr_lvl_sh_pos_in_adj[target_inds_from_prev_level][i],\n",
|
||
" holder_inds_from_prev_level[i]] += prev_share*alpha\n",
|
||
" \n",
|
||
" print('non-zero elements:', ADJ.nnz)\n",
|
||
"\n",
|
||
"    # --------------------------- 2 - Core construction ------------------------------------------\n",
"    # sort the core nodes so that their indices are assigned deterministically\n",
"    # (a bare sorted(core_nodes) call would discard its result)\n",
"    core_nodes = sorted(set(core_holders.keys()) | set(core_targets.keys()))\n",
"    core_node_inds = dict(zip(core_nodes, range(len(core_nodes))))\n",
"    global_reverse_core_node_inds = dict(zip(range(len(core_nodes)), [node_inds[cnode] for cnode in core_nodes]))\n",
"\n",
"    core_shares = data_core.equity_share.values.astype(float)\n",
"    core_row_inds = np.array([core_node_inds[ch] for ch in data_core.participant_id.values])\n",
"    core_col_inds = np.array([core_node_inds[ct] for ct in data_core.organisation_inn.values])\n",
"\n",
"    W = sp.coo_matrix((core_shares, (core_row_inds, core_col_inds))).tocsc()\n",
"    G = W.dot(sp.linalg.inv(sp.eye(W.shape[0]).tocsc() - alpha*W)) # precise transitivity matrix calculation for core\n",
|
||
"\n",
|
||
"\n",
|
||
"    # --------------------- 3.1 - edges projecting from super-holders of previous levels to core--------------\n",
"    print('Adding information from core...')\n",
"    edge_ends_at_core = pd.Series(data.organisation_inn).isin(pd.Series(core_nodes))\n",
"    edges_of_interest_core = data[edge_ends_at_core == True]\n",
"    curr_core_nodes = edges_of_interest_core.organisation_inn.values\n",
"    ext_holders = edges_of_interest_core.participant_id.values\n",
"    ext_shares = edges_of_interest_core.equity_share.values\n",
"\n",
"    ext_holders_pos_in_adj = np.array([node_inds[h] for h in ext_holders])\n",
"    curr_core_nodes_pos_in_adj = np.array([node_inds[sh] for sh in curr_core_nodes])\n",
"\n",
"    target_inds, ext_holder_inds = ADJ[ext_holders_pos_in_adj, :].nonzero()\n",
"    shares_from_sh_to_core = ADJ[ext_holders_pos_in_adj[target_inds], ext_holder_inds].A[0]\n",
"\n",
"    for i, share in enumerate(ext_shares):\n",
"        ADJ[curr_core_nodes_pos_in_adj[i], ext_holders_pos_in_adj[i]] = share\n",
"\n",
"    for i, prev_share in enumerate(shares_from_sh_to_core):\n",
"        ADJ[curr_core_nodes_pos_in_adj[target_inds][i], ext_holder_inds[i]] += prev_share*alpha\n",
|
||
"\n",
|
||
"\n",
|
||
"    # ------------------------------------- 3.2 edges inside the core ------------------------------------\n",
"    inside_core_targets = np.array([global_reverse_core_node_inds[ch] for ch in G.nonzero()[1]])\n",
"    inside_core_holders = np.array([global_reverse_core_node_inds[ch] for ch in G.nonzero()[0]])\n",
"    inside_core_shares = G.data\n",
"\n",
"    target_inds_core, holder_inds_core = ADJ[inside_core_holders, :].nonzero()\n",
"    shares_from_prev_level = ADJ[inside_core_holders[target_inds_core], holder_inds_core].A[0]\n",
"\n",
"    for i, share in enumerate(inside_core_shares):\n",
"        ADJ[inside_core_targets[i], inside_core_holders[i]] = share\n",
"\n",
"    for i, prev_share in enumerate(shares_from_prev_level):\n",
"        ADJ[inside_core_targets[target_inds_core][i], holder_inds_core[i]] += prev_share*alpha\n",
"\n",
"    print('non-zero elements:', ADJ.nnz)\n",
|
||
"\n",
|
||
"\n",
|
||
"    # -------------------------------- 4 Supertargets of internal levels ----------------------------------\n",
"    print('Processing intermediate layers supertargets...')\n",
"    unique_curr_lvl_st = {}\n",
"    for i in range(1,LEVELS):\n",
"        big_core_nodes_of_curr_level = coredata[coredata['ST_level'] == i].organisation_inn.value_counts().keys().values\n",
"        isolates_of_curr_level = all_isolates[i]\n",
"        all_st_of_curr_level = np.concatenate((big_core_nodes_of_curr_level, isolates_of_curr_level))\n",
"        #print(len(all_st_of_curr_level))\n",
"        unique_curr_lvl_st[i] = all_st_of_curr_level\n",
"\n",
"    edge_ends_at_curr_lvl_st = {i: pd.Series(data.organisation_inn).isin(unique_curr_lvl_st[i]) for i in range(1,LEVELS)}\n",
"    edges_of_interest_inter_st = {i: data[edge_ends_at_curr_lvl_st[i] == True] for i in range(1,LEVELS)}\n",
"\n",
"    for lvl in range(LEVELS-1, 0, -1):\n",
"        #print('processing layer {} ST'.format(lvl))\n",
"\n",
"        edges = edges_of_interest_inter_st[lvl]\n",
"        shares = edges.equity_share.values\n",
"        holders = edges.participant_id.values\n",
"        curr_lvl_super_targets = edges.organisation_inn.values\n",
"\n",
"        holders_pos_in_adj = np.array([node_inds[h] for h in holders])\n",
"        curr_lvl_st_pos_in_adj = np.array([node_inds[sh] for sh in curr_lvl_super_targets])\n",
"\n",
"        target_inds_from_prev_level, holder_inds_from_prev_level = ADJ[holders_pos_in_adj, :].nonzero()\n",
"        shares_from_prev_level = ADJ[holders_pos_in_adj[target_inds_from_prev_level],\n",
"                                     holder_inds_from_prev_level].A[0]\n",
"\n",
"        #for i, share in enumerate(shares):\n",
"        ADJ[curr_lvl_st_pos_in_adj, holders_pos_in_adj] = shares\n",
"\n",
"        for i, prev_share in enumerate(shares_from_prev_level):\n",
"            ADJ[curr_lvl_st_pos_in_adj[target_inds_from_prev_level][i],\n",
"                holder_inds_from_prev_level[i]] += prev_share*alpha\n",
"\n",
"    print('non-zero elements:', ADJ.nnz)\n",
|
||
"\n",
|
||
"    # ------------------------------------ 5 - Supertargets ----------------------------------------\n",
"    print('Processing supertargets...')\n",
"    edges_of_interest_st = data[data['super_target'] == True]\n",
"    shares = edges_of_interest_st.equity_share.values\n",
"    holders = edges_of_interest_st.participant_id.values\n",
"    super_targets = edges_of_interest_st.organisation_inn.values\n",
"\n",
"    holders_pos_in_adj = np.array([node_inds[h] for h in holders])\n",
"    st_pos_in_adj = np.array([node_inds[sh] for sh in super_targets])\n",
"\n",
"    target_inds_from_prev_level, holder_inds_from_prev_level = ADJ[holders_pos_in_adj, :].nonzero()\n",
"    shares_from_prev_level = ADJ[holders_pos_in_adj[target_inds_from_prev_level],\n",
"                                 holder_inds_from_prev_level].A[0]\n",
"\n",
"    #for i, share in enumerate(shares):\n",
"    ADJ[st_pos_in_adj, holders_pos_in_adj] = shares\n",
"\n",
"    # since we will not perform matrix slicing anymore, we may construct an additional\n",
"    # coo matrix and sum it up with the main one:\n",
"    ST_ADJ = sp.coo_matrix((shares_from_prev_level*alpha, (st_pos_in_adj[target_inds_from_prev_level], holder_inds_from_prev_level)),\n",
"                           shape = (len(all_nodes), len(all_nodes)))\n",
"\n",
"    FINAL_ADJ = ST_ADJ.tolil() + ADJ\n",
"\n",
"    print('non-zero elements:', FINAL_ADJ.nnz)\n",
|
||
"\n",
|
||
"    # ------------------------------ 6 - Conversion to csv -----------------------------------------\n",
"    all_trans_targets, all_trans_holders = FINAL_ADJ.nonzero()\n",
"    all_trans_shares = FINAL_ADJ[all_trans_targets, all_trans_holders].A[0]\n",
"    trans_data = pd.DataFrame()\n",
"\n",
"    trans_data['company_number'] = [reverse_node_inds[tt] for tt in all_trans_targets]\n",
"    trans_data['participant_id'] = [reverse_node_inds[th] for th in all_trans_holders]\n",
"    trans_data['share'] = all_trans_shares\n",
"\n",
"    print('Final dataframe has {} rows'.format(len(all_trans_shares)))\n",
"\n",
"    del ADJ\n",
"    del FINAL_ADJ\n",
"    del ST_ADJ\n",
"\n",
"    gc.collect()\n",
"\n",
"    return trans_data\n"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "Hu1mhRopnTwW"
|
||
},
|
||
"source": [
|
||
"### Launcher"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "T92SOjxhJxVr",
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"outputId": "945773ba-6859-4bbf-c926-172226312add"
|
||
},
|
||
"source": [
|
||
"# Note that you may need to change the path to the data in CORP_PATH\n",
"from google.colab import drive\n",
"drive.mount('/content/drive/', force_remount=True)\n",
"\n",
"if DPI:\n",
"    CORP_PATH = '/content/drive/My Drive/Colab Notebooks/DataRoot/Corporate control/British data DPI'\n",
"else:\n",
"    CORP_PATH = '/content/drive/My Drive/Colab Notebooks/DataRoot/Corporate control/British data'\n",
"\n",
"# alpha = 0.0, 0.1, ..., 0.9, plus 0.999 as a value close to 1\n",
"alphalist = [a/10.0 for a in range(10)] + [0.999]\n",
"#alphalist = [0.999]\n",
"\n",
"for alpha in alphalist:\n",
"\n",
"    trans_data = get_trans_ownership(alpha)\n",
"\n",
"    if DPI:\n",
"        dname = 'uk_organisations_transitive_ownership_alpha{}_2021_long_7sep21_dpi_10000iter'.format(alpha)\n",
"    else:\n",
"        dname = 'uk_organisations_transitive_ownership_alpha{}_2021_long_2aug21'.format(alpha)\n",
"\n",
"    if SH_MODE:\n",
"        dname = dname + '_SH_only.csv'\n",
"    else:\n",
"        dname = dname + '.csv'\n",
"\n",
"    # if only SH should be left as holders, we should filter out all other holding entities:\n",
"    if SH_MODE:\n",
"        final_super_holder = ~pd.Series(trans_data.participant_id).isin(trans_data.company_number)\n",
"        # the set of SH should coincide with the same set in the original data\n",
"        trans_data = trans_data[final_super_holder == True]\n",
"\n",
"    trans_data.to_csv(join(CORP_PATH, dname), index=False)"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"Mounted at /content/drive/\n",
|
||
"Started calculations for alpha = 0.0\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 158125\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 158722\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 197144\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5096560\n",
|
||
"Final dataframe has 5096560 rows\n",
|
||
"Started calculations for alpha = 0.1\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.2\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.3\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.4\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.5\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.6\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.7\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.8\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.9\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n",
|
||
"Started calculations for alpha = 0.999\n",
|
||
"Preparing adjacency matrix...\n",
|
||
"non-zero elements: 0\n",
|
||
"Processing intermediate layers superholders...\n",
|
||
"non-zero elements: 165495\n",
|
||
"Adding information from core...\n",
|
||
"non-zero elements: 166851\n",
|
||
"Processing intermediate layers supertargets...\n",
|
||
"non-zero elements: 294304\n",
|
||
"Processing supertargets...\n",
|
||
"non-zero elements: 5694581\n",
|
||
"Final dataframe has 5694581 rows\n"
|
||
]
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "IvJtOjirn0cJ"
|
||
},
|
||
"source": [
|
||
"# Return top-*k* holders\n",
|
||
"\n",
|
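||
"Finally, having prepared all the data on transitive ownership, we define a function that takes a company number and prints its top-*k* holders by transitive share (super-holders only, if `SH_MODE` is set), with the shares of these *k* holders normalized to sum to 1."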
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"id": "9dR0W06V5id1"
|
||
},
|
||
"source": [
|
||
"def get_top_k_holders(target, trans_data, k):\n",
"\n",
"    # keep only the holders of the target company and take the k largest shares\n",
"    rel_data = trans_data[trans_data.company_number == target]\n",
"    top_k_data = rel_data.sort_values(by='share', ascending=False, ignore_index=True).head(k).copy()\n",
"\n",
"    # normalize the top-k shares so that they sum to 1\n",
"    top_k_data['share'] = top_k_data['share'] / top_k_data['share'].sum()\n",
"\n",
"    print(top_k_data)\n"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"id": "XY38QTptoIhu"
|
||
},
|
||
"source": [
|
||
"An example with a random company:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"metadata": {
|
||
"colab": {
|
||
"base_uri": "https://localhost:8080/"
|
||
},
|
||
"id": "VP9INrGh6i7g",
|
||
"outputId": "f5109ffb-58dc-4281-f22d-7ad88ffeaa62"
|
||
},
|
||
"source": [
|
||
"test_name = '06853998'\n",
|
||
"get_top_k_holders(test_name, trans_data, 2)\n"
|
||
],
|
||
"execution_count": null,
|
||
"outputs": [
|
||
{
|
||
"output_type": "stream",
|
||
"name": "stdout",
|
||
"text": [
|
||
"   company_number                participant_id     share\n",
"0        06853998    MICHAEL$NA$GREVILLE$1962$2  0.500752\n",
"1        06853998  PETER$CHARLES$DE HAAN$1952$3  0.499248\n"
|
||
]
|
||
}
|
||
]
|
||
}
|
||
]
|
||
} |