README.md (25 additions, 12 deletions)
@@ -3,18 +3,20 @@
## Overview

The [dnb_mapper.py](dnb_mapper.py) Python script converts Dun & Bradstreet (DNB) files to JSON files ready to load into Senzing. The following formats are supported:

- Companies and their principals **(CMPCVF)** JSON format
- Global contacts **(GCA)** tab-delimited CSV format
- Ultimate beneficial owners **(UBO)** tab-delimited CSV format

Normally these are provided by DNB on request and placed on an FTP server for you to download.

*Warning: the [dnb_formats.json](dnb_formats.json) file contains the exact structure of these files. You may need to send these formats to DNB so they know exactly how to create them!*

Loading DNB data into Senzing requires additional features and configurations. These are contained in the

*Note: This mapper requires the mapper-base project referenced above, so both must be placed in a common directory structure like so ...*

```Console
/senzing/mappers/mapper-base
/senzing/mappers/mapper-dnb <--
```

You will also need to set the PYTHONPATH to the location of the base mapper as follows ... (assuming the directory structure above)
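To check that the setting took effect, you can ask Python whether the base mapper directory is on its module search path, since directories listed in PYTHONPATH are added to `sys.path` at startup. The sketch below is illustrative only: it assumes the directory structure above, and the helper name `on_search_path` is hypothetical rather than part of either mapper project.

```python
import os
import sys

# Assumed location of the base mapper, taken from the directory layout above.
BASE_MAPPER_DIR = "/senzing/mappers/mapper-base"


def on_search_path(directory, search_path=None):
    """Return True if `directory` appears in the module search path."""
    if search_path is None:
        search_path = sys.path
    target = os.path.normpath(directory)
    # Skip empty entries (they mean "current directory") and compare
    # normalized paths so a trailing slash does not cause a false negative.
    return any(os.path.normpath(entry) == target for entry in search_path if entry)


if on_search_path(BASE_MAPPER_DIR):
    print("base mapper directory is on the Python search path")
else:
    print("base mapper directory not found; check PYTHONPATH")
```

If the second message is printed, the PYTHONPATH setting was probably not applied in the current shell session.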

*Note:* This only needs to be performed one time! In fact, you may want to add these configuration updates to a master configuration file for all your data sources.

This will step you through the process of adding the data sources, entity types, features, attributes, and other settings needed to load this DNB data into Senzing. After each command you will see a status message saying "success" or "already exists". For instance, if you run the script twice, every command in the second run will report "already exists", which is OK.

Configuration updates include:

- addDataSource **DNB-COMPANY** used when mapping companies from CMPCVF JSON files
- addDataSource **DNB-PRINCIPLE** used when mapping principals from CMPCVF JSON files
- addDataSource **DNB-OWNER** used when mapping owners from UBO CSV files
- addDataSource **DNB-CONTACT** used when mapping contacts from GCA CSV files
- addEntityType **PERSON**
- addEntityType **ORGANIZATION**
- add features and attributes for ...
    - **DNB_OWNER_ID** This is used to help prevent owners from resolving to each other and so that you can search on it.
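The data source assignments in the list above can be summarized as a lookup from file format to data source codes. This is only a sketch for orientation: the dictionary and the helper name `data_sources_for` are hypothetical and do not reflect how dnb_mapper.py is organized internally.

```python
# Illustrative lookup built from the configuration list above.
# The names below are hypothetical, not part of dnb_mapper.py.
DATA_SOURCES_BY_FORMAT = {
    "CMPCVF": ("DNB-COMPANY", "DNB-PRINCIPLE"),
    "UBO": ("DNB-OWNER",),
    "GCA": ("DNB-CONTACT",),
}


def data_sources_for(file_format):
    """Return the data source codes that records of the given format map to."""
    key = file_format.strip().upper()
    if key not in DATA_SOURCES_BY_FORMAT:
        raise ValueError(f"unknown DNB file format: {file_format!r}")
    return DATA_SOURCES_BY_FORMAT[key]


print(data_sources_for("cmpcvf"))
```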
### Running the mapper
First, download the DNB files you want to load from the DNB FTP site. Since the data files are so large, these are normally split into multiple files.

By default, the output file has the same name and location as the input file, with a .json extension added.

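As a small illustration of the default naming rule, the sketch below appends `.json` to the full input path. It assumes the extension is added to the existing file name rather than replacing its extension, which is how the sentence above reads; the function name and the example path are hypothetical.

```python
def default_output_path(input_path):
    """Keep the input file's name and location and append a .json extension."""
    return input_path + ".json"


# Hypothetical input file name, used only for illustration.
print(default_output_path("/data/dnb/ubo_part_001.txt"))
```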
*It is critical that the -f file format match the input files exactly!*
### Loading into Senzing

If you use the G2Loader program to load your data, it's best to list the mapped JSON files you want to load in a project file. There is an example of one in your Senzing installation here: /opt/senzing/g2/python/demo/sample/project.csv. Then from the /opt/senzing/g2/python directory ...