Commit 81793bc

Merge pull request #5 from Senzing/issue-4.dockter.1
#4 Refactor to template
2 parents db29245 + 59042cb commit 81793bc

5 files changed

Lines changed: 29 additions & 32 deletions

.project

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <projectDescription>
-<name>mapper-dnb-ubo</name>
+<name>mapper-dnb</name>
 </projectDescription>

CHANGELOG.md

Lines changed: 1 addition & 17 deletions
@@ -6,24 +6,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 [markdownlint](https://dlaa.me/markdownlint/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
-
-- Thing 5
-- Thing 4
-
-## [1.0.1] - yyyy-mm-dd
-
-### Added to 1.0.1
-
-- Thing 3
-
-### Fixed in 1.0.1
-
-- Thing 2
-
 ## [1.0.0] - yyyy-mm-dd
 
 ### Added to 1.0.0
 
-- Thing 2
-- Thing 1
+- Initial content

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ The variables are used throughout the installation procedure.
 
 ```console
 export GIT_ACCOUNT=senzing
-export GIT_REPOSITORY=senzing-repository-template
+export GIT_REPOSITORY=mapper-dnb
 ```
 
 Synthesize environment variables.

PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 ## Which issue does this address
 
-ISSUE-???
+Issue number: #nnn
 
 ## Why was change needed
 
README.md

Lines changed: 25 additions & 12 deletions
@@ -3,18 +3,20 @@
 ## Overview
 
 The [dnb_mapper.py](dnb_mapper.py) python script converts Dun & Bradstreet (DNB) files to json files ready to load into Senzing. This includes the following formats ...
+
 - Companies and their principles **(CMPCVF)** json format
 - Global contacts **(GCA)** tab delimited csv format
 - Ultimate beneficial owners **(UBO)** tab delinited csv format
 
-Normally these are provided by DNB on request and placed on an FTP server for you to download.
+Normally these are provided by DNB on request and placed on an FTP server for you to download.
 
 *Warning: the [dnb_formats.json](dnb_formats.json) file contains the exact structure of these files. You may need to send these formats to DNB so they know exactly how to create them!*
 
-Loading DNB data into Senzing requires additional features and configurations. These are contained in the
+Loading DNB data into Senzing requires additional features and configurations. These are contained in the
 [dnb_config_updates.json](dnb_config_updates.json) file.
 
 Usage:
+
 ```console
 python3 dnb_mapper.py --help
 usage: dnb_mapper.py [-h] [-f DNB_FORMAT] [-i INPUT_SPEC] [-o OUTPUT_PATH]
@@ -35,30 +37,35 @@ optional arguments:
 
 ## Contents
 
-1. [Prerequisites](#Prerequisites)
-2. [Installation](#Installation)
-3. [Configuring Senzing](#Configuring-Senzing)
-4. [Running the mapper](#Running-the-mapper)
-5. [Loading into Senzing](#Loading-into-Senzing)
+1. [Prerequisites](#prerequisites)
+1. [Installation](#installation)
+1. [Configuring Senzing](#configuring-senzing)
+1. [Running the mapper](#running-the-mapper)
+1. [Loading into Senzing](#loading-into-senzing)
 
 ### Prerequisites
+
 - python 3.6 or higher
 - Senzing API version 1.7 or higher
-- https://github.com/Senzing/mapper-base
+- [Senzing/mapper-base](https://github.com/Senzing/mapper-base)
 
 ### Installation
 
 Place the the following files on a directory of your choice ...
+
 - [dnb_mapper.py](dnb_mapper.py)
 - [dnb_config_updates.json](dnb_config_updates.json)
 - [dnb_formats.json](dnb_formats.json)
 
 *Note: Since the mapper-base project referenced above is required by this mapper, it is necessary to place them in a common directory structure like so ...*
+
 ```Console
 /senzing/mappers/mapper-base
 /senzing/mappers/mapper-dnb <--
 ```
+
 You will also need to set the PYTHONPATH to where the base mapper is as follows ... (assumuing the directory structure above)
+
 ```Console
 export PYTHONPATH=$PYTHONPATH:/senzing/mappers/mapper-base
 ```
@@ -68,40 +75,46 @@ export PYTHONPATH=$PYTHONPATH:/senzing/mappers/mapper-base
 *Note:* This only needs to be performed one time! In fact you may want to add these configuration updates to a master configuration file for all your data sources.
 
 From the /opt/senzing/g2/python directory ...
+
 ```console
 python3 G2ConfigTool.py <path-to-file>/dnb_config_updates.json
 ```
+
 This will step you through the process of adding the data sources, entity types, features, attributes and other settings needed to load this watch list data into Senzing. After each command you will see a status message saying "success" or "already exists". For instance, if you run the script twice, the second time through they will all say "already exists" which is OK.
 
 Configuration updates include:
+
 - addDataSource **DNB-COMPANY** used when when mapping companies from CMPCVF json files
 - addDataSource **DNB-PRINCIPLE** used when when mapping principles from CMPCVF json files
 - addDataSource **DNB-OWNER** used when when mapping owners from UBO csv files
 - addDataSource **DNB-CONTACT** used when when mapping contacts from GCA csv files
 - addEntityType **PERSON**
 - addEntityType **ORGANIZATION**
 - add features and attributes for ...
-- **DNB_OWNER_ID** This is used to help prevent owners from resolving to each other and so that you can search on it.
+- **DNB_OWNER_ID** This is used to help prevent owners from resolving to each other and so that you can search on it.
 
 ### Running the mapper
 
 First, download the DNB files you want to load from the DNB FTP site. Since the data files are so large, these are normally split into multiple files.
 
 Second, run the mapper. Example usage:
+
 ```console
-python3 dnb_mapper.py -f CMPCVF -i "./input/CMPCVF*.txt" -o ./output -l cmpcvf_stats.json
+python3 dnb_mapper.py -f CMPCVF -i "./input/CMPCVF*.txt" -o ./output -l cmpcvf_stats.json
 
-python3 dnb_mapper.py -f GCA -i "./input/GCA*.txt" -o ./output -l gca_stats.json
+python3 dnb_mapper.py -f GCA -i "./input/GCA*.txt" -o ./output -l gca_stats.json
 
-python3 dnb_mapper.py -f UBO -i "./input/UBO*.txt" -o ./output -l ubo_stats.json
+python3 dnb_mapper.py -f UBO -i "./input/UBO*.txt" -o ./output -l ubo_stats.json
 ```
+
 The output file defaults to the same name and location as the input file and a .json extension is added.
 
 *It is critical that the -f file format match the input files exactly!*
 
 ### Loading into Senzing
 
 If you use the G2Loader program to load your data, its best to list the mapped json files you want to load in a project file. There is an example of one in your senzing instalation here: /opt/senzing/g2/python/demo/sample/project.csv. Then from from the /opt/senzing/g2/python directory ...
+
 ```console
 python3 G2Loader.py -p <name of project file>
 