Melissa Data Quality Free Components for SSIS

In this article, we will talk briefly about data quality in SQL Server. Then, we will give a brief overview of Melissa Data Quality for SQL Server Integration Services (SSIS), and we will demonstrate the components available in the community edition.

Introduction

In general, data quality is the degree to which data fits its serving context. Enhancing data quality is critical, since poor quality leads to inaccurate reporting, which results in wrong decisions and, inevitably, economic damage. For this reason, each data management system provides a set of tools to improve the data quality level.

For SQL Server, many technologies can be used to enhance data quality:

SQL Server Data Quality Services (DQS)

Data Quality Services is a knowledge-driven data quality feature developed by Microsoft and released in SQL Server 2012. It can be installed from the SQL Server installation, and it provides different services, such as building a knowledge base, data de-duplication, and standardization.

To learn more about this feature, you can refer to the following articles:

  • How to clean data using Data Quality Services and SQL Server Integration Services
  • How to use SQL Server Data Quality Services to ensure the correct aggregation of data
  • How to clean Master Data Services data using Data Quality Services in SQL Server

Using the well-known Microsoft SSIS Components

SQL Server Integration Services provides a bunch of components that can be used to assess and enhance data quality. These operations can be performed at the control flow level, such as data profiling and validation, or at the data flow level using fuzzy lookups, conditional splits, derived columns, the script component, and others.

Writing SQL Statements

One of the most popular data cleaning approaches is implementing your own logic using SQL statements, which is known as data wrangling. SQL Server provides a bunch of system functions that can be used to improve data quality.
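As a minimal sketch, built-in functions such as TRIM, REPLACE, and NULLIF can handle simple cleaning tasks (the dbo.Customers table and its columns here are hypothetical; TRIM requires SQL Server 2017 or later):

```sql
-- Simple data wrangling with T-SQL system functions (illustrative only):
UPDATE dbo.Customers
SET City  = UPPER(TRIM(City)),           -- normalize casing and whitespace
    Phone = REPLACE(Phone, ' ', ''),     -- remove stray spaces from numbers
    Email = NULLIF(TRIM(Email), '');     -- treat empty strings as NULL
```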

Using third-party components

One of the most beautiful things about the Visual Studio IDE is that it allows developing third-party components and integrating them within Microsoft products such as SSIS. Many companies have developed third-party SSIS components, such as CDATA, KingswaySoft, and COZYROC.

Regarding data quality, one of the most popular products on the market is the Melissa data quality components for SQL Server.

Melissa Data Quality for SQL Server

Melissa data quality tools are a set of SSIS components used to clean and enrich data during the data transfer or integration process. Two editions are available:

  1. Enterprise edition: Commercial; contains a wide variety of data quality components and online services
  2. Community edition: Free, but only a few components are available (check the link above)

In this article, we will be talking about the community edition, and we will briefly illustrate its components.

Download Melissa data quality community edition

To download the Melissa data quality community edition, you should navigate to the SQL Server editions page. Then, request a demo by filling in the form located on the left side of the page, and make sure to select the community edition.

Requesting Melissa data quality community edition

Figure 1 – SQL Server editions page

After requesting the demo, you will receive an email that contains a link to the web installer along with a community license key.

Received email from Melissa

Figure 2 – Received email

Now, you should download the web installer from the link you received. When it finishes, you should enter the Melissa license key during installation.

Melissa data quality license information form within installation

Figure 3 – License information request during installation

When the installation is done, the Melissa data quality components should appear within the SSIS toolbox (at the data flow level).

Melissa data quality components within SSIS toolbox

Figure 4 – Melissa data quality components within SSIS toolbox

If you add any of these components to the data flow, you will see the following notification every time you try to open its editor.

Community edition notification

Figure 5 – Community edition notification

As mentioned on the Melissa SQL Server editions page, only a few features are available in the community edition:

  1. Contact Verify component: Only address parsing, name parsing, email correction, and phone formatting operations can be performed
  2. Profiler: Maximum of 50,000 records
  3. MatchUp: Maximum of 50,000 records

Note: to run the examples, we exported a flat file from the AdventureWorks2017 database using the following SQL statement:
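The exact statement used is not preserved here; a hypothetical reconstruction along these lines, using the AdventureWorks2017 Person schema, would produce such a file (CONCAT_WS requires SQL Server 2017 or later):

```sql
-- Hypothetical reconstruction of the export query.
-- CONCAT_WS joins the name parts into one field; a trailing '.' is
-- appended to the email so the email correction feature has something to fix.
SELECT
    CONCAT_WS(' ', p.FirstName, p.MiddleName, p.LastName) AS [Name],
    a.AddressLine1                                        AS [Address],
    a.City,
    a.PostalCode,
    ph.PhoneNumber                                        AS [Phone],
    CONCAT(e.EmailAddress, '.')                           AS [Email]
FROM Person.Person p
JOIN Person.EmailAddress e            ON e.BusinessEntityID  = p.BusinessEntityID
JOIN Person.PersonPhone ph            ON ph.BusinessEntityID = p.BusinessEntityID
JOIN Person.BusinessEntityAddress bea ON bea.BusinessEntityID = p.BusinessEntityID
JOIN Person.Address a                 ON a.AddressID = bea.AddressID;
```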

  • Note: We concatenated the first, middle, and last names to test the name parser. Also, we added a "." at the end of the email address to test the email correction feature.

Contact Verify Component

As mentioned before, there are only four features of the Contact Verify component available in the community edition:

  1. Name parsing: This feature is used to split a full name field into first, middle, and last name fields. Also, it extracts additional information such as the title, prefix, and suffix. In the community edition, we are only able to extract the last name
  2. Address parsing: This feature is used to extract additional information from the address field, such as the street name, suffix, mailbox name, and others
  3. Phone formatting: This feature is used to change the phone number formatting
  4. Email correction: This feature is used to remove meaningless characters from an email address
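Rough equivalents of three of these operations can be written in plain T-SQL, which is handy for sanity-checking the component's output (a sketch only, not Melissa's actual parsing logic; dbo.ContactStaging is a hypothetical staging table holding the exported flat file):

```sql
-- Rough T-SQL approximations of the Contact Verify operations:
SELECT
    -- Name parsing: take the token after the last space as the last name
    RIGHT([Name], CHARINDEX(' ', REVERSE([Name]) + ' ') - 1)  AS LastName,
    -- Email correction: strip a meaningless trailing '.'
    CASE WHEN [Email] LIKE '%.'
         THEN LEFT([Email], LEN([Email]) - 1)
         ELSE [Email] END                                     AS CleanEmail,
    -- Phone formatting: turn 'nnn-nnn-nnnn' into '(nnn) nnn-nnnn'
    '(' + LEFT([Phone], 3) + ') ' + SUBSTRING([Phone], 5, 8)  AS FormattedPhone
FROM dbo.ContactStaging;
```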

To test this component, we create a new SSIS project and add the following components:

  • Flat File Connection Manager: used to establish a connection with the flat file we generated from the AdventureWorks2017 database
  • OLE DB Connection Manager: used to establish a connection with tempdb (we will use it as the destination)
  • Data flow task: where we will add the following components:
    • Flat File Source: reads from the flat file connection manager
    • MD Contact Verify: the Melissa Contact Verify component
    • OLE DB Destination: where the data will be loaded

To configure the Contact Verify component, first we have to specify the Melissa data directory. In the Contact Verify editor, go to "File > Advanced Configuration".

Opening advanced configuration

Figure 6 – Opening advanced configuration

Make sure that the data file path is set to "C:\Program Files\Melissa Data\SQT\Data", which is the default data file path.

Advanced configuration form

Figure 7 – Advanced configuration form

Now, we will first configure the name parsing feature. In the Contact Verify editor, we open the "Name" tab page. Then, we should specify the input Name column and the output Last name column, as shown in the image below.

Name parsing tab page

Figure 8 – Name parsing

Note that even if the first name and middle name output columns are specified by default, they will not generate any data in the community edition. Also, the "Name 2" columns only generate data if two names exist in the name field.

Next, we should select the "Address" tab page to configure address parsing. Then, we should specify all available input columns, as shown in the image below.

Address parsing input columns

Figure 9 – Address parsing input columns

Now, we should press the "Additional Output Columns" button to specify the output columns to generate.

Parsed address columns

Effigy 10 – Parsed address columns

You will note that all properties related to the enterprise edition are disabled.

Next, we must select the "Phone/Email" tab page to configure the phone formatting and email correction features. As shown below, we should specify the input phone and email columns, the output columns, and the desired phone format.

Phone/Email configuration

Figure 11 – Phone/Email configuration

Next, we must select the "Pass-Through Columns" tab page to specify which columns in the input buffer we need to add to the output buffer.

Pass-through columns

Figure 12 – Pass-through columns

The Contact Verify component allows adding conditional filters to the generated output, but this is not supported in the community edition. You can check that in the "Output filter" tab page, where you can only change the output name.

Output filter

Figure 13 – Output filter

After configuring the MD Contact Verify component, we create a new destination table from the OLE DB Destination component by clicking on the "New" button.

Creating a new destination table

Figure 14 – Creating a new destination table

In the end, the data flow task should look like the following:

Data flow task screenshot

Figure 15 – Data flow task

After executing the package, we can see the component's impact in the result table, as shown in the following screenshots:

Name parsing result

Figure 16 – Name parsing result

Address parsing result

Figure 17 – Address parsing result

Phone formatting and email correction result

Figure 18 – Phone formatting and email correction result

Profiler component

The second Melissa data quality component is MD Profiler. It is a data profiling component similar to the SSIS data profiling task. This component is simple; you should select the input, pass-through, and result columns, and each profile's data is generated within a separate output, as shown in the screenshots below. Also, you can perform some analysis once the data processing is complete, and you can save the profile to an external file.
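The kind of statistics such a profiler produces can be approximated with a plain aggregate query (illustrative only; dbo.ContactStaging is a hypothetical staging table holding the exported flat file):

```sql
-- A few profiler-style statistics computed directly in T-SQL:
SELECT
    COUNT(*)                                         AS TotalRows,
    COUNT(DISTINCT [Email])                          AS DistinctEmails,
    SUM(CASE WHEN [Phone] IS NULL THEN 1 ELSE 0 END) AS NullPhones,
    MIN(LEN([Name]))                                 AS MinNameLength,
    MAX(LEN([Name]))                                 AS MaxNameLength
FROM dbo.ContactStaging;
```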

Select input columns

Figure 19 – Select input columns

Select needed analysis

Figure 20 – Select needed analysis

Configure profile output

Figure 21 – Configure profile output

The generated data profile outputs

Figure 22 – The generated data profile outputs

Linking profile output to a destination

Figure 23 – Linking profile output to a destination

Even though this component is mentioned within the available features of the community edition, it is still not working correctly, since it may not accept a community license key.

MatchUp component

The third Melissa data quality free component is the MatchUp component. This component is similar to the SSIS Lookup transformation, but with a de-duplication feature. De-duplication is performed based on a match code ruleset. In the community edition, only 9 match codes are available.
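The idea behind match-code-based de-duplication can be sketched in T-SQL: build a simplified match key from normalized fields, then keep one row per key (a rough illustration only, not Melissa's matching algorithm; dbo.ContactStaging and its columns are hypothetical; TRIM requires SQL Server 2017 or later):

```sql
-- Keep one row per simplified match key (normalized last name + postal code).
WITH Keyed AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY UPPER(TRIM([LastName])), [PostalCode]
               ORDER BY [Name]
           ) AS rn
    FROM dbo.ContactStaging
)
SELECT * FROM Keyed WHERE rn = 1;   -- rows with rn > 1 are the duplicates
```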

Available Matchcode

Figure 24 – Available Matchcode

To perform lookups, you should add a data source and link it to the Lookup component input, as shown below.

Selecting Matchup component input type

Figure 25 – Selecting Matchup component input type

Decision

In this article, we talked briefly about data quality and how to improve it in SQL Server Integration Services (SSIS). We illustrated the community edition of Melissa data quality and demonstrated the available components; Contact Verify was fully explained, while we didn't provide much detail on Profiler and MatchUp, since they need a separate article. Based on this demonstration, the community edition is only useful for evaluation, while we should buy the enterprise edition, since it contains much more powerful tools that we may need at the enterprise level.


Hadi Fadlallah


Source: https://www.sqlshack.com/melissa-data-quality-free-components-for-ssis/
