Melissa Data Upload Excel List of Addresses Match Owner Name
In this article, we will talk briefly about data quality in SQL Server. Then, we will give a brief overview of Melissa Data Quality for SQL Server Integration Services (SSIS) and demonstrate the components available in the community edition.
Introduction
In general, data quality is the degree to which data fits the context it serves. Improving data quality is critical, since poor quality leads to inaccurate reporting, wrong decisions, and, inevitably, economic damage. For this reason, each data management system provides a set of tools for improving data quality.
For SQL Server, many technologies can be used to enhance data quality:
SQL Server Data Quality Services (DQS)
Data Quality Services is a knowledge-driven data quality feature developed by Microsoft and released in SQL Server 2012. It can be installed from the SQL Server installation, and it provides different services, such as building a knowledge base, data de-duplication, and standardization.
To learn more about this feature, you can refer to the following articles:
- How to clean data using Data Quality Services and SQL Server Integration Services
- How to use SQL Server Data Quality Services to ensure the correct aggregation of data
- How to clean Master Data Services data using Data Quality Services in SQL Server
Using the well-known Microsoft SSIS Components
SQL Server Integration Services provides a set of components that can be used to assess and enhance data quality. These operations can be performed at the control flow level, such as data profiling and validation, or at the data flow level using fuzzy lookups, conditional splits, derived columns, script components, and others.
Writing SQL Statements
One of the most popular data cleaning approaches is implementing your own logic using SQL statements, which is known as data wrangling. SQL Server provides a number of system functions that can be used to improve data quality.
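As a minimal sketch of this approach, the query below applies a few built-in functions (LTRIM, RTRIM, NULLIF, LOWER, ISNULL) to the employee view of the AdventureWorks2017 sample database. The cleanup rules and the 'N/A' placeholder are assumptions chosen for illustration, not a recommended standard:

```sql
-- Sketch: basic cleanup with built-in T-SQL functions.
-- The rules below are illustrative assumptions, not fixed best practices.
SELECT
    [BusinessEntityID],
    -- Trim surrounding spaces and turn empty strings into NULL
    NULLIF(LTRIM(RTRIM([FirstName])), '') AS [FirstName],
    NULLIF(LTRIM(RTRIM([LastName])), '')  AS [LastName],
    -- Replace a missing middle name with an explicit placeholder
    ISNULL([MiddleName], 'N/A')           AS [MiddleName],
    -- Normalize email addresses to lower case
    LOWER(LTRIM(RTRIM([EmailAddress])))   AS [EmailAddress]
FROM [AdventureWorks2017].[HumanResources].[vEmployee];
```

Hand-written logic like this is easy to start with, but every rule has to be maintained manually, which is where dedicated data quality components become useful.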
Using third-party components
One of the nicest things about the Visual Studio IDE is that it allows developing third-party components and integrating them with Microsoft products such as SSIS. Many companies have developed third-party SSIS components, such as CData, KingswaySoft, and COZYROC.
Regarding data quality, one of the most popular products on the market is the Melissa data quality components for SQL Server.
Melissa Data Quality for SQL Server
Melissa data quality tools are a set of SSIS components used to clean and enrich data during the data transfer or integration process. Two editions are available:
- Enterprise edition: Commercial, contains a wide variety of data quality components and online services
- Community edition: Free, but only a few components are available (check the link above)
In this article, we will be talking about the community edition, and we will briefly illustrate its components.
Download Melissa data quality community edition
To download the Melissa data quality community edition, you should navigate to the SQL Server editions page. Then, request a demo by filling out the form located on the left side of the page, and make sure to select the community edition.
Figure 1 – SQL Server editions page
After requesting the demo, you will receive an email that contains a link to the web installer along with a community license key.
Figure 2 – Received email
Now, you should download the web installer from the link you received. When finished, you should enter the Melissa license key during installation.
Figure 3 – License information request during installation
When the installation is done, the Melissa data quality components should appear within the SSIS toolbox (data flow level).
Figure 4 – Melissa data quality components within SSIS toolbox
If you add any of those components to the data flow, you will see the following notification every time you try to open its editor.
Figure 5 – Community edition notification
As mentioned on the Melissa SQL Server editions page, only a few features are available in the community edition:
- Contact Verify component: Only address parsing, name parsing, email correction, and phone formatting operations can be performed
- Profiler: Max 50000 records limit
- MatchUp: Max 50000 records limit
Note: to run the examples, we exported a flat file from the AdventureWorks2017 database, using the following SQL statement:
```sql
SELECT [BusinessEntityID],
       [Title],
       [FirstName],
       [MiddleName],
       [LastName],
       [Suffix],
       -- Build a full name and collapse the double space left by a missing middle name
       REPLACE(LTRIM(RTRIM(ISNULL([FirstName], '') + ' ' + ISNULL([MiddleName], '') + ' ' + ISNULL([LastName], ''))), '  ', ' ') AS [Name],
       [JobTitle],
       [PhoneNumber],
       [PhoneNumberType],
       -- Append a trailing period so the email correction feature has something to fix
       [EmailAddress] + '.' AS [EmailAddress],
       [EmailPromotion],
       [AddressLine1],
       [AddressLine2],
       [City],
       [StateProvinceName],
       [PostalCode],
       [CountryRegionName]
FROM [AdventureWorks2017].[HumanResources].[vEmployee]
```
Note: We concatenated the first, middle, and last names to test the name parser. Also, we added a "." at the end of the email address to test the email correction feature.
Contact Verify Component
As mentioned before, only four features of the Contact Verify component are available in the community edition:
- Name parsing: This feature is used to split a full name field into first, middle, and last name fields. It also extracts additional information such as title, prefix, and suffix. In the community edition, we are only able to extract the last name
- Address parsing: This feature is used to extract additional information from the address field, such as the street name, suffix, mailbox name, and others
- Phone formatting: This feature is used to change the phone number formatting
- Email correction: This feature is used to remove meaningless characters from an email address
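To give a rough idea of what email correction and phone formatting mean in practice, the following T-SQL sketch imitates both on two sample rows. It is only an approximation written for illustration: the sample values, the ten-character rule, and the "(NNN) NNN-NNNN" format are assumptions, and it does not reproduce the logic Melissa uses internally.

```sql
-- Illustrative sample rows (assumed values, not output of the Melissa component)
SELECT
    s.[RawEmail],
    -- Remove one trailing period if present
    CASE WHEN s.[RawEmail] LIKE '%.'
         THEN LEFT(s.[RawEmail], LEN(s.[RawEmail]) - 1)
         ELSE s.[RawEmail]
    END AS [CorrectedEmail],
    s.[RawPhone],
    -- Reformat only when exactly ten characters remain after stripping
    -- separators; otherwise keep the input unchanged
    CASE WHEN LEN(d.[Digits]) = 10
         THEN '(' + SUBSTRING(d.[Digits], 1, 3) + ') '
              + SUBSTRING(d.[Digits], 4, 3) + '-'
              + SUBSTRING(d.[Digits], 7, 4)
         ELSE s.[RawPhone]
    END AS [FormattedPhone]
FROM (VALUES
        ('ken0@adventure-works.com.', '697-555-0142'),
        ('terri0@adventure-works.com', '819 555 0175')
     ) AS s ([RawEmail], [RawPhone])
CROSS APPLY (
    -- Strip dashes, spaces, and parentheses from the phone number
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(s.[RawPhone], '-', ''), ' ', ''), '(', ''), ')', '') AS [Digits]
) AS d;
```

A dedicated component covers far more cases than a handful of string functions, but the sketch shows the kind of before and after values to expect in the result table.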
To test this component, we create a new SSIS project and add the following components:
- Flat File Connection Manager: used to establish a connection with the flat file we generated from the AdventureWorks2017 database
- OLE DB Connection Manager: used to establish a connection with tempdb (we will use it as the destination)
- Data flow task: where we will add the following components:
  - Flat File Source: reads from the flat file connection manager
  - MD Contact Verify: Melissa contact verify component
  - OLE DB Destination: where data will be loaded
To configure the Contact Verify component, first, we have to specify the Melissa data directory. In the Contact Verify editor, go to "File > Advanced Configuration".
Figure 6 – Opening advanced configuration
Make sure that the data file path is set to "C:\Program Files\Melissa Data\SQT\Data", which is the default data file path.
Figure 7 – Advanced configuration form
Now, we will first configure the name parsing feature. In the Contact Verify editor, we open the "Name" tab page. Then, we should specify the input Name column and the output Last name column, as shown in the image below.
Figure 8 – Name parsing
Note that even though the first name and middle name output columns are specified by default, they will not generate any data in the community edition. Also, the "Name 2" columns generate data only if two names exist in the name field.
Next, we should select the "Address" tab page to configure address parsing. Then, we should specify all available input columns, as shown in the image below.
Figure 9 – Address parsing input columns
Now, we should click the "Additional Output Columns" button to specify the output columns generated.
Effigy 10 – Parsed address columns
You will note that all properties related to the enterprise edition are disabled.
Next, we must select the "Phone/Email" tab page to configure the phone formatting and email correction features. As shown below, we should specify the input phone and email columns, the output columns, and the desired phone format.
Figure 11 – Phone/Email configuration
Next, we must select the "Pass-Through Columns" tab page to specify which columns in the input buffer we need to add to the output buffer.
Effigy 12 – Pass-through columns
The Contact Verify component allows adding conditional filters to the generated output, but this is not supported in the community edition. You can check that in the "Output filter" tab page, where you can only change the output name.
Figure 13 – Output filter
After configuring the MD Contact Verify component, we create a new destination table from the OLE DB destination component by clicking on the "New" button.
Figure 14 – Creating a new destination table
In the end, the data flow task should look like the following:
Figure 15 – Data flow task
After executing the package, we can see the component's impact in the result table, as shown in the following screenshots:
Figure 16 – Name parsing result
Figure 17 – Address parsing result
Figure 18 – Phone formatting and email correction result
Profiler component
The second Melissa data quality component is MD Profiler. It is a data profiling component similar to the SSIS data profiling task. This component is simple: you select the input, pass-through, and result columns, and each profile's data is generated in a separate output, as shown in the screenshots below. You can also run some analysis of how the data processing completed and save the profile to an external file.
Figure 19 – Select input columns
Figure 20 – Select needed analysis
Figure 21 – Configure profile output
Figure 22 – The generated data profile outputs
Figure 23 – Linking profile output to a destination
Even though this component is listed among the available features of the community edition, it still does not work correctly, since it may not accept a community license key.
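While the component is unavailable, basic column statistics of the kind a profiler reports can be approximated directly in T-SQL. The sketch below is not a replacement for MD Profiler; the chosen columns and metrics are assumptions made for illustration:

```sql
-- Sketch: simple column profiling with aggregate functions
SELECT
    COUNT(*)                       AS [TotalRows],
    -- COUNT(column) skips NULLs, so the difference is the NULL count
    COUNT(*) - COUNT([MiddleName]) AS [MiddleNameNulls],
    COUNT(DISTINCT [JobTitle])     AS [DistinctJobTitles],
    -- Length range helps spot truncated or malformed values
    MIN(LEN([PostalCode]))         AS [MinPostalCodeLength],
    MAX(LEN([PostalCode]))         AS [MaxPostalCodeLength]
FROM [AdventureWorks2017].[HumanResources].[vEmployee];
```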
Matchup component
The third free Melissa data quality component is the MatchUp component. This component is similar to the SSIS lookup transformation, but with a de-duplication feature. De-duplication is performed based on matchcode rulesets. In the community edition, only nine matchcodes are available.
Figure 24 – Available matchcodes
To perform lookups, you should add a data source and link it to the Lookup component input, as shown below.
Figure 25 – Selecting Matchup component input type
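The idea behind key-based de-duplication can be sketched in plain T-SQL. This is not Melissa's matchcode logic: the match key used here (last name, first initial, and postal code) is a hypothetical rule, and it only catches exact matches on that key.

```sql
-- Sketch: flag duplicate rows that share a hypothetical match key
WITH Keyed AS (
    SELECT
        [BusinessEntityID],
        [FirstName],
        [LastName],
        [PostalCode],
        -- Number the rows inside each group of identical match keys
        ROW_NUMBER() OVER (
            PARTITION BY UPPER(LTRIM(RTRIM([LastName]))),
                         UPPER(LEFT(LTRIM(RTRIM([FirstName])), 1)),
                         [PostalCode]
            ORDER BY [BusinessEntityID]
        ) AS [RowInGroup]
    FROM [AdventureWorks2017].[HumanResources].[vEmployee]
)
-- Rows numbered 2 and above repeat the key of an earlier row
SELECT [BusinessEntityID], [FirstName], [LastName], [PostalCode]
FROM Keyed
WHERE [RowInGroup] > 1;
```

The first row of each group is treated as the surviving record; a matchcode ruleset generalizes this by combining several such keys and allowing inexact comparisons.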
Conclusion
In this article, we talked briefly about data quality and how to improve it in SQL Server Integration Services (SSIS). We illustrated the community edition of Melissa data quality and demonstrated the available components; Contact Verify was fully explained, while we didn't provide much information on Profiler and MatchUp since they need a separate article. Based on the demonstration, the community edition is only suitable for demonstration purposes; we should buy the enterprise edition, since it contains much more powerful tools that we may need at the enterprise level.
Source: https://www.sqlshack.com/melissa-data-quality-free-components-for-ssis/