At work we had an interesting problem come up. We allow our clients to import their contacts into our system in numerous ways, like 3rd-party integrations, the API, and user uploads. As more and more data was imported from all these different sources, we ran into the problem of not being able to figure out which addresses were duplicates.
The following four addresses could all be added to our system for the same contact, but our system was unaware that they should be considered exactly the same:
1331 E Hashnode Ln
1331 East Hashnode Ln
1331 E Hashnode Lane
1331 East Hashnode Lane
All of the above addresses are correct, and if we were to send snail mail to any of them, it would arrive at the same location.
We've run it in production for a few months now and we like the results. The docs on the repo are great, and you should be able to add it to your existing PHP stack easily.
For our use case, we don't change the data that our clients upload or send us. Instead, we use the package's hash feature ($address1->getFullHash()) and store the result in a new column in our address table. If we see during import that the hash already exists, we merge any new data with the existing contact.
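To make the import flow concrete, here is a minimal sketch of the hash-and-merge idea. It does not use the package: addressHash() is a hypothetical stand-in for getFullHash() with a tiny made-up abbreviation table, importContacts() is an invented helper, and a plain PHP array stands in for the address table with its hash column. The sample contacts are illustrative only.

```php
<?php
// Hypothetical stand-in for the package's getFullHash(); NOT its real algorithm.
// Lowercases the address, expands a few abbreviations, then hashes the result.
function addressHash(string $address): string
{
    // Tiny illustrative table; a real normalizer handles far more cases.
    $expansions = [
        'e' => 'east', 'w' => 'west', 'n' => 'north', 's' => 'south',
        'ln' => 'lane', 'st' => 'street', 'ave' => 'avenue',
    ];

    $cleaned = preg_replace('/[^a-z0-9\s]/', '', strtolower(trim($address)));
    $words = preg_split('/\s+/', $cleaned, -1, PREG_SPLIT_NO_EMPTY);
    $normalized = array_map(fn ($word) => $expansions[$word] ?? $word, $words);

    return sha1(implode(' ', $normalized));
}

// Imports rows into an array keyed by address hash (a stand-in for the
// address table). The client's original address text is never rewritten.
function importContacts(array $rows): array
{
    $nonEmpty = fn ($value) => $value !== null && $value !== '';
    $table = [];

    foreach ($rows as $row) {
        $hash = addressHash($row['address']);

        if (isset($table[$hash])) {
            // Hash already exists: keep existing values, fill gaps from the new row.
            $table[$hash] = array_filter($table[$hash], $nonEmpty)
                + array_filter($row, $nonEmpty);
        } else {
            $table[$hash] = $row;
        }
    }

    return $table;
}

// Usage: two spellings of one address collapse into a single contact.
$contacts = importContacts([
    ['address' => '1331 E Hashnode Ln', 'name' => 'Sample Contact', 'phone' => ''],
    ['address' => '1331 East Hashnode Lane', 'name' => 'Sample Contact', 'phone' => '555-0100'],
]);
```

In this sketch the existing row wins when both rows have a value for the same field, and the new row only fills in blanks. That merge policy is an assumption; pick whichever rule fits your data.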
Thanks for reading! Please let me know if you found this helpful or have questions over on Twitter: twitter.com/guywarner801
Have a better free solution? Let me know in the comments.