chrismuir
refinr - Cluster and Merge Similar Values Within a Character Vector

These functions take a character vector as input, identify and cluster similar values, and then merge each cluster so that its values become identical. They implement the key collision and n-gram fingerprint algorithms from the open source tool OpenRefine <https://openrefine.org/>. More information on both algorithms is available at <https://openrefine.org/docs/technical-reference/clustering-in-depth>.
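To make the two clustering ideas concrete, here is a minimal Python sketch of key collision and n-gram fingerprinting as described in the OpenRefine documentation. It is an illustration only and is not refinr's implementation or API: the function names (`fingerprint`, `ngram_fingerprint`, `merge_by_key`), the choice of the most frequent value as the cluster representative, and the default n-gram size of 2 are all assumptions made for this example.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def to_ascii(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fingerprint(value):
    """Key collision key: normalized, de-duplicated, sorted word tokens."""
    text = to_ascii(value.strip().lower())
    text = re.sub(r"[^a-z0-9\s]", "", text)  # drop punctuation
    tokens = sorted(set(text.split()))       # order and repeats no longer matter
    return " ".join(tokens)


def ngram_fingerprint(value, n=2):
    """N-gram key: sorted, de-duplicated character n-grams (n=2 is an assumption)."""
    text = to_ascii(value.lower())
    text = re.sub(r"[^a-z0-9]", "", text)    # drop punctuation and whitespace
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def merge_by_key(values, key_func):
    """Group values that share a key, then replace each value with the
    most frequent member of its group (a hypothetical merge rule)."""
    clusters = defaultdict(list)
    for value in values:
        clusters[key_func(value)].append(value)

    replacement = {}
    for members in clusters.values():
        representative = Counter(members).most_common(1)[0][0]
        for member in members:
            replacement[member] = representative

    return [replacement[value] for value in values]


names = ["Acme Pizza, Inc.", "ACME PIZZA INC", "acme pizza inc",
         "acme pizza inc", "Bob's Burgers"]
merged = merge_by_key(names, fingerprint)
print(merged)
```

In this sketch the four "Acme" spellings share one fingerprint and collapse to the most frequent spelling, while "Bob's Burgers" is left alone. Passing `ngram_fingerprint` as `key_func` instead also catches values that differ only in spacing, at the cost of more aggressive merging.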

approximate-string-matching, clustering, data-cleaning, data-clustering, fuzzy-matching, ngram, openrefine, cpp

6.95 score, 103 stars, 172 scripts, 245 downloads