Remove Duplicate Words in C#

When developing an application, we often take input from users, other applications, or other sources. That input may well contain junk or unnecessary values. Even with thorough validation in place, we cannot always prevent duplicate words that are entered by mistake.

Removing duplicate words from a string or other data can simplify your algorithm or improve performance. In C#, we can remove duplicate words from a string using a Dictionary instance.

Following is a sample input and the expected result:

Input:  This is a test string for this blog
Output: This is a test string for blog
Note:   [The second 'this' was removed.]

Using a word Dictionary

We need a data structure that provides constant-time lookup for keys, such as Dictionary. The logic is straightforward: we loop through all the words and check each one against the words already encountered. If we used two lists instead, every check would have to scan the words seen so far, so the running time would grow quadratically with the number of words and the program would become impractically slow for long inputs.
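To make the difference between the two lookups concrete, here is a minimal sketch that counts distinct words both ways. The class and method names (LookupSketch, CountDistinctWithList, CountDistinctWithDictionary) are made up for illustration and are not part of the example program in this post; both methods should return the same count, they only differ in how the "seen before?" check is done.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for illustration only.
public static class LookupSketch
{
    // List-based check: List.Contains scans the list element by element,
    // so each lookup costs time proportional to the words seen so far.
    public static int CountDistinctWithList(string[] words)
    {
        var seen = new List<string>();
        foreach (string word in words)
        {
            string key = word.ToLowerInvariant();
            if (!seen.Contains(key))
            {
                seen.Add(key);
            }
        }
        return seen.Count;
    }

    // Dictionary-based check: ContainsKey hashes the key, so each lookup
    // takes roughly constant time on average, however many words are stored.
    public static int CountDistinctWithDictionary(string[] words)
    {
        var seen = new Dictionary<string, bool>();
        foreach (string word in words)
        {
            string key = word.ToLowerInvariant();
            if (!seen.ContainsKey(key))
            {
                seen.Add(key, true);
            }
        }
        return seen.Count;
    }
}
```

For a short sentence the two are indistinguishable in practice; the gap only starts to matter when the input runs to many thousands of words.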

=== Example program that removes duplicate words (C#) ===
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        // Print each input, then the de-duplicated result indented below it.
        string s = "This is a test string for this blog";
        Console.WriteLine(s);
        Console.WriteLine("    " + RemoveDuplicateWords(s));

        s = "We use C# for development and share what we learn";
        Console.WriteLine(s);
        Console.WriteLine("    " + RemoveDuplicateWords(s));
    }

    public static string RemoveDuplicateWords(string v)
    {
        // 1
        // Keep track of words found in this Dictionary.
        var d = new Dictionary<string, bool>();

        // 2
        // Build up string into this StringBuilder.
        StringBuilder b = new StringBuilder();

        // 3
        // Split the input and handle spaces and punctuation.
        // Empty entries (for example between ", ") are dropped.
        string[] a = v.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        // 4
        // Loop over each word
        foreach (string current in a)
        {
            // 5
            // Lowercase each word so the comparison ignores case
            string lower = current.ToLower();

            // 6
            // If we haven't already encountered the word,
            // append it to the result.
            if (!d.ContainsKey(lower))
            {
                b.Append(current).Append(' ');
                d.Add(lower, true);
            }
        }

        // 7
        // Return the string with duplicate words removed
        // (the separator characters are replaced by single spaces)
        return b.ToString().Trim();
    }
}

=== Output of the example program ===
This is a test string for this blog
    This is a test string for blog
We use C# for development and share what we learn
    We use C# for development and share what learn
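
As a possible variation on the program above (not the approach used in this post), the same idea can be sketched with a HashSet<string> and a case-insensitive comparer, which avoids both the dummy bool value and the explicit lowercasing. The class name HashSetSketch is hypothetical; the sketch assumes the same separator characters as the example and relies on HashSet<T>.Add returning false when the item is already present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative for illustration; swaps the Dictionary for a HashSet.
public static class HashSetSketch
{
    public static string RemoveDuplicateWords(string input)
    {
        // The comparer makes "This" and "this" count as the same word.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        // Same separators as the example program; empty entries are dropped.
        string[] words = input.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Add returns true only the first time a word is seen.
            if (seen.Add(word))
            {
                kept.Add(word);
            }
        }

        // Join the first occurrences back together with single spaces.
        return string.Join(" ", kept);
    }
}
```

Calling HashSetSketch.RemoveDuplicateWords on the two sample strings should give the same results as the Dictionary version. Like that version, it keeps the first occurrence of each word with its original casing and discards the separator characters.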