Convert ASCII-encoded files with special characters to UTF-8

By | 9. May 2019

In the previous post we looked briefly at How to Fix the Collation and Character Set of a MySQL Database. This time we will look at what had to be done with text files that were also in the wrong encoding. The site was over 15 years old, and its PHP and plain text files were stored in a Nordic 8-bit encoding (commonly called "ASCII" files, although strict 7-bit ASCII has no æ, ø or å).

The files opened correctly in Notepad and in some more advanced text editors, but to my surprise, saving them in UTF-8 encoding left some of the characters incorrect. There are several guides on the internet on how to do this, but I must say that after trying some of them and failing, I took the typical software developer approach: writing my own solution.

The result is as simple as it can be: only a few lines of code.

using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 1)
        {
            string fileName = args[0];
            if (File.Exists(fileName))
            {
                // Decode the bytes with code page 1252 (Windows Western European,
                // which includes the Nordic letters æ, ø and å)
                string fileContent = Encoding.GetEncoding(1252).GetString(File.ReadAllBytes(fileName));
                // Do not overwrite the original file; write to a new name (myfile.php -> myfile.dk.php)
                string newFileName = Path.ChangeExtension(fileName, $"dk{Path.GetExtension(fileName)}");
                // Write the text as UTF-8 without a BOM (byte order mark)
                File.WriteAllText(newFileName, fileContent, new UTF8Encoding(false));
            }
        }
    }
}

First we read the file as a byte array and decode it into a string using code page 1252 (Windows Western European), which was the right one for these files.

After that we write the string to a new file in UTF-8 without a BOM (byte order mark). Since the destination was a Linux server, that was the proper choice.

And that is all the magic!
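To make the two steps concrete, here is a small sketch that runs them in memory on the three Danish letters instead of on a file. It assumes .NET 5 or later (top-level statements), where code page 1252 is not built in and `CodePagesEncodingProvider` has to be registered first; the .NET Framework 4.6.1 program above does not need that line. The last step decodes the UTF-8 bytes as 1252 again, which produces the familiar "Ã¦Ã¸Ã¥" garbling you get when UTF-8 text is read with the wrong encoding.

```csharp
using System;
using System.Text;

// Assumption: .NET 5 or later. Code page 1252 is not built into modern .NET,
// so the code-pages provider must be registered before GetEncoding(1252).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// "æøå" as stored in a code page 1252 file: one byte per letter.
byte[] ansiBytes = { 0xE6, 0xF8, 0xE5 };

// Step 1: decode the bytes with the source code page.
string text = Encoding.GetEncoding(1252).GetString(ansiBytes);

// Step 2: encode as UTF-8; each of these letters becomes two bytes.
var utf8NoBom = new UTF8Encoding(false);
byte[] utf8Bytes = utf8NoBom.GetBytes(text);

// UTF8Encoding(false) has an empty preamble, so File.WriteAllText writes no BOM.
byte[] bom = utf8NoBom.GetPreamble();

// For comparison: the UTF-8 bytes misread as code page 1252 become mojibake.
string mojibake = Encoding.GetEncoding(1252).GetString(utf8Bytes);

Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C3-A6-C3-B8-C3-A5
Console.WriteLine(bom.Length);                       // 0
```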

For usability and convenience, a few more code fragments are present:

  • check that the program is run with exactly one argument
  • if so, use the argument as the file name
  • check that the file exists
  • and finally, give the converted file a new name so the source is not overwritten (myfile.php -> myfile.dk.php)
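The renaming in the last bullet relies on `Path.ChangeExtension`, which adds the leading dot itself. The sketch below isolates that line in a hypothetical helper, `DkFileName`, written with .NET 5+ top-level statements; note that a file without an extension simply gets ".dk" appended.

```csharp
using System;
using System.IO;

// Hypothetical helper mirroring the naming line in Main:
// "dk" plus the original extension becomes the new extension.
static string DkFileName(string fileName) =>
    Path.ChangeExtension(fileName, $"dk{Path.GetExtension(fileName)}");

Console.WriteLine(DkFileName("myfile.php")); // myfile.dk.php
Console.WriteLine(DkFileName("notes.txt"));  // notes.dk.txt
Console.WriteLine(DkFileName("README"));     // README.dk
```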

As simple as it gets. Obviously, if your files use a different code page, you have to pass that code page to the Encoding.GetEncoding call instead of 1252.
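Instead of editing and recompiling for each code page, one option is to take the code page as an optional second argument. This is a sketch under assumptions, not the attached executable (which takes only the file name): it targets .NET 5+ (top-level statements, registered code-pages provider), and the helper `ConvertToUtf8` and the `<file> [codePage]` argument layout are hypothetical. An unknown code page number makes `Encoding.GetEncoding` throw.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: .NET 5+, where code pages such as 1250 or 1252 need this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: same conversion as Main above, but with the source
// code page as a parameter. Returns the path of the new "name.dk.ext" file.
static string ConvertToUtf8(string fileName, int codePage)
{
    string content = Encoding.GetEncoding(codePage).GetString(File.ReadAllBytes(fileName));
    string newFileName = Path.ChangeExtension(fileName, $"dk{Path.GetExtension(fileName)}");
    File.WriteAllText(newFileName, content, new UTF8Encoding(false)); // UTF-8, no BOM
    return newFileName;
}

// Hypothetical usage: program <file> [codePage]; the code page defaults to 1252.
if (args.Length is 1 or 2 && File.Exists(args[0]))
{
    int codePage = 1252;
    if (args.Length == 2 && !int.TryParse(args[1], out codePage))
        Console.Error.WriteLine($"Not a code page number: {args[1]}");
    else
        Console.WriteLine($"Written: {ConvertToUtf8(args[0], codePage)}");
}
```

For example, `program myfile.php 1250` would read the file as Central European (code page 1250) instead of Western European.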

If you are not a programmer but still need the application, the executable is attached to this post. .NET Framework 4.6.1 or newer is required to run it.

AsciiDkToUtf.zip
