Extra Spaces in Windows STDOUT
Today I was using Windows and I redirected some output to a file.
echo '["test"]' > bad.json
When I looked at the file, the content looked fine. There were no extra spaces or weird characters. But when I read the file using Node's fs module and logged it to the console, I saw extra whitespace between the characters. This is the script I ran...
const fs = require('fs');
var output = fs.readFileSync('bad.json', 'utf8');
console.log(output);
Why were there extra spaces? When I ran the script against a manually created file (we'll call it good.json), the output was displayed correctly with no extra spaces. So I decided to look at the byte count of each file. bad.json had 22 bytes, but good.json had only 8. When I did a hex dump to see the bytes in both files, this is what I saw...
file name: bad.json
0000-0010: ff fe 5b 00-22 00 74 00-65 00 73 00-74 00 22 00 ..[.".t. e.s.t.".
0000-0016: 5d 00 0d 00-0a 00 ].....
file name: good.json
0000-0008: 5b 22 74 65-73 74 22 5d ["test"]
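The same byte-level check can be done from Node itself. This is only a rough sketch: toHex is a helper name I made up for illustration, and the bytes are typed in from the good.json dump above rather than read from disk.

```javascript
// Format a Buffer as space-separated hex bytes, e.g. "5b 22 74".
// toHex is a hypothetical helper, not part of Node's API.
function toHex(buffer) {
  return Array.from(buffer, (byte) => byte.toString(16).padStart(2, '0')).join(' ');
}

// Bytes copied from the good.json dump above.
const good = Buffer.from([0x5b, 0x22, 0x74, 0x65, 0x73, 0x74, 0x22, 0x5d]);

console.log(good.length); // 8
console.log(toHex(good)); // 5b 22 74 65 73 74 22 5d

// For a file on disk, omit the encoding so readFileSync returns a raw Buffer:
// const fs = require('fs');
// console.log(toHex(fs.readFileSync('bad.json')));
```

Leaving out the 'utf8' argument matters here: with an encoding, readFileSync returns an already decoded string, and the original bytes are no longer visible.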
Why was Windows adding extra bytes when I redirected output to a file? I had been using PowerShell to create bad.json, so I tried creating another file from stdout, this time using the Windows Command Prompt (cmd). This time I didn't get extra bytes.
file name: cmd.json
0000-000b: 5b 22 74 65-73 74 22 5d-20 0d 0a ["test"] ...
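So it looked like PowerShell was using a different character encoding than cmd, and different from what I'm used to on Linux. The ff fe at the start of bad.json is the UTF-16 little-endian byte order mark, and every character is followed by a 00 byte, which is what two-byte UTF-16 encoding looks like.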
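To see why Node showed gaps, here is a small sketch that decodes the bad.json bytes two ways. The bytes are typed in from the dump above, and I'm assuming the leading ff fe is a UTF-16 little-endian byte order mark, which is what that byte pair conventionally means.

```javascript
// Bytes copied from the bad.json dump: ff fe, then two bytes per character.
const bad = Buffer.from([
  0xff, 0xfe, 0x5b, 0x00, 0x22, 0x00, 0x74, 0x00, 0x65, 0x00, 0x73, 0x00,
  0x74, 0x00, 0x22, 0x00, 0x5d, 0x00, 0x0d, 0x00, 0x0a, 0x00,
]);

// Decoded as UTF-8, every 00 byte becomes a NUL character between the visible ones.
const asUtf8 = bad.toString('utf8');
console.log(asUtf8.includes('\u0000')); // true

// Decoded as UTF-16 LE (skipping the 2-byte BOM), the text comes out as intended.
const asUtf16 = bad.subarray(2).toString('utf16le');
console.log(JSON.stringify(asUtf16)); // "[\"test\"]\r\n"
console.log(JSON.parse(asUtf16));     // [ 'test' ]
```

So the "spaces" in my console were most likely those NUL characters, which many terminals render as blanks.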
I found this Stack Overflow post, which said some versions of PowerShell use "Unicode" (PowerShell's name for UTF-16 LE) as the default encoding. I ran this command to find my PowerShell version, and it said I was using version 5.1.
$PSVersionTable.PSVersion
As an experiment, I ran this command to force PowerShell to use UTF-8.
write-output '["test"]' | Out-File 'powershell-test.json' -encoding utf8
file name: powershell-test.json
0000-000d: ef bb bf 5b-22 74 65 73-74 22 5d 0d-0a ...["tes t"]..
This was a little better, but it still had extra bytes at the front. I did some further investigation and found this Stack Overflow post, which said PowerShell 5.1 creates UTF-8 files with a pseudo byte order mark (BOM), the ef bb bf at the start of the dump. It went on to say that the latest cross-platform PowerShell (PowerShell Core) no longer encodes UTF-8 with a BOM.
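If you are stuck with a file like this, the BOM can also be handled on the Node side. The sketch below is one possible workaround, not something from the posts I found: stripBom is a hypothetical helper, and the bytes are typed in from the powershell-test.json dump above.

```javascript
// Remove a leading BOM character (U+FEFF) from decoded text, if present.
// stripBom is a hypothetical helper, not part of Node's API.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Bytes copied from the powershell-test.json dump: ef bb bf is the UTF-8 BOM.
const withBom = Buffer.from([
  0xef, 0xbb, 0xbf, 0x5b, 0x22, 0x74, 0x65, 0x73, 0x74, 0x22, 0x5d, 0x0d, 0x0a,
]);

// Buffer decoding keeps the BOM as the first character of the string.
const text = withBom.toString('utf8');
console.log(text.charCodeAt(0) === 0xfeff); // true

// JSON.parse rejects a leading BOM, so strip it first.
console.log(JSON.parse(stripBom(text))); // [ 'test' ]
```

This only treats the symptom. Writing the file without a BOM in the first place is cleaner when that is an option.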
So I installed PowerShell 6.1.1 (the latest cross-platform version) and tried to run my original command.
echo '["test"]' > buen.json
file name: buen.json
0000-000a: 5b 22 74 65-73 74 22 5d-0d 0a ["test"] ..
Much better! Now I wanted to switch the default terminal from the old Windows PowerShell to the new PowerShell Core. Since I mainly use the terminal in Visual Studio Code, it was easy to change the default. I went into Visual Studio Code's options and changed the 'terminal.integrated.shell.windows' option from the old PowerShell path to the new PowerShell path.