Batch convert Irssi logs or other text files to UTF-8 using recode
Feb-2009
When I recently rented a VPS running a fresh install of Debian, I thought it was about time to stick with the now-default Unicode locale, UTF-8. Making this switch in a sensible fashion includes converting often-used text files, such as chat logs, from the older, more compatible but limited ISO-8859-15 charset.
(By the way: Linode, my VPS host, seems to be awesome.)
My chat client Irssi combined with OpenSSH, GNU Screen and Bitlbee provides me with a hugely powerful social infrastructure in the form of continuous conversations that can be reached with any SSH client. Add logging and basic Unix tools to the mix and you have a silly fast and simple way of finding stuff you’ve discussed. In other words: my IRC/Live/Jabber logs are important works of reference and must be kept in the same encoding as the system locale.
I failed to find any directly suitable or functional shell one-liners for this operation, until Thomas handed me something that worked for me.
The conversion command later on this page converts all files in Irssi’s default log location, ~/irclogs, and its subdirectories from ISO-8859-15 to UTF-8. The conversion is performed with recode on the files themselves, in their current location. Don’t run with scissors: do yourself a favor and make a backup copy of your precious logs first. The most obvious tool for that is perhaps:
"cp -r ~/irclogs ~/backup_irclogs"
This is Thomas’ conversion command:
"find ~/irclogs -type f | while IFS= read -r i; do echo "Converting $i"; recode ISO-8859-15..UTF-8 "$i"; done"
Stig later informed me about find having exec capabilities, but since I’m lazy and all that, you’ll have to optimize the above version yourself.
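For the less lazy, here is a rough sketch of what a find-with-exec version might look like. It is not Stig's or Thomas' command, just one way to do it. The function name convert_logs is made up for illustration, and the default directory is assumed to be Irssi's ~/irclogs. It restricts find to regular files and hands them to recode in batches rather than one at a time.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the original post):
# recode every regular file under a directory from ISO-8859-15 to UTF-8,
# in place. Defaults to Irssi's usual log location.
convert_logs() {
    dir=${1:-"$HOME/irclogs"}
    # -type f skips directories; "{} +" passes many files to each recode call.
    find "$dir" -type f -exec recode ISO-8859-15..UTF-8 {} +
}
```

Call it as "convert_logs ~/irclogs" after making the backup. Run it only once per file: recoding text that is already UTF-8 as if it were ISO-8859-15 will mangle every non-ASCII character.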