Thursday, January 25, 2024

bug: Microsoft Excel and LibreOffice Calc, for those who use spreadsheets to do pali + english translation

 




In case anyone else uses LibreOffice Calc or Microsoft Excel spreadsheets to do pali + english translation

(so things are lined up neatly side by side)


I frequently cut an entire column (for example English) and paste it into a text editor like notepad++ (a plain text editor, but with richer regex),

do some editing in notepad++, then copy that back into the spreadsheet.


The bug is,

something in the English text in the sample above, either the brackets (), {}, [], or the *********, or perhaps tabs or some other invisible characters,

or some combination of all of the above, confuses the clipboard when copying from notepad++ back into the Excel spreadsheet.

The entire block of 12 lines gets read back into the spreadsheet as ONE single line,

which screws up the pali + english side by side alignment for the entire file.


I have the entire AN 5 collection of suttas (hundreds) at 8000+ lines of pali + english,

and it was a nightmare to figure out what the problem was.

At first, I thought it was a LibreOffice bug,

so I tried out Microsoft Excel. Same problem.

A day and a half later I finally tracked down the problem.


Another issue: Non breaking spaces

They look like a regular ' ' (space), but a non breaking space is a different character that tells software not to break the line there, so the strings on either side of it get treated as a single unbreakable word.

For example "butt head" appears to be two words.

But if I put a non breaking space between "butt" and "head", computers treat that as a single word.
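Since a non breaking space looks identical to a normal space on screen, the easiest way to find them is to let a script look for the character code. Here is a minimal Python sketch of that idea. The function name find_nbsp and the sample text are just made up for illustration, and it assumes the text is already loaded as a normal Unicode string.

```python
NBSP = "\u00a0"  # Unicode NO-BREAK SPACE (U+00A0)

def find_nbsp(text):
    """Return 1-based (line, column) positions of every non breaking space."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col))
    return hits

# Line 1 uses a normal space, line 2 uses a non breaking space.
sample = "butt head\nbutt\u00a0head"
print(find_nbsp(sample))  # [(2, 5)]
```

An empty list means the text has no non breaking spaces in it.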


This happened with some SuttaCentral source files for pali (and maybe english) in the past.

Maybe they've fixed it, maybe not.

But the problem it caused on my end, with side by side pali + english in a table, is that the non breaking spaces would create giant long words that can't wrap, causing a huge table imbalance with 90% of the table width going to one column instead of a 50/50 split.


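Simple fix to replace non-breaking spaces with a normal space:

with regex, search for \s   [any white space code]

and replace with ' ' [single normal space]

Careful: \s also matches tabs and line breaks, so replacing every match will merge lines together and wipe out tabs. And in some regex engines \s doesn't match the non breaking space at all.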


More complicated, but the correct, precise fix:

with a decent text editor, search for the hex code of the unicode non breaking space, which is U+00A0 (many regex engines let you write it as \x{00A0} or \xA0), and replace it with one normal space.
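If you'd rather script it than do it in an editor, here is a rough Python sketch comparing the two fixes. The function names and the sample string are only for illustration. Note that in Python's re module, \s on a normal string does match the non breaking space, along with tabs and line breaks, which is exactly why the blunt version is risky for line-aligned text.

```python
import re

NBSP = "\u00a0"  # Unicode NO-BREAK SPACE (U+00A0)

def replace_nbsp(text):
    """Precise fix: swap only U+00A0 for a normal space."""
    return text.replace(NBSP, " ")

def replace_all_whitespace(text):
    """Blunt fix: swap every whitespace character, including tabs and line breaks."""
    return re.sub(r"\s", " ", text)

# Two lines, with a non breaking space and a tab on the first line.
sample = "evaṁ\u00a0me\tsutaṁ\nthus have I heard"

precise = replace_nbsp(sample)
blunt = replace_all_whitespace(sample)

print(len(precise.split("\n")))  # 2 -> line structure kept
print(len(blunt.split("\n")))    # 1 -> both lines merged into one
```

The precise version leaves the tab and the line break alone, so the pali + english rows still line up when the text goes back into the spreadsheet.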






