The texts at tipitaka.org were created by the Vipassana Research Institute (VRI) in the 1990s, and are generally quite good. However, their work was very carefully studied and corrected by the Dhamma Society in Bangkok, and the results were published as the Mahasangiti Edition. SuttaCentral used this edition as the basis for its Pali texts (disclaimer: that's me!).
The Mahasangiti, while not perfect, is the most accurate and reliable digital Pali text that has so far been created. Note that "reliable" here means that it accurately represents the 6th Council edition on which it was based. I have done a brief comparison of readings, and you can see that here:
As for the features you identify, such as punctuation and markup: as you surmise, these are not found in the manuscripts. They were added by modern editors for ease of reading, so yes, feel free to ignore them. They are not always correct.
I think this is fine for a general-purpose edition like the Mahasangiti, but we should also have what are called "diplomatic editions", which represent exactly what is in a specific manuscript, with no alterations. SuttaCentral is currently working on such a diplomatic edition of the oldest Pali manuscript, a 13th-century Cullavagga held in the Colombo Museum.
We have recently concluded a systematic set of tests for the integrity of our Pali texts, which you can read about here:
https://discourse.suttacentral.net/t/pali-text-integrity-checking/16438
I agree that we should remove as much markup and extraneous information from the texts as possible, and we are building a new system based on what is called "standoff markup" to achieve this. All non-essential information is separated from the text and can be recombined with it; everything is stored as JSON data. So far we have not done this with punctuation, as it is too complex, but it would be a good idea, and perhaps at some point we'll be able to do that too.
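To make the idea of standoff punctuation concrete, here is a minimal sketch of one possible approach, not SuttaCentral's implementation. It assumes punctuation is stored as a separate list of (offset, character) pairs; the names `split_punctuation`, `restore_punctuation` and the `PUNCTUATION` set are hypothetical choices for illustration.

```python
# ASSUMPTION: a hypothetical set of characters treated as editorial punctuation.
# Note that "." also catches the dots of "..." ellipses, which may not be wanted.
PUNCTUATION = set(".,;:?!\"'\u2018\u2019\u201c\u201d\u2014")


def split_punctuation(text):
    """Return the text without punctuation, plus a standoff list of
    (index_in_original_text, character) pairs."""
    bare, standoff = [], []
    for i, ch in enumerate(text):
        if ch in PUNCTUATION:
            standoff.append((i, ch))
        else:
            bare.append(ch)
    return "".join(bare), standoff


def restore_punctuation(bare, standoff):
    """Reinsert punctuation. Pairs must be in ascending index order, so each
    insert lands at its original position."""
    chars = list(bare)
    for i, ch in standoff:
        chars.insert(i, ch)
    return "".join(chars)


text = "Evaṁ me sutaṁ\u2014ekaṁ samayaṁ bhagavā ukkaṭṭhāyaṁ viharati subhagavane sālarājamūle."
bare, standoff = split_punctuation(text)
print(bare)
print(standoff)
```

In this naive version, removing the dash glues "sutaṁ" and "ekaṁ" together in the bare text, which hints at why punctuation is harder to separate cleanly than markup.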
You can see an example of our corrected texts in standoff JSON here:
https://github.com/suttacentral/bilara-data/blob/published/root/pli/ms/sutta/mn/mn1_root-pli-ms.json
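The linked root file maps segment IDs to Pali text. As a rough sketch of how standoff data can be recombined, assume a second JSON file maps the same segment IDs to HTML templates containing a `{}` slot; that layout, the sample segments, and the function names `load_segments`, `recombine`, and `plain_text` are assumptions for illustration, so check the actual repository before relying on them.

```python
import json


def load_segments(path):
    """Load a segment-ID -> string mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def recombine(root, markup):
    """Fill each markup template's "{}" slot with the root text of the same ID."""
    parts = []
    for seg_id, template in markup.items():
        parts.append(template.replace("{}", root.get(seg_id, ""), 1))
    return " ".join(parts)


def plain_text(root):
    """Text only: ignore the markup layer entirely."""
    return " ".join(root.values())


# Illustrative sample data, abridged; not copied verbatim from the repository.
ROOT = {
    "mn1:1.1": "Evaṁ me sutaṁ\u2014",
    "mn1:1.2": "ekaṁ samayaṁ bhagavā ukkaṭṭhāyaṁ viharati subhagavane sālarājamūle.",
}
MARKUP = {"mn1:1.1": "<p>{}", "mn1:1.2": "{}</p>"}

print(recombine(ROOT, MARKUP))
print(plain_text(ROOT))
```

Because the text and the markup share only the segment IDs, either layer can be edited or dropped without touching the other.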
If the good householder, or anyone else who takes joy in such work, is interested in helping to care for the gift I received for the Sangha of the eight directions shortly before Upasaka Goenka passed away, and would like to make merit toward the Sangha, he may feel free to do so here. The gift is not really being maintained: errors are not corrected, and it has not been properly brought into the particular scripts. (Note that this is not a worldly undertaking; it is dedicated toward liberation.)
Some scripts have already been placed here for further maintenance.
And you should be clear that the usual habit of plundering is not only very destructive in terms of kamma, but is also often a violation of ordinary law:
The source is copyright Vipassana Research Institute, even if you probably think you can do what you want with it. And yes, approving of and encouraging the taking of what is not given is also bad kamma, leading to poverty.
Don't think that bad habits are justified just because they are common, not to speak of their long-term effects.
[Note that this is not given for stocks, exchange, or other world-binding trades, but for liberation.]
OP: There are ellipses … everywhere
This is a Tipitaka writing convention called peyyala (pe) [පෙය්යාල (පෙ)], used to omit repeated phrases or paragraphs so that the Tipitaka does not become unfeasibly long. In the peyyala convention, an ellipsis (...), with or without pe (පෙ), indicates an omitted phrase or passage.
For example, the following paragraph is written with peyyala:
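If you are processing these texts programmatically, a minimal sketch for locating peyyala markers might look like this. The regular expression and the name `find_peyyala` are hypothetical; it assumes the abbreviation appears as Sinhala "පෙ" or romanized "pe"/"pa" between two ellipses, and also reports bare ellipses.

```python
import re

# Matches a bare ellipsis ("..." or "…"), optionally followed by a peyyala
# abbreviation and a closing ellipsis, e.g. "...පෙ..." or "…pe…".
# ASSUMPTION: the abbreviation is written as "පෙ", "pe", or "pa".
PEYYALA = re.compile(r"(?:\.\.\.|…)(?:\s*(?:පෙ|pe|pa)\s*(?:\.\.\.|…))?")


def find_peyyala(text):
    """Return (start, end, marker) for every elision marker in text."""
    return [(m.start(), m.end(), m.group(0)) for m in PEYYALA.finditer(text)]


sample = "විවිච්චෙව කාමෙහි...පෙ...පඨමං ඣානං"
print(find_peyyala(sample))
```

Keeping the positions, rather than deleting the markers, preserves the information that something was omitted at that point.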
කතමෙ ධම්මා කුසලා? යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි...පෙ...පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං ඡන්දාධිපතෙය්යං...පෙ...විරියාධිපතෙය්යං...පෙ...චිත්තාධිපතෙය්යං...පෙ...වීමංසාධිපතෙය්යං, තස්මිං සමයෙ ඵස්සො හොති...පෙ...අවික්ඛෙපො හොති...පෙ...ඉමෙ ධම්මා කුසලා.
If we write the same paragraph without peyyala, it becomes much longer, like this:
කතමෙ ධම්මා කුසලා? යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං ඡන්දාධිපතෙය්යං, තස්මිං සමයෙ ඵස්සො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං විරියාධිපතෙය්යං, තස්මිං සමයෙ ඵස්සො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං චිත්තාධිපතෙය්යං, තස්මිං සමයෙ ඵස්සො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං වීමංසාධිපතෙය්යං, තස්මිං සමයෙ ඵස්සො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං ඡන්දාධිපතෙය්යං, තස්මිං සමයෙ අවික්ඛෙපො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං විරියාධිපතෙය්යං, තස්මිං සමයෙ අවික්ඛෙපො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං චිත්තාධිපතෙය්යං, තස්මිං සමයෙ අවික්ඛෙපො හොති. යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං වීමංසාධිපතෙය්යං, තස්මිං සමයෙ අවික්ඛෙපො හොති. ඉමෙ ධම්මා කුසලා.
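The expansion above follows a regular pattern: the same long clause repeats for each of the four adhipateyya terms, once with ඵස්සො and once with අවික්ඛෙපො. A minimal sketch that regenerates that simplified expansion from its parts is below; the names `PREFIX`, `ADHIPATEYYA`, `DHAMMAS`, and `expand` are hypothetical. It reproduces only the illustration shown here: the passages actually elided in the Dhammasaṅgaṇī are longer (for example, the list of states between ඵස්සො and අවික්ඛෙපො).

```python
# The repeated clause, copied from the expansion above.
PREFIX = (
    "යස්මිං සමයෙ ලොකුත්තරං ඣානං භාවෙති නිය්යානිකං අපචයගාමිං දිට්ඨිගතානං පහානාය "
    "පඨමාය භූමියා පත්තියා විවිච්චෙව කාමෙහි පඨමං ඣානං උපසම්පජ්ජ විහරති දුක්ඛපටිපදං දන්ධාභිඤ්ඤං"
)
# The slots that vary: four kinds of predominance, two example states.
ADHIPATEYYA = ["ඡන්දාධිපතෙය්යං", "විරියාධිපතෙය්යං", "චිත්තාධිපතෙය්යං", "වීමංසාධිපතෙය්යං"]
DHAMMAS = ["ඵස්සො", "අවික්ඛෙපො"]


def expand():
    """Build the full text: outer loop over states, inner loop over predominance."""
    clauses = [
        f"{PREFIX} {adh}, තස්මිං සමයෙ {dh} හොති."
        for dh in DHAMMAS
        for adh in ADHIPATEYYA
    ]
    return " ".join(["කතමෙ ධම්මා කුසලා?", *clauses, "ඉමෙ ධම්මා කුසලා."])


full = expand()
print(len(full))
```

Seen this way, peyyala is a kind of compression: the editor writes the pattern once and marks where the reader should supply the repetitions.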
OP: Latin punctuation like the question mark ? and period . everywhere.
Your first quote is from the Dhammasaṅgaṇī, which is written in a question-and-answer pattern. So the Latin question mark (?) has its usual meaning: it marks the end of an interrogative sentence. The period (.) marks the end of a sentence.
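Since "?" and "." mark sentence ends while "..." marks elision, a text-processing script needs to tell them apart. A minimal sketch, assuming sentences are separated by whitespace after the terminator; the regular expression and the name `split_sentences` are hypothetical.

```python
import re

# Break after "?" or after a single "." that is not the end of a "..." ellipsis,
# but only where whitespace follows; the whitespace itself is discarded.
SENTENCE_BREAK = re.compile(r"(?<=\?)\s+|(?<=[^.]\.)\s+")


def split_sentences(text):
    return [part for part in SENTENCE_BREAK.split(text.strip()) if part]


# Abridged from the Dhammasaṅgaṇī quote above, for illustration only.
sample = "කතමෙ ධම්මා කුසලා? යස්මිං සමයෙ ඵස්සො හොති...පෙ...අවික්ඛෙපො හොති...පෙ...ඉමෙ ධම්මා කුසලා."
print(split_sentences(sample))
```

Here the question "කතමෙ ධම්මා කුසලා?" comes out as one sentence and the answer, with its peyyala markers intact, as the other.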
OP: Things are aligned in several different ways as in the last image.
I don't see much significance in the alignment of the sentences.
OP: Again, why so much random bold?
Apart from the titles, I'm not sure why some sentences and words are bold.
OP: And there's quotes ".
Quotation marks (") usually mark direct speech, that is, words attributed to a speaker.
OP: What do I really need out of these texts? Do I need to keep the bold and centering and such (i.e., what information does it contain/encode?), or can I get rid of it?
My suggestion is to keep all the formatting except the varying alignments. Most importantly, the ellipses should be kept. I'm not sure whether the bold formatting should be kept as it is.
Note: this is how I understand it. I may be wrong, but the Dhamma is not.
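Following that suggestion, a minimal sketch that drops alignment but keeps bold and ellipses might look like this. It assumes the texts are XML in which alignment is expressed through a `rend` attribute such as `rend="centre"`; that encoding, the `ALIGNMENT_VALUES` set, the name `strip_alignment`, and the sample are all assumptions, so inspect your own files first.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: alignment is encoded as rend="..." with these values.
# Adjust this set after inspecting the actual files.
ALIGNMENT_VALUES = {"centre", "center", "left", "right"}


def strip_alignment(xml_text: str) -> str:
    """Drop alignment-only rend attributes; leave text, bold, and ellipses untouched."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.get("rend") in ALIGNMENT_VALUES:
            del elem.attrib["rend"]
    return ET.tostring(root, encoding="unicode")


# Made-up sample for illustration, not copied from any real file.
SAMPLE = (
    '<body><p rend="centre">Dhammasaṅgaṇī</p>'
    '<p rend="bodytext"><hi rend="bold">Katame dhammā kusalā?</hi> '
    "...pe... ime dhammā kusalā.</p></body>"
)

print(strip_alignment(SAMPLE))
```

Only attributes whose value is in the alignment set are removed, so the bold markup survives and can still be dropped later if you decide it carries no meaning.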