Unicode


Many files posted at sacred-texts since the spring of 2002 have embedded Unicode. Unicode is a character encoding standard which can represent all major world scripts, and many obscure ones as well. This solves a major problem for creators of etexts, as it is now possible to fully transcribe texts in multiple languages without requiring ASCII transliterations, special fonts or browsing software. Unicode enabling also takes care of right-to-left scripts more-or-less automatically.
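As a rough illustration of the point about multiple scripts and right-to-left text, the sketch below uses Python's standard unicodedata module to look up the official name and bidirectional class of a few characters. Python is not part of this site; it is used here only as a convenient way to inspect the Unicode character database, and the helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

# Latin A, Greek Omega, Hebrew Alef, Arabic Alef, Devanagari Om
samples = ["A", "\u03a9", "\u05d0", "\u0627", "\u0950"]

for ch in samples:
    # Class "L" is left-to-right; "R" and "AL" are right-to-left.
    print(describe(ch))
```

Every character carries its own direction class, which is why a Unicode-aware browser can lay out Hebrew or Arabic right-to-left without any extra markup in most cases.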

Version 4 and later of the major browsers support Unicode if you have a decent Unicode font installed, provided you designate that font as your default font.

That said, this is definitely still on the cutting edge, and you may need to tweak your browser settings to get the full character set. And there are some features which are buggy in particular browsers, although support seems to be getting better in newer versions; having an up-to-date version of your operating system also helps.

For instance, Netscape appears to have a few problems displaying some subscript and superscript characters such as Hebrew vowel points (they get displayed to the left of where they should be, with a space above them); this does not occur in Internet Explorer. Ironically, some versions of IE5 do not display medial and final forms when displaying Arabic (which makes it unusable for this purpose), while Netscape handles this issue correctly. For this reason, we have also posted a version of the Quran which uses GIF images to display Arabic. This is an exception, however, and the problem may have been fixed in more recent versions of the browser.
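To see why these two cases are hard for a browser, it may help to look at how the characters are stored. The sketch below is only an illustration using Python's standard unicodedata module; it says nothing about how any particular browser is implemented, and the helper name split_marks is hypothetical. Hebrew vowel points are separate combining characters that must be positioned relative to the preceding letter, and an Arabic letter is stored once, leaving the choice of initial, medial or final shape to the display software.

```python
import unicodedata

def split_marks(text):
    """Separate base letters from combining marks (such as Hebrew points)."""
    base = [c for c in text if unicodedata.combining(c) == 0]
    marks = [c for c in text if unicodedata.combining(c) != 0]
    return base, marks

# Pointed Hebrew "shalom": four letters plus three combining points.
word = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
base, marks = split_marks(word)
print(len(base), "letters,", len(marks), "points")
for m in marks:
    print(unicodedata.name(m))

# Unicode also lists the contextual shapes of Arabic BEH as compatibility
# characters; each one decomposes back to the single stored letter U+0628.
for cp in (0xFE8F, 0xFE90, 0xFE91, 0xFE92):
    print(unicodedata.name(chr(cp)), "->", unicodedata.decomposition(chr(cp)))
```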

IE and Safari do not display the correct presentation forms for Unicode Cyrillic italics: Safari does not even allow Cyrillic to be italicized, whereas IE shows italicized forms of the base graphemes, which is incorrect. Opera and Firefox display these presentation forms correctly. Strangely enough, the italic Cyrillic presentation forms are displayed correctly in MS Word 2003.

Some problems viewing certain polytonic Greek files on the 5.0 CD-ROM under Mac OS X have been reported. These have been fixed on the website and the 6.0 DVD-ROM, but not on the 5.0 CD-ROM.

We welcome any comments or questions about the visibility of Unicode on this site in various browsers, and we will add advisories on this page. Extensive Unicode resources can be found at unicode.org.

Recommended Unicode Fonts

If you need a Unicode font, we recommend the Code2000 shareware font. This is a very extensive Windows font, and the one we use to test the site.

We also recommend the site http://www.alanwood.net/unicode/fonts.html, which lists dozens of Unicode fonts for a variety of platforms.

A Unicode font, Arial Unicode MS, comes with Windows XP. It has some good points: it seems to have better coverage of some of the more obscure Arabic characters than Code2000. That said, Arial Unicode MS is not pretty, and if reading everything in a sans serif font isn't your cup of tea, you may want to look elsewhere. Note that this font may not be installed on your XP system by default. If you have XP and don't see Arial Unicode MS as one of your available fonts, you may need to dig out your Windows disk. You also can buy it from Microsoft, but they charge an exorbitant $99 for it. With so many free and inexpensive Unicode fonts, there is no reason to pay that much!

There is also a page about font issues regarding the Unicode Hebrew Bible at sacred-texts which includes a specialized redistributable font.

Enabling Unicode in Your Browser

The most common complaint is 'I downloaded and installed Code2000 but I still see little boxes in your files'. This is because you also have to tell your browser that you want to view Unicode content using that font.

First of all, we recommend that if you have an older browser, you should obtain the most recent version. If you are using AOL or another ISP which has a bundled browser, you may wish to get the most recent version of Internet Explorer or Netscape and use it for browsing Unicode content; the bundled browsers are notoriously buggy, particularly when it comes to cutting-edge features such as Unicode.

Here's how to get Unicode working in Internet Explorer using Code2000. The procedure is very similar for other browsers.

1. Download and Install the Unicode Font

First of all you need to download the font and install it. For instance, if you are using Windows XP, you start the Control Panel 'Fonts' program, and then select 'Install New Font' from the 'File' menu.

2. Make the Unicode Font Your Default Web Page Font

Let's assume you have downloaded and installed the 'Code2000' font. Start Internet Explorer and go into 'Tools | Internet Options' and select the 'Fonts' dialog.

Under 'Web Page Font', Code2000 should show up in the scrolling listbox if you downloaded and installed it correctly. Select it.

Unless you do this, some Unicode characters (such as the accented Greek characters and some Hebrew characters) may not show up.

I'm still seeing little boxes! What to do?

The most common problem is skipping step two in the previous section. If you don't designate a full Unicode font as your default 'Web Page Font', you will still only have whatever minimal Unicode support is built into your operating system.

Typically this will include some of the simplest extended Latin accented characters, as well as basic Greek and Hebrew characters. However, you won't be able to view specialized accented Latin characters, polytonic Greek, or pointed Hebrew. You won't be able to see any Arabic or Devanagari characters, astrological symbols, and so on. These will show up as the dreaded 'boxes' (or question marks in some browsers).
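If you want a rough idea of which scripts a given passage needs before hunting for a font, the following sketch may help. It is an assumption-laden illustration, not a tool provided by this site: it uses Python's standard unicodedata module and groups every non-ASCII character by the first word of its official Unicode name, which is only an approximation of the script (the function name scripts_needed is made up here).

```python
import unicodedata
from collections import Counter

def scripts_needed(text):
    """Count non-ASCII characters by the first word of their Unicode name."""
    counts = Counter()
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII displays in any font
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts

# Devanagari Om, Hebrew Alef, Greek Omega, and a polytonic Greek alpha.
sample = "Om \u0950, aleph \u05d0, omega \u03a9, accented alpha \u1f04"
print(scripts_needed(sample))
```

Any script that shows up in the result and is missing from your default font is a likely source of boxes.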

The web pages with heavy Unicode dependencies at this site don't have embedded font information because that would greatly inflate their size; and in the case of sections such as the Hebrew Bible and Sanskrit/Transliterated Rig Veda, that adds up to some serious extra baggage. Therefore I leave it up to you to tell your browser which font to use. You can always switch it back easily if you aren't reading specialized Unicode content.

Manually Selecting Unicode Encoding

You may need to also manually select 'Unicode (UTF-8)' in certain browsers. For instance, under Internet Explorer, you can select 'View | Encoding', and 'Unicode (UTF-8)'. Under Netscape, this is 'View | Character Coding'.

Technically, some of these pages don't use the UTF-8 encoding scheme. However, selecting it seems to be the only way to tell some browsers that you are viewing Unicode content. I've started to add UTF-8 META tags to all files which have any amount of Unicode. This seems to have helped.
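The sketch below, in Python, suggests why selecting UTF-8 is harmless for pages that don't actually use it, and why it matters for pages that do. It is a minimal illustration under stated assumptions: the two byte strings are made-up stand-ins for real pages, and the META tag shown is the commonly used form, not necessarily the exact tag in this site's files.

```python
# A commonly used form of the tag (an assumption; the site's exact tag may differ).
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# Hypothetical page fragments: one writes Omega as an entity, one as raw UTF-8.
entity_page = b"<p>&#937;</p>"
utf8_page = "<p>\u03a9</p>".encode("utf-8")

def decode_as(data, encoding):
    """Decode page bytes the way a browser set to that encoding would."""
    return data.decode(encoding)

# An entity-only page is pure ASCII, so both settings read it identically.
print(decode_as(entity_page, "utf-8") == decode_as(entity_page, "iso-8859-1"))

# A raw UTF-8 page only reads correctly when the browser is set to UTF-8.
print(decode_as(utf8_page, "utf-8"))
print(decode_as(utf8_page, "iso-8859-1"))
```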

Unicode Implementation

Technically speaking, the Unicode characters are embedded in 8 bit HTML using 'character entities', for instance:

&#2384; = ॐ
&#1488; = א
&#937; = Ω

If your browser is Unicode-enabled, you should see the Sanskrit letter for 'Aum', the Hebrew letter Aleph, and a Greek capital Omega above.
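For readers who want to check the mapping between a character and its entity, here is a small Python sketch. It assumes the decimal form of the numeric entity (an ampersand, a hash sign, the decimal code point, and a semicolon); whether the site's files use the decimal or the hexadecimal form is not stated here. The helper name to_entity is made up for the example, while html.unescape is part of Python's standard library.

```python
import html

def to_entity(ch):
    """Write one character as a decimal numeric HTML entity."""
    return f"&#{ord(ch)};"

# Devanagari Om, Hebrew Alef, Greek capital Omega
for ch in ["\u0950", "\u05d0", "\u03a9"]:
    entity = to_entity(ch)
    # Decoding the entity gives back the original character.
    assert html.unescape(entity) == ch
    print(entity, "=", ch)
```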

For disk space and bandwidth reasons, I've also started to use the UTF-8 encoding scheme in the files which are predominantly Unicode, such as the Greek and Hebrew portions of the Bible and the Rig Veda. This is a variable-length encoding which represents Unicode compactly. Instead of the six or more bytes per character that a numeric HTML entity requires, UTF-8 requires one to three bytes for any character in the 16 bit range of Unicode (characters beyond that range take four bytes). Most modern browsers handle UTF-8 automatically, assuming you have installed a complete Unicode font.
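The space saving can be checked with a few lines of Python. This is only a sketch: it assumes decimal numeric entities, and the function name sizes is hypothetical.

```python
def sizes(ch):
    """Return (bytes as a decimal HTML entity, bytes as UTF-8) for one character."""
    entity = f"&#{ord(ch)};"
    return len(entity.encode("ascii")), len(ch.encode("utf-8"))

# Greek Omega, Hebrew Alef, Devanagari Om, and a character above the 16 bit range
for ch in ["\u03a9", "\u05d0", "\u0950", "\U00010330"]:
    entity_bytes, utf8_bytes = sizes(ch)
    print(f"U+{ord(ch):04X}: entity {entity_bytes} bytes, UTF-8 {utf8_bytes} bytes")
```

For a file that is mostly Greek or Hebrew, this works out to roughly a third of the size of the entity version for the non-Latin text.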

In some cases Unicode has been used to transcribe Latin characters with accents outside the ISO-8859-1 HTML character set. In other cases complete texts or extensive portions of the text are in Unicode. Among the Unicode scripts currently in use are Arabic, Chinese, Extended Latin, Greek, Hebrew, Tibetan, Runic and Devanagari (for Sanskrit).

Some of the Unicode-enabled files at sacred-texts include:

    Non-public domain contents of this site
    not otherwise copyrighted are © copyright 2006, John Bruno Hare, All Rights Reserved.
    See Site copyrights, Terms of Service for more information.