<time> safe for historians
The HTML 5 spec introduces the <time> element to mark up a date or time. Although I support the inclusion of these semantics in HTML, I believe that the current specification of the <time> element is vague because it avoids the question whether the element is safe for historians. Right now it hurts historical research more than it helps. In this entry I’ll explain why.
Although I will concentrate on the HTML5 syntax here, what I have to say also applies to the microformats datetime design pattern. The Microformats site adds one important detail to the discussion that the HTML5 spec overlooks: the point of having a <time> element (or a datetime design pattern) at all:
Use the datetime-design-pattern to make datetimes that are human readable also formally machine readable.
Who needs machine readable dates? As far as I can see there are two target audiences for this operation. The first is obviously social applications that have to work with dates, and where it can be useful to compare dates of two different events. An app must be able to see if two events fall on the same day and warn you if they do.
However, as a target audience social applications are immediately followed by historians (or historical, chronological applications). After all, historians are (dare I say it?) historically the most prolific users of dates, until they were upstaged by social applications.
This raises the question whether the <time> element should be tailored for historical use at all. When I started writing this entry I was convinced that it should.
In keeping with the definition of its purpose I see the <time> element as a tool for an Internet-wide chronological search-and-compare system. Such a system will be a boon to historians, who would be able to quickly and easily look up events that happened around the same time as the event they’re writing about.
In history, just as in other academic disciplines, serendipitous discoveries are the meat of exciting new theories. A history-compliant use of the <time> element that allows automatic search and compare would broaden the horizons of historians.
However, now that I’ve reviewed some of the more common problems that have to be solved in order to decrease potential harm, I’m starting to doubt whether the <time> element can easily be made to fit history.
Right now, though, the specification is a vague compromise that doesn’t make the <time> element useful for historical research, but still allows it to be used historically.
I feel this ambiguity should be removed. I feel that the specification should clearly state whether the <time> element is meant for historical use or not. The current vague, implied “No” should be changed to a clear answer. I prefer Yes, but I can live with No.
If the <time> element should be made safe for historians, there’s quite a bit of work to be done, some of which is discussed in this article. If it should not be made history-safe, we have to add a cut-off date to the specification. Dates before this cut-off date would be ignored.
The basic problem (that we’ll discuss in great detail below) is that the current specification requires the use of the so-called proleptic Gregorian calendar. Although that makes perfect sense in the modern age, it becomes progressively more pointless as we travel back in time, and somewhere in the late 16th century we reach the point that proleptic Gregorian dates become actively harmful to historical research.
Historians of the Middle Ages and earlier periods use Julian dates because that’s what the documents of that era use. If we mapped them to proleptic Gregorian dates, as the specification demands, they would be worse than useless in any kind of automatic search-and-compare system.
Hundreds of years’ worth of historical literature uses Julian dates if the people from the era it discusses did so, and therefore a system that uses proleptic Gregorian dates just doesn’t find any matches.
The current specification acknowledges this problem — somewhat. It says:
For dates before the introduction of the Gregorian calendar, authors are encouraged to not use the time element, or else to be very careful about converting dates and times from the period to the Gregorian calendar
A literal interpretation would have odd consequences. If I wrote about the secret negotiations between Louis XIV and Charles II to destroy the Dutch Republic in the early 1670s, I would be allowed to mark up the dates of Louis XIV’s letters, but not those of Charles II’s. France used the Gregorian calendar back then, but England stuck to the Julian. Such a rule is useless for historians. Besides, it’s just plain weird.
As to marking up Charles II’s letters with Gregorian dates, that’s possible, but it could lead to the same problems we discussed above: the generally accepted date for a letter might be Julian, in which case an automated search for the Gregorian date misfires dramatically.
So I believe this remark is incorrect and should be changed. The specification should clearly and unambiguously state whether or not the <time> element is fit for historical use instead of trying to find a vague formula that avoids this basic question. (I don’t even understand why this question should be avoided. It’s a simple one, though the consequences of a Yes are pretty complicated.)
If the answer is No the specification should define a cut-off date that is the earliest date the <time> element (or automatic search systems based on it) accepts as valid. Earlier dates are simply ignored by a compliant implementation. That neatly avoids the bulk of the problems mentioned in this article, and makes sure that any historical use that falls within the constraints of the specification is actually useful.
Therefore, if historical use of the <time> element is to be disallowed, we MUST (in the sense of RFC 2119) define a cut-off date.
The most obvious candidate for a cut-off date is 1 January 1970, the start of Unix Epoch time. There’s one problem, though: if we cut off the <time> element there, many people alive today wouldn’t be able to mark up their birth dates.
Therefore I’d like to propose 1 January 1870 instead. Its relation to the start of the Unix Epoch is clear and it allows everybody alive today to mark up their dates of birth.
Besides, there’s some vaguish historical justification for this date. Around 1870 the final phase of European colonial imperialism started, which caused almost the entire world to be divided among the colonial powers. Not coincidentally, this also caused the Gregorian calendar to spread to even the most obscure corners of the world, and it became a true world standard.
The only problem with that cut-off date would be that Russia still used the Julian calendar in 1870 and continued to do so until 1918. Moving the cut-off date to 1918 is possible, but it would mean a few of the very oldest people in the world would not be able to mark up their birth dates.
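A cut-off rule of this kind is simple to implement. The following is a minimal sketch, not part of any specification: the function name `accept_datetime` and the constant `CUTOFF` are hypothetical, and the cut-off value is simply the 1 January 1870 proposal made above.

```python
from datetime import date
from typing import Optional

# Hypothetical cut-off, taken from the 1 January 1870 proposal in the text.
CUTOFF = date(1870, 1, 1)


def accept_datetime(value: str, cutoff: date = CUTOFF) -> Optional[date]:
    """Return the parsed date, or None if the value must be ignored.

    A value is ignored when it is not a plain YYYY-MM-DD date or when it
    falls before the cut-off date.
    """
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return None  # not a complete ISO date, so not usable here
    return parsed if parsed >= cutoff else None


if __name__ == "__main__":
    for candidate in ("1918-02-14", "1870-01-01", "1099-07-15"):
        print(candidate, "->", accept_datetime(candidate))
```

Moving the cut-off to 1918, as discussed above, would only mean passing a different `cutoff` argument.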
If you’re convinced that the current specification of the <time> element should not be changed to accommodate historians, you can stop reading here. The historical overview that follows is not important to you.
You should just:
<time> element is not meant for historical use.
Thank you for your attention.
If you’re still with me, you’re obviously interested in chronological problems. You’ll get what you want — in spades.
If the HTML5 <time> element is to be made safe for historical use, the specification MUST (again in the sense of RFC 2119) allow
Furthermore:
datetime format; most importantly Easter.
Of these six rules, I believe that the first five are universal. Although I will defend them by studying European history exclusively, I think that most other chronologies will be served by them, too. The sixth rule is specific to European history; other civilisations will have other cut-off dates for arithmetic operations.
In order to understand all this we have to review the history of dates. There are two separate problems we have to discuss: the calendar (i.e. the days and months of the year), and the names of the years. The specification treats the first point vaguely, and ignores the second.
The solar year is about 365.2422 days long, which means it cannot be expressed in an integer number of days. As history progressed, calendars became better at dealing with this problem. For the purpose of our discussion, the Julian and Gregorian reforms are the most important.
Unfortunately, in 1582 the wars of religion raged in Europe, and the Protestants were not really eager to follow the directions of the Antichrist in Rome, especially if he happened to be right. The Orthodox also had pressing (and much older) reasons to demonstratively ignore the papacy.
Therefore all Catholic countries switched to the new calendar within years, but the Protestant and Orthodox countries refused to follow.
Most Protestant countries went over in 1700, when the wars of religion had become a vague memory and the actual difference between the two calendars acute. After all, 1700 was the first year that was a leap year in the Julian calendar but not in the Gregorian one.
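The leap-year rules behind this divergence can be written down in a few lines. This is a small sketch for AD years only; the function names are illustrative.

```python
def is_julian_leap(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_gregorian_leap(year: int) -> bool:
    """Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Years between the 1582 reform and 2000 in which the two calendars disagree.
divergent = [
    year for year in range(1582, 2001)
    if is_julian_leap(year) != is_gregorian_leap(year)
]

if __name__ == "__main__":
    print(divergent)
```

Each year in `divergent` widens the gap between the two calendars by one more day.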
Nonetheless, England and Russia continued to use the Julian calendar until 1752 and 1918, respectively. (In fact, one Scottish island stubbornly refused to implement the Gregorian calendar until ten years ago or so. The sheep might get confused.)
To this day, most Orthodox use the Julian calendar (even though the Orthodox states use the Gregorian one, and some Orthodox churches use the so-called Revised Julian calendar). That’s why the Orthodox celebrate Christmas (and sometimes Easter) on different days than the Catholics and Protestants.
Currently the specification decrees the use of so-called proleptic Gregorian dates; that is, the date a day would have had if the Gregorian calendar had already been in use back then.
Although this makes sense in the recent past, we’ll see that this decree becomes more harmful the more we go back into history. Although (proleptic) Gregorian makes perfect sense as a default, it MUST be possible to define another calendar.
Besides, there’s the matter of onus. Who, exactly, is responsible for mapping a Julian date to a proleptic Gregorian one? The HTML author or some kind of universal date-calculating system?
In other words, if I, as an historian, talk about 18th century Julian dates, do I have to map them to the Gregorian calendar myself (possibly by means of software I have to buy and install), or can I just trust an Internet-wide system to do so? Obviously I prefer the second solution because it’s less work for me and will probably introduce fewer errors.
Let’s take a look at a practical example.
On 24 January 1918 Lenin signed a decree that moved the brand-new Soviet state from the Julian to the Gregorian calendar. In order to bridge the 13-day gap between Julian and Gregorian, 1-13 February 1918 were omitted, so that 31 January was directly followed by 14 February.
Now how are we going to mark up this paragraph? Let’s try the proleptic Gregorian calendar:
<p>On <time datetime="1918-02-06">24 January 1918</time> Lenin signed a decree that moved the brand-new Soviet state from the Julian to the Gregorian calendar. In order to bridge the 13-day gap between Julian and Gregorian, 1-13 February 1918 were omitted, so that <time datetime="1918-02-13">31 January</time> was directly followed by <time datetime="1918-02-14">14 February</time>.</p>
This remapping might be confusing for human and machine, but using Gregorian dates still makes sense, especially since the text is about Russia introducing the Gregorian calendar.
However, the 1-13 February 1918 bit is a problem. They’re dates, but they have never existed. I think it’s best not to mark them up at all.
Slightly more confusing is the following:
After the October revolution (25 October 1917) Russia became a communist state.
This date is Julian; we’ll have to map it to Gregorian, but the consequence is that the October revolution takes place in November. Fortunately we’re used to this fact; most history books mention this oddity.
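The mapping itself is mechanical. Below is a minimal sketch of how an author (or a central service) might do it, using the standard integer formula for converting a Julian calendar date to a Julian Day Number and Python's `datetime.date`, which is proleptic Gregorian. The function names and the `JDN_OFFSET` constant are illustrative, and the sketch only covers AD years because `datetime.date` does not go earlier.

```python
from datetime import date

# Julian Day Number of the day before proleptic Gregorian 0001-01-01,
# so that (JDN - JDN_OFFSET) is a valid datetime.date ordinal.
JDN_OFFSET = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March becomes month 0
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Map a Julian calendar date to the proleptic Gregorian calendar."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_OFFSET)


if __name__ == "__main__":
    print(julian_to_gregorian(1918, 1, 24))   # the calendar decree
    print(julian_to_gregorian(1917, 10, 25))  # the October revolution
```

Running this reproduces the `datetime` values used in the markup above and places the October revolution on 7 November 1917.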
The use of Julian datetime values becomes mandatory as we enter the Middle Ages. So let’s jump eight hundred years more into the past.
Jerusalem was conquered by the crusaders on 15 July 1099, and a great slaughter was perpetrated among its inhabitants of all races and creeds.
According to the specification we’d either have to use proleptic Gregorian dates or not use a datetime attribute at all. Since I feel the second option invalidates the entire <time> element, I’m forced to choose the first one:
<p>Jerusalem was conquered by the crusaders on <time datetime="1099-07-21">15 July 1099</time>, and a great slaughter was perpetrated among its inhabitants of all races and creeds.</p>
The problem is that the proleptic Gregorian 21 July is worthless. Every history book uses 15 July as the date of Jerusalem’s conquest, so an online search by a program that parses datetime values would misfire dramatically.
More generally, medieval historians use whichever date system the people from that age actually used, and therefore all dates in all books about medieval history are Julian, and not proleptic Gregorian.
Because medieval historians use Julian dates, mapping medieval dates to proleptic Gregorian is going to cause widespread confusion. The machine-readable dates will not match those used in history books and source collections. Thus, the misuse of the proleptic Gregorian calendar will actively hamper historical research instead of aiding it.
In the case of medieval (and earlier) history we MUST use Julian datetime values. We do have to specify that fact, of course, which means we need an extra attribute, which I’ve dubbed calendar for the moment:
<p>Jerusalem was conquered by the crusaders on <time datetime="1099-07-15" calendar="Julian">15 July 1099</time>, and a great slaughter was perpetrated among its inhabitants of all races and creeds.</p>
Easter is an important date; in fact during most of church history it was the most important holiday of the year. It’s not a fixed feast; it is celebrated on the first Sunday after the first full moon that falls on or after the March equinox. (The actual calculation is somewhat more complicated, but this definition will do for now.)
Thus, the definition of Easter depends on the definition of the March equinox. In the Gregorian calendar it’s 21 March, and in the Julian calendar it’s also 21 March, but of course the two 21 Marches are several days apart, and if a full moon occurs in the gap between them, the Julian and Gregorian Easter dates will not match even after the Julian one is mapped to Gregorian. (This rule still holds for the Orthodox church.)
Because it was so hard to calculate this most important date of the year, considerable ingenuity was applied to the job throughout late antiquity and the middle ages. In fact, the very survival of chronological knowledge in the dark ages can be ascribed to the need to calculate Easter.
Every church had its paschal tables, which showed the dates for Easter (as well as chronologically related feasts such as Good Friday and Pentecost). When individual priests or monks started adding extra notes about important events to these tables, chronicles were born.
Concern over the slowly shifting date of Easter was what prompted Gregory XIII to institute his calendar reforms. He wanted to make sure the modern Church celebrated Easter on the dates prescribed by the Council of Nicaea in 325.
Let’s jump four more centuries back and take a look at a practical example.
On Easter 675, a land dispute between Praejectus, bishop of Clermont, and Hector, count of Marseille, was heard before the royal court.
In order to properly mark up Easter 675 we have to first calculate Julian Easter 675 and then map this date to the proleptic Gregorian one. A full moon may have fallen between Julian and Gregorian 21 March 675, after all.
This calculation is not impossible, but the question is on whom the onus should rest. The author, or some kind of centralised date system? (I, in any case, have not attempted to calculate the precise date.)
I feel the onus should be removed from the historian who wants to write about poor Praejectus and his murder and is not interested in HTML5 chronology.
Besides, “Easter 675” is an exact date: both modern historians and people who actually lived in 675 will reach the same result when they calculate it.
The problem is that it’s so very hard to calculate, especially when you insist on the proleptic Gregorian date. And once you’ve found the correct result, it turns out nobody is interested. So let’s save ourselves a tough job and just do this:
<p>On <time datetime="0675-Easter" calendar="Julian">Easter 675</time>, a land dispute between Praejectus, bishop of Clermont, and Hector, count of Marseille, was heard before the royal court.</p>
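For readers who do want to see what a central date system would have to do with a value such as `0675-Easter`, here is a rough sketch. It assumes the arithmetic form of the Julian Easter computation usually attributed to Jean Meeus, which follows the Dionysian tables; whether the court in 675 actually reckoned by those tables is a question for historians, so treat the output as an illustration only. All function names are made up for this example.

```python
from datetime import date
from typing import Tuple


def julian_easter(year: int) -> Tuple[int, int]:
    """Return (month, day) of Easter Sunday in the Julian calendar."""
    a = year % 4
    b = year % 7
    c = year % 19                      # position in the 19-year lunar cycle
    d = (19 * c + 15) % 30             # days from 21 March to the paschal full moon
    e = (2 * a + 4 * b - d + 34) % 7   # days from that full moon to the next Sunday
    month, day_index = divmod(d + e + 114, 31)
    return month, day_index + 1


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Map a Julian calendar date (AD years only) to proleptic Gregorian."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - 1721425)


if __name__ == "__main__":
    month, day = julian_easter(675)
    print("Julian Easter 675:", (month, day))
    print("Proleptic Gregorian:", julian_to_gregorian(675, month, day))
```

Under these assumptions Julian Easter 675 comes out as 22 April, which maps to 25 April in the proleptic Gregorian calendar. A central system could do this once, so that authors never have to.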
This is how early medieval chronological reconstructions work:
The murder of bishop Praejectus probably took place on 26 January 676. We know for a fact he was still alive on Easter 675, and his successor as bishop of Clermont is said to have ruled “for fifteen years and a bit,” and to have died in the reign of king Theodoric III.
Since Theodoric III died in April 691, Praejectus’ successor became bishop in early 676 at the latest. Besides, St. Praejectus’ feast is celebrated on 26 January; and it is not unreasonable to assume it took place on the anniversary of his murder.
Early medieval historians are quite happy when they can pin such an exact date on an event; and never mind that the date is Julian.
Now how are we going to mark up all this? There are several problems here:
The first date should be marked up fully. After all it refers to an exact, specific date. The third date would probably have to be marked up by a <time> element without a datetime attribute.
As to the second date, we MUST use <time datetime="0691-04" calendar="Julian">, and never mind that that date is incomplete.
The fact that we know the month of Theodoric III’s death makes this date more precise than most dates from that era. Any machine-generated historical timeline tool MUST mention “April 691” as the date of Theodoric’s death, because the fact might be important to chronological research such as determining when Praejectus was martyred.
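To make the idea of incomplete and named dates concrete, here is a sketch of how a timeline tool might read the values proposed in this article. Everything in it is hypothetical: the `HistoricalDate` class, the `parse_historical` function, and the accepted syntax (a year, optionally followed by a month and day or by a feast name) are assumptions for illustration, not anything the HTML5 specification defines.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoricalDate:
    year: int                      # kept exactly as written, sign included
    month: Optional[int] = None    # None when the month is unknown
    day: Optional[int] = None      # None when the day is unknown
    feast: Optional[str] = None    # e.g. "Easter" for a movable feast
    calendar: str = "Gregorian"    # value of the proposed calendar attribute


# year, then either "-MM" with optional "-DD", or "-FeastName"
_PATTERN = re.compile(r"^(-?\d{1,4})(?:-(\d{2})(?:-(\d{2}))?|-([A-Za-z]+))?$")


def parse_historical(value: str, calendar: str = "Gregorian") -> HistoricalDate:
    """Parse a (possibly incomplete) datetime value without normalising it."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised datetime value: {value!r}")
    year, month, day, feast = match.groups()
    return HistoricalDate(
        year=int(year),
        month=int(month) if month else None,
        day=int(day) if day else None,
        feast=feast,
        calendar=calendar,
    )


if __name__ == "__main__":
    print(parse_historical("0691-04", calendar="Julian"))
    print(parse_historical("0675-Easter", calendar="Julian"))
```

The point of the design is that the parser records how precise a date is instead of rejecting it, so “April 691” survives as a month-level fact.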
Let’s go another eight hundred years back and land just in time to see Hannibal victorious against the Romans at Cannae. This historical battle, sources assure us, took place on 2 August 216 BC. We don’t have a prayer of re-mapping this date to a proleptic Gregorian or a Julian one.
The ancient Roman year had 355 days, and in theory every second year ought to have a so-called intercalary month of 22 or 23 days. The problem was that these months were inserted irregularly, and no chronologist ancient or modern has ever taken the trouble to track down the exact use of the intercalary month. (Besides, the sources are just not there.)
This means that we will never know exactly on which proleptic Gregorian date the battle of Cannae took place. The best we can say is that it took place in high summer; probably in July or August.
However, if a source says that a certain event happened on 5 August 216 BC, we can be certain that it took place three days after the battle of Cannae. The Romans saw the use of a reliable chronology and were generally accurate within the constraints of their weird calendar.
Thus, the date of the battle of Cannae should be marked up as:
<time datetime="-216-08-02" calendar="Ancient Roman">2 August 216 BC</time>
With this final example we’ve discussed the need for a calendar attribute sufficiently.
Apart from the calendar attribute, this code example contains something else an HTML5 validator would get extremely upset about: the negative year.
The restriction that BC years may not be used is of course totally absurd in a historical context.
That brings us to the second problem: the names of the years, and especially the use of different naming systems.
If you’re into ancient chronology, it’s best to see years as having names, not numbers. Essentially our modern numbering of years from the Incarnation provides a common naming system; not a numbering system.
In JavaScript terms, the name of the current year is the string "2009" and not the number 2009.
Until the waning of the middle ages, all monarchies used naming systems based on the regnal year of the king or emperor. Although we talk about 12 August 1274, a contemporary document would not use the name “1274,” but would instead say “the second year of king Edward” for the English, or “the fourth year of king Philip” for the French.
(Of course historians first have to figure out these documents refer to Edward I and Philip III, and not, for instance, Edward III and Philip VI. Medieval chronology is such fun.)
In addition to this traditional naming system, which has been in use since the dawn of history, the Middle Ages used the naming system of years since the Incarnation, which the devoutly Christian historians of the age considered the defining moment of human history.
The Book of Revelation clearly states that Satan would be locked up for a thousand years before being allowed to briefly rule the Earth. The use of the Christian era naming system allowed everybody to get duly upset around the year 1000, and widespread confusion was sown.
The Christian naming system was invented by Dionysius Exiguus, a monk living in Rome. Exactly why he thought that the year that he published his system was the 525th since Christ’s birth is not known, but his counting has been used ever since.
It is commonly said that Dionysius made a four-year mistake and that Christ was born in 4 BC. Matthew tells how Herod the Great tried to murder baby Jesus in the slaughter of the innocents, and Herod died in 4 BC. Thus, Christ must have been born in that year at the latest.
Matthew’s story is probably pious nonsense. He’s the least reliable evangelist historically speaking, writing 150 years after the events, and the other gospels do not mention the killing of the innocents. Therefore it’s very well possible that Dionysius was right after all.
Nonetheless, Christ is usually dated as having lived from 4 BC to 30 AD, for a total of thirty-three years.
Before Dionysius introduced his reform, people used the old Roman system, in which every year was named after its two consuls.
After the Romans had discarded their monarchy in 509 BC they were forced to stop using regnal years. They needed a new naming system, and they decided to allow their two chief magistrates, the consuls, to give their names to the year.
Thus, “in the consulate of Cn. Pompeius Magnus and M. Licinius Crassus Dives” is a historically valid alternative to “70 BC.” In fact, BC or AD years may be considered a convenient shorthand for the “semantically” more correct consular years.
Although the consuls lost all political power after Augustus founded the Empire in 27 BC, the title was still given out to aristocrats who’d deserved a plum, as well as to the Emperor himself, until the office was abolished in 541 AD. The consuls continued to give their names to the year. (In return they were graciously allowed to squander their fortunes on organising circus games.)
Modern historians have mapped consular years to Christian ones, and have established lookup tables. The ancient historian Dionysius of Halicarnassus has carefully mapped Greek history to Roman consular years. (He may have made mistakes, but if he did we’re not in a position to find out. We must accept his chronology.)
Thus the Roman consular years give us a common naming system (a namespace, if you wish) for 1049 years of Greek, Roman, and early Medieval history. This naming system can be combined with the Christian one to give us a more-or-less reliable chronology going back about 2,500 years.
The last eight hundred or so of the Roman consular years are universally accepted as historically reliable. In the more uncertain first two hundred years, even the most radical reviser proposes a shift of eight years at most, so across the centuries this naming system remained pretty reliable.
300 BC is the earliest year that we can map with complete accuracy; i.e. we can say with certainty that the consulate of M. Valerius Maximus Corvus for the fifth time and Q. Appuleius Pansa occurred exactly 2308 years before the present time.
For this reason among others, breaking off chronology at 1 AD, as HTML5 proposes, is pure nonsense. We’d miss out on another three hundred years of perfectly good, historically and arithmetically valid chronology.
The year 0 does not exist. The consulate of Cossus Cornelius Lentulus and L. Calpurnius Piso, which we call “1 BC,” was directly followed by the consulate of C. Caesar and L. Aemilius Paullus, which we call “1 AD.”
Emperor Augustus ruled from 27 BC to 14 AD, and that’s a reign of 40 years, not 41.
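The arithmetic is easy to get wrong, so here is a small sketch of it. It assumes a helper that places BC and AD labels on one continuous integer axis on which 1 BC sits at 0 (the convention astronomers use); the function names are illustrative, and 2009 stands in for “the present year” as elsewhere in this article.

```python
from typing import Tuple

Year = Tuple[int, str]  # (label, era), e.g. (27, "BC")


def to_continuous(label: int, era: str) -> int:
    """Place a BC/AD year label on a continuous axis where 1 BC is 0."""
    if label < 1:
        raise ValueError("there is no year 0 and no negative labels")
    if era == "AD":
        return label
    if era == "BC":
        return 1 - label
    raise ValueError(f"unknown era: {era!r}")


def years_between(start: Year, end: Year) -> int:
    """Number of years from start to end, skipping the non-existent year 0."""
    return to_continuous(*end) - to_continuous(*start)


if __name__ == "__main__":
    print(years_between((27, "BC"), (14, "AD")))     # reign of Augustus
    print(years_between((300, "BC"), (2009, "AD")))  # 300 BC to the present
```

The first call gives 40 and the second 2308, matching the figures in the text. Naively adding 27 and 14 would give the wrong 41.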
While we’re on the subject of Augustus, he was also responsible for finally setting down a regular pattern of one leap year per four years. The first leap year in the new system was the consulate of Sex. Aelius Catus and C. Sentius Saturninus, and because we happen to call that year “4 AD” we’ve grown used to thinking that a leap year must necessarily be divisible by four.
The Roman consular lists, the fasti consulares, have been preserved in several versions. The standard version is the one created during Augustus’ rule by the historian M. Terentius Varro and set up on the Forum Romanum to serve as a public calendar. It was dutifully updated every year.
Because these fasti were moved to the Capitol after being excavated in the 16th century, they’re known as the Capitoline Fasti.
Historians are pretty certain that some errors have crept into the Capitoline Fasti. The first problematic year is 301 BC, when the Capitoline Fasti say a dictator was appointed instead of consuls. Although this was allowed under the Roman constitution, dictatorships are quite rare, and this particular one is not mentioned in any other source. Therefore, modern historians have concluded, this dictatorship never actually took place.
Thus the consulate of M. Livius Denter and M. Aemilius Paullus, which we call “302 BC,” was directly followed by the consulate of M. Valerius Maximus Corvus for the fifth time and Q. Appuleius Pansa, which we call “300 BC.”
There are several such problematic years in the Capitoline Fasti before 300 BC. Unfortunately historians disagree on some of these cases, and therefore they’ve decided to follow Varro’s system anyway, warts and all. Years from 509 to 301 BC are called “Varronian years,” and about three to eight of them have never existed.
Therefore Christian years before 300 BC are names, and not numbers, and MUST NOT be used for arithmetic operations.
Varronian years are still being used in history books. If we say that “the Greeks defeated the Persian navy in 480 BC at Salamis, and the Persian army in 479 BC at Plataea” we’re using Varronian years.
Historians are pretty certain that these dates are in fact three to eight years off, and that we cannot say that the battle of Salamis took place exactly 2488 years before the present year. It’s more in the order of somewhere between 2480 and 2485, probably.
Nonetheless, it does not make sense to say the battle of Salamis occurred anywhere from 477 to 472 BC. All history books say “480 BC,” and people (as well as chronological search systems) would get confused if we did anything else. We MUST continue to use the Varronian year.
So the example has to be marked up as:
The Greeks defeated the Persian navy in <time datetime="-480" yearNames="Varronian">480 BC</time>, and the Persian army in <time datetime="-479" yearNames="Varronian">479 BC</time>
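One way a chronological tool could honour the rule that years before 300 BC are names and not numbers is to refuse arithmetic on them. The sketch below is purely hypothetical: the `YearName` class, the `years_before` function, and the `ARITHMETIC_CUTOFF_BC` constant are invented here to illustrate the rule, with 300 BC taken from the discussion above.

```python
from dataclasses import dataclass

# Earliest BC year that this article treats as exactly countable.
ARITHMETIC_CUTOFF_BC = 300


@dataclass(frozen=True)
class YearName:
    label: int                     # e.g. 480 for "480 BC"
    era: str                       # "BC" or "AD"
    year_names: str = "Christian"  # value of the proposed yearNames attribute

    def is_arithmetic_safe(self) -> bool:
        """True when the year may take part in calculations."""
        return not (self.era == "BC" and self.label > ARITHMETIC_CUTOFF_BC)


def years_before(year: YearName, present_ad: int) -> int:
    """Years elapsed between the given year and an AD 'present' year."""
    if not year.is_arithmetic_safe():
        raise ValueError(
            f"{year.label} {year.era} ({year.year_names}) is a name, not a number"
        )
    if year.era == "BC":
        return present_ad + year.label - 1  # minus 1: there is no year 0
    return present_ad - year.label


if __name__ == "__main__":
    print(years_before(YearName(300, "BC"), 2009))
    try:
        years_before(YearName(480, "BC", "Varronian"), 2009)
    except ValueError as error:
        print("refused:", error)
```

The Varronian label is still stored and searchable as “480 BC”; it is only the subtraction that is blocked.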
More-or-less reliable chronology starts with the consulate of L. Iunius Brutus and L. Tarquinius Collatinus, Varronian year “509 BC.” All dates before 509 BC are educated guesswork at best. As we go further back in time the guesswork increases at the expense of the education.
Sure, you’ll often encounter earlier dates, but these have been painstakingly reconstructed by both ancient and modern historians, and there’s simply no way we can tell whether they’re right or wrong.
That’s the historians’ problem. Nonetheless, a history-compatible implementation of <time> MUST allow an arbitrary year-naming system to be specified. (The actual mapping of such a system to the consular/Christian system is a problem for historians; not for spec writers. If an historian used his own system, he’d be responsible for creating lookup tables.)
An example will show why this is necessary.
It is my personal belief that the so-called “First Dark Ages,” traditionally dated from 1200 to 800 BC, have never existed; i.e. 1200 BC was the same year as 800 BC (roughly speaking, of course).
I also feel that earlier chronology is a mess wrongly based on the so-called Thirty Dynasties of Egypt scheme of the historian Manetho, whose work is almost completely lost, and who wrote in a time when Alexander the Great’s rapacious successors were trying to outdo each other in bragging about the venerable antiquity of the people they were exploiting. Egyptian and Mesopotamian chronology thus became a tool in a propaganda war, and it has never recovered.
As a result, I think that the Egyptian and Mesopotamian monarchies developed a few centuries later than is generally assumed.
I also think that Egyptian chronology has serious defects and should be re-thought from the ground up. (Especially the fact that the XXII dynasty was a priestly one concurrent with the last native dynasty, the Persians, and the Ptolemies ought to be recognised.)
Since all other ancient chronologies are based on the Egyptian one, this would have far-reaching consequences.
This opinion is not popular in historical circles; in fact most professional historians of the age will hotly defend the First Dark Ages and their painstakingly created chronologies. That’s fine; since I’m the challenger I have to prove my challenge by doing some research.
The point is that in order to mark up my research in HTML I’d have to create my own year-naming system, while also using the year-naming systems that are currently in use among historians. To make matters more complex, most Ancient Near East chronologies have a high, a middle, and a low variant (all of which are wrong, in my opinion).
In other words, I have to be very careful to specify which year-naming system a particular <time> element belongs to. I also have to be able to denote years belonging to my own chronological theory. Thus, I MUST be able to invent a value for the attribute I’ve called yearNames.
(Incidentally, this research would greatly benefit from a centralised chronological system I could plug into to automatically convert dates from other systems to my proposed system. I’d have to create lookup tables (or maybe even arithmetic operations), but once I’d done that, I’d be able to move the onus of recalculating centuries of history to an automatic system. Now THAT would be a benefit of a history-safe <time> element!)
<time> safe for historians
This short treatment of ancient chronology highlights only a few of the most important problems, and it doesn’t even try to cover non-European civilisations. More study is clearly necessary.
In conclusion, making <time> safe for historians is not an easy job, and, as I said at the start, the question is whether it should be attempted at all. I hope to have given you some useful information that will allow you to take a position on this question.
This is the blog of Peter-Paul Koch, web developer, consultant, and trainer.
1 Posted by bruce on 6 April 2009 | Permalink
It seems obvious to me that HTML 5 time *should* be history-safe; the potential for mash-ups to pull data together is so useful.
But it ain't gonna happen: firstly, because of all the problems that you discuss, and secondly, because the element is designed only for events that you'd want to be able to pull from the web into a calendar, e.g. a conference or birthday or something.
2 Posted by Matt Large on 6 April 2009 | Permalink
This issue is far too large to be solved in an HTML spec. I too would love some "historian safe" implementation, but I just don't think it can happen. Not only would this be a massive issue for the browsers to have to deal with, but very few web frameworks/databases would be able to support such a schema.
Some years back I worked on a website for the British Museum and we had to abandon most of the usual date/time formats, database fields, and internal code types in order to allow for a larger range of dates and date processing, and even then it was nowhere near as complete as any real historian would require.
Date/time is one of the few major issues with the real life/computer processing interface that developers will face again and again.
3 Posted by Erik on 6 April 2009 | Permalink
From what I know of XML, it seems like a namespace would make more sense than a "calendar" attribute.
something along the lines of
<Julian:time datetime="1918-02-06">24 January 1918</Julian:time>
then again, I don't know if the HTML spec is okay with namespaces.
4 Posted by Jaap on 6 April 2009 | Permalink
Funnily even the dates used for special (upcoming) events have hairy calendar problems.
There's the Chinese and the Hebrew calendar. Both are used to mark special occasions, like new years and religious events.
While they are easily converted into Gregorian, some of the same arguments raised in this article hold true for these calendars.
5 Posted by Jemaleddin on 6 April 2009 | Permalink
First of all: love the article. Fascinating, well-written, and clear. I feel like a better person having read it.
Secondly: eh.
I think the best thing to do is to allow a wide variety of correct inputs (to datetime, yearname and calendar) as you suggest and let programmers do the best they can with various heuristics and drop everything else on the floor.
Since the number of historians who know HTML well enough to give a damn about HTML5 is rapidly approaching 12, and the number that will ever hear of the time attribute will for the next 50 years approach a limit of 100, I think this is a real problem that will forever be searching for an implementation (of the problem). Getting people to use ANY metadata, even the wrong metadata, is a bigger issue than the format that they choose to store their wrong metadata in.
Yes?
6 Posted by ppk on 6 April 2009 | Permalink
@Jemaleddin I tend to agree with your numbers. The further I got with this article, the more it became a historical article because I felt like one, instead of a critique of the time element.
But you seem to opt for the No variant, which is perfectly fine by me, provided you accept my cut-off date or propose one yourself.
7 Posted by Jemaleddin on 6 April 2009 | Permalink
@ppk: Actually, I agree with your "Yes" variant, but with as little baked-in guidance as possible. Your suggestions for calendar and yearNames (needs a better name, no?) tags sound like a great idea to me. Let's just make sure that we don't do something stupid like specify exactly 3 options for calendar. Leave it open, yeah?
8 Posted by Sander Aarts on 6 April 2009 | Permalink
Interesting article, although it's still a bit abstract to me.
I think it would be nice if <time> can be made 'safe' for historians, but the need for the yearNames attribute seems more like a personal wish of yours than a real need (or maybe I just don't understand what you mean, which could very well be the case).
If you present a new theory on certain periods in history, defining a new chronological order, and this theory becomes mainstream, a new calendar needs to be added (assuming the element will be made 'safe' for historians). Not before this acceptance.
Linguists don't need an attribute to specify possible languages either ;-)
9 Posted by Jaap on 6 April 2009 | Permalink
The calendar attribute seems to be treading into java.util.Calendar territory.
Sun provides an implementation, GregorianCalendar, which also acts as a Julian calendar for dates before October 15, 1582, or before any other cutover date set with setGregorianChange().
This date stuff is a complete mindf*ck and hasn't been solved in most programming languages (I know of)... I don't think HTML5 is the place to solve it.
I really love this article though PPK. I'm glad you didn't scrap it after the hairiness of the problem fully manifested itself to you.
10 Posted by Brooks on 6 April 2009 | Permalink
Great article, PPK. You should consider trying to publish it (or an expanded version) in a history or programming academic journal. Now, on to my personal experience with a similar issue.
I'm in the process of writing software to manage death records, among other things, and the HORRID support for dates before 1970 in pretty much all programming languages is a constant source of frustration. Hopefully, articles like this will not only spur debate and development in HTML, but also in the language development/computer science community as a whole.
Thanks, again, for all your hard work on this.
11 Posted by pbhj on 7 April 2009 | Permalink
I concur, great article - this is going to be a canonical reference on HTML5 dating problems for all 12 of those people (@Jemaleddin) I'm sure ;0)
What are we going to do when we have folks living on Mars, if we thought the calendar was messed up now ...
2 questions: what license is this text under & how much did you rely on Wikipedia??
12 Posted by Evert on 7 April 2009 | Permalink
Would it be possible to define an attribute which specifies the calendar used?
If none is specified, assume gregorian, and define other calendars as you go (later spec?).
I believe it would be a good thing to build in this extensibility from the start, but it would allow you to, at least, defer the problem for now.
13 Posted by Ken on 7 April 2009 | Permalink
Great read. I think it is imperative to allow dates from thousands of years ago if nothing else but for Genealogy. Genealogy is an enormous and growing presence on the web. I've heard some say that it outranks porn in many cases.
I think all dates before 1582 don't really have much meaning as far as programmatic comparisons. I like the idea of specifying further attributes about dates for relative comparison, but there is really little need for absolute comparison for all dates everywhere. As you say, at some point BC, years can't even be pinned down.
As far as Genealogy goes, if you say that John Peterson was born 15 Jun 1502 and Jack Peterson was born on 23 Oct 1504, you can guess that they are more likely brothers than father and son. But the Genealogist doesn't really care exactly how many days ago they were born.
14 Posted by Dustman on 7 April 2009 | Permalink
@Ken Suppose someone wanted to search those genealogies. How would a search engine find those dates if not programmatically? Comparison is essential to the mix. I'd personally like to find all mention of my Scottish ancestry from before my grandmother was born. How can a computer assist me if it can't determine if a date falls before this event?
15 Posted by bruce on 7 April 2009 | Permalink
I still think it's useful to be able to mark up historical dates, although your arguments for cutoff points make sense to me.
The thing is: people like timelines and mash-ups. If computers have to guess dates by reading natural language, they get it wrong sometimes. For example, see Google's Timeline view, which suggests that the Amiga computer existed in the year 1200. Actually, that's the model number:
http://www.google.com/views?q=commodore+amiga+view:timeline&vwdr=1200%20-%201299
If we had a method that allows unambiguous marking up of ancient dates AND dates like "July 1245", such mash-ups would be much more accurate.
16 Posted by Sean B. Palmer on 7 April 2009 | Permalink
The easiest solution would be to write an article, when HTML10 is current, claiming that HTML5 never existed...
17 Posted by Chris Snyder on 7 April 2009 | Permalink
It seems incredibly (and shortsightedly) Euro-centric to enforce Gregorian dates. Why not just have an attribute that declares what calendar is being used for <time> in this instance?
As in <time calendar="julian">14 January 1912</time> or <time calendar="hebrew">13th of Nisan, 5769</time>.
You could argue that parsing these dates is harder, but it's actually much easier since you don't have to blindly guess that the author is using the calendar you expect her to use.
BTW, very cruel to have a blog talking about html where you have to use entities to insert code in comments. :-p
18 Posted by Lars Gunther on 7 April 2009 | Permalink
This issue is indeed complicated. As you said, you have not even touched upon non-European/Middle Eastern history at all!
I therefore suggest that HTML 5 data-* attributes and microformats are the way to go for now. As cowpaths of usage emerge, they can be incorporated into HTML 6.
Also: the major UAs consuming the dates in question are probably not browsers, but spiders and other server-side scripts. They are equally well served by a microformat as by an HTML standard.
19 Posted by Jim O'Donnell on 7 April 2009 | Permalink
I left a comment yesterday but it seems to have been eaten as spam.
I'd recommend looking at dates and times in TEI, particularly the date element:
http://www.tei-c.org/Guidelines/P4/html/ref-DATE.html
My own suggestion for HTML5 was that HTML follow TEI, since TEI is already in use and understood by historians. While there might be only 12 historians who know HTML, there are a fair number who use XML to digitise historical texts :) So I don't think it's a big deal to ask historians to continue using the proleptic Gregorian calendar if they mark up historical dates in HTML. Dates predating the Julian calendar are still problematic. I'd be interested to know if calculators exist for mapping those to a proleptic calendar.
Uncertain dates would be really useful, i.e. dates consisting of just a year, a range of years, or a year and month. At work, we're already publishing photos on Flickr Commons which say 'Taken about 1900' or 'late 1870s' etc.
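To make the idea of incomplete dates concrete, here is a small Python sketch that turns a year, a year and month, or a full date into the range of days it could cover, and checks whether two such values might refer to the same time. The function names and the accepted formats ('YYYY', 'YYYY-MM', 'YYYY-MM-DD') are assumptions for this example only. It relies on Python's date type, which is proleptic Gregorian and limited to the years 1 to 9999, so it says nothing about BC dates or other calendars.

```python
import calendar
from datetime import date


def date_range(value):
    """Return the (earliest, latest) days covered by 'YYYY', 'YYYY-MM'
    or 'YYYY-MM-DD'."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        exact = date(*parts)
        return exact, exact
    raise ValueError(f"unsupported date value: {value!r}")


def may_overlap(first, second):
    """True if the two (possibly incomplete) dates could be the same day."""
    start_a, end_a = date_range(first)
    start_b, end_b = date_range(second)
    return start_a <= end_b and start_b <= end_a
```

With this, "1245-07" compares as the whole of July 1245, so a timeline can still sort it and place it against exact dates.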
20 Posted by Nelson Menezes on 7 April 2009 | Permalink
I'll add to the compliments; a great article to read!
I agree with @Chris above... the euro-centricity of the spec is not good at the moment. The "calendar" attribute indeed makes a lot of sense, even if historical dates are not a concern.
Having it would probably solve (or allow solving, outside the spec) historical date issues, which would be a good thing. By "outside the spec" I mean that the spec itself should probably only require support for one calendar system, i.e. the "default" Gregorian one (which would not require the use of the "calendar" attribute). It should allow arbitrary calendar names, and support for those would be outside the scope of the spec.
The above should do away with the need for the "yearNames" attribute, since your collection of names can be its own calendar (defined in the calendar attribute).
As for Easter and other "magic" dates, I'd say that's the responsibility of the author to translate that into an actual day in a given calendar, e.g.
<time datetime="675-04-17" calendar="Julian">Easter 675</time>
Finally, the "incomplete dates" suggestion makes perfect sense, and should be relatively straightforward regardless of "calendar" attribute.
21 Posted by Jim O'Donnell on 7 April 2009 | Permalink
By the way, one interesting application of ancient dates is Frank Stephenson's work on measuring the slowdown of the rotation of the Earth, by accurately dating Babylonian astronomical measurements:
http://books.google.com/books?id=DTb4DDuJNa4C&pg=PA128&lpg=PA128&dq=babylonian+eclipses+stephenson&source=bl&ots=WRYFJRSaND&sig=_7UiezYGxjAOvJAtDbJS8aanJCM
22 Posted by ppk on 7 April 2009 | Permalink
I find it odd that so many people propose to add a calendar attribute while ignoring the fact that my article proposes exactly that. Yes, we definitely need a calendar attribute, but please read my article first before commenting that we need it.
@Jemaleddin: Agree with the lack of baked-in guidance. You may specify any value you wish for either calendar or yearNames, but in return you must provide a lookup table or a formula.
@Jaap: Did the Java guys ever consider creating java.util.HistoricalCalendar to deal with old dates, leaving java.util.Calendar free to deal with dates from the last 50 years or so, which is much easier?
@Ken: Yes, genealogical sites would also benefit from a proper time element.
@bruce: Take a look at the Crusades map! Apparently the Crusaders conquered a town in Alaska.
@Jim O'Donnell: Interesting link, thanks. The certainty attribute might be a nice extra. They don't specify yearNames, though.
23 Posted by ppk on 7 April 2009 | Permalink
@Nelson Menezes: Calendars are different from year names.
Calendars define the names of the days of the year (and also other units such as months.)
Year names define the names of the years themselves.
They are not the same and we need both.
24 Posted by Jim O'Donnell on 7 April 2009 | Permalink
@ppk TEI has more complex structures for dates which can't be marked up with a simple date element
http://www.tei-c.org/Guidelines/P4/html/ND.html
I'm not sure how widely used that is. It might work for Roman consular dates, I'm not sure - the example given in the docs uses the French Revolutionary Calendar.
I've previously suggested a calendar attribute to the whatwg mailing list too, although I imagined it would operate more like lang in HTML, or calendar in TEI, and describe the calendar of the enclosed text. Something like [time datetime="1731-02-13" calendar="Julian"]2nd Feb 1730[/time] (since the old new year began in March).
25 Posted by Paul Rissen on 7 April 2009 | Permalink
An excellent and intriguing article. Definitely a valid problem, and I live in hope that there is some way that we can at least move towards some kind of Semantic Web that would be useful for historians as well as others. The thinking (and experimenting!) continues...
26 Posted by Nelson Menezes on 7 April 2009 | Permalink
@ppk, I cordially disagree. :) A calendar is a way to name days, weeks, months *and* years. Since there are no variations of scale between calendars (i.e. time is a constant; a point in one calendar maps to one point in another calendar) then you can map 12 August in "the second year of king Edward" like so:
<time datetime="0002-08-12" calendar="KingEdwardII">12th of August in the second year of king Edward</time>
or
<time datetime="1274-08-12" calendar="Julian">12th of August in the second year of king Edward</time>
For Roman dates where each year is individually named with no numeral sequence (as per your article), you could then use:
<time datetime="Cn. Pompeius Magnus and M. Licinius Crassus Dives-08-12" calendar="Roman">12th of August in the year of Cn. Pompeius Magnus and M. Licinius Crassus Dives</time>
or:
<time datetime="-0070-08-12" calendar="Julian">12th of August in the year of Cn. Pompeius Magnus and M. Licinius Crassus Dives</time>
Of course, you'd need the mapping tables between "Roman" and "Gregorian" calendars, but that should be outside the scope of HTML.
27 Posted by Nelson Menezes on 7 April 2009 | Permalink
One note on the "weird" name of the year in my example above... this would probably be applicable to other elements of the calendar as well (months, days, hours, etc), and it becomes entirely the responsibility of the calendar type to map that to a specific point in time.
28 Posted by Ken on 8 April 2009 | Permalink
@Jim O'Donnell, I really like the ideas in TEI. I think the "certainty" and "exact" attributes would be helpful for date searching and would be easily stored in a database. It would probably take some fancy Joe Celko ( http://www.celko.com/books.htm ) SQL queries for such comparisons, but it would be possible to do searches on fuzzy dates such as the BC ones ppk mentions.
The TEI also seems to capture a lot of the concepts brainstormed here such as "calendar", "zone", and "type" attributes and the , , , , and elements.
The TEI shows how complex a simple system of programmatic calendaring can be; I think having that level of complexity available to HTML when needed would be a great advantage.
29 Posted by Ken on 8 April 2009 | Permalink
the <occasion>, <dateRange>, <timeRange>, <distance>, and <offset> elements.
30 Posted by RichB on 8 April 2009 | Permalink
> BC dates (negative dates, in other words) to be specified
Obviously, Year 0 is invalid.
For historical purposes, there's no better treatment than the GEDCOM date format - the standard genealogical interchange format:
http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#DATE_CALENDAR
31 Posted by Jarrod on 8 April 2009 | Permalink
Before I get into my comment, I am going to define a couple of terms, for the purposes of this discussion:
A _Translation_ is the act of taking a date in one calendar, and (attempting to) express it in a different calendar. Your example of Proleptic Gregorian vs Julian dates is an example of this.
A _Transformation_ is the act of taking a date in one calendar, and (attempting to) express it in a different form in the same calendar. Your example above would be transforming a date "In the X year of King Y" into the familiar four-digit year form. Another example - and one I think the designers of HTML are more interested in - is the transformation of timezones.
Both tasks are much more in the mind of the HTML designers as "how do I specify the time of the party?" than historical dates.
I think your idea discusses both - calendar= and yearNames= roughly map. The issue is - if your browser encounters a time it doesn't know how to deal with, what should it do?
In order to have arbitrary calendars, the only way I can think of is ugly: [time datetime="2009-01-01" calendar="http://www.w3c.org/times/gregorian1.time "]New Years Day 2009[/time]
The file has some statement on how to translate/transform its times, and which are legal.
32 Posted by James Ojaste on 8 April 2009 | Permalink
I think it's a mistake to straight-up forbid math before a certain cutoff date.
Let's not forget that there are other fields of human endeavor that can produce dates before 300 BC (chemistry, astronomy, and geology to name a handful). The further back you go, the more imprecision you'll find (unless you're dealing with a theoretical exercise - and we should be able to mark those up as well!), but just as science has standardized on a way to signify precision in other measurements (using scientific notation and significant digits), there is no reason we can't do something similar for temporal data.
It would be nice to be able to specify a relative "zero-point" from which a date is specified. In most cases, it would be implicit - a calendar="Gregorian" attribute would place a mythical zero-point a little over 2000 years in the past. The encoder could then adjust the point of interest with something like relativeTo="2009-04-08". They could then refer to yesterday with datetime="-1d" or next year with datetime="+1a".
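A rough Python sketch of how such relative values could be resolved. The offset syntax is taken from the examples in this comment ("-1d", "+1a", with "a" read as years); the function name, the regular expression, and the decision to map 29 February onto 28 February in non-leap years are all assumptions of the sketch.

```python
import re
from datetime import date, timedelta

# A sign, a count, and a unit: "d" for days, "a" for years (annum).
OFFSET = re.compile(r"^([+-])(\d+)([da])$")


def resolve(offset, relative_to):
    """Resolve a relative value such as '-1d' or '+1a' against a base date."""
    match = OFFSET.match(offset)
    if match is None:
        raise ValueError(f"unsupported relative datetime: {offset!r}")
    sign, amount, unit = match.groups()
    count = int(amount) if sign == "+" else -int(amount)
    if unit == "d":
        return relative_to + timedelta(days=count)
    target_year = relative_to.year + count
    try:
        return relative_to.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; falling back
        # to 28 February is an arbitrary choice made for this sketch.
        return relative_to.replace(year=target_year, day=28)
```

For example, resolve("-1d", date(2009, 4, 8)) gives 7 April 2009, which is the "yesterday" of the comment's relativeTo value.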
33 Posted by Jay Fienberg on 9 April 2009 | Permalink
Great article--and I like your proposals about how to extend the markup to be more useful.
The description of time is complex, and besides the TEI document type mentioned above, references like Anglo-American Cataloguing Rules could be important precedents for marking up a variety of date / time data--e.g., dates in library catalogs represent a variety of marked-up structures that are already in wide-spread use in hypertext-like information systems.
It's not uncommon to see dates in a library catalog like "April 200?," or "March [3,4], 1974," or "before 1960" for example. And, those can be used in timelines / sorts, to echo your comments about ancient dates.
Ideas about making "datetimes that are human readable also formally machine readable" are superficial if they fail to recognize "how" they might be understood. There's no such thing as "semantic markup of time," rather there are many semantics. And, so it's great to see you asking: which semantics can and should be marked-up in HTML 5, and for whom are they intended to be useful for what purposes?
34 Posted by Mark on 9 April 2009 | Permalink
Echoing others, very nice article, looks like you've really put the effort in.
I really don't know what can/should be done about this. I agree that actual historians using the time html5 element will be very few on the ground. However, it would be great to be able to properly search by time. That'd enable a whole heap of things not even historical-related.
I might be in the "No" group - pick a date (I like the idea of basing it on when the Gregorian calendar came into world-wide use), and cut-off there.
Perhaps this dilemma should be solved in other ways than an HTML spec (as others have mentioned)?
35 Posted by Deron Meranda on 13 April 2009 | Permalink
Has anybody considered a non-calendar centric machine-readable format like JDN or MJDN?
http://en.wikipedia.org/wiki/Julian_day
They are concise, unambiguous, and not tied to any specific calendar or cut-off date.
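For readers who have not met Julian Day Numbers, here is a short Python sketch of the widely used integer formulas that convert a Gregorian or a Julian calendar date to a JDN. The function names are made up for this example. Years use astronomical numbering (year 0 is 1 BC, year -4712 is 4713 BC), and the formulas as written assume years later than -4800.

```python
def _march_based(year, month):
    """Shift the year so it starts in March, which puts the leap day
    at the very end of the shifted year."""
    a = (14 - month) // 12      # 1 for January and February, otherwise 0
    y = year + 4800 - a
    m = month + 12 * a - 3      # March = 0 ... February = 11
    return y, m


def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    y, m = _march_based(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_to_jdn(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    y, m = _march_based(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
```

Two dates in different calendars denote the same day exactly when their JDNs match. The 24 January 1918 (Julian) and 6 February 1918 (Gregorian) pair from comment 3 both come out as 2421631.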
36 Posted by Armin Hanisch on 17 April 2009 | Permalink
Agreeing with Deron, IMHO there's a point for an uninterrupted reference timeline (where Julian date is a good candidate, giving a "cut-off date" of 4713 BC!). What I do miss in the discussion is the need for a location reference to allow automatic processing of time elements: for instance, "August 10th, 1612" does not make sense without stating the location (it took until 1918 for the Gregorian calendar to become accepted in Europe), especially as we do have dates like February 30 (which is a perfectly valid date if the "yearName" is 1712 and the country is Sweden). Just my 2 cents ...
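The February 30 case can be sketched in a few lines of Python: whether a date is valid depends on which calendar, and so which place, it belongs to. The calendar labels, including "swedish" for the calendar Sweden used from 1700 to 1712, are hypothetical values invented for this example. The sketch only encodes that Sweden skipped the leap day of 1700 and added a 30 February in 1712, and it does not check that a "swedish" date actually falls in that period.

```python
CALENDARS = ("gregorian", "julian", "swedish")


def is_leap_year(year, calendar):
    """Leap-year rule for each of the calendars known to this sketch."""
    if calendar == "gregorian":
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if calendar == "swedish" and year == 1700:
        return False            # Sweden skipped the leap day of 1700
    return year % 4 == 0        # Julian rule


def days_in_month(year, month, calendar):
    if calendar not in CALENDARS:
        raise ValueError(f"unknown calendar: {calendar!r}")
    if month == 2:
        if calendar == "swedish" and year == 1712:
            return 30           # the extra day that brought Sweden back to Julian
        return 29 if is_leap_year(year, calendar) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_date(year, month, day, calendar):
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(year, month, calendar)
```

So is_valid_date(1712, 2, 30, "swedish") is accepted, while the same day is rejected for "julian" and "gregorian".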