Web Client Programming with Perl
Automating Tasks on the Web
By Clinton Wong
1st Edition, March 1997
This book is out of print, but it has been made available online through the O'Reilly Open Books Project.
Appendix B
Reference Tables
This appendix contains several tables that will be useful when negotiating HTTP content. Covered in this appendix are:
- Media Types: Whenever an entity-body is sent via HTTP, a media type must be sent using the Content-type header. Also, web clients can use the Accept header to define which media types the client can handle.
- Character Encoding: In URL-encoded data (as described in Chapter 3, Learning HTTP), any "special" characters such as spaces and punctuation must be encoded with a % escape sequence.
- Languages: Entity-bodies can be sent with a Content-language header, to declare what language the entity is written in. Clients can declare which languages they can handle, using the Accept-language header.
- Character Sets: Clients can use the Accept-charset header to declare which character sets they are capable of handling.
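The negotiation headers named in this list can be sketched as follows. This is only an illustration, assuming Python's standard urllib.request module; the URL, the header values, and the function name build_request are hypothetical, and no network connection is made.

```python
import urllib.request

def build_request(url):
    """Build (but do not send) a request that carries the negotiation headers."""
    headers = {
        # Media types the client can handle (see Table B-1).
        "Accept": "text/html, text/plain, image/gif",
        # Languages the client can handle (see Tables B-3 and B-4).
        "Accept-language": "en-us, en, de",
        # Character sets the client can handle (see Table B-5).
        "Accept-charset": "ISO-8859-1, US-ASCII",
    }
    return urllib.request.Request(url, headers=headers)

# Hypothetical URL, used only to construct the request object.
req = build_request("http://www.example.com/")
for name, value in req.header_items():
    print(name + ": " + value)
```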
Media Types
Listed below are media types that are registered with the Internet Assigned Numbers Authority (IANA). According to the HTTP specification, use of nonregistered media types is discouraged.
The IANA media list is available in RFC 1700. A more readable document describing the assigned media types is available at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/.
A variety of methods is used to identify the media type of a document. The easiest method, but the least accurate, is to map well-known file extensions to media types. For example, a file that ends in ".GIF" would map to "image/gif". In practice, however, nothing verifies that the file is in fact a GIF file.
A more accurate method examines the structure or data format of the file and maps it to a media type. For some media types, magic numbers make this possible. For example, all GIF files begin with the three uppercase letters GIF, and all JPEG files begin with the bytes 0xFFD8 (hexadecimal notation). This method, however, is more time-consuming.
Under some filesystems, media types may be mapped by examining the file type/creator attribute of the file. While this is easily achieved under MacOS's HFS, other filesystems (DOS, NTFS, BSD) do not have these file attributes.
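The first two approaches can be sketched in a few lines. This is a minimal illustration, not a complete detector: the extension map is a small hypothetical subset of Table B-1, the function names are made up for this example, and the fallback to application/octet-stream is an assumption.

```python
import os

# Illustrative subset of an extension-to-media-type map (not all of Table B-1).
EXTENSION_MAP = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".html": "text/html",
    ".txt": "text/plain",
}

def type_from_extension(filename):
    """Fast but unverified: trust the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_MAP.get(ext)

def type_from_magic(data):
    """Slower but more accurate: inspect the leading bytes of the content."""
    if data.startswith(b"GIF"):
        return "image/gif"
    if data.startswith(b"\xff\xd8"):
        return "image/jpeg"
    return None

def guess_media_type(filename, data):
    """Prefer the magic number, then the extension, then a generic default."""
    return (type_from_magic(data)
            or type_from_extension(filename)
            or "application/octet-stream")
```

A file named photo.gif whose content starts with 0xFFD8 is reported as image/jpeg here, because the content check takes priority over the extension.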
Table B-1: Internet Media Types
| Type | Subtype |
| --- | --- |
| text | plain |
| text | richtext |
| text | enriched |
| text | tab-separated-values |
| text | html |
| text | sgml |
| multipart | mixed |
| multipart | alternative |
| multipart | digest |
| multipart | parallel |
| multipart | appledouble |
| multipart | header-set |
| multipart | form-data |
| multipart | related |
| multipart | report |
| multipart | voice-message |
| message | rfc822 |
| message | partial |
| message | external-body |
| message | news |
| message | http |
| application | octet-stream |
| application | postscript |
| application | oda |
| application | atomicmail |
| application | andrew-inset |
| application | slate |
| application | wita |
| application | dec-dx |
| application | dca-rft |
| application | activemessage |
| application | rtf |
| application | applefile |
| application | mac-binhex40 |
| application | news-message-id |
| application | news-transmission |
| application | wordperfect5.1 |
| application | pdf |
| application | zip |
| application | macwriteii |
| application | msword |
| application | remote-printing |
| application | mathematica |
| application | cybercash |
| application | commonground |
| application | iges |
| application | riscos |
| application | eshop |
| application | x400-bp |
| application | sgml |
| application | cals-1840 |
| application | vnd.framemaker |
| application | vnd.mif |
| application | vnd.ms-excel |
| application | vnd.ms-powerpoint |
| application | vnd.ms-project |
| application | vnd.ms-works |
| application | vnd.ms-tnef |
| application | vnd.svd |
| application | vnd.music-niff |
| application | vnd.ms-artgalry |
| application | vnd.truedoc |
| application | vnd.koan |
| image | jpeg |
| image | gif |
| image | ief |
| image | g3fax |
| image | tiff |
| image | cgm |
| image | naplps |
| image | vnd.dwg |
| image | vnd.svf |
| image | vnd.dxf |
| audio | basic |
| audio | 32kadpcm |
| video | mpeg |
| video | quicktime |
| video | vnd.vivo |
Character Encoding
When the client sends data to a CGI program using the Content-type of application/x-www-form-urlencoded, certain special characters are encoded to eliminate ambiguity. Table B-2 shows which characters are transformed and which are not transformed. For more information on URLs, see RFC 1738.
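Table B-2: Character Encoding
| ASCII | Symbol | CGI representation |
| --- | --- | --- |
| < 32 | | always encode with %xx where xx is the hexadecimal representation of the character |
| 32 | (space) | + or %20 |
| 33 | ! | %21 |
| 34 | " | %22 |
| 35 | # | %23 |
| 36 | $ | %24 |
| 37 | % | %25 |
| 38 | & | %26 |
| 39 | ' | %27 |
| 40 | ( | %28 |
| 41 | ) | %29 |
| 42 | * | * |
| 43 | + | %2B |
| 44 | , | %2C |
| 45 | - | - |
| 46 | . | . |
| 47 | / | %2F |
| 48 | 0 | 0 |
| 49 | 1 | 1 |
| 50 | 2 | 2 |
| 51 | 3 | 3 |
| 52 | 4 | 4 |
| 53 | 5 | 5 |
| 54 | 6 | 6 |
| 55 | 7 | 7 |
| 56 | 8 | 8 |
| 57 | 9 | 9 |
| 58 | : | %3A |
| 59 | ; | %3B |
| 60 | < | %3C |
| 61 | = | %3D |
| 62 | > | %3E |
| 63 | ? | %3F |
| 64 | @ | %40 |
| 65 | A | A |
| 66 | B | B |
| 67 | C | C |
| 68 | D | D |
| 69 | E | E |
| 70 | F | F |
| 71 | G | G |
| 72 | H | H |
| 73 | I | I |
| 74 | J | J |
| 75 | K | K |
| 76 | L | L |
| 77 | M | M |
| 78 | N | N |
| 79 | O | O |
| 80 | P | P |
| 81 | Q | Q |
| 82 | R | R |
| 83 | S | S |
| 84 | T | T |
| 85 | U | U |
| 86 | V | V |
| 87 | W | W |
| 88 | X | X |
| 89 | Y | Y |
| 90 | Z | Z |
| 91 | [ | %5B |
| 92 | \ | %5C |
| 93 | ] | %5D |
| 94 | ^ | %5E |
| 95 | _ | _ |
| 96 | ` | %60 |
| 97 | a | a |
| 98 | b | b |
| 99 | c | c |
| 100 | d | d |
| 101 | e | e |
| 102 | f | f |
| 103 | g | g |
| 104 | h | h |
| 105 | i | i |
| 106 | j | j |
| 107 | k | k |
| 108 | l | l |
| 109 | m | m |
| 110 | n | n |
| 111 | o | o |
| 112 | p | p |
| 113 | q | q |
| 114 | r | r |
| 115 | s | s |
| 116 | t | t |
| 117 | u | u |
| 118 | v | v |
| 119 | w | w |
| 120 | x | x |
| 121 | y | y |
| 122 | z | z |
| 123 | { | %7B |
| 124 | \| | %7C |
| 125 | } | %7D |
| 126 | ~ | %7E |
| 127 | (DEL) | %7F |
| > 127 | | always encode with %xx where xx is the hexadecimal representation of the character |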
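The rules in Table B-2 can be sketched as a small encoder. This is an illustration under stated assumptions: the function name form_encode is hypothetical, spaces are written as + rather than %20, and characters above 127 are first converted to bytes with a caller-chosen charset (UTF-8 by default), which the table itself does not specify.

```python
# Characters that Table B-2 passes through unchanged:
# letters, digits, and the four symbols * - . _
UNCHANGED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789*-._"
)

def form_encode(text, charset="utf-8"):
    """Encode text following the rules of Table B-2."""
    out = []
    for byte in text.encode(charset):
        if byte in UNCHANGED:
            out.append(chr(byte))            # passed through as-is
        elif byte == 0x20:
            out.append("+")                  # space becomes +
        else:
            out.append("%{:02X}".format(byte))  # everything else becomes %xx
    return "".join(out)

print(form_encode("name=Clinton Wong & co."))
```

Python's own urllib.parse.quote_plus follows slightly different rules (for example, it leaves ~ unencoded), so the sketch spells out the table's rules directly instead of relying on it.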
Languages
A language tag has the form:
<primary-tag> <-subtag>
where zero or more subtags are allowed. The primary-tag specifies the language, and the subtag specifies parameters to the language, like dialect information, country identification, or script variations. RFC 1766 contains the complete documentation of languages and parameter usage. The key values for the primary-tag and subtag are outlined in Tables B-3 and B-4, respectively.
Examples:
- de (German)
- en (English)
- en-us (English, USA)
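Splitting a tag into its primary tag and subtags can be sketched as follows. The function name parse_language_tag is hypothetical, and the limit of one to eight letters per part is taken from RFC 1766 rather than from the description above.

```python
import re

# RFC 1766 restricts the primary tag and each subtag to 1-8 letters.
_PART = re.compile(r"[A-Za-z]{1,8}")

def parse_language_tag(tag):
    """Return (primary_tag, [subtags]) in lowercase, or raise ValueError."""
    parts = tag.split("-")
    if not all(_PART.fullmatch(part) for part in parts):
        raise ValueError("malformed language tag: " + repr(tag))
    return parts[0].lower(), [part.lower() for part in parts[1:]]

for example in ("de", "en", "en-us"):
    print(example, parse_language_tag(example))
```

Lowercasing is a convenience for comparison only; it reflects the fact that language tags are compared without regard to case.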
Table B-3 lists the primary language tags as defined in ISO 639 and RFC 1766.
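Table B-3: Primary Language Types
| Primary Tag | Language |
| --- | --- |
| aa | Afar |
| ab | Abkhazian |
| af | Afrikaans |
| am | Amharic |
| ar | Arabic |
| as | Assamese |
| ay | Aymara |
| az | Azerbaijani |
| ba | Bashkir |
| be | Byelorussian |
| bg | Bulgarian |
| bh | Bihari |
| bi | Bislama |
| bn | Bengali; Bangla |
| bo | Tibetan |
| br | Breton |
| ca | Catalan |
| co | Corsican |
| cs | Czech |
| cy | Welsh |
| da | Danish |
| de | German |
| dz | Bhutani |
| el | Greek |
| en | English |
| eo | Esperanto |
| es | Spanish |
| et | Estonian |
| eu | Basque |
| fa | Persian |
| fi | Finnish |
| fj | Fiji |
| fo | Faeroese |
| fr | French |
| fy | Frisian |
| ga | Irish |
| gd | Scots Gaelic |
| gl | Galician |
| gn | Guarani |
| gu | Gujarati |
| ha | Hausa |
| he | Hebrew |
| hi | Hindi |
| hr | Croatian |
| hu | Hungarian |
| hy | Armenian |
| ia | Interlingua |
| id | Indonesian |
| ie | Interlingue |
| ik | Inupiak |
| is | Icelandic |
| it | Italian |
| iu | Inuktitut |
| iw | Hebrew |
| ja | Japanese |
| jw | Javanese |
| ka | Georgian |
| kk | Kazakh |
| kl | Greenlandic |
| km | Cambodian |
| kn | Kannada |
| ko | Korean |
| ks | Kashmiri |
| ku | Kurdish |
| ky | Kirghiz |
| la | Latin |
| ln | Lingala |
| lo | Laothian |
| lt | Lithuanian |
| lv | Latvian, Lettish |
| mg | Malagasy |
| mi | Maori |
| mk | Macedonian |
| ml | Malayalam |
| mn | Mongolian |
| mo | Moldavian |
| mr | Marathi |
| ms | Malay |
| mt | Maltese |
| my | Burmese |
| na | Nauru |
| ne | Nepali |
| nl | Dutch |
| no | Norwegian |
| oc | Occitan |
| om | (Afan) Oromo |
| or | Oriya |
| pa | Punjabi |
| pl | Polish |
| ps | Pashto, Pushto |
| pt | Portuguese |
| qu | Quechua |
| rm | Rhaeto-Romance |
| rn | Kirundi |
| ro | Romanian |
| ru | Russian |
| rw | Kinyarwanda |
| sa | Sanskrit |
| sd | Sindhi |
| sg | Sangho |
| sh | Serbo-Croatian |
| si | Singhalese |
| sk | Slovak |
| sl | Slovenian |
| sm | Samoan |
| sn | Shona |
| so | Somali |
| sq | Albanian |
| sr | Serbian |
| ss | Siswati |
| st | Sesotho |
| su | Sundanese |
| sv | Swedish |
| sw | Swahili |
| ta | Tamil |
| te | Telugu |
| tg | Tajik |
| th | Thai |
| ti | Tigrinya |
| tk | Turkmen |
| tl | Tagalog |
| tn | Setswana |
| to | Tonga |
| tr | Turkish |
| ts | Tsonga |
| tt | Tatar |
| tw | Twi |
| ug | Uighur |
| uk | Ukrainian |
| ur | Urdu |
| uz | Uzbek |
| vi | Vietnamese |
| vo | Volapuk |
| wo | Wolof |
| xh | Xhosa |
| yi | Yiddish |
| yo | Yoruba |
| za | Zhuang |
| zh | Chinese |
| zu | Zulu |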
Table B-4 lists the language subtypes as defined in ISO 3166.
Table B-4: Language Subtypes
| Subtype | Country |
| --- | --- |
| AD | Andorra |
| AE | United Arab Emirates |
| AF | Afghanistan |
| AG | Antigua and Barbuda |
| AI | Anguilla |
| AL | Albania |
| AM | Armenia |
| AN | Netherlands Antilles |
| AO | Angola |
| AQ | Antarctica |
| AR | Argentina |
| AS | American Samoa |
| AT | Austria |
| AU | Australia |
| AW | Aruba |
| AZ | Azerbaijan |
| BA | Bosnia-Herzegovina |
| BB | Barbados |
| BD | Bangladesh |
| BE | Belgium |
| BF | Burkina Faso |
| BG | Bulgaria |
| BH | Bahrain |
| BI | Burundi |
| BJ | Benin |
| BM | Bermuda |
| BN | Brunei Darussalam |
| BO | Bolivia |
| BR | Brazil |
| BS | Bahamas |
| BT | Bhutan |
| BV | Bouvet Island |
| BW | Botswana |
| BY | Belarus |
| BZ | Belize |
| CA | Canada |
| CC | Cocos (Keeling) Isl. |
| CF | Central African Rep. |
| CG | Congo |
| CH | Switzerland |
| CI | Ivory Coast |
| CK | Cook Islands |
| CL | Chile |
| CM | Cameroon |
| CN | China |
| CO | Colombia |
| CR | Costa Rica |
| CS | Czechoslovakia |
| CU | Cuba |
| CV | Cape Verde |
| CX | Christmas Island |
| CY | Cyprus |
| CZ | Czech Republic |
| DE | Germany |
| DJ | Djibouti |
| DK | Denmark |
| DM | Dominica |
| DO | Dominican Republic |
| DZ | Algeria |
| EC | Ecuador |
| EE | Estonia |
| EG | Egypt |
| EH | Western Sahara |
| ES | Spain |
| ET | Ethiopia |
| FI | Finland |
| FJ | Fiji |
| FK | Falkland Isl. (Malvinas) |
| FM | Micronesia |
| FO | Faroe Islands |
| FR | France |
| FX | France (European Ter.) |
| GA | Gabon |
| GB | Great Britain (UK) |
| GD | Grenada |
| GE | Georgia |
| GH | Ghana |
| GI | Gibraltar |
| GL | Greenland |
| GP | Guadeloupe (Fr.) |
| GQ | Equatorial Guinea |
| GF | Guyana (Fr.) |
| GM | Gambia |
| GN | Guinea |
| GR | Greece |
| GT | Guatemala |
| GU | Guam (US) |
| GW | Guinea Bissau |
| GY | Guyana |
| HK | Hong Kong |
| HM | Heard & McDonald Isl. |
| HN | Honduras |
| HR | Croatia |
| HT | Haiti |
| HU | Hungary |
| ID | Indonesia |
| IE | Ireland |
| IL | Israel |
| IN | India |
| IO | British Indian O. Terr. |
| IQ | Iraq |
| IR | Iran |
| IS | Iceland |
| IT | Italy |
| JM | Jamaica |
| JO | Jordan |
| JP | Japan |
| KE | Kenya |
| KG | Kyrgyzstan |
| KH | Cambodia |
| KI | Kiribati |
| KM | Comoros |
| KN | St. Kitts Nevis Anguilla |
| KP | Korea (North) |
| KR | Korea (South) |
| KW | Kuwait |
| KY | Cayman Islands |
| KZ | Kazakhstan |
| LA | Laos |
| LB | Lebanon |
| LC | Saint Lucia |
| LI | Liechtenstein |
| LK | Sri Lanka |
| LR | Liberia |
| LS | Lesotho |
| LT | Lithuania |
| LU | Luxembourg |
| LV | Latvia |
| LY | Libya |
| MA | Morocco |
| MC | Monaco |
| MD | Moldavia |
| MG | Madagascar |
| MH | Marshall Islands |
| ML | Mali |
| MM | Myanmar |
| MN | Mongolia |
| MO | Macau |
| MP | Northern Mariana Isl. |
| MQ | Martinique (Fr.) |
| MR | Mauritania |
| MS | Montserrat |
| MT | Malta |
| MU | Mauritius |
| MV | Maldives |
| MW | Malawi |
| MX | Mexico |
| MY | Malaysia |
| MZ | Mozambique |
| NA | Namibia |
| NC | New Caledonia (Fr.) |
| NE | Niger |
| NF | Norfolk Island |
| NG | Nigeria |
| NI | Nicaragua |
| NL | Netherlands |
| NO | Norway |
| NP | Nepal |
| NR | Nauru |
| NT | Neutral Zone |
| NU | Niue |
| NZ | New Zealand |
| OM | Oman |
| PA | Panama |
| PE | Peru |
| PF | Polynesia (Fr.) |
| PG | Papua New Guinea |
| PH | Philippines |
| PK | Pakistan |
| PL | Poland |
| PM | St. Pierre & Miquelon |
| PN | Pitcairn |
| PT | Portugal |
| PR | Puerto Rico (US) |
| PW | Palau |
| PY | Paraguay |
| QA | Qatar |
| RE | Reunion (Fr.) |
| RO | Romania |
| RU | Russian Federation |
| RW | Rwanda |
| SA | Saudi Arabia |
| SB | Solomon Islands |
| SC | Seychelles |
| SD | Sudan |
| SE | Sweden |
| SG | Singapore |
| SH | St. Helena |
| SI | Slovenia |
| SJ | Svalbard & Jan Mayen Isl. |
| SK | Slovak Republic |
| SL | Sierra Leone |
| SM | San Marino |
| SN | Senegal |
| SO | Somalia |
| SR | Suriname |
| ST | St. Tome and Principe |
| SU | Soviet Union |
| SV | El Salvador |
| SY | Syria |
| SZ | Swaziland |
| TC | Turks & Caicos Islands |
| TD | Chad |
| TF | French Southern Terr. |
| TG | Togo |
| TH | Thailand |
| TJ | Tajikistan |
| TK | Tokelau |
| TM | Turkmenistan |
| TN | Tunisia |
| TO | Tonga |
| TP | East Timor |
| TR | Turkey |
| TT | Trinidad & Tobago |
| TV | Tuvalu |
| TW | Taiwan |
| TZ | Tanzania |
| UA | Ukraine |
| UG | Uganda |
| UK | United Kingdom |
| UM | US Minor Outlying Isl. |
| US | United States |
| UY | Uruguay |
| UZ | Uzbekistan |
| VA | Vatican City State |
| VC | St. Vincent & Grenadines |
| VE | Venezuela |
| VG | Virgin Islands (British) |
| VI | Virgin Islands (US) |
| VN | Vietnam |
| VU | Vanuatu |
| WF | Wallis & Futuna Islands |
| WS | Samoa |
| YE | Yemen |
| YU | Yugoslavia |
| ZA | South Africa |
| ZM | Zambia |
| ZR | Zaire |
| ZW | Zimbabwe |
Character Sets
Table B-5 lists the character sets that may be used with the Accept-charset HTTP header. This list does not describe all of the possible character sets of international languages that can appear in the header. For a comprehensive list of character sets, their aliases, and pointers to more descriptive documents, refer to RFC 1700.
Table B-5: Character Sets
| Character Set | Description | Source |
| --- | --- | --- |
| US-ASCII | American Standard Code for Information Interchange | RFC 1345 |
| ISO-8859-1 | Latin Alphabet No. 1 | RFC 1345 |
| ISO-8859-2 | Latin Alphabet No. 2 | RFC 1345 |
| ISO-8859-3 | Latin Alphabet No. 3 | RFC 1345 |
| ISO-8859-4 | Latin Alphabet No. 4 | RFC 1345 |
| ISO-8859-5 | Latin/Cyrillic Alphabet | RFC 1345 |
| ISO-8859-6 | Latin/Arabic Alphabet | RFC 1345 |
| ISO-8859-7 | Latin/Greek Alphabet | RFC 1345 |
| ISO-8859-8 | Latin/Hebrew Alphabet | RFC 1345 |
| ISO-8859-9 | Latin Alphabet No. 5 | RFC 1345 |
| ISO-2022-JP | Japanese | RFC 1468 |
| ISO-2022-JP-2 | Extension of Japanese in ISO-2022-JP | RFC 1554 |
| ISO-2022-KR | Korean | RFC 1557 |
| UNICODE-1-1 | Unicode for MIME | RFC 1641 |
| UNICODE-1-1-UTF-7 | 7-bit UCS Transformation Format | RFC 1642 |
| UNICODE-1-1-UTF-8 | 8-bit UCS Transformation Format | N/A |
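A client should only advertise character sets it can actually decode. The sketch below is one way to check that, assuming Python's standard codecs module as the decoder; the function names and the X-NO-SUCH-CHARSET entry are hypothetical, and whether a given name from Table B-5 is recognized depends on the Python installation.

```python
import codecs

def decodable(names):
    """Keep only the character set names this Python installation can decode."""
    usable = []
    for name in names:
        try:
            codecs.lookup(name)   # raises LookupError for unknown names
        except LookupError:
            continue
        usable.append(name)
    return usable

def accept_charset(names):
    """Build an Accept-charset header value from the usable names."""
    return ", ".join(decodable(names))

# X-NO-SUCH-CHARSET is a made-up name, included to show the filtering.
print(accept_charset(["ISO-8859-1", "US-ASCII", "X-NO-SUCH-CHARSET"]))
```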
© 2001, O'Reilly & Associates, Inc.