Funny File Names
Funny File Names ... and the Ugly URLs that Love Them .. I mean, that
by Mitch Marks
For an illustration of the range of allowable names, take a look at a listing of the folder this page comes from: http://cuip.uchicago.edu/~mitchell/funny-file-names/ . Some of those characters (space, newline, tab, control characters) are illegal, however, within URLs. Others are either strictly or by convention used only at particular places within URLs (tilde, question mark, ampersand), or always stand for something special (percent). If we've ended up for some reason stuck with a file whose name is oddly made in one or more of those ways, and a URL needs to make use of the filename, but the URL can't contain those characters (or not in precisely the way they appear in the file name), then how can we link to those files? Or are they just inaccessible?

The answer is that a URL can contain an encoding for just about any character, which reaches well beyond what is possible when there is no encoding and the only way to refer to a character is to use it directly. The encoding is always a percent sign (%) followed by two hexadecimal digits. A hexadecimal digit is either a regular digit, in the range 0-9, or a letter from near the start of the alphabet, in the range a-f. When used in the percent-sign encoding in a URL, hex digits that are letters can be either upper or lower case. Even when the code begins with a zero, both hex digits are used. So for each encoded character there are exactly three characters together representing it in the encoding: a percent sign and then two hexadecimal digits.

The two-digit hex codes are given in the third and seventh columns of the table near the end of this page, the ones headed 'Hex'. With the percent sign to signal that an encoding is being used, a space in a file's name is encoded in a URL for that file as '%20', since the table shows 20 as the hex for SPACE. Similarly, a TAB gets encoded as '%09', a tilde as '%7E', and a percent sign itself as '%25'.
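The rule above (a percent sign, then exactly two hex digits) can be sketched in a few lines of Python. This is only an illustration, not part of the original page: the helper name `percent_encode_char` is made up here, and it deliberately handles only 7-bit ASCII characters. The standard library's `urllib.parse.quote` is shown alongside it for whole file names.

```python
from urllib.parse import quote


def percent_encode_char(ch):
    """Encode one ASCII character as '%' plus two hex digits (hypothetical helper)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("this sketch only handles 7-bit ASCII characters")
    # "02X" pads to two digits, so a TAB (code 9) becomes 09 rather than 9.
    return "%{:02X}".format(code)


print(percent_encode_char(" "))   # %20
print(percent_encode_char("\t"))  # %09
print(percent_encode_char("~"))   # %7E
print(percent_encode_char("%"))   # %25

# For a whole file name, the standard library encodes only the characters
# that need it and leaves letters, digits and the dot alone.
print(quote("name with spaces.htm"))  # name%20with%20spaces.htm
```

The hand-rolled helper encodes whatever character it is given, whether or not that character needs encoding, whereas `quote` leaves safe characters untouched. Both spellings refer to the same character once decoded.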
(A percent sign in a URL always signals that an encoding is being used, and never simply stands for a percent sign in the file name.)

If you've gotten stuck with some files with funny names, and are making a link to one of them in some other page, your web editor should handle the URL percent encoding on its own, if you select the file to link to through some "pick file" popup in the editor program. So understanding the encoding is mostly just going to help you understand what's going on there, and is not something you're going to need to use actively. Still, now that you know the system, you'll be able to handle it if some percent-encoded URL needs tinkering with. (Of course, a better way to solve that kind of problem would be to go back and rename those files to something less tricky.) You should also feel enabled to overrule an over-strenuous editing program which insists on percent-encoding

If you go back to the file listing of this directory, http://cuip.uchicago.edu/~mitchell/funny-file-names/, and click on one of the files, you can see in your browser's location bar or web-address space the encoding it has arranged in order to handle that file. If a similar encoding were in a link, that would provide a way to link to the oddly-named file in a portable way. You can check whether the encoding shown by your browser accords with what you could manually construct using the table. Here are some of them:

http://cuip.uchicago.edu/~mitchell/funny-file-names/name%20with%20spaces.htm for the file called "name with spaces.shtm"
We've seen that filenames can have percent signs. Sometimes these files get created unintentionally when some web editor or ftp program "sees" an encoded HTTP URL and mistakenly takes that to be the file name to upload and create on the server. Suppose the user named the file "First Page.html", and the editor properly encoded it as "First%20Page.html" for linkage purposes, but then somehow the publish or ftp program (maybe aided and abetted by the user copying something that shouldn't be copied there) uploads the file and gives it the name "First%20Page.html". That's not the file name that will be sought by the URL "First%20Page.html"! The URL "First%20Page.html" gets the file "First Page.html", because a percent sign in a URL is always for encoding. Exercise for the reader: What would be the proper encoded URL for a file named "First%20Page.html"?

To get really fanciful, a percent-encoding is allowed anywhere in the file-path part of a URL, even if not needed. So (though there's no earthly reason to) we could, if we wanted, make a link spelled out as http://cuip.uchicago.edu/%77%69%74%2f%32%30%30%31 . (Can you predict before clicking where that goes?)
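The mix-up described above is easy to check with a small Python sketch. The two helper names, `file_name_for` and `url_part_for`, are invented for this illustration. They are thin wrappers around `urllib.parse.unquote` and `urllib.parse.quote`, and they assume the server decodes the path exactly once, which is the behavior the text describes.

```python
from urllib.parse import quote, unquote


def file_name_for(url_part):
    """File name a piece of URL refers to, after one round of decoding (hypothetical helper)."""
    return unquote(url_part)


def url_part_for(file_name):
    """Percent-encoded spelling to put in a link for a given file name (hypothetical helper)."""
    return quote(file_name)


# The URL spelling "First%20Page.html" asks for the file with a real space.
print(file_name_for("First%20Page.html"))  # First Page.html

# A file that was accidentally *named* with the percent sign needs a different
# URL spelling: encoding it and decoding it again must give back the same name.
accidental_name = "First%20Page.html"
link = url_part_for(accidental_name)
assert link != accidental_name
assert file_name_for(link) == accidental_name
```

Printing `link` gives away the answer to the exercise, and passing the percent-encoded part of the fanciful URL to `file_name_for` shows the plain spelling it stands for, so both are left for the reader to run.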
ASCII is the American Standard Code for Information Interchange. It is a 7-bit code.

The following table contains the 128 ASCII characters.

Oct Dec Hex Char Oct Dec Hex Char
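The Oct, Dec and Hex columns are just three spellings of the same code number, so any row can be recomputed when the table isn't handy. The sketch below is an illustration only; the function name `ascii_row` is made up here, and it produces the numeric columns without the Char column.

```python
def ascii_row(code):
    """Return the Oct, Dec and Hex column entries for one ASCII code (hypothetical helper)."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes run from 0 to 127")
    # Three octal digits and two hex digits, zero-padded as in the table.
    return (format(code, "03o"), str(code), format(code, "02X"))


for code in (9, 32, 37, 126):  # TAB, SPACE, percent sign, tilde
    print(*ascii_row(code))
```

The last entry of each row is the pair of hex digits that follows the percent sign in a URL encoding.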