Ever wanted a specific directory on your site to be available only to the people you choose? Ever been frustrated by the holes in client-side options that let virtually anyone with enough skill poke around in your source and get in? htaccess is the answer!
There are numerous methods for password protecting areas of your site, some server-side (such as ASP, PHP or Perl) and some client-side, such as JavaScript. JavaScript is not as secure or foolproof as a server-side option; a server-side challenge/response is always more secure than a client-dependent one. htaccess is about as secure as you need for everyday use, though there are methods that go beyond it. If you aren’t comfortable with htaccess, you can password protect your pages in any number of other ways, and JavaScript Kit has plenty of password protection scripts for your use.
The first thing you will need to do is create a file called .htpasswd. The naming works the same way as for the htaccess file itself, so you should be able to do that by this point. In the htpasswd file, you place the username and password (which is stored in encrypted form) for each person you want to have access.
For example, let’s use wsabstract as both the username and the password (NEVER make the username the same as the password in real life). Before encryption, the htpasswd entry would look like this:
    wsabstract:wsabstract
After the password has been encrypted (with a web-based tool, for instance), it would look something like this:
    wsabstract:y4E7Ep8e7EYV
Notice that the username comes first, followed by a colon and then the password. There is a web-based tool available that makes it easy to encrypt the password into the proper encoding for use in the htpasswd file.
For security, you should not upload the htpasswd file to a directory that is web accessible (yoursite.com/.htpasswd); it should be placed above your www root directory. You’ll be specifying its location later on, so be sure you know where you put it. Also, this file, as with htaccess, should be uploaded as ASCII and not BINARY.
Create a new htaccess file and place the following code in it:
    AuthUserFile /usr/local/you/safedir/.htpasswd
    AuthGroupFile /dev/null
    AuthName EnterPassword
    AuthType Basic
    require user wsabstract
The first line is the full server path to your htpasswd file. If you have installed scripts on your server, you should be familiar with this. Please note that this is not a URL; it is a server path. Also note that if you place this htaccess file in your root directory, it will password protect your entire site, which most likely isn’t your goal.
The require user line is where you enter the username of the person you want to have access to that portion of your site. Note that this allows only that specific user to access the directory. This is useful if your htpasswd file has multiple users set up in it and you want each one to have access to an individual directory.
The AuthName is the name of the protected area, which the browser shows in its login prompt. It could be anything, such as “OpenSesame”; if it contains spaces, wrap it in quotes. You can change the name to whatever you want, within reason.
We are using AuthType Basic because we are using basic HTTP authentication.
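If you would rather let in every user listed in the htpasswd file instead of a single named one, Apache's Require valid-user can take the place of the require user line. A minimal sketch, reusing the example path from above (which is only a placeholder for wherever you stored your own file):

```apache
# Same example htpasswd location as above; adjust to your own server path
AuthUserFile /usr/local/you/safedir/.htpasswd
AuthName "EnterPassword"
AuthType Basic
# Admit anyone who logs in with a username and password found in the file
Require valid-user
```

With this form, adding a new person means adding a line to the htpasswd file rather than editing the htaccess file.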
If you want to use SSI (server-side includes) but don’t seem to have the ability to do so with your current web host, you can change that with htaccess. A note of caution first: definitely ask permission from your host before you do this. It can be considered ‘hacking’ or a violation of your host’s TOS, so be safe rather than sorry.
Add the following code to your .htaccess file:
    AddType text/html .shtml
    AddHandler server-parsed .shtml
    Options Indexes FollowSymLinks Includes
The first line tells the server that pages with a .shtml extension (for server-parsed HTML) are valid HTML pages. The second line adds a handler, the actual SSI bit, to all files ending in .shtml; it tells the server that any such file should be parsed for server-side commands. The last line sets the options for the directory: Includes is what actually permits SSI, while Indexes and FollowSymLinks allow directory listings and symbolic links.
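The server-parsed handler is the older way of switching SSI on; Apache 2.x keeps it for backwards compatibility, but its documentation describes an output filter instead. A sketch of the same setup in that style, assuming an Apache 2.x host that lets htaccess files change Options:

```apache
# Serve .shtml files as HTML
AddType text/html .shtml
# Run .shtml files through the SSI (INCLUDES) filter
AddOutputFilter INCLUDES .shtml
# Add SSI to whatever options the directory already inherits
Options +Includes
```

The +Includes form adds SSI without replacing the options set higher up, which is a gentler choice if you don't know what your host has already enabled.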
And that’s it; you should have SSI enabled. If all your pages currently have the .html extension, they would all need to be changed to .shtml. But wait! We can leave them as .html and just add this line to the code above, between the first and second lines:
    AddHandler server-parsed .html
So then it will look like this:
    AddType text/html .shtml
    AddHandler server-parsed .html
    AddHandler server-parsed .shtml
    Options Indexes FollowSymLinks Includes
A note of caution on that one too, however. This will force the server to parse every page with the extension .html for SSI commands, even if it contains none. If you are using SSI sparingly on your site, this is going to give you more server drain than you can justify. SSI does slow down a server because it does extra work before serving up a page, although in human terms the difference is virtually unnoticeable. Some people nevertheless prefer to enable SSI in .html pages so that the file extension doesn’t reveal that they are using SSI, hoping to make the server a less obvious target for SSI hacks, which are possible. Either way, you now know how to set it up.
If, however, you are going to keep SSI pages with the extension of .shtml, and you want to use SSI on your Index pages, you need to add the following line to your htaccess:
    DirectoryIndex index.shtml index.html
This allows a page named index.shtml to be your default page, and if that is not found, index.html is loaded.
Want to block a user by IP? Is someone stalking your site from the vastness of the electron void? In your htaccess file, add the following code, changing the IPs to suit your needs and keeping each command on its own line:
    order allow,deny
    deny from 123.45.6.7
    deny from 012.34.5.
    allow from all
You can deny access based upon an IP address or an IP block. The above blocks access to the site from 123.45.6.7, and from any address within the IP block 012.34.5. (012.34.5.1, 012.34.5.2, 012.34.5.3, etc.).
I have yet to find a useful application of this, but the purpose for it is not relevant. If you need to block by IP, this is how you do it.
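The order, deny and allow directives used here are the older Apache 2.2 style. Apache 2.4 still accepts them through mod_access_compat, but its documentation points to Require as the replacement. A rough equivalent of the IP-blocking rules above, assuming a 2.4 server and reusing the same example addresses (with the leading zero dropped from the IP block):

```apache
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except this single address
    Require not ip 123.45.6.7
    # ...and every address that begins with 12.34.5
    Require not ip 12.34.5
</RequireAll>
```

If you are not sure which version your host runs, ask before mixing the two styles in one file.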
You can also set an option for deny from all, which would of course deny everyone. You can also allow or deny by domain name rather than IP address:
    order allow,deny
    allow from .javascriptkit.com
This will allow www.javascriptkit.com, virtual.javascriptkit.com, and so on.
This series of tutorials is designed for both novice and intermediate web developers to help understand htaccess. I have discovered that htaccess is an incredibly powerful tool in the right hands and I’d like to share my understanding with you.
If you have heard of htaccess, chances are that it has been in relation to implementing custom error pages or password protected directories. But there is much more available to you through the marvelously simple .htaccess file.
A Few General Ideas
An htaccess file is a simple ASCII file, such as you would create through a text editor like Notepad, UltraEdit (PC) or BBEdit (OS X). I’ve noticed some confusion over the naming convention for the file, so I’ll address that first.
.htaccess is the file extension. .htaccess is also the COMPLETE filename. It is not file.htaccess or somepage.htaccess; it is simply named .htaccess.
In order to create the file, open up a text editor and save an empty page as .htaccess (or type in one character, as some editors will not let you save an empty page). Chances are that your editor will append its default file extension to the name (ex: Notepad would call the file .htaccess.txt). You need to remove the .txt (or other) file extension in order to create an htaccess file. You can do this by right clicking on the file wherever it is saved and renaming it, removing anything that doesn’t say .htaccess.
i.e. if your file is called “somefile.htaccess.txt” remove the beginning and end so it’s only named, “.htaccess”.
htaccess files must be uploaded in ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 (RW-R--R--).
This makes the file readable by the server while letting only its owner change it. You also need to keep it from being read by a browser, which can seriously compromise your security. For example, if you have password protected directories and a browser can read the htaccess file, a visitor can learn the location of the authentication file and then reverse engineer the list to get full access to any portion that you had protected (read: you just got hacked). There are different ways to prevent this. One is to place all your authentication files above the root directory so that they are not web accessible; the other is an htaccess series of commands that prevents the file itself from being accessed by a browser (more on that later).
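As a preview of that second approach, here is a minimal sketch written in the same order/deny style used elsewhere in this guide. It assumes your host lets htaccess files set access rules; many Apache installs already block files beginning with .ht in the main server configuration, in which case this is simply redundant.

```apache
# Refuse all browser requests for the .htaccess file itself
<Files ".htaccess">
    order allow,deny
    deny from all
</Files>
```

The server can still read the file to apply its rules; only requests arriving over the web are turned away.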
Most commands in htaccess are meant to be placed on one line only, so if you use a text editor that uses word-wrap, make sure it is disabled or it might throw in a few characters that annoy Apache to no end, although Apache is typically very forgiving of malformed content in an htaccess file. It’s best practice to keep each command on one line.
htaccess is an Apache thing, not an NT thing. There are similar capabilities for NT servers, but I don’t deal with NT servers at all and it falls outside the scope of this guide.
htaccess files affect the directory they are placed in and all of its sub-directories. That is, an htaccess file located in your root directory
    http://yoursite.com
would affect
    http://yoursite.com/content
    http://yoursite.com/contents/content
You get the point: it affects everything.
It is important to note that this can be overridden (if, for example, you did not want certain htaccess commands to affect a specific directory) by placing a new htaccess file within that directory and giving it its own settings for the command(s) in question. In short, Apache applies htaccess files from the root down, and where they conflict, the htaccess file nearest the current directory wins. If the nearest htaccess file is your global one located in your root, then it affects every single directory in your entire site.
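For instance, suppose the root htaccess file turns on SSI parsing for .html files, and one directory should be left out of that. A sketch of the htaccess file you might place in that directory, using Apache's RemoveHandler directive (the directory name is hypothetical):

```apache
# /static/.htaccess (hypothetical directory)
# Undo the inherited "AddHandler server-parsed .html" for this directory and below
RemoveHandler .html
```

Everything else set in the root htaccess file continues to apply here; only the handler for .html files is switched off.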
Before you go off and plant htaccess everywhere, read through this and make sure you don’t do anything redundant, since it is possible to cause an infinite loop of redirects or errors if you place something weird in the htaccess.
Also, some hosts do not allow the use of htaccess files since, depending on what the files are doing, they can slow down a server that is overloaded with domains if all of them use htaccess files. I can’t stress this enough: you need to make sure you are allowed to use htaccess before you actually use it. Some things that htaccess can do can compromise a server configuration that has been specifically set up by the admin, so don’t get in trouble.
What do you usually do when you click a URL and encounter a “404 File Not Found” error? Do you:
1. Click on the BACK button of your browser and go somewhere else?
2. Try to back up one directory in the URL and try again?
3. Write to the webmaster of the site and the referring site to inform them of the situation?
If you are like most people, you’ll simply click on the BACK button and try another site. The majority of people don’t even know that there are any other alternatives.
You need to do something so that you do not lose this group of people who come to your site by following an old link or by typing your URL incorrectly.
Requirements for Customizing the 404 File Not Found Page
It is not possible to customize your 404 error page if your web host has not enabled this facility for your website. But I don’t know any hosting plans (besides free ones like Geocities) that don’t allow this.
The .htaccess file is what Apache web servers use to let you fine-tune the server configuration at the directory level. Other types of web servers handle the customization of 404 error pages differently, but those are isolated or very technical cases, and if you were dealing with one you probably wouldn’t be reading this article.
Step One: Creating/Modifying the .htaccess File
This step may not be necessary in all situations. Some web hosts already configure their web server so that it will look for a specific file in your web directory when a certain document cannot be found. If so, simply skip this step.
If your web server is not an Apache web server, you will have to find out from your web host what you need to do to enable the server to serve your customized file when a file cannot be found.
Otherwise, the first thing you need to do is to add the following line to a file named “.htaccess” (without the enclosing quotes and with the preceding period). In most instances, no such file will exist, and you can simply create one with a text editor (such as Notepad on Windows). So open up a text editor, paste the following code into the page and save the file as .htaccess.
    ErrorDocument 404 /notfound.html
A note about uploading .htaccess to your server: most FTP clients hide files that start with a “.” by default, so after you upload the file it may look as if the upload failed. Check your transfer logs to see whether it was successful (it most likely was). If you really want to see the file, or need to edit it later to add more rules, you can tell your FTP client to display hidden files. Because there are so many FTP clients available, I’ll only give you a hint on how to display it:
    ls -a
You will of course need to put a notfound.html file in the main web directory for the above directive to work.
The “ErrorDocument 404” directive essentially tells the Apache web server that whenever it cannot find the file it needs in that directory or its subdirectories, it is to use the document specified in the URL that follows.
One .htaccess file in your main directory will do the trick for that directory and its subdirectories. However, if you want a certain subdirectory to show a different 404 File Not Found message, you can always place a .htaccess file into that directory. This will override any .htaccess files you have in the parent directories.
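As a sketch of such an override, the htaccess file inside a subdirectory could point to its own error page. The directory and file names below are hypothetical:

```apache
# .htaccess inside a hypothetical /content subdirectory
# Missing files under /content get their own error page
ErrorDocument 404 /content/notfound.html
```

Requests for missing files elsewhere on the site would still get the page named in the main directory's htaccess file.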
Step Two: Creating Your Error Document File
What should go into your custom 404 File Not Found page?
If you simply let the visitor know that the file could not be found, they’ll most likely just hit the back button and move on. In order not to lose that visitor, you will have to provide him some way to locate the document he wanted.
Your page should have one or more of the following:
1. A link to your main page, with a suggestion that the visitor can find what he wants there.
2. If you have a search engine for your website, you should definitely put a search box on that page. Even if you don’t have a search engine, you can use a Google Custom Search quite easily.
3. A link to your site map, which lists all the pages on your website.
4. If you know of frequently mistyped URLs on your site, you can even put links to the correct location directly on the page, so that visitors who arrive there from outside can quickly get to the correct page. Remember, you don’t want to lose that visitor, so do all you can to help him.
5. Any other navigational aids that you may have - for example, if you have a drop down navigation menu on your normal pages, you should probably put one here as well.
If you like, you can even put a simple form on the page to allow your visitors to inform you of the broken link. However, the primary aim of this page is not to help you track bad links, but to make sure your visitor does not leave your site if what he wants can be found there.
Incidentally, you should make your 404 page larger than 512 bytes, even when you are testing. Otherwise Internet Explorer (IE) will load what it calls its built-in “friendly HTTP error message” instead of your 404 page.
Step Three: Testing the Error Document
When you’re satisfied with your page, upload it together with your .htaccess file to your website. Then test it by typing a URL that you know does not exist.
Your error page should load. From this error page, check that the links lead to the pages you intended.
Common Errors with a 404 Custom Error Page
1. The most common error people make with their custom error page is a mistake in the URL they put in their .htaccess file. This leads the web server into a loop when a visitor tries to access a missing file. When a file cannot be found, the server tries to load the file specified in your ErrorDocument directive. But that file does not exist either, so it tries to load the file specified in that directive. You get the idea.
2. Make sure you test your error file by typing in a non-existent URL. Do not test it by typing its real URL - that will of course work but it will prove nothing.
3. Another common error is to forget that your 404 Error Page may be loaded either from the main directory or from a subdirectory or even your CGI-BIN directory. When you put links on your 404 Document Not Found page, such as hyperlinks leading to other pages on your site or links to images (such as your logo), be sure that you use the full URL and not a relative link. That is, use things like
    <a href="http://www.example.com/sitemap.html">Site Map</a>
instead of
    <a href="sitemap.html">Site Map</a>
The first will work even if the 404 page appears for a missing file in a subdirectory, but the second will not.
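A related point about the path in the ErrorDocument directive itself. While links inside the page should be full URLs, the directive generally works best with a local path beginning with a slash: when Apache is given a full URL there, it redirects the browser to that address rather than serving the page with a 404 status. A short sketch of the two forms, using example.com as a stand-in for your own domain:

```apache
# Local path, starting with a slash: Apache serves this page and keeps the 404 status
ErrorDocument 404 /notfound.html

# Full URL: Apache redirects the browser there instead, so the original 404 status is lost
# ErrorDocument 404 http://www.example.com/notfound.html
```

The second form is commented out so that only one ErrorDocument 404 line is active if you copy the block as written.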
Conclusion
When a visitor encounters a 404 File Not Found error on your site, you’re on the verge of losing the visitor that you’ve worked so hard to obtain through the search engines and third party links. Creating your custom 404 error page allows you to minimize the number of visitors lost that way.