In a recent meeting with the Dokeos 2.0 team, I realized that some things are not as obvious as they seem. One of them, which you tend to learn only through experience, is that there is no point in letting users upload files and keeping the original filenames on the server's disk. In fact, keeping them causes far more problems than renaming the files and storing them under their hashes.
Reason 1: Avoid security issues
When uploading a file to the server, you will have filters in place (won't you?). That said, you would need to be technically very good to know exactly which kinds of files are a danger to your server if accessed directly. For example, a user could upload a PHP script called, say, "test.php", containing malicious code. If a user can upload such a script and the same (or another) user can access that file directly (like http://www.example.com/media/test.php), then the request itself triggers the execution of the script on your server, with whatever malicious effect it carries.
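What is less well known is that a file called "test.php.txt" can also be interpreted as a PHP script (you didn't know, did you?). This can happen, for instance, on Apache setups where the PHP handler is mapped to the .php extension wherever it appears in the filename, not only at the end.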
There are loads of other file extensions that are dangerous to your server in one case or another, but the idea is always the same: you don't want users to be able to upload these files "as is" and then access them without your approval. You could use .htaccess rules, but then you rely on the server your application gets installed on actually honouring those .htaccess rules. In free software, that is hard to guarantee (or at least, if you require it, installation becomes complicated for the user).
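To make the idea concrete, here is a minimal sketch of storing an upload under its hash. It is written in Python for brevity rather than in PHP, and everything in it is an assumption for illustration: the function name store_upload, the choice of SHA-256, and the plain dict that stands in for a database table. It is not how Dokeos does it.

```python
import hashlib
import tempfile
from pathlib import Path


def store_upload(data: bytes, original_name: str, storage_dir: Path, index: dict) -> str:
    """Store uploaded bytes under their SHA-256 hash and return that hash.

    The user-supplied name is kept only as metadata (`index` is a plain
    dict standing in for a database table); it never reaches the disk.
    """
    digest = hashlib.sha256(data).hexdigest()
    # The on-disk name has no extension, so there is nothing for the
    # web server to map to a script handler.
    target = storage_dir / digest
    if not target.exists():
        target.write_bytes(data)
    index[original_name] = digest
    return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index = {}
        key = store_upload(b"<?php echo 'hi'; ?>", "test.php.txt", Path(tmp), index)
        print("test.php.txt is stored on disk as", key)
```

Whatever the user called the file, the name on disk is a plain string of hexadecimal characters, so the question of which extensions are dangerous no longer depends on the server's configuration.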
Reason 2: Character encoding
Reason 3: Character casing
Reason 4: Updating filenames
Reason 5: Duplication
Reason 6: Identifying duplicates
Reason 7: Load splitting
Disadvantages
When uploading inter-related files (an HTML document with its CSS and images, for example), you will have to make sure that the individual files can still be requested through your server's scripts under their normal names (not their hash names). This requires a bit more work, but is easily done by routing the requests through some "download.php" script if available (there are other possible ways).
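As a rough sketch of what such a download script has to do, the Python function below resolves a requested original name to the hash name on disk and picks the content type from the original name. The function name serve_by_name and the dict used as an index are hypothetical; a real script would query the database and check permissions before answering.

```python
import mimetypes
import tempfile
from pathlib import Path


def serve_by_name(requested_name: str, index: dict, storage_dir: Path):
    """Return (status, headers, body) for a file requested by its original name.

    `index` maps original names to hash names; in a real application this
    lookup would be a database query plus a permission check.
    """
    digest = index.get(requested_name)
    if digest is None:
        return 404, {}, b""
    # The hash name carries no extension, so the content type is guessed
    # from the original name the browser asked for.
    content_type = mimetypes.guess_type(requested_name)[0] or "application/octet-stream"
    headers = {"Content-Type": content_type}
    return 200, headers, (storage_dir / digest).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "3f2a").write_bytes(b"body { margin: 0; }")
        index = {"style.css": "3f2a"}  # shortened fake hash, for the demo only
        status, headers, body = serve_by_name("style.css", index, Path(tmp))
        print(status, headers, body)
```

An HTML page that links to "style.css" keeps working as long as that link ends up at the script, which looks up the hash behind the scenes.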
Comments
Hi Yannick,
After our meeting in Belgium I refactored the document structure in Dokeos 2.0. We now use hashes on the file system, and these hashes are stored in the database. The hashed files are also divided over several directories, depending on the first character of the hash.
We also use a download script so users don't get annoyed by the file hashes. It is used not only to actually download files, but also to show images in an image tag. We will implement the same system for our HTML documents, for example in the iframe of the learning path.
Thanks for the tip!
Best regards
Sven