Assignments in LMS: How to cope with disk space issues

In this article, we'll talk about managing an LMS when every assignment handed in by students has to go through the LMS, and what that means if you have several thousand students. This case is based on one of our customers, which provides in-class English courses to about 55,000 students in a monthly cycle. The institution started working with us in 2012, so we have been developing their custom Chamilo LMS setup for 5 years already. Thanks to a good alignment between their management's vision and ours, they have been able to progressively offer more value to their students. Where each and every student used to hand in monthly (end of cycle) assignments of a few A4 sheets of paper, all students have been required, since early 2015, to upload their work online, either from the institution's facilities or from home.

Environmental benefits

Before diving into the more technical issues at hand, let's briefly look at one of the benefits an LMS can provide. With an average of 1.5 sheets of paper per student per month, the institution will save a staggering 990,000 sheets of paper (55,000 students × 1.5 sheets × 12 months) from being printed this year and dumped a little later. [Image removed] Going by the estimate published by ePayPlus (whose illustration we are using above), the savings generated by this institutional decision are:

  • 120,780 liters of water
  • 6,732 kilograms of CO2 (almost 7 tons of CO2)
  • 495 kilograms of landfill waste
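The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch: the per-sheet factors below are back-computed from the totals quoted above (total divided by 990,000 sheets), so they are an assumption rather than the coefficients published by the original estimate.

```python
# Figures quoted in the article.
STUDENTS = 55_000
SHEETS_PER_STUDENT_PER_MONTH = 1.5
MONTHS_PER_YEAR = 12

# Assumption: per-sheet factors back-computed from the article's totals,
# not taken from the original estimate's own documentation.
WATER_L_PER_SHEET = 0.122
CO2_KG_PER_SHEET = 0.0068
WASTE_KG_PER_SHEET = 0.0005


def yearly_sheets(students, sheets_per_month, months=MONTHS_PER_YEAR):
    """Number of sheets of paper avoided over one year."""
    return students * sheets_per_month * months


def yearly_savings(sheets):
    """Environmental savings for a given number of sheets avoided."""
    return {
        "water_liters": sheets * WATER_L_PER_SHEET,
        "co2_kg": sheets * CO2_KG_PER_SHEET,
        "landfill_kg": sheets * WASTE_KG_PER_SHEET,
    }


if __name__ == "__main__":
    sheets = yearly_sheets(STUDENTS, SHEETS_PER_STUDENT_PER_MONTH)
    print(sheets, yearly_savings(sheets))
```

Changing the number of students or sheets per month gives a quick estimate for an institution of a different size.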

This doesn't take into account the waste and CO2 generated by the datacenter. With *one single physical server* handling the whole system (plus passive backups), however, that is very unlikely to exceed 1 ton of CO2 per year, which still leaves a net saving of at least 5 tons of CO2 kept out of our atmosphere. While the results of this professional Chamilo implementation are really impressive, this article is about the challenges of managing such a system.

General introduction

Chamilo is a very lightweight LMS. It can run off a Raspberry Pi B+ or an XO (of the One Laptop Per Child project). It has been able (so far) to handle up to 800K registered students like a breeze and has features that make it easy to configure on a scalable cloud hosting platform (with cases demonstrated on Google Cloud Engine, AWS, Digital Ocean, Rackspace's OpenStack and others). It offers numerous tools for teachers and is widely recognized as one of the easiest (if not the easiest) LMS for teachers to manage, with teacher training 2 to 5 times faster than with Moodle, for example (which results in *massive* savings when introducing the platform to the staff of an institution of 1,000+ teachers - just think about saving 3*1000 training days of wages...).

Managing monthly assignments from 50K+ students

Now obviously, these CO2 savings come at a cost in terms of management complexity. In short, we currently see more than 90GB of assignments uploaded to the server each month, and this increases slowly with the growing number of students and the growing size of media resources, in particular larger pictures and the progressive introduction of homemade videos as part of the assignments.

[Image removed]

The chart above shows the number of GB received each month. You can see how the volume moved from about 60GB in early 2015 to more than 90GB in early 2017, a 50% increase. Once we factor out a 12% growth in the number of students (1.50 / 1.12 ≈ 1.34), we are still facing a 34% increase in average assignment size in two years, or about 17% per year. Or, if represented in a cumulative chart, we get something like this (2TB after 2 years).

[Image removed]

Now obviously, this raises a few questions. Compared to early 2016, we are already at more than double the amount of space used, and we are also reaching the limits of standard disk space. So the questions are:

  • What do we do with all these assignments?
  • How far will this grow and what capacity will we have to plan for to still be OK 5 years from now?
  • Virtual or dedicated servers?
  • Hosted or on-premises?

These are all valid questions, but the answer, sadly, depends on the use case. Let's focus on what to do with the assignments and, to some extent, how to plan ahead. If you are interested in the other questions, our team of consultants will be more than happy to review your case. Just drop us a message through the contact form (top menu).
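As a starting point for planning ahead, the growth figures above can be turned into a rough projection. The sketch below is not our capacity model: it assumes that the observed growth (60GB to 90GB per month over two years) simply continues at the same compound rate, and it starts from the 2TB already accumulated. The function names are purely illustrative.

```python
def annual_rate(total_growth, years):
    """Compound annual rate equivalent to a total growth over several years."""
    return (1 + total_growth) ** (1 / years) - 1


def project_storage(monthly_gb, annual_growth, years, stored_gb=0.0):
    """Return the cumulative GB stored at the end of each projected year.

    Assumption: monthly uploads grow smoothly at the given annual rate.
    """
    monthly_factor = (1 + annual_growth) ** (1 / 12)
    totals = []
    for month in range(1, years * 12 + 1):
        stored_gb += monthly_gb
        monthly_gb *= monthly_factor
        if month % 12 == 0:
            totals.append(round(stored_gb))
    return totals


# Figures from the article: 60GB/month in early 2015, 90GB/month in early 2017,
# with a 12% growth in the number of students over the same period.
volume_growth = 90 / 60 - 1
per_student_growth = (1 + volume_growth) / 1.12 - 1
yearly_volume_growth = annual_rate(volume_growth, 2)

if __name__ == "__main__":
    print(f"per-student growth over 2 years: {per_student_growth:.1%}")
    print(f"compound yearly volume growth:   {yearly_volume_growth:.1%}")
    # Cumulative storage for the next 5 years if nothing is ever deleted.
    print(project_storage(90, yearly_volume_growth, 5, stored_gb=2000))
```

Note that the compound yearly rate is slightly lower than the simple "divide by two" average used in the text. Either works for a rough estimate, but compounding matters more over a 5-year horizon.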

Case 1: Assignments volatility (or "pruning")

In some cases, like the one above, assignments are not really important in the long run. What's important is the score given by the teachers. While the assignment matters for the student's evaluation, it is not the only resource taken into account for his/her grading. And as the one-month cycle dictates that a student must get his/her grade immediately, the complaint process in case anything went wrong is relatively quick. Some (very few) of the lesson cycles last 2 months, so the longest we have to keep a single assignment (the "original") in the system is 2 months *after* any given cycle. At any given time, you therefore only store assignments for 3 months (the current month + the 2 previous ones). This gives a totally different pattern of disk usage, as you can see in the chart below.

[Image removed]

In technical terms, deleting what is no longer necessary is often called "pruning". Fair enough, that reduces our disk usage by about 80%, but it is still growing, year on year. Also note that this only looks at assignment storage. An LMS will store *many* more items (forum contributions and image uploads, wikis, videos of course contents, etc). So size issues are not fixed, but this is a good way to considerably reduce the urgency of the issue. Obviously, this only holds if you can delete the assignments after a few weeks/months: increase that "pruning cycle" to 9 months and you're already in a difficult situation again. It is also important to note that just deleting the files on disk will generate inconsistencies between the database and the filesystem, and that deleting 55,000 assignments from the interface might not be the most useful use of your time. What can we do to help you out? Well, we developed a series of scripts in Chamilo that can help you automate the deletion of these assignments within a given prune cycle. Interested? Check the tests/scripts/ directory in the development repository of Chamilo, or contact us for support!
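To illustrate the idea (and not the actual Chamilo scripts, which are written in PHP and work on the real database), here is a minimal Python sketch of a prune cycle. The `Assignment` record and the in-memory `assignments` dictionary are hypothetical stand-ins for the database. The point is that the file and its record are removed together, so the database and the filesystem stay consistent.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Assignment:
    """Hypothetical record; this is NOT Chamilo's actual schema."""
    id: int
    path: Path
    cycle_end: date


def months_between(earlier, later):
    """Whole calendar months separating two dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def prune(assignments, today, keep_months=2, dry_run=True):
    """Remove assignments whose cycle ended more than keep_months ago.

    `assignments` maps an id to an Assignment and stands in for the database.
    Returns the ids that were (or, in dry-run mode, would be) pruned.
    """
    expired = [a for a in assignments.values()
               if months_between(a.cycle_end, today) > keep_months]
    for a in expired:
        if dry_run:
            continue
        # Delete the file first: if this fails, the record is still there
        # and the next run can retry.
        a.path.unlink(missing_ok=True)  # requires Python 3.8+
        del assignments[a.id]
    return [a.id for a in expired]
```

With `keep_months=2`, a run in April keeps the assignments of April, March and February, which matches the "current + the 2 previous months" rule described above. Running once with `dry_run=True` before deleting anything is a cheap safety net.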

Case 2: Auto-compression

A more complex solution is to compress all documents sent by students. Although this does not exist in Chamilo itself, we have experience in developing similar solutions, where documents are compressed on the fly when uploaded, and uncompressed before being handed out to the user. This ensures that all documents remaining on disk use less space than their "normal", uncompressed version. However, due to the higher requirements in processing power and the very light footprint of Chamilo, we cannot add this to Chamilo by default. In our experience, this provides a 20% volume reduction (this can vary highly depending on the type of documents). Not that big a benefit by itself, but if you are above 2TB, this can buy you a few months before having to get a new drive (and take on the long process of transferring all your data from one disk to another). Sadly, this also increases the CPU load by about 10% (this can be much higher if the upload/download of documents is one of the main activities on your portal). It requires setting up a hidden extra field for documents in Chamilo, and a series of complex changes to the documents or assignments handling system. Again, if you feel like this is the solution for you but you don't know how to do it, contact us.
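The principle can be sketched in a few lines of Python using the standard gzip module. This is an illustration only, not the code of our implementations: the function names are hypothetical, and the returned boolean plays the role of the hidden extra field mentioned above, telling the download side whether the file on disk is compressed.

```python
import gzip
from pathlib import Path


def store_upload(data: bytes, dest: Path) -> bool:
    """Write an uploaded document to disk, compressed when it is worth it.

    Returns True if the stored file is gzip-compressed. Assumption: this flag
    would be saved in a hidden extra field next to the document record.
    """
    packed = gzip.compress(data)
    if len(packed) < len(data):
        dest.write_bytes(packed)
        return True
    # Already-compressed formats (JPEG, MP4, ZIP...) often do not shrink:
    # keep the original bytes and avoid the CPU cost on download.
    dest.write_bytes(data)
    return False


def read_upload(path: Path, is_compressed: bool) -> bytes:
    """Return the original document, uncompressing it if needed."""
    raw = path.read_bytes()
    return gzip.decompress(raw) if is_compressed else raw
```

Storing the flag per document, rather than compressing everything blindly, is one way to limit the extra CPU load for files that would not benefit from compression anyway.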

Case 3: Glacier and Co

If you are hosting your solution on the cloud, then Glacier at AWS (or similar long-term storage solutions) is the way to go. Obviously, if you want to keep an assignment for the whole duration of a student's 5-year study career, you are very unlikely to need his/her assignments from a year earlier. The file will only be necessary if there is an error in the academic or administrative process. This is exactly the type of circumstances where Glacier makes itself useful: storing stuff that will almost never be used, but that needs to be stored. This implies a little more fiddling in Chamilo, as you will have to register an "expiration date" by type of resource, and set up an automated process that sends the files to Glacier and changes the link to the file in the database to make the new location clear to Chamilo. But at $0.004 per GB per month (~$4/TB/month), this can really be interesting. For the customer case described above, storing all the files from 2015, 2016 and 2017 would only cost about $150/year by the end of 2017, plus the initial cost of sending the files and the possible cost of recovering some of these assignments. Roughly, you should probably get to $500 per year (because sending over 600,000 files there will cost money).
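The storage part of this estimate is easy to check. The sketch below uses the price quoted above and assumes about 3.2TB stored by the end of 2017 (the 2TB accumulated after two years, plus roughly 1.2TB more for 2017 at 90GB+ per month). It deliberately leaves out request, transfer and retrieval fees, which depend on the provider's current pricing.

```python
# Price quoted in the article; check the provider's current price list.
PRICE_PER_GB_MONTH = 0.004

# Assumption: ~2TB after two years plus ~1.2TB uploaded during 2017.
STORED_GB_END_2017 = 3200


def yearly_storage_cost(stored_gb, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Yearly cost in dollars of keeping stored_gb in long-term storage.

    Only covers storage: upload requests and retrievals are not modelled.
    """
    return stored_gb * price_per_gb_month * 12


if __name__ == "__main__":
    print(f"${yearly_storage_cost(STORED_GB_END_2017):.2f} per year")
```

This lands close to the $150/year mentioned above. The gap between that and the rough $500 figure is the one-off cost of sending the files and occasionally recovering some of them.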

BeezNest's services

BeezNest is one of the companies that joined the Chamilo project before its official launch in 2010. We've been working on Chamilo ever since, and in 2015 the Chamilo Association named BeezNest the responsible editor of Chamilo LMS, in recognition of the unparalleled contribution we have made to the software, contributing more than 90% of the total Chamilo codebase. We thrive by providing Chamilo consulting, development, training, hosting and support services at the highest possible level of quality. If you have a Chamilo problem, if no one else can help, maybe you can hire the BeezNest team! Contact us through the form in the contact tab. We have customers in many countries (including the UK, the US and Canada) and technical teams in Belgium, France, Germany, Spain, Uruguay and Peru.