Overview
The Setting
UCLA is a big university, currently the biggest of the UCs. Worrying around enrollment time is something students at most colleges face, but the problem is magnified when there could be literally hundreds of people gunning for a spot in a class that can only seat 30. People at other colleges fuss about not being able to get out of 8 AM classes, but at least in my experience at UCLA, you hear more concerning stories far more often, all the way up to people who can't take a class they need to graduate and have to stay an extra quarter just because it filled up. It's to the point where paying others to hold spots in certain classes is a common practice, which, in my opinion, really shouldn't have to be.
For these reasons, I have recently taken an interest in recording enrollment data, mainly to answer questions like "how fast do certain classes fill up?", "which classes have a high drop rate?", "is this class more popular during the first or second enrollment pass?", and many others. Besides satisfying my own curiosity, I thought other people might want to look at the data for their own classes, so I made this website to present my findings and to let people search for their own classes in the Class Fill-Up section.
Definitions
These can get rather confusing (many people use terms like "section" and "course" interchangeably), but it's important to hammer out exactly what terms mean when I say them here.
- MyUCLA: UCLA’s official student portal. Where students see grades, how full classes are, and also where students actually perform the enrollment process. Always seems to be under maintenance during exam periods...
- Section: A particular offering of a course for a given term. The main confusion here is that many people only use "section" to refer to the discussion sections (they're like recitations) that are usually part of a course, and refer to the main portion taught by a professor as the "lecture". However, as far as UCLA's database sees things, the word "section" encompasses both of these elements.
- Class ID Number: A 9-digit ID that uniquely identifies a certain lecture or discussion of a certain course. Discussions have their own numbers, which are made by incrementing the lecture's ID. For example, Japan 5 has the Class ID Number 261015200, while its discussions are labeled 261015201 and 261015202, in order.
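The numbering rule in the last definition can be sketched in a few lines of Python. This is only an illustration based on the single Japan 5 example: the function name `discussion_ids` is made up for this sketch, and it assumes discussions are always numbered consecutively right after the lecture's ID, which may not hold for every course.

```python
def discussion_ids(lecture_id: str, count: int) -> list[str]:
    """Return the Class ID Numbers of `count` discussions under a lecture.

    Assumption: discussions are numbered lecture_id + 1, lecture_id + 2, ...
    """
    if len(lecture_id) != 9 or not lecture_id.isdigit():
        raise ValueError("a Class ID Number is expected to be 9 digits")
    base = int(lecture_id)
    # zfill keeps any leading zeros so every ID stays 9 digits long.
    return [str(base + offset).zfill(9) for offset in range(1, count + 1)]


# The Japan 5 example from the definition above.
print(discussion_ids("261015200", 2))  # ['261015201', '261015202']
```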
Collecting the Data
All of the current enrollment numbers for any section at UCLA are openly displayed on the official Schedule of Classes website, no login necessary. However, there is no data about what the enrollment numbers were in the past, so this dataset has no record of enrollment numbers before Fall 2019. To collect the data, I wrote a scraper in Go that was eventually deployed as multiple AWS Lambda functions. The current dataset runs from Fall 2019 until right before the end of Winter 2020, so Fall 2019 is the only quarter for which we have enrollment numbers all the way through.
However, I think this dataset covers the most interesting time to look at enrollment numbers, which is around halfway through the quarter, when everyone has their enrollment appointments. (See the first section for more details).
Even though UCLA doesn't seem to store the intermediate data on how many people were enrolled in a class over time (because let's be honest, that's very unimportant to the average student trying to look up classes), it does keep a record of what each section looked like at the end of each quarter, all the way back to 1999. (Example search for Math classes in Fall 1999)
This gives us at least a good estimate of what classes were being offered, and how many people were taking classes in each subject area throughout history.
There are certain courses meant more to keep track of how TAs get their credit when teaching discussion sections, and other individual study sections that aren't really what most people think of as "classes" but need to be kept in the system for record-keeping purposes. These are the 300s-500s classes, and I didn't deem them necessary to include in this dataset. Besides, enrollment isn't ever really a concern for the much smaller and catered-to graduate students, who are the usual audience for these courses.
For all the little details and sample code for how I wrote the scraper and built the database, check out
Buildings and Classrooms
Additional information about classrooms and buildings was scraped from the official UCLA Map, building list, and Classroom Grid Search. Over the past 20 years at UCLA, classroom sizes have changed with renovations, and buildings have been both built and demolished. All classroom data was scraped in February 2020 and is only current as of then; it doesn't take into account any historical changes.
Information about buildings and classrooms is stored as two tables in the database: buildings and rooms, respectively. buildings contains an id, name, abbreviation, and coordinates for every building listed by the registrar. rooms contains an id, a reference to a building in the buildings table, a room number, and the maximum capacity of the room.
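To make the relationship between the two tables concrete, here is a rough sketch of what they could look like. It uses Python's built-in sqlite3 as a stand-in so it runs without a server, whereas the real database is PostgreSQL. The column names and types are my guesses from the description above, not the actual schema, and the inserted rows are placeholders rather than real UCLA buildings.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real PostgreSQL one.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce references

# Hypothetical column names and types, inferred from the prose description.
conn.executescript("""
CREATE TABLE buildings (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    abbreviation TEXT,
    latitude     REAL,
    longitude    REAL
);
CREATE TABLE rooms (
    id          INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES buildings (id),
    room_number TEXT NOT NULL,
    capacity    INTEGER
);
""")

# Placeholder rows, not real data.
conn.execute("INSERT INTO buildings VALUES (1, 'Example Hall', 'EXH', 0.0, 0.0)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?, ?)",
    [(1, 1, "100", 30), (2, 1, "200A", 120)],
)

# List rooms with their building abbreviation, largest room first.
rows = conn.execute("""
    SELECT b.abbreviation, r.room_number, r.capacity
    FROM rooms AS r
    JOIN buildings AS b ON b.id = r.building_id
    ORDER BY r.capacity DESC
""").fetchall()
print(rows)  # [('EXH', '200A', 120), ('EXH', '100', 30)]
```

Room numbers are stored as text here because a value like "200A" would not fit an integer column.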
Data
The data set contains:
- 65 terms (99W to 20S)
- 147 departments
- 282 subject areas (keep in mind that some subject areas change names, and certain subjects come and go. For example, Arts Education has only existed since 15F, while African Languages, like many other languages, split off into their own subject areas in the same quarter, presumably to make numbering easier)
- 40,239 courses
- 353,255 sections
- Over 30,000,000 rows of hourly enrollment data
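As a quick sanity check on the term count, the codes from 99W to 20S can be enumerated in Python. This sketch assumes a term code is a two-digit year followed by W, S, or F (Winter, Spring, Fall) and that summer sessions are not counted. The helper name `terms_between` and the rule that years 90 and up belong to the 1900s are my own choices for the example.

```python
def terms_between(first: str, last: str) -> list[str]:
    """List term codes like '99W' from `first` to `last`, inclusive.

    Assumption: three regular quarters per year, in the order W, S, F.
    """
    quarters = "WSF"

    def to_index(term: str) -> int:
        yy, quarter = int(term[:2]), term[2]
        # Assumed pivot: 90-99 means 1990-1999, everything else is 20xx.
        year = 1900 + yy if yy >= 90 else 2000 + yy
        return year * 3 + quarters.index(quarter)

    codes = []
    for i in range(to_index(first), to_index(last) + 1):
        year, q = divmod(i, 3)
        codes.append(f"{year % 100:02d}{quarters[q]}")
    return codes


terms = terms_between("99W", "20S")
print(len(terms))   # 65
print(terms[:4])    # ['99W', '99S', '99F', '00W']
print(terms[-2:])   # ['20W', '20S']
```

Under those assumptions the count comes out to 65, which matches the number of terms listed above.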
I've put all the data into a PostgreSQL dump file that's about 2 gigabytes. In case you're new to PostgreSQL, the project has a really nice tutorial for first-time setup on its website. After following it, you can look at this page for instructions on how to load the dump file.
All the data I've used for this project is pulled straight from the database; I only automated some queries using psycopg2 with Python. I'll probably upload a Jupyter Notebook later to demonstrate some of the queries. If you're interested in seeing the data itself, though, it can be found in a separate directory on this website.
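Until that notebook exists, here is a rough idea of what a "how fast does this class fill up?" query could look like. It is only a sketch: it runs against an in-memory SQLite table instead of the real PostgreSQL database, the table and column names (`enrollment_data`, `section_id`, `ts`, `enrolled`, `capacity`) are hypothetical, and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of hourly snapshots; the real schema may differ.
conn.execute("""
    CREATE TABLE enrollment_data (
        section_id INTEGER,
        ts         TEXT,
        enrolled   INTEGER,
        capacity   INTEGER
    )
""")
# Made-up sample rows, not real enrollment numbers.
conn.executemany(
    "INSERT INTO enrollment_data VALUES (?, ?, ?, ?)",
    [
        (1, "2019-11-12 09:00", 10, 30),
        (1, "2019-11-12 10:00", 30, 30),
        (1, "2019-11-12 11:00", 30, 30),
        (2, "2019-11-12 09:00", 5, 30),
        (2, "2019-11-12 10:00", 12, 30),
    ],
)


def first_full(conn, section_id):
    """Return the earliest snapshot time at which the section was full, or None."""
    row = conn.execute(
        "SELECT MIN(ts) FROM enrollment_data "
        "WHERE section_id = ? AND enrolled >= capacity",
        (section_id,),
    ).fetchone()
    return row[0]  # MIN over no rows yields NULL, which becomes None


print(first_full(conn, 1))  # 2019-11-12 10:00
print(first_full(conn, 2))  # None
```

With psycopg2 the same query would be passed to `cursor.execute` with `%s` placeholders instead of `?`.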