Slow Data in a Hurried Age*

* This is a play on David Mikics's Slow Reading in a Hurried Age. As someone whose research is about building tools for decision-making with data, I find myself at odds with how these tools are often used. Mikics describes how we have lost the joy of reading books because of how hurried our world has become; our reading attention spans are now limited to tweets and other forms of bad writing. I think the hurried age also applies to how we make decisions with data, so below is my attempt at describing the problem. In the future, I will try to propose ways of slowing things down.

Part 1: The Gated Data Nullius

What purpose does data serve?

When we began to build abodes for data, the databases, from cave walls to stone tablets to blue ledgers, to Oracle™s and finally to lakes in the cloud, we started with a simple desire: to keep record — a shared contract of what happened, when, where and how, and what should happen in the future as of now. All this kept us civil: “this is our story for the record.”

I remember over-stuffed, tattered, beige folders of patient files shuffling around my mother’s clinic: papers of test results, scribbled notes, and pink and yellow carbon-copy prescriptions peeking from every side, held by an invisible force in each folder. Each one was a joint story of human interaction across time, place and technology. As with all stories, much is left out. Tests were done in February; Results came back a week later; Prescriptions were given in March; More tests; More results; The patient is back. From these fragments, my mother would skilfully fill the narrative gaps. As Ali sat down, she could tell he wasn’t taking his meds, the very ones she prescribed in March. He was giving the meds to his brother who couldn’t afford them and seemed sicker! But he was getting sicker too, as the record showed, even after accounting for Ali’s confession: he wasn’t fasting the day they ran the tests. Like every day, he couldn’t resist his morning tea with four heaping spoonfuls of sugar. The full story lives off the record. Rich lives, paltry records.

When I grade labs, exams, or papers, I often think of those folders. I submit my grades on Gradescope, thinking here is my feedback for the record, yet fully aware of the richness of in-class conversations, the 1-on-1s, the circumstances of my students and the learning (or lack thereof) beyond what the record shows. My students do the same thing when they fill in course evaluations: here is what we thought of the class for your record, fully aware of the richness of all our interactions inside and outside the classroom.

But data are not just records! A “database” is an antiquated term. Data doesn’t live in a base. It’s part of a set; it’s pulled from a log; it’s generated from a model. It’s beyond you and me and it doesn’t care for our full stories. How did the database become the data set? Well, a couple of things happened:

First Development: We stopped being deliberate about what we retained. When we tallied inventory in ledgers or kept notes in records, there was a clear sense of limits and costs. Maintaining a record took time, effort and physical space1. There was a visible cost to retention. So we only kept what we needed: we were deliberate. But when data retention ceased to have a visible cost — data centers are hidden from sight, often across continents; once paid, the subscription, licence or storage costs create a perverse sunk-cost fallacy, in that we ignore the cost and assume data lives for free! — we gave up our deliberations. Keeping track of things with exacting precision does not require much, or any, effort: we record when we track things, down to the millisecond2. We track how long a patient spent with the doctor and how long it took to grade one answer versus another. We don’t need to store notes on our interactions; we can and we will store every um and err.

Second Development: We stopped owning our stories. As the beige folders made their way into my mother’s office, I could see the people in the waiting room perk up: “is that mine?” They felt they owned the folder and the fragments of their story that it held. But who owns data? The minutes it took for a patient to be seen; the diagnostic codes that justify insurance claims for tests, referrals and refills; and so on. Whose stories (if any) are they representing? The physical folders are gone, and with them the invisible force that tied together the doctor, the patient, the nurses, and many others across time and place as they ritually kept record of their many interactions. The insurance company keeps the data and decides our insurance premiums even if we never engaged in “keeping record” with the insurer. The medical group keeps the data and pushes wellness checks on unwell patients. The clinic, the university, the hypermarket, the bank, the city … we don’t see our stories, yet this data, wrapped around our records, forces its way into our lives.

We have grown familiar and even comfortable with the disappearance of the ownership of our narratives even when we commit to writing our own stories. On social media, there are no owned stories, only data. A post generates likes, reposts, and maybe responses. Our posts and responses are stripped of the richness of our lives; they don’t even rise to paltry records because we don’t even try to share the context through which we write our fragments (we can’t) or read others’ (we don’t ask).

From these developments emerges the data nullius: like terra nullius, or no man’s land, it is data that no individual owns. Unlike the commons, we don’t govern it for collective use either. Yet it is kept and gated with varying degrees of access.

So what purpose does data serve?

Quick takes on critical issues! I don’t mean this cynically. It is just the way it is. Imagine a boardroom, where you or someone else is sitting around representing an institution. A well-meaning concern is posed. Either to prioritize or discredit the concern, someone announces “Show me the data!” The data person3 pipes up: “well we have this data that we can use to answer this question with … (5 minutes of data analysis jargon go by; words like correlation and causation are thrown around to signal deep knowledge of all things data) … it isn’t exactly this and of course we would have to do further analysis but we can show … (5 more minutes!)” The newly formulated question has nothing to do with the concern, but it has the right data parts. The data itself is not substantial enough to answer the question, but that is secondary, future work, to be done after the first take is presented at the next boardroom meeting.

The presence of any data (Development 1) drives this behavior. Pushing back is seen as ludicrous. Surely you would want a quick handle or even an approximate answer before you invest any further resources into studying an issue or handling it. Resources are constrained. We have no time. There is also a feedback loop at play here: data begets data. Resources are put into the secondary future work to gather whatever additional data can be easily gathered, joined, merged, etc. (Development 1). We can now answer even more questions motivated by, but disconnected from, deep concerns. And so we operate in a world of compounded, approximate and bad answers to irrelevant questions.

Let me give an example. Across universities, faculty are concerned that using course evaluations to evaluate teaching effectiveness is leading to grade inflation4. Brought to a university’s boardroom, the data person formulates the following irrelevant question: “are student grades correlated with course evaluations?” Some analytical work is of course needed to account for discretized, finite grades, non-normal distributions, non-linearity, etc. All of this is explained along with the required “correlation is not causation” to provide the necessary due diligence that assures everyone is aware of the imperfection as they nod to “let’s see what the data says”5. Yet no matter how much data6 exists, it cannot uncover whether faculty take a more lenient grading approach in fear of poor evaluations. Here are some reasons why:

  • Many faculty-student-administrator interactions: If this concern holds, grade inflation is probably strongest when we have a large proportion of (i) must-get-an-A students (or perceptions thereof), (ii) risk-averse yet impartial faculty: the combination of holding equity dear, yet wanting to get tenured, promoted or renewed, and (iii) data-driven administrators who believe course evals should determine teaching effectiveness. But we could have variations on this: smaller proportions, pragmatic (a-C-would-do) students, risk-tolerant or partial faculty, etc. Moreover, each of these plays into a noisy, person-specific, interaction-specific function of how much additional grade is tacked on. From Development 2, we know that the data doesn’t come with any of that context, even if the record between the faculty and the student does.
  • Many other factors: Perhaps grade inflation happens because of many factors that may include course evaluations, changes in knowledge expectations, or faculty simply getting better at teaching.
  • It’s time-evolving: New faculty might come in with certain cultural norms of grading. Perhaps they were stringent and received poor evals. They sought advice from a colleague who said “just give them good grades.” After some internal moral qualms, the faculty did become lenient and evals did improve. The faculty passes the same advice on to newer faculty. They got tenured and then found a new sense of risk tolerance. They make the material more challenging and they are more confident docking students’ grades, but the evals remained the same. They now claim that evals are not influenced by grades. This is just one complex trajectory for one faculty member that explains the interplay of grading norms and evals.

To all these reasons and any more, the data person has rigorous solutions: rigorous methods that are less sensitive to the violation of their assumptions; causal models; data curation (e.g. let’s look at large class sizes taught by multiple senior instructors within the same year — so calculus 101?!). Yet the violation of an assumption should invalidate the result; causal models cannot account for unaccounted causes and often require more data than what is available for robustness; and at what point does the curated data stop generalizing to the entire university? Debating the data person can go on indefinitely.

Let me be clear, I am not making an argument on the validity of the concern: it may or may not be valid. I am arguing against the presumption that with data, especially what is often readily available, we can figure it out. I’m also pointing out that the emergence of the data nullius makes it increasingly difficult to argue against a fast data approach. Back in the boardroom, imagine a voice that says: “hold on, that is not what records of grading are meant to be used for; neither are records of course evaluation.” If that voice does emerge, it will quickly be quelled with irrelevant responses: “Oh we are doing this analysis in the aggregate, we are not violating anyone’s confidentiality.” But it isn’t about confidentiality; in fact, the extraction of a data point from the very context it was created within is the problem. But the voice rarely emerges because we are familiar and comfortable with the disappearance of the ownership of our stories.

Again, these are rough ideas; feel free to comment.

  1. The tangible nature of a record meant it also didn’t travel much or far. ↩︎
  2. A nursery app would ping me the very minute my toddler had a bowel movement! ↩︎
  3. At this point, it is fair for you to ask “Azza, are you the data person?” I’m not the data person but I also plead the 5th 🙂 ↩︎
  4. This is an age-old concern. There are lots of empirical studies, some showing evidence of this and others showing none. I learned that in the US a possible driver for grade inflation was the Vietnam War; to protect failing students from being conscripted, faculty boosted their grades! ↩︎
  5. The presence of empirical studies with other universities does not detract from the data analysis effort: (1) we have the data, so we can do it! (2) we are unique. ↩︎
  6. There might be other ways to determine how prevalent grade inflation might be at an institution and what drives it, but none of them are easy or fast or without problems: Qualitative interviews (Too long); Surveys (Few answer them, fewer truthfully); Logical reasoning (Ah, the philosophers); Game-theoretic formulations and simulations (Ah, the economists); Randomized Controlled Trials (Ah, more economists … What? You want to do what?); We build a multi-factorial AI model that predicts grades or course evals and we examine the weights assigned to course evals or grades (Get out of here!). ↩︎

A Theory for Technology Design that Empowers

In the discussion of technology, we often find a refrain, at least among the more sensible scholars: “Technology is neutral, it is what we do with it that matters.” This makes sense as it stands in contrast to the marching chants of techno-defeatists: “Technology progresses on, as if it has a mind of its own and will continue to persevere, survive and grow like other sentient beings, evolving to finally achieve its personhood like humans – the pinnacle of evolution.” Both claims are deficient. Technology is designed, and every technological solution carries its designer’s framing, which colors the personality of a tech even if it never achieves personhood; this framing determines its relationship to us as humans.

Let me grossly simplify, generalize and state that designers think of the humans they design tech for in one of two ways:

  1. Humans are flawed, technology aims to overcome, or
  2. Humans are potent, technology aims to empower.

These diametrically opposing perceptions of humans by designers may appear subtle with respect to a technological solution, but they are not. Technology interacts with humans in amplifying, self-fulfilling loops: a technology rooted in a human flaw will only amplify the flaw, sap us of our potential and leave us weaker and more dependent on the technology itself. A technology rooted in a human potential will only amplify our power, leaving us stronger and self-fulfilled. Designers in the first camp are in essence self-haters1: they see weakness in us, hate it and aim to eradicate it. Those in the second camp are empowerers: they celebrate human potential, see our strengths and aim to grow them. As a designer, I can attest that it is easier to design from within the self-hating camp. After all, weaknesses are problems and we enjoy solving problems. Whether what was identified is indeed a weakness or a problem rather than a feature is beside the point, once the design process begins.

Two examples ought to make my arguments concrete. A majority of technology is born out of self-hate, so let’s start with one technology that isn’t.

The Bicycle as Empowering Technology

Riding a bicycle brings me immense joy. A bicycle cannot be born out of a perception of humans as flawed2. It celebrates our keen sense of balance — who would have thought it wise to stay upright on less than one inch of rolling tyres; it celebrates our capacity to grasp our surroundings as we dart around potholes, pedestrians, or car doors that suddenly open; it celebrates the physical strength of our quads and hams as we climb up hills and travel longer distances. A bicycle empowers us to go further because we can. In the self-fulfilling form of any technology: the more you bike, the better you balance, the more acute your reflexes get and the stronger you become. Bicycles bring joy because they let us lead; they extend our powers rather than replace them.

The Gamified Exhibit as Self-Hating Technology

My husband and I recently took our boys (6 & 4) to a children’s exhibit. The beautiful exhibit juxtaposed artworks of stars, planets and galaxies with historical and contemporary artefacts that allowed humans to peer ever more deeply into the skies and space. For children naturally curious about and interested in space, the exhibit was bound to leave an everlasting impression of wonder. Except for the self-hater in every designer who couldn’t trust that the heavens, space, art and history alone can excite a child to wander in awe and learn about the cosmos. Starting with a human flaw — children lack motivation — a technological solution is proposed: let’s gamify this experience! And so begins the design process of complex technology layering that created the following:

On entering the exhibit, two avatars construct a narrative to enlist the children’s help. There has been a system glitch and the databases holding key information about humans and the cosmos are corrupted. Don’t lose hope! If you complete a series of exercises, you can fix the databases.

Accepting the mission earns each child a barcode bracelet that tracks their progress towards fixing the glitches: a jumbled-up digital copy of a centuries-old painting, a disordered compass, mislabeled planetary objects, etc. Our children are ecstatic. As we follow along through the 15 or so stations, we find ourselves explaining one tech glitch after another. There is no time to experience the beauty of any artefact, appreciate its perseverance through time, or consider its meaning. We trudge along: scan, fix, collect points. Hooray, a badge! Then things take a darker turn: competition creeps in. “Mom, let’s do this faster! Dad is already two stations ahead.” “Dad, they have more points! Do it right!” Now we are not only moving through a repair factory line, we are fighting off intensifying feelings of injustice. We are one step away from full catastrophic meltdowns.

Technology did overcome! We completed all the tasks. In order. Two email certificates attest to that. Problem solved.

The self-fulfilling, amplifying effects do not end on exiting the exhibit. The self-hating design is analytically confirmed to be superior: “look, all the tracking data shows success; the kids are going through all the stations; they saw all the things we wanted them to see.” Our kids were robbed of their potential to just enjoy wandering aimlessly through broken clay pots, metal spears, historical gadgets, art and depictions of the cosmos. Now, a gamified interest-production layer is required because museums need to be covered and completed. Aimless wandering, and lingering at the one or two things that spiritually speak to you, can only be a flaw that ought to be technologically eradicated.

As I prepare for my Spring course on Techruption, I find myself dwelling on these two design mindsets. Perhaps every now and then I’ll post about a technology that I find emanating from one design camp or the other and perhaps I can make my arguments more nuanced. I would appreciate a conversation on this, so feel free to comment or reach out.

  1. A benefit of a blog post is the ability to state things a bit more controversially and strongly 😉 ↩︎
  2. We can’t really go back to the psyche of the first person who designed a bicycle so we are going to engage in a bit of logical fallacy and affirm the consequent. ↩︎


The Writing Rules

These rules are for my research lab.

Do not use ChatGPT or any other genAI tool. Since 2022, my students and my research team have been increasingly using ChatGPT to help (re)write sentences, abstracts, or even entire paragraphs and sections of a paper. The argument is that it looks more professional and has fewer errors. As a reader, I am tired of reading generated text: it is often repetitive and shallow. It can miss entirely the point you are trying to make or, worse, make up an entirely different point. You can write it better. Even if it is rougher, or seems less polished, we understand you better when we hear your voice. We also spend more time editing out the generated fillers than refining rough text.

The bigger concern here is that “writing is thinking”. If you let a tool do the writing for you, you are not clarifying your ideas, figuring out the bigger picture or logically constructing and validating your arguments and algorithms.

This rule applies even to abstracts. If you can’t distill the main contributions and claims of your paper into a single paragraph, how will you give an elevator pitch to anyone who asks about your research?

Do not fill space. If you can write it clearly in a sentence, don’t write a paragraph! If you need a paragraph, don’t write a page, etc. Brevity is an art and one you should perfect. From now on, ignore whatever notion you may have about how much space something should occupy. Some of our conferences penalize papers that are unnecessarily wordy. Gassy writing stinks!

Do not use passive voice. Experiments were not done; you did them. Show some professional accountability by owning up to the work. I have zero tolerance for passive voice. It has no place in scientific writing. If you are intentionally obfuscating attribution, we have bigger problems. Also, active voice is easier to read and flows better.

Do not start with “Data is exploding …” or other tropes. I get that it might help you get over the empty page intro block, but how about you just get into it and tell us what the problem is; the rest will naturally come together.

Do not cite papers you haven’t read. If you can’t meaningfully describe a system in related works and how it compares with your work, don’t put it in. Everyone is tired of lazy citation lists. Also, don’t use citation lists to back common-sense claims such as “database systems are used in healthcare [1, 2, 3, …100], finance [100, … , 200], blah, blah, blah …”

Do not be pompous. Write simply and clearly. Think “KISS — Keep it simple, stupid.” This rule also applies to algorithms, mathematical formalisms, etc. Strive for simplicity. I will institute a pomposity jar: you will pay a dollar for every Greek letter1, ten for every subscript or superscript, and a hundred for every convoluted or over-the-top sentence. All proceeds will go towards a coffee fund for the readers who have to go through your paper.

Disclaimer: Writing rules are not legally binding. I bestow the rights of judge, jury and enforcer to myself. Violators can pay in the local currency: 1 US Dollar=3.65 UAE Dirhams. No students have been charged … yet.

  1. I considered adding “extraneous” to the use of Greek notation but then if I’m going over a proof or algorithm, I sure do need the coffee. ↩︎

Fighting back the tide of AI Learning Shortcuts, with AI

Any conversation on teaching with AI needs to start from “how do we learn?” Here I find Barbara Oakley’s many books on learning (Mindshift, A Mind for Numbers, Learning How to Learn, etc.) to provide an honest answer: we learn “with effort” and lots of it! Low-effort tasks like listening to lectures, reviewing notes, and rereading notes are not going to give students the learning gains they need, at least not beyond passing an imminent exam that tests our capacity for word association. Actively recalling a lecture and its readings by recreating notes, solving problems, asking questions, getting help when stuck, and rehearsing repeatedly over time, however, will. These are all effortful. Most of Oakley’s learning tips are not about eliminating such efforts. Rather, they are about setting up the right environment (place and time) and, more importantly, the right mindset (motivation, and warming up with Pomodoro techniques) conducive to learning that lasts.

Now that we understand the reality of how effortful learning is, we can talk about the impact of AI on education. Marc Watkins’s post on the plethora of Ed-Tech AI tools promising to remove the effort from learning should ring alarm bells in the heads of many educators. There are two cases to explore here: educators are doing the right thing, and educators are not doing the right thing.

The second case is widely discussed and is definitely the stance many struggling students take, and for good reason. Remember the days when you would hear teaching descriptions along these lines: “It’s like throwing pies at a wall and hoping something sticks”? Now I hear: “It’s like aiming a fire hose at students and hoping they drink!” I can’t shake the image of battered and drenched students from my head. As a computer scientist, I can relate. The research field is constantly evolving, with an ever-growing complexity (or incomprehensibility) that makes one envy Lacan’s students. AI tools that summarize papers, hour-long lectures and talks, and answer questions can help me get a handle on the hose and push it away, but I am completely aware of how deficient my learning is when doing so. Our fields have grown, our primary sources have multiplied, and we are trying to bring our undergrads up to the cutting edge — this ain’t working.

Let’s simplistically (and incorrectly, for the sake of illustration) assume that our brains are a somewhat fixed volume — a cuboid — with its width representing different disciplines, its length representing the different subfields within a discipline and its depth representing concepts and techniques within each subfield. If we keep increasing the width and length, the depth’s got to give. And even if we promote learning in ways that share concepts across fields and disciplines and allow our brain to succinctly codify knowledge, student motivation and time cannot keep up with what we are expecting them to learn. If we don’t distill our fields into meaningful trickles that students can build on as they see fit, then they will turn to these AI tools to push away the fire hose. If I assign five primary source readings a week, I should not expect any deep grasp of these readings, and if students can get the cliff notes of these readings from an AI tool, then that is fair game. With that much assigned, a shallow, superficial understanding is all that one can hope for after all. Which poses the question: why do so in the first place? If we aren’t distilling our fields then we are leaving it up to our students to do so in ways that boost the sales of these AI tools, and we can’t fault them for not learning as we intended. The same argument extends to our writing assignments, labs, problem sets, etc. For our students, using AI “shortcuts” is not about reducing effort but about surviving the term. Perhaps our great challenge as educators is not about curtailing AI or other shortcuts, but about taking on the responsibility of curation and pruning to ensure depth that matters.

This brings us to the first case, where we are doing the right things and assigning the right kind of learning work. In this world, I assign one primary source reading and engage my students critically in it over a reasonable period of time, and I also sufficiently motivate the students so they believe in the value of learning the material. The same AI tools that helped with the second case are detrimental to learning here. A tool that does “the effort of learning” for students strips them of learning. We need to have conversations with students about how they should learn in order to preempt these behaviors. These conversations must stress how to use AI tools appropriately, i.e. to set up the right environment and mindset to get the effort rolling. This can be difficult when there is an abundance of tools that aim to eliminate effort, while tools that actually promote learning remain inaccessible.

In joint work with NYUAD alum Liam Richards and Prof. Nancy Gleason, we built one such AI tool, ReaderQuizzer. To support active reading, ReaderQuizzer generates questions at one of two reading levels, comprehension and analysis, helping students stay on track and self-assess their reading at every page. This tool stands in contrast with other AI tools that answer questions asked of a reading or summarize it. Our tool puts the effort on the student but helps guide and maintain it, akin to the 20-minute Pomodoro timer that gets students started on any learning exercise. For now, ReaderQuizzer is locked away as a research prototype, inaccessible to the many students who could benefit from it. The same is true for many possible AI education tools designed by educators who understand how learning works. Another tool my lab is building is a syllabus reading tool that transforms syllabi into actionable learning tasks directly in your calendar. Why are they locked? First, institutions, unlike start-ups, are held more accountable from a legal perspective. Does a student uploading a reading onto a tool like ReaderQuizzer constitute a copyright violation? Until a legal team settles this, higher education institutions may not wish to risk legal battles with publishers. Second, there is resistance and a fear of change: broaching the subject of open access to syllabi can lead to hours of heated faculty debate. Third, institutions don’t have the IT/tech development resources to launch such tools: where to host the servers, who manages the subscriptions, what are the right subscriptions to purchase, and so on.
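
ReaderQuizzer’s actual implementation isn’t reproduced here, but the behavior described above, generating per-page questions at a chosen reading level while leaving the answering to the student, can be sketched in a few lines. Below is a hypothetical illustration assuming an OpenAI-style chat completions API and an invented helper name (questions_for_page); it is a sketch of the idea, not the tool’s code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def questions_for_page(page_text: str, level: str = "comprehension") -> str:
    """Generate reading questions for one page at a chosen level
    ('comprehension' or 'analysis'). The student still does the answering."""
    prompt = (
        f"Write three {level}-level questions that a student should be able to answer "
        f"after carefully reading the page below. Do not provide answers.\n\n{page_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```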

Outside academia, the incentives are such that tools that offer certain features, especially those that promise “shortcuts”, will inherently attract more students. If there were a magic pill that keeps one healthy and fit, would we take the pill or sweat for a tedious hour every day? When the side-effects aren’t obvious — you aren’t truly learning — but there are immediate gains — you complete the assignment or pass the test — it’s hard to make the right choices. Naturally, bad tools will flourish.

So what can be done? We can create in-house AI Ed-Tech sandboxes where both legal and development teams support tools on a tool-by-tool basis rather than aim to resolve grander issues for all. Uploading certain readings can violate copyright, but not all do. A single subscription or LLM model may not work for all possible tools, but since money doesn’t seem to be the primary resource constraint for now, let 100 tools flourish, each with its own subscription or LLM model. What we shouldn’t do is wait until the market is flooded with bad options that we can’t pull our students back from.


Evaluation of an emergency online course

I had ambitious plans for the two months of emergency online teaching, including sharing my weekly lesson plans and lessons learned. I haven’t delivered a weekly report as promised, but I did offer occasional updates. So, in this wrap-up post, I hope to first share some of the main lessons I learned. As many of us have to teach courses online again this fall, some of these lessons may help others better plan for round two! I have already received my course evaluations, so I’ll emphasize the methods that the students appreciated. Second, I would like to put forward a plan of action. Many of my colleagues described the two-month ordeal as challenging and complicated. While I would use similar words to describe parts of the course, the word that comes to my mind about this experience is illuminating. I teach again in the spring and I hope by then the world returns to some semblance of normality with in-person classes. My plan assumes in-person classes are back but hopes to integrate many of the positive aspects of online learning that I have discovered.

Lessons Learned

The hybrid model works.

I made a drastic decision to switch my teaching model, in the one week of preparation time, to a hybrid one: asynchronous lectures + one live session a week. If you have prepared lecture slides and materials from teaching a course many times before, you understand how this decision might seem like an additional and unnecessary overhead. I cannot stress enough how useful it was for learning. Putting myself in my students’ shoes, I understand how, after 12-15 minutes of watching an online seminar, I am easily distracted and onto emails, newsfeeds, or other more attention-holding tasks. An online Zoom lecture isn’t much different. You can introduce more breaks and participatory exercises to break the monotony, but once you lose your students, it is very hard to bring them back in. In the hybrid model’s live session, I would occasionally dive into a specific concept in detail and, within 10 minutes, I could see students’ videos turning off. That said, I was lucky in that I found excellent online videos by Prof. Joe Hellerstein at UC Berkeley and Prof. Andy Pavlo at CMU that covered roughly the same material I had in my syllabus. If you have the summer to prepare, it might be worthwhile to record your own lectures. You have to keep each video segment short. On average the segments were 5 minutes long, with 10 minutes being the longest video I shared.

“She also picked and provided video lessons each week that were easy to watch and follow along to, and when the video was long, she picked out the relevant segments which made it less intimidating, since I didn’t have to watch a bunch of 1 hour lecture videos.”

Course evaluation, anon student.

Pre-recorded lectures are not enough; you have to back them with a live session once a week and extended office hours. The live session helped me work with the students on exploring real-world problems from different perspectives and to dig into more advanced material. A lecture format didn’t quite work here, and I found the students most engaged when I introduced a problem, broke them up into groups of 4-6, and asked each group to come up with a solution based on the material covered in the previous week’s module. For example, after the introductory module on transactions, I asked each student group to pick an application they are familiar with and discuss what sorts of anomalies can occur from multiple, interleaving transactions and whether we need a serializable transactional system to support the application. The discussion in each group was lively and instructional.
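
For readers who want a concrete picture of the kind of anomaly the groups discussed, here is a minimal Python sketch (my own illustration, not part of the course materials) of a lost update: two “transactions” interleave their read-modify-write of a shared balance with no concurrency control, so one deposit silently disappears. A serializable schedule would never allow this outcome.

```python
import threading
import time

balance = 100  # shared account balance, updated with no concurrency control

def deposit(amount):
    global balance
    read_value = balance           # read
    time.sleep(0.01)               # simulate work; lets the other transaction interleave
    balance = read_value + amount  # write based on a stale read

t1 = threading.Thread(target=deposit, args=(50,))
t2 = threading.Thread(target=deposit, args=(30,))
t1.start(); t2.start()
t1.join(); t2.join()

# A serializable schedule ends at 180; this interleaving typically ends
# at 150 or 130 because one of the two updates is lost.
print(balance)
```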

If you choose to cancel one of your weekly classes like I did, I recommend replacing it with office hours. As students take on independent learning, they may still struggle with certain concepts, and a one-on-one can really help students overcome mental blocks. For office hours, I set up the Zoom room and waited. It was a casual set-up. I had my camera turned off and microphone muted most of the time. I didn’t expect students to turn on their cameras for office hours either. Some students joined without any questions and used this time to individually work on the weekly lesson with me around and to hear any Q&A that came about. It was a laid-back, semi-supervised learning session.

Structure and organization are key.

In earlier posts, I attached some samples of the structure of my weekly lesson plan. As the class continued, I found myself adding more text and notes to each week’s module and distinguishing core concepts from optional/advanced/good-to-know ones. My advice here is to stick to a consistent plan and not deviate unless there are exceptional circumstances. Weekly assessments for each lesson were due on Saturday, and the following Monday’s “Patch-the-Gap” lecture focused on any gaps uncovered from grading the assessments. Of the 22 students who evaluated my class, 8 answered “organization” and “structure” to the question “what aspects of the course were most valuable to you?” I made sure to communicate any changes in the course plan by email and in the live session. I discussed how the different class deliverables connected to learning objectives and how any changes in these deliverables did not impact their learning objectives. I was open about the advantages and limitations of online learning and used structure to ensure that no student was left behind. Having a rough week shouldn’t derail a student completely: keeping each lesson plan as self-contained as possible and allowing students to revisit past material was how I achieved this.

Weekly assessments, labs and problem sets are crucial, …

… not midterms and final exams: I say this knowing that I will alienate some of my colleagues. Personally, I enjoy putting together a challenging midterm. For many of us, midterms and exams are powerful tools that make us dust off our textbooks, read, revisit slides and learn. Understandably, giving up exams does not come easy. I have worked through several reasons for not giving them up myself, and here is what I found after opting for online weekly assessments in lieu of midterms/exams:

  • Myth 1 – Students will share answers and not bother going through the material: I didn’t find this to be the case. Open-ended questions revealed a diversity of student understanding. Even for true/false and multiple-choice questions, there was a diversity of responses that makes me question this assumption. Handling academic integrity concerns is separate from the form of assessment and should be addressed by affirming a positive learning mindset in students. For example, it helps to reiterate to students their pledge to academic integrity. Reminding students of why they take courses in the first place is also effective. I explain to students that I teach material that will help them with real-world problems that they may face in their professional lives: individually going through the difficult mental process of learning new material and solving problems themselves makes them better prepared for their professional careers. Taking shortcuts doesn’t.
  • Myth 2 – Everyone gets an A: This is a somewhat problematic statement, which I hear often. It implies that online, open-book assessments are somewhat trivial and cannot accurately discern the degree of student learning. I found that the weekly assessments allowed students to better gauge their own learning and understanding, more so than closed-book timed examinations where students attributed poor performance to insufficient time, anxiety, having a bad day or confusing questions. It also helped students learn at their own pace. Underlying this statement is also the incorrect assumption that the purpose of teaching a course is assigning a grade: this often has the effect of making students believe that the purpose of taking a course is getting a grade (or getting an A in particular). Grades, however, are just another form of feedback to students as they continue on their life-long learning journeys. The average grade for this course was a B+: most of the students demonstrated adequate competence with concepts introduced in the course and if they need to apply these concepts in future problems, they should have a good idea of what more they need to learn or relearn to do so expertly.

“The weekly assessments really helped to not let the fog and de-motivation of online classes set in.”

Course evaluation, anon student.

“I think the weekly assessments were instrumental in helping me learn and still feel connected to the class and its material.”

Course evaluation, anon student.
  • Myth 3 – You can’t really test problem-solving: The week-long, open-book, multiple-submission assessments kept students on track as they had to learn and understand the weekly lessons to correctly complete the assessments. When students found a question difficult to answer, they had the opportunity to rewatch the video lectures or review the reading material to provide a more confident response. Having office hours a few days before the assessment deadline helped students start their weekly lessons early to avoid cramming the day of the deadline. Having extensive and challenging problem sets and labs allowed students to apply the knowledge they learned across several weekly lessons to bigger problems. As computer scientists, we solve complex problems, and the few minutes we have to answer questions on a timed midterm rarely reflect the reality of the problem-solving process and limit the range of problems we can ask our students to engage with.

“I really enjoyed doing the Labs and Problem Sets. While they were challenging, getting this hands on approach has definitely made me more competent as a CS major and a student in general.”

Course evaluation, anon student.

“Really cool labs with tests, which enabled you to monitor your progress as you go through them, which is not common in other CS classes. I also enjoyed the teamwork dynamic connected with labs as that heavily reminded me of the workplace collaboration aspect of software dev jobs.”

Course evaluation, anon student.
  • Use a tool that helps make online grading and feedback easy. I highly recommend Gradescope. The ability to assign rubrics as you grade and reuse comments speeds up grading and helps you figure out the main patterns in student responses. Giving thorough feedback is difficult when you have a large class size and this tool definitely made it feasible.
  • Responsiveness and availability go a long way: Going online meant that many student questions were posted on the online-class forum. I checked those once a day and responded directly to questions or endorsed correct responses from fellow students.

Thoughts for the Future

Moving forward, the experience has made me consider the following improvements for future iterations of the class, whether it is online or in person.

  1. Creating instructional, interactive, and visual notebooks of algorithms, data structures and protocols: There are a variety of online sources that provide some form of animated visualization of different concepts, and my students were exceptional in finding those and sharing them with each other on the class forum, e.g. animations of searches, insertions, and deletions on B+ trees helped students better understand how they work. However, what I found missing from many of these beautiful visualizations is a demonstration of how these artifacts behave in practice. For example, how do you bulk-load a B+ tree and why is it faster than insertions (see the sketch after this list)? À la distill.pub, I would like to invest time into creating visualization notebooks that delve further into actual practice, real-world behavior and more advanced optimizations and variations. It is time to rethink the textbook into a more interactive and visual medium. In the recovery and distributed transactions unit, students found it difficult to appreciate the delicacy of designing recovery or commit protocols. An interactive notebook here would ideally allow them to inject failures and play along with the protocols, or modify the protocols and understand why they might fail. If this is something you would like to contribute to, please reach out.
  2. Creating a database of continuous online assessment questions: Preparing new weekly assessments every week is time-consuming and error-prone. Students found some of my questions confusing, especially the multiple-choice ones. There is a wide variety of questions online and I had to use those and modify them (thanks to Joe Hellerstein’s edX course). That said, it would be great to build a database of questions designed specifically for regular, online assessment that have the right balance of difficulty and offer a good mix of easy-to-grade and open-ended questions.
  3. Creating recorded video lectures: I did enjoy the reduced stress of not having to lecture twice a week. Despite teaching database systems a number of times before, I still prepare for 2-3 hours for every hour of lecture each semester. I would much prefer to use this time to create content that students can access any time and at their own pace. As much as I like the external video material I found (there is also Professor Jennifer Widom’s excellent StanfordOnline course), I would like to create material that is more aligned with my lecturing style and my personal weighting of the importance of different topics. I am not sure I have time this year to start working on this. I have always enjoyed the energy of students during lecture, so I would like to start recording myself once we start in-person classes.
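
Returning to the bulk-loading question in point 1 above: the intuition is easier to see in code than in prose. The sketch below is a rough, simplified illustration (mine, not from any course material or textbook) of why bulk-loading beats repeated insertion: the keys are sorted once, packed into leaves sequentially, and each internal level is built in a single bottom-up pass, whereas individual inserts each pay a root-to-leaf traversal and possible cascading splits.

```python
def smallest_key(node):
    """Return the smallest key stored beneath a node (leaf list or internal dict)."""
    while isinstance(node, dict):
        node = node["children"][0]
    return node[0]

def bulk_load(sorted_keys, capacity=4):
    """Build a simplified B+ tree bottom-up from already-sorted keys."""
    # 1. Pack the sorted keys into full leaves sequentially: no searches, no splits.
    level = [sorted_keys[i:i + capacity] for i in range(0, len(sorted_keys), capacity)]
    # 2. Build each internal level in one pass over the level below.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), capacity):
            children = level[i:i + capacity]
            parents.append({
                "keys": [smallest_key(c) for c in children[1:]],  # separator keys
                "children": children,
            })
        level = parents
    return level[0]

# Example: bulk-load 16 sorted keys into a tree with fanout 4.
root = bulk_load(list(range(16)))
```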


Bring some hope into your online class – weeks 2-4

Online teaching takes a mental toll on students and faculty alike. I miss the face-to-face interactions, and the gallery of 25+ mini mug-shots on Zoom just isn’t the same. What is quickly becoming clear is that some students have adapted really well to the online environment. These students appreciate the recorded lectures and the ability to replay, pause and even speed through them, and find the weekly assessments helpful as a measure of their learning. Others haven’t adapted as well. As one of my students explained: “I am finding it very hard to stay motivated.” I understand how challenging it is to stay motivated with online learning, and I can only imagine how the current circumstances compound the issue.

So I decided to liven things up for Week 4. Instead of the regular Zoom session where I go over some of the material in depth, I invited NYUAD alumni whom I had previously taught or whose capstones I had supervised to our Zoom session: nine awesome alumni showed up! It was heart-warming to hear their stories from around the world. They shared professional experiences that spanned a gamut of sectors: consulting, software development and testing, product design, graduate school, real estate and even the army. They shared a message of hope that these circumstances will pass. They calmed some of the seniors whose post-graduation job offers got upended. They shared their strategies for surviving the unstructured world after university: one alum shared that he only signs 6-month contracts and assesses his achievements and life goals every 6 months, flexibly changing his career and plans if the past 6 months made him miserable. Another explained how he used an 8-month employment gap to reinvent his career. They talked about their current jobs and their biggest challenge post-graduation: budgeting! Another explained how “learning, re-learning and reading books”, as well as multi-tasking across a variety of personal-interest projects, helps him deal with the current situation.

They also reflected on their time at NYUAD and their senior years. They described how OS and DB Systems were intense. “We did everything in Azza’s class: projects, assignments, research, exams, and labs!” and how they regretted not taking the “wood-working” class. They all shared how their experiences at NYUAD made them better professionals and people. I may have intended for the session to cheer up my students, but it ended up lifting my spirits and making me so proud of my former students. More importantly, it made me optimistic about the future of my current students, knowing that NYUAD prepares resilient individuals who adapt well to the changing world.


Week 1 of emergency-online-teaching DB systems

Week 1 Lessons covered: Hashing (Cuckoo, Extendible, Linear) and Buffer Pool Management.

Lessons learned from Assessments

The weekly assessments are a mix of auto-graded (e.g. what value causes an infinite loop in a given cuckoo hashing structure) and open-ended questions (e.g. given the pin counter, can you do away with the dirty bit?). I used Gradescope’s online assignment feature and it worked well; I highly recommend it. With 25 students, I could still read through and grade open-ended questions — it did take most of my Sunday morning!
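
To give a flavor of the cuckoo hashing question (the exact quiz item isn’t reproduced here), below is a small sketch under simplifying assumptions of my own: a single table shared by two hash functions, and an insertion that evicts the current occupant until it either finds an empty slot or gives up after a bounded number of kicks. Keys whose candidate slots chase each other in a cycle hit that give-up path, which in a real implementation would trigger a rehash with new hash functions or a larger table.

```python
def cuckoo_insert(table, key, max_kicks=8):
    """Insert a key into a simplified cuckoo hash table (two hash functions, one table)."""
    n = len(table)
    h1 = lambda k: k % n
    h2 = lambda k: (k // n) % n
    slot = h1(key)
    for _ in range(max_kicks):
        if table[slot] is None:
            table[slot] = key
            return True
        # Evict the occupant and try to re-place it in its other candidate slot.
        key, table[slot] = table[slot], key
        slot = h2(key) if slot == h1(key) else h1(key)
    return False  # eviction cycle: a real implementation would rehash or grow the table

table = [None] * 8
for k in (17, 10, 81):  # all three keys map only to slots 1 and 2, so the third insert loops
    if not cuckoo_insert(table, k):
        print(f"insert({k}) kept looping; time to rehash")
```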

I found that students came to Zoom’s office hours to validate their understanding of how certain algorithms worked. So the auto-graded aspects did ensure that students worked through these algorithms individually.

The open-ended questions showed me gaps that need to be addressed. Usually, these questions explored different design tradeoffs (e.g. students often drew an analogy between a B+ tree’s fill factor and a hash table’s utilization factor and split on a rather low utilization threshold of 2/3). It is spring break this week, but next week my in-class Zoom session will target these specific areas, such as: why would a DBMS rely on the OS file cache, and what side-effects or problems could occur?

Re: Academic Integrity: We discussed online academic integrity at length during our Zoom class and I had my students digitally pledge (using a Piazza poll) to maintain online academic integrity. It is definitely worthwhile to do so. It may seem obvious, but there are things that aren’t: for example, asking students not to record Zoom sessions without explicit permission and consent from others, especially if you have students who would not participate if they knew they were being recorded.

Lessons learned from office hours

I guess what would have helped most during office hours was to have back-up worked examples, especially if your weekly unit covers specific data structures, algorithms, etc. I had a student ask why you need local depth in extendible hashing. As I tried to explain that, on a local overflow, you need to know whether to double the slot directory or not (if local depth < global depth, you don’t need to), it became clear that it would have been much easier (and less confusing) if I had worked through some ready-made examples on the screen.
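
If I were preparing that worked example today, I would pair it with a few lines of code. The sketch below is my own simplification (tiny buckets, a low-order-bit directory), showing exactly the decision I was trying to explain verbally: on an overflow, double the slot directory only when the bucket’s local depth already equals the global depth; otherwise just split the bucket and repoint the directory slots that shared it.

```python
class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return key & ((1 << self.global_depth) - 1)  # use the low global_depth bits

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        # Overflow: this is where local depth matters.
        if bucket.local_depth == self.global_depth:
            # Every directory entry already distinguishes this bucket,
            # so we must double the slot directory first.
            self.directory += self.directory
            self.global_depth += 1
        # Otherwise (local depth < global depth) several slots share this
        # bucket and we can split it without touching the directory size.
        self._split(bucket)
        self.insert(key)

    def _split(self, bucket):
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        old_keys, bucket.keys = bucket.keys, []
        # Repoint the sharing slots whose new distinguishing bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = new_bucket
        for k in old_keys:  # redistribute the old bucket's keys
            self.directory[self._slot(k)].keys.append(k)
```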

Lessons learned from the forum

Students do like to help each other. My students shared links to external materials that helped them understand something better than the videos, assigned reading materials or textbook. They also answered each other’s questions much faster (maybe even better) than I would have.

If you don’t have a class forum, start one!

General student feedback

Some students returned to their home countries to stay with family. Others got caught in between the global travel bans and are stuck (neither at home, nor in the UAE!) I received some extension requests but even those students worked through most of the material and online assessments. I think the students do appreciate/prefer the new format. Students sent “thank you” notes. This made me feel much better about the asynchronous approach at least for this week.

More weekly updates to come.


Switching to an online DB systems course on short notice

Disclaimer. I have never taught online. With only one week of prep, we were asked to switch to online teaching. I am quickly learning how to do so: from figuring out how best to deliver content, to assessing student understanding, to working around the many challenges of an online environment. I assume other faculty will be in a similar situation as COVID-19 continues to spread and more countries take precautionary measures such as closing university campuses. Here are the steps I am taking, and I hope this blog helps others who are in a similar position and need to teach a Database Systems course. I will also share the lessons I learn as I go.

Pre-online status. We have covered the relational model, relational algebra, SQL, database design, normalization and FDs, and have started the database architecture unit. We had just finished access methods. My lecture style encourages lots of student participation with different in-class exercises. My class has 25 students. Labs (SimpleDB labs) and problem sets can be done in pairs.

Key decision point. You need to decide early on how you intend to proceed with your online class: synchronous, asynchronous or a hybrid of both.

Option 1: Synchronous model. You continue to meet during your lecture sessions and deliver your lectures as before on an online environment like Zoom.

Option 2: Asynchronous model. You prepare recorded lectures and engage students through forum discussions.

Option 1 is difficult to run with 25+ students. It isn’t clear how effective the “talking head” is in terms of learning: will students digitally raise hands and ask clarifying questions? Will students tune out? Option 1 will never mimic your in-person classroom experience. Moreover, with many students potentially returning home to different parts of the world during closures, it is not clear how many can easily Zoom into the classroom. You can record your Zoom sessions, but I am concerned about how likely students are to ask questions or participate if sessions are recorded. Also, as a lecturer, the amount of preparation for a recorded session is much higher than for a regular one. I initially considered this approach under the assumption that the university closure would only last for a month, with a scheduled 1-week spring break anyway. However, on further discussions with other faculty, my optimism waned: it is possible that closures will last for the entire semester, and if some of your students left the country, they may not be able to return in time. Option 2 involves a fair amount of work to ensure students are posting questions and engaging in online forums, especially if you haven’t established an online posting culture in your class early on.

Option 3: Hybrid model. I opted for this model to get the best of both worlds. I will be using Professor Andy Pavlo’s recorded lectures from his Fall 2019 Database Systems course as well as Joe Hellerstein’s CS186 Berkeley online recorded lectures. Both lecture sets are excellent. They are at the right level for my class and at the right pace. You might have to pick and choose the material and reorder it to better fit your planned syllabus. Andy’s lectures are not segmented by topic, which means that listeners might lose attention or tune out. I am creating online lessons as follows: I am using short video segments on a focused topic (taking parts of Andy’s lectures or using Joe’s segments directly). After each topic, I will ask 1-2 short questions worth a few points. These will contribute to each student’s final grade and will help me assess student learning and participation. I will be including my online lessons in this blog with links to the segmented videos and the post-video questions.

Instead of meeting twice a week, I’m planning to meet once a week to discuss problem areas and conduct some in-session activities (Zoom’s breakout rooms should help with this). I reduced the number of meetings for two reasons: (i) to allow students more time to go through the weekly video lectures and readings and to answer sub-topic questions, and (ii) to provide me more time to prepare the weekly lessons and to determine problem areas for in-Zoom discussion from the solutions to sub-topic questions.

Assessment. With two midterms to go, a group project, a lab and another problem set, I am rethinking my assessment strategy.

My current plan is as follows: (i) Replace the group project with students individually writing 2-3 research paper critiques and responding to at least one other student critique online. (ii) Replace the midterms with the sub-topic questions that are spread throughout the entire semester. (iii) Keep the remaining lab and problem set as is. Ultimately, assessment should be in line with your learning objectives, and the form of assessment can change as long as you achieve your objectives.

My goal with the group research project was to expose students to research ideas in DBMS, which I hope to partially achieve with critiques that are easier to do individually and remotely.

While some tools enable online proctoring, it is difficult to administer midterms online. Finally, it is worth noting that students may be stressed, worried about family they cannot travel to because of travel restrictions or quarantines, or even sick or in isolation themselves. By distributing the weight of the midterms across many questions for each lesson, I hope not to disadvantage students who are dealing with a particularly difficult situation.

With labs and problem sets, I advise not switching to a completely different set if you have already started your class. For example, BusTub and DataBass are really cool labs/projects for teaching database systems internals and are auto-graded, but the overhead for students to switch midway to another lab might be overwhelming, and your capacity for remote support and debugging is severely limited. For this semester, I will continue to use SimpleDB as the students have already completed Lab 1.

Tools. I’m using Zoom for the once-a-week class discussions and office hours, and Piazza for the class forum.

If you haven’t introduced other tools before the shutdown, don’t go overboard introducing many now. Stick to only the tools you absolutely need and those that students will actually use.

Sanity. This is not an easy transition, so here are a few tips for your mental sanity.

  1. Don’t be overwhelmed by the tons of resources online and advice on online teaching. Most of it you will not be able to follow/implement in the short time frame, so feel free to ignore it and do what you think makes sense for you.
  2. Keep it positive. I always wanted to try out alternative teaching methods and this might be an opportunity to do so.
  3. Feel free to use existing teaching material when possible. If it will help you achieve your learning objectives, it doesn’t have to be perfect or equivalent to the experience you provide students in your classroom or in one-on-one meetings.
  4. You are dealing with more than moving to online teaching. Your research labs may also be closing and researchers might be leaving. Try to keep a healthy expectation of what you hope to achieve this semester or even this year. For example, user studies are suspended this month and that will impact my research and ability to publish this cycle. It will also impact some senior year Capstone projects in my lab. I’m ok with that and I will work with the students around this.

Acknowledgements: I would like to thank Nancy Gleason at the Hilary Ballon Teaching Center for her advice and ongoing sessions that help support NYUAD faculty; Andy Pavlo and Joe Hellerstein for their lectures; and Alexandra Meliou for sharing her materials from a flipped introductory database application class, even if they weren’t quite a fit for my course.


How to write a critique for a research paper?

Note: this is an active post and I’ll be updating it as I receive feedback or find helpful illustrative examples.

One of my many awesome advisors and teachers, Daniel Abadi, made us write critiques of research papers in his graduate database systems course. It was an excellent exercise as it allowed us to:

  1. think deeply about the work,
  2. create a summary that we can revisit whenever we need to refresh our memory of the work or even to write the related work section,
  3. think about new research problems or different research perspectives on our current work, and
  4. practice writing (and believe me you need the practice).

If you read a paper and you like it (or dislike it), write a critique! Adrian Colyer has a popular blog where he critiques research papers in systems and ML.


The Case for Redistributing Charitable Crowdfunding Donations

What better time to blog about charity than during Ramadan, the month of giving? In late 2015, we partnered with LaunchGood, a crowdfunding platform, to study ways to improve the overall success of the different charitable campaigns they support. We decided to tackle the problem from a data-driven perspective: we examined two years’ worth of data on campaigns and donors. Here is a detailed technical report of our key findings.
