RIP to the milkman, elevator operator, and the guy who walked around waking people up before alarm clocks.

Let me take you inside my head as the end of the 2023 Winter semester at SDSU approached. My naive understanding of ML and NLP architecture was overshadowed by my captivation with the shiny new algorithms in the landscape. I could scroll down to Machine Learning and spend hours reading the newest papers. Scientific publications with funny titles that absurdly overstate what their underlying architecture could possibly accomplish: I loved it. Meme your way to the top in a scholarly article: pay homage to the old landscape, roast the previous architecture, introduce your hard work, and offer the reader directions for further development. Now it was my turn. The final project of my CS561 course allowed us to extend the base BERT model and submit a novice white paper. I was challenged, and I let it fuel me. I wanted to speed things up, clean things up. I wanted to optimize. So I started my research: every morning, two hours exploring papers on arXiv and HuggingFace.
I dove down this rabbit hole. Stick with me here; I know I make some jumps:
bidirectionality allows it to contextually understand language better than previous models
inefficiency in scalability and computation intensity
Dudes didn’t plan...had to finish a project in 24 hrs...“How far can we get with a single GPU in just one day?”
compression and sparsity techniques without sacrificing performance
...a Trillion huh? that’s a big #
speed enhancements and scalability for large-scale LMs
replaces traditional attention with operations mapping inputs directly to their frequency characteristics, simplifying computation.
Fast feedforward networks (FFFs) leverage basic CPU capabilities and optimized mathematical routines.
Hard stop. CPU? Seems like a step back in a world obsessed with GPUs, but here’s the kicker:
Conditional execution based on input; sidestepping the need for dense matrix multiplication.
...they used a balanced binary tree structure for each input.
Looking back, I realize I was enamored because I had such difficulty grasping the logic of the FFF layer.
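To make the "conditional execution" and "balanced binary tree" notes above concrete, here is a minimal sketch of how a fast feedforward layer can route one input through a tree of neurons. It is a toy under stated assumptions, not the authors' implementation: the names `TinyFFF`, `forward`, and `gelu`, the random weights, and the "go right if the activation is positive" branching rule are all hypothetical choices made for illustration.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU via the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class TinyFFF:
    """Toy fast feedforward layer (hypothetical): one neuron per tree node."""

    def __init__(self, dim_in: int, dim_out: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.depth = depth
        self.dim_out = dim_out
        # A balanced binary tree with depth + 1 levels has 2^(depth+1) - 1 nodes.
        self.n_nodes = 2 ** (depth + 1) - 1
        # Random weights stand in for trained ones (assumption for the sketch).
        self.w_in = [[rng.gauss(0, 1) for _ in range(dim_in)]
                     for _ in range(self.n_nodes)]
        self.w_out = [[rng.gauss(0, 1) for _ in range(dim_out)]
                      for _ in range(self.n_nodes)]

    def forward(self, x):
        """Return (output vector, indices of the neurons actually evaluated)."""
        out = [0.0] * self.dim_out
        visited = []
        node = 0  # start at the root
        for _ in range(self.depth + 1):
            visited.append(node)
            # One dot product per level instead of a dense matrix multiply.
            act = sum(w * xi for w, xi in zip(self.w_in[node], x))
            g = gelu(act)
            for j in range(self.dim_out):
                out[j] += g * self.w_out[node][j]
            # The sign of the activation picks the child: left (2n+1) or right (2n+2).
            node = 2 * node + 1 + (1 if act > 0 else 0)
        return out, visited


layer = TinyFFF(dim_in=8, dim_out=8, depth=3)
y, path = layer.forward([0.5] * 8)
print(len(path), "of", layer.n_nodes, "neurons evaluated")  # 4 of 15 neurons evaluated
```

Only one root-to-leaf path is evaluated per input, so the work grows with the depth of the tree rather than with the total number of neurons. That is the part that makes plain CPU execution plausible in this toy, since there is no large dense matrix multiplication to hand to a GPU.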
All that said, I had the motivation and the topic for my final project for CS561 Deep Learning with NLP. With the help of a classmate, we set up the structure, then tested and benchmarked BERT and crammedBERT to compare against UltraFastBERT. Cool, so we would rigorously benchmark the model. In the paper, the authors include a disclaimer about where this idea could be applied and how it’s not fleshed out well enough, but I didn’t care (understand).
We fumbled around, learning about the process of training, finetuning, and benchmarking. We racked up hours of VM time (I left it running overnight by accident), well over the time it took the authors to train their final models.
The deadline approached. I put together what I had and stayed up nights trying to wrap my brain around why the last piece wasn’t fitting: I wasn’t able to finetune the model on downstream tasks in order to compare it to the others. Deadline passed. Asked for an extension. Didn’t reply to my professor’s emails. Apologized and asked for another extension. Bitterly submitted an unfinished paper with NotImplemented scattered throughout the text. I was mentally exhausted, and upset. Upset for the wrong reasons. I worked hard on this, and I learned a lot. Ultimately, I’m embarrassed that I did not take the time to fully unpack and understand the FFF layer in the architecture. This was my biggest mistake. I was never going to finish that paper with the knowledge I had.

Timeline of work trying to learn FFF and UltraFastBERT:

November - December 2023





Final Project Submission:

Finalcs561.pdf
7.6 MB
dove into the abyss of ml and nlp, got dazzled by the bling of fresh algorithms, and decided to put a spin on bert for your cs561 showdown. ambition's great until it's 3am and you're knee-deep in code that looks more like ancient runes. you hitched your wagon to the fff star, thinking you could outsmart the need for gpu horsepower with some cpu old-school magic. turns out, the devil's in the details, especially those pesky fff layers you kinda glossed over.
the tale's classic: ambition meets reality, deadlines turn into nightmares, and your paper's more "todo" than "done." but hey, in the rubble of your crushed dreams, you found something valuable—real learning, the kind that leaves scars. it's not just about finishing; it's about the messy journey, the late nights, and those moments of "aha" amidst the "oh no." so you didn't conquer the fff dragon this time. big deal. the real treasure was the lessons along the way, cliché but true. next time, you'll be ready. or at least, less unprepared.



January 2024





