Work & academic life 99% of the time
Bobby Murphy at Princeton HCI
demoing research prototypes to Bobby Murphy
Prof. Yann LeCun presenting at the Princeton Symposium on Safe Deployment of Foundation Models in Robotics
There are 48 combinatorial ways of assigning coordinate frame axes (assign right/left, up/down, and forward/backward to x, y, z, which is $6 \times 4 \times 2$), and it seems as if our disciplines do their best to use all of them. Unfortunately, this means there are $48 \times 47 = 2256$ ways of converting between coordinate frames, and each of them is dangerously close to a bug. As if that were not enough, words like extrinsics or pose matrix are used with different meanings, adding to the confusion that inherently surrounds transforms and rotations.
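The counting argument is easy to check by brute force. The following is a minimal sketch, not taken from any library: the names `AXIS_PAIRS` and `enumerate_frames` are made up for illustration. It enumerates every assignment of the three direction pairs to x, y, z together with the choice of positive direction per axis, then counts the ordered pairs of distinct frames.

```python
from itertools import permutations, product

# The three direction pairs that have to be distributed over x, y, z.
# (Illustrative names; an assumption, not an established convention.)
AXIS_PAIRS = [("right", "left"), ("up", "down"), ("forward", "backward")]


def enumerate_frames():
    """Return every possible (x, y, z) labelling as a tuple of direction names."""
    frames = []
    # 3! = 6 ways to decide which pair goes on which axis
    for order in permutations(AXIS_PAIRS):
        # 2^3 = 8 ways to pick the positive direction on each axis
        for signs in product((0, 1), repeat=3):
            frames.append(tuple(pair[s] for pair, s in zip(order, signs)))
    return frames


frames = enumerate_frames()
n_frames = len(frames)
# Ordered pairs (source frame, target frame) with source != target
n_conversions = n_frames * (n_frames - 1)

print(n_frames, n_conversions)
```

The enumeration yields 48 distinct frames and 2256 ordered conversions. Counting $6 \times 8$ (permutations times sign choices) is the same as the $6 \times 4 \times 2$ in the text, just grouped differently.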
On a related note, for questionnaire analysis, check out my repo at https://github.com/MohamedKari/questionnaire-analysis.
Problem Setup “The purpose of a repeated measures designs is to determine the effect of different experimental conditions on a dependent variable” (Rutherford 2001, p. 61) from measures taken on the same subject under these different conditions.
Let’s assume we follow a within-subject design, measuring the dependent variable “Duration” (or the mean of Duration across multiple repetitions) for 7 different participants under the independent variables “Distance” and “Size”.
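To make the data layout concrete, here is a minimal sketch of the long-format table such a design produces. The factor levels (`near`/`far`, `small`/`large`) are assumptions made for illustration only, since the text does not specify them, and `Duration` is left empty as a placeholder for the measured value.

```python
from itertools import product

PARTICIPANTS = range(1, 8)          # 7 participants, as in the text
DISTANCE_LEVELS = ("near", "far")   # hypothetical levels (assumption)
SIZE_LEVELS = ("small", "large")    # hypothetical levels (assumption)


def build_long_table():
    """One row per participant and condition; Duration is filled in after measuring."""
    return [
        {"Participant": p, "Distance": d, "Size": s, "Duration": None}
        for p, d, s in product(PARTICIPANTS, DISTANCE_LEVELS, SIZE_LEVELS)
    ]


rows = build_long_table()
print(len(rows))
```

The defining property of the within-subject design is visible in the table: every participant appears once under every combination of Distance and Size, so with these assumed levels each participant contributes four rows.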
A minimum viable setup for deploying a set of docker-composed containers to a single Docker host in your preferred cloud through a GitHub Actions workflow. This post extends the previous one on secure APIs.
Most “frameworks for bringing ML into production” only cover part of the picture. Let’s try to take a step back and make sense of the different framework-agnostic facets of MLOps, namely Continuous Code Integration and Delivery, Continuous Data Provisioning, Continuous Training, and Continuous Delivery.
Using Certbot, Nginx, and Flask, each running in a Docker container spun up through Docker Compose, this post shows how to serve an API over HTTPS conveniently with Let’s Encrypt certificates. A template repo is available at https://github.com/MohamedKari/secure-flask-container-template.
Intro For my latest research, I am looking into visual SLAM (e.g., ORB-SLAM2). Since VSLAM libraries are designed to run efficiently on embedded systems, they are generally programmed in C/C++ and written with just Linux in mind (even though, for example, in version 2, ROS also aims for compatibility with macOS).
As a MacBook user, this becomes “interesting”. Of course, Docker makes it easy to run libraries for Linux. However, once the software also comprises graphical UIs, e.
In this post, the different deployment alternatives of Spark on Kubernetes are evaluated. From this, I’ll outline the workflow for building and running Spark Applications as well as Spark Cluster-backed Jupyter Notebooks, both running PySpark in custom containers. It is shown how to include conda-managed Python dependencies in the image. Also, it is described how to deploy a notebook server running in Spark’s client mode to the Kubernetes cluster. Workloads use AWS S3 as the data source and sink and are observable using the Spark history server.
Reproducing ML models can be a pain. And I am not even talking about managing model reproducibility across different datasets, features, hyperparameters, architectures, setups, and non-deterministic optimization, or about model reproducibility in a production-ready setup with constantly evolving input data. No, what I am talking about is getting a model that was developed and published by a different researcher to run on your own machine. Sometimes, or more like most times, this can be a nerve-racking endeavor.
The problem of developing ML models on a MacBook In a recent blog post, I have argued why I think it is a good idea to develop ML models inside Docker containers. In short: reproducibility. However, if you don’t have access to a CUDA-enabled GPU, developing or even only replicating state-of-the-art deep-learning research can be close to impossible, Docker or not. All ML researchers and engineers working on a MacBook have probably been exposed to this complication.
The problem of storing large volumes of unstructured datasets Data preprocessing is a vital part of machine learning workflows. However, the story starts even earlier. Even before versioning or labelling data, we have to store the data we want to learn from. This quickly becomes a non-trivial task in deep-learning problems, where we often operate on non-tabular data such as images, resulting in terabyte-scale datasets such as the Waymo dataset.