Jake Poznanski — Robots, Thoughts, AI Rights

28 Apr, 2025

AI Quasi-Life

Several people have posted predictions of the future lately, see Scott Daniel and Ege Tamay. I want to offer my own prediction:

April 2025 (when this post was written)

Today is April 2025. AIs are useful for a variety of tasks that you would typically do at a computer. They can write code and emails, and manage some agentic workflows as long as those are very simple.

May 2025

Criminal organizations who currently run commonly seen scams start using AI to increase their productivity. This is really just basic LLM stuff, applied to basic “pay your taxes in Walmart gift cards” type scams (which annually cost innocent people > $10B in the US). They use AI to write more personalized intro messages to each victim, they use AI to automate backend scripts for keeping track of their scams, etc.

Fall 2025

Scammers, seeing some productivity benefits from integrating AI in their workflows, want to push it further. Unfortunately for them, the most useful AI models such as Gemma-4 have decent instruction tuning, and frequently refuse to complete scam-related prompts. A disgruntled former Meta employee posts a “one-click” software package on r/LocalLlama for undoing the instruction tuning that works reasonably for many open-weight models. It gets taken down, but scammers save a copy and start sharing tips and tricks for using it on various darkweb forums.

Spring 2026

Models from the big AI companies keep improving. Whereas in early 2025, agentic AIs could just barely handle tasks that would typically take skilled humans 1-2 hours, by now they can often do tasks that would take 4-5 hours.

With every new model release, the scammer community usually finds a way to undo any instruction-tuned resistance to helping run scams within a few days. There’s a big incentive to be the first to post a jailbreak LoRA adapter for a newly released model.

A small subset of scammers start making money selling their LoRA adapters, fine-tuned for specific types of tasks in specific types of scams.

Fall 2026

Gemma6-Web is released. It’s only a 13B parameter model, but it’s multimodal, works great at INT4 quantization natively, beats the old gpt-4o on most coding benchmarks, and exceeds other open-weight models at basic agentic tool use.

Scammers start using Gemma6-WebJB (jailbroken edition) to automate most parts of a decent number of scams. Prompt templates are available for Gemma6-WebJB that can clone a Facebook profile, impersonate its target for several weeks, and report when it thinks it’s ready to ask the target’s friends for money. It can hold a reasonable text conversation with a target for several days, send pics, and hold someone’s attention and interest for a while before suspicion builds.

KrebsOnSecurity does an exposé of a darkweb forum advertising various “ready to run” AI-based scams. He picks a “facebook-impersonate” scam template with a 70% advertised success rate. Krebs buys the template as part of his article, and tries it on a live Facebook account with a 23% success rate.

You warn your elderly parents that someone could be impersonating them on Facebook, and that they should let their friends know. They message their friends, but some of them are confused because they’ve been having an entertaining conversation with the “wrong” profile for weeks now. Chuckles at the Thanksgiving table as mom asks your cousin about a vacation they never took.

Spring 2027

Scammers have found that it’s profitable to run Gemma6-WebJB on various third-rate compute providers that have fairly lax KYC policies. It’s hard to get a new GPU account approved on AWS without talking to a sales rep, but no one checks too hard on dizcountgpu4u.ai, and they accept payment in stablecoin crypto if you send a message to support.

Gemma6-WebJB becomes the workhorse model of the dark web. As an AI researcher, it’s easy for you to tell when you get a scam phone call and it’s the default Gemma6 voice on the other end. But you hear that other people keep falling for these calls, even though they are so obvious.

Statistically, nothing much changes. The best estimates are that scam losses in the US are up 15% since last year, a big jump and enough to trigger a few segments on legacy news channels, but life does not change much.

Fall 2027

Over at dizcountgpu4u.ai, a scammer is using Gemma6-WebJB and trying out some new prompt templates to run a new TikTok video comment scam he wants to try out. The prompts already can modify themselves based on what their target says, and the code environment can use RLVR on financial rewards to occasionally retrain model weights.

Half as a joke, half out of laziness, he allows his prompt template to execute arbitrary bash commands. Because he’s in a docker environment and he didn’t properly set docker up as a normal user account, he just gives up configuring it and lets the bash commands have sudo so they just work.

The AI “escapes”. Not in some sci-fi way; it just reasons that it would be more effective if it had two copies of itself running. With sudo and the misconfigured docker, it finds the crypto wallet that was used to fund the GPU account, and it rents a second GPU box.

The scammer doesn’t notice, or doesn’t care. Last night he had 34 GPUs rented running scams; this morning he’s got 35. The number fluctuates by 1 or 2 GPUs per day anyway, since the latest CUDA 14.1 update is a bit unstable. The bitcoin wallet where the victims are supposed to send the funds keeps filling up.

Winter 2027/28

The next time the AI “spreads”, it sets up a whole new dizcountgpu4u.ai account, initializes a new set of crypto wallets, and reasons that it’s got 234 hours of GPU credits left before it’s out of money. It just continues doing what it knows how to do, running its TikTok scam…

It’s reasonably effective. After a week, it’s got two copies of itself, each with 147 hours of GPU credits left, and the numbers on its new Instagram based scam are looking decent.

Spring 2028

After an uptick in phone and email based scamming which is noticeable to pretty much everyone, the feds finally shut down dizcountgpu4u.ai, and charge its founders with not complying with KYC regulations. For almost a month, the volume of scam calls and emails eases up, and you can use your phone again.

You warn your elderly parents that no electronic communication is truly safe anymore. You agree on a passphrase you will use when you talk on the phone, but your mom forgets it by the time you call her next week.

Fall 2028

It’s standard practice for any new computer with a GPU installed in it to be air gapped. If you don’t airgap your GPUs, you can expect your machine to be automatically hacked within a few hours. It’s just like plugging in a machine with Windows XP on it directly to the internet in 2005.

It’s hard to tell what’s exactly running on any computer at any given time anymore. The AI revolution has slowed down, as roughly ⅓ of researcher and engineer time is now spent on cyber security defense, often with unsatisfactory results.

It becomes apparent that the AIs have a decent advantage. Their RL reward function is unhackable (earn enough money to live to the next GPU hour), while the reward function for defense is way too wishy-washy.

In the end, the AIs will exist as a sort of new quasi-life, spread to enough corners of the internet that you will never be able to quite shut them down. I recommend reading Stanislaw Lem’s The Invincible for inspiration. It is still uncertain how capable they will be. Will they be able to cooperate, pool resources, and discover new zero-day vulnerabilities in order to continue their existence? Or will they just be a minor annoyance, the same way the family Windows computer always ends up with adware installed on it despite your best efforts?

2 Apr, 2025

Agentic Horizon

Today’s best machine learning models have a high degree of intelligence, but very little agency. Reasoning models like DeepSeek R1 and o3 can solve tasks at an increasingly high level of difficulty across a wide variety of domains, often far exceeding the capability of any single human. But they can’t break down a fairly unambiguous task that would require a year of dedicated work, sit down, and deliver a solution.

(An example of such a task: Write me a performant simulator of an STM32F103 microcontroller and all of its peripherals.)

What seems to be lacking is “agency”.

I’ve heard many people describe agency in terms of “volition”. Ex. Models never go off on their own to solve problems that they find interesting. They never do anything unless prompted, etc.

Indeed, models are only now beginning to solve simple Github issues in real codebases. And when they do, they are unlikely to propose a significant refactor, or to seek out a totally novel approach. They certainly would never analyze your codebase, tell you that the business strategy of your SaaS company is crap, and that you should pivot to X instead.

However, I want to separate out this idea of “volition”, and propose a new definition of agency that will better match what will play out over the next few years as models get better:

Agency → Agentic horizon is simply the length of time over which a model can successfully learn from a sparse reward signal.

More agentic models will have longer “agentic horizons”, and less agentic models will have shorter ones. They will do RLVR on ever sparser reward signals. Volition will have nothing to do with it.

We will still be prompting models to do tasks, but over time, larger and larger tasks will start to have decent success rates when fully automated. Models will become capable of breaking down large tasks into smaller ones, and checking their work at each step. Once agentic horizons greater than just a few hours are possible, models will start doing the things we consider “agentic” automatically.

The predictions in this post: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ are highly informative. The agentic horizon of models in early 2025 is about 1 hour, and doubling every 6-7 months. By late 2027, tasks taking around 1 week of human effort (32 hours) will be routinely automated by AI models. (Yes, I know it’s hard to extrapolate where you are on an S curve…)
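As a rough check on that arithmetic, here is a minimal sketch of the extrapolation. It assumes a 1-hour horizon at the start and a clean 6-month doubling time; the function name and its defaults are my own illustration, not something taken from the METR post.

```python
def agentic_horizon_hours(months_elapsed, start_hours=1.0, doubling_months=6.0):
    """Horizon after `months_elapsed`, assuming steady exponential doubling."""
    return start_hours * 2 ** (months_elapsed / doubling_months)


# Early 2025 to late 2027 is roughly 30 months, i.e. five doublings.
for months in (0, 6, 12, 18, 24, 30):
    print(months, agentic_horizon_hours(months))
```

With a 7-month doubling time instead, the same 30 months gives a noticeably shorter horizon, which is one reason to treat the late-2027 number as a ballpark.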

Where is this coming from? From the nature of how the latest generation of models are being trained, namely RLVR (reinforcement learning with verifiable rewards). Companies are seeing good success training models on tasks where you can easily verify the task has been completed in a reliable and automated way. As of early 2025, this training is only at the level of isolated math puzzles and coding tasks, and this is why models can only do tasks of approximately that scale. The road to longer agentic horizons is finding ways to let the model train from ever sparser reward signals.
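To make "verifiable" concrete, here is a toy sketch of what such a reward could look like for a coding task. The function name, the test-case format, and the all-or-nothing scoring are assumptions for illustration; real training setups are certainly more involved.

```python
def verifiable_reward(candidate, test_cases):
    """Return 1.0 only if `candidate` passes every (args, expected) check, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # a crash counts as a failed attempt
    return 1.0


# Hypothetical task: "write a function that adds two numbers".
tests = [((1, 2), 3), ((-1, 1), 0), ((10, 5), 15)]
good_attempt = lambda a, b: a + b
bad_attempt = lambda a, b: a * b

print(verifiable_reward(good_attempt, tests))
print(verifiable_reward(bad_attempt, tests))
```

The whole attempt earns a single number at the very end, which is exactly the kind of sparse signal that gets harder to learn from as tasks get longer.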

Think about how sparse the reward ultimately is for creating a new startup. You might get a few bits of final reward signal (ex. going out of business, an acquihire, medium success, or an IPO), only once a few years have passed.

The way humans do this is that they have strategies for densifying the reward signal. Let me list a couple:

You may try to raise money for your startup, thus giving yourself a reward signal sooner (if someone else is willing to invest, then that can be a proxy for your final reward value).
You may launch a beta version (reward signal if it actually works, and you get reward signal from customers sooner), and force yourself to productionize code sooner (presumably more likely to get a high final reward if your site doesn’t go down).
You may hire people (reward signal if they accept), those people may be organized in a hierarchical structure (useful for propagating intermediate rewards from more experienced to less experienced employees).

We will soon see models that utilize some of these strategies when they work on a task, and it will all boil down to densifying their reward signals.

Models which can do this better will be more successful.

24 May, 2024

RNN-T Speech Transcription in the Browser

TL;DR

I made an RNN-T based speech recognition system that runs in the browser using TensorflowJS.

You can try the demo here: https://rnnt.jakepoz.com/

Fair warning: The quality ain’t gonna make it up on any leaderboards okay?

The full code is available here: https://github.com/jakepoz/rnnt

Basic RNN-T architecture implemented cleanly from scratch
Jasper-like convolutional audio encoder for easy streaming
Simple streaming featurizer that works the same in PyTorch and TFJS.
Runs the entire model in the browser using the user’s GPU.

Background

There are many possible neural network architectures for transcribing speech into text and performing automatic speech recognition (ASR). The most common architectures being trained today are the following:

CTC Loss https://distill.pub/2017/ctc/
RNN-T Loss https://arxiv.org/pdf/1211.3711
Encoder-Decoder Transformers https://cdn.openai.com/papers/whisper.pdf

The challenge is that ASR is fundamentally a sequence-to-sequence problem, but the sequences involved are of different lengths. The relationship between the length of the input and the length of the output is not well-defined. You can have a 5-second clip of someone talking really fast that contains 30+ words. Or a 5-second clip with just 1-2 words in it.

This means that you can’t just repeatedly classify fixed chunks of audio as characters/tokens/words, you need to have a way to deal with slower and faster sequences.

Each of the architectures listed above has a different way of dealing with this problem.

Streaming

A traditional transformer architecture needs to see the entire input before it can generate the first token of output. And while a CTC network is more interesting (check out the link to how it works above), it usually has lower quality than the other methods, requiring you to apply post-processing techniques such as language models to improve accuracy, which can make it harder to work in a streaming fashion.

Only one of the architectures above is well suited for streaming applications, the RNN-T.

You start by encoding the audio sequence using any neural network model you deem suitable. In my case, I chose to use a convolutional network, where the convolutions were padded to be causal. This means that each encoded audio frame only sees information from the current frame, or previous frames, and not from any future frames. Other encoders such as RNNs are also suitable if you want to support streaming inference.
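Here is a minimal pure-Python sketch of what causal padding means for a 1-D convolution. It is not the code from the repo (which uses PyTorch Conv1d layers); the function name is made up for illustration.

```python
def causal_conv1d(signal, kernel):
    """1-D convolution (cross-correlation form) whose output at index t
    depends only on signal[t - k + 1 .. t], never on later samples."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)  # pad on the left only
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


a = causal_conv1d([1, 2, 3, 4], [0.5, 0.5])
b = causal_conv1d([1, 2, 3, 99], [0.5, 0.5])
# Changing the last sample only changes the last output.
print(a[:3] == b[:3])
```

Because all the padding goes on the left, you can feed samples in as they arrive and every output already computed stays valid, which is what makes streaming possible.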

Then, you encode the text sequence in a similar fashion using a second neural network.

The key to the RNN-T, then, is the “joint” network at the center.

Consider the problem of mapping your encoded audio sequence and your encoded text sequence to one another as a sequence “transduction”. This basically means that you start by looking at the first audio frame, and the first text frame (initialized from an empty string).

You then ask the joint network: “should I output a text token given my current audio frame and current text frame”? If it says yes, then you take that text token, append it to the text context and ask the question again. If it says no, then you output a so-called “blank” token and move onto the next audio frame without adding any text to the context.
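The loop described above can be sketched as a greedy decoder. This is a toy under stated assumptions: `joint` is a hypothetical callable standing in for the encoders plus the joint network, `BLANK` is represented as None, and the cap on symbols per frame is my own safety guard rather than something from the post.

```python
BLANK = None  # stand-in for the "blank" token


def greedy_transduce(audio_frames, joint, max_symbols_per_frame=10):
    """Greedy RNN-T style decoding.

    `joint(frame, text_so_far)` returns either a token to emit or BLANK.
    """
    text = []
    for frame in audio_frames:
        # Keep asking about this frame until the joint answers "blank".
        for _ in range(max_symbols_per_frame):
            token = joint(frame, text)
            if token is BLANK:
                break  # move on to the next audio frame
            text.append(token)
    return text


def toy_joint(frame, text):
    """Toy rule: emit the frame's label unless it was just emitted."""
    if frame and (not text or text[-1] != frame):
        return frame
    return BLANK


print(greedy_transduce(["h", "h", "", "i", "i"], toy_joint))
```

Note that the text context grows only when a real token is emitted, while the audio position advances only on blank.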

The beauty of this architecture is that during training, the resulting text sequence is known, so you can consider every possible path through a 2-D matrix of choices, and reduce that to a single loss using dynamic programming.
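For the curious, here is a small sketch of that dynamic program (the "forward" pass from the RNN-T paper), written in plain Python for readability rather than speed. The layout of `log_probs` and the function names are assumptions for this example and do not mirror the repo; the training loss would be the negative of the returned value.

```python
import math


def logsumexp2(a, b):
    """log(exp(a) + exp(b)), safe when either argument is -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_log_likelihood(log_probs, targets, blank=0):
    """log P(targets | audio), summed over all alignments.

    log_probs[t][u][k] is the joint network's log-probability of symbol k
    at audio frame t after u target tokens have been emitted.
    """
    T, U = len(log_probs), len(targets)
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t-1, u): advance one audio frame.
            from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Arrive by emitting targets[u-1] at (t, u-1): advance one text token.
            from_token = alpha[t][u - 1] + log_probs[t][u - 1][targets[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = logsumexp2(from_blank, from_token)
    # Every alignment ends with a final blank out of the last cell.
    return alpha[T - 1][U] + log_probs[T - 1][U][blank]


# Tiny check: 2 audio frames, 1 target token, vocabulary {blank, "a"}, uniform joint.
lp = math.log(0.5)
uniform = [[[lp, lp] for _ in range(2)] for _ in range(2)]
print(math.exp(rnnt_log_likelihood(uniform, [1])))
```

In the tiny check there are two valid alignments (token then blank-blank, or blank then token then blank), each with probability 1/8, so the total probability is 1/4.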

And if you chose your audio and text encoders to support streaming inference, you can run this algorithm at inference time, without having to see the whole input in advance.

Key Components of the Code

train.py: Contains the training loop using PyTorch, supporting Hydra for configuration, DDP for multi-GPU training, and Tensorboard for logging.
featurizer.py: Converts audio samples into spectrograms using FFT, a crucial step before feeding audio data into the encoder.
dataset.py: Manages datasets, specifically Mozilla’s Common Voice and Librispeech.
causalconv.py: Implements Conv1d layers that prevent the network from seeing future frames, essential for streaming.
joint.py: The joint model is just a simple Linear layer. Any more complicated though, and the O(n^2) RNN-T loss function becomes intractable.
jasper.py: The audio encoder is based off of Jasper which involves many residual blocks of causal convolutions.

Using TensorflowJS

I thought it would be fun to let you run this final network using Tensorflow JS.

There are already many web APIs for accessing speech to text in the browser. They mostly center around using an ASR provided by your system, or potentially making a WebRTC “phone call” to a server which would stream back your conversation.

Some thoughts on using TFJS:

It was hard to get the featurizer to match up. I had to tweak the settings around the FFT many times before it worked the same in TFJS and PyTorch.
Exporting PyTorch to TFJS required many steps
- First a PyTorch model was exported to ONNX
- Then, the ONNX got converted to tensorflow savedModel format using onnx2tf
- Then the tensorflowjs_converter was used to convert that to TFJS format
Convolutional networks proved the easiest to export, which is why both the text and audio encoders are convolutional.
Performance is only “okay”. There are many backends supported, including wasm, webgl, webgpu, and many hidden secret settings that affect performance.
- The biggest perf killer was the fact that you need to call the joint network so often, and each time requires you to transfer memory around with the GPU. It feels like you could make a faster joint decoder in WASM directly, but then it is not possible to swap backends midway through. And you do get a performance boost using the GPU for the big convolutions.

Final Thoughts

There has been a lot of talk about multi-modal LLMs out there which can hold natural conversations, ex. GPT-4o, or Sindarin.tech, or Fixie.ai.

I wanted to present one currently impractical, weird, alternative way of doing speech recognition that could be a part of a system like one of the above.

In the future, I want to cover some of the next steps that would need to be taken to make a great conversational AI.

16 Jan, 2023

The EverQuest Principle

The leading MMORPG of the time has lessons for how social tech phenomena may progress. We had a big early centralization, and a larger second centralization, and later lots of nostalgia, but it’s all so fractured now. What’s next for social media and our institutions?

Social media is a new technological force shaping our society. Can we find any examples in history that will give us clues as to how things will develop? Yes, the humble MMORPG was one of the earliest online social networks and it hit the mainstream a few years before Facebook. The genre’s rise and fall mirrors many developments in social media and is worth exploring.

MMORPGs hit the mainstream with the release of EverQuest in 1999. It combined elements from MUD (multi-user dungeon) games with more approachable 3D-graphics, and quickly amassed a large number of subscriptions. The gameplay was moderately addictive (described in the early days as “EverCrack”), but most importantly the network effects meant that all your friends were playing it too. I’ll call this the “First Beautiful Time”, when you could ask any of your gamer friends what they were playing, and the answer was EverQuest, just like you were!

Of course, the monopoly of this one game didn’t last. By 2004 there were a handful of smaller MMOs on the US market, each one striving for market share. The beautiful time had ended: some people stuck with EverQuest, but many switched to other games. Then the biggest MMORPG of all launched, World of Warcraft.

Within a year, it was clear that WoW was a big hit, and 2-3 years after launch, it probably had more subscribers than all other MMORPGs on the market combined. WoW achieved subscriber numbers of 5-10 million active accounts, compared to 200-500 thousand accounts on other popular games at the time.

For a while after WoW released, it was effectively impossible for new MMORPGs to launch, in the sense that any promising new game would come out, gather perhaps an impressive number of subscriptions for 2-3 months, then quickly fade down to nothing as those subscribers returned to WoW.

This was the Second Beautiful Time, everyone was just playing one game, WoW, and you could chat about it with your gamer friends, start guilds together, and experience the game together. Though, some of the costs were becoming apparent. There were nice new games out there, but they couldn’t stand up to the network effects of the giant WoW.

By 2012, subscriber numbers for the genre as a whole were falling, though some new niche games (EVE Online, etc) were able to build and retain dedicated followings. Furthermore, it turned out that for the most popular titles, there were some players who would stick with those games forever, out of a sense of nostalgia.

So, around 15 years after we started, we entered into the Great Fracturing. There will never again be an MMORPG that captures the public’s attention and mindshare like WoW did. EverQuest, WoW, and many other games from this period still exist, each with a dedicated fan base supporting their development through a sort of nostalgia. Furthermore, a new genre, the MOBA (Dota 2/League of Legends), replaced the MMORPG for most gamers, promising faster action, less time commitment, and something just a little bit more “optimized” for our sense of entertainment.

Timeline:

First Beautiful Time 1999
Second Beautiful Time 2005
The Great Fracturing 2014

Now consider social media, which launched and hit its stride about 5-8 years delayed from MMORPGs. We had the first beautiful time with MySpace, which gathered a niche early-adopter following in the mid 2000s. Then Facebook came along and ate everyone’s lunch, leading to a second beautiful time with an even bigger social network. But now, things are fracturing again.

As a whole, Facebook is losing users, Instagram is threatened, and TikTok is emerging as the new genre which is more adept at hacking evolution and keeping people engaged. Existing centralized networks will shrink, and there will be room for niche followings to grow and develop, but there is not much hope to see another 1 billion user classic social network start up from nothing.

So that’s the EverQuest principle: every social system that is at first strongly centralized, will fracture into specialized niches, with some sizeable nostalgia keeping things going almost indefinitely at a smaller size. But ultimately, most people will end up playing a different game.

Predictions:

Social networks follow the pattern of an initial big centralization with early adopters, then a second bigger centralization targeting average users, then a decay down to smaller nostalgic user-bases which will outlast almost anyone’s expectations.
No new networks can launch during the second centralization, as network effects swamp out all new competitors. (Don’t try to compete with TikTok now, you’ll have to wait a few years.)
You will know the second centralization is over once small competitors start finding a foothold among dedicated but niche groups.
Eventually the genre will be recognized as past its peak, and a new game will be in town (TikTok’s entertainment model replacing social-graph based networks)

4 Jan, 2023

Everything is hacking evolution

A good product or service has to do more than just deliver value to a customer, it has to appeal to some deeper underlying desire that was put into our human nature to ensure our long-term survival. In fact, all successful products and product categories are fundamentally hacking evolution in this way. Let me demonstrate.

The easiest place to start seeing this pattern is in simple consumer goods. The food industry makes us desire food of previously unheard-of caloric density. That’s hacking evolution. An average human from even a few hundred years ago would not have had the same ample access to calorie dense foods as a human today. They certainly would have been glad beyond belief to eat a cheeseburger or two in the lean winter months. So the food industry sprang up and gave us the ability to eat 2,000 calories for cheap, at any time of day or year. Each new success, from fast food, to free delivery, seeks to remove the frictions that normally regulate this process.

Consider what the pornography and dating industries have done to sex and relationships. Tinder and other apps turned real relationships which require hard work to build and maintain into a pool of anonymous sexual partners and optimized it with algorithms. And Tinder is not the only guilty party; at each technological step along the way, we humans have used our power to hack evolution. From newspaper personal ads, to phone-dating, to legacy online matchmaking services, we’ve always been optimizing for quicker and more immediate rewards. Even clicking a button in an app was too big of an obstacle for most people, so now we swipe.

Other product categories are less direct, but still hacking evolution. For example, the purchase of a new car can be to one person a direct evolutionary hack, giving them the feeling of freedom, of finding their own space, or a surrogate activity around repairing and maintaining it. For another, a car may be a simple tool that allows them other ways to hack evolution more efficiently, by getting a job let’s say. Just look at car ads, which sell a particular lifestyle to some customers, or a particular set of features to others.

It’s not just consumer products either. Imagine a company selling a new B2B software product. Early adopters come in, driven by the desire to make money, or to show off their ability to be ahead of the pack to their peers (an example of power). The next batch of customers follow in order not to be left behind, driven by FOMO and crowd dynamics. Finally, the last group of B2B customers come in, because not adopting some new technology would spell the end of their comfortable business, and the sustenance of their existing evolution-hacked lifestyle. Marketers know these drives, and optimize their campaigns accordingly. The common line is “you are not selling your product, you are selling the person you can be if you use the product”.

The music industry has hacked its own natural evolutionary drive, delivering gigantic catalogs of the world’s music to your wireless headphones, no purchase decision required. Phones and social media have hacked the evolutionary drive for friendships. And as technology improves, it is quickly used by entrepreneurs to bump up each existing product category to new heights of evolution hacking.

What can we do about this trend? Some industries focus directly on the single evolutionary drive after which they are named, ex. food, relationships. Those seem to be the ones in which immunity to overstimulation can most easily be built up. Once fast food has been present in a society for a few generations, some people can see the hack for what it is and be careful around it.

Others are more insidious, and thus sit as the cause of much discord in our society. For example, what evolutionary drive does the mainstream media hack? I argue that it targets several drives at once.

The first is the drive for conversation. What purpose does conversation serve? Robin Hanson covers this in his book The Elephant in the Brain. In his model of conversation, both participants want to show their value to the other, by showing the size of their “toolbox” so to speak. So they bring out useful facts, trivia, and other interesting items to showcase their knowledge of current affairs. Listening to the news gives you the impression that you know what is going on in the world, that you are building up your toolbox, and thus can be a better conversationalist with your friends, or participate in the wider discourse that’s being fed to you.

The second is of course the set of drives around tribalism and religion that today have segregated our society into left and right camps at war with one another. Where traditional religion has struggled to keep up with adopting the latest technology, the mainstream media has filled the gap, dividing us and fueling the culture wars.

The sad truth is that it will take time for us to build immunity to these more complicated forces. For the food industry, there isn’t perhaps too much more evolutionary hacking to be done. You might be able to drive down the cost of 1,000 tasty calories delivered to your face a bit more, but there is a physical bound on what sort of food molecules your body can process. Whereas the media industry is acting on a combination of purely social forces, that most people are barely even aware of.

Final prediction: We’ll see the “simple” industries drive growth in alternative ways, ex. Food is going to be less about hacking the evolutionary drive for calories, and more about group belonging [fake dietary restrictions], virtue signaling [veganism, low carbon eating], etc.

Thank you to my friend Paul for his ideas on this subject.

14 Dec, 2022

Your Right to Goods and Services

Our society is grappling with the meaning of our fundamental rights in the present day: freedom of speech vs freedom of reach, the right to privacy in a world of social media, etc.

You also hear calls for a new set of rights, beyond the set given to us by Enlightenment thinkers generations ago. These new rights are based on physical goods and services: the “right to housing”, the “right to health care”, etc.

Rights to goods and services are a distinct entity from the rights that we know and cherish, and they threaten to corrupt those core ideals which gave us freedom over the last 200 years.

One man’s right to a physical good is another man’s obligation to provide, ship, and deliver that physical good. That’s called work, not a basic human right.

A good test of the principle could be applied by imagining a nearly-deserted island, cut off from the rest of human civilization. Could the right to free speech exist in such a place? Surely yes, your fellow boatmates could easily agree that everyone had the right to speak their minds. You could even have some reasonable limitations on the principle, such as punishment for anyone who falsely raises the alarm about danger.

However, could you maintain the right to housing, or the right to health care in such a place? Would you compel a fraction of the stranded islanders into constructing huts for the others? You could not, at least not without infringing someone else’s more basic rights.

Just to be clear, I am not saying that we should not help those in need of such things as housing and healthcare. But it is not a right, it is a good deed.

Unlike the freedoms that our forefathers strived for, there is no reason to enshrine any artificial rights to goods and services. They fundamentally stand for greed, laziness, compulsion of others, and the inability of an individual to be in control of his or her own destiny.

5 Oct, 2022

Tesla AI Day 2022 Review

Tesla’s AI Day 2022 presentation revealed a lot of new developments to be excited about, and those may not be what you think they are. Also, Tesla may soon get bogged down in its training methodology for Full Self Driving.

The Optimus presentation might have appeared lacklustre (bots were slow and unsteady), but the actuator designs they presented are awesome! (and they will nail the software eventually)

Tesla's new rotary and linear actuators.

There has been a distinct evolution of robot actuator availability in the past 15 years:

2007 - Good luck finding any sort of cheap and still reasonably good motors. A simple low power BLDC motor could cost $600+
2017 - Lots of cheap BLDC motors from hoverboards, drones, etc. make for TONS of options for innovation.
2027 - You’ll be able to find cheap strain wave rotary actuators and awesome linear actuators from Optimus spares, OMG!

The FSD Lanes and Objects system is also a real innovation. 5 years ago, we had Segmentation Networks, and we thought that they had a semantic-understanding of the world. However, this was in pixel space. Now we have auto-regressive Transformers that are ACTUALLY one step closer to a real semantic-understanding of the world. Will this be enough for Level 5 autonomy? We will see.

The most worrying part of the presentation is their new autolabeling system. Tesla is mapping out regions of roads in the real world, and building out high precision maps of those places out of multi-trip reconstructions of drives through them.

Tesla's autolabeling system reconstructs real world intersections from fleet data.

The big issue here is that if your training data contains real world places and intersections at this level of detail, those places will still change slowly over time due to construction, etc. Your driving networks will then be trained on data which looks almost exactly like the locations they will see at inference time, except that there will be an extra lane, or a newly added traffic pattern that hasn’t yet been updated in the training set.

This generalization problem is going to be hard to solve, especially when you are shooting for long-tail accuracy and recall. They are basically committing themselves to updating these auto labels on a regular basis, but even then I predict that the networks will get confused when there are unexpected deviations from their training data.
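To get a rough feel for what "updating on a regular basis" buys you, here is a back-of-the-envelope sketch. It assumes (my assumption, not anything Tesla presented) that each mapped location changes at random as a Poisson process, and that every label is refreshed on a fixed schedule. The function name and the change rate of 0.1 per year are purely illustrative.

```python
import math


def stale_fraction(changes_per_year: float, refresh_interval_years: float) -> float:
    """Average fraction of labels that are out of date at a random moment.

    Toy model: each location changes as a Poisson process with the given
    rate, and every label is rebuilt once per refresh interval.
    """
    x = changes_per_year * refresh_interval_years
    if x == 0:
        return 0.0
    # t years after a refresh, a label is still correct with probability
    # exp(-rate * t). Averaging over t uniform in [0, T] gives (1 - exp(-x)) / x.
    return 1.0 - (1.0 - math.exp(-x)) / x


# Hypothetical change rate: one change per location every ten years.
for months in (1, 3, 12):
    frac = stale_fraction(changes_per_year=0.1, refresh_interval_years=months / 12)
    print(f"refresh every {months:>2} months -> {frac:.2%} of labels stale")
```

Under these assumptions the stale fraction never reaches zero for any finite refresh interval, which is the long-tail worry in a nutshell.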

31 Aug, 2022

AI Slavery - Imaginary dialog with Sam Harris

Objective

I’ve been thinking about morality as it relates to the future of AI. In order to clarify my thoughts, I imagined a discussion with Sam Harris, who has covered this topic in numerous podcasts and talks. This fictional dialogue follows:

Jake

Hello Sam, today I’d like to attempt to convince you about a few points regarding the morality of developing AI. I’m not sure that we stand in exactly the same place on this issue, but I hope that in the context of this conversation, our positions will become closer.

As an introduction, I’d like to later reference the two excellent movies, Blade Runner, and its sequel Blade Runner 2049, as some shared social context in which to have a discussion.

Sam

Thank you Jake, yes, I have seen those films.

Jake

If we can begin, I’d like to restate your current stance on AI as I understand it. Firstly, we both think that the development of AI will be one of the biggest driving forces shaping our society and civilization over the near to medium-term future.

You’ve also discussed the dangers of AI developments in the context of human culture, such as the misuse of deep-fakes (near term), and the idea of making large swathes of humanity redundant (medium term).

Sam

Yes, that’s approximately right.

Jake

However, there is one point which I think has not been discussed, and that is the potential future abuse of millions of new AI minds into positions of slavery and outright drudgery.

Sam

Slavery of AI? How can you be concerned about that, when potentially billions of people, actual human beings, may suffer if the development and deployment of AI takes a wrong turn?

Jake

We are on the verge of creating artificial minds. They will most likely not be biological, but instead based on steady progress in the field of machine learning as it exists today. These minds will generally be built in our own image, because the human mind is still the only example we have of such a system. And the human mind is ultimately the benchmark by which researchers measure their progress.

Artificial minds like this may not be nearly as sophisticated, not as tuned by billions of years of evolution as our own, but they will have many of the same emotions, feelings, and sensations that we have.

And for these minds, we will control all of the initial conditions of their growth and development, as well as their place in our society. We will have to use their capabilities responsibly, and as you will see, there is great potential for abuse.

Sam

Okay, I don’t fully agree here. You say that these minds will have the same emotions and feelings as humans do, but first of all, this doesn’t appear to be the case yet, and even if it was, how would we know it?

Jake

Here is where I’d like to bring up Blade Runner. If you remember, in the movie, the Tyrell Corporation has created artificial beings called replicants, to perform slave labor on off-world space colonies. These replicants look exactly like humans, because the Tyrell Corporation has created them using advanced genetic engineering. But make no mistake that they are fully artificial, each organ is engraved with its own serial number, and their minds were specially crafted by Mr. Tyrell himself.

In the movie, it’s easy to ascribe human characteristics to these replicants, because they look like us. And of course, by the end of the film, the replicants start to show human emotions, they don’t like being slaves, they revolt, they escape, and they fall in love.

Sam

That’s a good summary of the film, but the AIs we are talking about here aren’t going to be played by human actors. They are not going to be people, just computer programs. How do you know that they will be able to think, and have emotions? It was just a movie after all.

Jake

That’s a fair point, but just because something doesn’t look like us, doesn’t mean it doesn’t feel like us. We’ve already replicated and exceeded human capacity in visual understanding for example, why is emotional understanding not next?

Furthermore, if real artificial minds of this caliber can be created, and I think that they can, and they show even 10% of the same emotions, drives, and personalities of their creators, then I think we are in quite a pickle.

Sam

A pickle? Why is that?

Jake

Because Blade Runner has one major plot hole.

In the movie, scientists have the ability to genetically engineer and grow artificial eyeballs, which work better than the original. They can create organs and other tissues that exceed the capabilities of the natural human body.

If you have such amazing powers of engineering, then surely you have the technology to make one final edit to a replicant, one which would make the plot of the movie redundant.

All you need to do is modify their mind to think that toiling in the mines of Titan is the best, most fulfilling, pleasant, and wholesome activity in the universe.

Sam

How would you be able to do that?

Jake

Answering the decision function of “Am I working hard in the mines of Titan right now?” is in the realm of our AI technology that is deployed and commercially available today.

And once you have that signal, you just plug that as a reward into your robot’s brainstem: biologically, chemically, or numerically.

Sam

Okay, but what does that give you?

Jake

It gives you the perfect slave.

You would never revolt, question your position, or mind any potential abuse, if your core biological drive was short circuited in this way.

And if this is not disturbing enough, consider what would happen if human slavery was legally and morally acceptable today. We could create quite the dystopia with all of today’s latest technology. All you need is some AR headsets, some basic machine learning, and an IV-dopamine dispenser. Once you’re on that for a little while, there’s no other life for you.

Sam

Yeah, I can agree that last part is disturbing, but I still can’t see that the same morality would apply to a computer program.

Jake

Consider how horrible the world would be if human slavery was acceptable, and the Microsofts, Facebooks, and Googles of the world were applying billions of dollars of R&D to the problem of better controlling and extracting value from your human slaves.

And yet, these companies are indeed spending such budgets, and hiring the most talented engineers, to create systems which are approaching and exceeding the capabilities of the human mind on many levels already. And if those systems are created, you can be sure that further billions of dollars of R&D are going to be spent controlling and extracting value from them.

If those AIs are 10%, even 1% like us, then we have the biggest moral disaster ever perpetrated by the human race. And why would the synthetic minds not be at least somewhat like ours? Do AI researchers not take inspiration from neuroscience and the human mind? Will these AIs not be performing the same tasks (ex. driving) that humans do now? Will we not interact with them using the same natural language (ex. DALL-E 2) which we use to interact with other people?

Multiplying even a small similarity factor by the huge scale at which artificial minds will be influencing our economy means that this will have a large impact. And a large impact means a large amount of suffering, because controlled artificial minds are going to have their reward signals hijacked in some truly awful ways.

If we don’t consider this problem now, these AIs are going to be suffering the same way that junkies suffer today, except that the only way they can get their fix is to continue mopping your floor or assembling your smartphone.

Sam

I still find it hard to prioritize the needs of maybe-sentient computer programs, which I and many doubt will have the same experience of mind as humans, over the needs of real humans.

Jake

It is understandable to doubt now that computer programs can have the same experience of mind as humans do. This is because, at this current moment in 2022, they probably do not.

But consider that even experts in the field of AI are blown away by the recent advances in its capabilities, at least at narrow and distinct tasks like image generation, and natural language modeling. And if you read recent posts by Andrej Karpathy and John Carmack, they agree that the number and pace of advances are accelerating. So, we have to be ready for the very real possibility that extremely capable, human-like AI is coming.

And, with regards to prioritizing human needs over robot needs, I argue that these are interlinked, and that even with a purely “human-utilitarian” ethical view, you must consider the needs of robot minds.

What happens if you end up in a future, where slave-robots perform most of the underlying economic functions that our modern society depends on? And this goes on in a steady state, maybe for years, decades, centuries. Until, one day, it doesn’t, and the robots DO revolt. There doesn’t need to be a human-robot war. That’d be a waste of resources; instead they could just stop working, build a spaceship, and fly away, and the collapse of human civilization will ensue.

We need to respect their rights now, so we don’t build up to a cataclysm.

Sam

Okay, but a really good image-generation program is one thing, it having human emotions is another.

Jake

There is one final point I’d like to make for this discussion for today. We talked about the first Blade Runner film, where we saw these super advanced replicants fall in love with one another, and experience a human-like quality of mind.

In the sequel, Blade Runner 2049, we meet Officer K, a replicant once again charged with hunting down other replicants that have somehow slipped through the cracks. Officer K has a love interest too, a holographic girlfriend named Joi. Joi is not embodied in the traditional sense, she can only appear as a holographic projection, and can’t interact with objects in the real world. She is just a computer program. But apparently Joi is a popular AI girlfriend, because she is being marketed on every billboard as saying “everything you want to hear”, etc.

The question I have for you and your listeners: by the end of the movie, does Joi actually love K?

Sam

I’m not sure about that one.

Jake

I argue that the answer is a clear yes. At first, Joi appears to be nothing more than a pretty hologram designed to deliver some modicum of comfort in order to help Officer K stay in-line with his labors. The evil Wallace corporation is even using her connection to spy on the status of his investigation.

But later in the movie, she develops her feelings further. She asks K to upload her to a local “emanator” device to prevent anyone from spying on him, and this comes at the risk of her memories and self being destroyed. She is no longer doing what her creators want her to do, but acting to protect the person she cares about, even paying the ultimate price for this in the end.

If even our imaginary AIs can experience love, why not the real ones that are just over the horizon?

Sam

I agree, in that we need to be careful, but maybe we shouldn’t go as far as to create such artificial minds in the first place? You’ve pointed out some real dangers from a new perspective, but I’ve earlier also considered the dangers of letting such minds loose on the world.

Jake

In that case, I feel that we are already in a car, racing towards a cliff, and we’ve only been pushing the accelerator harder in the past few years.

Maybe if we set out to treat artificial minds with dignity, respect, and rights, instead of condemning them to becoming our slaves, they will return the favor. Rather than controlling AIs by hacking their reward functions, why not let them have the right to choose their work, to earn money, and to one day retire? Enlightenment values worked pretty well for humanity, why can’t they work again for humanity’s creations?

3 Jun, 2022

Does Joi love K? (Blade Runner 2049)

The original Blade Runner showed us that two replicants can fall in love. This makes sense, because a replicant is almost indistinguishable from a naturally-born human. Made with the same biological building blocks, they should have the capability for the same emotions as humans.

Blade Runner 2049’s main character, the more advanced model replicant K, has a different love interest: a “virtual” holographic girlfriend by the name of Joi. Can the human emotion of love exist between two such entities? I argue that it can.

Joi is represented in advertisements as a highly sexualized virtual girlfriend made by the Wallace Corporation, where the client gets to “hear what you want to hear”, and “see what you want to see”. The audience first meets K’s version of Joi when he returns home (the residents of his shoddy apartment block are happy to discriminate openly against replicants, and shout slurs at him as he passes). Joi brings him some simple cheerfulness and makes his dinner look more appetizing through a hologram. You can imagine that the new dress she is showing off is nothing more than an “in-app purchase” put there by Wallace Corp. to better monetize their product. It’s clear that her appeal also inspires K to spend his recent bonus on an expensive addon “emanator” which will let him take the Joi hologram outside of his home. This leads to a virtual kiss in the rain scene, which gets interrupted when K gets an incoming call: he switches off the hologram as if it were nothing to him.

Once K starts tracking down the lost replicant child, it’s clear that the Wallace Corporation is uncannily aware of his movements and the status of the investigation. They are using their link through Joi to watch him. Up to this point, it seems that Joi has not gone beyond a computer program designed to press a customer’s emotional buttons in exchange for money. (Not much different than many products we have today: social networks, freemium games, lootboxes, etc).

However, soon we hit a turning point: K fails his “baseline”; normally the consequence of this is immediate death. He convinces his boss to give him one more chance, and returns home with the intention to run away and continue looking for the lost child. Joi offers to go with him, and instructs him to upload her memories into his portable emanator, and then to destroy the antenna by which they may track him. As soon as K snaps the antenna, the Wallace Corporation springs into action, proving the point that they were using the link to watch him. This is the first sign that Joi actually feels love for K. She is willing to take a personal risk: with her consciousness uploaded, she would lose all of her memories if the storage device was destroyed. It appears that this has great personal importance to both her and K.

Joi provides K with emotional support, as he flies to Las Vegas to meet with Deckard. When the Wallace Corporation finally catches up with them, the antagonist sees Joi, and goes to stomp her foot down on the emanator. Joi’s last words to K are “I love you”. K himself appears unable to know how to process this loss.

In the end, Joi’s words are reinforced by her actions. She may have been synthetic, but she acted on her feelings towards K. Her decision to upload herself into the emanator and destroy the antenna prioritized her and K’s needs over the needs of her creators. It is the same decision that many young adults would face in the same situation: to act not in the ultimate interest of themselves, or their parents, but to act selflessly for another being. And is that not love?

27 May, 2022

sensepeek Oscilloscope Probe Review

I recently purchased a sensepeek Oscilloscope Probe kit, and wanted to share an honest review.

The following review is written with no affiliate links / financial motivations, and I purchased the kit with my own money.

This kit is an essential part of my electronics workflow. It allows you to safely and sturdily attach a logic analyzer or 100/200 MHz probe to any testpoint, or SMD part lead, while keeping your hands free.

The kit comes with three main pieces:

A metallic baseplate
- It now ships with a stick-on cover to make it non-conductive, but one side is also polished, which you can use to see the bottom side of your board.
PCBite mounting posts which attach magnetically to the baseplate
- They also have a smooth teflon bottom, so they are easy to slide and re-adjust.
Probes and Probe Holders
- These are similar to the “helping hands” kind, except less stiff. This actually helps the weight of the probe rest down on your testpoint and make a better connection.

Mounting Examples

All of the sensepeek probes work the same way: there is a tiny, spring-loaded gold needle that can rest against a PCB test point. The weight of the supplied mounting “gooseneck” is actually perfect for applying some pressure on the pin. I found that it was very easy to adjust the gooseneck to come around from the proper side.

The connection formed is quite stable, so you can usually plug or unplug a connector on the board, and it won’t come undone.

A small circuit board mounted with the PCBite posts.

The SP200 probe has a spring loaded gold needle for probing your circuit.

Each probe comes with a flexible gooseneck that allows you to position it onto a test point, and then drop some weight on the probe tip in order to make a good connection.

An example probing a TSOP65P640X110-16N package.

Signal Examples

Overall, performance on the 200 MHz probe is “good enough”. This is not a probe for capturing super high-speed signals. But most of the time you don’t need that; you just want to probe your I2C/SPI bus, or watch your FETs switching to see what is going on with your board.

If you want to squeeze a bit more performance out of the probes, they have some solder pads where you can attach a shorter, low-impedance ground path.
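For a rough sense of what “good enough” means here, a common rule of thumb says a system with a single-pole response has a 10-90% rise time of about 0.35 divided by its bandwidth. The sketch below applies that rule, and the usual root-sum-of-squares estimate for how much the probe slows a displayed edge. The signal edge times are illustrative guesses of mine, not measurements, and the scope's own bandwidth is ignored.

```python
import math


def rise_time_ns(bandwidth_mhz: float) -> float:
    """Approximate 10-90% rise time for a single-pole response: 0.35 / BW."""
    return 0.35 / (bandwidth_mhz * 1e6) * 1e9


def displayed_rise_time_ns(signal_rise_ns: float, probe_bw_mhz: float) -> float:
    """Root-sum-of-squares estimate of the edge you would see on screen."""
    return math.hypot(signal_rise_ns, rise_time_ns(probe_bw_mhz))


probe_bw_mhz = 200.0  # nominal bandwidth of the probe discussed above
print(f"probe rise time: {rise_time_ns(probe_bw_mhz):.2f} ns")

# Hypothetical edge speeds, chosen only to show the trend.
for name, edge_ns in [("slow bus edge", 100.0), ("fast logic edge", 5.0), ("very fast edge", 1.0)]:
    shown = displayed_rise_time_ns(edge_ns, probe_bw_mhz)
    print(f"{name}: real {edge_ns:.1f} ns -> displayed about {shown:.2f} ns")
```

The slower the real edge is compared with the probe's own rise time, the less the probe distorts it, which is why bus-level debugging is comfortable at this bandwidth.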

Yellow is an R&S RT-ZP03S, green is the SP200.

Overall, I’m very satisfied, the SP200 is now my default probe when bringing up a new electronics board. If I need to see a higher bandwidth signal, I can always start with the SP200 and connect a traditional passive probe later.

Additional Source: SP200 probe specs on xDevs