Under the Hood of Consumer AI: More Math Than Human, and Where Copyright Gets Blurry

How AI Works: Probabilities, Not Personalities

Modern consumer AI systems, from chatbots to code assistants, are not mini-brains with feelings or intent. They are large statistical models that perform enormous amounts of math extremely quickly. A popular family of models, called transformers, looks at sequences of data—usually text—and predicts which token (a word or piece of a word) is most likely to come next. Tools like Transformer Explainer show this visually: you type a prompt, and the model computes probability distributions over possible next tokens, step by step, based on patterns it learned during training. What seems like creativity is really sophisticated autocomplete driven by statistics and context tracking, not understanding or will. Recognising that AI is “more math than human” matters because it helps users treat outputs as probabilistic guesses, not authoritative answers or moral agents, and reduces the temptation to attribute human-like judgment or responsibility to a model.
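The "probability distribution over possible next tokens" idea can be sketched in a few lines. This is a toy illustration, not a real transformer: the raw scores (logits) and candidate tokens below are invented for demonstration, and a genuine model would compute them from billions of learned weights. All that matters here is the final step every such model shares: converting scores into probabilities with a softmax and picking (or sampling) the most likely continuation.

```python
import math

def softmax(logits):
    # Turn raw scores into a probability distribution.
    # Subtracting the max first keeps exp() numerically stable.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a model might assign to candidate next tokens
# after the prompt "The cat sat on the" (illustrative numbers only).
logits = {"mat": 4.0, "sofa": 2.5, "moon": 0.5}
probs = softmax(logits)

# The model's "choice" is simply the highest-probability token.
best = max(probs, key=probs.get)
```

Here the model "prefers" the token "mat" not because it understands cats, but because that continuation scored highest given the patterns in its training data.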

Why Training Data Matters for Everyday AI Use

To learn those patterns, AI models are trained on vast datasets of text, code, and other media. During training, the model adjusts millions or billions of internal weights so it can better predict the next token in different contexts. It does not store whole documents like a database; instead it compresses patterns across the data into numerical relationships. Still, what goes in strongly shapes what comes out. If the training data contains biased language, technical errors, or copyrighted material, echoes of those appear in generated outputs. Tools like Transformer Explainer highlight that a model’s behaviour is the product of this training: adjusting settings such as temperature changes how much it leans into unlikely tokens, illustrating how randomness and probability drive responses. For everyday users, this means AI can be incredibly useful for drafting, brainstorming, or explaining—but it can also be confidently wrong, reflect training-data biases, and sometimes resemble existing copyrighted content too closely.
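The temperature setting mentioned above can be shown concretely. This sketch uses the same made-up logits idea as before: dividing the scores by a temperature value before the softmax either sharpens the distribution (low temperature, near-deterministic output) or flattens it (high temperature, more randomness), which is exactly how the control "changes how much the model leans into unlikely tokens".

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Scale logits by 1/temperature before normalising:
    # low T sharpens the distribution, high T flattens it.
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores for three candidate tokens, most to least likely.
logits = [4.0, 2.5, 0.5]

cold = softmax_with_temperature(logits, temperature=0.2)  # near-greedy
hot = softmax_with_temperature(logits, temperature=2.0)   # more random
```

At low temperature almost all probability mass lands on the top token, so the model behaves predictably; at high temperature the unlikely tokens gain probability, producing more varied (and more error-prone) output.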

Malus.sh and AI Code Cloning: A New Kind of Risk

The Malus.sh controversy shows how AI code generation collides with software copyright. Malus.sh claims it can use AI to “liberate” existing open-source projects by recreating them from scratch, producing new code with corporate-friendly licenses and no copyleft obligations. The idea borrows from traditional “clean room” methods, where one team documents a program’s behaviour and a second team, which has never seen the original code, reimplements it. With generative AI, that workflow becomes faster and easier: a model can reproduce the functions of a library without directly copying its source. Malus.sh presents this in provocative language, blurring the line between satire and serious product, yet its founders say paying customers are already using it. For developers, this raises tough questions: if AI can generate a functionally similar library under a more permissive license, where does originality end and AI code cloning begin—and how much protection do existing licenses really provide?

Implications for Open Source, Companies, and Indie Devs

For open-source communities, AI code cloning tools feel like a potential end-run around the social contract of sharing. Projects released under copyleft licenses were designed to ensure improvements remained open; if an AI can produce a “from-scratch” MIT- or BSD-licensed clone that replicates the same behaviour, contributors worry their work can be commercially exploited without reciprocity or attribution. The debate intensified around a ground-up rewrite of the widely used Python library chardet, where an AI-assisted version received a zero-clause BSD license. Some see this as an inevitable outcome: once generative AI exists, it will be used to sidestep stricter licenses. For companies, that’s tempting but legally uncertain territory. For indie devs, it raises strategic choices—whether to adopt more permissive licenses, focus on services and support, or design projects where value lies beyond easily cloneable code.

Practical Guidance for Consumers and Hobbyists

Most everyday users won’t be running Malus.sh, but the same copyright and consumer AI ethics issues apply whenever you ask an AI to write text, images, or code. Safer use cases include: drafting blog posts, learning concepts, prototyping non-critical scripts, or generating personal-use content. Be more cautious when you: 1) ask an AI to “recreate” a specific proprietary or GPL-licensed project; 2) use AI-generated code in commercial products without review; or 3) rely on AI outputs that closely mirror a well-known work. When in doubt, treat AI as a collaborator, not an oracle: review and modify outputs, document how you used the tool, and respect original project licenses rather than trying to “wash” obligations away. If you plan to distribute or monetise AI-generated code or media, consult the relevant open-source or content licenses, and consider getting legal advice for high-stakes products.
