15 comments

  • becomevocal 21 minutes ago
    First thought was "only 30 tasks"; however, the findings map to what I've seen personally: code review consumes the majority of tokens.
  • gmerc 34 minutes ago
    It’s just like airline reward miles, and it offers no benefit to companies over just renting bare-metal GPU time.
    • emsign 21 minutes ago
      I hope this horrible time will soon be over, when cheaper NPUs become available from more hardware companies and model sizes get optimized down further.

      I wonder what hyperscaled compute farms and models will be good for at that running cost when most AI needs can be fulfilled by on-prem and on-device hardware and models. Probably the only customers left will be big governments. So in the end the taxpayer has to pay for those billions of investments by the AI cartel.

  • SubiculumCode 3 hours ago
    Reminded me of this paper from last year that tries to optimize token usage by providing budget guidance information. [1]

    [1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Stee...

  • sedatk 1 hour ago
    One month I could use GitHub Copilot fully with no disruptions. The next month, after pricing changes, I ran out of tokens in two days.

    Such drastic changes tell me that the pricing of tokens is arbitrary, and that the AI business is running out of money fast.

    • lucaspiller 1 hour ago
      I think it's more a consequence of pushing for the biggest valuation/IPO. Rumoured profits on inference are north of 70%.

      Taking SpaceX as an example, they have increased prices across all their consumer products over the past six months. But they definitely aren't short on money with Alphabet and Anthropic combined paying them over $2 billion per month.

      Microsoft/GitHub lost out here as they were just repackaging other people's products.

      • lefra 32 minutes ago
        Inference can only happen after having invested in training and datacenter construction. Arguing about "inference profitability" sounds a lot to me like ignoring large cost centers of these companies.
  • sakuraiben 4 hours ago
    One thing I've noticed using agents for coding is that they really like to write thousands of unit tests but not dynamically test.
    • drivebyhooting 4 hours ago
      And they like to burn a ton of tokens writing and debugging tests that are semantically corrupt.
    • esperent 1 hour ago
      Unit tests are a type of dynamic testing. As opposed to static testing which is linting/typechecking etc.

      If you want a different kind of dynamic testing besides unit tests, have you tried writing it in as a requirement during the planning/PRD phase?

    • gib444 3 hours ago
      And AWS heavily pushes a complex lambda solution stringing together as many chargeable AWS services as possible for a simple requirement

      Their interests are often not your interests. In this case they want you to spend unnecessary money on useless work (let's stop the euphemism of "tokens" btw).

    • make3 4 hours ago
      you can just tell them to do more dynamic testing. I think dynamic testing is partly frowned upon because it slows things down & can take down software where you wouldn't expect
  • emsign 27 minutes ago
    In its current iteration the AI tech market is not economically sustainable: not for the markets outside the AI economy and, most deadly, not even for its main target customers or the AI tech companies themselves. There have been several news reports of companies overspending their token budgets month after month. The hardware monopolist and his network of buddy companies can set the token price as freely as they want. There are no competitors; their only "competitor" is people ceasing to use AI altogether.
  • drivebyhooting 4 hours ago
    In the past Google et al would hire engineers based on how well they could optimize the infrastructure.

    Maybe soon companies will look at how engineers can optimize the token efficiency of AI.

    • Retric 3 hours ago
      That assumes Tokens will remain a meaningful expense. I’m not sure developers will find uses for ever more tokens nearly as quickly as the prices fall.
      • ares623 2 hours ago
        How are we so confident that prices will fall? Isn't the exact opposite happening, right now, during arguably the most critical part of this whole saga (pre-IPO to make things appear as beautiful and as not-obviously-illegal as possible)? And the only reason they were "falling" previously was for hyper growth.
        • jpatt 1 hour ago
          The Growth aspect mentioned is that VCs are subsidizing the bill right now, so it is hard to know if at the current moment the demand curve would promote as much usage without it, but assuming demand remained constant (not even growing), you could expect token prices to be competed down. It is a commodity without a moat.

          Now that we have pretty decent open source models, anyone can create a new business to supply more tokens. Sure there’s short term scarcity: energy, GPUs, cooling, but this is a scale up problem. More token demand = more data center build = more energy plant build. This downward pressure will also keep frontier private model prices in check.

          Differentiation seems to be happening at the harness level, whereby we can expect token spend to be a metric to compete on and drive down for the customer (at least hoping tools in the application space don’t continue token based billing as their primary revenue stream).

          These are not short term hyper growth forces, but a fundamental alignment of incentives.

        • fc417fc802 38 minutes ago
          In the one direction the hardware continues to improve, new buildouts continue to come online, and methods for improving the parameter efficiency of models continue to be discovered.

          In the other direction models continue to grow larger, new customers continue to arrive, and existing customers continue to find ever more creative ways to burn large quantities of tokens as the prices fall.

          I doubt anyone can say with certainty where the equilibrium will be 1 or 5 years from now largely because (among many other things) it's impossible to predict how much of the current economy AI will end up eating. In general though the third party providers of open weights models are probably the most reliable data source available since they have little to no incentive to subsidize usage.

        • Retric 1 hour ago
          I don’t think we can extrapolate from current API pricing, but dramatically improving hardware in terms of cost:performance is the underlying reality.

          Betting against that you need to assume exponentially more expensive models every year.

        • mobelkh 1 hour ago
          it is falling if you look elsewhere. deepseek made the 75% discount on their V4 models permanent. on one hand there are LLM improvements that make inference cheaper (e.g. MoE, hybrid attention), on the other hand we're getting more inference-focused chips that break the Nvidia monopoly.

          i don't think a lot of people know this, but a cluster of GPUs can serve multiple clients without much of a drop in performance. e.g. worst case scenario, you band together with 6-16 people to run a 2-3 H100 server to host deepseek V4 Flash, or 4-6 to run Pro, and you're getting the same performance as if you ran it alone. this means a lot of companies can afford throwing 50-100k into their own LLM server cluster.

          We're at a price point where if you push it further people will move. There's no real vendor lock-in: your agent config, skills, MCP servers etc. are all reusable with other models and harnesses, so unless you get all providers to collude on a price hike, you risk an exodus of customers.

  • senectus1 3 hours ago
    amusing side note:

    Was in a meeting reviewing a potential new product, it was going well until they showed us that they had added AI to it (of course they have). It was pretty obviously just shoehorned in, and one part of that obviousness was that they had a column that showed how many tokens it took to make each query.

    I asked who is paying for the tokens, they said it's included in the license. I said, so is there a budget or is it all you can eat. They said good question, they didn't know and would get back to me. I said the reason I asked was that just one query there had a 250k token burn on it, and it was a fairly simple query about one device.

    then, one of the execs on their side was heard saying out loud "Why are we even showing this to the customers?"

    it gave us quite a chuckle. But lesson learned... the cost of adding AI to anything isn't really being accounted for, let alone the true cost of actually running the AI.

    all things AI are going to get more expensive, even if you don't want the AI aspect.

  • satvikpendem 3 hours ago
    Tokenomics is already a word used to describe cryptocurrency economics, not sure why they'd try to redefine it for AI even if a different sort of token is used.
    • NewJazz 2 hours ago
      New fad. Forget about the old fad. This one will be old soon, you better get on board before it's too late!