I went to GitHub Universe in October, and I was surprised by how heavily GitHub and Microsoft are leaning into AI. I expected it to be a feature, but they’re weaving it into their offerings at every level. In fact, they just announced that Microsoft keyboards are going to have a physical Copilot button! At Universe, GitHub said that this was the year they refounded themselves, based on Copilot.
In that environment, AI is not the next thing coming; it’s not a track, it’s part of every track and every product. And given how deeply GitHub penetrates the market, there’s no hiding from it. There are a lot of good parts, too. You can ask Copilot to help you generate a regex for very complicated patterns. We used to have to look them up and experiment, but now there’s enough of a base of input that Copilot can generate a tuned regex for exactly what you need.
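For example, here’s the kind of tuned pattern I mean. The ISO 8601 timestamp regex below is my own illustration of the category, not actual Copilot output, but it’s exactly the sort of thing we used to look up and experiment with:

```python
import re

# A pattern of the "look it up and experiment" variety: match ISO 8601
# timestamps like "2023-11-08T14:30:00Z" or ones with a numeric offset.
ISO_8601 = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"  # date: YYYY-MM-DD
    r"T([01]\d|2[0-3]):[0-5]\d:[0-5]\d"              # time: HH:MM:SS
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)$"              # UTC "Z" or +/-HH:MM offset
)

print(bool(ISO_8601.match("2023-11-08T14:30:00Z")))   # valid timestamp
print(bool(ISO_8601.match("2023-13-08T14:30:00Z")))   # month 13 is rejected
```

Getting the alternations right for month 13 or hour 24 is the fiddly part, and it’s also exactly where you need to be able to read the result and verify it before trusting it.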
You can also use Copilot to help you write more detailed pull request descriptions. Programmers are sometimes lazy about that, so if we can make descriptions easier to write, we end up with something easier to read and review, too. That might be especially useful in an open-source or community-centered codebase, where there is no manager who can enforce pull request standards with authority.
Copilot Enterprise may also be trained just on your internal code, so if you have a relatively clean codebase, you can stay aligned with it, instead of importing possibly-dubious practices from the open-source code Copilot is trained on that might not work for more secure or enterprise-focused development. If your codebase has Fortran or COBOL, you want to respect that existing code and the structures your organization has built around it. Your standards become self-reinforcing.
GitHub kept saying AI, and I think it’s important that we’re really clear on what we’re talking about. What most people are using the word “AI” for is actually something like ChatGPT, which is a Large Language Model, and they’re not quite the same. And neither of them is the same as an Artificial General Intelligence, which is the sci-fi kind of I, Robot thing. People having conversations with an artificial intelligence that understands them, instead of just statistically modeling the right response, is a long way from where we are.
We are not there yet.
LLMs are very powerful, but they’re not smart. They don’t understand. They pattern match. I sometimes say AI, because that’s the language we’re all using, but I don’t think we have any AI yet, in the magical sense.
Copilot isn’t full automation. It’s an assistant, not a replacement. You still need to be a subject matter expert in the business problem you’re trying to solve, because you need to be able to spot wrong or damaging solutions. AI/LLM is statistics and word association. It can detect patterns and match them, but if the pattern is not correct, then you have a problem. And you can’t tell if the pattern is incorrect if you don’t know what it should look like.
You still need to know how to fix the pattern, or how to solve the whole problem, not just patch the parts. For someone like me, who’s a subject matter expert in Terraform and shell scripting, Copilot is useful for quickly generating CRUD code: I can ask for quick code, and when I look at it, I can see the parts that work and the parts I don’t like. So I can take it as a starting point and fix it. I generally don’t bother trying to correct the LLM unless the answer is totally off and I need a better one, because that’s a distraction from what I’m trying to do. I usually just want a starting point, or a template.
But I worry about newer developers, who may rely on LLMs to generate their code -- how will they be at debugging the code, since they don’t always know which lines are causing which action? What if ChatGPT or OpenAI shuts down, like lots of new apps shut down? The LLM hides the process, and you don’t know what the process is or how to repeat it because you’ve never done it by hand.
Think of it like a cashier. Their cash register calculates the sales tax, and knows what it is in that exact municipality, and whether or not clothing is taxed. If the power goes out, they may have no way to calculate the sales tax correctly. Someone else has set that calculation, and they may not even know what the tax rate is. Newer devs may end up in the same place, where they are perfectly competent, until their tool is unavailable.
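The calculation itself is trivial once you know it. Here’s a minimal sketch of what the register is doing for the cashier; the 8.25% rate is a made-up rate for a hypothetical municipality:

```python
from decimal import Decimal, ROUND_HALF_UP

def sales_tax(subtotal: Decimal, rate: Decimal) -> Decimal:
    """What the register does for the cashier: tax, rounded to the cent."""
    return (subtotal * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical municipality with an 8.25% rate:
print(sales_tax(Decimal("19.99"), Decimal("0.0825")))  # 1.65
```

Knowing that this is multiply-and-round, even if the register usually does it for you, is the kind of fallback knowledge I mean.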
I don’t fully trust that technology will always be working, so I like to know how to do things myself, even if I don’t do them most of the time. I at least want to be able to look something up!
Before you can automate a process, you need to know how the process is done manually. Then you can automate it. Then you can try to refine it or make it more efficient. You can’t do all of those things at the same time; it’s not going to work.
In the same way, we can’t trust LLMs/AI through our whole stack without understanding what we’re automating.
There are things that automation and AI will be great at. If it does a vulnerability scan of your system and finds issues where the remediation path is pretty obvious, we could just agree to fix them automatically. Or you can choose not to apply those fixes, because maybe you know better than GitHub what you need to keep stable. I think this is one of those situations where, if you have proper testing, it may not be a bad idea to just have AI apply the fix.
But you really do need to review the code before the fix goes in unchecked. I don’t think they’ll deliberately introduce anything, but it might interact strangely with your codebase if you don’t understand what the fix is doing.
GitHub has a lot of Actions and other add-ons from third-party companies, and I think if the main thing an add-on does is something Copilot can do, those companies need to be thinking about their business plan and long-term sustainability. Microsoft has so much software it can draw on to add Copilot value. Word? Copilot now. Excel? Copilot! They already have the market base to do that. I don’t think Microsoft/GitHub will get into really complex things, but if they already have it, they might as well roll it out with Copilot.
When I came back from GitHub Universe, I started thinking about what OpenContext needed to do to respond to this big move from GitHub.
I think we need to start investigating what we can detect about generative AI, or the use of generative AI, or the way models are used. The point of OpenContext is not to track all the things, but to track the relationships between things that matter.
For example, if the majority of your codebase is Python, but you’re also using generative AI because you have a call out to OpenAI or Hugging Face, maybe we don’t need to do much about it, but I think highlighting it and showing it to the user would be useful, so that everyone knows.
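To make that concrete, here’s a minimal sketch of what that kind of detection could look like. This is not OpenContext’s actual implementation, and the watchlist of package names is just a set of common examples I picked for illustration:

```python
import ast
from pathlib import Path

# Hypothetical watchlist: packages whose import suggests a call out to a
# generative-AI provider. Real tooling would need a much broader net.
GENAI_PACKAGES = {"openai", "anthropic", "huggingface_hub", "transformers"}

def genai_imports(source: str) -> set[str]:
    """Return which watched packages a Python module imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found |= {n.split(".")[0] for n in names
                  if n.split(".")[0] in GENAI_PACKAGES}
    return found

def scan(repo: Path) -> dict[str, set[str]]:
    """Map each Python file in a repo to the generative-AI packages it touches."""
    return {
        str(path): hits
        for path in repo.rglob("*.py")
        if (hits := genai_imports(path.read_text(errors="ignore")))
    }
```

Even something this crude surfaces the relationship that matters: this file, in this mostly-ordinary codebase, talks to a model provider.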
Right now we are having a lot of copyright fights about what these LLMs can scrape. But we are going to have audits soon to make sure we’re in compliance with whatever rules we settle on. If you are regulated or responsible, you need to be able to point at what model you’re using, what data you’re feeding it, where the data came from, and whether data is being sent back, either raw or as results. It’s not going to be acceptable to ship a commercial product if you don’t have at least some provenance, and the more sensitive the product is, the more precisely you’re going to need to pinpoint what is happening with the LLM.
OpenContext can’t track all of that. Right now, no one can track all of it due to the complexity. But we think we could at least help track some of the interaction points and show them. This could help you create guardrails.
I didn’t think OpenContext was going to have to work on that this year, and now I feel like it’s going to get urgent pretty quickly.
It’s not always easy to see whether a human, a bot, or an automated process created content or data. Being able to track that means you can apply different kinds of guardrails and standards to it. Think of it like automated testing. Automated testing can’t and won’t catch everything, but it will catch a lot of things before you break production. That’s why we require automated testing for most code deployment pipelines. If we’re using AI to generate code multiple times faster, we need to upgrade the guardrails on that pipeline. It’s obvious to us that people can’t or shouldn’t approve their own PRs for production code, so why would we let AIs do it?
Also, our data scientists are going to be moving in ways they’re not used to. They used to be able to treat production models as static, but now, if you’re feeding user data back into the model and deploying the result, data scientists are going to have to get used to being on call. I’m not sure they’re ready for that!
It’s time to stop talking about what will happen when LLMs show up in our workplaces and codebases. They’re already here. They’ve been here for a while - every time a sentence or a command line autocompletes, that’s the technology that underlies it. What’s going to become important is using the technology in a way that’s safe, auditable, and trustworthy. I think OpenContext is going to have a lot of opportunities to help people map and understand their internal systems, and how those systems interact with LLMs and external systems.