Just as the art world is filled with wildly divergent opinions about what makes a great work of art, programmers often disagree upon what makes for great code, at least beyond the basic requirement that it shouldn’t crash.
Every developer has their own set of rules and guidelines. When a developer says not to do something, it’s probably because they did it once and failed badly. But new issues can arise when we overcompensate for a mistake by running in the opposite direction. Say your team dodges the x trap by choosing y instead, but it turns out that y has its own issues, leading to yet another long lost weekend.
The good news is, you can learn from both the original mistake and the overcompensation. The best path to nirvana is often the middle one. In this article, we look at some of the most common programming mistakes, as well as the dangers involved in doing the opposite.
Playing it fast and loose
Ignoring basics is one of the easiest ways to produce unstable, crash-prone code. Maybe this means ignoring how arbitrary user behavior could affect your program. Will the input of a zero find its way into a division operation? Will submitted text always be the right length? Are your date formats following the correct standard? Is the username verified against the database? The tiniest mistake can cause software to fail.
One way around this is to lean on the language’s error-catching features. A developer who likes to play it fast and loose might wrap their entire stack in one big catch for all possible exceptions. They’ll just dump the error into a log file, return an error code, and let someone else deal with the mess. No sweat, right?
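A minimal Python sketch of the contrast (both functions are invented for illustration): the first swallows every failure behind one broad catch, while the second handles only the error it actually anticipates.

```python
import logging

def careless_divide(a, b):
    """The fast-and-loose approach: swallow every error and move on."""
    try:
        return a / b
    except Exception as exc:  # one big catch for all possible exceptions
        logging.error("something went wrong: %s", exc)
        return -1  # an error code someone else has to interpret

def careful_divide(a, b):
    """Handle only the failure we anticipate, and be explicit about it."""
    if b == 0:
        raise ValueError("denominator must be nonzero")
    return a / b
```

The first version never crashes, but it also converts every bug into a mysterious `-1` in a log file; the second makes the failure visible at the point where it can still be understood.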
Obsessing over details
Some say that a good programmer is someone who looks both ways when crossing a one-way street. But, like playing it fast and loose, this tendency can backfire. Software that is overly buttoned up can slow your operations to a crawl. Checking a few null pointers may not make much difference, but some code is just a little too nervous, checking that the doors are locked again and again so that sleep never comes. No processing gets done in a system like this because the code gets lost in a labyrinth of verification and authentication.
The challenge is to design your layers of code to check the data when it first appears and then let it sail through. Sure, there will be some mistakes as a result, but that’s what error checking is for.
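A hedged sketch of that layering in Python, with invented field names: one function validates at the boundary, and the inner layers trust what they receive instead of re-checking at every step.

```python
def validate_order(raw: dict) -> dict:
    """Check the data once, at the edge of the system."""
    qty = int(raw["quantity"])
    if qty <= 0:
        raise ValueError("quantity must be positive")
    name = raw["username"].strip()
    if not (1 <= len(name) <= 64):
        raise ValueError("username must be 1-64 characters")
    return {"username": name, "quantity": qty}

def process_order(order: dict) -> int:
    # Inner layers trust the validated input -- no repeated checks.
    return order["quantity"] * 2
```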
Too much theoretical complexity
Some programmers embrace the study of algorithms. They enjoy designing complex data structures and algorithms because they want to build the most efficient stack possible. Each layer or library must be perfect.
It’s a nice impulse, but in many cases the end result is a huge application that consumes too much memory and runs like molasses in January. In theory, it will be fast, but you won’t see that speed until there are 100 billion users with 50 million documents per user.
Much of algorithmic theory focuses on how well algorithms and data structures scale. The analysis only really applies when the data grows large. In many cases, the theory doesn’t account for how much code it takes to shave off some time.
In some cases, the theory elides crucial details. One of the biggest time sinks is fetching data either from the main memory or worse from a database in the cloud. Focusing on the practical issues of where the data is stored and when it is accessed is better than an elaborate data structure.
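To make the point concrete, here is an illustrative Python sketch in which a fake data store counts round trips; the class and its methods are invented stand-ins, not a real database API. Batching the fetches matters far more than the cleverness of the in-memory structure holding the results.

```python
class FakeDatabase:
    """Stands in for a remote store; counts round trips for illustration."""
    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def fetch_one(self, key):
        self.round_trips += 1          # one round trip per item
        return self.rows[key]

    def fetch_many(self, keys):
        self.round_trips += 1          # one round trip for the whole batch
        return {k: self.rows[k] for k in keys}

db = FakeDatabase({i: i * i for i in range(100)})
one_by_one = [db.fetch_one(k) for k in range(100)]   # 100 round trips

db2 = FakeDatabase({i: i * i for i in range(100)})
batched = db2.fetch_many(range(100))                 # 1 round trip
```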
Not enough theoretical complexity
The flip side of getting bogged down in programming theory is ignoring the theoretical side of a data structure or algorithm. Code written this way might run smoothly on the test data but get bogged down at deployment time, when users start shoving their records into the system.
Scaling well is a challenge and it is often a mistake to overlook the ways that scalability might affect how the system runs. Sometimes, it’s best to consider these problems during the early stages of planning, when thinking is more abstract. Some features, like comparing each data entry to every other one, are inherently quadratic: double the data and the work quadruples. Dialing back on what you promise can make a big difference.
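A small Python illustration of the difference: detecting duplicates by comparing every pair of entries is quadratic, while a single pass over a hash set is linear on average. Both versions below are toy examples that return the same answer.

```python
def find_duplicates_quadratic(entries):
    """Compare each entry to every later one: O(n^2) comparisons."""
    dupes = set()
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            if a == b:
                dupes.add(a)
    return dupes

def find_duplicates_hashed(entries):
    """One pass with a set: O(n) on average."""
    seen, dupes = set(), set()
    for e in entries:
        if e in seen:
            dupes.add(e)
        seen.add(e)
    return dupes
```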
Thinking about how much theory to apply to a problem is a bit of a meta-problem, because complexity compounds quickly. Sometimes the best solution is careful iteration with plenty of time for load testing. Donald Knuth’s old maxim states that “premature optimization is the root of all evil.” Start with a basic program, test it, and then fix the slowest parts.
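The measure-then-fix loop can be sketched with Python’s standard profiler; the two functions here are toy stand-ins for real hot spots, and the report points straight at the expensive one.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # The deliberate hot spot: sum of squares computed the long way.
    return sum(i * i for i in range(n))

def fast_part(n):
    return n + 1

def program():
    return slow_part(200_000) + fast_part(3)

profiler = cProfile.Profile()
profiler.enable()
result = program()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()  # ranks functions by time; slow_part dominates
```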
Too much faith in artificial intelligence
We are in a moment where it is becoming clear that AI algorithms can deliver amazing results. The output is shockingly realistic and better than expected. Many believe that the age of the sentient computer has arrived.
AI sometimes provides data that is incredibly useful. Programmers have swapped search engines for large language models because they can’t stand all the ads and “amplified” features created by humans. They mistrust human interference and put their faith in machine learning.
It’s important, though, to recognize exactly what algorithms can do and how they work. Machine learning systems analyze data, then build an elaborate function that imitates it. They’re like clever parrots when delivering text. The problem is that they are programmed to deliver everything with the same confident authority, even when they’re completely wrong. At worst, an AI can be terribly wrong and not realize it any more than we do.
Not enough training data
An artificial intelligence model is only as good as its training data. Now that machine learning algorithms are good enough for anyone to run on a whim, programmers are going to be called to plug them into the stack for whatever project is on deck.
The problem is that AI tools are still spooky and unpredictable. They can deliver great results, and they can also make huge mistakes. Often the problem is that the training data isn’t sufficiently broad or representative.
A “black swan” is a scenario that wasn’t covered by the training data. Such events are rare, but they can completely confound an AI. When an event isn’t found in the training data, the AI may produce a random answer.
Gathering data is not what programmers are typically trained to do. Training an artificial intelligence model means collecting and curating data instead of just writing logic. It’s a different mindset than we are used to, but it’s essential for creating trustworthy AI models.
Trusting your security to magic boxes
Worried about security? Just add some cryptography. Don’t worry, the salesman said: It just works.
Computer programmers are a lucky lot. After all, computer scientists keep creating wonderful libraries filled with endless options to fix what ails our code. The only problem is that the ease with which we can leverage someone else’s work can also gloss over complex issues or, worse, introduce new pitfalls into our code.
The world is just beginning to understand the problem of sharing too much code in too many libraries. When the Log4j bug appeared, many managers were shocked to find it deeply embedded in their code. So many people had come to rely on the tool that it turned up inside libraries that were inside other libraries, all included in code running as a standalone service.
Sometimes, the problem is not just in a library but in an algorithm. Cryptography is a major source of weakness here, says John Viega, co-author of 24 Deadly Sins of Software Security: Programming Flaws and How to Fix Them. Far too many programmers assume they can link in the encryption library, push a button, and have iron-clad security.
The National Institute of Standards and Technology, for instance, recently announced that it is retiring SHA-1, an early standard for constructing a message hash. Enough weaknesses have been found that it’s time to move on.
The reality is that many of these magic algorithms have subtle weaknesses. Avoiding them requires learning more than what’s in the “quick start” section of the manual.
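As a small illustration in Python, swapping a retired hash for a current one is often a one-line change when you stick to a vetted library such as the standard hashlib:

```python
import hashlib

message = b"hello world"

# SHA-1 is being retired; practical collisions have been demonstrated.
legacy = hashlib.sha1(message).hexdigest()

# Prefer a current standard such as SHA-256 from a vetted library.
digest = hashlib.sha256(message).hexdigest()
```

The harder part, as the text suggests, is knowing *why* the swap matters: which properties your application needs, and which algorithms still provide them.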
You may not be able to trust other people, but can you really trust yourself? Developers love to dream about writing their own libraries. But thinking you know a better way to code can come back to haunt you.
“Grow-your-own cryptography is a welcome sight to attackers,” says John Viega, noting that even the experts make mistakes when trying to prevent others from finding and exploiting weaknesses in their systems.
So, whom do you trust? Yourself or so-called experts who also make mistakes?
We can find the answer in risk management. Many libraries don’t need to be perfect, so grabbing a well-maintained magic box is usually a better bet than code you write yourself. The library’s routines are written and optimized by a group. Its members may make mistakes, but the larger process will eliminate many of them.
Too much trust in the client
Programmers often forget that they don’t have complete control over their software when it’s running on someone else’s machine. Some of the worst security bugs appear when developers assume the client device will do the right thing. For example, code shipped to run in a browser can be rewritten by the user to take any arbitrary action. If the developer doesn’t double-check all of the data coming back, anything can go wrong.
One of the simplest attacks relies on the fact that some programmers just pass the client’s data to the database, a process that works well until the client decides to send along SQL instead of a valid answer. If a website asks for a user’s name and adds the name to a query, the attacker might type in the name x; DROP TABLE users;. The database dutifully assumes the name is x, then moves on to the next command, deleting the table filled with all the users.
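The standard defense is a parameterized query, which treats the client’s text as data rather than executable SQL. A sketch using Python’s built-in sqlite3 module (the table and the hostile input are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

hostile_name = "x'; DROP TABLE users; --"

# Unsafe: string concatenation lets the client's text become SQL.
# query = "SELECT * FROM users WHERE name = '" + hostile_name + "'"

# Safe: the placeholder binds the input as a value, never as SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (hostile_name,)
).fetchall()
# The hostile string matches nothing, and the users table survives.
```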
Clever people can abuse the trust of the server in many more ways. Web polls are invitations to inject bias. Buffer overruns continue to be one of the simplest ways to corrupt software.
To make matters worse, severe security holes can arise when seemingly benign holes are chained together. One programmer may allow the client to write a file, assuming that the directory permissions will stop any wayward writing. Another may open up the permissions just to fix a random bug. Alone there’s no trouble, but together, these coding decisions can hand over arbitrary access to the client.
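One way to close that kind of chained hole is to validate the path itself instead of trusting directory permissions to catch wayward writes. A hedged Python sketch, with a hypothetical upload directory:

```python
import os

UPLOAD_DIR = "/srv/uploads"  # hypothetical directory for illustration

def safe_path(filename: str) -> str:
    """Reject path traversal instead of trusting directory permissions."""
    candidate = os.path.normpath(os.path.join(UPLOAD_DIR, filename))
    if os.path.commonpath([UPLOAD_DIR, candidate]) != UPLOAD_DIR:
        raise ValueError("filename escapes the upload directory")
    return candidate
```

With this check in place, a later loosening of directory permissions no longer hands the client a write-anywhere primitive.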
Not enough trust in the client
Too much security can also lead to problems. Maybe not gaping holes but general trouble for the entire enterprise. Social media sites and advertisers have figured out that too much security and intrusive data collection may discourage participation. People either lie or drop out.
Too much security can corrode other practices. Just a few days ago, I was told that the way to solve a problem with a particular piece of software was just to chmod 777 the directory and everything inside it. Too much security had gummed up the works, leaving me to loosen strictures just to keep everything running.
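The less corrosive alternative is least privilege: grant exactly the access the software needs rather than opening everything to everyone. A Python sketch of the narrower fix (mode 750 instead of 777), using a temporary directory for illustration:

```python
import os
import stat
import tempfile

path = tempfile.mkdtemp()

# The blunt fix: everyone can read, write, and traverse.
# os.chmod(path, 0o777)

# Least privilege: owner has full access, the group can read and
# traverse, and everyone else gets nothing (mode 750).
os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP)

mode = stat.S_IMODE(os.stat(path).st_mode)  # 0o750 on POSIX systems
```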
Because of this, many web developers are looking to reduce security as much as possible, not only to make it easy for people to engage with their products but also to save them the trouble of defending more than the minimum amount of data required. One of the latest trends is to get rid of passwords altogether. People can’t keep track of them. So to log in, the websites send a single-use email that’s not much different from a password-reset message. It’s a simpler mechanism that’s ultimately just about as secure.
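A minimal sketch of such a single-use login token in Python, using the standard secrets module; the in-memory dict and the 15-minute lifetime are assumptions, and a real system would persist tokens server-side and deliver them by email.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 15 * 60   # assumption: links expire after 15 minutes
_pending = {}                  # token -> (email, expiry); a DB in real life

def issue_login_token(email: str) -> str:
    token = secrets.token_urlsafe(32)   # unguessable random token
    _pending[token] = (email, time.time() + TOKEN_TTL_SECONDS)
    return token                        # emailed to the user as a link

def redeem_login_token(token: str):
    # pop() enforces single use: a second redemption finds nothing.
    email, expires = _pending.pop(token, (None, 0))
    if email is None or time.time() > expires:
        return None
    return email
```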
My book, Translucent Databases, describes a number of ways that databases can store less information while providing the same services.
Closing the source
One of the trickiest challenges for any company is determining how much to share with software users.
John Gilmore, a co-founder of one of the earliest open source software companies, Cygnus Solutions, says the decision to not distribute code works against the integrity of that code. Distribution is one of the easiest ways to encourage innovation and, more importantly, uncover and fix bugs:
A practical result of opening your code is that people you’ve never heard of will contribute improvements to your software. They’ll find bugs and attempt to fix them; they’ll add features; they’ll improve the documentation. Even when their improvement has been amateurishly done, a few minutes of reflection will often reveal a more harmonious way to accomplish a similar result.
The advantages run deeper. Often the code itself grows more modular and better structured as others recompile it and move it to other platforms. Just opening up the code forces you to make the information more accessible, understandable, and thus better. As others make small tweaks to share the code, they feed the results back into the code base.
Openness as a cure-all
Millions of open source projects have been launched, and only a tiny fraction have ever attracted more than a few people to help maintain, revise, or extend the code. In other words, W.P. Kinsella’s “if you build it, they will come” doesn’t always produce practical results.
While openness makes it possible for others to pitch in and thus improve your code, the mere fact that it’s open won’t do much unless there’s an incentive for outside contributors to put in the work. Passions among open source proponents can blind some developers to the reality that openness alone doesn’t prevent security holes, eliminate crashing, or make a pile of unfinished code inherently useful. People have other things to do, and an open pile of code often competes with paid work.
Opening up a project can also add new overhead for communications and documentation. A closed-source project requires solid documentation for users, but an open source project also requires documenting the API and road maps for future development. This extra work pays off for large projects, but it can weigh down smaller ones.
Too often, code that works some of the time is thrown up on GitHub with the hope that the magic elves will stop making shoes and rush to start up the compiler—a decision that can derail a project’s momentum before it truly gets started.
Apple’s Goto Fail bug and the Log4j vulnerability are just two good examples of where errors hid in plain sight for years. The good news is that someone found them eventually. The bad news is that none of us know what hasn’t been found yet.
Opening up the project can also strip away financial support and encourage a kind of mob rule. Many open source companies try to keep some proprietary features within their control; this gives them leverage to get people to pay to support the core development team. Projects that rely more on volunteers than paid programmers often find that volunteers are unpredictable. While wide-open competitiveness and creativity can yield great results, some flee back to closed-source projects, where structure, hierarchy, and authority support methodical development.
Copyright © 2023 IDG Communications, Inc.