Signal and Noise

According to conceptually.org, the general idea of signal and noise is defined simply as:

The signal is the meaningful information that you’re actually trying to detect. The noise is the random, unwanted variation or fluctuation that interferes with the signal.

In the context of security testing, this is quite interesting, as two people can look at the same thing and conclude entirely different things about it. The difference comes from each tester’s experience in testing as well as their background before testing. At a minimum, a certain amount of knowledge is needed to know what you are looking at; otherwise everything is noise. But even with the requisite knowledge to start making sense of what you see, what signal are you looking for? Do you even know? In the immortal words of David Allen, “you may not always think about your values, but you always think inside them.”

I’ve written a fair bit about reverse engineering websites, and I find myself reflecting on why this topic matters so much to me. At the foundational level, it is nearly a requirement for performing qualified tests against systems you don’t have source code for. But if you were given source code, would you still attempt it? Many activities would be significantly reduced because of your immediate visibility into the underlying architecture. And while this access gives you a deeper understanding with far less guesswork… it is just one of many perspectives.

The truth is that access to source code really provides only one type of signal. It gives you a better sense of boundaries and truth, but it doesn’t immediately get you to what you are actually trying to detect. Source code is a shell of an idea that only really exists in its actual deployment and operation. There is far more to the execution of a website than what the code body would tell you directly. First, the code itself exists inside a framework, so while it may only use some portion of the framework, it is still subject to the behaviors and rules of that framework. The same is true of the servers that host sites, the browsers that interpret the HTML/JS interplay, and any other technologies used to host, log, or route people to the site. And even once you get past that, there is the data added by users and by products, and then the interactions between wildly different character sets supported by frameworks… it all leads you to a very different beast.

But let’s put all that aside as well. Modeling the structure, or even the interactions between features, is just yet another perspective. I have met and known many testers who aren’t all that interested in that type of modeling. It is helpful, but if we are being honest, it doesn’t scale very well. Instead, they focus on broader attack surface identification and apply specific testing methods that can sort out and identify vulnerabilities based on probability. You won’t model 100 websites quickly, but you could identify them and hit them with a variety of tests that give you leads (signal) out of a world of noise. This thinking is likely also how many drive-by attackers work, except they focus on leveraging CVEs or some other type of known-vulnerability pattern. Add to that the timing of new bug disclosures measured against time to patch, and you have a very valuable, scalable testing approach.
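To make the "leads out of noise" idea a little more concrete, here is a minimal sketch of how broad, cheap checks across many sites might be ranked by probability. Everything in it is an assumption for illustration: the check names, the weights (an assumed chance that a hit is a real lead), and the example hosts are all hypothetical, and combining the weights with a simple noisy-OR is just one plausible choice, not a description of how any particular tester or tool works.

```python
from collections import defaultdict

# Hypothetical checks, each with an assumed chance that a hit is a real lead.
CHECK_WEIGHTS = {
    "verbose_error_page": 0.30,
    "outdated_server_banner": 0.20,
    "reflected_input": 0.50,
}

def rank_targets(observations):
    """Rank targets by the chance that at least one of their hits is real.

    observations: iterable of (target, check_name) pairs, one per check hit.
    Hits are combined with a noisy-OR: 1 - product(1 - weight).
    """
    miss = defaultdict(lambda: 1.0)  # chance that every hit so far is noise
    for target, check in observations:
        miss[target] *= 1.0 - CHECK_WEIGHTS.get(check, 0.0)
    scores = {target: 1.0 - m for target, m in miss.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical results from running the checks against two sites.
observations = [
    ("site-a.example", "verbose_error_page"),
    ("site-a.example", "reflected_input"),
    ("site-b.example", "outdated_server_banner"),
]

ranked = rank_targets(observations)
for target, score in ranked:
    print(f"{target}: {score:.2f}")
```

The point is not the numbers themselves but the shape of the approach: no single site is modeled in depth, yet the ranking still tells you where to look first.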

Neither approach is a catch-all. And further, they are just two perspectives among, I am sure, many. When you are evaluating a system to derive something meaningful, are you starting with intentions that align with your objectives? Or are you just doing what others said to do… without knowing why? There are no easy answers, and lately I feel like I just have more questions.