blogs+posts/software-engineering/endpoint-protection/README.md
Lines changed: 4 additions & 3 deletions
@@ -70,7 +70,7 @@ In the very early days, several AV companies distributed file-hash updates via a
70
70
This was the decade that witnessed the PC and client/server revolution, so it should come as no surprise that it also saw a significant increase in the number and complexity of malware. Companies like McAfee and Norton emerged offering solutions based initially on hand-generated signatures and later on automated hash-based signature detection but, by the end of the decade, they had shifted to more advanced heuristic methods in order to detect malware at scale.
71
71
72
72
#### Hand-written Signatures
73
-
In the early days, signatures were hand-written to exactly identify a virus instance. The AV engines understood file formats and the signature contained instructions on what to look for inside the file. Thus began the game of cat-and-mouse played between malware authors and antimalware vendors, which involved narrowing the time-period from the point the virus was released into the wild to the time AV companies could generate and issue a new signature to detect it. Malware authors tried to evade detection by making the virus code more complex and thus harder to generate a signature for it or by using polymorphic code to constantly generate new versions of itself, albeit only slightly altered. Malware authors also tried to widen the playing field by using other types of files, other than executables, and developed viruses for Word Documents, Office macros, VBA Scripts, etc. Both executable malware and macro viruses became 'parasitic', which means they attached themselves to genuine executables and macros in order to hide and evade detection.
73
+
In the early days, signatures were hand-written to exactly identify a virus instance. The AV engines understood file formats, and the signature contained instructions on what to look for inside the file. Thus began the game of cat-and-mouse between malware authors and antimalware vendors, in which vendors tried to narrow the window between a virus being released into the wild and a new signature being generated and issued to detect it. Malware authors tried to evade detection by making the virus code more complex, and thus harder to write a signature for. Both executable malware and macro viruses became 'parasitic', which means they attached themselves to genuine executables and macros in order to hide and evade detection.
74
74
75
75
#### Hash-based Signatures
76
76
Hash-based signature detection works by calculating the hash of a given file, usually a binary file representing an executable. It uses a well-known hashing algorithm (e.g. MD5, SHA-256) to generate a fixed-length string that is, for practical purposes, unique to the file's contents. In turn, this acts as a key identifier for an executable file, regardless of what filename it was given or which other file attributes were changed. When a file gets scanned by AV software, either as part of a scheduled scan or on demand when the executable is invoked, the hash is calculated and this key is used to check whether the binary is known malware. This involves checking it against a database of known malware, usually shipped with the AV install and subsequently updated or, in some instances, checked using a remote lookup.
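
To make the lookup concrete, here is a minimal Python sketch of hash-based detection. It is an illustration only, not how any particular AV product is implemented: the names `KNOWN_MALWARE_HASHES`, `file_digest` and `is_known_malware` are hypothetical, and the "database" is an in-memory set seeded with a made-up sample.

```python
import hashlib

# Hypothetical stand-in for the signature database shipped with the AV install.
# A real product would load a far larger set from disk or a remote service.
KNOWN_MALWARE_HASHES = {
    hashlib.sha256(b"pretend this is a malicious binary").hexdigest(),
}


def file_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, database=KNOWN_MALWARE_HASHES) -> bool:
    """Check the contents' digest against the set of known-malware digests."""
    return file_digest(data) in database
```

Because only the contents are hashed, renaming the file does not change the verdict. Changing a single byte of the contents, however, produces a completely different digest and the sample is no longer recognised, which is the weakness that polymorphic malware exploits.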
@@ -82,10 +82,11 @@ The overarching problem though, was one of scale. The rate at which new malware
82
82
### 2000's
83
83
This decade saw the explosion of the internet and the additional threats it brought with it, so features such as e-mail protection and network protection (e.g. firewalls) began to be incorporated into solutions. Rules-based engines were used to keep up with the rate of malware discovery, providing a faster and more flexible mechanism to deploy detection logic to the endpoint.
84
84
85
-
This decade also saw the advent of polymorphic malware, principally to avoid the detection techniques of AV. Leveraging hash-based signatures were prevalent from the pervious decade and continuing with these techniques required automated infrastructure to acquire new samples, run detection scanners over them and then, if convicted, generate the new hashes and automatically deliver this new content in the next update. Cloud infrastructure began to emerge generally in this decade and this technology was adopted primarily to support these processes.
85
+
This decade also saw the rise of polymorphic malware, designed principally to avoid the detection techniques of AV. Hash-based signatures remained prevalent from the previous decade, and continuing with these techniques required automated infrastructure to acquire new samples, run detection scanners over them and then, if convicted, generate the new hashes and automatically deliver this new content in the next update. Malware authors also tried to widen the playing field by targeting file types other than executables, developing viruses for Word documents, Office macros, VBA scripts, etc.
86
+
86
87
87
88
#### Cloud Infrastructure
88
-
The emergence of cloud provided the necessary infrastructure to support the mechanics of the ongoing cycle hash-based detection. Eventually, the content became so large that it was no longer possible distribute full content to the endpoint but instead distribute the most prevalent hashes and heuristics but maintain the full database in the cloud. From an architectural perspective, this meant that the endpoint was effectively a cache of the full database in the cloud, with the backend infrastructure now responsible for quickly maintaining that cache.
89
+
The emergence of cloud provided the necessary infrastructure to support the mechanics of the ongoing cycle of hash-based detection. Eventually, the content became so large that it was no longer possible to distribute it in full to the endpoint; instead, vendors distributed the most prevalent hashes and heuristics and maintained the full database in the cloud. In a sense, and from an architectural perspective, the endpoint was now an agent for the cloud.
89
90
90
91
The challenge here is similar to that of maintaining a distributed database or, perhaps more precisely, a content delivery network (CDN). New malware is constantly being discovered, which means the hashes for these files need to be added to the database. Next, these new records need to be distributed either directly to the endpoint running the AV software or, depending on the design of the solution, mirrored to a server geographically close to the endpoints. Calls across a network are subject to the laws of physics as well as network latency, and having access to a nearby server - or _Point of Presence (POP)_ - is one way the performance of antivirus software can be improved so that it operates in the background without affecting user experience. Various strategies for sharing updates and reducing network traffic, such as caching results, are employed to make signature checking efficient. This is a complex problem when one considers the scale of the operation needed to support millions or even hundreds of millions of endpoints. The rate at which new malware is discovered, and the rate at which existing malware can modify itself ever so slightly in order to generate a different hash, is a huge challenge for the industry, to the point where it's arguably easier to store and manage hashes for known-good software.
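
The split between a small local set of prevalent hashes and a full database in the cloud, with cached results to reduce network traffic, can be sketched as follows. This is a simplified illustration under stated assumptions: `EndpointScanner`, `cloud_lookup` and `FULL_CLOUD_DB` are hypothetical names, and the "cloud" is simulated by an in-memory set rather than a real network call.

```python
import hashlib
from typing import Callable, Dict, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


class EndpointScanner:
    """Hypothetical endpoint agent: local prevalent hashes plus remote lookup."""

    def __init__(self, prevalent_malware: Set[str],
                 cloud_lookup: Callable[[str], bool]):
        self.prevalent_malware = set(prevalent_malware)  # shipped in updates
        self.cloud_lookup = cloud_lookup                 # assumed remote query
        self.verdict_cache: Dict[str, bool] = {}         # cached remote verdicts
        self.remote_calls = 0                            # counts network round trips

    def is_malware(self, data: bytes) -> bool:
        digest = sha256_hex(data)
        # 1. Most prevalent malware is caught locally, with no network call.
        if digest in self.prevalent_malware:
            return True
        # 2. A previously fetched verdict avoids repeating the remote lookup.
        if digest in self.verdict_cache:
            return self.verdict_cache[digest]
        # 3. Otherwise ask the cloud database and remember the answer.
        self.remote_calls += 1
        verdict = self.cloud_lookup(digest)
        self.verdict_cache[digest] = verdict
        return verdict


# Simulated full database held in the cloud (an assumption for this sketch).
FULL_CLOUD_DB = {sha256_hex(b"common-malware"), sha256_hex(b"rare-malware")}
```

In this sketch a scan of a prevalent sample never leaves the endpoint, a rare sample costs one remote call the first time it is seen, and repeat scans of the same file are served from the cache. A real deployment would also need cache expiry, since a verdict for a given hash can change as new samples are convicted.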