Host response data

Learn how Validin collects and delivers host response data.

Validin's host response data captures how internet-facing services respond to HTTP and HTTPS requests at the time they are observed.

The dataset records server behaviour, response content, and transport configuration to support analysis of virtual hosts, exposed services, and infrastructure changes over time.

Data collection model

Validin collects host response data using active HTTP and HTTPS requests issued directly by Validin-controlled infrastructure.

Collection is performed against both:

  • Virtual hosts resolved via DNS
  • Raw IPv4 addresses discovered through scanning and enumeration

For each target:

  1. Validin initiates an HTTP or HTTPS request
  2. Validin captures the full response returned by the host
  3. Validin stores response artifacts and derived fingerprints

Note: Host response collection uses an emulated browser request model. Requests replicate initial browser behaviour over HTTP/1.x and do not execute client-side code.

Coverage scope

Validin performs internet-wide host response collection at scale.

| Attribute | Value |
| --- | --- |
| Virtual host requests | ~500 million per day |
| IPv4 coverage | Internet-wide |
| Default ports | 80, 443 |
| Additional ports | 40+ security-relevant ports |
| Collection model | Active measurement |

Virtual host collection is driven by DNS resolution and SNI-aware HTTPS requests.

Request behaviour and redirects

For each request, Validin captures:

  • The initial host response
  • Up to three same-host redirects
  • Final response content and headers

This provides visibility into how hosts redirect requests without following redirects to other domains.
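
The redirect handling described above can be sketched as follows. This is a minimal illustration, not Validin's implementation: `fetch` is a hypothetical callable that returns a status code and a header dictionary, the set of redirect status codes is an assumption, and "same host" is assumed to mean an identical hostname regardless of scheme or port.

```python
from urllib.parse import urljoin, urlsplit

MAX_SAME_HOST_REDIRECTS = 3  # limit stated in the text above
REDIRECT_STATUSES = (301, 302, 303, 307, 308)  # assumption: standard HTTP redirects


def follow_same_host_redirects(url, fetch, max_redirects=MAX_SAME_HOST_REDIRECTS):
    """Return the chain of (url, status, headers) captured for one request.

    `fetch` is a hypothetical callable: fetch(url) -> (status, headers_dict).
    """
    chain = []
    current = url
    # One initial response plus at most `max_redirects` followed redirects.
    for _ in range(max_redirects + 1):
        status, headers = fetch(current)
        chain.append((current, status, headers))
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or not location:
            break
        target = urljoin(current, location)  # resolve relative Location values
        # A cross-host redirect stays in the captured headers but is not followed.
        if urlsplit(target).hostname != urlsplit(current).hostname:
            break
        current = target
    return chain
```

The last element of the returned chain is the final response; a `Location` header pointing at another domain remains visible in that element's headers.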

Response artifacts collected

Validin stores multiple host response artifacts for each observation.

| Artifact | Description |
| --- | --- |
| HTTP headers | Raw response headers |
| HTML content | Response body |
| Favicons | Browser favicon artifacts |
| Certificates | Full leaf certificate only |

Note: Only leaf certificates are stored. Intermediate and root certificates are not retained.

TLS and certificate data

For HTTPS services, Validin collects TLS configuration data and certificate features, including:

  • Leaf certificate values
  • Certificate metadata and hashes
  • Server Name Indication (SNI)-specific certificates
  • JARM fingerprints derived from multiple TLS negotiations

TLS fingerprinting is used to support identification of shared infrastructure and tooling.

Virtual host fingerprinting

Host response fingerprinting is based on three primary components:

| Fingerprint | Description |
| --- | --- |
| TLS configuration | Protocol behaviour, certificate features, JARM |
| Virtual server configuration | Server software, plugins, response behaviour |
| Response content | HTML structure and static artifacts |

Emulated browser model

Host response collection uses an emulated browser request pattern.

  • Requests simulate initial browser behaviour
  • Collection is performed over HTTP/1.x
  • Client-side execution is not performed

This approach produces consistent responses without relying on full browser automation.
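
To make the idea of an emulated initial browser request concrete, the sketch below builds a plain HTTP/1.1 request with browser-like headers. The header set and the `User-Agent` string are illustrative assumptions; the source does not list the headers Validin actually sends.

```python
def build_initial_request(host: str, path: str = "/") -> bytes:
    """Build a browser-like HTTP/1.1 GET request as raw bytes.

    All header values are hypothetical examples, not Validin's real ones.
    """
    headers = [
        ("Host", host),
        ("User-Agent", "Mozilla/5.0 (compatible; ExampleCollector/1.0)"),
        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
        ("Accept-Language", "en-US,en;q=0.5"),
        ("Connection", "close"),
    ]
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers]
    # A blank line terminates the header section; a GET carries no body here.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Nothing in this request model fetches subresources or runs scripts, which mirrors the statement that client-side execution is not performed.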

Port scanning integration

Host response data is complemented by recurring IPv4-wide port scanning.

| Attribute | Description |
| --- | --- |
| Scan scope | IPv4-wide |
| Frequency | Multiple times per week |
| Port focus | Security-relevant and commonly abused services |

Port scan results are correlated with:

  • DNS resolution
  • Host response fingerprints
  • TLS and certificate data

This supports detection of newly exposed services and infrastructure changes.

Historical host response state

Validin maintains historical host response observations over time.

Historical data includes:

  • First seen and last seen timestamps
  • Response content changes
  • Certificate rotation
  • Configuration drift

This allows for the reconstruction of host behaviour at specific points in time.
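
As a rough illustration of how first seen and last seen timestamps support point-in-time reconstruction, the sketch below works on a hypothetical `Observation` record. The record shape and the function names are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation record used only for this sketch."""
    host: str
    body_hash: str
    seen_at: int  # Unix timestamp in seconds


def first_last_seen(observations):
    """Map (host, body_hash) -> (first_seen, last_seen)."""
    spans = {}
    for obs in observations:
        key = (obs.host, obs.body_hash)
        first, last = spans.get(key, (obs.seen_at, obs.seen_at))
        spans[key] = (min(first, obs.seen_at), max(last, obs.seen_at))
    return spans


def state_at(observations, host, timestamp):
    """Return the latest observation of `host` at or before `timestamp`, or None."""
    candidates = [o for o in observations if o.host == host and o.seen_at <= timestamp]
    return max(candidates, key=lambda o: o.seen_at, default=None)
```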

Change tracking

Host response changes are recorded as individual observations over time:

| Change type | Description |
| --- | --- |
| Content change | Response body modified |
| Header change | HTTP headers altered |
| Certificate change | Leaf certificate rotated |
| Service change | Port or protocol behaviour changed |

Changes can be correlated across hosts, IPs, certificates, and fingerprints.

Note: Tracking concurrent host response changes supports identification of infrastructure reuse and coordinated updates.
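
One way to derive these change types is to compare fingerprint fields between two consecutive observations of the same host. The sketch below assumes each observation is a flat dictionary keyed by the field names documented on this page (`body_hash`, `header_hash`, `cert.fingerprint_sha256`); that mapping is an assumption, and service changes are left out because they depend on port scan data.

```python
# Assumed mapping from change type to the field whose value signals it.
CHANGE_FIELDS = {
    "content": "body_hash",
    "header": "header_hash",
    "certificate": "cert.fingerprint_sha256",
}


def classify_changes(previous: dict, current: dict) -> list:
    """Return the change types that differ between two observations."""
    return [
        change
        for change, field in CHANGE_FIELDS.items()
        if previous.get(field) != current.get(field)
    ]
```

Running the same comparison across many hosts and grouping results by timestamp would surface hosts that changed together, which is the kind of coordinated update the note above describes.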

Data access in the platform

Host response data is exposed as queryable historical data, including:

  • Current response artifacts
  • Historical response timelines
  • TLS and certificate features
  • Host response fingerprints
  • Correlated DNS and port scan context

Endpoint

| Field | Type | Description |
| --- | --- | --- |
| host | domain name | Domain name associated with the host response. |
| ip | IP address | IPv4 address associated with the host response. |

HTML Response Features

| Field | Type | Description |
| --- | --- | --- |
| body_hash | hash (MD5) | Hash of the HTML response body. |
| class_0_hash | hash | Hash derived from HTML class features (group 0). |
| class_1_hash | hash | Hash derived from HTML class features (group 1). |
| ext_links.meta | domain name | Domains extracted from <meta> content. |
| ext_links.links | domain name | Domains extracted from <link> tags. |
| ext_links.js | domain name | Domains extracted from <script> sources. |
| ext_links.anchor | domain name | Domains extracted from <a> tags. |
| ext_links.iframe | domain name | Domains extracted from <iframe> sources. |
| favicon_hash | hash (MD5) | Hash of the favicon image. |
| gtag | string | Google Tag / analytics identifier value when present. |
| meta | string | Meta tag content. Example: "<meta name=\"twitter:title\" content=\"Validin\">" or search key ":::\"twitter:title\":\"Validin\"" |
| title | string | Parsed HTML <title> value. |
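
The sketch below shows how features of this kind could be derived from a response body using only the Python standard library. It is an approximation under stated assumptions: the exact bytes Validin hashes and the rules it uses to decide which domains count as external are not given in the source, so this version hashes the raw body and records the hostname of every absolute URL. `ext_links.meta` and the class hashes are omitted.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlsplit

# tag -> (attribute holding the URL, ext_links sub-field name)
_LINK_SOURCES = {
    "link": ("href", "links"),
    "script": ("src", "js"),
    "a": ("href", "anchor"),
    "iframe": ("src", "iframe"),
}


class FeatureParser(HTMLParser):
    """Collects the <title> text and hostnames referenced by selected tags."""

    def __init__(self):
        super().__init__()
        self.ext_links = {field: set() for _, field in _LINK_SOURCES.values()}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        source = _LINK_SOURCES.get(tag)
        if source:
            attr, field = source
            try:
                host = urlsplit(dict(attrs).get(attr) or "").hostname
            except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
                host = None
            if host:  # relative URLs have no hostname and are skipped
                self.ext_links[field].add(host)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def html_features(body: bytes) -> dict:
    parser = FeatureParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return {
        "body_hash": hashlib.md5(body).hexdigest(),  # MD5, per the table above
        "title": parser.title.strip(),
        "ext_links": {k: sorted(v) for k, v in parser.ext_links.items()},
    }
```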

HTTP Response Features

| Field | Type | Description |
| --- | --- | --- |
| banner_start_line | string | HTTP response start line. Example: "HTTP/1.1 200 OK". |
| banner_0_hash | hash | Hash of the response banner. |
| http_date | string | HTTP Date header value (when significantly in the past or future). |
| etag | string | HTTP ETag header value. |
| header_hash | hash | Hash derived from HTTP header patterns. |
| jarm | hash | JARM fingerprint value for the TLS configuration. |
| last_modified | string | HTTP Last-Modified header value. |
| location | string | Complete Location: header value. |
| location_domain | domain name | Domain extracted from the Location: header value, if present. |
| location_ip | IP address | IP extracted from the Location: header value, if present. |
| path | string | The path that was requested. |
| server | string | HTTP Server: header value. |

Certificate Features

| Field | Type | Description |
| --- | --- | --- |
| cert.not_before | string | Certificate validity start timestamp. |
| cert.not_after | string | Certificate validity end timestamp. |
| cert.issuer | string | Full issuer string. Example: "/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=RapidSSL TLS RSA CA G1". |
| cert.DC | string | Subject field component: DC. |
| cert.EMAILADDRESS | string | Subject field component: EMAILADDRESS. |
| cert.ISSUER | string | Subject field component: ISSUER. |
| cert.L | string | Subject field component: L (Locality). |
| cert.O | string | Subject field component: O (Organization). |
| cert.OU | string | Subject field component: OU (Organizational Unit). |
| cert.CN | domain | Subject field component: CN (Common Name). |
| cert.ST | string | Subject field component: ST (State/Province). |
| cert.SUBJECTALTNAME | string | Subject Alternative Name value(s). |
| cert.fingerprint | hash (MD5) | Certificate fingerprint (MD5). |
| cert.fingerprint_sha256 | hash (SHA256) | Certificate fingerprint (SHA256). |
| cert.domain | domain | Domain value extracted from the certificate. |
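
Certificate fingerprints are conventionally computed over the DER encoding of the certificate. Assuming Validin follows that convention (the source does not state it), values comparable to `cert.fingerprint` and `cert.fingerprint_sha256` could be reproduced locally with the sketch below. The function names are illustrative, and lowercase hex output is an assumption about formatting.

```python
import hashlib
import ssl


def certificate_fingerprints(der_bytes: bytes) -> dict:
    """Hash the DER-encoded leaf certificate with MD5 and SHA-256."""
    return {
        "cert.fingerprint": hashlib.md5(der_bytes).hexdigest(),
        "cert.fingerprint_sha256": hashlib.sha256(der_bytes).hexdigest(),
    }


def pem_fingerprints(pem_text: str) -> dict:
    """Convert a PEM certificate to DER, then fingerprint it."""
    return certificate_fingerprints(ssl.PEM_cert_to_DER_cert(pem_text))
```

A leaf certificate fetched with `ssl.get_server_certificate((host, 443))` is returned as PEM text and can be passed directly to `pem_fingerprints`.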