Host response data
Learn how Validin collects and delivers host response data.
Validin's host response data captures how internet-facing services respond to HTTP and HTTPS requests at the time they are observed.
The dataset records server behaviour, response content, and transport configuration to support analysis of virtual hosts, exposed services, and infrastructure changes over time.
Data collection model
Validin collects host response data using active HTTP and HTTPS requests issued directly by Validin-controlled infrastructure.
Collection is performed against both:
- Virtual hosts resolved via DNS
- Raw IPv4 addresses discovered through scanning and enumeration
For each target:
- Validin initiates an HTTP or HTTPS request
- Validin captures the full response returned by the host
- Validin stores response artifacts and derived fingerprints
Note: Host response collection uses an emulated browser request model. Requests replicate initial browser behaviour over HTTP/1.x and do not execute client-side code.
Coverage scope
Validin performs internet-wide host response collection at scale.
| Attribute | Value |
|---|---|
| Virtual host requests | ~500 million per day |
| IPv4 coverage | Internet-wide |
| Default ports | 80, 443 |
| Additional ports | 40+ security-relevant ports |
| Collection model | Active measurement |
Virtual host collection is driven by DNS resolution and SNI-aware HTTPS requests.
Request behaviour and redirects
For each request, Validin captures:
- The initial host response
- Up to three same-host redirects
- Final response content and headers
This shows how a host redirects requests on its own hostname. Redirects that point to other domains are not followed.
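The redirect rule above can be sketched as a small loop. This is an illustrative sketch only, not Validin's implementation: the function name `follow_same_host_redirects`, the `fetch` callable, and the set of redirect status codes are assumptions introduced for the example. Only the limit of three same-host redirects comes from the text.

```python
from urllib.parse import urljoin, urlsplit

MAX_SAME_HOST_REDIRECTS = 3  # limit stated in the documentation
REDIRECT_STATUSES = (301, 302, 303, 307, 308)  # assumption: standard HTTP redirect codes


def follow_same_host_redirects(start_url, fetch):
    """Return the list of (url, status, headers) responses that were captured.

    `fetch` is a hypothetical callable: fetch(url) -> (status, headers_dict).
    """
    chain = []
    url = start_url
    start_host = urlsplit(start_url).hostname
    followed = 0
    while True:
        status, headers = fetch(url)
        chain.append((url, status, headers))
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or not location:
            break  # final response reached
        target = urljoin(url, location)  # resolve relative Location values
        if urlsplit(target).hostname != start_host:
            break  # redirect to another host: captured above, not followed
        if followed >= MAX_SAME_HOST_REDIRECTS:
            break  # same-host redirect budget exhausted
        followed += 1
        url = target
    return chain


# Canned responses stand in for real network traffic.
responses = {
    "http://example.com/": (301, {"Location": "/a"}),
    "http://example.com/a": (302, {"Location": "http://example.com/b"}),
    "http://example.com/b": (302, {"Location": "https://other.example.net/"}),
}
chain = follow_same_host_redirects("http://example.com/", lambda u: responses[u])
```

In this run the chain holds three responses. The last one still carries its `Location` header pointing at `other.example.net`, but no request is made to that host.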
Response artifacts collected
Validin stores multiple host response artifacts for each observation.
| Artifact | Description |
|---|---|
| HTTP headers | Raw response headers |
| HTML content | Response body |
| Favicons | Browser favicon artifacts |
| Certificates | Full leaf certificate only |
Note: Only leaf certificates are stored. Intermediate and root certificates are not retained.
TLS and certificate data
For HTTPS services, Validin collects TLS configuration data and certificate features, including:
- Leaf certificate values
- Certificate metadata and hashes
- Server Name Indication (SNI)-specific certificates
- JARM fingerprints derived from multiple TLS negotiations
TLS fingerprinting is used to support identification of shared infrastructure and tooling.
Virtual host fingerprinting
Host response fingerprinting is based on three primary components:
| Fingerprint | Description |
|---|---|
| TLS configuration | Protocol behaviour, certificate features, JARM |
| Virtual server configuration | Server software, plugins, response behaviour |
| Response content | HTML structure and static artifacts |
Emulated browser model
Host response collection uses an emulated browser request pattern.
- Requests simulate initial browser behaviour
- Collection is performed over HTTP/1.x
- Client-side execution is not performed
This approach produces consistent responses without relying on full browser automation.
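To make the request model concrete, the sketch below builds a single HTTP/1.1 request with browser-style headers and no JavaScript execution. The header set, the placeholder User-Agent string, and the name `build_request` are assumptions for illustration. The documentation does not publish the exact headers Validin sends.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Serialize a browser-like HTTP/1.1 GET request as raw bytes."""
    headers = [
        ("Host", host),
        # Placeholder value, not the User-Agent Validin actually uses.
        ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"),
        ("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"),
        ("Accept-Language", "en-US,en;q=0.5"),
        ("Connection", "close"),
    ]
    lines = [f"GET {path} HTTP/1.1"] + [f"{name}: {value}" for name, value in headers]
    # A blank line terminates the header section of an HTTP/1.x request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request_bytes = build_request("example.com")
```

Because the request is a fixed byte sequence, the same input always produces the same request, which is one way a collector can keep responses comparable across observations.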
Port scanning integration
Host response data is complemented by recurring IPv4-wide port scanning.
| Attribute | Description |
|---|---|
| Scan scope | IPv4-wide |
| Frequency | Multiple times per week |
| Port focus | Security-relevant and commonly abused services |
Port scan results are correlated with:
- DNS resolution
- Host response fingerprints
- TLS and certificate data
This supports detection of newly exposed services and infrastructure changes.
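As a rough illustration of how recurring scans reveal newly exposed services, the sketch below compares two scan snapshots. The snapshot layout (a mapping from IP address to a set of open ports) and the name `newly_exposed` are assumptions made for the example, not a documented Validin format.

```python
def newly_exposed(previous, current):
    """Return sorted (ip, port) pairs open in `current` but not in `previous`."""
    new = []
    for ip, ports in current.items():
        # Ports seen now that were absent from the earlier scan of this IP.
        for port in ports - previous.get(ip, set()):
            new.append((ip, port))
    return sorted(new)


previous_scan = {"192.0.2.10": {80, 443}}
current_scan = {"192.0.2.10": {80, 443, 8443}, "192.0.2.20": {22}}
exposed = newly_exposed(previous_scan, current_scan)
print(exposed)  # [('192.0.2.10', 8443), ('192.0.2.20', 22)]
```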
Historical host response state
Validin maintains historical host response observations over time.
Historical data includes:
- First seen and last seen timestamps
- Response content changes
- Certificate rotation
- Configuration drift
This allows for the reconstruction of host behaviour at specific points in time.
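Point-in-time reconstruction can be pictured as filtering observations by their first seen and last seen timestamps. The sketch below assumes a hypothetical `Observation` record with `name`, `value`, `first_seen`, and `last_seen` attributes. The actual record layout returned by the platform may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    name: str            # feature name, e.g. "title"
    value: str           # observed value
    first_seen: datetime
    last_seen: datetime


def state_at(observations, when):
    """Return {name: value} for observations whose window contains `when`.

    If two windows for the same name overlap, the later list entry wins.
    """
    return {
        o.name: o.value
        for o in observations
        if o.first_seen <= when <= o.last_seen
    }


history = [
    Observation("title", "Old site", datetime(2024, 1, 1), datetime(2024, 3, 1)),
    Observation("title", "New site", datetime(2024, 3, 2), datetime(2024, 6, 1)),
    Observation("server", "nginx", datetime(2024, 1, 1), datetime(2024, 6, 1)),
]
snapshot = state_at(history, datetime(2024, 3, 15))
print(snapshot)  # {'title': 'New site', 'server': 'nginx'}
```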
Change tracking
Host response changes are recorded as individual observations over time:
| Change type | Description |
|---|---|
| Content change | Response body modified |
| Header change | HTTP headers altered |
| Certificate change | Leaf certificate rotated |
| Service change | Port or protocol behaviour changed |
Changes can be correlated across hosts, IPs, certificates, and fingerprints.
Note: Tracking concurrent host response changes supports identification of infrastructure reuse and coordinated updates.
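One way to derive the change types in the table is to compare the hash fields of two observations of the same host. The sketch below is an assumption about how such a comparison could be made, not a description of Validin's pipeline. It uses the `body_hash`, `header_hash`, and `cert.fingerprint_sha256` field names from the field tables, and it leaves out service changes because those tables do not name a port field. The hash values shown are shortened placeholders.

```python
# Change type -> field whose value is compared between two observations.
CHANGE_FIELDS = {
    "content": "body_hash",
    "header": "header_hash",
    "certificate": "cert.fingerprint_sha256",
}


def classify_changes(before, after):
    """Return the change types whose tracked field differs between observations."""
    return [
        change
        for change, field in CHANGE_FIELDS.items()
        if before.get(field) != after.get(field)
    ]


earlier = {"body_hash": "aaa", "header_hash": "h1", "cert.fingerprint_sha256": "c1"}
later = {"body_hash": "bbb", "header_hash": "h1", "cert.fingerprint_sha256": "c2"}
changes = classify_changes(earlier, later)
print(changes)  # ['content', 'certificate']
```

Running the same comparison across many hosts and grouping by timestamp is one way to surface hosts that changed together.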
Data access in the platform
Host response data is exposed as queryable historical data, including:
- Current response artifacts
- Historical response timelines
- TLS and certificate features
- Host response fingerprints
- Correlated DNS and port scan context
Endpoint
| Field | Type | Description |
|---|---|---|
| host | domain name | Domain name associated with the host response. |
| ip | IP address | IPv4 address associated with the host response. |
HTML Response Features
| Field | Type | Description |
|---|---|---|
| body_hash | hash (MD5) | Hash of the HTML response body. |
| class_0_hash | hash | Hash derived from HTML class features (group 0). |
| class_1_hash | hash | Hash derived from HTML class features (group 1). |
| ext_links.meta | domain name | Domains extracted from `<meta>` content. |
| ext_links.links | domain name | Domains extracted from `<link>` tags. |
| ext_links.js | domain name | Domains extracted from `<script>` sources. |
| ext_links.anchor | domain name | Domains extracted from `<a>` tags. |
| ext_links.iframe | domain name | Domains extracted from `<iframe>` sources. |
| favicon_hash | hash (MD5) | Hash of the favicon image. |
| gtag | string | Google Tag / analytics identifier value when present. |
| meta | string | Meta tag content. Example: `"<meta name=\"twitter:title\" content=\"Validin\">"` or search key `":::\"twitter:title\":\"Validin\""` |
| title | string | Parsed HTML `<title>` value. |
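To give a feel for how some of these fields could be derived from a response body, the sketch below computes an MD5 `body_hash`, parses the `title`, and collects external domains per tag type with Python's standard `html.parser`. It is an approximation under stated assumptions. The class `FeatureParser` and function `extract_features` are hypothetical names, the hash is taken over the raw body bytes, and Validin's real normalization rules are not documented here. `ext_links.meta` and the class hashes are omitted because their derivation is not specified.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlsplit


class FeatureParser(HTMLParser):
    # tag -> (attribute holding the URL, ext_links bucket name)
    URL_ATTRS = {
        "link": ("href", "links"),
        "script": ("src", "js"),
        "a": ("href", "anchor"),
        "iframe": ("src", "iframe"),
    }

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.ext_links = {bucket: set() for _, bucket in self.URL_ATTRS.values()}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        if tag in self.URL_ATTRS:
            attr, bucket = self.URL_ATTRS[tag]
            url = dict(attrs).get(attr)
            try:
                host = urlsplit(url).hostname if url else None
            except ValueError:  # malformed URL, e.g. a broken IPv6 literal
                host = None
            if host:  # relative URLs have no hostname and are skipped
                self.ext_links[bucket].add(host)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_features(body: bytes) -> dict:
    parser = FeatureParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    parser.close()
    return {
        "body_hash": hashlib.md5(body).hexdigest(),  # assumption: hash of raw bytes
        "title": parser.title.strip(),
        "ext_links": {k: sorted(v) for k, v in parser.ext_links.items()},
    }


sample = (
    b'<html><head><title> Validin </title>'
    b'<link href="https://cdn.example.com/s.css">'
    b'<script src="https://js.example.org/a.js"></script></head>'
    b'<body><a href="/local">x</a><a href="https://www.example.net/p">y</a>'
    b'<iframe src="https://frames.example.com/f"></iframe></body></html>'
)
features = extract_features(sample)
```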
HTTP Response Features
| Field | Type | Description |
|---|---|---|
| banner_start_line | string | HTTP response start line. Example: "HTTP/1.1 200 OK". |
| banner_0_hash | hash | Hash of the response banner. |
| http_date | string | HTTP Date header value (when significantly in the past or future). |
| etag | string | HTTP ETag header value. |
| header_hash | hash | Hash derived from HTTP header patterns. |
| jarm | hash | JARM fingerprint value for the TLS configuration. |
| last_modified | string | HTTP Last-Modified header value. |
| location | string | Complete Location: header value. |
| location_domain | domain name | Domain extracted from the Location: header value, if present. |
| location_ip | IP address | IP extracted from the Location: header value, if present. |
| path | string | The path that was requested. |
| server | string | HTTP Server: header value. |
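The relationship between `location`, `location_domain`, and `location_ip` can be illustrated by splitting a Location header value with the standard library. This is a sketch of one plausible derivation, not a statement of how Validin populates the fields. The function name `split_location` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def split_location(location: str) -> dict:
    """Split a Location header value into domain or IP components."""
    result = {"location": location, "location_domain": None, "location_ip": None}
    host = urlsplit(location).hostname
    if host is None:
        return result  # relative redirect such as "/login": no host to extract
    try:
        ipaddress.ip_address(host)
    except ValueError:
        result["location_domain"] = host  # not an IP literal, treat as a domain
    else:
        result["location_ip"] = host
    return result


by_domain = split_location("https://www.example.com/login")
by_ip = split_location("https://192.0.2.10:8443/login")
relative = split_location("/login")
```

A redirect to a bare IP address fills only `location_ip`, which makes such redirects easy to separate from redirects to named hosts.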
Certificate Features
| Field | Type | Description |
|---|---|---|
| cert.not_before | string | Certificate validity start timestamp. |
| cert.not_after | string | Certificate validity end timestamp. |
| cert.issuer | string | Full issuer string. Example: "/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=RapidSSL TLS RSA CA G1". |
| cert.DC | string | Subject field component: DC. |
| cert.EMAILADDRESS | string | Subject field component: EMAILADDRESS. |
| cert.ISSUER | string | Subject field component: ISSUER. |
| cert.L | string | Subject field component: L (Locality). |
| cert.O | string | Subject field component: O (Organization). |
| cert.OU | string | Subject field component: OU (Organizational Unit). |
| cert.CN | domain | Subject field component: CN (Common Name). |
| cert.ST | string | Subject field component: ST (State/Province). |
| cert.SUBJECTALTNAME | string | Subject Alternative Name value(s). |
| cert.fingerprint | hash (MD5) | Certificate fingerprint (MD5). |
| cert.fingerprint_sha256 | hash (SHA256) | Certificate fingerprint (SHA256). |
| cert.domain | domain | Domain value extracted from the certificate. |
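Certificate fingerprints are conventionally computed as a digest over the DER-encoded certificate. The sketch below follows that convention to produce values shaped like `cert.fingerprint` and `cert.fingerprint_sha256`. Whether Validin hashes the DER encoding and emits lowercase hex is an assumption, and `certificate_fingerprints` is a hypothetical helper name.

```python
import hashlib


def certificate_fingerprints(der_bytes: bytes) -> dict:
    """Return MD5 and SHA-256 hex digests of a DER-encoded certificate."""
    return {
        "cert.fingerprint": hashlib.md5(der_bytes).hexdigest(),
        "cert.fingerprint_sha256": hashlib.sha256(der_bytes).hexdigest(),
    }


# Placeholder bytes stand in for a real DER-encoded leaf certificate.
placeholder_der = b"not a real certificate"
fingerprints = certificate_fingerprints(placeholder_der)
```

For a certificate held in PEM form, `ssl.PEM_cert_to_DER_cert` from the standard library converts it to DER bytes before hashing.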
