Service 22

Amazon Neptune

Database · Graph · Managed

Neptune is a managed graph database. Where relational databases store rows and join them at query time, a graph database stores nodes and the edges between them as first-class objects. A query that traverses relationships (friends of friends, fraud rings, recommendation paths) follows stored edges instead of computing joins, so its cost depends on how much of the graph it touches rather than on the total size of the data.
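To make the traversal idea concrete, here is a minimal sketch in plain Python, not Neptune code. It assumes a hypothetical toy graph stored as an adjacency map and a hypothetical helper named within_hops. Each hop is a lookup of a node's stored edges, which is the access pattern a graph database is built around.

```python
from collections import deque

# Hypothetical toy graph: each person maps to the people they follow.
EDGES = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(graph, start, max_hops):
    """Return every node reachable from start in 1..max_hops hops."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


friends_of_friends = within_hops(EDGES, "alice", 2)
```

Raising max_hops deepens the traversal without changing the shape of the code, whereas the equivalent SQL would need one more self-join per hop.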

Launched in 2018, Neptune fits workloads where queries are about relationships rather than scanning columns: social networks, knowledge graphs, fraud detection, recommendations, network topology, identity resolution. Its cluster uses the same architecture as Aurora, with compute separated from storage.

Two Graph Models and Query Languages

Neptune supports both major models. A Property Graph has labeled nodes and edges with arbitrary properties (the Neo4j/TinkerPop model) — the natural fit for application development. RDF stores subject-predicate-object triples linked by URIs (the semantic-web model) for knowledge bases that compose across datasets.
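The difference between the two models is easier to see with the same small fact written both ways. The sketch below uses plain Python structures only; the identifiers, the property names, and the http://example.org/ namespace are all hypothetical and chosen for illustration.

```python
# Property graph: nodes and edges carry a label plus arbitrary properties.
nodes = {
    "p1": {"label": "Person", "name": "Alice"},
    "p2": {"label": "Person", "name": "Bob"},
}
edges = [
    # The edge itself can hold properties, such as "since".
    {"from": "p1", "to": "p2", "label": "KNOWS", "since": 2020},
]

# RDF: every statement is a (subject, predicate, object) triple,
# and resources are identified by URIs.
EX = "http://example.org/"  # hypothetical namespace
triples = [
    (EX + "p1", EX + "name", "Alice"),
    (EX + "p2", EX + "name", "Bob"),
    (EX + "p1", EX + "knows", EX + "p2"),
]


def property_graph_neighbors(node_id, edge_label):
    """Follow outgoing edges with the given label."""
    return [e["to"] for e in edges
            if e["from"] == node_id and e["label"] == edge_label]


def rdf_objects(subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Note that the "since" property sits directly on the property-graph edge, while a plain triple has no place for it; expressing it in RDF would take additional triples.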

Query languages match: Gremlin (imperative traversal) and openCypher (declarative pattern matching) for property graphs, and SPARQL for RDF. For property-graph apps, openCypher reads more like SQL and is the easier starting point.
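For a feel of how the three languages differ, the sketch below holds one question ("who do Alice's contacts know?") as a query string per language. The Person label, the KNOWS edge, the name property, and the ex: namespace are assumptions made for the example, not part of any real dataset.

```python
# One question in three languages. All labels, property names, and the
# ex: namespace are hypothetical.
QUERIES = {
    # Gremlin: step-by-step traversal from a starting vertex.
    "gremlin": (
        "g.V().has('Person', 'name', 'Alice')"
        ".out('KNOWS').out('KNOWS')"
        ".dedup().values('name')"
    ),
    # openCypher: describe the pattern, let the engine find matches.
    "opencypher": (
        "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof) "
        "RETURN DISTINCT fof.name"
    ),
    # SPARQL: match triple patterns over RDF data.
    "sparql": (
        "PREFIX ex: <http://example.org/> "
        "SELECT DISTINCT ?name WHERE { "
        "?a ex:name 'Alice' . ?a ex:knows ?f . "
        "?f ex:knows ?fof . ?fof ex:name ?name }"
    ),
}

for language, text in QUERIES.items():
    print(f"{language}:\n  {text}\n")
```

The openCypher version states the two-hop pattern in a single MATCH clause, which is why it tends to feel familiar to readers who know SQL.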

Cluster Architecture

Like DocumentDB, Neptune uses the Aurora cluster architecture: one writer plus up to fifteen readers on a shared distributed volume spread across three AZs, with cluster, reader, and instance endpoints and failover that completes in seconds. Neptune has no database passwords: when IAM database authentication is enabled, query requests are authenticated with IAM and must be signed with Signature Version 4.
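Signature Version 4 signs each request with a key derived from the caller's secret access key through a chain of HMAC-SHA256 steps. The sketch below shows only that key derivation, using the standard library; the credentials, date, and region are placeholders, and "neptune-db" is the service name used when signing Neptune data requests. In practice an AWS SDK or a signing library would handle the complete request signature.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 step in the key-derivation chain."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "neptune-db") -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder values for illustration only; never hard-code real credentials.
key = derive_signing_key("EXAMPLE-SECRET", "20240101", "us-east-1")
```

Because the date, region, and service are folded into the key, a signing key is only valid for one day, one region, and one service, which is why clients cannot simply reuse a stored password-like value.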

Neptune Analytics

The Neptune database is persistent and transactional, serving live application queries. Neptune Analytics is a separate, in-memory engine for fast graph algorithms (PageRank, centrality, community detection, vector similarity) that run over a snapshot of the graph loaded from the database or from files in S3, such as CSV. Use the database for live queries and Analytics for offline algorithmic analysis; teams often run both.
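To show the kind of whole-graph computation meant here, the following is a minimal PageRank sketch in plain Python. It is not Neptune Analytics code and makes no claim about how that engine implements the algorithm; the snapshot graph, the function name, and the default parameters are assumptions for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an in-memory adjacency map.

    out_links maps each node to the list of nodes it links to; every
    target must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical snapshot: three nodes link to "c", and "c" links back to "a".
SNAPSHOT = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(SNAPSHOT)
top_node = max(ranks, key=ranks.get)
```

Each iteration touches every edge in the graph, which is a very different access pattern from a live query that starts at one node and follows a few hops.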

Neptune vs relational vs DocumentDB

Neptune: queries that traverse relationships many hops deep, where graphs are the point, not a side effect.

RDS / Aurora: relational data where a few self-joins or foreign keys are enough; not every relationship is a graph problem.

DocumentDB / DynamoDB: document or key-value data with occasional references, where traversal depth is shallow.

Common Mistakes

Treating any data with relationships as a graph problem — when joins suffice, a relational schema in Aurora or RDS is simpler.
Adopting Neptune with no graph experience for data that is not deeply graph-shaped, adding a new query language for little gain.
Modeling the graph without thinking about traversal patterns — like DynamoDB, the worst case is a query you never designed for.
Using the Neptune database for heavy graph algorithms instead of Neptune Analytics, which is built for them.
Running a single-instance cluster with no reader in another AZ, slowing failover (the Aurora rule again).
Expecting password authentication — Neptune uses IAM and SigV4-signed requests, which clients must handle.
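The single-AZ mistake in the list above is easy to check mechanically. The sketch below assumes a hypothetical list of instance records with "role" and "az" fields; these field names are invented for the example and do not mirror any AWS API response.

```python
def has_cross_az_reader(instances):
    """True if some reader sits in a different AZ from the writer.

    Each instance is a dict with hypothetical keys "role" and "az".
    """
    writer_azs = {i["az"] for i in instances if i["role"] == "writer"}
    if len(writer_azs) != 1:
        return False  # no writer found, or the input is inconsistent
    (writer_az,) = writer_azs
    return any(i["role"] == "reader" and i["az"] != writer_az
               for i in instances)


# Hypothetical cluster layouts.
single_instance = [{"role": "writer", "az": "us-east-1a"}]
same_az = single_instance + [{"role": "reader", "az": "us-east-1a"}]
cross_az = single_instance + [{"role": "reader", "az": "us-east-1b"}]
```

Only the cross_az layout passes the check; the other two leave no failover target outside the writer's AZ.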

Best Practices

Use openCypher for property-graph applications unless the team already knows Gremlin.
Model the graph for your traversal patterns up front.
Use Neptune Analytics for graph algorithms and the database for live queries.
Run at least one reader in a different AZ from the writer.
Enable encryption at rest at creation.
Measure carefully on very large, high-fan-out graphs — traversal cost grows with fan-out, not just total size.
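The fan-out point in the last practice can be checked with simple arithmetic. The sketch below assumes a uniform fan-out and no overlap between neighborhoods, so it gives a worst-case upper bound rather than a prediction for any real graph; the function name is hypothetical.

```python
def worst_case_nodes_touched(fan_out, hops):
    """Upper bound on nodes visited: fan_out + fan_out**2 + ... + fan_out**hops."""
    return sum(fan_out ** depth for depth in range(1, hops + 1))


# Three hops at two different fan-outs.
low = worst_case_nodes_touched(10, 3)    # 10 + 100 + 1000
high = worst_case_nodes_touched(100, 3)  # 100 + 10000 + 1000000
```

Under these assumptions a tenfold increase in fan-out raises the three-hop bound from 1,110 to 1,010,100 nodes, regardless of how many nodes the graph holds in total.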

Comparable services

GCP: No direct equivalent (Spanner Graph)
Azure: Cosmos DB (Gremlin API)

Knowledge Check

When is a graph database like Neptune the right choice over a relational database?

When queries traverse relationships many hops deep and relationships are the primary concern, not row aggregation
Whenever the relational data happens to contain any foreign key relationships at all between any of its various tables
Whenever you need fast millisecond single-row primary-key lookups by indexed identifier
Whenever you need to run aggregate sums and counts over billions of rows in a wide table

Which query language is the easiest starting point for a property-graph application?

openCypher — its declarative pattern matching reads much like SQL
SPARQL — it is the standard declarative query language for property graphs
SQL dialect — Neptune is fully PostgreSQL wire-compatible underneath
Gremlin — it is the one and only supported option for property graphs

What is the difference between Neptune database and Neptune Analytics?

The database serves persistent live queries; Analytics is an in-memory engine for graph algorithms over a snapshot
They are simply two different marketing names for one and the same underlying Neptune service
Analytics is always the single writer instance, while the database acts only as a read replica
The database engine runs RDF triples exclusively, while the Analytics engine is strictly restricted to property graphs only

How does Neptune authenticate query requests?

With IAM and AWS Signature Version 4, not the passwords that RDS and DocumentDB use
With a master username and password supplied directly on the connection command line
With Kerberos tickets as the only supported mechanism
It requires no authentication at all once you are connecting from inside a VPC
