Amazon Neptune
Service 22

Database, Graph, Managed

Neptune is a managed graph database. Where relational databases store rows and join them at query time, a graph database stores nodes and the edges between them as first-class objects. A query that traverses relationships (friends of friends, fraud rings, recommendation paths) follows stored edges directly instead of recomputing a join at every hop, so its cost tracks the part of the graph it actually visits rather than the total size of the data.
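To make the traversal idea concrete, here is a minimal in-memory sketch in plain Python. It is not Neptune code: the `FRIENDS` data and the `within_hops` helper are hypothetical names chosen for illustration. It only shows that when each node points directly at its neighbors, a "friends of friends" query is a bounded walk along edges.

```python
from collections import deque

# Hypothetical toy graph: each node maps directly to its neighbors (adjacency list).
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand nodes that already sit at the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the start node is not its own friend
    return seen


# Friends of friends are the nodes exactly two hops away.
reach = within_hops(FRIENDS, "alice", 2)
friends_of_friends = sorted(n for n, hops in reach.items() if hops == 2)
print(friends_of_friends)  # ['dave', 'erin']
```

The walk only touches nodes near the starting point; "frank", three hops away, is never visited.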

Launched in 2018, Neptune fits workloads where queries are about relationships rather than scanning columns: social networks, knowledge graphs, fraud detection, recommendations, network topology, identity resolution. Its clusters use the Aurora-style architecture, which separates compute instances from a shared storage volume.

Two Graph Models and Query Languages

Neptune supports both major models. A Property Graph has labeled nodes and edges with arbitrary properties (the Neo4j/TinkerPop model) — the natural fit for application development. RDF stores subject-predicate-object triples linked by URIs (the semantic-web model) for knowledge bases that compose across datasets.
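The difference between the two models can be sketched with plain Python data structures. This is an illustration only, not how Neptune stores data; the identifiers (`p1`, `p2`), the `http://example.org/` URIs, and the helper functions are all hypothetical.

```python
# Property graph: nodes and edges each carry a label plus arbitrary key/value properties.
nodes = {
    "p1": {"label": "person", "properties": {"name": "Alice", "age": 34}},
    "p2": {"label": "person", "properties": {"name": "Bob"}},
}
edges = [
    {"label": "knows", "from": "p1", "to": "p2", "properties": {"since": 2019}},
]

# RDF: the same facts as subject-predicate-object triples identified by URIs.
# The edge property "since" has no single-triple equivalent, so it is omitted here.
EX = "http://example.org/"  # hypothetical namespace
triples = [
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "age", 34),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "alice", EX + "knows", EX + "bob"),
]


def out_neighbors(edges, node_id, label):
    """Property-graph lookup: follow edges with a given label from a node."""
    return [e["to"] for e in edges if e["from"] == node_id and e["label"] == label]


def objects(triples, subject, predicate):
    """RDF lookup: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(out_neighbors(edges, "p1", "knows"))          # ['p2']
print(objects(triples, EX + "alice", EX + "knows"))  # ['http://example.org/bob']
```

Because RDF names everything with URIs, triples from separate datasets that use the same URIs can be combined without a schema change, which is the composition property mentioned above.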

Query languages match: Gremlin (imperative traversal) and openCypher (declarative pattern matching) for property graphs, and SPARQL for RDF. For property-graph apps, openCypher reads more like SQL and is the easier starting point.
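As a rough comparison, the same "friends of friends of Alice" question might look like this in each language. The queries are held as Python strings only so they can sit side by side; the `person` label, the `knows` relationship, the `name` property, and the `ex:` namespace are assumptions about a hypothetical dataset.

```python
# One question, three query languages (schema names are hypothetical).
QUERIES = {
    # Gremlin: step-by-step traversal over a property graph.
    "gremlin": (
        "g.V().has('person', 'name', 'alice')"
        ".out('knows').out('knows').dedup().values('name')"
    ),
    # openCypher: declarative pattern matching over a property graph.
    "opencypher": (
        "MATCH (a:person {name: 'alice'})-[:knows]->()-[:knows]->(fof) "
        "RETURN DISTINCT fof.name"
    ),
    # SPARQL: triple patterns over RDF data.
    "sparql": (
        "PREFIX ex: <http://example.org/> "
        "SELECT DISTINCT ?fof WHERE { ex:alice ex:knows ?friend . ?friend ex:knows ?fof . }"
    ),
}

# Which graph model each language targets, as described in the text.
MODEL = {"gremlin": "property graph", "opencypher": "property graph", "sparql": "RDF"}


def languages_for(model):
    """Return the query languages that apply to a given graph model."""
    return sorted(lang for lang, m in MODEL.items() if m == model)


print(languages_for("property graph"))  # ['gremlin', 'opencypher']
```

The openCypher form states the shape of the result and leaves the traversal order to the engine, which is why it tends to feel familiar to people coming from SQL.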

Cluster Architecture

Like DocumentDB, Neptune uses the Aurora-style cluster design: one writer plus up to fifteen readers on a shared distributed volume that spans three AZs, with cluster, reader, and instance endpoints and failover measured in seconds. Neptune authenticates query requests with IAM and Signature Version 4 (when IAM database authentication is enabled on the cluster) rather than with database passwords.
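For a sense of what "SigV4-signed" means for a client, the sketch below derives a Signature Version 4 signing key with the Python standard library. It covers only the key-derivation step of the signing process, and the secret key shown is a placeholder. The service name `neptune-db` is the value generally used for Neptune request signing, but treat it as an assumption to check against current AWS documentation. In practice a library signer such as botocore's `SigV4Auth` would usually do this work.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a UTF-8 message under the given key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str,
                      service: str = "neptune-db") -> bytes:
    """Derive the SigV4 signing key: a chain of HMACs over date, region, service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def credential_scope(date_stamp: str, region: str, service: str = "neptune-db") -> str:
    """Scope string that accompanies the signature in the Authorization header."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


# Placeholder secret; real credentials come from an IAM role or user.
key = sigv4_signing_key("EXAMPLE-SECRET-KEY", "20240101", "us-east-1")
print(len(key))                                   # 32
print(credential_scope("20240101", "us-east-1"))  # 20240101/us-east-1/neptune-db/aws4_request
```

The derived key is scoped to one day, one region, and one service, so a leaked signature cannot be replayed against another service or on a later date.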

Neptune Analytics

The Neptune database is persistent and transactional, serving live application queries. Neptune Analytics is a separate in-memory engine for fast graph algorithms (PageRank, centrality, community detection, vector similarity) that run over a point-in-time copy of the graph, loaded from the Neptune database or from files such as CSV in S3. Use the database for live queries and Analytics for offline algorithmic analysis; teams often run both.
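To show why whole-graph algorithms are a different workload from live queries, here is a minimal pure-Python PageRank using power iteration. It is a teaching sketch, not the Neptune Analytics API; the graph, the `pagerank` function, and its default parameters are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [outgoing neighbors]}.

    Every neighbor must also appear as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-node graph: "c" receives links from a, b, and d.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
print(max(ranks, key=ranks.get))  # c
```

Each iteration reads every node and every edge, which is why this kind of computation suits an in-memory engine working on a copy of the graph rather than the instance that serves live traffic.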

Neptune vs relational vs DocumentDB

Neptune: queries that traverse relationships many hops deep, where the graph is the point, not a side effect.

RDS / Aurora: relational data where a few self-joins or foreign keys are enough; not every relationship is a graph problem.

DocumentDB / DynamoDB: document or key-value data with occasional references, where traversal depth is shallow.

Common Mistakes
  • Treating any data with relationships as a graph problem — when joins suffice, a relational schema in Aurora or RDS is simpler.
  • Adopting Neptune with no graph experience for data that is not deeply graph-shaped, adding a new query language for little gain.
  • Modeling the graph without thinking about traversal patterns — like DynamoDB, the worst case is a query you never designed for.
  • Using the Neptune database for heavy graph algorithms instead of Neptune Analytics, which is built for them.
  • Running a single-instance cluster with no reader in another AZ, slowing failover (the Aurora rule again).
  • Expecting password authentication — Neptune uses IAM and SigV4-signed requests, which clients must handle.

Best Practices
  • Use openCypher for property-graph applications unless the team already knows Gremlin.
  • Model the graph for your traversal patterns up front.
  • Use Neptune Analytics for graph algorithms and the database for live queries.
  • Run at least one reader in a different AZ from the writer.
  • Enable encryption at rest at creation.
  • Measure carefully on very large, high-fan-out graphs — traversal cost grows with fan-out, not just total size.
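The fan-out point in the last item can be made concrete with simple arithmetic. The sketch below gives a rough upper bound on the vertices a traversal may touch, assuming (unrealistically) that every vertex has the same number of outgoing edges and that no vertex is reached twice; `max_vertices_touched` is a hypothetical helper, not a Neptune function.

```python
def max_vertices_touched(fan_out: int, hops: int) -> int:
    """Upper bound on vertices visited: fan_out + fan_out**2 + ... + fan_out**hops."""
    return sum(fan_out ** k for k in range(1, hops + 1))


# The same three-hop traversal at two different fan-outs.
print(max_vertices_touched(10, 3))   # 1110
print(max_vertices_touched(100, 3))  # 1010100
```

A tenfold increase in fan-out makes the three-hop bound roughly a thousand times larger, regardless of how many vertices the graph holds in total.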

Comparable services
  • GCP: No direct equivalent (Spanner Graph)
  • Azure: Cosmos DB (Gremlin API)

Knowledge Check

When is a graph database like Neptune the right choice over a relational database?

  • When queries traverse relationships many hops deep and relationships are the primary concern, not row aggregation
  • Whenever the relational data happens to contain any foreign key relationships at all between any of its various tables
  • Whenever you need fast millisecond single-row primary-key lookups by indexed identifier
  • Whenever you need to run aggregate sums and counts over billions of rows in a wide table

Which query language is the easiest starting point for a property-graph application?

  • openCypher — its declarative pattern matching reads much like SQL
  • SPARQL — it is the standard declarative query language for property graphs
  • SQL dialect — Neptune is fully PostgreSQL wire-compatible underneath
  • Gremlin — it is the one and only supported option for property graphs

What is the difference between Neptune database and Neptune Analytics?

  • The database serves persistent live queries; Analytics is an in-memory engine for graph algorithms over a snapshot
  • They are simply two different marketing names for one and the same underlying Neptune service
  • Analytics is always the single writer instance, while the database acts only as a read replica
  • The database engine runs RDF triples exclusively, while the Analytics engine is strictly restricted to property graphs only

How does Neptune authenticate query requests?

  • With IAM and AWS Signature Version 4, not passwords like RDS or DocumentDB
  • With a master username and password supplied directly on the connection command line
  • With Kerberos tickets as the only supported mechanism
  • It requires no authentication at all once you are connecting from inside a VPC
