Amazon Neptune
Neptune is a managed graph database. Where relational databases store rows and join them at query time, a graph database stores nodes and the edges between them as first-class objects, so a query that traverses relationships (friends of friends, fraud rings, recommendation paths) follows stored edges instead of computing a join at every hop. Its cost depends on the part of the graph it touches rather than on total table size.
Launched in 2018, Neptune fits workloads where queries are about relationships rather than scanning columns: social networks, knowledge graphs, fraud detection, recommendations, network topology, identity resolution. Its cluster uses the same separated compute-and-storage architecture as Aurora.
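To make the traversal idea concrete, here is a minimal sketch in plain Python, not Neptune code. It assumes a tiny in-memory adjacency list (the `friends` data and the function names are invented for illustration) and walks outward hop by hop, which is roughly what a graph engine does when it follows stored edges.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for every node reachable in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends, excluding direct friends."""
    hops = within_hops(graph, person, 2)
    return sorted(name for name, distance in hops.items() if distance == 2)


# Hypothetical sample data: each key lists that person's direct friends.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob"],
    "erin": ["carol", "frank"],
    "frank": ["erin"],
}

print(friends_of_friends(friends, "alice"))
```

The work done is proportional to the edges actually followed from the start node, not to the number of people in the dataset. In a relational schema the same question needs one self-join per hop.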
Two Graph Models and Query Languages
Neptune supports both major models. A Property Graph has labeled nodes and edges with arbitrary properties (the Neo4j/TinkerPop model) — the natural fit for application development. RDF stores subject-predicate-object triples linked by URIs (the semantic-web model) for knowledge bases that compose across datasets.
Query languages match: Gremlin (imperative traversal) and openCypher (declarative pattern matching) for property graphs, and SPARQL for RDF. For property-graph apps, openCypher reads more like SQL and is the easier starting point.
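As a rough illustration of how the two models and three languages relate, the sketch below represents the same small fact ("Alice knows Bob") as a property graph and as RDF-style triples, then shows one equivalent query per language as a plain string. The `http://example.org/` namespace, the sample data, and the `to_triples` helper are assumptions made for this example. Edge properties are left out because they have no direct one-triple equivalent in RDF.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this example

# Property graph: nodes carry a label and arbitrary key/value properties.
nodes = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
}
# Edges are (source, label, target).
edges = [("alice", "knows", "bob")]


def to_triples(nodes, edges, ns=EX):
    """Flatten the property graph into subject-predicate-object triples."""
    triples = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            if key == "label":
                triples.append((ns + node_id, RDF_TYPE, ns + value))
            else:
                triples.append((ns + node_id, ns + key, value))
    for src, label, dst in edges:
        triples.append((ns + src, ns + label, ns + dst))
    return triples


# "Who does Alice know?" in each language, shown as strings only.
QUERIES = {
    "gremlin": "g.V().has('person', 'name', 'Alice').out('knows').values('name')",
    "opencypher": "MATCH (a:person {name: 'Alice'})-[:knows]->(b) RETURN b.name",
    "sparql": (
        "PREFIX ex: <http://example.org/> "
        'SELECT ?name WHERE { ?a ex:name "Alice" . ?a ex:knows ?b . ?b ex:name ?name }'
    ),
}

for triple in to_triples(nodes, edges):
    print(triple)
```

The Gremlin query spells out the steps of the walk, while the openCypher and SPARQL queries describe a pattern and leave the walk to the engine.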
Cluster Architecture
Like DocumentDB, Neptune uses the Aurora cluster model: one writer plus up to fifteen readers on a shared distributed storage volume replicated across three AZs, with cluster, reader, and instance endpoints and failover measured in seconds. Neptune has no database usernames or passwords; with IAM database authentication enabled, every query request must be signed with Signature Version 4.
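The sketch below shows what "signing a request with Signature Version 4" involves, using only the Python standard library. It is an illustration under stated assumptions, not production code: the host name and credentials are placeholders, temporary credentials (which also need an `x-amz-security-token` header) are not handled, and the `sigv4_headers` helper is invented for this example. In practice an AWS SDK or a signing library would normally do this work.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlencode


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, host, path, body, access_key, secret_key, region,
                  service="neptune-db", now=None):
    """Build the headers that carry a SigV4 signature for one request."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request: method, path, empty query string, headers,
    #    signed-header names, and the SHA-256 hash of the body.
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )

    # 2. String to sign: algorithm, timestamp, credential scope, request hash.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Signing key: a chain of HMACs over date, region, and service.
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return {
        "host": host,
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


# Placeholder values for illustration only; none of these are real.
HOST = "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com:8182"
BODY = urlencode({"query": "MATCH (n) RETURN count(n)"})

headers = sigv4_headers("POST", HOST, "/openCypher", BODY,
                        access_key="AKIDEXAMPLE", secret_key="EXAMPLESECRET",
                        region="us-east-1")
print(headers["Authorization"])
```

The signature covers the host, the timestamp, and the body hash, so a client cannot reuse one signed header set for a different query. This is why generic HTTP or Gremlin clients need a signing step added before they can talk to a cluster that enforces IAM authentication.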
Neptune Analytics
The Neptune database is persistent and transactional, serving live application queries. Neptune Analytics is a separate in-memory engine for fast graph algorithms (PageRank, centrality, community detection, vector similarity) run over a copy of the graph loaded from the database, a snapshot, or files such as CSV in S3. Use the database for live queries and Analytics for offline algorithmic analysis; teams often run both.
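To show why whole-graph algorithms are a different workload from live queries, here is a minimal PageRank sketch in plain Python. It is not how Neptune Analytics implements the algorithm; the edge list, damping factor, and iteration count are assumptions chosen for the example. Every iteration touches every node and edge, which is why such algorithms suit an in-memory engine working on a loaded copy of the graph.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing edges spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical graph: three nodes point at "c", and "c" points back at "a".
sample_edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
ranks = pagerank(sample_edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

A live query such as "who are Alice's friends of friends" reads a small neighborhood, whereas this loop repeatedly scans the whole graph, so running it against the transactional database competes with application traffic.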
When to Use Neptune
Neptune: queries that traverse relationships many hops deep, where the graph is the point, not a side effect.
RDS / Aurora: relational data where a few self-joins or foreign keys are enough; not every relationship is a graph problem.
DocumentDB / DynamoDB: document or key-value data with occasional references, where traversal depth is shallow.
Common Mistakes
- Treating any data with relationships as a graph problem; when joins suffice, a relational schema in Aurora or RDS is simpler.
- Adopting Neptune with no graph experience for data that is not deeply graph-shaped, adding a new query language for little gain.
- Modeling the graph without thinking about traversal patterns — like DynamoDB, the worst case is a query you never designed for.
- Using the Neptune database for heavy graph algorithms instead of Neptune Analytics, which is built for them.
- Running a single-instance cluster with no reader in another AZ, slowing failover (the Aurora rule again).
- Expecting password authentication — Neptune uses IAM and SigV4-signed requests, which clients must handle.
Best Practices
- Use openCypher for property-graph applications unless the team already knows Gremlin.
- Model the graph for your traversal patterns up front.
- Use Neptune Analytics for graph algorithms and the database for live queries.
- Run at least one reader in a different AZ from the writer.
- Enable encryption at rest when you create the cluster; it cannot be turned on for an existing cluster afterward.
- Measure carefully on very large, high-fan-out graphs — traversal cost grows with fan-out, not just total size.
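The fan-out point can be made with simple arithmetic. The sketch below assumes a uniform fan-out (every node has the same number of outgoing edges) and no revisited nodes, so it gives an upper bound rather than a measurement; the function name is invented for this example.

```python
def frontier_sizes(fan_out, depth):
    """Upper bound on nodes reached at each hop, assuming uniform fan-out."""
    return [fan_out ** hop for hop in range(1, depth + 1)]


# Same depth, different fan-out: the third hop differs by a factor of 1000.
print(frontier_sizes(10, 3))   # [10, 100, 1000]
print(frontier_sizes(100, 3))  # [100, 10000, 1000000]
```

A three-hop traversal from a node with 100 neighbors per hop can touch up to a million nodes regardless of how large the rest of the graph is, which is why high-degree "supernodes" deserve their own load tests.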
Knowledge Check
When is a graph database like Neptune the right choice over a relational database?
- When queries traverse relationships many hops deep and relationships are the primary concern, not row aggregation
- Whenever the relational data happens to contain any foreign key relationships at all between any of its various tables
- Whenever you need fast millisecond single-row primary-key lookups by indexed identifier
- Whenever you need to run aggregate sums and counts over billions of rows in a wide table
Which query language is the easiest starting point for a property-graph application?
- openCypher — its declarative pattern matching reads much like SQL
- SPARQL — it is the standard declarative query language for property graphs
- SQL dialect — Neptune is fully PostgreSQL wire-compatible underneath
- Gremlin — it is the one and only supported option for property graphs
What is the difference between Neptune database and Neptune Analytics?
- The database serves persistent live queries; Analytics is an in-memory engine for graph algorithms over a snapshot
- They are simply two different marketing names for one and the same underlying Neptune service
- Analytics is always the single writer instance, while the database acts only as a read replica
- The database engine runs RDF triples exclusively, while the Analytics engine is strictly restricted to property graphs only
How does Neptune authenticate query requests?
- With IAM and AWS Signature Version 4, not passwords like RDS or DocumentDB
- With a master username and password supplied directly on the connection command line
- With Kerberos tickets as the only supported mechanism
- It requires no authentication at all once you are connecting from inside a VPC