Java SAX Parser Example: Parsing Large XML Files Efficiently

🧠 What Will You Learn in This Guide?

In this guide, you will learn how to read XML documents with the Java SAX parser (Simple API for XML).
Unlike a DOM parser, SAX reads the XML file sequentially, from start to end, without loading the entire document into memory.
This keeps memory usage low, which matters most when working with large XML files.

You will also learn to handle XML events (startElement, endElement, characters) by writing your own handler class that extends DefaultHandler.


🔍 1. What is SAX Parser?

SAX (Simple API for XML) is an event-based Java API for reading XML documents.
As it scans the XML, the javax.xml.parsers.SAXParser class calls methods on a handler object for specific events, for example the start or end of a tag.

Key Features:

  • Memory friendly: Processes XML piece by piece and does not keep the whole document in memory.
  • Event-driven: Works with tag start/end and content events.
  • Ideal for large files: Well suited to gigabyte-sized XML documents.
  • Read-oriented: SAX only reads XML; it does not write it.

💡 Comparison: DOM loads the whole XML document into a tree structure in memory, whereas SAX processes it as a stream.
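To make the stream-based idea concrete, here is a minimal sketch that counts elements without ever building a tree. The class name SaxElementCounter, the method countElements and the inline XML string are illustrative assumptions, not part of the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration: counts elements while streaming.
public class SaxElementCounter {

    // Returns how many elements with the given qualified name occur in the XML text.
    public static int countElements(String xml, String elementName) {
        final int[] count = {0}; // mutable counter captured by the anonymous handler
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals(elementName)) {
                    count[0]++; // only a number is kept, never the element itself
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee id=\"1\"/><Employee id=\"2\"/></Employees>";
        System.out.println(countElements(xml, "Employee")); // prints 2
    }
}
```

The handler keeps a single counter, so memory use stays the same no matter how many elements the document contains.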


⚙️ 2. Working Logic of SAX Parser

During the parsing process, SAX Parser triggers methods for the following events:

🧠 Event | 🧾 Method | 🔍 Description
Document Start | startDocument() | Called when XML reading starts.
End of Document | endDocument() | Triggered when the reading is completed.
Tag Start | startElement() | Called when a new XML tag is found.
End of Tag | endElement() | Called when the tag is closing.
Text Data | characters() | Called when the data in the tag is processed.

📘 Usage:
By extending the org.xml.sax.helpers.DefaultHandler class we can override only the necessary methods.
This makes the code simpler and easier to maintain.
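The order of these callbacks is easiest to see by recording them. The sketch below is only an illustration: the class SaxEventTrace, the nested TracingHandler and the trace method are hypothetical names, and whitespace-only text is skipped to keep the trace short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records every SAX event in the order it arrives.
public class SaxEventTrace {

    static class TracingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("startElement:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("characters:" + text); // ignore whitespace-only text
            }
        }
    }

    // Parses the XML text and returns the recorded event names.
    public static List<String> trace(String xml) {
        TracingHandler handler = new TracingHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<name>Lisa</name>").forEach(System.out::println);
    }
}
```

For this input the trace starts with startDocument, then startElement:name, then the text, then endElement:name, and ends with endDocument.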


📄 3. Sample XML Document

The following sample XML document, which holds two employee records, is used in the rest of this guide:

<?xml version="1.0" encoding="UTF-8"?>
<Employees>
    <Employee id="1">
        <age>29</age>
        <name>GenixNode</name>
        <gender>Male</gender>
        <role>Cloud Architect</role>
    </Employee>
    <Employee id="2">
        <age>35</age>
        <name>Lisa</name>
        <gender>Female</gender>
        <role>CEO</role>
    </Employee>
</Employees>

🧱 4. Employee Class

Java class representing the <Employee> tag in XML:


package com.genixnode.xml;

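public class Employee {
    private int id;
    private String name;
    private String gender;
    private int age;
    private String role;

    // Getters and setters
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getRole() { return role; }
    public void setRole(String role) { this.role = role; }

    // Output labels are Turkish: İsim = name, Yaş = age, Cinsiyet = gender, Rol = role
    @Override
    public String toString() {
        return "Personel:: ID=" + this.id + " İsim=" + this.name +
                " Yaş=" + this.age + " Cinsiyet=" + this.gender +
                " Rol=" + this.role;
    }
}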

📘 Description: This class holds the data read from XML as a Java object.


🧩 5. Custom Handler Class (MyHandler)

The Handler class captures XML read events and converts them into Employee objects.


package com.genixnode.xml.sax;

import java.util.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import com.genixnode.xml.Employee;

public class MyHandler extends DefaultHandler {
    // Employees collected so far; created eagerly so the getter never returns null
    private final List<Employee> empList = new ArrayList<>();
    private Employee emp = null;
    // Buffers the text of the current element across characters() calls
    private StringBuilder data = new StringBuilder();

    // Flags recording which child element of <Employee> is currently open
    private boolean bAge = false, bName = false, bGender = false, bRole = false;

    public List<Employee> getEmpList() {
        return empList;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            // A new record starts: create the object and read its id attribute
            String id = attributes.getValue("id");
            emp = new Employee();
            emp.setId(Integer.parseInt(id));
        } else if (qName.equalsIgnoreCase("name")) {
            bName = true;
        } else if (qName.equalsIgnoreCase("age")) {
            bAge = true;
        } else if (qName.equalsIgnoreCase("gender")) {
            bGender = true;
        } else if (qName.equalsIgnoreCase("role")) {
            bRole = true;
        }
        // Start a fresh buffer for the text of this element
        data = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        // The element is complete, so the buffered text can be stored
        if (bAge) {
            emp.setAge(Integer.parseInt(data.toString()));
            bAge = false;
        } else if (bName) {
            emp.setName(data.toString());
            bName = false;
        } else if (bGender) {
            emp.setGender(data.toString());
            bGender = false;
        } else if (bRole) {
            emp.setRole(data.toString());
            bRole = false;
        }

        if (qName.equalsIgnoreCase("Employee")) {
            empList.add(emp);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // May be called several times per element; append each chunk
        data.append(ch, start, length);
    }
}

💬 Why use StringBuilder? A SAX parser may deliver the text inside a tag in several chunks, so characters() can be called more than once for the same element; StringBuilder joins these chunks.
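Where a real parser splits the text is up to the parser, so the sketch below simulates a split by calling the callbacks by hand. The class ChunkedCharactersDemo, the nested TextHandler and the method simulateSplitText are hypothetical names used only for this illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: shows why text must be buffered across characters() calls.
public class ChunkedCharactersDemo {

    // Buffers element text with a StringBuilder.
    static class TextHandler extends DefaultHandler {
        private StringBuilder data = new StringBuilder();
        String lastText; // text of the most recently closed element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            data = new StringBuilder(); // fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            data.append(ch, start, length); // append this chunk to the buffer
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            lastText = data.toString(); // all chunks joined
        }
    }

    // Feeds one element's text to the handler in two separate chunks,
    // the way a parser might when the text crosses an internal buffer boundary.
    public static String simulateSplitText(String first, String second) {
        TextHandler handler = new TextHandler();
        handler.startElement("", "", "role", new AttributesImpl());
        handler.characters(first.toCharArray(), 0, first.length());
        handler.characters(second.toCharArray(), 0, second.length());
        handler.endElement("", "", "role");
        return handler.lastText;
    }

    public static void main(String[] args) {
        System.out.println(simulateSplitText("Cloud Arc", "hitect")); // prints "Cloud Architect"
    }
}
```

A handler that stored only the latest chunk would end up with "hitect" here, which is the bug the buffer prevents.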


⚡ 6. Starting Parsing


package com.genixnode.xml.sax;

import java.io.File;
import java.util.List;
import javax.xml.parsers.*;

import com.genixnode.xml.Employee;

public class XMLParserSAX {
    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            MyHandler handler = new MyHandler();
            saxParser.parse(new File("/usr/local/data/personel_kayit.xml"), handler);

            List<Employee> empList = handler.getEmpList();
            for (Employee emp : empList) {
                System.out.println(emp);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

📘 Description: This code reads the XML file and converts each <Employee> element into an Employee object.


📤 7. Sample Output


Personel:: ID=1 İsim=GenixNode Yaş=29 Cinsiyet=Male Rol=Cloud Architect
Personel:: ID=2 İsim=Lisa Yaş=35 Cinsiyet=Female Rol=CEO

💡 8. Points to Consider

💡 Status | 🧾 Description
Memory Efficiency | SAX reduces RAM usage by reading XML piece by piece.
One-Way Reading | Data is read only in the forward direction; it cannot be reversed.
Event Based | Events are triggered at the start and end of the tag.
DefaultHandler Usage | Simple code is written by overriding only the necessary methods.
DOM Alternative | DOM may be preferred for small XML files.
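The memory benefit only holds if the handler itself does not keep every record. One possible variation, sketched below under that assumption, hands each finished record to a callback instead of adding it to a list. StreamingEmployeeReader, StreamingHandler and process are hypothetical names, and a Map stands in for the Employee class to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variation: passes each record to a callback as soon as it is complete.
public class StreamingEmployeeReader {

    static class StreamingHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink;
        private final StringBuilder text = new StringBuilder();
        private Map<String, String> current; // record being built, null outside <Employee>

        StreamingHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // reset the text buffer for the new element
            if (qName.equals("Employee")) {
                current = new LinkedHashMap<>();
                current.put("id", attributes.getValue("id"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // closing tag outside any <Employee>, e.g. the root element
            }
            if (qName.equals("Employee")) {
                sink.accept(current); // hand the finished record to the caller
                current = null;       // keep no reference, so it can be garbage collected
            } else {
                current.put(qName, text.toString().trim());
            }
        }
    }

    // Parses the XML text and calls the sink once per <Employee> element.
    public static void process(String xml, Consumer<Map<String, String>> sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new StreamingHandler(sink));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees>"
                + "<Employee id=\"1\"><name>GenixNode</name><role>Cloud Architect</role></Employee>"
                + "<Employee id=\"2\"><name>Lisa</name><role>CEO</role></Employee>"
                + "</Employees>";
        // Each record is printed as soon as its closing tag is seen.
        process(xml, record -> System.out.println(record.get("id") + " " + record.get("name")));
    }
}
```

With this design the handler holds at most one record at a time, so memory use does not grow with the number of employees in the file. Collecting into a list, as MyHandler does, is simpler and fine when the result set is small.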

❓ Frequently Asked Questions (FAQ)

  1. What is the difference between SAX and DOM?

DOM loads the entire XML document into memory, while SAX processes it sequentially as a stream of events. SAX therefore uses far less memory on large files.

  2. Can SAX edit XML?

No. SAX is read-only. DOM or StAX should be used for editing.

  3. Why is characters() called more than once?

A SAX parser may split the text of an element into several pieces, so we combine all the pieces with StringBuilder.

  4. In what situations should I use SAX?

SAX is ideal if you are working with large XML files or memory is limited.

  5. What do the boolean variables in startElement do?

They record which XML field is currently open, so that endElement() knows which Employee field to fill with the text collected by characters().


🏁 Conclusion

The SAX parser is one of the most memory-efficient ways to parse large XML data in Java. Because it is event-based, the application handles the data as it streams past and decides what to keep, which keeps memory usage low even for very large files.

💬 Try it: You can immediately test the efficiency of SAX Parser with a high-performance Java working environment that you will install on the GenixNode platform. 🚀