Java SAX Parser Example: Parsing Large XML Files Efficiently
Learn how to process XML documents in a memory-friendly, event-driven way with the SAX parser in Java.
🧠 What Will You Learn in This Guide?
In this guide, you will learn how to read XML documents with the Java SAX parser (Simple API for XML).
Unlike a DOM parser, SAX reads the XML file sequentially as a stream of events and never loads the entire document into memory.
This keeps memory usage low, which matters most when working with large XML files.
You will also learn to handle XML events (startElement, endElement, characters) by writing your own handler class that extends DefaultHandler.
🔍 1. What is SAX Parser?
SAX is an event-based Java API for reading XML documents.
As it scans the XML, a javax.xml.parsers.SAXParser calls methods on a handler you supply whenever a specific event occurs (for example, the start or end of a tag).
Key Features:
- Memory friendly: Processes XML piece by piece and does not keep the whole document in memory.
- Event-driven: Works with tag start/end and content events.
- Ideal for large files: Preferred for gigabyte-sized XML documents.
- Read-oriented: SAX only reads XML; it does not write it.
💡 Comparison: DOM loads the XML into an in-memory tree structure, whereas SAX works on a stream basis.
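To make the comparison concrete, here is a minimal sketch that counts elements both ways. The class name DomVsSaxCount, its two helper methods, and the inline XML string are assumptions made for illustration; only the standard JDK parser APIs are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original example project.
class DomVsSaxCount {

    // DOM: the whole document becomes an in-memory tree before it can be queried.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: no tree is built; only a counter is kept while events stream past.
    static int countWithSax(String xml, String tag) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee id=\"1\"/><Employee id=\"2\"/></Employees>";
        System.out.println(countWithDom(xml, "Employee")); // prints 2
        System.out.println(countWithSax(xml, "Employee")); // prints 2
    }
}
```

Both methods give the same answer; the difference is that the DOM version holds the full tree in memory while the SAX version holds a single integer.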
⚙️ 2. Working Logic of SAX Parser
During the parsing process, SAX Parser triggers methods for the following events:
| 🧠 Event | 🧾 Method | 🔍 Description |
|---|---|---|
| Document Start | startDocument() | Called when XML reading starts. |
| End of Document | endDocument() | Triggered when the reading is completed. |
| Tag Start | startElement() | Called when an opening tag is found. |
| End of Tag | endElement() | Called when a closing tag is found. |
| Text Data | characters() | Called with the text inside a tag; may fire more than once per element. |
📘 Usage:
By extending the org.xml.sax.helpers.DefaultHandler class we can override only the necessary methods.
This makes the code simpler and easier to maintain.
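The event order in the table can be observed directly with a small tracing handler. This is a sketch only: the class name SaxEventTrace and the trace() helper are hypothetical, and consecutive characters() calls are collapsed into one entry because the parser is free to deliver text in several chunks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class used only to illustrate callback order.
class SaxEventTrace {

    /** Parses the XML text and returns the callbacks in the order they fired. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("startElement:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Record one entry per run of text, however many chunks arrive.
                if (length == 0) {
                    return;
                }
                if (!"characters".equals(events.get(events.size() - 1))) {
                    events.add("characters");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<Employee id=\"2\"><name>Lisa</name></Employee>")) {
            System.out.println(event);
        }
    }
}
```

For this input the trace is startDocument, startElement:Employee, startElement:name, characters, endElement:name, endElement:Employee, endDocument.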
📄 3. Sample XML Document
The following example document contains two employee records and is used in the rest of this guide:
<?xml version="1.0" encoding="UTF-8"?>
<Employees>
    <Employee id="1">
        <age>29</age>
        <name>GenixNode</name>
        <gender>Male</gender>
        <role>Cloud Architect</role>
    </Employee>
    <Employee id="2">
        <age>35</age>
        <name>Lisa</name>
        <gender>Female</gender>
        <role>CEO</role>
    </Employee>
</Employees>
🧱 4. Employee Class
Java class representing the <Employee> tag in XML:
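package com.genixnode.xml;

public class Employee {
    private int id;
    private String name;
    private String gender;
    private int age;
    private String role;

    // Getters and setters
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getRole() { return role; }
    public void setRole(String role) { this.role = role; }

    // Output labels are Turkish: Personel = employee, İsim = name, Yaş = age, Cinsiyet = gender, Rol = role
    @Override
    public String toString() {
        return "Personel:: ID=" + this.id + " İsim=" + this.name +
                " Yaş=" + this.age + " Cinsiyet=" + this.gender +
                " Rol=" + this.role;
    }
}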
📘 Description: This class holds the data read from each <Employee> element as a Java object.
🧩 5. Custom Handler Class (MyHandler)
The Handler class captures XML read events and converts them into Employee objects.
package com.genixnode.xml.sax;

import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import com.genixnode.xml.Employee;

public class MyHandler extends DefaultHandler {
    // Employees collected so far; never null, so callers can iterate safely
    private final List<Employee> empList = new ArrayList<>();
    // Employee currently being built
    private Employee emp = null;
    // Text accumulated for the element currently being read
    private StringBuilder data = new StringBuilder();
    // Flags recording which child element is currently open
    private boolean bAge = false, bName = false, bGender = false, bRole = false;

    public List<Employee> getEmpList() {
        return empList;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            // New record: read the id attribute and start a fresh Employee
            emp = new Employee();
            emp.setId(Integer.parseInt(attributes.getValue("id")));
        } else if (qName.equalsIgnoreCase("name")) {
            bName = true;
        } else if (qName.equalsIgnoreCase("age")) {
            bAge = true;
        } else if (qName.equalsIgnoreCase("gender")) {
            bGender = true;
        } else if (qName.equalsIgnoreCase("role")) {
            bRole = true;
        }
        // Start with an empty buffer for this element's text
        data = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        // trim() guards against stray whitespace around the value
        String value = data.toString().trim();
        if (bAge) { emp.setAge(Integer.parseInt(value)); bAge = false; }
        else if (bName) { emp.setName(value); bName = false; }
        else if (bGender) { emp.setGender(value); bGender = false; }
        else if (bRole) { emp.setRole(value); bRole = false; }
        // The record is complete once </Employee> is reached
        if (qName.equalsIgnoreCase("Employee")) empList.add(emp);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // May be called several times per element, so append rather than assign
        data.append(ch, start, length);
    }
}
💬 Why use StringBuilder? A SAX parser may deliver the text inside a tag in several chunks, so characters() can be called more than once for a single element; the StringBuilder joins these pieces back together.
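Chunking is hard to trigger on demand with a real parser, so the sketch below simulates it by calling characters() by hand, twice, for one value. The ChunkedTextDemo and TextCollector names are hypothetical; the point is only to contrast appending every chunk with keeping just the most recent one.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it drives the handler manually instead of parsing a file.
class ChunkedTextDemo {

    /** Collects element text, however many characters() calls deliver it. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder data = new StringBuilder();
        private String lastChunk = "";

        @Override
        public void characters(char[] ch, int start, int length) {
            // Correct approach: accumulate every chunk.
            data.append(ch, start, length);
            // What a handler that only keeps the latest call would end up with.
            lastChunk = new String(ch, start, length);
        }

        String fullText() {
            return data.toString();
        }

        String lastChunk() {
            return lastChunk;
        }
    }

    public static void main(String[] args) {
        TextCollector collector = new TextCollector();
        char[] buffer = "Cloud Architect".toCharArray();
        // Simulate the parser splitting one text node into two deliveries.
        collector.characters(buffer, 0, 6);  // "Cloud "
        collector.characters(buffer, 6, 9);  // "Architect"
        System.out.println(collector.fullText());  // prints Cloud Architect
        System.out.println(collector.lastChunk()); // prints Architect
    }
}
```

A handler that overwrote its value on each call would store only "Architect" here, which is exactly the kind of silent truncation the StringBuilder avoids.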
⚡ 6. Starting Parsing
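package com.genixnode.xml.sax;

import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import com.genixnode.xml.Employee;

public class XMLParserSAX {
    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            MyHandler handler = new MyHandler();
            // The parser reads the file and invokes the handler's callbacks as it goes
            saxParser.parse(new File("/usr/local/data/personel_kayit.xml"), handler);
            // Print each Employee built by the handler
            List<Employee> empList = handler.getEmpList();
            for (Employee emp : empList) {
                System.out.println(emp);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}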
📘 Description: This code reads the XML file and converts each <Employee> element into an Employee object.
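Collecting every Employee in a list works for small inputs, but for very large files the list itself can grow as big as a DOM tree would. One possible variation, sketched here under assumptions, hands each record to a callback as soon as its closing tag is seen and keeps nothing afterwards. The StreamingEmployees and StreamingHandler names, the Consumer-based design, and the "id:name" string format are all illustrative choices, not part of the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing per-record processing without a result list.
class StreamingEmployees {

    /** Reports "id:name" for each <Employee> as soon as the element closes. */
    static class StreamingHandler extends DefaultHandler {
        private final Consumer<String> sink;
        private final StringBuilder text = new StringBuilder();
        private String id;
        private String name;

        StreamingHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // fresh buffer for this element's text
            if (qName.equals("Employee")) {
                id = attrs.getValue("id");
                name = null;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                name = text.toString().trim();
            } else if (qName.equals("Employee")) {
                sink.accept(id + ":" + name); // hand off the record, keep nothing
            }
        }
    }

    static void process(String xml, Consumer<String> sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new StreamingHandler(sink));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees>"
                + "<Employee id=\"1\"><age>29</age><name>GenixNode</name></Employee>"
                + "<Employee id=\"2\"><age>35</age><name>Lisa</name></Employee>"
                + "</Employees>";
        process(xml, System.out::println); // prints 1:GenixNode then 2:Lisa
    }
}
```

With this shape the memory held at any moment is one record plus the text buffer, regardless of how many employees the file contains.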
📤 7. Sample Output
Personel:: ID=1 İsim=GenixNode Yaş=29 Cinsiyet=Male Rol=Cloud Architect
Personel:: ID=2 İsim=Lisa Yaş=35 Cinsiyet=Female Rol=CEO
💡 8. Points to Consider
| 💡 Aspect | 🧾 Description |
|---|---|
| Memory Efficiency | SAX reduces RAM usage by reading XML piece by piece. |
| One-Way Reading | Data is read only in the forward direction; the parser cannot go back to earlier content. |
| Event Based | Events are triggered at the start and end of each tag. |
| DefaultHandler Usage | Code stays simple because only the necessary methods are overridden. |
| DOM Alternative | DOM may be preferred for small XML files. |
❓ Frequently Asked Questions (FAQ)
- What is the difference between SAX and DOM?
DOM loads the entire XML into memory as a tree, while SAX processes it sequentially as a stream of events. SAX uses far less memory on large files.
- Can SAX edit XML?
No. SAX is read-only. Use DOM or StAX when you need to edit or write XML.
- Why is characters() called more than once?
A SAX parser may deliver the text of a single element in several pieces, so we join all the pieces with a StringBuilder.
- In what situations should I use SAX?
SAX is ideal if you are working with large XML files or memory is limited.
- What do boolean variables in startElement do?
These flags record which XML field is currently open, so that endElement() knows which Employee property should receive the text collected by characters().
🏁 Conclusion
The SAX parser is one of the most efficient ways to parse large XML data in Java with minimal memory usage. Because it is event-based, your code handles each piece of data as it streams past instead of waiting for the whole document to load.
💬 Try it: You can immediately test the efficiency of SAX Parser with a high-performance Java working environment that you will install on the GenixNode platform. 🚀

