There is also MatchAllDocsQuery, which can do this job too. I think it is the better option :)
IndexReader reader = IndexReader.open(FSDirectory.open(indexDir));
try {
    IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new MatchAllDocsQuery();
    // Some Lucene versions reject a hit count of 0, so request at least 1 even for an empty index
    TopDocs docs = searcher.search(query, Math.max(1, reader.maxDoc()));
    for (ScoreDoc scoreDoc : docs.scoreDocs) {
        Document doc = searcher.doc(scoreDoc.doc);
        // do your job
    }
} finally {
    reader.close();
}
One of my projects uses Nutch to fetch forum posts and indexes them with Lucene. Each post is processed to strip HTML tags and BBS code. Later, we needed to extract some useful information from the posts. Obviously, iterating over the fetched raw HTML files is not a good idea.
Fortunately, the extracted post content is indexed as a field, postcontent (the field has to be stored, not only indexed, for its value to be readable). Reading Lucene's index and iterating through its documents is therefore much faster than reading the original files. Here is the solution:
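import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public void iternateIndex(String indexFolderPath) {
    try {
        Directory index = new SimpleFSDirectory(new File(indexFolderPath));
        IndexReader reader = IndexReader.open(index);
        try {
            for (int i = 0; i < reader.maxDoc(); i++) {
                // Document numbers below maxDoc() may belong to deleted documents,
                // and document(i) fails for those, so skip them (Lucene 3.x API)
                if (reader.isDeleted(i)) {
                    continue;
                }
                Document doc = reader.document(i);
                Field contentField = doc.getField("postcontent");
                if (contentField != null && contentField.stringValue() != null) {
                    String postContent = contentField.stringValue();
                    // do sth here
                }
            }
        } finally {
            // Release the index files even if processing fails
            reader.close();
        }
        // writeToFile() is not shown here
        this.writeToFile();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}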
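For context, the preprocessing step mentioned above (stripping HTML tags and BBS code from each post before indexing) could look roughly like the following. This is only a sketch and not the code used in the project: the class name PostCleaner and both regular expressions are assumptions made for illustration, and a real HTML parser would be more robust than regexes on messy markup.

```java
import java.util.regex.Pattern;

public class PostCleaner {
    // Hypothetical pattern: matches HTML tags such as <p>, </div>, <br/>, <a href="...">
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");
    // Hypothetical pattern: matches BBS codes such as [b], [/quote], [url=http://example.com]
    private static final Pattern BBS_CODE = Pattern.compile("\\[/?[a-zA-Z*]+(=[^\\]]*)?\\]");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Replaces HTML tags and BBS codes with spaces, then collapses whitespace. */
    public static String clean(String raw) {
        String text = HTML_TAG.matcher(raw).replaceAll(" ");
        text = BBS_CODE.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("<p>[b]Hello[/b] <i>world</i></p>"));
    }
}
```

The cleaned string is what would end up in the postcontent field, which is why reading it back from the index avoids repeating this work on the raw HTML.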