001: /*
002: * This program is free software; you can redistribute it and/or modify
003: * it under the terms of the GNU General Public License as published by
004: * the Free Software Foundation; either version 2 of the License, or
005: * (at your option) any later version.
006: *
007: * This program is distributed in the hope that it will be useful,
008: * but WITHOUT ANY WARRANTY; without even the implied warranty of
009: * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
010: * GNU General Public License for more details.
011: *
012: * You should have received a copy of the GNU General Public License
013: * along with this program; if not, write to the Free Software
014: * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
015: */
016:
017: /*
018: * NaiveBayesMultinomialUpdateable.java
019: * Copyright (C) 2003 University of Waikato, Hamilton, New Zealand
020: * Copyright (C) 2007 Jiang Su (incremental version)
021: */
022:
023: package weka.classifiers.bayes;
024:
025: import weka.classifiers.UpdateableClassifier;
026: import weka.core.Instance;
027: import weka.core.Instances;
028: import weka.core.Utils;
029:
030: /**
031: <!-- globalinfo-start -->
032: * Class for building and using a multinomial Naive Bayes classifier. For more information see,<br/>
033: * <br/>
* Andrew McCallum, Kamal Nigam: A Comparison of Event Models for Naive Bayes Text Classification. In: AAAI-98 Workshop on 'Learning for Text Categorization', 1998.<br/>
035: * <br/>
036: * The core equation for this classifier:<br/>
037: * <br/>
038: * P[Ci|D] = (P[D|Ci] x P[Ci]) / P[D] (Bayes rule)<br/>
039: * <br/>
040: * where Ci is class i and D is a document.<br/>
041: * <br/>
042: * Incremental version of the algorithm.
043: * <p/>
044: <!-- globalinfo-end -->
045: *
046: <!-- technical-bibtex-start -->
047: * BibTeX:
048: * <pre>
049: * @inproceedings{Mccallum1998,
050: * author = {Andrew Mccallum and Kamal Nigam},
051: * booktitle = {AAAI-98 Workshop on 'Learning for Text Categorization'},
052: * title = {A Comparison of Event Models for Naive Bayes Text Classification},
053: * year = {1998}
054: * }
055: * </pre>
056: * <p/>
057: <!-- technical-bibtex-end -->
058: *
059: <!-- options-start -->
060: * Valid options are: <p/>
061: *
062: * <pre> -D
063: * If set, classifier is run in debug mode and
064: * may output additional info to the console</pre>
065: *
066: <!-- options-end -->
067: *
068: * @author Andrew Golightly (acg4@cs.waikato.ac.nz)
069: * @author Bernhard Pfahringer (bernhard@cs.waikato.ac.nz)
070: * @author Jiang Su
071: * @version $Revision: 1.2 $
072: */
073: public class NaiveBayesMultinomialUpdateable extends
074: NaiveBayesMultinomial implements UpdateableClassifier {
075:
076: /** for serialization */
077: private static final long serialVersionUID = -7204398796974263186L;
078:
079: /** the word count per class */
080: protected double[] m_wordsPerClass;
081:
082: /**
083: * Returns a string describing this classifier
084: *
085: * @return a description of the classifier suitable for
086: * displaying in the explorer/experimenter gui
087: */
088: public String globalInfo() {
089: return super .globalInfo() + "\n\n"
090: + "Incremental version of the algorithm.";
091: }
092:
093: /**
094: * Generates the classifier.
095: *
096: * @param instances set of instances serving as training data
097: * @throws Exception if the classifier has not been generated successfully
098: */
099: public void buildClassifier(Instances instances) throws Exception {
100: // can classifier handle the data?
101: getCapabilities().testWithFail(instances);
102:
103: // remove instances with missing class
104: instances = new Instances(instances);
105: instances.deleteWithMissingClass();
106:
107: m_headerInfo = new Instances(instances, 0);
108: m_numClasses = instances.numClasses();
109: m_numAttributes = instances.numAttributes();
110: m_probOfWordGivenClass = new double[m_numClasses][];
111: m_wordsPerClass = new double[m_numClasses];
112: m_probOfClass = new double[m_numClasses];
113:
114: // initialising the matrix of word counts
115: // NOTE: Laplace estimator introduced in case a word that does not
116: // appear for a class in the training set does so for the test set
117: double laplace = 1;
118: for (int c = 0; c < m_numClasses; c++) {
119: m_probOfWordGivenClass[c] = new double[m_numAttributes];
120: m_probOfClass[c] = laplace;
121: m_wordsPerClass[c] = laplace * m_numAttributes;
122: for (int att = 0; att < m_numAttributes; att++) {
123: m_probOfWordGivenClass[c][att] = laplace;
124: }
125: }
126:
127: for (int i = 0; i < instances.numInstances(); i++)
128: updateClassifier(instances.instance(i));
129: }
130:
131: /**
132: * Updates the classifier with the given instance.
133: *
134: * @param instance the new training instance to include in the model
135: * @throws Exception if the instance could not be incorporated in
136: * the model.
137: */
138: public void updateClassifier(Instance instance) throws Exception {
139: int classIndex = (int) instance.value(instance.classIndex());
140: m_probOfClass[classIndex] += instance.weight();
141:
142: for (int a = 0; a < instance.numValues(); a++) {
143: if (instance.index(a) == instance.classIndex()
144: || instance.isMissing(a))
145: continue;
146:
147: double numOccurences = instance.valueSparse(a)
148: * instance.weight();
149: if (numOccurences < 0)
150: throw new Exception(
151: "Numeric attribute values must all be greater or equal to zero.");
152: m_wordsPerClass[classIndex] += numOccurences;
153: m_probOfWordGivenClass[classIndex][instance.index(a)] += numOccurences;
154: }
155: }
156:
157: /**
158: * Calculates the class membership probabilities for the given test
159: * instance.
160: *
161: * @param instance the instance to be classified
162: * @return predicted class probability distribution
163: * @throws Exception if there is a problem generating the prediction
164: */
165: public double[] distributionForInstance(Instance instance)
166: throws Exception {
167: double[] probOfClassGivenDoc = new double[m_numClasses];
168:
169: // calculate the array of log(Pr[D|C])
170: double[] logDocGivenClass = new double[m_numClasses];
171: for (int c = 0; c < m_numClasses; c++) {
172: logDocGivenClass[c] += Math.log(m_probOfClass[c]);
173: int allWords = 0;
174: for (int i = 0; i < instance.numValues(); i++) {
175: if (instance.index(i) == instance.classIndex())
176: continue;
177: double frequencies = instance.valueSparse(i);
178: allWords += frequencies;
179: logDocGivenClass[c] += frequencies
180: * Math.log(m_probOfWordGivenClass[c][instance
181: .index(i)]);
182: }
183: logDocGivenClass[c] -= allWords
184: * Math.log(m_wordsPerClass[c]);
185: }
186:
187: double max = logDocGivenClass[Utils.maxIndex(logDocGivenClass)];
188: for (int i = 0; i < m_numClasses; i++)
189: probOfClassGivenDoc[i] = Math
190: .exp(logDocGivenClass[i] - max);
191:
192: Utils.normalize(probOfClassGivenDoc);
193:
194: return probOfClassGivenDoc;
195: }
196:
197: /**
198: * Returns a string representation of the classifier.
199: *
200: * @return a string representation of the classifier
201: */
202: public String toString() {
203: StringBuffer result = new StringBuffer();
204:
205: result.append("The independent probability of a class\n");
206: result.append("--------------------------------------\n");
207:
208: for (int c = 0; c < m_numClasses; c++)
209: result.append(m_headerInfo.classAttribute().value(c))
210: .append("\t").append(
211: Double.toString(m_probOfClass[c])).append(
212: "\n");
213:
214: result.append("\nThe probability of a word given the class\n");
215: result.append("-----------------------------------------\n\t");
216:
217: for (int c = 0; c < m_numClasses; c++)
218: result.append(m_headerInfo.classAttribute().value(c))
219: .append("\t");
220:
221: result.append("\n");
222:
223: for (int w = 0; w < m_numAttributes; w++) {
224: result.append(m_headerInfo.attribute(w).name())
225: .append("\t");
226: for (int c = 0; c < m_numClasses; c++)
227: result.append(
228: Double.toString(Math
229: .exp(m_probOfWordGivenClass[c][w])))
230: .append("\t");
231: result.append("\n");
232: }
233:
234: return result.toString();
235: }
236:
237: /**
238: * Main method for testing this class.
239: *
240: * @param args the options
241: */
242: public static void main(String[] args) {
243: runClassifier(new NaiveBayesMultinomialUpdateable(), args);
244: }
245: }
|