SPSS Python Extension Functions

The SPSS Python Extension is an add-on to SPSS that allows you to include Python code in SPSS syntax programs, greatly enhancing the programmability of SPSS. The Python extension is completely free and is available as part of the standard installation of SPSS.

We recently updated all of our Python macros from Python 2 to Python 3. Python 2 is no longer available in SPSS as of version 29. These conversions have not been fully tested, so use them at your own risk. If you have a problem using one of these functions, please send a bug report to jamied@virginia.edu.

Those interested in working with Python might also be interested in the book "Programming and Data Management in SPSS," which gives a good introduction to using Python in SPSS. It is also free, and is available from the original author (Raynald Levesque) at:

http://www.spsstools.net/spss_programming.htm

So why should you be interested in Python? Well, basically, it gives you the ability to do a lot of new things in SPSS syntax, such as the following:

Create loops that include SPSS analyses
Have dynamic syntax programs so that the results of earlier analyses can affect what analyses are performed later
Search directories for particular types of data sets and perform analyses on all of them
Extract values from analysis output and store them in a data set
Create generic functions that can be reused in later programs

Taking advantage of the last capability, we have created a number of useful Python functions that you might want to use in your own programs. You will need to have the Python extension installed in order to use them, but after that, all you'd need to do is include the function syntax in your program with an INSERT FILE command and you'll be able to use the function.
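
As a rough illustration of the first capability in the list above (loops that include SPSS analyses), the following sketch builds one FREQUENCIES command per variable and passes each one to SPSS. It is only a minimal sketch and is not taken from any of the functions listed below: the variable names and the helper build_frequencies_syntax are hypothetical, and spss.Submit is only called when the spss module can be imported, which is the case when the code runs inside SPSS.

```python
def build_frequencies_syntax(variables):
    """Return one FREQUENCIES command per variable as a list of syntax strings."""
    return ["FREQUENCIES VARIABLES={0}.".format(name) for name in variables]


# Hypothetical variable names; replace them with variables from your own data set.
commands = build_frequencies_syntax(["age", "income", "score"])

try:
    import spss  # only importable when the code runs inside SPSS
except ImportError:
    spss = None

for command in commands:
    if spss is not None:
        spss.Submit(command)  # run the generated command in SPSS
    else:
        print(command)  # outside SPSS, just show the generated syntax
```

Inside an SPSS syntax file, code like this is placed between BEGIN PROGRAM PYTHON3. and END PROGRAM. lines. Building the syntax as plain strings first makes it easy to inspect what will be run before anything is submitted.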

Functions Related to Mplus

Function name	Last Revision	Description
MplusLPA (Process model data, Process mean data)	2025-05-05	Performs a latent profile analysis or a latent class analysis in Mplus. It can create a dataset containing the variable means from different profiles, and can create a dataset containing model statistics. The "Process model data" syntax will add the LMR test to the model data file. The "Process mean data" syntax will create profile plots based on the means in the mean data file.
MplusLTA	2022-07-19	Performs a latent transition analysis in Mplus. It creates output and data files similar to those produced by the MplusLPA program.
MplusMItoSPSS	2012-07-13	Reads in a set of multiple imputation datasets generated in Mplus and turns them into a multiple imputation dataset in SPSS.
MplusPathAnalysis	2025-05-08	Performs a path analysis within SPSS by using Mplus. The program automatically converts the active dataset to Mplus, writes the necessary Mplus code, runs the analysis in Mplus, and then brings the results into the SPSS output window.
MplusTwoLevel	2025-05-03	Runs a two-level model from within SPSS by using Mplus. The program automatically converts the active dataset to Mplus, writes the necessary Mplus code, runs the analysis in Mplus, and then brings the results into the SPSS output window.
savedataToSPSS	2021-08-08	Converts an Mplus savedata file to an SPSS data set.
SPSStoMplus	2015-01-19	Converts an SPSS data set into Mplus format, and also generates a skeleton input file that would read the data set. It performs a number of transformations to the data to make it consistent with the requirements of Mplus.
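
To give a feel for the kind of transformation a conversion to Mplus can involve, here is a hedged sketch of one plausible step: shortening variable names so they fit within the 8-character limit Mplus places on variable names. The table above does not say which transformations SPSStoMplus actually performs, so this is an assumption for illustration only; the helper shorten_names and the example variable names are hypothetical.

```python
def shorten_names(names, max_len=8):
    """Map each name to a version of at most max_len characters.

    Names are truncated; if a truncated name is already taken
    (ignoring case), a numeric suffix is used to keep names unique.
    """
    used = set()
    result = {}
    for name in names:
        candidate = name[:max_len]
        counter = 1
        while candidate.lower() in used:
            suffix = str(counter)
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate.lower())
        result[name] = candidate
    return result


# Hypothetical SPSS variable names, used only for illustration.
mapping = shorten_names(["depression_t1", "depression_t2", "age"])
for original, short in mapping.items():
    print(original, "->", short)
```

Returning a mapping rather than just the new names keeps a record of which original variable each shortened name came from.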

Other Functions

Function name	Last Revision	Description
artCategorize	2023-06-24	Artificially categorizes a continuous variable. You give the name of the variable and the number of groups, and the function will find the appropriate percentiles and create a new variable that divides people into groups based on their scores on the continuous variable.
checkboxToOrdered	2020-07-20	Takes a list of existing binary variables and converts them into a single categorical variable based on an ordered list.
checkDataset	2022-08-21	Provides basic information about all of the variables in a data set. String variables and variables that have value labels are treated categorically. Other variables are treated continuously.
condSelect	2012-08-23	Allows you to randomly select a fixed number of participants from each level of a grouping variable. This is useful when you want to graphically examine relations but there are too many participants to make sense of the graphs.
corrCI	2018-02-11	Provides a correlation matrix for a set of variables that includes a confidence interval for each correlation.
CSVtoSPSS	2017-09-27	Locates all of the CSV files in a source directory and converts them all to SPSS data sets, which are placed in an identified target directory.
dBetween	2013-04-04	Calculates the d for a single two-group between-subjects comparison.
dBetweenDataset	2021-03-09	Calculates the results for two-group between-subjects comparisons and puts them into a new dataset. Allows you to optionally define multiple groupings, multiple outcomes, and a split variable.
delEmptyVars	2019-10-22	Deletes all variables that have no valid cases.
descriptive	2017-02-16	Allows you to use a summary statistic, like the mean, standard deviation, or median, as part of a formula so that it automatically updates when the data changes.
descriptiveDataset	2016-12-28	Calculates one or more summary statistics for one or more variables and puts the results into a new data set. Also allows you to optionally define a split variable. Rows in the dataset can be labeled, and the dataset can be appended by issuing the command multiple times, potentially with different labels each time.
dummycode	2019-03-21	Automatically creates a set of dummy codes for a categorical variable.
ExcelToSPSS	2014-06-10	Locates all of the Excel files (either .xls or .xlsx) in a source directory and converts them all to SPSS data sets, which are placed in an identified target directory.
exploreDistributions	2020-09-07	Takes a group of continuous variables and provides information about their univariate and bivariate distributions, including (1) descriptive statistics for each variable, (2) correlations among variables, (3) univariate histograms, (4) bivariate scatterplots, (5) univariate missingness, and (6) patterns of missing data.
freqDataset	2016-12-03	Creates a dataset containing the frequencies of a set of variables. Rows in the dataset can be labeled, and the dataset can be appended by issuing the command multiple times, potentially with different labels each time.
ICC	2014-11-19	Calculates the intraclass correlation and the design effect for a cluster variable on a list of outcomes.
labelsToValues	2023-02-16	Creates a new variable equal to the labels of another variable.
mergeAllSPSScases	2019-03-08	Locates all of the SPSS data sets in a source directory and merges them into a single file, which is placed in an identified target directory. Specifically used to merge datasets that have a similar set of variables but different cases.
nameSplit	2014-08-31	Given a long filename, it returns it automatically split into smaller portions so that it doesn't violate SPSS constraints on how many characters are allowed on a single line of syntax.
notMissing	2018-11-13	Removes cases with missing values on the identified variable. Works on both string and numeric variables.
numericMissing	2019-08-05	Sets the missing values for all numeric variables in the data set using a single command.
removeLabels	2020-08-07	Removes variable labels and/or value labels from a list of variables.
resolveDuplicates	2015-09-21	Resolves duplicate cases in a dataset. You specify a primary row for each case, which is where most of the values are taken from. However, if the primary row is missing a value on a variable, it is filled in from the other rows for that case if they have valid values.
saveExportData	2024-10-13	Saves a data set, exports it to Excel, creates a data dictionary, and exports the dictionary to Word. This is useful for retaining documentation about your data, and also facilitates transferring the file to ChatGPT for data visualization and analysis.
sensiSpeci	2017-09-27	This program provides users with classification statistics (such as sensitivity and specificity) for the ability of a screener to predict the value of a test.
shrinkString	2021-04-12	Sets every string variable in the data set to the smallest size that will fit all its values.
textCSVtoVars	2022-05-02	Takes a set of comma-separated values in a text string and puts each entry into a separate variable. Was originally designed to handle the responses to Qualtrics checkbox questions.
textSplit	2014-02-17	Reads a text variable and then creates a number of additional variables separating the contents of the original text variable into different parts based on an identified delimiter. This can be used to take a sentence and have each word put in a different variable.
valuesToLabels	2023-02-16	Takes label information that is contained in one variable and uses it to create value labels for a second variable.
variableSuffix	2020-07-28	Applies a specific suffix to a set of variables in the dataset. You can apply the suffix to all of the variables, a specific list of variables, or all of the variables except a specific list of variables.
varMode	2015-01-07	Calculates the mode of a set of variables.
weightedCorr	2013-10-15	Creates a correlation matrix using a regression weight. This function calculates each weighted correlation individually using the regression command (rather than using the WEIGHT command, which artificially increases your sample size), and then combines them into a correlation matrix.
withinDescriptive	2018-11-03	Similar to the descriptive function, this allows you to use a summary statistic as part of a formula. However, this function restricts the calculation to those cases that are in the same condition as the current case on a split variable.
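
The idea behind a helper like nameSplit in the table above can be sketched in a few lines of Python. This is not the code of nameSplit itself, only a minimal illustration under stated assumptions: the function split_long_string, the max_len value, and the example path are all hypothetical. The sketch also assumes that the quoted pieces will be joined with a plus sign, one piece per syntax line.

```python
def split_long_string(text, max_len=60):
    """Split text into consecutive chunks of at most max_len characters."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


# Hypothetical path, used only for illustration.
path = ("C:/Users/example/Documents/projects/study_one/data/"
        "wave_three/cleaned_participant_file.sav")
chunks = split_long_string(path, max_len=40)

# Quote each chunk and join the chunks with "+", one chunk per line.
syntax_literal = " +\n".join('"{0}"'.format(chunk) for chunk in chunks)
print(syntax_literal)
```

Splitting at fixed positions keeps the logic simple; joining the chunks back together always reproduces the original string exactly.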
