Tuesday, March 17, 2009

CSV Files to XML

Comma Separated Value (CSV) files are one of the most common data exchange formats. CSV is a "flat" format in which each row typically has the same layout. Each field, or data item, is separated from the others by a comma, and fields containing commas are quoted. DataDirect XML Converters are high-performance Java and .NET data conversion components that support converting CSV to XML and XML to CSV.

CSV Import and Export

CSV Import/Export is a feature of almost every program that deals with tabular data, from accounting programs like Intuit's Quicken to office programs such as Microsoft Excel. It is a very old format, dating to the earliest days of computing. Even Radio Shack's TRS-80 Model I used CSV data in DATA statements.

But in spite of the ubiquity of the CSV data format, implementations vary in a surprising number of ways. For this reason, the DataDirect XML Converters implementation offers several switches that customize its behavior for your specific application.

Convert CSV to XML

The DataDirect XML Converters CSV converter supports the following options precisely because these details are unspecified in the real world and vary between applications.

* Whether the first row of the CSV data contains the names of the fields, or they are known by the sending and receiving applications implicitly
* How commas within values are handled in CSV files
* How quotes within quoted values are handled in CSV files. Are they doubled to escape them, or is there a special escape character, such as a backslash?
* Whether single or double quotes are used
* Whether runs of consecutive but empty fields should be ignored
* What encoding is used: Windows-1252, ISO-8859-1 or US-ASCII are the most common
* What line ending is used: CR/LF from Windows, LF from Unix/Linux, or CR from the classic Macintosh are often seen
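
To make a few of these variations concrete, here is a minimal sketch in plain Java of a single-record parser whose separator, quote character and escape convention are parameters. It is not the DataDirect implementation and uses none of its API; the class name CsvLineSketch and the method parseLine are hypothetical, and the sketch assumes a record never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {

    /**
     * Splits one CSV record into its fields (hypothetical helper, not a DataDirect call).
     *
     * @param separator field delimiter, usually ','
     * @param quote     quoting character, '"' or '\''
     * @param escape    character that escapes a quote inside a quoted field;
     *                  pass the quote character itself for the "doubled quote" convention
     */
    public static List<String> parseLine(String line, char separator, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                boolean hasNext = i + 1 < line.length();
                if (c == escape && hasNext && line.charAt(i + 1) == quote) {
                    current.append(quote); // escaped quote: keep one literal quote
                    i++;                   // and skip the character that was escaped
                } else if (c == quote) {
                    inQuotes = false;      // closing quote
                } else {
                    current.append(c);     // separators inside quotes are ordinary data
                }
            } else if (c == quote) {
                inQuotes = true;           // opening quote
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // last field has no trailing separator
        return fields;
    }

    public static void main(String[] args) {
        // Double quotes, with embedded quotes doubled
        System.out.println(parseLine("1,\"Smith, John\",\"said \"\"hi\"\"\"", ',', '"', '"'));
        // Single quotes, with embedded quotes escaped by a backslash
        System.out.println(parseLine("1,'it\\'s',x", ',', '\'', '\\'));
    }
}
```

The two calls in main read the same kind of data under two different conventions, which is the sort of difference the converter switches exist to absorb. Encoding and line endings are left out of the sketch; they would be handled when the file is read, before individual records reach a parser like this.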

TSV Files

A close cousin of CSV is TSV, or Tab Separated Value files. These are also a common export format from spreadsheets. There is a separate DataDirect XML Converter for them, but it is really just the CSV converter in disguise, since either format can be turned into the other by changing the definition of the separator character from comma to tab.

CSV Sample File

Use the following sample CSV file to try out DataDirect XML Converters:


Using the above file, a tiny Java, C# or VB.NET program can quickly transform the input CSV to XML, or convert XML to CSV. For example, the following Java program converts the above CSV sample file into XML and writes the output to the console.

import java.io.*;
import javax.xml.transform.stream.*;
import com.ddtek.xmlconverter.*;

public class CSVDemo {
    public static void main(String[] args) throws Throwable {
        ConverterFactory factory = new ConverterFactory();
        // first=yes tells the converter that the first row holds the field names
        ConvertToXML toXML = factory.newConvertToXML("converter:CSV:first=yes");
        // The XML result is written to the console as US-ASCII
        OutputStreamWriter w = new OutputStreamWriter(System.out, "US-ASCII");
        // args[0] is the system ID (path or URL) of the CSV input
        toXML.convert(new StreamSource(args[0]), new StreamResult(w));
    }
}

This would create the following output (with some parts removed).


XML to CSV

Going from XML to CSV is just as important, and the code is just as small. It takes no special schema, and it doesn't even take XQuery or XSLT. Each element immediately under the root element becomes a row, and each element under that becomes one of the values between the commas. In the above example, switching to the ConvertFromXML class and feeding in the sample XML would reproduce the initial CSV file.
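
The row and value mapping described above can be sketched with nothing but the JDK's DOM parser. This is only an illustration of the mapping, not how ConvertFromXML works internally; the class XmlToCsvSketch, its methods and the element names in the sample XML (table, row, name, qty) are all made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlToCsvSketch {

    /** Quotes a value only when it contains a comma, a double quote or a line break. */
    static String quoteIfNeeded(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Children of the root become rows; children of each row become comma-separated values. */
    public static String toCsv(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        StringBuilder out = new StringBuilder();
        NodeList rows = doc.getDocumentElement().getChildNodes();
        for (int r = 0; r < rows.getLength(); r++) {
            if (rows.item(r).getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text between row elements
            }
            NodeList cells = rows.item(r).getChildNodes();
            boolean first = true;
            for (int c = 0; c < cells.getLength(); c++) {
                if (cells.item(c).getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text between value elements
                }
                if (!first) {
                    out.append(',');
                }
                out.append(quoteIfNeeded(cells.item(c).getTextContent()));
                first = false;
            }
            out.append("\r\n"); // CR/LF line ending; an assumption, see the options list
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; element names are not taken from any DataDirect output
        String xml = "<table><row><name>Smith, John</name><qty>3</qty></row>"
                   + "<row><name>Lee</name><qty>5</qty></row></table>";
        System.out.print(toCsv(xml));
    }
}
```

The sketch ignores element names entirely, so it writes no header row; a fuller version would need a rule for when the field names go into the first line.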

Just like the EDI converters and other converters in the DataDirect XML Converters suite, CSV (and TSV) converters are fully bidirectional.

How To Use CSV Files

CSV files are often used as a bridge between older applications and newer ones. Having a tool in your toolbox that transparently reads and writes CSV files can be very valuable. So whether you have to merge inventory data coming from an outside vendor into your database or publish a list of financial figures for your accountants, bridging these two worlds is the purpose of the DataDirect XML Converters. You can get started with directly accessing CSV data from your Java or .NET applications by downloading a free trial of DataDirect XML Converters today.
